
Bugfix for aborted job handling in the scheduler

This is a bug fix for the scheduler's handling of aborted jobs. Instead of waiting for the configured delay before restarting an aborted job, the scheduler restarts it immediately.

This is caused by a concurrency issue. A failure (for example, of alignment) is communicated to the scheduler as an event from a separate thread. While the scheduler is running, a timer triggers its iterations. To apply the delay, the scheduler loop has to be stopped, the delay set as the timer's timeout, and the loop restarted.
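A minimal Python sketch of the stop, set-timeout, restart sequence. Everything here is hypothetical: `SchedulerLoop`, `on_job_aborted`, `on_job_aborted_buggy`, `advance` and the 1-second default interval are illustrative names and values, not the scheduler's actual code, and time is simulated rather than driven by a real timer. The buggy variant only changes the timeout, so the iteration already pending with the old interval still fires and restarts the job; the fixed variant reschedules the pending iteration.

```python
import threading

DEFAULT_INTERVAL = 1  # hypothetical normal iteration period (seconds)


class SchedulerLoop:
    """Hypothetical model of a timer-driven scheduler loop.

    Time is simulated in whole seconds so the behaviour is deterministic;
    a real implementation would use an actual timer instead.
    """

    def __init__(self):
        self._lock = threading.RLock()  # abort events arrive from another thread
        self.now = 0
        self.running = False
        self.timeout = DEFAULT_INTERVAL
        self.next_fire = None           # simulated time of the pending iteration

    def start(self):
        with self._lock:
            self.running = True
            self.next_fire = self.now + self.timeout

    def stop(self):
        with self._lock:
            self.running = False
            self.next_fire = None

    def on_job_aborted_buggy(self, delay):
        # Only the timeout changes: the iteration already scheduled with the
        # old interval still fires and restarts the aborted job right away.
        with self._lock:
            self.timeout = delay

    def on_job_aborted(self, delay):
        # Fix: stop the loop, install the delay as the timeout, restart the
        # loop. Holding the (re-entrant) lock makes the three steps atomic
        # with respect to the loop itself.
        with self._lock:
            self.stop()
            self.timeout = delay
            self.start()

    def advance(self, seconds):
        """Advance simulated time and return the times at which iterations fired."""
        fired = []
        with self._lock:
            end = self.now + seconds
            while self.running and self.next_fire is not None and self.next_fire <= end:
                self.now = self.next_fire
                fired.append(self.now)
                self.timeout = DEFAULT_INTERVAL  # back to the normal period
                self.next_fire = self.now + self.timeout
            self.now = end
        return fired


if __name__ == "__main__":
    loop = SchedulerLoop()
    loop.start()
    loop.advance(5)
    # Deliver the abort event from a separate thread, as the scheduler receives it.
    t = threading.Thread(target=loop.on_job_aborted, args=(120,))
    t.start()
    t.join()
    print(loop.advance(119))  # no iteration before the delay has elapsed
    print(loop.advance(1))    # the delayed iteration fires
```

With the buggy handler, the next iteration fires one default interval after the abort event instead of after the 120-second delay.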

To test it, take a simple schedule that requires alignment and set the alignment capture time to 0.1 s so that alignment fails. Start the scheduler and observe what happens when it aborts the job because of the alignment failure.

Without this fix, if "Immediate" is selected, the aborted job starts immediately instead of after the configured delay (typically 2 min). If "Queue" is selected, the scheduler shuts down the observatory.

Edited by Wolfgang Reissenberger
