Failed dither settling indefinitely stalls capture module
Over my experience with kstars, I've unfortunately lost many night's automated observations to something related to a dither request.
I am running kstars as versioned below:
[2021-04-14T20:33:52.805 PDT INFO ][ org.kde.kstars] - Welcome to KStars 3.5.2 Stable
[2021-04-14T20:33:52.823 PDT INFO ][ org.kde.kstars] - Build: 2021-03-01T11:47:36Z
[2021-04-14T20:33:52.850 PDT INFO ][ org.kde.kstars] - OS: "ubuntu"
[2021-04-14T20:33:52.858 PDT INFO ][ org.kde.kstars] - API: "x86_64-little_endian-lp64"
[2021-04-14T20:33:52.868 PDT INFO ][ org.kde.kstars] - Arch: "x86_64"
[2021-04-14T20:33:52.884 PDT INFO ][ org.kde.kstars] - Kernel Type: "linux"
[2021-04-14T20:33:52.893 PDT INFO ][ org.kde.kstars] - Kernel Version: "5.4.0-70-generic"
[2021-04-14T20:33:52.902 PDT INFO ][ org.kde.kstars] - Qt Version: 5.12.8
I use PHD2 external guiding as versioned:
PHD2 version 2.6.9dev4 [Linux]
In kstars, my dither settings are as follows:
- Dither (enabled): 2.0 pixels
- Frequency: 3 frames
- Threshold: 1.00 pixels
- Settle: 15 seconds
- Timeout: 45 seconds Max. Iterations: 10 seconds
In PHD2, it is implemented as Spiral, with a scale of 1.0
What I'll generally see is that kstars made a dither request, PHD2 thinks it responded, or perhaps failed to settle, and then the kstars capture module just waits -- forever -- until sunrise and the mount gets parked. PHD2 guides happily during this entire duration, and I've lost the remainder of the night to this "glitch."
The Timeout / iterations settings in kstars appear to have no effect, as no timeout or retry ever happens. I'm usually operating within an Ekos sequence, but I believe I've seen it happen in just a normal capture module sequence.
Examining the logs, I don't see anything obvious. I've attached the verbose/debug logs from both Kstars and PHD2 for reference.
You will see the final dither activity at 2021-04-15T00:45:51.628
in the kstars log. In the PHD2 log, this corresponds to line ~10176.
I also have PHD Debug logs, which provide the following output on the last/failed dither command:
00:45:51.257 00.001 140714079116544 SchedulePrimaryMove(0x563adbb4c4c0, x=0.72, y=0.01, opts=13)
00:45:51.257 00.000 140714079116544 Enqueuing Move request for scope (0.72, 0.01)
00:45:51.257 00.000 140714069317376 Worker thread wakes up
00:45:51.257 00.000 140714069317376 worker thread servicing REQUEST_MOVE scope ofs (0.72, 0.01) opts 0xd
00:45:51.257 00.000 140714069317376 Handling offset move in thread for scope, endpoint = (0.72, 0.01)
00:45:51.257 00.000 140714069317376 Moving (0.72, 0.01) raw xDistance=0.04 yDistance=-0.72
00:45:51.258 00.001 140714069317376 PPEC rslt: input = 0.04, final = 0.06, react = 0.03, pred = 0.12, hyst = 0.00, hyst_pct = 0.52, period_length = 490.70
00:45:51.258 00.000 140714069317376 PPEC: input: 0.04, control: 0.06, exposure: 1000
00:45:51.258 00.000 140714069317376 Throw from /build/phd2-07BSyW/phd2-2.6.9.rev20210208/guide_algorithm_resistswitch.cpp:154->Not getting worse
00:45:51.258 00.000 140714069317376 GuideAlgorithmResistSwitch::result() returns 0.00 from input -0.72
00:45:51.258 00.000 140714069317376 MoveAxis(W, 15, ABG)
00:45:51.258 00.000 140714069317376 INDI Mount: timed pulse dir 3 dur 15 ms
00:45:51.258 00.000 140714069317376 INDI Mount: wait for move complete
00:45:51.262 00.004 140714079116544 UpdateImageDisplay: Size=(1936,1096) min=0, max=65517, med=167, FiltMin=90, FiltMax=65476, Gamma=0.270
00:45:51.287 00.025 140714010855168 INDI Mount: Receiving Switch: TELESCOPE_SLEW_RATE = 1
00:45:51.287 00.000 140714010855168 INDI Mount: Receiving Number: TELESCOPE_TIMED_GUIDE_WE = 15 state = Busy
00:45:51.331 00.044 140714010855168 INDI Mount: Receiving Switch: TELESCOPE_MOTION_WE = 0
00:45:51.331 00.000 140714010855168 INDI Mount: Receiving Number: TELESCOPE_TIMED_GUIDE_WE = 0 state = Idle
00:45:51.331 00.000 140714069317376 INDI Mount: move completed
00:45:51.331 00.000 140714069317376 Move returns status 0, amount 15
00:45:51.331 00.000 140714069317376 MoveAxis(N, 0, ABG)
00:45:51.331 00.000 140714069317376 Move returns status 0, amount 0
00:45:51.331 00.000 140714069317376 move complete, result=0
00:45:51.331 00.000 140714069317376 worker thread done servicing request
00:45:51.351 00.020 140714079116544 UpdateGuideState exits: m=271779 SNR=306.5
00:45:51.351 00.000 140714079116544 PhdController: settling, locked = 1, distance = 0.87 (1.00) aobump = 0 frame = 26 / 99999
00:45:51.351 00.000 140714079116544 PhdController failed: timed-out waiting for guider to settle
00:45:51.351 00.000 140714079116544 PhdController: newstate STATE_FINISH
00:45:51.351 00.000 140714079116544 PhdController complete: fail: timed-out waiting for guider to settle
00:45:51.351 00.000 140714079116544 evsrv: {"Event":"SettleDone","Timestamp":1618472751.351,"Host":"astroNUC","Inst":1,"Status":1,"Error":"timed-out waiting for guider to settle","TotalFrames":26,"DroppedFrames":0}
00:45:51.351 00.000 140714079116544 Mount: notify guiding dither settle done success=0
00:45:51.351 00.000 140714079116544 PhdController: newstate STATE_IDLE
It appears that PHD2 tells kstars that it failed to settle, and then kstars hangs on that response.
Earlier in the evening I had done some PEC training through the main scope, and then various calibrations and guiding settling in.
I'm not quite sure as to why settling is being marked as "failed" (because the RMS is more than 1 pixel after the 15 second timeout?), but I'm more concerned that the 45 second timeout is not having the desired effect of "trying again."
114MB PHD2 Debug Log available via link: https://www.dropbox.com/s/7oamgxmzyad5xbg/PHD2_DebugLog_2021-04-14_203325.txt.gz?dl=0