Ekos/StellarSolver Memory issue
I believe there is a memory issue when running Ekos with StellarSolver that is rare, but frequent enough to matter. For instance, suppose I run the scheduler to image for an entire night with the internal guider running (it uses StellarSolver to detect stars), automatic HFR calculation at the end of each sub (it uses StellarSolver to detect stars), and automatic drift detection enabled in the scheduler (it uses StellarSolver to plate-solve each sub). In that setup it isn't unusual for Ekos to crash before the end of the night. I believe two or all three of these StellarSolver instances run in parallel threads, so I suspect a multi-threading issue in StellarSolver, or perhaps a double free or a use-after-free, but I can't be sure of that.
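The suspected failure mode, several independent detection or solve jobs running at once in different threads, can be exercised outside Ekos with a small stress harness. The sketch below is plain C++11 and does not use the StellarSolver API: `runInParallel` and `fakeDetectStars` are hypothetical names, and the fake detector only stands in for one star-detection or plate-solve job. Building a harness like this (or the real test) with GCC's `-fsanitize=address` should report a double free or heap-use-after-free at the faulting access instead of a later, unrelated-looking crash, while `-fsanitize=thread` reports data races; the two sanitizers cannot be combined in a single build.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs `work` concurrently on `numThreads` threads, `iterations` times per
// thread, and returns how many calls completed. If `work` has a memory bug,
// the process crashes (or a sanitizer aborts it) instead of returning.
std::size_t runInParallel(std::size_t numThreads, std::size_t iterations,
                          const std::function<void(std::size_t)> &work)
{
    assert(numThreads > 0);
    std::atomic<std::size_t> completed{0};
    std::vector<std::thread> threads;
    threads.reserve(numThreads);
    for (std::size_t t = 0; t < numThreads; ++t)
    {
        threads.emplace_back([&, t]() {
            for (std::size_t i = 0; i < iterations; ++i)
            {
                work(t);  // each thread passes its own index
                completed.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto &th : threads)
        th.join();
    return completed.load();
}

// Hypothetical stand-in for one detection job: it allocates, reads, and
// frees its own buffer, so nothing is shared between threads.
void fakeDetectStars(std::size_t /*threadIndex*/)
{
    std::vector<float> image(64 * 64, 1.0f);
    float sum = 0.0f;
    for (float v : image)
        sum += v;
    assert(sum == 4096.0f);
}
```

For example, `runInParallel(3, 100, fakeDetectStars)` returns 300 when nothing goes wrong. Swapping the fake job for real workloads that share state is where a race or double free would surface.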
Since I haven't been able to track it down, I wrote and submitted a test to help us debug it.
- Edit `Tests/fitsviewer/testfitsdata.cpp` and comment out the following line (around line 532 of the file) so that the test is enabled: `#define SKIP_PARALLEL_SOLVERS_TEST`
- Download the data I provided from https://drive.google.com/drive/folders/1eUrcJd1IENvcRUtnwsaUNWKwKuj5G077?usp=sharing and put it in a directory on your machine. Then change the line (around line 539) `QString dir = "/home/hy/Desktop/SharedFolder/DEBUG-solver";` to point to that directory.
- Around line 590, change `num = 3000` to `num = 10000`.
- Uncomment the three lines related to `loop4` so that the solver runs as well.
- Compile the test: `make -j12 testfitsdata`
- Run the test: `bin/testfitsdata testParallelSolvers`
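Because a run takes a long time, it can help to confirm that the data directory set on the `QString dir` line actually contains the guide frames before starting. The helper below is a hypothetical pre-flight check, not part of `testfitsdata.cpp`; `listFitsFiles` is an illustrative name, and it uses C++17 `std::filesystem` instead of Qt's `QDir` only so that it stands alone.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns the regular files ending in ".fits" inside
// `dir`, sorted by path. Returns an empty vector if `dir` is not a directory,
// so a mistyped data path is caught before the long run starts.
std::vector<std::string> listFitsFiles(const std::string &dir)
{
    std::vector<std::string> files;
    std::error_code ec;
    if (!fs::is_directory(dir, ec))
        return files;
    for (const auto &entry : fs::directory_iterator(dir, ec))
    {
        // Skip subdirectories and other non-regular entries.
        if (entry.is_regular_file() && entry.path().extension() == ".fits")
            files.push_back(entry.path().string());
    }
    std::sort(files.begin(), files.end());
    assert(std::is_sorted(files.begin(), files.end()));
    return files;
}
```

An empty result for the path you configured means the test would have nothing useful to load.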
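Raising `num` from 3000 to 10000 gives a rare fault more chances to show up. Under a simplifying assumption (mine, not measured) that each iteration independently triggers the fault with some small probability p, the chance of at least one failure in n iterations is 1 - (1 - p)^n. With an illustrative p = 0.0001, that is about 26% for n = 3000 and about 63% for n = 10000. The function name below is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Probability that at least one of n independent iterations fails, when each
// fails with probability p: 1 - (1 - p)^n. Independence is an assumption.
double probAtLeastOneFailure(double p, long n)
{
    assert(p >= 0.0 && p <= 1.0 && n >= 0);
    if (n == 0)
        return 0.0;
    // (1 - p)^n == exp(n * log1p(-p)); expm1/log1p keep precision for tiny p.
    return -std::expm1(static_cast<double>(n) * std::log1p(-p));
}
```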
This is not a "test" in the sense of checking any computations; passing simply means not crashing.
This is what happened when I just ran it:
```
> bin/testfitsdata testParallelSolvers
********* Start testing of TestFitsData *********
Config: Using QtTest library 5.12.8, Qt 5.12.8 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 9.3.0)
PASS : TestFitsData::initTestCase()
Running solver with /home/hy/Desktop/SharedFolder/DEBUG-solver/guide_frame_00-20-08.fits
QINFO : TestFitsData::testParallelSolvers() "#0: /home/hy/Desktop/SharedFolder/DEBUG-solver/guide_frame_00-20-30.fits HFR 2.0226"
QINFO : TestFitsData::testParallelSolvers() "#0: /home/hy/Desktop/SharedFolder/DEBUG-solver/guide_frame_00-20-08.fits HFR 1.98326"
QINFO : TestFitsData::testParallelSolvers() "#1: /home/hy/Desktop/SharedFolder/DEBUG-solver/guide_frame_00-20-34.fits HFR 2.19655"
QINFO : TestFitsData::testParallelSolvers() "#1: /home/hy/Desktop/SharedFolder/DEBUG-solver/guide_frame_00-20-12.fits HFR 1.96469"
...
QINFO : TestFitsData::testParallelSolvers() "#669: /home/hy/Desktop/SharedFolder/DEBUG-solver/guide_frame_00-20-21.fits HFR 1.9854"
QINFO : TestFitsData::testParallelSolvers() "#673: /home/hy/Desktop/SharedFolder/DEBUG-solver/guide_frame_00-20-34.fits HFR 2.19655"
QINFO : TestFitsData::testParallelSolvers() "#670: /home/hy/Desktop/SharedFolder/DEBUG-solver/guide_frame_00-20-24.fits HFR 2.08071"
QINFO : TestFitsData::testParallelSolvers() "#674: /home/hy/Desktop/SharedFolder/DEBUG-solver/guide_frame_00-20-37.fits HFR 2.03938"
=== Received signal at function time: 300012ms, total time: 300013ms, dumping stack ===
QINFO : TestFitsData::testParallelSolvers() "#671: /home/hy/Desktop/SharedFolder/DEBUG-solver/guide_frame_00-20-27.fits HFR 2.08168"
QINFO : TestFitsData::testParallelSolvers() "#675: /home/hy/Desktop/SharedFolder/DEBUG-solver/guide_frame_00-20-40.fits HFR 2.12139"
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 37916
(gdb) === End of stack trace ===
QFATAL : TestFitsData::testParallelSolvers() Test function timed out
FAIL! : TestFitsData::testParallelSolvers() Received a fatal error.
Loc: [Unknown file(0)]
Totals: 1 passed, 1 failed, 0 skipped, 0 blacklisted, 300315ms
********* Finished testing of TestFitsData *********
Aborted (core dumped)
```
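The log ends with QtTest's own watchdog firing: "Test function timed out" after about 300000 ms (5 minutes), which is the default per-function limit in Qt 5. That suggests this particular run was stopped by the watchdog rather than by a crash inside StellarSolver. As far as I know, the limit can be raised by setting the `QTEST_FUNCTION_TIMEOUT` environment variable (in milliseconds) when running the test, which matters once `num` is 10000. A harness that keeps "finished", "timed out", and "crashed" apart can be sketched as below; `runWithDeadline` and `RunResult` are hypothetical names, and this is only an illustration of the idea, not how QtTest implements its watchdog.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>

enum class RunResult { Completed, TimedOut };

// Runs `job` on another thread and waits at most `limit` for it.
// A crash in `job` still takes down the whole process, so reaching either
// return value means "no crash so far". On timeout, the std::future returned
// by std::async blocks in its destructor until `job` finishes: the timeout
// is reported, but the job is not killed.
RunResult runWithDeadline(const std::function<void()> &job,
                          std::chrono::milliseconds limit)
{
    assert(limit.count() >= 0);
    std::future<void> f = std::async(std::launch::async, job);
    if (f.wait_for(limit) == std::future_status::timeout)
        return RunResult::TimedOut;
    f.get();  // rethrows any exception thrown by job
    return RunResult::Completed;
}
```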