Run HaarIface::findDuplicates lock-free in parallel
This works by splitting the duplicates-finding logic into 4 major steps:
- Resolve all image ids before starting the search jobs, during DuplicatesFinder::slotStart().
- Create a shared HaarIface with a signature cache in SearchesDBJobsThread, to be used by all SearchesJob instances in parallel.
- Break down the whole "images to scan" set into iterator ranges, and run these in parallel (lock-free); see the sketch after this list.
- Rebuild (or update) the search albums in the database.
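
As a rough illustration of steps 2) and 3) -- this is not digiKam's actual SearchesJob/HaarIface API; SignatureCache, DuplicatesGroup, scanRange() and findDuplicatesParallel() are hypothetical stand-ins -- the lock-free range scanning could look like this:

```cpp
// Illustrative sketch only. It shows the shape of step 3): split the
// pre-resolved image-id list into constant iterator ranges and scan each
// range on its own thread against one shared, read-only signature cache.
#include <QtConcurrent>
#include <QFuture>
#include <QVector>
#include <QList>
#include <QHash>
#include <QPair>

// Hypothetical stand-ins for the real HaarIface signature cache and result type.
using SignatureCache = QHash<qlonglong, QByteArray>;
struct DuplicatesGroup { qlonglong referenceId; QList<qlonglong> similarIds; };

using IdIterator = QVector<qlonglong>::const_iterator;

// Placeholder for the per-range scan; the real code would compare Haar signatures.
static QList<DuplicatesGroup> scanRange(IdIterator begin, IdIterator end,
                                        const SignatureCache& cache)
{
    Q_UNUSED(begin); Q_UNUSED(end); Q_UNUSED(cache);
    return {};
}

QList<DuplicatesGroup> findDuplicatesParallel(const QVector<qlonglong>& allIds,
                                              const SignatureCache& cache,
                                              int chunks)
{
    // 1) Build constant iterator ranges over the resolved id list.
    QVector<QPair<IdIterator, IdIterator>> ranges;
    const int total = static_cast<int>(allIds.size());
    const int step  = qMax(1, total / qMax(1, chunks));

    for (int i = 0 ; i < total ; i += step)
    {
        ranges << qMakePair(allIds.constBegin() + i,
                            allIds.constBegin() + qMin(i + step, total));
    }

    // 2) Scan each range on its own thread. The cache is only read, so no
    //    locks are needed -- but unused entries cannot be evicted meanwhile.
    QVector<QFuture<QList<DuplicatesGroup>>> futures;

    for (const auto& range : ranges)
    {
        futures << QtConcurrent::run([range, &cache]()
        {
            return scanRange(range.first, range.second, cache);
        });
    }

    // 3) Aggregate the per-range results; step 4) filters them before
    //    writing the search albums to the database.
    QList<DuplicatesGroup> all;

    for (auto& future : futures)
    {
        all += future.result();   // blocks until this range has been scanned
    }

    return all;
}
```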
Step 3) can run lock-free in parallel with some adjustments: for example, because we use constant iterator ranges, it is not possible to remove unused images from the cache while running multi-threaded. Also, because we use ranges in step 3), the same search album is sometimes generated multiple times in separate threads using different reference images; in step 4) we therefore filter the aggregated results so that there is only one search album with similar images per duplicates album found.
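
A minimal sketch of that step 4) filtering, reusing the hypothetical DuplicatesGroup type from the sketch above (again, not digiKam's actual code), could look like:

```cpp
// Illustrative only: drop groups that describe the same set of similar images,
// which happens when two threads pick different reference images from the same
// duplicates group. The key is the sorted id set, including the reference.
#include <QSet>
#include <QString>
#include <QList>
#include <algorithm>

QList<DuplicatesGroup> filterDuplicateGroups(const QList<DuplicatesGroup>& groups)
{
    QSet<QString>          seen;
    QList<DuplicatesGroup> filtered;

    for (const DuplicatesGroup& group : groups)
    {
        // Build an order-independent key over all ids in the group.
        QList<qlonglong> ids = group.similarIds;
        ids << group.referenceId;
        std::sort(ids.begin(), ids.end());

        QString key;

        for (qlonglong id : ids)
        {
            key += QString::number(id) + QLatin1Char(',');
        }

        if (!seen.contains(key))
        {
            seen.insert(key);
            filtered << group;   // keep only one search album per duplicates group
        }
    }

    return filtered;
}
```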
From my local measurements on an album collection of 27,470 photos:
"Find duplicates" in all albums (excluding the reference album):
- Before: 181s
- After: 41s (77% reduction, or 4.4x speedup when compared to the current implementation).
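
For reference, the derived figures follow from the raw timings: (181 - 41) / 181 ≈ 77% less time, and 181 / 41 ≈ 4.4x faster.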