
Run HaarIface::findDuplicates lock-free in parallel

This works by splitting the duplicate-finding logic into four major steps:

  1. Resolve all image ids in DuplicatesFinder::slotStart(), before starting the search jobs.
  2. Create a shared HaarIface with a signature cache in SearchesDBJobsThread, to be used by all SearchesJob instances in parallel.
  3. Break down the whole "images to scan" set into iterator ranges, and run these in parallel (lock-free).
  4. Rebuild (or update) the search albums in the database.

Step 3 can run lock-free in parallel with some adjustments. For example, because we use constant iterator ranges, unused images cannot be removed from the cache while running multi-threaded. Also, because step 3 works on ranges, the same search album is sometimes generated multiple times in separate threads from different reference images; in step 4 we filter the aggregated results so that only one search album of similar images remains per group of duplicates found.
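To illustrate steps 3 and 4, here is a minimal standalone sketch in standard C++17. It is not the code from this merge request: `SignatureCache`, `DuplicateGroup`, `isSimilar`, `scanRange` and `findDuplicateGroups` are hypothetical names, plain `std::thread` stands in for the job classes, and the similarity test is a toy equality check rather than a Haar signature comparison. It only shows the idea that each worker reads the shared cache through a constant iterator range, writes into its own result slot (so no lock is needed), and that the repeated groups are collapsed afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <set>
#include <thread>
#include <utility>
#include <vector>

using ImageId = std::int64_t;
// Hypothetical stand-in for the shared signature cache: image id -> signature.
using SignatureCache = std::map<ImageId, int>;
// One group of similar images (what would become one search album).
using DuplicateGroup = std::set<ImageId>;

// Toy similarity test (assumption): two images match if their signatures are equal.
bool isSimilar(int a, int b) { return a == b; }

// Step 3: scan one constant iterator range. The cache is only read, never modified.
std::vector<DuplicateGroup> scanRange(SignatureCache::const_iterator begin,
                                      SignatureCache::const_iterator end,
                                      const SignatureCache& cache)
{
    std::vector<DuplicateGroup> groups;
    for (auto ref = begin; ref != end; ++ref) {
        DuplicateGroup group;
        for (const auto& [id, signature] : cache) {
            if (isSimilar(ref->second, signature)) group.insert(id);
        }
        // A group containing only the reference image is not a duplicate.
        if (group.size() > 1) groups.push_back(std::move(group));
    }
    return groups;
}

std::set<DuplicateGroup> findDuplicateGroups(const SignatureCache& cache,
                                             std::size_t workerCount)
{
    assert(workerCount > 0);

    // One result slot per worker: threads never write to the same element.
    std::vector<std::vector<DuplicateGroup>> perWorker(workerCount);
    std::vector<std::thread> workers;
    const std::size_t chunk = (cache.size() + workerCount - 1) / workerCount;

    auto begin = cache.cbegin();
    std::size_t remaining = cache.size();
    for (std::size_t w = 0; w < workerCount; ++w) {
        const std::size_t n = std::min(chunk, remaining);
        const auto end = std::next(begin, static_cast<std::ptrdiff_t>(n));
        workers.emplace_back([&cache, &perWorker, w, begin, end] {
            perWorker[w] = scanRange(begin, end, cache);
        });
        begin = end;
        remaining -= n;
    }
    for (auto& worker : workers) worker.join();

    // Step 4: the same group can be found from several reference images,
    // possibly in different threads; keep each distinct group only once.
    std::set<DuplicateGroup> merged;
    for (const auto& groups : perWorker) {
        merged.insert(groups.begin(), groups.end());
    }
    return merged;
}
```

In this sketch the final `std::set` only removes groups that are exactly identical. The actual filtering in step 4 may need to be more involved, since a real similarity threshold can produce overlapping but not identical result sets for different reference images.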

From my local measurements on an album collection of 27,470 photos:

"Find duplicates" in all albums (excluding the reference album):

  • Before: 181s
  • After: 41s (a 77% reduction, or a 4.4x speedup compared to the current implementation).
