Draft: Multithreaded floodfill algorithm
This MR explores the potential benefits of parallelizing the floodfill algorithm.
Some comments:
- It operates on contiguous, non-overlapping, rectangular regions of size
KisTileData::WIDTH * KisTileData::HEIGHT
. In each region the floodfill is performed in parallel. - The main high-level class has the same interface as
KisScanlineFill
but internally a new scanline algorithm is used. - The multithreaded code uses
QtConcurrent::run
so that it is contained in one file. I'm not sure yet why/how it should be ported to strokes. - The selection and difference policies are the same as in
KisScanlineFill
, but the pixel policies were replaced with new tile policies. The tile policy encapsulates the manipulation of data inside a specific tile rect for the needed devices (reference device, mask device, boundary selection device, etc.). It can operate on aligned or unaligned devices. The reference device (the device that is used to check color differences) is what establishes if another device is aligned. The algorithm's tile grid is aligned to the tile grid of the reference device. So, in other words, a device is considered aligned if its tiles are aligned with those of the reference device (although the device's offset may be different). Accessing an aligned device is faster. - I changed the
kis_fill_painter.cc
with a newsetFloodFillAlgorithm
function that allows to select which algorithm to use. It can take as values the sequential scanline fill or the multi-threaded scanline fill enum values. It may be interesting to add a new "auto" enum value that chooses the best one depending on some factors (image size, number of threads, etc.), and maybe make it the default. - I changed the
FillProcessingVisitor
to use the multi-threaded algorithm inKisFillPainter
just to be able to use the algorithm and see the results, so one can use the fill tool with the new floodfill.
Tests
I added tests to the fill painter test file. They are a copy of the ones that use KisScanlineFill
. They pass, which seems to indicate that the results produced by this algorithm are correct.
Benchmarks
I added some benchmarks to compare KisScanlineFill
and KisMultithreadedScanlineFill
. It benchmarks fillColor
and fillSelection
(both when the mask is aligned and unaligned by 32x32px with respect to the reference device). Each benchmark loads a 256x256 image with some curved lines that resemble a labyrinth, and keeps re-scaling it iteratively until the size is greater than 12000px. For each iteration it performs the fill with the multithreaded and non-multithreaded algorithms.
Following is a chart I made by running the benchmarks 3 times and taking the mean. My conclusion is that I don't know how much exactly the multithreading improves the performance. It appears to improve it a bit on large images/areas to fill. But I think that what is clear is that using aligned devices greatly improves it, since the algorithm can then just move the random accessor to the top left corner of the tile, then get the raw data pointer, just once per tile (instead of per scanline for example), then the raw pointer is used to access the data directly. This algorithm is twice as fast if the selection is aligned. So in the KisFillPainter
(which the fill tool uses, for example) the selection that is created should be moved to align it to the reference device. That way the fill tool wouldn't suffer from the selection alignment problem (I still have to check what happens when a boundary selection is used).
Next is a graph comparing the fillColor
when using an external device, both aligned and unaligned:
The next graph compares the algorithms in the case where a boundary selection is used. In the legend there are some codes to shorten the text: A
means aligned and U
means unaligned; the first letter refers to the mask device and the second one to the boundary selection device. In practice, when using the fillSelection[WithBoundary]
functions the mask device is created expressedly most of the time to perform the floodfill, so we can align it to the reference device to gain performance, so I don't think that the "unaligned mask" versions are that important, although they are still faster than the KisScanlineFill::fillSelection
.
There is no benchmark for the fillSelectionUntil*
and clearNonZeroComponent
since they are basically the same as some of the previous. For fillContiguousGroup
I didn't know how to write the benchmark (mostly how the images should look like), but I have to say that I did't see any improvements.
Formalities Checklist
-
I confirmed this builds. -
I confirmed Krita ran and the relevant functions work. -
I tested the relevant unit tests and can confirm they are not broken. (If not possible, don't hesitate to ask for help!) -
I made sure my commits build individually and have good descriptions as per KDE guidelines. -
I made sure my code conforms to the standards set in the HACKING file. -
I can confirm the code is licensed and attributed appropriately, and that unattributed code is mine, as per KDE Licensing Policy.