Commit cbd3f8dd authored by Volker Krause

Deduplicate images only per page in PDF documents

We need to trigger extractors per page even if a barcode repeats, e.g.
for multi-leg IATA BCBPs.

The deduplication logic was apparently always broken and has now been
fixed by the new PDF image reference type. That fix in turn broke things
here, because deduplication became too aggressive.
parent cdbb9797
Pipeline #266836 passed with stage
in 5 minutes and 33 seconds
@@ -85,9 +85,9 @@ void PdfDocumentProcessor::expandNode(ExtractorDocumentNode &node, const Extract
 {
     const auto doc = node.content<PdfDocument*>();
-    m_imageIds.clear();
     for (int i = 0; i < doc->pageCount(); ++i) {
         const auto page = doc->page(i);
+        m_imageIds.clear();
         for (int j = 0; j < page.imageCount(); ++j) {
             auto img = page.image(j);
......
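
The effect of moving the `clear()` call can be illustrated with a small standalone sketch. This is not the KItinerary code: `Page`, `countProcessedImages` and the string image ids are hypothetical stand-ins, and a `std::set` is assumed in place of `m_imageIds`, whose actual type is not shown in the diff. It only contrasts resetting the seen-set once per document with resetting it once per page.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a PDF page: just the ids of the images on it.
using Page = std::vector<std::string>;

// Counts how many images would be passed on to extractors.
// perPage == true  : the seen-set is reset for every page (behavior after this commit).
// perPage == false : the seen-set is reset once per document (behavior before).
int countProcessedImages(const std::vector<Page> &pages, bool perPage)
{
    std::set<std::string> seen; // assumed stand-in for m_imageIds
    int processed = 0;
    for (const auto &page : pages) {
        if (perPage) {
            seen.clear();
        }
        for (const auto &id : page) {
            // insert() reports via .second whether the id was new in the current scope
            if (!seen.insert(id).second) {
                continue; // duplicate, skip
            }
            ++processed;
        }
    }
    return processed;
}
```

With two pages that each carry the same barcode image, the per-document variant would process the barcode only once, while the per-page variant processes it once on each page, which is what a multi-leg boarding pass needs.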