autotests/extractordocumentnodetest.cpp · dea29a3241a0c6943710036c442eb5003e159836 · PIM / KItinerary

Create plain text nodes for HTML and PDF content · dea29a32

Volker Krause authored Mar 26, 2021

This fixes extractors that relied on the implicit type conversion, which
the old system special-cased for a few types.

Counter-intuitively, this has practically no performance impact despite
doing the conversion unconditionally: if the parent type is extracted
from, the text conversion comes almost for free (i.e. the full PDF or
HTML parsing is already done), and if the parent doesn't produce output,
content-based matching for plain text extractors will always trigger the
type conversion anyway.
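
As a rough illustration of the idea described above, the following sketch
attaches a plain text child node to every HTML or PDF node unconditionally.
It does not use the KItinerary API: Node, hasTextRepresentation(),
stripMarkup() and expand() are hypothetical names made up for this example,
the HTML-to-text conversion is deliberately naive, and PDF content is
assumed to already hold the extracted text. It needs C++17.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a document node; NOT the KItinerary class.
struct Node {
    std::string mimeType;
    std::string content;
    std::vector<Node> children;
};

// Assumption for this sketch: only HTML and PDF nodes get a text child.
bool hasTextRepresentation(const Node &node)
{
    return node.mimeType == "text/html" || node.mimeType == "application/pdf";
}

// Naive markup stripping: drops everything between '<' and '>'.
// A real implementation would reuse the already parsed document instead.
std::string stripMarkup(const std::string &html)
{
    std::string out;
    bool inTag = false;
    for (const char c : html) {
        if (c == '<') {
            inTag = true;
        } else if (c == '>') {
            inTag = false;
        } else if (!inTag) {
            out += c;
        }
    }
    return out;
}

// Unconditionally create a text/plain child for HTML and PDF nodes, so that
// plain text extractors can match on it without an implicit type conversion.
void expand(Node &node)
{
    assert(!node.mimeType.empty()); // precondition: every node is typed
    if (!hasTextRepresentation(node)) {
        return;
    }
    Node text;
    text.mimeType = "text/plain";
    // Sketch assumption: PDF content is already the extracted text.
    text.content = node.mimeType == "text/html" ? stripMarkup(node.content)
                                                : node.content;
    node.children.push_back(std::move(text));
}
```

With this model, calling expand() on a "text/html" node whose content is
"<p>Flight <b>AB123</b></p>" leaves it with a single "text/plain" child,
while a node of any other type (e.g. "image/png") stays without children.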
