    Create plain text nodes for HTML and PDF content · dea29a32
    Volker Krause authored
    This makes extractors work that relied on the implicit type conversion
    that the old system had special-cased for a few types.
    
    Counter-intuitively, this has practically no performance impact despite
    doing the conversion unconditionally. If the parent type is extracted
    from, the text conversion comes almost for free (i.e. the full PDF or
    HTML parsing is done already). If the parent doesn't produce output,
    content-based matching for plain text extractors would always trigger
    the type conversion anyway.
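
    The idea can be illustrated with a minimal sketch. This is not the
    project's actual code: the `Node` class and the `expand_html` and
    `matching_text_nodes` helpers are hypothetical names chosen for
    illustration, and Python's standard `html.parser` stands in for the
    real HTML/PDF parsing. It shows a plain text child node being created
    unconditionally while the parent is parsed, so that content-based
    matching for plain text extractors can find it without a special-cased
    type conversion.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the non-empty text chunks of an HTML document during parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


@dataclass
class Node:
    # Hypothetical document node, assumed for this sketch only.
    mime_type: str
    content: str
    children: list = field(default_factory=list)


def expand_html(html: str) -> Node:
    """Parse the HTML once and always attach a plain text child node."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    node = Node("text/html", html)
    # The text falls out of the parse that happens anyway, so creating the
    # child unconditionally adds little extra work.
    node.children.append(Node("text/plain", "\n".join(collector.chunks)))
    return node


def matching_text_nodes(node: Node, needle: str) -> list:
    """Content-based matching: collect plain text nodes containing `needle`."""
    hits = []
    if node.mime_type == "text/plain" and needle in node.content:
        hits.append(node)
    for child in node.children:
        hits.extend(matching_text_nodes(child, needle))
    return hits
```

    With this shape, a plain text extractor only ever looks at `text/plain`
    nodes, for example
    `matching_text_nodes(expand_html(html), "Booking reference")`, and no
    longer depends on an implicit conversion from the HTML or PDF parent.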