1. 23 Mar, 2021 2 commits
  2. 22 Mar, 2021 2 commits
  3. 21 Mar, 2021 12 commits
  4. 20 Mar, 2021 6 commits
    • Implement matching ExtractorFilter against the new document nodes · 76fd9a3d
      Volker Krause authored
      This is more explicit than the old method, taking the filter scope into
      account, and no longer contains type-specific code.
    • Add IATA BCBP extractor · 7287f0ac
      Volker Krause authored
      This was previously done manually in a few places; the new extractor
      engine will apply it to any text-containing node.
    • Move extractor scripts to a different folder · 0247cdbb
      Volker Krause authored
      We need their current location for new parts of the new extractor engine.
    • Add abstract base classes for extractors · 517230ac
      Volker Krause authored
      The current script extractors will be rebased onto this, while also
      allowing built-in C++ extractors next to them using the same interface.
      This will replace the current generic/custom extractor split.
    • Add document model for the new extractor engine · 537968ac
      Volker Krause authored
      This is essentially a tree of variants representing nested documents.
      Unlike the previous approach, nested documents are no longer an
      afterthought but will now also be properly accessible by tooling. Using
      variants and MIME types as well as delegating type-specific functionality
      also makes this type-independent and easier to extend.
    • Deprecate the type-specific ExtractorEngine input interface · c7711fbe
      Volker Krause authored
      Going forward, there will only be two input methods, one taking raw data
      and one taking a variant with already decoded data. This is also continuing
      the work of phasing out ExtractorInput enums in favor of full MIME types.
      
      Nothing has really changed on the inside yet; this is mostly transitional
      scaffolding.
  5. 19 Mar, 2021 1 commit
  6. 18 Mar, 2021 2 commits
  7. 17 Mar, 2021 2 commits
  8. 16 Mar, 2021 1 commit
    • Make the extractor filter match scope explicit · 716464b8
      Volker Krause authored
      That is, which parts of the document need to match relative to the part
      being considered for extraction. So far this has been implicit, based on
      the types of the matching and extracted parts. Specifying it explicitly
      will allow us to remove further implicit type-specific logic from the
      core engine, while giving us more flexibility.
      
      This information isn't actually used yet; this is only a small preparation
      for a larger upcoming rework of the extractor engine.
  9. 15 Mar, 2021 2 commits
  10. 14 Mar, 2021 1 commit
  11. 13 Mar, 2021 1 commit
  12. 11 Mar, 2021 1 commit
  13. 09 Mar, 2021 1 commit
  14. 05 Mar, 2021 1 commit
  15. 02 Mar, 2021 1 commit
  16. 01 Mar, 2021 1 commit
  17. 24 Feb, 2021 3 commits