1. 23 Mar, 2021 1 commit
  2. 20 Mar, 2021 1 commit
  3. 19 Mar, 2021 1 commit
  4. 16 Mar, 2021 1 commit
    • Make the extractor filter match scope explicit · 716464b8
      Volker Krause authored
      This specifies which parts of the document need to match, relative to the
      part being considered for extraction. So far this has been implicit, based
      on the types of the matching and extracted parts. Specifying it explicitly
      will allow us to remove more implicit type-specific logic from the core
      engine, while giving us even more flexibility.
      This information isn't actually used yet; this is only a small preparation
      for a larger upcoming rework of the extractor engine.
  5. 15 Mar, 2021 1 commit
    • Unify extractor filter field name key · 333834f5
      Volker Krause authored
      Now that we no longer need the key name to determine the filter type, we
      can use the same name everywhere. This is not only less error-prone and
      simplifies the code, it is also a first step towards making the extractor
      engine core more type-agnostic.
  6. 17 Feb, 2021 1 commit
  7. 28 Mar, 2020 1 commit
  8. 05 Dec, 2019 1 commit
  9. 05 Oct, 2019 1 commit
  10. 04 Oct, 2019 1 commit
    • Don't trigger text-based extractors if we have a PDF alternative · 4bfedd54
      Volker Krause authored
      The text-based alternatives exist only for DB and SNCF, so that we can
      check in unit test data. Change their trigger filter to something special
      for the tests. As a result, we only run the PDF variant on real data,
      avoiding the extra work and any possible merging issues.
  11. 25 Nov, 2018 1 commit
  12. 23 Sep, 2018 1 commit
  13. 01 May, 2018 1 commit
  14. 17 Mar, 2018 1 commit
  15. 22 Dec, 2017 1 commit
    • Load extractor scripts from the file system too · c23b2bfc
      Volker Krause authored
      Simplifies testing them, as you don't need to recompile anymore. Also,
      rename the extractor folder from 'rules' to 'extractors', a better match
      now that they contain scripts rather than declarative extraction rules.
  16. 20 Dec, 2017 1 commit
    • Replace the declarative extractor definitions with JavaScript · 50b50322
      Volker Krause authored
      The declarative approach worked technically, but turned out to be fairly
      hard to work with. JavaScript is a bit more verbose, but easier to work
      with as it doesn't enforce a very specific way of modeling the extractors.
      Being able to do printf debugging inside the extractor code rather than
      reviewing a full extractor rule execution trace to find mistakes is also
      convenient.
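
The point about printf debugging in script-based extractors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `extractReservation`, the input format, and the returned field name are hypothetical and are not taken from the project's actual extractor API.

```javascript
// Hypothetical sketch, not the project's real extractor interface.
// A script-style extractor is an ordinary function, so intermediate
// values can be logged while it is being developed.
function extractReservation(text) {
    // Assumed input format, invented for this example.
    const match = text.match(/Booking reference:\s*([A-Z0-9]{6})/);

    // printf-style debugging: inspect the intermediate match directly.
    console.log("reference match:", match);

    if (!match) {
        return null; // nothing recognized in this text
    }
    // Hypothetical result shape with a single field.
    return { reservationNumber: match[1] };
}
```

With a declarative rule the same inspection would require reading an execution trace of the rule engine; here the log statement sits next to the step it checks.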