
[PlainTextExtractor] Cleanup, extend test coverage, fix possible truncation

Several aspects of the PlainTextExtractor were either not covered by tests or were tested incorrectly:

  • 'text/plain' was not included in the coverage tests for the collection
  • Skipping text extraction when ExtractionResult::ExtractPlainText is not requested was not actually tested, although a dedicated test for it exists
  • Counting of empty lines was not verified

Corresponding tests have been added, or existing tests have been fixed.

The line count is now verified and matches the output of, e.g., the wc CLI tool.
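A minimal sketch of the counting rule, assuming the line count is defined the way `wc -l` defines it: the number of newline characters. Under that rule every empty line still counts, because it still ends in a delimiter, while a final line without a trailing newline is not counted. The name `countLines` is hypothetical and not part of the extractor's API.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: counts lines like `wc -l`, i.e. the number of '\n'
// characters in the text. Empty lines are counted since each one still ends
// in '\n'; a last line without a trailing '\n' adds nothing (assumption).
std::size_t countLines(const std::string &text)
{
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), '\n'));
}
```

For example, `printf 'a\n\nb\n' | wc -l` reports 3, and `countLines("a\n\nb\n")` also returns 3, including the empty second line.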

The handling of text files that do not end with a line delimiter has also been fixed: depending on the implementation used, the last character could have been truncated.
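The description does not say which implementation was affected, so the following is only one plausible way this kind of truncation arises: line-reading code that returns each line with its delimiter still attached, followed by code that unconditionally drops the last character. All function names here are hypothetical illustrations, not the extractor's actual code.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `text` into lines with each '\n' kept, the way
// some line-reading APIs return them. The last element has no '\n' when the
// text does not end with a delimiter. Elements are never empty.
std::vector<std::string> rawLines(const std::string &text)
{
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    while (start < text.size()) {
        const std::string::size_type end = text.find('\n', start);
        // npos as the length means "take the rest of the text".
        const std::string::size_type len =
            (end == std::string::npos) ? std::string::npos : end - start + 1;
        lines.push_back(text.substr(start, len));
        start += lines.back().size();
    }
    return lines;
}

// Buggy: assumes every line ends in '\n' and always drops the last character,
// so a final line without a delimiter loses one character of real content.
std::vector<std::string> stripBuggy(std::vector<std::string> lines)
{
    for (std::string &line : lines) {
        line.pop_back();
    }
    return lines;
}

// Fixed: removes the delimiter only when it is actually present.
std::vector<std::string> stripFixed(std::vector<std::string> lines)
{
    for (std::string &line : lines) {
        if (!line.empty() && line.back() == '\n') {
            line.pop_back();
        }
    }
    return lines;
}
```

With the input `"ab\ncd"`, the buggy variant yields `"ab"` and `"c"`, while the fixed variant yields `"ab"` and `"cd"`; for `"ab\ncd\n"` both variants agree.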
