Clean up Office XML and generic XML handling, improve OPC compliance

There is a fair amount of duplicated code between the generic XML extractor and the Office XML (MS Office 2007+) extractor.

The differences between the two are mostly omissions and implementation deficiencies. For example, the generic XML handling did not cover the CreationDate property (Dublin Core "created").

Notably, the namespace handling in the Office XML extractor only worked with the "typical" namespace prefixes, so it was not fully compliant with XML namespaces or with OPC (Open Packaging Conventions).
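
To illustrate the namespace point, here is a minimal sketch in Python (not the extractor's actual code) of matching elements by namespace URI rather than by prefix. The function name `read_core_properties` and the sample documents are hypothetical; the namespace URIs are the standard Dublin Core and OPC core-properties ones.

```python
import xml.etree.ElementTree as ET

# Namespace URIs identify the vocabulary; the prefix is an arbitrary local alias.
DC_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"
CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"


def read_core_properties(xml_text):
    """Hypothetical helper: look up title and creation date by namespace URI."""
    root = ET.fromstring(xml_text)
    result = {}
    # ElementTree addresses elements as "{namespace-uri}localname",
    # so the lookup does not depend on which prefix the producer chose.
    title = root.find(f"{{{DC_NS}}}title")
    created = root.find(f"{{{DCTERMS_NS}}}created")
    if title is not None and title.text:
        result["title"] = title.text
    if created is not None and created.text:
        result["created"] = created.text
    return result


# Same content, once with the usual prefixes and once with unusual ones.
TYPICAL = (
    f'<cp:coreProperties xmlns:cp="{CP_NS}" xmlns:dc="{DC_NS}" '
    f'xmlns:dcterms="{DCTERMS_NS}">'
    "<dc:title>Report</dc:title>"
    "<dcterms:created>2020-01-02T03:04:05Z</dcterms:created>"
    "</cp:coreProperties>"
)
UNUSUAL = (
    f'<p:coreProperties xmlns:p="{CP_NS}" xmlns:a="{DC_NS}" xmlns:b="{DCTERMS_NS}">'
    "<a:title>Report</a:title>"
    "<b:created>2020-01-02T03:04:05Z</b:created>"
    "</p:coreProperties>"
)

print(read_core_properties(TYPICAL))
print(read_core_properties(UNUSUAL))
```

Both documents yield the same properties, whereas a lookup keyed on the literal strings "dc:title" or "dcterms:created" would only find them in the first.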

Edited by Stefan Brüns