QXmlStreamReader: Raise error on unexpected tokens

QXmlStreamReader accepted multiple DOCTYPE elements containing DTD fragments, both in the XML prolog and in the XML body. Well-formed but invalid XML files, with multiple DTD fragments in prolog and body combined with recursive entity expansions, have caused infinite loops in QXmlStreamReader.

This patch implements a token check in QXmlStreamReader. A stream is allowed to start with an XML prolog. StartDocument and DOCTYPE elements are only allowed in this prolog, which may also contain ProcessingInstruction and Comment elements. As soon as anything else is seen, the prolog ends. After that, the prolog-specific elements are treated as unexpected. Furthermore, the prolog can contain at most one DOCTYPE element.
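The rules above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the code added to QXmlStreamReader: the `Token` enum, the `PrologChecker` class, and the `isAccepted()` and `selfTest()` helpers are hypothetical names, and the token kinds only loosely mirror QXmlStreamReader::TokenType. It implements just the rules stated in this message.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical token kinds, loosely mirroring QXmlStreamReader::TokenType.
enum class Token {
    StartDocument, Dtd, Comment, ProcessingInstruction,
    StartElement, Characters, EndElement, EndDocument
};

// Hypothetical checker; a sketch of the described rules, not Qt's code.
class PrologChecker
{
public:
    // Returns true if the token is expected at this point of the stream.
    bool accept(Token token)
    {
        switch (token) {
        case Token::Comment:
        case Token::ProcessingInstruction:
            // Allowed in prolog and body; they do not end the prolog.
            return true;
        case Token::StartDocument:
            // Prolog-specific: unexpected once the prolog has ended.
            return m_inProlog;
        case Token::Dtd:
            // Prolog-specific, and at most one DOCTYPE is allowed.
            if (!m_inProlog || m_seenDtd)
                return false;
            m_seenDtd = true;
            return true;
        default:
            // Anything else ends the prolog.
            m_inProlog = false;
            return true;
        }
    }

private:
    bool m_inProlog = true;
    bool m_seenDtd = false;
};

// Feeds a token sequence to a fresh checker; stops at the first
// unexpected token, as a reader would when it raises an error.
bool isAccepted(std::initializer_list<Token> tokens)
{
    PrologChecker checker;
    for (Token token : tokens) {
        if (!checker.accept(token))
            return false;
    }
    return true;
}

// Usage example: one legitimate stream and two rejected ones.
void selfTest()
{
    assert(isAccepted({Token::StartDocument, Token::Dtd, Token::StartElement}));
    assert(!isAccepted({Token::Dtd, Token::Dtd}));          // second DOCTYPE
    assert(!isAccepted({Token::StartElement, Token::Dtd})); // DOCTYPE in body
}
```

Keeping the state to two flags (inside the prolog, DOCTYPE already seen) is enough to express every rule listed above, so the check costs constant time per token.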

Update the documentation to reflect the new behavior. Add an autotest that checks the new error cases are correctly detected, and no error is raised for legitimate input.

The original OSS-Fuzz files (see bug reports) are not included in this patch for file size reasons. They have been tested manually. Each of them has more than one DOCTYPE element, causing infinite loops in recursive entity expansions. The newly implemented functionality detects those invalid DTD fragments. By raising an error, it aborts stream reading before an infinite loop occurs.

Thanks to OSS-Fuzz for finding this.

Fixes: QTBUG-92113
Fixes: QTBUG-95188
Pick-to: 6.6 6.5 6.2 5.15
Change-Id: I0a082b9188b2eee50b396c4d5b1c9e1fd237bbdd
Reviewed-by: Volker Hilsheimer
(cherry picked from commit c4301be7)

  • asturmlechner 2023-07-27: Backport commit equivalent to upstream's CVE-2023-38197-qtbase-5.15.diff; the tests have a different structure.

