Don't use the actual char bounding box; instead, use (ascent - halfLeading, descent + halfLeading), which takes line-height into account.
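As a rough illustration of the computation described above, here is a minimal sketch. It is not the code from this patch: `LineExtent` and `lineExtent` are hypothetical names, and it assumes y-down coordinates measured from the baseline (so ascent is negative and descent is positive) and the CSS definition of half-leading as half the difference between line-height and the font's content height.

```cpp
// Sketch only: LineExtent and lineExtent are hypothetical names, not Krita API.
// Assumption: y-down coordinates relative to the baseline, so `ascent` is
// negative (above the baseline) and `descent` is positive (below it).
struct LineExtent {
    double top;    // upper edge of the line box contribution
    double bottom; // lower edge of the line box contribution
};

LineExtent lineExtent(double ascent, double descent, double lineHeight)
{
    // Height covered by the font metrics alone.
    const double contentHeight = descent - ascent;
    // CSS-style half-leading: the leftover space split evenly above and below.
    // It becomes negative when lineHeight is smaller than contentHeight.
    const double halfLeading = (lineHeight - contentHeight) / 2.0;
    return {ascent - halfLeading, descent + halfLeading};
}
```

With ascent = -8, descent = 2 and line-height = 12, this gives an extent from -9 to 3. With line-height = 0 the extent collapses to zero height, so that case is worth handling explicitly.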
I've compared the text to what Firefox outputs with the HTML reproduction and the positioning of lines is very close with this patch. There is a subpixel difference that can be explained by the "precision" offset and some pixel rounding in Firefox.
The first line in Krita with this patch is several pixels lower than in Inkscape, but I suspect Inkscape is wrong.
There is an issue with this patch: if line-height is set to 0, the first word (and only the first word) is missing from the render. Not even its debug bounding box shows up. I haven't looked into this yet. Update: fixed.