Bugzilla@Mozilla – Bug 619021
crash when opening a page with several marquee elements [@ nsStyleSet::GetContext ]
Last modified: 2011-05-24 23:37:07 PDT
Summon comment box
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:2.0b7) Gecko/20100101 Firefox/4.0b7 Build Identifier: Firefox 4.0b7 Opening a specific HTML document consisting mainly of marquee elements causes current Firefox 4.0b7 to crash with an invalid read of size 4 with a high address (0xf0dea813) in libxul.so. Reproducible: Always Steps to Reproduce: 1.$ echo "<foo> <marquee> <marquee> <marquee> <marquee> <marquee> <marquee> <foo> <marquee> <foo> <marquee> <foo> <object> <marquee> <foo> <marquee> <marquee> <foo> <foo> <marquee> <marquee> <foo> <marquee> <marquee> <marquee> <foo> <marquee> <foo> <foo> <marquee> <marquee>" > marquee.html 2. 3. firefox marquee.html Actual Results: Firefox crashes with a segfault. Expected Results: Firefox remains running. This looks like a potentially dangerous crash to me. 4.0 stable release is probably not too far off, and many people are already using 4.0 beta series, so reporting this as a security issue to be on the safe side.
Created attachment 497459 [details] page triggering the error
Created attachment 497461 [details] page triggering the error More correct file for triggering the error. Also the reproduction instructions above seem to be using the wrong one. Crash state is at http://crash-stats.mozilla.com/report/index/bp-bf85ff2b-8b14-4041-b84e-7d1bb2101214
Created attachment 498640 [details] crash report (mac debug + minidump_stackwalk)
Comment 0 and comment 3 have the crash address as 0xf0dea813 (frame poison address). But comment 2 has the crash address as 0xf363a813, so this might be exploitable.
regression range http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2f1347cd7c44&tochange=b14428284d51 And I can't narrow that down any further by building myself because it seems like the tree from that time doesn't build with whatever version of pango I have on my Linux system now.
Fwiw, on 10.6 builds that old don't build either. But nothing in that range looks obviously relevant...
Comment 5 was on Windows. Linux seems to have a different regression range...
regression range on linux http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d9d4bf676f65&tochange=35d78a340566
And linux debug builds before that range crash too...
This needs an owner, over to tn for now. Please speak up if this needs to be owned by someone else.
I should probably take this instead.
Created attachment 502404 [details] valgrind output
It looks like the poisoned frame pointer is coming from nsFrame::DoGetParentStyleContextFrame (on an nsBlockFrame with a :-moz-anonymous-block style context), where we should have gone (based on examining the frame we were calling it for) through the not-out-of-flow but special-frame codepath: *aProviderFrame = GetIBSpecialSiblingForAnonymousBlock(this); if (*aProviderFrame) { return NS_OK; } So that implies that we have a dangling pointer in the nsIFrame::IBSplitSpecialPrevSibling frame property.
Based on the valgrind data, I'm guessing that means we somehow got a next-in-flow as an IBSplitSpecialPrevSibling, which really shouldn't happen; that should always point to the first-in-flow, I thought.
Or, perhaps more likely, we called DeleteNextInFlowChild on something that was supposed to be empty, but it wasn't empty because it erroneously reported NS_FRAME_IS_COMPLETE.
(In reply to comment #16) > Or, perhaps more likely, we called DeleteNextInFlowChild on something that was > supposed to be empty, but it wasn't empty because it erroneously reported > NS_FRAME_IS_COMPLETE. Yes. I added assertions for this, and they fire on this testcase (but not in normal browsing of a few sites).
We should have those assertions permanently in the tree....
Yes, it's in my patch queue. We're returning NS_FRAME_IS_COMPLETE incorrectly because of nsFrame::IsFrameTreeTooDeep. Joy. Maybe if I hack initialization of nsHTMLReflowState::mReflowDepth for the reflow root case to match what we do in the reflow-from-root case, it will fix this...
(In reply to comment #19) > Maybe if I hack initialization of nsHTMLReflowState::mReflowDepth for the > reflow root case to match what we do in the reflow-from-root case, it will fix > this... It doesn't. I guess I'll have to investigate how we end up with the frame being split.
Reflow depth for a given frame seems to vary quite a bit. Given that, I think the better solution is to just report incomplete status when appropriate.
Created attachment 502663 [details] [review] patch 1: add assertions
Created attachment 502665 [details] [review] patch 2: warn when frame tree is too deep I think a console warning in this case is appropriate; it would have helped me to know that we were entering this code. But we don't need the old DEBUG_kipp blocks that don't compile anymore.
Created attachment 502666 [details] [review] patch 3: fix
+ /** + * Assert that the frame tree rooted at |aSubtreeRoot| is empty, i.e., + * that it contains no first-in-flows. + */ I don't think that AssertFrameTreeIsEmpty is the best name for this. Apart from text frames, we also support split image frames, so this can return true even when there's real content in the frame tree. + // Should we also assert about text frames not mapping any text, or is + // that necessarily fixed up in the next continuation when the + // previous continuation maps all the text? It should be fixed in the next continuation, but there's no harm in asserting it.
In part 3, maybe we should pass &aReflowStatus as a parameter to IsFrameTreeTooDeep and have it set the status?
Created attachment 502733 [details] [review] patch 1: add assertions renames the assertion function, and adds assertions based on nsIFrame::GetOffsets for text frames
Created attachment 502734 [details] [review] patch 3: fix good idea
It turns out the start == end assertion (added in the second version of patch 1) fires on five tests: REFTEST TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/test/build/reftest/tests/layout/generic/crashtests/472774-1.html | assertion count 3 is more than expected 0 assertions REFTEST TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/test/build/reftest/tests/layout/generic/crashtests/478185-1.html | assertion count 4917 is more than expected 0 assertions REFTEST TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/23604-1.html | assertion count 2 is more than expected 0 assertions REFTEST TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/372553-1.html | assertion count 3 is more than expected 0 assertions REFTEST TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/457398-1.html | assertion count 2 is more than expected 0 assertions
...which all involve :first-letter.
Created attachment 502899 [details] [review] patch 1: add assertions work around :first-letter issues. I tried some things to get good values, but I couldn't find a way that worked.
http://hg.mozilla.org/mozilla-central/rev/dfa73f7b1acf http://hg.mozilla.org/mozilla-central/rev/67cfc95b4b90 http://hg.mozilla.org/mozilla-central/rev/e2f7319148ce Need to remember to add the test later (though I think Jesse goes through and does those occasionally).
Takes down 3.6.x too.
I merged patches 2 and 3 to 1.9.2 and tested that they do in fact fix attachment 497461 [details] (I got a crash without them and it went away with them). I merged patch 2 simply because it made merging (and reading) patch 3 easier. Patch 2 is debug-only.
Created attachment 508002 [details] [review] patch 2 for 1.9.2
Created attachment 508003 [details] [review] patch 3 for 1.9.2
Some of the bugs in the regression range appear to have been fixed in 1.9.1 as well. Does this affect that branch? Or can we narrow the regression range to a specific patch so we can see if that's one of the ones that was backported?
Comment on attachment 508003 [details] [review] patch 3 for 1.9.2 Approved for 1.9.2.15, a=dveditz for release-drivers
fwiw 3.5.18pre doesn't crash on the testcase.
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/ac05f0d960ff http://hg.mozilla.org/releases/mozilla-1.9.2/rev/c1beedb67abd
I don't know why it doesn't crash on 1.9.1; I'd have thought the underlying problem would be present going back pretty much forever. It's possible it's something specific to the testcase. The patch should be safe to take on 1.9.1 if we want to take it there.
The fact that the regression ranges are all different on win opt, linux opt, and linux debug and none of them contains anything suspicious probably means we are just getting lucky when it doesn't crash.
Created attachment 508839 [details] [review] patch 2 for 1.9.1
Created attachment 508840 [details] [review] patch 3 for 1.9.1
Comment on attachment 508839 [details] [review] patch 2 for 1.9.1 Approved for 1.9.1.18, a=dveditz for release-drivers
Comment on attachment 508840 [details] [review] patch 3 for 1.9.1 Approved for 1.9.1.18, a=dveditz for release-drivers
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/8e040c1af890 http://hg.mozilla.org/releases/mozilla-1.9.1/rev/8dd77caa5068
I cannot reproduce the crash using the testcase in comment 1 with the Firefox 4 beta 10 build (pre-fix) on Windows 7 or OS X 10.6. Is there a better STR?
The testcase in comment 2 worked pretty reliably for me, if my memory is correct. (Were you really using the one in comment 1 instead?)
I was using comment 2. What OS have you been using?
I was on 64-bit Linux. tn also saw it on Windows (comment 5 and comment 7), and Jesse did on Mac (comment 3).
The "3.6.15" we're releasing today does not fix this bug, the release containing this bug fix has been renamed to "3.6.16" and the bugzilla flags will be updated to reflect that soon. Today's release is a re-release of 3.6.14 plus a fix for a bug that prevented many Java applets from starting up.
Verified fixed using attached test page in 1.9.2 nightly (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.16pre) Gecko/20110317 Namoroka/3.6.16pre (.NET CLR 3.5.30729)) and crash in 1.9.2.15. Test page does NOT crash in 1.9.1.17. Have we actually seen this crash in 1.9.1 or do we have STR there?