Bugzilla@Mozilla – Bug 571995
Crash [@ BuildTextRunsScanner::BreakSink::SetBreaks(unsigned int, unsigned int, unsigned char*)] with crazy -moz-column testcase
Last modified: 2011-01-27 17:21:21 PST
Summon comment box
Created attachment 451142 [details] testcase (crashes 1.9.1 and 1.9.2 Firefox when loaded)
Created attachment 451143 [details] crash report
We're calling a virtual method on a deleted gfxTextRun.
Created attachment 451301 [details] [review] to illustrate This is not a suggested fix/wallpaper - just to illustrate where the problem starts. In BuildTextRunsScanner::AssignTextRun we call f->ClearTextRun() which deletes mTextRun if it doesn't have the TEXT_IN_CACHE bit: http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrameThebes.cpp#3759 The patch illustrates the deleted text run is still in use by one of the mBreakSinks.
Same crash occurs on 1.9.1 and 1.9.2
Mats can probably fix this.
Mats, do you know when you'll be able to work on this?
Created attachment 455029 [details] Frame dump + stacks The first stack dump is from when the first "REASSIGNING MULTIFLOW ..." warning occurs, note that aTextRun=0x163b680 (red) here. The next stack is when we hit the warning again and this is the same call where we have 0x163b680 in one of the BreakSinks. Also note the "UnhookTextRunFromFrames 0x163b680 (red)" here, I guess from "f->ClearTextRun();"
roc, can you tell me what the MULTIFLOW warning means: http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrameThebes.cpp#1953
It means that we had to wipe out the textrun for a frame, deleting the entire textrun, and the textrun had text accumulated from multiple frames. This is not wrong, it can happen, but I added that warning for myself because if it happens too much we have serious performance problems.
Created attachment 455365 [details] [review] wip Removing the BreakSink that refers to the old textrun and all BreakSinks that follows it, then for each of those BreakSinks, remove it from mLineBreaker and all TextItems that follows. This fixes the crash for me locally (Linux) and pass reftests+crashtests. I've submitted the patch to TryServer.
Comment on attachment 455365 [details] [review] wip This doesn't fix the crash on 1.9.2/1.9.1 though... just moves it to some other place. Removing *all* BreakSinks seems to fix it though... I'll investigate some more...
Created attachment 455617 [details] [review] fix v2 This fixes the crash also on 1.9.2 and 1.9.1. It passed TryServer (m-c) unit tests and reftests/crashtests in local 1.9.2 debug builds on Linux and OSX.
Hmm, it doesn't seem right just to reset the linebreaker. We could be in the middle of finding line breaks. Do you know why a textrun whose text was added to the linebreaker is being replaced?
Created attachment 455900 [details] Frame dumps + stack This a detailed trace starting in BuildTextRuns (first frame dump), to the error condition in AssignTextRun (last frame dump). While looping in BuildTextRuns, calling ScanFrame for each frame, we eventually reach 0x19243f0(pink) where we hit 1388 if (!ContinueTextRunAcrossFrames(mLastFrame, frame)) { 1389 FlushFrames(PR_FALSE, PR_FALSE); http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrameThebes.cpp#1388 (mLastFrame=0x1924158(lime) "\n", frame=0x19243f0(pink) " ") so we call FlushFrames, which creates a new text run (0x181be80) and assign it to the frames in the mapped flow (0x14d71b8 0x1924158). I don't see anything wrong in this per se, other than the old text run still being referenced in the BreakSinks.
Created attachment 455905 [details] [review] fix v3 Don't use ClearTextRun() since it might delete the text run while it's used in BreakSinks/LineBreaker. Delete it later when we flush line breaks instead (in FlushLineBreaks).
This patch is nice and safe, but I'm still worried that we could lose line breaks. The line breaker might record line break opportunities in this textrun that we are just going to throw away. In your log, it looks like the frame 0x14d71b8 where we replace the old textrun with the new textrun is actually zero-length? Maybe that's an underlying problem we should fix. The reason we wipe out the old textrun when a new textrun overlaps it is that we have to keep the invariant that the text ranges mapped by textruns don't overlap. But if the frame has no text then there is no overlap to worry about so there shouldn't be a problem just changing its textrun from one to another.
ETA is July 24th, since mats is out for a bit
Comment on attachment 455905 [details] [review] fix v3 per comment 16 and offline, roc says we need a new patch.
Blocking Final 2.0+
#114 crash for thunderbird 3.1.1, approx same for v3.0.6 a firefox crash mentions "accidentally rsized the font larger with the mouse pad somehow, then tried to resize the window to full screen and crash" bp-269e3e50-3730-42e3-b9e4-379322100709 summary sig change to match BuildTextRunsScanner::BreakSink::SetBreaks(unsigned int, unsigned int, unsigned char*) to crash-stats unrelated? ... free | nsAutoPtr<BuildTextRunsScanner::BreakSink>::`scalar deleting destructor''(unsigned int)
Comment on attachment 451142 [details] testcase (crashes 1.9.1 and 1.9.2 Firefox when loaded) This testcase does not trigger the bug on trunk anymore due to: http://hg.mozilla.org/mozilla-central/rev/c2fa4377a799 It still crashes with that change reverted so replacing the newlines in the testcase with some other character should still crash on trunk...
Created attachment 463024 [details] [review] crashtest.diff (for trunk and branches)
(In reply to comment #16) > ... it looks like the frame 0x14d71b8 where we replace the old textrun > with the new textrun is actually zero-length? Maybe that's an underlying > problem we should fix. I don't think that's a bug in this case, it can happen when a text frame grows to include text from its next continuation during reflow, leaving the continuation temporarily empty until its reflowed. SetLength mentions this case explicitly: http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrameThebes.cpp#6083
Created attachment 463029 [details] [review] fix v4 ClearTextRun + update break sinks to use the new text run -- but only if the old text run is NOT in use by the previous continuation.
As discussed offline, Mats is going to make a new patch here.
Mats, got an update?
dbaron is on a -moz-column crash-killing rampage.
Is that new patch coming soon? (roc, do you remember what was wrong with the old one?)
Mats has been working on the swapDocShells frame-tree-swapping bug. This patch was not the right approach. Basically, the situation in this bug is that we have a text continuation frame C that is empty (maps zero characters) but is associated with textrun A. Textruns get recreated and C is assigned a new textrun B. This causes us to delete textrun A (which is also being used by C's prev-in-flows), but it shouldn't need to; because C has no actual text associated with it at this point, we can just flip it to point to textrun B and no textruns need to be updated. (We do need to ensure that C does not appear in the userdata for textrun A.)
Any updates on a new approach here?
Mats has finally finished the presentation-swapping work and will work on this this week.
Created attachment 480764 [details] [review] fix v5
Intentionally didn't land the tests yet, until we fix branches. http://hg.mozilla.org/mozilla-central/rev/cc8777e97c21
could this landing be connected to the crash volume regression described in https://bugzilla.mozilla.org/show_bug.cgi?id=485003#c11 and below?
I think so. I'm looking at bug 603490 and bug 603510 which shows that the assumptions we made in this patch are wrong. So I'm backing out and reopening this bug instead.
Backout changeset: http://hg.mozilla.org/mozilla-central/rev/9af1a5813a7b
Created attachment 482664 [details] Trace for bug 603490
Created attachment 482665 [details] Trace for bug 603510
roc, how about if we always do a limited ClearTextRun() -- clearing only the tail of continuations starting at the frame we're looking at (f). The crash in this bug comes from UnhookTextRunFromFrames starting at the "first-in-userdata" (or first-in-flow for SIMPLE_FLOW). That would mean a full ClearTextRun() for bug 603490 and bug 603510 (which I think is right) and for this bug we would start at the "first-in-userdata" which is the first frame using the oldTextRun (0x1920b58 in attachment 455900 [details]), then loop until we find 0x14d71b8 (f) and clear from there, the net result being a simple SetTextRun(nsnull) in this case. What do you think?
Sounds good.
Created attachment 482907 [details] [review] fix v6 Add 'aStartContinuation' parameter to the relevant methods; in case it's null these methods should work as before, in case it's non-null they should traverse frames as before but not do anything until we get to 'aStartContinuation', then they should start doing their thing. The only non-null caller is AssignTextRun. If 'aStartContinuation' is non-null it's an error if it's not found in the user data. Still waiting for Try results, but an earlier version with only a minor difference worked fine...
(TryServer unit tests were successful with a couple of known [orange] bugs)
Comment on attachment 482907 [details] [review] fix v6 Probably best to not use default parameters here.
Created attachment 483383 [details] [review] fix v7 Ok, I dropped the default param and added 'nsnull' to callers as needed. I don't think this needs a new review so I'm pushing this in a bit....
http://hg.mozilla.org/mozilla-central/rev/9a1c991eddb0
Created attachment 489213 [details] [review] Patch for branches Accumulated patch for 1.9.2/1.9.1 branches. It contains the fix for this bug and its regressions, bug 605340 and bug 604843, plus crashtests for bug 603510 and bug 603490. (I'm holding the crashtest on this bug)
Comment on attachment 489213 [details] [review] Patch for branches Approved for 1.9.2.13 and 1.9.1.16, a=dveditz for release-drivers
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/650e7b0d6d34 http://hg.mozilla.org/releases/mozilla-1.9.1/rev/bd03fbc2b2fc http://hg.mozilla.org/releases/mozilla-1.9.1/rev/19e3f097936a http://hg.mozilla.org/releases/mozilla-1.9.1/rev/91bf1ee0a8ac http://hg.mozilla.org/releases/mozilla-1.9.1/rev/5cb35072f373 http://hg.mozilla.org/releases/mozilla-1.9.1/rev/6444dcb39b02 http://hg.mozilla.org/releases/mozilla-1.9.2/rev/64e803656a1f http://hg.mozilla.org/releases/mozilla-1.9.2/rev/ee365bf30311 http://hg.mozilla.org/releases/mozilla-1.9.2/rev/0ac503e14b53 http://hg.mozilla.org/releases/mozilla-1.9.2/rev/1d2fa8e53cc3 http://hg.mozilla.org/releases/mozilla-1.9.2/rev/57ce1356af46 http://hg.mozilla.org/releases/mozilla-1.9.2/rev/3956db8d0373
It looks like none of the crashtests make 1.9.0 crash.