Last Comment Bug 472776 - Crash [@ UnhookTextRunFromFrames] [@ ClearAllTextRunReferences] with bidi
: Crash [@ UnhookTextRunFromFrames] [@ ClearAllTextRunReferences] with bidi
Status: VERIFIED FIXED
: [sg:critical] post 1.8-branch
: crash, regression, testcase, verified1.9.0.11, verified1.9.1
Product: Core
Classification: Components
Component: Layout: Text
: Trunk
: All All
: P2 critical (vote)
: mozilla1.9.2a1
Assigned To: Robert O'Callahan (:roc) (Mozilla Corporation)
: layout.fonts-and-text
:
:
: 377438 467150 490410
  Show dependency treegraph
 
Reported: 2009-01-08 18:20 PST by Jesse Ruderman
Modified: 2009-11-24 01:12 PST (History)
13 users (show)
roc: blocking1.9.1+
dveditz: blocking1.9.0.11+
dveditz: wanted1.9.0.x+
samuel.sidler+old: wanted1.8.1.x-
roc: in‑testsuite+
See Also:
Crash Signature:
[@ UnhookTextRunFromFrames] [@ ClearAllTextRunReferences]


Attachments
testcase (causes shutdown crash) (396 bytes, text/html)
2009-01-08 18:20 PST, Jesse Ruderman
no flags Details
fix (2.96 KB, patch)
2009-02-04 20:16 PST, Robert O'Callahan (:roc) (Mozilla Corporation)
smontagu: review+
dveditz: approval1.9.0.11+
Details | Diff | Splinter Review
fix for 1.9.0.x (fixed bitrot in crashtests.list) (2.98 KB, patch)
2009-05-02 16:57 PDT, Daniel Holbert [:dholbert]
no flags Details | Diff | Splinter Review

Summon comment box

Description Jesse Ruderman 2009-01-08 18:20:01 PST
Created attachment 356114 [details]
testcase (causes shutdown crash)

Steps to reproduce:
1. Load the testcase.
2. Quit Firefox.

Result: crash [@ UnhookTextRunFromFrames] or [@ ClearAllTextRunReferences].  If MallocScribble is enabled, the crash involves dereferencing 5555567d, but without MallocScribble, a random address gets called.
Comment 1 Daniel Holbert [:dholbert] 2009-01-12 14:17:51 PST
I can reproduce the ClearAllTextRunReferences shutdown-crash in my Linux mozilla-central debug build from this morning.  OS --> All.

The crash is at line 328 in nsTextFrameThebes.cpp, on the call to aFrame->GetType(), quoted here:
  328     NS_ASSERTION(aFrame->GetType() == nsGkAtoms::textFrame,
  329                  "Bad frame");

If I run "print *aFrame" in gdb after the crash, I see that all of aFrame's member-data pointers are bad (0x5a5a5a5a).  firebot tells me that that pattern signifies "jemalloc allocated but unused memory (MALLOC_FILL) - or possibly dead land (deleted memory)".

FWIW, I can't reproduce the shutdown-crash in today's optimized nightly build.  (Or at least I can't tell that it's crashing -- Firefox closes normally, and I get no errors printed to stderr or stdout.)
Comment 2 Robert O'Callahan (:roc) (Mozilla Corporation) 2009-01-13 18:20:06 PST
That assertion was added specifically to catch stale text frames here. Without the assertion, the code in that method will usually silently exit without causing any problems.
Comment 3 Jonathan Kew 2009-01-14 02:12:16 PST
"Usually" not causing problems isn't very reassuring! Actually, it looks like the crash can happen after merely closing the window, it's not just a shutdown thing. And in a release build, without the assertion, there's a risk that we'll be dereferencing something stale elsewhere in the function, so we're still liable to crash.

I'm suspicious that this may be a result of the fix to bug 470418. That patch redefined the TEXT_INCOMING_ARABICCHAR flag to avoid conflicting with TEXT_IN_CACHE, which was most definitely a Bad Thing. However, it now conflicts with TEXT_IN_TEXTRUN_USER_DATA. :( This means that it may affect the behavior of nsContinuingTextFrame::Destroy(), at least: it looks like it could cause ClearTextRun() to be called in cases where otherwise it wouldn't have been. Could that be damaging things?

I haven't yet tested enough to confirm whether this is really the root of the problem, but it looks bad. Part of the trouble is that these flag bits are defined in a variety of places, so it's not immediately obvious if we're inadvertently reusing bits for multiple purposes.
Comment 4 Robert O'Callahan (:roc) (Mozilla Corporation) 2009-01-14 02:37:48 PST
"Usually not causing problems" is actually worse than "always causes problems", since it's harder to debug, which is why I added the assertion to ensure it "always causes problems" :-).

In gfxFont.h are the rules that are supposed to govern who owns what flags:
        CACHE_TEXT_FLAGS    = 0xF0000000,
        USER_TEXT_FLAGS     = 0x0FFF0000,
        PLATFORM_TEXT_FLAGS = 0x0000F000,
        TEXTRUN_TEXT_FLAGS  = 0x00000FFF

> However, it now conflicts with TEXT_IN_TEXTRUN_USER_DATA.

No, because TEXT_IN_TEXTRUN_USER_DATA is a frame state bit, and TEXT_INCOMING_ARABICCHAR is a textrun flag bit.
Comment 5 Jonathan Kew 2009-01-14 02:46:17 PST
Ah, ok... sorry for my confusion!
Comment 6 Jesse Ruderman 2009-01-23 15:37:36 PST
Nominating for blocking1.9.1 -- this looks especially easy to exploit, and it's a regression with a fairly simple testcase.
Comment 7 chris hofmann 2009-01-23 17:17:01 PST
looks like we get about 50-60 crash reports a day that look something like the stack below and a few other variations:

0  	xul.dll  	UnhookTextRunFromFrames  	layout/generic/nsTextFrameThebes.cpp:341
1 	xul.dll 	FrameTextRunCache::NotifyExpired 	layout/generic/nsTextFrameThebes.cpp:381
2 	xul.dll 	nsExpirationTracker<gfxTextRun,3>::AgeOneGeneration 	obj-firefox/dist/include/xpcom/nsExpirationTracker.h:210
3 	xul.dll 	nsExpirationTracker<gfxTextRun,3>::TimerCallback 	obj-firefox/dist/include/xpcom/nsExpirationTracker.h:299
4 	xul.dll 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:420
5 	xul.dll 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:512
6 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:510
7 	xul.dll 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:170
8 	nspr4.dll 	PR_GetEnv 	
9 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:87
10 	firefox.exe 	firefox.exe@0x2197 	
11 	kernel32.dll 	BaseProcessStart

no comments, but I'm wondering if people just don't know what to say since they might be in the middle of a shutdown.
Comment 8 Robert O'Callahan (:roc) (Mozilla Corporation) 2009-02-04 20:16:19 PST
Created attachment 360649 [details] [review]
fix

The problem was when we constructed a textrun for a frame that already has one. We set TEXT_IN_TEXTRUN_USER_DATA on to frame and then call ClearTextRun on it before we set its mTextRun. Because it already has a textrun, we clear out that textrun and also clear the TEXT_IN_TEXTRUN_USER_DATA flag we just set... The fix is simple, just set TEXT_IN_TEXTRUN_USER_DATA after the ClearTextRun call.
Comment 9 Olli Pettay [:smaug] 2009-02-20 12:07:10 PST
Is there still something to do here, or could the patch land?
Comment 10 Robert O'Callahan (:roc) (Mozilla Corporation) 2009-02-20 12:18:18 PST
It's ready to land. I was going to land it myself, but you can if you like.
Comment 11 Robert O'Callahan (:roc) (Mozilla Corporation) 2009-02-24 00:50:27 PST
Pushed http://hg.mozilla.org/mozilla-central/rev/ef2d14abf75e
Comment 12 Robert O'Callahan (:roc) (Mozilla Corporation) 2009-02-26 13:09:27 PST
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/9bc41efd9298
Comment 13 Henrik Skupin (:whimboo) 2009-04-23 06:47:10 PDT
Verified fixed on trunk and 1.9.1 with builds on OS X and Windows:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090423 Minefield/3.6a1pre ID:20090423040020

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b4pre) Gecko/20090423 Shiretoko/3.5b4pre ID:20090423040926
Comment 14 Daniel Holbert [:dholbert] 2009-04-28 21:38:25 PDT
It look like bug 490410 may be a 1.9.0.x version of this bug.
Comment 15 Daniel Holbert [:dholbert] 2009-04-28 21:47:49 PDT
This bug appears to have originally regressed between these two mozilla-central nightlies:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20081216 Minefield/3.2a1pre
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20081217 Minefield/3.2a1pre

--> looks like it was a regression from Bug 467150
Comment 16 Daniel Holbert [:dholbert] 2009-04-30 15:12:36 PDT
Regarding the approval1.9.0.11 request on this bug's patch -- I've confirmed that the patch fixes bug 490410, a 1.9.0.x crash-regression (as noted in bug 490410 comment 10).
Comment 17 Daniel Veditz 2009-05-01 10:49:15 PDT
Comment on attachment 360649 [details] [review]
fix

Approved for 1.9.0.11, a=dveditz for release-drivers
Comment 18 Daniel Holbert [:dholbert] 2009-05-02 16:55:23 PDT
Patch landed on CVS for 1.9.0.11:

Checking in layout/generic/nsTextFrameThebes.cpp;
/cvsroot/mozilla/layout/generic/nsTextFrameThebes.cpp,v  <--  nsTextFrameThebes.cpp
new revision: 3.190; previous revision: 3.189
done
RCS file: /cvsroot/mozilla/layout/generic/crashtests/472776-1.html,v
done
Checking in layout/generic/crashtests/472776-1.html;
/cvsroot/mozilla/layout/generic/crashtests/472776-1.html,v  <--  472776-1.html
initial revision: 1.1
done
Checking in layout/generic/crashtests/crashtests.list;
/cvsroot/mozilla/layout/generic/crashtests/crashtests.list,v  <--  crashtests.list
new revision: 1.124; previous revision: 1.123
done
Comment 19 Daniel Holbert [:dholbert] 2009-05-02 16:57:27 PDT
Created attachment 375493 [details] [review]
fix for 1.9.0.x (fixed bitrot in crashtests.list)

Here's the fix as landed on 1.9.0.x -- I just fixed the bitrot in crashtests.list.
Comment 20 Al Billings [:abillings] 2009-05-11 14:53:02 PDT
Is this a debug only crash? I cannot crash with 1.9.0.10 on Ubuntu or OS X.
Comment 21 Daniel Holbert [:dholbert] 2009-05-11 15:03:10 PDT
Yes, I believe it was debug-only on mozilla-central.

However, I'm not sure anyone's seen this crash in 1.9.0.10 debug builds.  (I haven't -- see bug 490410 comment 11.)  This bug's testcase may require some additional not-currently-backported-to-1.9.0.x code in order to crash.

In any case, though, we took this bug's patch to fix bug 490410's testcase, not to fix this bug's testcase.
Comment 22 Al Billings [:abillings] 2009-05-12 11:28:14 PDT
I've verified that bug 490410 is fixed in 1.9.0.11. I'm going to mark this as verified for 1.9.0.11.
Comment 23 John Daggett (:jtd) 2009-11-23 23:10:24 PST
Still occurring, sample crash report from 3.5.5:

http://crash-stats.mozilla.com/report/index/0b603ce5-63b2-4545-b5a9-537ac2091123

ClearAllTextRunReferences layout/generic/nsTextFrameThebes.cpp:57f71400f4cf|331|0x0
UnhookTextRunFromFrames layout/generic/nsTextFrameThebes.cpp:57f71400f4cf|355|0xd
FrameTextRunCache::NotifyExpired(gfxTextRun *) layout/generic/nsTextFrameThebes.cpp:57f71400f4cf|389|0xd
nsExpirationTracker<gfxTextRun,3>::AgeOneGeneration()|hg:hg.mozilla.org/releases/mozilla-1.9.1:obj-firefox/dist/include/xpcom/nsExpirationTracker.h:57f71400f4cf|210|0xa
nsExpirationTracker<gfxTextRun,3>::TimerCallback(nsITimer *,void *)|hg:hg.mozilla.org/releases/mozilla-1.9.1:obj-firefox/dist/include/xpcom/nsExpirationTracker.h:57f71400f4cf|299|0x8
nsTimerImpl::Fire()|hg:hg.mozilla.org/releases/mozilla-1.9.1:xpcom/threads/nsTimerImpl.cpp:57f71400f4cf|420|0x6
nsTimerEvent::Run()|hg:hg.mozilla.org/releases/mozilla-1.9.1:xpcom/threads/nsTimerImpl.cpp:57f71400f4cf|512|0x4
nsThread::ProcessNextEvent(int,int *)|hg:hg.mozilla.org/releases/mozilla-1.9.1:xpcom/threads/nsThread.cpp:57f71400f4cf|521|0x13
Comment 24 Henrik Skupin (:whimboo) 2009-11-24 01:12:09 PST
(In reply to comment #23)
> Still occurring, sample crash report from 3.5.5:
> 
> http://crash-stats.mozilla.com/report/index/0b603ce5-63b2-4545-b5a9-537ac2091123

We have closed this bug a long time ago. I think it would be better to handle it in a new bug.

Note You need to log in before you can comment on or make changes to this bug.