Last Comment Bug 522030 - Can still crash due to weak refs in the id table
: Can still crash due to weak refs in the id table
Status: RESOLVED FIXED
: [sg:investigate]
: crash, fixed1.9.0.15, fixed1.9.0.16
Product: Core
Classification: Components
Component: XUL
: Trunk
: All All
: P1 critical (vote)
: mozilla1.9.3a1
Assigned To: Boris Zbarsky (:bz)
: xptoolkit.widgets
:
:
: 521969 489925
  Show dependency treegraph
 
Reported: 2009-10-13 07:30 PDT by Boris Zbarsky (:bz)
Modified: 2009-11-09 18:59 PST (History)
16 users (show)
mbeltzner: blocking1.9.2+
samuel.sidler+old: blocking1.9.0.15+
samuel.sidler+old: blocking1.9.0.16+
bzbarsky: in‑testsuite?
See Also:
Crash Signature:
  ---
  ---
  ---
  ---
  ---
  ---
  ---
  ---
  ---
  ---
  ---
  ---
  beta1-fixed
  .4+
  .4-fixed


Attachments
Proposed fix (4.91 KB, patch)
2009-10-13 13:36 PDT, Boris Zbarsky (:bz)
jst: review+
Details | Diff | Splinter Review
Said crashtest (826 bytes, application/xhtml+xml)
2009-10-13 16:32 PDT, Boris Zbarsky (:bz)
no flags Details
1.9.1 fix (5.62 KB, patch)
2009-10-13 22:32 PDT, Boris Zbarsky (:bz)
jst: review+
samuel.sidler+old: approval1.9.1.4+
samuel.sidler+old: approval1.9.1.6+
Details | Diff | Splinter Review
1.9.0 fix (6.49 KB, patch)
2009-10-13 22:44 PDT, Boris Zbarsky (:bz)
jst: review+
samuel.sidler+old: approval1.9.0.15+
samuel.sidler+old: approval1.9.0.16+
Details | Diff | Splinter Review

Summon comment box

Description Boris Zbarsky (:bz) 2009-10-13 07:30:01 PDT
I hate XUL.  Fix + test coming up, though I still need to try-server this.
Comment 1 Boris Zbarsky (:bz) 2009-10-13 07:36:55 PDT
Er, found one issue I'm not sure how to deal with.   There's an AddElementForID caller in the template code that seems to add an element to the id table.  Can this ever be an anonymous element?  Does that element ever actually get added to the DOM?  If so, where?  As far as I can tell, it's added to the id table before it's added to the DOM, which means we can't tell whether it's anonymous...
Comment 2 Samuel Sidler (old account; do not CC) 2009-10-13 10:53:50 PDT
We're going to re-spin both Firefox 3.0.15 and 3.5.4 for this issue.
Comment 3 Mike Beltzner [:beltzner] 2009-10-13 12:54:32 PDT
Ugh.

Here's the deal: according to bz there are two fixes for this: a branch-safe one and a more-correct one.

The branch-safe fix will come with some small perf costs (~1% on various metrics, though not Ts) and potential for memory leaks (this worries bz the most, and consequently, me) but has the benefit of not absolutely shattering add-on compatibility.

The more-correct fix comes without perf or memory leak costs, but absolutely shatters add-on compatibility and parts of the Firefox UI (which bz said he can fix).

We've decided to start with the more pressing branch-safe fix, which will be checked into mozilla-central and mozilla-1.9.2 to get test coverage and measure the performance impact. This means holding the beta a little longer, so marking this as a P1 blocker. Based on the performance impact, we'll make a decision about what to do for mozilla-1.9.2; in order to preserve add-on compatibility, we might have to suck it up.
Comment 4 Boris Zbarsky (:bz) 2009-10-13 13:36:55 PDT
Created attachment 406072 [details] [review]
Proposed fix

This applies on top of a backout of the patch for bug 489925.
Comment 5 Johnny Stenback (:jst, jst@mozilla.com) 2009-10-13 13:44:45 PDT
Comment on attachment 406072 [details] [review]
Proposed fix

Looks good! r=jst
Comment 6 Boris Zbarsky (:bz) 2009-10-13 13:53:52 PDT
Pushed:

http://hg.mozilla.org/mozilla-central/rev/b40bf2ef14a7
http://hg.mozilla.org/mozilla-central/rev/b5f738553e38

(backout of the original fix for bug 489925 plus this fix) and 

http://hg.mozilla.org/releases/mozilla-1.9.2/rev/9ef0bceaf6b2
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/96207da1b7c1

Will wait for those to cycle and give us perf and leak data, then request branch approvals.  Will also file followup bug for the real trunk (and maybe 1.9.2) fix.
Comment 7 Boris Zbarsky (:bz) 2009-10-13 16:31:35 PDT
This is most emphatically NOT in-testsuite+.  It can't be until we open up the bug (at which point a modified version of the testcase in bug 489925 should be checked in as a crashtest.
Comment 8 Boris Zbarsky (:bz) 2009-10-13 16:32:08 PDT
Created attachment 406133 [details]
Said crashtest
Comment 10 Boris Zbarsky (:bz) 2009-10-13 22:32:51 PDT
Created attachment 406177 [details] [review]
1.9.1 fix

Please pay particular attention to the ID_NOT_IN_DOCUMENT stuff.  I made changes to Traverse(), ~nsIdentifierMapEntry, and AddIdContent to handle it; any other obvious changes that need to happen?

Again, this applies on top of a backout of bug 489925 on 1.9.1 branch.
Comment 11 Boris Zbarsky (:bz) 2009-10-13 22:44:51 PDT
Created attachment 406179 [details] [review]
1.9.0 fix

Lots of merging here... Please look over carefully?
Comment 12 Mike Beltzner [:beltzner] 2009-10-13 23:41:08 PDT
No discerned perf impact or leaking on mozilla-central or mozilla-1.9.2. Boris, how should we be looking for leaks? I'm copying in Leakcat who might be able to help there ...

In the meantime, I think we should consider this as fixed for mozilla-1.9.2 until we get evidence that it's caused bigger problems.
Comment 13 Boris Zbarsky (:bz) 2009-10-13 23:50:40 PDT
As far as leaks, I had two specific worries:

1)  Document leaks through shutdown.  Just browsing around a bunch with leak logging enabled will catch these.  I suspect there won't be many, if any, given the cycle collection hookups this patch ended up taking advantage of.

2)  Element leaks for document lifetime.  I don't know that we have a sane way to measure this; it would only bite situations where the elements are anonymous, so not likely to hit web pages.  Extensions that leak due to this category of leaks would have crashed before, so probably a net win...
Comment 14 Carsten Book [:Tomcat] 2009-10-14 07:30:50 PDT
(In reply to comment #13)
> As far as leaks, I had two specific worries:
> 

So for 1)  i will run a test over the Top 10000 Topsites to check for leaks and for 2) a run with the Top Extensions installed
Comment 15 Boris Zbarsky (:bz) 2009-10-14 08:42:09 PDT
That sounds great.  Thanks!

We probably want to do that with all of 1.9.2, 1.9.1, and 1.9.0, since the code is significantly different and the cycle collector behavior is significantly different between them.  :(
Comment 16 Carsten Book [:Tomcat] 2009-10-14 09:21:46 PDT
(In reply to comment #15)
> That sounds great.  Thanks!
> 
> We probably want to do that with all of 1.9.2, 1.9.1, and 1.9.0, since the code
> is significantly different and the cycle collector behavior is significantly
> different between them.  :(

yeah that's no problem, can cover all this branches with the vm's
Comment 17 Samuel Sidler (old account; do not CC) 2009-10-14 09:59:02 PDT
Johnny's not actually CCed to this bug...
Comment 18 Johnny Stenback (:jst, jst@mozilla.com) 2009-10-14 14:17:51 PDT
Comment on attachment 406177 [details] [review]
1.9.1 fix

Looks good.
Comment 19 Samuel Sidler (old account; do not CC) 2009-10-14 14:30:25 PDT
Comment on attachment 406179 [details] [review]
1.9.0 fix

Approved for 1.9.0.15. a=ss for release-drivers
Comment 20 Samuel Sidler (old account; do not CC) 2009-10-14 14:30:28 PDT
Comment on attachment 406177 [details] [review]
1.9.1 fix

Approved for 1.9.1.4. a=ss for release-drivers
Comment 22 Boris Zbarsky (:bz) 2009-10-14 15:14:46 PDT
Checked into CVS:

Checking in content/html/document/src/nsHTMLDocument.cpp;
/cvsroot/mozilla/content/html/document/src/nsHTMLDocument.cpp,v  <--  nsHTMLDocument.cpp
new revision: 3.794; previous revision: 3.793
done
Checking in content/html/document/src/nsHTMLDocument.h;
/cvsroot/mozilla/content/html/document/src/nsHTMLDocument.h,v  <--  nsHTMLDocument.h
new revision: 3.232; previous revision: 3.231
done
Checking in content/html/document/src/nsHTMLDocument.cpp;
/cvsroot/mozilla/content/html/document/src/nsHTMLDocument.cpp,v  <--  nsHTMLDocument.cpp
new revision: 3.795; previous revision: 3.794

for 1.9.0.16 and:

Checking in content/html/document/src/nsHTMLDocument.cpp;
/cvsroot/mozilla/content/html/document/src/nsHTMLDocument.cpp,v  <--  nsHTMLDocument.cpp
new revision: 3.793.2.1; previous revision: 3.793
done
Checking in content/html/document/src/nsHTMLDocument.cpp;
/cvsroot/mozilla/content/html/document/src/nsHTMLDocument.cpp,v  <--  nsHTMLDocument.cpp
new revision: 3.793.2.1; previous revision: 3.793
done

for 1.9.0.15
Comment 23 Boris Zbarsky (:bz) 2009-10-14 15:15:24 PDT
Er, the last part of the 1.9.0.15 cvs thing should have been:

Checking in content/html/document/src/nsHTMLDocument.h;
/cvsroot/mozilla/content/html/document/src/nsHTMLDocument.h,v  <--  nsHTMLDocument.h
new revision: 3.231.12.1; previous revision: 3.231
done
Checking in content/html/document/src/nsHTMLDocument.cpp;
/cvsroot/mozilla/content/html/document/src/nsHTMLDocument.cpp,v  <--  nsHTMLDocument.cpp
new revision: 3.793.2.2; previous revision: 3.793.2.1
done
Comment 24 Henrik Skupin (:whimboo) 2009-10-15 06:17:42 PDT
Boris, I wonder how I can verify this fix. I tried your attached crash test with builds from 10/13 but I cannot get those to crash.
Comment 25 Boris Zbarsky (:bz) 2009-10-15 08:05:51 PDT
The attached test doesn't crash in those builds due to the fix for bug 489925 being present in them.  Let me see if I can write a testcase which will crash them.
Comment 26 Boris Zbarsky (:bz) 2009-10-15 10:55:08 PDT
OK, I haven't succeeded at it yet, because XUL does weird things on subtree removals...  I _think_ it's still possible to trigger such crashes, but I'm not sure its worth spending a few days to figure out the exact right codepath to hit.
Comment 27 Al Billings [:abillings] 2009-10-16 16:33:55 PDT
This doesn't crash with build 3 of 1.9.0.15 (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.15) Gecko/2009101601 Firefox/3.0.15 (.NET CLR 3.5.30729)) or build 1 of it or 1.9.0.14. 

Is there a means to verify this bug for 1.9.0 and 1.9.1?
Comment 28 Boris Zbarsky (:bz) 2009-10-16 17:21:55 PDT
Al, I don't have a crash testcase for you, sorry.

One could try to verify buy fuzzing the testcase posted in this bug and especially a XUL (not XHTML) version of that testcase?

Note You need to log in before you can comment on or make changes to this bug.