Last Comment Bug 460706 - Crash [@ little2_updatePosition] with xmlhttprequest and large xml file
: Crash [@ little2_updatePosition] with xmlhttprequest and large xml file
Status: VERIFIED FIXED
: [sg:critical?] post 1.8-branch
: crash, testcase, verified1.9.0.7, verified1.9.0.9, verified1.9.1
Product: Core
Classification: Components
Component: XML
: Trunk
: x86 Windows XP
: P2 critical (vote)
: mozilla1.9.1b3
Assigned To: Blake Kaplan (:mrbkap)
: xml
:
: 444322 482293
:
  Show dependency treegraph
 
Reported: 2008-10-19 17:44 PDT by Martijn Wargers [:mw22] (QA - IRC nick: mw22)
Modified: 2009-10-26 18:38 PDT (History)
19 users (show)
jonas: blocking1.9.1+
dveditz: blocking1.9.0.7+
dveditz: blocking1.9.0.9+
dveditz: wanted1.9.0.x+
samuel.sidler+old: wanted1.8.1.x-
mrbkap: in‑testsuite+
See Also:
Crash Signature:
[@ little2_updatePosition]


Attachments
testcase (74.76 KB, application/xhtml+xml)
2008-10-19 17:44 PDT, Martijn Wargers [:mw22] (QA - IRC nick: mw22)
no flags Details
Fix [Checkin: Comment 19] (1.02 KB, patch)
2008-12-15 15:08 PST, Blake Kaplan (:mrbkap)
jonas: review+
jonas: superreview+
dveditz: approval1.9.0.7+
Details | Diff | Splinter Review

Summon comment box

Description Martijn Wargers [:mw22] (QA - IRC nick: mw22) 2008-10-19 17:44:47 PDT
Created attachment 343855 [details]
testcase

See testcase, which crashes current trunk build and Firefox 3 on load for me.
This might be related to bug 314660, although the testcase doesn't crash in Firefox 2.

http://crash-stats.mozilla.com/report/index/cc37c568-9e3c-11dd-835e-001cc45a2ce4?p=1
0  	xul.dll  	little2_updatePosition  	 parser/expat/lib/xmltok_impl.c:1745
1 	xul.dll 	MOZ_XML_ResumeParser 	parser/expat/lib/xmlparse.c:1787
2 	xul.dll 	xul.dll@0x2ab720
Comment 1 Jonas Sicking (:sicking) 2008-12-02 14:56:31 PST
Based on testcase this looks very scary, quite possibly exploitable.

Peterv: Can you look into this since this looks like an expat bug?
Comment 2 Jonas Sicking (:sicking) 2008-12-02 14:57:24 PST
Peter, see above comment.
Comment 3 Johnny Stenback (:jst, jst@mozilla.com) 2008-12-03 15:32:29 PST
On second thought, we won't block the release on this. Peter, please do work on this as soon as you have time to though.
Comment 4 Peter Van der Beken [:peterv] 2008-12-04 06:00:00 PST
Can't reproduce on OS X, might be Windows only?
Comment 5 Peter Van der Beken [:peterv] 2008-12-04 06:02:07 PST
Actually, nevermind. Need to make sure to load as XML.
Comment 6 Peter Van der Beken [:peterv] 2008-12-04 07:10:40 PST
Doesn't look like an Expat bug.

Looks like the first script will block the parser, we'll evaluate the script (javascript:// URL), unblock the parser and post a ContinueInterruptedParsing event. Then we'll evaluate the next script which does a synchronous XMLHttpRequest, which spins the event loop and we'll process the ContinueInterruptedParsing event. This will make us parse the original document to the end, but the Expat code that's on the stack (waiting for the script evaluation to end) doesn't know that. It seems odd that we post a ContinueInterruptedParsing event when synchronously evaluating a script (javascript:// URL).

Jonas or Boris, feel free to chime in :-).
Comment 7 Jonas Sicking (:sicking) 2008-12-04 12:35:46 PST
Wait, the javascript url isn't synchronously being evaluated, is it? It's when we execute the second <script> that we will re-enter expat in unholy ways. I'm not sure why the javascript:// thing is needed at all.

I think that mrbkaps patch in bug 444322 might actually fix this bug, unless there is further interaction with the javascript url.
Comment 8 Peter Van der Beken [:peterv] 2008-12-04 12:58:21 PST
(In reply to comment #7)
> Wait, the javascript url isn't synchronously being evaluated, is it?

You're right, need some more debugging.
Comment 9 Peter Van der Beken [:peterv] 2008-12-05 07:11:22 PST
We get a ScriptAvailable for the javascript:// URL with aResult==NS_ERROR_DOM_RETVAL_UNDEFINED. ScriptAvailable ublocks the parser and posts a ContinueInterruptedParsing event. Not 100% sure yet about the next part, looks like we get more data at that point (there's a nsInputStreamReadyEvent in the queue), so we parse some more. That makes us evaluate the inline script, which spins the event queue (XMLHttpRequest.send) and we run the ContinueInterruptedParsing event which was after the nsInputStreamReadyEvent in the queue.

It seems wrong to enable the parser and then post the event. Making the event enable the parser if the script load failed does fix this bug, but I need to think a bit more about it. I wonder if we could hit the same problem with a successful script load, but in that case we probably need to enable the parser from ScriptAvailable (for document.write?). Maybe we should block the parser in ScriptEvaluated then and let the event unblock there too?
Comment 10 Peter Van der Beken [:peterv] 2008-12-05 07:19:25 PST
Hmm, we seem to get NS_ERROR_DOM_RETVAL_UNDEFINED because "WARNING: No principal to execute JS with: file /Users/peterv/source/mozilla/central/mozilla/dom/src/jsurl/nsJSProtocolHandler.cpp, line 181"
Comment 11 Peter Van der Beken [:peterv] 2008-12-05 07:30:09 PST
Replacing the javascript:// URL with data:text/javascript, still gets very confused.
Comment 12 Jonas Sicking (:sicking) 2008-12-05 08:52:19 PST
Note that the fix for bug 444322 will make us not parse anything while waiting for the XHR to finish loading. Maybe that is enough here?
Comment 13 Peter Van der Beken [:peterv] 2008-12-05 09:44:32 PST
Yeah, looks like this will be fixed by bug 444322.
Comment 14 Martijn Wargers [:mw22] (QA - IRC nick: mw22) 2008-12-12 09:21:40 PST
I still crash with the testcase, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20081212 Minefield/3.2a1pre
But I had to test the testcase locally, online it caused a hang for me.
Comment 15 Jonas Sicking (:sicking) 2008-12-12 14:37:57 PST
Remarking as blocking then :(

Martijn, you were using a build with bug 444322 fixed right?
Comment 16 Martijn Wargers [:mw22] (QA - IRC nick: mw22) 2008-12-12 15:39:03 PST
I tested with the 20081212 build, I think it has the fix for 444322 in, afaict. Anyway, I'll try to retest tomorrow.
Comment 17 Martijn Wargers [:mw22] (QA - IRC nick: mw22) 2008-12-15 04:27:25 PST
Also crashes, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20081214 Minefield/3.2a1pre
Comment 18 Blake Kaplan (:mrbkap) 2008-12-15 15:08:23 PST
Created attachment 353118 [details] [review]
Fix
[Checkin: Comment 19]

I had thought this was impossible, but it wasn't. What happens is that we process the first <script src />, then the content sink does a ContinueInterruptedParsingAsync. This returns to the event loop, hoping to get the newly posted event. But we've already received an OnDataAvailable, so we enter the parser, which does a synchronous XMLHttpRequest. This processes events, one of which is the "continue your interrupted parsing" event, re-entering the parser, and we lose.
Comment 19 Blake Kaplan (:mrbkap) 2008-12-19 15:53:58 PST
http://hg.mozilla.org/mozilla-central/rev/8159ae6a8046
Comment 20 Blake Kaplan (:mrbkap) 2008-12-19 17:39:00 PST
http://hg.mozilla.org/mozilla-central/rev/9990da98d7b7 contains a crashtest, too.
Comment 21 Daniel Veditz 2008-12-19 17:49:45 PST
Given the now-public crashtest we'd better make sure we land the fix on 3.0.x sooner rather than later.
Comment 22 Joe Drew (:JOEDREW!) 2008-12-19 20:33:24 PST
At least some part of this was backed out by Serge.
Comment 23 Serge Gautherie (:sgautherie) 2008-12-19 21:02:39 PST
(In reply to comment #20)
> http://hg.mozilla.org/mozilla-central/rev/9990da98d7b7 contains a crashtest,
> too.

This test crashes randomly, at least on Linux and Windows...

Backed out:
http://hg.mozilla.org/mozilla-central/rev/8a97d8909df3
http://hg.mozilla.org/mozilla-central/rev/050ae62b7d9d

See for example:
{
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229740533.1229744280.21191.gz
Linux mozilla-central moz2-linux-slave08 dep unit test on 2008/12/19 18:35:33

REFTEST TEST-PASS | file:///builds/slave/trunk_linux-8/build/parser/htmlparser/tests/crashtests/423373-1.html | (LOAD ONLY)
command timed out: 1200 seconds without output, killing pid 15188
}
Comment 24 Brandon Sterne (:bsterne) 2009-01-07 11:35:25 PST
We placed this on our Top Security Bugs list... can we get a new patch as soon as possible?  Thanks!
Comment 25 Daniel Veditz 2009-01-07 15:48:17 PST
Not going to get a baked patch in time for 3.0.6, aim for next time.
Comment 26 Samuel Sidler (old account; do not CC) 2009-01-08 10:18:25 PST
I'm assuming we don't want this on 1.8.1 since it doesn't crash there. Blake, correct me if I'm wrong.
Comment 27 Samuel Sidler (old account; do not CC) 2009-02-02 07:50:08 PST
Blake: Where are you on a patch for this?
Comment 28 Blake Kaplan (:mrbkap) 2009-02-02 11:54:46 PST
I have no clue why the crashtest crashes on tinderbox. Nobody else can seem to make it crash, either. IMO we should just land this patch on all of the branches, since it does fix the crash that I *was* able to reproduce.
Comment 29 Samuel Sidler (old account; do not CC) 2009-02-02 15:57:22 PST
Let's try re-landing it on mozilla-central during a quiet period and seeing if it sticks this time. If not, we need to either disable that test and find a better one. If this stick on m-c, we'll approve it for 1.9.0.7, given the relative safety of the patch.
Comment 30 Blake Kaplan (:mrbkap) 2009-02-04 15:20:40 PST
I tried again. It did not stick.
Comment 31 Daniel Veditz 2009-02-04 15:35:10 PST
I think we've got to push this out until we figure out the test situation. When you've tried reproducing the crash (with a fixed build) do you run all the tests before it or just this one test? It could be the previous tests are creating some bad state that this test is merely exposing.
Comment 32 Blake Kaplan (:mrbkap) 2009-02-04 18:55:55 PST
Today I spent a couple of hours running all of the tests. I didn't crash in my test, but I did find bug 476975. Still no clue what's going on with my test.
Comment 33 Blake Kaplan (:mrbkap) 2009-02-12 15:58:09 PST
Comment on attachment 353118 [details] [review]
Fix
[Checkin: Comment 19]

This is causing bug 478277 on the 1.9.0 branch. It has been baking on trunk for a while with no ill effects. I think we should get the patch in and deal with the test stupidity later.
Comment 34 Daniel Veditz 2009-02-12 16:14:14 PST
mrbkap wants this on the respin shortlist, nom-ing for 1.9.0.7 should that come to pass.
Comment 35 Damon Sicore (:damons) 2009-02-17 10:50:41 PST
Are we just waiting on approval here?
Comment 36 Blake Kaplan (:mrbkap) 2009-02-17 12:15:04 PST
Pretty much (I'm asking for explicit approval for the 1.9.1 branch because of the confusion with the crashtest). This bug has been FIXED (as far as I know) for a while, but it's been in-testsuite? because the crashtest apparently crashes on tinderbox and nowhere else.
Comment 37 Daniel Veditz 2009-02-17 16:55:30 PST
mrbkap: you get your re-spin ride-along wish. Please land on the 1.8 branch and add the fixed1.9.0.8 keyword, and on the GECKO190_20090217_RELBRANCH and add fixed1.9.0.7
Comment 38 Daniel Veditz 2009-02-17 16:59:27 PST
I meant 1.9.0 branch (CVS HEAD) with the fixed1.9.0.8 keyword, of course
Comment 39 Daniel Veditz 2009-02-17 17:01:03 PST
Comment on attachment 353118 [details] [review]
Fix
[Checkin: Comment 19]

Approved for 1.9.0.7, a=dveditz for release-drivers
Comment 40 Serge Gautherie (:sgautherie) 2009-02-17 17:01:21 PST
(In reply to comment #20)
> http://hg.mozilla.org/mozilla-central/rev/9990da98d7b7 contains a crashtest,
> too.

Could you attach this test patch here, ftr.

(In reply to comment #23)
> This test crashes randomly, at least on Linux and Windows...

Fwiw, I wondered about:
does this test patch crash without the code patch ?
Is the fix not enough ? Is it wrong ?
Is the test wrong ? Does it trigger some other bug/crash ?
Comment 41 Blake Kaplan (:mrbkap) 2009-02-17 17:10:44 PST
(In reply to comment #40)
> Could you attach this test patch here, ftr.

Will do, shortly (it's just the testcase in this bug in crashtest form).

> does this test patch crash without the code patch ?

Yes.

> Is the fix not enough ? Is it wrong ?

That would be the million dollar question. I haven't been able to reproduce the crash that the tinderboxes have been seeing, and the fix here is correct as far as it goes so it's really hard to answer definitively.

> Is the test wrong ? Does it trigger some other bug/crash ?

This is my suspicion, but until tinderbox cna report the stack for its crash, we can't answer it.
Comment 42 Blake Kaplan (:mrbkap) 2009-02-17 17:11:52 PST
Fix checked in everywhere. I'm going to file a new bug on getting the crashtest into the tree.
Comment 43 Blake Kaplan (:mrbkap) 2009-02-17 17:15:18 PST
(In reply to comment #41)
> (In reply to comment #40)
> > Could you attach this test patch here, ftr.
> 
> Will do, shortly (it's just the testcase in this bug in crashtest form).

Bug 478978 contains the testcase.
Comment 44 Blake Kaplan (:mrbkap) 2009-02-17 17:16:06 PST
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/e8dcf87ee2b9
Comment 45 Al Billings [:abillings] 2009-02-19 11:51:29 PST
Crashes 1.9.0.6 cleanly on Windows XP. On the 1.9.0 nightly from last night (since I'm waiting for Windows 1.9.0.7 bits), it takes my CPU to 98% and sits there for more than five minutes. 

There is no crash though. Does this count as a "fix" since it doesn't crash?

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7pre) Gecko/2009021705 GranParadiso/3.0.7pre (.NET CLR 3.5.30729)
Comment 46 Martijn Wargers [:mw22] (QA - IRC nick: mw22) 2009-02-20 12:20:59 PST
Yeah, I've noticed the hang too, I filed bug 460706 for it.
This bug was indeed about the crash, so this indeed counts as a fix, so marking verified in the corresponding areas.
Comment 47 Al Billings [:abillings] 2009-02-20 13:50:48 PST
I let it run for a couple of hours too. It pegged my CPU and just sat there the whole time.
Comment 48 Martijn Wargers [:mw22] (QA - IRC nick: mw22) 2009-02-20 13:58:16 PST
(In reply to comment #46)
> Yeah, I've noticed the hang too, I filed bug 460706 for it.

Sorry, I meant I filed bug 479499 for it.
Comment 49 Blake Kaplan (:mrbkap) 2009-03-02 16:22:29 PST
I relanded the crashtest since bug 479499 is now fixed.
Comment 50 Rob Campbell [:rc] (robcee) 2009-03-09 12:15:28 PDT
we've found a regression caused by this bug (verified by bc).

regression range was February 17-18th and is in 1.9.1.

Followed a separate bug 482293 on the regression.
Comment 51 Rob Campbell [:rc] (robcee) 2009-03-09 12:19:58 PDT
sorry for the spamming, meant to say that the regression is in 1.9.1, 1.9.0 and trunk.
Comment 52 John J. Barton 2009-03-17 11:41:01 PDT
(In reply to comment #1)
> Based on testcase this looks very scary, quite possibly exploitable.

How can this be exploited? I don't see how the original crash or fix is security related.  Rather the problem is basically threading related (as is 482293).
Comment 53 Blake Kaplan (:mrbkap) 2009-03-17 15:54:50 PDT
(In reply to comment #52)
> How can this be exploited? I don't see how the original crash or fix is
> security related.  Rather the problem is basically threading related (as is
> 482293).

The concern is that since we're re-entering expat, we could end up stomping over free'd memory when we unwind.
Comment 54 Aakash Desai [:aakashd] 2009-07-23 14:13:41 PDT
No crash, but I do receive the crash on 1.9.1:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2pre) Gecko/20090723 Shiretoko/3.5.2pre (.NET CLR 3.5.30729) ID:20090723050841
Comment 55 Aakash Desai [:aakashd] 2009-07-23 15:38:18 PDT
Sigh, thanks for the heads up, ss. The second "crash" is supposed to be "hang"

Note You need to log in before you can comment on or make changes to this bug.