Last Comment Bug 650874 - crash [@ ExecuteTree]
: crash [@ ExecuteTree]
Status: RESOLVED FIXED
: [sg:critical?]
: crash, testcase, topcrash, verified1.9.2
Product: Core
Classification: Components
Component: JavaScript Engine
: 1.9.2 Branch
: All All
: -- critical (vote)
: ---
Assigned To: Luke Wagner [:luke]
: general
: http://www.google.de/#sclient=psy&hl=...
: 663666
:
  Show dependency treegraph
 
Reported: 2011-04-18 11:31 PDT by minhdk92
Modified: 2011-07-12 09:09 PDT (History)
26 users (show)
See Also:
Crash Signature:
[@ ExecuteTree]
  -
  unaffected
  -
  unaffected
  -
  unaffected
  ---
  ---
  ---
  ---
  unaffected
  .18+
  .18-fixed
  needed
  wanted


Attachments
testcase (8.16 KB, text/html)
2011-04-19 03:59 PDT, Henrik Skupin (:whimboo)
no flags Details
Minimized testcase (364 bytes, text/html)
2011-04-19 04:34 PDT, Henrik Skupin (:whimboo)
no flags Details
WinDBG stack (58.52 KB, text/plain)
2011-04-19 11:04 PDT, Henrik Skupin (:whimboo)
no flags Details
don't trace slow natives (1.50 KB, patch)
2011-05-13 21:31 PDT, Luke Wagner [:luke]
gal: review+
clegnitto: approval1.9.2.18-
clegnitto: approval1.9.1.20-
Details | Diff | Splinter Review
(maybe) fix actual problem (716 bytes, patch)
2011-05-16 15:20 PDT, Luke Wagner [:luke]
abillings: review+
dvander: review+
jorendorff: review+
clegnitto: approval1.9.2.18+
clegnitto: approval1.9.1.20+
Details | Diff | Splinter Review

Summon comment box

Description minhdk92 2011-04-18 11:31:37 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16

error crash reporter

Reproducible: Always
Comment 1 David Baron [:dbaron] 2011-04-18 12:37:20 PDT
Moving bp-4c8a5444-4a10-4076-b764-36ef22110418 out of the summary line.
Comment 2 Henrik Skupin (:whimboo) 2011-04-19 03:07:51 PDT
This is now our top4 topcrash for Firefox 3.6.16 and happens across all platforms. It started around April 14th and got a high spike yesterday. It's still present in yesterdays nightly build on 3.6 and in our official 3.6.17 beta version. It also affects Firefox 3.5, but 4.0 seems to be unaffected and only shows a high cpu load and memory usage. Crash also happens in a fresh profile.

Firefox 3.6.18pre: bp-f23cf48b-bed7-45a1-a243-e6c372110419

0 		@0x189bfe63 	
1 	libmozjs.dylib 	ExecuteTree 	js/src/jstracer.cpp:6276
2 	libmozjs.dylib 	js_MonitorLoopEdge 	js/src/jstracer.cpp:6749
3 	libmozjs.dylib 	js_Interpret 	js/src/jsops.cpp:904
4 	libmozjs.dylib 	js_Execute 	js/src/jsinterp.cpp:1601
5 	libmozjs.dylib 	JS_EvaluateUCScriptForPrincipals 	js/src/jsapi.cpp:5057
6 	XUL 	nsJSContext::EvaluateString 	dom/base/nsJSEnvironment.cpp:1764


Firefox 3.5.20pre: bp-4711dfda-76b6-4761-b8b7-37e8e2110419

0  	 	@0x426ce69  	
1 		@0xbfffbbc7 	
2 	libmozjs.dylib 	js_MonitorLoopEdge 	js/src/jstracer.cpp:4931
3 	libmozjs.dylib 	js_Interpret 	js/src/jsinterp.cpp:3877
4 	libmozjs.dylib 	js_Execute 	js/src/jsinterp.cpp:1622
5 	libmozjs.dylib 	JS_EvaluateUCScriptForPrincipals 	js/src/jsapi.cpp:5132
6 	XUL 	nsJSContext::EvaluateString 	dom/src/base/nsJSEnvironment.cpp:1753


As you can see there is a slightly different stack between 3.6 and 3.5.
Comment 3 Henrik Skupin (:whimboo) 2011-04-19 03:39:38 PDT
On mozilla-central the crash has been fixed between 10022405 and 10022503.

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ddfecbc93934&tochange=a29d44f196a6

I wonder if this is related to plugins like Flash. I will further investigate.
Comment 4 Henrik Skupin (:whimboo) 2011-04-19 03:59:15 PDT
Created attachment 526949 [details]
testcase
Comment 5 Henrik Skupin (:whimboo) 2011-04-19 04:10:35 PDT
Completely unrelated to plugins. It's somekind CSS and JS related. The patches on bug 542592 or bug 461199 could have probably fixed it on mozilla-central.
Comment 6 Henrik Skupin (:whimboo) 2011-04-19 04:34:23 PDT
Created attachment 526956 [details]
Minimized testcase

This is the minimized version of the testcase. I can't reduce it more without preventing the crash to happen.
Comment 7 chris hofmann 2011-04-19 07:25:59 PDT
yes, something caused an explosion in this signature just after april 13, going from around 300-400 crashes per day to now over 3000 per day.

date     crashes at
         ExecuteTree
20110408 401
20110409 334
20110410 386
20110411 387
20110412 364
20110413 379
20110414 1681
20110415 3268
20110416 3247

most of the new urls see after the explosion are out of wyciwg cache at for the site www.av.by

1428 wyciwyg: http: www.av.by
 378 wyciwyg: http: av.by

http://www.av.by/public/public.php?event=View&public_id=2221144

when not coming from the wyciwyg cache there are also crashes on av.by

 1 http://www.av.by/public/search.php?event=Search&category_parent=545&country_id=1&category_id=0&body_type_id=0&class_id=0&engine_type_all=1&volume_value=0&volume_value_max=0&cylinders_number=0&run_value=0&run_value_max=0&run_unit=-1&transmission_id=0&year_
   1 http://www.av.by/public/public.php?event=View_XXL&public_id=2238995
   1 http://www.av.by/public/public.php?event=View&public_id=2233754
   1 http://www.av.by/public/public.php?event=View&public_id=2110218
   1 http://www.av.by/public/index.php?event=3&category_id=78&show_new=
   1 http://www.av.by/public/index.php?event=3&category_id=1135&country_id=0&order_id=3&show_new=0&page=2
   1 http://www.av.by/
   1 http://forum.av.by/viewtopic.php?p=586579
   1 http://av.by/public/public.php?event=View&public_id=2063090
   1 http://av.by/public/public.php?event=Image_Next&public_id=2240666&image_id=2
   1 http://av.by/news/index.php?event=View&news_id=13276
Comment 8 Christian Legnitto [:LegNeato] 2011-04-19 07:43:36 PDT
Blocking the in progress branch releases. We intend to build tomorrow for a different issue and need this resolved by then
Comment 9 chris hofmann 2011-04-19 07:48:39 PDT
as mentioned this is mostly affecting 3.6.

checking --- ExecuteTree 20110416-crashdata.csv
found in: 3.6.16 3.6.13 4.0 3.6.3 3.6 3.6.8 3.6.6 3.6.15 3.6.10 3.6.12 3.6.14 3.6b2 3.6.4 3.6.11 3.6.2 3.6b5 3.6b1 3.6.7 3.6.3plugin1 4.0b12 3.6b4 3.6b3 3.6.9 4.0b9 4.0b8 4.0b7 4.0b13pre 4.0.1 3.6.17pre
release total-crashes
              ExecuteTree crashes
                         pct.
all     270520  3247    0.0120028
3.6.16  117456  2441    0.0207823
3.6.13  13552   250     0.0184475
4.0     66559   86      0.00129209
3.6.3   4403    72      0.0163525
3.6     3510    69      0.0196581
3.6.8   2862    62      0.0216632
3.6.6   1509    38      0.0251822
3.6.15  4134    37      0.00895017
3.6.10  2576    35      0.013587

I think we did some thing in 4.0 to improve in this area, and I'm guessing unless we figure out what changes to backport the only other option would be some site outreach to figure out content work arounds.
Comment 10 Henrik Skupin (:whimboo) 2011-04-19 07:55:44 PDT
(In reply to comment #9)
> as mentioned this is mostly affecting 3.6.

Can you please check 3.5? It should show similar stats.

> I think we did some thing in 4.0 to improve in this area, and I'm guessing
> unless we figure out what changes to backport the only other option would be
> some site outreach to figure out content work arounds.

Yes, see my comment 3. One of those changes fixed specifically my case with linuxforums.org. I will check if I can reproduce it on av.by.

I also think that we cannot easily compare 3.6.16 with older branch releases. All of them down to 3.6 are crashing, and most users will now have updated to 3.6.16.
Comment 11 Henrik Skupin (:whimboo) 2011-04-19 08:21:03 PDT
(In reply to comment #7)
> 1428 wyciwyg: http: www.av.by
>  378 wyciwyg: http: av.by
> 
> http://www.av.by/public/public.php?event=View&public_id=2221144

Perfect URL which crashes Namoroka immediately. I will work on a reduced testcase.
Comment 12 Henrik Skupin (:whimboo) 2011-04-19 08:31:17 PDT
(In reply to comment #11)
> > http://www.av.by/public/public.php?event=View&public_id=2221144

This site is using the exact same method as linuxforums.org:

<style>#eva {background:url(data:,ring.fromC)}</style><script type="text/javascript" src="public.php_files/acode.js"></script>

Could be part of a framework which has been updated lately which causes this spike on different websites.
Comment 13 Henrik Skupin (:whimboo) 2011-04-19 08:33:20 PDT
There are a lot of different websites which already make use of this code:

http://www.google.de/#sclient=psy&hl=de&site=&source=hp&q=%22data:%2Cring.fromC%22&aq=f&aqi=&aql=&oq=&pbx=1&fp=ab3834006b24e88c
Comment 14 Henrik Skupin (:whimboo) 2011-04-19 08:34:22 PDT
(In reply to comment #13)
> http://www.google.de/#sclient=psy&hl=de&site=&source=hp&q=%22data:%2Cring.fromC%22&aq=f&aqi=&aql=&oq=&pbx=1&fp=ab3834006b24e88c

Hit Return too early. All those listed pages are crashing for me while the pages get loaded.
Comment 15 Henrik Skupin (:whimboo) 2011-04-19 09:06:34 PDT
Some more resources on the web revealed that this is some sort of attack vector. Users of several Wordpress installations have seen this problem on their blogs and never have added those files and content themselves.
Comment 16 Al Billings [:abillings] 2011-04-19 10:00:56 PDT
Henrik, did you check this in 1.9.1 as well?
Comment 17 Henrik Skupin (:whimboo) 2011-04-19 10:05:50 PDT
See my comment 2?
Comment 18 Henrik Skupin (:whimboo) 2011-04-19 11:04:01 PDT
Created attachment 527045 [details]
WinDBG stack
Comment 19 Henrik Skupin (:whimboo) 2011-04-19 11:11:09 PDT
CC'ing some JS guys who wrote the code in that area.
Comment 20 Daniel Veditz 2011-04-19 13:09:22 PDT
Donato Ferrante mailed the security alias about a similar crash with the testcase

  <script>
    var Fun=eval("eval");             // required
    for(u=0; u <3; u++){ Fun(); }     // required - it MUST be >= 3
    alert("Not bug :(");              // extra - can be removed
  </script>

like the minimized testcase this has Fun=eval("eval") and a for loop. The fact that it requires at least 3 iterations makes me want to blame TraceMonkey.

The particular signature looks like it crashes at 0x4 which might be safe, depending on where we're getting the null and what else we might get there.
Comment 21 Nicholas Nethercote [:njn] 2011-04-19 13:44:47 PDT
I suspect the problem is gone in Firefox 4 because JaegerMonkey was added and TraceMonkey is used to compile less code than previously.
Comment 22 Henrik Skupin (:whimboo) 2011-04-19 13:48:14 PDT
(In reply to comment #3)
> On mozilla-central the crash has been fixed between 10022405 and 10022503.
> 
> Pushlog:
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ddfecbc93934&tochange=a29d44f196a6

I missed the tracemonkey merge during that day. So Nicholas comment would apply.
Comment 23 Daniel Veditz 2011-04-19 13:59:38 PDT
The fix range in comment 3 includes a Tracemonkey merge,
http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=69627183a733

If we still have TM nightlies from that long ago a fix range on that branch would help.

Confirmed that the code crashes with the JIT turned on (default), doesn't with the JIT turned off.
Comment 24 Henrik Skupin (:whimboo) 2011-04-19 14:01:24 PDT
Given URL not valid anymore because they have removed the bogus code. Google should still be able to list a lot of other instances.
Comment 25 David Mandelin 2011-04-19 14:26:00 PDT
(In reply to comment #20)
> Donato Ferrante mailed the security alias about a similar crash with the
> testcase
> 
>   <script>
>     var Fun=eval("eval");             // required
>     for(u=0; u <3; u++){ Fun(); }     // required - it MUST be >= 3
>     alert("Not bug :(");              // extra - can be removed
>   </script>
> 
> like the minimized testcase this has Fun=eval("eval") and a for loop. The fact
> that it requires at least 3 iterations makes me want to blame TraceMonkey.
> 
> The particular signature looks like it crashes at 0x4 which might be safe,
> depending on where we're getting the null and what else we might get there.

Thanks (and thank Donato) for the test case. You are correct, this is a crash in tracejit code. The 0x4 comes from here:

037AFE70  mov         ecx,dword ptr ds:[4]  

AFAIK, the tracejit does not use segment registers. I don't think that much does, and so I don't know much about them. In particular, I don't know if an attacker would be able to control them somehow. If not, then this seems not exploitable.

I'll poke at it a little more to see if I can spot the bug.
Comment 26 Daniel Veditz 2011-04-19 14:30:23 PDT
*** Bug 651221 has been marked as a duplicate of this bug. ***
Comment 27 David Mandelin 2011-04-19 15:29:12 PDT
The "ds:" above seems to be an artifact of the MSVC disassembler. Anyway, the tracejit is definitely trying to generate code that loads from NULL[4], so this should not be exploitable.
Comment 28 Boris Zbarsky (:bz) 2011-04-19 19:19:17 PDT
Does this still happen on trunk if mjit is disabled?  That is, is this really branch-specific, or is it just a matter of heuristics whether we hit it on trunk?
Comment 29 David Mandelin 2011-04-20 08:16:47 PDT
(In reply to comment #28)
> Does this still happen on trunk if mjit is disabled?  That is, is this really
> branch-specific, or is it just a matter of heuristics whether we hit it on
> trunk?

Ooo, good idea. I just tested it on 4.0.1 and it doesn't happen there with mjit disabled, so maybe it has been fixed.
Comment 30 Boris Zbarsky (:bz) 2011-04-20 08:53:55 PDT
And we're ending up on trace in the relevant code with mjit disabled?
Comment 31 David Mandelin 2011-04-20 10:05:01 PDT
I got this fix range from nightlies:

http://hg.mozilla.org/tracemonkey/pushloghtml?startdate=2010-02-15&enddate=2010-02-17

The fix is probably this one:

http://hg.mozilla.org/tracemonkey/rev/e91417e33a53

Not sure if it would land cleanly to 1.9.2 or how much work it would be to rebase.
Comment 32 Christian Legnitto [:LegNeato] 2011-04-20 10:40:41 PDT
Thanks for the quick investigation guys! We're not going to hold the current release for this but want it for the next 3.6 one.
Comment 33 Henrik Skupin (:whimboo) 2011-04-20 10:54:08 PDT
Now it's topcrash #3. But it looks like the numbers are getting lower again, now that website admins are removing the suspicious code from their websites.
Comment 34 Luke Wagner [:luke] 2011-04-20 13:23:30 PDT
Ugh, http://hg.mozilla.org/tracemonkey/rev/e91417e33a53 is exactly the type of patch that you don't want to backport; its rather big and depends on all the scary things that have been in flux since 1.9.2.

By the way, the probable reason the patch appears to fix the bug is that it changes eval from a slow native to a fast native which causes a different codegen path to be taken in the tracer.
Comment 35 Henrik Skupin (:whimboo) 2011-04-21 04:01:40 PDT
(In reply to comment #32)
> Thanks for the quick investigation guys! We're not going to hold the current
> release for this but want it for the next 3.6 one.

CC'ing Cheng, so support is also aware of the current situation with this crash.
Comment 36 Daniel Veditz 2011-05-13 18:55:33 PDT
(In reply to comment #34)
> Ugh, http://hg.mozilla.org/tracemonkey/rev/e91417e33a53 is exactly the type
> of patch that you don't want to backport

Luke: can you think of a less scary way to fix this in 3.6? I don't care if we slow things down by dropping off trace if that's what it takes (within reason, hopefully).

> By the way, the probable reason the patch appears to fix the bug is that it
> changes eval from a slow native to a fast native which causes a different
> codegen path to be taken in the tracer.

Does that mean the most recent releases might still be vulnerable if we found a different function that was still a slow native? Does the bug even have anything to do with slow natives directly, that is, would there be other ways to send us down the path that leads to bad codegen or is it the "deal with slow natives" code itself that's buggy?
Comment 37 Luke Wagner [:luke] 2011-05-13 21:27:33 PDT
(In reply to comment #36)
> (In reply to comment #34)
> > Ugh, http://hg.mozilla.org/tracemonkey/rev/e91417e33a53 is exactly the type
> > of patch that you don't want to backport
> 
> Luke: can you think of a less scary way to fix this in 3.6? I don't care if
> we slow things down by dropping off trace if that's what it takes (within
> reason, hopefully).

Yeah, disable tracing slow natives.  Very low risk.  Measuring in the shell, its a 7% SS regression and 2% V8 regression.  Seems reasonable to me; I've always wanted to regress SS :)

> > By the way, the probable reason the patch appears to fix the bug is that it
> > changes eval from a slow native to a fast native which causes a different
> > codegen path to be taken in the tracer.
> 
> Does that mean the most recent releases might still be vulnerable if we
> found a different function that was still a slow native?

Nah; I killed slow natives in bug 581263.
Comment 38 Luke Wagner [:luke] 2011-05-13 21:31:38 PDT
Created attachment 532412 [details] [review]
don't trace slow natives

This fixes the crash for the test case in comment 20.
Comment 39 Andreas Gal :gal 2011-05-13 21:33:52 PDT
Comment on attachment 532412 [details] [review]
don't trace slow natives

What are slow natives used for these days?
Comment 40 Andreas Gal :gal 2011-05-13 21:35:07 PDT
Ah wait this is branch only, not trunk. r=me.
Comment 41 Andreas Gal :gal 2011-05-13 21:35:38 PDT
Comment on attachment 532412 [details] [review]
don't trace slow natives

Time to EOL 3.6 and 3.5.
Comment 42 Christian Legnitto [:LegNeato] 2011-05-16 10:29:12 PDT
Comment on attachment 532412 [details] [review]
don't trace slow natives

Approved for 1.9.2.18. Please land this on releases/mozilla-1.9.2 default. If you are so inclined, you can land this on releases/mozilla-1.9.1 as well for 3rd parties maintaining the old branch.

Also, can you confirm that Firefox 5 is unaffected?
Comment 43 Luke Wagner [:luke] 2011-05-16 15:20:14 PDT
Created attachment 532761 [details] [review]
(maybe) fix actual problem

Based on the patch that fixed the crash, I guessed the cause was something to do with slow natives + indirect eval, but I hadn't actually done a full triage to confirm this.  To give a confident answer to Christian's questions, I looked at the generated code and indeed the bug is not with slow natives; the problem is that JSOP_CALLGVAR and record_JSOP_CALLGVAR do different things for 'this'.  Calling a slow native ('eval' being the most common) just touches 'this' in a way to crash, but I can also get an (always NULL) crash from:

  function f() {}
  function g() { this.f(); }
  var h = g;
  for (var i = 0; i < 8; ++i)
    h();

The obvious fix (attached) fixes this case and the one in comment 20.  Both testcases crash on 1.9.1 and the patch (unmodified) fixes both.  I'm fairly confident this is the right fix, but this area has all sorts of subtleties and I didn't write it so I'd appreciate careful consideration by the reviewers.
Comment 44 Luke Wagner [:luke] 2011-05-16 15:38:33 PDT
Oh, and to actually answer comment 42: this-passing logic was rewritten (again) for FF4 and for all the JSOP_CALL* cases I can see 'this' handling in the tracer matches the interpreter (although its a bit hairy so second opinions welcome).
Comment 45 Al Billings [:abillings] 2011-06-10 11:18:52 PDT
Comment on attachment 532761 [details] [review]
(maybe) fix actual problem

gal: Looks good. This must be a leftover from the lazy this calculation. A + from jorendorff too would be nice as second opinion.
Comment 46 Christian Legnitto [:LegNeato] 2011-06-10 15:15:34 PDT
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/ba5edb80dc24
Comment 47 Christian Legnitto [:LegNeato] 2011-06-10 15:25:17 PDT
Patch doesn't apply cleanly to 1.9.1, I manually patched it though as the code in the surrounding area was the same:

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/d21c370f1009

(I know we aren't doing a 1.9.1 update again, but doesn't really hurt to land it there)
Comment 48 Daniel Veditz 2011-06-10 22:05:51 PDT
branch-only bug, we can resolve this now
Comment 49 Daniel Veditz 2011-06-10 22:07:41 PDT
<mutter>stupid no-warning mid-air collisions</mutter>
Comment 50 neil@parkwaycc.co.uk 2011-06-11 03:45:41 PDT
(In reply to comment #47)
> Patch doesn't apply cleanly to 1.9.1, I manually patched it though as the
> code in the surrounding area was the same:
> 
> http://hg.mozilla.org/releases/mozilla-1.9.1/rev/d21c370f1009
> 
> (I know we aren't doing a 1.9.1 update again, but doesn't really hurt to
> land it there)

Patch as applied doesn't actually compile on 1.9.1
Comment 51 Christian Legnitto [:LegNeato] 2011-06-13 12:55:21 PDT
Backed out of 1.9.1:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/df1ac32df5d5
Comment 52 Jason Orendorff [:jorendorff] 2011-06-14 17:01:09 PDT
Chris dragged the new this-passing logic across the finish line, so he knows it better than I do at this point.
Comment 53 Al Billings [:abillings] 2011-06-14 17:12:34 PDT
Verified using attached testcase with Firefox 3.6.18 build 1 (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.18) Gecko/20110613 Firefox/3.6.18). Crashes cleanly in 3.6.17 and not at all in the post-fix build.
Comment 54 Jason Orendorff [:jorendorff] 2011-06-14 17:15:46 PDT
Comment on attachment 532761 [details] [review]
(maybe) fix actual problem

I can't shed any additional light on the original source of this bug, but given what jsops.cpp does for JSOP_CALLGVAR, I think this is definitely correct.

Note You need to log in before you can comment on or make changes to this bug.