Bugzilla@Mozilla – Bug 514554
Crash with recursive web-worker calls [@XPCNativeSet::Mark]
Last modified: 2009-11-04 05:58:16 PST
Summon comment box
Created attachment 398496 [details] testcase (open either of the two index files) Orlando Berrera from sectheory.com sent me this testcase, a variant (likely the original) was posted by Malloc(i) at http://sla.ckers.org/forum/read.php?14,30128 so this is a public crash. Orlando and Malloci report this as a DoS, a near-null read access violation. When I tried it I saw a few scarier and suspicious things, a near-null write access violation due to an integer overflow (reading from 8 + register containing 0xffffffff), crashes in GC, and crashes in the cycle collector. I'm not at all sure web workers are a required part of this testcase. Would we get the same results appending to the document in a simple setInterval() ? This testcase seems to crash faster than I'd expect that approach to, but maybe that's just because the threads can generate events while layout is reflowing instead of having to wait for it.
You should be able to run the testcase using a jar: url like jar:https://bug514554.bugzilla.mozilla.org/attachment.cgi?id=398496!/index.html (also index2.html) if that doesn't work you can download and run it locally.
I thought web workers were a feature of 1.9.1?
They are, I nominated it for 1.9.0 because it might not be the workers themselves that are the problem. But we can wait until we identify a fix and decide then.
Ben: Can you put this on the top of your list to investigate?
I could only get this to crash on windows, here: http://crash-stats.mozilla.com/report/index/d45fc94d-dabf-4019-ac6d-a728d2090904 Other than that I saw a few OOM-ish crashes (bad_alloc exceptions in gfx fonts).
(In reply to comment #5) And the caller of XPCNativeSet::Mark is XPCJSRuntime::GCCallback
I'll try a debug build over the weekend, but for now moving this over to XPConnect since it looks to be the culprit.
Created attachment 399320 [details] [review] Proposed fix The code in XPC_WN_CallMethod does: XPCCallContext ccx(JS_CALLER, cx, obj, funobj, 0, argc, argv, vp); ... if(!XPCNativeMember::GetCallInfo(ccx, funobj, &iface, &member)) return Throw(NS_ERROR_XPC_CANT_GET_METHOD_INFO, cx); ccx.SetCallInfo(iface, member, JS_FALSE); Because XPCNativeMember::GetCallInfo must enter a request, it is a potential GC point. XPCCallContext::Init has the following code: if(name) SetName(name); if(argc != NO_ARGS) SetArgsAndResultPtr(argc, argv, rval); Note that SetArgsAndResultPtr sets the call context's state to HAVE_ARGS, which is > HAVE_NAME. So here we end up with a call context which is HAVE_NAME but has not initialized mSet. XPC_WN_CallMethod works because it then calls XPCCallContext::SetCallInfo, which initializes mSet (amongst other members). So, if GC happens between the construction and the call to SetCallInfo, then we could attempt to mark a garbage set and crash. This patch catches this case and initializes the offending members to avoid such crashes.
Comment on attachment 399320 [details] [review] Proposed fix >+ mStaticMemberIsLocal = nsnull; Please pretend this says 'JS_FALSE' instead of nsnull.
Comment on attachment 399320 [details] [review] Proposed fix Would it make sense to do it in SetArgsAndResultPtr instead (if mState < HAVE_NAME)? That would also avoid the problem for callers who don't pass argc, etc to the constructor and then call SetArgsAndResultPtr without calling SetName. Not sure if we care about those. This looks like an old bug, probably need to take this on branches.
Created attachment 399572 [details] [review] With that
mrbkap: does this also affect xpconnect in 1.9.0? Might be harder to get the timing right without workers, but it's a small fix and might be good to take even if we're not sure it can be triggered.
(In reply to comment #12) > mrbkap: does this also affect xpconnect in 1.9.0? Might be harder to get the > timing right without workers, but it's a small fix and might be good to take > even if we're not sure it can be triggered. The fix applied to 1.9.0 (with a little massaging). In order to hit this bug though, you have to run JS on not the main thread, which I don't think we do on the 1.9.0 branch. I'll attach a backported patch in a little bit.
http://hg.mozilla.org/mozilla-central/rev/709036ca5768
Comment on attachment 399572 [details] [review] With that This actually applies on 1.9.0.
Comment on attachment 399572 [details] [review] With that This doesn't need approval on 1.9.2 since it's blocking.
Comment on attachment 399572 [details] [review] With that Approved for 1.9.1.4 and 1.9.0.15, a=dveditz
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/61015c1c2156
Checking in js/src/xpconnect/src/xpccallcontext.cpp; /cvsroot/mozilla/js/src/xpconnect/src/xpccallcontext.cpp,v <-- xpccallcontext.cpp new revision: 1.27; previous revision: 1.26 done
(In reply to comment #13) > The fix applied to 1.9.0 (with a little massaging). In order to hit this bug > though, you have to run JS on not the main thread, which I don't think we do on > the 1.9.0 branch. I'll attach a backported patch in a little bit. The attached PoC doesn't crash 1.9.0.14, either run via Dan's jar: url or running the files locally.
(In reply to comment #20) > The attached PoC doesn't crash 1.9.0.14, either run via Dan's jar: url or > running the files locally. I don't know if it's possible to reproduce this bug on 1.9.0. It requires multiple threads running JS which don't exist without web workers in Firefox proper.
Using the attached testcase, I peg my CPU at above 95% with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4 (.NET CLR 3.5.30729). I assume that the DoS is expected while it runs the script. I had it crash randomly once with 1.9.1.4 but I haven't been able to do it again. I'm not 100% convinced that this is fixed.
If I run the index.html file in Dan's attached testcase and then shut down Firefox after a couple of minutes, it sometimes crashes. It doesn't crash immediately like it does in 1.9.1.3.
Al, What stacks are you crashing with?
It didn't submit them for some reason and they aren't in the local directory. I'm trying again.
Created attachment 406824 [details] crash report dmp file for 1.9.1.4 crash
Created attachment 406825 [details] crash report extra file for 1.9.1.4 crash
My upload of the crash is being throttled for some reason.
There we go. I can do it about half or a third of the time by letting it run with index.html and then just clicking the red x to close Firefox. http://crash-stats.mozilla.com/report/index/62c11801-4dbe-403b-811f-3d2ad2091016?p=1
(In reply to comment #29) > http://crash-stats.mozilla.com/report/index/62c11801-4dbe-403b-811f-3d2ad2091016?p=1 This shows a null pointer deref in the web worker code. I think it should get its own bug.
(In reply to comment #30) > This shows a null pointer deref in the web worker code. I think it should get > its own bug. Except for the fact that the stack is totally wacky... Looks more like some kind of stack corruption.
(In reply to comment #31) > Except for the fact that the stack is totally wacky... Looks more like some > kind of stack corruption. Heh, ignoring the XPC_WN_NoMods_Proto_Resolve, the stack actually looks fairly believable. Anyway, let's take this to bug 522839.
So do you think it is safe to mark this as verified for 1.9.1 then, Blake? The crash is really a separate issue?
I think so, yes.
So you guys found that the bug was causing stack corruption? Could I get access to the new bug filed https://bugzilla.mozilla.org/show_bug.cgi?id=522839
When I load the test case in 3.5.3 I crash immediately and the stacks are consistent for me: http://crash-stats.mozilla.com/report/index/27587aae-b48b-4494-84e5-e5d712091029 However, when I load the test case in 3.5.4 the application does not crash right away, but it becomes non-responsive and after a couple of minutes it crashes: http://crash-stats.mozilla.com/report/index/32b482a9-6ae0-4ee5-9bc0-f76412091029 These two look different, but unfortunately the latter one does not look like the stack signature in bug 522839, which would make the results of this investigation a little tidier.
bp-32b482a9-6ae0-4ee5-9bc0-f76412091029 is just a fancy "out of memory" crash. If you've made it that far then you've avoided this bug.
Ben, Blake, can this land for 1.9.2? Or do we need to dig in deeper here before we land this? Seems like it should be landable given that we shipped 1.9.1.4 with this...
Apparently I landed this in time to make beta 1: http://hg.mozilla.org/releases/mozilla-1.9.2/rev/8319e4e20c0e