Bugzilla@Mozilla – Bug 590116
Stack corruption in Windows' nsIconChannel::MakeInputStream due to GetDIBits being incorrectly used
Last modified: 2011-01-28 12:00:58 PST
Summon comment box
I can reproduce this on the official builds: Mozilla/5.0 (Windows; Windows NT 6.1; WOW64; rv:2.0b3) Gecko/20100805 Firefox/4.0b3 as well as on Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b5pre) Gecko/20100823 Minefield/4.0b5pre (.NET CLR 3.5.30729). However, I can't reproduce this on a debug build, either VC8 or VC10. A call in nsIconChannel::MakeInputStream [1] to Win32's GetDIBits [2] seems to clobber the stack on returning. This happens whenever this point in code is reached. In most cases those locations aren't touched any more. However, in at least one case at least one location is -- when stockIcon [3] contains a shared string, during its destruction that value on the stack (now 0x00ffffff) is used. An easy way to trigger a crash on Windows 7 using it is to load the URL moz-icon://stock/uac-shield. This immediately causes a crash, e.g. [4], which I submitted. [1] http://mxr.mozilla.org/mozilla-central/source/modules/libpr0n/decoders/icon/win/nsIconChannel.cpp#538 [2] http://msdn.microsoft.com/en-us/library/dd144879%28VS.85%29.aspx -- the call is in http://mxr.mozilla.org/mozilla-central/source/modules/libpr0n/decoders/icon/win/nsIconChannel.cpp#632 [3] http://mxr.mozilla.org/mozilla-central/source/modules/libpr0n/decoders/icon/win/nsIconChannel.cpp#551 [4] http://crash-stats.mozilla.com/report/index/bp-fb1225e0-ca4a-4c4b-99ae-84df22100824 There doesn't seem to be anything obviously wrong with the API calls. This might be something hidden in the API or a codegen/PGO bug.
http://crash-stats.mozilla.com/report/index/bp-fb1225e0-ca4a-4c4b-99ae-84df22100824 0 nspr4.dll PR_AtomicDecrement nsprpub/pr/src/misc/pratom.c:312 1 xul.dll nsACString_internal::Finalize xpcom/string/src/nsTSubstring.cpp:186 2 xul.dll nsIconChannel::MakeInputStream modules/libpr0n/decoders/icon/win/nsIconChannel.cpp:559 3 xul.dll nsIconChannel::AsyncOpen modules/libpr0n/decoders/icon/win/nsIconChannel.cpp:246 4 xul.dll nsURILoader::OpenURI uriloader/base/nsURILoader.cpp:863 5 xul.dll nsDocShell::DoChannelLoad docshell/base/nsDocShell.cpp:8862 6 xul.dll nsDocShell::DoURILoad docshell/base/nsDocShell.cpp:8704 7 xul.dll nsDocShell::InternalLoad docshell/base/nsDocShell.cpp:8374 8 xul.dll nsDocShell::LoadURI docshell/base/nsDocShell.cpp:1414
When did this start happening? I've recently landed a bunch of imagelib changes...
(In reply to comment #2) > When did this start happening? I've recently landed a bunch of imagelib > changes... Not sure, but at least the crash doesn't happen with 3.6.8. However it seems we've been using the API incorrectly all this while -- the BITMAPINFO structure [1] actually expects an array of RGBQUADs rather an a single RGBQUAD. The number of RGBQUADs is determined by biBitCount and biClrUsed [2]. A mask has 1 bit per pixel, which means according to [2] that bmiColors has to contain two entries, not the one you would get by allocating on the stack. This results in the four bytes right after maskInfo being overwritten. [1] http://msdn.microsoft.com/en-us/library/dd183375%28v=VS.85%29.aspx [2] http://msdn.microsoft.com/en-us/library/dd183376%28v=VS.85%29.aspx
Bobby, can you look at this crash?
(In reply to comment #2) > When did this start happening? I've recently landed a bunch of imagelib > changes... we did see a bump in this signature shortly after the release of beta1 around July 7 going from 15-25 crashes per day before 7/7 to 30-55 crashes per day after 7/7. then we started to see a decline again around July 27. peak volume day was july 15. here is the breakdown of releases for this signature on that day. checking --- PR_AtomicDecrement 20100715-crashdata.csv found in: 4.0b1 3.0.19 3.0 3.0.5 3.7a5 3.0b4 3.0.4 3.0.3 3.0.10 3.0.1 release total-crashes PR_AtomicDecrement crashes pct. all 292787 55 0.00018785 4.0b1 20302 37 0.00182248 3.0.19 7892 6 0.000760264 3.0 818 4 0.00488998 3.0.5 769 2 0.00260078 3.7a5 795 1 0.00125786 3.0b4 254 1 0.00393701 3.0.4 493 2 0.0040568 3.0.3 510 1 0.00196078 3.0.10 420 1 0.00238095 3.0.1 1202 1 0.000831947 maybe skip listing would help to sort out possible multiple different crashes under the signature PR_AtomicDecrement. It looks like we still see the crash on the trunk/beta3. here is the breakdown for yesterday. checking --- PR_AtomicDecrement 20100823-crashdata.csv found in: 4.0b3 4.0b4pre 3.0 4.0b5pre 3.0.19 4.0b3pre 4.0b1 3.6.8 3.0.1 release total-crashes PR_AtomicDecrement crashes pct. all 287185 41 0.000142765 4.0b3 17548 24 0.00136768 4.0b4pre 146 6 0.0410959 3.0 822 3 0.00364964 4.0b5pre 1962 2 0.00101937 3.0.19 7382 2 0.000270929 4.0b3pre 115 1 0.00869565 4.0b1 1689 1 0.000592066 3.6.8 186488 1 5.36228e-06 3.0.1 1176 1 0.00085034
Given that this would involve me investigating things on windows (not my native platform - I have a VM, but not really a development environment), I don't really have the bandwidth for it right now. Maybe in a few weeks? Or maybe joe could look at it?
Resetting assignee to default to signify the fact that I'm not owning this for the time being.
Updating summary to reflect current information.
I bet bug 526038 has the same cause.
Created attachment 469080 [details] [review] possible fix Here's one possible way to fix it. The variable length of BITMAPINFO means that we'll need to allocate it on the heap. I'm still not sure whether the colour table should never be part of the icon. (If I do include it with a 32-bit icon, the image doesn't come out right, so I removed it.)
Created attachment 469350 [details] [review] Possible fix, v1.01 Oops, missed a couple of error checks.
Comment on attachment 469350 [details] [review] Possible fix, v1.01 >+// Given a header and a size, creates a freshly allocated BITMAPINFO structure. >+// It is the caller's responsibility to free the structure. >+static BITMAPINFO* CreateBitmapInfo(BITMAPINFOHEADER* aHeader, >+ size_t aColorTableSize) >+{ >+ BITMAPINFO* bmi = (BITMAPINFO*) moz_xmalloc(sizeof(BITMAPINFOHEADER) + >+ aColorTableSize); We should probably use fallible alloc here, because this is a user-controlled, mostly arbitrarily-large size. I know that'll complicate things and add error handling, though. The style here is also to use new[]/delete[], so it might not be a bad thing to use fallible new[]. Looks good otherwise, but this also needs sr.
Comment on attachment 469350 [details] [review] Possible fix, v1.01 Looks good to me, though most of the patch seems to be some cleanup (using a BITMAPINFOHEADER instead of a BITMAPINFO), with the core change happening in the last hunk. I don't have anything against the cleanup, but would be nice to split it up into two separate changesets for better inspectability.
(In reply to comment #12) > The style here is also to use new[]/delete[] , so it might not > be a bad thing to use fallible new[]. The size of the pointer is statically unknown, and I'd rather do a malloc than a new char[]. (In reply to comment #13) > Comment on attachment 469350 [details] [review] > Possible fix, v1.01 > > Looks good to me, though most of the patch seems to be some cleanup (using a > BITMAPINFOHEADER instead of a BITMAPINFO) Oh, I only did that because a statically allocated BITMAPINFO is 4 bytes larger than just the header, and at that stage just the header's required.
Created attachment 469854 [details] [review] patch addressing review comments, to check in OK, I've switched to fallible operator new (there could also possibly be alignment issues with new char[]). The bug affects trunk and all release branches. How is this going to be landed?
Created attachment 469987 [details] [review] fix for one more potential issue Hmm, I noticed one potential issue my patch could introduce. We could convince ourselves to allocate a potentially unbounded amount of memory if the header is malformed and has a strange value for aHeader->biClrUsed. Is failing the right thing to do here? (sorry for the additional review requests!)
Created attachment 469991 [details] [review] ... correct patch to fix one more potential issue Er, and here's the correct patch. Stupid hg.
Landed: http://hg.mozilla.org/mozilla-central/rev/08937a340174
Comment on attachment 469991 [details] [review] ... correct patch to fix one more potential issue This bug affects 1.9.2 as well, even though we don't outright crash there. The stack corruption still occurs (I verified this using the VS debugger's memory view). The patch applies cleanly on 1.9.2.
Comment on attachment 469991 [details] [review] ... correct patch to fix one more potential issue ... but it doesn't actually build, since 1.9.2 doesn't have infallible malloc.
Created attachment 475169 [details] [review] patch for 1.9.2 This works. As for 1.9.1, it seems to be unaffected by the bug, but entirely by accident. In <http://mxr.mozilla.org/mozilla1.9.1/source/modules/libpr0n/decoders/icon/win/nsIconChannel.cpp?mark=497-505#497>, the pointer that's passed in for the bitmap info is an offset of the buffer itself, so I think it first writes the colour table out, then overwrites it with the actual bitmap.
Comment on attachment 475169 [details] [review] patch for 1.9.2 Approved for 1.9.2.11, a=dveditz Does this patch work for 1.9.1.x too, or do we need a different version there?
- 1.9.1 is unaffected by the actual corruption issue by accident, as I mentioned in comment 21. - The 1.9.1 code is completely different, so if we do decide to explicitly fix it there we'll have to write a completely different patch.
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/7e02b0747e53
(In reply to comment #25) > Created attachment 482359 [details] > Bug Bounty Nomination I haven't been able to do anything worse than a denial of service, and I think DEP + ASLR would protect against this, so this bug might not be eligible. On the other hand, the executable is 32-bit only, so one might in theory be able to brute-force one's way past ASLR easily.
Of course, ASLR isn't present on Windows XP.
... and Win2k doesn't have DEP, and we still support it. Never mind. :)