Bugzilla@Mozilla – Bug 659349
WebGL allows access to uninitialised graphics memory
Last modified: 2011-07-15 15:45:35 PDT
Summon comment box
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1 Build Identifier: The linked conformance test checks that a WebGL canvas isn't given unintialised memory when it is resized. We found that it fails on some configurations on Windows XP and MacOS X. By using a larger canvas is it possible to view graphics data from other web pages, and on MacOS various parts of the desktop as well. I'm attaching a PoC that loads an iframe onto a texture (by applying a CSS transform/transition to it), then searches for it in graphics memory. An SVG color matrix filter is allied to it to set the red channel to a specific value which we then search for. Although the retrieved image is somewhat scrambled, the PoC could be further developed by encoding position information for each pixel into the color channels and using that information to reconstruct the whole image. We've tested this with success on Intel and ATI cards on WinXP and Nvidia and Intel cards on some recent Macs. Reproducible: Always Steps to Reproduce: 1. Get WebGL context 2. Resize canvas 3. Copy WebGL canvas onto 2d canvas 4. read pixels
Created attachment 534801 [details] svg filter
Created attachment 534802 [details] iframe texture
Created attachment 534804 [details] iframe texture
Created attachment 534805 [details] iframe texture
Created attachment 534806 [details] webgl reader
Created attachment 534807 [details] frameset This is the one you should run. It seems to work best on MacOS X. For some reason the HTML is uploading as text/plain, I guess this is a new Bugzilla security feature?
The testcase works damn well here. Just save the 'frameset' attachment and run it locally. Mac OS 10.6.4, Firefox Nightly from today, NVIDIA GeForce 320M OpenGL Engine -- 2.1 NVIDIA-1.6.18.
Removing the colorMask() call makes it not work anymore. I think I know what's happening. GLContext::ClearSafely is being naive: void GLContext::ClearSafely() { GLfloat clearColor[4]; GLfloat clearDepth; GLint clearStencil; fGetFloatv(LOCAL_GL_COLOR_CLEAR_VALUE, clearColor); fGetFloatv(LOCAL_GL_DEPTH_CLEAR_VALUE, &clearDepth); fGetIntegerv(LOCAL_GL_STENCIL_CLEAR_VALUE, &clearStencil); fClearColor(0.0f, 0.0f, 0.0f, 0.0f); fClearStencil(0); fClearDepth(1.0f); fClear(LOCAL_GL_COLOR_BUFFER_BIT | LOCAL_GL_DEPTH_BUFFER_BIT | LOCAL_GL_STENCIL_BUFFER_BIT); fClearColor(clearColor[0], clearColor[1], clearColor[2], clearColor[3]); fClearStencil(clearStencil); fClearDepth(clearDepth); } As you can see, it's making many assumptions, in particular it assumes no masks, so it can be trumped by the testcase's glColorMask() call. We already check for this kind of issues at 2 different places in our WebGL implementation, see WebGLContext::ForceClearFramebufferWithDefaultValues(). We call that when creating new FBOs and with the default option preserveDrawingBuffer=false we also call that after the backbuffer has been presented to the compositor. However we don't currently call that after the canvas has been resized! Patch coming.
The big discussion we're having now with Jeff and Joe is: should we do: * option 1: fix this by just letting WebGLContext::SetDimensions call orceClearFramebufferWithDefaultValues() * option 2: actually fix GLContext::ClearSafely() Option 1 has the benefit that it's a very simple patch and doesn't add complexity to GLContext. Option 2 has the benefit that it doesn't waste time clearing the buffer _twice_, and makes GLContext::ClearSafely() actually safe. The question is, should the method in GLContext trust GL state? In general they can, but in the case of a GL context backing a WebGL context, state is scriptable and thus can't be trusted. Since it's a webgl-specific problem, maybe it's not a good idea to complexify GLContext in general just for it.
Looking at other GLContext methods, the code we have there is already very careful about not trusting existing GL state and is mostly just calling glGetIntegerv to query state rather than keeping a copy of state, which answers the design question raised in comment 9. I'll just go for option 2 and just call glGetIntegerv.
Created attachment 535112 [details] [review] rewrite GLContext::ClearSafely There's a comment in the patch with some explanations.
Try push: http://tbpl.mozilla.org/?tree=Try&rev=9274be381579 When the Mac builds are done I'll check if it fixes the testcase.
(In reply to comment #12) > Try push: > http://tbpl.mozilla.org/?tree=Try&rev=9274be381579 > > When the Mac builds are done I'll check if it fixes the testcase. It's fixed indeed.
Comment on attachment 535112 [details] [review] rewrite GLContext::ClearSafely It would be nice if you could document why particular state needs to be set. (e.g. GL_DITHER)
Jeff: still giving a bit more time to the WebKit guys, but at this point I think that disabling GL_DITHER is useless. I think the biggest reason was as an optimization, but in the case of a Clear() that shouldn't make a difference, right? Since it's the same small dither pattern repeated all over the framebuffer.
If I remove the GL_DITHER, that will be in a follow-up though, I don't want to introduce any risk here.
(In reply to comment #12) > Try push: > http://tbpl.mozilla.org/?tree=Try&rev=9274be381579 Hey, this makes us pass canvas-test.html on Mac! And looking at the log, it also passes on WinXP where we had added it to the ignore-list. So this test failure was actually very meaningful.
Pushed to mozilla-central: http://hg.mozilla.org/mozilla-central/rev/a45a190af4f7 Please give me aurora and beta approval for this.
Dan, can you help the release drivers with the call on whether this meets the criteria you all have been using for making tracking+ decisions for Firefox 5 and/or Firefox 6?
Benoit, Jeff, can you two help the release team by explaining the risks associated with this patch and what's been done to help assure there's no negative fallout?
Sure. This bug allows to read back arbitrary uninitialized video memory at least on Mac and on Windows XP, when WebGL is enabled. When I ran the testcase on a Mac, I saw the contents of other windows and of the desktop being painted on the canvas. It can't read currently-in-use video memory, but it can read memory that had been used by any other application and had been released since (even if the application is still running). It's known to happen also on Windows XP. We used to have a WebGL test failing because of that on Windows XP. So it's really a quite severe flaw. Moreover the code that this patch adds is not entirely new since it's a simplified version of the code that we already have in WebGL to clear framebuffer objects. I should also note that although the bug currently is only known to affect WebGL, it's not a bug in the WebGL implementation, see the patch, it's touching only GLContext::ClearSafely().
Comment on attachment 535112 [details] [review] rewrite GLContext::ClearSafely Please land in Aurora ASAP and we'll evaluate after further testing for 5 Beta.
Landed on Aurora: http://hg.mozilla.org/releases/mozilla-aurora/rev/0b02d41a5436 For Beta, I pushed it to Try: http://tbpl.mozilla.org/?tree=Try&rev=623856993a20
Hey!!! Is tryserver busted? My tryserver builds fail with this: === === If you get failures below, please file a bug describing the error === and your environment (compiler and linker versions), and use === --disable-elf-hack until this is fixed. === /builds/slave/try-lnx-dbg/build/obj-firefox/build/unix/elfhack/elfhack -b test.so /builds/slave/try-lnx-dbg/build/obj-firefox/build/unix/elfhack/elfhack: /tools/gcc-4.3.3/installed/lib/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by /builds/slave/try-lnx-dbg/build/obj-firefox/build/unix/elfhack/elfhack) NEXT ERROR make[7]: *** [test.so] Error 1 I'm not the only one to get these errors, see e.g. http://tbpl.mozilla.org/?tree=Try&rev=bed0a8106c45 http://tbpl.mozilla.org/?tree=Try&rev=df89a4ecc7ff CC'ing glandium about the elf-hack message.
(In reply to comment #24) > Hey!!! Is tryserver busted? > > My tryserver builds fail with this: > > === > === If you get failures below, please file a bug describing the error > === and your environment (compiler and linker versions), and use > === --disable-elf-hack until this is fixed. > === > /builds/slave/try-lnx-dbg/build/obj-firefox/build/unix/elfhack/elfhack -b > test.so > /builds/slave/try-lnx-dbg/build/obj-firefox/build/unix/elfhack/elfhack: > /tools/gcc-4.3.3/installed/lib/libstdc++.so.6: version `GLIBCXX_3.4.14' not > found (required by > /builds/slave/try-lnx-dbg/build/obj-firefox/build/unix/elfhack/elfhack) > NEXT ERROR make[7]: *** [test.so] Error 1 https://wiki.mozilla.org/index.php?title=ReleaseEngineering/TryServer#Using_older_GCC (You should have gotten a warning when pushing, but maybe the hook didn't work properly?) > I'm not the only one to get these errors, see e.g. > http://tbpl.mozilla.org/?tree=Try&rev=bed0a8106c45 > http://tbpl.mozilla.org/?tree=Try&rev=df89a4ecc7ff Not the same errors at all.
(In reply to comment #25) > https://wiki.mozilla.org/index.php?title=ReleaseEngineering/ > TryServer#Using_older_GCC > > (You should have gotten a warning when pushing, but maybe the hook didn't > work properly?) Thanks for the tip, and you're right, this gets printed when I push, I just hadn't seen it. New try push: http://tbpl.mozilla.org/?tree=Try&rev=248aead2bb55 Also I've made a local build and it works fine.
Comment on attachment 535112 [details] [review] rewrite GLContext::ClearSafely Can this please land today on beta?
Landed on beta: http://hg.mozilla.org/releases/mozilla-beta/rev/bca54750eec4
CC'ing Ken as Khronos is releasing a statement about that in response to today's blog post from Context.
@ Paul Stone: You knew that this was fixed in Firefox 5, which is going to be released in 5 days from now. Why did you blog about it today? http://www.contextis.com/resources/blog/webgl2/
Timing wasn't down to me, I'm afraid. I just did the PoC.
Speaking about timing, just a few hours after Context's blog post, Microsoft released this: http://blogs.technet.com/b/srd/archive/2011/06/16/webgl-considered-harmful.aspx
Created attachment 539878 [details] Microsoft TechNet - WebGL considered harmful - at 9:21 AM Pacific time today Microsoft's post says 9:21 AM. Looking at other posts, clearly they use Pacific time. When exactly was Context's post published?
The MS blog is no surprise, they expressed those same three concerns to us when Vlad first started talking about WebGL a few years ago.
http://www.google.com/#sclient=psy&hl=en&source=hp&q=http:%2F%2Fwww.contextis.com%2Fresources%2Fblog%2Fwebgl2&aq=f&aqi=&aql=&oq=&pbx=1&bav=on.2,or.r_gc.r_pw.&fp=9dd7129c70b71d22&biw=1362&bih=625&as_qdr=y15 currently says '8 hours ago' and just a minute ago it said '7 hours ago'. That means it went online exactly 8 hours ago, which means 7:10 AM Pacific time (it's currently 15:10 Pacific time). http://www.contextis.co.uk/resources/blog/blog.xml confirms it was published June 16 but doesn't give a time of the day. So, assuming that (my interpretation of) Google's measurement is correct, there is only 2 hours and 10 minutes between the publication of Context's post, and the publication of the Microsoft TechNet article quoting it. Please, someone give me an explanation that makes me drop my tinfoil hat.
(In reply to comment #34) > The MS blog is no surprise, they expressed those same three concerns to us > when Vlad first started talking about WebGL a few years ago. But they quote this new article by context, that was published just 2 hours before.
Fnord!
http://www.google.com/#sclient=psy&hl=en&source=hp&q=http:%2F%2Fwww.contextis.com%2Fresources%2Fblog%2Fwebgl2&aq=f&aqi=&aql=&oq=&pbx=1&fp=1&biw=1362&bih=625&as_qdr=y15&bav=on.2,or.r_gc.r_pw.&cad=b now says '23 hours ago' which is not plausible as that would mean it was published at midnight, UK time... I suppose the timestamp got cleared to 00:00:00, as in http://www.contextis.co.uk/resources/blog/blog.xml where it's Thur, 16 Jun 2011 00:00:00 +0000 What, if anything, can/should we say about this affair? Just ignore it? Or just mention, in a way as neutral as possible, that at most a few hours elapsed between Context's blog and Microsoft's article quoting it?
We shouldn't say anything about it. We can't prove anything and it wouldn't matter if we could.
Yeah, OK, I had come to the same conclusion.