Bugzilla@Mozilla – Bug 564679
Bytes mapped to U+FFFD in 8-bit encodings make the following byte/character disappear
Last modified: 2010-07-20 16:15:35 PDT
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.3a5pre) Gecko/20100508 Minefield/3.7a5pre
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.3a5pre) Gecko/20100508 Minefield/3.7a5pre

Many 8-bit encodings contain undefined positions, which are mapped to U+FFFD. In the latest Minefield for Mac, such bytes are correctly mapped to U+FFFD, but the immediately following byte disappears(!).

For example, the sequence {'\xD1', '\xD2', '\xD3', '\xD4'} in windows-1253 should result in {U+3A1, U+FFFD, U+3A3, U+3A4} (i.e., the string "Ρ�ΣΤ"), but the actual result is the shorter sequence {U+3A1, U+FFFD, U+3A4} with no U+3A3 character (i.e., the string "Ρ�Τ", with no 'Σ'). (Two consecutive bytes that are both mapped to U+FFFD result in only one U+FFFD character instead of two.)

This seems to be a general problem; it applies to several windows-* and ISO-8859-* encodings. Firefox 3.6.3 (release) shows the same incorrect behaviour. This bug did not exist in Firefox 3.5.8.

[Incidentally, it might make sense to map bytes in the range 0x7F..0x9F to U+7F..U+9F and not to U+FFFD for many of the affected encodings, but that is a separate issue and would in any case not solve the current problem completely, since many encodings, including windows-1253, have undefined characters outside this range, for which U+FFFD is the only reasonable mapping.]

Reproducible: Always
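For reference, the expected result can be reproduced outside the browser. The following is only a sketch using Python's built-in cp1253 codec (Python's name for windows-1253), assuming its table agrees with Gecko's on these four bytes; the helper name `code_points` is made up for illustration and nothing here comes from the Mozilla tree.

```python
def code_points(raw, encoding="cp1253"):
    """Decode raw bytes, substituting U+FFFD for each undefined byte,
    and return the result as a list of U+XXXX strings."""
    decoded = raw.decode(encoding, errors="replace")
    return [f"U+{ord(ch):04X}" for ch in decoded]


# The sequence from the report: 0xD2 is undefined in windows-1253.
sample = bytes([0xD1, 0xD2, 0xD3, 0xD4])
print(code_points(sample))

# Two consecutive undefined bytes should yield two replacement characters.
print(code_points(bytes([0xD2, 0xD2])))
```

In both cases the number of output characters equals the number of input bytes, which is the property the affected builds violate.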
Investigating. There is more to this than meets the eye: I see the failure at http://coq.no/X/charset5/test8bit.php?enc=windows-1253&mime=windows-1253, but on the other hand data:text/html;charset=windows-1253,%d1%d2%d3%d4 decodes as expected to Ρ�ΣΤ. Unfortunately our own unit tests at intl/uconv/tests/unit/test_decode_*.js don't test undefined code points, and I will fix this, but I tried adding 0xd2 to test_decode_CP1253.js manually, and that didn't show the bug either.
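As a rough illustration of the kind of check the unit tests were missing, here is a sketch in Python rather than in the xpcshell JavaScript used by intl/uconv/tests/unit. It exercises Python's own codecs, not Gecko's decoders, and the encoding list and function names are assumptions chosen for the example. The idea is simply that every undefined byte must decode to exactly one U+FFFD and must leave the byte after it intact.

```python
# Python codec names standing in for a few windows-* and ISO-8859-* encodings.
SINGLE_BYTE_ENCODINGS = ["cp1252", "cp1253", "cp1257",
                         "iso8859_3", "iso8859_7", "iso8859_8"]


def undefined_bytes(encoding):
    """Return the byte values that have no mapping in the given encoding."""
    result = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            result.append(value)
    return result


def check_following_byte_survives(encoding, probe=b"A"):
    """Return the undefined byte values after which the probe byte is lost
    or altered; an empty list means the decoder behaves as expected."""
    failures = []
    expected_tail = probe.decode(encoding)
    for value in undefined_bytes(encoding):
        decoded = (bytes([value]) + probe).decode(encoding, errors="replace")
        if decoded != "\ufffd" + expected_tail:
            failures.append(value)
    return failures


for name in SINGLE_BYTE_ENCODINGS:
    print(name, check_following_byte_survives(name))
```

A decoder with the behaviour described in this bug would report every undefined byte as a failure, because the probe character would be missing from the output.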
Regression range is http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b5359ab6a52c&tochange=ec53caefb5b8. Most likely candidate is http://hg.mozilla.org/mozilla-central/rev/5953efc48779 from bug 174351
This could be an XSS vulnerability
(In reply to comment #1)
> Investigating. There is more to this than meets the eye: I see the failure at
> http://coq.no/X/charset5/test8bit.php?enc=windows-1253&mime=windows-1253, but
> on the other hand data:text/html;charset=windows-1253,%d1%d2%d3%d4 decodes as
> expected to Ρ�ΣΤ.

This turns out to be true only on trunk with the HTML5 parser enabled. With it disabled, and also on 3.6, data:text/html;charset=windows-1253,%d1%d2%d3%d4 decodes to Ρ�Τ
Created attachment 444362 Fix
Created attachment 444363 Test
Created attachment 444365 Test
Simon, did you mean to ask someone in particular for review?
(In reply to comment #8)
> Simon, did you mean to ask someone in particular for review?

Thanks for spotting that, Boris. Not only did I mean to do so, I *did* do so, but it failed, and silently at that. This is the supremely annoying bug 372539, and I have been bitten by it before...
Comment on attachment 444362 Fix

> + <title>Test for Unicode non-characters</title>

Fix the test title. r=me with this.
http://hg.mozilla.org/mozilla-central/rev/96edff678527 http://hg.mozilla.org/mozilla-central/rev/15cec4043fba
Comment on attachment 444362 Fix

Requesting branch approval after trunk baking. This is a very low-risk change which prevents illegal codepoints from corrupting the following character. It is more important to have this on the branch than on trunk, since the HTML5 parser mitigates its effect in some cases.
Is 1.9.1 affected as well?
No. Bug 174351 was not landed on the 1.9.1 branch.
Comment on attachment 444362 Fix

Approved for 1.9.2.6, a=dveditz for release-drivers
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/8478cfe10e43
and tests: http://hg.mozilla.org/releases/mozilla-1.9.2/rev/4422b1e5b0dc
Verified for 1.9.2 with passing tests.