Bugzilla@Mozilla – Bug 452979
Invisible control characters in URL MUST NOT be decoded when showing its address
Last modified: 2009-03-25 22:19:42 PDT
Summon comment box
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1 Invisible Control characters (%0B,%0C,%1C,%1D,%1E and %1F in URL) MUST NOT de decoded in location bar. Reproducible: Always Steps to Reproduce: 1.Goto https://wiki.mozilla.org/?a=%0B,%0C,%1C,%1D,%1E,%1F, 2.See your location bar 3. Actual Results: Location bar shows https://wiki.mozilla.org/?a=%0B,%0C,%1C,%1D,%1E,%1F, as decoded string. But %0B,%0C,%1C,%1D,%1E and %1F not visible. You will read it as "https://wiki.mozilla.org/?a=,,,,,,". Expected Results: Invisible characters (%0B,%0C,%1C,%1D,%1E and %1F) MUST NOT decoded on location bar. According to result of "https://wiki.mozilla.org/?a=%00,%01,%02,%03,%04,%05,%06,%07,%08,%09,%0A,%0B,%0C,%0D,%0E,%0F,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%1A,%1B,%1C,%1D,%1E,%1F,%7F,", Other controll characters that between %00 and %7F does not have same problem. This issue can be used to address bar spoofing.
This issue may depends on font configuration. I tested using "MS UI Gothic" on Windows XP / Japanese.
Confirmed, new in Firefox 3. The underlying core URI service doesn't do this, is there an extra unescaping going on in the addressbar frontend?
There is. See http://hg.mozilla.org/mozilla-central/annotate/9d40cd95d9c9/browser/base/content/browser.js#l1973 . We avoid decoding \r \n and \t, but I guess we need to expand that to cover other types of whitespace.
I wonder if decodeURI shouldn't decode some of these characters in the first place...
Created attachment 336248 [details] [review] patch rev.1.0 for Trunk Whom should I request to review this patch?.
Where does that list of control characters come from?
(In reply to comment #6) > Where does that list of control characters come from? They come from result of https://wiki.mozilla.org/?a=%00,%01,%02,%03,%04,%05,%06,%07,%08,%09,%0A,%0B,%0C,%0D,%0E,%0F,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%1A,%1B,%1C,%1D,%1E,%1F,%7F, But, it may be needed to encode other Unicode control characters (e.g. BOM, soft-hypehn).
Comment on attachment 336248 [details] [review] patch rev.1.0 for Trunk It is needed to fix list of invisible characters. This patch contains ONLY UI-ASCII defined cntrol characters. So, other Unicode control charcters (e.g. BOM) is missing.
Masahiro: Assigning to you because you appear to be working on this, but if you're not please let us know.
Created attachment 338005 [details] testcase more testcase. This tescase contains 1.ASCII control code between \x00 and \x1f, and \x7f 2.Invisible Unicode Characters (http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf table4-10) Section 15.5: '\u2062' (INVISIBLE TIMES), '\u2063' (INVISIBLE SEPARATOR), Section 16.2: '\xad' (SOFT HYPEN), '\u200B' (ZERO WIDTH SPACE),'\u2060' (WORD JOINER) Section 16.8: '\ufffc' (OBJECT REPLACEMENT CHARACTER) Section 16.8: '\ufeff' (BOM) 3.Noncharacters (http://www.unicode.org/versions/Unicode5.0.0/ch16.pdf section 16.7) '\uffff'.'\ufffe' 4.Unassigned Characters (http://www.unicode.org/versions/Unicode5.0.0/ch16.pdf section 16.8) '\ufff0','\ufff1','\ufff2','\ufff3','\ufff4', '\ufff5','\ufff6','\ufff7','\ufff8' 5.Line and paragraph separator '\u2028','\u2029' -------------------------------------------
Created attachment 338122 [details] [review] Patch rev.2 fro Trunk According to result of testcase, these characters requires to be encoded. UASCII controll codes (U+000B,U+001C,U+001D,U+001E,U+001F), Soft-hyphen (U+00AD), Zero-width-space (U+200b), BOM (U+feff), Line separator and paragraph separator (U+2028,U+2029) is needed to be encoded on location bar.
Created attachment 338123 [details] [review] Patch rev.2.01 for Trunk Removed difference caused by broken character. Code is same as Rev.2.
I wonder why the existing whitespace replacement is inside the if - don't we want to replace whitespace either way?
I guess the idea is that aURI.spec will never contain unescaped whitespace.
Are we sure that's true for all of the characters you're adding here? Probably safer to split it out either way.
Comment on attachment 338123 [details] [review] Patch rev.2.01 for Trunk r=me, but please move the whitespace .replace() outside of the if like the others, and add a newline between the catch() and the comment you're adding.
Created attachment 339645 [details] [review] Patch rev.2.02 fixed patch
good work yamada!
Comment on attachment 339645 [details] [review] Patch rev.2.02 Just some whitespace nits: please add a newline before these lines you're adding, and a space after "BOM,".
and remove a space in front of encodeURIComponent ;)
Created attachment 339774 [details] [review] Patch rev 2.03
Comment on attachment 339774 [details] [review] Patch rev 2.03 No need to re-request review for such a minor change :)
Why not merge the new line with the existing one (below) ? As Neil suggested in bug 425480 comment 35, some character ranges can be used. In bug 425480 comment 39, he suggested other characters, but, reading this bug, I think it's not needed !?
Any update here? Does this just need landing on mozilla-central to be considered fixed?
"checkin-needed": What about (my) comment 24 ?
There's nothing really wrong with the current patch; it basically fixes this bug and should land as soon as possible, because it's blocking 1.9.0.4. If anyone attaches a better patch and gets it reviewed in time, I guess it wouldn't be rejected.
(In reply to comment #27) > The existing line is always needed, the new one depends on decodeURI (which > should probably be mentioned in a comment). If you could give me more details about this (comment), as the situation is not fully clear to me, I would probably update the patch.
see comment #2: > Confirmed, new in Firefox 3. The underlying core URI service doesn't do this, > is there an extra unescaping going on in the addressbar frontend?
(In reply to comment #31) > see comment #2: > > Confirmed, new in Firefox 3. The underlying core URI service doesn't do this, > > is there an extra unescaping going on in the addressbar frontend? Doesn't help me much :-(
Current patch does not encode control characters of other than U+0000 to U+FFFF. Some control character may be missing. But I don't have Windows Vista (or other OS that can display surrogate pair). I'll make additinal testcase for control characters that has special property. And, please test it.
Created attachment 341759 [details] new testcase
Created attachment 341760 [details] Screenshot of attachment 341759 [details] on Windows Vista
Created attachment 341897 [details] Screenshot of attachment 341759 [details] on ubuntu
(In reply to comment #36) > Created an attachment (id=341897) [details] > Screenshot of attachment 341759 [details] on ubuntu (In reply to comment #35) > Created an attachment (id=341760) [details] > Screenshot of attachment 341759 [details] on Windows Vista Sorry,My description was bad. Results I wanted is how Location bar shows URL after clicking link in HTML, not how Gecko renders HTML.
(In reply to comment #37) > (In reply to comment #36) > > Created an attachment (id=341897) > > Screenshot of attachment 341759 [details] on ubuntu > > (In reply to comment #35) > > Created an attachment (id=341760) > > Screenshot of attachment 341759 [details] on Windows Vista > > Sorry,My description was bad. > Results I wanted is how Location bar shows URL after clicking link in HTML, > not how Gecko renders HTML. Well, at least you can see that there aren't the same characters visible in the content area, which would be true for the location bar as well. Anyway, is there a reason not to encode control characters even if they are rendered consistently as hexboxes?
Wouldn't it be wanted to make a "mochitest" out of this testcase rather !?
Are there some missing characters that is needed to encode in my patch ?
Comment on attachment 339774 [details] [review] Patch rev 2.03 Can we land this patch?
Comment on attachment 339774 [details] [review] Patch rev 2.03 Sorry, it is too late to 1.9.0.4. Andm, this patch can be appleied to 1.8.1 branch. But 1.8.0 branch is not needed to fix this bug (1.8.0 branch does not decode Unicode characters on location bar).
This should land on trunk first. Can someone please land this on trunk?
(In reply to comment #39) > Wouldn't it be wanted to make a "mochitest" out of this testcase rather !? What about this ? Don't we want, or can't we have, some kind of automated test ?
(In reply to comment #44) > What about this ? Don't we want, or can't we have, some kind of automated test? It'd be nice, but we do have a manual testcase and I think we need new test scaffolding to check the URL bar. Please land this on the trunk and file a separate bug on the testcases.
Comment on attachment 339774 [details] [review] Patch rev 2.03 approved for 1.8.1.19 and 1.9.0.5, a=dveditz for release-drivers
Due to trunk freeze we'll take this safe fix on the branches before a trunk landing if necessary. Not sure how long the trunk will be closed or how likely this would be to get approved.
(In reply to comment #42) > Andm, this patch can be appleied to 1.8.1 branch. I don't see how; Firefox 2 doesn't decode URLs, and the function you're patching doesn't even exist there.
(In reply to comment #48) > (In reply to comment #42) > > Andm, this patch can be appleied to 1.8.1 branch. > > I don't see how; Firefox 2 doesn't decode URLs, and the function you're > patching doesn't even exist there. It's right. This patch is needed for only 1.9.0 brach or later and trunk. Sorry for spam.
Comment on attachment 339774 [details] [review] Patch rev 2.03 approval 1.8.1.19+ is not needed.
Comment on attachment 339774 [details] [review] Patch rev 2.03 a=beltzner
landed to trunk. http://hg.mozilla.org/mozilla-central/rev/4a4e856855e6
Backed this out in an attempt to fix bustage (not directly caused by this, but testing a theory).
Committed again, wasn't the problem.
Can we land this patch to 1.9.0Brach before Fx3.0.5 Code Freeze?
landed to cvs trunk, if there are some problems, please backout it.
I don't see the following characters on Linux: u+2060 u+2062 u+2063 u+fffc
Created attachment 348570 [details] [review] Patch Rev.2.04 Added to replace pattern : U+2060,U+2062,U+2063,U+fffc
backed out from branch.
Comment on attachment 348570 [details] [review] Patch Rev.2.04 fwiw r=dveditz, leaving request for gavin since he reviewed the original and I'm not a browser peer. Approved for 1.9.0.5, a=dveditz for release-drivers
It's not obvious to me which characters are affected by decodeURI and which aren't, so it seems like the safe choice here is to add any additional explicit encoding outside of the if statement (as with the original patch). Am I missing some reason to be confident that the encoding is only needed when we're doing the decoding?
Comment on attachment 348570 [details] [review] Patch Rev.2.04 Let's wait until after beta 2 for this. Gavin, can you nominate for approval1.9.1 once reviewed?
Comment on attachment 348570 [details] [review] Patch Rev.2.04 We're closing up for beta2 and this hasn't landed yet, so we'll hold off until after we branch.
Masahiro, any update to this bug about comment 61?
(In reply to comment #64) > Masahiro, any update to this bug about comment 61? U+FFF0 to U+FFF8 may be better to encode because they are defined as UNASSIGNED. So, at future, it may be assigned to special character that can spoof URL. And, If browser.js cat detect current OS supports surrogate pair or not, It may be better to encode surrogate pair when OS does not supports it. Should we encode them?
I wasn't questioning the chosen set of characters, I'm just pointing out that without evidence that the characters can only appear because of the decodeURI call, we should *always* call encodeURIComponent, regardless of whether we've called decodeURI. The first patch did this, but the latest patch adds one of the replace calls inside of the if() branch where we call decodeURI.
Comment on attachment 348570 [details] [review] Patch Rev.2.04 (In reply to comment #66) > I wasn't questioning the chosen set of characters, I'm just pointing out that > without evidence that the characters can only appear because of the decodeURI > call, we should *always* call encodeURIComponent, regardless of whether we've > called decodeURI. The first patch did this, but the latest patch adds one of > the replace calls inside of the if() branch where we call decodeURI. Oops. I'll write new patch.
Created attachment 349539 [details] [review] Patch Rev.2.05
Moving blocking out to 1.9.0.6. This missed 1.9.0.5.
Comment on attachment 349539 [details] [review] Patch Rev.2.05 >diff --git a/browser/base/content/browser.js b/browser/base/content/browser.js > } catch (e) {} >+ // Encode invisible characters (soft hyphen,zero-width space, BOM, nit: newline after the catch (e) {} nit: add a space after "soft hyphen," >+ // line and paragraph separator, word joiner, invisible times, >+ // invisivle separator, object replacement character) (bug 452979) "invisivle" -> "invisible" Can we get a bug on file for comment 27?
Does this need to be checked in by someone other than Masahiro?
I tested both of patch Rev.2.04 and 2.05, but I cannot find difference of behavior them. Are ther any change of implementation about handling controll characters in URL?
No, 2.05 is fine with the changes from comment 70 addressed. Shawn was just asking if you needed someone to check this in for you.
one of the patches here was already checked in to trunk. does trunk need to be updated to the 2.05 patch?
Masahiro: Can you please attach a patch with the changes requested in comment 70 so we can get this checked in all around?
Created attachment 355529 [details] [review] v2.06 for branch checkin
Created attachment 355530 [details] [review] v2.06 for trunk checkin
attached patches for branch and trunk, w/ comment 70 issues fixed. can land whenever the tree is clean.
on trunk: http://hg.mozilla.org/mozilla-central/rev/4af786d20e41
on branch: http://hg.mozilla.org/releases/mozilla-1.9.1/rev/b8dca10fdb01
Do we need a new patch for 1.9.0 or can we safely merge the ones here?
Created attachment 356391 [details] [review] 1.9.0 patch
Comment on attachment 356391 [details] [review] 1.9.0 patch Approved for 1.9.0.7, a=dveditz for release-drivers.
Checked in to 1.9.0 branch.
Verified for 1.9.0.7 with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.7pre) Gecko/2009021704 GranParadiso/3.0.7pre. Verified for 1.9.1. with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20090213 Shiretoko/3.1b3pre.
Verified for trunk with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090217 Minefield/3.2a1pre.