Re: [whatwg] [URL] Starting work on a URL spec

<AANLkTi=88AtQTJroZUuC5ihX5jqOuj5RL4nop7Cm5eSr@mail.gmail.com>



http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization
lists some interesting cases we've come across on the anti-phishing team at
Google. To the extent you're interested in canonicalization, it may be worth
taking a look at (not to suggest you follow that in determining how to
parse/canonicalize URLs, but rather to make sure that you have some
"correct" way of handling the listed URLs).
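
To make the kind of case concrete, here is a minimal sketch of a few
canonicalization steps of the sort such a list exercises: repeated
percent-unescaping, host lowercasing, dot-segment removal, and dropping the
fragment. This is an illustration only, not the Safe Browsing algorithm and
not anything a URL spec mandates; the function names (unescape_fully,
remove_dot_segments, canonicalize) and the choice of steps are assumptions
made for the example.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def unescape_fully(text, max_rounds=10):
    """Percent-decode repeatedly until the string stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return text


def remove_dot_segments(path):
    """Resolve '.' and '..' segments and collapse repeated slashes."""
    kept = []
    for segment in path.split("/"):
        if segment == "..":
            if kept:
                kept.pop()
        elif segment not in (".", ""):
            kept.append(segment)
    result = "/" + "/".join(kept)
    # Preserve a trailing slash if the input had one.
    if path.endswith("/") and result != "/":
        result += "/"
    return result


def canonicalize(url):
    """Hypothetical canonical form: illustrative steps only."""
    parts = urlsplit(url.strip())
    host = unescape_fully(parts.hostname or "").lower().strip(".")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Unescape first so that encoded dot segments are also resolved.
    path = remove_dot_segments(unescape_fully(parts.path))
    # Re-escape once, so every input spelling maps to a single output.
    path = quote(path, safe="/")
    # The fragment is dropped; the query is passed through unchanged.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


print(canonicalize("HTTP://WWW.Example.COM/a/./b/../c%2541#frag"))
```

The point of the sketch is the ordering question: whether unescaping happens
before or after path normalization changes the result for inputs such as
"%2e%2e", which is exactly the sort of thing a parsing spec needs to pin
down.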

BTW, are you covering canonicalization?

-Ian

On Fri, Jul 23, 2010 at 9:02 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:

> On 7/23/10 11:59 PM, Silvia Pfeiffer wrote:
>
>> Is that URLs as values of attributes in HTML or is that URLs as pasted
>> into the address bar? I believe their processing differs...
>>
>
> It certainly does in Firefox (the latter have a lot more fixup done to
> them, and there are also differences in terms of how character encodings are
> handled).
>
> I would be particularly interested in data on this last, across different
> browsers, operating systems, and locales...  There seem to be servers out
> there expecting their URIs in UTF-8 and others expecting them in ISO-8859-1,
> and it's not clear to me how to make things work with them all.
>
> -Boris
>
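
The UTF-8 versus ISO-8859-1 problem Boris describes can be seen with a
single non-ASCII character. The sketch below uses Python's urllib.parse
purely as an illustration; the sample path is made up, and it does not model
what any particular browser or server does.

```python
from urllib.parse import quote, unquote

path = "/caf\u00e9"  # "/café", a made-up example path

# The same character percent-encodes differently under each charset.
as_utf8 = quote(path, encoding="utf-8")         # '/caf%C3%A9'
as_latin1 = quote(path, encoding="iso-8859-1")  # '/caf%E9'

# A server that guesses the wrong charset recovers the wrong text.
utf8_read_as_latin1 = unquote(as_utf8, encoding="iso-8859-1")  # '/cafÃ©'
latin1_read_as_utf8 = unquote(as_latin1, encoding="utf-8")     # '/caf\ufffd'

print(as_utf8, as_latin1)
print(utf8_read_as_latin1, latin1_read_as_utf8)
```

Nothing in the escaped URL says which encoding produced it, which is why a
single client behaviour cannot satisfy both kinds of server.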
