- Re: [whatwg] pushState
So to back up. There are a few things that are important to me:
1. The session history (SH) entries created in this way should act
as much like normal SH entries as possible to the user.
I.e. the user doesn't care if clicking what looks like a link
results in a navigation or a .pushState call.
2. That this API allows pages to be written that survive a browser
restart.
This is to prevent data loss in case of crash, and to increase
chances that users install security updates which require restart
as soon as possible.
3. That the API encourages bug-free pages. I.e. pages that use the API
'correctly'.
Point 1 by itself means that we can't kill SH entries just because the
Document they were created from is evicted from the fastback cache. If I
browse away from gmail and see 5 pages in my SH list then I would be very
disappointed if those 5 pages just disappeared because I happened to be
browsing around in another window which ate up the fastback cache.
Similarly, a mobile device might not have a fastback cache at all. It
would look very strange if, while I'm staying on the gmail site, I'm able
to go 'back' to what to me looks like various different pages, but as
soon as I go to some other site, I lose all those pages and can only go
back directly to the first gmail page.
So we must definitely store the pushed SH data in such a way that if the
Document is recreated the data survives.
This applies whether the URL of my browser changes or not while I'm
doing this browsing. The rendering changing is more important than the
URL changing for a user.
So what I think we should do is to enforce that 'data' is a
JSON-serializable object. When a Document is destroyed (due to eviction
from the fastback cache or due to not being cacheable) we don't touch the
SH entries associated with that document.
When entering a SH state for which a Document has been destroyed, we
load the URL associated with that SH entry. After the 'load' event for
the Document has fired, and if a data object was provided in the
pushState call for the SH entry, we fire a PopStateEvent event
containing the data stored for the object.
The same thing happens if the user clicks the reload button while on an
SH entry created using pushState. The URL for that entry is loaded and
after the 'load' event has fired, if a data object was provided during
the pushState call, a PopStateEvent is fired with that data.
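As an illustration of the recovery behaviour proposed above, here is a
minimal TypeScript sketch. It is a toy model, not browser code:
HistoryEntryModel, SessionHistoryModel, destroyDocuments and enter are
hypothetical names, and the returned strings merely list the steps the
proposal describes.

```typescript
// Toy model of the proposal. All names here are hypothetical, not a real API.
interface HistoryEntryModel {
  url: string;
  data: string | null;    // stored as JSON text so it outlives the Document
  documentAlive: boolean; // false once the Document left the fastback cache
}

class SessionHistoryModel {
  private entries: HistoryEntryModel[] = [];
  private index = -1;

  // Models pushState(data, url): the data must survive a JSON round trip.
  pushState(data: unknown, url: string): void {
    const text: string | undefined = JSON.stringify(data);
    const json = text === undefined ? null : text;
    this.entries = this.entries.slice(0, this.index + 1); // drop forward entries
    this.entries.push({ url, data: json, documentAlive: true });
    this.index = this.entries.length - 1;
  }

  // Models Document destruction (cache eviction, restart): entries are kept.
  destroyDocuments(): void {
    for (const entry of this.entries) entry.documentAlive = false;
  }

  // Models entering an entry; returns the steps taken, in order.
  enter(index: number): string[] {
    const entry = this.entries[index];
    if (!entry) throw new RangeError("no such session history entry");
    this.index = index;
    const steps: string[] = [];
    if (!entry.documentAlive) {
      steps.push(`load ${entry.url}`);
      steps.push("fire load");
      entry.documentAlive = true;
    }
    if (entry.data !== null) steps.push(`fire popstate ${entry.data}`);
    return steps;
  }
}
```

In this model a reload is simply destroyDocuments() followed by enter()
on the current entry, which gives the same sequence: load the URL, fire
'load', then fire the pop-state notification with the stored data.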
To minimize the difference between a SH entry being recovered from the
fastback cache and the document being reparsed, the Location object
should be changed to reflect the new URL whenever pushState is called
with a url. The advantage of changing the Location object is that the
page behaves the same regardless of whether it has been reloaded.
Setting the Location object's value will cause a reload, a scroll, or
nothing in the same cases. Similarly, reading the Location will return
the same thing whether or not there has been a reload.
Yes, this is different from how legacy browsers behave. However the
whole point of this API is to improve on the current iframe hacks. If we
didn't there would be no point in adding a new API as it wouldn't be
worth the code fork for users.
Reloads mostly don't work with the iframe hack anyway so you'll end up
with vastly different behavior no matter what. And if we're not
considering the reload case then the hashchanged event should be enough.
This isn't a big deal though, as far as I can tell. Only if your
application runs inside someone else's iframe and that outer app is
intimately interacting with you can I see that it makes a difference.
This doesn't seem common enough that we should prioritize it.
/ Jonas
- Re: [whatwg] pushState
Jonas Sicking wrote:
> To minimize the difference between when a SH entry is recovered from the
> fastback cache, compared to when the document is reparsed, the Location
> object should be changed to reflect the new URL whenever pushState is
> called with a url. The advantage of changing the Location object is that
> this makes the page behave the same no matter of if it has been reloaded
> or not. Setting the Location objects value will cause reloading vs.
> scrolling vs. do nothing in the same cases. Similarly reading the
> Location will return the same thing no matter of if there has been a
> reload or not.
>
>
> Yes, this is different from how legacy browsers behave. However the
> whole point of this API is to improve on the current iframe hacks. If we
> didn't there would be no point in adding a new API as it wouldn't be
> worth the code fork for users.
>
> Reloads mostly don't work with the iframe hack anyway so you'll end up
> with vastly different behavior no matter what. And if we're not
> considering the reload case then the hashchanged event should be enough.
>
> This isn't a big deal though as far as I can think of. Only if your
> application runs inside someone elses iframe and that outer app is
> intimately interacting with you I can see that it makes a difference.
> This doesn't seem common enough that we should prioritize for it.
Additionally, I just noticed that gmail does set the location.hash of
the top window. So gmail would want to change the Location object for
compat with legacy browsers.
And a library that used a hidden iframe for legacy browsers and really
wanted to be compatible as far as the Location object goes (though I'm
still unconvinced that anyone cares) could use
iframe.contentWindow.history.pushState();
/ Jonas
- Re: [whatwg] pushState
On Sun, 03 Aug 2008 22:47:24 +0200, Jonas Sicking <jonas@sicking.cc> wrote:
> Personally I think keeping the URL is fine. We can never entirely
> prevent pages from having bugs. But instead encourage the safe
> transitions, and always use safe-looking transitions in examples in the
> spec.
FWIW, I think the URL argument is the best part of this feature. I don't
want to lose it!
- fragid navigation and pct-encoded
Hi,
Apparently there are some differences between browsers in the handling of
percent escaped characters in fragment identifiers. I made a few tests to
figure out the different behavior:
http://tc.labs.opera.com/html/navigation/fragids/
I was able to test in Opera 9.5, Firefox 3.0, and Internet Explorer 6.0.
Results:
IE does not handle pct-encoded in the fragment, which is in violation of
RFC 3986. It does nothing special with either the name or id attributes;
it performs simple literal matching.
Firefox does handle pct-encoded in the fragment. It also handles
pct-encoded in the name attribute. It effectively performs pct-encoded
handling on the fragment and name attributes and after that performs
literal matching. Thus a fragment of ? and a name attribute of %3F match
and vice versa. Likewise, a fragment of %253F does not match a name
attribute of %3F. The id attribute is not affected by pct-encoded
handling. So a fragment of ? does not match an id attribute of %3F.
Opera does handle pct-encoded in the fragment. It does not have special
handling of attributes. This is the behavior prescribed by HTML5 but it
breaks sites. E.g.,
http://www.readynas.com/forum/faq.php
The test suite assumes Firefox is correct as that seems the most
"sensible" behavior if you want to be compliant with RFC 3986 and
compatible with the Web. I suggest we change HTML5 to perform pct-encoded
handling for name attributes. I have not checked whether this affects the
usemap attribute.
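To make the three reported behaviours easier to compare, here is a rough
TypeScript sketch. It only restates the observations above as predicates;
the type Anchor and the function names are made up for this illustration,
and decodeURIComponent stands in for whatever pct-decoding each browser
really performs.

```typescript
// Hypothetical stand-in for an element that can be a fragment target.
type Anchor = { name?: string; id?: string };

// Percent-decode, returning the input unchanged if it is malformed.
function pctDecode(s: string): string {
  try {
    return decodeURIComponent(s);
  } catch {
    return s;
  }
}

// IE as reported: plain literal comparison, nothing is decoded.
function matchesLiteral(fragment: string, a: Anchor): boolean {
  return fragment === a.name || fragment === a.id;
}

// Firefox as reported: decode fragment and name, leave id untouched.
function matchesFirefoxLike(fragment: string, a: Anchor): boolean {
  const f = pctDecode(fragment);
  return (a.name !== undefined && f === pctDecode(a.name)) || f === a.id;
}

// Opera as reported (and the HTML5 draft): decode the fragment only.
function matchesOperaLike(fragment: string, a: Anchor): boolean {
  const f = pctDecode(fragment);
  return f === a.name || f === a.id;
}
```

With these definitions, a fragment of ? and a name of %3F match only
under the Firefox-like rule, which is the case the suggested HTML5 change
would cover.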
Kind regards,
- Re: fragid navigation and pct-encoded
On Thu, Sep 4, 2008 at 3:05 PM, Anne van Kesteren wrote:
>
> Apparently there are some differences between browsers in the handling of
> percent escaped characters in fragment identifiers. I made a few tests to
> figure out the different behavior:
>
> http://tc.labs.opera.com/html/navigation/fragids/
>
> I was able to test in Opera 9.5, Firefox 3.0, and Internet Explorer 6.0.
Raw results:
In Chrome and Safari 3.1.2 on Windows (same results, unsurprisingly ;-) ):
Test 3 says A (as HTML5)
Test 4 says B
Test 8 says B
In IE7:
Test 2 says A
Test 3 says A
Test 6 says A
Test 8 says B
In Firefox 2.0.0.16:
Same as the "expected results" column.
> Results:
>
> IE does not handle pct-encoded in fragment which is in violation of RFC
> 3986. It does nothing special with either the name or id attributes; simple
> literal matching.
IE7 seems to align with IE6 here, if I analyze the results correctly.
I'll let you do the analysis of the WebKit results, as I haven't
dug into the internals of the tests.
BTW, have you tried with links within the same document? There
shouldn't be any difference IMO but just in case...
- Re: fragid navigation and pct-encoded
On Thu, 04 Sep 2008 15:55:17 +0200, Thomas Broyer <t.broyer@gmail.com>
wrote:
> On Thu, Sep 4, 2008 at 3:05 PM, Anne van Kesteren wrote:
>> Apparently there are some differences between browsers in the handling
>> of percent escaped characters in fragment identifiers. I made a few
>> tests to figure out the different behavior:
>>
>> http://tc.labs.opera.com/html/navigation/fragids/
>
> Raw results:
>
> In Chrome and Safari 3.1.2 on Windows (same results, unsurprisingly ;-)
> ):
> Test 3 says A (as HTML5)
> Test 4 says B
> Test 8 says B
Hmm, so WebKit has a fourth algorithm? That is, first try whether the
fragment works without handling pct-encoded and, if that fails, try again
with pct-encoded handling applied to the fragment.
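If that guess is right, the lookup would be something like the following
TypeScript sketch. This is only a reading of the test results, not
WebKit's actual code; findTargetTwoPass and safeDecode are hypothetical
names, and the ids array stands in for the names and ids in the document.

```typescript
// Percent-decode, returning the input unchanged if it is malformed.
function safeDecode(s: string): string {
  try {
    return decodeURIComponent(s);
  } catch {
    return s;
  }
}

// Pass 1: literal match. Pass 2: match after decoding the fragment.
function findTargetTwoPass(fragment: string, ids: string[]): string | null {
  if (ids.indexOf(fragment) !== -1) return fragment;
  const decoded = safeDecode(fragment);
  return ids.indexOf(decoded) !== -1 ? decoded : null;
}
```

The literal pass wins when both forms exist in the document, so a
fragment of %3F would pick a target literally named %3F over one named ?.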
> BTW, have you tried with links within the same document? There
> shouldn't be any difference IMO but just in case...
I tested that a little bit; it does not matter as far as I can tell.
- RE: [whatwg] hashchange only dispatched in history traversal
I think the main problem here is that we want to package two different
functions into one event. The legitimate use of a hash change, where it
is used to reveal a bookmark, should trigger an event named "reveal",
dispatched to the target anchor and bubbling, where the handler for the
Window object can be specified as BODY[onreveal]. The AJAX abuse, where
it is used to change the context of the page (which can be detected when
no corresponding target anchor can be found), should trigger the hash
change notification, broadcast to every object of the active document
that registers for it by declaring a handler. Does this make sense?
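The split could be pictured with the small TypeScript sketch below. It is
only an assumption about how the decision might be made: the "reveal"
event exists only in this proposal, and FragmentEvent and
classifyFragmentChange are names invented for the example.

```typescript
// Which notification the proposal would send for a fragment change.
type FragmentEvent =
  | { event: "reveal"; targetId: string } // a matching anchor exists
  | { event: "hashchange" };              // no anchor: application state

function classifyFragmentChange(fragment: string, anchorIds: string[]): FragmentEvent {
  return anchorIds.indexOf(fragment) !== -1
    ? { event: "reveal", targetId: fragment }
    : { event: "hashchange" };
}
```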
Chris
- Re: [whatwg] hashchange only dispatched in history traversal
On Tue, Sep 9, 2008 at 12:48 PM, Garrett Smith <dhtmlkitchen@gmail.com> wrote:
> On Thu, Aug 21, 2008 at 4:09 PM, Ian Hickson <ian@hixie.ch> wrote:
>> On Thu, 21 Aug 2008, Garrett Smith wrote:
>>>
>
> and what if you have:
>
> <body onhashchange="alert(document.body.ohashchange);">
>
............................................................................^
Should be onhashchange,
alert(document.body.onhashchange);
>
> Garrett
>
- Re: [whatwg] Citing multiple <blockquote> elements in HTML5
Ian Hickson wrote:
> On Wed, 3 Dec 2008, Calogero Alex Baldacchino wrote:
>
>> But, isn't it worth to spend a word everywhere in the spec to tell when
>> it's a quirck for backward compatibility, which might go away in the
>> future, and when it's not, because that's not needed?
>>
>
> None of the implementation requirements in HTML5 will go away in the
> future. We will always have to define how implementation are to handle all
> inputs, today, tomorrow, and 100 years from now. Authors aren't going to
> stop writing invalid documents, unfortunately; and even if they did, the
> documents that exist today aren't going anywhere. (One of the goals of the
> HTML5 project is to document how someone in 2100 AD, or even 21000 AD,
> should handle Web pages of today, so that today's heritage isn't lost.)
>
>
>
Ok, and agreed. Due to the nature of the web (and of web authors'
practices), a strict conformance requirement (such as it might be for a
C compiler) will never be a good idea.
>> I mean, if you allow spacing characters inside an id value, as a parsing rule,
>> you can face something like '<div id="foo bar" >', that is an id consisting of
>> more than one token. Is it good to leave it in untouched? Yes? Ok, but what
>> does it mean for CSS's, since there is a reference to them as one reason to
>> allow space characters? That is, can a browser handle an id selector starting
>> with the '#' character and being broken by a blank space?
>>
>
> Sure:
>
> #foo\ bar { ... }
>
> ...would match an element with id="foo bar".
>
>
>
Right, now I remember... sorry for my mess...
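For reference, the escaping shown in the quoted selector can be produced
mechanically. The TypeScript sketch below is deliberately narrow: it
handles only backslashes and U+0020 spaces, not the other characters a
complete CSS identifier escaper has to deal with, and idSelector is a
hypothetical helper name.

```typescript
// Build an id selector for an id that may contain plain spaces.
// Simplified: only backslash and U+0020 are escaped here.
function idSelector(id: string): string {
  return "#" + id.replace(/\\/g, "\\\\").replace(/ /g, "\\ ");
}

// For id="foo bar" this yields the selector text: #foo\ bar
const fooBarSelector = idSelector("foo bar");
```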
>> Now, let's say, instead, that a user agent, conforming with HTML 5
>> specifications, must cut off any token after the first one (I know
>> actually "foo bar" is taken as is), that is <div id="foo bar"> becomes
>> <div id="foo "> and <div id=" foo "> is valid too. In such a case,
>> skipping any spaces too, and stating the same behaviour for strings
>> passed to .getElementById() could be nice as a graceful degradation for
>> documents non-conforming with the rule "the value [of an id attribute]
>> must not contain any space characters", but such might fail with CSS
>> selectors such as 'div[id="foo bar"]'.
>>
>
> I don't follow you there. What problem are you trying to solve?
>
>
Just trying to explain why I was suggesting such a behaviour (=
stripping space characters) in my first message about that. I was
wrongly ignoring the case of id="foo bar" and was only concerned with
id=" foo ", but I was not confusing authoring and parsing rules (even if
I admit I sometimes have strict conformance in mind). If the latter were
the only "naughty boy" out there, perhaps stripping spaces might have
made some sense (though it would not be the best choice without touching
other things that may be out of scope).
>
>> Perhaps a compromise, if acceptable for backward compatibility, might be:
>> - when the id value must be compared to a fragment identifier, strip any
>> trailing space characters; if the match fails, escape any other space
>> characters both in the id value and in the fragid and try again;
>>
>
> Why not just do what we do now, and treat the attribute as-is?
>
>
>
>> - when an attribute is defined to hold an url and its value has spaces in its
>> path/query/fragment, escape them before resolving the url (not sure if
>> needed);
>>
>
> Again, aren't the current rules for handling URLs as defined in HTML5
> enough?
>
>
>
Maybe the first is wrong, and I'm still unsure of the second. My concern
is that a character-by-character comparison between an id value and a
fragment identifier may fail in several ways. What about
href="#foo bar " and id="foo bar "? The current rules would strip the
trailing space only for the href, so the match would fail (but we might
survive broken links). Escaping both and then comparing would succeed, as
would first escaping and then unescaping the href value before comparing.
(Should it be pointed out somewhere that a fragment identifier must be
unescaped before comparing it to an id or a name? Is it there and I've
missed it? Having space characters in the unreserved production means
they don't need to be escaped, but does it also mean they must be decoded
from their pct-encoded form after parsing and for resolving?)
Likewise, stripping the trailing spaces in both cases would succeed, but
would fail when comparing id="foo bar " with href="#foo bar%20" (which is
a valid URL according to the current parsing rules), even with escaping
rules (in this case the trailing space in the id value must stay there).
And what about id="foo%20bar" in http://foo.example.org/foo.html and
href="#foo bar" on the same page, or on a page having the same base URL,
or a base element with href="http://foo.example.org/foo.html"?
My point is that, since comparisons for matching purposes happen after
URL parsing and resolution, and the id value is not involved in those
steps, character-by-character comparisons may fail without a prior
normalization of both the fragment identifier and the id value (or one of
them). However, if the above is already solved by the parsing and
resolving rules and I've misunderstood the spec, I withdraw it all and
apologize. Or, perhaps, must a valid URL with a valid fragment, which is
equivalent to but not exactly matching an id value, be considered a
broken link?
>
>
>> Anyway, if the id value is also a fragment identifier, which might have
>> space characters (since parsing rules prescribe to add such characters
>> to the unreserved production), does the (authoring) rule "the value must
>> not contain any space characters" make sense?
>>
>
> Sure, why wouldn't it make sense? If IDs have spaces in them, you can't
> refer to them from space-separated lists of IDs, so to avoid authoring
> problems, authors will want to be told when they acidentally use spaces.
>
>
>
I'll try to make that point a bit clearer, since the reference to URL
parsing rules was wrong - the question is a different one.
It arises from the double nature of the id attribute as both an ID and a
fragment identifier: according to RFC 3986, unless I have misunderstood
something there, after dividing a URI into its components, pct-encoded
triplets may be safely decoded (and should be, to correctly interpret
each component). Thus "%20foo%20bar%20" and " foo bar " are equivalent
and both valid as conforming dereferenced <fragment-identifier>
components (while only the former is conforming as part of a complete
URI, since for RFC 3986 spaces are not 'unreserved'), but the latter is a
non-conforming ID according to the rule "an id value must not contain
any space characters", which is something of a restriction on
fragment-identifier conformance. As long as conforming user agents leave
it as is, that's not a concern; still, formally, is it something to be
solved or pointed out somehow in the spec? When a validator or an
authoring tool finds something like,
<!-- The following section is a review of Los Angeles inside an article
about California
- just to create a context for the example -->
<section id="Los Angeles" >
...
</section>
should it only report the id value as invalid, or should it also say that
it's a valid fragment identifier if the author is using the id as an
anchor?
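To make the question concrete, a checker could report both facts side by
side, as in this TypeScript sketch. It is only an illustration of the two
views of the same value: IdReport and reportOnId are hypothetical names,
the space characters are assumed to be space, tab, LF, FF and CR, and
encodeURIComponent is used as a convenient (stricter than necessary)
%-encoder.

```typescript
// Two views of one attribute value: as an ID and as a fragment identifier.
interface IdReport {
  conformingId: boolean; // false if empty or containing a space character
  fragment: string;      // %-encoded form a URL could use to point at it
}

// Assumed set of space characters: space, tab, LF, FF, CR.
const SPACE_CHARACTERS = /[ \t\n\f\r]/;

function reportOnId(id: string): IdReport {
  return {
    conformingId: id.length > 0 && !SPACE_CHARACTERS.test(id),
    fragment: "#" + encodeURIComponent(id),
  };
}

// The example from the message: a non-conforming ID that is still reachable.
const losAngeles = reportOnId("Los Angeles");
```

Here losAngeles.conformingId is false while losAngeles.fragment is
"#Los%20Angeles", which is exactly the tension described above.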
>
> What terminology would you prefer rather than "subtree"? (We can't say
> document, since we are also trying to define conformance rules for
> disconnected subtrees handled from scripts.)
>
>
>
Hmm, it may depend on what kinds of manipulation you have in mind, on
whether the disconnected subtree must anyway be a whole document to
fulfil the uniqueness rule, and perhaps also on what the subtree concept
might be turned into by future DOM Core versions. So maybe a
clarification of what a subtree is, with respect to both the document (as
a tree) and the possibilities for script handling, might be enough to
'scope' the id visibility, instead of searching for new terminology.
I mean, if ID matching is relevant for scripts accessing the matching
element through the getElementById() method, then currently a document
tree always coincides with the concept of subtree, and a disconnected
subtree must be a document without a browsing context. Otherwise, if
other DOM manipulations are involved, the concept of subtree may change:
for instance, a script might implement its own scanning routine, treating
an id attribute like any other attribute, which leads to the concept that
any non-leaf node may be the root of a subtree (that is, identifying a
subtree with any possible document fragment). Furthermore, a possible
future version of the DOM Core interfaces might move the getElementById
method to the Node interface, leading to the same result.
Thus, a generic definition of 'subtree' (or no definition, or a
definition relying upon a specific DOM feature or on script handling)
might result in a variable concept with a variable scope for ID
uniqueness, but it might make sense in a working draft, at least until a
first definition of the Web DOM Core specification, or until some reason
arises to restrict or enlarge the concept. Otherwise, if it has been
established with broad consensus that a subtree is always a document
tree, the term might be changed to the expression "a document, with or
without a browsing context", or (equivalently) be defined as "a document
subtree having a node of type document as its root" (to cover the case of
dynamically created documents). Otherwise, if a subtree can be either a
whole document or a document subtree detached from its owner document
(i.e. a node removed from a document with its descendants, or a tree of
nodes whose ownerDocument property is not defined or null), it might be
defined just as such, leaving the term 'subtree' wherever it is now (but
would such a manipulation be consistent with the - authoring -
uniqueness rule when the subtree is inserted into an actual document?).
> The getElementById() method will be defined more precisely than the vague
> wording in the DOM specs. I believe Simon Pieters is working on that.
>
>
>
I acknowledge this.
> CSS doesn't search for a single match for IDs, it just looks for whether
> an element matches the selector or not. So it doesn't care if there are
> duplicates. But anyway, CSS is out of scope for this mailing list.
>
>
I agree. I just wondered whether it might be a concern for consistent
manipulation through both the DOM and CSS, but I can't think of a
concrete example where such a concern might arise that isn't a side
effect of bad programming, which is out of scope for both CSS and the
DOM. I also acknowledge that it might be in the scope of Web DOM Core,
since it's been established that it's out of scope for the HTML-specific
DOM (which doesn't define any basic element properties and access
methods, but only HTML-specific ones; I find this consistent with the
choice to define some stand-alone interfaces instead of always
inheriting from the basic counterparts).
- [whatwg] URL parsing and same-document references [was: Re: [whatwg] Citing multiple <blockquote> elements in HTML5]
Calogero Alex Baldacchino wrote:
>
> Maybe the first is wrong, and I'm still unsure of the second. My
> concern is, a character-by-character comparison between an id value
> and a fragment identifier may fail several ways. What for href="#foo
> bar " and id="foo bar "? Actual rules would strip the trailing space
> only for the href, so the matching would fail (but we might survive
> broken links). Escaping both, then comparing would succed, as well as
> first escaping then unescaping the href value before comparing (should
> it be pointed out, somewhere, that a fragment identifier must be
> unescaped before comparing to an id or a name? is it and I've missed
> it? - having space characters in the unreserved production means thy
> don't need to be escaped, but does it mean also they must be decoded
> from their pct-production, after parsing and for resolving?). As well,
> stripping the trailing spaces in both cases would succed, but would
> fail when comparing id="foo bar " with href="#foo bar%20" (which is a
> valid url, according with actual parsing rules), even with escaping
> rules (in this case the id value trailing space must stay there). And
> what about id="foo%20bar" in http://foo.example.org/foo.html and
> href="#foo bar" on the same page, or on a page having the same base
> URL, or a base element with href="http://foo.example.org/foo.html" ?
> My point is, since comparisons for matching purpose happen after the
> URL parsing and resolution, and the id value is not involved in such
> steps, character-by-character comparisons may fail without a prior
> normalization of both th fragment-identifier an the id value (or one
> of them). However, if the above is yet solved with parsing and
> resolving rules and I've misunderstood the spec, I retire all and
> apologize. Or, perhaps, must a valid url with a valid fragment, which
> is equivalent but not exactly matching an id value, be considered as a
> broken link?
>
Maybe the above needs further clarification. Let me start from the URL
parsing (and resolving) rules: after the URL is validated, it's divided
into its components, but nothing is stated about normalization and/or
%-encoded characters. I think that applying some normalization may be
useful for parsing equivalent URLs in a consistent manner, helpful when
dealing with the interfaces for URL manipulation described in section
2.5.5, and, last but not least, an improvement in matching relative
references (especially same-document references). A minimum requirement,
for the sake of standardization, may consist of decoding any %-encoded
characters in the <fragment> production that are part of the
<unreserved> production, as defined in RFC 3986 with the changes defined
in the HTML 5 specification for URL parsing, restricted to the Unicode
ranges representing valid characters for an attribute value (those which
are prohibited neither as 'text' nor as 'character references').
This way, a character-for-character comparison between a fragment
identifier and an id attribute value, which would have been equivalent
but not matching without the normalization, should succeed most of the
time, because, as a consequence of the changes the current HTML 5
specification applies to the <unreserved> production, such characters
might or might not be %-encoded in a valid URL, while an id value is
likely to contain them non-encoded.
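A rough TypeScript sketch of such a minimal normalization follows. It
decodes only triplets whose character falls in the RFC 3986 <unreserved>
set plus a caller-supplied list of extra characters; that extra list
(defaulting to a space here) is a placeholder for the HTML 5 additions,
which this sketch does not try to reproduce. The names are hypothetical,
and only single-byte ASCII triplets are considered.

```typescript
// RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
const RFC3986_UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Decode %XX only when the decoded character is unreserved or listed in
// `extra` (an assumed stand-in for the HTML 5 changes to <unreserved>).
function decodeUnreserved(fragment: string, extra: string = " "): string {
  return fragment.replace(/%([0-9A-Fa-f]{2})/g, (triplet: string, hex: string) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    return RFC3986_UNRESERVED.test(ch) || extra.indexOf(ch) !== -1 ? ch : triplet;
  });
}

// After normalization, "foo%20bar" compares equal to an id of "foo bar".
const normalizedMatches = decodeUnreserved("foo%20bar") === "foo bar";
```

Triplets for other characters, such as %3F or %25, are left untouched, so
the normalization never turns a reserved delimiter into a literal one.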
After the above <fragment> normalization, a character-for-character
comparison would fail if the id value contained any %-encoded triplet
representing a decoded character, such as "foo%20bar". Such a value is
awkward to deal with anyway, since it can be the %-encoded form of
"foo bar", but also the decoded form of "foo%2520bar". In other words,
if we apply the same normalization to two complete URLs and then compare
them, the result is quite reliable, but if we start from a component
(such as a fragment identifier stored in an id attribute value) it's not
easy to tell whether any normalization has been applied and which one,
so there is always a chance of false positives or false negatives.
According to RFC 3986, section "4.4. Same-Document Reference", the
correct interpretation of a URI as a same-document reference cannot be
taken as guaranteed, thus the mismatch between, for instance, the
decoded fragment identifier "foo bar" and the id attribute value
"foo%20bar", set against (as I think) a wide majority of good matches,
can be reasonable. Anyway, a kind of double check might be considered,
such as:
- comparing the %-unescaped fragment identifier with the ID of each
element in the DOM;
- upon failure, applying a %-unescape algorithm to the ID, then
comparing again with the fragment identifier and, if matching, marking
the element as a 'possible choice';
- upon a perfect (exact) match, without unescaping the evaluated element
ID, choosing such element as the referenced document part (actually
defined as "the indicated part of the document" in the spec) and stopping;
- without any perfect match in the whole document, choosing the first
'possible choice', if any;
- without any match at all, the search for the referenced document part
fails.
Compared with a "single check" for an exact match, the overall
computation time should increase only linearly, so it should not be a
performance issue.
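The double check listed above could be sketched in TypeScript as follows.
This is one possible reading of the steps rather than a specified
algorithm: findIndicatedPart and unescapeOrKeep are hypothetical names,
the ids array stands in for the element IDs in document order, and
decodeURIComponent is assumed as the %-unescape algorithm.

```typescript
// %-unescape, returning the input unchanged if it is malformed.
function unescapeOrKeep(s: string): string {
  try {
    return decodeURIComponent(s);
  } catch {
    return s;
  }
}

// Returns the index of the indicated element in `ids`, or -1 if none.
function findIndicatedPart(fragment: string, ids: string[]): number {
  const wanted = unescapeOrKeep(fragment);
  // A perfect match on an untouched ID always wins.
  const exact = ids.indexOf(wanted);
  if (exact !== -1) return exact;
  // Otherwise fall back to the first 'possible choice': an ID that
  // matches only after being %-unescaped itself.
  return ids.map(unescapeOrKeep).indexOf(wanted);
}
```

Both passes walk the list once, so the cost stays linear in the number of
elements, in line with the estimate above.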
Best regards, Alex.
- Re: [whatwg] URL parsing and same-document references [was: Re: Citing multiple <blockquote> elements in HTML5]
Calogero Alex Baldacchino wrote:
> Maybe the above needs a further clarification. Let me start from URL
> parsing (and resolving) rules: after the URL is validated, it's
> divided into its components, but nothing is stated about normalization
> and/or %-encoded characters. I think that applying a somewhat
> normalization may be useful to parse equivalent URLs in a consistent
> manner, helpful when dealing with the interfaces for URL manipulation,
> as described in section 2.5.5, and, last but not least, an improvement
> in relative references matching (especially same-document references).
> A minimum requirement, for standardization sake, may consist of
> decoding any %-encoded characters in the <fragment> production, which
> are part of the <unreserved> production as defined in RFC 3986 with
> the changes defined in HTML 5 specification for URLs parsing and
> restricted to the Unicode ranges representing valid characters for an
> attribute value (those which are not prohibited neither as 'text' nor
> as 'character references'). This way, a character-for-character
> comparison between a fragment identifier and an id attribute value,
> which would have been equivalent but not matching without the
> normalization, should success most of times, because, as a consequence
> of the changes applied by HTML 5 current specification to the
> <unreserved> production, such characters might or might not be
> %-encoded in a valid URL, while an id value is likely to contain them
> non-encoded.
>
> After the above <fragment> normalization, a character-for-character
> comparison would fail if the id value contained any %-encoded triplet
> representing a decoded character, such as "foo%20bar". Anyway, such
> may be a weird thing to deal with, since it can be the %-encoded form
> of "foo bar", but also the decoded form of "foo%2520bar". In other
> words, if we apply the same normalization to two complete URLs, then
> we compare them, the result is quite reliable, but if we start from a
> component (such as a fragment identifier stored in an id attribute
> value) it's not easy to tell whether any normalization has been
> applied and which one, so there are always chances for false positives
> or false negatives to happen. According with RFC 3986, section "4.4.
> Same-Document Reference", the correct interpretation of a URI as a
> same-document reference cannot be hold as guaranteed, thus the
> mismatch between, for instance, the decoded fragment identifier "foo
> bar" and the id attribute value "foo%20bar", in front of (as I think)
> a wide majority of good matches, can be reasonable. Anyway, a kind of
> double check might be considered, such as:
>
> - comparing the %-unescaped fragment identifier with the ID of each
> element in the DOM;
> - upon failure, applying a %-unescape algorithm to the ID, then
> comparing again with the fragment identifier and, if matching, marking
> the element as a 'possible choice';
> - upon a perfect (exact) match, without unescaping the evaluated
> element ID, choosing such element as the referenced document part
> (actually defined as "the indicated part of the document" in the spec)
> and stopping;
> - without any perfect match in the whole document, choosing the first
> 'possible choice', if any;
> - without any match at all, the search for the referenced document
> part fails.
>
> With respect to a "single check" for an exact match, the overall
> computational time should increase linearly, thus not being a
> performance issue.
>
> Best regards, Alex.
The behaviour described above (apart from the 'double check' I was
suggesting) is roughly how Firefox (2.x and 3.0.4) behaves: both
href="#foo%20bar" and, in a different page,
href="./example.html#foo%20bar" match id="foo bar". IE7 and Opera 9.x
instead perform an exact comparison and show, in the address bar, a URL
that may contain blank spaces. They thus apply the relaxation allowed by
the URL parsing rules, but the result does not conform to RFC 3986 as a
complete URI string.
It seems different browsers implement (more or less) different
normalization/resolution algorithms, leading to different matches, so
specifying a uniform behaviour (whichever one) might be reasonable and
useful.
The current resolving algorithm, while explicitly asking for %-encoding
in a path component and for conformance with RFC 3986 in general,
doesn't mention fragment identifiers. The algorithm it refers to for
relative resolution (section 5.2 of RFC 3986), AIUI, might not require
the creation of a complete URI string; it could instead be accomplished
by returning an object holding a separate string for each URI part, thus
not necessarily requiring %-encoding and potentially leaving UAs a
certain degree of freedom.
Furthermore, about URL decomposition attributes it is said,
'On setting, the new value must first be mutated as described by the
"setter preprocessor" column, then mutated by %-escaping any characters
in the new value that are not valid in the relevant component as given
by the "component" column.' This seems to refer to the stricter RFC 3986
requirements (which in turn might be relaxed, since any part of a
decomposed URL may contain unescaped characters). However, the
"component" column points, for each component, to the corresponding
definition given for a parsed-URL component, which the current parsing
rules do not strictly require to have escaped characters.
I'd suggest reconsidering the whole mechanism to avoid any free
interpretation and to make each phase/operation (parsing, resolving,
attribute setting) more consistent, both with each other and
cross-browser, if possible. I'd also consider one or more DOM methods to
make it easy to compare URL strings and/or component attributes.
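To make the 'double check' from the quoted message concrete, here is a
minimal sketch in Python. It is only an illustration under stated
assumptions: the function name find_indicated_part is made up, the
document is modelled as a plain list of ID strings in document order
(a real UA would walk the DOM), and urllib.parse.unquote stands in for
whatever %-unescape algorithm a spec would actually define.

```python
from urllib.parse import unquote


def find_indicated_part(fragment, element_ids):
    """Return the ID chosen by the 'double check', or None if nothing matches.

    `element_ids` is a hypothetical stand-in for the IDs of the elements
    in the DOM, listed in document order.
    """
    target = unquote(fragment)  # %-unescaped fragment identifier
    possible_choice = None
    for element_id in element_ids:
        # Perfect match without unescaping the ID: choose it and stop.
        if element_id == target:
            return element_id
        # Otherwise unescape the ID and compare again; remember only the
        # first such element as the 'possible choice'.
        if possible_choice is None and unquote(element_id) == target:
            possible_choice = element_id
    # No perfect match in the whole document: fall back to the first
    # 'possible choice', which is None when there was no match at all.
    return possible_choice
```

The whole search is a single pass over the IDs, which matches the
expectation that the extra check only adds a linear cost.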
Best regards,
Alex.
- Re: [whatwg] URL parsing and same-document references [was:
On Friday, 12.12.2008, at 20:36 +0100, Calogero Alex
Baldacchino wrote:
> The above (but the 'double check' I was suggesting) is about the way
> Firefox (2.x and 3.0.4) behaves (both href="#foo%20bar" and, in a
> different page, href="./example.html#foo%20bar" match id="foo bar"),
> while IE7 and Opera 9.x perform an exact comparison, and show, in the
> address bar, an url with eventual blank spaces, thus applying the
> relaxation allowed by URL parsing rules, but not conforming to RFC 3986,
> as a complete URI string.
Whenever I copy and paste a URI from the address bar into any other
program, I am severely annoyed by this, especially when spaces
(delimiters!) are part of the fake URI. A chat or office program, for
example, can no longer highlight the fake URI (how could it?), and
pasting it into source code can create all kinds of validation errors.
And whenever I get a bastardized URI via chat or mail, only a part of it
is clickable.
Can someone from the web browser faction please state whether there is
any data to support breaking RFC compatibility? Because as I see it,
it's something that makes the URI appear nicer, but breaks whenever URIs
are to be transferred or communicated.
Getting to the problem mentioned here, the robustness principle says
that id="foo bar" should be accepted but nevertheless treated as
invalid, because a fragment with a space can never be part of a URI. So
IMHO, any program should strive to accept broken URIs if they are
unambiguous (which they are here, because the address bar can hold only
one URI at a time), but never output them.
Greetings
- Re: [whatwg] URL parsing and same-document references
Nils Dagsson Moskopp wrote:
> On Friday, 12.12.2008, at 20:36 +0100, Calogero Alex
> Baldacchino wrote:
>
>> The above (but the 'double check' I was suggesting) is about the way
>> Firefox (2.x and 3.0.4) behaves (both href="#foo%20bar" and, in a
>> different page, href="./example.html#foo%20bar" match id="foo bar"),
>> while IE7 and Opera 9.x perform an exact comparison, and show, in the
>> address bar, an url with eventual blank spaces, thus applying the
>> relaxation allowed by URL parsing rules, but not conforming to RFC 3986,
>> as a complete URI string.
>>
> Whenever I copypaste an URI from the address bar to any other program, I
> am severely annoyed by this, especially when spaces (delimiters !) are
> part of the fake-URI. A chat or office program, for example, is unable
> to highlight the fake-URI anymore, (how could it ?), also pasting it
> into source code can create all kind of validation errors. And whenever
> I get a bastardized URI via chat or mail, only a part of it is
> clickable.
>
> Can someone from the web browser faction please state if there is any
> data to support breaking RFC-compatibility ? Because as I see it, its
> something that makes it appear nicer, but breaks whenever URIs are to be
> transferred / communicated.
>
Actually I'm not from any faction, to be honest. I think a rationale for
that may be "people write strange things, both in address bars and in
HTML code", so relaxing the rules when parsing a URL makes sense; but I
think that when resolving and recomposing a whole URI the strictest
rules should be applied.
> Getting to the problem mentioned here, the robustness principle says
> that id="foo bar" should be accepted, but nevertheless invalid - because
> a fragment with a space can never be part of an URI.
Indeed, that's not part of a URI but a dereferenced component: when
splitting a URI into its components, there is no need to keep %-encoded
characters. RFC 3986 says separated components can be decoded; thus,
AIUI, both href="#foo bar" and id="foo bar" respect the conformance
rules, but when resolving "#foo bar" into a complete, absolute URI, the
result should always look like
"http://example.org/something.html#foo%20bar" to be conforming.
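As a rough illustration of "relaxed on input, strict on output", the
following Python sketch resolves a reference against a base and then
%-escapes the fragment before recomposing the absolute URI. The
function name resolve_for_output and the FRAGMENT_SAFE constant are
made up for this example; urljoin only stands in for a section 5.2
style resolution, and keeping "%" in the safe set (so existing escapes
are not double-encoded) is a simplifying assumption, not a rule taken
from any spec.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Characters allowed unescaped in an RFC 3986 fragment besides the
# unreserved set: sub-delims, ":", "@", "/" and "?". "%" is kept safe so
# that an existing escape such as %20 is not encoded a second time
# (assumption made for this sketch).
FRAGMENT_SAFE = "!$&'()*+,;=:@/?%"


def resolve_for_output(base, reference):
    """Resolve `reference` against `base` and %-escape the fragment."""
    parts = urlsplit(urljoin(base, reference))
    escaped = quote(parts.fragment, safe=FRAGMENT_SAFE)
    return urlunsplit(parts._replace(fragment=escaped))


# Relaxed input, conforming output:
print(resolve_for_output("http://example.org/something.html", "#foo bar"))
```

With the base above, the relaxed reference "#foo bar" comes out as
"http://example.org/something.html#foo%20bar".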
Regards,
Alex
- Re: [whatwg] URL parsing and same-document
On Saturday, 13.12.2008, at 19:09 +0100, Calogero Alex
Baldacchino wrote:
> Actually I'm not from any faction, to be honest. I think a rationale for
> that may be "people write strange things, both in address bars and in
> html code", thus relaxing rules when parsing an URL is meaningful; but I
> think when resolving and recomposing a whole URI the strictest rules
> should be applied.
Accepting weird input is not the problem here; outputting it is. Try
writing a valid URI into the address bar, and you get an invalid one
displayed.
Greetings
- Re: [whatwg] URL parsing and
Nils Dagsson Moskopp wrote:
> On Saturday, 13.12.2008, at 19:09 +0100, Calogero Alex
> Baldacchino wrote:
>
>> Actually I'm not from any faction, to be honest. I think a rationale for
>> that may be "people write strange things, both in address bars and in
>> html code", thus relaxing rules when parsing an URL is meaningful; but I
>> think when resolving and recomposing a whole URI the strictest rules
>> should be applied.
>>
> Accepting weird input is not a problem here, outputting is. Try writing
> a valid URI into the address bar, then get an invalid displayed.
>
>
> Greetings
>
Could you give an example, please? I wasn't able to reproduce this in
IE7 or Opera 9.27 (e.g.,
"http://real.addressofasite.com/index.html#foo%20bar" wasn't changed
into "http://real.addressofasite.com/index.html#foo bar").
Anyway, I guess you got the point. Relaxed parsing rules are for input
URLs, but after parsing, a normalization and/or the resolution algorithm
should be applied, and the displayed URL, being absolute and complete,
should conform to RFC 3986.
The current resolution algorithm (section 2.5.3 of the HTML5 spec) does
not mention fragment identifiers explicitly. Although its 10th step says
"Apply any relevant conformance criteria of RFC 3986 and RFC 3987,
returning an error and aborting these steps if appropriate.", step 9
says "Apply the algorithm described in RFC 3986 section 5.2 Relative
Resolution, using url as the potentially relative URI reference (R), and
base as the base URI (Base)". AIUI, the algorithm described in section
5.2 of RFC 3986 might be applied to each component of a URI without
building a complete URI, instead leaving each part separate and held as
a property of an object (a component recomposition algorithm is defined
in section 5.3 of RFC 3986, but that's not a 'must'). When a single
component of a URI is handled, RFC 3986 does not require %-encoding as a
'must'; hence the freedom of interpretation and the different behaviours
in different UAs, leading to inconsistent results when copying a URL
from one UA and pasting it into another.
I think a uniform behaviour should be defined as standard (and
implemented!) instead. The concern you raised about copy and paste
perhaps leads to a further issue: how line breaks should be handled by
the parsing rules (e.g. stripped like leading and trailing characters).
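To show what "each part held as a property of an object" could look
like, here is a small Python sketch. The UriParts class and the
recompose function are hypothetical names; the components are assumed
to be stored decoded, and the recomposition follows the order of the
section 5.3 pseudocode of RFC 3986, with %-escaping applied only at
that final step. The choice of which characters stay unescaped is a
simplification for the example.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

# sub-delims plus ":" and "@", i.e. what pchar allows besides the
# unreserved characters (simplified safe set for this sketch).
PCHAR_SAFE = "!$&'()*+,;=:@"


@dataclass
class UriParts:
    """Hypothetical result of a resolution: one decoded string per part."""
    scheme: Optional[str] = None
    authority: Optional[str] = None
    path: str = ""
    query: Optional[str] = None
    fragment: Optional[str] = None


def recompose(parts):
    """Rebuild a URI string in the order of RFC 3986 section 5.3,
    %-escaping path, query and fragment on output."""
    result = ""
    if parts.scheme is not None:
        result += parts.scheme + ":"
    if parts.authority is not None:
        result += "//" + parts.authority
    result += quote(parts.path, safe=PCHAR_SAFE + "/")
    if parts.query is not None:
        result += "?" + quote(parts.query, safe=PCHAR_SAFE + "/?")
    if parts.fragment is not None:
        result += "#" + quote(parts.fragment, safe=PCHAR_SAFE + "/?")
    return result


parts = UriParts("http", "example.org", "/something.html", None, "foo bar")
print(parts.fragment)    # the component on its own may hold a raw space
print(recompose(parts))  # the complete URI string is escaped
```

In this model a UA is free to compare or expose parts.fragment as
"foo bar", while any complete URL it displays or hands to the clipboard
goes through recompose and therefore carries "foo%20bar".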
Regards,
Alex