Re: [whatwg] base64 entities

<AANLkTi=qL8m3rn5FpX8s7eFnTbn0yE=e+zEFxrGY6DF7@mail.gmail.com>

On Thu, Aug 26, 2010 at 3:52 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> On 8/26/10 6:45 PM, Adam Barth wrote:
>>>
>>> Note that this issue means that using atob or btoa for dealing with this
>>> is a huge pain if non-ASCII chars are involved, since those take and return
>>> byte arrays masquerading as JS strings, not actual Unicode strings.
>>
>> I'm slightly confused how that works.  How do you represent arbitrary
>> binary data as characters?
>
> You mean how do atob/btoa take their binary data in JS-land?  You take your
> byte array, and convert it to a sequence of two-byte units by setting the
> high byte to 0.  This sequence of two-byte units is a JS string.

Crazy.
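
For concreteness, the representation Boris describes can be sketched
roughly as below. This is only an illustration, not anything proposed in
the thread: the helper names bytesToBinaryString and binaryStringToBytes
are made up, and it assumes an environment that provides Uint8Array plus
global atob/btoa (a current browser or a recent Node.js).

```javascript
// Hypothetical helper: pack bytes into a "binary string", one two-byte
// unit per byte with the high byte set to 0.
function bytesToBinaryString(bytes) {
  let s = "";
  for (const b of bytes) {
    s += String.fromCharCode(b & 0xff);
  }
  return s;
}

// Hypothetical helper: unpack a binary string back into bytes, rejecting
// any unit whose high byte is not 0.
function binaryStringToBytes(s) {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit > 0xff) {
      throw new RangeError("not a binary string: unit > 0xFF at index " + i);
    }
    out[i] = unit;
  }
  return out;
}

// The UTF-8 encoding of U+00E9 is the two bytes 0xC3 0xA9.
const sampleBytes = Uint8Array.from([0xc3, 0xa9]);
const binaryString = bytesToBinaryString(sampleBytes); // units U+00C3, U+00A9
const sampleBase64 = btoa(binaryString);               // "w6k="

// atob hands back the same two-unit binary string, not the single
// character U+00E9; that gap is the "huge pain" for non-ASCII text.
const roundTripped = binaryStringToBytes(atob(sampleBase64));
```

Note that the string atob returns here has length 2; getting back to the
one-character Unicode string still needs a separate UTF-8 decoding step.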

>> Another option is to provide a base64
>> encoder/decoder that uses UTF8 to encode/decode the binary.
>
> Not sure what the exact proposal here is.

The pipeline that makes sense to me is the following:

Unicode base64 characters
--base64decode-->
byte array
--UTF8 decode-->
Unicode characters

Once we have real byte arrays in JavaScript, it probably makes sense
to expose a base64 decode function that takes Unicode and produces an
honest byte array.  We might also want to expose a function that takes
byte arrays and interprets them as UTF-8 (to produce Unicode
characters).
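
A rough sketch of that two-step pipeline, under some loud assumptions:
Uint8Array stands in for the "real byte arrays", TextDecoder stands in
for the UTF-8 interpretation step, atob is available as a global, and
the function names (base64ToBytes, utf8BytesToString, decodeBase64Utf8)
are invented here for illustration rather than proposed API.

```javascript
// Step 1 (hypothetical name): base64 characters -> honest byte array.
function base64ToBytes(b64) {
  const binary = atob(b64); // one code unit per decoded byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Step 2 (hypothetical name): byte array -> Unicode characters, reading
// the bytes as UTF-8. With fatal: true, malformed input throws instead
// of being replaced with U+FFFD.
function utf8BytesToString(bytes) {
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

// The whole pipeline: base64 decode, then UTF-8 decode.
function decodeBase64Utf8(b64) {
  return utf8BytesToString(base64ToBytes(b64));
}

const asciiText = decodeBase64Utf8("aGVsbG8="); // "hello"
const accented = decodeBase64Utf8("w6k=");      // "\u00e9", a single character
```

Keeping the two steps as separate functions matches the split described
above: callers that want raw bytes stop after the first step, and only
callers that know the payload is text apply the second.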

>> Because <script> does not decode entities in HTML, the attacker will
>> be limited to what he or she can do with alphanumeric characters
>
> OK.  I had misunderstood what you were proposing for <script> here.  The
> point is that inside <script> this base64 thing will only be useful for
> setting innerHTML, right?

Yes.  The point is that it's safe in most (all?) contexts, although
it's most useful between tags and in attributes.

Adam