<AANLkTi=qL8m3rn5FpX8s7eFnTbn0yE=e+zEFxrGY6DF7@mail.gmail.com>
On Thu, Aug 26, 2010 at 3:52 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> On 8/26/10 6:45 PM, Adam Barth wrote:
>>>
>>> Note that this issue means that using atob or btoa for dealing with this
>>> is a huge pain if non-ASCII chars are involved, since those take and return
>>> byte arrays masquerading as JS strings, not actual Unicode strings.
>>
>> I'm slightly confused how that works.  How do you represent arbitrary
>> binary data as characters?
>
> You mean how do atob/btoa take their binary data in JS-land?  You take your
> byte array, and convert it to a sequence of two-byte units by setting the
> high byte to 0.  This sequence of two-byte units is a JS string.

Crazy.

>> Another option is to provide a base64
>> encoder/decoder that uses UTF8 to encode/decode the binary.
>
> Not sure what the exact proposal here is.

The pipeline that makes sense to me is the following:

Unicode base64 characters --base64 decode--> byte array --UTF8 decode--> Unicode characters

Once we have real byte arrays in JavaScript, it probably makes sense to expose a base64 decode function that takes Unicode and produces an honest byte array. We might also want to expose a function that takes byte arrays and interprets them as UTF8 (to produce Unicode characters).

>> Because <script> does not decode entities in HTML, the attacker will
>> be limited to what he or she can do with alphanumeric characters
>
> OK.  I had misunderstood what you were proposing for <script> here.  The
> point is that inside <script> this base64 thing will only be useful for
> setting innerHTML, right?

Yes. The point is that it's safe in most (all?) contexts, although it's most useful between tags and in attributes.

Adam
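For illustration, here is a minimal sketch of both the "binary string" representation Boris describes (one 16-bit code unit per byte, high byte 0) and the base64 -> byte array -> UTF-8 pipeline above. It assumes a modern environment with Uint8Array, TextEncoder and TextDecoder (browsers, or Node.js 16+ where atob/btoa are globals); those byte-array APIs did not exist when this thread was written. The helper names are hypothetical and are not a proposed API.

```javascript
// atob returns a "binary string": one UTF-16 code unit per byte, with the
// high byte of each code unit set to 0 (values 0x00-0xFF).
function binaryStringToBytes(binary) {
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i); // always <= 0xFF for atob output
  }
  return bytes;
}

// Inverse: pack each byte into a code unit whose high byte is 0,
// which is the form btoa expects as input.
function bytesToBinaryString(bytes) {
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return binary;
}

// Hypothetical helper: base64 text -> honest byte array -> Unicode via UTF-8.
function base64ToUnicode(b64) {
  const bytes = binaryStringToBytes(atob(b64));
  return new TextDecoder("utf-8").decode(bytes);
}

// Hypothetical helper: Unicode -> UTF-8 bytes -> base64 text.
function unicodeToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  return btoa(bytesToBinaryString(bytes));
}
```

For example, unicodeToBase64("€") yields "4oKs" (the UTF-8 bytes E2 82 AC), and base64ToUnicode("4oKs") returns "€", whereas calling btoa("€") directly throws, because U+20AC does not fit in a single byte.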