[zeromq-dev] RFC 32 - Z85
Peter Taylor
peter at vidavia.com
Wed Jul 10 13:02:51 CEST 2013
Pieter, I appreciate your overall rationale, but there seems to be a gap in
the explanation: the reasoning behind which 23 of the 33 non-alphanumeric
printable ASCII characters to include.
As far as I can see, you've disqualified single and double quotes and
probably backslashes. That still leaves free choice of 7 other characters to
exclude. Was that choice arbitrary, or did you use additional criteria? If
so, what were they?
It seems to me that an additional valuable consideration would be XML
safety. Bjorn is correct to observe that including < and & in the alphabet
makes escaping a requirement for well-formed XML whether the encoded data is
included as text nodes or as attribute values:
> The ampersand character (&) and the left angle bracket (<) must not appear
> in their literal form, except when used as markup delimiters, or within a
> comment, a processing instruction, or a CDATA section. If they are needed
> elsewhere, they must be escaped using either numeric character references
> or the strings "&amp;" and "&lt;" respectively.
(Source: http://www.w3.org/TR/REC-xml/#syntax )
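To illustrate what that escaping looks like in practice, here is a minimal Python sketch using the standard library's xml.sax.saxutils.escape. The sample string is made up for illustration and is not real Z85 output; it only assumes that an encoded string may contain <, & and >.

```python
from xml.sax.saxutils import escape

# Hypothetical sample: arbitrary text containing characters that an
# encoding alphabet with '<', '&' and '>' could produce.
sample = "Hello<World&foo>"

# escape() replaces '&', '<' and '>' with their entity references.
escaped = escape(sample)
print(escaped)  # Hello&lt;World&amp;foo&gt;
```

An alphabet without these characters could be embedded in XML text nodes without this step.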
For URLs there are 18 reserved characters, but since encoded data is going
to be in either the path or the query-string it might be possible to
restrict the excluded characters to those which are unsafe in those
contexts in the http scheme, which I think would be /+?&%#
Putting those together suggests that the excluded 10 characters should be
"#%&'+/<?\
Do any of space or !$()*,-.:;=>@[]^_`{|}~ have stronger reasons for
exclusion? (The strongest, it seems to me, would be space as potentially
trimmed by mistake, and > for XML text nodes).
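The counts above can be checked with a short Python sketch. It assumes "printable ASCII" means 0x20 through 0x7E, space included; the variable names are mine and purely illustrative.

```python
import string

# Printable ASCII (0x20..0x7E) minus letters and digits, space included.
candidates = {chr(c) for c in range(0x20, 0x7F)} - set(
    string.ascii_letters + string.digits
)

# The ten characters proposed for exclusion in this message.
excluded = set("\"#%&'+/<?\\")

# What is left over to fill out an 85-character alphabet.
remaining = "".join(sorted(candidates - excluded))

print(len(candidates), len(excluded), len(remaining))  # 33 10 23
print(repr(remaining))
```

The remaining 23 characters are exactly space plus the punctuation listed in the question above.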