Java - When escaping a string with HTML entities, can I safely skip encoding chars above Unicode 127 if I use UTF-8?
When outputting a string in HTML, one must escape special characters as HTML entities ("&", "<", ">", etc.) for understandable reasons.
I've examined two Java implementations of this: org.apache.commons.lang.StringEscapeUtils.escapeHtml(String) and net.htmlparser.jericho.CharacterReference.encode(CharSequence).
Both escape all characters above Unicode code point 127 (0x7F), which covers essentially all non-English characters.
This behavior is fine, but the strings it produces are not human-readable when the characters are non-English (for example, in Hebrew or Arabic). I've seen that when chars above Unicode 127 aren't escaped like this, they still render correctly in browsers - I believe this is because the HTML page is UTF-8 encoded, so these characters are understandable to the browser.
My question: can I safely disable escaping of Unicode characters above code point 127 when escaping HTML entities, provided my web page is UTF-8 encoded?
You need to use HTML entities under two circumstances:
- To escape a character that has a special meaning in HTML (e.g. <)
- To display a character that doesn't belong to the document encoding (e.g., the € symbol in an ISO-8859-1 document)
Given that UTF-8 can represent all Unicode characters, only the first case applies.
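The two circumstances can be sketched in Java roughly as follows. This is only an illustration, not the behavior of either library mentioned in the question: the class name `CharsetAwareEscaper` and its `escape` method are hypothetical, and the sketch assumes the caller knows which charset the page will be served in. It escapes the HTML special characters always, and falls back to a numeric character reference only when the page charset cannot represent a character.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper: escapes HTML special characters, and uses a numeric
// character reference only for characters the page charset cannot encode.
final class CharsetAwareEscaper {
    private CharsetAwareEscaper() {}

    static String escape(String input, Charset pageCharset) {
        CharsetEncoder encoder = pageCharset.newEncoder();
        StringBuilder out = new StringBuilder(input.length() + 16);
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);      // handles surrogate pairs
            i += Character.charCount(cp);
            switch (cp) {
                // Case 1: characters with a special meaning in HTML.
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default: {
                    String s = new String(Character.toChars(cp));
                    if (encoder.canEncode(s)) {
                        out.append(s);          // representable: keep as-is
                    } else {
                        // Case 2: not in the document encoding.
                        out.append("&#").append(cp).append(';');
                    }
                }
            }
        }
        return out.toString();
    }
}
```

With `StandardCharsets.ISO_8859_1` the euro sign would come out as `&#8364;`, while with `StandardCharsets.UTF_8` it should pass through unchanged, because the second branch is never needed for valid text.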
When typing HTML manually, you may find it practical to insert an HTML entity now and then if your editor and/or keyboard won't allow you to type a certain character (it's easier to type &copy; than to figure out how to type an actual ©). When escaping text automatically, though, it just makes the page size grow ;-)
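To get a rough feel for the size difference, here is a small sketch that compares the UTF-8 byte length of a string with the byte length of the same string written as numeric character references. The class `EntitySizeDemo` and its methods are made up for this illustration, and the method deliberately does not escape `&`, `<` or `>`; it only shows the size effect of escaping everything above 127.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: measures how much numeric entities inflate a string.
final class EntitySizeDemo {
    private EntitySizeDemo() {}

    // Replaces every code point above 127 with a decimal character reference.
    static String toNumericEntities(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For a four-letter Hebrew word, `utf8Length(word)` is 8 (two bytes per letter), while `utf8Length(toNumericEntities(word))` is 28, since each letter becomes a seven-character reference such as `&#1513;`.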
I know little about Java, but other languages have separate functions: one that encodes only the special chars and one that encodes all possible entities.
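In Java, a "special chars only" function is short enough to write by hand. The following is a minimal sketch under the assumption that the page really is served as UTF-8; the class name `MinimalHtmlEscaper` is hypothetical and is not part of Commons Lang or Jericho.

```java
// Hypothetical helper: escapes only the characters that are special in HTML
// text and attribute values; everything else, including all characters
// above code point 127, is passed through unchanged.
final class MinimalHtmlEscaper {
    private MinimalHtmlEscaper() {}

    static String escape(CharSequence input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);        // non-ASCII stays readable
            }
        }
        return out.toString();
    }
}
```

Calling `MinimalHtmlEscaper.escape(userText)` before writing `userText` into the page keeps Hebrew or Arabic text human-readable in the page source while still neutralizing markup characters.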