java - When escaping a string with HTML entities, can I safely skip encoding chars above Unicode 127 if I use UTF-8? -


When outputting a string in HTML, one must escape special characters as HTML entities ("&<>" etc.) for understandable reasons.

I've examined two Java implementations of this: org.apache.commons.lang.StringEscapeUtils.escapeHtml(String) and net.htmlparser.jericho.CharacterReference.encode(CharSequence)

Both escape all characters above Unicode code point 127 (0x7F), which in practice means all non-English characters.

This behavior is fine, but the strings it produces are not human-readable when the characters are all non-English (for example, in Hebrew or Arabic). I've seen that when chars above Unicode 127 aren't escaped like this, they still render correctly in browsers - I believe this is because the HTML page is UTF-8 encoded, so the browser understands these characters.

My question: can I safely disable escaping of Unicode characters above code point 127 when escaping HTML entities, provided my web page is UTF-8 encoded?

You need to use HTML entities under two circumstances:

  • to escape a character that has a special meaning in HTML (e.g. <)
  • to display a character that doesn't belong to the document encoding (e.g., the € symbol in an ISO-8859-1 document)

Given that UTF-8 can represent any Unicode character, only the first case applies.
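As a rough illustration of escaping for the first case only, here is a minimal sketch in Java. The class name MinimalHtmlEscaper and its escape method are hypothetical, not part of Commons Lang or Jericho, and the sketch assumes the page is really served and declared as UTF-8 and that the result is placed in element text or a quoted attribute value.

```java
// Hypothetical helper for illustration; not an API of Commons Lang or Jericho.
final class MinimalHtmlEscaper {
    private MinimalHtmlEscaper() {}

    /** Escapes only the characters that are special in HTML; everything else passes through. */
    static String escape(CharSequence input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);        // includes all chars above 0x7F
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u05E9\u05DC\u05D5\u05DD" is the Hebrew word "shalom", written with
        // escapes so the source file compiles regardless of its own encoding.
        System.out.println(escape("<b>\u05E9\u05DC\u05D5\u05DD & \"hello\"</b>"));
    }
}
```

The Hebrew characters come out untouched, so the generated markup stays readable; only the five special characters are replaced.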

When typing HTML manually you may find it practical to insert an HTML entity now and then, if your editor and/or keyboard won't allow you to type a certain character (it's easier to type &copy; than to figure out how to type an actual ©), but when escaping text automatically it just makes the page size grow ;-)
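To get a feel for the size difference, the following sketch compares the UTF-8 byte length of a short Hebrew string with the same string written as numeric character references. The class EntitySizeDemo and its methods are hypothetical, and escapeNonAscii is only a simplified stand-in for what the library escapers do (they may emit named entities where one exists).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; a simplified stand-in for library behavior.
final class EntitySizeDemo {
    private EntitySizeDemo() {}

    /** Replaces every code point above 0x7F with a decimal numeric character reference. */
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String raw = "\u05E9\u05DC\u05D5\u05DD"; // Hebrew "shalom", 4 letters
        String escaped = escapeNonAscii(raw);
        System.out.println("raw UTF-8 bytes:     " + utf8Length(raw));
        System.out.println("escaped UTF-8 bytes: " + utf8Length(escaped));
    }
}
```

Each Hebrew letter takes 2 bytes in UTF-8 but 7 bytes as a reference such as &#1513;, so text in such scripts grows several times over when escaped this way.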

I know little about Java, but other languages have different functions to encode just the special chars and to encode all possible entities.

