python - Why don't scripting languages output Unicode to the Windows console?
The Windows console has been Unicode-aware for at least a decade, and perhaps as far back as Windows NT. Yet for some reason the major cross-platform scripting languages, including Perl and Python, only ever output various 8-bit encodings, which takes a lot of trouble to work around. Perl gives a "wide character in print" warning; Python gives a charmap error and quits. Why on earth, after all these years, do they not simply call the Win32 -W APIs that output UTF-16 Unicode, instead of forcing everything through the ANSI/codepage bottleneck?
Is it just that cross-platform performance is a low priority? Is it that the languages use UTF-8 internally and find it too much bother to output UTF-16? Or are the -W APIs inherently broken to such a degree that they can't be used as-is?
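To make the Python failure concrete, here is a minimal sketch of what the codepage bottleneck does to a string. It assumes the console codepage is cp437 (the usual US English OEM codepage; yours may differ), and try_encode is just a hypothetical helper name for illustration:

```python
def try_encode(text, codepage):
    """Encode text for an 8-bit console codepage, returning the error on failure."""
    try:
        return text.encode(codepage)
    except UnicodeEncodeError as exc:
        return exc

# "é" exists in cp437, so this survives the 8-bit bottleneck.
print(try_encode("caf\u00e9", "cp437"))

# The snowman (U+2603) has no cp437 byte, so the charmap codec gives up.
print(try_encode("\u2603", "cp437"))
```

The second call is the same kind of charmap error Python raises when print() has to squeeze text through a codepage that cannot represent it.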
Update
It seems the blame may need to be shared by all parties. I imagined that the scripting languages could just call wprintf on Windows and let the OS/runtime worry about things such as redirection. But it turns out that even wprintf on Windows converts wide characters to ANSI and back before printing to the console!
Please let me know if this has been fixed, since the bug report link seems broken, but my Visual C test code still fails for wprintf and succeeds for WriteConsoleW.
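For comparison, this is roughly what calling the -W API directly could look like from a scripting language. It is only a sketch, written in Python with ctypes rather than C: write_text and utf16_units are hypothetical helper names, chunking of very long strings and most error handling are left out, and when stdout is not a real console (or the platform is not Windows) it assumes plain UTF-8 bytes are acceptable.

```python
import sys

STD_OUTPUT_HANDLE = -11 & 0xFFFFFFFF  # (DWORD)-11, as documented for GetStdHandle


def utf16_units(text):
    """Count UTF-16 code units, which is the unit WriteConsoleW counts in."""
    return len(text.encode("utf-16-le")) // 2


def write_text(text):
    """Write text with WriteConsoleW on a Windows console, else as UTF-8 bytes."""
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
        kernel32.GetStdHandle.restype = wintypes.HANDLE
        kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.LPDWORD]
        kernel32.GetConsoleMode.restype = wintypes.BOOL
        kernel32.WriteConsoleW.argtypes = [
            wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
            wintypes.LPDWORD, wintypes.LPVOID,
        ]
        kernel32.WriteConsoleW.restype = wintypes.BOOL

        handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
        mode = wintypes.DWORD()
        # GetConsoleMode only succeeds for real console handles, so it doubles
        # as the "is stdout redirected?" check.
        if kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
            sys.stdout.flush()
            written = wintypes.DWORD()
            ok = kernel32.WriteConsoleW(
                handle, text, utf16_units(text), ctypes.byref(written), None
            )
            if not ok:
                raise ctypes.WinError(ctypes.get_last_error())
            return "console"

    # Redirected output or a non-Windows platform: fall back to bytes.
    sys.stdout.buffer.write(text.encode("utf-8"))
    sys.stdout.buffer.flush()
    return "bytes"
```

Note that the length passed to WriteConsoleW is in UTF-16 code units, not code points, so a character outside the BMP counts as two. The redirection check is the part I had hoped the OS/runtime would handle by itself.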
Update 2
Actually, you can print UTF-16 to the console from C using wprintf, but only if you first call _setmode(_fileno(stdout), _O_U16TEXT).
From C you can print UTF-8 to a console whose codepage is set to codepage 65001; however, Perl, Python, PHP and Ruby all have bugs which prevent this. Perl and PHP corrupt the output by adding additional blank lines after lines that contain at least one wide character. Ruby has slightly different corrupt output. Python crashes.
Update 3
Node.js is the first scripting language that shipped without this problem straight out of the box.
The Python dev team came to realize this was a real problem after it was first reported at the end of 2007, and 2016 saw a huge flurry of activity to understand and fix the bug.
The main problem seems to be that it is not possible to use Unicode on Windows using only the standard C library and no platform-dependent or third-party extensions. The languages you mentioned originate from Unix platforms, whose method of implementing Unicode blends well with C (they use normal char* strings, the C locale functions, and UTF-8). If you want to do Unicode in C, you more or less have to write everything twice: once using nonstandard Microsoft extensions, and once using the standard C API functions for all other operating systems. While this can be done, it usually doesn't have high priority because it's cumbersome, and most scripting language developers either hate or ignore Windows anyway.
At a more technical level, I think the basic assumption that most standard library designers make is that all I/O streams are inherently byte-based at the OS level. That is true for files on all operating systems and for all streams on Unix-like systems, with the Windows console being the only exception. Thus the architecture of many class libraries and programming language standards would have to be modified to a great extent if one wanted to incorporate Windows console I/O.
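Python's own io module illustrates that byte-based assumption. In this minimal sketch a BytesIO object stands in for the OS-level stream, and encode_through_stream is a hypothetical helper name; the point is only that the text layer has nothing to hand downward except encoded bytes:

```python
import io


def encode_through_stream(text, encoding):
    """Push text through a text wrapper and return what the byte stream received."""
    raw = io.BytesIO()  # stands in for the OS-level byte stream
    wrapper = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    wrapper.write(text)
    wrapper.flush()
    return raw.getvalue()


data = encode_through_stream("snowman: \u2603\n", "utf-8")
print(data)
```

A console that wants UTF-16 code units through a separate API has no natural place in this layering, which is why supporting it means special-casing the console underneath the text layer.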
Another, more subjective, point is that Microsoft just did not do enough to promote the use of Unicode. The first Windows OS with decent (for its time) Unicode support was Windows NT 3.1, released in 1993, long before Linux and OS X grew Unicode support. Still, the transition to Unicode in those OSes has been much more seamless and unproblematic. Microsoft once again listened to the sales people instead of the engineers, and kept the technically obsolete Windows 9x around until 2001; instead of forcing developers to use a clean Unicode interface, they still ship the broken and now-unnecessary 8-bit API interface, and invite programmers to use it (look at a few of the recent Windows API questions on Stack Overflow, most newbies still use the horrible legacy API!).
When Unicode came out, many people realized it was useful. Unicode started as a pure 16-bit encoding, so it was natural to use 16-bit code units. Microsoft then apparently said "OK, we have this 16-bit encoding, so we have to create a 16-bit API", not realizing that nobody would use it. The Unix luminaries, however, thought "how can we integrate this into the current system in an efficient and backward-compatible way so that people will actually use it?" and subsequently invented UTF-8, which is a brilliant piece of engineering. Just as when Unix was created, the Unix people thought a bit more, needed a bit longer, and had less financial success, but eventually did it right.
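The backward-compatibility point is easy to check. A small sketch, using Python only as a convenient way to inspect the bytes:

```python
ascii_text = "hello"

utf8 = ascii_text.encode("utf-8")
utf16 = ascii_text.encode("utf-16-le")

# ASCII text is byte-for-byte identical in UTF-8, so char*-based code keeps working.
print(utf8 == ascii_text.encode("ascii"))

# UTF-16 interleaves zero bytes, which NUL-terminated char* code treats as end of string.
print(utf16)
print(utf16.split(b"\x00")[0])
```

That is why UTF-8 could slide underneath existing byte-oriented APIs, while UTF-16 needed a parallel set of functions.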
I cannot comment on Perl (though I think there are more Windows haters in the Perl community than in the Python community), but regarding Python I know that the BDFL (who doesn't like Windows either) has stated that adequate Unicode support on all platforms is a major goal.