Is there a library for Python that gives the script name for a given Unicode character or string?


Is there a library that tells which script a particular Unicode character belongs to?

For example, the input u'ሕ' should return "Ethiopic" or similar.

You can parse the Unicode Scripts.txt file:

    # -*- coding: utf-8; -*-
    import bisect

    script_file = "/path/to/scripts.txt"
    scripts = []

    # Each data line looks like "START..STOP ; Script # comment"
    # or "CODE ; Script # comment".
    with open(script_file, "rt") as stream:
        for line in stream:
            # Drop the trailing comment and skip empty lines.
            line = line.split("#", 1)[0].strip()
            if line:
                rng, script = line.split(";", 1)
                elems = rng.split("..", 1)
                start = int(elems[0], 16)
                if len(elems) == 2:
                    stop = int(elems[1], 16)
                else:
                    stop = start
                scripts.append((start, stop, script.lstrip()))

    # Sort by start so the ranges can be searched with bisect.
    scripts.sort()
    indices = [elem[0] for elem in scripts]

    def find_script(char):
        # Accept either a code point (int) or a single character.
        if not isinstance(char, int):
            char = ord(char)
        # Find the last range whose start is <= char.
        index = bisect.bisect(indices, char) - 1
        start, stop, script = scripts[index]
        if start <= char <= stop:
            return script
        else:
            return "unknown"

    print(find_script(u'a'))
    print(find_script(u'Д'))
    print(find_script(u'ሕ'))
    print(find_script(0x1000))
    print(find_script(0xe007f))
    print(find_script(0xe0080))

Note that this code is neither robust nor optimized. You should test whether the argument denotes a valid character or code point, and you should coalesce adjacent equivalent ranges.
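The two improvements mentioned in the note could look roughly like the sketch below. It is not part of the original answer: the helper names coalesce and to_code_point are made up for illustration, it is written for Python 3, and it assumes the ranges are (start, stop, script) tuples like the ones built when parsing the file.

```python
def coalesce(ranges):
    """Merge neighbouring (start, stop, script) ranges that share a script.

    Hypothetical helper: two ranges are merged only when the second one
    starts directly after the first one ends.
    """
    merged = []
    for start, stop, script in sorted(ranges):
        if merged and merged[-1][2] == script and merged[-1][1] + 1 == start:
            # Extend the previous range instead of adding a new one.
            merged[-1] = (merged[-1][0], stop, script)
        else:
            merged.append((start, stop, script))
    return merged


def to_code_point(char):
    """Return a valid code point for an int or a one-character string.

    Hypothetical helper: raises ValueError for anything else.
    """
    if isinstance(char, int):
        code = char
    elif len(char) == 1:
        code = ord(char)
    else:
        raise ValueError("expected a single character, got %r" % (char,))
    # Unicode code points run from U+0000 to U+10FFFF.
    if not 0 <= code <= 0x10FFFF:
        raise ValueError("code point out of range: %r" % (code,))
    return code
```

With helpers like these, the list of ranges would be passed through coalesce once after parsing and before the start indices are built, and the lookup function would call to_code_point on its argument instead of calling ord directly.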

