ocr - How to make Tesseract recognize only numbers when they are mixed with letters?
I want to use Tesseract to recognize numbers. The problem is that I have a mixture of
numbers and letters, and when I use SetVariable("tessedit_char_whitelist", "0123456789"),
every symbol Tesseract returns is a digit, including wrong digits for the letters.
Can I set a threshold value so that Tesseract omits symbols with low resemblance?
Note: I set Tesseract to recognize only digits so that there is no confusion between O and 0.
Recognizing only numbers is answered on the Tesseract FAQ page. See that page for more info, but if you have the version 3 package, the config files are already set up. You just specify on the command line:
tesseract image.tif outputbase nobatch digits
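For anyone driving this from a script, here is a minimal Python sketch that wraps the same command. The function names are made up for illustration, and it assumes the `tesseract` executable is on your PATH and that the result is written to `<outputbase>.txt`; the arguments themselves are copied from the command above.

```python
import subprocess
from pathlib import Path


def build_digits_command(image_path, output_base):
    """Return the argv list for the command line quoted in the answer."""
    # "nobatch" and "digits" are taken verbatim from the answer's command.
    return ["tesseract", str(image_path), str(output_base), "nobatch", "digits"]


def recognize_digits(image_path, output_base):
    """Run Tesseract and return the recognized text.

    Assumes the tesseract executable is installed and on PATH.
    """
    subprocess.run(build_digits_command(image_path, output_base), check=True)
    # Assumption: the CLI writes its result to <output_base>.txt
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `recognize_digits("image.tif", "outputbase")` would then run the command and return the contents of the output file.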
As for the threshold value, I'm not sure what you mean. If your input is in an unusual font, perhaps you might retrain with a sample of your input. An alternative is to change Tesseract's pruning threshold. Both options are mentioned in the FAQ.
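If what the question means by a threshold is "drop symbols the engine is unsure about", one possible reading can be sketched as a post-processing step in Python. This is not a Tesseract setting and not the pruning threshold from the FAQ. It assumes you already have (symbol, confidence) pairs from somewhere, with confidence on a 0 to 100 scale; how to obtain them is not shown, and `keep_confident_digits` and the cutoff value are hypothetical.

```python
DIGITS = set("0123456789")


def keep_confident_digits(symbols, min_confidence=60.0):
    """Keep only digit symbols whose confidence reaches the cutoff.

    symbols: iterable of (text, confidence) pairs. The pairs are assumed
    to come from the OCR engine; the 0-100 scale is an assumption.
    """
    kept = []
    for text, confidence in symbols:
        # Drop anything that is not a plain ASCII digit or is too uncertain.
        if text in DIGITS and confidence >= min_confidence:
            kept.append(text)
    return "".join(kept)


# Made-up sample data, for illustration only.
sample = [("1", 95.0), ("7", 40.0), ("3", 88.0)]
print(keep_confident_digits(sample))  # prints "13"
```

The idea is that a letter forced into a digit by the whitelist would tend to come back with low confidence and be dropped, but whether that holds for your images is something to check on real data.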