ocr - How to make Tesseract give relevant results in the presence of noise?


I am using Tesseract 3.0.0 and have bumped into the following problem:

When there is only a small piece of text to recognize, Tesseract seems to merge it with other fragments of the image. As a result, nothing relevant is returned.

The image below shows 3 cases. The rectangle with the dashed line is the region passed to Tesseract, and above each rectangle is the result (`\n` means a new line).

The last case is the problematic one. Is there some way to improve Tesseract in situations like this?


As far as I know, Tesseract does not have proper image segmentation yet (or document analysis, as it is called in commercial OCR applications). Typically, before OCR is done, the image gets split into separate areas containing text, pictures, barcodes, lines, and so on. OCR is then applied only to the text areas, so you don't face the problems you have described.
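As a rough illustration of that splitting step, here is a minimal sketch of one classic technique, a horizontal projection profile. It assumes the image has already been binarized into a 2D list of 0/1 pixels (1 = ink): runs of blank rows separate one region from the next, and each resulting band could be cropped and passed to Tesseract on its own. This is a swapped-in example, not how Tesseract or any commercial engine does document analysis (real layout analysis also handles columns, pictures, and skew), and the name `split_into_bands` and its `min_gap` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binarized image: 1 = ink, 0 = background


def split_into_bands(img: Image, min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges (bottom exclusive) of horizontal ink bands.

    Bands separated by fewer than `min_gap` blank rows are merged into one.
    """
    # Horizontal projection profile: amount of ink in each row.
    profile = [sum(row) for row in img]
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first row of the band being scanned
    blank_run = 0  # consecutive blank rows since the last ink row
    for y, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # Close the band just after the last row that had ink.
                bands.append((start, y - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        # The image ended while still inside a band.
        bands.append((start, len(profile) - blank_run))
    return bands
```

Each band can then be cut out with `img[top:bottom]`; raising `min_gap` keeps lines of one paragraph together instead of splitting them apart.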

Earlier versions of Tesseract did not have this functionality at all, and Tesseract was supposed to be used as a line recognizer only, or a so-called field-level recognizer, meaning you use it on small snippets of text cut out of a bigger image.

I have not followed it thoroughly, but segmentation was introduced in 3.0. It is there partially, but it does not work as expected, as you have found out.

There is an open-source project, OCRopus, that approached the problem you describe: first document analysis (aka segmentation), then OCR. Earlier versions used Tesseract for OCR after the analysis step was finished. Later they introduced their own OCR engine (which is still not very good) and moved Tesseract plugin support down their priority list.

Here's how you can address the problem:

  • If your images have a typical structure, you can try dumb segmentation and cut the text out of the image before passing it to Tesseract. However, if you expect to support a wide variety of images, forget it.
  • You can check OCRopus and see if its segmentation works for your images. If it does, you can spend some time making OCRopus and Tesseract work together.
  • Well, if that does not sound like fun and you value your time, I recommend considering a real OCR engine like ABBYY: higher accuracy of both segmentation and OCR out of the box, and professional customer support, of course.
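For the first option, "dumb" segmentation can be as simple as cropping fixed regions when every image follows the same layout. A minimal sketch, assuming each image is a 2D list of pixels and that the field boxes in the hypothetical `FIELDS` template were measured by hand from a sample image using non-negative coordinates; each snippet would then be saved and fed to Tesseract separately, which is the small-snippet, field-level use it handles best.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive

# Hypothetical layout template: box coordinates measured once from a sample image.
FIELDS: Dict[str, Box] = {
    "title": (0, 0, 4, 1),
    "amount": (2, 2, 4, 3),
}


def crop(img: Image, box: Box) -> Image:
    """Cut the rectangle `box` out of `img` (coordinates must be non-negative)."""
    left, top, right, bottom = box
    # Python slicing silently stops at the image edge if a box overshoots it.
    return [row[left:right] for row in img[top:bottom]]


def cut_fields(img: Image, template: Dict[str, Box]) -> Dict[str, Image]:
    """Return one cropped snippet per named field in the template."""
    return {name: crop(img, box) for name, box in template.items()}
```

The obvious weakness is the one the answer points out: as soon as images stop sharing one layout, the template no longer matches and the crops pick up the wrong content.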

Disclaimer: I work for ABBYY.

