Dear Rod,
I'm not surprised that you had OCR problems with the Canon M305.
Early on, when I started on this general topic, I had some hope that OCR software really could take broken characters and determine what the intended letters were. Broken characters are exactly what I run into with historical documents. But my initial look-see wasn't promising, and I resorted to transcribing, obviously a very slow and laborious procedure and one requiring careful proofreading. If there is a "magical," "omniscient" OCR program out there, I'd love to hear of it. My thinking is that there is a long distance between the hype and the reality.
Having personally retrenched on this, but still holding out hope, I think the key thing is legibility: being able to see or infer as much of each character as possible. That goal, however, is in direct conflict with the needs of OCR. With OCR, as you say, you really want black on white, with no tonality. But for greatest legibility to the naked eye, you want just the opposite: as much tonality as possible. That allows even the faintest details to come through, essentially as darker gray against lighter gray, where the difference in shade can be quite subtle. Scan the same document in text mode, which gives strictly black on white, and much of that character information disappears. Thus, with many historical documents, the needs of readability and of OCR compatibility are in distinct conflict. What is needed is OCR software that acts more like the human eye, integrating all that it sees and inferring what is missing.
Regards,
Richard