Wednesday, June 23, 2010

OCR in Google Docs

RantWoman receives much hype in her inbox and generally just roundfiles it. RantWoman would prefer to test the following before singing its praises prematurely. However, RantWoman does not want to lose track. Plus, RantWoman would love to get a handle on some of her recently reported adventures involving P D F and should give it a try on those grounds alone!



Optical character recognition (OCR) in Google Docs
source url:
http://googledocs.blogspot.com/2010/06/optical-character-recognition-ocr-in.html
Tuesday, June 22, 2010.
A couple of months ago, my co-worker, Mike, showed up at my desk with a pile of paper, each of the yellowed sheets densely covered with an ancient-looking typewriter font. His wife had recently discovered parts of her family chronicles in the attic, typed up by her grandmother many years ago! Now he was wondering if there was a way for her to continue writing the chronicles in Google Docs.
The papers sat on my desk for a while, but recently, I returned them to Mike with a smile, cheerfully telling him that what started as my 20% project is now ready for everyone to use - Google Docs now officially supports importing scanned documents. What we launched as an experimental feature for the DocumentsList Data API last year is now available on the upload page: check the “Convert text from PDF or image files to Google Docs documents”, upload your scanned images (JPEG, GIF, PNG) or PDFs, and Google Docs will extract text and formatting from the scans for you to edit away.
For the technically curious: we’re using Optical Character Recognition (OCR) that our friends from Google Books helped us set up. OCR works best with high-resolution images, and not all formatting may be preserved. The original images will be included in the new document to make it easier for you to correct mistakes. Supported languages include English, French, Italian, German and Spanish, with more languages and character sets on their way. We’re looking forward to get feedback from you while we keep improving the feature over the next months.
And Mike’s scanned family chronicles have even been extended by an additional chapter in Google Docs: his wife recently had a baby boy named James!
Posted by: Jaron Schaeffer, Software Engineer, Google Docs.
