Import Scans or Go Multilingual
September 27th, 2009 | Published in Google Code
About a month ago, we launched v3.0 of the Documents List Data API and promised that more features were on the way. Today, we're releasing two experimental features in the API: OCR and document translation.
The first, Optical Character Recognition (OCR), lets your application create editable Google Documents from high-resolution images containing text (such as faxes or scanned letters). To perform OCR on a .png, .jpg, or .gif upload, add the ocr=true parameter to your upload request:
POST /feeds/default/private/full?ocr=true HTTP/1.1
OCR will only work well on high-resolution images. The quality of the extracted text isn't perfect yet, but we're busy improving it!
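As a rough illustration of the request above, here is a minimal Python sketch that builds (without sending) an OCR upload request. The host name, the build_ocr_upload helper, and the way the Authorization value is supplied are assumptions made for this example; only the path and the ocr=true parameter come from this post, so check the documentation for the headers your application actually needs.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the post shows only the request path, so this host is a guess.
BASE_URL = "https://docs.google.com/feeds/default/private/full"

# The three image types the post names as eligible for OCR.
OCR_CONTENT_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".gif": "image/gif",
}


def build_ocr_upload(filename, image_bytes, auth_header):
    """Build (but do not send) a POST request that asks for OCR on upload.

    Hypothetical helper for illustration; it is not part of any client library.
    """
    extension = os.path.splitext(filename)[1].lower()
    if extension not in OCR_CONTENT_TYPES:
        raise ValueError("OCR uploads must be .png, .jpg, or .gif: " + filename)

    # ocr=true is the parameter described in the post.
    url = BASE_URL + "?" + urlencode({"ocr": "true"})
    headers = {
        "Content-Type": OCR_CONTENT_TYPES[extension],
        # Assumption: pass whatever Authorization value your auth flow yields.
        "Authorization": auth_header,
    }
    return Request(url, data=image_bytes, headers=headers, method="POST")
```

Once you have valid credentials, the returned object could be passed to urllib.request.urlopen to perform the upload. Rejecting other file extensions up front simply mirrors the list of supported image types given above.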
Second, we have integrated Google Translate into the API, so you can translate a document during upload. Add the targetLanguage parameter, and optionally sourceLanguage, to your upload request:
POST /feeds/default/private/full?targetLanguage=de&sourceLanguage=en HTTP/1.1
If sourceLanguage is omitted, we'll try to auto-detect the document's language. All languages supported by Google Translate (full list here) are supported in the API.
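To make the optional sourceLanguage behavior concrete, here is a small Python sketch that assembles the upload URL for a translated document. The host name and the translation_upload_url helper are assumptions for illustration; the targetLanguage and sourceLanguage parameter names are the ones described above.

```python
from urllib.parse import urlencode

# Assumption: the post shows only the request path, so this host is a guess.
BASE_URL = "https://docs.google.com/feeds/default/private/full"


def translation_upload_url(target_language, source_language=None, base_url=BASE_URL):
    """Return an upload URL that requests translation into target_language.

    Hypothetical helper for illustration. When source_language is left out,
    the parameter is omitted so the service can try to detect the language.
    """
    params = {"targetLanguage": target_language}
    if source_language:
        params["sourceLanguage"] = source_language
    return base_url + "?" + urlencode(params)


# English to German, with the source language stated explicitly.
explicit_url = translation_upload_url("de", "en")

# German output, leaving source-language detection to the service.
detected_url = translation_upload_url("de")
```

Leaving sourceLanguage out of the query string entirely, rather than sending an empty value, matches the "omitted" case described above.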
As always, see the documentation for details. There's also a live demo (source will be available soon) up at googlecodesamples.com.