Online Web OCR API SDK

OCR API SDK info

OCR-IT Conversion Services have been designed specifically for litigation support needs. Trials with millions of pages of legal documents, often in multiple languages, place stringent requirements on the accuracy and speed of image-to-text conversion systems. Our customers asked, and we responded with a unique image-to-text conversion system that exceeds the legal community's requirements for accuracy and processing speed, as well as for ease of integration with existing platforms such as Opticon, IPRO, and Summation. We can accommodate multiple load file formats and match virtually any project requirement.

Text Recognition As A Service (SaaS)

Developers can access OCR Cloud 2.0 via an HTTP(S) POST interface. Whenever and wherever text recognition is required, images are transmitted to OCR Cloud 2.0, where instant load-balanced processing in the cloud performs OCR. The recognition result is returned via the same interface, either by polling for the result or through URL notification. OCR-IT OCR Cloud 2.0 is a robust and scalable platform.

OCR-IT Conversion: The Simpler Solution

OCR API Cloud software helps developers add character recognition capabilities to their software products for computers, online portals, and mobile devices. Trials have been successfully conducted on legal document files written in many different languages. This Optical Character Recognition service does not involve complex procedures such as software licensing. Developers submit images, even complex ones, through web services with a few lines of code, and receive accurately recognized text in the languages they choose.

Developers can easily access the OCR Cloud through an HTTP interface whenever they need to recognize text in image files. The files are sent to the OCR API, which processes them immediately, performs optical character recognition, and returns the result through the same HTTP interface.

This is an extremely scalable, virus-free platform, and its OCR is built for highly accurate results. OCR Cloud is a fully independent web service, powerful enough to be used from quality mobile devices and other web-based applications to convert image files into usable text through optical character recognition. The OCR library provides comprehensive solutions for those involved in litigation support and other corporate environments.

 

Mobile Friendly

OCR Cloud 2.0 is a powerful Web-based API that allows developers of mobile and small-footprint applications to integrate highly accurate Optical Character Recognition technology that converts images and photographs into manageable, usable, and searchable text. With advanced binarization, image pre-processing, and filtering algorithms, OCR Cloud 2.0 produces quality results even from less-than-perfect pictures.

 

SaaS

Developers can access OCR Cloud 2.0 via an HTTP or HTTPS interface. Whenever and wherever text recognition is required, images are transmitted to OCR Cloud 2.0 using this interface. Once transmitted, the images are processed in the cloud, where OCR is performed. The recognition result is returned via the same interface, either by polling for the result or through a notification. OCR-IT OCR Cloud 2.0 is a robust platform designed to operate at scale, and it is commercially deployed in numerous distributed and mobile environments.

 

Free Development Account

One of the most powerful automated ways to reach OCR Cloud 2.0 is through its Web-based API. This capability gives software developers flexible, powerful access to best-in-market, award-winning OCR on demand, without any initial investment. Free trial and production subscriptions are available online through an automated portal.

OCR CLOUD 2.0 API

OCR Cloud 2.0 is a state-of-the-art, cloud-hosted OCR platform designed to convert millions of pages ACCURATELY and EFFICIENTLY at unbeatable prices. It requires no complex software licensing or purchases and provides easy access to the best OCR technology, with integration possible within minutes.

To meet growing needs in distributed applications and mobile markets, OCR-IT LLC created OCR Cloud 2.0, the next-generation document conversion platform: a flexible, efficient, powerful, and scalable platform that can handle high volumes of pages and large numbers of requests. By combining best-of-breed OCR engines with industry-leading system integration expertise, OCR Cloud 2.0 now offers the highest-accuracy document conversion at unbeatable prices.

CAPABILITIES

OCR Cloud 2.0 platform can convert virtually any image (TIF, JPG, PNG, BMP) or PDF to any standard text-based document type (TXT, DOC, RTF, XLS, PPT, XML, HTML) or searchable PDF.

Auto-language detection and support for over 200 languages, including Latin-based languages, Cyrillic-based languages, Chinese, Japanese, Korean, Thai, and Hebrew.

ARCHITECTURE

Accurate and 100% automated to ensure privacy
OCR Cloud 2.0 is built on highly accurate automated text recognition technology and a modern, state-of-the-art platform. How accurate? Benchmark tests show that recognition from OCR-IT LLC delivers accuracy virtually on par with leading OCR software alternatives. A free development account offers full access to the API for your own evaluation.

Privacy and Security of Data

OCR Cloud 2.0 is a fully automated service without any human intervention. This matters because you, as a developer, and your users want to be certain that images stay secure and private. OCR-IT LLC understands this and treats security and privacy as top priorities. Our security mechanisms give you a variety of controls to access and delete your images once processing completes.

CLOUD BASED OCR API

Image Formats

  • PDF
  • TIF
  • JPG
  • JPEG2000
  • PNG
  • BMP

OCR Languages

  • Latin-based languages
  • Cyrillic-based languages
  • Chinese, Japanese, Korean, Thai
  • Hebrew
  • Arabic

Auto-language/multi-language detection and support for over 190 languages.

Output Formats

  • Cleaned JPG, TIF, PDF Image export
  • Searchable PDF (Text under/over image, PDF/A, PDF Compressed)
  • TXT (standard or Unicode)
  • DOC / DOCX / RTF
  • XLS / XLSX
  • XML
  • HTML
  • ODBC Compatible Databases

Multiple simultaneous output streams available.

Data Transfer Methods

  • HTTP POST via Web API

TABLE OF CONTENTS

  • Swagger URL
  • Sign Up and Pricing
  • Overview
  • 1. Submitting a Job
    Overview; Input Parameters: ApiKey (required), Profile (optional), InputFiles (required), Name (optional), Password (optional), InputUrl (required if InputBlob not provided), InputType (required in a few cases), InputBlob (optional), NotifyURL (optional), CleanupSettings (optional), OCRSettings (optional), OutputSettings (optional); Error Element on Failed Job Submission; Examples
  • 2. Handling Job Status
    2.1 Status for jobs in progress; 2.2 Status for successful jobs; 2.3 Status for expired jobs; 2.4 Status for failed jobs
  • 3. Cleanup Job
  • 4. Retrieving Job Results
  • 5. Per-Page Charges
  • 6. List of Supported Languages
    Languages with full dictionary support; Languages without dictionary support; Artificial languages; Formal languages; Note
  • Questions?

SWAGGER URL

Please visit the Swagger page for the latest description of the service methods. This document may become outdated, but the Swagger page is always current:

http://ocrapi.datacapture.cloud/swagger/ui/index.html

SIGN UP AND PRICING

GET PRICING INFORMATION AND SIGN UP FOR AN API KEY:

https://portal.datacapture.cloud/#/login

OVERVIEW

The DataCapture.cloud OCR Web API allows you to submit OCR requests (images in PDF / TIFF / PNG / JPG / BMP / PCX / DCX formats) and get back textual results (in TXT / PDF / RTF / Word / Excel / XML / CSV and other formats, with full Unicode support). Multilingual OCR is supported in a variety of languages (listed at the end of this document).

Key Features:

  • Support of Common Image Formats
  • Variety of Print Types
  • Image Cleanup: Deskew, Despeckle, Remove Texture, Automatic Rotation Detection
  • Over 180 OCR Languages
  • Mixed Languages Auto-detection
  • Barcode Recognition
  • Two Speeds of Text Recognition: Quality, Speed
  • Specialized Text Extraction Algorithms
  • All Popular Output Formats
  • Enhanced Error Handling

By using the various API settings, you can optimize the OCR process for a variety of sources (scans, digital camera images, etc.) and purposes (full-text indexing of articles, invoice scanning, etc.). Barcode scanning is also supported. For help optimizing the API for your particular task, please contact our Support Team.

Using the API consists of the following stages:

  • Submit a Job
  • Handle the Job Status – one or both of the following:
    • Check Job Status manually
    • Get notified about Job Status automatically
  • Get Results of a Job

1. SUBMITTING A JOB

OVERVIEW

 

Submit a job by sending an HTTP POST request to the following URL:

http://BASEURL/api/jobs

The request message body should contain JSON of the following format (explained in detail below):

{
  "apiKey": "",
  "profile": "",
  "notifyUrl": "",
  "inputFiles": [],
  "cleanupSettings": {},
  "ocrSettings": {},
  "outputSettings": {}
}

The Content-Type of the request should be “application/json”.
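The request described above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: BASE_URL is a placeholder for the service host (written BASEURL in this document), the helper names build_job_request and submit_job are hypothetical, and the body fills in only a few of the documented fields.

```python
import json
import urllib.request

# Placeholder host: substitute the real service address (BASEURL in this document).
BASE_URL = "http://BASEURL"

def build_job_request(api_key, input_url, notify_url=None, export_format="Text;PDF"):
    """Build a minimal job-submission body as a dict (hypothetical helper)."""
    body = {
        "apiKey": api_key,
        "inputFiles": [{"inputUrl": input_url}],
        "outputSettings": {"exportFormat": export_format},
    }
    if notify_url:
        body["notifyUrl"] = notify_url
    return body

def submit_job(body, base_url=BASE_URL):
    """POST the body as application/json to /api/jobs and return the parsed reply."""
    request = urllib.request.Request(
        base_url + "/api/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx replies (e.g. a 400 error).
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `reply = submit_job(build_job_request("YOUR_API_KEY", "http://www.example.com/images/scan001.tif"))`, after which `reply["jobUrl"]` holds the address to poll. Letting `json.dumps` serialize a dict avoids hand-escaping strings, and an error reply can be inspected by calling `.read()` on the raised `HTTPError`.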

In case of success, the response will be an HTTP 200 (Success) response code, and the following JSON (explained in detail below):

{
  "jobUrl": "string",
  "status": "Submitted"
}

In case of an error, an HTTP error code is returned along with JSON explaining the error (see Error Element on Failed Job Submission below).

You will be charged for 1 page of OCR upon successful job submission, and for the rest of the pages (in case of multi-page document) upon successful job completion. Please note that certain errors (such as a corrupt input file) can only be detected once you’ve already been charged for the 1st page.

 

INPUT PARAMETERS

 

APIKEY (REQUIRED)

 

This is your API key, which is issued to you when you subscribe to the DataCapture.cloud API (see Sign Up and Pricing above).

PROFILE (OPTIONAL):

 

If this parameter is specified, the API uses the settings of that profile to process the file. If it is not specified, the “Default” profile is used. Currently, only the “Default” profile is supported.

Please note that the OCR API does not currently share profiles with other DataCapture.cloud services.

INPUTFILES (REQUIRED):

 

An array of JSON objects, one object per file:

"inputFiles": [{
  "name": "string",
  "password": "string",
  "inputUrl": "string",
  "inputBlob": "string",
  "inputType": "string"
}]

NAME (OPTIONAL)

 

Provides the output filename. This parameter is currently not supported.

PASSWORD (OPTIONAL)

 

Required if the file is password-protected.

INPUTURL (REQUIRED IF INPUTBLOB NOT PROVIDED)

 

The URL of the image on which you want to perform OCR (must be http:// or https://)

NOTE 1: Make sure that the InputURL is properly escaped as a JSON string. This is especially a concern if the URL contains query parameters. For example, if your image is at:
      http://example.com/images?id=565&size=large,
the job request should be:
      {"inputFiles": [{"inputUrl": "http://example.com/images?id=565&size=large"}]}
The “&” in the URL needs no escaping in JSON; only characters such as double quotes and backslashes must be escaped.

Normally, if you use a standard library for dealing with JSON, this is done for you automatically. However, if you are constructing JSON manually from strings, you may need to do it yourself.

 

NOTE 2: Do not URL-encode (percent-encode) the InputURL. For example, if your image is at:
      http://example.com/My%20Picture.jpg,
the job request should be:
      {"inputFiles": [{"inputUrl": "http://example.com/My Picture.jpg"}]}
Note that a real space is used instead of the percent-encoded “%20”.

The image cannot exceed 200MB in size and cannot take more than 15 minutes to download.
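The two InputURL notes above can be sketched in Python: let a JSON library handle string escaping, and undo percent-encoding before sending. This is a minimal sketch with a hypothetical helper name; it assumes the whole URL was percent-encoded, because `urllib.parse.unquote` also decodes escapes such as `%26` or `%3F` that may be meaningful inside a query string.

```python
import json
from urllib.parse import unquote

def input_file_from_url(url):
    """Build an inputFiles entry, turning percent-escapes such as %20 back into characters.

    Assumes the whole URL was percent-encoded; unquote would also decode
    escapes like %26 that can change the meaning of a query string.
    """
    return {"inputUrl": unquote(url)}

# json.dumps performs JSON string escaping; "&" and spaces pass through unchanged.
request_body = json.dumps({"inputFiles": [input_file_from_url("http://example.com/My%20Picture.jpg")]})
```

Here `request_body` contains the literal URL `http://example.com/My Picture.jpg`, matching the NOTE 2 example, while a URL such as `http://example.com/images?id=565&size=large` passes through unchanged.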

The image must be in a supported format (see table below). If the image URL path (not counting the query string, if any) does not end in a dot followed by a supported extension (case-insensitive, see table below), the InputType parameter must be provided. E.g.:

http://example.com/scan001.tif – InputType not required (TIF auto-detected)
http://example.com/scan001.tif?resolution=high – InputType not required (TIF auto-detected)
http://example.com/scan001 – InputType required
http://example.com/scan001?format=.tif – InputType required

Supported formats and extensions are:

Format | Extensions | Supported format details
PDF | pdf | Version 1.6 or earlier
BMP | bmp | 2-bit – uncompressed black & white; 4- and 8-bit – uncompressed palette; 16-bit – uncompressed mask; 24-bit – uncompressed palette and TrueColor; 32-bit – uncompressed mask
PCX | pcx | 2-bit black & white; 4- and 8-bit gray
DCX | dcx | 2-bit black & white; 4- and 8-bit gray
JPG | jpg, jpeg | JPEG: gray, color; JPEG 2000: gray Part 1, color Part 1
TIF | tif, tiff | Black & white: uncompressed, CCITT3, CCITT3FAX, CCITT4, PackBits, ZIP, LZW; Gray: uncompressed, PackBits, JPEG, ZIP, LZW; TrueColor: uncompressed, JPEG, ZIP, LZW; Palette: uncompressed, PackBits, ZIP; Multi-image TIFF
PNG | png | Black & white, gray, color
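The auto-detection rule stated before the table (look only at the URL path, ignore the query string, match a supported extension case-insensitively) can be checked on the client before submitting. A minimal sketch, assuming the service applies the rule exactly as written; `detect_input_type` is a hypothetical helper, and the mapping is copied from the table above.

```python
from urllib.parse import urlsplit

# Extension -> InputType value, taken from the supported-formats table above.
EXTENSION_TO_TYPE = {
    "pdf": "PDF", "bmp": "BMP", "pcx": "PCX", "dcx": "DCX",
    "jpg": "JPG", "jpeg": "JPG", "tif": "TIF", "tiff": "TIF", "png": "PNG",
}

def detect_input_type(url):
    """Return the type the service should auto-detect, or None if InputType must be sent.

    Only the URL path is inspected; any query string is ignored.
    """
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return None
    extension = last_segment.rsplit(".", 1)[-1].lower()
    return EXTENSION_TO_TYPE.get(extension)
```

Usage: when `detect_input_type(url)` returns `None`, add an explicit `"inputType"` to that `inputFiles` entry; the four example URLs above come out as TIF, TIF, None, and None respectively.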

INPUTTYPE (REQUIRED IN A FEW CASES):

 

Specifies the input type. Must be one of the supported formats (leftmost column in the table above).

The InputType parameter is required if the image/file is provided through InputBlob, or if the type cannot be auto-detected from the URL (see InputURL above). In all other cases, it is optional.

INPUTBLOB (OPTIONAL)

 

The image/file can be posted as a base64 string in this parameter. In that case, the InputType parameter must also be specified (see above).

Please note that only one blob file can be posted per request. Also, a single request must not combine a regular file specified by InputUrl with a blob.
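Posting a file inline means base64-encoding its bytes and pairing the result with an explicit InputType, as described above. A minimal sketch; `input_file_from_bytes` is a hypothetical helper, and it deliberately omits `inputUrl` because the two must not be combined.

```python
import base64

def input_file_from_bytes(data, input_type):
    """Build an inputFiles entry carrying the file inline as a base64 string.

    inputType is mandatory for blobs, and no inputUrl is set alongside the blob.
    """
    if not input_type:
        raise ValueError("inputType is required when posting an inputBlob")
    return {
        "inputBlob": base64.b64encode(data).decode("ascii"),
        "inputType": input_type,
    }

# Hypothetical usage with a local scan:
# with open("scan001.tif", "rb") as f:
#     entry = input_file_from_bytes(f.read(), "TIF")
```

Since only one blob is allowed per request, the resulting entry would be the only element of `inputFiles`.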

NOTIFYURL (OPTIONAL):

 

The URL to which a notification should be sent when the job succeeds or fails (see section 2 on notifications). Must be http:// or https://.

NOTE: Like the InputURL, the NotifyURL must not be URL-encoded (i.e., it should use “ ” and not “%20”) and must be properly escaped as a JSON string. See the InputURL section above for more details and examples.

OCR API will send either a successful (see 2.2) or failed (2.4) status report to the webhook.
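A NotifyURL endpoint only needs to accept one HTTP POST with a JSON status report. The sketch below uses Python's built-in `http.server`; the handler name, the port, and replying with HTTP 200 are assumptions (this document does not say what response the service expects), and `parse_notification` only reads fields shown in sections 2.2 and 2.4.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_notification(body):
    """Return (jobUrl, status, [error codes]) from a notification's JSON body."""
    report = json.loads(body.decode("utf-8"))
    codes = [err.get("code") for err in report.get("errors", [])]
    return report.get("jobUrl"), report.get("status"), codes

class NotifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        job_url, status, codes = parse_notification(self.rfile.read(length))
        self.log_message("job %s: status %s, errors %s", job_url, status, codes)
        # Reply promptly: the document states there is only one notification attempt.
        self.send_response(200)
        self.end_headers()

def serve(port=8080):
    """Run the receiver (hypothetical port); expose it at the address used as notifyUrl."""
    HTTPServer(("", port), NotifyHandler).serve_forever()
```

Because only one notification attempt is made, a production receiver would typically persist the report before acknowledging it and fall back to polling the jobUrl if nothing arrives.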

CLEANUPSETTINGS (OPTIONAL):

 

Settings that control image cleanup, in the following form (every element is optional):

"cleanupSettings": {
  "deskew": true,
  "removeGarbage": true,
  "removeTexture": true,
  "splitDualPage": true,
  "rotationType": "NoRotation"
}

The settings are explained below:

Deskew (Boolean): Specifies whether the skew angle of an image should be corrected during preprocessing. This mode is recommended if you want to correct skew automatically for the images you work with. The default value is ‘true’.

RemoveGarbage (Boolean): Specifies whether garbage (excess dots smaller than a certain size) should be removed from the image during preprocessing. The default value is ‘true’.

RemoveTexture (Boolean): Specifies whether background noise should be cleared before recognition starts. The default value is ‘true’.

SplitDualPage (Boolean): Specifies whether the API should try to split the image vertically into 2 separate pages. The default value is ‘false’.

RotationType (String): Specifies what type of rotation is performed on the image during preprocessing. The default value is “Automatic”, which means that rotation is detected automatically. Allowed values:
  • NoRotation – no rotation
  • Automatic – auto-detect rotation
  • Clockwise – rotate by 90 degrees clockwise
  • Counterclockwise – rotate by 90 degrees counterclockwise
  • Upsidedown – rotate by 180 degrees
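Because RotationType is case-sensitive (a wrong value returns BadRotationType, per the error table later in this section), it can help to validate cleanup settings locally. A minimal sketch with a hypothetical helper; the defaults mirror the ones documented above.

```python
ROTATION_TYPES = {"NoRotation", "Automatic", "Clockwise", "Counterclockwise", "Upsidedown"}

def cleanup_settings(deskew=True, remove_garbage=True, remove_texture=True,
                     split_dual_page=False, rotation_type="Automatic"):
    """Return a cleanupSettings object using the documented defaults.

    rotationType is checked with exact (case-sensitive) matching before submission.
    """
    if rotation_type not in ROTATION_TYPES:
        raise ValueError(f"unsupported rotationType: {rotation_type!r}")
    return {
        "deskew": deskew,
        "removeGarbage": remove_garbage,
        "removeTexture": remove_texture,
        "splitDualPage": split_dual_page,
        "rotationType": rotation_type,
    }
```

Catching a typo such as `"norotation"` locally saves a round trip, and under the per-page charging rule, avoids spending a submission on a request that would be rejected.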

 

OCRSETTINGS (OPTIONAL):

 

Settings that control image recognition, in the following form (every element is optional):

"ocrSettings": {
  "speedOcr": false,
  "lookForBarcodes": true,
  "analysisMode": "MixedDocument",
  "printType": "Normal",
  "ocrLanguage": "English"
}

The settings are described below:

PrintType (Semicolon-delimited list of strings): Specifies the types of printed text in the image. The default value is “Normal”, which corresponds to common typographic text equivalent to laser printer output. Supported values:
  • Normal – modern text
  • Typewriter
  • Matrix
  • OCR_A – OCR-A text
  • OCR_B – OCR-B text
  • MICR_E13B

If you would like to recognize more than one text type in the same document, separate types with semicolons without spaces. For example, “Normal;Typewriter”.
OCRLanguage (Semicolon-delimited list of strings): Specifies which of the over 200 supported languages should be used for OCR, including mixed languages within the same document. See the list of supported languages at the end of this document (section 6). The default value is “English”. To specify more than one language, separate languages with semicolons (without spaces), for example: “English;Dutch (Belgium);Danish”.
SpeedOCR (Boolean): Provides faster recognition (by as much as 2-2.5 times, depending on server load) at the cost of a moderately increased error rate (1.5-2 times more errors). On good, print-quality texts, OCR makes on average 1-2 more errors per page in this mode, which in some cases is a small sacrifice for the substantial increase in speed. Such a moderate increase in error rate can easily be tolerated in many cases, such as full-text indexing with “fuzzy” searches or preliminary recognition. The default value is ‘false’.

AnalysisMode (String): Specifies how aggressively text should be extracted. The default value is “FullPageDocument”. Supported values:

FullPageDocument – Useful if you export your text to document archives: the full page layout is retained, and full-text search is available if you save in this mode. This mode looks for images and text within an image.

FullTextIndexing – Used to extract data from a document, including text in pictures. Note that the OCR retains both the picture and the text in it. Text extracted from a picture block can only be exported to TXT, PDF, and XML formats (XML export support is coming soon). The data can then be used for subsequent full-text indexing and search. The program retains the logical reading order, pictures, and tables.

InvoicePreprocessing – Used to pre-process invoices, which are usually noisy, low-quality images. This mode extracts all text from the image, including tables, pictures, small text areas, and noise. The result is plain text without table blocks and picture blocks.

ExtractBarcodes – Used to extract barcodes only.

NOTE: Barcode values are extracted in all modes as long as LookForBarcodes is true.

LookForBarcodes (Boolean): Specifies whether barcodes should be recognized. The default value is ‘true’.
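PrintType and OCRLanguage both take semicolon-separated lists with no spaces around the separators. A minimal sketch of building ocrSettings that way; `join_values` is a hypothetical helper, and it assumes language names are passed exactly as spelled in section 6, keeping spaces inside a name such as “Dutch (Belgium)”.

```python
def join_values(values):
    """Join setting values with ';' and no surrounding spaces, as PrintType and OCRLanguage expect."""
    cleaned = [value.strip() for value in values]
    if not cleaned or any(not value or ";" in value for value in cleaned):
        raise ValueError("need one or more non-empty values without ';'")
    return ";".join(cleaned)

ocr_settings = {
    "speedOcr": False,
    "lookForBarcodes": True,
    "analysisMode": "FullPageDocument",
    "printType": join_values(["Normal", "Typewriter"]),
    "ocrLanguage": join_values(["English", "Dutch (Belgium)", "Danish"]),
}
```

Stripping only leading and trailing whitespace preserves the internal spaces of multi-word names while still producing the “English;Dutch (Belgium);Danish” form shown above.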

 

OUTPUTSETTINGS (OPTIONAL)

 

Settings that control text result output, in the following form (every element is optional):

"outputSettings": {
  "exportFormat": "Text;PDF"
}

The settings are explained below:

ExportFormat (Semicolon-delimited list of strings): Specifies the desired formats for text output. The default value is “Text;PDF”, which corresponds to both Text and PDF output.

RTF – export to *.RTF (rich-text) format.  Retains full page layout and preserves pictures.  The program will automatically select the most suitable paper size when saving the recognized text and pictures.

MSWord – export to *.DOC (Microsoft Word) format.  Retains full page layout and preserves pictures.  The program will automatically select the most suitable paper size when saving the recognized text and pictures.

MSExcel – export to *.XLS (Microsoft Excel) format.  

PDF – export to *.PDF format  

DBF – export to *.DBF format

Text – export to *.TXT common formatted ASCII text-only output

CSV – export to *.CSV format

PPT – export to *.PPT format

XML – export to *.XML format

UnicodeText_UTF8 – export to *.UTF8.TXT format

UnicodeText_UTF16 – export to *.UTF16.TXT format

UnicodeCSV_UTF8 – export to *.UTF8.CSV format

UnicodeCSV_UTF16 – export to *.UTF16.CSV format

If you would like to produce more than one output format from the same image request, separate your desired output formats with semicolons without spaces. For example, “PDF;Text;UnicodeText_UTF8”.

NOTE: You will need to know the file extension of the desired format (specified above) to retrieve the job results (see section 2.2 of this document).
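Export format names are case-sensitive (see BadExportFormat in the error table below), so a local check against the names listed above can catch mistakes before a job is submitted and charged. A minimal sketch; `output_settings` is a hypothetical helper, and the set is copied from the list above.

```python
EXPORT_FORMATS = {
    "RTF", "MSWord", "MSExcel", "PDF", "DBF", "Text", "CSV", "PPT", "XML",
    "UnicodeText_UTF8", "UnicodeText_UTF16", "UnicodeCSV_UTF8", "UnicodeCSV_UTF16",
}

def output_settings(*formats):
    """Build outputSettings, defaulting to "Text;PDF" and rejecting unknown or mis-cased names."""
    if not formats:
        formats = ("Text", "PDF")
    unknown = [name for name in formats if name not in EXPORT_FORMATS]
    if unknown:
        raise ValueError(f"unsupported exportFormat value(s): {unknown}")
    return {"exportFormat": ";".join(formats)}
```

Usage: `output_settings("PDF", "Text", "UnicodeText_UTF8")` yields the “PDF;Text;UnicodeText_UTF8” example above, while `output_settings("pdf")` raises.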

 

ERROR ELEMENT ON FAILED JOB SUBMISSION

 

If the job submission fails, you will receive an appropriate HTTP error code, along with a response identifying the error Code. The possible values of ‘Code’ are:

Code | HTTP error code | Description
BadInputURL | 400 | InputURL is invalid or missing, or is not an HTTP/HTTPS URL
BadNotifyURL | 400 | NotifyURL is invalid or missing, or is not an HTTP/HTTPS URL
BadInputType | 400 | The specified InputType is invalid, OR InputType is missing and the auto-detected file type is not valid, OR InputType is missing and auto-detection of the file type has failed
BadRotationType | 400 | RotationType specified in CleanupSettings is invalid. Please note that it is case-sensitive.
BadAnalysisType | 400 | AnalysisMode specified in OCRSettings is invalid. Please note that it is case-sensitive.
BadPrintType | 400 | PrintType specified in OCRSettings is invalid. Please note that it is case-sensitive.
BadExportFormat | 400 | ExportFormat specified in OutputSettings is invalid. Please note that it is case-sensitive.
OCRSettingsTooComplex | 400 | OCRSettings are too complex. Try reducing the number of OCRLanguages and PrintTypes you are recognizing.
InternalError:ErrorNumber | 500 | An internal error has occurred. Contact support@wisetrend.com

EXAMPLES

 

URL Example:

HTTP POST to http://BASEURL/api/jobs 

Message body example (simple):

{
  "apiKey": "YOUR_API_KEY",
  "inputFiles": [
    {
      "inputUrl": "http://www.example.com/images/scan001.tif"
    }
  ]
}

Message body example (with full parameters):

{
  "apiKey": "YOUR_API_KEY",
  "notifyUrl": "http://example.com/notify",
  "inputFiles": [
    {
      "inputUrl": "http://www.example.com/getScans.php?DocumentID=569",
      "inputType": "TIF"
    }
  ],
  "cleanupSettings": {
    "deskew": true,
    "removeGarbage": true,
    "removeTexture": true,
    "splitDualPage": true,
    "rotationType": "NoRotation",
    "outputFormat": "pdf",
    "resolution": "high",
    "jpegQuality": "string"
  },
  "ocrSettings": {
    "speedOcr": true,
    "lookForBarcodes": true,
    "analysisMode": "MixedDocument",
    "printType": "Print",
    "ocrLanguage": "French"
  },
  "outputSettings": {
    "exportFormat": "Text;PDF"
  }
}

Response example (status “Submitted”):

{
  "jobUrl": "http://BASEURL/api/Jobs?JobId=00000000-0000-0000-0000-000000000000",
  "status": "Submitted"
}

See next section for different available Status responses.

2. HANDLING JOB STATUS

There are two ways to handle job status:

  • You can manually check the status of any job by sending an HTTP GET request to the JobURL that you received when you submitted the job.
  • You can automatically get notified when the job succeeds or fails if you provide a NotifyURL when you submit a job. There will only be one attempt to notify you. It will be made when the job fully succeeds or fails (you will not get any intermediate status notifications). The notification will consist of an HTTP POST containing JSON status information (see 2.2 and 2.4).

Regardless of which method you use, the status report is in the same format, as described below.

2.1 STATUS FOR JOBS IN PROGRESS

 

For jobs that are not yet complete, the status report looks as follows:

{
  "jobUrl": "http://BASEURL/api/Jobs?JobId=00000000-0000-0000-0000-000000000000",
  "status": "[status]"
}

“Status” can be one of:
  • “Submitted” – the job has been submitted, but the image to be OCRed has not yet been downloaded
  • “Processing” – the image has been downloaded and is being OCRed
  • “Finished” – successful, expired, or failed jobs

“JobURL” repeats the URL where updated job status may be obtained.
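Manual status checking amounts to polling the JobURL until the status leaves the in-progress values listed above. A minimal sketch: `wait_for_job` and `fetch_status` are hypothetical helpers, the 5-second interval and 15-minute timeout are arbitrary assumptions rather than documented limits, and `fetch` is injectable so the loop can be exercised without a network.

```python
import json
import time
import urllib.request

IN_PROGRESS = {"Submitted", "Processing"}

def fetch_status(job_url):
    """GET the JobURL and return the parsed status report."""
    with urllib.request.urlopen(job_url) as response:
        return json.loads(response.read().decode("utf-8"))

def wait_for_job(job_url, fetch=fetch_status, interval=5.0, timeout=900.0):
    """Poll until the status is no longer Submitted/Processing, then return the report."""
    deadline = time.monotonic() + timeout
    while True:
        report = fetch(job_url)
        if report.get("status") not in IN_PROGRESS:
            return report
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {report.get('status')} after {timeout} s")
        time.sleep(interval)
```

Treating any status outside the two in-progress values as final keeps the loop correct whether the terminal report says “Finished”, “Expired”, or “Failed”; the caller then inspects the `download` or `errors` fields.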

2.2 STATUS FOR SUCCESSFUL JOBS

 

For jobs that have completed successfully, the status report looks as follows:

{
  "jobUrl": "http://BASEURL/api/Jobs?JobId=00000000-0000-0000-0000-000000000000",
  "status": "Finished",
  "download": [
    {
      "uri": "http://ocrapi.datacapture.cloud/api/Files?JobId=00000000-0000-0000-0000-000000000000&outputFormat=pdf",
      "outputFormat": "pdf",
      "creationDateUTC": "2017-04-01T06:52:08.839Z"
    }
  ],
  "statistics": {
    "files": [
      {
        "fileName": "readme",
        "downloadDateUTC": "2017-04-01T06:52:08.839Z",
        "warning": "string",
        "totalCharacters": 5594,
        "uncertainCharacters": 123,
        "pagesArea": 3
      }
    ],
    "creationDateUTC": "2017-04-01T06:52:08.839Z",
    "totalCharacters": 5594,
    "uncertainCharacters": 123,
    "pagesArea": 3
  }
}

There will be one entry in the “download” array for each requested output format: by default, one for TXT (plain text) and one for PDF. The entries may appear in any order. Each contains an “outputFormat” indicating the output type (file extension) and a “uri” containing the address where the output may be downloaded.

As usual, “JobURL” repeats the URL where updated job status may be obtained.
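Reading a successful report mostly means indexing the `download` array by `outputFormat`. A minimal sketch with hypothetical helpers; the field names come from the sample above, while treating `uncertainCharacters / totalCharacters` as a rough quality signal is an assumption, since this document does not define those statistics further.

```python
def download_links(report):
    """Map each outputFormat in a finished status report to its download URI."""
    return {item["outputFormat"]: item["uri"] for item in report.get("download", [])}

def uncertain_share(report):
    """Fraction of characters flagged as uncertain in the job-level statistics (assumed meaning)."""
    stats = report.get("statistics", {})
    total = stats.get("totalCharacters", 0)
    return stats.get("uncertainCharacters", 0) / total if total else 0.0
```

For the sample report, `download_links` returns a single `"pdf"` entry and `uncertain_share` is 123 / 5594, roughly 2.2%. An expired report has no `download` array, so `download_links` returns an empty dict.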

2.3 STATUS FOR EXPIRED JOBS

 

Job results are not guaranteed to be kept for more than 24 hours. If a job has expired, its status report will not have a “download” array, and the “status” will be “Expired”.

2.4 STATUS FOR FAILED JOBS

 

For jobs that have failed, the status report looks as follows:

{
  "jobUrl": "http://ocrapi.datacapture.cloud/api/Jobs?JobId=00000000-0000-0000-0000-000000000000",
  "status": "Failed",
  "errors": [
    {
      "code": "string",
      "message": "string"
    }
  ]
}

The “status” may be one of the following:

Status | Description
FailedDownload | Could not download the image to be OCRed
FailedConversion | Could not perform OCR
FailedNoFunds | Insufficient funds for the number of pages you are attempting to OCR
FailedInternalError | Internal error, please contact support@wisetrend.com

The “errors” array may or may not be present. If it is present, it may contain one or more objects with “code” and “message” fields that can help you debug the problem. Here are some common “code” values:

Code | Description
ConvertFailed | The ABBYY OCR engine reported an error during conversion. Make sure that the input file is not corrupt and is not password-protected.
SubmitFailed | Could not submit the OCR job. Possibly an internal error; contact support@wisetrend.com
DownloadRejected | Could not download the input image. Ensure that it does not exceed the maximum size and that the server hosting the image responds promptly.
DownloadFailed | Could not download the input image. Ensure that the image URL exists and does not require authentication.

As usual, “JobURL” repeats the URL where updated job status may be obtained.

3. CLEANUP JOB

Send an HTTP POST request to http://BASEURL/api/[jobId]/cleanup, passing the apiKey in the request body.

This method sets the job status to “Expired”; after that, no further information about the job, including files and statistics, can be retrieved through the API.
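A cleanup call needs the job ID, which appears as the JobId query parameter of the jobUrl, plus the API key in the body. A minimal sketch that builds the request without sending it. Both helper names are hypothetical, and sending `apiKey` as a JSON object with an `application/json` Content-Type is an assumption, since this section says only that the key is passed in the body.

```python
import json
import urllib.request
from urllib.parse import parse_qs, urlsplit

def job_id_from_url(job_url):
    """Extract the JobId query parameter from a jobUrl such as .../api/Jobs?JobId=..."""
    return parse_qs(urlsplit(job_url).query)["JobId"][0]

def cleanup_request(base_url, job_id, api_key):
    """Build, without sending, a POST to /api/[jobId]/cleanup with apiKey in a JSON body (assumed)."""
    return urllib.request.Request(
        f"{base_url}/api/{job_id}/cleanup",
        data=json.dumps({"apiKey": api_key}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send it once you have downloaded the results:
# urllib.request.urlopen(cleanup_request("http://BASEURL", job_id, "YOUR_API_KEY"))
```

Because cleanup makes files and statistics unavailable, it belongs after all results have been retrieved.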

4. RETRIEVING JOB RESULTS

To get the results of the job, use the URLs from the successful job status reports (see section 2.2 above). Results will be returned with the correct Content-Type header. Note that results may be deleted after 7 days.
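Since results may be deleted after a retention period, downloading every listed file as soon as the job finishes is a reasonable pattern. A minimal sketch, assuming the `outputFormat` value is safe to use as a file extension; `save_results` and `fetch_bytes` are hypothetical helpers, and `fetch` is injectable for offline testing.

```python
import os
import urllib.request

def fetch_bytes(uri):
    """Download one result file and return its raw bytes."""
    with urllib.request.urlopen(uri) as response:
        return response.read()

def save_results(report, out_dir, stem="result", fetch=fetch_bytes):
    """Save every file in a finished report as <stem>.<outputFormat>; return the written paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for item in report.get("download", []):
        path = os.path.join(out_dir, f"{stem}.{item['outputFormat']}")
        with open(path, "wb") as f:
            f.write(fetch(item["uri"]))
        paths.append(path)
    return paths
```

Writing in binary mode preserves PDF and other non-text outputs byte for byte. The Content-Type header mentioned above could also be used to choose the extension if `outputFormat` were ever missing.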

5. PER-PAGE CHARGES

You will be charged for 1 page at the time the OCR request is submitted (regardless of whether the job fails or succeeds) – this is the minimum charge to attempt a job. You will be charged for the rest of the pages only when the job succeeds.
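The charging rule reduces to a one-line calculation. A minimal sketch with a hypothetical function name: a 12-page document costs 12 pages if the job succeeds and 1 page if it fails.

```python
def pages_charged(total_pages, succeeded):
    """Pages billed for one job under the rule above.

    One page is charged at submission regardless of outcome; the remaining
    pages are charged only if the job succeeds.
    """
    if total_pages < 1:
        raise ValueError("a job has at least one page")
    return total_pages if succeeded else 1
```

This also shows why validating settings locally pays off: every submission, even one that fails on a corrupt input file, incurs at least the 1-page minimum.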

6. LIST OF SUPPORTED LANGUAGES

LANGUAGES WITH FULL DICTIONARY SUPPORT

  • Armenian (Eastern) 
  • Armenian (Grabar) 
  • Armenian (Western) 
  • Bashkir 
  • Bulgarian 
  • Catalan 
  • Chinese Simplified*
  • Chinese Traditional*
  • Croatian 
  • Czech 
  • Danish 
  • Dutch (Belgium) 
  • Dutch (Netherlands) 
  • English 
  • Estonian 
  • Finnish 
  • French 
  • German
  • German (new spelling) 
  • Greek 
  • Hebrew* 
  • Hungarian 
  • Indonesian 
  • Italian 
  • Japanese*
  • Korean*
  • Latvian 
  • Lithuanian 
  • Norwegian (Group of Norwegian (Nynorsk) and Norwegian (Bokmal) languages.) 
  • Norwegian (Bokmal) 
  • Norwegian (Nynorsk) 
  • Old English 
  • Old French
  • Old German 
  • Old Italian 
  • Old Spanish 
  • Polish 
  • Portuguese (Brazil) 
  • Portuguese (Portugal) 
  • Romanian 
  • Russian 
  • Slovak 
  • Slovenian 
  • Spanish 
  • Swedish 
  • Tatar 
  • Turkish 
  • Ukrainian

LANGUAGES WITHOUT DICTIONARY SUPPORT

 

  • Abkhaz
  • Adyghe
  • Afrikaans
  • Agul
  • Albanian
  • Altaic
  • Avar
  • Aymara
  • Azerbaijani (Cyrillic)
  • Azerbaijani (Latin)
  • Basque
  • Belarussian
  • Bemba
  • Blackfoot
  • Breton
  • Bugotu
  • Buryat
  • Cebuano
  • Chamorro
  • Chechen
  • Chukchee
  • Chuvash
  • Corsican
  • Crimean Tatar
  • Crow
  • Dakota
  • Dargwa
  • Dungan
  • Eskimo (Cyrillic)
  • Eskimo (Latin)
  • Even
  • Evenki
  • Faroese
  • Fijian
  • Frisian
  • Friulian
  • Gagauz
  • Galician
  • Ganda
  • German (Luxembourg)
  • Guarani
  • Hani
  • Hausa
  • Hawaiian
  • Icelandic
  • Ingush
  • Irish
  • Jingpo
  • Kabardian
  • Kalmyk
  • Karachay-Balkar
  • Karakalpak
  • Kasub
  • Kawa
  • Kazakh
  • Khakas
  • Khanty
  • Kikuyu
  • Kirghiz
  • Kongo
  • Koryak
  • Kpelle
  • Kumyk
  • Kurdish
  • Lak
  • Latin
  • Lezgin
  • Luba
  • Macedonian
  • Malagasy
  • Malay
  • Malinke
  • Maltese
  • Mansi
  • Maori
  • Mari
  • Maya
  • Miao
  • Minangkabau
  • Mohawk
  • Mongol
  • Mordvin
  • Nahuatl
  • Nenets
  • Nivkh
  • Nogay
  • Nyanja
  • Ojibway
  • Ossetian
  • Papiamento
  • Provencal
  • Quechua
  • Rhaeto-Romanic
  • Romanian (Moldavia)
  • Romany
  • Ruanda
  • Rundi
  • Russian (old spelling)
  • Sami (Lappish)
  • Samoan
  • Scottish Gaelic
  • Selkup
  • Serbian (Cyrillic)
  • Serbian (Latin)
  • Shona
  • Somali
  • Sorbian
  • Sotho
  • Sunda
  • Swahili
  • Swazi
  • Tabassaran
  • Tagalog
  • Tahitian
  • Tajik
  • Tok Pisin
  • Tongan
  • Tswana
  • Tun
  • Turkmen
  • Tuvan
  • Udmurt
  • Uighur (Cyrillic)
  • Uighur (Latin)
  • Uzbek (Cyrillic)
  • Uzbek (Latin)
  • Welsh
  • Wolof
  • Xhosa
  • Yakut
  • Zapotec
  • Zulu

ARTIFICIAL LANGUAGES

 

  • Esperanto 
  • Ido
  • Interlingua 
  • Occidental
 

FORMAL LANGUAGES

  • Basic
  • C/C++
  • COBOL
  • Fortran
  • Java
  • Pascal
  • Simple chemical formulas
  • MICR (E-13B) – recognition language for the MICR (E-13B) text type
  • Numbers Only

NOTE

Languages marked with “*” are not available in this API release but may be available by special request. Limited export formats and combinations of languages are available. Consult additional documentation or contact the DataCapture.cloud team for assistance.

QUESTIONS

Contact support@wisetrend.com