Releases · itext/itext-pdfocr-dotnet

22 Oct 07:27

ars18wrw

1.0.2

6386958

pdfOCR 1.0.2

pdfOCR 1.0.2 is already the third release of our newest project.

It brings two important improvements that let you process documents more precisely:

Refinement of symbol positions by combining the HOCR data with the TXT output, which fixes the output for Thai and some CJK fonts. This is especially important for our pdfCalligraph customers.

You can turn it on with: tesseract4OcrEngineProperties.setUseTxtToImproveHocrParsing(true);

Configurable image preprocessing. This smooths out fluctuations in a document's brightness and gives better results for images taken with a camera.
You can pass the parameters described at http://www.leptonica.org/binarization.html (adaptive threshold tile size and threshold smoothing) using tesseract4OcrEngineProperties.setImagePreprocessingOptions
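
To illustrate why a tile size matters for camera images, here is a minimal, self-contained sketch of tile-based thresholding. It is not the Leptonica or pdfOCR implementation: the class name TileThresholdSketch, the binarize method and the use of a plain per-tile mean as the threshold are all assumptions made for this example, and threshold smoothing is left out.

```java
public class TileThresholdSketch {

    /**
     * Hypothetical helper: marks a pixel as dark (true) when it is darker than
     * the mean brightness of the tile it belongs to. The gray array holds
     * brightness values, one row per image line.
     */
    public static boolean[][] binarize(int[][] gray, int tileWidth, int tileHeight) {
        int height = gray.length;
        int width = gray[0].length;
        boolean[][] dark = new boolean[height][width];

        for (int tileY = 0; tileY < height; tileY += tileHeight) {
            for (int tileX = 0; tileX < width; tileX += tileWidth) {
                int yEnd = Math.min(tileY + tileHeight, height);
                int xEnd = Math.min(tileX + tileWidth, width);

                // Each tile gets its own threshold: the mean of its pixels.
                long sum = 0;
                for (int y = tileY; y < yEnd; y++) {
                    for (int x = tileX; x < xEnd; x++) {
                        sum += gray[y][x];
                    }
                }
                double threshold = (double) sum / ((yEnd - tileY) * (xEnd - tileX));

                for (int y = tileY; y < yEnd; y++) {
                    for (int x = tileX; x < xEnd; x++) {
                        dark[y][x] = gray[y][x] < threshold;
                    }
                }
            }
        }
        return dark;
    }

    public static void main(String[] args) {
        // Left half is brightly lit, right half is in shadow.
        int[][] page = {
            {200, 120, 90, 30},
            {200, 120, 90, 30}
        };
        boolean[][] perTile = binarize(page, 2, 2);
        boolean[][] global = binarize(page, 4, 2);
        System.out.println("per-tile, pixel (0,1) dark: " + perTile[0][1]);
        System.out.println("global,   pixel (0,1) dark: " + global[0][1]);
    }
}
```

With small tiles, a pixel is compared only with its neighbourhood, so ink in the bright half of the page can still be detected. With a single tile covering the whole image, the same pixel is compared against the page-wide mean and is lost.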

Improvements

Combine HOCR and TXT outputs for more precise text recognition
Add the ability to set image preprocessing properties (adaptive threshold tile size, threshold smoothing)

21 Oct 12:07

Snipx

1.0.1

0400706

pdfOCR 1.0.1

Hot on the heels of our initial release, we're releasing 1.0.1 already!

We've improved the way word bounding boxes are calculated, so that in languages where ligatures are required, we can properly detect the text and render each character correctly.

Improvements

Improvements in word bbox calculation

26 Jun 10:42

Snipx

1.0.0

30d7642

pdfOCR 1.0.0

We are proud to announce the first release of pdfOCR, the newest addition to our iText 7 Suite, which enables you to OCR your images into fully ISO-compliant PDF or PDF/A-3u files, making it possible to access and process the text they contain.

Given that we rely on the open-source Tesseract 4.x project to do the heavy lifting, we couldn't, in good conscience, not make this add-on open source as well.

You may also notice that we have split the project into two: an API module and an implementation module for Tesseract. In essence, this means that you can hook up other OCR engines to iText, and it also leaves the door open for us to offer our users more options to choose from.
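
As a rough illustration of what such a split makes possible, the sketch below separates a small engine interface from one stand-in implementation. Every name in it (OcrModuleSketch, OcrEngine, EchoEngine, recognizeAll) is hypothetical and chosen for this example only; it does not reproduce the actual pdfOCR API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OcrModuleSketch {

    /** Hypothetical "API module": the only contract that calling code depends on. */
    public interface OcrEngine {
        String recognize(String imagePath);
    }

    /** Hypothetical "implementation module": a stand-in that performs no real OCR. */
    public static class EchoEngine implements OcrEngine {
        @Override
        public String recognize(String imagePath) {
            return "text from " + imagePath;
        }
    }

    /** Calling code is written against the interface, so engines can be swapped freely. */
    public static List<String> recognizeAll(OcrEngine engine, List<String> imagePaths) {
        List<String> pages = new ArrayList<>();
        for (String path : imagePaths) {
            pages.add(engine.recognize(path));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> images = Arrays.asList("scan1.png", "scan2.png");
        System.out.println(recognizeAll(new EchoEngine(), images));
    }
}
```

Because recognizeAll only sees the interface, replacing EchoEngine with another engine requires no changes to the calling code.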
