
Commit f271a87

[RELEASE] iText 7 pdfOcr - 1.0.0
https://git.itextsupport.com/ * release/1.0.0:
  [RELEASE] 1.0.0-SNAPSHOT -> 1.0.0
  Hide possibility to set userWords
  Refactor MultiThreadingTest test to reuse code from IntegrationTestHelper
  Make the scope of a method stricter
  A couple of small fixes to remove workarounds from code
  Allow tesseract4 events from com.itextpdf.pdfocr space
  Implement pdfOcr licensing
  Create jar with sources in Maven
  User words file is unexpectedly removed from disk
  Increase test timeouts
  Deploy jar with sources to Artifactory as the add-on is open source
  Make pdfOcr classes autoportable
  Remove licensekey version property from root pom file
  Fix issue with saving processed images
  Performance drop on some complex halftone images
  Improve artifact descriptions
  Improve Javadocs for Tesseract implementations
  Small fix to avoid inner class in .NET
  Change in Jenkinsfile to abort possible already running automatic builds
  Change in Jenkinsfile so that the automatic build is only blocked when the build for itextcore for Java is running
  Hide AbstractTesseract4OcrEngine#doTesseractOcr(File , List<File>, OutputFormat, int)
  AbstractIntegrationTest#testSimpleTextOutput is triggered 13 times
  PDFOC-89 Add copyright headers
  Add license information
  Fix several Javadoc and code remarks
  PDFOC-84 Throw proper exceptions in case the Tesseract prerequisites have possibly not been met
  Add FontProvider mechanism
  PDFOC-73 Update .mailmap
  Improve test coverage
  Add ActualText if there are NotDef glyphs
  Introduce an option not to add layers to output PDF file
  PDFOC-74 Move NOTICE.txt to another directory
  Update log message
  Update comments
  Update command structure for executable
  Fix remarks related to TesseractOcrUtil class and add check for NOTDEF glyphs
  Update target branch for sonar
  Fix various code remarks
  Fix various code and API design remarks
  Split to two modules
  Change name of root artifact
  Remove clirr-maven-plugin
  Improve test coverage
  Split to two modules
  Fix for SonarQube analysis
  PDFOC-65 Fix various code remarks in test code
  PDFOC-65 Fix various code remarks in test code
  Set user_defined_dpi
  Fix various code and API design remarks
  Fix various code and API design remarks
  On Linux the VM crashes at times to build the Java version of pdfOCR
  Remove vulnerable dependency
  On Linux the VM crashes at times to build the Java version of pdfOCR
  Fix various code remarks
  Build only on windows until PDFOC-68 is fixed
  Add category to tests
  Refactor test to junit ExpectedException
  Add test for invalid font
  Fix javadoc issues
  Fix code style for enums
  Rename test files
  Refactor ocr images method and remove ImgFormat enum
  Add license info for fonts
  Refactor exceptions and log messages
  Remove unused method
  Add tests for log messages
  Remove unused test files
  Remove commented code
  Change Jsoup to styled-xml-parser and fix according to review
  Refactoring for porting to .net
  Update test files
  Fix for user words
  Update dependencies
  Update tests
  Fix text positioning
  Add tests for PDFCOC-31,32,33,34
  Refactor image preprocessing
  Add tests for ppm images
  Remove creating the sources jar from pom.xml
  Remove creating the sources jar from the Jenkinsfile
  Fix ocr for ppm images
  Fix Jenkinsfile: mvn workspace repository for Windows machines
  Check ppm on linux
  Fix for eng language
  Fix for tmp file in tmp directory
  Add default language with adding user-words
  Fix getting font path
  Fix for embedded font in jar
  Refactoring for porting to .net
  Add custom user words
  Fix wrong message in OCRException
  Add tests for text files
  Add tests for text file output
  Add possibility to OCR to a file + refactoring for multipage tiffs
  Small refactoring, add test for ppm images
  Fix for PNM images
  Fix for tif images
  Performance improvements of Jenkins builds
  Update text positioning
  PDFOC-18 Add gitattributes
  Update default font
  Add .gitignore
  Add tests for path to hocr script
  Update preprocessing
  Fix tests
  Make path to tess data mandatory
  Add separator for tess data path
  Remove createPdfA3u parameter
  Update tests with transparent text
  Update TextInfo to public
  Change default text color to transparent
  Add new font for tests
  Add comments
  Add greek test
  Add missed test pdf
  Update compare tool test
  Replace few tests using compare tool
  Fix for tiff images
  Add preprocessing and fix tests
  Add logging for exceptions
  Move tests for lib
  Add tesseract lib and tests
  Update images coordinates calculation
  Update tesseract dir
  Add null check for imagedata
  Update scale mode tests
  Update default scale mode
  Update exception handling
  Add new test image
  Add empty text test
  Update tests and code style
  Add basic exception handling and cosmetic refactoring
  Add placeholder in case of corrupted images
  Update tests with compare tool
  Refactoring according to the checkstyle plugin checks
  Add japicmp plugin
  Clean up dependencies
  Fix logging lib
  Add tests for tiff
  Update tests for new tess data files
  Add tests for scripts
  Update directories structure and add tests for languages
  Update exception handling
  Update temp filenames in tests
  Add tests for pdfa3u
  Add tests and update structure
  Add extended tests using compare tool first approach
2 parents 65582e2 + 306e56c commit f271a87


185 files changed, +14404 -82 lines changed


.gitattributes

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# Set the default behavior, in case people don't have core.autocrlf set.
+* text=auto
+
+# Explicitly declare text files you want to always be normalized and converted
+# to LF line endings on checkout.
+*.afm text eol=lf
+*.cmap text eol=lf
+*.cs text eol=lf ident
+*.css text eol=lf
+*.htm text eol=lf
+*.html text eol=lf
+*.java text eol=lf ident
+*.lng text eol=lf
+*.md text eol=lf
+*.pom text eol=lf
+*.properties text eol=lf
+*.svg text eol=lf
+*.txt text eol=lf
+*.xfdf text eol=lf
+*.xht text eol=lf
+*.xhtml text eol=lf
+*.xml text eol=lf
+port-hash text eol=lf
+
+# Declare files that will always have CRLF line endings on checkout.
+*.bat text eol=crlf
+*.csproj text eol=crlf
+*.sln text eol=crlf
+
+# Denote all files that are truly binary and should not be modified.
+*.aif binary
+*.aiff binary
+*.bmp binary
+*.cer binary
+*.cmp binary
+*.crt binary
+*.dib binary
+*.gif binary
+*.icc binary
+*.j2k binary
+*.jb2 binary
+*.jp2 binary
+*.jpc binary
+*.jpg binary
+*.key binary
+*.otf binary
+*.p12 binary
+*.pdf binary
+*.pfb binary
+*.pfm binary
+*.png binary
+*.snd binary
+*.tif binary
+*.tiff binary
+*.ttc binary
+*.ttf binary
+*.u3d binary
+*.wav binary
+*.wmf binary
+*.woff binary
+*.woff2 binary
+*.dat binary
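To illustrate how a path is resolved against rules like these, here is a minimal Java sketch under simplifying assumptions. The EolRules class and its methods are hypothetical (not part of pdfOcr or git). Like git, it lets a later matching line override an earlier one and matches slash-free patterns against the file's base name, but it keeps a single end-of-line mode per file, whereas git resolves each attribute separately. It also does not model git's text detection or core.eol/core.autocrlf, which decide what text=auto actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of pdfOcr or git): a simplified model of the rules above.
final class EolRules {
    enum Eol { LF, CRLF, BINARY, AUTO }

    private static final class Rule {
        final Pattern pattern;
        final Eol eol;

        Rule(Pattern pattern, Eol eol) {
            this.pattern = pattern;
            this.eol = eol;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parses lines such as "*.java text eol=lf ident" or "*.png binary";
    // blank lines and "#" comments are skipped.
    void addLine(String line) {
        String trimmed = line.strip();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return;
        }
        String[] parts = trimmed.split("\\s+");
        Eol eol = null;
        for (int i = 1; i < parts.length; i++) {
            switch (parts[i]) {
                case "binary": eol = Eol.BINARY; break;
                case "eol=lf": eol = Eol.LF; break;
                case "eol=crlf": eol = Eol.CRLF; break;
                case "text=auto": if (eol == null) { eol = Eol.AUTO; } break;
                default: break; // "text", "ident", ... do not change the EOL mode in this sketch
            }
        }
        if (eol != null) {
            rules.add(new Rule(globToRegex(parts[0]), eol));
        }
    }

    // Later lines override earlier ones, so scan from the bottom up.
    Eol resolve(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        for (int i = rules.size() - 1; i >= 0; i--) {
            Rule rule = rules.get(i);
            if (rule.pattern.matcher(name).matches()) {
                return rule.eol;
            }
        }
        return Eol.AUTO;
    }

    // Line endings a checkout would write for LF/CRLF; AUTO and BINARY are left untouched.
    static String normalize(String content, Eol eol) {
        String lf = content.replace("\r\n", "\n");
        switch (eol) {
            case LF: return lf;
            case CRLF: return lf.replace("\n", "\r\n");
            default: return content;
        }
    }

    // Translates the "*" and "?" wildcards; every other character matches literally.
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append("[^/]*");
            } else if (c == '?') {
                regex.append("[^/]");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }
}
```

With the file above loaded line by line, a path such as src/main/java/Foo.java resolves to LF because the *.java line comes after * text=auto, while a file with no more specific rule falls back to AUTO.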

.gitignore

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
+# Created by https://www.gitignore.io
+
+### Java ###
+*.class
+
+# Mobile Tools for Java (J2ME)
+.mtj.tmp/
+
+# Package Files #
+*.jar
+*.war
+*.ear
+
+# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
+hs_err_pid*
+
+
+### Eclipse ###
+*.pydevproject
+.metadata
+.gradle
+bin/
+tmp/
+*.tmp
+*.bak
+*.swp
+*~.nib
+local.properties
+.settings/
+.loadpath
+
+# Eclipse Core
+.project
+
+# External tool builders
+.externalToolBuilders/
+
+# Locally stored "Eclipse launch configurations"
+*.launch
+
+# CDT-specific
+.cproject
+
+# JDT-specific (Eclipse Java Development Tools)
+.classpath
+
+# PDT-specific
+.buildpath
+
+# sbteclipse plugin
+.target
+
+# TeXlipse plugin
+.texlipse
+
+
+### Intellij ###
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm
+
+*.iml
+
+## Directory-based project format:
+.idea/
+# if you remove the above rule, at least ignore the following:
+
+# User-specific stuff:
+# .idea/workspace.xml
+# .idea/tasks.xml
+# .idea/dictionaries
+
+# Sensitive or high-churn files:
+# .idea/dataSources.ids
+# .idea/dataSources.xml
+# .idea/sqlDataSources.xml
+# .idea/dynamic.xml
+# .idea/uiDesigner.xml
+
+# Gradle:
+# .idea/gradle.xml
+# .idea/libraries
+
+# Mongo Explorer plugin:
+# .idea/mongoSettings.xml
+
+## File-based project format:
+*.ipr
+*.iws
+
+## Plugin-specific files:
+
+# IntelliJ
+out/
+
+# mpeltonen/sbt-idea plugin
+.idea_modules/
+
+# JIRA plugin
+atlassian-ide-plugin.xml
+
+# Crashlytics plugin (for Android Studio and IntelliJ)
+com_crashlytics_export_strings.xml
+crashlytics.properties
+crashlytics-build.properties
+
+
+### NetBeans ###
+nbproject/private/
+build/
+nbbuild/
+dist/
+nbdist/
+nbactions.xml
+nb-configuration.xml
+.nb-gradle/
+
+
+### Linux ###
+*~
+
+# KDE directory preferences
+.directory
+
+# Linux trash folder which might appear on any partition or disk
+.Trash-*
+
+
+### Windows ###
+# Windows image file caches
+Thumbs.db
+ehthumbs.db
+
+# Folder config file
+Desktop.ini
+
+# Recycle Bin used on file shares
+$RECYCLE.BIN/
+
+# Windows Installer files
+*.cab
+*.msi
+*.msm
+*.msp
+
+# Windows shortcuts
+*.lnk
+
+target/
+nbactions*.xml
+.checkstyle
+.pmd
+.pmdruleset.xml
+
+# Ignore generated files
+*.log
+
+.vagrant/
+.vscode/
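The patterns above rely on a few gitignore matching rules: a pattern without a slash (other than a trailing one) matches a name at any depth, a trailing slash restricts it to directories, and a slash at the start or in the middle anchors it to the directory of the .gitignore file. The following minimal Java sketch shows those rules in isolation. The IgnoreRules class is hypothetical (not part of pdfOcr or git) and deliberately leaves out negation ("!") and "**", neither of which this file uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of pdfOcr or git): a simplified .gitignore matcher
// covering the pattern forms used above. Negation ("!") and "**" are not supported.
final class IgnoreRules {
    private static final class Rule {
        final Pattern pattern;
        final boolean directoryOnly; // pattern ended with "/"
        final boolean anchored;      // pattern contained a leading or inner "/"

        Rule(Pattern pattern, boolean directoryOnly, boolean anchored) {
            this.pattern = pattern;
            this.directoryOnly = directoryOnly;
            this.anchored = anchored;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void addLine(String line) {
        String p = line.strip();
        if (p.isEmpty() || p.startsWith("#")) {
            return; // blank lines and comments
        }
        boolean directoryOnly = p.endsWith("/");
        if (directoryOnly) {
            p = p.substring(0, p.length() - 1);
        }
        boolean anchored = p.contains("/");
        if (p.startsWith("/")) {
            p = p.substring(1);
        }
        rules.add(new Rule(globToRegex(p), directoryOnly, anchored));
    }

    // path is relative to the repository root and uses "/" separators. Every leading
    // component is a directory, so an ignored directory hides everything below it.
    boolean isIgnored(String path, boolean isDirectory) {
        String[] parts = path.split("/");
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                prefix.append('/');
            }
            prefix.append(parts[i]);
            boolean dir = i < parts.length - 1 || isDirectory;
            for (Rule rule : rules) {
                if (rule.directoryOnly && !dir) {
                    continue;
                }
                // Unanchored patterns match one name at any depth; anchored ones match from the root.
                String candidate = rule.anchored ? prefix.toString() : parts[i];
                if (rule.pattern.matcher(candidate).matches()) {
                    return true;
                }
            }
        }
        return false;
    }

    // Translates "*" and "?" (never crossing "/"); other characters match literally.
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append("[^/]*");
            } else if (c == '?') {
                regex.append("[^/]");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }
}
```

Under these rules, target/ ignores target/classes/Foo.java but not a regular file named target, and nbproject/private/ only applies at the repository root, not to sub/nbproject/private.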

.mailmap

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+
+Alexander Chingarev <[email protected]> <[email protected]>
+Alexander Chingarev <[email protected]> <[email protected]>
+Alexander Chingarev <[email protected]> <[email protected]>
+
+
+
+
+
+
+Benoît Lagae <[email protected]> <benoit@iText-blagae>
+
+
+
+
+Bruno Lowagie <[email protected]> <iText@Catullus>
+
+Dimitry Alexandrov <[email protected]> <[email protected]>
+
+Dmitry Trusevich <dmitry.trusevich@duallab> <dmitry.trusevich@duallab>
+
+
+Ilya Idamkin <[email protected]> <ilya.idamkin@TeamCity>
+
+
+
+
+
+iText Software <[email protected]> <teamcity.bot@TeamCity>
+
+
+
+
+
+
+
+
+
+
+
+
+Michaël Demey <[email protected]> michael.demey <>
+
+Michaël Demey <[email protected]> <michael.demey@TeamCity>
+
+
+
+
+Nadia Ivaniukovich <[email protected]> <[email protected]>
+Nadia Ivaniukovich <[email protected]> <[email protected]>
+
+Natalia Zgirovskaya <[email protected]> <[email protected]>
+Natalia Zgirovskaya <[email protected]> <[email protected]>
+
+
+
+
+
+Pavel Alay <[email protected]> pavel.alay <>
+
+Pavel Alay <[email protected]> <pavel.alay@TeamCity>
+
+
+
+
+
+
+
+
+
+
+
+
+
+Veronika Lisovskaya <[email protected]> <veronika.lisovskaya@TeamCity>
+
+Yanina Cheremisina <[email protected]> <[email protected]>
+Yulian Gaponenko <[email protected]> <duallab@DESKTOP-PG4L5J1>
+Yulian Gaponenko <[email protected]> <yulian.gaponenko@TeamCity>
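The entries above use the documented .mailmap line forms "Proper Name <proper@email> <commit@email>" and "Proper Name <proper@email> Commit Name <commit@email>"; the lines with "<>" match commits by author name and an empty email. The Java sketch below shows how such entries map a commit identity to a canonical one. The Mailmap class is hypothetical (not part of pdfOcr or git), compares emails and names case-insensitively, and lets an entry that also names the commit author take precedence over an email-only entry. The example addresses in the usage note are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of pdfOcr or git): a simplified reader for .mailmap lines.
final class Mailmap {
    private static final Pattern ADDRESS = Pattern.compile("<([^>]*)>");

    private static final class Entry {
        final String properName;  // null: keep the name from the commit
        final String properEmail; // null: keep the email from the commit

        Entry(String properName, String properEmail) {
            this.properName = properName;
            this.properEmail = properEmail;
        }
    }

    // Keyed by lower-cased commit email, optionally followed by a NUL and the commit name.
    private final Map<String, Entry> entries = new HashMap<>();

    void addLine(String line) {
        String text = line.strip();
        if (text.isEmpty() || text.startsWith("#")) {
            return;
        }
        List<String> emails = new ArrayList<>();
        List<int[]> spans = new ArrayList<>();
        Matcher m = ADDRESS.matcher(text);
        while (m.find()) {
            emails.add(m.group(1));
            spans.add(new int[] {m.start(), m.end()});
        }
        if (emails.isEmpty()) {
            return;
        }
        String properName = emptyToNull(text.substring(0, spans.get(0)[0]).strip());
        if (emails.size() == 1) {
            // "Proper Name <commit@email>": only the name is replaced.
            entries.put(key(emails.get(0), null), new Entry(properName, null));
        } else {
            // "[Proper Name] <proper@email> [Commit Name] <commit@email>"
            String commitName = emptyToNull(text.substring(spans.get(0)[1], spans.get(1)[0]).strip());
            entries.put(key(emails.get(1), commitName), new Entry(properName, emails.get(0)));
        }
    }

    // Returns the canonical "Name <email>" for a commit identity.
    String canonical(String name, String email) {
        Entry entry = entries.get(key(email, name));
        if (entry == null) {
            entry = entries.get(key(email, null));
        }
        if (entry == null) {
            return name + " <" + email + ">";
        }
        String n = entry.properName != null ? entry.properName : name;
        String e = entry.properEmail != null ? entry.properEmail : email;
        return n + " <" + e + ">";
    }

    private static String key(String email, String name) {
        String k = email.toLowerCase(Locale.ROOT);
        return name == null ? k : k + '\0' + name.toLowerCase(Locale.ROOT);
    }

    private static String emptyToNull(String s) {
        return s.isEmpty() ? null : s;
    }
}
```

For example, after loading "Jane Doe <jane@example.com> jane.doe <>", a commit by jane.doe with an empty email is reported as "Jane Doe <jane@example.com>".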
