Commit graph

4 commits

Author SHA1 Message Date
Baptiste Jonglez
57ed36db42 tessdata: uncompress tarball only once to speed up builds
The previous approach uncompressed a big tarball (638 MB) N times,
where N=130 is the number of supported languages.  Each iteration
extracted only a single file, but it still had to uncompress the whole
tarball, which is very inefficient.

Now, we uncompress the tarball only once to extract all relevant files,
and then iterate N times to copy the file needed for each language.

This massively speeds up builds, at the expense of temporarily requiring
more build space (about 1 GB more).
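
The change described above can be illustrated with a minimal Python sketch. It is not the package's actual Makefile logic: the function name, the directory arguments, and the `<lang>.traineddata` member naming are assumptions made for illustration only.

```python
import shutil
import tarfile
from pathlib import Path


def extract_once_then_copy(tarball, languages, work_dir, out_dir):
    """Read the tarball a single time, then copy one file per language.

    Hypothetical helper: assumes each language ships as a member named
    '<lang>.traineddata' somewhere inside the archive.
    """
    work_dir, out_dir = Path(work_dir), Path(out_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    wanted = {f"{lang}.traineddata" for lang in languages}

    # Single pass over the compressed stream: pull out every relevant file.
    with tarfile.open(tarball, "r:*") as tar:
        for member in tar:
            name = Path(member.name).name
            if member.isfile() and name in wanted:
                with tar.extractfile(member) as src, open(work_dir / name, "wb") as dst:
                    shutil.copyfileobj(src, dst)

    # N cheap iterations: copy the already-extracted file for each language.
    copied = []
    for lang in languages:
        src = work_dir / f"{lang}.traineddata"
        if src.exists():
            dest = out_dir / lang
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest / src.name)
            copied.append(lang)
    return copied
```

The trade-off matches the one noted in the commit message: the work directory temporarily holds every extracted file, in exchange for decompressing the archive only once.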

Signed-off-by: Baptiste Jonglez <git@bitsofnetworks.org>
(cherry picked from commit 7fe513971f)
2021-07-02 20:37:50 +02:00
Rosen Penev
3d7d41f712 tessdata: update to 2.1.0
Switch to AUTORELEASE for simplicity.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
(cherry picked from commit 37bffba074)
2021-07-02 20:37:45 +02:00
Eneas U de Queiroz
ea0f17c3ac tessdata: reorganize menu
Move language data menu under the package itself, and shorten the titles
so that all of them show up in the menu.

Signed-off-by: Eneas U de Queiroz <cotequeiroz@gmail.com>
2019-07-24 08:47:01 -03:00
Valentín Kivachuk
9c8e7c6f52 tesseract: add package
Tesseract is an open source text recognition (OCR) engine, available under the Apache 2.0 license. It can be used directly or, for programmers, through an API to extract printed text from images. It supports a wide variety of languages.

Signed-off-by: Valentín Kivachuk <vk18496@gmail.com>
2019-07-18 11:38:04 +02:00