growinscala / flipper   0.3


PDF to JSON, JSON to PDF, and more.

Scala versions: 2.12

Flipper

What is Flipper?

Flipper is an open-source PDF library written in Scala, developed by the good people at Growin, that can be integrated into any Java/Scala environment. It has some really useful features, such as:

  • Parse a PDF document and return a JSON object - Flipper can parse the text in a PDF document, recognize text in images inside the document, and return a JSON object with the extracted information. You simply specify the type of value you want to obtain for a given keyword (a noun, a verb, a number, etc.), and Flipper will do the rest!

  • Convert JSON to PDF - Flipper does not content itself with just parsing a PDF file, that's easy! Flipper also converts a given JSON object to a PDF document, and you can customize the output document with CSS.

  • Convert PDF to other file types - We also support conversion from PDF to other popular formats: .png, .jpeg/.jpg, .gif and .odt.

Current version: 0.3


Project structure

Flipper is divided into three modules that can be used individually: Reader, Generator and Converter.

Flipper/
        ├── converter ; PDF to other file types module
        ├── generator ; JSON to PDF module
        ├── reader    ; PDF parser to JSON
        └── build.sbt ; Project config file

You can find the individual README.md files with examples and documentation here:




Configuration

Flipper is available on Maven Central, so to use it you simply need to add the lines below to your own project.

If you are using SBT, add the following line to your build.sbt:

libraryDependencies += "com.growin" %% "flipper" % "0.3"

If you are using Maven, add these lines to your pom.xml:

<dependency>
    <groupId>com.growin</groupId>
    <artifactId>flipper_2.12</artifactId>
    <version>0.3</version>
</dependency>

For other versions, see the Maven repository. There you will also find other ways of including Flipper in your project without using SBT or Maven.
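
As a rough sketch, a minimal build.sbt that pulls in Flipper might look like the following. The project name and the exact Scala patch release are assumptions made for illustration; the dependency line is the one given above.

```scala
// build.sbt: minimal sketch; name and scalaVersion are illustrative assumptions
name := "flipper-demo"

// Flipper 0.3 is published for Scala 2.12, so use a 2.12.x release
scalaVersion := "2.12.8"

// %% appends the Scala binary version, so this resolves to the flipper_2.12 artifact
libraryDependencies += "com.growin" %% "flipper" % "0.3"
```

The `%%` operator is why the SBT line says `flipper` while the Maven artifactId is `flipper_2.12`: both point at the same artifact.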


Dependencies

Download the eng.traineddata and por.traineddata from here and insert them in a directory named tessdata in the root of the project.

Flipper uses Tess4j (a Java wrapper for Tesseract) to extract text from images, using a technique known as optical character recognition (OCR). To improve the accuracy of this recognition, we must provide Tess4j with a set of training data.
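
Before running anything that relies on OCR, it can help to confirm that the training data is where it is expected. The sketch below is not part of Flipper; `TessdataCheck` and `missingTrainedData` are hypothetical names, and the code uses only the Java standard library to look for the two files in a tessdata directory under a given project root.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper, not part of Flipper: checks the tessdata layout described above.
object TessdataCheck {
  // The two language files named in this README.
  val requiredFiles: Seq[String] = Seq("eng.traineddata", "por.traineddata")

  /** Returns the names of required files that are absent from <projectRoot>/tessdata. */
  def missingTrainedData(projectRoot: Path): Seq[String] = {
    val tessdata = projectRoot.resolve("tessdata")
    requiredFiles.filterNot(name => Files.isRegularFile(tessdata.resolve(name)))
  }

  def main(args: Array[String]): Unit = {
    // Use the first argument as the project root, defaulting to the current directory.
    val root = Paths.get(args.headOption.getOrElse("."))
    val tessdataDir = root.resolve("tessdata")
    val missing = missingTrainedData(root)
    if (missing.isEmpty) println(s"$tessdataDir is complete")
    else println(s"Missing from $tessdataDir: ${missing.mkString(", ")}")
  }
}
```

Run it from the root of your project; if any file is reported as missing, download it and place it in the tessdata directory.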


How to test Flipper

If you want to see for yourself that Flipper is in fact amazing and working properly, you can follow the steps below to test it (using sbt):

  • Start by cloning this repository

git clone https://github.com/GrowinScala/Flipper.git

  • Then cd into it

cd Flipper

  • And run the unit tests using sbt

sbt test



Who do I talk to?

Flipper is an open-source project developed at Growin in our offices in Lisbon.
If you have any questions, you can contact:

Or visit our website: www.growin.com



License

Flipper is open source, licensed under the MIT License.