This repository has been archived by the owner on Jan 29, 2019. It is now read-only.

BlackacreLabs/usreports-tokenizer

United States Reports PDF Miner

Dump the contents of PDF documents published by the Supreme Court of the United States into JSON lists of tokens indicating document structure.

Building & Installation

You will need a Java Development Kit (JDK) and Apache Maven.

At the command prompt:

$ mvn clean install

To build a self-contained .jar file:

$ mvn package
$ java -jar usreportsminer-[VERSION]-jar-with-dependencies.jar [PDF FILE]

Where [VERSION] is the current build version and [PDF FILE] is the path to a Supreme Court opinion PDF file.

Development

The program is just enough Java to jerry-rig an Apache PDFBox document renderer to a stub Graphics2D, feed Token objects to Gson for JSON output, and get me out of Java-land.
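As a rough illustration of the last stage of that pipeline, the sketch below turns a list of token objects into a JSON array. It is not the project's code. The `Token` class here, its `type` and `text` fields, and the sample token values are all hypothetical, and a small hand-written serializer stands in for Gson so that the example runs with only the JDK.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names and fields are assumptions, not the project's own classes.
final class TokenSketch {

    /** Illustrative token: a kind tag plus the text it carries (both fields assumed). */
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Escape the characters that JSON does not allow unescaped inside a string. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /** Serialize tokens as a JSON array of objects (a stand-in for what Gson would do). */
    static String toJson(List<Token> tokens) {
        StringBuilder out = new StringBuilder("[");
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            if (i > 0) {
                out.append(',');
            }
            out.append("{\"type\":\"").append(escape(t.type))
               .append("\",\"text\":\"").append(escape(t.text)).append("\"}");
        }
        return out.append(']').toString();
    }

    public static void main(String[] args) {
        // Sample values are invented for the example.
        List<Token> tokens = Arrays.asList(
                new Token("text", "Syllabus"),
                new Token("newline", ""));
        System.out.println(toJson(tokens));
    }
}
```

With Gson on the classpath, the hand-written `toJson` would presumably reduce to a single `new Gson().toJson(tokens)` call. The actual token fields and output layout are defined by the project's source, not by this sketch.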
