Dump the contents of PDF documents published by the Supreme Court of the United States into JSON lists of tokens indicating document structure.
You will need a Java Development Kit (JDK) and Apache Maven.
At the command prompt:
$ mvn clean install
To build a self-contained .jar file:
$ mvn package
$ java -jar usreportsminer-[VERSION]-jar-with-dependencies.jar [PDF FILE]
Where [VERSION] is the current build version and [PDF FILE] is the path to a Supreme Court opinion PDF file.
The program is just enough Java to jerry-rig an Apache PDFBox document renderer to a stub Graphics2D, feed the resulting Token objects to Gson for JSON output, and get me out of Java-land.
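The general shape of that pipeline can be sketched without any dependencies. This is not the project's code: the Token fields, the RecordingCanvas class, and the "text" and "page_break" token types are all hypothetical, RecordingCanvas stands in for the stub Graphics2D (it only records the calls a renderer would make), and the hand-written toJson stands in for Gson.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSketch {

    // Hypothetical token shape; the real Token class may carry other fields.
    public static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Stand-in for the stub Graphics2D: records draw calls instead of painting.
    public static final class RecordingCanvas {
        private final List<Token> tokens = new ArrayList<>();

        // A renderer would call this for each run of text; the position is ignored here.
        public void drawString(String text, float x, float y) {
            tokens.add(new Token("text", text));
        }

        // Marks a page boundary in the token stream.
        public void endPage() {
            tokens.add(new Token("page_break", ""));
        }

        public List<Token> tokens() {
            return tokens;
        }
    }

    // Minimal JSON string escaping (quotes, backslashes, control characters).
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') {
                out.append("\\\"");
            } else if (c == '\\') {
                out.append("\\\\");
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Stand-in for Gson: serializes the token list as a JSON array of objects.
    public static String toJson(List<Token> tokens) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            Token t = tokens.get(i);
            sb.append("{\"type\":\"").append(escape(t.type))
              .append("\",\"text\":\"").append(escape(t.text)).append("\"}");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        RecordingCanvas canvas = new RecordingCanvas();
        canvas.drawString("Syllabus", 72f, 90f);
        canvas.endPage();
        System.out.println(toJson(canvas.tokens()));
    }
}
```

Running main prints a two-element JSON list: one "text" token for "Syllabus" followed by one "page_break" token. The point of the design is that the renderer never needs to know it is not painting; it draws to the canvas as usual, and the canvas turns each call into a token.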