Skip to content

Extension for Jackson JSON processor that adds support for (de)serializing POJOs as Spreadsheets

License

Notifications You must be signed in to change notification settings

scndry/jackson-dataformat-spreadsheet

Repository files navigation

Jackson dataformat: Spreadsheet

badge v?metadataUrl=https%3A%2F%2Fs01.oss.sonatype.org%2Fservice%2Flocal%2Frepositories%2Fsnapshots%2Fcontent%2Fio%2Fgithub%2Fscndry%2Fjackson dataformat spreadsheet%2Fmaven metadata

Let’s get rid of the Spread Sh!t.

This Jackson extension library is a component that memory-efficiently reads and writes the SpreadsheetML format[1] using Apache POI and StAX via Jackson abstraction. — Implemented as full streaming implementation, which allows full access (streaming, data-binding, and tree-model).

💬
The same abstraction is provided for the Horrible Microsoft Excel format[2]. However, not based on Event API, so may not be suitable for large .xls files.

Installation

📙
This project is currently under development. If you are interested, you can always use the latest snapshot artifacts from OSS Sonatype.

Maven

<repository>
    <id>sonatype-snapshots</id>
    <url>https://s01.oss.sonatype.org/content/repositories/snapshots/</url>
</repository>

<dependency>
    <groupId>io.github.scndry</groupId>
    <artifactId>jackson-dataformat-spreadsheet</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>

Gradle

repositories {
    maven { url "https://s01.oss.sonatype.org/content/repositories/snapshots/" }
}

dependencies {
    implementation "io.github.scndry:jackson-dataformat-spreadsheet:0.0.1-SNAPSHOT"
}

Busy Developers' Guide to Features

Want to use HSSF and XSSF read and write spreadsheets in a hurry? This guide is for you. — Apache POI’s Quick Guide.

Reading Spreadsheet

First, let’s look at input-entries.xlsx file to read.

Table 1. input-entries.xlsx
Column A Column B

1

2

3

4

Then let’s define the Entry class annotated @DataGrid.

@DataGrid
public class Entry {
    private int a;
    private int b;
    // Constructors, Getters, Setters and toString
}

We use SpreadsheetMapper to read the file into an Entry object, so let’s set it up now.

SpreadsheetMapper mapper = new SpreadsheetMapper();
File file = new File("input-entries.xlsx");

Once we have SpreadsheetMapper configured, simply use readValue.

Entry value = mapper.readValue(file, Entry.class);
// Output: Entry(a=1, b=2)

You can use readValues to read the entire list.

List<Entry> values = mapper.readValues(file, Entry.class);
// Output: [Entry(a=1, b=2), Entry(a=3, b=4)]

By default, it reads the first sheet. If you need to find and read a specific sheet by name or index, you can use SheetInput.

String sheetName = "Entries";
SheetInput<File> input = SheetInput.source(file, sheetName);
List<Entry> values = mapper.readValues(input, Entry.class);
// Output: [Entry(a=1, b=2), Entry(a=3, b=4)]

Writing Spreadsheet

We also use SpreadsheetMapper to write an Entry object to the file. Just call writeValue.

SpreadsheetMapper mapper = new SpreadsheetMapper();
File file = new File("output-entries.xlsx");
Entry value = new Entry(1, 2);
mapper.writeValue(file, value);

The output of the above in the file will be:

Table 2. output-entries.xlsx

a

b

1

2

This time, let’s write a list.

List<Entry> values = Arrays.asList(new Entry(1, 2), new Entry(3, 4));
mapper.writeValue(file, values);
java.lang.IllegalArgumentException: `valueType` MUST be specified to write a value of a Collection or array type
Oh, Sheet!

Okay. Let’s add a valueType to the 3rd parameter.

Class<Entry> valueType = Entry.class;
mapper.writeValue(file, values, valueType);
Table 3. output-entries.xlsx

a

b

1

2

3

4

Now let’s annotate on properties with @DataColumn to write each column’s name.

@DataGrid
public class Entry {
    @DataColumn("Column A")
    private int a;
    @DataColumn("Column B")
    private int b;
    // Constructors, Getters, Setters and toString
}
Table 4. output-entries.xlsx

Column A

Column B

1

2

3

4

And also you can use SheetOutput to write the sheet name. By default, the created sheet is named "Sheet" + indexOfSheet.

String sheetName = "Entries";
SheetOutput<File> output = SheetOutput.target(file, sheetName);
mapper.writeValue(output, values, Entry.class);

Nested Object (de)Structuring

Spreadsheets are a flat data representation of a 2-dimensional structure, but most of the POJOs we have take complex nested structures. Of course, it provides (de)structuring these complex models, as you can see below:

ID NAME ZIPCODE ADDRESS LINE 1 ADDRESS LINE 2 DESIGNATION SALARY

1

John Doe

12345

123 Main St.

Anytown, USA

CEO

$300,000.00

As we are Object-Oriented Developers, you probably don’t want to manage a lot of column data in a flat model. Let’s first define the well-structured Employee class.

@DataGrid
class Employee {
    int id;
    String name;
    Address address;
    Employment employment;
    // ...
}

class Address {
    String zipcode;
    AddressLines addressLines;
    // ...
}

class AddressLines {
    String line1;
    String line2;
    // ...
}

class Employment {
    Designation designation;
    long salary;
    // ...
}

enum Designation {
    CEO, CTO, SM, ARCH, INT
}

No additional code is required when deserializing data into nested structures, you can deserialize the same way.

Employee value = mapper.readValue(input, Employee.class);

Here’s the Employee object that output will be:

Employee(
    id=1
    name=John doe
    address=Address(
        zipcode=12345
        addressLines=AddressLines(
            line1=123 Main St.
            line2=Anytown, USA
        )
    )
    employment=Emeployment(
        designation=CEO
        salary=300000
    )
)

Just as can be deserialized without any additional code, you can serialize these nested objects.

mapper.writeValue(output, values, Employee.class);

Writing the Nested List

Although limited, it supports serializing nested list structures. However, deserializing is not supported due to implementation complexity.

@DataGrid
class Outer {
    @DataColumn("A") int a;
    List<Inner> inners;
    @DataColumn("E") int e;
    // ...
}

class Inner {
    @DataColumn("B") int b;
    @DataColumn("C") int[] c;
    @DataColumn("D") int d;
    // ...
}
Outer value = new Outer(
        1, Arrays.asList(
            new Inner(2, new int[]{3, 4}, 5),
            new Inner(6, new int[]{7, 8}, 9)),
        10);
mapper.writeValue(output, value);

This will lead to the desired result like this:

A B C D E

1

2

3

5

10

BLANK

BLANK

4

BLANK

BLANK

BLANK

6

7

9

BLANK

BLANK

BLANK

8

BLANK

BLANK

Annotations and Schema

For most column schema generation requirements, @DataGrid and @DataColumn annotations will provide all you need. The two have almost similar attributes, so only listing attributes of @DataColumn.

Attributes in @DataColumn
  • String value(): The column’s name to write in header

  • String style(): The name of the cellStyle to apply to the column, it should be registered with StylesBuilder.

  • String headerStyle(): Same as style(), but only applied to the column’s header.

  • int width(): Fixed width for the column.

  • OptBoolean autoSize(): Adjusts the column width to fit the contents. Not accurate for fullwidth forms like the CJK characters.

  • int minWidth(): Minimum width of the auto-sized column.

  • int maxWidth(): Maximum width of the auto-sized column.

Attribute Resolution

Column schema uses particular order designed to allow for value overrides. Attributes are considered in the following order:

  1. @DataColumn's attribute on property.

  2. @DataGrid's attribute on the class in which the property is declared.

  3. @DataGrid's attribute of the class of which the class in which the property is declared is a member.

For a better understanding, let’s look at the code below:

@DataGrid(autoSizeColumn = OptBoolean.FALSE)
class Foo {
    @DataColumn(style = "Foo_a_style")
    int a;
    @DataColumn(headerStyle = "Foo_b_headerStyle")
    int b;
    // ...
}

@DataGrid(columnHeaderStyle = "Bar_columnHeaderStyle")
class Bar {
    Foo foo;
    // ...
}

@DataGrid(columnStyle = "Baz_columnStyle", autoSizeColumn = OptBoolean.TRUE)
class Baz {
    Foo foo;
    // ...
}

Schema is generated and used when (de)serializing within SpreadsheetMapper, but you can generate using sheetSchemaFor.

SpreadsheetSchema foo = mapper.sheetSchemaFor(Foo.class);
SpreadsheetSchema bar = mapper.sheetSchemaFor(Bar.class);
SpreadsheetSchema baz = mapper.sheetSchemaFor(Baz.class);

Column attributes of schemas generated for each of the above types are as follows:

Column Style Header Style Auto Size

foo.a

"Foo_a_style"

""

FALSE

foo.b

""

"Foo_b_headerStyle"

FALSE

bar.foo.a

"Foo_a_style"

"Bar_columnHeaderStyle"

FALSE

bar.foo.b

""

"Foo_b_headerStyle"

FALSE

baz.foo.a

"Foo_a_style"

""

FALSE

baz.foo.b

"Baz_columnStyle"

"Foo_b_headerStyle"

FALSE

Build Your Own Styles

@DataGrid(columnHeaderStyle = "baseHeader")
class Employee {
    // ...
    Address address;
    Employment employment;
    // ...
}

class Address {
    @DataColumn(value = "ZIPCODE", style = "zipcode")
    String zipcode;
    // ...
}

@DataGrid(columnHeaderStyle = "employmentHeader")
class Employment {
    // ...
    @DataColumn(value = "SALARY", style = "salary", headerStyle = "salaryHeader")
    long salary;
    // ...
}
StylesBuilder builder = new StylesBuilder()
        .cellStyle(/*name*/ "baseHeader")
            .border().thin()
            .fillForegroundColor(IndexedColors.GREY_25_PERCENT)
            .fillPattern().solidForeground()
            .font().bold().end()
            .end()
        .cellStyle(/*name*/ "employmentHeader", /*cloneStyleFrom*/ "baseHeader")
            .fillForegroundColor(198, 239, 206)
            .end()
        .cellStyle("salaryHeader", "baseHeader")
            .fillForegroundColor(0xFFC7CE)
            .font().bold().color(0x9C0006).end()
            .end()
        .cellStyle("salary")
            .dataFormat("[Red][>=100000]$#,##0.00;$#,##0.00")
            .end()
        .cellStyle("zipcode")
            .dataFormat().text()
            .end();
SpreadsheetMapper mapper = SpreadsheetMapper.builder()
        .stylesBuilder(builder)
        .build();
mapper.writeValue(output, values, type);
Built-in data formats for convenience
  • general(): "General"

  • text(): "@"

  • numberInt(): "0"

  • numberFloat(): "0.00"

  • numberIntWithComma(): "#,##0"

  • numberFloatWithComma(): "#,##0.00"

  • date(): "yyyy-mm-dd"

  • dateTime(): "yyyy-mm-dd hh:mm:ss"

SpreadsheetMapper mapper = SpreadsheetMapper.builder()
        .addModule(new ExcelDateModule())
        .build();

Change Origin Address

CellAddress address = ...;
SpreadsheetMapper mapper = SpreadsheetMapper.builder()
        .origin(address)
        .origin("B2")
        .origin(1, 1)
        .build();
mapper.writeValue(output, values, type);

BLANK

BLANK

BLANK

BLANK

Column A

Column B

BLANK

1

2

BLANK

3

4

Overwrite Column Names

SpreadsheetMapper mapper = SpreadsheetMapper.builder()
        .columnNameResolver(prop -> "Overwrite " + prop.getName().toUpperCase())
        .build();
mapper.writeValue(output, values, type);

Overwrite A

Overwrite B

1

2

3

4

enum ColumnCode {
    A("Code A"), B("Code B");
    String text;
    // ...
}

@Retention(RetentionPolicy.RUNTIME)
@interface NameOf {
    ColumnCode value();
}

@DataGrid
class Entity {
    @NameOf(ColumnCode.A)
    @DataColumn("It will be overwritten")
    int a;
    @NameOf(ColumnCode.B)
    int b;
    // ...
}
ColumnNameResolver byText = AnnotatedNameResolver.forValue(NameOf.class, ColumnCode::getText);
SpreadsheetMapper mapper = SpreadsheetMapper.builder()
        .columnNameResolver(byText)
        .build();
mapper.writeValue(output, values, type);

Code A

Code B

1

2

3

4

Reading Under Control

SpreadsheetReader reader = mapper.sheetReaderFor(Entry.class);
try (SheetMappingIterator<Entry> iterator = reader.readValues(input)) {
    while (iterator.hasNext()) {
        Entry value = iterator.next();
        SheetLocation location = iterator.getCurrentLocation();
        boolean done = ...;
        if (done) {
            break;
        }
    }
}
SpreadsheetMapper mapper = SpreadsheetMapper.builder();
        .enable(SheetParser.Feature.BREAK_ON_BLANK_ROW)
        .build();
On/Off Features in SheetParser.Feature
  • BLANK_ROW_AS_NULL (default true)

  • BREAK_ON_BLANK_ROW (default false)

And more.

Supported Types

  • Deserialization

    • java.io.File

    • java.io.InputStream

    • io.github.scndry.jackson.dataformat.spreadsheet.deser.SheetInput<T>

    • org.apache.poi.ss.usermodel.Sheet

  • Serialization

    • java.io.File

    • java.io.OutputStream

    • io.github.scndry.jackson.dataformat.spreadsheet.ser.SheetOutput<T>

    • org.apache.poi.ss.usermodel.Sheet

License

Project is licensed under Apache License 2.0.


1. XLSX — XSSF (XML SpreadSheet Format)
2. XLS — HSSF (Horrible SpreadSheet Format)

About

Extension for Jackson JSON processor that adds support for (de)serializing POJOs as Spreadsheets

Topics

Resources

License

Stars

Watchers

Forks