
LeveragingAvro

Zoltan Farkas edited this page May 29, 2019 · 29 revisions

Leveraging Avro as a data format.

Avro is one of the many serialization formats that have been created in the last 20 years. For a good introduction, and a comparison with what are probably the two most popular alternatives, see

In this demo project we use Avro for:

  • wire format.
  • log format.

Why Avro for wire format?

  1. Support for multiple encodings:
    • binary for efficiency.
    • JSON for interoperability and debugging.
    • CSV for interoperability and debugging.
  2. Extensible. You can add your own metadata to the schema (@beta, @displayName, ...).
  3. Avro schemas have a JSON representation.
  4. Multiple language support.
  5. Open source.
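
To give a feel for why the binary encoding is compact: the Avro specification writes int and long values using variable-length zig-zag coding, so small magnitudes take a single byte. The following is a minimal, standard-library-only sketch of that one primitive. The class name AvroVarint is illustrative and not part of the demo project; a real application would use the encoders that ship with the Avro library.

```java
import java.io.ByteArrayOutputStream;

// Illustrative helper (hypothetical name), not part of the demo project or of the Avro library.
final class AvroVarint {

  private AvroVarint() { }

  /** Encodes a long the way the Avro spec describes: zig-zag first, then 7 bits per byte. */
  static byte[] encodeLong(long n) {
    // Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small negatives stay small.
    long v = (n << 1) ^ (n >> 63);
    ByteArrayOutputStream out = new ByteArrayOutputStream(10);
    // Emit the low 7 bits with the continuation bit set while higher bits remain.
    while ((v & ~0x7FL) != 0) {
      out.write((int) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.write((int) v);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    long timestampMillis = 1559093699130L;
    // Compare the binary size with the size of the same number written as JSON text.
    System.out.println("binary bytes: " + encodeLong(timestampMillis).length);
    System.out.println("text bytes:   " + Long.toString(timestampMillis).length());
  }
}
```

The same idea, applied to every field and combined with the absence of field names in the payload, is what makes the binary responses shown later smaller than their JSON counterparts.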

Demo of a REST endpoint.

Start up the demo app as described at.

Let's try to get some data from:

images

As you can observe, the writer schema info is provided by the content-schema HTTP header:

Content-Length: 505
content-schema: {"type":"array","items":{"$ref":"org.spf4j.demo:jaxrs-spf4j-demo-schema:0.3:2"}}
Content-Type: application/avro-x+json;charset=UTF-8

Removing ?_Accept=application/json yields the more efficient binary response:

Content-Length: 220
content-schema: {"type":"array","items":{"$ref":"org.spf4j.demo:jaxrs-spf4j-demo-schema:0.3:2"}}
Content-Type: application/avro

If we want the data in CSV format, since this endpoint is compliant, we can use: ?_Accept=text/csv

Content-Length: 376
content-schema: {"type":"array","items":{"$ref":"org.spf4j.demo:jaxrs-spf4j-demo-schema:0.3:2"}}
Content-Type: text/csv;fmt=avro;charset=UTF-8
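
The format switching above can be sketched from the client side as a small helper that appends the _Accept query parameter to a resource URI. This is only an illustration: the class name AcceptOverride and the base URL used in main are hypothetical, and the sketch assumes the URI has no fragment and that the media type contains only characters that are legal in a query string.

```java
import java.net.URI;

// Illustrative helper (hypothetical name); the base URL used in main is an assumption as well.
final class AcceptOverride {

  private AcceptOverride() { }

  /** Returns the resource URI with an _Accept=mediaType query parameter appended. */
  static URI withAccept(URI resource, String mediaType) {
    // Use '?' for the first query parameter and '&' for any following one.
    String separator = resource.getRawQuery() == null ? "?" : "&";
    // URI.create rejects illegal characters (for example spaces) with IllegalArgumentException.
    return URI.create(resource + separator + "_Accept=" + mediaType);
  }

  public static void main(String[] args) {
    URI records = URI.create("http://localhost:8080/demo/records"); // hypothetical endpoint
    System.out.println(withAccept(records, "application/json"));
    System.out.println(withAccept(records, "text/csv"));
  }
}
```

Passing the format as a query parameter, rather than only through the Accept header, keeps every representation reachable from a plain browser address bar, which is convenient for debugging.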

All of the above is magically served by an endpoint definition like:

  @GET
  @Produces(value = {"application/avro", "application/avro-x+json", "application/octet-stream", "application/json", "text/csv"})
  List<DemoRecordInfo> getRecords();

Your OpenAPI descriptor and UI can also be available out of the box:

images

This functionality is implemented by the spf4j Avro feature and leverages Avro references.
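
To make the $ref value seen in the content-schema header more concrete, here is a rough sketch that extracts it and splits it into its colon-separated parts. The interpretation of the four parts as Maven groupId, artifactId, version and a schema id within that artifact is an assumption based on the shape of the value, and the class SchemaRef is hypothetical, not an spf4j API. A regular expression stands in for a proper JSON parser to keep the sketch dependency-free.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class, not an spf4j type. Field meanings are assumed, see the text above.
final class SchemaRef {

  // Matches "$ref":"<value>" inside the header's JSON text.
  private static final Pattern REF = Pattern.compile("\"\\$ref\"\\s*:\\s*\"([^\"]+)\"");

  final String groupId;
  final String artifactId;
  final String version;
  final String schemaId;

  private SchemaRef(String groupId, String artifactId, String version, String schemaId) {
    this.groupId = groupId;
    this.artifactId = artifactId;
    this.version = version;
    this.schemaId = schemaId;
  }

  /** Extracts the first $ref from a content-schema header value and splits it on ':'. */
  static SchemaRef fromHeader(String contentSchema) {
    Matcher m = REF.matcher(contentSchema);
    if (!m.find()) {
      throw new IllegalArgumentException("No $ref found in: " + contentSchema);
    }
    String[] parts = m.group(1).split(":");
    if (parts.length != 4) {
      throw new IllegalArgumentException("Expected 4 colon-separated parts in: " + m.group(1));
    }
    return new SchemaRef(parts[0], parts[1], parts[2], parts[3]);
  }

  public static void main(String[] args) {
    SchemaRef ref = fromHeader(
        "{\"type\":\"array\",\"items\":{\"$ref\":\"org.spf4j.demo:jaxrs-spf4j-demo-schema:0.3:2\"}}");
    System.out.println(ref.groupId + " / " + ref.artifactId + " / " + ref.version + " / " + ref.schemaId);
  }
}
```

Because the header carries a short reference instead of the full writer schema, the per-response overhead stays small while the client can still resolve the exact schema that was used.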

Why Avro for logs?

  • Structure. No need to write custom parsers. See for the record structure.
  • Efficiency. Smaller in size due to the binary format and built-in compression.

An example of how to use Avro for logs (leverages spf4j-logback and spf4j-jaxrs-actuator) is at.

As you might observe, logs are written to the console, and that is on purpose. Although logging to the console is what most of the literature recommends, it has disadvantages. Console output is limited to text, which leads to inefficiency (large size), compounded by JSON wrapping and loss of structure (different libraries write to it in different formats).

Here is an example of console log lines from a Kubernetes node:

{"log":"SLF4J: A number (2) of logging calls during the initialization phase have been intercepted and are\n","stream":"stderr","time":"2019-05-29T01:34:59.1306243Z"}
{"log":"SLF4J: now being replayed. These are subject to the filtering rules of the underlying logging system.\n","stream":"stderr","time":"2019-05-29T01:34:59.1307042Z"}

As you can see, every stdout/stderr log line is wrapped into a JSON object, which not only adds extra overhead to your messages, it also obscures their structure.
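
The overhead of this wrapping can be estimated with a few lines of code. The sketch below compares the UTF-8 size of one of the messages above with the size of its wrapped form. The class WrapOverhead and its method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical name) for measuring the cost of the JSON wrapping.
final class WrapOverhead {

  private WrapOverhead() { }

  static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  /** Number of bytes the wrapper adds on top of the raw message. */
  static int extraBytes(String message, String wrapped) {
    return utf8Length(wrapped) - utf8Length(message);
  }

  /** Extra bytes as a fraction of the raw message size. */
  static double overheadRatio(String message, String wrapped) {
    return (double) extraBytes(message, wrapped) / utf8Length(message);
  }

  public static void main(String[] args) {
    // The second log line from the example above: raw message and its wrapped form.
    String message = "SLF4J: now being replayed. These are subject to the filtering rules"
        + " of the underlying logging system.\n";
    String wrapped = "{\"log\":\"SLF4J: now being replayed. These are subject to the filtering rules"
        + " of the underlying logging system.\\n\",\"stream\":\"stderr\","
        + "\"time\":\"2019-05-29T01:34:59.1307042Z\"}";
    System.out.printf("message: %d bytes, wrapped: %d bytes, overhead: %.0f%%%n",
        utf8Length(message), utf8Length(wrapped), 100 * overheadRatio(message, wrapped));
  }
}
```

The wrapper has a roughly fixed size per line, so the relative overhead is largest for short messages.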

To overcome these limitations, your logging backend can be configured to log to the Kubernetes host log folder, and your logs can take 5-10 times less disk space, which should increase your logging efficiency significantly.

A good example for this is at. In this example, the service can serve its own logs (cluster level), which reduces the need for a log aggregator like Splunk. Actually, I think deploying a log service (aggregator) that serves the logs from where they are, avoiding data movement, will result in a significantly more scalable system.

Here are some examples of what you can do:

Show latest logs in text format:

images

Show latest logs in JSON format:

images

Show request logs where exec time exceeds a value:

images

Browse cluster log files:

images

Show all Log files from a particular node:

images

Download a log file:

images
