So many formats, so much wow. See wesm comment before making up your mind.
- API
- RPC
- Big Data
- Scientific
- Machine Learning
- Graph
- Object
- Workflow
- Programming
- Streaming
- Security-Focused
- Academic
Serialization formats suitable for API and RPC network services.
- CSV - Comma Separated Values. Textual.
- JSON - Lightweight document data-interchange format. Textual.
- JSONL - Schemaless "multiple JSON documents in 1 file" container data format, one document per line. Textual.
- JSON5 - JSON with added support for comments and relaxed syntax. Textual.
- Thrift - Scalable code generation, schema evolution binary format. Binary.
- Protocol Buffers - Google's data interchange format. Binary.
- MessagePack - Efficient JSON-like binary serialization format. Binary.
- BSON - Binary schemaless JSON encoding. Binary.
- XML - Extensible Markup Language. Genuinely Horrible. Textual.
- Plist - Property List representation. Apple. Textual.
- YAML - Indentation-based data serialization standard. Textual.
- TOML - Tom's Obvious, Minimal Language. Textual.
- CBOR - Concise Binary Object Representation. Schema-free. Binary.
- Ion - Amazon's advanced JSON-compatible serialization. Textual/Binary.
- gRPC - A high-performance, open source universal RPC framework. Binary, OSI Layer 7.
- RSocket - Application protocol providing Reactive Streams semantics. Binary, OSI Layer 5 (or 6).
- Cap’n Proto - High-performance, schema-based data interchange format. Binary.
- FlatBuffers - Suitable for zero-copy deserialization. Binary.
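To make the difference between JSON and JSONL above concrete, here is a minimal sketch using only Python's standard `json` module. The sample records and the `read_jsonl` helper are hypothetical and exist only for illustration.

```python
import io
import json

# Hypothetical sample records, for illustration only.
records = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
]

# JSON: the whole collection is a single document, parsed in one go.
json_text = json.dumps(records)
assert json.loads(json_text) == records

# JSONL: one self-contained JSON document per line, so a reader can
# handle records one at a time without loading the whole file.
jsonl_text = "".join(json.dumps(record) + "\n" for record in records)


def read_jsonl(stream):
    """Yield one decoded record per non-empty line (hypothetical helper)."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


decoded = list(read_jsonl(io.StringIO(jsonl_text)))
```

The same pattern applies to a real file opened in text mode, since file objects also iterate line by line.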
Serialization formats suitable for big data at rest, from the Hadoop family of solutions.
- Parquet - Columnar storage for Hadoop workloads. Binary.
- FlatBuffers - Zero-copy alternative to Protocol Buffers, suitable for larger datasets. Binary.
- ORC - Columnar storage for Hadoop workloads. Binary.
- Avro - Schema embedded, dynamic rich data structures. Textual/Binary.
- Ion - Row storage with skip scan parsing. Structured, schema embedded. Amazon. Textual/Binary.
- Arrow - Cross-language columnar data format optimized for analytics workloads. Binary.
- Delta Lake - Transactional storage layer for big data workflows. Binary.
- Iceberg - Open table format for large datasets. Binary.
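Several entries above (Parquet, ORC, Arrow) are described as columnar. As a rough illustration of what that layout means, and not of how any of these formats is actually implemented, the sketch below pivots row-oriented records into per-column lists in plain Python. The sample data and the `to_columns` helper are hypothetical, and the sketch assumes every row has the same keys.

```python
# Hypothetical row-oriented records, for illustration only.
rows = [
    {"user": "ann", "age": 31, "country": "NL"},
    {"user": "bob", "age": 27, "country": "US"},
    {"user": "cyd", "age": 45, "country": "NL"},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every row carries the same keys; real columnar formats also
    handle missing values, typing, encoding, and compression.
    """
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field only touches that field's values,
# instead of walking every complete record.
mean_age = sum(columns["age"]) / len(columns["age"])
```

Keeping each column's values together is what lets analytics queries read only the fields they need.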
Large-scale sparse arrays used in physics, mathematics, and statistics research.
- HDF5® - n-dimensional datasets, complex objects, with schema. Efficient I/O. Binary.
- npy - NumPy arrays with minimal header metadata (shape, dtype, memory order). Binary.
- NetCDF - Self-describing, machine-independent data format for scientific data. Binary.
- Zarr - Scalable storage of n-dimensional arrays. Binary.
- ASDF - Advanced Scientific Data Format for astronomy and beyond. Binary/Textual.
Serialization of deep learning networks and weights.
- SavedModel - TensorFlow package, weights, graph, executable code. Binary.
- CoreML - Apple's on-device ML model format. Binary.
- ONNX - Open Neural Network Exchange. Interoperability focused. Binary.
- MLIR - Intermediate representation for machine learning computations. Textual/Binary.
- TorchScript - Serialization for PyTorch models. Binary.
- GraphDef - TensorFlow graphs. Binary.
- PMML - Predictive Model Markup Language for exchanging ML models. XML-based. Textual.
Serialization formats for representing graph-oriented data structures.
- JSON-LD - JSON for Linking Data. Textual.
- Turtle - Terse RDF Triple Language. Textual.
- GraphML - XML-based graph serialization format. Textual.
- DOT - Graph description language, developed as a part of the Graphviz project. Textual.
- ParquetGraph - Integration of Parquet with graph data structures. Binary.
- GraphSON - JSON-based graph serialization. Textual.
- GML - Graph Modeling Language. Hierarchical ASCII-based. Textual.
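As a small illustration of a textual graph format, the sketch below renders an edge list as a DOT digraph using plain string formatting. The `to_dot` helper and the sample edges are hypothetical. It covers only the basic `digraph` and `->` syntax and assumes node names contain no double quotes.

```python
def to_dot(edges, name="G"):
    """Render (source, target) pairs as a directed DOT graph.

    Hypothetical helper: node names are wrapped in double quotes but not
    escaped, so names containing quotes are not supported here.
    """
    lines = [f"digraph {name} {{"]
    for source, target in edges:
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample graph with three directed edges.
edges = [("a", "b"), ("b", "c"), ("a", "c")]
dot_text = to_dot(edges)
print(dot_text)
```

The resulting text can be saved to a `.dot` file and rendered with the Graphviz tools.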
- common-workflow-language - Specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments.
- Apache Airflow DAGs - Python-based Directed Acyclic Graphs for workflows.
- WDL - Workflow Description Language for genomics and scientific workflows.
- Relational Algebra and Datalog for Graphs - Coursera course on graph data manipulation.
- Cromwell - Scientific workflow management, compatible with WDL and CWL.
- Nextflow - Scalable and reproducible scientific workflows.
Language-native serialization formats for in-transit data (aka memory-based "live" objects).
- Dart Object Serialization - RAM to Disk serialization. Dart-specific. Binary.
- pickle - RAM to Disk serialization. Python-specific. Binary.
- msgpack-python - MessagePack serializer implementation for Python.
- srsly - Modern high-performance serialization utilities for Python.
- MessagePack.swift - Swift MessagePack Serializer.
- Java Object Serialization - RAM to Disk serialization. Java-specific. Binary.
- Serde - Rust's serialization framework for multiple formats like JSON, CBOR, and MessagePack.
- bincode - High-performance binary serialization for Rust.
- GOB - Go's built-in serialization format for arbitrary data structures. Binary.
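To show what "language-native" buys you, here is a minimal sketch of a `pickle` round trip next to an attempt to encode the same value as JSON. The sample value is hypothetical. Only the standard `pickle` and `json` modules are used.

```python
import json
import pickle

# Hypothetical value mixing Python-specific types: tuple, set, bytes.
original = {"point": (1, 2), "tags": {"a", "b"}, "raw": b"\x00\x01"}

# pickle serializes the live object to bytes and restores it with the
# same Python types. Only unpickle data from a trusted source, because
# loading a pickle can execute arbitrary code.
blob = pickle.dumps(original)
restored = pickle.loads(blob)

# The same value has no direct JSON encoding: the set is rejected.
try:
    json.dumps(original)
    json_ok = True
except TypeError:
    json_ok = False
```

The trade-off is portability: the pickled bytes are meant to be read back by Python, while a format like JSON can be consumed from any language at the cost of a narrower type system.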
Serialization formats optimized for real-time streaming data.
- Kafka Streams - Real-time stream processing framework with built-in serialization.
- Protobuf-Lite - Lightweight Protocol Buffers for constrained environments.
Serialization formats designed with security and robustness in mind.
- safetensors - Safe serialization of tensors for machine learning. Binary.
- Sealed Object Serialization - Encrypted serialization for web data. Textual/Binary.
Research papers discussing types, category theory, benchmarks & co.
- Type theory - Studies types, which informally are attributes that objects can possess.
- Category theory - General theory of functions. Axiomatic foundation for mathematics, as an alternative to set theory.
- Graph Compression Techniques - Research on optimizing graph serialization.
- Efficient Serialization in Distributed Systems - Study of efficient serialization techniques for scalability.
Contributions welcome! Read the contribution guidelines first.
To the extent possible under law, Maxim Veksler has waived all copyright and related or neighboring rights to this work.