mopper

I hate mappings!*

(*)That's why mopper tries to do the job as fast as possible!

A fast and lightweight data-to-RDF mapping tool. It consists of a library and a command-line interface. It takes a mapping document as input (AlgeMapLoom, RML, or ShExML) and generates a knowledge graph in RDF.

Conceptually, every operator runs in its own thread, and data flows between operators as a stream of messages (a simplified kind of actor model). There is still plenty of room for optimization, though.
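To make the threading model concrete, here is a minimal sketch using only the Rust standard library. It is not mopper's actual code: the function `run_pipeline`, the three stages, and the uppercase transformation are illustrative assumptions. Each stage runs in its own thread, and the stages are connected by bounded channels whose capacity behaves the way `--message-buffer-capacity` is described further down (a capacity of `0` makes every send wait for the matching receive).

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical three-stage pipeline: source -> operator -> sink.
/// Every stage runs in its own thread; `capacity` is the number of
/// messages each channel can hold before the sender blocks.
fn run_pipeline(records: Vec<String>, capacity: usize) -> Vec<String> {
    // With capacity 0, `send` blocks until the receiver takes the message.
    let (src_tx, src_rx) = sync_channel::<String>(capacity);
    let (op_tx, op_rx) = sync_channel::<String>(capacity);

    // Source: emits every record as one message.
    let source = thread::spawn(move || {
        for record in records {
            if src_tx.send(record).is_err() {
                break; // the downstream stage hung up
            }
        }
        // `src_tx` is dropped here, which closes the first channel.
    });

    // Operator: transforms each incoming message and forwards it.
    let operator = thread::spawn(move || {
        for msg in src_rx {
            if op_tx.send(msg.to_uppercase()).is_err() {
                break;
            }
        }
        // `op_tx` is dropped here, which closes the second channel.
    });

    // Sink: collects messages until the second channel is closed.
    let sink = thread::spawn(move || op_rx.into_iter().collect::<Vec<String>>());

    source.join().expect("source thread panicked");
    operator.join().expect("operator thread panicked");
    sink.join().expect("sink thread panicked")
}

fn main() {
    let records = vec!["a".to_string(), "b".to_string()];
    println!("{:?}", run_pipeline(records, 0));
}
```

A larger capacity lets a fast stage run ahead of a slow one at the cost of buffered messages in memory; a capacity of `0` keeps memory flat but forces the stages to move in lockstep.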

Running

The simplest invocation:

mopper -m my-mapping-file.json

To see all options, run mopper --help:

Usage: mopper [OPTIONS] --mapping-file <FILE>

Options:
  -m, --mapping-file <FILE>          Required. The path to the mapping file
  -l, --mapping-lang <LANG>          The language of the mapping file. If not given, AlgeMapLoom is assumed [possible values: rml, shexml]
  -v, --verbose...                   Increase log level
  -q, --quiet                        Be quiet; no logging
      --force-std-out                Force output to standard out, ignoring the targets in the plan. Takes precedence over --force-to-file
      --force-to-file <FILE>         Force output to file, ignoring the targets in the plan
      --message-buffer-capacity <N>  Set the maximum number of messages each communication channel can hold before blocking the sender thread. `0` means no messages are hold: 'send' and 'receive' must happen at the same time. The default is `128`
  -d, --deduplicate                  Remove duplicate triples or quads. Note that currently deduplication only works on a per-sink basis and has a negative impact on speed and memory consumption
  -h, --help                         Print help
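The note on `--deduplicate` (per-sink only, with a cost in speed and memory) can be illustrated with a small sketch. This is an assumption about how such a sink could work, not mopper's implementation; the `DedupSink` type and its fields are hypothetical. Each sink remembers every statement it has already written, so memory grows with the number of distinct statements, and two sinks do not share that memory.

```rust
use std::collections::HashSet;

/// Hypothetical sink that drops statements it has already written.
struct DedupSink {
    /// Every distinct statement seen so far; this is the memory cost.
    seen: HashSet<String>,
    /// Stand-in for the real output target (file, standard out, ...).
    written: Vec<String>,
}

impl DedupSink {
    fn new() -> Self {
        DedupSink { seen: HashSet::new(), written: Vec::new() }
    }

    /// Writes the statement if it is new; returns whether it was written.
    fn write(&mut self, statement: &str) -> bool {
        // `insert` returns false when the value was already present.
        if self.seen.insert(statement.to_owned()) {
            self.written.push(statement.to_owned());
            true
        } else {
            false
        }
    }
}

fn main() {
    let triple = "<http://example.org/a> <http://example.org/p> \"1\" .";
    let mut sink_a = DedupSink::new();
    let mut sink_b = DedupSink::new();

    sink_a.write(triple);
    sink_a.write(triple); // dropped: sink_a has already seen it
    sink_b.write(triple); // kept: sink_b has its own, separate state

    println!("sink_a: {} statement(s)", sink_a.written.len());
    println!("sink_b: {} statement(s)", sink_b.written.len());
}
```

Because the two sinks keep separate `seen` sets, the same triple can still appear once in each output, which is what "per-sink" deduplication implies.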

Building

You need Rust and Cargo to build mopper (install instructions).

Then, in the root directory, run

cargo build --release

The executable binary is placed in the target/release directory.

Current state

Mopper is a work in progress. Here's a rough overview of what is (and is not) implemented:

Input formats:

  • CSV
  • JSON
  • XML

Input / output types:

  • File
  • Standard out
  • Standard in
  • Stream (e.g. Kafka, Websocket)
  • Relational database

Output formats:

  • N-Triples
  • N-Quads
  • More RDF serializations

Mapping features:

  • IRI generation function
  • Reference function
  • IRI template function
  • Constant IRI generation
  • URL encode function
  • IRI generation
  • Projection operator
  • Fragmenting
  • Join operator (only inner join with equals condition)
  • Blank node generation function
  • Deduplication
  • Concatenate function
  • Replace function
  • To uppercase / lowercase function
  • FnO function handling
  • Rename operator
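As an illustration of two entries in the list above, the IRI template function and the URL encode function, here is a minimal sketch of what such functions typically do in data-to-RDF mapping. It is written under assumptions and is not taken from mopper: the names `url_encode` and `expand_template`, the `{field}` placeholder syntax, and the choice to percent-encode everything outside the RFC 3986 unreserved set are all illustrative.

```rust
use std::collections::HashMap;

/// Percent-encodes every byte that is not an RFC 3986 unreserved character.
fn url_encode(value: &str) -> String {
    let mut out = String::new();
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Expands `{field}` placeholders with URL-encoded values from `record`.
/// Returns `None` if a field is missing or a brace is left unclosed.
fn expand_template(template: &str, record: &HashMap<&str, &str>) -> Option<String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        // Copy the literal text in front of the placeholder.
        out.push_str(&rest[..open]);
        let close = open + rest[open..].find('}')?;
        let field = &rest[open + 1..close];
        out.push_str(&url_encode(record.get(field)?));
        rest = &rest[close + 1..];
    }
    // Copy whatever literal text follows the last placeholder.
    out.push_str(rest);
    Some(out)
}

fn main() {
    let mut record = HashMap::new();
    record.insert("name", "Ada Lovelace");
    let iri = expand_template("http://example.org/person/{name}", &record);
    println!("{:?}", iri);
}
```

Returning `None` for a missing field mirrors the common mapping convention that no term is generated when a referenced value is absent; whether mopper behaves this way is not stated here.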