Store tileset metadata separately from tileset data #129

Open
keller-mark opened this issue Mar 2, 2020 · 1 comment

Comments

@keller-mark
Member

@sehilyi, @alexanderveit, @ngehlenborg and I had a discussion on Friday about improving support for tileset metadata (initially for the cistrome-higlass-wrapper use case, but now with others in mind, such as vitessce), so I wanted to create this issue to discuss it with the entire development team.

One idea is to specify a corresponding metadata file when a tileset file is ingested.
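As a rough illustration of this idea, here is a minimal sketch of what a sidecar metadata file for a multivec tileset might contain, and how a wrapper application could check that it lines up with the tileset rows. The JSON layout (`axis`, `fields`, `rows` keys), the `qc_score` field and the `load_row_metadata` function are all hypothetical; nothing here is an existing HiGlass or higlass-server format.

```python
import json

# Hypothetical sidecar metadata for a multivec tileset.
# None of these keys are part of any existing HiGlass format.
EXAMPLE_METADATA = """
{
  "axis": "y",
  "fields": {
    "species": "nominal",
    "tissue_type": "nominal",
    "cell_type": "nominal",
    "qc_score": "quantitative"
  },
  "rows": [
    {"species": "human", "tissue_type": "blood", "cell_type": "T cell", "qc_score": 0.91},
    {"species": "human", "tissue_type": "blood", "cell_type": "B cell", "qc_score": 0.87},
    {"species": "mouse", "tissue_type": "liver", "cell_type": "hepatocyte", "qc_score": 0.78}
  ]
}
"""

# Hypothetical set of field types a wrapper might understand.
ALLOWED_TYPES = {"nominal", "quantitative"}


def load_row_metadata(text: str, num_rows: int) -> dict:
    """Parse sidecar metadata and check that it matches the tileset rows."""
    meta = json.loads(text)
    # The file must say which axis of the tileset it describes.
    if meta.get("axis") not in ("x", "y"):
        raise ValueError("metadata must declare which axis it describes")
    for name, ftype in meta["fields"].items():
        if ftype not in ALLOWED_TYPES:
            raise ValueError(f"unknown field type {ftype!r} for {name!r}")
    rows = meta["rows"]
    # One metadata record per multivec row, in the same order as the data.
    if len(rows) != num_rows:
        raise ValueError(f"expected {num_rows} rows of metadata, got {len(rows)}")
    for i, row in enumerate(rows):
        missing = set(meta["fields"]) - set(row)
        if missing:
            raise ValueError(f"row {i} is missing fields {sorted(missing)}")
    return meta


if __name__ == "__main__":
    meta = load_row_metadata(EXAMPLE_METADATA, num_rows=3)
    print(meta["axis"], len(meta["rows"]))  # y 3
```

Declaring field types up front is one way to let a wrapper choose a bar chart for quantitative fields and a categorical legend for nominal ones without inspecting the values.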

Some open questions:

  • What types of fields need to be stored in the metadata file? For cistrome-higlass-wrapper, it will at least be:
    • quantitative fields (bar chart)
    • multiple related quantitative fields (stacked bar chart)
    • categorical/nominal fields (e.g. cell type, tissue type, species)
    • links/text (just a special case of the categorical/nominal?)
    • hierarchy: right now we do the tree-to-matrix and matrix-to-tree conversions, but if we are defining a new metadata storage format, we could also define "aggregated" metadata that stores the hierarchy separately as a tree data structure, so that no conversion step is required
  • Which axis does the metadata describe? How can this be specified in the metadata file?
  • How should metadata be stored when it is associated with:
    • points along a continuous axis
    • categories along a categorical axis (the current cistrome-higlass-wrapper case)
    • 1D intervals along a continuous axis
    • 2D regions along two continuous axes
  • Does metadata need to be aggregated? For example, does different metadata need to correspond to different track zoom levels?
    • I could imagine aggregating the multivec data along the y/sample axis as well, such that, for instance:
      • y-zoom-level 0 corresponds to displaying multivec rows by species: 2 rows are displayed, one for human and one for mouse
      • y-zoom-level 1 corresponds to displaying multivec rows by tissue type for a particular species: more rows are displayed, for all tissue types within the species
      • y-zoom-level 2 corresponds to displaying multivec rows by cell type for a particular species and tissue type: more rows are displayed, for all cell types within the tissue type
  • In what file format should metadata be stored (JSON, CSV, etc.)? How flexible does this format need to be, and does a schema need to be defined? A schema would clearly be needed if the metadata is aggregated, so that the server can parse it and return different metadata based on query parameters. Without aggregation, a schema may not be needed at all, as long as neither higlass-server nor higlass ever needs to read the metadata and only these "wrapper" applications use it.
  • From which server API endpoint will the metadata be served? (Right now some metadata is served from the /tileset_info endpoint.)
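To make the hierarchy and y-zoom-level ideas concrete, here is a minimal Python sketch. It assumes per-row metadata with `species`, `tissue_type` and `cell_type` fields, stores the hierarchy directly as a nested tree, and looks up which rows are shown at each y-zoom-level. The `HIERARCHY` order, the functions `build_tree`, `leaf_rows`, `rows_at_y_zoom` and `aggregate_bins`, and the choice of averaging rows are hypothetical illustrations, not part of higlass or higlass-server.

```python
from statistics import mean

# Hypothetical hierarchy order: y-zoom-level 0 = species,
# 1 = tissue type within a species, 2 = cell type within a tissue type.
HIERARCHY = ("species", "tissue_type", "cell_type")


def build_tree(rows):
    """Nest row indices as species -> tissue type -> cell type -> [rows]."""
    tree = {}
    for i, row in enumerate(rows):
        node = tree
        for field in HIERARCHY[:-1]:
            node = node.setdefault(row[field], {})
        node.setdefault(row[HIERARCHY[-1]], []).append(i)
    return tree


def leaf_rows(node):
    """Collect all multivec row indices below a tree node."""
    if isinstance(node, list):
        return list(node)
    return [i for child in node.values() for i in leaf_rows(child)]


def rows_at_y_zoom(tree, level, prefix=()):
    """Return {label: row indices} for the groups shown at a y-zoom-level.

    `prefix` selects a branch: () at level 0, ("human",) at level 1,
    ("human", "blood") at level 2.
    """
    if not 0 <= level < len(HIERARCHY) or len(prefix) != level:
        raise ValueError("prefix must fix exactly the coarser levels")
    node = tree
    for label in prefix:
        node = node[label]
    return {label: leaf_rows(child) for label, child in node.items()}


def aggregate_bins(values, groups):
    """Average the multivec rows of each group, bin by bin."""
    n_bins = len(values[0])
    return {
        label: [mean(values[i][b] for i in idx) for b in range(n_bins)]
        for label, idx in groups.items()
    }


if __name__ == "__main__":
    rows = [
        {"species": "human", "tissue_type": "blood", "cell_type": "T cell"},
        {"species": "human", "tissue_type": "blood", "cell_type": "B cell"},
        {"species": "human", "tissue_type": "liver", "cell_type": "hepatocyte"},
        {"species": "mouse", "tissue_type": "liver", "cell_type": "hepatocyte"},
    ]
    # Two genomic bins per multivec row.
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    tree = build_tree(rows)
    level0 = rows_at_y_zoom(tree, 0)
    print(level0)  # {'human': [0, 1, 2], 'mouse': [3]}
    print(aggregate_bins(values, level0))  # {'human': [3.0, 4.0], 'mouse': [7.0, 8.0]}
    print(rows_at_y_zoom(tree, 1, ("human",)))  # {'blood': [0, 1], 'liver': [2]}
```

With the tree stored directly, each y-zoom-level is a lookup rather than a tree-to-matrix conversion. Averaging is only one possible aggregation; sums or maxima may suit some data better, which is part of why a schema would be needed if the server does the aggregation.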
keller-mark changed the title from "Store track metadata separately from track data" to "Store tileset metadata separately from tileset data" on Mar 2, 2020
@pkerpedjiev
Member

I think I'm missing some context here. What are you trying to accomplish with all this metadata?
