Instill end-user confidence in data integrity #5

Open
gobengo opened this issue Jan 8, 2025 · 2 comments

@gobengo

gobengo commented Jan 8, 2025

Challenge Description

  • Users want to store their data on someone else's computer and fetch it later
  • But the data could be tampered with or otherwise corrupted while it is on the other person's computer
  • When end-users retrieve the data from storage, how do they know that no data was lost and no malicious data was added?

Impact and Importance

  • Without a way to communicate data integrity, end-users may not be able to verify that the Data Space is, in fact, storing all of their data
  • Without data integrity checks, there is no way to verify that the Data Space server did not inject malware or other junk data into the data it serves

Desired Solution

  • Individual data objects in LWS should have one or more checksums (or something similar). The end-user should be able to deterministically compute a checksum from a representation of the Data Space Data Object, and the Data Space Server should be able to advertise the checksums of the objects it stores.
  • If Data Space allows organizing several data objects into a collection, the collection itself should also have a checksum that changes whenever collection members are added, removed, or updated.
  • If I have data in Data Space A with checksum X and I migrate the data to Data Space B, the checksum should still be X.
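A minimal sketch of the checksums described above, assuming SHA-256 as the hash function and a hypothetical convention in which collection members are identified by names relative to the collection rather than by absolute URLs (which would change on migration). The names `object_checksum` and `collection_checksum` are illustrative only and are not part of any LWS specification.

```python
import hashlib
import json


def object_checksum(data: bytes) -> str:
    """Checksum of one data object: SHA-256 over its exact bytes (assumed hash choice)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def collection_checksum(members: dict[str, bytes]) -> str:
    """Checksum of a collection, derived only from its members.

    `members` maps each member's name relative to the collection
    (hypothetical convention) to that member's bytes, so the result does
    not depend on which Data Space hosts the collection.
    """
    # Sort by name so the result does not depend on listing order.
    entries = [[name, object_checksum(members[name])] for name in sorted(members)]
    # Compact JSON is an unambiguous, deterministic encoding of the
    # (name, checksum) pairs, so names containing separators cannot collide.
    encoded = json.dumps(entries, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(encoded.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    collection = {"notes.txt": b"hello", "photo.jpg": b"\xff\xd8\xff"}
    base = collection_checksum(collection)

    # Adding, removing, or updating a member changes the collection checksum.
    assert collection_checksum({**collection, "new.txt": b"x"}) != base
    assert collection_checksum({"notes.txt": b"hello"}) != base
    assert collection_checksum({**collection, "notes.txt": b"hello!"}) != base

    # The same content, listed in a different order by another Data Space,
    # gives the same checksum.
    migrated = dict(reversed(list(collection.items())))
    assert collection_checksum(migrated) == base
    print(base)
```

Because the collection checksum depends only on member names and content, the same collection hosted by Data Space A or Data Space B yields the same value. How a server would advertise these checksums is a separate question this sketch does not address.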

Acceptance Criteria

References and Resources

Additional Notes

@pietercolpaert
Collaborator

pietercolpaert commented Jan 10, 2025

I take it LWS refers to the ongoing Linked Web Storage working group from W3C? https://www.w3.org/groups/wg/lws/

I believe an additional complexity here is how to refer to a “package” of data to which the checksum should apply. Will the checksum be based only on an HTTP response or a file? Or should we also consider that a set of triples/quads can be referred to as a package?
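For the triples/quads case, one possible approach (an assumption, not something agreed in this thread) is to hash the quads as a set rather than as a file: de-duplicate and sort the N-Quads lines before hashing, so that two serializations of the same quads produce the same checksum. This sketch only works if every line is a single quad already in canonical N-Quads form and the data contains no blank nodes; general RDF data with blank nodes needs a full canonicalization algorithm such as W3C RDF Dataset Canonicalization (RDFC-1.0). The function name `quad_set_checksum` is hypothetical.

```python
import hashlib


def quad_set_checksum(nquads: str) -> str:
    """Checksum of a set of quads, independent of line order and duplicates.

    Assumes (hypothetically) one quad per line, already in canonical
    N-Quads form, with no blank nodes and no comment lines.
    """
    # Treat the data as a set: drop empty lines and duplicate quads.
    quads = {line.strip() for line in nquads.splitlines() if line.strip()}
    # Sort in code point order and terminate each quad with a newline.
    payload = "".join(quad + "\n" for quad in sorted(quads))
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = (
        '<http://example.org/s> <http://example.org/p> "1" .\n'
        '<http://example.org/s> <http://example.org/p> "2" <http://example.org/g> .\n'
    )
    # Same quads in a different order, plus a duplicate, as another server might serve them.
    b = (
        '<http://example.org/s> <http://example.org/p> "2" <http://example.org/g> .\n'
        '<http://example.org/s> <http://example.org/p> "1" .\n'
        '<http://example.org/s> <http://example.org/p> "1" .\n'
    )
    assert quad_set_checksum(a) == quad_set_checksum(b)
    print(quad_set_checksum(a))
```

A byte-level checksum of the HTTP response would differ between `a` and `b`, while the set-based checksum treats them as the same package.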

@JohannesLipp
Collaborator

+1 for the challenge and general approach.

Just two things that came to mind:

  1. The term "the Data Space server" should be improved, because there is no single dataspace server: dataspaces are distributed systems by definition and by design.
  2. Dataspaces distinguish between data discovery and data access. Thus, it might make sense to apply this not only to the data but also to its metadata, so that the metadata gets the same benefits.
