
Recursive data map, Functor based decrypt method (for direct network calls) and also same pattern for getting root data_map #394

Merged
merged 31 commits on Nov 15, 2024

Conversation

dirvine
Member

@dirvine dirvine commented Nov 2, 2024

Description (BREAKING CHANGE:)

This PR introduces hierarchical data maps and flexible storage backends to the self_encryption library, along with comprehensive Python bindings and documentation updates.

Key Changes

Core Functionality

  • Add shrink_data_map function for hierarchical data map reduction
  • Add get_root_data_map function for recursive data map expansion
  • Implement decrypt_from_storage with generic storage interface
  • Update DataMap struct to support child levels
  • Add flexible storage backend support using functors
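The functor-based storage idea can be sketched in a few lines. This is a minimal Python sketch of the pattern only, not the library's actual API: `decrypt_from_store`, `put`, and the dict-backed retrieval functor are hypothetical names, and the real function also decrypts each chunk rather than just concatenating.

```python
import hashlib

def decrypt_from_store(chunk_hashes, retrieve):
    """Reassemble content by fetching each chunk through a caller-supplied
    retrieval functor (disk, memory, network: the functor decides).
    Hypothetical sketch; real decryption of each chunk is omitted."""
    parts = []
    for h in chunk_hashes:
        data = retrieve(h)  # functor supplied by the caller
        if data is None:
            raise KeyError(f"missing chunk {h}")
        parts.append(data)
    return b"".join(parts)

# An in-memory backend expressed as a closure over a dict.
store = {}

def put(content):
    h = hashlib.sha256(content).hexdigest()
    store[h] = content
    return h

hashes = [put(b"hello "), put(b"world")]
assert decrypt_from_store(hashes, store.get) == b"hello world"
```

The same call site works unchanged whether `retrieve` reads from RAM, disk, or a network, which is the point of taking a functor.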

Python Bindings

  • Add Python bindings for hierarchical data map operations
  • Add shrink_data_map and get_root_data_map Python functions
  • Update PyDataMap class to support serialization/deserialization
  • Add comprehensive Python examples

Documentation

  • Extensive README update with detailed examples for both Rust and Python
  • Add implementation details section
  • Add hierarchical data map documentation
  • Maintain original diagrams and explanations
  • Add comprehensive usage examples

Testing

  • Add test suite for new functionality
  • Add disk-based storage tests
  • Add memory-based storage tests
  • Add error handling tests
  • Add hierarchical data map tests
  • Add tests for multiple levels of shrinking

Impact

  • BREAKING CHANGES to the existing API
  • Not fully backward compatible with existing code
  • No new dependencies added
  • Improved handling of large files through hierarchical data maps
  • More flexible storage options through generic backends

Testing Done

  • Comprehensive test suite added
  • All tests passing
  • Manual testing with large files
  • Python binding verification
  • Storage backend validation

Documentation

  • Updated README with comprehensive examples
  • Added detailed Python usage section
  • Added implementation details
  • Maintained existing documentation

Related Issues

Closes #XXX (replace with actual issue number)

Checklist

  • Code follows project style guidelines
  • Tests added for new functionality
  • Documentation updated
  • Python bindings implemented and tested
  • No breaking changes
  • Error handling improved
  • Examples added

* Remove parallel processing in decrypt to maintain chunk boundaries
* Improve chunk ordering and boundary handling
* Add better error messages for missing chunks
* Fix seek_and_join test to properly handle chunk boundaries
* Fix overflow issues in seek_with_length_over_data_size test
* Reorganize README.md to clearly document both Rust and Python interfaces
* Add Python installation and usage examples
* Configure pyproject.toml to use Cargo.toml version via maturin
* Maintain all existing Rust documentation and imagery
* Keep security notes and whitepaper references prominent

The Python package will automatically sync its version with Cargo.toml
using maturin's dynamic versioning feature.
* Add optional `child` field to DataMap struct
* Update DataMap constructors to support child field:
  - new() creates DataMap without child
  - with_child() creates DataMap with specified child value
* Add child() getter method
* Update Debug implementation to display child field when present
* Add comprehensive tests:
  - Basic functionality with/without child
  - Serialization/deserialization
  - Debug output formatting
* Add test helper functions for creating test DataMaps

This change allows DataMap to track an optional child identifier while
maintaining backward compatibility with existing code.
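A minimal sketch of the optional-child idea, written in Python for brevity (the real struct is Rust; the names here just mirror the description above and are not the actual API):

```python
class DataMap:
    """Toy model of a data map with an optional child level.
    None means this is a root-level map."""

    def __init__(self, chunk_identifiers, child=None):
        self.chunk_identifiers = chunk_identifiers
        self._child = child

    @classmethod
    def with_child(cls, chunk_identifiers, child):
        """Construct a map that records its child level."""
        return cls(chunk_identifiers, child)

    def child(self):
        return self._child

    def is_child(self):
        return self._child is not None

root = DataMap(["c1", "c2", "c3"])
shrunk = DataMap.with_child(["d1"], 1)
assert root.child() is None
assert shrunk.child() == 1
```

Existing code that never touches `child` keeps working, since the field defaults to absent.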
This commit introduces functionality to handle large data maps by implementing
a hierarchical shrinking mechanism and corresponding expansion capability.

Key changes:
- Add shrink_data_map function that recursively shrinks data maps until they
  contain fewer than 4 chunks
- Add get_root_data_map function to recursively expand child data maps back
  to their root form
- Implement generic storage interface using functors to allow flexible
  backend storage (disk, memory, network etc)
- Add comprehensive test suite covering:
  * Disk-based storage using tempdir
  * In-memory storage using thread-safe HashMap
  * Multiple levels of shrinking/expansion
  * Error handling scenarios
  * Data integrity verification
- Add create_dummy_data_map helper for testing large maps
- Update DataMap struct to track child levels for hierarchical relationships

The changes enable efficient handling of large data maps by breaking them down
into manageable chunks while maintaining data integrity and providing flexible
storage options.
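The shrink/expand pair can be modelled as a short round trip. This is a toy Python sketch; the function names mirror the Rust ones but the real implementation re-encrypts the serialised map into 3 chunks rather than storing a single blob under a synthetic id.

```python
import json

def shrink_data_map(chunks, store, level=0):
    """Toy model of hierarchical shrinking: while the map holds 4+ chunk
    ids, serialise it, store the blob, and replace the map with one
    synthetic id, bumping the child level each round."""
    while len(chunks) >= 4:
        blob = json.dumps(chunks)
        cid = f"map-level-{level}"
        store[cid] = blob
        chunks = [cid]
        level += 1
    return chunks, level

def get_root_data_map(chunks, store, level):
    """Inverse: follow stored blobs back down until level 0 (the root)."""
    while level > 0:
        chunks = json.loads(store[chunks[0]])
        level -= 1
    return chunks

store = {}
root = [f"chunk-{i}" for i in range(10)]
small, lvl = shrink_data_map(list(root), store)
assert len(small) < 4
assert get_root_data_map(small, store, lvl) == root
```

The `store` argument stands in for the generic storage functor, so the same shrink/expand logic works against disk, memory, or network backends.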

Testing:
- All tests pass except for multiple_levels test which needs larger input
- Added disk and memory-based storage tests
- Added error handling tests
- Added data integrity verification
…ackends

This commit introduces major enhancements to the self_encryption library:

Core Changes:
- Add shrink_data_map function for hierarchical data map reduction
- Add get_root_data_map function for recursive data map expansion
- Implement flexible storage backend support using functors
- Add decrypt_from_storage with generic storage interface
- Update DataMap struct to support child levels

Python Bindings:
- Add Python bindings for hierarchical data map operations
- Add shrink_data_map and get_root_data_map Python functions
- Update PyDataMap class to support serialization/deserialization

Documentation:
- Comprehensive README update with detailed examples
- Add extensive Rust usage examples
- Add Python usage examples
- Document hierarchical data map functionality
- Add implementation details section

Testing:
- Add comprehensive test suite for new functionality
- Add disk-based storage tests
- Add memory-based storage tests
- Add error handling tests
- Add hierarchical data map tests
- Add tests for multiple levels of shrinking

The changes enable efficient handling of large files through hierarchical
data maps and provide flexible storage options through generic backends.
All new functionality is thoroughly tested and documented with examples
in both Rust and Python.

Breaking Changes: None
Dependencies: No new dependencies added
@dirvine dirvine requested a review from a team as a code owner November 2, 2024 00:32
Added extensive integration tests to verify self-encryption functionality
across different storage backends:

* Added StorageBackend helper struct for managing memory/disk storage
* Added debug helpers for storage state visualization
* Implemented cross-backend tests:
  - Memory-to-memory, memory-to-disk, disk-to-memory operations
  - Large file handling (100MB+)
  - Concurrent access with multiple file sizes
  - Platform-specific size handling (page sizes, u16/u32 boundaries)
  - Error handling and recovery
* Added verification steps between operations
* Fixed chunk handling to ensure proper storage/retrieval flow
* Added detailed logging for debugging storage operations

These tests ensure consistent behavior across different storage backends
and verify data integrity through the entire encrypt/decrypt cycle.
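The cross-backend tests can be pictured with a helper like the following. `StorageBackend` here is a hypothetical Python stand-in for the test helper described above, exposing one put/get interface over either an in-memory dict or an on-disk directory:

```python
import hashlib
import os
import tempfile

class StorageBackend:
    """One interface, two backends: memory (dict) or disk (directory)."""

    def __init__(self, root=None):
        self.mem = {} if root is None else None
        self.root = root

    def put(self, name, data):
        if self.mem is not None:
            self.mem[name] = data
        else:
            with open(os.path.join(self.root, name), "wb") as f:
                f.write(data)

    def get(self, name):
        if self.mem is not None:
            return self.mem[name]
        with open(os.path.join(self.root, name), "rb") as f:
            return f.read()

# Memory-to-disk round trip, addressed by content hash.
mem = StorageBackend()
disk = StorageBackend(tempfile.mkdtemp())
payload = os.urandom(1024)
digest = hashlib.sha256(payload).hexdigest()
mem.put(digest, payload)
disk.put(digest, mem.get(digest))
assert disk.get(digest) == payload
```

Swapping the two constructor calls gives the disk-to-memory direction with no other changes, which is what makes the cross-backend matrix cheap to test.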
Also fix the Python bindings, add more comprehensive examples to the README,
and prepare the version bump
@happybeing

This PR introduces hierarchical data maps and flexible storage backends to the self_encryption library

This sounds interesting @dirvine. Please can you elaborate on the new/changed functionality this enables (maybe in the OP)?

@dirvine
Member Author

dirvine commented Nov 13, 2024

Sorry Ar, I missed your question. Yes, there are a couple of things happening here:

  1. Make the data map recursive. In encrypt we always return a data map of len() 3. In decrypt we decrypt the recursively encrypted data map. It's just an easier API. This is in place in another PR, and the next PR will apply this to all functions.
  2. The functor-based methods allow you to pass a storage functor to encrypt or decrypt, which stores or retrieves chunks from whatever location the functor targets: network, disk, RAM, etc. This lets users of the library supply the storage they want (i.e. the network) as a functor, and for decrypt in particular to read directly from the network in our case, with no need for local on-disk storage.

BREAKING CHANGE: This PR alters the API
src/data_map.rs Outdated
pub struct DataMap {
/// List of chunk hashes
pub chunk_identifiers: Vec<ChunkInfo>,
/// Child value
Member

What does that value stand for: the number of generated chunks, the number of recursive levels, or the original data_map size?

Also, the comment on this struct may need to be updated as well.

Member Author

The value here is really the level of the child. None means it's a root-level data_map (so it can be very large). The oldest child is what we return from encrypt and will be length 3.

Member

Shall the above be put in as the comment? The current one is really vague.

src/data_map.rs (resolved)
src/decrypt.rs (resolved)
@@ -241,131 +239,90 @@ fn get_chunk_sizes() -> Result<(), Error> {

#[test]
fn seek_and_join() -> Result<(), Error> {
for i in 1..15 {
Member

Ideally keep this test, and add the new one as a separate test.

Member Author

This test was invalid and prone to random failures. We could write a whole new one, though; this one was not good.

let start_size = 4_194_300;
for i in 0..27 {
for i in 0..5 {
Member

Has the size been reduced for a reason?
If the aim is to purposely cover small files, maybe it's better to make that a new test scenario:
say, turn this test into a function taking i as a parameter, and pass different values in?

Member Author

It was just to reduce test time; the larger size was not testing anything further, AFAIK.

}

#[test]
fn seek_with_length_over_data_size() -> Result<(), Error> {
Member

Better to keep this scenario (it may be merged with the one above).

Member Author

This test used the old mechanism and was updated to the new code in a new test. We could add another, larger test, but this suite was upgraded to handle the new code.

This commit fixes potential panic conditions in the decryption functions and
improves their robustness and clarity:

- Replace unsafe direct indexing with safe `.get()` lookups in chunk hash validation
- Add proper error handling for missing or corrupted chunks
- Rename `relative_pos` parameter to `file_pos` to clarify its meaning
- Add comprehensive documentation for function parameters and return values
- Improve error messages to aid in debugging missing/corrupted chunks
- Maintain consistent error handling throughout both functions

The changes prevent runtime panics that could occur when:
- A chunk's content hash doesn't match any expected hash in the data map
- Chunks are missing or corrupted
- Invalid chunk indices are encountered
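The safe-lookup fix can be illustrated as follows. This is a hedged Python sketch (the `chunk_index` helper is hypothetical, not the library's function): instead of indexing directly and panicking on a bad hash, the lookup returns a descriptive error.

```python
def chunk_index(src_hashes, content_hash):
    """Return the chunk's position in the data map, or raise a
    descriptive error instead of crashing on an unknown hash."""
    try:
        return src_hashes.index(content_hash)
    except ValueError:
        raise KeyError(
            f"chunk hash {content_hash!r} not found in data map; "
            "the chunk may be missing or corrupted"
        ) from None

hashes = ["aa", "bb", "cc"]
assert chunk_index(hashes, "bb") == 1
```

A corrupted chunk now surfaces as a recoverable error with a useful message rather than an index-out-of-bounds panic.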

This is a breaking change for `decrypt_range()` as the `relative_pos` parameter
has been renamed to `file_pos` to better reflect that it represents a position
within the complete file rather than within the first chunk.

Testing:
- Existing tests pass
- Added error cases are properly handled
- API documentation is complete and accurate
@@ -8,63 +8,42 @@

use crate::{encryption, get_pad_key_and_iv, xor, EncryptedChunk, Error, Result};
use bytes::Bytes;
use itertools::Itertools;
use rayon::prelude::*;
use std::io::Cursor;
use xor_name::XorName;

pub fn decrypt(src_hashes: Vec<XorName>, encrypted_chunks: &[&EncryptedChunk]) -> Result<Bytes> {
Member

Better to rename encrypted_chunks to sorted_encrypted_chunks, or put a comment on top of the function highlighting that the input must be sorted,
to avoid future misuse.
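A third option for the reviewer's concern is to sort inside the function so callers cannot misuse it. A small Python sketch, under the assumption that each chunk carries its index:

```python
def decrypt_sorted(encrypted_chunks):
    """Order chunks by their index before joining, so callers need not
    pre-sort. Sketch only; real chunks would also be decrypted here."""
    ordered = sorted(encrypted_chunks, key=lambda c: c[0])  # (index, bytes)
    return b"".join(data for _, data in ordered)

chunks = [(2, b"c"), (0, b"a"), (1, b"b")]
assert decrypt_sorted(chunks) == b"abc"
```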

maqi
maqi previously approved these changes Nov 15, 2024
@dirvine dirvine enabled auto-merge (rebase) November 15, 2024 21:55
@dirvine dirvine disabled auto-merge November 15, 2024 22:07
@dirvine dirvine merged commit c2aa75e into maidsafe:master Nov 15, 2024
8 of 10 checks passed