-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recursive data map, Functor based decrypt method (for direct network calls) and also same pattern for getting root data_map #394
Conversation
* Remove parallel processing in decrypt to maintain chunk boundaries * Improve chunk ordering and boundary handling * Add better error messages for missing chunks * Fix seek_and_join test to properly handle chunk boundaries * Fix overflow issues in seek_with_length_over_data_size test
* Reorganize README.md to clearly document both Rust and Python interfaces * Add Python installation and usage examples * Configure pyproject.toml to use Cargo.toml version via maturin * Maintain all existing Rust documentation and imagery * Keep security notes and whitepaper references prominent The Python package will automatically sync its version with Cargo.toml using maturin's dynamic versioning feature.
* Add optional `child` field to DataMap struct * Update DataMap constructors to support child field: - new() creates DataMap without child - with_child() creates DataMap with specified child value * Add child() getter method * Update Debug implementation to display child field when present * Add comprehensive tests: - Basic functionality with/without child - Serialization/deserialization - Debug output formatting * Add test helper functions for creating test DataMaps This change allows DataMap to track an optional child identifier while maintaining backward compatibility with existing code.
This commit introduces functionality to handle large data maps by implementing a hierarchical shrinking mechanism and corresponding expansion capability. Key changes: - Add shrink_data_map function that recursively shrinks data maps until they contain fewer than 4 chunks - Add get_root_data_map function to recursively expand child data maps back to their root form - Implement generic storage interface using functors to allow flexible backend storage (disk, memory, network etc) - Add comprehensive test suite covering: * Disk-based storage using tempdir * In-memory storage using thread-safe HashMap * Multiple levels of shrinking/expansion * Error handling scenarios * Data integrity verification - Add create_dummy_data_map helper for testing large maps - Update DataMap struct to track child levels for hierarchical relationships The changes enable efficient handling of large data maps by breaking them down into manageable chunks while maintaining data integrity and providing flexible storage options. Testing: - All tests pass except for multiple_levels test which needs larger input - Added disk and memory-based storage tests - Added error handling tests - Added data integrity verification
…ackends This commit introduces major enhancements to the self_encryption library: Core Changes: - Add shrink_data_map function for hierarchical data map reduction - Add get_root_data_map function for recursive data map expansion - Implement flexible storage backend support using functors - Add decrypt_from_storage with generic storage interface - Update DataMap struct to support child levels Python Bindings: - Add Python bindings for hierarchical data map operations - Add shrink_data_map and get_root_data_map Python functions - Update PyDataMap class to support serialization/deserialization Documentation: - Comprehensive README update with detailed examples - Add extensive Rust usage examples - Add Python usage examples - Document hierarchical data map functionality - Add implementation details section Testing: - Add comprehensive test suite for new functionality - Add disk-based storage tests - Add memory-based storage tests - Add error handling tests - Add hierarchical data map tests - Add tests for multiple levels of shrinking The changes enable efficient handling of large files through hierarchical data maps and provide flexible storage options through generic backends. All new functionality is thoroughly tested and documented with examples in both Rust and Python. Breaking Changes: None Dependencies: No new dependencies added
Added extensive integration tests to verify self-encryption functionality across different storage backends: * Added StorageBackend helper struct for managing memory/disk storage * Added debug helpers for storage state visualization * Implemented cross-backend tests: - Memory-to-memory, memory-to-disk, disk-to-memory operations - Large file handling (100MB+) - Concurrent access with multiple file sizes - Platform-specific size handling (page sizes, u16/u32 boundaries) - Error handling and recovery * Added verification steps between operations * Fixed chunk handling to ensure proper storage/retrieval flow * Added detailed logging for debugging storage operations These tests ensure consistent behavior across different storage backends and verify data integrity through the entire encrypt/decrypt cycle.
Also fix python bindings and add more comprehensive exampels in the README
plus ready version bump
This sounds interesting @dirvine. Please can you elaborate on the new/changed functionality this enables (maybe in the OP)? |
Sorry Ar, I missed your question. Yes there is a couple of things happening here
|
BREAKING CHANGE: This PR alters the API
src/data_map.rs
Outdated
pub struct DataMap { | ||
/// List of chunk hashes | ||
pub chunk_identifiers: Vec<ChunkInfo>, | ||
/// Child value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what that value
stands for, number of generated chunks or recursive levels, or the original data_map size ?
also, the comment of this struct may need to be updated as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the value here is the level of of the child really. So None means it's a root_level data_map (so can be very large). The oldest child will be what we return from encrypt and will be length 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall above put as comment?
the current one is really vague
@@ -241,131 +239,90 @@ fn get_chunk_sizes() -> Result<(), Error> { | |||
|
|||
#[test] | |||
fn seek_and_join() -> Result<(), Error> { | |||
for i in 1..15 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally keep this test, make the new one as a new separated test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was invalid and prone to random failures. We could write a whole new one though, but this one was not good.
let start_size = 4_194_300; | ||
for i in 0..27 { | ||
for i in 0..5 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size has been reduced for any reason ?
if aim to cover small sized file
purposely, maybe better make as a new testing scenario?
say make this test as a function taking a parameter of i
, and pass down different values in ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just to reduce test time, the larger size was not testing anything further AFAIK.
} | ||
|
||
#[test] | ||
fn seek_with_length_over_data_size() -> Result<(), Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better keep this scenario (may can be merged with above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test used old mechanism and was updated to the new code in a new test. We could do another larger test again, but this suite was upgraded to handle new code.
This commit fixes potential panic conditions in the decryption functions and improves their robustness and clarity: - Replace unsafe direct indexing with safe `.get()` lookups in chunk hash validation - Add proper error handling for missing or corrupted chunks - Rename `relative_pos` parameter to `file_pos` to clarify its meaning - Add comprehensive documentation for function parameters and return values - Improve error messages to aid in debugging missing/corrupted chunks - Maintain consistent error handling throughout both functions The changes prevent runtime panics that could occur when: - A chunk's content hash doesn't match any expected hash in the data map - Chunks are missing or corrupted - Invalid chunk indices are encountered This is a breaking change for `decrypt_range()` as the `relative_pos` parameter has been renamed to `file_pos` to better reflect that it represents a position within the complete file rather than within the first chunk. Testing: - Existing tests pass - Added error cases are properly handled - API documentation is complete and accurate
@@ -8,63 +8,42 @@ | |||
|
|||
use crate::{encryption, get_pad_key_and_iv, xor, EncryptedChunk, Error, Result}; | |||
use bytes::Bytes; | |||
use itertools::Itertools; | |||
use rayon::prelude::*; | |||
use std::io::Cursor; | |||
use xor_name::XorName; | |||
|
|||
pub fn decrypt(src_hashes: Vec<XorName>, encrypted_chunks: &[&EncryptedChunk]) -> Result<Bytes> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better rename encrypted_chunks
to sorted_encrypted_chunks
, or put comment on top of the function to highlight the input shall be sorted
.
to avoid future misuse.
Description (BREAKING CHANGE:)
This PR introduces hierarchical data maps and flexible storage backends to the self_encryption library, along with comprehensive Python bindings and documentation updates.
Key Changes
Core Functionality
shrink_data_map
function for hierarchical data map reductionget_root_data_map
function for recursive data map expansiondecrypt_from_storage
with generic storage interfaceDataMap
struct to support child levelsPython Bindings
shrink_data_map
andget_root_data_map
Python functionsPyDataMap
class to support serialization/deserializationDocumentation
Testing
Impact
Testing Done
Documentation
Related Issues
Closes #XXX (replace with actual issue number)
Checklist