What's Changed
This release introduces 3 major changes:
- Introducing torchdata.nodes, a library of extensible and composable iterators that lets you chain together common dataloading and pre-processing operations. This initial release includes the following features, with more on the way:
  - Multi-threaded parallelism and experimental support for Free-Threaded (No-GIL) Python, in addition to the typical multi-process parallelism.
    - Note: Free-Threaded (FT) Python support is experimental, requires Python 3.13t and torch>=2.5.0, and is currently only tested on Linux.
  - Multi-dataset weighted sampling.
  - State management through state_dict/load_state_dict methods.
  - Near feature parity with torch.utils.data.DataLoader, with full support for existing torch.utils.data.Dataset (IterableDataset and persistent_workers coming soon!).
  - Refer to the torchdata.nodes docs for more details.
- This release drops support for DataPipes and DataLoader2. Release v0.9 was the last stable release that includes them. Please see this issue for more details.
- PyTorch's official conda channel is deprecated, and TorchData has removed its conda builds as well. TorchData remains available for installation through pip, from PyPI and download.pytorch.org.
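
To make the "composable iterators" and state_dict/load_state_dict ideas above concrete, here is a minimal stdlib-only sketch. The classes RangeSource, MapStep, and BatchStep are hypothetical stand-ins written for this illustration; they are not the torchdata.nodes API, and the real node classes and signatures should be taken from the torchdata.nodes docs.

```python
from typing import Any, Callable, Dict, Iterator, List


class RangeSource:
    """Hypothetical source step: yields 0..n-1 and remembers its position."""

    def __init__(self, n: int) -> None:
        self.n = n
        self._next = 0

    def __iter__(self) -> Iterator[int]:
        return self

    def __next__(self) -> int:
        if self._next >= self.n:
            raise StopIteration
        value = self._next
        self._next += 1
        return value

    def state_dict(self) -> Dict[str, Any]:
        return {"next": self._next}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self._next = state["next"]


class MapStep:
    """Hypothetical step that applies fn to each item of its source."""

    def __init__(self, source: Any, fn: Callable[[Any], Any]) -> None:
        self.source = source
        self.fn = fn

    def __iter__(self) -> "MapStep":
        return self

    def __next__(self) -> Any:
        return self.fn(next(self.source))

    def state_dict(self) -> Dict[str, Any]:
        # This step is stateless, so it only forwards the upstream state.
        return {"source": self.source.state_dict()}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.source.load_state_dict(state["source"])


class BatchStep:
    """Hypothetical step that groups items into lists of up to batch_size."""

    def __init__(self, source: Any, batch_size: int) -> None:
        self.source = source
        self.batch_size = batch_size

    def __iter__(self) -> "BatchStep":
        return self

    def __next__(self) -> List[Any]:
        batch: List[Any] = []
        for _ in range(self.batch_size):
            try:
                batch.append(next(self.source))
            except StopIteration:
                break
        if not batch:
            raise StopIteration
        return batch

    def state_dict(self) -> Dict[str, Any]:
        return {"source": self.source.state_dict()}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.source.load_state_dict(state["source"])


def build_pipeline() -> BatchStep:
    # Chain source -> map -> batch, each step wrapping the previous one.
    return BatchStep(MapStep(RangeSource(10), lambda x: x * x), batch_size=4)


pipeline = build_pipeline()
first = next(pipeline)               # consume one batch
checkpoint = pipeline.state_dict()   # snapshot taken between batches
rest = list(pipeline)                # finish the epoch

resumed = build_pipeline()
resumed.load_state_dict(checkpoint)  # restore into a fresh pipeline
resumed_rest = list(resumed)         # continues after the first batch
```

In this sketch each step nests its upstream state inside its own, so a single state_dict call on the outermost step captures the whole chain. The snapshot is taken between batches, which avoids having to save a partially filled batch; a production loader would also need to handle in-flight items held by worker threads or processes.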
Full Changelog: v0.9.0...v0.10.1