Adding parameters as a concept to stream/hal/tooling. (iree-org#15104)
Parameters are externalized storage for resources that are asynchronously accessible and device-aware. Parameters can be read or written on the same device timelines as the operations that consume or produce them, with locality pinning to ensure memory doesn't need to move. Parameters are referenced by an optional scope (a file name, a model name, etc.) and a key that is unique within that scope. Providing a scope is strongly recommended: it distinguishes sets of parameters that would otherwise collide when multiple model parts are compiled together.

Parameters are provided to programs by a virtual interface that can support shared parameters (the same storage used in multiple contexts, or outliving a single instantiation in a context), in-memory caches, memory-mapped files (including using the mapped memory directly for execution when devices support it), `iree_hal_file_t` usage for device-supported I/O, and parameter subsetting for things like runtime sharding. A basic file cache is implemented so that programs can decide when and where they want to use parameters without needing to bind them to devices at startup time.

Alongside read (+load) and write operations, gather and scatter operations allow large numbers of reads and writes to be batched into/out of single buffers. For parameter providers that can batch operations, this lets a handful (~1-4) of calls perform many more operations (~thousands). Modeling gather/scatter explicitly also gives us a point where we could extract the mapping and use it to repack files/defragment memory in the future.

Parameters are currently defined by the `#stream.parameter.named` attribute, which specifies an optional parameter scope and a scope-unique key for the parameter along with its logical type. Today these are intended to be used as default values on global ops (mutable or immutable) and are hackily processed as if they were constants.
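As a rough illustration of why gather batching helps, the sketch below models a provider that resolves each (scope, key) pair against an in-memory index and services a whole list of spans into one target buffer with a single call. `ParameterIndex`, `GatherSpan`, and `gather` are hypothetical names invented for this sketch, not IREE's API, and a real provider would issue asynchronous, device-ordered I/O rather than synchronous copies.

```python
from __future__ import annotations

from dataclasses import dataclass


class ParameterIndex:
    """Hypothetical in-memory provider keyed by (scope, key); not IREE's API."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], bytes] = {}

    def add(self, scope: str, key: str, data: bytes) -> None:
        # The same key may exist in different scopes without colliding.
        self._entries[(scope, key)] = data

    def read(self, scope: str, key: str, offset: int, length: int) -> bytes:
        data = self._entries[(scope, key)]
        if offset < 0 or offset + length > len(data):
            raise ValueError(f"range out of bounds for {scope}::{key}")
        return data[offset:offset + length]


@dataclass
class GatherSpan:
    """One parameter subrange to copy into the target buffer (hypothetical)."""
    scope: str
    key: str
    source_offset: int
    target_offset: int
    length: int


def gather(index: ParameterIndex, spans: list[GatherSpan], target: bytearray) -> None:
    """Services many reads with one call by packing them into a single buffer."""
    for span in spans:
        chunk = index.read(span.scope, span.key, span.source_offset, span.length)
        target[span.target_offset:span.target_offset + span.length] = chunk


# Two parameters sharing a key name in different scopes do not collide.
index = ParameterIndex()
index.add("mnist", "layer-1.kernel", bytes(range(16)))
index.add("other", "layer-1.kernel", b"\xff" * 16)

packed = bytearray(24)
gather(index, [
    GatherSpan("mnist", "layer-1.kernel", source_offset=0, target_offset=0, length=16),
    GatherSpan("other", "layer-1.kernel", source_offset=8, target_offset=16, length=8),
], packed)
```

A provider that can batch natively would replace the per-span loop with one vectored or asynchronous request; having a single entry point is what lets a handful of calls cover thousands of individual copies.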
Future changes will allow parameter mutation and storage, but what's present should be enough for inference and for training parameter initialization.

Example parameter (here a tensor, but parameters may be other types in the future to act as bags of bits):

```mlir
util.global private @"model.layer-1.kernel" = #stream.parameter.named<"mnist"::"model.layer-1.kernel"> : tensor<784x128xf32>
```

Parameters can optionally specify a subrange indicating that the logical tensor is a block of some larger storage. When sharding, this can be used to have each shard load only a subset of the parameter data:

```mlir
util.global private @"model.layer-1.kernel-shard-0" = #stream.parameter.named<"mnist"::"model.layer-1.kernel", {offset = 0}> : tensor<392x128xf32>
util.global private @"model.layer-1.kernel-shard-1" = #stream.parameter.named<"mnist"::"model.layer-1.kernel", {offset = 200704}> : tensor<392x128xf32>
```

In this initial implementation we err on the side of optimizing for discrete memory devices (GPUs, etc.) by emitting gathers of all parameters. On unified memory systems, where parameters could be imported zero-copy into device memory, this is wasteful, but it ensures proper alignment/packing and limited runtime overhead. Setting the resource memory model to `unified` via the `#stream.resource_config` attribute (helper flag `--iree-stream-resource-memory-model=unified`) will instead alias parameter memory where possible, at the cost of increased runtime overhead. Future changes will connect the resource memory model to those of the devices under compilation and allow heterogeneous deployments to treat parameters used exclusively on different devices in whatever way is best for each device.
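The `offset` values in the sharding example follow from simple arithmetic, assuming the offset is measured in bytes and the tensor is stored densely in row-major order (both assumptions, though consistent with the numbers shown): splitting `tensor<784x128xf32>` evenly along its first dimension gives 392 * 128 * 4 = 200704 bytes per shard. The hypothetical helper below computes each shard's byte offset and length under those assumptions.

```python
from __future__ import annotations

from math import prod


def shard_byte_ranges(shape: tuple[int, ...], element_size: int,
                      num_shards: int) -> list[tuple[int, int]]:
    """Returns (byte_offset, byte_length) for each shard of an even split of dim 0.

    Assumes a dense row-major layout, so each shard is one contiguous block.
    """
    if shape[0] % num_shards != 0:
        raise ValueError("dim 0 must divide evenly across shards")
    rows_per_shard = shape[0] // num_shards
    row_bytes = prod(shape[1:]) * element_size  # bytes in one dim-0 slice
    shard_bytes = rows_per_shard * row_bytes
    return [(i * shard_bytes, shard_bytes) for i in range(num_shards)]


# tensor<784x128xf32> split into two tensor<392x128xf32> shards; f32 is 4 bytes.
print(shard_byte_ranges((784, 128), element_size=4, num_shards=2))
# prints [(0, 200704), (200704, 200704)]
```

Splitting along a non-leading dimension would not produce contiguous blocks in a row-major layout, which is one reason a single `offset` per shard fits the leading-dimension case shown above.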
Basic tooling support for read-only parameters has been added for testing by allowing parameter files to be specified on the command line:

```
$ iree-run-module \
    --parameter_mode=mmap \
    --parameters=some_scope=some/file0.safetensors \
    --parameters=other_scope=some/file1.gguf \
    --module=...
```

Currently parameters are only usable from the full HAL implementation and not the inline HAL. The parameter file format and index code were kept portable so that they could be reused for a lighter-weight feature set if we wanted to support parameters in the inline HAL. However, the cases where the inline HAL is interesting are usually small models on tiny systems, where optimization of parameters is critical to memory/performance, so I haven't bothered here.

Since all parameter file formats are terrible, a new parameter file format that is less terrible for our uses will be introduced in future changes. It's still experimental and not fully wired up, but it will be something we can convert other formats into for optimized use as both immutable constant and mutable variable storage in our tools, when direct compatibility with existing frameworks (without conversion steps) is not required.

The `iree-dump-parameters` tool can be used to inspect any of the parameter file formats the tooling can load and to extract individual parameters. It indexes parameters using the same flags as the rest of the tooling, so it is also useful for seeing which parameters are actually available without trial and error in other tools.
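Judging from the invocations in this description, `--parameters` takes `[scope=]path` (omitting the scope yields the empty scope) and `--extract` takes `[scope::]key=output`, mirroring the `"scope"::"key"` form used by `#stream.parameter.named`. The parser below is a hypothetical sketch of that syntax only; splitting on the first `=` is an assumption, and IREE's actual flag handling may differ (for example, for paths that contain `=`).

```python
from __future__ import annotations

from typing import NamedTuple


class ParameterSource(NamedTuple):
    scope: str  # "" when the flag value has no `scope=` prefix
    path: str


class ExtractRequest(NamedTuple):
    scope: str  # "" when the key has no `scope::` prefix
    key: str
    output_path: str


def parse_parameters_flag(value: str) -> ParameterSource:
    """Parses a hypothetical `[scope=]path` value, splitting on the first '='."""
    scope, sep, path = value.partition("=")
    if not sep:
        return ParameterSource(scope="", path=value)
    return ParameterSource(scope=scope, path=path)


def parse_extract_flag(value: str) -> ExtractRequest:
    """Parses a hypothetical `[scope::]key=output_path` value."""
    name, sep, output_path = value.partition("=")
    if not sep or not name or not output_path:
        raise ValueError(f"expected [scope::]key=path, got {value!r}")
    scope, sep, key = name.partition("::")
    if not sep:
        scope, key = "", name
    return ExtractRequest(scope=scope, key=key, output_path=output_path)


print(parse_parameters_flag("a=tools/test/parameters_a.safetensors"))
print(parse_extract_flag("a::a0=a0.bin"))
print(parse_extract_flag("tensor0=tensor0.bin"))
```

Keeping the scope optional in both flags means a single unscoped file (such as the GGUF file in the example output) can be addressed by bare key, while scoped files need the `scope::` prefix to disambiguate.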
Example output:

```
$ ../iree-build/tools/iree-dump-parameters.exe --parameters=a=tools/test/parameters_a.safetensors --parameters=runtime/src/iree/io/formats/gguf/testdata/multiple.gguf --extract=a::a0=a0.bin --extract=tensor0=tensor0.bin
//===--------------------------------------------------------------------------------------------------------------===//
// Parameter scope `a` (2 entries, 64 total bytes)
//===------------+------------------+------------------+-----------------------------------------------------------===//
//         Start |              End |           Length | Key
//---------------+------------------+------------------+--------------------------------------------------------------//
             120 |              152 |               32 | `a0`
             152 |              184 |               32 | `a1`
//===--------------------------------------------------------------------------------------------------------------===//
// Parameter scope `` (3 entries, 72 total bytes)
//===------------+------------------+------------------+-----------------------------------------------------------===//
//         Start |              End |           Length | Key
//---------------+------------------+------------------+--------------------------------------------------------------//
             448 |              464 |               16 | `tensor0`
             512 |              520 |                8 | `tensor1`
             576 |              624 |               48 | `tensor2`

Extracting parameter `a::a0` (32b) to `a0.bin`...
Extracting parameter `tensor0` (16b) to `tensor0.bin`...
```

Progress on iree-org#14987.

---------

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>