Adding parameters as a concept to stream/hal/tooling. (iree-org#15104)
Parameters are externalized storage for resources that are
asynchronously accessible and device-aware. Parameters can be read or
written on the same device timelines as the operations that consume or
produce them and with locality pinning to ensure memory doesn't need to
move. Parameters are referenced by an optional scope (a file name, a
model name, whatever) and a unique key within that scope, with the scope
being strongly recommended as a way to distinguish sets of parameters
that may exist when multiple model parts are compiled together and would
otherwise collide.
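
As a rough illustration of why scopes matter, scope/key addressing can be pictured as a two-level lookup. This is only a sketch and not the IREE implementation: `ParameterIndex` and its methods are hypothetical names invented for this example.

```python
from typing import Dict, Tuple


class ParameterIndex:
    """Hypothetical illustration: maps (scope, key) to raw parameter bytes."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], bytes] = {}

    def add(self, scope: str, key: str, data: bytes) -> None:
        # Keys only need to be unique within their own scope.
        if (scope, key) in self._entries:
            raise KeyError(f"duplicate parameter {scope!r}::{key!r}")
        self._entries[(scope, key)] = data

    def lookup(self, scope: str, key: str) -> bytes:
        return self._entries[(scope, key)]


index = ParameterIndex()
# Two model parts reuse the key "weight"; distinct scopes keep them apart.
index.add("encoder", "weight", b"\x01\x02")
index.add("decoder", "weight", b"\x03\x04")
```

Without the scope, the second `add` would collide with the first, which is the situation the scope recommendation is meant to avoid.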

Parameters are provided to programs by a virtual interface and can
support shared parameters (same storage used in multiple contexts, or
outliving a single instantiation in a context), in-memory caches,
memory-mapped files (including directly using the mapped memory for
execution when devices support it), iree_hal_file_t usage for
device-supported I/O, and parameter subsetting for things like runtime
sharding. A basic file cache is implemented to allow for programs to
decide when and where they want to use the parameters without needing to
have bound them to devices at startup time.

Alongside read(+load) and write operations, gather and scatter allow for
batching of large numbers of reads and writes into/from single buffers.
For parameter providers that can batch operations this allows for a
handful (~1-4) of calls out to perform many more operations
(~thousands). Modeling the gather/scatter also gives us a point where we
could extract the mapping and use it to repack files/defrag memory in
the future.
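
To make the batching idea concrete, here is a minimal Python sketch of a gather: one call copies many source ranges into a single target buffer. The `(source_offset, target_offset, length)` span layout and the `gather` helper are assumptions made for illustration and do not reflect the actual stream/HAL op signatures.

```python
from typing import List, Tuple

# (source_offset, target_offset, length) in bytes; hypothetical layout.
Span = Tuple[int, int, int]


def gather(source: bytes, target: bytearray, spans: List[Span]) -> None:
    """Copies many source ranges into one target buffer in a single call."""
    for source_offset, target_offset, length in spans:
        if source_offset + length > len(source):
            raise ValueError("span reads past the end of the source")
        if target_offset + length > len(target):
            raise ValueError("span writes past the end of the target")
        target[target_offset:target_offset + length] = (
            source[source_offset:source_offset + length]
        )


# A fake parameter file with padding between two 4-byte parameters.
parameter_file = b"AAAA....BBBB...."
packed = bytearray(8)
gather(parameter_file, packed, [(0, 0, 4), (8, 4, 4)])
```

The span list is the kind of mapping that could later be extracted to repack files, since it records exactly which source ranges end up where.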

Parameters are currently defined by the `#stream.parameter.named`
attribute which specifies an optional parameter scope and a scope-unique
key for the parameter along with its logical type. Today these are
intended to be used as default values on global ops (mutable or
immutable) and are hackily processed as if they were constants. Future
changes will allow parameter mutation and storage but what's present
should be enough for inference and training parameter initialization.

Example parameter (here a tensor but parameters can be other types in
the future to act as bags of bits):
```mlir
util.global private @"model.layer-1.kernel" = #stream.parameter.named<"mnist"::"model.layer-1.kernel"> : tensor<784x128xf32>
```

Parameters can optionally have a subrange specified indicating that the
logical tensor is a block of some larger storage. When sharding this can
be used to have an individual shard load a subset of the parameter data:
```mlir
util.global private @"model.layer-1.kernel-shard-0" = #stream.parameter.named<"mnist"::"model.layer-1.kernel", {offset = 0}> : tensor<392x128xf32>
util.global private @"model.layer-1.kernel-shard-1" = #stream.parameter.named<"mnist"::"model.layer-1.kernel", {offset = 200704}> : tensor<392x128xf32>
```
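
The `offset = 200704` above is consistent with a byte offset for equally sized, contiguous shards split along the outermost dimension. A small Python sketch of that arithmetic, under those assumptions (`shard_byte_offset` is a hypothetical helper, not part of IREE):

```python
import math
from typing import Tuple


def shard_byte_offset(
    shard_index: int, shard_shape: Tuple[int, ...], element_size: int
) -> int:
    """Byte offset of a shard, assuming equal contiguous shards."""
    return shard_index * math.prod(shard_shape) * element_size


F32_SIZE = 4  # bytes per f32 element
# Two 392x128xf32 shards of the 784x128xf32 parameter.
offsets = [shard_byte_offset(i, (392, 128), F32_SIZE) for i in range(2)]
```

Here `offsets` evaluates to `[0, 200704]`, matching the two shard globals in the example.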

In this initial implementation we err on the side of optimizing for
discrete memory devices (GPUs/etc) by emitting gathers of all
parameters. On unified memory systems where we can zero-copy import
parameters into device memory this is wasteful but it ensures proper
alignment/packing/limited runtime overheads. Setting the resource memory
model to `unified` via the `#stream.resource_config` attribute (helper
flag `--iree-stream-resource-memory-model=unified`) will instead switch
to aliasing parameter memory where possible at the cost of increased
runtime overhead. Future changes will connect the resource memory model
to those of the devices under compilation and allow for heterogeneous
deployments to treat parameters used exclusively on different devices in
whatever way is best for that device.

Basic tooling support for read-only parameters has been added for
testing by allowing parameter files to be specified on the command line:
```
$ iree-run-module \
    --parameter_mode=mmap \
    --parameters=some_scope=some/file0.safetensors \
    --parameters=other_scope=some/file1.gguf \
    --module=...
```

Currently parameters are only usable from the full HAL implementation
and not the inline HAL. The parameter file format and index code were
kept portable so that they could be reused for a lighter-weight feature
set if we wanted to support parameters in the inline HAL, but given that
cases where the inline HAL is interesting are usually small models on
tiny systems where optimization of parameters is critical to
memory/performance, I haven't bothered here.

Since all parameter file formats are terrible, a new parameter file
format that is less terrible for our uses will be introduced in future
changes. It's still experimental and not fully wired up, but it will be
something we can convert other formats into for optimized use as both
immutable constant and mutable variable storage in our tools when direct
compatibility with existing frameworks (without conversion steps) is
not required.

The `iree-dump-parameters` tool can be used to inspect any of the
parameter file formats the tooling can load and extract individual
parameters. It indexes parameters using the same flags as the rest of
the tooling so it can also be useful to see what parameters are actually
available for use without trial and error in other tools. Example
output:
```
$ ../iree-build/tools/iree-dump-parameters.exe --parameters=a=tools/test/parameters_a.safetensors --parameters=runtime/src/iree/io/formats/gguf/testdata/multiple.gguf --extract=a::a0=a0.bin --extract=tensor0=tensor0.bin
//===--------------------------------------------------------------------------------------------------------------===//
// Parameter scope `a` (2 entries, 64 total bytes)
//===------------+------------------+------------------+-----------------------------------------------------------===//
//         Start |              End |           Length | Key
//---------------+------------------+------------------+--------------------------------------------------------------//
             120 |              152 |               32 | `a0`
             152 |              184 |               32 | `a1`

//===--------------------------------------------------------------------------------------------------------------===//
// Parameter scope `` (3 entries, 72 total bytes)
//===------------+------------------+------------------+-----------------------------------------------------------===//
//         Start |              End |           Length | Key
//---------------+------------------+------------------+--------------------------------------------------------------//
             448 |              464 |               16 | `tensor0`
             512 |              520 |                8 | `tensor1`
             576 |              624 |               48 | `tensor2`

Extracting parameter `a::a0` (32b) to `a0.bin`...
Extracting parameter `tensor0` (16b) to `tensor0.bin`...
```
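
If the dump needs to be consumed from a script, the table rows can be parsed with a regular expression. This is a sketch that assumes the layout shown in the example output above stays as-is; `Entry`, `ROW`, and `parse_rows` are hypothetical names and nothing here is provided by the tool.

```python
import re
from typing import List, NamedTuple


class Entry(NamedTuple):
    start: int
    end: int
    length: int
    key: str


# Matches rows such as:  448 | 464 | 16 | `tensor0`
ROW = re.compile(r"^\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*`(.*)`\s*$")


def parse_rows(dump_text: str) -> List[Entry]:
    """Extracts table rows; header and separator lines are skipped."""
    entries = []
    for line in dump_text.splitlines():
        match = ROW.match(line)
        if match:
            start, end, length, key = match.groups()
            entries.append(Entry(int(start), int(end), int(length), key))
    return entries


sample = """\
//         Start |              End |           Length | Key
             448 |              464 |               16 | `tensor0`
             512 |              520 |                8 | `tensor1`
             576 |              624 |               48 | `tensor2`
"""
entries = parse_rows(sample)
```

Summing `length` over `entries` gives 72, which agrees with the total reported in the scope header of the example.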

Progress on iree-org#14987.

---------

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
benvanik and stellaraccident authored Nov 3, 2023
1 parent 988f7c5 commit 11ced0c
Showing 134 changed files with 9,389 additions and 533 deletions.
@@ -21,6 +21,7 @@ iree_compiler_cc_library(
"Patterns.h",
],
deps = [
":Utils",
"//compiler/src/iree/compiler/Dialect/HAL/Conversion",
"//compiler/src/iree/compiler/Dialect/HAL/IR",
"//compiler/src/iree/compiler/Dialect/HAL/IR:HALDialect",
@@ -36,3 +37,24 @@ iree_compiler_cc_library(
"@llvm-project//mlir:Transforms",
],
)

iree_compiler_cc_library(
name = "Utils",
srcs = [
"Utils.cpp",
],
hdrs = [
"Utils.h",
],
deps = [
"//compiler/src/iree/compiler/Dialect/HAL/IR",
"//compiler/src/iree/compiler/Dialect/HAL/IR:HALDialect",
"//compiler/src/iree/compiler/Dialect/HAL/Target",
"//compiler/src/iree/compiler/Dialect/Stream/IR",
"//compiler/src/iree/compiler/Dialect/Util/IR",
"@llvm-project//llvm:Support",
"@llvm-project//mlir:ArithDialect",
"@llvm-project//mlir:IR",
"@llvm-project//mlir:SCFDialect",
],
)
@@ -18,6 +18,7 @@ iree_cc_library(
SRCS
"Patterns.cpp"
DEPS
::Utils
LLVMSupport
MLIRArithDialect
MLIRFuncDialect
@@ -34,4 +35,24 @@ iree_cc_library(
PUBLIC
)

iree_cc_library(
NAME
Utils
HDRS
"Utils.h"
SRCS
"Utils.cpp"
DEPS
LLVMSupport
MLIRArithDialect
MLIRIR
MLIRSCFDialect
iree::compiler::Dialect::HAL::IR
iree::compiler::Dialect::HAL::IR::HALDialect
iree::compiler::Dialect::HAL::Target
iree::compiler::Dialect::Stream::IR
iree::compiler::Dialect::Util::IR
PUBLIC
)

### BAZEL_TO_CMAKE_PRESERVES_ALL_CONTENT_BELOW_THIS_LINE ###