tutorials (#375)

* doc fine tune

* add example for ddp, edit c++ example

* 1st review

* corrected package name in installation guide

* add model zoo to examples page

* update int8 doc (#377)

* update int8 doc

* version 2

* modify optimizers optimization (#378)

* review 20211130

* add INT8 fusion patterns and API in graph_optimization (#380)

* add INT8 fusion patterns and API in graph_optimization

* add integration with oneDNN graph

* Add BN folding for graph_optimization

* tutorials for v1.10.0

* int8.md fine tune

* finalized for v1.10.0 release

Co-authored-by: XiaobingZhang <xiaobing.zhang@intel.com>
Co-authored-by: zhuhaozhe <haozhe.zhu@intel.com>
Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>
4 people authored and EikanWang committed Dec 1, 2021
1 parent 11dbc83 commit 9318fae
Showing 25 changed files with 1,132 additions and 582 deletions.
62 changes: 31 additions & 31 deletions .github/workflows/publish.yml
@@ -1,33 +1,33 @@
name: Publish

on:
  push:
    branches:
      - ghpapers_style

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v1
    - name: Install dependencies
      run: |
        export PATH="$HOME/.local/bin:$PATH"
        sudo apt-get install -y python3-setuptools
        pip3 install --user --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
        pip3 install --user -r requirements.txt
        python3 setup.py install
        pip3 install --user -r docs/requirements.txt
    - name: Build the docs
      run: |
        export PATH="$HOME/.local/bin:$PATH"
        cd docs
        make html
    - name: Push the docs
      uses: peaceiris/actions-gh-pages@v3
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        publish_dir: docs/_build/html
        publish_branch: gh-pages
#on:
#  push:
#    branches:
#      - gh-pages
#
#jobs:
#  build:
#
#    runs-on: ubuntu-latest
#
#    steps:
#    - uses: actions/checkout@v1
#    - name: Install dependencies
#      run: |
#        export PATH="$HOME/.local/bin:$PATH"
#        sudo apt-get install -y python3-setuptools
#        pip3 install --user torch==1.10.0+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
#        pip3 install --user -r requirements.txt
#        python3 setup.py install
#        pip3 install --user -r docs/requirements.txt
#    - name: Build the docs
#      run: |
#        export PATH="$HOME/.local/bin:$PATH"
#        cd docs
#        make html
#    - name: Push the docs
#      uses: peaceiris/actions-gh-pages@v3
#      with:
#        github_token: ${{ secrets.GITHUB_TOKEN }}
#        publish_dir: docs/_build/html
#        publish_branch: gh-pages
379 changes: 21 additions & 358 deletions README.md

Large diffs are not rendered by default.

5 changes: 2 additions & 3 deletions docs/index.rst
@@ -8,7 +8,7 @@ Welcome to Intel® Extension for PyTorch* documentation!

Intel® Extension for PyTorch* extends PyTorch with optimizations for extra performance boost on Intel hardware. Most of the optimizations will be included in stock PyTorch releases eventually, and the intention of the extension is to deliver up-to-date features and optimizations for PyTorch on Intel hardware, examples include AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX).

Intel® Extension for PyTorch* is structured as the following figure. It is a runtime extension. Users can enable it dynamically in script by importing `intel_extension_for_pytorch`. It covers optimizations for both imperative mode and graph mode. Optimized operators and kernels are registered through PyTorch dispatching mechanism. These operators and kernels are accelerated from native vectorization feature and matrix calculation feature of Intel hardware. During execution, Intel® Extension for PyTorch* intercepts invocation of ATen operators, and replace the original ones with these optimized ones. In graph mode, further operator fusions are applied manually by Intel engineers or through a tool named *oneDNN Graph* to reduce operator/kernel invocation overheads, and thus increase performance.
Intel® Extension for PyTorch* is structured as shown in the following figure. It is loaded as a Python module for Python programs or linked as a C++ library for C++ programs. Users can enable it dynamically in their script by importing `intel_extension_for_pytorch`. It covers optimizations for both imperative mode and graph mode. Optimized operators and kernels are registered through the PyTorch dispatching mechanism. These operators and kernels are accelerated by the native vectorization and matrix calculation features of Intel hardware. During execution, Intel® Extension for PyTorch* intercepts invocations of ATen operators and replaces the original ones with the optimized versions. In graph mode, further operator fusions are applied, either manually by Intel engineers or through a tool named *oneDNN Graph*, to reduce operator/kernel invocation overhead and thus improve performance.
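
As a minimal sketch of what this looks like in practice (the ResNet-50 model and input shape below are placeholders chosen for illustration, not part of the documentation source):

```
import torch
import torchvision
import intel_extension_for_pytorch as ipex  # importing the module enables the extension

model = torchvision.models.resnet50().eval()
data = torch.rand(1, 3, 224, 224)

# Imperative mode: apply operator/kernel optimizations to the model.
model = ipex.optimize(model)

# Graph mode: convert to TorchScript so graph-level fusions can be applied.
with torch.no_grad():
    traced = torch.jit.trace(model, data)
    traced = torch.jit.freeze(traced)
    traced(data)
```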

.. image:: ../images/intel_extension_for_pytorch_structure.png
:width: 800
@@ -24,8 +24,7 @@ Intel® Extension for PyTorch* has been released as an open-source project at
:maxdepth: 1

tutorials/features
tutorials/notices
tutorials/release_notes
tutorials/releases
tutorials/installation
tutorials/examples
tutorials/api_doc
2 changes: 2 additions & 0 deletions docs/tutorials/blogs_publications.md
@@ -3,4 +3,6 @@ Blogs & Publications

* [Intel and Facebook Accelerate PyTorch Performance with 3rd Gen Intel® Xeon® Processors and Intel® Deep Learning Boost’s new BFloat16 capability](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-facebook-boost-bfloat16.html)
* [Accelerate PyTorch with the extension and oneDNN using Intel BF16 Technology](https://medium.com/pytorch/accelerate-pytorch-with-ipex-and-onednn-using-intel-bf16-technology-dca5b8e6b58f)
* *Note*: The APIs mentioned in this article are deprecated.
* [Scaling up BERT-like model Inference on modern CPU - Part 1 by the launcher of the extension](https://huggingface.co/blog/bert-cpu-scaling-part-1)
* [KT Optimizes Performance for Personalized Text-to-Speech](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/KT-Optimizes-Performance-for-Personalized-Text-to-Speech/post/1337757)
9 changes: 0 additions & 9 deletions docs/tutorials/contribution.md
@@ -93,15 +93,6 @@ In case you want to reinstall, make sure that you uninstall Intel® Extension fo
ENV_KEY1=ENV_VAL1[, ENV_KEY2=ENV_VAL2]* python setup.py develop
```

## Codebase structure

* [torch_ipex/csrc](https://github.com/intel/intel-extension-for-pytorch/tree/master/torch_ipex/csrc) - C++ library for Intel® Extension for PyTorch\*
* [intel_extension_for_pytorch](https://github.com/intel/intel-extension-for-pytorch/tree/master/intel_extension_for_pytorch) - The actual Intel® Extension for PyTorch\* library. Everything that is not in [csrc](https://github.com/intel/intel-extension-for-pytorch/tree/master/torch_ipex/csrc) is a Python module, following the PyTorch Python frontend module structure.
* [tools](https://github.com/intel/intel-extension-for-pytorch/tree/master/tools) -
* [tests](https://github.com/intel/intel-extension-for-pytorch/tree/master/tests) - Python unit tests for Intel® Extension for PyTorch\* Python frontend.
* [cpu](https://github.com/intel/intel-extension-for-pytorch/tree/master/tests/cpu) -
* [cpp](https://github.com/intel/intel-extension-for-pytorch/tree/master/tests/cpu/cpp) - C++ unit tests for Intel® Extension for PyTorch\* C++ frontend.

## Unit testing

### Python Unit Testing
166 changes: 143 additions & 23 deletions docs/tutorials/examples.md
@@ -30,7 +30,6 @@ output = model(data)

#### Complete - Float32


```
import torch
import torchvision
@@ -128,7 +127,69 @@ torch.save({

### Distributed Training

Distributed training with PyTorch DDP is accelerated by oneAPI Collective Communications Library Bindings for Pytorch\* (oneCCL Bindings for Pytorch\*). More detailed information and examples are available at its [Github repo](https://github.com/intel/torch-ccl).
Distributed training with PyTorch DDP is accelerated by oneAPI Collective Communications Library Bindings for PyTorch\* (oneCCL Bindings for PyTorch\*). The extension supports FP32 and BF16 data types. More detailed information and examples are available at its [GitHub repo](https://github.com/intel/torch-ccl).

**Note:** When performing distributed training with the BF16 data type, please use oneCCL Bindings for PyTorch\*. Due to a PyTorch limitation, distributed training with the BF16 data type is not supported with Intel® Extension for PyTorch\* alone.

```
import os
import torch
import torch.distributed as dist
import torchvision
import torch_ccl
import intel_extension_for_pytorch as ipex

LR = 0.001
DOWNLOAD = True
DATA = 'datasets/cifar10/'

transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_dataset = torchvision.datasets.CIFAR10(
        root=DATA,
        train=True,
        transform=transform,
        download=DOWNLOAD,
)
train_loader = torch.utils.data.DataLoader(
        dataset=train_dataset,
        batch_size=128
)

# Environment variables typically set by the MPI/PMI launcher, with single-process defaults.
os.environ['MASTER_ADDR'] = '127.0.0.1'
os.environ['MASTER_PORT'] = '29500'
os.environ['RANK'] = str(os.environ.get('PMI_RANK', 0))
os.environ['WORLD_SIZE'] = str(os.environ.get('PMI_SIZE', 1))
dist.init_process_group(
    backend='ccl',
    init_method='env://'
)

model = torchvision.models.resnet50()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LR, momentum=0.9)
model.train()
model, optimizer = ipex.optimize(model, optimizer=optimizer)
model = torch.nn.parallel.DistributedDataParallel(model)

for batch_idx, (data, target) in enumerate(train_loader):
    optimizer.zero_grad()
    # Setting memory_format to torch.channels_last could improve performance with 4D input data. This is optional.
    data = data.to(memory_format=torch.channels_last)
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    print('batch_id: {}'.format(batch_idx))

torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pth')
```
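
For BF16 distributed training mentioned in the note above, the following is a minimal sketch of where the BF16-specific pieces would go. It assumes the same single-node PMI launcher and oneCCL Bindings for PyTorch\* as the FP32 example, and feeds random tensors instead of the CIFAR-10 data loader to keep it short:

```
# Sketch only: BF16 variant of the distributed training example above.
import os
import torch
import torch.distributed as dist
import torchvision
import torch_ccl
import intel_extension_for_pytorch as ipex

os.environ['MASTER_ADDR'] = '127.0.0.1'
os.environ['MASTER_PORT'] = '29500'
os.environ['RANK'] = str(os.environ.get('PMI_RANK', 0))
os.environ['WORLD_SIZE'] = str(os.environ.get('PMI_SIZE', 1))
dist.init_process_group(backend='ccl', init_method='env://')

model = torchvision.models.resnet50()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()
# dtype=torch.bfloat16 prepares model and optimizer for BF16 training.
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)
model = torch.nn.parallel.DistributedDataParallel(model)

data = torch.rand(128, 3, 224, 224).to(memory_format=torch.channels_last)
target = torch.randint(1000, (128,))
optimizer.zero_grad()
with torch.cpu.amp.autocast():      # run the forward pass and loss in BF16
    output = model(data)
    loss = criterion(output, target)
loss.backward()                     # gradient allreduce goes through the ccl backend
optimizer.step()
```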

## Inference

@@ -148,7 +209,7 @@ data = torch.rand(1, 3, 224, 224)
import intel_extension_for_pytorch as ipex
model = model.to(memory_format=torch.channels_last)
model = ipex.optimize(model, dtype=torch.float32, level='O1')
model = ipex.optimize(model)
data = data.to(memory_format=torch.channels_last)
with torch.no_grad():
@@ -170,7 +231,7 @@ seq_length = 512
data = torch.randint(vocab_size, size=[batch_size, seq_length])
import intel_extension_for_pytorch as ipex
model = ipex.optimize(model, dtype=torch.float32, level="O1")
model = ipex.optimize(model)
with torch.no_grad():
    model(data)
@@ -190,7 +251,7 @@ data = torch.rand(1, 3, 224, 224)
import intel_extension_for_pytorch as ipex
model = model.to(memory_format=torch.channels_last)
model = ipex.optimize(model, dtype=torch.float32, level='O1')
model = ipex.optimize(model)
data = data.to(memory_format=torch.channels_last)
with torch.no_grad():
@@ -216,7 +277,7 @@ seq_length = 512
data = torch.randint(vocab_size, size=[batch_size, seq_length])
import intel_extension_for_pytorch as ipex
model = ipex.optimize(model, dtype=torch.float32, level="O1")
model = ipex.optimize(model)
with torch.no_grad():
    d = torch.randint(vocab_size, size=[batch_size, seq_length])
@@ -242,7 +303,7 @@ data = torch.rand(1, 3, 224, 224)
import intel_extension_for_pytorch as ipex
model = model.to(memory_format=torch.channels_last)
model = ipex.optimize(model, dtype=torch.bfloat16, level='O1')
model = ipex.optimize(model, dtype=torch.bfloat16)
data = data.to(memory_format=torch.channels_last)
with torch.no_grad():
@@ -265,7 +326,7 @@ seq_length = 512
data = torch.randint(vocab_size, size=[batch_size, seq_length])
import intel_extension_for_pytorch as ipex
model = ipex.optimize(model, dtype=torch.bfloat16, level="O1")
model = ipex.optimize(model, dtype=torch.bfloat16)
with torch.no_grad():
    with torch.cpu.amp.autocast():
@@ -286,7 +347,7 @@ data = torch.rand(1, 3, 224, 224)
import intel_extension_for_pytorch as ipex
model = model.to(memory_format=torch.channels_last)
model = ipex.optimize(model, dtype=torch.bfloat16, level='O1')
model = ipex.optimize(model, dtype=torch.bfloat16)
data = data.to(memory_format=torch.channels_last)
with torch.no_grad():
@@ -312,7 +373,7 @@ seq_length = 512
data = torch.randint(vocab_size, size=[batch_size, seq_length])
import intel_extension_for_pytorch as ipex
model = ipex.optimize(model, dtype=torch.bfloat16, level="O1")
model = ipex.optimize(model, dtype=torch.bfloat16)
with torch.no_grad():
    with torch.cpu.amp.autocast():
@@ -349,13 +410,12 @@ for d in calibration_data_loader():
    model(d)
conf.save('int8_conf.json', default_recipe=True)
model = ipex.quantization.convert(model, conf, torch.rand(<shape>))
with torch.no_grad():
    model(data)
```
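
The converted model can be serialized so that the deployment examples below can load it with `torch.jit.load`. A minimal sketch follows; it assumes `ipex.quantization.convert` returns a TorchScript module, and the file name `int8_model.pt` is illustrative only:

```
# Sketch only: 'model' is the converted INT8 module from the calibration example above.
# Assumes convert() returned a TorchScript module; 'int8_model.pt' is an illustrative name.
torch.jit.save(model, 'int8_model.pt')
# Deployment code can later reload it with torch.jit.load('int8_model.pt').
```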

#### Deployment

##### Imperative Mode

```
import torch
@@ -371,15 +431,31 @@ with torch.no_grad():
    model(data)
```

##### Graph Mode

```
import torch
import intel_extension_for_pytorch as ipex
model = torch.jit.load('<INT8 model file>')
model.eval()
data = torch.rand(<shape>)
with torch.no_grad():
    model(data)
```

## C++

To work with libtorch, the C++ library of PyTorch, Intel® Extension for PyTorch\* provides its own C++ dynamic library as well. The C++ library is intended to handle inference workloads only, such as service deployment. For regular development, please use the Python interface. Compared to regular libtorch usage, no specific code changes are required, except for converting input data into the channels-last data format. Compilation follows the recommended methodology with CMake. Detailed instructions can be found in the [PyTorch tutorial](https://pytorch.org/tutorials/advanced/cpp_export.html#depending-on-libtorch-and-building-the-application).

During compilation, Intel optimizations will be activated automatically once the C++ dynamic library of Intel® Extension for PyTorch\* is linked.

The example code below works for all data types.

**example-app.cpp**

```
```cpp
#include <torch/script.h>
#include <iostream>
#include <memory>
@@ -405,25 +481,69 @@ int main(int argc, const char* argv[]) {
**CMakeLists.txt**
```
```cmake
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(example-app)
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS} -Wl,--no-as-needed")
find_package(intel-ext-pt-cpu REQUIRED)
add_executable(example-app example-app.cpp)
# Link the binary against the C++ dynamic library file of Intel® Extension for PyTorch*
target_link_libraries(example-app "${TORCH_LIBRARIES}" "${INTEL_EXTENSION_FOR_PYTORCH_PATH}/lib/libintel-ext-pt-cpu.so")
target_link_libraries(example-app "${TORCH_LIBRARIES}")
set_property(TARGET example-app PROPERTY CXX_STANDARD 14)
```

**Note:** Since Intel® Extension for PyTorch\* is still under development, the name of the C++ dynamic library in the master branch may differ from *libintel-ext-pt-cpu.so* shown above. Please check the name in the installation folder. The *.so* file name starts with *libintel-*.

**Command for compilation**

```
$ cmake -DCMAKE_PREFIX_PATH=<LIBPYTORCH_PATH> -DINTEL_EXTENSION_FOR_PYTORCH_PATH=<INTEL_EXTENSION_FOR_PYTORCH_INSTALLATION_PATH> ..
```bash
$ cmake -DCMAKE_PREFIX_PATH=<LIBPYTORCH_PATH> ..
$ make
```

If *Found INTEL_EXT_PT_CPU* is shown as *TRUE*, the extension has been linked into the binary. This can be verified with the Linux command *ldd*.

```bash
$ cmake -DCMAKE_PREFIX_PATH=/workspace/libtorch ..
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found Torch: /workspace/libtorch/lib/libtorch.so
-- Found INTEL_EXT_PT_CPU: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /workspace/build

$ ldd example-app
...
libtorch.so => /workspace/libtorch/lib/libtorch.so (0x00007f3cf98e0000)
libc10.so => /workspace/libtorch/lib/libc10.so (0x00007f3cf985a000)
libintel-ext-pt-cpu.so => /workspace/libtorch/lib/libintel-ext-pt-cpu.so (0x00007f3cf70fc000)
libtorch_cpu.so => /workspace/libtorch/lib/libtorch_cpu.so (0x00007f3ce16ac000)
...
libdnnl_graph.so.0 => /workspace/libtorch/lib/libdnnl_graph.so.0 (0x00007f3cde954000)
...
```
## Model Zoo
Use cases that have already been optimized by Intel engineers are available in the [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.10-models). A number of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r1.10-models/benchmarks#pytorch-use-cases). You can get performance benefits out of the box by simply running the scripts in the Model Zoo.