
Commit cefe583

add DLRMv2 BKC under examples (#3743)

* add a models folder under examples for BKCs
* add folder for deepseek, dlrm and llama
* cp dlrm from intel-innersource/frameworks.ai.models.intel-models@82a9314
* support fp8
* replace crossnet
* support tf32
* remove unused codes
* change link
* rm CONTAINEAR
* update README
* remove jit_trace; fix flake
* rename to fp8_data.json; rm unused dependency
* cp unpack from jit_trace
* fix flake
* Update README.md (add tmp branch)
* update BKC
* fix ao link
* update BKC; refine q/dq
* update run_model.sh for fp8 and tf32
* update .bom
* fix flake
* Update branch on README.md
* fix flake

---------

Co-authored-by: y <[email protected]>
1 parent a07c604 commit cefe583

19 files changed, +37288 -2 lines changed

.bom

Lines changed: 5 additions & 2 deletions
@@ -62,7 +62,7 @@ types-dataclasses
 neural-compressor
 oneTBB
 oneCCL
-onednn-graph
+onednn-graph
 oneDNN
 jemalloc
 click
@@ -118,7 +118,7 @@ pyyaml
 schema
 setuptools
 Triton
-urllib3
+urllib3
 libcurl
 intel-media-va-driver-non-free
 libmfx1
@@ -165,3 +165,6 @@ g++-12
 backoff
 torchaudio
 einops
+torchao
+torchrec
+torchmetrics
CMakeLists.txt

Lines changed: 9 additions & 0 deletions

cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(aoti_example)

find_package(Torch REQUIRED)

add_executable(aoti_example bench.cpp model.so)

target_link_libraries(aoti_example "${TORCH_LIBRARIES}")
set_property(TARGET aoti_example PROPERTY CXX_STANDARD 17)
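
The executable above links a `model.so` generated ahead of time with AOT Inductor. As a minimal sketch of producing such a library, assuming the `torch._export.aot_compile` API available in PyTorch 2.x and a stand-in `TinyModel` (the real BKC would export the DLRM model instead):

```python
import torch
from torch._export import aot_compile

# Stand-in module for illustration only; not part of the BKC.
class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

model = TinyModel().eval()
example_inputs = (torch.randn(8, 16),)

with torch.no_grad():
    # aot_compile returns the path of the generated shared library;
    # the "aot_inductor.output_path" option pins it to model.so so the
    # CMake target above can link it.
    so_path = aot_compile(
        model,
        example_inputs,
        options={"aot_inductor.output_path": "model.so"},
    )
print(so_path)
```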
examples/cpu/inference/python/models/dlrm/README.md

Lines changed: 129 additions & 0 deletions

# DLRM v2 Inference

Best known configurations (BKC) for DLRM v2 inference with PyTorch.

## Model Information

| **Use Case** | **Framework** | **Model Repo** | **Branch/Commit/Tag** | **Optional Patch** |
|:---:|:---:|:---:|:---:|:---:|
| Inference | PyTorch | https://github.com/facebookresearch/dlrm/tree/main/torchrec_dlrm | - | - |

# Pre-Requisite
## Bare Metal
### General setup

Install PyTorch, torchao, and jemalloc:
```
git clone https://github.com/yanbing-j/pytorch.git
cd pytorch
git checkout yanbing/tf32_dev_branch_for_test/
git submodule sync
git submodule update --init --recursive
conda install cmake ninja
pip install -r requirements.txt
pip install mkl-static mkl-include
python setup.py install
cd ..

git clone https://github.com/shiyang-weng/ao.git
cd ao
git checkout wengshiy/scaled_mm
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
python setup.py install
cd ..

conda install jemalloc
```
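
As a quick sanity check, both custom builds should be importable once the installs above complete (a sketch; version strings will reflect the branches used):

```python
import torch
import torchao  # built from the shiyang-weng/ao branch above

# Both imports should succeed after `python setup.py install` finishes.
print("torch:", torch.__version__)
print("torchao:", getattr(torchao, "__version__", "unknown"))
```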

### Model Specific Setup

* Set jemalloc and tcmalloc preload for better performance.

jemalloc and tcmalloc should come from the [General setup](#general-setup) section.
```
export LD_PRELOAD="<path to the jemalloc directory>/lib/libjemalloc.so":"path_to/tcmalloc/lib/libtcmalloc.so":$LD_PRELOAD
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000"
```
* Set IOMP preload for better performance.
```
pip install packaging intel-openmp
export LD_PRELOAD=path/lib/libiomp5.so:$LD_PRELOAD
```

## Datasets
The dataset can be downloaded and preprocessed by following https://github.com/mlcommons/training/tree/master/recommendation_v2/torchrec_dlrm#create-the-synthetic-multi-hot-dataset.
We also provide a preprocessing script, `preprocess_raw_dataset.sh`, based on the instructions above.
After downloading the raw dataset files `day_*.gz` and unzipping them into `RAW_DIR`, run:
```bash
cd intel-extension-for-pytorch/examples/cpu/inference/python/models/dlrm/
export MODEL_DIR=$(pwd)
export RAW_DIR=<the unzipped raw dataset>
export TEMP_DIR=<where to put temporary files during preprocessing>
export PREPROCESSED_DIR=<where to put the one-hot dataset>
export MULTI_HOT_DIR=<where to put the multi-hot dataset>
bash preprocess_raw_dataset.sh
```

## Pre-Trained Checkpoint
You can download and unzip the checkpoint by following
https://github.com/mlcommons/inference/tree/master/recommendation/dlrm_v2/pytorch#downloading-model-weights
72+
73+
## Inference
74+
1. `git clone https://github.com/intel/intel-extension-for-pytorch.git`
75+
2. `cd intel-extension-for-pytorch/examples/cpu/inference/python/models/dlrm/`
76+
3. Create virtual environment `venv` and activate it:
77+
```
78+
python3 -m venv venv
79+
. ./venv/bin/activate
80+
```
81+
4. Install general model requirements
82+
```
83+
./setup.sh
84+
```
85+
86+
5. Setup required environment paramaters
87+
88+
| **Parameter** | **export command** |
89+
|:---------------------------:|:------------------------------------------------------------------------------------:|
90+
| **TEST_MODE** (THROUGHPUT, ACCURACY) | `export TEST_MODE=THROUGHPUT` |
91+
| **DATASET_DIR** | `export DATASET_DIR=<multi-hot dataset dir>` |
92+
| **WEIGHT_DIR** (ONLY FOR ACCURACY) | `export WEIGHT_DIR=<offical released checkpoint>` |
93+
| **PRECISION** | `export PRECISION=int8 <specify the precision to run: int8, fp32, bf32, bf16 or tf32>` |
94+
| **OUTPUT_DIR** | `export OUTPUT_DIR=$PWD` |
95+
| **BATCH_SIZE** (optional) | `export BATCH_SIZE=<set a value for batch size, else it will run with default batch size>` |
96+
| **TORCH_INDUCTOR** (optional) | `export TORCH_INDUCTOR=<0 or 1>` |
97+
| **FP8** (optional) | `export FP8=<0 or 1>` |
98+
99+
6. Run `run_model.sh`
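
For illustration only, a hypothetical Python driver that sets the parameters above and launches the script (the BKC itself just uses `export` plus `bash run_model.sh`):

```python
import os
import subprocess

# Hypothetical values; DATASET_DIR must point at your multi-hot dataset.
env = dict(
    os.environ,
    TEST_MODE="THROUGHPUT",               # or ACCURACY (then also set WEIGHT_DIR)
    DATASET_DIR="/data/criteo_multi_hot",  # placeholder path
    PRECISION="int8",                      # int8, fp32, bf32, bf16 or tf32
    OUTPUT_DIR=os.getcwd(),
)
subprocess.run(["bash", "run_model.sh"], env=env, check=True)
```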

## Output

Single-tile output will typically look like:

```
2024-07-18 15:58:00,970 - dlrm_main.py - __main__ - INFO - EVAL_START, EPOCH_NUM: 0
2024-07-18 16:00:14,120 - dlrm_main.py - __main__ - INFO - AUROC over test set: [0.5129603203103565, 0.0, 0.0].
2024-07-18 16:00:14,121 - dlrm_main.py - __main__ - INFO - Number of test samples: 131072
2024-07-18 16:00:14,121 - dlrm_main.py - __main__ - INFO - Throughput: 103711.5248249468 fps
2024-07-18 16:00:14,121 - dlrm_main.py - __main__ - INFO - Final AUROC: [0.5129603203103565, 0.0, 0.0]
2024-07-18 16:00:17,133 - dlrm_main.py - __main__ - INFO - AUROC over test set: [0.5129603203103565, 0.0, 0.0].
2024-07-18 16:00:17,133 - dlrm_main.py - __main__ - INFO - Number of test samples: 131072
2024-07-18 16:00:17,133 - dlrm_main.py - __main__ - INFO - Throughput: 102890.12235101678 fps
2024-07-18 16:00:17,134 - dlrm_main.py - __main__ - INFO - Final AUROC: [0.5129603203103565, 0.0, 0.0]
```

Final results of the inference run can be found in the `results.yaml` file.
```
results:
- key: throughput
  value: 102890.122
  unit: fps
- key: latency
  value: N/A
  unit: s
- key: accuracy
  value: 0.513
  unit: ROC AUC
```
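
As a sketch, assuming PyYAML is available, the metrics can be read back programmatically:

```python
import yaml  # pip install pyyaml

# Layout follows the results.yaml example above.
with open("results.yaml") as f:
    results = yaml.safe_load(f)["results"]

for entry in results:
    print(f'{entry["key"]}: {entry["value"]} {entry["unit"]}')
```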

examples/cpu/inference/python/models/dlrm/__init__.py

Whitespace-only changes.

examples/cpu/inference/python/models/dlrm/data_process/__init__.py

Whitespace-only changes.
Lines changed: 173 additions & 0 deletions
#!/usr/bin/env python3
#
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Copyright (c) Meta Platforms, Inc. and affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

import argparse
import os
from typing import List

from torch.utils.data import DataLoader
from torchrec.datasets.criteo import (
    CAT_FEATURE_COUNT,
    DAYS,
    DEFAULT_CAT_NAMES,
    DEFAULT_INT_NAMES,
    InMemoryBinaryCriteoIterDataPipe,
)
from torchrec.datasets.random import RandomRecDataset

# OSS import
try:
    # pyre-ignore[21]
    # @manual=//ai_codesign/benchmarks/dlrm/torchrec_dlrm/data:multi_hot_criteo
    from data.multi_hot_criteo import MultiHotCriteoIterDataPipe

except ImportError:
    pass

# internal import
try:
    from .multi_hot_criteo import MultiHotCriteoIterDataPipe  # noqa F811
except ImportError:
    pass

STAGES = ["train", "val", "test"]


def _get_random_dataloader(
    args: argparse.Namespace,
    stage: str,
) -> DataLoader:
    attr = f"limit_{stage}_batches"
    num_batches = getattr(args, attr)
    if stage in ["val", "test"] and args.test_batch_size is not None:
        batch_size = args.test_batch_size
    else:
        batch_size = args.batch_size
    return DataLoader(
        RandomRecDataset(
            keys=DEFAULT_CAT_NAMES,
            batch_size=batch_size,
            hash_size=args.num_embeddings,
            hash_sizes=(
                args.num_embeddings_per_feature
                if hasattr(args, "num_embeddings_per_feature")
                else None
            ),
            manual_seed=args.seed if hasattr(args, "seed") else None,
            ids_per_feature=1,
            num_dense=len(DEFAULT_INT_NAMES),
            num_batches=num_batches,
        ),
        batch_size=None,
        batch_sampler=None,
        pin_memory=args.pin_memory,
        num_workers=0,
    )


def _get_in_memory_dataloader(
    args: argparse.Namespace,
    stage: str,
) -> DataLoader:
    if args.in_memory_binary_criteo_path is not None:
        dir_path = args.in_memory_binary_criteo_path
        sparse_part = "sparse.npy"
        datapipe = InMemoryBinaryCriteoIterDataPipe
    else:
        dir_path = args.synthetic_multi_hot_criteo_path
        sparse_part = "sparse_multi_hot.npz"
        datapipe = MultiHotCriteoIterDataPipe

    if stage == "train":
        stage_files: List[List[str]] = [
            [os.path.join(dir_path, f"day_{i}_dense.npy") for i in range(DAYS - 1)],
            [os.path.join(dir_path, f"day_{i}_{sparse_part}") for i in range(DAYS - 1)],
            [os.path.join(dir_path, f"day_{i}_labels.npy") for i in range(DAYS - 1)],
        ]
    elif stage in ["val", "test"]:
        stage_files: List[List[str]] = [
            [os.path.join(dir_path, f"day_{DAYS-1}_dense.npy")],
            [os.path.join(dir_path, f"day_{DAYS-1}_{sparse_part}")],
            [os.path.join(dir_path, f"day_{DAYS-1}_labels.npy")],
        ]
    if stage in ["val", "test"] and args.test_batch_size is not None:
        batch_size = args.test_batch_size
    else:
        batch_size = args.batch_size
    dataloader = DataLoader(
        datapipe(
            stage,
            *stage_files,  # pyre-ignore[6]
            batch_size=batch_size,
            rank=0,
            world_size=1,
            drop_last=args.drop_last_training_batch if stage == "train" else False,
            shuffle_batches=args.shuffle_batches,
            shuffle_training_set=args.shuffle_training_set,
            shuffle_training_set_random_seed=args.seed,
            mmap_mode=args.mmap_mode,
            hashes=(
                args.num_embeddings_per_feature
                if args.num_embeddings is None
                else ([args.num_embeddings] * CAT_FEATURE_COUNT)
            ),
        ),
        batch_size=None,
        pin_memory=args.pin_memory,
        collate_fn=lambda x: x,
    )
    return dataloader


def get_dataloader(args: argparse.Namespace, backend: str, stage: str) -> DataLoader:
    """
    Gets desired dataloader from dlrm_main command line options. Currently, this
    function is able to return either a DataLoader wrapped around a RandomRecDataset or
    a DataLoader wrapped around an InMemoryBinaryCriteoIterDataPipe.

    Args:
        args (argparse.Namespace): Command line options supplied to dlrm_main.py's main
            function.
        backend (str): "nccl" or "gloo".
        stage (str): "train", "val", or "test".

    Returns:
        dataloader (DataLoader): PyTorch dataloader for the specified options.

    """
    stage = stage.lower()
    if stage not in STAGES:
        raise ValueError(f"Supplied stage was {stage}. Must be one of {STAGES}.")

    args.pin_memory = (
        (backend == "nccl") if not hasattr(args, "pin_memory") else args.pin_memory
    )

    if (
        args.in_memory_binary_criteo_path is None
        and args.synthetic_multi_hot_criteo_path is None
    ):
        return _get_random_dataloader(args, stage)
    else:
        return _get_in_memory_dataloader(args, stage)
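
A minimal usage sketch (hypothetical values; the attribute names mirror the options the functions above read, and `get_dataloader` is assumed to be in scope, e.g. run from the same module):

```python
import argparse

# With both dataset paths set to None, get_dataloader falls back to
# RandomRecDataset, so this runs without the Criteo data present.
args = argparse.Namespace(
    batch_size=512,
    test_batch_size=None,
    limit_val_batches=10,
    num_embeddings=1000,
    num_embeddings_per_feature=None,
    seed=0,
    in_memory_binary_criteo_path=None,
    synthetic_multi_hot_criteo_path=None,
)

loader = get_dataloader(args, backend="gloo", stage="val")
batch = next(iter(loader))  # a torchrec Batch of dense, sparse and label tensors
```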
