Repository: Lednik7/CLIP-ONNX
Branch: main
Commit: ebd4852b7d3e
Files: 19
Total size: 284.9 KB
Directory structure:
gitextract_k9qaok4o/
├── .gitignore
├── LICENSE
├── README.md
├── benchmark.md
├── clip_onnx/
│ ├── __init__.py
│ ├── benchmark.py
│ ├── clip_converter.py
│ ├── clip_onnx.py
│ └── utils.py
├── examples/
│ ├── RuCLIP_onnx_example.ipynb
│ ├── clip_onnx_example.ipynb
│ ├── dev/
│ │ ├── clip_onnx_benchmark_cpu.ipynb
│ │ ├── clip_onnx_benchmark_gpu.ipynb
│ │ ├── clip_onnx_benchmark_gpu_K80.ipynb
│ │ └── clip_onnx_benchmark_gpu_T4.ipynb
│ ├── readme_example.ipynb
│ └── ru_CLIP_tiny_onnx.ipynb
├── requirements.txt
└── setup.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2022 Gerasimov Maxim
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# CLIP-ONNX
A simple library that speeds up CLIP inference by up to 3x (measured on a K80 GPU)!
[OpenAI CLIP example (Colab)](https://colab.research.google.com/github/Lednik7/CLIP-ONNX/blob/main/examples/readme_example.ipynb) \
[RuCLIP example (Colab)](https://colab.research.google.com/github/Lednik7/CLIP-ONNX/blob/main/examples/RuCLIP_onnx_example.ipynb) \
[RuCLIP tiny example (Colab)](https://colab.research.google.com/github/Lednik7/CLIP-ONNX/blob/main/examples/ru_CLIP_tiny_onnx.ipynb)
## Usage
Install the clip-onnx module and its requirements first:
```python3
!pip install git+https://github.com/Lednik7/CLIP-ONNX.git
!pip install git+https://github.com/openai/CLIP.git
!pip install onnxruntime-gpu
```
## Example in 3 steps
0. Download the example image from the CLIP repo
```python3
!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true
```
1. Load the standard CLIP model, image, and text on CPU
```python3
import clip
from PIL import Image
import numpy as np
# the ONNX export must run with the model on CPU
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
# batch first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224]
image_onnx = image.detach().cpu().numpy().astype(np.float32)
# batch first
text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77]
text_onnx = text.detach().cpu().numpy().astype(np.int64)  # clip.tokenize returns int64
```
2. Create a CLIP-ONNX object and convert the model to ONNX
```python3
from clip_onnx import clip_onnx
visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"
onnx_model = clip_onnx(model, visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(image, text, verbose=True)
# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
onnx_model.start_sessions(providers=["CPUExecutionProvider"]) # cpu mode
```
3. Use the standard CLIP API with batch inference
```python3
image_features = onnx_model.encode_image(image_onnx)
text_features = onnx_model.encode_text(text_onnx)
logits_per_image, logits_per_text = onnx_model(image_onnx, text_onnx)
probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()
print("Label probs:", probs) # prints: [[0.9927937 0.00421067 0.00299571]]
```
**Enjoy the speed**
## Load saved model
Example for ViT-B/32 from the Model Zoo:
```python3
!wget https://clip-as-service.s3.us-east-2.amazonaws.com/models/onnx/ViT-B-32/visual.onnx
!wget https://clip-as-service.s3.us-east-2.amazonaws.com/models/onnx/ViT-B-32/textual.onnx
```
```python3
onnx_model = clip_onnx(None)
onnx_model.load_onnx(visual_path="visual.onnx",
                     textual_path="textual.onnx",
                     logit_scale=100.0000)  # model.logit_scale.exp()
onnx_model.start_sessions(providers=["CPUExecutionProvider"])
```
## Model Zoo
Models of the original CLIP can be found on this [page](https://github.com/jina-ai/clip-as-service/blob/main/server/clip_server/model/clip_onnx.py).\
They are not part of this library but should work correctly.
## If something doesn't work
Sometimes ONNX fails to convert the model on the first attempt; in that case, it is worth running the conversion again.
If that doesn't help, try changing the export settings.
The default ONNX export options look like this:
```python3
DEFAULT_EXPORT = dict(input_names=['input'], output_names=['output'],
export_params=True, verbose=False, opset_version=12,
do_constant_folding=True,
dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}})
```
You can change them easily:
```python3
from clip_onnx.utils import DEFAULT_EXPORT
DEFAULT_EXPORT["opset_version"] = 15
```
Alternative option (change only visual or textual):
```python3
from clip_onnx import clip_onnx
from clip_onnx.utils import DEFAULT_EXPORT
visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"
textual_export_params = DEFAULT_EXPORT.copy()
textual_export_params["dynamic_axes"] = {'input': {1: 'batch_size'},
'output': {0: 'batch_size'}}
textual_export_params["opset_version"] = 12
Textual = lambda x: x
onnx_model = clip_onnx(model.cpu(), visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(dummy_input_image, dummy_input_text, verbose=True,
textual_wrapper=Textual,
textual_export_params=textual_export_params)
```
## Best practices
See [benchmark.md](https://github.com/Lednik7/CLIP-ONNX/tree/main/benchmark.md)
## Examples
See the [examples folder](https://github.com/Lednik7/CLIP-ONNX/tree/main/examples) for more details. \
Some parts of the code were taken from this [post](https://twitter.com/apeoffire/status/1478493291008172038). Thanks to [neverix](https://github.com/neverix) for the notebook.
================================================
FILE: benchmark.md
================================================
# CPU benchmarks
#### Run on Intel(R) Xeon(R) CPU @ 2.30 GHz with 2 cores (Google Colab session)
| ONNX | batch | encode_image | encode_text | total |
|:---------|--------:|---------------:|--------------:|--------:|
| ViT-B/32 | 2 | 0.234 | 0.162 | 0.396 |
| ViT-B/32 | 8 | 0.923 | 0.656 | 1.579 |
| ViT-B/32 | 16 | 2.079 | 1.288 | 3.367 |
| ViT-B/32 | 32 | 3.937 | 2.658 | 6.595 |
| ViT-B/32 | 64 | 7.944 | 5.567 | 13.511 |
| TORCH | batch | encode_image | encode_text | total |
|:---------|--------:|---------------:|--------------:|--------:|
| ViT-B/32 | 2 | 0.343 | 0.243 | 0.586 |
| ViT-B/32 | 8 | 1.093 | 0.831 | 1.924 |
| ViT-B/32 | 16 | 1.952 | 1.523 | 3.475 |
| ViT-B/32 | 32 | 4.079 | 3.015 | 7.094 |
| ViT-B/32 | 64 | 8.07 | 6.212 | 14.282 |
# GPU benchmarks
#### Run on NVIDIA Tesla K80 (Google Colab session)
| ONNX | batch | encode_image | encode_text | total |
|:---------|--------:|---------------:|--------------:|--------:|
| ViT-B/32 | 2 | 0.136 | 0.021 | 0.157 |
| ViT-B/32 | 8 | 0.054 | 0.04 | 0.094 |
| ViT-B/32 | 16 | 0.089 | 0.071 | 0.16 |
| ViT-B/32 | 32 | 0.158 | 0.134 | 0.292 |
| ViT-B/32 | 64 | 0.325 | 0.258 | 0.583 |
| TORCH | batch | encode_image | encode_text | total |
|:---------|--------:|---------------:|--------------:|--------:|
| ViT-B/32 | 2 | 0.02 | 0.035 | 0.055 |
| ViT-B/32 | 8 | 0.081 | 0.098 | 0.179 |
| ViT-B/32 | 16 | 0.207 | 0.196 | 0.403 |
| ViT-B/32 | 32 | 0.44 | 0.374 | 0.814 |
| ViT-B/32 | 64 | 0.919 | 0.719 | 1.638 |
#### Run on NVIDIA Tesla T4 (Google Colab session)
| ONNX | batch | encode_image | encode_text | total |
|:---------|--------:|---------------:|--------------:|--------:|
| ViT-B/32 | 2 | 0.155 | 0.01 | 0.165 |
| ViT-B/32 | 8 | 0.032 | 0.014 | 0.046 |
| ViT-B/32 | 16 | 0.037 | 0.029 | 0.066 |
| ViT-B/32 | 32 | 0.076 | 0.059 | 0.135 |
| ViT-B/32 | 64 | 0.169 | 0.117 | 0.286 |
| TORCH | batch | encode_image | encode_text | total |
|:---------|--------:|---------------:|--------------:|--------:|
| ViT-B/32 | 2 | 0.017 | 0.009 | 0.026 |
| ViT-B/32 | 8 | 0.008 | 0.008 | 0.016 |
| ViT-B/32 | 16 | 0.009 | 0.012 | 0.021 |
| ViT-B/32 | 32 | 0.008 | 0.025 | 0.033 |
| ViT-B/32 | 64 | 0.009 | 0.049 | 0.058 |
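
The numbers above appear to be average wall-clock seconds per call, as measured by the repository's `speed_test` helper (`clip_onnx/benchmark.py`). A minimal standalone sketch of the same measurement loop, with a toy workload standing in for `onnx_model.encode_image`:
```python3
import time

def speed_test(func, data_gen, n=5):
    # average wall-clock seconds of func over n freshly generated inputs,
    # mirroring clip_onnx.benchmark.speed_test (minus the CUDA cache clearing)
    values = []
    for _ in range(n):
        batch = data_gen()
        start = time.time()
        func(batch)
        values.append(time.time() - start)
    return sum(values) / n

# toy workload standing in for onnx_model.encode_image
avg = speed_test(lambda xs: [x * x for x in xs], lambda: list(range(100_000)))
print(f"{avg:.4f} s per call")
```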
================================================
FILE: clip_onnx/__init__.py
================================================
from .clip_converter import clip_converter
from .clip_onnx import clip_onnx
from .utils import Textual, attention
from .benchmark import speed_test
================================================
FILE: clip_onnx/benchmark.py
================================================
import time
import torch


def speed_test(func, data_gen, n: int = 5, empty_cache: bool = True):
    if empty_cache:
        torch.cuda.empty_cache()
    values = []
    for _ in range(n):
        input_data = data_gen()
        t = time.time()
        func(input_data)
        values.append(time.time() - t)
    if empty_cache:
        torch.cuda.empty_cache()
    return sum(values) / n
================================================
FILE: clip_onnx/clip_converter.py
================================================
import torch
import onnx
from torch import nn
from onnxruntime.quantization import quantize_dynamic, QuantType

from .utils import Textual, DEFAULT_EXPORT


class clip_converter(nn.Module):
    def __init__(self, model, visual_path: str = "clip_visual.onnx",
                 textual_path: str = "clip_textual.onnx"):
        super().__init__()
        self.model = model
        self.visual_path = visual_path
        self.textual_path = textual_path
        self.visual_flag = False
        self.textual_flag = False
        self.logit_scale = self.model.logit_scale.exp()
        self.model.eval()
        for x in self.model.parameters():
            x.requires_grad = False

    def quantization(self, mode: str = "dynamic"):
        assert mode in ["dynamic"]
        if mode == "dynamic":
            model_quant_visual = f"{self.visual_path}.quant"
            quantize_dynamic(self.visual_path,
                             model_quant_visual,
                             weight_type=QuantType.QUInt8)
            self.visual_path = model_quant_visual

            model_quant_textual = f"{self.textual_path}.quant"
            quantize_dynamic(self.textual_path,
                             model_quant_textual,
                             weight_type=QuantType.QUInt8)
            self.textual_path = model_quant_textual

    def torch_export(self, model, dummy_input, path: str, export_params=DEFAULT_EXPORT):
        torch.onnx.export(model, dummy_input, path, **export_params)

    def onnx_checker(self, path: str):
        model = onnx.load(path)
        onnx.checker.check_model(model)
        del model

    def convert_visual(self, dummy_input, wrapper=lambda x: x,
                       export_params=DEFAULT_EXPORT):
        visual = wrapper(self.model.visual)
        self.torch_export(visual, dummy_input, self.visual_path,
                          export_params=export_params)
        self.onnx_checker(self.visual_path)

    def convert_textual(self, dummy_input, wrapper=Textual,
                        export_params=DEFAULT_EXPORT):
        textual = wrapper(self.model)
        self.torch_export(textual, dummy_input, self.textual_path,
                          export_params=export_params)
        self.onnx_checker(self.textual_path)

    def convert2onnx(self, visual_input=None, textual_input=None, verbose=True,
                     visual_wrapper=lambda x: x,
                     textual_wrapper=Textual,
                     visual_export_params=DEFAULT_EXPORT,
                     textual_export_params=DEFAULT_EXPORT):
        isinstance_visual_input = isinstance(visual_input, torch.Tensor)
        isinstance_textual_input = isinstance(textual_input, torch.Tensor)

        if (not isinstance_visual_input) and (not isinstance_textual_input):
            raise Exception("[CLIP ONNX] Please provide a dummy input")
        elif not isinstance_visual_input:
            print("[CLIP ONNX] Converting only the textual model")
        elif not isinstance_textual_input:
            print("[CLIP ONNX] Converting only the visual model")

        if isinstance_visual_input:
            self.visual_flag = True
            if verbose:
                print("[CLIP ONNX] Start convert visual model")
            self.convert_visual(visual_input, visual_wrapper, visual_export_params)
            if verbose:
                print("[CLIP ONNX] Start check visual model")
            self.onnx_checker(self.visual_path)

        if isinstance_textual_input:
            self.textual_flag = True
            if verbose:
                print("[CLIP ONNX] Start convert textual model")
            self.convert_textual(textual_input, textual_wrapper, textual_export_params)
            if verbose:
                print("[CLIP ONNX] Start check textual model")
            self.onnx_checker(self.textual_path)

        if verbose:
            print("[CLIP ONNX] Models converted successfully")
================================================
FILE: clip_onnx/clip_onnx.py
================================================
import torch
import onnxruntime

from .clip_converter import clip_converter


class clip_onnx(clip_converter):
    def __init__(self, model=None,
                 visual_path: str = "clip_visual.onnx",
                 textual_path: str = "clip_textual.onnx"):
        if model is not None:
            super().__init__(model, visual_path, textual_path)
        else:
            print("[CLIP ONNX] Load mode")

    def load_onnx(self, visual_path=None, textual_path=None, logit_scale=None):
        if visual_path and textual_path:
            if logit_scale is None:
                raise Exception("For this mode logit_scale must be specified. "
                                "Example: model.logit_scale.exp()")
            self.logit_scale = logit_scale
        if visual_path:
            self.visual_path = visual_path
            self.visual_flag = True
        if textual_path:
            self.textual_path = textual_path
            self.textual_flag = True

    def start_sessions(self, providers=['TensorrtExecutionProvider',
                                        'CUDAExecutionProvider',
                                        'CPUExecutionProvider']):
        if self.visual_flag:
            self.visual_session = onnxruntime.InferenceSession(self.visual_path,
                                                               providers=providers)
        if self.textual_flag:
            self.textual_session = onnxruntime.InferenceSession(self.textual_path,
                                                                providers=providers)

    def visual_run(self, onnx_image):
        onnx_input_image = {self.visual_session.get_inputs()[0].name: onnx_image}
        visual_output, = self.visual_session.run(None, onnx_input_image)
        return visual_output

    def textual_run(self, onnx_text):
        onnx_input_text = {self.textual_session.get_inputs()[0].name: onnx_text}
        textual_output, = self.textual_session.run(None, onnx_input_text)
        return textual_output

    def __call__(self, image, text, device: str = "cpu"):
        assert self.visual_flag and self.textual_flag
        image_features = torch.from_numpy(self.visual_run(image)).to(device)
        text_features = torch.from_numpy(self.textual_run(text)).to(device)

        # normalized features
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)

        # cosine similarity as logits
        logits_per_image = self.logit_scale * image_features @ text_features.t()
        logits_per_text = logits_per_image.t()

        # shape = [global_batch_size, global_batch_size]
        return logits_per_image, logits_per_text

    def encode_image(self, image):
        return self.visual_run(image)

    def encode_text(self, text):
        return self.textual_run(text)
================================================
FILE: clip_onnx/utils.py
================================================
import torch
import torch.nn.functional as F
from torch import nn


class Textual(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.transformer = model.transformer
        self.positional_embedding = model.positional_embedding
        self.ln_final = model.ln_final
        self.text_projection = model.text_projection
        self.token_embedding = model.token_embedding

    def forward(self, text):
        x = self.token_embedding(text)  # [batch_size, n_ctx, d_model]
        x = x + self.positional_embedding
        x = x.permute(1, 0, 2)  # NLD -> LND
        x = self.transformer(x)
        x = x.permute(1, 0, 2)  # LND -> NLD
        x = self.ln_final(x)
        # x.shape = [batch_size, n_ctx, transformer.width]
        # take features from the eot embedding (eot_token is the highest number in each sequence);
        # needs .float() before .argmax() to work
        x = x[torch.arange(x.shape[0]), text.float().argmax(dim=-1)] @ self.text_projection
        return x


def attention(self, x: torch.Tensor):
    # onnx doesn't like multi_head_attention_forward, so this is a reimplementation;
    # meant to be bound as a method of CLIP's attention block (hence the explicit `self`)
    self.attn_mask = self.attn_mask.to(dtype=x.dtype, device=x.device) if self.attn_mask is not None else None
    q, k, v = (torch.einsum("tbh, oh -> tbo", x, self.attn.in_proj_weight)
               + self.attn.in_proj_bias).contiguous().chunk(3, dim=-1)
    tgt_len = q.shape[0]
    bsz = q.shape[1]
    num_heads = self.attn.num_heads
    head_dim = q.shape[2] // num_heads
    attn_output = scaled_dot_product_attention(
        q.reshape(tgt_len, bsz * num_heads, head_dim).transpose(0, 1),
        k.reshape(tgt_len, bsz * num_heads, head_dim).transpose(0, 1),
        v.reshape(tgt_len, bsz * num_heads, head_dim).transpose(0, 1),
        self.attn_mask, 0.0
    )
    attn_output = attn_output.transpose(0, 1).contiguous().view(q.shape)
    attn_output = F.linear(attn_output, self.attn.out_proj.weight, self.attn.out_proj.bias)
    return attn_output


def scaled_dot_product_attention(Q, K, V, attn_mask, dropout_p):
    if attn_mask is None:
        attn_weight = torch.softmax(Q @ K.transpose(-2, -1) / Q.size(-1) ** 0.5, dim=-1)
    else:
        attn_weight = torch.softmax(Q @ K.transpose(-2, -1) / Q.size(-1) ** 0.5 + attn_mask[None, ...], dim=-1)
    # attn_weight = torch.dropout(attn_weight, dropout_p)  # always 0.0 in CLIP, so it is commented out
    return attn_weight @ V


DEFAULT_EXPORT = dict(input_names=['input'], output_names=['output'],
                      export_params=True, verbose=False, opset_version=12,
                      do_constant_folding=True,
                      dynamic_axes={'input': {0: 'batch_size'},
                                    'output': {0: 'batch_size'}})
================================================
FILE: examples/RuCLIP_onnx_example.ipynb
================================================
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "RuCLIP_onnx_example.ipynb",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Lednik7/CLIP-ONNX/blob/main/examples/RuCLIP_onnx_example.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"source": [
"#@title Allowed Resources\n",
"import multiprocessing\n",
"import torch\n",
"from psutil import virtual_memory\n",
"\n",
"ram_gb = round(virtual_memory().total / 1024**3, 1)\n",
"\n",
"print('CPU:', multiprocessing.cpu_count())\n",
"print('RAM GB:', ram_gb)\n",
"print(\"PyTorch version:\", torch.__version__)\n",
"print(\"CUDA version:\", torch.version.cuda)\n",
"print(\"cuDNN version:\", torch.backends.cudnn.version())\n",
"device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
"print(\"device:\", device.type)\n",
"\n",
"!nvidia-smi"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"cellView": "form",
"id": "4gfq46gnYcnU",
"outputId": "41e2054a-e2e4-4bb5-ed39-8bd8bfc639c3"
},
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"CPU: 2\n",
"RAM GB: 12.7\n",
"PyTorch version: 1.10.0+cu111\n",
"CUDA version: 11.1\n",
"cuDNN version: 8005\n",
"device: cuda\n",
"Wed Jan 19 22:10:10 2022 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 495.46 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 41C P8 9W / 70W | 3MiB / 15109MiB | 0% Default |\n",
"| | | N/A |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## Restart colab session after installation\n",
"Reload the session if something doesn't work"
],
"metadata": {
"id": "whlsBiJgR8le"
}
},
{
"cell_type": "code",
"source": [
"%%capture\n",
"!pip install git+https://github.com/Lednik7/CLIP-ONNX.git\n",
"!pip install ruclip==0.0.1rc7\n",
"!pip install onnxruntime-gpu"
],
"metadata": {
"id": "HnbpAkvuR73L"
},
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"source": [
"%%capture\n",
"!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true"
],
"metadata": {
"id": "tqy0zKM4R-7M"
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import onnxruntime\n",
"\n",
"# priority device (if available)\n",
"print(onnxruntime.get_device())"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "x8IN72OnSAIh",
"outputId": "3174cf2c-ace3-4e1f-a550-e16c72302d51"
},
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"GPU\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## RuCLIP\n",
"WARNING: RuCLIP's forward is \"model(text, image)\", unlike the classic OpenAI CLIP \"model(image, text)\""
],
"metadata": {
"id": "8_wSsSheT5mw"
}
},
{
"cell_type": "code",
"source": [
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\", category=UserWarning)"
],
"metadata": {
"id": "gZTxanR26knr"
},
"execution_count": 1,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import ruclip\n",
"\n",
"# onnx cannot export with cuda\n",
"model, processor = ruclip.load(\"ruclip-vit-base-patch32-384\", device=\"cpu\")"
],
"metadata": {
"id": "FdTLuqsJUBFY"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from PIL import Image\n",
"import numpy as np\n",
"\n",
"# simple input\n",
"pil_images = [Image.open(\"CLIP.png\")]\n",
"labels = ['диаграмма', 'собака', 'кошка']\n",
"dummy_input = processor(text=labels, images=pil_images,\n",
" return_tensors='pt', padding=True)\n",
"\n",
"# batch first\n",
"image = dummy_input[\"pixel_values\"] # torch tensor [1, 3, 384, 384]\n",
"image_onnx = dummy_input[\"pixel_values\"].cpu().detach().numpy().astype(np.float32)\n",
"\n",
"# batch first\n",
"text = dummy_input[\"input_ids\"] # torch tensor [3, 77]\n",
"text_onnx = dummy_input[\"input_ids\"].cpu().detach().numpy()[::-1].astype(np.int64)"
],
"metadata": {
"id": "rPwc6A2SSGyl"
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#RuCLIP output\n",
"logits_per_image, logits_per_text = model(text, image)\n",
"probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()\n",
"\n",
"print(\"Label probs:\", probs) # prints: [[0.9885839 0.00894288 0.0024732 ]]"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "pv0mH626SdzO",
"outputId": "d563462f-b2a9-4d49-b491-17e88ffa81f0"
},
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Label probs: [[0.9885839 0.00894288 0.0024732 ]]\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## Convert RuCLIP model to ONNX"
],
"metadata": {
"id": "R_e5OjJeXRiF"
}
},
{
"cell_type": "code",
"source": [
"from clip_onnx import clip_onnx\n",
"\n",
"visual_path = \"clip_visual.onnx\"\n",
"textual_path = \"clip_textual.onnx\"\n",
"\n",
"onnx_model = clip_onnx(model, visual_path=visual_path, textual_path=textual_path)\n",
"onnx_model.convert2onnx(image, text, verbose=True)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "oYM5FDSGSJBW",
"outputId": "c647dc2e-946d-4769-c66e-77edfa98237f"
},
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[CLIP ONNX] Start convert visual model\n",
"[CLIP ONNX] Start check visual model\n",
"[CLIP ONNX] Start convert textual model\n",
"[CLIP ONNX] Start check textual model\n",
"[CLIP ONNX] Models converts successfully\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## [ONNX] CPU inference mode"
],
"metadata": {
"id": "U1Pr-YTtSEhs"
}
},
{
"cell_type": "code",
"source": [
"# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']\n",
"onnx_model.start_sessions(providers=[\"CPUExecutionProvider\"]) # cpu mode"
],
"metadata": {
"id": "aY9wRe5kT3wG"
},
"execution_count": 6,
"outputs": []
},
{
"cell_type": "code",
"source": [
"image_features = onnx_model.encode_image(image_onnx)\n",
"text_features = onnx_model.encode_text(text_onnx)\n",
"\n",
"logits_per_image, logits_per_text = onnx_model(image_onnx, text_onnx)\n",
"probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()\n",
"\n",
"print(\"Label probs:\", probs) # prints: Label probs: [[0.90831375 0.07174418 0.01994203]]"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "tYVuk72nSLw6",
"outputId": "75bf3803-6ed7-4516-ccd0-42f9cf7f22e0"
},
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Label probs: [[0.90831375 0.07174418 0.01994203]]\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"%timeit onnx_model.encode_text(text_onnx) # text representation"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Bpu4_HFRVeNk",
"outputId": "e8f1681b-40dc-495f-d382-f0348d87c412"
},
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"1 loop, best of 5: 285 ms per loop\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"%timeit onnx_model.encode_image(image_onnx) # image representation"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JsOccP2gVmpo",
"outputId": "adb33860-b000-461b-959f-95126e2ac049"
},
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"1 loop, best of 5: 412 ms per loop\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## [ONNX] GPU inference mode"
],
"metadata": {
"id": "Zww0E-jIULug"
}
},
{
"cell_type": "code",
"source": [
"onnx_model.start_sessions(providers=[\"CUDAExecutionProvider\"]) # cuda mode"
],
"metadata": {
"id": "PBakYeiQUOAm"
},
"execution_count": 10,
"outputs": []
},
{
"cell_type": "code",
"source": [
"%timeit onnx_model.encode_text(text_onnx) # text representation"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "EjvRBvCaWJBL",
"outputId": "07426652-1cc5-4713-c355-fb4f1bd138d4"
},
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The slowest run took 5.07 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
"100 loops, best of 5: 6.89 ms per loop\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"%timeit onnx_model.encode_image(image_onnx) # image representation"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "pmu4mQCsWJ8w",
"outputId": "5cb45026-dfd3-419d-e5d3-f5d0d9681cd0"
},
"execution_count": 12,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The slowest run took 699.84 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
"1 loop, best of 5: 18.9 ms per loop\n"
]
}
]
}
]
}
================================================
FILE: examples/clip_onnx_example.ipynb
================================================
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "clip_onnx_example.ipynb",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Lednik7/CLIP-ONNX/blob/main/examples/clip_onnx_example.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"## Restart colab session after installation\n",
"Reload the session if something doesn't work"
],
"metadata": {
"id": "fxPg_VvZuScV"
}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "al_QNjyFq6Jj"
},
"outputs": [],
"source": [
"%%capture\n",
"!pip install git+https://github.com/Lednik7/CLIP-ONNX.git\n",
"!pip install git+https://github.com/openai/CLIP.git\n",
"!pip install onnxruntime-gpu"
]
},
{
"cell_type": "code",
"source": [
"%%capture\n",
"!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true"
],
"metadata": {
"id": "42eeJz9lTdJ6"
},
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"source": [
"!nvidia-smi"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "XuauIZIBSEUX",
"outputId": "2c7c2bd9-90dd-4b1a-e98a-79e1f2218644"
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Thu Jan 6 16:36:44 2022 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 495.44 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 35C P8 26W / 149W | 0MiB / 11441MiB | 0% Default |\n",
"| | | N/A |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import onnxruntime\n",
"print(onnxruntime.get_device())"
],
"metadata": {
"id": "gqvxpdajRX5_"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## CPU inference mode"
],
"metadata": {
"id": "010k-ksVTjAu"
}
},
{
"cell_type": "markdown",
"source": [
"### Torch CLIP"
],
"metadata": {
"id": "KdTz0IJWVBqE"
}
},
{
"cell_type": "code",
"source": [
"import clip\n",
"from PIL import Image\n",
"import numpy as np\n",
"\n",
"# onnx cannot work with cuda\n",
"model, preprocess = clip.load(\"ViT-B/32\", device=\"cpu\", jit=False)\n",
"\n",
"# batch first\n",
"image = preprocess(Image.open(\"CLIP.png\")).unsqueeze(0).cpu() # [1, 3, 224, 224]\n",
"image_onnx = image.detach().cpu().numpy().astype(np.float32)\n",
"\n",
"# batch first\n",
"text = clip.tokenize([\"a diagram\", \"a dog\", \"a cat\"]).cpu() # [3, 77]\n",
"text_onnx = text.detach().cpu().numpy().astype(np.int64)"
],
"metadata": {
"id": "9ROPwKYurOhP"
},
"execution_count": 1,
"outputs": []
},
{
"cell_type": "code",
"source": [
"%timeit model(image, text)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1CrHQ8cYt8Cx",
"outputId": "4d98f85d-4b02-4ae2-b18f-fb3c7a2d6caf"
},
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"1 loop, best of 5: 636 ms per loop\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"### CLIP-ONNX"
],
"metadata": {
"id": "Ao2MriaVVG6Y"
}
},
{
"cell_type": "code",
"source": [
"from clip_onnx import clip_onnx, attention\n",
"clip.model.ResidualAttentionBlock.attention = attention\n",
"\n",
"onnx_model = clip_onnx(model)\n",
"onnx_model.convert2onnx(image, text, verbose=True)\n",
"# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']\n",
"onnx_model.start_sessions(providers=[\"CPUExecutionProvider\"]) # cpu mode"
],
"metadata": {
"id": "nSeG9uAZrcph",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "8c394684-d78e-49f6-a60f-872485d5f650"
},
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[CLIP ONNX] Start convert visual model\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.7/dist-packages/clip_onnx/utils.py:40: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').\n",
" head_dim = q.shape[2] // num_heads\n",
"/usr/local/lib/python3.7/dist-packages/torch/onnx/symbolic_helper.py:716: UserWarning: allowzero=0 by default. In order to honor zero value in shape use allowzero=1\n",
" warnings.warn(\"allowzero=0 by default. In order to honor zero value in shape use allowzero=1\")\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"[CLIP ONNX] Start check visual model\n",
"[CLIP ONNX] Start convert textual model\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.7/dist-packages/torch/onnx/symbolic_opset9.py:2819: UserWarning: Exporting aten::index operator of advanced indexing in opset 14 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.\n",
" \"If indices include negative values, the exported graph will produce incorrect results.\")\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"[CLIP ONNX] Start check textual model\n",
"[CLIP ONNX] Models converts successfully\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"%timeit onnx_model(image_onnx, text_onnx)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "B15dr51UrvMh",
"outputId": "7c5fbc64-61f5-4742-d5a1-24d123971515"
},
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"1 loop, best of 5: 550 ms per loop\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## GPU inference mode\n",
"Select a runtime GPU to continue:\n",
"\n",
"Click Runtime -> Change Runtime Type -> switch \"Harware accelerator\" to be GPU. Save it, and you maybe connect to GPU"
],
"metadata": {
"id": "Ahh_7CkTUb8y"
}
},
{
"cell_type": "markdown",
"source": [
"### CLIP-ONNX"
],
"metadata": {
"id": "B6M7yq7qceb5"
}
},
{
"cell_type": "code",
"source": [
"onnx_model.start_sessions(providers=[\"CUDAExecutionProvider\"]) # GPU mode"
],
"metadata": {
"id": "6LtPSZhfUd_m"
},
"execution_count": 6,
"outputs": []
},
{
"cell_type": "code",
"source": [
"onnx_model.visual_session.get_providers() # optional"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xE0VGt9sQwrf",
"outputId": "6feb4701-7b7f-437e-dc2f-c95c504dbb89"
},
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['CUDAExecutionProvider', 'CPUExecutionProvider']"
]
},
"metadata": {},
"execution_count": 7
}
]
},
{
"cell_type": "code",
"source": [
"%timeit onnx_model(image_onnx, text_onnx)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "iPUVzqmgcYas",
"outputId": "3e7c1526-6e38-4982-ca36-eabfc95c2ab9"
},
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The slowest run took 79.70 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
"1 loop, best of 5: 60.8 ms per loop\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"### Torch CLIP"
],
"metadata": {
"id": "jb58mrkbch2V"
}
},
{
"cell_type": "code",
"source": [
"import clip\n",
"from PIL import Image\n",
"\n",
"device = \"cuda\"\n",
"# onnx cannot work with cuda\n",
"model, preprocess = clip.load(\"ViT-B/32\", device=device, jit=False)\n",
"# batch first\n",
"image = preprocess(Image.open(\"CLIP.png\")).unsqueeze(0).to(device) # [1, 3, 224, 224]\n",
"text = clip.tokenize([\"a diagram\", \"a dog\", \"a cat\"]).to(device) # [3, 77]"
],
"metadata": {
"id": "gidR99GOckyF"
},
"execution_count": 10,
"outputs": []
},
{
"cell_type": "code",
"source": [
"%timeit model(image, text)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "XpBrtjlOcwOC",
"outputId": "56375401-18a0-499b-f29b-c6e2d4d07e42"
},
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"10 loops, best of 5: 72.2 ms per loop\n"
]
}
]
}
]
}
================================================
FILE: examples/dev/clip_onnx_benchmark_cpu.ipynb
================================================
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "clip-onnx-benchmark-cpu.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyNUvpypuYYk54s1lZecP8Pf",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Lednik7/CLIP-ONNX/blob/dev/examples/dev/clip_onnx_benchmark_cpu.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"## Restart colab session after installation\n",
"Reload the session if something doesn't work"
],
"metadata": {
"id": "fxPg_VvZuScV"
}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "al_QNjyFq6Jj"
},
"outputs": [],
"source": [
"%%capture\n",
"!pip install git+https://github.com/Lednik7/CLIP-ONNX.git@dev\n",
"!pip install git+https://github.com/openai/CLIP.git\n",
"!pip install onnxruntime-gpu"
]
},
{
"cell_type": "code",
"source": [
"%%capture\n",
"!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true"
],
"metadata": {
"id": "42eeJz9lTdJ6"
},
"execution_count": 1,
"outputs": []
},
{
"cell_type": "code",
"source": [
"!nvidia-smi"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "XuauIZIBSEUX",
"outputId": "7e3fa9a5-2970-4bc1-81e5-9ec997a267a1"
},
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Tue May 3 06:56:57 2022 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 47C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |\n",
"| | | N/A |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import onnxruntime\n",
"print(onnxruntime.get_device())"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gqvxpdajRX5_",
"outputId": "4ad23904-186a-4e19-af9a-66538a70a3c8"
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"GPU\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## GPU inference mode\n",
"Select a runtime GPU to continue:\n",
"\n",
"Click Runtime -> Change Runtime Type -> switch \"Harware accelerator\" to be GPU. Save it, and you maybe connect to GPU"
],
"metadata": {
"id": "010k-ksVTjAu"
}
},
{
"cell_type": "markdown",
"source": [
"### Torch CLIP"
],
"metadata": {
"id": "KdTz0IJWVBqE"
}
},
{
"cell_type": "code",
"source": [
"import clip\n",
"from PIL import Image\n",
"import numpy as np\n",
"\n",
"# onnx cannot work with cuda\n",
"model, preprocess = clip.load(\"ViT-B/32\", device=\"cpu\", jit=False)\n",
"\n",
"# batch first\n",
"image = preprocess(Image.open(\"CLIP.png\")).unsqueeze(0) # [1, 3, 224, 224]\n",
"image_onnx = image.detach().cpu().numpy().astype(np.float32)\n",
"\n",
"# batch first\n",
"text = clip.tokenize([\"a diagram\", \"a dog\", \"a cat\"]) # [3, 77]\n",
"text_onnx = text.detach().cpu().numpy().astype(np.int32)"
],
"metadata": {
"id": "9ROPwKYurOhP"
},
"execution_count": 4,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### CLIP-ONNX"
],
"metadata": {
"id": "Ao2MriaVVG6Y"
}
},
{
"cell_type": "code",
"source": [
"from clip_onnx import clip_onnx\n",
"\n",
"onnx_model = clip_onnx(model)\n",
"onnx_model.convert2onnx(image, text, verbose=True)\n",
"# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']\n",
"onnx_model.start_sessions(providers=[\"CPUExecutionProvider\"]) # GPU mode"
],
"metadata": {
"id": "nSeG9uAZrcph",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "32e7fb6e-191a-4c3a-a8be-42ddf41ee62d"
},
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[CLIP ONNX] Start convert visual model\n",
"[CLIP ONNX] Start check visual model\n",
"[CLIP ONNX] Start convert textual model\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.7/dist-packages/torch/onnx/symbolic_opset9.py:2909: UserWarning: Exporting aten::index operator of advanced indexing in opset 12 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.\n",
" \"If indices include negative values, the exported graph will produce incorrect results.\")\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"[CLIP ONNX] Start check textual model\n",
"[CLIP ONNX] Models converts successfully\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"onnx_model = clip_onnx(model)\n",
"onnx_model.load_onnx(\"/content/clip_visual.onnx\",\n",
" \"/content/clip_textual.onnx\",\n",
" model.logit_scale.exp())\n",
"onnx_model.start_sessions(providers=[\"CPUExecutionProvider\"]) # GPU mode"
],
"metadata": {
"id": "PsDS7ty79zZf"
},
"execution_count": 7,
"outputs": []
},
{
"cell_type": "code",
"source": [
"onnx_model.visual_session.get_providers()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "aZsGJNrbNCYe",
"outputId": "27eec69c-6535-46e1-d98a-15836459149e"
},
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['CPUExecutionProvider']"
]
},
"metadata": {},
"execution_count": 8
}
]
},
{
"cell_type": "markdown",
"source": [
"## Benchmark"
],
"metadata": {
"id": "J5IcOG_6jAFz"
}
},
{
"cell_type": "code",
"source": [
"model, preprocess = clip.load(\"ViT-B/32\", device=\"cpu\", jit=False)"
],
"metadata": {
"id": "SJ_5_x7vLepK"
},
"execution_count": 9,
"outputs": []
},
{
"cell_type": "code",
"source": [
"model.eval()\n",
"for x in model.parameters():\n",
" x.requires_grad = False"
],
"metadata": {
"id": "OnOzZ3LMuubW"
},
"execution_count": 10,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import numpy, random, torch"
],
"metadata": {
"id": "wDwqRRrTGKUS"
},
"execution_count": 11,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def set_seed():\n",
" torch.manual_seed(12)\n",
" torch.cuda.manual_seed(12)\n",
" np.random.seed(12)\n",
" random.seed(12)\n",
"\n",
" torch.backends.cudnn.deterministic=True"
],
"metadata": {
"id": "9H17n_6gGJgT"
},
"execution_count": 12,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import torch\n",
"import time\n",
"\n",
"n = 5\n",
"clip_results = {\"encode_image\": [],\n",
" \"encode_text\": []}\n",
"onnx_results = {\"encode_image\": [],\n",
" \"encode_text\": []}\n",
"for batch in [2, 8, 16, 32, 64]:\n",
" set_seed()\n",
" t_mean = []\n",
" for _ in range(n):\n",
" image_input = torch.randint(1, 255, (batch, 3, 224, 224))\n",
" image_input_onnx = image_input.detach().cpu().numpy().astype(np.float32)\n",
" t = time.time()\n",
" onnx_model.encode_image(image_input_onnx)\n",
" t_mean.append(time.time() - t)\n",
" print(\"onnx\", batch, \"encode_image\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" onnx_results[\"encode_image\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" set_seed()\n",
" with torch.inference_mode():\n",
" t_mean = []\n",
" for _ in range(n):\n",
" image_input = torch.randint(1, 255, (batch, 3, 224, 224))\n",
" t = time.time()\n",
" model.encode_image(image_input)\n",
" t_mean.append(time.time() - t)\n",
" print(\"torch\", batch, \"encode_image\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" clip_results[\"encode_image\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" set_seed()\n",
" t_mean = []\n",
" for _ in range(n):\n",
" text_input = torch.randint(320, 49407, (batch, 77))\n",
" text_input_onnx = text_input.detach().cpu().numpy().astype(np.int32)\n",
" t = time.time()\n",
" onnx_model.encode_text(text_input_onnx)\n",
" t_mean.append(time.time() - t)\n",
" print(\"onnx\", batch, \"encode_text\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" onnx_results[\"encode_text\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" set_seed()\n",
" with torch.inference_mode():\n",
" t_mean = []\n",
" for _ in range(n):\n",
" text_input = torch.randint(320, 49407, (batch, 77))\n",
" t = time.time()\n",
" model.encode_text(text_input)\n",
" t_mean.append(time.time() - t)\n",
" print(\"torch\", batch, \"encode_text\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" clip_results[\"encode_text\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" print(\"-\" * 78)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4lFL6tzWjiWL",
"outputId": "45819718-619e-429c-9aa4-7e28b068b9a3"
},
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"onnx 2 encode_image 0.234\n",
"torch 2 encode_image 0.343\n",
"onnx 2 encode_text 0.162\n",
"torch 2 encode_text 0.243\n",
"------------------------------------------------------------------------------\n",
"onnx 8 encode_image 0.923\n",
"torch 8 encode_image 1.093\n",
"onnx 8 encode_text 0.656\n",
"torch 8 encode_text 0.831\n",
"------------------------------------------------------------------------------\n",
"onnx 16 encode_image 2.079\n",
"torch 16 encode_image 1.952\n",
"onnx 16 encode_text 1.288\n",
"torch 16 encode_text 1.523\n",
"------------------------------------------------------------------------------\n",
"onnx 32 encode_image 3.937\n",
"torch 32 encode_image 4.079\n",
"onnx 32 encode_text 2.658\n",
"torch 32 encode_text 3.015\n",
"------------------------------------------------------------------------------\n",
"onnx 64 encode_image 7.944\n",
"torch 64 encode_image 8.07\n",
"onnx 64 encode_text 5.567\n",
"torch 64 encode_text 6.212\n",
"------------------------------------------------------------------------------\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import pandas as pd"
],
"metadata": {
"id": "P2YhbE9v_4ci"
},
"execution_count": 14,
"outputs": []
},
{
"cell_type": "code",
"source": [
"pd.DataFrame({\"backend\": [\"onnx\", \"torch\"] * 5,\n",
" \"batch\": [2, 2, 8, 8, 16, 16, 32, 32, 64, 64],\n",
" \"encode_image\": [j[1] for i in zip(onnx_results[\"encode_image\"],\n",
" clip_results[\"encode_image\"]) for j in i],\n",
" \"encode_text\": [j[1] for i in zip(onnx_results[\"encode_text\"],\n",
" clip_results[\"encode_text\"]) for j in i]})"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 362
},
"id": "WfZfDk4PAlqm",
"outputId": "38710ad6-09ae-4c48-fc20-1cdabf4c2a50"
},
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" backend batch encode_image encode_text\n",
"0 onnx 2 0.234 0.162\n",
"1 torch 2 0.343 0.243\n",
"2 onnx 8 0.923 0.656\n",
"3 torch 8 1.093 0.831\n",
"4 onnx 16 2.079 1.288\n",
"5 torch 16 1.952 1.523\n",
"6 onnx 32 3.937 2.658\n",
"7 torch 32 4.079 3.015\n",
"8 onnx 64 7.944 5.567\n",
"9 torch 64 8.070 6.212"
]
},
"metadata": {},
"execution_count": 15
}
]
},
{
"cell_type": "code",
"source": [
"onnx_df = pd.DataFrame({\"ONNX\": [\"ViT-B/32\"] * 5,\n",
" \"batch\": [2, 8, 16, 32, 64],\n",
" \"encode_image\": [i[1] for i in onnx_results[\"encode_image\"]],\n",
" \"encode_text\": [i[1] for i in onnx_results[\"encode_text\"]]})\n",
"onnx_df[\"total\"] = onnx_df[\"encode_image\"] + onnx_df[\"encode_text\"]"
],
"metadata": {
"id": "Xpw9lV7yBbA8"
},
"execution_count": 16,
"outputs": []
},
{
"cell_type": "code",
"source": [
"onnx_df"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "LItAyQkeDhnQ",
"outputId": "37517a71-baf3-494c-8a46-9f05cbfb7d32"
},
"execution_count": 17,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" ONNX batch encode_image encode_text total\n",
"0 ViT-B/32 2 0.234 0.162 0.396\n",
"1 ViT-B/32 8 0.923 0.656 1.579\n",
"2 ViT-B/32 16 2.079 1.288 3.367\n",
"3 ViT-B/32 32 3.937 2.658 6.595\n",
"4 ViT-B/32 64 7.944 5.567 13.511"
]
},
"metadata": {},
"execution_count": 17
}
]
},
{
"cell_type": "code",
"source": [
"print(onnx_df.to_markdown(index=False))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "AIQDA9FaJZ7Y",
"outputId": "8e8d4109-822e-4328-b2ca-66d4b9a19f8d"
},
"execution_count": 18,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"| ONNX | batch | encode_image | encode_text | total |\n",
"|:---------|--------:|---------------:|--------------:|--------:|\n",
"| ViT-B/32 | 2 | 0.234 | 0.162 | 0.396 |\n",
"| ViT-B/32 | 8 | 0.923 | 0.656 | 1.579 |\n",
"| ViT-B/32 | 16 | 2.079 | 1.288 | 3.367 |\n",
"| ViT-B/32 | 32 | 3.937 | 2.658 | 6.595 |\n",
"| ViT-B/32 | 64 | 7.944 | 5.567 | 13.511 |\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"clip_df = pd.DataFrame({\"TORCH\": [\"ViT-B/32\"] * 5,\n",
" \"batch\": [2, 8, 16, 32, 64],\n",
" \"encode_image\": [i[1] for i in clip_results[\"encode_image\"]],\n",
" \"encode_text\": [i[1] for i in clip_results[\"encode_text\"]]})\n",
"clip_df[\"total\"] = clip_df[\"encode_image\"] + clip_df[\"encode_text\"]"
],
"metadata": {
"id": "E1OXQUDvDZmI"
},
"execution_count": 19,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(clip_df.to_markdown(index=False))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xAj-ynhCDpPO",
"outputId": "88243c7f-bd6d-4a63-9ee2-154440c3df7e"
},
"execution_count": 20,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"| TORCH | batch | encode_image | encode_text | total |\n",
"|:---------|--------:|---------------:|--------------:|--------:|\n",
"| ViT-B/32 | 2 | 0.343 | 0.243 | 0.586 |\n",
"| ViT-B/32 | 8 | 1.093 | 0.831 | 1.924 |\n",
"| ViT-B/32 | 16 | 1.952 | 1.523 | 3.475 |\n",
"| ViT-B/32 | 32 | 4.079 | 3.015 | 7.094 |\n",
"| ViT-B/32 | 64 | 8.07 | 6.212 | 14.282 |\n"
]
}
]
}
]
}
================================================
FILE: examples/dev/clip_onnx_benchmark_gpu.ipynb
================================================
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "clip-onnx-benchmark-gpu.ipynb",
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"## Restart colab session after installation\n",
"Reload the session if something doesn't work"
],
"metadata": {
"id": "fxPg_VvZuScV"
}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "al_QNjyFq6Jj"
},
"outputs": [],
"source": [
"%%capture\n",
"!pip install git+https://github.com/Lednik7/CLIP-ONNX.git\n",
"!pip install git+https://github.com/openai/CLIP.git\n",
"!pip install onnxruntime-gpu"
]
},
{
"cell_type": "code",
"source": [
"%%capture\n",
"!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true"
],
"metadata": {
"id": "42eeJz9lTdJ6"
},
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"source": [
"!nvidia-smi"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "XuauIZIBSEUX",
"outputId": "7e2b352b-751e-439e-bb3d-4e1323e2e44d"
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Thu Jan 6 15:47:04 2022 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 495.44 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 34C P8 28W / 149W | 0MiB / 11441MiB | 0% Default |\n",
"| | | N/A |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import onnxruntime\n",
"print(onnxruntime.get_device())"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gqvxpdajRX5_",
"outputId": "7c44b4e1-d916-42d9-cc61-52efdf0fa9a9"
},
"execution_count": 20,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"GPU\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## GPU inference mode\n",
"Select a runtime GPU to continue:\n",
"\n",
"Click Runtime -> Change Runtime Type -> switch \"Harware accelerator\" to be GPU. Save it, and you maybe connect to GPU"
],
"metadata": {
"id": "010k-ksVTjAu"
}
},
{
"cell_type": "markdown",
"source": [
"### Torch CLIP"
],
"metadata": {
"id": "KdTz0IJWVBqE"
}
},
{
"cell_type": "code",
"source": [
"import clip\n",
"from PIL import Image\n",
"import numpy as np\n",
"\n",
"# onnx cannot work with cuda\n",
"model, preprocess = clip.load(\"ViT-B/32\", device=\"cpu\", jit=False)\n",
"\n",
"# batch first\n",
"image = preprocess(Image.open(\"CLIP.png\")).unsqueeze(0) # [1, 3, 224, 224]\n",
"image_onnx = image.detach().cpu().numpy().astype(np.float32)\n",
"\n",
"# batch first\n",
"text = clip.tokenize([\"a diagram\", \"a dog\", \"a cat\"]) # [3, 77]\n",
"text_onnx = text.detach().cpu().numpy().astype(np.int64)"
],
"metadata": {
"id": "9ROPwKYurOhP"
},
"execution_count": 1,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### CLIP-ONNX"
],
"metadata": {
"id": "Ao2MriaVVG6Y"
}
},
{
"cell_type": "code",
"source": [
"from clip_onnx import clip_onnx, attention\n",
"clip.model.ResidualAttentionBlock.attention = attention\n",
"\n",
"onnx_model = clip_onnx(model)\n",
"onnx_model.convert2onnx(image, text, verbose=False)\n",
"# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']\n",
"onnx_model.start_sessions(providers=[\"CPUExecutionProvider\"]) # GPU mode"
],
"metadata": {
"id": "nSeG9uAZrcph",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "25e07d68-6ef2-44c4-d144-c43b611f3316"
},
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.7/dist-packages/clip_onnx/utils.py:40: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').\n",
" head_dim = q.shape[2] // num_heads\n",
"/usr/local/lib/python3.7/dist-packages/torch/onnx/symbolic_helper.py:716: UserWarning: allowzero=0 by default. In order to honor zero value in shape use allowzero=1\n",
" warnings.warn(\"allowzero=0 by default. In order to honor zero value in shape use allowzero=1\")\n",
"/usr/local/lib/python3.7/dist-packages/torch/onnx/symbolic_opset9.py:2819: UserWarning: Exporting aten::index operator of advanced indexing in opset 14 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.\n",
" \"If indices include negative values, the exported graph will produce incorrect results.\")\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"from clip_onnx import clip_onnx, attention\n",
"clip.model.ResidualAttentionBlock.attention = attention"
],
"metadata": {
"id": "imMVbHFO-KSH"
},
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"source": [
"onnx_model = clip_onnx(model)\n",
"onnx_model.load_onnx(\"/content/clip_visual.onnx\",\n",
" \"/content/clip_textual.onnx\",\n",
" model.logit_scale.exp())\n",
"onnx_model.start_sessions(providers=[\"CUDAExecutionProvider\"]) # GPU mode"
],
"metadata": {
"id": "PsDS7ty79zZf"
},
"execution_count": 3,
"outputs": []
},
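{
"cell_type": "code",
"source": [
"# Sanity check (illustrative sketch, not part of the original run):\n",
"# encode the example inputs with the CUDA-backed sessions loaded above,\n",
"# using the encode_image/encode_text methods exercised in the benchmark.\n",
"image_features = onnx_model.encode_image(image_onnx)\n",
"text_features = onnx_model.encode_text(text_onnx)"
],
"metadata": {
"id": "sanity_check_sessions"
},
"execution_count": null,
"outputs": []
},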
{
"cell_type": "code",
"source": [
"onnx_model.visual_session.get_providers()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "aZsGJNrbNCYe",
"outputId": "9dcdd2d6-2a73-4dad-9ea7-c2892273c631"
},
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['CUDAExecutionProvider', 'CPUExecutionProvider']"
]
},
"metadata": {},
"execution_count": 4
}
]
},
{
"cell_type": "markdown",
"source": [
"## Benchmark"
],
"metadata": {
"id": "J5IcOG_6jAFz"
}
},
{
"cell_type": "code",
"source": [
"model, preprocess = clip.load(\"ViT-B/32\", device=\"cuda\", jit=False)"
],
"metadata": {
"id": "SJ_5_x7vLepK"
},
"execution_count": 5,
"outputs": []
},
{
"cell_type": "code",
"source": [
"model.eval()\n",
"for x in model.parameters():\n",
" x.requires_grad = False"
],
"metadata": {
"id": "OnOzZ3LMuubW"
},
"execution_count": 6,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import numpy, random, torch"
],
"metadata": {
"id": "wDwqRRrTGKUS"
},
"execution_count": 7,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def set_seed():\n",
" torch.manual_seed(12)\n",
" torch.cuda.manual_seed(12)\n",
" np.random.seed(12)\n",
" random.seed(12)\n",
"\n",
" torch.backends.cudnn.deterministic=True"
],
"metadata": {
"id": "9H17n_6gGJgT"
},
"execution_count": 8,
"outputs": []
},
{
"cell_type": "code",
"source": [
"%timeit onnx_model.encode_image(image_onnx)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "IsJ2TsBRNh8f",
"outputId": "bb642ee7-0112-4195-be35-14fdf719e7bc"
},
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The slowest run took 23.27 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
"1 loop, best of 5: 20.1 ms per loop\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import torch\n",
"import time\n",
"\n",
"n = 5\n",
"clip_results = {\"encode_image\": [],\n",
" \"encode_text\": []}\n",
"onnx_results = {\"encode_image\": [],\n",
" \"encode_text\": []}\n",
"for batch in [2, 8, 16, 32, 64]:\n",
" set_seed()\n",
" image_input = torch.randint(1, 255, (batch, 3, 224, 224)).cuda()\n",
" text_input = torch.randint(320, 49407, (batch, 77)).cuda()\n",
" image_input_onnx = image_input.detach().cpu().numpy().astype(np.float32)\n",
" text_input_onnx = text_input.detach().cpu().numpy().astype(np.int64)\n",
"\n",
" t_mean = []\n",
" for _ in range(n):\n",
" t = time.time()\n",
" onnx_model.encode_image(image_input_onnx)\n",
" t_mean.append(time.time() - t)\n",
" print(\"onnx\", batch, \"encode_image\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" onnx_results[\"encode_image\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" with torch.inference_mode():\n",
" t_mean = []\n",
" for _ in range(n):\n",
" t = time.time()\n",
" model.encode_image(image_input)\n",
" t_mean.append(time.time() - t)\n",
" print(\"torch\", batch, \"encode_image\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" clip_results[\"encode_image\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" t_mean = []\n",
" for _ in range(n):\n",
" t = time.time()\n",
" onnx_model.encode_text(text_input_onnx)\n",
" t_mean.append(time.time() - t)\n",
" print(\"onnx\", batch, \"encode_text\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" onnx_results[\"encode_text\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" with torch.inference_mode():\n",
" t_mean = []\n",
" for _ in range(n):\n",
" t = time.time()\n",
" model.encode_text(text_input)\n",
" t_mean.append(time.time() - t)\n",
" print(\"torch\", batch, \"encode_text\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" clip_results[\"encode_text\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" print(\"-\" * 78)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4lFL6tzWjiWL",
"outputId": "a209b78a-fe78-4b46-9220-4b9624a1568f"
},
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"onnx 2 encode_image 0.073\n",
"torch 2 encode_image 0.041\n",
"onnx 2 encode_text 0.032\n",
"torch 2 encode_text 0.033\n",
"------------------------------------------------------------------------------\n",
"onnx 8 encode_image 0.088\n",
"torch 8 encode_image 0.128\n",
"onnx 8 encode_text 0.052\n",
"torch 8 encode_text 0.102\n",
"------------------------------------------------------------------------------\n",
"onnx 16 encode_image 0.123\n",
"torch 16 encode_image 0.258\n",
"onnx 16 encode_text 0.08\n",
"torch 16 encode_text 0.201\n",
"------------------------------------------------------------------------------\n",
"onnx 32 encode_image 0.196\n",
"torch 32 encode_image 0.505\n",
"onnx 32 encode_text 0.138\n",
"torch 32 encode_text 0.386\n",
"------------------------------------------------------------------------------\n",
"onnx 64 encode_image 0.352\n",
"torch 64 encode_image 0.995\n",
"onnx 64 encode_text 0.252\n",
"torch 64 encode_text 0.754\n",
"------------------------------------------------------------------------------\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import pandas as pd"
],
"metadata": {
"id": "P2YhbE9v_4ci"
},
"execution_count": 11,
"outputs": []
},
{
"cell_type": "code",
"source": [
"pd.DataFrame({\"backend\": [\"onnx\", \"torch\"] * 5,\n",
" \"batch\": [2, 2, 8, 8, 16, 16, 32, 32, 64, 64],\n",
" \"encode_image\": [j[1] for i in zip(onnx_results[\"encode_image\"],\n",
" clip_results[\"encode_image\"]) for j in i],\n",
" \"encode_text\": [j[1] for i in zip(onnx_results[\"encode_text\"],\n",
" clip_results[\"encode_text\"]) for j in i]})"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 362
},
"id": "WfZfDk4PAlqm",
"outputId": "aa180c38-35f8-403c-a172-4e78266510d5"
},
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"\n",
" <div id=\"df-1af743e1-b1a7-4998-8692-878c02e1ea97\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>backend</th>\n",
" <th>batch</th>\n",
" <th>encode_image</th>\n",
" <th>encode_text</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>onnx</td>\n",
" <td>2</td>\n",
" <td>0.073</td>\n",
" <td>0.032</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>torch</td>\n",
" <td>2</td>\n",
" <td>0.041</td>\n",
" <td>0.033</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>onnx</td>\n",
" <td>8</td>\n",
" <td>0.088</td>\n",
" <td>0.052</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>torch</td>\n",
" <td>8</td>\n",
" <td>0.128</td>\n",
" <td>0.102</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>onnx</td>\n",
" <td>16</td>\n",
" <td>0.123</td>\n",
" <td>0.080</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>torch</td>\n",
" <td>16</td>\n",
" <td>0.258</td>\n",
" <td>0.201</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>onnx</td>\n",
" <td>32</td>\n",
" <td>0.196</td>\n",
" <td>0.138</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>torch</td>\n",
" <td>32</td>\n",
" <td>0.505</td>\n",
" <td>0.386</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>onnx</td>\n",
" <td>64</td>\n",
" <td>0.352</td>\n",
" <td>0.252</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>torch</td>\n",
" <td>64</td>\n",
" <td>0.995</td>\n",
" <td>0.754</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-1af743e1-b1a7-4998-8692-878c02e1ea97')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-1af743e1-b1a7-4998-8692-878c02e1ea97 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-1af743e1-b1a7-4998-8692-878c02e1ea97');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
],
"text/plain": [
" backend batch encode_image encode_text\n",
"0 onnx 2 0.073 0.032\n",
"1 torch 2 0.041 0.033\n",
"2 onnx 8 0.088 0.052\n",
"3 torch 8 0.128 0.102\n",
"4 onnx 16 0.123 0.080\n",
"5 torch 16 0.258 0.201\n",
"6 onnx 32 0.196 0.138\n",
"7 torch 32 0.505 0.386\n",
"8 onnx 64 0.352 0.252\n",
"9 torch 64 0.995 0.754"
]
},
"metadata": {},
"execution_count": 12
}
]
},
{
"cell_type": "code",
"source": [
"onnx_df = pd.DataFrame({\"ONNX\": [\"ViT-B/32\"] * 5,\n",
" \"batch\": [2, 8, 16, 32, 64],\n",
" \"encode_image\": [i[1] for i in onnx_results[\"encode_image\"]],\n",
" \"encode_text\": [i[1] for i in onnx_results[\"encode_text\"]]})\n",
"onnx_df[\"summary\"] = onnx_df[\"encode_image\"] + onnx_df[\"encode_text\"]"
],
"metadata": {
"id": "Xpw9lV7yBbA8"
},
"execution_count": 13,
"outputs": []
},
{
"cell_type": "code",
"source": [
"onnx_df"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "LItAyQkeDhnQ",
"outputId": "ebd84ad1-f305-4578-9164-2884aaa2b245"
},
"execution_count": 14,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"\n",
" <div id=\"df-37ded07e-3c67-4ace-b569-8fa6f4dd5a44\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ONNX</th>\n",
" <th>batch</th>\n",
" <th>encode_image</th>\n",
" <th>encode_text</th>\n",
" <th>summary</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ViT-B/32</td>\n",
" <td>2</td>\n",
" <td>0.073</td>\n",
" <td>0.032</td>\n",
" <td>0.105</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ViT-B/32</td>\n",
" <td>8</td>\n",
" <td>0.088</td>\n",
" <td>0.052</td>\n",
" <td>0.140</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>ViT-B/32</td>\n",
" <td>16</td>\n",
" <td>0.123</td>\n",
" <td>0.080</td>\n",
" <td>0.203</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ViT-B/32</td>\n",
" <td>32</td>\n",
" <td>0.196</td>\n",
" <td>0.138</td>\n",
" <td>0.334</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>ViT-B/32</td>\n",
" <td>64</td>\n",
" <td>0.352</td>\n",
" <td>0.252</td>\n",
" <td>0.604</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-37ded07e-3c67-4ace-b569-8fa6f4dd5a44')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-37ded07e-3c67-4ace-b569-8fa6f4dd5a44 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-37ded07e-3c67-4ace-b569-8fa6f4dd5a44');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
],
"text/plain": [
" ONNX batch encode_image encode_text summary\n",
"0 ViT-B/32 2 0.073 0.032 0.105\n",
"1 ViT-B/32 8 0.088 0.052 0.140\n",
"2 ViT-B/32 16 0.123 0.080 0.203\n",
"3 ViT-B/32 32 0.196 0.138 0.334\n",
"4 ViT-B/32 64 0.352 0.252 0.604"
]
},
"metadata": {},
"execution_count": 14
}
]
},
{
"cell_type": "code",
"source": [
"print(onnx_df.to_markdown(index=False))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "AIQDA9FaJZ7Y",
"outputId": "4fdfd92a-5c8c-43d9-e875-7bcddc882113"
},
"execution_count": 15,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"| ONNX | batch | encode_image | encode_text | summary |\n",
"|:---------|--------:|---------------:|--------------:|----------:|\n",
"| ViT-B/32 | 2 | 0.073 | 0.032 | 0.105 |\n",
"| ViT-B/32 | 8 | 0.088 | 0.052 | 0.14 |\n",
"| ViT-B/32 | 16 | 0.123 | 0.08 | 0.203 |\n",
"| ViT-B/32 | 32 | 0.196 | 0.138 | 0.334 |\n",
"| ViT-B/32 | 64 | 0.352 | 0.252 | 0.604 |\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"clip_df = pd.DataFrame({\"TORCH\": [\"ViT-B/32\"] * 5,\n",
" \"batch\": [2, 8, 16, 32, 64],\n",
" \"encode_image\": [i[1] for i in clip_results[\"encode_image\"]],\n",
" \"encode_text\": [i[1] for i in clip_results[\"encode_text\"]]})\n",
"clip_df[\"summary\"] = clip_df[\"encode_image\"] + clip_df[\"encode_text\"]"
],
"metadata": {
"id": "E1OXQUDvDZmI"
},
"execution_count": 16,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(clip_df.to_markdown(index=False))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xAj-ynhCDpPO",
"outputId": "6a36903d-6bba-4675-8eb3-7f58af98e165"
},
"execution_count": 17,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"| TORCH | batch | encode_image | encode_text | summary |\n",
"|:---------|--------:|---------------:|--------------:|----------:|\n",
"| ViT-B/32 | 2 | 0.041 | 0.033 | 0.074 |\n",
"| ViT-B/32 | 8 | 0.128 | 0.102 | 0.23 |\n",
"| ViT-B/32 | 16 | 0.258 | 0.201 | 0.459 |\n",
"| ViT-B/32 | 32 | 0.505 | 0.386 | 0.891 |\n",
"| ViT-B/32 | 64 | 0.995 | 0.754 | 1.749 |\n"
]
}
]
}
]
}
================================================
FILE: examples/dev/clip_onnx_benchmark_gpu_K80.ipynb
================================================
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "clip-onnx-benchmark-gpu-K80.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyOXxz4T8v9RCW/JZlRRUtl4",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Lednik7/CLIP-ONNX/blob/dev/examples/dev/clip_onnx_benchmark_gpu_K80.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"## Restart colab session after installation\n",
"Reload the session if something doesn't work"
],
"metadata": {
"id": "fxPg_VvZuScV"
}
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "al_QNjyFq6Jj"
},
"outputs": [],
"source": [
"%%capture\n",
"!pip install git+https://github.com/Lednik7/CLIP-ONNX.git@dev\n",
"!pip install git+https://github.com/openai/CLIP.git\n",
"!pip install onnxruntime-gpu"
]
},
{
"cell_type": "code",
"source": [
"%%capture\n",
"!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true"
],
"metadata": {
"id": "42eeJz9lTdJ6"
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "code",
"source": [
"!nvidia-smi"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "XuauIZIBSEUX",
"outputId": "3bfb5833-272d-4aa0-f296-edab8122547c"
},
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Tue May 3 07:20:58 2022 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 56C P8 29W / 149W | 0MiB / 11441MiB | 0% Default |\n",
"| | | N/A |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import onnxruntime\n",
"print(onnxruntime.get_device())"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gqvxpdajRX5_",
"outputId": "bb8e9195-fe9c-421c-e27b-d76da7136b82"
},
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"GPU\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## GPU inference mode\n",
"Select a runtime GPU to continue:\n",
"\n",
"Click Runtime -> Change Runtime Type -> switch \"Harware accelerator\" to be GPU. Save it, and you maybe connect to GPU"
],
"metadata": {
"id": "010k-ksVTjAu"
}
},
{
"cell_type": "markdown",
"source": [
"### Torch CLIP"
],
"metadata": {
"id": "KdTz0IJWVBqE"
}
},
{
"cell_type": "code",
"source": [
"import clip\n",
"from PIL import Image\n",
"import numpy as np\n",
"\n",
"# onnx cannot work with cuda\n",
"model, preprocess = clip.load(\"ViT-B/32\", device=\"cpu\", jit=False)\n",
"\n",
"# batch first\n",
"image = preprocess(Image.open(\"CLIP.png\")).unsqueeze(0) # [1, 3, 224, 224]\n",
"image_onnx = image.detach().cpu().numpy().astype(np.float32)\n",
"\n",
"# batch first\n",
"text = clip.tokenize([\"a diagram\", \"a dog\", \"a cat\"]) # [3, 77]\n",
"text_onnx = text.detach().cpu().numpy().astype(np.int32)"
],
"metadata": {
"id": "9ROPwKYurOhP"
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### CLIP-ONNX"
],
"metadata": {
"id": "Ao2MriaVVG6Y"
}
},
{
"cell_type": "code",
"source": [
"from clip_onnx import clip_onnx\n",
"from clip_onnx.utils import DEFAULT_EXPORT\n",
"\n",
"DEFAULT_EXPORT[\"opset_version\"] = 15\n",
"\n",
"onnx_model = clip_onnx(model)\n",
"onnx_model.convert2onnx(image, text, verbose=True)\n",
"# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']\n",
"onnx_model.start_sessions(providers=[\"CPUExecutionProvider\"]) # GPU mode"
],
"metadata": {
"id": "nSeG9uAZrcph",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "1d4a8404-104f-4107-f2c4-e7e1f7b1d104"
},
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[CLIP ONNX] Start convert visual model\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.7/dist-packages/torch/onnx/symbolic_helper.py:719: UserWarning: allowzero=0 by default. In order to honor zero value in shape use allowzero=1\n",
" warnings.warn(\"allowzero=0 by default. In order to honor zero value in shape use allowzero=1\")\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"[CLIP ONNX] Start check visual model\n",
"[CLIP ONNX] Start convert textual model\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.7/dist-packages/torch/onnx/symbolic_opset9.py:2909: UserWarning: Exporting aten::index operator of advanced indexing in opset 15 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.\n",
" \"If indices include negative values, the exported graph will produce incorrect results.\")\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"[CLIP ONNX] Start check textual model\n",
"[CLIP ONNX] Models converts successfully\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"onnx_model = clip_onnx(model)\n",
"onnx_model.load_onnx(\"/content/clip_visual.onnx\",\n",
" \"/content/clip_textual.onnx\",\n",
" model.logit_scale.exp())\n",
"onnx_model.start_sessions(providers=[\"CUDAExecutionProvider\"]) # GPU mode"
],
"metadata": {
"id": "PsDS7ty79zZf"
},
"execution_count": 6,
"outputs": []
},
{
"cell_type": "code",
"source": [
"onnx_model.visual_session.get_providers()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "aZsGJNrbNCYe",
"outputId": "b0ee40a7-2ece-4e88-9e35-9ed0a735c533"
},
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['CUDAExecutionProvider', 'CPUExecutionProvider']"
]
},
"metadata": {},
"execution_count": 7
}
]
},
{
"cell_type": "markdown",
"source": [
"## Benchmark"
],
"metadata": {
"id": "J5IcOG_6jAFz"
}
},
{
"cell_type": "code",
"source": [
"model, preprocess = clip.load(\"ViT-B/32\", device=\"cuda\", jit=False)"
],
"metadata": {
"id": "SJ_5_x7vLepK"
},
"execution_count": 8,
"outputs": []
},
{
"cell_type": "code",
"source": [
"model.eval()\n",
"for x in model.parameters():\n",
" x.requires_grad = False"
],
"metadata": {
"id": "OnOzZ3LMuubW"
},
"execution_count": 9,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import numpy, random, torch"
],
"metadata": {
"id": "wDwqRRrTGKUS"
},
"execution_count": 10,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def set_seed():\n",
" torch.manual_seed(12)\n",
" torch.cuda.manual_seed(12)\n",
" np.random.seed(12)\n",
" random.seed(12)\n",
"\n",
" torch.backends.cudnn.deterministic=True"
],
"metadata": {
"id": "9H17n_6gGJgT"
},
"execution_count": 11,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import torch\n",
"import time\n",
"\n",
"n = 5\n",
"clip_results = {\"encode_image\": [],\n",
" \"encode_text\": []}\n",
"onnx_results = {\"encode_image\": [],\n",
" \"encode_text\": []}\n",
"for batch in [2, 8, 16, 32, 64]:\n",
" set_seed()\n",
" t_mean = []\n",
" for _ in range(n):\n",
" image_input = torch.randint(1, 255, (batch, 3, 224, 224))\n",
" image_input_onnx = image_input.detach().cpu().numpy().astype(np.float32)\n",
" t = time.time()\n",
" onnx_model.encode_image(image_input_onnx)\n",
" t_mean.append(time.time() - t)\n",
" print(\"onnx\", batch, \"encode_image\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" onnx_results[\"encode_image\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" set_seed()\n",
" with torch.inference_mode():\n",
" t_mean = []\n",
" for _ in range(n):\n",
" image_input = torch.randint(1, 255, (batch, 3, 224, 224)).cuda()\n",
" t = time.time()\n",
" model.encode_image(image_input)\n",
" t_mean.append(time.time() - t)\n",
" print(\"torch\", batch, \"encode_image\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" clip_results[\"encode_image\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" set_seed()\n",
" t_mean = []\n",
" for _ in range(n):\n",
" text_input = torch.randint(320, 49407, (batch, 77))\n",
" text_input_onnx = text_input.detach().cpu().numpy().astype(np.int32)\n",
" t = time.time()\n",
" onnx_model.encode_text(text_input_onnx)\n",
" t_mean.append(time.time() - t)\n",
" print(\"onnx\", batch, \"encode_text\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" onnx_results[\"encode_text\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" set_seed()\n",
" with torch.inference_mode():\n",
" t_mean = []\n",
" for _ in range(n):\n",
" text_input = torch.randint(320, 49407, (batch, 77)).cuda()\n",
" t = time.time()\n",
" model.encode_text(text_input)\n",
" t_mean.append(time.time() - t)\n",
" print(\"torch\", batch, \"encode_text\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" clip_results[\"encode_text\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" print(\"-\" * 78)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4lFL6tzWjiWL",
"outputId": "ccaa7e0a-96f3-4a51-c4bd-c442aa13763c"
},
"execution_count": 12,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"onnx 2 encode_image 0.136\n",
"torch 2 encode_image 0.02\n",
"onnx 2 encode_text 0.021\n",
"torch 2 encode_text 0.035\n",
"------------------------------------------------------------------------------\n",
"onnx 8 encode_image 0.054\n",
"torch 8 encode_image 0.081\n",
"onnx 8 encode_text 0.04\n",
"torch 8 encode_text 0.098\n",
"------------------------------------------------------------------------------\n",
"onnx 16 encode_image 0.089\n",
"torch 16 encode_image 0.207\n",
"onnx 16 encode_text 0.071\n",
"torch 16 encode_text 0.196\n",
"------------------------------------------------------------------------------\n",
"onnx 32 encode_image 0.158\n",
"torch 32 encode_image 0.44\n",
"onnx 32 encode_text 0.134\n",
"torch 32 encode_text 0.374\n",
"------------------------------------------------------------------------------\n",
"onnx 64 encode_image 0.325\n",
"torch 64 encode_image 0.919\n",
"onnx 64 encode_text 0.258\n",
"torch 64 encode_text 0.719\n",
"------------------------------------------------------------------------------\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import pandas as pd"
],
"metadata": {
"id": "P2YhbE9v_4ci"
},
"execution_count": 13,
"outputs": []
},
{
"cell_type": "code",
"source": [
"pd.DataFrame({\"backend\": [\"onnx\", \"torch\"] * 5,\n",
" \"batch\": [2, 2, 8, 8, 16, 16, 32, 32, 64, 64],\n",
" \"encode_image\": [j[1] for i in zip(onnx_results[\"encode_image\"],\n",
" clip_results[\"encode_image\"]) for j in i],\n",
" \"encode_text\": [j[1] for i in zip(onnx_results[\"encode_text\"],\n",
" clip_results[\"encode_text\"]) for j in i]})"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 362
},
"id": "WfZfDk4PAlqm",
"outputId": "78a5cae8-68ee-4edd-f34d-ccf7d3d8a23b"
},
"execution_count": 14,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" backend batch encode_image encode_text\n",
"0 onnx 2 0.136 0.021\n",
"1 torch 2 0.020 0.035\n",
"2 onnx 8 0.054 0.040\n",
"3 torch 8 0.081 0.098\n",
"4 onnx 16 0.089 0.071\n",
"5 torch 16 0.207 0.196\n",
"6 onnx 32 0.158 0.134\n",
"7 torch 32 0.440 0.374\n",
"8 onnx 64 0.325 0.258\n",
"9 torch 64 0.919 0.719"
],
"text/html": [
"\n",
" <div id=\"df-253653f6-3c54-446c-8c64-9345630eaf7b\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>backend</th>\n",
" <th>batch</th>\n",
" <th>encode_image</th>\n",
" <th>encode_text</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>onnx</td>\n",
" <td>2</td>\n",
" <td>0.136</td>\n",
" <td>0.021</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>torch</td>\n",
" <td>2</td>\n",
" <td>0.020</td>\n",
" <td>0.035</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>onnx</td>\n",
" <td>8</td>\n",
" <td>0.054</td>\n",
" <td>0.040</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>torch</td>\n",
" <td>8</td>\n",
" <td>0.081</td>\n",
" <td>0.098</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>onnx</td>\n",
" <td>16</td>\n",
" <td>0.089</td>\n",
" <td>0.071</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>torch</td>\n",
" <td>16</td>\n",
" <td>0.207</td>\n",
" <td>0.196</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>onnx</td>\n",
" <td>32</td>\n",
" <td>0.158</td>\n",
" <td>0.134</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>torch</td>\n",
" <td>32</td>\n",
" <td>0.440</td>\n",
" <td>0.374</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>onnx</td>\n",
" <td>64</td>\n",
" <td>0.325</td>\n",
" <td>0.258</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>torch</td>\n",
" <td>64</td>\n",
" <td>0.919</td>\n",
" <td>0.719</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-253653f6-3c54-446c-8c64-9345630eaf7b')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-253653f6-3c54-446c-8c64-9345630eaf7b button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-253653f6-3c54-446c-8c64-9345630eaf7b');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
]
},
"metadata": {},
"execution_count": 14
}
]
},
{
"cell_type": "code",
"source": [
"onnx_df = pd.DataFrame({\"ONNX\": [\"ViT-B/32\"] * 5,\n",
" \"batch\": [2, 8, 16, 32, 64],\n",
" \"encode_image\": [i[1] for i in onnx_results[\"encode_image\"]],\n",
" \"encode_text\": [i[1] for i in onnx_results[\"encode_text\"]]})\n",
"onnx_df[\"total\"] = onnx_df[\"encode_image\"] + onnx_df[\"encode_text\"]"
],
"metadata": {
"id": "Xpw9lV7yBbA8"
},
"execution_count": 15,
"outputs": []
},
{
"cell_type": "code",
"source": [
"onnx_df"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "LItAyQkeDhnQ",
"outputId": "f9c1860c-e405-4d41-e530-d2b0027f1fd0"
},
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" ONNX batch encode_image encode_text total\n",
"0 ViT-B/32 2 0.136 0.021 0.157\n",
"1 ViT-B/32 8 0.054 0.040 0.094\n",
"2 ViT-B/32 16 0.089 0.071 0.160\n",
"3 ViT-B/32 32 0.158 0.134 0.292\n",
"4 ViT-B/32 64 0.325 0.258 0.583"
],
"text/html": [
"\n",
" <div id=\"df-fee38102-dd90-4015-a566-69309cf3ae5f\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ONNX</th>\n",
" <th>batch</th>\n",
" <th>encode_image</th>\n",
" <th>encode_text</th>\n",
" <th>total</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ViT-B/32</td>\n",
" <td>2</td>\n",
" <td>0.136</td>\n",
" <td>0.021</td>\n",
" <td>0.157</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ViT-B/32</td>\n",
" <td>8</td>\n",
" <td>0.054</td>\n",
" <td>0.040</td>\n",
" <td>0.094</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>ViT-B/32</td>\n",
" <td>16</td>\n",
" <td>0.089</td>\n",
" <td>0.071</td>\n",
" <td>0.160</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ViT-B/32</td>\n",
" <td>32</td>\n",
" <td>0.158</td>\n",
" <td>0.134</td>\n",
" <td>0.292</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>ViT-B/32</td>\n",
" <td>64</td>\n",
" <td>0.325</td>\n",
" <td>0.258</td>\n",
" <td>0.583</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-fee38102-dd90-4015-a566-69309cf3ae5f')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-fee38102-dd90-4015-a566-69309cf3ae5f button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-fee38102-dd90-4015-a566-69309cf3ae5f');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
]
},
"metadata": {},
"execution_count": 16
}
]
},
{
"cell_type": "code",
"source": [
"print(onnx_df.to_markdown(index=False))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "AIQDA9FaJZ7Y",
"outputId": "36aa68bb-8ebb-47de-d2b4-b8ce36cacfd7"
},
"execution_count": 17,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"| ONNX | batch | encode_image | encode_text | total |\n",
"|:---------|--------:|---------------:|--------------:|--------:|\n",
"| ViT-B/32 | 2 | 0.136 | 0.021 | 0.157 |\n",
"| ViT-B/32 | 8 | 0.054 | 0.04 | 0.094 |\n",
"| ViT-B/32 | 16 | 0.089 | 0.071 | 0.16 |\n",
"| ViT-B/32 | 32 | 0.158 | 0.134 | 0.292 |\n",
"| ViT-B/32 | 64 | 0.325 | 0.258 | 0.583 |\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"clip_df = pd.DataFrame({\"TORCH\": [\"ViT-B/32\"] * 5,\n",
" \"batch\": [2, 8, 16, 32, 64],\n",
" \"encode_image\": [i[1] for i in clip_results[\"encode_image\"]],\n",
" \"encode_text\": [i[1] for i in clip_results[\"encode_text\"]]})\n",
"clip_df[\"total\"] = clip_df[\"encode_image\"] + clip_df[\"encode_text\"]"
],
"metadata": {
"id": "E1OXQUDvDZmI"
},
"execution_count": 18,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(clip_df.to_markdown(index=False))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xAj-ynhCDpPO",
"outputId": "6f31dab3-8b2a-4b64-ed97-2ac309d6d749"
},
"execution_count": 19,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"| TORCH | batch | encode_image | encode_text | total |\n",
"|:---------|--------:|---------------:|--------------:|--------:|\n",
"| ViT-B/32 | 2 | 0.02 | 0.035 | 0.055 |\n",
"| ViT-B/32 | 8 | 0.081 | 0.098 | 0.179 |\n",
"| ViT-B/32 | 16 | 0.207 | 0.196 | 0.403 |\n",
"| ViT-B/32 | 32 | 0.44 | 0.374 | 0.814 |\n",
"| ViT-B/32 | 64 | 0.919 | 0.719 | 1.638 |\n"
]
}
]
}
]
}
================================================
FILE: examples/dev/clip_onnx_benchmark_gpu_T4.ipynb
================================================
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "clip-onnx-benchmark-gpu-T4.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyNqeHpYdbkhiqZatysOn5ch",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Lednik7/CLIP-ONNX/blob/dev/examples/dev/clip_onnx_benchmark_gpu_T4.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"## Restart colab session after installation\n",
"Reload the session if something doesn't work"
],
"metadata": {
"id": "fxPg_VvZuScV"
}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "al_QNjyFq6Jj"
},
"outputs": [],
"source": [
"%%capture\n",
"!pip install git+https://github.com/Lednik7/CLIP-ONNX.git@dev\n",
"!pip install git+https://github.com/openai/CLIP.git\n",
"!pip install onnxruntime-gpu"
]
},
{
"cell_type": "code",
"source": [
"%%capture\n",
"!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true"
],
"metadata": {
"id": "42eeJz9lTdJ6"
},
"execution_count": 1,
"outputs": []
},
{
"cell_type": "code",
"source": [
"!nvidia-smi"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "XuauIZIBSEUX",
"outputId": "3e459c2c-8f31-4aff-c288-f2e6c4684e36"
},
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Tue May 3 07:10:09 2022 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 38C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |\n",
"| | | N/A |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import onnxruntime\n",
"print(onnxruntime.get_device())"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gqvxpdajRX5_",
"outputId": "48a89abb-a326-4563-f99a-40c7d25145af"
},
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"GPU\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## GPU inference mode\n",
"Select a runtime GPU to continue:\n",
"\n",
"Click Runtime -> Change Runtime Type -> switch \"Harware accelerator\" to be GPU. Save it, and you maybe connect to GPU"
],
"metadata": {
"id": "010k-ksVTjAu"
}
},
{
"cell_type": "markdown",
"source": [
"### Torch CLIP"
],
"metadata": {
"id": "KdTz0IJWVBqE"
}
},
{
"cell_type": "code",
"source": [
"import clip\n",
"from PIL import Image\n",
"import numpy as np\n",
"\n",
"# onnx cannot work with cuda\n",
"model, preprocess = clip.load(\"ViT-B/32\", device=\"cpu\", jit=False)\n",
"\n",
"# batch first\n",
"image = preprocess(Image.open(\"CLIP.png\")).unsqueeze(0) # [1, 3, 224, 224]\n",
"image_onnx = image.detach().cpu().numpy().astype(np.float32)\n",
"\n",
"# batch first\n",
"text = clip.tokenize([\"a diagram\", \"a dog\", \"a cat\"]) # [3, 77]\n",
"text_onnx = text.detach().cpu().numpy().astype(np.int32)"
],
"metadata": {
"id": "9ROPwKYurOhP"
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### CLIP-ONNX"
],
"metadata": {
"id": "Ao2MriaVVG6Y"
}
},
{
"cell_type": "code",
"source": [
"from clip_onnx import clip_onnx\n",
"\n",
"onnx_model = clip_onnx(model)\n",
"onnx_model.convert2onnx(image, text, verbose=True)\n",
"# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']\n",
"onnx_model.start_sessions(providers=[\"CPUExecutionProvider\"]) # GPU mode"
],
"metadata": {
"id": "nSeG9uAZrcph",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "1186f909-6cfb-400b-c2d9-3dddc93d318b"
},
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[CLIP ONNX] Start convert visual model\n",
"[CLIP ONNX] Start check visual model\n",
"[CLIP ONNX] Start convert textual model\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.7/dist-packages/torch/onnx/symbolic_opset9.py:2909: UserWarning: Exporting aten::index operator of advanced indexing in opset 12 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.\n",
" \"If indices include negative values, the exported graph will produce incorrect results.\")\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"[CLIP ONNX] Start check textual model\n",
"[CLIP ONNX] Models converts successfully\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"onnx_model = clip_onnx(model)\n",
"onnx_model.load_onnx(\"/content/clip_visual.onnx\",\n",
" \"/content/clip_textual.onnx\",\n",
" model.logit_scale.exp())\n",
"onnx_model.start_sessions(providers=[\"CUDAExecutionProvider\"]) # GPU mode"
],
"metadata": {
"id": "PsDS7ty79zZf"
},
"execution_count": 5,
"outputs": []
},
{
"cell_type": "code",
"source": [
"onnx_model.visual_session.get_providers()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "aZsGJNrbNCYe",
"outputId": "05464d1a-7047-4efd-80fe-32870cf34afd"
},
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['CUDAExecutionProvider', 'CPUExecutionProvider']"
]
},
"metadata": {},
"execution_count": 6
}
]
},
{
"cell_type": "markdown",
"source": [
"## Benchmark"
],
"metadata": {
"id": "J5IcOG_6jAFz"
}
},
{
"cell_type": "code",
"source": [
"model, preprocess = clip.load(\"ViT-B/32\", device=\"cuda\", jit=False)"
],
"metadata": {
"id": "SJ_5_x7vLepK"
},
"execution_count": 7,
"outputs": []
},
{
"cell_type": "code",
"source": [
"model.eval()\n",
"for x in model.parameters():\n",
" x.requires_grad = False"
],
"metadata": {
"id": "OnOzZ3LMuubW"
},
"execution_count": 8,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import numpy, random, torch"
],
"metadata": {
"id": "wDwqRRrTGKUS"
},
"execution_count": 9,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def set_seed():\n",
" torch.manual_seed(12)\n",
" torch.cuda.manual_seed(12)\n",
" np.random.seed(12)\n",
" random.seed(12)\n",
"\n",
" torch.backends.cudnn.deterministic=True"
],
"metadata": {
"id": "9H17n_6gGJgT"
},
"execution_count": 10,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import torch\n",
"import time\n",
"\n",
"n = 5\n",
"clip_results = {\"encode_image\": [],\n",
" \"encode_text\": []}\n",
"onnx_results = {\"encode_image\": [],\n",
" \"encode_text\": []}\n",
"for batch in [2, 8, 16, 32, 64]:\n",
" set_seed()\n",
" t_mean = []\n",
" for _ in range(n):\n",
" image_input = torch.randint(1, 255, (batch, 3, 224, 224))\n",
" image_input_onnx = image_input.detach().cpu().numpy().astype(np.float32)\n",
" t = time.time()\n",
" onnx_model.encode_image(image_input_onnx)\n",
" t_mean.append(time.time() - t)\n",
" print(\"onnx\", batch, \"encode_image\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" onnx_results[\"encode_image\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" set_seed()\n",
" with torch.inference_mode():\n",
" t_mean = []\n",
" for _ in range(n):\n",
" image_input = torch.randint(1, 255, (batch, 3, 224, 224)).cuda()\n",
" t = time.time()\n",
" model.encode_image(image_input)\n",
" t_mean.append(time.time() - t)\n",
" print(\"torch\", batch, \"encode_image\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" clip_results[\"encode_image\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" set_seed()\n",
" t_mean = []\n",
" for _ in range(n):\n",
" text_input = torch.randint(320, 49407, (batch, 77))\n",
" text_input_onnx = text_input.detach().cpu().numpy().astype(np.int32)\n",
" t = time.time()\n",
" onnx_model.encode_text(text_input_onnx)\n",
" t_mean.append(time.time() - t)\n",
" print(\"onnx\", batch, \"encode_text\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" onnx_results[\"encode_text\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" set_seed()\n",
" with torch.inference_mode():\n",
" t_mean = []\n",
" for _ in range(n):\n",
" text_input = torch.randint(320, 49407, (batch, 77)).cuda()\n",
" t = time.time()\n",
" model.encode_text(text_input)\n",
" t_mean.append(time.time() - t)\n",
" print(\"torch\", batch, \"encode_text\", round(sum(t_mean) / n, 3))\n",
" torch.cuda.empty_cache()\n",
" clip_results[\"encode_text\"].append([batch, round(sum(t_mean) / n, 3)])\n",
"\n",
" print(\"-\" * 78)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4lFL6tzWjiWL",
"outputId": "c2b9f0e4-9b93-408b-96bf-3fdb3057e15b"
},
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"onnx 2 encode_image 0.155\n",
"torch 2 encode_image 0.017\n",
"onnx 2 encode_text 0.01\n",
"torch 2 encode_text 0.009\n",
"------------------------------------------------------------------------------\n",
"onnx 8 encode_image 0.032\n",
"torch 8 encode_image 0.008\n",
"onnx 8 encode_text 0.014\n",
"torch 8 encode_text 0.008\n",
"------------------------------------------------------------------------------\n",
"onnx 16 encode_image 0.037\n",
"torch 16 encode_image 0.009\n",
"onnx 16 encode_text 0.029\n",
"torch 16 encode_text 0.012\n",
"------------------------------------------------------------------------------\n",
"onnx 32 encode_image 0.076\n",
"torch 32 encode_image 0.008\n",
"onnx 32 encode_text 0.059\n",
"torch 32 encode_text 0.025\n",
"------------------------------------------------------------------------------\n",
"onnx 64 encode_image 0.169\n",
"torch 64 encode_image 0.009\n",
"onnx 64 encode_text 0.117\n",
"torch 64 encode_text 0.049\n",
"------------------------------------------------------------------------------\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import pandas as pd"
],
"metadata": {
"id": "P2YhbE9v_4ci"
},
"execution_count": 12,
"outputs": []
},
{
"cell_type": "code",
"source": [
"pd.DataFrame({\"backend\": [\"onnx\", \"torch\"] * 5,\n",
" \"batch\": [2, 2, 8, 8, 16, 16, 32, 32, 64, 64],\n",
" \"encode_image\": [j[1] for i in zip(onnx_results[\"encode_image\"],\n",
" clip_results[\"encode_image\"]) for j in i],\n",
" \"encode_text\": [j[1] for i in zip(onnx_results[\"encode_text\"],\n",
" clip_results[\"encode_text\"]) for j in i]})"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 362
},
"id": "WfZfDk4PAlqm",
"outputId": "3375eac7-47b0-40ba-c2d6-c30fda2ab6d5"
},
"execution_count": 13,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" backend batch encode_image encode_text\n",
"0 onnx 2 0.155 0.010\n",
"1 torch 2 0.017 0.009\n",
"2 onnx 8 0.032 0.014\n",
"3 torch 8 0.008 0.008\n",
"4 onnx 16 0.037 0.029\n",
"5 torch 16 0.009 0.012\n",
"6 onnx 32 0.076 0.059\n",
"7 torch 32 0.008 0.025\n",
"8 onnx 64 0.169 0.117\n",
"9 torch 64 0.009 0.049"
],
"text/html": [
"\n",
" <div id=\"df-7accc06f-b13e-47ae-837f-e962ce3f48e2\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>backend</th>\n",
" <th>batch</th>\n",
" <th>encode_image</th>\n",
" <th>encode_text</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>onnx</td>\n",
" <td>2</td>\n",
" <td>0.155</td>\n",
" <td>0.010</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>torch</td>\n",
" <td>2</td>\n",
" <td>0.017</td>\n",
" <td>0.009</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>onnx</td>\n",
" <td>8</td>\n",
" <td>0.032</td>\n",
" <td>0.014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>torch</td>\n",
" <td>8</td>\n",
" <td>0.008</td>\n",
" <td>0.008</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>onnx</td>\n",
" <td>16</td>\n",
" <td>0.037</td>\n",
" <td>0.029</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>torch</td>\n",
" <td>16</td>\n",
" <td>0.009</td>\n",
" <td>0.012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>onnx</td>\n",
" <td>32</td>\n",
" <td>0.076</td>\n",
" <td>0.059</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>torch</td>\n",
" <td>32</td>\n",
" <td>0.008</td>\n",
" <td>0.025</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>onnx</td>\n",
" <td>64</td>\n",
" <td>0.169</td>\n",
" <td>0.117</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>torch</td>\n",
" <td>64</td>\n",
" <td>0.009</td>\n",
" <td>0.049</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-7accc06f-b13e-47ae-837f-e962ce3f48e2')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-7accc06f-b13e-47ae-837f-e962ce3f48e2 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-7accc06f-b13e-47ae-837f-e962ce3f48e2');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
]
},
"metadata": {},
"execution_count": 13
}
]
},
{
"cell_type": "code",
"source": [
"onnx_df = pd.DataFrame({\"ONNX\": [\"ViT-B/32\"] * 5,\n",
" \"batch\": [2, 8, 16, 32, 64],\n",
" \"encode_image\": [i[1] for i in onnx_results[\"encode_image\"]],\n",
" \"encode_text\": [i[1] for i in onnx_results[\"encode_text\"]]})\n",
"onnx_df[\"total\"] = onnx_df[\"encode_image\"] + onnx_df[\"encode_text\"]"
],
"metadata": {
"id": "Xpw9lV7yBbA8"
},
"execution_count": 14,
"outputs": []
},
{
"cell_type": "code",
"source": [
"onnx_df"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "LItAyQkeDhnQ",
"outputId": "e6c88747-5eba-4c16-be40-d4de584f429e"
},
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" ONNX batch encode_image encode_text total\n",
"0 ViT-B/32 2 0.155 0.010 0.165\n",
"1 ViT-B/32 8 0.032 0.014 0.046\n",
"2 ViT-B/32 16 0.037 0.029 0.066\n",
"3 ViT-B/32 32 0.076 0.059 0.135\n",
"4 ViT-B/32 64 0.169 0.117 0.286"
],
"text/html": [
"\n",
" <div id=\"df-ec0578ab-55c5-4651-beb1-5a0710dffb36\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ONNX</th>\n",
" <th>batch</th>\n",
" <th>encode_image</th>\n",
" <th>encode_text</th>\n",
" <th>total</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ViT-B/32</td>\n",
" <td>2</td>\n",
" <td>0.155</td>\n",
" <td>0.010</td>\n",
" <td>0.165</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ViT-B/32</td>\n",
" <td>8</td>\n",
" <td>0.032</td>\n",
" <td>0.014</td>\n",
" <td>0.046</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>ViT-B/32</td>\n",
" <td>16</td>\n",
" <td>0.037</td>\n",
" <td>0.029</td>\n",
" <td>0.066</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ViT-B/32</td>\n",
" <td>32</td>\n",
" <td>0.076</td>\n",
" <td>0.059</td>\n",
" <td>0.135</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>ViT-B/32</td>\n",
" <td>64</td>\n",
" <td>0.169</td>\n",
" <td>0.117</td>\n",
" <td>0.286</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-ec0578ab-55c5-4651-beb1-5a0710dffb36')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
" \n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
" \n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-ec0578ab-55c5-4651-beb1-5a0710dffb36 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-ec0578ab-55c5-4651-beb1-5a0710dffb36');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n",
" "
]
},
"metadata": {},
"execution_count": 15
}
]
},
{
"cell_type": "code",
"source": [
"print(onnx_df.to_markdown(index=False))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "AIQDA9FaJZ7Y",
"outputId": "8b197c3c-63d1-42c4-8ca3-a3258acfc878"
},
"execution_count": 16,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"| ONNX | batch | encode_image | encode_text | total |\n",
"|:---------|--------:|---------------:|--------------:|--------:|\n",
"| ViT-B/32 | 2 | 0.155 | 0.01 | 0.165 |\n",
"| ViT-B/32 | 8 | 0.032 | 0.014 | 0.046 |\n",
"| ViT-B/32 | 16 | 0.037 | 0.029 | 0.066 |\n",
"| ViT-B/32 | 32 | 0.076 | 0.059 | 0.135 |\n",
"| ViT-B/32 | 64 | 0.169 | 0.117 | 0.286 |\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"clip_df = pd.DataFrame({\"TORCH\": [\"ViT-B/32\"] * 5,\n",
" \"batch\": [2, 8, 16, 32, 64],\n",
" \"encode_image\": [i[1] for i in clip_results[\"encode_image\"]],\n",
" \"encode_text\": [i[1] for i in clip_results[\"encode_text\"]]})\n",
"clip_df[\"total\"] = clip_df[\"encode_image\"] + clip_df[\"encode_text\"]"
],
"metadata": {
"id": "E1OXQUDvDZmI"
},
"execution_count": 17,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(clip_df.to_markdown(index=False))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xAj-ynhCDpPO",
"outputId": "f90bc132-4727-45df-a6c2-49e2a68e0a4a"
},
"execution_count": 18,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"| TORCH | batch | encode_image | encode_text | total |\n",
"|:---------|--------:|---------------:|--------------:|--------:|\n",
"| ViT-B/32 | 2 | 0.017 | 0.009 | 0.026 |\n",
"| ViT-B/32 | 8 | 0.008 | 0.008 | 0.016 |\n",
"| ViT-B/32 | 16 | 0.009 | 0.012 | 0.021 |\n",
"| ViT-B/32 | 32 | 0.008 | 0.025 | 0.033 |\n",
"| ViT-B/32 | 64 | 0.009 | 0.049 | 0.058 |\n"
]
}
]
}
]
}
================================================
FILE: examples/readme_example.ipynb
================================================
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "readme_example.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyPpME0Qdi/m3VZQ+jNj39dT",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Lednik7/CLIP-ONNX/blob/main/examples/readme_example.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"## Restart colab session after installation\n",
"Reload the session if something doesn't work"
],
"metadata": {
"id": "whlsBiJgR8le"
}
},
{
"cell_type": "code",
"source": [
"%%capture\n",
"!pip install git+https://github.com/Lednik7/CLIP-ONNX.git\n",
"!pip install git+https://github.com/openai/CLIP.git\n",
"!pip install onnxruntime-gpu"
],
"metadata": {
"id": "HnbpAkvuR73L"
},
"execution_count": 1,
"outputs": []
},
{
"cell_type": "code",
"source": [
"%%capture\n",
"!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true"
],
"metadata": {
"id": "tqy0zKM4R-7M"
},
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"source": [
"!nvidia-smi # CPU Provider"
],
"metadata": {
"id": "eKqETHL4YscZ",
"outputId": "7ff0bc18-fb40-4296-ab05-b079043e46a1",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.\n",
"\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import onnxruntime\n",
"\n",
"print(onnxruntime.get_device()) # priority device"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "x8IN72OnSAIh",
"outputId": "81d14047-91fa-4a5c-a1e3-f5b550556591"
},
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"CPU\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## CPU inference mode"
],
"metadata": {
"id": "U1Pr-YTtSEhs"
}
},
{
"cell_type": "code",
"source": [
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\", category=UserWarning)"
],
"metadata": {
"id": "gZTxanR26knr"
},
"execution_count": 5,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import clip\n",
"from PIL import Image\n",
"import numpy as np\n",
"\n",
"# onnx cannot expo