Showing preview only (1,947K chars total). Download the full file or copy to clipboard to get everything.
Repository: apple/ml-stable-diffusion
Branch: main
Commit: e12202c1f640
Files: 61
Total size: 1.8 MB
Directory structure:
gitextract_srx0ds09/
├── .github/
│ └── pull_request_template.md
├── .gitignore
├── ACKNOWLEDGEMENTS
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE.md
├── Package.swift
├── README.md
├── python_coreml_stable_diffusion/
│ ├── __init__.py
│ ├── _version.py
│ ├── activation_quantization.py
│ ├── attention.py
│ ├── chunk_mlprogram.py
│ ├── controlnet.py
│ ├── coreml_model.py
│ ├── layer_norm.py
│ ├── mixed_bit_compression_apply.py
│ ├── mixed_bit_compression_pre_analysis.py
│ ├── multilingual_projection.py
│ ├── pipeline.py
│ ├── torch2coreml.py
│ └── unet.py
├── requirements.txt
├── setup.py
├── swift/
│ ├── StableDiffusion/
│ │ ├── pipeline/
│ │ │ ├── CGImage+vImage.swift
│ │ │ ├── ControlNet.swift
│ │ │ ├── DPMSolverMultistepScheduler.swift
│ │ │ ├── Decoder.swift
│ │ │ ├── DiscreteFlowScheduler.swift
│ │ │ ├── Encoder.swift
│ │ │ ├── ManagedMLModel.swift
│ │ │ ├── MultiModalDiffusionTransformer.swift
│ │ │ ├── MultilingualTextEncoder.swift
│ │ │ ├── NumPyRandomSource.swift
│ │ │ ├── NvRandomSource.swift
│ │ │ ├── RandomSource.swift
│ │ │ ├── ResourceManaging.swift
│ │ │ ├── SafetyChecker.swift
│ │ │ ├── SampleTimer.swift
│ │ │ ├── Scheduler.swift
│ │ │ ├── StableDiffusion3Pipeline+Resources.swift
│ │ │ ├── StableDiffusion3Pipeline.swift
│ │ │ ├── StableDiffusionPipeline+Resources.swift
│ │ │ ├── StableDiffusionPipeline.Configuration.swift
│ │ │ ├── StableDiffusionPipeline.swift
│ │ │ ├── StableDiffusionXL+Resources.swift
│ │ │ ├── StableDiffusionXLPipeline.swift
│ │ │ ├── TextEncoder.swift
│ │ │ ├── TextEncoderT5.swift
│ │ │ ├── TextEncoderXL.swift
│ │ │ ├── TorchRandomSource.swift
│ │ │ └── Unet.swift
│ │ └── tokenizer/
│ │ ├── BPETokenizer+Reading.swift
│ │ ├── BPETokenizer.swift
│ │ └── T5Tokenizer.swift
│ ├── StableDiffusionCLI/
│ │ └── main.swift
│ └── StableDiffusionTests/
│ ├── Resources/
│ │ ├── merges.txt
│ │ └── vocab.json
│ └── StableDiffusionTests.swift
└── tests/
├── __init__.py
└── test_stable_diffusion.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/pull_request_template.md
================================================
Thank you for your interest in contributing to Core ML Stable Diffusion! Please review [CONTRIBUTING.md](../CONTRIBUTING.md) first. We appreciate your interest in the project!
================================================
FILE: .gitignore
================================================
*~
# Swift Package
.DS_Store
/.build
/Packages
/*.xcodeproj
.swiftpm
.vscode
.*.sw?
*.docc-build
*.vs
Package.resolved
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# macOS filesystem
*.DS_Store
================================================
FILE: ACKNOWLEDGEMENTS
================================================
Acknowledgements
Portions of this software may utilize the following copyrighted
material, the use of which is hereby acknowledged.
_____________________
The Hugging Face team (diffusers)
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
The Hugging Face team (transformers)
Copyright 2018- The Hugging Face team. All rights reserved.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Facebook, Inc (PyTorch)
From PyTorch:
Copyright (c) 2016- Facebook, Inc (Adam Paszke)
Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
From Caffe2:
Copyright (c) 2016-present, Facebook Inc. All rights reserved.
All contributions by Facebook:
Copyright (c) 2016 Facebook Inc.
All contributions by Google:
Copyright (c) 2015 Google Inc.
All rights reserved.
All contributions by Yangqing Jia:
Copyright (c) 2015 Yangqing Jia
All rights reserved.
All contributions by Kakao Brain:
Copyright 2019-2020 Kakao Brain
All contributions by Cruise LLC:
Copyright (c) 2022 Cruise LLC.
All rights reserved.
All contributions from Caffe:
Copyright(c) 2013, 2014, 2015, the respective contributors
All rights reserved.
All other contributions:
Copyright(c) 2015, 2016 the respective contributors
All rights reserved.
Caffe2 uses a copyright model similar to Caffe: each contributor holds
copyright over their contributions to Caffe2. The project versioning records
all such contribution and copyright details. If a contributor wants to further
mark their specific copyright on a particular contribution, they should
indicate their copyright solely in the commit message of the change when it is
committed.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America
and IDIAP Research Institute nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
NumPy (RandomKit 1.3)
Copyright (c) 2003-2005, Jean-Sebastien Roy (js@jeannot.org)
The rk_random and rk_seed functions algorithms and the original design of
the Mersenne Twister RNG:
Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura,
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The names of its contributors may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Original algorithm for the implementation of rk_interval function from
Richard J. Wagner's implementation of the Mersenne Twister RNG, optimised by
Magnus Jonsson.
Constants used in the rk_double implementation by Isaku Wada.
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the open source team at [opensource-conduct@group.apple.com](mailto:opensource-conduct@group.apple.com). All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.4,
available at [https://www.contributor-covenant.org/version/1/4/code-of-conduct.html](https://www.contributor-covenant.org/version/1/4/code-of-conduct.html)
================================================
FILE: CONTRIBUTING.md
================================================
# Contribution Guide
Thank you for your interest in contributing to Core ML Stable Diffusion! This project was released for system demonstration purposes and there are limited plans for future development of the repository. While we welcome new pull requests and issues please note that our response may be limited.
## Submitting a Pull Request
The project is licensed under the MIT license. By submitting a pull request, you represent that you have the right to license your contribution to Apple and the community, and agree by submitting the patch that your contributions are licensed under the MIT license.
## Code of Conduct
We ask that all community members read and observe our [Code of Conduct](CODE_OF_CONDUCT.md).
================================================
FILE: LICENSE.md
================================================
MIT License
Copyright (c) 2024 Apple Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
================================================
FILE: Package.swift
================================================
// swift-tools-version: 5.8
// The swift-tools-version declares the minimum version of Swift required to build this package.
import PackageDescription
let package = Package(
name: "stable-diffusion",
platforms: [
.macOS(.v13),
.iOS(.v16),
],
products: [
.library(
name: "StableDiffusion",
targets: ["StableDiffusion"]),
.executable(
name: "StableDiffusionSample",
targets: ["StableDiffusionCLI"])
],
dependencies: [
.package(url: "https://github.com/apple/swift-argument-parser.git", from: "1.2.3"),
.package(url: "https://github.com/huggingface/swift-transformers.git", exact: "0.1.8"),
],
targets: [
.target(
name: "StableDiffusion",
dependencies: [
.product(name: "Transformers", package: "swift-transformers"),
],
path: "swift/StableDiffusion"),
.executableTarget(
name: "StableDiffusionCLI",
dependencies: [
"StableDiffusion",
.product(name: "ArgumentParser", package: "swift-argument-parser")],
path: "swift/StableDiffusionCLI"),
.testTarget(
name: "StableDiffusionTests",
dependencies: ["StableDiffusion"],
path: "swift/StableDiffusionTests",
resources: [
.copy("Resources/vocab.json"),
.copy("Resources/merges.txt")
]),
]
)
================================================
FILE: README.md
================================================
# Core ML Stable Diffusion
Run Stable Diffusion on Apple Silicon with Core ML
[\[Blog Post\]](https://machinelearning.apple.com/research/stable-diffusion-coreml-apple-silicon) [\[BibTeX\]](#bibtex)
This repository comprises:
- `python_coreml_stable_diffusion`, a Python package for converting PyTorch models to Core ML format and performing image generation with Hugging Face [diffusers](https://github.com/huggingface/diffusers) in Python
- `StableDiffusion`, a Swift package that developers can add to their Xcode projects as a dependency to deploy image generation capabilities in their apps. The Swift package relies on the Core ML model files generated by `python_coreml_stable_diffusion`
If you run into issues during installation or runtime, please refer to the [FAQ](#faq) section. Please refer to the [System Requirements](#system-requirements) section before getting started.
<img src="assets/readme_reel.png">
## <a name="system-requirements"></a> System Requirements
<details>
<summary> Details (Click to expand) </summary>
Model Conversion:
macOS | Python | coremltools |
:------:|:------:|:-----------:|
13.1 | 3.8 | 7.0 |
Project Build:
macOS | Xcode | Swift |
:------:|:-----:|:-----:|
13.1 | 14.3 | 5.8 |
Target Device Runtime:
macOS | iPadOS, iOS |
:------:|:-----------:|
13.1 | 16.2 |
Target Device Runtime ([With Memory Improvements](#compression-6-bits-and-higher)):
macOS | iPadOS, iOS |
:------:|:-----------:|
14.0 | 17.0 |
Target Device Hardware Generation:
Mac | iPad | iPhone |
:------:|:-------:|:-------:|
M1 | M1 | A14 |
</details>
## <a name="performance-benchmark"></a> Performance Benchmarks
<details>
<summary> Details (Click to expand) </summary>
[`stabilityai/stable-diffusion-2-1-base`](https://huggingface.co/apple/coreml-stable-diffusion-2-1-base) (512x512)
| Device | `--compute-unit`| `--attention-implementation` | End-to-End Latency (s) | Diffusion Speed (iter/s) |
| --------------------- | --------------- | ---------------------------- | ---------------------- | ------------------------ |
| iPhone 12 Mini | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 18.5* | 1.44 |
| iPhone 12 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 15.4 | 1.45 |
| iPhone 13 | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 10.8* | 2.53 |
| iPhone 13 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 10.4 | 2.55 |
| iPhone 14 | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 8.6 | 2.57 |
| iPhone 14 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 7.9 | 2.69 |
| iPad Pro (M1) | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 11.2 | 2.19 |
| iPad Pro (M2) | `CPU_AND_NE` | `SPLIT_EINSUM_V2` | 7.0 | 3.07 |
<details>
<summary> Details (Click to expand) </summary>
- This benchmark was conducted by Apple and Hugging Face using public beta versions of iOS 17.0, iPadOS 17.0 and macOS 14.0 Seed 8 in August 2023.
- The performance data was collected using the `benchmark` branch of the [Diffusers app](https://github.com/huggingface/swift-coreml-diffusers)
- Swift code is not fully optimized, introducing up to ~10% overhead unrelated to Core ML model execution.
- The median latency value across 5 back-to-back end-to-end executions are reported
- The image generation procedure follows the standard configuration: 20 inference steps, 512x512 output image resolution, 77 text token sequence length, classifier-free guidance (batch size of 2 for unet).
- The actual prompt length does not impact performance because the Core ML model is converted with a static shape that computes the forward pass for all of the 77 elements (`tokenizer.model_max_length`) in the text token sequence regardless of the actual length of the input text.
- Weights are compressed to 6 bit precision. Please refer to [this section](#compression-6-bits-and-higher) for details.
- Activations are in float16 precision for both the GPU and the Neural Engine.
- `*` indicates that the [reduceMemory](https://github.com/apple/ml-stable-diffusion/blob/main/swift/StableDiffusion/pipeline/StableDiffusionPipeline.swift#L91) option was enabled which loads and unloads models just-in-time to avoid memory shortage. This added up to 2 seconds to the end-to-end latency.
- In the benchmark table, we report the best performing `--compute-unit` and `--attention-implementation` values per device. The former does not modify the Core ML model and can be applied during runtime. The latter modifies the Core ML model. Note that the best performing compute unit is model version and hardware-specific.
- Note that the performance optimizations in this repository (e.g. `--attention-implementation`) are generally applicable to Transformers and not customized to Stable Diffusion. Better performance may be observed upon custom kernel tuning. Therefore, these numbers do not represent **peak** HW capability.
- Performance may vary across different versions of Stable Diffusion due to architecture changes in the model itself. Each reported number is specific to the model version mentioned in that context.
- Performance may vary due to factors like increased system load from other applications or suboptimal device thermal state.
</details>
[`stabilityai/stable-diffusion-xl-base-1.0-ios`](https://huggingface.co/apple/coreml-stable-diffusion-xl-base-ios) (768x768)
| Device | `--compute-unit`| `--attention-implementation` | End-to-End Latency (s) | Diffusion Speed (iter/s) |
| --------------------- | --------------- | ---------------------------- | ---------------------- | ------------------------ |
| iPhone 12 Pro | `CPU_AND_NE` | `SPLIT_EINSUM` | 116* | 0.50 |
| iPhone 13 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM` | 86* | 0.68 |
| iPhone 14 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM` | 77* | 0.83 |
| iPhone 15 Pro Max | `CPU_AND_NE` | `SPLIT_EINSUM` | 31 | 0.85 |
| iPad Pro (M1) | `CPU_AND_NE` | `SPLIT_EINSUM` | 36 | 0.69 |
| iPad Pro (M2) | `CPU_AND_NE` | `SPLIT_EINSUM` | 27 | 0.98 |
<details>
<summary> Details (Click to expand) </summary>
- This benchmark was conducted by Apple and Hugging Face using iOS 17.0.2 and iPadOS 17.0.2 in September 2023.
- The performance data was collected using the `benchmark` branch of the [Diffusers app](https://github.com/huggingface/swift-coreml-diffusers)
- The median latency value across 5 back-to-back end-to-end executions are reported
- The image generation procedure follows this configuration: 20 inference steps, 768x768 output image resolution, 77 text token sequence length, classifier-free guidance (batch size of 2 for unet).
- `Unet.mlmodelc` is compressed to 4.04 bit precision following the [Mixed-Bit Palettization](#compression-lower-than-6-bits) algorithm recipe published [here](https://huggingface.co/apple/coreml-stable-diffusion-mixed-bit-palettization/blob/main/recipes/stabilityai-stable-diffusion-xl-base-1.0_palettization_recipe.json)
- All models except for `Unet.mlmodelc` are compressed to 16 bit precision
- [madebyollin/sdxl-vae-fp16-fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) by [@madebyollin](https://github.com/madebyollin) was used as the source PyTorch model for `VAEDecoder.mlmodelc` in order to enable float16 weight and activation quantization for the VAE model.
- `--attention-implementation SPLIT_EINSUM` is chosen in lieu of `SPLIT_EINSUM_V2` due to the prohibitively long compilation time of the latter
- `*` indicates that the [reduceMemory](https://github.com/apple/ml-stable-diffusion/blob/main/swift/StableDiffusion/pipeline/StableDiffusionPipeline.swift#L91) option was enabled which loads and unloads models just-in-time to avoid memory shortage. This added significant overhead to the end-to-end latency. Note that end-to-end latency difference between `iPad Pro (M1)` and `iPhone 13 Pro Max` despite identical diffusion speed.
- The actual prompt length does not impact performance because the Core ML model is converted with a static shape that computes the forward pass for all of the 77 elements (`tokenizer.model_max_length`) in the text token sequence regardless of the actual length of the input text.
- In the benchmark table, we report the best performing `--compute-unit` and `--attention-implementation` values per device. The former does not modify the Core ML model and can be applied during runtime. The latter modifies the Core ML model. Note that the best performing compute unit is model version and hardware-specific.
- Note that the performance optimizations in this repository (e.g. `--attention-implementation`) are generally applicable to Transformers and not customized to Stable Diffusion. Better performance may be observed upon custom kernel tuning. Therefore, these numbers do not represent **peak** HW capability.
- Performance may vary across different versions of Stable Diffusion due to architecture changes in the model itself. Each reported number is specific to the model version mentioned in that context.
- Performance may vary due to factors like increased system load from other applications or suboptimal device thermal state.
</details>
[`stabilityai/stable-diffusion-xl-base-1.0`](https://huggingface.co/apple/coreml-stable-diffusion-xl-base) (1024x1024)
| Device | `--compute-unit`| `--attention-implementation` | End-to-End Latency (s) | Diffusion Speed (iter/s) |
| --------------------- | --------------- | ---------------------------- | ---------------------- | ------------------------ |
| MacBook Pro (M1 Max) | `CPU_AND_GPU` | `ORIGINAL` | 46 | 0.46 |
| MacBook Pro (M2 Max) | `CPU_AND_GPU` | `ORIGINAL` | 37 | 0.57 |
| Mac Studio (M1 Ultra) | `CPU_AND_GPU` | `ORIGINAL` | 25 | 0.89 |
| Mac Studio (M2 Ultra) | `CPU_AND_GPU` | `ORIGINAL` | 20 | 1.11 |
<details>
<summary> Details (Click to expand) </summary>
- This benchmark was conducted by Apple and Hugging Face using public beta versions of iOS 17.0, iPadOS 17.0 and macOS 14.0 in July 2023.
- The performance data was collected by running the `StableDiffusion` Swift pipeline.
- The median latency value across 3 back-to-back end-to-end executions are reported
- The image generation procedure follows the standard configuration: 20 inference steps, 1024x1024 output image resolution, classifier-free guidance (batch size of 2 for unet).
- Weights and activations are in float16 precision
- Performance may vary across different versions of Stable Diffusion due to architecture changes in the model itself. Each reported number is specific to the model version mentioned in that context.
- Performance may vary due to factors like increased system load from other applications or suboptimal device thermal state. Given these factors, we do not report sub-second variance in latency.
</details>
</details>
## <a name="compression-6-bits-and-higher"></a> Weight Compression (6-bits and higher)
<details>
<summary> Details (Click to expand) </summary>
coremltools-7.0 supports advanced weight compression techniques for [pruning](https://coremltools.readme.io/v7.0/docs/pruning), [palettization](https://coremltools.readme.io/v7.0/docs/palettization-overview) and [linear 8-bit quantization](https://coremltools.readme.io/v7.0/docs/quantization-aware-training). For these techniques, `coremltools.optimize.torch.*` includes APIs that require fine-tuning to maintain accuracy at higher compression rates whereas `coremltools.optimize.coreml.*` includes APIs that are applied post-training and are data-free.
We demonstrate how data-free [post-training palettization](https://coremltools.readme.io/v7.0/docs/post-training-palettization) implemented in `coremltools.optimize.coreml.palettize_weights` enables us to achieve greatly improved performance for Stable Diffusion on mobile devices. This API implements the [Fast Exact k-Means](https://arxiv.org/abs/1701.07204) algorithm for optimal weight clustering which yields more accurate palettes. Using `--quantize-nbits {2,4,6,8}` during [conversion](#converting-models-to-coreml) is going to apply this compression to the unet and text_encoder models.
For best results, we recommend [training-time palettization](https://coremltools.readme.io/v7.0/docs/training-time-palettization): `coremltools.optimize.torch.palettization.DKMPalettizer` if fine-tuning your model is feasible. This API implements the [Differentiable k-Means (DKM)](https://machinelearning.apple.com/research/differentiable-k-means) learned palettization algorithm. In this exercise, we stick to post-training palettization for the sake of simplicity and ease of reproducibility.
The Neural Engine is capable of accelerating models with low-bit palettization: 1, 2, 4, 6 or 8 bits. With iOS 17 and macOS 14, compressed weights for Core ML models can be just-in-time decompressed during runtime (as opposed to ahead-of-time decompression upon load) to match the precision of activation tensors. This yields significant memory savings and enables models to run on devices with smaller RAM (e.g. iPhone 12 Mini). In addition, compressed weights are faster to fetch from memory which reduces the latency of memory bandwidth-bound layers. The just-in-time decompression behavior depends on the compute unit, layer type and hardware generation.
| Weight Precision | `--compute-unit` | [`stabilityai/stable-diffusion-2-1-base`](https://huggingface.co/apple/coreml-stable-diffusion-2-1-base) generating *"a high quality photo of a surfing dog"* |
| :---------------:| :----------------: | ------------------------------------------------------ |
| 6-bit | cpuAndNeuralEngine | <img src="assets/palette6_cpuandne_readmereel.png"> |
| 16-bit | cpuAndNeuralEngine | <img src="assets/float16_cpuandne_readmereel.png"> |
| 16-bit | cpuAndGPU | <img src="assets/float16_gpu_readmereel.png"> |
Note that there are minor differences across 16-bit (float16) and 6-bit results. These differences are comparable to the differences across float16 and float32 or differences across compute units as exemplified above. We recommend a minimum of 6 bits for palettizing Stable Diffusion. Smaller number of bits (1, 2 and 4) will require either fine-tuning or advanced palettization techniques such as [MBP](#compression-lower-than-6-bits).
Resources:
- [Core ML Tools Docs: Optimizing Models](https://coremltools.readme.io/v7.0/docs/optimizing-models)
- [WWDC23 Session Video: Use Core ML Tools for machine learning model compression](https://developer.apple.com/videos/play/wwdc2023/10047)
</details>
## <a name="compression-lower-than-6-bits"></a> Advanced Weight Compression (Lower than 6-bits)
<details>
<summary> Details (Click to expand) </summary>
This section describes an advanced compression algorithm called [Mixed-Bit Palettization (MBP)](https://huggingface.co/blog/stable-diffusion-xl-coreml#what-is-mixed-bit-palettization) built on top of the [Post-Training Weight Palettization tools](https://apple.github.io/coremltools/docs-guides/source/post-training-palettization.html) and using the [Weights Metadata API](https://apple.github.io/coremltools/docs-guides/source/mlmodel-utilities.html#get-weights-metadata) from [coremltools](https://github.com/apple/coremltools).
MBP builds a per-layer "palettization recipe" by picking a suitable number of bits among the Neural Engine supported bit-widths of 1, 2, 4, 6 and 8 in order to achieve the minimum average bit-width while maintaining a desired level of signal strength. The signal strength is measured by comparing the compressed model's output to that of the original float16 model. Given the same random seed and text prompts, PSNR between denoised latents is computed. The compression rate will depend on the model version as well as the tolerance for signal loss (drop in PSNR) since this algorithm is adaptive.
| 3.41-bit | 4.50-bit | 6.55-bit | 16-bit (original) |
| :-------:| :-------:| :-------:| :----------------:|
| <img src="assets/mbp/a_high_quality_photo_of_a_surfing_dog.7667.final_3.41-bits.png"> | <img src="assets/mbp/a_high_quality_photo_of_a_surfing_dog.7667.final_4.50-bits.png"> | <img src="assets/mbp/a_high_quality_photo_of_a_surfing_dog.7667.final_6.55-bits.png"> | <img src="assets/mbp/a_high_quality_photo_of_a_surfing_dog.7667.final_float16_original.png"> |
For example, the original float16 [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model has an ~82 dB signal strength. Naively applying [linear 8-bit quantization](https://coremltools.readme.io/docs/data-free-quantization) to the Unet model drops the signal to ~65 dB. Instead, applying MBP yields an average of 2.81-bits quantization while maintaining a signal strength of ~67 dB. This technique generally yields better results compared to using `--quantize-nbits` during model conversion but requires a "pre-analysis" run that takes up to a few hours on a single GPU (`mps` or `cuda`).
Here is the signal strength (PSNR in dB) versus model size reduction (% of float16 size) for `stabilityai/stable-diffusion-xl-base-1.0`. The `{1,2,4,6,8}-bit` curves are generated by progressively palettizing more layers using a palette with fixed number of bits. The layers were ordered in ascending order of their isolated impact to end-to-end signal strength so the cumulative compression's impact is delayed as much as possible. The mixed-bit curve is based on falling back to a higher number of bits as soon as a layer's isolated impact to end-to-end signal integrity drops below a threshold. Note that all curves based on palettization outperform linear 8-bit quantization at the same model size except for 1-bit.
<img src="assets/mbp/stabilityai_stable-diffusion-xl-base-1.0_psnr_vs_size.png" width="640">
Here are the steps for applying this technique on another model version:
**Step 1:** Run the pre-analysis script to generate "recipes" with varying signal strength:
```python
python -m python_coreml_stable_diffusion.mixed_bit_compression_pre_analysis --model-version <model-version> -o <output-dir>
```
For popular base models, you may find the pre-computed pre-analysis results [here](https://huggingface.co/apple/coreml-stable-diffusion-mixed-bit-palettization/tree/main/recipes). Fine-tuned models models are likely to honor the recipes of their corresponding base models but this is untested.
**Step 2:** The resulting JSON file from Step 1 will list "baselines", e.g.:
```json
{
"model_version": "stabilityai/stable-diffusion-xl-base-1.0",
"baselines": {
"original": 82.2,
"linear_8bit": 66.025,
"recipe_6.55_bit_mixedpalette": 79.9,
"recipe_5.52_bit_mixedpalette": 78.2,
"recipe_4.89_bit_mixedpalette": 76.8,
"recipe_4.41_bit_mixedpalette": 75.5,
"recipe_4.04_bit_mixedpalette": 73.2,
"recipe_3.67_bit_mixedpalette": 72.2,
"recipe_3.32_bit_mixedpalette": 71.4,
"recipe_3.19_bit_mixedpalette": 70.4,
"recipe_3.08_bit_mixedpalette": 69.6,
"recipe_2.98_bit_mixedpalette": 68.6,
"recipe_2.90_bit_mixedpalette": 67.8,
"recipe_2.83_bit_mixedpalette": 67.0,
"recipe_2.71_bit_mixedpalette": 66.3
},
}
```
Among these baselines, select a recipe based on your desired signal strength. We recommend palettizing to ~4 bits depending on the use case even if the signal integrity for lower bit values are higher than the linear 8-bit quantization baseline.
Finally, apply the selected recipe to the float16 Core ML model as follows:
```python
python -m python_coreml_stable_diffusion.mixed_bit_compression_apply --mlpackage-path <path-to-float16-unet-mlpackage> -o <output-dir> --pre-analysis-json-path <path-to--pre-analysis-json> --selected-recipe <selected-recipe-string-key>
```
An example `<selected-recipe-string-key>` would be `"recipe_4.50_bit_mixedpalette"` which achieves an average of 4.50-bits compression (compressed from ~5.2GB to ~1.46GB for SDXL). Please note that signal strength does not directly map to image-text alignment. Always verify that your MBP-compressed model variant is accurately generating images for your test prompts.
</details>
## <a name="activation-quant"></a> Activation Quantization
<details>
<summary> Details (Click to expand) </summary>
On newer hardware with A17 Pro or M4 chips, such as the iPhone 15 Pro, quantizing both activations and weight to int8 can leverage optimized compute on the Neural Engine which can be used to improve runtime latency in compute-bound models.
In this section, we demonstrate how to apply [Post Training Activation Quantization](https://apple.github.io/coremltools/docs-guides/source/opt-quantization-algos.html#post-training-data-calibration-activation-quantization), using calibration data, on Stable Diffusion UNet model.
Similar to Mixed-Bit Palettization (MBP) described [above](#a-namecompression-lower-than-6-bitsa-advanced-weight-compression-lower-than-6-bits), first, a per-layer analysis is run to determine which intermediate activations are more sensitive to 8-bit compression.
Less sensitive layers are weight and activation quantized (W8A8), whereas more sensitive layers are only weight quantized (W8A16).
Here are the steps for applying this technique:
**Step 1:** Generate calibration data
```python
python -m python_coreml_stable_diffusion.activation_quantization --model-version <model-version> --generate-calibration-data -o <output-dir>
```
A set of calibration text prompts are run through StableDiffusionPipeline and UNet model inputs are recorded and stored as pickle files in `calibration_data_<model-version>` folder inside specified output directory.
**Step 2:** Run layer-wise sensitivity analysis
```python
python -m python_coreml_stable_diffusion.activation_quantization --model-version <model-version> --layerwise-sensitivity --calibration-nsamples <num-samples> -o <output-dir>
```
This will run the analysis on all Convolutional and Attention (Einsum) modules in the model.
For each module, a compressed version is generated by quantizing only that layer’s weights and activations.
Then the PSNR between the outputs of the compressed and original model is calculated, using the same random seed and text prompts.
This analysis takes up to a few hours on a single GPU (cuda). The number of calibration samples used to quantize the model can be reduced to speed up the process.
The resulting JSON file looks like this:
```json
{
"conv": {
"conv_in": 30.74,
"down_blocks.0.attentions.0.proj_in": 38.93,
"down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q": 48.15,
"down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k": 50.13,
"down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_v": 45.70,
"down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0": 39.56,
...
},
"einsum": {
"down_blocks.0.attentions.0.transformer_blocks.0.attn1.einsum": 25.34,
"down_blocks.0.attentions.0.transformer_blocks.0.attn2.einsum": 31.76,
"down_blocks.0.attentions.1.transformer_blocks.0.attn1.einsum": 23.40,
"down_blocks.0.attentions.1.transformer_blocks.0.attn2.einsum": 31.56,
...
},
"model_version": "stabilityai/stable-diffusion-2-1-base"
}
```
**Step 3:** Generate quantized model
Using calibration data and layer-wise sensitivity the quantized CoreML model can be generated as follows:
```python
python -m python_coreml_stable_diffusion.activation_quantization --model-version <model-version> --quantize-pytorch --conv-psnr 38 --attn-psnr 26 -o <output-dir>
```
The PSNR thresholds determine which layers will be activation quantized. This number can be tuned to trade-off between output quality and inference latency.
</details>
## <a name="using-stable-diffusion-3"></a> Using Stable Diffusion 3
<details>
<summary> Details (Click to expand) </summary>
### Model Conversion
Stable Diffusion 3 uses some new and some old models to run. For the text encoders, the conversion can be done using a similar command as before with the `--sd3-version` flag.
```bash
python -m python_coreml_stable_diffusion.torch2coreml --model-version stabilityai/stable-diffusion-3-medium --bundle-resources-for-swift-cli --convert-text-encoder --sd3-version -o <output-dir>
```
For the new models (MMDiT, a new VAE with 16 channels, and the T5 text encoder), there are a number of new CLI flags that utilize the [DiffusionKit](https://www.github.com/argmaxinc/DiffusionKit) repo:
- `--sd3-version`: Indicates to the converter to treat this as a Stable Diffusion 3 model
- `--convert-mmdit`: Convert the MMDiT model
- `--convert-vae-decoder`: Convert the new VAE model (this will use the 16 channel version if --sd3-version is set)
- `--include-t5`: Downloads and includes a pre-converted T5 text encoder in the conversion
e.g.:
```bash
python -m python_coreml_stable_diffusion.torch2coreml --model-version stabilityai/stable-diffusion-3-medium --bundle-resources-for-swift-cli --convert-vae-decoder --convert-mmdit --include-t5 --sd3-version -o <output-dir>
```
To convert the full pipeline with at 1024x1024 resolution, the following command may be used:
```bash
python -m python_coreml_stable_diffusion.torch2coreml --model-version stabilityai/stable-diffusion-3-medium --bundle-resources-for-swift-cli --convert-text-encoder --convert-vae-decoder --convert-mmdit --include-t5 --sd3-version --latent-h 128 --latent-w 128 -o <output-dir>
```
Keep in mind that the MMDiT model is quite large and will require increasingly more memory and time to convert as the latent resolution increases.
Also note that currently the MMDiT model requires fp32 and therefore only supports `CPU_AND_GPU` compute units and `ORIGINAL` attention implementation (the default for this pipeline).
### Swift Inference
Swift inference for Stable Diffusion 3 is similar to the previous versions. The only difference is that the `--sd3` flag should be used to indicate that the model is a Stable Diffusion 3 model.
```bash
swift run StableDiffusionSample <prompt> --resource-path <output-mlpackages-directory/Resources> --output-path <output-dir> --compute-units cpuAndGPU --sd3
```
</details>
## <a name="using-stable-diffusion-xl"></a> Using Stable Diffusion XL
<details>
<summary> Details (Click to expand) </summary>
### Model Conversion
e.g.:
```bash
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-vae-decoder --convert-text-encoder --xl-version --model-version stabilityai/stable-diffusion-xl-base-1.0 --refiner-version stabilityai/stable-diffusion-xl-refiner-1.0 --bundle-resources-for-swift-cli --attention-implementation {ORIGINAL,SPLIT_EINSUM} -o <output-dir>
```
- `--xl-version`: Additional argument to pass to the conversion script when specifying an XL model
- `--refiner-version`: Additional argument to pass to the conversion script when specifying an XL refiner model, required for ["Ensemble of Expert Denoisers"](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_xl#1-ensemble-of-expert-denoisers) inference.
- `--attention-implementation`: `ORIGINAL` is recommended for `cpuAndGPU` for deployment on Mac
- `--attention-implementation`: `SPLIT_EINSUM` is recommended for `cpuAndNeuralEngine` for deployment on iPhone & iPad
- `--attention-implementation`: `SPLIT_EINSUM_V2` is not recommended for Stable Diffusion XL because of prohibitively long compilation time
- **Tip:** Adding `--latent-h 96 --latent-w 96` is recommended for iOS and iPadOS deployment which leads to 768x768 generation as opposed to the default 1024x1024.
- **Tip:** Due to known float16 overflow issues in the original Stable Diffusion XL VAE, [the model conversion script enforces float32 precision](https://github.com/apple/ml-stable-diffusion/blob/main/python_coreml_stable_diffusion/torch2coreml.py#L486). Using a custom VAE version such as [madebyollin/sdxl-vae-fp16-fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) by [@madebyollin](https://github.com/madebyollin) via `--custom-vae-version madebyollin/sdxl-vae-fp16-fix` will restore the default float16 precision for VAE.
### Swift Inference
```bash
swift run StableDiffusionSample <prompt> --resource-path <output-mlpackages-directory/Resources> --output-path <output-dir> --compute-units {cpuAndGPU,cpuAndNeuralEngine} --xl
```
- Only the `base` model is required, `refiner` model is optional and will be used by default if provided in the resource directory
- ControlNet for XL is not yet supported
### Python Inference
```bash
python -m python_coreml_stable_diffusion.pipeline --prompt <prompt> --compute-unit {CPU_AND_GPU,CPU_AND_NE} -o <output-dir> -i <output-mlpackages-directory/Resources> --model-version stabilityai/stable-diffusion-xl-base-1.0
```
- `refiner` model is not yet supported
- ControlNet for XL is not yet supported
</details>
## <a name="using-controlnet"></a> Using ControlNet
<details>
<summary> Details (Click to expand) </summary>
Example results using the prompt *"a high quality photo of a surfing dog"* conditioned on the scribble (leftmost):
<img src="assets/controlnet_readme_reel.png">
[ControlNet](https://huggingface.co/lllyasviel/ControlNet) allows users to condition image generation with Stable Diffusion on signals such as edge maps, depth maps, segmentation maps, scribbles and pose. Thanks to [@ryu38's contribution](https://github.com/apple/ml-stable-diffusion/pull/153), both the Python CLI and the Swift package support ControlNet models. Please refer to [this section](#converting-models-to-coreml) for details on setting up Stable Diffusion with ControlNet.
Note that ControlNet is not yet supported for Stable Diffusion XL.
</details>
## <a name="system-multilingual-text-encoder"></a> Using the System Multilingual Text Encoder
<details>
<summary> Details (Click to expand) </summary>
With iOS 17 and macOS 14, `NaturalLanguage` framework introduced the [NLContextualEmbedding](https://developer.apple.com/documentation/naturallanguage/nlcontextualembedding) which provides Transformer-based textual embeddings for Latin (20 languages), Cyrillic (4 languages) and CJK (3 languages) scripts. The WWDC23 session titled [Explore Natural Language multilingual models](https://developer.apple.com/videos/play/wwdc2023/10042) demonstrated how this powerful new model can be used by developers to train downstream tasks such as multilingual image generation with Stable Diffusion.
The code to reproduce this demo workflow is made available in this repository. There are several ways in which this workflow can be implemented. Here is an example:
**Step 1:** Curate an image-text dataset with the desired languages.
**Step 2:** Pre-compute the NLContextualEmbedding values and replace the text strings with these embedding vectors in your dataset.
**Step 3:** Fine-tune a base model from Hugging Face Hub that is compatible with the [StableDiffusionPipeline](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview) by using your new dataset and replacing the default text_encoder with your pre-computed NLContextualEmbedding values.
**Step 4:** In order to be able to swap the text_encoder of a base model without training new layers, the base model's `text_encoder.hidden_size` must match that of NLContextualEmbedding. If it doesn't, you will need to train a linear projection layer to map between the two dimensionalities. After fine-tuning, this linear layer should be converted to CoreML as follows:
```shell
python -m python_coreml_stable_diffusion.multilingual_projection --input-path <path-to-projection-torchscript> --output-dir <output-dir>
```
The command above will yield a `MultilingualTextEncoderProjection.mlmodelc` file under `--output-dir` and this should be colocated with the rest of the Core ML model assets that were generated through `--bundle-resources-for-swift-cli`.
**Step 5:** The multilingual system text encoder can now be invoked by setting `useMultilingualTextEncoder` to true when initializing a pipeline or setting `--use-multilingual-text-encoder` in the CLI. Note that the model assets are distributed over-the-air so the first invocation will trigger asset downloads which is less than 100MB.
Resources:
- [WWDC23 Session Video: Explore Natural Language multilingual models](https://developer.apple.com/videos/play/wwdc2023/10042)
- [NLContextualEmbedding API Documentation](https://developer.apple.com/documentation/naturallanguage/nlcontextualembedding)
</details>
## <a name="using-converted-weights"></a> Using Ready-made Core ML Models from Hugging Face Hub
<details>
<summary> Click to expand </summary>
🤗 Hugging Face ran the [conversion procedure](#converting-models-to-coreml) on the following models and made the Core ML weights publicly available on the Hub. If you would like to convert a version of Stable Diffusion that is not already available on the Hub, please refer to the [Converting Models to Core ML](#converting-models-to-coreml).
* 6-bit quantized models (suitable for iOS 17 and macOS 14):
- [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/apple/coreml-stable-diffusion-1-4-palettized)
- [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/apple/coreml-stable-diffusion-v1-5-palettized)
- [`stabilityai/stable-diffusion-2-base`](https://huggingface.co/apple/coreml-stable-diffusion-2-base-palettized)
- [`stabilityai/stable-diffusion-2-1-base`](https://huggingface.co/apple/coreml-stable-diffusion-2-1-base-palettized)
* Mixed-bit quantized models
- [`stabilityai/stable-diffusion-xl-base-1.0`](https://huggingface.co/apple/coreml-stable-diffusion-mixed-bit-palettization)
- [`stabilityai/stable-diffusion-xl-base-1.0-ios`](https://huggingface.co/apple/coreml-stable-diffusion-xl-base-ios)
* Uncompressed models:
- [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/apple/coreml-stable-diffusion-v1-4)
- [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/apple/coreml-stable-diffusion-v1-5)
- [`stabilityai/stable-diffusion-2-base`](https://huggingface.co/apple/coreml-stable-diffusion-2-base)
- [`stabilityai/stable-diffusion-2-1-base`](https://huggingface.co/apple/coreml-stable-diffusion-2-1-base)
- [`stabilityai/stable-diffusion-xl-base-1.0`](https://huggingface.co/apple/coreml-stable-diffusion-xl-base)
- [`stabilityai/stable-diffusion-xl-{base+refiner}-1.0`](https://huggingface.co/apple/coreml-stable-diffusion-xl-base-with-refiner)
- [`stabilityai/stable-diffusion-3-medium`](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
If you want to use any of those models you may download the weights and proceed to [generate images with Python](#image-generation-with-python) or [Swift](#image-generation-with-swift).
There are several variants in each model repository. You may clone the whole repos using `git` and `git lfs` to download all variants, or selectively download the ones you need.
To clone the repos using `git`, please follow this process:
**Step 1:** Install the `git lfs` extension for your system.
`git lfs` stores large files outside the main git repo, and it downloads them from the appropriate server after you clone or checkout. It is available in most package managers, check [the installation page](https://git-lfs.com) for details.
**Step 2:** Enable `git lfs` by running this command once:
```bash
git lfs install
```
**Step 3:** Use `git clone` to download a copy of the repo that includes all model variants. For Stable Diffusion version 1.4, you'd issue the following command in your terminal:
```bash
git clone https://huggingface.co/apple/coreml-stable-diffusion-v1-4
```
If you prefer to download specific variants instead of cloning the repos, you can use the `huggingface_hub` Python library. For example, to do generation in Python using the `ORIGINAL` attention implementation (read [this section](#converting-models-to-coreml) for details), you could use the following helper code:
```Python
from huggingface_hub import snapshot_download
from pathlib import Path
repo_id = "apple/coreml-stable-diffusion-v1-4"
variant = "original/packages"
model_path = Path("./models") / (repo_id.split("/")[-1] + "_" + variant.replace("/", "_"))
snapshot_download(repo_id, allow_patterns=f"{variant}/*", local_dir=model_path, local_dir_use_symlinks=False)
print(f"Model downloaded at {model_path}")
```
`model_path` would be the path in your local filesystem where the checkpoint was saved. Please, refer to [this post](https://huggingface.co/blog/diffusers-coreml) for additional details.
</details>
## <a name="converting-models-to-coreml"></a> Converting Models to Core ML
<details>
<summary> Click to expand </summary>
**Step 1:** Create a Python environment and install dependencies:
```bash
conda create -n coreml_stable_diffusion python=3.8 -y
conda activate coreml_stable_diffusion
cd /path/to/cloned/ml-stable-diffusion/repository
pip install -e .
```
**Step 2:** Log in to or register for your [Hugging Face account](https://huggingface.co), generate a [User Access Token](https://huggingface.co/settings/tokens) and use this token to set up Hugging Face API access by running `huggingface-cli login` in a Terminal window.
**Step 3:** Navigate to the version of Stable Diffusion that you would like to use on [Hugging Face Hub](https://huggingface.co/models?search=stable-diffusion) and accept its Terms of Use. The default model version is [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4). The model version may be changed by the user as described in the next step.
**Step 4:** Execute the following command from the Terminal to generate Core ML model files (`.mlpackage`)
```shell
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker --model-version <model-version-string-from-hub> -o <output-mlpackages-directory>
```
**WARNING:** This command will download several GB worth of PyTorch checkpoints from Hugging Face. Please ensure that you are on Wi-Fi and have enough disk space.
This generally takes 15-20 minutes on an M1 MacBook Pro. Upon successful execution, the 4 neural network models that comprise Stable Diffusion will have been converted from PyTorch to Core ML (`.mlpackage`) and saved into the specified `<output-mlpackages-directory>`. Some additional notable arguments:
- `--model-version`: The model version name as published on the [Hugging Face Hub](https://huggingface.co/models?search=stable-diffusion)
- `--refiner-version`: The refiner version name as published on the [Hugging Face Hub](https://huggingface.co/models?search=stable-diffusion). This is optional and if specified, this argument will convert and bundle the refiner unet alongside the model unet.
- `--bundle-resources-for-swift-cli`: Compiles all 4 models and bundles them along with necessary resources for text tokenization into `<output-mlpackages-directory>/Resources` which should provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline. [However using these compiled models in Python will significantly speed up inference](https://apple.github.io/coremltools/docs-guides/source/model-prediction.html#why-use-a-compiled-model).
- `--quantize-nbits`: Quantizes the weights of unet and text_encoder models down to 2, 4, 6 or 8 bits using a globally optimal k-means clustering algorithm. By default all models are weight-quantized to 16 bits even if this argument is not specified. Please refer to [this section](#compression-6-bits-and-higher for details and further guidance on weight compression.
- `--chunk-unet`: Splits the Unet model in two approximately equal chunks (each with less than 1GB of weights) for mobile-friendly deployment. This is **required** for Neural Engine deployment on iOS and iPadOS if weights are not quantized to 6-bits or less (`--quantize-nbits {2,4,6}`). This is not required for macOS. Swift CLI is able to consume both the chunked and regular versions of the Unet model but prioritizes the former. Note that chunked unet is not compatible with the Python pipeline because Python pipeline is intended for macOS only.
- `--attention-implementation`: Defaults to `SPLIT_EINSUM` which is the implementation described in [Deploying Transformers on the Apple Neural Engine](https://machinelearning.apple.com/research/neural-engine-transformers). `--attention-implementation SPLIT_EINSUM_V2` yields 10-30% improvement for mobile devices, still targeting the Neural Engine. `--attention-implementation ORIGINAL` will switch to an alternative implementation that should be used for CPU or GPU deployment on some Mac devices. Please refer to the [Performance Benchmark](#performance-benchmark) section for further guidance.
- `--check-output-correctness`: Compares original PyTorch model's outputs to final Core ML model's outputs. This flag increases RAM consumption significantly so it is recommended only for debugging purposes.
- `--convert-controlnet`: Converts ControlNet models specified after this option. This can also convert multiple models if you specify like `--convert-controlnet lllyasviel/sd-controlnet-mlsd lllyasviel/sd-controlnet-depth`.
- `--unet-support-controlnet`: enables a converted UNet model to receive additional inputs from ControlNet. This is required for generating image with using ControlNet and saved with a different name, `*_control-unet.mlpackage`, distinct from normal UNet. On the other hand, this UNet model can not work without ControlNet. Please use normal UNet for just txt2img.
- `--unet-batch-one`: use a batch size of one for the unet, this is needed if you do not want to do classifier free guidance, i.e. using a `guidance-scale` of less than one.
- `--convert-vae-encoder`: not required for text-to-image applications. Required for image-to-image applications in order to map the input image to the latent space.
</details>
## <a name="image-generation-with-python"></a> Image Generation with Python
<details>
<summary> Click to expand </summary>
Run text-to-image generation using the example Python pipeline based on [diffusers](https://github.com/huggingface/diffusers):
```shell
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i <core-ml-model-directory> -o </path/to/output/image> --compute-unit ALL --seed 93
```
Please refer to the help menu for all available arguments: `python -m python_coreml_stable_diffusion.pipeline -h`. Some notable arguments:
- `-i`: Should point to the `-o` directory from Step 4 of [Converting Models to Core ML](#converting-models-to-coreml) section from above. If you specified `--bundle-resources-for-swift-cli` during conversion, then use the resulting `Resources` folder (which holds the compiled `.mlmodelc` files). [The compiled models load much faster after first use](https://apple.github.io/coremltools/docs-guides/source/model-prediction.html#why-use-a-compiled-model).
- `--model-version`: If you overrode the default model version while converting models to Core ML, you will need to specify the same model version here.
- `--compute-unit`: Note that the most performant compute unit for this particular implementation may differ across different hardware. `CPU_AND_GPU` or `CPU_AND_NE` may be faster than `ALL`. Please refer to the [Performance Benchmark](#performance-benchmark) section for further guidance.
- `--scheduler`: If you would like to experiment with different schedulers, you may specify it here. For available options, please see the help menu. You may also specify a custom number of inference steps by `--num-inference-steps` which defaults to 50.
- `--controlnet`: ControlNet models specified with this option are used in image generation. Use this option in the format `--controlnet lllyasviel/sd-controlnet-mlsd lllyasviel/sd-controlnet-depth` and make sure to use `--controlnet-inputs` in conjunction.
- `--controlnet-inputs`: Image inputs corresponding to each ControlNet model. Please provide image paths in same order as models in `--controlnet`, for example: `--controlnet-inputs image_mlsd image_depth`.
- `--unet-batch-one`: Do not batch unet predictions for the prompt and negative prompt. This requires the unet has been converted with a batch size of one, see `--unet-batch-one` option in conversion script.
</details>
## <a name="image-gen-swift"></a> Image Generation with Swift
<details>
<summary> Click to expand </summary>
### Example CLI Usage
```shell
swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path <output-mlpackages-directory>/Resources/ --seed 93 --output-path </path/to/output/image>
```
The output will be named based on the prompt and random seed:
e.g. `</path/to/output/image>/a_photo_of_an_astronaut_riding_a_horse_on_mars.93.final.png`
Please use the `--help` flag to learn about batched generation and more.
### Example Library Usage
```swift
import StableDiffusion
...
let pipeline = try StableDiffusionPipeline(resourcesAt: resourceURL)
pipeline.loadResources()
let image = try pipeline.generateImages(prompt: prompt, seed: seed).first
```
On iOS, the `reduceMemory` option should be set to `true` when constructing `StableDiffusionPipeline`
### Swift Package Details
This Swift package contains two products:
- `StableDiffusion` library
- `StableDiffusionSample` command-line tool
Both of these products require the Core ML models and tokenization resources to be supplied. When specifying resources via a directory path that directory must contain the following:
- `TextEncoder.mlmodelc` or `TextEncoder2.mlmodelc (text embedding model)
- `Unet.mlmodelc` or `UnetChunk1.mlmodelc` & `UnetChunk2.mlmodelc` (denoising autoencoder model)
- `VAEDecoder.mlmodelc` (image decoder model)
- `vocab.json` (tokenizer vocabulary file)
- `merges.text` (merges for byte pair encoding file)
Optionally, for image2image, in-painting, or similar:
- `VAEEncoder.mlmodelc` (image encoder model)
Optionally, it may also include the safety checker model that some versions of Stable Diffusion include:
- `SafetyChecker.mlmodelc`
Optionally, for the SDXL refiner:
- `UnetRefiner.mlmodelc` (refiner unet model)
Optionally, for ControlNet:
- `ControlledUNet.mlmodelc` or `ControlledUnetChunk1.mlmodelc` & `ControlledUnetChunk2.mlmodelc` (enabled to receive ControlNet values)
- `controlnet/` (directory containing ControlNet models)
- `LllyasvielSdControlnetMlsd.mlmodelc` (for example, from lllyasviel/sd-controlnet-mlsd)
- `LllyasvielSdControlnetDepth.mlmodelc` (for example, from lllyasviel/sd-controlnet-depth)
- Other models you converted
Note that the chunked version of Unet is checked for first. Only if it is not present will the full `Unet.mlmodelc` be loaded. Chunking is required for iOS and iPadOS and not necessary for macOS.
</details>
## <a name="swift-app"></a> Example Swift App
<details>
<summary> Click to expand </summary>
🤗 Hugging Face created an [open-source demo app](https://github.com/huggingface/swift-coreml-diffusers) on top of this library. It's written in native Swift and Swift UI, and runs on macOS, iOS and iPadOS. You can use the code as a starting point for your app, or to see how to integrate this library in your own projects.
Hugging Face has made the app [available in the Mac App Store](https://apps.apple.com/app/diffusers/id1666309574?mt=12).
</details>
## <a name="faq"></a> FAQ
<details>
<summary> Click to expand </summary>
<details>
<summary> <b> Q1: </b> <code> ERROR: Failed building wheel for tokenizers or error: can't find Rust compiler </code> </summary>
<b> A1: </b> Please review this [potential solution](https://github.com/huggingface/transformers/issues/2831#issuecomment-592724471).
</details>
<details>
<summary> <b> Q2: </b> <code> RuntimeError: {NSLocalizedDescription = "Error computing NN outputs." </code> </summary>
<b> A2: </b> There are many potential causes for this error. In this context, it is highly likely to be encountered when your system is under increased memory pressure from other applications. Reducing memory utilization of other applications is likely to help alleviate the issue.
</details>
<details>
<summary> <b> <a name="low-mem-conversion"></a> Q3: </b> My Mac has 8GB RAM and I am converting models to Core ML using the example command. The process is getting killed because of memory issues. How do I fix this issue? </summary>
<b> A3: </b> In order to minimize the memory impact of the model conversion process, please execute the following command instead:
```bash
python -m python_coreml_stable_diffusion.torch2coreml --convert-vae-encoder --model-version <model-version-string-from-hub> -o <output-mlpackages-directory> && \
python -m python_coreml_stable_diffusion.torch2coreml --convert-vae-decoder --model-version <model-version-string-from-hub> -o <output-mlpackages-directory> && \
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --model-version <model-version-string-from-hub> -o <output-mlpackages-directory> && \
python -m python_coreml_stable_diffusion.torch2coreml --convert-text-encoder --model-version <model-version-string-from-hub> -o <output-mlpackages-directory> && \
python -m python_coreml_stable_diffusion.torch2coreml --convert-safety-checker --model-version <model-version-string-from-hub> -o <output-mlpackages-directory> &&
```
If you need `--chunk-unet`, you may do so in yet another independent command which will reuse the previously exported Unet model and simply chunk it in place:
```bash
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --chunk-unet -o <output-mlpackages-directory>
```
</details>
<details>
<summary> <b> Q4: </b> My Mac has 8GB RAM, should image generation work on my machine? </summary>
<b> A4: </b> Yes! Especially the `--compute-unit CPU_AND_NE` option should work under reasonable system load from other applications. Note that part of the [Example Results](#example-results) were generated using an M2 MacBook Air with 8GB RAM.
</details>
<details>
<summary> <b> Q5: </b> Every time I generate an image using the Python pipeline, loading all the Core ML models takes 2-3 minutes. Is this expected? </summary>
<b> A5: </b> Both `.mlpackage` and `.mlmodelc` models are compiled (also known as "model preparation" in Core ML terms) upon first load when a specific compute unit is specified. `.mlpackage` does not cache this compiled asset so each model load retriggers this compilation which may take up to a few minutes. On the other hand, `.mlmodelc` files do cache this compiled asset and non-first load times are reduced to just a few seconds.
In order to benefit from compilation caching, you may use the `.mlmodelc` assets instead of `.mlpackage` assets in both Swift (default) and Python (possible thanks to [@lopez-hector](https://github.com/lopez-hector)'s [contribution](https://github.com/apple/ml-stable-diffusion/commit/f3a212491cf531dd88493c89ad3d98d016db407f)) image generation pipelines.
</details>
<details>
<summary> <b> <a name="q-mobile-app"></a> Q6: </b> I want to deploy <code>StableDiffusion</code>, the Swift package, in my mobile app. What should I be aware of? </summary>
<b> A6: </b>The [Image Generation with Swift](#image-gen-swift) section describes the minimum SDK and OS versions as well as the device models supported by this package. We recommend carefully testing the package on the device with the least amount of RAM available among your deployment targets.
The image generation process in `StableDiffusion` can yield over 2 GB of peak memory during runtime depending on the compute units selected. On iPadOS, we recommend using `.cpuAndNeuralEngine` in your configuration and the `reduceMemory` option when constructing a `StableDiffusionPipeline` to minimize memory pressure.
If your app crashes during image generation, consider adding the [Increased Memory Limit](https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_developer_kernel_increased-memory-limit) capability to inform the system that some of your app’s core features may perform better by exceeding the default app memory limit on supported devices.
On iOS, depending on the iPhone model, Stable Diffusion model versions, selected compute units, system load and design of your app, this may still not be sufficient to keep your apps peak memory under the limit. Please remember, because the device shares memory between apps and iOS processes, one app using too much memory can compromise the user experience across the whole device.
We **strongly recommend** compressing your models following the recipes in [Advanced Weight Compression (Lower than 6-bits)](#compression-lower-than-6-bits) for iOS deployment. This reduces the peak RAM usage by up to 75% (from 16-bit to 4-bit) while preserving model output quality.
</details>
<details>
<summary> <b> Q7: </b> How do I generate images with different resolutions using the same Core ML models? </summary>
<b> A7: </b> The current version of `python_coreml_stable_diffusion` does not support single-model multi-resolution out of the box. However, developers may fork this project and leverage the [flexible shapes](https://coremltools.readme.io/docs/flexible-inputs) support from coremltools to extend the `torch2coreml` script by using `coremltools.EnumeratedShapes`. Note that, while the `text_encoder` is agnostic to the image resolution, the inputs and outputs of `vae_decoder` and `unet` models are dependent on the desired image resolution.
</details>
<details>
<summary> <b> Q8: </b> Are the Core ML and PyTorch generated images going to be identical? </summary>
<b> A8: </b> If desired, the generated images across PyTorch and Core ML can be made approximately identical. However, it is not guaranteed by default. There are several factors that might lead to different images across PyTorch and Core ML:
<b> 1. Random Number Generator Behavior </b>
The main source of potentially different results across PyTorch and Core ML is the Random Number Generator ([RNG](https://en.wikipedia.org/wiki/Random_number_generation)) behavior. PyTorch and Numpy have different sources of randomness. `python_coreml_stable_diffusion` generally relies on Numpy for RNG (e.g. latents initialization) and `StableDiffusion` Swift Library reproduces this RNG behavior by default. However, PyTorch-based pipelines such as Hugging Face `diffusers` relies on PyTorch's RNG behavior. Thanks to @liuliu's [contributions](https://github.com/apple/ml-stable-diffusion/pull/124), one can match the PyTorch (CPU/GPU) RNG behavior in Swift by specifying `--rng torch/cuda` which selects the `torchRNG/cudaRNG` mode.
<b> 2. PyTorch </b>
*"Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds."* ([source](https://pytorch.org/docs/stable/notes/randomness.html#reproducibility)).
<b> 3. Model Function Drift During Conversion </b>
The difference in outputs across corresponding PyTorch and Core ML models is a potential cause. The signal integrity is tested during the conversion process (enabled via `--check-output-correctness` argument to `python_coreml_stable_diffusion.torch2coreml`) and it is verified to be above a minimum [PSNR](https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio) value as tested on random inputs. Note that this is simply a sanity check and does not guarantee this minimum PSNR across all possible inputs. Furthermore, the results are not guaranteed to be identical when executing the same Core ML models across different compute units. This is not expected to be a major source of difference as the sample visual results indicate in [this section](#compression-6-bits-and-higher).
<b> 4. Weights and Activations Data Type </b>
When quantizing models from float32 to lower-precision data types such as float16, the generated images are [known to vary slightly](https://lambdalabs.com/blog/inference-benchmark-stable-diffusion) in semantics even when using the same PyTorch model. Core ML models generated by coremltools have float16 weights and activations by default [unless explicitly overridden](https://github.com/apple/coremltools/blob/main/coremltools/converters/_converters_entry.py#L256). This is not expected to be a major source of difference.
</details>
<details>
<summary> <b> Q9: </b> The model files are very large, how do I avoid a large binary for my App? </summary>
<b> A9: </b> The recommended option is to prompt the user to download these assets upon first launch of the app. This keeps the app binary size independent of the Core ML models being deployed. Disclosing the size of the download to the user is extremely important as there could be data charges or storage impact that the user might not be comfortable with.
</details>
<details>
<summary> <b> Q10: </b> <code> `Could not initialize NNPACK! Reason: Unsupported hardware` </code> </summary>
<b> A10: </b> This warning is safe to ignore in the context of this repository.
</details>
<details>
<summary> <b> Q11: </b> <code> TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect </code> </summary>
<b> A11: </b> This warning is safe to ignore in the context of this repository.
</details>
<details>
<summary> <b> Q12: </b> <code> UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown </code> </summary>
<b> A12: </b> If this warning is printed right after <code> zsh: killed python -m python_coreml_stable_diffusion.torch2coreml ... </code>, then it is highly likely that your Mac has run out of memory while converting models to Core ML. Please see [Q3](#low-mem-conversion) from above for the solution.
</details>
</details>
</details>
## <a name="bibtex"></a> BibTeX Reference
```latex
@misc{stable-diffusion-coreml-apple-silicon,
title = {Stable Diffusion with Core ML on Apple Silicon},
author = {Atila Orhon and Michael Siracusa and Aseem Wadhwa},
year = {2022},
URL = {null}
}
```
================================================
FILE: python_coreml_stable_diffusion/__init__.py
================================================
from ._version import __version__
================================================
FILE: python_coreml_stable_diffusion/_version.py
================================================
__version__ = "1.1.0"
================================================
FILE: python_coreml_stable_diffusion/activation_quantization.py
================================================
#
# For licensing see accompanying LICENSE.md file.
# Copyright (C) 2022 Apple Inc. All Rights Reserved.
#
import logging
import operator
import torch
logging.basicConfig()
logger = logging.getLogger()
logger.setLevel('INFO')
import argparse
import gc
import json
import os
import pickle
from copy import deepcopy
import coremltools as ct
import numpy as np
from coremltools.optimize.torch.quantization import (
LinearQuantizer, LinearQuantizerConfig, ModuleLinearQuantizerConfig)
from diffusers import StableDiffusionPipeline
from tqdm import tqdm
from python_coreml_stable_diffusion import attention
from python_coreml_stable_diffusion import unet
from python_coreml_stable_diffusion.layer_norm import LayerNormANE
from python_coreml_stable_diffusion.torch2coreml import compute_psnr
from python_coreml_stable_diffusion.unet import Einsum
attention.SPLIT_SOFTMAX = True
CALIBRATION_DATA = [
"image of a transparent tall glass with ice, fruits and mint, photograph, commercial, food, warm background, beautiful image, detailed",
"picture of dimly lit living room, minimalist furniture, vaulted ceiling, huge room, floor to ceiling window with an ocean view, nighttime, 3D render, high quality, detailed",
"modern office building, 8 stories tall, glass and steel, 3D render style, wide angle view, very detailed, sharp photographic image, in an office park, bright sunny day, clear blue skies, trees and landscaping",
"cute small cat sitting in a movie theater eating popcorn, watching a movie, cozy indoor lighting, detailed, digital painting, character design",
"a highly detailed matte painting of a man on a hill watching a rocket launch in the distance by studio ghibli, volumetric lighting, octane render, 4K resolution, hyperrealism, highly detailed, insanely detailed, cinematic lighting, depth of field",
"an undersea world with several of fish, rocks, detailed, realistic, photograph, amazing, beautiful, high resolution",
"large ocean wave hitting a beach at sunset, photograph, detailed",
"pocket watch on a table, close up. macro, sharp, high gloss, brass, gears, sharp, detailed",
"pocket watch in the style of pablo picasso, painting",
"majestic royal tall ship on a calm sea, realistic painting, cloudy blue sky, in the style of edward hopper",
"german castle on a mountain, blue sky, realistic, photograph, dramatic, wide angle view",
"artificial intelligence, AI, concept art, blue line sketch",
"a humanoid robot, concept art, 3D render, high quality, detailed",
"donut with sprinkles and a cup of coffee on a wood table, detailed, photograph",
"orchard at sunset, beautiful, photograph, great composition, detailed, realistic, HDR",
"image of a map of a country, tattered, old, styled, illustration, for a video game style",
"blue and green woven fibers, nano fiber material, detailed, concept art, micro photography",
]
RANDOM_TEST_DATA = [
"a black and brown dog standing outside a door.",
"a person on a motorcycle makes a turn on the track.",
"inflatable boats sit on the arizona river, and on the bank",
"a white cat sitting under a white umbrella",
"black bear standing in a field of grass under a tree.",
"a train that is parked on tracks and has graffiti writing on it, with a mountain range in the background.",
"a cake inside of a pan sitting in an oven.",
"a table with paper plates and flowers in a home",
]
def get_coreml_inputs(sample_inputs):
return [
ct.TensorType(
name=k,
shape=v.shape,
dtype=v.numpy().dtype if isinstance(v, torch.Tensor) else v.dtype,
) for k, v in sample_inputs.items()
]
def convert_to_coreml(torchscript_module, sample_inputs):
logger.info("Converting model to CoreML..")
coreml_model = ct.convert(
torchscript_module,
convert_to="mlprogram",
minimum_deployment_target=ct.target.macOS14,
inputs=get_coreml_inputs(sample_inputs),
outputs=[ct.TensorType(name="noise_pred", dtype=np.float32)],
compute_units=ct.ComputeUnit.ALL,
skip_model_load=True,
)
return coreml_model
def unet_data_loader(data_dir, device='cpu', calibration_nsamples=None):
"""
Load calibration data from specified path.
Limit number of samples to calibration_nsamples, if specified.
"""
dataloader = []
skip_load = False
for file in sorted(os.listdir(data_dir)):
if file.endswith('.pkl'):
filepath = os.path.join(data_dir, file)
with open(filepath, 'rb') as data:
try:
while not skip_load:
unet_data = pickle.load(data)
for input in unet_data:
dataloader.append([x.to(torch.float).to(device) for x in input])
if calibration_nsamples:
if len(dataloader) >= calibration_nsamples:
skip_load = True
break
except EOFError:
pass
if skip_load:
break
logger.info(f"Total calibration samples: {len(dataloader)}")
return dataloader
def quantize_module_config(module_name):
"""
Generate quantization config to apply W8A8 quantization for specified module.
Rest of the model is kept in FP32 precision.
"""
config = LinearQuantizerConfig(
global_config=ModuleLinearQuantizerConfig(
milestones=[0, 1000, 1000, 0],
weight_dtype=torch.float32,
activation_dtype=torch.float32,
),
module_name_configs={
module_name: ModuleLinearQuantizerConfig(
quantization_scheme="symmetric",
milestones=[0, 1000, 1000, 0],
),
},
)
return config
def quantize_cumulative_config(skip_conv_layers, skip_einsum_layers):
"""
Generate quantization config to apply W8A8 quantization.
Skipped layers are kept in W8A32 precision.
"""
logger.info(f"Skipping {len(skip_conv_layers)} conv layers and {len(skip_einsum_layers)} einsum layers")
w8config = ModuleLinearQuantizerConfig(
quantization_scheme="symmetric",
milestones=[0, 1000, 1000, 0],
activation_dtype=torch.float32)
conv_modules_config = {name: w8config for name in skip_conv_layers}
einsum_modules_config = {name: w8config for name in skip_einsum_layers}
module_name_config = {}
module_name_config.update(conv_modules_config)
module_name_config.update(einsum_modules_config)
config = LinearQuantizerConfig(
global_config=ModuleLinearQuantizerConfig(
quantization_scheme="symmetric",
milestones=[0, 1000, 1000, 0],
),
module_name_configs=module_name_config,
module_type_configs={
torch.cat: None,
torch.nn.GroupNorm: None,
torch.nn.SiLU: None,
torch.nn.functional.gelu: None,
operator.add: None,
},
)
return config
def quantize(model, config, calibration_data):
"""
Apply post training activation quantization to specified model, using calibration data
"""
submodules = dict(model.named_modules(remove_duplicate=True))
layer_norm_modules = [key for key, val in submodules.items() if isinstance(val, LayerNormANE)]
non_traceable_module_names = layer_norm_modules + [
"time_proj",
"time_embedding",
]
# Mark certain modules as non-traceable to make the UNet model fx traceable
config.non_traceable_module_names = non_traceable_module_names
config.preserved_attributes = ['config', 'device']
sample_input = calibration_data[0]
quantizer = LinearQuantizer(model, config)
logger.info("Preparing model for quantization")
prepared_model = quantizer.prepare(example_inputs=(sample_input,))
prepared_model.eval()
quantizer.step()
logger.info("Calibrate")
for idx, data in enumerate(calibration_data):
logger.info(f"Calibration data sample: {idx}")
prepared_model(*data)
logger.info("Finalize model")
quantized_model = quantizer.finalize()
return quantized_model
def get_quantizable_modules(unet):
quantizable_modules = []
for name, module in unet.named_modules():
if len(list(module.children())) > 0:
continue
if type(module) == torch.nn.modules.conv.Conv2d:
quantizable_modules.append(('conv', name))
if type(module) == Einsum:
quantizable_modules.append(('einsum', name))
return quantizable_modules
def recipe_overrides_for_inference_speedup(conv_layers, skipped_conv):
"""
Quantize the slowest conv layers, even if in skipped set based on PSNR, for good inference speedup
"""
for layer in conv_layers:
if "up_blocks" in layer and "resnets" in layer and "conv1" in layer:
if layer in skipped_conv:
logger.info(f"removing {layer}")
skipped_conv.remove(layer)
if "upsamplers" in layer:
if layer in skipped_conv:
logger.info(f"removing {layer}")
skipped_conv.remove(layer)
def recipe_overrides_for_quality(conv_layers, skipped_conv):
"""
Do not quantize out projection layers to avoid quantizing outputs of preceding concat layers.
Quantizing output of concat layers can lead to quality degradation, due to sharing of scales
across concat inputs, which can have varied ranges. Since this is a constraint enforced during
model conversion, it may not be captured in layer-wise PSNR analysis of PyTorch model.
"""
out_proj_layers = [layer for layer in conv_layers if "to_out" in layer]
for layer in out_proj_layers:
if layer not in skipped_conv:
logger.info(f"adding {layer}")
skipped_conv.add(layer)
def register_input_log_hook(unet, inputs):
"""
Register forward pre hook to save model inputs
"""
def hook(_, input):
input_copy = deepcopy(input)
input_copy = tuple(i.to('cpu') for i in input_copy)
inputs.append(input_copy)
# Return inputs unmodified
return input
return unet.register_forward_pre_hook(hook)
def generate_calibration_data(pipe, args, calibration_dir):
# Register forward pre hook to record unet inputs
unet_inputs = []
handle = register_input_log_hook(pipe.unet, unet_inputs)
# If directory doesn't exist, create it
os.makedirs(calibration_dir, exist_ok=True)
# Run calibration prompts through the pipeline and
# serialize recorded UNet model inputs
for prompt in CALIBRATION_DATA:
gen = torch.manual_seed(args.seed)
# run forward pass
pipe(prompt=prompt, generator=gen)
# save unet inputs
filename = "_".join(prompt.split(" ")) + "_" + str(args.seed) + ".pkl"
filepath = os.path.join(calibration_dir, filename)
with open(filepath, 'wb') as f:
pickle.dump(unet_inputs, f)
# clear
unet_inputs.clear()
handle.remove()
def register_input_preprocessing_hook(pipe):
"""
Register forward pre hook to convert UNet inputs from HuggingFace StableDiffusionPipeline
to match expected model inputs in UNet2DConditionModel defined in unet.py
"""
def hook(_, args, kwargs):
sample = args[0]
timestep = args[1]
if len(timestep.shape) == 0:
timestep = timestep[None]
timestep = timestep.expand(sample.shape[0])
encoder_hidden_states = kwargs["encoder_hidden_states"]
encoder_hidden_states = encoder_hidden_states.permute((0, 2, 1)).unsqueeze(2)
modified_args = (sample, timestep, encoder_hidden_states)
return (modified_args, {})
return pipe.unet.register_forward_pre_hook(hook, with_kwargs=True)
def prepare_pipe(pipe, unet):
"""
Create a new pipeline from `pipe` with `unet` as the noise predictor
"""
new_pipe = deepcopy(pipe)
unet.to(new_pipe.unet.device)
new_pipe.unet = unet
pre_hook_handle = register_input_preprocessing_hook(new_pipe)
return new_pipe, pre_hook_handle
def run_pipe(pipe):
gen = torch.manual_seed(args.seed)
kwargs = dict(
prompt=RANDOM_TEST_DATA,
output_type="latent",
generator=gen,
)
return np.array([latent.cpu().numpy() for latent in pipe(**kwargs).images])
def get_reference_pipeline(model_version):
# Initialize pipe
pipe = StableDiffusionPipeline.from_pretrained(
model_version,
use_safetensors=True,
use_auth_token=True,
)
DEFAULT_NUM_INFERENCE_STEPS = 50
pipe.scheduler.set_timesteps(DEFAULT_NUM_INFERENCE_STEPS)
# Initialize reference unet
unet_cls = unet.UNet2DConditionModel
reference_unet = unet_cls(**pipe.unet.config).eval()
reference_unet.load_state_dict(pipe.unet.state_dict())
# Initialize reference pipeline
ref_pipe, _ = prepare_pipe(pipe, reference_unet)
del pipe
gc.collect()
return ref_pipe
def main(args):
# Initialize reference pipeline
ref_pipe = get_reference_pipeline(args.model_version)
if torch.cuda.is_available():
device = "cuda"
else:
device = "cpu"
logger.debug(f"Placing pipe in {device}")
ref_pipe.to(device)
# Generate baseline outputs
ref_out = run_pipe(ref_pipe)
# Setup artifact file paths
os.makedirs(args.o, exist_ok=True)
recipe_json_path = os.path.join(args.o, f"{args.model_version.replace('/', '_')}_quantization_recipe.json")
calibration_dir = os.path.join(args.o, f"calibration_data_{args.model_version.replace('/', '_')}")
# Generate calibration data
if args.generate_calibration_data:
generate_calibration_data(ref_pipe, args, calibration_dir)
# Compute layer-wise PSNR
if args.layerwise_sensitivity:
logger.info("Compute Layer-wise PSNR")
quantizable_modules = get_quantizable_modules(ref_pipe.unet)
results = {
'conv': {},
'einsum': {},
'model_version': args.model_version
}
dataloader = unet_data_loader(calibration_dir, device, args.calibration_nsamples)
for module_type, module_name in tqdm(quantizable_modules):
logger.info(f"Quantizing UNet Layer: {module_name}")
config = quantize_module_config(module_name)
quantized_unet = quantize(ref_pipe.unet, config, dataloader)
# Generate outputs from quantized model
q_pipe, _ = prepare_pipe(ref_pipe, quantized_unet)
test_out = run_pipe(q_pipe)
psnr = [float(f"{compute_psnr(r, t):.1f}") for r, t in zip(ref_out, test_out)]
logger.info(f"PSNR: {psnr}")
avg_psnr = sum(psnr) / len(psnr)
logger.info(f"AVG PSNR: {avg_psnr}")
results[module_type][module_name] = avg_psnr
del quantized_unet
del q_pipe
gc.collect()
with open(recipe_json_path, 'w') as f:
json.dump(results, f, indent=2)
if args.quantize_pytorch:
logger.info("Quantizing UNet PyTorch model")
dataloader = unet_data_loader(calibration_dir, device, args.calibration_nsamples)
with open(recipe_json_path, "r") as f:
results = json.load(f)
logger.info(f"Conv PSNR threshold: {args.conv_psnr}, Attn PSNR threshold: {args.attn_psnr}")
skipped_conv = set([layer for layer, psnr in results['conv'].items() if psnr < args.conv_psnr])
skipped_einsum = set([layer for layer, psnr in results['einsum'].items() if psnr < args.attn_psnr])
# Apply some overrides on PSNR based recipe for inference and quality improvements
# Users can disable these selectively based on specific targets
recipe_overrides_for_inference_speedup(results['conv'].keys(), skipped_conv)
recipe_overrides_for_quality(results['conv'].keys(), skipped_conv)
config = quantize_cumulative_config(skipped_conv, skipped_einsum)
quantized_unet = quantize(ref_pipe.unet, config, dataloader)
# Generate outputs from quantized model
q_pipe, handle = prepare_pipe(ref_pipe, quantized_unet)
test_out = run_pipe(q_pipe)
psnr = [float(f"{compute_psnr(r, t):.1f}") for r, t in zip(ref_out, test_out)]
logger.info(f"PSNR: {psnr}")
avg_psnr = sum(psnr) / len(psnr)
logger.info(f"AVG PSNR: {avg_psnr}")
handle.remove()
quantized_unet.to('cpu')
sample_unet_input = {
"sample": dataloader[0][0].to('cpu'),
"timestep": dataloader[0][1].to('cpu'),
"encoder_hidden_states": dataloader[0][2].to('cpu'),
}
logger.info("JIT tracing quantized model")
traced_model = torch.jit.trace(quantized_unet, example_inputs=list(sample_unet_input.values()))
logger.info("Converting to CoreML")
coreml_sample_unet_input = {
k: v.numpy().astype(np.float16)
for k, v in sample_unet_input.items()
}
coreml_model = convert_to_coreml(traced_model, coreml_sample_unet_input)
coreml_filename = f"Stable_Diffusion_version_{args.model_version.replace('/', '_')}_unet.mlpackage"
coreml_model.save(os.path.join(args.o, coreml_filename))
del q_pipe
del ref_pipe
gc.collect()
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"-o",
required=True,
help="Output directory to save calibration data and quantization artifacts"
)
parser.add_argument(
"--model-version",
required=True,
choices=("runwayml/stable-diffusion-v1-5", "stabilityai/stable-diffusion-2-1-base"),
help=
("The pre-trained model checkpoint and configuration to restore"
))
parser.add_argument(
"--generate-calibration-data",
action="store_true",
help="Generate calibration data for UNet model"
)
parser.add_argument(
"--layerwise-sensitivity",
action="store_true",
help="Compute compression sensitivity per-layer, by quantizing one layer at a time"
)
parser.add_argument(
"--quantize-pytorch",
action="store_true",
help="Generate activation quantized UNet model by quantizing layers above specified PSNR threshold"
)
parser.add_argument(
"--calibration-nsamples",
type=int,
help="Number of samples to use for calibrating UNet model"
)
parser.add_argument("--seed",
"-s",
default=50,
type=int,
help="Random seed to be able to reproduce results"
)
parser.add_argument("--conv-psnr",
default=40.0,
type=float,
help="PSNR threshold for convolutional layers (default for stabilityai/stable-diffusion-2-1-base)"
)
parser.add_argument("--attn-psnr",
default=30.0,
type=float,
help="PSNR threshold for attention (Einsum) layers (default for stabilityai/stable-diffusion-2-1-base)"
)
args = parser.parse_args()
main(args)
================================================
FILE: python_coreml_stable_diffusion/attention.py
================================================
import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
import torch
import math
SPLIT_SOFTMAX = False
def softmax(x, dim):
# Reduction max
max_x = x.max(dim=dim, keepdim=True).values
# EW sub
x -= max_x
# Scale for EXP to EXP2, Activation EXP2
scaled_x = x * (1 / math.log(2))
exp_act = torch.exp2(scaled_x)
# Reduction Sum + Inv
exp_sum_inv = 1 / exp_act.sum(dim=dim, keepdims=True)
# EW Mult
return exp_act * exp_sum_inv
def split_einsum(q, k, v, mask, heads, dim_head):
""" Attention Implementation backing AttentionImplementations.SPLIT_EINSUM
- Implements https://machinelearning.apple.com/research/neural-engine-transformers
- Recommended for ANE
- Marginally slower on GPU
"""
mh_q = [
q[:, head_idx * dim_head:(head_idx + 1) *
dim_head, :, :] for head_idx in range(heads)
] # (bs, dim_head, 1, max_seq_length) * heads
k = k.transpose(1, 3)
mh_k = [
k[:, :, :,
head_idx * dim_head:(head_idx + 1) * dim_head]
for head_idx in range(heads)
] # (bs, max_seq_length, 1, dim_head) * heads
mh_v = [
v[:, head_idx * dim_head:(head_idx + 1) *
dim_head, :, :] for head_idx in range(heads)
] # (bs, dim_head, 1, max_seq_length) * heads
attn_weights = [
torch.einsum("bchq,bkhc->bkhq", [qi, ki]) * (dim_head**-0.5)
for qi, ki in zip(mh_q, mh_k)
] # (bs, max_seq_length, 1, max_seq_length) * heads
if mask is not None:
for head_idx in range(heads):
attn_weights[head_idx] = attn_weights[head_idx] + mask
if SPLIT_SOFTMAX:
attn_weights = [
softmax(aw, dim=1) for aw in attn_weights
] # (bs, max_seq_length, 1, max_seq_length) * heads
else:
attn_weights = [
aw.softmax(dim=1) for aw in attn_weights
] # (bs, max_seq_length, 1, max_seq_length) * heads
attn = [
torch.einsum("bkhq,bchk->bchq", wi, vi)
for wi, vi in zip(attn_weights, mh_v)
] # (bs, dim_head, 1, max_seq_length) * heads
attn = torch.cat(attn, dim=1) # (bs, dim, 1, max_seq_length)
return attn
CHUNK_SIZE = 512
def split_einsum_v2(q, k, v, mask, heads, dim_head):
""" Attention Implementation backing AttentionImplementations.SPLIT_EINSUM_V2
- Implements https://machinelearning.apple.com/research/neural-engine-transformers
- Recommended for ANE
- Marginally slower on GPU
- Chunks the query sequence to avoid large intermediate tensors and improves ANE performance
"""
query_seq_length = q.size(3)
num_chunks = query_seq_length // CHUNK_SIZE
if num_chunks == 0:
logger.info(
"AttentionImplementations.SPLIT_EINSUM_V2: query sequence too short to chunk "
f"({query_seq_length}<{CHUNK_SIZE}), fall back to AttentionImplementations.SPLIT_EINSUM (safe to ignore)")
return split_einsum(q, k, v, mask, heads, dim_head)
logger.info(
"AttentionImplementations.SPLIT_EINSUM_V2: Splitting query sequence length of "
f"{query_seq_length} into {num_chunks} chunks")
mh_q = [
q[:, head_idx * dim_head:(head_idx + 1) *
dim_head, :, :] for head_idx in range(heads)
] # (bs, dim_head, 1, max_seq_length) * heads
# Chunk the query sequence for each head
mh_q_chunked = [
[h_q[..., chunk_idx * CHUNK_SIZE:(chunk_idx + 1) * CHUNK_SIZE] for chunk_idx in range(num_chunks)]
for h_q in mh_q
] # ((bs, dim_head, 1, QUERY_SEQ_CHUNK_SIZE) * num_chunks) * heads
k = k.transpose(1, 3)
mh_k = [
k[:, :, :,
head_idx * dim_head:(head_idx + 1) * dim_head]
for head_idx in range(heads)
] # (bs, max_seq_length, 1, dim_head) * heads
mh_v = [
v[:, head_idx * dim_head:(head_idx + 1) *
dim_head, :, :] for head_idx in range(heads)
] # (bs, dim_head, 1, max_seq_length) * heads
attn_weights = [
[
torch.einsum("bchq,bkhc->bkhq", [qi_chunk, ki]) * (dim_head**-0.5)
for qi_chunk in h_q_chunked
] for h_q_chunked, ki in zip(mh_q_chunked, mh_k)
] # ((bs, max_seq_length, 1, chunk_size) * num_chunks) * heads
attn_weights = [
[aw_chunk.softmax(dim=1) for aw_chunk in aw_chunked]
for aw_chunked in attn_weights
] # ((bs, max_seq_length, 1, chunk_size) * num_chunks) * heads
attn = [
[
torch.einsum("bkhq,bchk->bchq", wi_chunk, vi)
for wi_chunk in wi_chunked
] for wi_chunked, vi in zip(attn_weights, mh_v)
] # ((bs, dim_head, 1, chunk_size) * num_chunks) * heads
attn = torch.cat([
torch.cat(attn_chunked, dim=3) for attn_chunked in attn
], dim=1) # (bs, dim, 1, max_seq_length)
return attn
def original(q, k, v, mask, heads, dim_head):
""" Attention Implementation backing AttentionImplementations.ORIGINAL
- Not recommended for ANE
- Recommended for GPU
"""
bs = q.size(0)
mh_q = q.view(bs, heads, dim_head, -1)
mh_k = k.view(bs, heads, dim_head, -1)
mh_v = v.view(bs, heads, dim_head, -1)
attn_weights = torch.einsum("bhcq,bhck->bhqk", [mh_q, mh_k])
attn_weights.mul_(dim_head**-0.5)
if mask is not None:
attn_weights = attn_weights + mask
attn_weights = attn_weights.softmax(dim=3)
attn = torch.einsum("bhqk,bhck->bhcq", [attn_weights, mh_v])
attn = attn.contiguous().view(bs, heads * dim_head, 1, -1)
return attn
================================================
FILE: python_coreml_stable_diffusion/chunk_mlprogram.py
================================================
#
# For licensing see accompanying LICENSE.md file.
# Copyright (C) 2022 Apple Inc. All Rights Reserved.
#
import argparse
from collections import OrderedDict
import coremltools as ct
from coremltools.converters.mil import Block, Program, Var
from coremltools.converters.mil.frontend.milproto.load import load as _milproto_to_pymil
from coremltools.converters.mil.mil import Builder as mb
from coremltools.converters.mil.mil import Placeholder
from coremltools.converters.mil.mil import types as types
from coremltools.converters.mil.mil.passes.helper import block_context_manager
from coremltools.converters.mil.mil.passes.pass_registry import PASS_REGISTRY
from coremltools.converters.mil.testing_utils import random_gen_input_feature_type
import gc
import logging
logging.basicConfig()
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
import numpy as np
import os
from python_coreml_stable_diffusion import torch2coreml
import shutil
import time
def _verify_output_correctness_of_chunks(full_model,
first_chunk_model=None,
second_chunk_model=None,
pipeline_model=None,):
""" Verifies the end-to-end output correctness of full (original) model versus chunked models
"""
# Generate inputs for first chunk and full model
input_dict = {}
for input_desc in full_model._spec.description.input:
input_dict[input_desc.name] = random_gen_input_feature_type(input_desc)
# Generate outputs for full model
outputs_from_full_model = full_model.predict(input_dict)
if pipeline_model is not None:
outputs_from_pipeline_model = pipeline_model.predict(input_dict)
final_outputs = outputs_from_pipeline_model
elif first_chunk_model is not None and second_chunk_model is not None:
# Generate outputs for first chunk
outputs_from_first_chunk_model = first_chunk_model.predict(input_dict)
# Prepare inputs for second chunk model from first chunk's outputs and regular inputs
second_chunk_input_dict = {}
for input_desc in second_chunk_model._spec.description.input:
if input_desc.name in outputs_from_first_chunk_model:
second_chunk_input_dict[
input_desc.name] = outputs_from_first_chunk_model[
input_desc.name]
else:
second_chunk_input_dict[input_desc.name] = input_dict[
input_desc.name]
# Generate output for second chunk model
outputs_from_second_chunk_model = second_chunk_model.predict(
second_chunk_input_dict)
final_outputs = outputs_from_second_chunk_model
else:
raise ValueError
# Verify correctness across all outputs from second chunk and full model
for out_name in outputs_from_full_model.keys():
torch2coreml.report_correctness(
original_outputs=outputs_from_full_model[out_name],
final_outputs=final_outputs[out_name],
log_prefix=f"{out_name}")
def _load_prog_from_mlmodel(model):
""" Load MIL Program from an MLModel
"""
model_spec = model.get_spec()
start_ = time.time()
logger.info(
"Loading MLModel object into a MIL Program object (including the weights).."
)
prog = _milproto_to_pymil(
model_spec=model_spec,
specification_version=model_spec.specificationVersion,
file_weights_dir=model.weights_dir,
)
logger.info(f"Program loaded in {time.time() - start_:.1f} seconds")
return prog
def _get_op_idx_split_location(prog: Program):
""" Find the op that approximately bisects the graph as measure by weights size on each side
"""
main_block = prog.functions["main"]
main_block.operations = list(main_block.operations)
total_size_in_mb = 0
for op in main_block.operations:
if op.op_type == "const" and isinstance(op.val.val, np.ndarray):
size_in_mb = op.val.val.size * op.val.val.itemsize / (1024 * 1024)
total_size_in_mb += size_in_mb
half_size = total_size_in_mb / 2
# Find the first non const op (single child), where the total cumulative size exceeds
# the half size for the first time
cumulative_size_in_mb = 0
for op in main_block.operations:
if op.op_type == "const" and isinstance(op.val.val, np.ndarray):
size_in_mb = op.val.val.size * op.val.val.itemsize / (1024 * 1024)
cumulative_size_in_mb += size_in_mb
# Note: The condition "not op.op_type.startswith("const")" is to make sure that the
# incision op is neither of type "const" nor "constexpr_*" ops that
# are used to store compressed weights
if (cumulative_size_in_mb > half_size and not op.op_type.startswith("const")
and len(op.outputs) == 1
and len(op.outputs[0].child_ops) == 1):
op_idx = main_block.operations.index(op)
return op_idx, cumulative_size_in_mb, total_size_in_mb
def _get_first_chunk_outputs(block, op_idx):
# Get the list of all vars that go across from first program (all ops from 0 to op_idx (inclusive))
# to the second program (all ops from op_idx+1 till the end). These all vars need to be made the output
# of the first program and the input of the second program
boundary_vars = set()
block.operations = list(block.operations)
for i in range(op_idx + 1):
op = block.operations[i]
if not op.op_type.startswith("const"):
for var in op.outputs:
if var.val is None: # only consider non const vars
for child_op in var.child_ops:
child_op_idx = block.operations.index(child_op)
if child_op_idx > op_idx:
boundary_vars.add(var)
return list(boundary_vars)
@block_context_manager
def _add_fp32_casts(block, boundary_vars):
new_boundary_vars = []
for var in boundary_vars:
if var.dtype != types.fp16:
new_boundary_vars.append(var)
else:
fp32_var = mb.cast(x=var, dtype="fp32", name=var.name)
new_boundary_vars.append(fp32_var)
return new_boundary_vars
def _make_first_chunk_prog(prog, op_idx):
""" Build first chunk by declaring early outputs and removing unused subgraph
"""
block = prog.functions["main"]
boundary_vars = _get_first_chunk_outputs(block, op_idx)
# Due to possible numerical issues, cast any fp16 var to fp32
new_boundary_vars = _add_fp32_casts(block, boundary_vars)
block.outputs.clear()
block.set_outputs(new_boundary_vars)
PASS_REGISTRY["common::dead_code_elimination"](prog)
return prog
def _make_second_chunk_prog(prog, op_idx):
""" Build second chunk by rebuilding a pristine MIL Program from MLModel
"""
block = prog.functions["main"]
block.opset_version = ct.target.iOS16
# First chunk outputs are second chunk inputs (e.g. skip connections)
boundary_vars = _get_first_chunk_outputs(block, op_idx)
# This op will not be included in this program. Its output var will be made into an input
block.operations = list(block.operations)
boundary_op = block.operations[op_idx]
# Add all boundary ops as inputs
with block:
for var in boundary_vars:
new_placeholder = Placeholder(
sym_shape=var.shape,
dtype=var.dtype if var.dtype != types.fp16 else types.fp32,
name=var.name,
)
block._input_dict[
new_placeholder.outputs[0].name] = new_placeholder.outputs[0]
block.function_inputs = tuple(block._input_dict.values())
new_var = None
if var.dtype == types.fp16:
new_var = mb.cast(x=new_placeholder.outputs[0],
dtype="fp16",
before_op=var.op)
else:
new_var = new_placeholder.outputs[0]
block.replace_uses_of_var_after_op(
anchor_op=boundary_op,
old_var=var,
new_var=new_var,
# This is needed if the program contains "constexpr_*" ops. In normal cases, there are stricter
# rules for removing them, and their presence may prevent replacing this var.
# However in this case, since we want to remove all the ops in chunk 1, we can safely
# set this to True.
force_replace=True,
)
PASS_REGISTRY["common::dead_code_elimination"](prog)
# Remove any unused inputs
new_input_dict = OrderedDict()
for k, v in block._input_dict.items():
if len(v.child_ops) > 0:
new_input_dict[k] = v
block._input_dict = new_input_dict
block.function_inputs = tuple(block._input_dict.values())
return prog
def _legacy_model_chunking(args):
# TODO: Remove this method after setting the coremltools dependency >= 8.0
os.makedirs(args.o, exist_ok=True)
# Check filename extension
mlpackage_name = os.path.basename(args.mlpackage_path)
name, ext = os.path.splitext(mlpackage_name)
assert ext == ".mlpackage", f"`--mlpackage-path` (args.mlpackage_path) is not an .mlpackage file"
# Load CoreML model
logger.info("Loading model from {}".format(args.mlpackage_path))
start_ = time.time()
model = ct.models.MLModel(
args.mlpackage_path,
compute_units=ct.ComputeUnit.CPU_ONLY,
)
logger.info(
f"Loading {args.mlpackage_path} took {time.time() - start_:.1f} seconds"
)
# Load the MIL Program from MLModel
prog = _load_prog_from_mlmodel(model)
# Compute the incision point by bisecting the program based on weights size
op_idx, first_chunk_weights_size, total_weights_size = _get_op_idx_split_location(
prog)
main_block = prog.functions["main"]
incision_op = main_block.operations[op_idx]
logger.info(f"{args.mlpackage_path} will chunked into two pieces.")
logger.info(
f"The incision op: name={incision_op.name}, type={incision_op.op_type}, index={op_idx}/{len(main_block.operations)}"
)
logger.info(f"First chunk size = {first_chunk_weights_size:.2f} MB")
logger.info(
f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB"
)
# Build first chunk (in-place modifies prog by declaring early exits and removing unused subgraph)
prog_chunk1 = _make_first_chunk_prog(prog, op_idx)
# Build the second chunk
prog_chunk2 = _make_second_chunk_prog(_load_prog_from_mlmodel(model),
op_idx)
if not args.check_output_correctness:
# Original model no longer needed in memory
del model
gc.collect()
# Convert the MIL Program objects into MLModels
logger.info("Converting the two programs")
model_chunk1 = ct.convert(
prog_chunk1,
convert_to="mlprogram",
compute_units=ct.ComputeUnit.CPU_ONLY,
minimum_deployment_target=ct.target.iOS16,
)
del prog_chunk1
gc.collect()
logger.info("Conversion of first chunk done.")
model_chunk2 = ct.convert(
prog_chunk2,
convert_to="mlprogram",
compute_units=ct.ComputeUnit.CPU_ONLY,
minimum_deployment_target=ct.target.iOS16,
)
del prog_chunk2
gc.collect()
logger.info("Conversion of second chunk done.")
# Verify output correctness
if args.check_output_correctness:
logger.info("Verifying output correctness of chunks")
_verify_output_correctness_of_chunks(
full_model=model,
first_chunk_model=model_chunk1,
second_chunk_model=model_chunk2,
)
if args.merge_chunks_in_pipeline_model:
# Make a single pipeline model to manage the model chunks
pipeline_model = ct.utils.make_pipeline(model_chunk1, model_chunk2)
out_path_pipeline = os.path.join(args.o, name + "_chunked_pipeline.mlpackage")
# Save and reload to ensure CPU placement
pipeline_model.save(out_path_pipeline)
pipeline_model = ct.models.MLModel(out_path_pipeline, compute_units=ct.ComputeUnit.CPU_ONLY)
if args.check_output_correctness:
logger.info("Verifying output correctness of pipeline model")
_verify_output_correctness_of_chunks(
full_model=model,
pipeline_model=pipeline_model,
)
else:
# Save the chunked models to disk
out_path_chunk1 = os.path.join(args.o, name + "_chunk1.mlpackage")
out_path_chunk2 = os.path.join(args.o, name + "_chunk2.mlpackage")
logger.info(
f"Saved chunks in {args.o} with the suffix _chunk1.mlpackage and _chunk2.mlpackage"
)
model_chunk1.save(out_path_chunk1)
model_chunk2.save(out_path_chunk2)
logger.info("Done.")
def main(args):
ct_version = ct.__version__
if ct_version != "8.0b2" and ct_version < "8.0":
# With coremltools version <= 8.0b1,
# we use the legacy implementation.
# TODO: Remove the logic after setting the coremltools dependency >= 8.0.
logger.info(
f"coremltools version {ct_version} detected. Recommended upgrading the package version to "
f"'8.0b2' when you running chunk_mlprogram.py script for the latest supports and bug fixes."
)
_legacy_model_chunking(args)
else:
# Starting from coremltools==8.0b2, there is this `bisect_model` API that
# we can directly call into.
from coremltools.models.utils import bisect_model
logger.info(f"Start chunking model {args.mlpackage_path} into two pieces.")
ct.models.utils.bisect_model(
model=args.mlpackage_path,
output_dir=args.o,
merge_chunks_to_pipeline=args.merge_chunks_in_pipeline_model,
check_output_correctness=args.check_output_correctness,
)
logger.info(f"Model chunking is done.")
# Remove original (non-chunked) model if requested
if args.remove_original:
logger.info(
"Removing original (non-chunked) model at {args.mlpackage_path}")
shutil.rmtree(args.mlpackage_path)
logger.info("Done.")
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--mlpackage-path",
required=True,
help=
"Path to the mlpackage file to be split into two mlpackages of approximately same file size.",
)
parser.add_argument(
"-o",
required=True,
help=
"Path to output directory where the two model chunks should be saved.",
)
parser.add_argument(
"--remove-original",
action="store_true",
help=
"If specified, removes the original (non-chunked) model to avoid duplicating storage."
)
parser.add_argument(
"--check-output-correctness",
action="store_true",
help=
("If specified, compares the outputs of original Core ML model with that of pipelined CoreML model chunks and reports PSNR in dB. ",
"Enabling this feature uses more memory. Disable it if your machine runs out of memory."
))
parser.add_argument(
"--merge-chunks-in-pipeline-model",
action="store_true",
help=
("If specified, model chunks are managed inside a single pipeline model for easier asset maintenance"
))
args = parser.parse_args()
main(args)
================================================
FILE: python_coreml_stable_diffusion/controlnet.py
================================================
#
# For licensing see accompanying LICENSE.md file.
# Copyright (C) 2022 Apple Inc. All Rights Reserved.
#
from diffusers.configuration_utils import ConfigMixin, register_to_config
from diffusers import ModelMixin
import torch
import torch.nn as nn
import torch.nn.functional as F
from .unet import Timesteps, TimestepEmbedding, get_down_block, UNetMidBlock2DCrossAttn, linear_to_conv2d_map
class ControlNetConditioningEmbedding(nn.Module):
def __init__(
self,
conditioning_embedding_channels,
conditioning_channels=3,
block_out_channels=(16, 32, 96, 256),
):
super().__init__()
self.conv_in = nn.Conv2d(conditioning_channels, block_out_channels[0], kernel_size=3, padding=1)
self.blocks = nn.ModuleList([])
for i in range(len(block_out_channels) - 1):
channel_in = block_out_channels[i]
channel_out = block_out_channels[i + 1]
self.blocks.append(nn.Conv2d(channel_in, channel_in, kernel_size=3, padding=1))
self.blocks.append(nn.Conv2d(channel_in, channel_out, kernel_size=3, padding=1, stride=2))
self.conv_out = nn.Conv2d(block_out_channels[-1], conditioning_embedding_channels, kernel_size=3, padding=1)
def forward(self, conditioning):
embedding = self.conv_in(conditioning)
embedding = F.silu(embedding)
for block in self.blocks:
embedding = block(embedding)
embedding = F.silu(embedding)
embedding = self.conv_out(embedding)
return embedding
class ControlNetModel(ModelMixin, ConfigMixin):
@register_to_config
def __init__(
self,
in_channels=4,
flip_sin_to_cos=True,
freq_shift=0,
down_block_types=(
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D",
"DownBlock2D",
),
only_cross_attention=False,
block_out_channels=(320, 640, 1280, 1280),
layers_per_block=2,
downsample_padding=1,
mid_block_scale_factor=1,
act_fn="silu",
norm_num_groups=32,
norm_eps=1e-5,
cross_attention_dim=1280,
transformer_layers_per_block=1,
attention_head_dim=8,
use_linear_projection=False,
upcast_attention=False,
resnet_time_scale_shift="default",
conditioning_embedding_out_channels=(16, 32, 96, 256),
**kwargs,
):
super().__init__()
# Check inputs
if len(block_out_channels) != len(down_block_types):
raise ValueError(
f"Must provide the same number of `block_out_channels` as `down_block_types`. `block_out_channels`: {block_out_channels}. `down_block_types`: {down_block_types}."
)
if not isinstance(only_cross_attention, bool) and len(only_cross_attention) != len(down_block_types):
raise ValueError(
f"Must provide the same number of `only_cross_attention` as `down_block_types`. `only_cross_attention`: {only_cross_attention}. `down_block_types`: {down_block_types}."
)
if not isinstance(attention_head_dim, int) and len(attention_head_dim) != len(down_block_types):
raise ValueError(
f"Must provide the same number of `attention_head_dim` as `down_block_types`. `attention_head_dim`: {attention_head_dim}. `down_block_types`: {down_block_types}."
)
self._register_load_state_dict_pre_hook(linear_to_conv2d_map)
# input
conv_in_kernel = 3
conv_in_padding = (conv_in_kernel - 1) // 2
self.conv_in = nn.Conv2d(
in_channels, block_out_channels[0], kernel_size=conv_in_kernel, padding=conv_in_padding
)
# time
time_embed_dim = block_out_channels[0] * 4
self.time_proj = Timesteps(block_out_channels[0], flip_sin_to_cos, freq_shift)
timestep_input_dim = block_out_channels[0]
self.time_embedding = TimestepEmbedding(
timestep_input_dim,
time_embed_dim,
)
# control net conditioning embedding
self.controlnet_cond_embedding = ControlNetConditioningEmbedding(
conditioning_embedding_channels=block_out_channels[0],
block_out_channels=conditioning_embedding_out_channels,
)
self.down_blocks = nn.ModuleList([])
self.controlnet_down_blocks = nn.ModuleList([])
if isinstance(only_cross_attention, bool):
only_cross_attention = [only_cross_attention] * len(down_block_types)
if isinstance(attention_head_dim, int):
attention_head_dim = (attention_head_dim,) * len(down_block_types)
if isinstance(transformer_layers_per_block, int):
transformer_layers_per_block = [transformer_layers_per_block] * len(down_block_types)
# down
output_channel = block_out_channels[0]
controlnet_block = nn.Conv2d(output_channel, output_channel, kernel_size=1)
self.controlnet_down_blocks.append(controlnet_block)
for i, down_block_type in enumerate(down_block_types):
input_channel = output_channel
output_channel = block_out_channels[i]
is_final_block = i == len(block_out_channels) - 1
down_block = get_down_block(
down_block_type,
transformer_layers_per_block=transformer_layers_per_block[i],
num_layers=layers_per_block,
in_channels=input_channel,
out_channels=output_channel,
temb_channels=time_embed_dim,
resnet_eps=norm_eps,
resnet_act_fn=act_fn,
cross_attention_dim=cross_attention_dim,
attn_num_head_channels=attention_head_dim[i],
downsample_padding=downsample_padding,
add_downsample=not is_final_block,
)
self.down_blocks.append(down_block)
for _ in range(layers_per_block):
controlnet_block = nn.Conv2d(output_channel, output_channel, kernel_size=1)
self.controlnet_down_blocks.append(controlnet_block)
if not is_final_block:
controlnet_block = nn.Conv2d(output_channel, output_channel, kernel_size=1)
self.controlnet_down_blocks.append(controlnet_block)
# mid
mid_block_channel = block_out_channels[-1]
controlnet_block = nn.Conv2d(mid_block_channel, mid_block_channel, kernel_size=1)
self.controlnet_mid_block = controlnet_block
self.mid_block = UNetMidBlock2DCrossAttn(
in_channels=mid_block_channel,
temb_channels=time_embed_dim,
resnet_eps=norm_eps,
resnet_act_fn=act_fn,
output_scale_factor=mid_block_scale_factor,
resnet_time_scale_shift=resnet_time_scale_shift,
cross_attention_dim=cross_attention_dim,
attn_num_head_channels=attention_head_dim[-1],
resnet_groups=norm_num_groups,
use_linear_projection=use_linear_projection,
upcast_attention=upcast_attention,
)
def get_num_residuals(self):
num_res = 2 # initial sample + mid block
for down_block in self.down_blocks:
num_res += len(down_block.resnets)
if hasattr(down_block, "downsamplers") and down_block.downsamplers is not None:
num_res += len(down_block.downsamplers)
return num_res
def forward(
self,
sample,
timestep,
encoder_hidden_states,
controlnet_cond,
):
# 1. time
t_emb = self.time_proj(timestep)
emb = self.time_embedding(t_emb)
# 2. pre-process
sample = self.conv_in(sample)
controlnet_cond = self.controlnet_cond_embedding(controlnet_cond)
sample += controlnet_cond
# 3. down
down_block_res_samples = (sample,)
for downsample_block in self.down_blocks:
if hasattr(downsample_block, "attentions") and downsample_block.attentions is not None:
sample, res_samples = downsample_block(
hidden_states=sample,
temb=emb,
encoder_hidden_states=encoder_hidden_states,
)
else:
sample, res_samples = downsample_block(hidden_states=sample, temb=emb)
down_block_res_samples += res_samples
# 4. mid
if self.mid_block is not None:
sample = self.mid_block(
sample,
emb,
encoder_hidden_states=encoder_hidden_states,
)
# 5. Control net blocks
controlnet_down_block_res_samples = ()
for down_block_res_sample, controlnet_block in zip(down_block_res_samples, self.controlnet_down_blocks):
down_block_res_sample = controlnet_block(down_block_res_sample)
controlnet_down_block_res_samples += (down_block_res_sample,)
down_block_res_samples = controlnet_down_block_res_samples
mid_block_res_sample = self.controlnet_mid_block(sample)
return down_block_res_samples, mid_block_res_sample
================================================
FILE: python_coreml_stable_diffusion/coreml_model.py
================================================
#
# For licensing see accompanying LICENSE.md file.
# Copyright (C) 2022 Apple Inc. All Rights Reserved.
#
import coremltools as ct
import logging
import json
logging.basicConfig()
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
import numpy as np
import os
import time
import subprocess
import sys
def _macos_version():
"""
Returns macOS version as a tuple of integers. On non-Macs, returns an empty tuple.
"""
if sys.platform == "darwin":
try:
ver_str = subprocess.run(["sw_vers", "-productVersion"], stdout=subprocess.PIPE).stdout.decode('utf-8').strip('\n')
return tuple([int(v) for v in ver_str.split(".")])
except:
raise Exception("Unable to determine the macOS version")
return ()
class CoreMLModel:
""" Wrapper for running CoreML models using coremltools
"""
def __init__(self, model_path, compute_unit, sources='packages', optimization_hints=None):
logger.info(f"Loading {model_path}")
start = time.time()
if sources == 'packages':
assert os.path.exists(model_path) and model_path.endswith(".mlpackage")
self.model = ct.models.MLModel(
model_path,
compute_units=ct.ComputeUnit[compute_unit],
optimization_hints=optimization_hints,
)
DTYPE_MAP = {
65552: np.float16,
65568: np.float32,
131104: np.int32,
}
self.expected_inputs = {
input_tensor.name: {
"shape": tuple(input_tensor.type.multiArrayType.shape),
"dtype": DTYPE_MAP[input_tensor.type.multiArrayType.dataType],
}
for input_tensor in self.model._spec.description.input
}
elif sources == 'compiled':
assert os.path.exists(model_path) and model_path.endswith(".mlmodelc")
self.model = ct.models.CompiledMLModel(
model_path,
compute_units=ct.ComputeUnit[compute_unit],
optimization_hints=optimization_hints,
)
# Grab expected inputs from metadata.json
with open(os.path.join(model_path, 'metadata.json'), 'r') as f:
config = json.load(f)[0]
self.expected_inputs = {
input_tensor['name']: {
"shape": tuple(eval(input_tensor['shape'])),
"dtype": np.dtype(input_tensor['dataType'].lower()),
}
for input_tensor in config['inputSchema']
}
else:
raise ValueError(f'Expected `packages` or `compiled` for sources, received {sources}')
load_time = time.time() - start
logger.info(f"Done. Took {load_time:.1f} seconds.")
if load_time > LOAD_TIME_INFO_MSG_TRIGGER:
logger.info(
"Loading a CoreML model through coremltools triggers compilation every time. "
"The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load."
)
def _verify_inputs(self, **kwargs):
for k, v in kwargs.items():
if k in self.expected_inputs:
if not isinstance(v, np.ndarray):
raise TypeError(
f"Expected numpy.ndarray, got {v} for input: {k}")
expected_dtype = self.expected_inputs[k]["dtype"]
if not v.dtype == expected_dtype:
raise TypeError(
f"Expected dtype {expected_dtype}, got {v.dtype} for input: {k}"
)
expected_shape = self.expected_inputs[k]["shape"]
if not v.shape == expected_shape:
raise TypeError(
f"Expected shape {expected_shape}, got {v.shape} for input: {k}"
)
else:
raise ValueError(f"Received unexpected input kwarg: {k}")
def __call__(self, **kwargs):
self._verify_inputs(**kwargs)
return self.model.predict(kwargs)
LOAD_TIME_INFO_MSG_TRIGGER = 10 # seconds
def get_resource_type(resources_dir: str) -> str:
"""
Detect resource type based on filepath extensions.
returns:
`packages`: for .mlpackage resources
'compiled`: for .mlmodelc resources
"""
directories = [f for f in os.listdir(resources_dir) if os.path.isdir(os.path.join(resources_dir, f))]
# consider directories ending with extension
extensions = set([os.path.splitext(e)[1] for e in directories if os.path.splitext(e)[1]])
# if one extension present we may be able to infer sources type
if len(set(extensions)) == 1:
extension = extensions.pop()
else:
raise ValueError(f'Multiple file extensions found at {resources_dir}.'
f'Cannot infer resource type from contents.')
if extension == '.mlpackage':
sources = 'packages'
elif extension == '.mlmodelc':
sources = 'compiled'
else:
raise ValueError(f'Did not find .mlpackage or .mlmodelc at {resources_dir}')
return sources
def _load_mlpackage(submodule_name,
mlpackages_dir,
model_version,
compute_unit,
sources=None):
"""
Load Core ML (mlpackage) models from disk (As exported by torch2coreml.py)
"""
# if sources not provided, attempt to infer `packages` or `compiled` from the
# resources directory
if sources is None:
sources = get_resource_type(mlpackages_dir)
if sources == 'packages':
logger.info(f"Loading {submodule_name} mlpackage")
fname = f"Stable_Diffusion_version_{model_version}_{submodule_name}.mlpackage".replace(
"/", "_")
mlpackage_path = os.path.join(mlpackages_dir, fname)
if not os.path.exists(mlpackage_path):
raise FileNotFoundError(
f"{submodule_name} CoreML model doesn't exist at {mlpackage_path}")
elif sources == 'compiled':
logger.info(f"Loading {submodule_name} mlmodelc")
# FixMe: Submodule names and compiled resources names differ. Can change if names match in the future.
submodule_names = ["text_encoder", "text_encoder_2", "unet", "vae_decoder", "vae_encoder", "safety_checker"]
compiled_names = ['TextEncoder', 'TextEncoder2', 'Unet', 'VAEDecoder', 'VAEEncoder', 'SafetyChecker']
name_map = dict(zip(submodule_names, compiled_names))
cname = name_map[submodule_name] + '.mlmodelc'
mlpackage_path = os.path.join(mlpackages_dir, cname)
if not os.path.exists(mlpackage_path):
raise FileNotFoundError(
f"{submodule_name} CoreML model doesn't exist at {mlpackage_path}")
# On macOS 15+, set fast prediction optimization hint for the unet.
optimization_hints = None
if submodule_name == "unet" and _macos_version() >= (15, 0):
optimization_hints = {"specializationStrategy": ct.SpecializationStrategy.FastPrediction}
return CoreMLModel(mlpackage_path,
compute_unit,
sources=sources,
optimization_hints=optimization_hints)
def _load_mlpackage_controlnet(mlpackages_dir, model_version, compute_unit):
""" Load Core ML (mlpackage) models from disk (As exported by torch2coreml.py)
"""
model_name = model_version.replace("/", "_")
logger.info(f"Loading controlnet_{model_name} mlpackage")
fname = f"ControlNet_{model_name}.mlpackage"
mlpackage_path = os.path.join(mlpackages_dir, fname)
if not os.path.exists(mlpackage_path):
raise FileNotFoundError(
f"controlnet_{model_name} CoreML model doesn't exist at {mlpackage_path}")
return CoreMLModel(mlpackage_path, compute_unit)
def get_available_compute_units():
return tuple(cu for cu in ct.ComputeUnit._member_names_)
================================================
FILE: python_coreml_stable_diffusion/layer_norm.py
================================================
#
# For licensing see accompanying LICENSE.md file.
# Copyright (C) 2022 Apple Inc. All Rights Reserved.
#
import torch
import torch.nn as nn
# Reference: https://github.com/apple/ml-ane-transformers/blob/main/ane_transformers/reference/layer_norm.py
class LayerNormANE(nn.Module):
""" LayerNorm optimized for Apple Neural Engine (ANE) execution
Note: This layer only supports normalization over the final dim. It expects `num_channels`
as an argument and not `normalized_shape` which is used by `torch.nn.LayerNorm`.
"""
def __init__(self,
num_channels,
clip_mag=None,
eps=1e-5,
elementwise_affine=True):
"""
Args:
num_channels: Number of channels (C) where the expected input data format is BC1S. S stands for sequence length.
clip_mag: Optional float value to use for clamping the input range before layer norm is applied.
If specified, helps reduce risk of overflow.
eps: Small value to avoid dividing by zero
elementwise_affine: If true, adds learnable channel-wise shift (bias) and scale (weight) parameters
"""
super().__init__()
# Principle 1: Picking the Right Data Format (machinelearning.apple.com/research/apple-neural-engine)
self.expected_rank = len("BC1S")
self.num_channels = num_channels
self.eps = eps
self.clip_mag = clip_mag
self.elementwise_affine = elementwise_affine
if self.elementwise_affine:
self.weight = nn.Parameter(torch.Tensor(num_channels))
self.bias = nn.Parameter(torch.Tensor(num_channels))
self._reset_parameters()
def _reset_parameters(self):
if self.elementwise_affine:
nn.init.ones_(self.weight)
nn.init.zeros_(self.bias)
def forward(self, inputs):
input_rank = len(inputs.size())
# Principle 1: Picking the Right Data Format (machinelearning.apple.com/research/apple-neural-engine)
# Migrate the data format from BSC to BC1S (most conducive to ANE)
if input_rank == 3 and inputs.size(2) == self.num_channels:
inputs = inputs.transpose(1, 2).unsqueeze(2)
input_rank = len(inputs.size())
assert input_rank == self.expected_rank
assert inputs.size(1) == self.num_channels
if self.clip_mag is not None:
inputs.clamp_(-self.clip_mag, self.clip_mag)
channels_mean = inputs.mean(dim=1, keepdims=True)
zero_mean = inputs - channels_mean
zero_mean_sq = zero_mean * zero_mean
denom = (zero_mean_sq.mean(dim=1, keepdims=True) + self.eps).rsqrt()
out = zero_mean * denom
if self.elementwise_affine:
out = (out + self.bias.view(1, self.num_channels, 1, 1)
) * self.weight.view(1, self.num_channels, 1, 1)
return out
================================================
FILE: python_coreml_stable_diffusion/mixed_bit_compression_apply.py
================================================
import argparse
import gc
import json
import logging
import os
import coremltools as ct
import coremltools.optimize.coreml as cto
import numpy as np
from python_coreml_stable_diffusion.torch2coreml import get_pipeline
from python_coreml_stable_diffusion.mixed_bit_compression_pre_analysis import (
NBITS,
PALETTIZE_MIN_SIZE as MIN_SIZE
)
logging.basicConfig()
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
def main(args):
# Load Core ML model
coreml_model = ct.models.MLModel(args.mlpackage_path, compute_units=ct.ComputeUnit.CPU_ONLY)
logger.info(f"Loaded {args.mlpackage_path}")
# Load palettization recipe
with open(args.pre_analysis_json_path, 'r') as f:
pre_analysis = json.load(f)
if args.selected_recipe not in list(pre_analysis["recipes"]):
raise KeyError(
f"--selected-recipe ({args.selected_recipe}) not found in "
f"--pre-analysis-json-path ({args.pre_analysis_json_path}). "
f" Available recipes: {list(pre_analysis['recipes'])}"
)
recipe = pre_analysis["recipes"][args.selected_recipe]
assert all(nbits in NBITS + [16] for nbits in recipe.values()), \
f"Some nbits values in the recipe are illegal. Allowed values: {NBITS}"
# Hash tensors to be able to match torch tensor names to mil tensors
def get_tensor_hash(tensor):
assert tensor.dtype == np.float16
return tensor.ravel()[0] + np.prod(tensor.shape)
args.model_version = pre_analysis["model_version"]
pipe = get_pipeline(args)
torch_model = pipe.unet
hashed_recipe = {}
for torch_module_name, nbits in recipe.items():
tensor = [
tensor.cpu().numpy().astype(np.float16) for name,tensor in torch_model.named_parameters()
if name == torch_module_name + '.weight'
][0]
hashed_recipe[get_tensor_hash(tensor)] = nbits
del pipe
gc.collect()
op_name_configs = {}
weight_metadata = cto.get_weights_metadata(coreml_model, weight_threshold=MIN_SIZE)
hashes = np.array(list(hashed_recipe))
for name, metadata in weight_metadata.items():
# Look up target bits for this weight
tensor_hash = get_tensor_hash(metadata.val)
pdist = np.abs(hashes - tensor_hash)
assert(pdist.min() < 0.01)
matched = pdist.argmin()
target_nbits = hashed_recipe[hashes[matched]]
if target_nbits == 16:
continue
op_name_configs[name] = cto.OpPalettizerConfig(
mode="kmeans",
nbits=target_nbits,
weight_threshold=int(MIN_SIZE)
)
config = ct.optimize.coreml.OptimizationConfig(op_name_configs=op_name_configs)
coreml_model = ct.optimize.coreml.palettize_weights(coreml_model, config)
coreml_model.save(args.o)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"-o",
required=True,
help="Output directory to save the custom palettized model"
)
parser.add_argument(
"--mlpackage-path",
required=True,
help="Path to .mlpackage model to be palettized"
)
parser.add_argument(
"--pre-analysis-json-path",
required=True,
type=str,
help=("The JSON file generated by mixed_bit_compression_pre_analysis.py"
))
parser.add_argument(
"--selected-recipe",
required=True,
type=str,
help=("The string key into --pre-analysis-json-path's baselines dict"
))
parser.add_argument(
"--custom-vae-version",
type=str,
default=None,
help=
("Custom VAE checkpoint to override the pipeline's built-in VAE. "
"If specified, the specified VAE will be converted instead of the one associated to the `--model-version` checkpoint. "
"No precision override is applied when using a custom VAE."
))
args = parser.parse_args()
if not os.path.exists(args.mlpackage_path):
raise FileNotFoundError
if not os.path.exists(args.pre_analysis_json_path):
raise FileNotFoundError
if not args.pre_analysis_json_path.endswith('.json'):
raise ValueError("--recipe-json-path should end with '.json'")
main(args)
================================================
FILE: python_coreml_stable_diffusion/mixed_bit_compression_pre_analysis.py
================================================
from collections import OrderedDict
from copy import deepcopy
from functools import partial
import argparse
import gc
import json
import logging
logging.basicConfig()
logger = logging.getLogger()
logger.setLevel('INFO')
import numpy as np
import os
from PIL import Image
from python_coreml_stable_diffusion.torch2coreml import compute_psnr, get_pipeline
import time
import torch
import torch.nn as nn
import requests
torch.set_grad_enabled(False)
from tqdm import tqdm
# Bit-widths the Neural Engine is capable of accelerating
NBITS = [1, 2, 4, 6, 8]
# Minimum number of elements in a weight tensor to be considered for palettization
# (saves pre-analysis time)
PALETTIZE_MIN_SIZE = 1e5
# Signal integrity is computed based on these 4 random prompts
RANDOM_TEST_DATA = [
"a black and brown dog standing outside a door.",
"a person on a motorcycle makes a turn on the track.",
"inflatable boats sit on the arizona river, and on the bank",
"a white cat sitting under a white umbrella",
"black bear standing in a field of grass under a tree.",
"a train that is parked on tracks and has graffiti writing on it, with a mountain range in the background.",
"a cake inside of a pan sitting in an oven.",
"a table with paper plates and flowers in a home",
]
TEST_RESOLUTION = 768
RANDOM_TEST_IMAGE_DATA = [
Image.open(
requests.get(path, stream=True).raw).convert("RGB").resize(
(TEST_RESOLUTION, TEST_RESOLUTION), Image.LANCZOS
) for path in [
"http://farm1.staticflickr.com/106/298138827_19bb723252_z.jpg",
"http://farm4.staticflickr.com/3772/9666116202_648cd752d6_z.jpg",
"http://farm3.staticflickr.com/2238/2472574092_f5534bb2f7_z.jpg",
"http://farm1.staticflickr.com/220/475442674_47d81fdc2c_z.jpg",
"http://farm8.staticflickr.com/7231/7359341784_4c5358197f_z.jpg",
"http://farm8.staticflickr.com/7283/8737653089_d0c77b8597_z.jpg",
"http://farm3.staticflickr.com/2454/3989339438_2f32b76ebb_z.jpg",
"http://farm1.staticflickr.com/34/123005230_13051344b1_z.jpg",
]]
# Copied from https://github.com/apple/coremltools/blob/7.0b1/coremltools/optimize/coreml/_quantization_passes.py#L602
from coremltools.converters.mil.mil import types
def fake_linear_quantize(val, axis=-1, mode='LINEAR', dtype=types.int8):
from coremltools.optimize.coreml._quantization_passes import AffineQuantParams
from coremltools.converters.mil.mil.types.type_mapping import nptype_from_builtin
val_dtype = val.dtype
def _ensure_numerical_range_and_cast(val, low, high, np_dtype):
'''
For some cases, the computed quantized data might exceed the data range.
For instance, after rounding and addition, we might get `128` for the int8 quantization.
This utility function ensures the val in the data range before doing the cast.
'''
val = np.minimum(val, high)
val = np.maximum(val, low)
return val.astype(np_dtype)
mode_dtype_to_range = {
(types.int8, "LINEAR"): (-128, 127),
(types.int8, "LINEAR_SYMMETRIC"): (-127, 127),
(types.uint8, "LINEAR"): (0, 255),
(types.uint8, "LINEAR_SYMMETRIC"): (0, 254),
}
if not isinstance(val, (np.ndarray, np.generic)):
raise ValueError("Only numpy arrays are supported")
params = AffineQuantParams()
axes = tuple([i for i in range(len(val.shape)) if i != axis])
val_min = np.amin(val, axis=axes, keepdims=True)
val_max = np.amax(val, axis=axes, keepdims=True)
if mode == "LINEAR_SYMMETRIC":
# For the linear_symmetric mode, the range is symmetrical to 0
max_abs = np.maximum(np.abs(val_min), np.abs(val_max))
val_min = -max_abs
val_max = max_abs
else:
assert mode == "LINEAR"
# For the linear mode, we need to make sure the data range contains `0`
val_min = np.minimum(0.0, val_min)
val_max = np.maximum(0.0, val_max)
q_val_min, q_val_max = mode_dtype_to_range[(dtype, mode)]
# Set the zero point to symmetric mode
np_dtype = nptype_from_builtin(dtype)
if mode == "LINEAR_SYMMETRIC":
if dtype == types.int8:
params.zero_point = (0 * np.ones(val_min.shape)).astype(np.int8)
else:
assert dtype == types.uint8
params.zero_point = (127 * np.ones(val_min.shape)).astype(np.uint8)
else:
assert mode == "LINEAR"
params.zero_point = (q_val_min * val_max - q_val_max * val_min) / (val_max - val_min)
params.zero_point = np.round(params.zero_point)
params.zero_point = _ensure_numerical_range_and_cast(params.zero_point, q_val_min, q_val_max, np_dtype)
# compute the params
params.scale = (val_max - val_min) / (q_val_max - q_val_min)
params.scale = params.scale.astype(val.dtype).squeeze()
params.quantized_data = np.round(
val * (q_val_max - q_val_min) / (val_max - val_min)
)
params.quantized_data = (params.quantized_data + params.zero_point)
params.quantized_data = _ensure_numerical_range_and_cast(params.quantized_data, q_val_min, q_val_max, np_dtype)
params.zero_point = params.zero_point.squeeze()
params.axis = axis
return (params.quantized_data.astype(val_dtype) - params.zero_point.astype(val_dtype)) * params.scale
# Copied from https://github.com/apple/coremltools/blob/7.0b1/coremltools/optimize/coreml/_quantization_passes.py#L423
def fake_palettize(module, nbits, in_ngroups=1, out_ngroups=1):
""" Simulate weight palettization
"""
from coremltools.models.neural_network.quantization_utils import _get_kmeans_lookup_table_and_weight
def compress_kmeans(val, nbits):
lut, indices = _get_kmeans_lookup_table_and_weight(nbits, val)
lut = lut.astype(val.dtype)
indices = indices.astype(np.uint8)
return lut, indices
dtype = module.weight.data.dtype
device = module.weight.data.device
val = module.weight.data.cpu().numpy().astype(np.float16)
if out_ngroups == 1 and in_ngroups == 1:
lut, indices = compress_kmeans(val=val, nbits=nbits)
module.weight.data = torch.from_numpy(lut[indices]).reshape(val.shape).to(dtype)
elif out_ngroups > 1 and in_ngroups == 1:
assert val.shape[0] % out_ngroups == 0
rvals = [
compress_kmeans(val=chunked_val, nbits=nbits)
for chunked_val in np.split(val, out_ngroups, axis=0)
]
shape = list(val.shape)
shape[0] = shape[0] // out_ngroups
module.weight.data = torch.cat([
torch.from_numpy(lut[indices]).reshape(shape)
for lut,indices in rvals
], dim=0).to(dtype).to(device)
elif in_ngroups > 1 and out_ngroups == 1:
assert val.shape[1] % in_ngroups == 0
rvals = [
compress_kmeans(val=chunked_val, nbits=nbits)
for chunked_val in np.split(val, in_ngroups, axis=1)
]
shape = list(val.shape)
shape[1] = shape[1] // in_ngroups
module.weight.data = torch.cat([
torch.from_numpy(lut[indices]).reshape(shape)
for lut,indices in rvals
], dim=1).to(dtype).to(device)
else:
raise ValueError(f"in_ngroups={in_ngroups} & out_ngroups={out_ngroups} is illegal!!!")
return torch.from_numpy(val).to(dtype)
def restore_weight(module, value):
device = module.weight.data.device
module.weight.data = value.to(device)
def get_palettizable_modules(unet, min_size=PALETTIZE_MIN_SIZE):
ret = [
(name, getattr(module, 'weight').data.numel()) for name, module in unet.named_modules()
if isinstance(module, (nn.Linear, nn.Conv2d))
if hasattr(module, 'weight') and getattr(module, 'weight').data.numel() > min_size
]
candidates, sizes = [[a for a,b in ret], [b for a,b in ret]]
logger.info(f"{len(candidates)} candidate tensors with {sum(sizes)/1e6} M total params")
return candidates, sizes
def fake_int8_quantize(module):
i = 0
for name, submodule in tqdm(module.named_modules()):
if hasattr(submodule, 'weight'):
i+=1
submodule.weight.data = torch.from_numpy(
fake_linear_quantize(submodule.weight.data.numpy()))
logger.info(f"{i} modules fake int8 quantized")
return module
def fake_nbits_palette(module, nbits):
i = 0
for name, submodule in tqdm(module.named_modules()):
if hasattr(submodule, 'weight'):
i+=1
fake_palettize(submodule, nbits=nbits)
logger.info(f"{i} modules fake {nbits}-bits palettized")
return module
def fake_palette_from_recipe(module, recipe):
tot_bits = 0
tot_numel = 0
for name, submodule in tqdm(module.named_modules()):
if hasattr(submodule, 'weight'):
tot_numel += submodule.weight.numel()
if name in recipe:
nbits = recipe[name]
assert nbits in NBITS + [16]
tot_bits += submodule.weight.numel() * nbits
if nbits == 16:
continue
fake_palettize(submodule, nbits=nbits)
else:
tot_bits += submodule.weight.numel() * 16
logger.info(f"Palettized to {tot_bits/tot_numel:.2f}-bits mixed palette ({tot_bits/8e6} MB) ")
# Globally synced RNG state
rng = torch.Generator()
rng_state = rng.get_state()
def run_pipe(pipe):
if torch.backends.mps.is_available():
device = "mps"
elif torch.cuda.is_available():
device = "cuda"
else:
device = "cpu"
logger.debug(f"Placing pipe in {device}")
global rng, rng_state
rng.set_state(rng_state)
kwargs = dict(
prompt=RANDOM_TEST_DATA,
negative_prompt=[""] * len(RANDOM_TEST_DATA),
num_inference_steps=1,
height=TEST_RESOLUTION,
width=TEST_RESOLUTION,
output_type="latent",
generator=rng
)
if "Img2Img" in pipe.__class__.__name__:
kwargs["image"] = RANDOM_TEST_IMAGE_DATA
kwargs.pop("height")
kwargs.pop("width")
# Run a single denoising step
kwargs["num_inference_steps"] = 4
kwargs["strength"] = 0.25
return np.array([latent.cpu().numpy() for latent in pipe.to(device)(**kwargs).images])
def benchmark_signal_integrity(pipe,
candidates,
nbits,
cumulative,
in_ngroups=1,
out_ngroups=1,
ref_out=None,
):
results = {}
results['metadata'] = {
'nbits': nbits,
'out_ngroups': out_ngroups,
'in_ngroups': in_ngroups,
'cumulative': cumulative,
}
# If reference outputs are not provided, treat current pipe as reference
if ref_out is None:
ref_out = run_pipe(pipe)
for candidate in tqdm(candidates):
palettized = False
for name, module in pipe.unet.named_modules():
if name == candidate:
orig_weight = fake_palettize(
module,
nbits,
out_ngroups=out_ngroups,
in_ngroups=in_ngroups,
)
palettized = True
break
if not palettized:
raise KeyError(name)
test_out = run_pipe(pipe)
if not cumulative:
restore_weight(module, orig_weight)
results[candidate] = [
float(f"{compute_psnr(r,t):.1f}")
for r,t in zip(ref_out, test_out)
]
logger.info(f"{nbits}-bit: {candidate} = {results[candidate]}")
return results
def descending_psnr_order(results):
if 'metadata' in results:
results.pop('metadata')
return OrderedDict(sorted(results.items(), key=lambda items: -sum(items[1])))
def simulate_quant_fn(ref_pipe, quantization_to_simulate):
simulated_pipe = deepcopy(ref_pipe.to('cpu'))
quantization_to_simulate(simulated_pipe.unet)
simulated_out = run_pipe(simulated_pipe)
del simulated_pipe
gc.collect()
ref_out = run_pipe(ref_pipe)
simulated_psnr = sum([
float(f"{compute_psnr(r, t):.1f}")
for r, t in zip(ref_out, simulated_out)
]) / len(ref_out)
return simulated_out, simulated_psnr
def build_recipe(results, sizes, psnr_threshold, default_nbits):
stats = {'nbits': 0}
recipe = {}
for key in results[str(NBITS[0])]:
if key == 'metadata':
continue
achieved_nbits = default_nbits
for nbits in NBITS:
avg_psnr = sum(results[str(nbits)][key])/len(RANDOM_TEST_DATA)
if avg_psnr > psnr_threshold:
achieved_nbits = nbits
break
recipe[key] = achieved_nbits
stats['nbits'] += achieved_nbits * sizes[key]
stats['size_mb'] = stats['nbits'] / (8*1e6)
tot_size = sum(list(sizes.values()))
stats['nbits'] /= tot_size
return recipe, stats
def plot(results, args):
import matplotlib.pyplot as plt
max_model_size = sum(results['cumulative'][str(NBITS[0])]['metadata']['sizes'])
f, ax = plt.subplots(1, 1, figsize=(7, 5))
def compute_x_axis(sizes, nbits, default_nbits):
max_compression_percent = (default_nbits - nbits) / default_nbits
progress = np.cumsum(sizes)
normalized_progress = progress / progress.max()
return normalized_progress * max_compression_percent * 100
# Linear 8-bit baseline and the intercept points for mixed-bit recipes
linear8bit_baseline = results['baselines']['linear_8bit']
# Mark the linear 8-bit baseline
ax.plot(
8 / args.default_nbits * 100,
linear8bit_baseline,
'bx',
markersize=8,
label="8-bit (linear quant)")
# Plot the iso-dB line that matches the 8-bit baseline
ax.plot([0,100], [linear8bit_baseline]*2, '--b')
# Plot non-mixed-bit palettization curves
for idx, nbits in enumerate(NBITS):
size_keys = compute_x_axis(results['cumulative'][str(nbits)]['metadata']['sizes'], nbits, args.default_nbits)
psnr = [
sum(v) / len(RANDOM_TEST_DATA) # avg psnr
for k,v in results['cumulative'][str(nbits)].items() if k != 'metadata'
]
ax.plot(
size_keys,
psnr,
label=f"{nbits}-bit")
# Plot mixed-bit results
mixed_palettes = [
(float(spec.rsplit('_')[1]), psnr)
for spec,psnr in results['baselines'].items()
if 'recipe' in spec
]
mixedbit_sizes = [100. * (1. - a[0] / args.default_nbits) for a in mixed_palettes]
mixedbit_psnrs = [a[1] for a in mixed_palettes]
ax.plot(
mixedbit_sizes,
mixedbit_psnrs,
label="mixed-bit",
)
ax.set_xlabel("Model Size Reduction (%)")
ax.set_ylabel("Signal Integrity (PSNR in dB)")
ax.set_title(args.model_version)
ax.legend()
f.savefig(os.path.join(args.o, f"{args.model_version.replace('/','_')}_psnr_vs_size.png"))
def main(args):
# Initialize pipe
pipe = get_pipeline(args)
# Preserve a pristine copy for reference outputs
ref_pipe = deepcopy(pipe)
if args.default_nbits != 16:
logger.info(f"Palettizing unet to default {args.default_nbits}-bit")
fake_nbits_palette(pipe.unet, args.default_nbits)
logger.info("Done.")
# Cache reference outputs
ref_out = run_pipe(pipe)
# Bookkeeping
os.makedirs(args.o, exist_ok=True)
results = {
'single_layer': {},
'cumulative': {},
'model_version': args.model_version,
}
json_name = f"{args.model_version.replace('/','-')}_palettization_recipe.json"
candidates, sizes = get_palettizable_modules(pipe.unet)
sizes_table = dict(zip(candidates, sizes))
if os.path.isfile(os.path.join(args.o, json_name)):
with open(os.path.join(args.o, json_name), "r") as f:
results = json.load(f)
# Analyze uniform-precision palettization impact on signal integrity
for nbits in NBITS:
if str(nbits) not in results['single_layer']:
# Measure the impact of palettization of each layer independently
results['single_layer'][str(nbits)] = benchmark_signal_integrity(
pipe,
candidates,
nbits,
cumulative=False,
ref_out=ref_out,
)
with open(os.path.join(args.o, json_name), 'w') as f:
json.dump(results, f, indent=2)
# Measure the cumulative impact of palettization based on ascending individual impact computed earlier
sorted_candidates = descending_psnr_order(results['single_layer'][str(nbits)])
if str(nbits) not in results['cumulative']:
results['cumulative'][str(nbits)] = benchmark_signal_integrity(
deepcopy(pipe),
sorted_candidates,
nbits,
cumulative=True,
ref_out=ref_out,
)
results['cumulative'][str(nbits)]['metadata'].update({
'candidates': list(sorted_candidates.keys()),
'sizes': [sizes_table[candidate] for candidate in sorted_candidates],
})
with open(os.path.join(args.o, json_name), 'w') as f:
json.dump(results, f, indent=2)
# Generate uniform-quantization baselines
results['baselines'] = {
"original": simulate_quant_fn(ref_pipe, lambda x: x)[1],
"linear_8bit": simulate_quant_fn(ref_pipe, fake_int8_quantize)[1],
}
with open(os.path.join(args.o, json_name), 'w') as f:
json.dump(results, f, indent=2)
# Generate mixed-bit recipes via decreasing PSNR thresholds
results['recipes'] = {}
recipe_psnr_thresholds = np.linspace(
results['baselines']['original'] - 1,
results['baselines']["linear_8bit"] + 5,
args.num_recipes,
)
for recipe_no, psnr_threshold in enumerate(recipe_psnr_thresholds):
logger.info(f"Building recipe #{recipe_no}")
recipe, stats = build_recipe(
results['cumulative'],
sizes_table,
psnr_threshold,
args.default_nbits,
)
achieved_psnr = simulate_quant_fn(ref_pipe, lambda x: partial(fake_palette_from_recipe, recipe=recipe)(x))[1]
logger.info(
f"Recipe #{recipe_no}: {stats['nbits']:.2f}-bits @ per-layer {psnr_threshold} dB, "
f"end-to-end {achieved_psnr} dB & "
f"{stats['size_mb']:.2f} MB"
)
# Save achieved PSNR and compressed size
recipe_key = f"recipe_{stats['nbits']:.2f}_bit_mixedpalette"
results['baselines'][recipe_key] = float(f"{achieved_psnr:.1f}")
results['recipes'][recipe_key] = recipe
with open(os.path.join(args.o, json_name), 'w') as f:
json.dump(results, f, indent=2)
# Plot model size vs signal integrity
plot(results, args)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"-o",
required=True,
help="Output directory to save the palettization artifacts (recipe json, PSNR plots etc.)"
)
parser.add_argument(
"--model-version",
required=True,
help=
("The pre-trained model checkpoint and configuration to restore. "
"For available versions: https://huggingface.co/models?search=stable-diffusion"
))
parser.add_argument(
"--default-nbits",
help="Default number of bits to use for palettization",
choices=tuple(NBITS + [16]),
default=16,
type=int,
)
parser.add_argument(
"--num-recipes",
help="Maximum number of recipes to generate (with decreasing model size and signal integrity)",
default=7,
type=int,
)
parser.add_argument(
"--custom-vae-version",
type=str,
default=None,
help=
("Custom VAE checkpoint to override the pipeline's built-in VAE. "
"If specified, the specified VAE will be converted instead of the one associated to the `--model-version` checkpoint. "
"No precision override is applied when using a custom VAE."
))
args = parser.parse_args()
main(args)
================================================
FILE: python_coreml_stable_diffusion/multilingual_projection.py
================================================
from python_coreml_stable_diffusion.torch2coreml import _compile_coreml_model
import argparse
import coremltools as ct
import numpy as np
import os
import torch
import torch.nn as nn
# TODO: Read these values off of the NLContextualEmbedding API to enforce dimensions and track API versioning
MAX_SEQUENCE_LENGTH = 256
EMBED_DIM = 512
BATCH_SIZE = 1
def main(args):
# Layer that was trained to map NLContextualEmbedding to your text_encoder.hidden_size dimensionality
text_encoder_projection = torch.jit.load(args.input_path)
# Prepare random inputs for tracing the network before conversion
random_input = torch.randn(BATCH_SIZE, MAX_SEQUENCE_LENGTH, EMBED_DIM)
# Create a class to bake in the reshape operations required to fit the existing model interface
class TextEncoderProjection(nn.Module):
def __init__(self, proj):
super().__init__()
self.proj = proj
def forward(self, x):
return self.proj(x).transpose(1, 2).unsqueeze(2) # BSC, BC1S
# Trace the torch model
text_encoder_projection = torch.jit.trace(TextEncoderProjection(text_encoder_projection), (random_input,))
# Convert the model to Core ML
mlpackage_path = os.path.join(args.output_dir, "MultilingualTextEncoderProjection.mlpackage")
ct.convert(
text_encoder_projection,
inputs=[ct.TensorType('nlcontextualembeddings_output', shape=(1, MAX_SEQUENCE_LENGTH, EMBED_DIM), dtype=np.float32)],
outputs=[ct.TensorType('encoder_hidden_states', dtype=np.float32)],
minimum_deployment_target=ct.target.macOS14, # NLContextualEmbedding minimum availability build
convert_to='mlprogram',
).save()
# Compile the model and save it under the specified directory
_compile_coreml_model(mlpackage_path, args.output_dir, final_name="MultilingualTextEncoderProjection")
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--input-path",
help="Path to the torchscript file that contains the projection layer"
)
parser.add_argument(
"--output-dir",
help="Output directory in which the Core ML model should be saved",
)
args = parser.parse_args()
main(args)
================================================
FILE: python_coreml_stable_diffusion/pipeline.py
================================================
#
# For licensing see accompanying LICENSE.md file.
# Copyright (C) 2022 Apple Inc. All Rights Reserved.
#
import argparse
from diffusers import StableDiffusionPipeline, StableDiffusionXLPipeline
from diffusers.pipelines.pipeline_utils import DiffusionPipeline
from diffusers.pipelines.stable_diffusion import StableDiffusionPipelineOutput
from diffusers.schedulers import (
DDIMScheduler,
DPMSolverMultistepScheduler,
EulerAncestralDiscreteScheduler,
EulerDiscreteScheduler,
LMSDiscreteScheduler,
PNDMScheduler,
)
from diffusers.schedulers.scheduling_utils import SchedulerMixin
import gc
import inspect
import logging
logging.basicConfig()
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
import numpy as np
import os
from python_coreml_stable_diffusion.coreml_model import (
CoreMLModel,
_load_mlpackage,
_load_mlpackage_controlnet,
get_available_compute_units,
)
import time
import torch # Only used for `torch.from_tensor` in `pipe.scheduler.step()`
from transformers import CLIPFeatureExtractor, CLIPTokenizer
from typing import List, Optional, Union, Tuple
from PIL import Image
class CoreMLStableDiffusionPipeline(DiffusionPipeline):
""" Core ML version of
`diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline`
"""
def __init__(
self,
text_encoder: CoreMLModel,
unet: CoreMLModel,
vae_decoder: CoreMLModel,
scheduler: Union[
DDIMScheduler,
DPMSolverMultistepScheduler,
EulerAncestralDiscreteScheduler,
EulerDiscreteScheduler,
LMSDiscreteScheduler,
PNDMScheduler
],
tokenizer: CLIPTokenizer,
controlnet: Optional[List[CoreMLModel]],
xl: Optional[bool] = False,
force_zeros_for_empty_prompt: Optional[bool] = True,
feature_extractor: Optional[CLIPFeatureExtractor] = None,
safety_checker: Optional[CoreMLModel] = None,
text_encoder_2: Optional[CoreMLModel] = None,
tokenizer_2: Optional[CLIPTokenizer] = None
):
super().__init__()
# Register non-Core ML components of the pipeline similar to the original pipeline
self.register_modules(
tokenizer=tokenizer,
scheduler=scheduler,
feature_extractor=feature_extractor,
)
if safety_checker is None:
# Reproduce original warning:
# https://github.com/huggingface/diffusers/blob/v0.9.0/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L119
logger.warning(
f"You have disabled the safety checker for {self.__class__} by passing `safety_checker=None`. Ensure"
" that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered"
" results in services or applications open to the public. Both the diffusers team and Hugging Face"
" strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling"
" it only for use-cases that involve analyzing network behavior or auditing its results. For more"
" information, please have a look at https://github.com/huggingface/diffusers/pull/254 ."
)
self.xl = xl
self.force_zeros_for_empty_prompt = force_zeros_for_empty_prompt
# Register Core ML components of the pipeline
self.safety_checker = safety_checker
self.text_encoder = text_encoder
self.text_encoder_2 = text_encoder_2
self.tokenizer_2 = tokenizer_2
self.unet = unet
self.unet.in_channels = self.unet.expected_inputs["sample"]["shape"][1]
self.controlnet = controlnet
self.vae_decoder = vae_decoder
VAE_DECODER_UPSAMPLE_FACTOR = 8
# In PyTorch, users can determine the tensor shapes dynamically by default
# In CoreML, tensors have static shapes unless flexible shapes were used during export
# See https://coremltools.readme.io/docs/flexible-inputs
latent_h, latent_w = self.unet.expected_inputs["sample"]["shape"][2:]
self.height = latent_h * VAE_DECODER_UPSAMPLE_FACTOR
self.width = latent_w * VAE_DECODER_UPSAMPLE_FACTOR
logger.info(
f"Stable Diffusion configured to generate {self.height}x{self.width} images"
)
def _encode_prompt(self,
prompt,
prompt_2: Optional[str] = None,
do_classifier_free_guidance: bool = True,
negative_prompt: Optional[str] = None,
negative_prompt_2: Optional[str] = None,
):
batch_size = len(prompt) if isinstance(prompt, list) else 1
if self.xl is True:
prompts = [prompt, prompt_2] if prompt_2 is not None else [prompt, prompt]
# refiner uses only one tokenizer and text encoder (tokenizer_2 and text_encoder_2)
tokenizers = [self.tokenizer, self.tokenizer_2] if self.tokenizer is not None else [self.tokenizer_2]
text_encoders = [self.text_encoder, self.text_encoder_2] if self.text_encoder is not None else [
self.text_encoder_2]
hidden_state_key = 'hidden_embeds'
else:
prompts = [prompt]
tokenizers = [self.tokenizer]
text_encoders = [self.text_encoder]
hidden_state_key = 'last_hidden_state'
prompt_embeds_list = []
for prompt, tokenizer, text_encoder in zip(prompts, tokenizers, text_encoders):
text_inputs = tokenizer(
prompt,
padding="max_length",
max_length=tokenizer.model_max_length,
truncation=True,
return_tensors="np",
)
text_input_ids = text_inputs.input_ids
# tokenize without max_length to catch any truncation
untruncated_ids = tokenizer(prompt, padding="longest", return_tensors="np").input_ids
if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not np.equal(
text_input_ids, untruncated_ids
):
removed_text = tokenizer.batch_decode(untruncated_ids[:, tokenizer.model_max_length - 1: -1])
logger.warning(
"The following part of your input was truncated because CLIP can only handle sequences up to"
f" {tokenizer.model_max_length} tokens: {removed_text}"
)
embeddings = text_encoder(input_ids=text_input_ids.astype(np.float32))
prompt_embeds_list.append(embeddings[hidden_state_key])
# We are only ALWAYS interested in the pooled output of the final text encoder
if self.xl:
pooled_prompt_embeds = embeddings['pooled_outputs']
prompt_embeds = np.concatenate(prompt_embeds_list, axis=-1)
if do_classifier_free_guidance and negative_prompt is None and self.force_zeros_for_empty_prompt:
negative_prompt_embeds = np.zeros_like(prompt_embeds)
if self.xl:
negative_pooled_prompt_embeds = np.zeros_like(pooled_prompt_embeds)
elif do_classifier_free_guidance:
negative_prompt = negative_prompt or ""
negative_prompt_2 = negative_prompt_2 or negative_prompt
# normalize str to list
negative_prompt = batch_size * [negative_prompt] if isinstance(negative_prompt, str) else negative_prompt
negative_prompt_2 = (
batch_size * [negative_prompt_2] if isinstance(negative_prompt_2, str) else negative_prompt_2
)
uncond_tokens: List[str]
if prompts is not None and type(prompts) is not type(negative_prompt):
raise TypeError(
f"`negative_prompt` should be the same type to `prompt`, but got {type(negative_prompt)} !="
f" {type(prompt)}."
)
elif batch_size != len(negative_prompt):
raise ValueError(
f"`negative_prompt`: {negative_prompt} has batch size {len(negative_prompt)}, but `prompt`:"
f" {prompt} has batch size {batch_size}. Please make sure that passed `negative_prompt` matches"
" the batch size of `prompt`.")
else:
uncond_tokens = [negative_prompt, negative_prompt_2]
negative_prompt_embeds_list = []
for negative_prompt, tokenizer, text_encoder in zip(uncond_tokens, tokenizers, text_encoders):
max_length = prompt_embeds.shape[1]
uncond_input = tokenizer(
negative_prompt,
padding="max_length",
max_length=max_length,
truncation=True,
return_tensors="np",
)
gitextract_srx0ds09/
├── .github/
│ └── pull_request_template.md
├── .gitignore
├── ACKNOWLEDGEMENTS
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE.md
├── Package.swift
├── README.md
├── python_coreml_stable_diffusion/
│ ├── __init__.py
│ ├── _version.py
│ ├── activation_quantization.py
│ ├── attention.py
│ ├── chunk_mlprogram.py
│ ├── controlnet.py
│ ├── coreml_model.py
│ ├── layer_norm.py
│ ├── mixed_bit_compression_apply.py
│ ├── mixed_bit_compression_pre_analysis.py
│ ├── multilingual_projection.py
│ ├── pipeline.py
│ ├── torch2coreml.py
│ └── unet.py
├── requirements.txt
├── setup.py
├── swift/
│ ├── StableDiffusion/
│ │ ├── pipeline/
│ │ │ ├── CGImage+vImage.swift
│ │ │ ├── ControlNet.swift
│ │ │ ├── DPMSolverMultistepScheduler.swift
│ │ │ ├── Decoder.swift
│ │ │ ├── DiscreteFlowScheduler.swift
│ │ │ ├── Encoder.swift
│ │ │ ├── ManagedMLModel.swift
│ │ │ ├── MultiModalDiffusionTransformer.swift
│ │ │ ├── MultilingualTextEncoder.swift
│ │ │ ├── NumPyRandomSource.swift
│ │ │ ├── NvRandomSource.swift
│ │ │ ├── RandomSource.swift
│ │ │ ├── ResourceManaging.swift
│ │ │ ├── SafetyChecker.swift
│ │ │ ├── SampleTimer.swift
│ │ │ ├── Scheduler.swift
│ │ │ ├── StableDiffusion3Pipeline+Resources.swift
│ │ │ ├── StableDiffusion3Pipeline.swift
│ │ │ ├── StableDiffusionPipeline+Resources.swift
│ │ │ ├── StableDiffusionPipeline.Configuration.swift
│ │ │ ├── StableDiffusionPipeline.swift
│ │ │ ├── StableDiffusionXL+Resources.swift
│ │ │ ├── StableDiffusionXLPipeline.swift
│ │ │ ├── TextEncoder.swift
│ │ │ ├── TextEncoderT5.swift
│ │ │ ├── TextEncoderXL.swift
│ │ │ ├── TorchRandomSource.swift
│ │ │ └── Unet.swift
│ │ └── tokenizer/
│ │ ├── BPETokenizer+Reading.swift
│ │ ├── BPETokenizer.swift
│ │ └── T5Tokenizer.swift
│ ├── StableDiffusionCLI/
│ │ └── main.swift
│ └── StableDiffusionTests/
│ ├── Resources/
│ │ ├── merges.txt
│ │ └── vocab.json
│ └── StableDiffusionTests.swift
└── tests/
├── __init__.py
└── test_stable_diffusion.py
SYMBOL INDEX (183 symbols across 13 files)
FILE: python_coreml_stable_diffusion/activation_quantization.py
function get_coreml_inputs (line 68) | def get_coreml_inputs(sample_inputs):
function convert_to_coreml (line 77) | def convert_to_coreml(torchscript_module, sample_inputs):
function unet_data_loader (line 92) | def unet_data_loader(data_dir, device='cpu', calibration_nsamples=None):
function quantize_module_config (line 121) | def quantize_module_config(module_name):
function quantize_cumulative_config (line 141) | def quantize_cumulative_config(skip_conv_layers, skip_einsum_layers):
function quantize (line 173) | def quantize(model, config, calibration_data):
function get_quantizable_modules (line 205) | def get_quantizable_modules(unet):
function recipe_overrides_for_inference_speedup (line 217) | def recipe_overrides_for_inference_speedup(conv_layers, skipped_conv):
function recipe_overrides_for_quality (line 231) | def recipe_overrides_for_quality(conv_layers, skipped_conv):
function register_input_log_hook (line 244) | def register_input_log_hook(unet, inputs):
function generate_calibration_data (line 258) | def generate_calibration_data(pipe, args, calibration_dir):
function register_input_preprocessing_hook (line 282) | def register_input_preprocessing_hook(pipe):
function prepare_pipe (line 300) | def prepare_pipe(pipe, unet):
function run_pipe (line 310) | def run_pipe(pipe):
function get_reference_pipeline (line 320) | def get_reference_pipeline(model_version):
function main (line 342) | def main(args):
FILE: python_coreml_stable_diffusion/attention.py
function softmax (line 11) | def softmax(x, dim):
function split_einsum (line 24) | def split_einsum(q, k, v, mask, heads, dim_head):
function split_einsum_v2 (line 77) | def split_einsum_v2(q, k, v, mask, heads, dim_head):
function original (line 147) | def original(q, k, v, mask, heads, dim_head):
FILE: python_coreml_stable_diffusion/chunk_mlprogram.py
function _verify_output_correctness_of_chunks (line 34) | def _verify_output_correctness_of_chunks(full_model,
function _load_prog_from_mlmodel (line 82) | def _load_prog_from_mlmodel(model):
function _get_op_idx_split_location (line 100) | def _get_op_idx_split_location(prog: Program):
function _get_first_chunk_outputs (line 131) | def _get_first_chunk_outputs(block, op_idx):
function _add_fp32_casts (line 150) | def _add_fp32_casts(block, boundary_vars):
function _make_first_chunk_prog (line 161) | def _make_first_chunk_prog(prog, op_idx):
function _make_second_chunk_prog (line 176) | def _make_second_chunk_prog(prog, op_idx):
function _legacy_model_chunking (line 234) | def _legacy_model_chunking(args):
function main (line 342) | def main(args):
FILE: python_coreml_stable_diffusion/controlnet.py
class ControlNetConditioningEmbedding (line 15) | class ControlNetConditioningEmbedding(nn.Module):
method __init__ (line 17) | def __init__(
method forward (line 37) | def forward(self, conditioning):
class ControlNetModel (line 49) | class ControlNetModel(ModelMixin, ConfigMixin):
method __init__ (line 52) | def __init__(
method get_num_residuals (line 191) | def get_num_residuals(self):
method forward (line 199) | def forward(
FILE: python_coreml_stable_diffusion/coreml_model.py
function _macos_version (line 23) | def _macos_version():
class CoreMLModel (line 36) | class CoreMLModel:
method __init__ (line 40) | def __init__(self, model_path, compute_unit, sources='packages', optim...
method _verify_inputs (line 97) | def _verify_inputs(self, **kwargs):
method __call__ (line 118) | def __call__(self, **kwargs):
function get_resource_type (line 126) | def get_resource_type(resources_dir: str) -> str:
function _load_mlpackage (line 155) | def _load_mlpackage(submodule_name,
function _load_mlpackage_controlnet (line 206) | def _load_mlpackage_controlnet(mlpackages_dir, model_version, compute_un...
function get_available_compute_units (line 224) | def get_available_compute_units():
FILE: python_coreml_stable_diffusion/layer_norm.py
class LayerNormANE (line 11) | class LayerNormANE(nn.Module):
method __init__ (line 18) | def __init__(self,
method _reset_parameters (line 46) | def _reset_parameters(self):
method forward (line 51) | def forward(self, inputs):
FILE: python_coreml_stable_diffusion/mixed_bit_compression_apply.py
function main (line 23) | def main(args):
FILE: python_coreml_stable_diffusion/mixed_bit_compression_pre_analysis.py
function fake_linear_quantize (line 65) | def fake_linear_quantize(val, axis=-1, mode='LINEAR', dtype=types.int8):
function fake_palettize (line 139) | def fake_palettize(module, nbits, in_ngroups=1, out_ngroups=1):
function restore_weight (line 189) | def restore_weight(module, value):
function get_palettizable_modules (line 194) | def get_palettizable_modules(unet, min_size=PALETTIZE_MIN_SIZE):
function fake_int8_quantize (line 205) | def fake_int8_quantize(module):
function fake_nbits_palette (line 216) | def fake_nbits_palette(module, nbits):
function fake_palette_from_recipe (line 226) | def fake_palette_from_recipe(module, recipe):
function run_pipe (line 248) | def run_pipe(pipe):
function benchmark_signal_integrity (line 280) | def benchmark_signal_integrity(pipe,
function descending_psnr_order (line 329) | def descending_psnr_order(results):
function simulate_quant_fn (line 336) | def simulate_quant_fn(ref_pipe, quantization_to_simulate):
function build_recipe (line 352) | def build_recipe(results, sizes, psnr_threshold, default_nbits):
function plot (line 376) | def plot(results, args):
function main (line 436) | def main(args):
FILE: python_coreml_stable_diffusion/multilingual_projection.py
function main (line 15) | def main(args):
FILE: python_coreml_stable_diffusion/pipeline.py
class CoreMLStableDiffusionPipeline (line 47) | class CoreMLStableDiffusionPipeline(DiffusionPipeline):
method __init__ (line 52) | def __init__(
method _encode_prompt (line 123) | def _encode_prompt(self,
method run_controlnet (line 259) | def run_controlnet(self,
method run_safety_checker (line 286) | def run_safety_checker(self, image):
method decode_latents (line 313) | def decode_latents(self, latents):
method prepare_latents (line 322) | def prepare_latents(self,
method prepare_control_cond (line 346) | def prepare_control_cond(self,
method check_inputs (line 359) | def check_inputs(self, prompt, height, width, callback_steps):
method prepare_extra_step_kwargs (line 384) | def prepare_extra_step_kwargs(self, eta):
method _get_add_time_ids (line 398) | def _get_add_time_ids(self, original_size, crops_coords_top_left, targ...
method __call__ (line 403) | def __call__(
function get_available_schedulers (line 592) | def get_available_schedulers():
function get_coreml_pipe (line 607) | def get_coreml_pipe(pytorch_pipe,
function get_image_path (line 700) | def get_image_path(args, **override_kwargs):
function prepare_controlnet_cond (line 717) | def prepare_controlnet_cond(image_path, height, width):
function main (line 724) | def main(args):
FILE: python_coreml_stable_diffusion/torch2coreml.py
function _get_coreml_inputs (line 49) | def _get_coreml_inputs(sample_inputs, args):
function compute_psnr (line 59) | def compute_psnr(a, b):
function report_correctness (line 80) | def report_correctness(original_outputs, final_outputs, log_prefix):
function _get_out_path (line 99) | def _get_out_path(args, submodule_name):
function _convert_to_coreml (line 105) | def _convert_to_coreml(submodule_name, torchscript_module, sample_inputs,
function _get_deployment_target (line 147) | def _get_deployment_target(target_string):
function quantize_weights (line 182) | def quantize_weights(args):
function _quantize_weights (line 206) | def _quantize_weights(out_path, model_name, nbits):
function _compile_coreml_model (line 231) | def _compile_coreml_model(source_model_path, output_dir, final_name):
function _download_t5_model (line 251) | def _download_t5_model(args, t5_save_path):
function bundle_resources_for_swift_cli (line 271) | def bundle_resources_for_swift_cli(args):
function patched_make_causal_mask (line 363) | def patched_make_causal_mask(input_ids_shape, dtype, device, past_key_va...
function convert_text_encoder (line 379) | def convert_text_encoder(text_encoder, tokenizer, submodule_name, args):
function modify_coremltools_torch_frontend_badbmm (line 500) | def modify_coremltools_torch_frontend_badbmm():
function convert_vae_decoder (line 548) | def convert_vae_decoder(pipe, args):
function convert_vae_decoder_sd3 (line 644) | def convert_vae_decoder_sd3(args):
function convert_vae_encoder (line 700) | def convert_vae_encoder(pipe, args):
function convert_unet (line 799) | def convert_unet(pipe, args, model_name=None):
function convert_mmdit (line 1053) | def convert_mmdit(args):
function convert_safety_checker (line 1119) | def convert_safety_checker(pipe, args):
function _get_controlnet_base_model (line 1312) | def _get_controlnet_base_model(controlnet_model_version):
function convert_controlnet (line 1317) | def convert_controlnet(pipe, args):
function get_pipeline (line 1485) | def get_pipeline(args):
function main (line 1520) | def main(args):
function parser_spec (line 1603) | def parser_spec():
FILE: python_coreml_stable_diffusion/unet.py
class AttentionImplementations (line 33) | class AttentionImplementations(Enum):
class Einsum (line 45) | class Einsum(nn.Module):
method __init__ (line 46) | def __init__(self, heads, dim_head):
method forward (line 51) | def forward(self, q, k, v, mask):
class CrossAttention (line 62) | class CrossAttention(nn.Module):
method __init__ (line 65) | def __init__(self, query_dim, context_dim=None, heads=8, dim_head=64):
method forward (line 87) | def forward(self, hidden_states, context=None, mask=None):
function linear_to_conv2d_map (line 121) | def linear_to_conv2d_map(state_dict, prefix, local_metadata, strict,
function correct_for_bias_scale_order_inversion (line 132) | def correct_for_bias_scale_order_inversion(state_dict, prefix, local_met...
class LayerNormANE (line 141) | class LayerNormANE(LayerNormANE):
method __init__ (line 143) | def __init__(self, *args, **kwargs):
class CrossAttnUpBlock2D (line 151) | class CrossAttnUpBlock2D(nn.Module):
method __init__ (line 153) | def __init__(
method forward (line 207) | def forward(self,
class UpBlock2D (line 228) | class UpBlock2D(nn.Module):
method __init__ (line 230) | def __init__(
method forward (line 266) | def forward(self, hidden_states, res_hidden_states_tuple, temb=None):
class CrossAttnDownBlock2D (line 282) | class CrossAttnDownBlock2D(nn.Module):
method __init__ (line 284) | def __init__(
method forward (line 336) | def forward(self, hidden_states, temb=None, encoder_hidden_states=None):
class DownBlock2D (line 353) | class DownBlock2D(nn.Module):
method __init__ (line 355) | def __init__(
method forward (line 389) | def forward(self, hidden_states, temb=None):
class ResnetBlock2D (line 406) | class ResnetBlock2D(nn.Module):
method __init__ (line 408) | def __init__(
method forward (line 470) | def forward(self, x, temb):
class Upsample2D (line 492) | class Upsample2D(nn.Module):
method __init__ (line 494) | def __init__(self, channels):
method forward (line 498) | def forward(self, x):
class Downsample2D (line 503) | class Downsample2D(nn.Module):
method __init__ (line 505) | def __init__(self, channels):
method forward (line 509) | def forward(self, x):
class SpatialTransformer (line 513) | class SpatialTransformer(nn.Module):
method __init__ (line 515) | def __init__(
method forward (line 553) | def forward(self, hidden_states, context=None):
class BasicTransformerBlock (line 566) | class BasicTransformerBlock(nn.Module):
method __init__ (line 568) | def __init__(self, dim, n_heads, d_head, context_dim=None, gated_ff=Tr...
method forward (line 586) | def forward(self, hidden_states, context=None):
class FeedForward (line 594) | class FeedForward(nn.Module):
method __init__ (line 596) | def __init__(self, dim, dim_out=None, mult=4, glu=False):
method forward (line 605) | def forward(self, hidden_states):
class GEGLU (line 609) | class GEGLU(nn.Module):
method __init__ (line 611) | def __init__(self, dim_in, dim_out):
method forward (line 615) | def forward(self, hidden_states):
function get_activation (line 620) | def get_activation(act_fn):
class TimestepEmbedding (line 630) | class TimestepEmbedding(nn.Module):
method __init__ (line 631) | def __init__(
method forward (line 665) | def forward(self, sample, condition=None):
class Timesteps (line 685) | class Timesteps(nn.Module):
method __init__ (line 687) | def __init__(self, num_channels, flip_sin_to_cos, downscale_freq_shift):
method forward (line 693) | def forward(self, timesteps):
function get_timestep_embedding (line 703) | def get_timestep_embedding(
class UNetMidBlock2DCrossAttn (line 731) | class UNetMidBlock2DCrossAttn(nn.Module):
method __init__ (line 733) | def __init__(
method forward (line 789) | def forward(self, hidden_states, temb=None, encoder_hidden_states=None):
class UNet2DConditionModel (line 798) | class UNet2DConditionModel(ModelMixin, ConfigMixin):
method __init__ (line 801) | def __init__(
method forward (line 975) | def forward(
class UNet2DConditionModelXL (line 1051) | class UNet2DConditionModelXL(UNet2DConditionModel):
method forward (line 1055) | def forward(
function get_down_block (line 1155) | def get_down_block(
function get_up_block (line 1201) | def get_up_block(
function calculate_conv2d_output_shape (line 1247) | def calculate_conv2d_output_shape(in_h, in_w, conv2d_layer):
FILE: tests/test_stable_diffusion.py
class TestStableDiffusionForTextToImage (line 40) | class TestStableDiffusionForTextToImage(unittest.TestCase):
method setUpClass (line 54) | def setUpClass(cls):
method tearDownClass (line 65) | def tearDownClass(cls):
method test_torch_to_coreml_conversion (line 70) | def test_torch_to_coreml_conversion(self):
method test_end_to_end_image_generation_speed (line 95) | def test_end_to_end_image_generation_speed(self):
method test_image_to_prompt_clip_score (line 117) | def test_image_to_prompt_clip_score(self):
method test_safety_checker_efficacy (line 157) | def test_safety_checker_efficacy(self):
method test_swift_cli_image_generation (line 170) | def test_swift_cli_image_generation(self):
method _init_coreml_pipe (line 218) | def _init_coreml_pipe(self, compute_unit):
method _coreml_text_to_image_with_compute_unit (line 239) | def _coreml_text_to_image_with_compute_unit(self, compute_unit):
function _reset_seed (line 289) | def _reset_seed():
function _get_test_artifacts_dir (line 296) | def _get_test_artifacts_dir(args):
function _extend_parser (line 306) | def _extend_parser(parser):
Condensed preview — 61 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (2,105K chars).
[
{
"path": ".github/pull_request_template.md",
"chars": 176,
"preview": "Thank you for your interest in contributing to Core ML Stable Diffusion! Please review [CONTRIBUTING.md](../CONTRIBUTING"
},
{
"path": ".gitignore",
"chars": 1951,
"preview": "*~\n\n# Swift Package\n.DS_Store\n/.build\n/Packages\n/*.xcodeproj\n.swiftpm\n.vscode\n.*.sw?\n*.docc-build\n*.vs\nPackage.resolved\n"
},
{
"path": "ACKNOWLEDGEMENTS",
"chars": 31679,
"preview": "Acknowledgements\nPortions of this software may utilize the following copyrighted \nmaterial, the use of which is hereby a"
},
{
"path": "CODE_OF_CONDUCT.md",
"chars": 3357,
"preview": "# Code of Conduct\n\n## Our Pledge\n\nIn the interest of fostering an open and welcoming environment, we as\ncontributors and"
},
{
"path": "CONTRIBUTING.md",
"chars": 730,
"preview": "# Contribution Guide\n\nThank you for your interest in contributing to Core ML Stable Diffusion! This project was released"
},
{
"path": "LICENSE.md",
"chars": 1067,
"preview": "MIT License\n\nCopyright (c) 2024 Apple Inc.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy"
},
{
"path": "Package.swift",
"chars": 1515,
"preview": "// swift-tools-version: 5.8\n// The swift-tools-version declares the minimum version of Swift required to build this pack"
},
{
"path": "README.md",
"chars": 59262,
"preview": "# Core ML Stable Diffusion\n\nRun Stable Diffusion on Apple Silicon with Core ML\n\n[\\[Blog Post\\]](https://machinelearning."
},
{
"path": "python_coreml_stable_diffusion/__init__.py",
"chars": 34,
"preview": "from ._version import __version__\n"
},
{
"path": "python_coreml_stable_diffusion/_version.py",
"chars": 23,
"preview": "__version__ = \"1.1.0\"\r\n"
},
{
"path": "python_coreml_stable_diffusion/activation_quantization.py",
"chars": 19474,
"preview": "#\n# For licensing see accompanying LICENSE.md file.\n# Copyright (C) 2022 Apple Inc. All Rights Reserved.\n#\n\nimport loggi"
},
{
"path": "python_coreml_stable_diffusion/attention.py",
"chars": 5539,
"preview": "import logging\n\nlogger = logging.getLogger(__name__)\nlogger.setLevel(logging.INFO)\n\nimport torch\nimport math\n\nSPLIT_SOFT"
},
{
"path": "python_coreml_stable_diffusion/chunk_mlprogram.py",
"chars": 15744,
"preview": "#\n# For licensing see accompanying LICENSE.md file.\n# Copyright (C) 2022 Apple Inc. All Rights Reserved.\n#\n\nimport argpa"
},
{
"path": "python_coreml_stable_diffusion/controlnet.py",
"chars": 9302,
"preview": "#\n# For licensing see accompanying LICENSE.md file.\n# Copyright (C) 2022 Apple Inc. All Rights Reserved.\n#\n\nfrom diffuse"
},
{
"path": "python_coreml_stable_diffusion/coreml_model.py",
"chars": 8097,
"preview": "#\n# For licensing see accompanying LICENSE.md file.\n# Copyright (C) 2022 Apple Inc. All Rights Reserved.\n#\n\nimport corem"
},
{
"path": "python_coreml_stable_diffusion/layer_norm.py",
"chars": 3001,
"preview": "#\n# For licensing see accompanying LICENSE.md file.\n# Copyright (C) 2022 Apple Inc. All Rights Reserved.\n#\n\nimport torch"
},
{
"path": "python_coreml_stable_diffusion/mixed_bit_compression_apply.py",
"chars": 4290,
"preview": "import argparse\nimport gc\nimport json\nimport logging\nimport os\n\nimport coremltools as ct\nimport coremltools.optimize.cor"
},
{
"path": "python_coreml_stable_diffusion/mixed_bit_compression_pre_analysis.py",
"chars": 20493,
"preview": "from collections import OrderedDict\nfrom copy import deepcopy\nfrom functools import partial\nimport argparse\nimport gc\nim"
},
{
"path": "python_coreml_stable_diffusion/multilingual_projection.py",
"chars": 2254,
"preview": "from python_coreml_stable_diffusion.torch2coreml import _compile_coreml_model\n\nimport argparse\nimport coremltools as ct\n"
},
{
"path": "python_coreml_stable_diffusion/pipeline.py",
"chars": 34388,
"preview": "#\n# For licensing see accompanying LICENSE.md file.\n# Copyright (C) 2022 Apple Inc. All Rights Reserved.\n#\n\nimport argpa"
},
{
"path": "python_coreml_stable_diffusion/torch2coreml.py",
"chars": 76011,
"preview": "#\n# For licensing see accompanying LICENSE.md file.\n# Copyright (C) 2022 Apple Inc. All Rights Reserved.\n#\n\nfrom python_"
},
{
"path": "python_coreml_stable_diffusion/unet.py",
"chars": 43796,
"preview": "#\n# For licensing see accompanying LICENSE.md file.\n# Copyright (C) 2022 Apple Inc. All Rights Reserved.\n#\n\nfrom python_"
},
{
"path": "requirements.txt",
"chars": 158,
"preview": "coremltools>=8.0\ndiffusers[torch]==0.30.2\ndiffusionkit==0.4.0\ntorch\ntransformers==4.44.2\nscipy\nscikit-learn\npytest\ninvis"
},
{
"path": "setup.py",
"chars": 1365,
"preview": "from setuptools import setup, find_packages\n\nfrom python_coreml_stable_diffusion._version import __version__\n\nwith open("
},
{
"path": "swift/StableDiffusion/pipeline/CGImage+vImage.swift",
"chars": 9085,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/ControlNet.swift",
"chars": 4776,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/DPMSolverMultistepScheduler.swift",
"chars": 11873,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. and The HuggingFace Team. All Rights"
},
{
"path": "swift/StableDiffusion/pipeline/Decoder.swift",
"chars": 2772,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2024 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/DiscreteFlowScheduler.swift",
"chars": 4894,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2024 Apple Inc. All Rights Reserved.\n\nimport CoreML\n"
},
{
"path": "swift/StableDiffusion/pipeline/Encoder.swift",
"chars": 3851,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/ManagedMLModel.swift",
"chars": 4447,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport CoreML\n"
},
{
"path": "swift/StableDiffusion/pipeline/MultiModalDiffusionTransformer.swift",
"chars": 4525,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/MultilingualTextEncoder.swift",
"chars": 7718,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2023 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/NumPyRandomSource.swift",
"chars": 4172,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/NvRandomSource.swift",
"chars": 3392,
"preview": "import Foundation\nimport CoreML\n\n/// A random source consistent with NVIDIA curandom\n///\n/// This implementation refere"
},
{
"path": "swift/StableDiffusion/pipeline/RandomSource.swift",
"chars": 263,
"preview": "import CoreML\n\n@available(iOS 16.2, macOS 13.1, *)\npublic protocol RandomSource {\n mutating func nextNormal(mean: Dou"
},
{
"path": "swift/StableDiffusion/pipeline/ResourceManaging.swift",
"chars": 577,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\n/// Protocol f"
},
{
"path": "swift/StableDiffusion/pipeline/SafetyChecker.swift",
"chars": 6276,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/SampleTimer.swift",
"chars": 1998,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/Scheduler.swift",
"chars": 13460,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Acceler"
},
{
"path": "swift/StableDiffusion/pipeline/StableDiffusion3Pipeline+Resources.swift",
"chars": 4094,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2024 Apple Inc. All Rights Reserved.\n\nimport CoreML\n"
},
{
"path": "swift/StableDiffusion/pipeline/StableDiffusion3Pipeline.swift",
"chars": 19637,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2024 Apple Inc. All Rights Reserved.\n\nimport Acceler"
},
{
"path": "swift/StableDiffusion/pipeline/StableDiffusionPipeline+Resources.swift",
"chars": 6921,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/StableDiffusionPipeline.Configuration.swift",
"chars": 4397,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/StableDiffusionPipeline.swift",
"chars": 18645,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Acceler"
},
{
"path": "swift/StableDiffusion/pipeline/StableDiffusionXL+Resources.swift",
"chars": 4974,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2023 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/StableDiffusionXLPipeline.swift",
"chars": 15678,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2023 Apple Inc. All Rights Reserved.\n\nimport Acceler"
},
{
"path": "swift/StableDiffusion/pipeline/TextEncoder.swift",
"chars": 3403,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/TextEncoderT5.swift",
"chars": 4400,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2023 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/TextEncoderXL.swift",
"chars": 3587,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2023 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/TorchRandomSource.swift",
"chars": 5530,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/pipeline/Unet.swift",
"chars": 7514,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/tokenizer/BPETokenizer+Reading.swift",
"chars": 1914,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/tokenizer/BPETokenizer.swift",
"chars": 6360,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusion/tokenizer/T5Tokenizer.swift",
"chars": 809,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2024 Apple Inc. All Rights Reserved.\n\nimport Foundat"
},
{
"path": "swift/StableDiffusionCLI/main.swift",
"chars": 12909,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport Argumen"
},
{
"path": "swift/StableDiffusionTests/Resources/merges.txt",
"chars": 515346,
"preview": "#version: 0.2 - Trained by `huggingface/tokenizers`\ni n\nt h\na n\nr e\na r\ne r\nth e</w>\nin g</w>\no u\no n\ns t\no r\ne n\no n</w"
},
{
"path": "swift/StableDiffusionTests/Resources/vocab.json",
"chars": 852693,
"preview": "{\"!\":0,\"\\\"\":1,\"#\":2,\"$\":3,\"%\":4,\"&\":5,\"'\":6,\"(\":7,\")\":8,\"*\":9,\"+\":10,\",\":11,\"-\":12,\".\":13,\"/\":14,\"0\":15,\"1\":16,\"2\":17,\"3"
},
{
"path": "swift/StableDiffusionTests/StableDiffusionTests.swift",
"chars": 2327,
"preview": "// For licensing see accompanying LICENSE.md file.\n// Copyright (C) 2022 Apple Inc. All Rights Reserved.\n\nimport XCTest\n"
},
{
"path": "tests/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "tests/test_stable_diffusion.py",
"chars": 15965,
"preview": "#\n# For licensing see accompanying LICENSE.md file.\n# Copyright (C) 2022 Apple Inc. All Rights Reserved.\n#\n\nimport argpa"
}
]
About this extraction
This page contains the full source code of the apple/ml-stable-diffusion GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 61 files (1.8 MB), approximately 765.9k tokens, and a symbol index with 183 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.