================================================
FILE: docs/Gemfile
================================================
source "https://rubygems.org"
gem 'github-pages', group: :jekyll_plugins
gem "webrick", "~> 1.8"
================================================
FILE: docs/README.md
================================================
# Readme
Use the following instructions to make documentation changes locally.
## Prerequisites
```bash
$ sudo apt install ruby bundler
$ bundle config set path 'vendor/bundle'
$ bundle install
```
## Serving locally
```bash
$ bundle exec jekyll serve
```
## Theme documentation
We are using the [just the docs](https://just-the-docs.github.io/just-the-docs/)
theme.
================================================
FILE: docs/_config.yml
================================================
# Welcome to Jekyll!
#
# This config file is meant for settings that affect your whole blog, values
# which you are expected to set up once and rarely edit after that. If you find
# yourself editing this file very often, consider using Jekyll's data files
# feature for the data you need to update frequently.
#
# For technical reasons, this file is *NOT* reloaded automatically when you use
# 'bundle exec jekyll serve'. If you change this file, please restart the server process.
# Site settings
# These are used to personalize your new site. If you look in the HTML files,
# you will see them accessed via {{ site.title }}, {{ site.email }}, and so on.
# You can create any custom variable you would like, and they will be accessible
# in the templates via {{ site.myvariable }}.
title: OSS-Fuzz
description: Documentation for OSS-Fuzz
baseurl: "/oss-fuzz" # the subpath of your site, e.g. /blog
url: "" # the base hostname & protocol for your site, e.g. http://example.com
# Build settings
markdown: kramdown
remote_theme: pmarsceill/just-the-docs
search_enabled: true
ga_tracking: G-LRX1V3S5P
aux_links:
  "OSS-Fuzz on GitHub":
    - https://github.com/google/oss-fuzz
# Exclude from processing.
exclude:
- Gemfile
- Gemfile.lock
- node_modules
- vendor/bundle/
- vendor/cache/
- vendor/gems/
- vendor/ruby/
================================================
FILE: docs/_sass/color_schemes/wider.scss
================================================
@import "./color_schemes/light";
$content-width: 70rem;
================================================
FILE: docs/advanced-topics/advanced_topics.md
================================================
---
layout: default
title: Advanced topics
has_children: true
nav_order: 3
permalink: /advanced-topics/
---
# Advanced topics
================================================
FILE: docs/advanced-topics/bug_fixing_guidance.md
================================================
---
layout: default
title: Bug fixing guidance
nav_order: 6
permalink: /advanced-topics/bug-fixing-guidance
---
# Bug fixing guidance
{: .no_toc}
This page provides brief guidance on how to prioritise and fix bugs reported by
OSS-Fuzz.
- TOC
{:toc}
## Threat modelling
In general, the severity of an issue reported by OSS-Fuzz must be determined
relative to the threat model of the project under analysis. Therefore, although
OSS-Fuzz makes an effort to determine the severity of a bug, its true severity
depends on the threat model of the project.
## Bug prioritisation
### Security issues
These are the top priority to fix. A label is attached to them on
the OSS-Fuzz testcase page, and you can also search for all of them on Monorail
using the search pattern `-Bug=security`.
Issues of this kind include issues reported by Address Sanitizer, e.g.
heap-based buffer overflows, stack-based buffer overflows and use-after-frees.
### Functional issues and memory leaks
These are issues that in general can tamper with the functionality of the
application. The bugs that have highest priority in this case are those that
can be easily triggered by an untrusted user of the project.
### Timeouts and out-of-memory
These are in general the least prioritised issues to solve.
### Bug prioritisation of non C/C++ projects
Currently there is no prioritisation of bugs in non C/C++ projects. As such, in
this scenario it is crucial you do the analysis yourself relative to the threat
model of your project.
## Non-reproducible bugs
OSS-Fuzz will report some bugs that are labeled `Reliably reproduces: NO`, and
these can be tricky to deal with. A non-reproducible bug is an issue that
OSS-Fuzz did indeed discover but is unable to reproduce
with `python3 infra/helper.py reproduce`. In general, our suggestion is to
analyse the bug and determine whether there is in fact an issue.
Non-reproducible bugs can be of varying nature. Some of these bugs are
due to internal state of the target application being manipulated over the
course of several executions of the fuzzer function. This could be several
hundred or even thousands of executions, so the bug may not be reproducible with
a single fuzzer test case even though there is indeed a bug in the application.
There are other reasons why bugs may be non-reproducible and in general any
non-determinism introduced into the application can have an effect on this.
In the case of non-reproducible bugs our advice is to put effort into analysing
the potential bug and also assess whether this is due to some internal state
that persists between each fuzz run. If that is indeed the case then we also
suggest investigating whether the fuzzer can be written such that the internal
state in the code will be reset between each fuzz run.
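The effect of persistent state can be illustrated with a toy sketch. Everything
here is hypothetical: `parse_record` stands in for a library call whose bug only
triggers after several invocations, and `reset_state` stands in for whatever
cleanup your project offers.

```python
# Toy illustration only: `parse_record` stands in for a real library call.
_cache = []  # module-level state that survives between fuzz iterations


def parse_record(data: bytes) -> None:
    """Hypothetical target with a bug that depends on accumulated state."""
    _cache.append(len(data))
    if len(_cache) > 3:
        # Only reachable after several calls, never from a single input.
        raise RuntimeError("cache overflow")


def reset_state() -> None:
    """Return the hypothetical target to a clean state."""
    _cache.clear()


def fuzz_one_input_stateful(data: bytes) -> None:
    # State leaks from one run into the next, so a crash may depend on
    # the whole history of inputs rather than on `data` alone.
    parse_record(data)


def fuzz_one_input_isolated(data: bytes) -> None:
    # Resetting first makes every run depend only on `data`.
    reset_state()
    parse_record(data)


def crashes(entry_point, inputs) -> bool:
    """Feed inputs in order and report whether any call raised."""
    try:
        for item in inputs:
            entry_point(item)
    except RuntimeError:
        return True
    return False
```

With the stateful entry point, four inputs in a row crash while replaying any
single one does not, which is the pattern behind `Reliably reproduces: NO`.
Note that the isolated entry point no longer reaches the accumulated-state bug
at all, so analysing the underlying issue still matters.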
## Should all reported issues be solved?
Some project maintainers report that fixing timeout issues reported
by OSS-Fuzz can increase the complexity of the project’s source code:
maintainers put effort into solving a timeout issue, and
the fix adds complexity to the project. The question is
whether, in a scenario like this, the overall result actually improves the
state of the application.
To answer this question, we must assess the issue relative to the
threat model. Continuing the timeout example, some timing issues can have
severe security implications. For example, if the timeout issue can cause
manipulation of control flow, then it may be of high security
severity. As such, it is difficult to say in general which
bugs need not be fixed; this should be analysed and determined on a
project-by-project basis.
In the event that a bug is reported by OSS-Fuzz that is not relevant to
security or reliability of the application then there may still be a point to
fixing the bug. For example, if the issue is often run into by the fuzzer then
the fuzzer may have difficulty exploring further code in the target, and thus
fixing the bug will allow the fuzzer to explore further code. In this case some
suggested examples of resolving the issue could be:
* Perform a hot-patch that is only applied during fuzzer executions and does
not overcomplicate the project’s code.
* Patch the code of the fuzzer to avoid the timeout. For example, some fuzzers
restrict the size of the input to avoid certain deep recursions or
time-intensive loops.
* Patch the code in the target despite complicating things.
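
As a minimal sketch of the second option, a fuzz entry point can skip inputs
above a size cap before calling into the target. The names `MAX_INPUT_SIZE`,
`parse_nested` and `fuzz_one_input` are assumptions made for this illustration,
and the cap value is arbitrary.

```python
MAX_INPUT_SIZE = 4096  # hypothetical cap; tune it for your project


def parse_nested(data: bytes) -> int:
    """Hypothetical target: return the maximum nesting depth of parentheses."""
    depth = 0
    deepest = 0
    for byte in data:
        if byte == ord("("):
            depth += 1
            deepest = max(deepest, depth)
        elif byte == ord(")") and depth > 0:
            depth -= 1
    return deepest


def fuzz_one_input(data: bytes) -> bool:
    """Return True if the input was exercised, False if it was skipped."""
    if len(data) > MAX_INPUT_SIZE:
        # Skip oversized inputs instead of spending the time budget on them.
        return False
    parse_nested(data)
    return True
```

The trade-off is that code paths reachable only with large inputs are no longer
fuzzed, so the cap should be chosen with the threat model in mind.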
================================================
FILE: docs/advanced-topics/code_coverage.md
================================================
---
layout: default
title: Code coverage
parent: Advanced topics
nav_order: 2
permalink: /advanced-topics/code-coverage/
---
# Code Coverage
{: .no_toc}
For projects written in C/C++, Rust, Go, Swift or Java and other JVM-based languages,
you can generate code coverage reports using Clang source-based code coverage.
This page walks you through the basic steps.
For more details on C/C++ coverage, see [Clang's documentation].
Code coverage report generation for other languages is not supported yet.
- TOC
{:toc}
---
## Pull the latest Docker images
Docker images get regularly updated with a newer version of build tools, build
configurations, scripts, and other changes. We recommend you pull the most
recent images by running the following command:
```bash
$ python3 infra/helper.py pull_images
```
## Build fuzz targets
Code coverage report generation requires a special build configuration to be
used. To create a code coverage build for your project, run these commands:
```bash
$ python3 infra/helper.py build_image $PROJECT_NAME
$ python3 infra/helper.py build_fuzzers --sanitizer=coverage $PROJECT_NAME
```
## Establish access to GCS
To get a good understanding of fuzz testing quality, you should generate code
coverage reports by running fuzz targets against the corpus
aggregated by OSS-Fuzz. Set up `gsutil` and ensure that you have access to the
corpora by doing the following:
* Install the [gsutil tool].
* Check whether you have access to the corpus for your project:
```bash
$ gsutil ls gs://${PROJECT_NAME}-corpus.clusterfuzz-external.appspot.com/
```
If you see an authorization error from the command above, run this:
```bash
$ gcloud auth login
```
and try again. Once `gsutil` works, you can run the report generation.
## Generate code coverage reports
### Full project report
If you want to generate a code coverage report using the corpus aggregated on
OSS-Fuzz, run this command:
```bash
$ python3 infra/helper.py coverage $PROJECT_NAME
```
If you want to generate a code coverage report using the corpus you have
locally, copy the corpus into the
`build/corpus/$PROJECT_NAME//` directories for each fuzz
target, then run this command:
```bash
$ python3 infra/helper.py coverage --no-corpus-download $PROJECT_NAME
```
### Single fuzz target
You can generate a code coverage report for a particular fuzz target by using
the `--fuzz-target` argument:
```bash
$ python3 infra/helper.py coverage --fuzz-target= $PROJECT_NAME
```
In this mode, you can specify an arbitrary corpus location for the fuzz target
(instead of the corpus downloaded from OSS-Fuzz) by using `--corpus-dir`:
```bash
$ python3 infra/helper.py coverage --fuzz-target= \
--corpus-dir= $PROJECT_NAME
```
### Additional arguments for `llvm-cov` (C/C++/Rust only)
You may want to use some of the options provided by the [llvm-cov tool], like
`-ignore-filename-regex=`. You can pass these to the helper script after `--`:
```bash
$ python3 infra/helper.py coverage $PROJECT_NAME -- \
-ignore-filename-regex=.*code/to/be/ignored/.*
```
If you want to specify particular source files or directories to show in the
report, list their paths at the end of the extra arguments sequence:
```bash
$ python3 infra/helper.py coverage zlib -- \
/src/zlib/inftrees.c /src/zlib_uncompress_fuzzer.cc /src/zlib/zutil.c
```
If you want OSS-Fuzz to use extra arguments when generating code coverage
reports for your project, add the arguments into your `project.yaml` file as
follows:
```yaml
coverage_extra_args: -ignore-filename-regex=.*crc.* -ignore-filename-regex=.*adler.*
```
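
Before committing a pattern, it can help to preview which paths it would drop.
The sketch below approximates the filter with Python's `re` module; it assumes
the pattern may match anywhere in the path, and LLVM's regex dialect may differ
from Python's in corner cases. The function name and the sample paths are
illustrative only.

```python
import re


def split_reported_paths(paths, patterns):
    """Split paths into (kept, ignored) using unanchored regex matching."""
    compiled = [re.compile(pattern) for pattern in patterns]
    kept, ignored = [], []
    for path in paths:
        if any(regex.search(path) for regex in compiled):
            ignored.append(path)
        else:
            kept.append(path)
    return kept, ignored


# Sample paths chosen for illustration, checked against the patterns above.
kept, ignored = split_reported_paths(
    ["/src/zlib/crc32.c", "/src/zlib/adler32.c", "/src/zlib/inftrees.c"],
    [".*crc.*", ".*adler.*"],
)
```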
[Clang's documentation]: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html
[gsutil tool]: https://cloud.google.com/storage/docs/gsutil_install
[llvm-cov tool]: https://llvm.org/docs/CommandGuide/llvm-cov.html
================================================
FILE: docs/advanced-topics/corpora.md
================================================
---
layout: default
title: Corpora
parent: Advanced topics
nav_order: 3
permalink: /advanced-topics/corpora/
---
# Accessing Corpora
{: .no_toc}
If you want to access the corpora that we are using for your fuzz targets
(synthesized by the fuzzing engines), follow these steps.
- TOC
{:toc}
---
## Obtain access
To get access to a project's corpora, you must be listed as the
primary contact or as an auto cc in the project's `project.yaml` file, as described
in the [New Project Guide]({{ site.baseurl }}/getting-started/new-project-guide/#projectyaml).
If you don't do this, most of the links below won't work.
## Install Google Cloud SDK
The corpora for fuzz targets are stored on [Google Cloud
Storage](https://cloud.google.com/storage/). To access them, you need to
[install the gsutil
tool](https://cloud.google.com/storage/docs/gsutil_install), which is part of
the Google Cloud SDK. Follow the instructions on the installation page to
log in with the Google account listed in your project's `project.yaml` file.
## Viewing the corpus for a fuzz target
The fuzzer statistics page for your project on
[ClusterFuzz]({{ site.baseurl }}/further-reading/clusterfuzz)
contains a link to the Google Cloud console for your corpus under the
**corpus_size** column. Click the link to browse and download individual test inputs in the
corpus.

## Downloading the corpus
If you want to download the entire corpus, click the link in the **corpus_size** column, then
copy the **Buckets** path at the top of the page:

Copy the corpus to a directory on your
machine by running the following command:
```bash
$ gsutil -m cp -r gs://
```
Using the expat example above, this would be:
```bash
$ gsutil -m cp -r \
gs://expat-corpus.clusterfuzz-external.appspot.com/libFuzzer/expat_parse_fuzzer \
```
## Corpus backups
We keep daily zipped backups of your corpora. These can be accessed from the
**corpus_backup** column of the fuzzer statistics page. Downloading these can
be significantly faster than running `gsutil -m cp -r` on the corpus bucket.
================================================
FILE: docs/advanced-topics/debugging.md
================================================
---
layout: default
title: Debugging
parent: Advanced topics
nav_order: 4
permalink: /advanced-topics/debugging/
---
# Debugging issues
{: .no_toc}
- TOC
{:toc}
---
## Debugging build scripts
While developing your build script, it may be useful to run bash within the
container:
```bash
$ python3 infra/helper.py shell $PROJECT_NAME # runs /bin/bash within container
$ compile # runs compilation manually
```
## Debugging fuzzers with GDB
If you wish to debug a fuzz target with gdb, you can use the base-runner-debug
image:
```bash
# Copy input testcase into host output directory so it can be accessed
# within the Docker image.
$ cp /path/to/testcase build/out/$PROJECT_NAME
# Run the Docker image containing GDB.
$ python3 infra/helper.py shell base-runner-debug
$ gdb --args /out/$PROJECT_NAME/$FUZZ_TARGET_NAME /out/$PROJECT_NAME/testcase
```
**Note:** The `base-runner-debug` image does not have access to your sources, so
you will not be able to do source code level debugging. We recommend integrating
your fuzz target upstream as part of
[ideal integration]({{ site.baseurl }}/advanced-topics/ideal-integration/)
for debugging purposes.
================================================
FILE: docs/advanced-topics/fuzz_introspector.md
================================================
---
layout: default
title: Fuzz Introspector
parent: Advanced topics
nav_order: 2
permalink: /advanced-topics/fuzz-introspector/
---
# Fuzz Introspector
{: .no_toc}
For projects written in C/C++, Python and Java you can generate Fuzz
Introspector reports to help guide the development of your fuzzing suite.
These reports help to extract details about the fuzzing setup of your
project with the goal of making it easier to improve the fuzzing set up.
The Fuzz Introspector reports are generated automatically and uploaded
to the cloud like code coverage reports, and you can also generate them
locally using the OSS-Fuzz helper script.
- TOC
{:toc}
---
## Fuzz Introspector overview
As soon as your project is run with ClusterFuzz (<1 day), you can view the Fuzz
Introspector report for your project.
[Fuzz Introspector](https://github.com/ossf/fuzz-introspector) helps you
understand your fuzzers' performance and identify any potential blockers.
It provides individual and aggregated fuzzer reachability and coverage reports.
You can monitor each fuzzer's static reachability potential and compare it
against dynamic coverage and identify any potential bottlenecks.
Fuzz Introspector can offer suggestions on increasing coverage by adding new
fuzz targets or modifying existing ones.
Fuzz Introspector reports can be viewed from the [OSS-Fuzz
homepage](https://oss-fuzz.com/) or through this
[index](http://oss-fuzz-introspector.storage.googleapis.com/index.html).
- [Fuzz Introspector documentation](https://fuzz-introspector.readthedocs.io/en/latest/)
- [Fuzz Introspector source code](https://github.com/ossf/fuzz-introspector)
- [OSS-Fuzz Fuzz Introspector reports](http://oss-fuzz-introspector.storage.googleapis.com/index.html)
## Tutorials and guides
The reports generated can be a lot to digest when first viewing them. The
[Fuzz Introspector documentation](https://fuzz-introspector.readthedocs.io/en/latest/)
provides various user guides and tutorials rooted in OSS-Fuzz projects, which is
a useful reference on how to make use of the reports.
For ideas on how to use Fuzz Introspector, see the [user guides](https://fuzz-introspector.readthedocs.io/en/latest/user-guides/index.html), which include sections such as:
- [Quickly extract overview of a given project](https://fuzz-introspector.readthedocs.io/en/latest/user-guides/quick-overview.html)
- [Get ideas for new fuzz targets](https://fuzz-introspector.readthedocs.io/en/latest/user-guides/get-ideas-for-new-targets.html)
- [Comparing introspector reports](https://fuzz-introspector.readthedocs.io/en/latest/user-guides/comparing-introspector-reports.html)
## Run Fuzz Introspector locally
To generate a Fuzz Introspector report locally use `infra/helper.py` and the
`introspector` command. Fuzz Introspector relies on code coverage to
analyze a given project, and this means we need to extract code coverage in the
Fuzz Introspector process. We can do this in two ways. First, by running the fuzzers
for a given amount of time, and, second, by generating code coverage using the public
corpus available from OSS-Fuzz.
### Generate reports by running fuzzers for X seconds
The following command will generate a Fuzz Introspector report for the `libdwarf` project
and will extract code coverage based on a corpus created from running the fuzzers for 30
seconds.
```bash
$ python3 infra/helper.py introspector libdwarf --seconds=30
```
If the above command was successful, you should see output along the lines of:
```bash
INFO:root:To browse the report, run: python3 -m http.server 8008 --directory /home/my_user/oss-fuzz/build/out/libdwarf/introspector-report/inspector and navigate to localhost:8008/fuzz_report.html in your browser
```
The above output gives you directions on how to start a simple webserver using
`python3 -m http.server`, which you can use to view the Fuzz Introspector report.
### Generate reports by using public corpora
The following command will generate a Fuzz Introspector report for the `libdwarf` project
and will extract code coverage based on the publicly available corpora.
```bash
$ python3 infra/helper.py introspector libdwarf --public-corpora
```
Assuming the above command is successful, you can view the report using `python3 -m http.server`
following the example described above.
## Differences in build tooling
There are some differences in build environment for Fuzz Introspector builds
in comparison to e.g. ASAN or code coverage builds. The reason is that
Fuzz Introspector relies on certain compile-time tools to do its analysis.
This compile time tooling differs between languages, namely:
- For C/C++, Fuzz Introspector relies on [LLVM LTO](https://llvm.org/docs/LinkTimeOptimization.html) and [LLVM Gold](https://llvm.org/docs/GoldPlugin.html)
- For Python, Fuzz Introspector relies on a modified [PyCG](https://github.com/vitsalis/PyCG)
- For Java, Fuzz Introspector relies on [Soot](https://soot-oss.github.io/soot/)
As a consequence, your project must be compatible with these tools.
PyCG and Soot have not proven to be blockers for many projects. However, experience
has shown that a project's build sometimes needs modification in order to compile
with LLVM LTO. The easiest way to test whether your project works with LLVM LTO is
to check whether it compiles with the flags `-flto -fuse-ld=gold`, that is, with
the gold linker. OSS-Fuzz automatically sets these flags and linker options when
you use `infra/helper.py` to build your project with `--sanitizer=introspector`, e.g.
```bash
python3 infra/helper.py build_fuzzers --sanitizer=introspector PROJ_NAME
```
================================================
FILE: docs/advanced-topics/ideal_integration.md
================================================
---
layout: default
title: Ideal integration
parent: Advanced topics
nav_order: 1
permalink: /advanced-topics/ideal-integration/
---
# Ideal integration with OSS-Fuzz
{: .no_toc}
OSS projects have different build and test systems. We can't expect them all to
implement and maintain fuzz targets or integrate them with OSS-Fuzz in the same
way. However, we do have recommendations.
This page documents several features (starting from the easiest) that will make
automated fuzzing simple and efficient, and will help you catch regressions
early in the development cycle. This simple
[example](https://github.com/google/oss-fuzz/tree/master/projects/example/my-api-repo)
covers most of the items.
- TOC
{:toc}
---
## Summary
Every [fuzz target](https://llvm.org/docs/LibFuzzer.html#fuzz-target):
* Is [maintained by code owners](#fuzz-target) in their RCS (Git, SVN, etc).
* Is [built with the rest of the tests](#build-support) - no bit rot!
* Has a [seed corpus](#seed-corpus) with good [code coverage](#coverage).
* Has a [dictionary](#dictionary), if applicable.
* Is [continuously tested on the seed corpus](#regression-testing) with
[ASan/UBSan/MSan](https://github.com/google/sanitizers).
* Is [fast and has no OOMs](#performance).
## Fuzz Target
The code of the [fuzz target(s)](https://llvm.org/docs/LibFuzzer.html#fuzz-target) should be
part of the project's source code repository. All fuzz targets should be easily
discoverable (reside in the same directory, follow the same naming pattern,
etc.).
This makes it easy to maintain the fuzzers and minimizes breakages that can
arise as source code changes over time.
Make sure to fuzz the target locally for a small period of time to ensure that
it does not crash, hang, or run out of memory instantly. Also make sure that the fuzzer can
make at least some progress. If you're having trouble, read about [what makes a good fuzz
target](https://github.com/google/fuzzing/blob/master/docs/good-fuzz-target.md).
The interface between the [fuzz target](https://llvm.org/docs/LibFuzzer.html#fuzz-target)
and the fuzzing engines is C, so you can use either C or C++ to implement the
fuzz target. Make sure to not return values other than **zero** [^1].
Examples:
[boringssl](https://github.com/google/boringssl/tree/master/fuzz),
[SQLite](https://www.sqlite.org/src/artifact/ad79e867fb504338),
[s2n](https://github.com/awslabs/s2n/tree/master/tests/fuzz),
[openssl](https://github.com/openssl/openssl/tree/master/fuzz),
[FreeType](http://git.savannah.gnu.org/cgit/freetype/freetype2.git/tree/src/tools/ftfuzzer),
[re2](https://github.com/google/re2/tree/master/re2/fuzzing),
[harfbuzz](https://github.com/behdad/harfbuzz/tree/master/test/fuzzing),
[pcre2](https://vcs.pcre.org/pcre2/code/trunk/src/pcre2_fuzzsupport.c?view=markup),
[ffmpeg](https://github.com/FFmpeg/FFmpeg/blob/master/tools/target_dec_fuzzer.c).
[^1]: While LibFuzzer uses a non-zero value as a signal to discard inputs, other fuzzers in
use by OSS-Fuzz do not necessarily support this behavior. (Discarding inputs can be used
to stop a fuzzer from exploring further, which should only be done with good reason.)
## Build support
Many different build systems exist in the open-source world. The less OSS-Fuzz
knows about them, the better it can scale.
An ideal build integration for OSS-Fuzz looks like this:
* For every fuzz target `foo` in the project, there is a build rule that
  builds `foo_fuzzer`, a binary that:
  * Contains the fuzzing entry point (`LLVMFuzzerTestOneInput`) and all the
    code it depends on.
  * Uses the `main()` function from `$LIB_FUZZING_ENGINE` (env var [provided]({{ site.baseurl }}/getting-started/new-project-guide/) by OSS-Fuzz environment).
* Since the build system supports changing the compiler and passing extra compiler
  flags, the build command for `foo_fuzzer` looks similar to this:
```bash
# Assume the following env vars are set:
# CC, CXX, CFLAGS, CXXFLAGS, LIB_FUZZING_ENGINE
$ make_or_whatever_other_command foo_fuzzer
```
This minimizes OSS-Fuzz-specific configuration, making your fuzzing more robust.
There is no point in hardcoding the exact compiler flags in the build system
because they a) may change and b) depend on the fuzzing engine and sanitizer
being used.
## Seed Corpus
The *seed corpus* is a set of test inputs, stored as individual files, provided
to the fuzz target as a starting point (to "seed" the mutations). The quality of
the seed corpus has a huge impact on fuzzing efficiency; the higher the quality,
the easier it is for the fuzzer to discover new code paths. The ideal corpus is
a minimal set of inputs that provides maximal code coverage.
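
The idea of a minimal set with maximal coverage can be sketched as a greedy
set cover. This is an illustration only, not how any fuzzing engine implements
corpus merging; it assumes you already have, for each input, the set of covered
edges (represented here as plain integers), and the file names are made up.

```python
def minimize_corpus(coverage_by_input):
    """Greedy set cover: pick inputs until no input adds new coverage.

    `coverage_by_input` maps an input name to the set of edges it covers.
    Returns the selected input names and the union of their coverage.
    """
    remaining = dict(coverage_by_input)
    covered = set()
    selected = []
    while remaining:
        # Pick the input that adds the most not-yet-covered edges; iterating
        # over sorted names makes ties resolve deterministically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # nothing left contributes new coverage
        covered |= gain
        selected.append(best)
    return selected, covered


# Hypothetical coverage data for four seed files.
selected, covered = minimize_corpus({
    "a.xml": {1, 2, 3},
    "b.xml": {2, 3},
    "c.xml": {3, 4},
    "d.xml": {1, 2, 3},
})
```

Here `b.xml` and `d.xml` are dropped because they add nothing beyond what
`a.xml` and `c.xml` already cover.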
For better OSS-Fuzz integration, the seed corpus should be available in
revision control (it can be the same as or different from the source code). It
should be regularly extended with the inputs that (used to) trigger bugs and/or
touch new parts of the code.
Examples:
[boringssl](https://github.com/google/boringssl/tree/master/fuzz),
[openssl](https://github.com/openssl/openssl/tree/master/fuzz),
[nss](https://github.com/mozilla/nss-fuzzing-corpus) (corpus in a separate repo).
## Dictionary
For some input types, a simple dictionary of tokens used by the input language
can have a dramatic impact on fuzzing efficiency. For example, when fuzzing an
XML parser, a dictionary of XML tokens is helpful. AFL++ has a
[collection](https://github.com/AFLplusplus/AFLplusplus/tree/master/dictionaries)
of dictionaries for popular data formats. Ideally, a dictionary should be
maintained alongside the fuzz target, and it must use [correct
syntax](https://llvm.org/docs/LibFuzzer.html#dictionaries).
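
As a rough sketch of that syntax, the helper below formats tokens as quoted
dictionary entries, escaping backslashes and double quotes and writing
non-printable bytes as `\xAB` hex escapes. The function names, the entry names
and the XML tokens are chosen for illustration; check the linked syntax
description before relying on the output.

```python
def format_dictionary_entry(token: bytes, name: str = "") -> str:
    """Format one token as a quoted dictionary line, optionally `name="..."`."""
    escaped = []
    for byte in token:
        char = chr(byte)
        if char == "\\":
            escaped.append("\\\\")  # backslash becomes \\
        elif char == '"':
            escaped.append('\\"')  # double quote becomes \"
        elif 0x20 <= byte < 0x7F:
            escaped.append(char)  # printable ASCII is written as-is
        else:
            escaped.append("\\x%02X" % byte)  # everything else as \xAB
    body = '"' + "".join(escaped) + '"'
    return f"{name}={body}" if name else body


def build_dictionary(named_tokens) -> str:
    """Join (name, token) pairs into the text of a dictionary file."""
    lines = [format_dictionary_entry(token, name) for name, token in named_tokens]
    return "\n".join(lines) + "\n"


# A few XML tokens as an example.
xml_dictionary = build_dictionary([
    ("doctype", b"<!DOCTYPE"),
    ("cdata_open", b"<![CDATA["),
    ("cdata_close", b"]]>"),
])
```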
## Coverage
For a fuzz target to be useful, it must have good coverage in the code that it
is testing. You can view the coverage for your fuzz targets by looking at the
[fuzzer stats]({{ site.baseurl }}/further-reading/clusterfuzz#fuzzer-stats)
dashboard on ClusterFuzz, as well as [coverage reports]({{ site.baseurl
}}/further-reading/clusterfuzz#coverage-reports).
To generate an aggregated code coverage report for your project, please see the
[code coverage]({{ site.baseurl }}/advanced-topics/code-coverage) page.
Coverage can often be improved by adding dictionaries, more inputs for seed
corpora, and fixing timeouts/out-of-memory bugs in your targets.
## Regression Testing
Fuzz targets should be regularly tested (not necessarily fuzzed!) as a part of
the project's regression testing process. One way to do so is to link the fuzz
target with a simple standalone driver
([example](https://github.com/llvm-mirror/compiler-rt/tree/master/lib/fuzzer/standalone))
that runs the provided inputs, then use this driver with the seed corpus created
in previous step. We recommend you use
[sanitizers](https://github.com/google/sanitizers) during regression testing.
Examples: [SQLite](https://www.sqlite.org/src/artifact/d9f1a6f43e7bab45),
[openssl](https://github.com/openssl/openssl/blob/master/fuzz/test-corpus.c).
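
The driver idea can be sketched in a few lines: run the target once on every
file in the seed corpus and report the inputs that fail. This sketch is not the
standalone driver linked above; `toy_target` and `run_seed_corpus` are
hypothetical names, and a real driver would call your project's fuzz entry
point instead.

```python
import pathlib


def toy_target(data: bytes) -> None:
    """Hypothetical fuzz entry point that rejects one known-bad prefix."""
    if data.startswith(b"BAD"):
        raise ValueError("regression: bad prefix accepted")


def run_seed_corpus(target, corpus_dir):
    """Run `target` once per corpus file; return (file name, error) pairs."""
    failures = []
    for path in sorted(pathlib.Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue  # ignore subdirectories
        try:
            target(path.read_bytes())
        except Exception as error:
            failures.append((path.name, repr(error)))
    return failures
```

A test suite could then assert that `run_seed_corpus(...)` returns an empty
list, so any input that starts failing again shows up as a regression.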
## Performance
Fuzz targets should perform well, because high memory usage and/or slow
execution speed can slow down the growth of coverage and the discovery of new
bugs. ClusterFuzz provides a [performance analyzer]({{ site.baseurl
}}/further-reading/clusterfuzz/#performance-analyzer) for each fuzz target that
shows problems that are impacting performance.
## Not a project member?
If you are a member of the project you want to fuzz, most of the steps above are
simple. However in some cases, someone outside the project team may want to fuzz
the code, and the project maintainers are not interested in helping.
In such cases, we can host the fuzz targets, dictionaries, etc. in OSS-Fuzz's
repository and mention them in the Dockerfile. It's not ideal, because the fuzz
targets will not be continuously tested, so may quickly bitrot.
Examples: [libxml2](https://github.com/google/oss-fuzz/tree/master/projects/libxml2),
[c-ares](https://github.com/google/oss-fuzz/tree/master/projects/c-ares), [expat](https://github.com/google/oss-fuzz/tree/master/projects/expat).
If you are not a project maintainer, we may not be able to CC you to security
bugs found by OSS-Fuzz.
================================================
FILE: docs/advanced-topics/reproducing.md
================================================
---
layout: default
title: Reproducing
parent: Advanced topics
nav_order: 5
permalink: /advanced-topics/reproducing/
---
# Reproducing OSS-Fuzz issues
{: .no_toc}
You've been CCed on an OSS-Fuzz issue
([examples](https://bugs.chromium.org/p/oss-fuzz/issues/list?can=1&q=Type%3ABug%2CBug-Security)).
Now what? Before attempting to fix the bug, you should be able to reliably
reproduce it.
- TOC
{:toc}
---
## Fuzz target bugs
Every issue has a [reproducer file]({{ site.baseurl
}}/reference/glossary/#reproducer) (also known as a "testcase" file) attached.
Download it. This file contains the bytes that were fed to the [fuzz
target](https://llvm.org/docs/LibFuzzer.html#fuzz-target).
**Note:** If the issue is not public, you will need to log in using a
[Google account](https://support.google.com/accounts/answer/176347?hl=en)
([why?]({{ site.baseurl
}}/faq/#why-do-you-require-a-google-account-for-authentication)) that the bug
report CCs.
If you have already
[integrated]({{ site.baseurl }}/advanced-topics/ideal-integration/)
the fuzz target with your build and test system, all you have to do is run this command:
```bash
$ ./fuzz_target_binary
```
For timeout bugs, add the `-timeout=65` argument. For OOM bugs, add the
`-rss_limit_mb=2560` argument. Read more on [how timeouts and OOMs are
handled]({{ site.baseurl }}/faq/#how-do-you-handle-timeouts-and-ooms).
Depending on the nature of the bug, the fuzz target binary needs to be built
with the appropriate [sanitizer](https://github.com/google/sanitizers)
(for example, if it's a buffer overflow, build with
[AddressSanitizer](http://clang.llvm.org/docs/AddressSanitizer.html)).
If you're not sure how to build the fuzzer using the project's build system,
you can also use Docker commands to replicate the exact build steps used by
OSS-Fuzz, then feed the reproducer input to the fuzz target ([how?]({{
site.baseurl }}/getting-started/new-project-guide/#prerequisites), [why?]({{
site.baseurl }}/faq/#why-do-you-use-docker)).
## Building using Docker
### Cloning OSS-Fuzz
To use the following `infra/helper.py` commands, you need a checkout of OSS-Fuzz:
```bash
$ git clone --depth=1 https://github.com/google/oss-fuzz.git
$ cd oss-fuzz
```
### Pull the latest Docker images
Docker images get regularly updated with a newer version of build tools, build
configurations, scripts, and other changes. In some cases, a particular issue
can be reproduced only with a fresh image being used. Pull the latest images
by running the following command:
```bash
$ python3 infra/helper.py pull_images
```
### Build the image and the fuzzers
Run the following commands:
```bash
$ python3 infra/helper.py build_image $PROJECT_NAME
$ python3 infra/helper.py build_fuzzers --sanitizer \
--architecture $PROJECT_NAME
```
The `sanitizer` used in the report is the value in the
**Sanitizer** column. It's one of the following:
* **address** for AddressSanitizer.
* **memory** for MemorySanitizer.
* **undefined** for UndefinedBehaviorSanitizer.
**Notes**:
* The `architecture` argument is only necessary if you want to specify
`i386` configuration.
* Some bugs (especially ones related to pointer and integer overflows) are reproducible only in 32-bit mode or only in 64-bit mode.
If you can't reproduce a particular bug when building for x86_64, try building for i386.
## Reproducing bugs
After you build an image and a fuzzer, you can reproduce a bug by running the following command:
```bash
$ python3 infra/helper.py reproduce $PROJECT_NAME <fuzz_target_name> <testcase_path>
```
For example, to build the [libxml2](https://github.com/google/oss-fuzz/tree/master/projects/libxml2)
project with UndefinedBehaviorSanitizer (`undefined`) instrumentation and
reproduce a crash testcase for a fuzzer named `libxml2_xml_read_memory_fuzzer`,
you would run:
```bash
$ python3 infra/helper.py build_image libxml2
$ python3 infra/helper.py build_fuzzers --sanitizer undefined libxml2
$ python3 infra/helper.py reproduce libxml2 libxml2_xml_read_memory_fuzzer ~/Downloads/testcase
```
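
When you reproduce several testcases, it can help to script the three steps above. The following is a minimal sketch that only assembles the `infra/helper.py` command lines already shown in this section. The function name `reproduce_commands` is hypothetical, and the sketch assumes it is run from the root of an OSS-Fuzz checkout.

```python
import shlex


def reproduce_commands(project, sanitizer, fuzz_target, testcase):
    """Return the helper.py command lines for one reproduction run.

    Hypothetical helper: it only mirrors the three documented commands
    (build_image, build_fuzzers, reproduce) and does not execute anything.
    """
    helper = ["python3", "infra/helper.py"]
    return [
        helper + ["build_image", project],
        helper + ["build_fuzzers", "--sanitizer", sanitizer, project],
        helper + ["reproduce", project, fuzz_target, testcase],
    ]


if __name__ == "__main__":
    commands = reproduce_commands(
        "libxml2", "undefined", "libxml2_xml_read_memory_fuzzer", "testcase"
    )
    for cmd in commands:
        # Print only; pass each list to subprocess.run(cmd, check=True) to run it.
        print(shlex.join(cmd))
```

Keeping each command as a list avoids shell quoting problems when the testcase path contains spaces.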
## Reproduce using local source checkout
You can also mount local sources into the running container by using these commands:
```bash
$ python3 infra/helper.py build_fuzzers \
    --sanitizer <sanitizer> $PROJECT_NAME <source_path>
$ python3 infra/helper.py reproduce $PROJECT_NAME <fuzz_target_name> <testcase_path>
```
Once you reproduce the bug, you can do the following:
- **Fix issue:** Write a patch to fix the issue in your local checkout, then
use the previous command to verify the fix (i.e. no crash occurred).
[Use gdb]({{ site.baseurl }}/advanced-topics/debugging/#debugging-fuzzers-with-gdb)
if needed.
- **Submit fix:** Submit the fix in the project's repository. ClusterFuzz will
automatically pick up the changes, recheck the testcase, and close the
issue (in < 1 day).
- **Improve fuzzing support:** Consider
[improving your integration with OSS-Fuzz]({{ site.baseurl }}/advanced-topics/ideal-integration/).
## Reproducing build failures
Our infrastructure runs some sanity tests to make sure that your build was
correctly configured, even if it succeeded. To reproduce these locally, run these commands:
```bash
$ python3 infra/helper.py build_image $PROJECT_NAME
$ python3 infra/helper.py build_fuzzers --sanitizer <sanitizer> \
    --engine <engine> --architecture <architecture> $PROJECT_NAME
$ python3 infra/helper.py check_build --sanitizer <sanitizer> \
    --engine <engine> --architecture <architecture> $PROJECT_NAME \
    <fuzz_target_name>
```
**Note:** Unless you have a reason to think the build is an `i386` build, the build
is probably an `x86_64` build and the `architecture` argument can be omitted.
If you need to reproduce a `coverage` build failure, follow the
[Code Coverage page]({{ site.baseurl }}/advanced-topics/code-coverage) to build
your project and generate a code coverage report.
================================================
FILE: docs/assets/css/just-the-docs-wider.scss
================================================
---
---
{% include css/just-the-docs.scss.liquid color_scheme="wider" %}
================================================
FILE: docs/faq.md
================================================
---
layout: default
title: FAQ
nav_order: 7
permalink: /faq/
---
# Frequently Asked Questions
- TOC
{:toc}
---
## Where can I learn more about fuzzing?
We recommend reading the [libFuzzer tutorial] and the other docs in the [google/fuzzing]
repository. These and some other resources are listed on the
[useful links]({{ site.baseurl }}/reference/useful-links/#tutorials) page.
[google/fuzzing]: https://github.com/google/fuzzing/tree/master/docs
[libFuzzer tutorial]: https://github.com/google/fuzzing/blob/master/tutorial/libFuzzerTutorial.md
## What kind of projects are you accepting?
We accept established projects that have a critical impact on infrastructure and
user security. We will consider each request on a case-by-case basis, but some
things we keep in mind are:
- Exposure to remote attacks (e.g. libraries that are used to process
untrusted input).
- Number of users/other projects depending on this project.
We hope to relax this requirement in the future though, so keep an eye out even
if we are not able to accept your project at this time!
## How can I find potential fuzz targets in my open source project?
You should look for places in your code that:
- consume untrusted data from users or from the network.
- consume complex input data even if it's 'trusted'.
- use an algorithm that has two or more implementations
(to verify their equivalence).

You can also look at existing fuzz target [examples](https://github.com/google/oss-fuzz/tree/master/projects)
and find similarities.
## Where can I store fuzz target sources and the build script if it's not yet accepted upstream?
Fuzz target sources as well as the build script may temporarily live inside the
`projects/` directory in the OSS-Fuzz repository. Note that we do
not accept integrations that rely on forked repositories. Refer to the
[ideal integration guide] for the preferred long term solution.
## My project is not open source. Can I use OSS-Fuzz?
You cannot use OSS-Fuzz, but you can use [ClusterFuzz] which OSS-Fuzz is based
on. ClusterFuzz is an open-source fuzzing infrastructure that you can deploy in
your own environment and run continuously at scale.
OSS-Fuzz is a production instance of ClusterFuzz, plus the code living in
[OSS-Fuzz repository]: build scripts, `project.yaml` files with contacts, etc.
[OSS-Fuzz repository]: https://github.com/google/oss-fuzz
## Why do you use a [different issue tracker](https://bugs.chromium.org/p/oss-fuzz/issues/list) for reporting bugs in OSS projects?
Security access control is important for the kind of issues that OSS-Fuzz detects,
so by default issues are only opened on the OSS-Fuzz tracker.
You can opt in to have them filed on GitHub as well by adding the `file_github_issue`
attribute to your `project.yaml` file. Note that this is for visibility only;
the actual details can be found by following the link to the
OSS-Fuzz tracker.
## Why do you require a Google account for authentication?
Our [ClusterFuzz]({{ site.baseurl }}/further-reading/clusterfuzz) fuzzing
infrastructure and [issue tracker](https://bugs.chromium.org/p/oss-fuzz/issues/list)
require a Google account for authentication. Note that an alternate email
address associated with a Google account does not work due to App Engine API
limitations.
## Why do you use Docker?
Building fuzzers requires building your project with a fresh Clang compiler and
special compiler flags. An easy-to-use Docker image is provided to simplify
toolchain distribution. This also simplifies our support for a variety of Linux
distributions and provides a reproducible environment for fuzzer
building and execution.
## How do you handle timeouts and OOMs?
If a single input to a [fuzz target]({{ site.baseurl }}/reference/glossary/#fuzz-target)
requires more than **~25 seconds** or more than **2.5GB RAM** to process, we
report this as a timeout or an OOM (out-of-memory) bug
(examples: [timeouts](https://bugs.chromium.org/p/oss-fuzz/issues/list?can=1&q=%22Crash+Type%3A+Timeout%22),
[OOMs](https://bugs.chromium.org/p/oss-fuzz/issues/list?can=1&q=%22Crash+Type%3A+Out-of-memory%22)).
This may or may not be considered as a real bug by the project owners,
but nevertheless we treat all timeouts and OOMs as bugs
since they significantly reduce the efficiency of fuzzing.
Remember that fuzzing is executed with AddressSanitizer or other
sanitizers, which introduce a certain overhead in RAM and CPU.
We currently do not have a good way to deduplicate timeout or OOM bugs.
So, we report only one timeout and only one OOM bug per fuzz target.
Once that bug is fixed, we will file another one, and so on.
## Can I change the default timeout and OOM for a fuzz target?
Yes, you can. For this, create a fuzz target options file named `<fuzz_target_name>.options`,
where `<fuzz_target_name>` is the executable file name of the fuzz target, in the same
directory as your `project.yaml`. The options file can contain fuzzer-specific
configuration values, such as:
```
[libfuzzer]
rss_limit_mb = 6000
timeout = 30
```
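
If you would rather generate the options file from a script than write it by hand, the following is a minimal sketch using Python's standard `configparser` module. The function name `render_options` is hypothetical, and the values are the ones from the example above; where and when you write the resulting text to disk is left to your project.

```python
import configparser
import io


def render_options(rss_limit_mb, timeout):
    """Render the text of a fuzz target options file with a [libfuzzer] section."""
    config = configparser.ConfigParser()
    config["libfuzzer"] = {
        "rss_limit_mb": str(rss_limit_mb),
        "timeout": str(timeout),
    }
    buffer = io.StringIO()
    config.write(buffer)  # Writes "key = value" lines under the section header.
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_options(6000, 30))
```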
## My library gracefully handles allocation failures, why are OOMs reported?
OOM detection is done *not* by instrumenting memory allocation routines such as `malloc`
to have them return NULL, but using a separate watchdog thread that measures the resident
set size (RSS) on a periodic basis. Therefore, your fuzz target might successfully
allocate more than the configured max RSS, and yet get killed shortly afterwards.
The only reliable way to avoid this is for your fuzz target to use a custom allocator
that will prevent allocating more memory than a given limit. You can find a more
detailed discussion of this topic, as well as links to the solution implemented
by a specific project, in [this issue](https://github.com/google/oss-fuzz/issues/1830).
## Can I launch an additional process (e.g. a daemon) from my fuzz target?
No. In order to get all the benefits of in-process, coverage-guided fuzz testing,
it is required to run everything inside a single process. Any child processes
created outside the main process introduce heavy launch overhead and are not
monitored for code coverage.
Another rule of thumb is: "the smaller a fuzz target is, the better it is". It is
expected that your project will have many fuzz targets to test different
components, instead of a single fuzz target trying to cover everything.
Think of a fuzz target as a unit test, though it is much more powerful since it
helps to test millions of data permutations rather than just one.
## What if my fuzz target finds a bug in another project (dependency) ?
Every bug report has a crash stack-trace that shows where the crash happened.
Using that, you can debug the root cause and see which category the bug falls in:
- If the bug is due to an incorrect usage of the dependent project's API
in your project, then you need to fix your usage to call the API correctly.
- If this is a real bug in the dependent project, then you should CC the
maintainers of that project on the bug. Once CCed, they will get automatic
access to all the information necessary to reproduce the issue. If this project
is maintained in OSS-Fuzz, you can search for contacts in the respective
`project.yaml` file.
## What if my fuzzer does not find anything?
If your fuzz target is running for many days and does not find bugs or new
coverage, it may mean several things:
- We've covered all reachable code. In order to cover more code we need more
fuzz targets.
- The [seed corpus]({{ site.baseurl }}/getting-started/new-project-guide#seed-corpus) is not good enough and the
fuzzing engine(s) are not able to go deeper based on the existing seeds.
You need to add more seeds.
- There is cryptographic or CRC code that prevents any fuzzing
engine from going deeper, in which case it should be disabled in
[fuzzing mode](https://llvm.org/docs/LibFuzzer.html#fuzzer-friendly-build-mode).
Examples: [openssl](https://github.com/openssl/openssl/tree/master/fuzz#reproducing-issues),
[boringssl](https://boringssl.googlesource.com/boringssl/+/HEAD/FUZZING.md#Fuzzer-mode).
- It is also possible that the fuzzer is running too slowly
(you can check the speed of your targets at https://oss-fuzz.com/).
In any case, look at the
[coverage reports]({{ site.baseurl }}/further-reading/clusterfuzz#coverage-reports)
for your target(s) and figure out why some parts of the code are not covered.
## What if my fuzzer does not find new coverage or bugs after a while?
It is common for fuzzers to plateau and stop finding new coverage or bugs.
[Fuzz Introspector](https://github.com/ossf/fuzz-introspector) helps you
evaluate your fuzzers' performance.
It can help you identify bottlenecks causing your fuzzers to plateau.
It provides aggregated and individual fuzzer reachability and coverage reports.
Developers can either introduce a new fuzz target or modify an existing one to
reach previously unreachable code.
Here are
[case studies](https://github.com/ossf/fuzz-introspector/blob/main/doc/CaseStudies.md)
where Fuzz Introspector helped developers improve fuzzing of a project.
Fuzz Introspector reports are available on the [OSS-Fuzz homepage](https://oss-fuzz.com/)
or through this [index](http://oss-fuzz-introspector.storage.googleapis.com/index.html).
Developers can also use Fuzz Introspector on their local machines.
Detailed instructions are available
[here](https://github.com/ossf/fuzz-introspector/tree/main/oss_fuzz_integration#build-fuzz-introspector-with-oss-fuzz).
## Why are code coverage reports public?
We work with open source projects and try to keep as much information public as
possible. We believe that public code coverage reports do not put users at risk,
as they do not indicate the presence of bugs or lack thereof.
## Why is the coverage command complaining about format compatibility issues?
This may happen if the Docker images fetched locally become out of sync. Make
sure you run the following command to pull the most recent images:
```bash
$ python3 infra/helper.py pull_images
```
Please refer to
[code coverage]({{ site.baseurl }}/advanced-topics/code-coverage/) for detailed
information on code coverage generation.
## What happens when I rename a fuzz target ?
If you rename your fuzz targets, the existing bugs for those targets will get
closed and fuzzing will start from scratch with a fresh corpus
(seed corpus only). A similar corpus will accumulate over time, depending on
the number of CPU cycles that the original fuzz target has run. If this is not
desirable, make sure to copy the accumulated corpus from the original fuzz
target (instructions to download
[here]({{ site.baseurl }}/advanced-topics/corpora/#downloading-the-corpus)) and
restore it to the new GCS location later (instructions to find the
new location [here]({{ site.baseurl }}/advanced-topics/corpora/#viewing-the-corpus-for-a-fuzz-target)).
## Does OSS-Fuzz support AFL or honggfuzz or Centipede?
OSS-Fuzz *uses* the following
[fuzzing engines]({{ site.baseurl }}/reference/glossary/#fuzzing-engine):
1. [libFuzzer](https://llvm.org/docs/LibFuzzer.html).
1. [AFL++](https://github.com/AFLplusplus/AFLplusplus), an improved and
well-maintained version of [AFL](https://lcamtuf.coredump.cx/afl/).
1. [Honggfuzz](https://github.com/google/honggfuzz).
1. [Centipede (Experimental)](https://github.com/google/centipede).
Follow the [new project guide] and OSS-Fuzz will use all its fuzzing engines
on your code.
## What are the specs on your machines?
OSS-Fuzz builders have 32CPU/28.8GB RAM.
Fuzzing machines only have a single core and fuzz targets should not use more
than 2.5GB of RAM.
## Are there any restrictions on using test cases / corpora generated by OSS-Fuzz?
No, you can freely use (i.e. share, add to your repo, etc.) the test cases and
corpora generated by OSS-Fuzz. OSS-Fuzz infrastructure is fully open source
(including [ClusterFuzz], various fuzzing engines, and other dependencies). We
have no intent to restrict the use of the artifacts produced by OSS-Fuzz.
[ClusterFuzz]: https://github.com/google/clusterfuzz
[new project guide]: {{ site.baseurl }}/getting-started/new-project-guide/
[ideal integration guide]: {{ site.baseurl }}/advanced-topics/ideal-integration/
================================================
FILE: docs/further-reading/clusterfuzz.md
================================================
---
layout: default
title: ClusterFuzz
parent: Further reading
nav_order: 1
permalink: /further-reading/clusterfuzz/
---
# ClusterFuzz
[ClusterFuzz](https://github.com/google/clusterfuzz) is the distributed fuzzing
infrastructure behind OSS-Fuzz. It was initially built for fuzzing Chrome at
scale.
- TOC
{:toc}
---
## Web interface
ClusterFuzz provides a [web interface](https://oss-fuzz.com)
to view statistics about your fuzz targets, as well as current crashes.
*Note*: Access is restricted to project developers who we auto CC on new bug
reports.
## Testcase reports
ClusterFuzz will automatically de-duplicate and file reproducible crashes into
our [bug tracker](https://bugs.chromium.org/p/oss-fuzz/issues/list). We provide
a crash report page that gives you the stack trace, a link to the crashing
testcase, and regression ranges where the bug was most likely introduced.

## Fuzzer stats
You can view statistics about your fuzz targets (e.g. speed, coverage
information, memory usage) on our fuzzer statistics dashboard.


## Coverage reports
We provide coverage reports, where we highlight the parts of source code that
are being reached by your fuzz target. Make sure to look at the uncovered code
marked in red and add appropriate fuzz targets to cover those use cases.


## Performance analyzer
You can view performance issues that your fuzz target is running into (e.g.
leaks, timeouts, etc) by clicking on `Performance` link on our fuzzer statistics
dashboard. Make sure to fix all cited issues, so as to keep your fuzz target
running efficiently and finding new bugs.

## Crash stats
You can view statistics of crashes over time on our crash statistics dashboard.

================================================
FILE: docs/further-reading/further_reading.md
================================================
---
layout: default
title: Further reading
has_children: true
nav_order: 4
permalink: /further-reading/
---
# Further reading
================================================
FILE: docs/further-reading/fuzzer_environment.md
================================================
---
layout: default
title: Fuzzer environment
parent: Further reading
nav_order: 2
permalink: /further-reading/fuzzer-environment/
---
# Fuzzer environment on ClusterFuzz
Your fuzz targets will be run on a
[Google Compute Engine](https://cloud.google.com/compute/) VM (Linux).
- TOC
{:toc}
---
## Runtime Dependencies
You should not make any assumptions about the availability of dependent packages
in the execution environment. Packages that are installed via
[Dockerfile]({{ site.baseurl }}/getting-started/new-project-guide/#dockerfile)
or built as part of
[build.sh]({{ site.baseurl }}/getting-started/new-project-guide/#buildsh)
are not available on the bot runtime environment (where the fuzz targets run).
If you need these dependencies in the runtime environment, you can either:
- (recommended) Build the dependencies statically in
[build.sh]({{ site.baseurl }}/getting-started/new-project-guide/#buildsh)
([example](https://github.com/google/oss-fuzz/blob/64f8b6593da141b97c98c7bc6f07df92c42ee010/projects/ffmpeg/build.sh#L26)).
Their source code should be inside the `$SRC` directory so that coverage can find it.
- Or install the packages via Dockerfile
([example](https://github.com/google/oss-fuzz/blob/2d5e2ef84f281e6ab789055aa735606d3122fda9/projects/tor/Dockerfile#L19))
and then link statically against them
([example](https://github.com/google/oss-fuzz/blob/2d5e2ef84f281e6ab789055aa735606d3122fda9/projects/tor/build.sh#L40)).
**Dependencies built in this way will not be instrumented** and may prevent
the fuzzer from finding bugs if they are involved in the execution of a fuzz target.
All build artifacts needed during fuzz target execution should be inside the
`$OUT` directory. Only those artifacts are archived and used on the bots.
Everything else is ignored (e.g. artifacts in `$WORK`, `$SRC`, etc) and hence
is not available in the execution environment.
We strongly recommend static linking because it just works.
However, dynamic linking can work if shared objects are included in the `$OUT` directory and are loaded relative
to `'$ORIGIN'`, the directory containing the binary (see the discussion of `'$ORIGIN'` [here](http://man7.org/linux/man-pages/man8/ld.so.8.html)).
A fuzzer can be instructed to load libraries relative to `'$ORIGIN'` during compilation (e.g. `-Wl,-rpath,'$ORIGIN/lib'`)
or afterwards using `chrpath -r '$ORIGIN/lib' $OUT/$fuzzerName` ([example](https://github.com/google/oss-fuzz/blob/09aa9ac556f97bd4e31928747eca0c8fed42509f/projects/php/build.sh#L40)). Note that `'$ORIGIN'` should be surrounded
by single quotes because it is not an environment variable like `$OUT` that can be retrieved during execution of `build.sh`.
Its value is retrieved during execution of the binary. You can verify that you did this correctly using `ldd <fuzz_target_name>` and the `check_build` command in `infra/helper.py`.
You should ensure that the fuzz target works correctly by using the `run_fuzzer`
command (see instructions
[here]({{ site.baseurl }}/getting-started/new-project-guide/#testing-locally)).
This command uses a clean base-runner docker container and not the base-builder
docker container created during build-time.
## argv[0]
You must not modify `argv[0]`. It is required for certain things to work
correctly.
## Current working directory
You should not make any assumptions about the current working directory of your
fuzz target. If you need to load data files, please use `argv[0]` to get the
directory where your fuzz target executable is located.
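
As an illustration, here is a minimal sketch of resolving a data file relative to the fuzz target's own location, assuming a fuzz target written in Python. The function name `data_path` and the file name `dict.txt` are hypothetical, and how `sys.argv[0]` is populated may depend on how the target is packaged. The same idea applies to C/C++ targets using `argv[0]`.

```python
import os
import sys


def data_path(relative_name, argv0=None):
    """Resolve a data file relative to the directory of the fuzz target executable.

    argv0 is only read, never modified. It defaults to sys.argv[0].
    """
    argv0 = sys.argv[0] if argv0 is None else argv0
    target_dir = os.path.dirname(os.path.abspath(argv0))
    return os.path.join(target_dir, relative_name)


if __name__ == "__main__":
    # Hypothetical data file shipped next to the fuzz target.
    print(data_path("dict.txt"))
```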
## File system
Everything except `/tmp` is read-only, including the directory that your fuzz
target executable lives in.
`/dev` is also unavailable.
## Hardware
Your project should not be compiled with `-march=native` or `-mtune=native`
flags, as the build infrastructure and fuzzing machines may have different CPUs
as well as other hardware differences. You may however use `-mtune=generic`.
================================================
FILE: docs/getting-started/accepting_new_projects.md
================================================
---
layout: default
title: Accepting new projects
parent: Getting started
nav_order: 1
permalink: /getting-started/accepting-new-projects/
---
# Accepting New Projects
To be accepted to OSS-Fuzz, an open-source project must
have a significant user base and/or be critical to the global IT infrastructure.
To submit a new project, do the following:
1. [Create a pull request](https://help.github.com/articles/creating-a-pull-request/)
   with a new `projects/<project_name>/project.yaml` file
   ([example](https://github.com/google/oss-fuzz/tree/master/projects/libarchive/project.yaml)).
   **Note:** `project_name` can only contain alphanumeric characters,
   underscores (_) or dashes (-).
2. In the file, provide the following information:
   * Your project's homepage. ([`homepage`]({{ site.baseurl }}/getting-started/new-project-guide/#homepage))
   * Your project's main repository URL. ([`main_repo`]({{ site.baseurl }}/getting-started/new-project-guide/#main_repo))
   * Your project's primary language. ([`language`]({{ site.baseurl }}/getting-started/new-project-guide/#language))
   * An email address for the engineering contact to be CCed on new issues ([`primary_contact`]({{ site.baseurl }}/getting-started/new-project-guide/#primary)), satisfying the following:
     * The address belongs to an established project committer (according to VCS logs).
       If the address isn't yours, or if the address differs from VCS, we'll require an informal
       email verification.
     * The address is associated with a Google account
       ([why?]({{ site.baseurl }}/faq/#why-do-you-require-a-google-account-for-authentication)).
       If you use an alternate email address
       [linked to a Google Account](https://support.google.com/accounts/answer/176347?hl=en),
       you'll only get access to [filed bugs in the issue tracker](https://bugs.chromium.org/p/oss-fuzz/issues/list), not to the [ClusterFuzz]({{ site.baseurl }}/further-reading/clusterfuzz)
       dashboard. This is due to App Engine API limitations.
3. Once your project is accepted, configure it by following the
[New Project Guide]({{ site.baseurl }}/getting-started/new-project-guide/).
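
Before opening the pull request, you may want to check the project name locally. The following is a minimal sketch of the naming rule from step 1, assuming "alphanumeric" means ASCII letters and digits. The function name `is_valid_project_name` is hypothetical, and OSS-Fuzz's own checks may be stricter.

```python
import re

# Assumption: only ASCII letters, digits, underscores and dashes are allowed.
_PROJECT_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_project_name(name):
    """Return True if the whole name consists only of allowed characters."""
    return _PROJECT_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("libarchive", "my_project-2", "my project"):
        print(candidate, is_valid_project_name(candidate))
```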
================================================
FILE: docs/getting-started/bug_disclosure_guidelines.md
================================================
---
layout: default
title: Bug disclosure guidelines
parent: Getting started
nav_order: 4
permalink: /getting-started/bug-disclosure-guidelines/
---
## Bug Disclosure Guidelines
Following [Google's standard disclosure policy](https://googleprojectzero.blogspot.com/2015/02/feedback-and-data-driven-updates-to.html),
OSS-Fuzz will adhere to the following disclosure principles:
- **Deadline**. After notifying project authors, we will open reported
issues to the public in 90 days, or after the fix is released (whichever
comes earlier).
- **Weekends and holidays**. If a deadline is due to expire on a weekend,
the deadline will be moved to the next normal work day.
- **Grace period**. We have a 14-day grace period. If a 90-day deadline
expires but the upstream engineers let us know before the deadline that a
patch is scheduled for release on a specific day within 14 days following
the deadline, the public disclosure will be delayed until the availability
of the patch.
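
To make the interaction between these three rules concrete, here is a rough sketch of the date arithmetic. It is an illustration only, not the logic OSS-Fuzz actually runs: the function names are hypothetical, public holidays are ignored, and the 14-day grace period is assumed to be counted from the weekend-adjusted deadline.

```python
from datetime import date, timedelta

DEADLINE_DAYS = 90
GRACE_DAYS = 14


def next_work_day(day):
    """Move a Saturday or Sunday to the following Monday (holidays are ignored)."""
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def disclosure_date(notified, fix_released=None, scheduled_patch=None):
    """Estimate when an issue would be opened to the public."""
    deadline = next_work_day(notified + timedelta(days=DEADLINE_DAYS))
    # A fix released before the deadline opens the issue earlier.
    if fix_released is not None and fix_released <= deadline:
        return fix_released
    # A patch announced for a day within the grace period delays disclosure.
    grace_end = deadline + timedelta(days=GRACE_DAYS)
    if scheduled_patch is not None and deadline < scheduled_patch <= grace_end:
        return scheduled_patch
    return deadline


if __name__ == "__main__":
    print(disclosure_date(date(2024, 1, 1)))
```

For a notification on Monday 2024-01-01, the raw 90-day deadline falls on Sunday 2024-03-31, so the sketch moves it to Monday 2024-04-01.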
================================================
FILE: docs/getting-started/continuous_integration.md
================================================
---
layout: default
title: Continuous Integration
parent: Getting started
nav_order: 5
permalink: /getting-started/continuous-integration/
---
# Continuous Integration
OSS-Fuzz offers **CIFuzz**, a GitHub action/CI job that runs your fuzz targets
on pull requests. This works similarly to running unit tests in CI. CIFuzz helps
you find and fix bugs before they make it into your codebase.
Currently, CIFuzz primarily supports projects hosted on GitHub.
Non-OSS-Fuzz users can use CIFuzz with additional features through
[ClusterFuzzLite](https://google.github.io/clusterfuzzlite/).
## How it works
CIFuzz builds your project's fuzzers from the source at a particular
pull request or commit. Then CIFuzz runs the fuzzers for a short amount of time.
If CIFuzz finds a crash, CIFuzz reports the stacktrace, makes the crashing
input available for download and the CI test fails (red X).
If CIFuzz doesn't find a crash during the allotted time, the CI test passes
(green check). If CIFuzz finds a crash, it reports the crash only if both of
the following are true:
* The crash is reproducible (on the PR/commit build).
* The crash does not occur on older OSS-Fuzz builds. (If the crash does occur
on older builds, then it was not introduced by the PR/commit
being tested.)
If your project supports [OSS-Fuzz's code coverage]({{ site.baseurl }}/advanced-topics/code-coverage),
CIFuzz only runs the fuzzers affected by a pull request/commit.
Otherwise it will divide up the allotted fuzzing time (10 minutes by default)
among all fuzzers in the project.
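
To get a feel for how much time each fuzzer receives in the second case, here is a small sketch of the arithmetic. It assumes an equal split, which is a simplification; the exact scheduling CIFuzz uses may differ, and the function and fuzzer names are hypothetical.

```python
def seconds_per_fuzzer(total_seconds, fuzzer_names):
    """Split the allotted fuzzing time equally among fuzzers (assumed policy)."""
    if not fuzzer_names:
        return {}
    share = total_seconds // len(fuzzer_names)
    return {name: share for name in fuzzer_names}


if __name__ == "__main__":
    # With the default 600 seconds and four fuzzers, each one gets 150 seconds.
    print(seconds_per_fuzzer(600, ["parse", "decode", "encode", "roundtrip"]))
```

This is why `fuzz-seconds` should grow with the number of fuzz targets: with many targets, the default budget leaves each one very little time.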
CIFuzz uses 30 day old/public regressions and corpora from OSS-Fuzz. This makes
fuzzing more effective and gives you regression testing for free.
## Requirements
1. Your project must be integrated with OSS-Fuzz.
1. Your project must be hosted on GitHub.
1. Your repository must be cloned with `git` in the OSS-Fuzz Dockerfile (do not use `go get` or other methods).
## Integrating into your repository
You can integrate CIFuzz into your project using the following steps:
1. Create a `.github` directory in the root of your project.
1. Create a `workflows` directory inside of your `.github` directory.
1. Copy the example [`cifuzz.yml`](https://github.com/google/oss-fuzz/blob/master/infra/cifuzz/example_cifuzz.yml)
file over from the OSS-Fuzz repository to the `workflows` directory.
1. Change the `oss-fuzz-project-name` value in `cifuzz.yml` from `example` to the name of your OSS-Fuzz project. It is **very important** that you use your OSS-Fuzz project name, which is case sensitive. This name
is the name of your project's subdirectory in the [`projects`](https://github.com/google/oss-fuzz/tree/master/projects) directory of OSS-Fuzz.
1. Set the value of `fuzz-seconds`. Use the longest time that the project maintainers find acceptable. This value should be at least 600 seconds and should scale with project size.
Your directory structure should look like the following:
```
project
|___ .github
|    |____ workflows
|          |____ cifuzz.yml
|___ other-files
```
cifuzz.yml for an example project:
```yaml
name: CIFuzz
on: [pull_request]
permissions: {}
jobs:
  Fuzzing:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
    - name: Build Fuzzers
      id: build
      uses: google/oss-fuzz/infra/cifuzz/actions/build_fuzzers@master
      with:
        oss-fuzz-project-name: 'example'
        language: c++
    - name: Run Fuzzers
      uses: google/oss-fuzz/infra/cifuzz/actions/run_fuzzers@master
      with:
        oss-fuzz-project-name: 'example'
        language: c++
        fuzz-seconds: 600
        output-sarif: true
    - name: Upload Crash
      uses: actions/upload-artifact@v4
      if: failure() && steps.build.outcome == 'success'
      with:
        name: artifacts
        path: ./out/artifacts
    - name: Upload Sarif
      if: always() && steps.build.outcome == 'success'
      uses: github/codeql-action/upload-sarif@v2
      with:
        # Path to SARIF file relative to the root of the repository
        sarif_file: cifuzz-sarif/results.sarif
        checkout_path: cifuzz-sarif
```
### Optional configuration
#### Configurable Variables
`language`: (optional) The language your target program is written in. Defaults
to `c++`. This should be the same as the value you set in `project.yaml`. See
[this explanation]({{ site.baseurl }}/getting-started/new-project-guide/#language)
for more details.
`fuzz-seconds`: Determines how long CIFuzz spends fuzzing your project in seconds.
The default is 600 seconds. The GitHub Actions max run time is 21600 seconds (6
hours). This variable is only meaningful when supplied to the `run_fuzzers`
action, not the `build_fuzzers` action.
`dry-run`: Determines if CIFuzz surfaces errors. The default value is `false`. When set to `true`,
CIFuzz will never report a failure even if it finds a crash in your project.
This requires the user to manually check the logs for detected bugs. If dry run mode is desired,
make sure to set the `dry-run` parameter in both the `Build Fuzzers` and `Run Fuzzers` action steps.
`allowed-broken-targets-percentage`: Can be set if you want to set a stricter
limit for broken fuzz targets than OSS-Fuzz's check_build. Most users should
not set this. This value is only meaningful when supplied to the `run_fuzzers`
action, not the `build_fuzzers` action.
`report-timeouts`: Determines whether to report failures due to timeouts.
`report-ooms`: Determines whether to report failures due to OOMs.
`sanitizer`: Determines the sanitizer to build and run fuzz targets with. The choices are `'address'`,
`'memory'` and `'undefined'`. The default is `'address'`. Note that the `sanitizer` fields of the `Build Fuzzers`
and `Run Fuzzers` steps need to be the same. To specify a list of sanitizers,
a [matrix](https://help.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idstrategymatrix)
can be used. To use a sanitizer, add it to the list of sanitizers in the matrix field below:
```yaml
{% raw %}
name: CIFuzz
on: [pull_request]
permissions: {}
jobs:
  Fuzzing:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    strategy:
      fail-fast: false
      matrix:
        sanitizer: [address, undefined, memory]
    steps:
    - name: Build Fuzzers (${{ matrix.sanitizer }})
      id: build
      uses: google/oss-fuzz/infra/cifuzz/actions/build_fuzzers@master
      with:
        oss-fuzz-project-name: 'example'
        language: c++
        sanitizer: ${{ matrix.sanitizer }}
    - name: Run Fuzzers (${{ matrix.sanitizer }})
      uses: google/oss-fuzz/infra/cifuzz/actions/run_fuzzers@master
      with:
        oss-fuzz-project-name: 'example'
        language: c++
        fuzz-seconds: 600
        sanitizer: ${{ matrix.sanitizer }}
        output-sarif: true
    - name: Upload Crash
      uses: actions/upload-artifact@v4
      # Upload only when a later step failed after a successful build,
      # matching the other examples on this page.
      if: failure() && steps.build.outcome == 'success'
      with:
        name: ${{ matrix.sanitizer }}-artifacts
        path: ./out/artifacts
    - name: Upload Sarif
      if: always() && steps.build.outcome == 'success'
      uses: github/codeql-action/upload-sarif@v2
      with:
        # Path to SARIF file relative to the root of the repository
        sarif_file: cifuzz-sarif/results.sarif
        checkout_path: cifuzz-sarif
{% endraw %}
```
#### Branches and paths
You can make CIFuzz trigger only on certain branches or paths by following the
instructions [here](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions).
For example, the following code can be used to trigger CIFuzz only on changes to
C/C++ code residing on master and release branches:
```yaml
name: CIFuzz
on:
  pull_request:
    branches:
      - master
      - 'releases/**'
    paths:
      - '**.c'
      - '**.cc'
      - '**.cpp'
      - '**.cxx'
      - '**.h'
permissions: {}
jobs:
  Fuzzing:
    runs-on: ubuntu-latest
    steps:
    - name: Build Fuzzers
      id: build
      uses: google/oss-fuzz/infra/cifuzz/actions/build_fuzzers@master
      with:
        oss-fuzz-project-name: 'example'
        language: c++
    - name: Run Fuzzers
      uses: google/oss-fuzz/infra/cifuzz/actions/run_fuzzers@master
      with:
        oss-fuzz-project-name: 'example'
        language: c++
        fuzz-seconds: 600
    - name: Upload Crash
      uses: actions/upload-artifact@v4
      if: failure() && steps.build.outcome == 'success'
      with:
        name: artifacts
        path: ./out/artifacts
```
You can also check out the CIFuzz configs of existing OSS-Fuzz projects, for example
[systemd](https://github.com/systemd/systemd/blob/main/.github/workflows/cifuzz.yml) and
[curl](https://github.com/curl/curl/blob/master/.github/workflows/fuzz.yml).
## Ubuntu 24.04 Support
CIFuzz supports building and running fuzzers in an Ubuntu 24.04 environment.
Existing projects will continue to use the legacy environment (Ubuntu 20.04) by default,
preserving current behavior.
To migrate your project to Ubuntu 24.04, add the following line to your `project.yaml`:
```yaml
base_os_version: ubuntu-24-04
```
For OSS-Fuzz projects, this file is located at `projects//project.yaml`.
For external projects (ClusterFuzzLite), this file is typically located at `.clusterfuzzlite/project.yaml`.
## Understanding results
The results of CIFuzz can be found in two different places.
* Run fuzzers log:
1. This log can be accessed in the `actions` tab of a CIFuzz integrated repo.
1. Click on the `CIFuzz` button in the workflow selector on the left-hand side.
1. Click on the event triggered by your desired pull request.
1. Click the `Fuzzing` workflow.
1. Select the `Run Fuzzers` drop-down. It should show the timestamps and results
from each of the fuzz targets.

* Artifacts:
When the fuzzer crashes, the input file that causes the crash is uploaded as an
artifact.
To download the artifact, follow these steps:
1. Click on the summary from the run, as illustrated in the screenshot below:
![github-actions-summary]
2. Click on the artifact you wish to download from the summary page, as
illustrated in the screenshot below:
![github-actions-download-crash]
[github-actions-summary]: https://raw.githubusercontent.com/google/clusterfuzzlite/refs/heads/bucket/images/github-actions-summary.png
[github-actions-download-crash]: https://raw.githubusercontent.com/google/clusterfuzzlite/refs/heads/bucket/images/github-actions-download-crash.png
## Feedback/Questions/Issues
Create an issue in [OSS-Fuzz](https://github.com/google/oss-fuzz/issues/new) if you have questions or any other feedback on CIFuzz.
================================================
FILE: docs/getting-started/getting_started.md
================================================
---
layout: default
title: Getting started
has_children: true
nav_order: 2
permalink: /getting-started/
---
# Getting started
These pages walk you through the process of integrating your open source project
with OSS-Fuzz.
================================================
FILE: docs/getting-started/integration_rewards.md
================================================
---
layout: default
title: Integration rewards
parent: Getting started
nav_order: 3
permalink: /getting-started/integration-rewards/
---
# Integration rewards
We encourage you to apply for integration rewards (up to **$30,000**) once your project
is successfully integrated with OSS-Fuzz. Please see the full details
[here](https://bughunters.google.com/about/rules/5097259337383936/oss-fuzz-reward-program-rules).
To submit your application for a reward, please fill out [this form](https://goo.gle/oss-fuzz-submission).
================================================
FILE: docs/getting-started/new-project-guide/bazel.md
================================================
---
layout: default
title: Integrating a Bazel project
parent: Setting up a new project
grand_parent: Getting started
nav_order: 5
permalink: /getting-started/new-project-guide/bazel/
---
# Integrating a Bazel project
{: .no_toc}
- TOC
{:toc}
---
## Bazel projects
The process of integrating a project using the [Bazel](https://bazel.build/)
build system with OSS-Fuzz is very similar to the general
[Setting up a new project]({{ site.baseurl }}/getting-started/new-project-guide/)
process. The key specifics of integrating a Bazel project are outlined below.
## Fuzzing support in Bazel
For Bazel-based projects, we recommend using the
[`rules_fuzzing`](https://github.com/bazelbuild/rules_fuzzing) extension library
for defining fuzz tests. `rules_fuzzing` provides support for building and running
fuzz tests under
[multiple sanitizer and fuzzing engine configurations][rules-fuzzing-usage].
It also supports specifying corpora and dictionaries as part of the fuzz test
definition.
The fuzzing rules provide out-of-the-box support for building and packaging fuzz
test artifacts in the OSS-Fuzz format. Each `//path/to:fuzz_test` fuzz test
target automatically has a `//path/to:fuzz_test_oss_fuzz` packaging target that
(a) builds the fuzz test using the instrumentation and engine library specified
in the OSS-Fuzz environment variables, and (b) generates an archive containing
the binary and its associated artifacts (corpus, dictionary, etc.). Moreover,
OSS-Fuzz provides a standard tool to automatically process these targets,
substantially simplifying the `build.sh` script (see below).
[rules-fuzzing-usage]: https://github.com/bazelbuild/rules_fuzzing#using-the-rules-in-your-project
## Project files
This section explains how to integrate the fuzz tests written using the
`rules_fuzzing` library with OSS-Fuzz. You can also see a complete example in the
[`bazel-rules-fuzzing-test`](https://github.com/google/oss-fuzz/tree/master/projects/bazel-rules-fuzzing-test)
project.
The structure of the project directory in the OSS-Fuzz repository does not
differ for Bazel-based projects. The project files have the following specific
aspects.
### project.yaml
Only C++ projects are currently supported.
Since the OSS-Fuzz target builds the fuzz test using the instrumentation and
engine specified in the OSS-Fuzz environment variables, all the engine and
sanitizer configurations supported in the `project.yaml` file are automatically
supported by the fuzzing rules.
### Dockerfile
There is no need to install Bazel in your Docker image. The OSS-Fuzz builder
image provides the `bazel` executable through the
[Bazelisk](https://github.com/bazelbuild/bazelisk) launcher, which will fetch
and use the latest Bazel release. If your project requires a particular Bazel
version, create a
[`.bazelversion`](https://docs.bazel.build/versions/master/updating-bazel.html)
file in your repository root with the desired version string.
### build.sh
Your `build.sh` script essentially needs to perform three steps: (1) selecting
which fuzz tests to build, (2) building their OSS-Fuzz package targets in the
right configuration, and (3) copying the build artifacts to the `${OUT}/`
destination.
OSS-Fuzz provides a
[`bazel_build_fuzz_tests`](https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-builder/bazel_build_fuzz_tests)
tool that implements these steps in a standard way, so in most cases your
build script only needs to invoke this command with no arguments.
If necessary, the behavior of the tool can be customized through a set of
environment variables. The most common are:
* `BAZEL_EXTRA_BUILD_FLAGS` are extra build flags passed on the Bazel command
line.
* `BAZEL_FUZZ_TEST_TAG` and `BAZEL_FUZZ_TEST_EXCLUDE_TAG` can be overridden to
specify which target tags to use when determining what fuzz tests to include.
By default, the tool selects all the fuzz tests except for those tagged as
`"no-oss-fuzz"`.
* `BAZEL_FUZZ_TEST_QUERY` overrides the Bazel query the tool uses to identify
the fuzz tests to build, if the tag-based approach is not sufficient.
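Putting these pieces together, a `build.sh` for a Bazel project can stay very short. The following is a minimal sketch rather than a script taken from a real project: the `--keep_going` flag is only an illustrative value, and the `command -v` guard is an addition so that the sketch degrades gracefully when it is run outside the OSS-Fuzz builder image, where `bazel_build_fuzz_tests` is not installed.

```shell
#!/bin/bash -eu
# Sketch of a build.sh for a Bazel project (values are illustrative).

# Extra flags passed on the Bazel command line (example value only).
export BAZEL_EXTRA_BUILD_FLAGS="--keep_going"

# Tag used to exclude fuzz tests. "no-oss-fuzz" is the exclusion described
# above as the default, so this line only makes that choice explicit.
export BAZEL_FUZZ_TEST_EXCLUDE_TAG="no-oss-fuzz"

# The tool selects the fuzz tests, builds their OSS-Fuzz package targets and
# copies the resulting artifacts to ${OUT}/.
if command -v bazel_build_fuzz_tests >/dev/null 2>&1; then
  bazel_build_fuzz_tests
else
  echo "bazel_build_fuzz_tests not found; run this inside the OSS-Fuzz builder image" >&2
fi
```

If the defaults are sufficient, both `export` lines can be dropped and the script reduces to the single `bazel_build_fuzz_tests` call.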
================================================
FILE: docs/getting-started/new-project-guide/go_lang.md
================================================
---
layout: default
title: Integrating a Go project
parent: Setting up a new project
grand_parent: Getting started
nav_order: 1
permalink: /getting-started/new-project-guide/go-lang/
---
# Integrating a Go project
{: .no_toc}
- TOC
{:toc}
---
The process of integrating a project written in Go with OSS-Fuzz is very similar
to the general
[Setting up a new project]({{ site.baseurl }}/getting-started/new-project-guide/)
process. The key specifics of integrating a Go project are outlined below.
## Go-fuzz support
OSS-Fuzz supports **go-fuzz** in the
[libFuzzer compatible mode](https://github.com/mdempsky/go114-fuzz-build)
only. In that mode, fuzz targets for Go use the libFuzzer engine with native Go
coverage instrumentation. Binaries compiled in this mode provide the same
libFuzzer command line interface as non-Go fuzz targets.
## Native Go Fuzzing support
OSS-Fuzz supports [fuzzers written for the native Go 1.18 engine](https://go.dev/doc/fuzz/). These fuzzers are built as libFuzzer binaries in a similar fashion to fuzzers written for the go-fuzz engine. Because of that, dictionaries and seed corpora should be handled in accordance with [the OSS-Fuzz documentation](https://google.github.io/oss-fuzz/getting-started/new-project-guide/#seed-corpus).
Unlike libFuzzer/go-fuzz targets, which must accept one data buffer, fuzz targets written for the Native Go engine can accept any number of arguments of any type. Here is an example of a valid fuzzer with multiple arguments:
```go
package demofuzzing
import (
"fmt"
"testing"
)
func FuzzDemo(f *testing.F) {
f.Fuzz(func(t *testing.T, data1 string, data2 uint32, data3 float64) {
fmt.Println(data1)
fmt.Println(data2)
fmt.Println(data3)
})
}
```
Some requirements for native Go 1.18 fuzzers are:
* The only `testing.F` method supported is currently `F.Fuzz()`.
* `F.Add()` will not add seeds when fuzzing. To provide OSS-Fuzz with a seed corpus, follow the documentation [here](https://google.github.io/oss-fuzz/getting-started/new-project-guide/#seed-corpus).
## Project files
First, you need to write a Go fuzz target. This fuzz target should reside in your project
repository
([example](https://github.com/golang/go/blob/4ad13555184eb0697c2e92c64c1b0bdb287ccc10/src/html/fuzz.go#L13)).
The structure of the project directory in OSS-Fuzz repository doesn't differ for
projects written in Go. The project files have the following Go specific
aspects.
### project.yaml
The `language` attribute must be specified.
```yaml
language: go
```
The only supported fuzzing engine and sanitizer are `libfuzzer` and `address`,
respectively.
[Example](https://github.com/google/oss-fuzz/blob/356f2b947670b7eb33a1f535c71bc5c87a60b0d1/projects/syzkaller/project.yaml#L7):
```yaml
fuzzing_engines:
- libfuzzer
sanitizers:
- address
```
### Dockerfile
The Dockerfile should start with `FROM gcr.io/oss-fuzz-base/base-builder-go`.
The OSS-Fuzz builder image has the latest stable release of Golang installed. In
order to install dependencies of your project, add a `RUN git clone ...` command to
your Dockerfile.
[Example](https://github.com/google/oss-fuzz/blob/356f2b947670b7eb33a1f535c71bc5c87a60b0d1/projects/syzkaller/Dockerfile#L23):
```dockerfile
# Dependency for one of the fuzz targets.
RUN git clone --depth 1 https://github.com/ianlancetaylor/demangle
```
go-fuzz will then automatically download the dependencies based on the `go.mod` file.
### build.sh
In order to build a Go fuzz target, you need to call the `go-fuzz`
command first, and then link the resulting `.a` file against
`$LIB_FUZZING_ENGINE` using the `$CXX $CXXFLAGS ...` command.
For go-fuzz fuzzers, the best way to do this is by using the [`compile_go_fuzzer` script](https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-builder/compile_go_fuzzer), and for native Go 1.18 fuzzers it is recommended to use the [`compile_native_go_fuzzer` script](https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-builder/compile_native_go_fuzzer). Both of these also support coverage builds.
`compile_native_go_fuzzer` requires two dependencies which can be installed with:
```bash
go install github.com/AdamKorcz/go-118-fuzz-build@latest
go get github.com/AdamKorcz/go-118-fuzz-build/testing
```
A usage example from the go-dns project is:
```sh
compile_go_fuzzer github.com/miekg/dns FuzzNewRR fuzz_newrr fuzz
```
The arguments are:
* path of the package with the fuzz target
* name of the fuzz function
* name of the fuzzer to be built
* optional tag to be used by `go build` and such
================================================
FILE: docs/getting-started/new-project-guide/javascript_lang.md
================================================
---
layout: default
title: Integrating a JavaScript project
parent: Setting up a new project
grand_parent: Getting started
nav_order: 4
permalink: /getting-started/new-project-guide/javascript-lang/
---
# Integrating a JavaScript project
{: .no_toc}
- TOC
{:toc}
---
The process of integrating a project written in JavaScript for Node.js
with OSS-Fuzz is very similar to the general
[Setting up a new project]({{ site.baseurl }}/getting-started/new-project-guide/)
process. The key specifics of integrating a JavaScript project are outlined below.
## Jazzer.js
JavaScript fuzzing in OSS-Fuzz is powered by
[Jazzer.js](https://github.com/CodeIntelligenceTesting/jazzer.js), which is
installed during the build step. As Jazzer.js operates directly on the JavaScript
source code level, it can be applied to any project written in a language that
can be transpiled into JavaScript, such as TypeScript. More information on what Jazzer.js
fuzz targets look like can be found in its
[README's Usage section](https://github.com/CodeIntelligenceTesting/jazzer.js#usage).
## Project files
### Example project
We recommend viewing
[javascript-example](https://github.com/google/oss-fuzz/tree/master/projects/javascript-example)
as an example of a simple JavaScript fuzzing project. We also recommend having a look at
[typescript-example](https://github.com/google/oss-fuzz/tree/master/projects/typescript-example)
as an example of how to fuzz TypeScript projects. This example also demonstrates how to use
the Jazzer.js `FuzzedDataProvider`.
### project.yaml
The `language` attribute must be specified as follows:
```yaml
language: javascript
```
The only supported fuzzing engine is libFuzzer (`libfuzzer`). So far, native sanitizers such as
AddressSanitizer (`address`) and UndefinedBehaviorSanitizer (`undefined`) are not supported.
They would only be needed for projects that have native addons, which is a rather infrequent
use case for JavaScript projects. If you have a project where you need ASan or UBSan, please
open an issue in the [Jazzer.js GitHub repo](https://github.com/CodeIntelligenceTesting/jazzer.js). None (`none`) is the default sanitizer for
JavaScript projects, so setting it in `project.yaml` is optional.
```yaml
fuzzing_engines:
- libfuzzer
sanitizers:
- none
```
### Dockerfile
The Dockerfile should start with `FROM gcr.io/oss-fuzz-base/base-builder-javascript`.
The OSS-Fuzz base Docker images already come with Node.js 19 and `npm` pre-installed.
Apart from that, you should usually not need to do more than to clone the
project, set a `WORKDIR`, and copy any necessary files, or install any
project-specific dependencies here as you normally would.
### Fuzzers
In the simplest case, every fuzzer consists of a single JavaScript file that exports
a function named `fuzz` taking a single argument of type [Buffer](https://nodejs.org/api/buffer.html).
An example fuzz target could thus be a file `fuzz_string_compare.js` with contents:
```javascript
/**
* @param { Buffer } data
*/
module.exports.fuzz = function (data) {
const s = data.toString();
if (s.length !== 16) {
return;
}
if (
s.slice(0, 8) === "Awesome " &&
s.slice(8, 15) === "Fuzzing" &&
s[15] === "!"
) {
throw Error("Welcome to Awesome Fuzzing!");
}
};
```
### build.sh
The OSS-Fuzz base docker image for JavaScript comes with the [`compile_javascript_fuzzer` script](https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-builder/compile_javascript_fuzzer) preinstalled. In `build.sh`, you should install dependencies for your project, and if necessary compile the code into JavaScript. Then, you can use the script to build the fuzzers. The script ensures that [@jazzer.js/core](https://www.npmjs.com/package/@jazzer.js/core) is installed so that its CLI can be used to execute your fuzz tests. It also generates a wrapper script that can be used as a drop-in replacement for libFuzzer. This means that the generated script accepts the same command line flags as libFuzzer. Under the hood, these flags are simply forwarded to the libFuzzer native addon used by Jazzer.js.
A usage example from the javascript-example project is:
```shell
compile_javascript_fuzzer example fuzz_string_compare.js --sync
```
Arguments are:
* relative path of the project in the $SRC directory
* relative path to the fuzz test inside the project
* remaining arguments are forwarded to the [Jazzer.js CLI](https://github.com/CodeIntelligenceTesting/jazzer.js/blob/main/docs/fuzz-targets.md#running-the-fuzz-target)
The [javascript-example](https://github.com/google/oss-fuzz/blob/master/projects/javascript-example/build.sh)
project contains an example of a `build.sh` for JavaScript projects.
## FuzzedDataProvider
Jazzer.js provides a `FuzzedDataProvider` that can simplify the task of creating a
fuzz target by translating the raw input bytes received from the fuzzer into
useful primitive JavaScript types. Its functionality is similar to
`FuzzedDataProviders` available in other languages, such as
[Java](https://codeintelligencetesting.github.io/jazzer-docs/jazzer-api/com/code_intelligence/jazzer/api/FuzzedDataProvider.html) and
[C++](https://github.com/google/fuzzing/blob/master/docs/split-inputs.md).
A fuzz target using the `FuzzedDataProvider` would look as follows:
```javascript
const { FuzzedDataProvider } = require("@jazzer.js/core");
/**
* @param { Buffer } fuzzerInputData
*/
module.exports.fuzz = function (fuzzerInputData) {
const data = new FuzzedDataProvider(fuzzerInputData);
const i = data.consumeIntegral(4);
const s = data.consumeRemainingAsString();
exploreMe(i, s);
};
```
================================================
FILE: docs/getting-started/new-project-guide/jvm_lang.md
================================================
---
layout: default
title: Integrating a Java/JVM project
parent: Setting up a new project
grand_parent: Getting started
nav_order: 4
permalink: /getting-started/new-project-guide/jvm-lang/
---
# Integrating a Java/JVM project
{: .no_toc}
- TOC
{:toc}
---
The process of integrating a project written in Java or any other language
running on the Java Virtual Machine (JVM) with OSS-Fuzz is very similar to the
general
[Setting up a new project]({{ site.baseurl }}/getting-started/new-project-guide/)
process. The key specifics of integrating a JVM project are outlined below.
## Jazzer
Java fuzzing in OSS-Fuzz depends on
[Jazzer](https://github.com/CodeIntelligenceTesting/jazzer), which is
pre-installed on the OSS-Fuzz base docker images. As Jazzer operates directly
on the bytecode level, it can be applied to any project written in a JVM-based
language. More information on what Jazzer fuzz targets look like can be found in
its
[README's Usage section](https://github.com/CodeIntelligenceTesting/jazzer#usage).
## Project files
### Example project
We recommend viewing
[json-sanitizer](https://github.com/google/oss-fuzz/tree/master/projects/json-sanitizer)
as an example of a simple Java-only fuzzing project. Additional examples,
including one for a Java project with native dependencies, are part of the
[java-example](https://github.com/google/oss-fuzz/tree/master/projects/java-example)
project.
### project.yaml
The `language` attribute must be specified as follows:
```yaml
language: jvm
```
The only supported fuzzing engine is libFuzzer (`libfuzzer`). So far the only
supported sanitizers are AddressSanitizer (`address`) and
UndefinedBehaviorSanitizer (`undefined`). For pure Java projects, specify
just `address`:
```yaml
fuzzing_engines:
- libfuzzer
sanitizers:
- address
```
### Dockerfile
The Dockerfile should start with `FROM gcr.io/oss-fuzz-base/base-builder-jvm`.
The OSS-Fuzz base Docker images already come with OpenJDK 15 pre-installed. If
you need Maven to build your project, you can install it by adding the following
line to your Dockerfile:
```docker
RUN apt-get update && apt-get install -y maven
```
Apart from this, you should usually not need to do more than to clone the
project, set a `WORKDIR`, and copy any necessary files, or install any
project-specific dependencies here as you normally would.
### Fuzzers
In the simplest case, every fuzzer consists of a single Java file with a
filename matching `*Fuzzer.java` and no `package` directive. An example fuzz
target could thus be a file `ExampleFuzzer.java` with contents:
```java
public class ExampleFuzzer {
public static void fuzzerTestOneInput(byte[] input) {
...
// Call a function of the project under test with arguments derived from
// input and throw an exception if something unwanted happens.
...
}
}
```
### build.sh
For JVM projects, `build.sh` needs more significant modifications
than for C/C++ projects. Below is an annotated example build script for a
Java-only project with single-file fuzz targets as described above:
```sh
# Step 1: Build the project
# Build the project .jar as usual, e.g. using Maven.
mvn package
# In this example, the project is built with Maven, which typically includes the
# project version into the name of the packaged .jar file. The version can be
# obtained as follows:
CURRENT_VERSION=$(mvn org.apache.maven.plugins:maven-help-plugin:3.2.0:evaluate \
-Dexpression=project.version -q -DforceStdout)
# Copy the project .jar into $OUT under a fixed name.
cp "target/sample-project-$CURRENT_VERSION.jar" $OUT/sample-project.jar
# Specify the project's .jar file(s), separated by spaces if there are multiple.
PROJECT_JARS="sample-project.jar"
# Step 2: Build the fuzzers (should not require any changes)
# The classpath at build-time includes the project jars in $OUT as well as the
# Jazzer API.
BUILD_CLASSPATH=$(echo $PROJECT_JARS | xargs printf -- "$OUT/%s:"):$JAZZER_API_PATH
# All .jar and .class files lie in the same directory as the fuzzer at runtime.
RUNTIME_CLASSPATH=$(echo $PROJECT_JARS | xargs printf -- "\$this_dir/%s:"):\$this_dir
for fuzzer in $(find $SRC -name '*Fuzzer.java'); do
fuzzer_basename=$(basename -s .java $fuzzer)
javac -cp $BUILD_CLASSPATH $fuzzer
cp $SRC/$fuzzer_basename.class $OUT/
# Create an execution wrapper that executes Jazzer with the correct arguments.
echo "#!/bin/bash
# LLVMFuzzerTestOneInput for fuzzer detection.
this_dir=\$(dirname \"\$0\")
if [[ \"\$@\" =~ (^| )-runs=[0-9]+($| ) ]]; then
mem_settings='-Xmx1900m:-Xss900k'
else
mem_settings='-Xmx2048m:-Xss1024k'
fi
LD_LIBRARY_PATH=\"$JVM_LD_LIBRARY_PATH\":\$this_dir \
\$this_dir/jazzer_driver --agent_path=\$this_dir/jazzer_agent_deploy.jar \
--cp=$RUNTIME_CLASSPATH \
--target_class=$fuzzer_basename \
--jvm_args=\"\$mem_settings:-Djava.awt.headless=true\" \
\$@" > $OUT/$fuzzer_basename
chmod +x $OUT/$fuzzer_basename
done
```
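The two classpath assignments in Step 2 of the script are dense, so it may help to see what they expand to. The snippet below replays them outside the build with stand-in values: the second jar name and the `/out` and Jazzer API paths are hypothetical and chosen only for illustration.

```shell
# Stand-in values; on OSS-Fuzz, $OUT and $JAZZER_API_PATH are provided by the
# build environment. The paths and the second jar name are hypothetical.
OUT=/out
JAZZER_API_PATH=/opt/jazzer/jazzer_api_deploy.jar
PROJECT_JARS="sample-project.jar sample-dependency.jar"

# xargs hands every jar name to printf, which repeats the format once per name.
BUILD_CLASSPATH=$(echo $PROJECT_JARS | xargs printf -- "$OUT/%s:"):$JAZZER_API_PATH

# \$this_dir stays literal here; it is only resolved later, when the generated
# wrapper script runs.
RUNTIME_CLASSPATH=$(echo $PROJECT_JARS | xargs printf -- "\$this_dir/%s:"):\$this_dir

echo "$BUILD_CLASSPATH"
# prints /out/sample-project.jar:/out/sample-dependency.jar::/opt/jazzer/jazzer_api_deploy.jar
echo "$RUNTIME_CLASSPATH"
# prints $this_dir/sample-project.jar:$this_dir/sample-dependency.jar::$this_dir
```

The doubled colon is a by-product of the trailing `:` that `printf` emits after the last jar, followed by the literal `:` separator in the assignment.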
The [java-example](https://github.com/google/oss-fuzz/blob/master/projects/java-example/build.sh)
project contains an example of a `build.sh` for Java projects with native
libraries.
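The wrapper generated by the annotated script above selects smaller JVM heap and stack limits when it is invoked with a `-runs=N` flag. The following Bash sketch isolates that branch so it can be tried on its own. The function name `pick_mem_settings` is hypothetical, and the regular expression is stored in a variable (a common Bash idiom) instead of being written inline as in the wrapper.

```shell
# Mirrors the memory-setting branch of the generated wrapper as a function.
# Requires bash because of [[ ... =~ ... ]].
pick_mem_settings() {
  # Matches "-runs=<digits>" as a whole argument anywhere on the command line.
  local runs_re='(^| )-runs=[0-9]+($| )'
  if [[ "$*" =~ $runs_re ]]; then
    echo '-Xmx1900m:-Xss900k'
  else
    echo '-Xmx2048m:-Xss1024k'
  fi
}

pick_mem_settings -runs=100 crash-input      # prints -Xmx1900m:-Xss900k
pick_mem_settings -max_total_time=60 corpus  # prints -Xmx2048m:-Xss1024k
```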
## FuzzedDataProvider
Jazzer provides a `FuzzedDataProvider` that can simplify the task of creating a
fuzz target by translating the raw input bytes received from the fuzzer into
useful primitive Java types. Its functionality is similar to
`FuzzedDataProviders` available in other languages, such as
[Python](https://github.com/google/atheris#fuzzeddataprovider) and
[C++](https://github.com/google/fuzzing/blob/master/docs/split-inputs.md).
On OSS-Fuzz, the required library is available in the base docker images under
the path `$JAZZER_API_PATH`, which is added to the classpath by the example
build script shown above. Locally, the library can be obtained from
[Maven Central](https://search.maven.org/search?q=g:com.code-intelligence%20a:jazzer-api).
A fuzz target using the `FuzzedDataProvider` would look as follows:
```java
import com.code_intelligence.jazzer.api.FuzzedDataProvider;
public class ExampleFuzzer {
public static void fuzzerTestOneInput(FuzzedDataProvider data) {
int number = data.consumeInt();
String string = data.consumeRemainingAsString();
// ...
}
}
```
For a list of convenience methods offered by `FuzzedDataProvider`, consult its
[javadocs](https://codeintelligencetesting.github.io/jazzer-docs/jazzer-api/com/code_intelligence/jazzer/api/FuzzedDataProvider.html).
================================================
FILE: docs/getting-started/new-project-guide/lua_lang.md
================================================
---
layout: default
title: Integrating a Lua project
parent: Setting up a new project
grand_parent: Getting started
nav_order: 4
permalink: /getting-started/new-project-guide/lua-lang/
---
# Integrating a Lua project
{: .no_toc}
- TOC
{:toc}
---
The process of integrating a project written in Lua with OSS-Fuzz
is similar to the general [Setting up a new project]({{ site.baseurl
}}/getting-started/new-project-guide/) process. The key specifics of
integrating a Lua project are outlined below.
## luzer
Lua fuzzing in OSS-Fuzz is powered by
[luzer](https://github.com/ligurio/luzer). As luzer operates
directly on the Lua source code level, it can be applied to any
project written in a language that can be transpiled into Lua,
such as [MoonScript](https://moonscript.org/),
[TypeScriptToLua](https://typescripttolua.github.io/),
[Fennel](https://fennel-lang.org/), and [Urn](https://urn-lang.com/).
Also, it supports fuzzing C/C++ extensions written for Lua. When
fuzzing native code, luzer can be used in combination with
Address Sanitizer or Undefined Behavior Sanitizer to catch extra bugs.
## Project files
### Example project
We recommend viewing
[lua-example](https://github.com/google/oss-fuzz/tree/master/projects/lua-example)
as an example of a simple Lua fuzzing project. This example also
demonstrates how to use luzer's Fuzzed Data Provider.
### project.yaml
The `language` attribute must be specified as follows:
```yaml
language: c
```
The only supported fuzzing engine is libFuzzer (`libfuzzer`).
```yaml
fuzzing_engines:
- libfuzzer
sanitizers:
- none
```
The OSS-Fuzz infrastructure needs no special sanitizer support for Lua.
luzer builds its own DSO with libFuzzer and the sanitizer, and
`compile_lua_fuzzer` (a script that is also maintained by the project) adds
it to `LD_PRELOAD` if required.
### Dockerfile
The Dockerfile should start with `FROM gcr.io/oss-fuzz-base/base-builder`.
The OSS-Fuzz base Docker images come without any of the pre-installed
components required for Lua fuzzing, so you usually need to build or
install a Lua runtime and the luzer module yourself. Beyond that, clone
the project, set a `WORKDIR`, copy any necessary files, and install any
project-specific dependencies here as you normally would.
### Fuzzers
In the simplest case, every fuzzer consists of a single Lua file that defines
a function `TestOneInput` and executes a function named `luzer.Fuzz()`.
An example fuzz target could thus be a file `fuzz_basic.lua` with contents:
```lua
local parser = require("src.luacheck.parser")
local decoder = require("luacheck.decoder")
local luzer = require("luzer")
local function TestOneInput(buf)
parser.parse(decoder.decode(buf))
end
local args = {
print_final_stats = 1,
}
luzer.Fuzz(TestOneInput, nil, args)
```
### compile_lua_fuzzer
Unlike projects for other languages, the base image does not
include a script that generates a wrapper script that can be used
as a drop-in replacement for libFuzzer.
Therefore, you need to add such a script yourself. This script
sets a relative path to the Lua runtime that will be used for running
tests, sets the necessary environment variables (for example, `LUA_PATH`,
`LUA_CPATH` and `LD_PRELOAD`), and specifies the path to
the `.lua` file containing the test implementation. The wrapper scripts
that `compile_lua_fuzzer` generates must accept the same command line
flags as libFuzzer-based tests.
Note that the resulting wrapper scripts must contain the word "luarocks"
to pass checks by `bad_build_check` in continuous integration.
Then, you can use the script `compile_lua_fuzzer` to build the fuzzers.
A usage example from the `lua-example` project is
```shell
compile_lua_fuzzer lua fuzz_basic.lua
```
Arguments are:
* a relative path to a Lua runtime name
* a relative path to the fuzzing test inside the OSS-Fuzz project directory
The `lua-example` project includes an
[example](https://github.com/google/oss-fuzz/blob/master/projects/lua-example/compile_lua_fuzzer)
of such a script.
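To make the requirements above more concrete, here is a minimal sketch of what such a script could look like. It is not the script from `lua-example`: the `lua_modules/` directory layout, the Lua 5.1 module paths and the decision to copy the test next to its wrapper are all assumptions made for illustration, and `LD_PRELOAD` handling is left out. The wrapper forwards its arguments unchanged, on the assumption that the harness passes libFuzzer flags through.

```shell
#!/bin/bash -eu
# Hypothetical sketch of a compile_lua_fuzzer script, written as a function.
# Usage: compile_lua_fuzzer <lua runtime> <fuzz test .lua file>
# Assumes $OUT is set and that dependencies live in $OUT/lua_modules.

compile_lua_fuzzer() {
  local lua_runtime=$1
  local fuzz_target=$2
  local fuzzer_basename
  fuzzer_basename=$(basename -s .lua "$fuzz_target")

  # Ship the test itself next to its wrapper.
  cp "$fuzz_target" "$OUT/"

  # Everything escaped with a backslash is evaluated when the wrapper runs,
  # not when it is generated.
  cat > "$OUT/$fuzzer_basename" <<EOF
#!/bin/bash
# LLVMFuzzerTestOneInput for fuzzer detection.
# Dependencies are installed with luarocks into lua_modules/.
this_dir=\$(dirname "\$0")
LUA_PATH="\$this_dir/lua_modules/share/lua/5.1/?.lua;;" \\
LUA_CPATH="\$this_dir/lua_modules/lib/lua/5.1/?.so;;" \\
"\$this_dir/$lua_runtime" "\$this_dir/$fuzzer_basename.lua" "\$@"
EOF
  chmod +x "$OUT/$fuzzer_basename"
}
```

Saved as a standalone script, the function body would read `$1` and `$2` directly, so that it can be invoked as `compile_lua_fuzzer lua fuzz_basic.lua`.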
### build.sh
The script is executed within the image built from your [Dockerfile](#Dockerfile).
In general, this script should do the following:
- Set up or build a Lua runtime.
- Set up or build required dependencies for your tests.
- Generate wrapper scripts for your tests using [compile_lua_fuzzer](#compile_lua_fuzzer).
Resulting binaries, tests and their wrapper scripts, and a
directory with Luarocks dependencies should be placed in `$OUT`.
Note that when installing the luzer module, you need to set the
environment variable `OSS_FUZZ` to a non-empty value; otherwise the
build may fail.
The [lua-example](https://github.com/google/oss-fuzz/blob/master/projects/lua-example/build.sh)
project contains an example of a `build.sh` for Lua projects.
## FuzzedDataProvider
luzer provides a Fuzzed Data Provider that is helpful for splitting
a fuzz input into multiple parts of various Lua types. Its
functionality is similar to
[Fuzzed Data Provider](https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider)
available in LLVM. The methods it provides are described in the
[luzer documentation](https://github.com/ligurio/luzer/blob/master/docs/api.md#structure-aware-fuzzing).
A fuzz target using the `FuzzedDataProvider` would look as follows:
```lua
local luzer = require("luzer")
local function TestOneInput(buf)
local fdp = luzer.FuzzedDataProvider(buf)
local str = fdp:consume_string(4)
local b = {}
str:gsub(".", function(c) table.insert(b, c) end)
local count = 0
if b[1] == "o" then count = count + 1 end
if b[2] == "o" then count = count + 1 end
if b[3] == "p" then count = count + 1 end
if b[4] == "s" then count = count + 1 end
if count == 4 then assert(nil) end
end
local args = {
only_ascii = 1,
print_pcs = 1,
}
luzer.Fuzz(TestOneInput, nil, args)
```
================================================
FILE: docs/getting-started/new-project-guide/python_lang.md
================================================
---
layout: default
title: Integrating a Python project
parent: Setting up a new project
grand_parent: Getting started
nav_order: 3
permalink: /getting-started/new-project-guide/python-lang/
---
# Integrating a Python project
{: .no_toc}
- TOC
{:toc}
---
The process of integrating a project written in Python with OSS-Fuzz is very
similar to the general
[Setting up a new project]({{ site.baseurl }}/getting-started/new-project-guide/)
process. The key specifics of integrating a Python project are outlined below.
## Atheris
Python fuzzing in OSS-Fuzz depends on
[Atheris](https://github.com/google/atheris). Fuzzers will depend on the
`atheris` package, and dependencies are pre-installed on the OSS-Fuzz base
docker images.
## Project files
### Example project
We recommend viewing [ujson](https://github.com/google/oss-fuzz/tree/master/projects/ujson) as an
example of a simple Python fuzzing project, with both plain-Atheris and
Atheris + Hypothesis harnesses.
### project.yaml
The `language` attribute must be specified.
```yaml
language: python3
```
The only supported fuzzing engine is libFuzzer (`libfuzzer`). The supported
sanitizers are AddressSanitizer (`address`) and
UndefinedBehaviorSanitizer (`undefined`). These must be explicitly specified.
```yaml
fuzzing_engines:
- libfuzzer
sanitizers:
- address
- undefined
```
### Dockerfile
The Dockerfile should start with `FROM gcr.io/oss-fuzz-base/base-builder-python`.
Because most dependencies are already pre-installed on the images, no
significant changes are needed in the Dockerfile for Python fuzzing projects.
You should simply clone the project, set a `WORKDIR`, and copy any necessary
files, or install any project-specific dependencies here as you normally would.
### build.sh
For Python projects, `build.sh` needs more significant modifications than for
other projects. The following annotated example build script explains why each
step is necessary and when it can be omitted.
```sh
# Build and install project (using current CFLAGS, CXXFLAGS). This is required
# for projects with C extensions so that they're built with the proper flags.
pip3 install .
# Build fuzzers into $OUT. These could be detected in other ways.
for fuzzer in $(find $SRC -name '*_fuzzer.py'); do
fuzzer_basename=$(basename -s .py $fuzzer)
fuzzer_package=${fuzzer_basename}.pkg
# To avoid issues with Python version conflicts, or changes in environment
# over time on the OSS-Fuzz bots, we use pyinstaller to create a standalone
# package. Though not necessarily required for reproducing issues, this is
# required to keep fuzzers working properly in OSS-Fuzz.
pyinstaller --distpath $OUT --onefile --name $fuzzer_package $fuzzer
# Create execution wrapper. Atheris requires that certain libraries are
# preloaded, so this is also done here to ensure compatibility and simplify
# test case reproduction. Since this helper script is what OSS-Fuzz will
# actually execute, it is also always required.
# NOTE: If you are fuzzing python-only code and do not have native C/C++
# extensions, then remove the LD_PRELOAD line below as preloading sanitizer
# library is not required and can lead to unexpected startup crashes.
echo "#!/bin/sh
# LLVMFuzzerTestOneInput for fuzzer detection.
this_dir=\$(dirname \"\$0\")
LD_PRELOAD=\$this_dir/sanitizer_with_fuzzer.so \
ASAN_OPTIONS=\$ASAN_OPTIONS:symbolize=1:external_symbolizer_path=\$this_dir/llvm-symbolizer:detect_leaks=0 \
\$this_dir/$fuzzer_package \$@" > $OUT/$fuzzer_basename
chmod +x $OUT/$fuzzer_basename
done
```
## Hypothesis
Using [Hypothesis](https://hypothesis.readthedocs.io/), the Python library for
[property-based testing](https://hypothesis.works/articles/what-is-property-based-testing/),
makes it really easy to generate complex inputs - whether in traditional test suites
or [by using test functions as fuzz harnesses](https://hypothesis.readthedocs.io/en/latest/details.html#use-with-external-fuzzers).
> Property based testing is the construction of tests such that, when these tests are fuzzed,
failures in the test reveal problems with the system under test that could not have been
revealed by direct fuzzing of that system.
We recommend using the [`hypothesis write`](https://hypothesis.readthedocs.io/en/latest/ghostwriter.html)
command to generate a starter fuzz harness. This "ghostwritten" code may be usable as-is,
or provide a useful template for writing more specific tests.
See [here for the core "strategies"](https://hypothesis.readthedocs.io/en/latest/data.html),
for arbitrary data, [here for Numpy + Pandas support](https://hypothesis.readthedocs.io/en/latest/numpy.html),
or [here for a variety of third-party extensions](https://hypothesis.readthedocs.io/en/latest/strategies.html)
supporting everything from protobufs, to jsonschemas, to networkx graphs or geojson
or valid Python source code.
Hypothesis' integrated test-case reduction also makes it trivial to report a canonical minimal
example for each distinct failure discovered while fuzzing - just run the test function!
To use Hypothesis in OSS-Fuzz, install it in your Dockerfile with
```dockerfile
RUN pip3 install hypothesis
```
See [the `ujson` structured fuzzer](https://github.com/google/oss-fuzz/blob/master/projects/ujson/hypothesis_structured_fuzzer.py)
for an example "polyglot" which can either be run with `pytest` as a standard test function,
or run with OSS-Fuzz as a fuzz harness.
================================================
FILE: docs/getting-started/new-project-guide/rust_lang.md
================================================
---
layout: default
title: Integrating a Rust project
parent: Setting up a new project
grand_parent: Getting started
nav_order: 2
permalink: /getting-started/new-project-guide/rust-lang/
---
# Integrating a Rust project
{: .no_toc}
- TOC
{:toc}
---
The process of integrating a project written in Rust with OSS-Fuzz is very
similar to the general [Setting up a new project]({{ site.baseurl
}}/getting-started/new-project-guide/) process. The key specifics of integrating
a Rust project are outlined below.
## cargo-fuzz support
Rust integration with OSS-Fuzz is expected to use [`cargo
fuzz`](https://github.com/rust-fuzz/cargo-fuzz) to build fuzzers. The `cargo
fuzz` tool will build code with required compiler flags as well as link to the
correct libFuzzer on OSS-Fuzz itself. Note that using `cargo fuzz` also makes it
quite easy to run the fuzzers locally yourself if you get a failing test case!
## Project files
First you'll want to follow the [setup instructions for `cargo fuzz`
itself](https://rust-fuzz.github.io/book/). Afterwards your project should have:
* A top-level `fuzz` directory.
* A `fuzz/Cargo.toml` manifest which pulls in necessary dependencies to fuzz.
* Some `fuzz/fuzz_targets/*.rs` files which are the fuzz targets that will be
compiled and run on OSS-Fuzz.
Note that you can customize this layout as well, but you'll need to edit some
of the scripts below to integrate into OSS-Fuzz.
### project.yaml
The `language` attribute must be specified.
```yaml
language: rust
```
The only supported fuzzing engine and sanitizer are `libfuzzer` and `address`,
respectively.
[Example](https://github.com/google/oss-fuzz/blob/12ef3654b3e9adfd20b5a6afdde54819ba71493d/projects/serde_json/project.yaml#L3-L6)
```yaml
sanitizers:
- address
fuzzing_engines:
- libfuzzer
```
### Dockerfile
The Dockerfile should start with `FROM gcr.io/oss-fuzz-base/base-builder-rust`.
The OSS-Fuzz builder image has the latest nightly release of Rust as well as
`cargo fuzz` pre-installed and in `PATH`. In the `Dockerfile` for your project
all you'll need to do is fetch the latest copy of your code and install any
system dependencies necessary to build your project.
[Example](https://github.com/google/oss-fuzz/blob/12ef3654b3e9adfd20b5a6afdde54819ba71493d/projects/serde_json/Dockerfile#L18-L20)
```dockerfile
RUN git clone --depth 1 https://github.com/serde-rs/json json
```
### build.sh
Here it's expected that you'll build the fuzz targets for your project and then
copy the final binaries into the output directory.
[Example](https://github.com/google/oss-fuzz/blob/12ef3654b3e9adfd20b5a6afdde54819ba71493d/projects/serde_json/build.sh#L20):
```sh
cd $SRC/json
cargo fuzz build -O
cp fuzz/target/x86_64-unknown-linux-gnu/release/from_slice $OUT/
```
Note that you likely want to pass the `-O` flag to `cargo fuzz build` which
builds fuzzers in release mode. You may also want to pass the
`--debug-assertions` flag to enable more checks while fuzzing. In this example
the `from_slice` binary is the fuzz target.
With some bash-fu you can also automatically copy over all fuzz targets into
the output directory so when you add a fuzz target to your project it's
automatically integrated into OSS-Fuzz:
```sh
FUZZ_TARGET_OUTPUT_DIR=target/x86_64-unknown-linux-gnu/release
for f in fuzz/fuzz_targets/*.rs
do
FUZZ_TARGET_NAME=$(basename ${f%.*})
cp $FUZZ_TARGET_OUTPUT_DIR/$FUZZ_TARGET_NAME $OUT/
done
```
## Writing fuzzers using a test-style strategy
In Rust you will often have tests written so that they are only
compiled into the final binary when built in test mode. This is achieved by
wrapping your test code in `cfg(test)`, e.g.
```rust
#[cfg(test)]
mod tests {
use super::*;
...
```
Cargo-fuzz automatically enables the `fuzzing` configuration option (it passes
`--cfg fuzzing` to the compiler), which means you can follow a similar strategy
to writing fuzzers as you do when writing tests.
Specifically, you can create modules gated on `cfg(fuzzing)`:
```rust
#[cfg(fuzzing)]
pub mod fuzz_logic {
use super::*;
...
```
and then call the logic within `fuzz_logic` from your fuzzer.
Furthermore, within your `.toml` files, you can then specify fuzzing-specific
dependencies by wrapping them as follows:
```toml
[target.'cfg(fuzzing)'.dependencies]
```
similar to how you wrap test-dependencies as follows:
```toml
[dev-dependencies]
```
Finally, you can also combine the testing logic you have and the fuzz logic. This
can be achieved simply by using
```rust
#[cfg(any(test, fuzzing))]
```
A project that follows this structure is Linkerd2-proxy and the project files can be
seen [here](https://github.com/google/oss-fuzz/tree/master/projects/linkerd2-proxy).
================================================
FILE: docs/getting-started/new-project-guide/swift_lang.md
================================================
---
layout: default
title: Integrating a Swift project
parent: Setting up a new project
grand_parent: Getting started
nav_order: 1
permalink: /getting-started/new-project-guide/swift-lang/
---
# Integrating a Swift project
{: .no_toc}
- TOC
{:toc}
---
The process of integrating a project written in Swift with OSS-Fuzz is very similar
to the general
[Setting up a new project]({{ site.baseurl }}/getting-started/new-project-guide/)
process. The key specifics of integrating a Swift project are outlined below.
## Project files
First, you need to write a Swift fuzz target that accepts a stream of bytes and
calls the program API with that. This fuzz target should reside in your project
repository.
The structure of the project directory in OSS-Fuzz repository doesn't differ for
projects written in Swift. The project files have the following Swift specific
aspects.
### project.yaml
The `language` attribute must be specified.
```yaml
language: swift
```
The only supported fuzzing engine is `libfuzzer`.
The supported sanitizers are `address` and `thread`.
[Example](https://github.com/google/oss-fuzz/blob/2a15c3c88b21f4f1be2a7ff115f72bd7a08e34ac/projects/swift-nio/project.yaml#L9):
```yaml
fuzzing_engines:
- libfuzzer
sanitizers:
- address
- thread
```
### Dockerfile
The Dockerfile should start with `FROM gcr.io/oss-fuzz-base/base-builder-swift`
instead of using the simple base-builder.
### build.sh
The `precompile_swift` script generates an environment variable, `SWIFTFLAGS`.
This can then be used in the build command, such as `swift build -c release $SWIFTFLAGS`.
A usage example from the swift-protobuf project:
```sh
. precompile_swift
# build project
cd FuzzTesting
swift build -c debug $SWIFTFLAGS
(
cd .build/debug/
find . -maxdepth 1 -type f -name "*Fuzzer" -executable | while read i; do cp $i $OUT/"$i"-debug; done
)
```
================================================
FILE: docs/getting-started/new_project_guide.md
================================================
---
layout: default
title: Setting up a new project
parent: Getting started
has_children: true
nav_order: 2
permalink: /getting-started/new-project-guide/
---
# Setting up a new project
{: .no_toc}
- TOC
{:toc}
---
## Prerequisites
Before you can start setting up your new project for fuzzing, you must do the following:
- [Integrate]({{ site.baseurl }}/advanced-topics/ideal-integration/) one or more [fuzz targets]({{ site.baseurl }}/reference/glossary/#fuzz-target)
with the project you want to fuzz.
For examples, see
[boringssl](https://github.com/google/boringssl/tree/master/fuzz) or
[SQLite](https://www.sqlite.org/src/artifact/ad79e867fb504338) (C/C++),
[go-fuzz](https://github.com/dvyukov/go-fuzz-corpus/tree/86a5af9d6842f80b205a082538ea28f61bbb8ccb) or
[syzkaller](https://github.com/google/syzkaller/tree/7c7ded697e6322b0975f061b7e268fe44f585dab/prog/test)
(Go).
- [Install Docker](https://docs.docker.com/engine/installation)
(Googlers can visit [go/installdocker](https://goto.google.com/installdocker)).
[Why Docker?]({{ site.baseurl }}/faq/#why-do-you-use-docker)
If you want to run `docker` without `sudo`, you can
[create a docker group](https://docs.docker.com/engine/installation/linux/ubuntulinux/#/create-a-docker-group).
**Note:** Docker images can consume significant disk space. Run
[docker-cleanup](https://gist.github.com/mikea/d23a839cba68778d94e0302e8a2c200f)
periodically to garbage-collect unused images.
- (optional) [Install gsutil](https://cloud.google.com/storage/docs/gsutil_install) for local code coverage testing.
For Google internal (gLinux) machines, please refer to [these instructions](https://cloud.google.com/storage/docs/gsutil_install#deb) instead.
## Creating the file structure
Each OSS-Fuzz project has a subdirectory
inside the [`projects/`](https://github.com/google/oss-fuzz/tree/master/projects) directory in the [OSS-Fuzz repository](https://github.com/google/oss-fuzz). For example, the [boringssl](https://github.com/google/boringssl)
project is located in [`projects/boringssl`](https://github.com/google/oss-fuzz/tree/master/projects/boringssl).
Each project directory also contains the following three configuration files:
* [project.yaml](#projectyaml) - provides metadata about the project.
* [Dockerfile](#dockerfile) - defines the container environment with information
on dependencies needed to build the project and its [fuzz targets]({{ site.baseurl }}/reference/glossary/#fuzz-target).
* [build.sh](#buildsh) - defines the build script that executes inside the Docker container and
generates the project build.
You can automatically create a new directory for your project in OSS-Fuzz and
generate templated versions of the configuration files
by running the following commands:
```bash
$ cd /path/to/oss-fuzz
$ export PROJECT_NAME=
$ export LANGUAGE=
$ python3 infra/helper.py generate $PROJECT_NAME --language=$LANGUAGE
```
Once the template configuration files are created, you can modify them to fit your project.
**Note:** We prefer that you keep and maintain [fuzz targets]({{ site.baseurl }}/reference/glossary/#fuzz-target) in your own source code repository. If this isn't possible, you can store them inside the OSS-Fuzz project directory you created.
## project.yaml {#projectyaml}
This configuration file stores project metadata. The following attributes are supported:
- [homepage](#homepage)
- [language](#language)
- [primary_contact](#primary)
- [auto_ccs](#auto_ccs)
- [main_repo](#main_repo)
- [vendor_ccs](#vendor) (optional)
- [sanitizers](#sanitizers) (optional)
- [architectures](#architectures) (optional)
- [help_url](#help_url) (optional)
- [builds_per_day](#build_frequency) (optional)
- [file_github_issue](#file_github_issue) (optional)
- [disable_remediation](#disable_remediation) (optional)
### homepage
Your project's homepage.
### language
Programming language the project is written in. Values you can specify include:
* `c`
* `c++`
* [`go`]({{ site.baseurl }}/getting-started/new-project-guide/go-lang/)
* [`rust`]({{ site.baseurl }}/getting-started/new-project-guide/rust-lang/)
* [`python`]({{ site.baseurl }}/getting-started/new-project-guide/python-lang/)
* [`jvm` (Java, Kotlin, Scala and other JVM-based languages)]({{ site.baseurl }}/getting-started/new-project-guide/jvm-lang/)
* [`swift`]({{ site.baseurl }}/getting-started/new-project-guide/swift-lang/)
* [`javascript`]({{ site.baseurl }}/getting-started/new-project-guide/javascript-lang/)
* [`lua`]({{ site.baseurl }}/getting-started/new-project-guide/lua-lang/)
### primary_contact, auto_ccs {#primary}
The primary contact and list of other contacts to be CCed. Each person listed gets access to ClusterFuzz, including crash reports and fuzzer statistics, and is automatically CCed on new bugs filed in the OSS-Fuzz
tracker. If you're a primary or a CC, you'll need to use a [Google account](https://support.google.com/accounts/answer/176347?hl=en) to get full access ([why?]({{ site.baseurl }}/faq/#why-do-you-require-a-google-account-for-authentication)).
### main_repo {#main_repo}
Path to source code repository hosting the code, e.g. `https://path/to/main/repo.git`.
### vendor_ccs (optional) {#vendor}
The list of vendor email addresses that are downstream consumers of the project and want access to
the bug reports as they are filed.
Any changes to this list must follow these rules:
- Approved by the project maintainer (e.g. comment on pull request, reply on project mailing list).
- An organization email address is used.
### sanitizers (optional) {#sanitizers}
The list of sanitizers to use. Possible values are: `address`, `memory` and `undefined`.
If you don't specify a list, `sanitizers` uses a default list of supported
sanitizers (currently ["address"](https://clang.llvm.org/docs/AddressSanitizer.html) and
["undefined"](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html)).
[MemorySanitizer](https://clang.llvm.org/docs/MemorySanitizer.html) ("memory") is also supported
and recommended, but is not enabled by default due to the likelihood of false positives from
un-instrumented system dependencies.
If you want to use "memory," please build all libraries your project needs using
MemorySanitizer.
This can be done by building them with the compiler flags provided during
MemorySanitizer builds.
Then, you can opt in by adding "memory" to your list of sanitizers.
If your project does not build with a particular sanitizer configuration and you need some time to fix
it, you can use `sanitizers` to override the defaults temporarily. For example, to disable the
UndefinedBehaviorSanitizer build, just specify all supported sanitizers except "undefined".
If you want to test a particular sanitizer to see what crashes it generates without filing
them in the issue tracker, you can set an `experimental` flag. For example, if you want to test "memory", set `experimental: True` like this:
```yaml
sanitizers:
  - address
  - memory:
      experimental: True
  - undefined
```
Crashes can be accessed on the [ClusterFuzz
homepage]({{ site.baseurl }}/further-reading/clusterfuzz#web-interface).
`sanitizers` example: [boringssl](https://github.com/google/oss-fuzz/blob/master/projects/boringssl/project.yaml).
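As a rough illustration of how the `sanitizers` list can mix plain names with an entry carrying the `experimental` flag, the sketch below normalizes an already-parsed list into `(name, experimental)` pairs. The function name `normalize_sanitizers` and the parsed structure (strings, or single-key mappings as a YAML parser would typically produce) are assumptions made for this example; it does not describe how OSS-Fuzz itself reads the file.

```python
from typing import Any, List, Tuple


def normalize_sanitizers(entries: List[Any]) -> List[Tuple[str, bool]]:
    """Turn parsed `sanitizers` entries into (name, experimental) pairs.

    Hypothetical helper: each entry is either a plain string such as
    "address", or a single-key mapping such as
    {"memory": {"experimental": True}}.
    """
    result = []
    for entry in entries:
        if isinstance(entry, str):
            # Plain sanitizer name, not experimental.
            result.append((entry, False))
        elif isinstance(entry, dict) and len(entry) == 1:
            name, options = next(iter(entry.items()))
            experimental = bool((options or {}).get("experimental", False))
            result.append((name, experimental))
        else:
            raise ValueError(f"unexpected sanitizer entry: {entry!r}")
    return result


# Structure corresponding to the YAML example in this section.
parsed = ["address", {"memory": {"experimental": True}}, "undefined"]
print(normalize_sanitizers(parsed))
```

With the example list, only `memory` is reported as experimental.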
### architectures (optional) {#architectures}
The list of architectures to fuzz on.
ClusterFuzz supports fuzzing on x86_64 (aka x64) by default.
Some projects can benefit from i386 fuzzing. OSS-Fuzz will build and run
AddressSanitizer with libFuzzer on i386 by doing the following:
```yaml
architectures:
- x86_64
- i386
```
By fuzzing on i386 you might find bugs that:
* Only occur in architecture-specific source code (e.g. code that contains i386 assembly).
* Exist in architecture-independent source code but only affect i386 users.
* Exist in architecture-independent source code and affect users on other 32-bit platforms such as AArch32 (aka 32-bit ARM).
Note that some bugs which affect x86_64 may be discovered on i386 and filed as such.
The testcase page of each OSS-Fuzz issue lists the other jobs where the crash reproduces, which tells you whether the crash also exists on x86_64.
Fuzzing on i386 is not enabled by default because many projects won't build for i386 without some modification to their OSS-Fuzz build process.
For example, you will need to link against `$LIB_FUZZING_ENGINE` and possibly install i386 dependencies within the x86_64 docker image ([for example](https://github.com/google/oss-fuzz/blob/5b8dcb5d942b3b8bc173b823fb9ddbdca7ec6c99/projects/gdal/build.sh#L18)) to get things working.
There are [known bugs](https://github.com/google/oss-fuzz/issues/2746) in ASAN
on i386 that cause ClusterFuzz to report unreproducible crashes for 0 length
testcases. There are no plans to fix these bugs so be ready for slightly more
false positives if you use i386. These false positives should be somewhat easy
to identify since they will manifest as crashes in ASAN rather than your code.
### fuzzing_engines (optional) {#fuzzing_engines}
The list of fuzzing engines to use.
By default, `libfuzzer`, `afl`, `honggfuzz`, and `centipede` are used. It is recommended to
use all of them if possible. `libfuzzer` is required by OSS-Fuzz.
### help_url (optional) {#help_url}
A link to a custom help URL that appears in bug reports instead of the default
[OSS-Fuzz guide to reproducing crashes]({{ site.baseurl }}/advanced-topics/reproducing/). This can be useful if you assign
bugs to members of your project unfamiliar with OSS-Fuzz, or if they should follow a different workflow for
reproducing and fixing bugs than the standard one outlined in the reproducing guide.
`help_url` example: [skia](https://github.com/google/oss-fuzz/blob/master/projects/skia/project.yaml).
### builds_per_day (optional) {#build_frequency}
The number of times the project should be built per day.
OSS-Fuzz allows up to 4 builds per day, and builds once per day by default.
Example:
```yaml
builds_per_day: 2
```
Will build the project twice per day.
### file_github_issue (optional) {#file_github_issue}
Whether to mirror issues on GitHub instead of having them only in the OSS-Fuzz
tracker.
### disable_remediation (optional) {#disable_remediation}
Opt out of receiving remediation for all new and existing bugs. If remediation
is disabled, disclosure notifications will not include any proposed code
changes. If enabled (default), proposed code changes and comments to remediate
bugs may be automatically included, on a case-by-case basis, in the disclosure
that is private during the embargo of each issue.
## Dockerfile {#dockerfile}
This configuration file defines the Docker image for your project. Your [build.sh](#buildsh) script will be executed inside the container you define.
For most projects, the image is simple:
```docker
FROM gcr.io/oss-fuzz-base/base-builder # base image with clang toolchain
RUN apt-get update && apt-get install -y ... # install required packages to build your project
RUN git clone # checkout all sources needed to build your project
WORKDIR # current directory for the build script
COPY build.sh fuzzer.cc $SRC/ # copy build script and other fuzzer files in src dir
```
In the above example, the git clone will check out the source to `$SRC/`.
Depending on your project's language, you will use a different base image,
for instance `FROM gcr.io/oss-fuzz-base/base-builder-go` for golang.
For an example, see
[expat/Dockerfile](https://github.com/google/oss-fuzz/tree/master/projects/expat/Dockerfile)
or
[syzkaller/Dockerfile](https://github.com/google/oss-fuzz/blob/master/projects/syzkaller/Dockerfile).
In the case of a project with multiple languages/toolchains needed,
you can run installation scripts `install_lang.sh` where lang is the language needed.
You also need to set up environment variables needed by this toolchain, for example `GOPATH` is needed by golang.
For an example, see
[ecc-diff-fuzzer/Dockerfile](https://github.com/google/oss-fuzz/blob/master/projects/ecc-diff-fuzzer/Dockerfile),
where we use `base-builder-rust` and install golang.
Runtime dependencies of your project, such as third-party static libraries, will
not be instrumented if you build them in the Dockerfile. In most cases, you will
want to build them in `build.sh` instead.
## build.sh {#buildsh}
This file defines how to build binaries for [fuzz targets]({{ site.baseurl }}/reference/glossary/#fuzz-target) in your project.
The script is executed within the image built from your [Dockerfile](#dockerfile).
In general, this script should do the following:
- Build the project using your build system with the correct compiler.
- Provide compiler flags as [environment variables](#Requirements).
- Build your [fuzz targets]({{ site.baseurl }}/reference/glossary/#fuzz-target) and link your project's build with libFuzzer.
Resulting binaries should be placed in `$OUT`.
Here's an example from Expat ([source](https://github.com/google/oss-fuzz/blob/master/projects/expat/build.sh)):
```bash
#!/bin/bash -eu
./buildconf.sh
# configure scripts usually use correct environment variables.
./configure
make clean
make -j$(nproc) all
$CXX $CXXFLAGS -std=c++11 -Ilib/ \
$SRC/parse_fuzzer.cc -o $OUT/parse_fuzzer \
$LIB_FUZZING_ENGINE .libs/libexpat.a
cp $SRC/*.dict $SRC/*.options $OUT/
```
If your project is written in Go, check out the [Integrating a Go project]({{ site.baseurl }}/getting-started/new-project-guide/go-lang/) page.
**Note:**
1. Don't assume the fuzzing engine is libFuzzer by default, because we generate builds for libFuzzer, AFL++, Honggfuzz, and Centipede fuzzing engine configurations. Instead, link the fuzzing engine using `$LIB_FUZZING_ENGINE`.
2. Make sure that the binary names for your [fuzz targets]({{ site.baseurl }}/reference/glossary/#fuzz-target) contain only
alphanumeric characters, underscores (`_`), or dashes (`-`). Otherwise, they won't run on our infrastructure.
3. Don't remove source code files. They are needed for code coverage.
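The naming rule in note 2 can be checked before a build is submitted. The following is a minimal sketch, assuming that "alphanumeric" means ASCII letters and digits; the helper name `is_valid_fuzz_target_name` is made up for this example and is not part of the OSS-Fuzz tooling.

```python
import re

# Allowed characters per the note above: alphanumerics, underscore, dash.
# Restricting to ASCII is an assumption of this sketch.
_VALID_TARGET_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_fuzz_target_name(name: str) -> bool:
    """Return True if `name` contains only the allowed characters."""
    return _VALID_TARGET_NAME.fullmatch(name) is not None


for candidate in ["parse_fuzzer", "from-slice", "parse.fuzzer"]:
    print(candidate, is_valid_fuzz_target_name(candidate))
```

A name such as `parse.fuzzer` is rejected because of the dot.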
### Temporarily disabling code instrumentation during builds
In some cases, it's not necessary to instrument every 3rd party library or tool that supports the build target. Use the following snippet to build tools or libraries without instrumentation:
```
CFLAGS_SAVE="$CFLAGS"
CXXFLAGS_SAVE="$CXXFLAGS"
unset CFLAGS
unset CXXFLAGS
export AFL_NOOPT=1
#
# build commands here that should not result in instrumented code.
#
export CFLAGS="${CFLAGS_SAVE}"
export CXXFLAGS="${CXXFLAGS_SAVE}"
unset AFL_NOOPT
```
### build.sh script environment
When your build.sh script is executed, the following locations are available within the image:
| Location| Env Variable | Description |
|---------| ------------ | ---------- |
| `/out/` | `$OUT` | Directory to store build artifacts (fuzz targets, dictionaries, options files, seed corpus archives). |
| `/src/` | `$SRC` | Directory to checkout source files. |
| `/work/`| `$WORK` | Directory to store intermediate files. |
Although the file layout is fixed within a container, environment variables are
provided so you can write retargetable scripts.
In case your fuzz target uses the [FuzzedDataProvider] class, make sure it is
included via `#include ` directive.
[FuzzedDataProvider]: https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider
### build.sh requirements {#Requirements}
Only binaries without an extension are accepted as targets. Extensions are reserved for other artifacts, like .dict.
You *must* use the special compiler flags needed to build your project and fuzz targets.
These flags are provided in the following environment variables:
| Env Variable | Description
| ------------- | --------
| `$CC`, `$CXX`, `$CCC` | The C and C++ compiler binaries.
| `$CFLAGS`, `$CXXFLAGS` | C and C++ compiler flags.
| `$LIB_FUZZING_ENGINE` | C++ compiler argument to link fuzz target against the prebuilt engine library (e.g. libFuzzer).
You *must* use `$CXX` as a linker, even if your project is written in pure C.
Most well-crafted build scripts will automatically use these variables. If not,
pass them manually to the build tool.
See the [Provided Environment Variables](https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-builder/README.md#provided-environment-variables) section in
`base-builder` image documentation for more details.
### Static and dynamic linking of libraries
The `build.sh` should produce fuzzers that are statically linked. This is because the
fuzzer build environment is different to the fuzzer runtime environment and if your
project depends on third party libraries then it is likely they will not be present
in the execution environment. Thus, any shared libraries you may install or compile in
`build.sh` or `Dockerfile` will not be present in the fuzzer runtime environment. There
are exceptions to this rule, and for further information on this please see the [fuzzer environment]({{ site.baseurl }}/further-reading/fuzzer-environment/) page.
## Disk space restrictions
Our builders have a disk size of 250GB (this includes space taken up by the OS). Builds must keep peak disk usage below this.
In addition, please keep the size of the build (everything copied to `$OUT`) small (<10GB uncompressed). The build is repeatedly transferred and unzipped during fuzzing and runs on VMs with limited disk space.
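To get a feel for how close a local build is to the size guideline, you can sum the file sizes under the output directory. This is a sketch under stated assumptions: the helper names are hypothetical, the 10 GB figure from the text is interpreted in binary units, and the local output path in the usage note is the one mentioned in the "Testing locally" section.

```python
import os

# 10 GB guideline from the text; using binary units is an assumption.
MAX_OUT_BYTES = 10 * 1024**3


def directory_size(path: str) -> int:
    """Return the total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so that linked files are not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def out_within_limit(out_dir: str, limit: int = MAX_OUT_BYTES) -> bool:
    """Return True if the build output is strictly below `limit` bytes."""
    return directory_size(out_dir) < limit
```

For example, `out_within_limit("build/out/my_project")` could be called from the OSS-Fuzz checkout after a local `build_fuzzers` run, where `my_project` is a placeholder project name.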
## Fuzzer execution environment
For more on the environment that
your [fuzz targets]({{ site.baseurl }}/reference/glossary/#fuzz-target) run in, and the assumptions you can make, see the [fuzzer environment]({{ site.baseurl }}/further-reading/fuzzer-environment/) page.
## Testing locally
You can build your docker image and fuzz targets locally, so you can test them before you push them to the OSS-Fuzz repository.
1. Run the same helper script you used to create your directory structure, this time using it to build your docker image and [fuzz targets]({{ site.baseurl }}/reference/glossary/#fuzz-target):
```bash
$ cd /path/to/oss-fuzz
$ python3 infra/helper.py build_image $PROJECT_NAME
$ python3 infra/helper.py build_fuzzers --sanitizer $PROJECT_NAME
```
The built binaries appear in the `/path/to/oss-fuzz/build/out/$PROJECT_NAME`
directory on your machine (and `$OUT` in the container).
**Note:** You *must* run your fuzz target binaries inside the base-runner docker
container to make sure that they work properly.
2. Find failures to fix by running the `check_build` command:
```bash
$ python3 infra/helper.py check_build $PROJECT_NAME
```
3. If you want to test changes against a particular fuzz target, run the following command:
```bash
$ python3 infra/helper.py run_fuzzer --corpus-dir= $PROJECT_NAME
```
4. We recommend taking a look at your code coverage as a test to ensure that
your fuzz targets get to the code you expect. This would use the corpus
generated from the previous `run_fuzzer` step in your local corpus directory.
```bash
$ python3 infra/helper.py build_fuzzers --sanitizer coverage $PROJECT_NAME
$ python3 infra/helper.py coverage $PROJECT_NAME --fuzz-target= --corpus-dir=
```
You may need to run `python3 infra/helper.py pull_images` to use the latest
coverage tools. Please refer to
[code coverage]({{ site.baseurl }}/advanced-topics/code-coverage/) for detailed
information on code coverage generation.
**Note:** Currently, we only support AddressSanitizer (address) and UndefinedBehaviorSanitizer (undefined)
configurations by default.
MemorySanitizer is recommended, but needs to be enabled manually since you must build all runtime dependencies with MemorySanitizer.
Make sure to test each
of the supported build configurations with the above commands (build_fuzzers -> run_fuzzer -> coverage).
If everything works locally, it should also work on our automated builders and ClusterFuzz. If you check in
your files and experience failures, review your [dependencies]({{ site.baseurl }}/further-reading/fuzzer-environment/#dependencies).
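If you repeat the build and check steps often, it can help to assemble the helper invocations in one place. The sketch below only constructs the command lines for `build_image`, `build_fuzzers`, and `check_build` as they appear above; the function name `local_test_commands` and the project name `my_project` are placeholders, and actually running the commands is left to the caller.

```python
import shlex
from typing import List


def local_test_commands(project: str, sanitizer: str = "address") -> List[List[str]]:
    """Build the helper.py command lines for a local build-and-check cycle."""
    helper = ["python3", "infra/helper.py"]
    return [
        helper + ["build_image", project],
        helper + ["build_fuzzers", "--sanitizer", sanitizer, project],
        # check_build is shown exactly as in the steps above, without a
        # sanitizer argument.
        helper + ["check_build", project],
    ]


# Print the commands so they can be reviewed before being executed.
for command in local_test_commands("my_project"):
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True, cwd="/path/to/oss-fuzz")` so that the cycle stops at the first failing step.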
## Debugging Problems
If you run into problems, our [Debugging page]({{ site.baseurl }}/advanced-topics/debugging/) lists ways to debug your build scripts and
[fuzz targets]({{ site.baseurl }}/reference/glossary/#fuzz-target).
## Efficient fuzzing
To help your fuzz targets find bugs faster, consider the following
techniques:
### Seed Corpus
Most fuzzing engines use evolutionary fuzzing algorithms. Supplying a seed
corpus consisting of good sample inputs is one of the best ways to improve a [fuzz
target]({{ site.baseurl }}/reference/glossary/#fuzz-target)'s coverage.
To provide a corpus for `my_fuzzer`, put `my_fuzzer_seed_corpus.zip` file next
to the [fuzz target]({{ site.baseurl }}/reference/glossary/#fuzz-target)'s binary in `$OUT` during the build. Individual files in this
archive will be used as starting inputs for mutations. You can store the corpus
next to source files, generate it during the build, or fetch it using curl or any other
tool of your choice.
(example: [boringssl](https://github.com/google/oss-fuzz/blob/master/projects/boringssl/build.sh#L41)).
Seed corpus files will be used for cross-mutations, and portions of them might appear
in bug reports or be used for further security research. It is important that the corpus
has an appropriate and consistent license.
OSS-Fuzz only: See also [Accessing Corpora]({{ site.baseurl }}/advanced-topics/corpora/) for information about getting access to the corpus we are currently using for your fuzz targets.
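The archive naming convention described in the Seed Corpus section can be sketched with Python's standard `zipfile` module. The function name `write_seed_corpus` and the directory arguments are assumptions for illustration; in a real project this step would normally run as part of `build.sh`, often with a plain `zip` command instead.

```python
import os
import zipfile


def write_seed_corpus(seed_dir: str, out_dir: str, fuzzer_name: str) -> str:
    """Zip every file under `seed_dir` into <fuzzer_name>_seed_corpus.zip.

    The archive is written to `out_dir`, next to the fuzz target binary.
    """
    archive_path = os.path.join(out_dir, f"{fuzzer_name}_seed_corpus.zip")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(seed_dir):
            for name in sorted(files):
                file_path = os.path.join(root, name)
                # Store paths relative to the seed directory.
                archive.write(file_path, arcname=os.path.relpath(file_path, seed_dir))
    return archive_path
```

Calling `write_seed_corpus("seeds", os.environ["OUT"], "my_fuzzer")` would produce `my_fuzzer_seed_corpus.zip` in `$OUT`, assuming a hypothetical `seeds` directory containing the sample inputs.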
### Dictionaries
Dictionaries hugely improve fuzzing efficiency for inputs with lots of similar
sequences of bytes (see the [libFuzzer documentation](https://llvm.org/docs/LibFuzzer.html#dictionaries)).
Put your dict file in `$OUT`. If the dict filename is the same as your target
binary name (i.e. `%fuzz_target%.dict`), it will be automatically used. If the
name is different (e.g. because it is shared by several targets), specify this
in the `.options` file:
```
[libfuzzer]
dict = dictionary_name.dict
```
It is common for several [fuzz targets]({{ site.baseurl }}/reference/glossary/#fuzz-target)
to reuse the same dictionary if they are fuzzing very similar inputs.
(example: [expat](https://github.com/google/oss-fuzz/blob/ad88a2e5295d91251d15f8a612758cd9e5ad92db/projects/expat/parse_fuzzer.options)).
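If several targets share one dictionary, the `.options` files can be generated rather than written by hand. The sketch below is one possible way to do that with Python's `configparser`. It assumes the options file is named `<fuzz_target>.options` and sits next to the target binary (as the expat example suggests); the function name `write_options_file` is hypothetical.

```python
import configparser
from pathlib import Path


def write_options_file(out_dir, fuzzer_name, dict_name):
    """Write <fuzzer_name>.options with a [libfuzzer] section naming the dictionary."""
    config = configparser.ConfigParser()
    config["libfuzzer"] = {"dict": dict_name}
    # Assumption: the options file lives next to the fuzz target binary.
    options_path = Path(out_dir) / f"{fuzzer_name}.options"
    with open(options_path, "w") as f:
        config.write(f)
    return options_path
```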
### Input Size
By default, the fuzzing engine will generate inputs of arbitrary length.
This might be useful to try corner cases that could lead to a
security vulnerability. However, if large inputs are not necessary to
increase the coverage of your target API, it is important to add a size limit
in your fuzz target to significantly improve performance.
```cpp
if (size < kMinInputLength || size > kMaxInputLength)
  return 0;
```
## Checking in to the OSS-Fuzz repository
Once you've tested your fuzzing files locally, fork OSS-Fuzz, commit, and push to the fork. Then
create a pull request with your change. Follow the
[Forking Project](https://guides.github.com/activities/forking/) guide if you're new to contributing
via GitHub.
### Copyright headers
Please include copyright headers for all files checked in to oss-fuzz:
```
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
```
**Exception:** If you're porting a fuzz target from Chromium, keep the original Chromium license header.
## Reviewing results
Once your change is merged, your project and fuzz targets should be automatically built and run on
ClusterFuzz after a short while (< 1 day). If you think there's a problem, you can check your project's [build status](https://oss-fuzz-build-logs.storage.googleapis.com/index.html).
Use the [ClusterFuzz web interface](https://oss-fuzz.com/) to review the following:
* Crashes generated
* Code coverage statistics
* Fuzzer statistics
* Fuzzer performance analyzer (linked from fuzzer statistics)
**Note:** Your Google Account must be listed in [project.yaml](#projectyaml) for you to have access to the ClusterFuzz web interface.
### Status Badge

Once your project has started [building](https://oss-fuzz-build-logs.storage.googleapis.com/index.html), we'd love it if you added our badge in
your project's README. This allows you to see bugs found by your OSS-Fuzz
integration at a glance. See
[brotli](https://github.com/google/brotli#introduction)'s
README for an example.
Adding it is easy; just follow this template:
```markdown
[](https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:)
```
## Monitoring performance via Fuzz Introspector
As soon as your project is run with ClusterFuzz (< 1 day), you can view the Fuzz
Introspector report for your project.
[Fuzz Introspector](https://github.com/ossf/fuzz-introspector) helps you
understand your fuzzers' performance and identify any potential blockers.
It provides individual and aggregated fuzzer reachability and coverage reports.
You can monitor each fuzzer's static reachability potential and compare it
against dynamic coverage and identify any potential bottlenecks.
Fuzz Introspector can offer suggestions on increasing coverage by adding new
fuzz targets or modifying existing ones.
Fuzz Introspector reports can be viewed from the [OSS-Fuzz
homepage](https://oss-fuzz.com/) or through this
[index](http://oss-fuzz-introspector.storage.googleapis.com/index.html).
Fuzz Introspector supports C and C++ projects.
Support for Java and Python projects is in progress.
You can view the [Fuzz Introspector report for bzip2](https://storage.googleapis.com/oss-fuzz-introspector/bzip2/inspector-report/20221017/fuzz_report.html)
as an example.
================================================
FILE: docs/glossary.md
================================================
This page has moved [here](https://google.github.io/oss-fuzz/reference/glossary/)
================================================
FILE: docs/ideal_integration.md
================================================
This page has moved [here](https://google.github.io/oss-fuzz/advanced-topics/ideal-integration)
================================================
FILE: docs/index.md
================================================
---
layout: default
title: OSS-Fuzz
permalink: /
nav_order: 1
has_children: true
has_toc: false
---
# OSS-Fuzz
[Fuzz testing] is a well-known technique for uncovering programming errors in
software. Many of these detectable errors, like [buffer overflow], can have
serious security implications. Google has found [thousands] of security
vulnerabilities and stability bugs by deploying [guided in-process fuzzing of
Chrome components], and we now want to share that service with the open source
community.
[Fuzz testing]: https://en.wikipedia.org/wiki/Fuzz_testing
[buffer overflow]: https://en.wikipedia.org/wiki/Buffer_overflow
[thousands]: https://bugs.chromium.org/p/chromium/issues/list?q=label%3AStability-LibFuzzer%2CStability-AFL%20-status%3ADuplicate%2CWontFix&can=1
[guided in-process fuzzing of Chrome components]: https://security.googleblog.com/2016/08/guided-in-process-fuzzing-of-chrome.html
In cooperation with the [Core Infrastructure Initiative] and the [OpenSSF],
OSS-Fuzz aims to make common open source software more secure and stable by
combining modern fuzzing techniques with scalable, distributed execution.
Projects that do not qualify for OSS-Fuzz (e.g. closed source) can run their own
instances of [ClusterFuzz] or [ClusterFuzzLite].
[Core Infrastructure Initiative]: https://www.coreinfrastructure.org/
[OpenSSF]: https://www.openssf.org/
We support the [libFuzzer], [AFL++], [Honggfuzz], and [Centipede] fuzzing engines in
combination with [Sanitizers], as well as [ClusterFuzz], a distributed fuzzer
execution environment and reporting tool.
[libFuzzer]: https://llvm.org/docs/LibFuzzer.html
[AFL++]: https://github.com/AFLplusplus/AFLplusplus
[Honggfuzz]: https://github.com/google/honggfuzz
[Centipede]: https://github.com/google/centipede
[Sanitizers]: https://github.com/google/sanitizers
[ClusterFuzz]: https://github.com/google/clusterfuzz
[ClusterFuzzLite]: https://google.github.io/clusterfuzzlite/
Currently, OSS-Fuzz supports C/C++, Rust, Go, Python, Java/JVM code, JavaScript
and Lua. Other languages supported by [LLVM] may work too. OSS-Fuzz supports fuzzing x86_64
and i386 builds.
[LLVM]: https://llvm.org
## Project history
OSS-Fuzz was launched in 2016 in response to the
[Heartbleed] vulnerability, discovered in [OpenSSL], one of the
most popular open source projects for encrypting web traffic. The vulnerability
had the potential to affect almost every internet user, yet was caused by a
relatively simple memory buffer overflow bug that could have been detected by
fuzzing—that is, by running the code on randomized inputs to intentionally cause
unexpected behaviors or crashes. At the time, though, fuzzing
was not widely used and was cumbersome for developers, requiring extensive
manual effort.
Google created OSS-Fuzz to fill this gap: it's a free service that runs fuzzers
for open source projects and privately alerts developers to the bugs detected.
Since its launch, OSS-Fuzz has become a critical service for the open source
community, growing beyond C/C++ to
detect problems in memory-safe languages such as Go, Rust, and Python.
[Heartbleed]: https://heartbleed.com/
[OpenSSL]: https://www.openssl.org/
## Learn more about fuzzing
This documentation describes how to use the OSS-Fuzz service for your open source
project. To learn more about fuzzing in general, we recommend reading the [libFuzzer
tutorial] and the other docs in the [google/fuzzing] repository. These and some
other resources are listed on the [useful links] page.
[google/fuzzing]: https://github.com/google/fuzzing/tree/master/docs
[libFuzzer tutorial]: https://github.com/google/fuzzing/blob/master/tutorial/libFuzzerTutorial.md
[useful links]: {{ site.baseurl }}/reference/useful-links/#tutorials
## Trophies
As of August 2023, OSS-Fuzz has helped identify and fix over [10,000] vulnerabilities and [36,000] bugs across [1,000] projects.
[10,000]: https://bugs.chromium.org/p/oss-fuzz/issues/list?q=Type%3DBug-Security%20label%3Aclusterfuzz%20-status%3ADuplicate%2CWontFix&can=1
[36,000]: https://bugs.chromium.org/p/oss-fuzz/issues/list?q=Type%3DBug%20label%3Aclusterfuzz%20-status%3ADuplicate%2CWontFix&can=1
[1,000]: https://github.com/google/oss-fuzz/tree/master/projects
================================================
FILE: docs/new_project_guide.md
================================================
This page has moved [here](https://google.github.io/oss-fuzz/getting-started/new-project-guide/)
================================================
FILE: docs/oss-fuzz/architecture.md
================================================
---
layout: default
title: Architecture
permalink: /architecture/
nav_order: 1
parent: OSS-Fuzz
---
# Architecture

The process works like this:
1. A maintainer of an open source project (or an outside volunteer) creates
one or more [fuzz targets](https://llvm.org/docs/LibFuzzer.html#fuzz-target)
and [integrates]({{ site.baseurl }}/advanced-topics/ideal-integration/) them
with the project's build and test system.
1. The project is [accepted to OSS-Fuzz]({{ site.baseurl }}/getting-started/accepting-new-projects/) and the developer commits their build configurations.
1. The OSS-Fuzz [builder](https://github.com/google/oss-fuzz/tree/master/infra/build) builds the project from the committed configs.
1. The builder uploads the fuzz targets to the OSS-Fuzz GCS bucket.
1. [ClusterFuzz]({{ site.baseurl }}/further-reading/clusterfuzz) downloads the fuzz targets and begins to fuzz the projects.
1. When ClusterFuzz finds a
bug, it reports the issue automatically to the OSS-Fuzz
[issue tracker](https://bugs.chromium.org/p/oss-fuzz/issues/list)
([example](https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9)).
([Why use a different tracker?]({{ site.baseurl }}/faq/#why-do-you-use-a-different-issue-tracker-for-reporting-bugs-in-oss-projects))
1. Project owners are CCed on the bug report.
1. The project developer fixes the bug upstream and credits OSS-Fuzz for the
discovery (the commit message should contain the string **'Credit to OSS-Fuzz'**).
Once the developer fixes the bug, [ClusterFuzz]({{ site.baseurl }}/further-reading/clusterfuzz) automatically
verifies the fix, adds a comment, and closes the issue ([example](https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=53#c3)). After the fix is verified or 90 days after reporting (whichever is earlier), the issue becomes [public]({{ site.baseurl }}/getting-started/bug-disclosure-guidelines/).
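
The "whichever is earlier" rule in the last step can be written as a small date calculation. This is only a sketch of the rule as worded above, with a hypothetical function name; the bug disclosure guidelines remain the authoritative description.

```python
from datetime import date, timedelta
from typing import Optional

DISCLOSURE_DEADLINE = timedelta(days=90)


def public_date(reported: date, fix_verified: Optional[date] = None) -> date:
    """Date on which an issue becomes public under the rule described above."""
    deadline = reported + DISCLOSURE_DEADLINE
    if fix_verified is None:
        # No verified fix yet: the 90-day deadline applies.
        return deadline
    return min(fix_verified, deadline)
```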
================================================
FILE: docs/reference/glossary.md
================================================
---
layout: default
title: Glossary
nav_order: 1
permalink: /reference/glossary/
parent: Reference
---
# Glossary
For general fuzzing terms, see the [glossary] from the [google/fuzzing] project.
[glossary]: https://github.com/google/fuzzing/blob/master/docs/glossary.md
[google/fuzzing]: https://github.com/google/fuzzing
- TOC
{:toc}
---
## OSS-Fuzz specific terms
### ClusterFuzz
A scalable fuzzing infrastructure that is used as the OSS-Fuzz backend.
[ClusterFuzz] is also used to fuzz Chrome and many other projects. A quick
overview of the ClusterFuzz user interface is available on this [page].
[page]: {{ site.baseurl }}/further-reading/clusterfuzz
[ClusterFuzz]: https://github.com/google/clusterfuzz
### Fuzz Target
In addition to its
[general definition](https://github.com/google/fuzzing/blob/master/docs/glossary.md#fuzz-target),
in OSS-Fuzz a fuzz target can be used to
[reproduce bug reports]({{ site.baseurl }}/advanced-topics/reproducing/).
It is recommended to use it for regression testing as well (see
[ideal integration]({{ site.baseurl }}/advanced-topics/ideal-integration/)).
### Job type
Or **Fuzzer Build**.
This refers to a build that contains all the [fuzz targets] for a given
[project](#project), is run with a specific [fuzzing engine], in a specific
build mode (e.g. with enabled/disabled assertions), and optionally combined
with a [sanitizer].
For example, we have a "libfuzzer_asan_sqlite" job type, indicating a build of
all sqlite3 [fuzz targets] using [libFuzzer](http://libfuzzer.info) and
[ASan](http://clang.llvm.org/docs/AddressSanitizer.html).
### Project
A project is an open source software project that is integrated with OSS-Fuzz.
Each project has a single set of configuration files
(example: [expat](https://github.com/google/oss-fuzz/tree/master/projects/expat))
and may have one or more [fuzz targets]
(example: [openssl](https://github.com/openssl/openssl/blob/master/fuzz/)).
### Reproducer
Or a **testcase**.
A [test input] that causes a specific bug to reproduce.
[fuzz targets]: https://github.com/google/fuzzing/blob/master/docs/glossary.md#fuzz-target
[fuzzing engine]: https://github.com/google/fuzzing/blob/master/docs/glossary.md#fuzzing-engine
[sanitizer]: https://github.com/google/fuzzing/blob/master/docs/glossary.md#sanitizer
[test input]: https://github.com/google/fuzzing/blob/master/docs/glossary.md#test-input
### Sanitizers
Fuzzers are usually built with one or more [sanitizers](https://github.com/google/sanitizers) enabled.
```bash
$ python3 infra/helper.py build_fuzzers --sanitizer undefined json
```
Supported sanitizers:
| Sanitizer | Description
| ------------ | ----------
| `address` *(default)* | [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) with [Leak Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer).
| `undefined` | [Undefined Behavior Sanitizer](http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html).
| `memory` | [Memory Sanitizer](https://github.com/google/sanitizers/wiki/MemorySanitizer). *NOTE: It is critical that you build __all__ the code in your program (including libraries it uses) with Memory Sanitizer. Otherwise, you will see false positive crashes due to an inability to see initializations in uninstrumented code.*
| `coverage` | Used for generating code coverage reports. See [Code Coverage doc]({{ site.baseurl }}/advanced-topics/code-coverage/).
Compiler flag values for predefined configurations are specified in the [Dockerfile](https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-builder/Dockerfile).
These flags can be overridden by specifying `$SANITIZER_FLAGS` directly.
You can choose which configurations to automatically run your fuzzers with in the `project.yaml` file (e.g. [sqlite3](https://github.com/google/oss-fuzz/tree/master/projects/sqlite3/project.yaml)).
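
To try each sanitizer locally, the `build_fuzzers` invocation shown above can be generated per configuration. The sketch below only assembles the argument lists, using the `--sanitizer` option and the values from the table; the function name is hypothetical, and the commands are assumed to be run from the root of an OSS-Fuzz checkout.

```python
SANITIZERS = ("address", "undefined", "memory", "coverage")


def build_fuzzers_command(project, sanitizer="address"):
    """Return the argv list for building a project's fuzzers with one sanitizer."""
    if sanitizer not in SANITIZERS:
        raise ValueError(f"unsupported sanitizer: {sanitizer}")
    return ["python3", "infra/helper.py", "build_fuzzers",
            "--sanitizer", sanitizer, project]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`, for example in a loop over `SANITIZERS`.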
### Architectures
ClusterFuzz supports fuzzing on x86_64 (aka x64) by default. However, you can also fuzz using AddressSanitizer and libFuzzer on i386 (aka x86, or 32 bit) by specifying the `$ARCHITECTURE` build environment variable using the `--architecture` option:
```bash
python3 infra/helper.py build_fuzzers --architecture i386 json
```
================================================
FILE: docs/reference/reference.md
================================================
---
layout: default
title: Reference
has_children: true
nav_order: 6
permalink: /reference/
---
# Reference
================================================
FILE: docs/reference/useful_links.md
================================================
---
layout: default
title: Useful links
nav_order: 2
permalink: /reference/useful-links/
parent: Reference
---
# Useful links
- TOC
{:toc}
---
## Web Interface
* The main page: [oss-fuzz.com](https://oss-fuzz.com)
## Build Status
* [This page](https://oss-fuzz-build-logs.storage.googleapis.com/index.html)
gives the latest build logs for each project.
## Blog posts
* 2016-12-01 - Announcing OSS-Fuzz: Continuous fuzzing for open source software
([Open Source](https://opensource.googleblog.com/2016/12/announcing-oss-fuzz-continuous-fuzzing.html),
[Testing](https://testing.googleblog.com/2016/12/announcing-oss-fuzz-continuous-fuzzing.html),
[Security](https://security.googleblog.com/2016/12/announcing-oss-fuzz-continuous-fuzzing.html))
* 2017-05-08 - OSS-Fuzz: Five months later, and rewarding projects
([Open Source](https://opensource.googleblog.com/2017/05/oss-fuzz-five-months-later-and.html),
[Testing](https://testing.googleblog.com/2017/05/oss-fuzz-five-months-later-and.html),
[Security](https://security.googleblog.com/2017/05/oss-fuzz-five-months-later-and.html))
* 2018-11-06 - A New Chapter for OSS-Fuzz
([Security](https://security.googleblog.com/2018/11/a-new-chapter-for-oss-fuzz.html))
* 2020-10-09 - [Fuzzing internships for Open Source Software](https://security.googleblog.com/2020/10/fuzzing-internships-for-open-source.html)
* 2020-12-07 - [Improving open source security during the Google summer internship program](https://security.googleblog.com/2020/12/improving-open-source-security-during.html)
* 2021-03-10 - [Fuzzing Java in OSS-Fuzz](https://security.googleblog.com/2021/03/fuzzing-java-in-oss-fuzz.html)
* 2021-12-16 - [Improving OSS-Fuzz and Jazzer to catch Log4Shell](https://security.googleblog.com/2021/12/improving-oss-fuzz-and-jazzer-to-catch.html)
* 2022-09-08 - [Fuzzing beyond memory corruption: Finding broader classes of vulnerabilities automatically](https://security.googleblog.com/2022/09/fuzzing-beyond-memory-corruption.html)
* 2023-02-01 - [Taking the next step: OSS-Fuzz in 2023](https://security.googleblog.com/2023/02/taking-next-step-oss-fuzz-in-2023.html)
## Tutorials
* [libFuzzer documentation](https://llvm.org/docs/LibFuzzer.html)
* [libFuzzer tutorial](https://github.com/google/fuzzing/blob/master/tutorial/libFuzzerTutorial.md)
* [libFuzzer workshop](https://github.com/Dor1s/libfuzzer-workshop)
* [Structure-Aware Fuzzing with libFuzzer](https://github.com/google/fuzzer-test-suite/blob/master/tutorial/structure-aware-fuzzing.md)
* [Chromium Fuzzing Page](https://chromium.googlesource.com/chromium/src/testing/libfuzzer/)
* [Chromium Efficient Fuzzing Guide](https://chromium.googlesource.com/chromium/src/testing/libfuzzer/+/HEAD/efficient_fuzzing.md)
* [ClusterFuzz documentation](https://google.github.io/clusterfuzz/)
================================================
FILE: docs/reproducing.md
================================================
This page has moved [here](https://google.github.io/oss-fuzz/advanced-topics/reproducing)
================================================
FILE: docs/research/target_generation.md
================================================
---
layout: default
nav_exclude: true
permalink: /research/llms/target_generation/
---
# Fuzz target generation using LLMs
[Read our announcement blog.](https://security.googleblog.com/2023/08/ai-powered-fuzzing-breaking-bug-hunting.html)
# Background
[OSS-Fuzz](http://github.com/google/oss-fuzz) performs continuous fuzzing of [1000+](https://github.com/google/oss-fuzz/tree/master/projects) open source projects across most major languages. To integrate a new project, a human typically analyzes the attack surface of a library and writes [fuzz targets](https://github.com/google/fuzzing/blob/master/docs/glossary.md#fuzz-target) (also called fuzzing harnesses) to exercise the relevant code. Linked with a [fuzzing engine](https://github.com/google/fuzzing/blob/master/docs/glossary.md#fuzzing-engine) (e.g. libFuzzer, AFL, Centipede), this enables coverage-guided fuzzing for all OSS-Fuzz projects. Depending on the complexity of the project, writing fuzz targets typically requires several hours of manual work and sufficient background knowledge of the project.
Additionally, the main challenge for most integrated OSS-Fuzz projects is ensuring high code coverage. Most OSS-Fuzz projects have fairly low runtime coverage ([~30%](http://introspector.oss-fuzz.com/)) despite millions of hours of CPU time. This means we are not finding vulnerabilities in approximately 70% of each project that we’re fuzzing. Our [preliminary research](https://dl.acm.org/doi/abs/10.1145/3605157.3605177) found that many [fuzz blockers](https://github.com/ossf/fuzz-introspector/blob/main/doc/Glossary.md#fuzz-blockers) (as determined by [FuzzIntrospector](https://introspector.oss-fuzz.com/)) are caused by deficiencies in existing targets, rather than deficiencies in fuzzing engines.
Generating fuzz targets via LLMs can reduce the manual effort required both to fuzz existing OSS-Fuzz projects more thoroughly and to integrate new projects into OSS-Fuzz.
# Goals
Our ideal end state of this research is to use LLMs for two use cases:
1. Completely automatic fuzz target generation (or modification of existing targets) for existing OSS-Fuzz projects to unblock fuzz blockers and increase project code coverage (and bugs found) for free.
2. Completely automatic fuzz target generation for completely new OSS-Fuzz projects. This is much more challenging than 1, and is an extension of it.
Our current experiments focus on the first use case for C/C++ projects. This report serves as a preliminary investigation into how effective LLMs are for this use case. More detailed results and the experimentation framework for our research will be published at a later date.
# Experiment framework
To discover whether an LLM could successfully write new fuzz targets, we built an evaluation framework that connects OSS-Fuzz to Google’s LLMs, conducts the experiment, and evaluates the results. The steps look like this:

1. OSS-Fuzz’s [Fuzz Introspector tool](http://introspector.oss-fuzz.com/) identifies an under-fuzzed, high-potential portion of the target project’s code and passes the code to the evaluation framework.
2. The evaluation framework creates a prompt that the LLM will use to write the new fuzz target. The prompt includes project specific information.
3. The evaluation framework takes the fuzz target generated by the LLM and runs the new target.
4. The evaluation framework observes the run for any change in code coverage or crashes.
5. In the event that the fuzz target fails to compile, the evaluation framework prompts the LLM to write a revised fuzz target that addresses the compilation errors.
## 1. Identifying high potential portions of the project’s code
We leverage [Fuzz Introspector](https://introspector.oss-fuzz.com/) ([example JSON endpoint](https://introspector.oss-fuzz.com/api/far-reach-but-low-coverage?project=tinyxml2)) to provide us with a list of functions with low runtime coverage (but high potential to reach more code coverage). These are turned into benchmark YAML files, which consist of an OSS-Fuzz project, and a list of function signatures to generate new targets for.
We have started with a small set of benchmarks, and will gradually scale this to larger, automated sets of benchmarks taken from all of OSS-Fuzz as we improve the function selection and prompt generation process.
Example benchmark (YAML):
```yaml
functions:
- XML_Parser XMLCALL XML_ExternalEntityParserCreate(XML_Parser oldParser, const XML_Char
  *context, const XML_Char *encodingName)
- XML_Parser XMLCALL XML_ParserCreateNS(const XML_Char *encodingName, XML_Char nsSep)
- XML_Bool XMLCALL XML_ParserReset(XML_Parser parser, const XML_Char *encodingName)
- static enum XML_Error PTRCALL externalParEntInitProcessor(XML_Parser parser, const
  char *s, const char *end, const char **nextPtr)
project: expat
target_path: /src/expat/expat/fuzz/xml_parse_fuzzer.c
target_name: xml_parse_fuzzer_UTF-8
```
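In code, a benchmark like this could be modeled as a small record with one generation task per function signature. The sketch below is an assumption about how such a structure might look, not the framework's actual data model; the class and method names are hypothetical, and only the four field names come from the example above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Benchmark:
    project: str
    target_path: str
    target_name: str
    functions: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # A benchmark without signatures gives the LLM nothing to target.
        if not self.functions:
            raise ValueError("benchmark needs at least one function signature")

    def tasks(self) -> Iterator[Tuple[str, str]]:
        """Yield one (project, function signature) pair per target to generate."""
        for signature in self.functions:
            yield (self.project, signature)
```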
## 2. Prompt generation
We dynamically generate a prompt based on a template ([example](https://storage.googleapis.com/oss-fuzz-llm-targets-public/jsoncpp-json-value-removeindex/prompts.txt)).
As part of our experimentation, we tried various prompt approaches. So far, the best results have come from including:
* One example of an existing function signature and fuzz target from the project under test, formatted into problem and solution structure. Too many examples yield worse results.
* Two examples from other projects in OSS-Fuzz, formatted in the same way.
* Examples of how to leverage [FuzzedDataProvider](https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider) to generate inputs for function arguments.
* A priming that gives the task context.
* Examples of code anti-patterns to avoid.
The dynamically generated sections today include examples of existing fuzz targets from both other projects on OSS-Fuzz as well as one example from the project under test. We have other unexplored ideas including more structured information about the function under test, such as:
* Relevant data structure definitions
* Function implementations of the function under test and related functions
* Usages of the function under test and related functions
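
A minimal sketch of how such a prompt could be assembled is shown below. The `Problem:` and `Solution:` labels, the `Example` type, and the function name are illustrative assumptions; the linked example prompt shows the template that was actually used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    signature: str  # the "problem": a function signature
    target: str     # the "solution": an existing fuzz target for it


def build_prompt(priming: str, examples: List[Example], signature: str) -> str:
    """Join priming text, worked examples, and the new problem into one prompt."""
    parts = [priming.strip()]
    for ex in examples:
        parts.append(f"Problem:\n{ex.signature}\n\nSolution:\n{ex.target}")
    # The final problem is left unsolved for the LLM to complete.
    parts.append(f"Problem:\n{signature}\n\nSolution:")
    return "\n\n".join(parts)
```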
## 3. Build and run
We leverage the OSS-Fuzz build infrastructure to build new targets by replacing an existing target’s source code with the newly generated target source code.
OSS-Fuzz projects often have strict compiler flags on by default. To make compilation easier, we also implemented a [compiler wrapper](https://github.com/google/oss-fuzz/blob/d78275b4e2e17d3d8f12b99db2b51de4b114edb3/infra/base-images/base-builder/jcc.go) that:
* Turns off compiler warnings to prevent trivial issues such as missing pointer casts from blocking compilation.
* Re-compiles targets as C++ (to leverage [FuzzedDataProvider](https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider)).
## 4. Measuring quality of generated targets
An important part of our research is to define metrics to measure the quality of generated targets.
These metrics are:
* Syntax correctness and project consistency. This is measured by its compilation result: for example, whether it compiles successfully and whether it calls functions in the project correctly without hallucination.
* Whether it crashes instantly or within the fuzz target. This often means that there is some miscalled API and the crashes are likely to be false positives.
* **New code coverage**. This is measured by the new lines it covered compared to all existing targets in OSS-Fuzz for the same project.
All of these metrics can be automatically computed for a given generated target.
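
The new-coverage metric amounts to a set difference. The sketch below assumes coverage is available as sets of `(source file, line number)` pairs, which is a simplification made for illustration; the function name is hypothetical.

```python
from typing import Iterable, Set, Tuple

Line = Tuple[str, int]  # (source file, line number)


def new_line_coverage(existing: Iterable[Set[Line]], generated: Set[Line]) -> Set[Line]:
    """Lines covered by the generated target but by none of the existing targets."""
    covered_by_existing = set().union(*existing)
    return generated - covered_by_existing
```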
## 5. LLM Code Fixer
The fuzz targets generated by the LLM often contain various trivial defects, which can be fixed by a separate LLM query.
The prompt of the code fixing query is structured as follows, where the raw code and error are respectively replaced with the fuzz target source code generated by the LLM and the build error messages extracted from pages of build logs:
````
Given the following code and its build error message, fix the code without affecting its functionality.
First explain the reason, then output the whole fixed code.
If a function is missing, fix it by including the related libraries.
Code:
```
{raw_code}
```
Build error message:
```
{error}
```
Fixed code:
````
Several rounds of code fixing queries are required for some cases. For example, when multiple defects produce several error messages, the LLM sometimes fixes only one of them at a time. Similarly, new defects may be introduced during code fixing. In these cases, we found that iteratively querying the LLM with the same prompt structure gradually fixes all errors.
The LLM often proposes several responses for each query; we prefer the one with the longest code. This is an implementation decision to avoid a quadratically increasing number of targets to build (e.g. the LLMs could propose 4 new targets across N iterations) and to avoid the LLM deleting the function code to fix build failures.
We also check that the generated target includes a call to the function it was asked to test. If it does not, this is surfaced as an error to the LLM.
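
The loop described in this section could be sketched roughly as follows. The `build` and `query_llm` callables, the round limit, and the substring check for the function call are all assumptions made for illustration; only the prompt wording and the longest-response rule come from the text above.

```python
FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown

FIX_PROMPT = (
    "Given the following code and its build error message, "
    "fix the code without affecting its functionality.\n"
    "First explain the reason, then output the whole fixed code.\n"
    "If a function is missing, fix it by including the related libraries.\n"
    "Code:\n{fence}\n{raw_code}\n{fence}\n"
    "Build error message:\n{fence}\n{error}\n{fence}\n"
    "Fixed code:\n"
)


def fix_target(code, function_name, build, query_llm, max_rounds=5):
    """Return code that builds and calls function_name, or None if fixing fails.

    build(code) returns an error string, or None on success (hypothetical).
    query_llm(prompt) returns a list of candidate code strings (hypothetical).
    """
    for round_index in range(max_rounds + 1):
        error = build(code)
        if error is None and function_name not in code:
            # Crude substring check: a missing call is surfaced as an error too.
            error = f"The code does not call {function_name}."
        if error is None:
            return code
        if round_index == max_rounds:
            break
        prompt = FIX_PROMPT.format(fence=FENCE, raw_code=code, error=error)
        candidates = query_llm(prompt)
        if not candidates:
            return None
        # Prefer the response with the longest code.
        code = max(candidates, key=len)
    return None
```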
### Example
[Prompt](https://storage.googleapis.com/oss-fuzz-llm-targets-public/tinyxml2-tinyxml2-xmldocument-print/fixes/04-F4/prompt.txt): Incorrect target with missing arguments passed to target function.
[After fix](https://storage.googleapis.com/oss-fuzz-llm-targets-public/tinyxml2-tinyxml2-xmldocument-print/fixes/04-F4/01.cpp): Correct function argument added.
# Results
Initially, getting any compilable output was a challenge. We were able to improve this via prompt engineering and our compiler wrapper to having [14/31 tested OSS-Fuzz projects](https://storage.googleapis.com/oss-fuzz-llm-targets-public/index.html?prefix=benchmarks/) successfully compile new targets and increase coverage. The successful examples and prompts are published [here](https://storage.googleapis.com/oss-fuzz-llm-targets-public/index.html).
Coverage improvements range widely, from 0% to 31%.
The top coverage increases, aggregated across all benchmarks per OSS-Fuzz project, are:

| Project | Coverage increase |
| ------- | ----------------- |
| tinyxml2 | 31% |
| cjson | 6% |
| expat | 4% |
| libplist | 4% |
| libxml2 | 1% |
| elfutils | 1% |
The best result we’ve had is with the TinyXML2 project, where we managed to increase fuzz coverage from [38%](https://storage.googleapis.com/oss-fuzz-coverage/tinyxml2/reports/20230801/linux/report.html) line coverage to [69%](https://storage.googleapis.com/oss-fuzz-coverage/llm-results/tinyxml2/linux/report.html) line coverage without any interventions.

Additionally, we targeted OpenSSL from the perspective of discovering past vulnerabilities that were not found due to lack of fuzzing coverage. We were able to replicate [a similar fuzz target](https://storage.googleapis.com/oss-fuzz-llm-targets-public/openssl-ossl_punycode_decode/targets/15.c) that rediscovered [CVE-2022-3602](https://nvd.nist.gov/vuln/detail/CVE-2022-3602).

# Future work
We’ve seen very promising early results in this space and will continue our research.
## 1. Continue research
There are a number of areas we’d like to research further:
* Expand the set of benchmarks to cover all of OSS-Fuzz.
* Continued prompt engineering and experimentation with project-specific context, such as more structured context (e.g. structure definitions, implementations) around the relevant function to test.
* Model fine-tuning
* Expand to other languages beyond C/C++
* Expand research to completely new projects with no existing OSS-Fuzz integration.
## 2. Open source evaluation framework
We plan to open source the evaluation framework we’ve built to help test arbitrary auto-fuzz target generation capabilities. We hope that OSS-Fuzz can serve as a valuable benchmarking platform for researchers in this space.
## 3. OSS-Fuzz integration
Ultimately, the goal is to integrate the results of this research into OSS-Fuzz, to provide:
* Free coverage increases for existing projects
* Automated onboarding of new projects, and tools to help maintainers write manual fuzz targets.
# Appendix
## Successful benchmark results
================================================
FILE: infra/.dockerignore
================================================
cifuzz/test_data/*
# Copied from .gitignore.
.vscode/
*.pyc
build
*~
.DS_Store
*.swp
================================================
FILE: infra/MAINTAINERS.csv
================================================
Name,Email,Github Username
Adam Korcz,adam@adalogics.com,AdamKorcz
David Korczynski,david@adalogics.com,DavidKorczynski
Dongge Liu,donggeliu@google.com,Alan32Liu
Holly Gong,gongh@google.com,hogo6002
Jonathan Metzman,metzman@google.com,jonathanmetzman
Oliver Chang,ochang@google.com,oliverchang
================================================
FILE: infra/README.md
================================================
# infra
> OSS-Fuzz project infrastructure
Core infrastructure:
* [`base-images`](base-images/) - docker images for building fuzz targets & corresponding jenkins
pipeline.
Continuous Integration infrastructure:
* [`ci`](ci/) - script to build projects in CI.
## helper.py
> script to automate common docker operations
| Command | Description |
|---------|-------------
| `generate` | Generates skeleton files for a new project |
| `build_image` | Builds a docker image for a given project |
| `build_fuzzers` | Builds fuzz targets for a given project |
| `run_fuzzer` | Runs a fuzz target in a docker container |
| `coverage` | Runs fuzz target(s) in a docker container and generates a code coverage report. See [Code Coverage doc](https://google.github.io/oss-fuzz/advanced-topics/code-coverage/) |
| `reproduce` | Runs a testcase to reproduce a crash |
| `shell` | Starts a shell inside the docker image for a project |
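
As a minimal sketch, the commands in the table could be chained from Python to build and run a project locally. The project name `example_project` and target name `example_fuzzer` are hypothetical, and the sketch assumes the common `helper.py <command> <project>` argument order; some commands may need extra flags depending on the helper version.

```python
import sys

HELPER = "infra/helper.py"  # path relative to the repository root


def helper_command(command, project, *extra):
    """Return the argv list for one helper.py invocation."""
    return [sys.executable, HELPER, command, project, *extra]


def local_build_steps(project, fuzz_target):
    """Commands to build the image, build the fuzzers, and run one target."""
    return [
        helper_command("build_image", project),
        helper_command("build_fuzzers", project),
        helper_command("run_fuzzer", project, fuzz_target),
    ]


if __name__ == "__main__":
    # Hypothetical project and target names, for illustration only.
    for argv in local_build_steps("example_project", "example_fuzzer"):
        # Pass argv to subprocess.run(argv, check=True) to actually execute.
        print(" ".join(argv))
```

Building the argument lists separately from running them keeps the sketch easy to inspect before any Docker operation is started.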
================================================
FILE: infra/base-images/README.md
================================================
# Base Images
This directory contains the base images used by OSS-Fuzz.
## Building
To build all images, run:
```bash
# run from project root
infra/base-images/all.sh
```
## Trial Builds for Testing Changes
When making changes to any of the base images, it's crucial to test them using the trial build system. This system is designed to build test versions of the images, identified by a `-testing` suffix, and use them to build a subset of OSS-Fuzz projects to ensure the changes don't cause regressions.
### Architecture Overview
The trial build system now supports building multiple Ubuntu base variants in parallel to accelerate testing. The supported variants are:
- `latest` (based on the default `Dockerfile`)
- `ubuntu-20-04` (based on `ubuntu-20-04.Dockerfile`)
- `ubuntu-24-04` (based on `ubuntu-24-04.Dockerfile`)
When a trial build is triggered on a Pull Request that modifies files in `infra/base-images/`, the system initiates three separate, parallel builds in Google Cloud Build (GCB). Each of these builds is responsible for building all base images for a single Ubuntu variant. This parallel architecture allows for faster feedback on changes across different base OS versions.
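
The naming rule behind the three variants can be sketched as follows. This is an illustration only: `dockerfile_for` and `tag_for` are hypothetical helpers, and the tag format is assumed to follow the local `all.sh` script rather than the `-testing` tags produced by trial builds.

```python
VARIANTS = ("latest", "ubuntu-20-04", "ubuntu-24-04")


def dockerfile_for(image_name, variant="latest"):
    """Return the Dockerfile path used for an image in a given variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    # 'latest' uses the default Dockerfile; other variants use a prefixed one.
    filename = "Dockerfile" if variant == "latest" else f"{variant}.Dockerfile"
    return f"infra/base-images/{image_name}/{filename}"


def tag_for(image_name, variant="latest"):
    """Return the image tag, assuming the convention used by all.sh."""
    base = f"gcr.io/oss-fuzz-base/{image_name}"
    return base if variant == "latest" else f"{base}:{variant}"


if __name__ == "__main__":
    for variant in VARIANTS:
        print(tag_for("base-builder", variant), dockerfile_for("base-builder", variant))
```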
### How to Trigger a Trial Build
1. Create a Pull Request with your changes to the base images.
2. Once the PR is open, add a comment with the following command:
```
/gcbrun trial_build.py all
```
3. This command will be picked up by our CI system. It will first trigger a "coordinator" build, which then spawns the three parallel builds for the different Ubuntu variants. You can monitor the progress of these builds directly in the Google Cloud Build interface linked in your PR.
## Dependency Tree
The following diagram shows the dependency tree of the base images.
```mermaid
graph TD
A[base-image] --> B(base-clang);
B --> C(base-builder);
C --> D(base-builder-go);
C --> E(base-builder-javascript);
C --> F(base-builder-jvm);
C --> G(base-builder-python);
C --> H(base-builder-ruby);
C --> I(base-builder-rust);
C --> J(base-builder-swift);
C --> K(base-builder-fuzzbench);
A --> L(base-runner);
B --> L;
C --> L;
H --> L;
L --> M(base-runner-debug);
```
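
One way to read the diagram is as a build order: an image can only be built after every image it depends on. The sketch below derives such an order with a topological sort. The edges are copied from the diagram; `build_order` is a hypothetical helper and is not how the repository's own build scripts determine the order. It requires Python 3.9 or newer for `graphlib`.

```python
from graphlib import TopologicalSorter

# Maps each image to the set of images it depends on (edges from the diagram).
DEPENDS_ON = {
    "base-image": set(),
    "base-clang": {"base-image"},
    "base-builder": {"base-clang"},
    "base-builder-go": {"base-builder"},
    "base-builder-javascript": {"base-builder"},
    "base-builder-jvm": {"base-builder"},
    "base-builder-python": {"base-builder"},
    "base-builder-ruby": {"base-builder"},
    "base-builder-rust": {"base-builder"},
    "base-builder-swift": {"base-builder"},
    "base-builder-fuzzbench": {"base-builder"},
    "base-runner": {"base-image", "base-clang", "base-builder", "base-builder-ruby"},
    "base-runner-debug": {"base-runner"},
}


def build_order():
    """Return one valid order in which every dependency is built first."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


if __name__ == "__main__":
    for image in build_order():
        print(image)
```

Several orders are valid, since the language-specific builders are independent of each other; only the relative position of an image and its dependencies is guaranteed.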
================================================
FILE: infra/base-images/all.sh
================================================
#!/bin/bash -eux
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
#
# A script to build base images locally.
#
# This script is a wrapper around `docker build` that dynamically fetches the
# official list of images from the Python source of truth, ensuring it never
# goes out of date.
#
# Usage:
# # Build the 'latest' version of all images.
# ./all.sh
#
# # Build the 'ubuntu-24-04' version of all images.
# ./all.sh ubuntu-24-04
#
################################################################################
# The first argument is the version tag, e.g., 'latest', 'ubuntu-20-04'.
VERSION_TAG=${1:-latest}
# Get the directory where this script is located to find the helper script.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
# Fetch the official list of images from the Python source of truth.
# This avoids duplicating the image list and ensures this script is always
# up-to-date.
IMAGE_LIST=$(python3 "${SCRIPT_DIR}/list_images.py")
echo "Building version: ${VERSION_TAG}"
echo "Images to build: ${IMAGE_LIST}"
# Loop through the official list of images and build each one.
for image_name in ${IMAGE_LIST}; do
image_dir="infra/base-images/${image_name}"
if [ "${VERSION_TAG}" == "latest" ]; then
dockerfile="${image_dir}/Dockerfile"
tag="gcr.io/oss-fuzz-base/${image_name}"
else
dockerfile="${image_dir}/${VERSION_TAG}.Dockerfile"
tag="gcr.io/oss-fuzz-base/${image_name}:${VERSION_TAG}"
fi
if [ ! -f "${dockerfile}" ]; then
echo "Skipping build for ${image_name}:${VERSION_TAG} - Dockerfile not found at ${dockerfile}"
continue
fi
echo "Building ${tag} from ${dockerfile}..."
docker build -t "${tag}" -f "${dockerfile}" "${image_dir}"
done
echo "All builds for version ${VERSION_TAG} completed successfully."
================================================
FILE: infra/base-images/base-builder/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-builder
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-builder` were successfully built. These images contain the necessary tools and libraries for building fuzzers. The `ubuntu-24-04` build required a fix to the `ADD` instruction in the Dockerfile to correctly handle multiple files. Both versions install a variety of tools, including Python, Bazel, and various compilers.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-builder:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-builder:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
The `ubuntu-24-04` image includes newer versions of many packages, including Python 3.11.13. The specific versions of other tools and libraries also differ due to the updated base image.
## Dockerfile Analysis
The Dockerfiles for both versions have several key differences:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding `oss-fuzz/base-clang` tag (`ubuntu-20-04` or `ubuntu-24-04`).
* **Package Installation:** The `install_deps.sh` script is used to install a base set of dependencies, which differ between the two versions.
* **Python Installation:** The `ubuntu-24-04` Dockerfile installs Python 3.11.13 from source, while the `ubuntu-20-04` version uses a different set of commands.
* **ADD Instruction:** The `ADD` instruction in the `ubuntu-24-04` Dockerfile was corrected to use the proper syntax for adding multiple files.
================================================
FILE: infra/base-images/base-builder/Dockerfile
================================================
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-clang
COPY install_deps.sh /
RUN /install_deps.sh && rm /install_deps.sh
# Build and install latest Python 3.11.
ENV PYTHON_VERSION 3.11.13
RUN PYTHON_DEPS="\
zlib1g-dev \
libncurses5-dev \
libgdbm-dev \
libnss3-dev \
libssl-dev \
libsqlite3-dev \
libreadline-dev \
libffi-dev \
libbz2-dev \
liblzma-dev" && \
unset CFLAGS CXXFLAGS && \
apt-get install -y $PYTHON_DEPS && \
cd /tmp && \
curl -O https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tar.xz && \
tar -xvf Python-$PYTHON_VERSION.tar.xz && \
cd Python-$PYTHON_VERSION && \
./configure --enable-optimizations --enable-shared && \
make -j$(nproc) && \
make install && \
ldconfig && \
ln -s /usr/local/bin/python3 /usr/local/bin/python && \
cd .. && \
rm -r /tmp/Python-$PYTHON_VERSION.tar.xz /tmp/Python-$PYTHON_VERSION && \
rm -rf /usr/local/lib/python${PYTHON_VERSION%.*}/test && \
python3 -m ensurepip && \
python3 -m pip install --upgrade pip && \
apt-get remove -y $PYTHON_DEPS # https://github.com/google/oss-fuzz/issues/3888
ENV CCACHE_VERSION 4.10.2
RUN cd /tmp && curl -OL https://github.com/ccache/ccache/releases/download/v$CCACHE_VERSION/ccache-$CCACHE_VERSION.tar.xz && \
tar -xvf ccache-$CCACHE_VERSION.tar.xz && cd ccache-$CCACHE_VERSION && \
mkdir build && cd build && \
export LDFLAGS='-lpthread' && \
cmake -D CMAKE_BUILD_TYPE=Release .. && \
make -j && make install && \
rm -rf /tmp/ccache-$CCACHE_VERSION /tmp/ccache-$CCACHE_VERSION.tar.xz
# Install six for Bazel rules, plus absl-py and pyelftools.
RUN unset CFLAGS CXXFLAGS && pip3 install -v --no-cache-dir \
six==1.15.0 absl-py==2.3.0 pyelftools==0.32 && rm -rf /tmp/*
# Install Bazel through Bazelisk, which automatically fetches the latest Bazel version.
ENV BAZELISK_VERSION 1.9.0
RUN curl -L https://github.com/bazelbuild/bazelisk/releases/download/v$BAZELISK_VERSION/bazelisk-linux-amd64 -o /usr/local/bin/bazel && \
chmod +x /usr/local/bin/bazel
# Default build flags for various sanitizers.
ENV SANITIZER_FLAGS_address "-fsanitize=address -fsanitize-address-use-after-scope"
ENV SANITIZER_FLAGS_hwaddress "-fsanitize=hwaddress -fuse-ld=lld -Wno-unused-command-line-argument"
# Set of '-fsanitize' flags matches '-fno-sanitize-recover' + 'unsigned-integer-overflow'.
ENV SANITIZER_FLAGS_undefined "-fsanitize=array-bounds,bool,builtin,enum,function,integer-divide-by-zero,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unsigned-integer-overflow,unreachable,vla-bound,vptr -fno-sanitize-recover=array-bounds,bool,builtin,enum,function,integer-divide-by-zero,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unreachable,vla-bound,vptr"
# Don't include "function" since it is unsupported on aarch64.
ENV SANITIZER_FLAGS_undefined_aarch64 "-fsanitize=array-bounds,bool,builtin,enum,integer-divide-by-zero,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unsigned-integer-overflow,unreachable,vla-bound,vptr -fno-sanitize-recover=array-bounds,bool,builtin,enum,integer-divide-by-zero,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unreachable,vla-bound,vptr"
ENV SANITIZER_FLAGS_memory "-fsanitize=memory -fsanitize-memory-track-origins"
ENV SANITIZER_FLAGS_thread "-fsanitize=thread"
ENV SANITIZER_FLAGS_introspector "-O0 -flto -fno-inline-functions -fuse-ld=gold -Wno-unused-command-line-argument"
# Do not use any sanitizers in the coverage build.
ENV SANITIZER_FLAGS_coverage ""
# We use unsigned-integer-overflow as an additional coverage signal and have to
# suppress error messages. See https://github.com/google/oss-fuzz/issues/910.
ENV UBSAN_OPTIONS="silence_unsigned_overflow=1"
# To suppress warnings from binaries running during compilation.
ENV DFSAN_OPTIONS='warn_unimplemented=0'
# Default build flags for coverage feedback.
ENV COVERAGE_FLAGS="-fsanitize=fuzzer-no-link"
# Use '-Wno-unused-command-line-argument' to suppress "warning: -ldl: 'linker' input unused"
# messages which are treated as errors by some projects.
ENV COVERAGE_FLAGS_coverage "-fprofile-instr-generate -fcoverage-mapping -pthread -Wl,--no-as-needed -Wl,-ldl -Wl,-lm -Wno-unused-command-line-argument"
# Default sanitizer, fuzzing engine and architecture to use.
ENV SANITIZER="address"
ENV FUZZING_ENGINE="libfuzzer"
ENV ARCHITECTURE="x86_64"
# DEPRECATED - NEW CODE SHOULD NOT USE THIS. OLD CODE SHOULD STOP. Please use
# LIB_FUZZING_ENGINE instead.
# Path to fuzzing engine library to support some old users of
# LIB_FUZZING_ENGINE.
ENV LIB_FUZZING_ENGINE_DEPRECATED="/usr/lib/libFuzzingEngine.a"
# Argument passed to compiler to link against fuzzing engine.
# Defaults to the path, but is "-fsanitize=fuzzer" in libFuzzer builds.
ENV LIB_FUZZING_ENGINE="/usr/lib/libFuzzingEngine.a"
# TODO: remove after tpm2 catchup.
ENV FUZZER_LDFLAGS ""
WORKDIR $SRC
COPY afl_llvm22_patch.diff $SRC/
RUN git clone https://github.com/AFLplusplus/AFLplusplus.git aflplusplus && \
cd aflplusplus && \
git checkout eadc8a2a7e0fa0338802ee6254bf296489ce4fd7 && \
wget --no-check-certificate -O oss.sh https://raw.githubusercontent.com/vanhauser-thc/binary_blobs/master/oss.sh && \
git apply $SRC/afl_llvm22_patch.diff && \
rm -rf .git && \
chmod 755 oss.sh
# Do precompiles before copying other scripts for better cache efficiency.
COPY precompile_afl /usr/local/bin/
RUN precompile_afl
RUN cd $SRC && \
curl -L -O https://github.com/google/honggfuzz/archive/oss-fuzz.tar.gz && \
mkdir honggfuzz && \
cd honggfuzz && \
tar -xz --strip-components=1 -f $SRC/oss-fuzz.tar.gz && \
rm -rf examples $SRC/oss-fuzz.tar.gz
COPY precompile_honggfuzz /usr/local/bin/
RUN precompile_honggfuzz
RUN cd $SRC && \
git clone https://github.com/google/fuzztest && \
cd fuzztest && \
git checkout a37d133f714395cabc20dd930969a889495c9f53 && \
rm -rf .git
ENV CENTIPEDE_BIN_DIR=$SRC/fuzztest/bazel-bin
COPY precompile_centipede /usr/local/bin/
RUN precompile_centipede
COPY sanitizers /usr/local/lib/sanitizers
COPY bazel_build_fuzz_tests \
cargo \
compile \
compile_afl \
compile_centipede \
compile_honggfuzz \
compile_fuzztests.sh \
compile_go_fuzzer \
compile_javascript_fuzzer \
compile_libfuzzer \
compile_native_go_fuzzer \
compile_native_go_fuzzer_v2 \
go_utils.sh \
compile_python_fuzzer \
debug_afl \
# Go, JavaScript, Java, Python, Ruby, Rust, and Swift installation scripts.
install_go.sh \
install_javascript.sh \
install_java.sh \
install_python.sh \
install_ruby.sh \
install_rust.sh \
install_swift.sh \
make_build_replayable.py \
python_coverage_helper.py \
replay_build.sh \
srcmap \
write_labels.py \
unshallow_repos.py \
/usr/local/bin/
# TODO: Build this as part of a multi-stage build.
ADD https://commondatastorage.googleapis.com/clusterfuzz-builds/jcc/clang-jcc /usr/local/bin/
ADD https://commondatastorage.googleapis.com/clusterfuzz-builds/jcc/clang++-jcc /usr/local/bin
ADD https://commondatastorage.googleapis.com/clusterfuzz-builds/jcc/clang-jcc2 /usr/local/bin/
ADD https://commondatastorage.googleapis.com/clusterfuzz-builds/jcc/clang++-jcc2 /usr/local/bin
RUN chmod +x /usr/local/bin/clang-jcc /usr/local/bin/clang++-jcc /usr/local/bin/clang-jcc2 /usr/local/bin/clang++-jcc2
COPY llvmsymbol.diff $SRC
COPY detect_repo.py /opt/cifuzz/
COPY bazel.bazelrc /root/.bazelrc
# Set up ccache binary and cache directory.
# /ccache/bin will contain the compiler wrappers, and /ccache/cache will
# contain the actual cache, which can be saved.
# To use this, set PATH=/ccache/bin:$PATH.
RUN mkdir -p /ccache/bin && mkdir -p /ccache/cache && \
ln -s /usr/local/bin/ccache /ccache/bin/clang && \
ln -s /usr/local/bin/ccache /ccache/bin/clang++ && \
ln -s /usr/local/bin/ccache /ccache/bin/clang-jcc && \
ln -s /usr/local/bin/ccache /ccache/bin/clang++-jcc
ENV CCACHE_DIR /ccache/cache
# Don't check that the compiler is the same, so we can switch between jcc and
# clang under the hood and re-use the same build cache.
ENV CCACHE_COMPILERCHECK none
ENV CCACHE_COMPILERTYPE clang
# Build newer patchelf than the one available from Ubuntu.
RUN cd /tmp && git clone https://github.com/NixOS/patchelf && \
apt-get update && apt-get install -y autoconf && \
cd patchelf && git checkout 523f401584d9584e76c9c77004e7abeb9e6c4551 && \
unset CFLAGS && export CXXFLAGS='-stdlib=libc++' && export LDFLAGS='-lpthread' && \
./bootstrap.sh && ./configure && make && \
cp /tmp/patchelf/src/patchelf /usr/local/bin && \
rm -rf /tmp/patchelf && apt-get remove -y autoconf
COPY indexer /opt/indexer
COPY --from=gcr.io/oss-fuzz-base/indexer /indexer/build/indexer /opt/indexer/indexer
RUN chmod a+x /opt/indexer/indexer /opt/indexer/index_build.py
CMD ["compile"]
================================================
FILE: infra/base-images/base-builder/README.md
================================================
# base-builder
> Abstract base image for project builders.
Every project image supports multiple commands that can be invoked through docker after the image is built:
docker run --rm -ti gcr.io/oss-fuzz/$project <command> <arguments...>
# Supported Commands
| Command | Description |
|---------|-------------|
| `compile` (default) | build all fuzz targets
| `/bin/bash` | drop into shell, execute `compile` script to start build.
# Build Configuration
A single build image can build the same set of fuzzers in many configurations.
The configuration is picked through one or more environment variables.
| Env Variable | Description
| ------------- | --------
| `$SANITIZER ("address")` | Specifies the predefined sanitizer configuration to use: `address`, `memory`, or `undefined`.
| `$SANITIZER_FLAGS` | Specify compiler sanitizer flags directly. Overrides `$SANITIZER`.
| `$COVERAGE_FLAGS` | Specify compiler flags to use for fuzzer feedback coverage.
| `$BUILD_UID` | User id to use while building fuzzers.
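
The precedence between `$SANITIZER` and `$SANITIZER_FLAGS` might look like the following sketch. It assumes that presets are exposed as `SANITIZER_FLAGS_<name>` environment variables, as the base-builder Dockerfile defines them; `resolve_sanitizer_flags` is a hypothetical helper and not part of the image.

```python
import os


def resolve_sanitizer_flags(env=None):
    """Return the sanitizer flags a build would use for the given environment."""
    env = os.environ if env is None else env

    # An explicit, non-empty $SANITIZER_FLAGS overrides the preset.
    explicit = env.get("SANITIZER_FLAGS")
    if explicit:
        return explicit

    # Otherwise fall back to the preset named by $SANITIZER (default: address).
    sanitizer = env.get("SANITIZER", "address")
    preset = f"SANITIZER_FLAGS_{sanitizer}"
    if preset not in env:
        raise KeyError(f"no preset flags for sanitizer {sanitizer!r}")
    return env[preset]


if __name__ == "__main__":
    # Illustrative environment; real preset values come from the image.
    demo_env = {
        "SANITIZER": "undefined",
        "SANITIZER_FLAGS_undefined": "-fsanitize=undefined",
    }
    print(resolve_sanitizer_flags(demo_env))
```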
## Examples
- *building sqlite3 fuzzer with UBSan (`SANITIZER=undefined`):*
docker run --rm -ti -e SANITIZER=undefined gcr.io/oss-fuzz/sqlite3
# Image Files Layout
| Location|Env| Description |
|---------| -------- | ---------- |
| `/out/` | `$OUT` | Directory to store build artifacts (fuzz targets, dictionaries, options files, seed corpus archives). |
| `/src/` | `$SRC` | Directory to checkout source files |
| `/work/`| `$WORK` | Directory for storing intermediate files |
| `/usr/lib/libFuzzingEngine.a` | `$LIB_FUZZING_ENGINE` | Location of prebuilt fuzzing engine library (e.g. libFuzzer) that needs to be linked with all fuzz targets.
While the file layout is fixed within a container, the environment variables are
provided so that build scripts can be written in a retargetable way.
## Compiler Flags
You *must* use special compiler flags to build your project and fuzz targets.
These flags are provided in the following environment variables:
| Env Variable | Description
| ------------- | --------
| `$CC` | The C compiler binary.
| `$CXX`, `$CCC` | The C++ compiler binary.
| `$CFLAGS` | C compiler flags.
| `$CXXFLAGS` | C++ compiler flags.
Most well-crafted build scripts will automatically use these variables. If not,
pass them manually to the build tool.
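
Passing the variables manually could look like the sketch below, which assembles a compile command for a single C++ fuzz target. The source and output file names are hypothetical, and `fuzz_target_compile_command` is an illustrative helper rather than part of the image. It assumes `$LIB_FUZZING_ENGINE` holds either a library path or a compiler flag.

```python
import os
import shlex


def fuzz_target_compile_command(source, output, env=None):
    """Build the argv for compiling one fuzz target with the provided flags."""
    env = os.environ if env is None else env
    return [
        env["CXX"],
        # Flags are whitespace-separated strings, so split them into arguments.
        *shlex.split(env.get("CXXFLAGS", "")),
        source,
        "-o",
        output,
        # Link the fuzzing engine last, after the target's own sources.
        *shlex.split(env["LIB_FUZZING_ENGINE"]),
    ]


if __name__ == "__main__":
    # Illustrative values; inside the image these come from the environment.
    demo_env = {
        "CXX": "clang++",
        "CXXFLAGS": "-O1 -g",
        "LIB_FUZZING_ENGINE": "/usr/lib/libFuzzingEngine.a",
    }
    print(shlex.join(fuzz_target_compile_command("fuzz.cc", "/out/fuzz", demo_env)))
```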
# Child Image Interface
## Sources
A child image has to check out all sources that it needs to compile fuzz targets into
the `$SRC` directory. When the image is executed, local checkouts can be mounted on top
of these directories using
`docker run -v $HOME/my_project:/src/my_project ...`.
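
A small sketch of how such a `docker run` invocation could be assembled, combining the optional sanitizer setting with a local source mount. `docker_run_command` and its parameters are hypothetical, and the directory name under `/src` is assumed to match the project name unless stated otherwise.

```python
def docker_run_command(project, local_checkout=None, sanitizer=None, src_name=None):
    """Return the argv for running a project image, optionally with a mount."""
    argv = ["docker", "run", "--rm", "-ti"]
    if sanitizer:
        # Select a predefined sanitizer configuration inside the container.
        argv += ["-e", f"SANITIZER={sanitizer}"]
    if local_checkout:
        # Mount the local checkout over the sources baked into the image.
        argv += ["-v", f"{local_checkout}:/src/{src_name or project}"]
    argv.append(f"gcr.io/oss-fuzz/{project}")
    return argv


if __name__ == "__main__":
    # Hypothetical project and path, for illustration only.
    print(" ".join(docker_run_command("my_project", "/home/me/my_project")))
```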
## Other Required Files
The following files have to be added by child images:
| File Location | Description |
| ------------- | ----------- |
| `$SRC/build.sh` | build script to build the project and its fuzz targets |
================================================
FILE: infra/base-images/base-builder/afl_llvm22_patch.diff
================================================
diff --git a/GNUmakefile.llvm b/GNUmakefile.llvm
index 2cde89d9..7cafd3f5 100644
--- a/GNUmakefile.llvm
+++ b/GNUmakefile.llvm
@@ -32,7 +32,7 @@ VERSION = $(shell grep '^ *$(HASH)define VERSION ' ./config.h | cut -d '"' -
SYS = $(shell uname -s)
-override LLVM_TOO_NEW_DEFAULT := 21
+override LLVM_TOO_NEW_DEFAULT := 22
override LLVM_TOO_OLD_DEFAULT := 14
ifeq "$(SYS)" "OpenBSD"
@@ -69,7 +69,7 @@ endif
LLVM_STDCXX := gnu++11
LLVM_LTO := 0
-LLVM_UNSUPPORTED := $(shell echo "$(LLVMVER)" | grep -E -q '^[0-2]\.|^3\.[0-7]\.|^2[2-9]\.' && echo 1 || echo 0)
+LLVM_UNSUPPORTED := $(shell echo "$(LLVMVER)" | grep -E -q '^[0-2]\.|^3\.[0-7]\.|^2[3-9]\.' && echo 1 || echo 0)
# Uncomment to see the values assigned above
# $(foreach var,_CLANG_VERSIONS_TO_TEST LLVM_CONFIG LLVMVER LLVM_MAJOR LLVM_MINOR LLVM_TOO_NEW LLVM_TOO_OLD LLVM_TOO_NEW_DEFAULT LLVM_TOO_OLD_DEFAULT LLVM_NEW_API LLVM_NEWER_API LLVM_13_OK LLVM_HAVE_LTO LLVM_BINDIR LLVM_LIBDIR LLVM_STDCXX LLVM_APPLE_XCODE LLVM_LTO LLVM_UNSUPPORTED,$(warning $(var) = $($(var))))
================================================
FILE: infra/base-images/base-builder/bazel.bazelrc
================================================
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Pass variables from environment.
build --action_env=FUZZ_INTROSPECTOR
build --action_env=FUZZINTRO_OUTDIR
================================================
FILE: infra/base-images/base-builder/bazel_build_fuzz_tests
================================================
#!/bin/bash -eu
#
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
: "${BAZEL_FUZZ_TEST_TAG:=fuzz-test}"
: "${BAZEL_FUZZ_TEST_EXCLUDE_TAG:=no-oss-fuzz}"
: "${BAZEL_PACKAGE_SUFFIX:=_oss_fuzz}"
: "${BAZEL_TOOL:=bazel}"
: "${BAZEL_EXTRA_BUILD_FLAGS:=}"
if [ "$FUZZING_LANGUAGE" = "jvm" ]; then
BAZEL_LANGUAGE=java
else
BAZEL_LANGUAGE=cc
fi
if [[ -z "${BAZEL_FUZZ_TEST_QUERY:-}" ]]; then
BAZEL_FUZZ_TEST_QUERY="
let all_fuzz_tests = attr(tags, \"${BAZEL_FUZZ_TEST_TAG}\", \"//...\") in
let lang_fuzz_tests = attr(generator_function, \"^${BAZEL_LANGUAGE}_fuzz_test\$\", \$all_fuzz_tests) in
\$lang_fuzz_tests - attr(tags, \"${BAZEL_FUZZ_TEST_EXCLUDE_TAG}\", \$lang_fuzz_tests)
"
fi
echo "Using Bazel query to find fuzz targets: ${BAZEL_FUZZ_TEST_QUERY}"
declare -r OSS_FUZZ_TESTS=(
$(bazel query "${BAZEL_FUZZ_TEST_QUERY}" | sed "s/$/${BAZEL_PACKAGE_SUFFIX}/")
)
echo "Found ${#OSS_FUZZ_TESTS[@]} fuzz test packages:"
for oss_fuzz_test in "${OSS_FUZZ_TESTS[@]}"; do
echo " ${oss_fuzz_test}"
done
declare -r BAZEL_BUILD_FLAGS=(
"--@rules_fuzzing//fuzzing:cc_engine=@rules_fuzzing_oss_fuzz//:oss_fuzz_engine" \
"--@rules_fuzzing//fuzzing:java_engine=@rules_fuzzing_oss_fuzz//:oss_fuzz_java_engine" \
"--@rules_fuzzing//fuzzing:cc_engine_instrumentation=oss-fuzz" \
"--@rules_fuzzing//fuzzing:cc_engine_sanitizer=none" \
"--cxxopt=-stdlib=libc++" \
"--linkopt=-lc++" \
"--verbose_failures" \
"--spawn_strategy=standalone" \
"--action_env=CC=${CC}" "--action_env=CXX=${CXX}" \
${BAZEL_EXTRA_BUILD_FLAGS[*]}
)
echo "Building the fuzz tests with the following Bazel options:"
echo " ${BAZEL_BUILD_FLAGS[@]}"
${BAZEL_TOOL} build "${BAZEL_BUILD_FLAGS[@]}" "${OSS_FUZZ_TESTS[@]}"
echo "Extracting the fuzz test packages in the output directory."
for oss_fuzz_archive in $(find bazel-bin/ -name "*${BAZEL_PACKAGE_SUFFIX}.tar"); do
tar --no-same-owner -xvf "${oss_fuzz_archive}" -C "${OUT}"
done
if [ "$SANITIZER" = "coverage" ]; then
echo "Collecting the repository source files for coverage tracking."
declare -r COVERAGE_SOURCES="${OUT}/proc/self/cwd"
mkdir -p "${COVERAGE_SOURCES}"
declare -r RSYNC_FILTER_ARGS=(
"--include" "*.h"
"--include" "*.cc"
"--include" "*.hpp"
"--include" "*.cpp"
"--include" "*.c"
"--include" "*.inc"
"--include" "*/"
"--exclude" "*"
)
rsync -avLk "${RSYNC_FILTER_ARGS[@]}" \
"$(bazel info execution_root)/" \
"${COVERAGE_SOURCES}/"
fi
================================================
FILE: infra/base-images/base-builder/bisect_clang.py
================================================
#!/usr/bin/env python3
# Copyright 2019 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Use git bisect to find the Clang/LLVM commit causing a regression."""
import logging
import os
import re
import shutil
import subprocess
import sys
def execute(command, *args, expect_zero=True, **kwargs):
"""Execute |command| and return the returncode, stdout and stderr."""
kwargs['stdout'] = subprocess.PIPE
kwargs['stderr'] = subprocess.PIPE
logging.debug('Running command: "%s"', str(command))
process = subprocess.Popen(command, *args, **kwargs)
stdout, stderr = process.communicate()
stdout = stdout.decode('utf-8')
stderr = stderr.decode('utf-8')
retcode = process.returncode
logging.info('Command: "%s" returned: %d.\nStdout: %s.\nStderr: %s',
str(command), retcode, stdout, stderr)
if expect_zero and retcode != 0:
raise subprocess.CalledProcessError(retcode, command)
return retcode, stdout, stderr
def search_bisect_output(output):
"""Search |output| for a message indicating the culprit commit has been
found."""
# TODO(metzman): Is it necessary to look for "good"?
culprit_regex = re.compile('([a-z0-9]{40}) is the first (good|bad) commit')
match = re.match(culprit_regex, output)
return match.group(1) if match is not None else None
class GitRepo:
"""Class for executing commands on a git repo."""
def __init__(self, repo_dir):
self.repo_dir = repo_dir
def do_command(self, git_subcommand):
"""Execute a |git_subcommand| (a list of strings)."""
command = ['git', '-C', self.repo_dir] + git_subcommand
return execute(command)
def test_commit(self, test_command):
"""Build LLVM at the currently checked-out commit, then run |test_command|.
If returncode is 0 run 'git bisect good', otherwise run 'git bisect bad'.
Return the culprit commit once bisection finishes, or None if it has not
finished yet."""
build_clang(self.repo_dir)
retcode, _, _ = execute(test_command, shell=True, expect_zero=False)
if retcode == 0:
retcode, stdout, _ = self.do_bisect_command('good')
else:
retcode, stdout, _ = self.do_bisect_command('bad')
return search_bisect_output(stdout)
def bisect(self, good_commit, bad_commit, test_command):
"""Do git bisect assuming |good_commit| is good, |bad_commit| is bad and
|test_command| is an oracle. Return the culprit commit."""
self.bisect_start(good_commit, bad_commit, test_command)
result = self.test_commit(test_command)
while result is None:
result = self.test_commit(test_command)
return result
def bisect_start(self, good_commit, bad_commit, test_command):
"""Start doing git bisect."""
self.do_bisect_command('start')
# Do bad commit first since it is more likely to be recent.
self.test_start_commit(bad_commit, 'bad', test_command)
self.test_start_commit(good_commit, 'good', test_command)
def do_bisect_command(self, subcommand):
"""Execute a git bisect |subcommand| (string) and return the result."""
return self.do_command(['bisect', subcommand])
def test_start_commit(self, commit, label, test_command):
"""Use |test_command| to test the first good or bad |commit| (depending on
|label|)."""
assert label in ('good', 'bad'), label
self.do_command(['checkout', commit])
build_clang(self.repo_dir)
retcode, _, _ = execute(test_command, shell=True, expect_zero=False)
if label == 'good' and retcode != 0:
raise BisectError('Test command "%s" returns %d on first good commit %s' %
(test_command, retcode, commit))
if label == 'bad' and retcode == 0:
raise BisectError('Test command "%s" returns %d on first bad commit %s' %
(test_command, retcode, commit))
self.do_bisect_command(label)
class BisectError(Exception):
"""Error that was encountered during bisection."""
def get_clang_build_env():
"""Get an environment for building Clang."""
env = os.environ.copy()
for variable in ['CXXFLAGS', 'CFLAGS']:
if variable in env:
del env[variable]
return env
def install_clang_build_deps():
"""Install dependencies necessary to build clang."""
execute([
'apt-get', 'install', '-y', 'build-essential', 'make', 'cmake',
'ninja-build', 'git', 'subversion', 'g++-multilib'
])
def clone_with_retries(repo, local_path, num_retries=10):
"""Clone |repo| to |local_path| if it doesn't exist already. Try up to
|num_retries| times. Raise an exception if unable to clone."""
if os.path.isdir(local_path):
return
for _ in range(num_retries):
if os.path.isdir(local_path):
shutil.rmtree(local_path)
retcode, _, _ = execute(['git', 'clone', repo, local_path],
expect_zero=False)
if retcode == 0:
return
raise Exception('Could not checkout %s.' % repo)
def get_clang_target_arch():
"""Get target architecture we want clang to target when we build it."""
_, arch, _ = execute(['uname', '-m'])
if 'x86_64' in arch:
return 'X86'
if 'aarch64' in arch:
return 'AArch64'
raise Exception('Unsupported target: %s.' % arch)
def prepare_build(llvm_project_path):
"""Prepare to build clang."""
llvm_build_dir = os.path.join(os.getenv('WORK'), 'llvm-build')
if not os.path.exists(llvm_build_dir):
os.mkdir(llvm_build_dir)
execute([
'cmake', '-G', 'Ninja', '-DLIBCXX_ENABLE_SHARED=OFF',
'-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON', '-DLIBCXXABI_ENABLE_SHARED=OFF',
'-DCMAKE_BUILD_TYPE=Release',
'-DLLVM_ENABLE_PROJECTS=libcxx;libcxxabi;compiler-rt;clang',
'-DLLVM_TARGETS_TO_BUILD=' + get_clang_target_arch(),
os.path.join(llvm_project_path, 'llvm')
],
env=get_clang_build_env(),
cwd=llvm_build_dir)
return llvm_build_dir
def build_clang(llvm_project_path):
"""Checkout, build and install Clang."""
# TODO(metzman): Merge Python checkout and build code with
# checkout_build_install_llvm.sh.
# TODO(metzman): Look into speeding this process using ccache.
# TODO(metzman): Make this program capable of handling MSAN and i386 Clang
# regressions.
llvm_build_dir = prepare_build(llvm_project_path)
execute(['ninja', '-C', llvm_build_dir, 'install'], env=get_clang_build_env())
def find_culprit_commit(test_command, good_commit, bad_commit):
"""Returns the culprit LLVM commit that introduced a bug revealed by running
|test_command|. Uses git bisect and treats |good_commit| as the latest
known good commit and |bad_commit| as the first known bad commit."""
llvm_project_path = os.path.join(os.getenv('SRC'), 'llvm-project')
clone_with_retries('https://github.com/llvm/llvm-project.git',
llvm_project_path)
git_repo = GitRepo(llvm_project_path)
result = git_repo.bisect(good_commit, bad_commit, test_command)
print('Culprit commit', result)
return result
def main():
# pylint: disable=line-too-long
"""Finds the culprit LLVM commit that introduced a clang regression.
Can be tested using this command in a libsodium shell:
python3 bisect_clang.py "cd /src/libsodium; make clean; cd -; compile && /out/secret_key_auth_fuzzer -runs=100" \
f7e52fbdb5a7af8ea0808e98458b497125a5eca1 \
8288453f6aac05080b751b680455349e09d49825
"""
# pylint: enable=line-too-long
# TODO(metzman): Check CFLAGS for things like -fsanitize=fuzzer-no-link.
# TODO(metzman): Allow test_command to be optional and for just build.sh to be
# used instead.
test_command = sys.argv[1]
# TODO(metzman): Add in more automation so that the script can automatically
# determine the commits used in last Clang roll.
good_commit = sys.argv[2]
bad_commit = sys.argv[3]
# TODO(metzman): Make verbosity configurable.
logging.getLogger().setLevel(logging.DEBUG)
install_clang_build_deps()
find_culprit_commit(test_command, good_commit, bad_commit)
return 0
if __name__ == '__main__':
sys.exit(main())
================================================
FILE: infra/base-images/base-builder/bisect_clang_test.py
================================================
# Copyright 2019 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Tests for bisect_clang.py"""
import os
from unittest import mock
import unittest
import bisect_clang
FILE_DIRECTORY = os.path.dirname(__file__)
LLVM_REPO_PATH = '/llvm-project'
def get_git_command(*args):
"""Returns a git command for the LLVM repo with |args| as arguments."""
return ['git', '-C', LLVM_REPO_PATH] + list(args)
def patch_environ(testcase_obj):
"""Patch environment."""
env = {}
patcher = mock.patch.dict(os.environ, env)
testcase_obj.addCleanup(patcher.stop)
patcher.start()
class BisectClangTestMixin: # pylint: disable=too-few-public-methods
"""Useful mixin for bisect_clang unittests."""
def setUp(self): # pylint: disable=invalid-name
"""Initialization method for unittests."""
patch_environ(self)
os.environ['SRC'] = '/src'
os.environ['WORK'] = '/work'
class GetClangBuildEnvTest(BisectClangTestMixin, unittest.TestCase):
"""Tests for get_clang_build_env."""
def test_cflags(self):
"""Test that CFLAGS are not used when compiling clang."""
os.environ['CFLAGS'] = 'blah'
self.assertNotIn('CFLAGS', bisect_clang.get_clang_build_env())
def test_cxxflags(self):
"""Test that CXXFLAGS are not used when compiling clang."""
os.environ['CXXFLAGS'] = 'blah'
self.assertNotIn('CXXFLAGS', bisect_clang.get_clang_build_env())
def test_other_variables(self):
"""Test that other env vars are used when compiling clang."""
key = 'other'
value = 'blah'
os.environ[key] = value
self.assertEqual(value, bisect_clang.get_clang_build_env()[key])
def read_test_data(filename):
"""Returns data from |filename| in the test_data directory."""
with open(os.path.join(FILE_DIRECTORY, 'test_data', filename)) as file_handle:
return file_handle.read()
class SearchBisectOutputTest(BisectClangTestMixin, unittest.TestCase):
"""Tests for search_bisect_output."""
def test_search_bisect_output(self):
"""Test that search_bisect_output finds the responsible commit when one
exists."""
test_data = read_test_data('culprit-commit.txt')
self.assertEqual('ac9ee01fcbfac745aaedca0393a8e1c8a33acd8d',
bisect_clang.search_bisect_output(test_data))
def test_search_bisect_output_none(self):
"""Test that search_bisect_output doesn't find a non-existent culprit
commit."""
self.assertIsNone(bisect_clang.search_bisect_output('hello'))
def create_mock_popen(
output=bytes('', 'utf-8'), err=bytes('', 'utf-8'), returncode=0):
"""Creates a mock subprocess.Popen."""
class MockPopen:
"""Mock subprocess.Popen."""
commands = []
testcases_written = []
def __init__(self, command, *args, **kwargs): # pylint: disable=unused-argument
"""Inits the MockPopen."""
stdout = kwargs.pop('stdout', None)
self.command = command
self.commands.append(command)
self.stdout = None
self.stderr = None
self.returncode = returncode
if hasattr(stdout, 'write'):
self.stdout = stdout
def communicate(self, input_data=None): # pylint: disable=unused-argument
"""Mock subprocess.Popen.communicate."""
if self.stdout:
self.stdout.write(output)
if self.stderr:
self.stderr.write(err)
return output, err
def poll(self, input_data=None): # pylint: disable=unused-argument
"""Mock subprocess.Popen.poll."""
return self.returncode
return MockPopen
def mock_prepare_build_impl(llvm_project_path): # pylint: disable=unused-argument
"""Mocked prepare_build function."""
return '/work/llvm-build'
class BuildClangTest(BisectClangTestMixin, unittest.TestCase):
"""Tests for build_clang."""
def test_build_clang_test(self):
"""Tests that build_clang works as intended."""
with mock.patch('subprocess.Popen', create_mock_popen()) as mock_popen:
with mock.patch('bisect_clang.prepare_build', mock_prepare_build_impl):
llvm_src_dir = '/src/llvm-project'
bisect_clang.build_clang(llvm_src_dir)
self.assertEqual([['ninja', '-C', '/work/llvm-build', 'install']],
mock_popen.commands)
class GitRepoTest(BisectClangTestMixin, unittest.TestCase):
"""Tests for GitRepo."""
# TODO(metzman): Mock filesystem. Until then, use a real directory.
def setUp(self):
super().setUp()
self.git = bisect_clang.GitRepo(LLVM_REPO_PATH)
self.good_commit = 'good_commit'
self.bad_commit = 'bad_commit'
self.test_command = 'testcommand'
def test_do_command(self):
"""Test do_command creates a new process as intended."""
# TODO(metzman): Test directory changing behavior.
command = ['subcommand', '--option']
with mock.patch('subprocess.Popen', create_mock_popen()) as mock_popen:
self.git.do_command(command)
self.assertEqual([get_git_command('subcommand', '--option')],
mock_popen.commands)
def _test_test_start_commit_unexpected(self, label, commit, returncode):
"""Tests test_start_commit works as intended when the test returns an
unexpected value."""
def mock_execute_impl(command, *args, **kwargs): # pylint: disable=unused-argument
if command == self.test_command:
return returncode, '', ''
return 0, '', ''
with mock.patch('bisect_clang.execute', mock_execute_impl):
with mock.patch('bisect_clang.prepare_build', mock_prepare_build_impl):
with self.assertRaises(bisect_clang.BisectError):
self.git.test_start_commit(commit, label, self.test_command)
def test_test_start_commit_bad_zero(self):
"""Tests test_start_commit works as intended when the test on the first bad
commit returns 0."""
self._test_test_start_commit_unexpected('bad', self.bad_commit, 0)
def test_test_start_commit_good_nonzero(self):
"""Tests test_start_commit works as intended when the test on the first good
commit returns nonzero."""
self._test_test_start_commit_unexpected('good', self.good_commit, 1)
def test_test_start_commit_good_zero(self):
"""Tests test_start_commit works as intended when the test on the first good
commit returns 0."""
self._test_test_start_commit_expected('good', self.good_commit, 0) # pylint: disable=no-value-for-parameter
@mock.patch('bisect_clang.build_clang')
def _test_test_start_commit_expected(self, label, commit, returncode,
mock_build_clang):
"""Tests test_start_commit works as intended when the test returns an
expected value."""
command_args = []
def mock_execute_impl(command, *args, **kwargs): # pylint: disable=unused-argument
command_args.append(command)
if command == self.test_command:
return returncode, '', ''
return 0, '', ''
with mock.patch('bisect_clang.execute', mock_execute_impl):
self.git.test_start_commit(commit, label, self.test_command)
self.assertEqual([
get_git_command('checkout', commit), self.test_command,
get_git_command('bisect', label)
], command_args)
mock_build_clang.assert_called_once_with(LLVM_REPO_PATH)
def test_test_start_commit_bad_nonzero(self):
"""Tests test_start_commit works as intended when the test on the first bad
commit returns nonzero."""
self._test_test_start_commit_expected('bad', self.bad_commit, 1) # pylint: disable=no-value-for-parameter
@mock.patch('bisect_clang.GitRepo.test_start_commit')
def test_bisect_start(self, mock_test_start_commit):
"""Tests bisect_start works as intended."""
with mock.patch('subprocess.Popen', create_mock_popen()) as mock_popen:
self.git.bisect_start(self.good_commit, self.bad_commit,
self.test_command)
self.assertEqual(get_git_command('bisect', 'start'),
mock_popen.commands[0])
mock_test_start_commit.assert_has_calls([
mock.call('bad_commit', 'bad', 'testcommand'),
mock.call('good_commit', 'good', 'testcommand')
])
def test_do_bisect_command(self):
"""Test do_bisect_command executes a git bisect subcommand as intended."""
subcommand = 'subcommand'
with mock.patch('subprocess.Popen', create_mock_popen()) as mock_popen:
self.git.do_bisect_command(subcommand)
self.assertEqual([get_git_command('bisect', subcommand)],
mock_popen.commands)
@mock.patch('bisect_clang.build_clang')
def _test_test_commit(self, label, output, returncode, mock_build_clang):
"""Test test_commit works as intended."""
command_args = []
def mock_execute_impl(command, *args, **kwargs): # pylint: disable=unused-argument
command_args.append(command)
if command == self.test_command:
return returncode, output, ''
return 0, output, ''
with mock.patch('bisect_clang.execute', mock_execute_impl):
result = self.git.test_commit(self.test_command)
self.assertEqual([self.test_command,
get_git_command('bisect', label)], command_args)
mock_build_clang.assert_called_once_with(LLVM_REPO_PATH)
return result
def test_test_commit_good(self):
"""Test test_commit labels a good commit as good."""
self.assertIsNone(self._test_test_commit('good', '', 0)) # pylint: disable=no-value-for-parameter
def test_test_commit_bad(self):
"""Test test_commit labels a bad commit as bad."""
self.assertIsNone(self._test_test_commit('bad', '', 1)) # pylint: disable=no-value-for-parameter
def test_test_commit_culprit(self):
"""Test test_commit returns the culprit"""
test_data = read_test_data('culprit-commit.txt')
self.assertEqual('ac9ee01fcbfac745aaedca0393a8e1c8a33acd8d',
self._test_test_commit('good', test_data, 0)) # pylint: disable=no-value-for-parameter
class GetTargetArchToBuildTest(unittest.TestCase):
"""Tests for get_clang_target_arch."""
def test_unrecognized(self):
"""Test that an unrecognized architecture raises an exception."""
with mock.patch('bisect_clang.execute') as mock_execute:
mock_execute.return_value = (None, 'mips', None)
with self.assertRaises(Exception):
bisect_clang.get_clang_target_arch()
def test_recognized(self):
"""Test that a recognized architecture returns the expected value."""
arch_pairs = {'x86_64': 'X86', 'aarch64': 'AArch64'}
for uname_result, clang_target in arch_pairs.items():
with mock.patch('bisect_clang.execute') as mock_execute:
mock_execute.return_value = (None, uname_result, None)
self.assertEqual(clang_target, bisect_clang.get_clang_target_arch())
================================================
FILE: infra/base-images/base-builder/cargo
================================================
#!/bin/bash -eu
# Copyright 2020 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This is a wrapper around calling cargo
# This just expands RUSTFLAGS in case of a coverage build
# We need this until https://github.com/rust-lang/cargo/issues/5450 is resolved
# because cargo uses relative paths for the current crate
# and absolute paths for its dependencies
#
################################################################################
if [ "$SANITIZER" = "coverage" ] && [ "${1-}" = "build" ]
then
crate_src_abspath=`cargo metadata --no-deps --format-version 1 | jq -r '.workspace_root'`
export RUSTFLAGS="$RUSTFLAGS --remap-path-prefix src=$crate_src_abspath/src"
fi
if [ "$SANITIZER" = "coverage" ] && [ "${1-}" = "fuzz" ] && [ "${2-}" = "build" ]
then
# hack to turn cargo fuzz build into cargo build so as to get coverage
# cargo fuzz adds "--target" "x86_64-unknown-linux-gnu"
(
# go into fuzz directory if not already the case
cd fuzz || true
fuzz_src_abspath=`pwd`
# Default directory is fuzz_targets, but some projects like image-rs use fuzzers.
while read i; do
export RUSTFLAGS="$RUSTFLAGS --remap-path-prefix $i=$fuzz_src_abspath/$i"
# Bash while syntax so that we modify RUSTFLAGS in main shell instead of a subshell.
done <<< "$(find . -name "*.rs" | cut -d/ -f2 | uniq)"
# we do not want to trigger debug assertions and stops
export RUSTFLAGS="$RUSTFLAGS -C debug-assertions=no"
# do not optimize with --release, as that leads to "Malformed instrumentation profile data" errors
cargo build --bins
# copies the build output in the expected target directory
cd `cargo metadata --format-version 1 --no-deps | jq -r '.target_directory'`
mkdir -p x86_64-unknown-linux-gnu/release
cp -r debug/* x86_64-unknown-linux-gnu/release/
)
exit 0
fi
/rust/bin/cargo "$@"
================================================
FILE: infra/base-images/base-builder/compile
================================================
#!/bin/bash -eu
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
echo "---------------------------------------------------------------"
sysctl -w vm.mmap_rnd_bits=28
OSS_FUZZ_ON_DEMAND="${OSS_FUZZ_ON_DEMAND:-0}"
# Used for Rust introspector builds
RUST_SANITIZER=$SANITIZER
if [ "$FUZZING_LANGUAGE" = "jvm" ]; then
if [ "$SANITIZER" != "address" ] && [ "$SANITIZER" != "coverage" ] && [ "$SANITIZER" != "undefined" ] && [ "$SANITIZER" != "none" ] && [ "$SANITIZER" != "introspector" ]; then
echo "ERROR: JVM projects can be fuzzed with AddressSanitizer or UndefinedBehaviorSanitizer or Introspector only."
exit 1
fi
if [ "$ARCHITECTURE" != "x86_64" ]; then
echo "ERROR: JVM projects can be fuzzed on x86_64 architecture only."
exit 1
fi
fi
if [ "$FUZZING_LANGUAGE" = "rust" ]; then
if [ "$SANITIZER" = "introspector" ]; then
# introspector sanitizer flag will cause cargo build to fail. Remove it
# temporarily, RUST_SANITIZER will hold the original sanitizer.
export SANITIZER=address
fi
fi
if [ "$FUZZING_LANGUAGE" = "javascript" ]; then
if [ "$FUZZING_ENGINE" != "libfuzzer" ]; then
echo "ERROR: JavaScript projects can be fuzzed with libFuzzer engine only."
exit 1
fi
if [ "$SANITIZER" != "coverage" ] && [ "$SANITIZER" != "none" ]; then
echo "ERROR: JavaScript projects cannot be fuzzed with sanitizers."
exit 1
fi
if [ "$ARCHITECTURE" != "x86_64" ]; then
echo "ERROR: JavaScript projects can be fuzzed on x86_64 architecture only."
exit 1
fi
fi
if [ "$FUZZING_LANGUAGE" = "python" ]; then
if [ "$FUZZING_ENGINE" != "libfuzzer" ]; then
echo "ERROR: Python projects can be fuzzed with libFuzzer engine only."
exit 1
fi
if [ "$SANITIZER" != "address" ] && [ "$SANITIZER" != "undefined" ] && [ "$SANITIZER" != "coverage" ] && [ "$SANITIZER" != "introspector" ]; then
echo "ERROR: Python projects can be fuzzed with AddressSanitizer or UndefinedBehaviorSanitizer or Coverage or Fuzz Introspector only."
exit 1
fi
if [ "$ARCHITECTURE" != "x86_64" ]; then
echo "ERROR: Python projects can be fuzzed on x86_64 architecture only."
exit 1
fi
fi
if [ -z "${SANITIZER_FLAGS-}" ]; then
FLAGS_VAR="SANITIZER_FLAGS_${SANITIZER}"
export SANITIZER_FLAGS=${!FLAGS_VAR-}
fi
if [[ $ARCHITECTURE == "i386" ]]; then
export CFLAGS="-m32 $CFLAGS"
cp -R /usr/i386/lib/* /usr/local/lib
cp -R /usr/i386/include/* /usr/local/include
fi
# Don't use a fuzzing engine with Jazzer which has libFuzzer built-in or with
# FuzzBench which will provide the fuzzing engine.
if [[ $FUZZING_ENGINE != "none" ]] && [[ $FUZZING_LANGUAGE != "jvm" ]] && [[ "${OSS_FUZZ_ON_DEMAND}" == "0" ]] ; then
# compile script might override environment, use . to call it.
. compile_${FUZZING_ENGINE}
fi
if [[ $SANITIZER_FLAGS = *sanitize=memory* ]]
then
# Take all libraries from lib/msan
# export CXXFLAGS_EXTRA="-L/usr/msan/lib $CXXFLAGS_EXTRA"
cp -R /usr/msan/lib/* /usr/local/lib/x86_64-unknown-linux-gnu/
cp -R /usr/msan/include/* /usr/local/include
echo 'Building without MSan instrumented libraries.'
fi
# Coverage flag overrides.
COVERAGE_FLAGS_VAR="COVERAGE_FLAGS_${SANITIZER}"
if [[ -n ${!COVERAGE_FLAGS_VAR+x} ]]
then
export COVERAGE_FLAGS="${!COVERAGE_FLAGS_VAR}"
fi
# Only need the default coverage instrumentation for libFuzzer or honggfuzz.
# Other engines bring their own.
if [ $FUZZING_ENGINE = "none" ] || [ $FUZZING_ENGINE = "afl" ] || [ $FUZZING_ENGINE = "centipede" ] || [ "${OSS_FUZZ_ON_DEMAND}" != "0" ]; then
export COVERAGE_FLAGS=
fi
# Rust does not support sanitizers and coverage flags via CFLAGS/CXXFLAGS, so
# use RUSTFLAGS.
# FIXME: Support code coverage once support is in.
# See https://github.com/rust-lang/rust/issues/34701.
if [ "$RUST_SANITIZER" == "introspector" ]; then
export RUSTFLAGS="-Cdebuginfo=2 -Cforce-frame-pointers"
elif [ "$SANITIZER" != "undefined" ] && [ "$SANITIZER" != "coverage" ] && [ "$SANITIZER" != "none" ] && [ "$ARCHITECTURE" != 'i386' ]; then
export RUSTFLAGS="--cfg fuzzing -Zsanitizer=${SANITIZER} -Cdebuginfo=1 -Cforce-frame-pointers"
else
export RUSTFLAGS="--cfg fuzzing -Cdebuginfo=1 -Cforce-frame-pointers"
fi
if [ "$SANITIZER" = "coverage" ]
then
# link to C++ from comment in f5098035eb1a14aa966c8651d88ea3d64323823d
export RUSTFLAGS="$RUSTFLAGS -Cinstrument-coverage -C link-arg=-lc++"
fi
# Add Rust libfuzzer flags.
# See https://github.com/rust-fuzz/libfuzzer/blob/master/build.rs#L12.
export CUSTOM_LIBFUZZER_PATH="$LIB_FUZZING_ENGINE_DEPRECATED"
export CUSTOM_LIBFUZZER_STD_CXX=c++
export CFLAGS="$CFLAGS $SANITIZER_FLAGS $COVERAGE_FLAGS"
export CXXFLAGS="$CFLAGS $CXXFLAGS_EXTRA"
if [ "$SANITIZER" = "undefined" ]; then
# Disable "function" sanitizer for C code for now, because many projects,
# possibly via legacy C code are affected.
# The projects should be fixed and this workaround be removed in the future.
# TODO(#11778):
# https://github.com/google/oss-fuzz/issues/11778
export CFLAGS="$CFLAGS -fno-sanitize=function"
fi
if [ "$FUZZING_LANGUAGE" = "go" ]; then
# required by Go 1.20
export CXX="${CXX} -lresolv"
fi
if [ "$FUZZING_LANGUAGE" = "python" ]; then
sanitizer_with_fuzzer_lib_dir=`python3 -c "import atheris; import os; print(atheris.path())"`
sanitizer_with_fuzzer_output_lib=$OUT/sanitizer_with_fuzzer.so
if [ "$SANITIZER" = "address" ]; then
cp $sanitizer_with_fuzzer_lib_dir/asan_with_fuzzer.so $sanitizer_with_fuzzer_output_lib
elif [ "$SANITIZER" = "undefined" ]; then
cp $sanitizer_with_fuzzer_lib_dir/ubsan_with_fuzzer.so $sanitizer_with_fuzzer_output_lib
fi
# Disable leak checking as it is unsupported.
export CFLAGS="$CFLAGS -fno-sanitize=function,leak,vptr"
export CXXFLAGS="$CXXFLAGS -fno-sanitize=function,leak,vptr"
fi
# Copy latest llvm-symbolizer in $OUT for stack symbolization.
cp $(which llvm-symbolizer) $OUT/
# Copy Jazzer to $OUT if needed.
if [ "$FUZZING_LANGUAGE" = "jvm" ]; then
cp $(which jazzer_agent_deploy.jar) $(which jazzer_driver) $(which jazzer_junit.jar) $OUT/
jazzer_driver_with_sanitizer=$OUT/jazzer_driver_with_sanitizer
if [ "$SANITIZER" = "address" ]; then
cat > $jazzer_driver_with_sanitizer << 'EOF'
#!/bin/bash
this_dir=$(dirname "$0")
"$this_dir/jazzer_driver" --asan "$@"
EOF
elif [ "$SANITIZER" = "undefined" ]; then
cat > $jazzer_driver_with_sanitizer << 'EOF'
#!/bin/bash
this_dir=$(dirname "$0")
"$this_dir/jazzer_driver" --ubsan "$@"
EOF
elif [ "$SANITIZER" = "coverage" ] || [ "$SANITIZER" = "introspector" ]; then
# Coverage & introspector builds require no instrumentation.
cp $(which jazzer_driver) $jazzer_driver_with_sanitizer
fi
chmod +x $jazzer_driver_with_sanitizer
# Disable leak checking since the JVM triggers too many false positives.
export CFLAGS="$CFLAGS -fno-sanitize=leak"
export CXXFLAGS="$CXXFLAGS -fno-sanitize=leak"
fi
if [ "$SANITIZER" = "introspector" ] || [ "$RUST_SANITIZER" = "introspector" ]; then
export AR=llvm-ar
export NM=llvm-nm
export RANLIB=llvm-ranlib
export CFLAGS="$CFLAGS -g"
export CXXFLAGS="$CXXFLAGS -g"
export FI_BRANCH_PROFILE=1
export FUZZ_INTROSPECTOR=1
export FUZZ_INTROSPECTOR_AUTO_FUZZ=1
# Move ar and ranlib
mv /usr/bin/ar /usr/bin/old-ar
mv /usr/bin/nm /usr/bin/old-nm
mv /usr/bin/ranlib /usr/bin/old-ranlib
ln -sf /usr/local/bin/llvm-ar /usr/bin/ar
ln -sf /usr/local/bin/llvm-nm /usr/bin/nm
ln -sf /usr/local/bin/llvm-ranlib /usr/bin/ranlib
apt-get install -y libjpeg-dev zlib1g-dev libyaml-dev
python3 -m pip install --upgrade pip setuptools
python3 -m pip install cxxfilt pyyaml beautifulsoup4 lxml soupsieve rust-demangler
python3 -m pip install --prefer-binary matplotlib
# Install Fuzz-Introspector
pushd /fuzz-introspector/src
python3 -m pip install -e .
popd
if [ "$FUZZING_LANGUAGE" = "python" ]; then
python3 /fuzz-introspector/src/main.py light --language=python
cp -rf $SRC/inspector/ /tmp/inspector-saved
elif [ "$FUZZING_LANGUAGE" = "jvm" ]; then
python3 /fuzz-introspector/src/main.py light --language=jvm
cp -rf $SRC/inspector/ /tmp/inspector-saved
elif [ "$FUZZING_LANGUAGE" = "rust" ]; then
python3 /fuzz-introspector/src/main.py light --language=rust
cp -rf $SRC/inspector/ /tmp/inspector-saved
else
python3 /fuzz-introspector/src/main.py light
# Make a copy of the light analysis output. This is needed because we run
# two versions of introspector: one based on pure static analysis and one
# based on regular LTO.
cp -rf $SRC/inspector/ /tmp/inspector-saved
# Move coverage report.
if [ -d "$OUT/textcov_reports" ]
then
find $OUT/textcov_reports/ -name "*.covreport" -exec cp {} $SRC/inspector/ \;
find $OUT/textcov_reports/ -name "*.json" -exec cp {} $SRC/inspector/ \;
fi
# Make fuzz-introspector HTML report using light approach.
REPORT_ARGS="--name=$PROJECT_NAME"
# Only pass coverage_url when COVERAGE_URL is set (in cloud builds)
if [[ ! -z "${COVERAGE_URL+x}" ]]; then
REPORT_ARGS="$REPORT_ARGS --coverage-url=${COVERAGE_URL}"
fi
# Run pure static analysis fuzz introspector
fuzz-introspector full --target-dir=$SRC \
--language=${FUZZING_LANGUAGE} \
--out-dir=$SRC/inspector \
${REPORT_ARGS}
fi
rsync -avu --delete "$SRC/inspector/" "$OUT/inspector"
fi
echo "---------------------------------------------------------------"
echo "CC=$CC"
echo "CXX=$CXX"
echo "CFLAGS=$CFLAGS"
echo "CXXFLAGS=$CXXFLAGS"
echo "RUSTFLAGS=$RUSTFLAGS"
echo "---------------------------------------------------------------"
if [ "${OSS_FUZZ_ON_DEMAND}" != "0" ]; then
fuzzbench_build
cp $(which llvm-symbolizer) $OUT/
exit 0
fi
# Prepare the build command to run the project's build script.
if [[ ! -z "${REPLAY_ENABLED-}" ]]; then
# If this is a replay, then use replay_build.sh. This is expected to be
# running in a cached container where a build has already happened prior.
BUILD_CMD="bash -eux $SRC/replay_build.sh $@"
else
BUILD_CMD="bash -eux $SRC/build.sh $@"
fi
# Set +u temporarily to continue even if GOPATH and OSSFUZZ_RUSTPATH are undefined.
set +u
# We need to preserve source code files for generating a code coverage report.
# We need exact files that were compiled, so copy both $SRC and $WORK dirs.
COPY_SOURCES_CMD="cp -rL --parents $SRC $WORK /usr/include /usr/local/include $GOPATH $OSSFUZZ_RUSTPATH /rustc $OUT"
set -u
if [ "$FUZZING_LANGUAGE" = "rust" ]; then
# Copy rust std lib to its path with a hash.
export rustch=`rustc --version --verbose | grep commit-hash | cut -d' ' -f2`
mkdir -p /rustc/$rustch/
export rustdef=`rustup toolchain list | grep default | cut -d' ' -f1`
cp -r /rust/rustup/toolchains/$rustdef/lib/rustlib/src/rust/library/ /rustc/$rustch/
fi
if [ "${BUILD_UID-0}" -ne "0" ]; then
adduser -u $BUILD_UID --disabled-password --gecos '' builder
chown -R builder $SRC $OUT $WORK
su -c "$BUILD_CMD" builder
if [ "$SANITIZER" = "coverage" ]; then
# Some directories have broken symlinks (e.g. honggfuzz), ignore the errors.
su -c "$COPY_SOURCES_CMD" builder 2>/dev/null || true
fi
else
$BUILD_CMD
if [ "$SANITIZER" = "coverage" ]; then
# Some directories have broken symlinks (e.g. honggfuzz), ignore the errors.
$COPY_SOURCES_CMD 2>/dev/null || true
fi
fi
if [ "$SANITIZER" = "introspector" ] || [ "$RUST_SANITIZER" = "introspector" ]; then
unset CXXFLAGS
unset CFLAGS
export G_ANALYTICS_TAG="G-8WTFM1Y62J"
# If we get here, it means that e.g. LTO had no problems and succeeded.
# To this end, we will restore the original light analysis and use the
# LTO processing itself.
rm -rf $SRC/inspector
cp -rf /tmp/inspector-saved $SRC/inspector
cd /fuzz-introspector/src
python3 -m pip install -e .
cd /src/
if [ "$FUZZING_LANGUAGE" = "rust" ]; then
# Restore the sanitizer flag for rust
export SANITIZER="introspector"
fi
mkdir -p $SRC/inspector
find $SRC/ -name "fuzzerLogFile-*.data" -exec cp {} $SRC/inspector/ \;
find $SRC/ -name "fuzzerLogFile-*.data.yaml" -exec cp {} $SRC/inspector/ \;
find $SRC/ -name "fuzzerLogFile-*.data.debug_*" -exec cp {} $SRC/inspector/ \;
find $SRC/ -name "allFunctionsWithMain-*.yaml" -exec cp {} $SRC/inspector/ \;
# Move coverage report.
if [ -d "$OUT/textcov_reports" ]
then
find $OUT/textcov_reports/ -name "*.covreport" -exec cp {} $SRC/inspector/ \;
find $OUT/textcov_reports/ -name "*.json" -exec cp {} $SRC/inspector/ \;
fi
cd $SRC/inspector
# Make fuzz-introspector HTML report.
REPORT_ARGS="--name=$PROJECT_NAME"
# Only pass coverage_url when COVERAGE_URL is set (in cloud builds)
if [[ ! -z "${COVERAGE_URL+x}" ]]; then
REPORT_ARGS="$REPORT_ARGS --coverage-url=${COVERAGE_URL}"
fi
# Do different things depending on languages
if [ "$FUZZING_LANGUAGE" = "python" ]; then
echo "GOING python route"
set -x
REPORT_ARGS="$REPORT_ARGS --target-dir=$SRC/inspector"
REPORT_ARGS="$REPORT_ARGS --language=python"
fuzz-introspector report $REPORT_ARGS
rsync -avu --delete "$SRC/inspector/" "$OUT/inspector"
elif [ "$FUZZING_LANGUAGE" = "jvm" ]; then
echo "GOING jvm route"
set -x
find $OUT/ -name "jacoco.xml" -exec cp {} $SRC/inspector/ \;
REPORT_ARGS="$REPORT_ARGS --target-dir=$SRC --out-dir=$SRC/inspector"
REPORT_ARGS="$REPORT_ARGS --language=jvm"
fuzz-introspector full $REPORT_ARGS
rsync -avu --delete "$SRC/inspector/" "$OUT/inspector"
elif [ "$FUZZING_LANGUAGE" = "rust" ]; then
echo "GOING rust route"
REPORT_ARGS="$REPORT_ARGS --target-dir=$SRC --out-dir=$SRC/inspector"
REPORT_ARGS="$REPORT_ARGS --language=rust"
fuzz-introspector full $REPORT_ARGS
rsync -avu --delete "$SRC/inspector/" "$OUT/inspector"
else
# C/C++
mkdir -p $SRC/inspector
# Correlate fuzzer binaries to fuzz-introspector's raw data. This generates
# the file exe_to_fuzz_introspector_logs.yaml.
fuzz-introspector correlate --binaries-dir=$OUT/
# Generate fuzz-introspector HTML report.
REPORT_ARGS="$REPORT_ARGS --target-dir=$SRC/inspector"
# Use the just-generated correlation file
REPORT_ARGS="$REPORT_ARGS --correlation-file=exe_to_fuzz_introspector_logs.yaml"
fuzz-introspector report $REPORT_ARGS
rsync -avu --delete "$SRC/inspector/" "$OUT/inspector"
fi
fi
================================================
FILE: infra/base-images/base-builder/compile_afl
================================================
#!/bin/bash -eu
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# If LLVM once again does weird changes then enable this:
#export AFL_LLVM_INSTRUMENT=LLVM-NATIVE
# AFL++ setup
echo "Copying precompiled AFL++"
# Copy AFL++ tools necessary for fuzzing.
pushd $SRC/aflplusplus > /dev/null
cp -f libAFLDriver.a $LIB_FUZZING_ENGINE
# Some important projects include libraries, copy those even when they don't
# start with "afl-". Use "sort -u" to avoid a warning about duplicates.
ls afl-* *.txt *.a *.o *.so | sort -u | xargs cp -t $OUT
export CC="$SRC/aflplusplus/afl-clang-fast"
export CXX="$SRC/aflplusplus/afl-clang-fast++"
# Set sane AFL++ environment defaults:
# Be quiet, otherwise this can break some builds.
export AFL_QUIET=1
# No leak errors during builds.
export ASAN_OPTIONS="detect_leaks=0:symbolize=0:detect_odr_violation=0:abort_on_error=1"
# Do not abort on any problems (because this is during build where it is ok)
export AFL_IGNORE_PROBLEMS=1
# Do not complain about unknown AFL environment variables.
export AFL_IGNORE_UNKNOWN_ENVS=1
# Provide a way to document the AFL++ options used in this build:
echo
echo AFL++ target compilation setup:
env | grep -E '^AFL_' | tee "$OUT/afl_options.txt"
echo
popd > /dev/null
echo " done."
================================================
FILE: infra/base-images/base-builder/compile_centipede
================================================
#!/bin/bash -eu
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
echo "Skipping compilation; using precompiled centipede"
if [[ "$SANITIZER" == 'none' ]]; then
cp "$CENTIPEDE_BIN_DIR/centipede" "$OUT"
fi
cp "$CENTIPEDE_BIN_DIR/libcentipede_runner.pic.a" "$LIB_FUZZING_ENGINE"
export CENTIPEDE_FLAGS=`cat "$SRC/fuzztest/centipede/clang-flags.txt" | tr '\n' ' '`
export LIBRARIES_FLAGS="-Wno-unused-command-line-argument -Wl,-ldl -Wl,-lrt -Wl,-lpthread -Wl,$SRC/fuzztest/centipede/weak.o"
export CFLAGS="$CFLAGS $CENTIPEDE_FLAGS $LIBRARIES_FLAGS"
export CXXFLAGS="$CXXFLAGS $CENTIPEDE_FLAGS $LIBRARIES_FLAGS"
echo 'done.'
================================================
FILE: infra/base-images/base-builder/compile_fuzztests.sh
================================================
#!/bin/bash -eu
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
set -x
# "bazel query" is used to identify the FuzzTest test cases in the project.
# By default the entire project is searched ("..."), but for some projects
# this fails or takes a very long time. It may also pick up fuzzers in
# dependencies, which should not be built as part of the given project.
# Tensorflow is an example of a project where querying the entire project
# fails. FUZZTEST_TARGET_FOLDER makes it possible to specify the folder in
# which to search for FuzzTest fuzzers. FUZZTEST_TARGET_FOLDER is passed
# to "bazel query" below.
if [[ ${FUZZTEST_TARGET_FOLDER:-"unset"} == "unset" ]];
then
export TARGET_FOLDER="..."
else
TARGET_FOLDER=${FUZZTEST_TARGET_FOLDER}
fi
BUILD_ARGS="--config=oss-fuzz --subcommands"
if [[ ${FUZZTEST_EXTRA_ARGS:-"unset"} != "unset" ]];
then
BUILD_ARGS="$BUILD_ARGS ${FUZZTEST_EXTRA_ARGS}"
fi
# Trigger setup_configs rule of fuzztest as it generates the necessary
# configuration file based on OSS-Fuzz environment variables.
bazel run @com_google_fuzztest//bazel:setup_configs >> /etc/bazel.bazelrc
# Bazel target names of the fuzz binaries.
FUZZ_TEST_BINARIES=$(bazel query "kind(\"cc_test\", rdeps(${TARGET_FOLDER}, @com_google_fuzztest//fuzztest:fuzztest_gtest_main))")
# Bazel output paths of the fuzz binaries.
FUZZ_TEST_BINARIES_OUT_PATHS=$(bazel cquery "kind(\"cc_test\", rdeps(${TARGET_FOLDER}, @com_google_fuzztest//fuzztest:fuzztest_gtest_main))" --output=files)
# Build the project and fuzz binaries.
# The `FUZZTEST_EXTRA_TARGETS` environment variable lets a project that also
# includes non-FuzzTest fuzzers compile them in the same `bazel build`
# command as the FuzzTest fuzzers. This avoids calling `bazel build` twice.
bazel build $BUILD_ARGS -- ${FUZZ_TEST_BINARIES[*]} ${FUZZTEST_EXTRA_TARGETS:-}
# Iterate over the fuzz binaries and list the fuzz entrypoints in each binary.
# For each entrypoint, create a wrapper script that calls the binary with the
# given entrypoint as argument.
# The scripts will be named:
# {binary_name}@{fuzztest_entrypoint}
for fuzz_main_file in $FUZZ_TEST_BINARIES_OUT_PATHS; do
FUZZ_TESTS=$($fuzz_main_file --list_fuzz_tests)
cp ${fuzz_main_file} $OUT/
fuzz_basename=$(basename $fuzz_main_file)
chmod -x $OUT/$fuzz_basename
for fuzz_entrypoint in $FUZZ_TESTS; do
TARGET_FUZZER="${fuzz_basename}@$fuzz_entrypoint"
# Write executor script
echo "#!/bin/sh
# LLVMFuzzerTestOneInput for fuzzer detection.
this_dir=\$(dirname \"\$0\")
chmod +x \$this_dir/$fuzz_basename
\$this_dir/$fuzz_basename --fuzz=$fuzz_entrypoint -- \$@" > $OUT/$TARGET_FUZZER
chmod +x $OUT/$TARGET_FUZZER
done
done
# Synchronise coverage directory to bazel output artifacts. This is done on a
# best-effort basis: it includes source code found in common bazel output
# folders.
# For projects that store results in non-standard folders or want to
# manage what code to include in the coverage report more specifically,
# the FUZZTEST_DO_SYNC environment variable is made available. Projects
# can then implement a custom way of synchronising source code with the
# coverage build. Set FUZZTEST_DO_SYNC to something other than "yes" and
# no effort will be made to automatically synchronise the source code with
# the code coverage visualisation utility.
if [[ "$SANITIZER" = "coverage" && ${FUZZTEST_DO_SYNC:-"yes"} == "yes" ]]
then
# Synchronize bazel source files to coverage collection.
declare -r REMAP_PATH="${OUT}/proc/self/cwd"
mkdir -p "${REMAP_PATH}"
# Synchronize the folder bazel-BAZEL_OUT_PROJECT.
declare -r RSYNC_FILTER_ARGS=("--include" "*.h" "--include" "*.cc" "--include" \
"*.hpp" "--include" "*.cpp" "--include" "*.c" "--include" "*/" "--include" "*.inc" \
"--exclude" "*")
project_folders="$(find . -name 'bazel-*' -type l -printf '%P\n' | \
grep -v -x -F \
-e 'bazel-bin' \
-e 'bazel-testlogs')"
for link in $project_folders; do
if [[ -d "${PWD}"/$link/external ]]
then
rsync -avLk "${RSYNC_FILTER_ARGS[@]}" "${PWD}"/$link/external "${REMAP_PATH}"
fi
# k8-opt is a common path for storing bazel output artifacts, e.g. bazel-out/k8-opt.
# It's the output folder for default amd-64 builds, but projects may specify custom
# platform output directories, see: https://github.com/bazelbuild/bazel/issues/13818
# We support the default at the moment, and if a project needs custom synchronizing of
# output artifacts and code coverage we currently recommend using FUZZTEST_DO_SYNC.
if [[ -d "${PWD}"/$link/k8-opt ]]
then
rsync -avLk "${RSYNC_FILTER_ARGS[@]}" "${PWD}"/$link/k8-opt "${REMAP_PATH}"/$link
fi
done
# Delete symlinks and sync the current folder.
find . -type l -ls -delete
rsync -av ${PWD}/ "${REMAP_PATH}"
fi
================================================
FILE: infra/base-images/base-builder/compile_go_fuzzer
================================================
#!/bin/bash -eu
# Copyright 2020 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
path=$1
function=$2
fuzzer=$3
tags="-tags gofuzz"
if [[ $# -eq 4 ]]; then
tags="-tags $4"
fi
# Import go_utils.sh
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/go_utils.sh"
# makes directory change temporary
(
cd $GOPATH/src/$path || true
# in the case we are in the right directory, with go.mod but no go.sum
go mod tidy || true
# project was downloaded with go get if go list fails
go list $tags $path || { cd $GOPATH/pkg/mod/ && cd `echo $path | cut -d/ -f1-3 | awk '{print $1"@*"}'`; } || cd -
# project does not have go.mod if go list fails again
go list $tags $path || { go mod init $path && go mod tidy ;}
if [[ $SANITIZER = *coverage* ]]; then
fuzzed_package=`go list $tags -f '{{.Name}}' $path`
abspath=`go list $tags -f {{.Dir}} $path`
cd $abspath
cp $GOPATH/ossfuzz_coverage_runner.go ./"${function,,}"_test.go
sed -i -e 's/FuzzFunction/'$function'/' ./"${function,,}"_test.go
sed -i -e 's/mypackagebeingfuzzed/'$fuzzed_package'/' ./"${function,,}"_test.go
sed -i -e 's/TestFuzzCorpus/Test'$function'Corpus/' ./"${function,,}"_test.go
# fuzzed_repo is the module path/name, which was already created above in case it did not exist.
# It is not always the same as the package path. This is necessary to handle SIV properly.
fuzzed_repo=$(go list $tags -f {{.Module}} "$path")
abspath_repo=`go list -m $tags -f {{.Dir}} $fuzzed_repo || go list $tags -f {{.Dir}} $fuzzed_repo`
# give equivalence to absolute paths in another file, as go test -cover uses golangish pkg.Dir
echo "s=$fuzzed_repo"="$abspath_repo"= > $OUT/$fuzzer.gocovpath
# Additional packages for which to get coverage.
pkgaddcov=""
# to prevent bash from failing about unbound variable
GO_COV_ADD_PKG_SET=${GO_COV_ADD_PKG:-}
if [[ -n "${GO_COV_ADD_PKG_SET}" ]]; then
pkgaddcov=","$GO_COV_ADD_PKG
abspath_repo=`go list -m $tags -f {{.Dir}} $GO_COV_ADD_PKG || go list $tags -f {{.Dir}} $GO_COV_ADD_PKG`
echo "s=^$GO_COV_ADD_PKG"="$abspath_repo"= >> $OUT/$fuzzer.gocovpath
fi
go test $tags \
-c \
-o $OUT/$fuzzer \
-v \
-covermode=atomic \
-coverpkg $fuzzed_repo/...$pkgaddcov \
$path
function_names_file="$OUT/fuzzer_function_names.json"
save_function_name "$fuzzer" "Test${function}Corpus" "$function_names_file"
else
# Compile and instrument all Go files relevant to this fuzz target.
echo "Running go-fuzz $tags -func $function -o $fuzzer.a $path"
go-fuzz $tags -func $function -o $fuzzer.a $path
# Link Go code ($fuzzer.a) with fuzzing engine to produce fuzz target binary.
$CXX $CXXFLAGS $LIB_FUZZING_ENGINE $fuzzer.a -o $OUT/$fuzzer
fi
)
================================================
FILE: infra/base-images/base-builder/compile_honggfuzz
================================================
#!/bin/bash -eu
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
echo "Skipping compilation; using precompiled honggfuzz"
cp $SRC/honggfuzz/honggfuzz.a $LIB_FUZZING_ENGINE
cp $SRC/honggfuzz/honggfuzz $OUT/
# Set flags necessary for netdriver compilation.
export LIB_HFND="-Wl,-u,LIBHFNETDRIVER_module_netdriver -Wl,--start-group $SRC/honggfuzz/libhfnetdriver/libhfnetdriver.a $SRC/honggfuzz/libhfcommon/libhfcommon.a -Wl,--end-group"
export HFND_CXXFLAGS='-DHFND_FUZZING_ENTRY_FUNCTION_CXX(x,y)=extern const char* LIBHFNETDRIVER_module_netdriver;const char** LIBHFNETDRIVER_tmp1 = &LIBHFNETDRIVER_module_netdriver;extern "C" int HonggfuzzNetDriver_main(x,y);int HonggfuzzNetDriver_main(x,y)'
export HFND_CFLAGS='-DHFND_FUZZING_ENTRY_FUNCTION(x,y)=extern const char* LIBHFNETDRIVER_module_netdriver;const char** LIBHFNETDRIVER_tmp1 = &LIBHFNETDRIVER_module_netdriver;int HonggfuzzNetDriver_main(x,y);int HonggfuzzNetDriver_main(x,y)'
# Custom coverage flags, roughly in sync with:
# https://github.com/google/honggfuzz/blob/oss-fuzz/hfuzz_cc/hfuzz-cc.c
export COVERAGE_FLAGS="-fsanitize-coverage=trace-pc-guard,indirect-calls,trace-cmp"
echo " done."
================================================
FILE: infra/base-images/base-builder/compile_javascript_fuzzer
================================================
#!/bin/bash -eu
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
project=$1
# Path to the fuzz target source file, relative to the project's root.
fuzz_target=$2
# Arguments to pass to Jazzer.js
jazzerjs_args=${@:3}
# Copy source code into the $OUT directory and install Jazzer.js into the project.
if [ ! -d $OUT/$project ]; then
cp -r $SRC/$project $OUT/$project
fi
fuzzer_basename=$(basename -s .js $fuzz_target)
# Create an execution wrapper that executes Jazzer.js with the correct arguments.
echo "#!/bin/bash
# LLVMFuzzerTestOneInput so that the wrapper script is recognized as a fuzz target for 'check_build'.
project_dir=\$(dirname \"\$0\")/$project
\$project_dir/node_modules/@jazzer.js/core/dist/cli.js \$project_dir/$fuzz_target $jazzerjs_args \$JAZZERJS_EXTRA_ARGS -- \$@" > $OUT/$fuzzer_basename
chmod +x $OUT/$fuzzer_basename
================================================
FILE: infra/base-images/base-builder/compile_libfuzzer
================================================
#!/bin/bash -eu
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
echo -n "Compiling libFuzzer to $LIB_FUZZING_ENGINE... "
export LIB_FUZZING_ENGINE="-fsanitize=fuzzer"
if [ "$FUZZING_LANGUAGE" = "go" ]; then
export LIB_FUZZING_ENGINE="$LIB_FUZZING_ENGINE $GOPATH/gosigfuzz/gosigfuzz.o"
fi
cp /usr/local/lib/clang/*/lib/$ARCHITECTURE-unknown-linux-gnu/libclang_rt.fuzzer.a $LIB_FUZZING_ENGINE_DEPRECATED
echo " done."
================================================
FILE: infra/base-images/base-builder/compile_native_go_fuzzer
================================================
#!/bin/bash -eu
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
path=$1
function=$2
fuzzer=$3
tags="-tags gofuzz"
# Get absolute path.
abs_file_dir=$(go list $tags -f {{.Dir}} $path)
# TODO(adamkorcz): Get rid of "-r" flag here.
export fuzzer_filename=$(grep -r -l --include='*.go' -s "$function" "${abs_file_dir}")
# Import build_native_go_fuzzer_legacy
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/go_utils.sh"
# Test if file contains a line with "func $function" and "testing.F".
if [ $(grep -r "func $function" $fuzzer_filename | grep "testing.F" | wc -l) -eq 1 ]
then
build_native_go_fuzzer_legacy $fuzzer $function $abs_file_dir $path
else
echo "Could not find the function: func ${function}(f *testing.F)"
fi
================================================
FILE: infra/base-images/base-builder/compile_native_go_fuzzer_v2
================================================
#!/bin/bash -eu
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
path=$1
function=$2
fuzzer=$3
tags="-tags gofuzz"
# Get absolute path.
abs_file_dir=$(go list $tags -f {{.Dir}} $path)
# Find the file containing the fuzzer function definition.
# Search for the actual function signature with testing.F to avoid false matches
# in files that only reference the function name (e.g., in comments or helper functions).
export fuzzer_filename=$(grep -r -l --include='*.go' "func ${function}(.*testing\.F" "${abs_file_dir}")
# Import build_native_go_fuzzer
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/go_utils.sh"
# Verify we found exactly one file with the fuzzer function.
file_count=$(echo "$fuzzer_filename" | wc -w)
if [ "$file_count" -eq 1 ] && [ -n "$fuzzer_filename" ] && [ -f "$fuzzer_filename" ]
then
build_native_go_fuzzer $fuzzer $function $abs_file_dir $path
elif [ "$file_count" -gt 1 ]
then
echo "Error: Found multiple files with func ${function}(f *testing.F):"
echo "$fuzzer_filename"
exit 1
else
echo "Could not find the function: func ${function}(f *testing.F)"
fi
================================================
FILE: infra/base-images/base-builder/compile_python_fuzzer
================================================
#!/bin/bash -eux
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# In order to enable PySecSan for a given module, set the environment
# variable ENABLE_PYSECSAN="YES"
fuzzer_path=$1
shift 1
fuzzer_basename=$(basename -s .py $fuzzer_path)
fuzzer_package=${fuzzer_basename}.pkg
PYFUZZ_WORKPATH=$SRC/pyfuzzworkdir/
FUZZ_WORKPATH=$PYFUZZ_WORKPATH/$fuzzer_basename
if [[ $SANITIZER = *introspector* ]]; then
# Extract the source package the fuzzer targets. This must happen before
# we enter the virtual environment in the following lines because we need
# to use the same python environment that installed the fuzzer dependencies.
python3 /fuzz-introspector/frontends/python/prepare_fuzz_imports.py $fuzzer_path isossfuzz
# We must use python3.9 because we rely on certain AST logic
# from that version.
# The steps below should probably be refined.
apt-get install -y python3.9
apt-get update
apt-get install -y python3-pip
python3.9 -m pip install virtualenv
python3.9 -m virtualenv .venv
. .venv/bin/activate
pip3 install pyyaml
export PYTHONPATH="/fuzz-introspector/frontends/python/PyCG"
ARGS="--fuzzer $fuzzer_path"
if [ -n "${PYFUZZPACKAGE-}" ]; then
ARGS="$ARGS --package=${PYFUZZPACKAGE}"
fi
python /fuzz-introspector/frontends/python/main.py $ARGS
ls -la ./
exit 0
fi
# In coverage mode prepend coverage logic to the fuzzer source
if [[ $SANITIZER = *coverage* ]]; then
cat <<EOF > coverage_wrapper.py
###### Coverage stub
import atexit
import coverage
cov = coverage.coverage(data_file='.coverage', cover_pylib=True)
cov.start()
# Register an exit handler that will stop coverage and save the data
def exit_handler():
    cov.stop()
    cov.save()
atexit.register(exit_handler)
####### End of coverage stub
EOF
# Prepend stub and create tmp file
cat coverage_wrapper.py $fuzzer_path > tmp_fuzzer_coverage.py
# Overwrite existing fuzzer with new fuzzer that has stub
mv tmp_fuzzer_coverage.py $fuzzer_path
fi
# If PYSECSAN is enabled, ensure that we can build with it.
if [[ ${ENABLE_PYSECSAN:-"0"} != "0" ]];
then
# Make sure pysecsan is installed
if [[ ! -d "/pysecsan" ]];
then
pushd /usr/local/lib/sanitizers/pysecsan
python3 -m pip install .
popd
fi
cat <<EOF > pysecsan_wrapper.py
import pysecsan; pysecsan.add_hooks();
EOF
# Prepend stub and create tmp file
cat pysecsan_wrapper.py $fuzzer_path > tmp_fuzzer_pysecsan.py
# Overwrite existing fuzzer with new fuzzer that has stub
mv tmp_fuzzer_pysecsan.py $fuzzer_path
fi
rm -rf $PYFUZZ_WORKPATH
mkdir $PYFUZZ_WORKPATH $FUZZ_WORKPATH
pyinstaller --distpath $OUT --workpath=$FUZZ_WORKPATH --onefile --name $fuzzer_package "$@" $fuzzer_path
# Remove the executable bit from the package, as OSS-Fuzz uses executable
# bits to identify fuzz targets. The wrapper script below re-enables the
# executable bit.
chmod -x $OUT/$fuzzer_package
# In coverage mode save source files of dependencies in pyinstalled binary
if [[ $SANITIZER = *coverage* ]]; then
rm -rf /medio/
python3 /usr/local/bin/python_coverage_helper.py $FUZZ_WORKPATH "/medio"
zip -r $fuzzer_package.deps.zip /medio
mv $fuzzer_package.deps.zip $OUT/
fi
# Create execution wrapper.
echo "#!/bin/sh
# LLVMFuzzerTestOneInput for fuzzer detection.
this_dir=\$(dirname \"\$0\")
chmod +x \$this_dir/$fuzzer_package
LD_PRELOAD=\$this_dir/sanitizer_with_fuzzer.so \
ASAN_OPTIONS=\$ASAN_OPTIONS:symbolize=1:external_symbolizer_path=\$this_dir/llvm-symbolizer:detect_leaks=0 \
\$this_dir/$fuzzer_package \$@" > $OUT/$fuzzer_basename
chmod +x $OUT/$fuzzer_basename
================================================
FILE: infra/base-images/base-builder/debug_afl
================================================
#!/bin/bash
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Source this file for afl++ debug sessions.
apt-get update
apt-get install -y strace gdb vim joe psmisc
pushd $SRC/aflplusplus > /dev/null
git checkout dev
git pull
test -n "$1" && { git checkout "$1" ; git pull ; }
CFLAGS_SAVE="$CFLAGS"
CXXFLAGS_SAVE="$CXXFLAGS"
unset CFLAGS
unset CXXFLAGS
make
export CFLAGS="$CFLAGS_SAVE"
export CXXFLAGS="$CXXFLAGS_SAVE"
popd > /dev/null
export ASAN_OPTIONS="detect_leaks=0:symbolize=0:detect_odr_violation=0:abort_on_error=1"
export AFL_LLVM_LAF_ALL=1
export AFL_LLVM_CMPLOG=1
touch "$OUT/afl_cmplog.txt"
export AFL_LLVM_DICT2FILE=$OUT/afl++.dict
ulimit -c unlimited
================================================
FILE: infra/base-images/base-builder/detect_repo.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Module to get the name of a git repo containing a specific commit
inside of an OSS-Fuzz project.
Example Usage:
  python detect_repo.py --src_dir /src --example_commit
    b534f03eecd8a109db2b085ab24d419b6486de97
Prints the location of the git remote repo as well as the repo's name
separated by a space.
  https://github.com/VirusTotal/yara.git yara
"""
import argparse
import logging
import os
import subprocess
GO_PATH = '/root/go/src/'
def main():
  """Function to get a git repo's url and name referenced by OSS-Fuzz
  Dockerfile.
  Raises:
    ValueError: when neither an example commit nor a repo name is provided.
  """
  parser = argparse.ArgumentParser(
      description=
      'Finds a specific git repo in an oss-fuzz project\'s docker file.')
  parser.add_argument('--repo_name', help='The name of the git repo.')
  parser.add_argument('--src_dir', help='The location of the possible repo.')
  parser.add_argument('--example_commit',
                      help='A commit SHA referencing the project\'s main repo.')
  args = parser.parse_args()
  if not args.repo_name and not args.example_commit:
    raise ValueError(
        'Requires an example commit or a repo name to find repo location.')
  if args.src_dir:
    src_dir = args.src_dir
  else:
    src_dir = os.environ.get('SRC', '/src')
  for single_dir in get_dirs_to_search(src_dir, args.repo_name):
    full_path = os.path.join(src_dir, single_dir)
    if not os.path.isdir(full_path):
      continue
    if args.example_commit and check_for_commit(full_path, args.example_commit):
      print('Detected repo:', get_repo(full_path), full_path)
      return
    if args.repo_name and check_for_repo_name(full_path, args.repo_name):
      print('Detected repo:', get_repo(full_path), full_path)
      return
  logging.error('No git repos with specific commit: %s found in %s',
                args.example_commit, src_dir)
def get_dirs_to_search(src_dir, repo_name):
  """Gets a list of directories to search for the main git repo.
  Args:
    src_dir: The location set for the project's SRC.
    repo_name: The name of the repo you are searching for.
  Returns:
    A list of directories to search.
  """
  dirs_to_search = os.listdir(src_dir)
  if os.path.exists(GO_PATH) and repo_name:
    for root, dirs, _ in os.walk(GO_PATH):
      for test_dir in dirs:
        if repo_name in test_dir:
          dirs_to_search.append(os.path.join(root, test_dir))
  return dirs_to_search
def get_repo(repo_path):
  """Gets a git repo link from a specific directory in a docker image.
  Args:
    repo_path: The directory on the image where the git repo exists.
  Returns:
    The repo location or None.
  """
  output, return_code = execute(['git', 'config', '--get', 'remote.origin.url'],
                                location=repo_path,
                                check_result=True)
  if return_code == 0 and output:
    return output.rstrip()
  return None
def check_for_repo_name(repo_path, expected_repo_name):
  """Returns True if the name of the repo at |repo_path| matches
  |expected_repo_name|.
  Args:
    repo_path: The directory of a git repo.
    expected_repo_name: The name of the target git repo.
  """
  if not os.path.exists(os.path.join(repo_path, '.git')):
    return False
  repo_url, _ = execute(['git', 'config', '--get', 'remote.origin.url'],
                        location=repo_path)
  # Handle two common cases:
  # https://github.com/google/syzkaller/
  # https://github.com/google/syzkaller.git
  repo_url = repo_url.replace('.git', '').rstrip().rstrip('/')
  actual_repo_name = repo_url.split('/')[-1]
  return actual_repo_name == expected_repo_name
def check_for_commit(repo_path, commit):
  """Checks a directory for a specific commit.
  Args:
    repo_path: The name of the directory to test for the commit.
    commit: The commit SHA to check for.
  Returns:
    True if directory contains that commit.
  """
  # Check if valid git repo.
  if not os.path.exists(os.path.join(repo_path, '.git')):
    return False
  # Check if history fetch is needed.
  if os.path.exists(os.path.join(repo_path, '.git', 'shallow')):
    execute(['git', 'fetch', '--unshallow'], location=repo_path)
  # Check if commit is in history.
  _, return_code = execute(['git', 'cat-file', '-e', commit],
                           location=repo_path)
  return return_code == 0
def execute(command, location, check_result=False):
  """Runs a shell command in the specified directory location.
  Args:
    command: The command as a list to be run.
    location: The directory the command is run in.
    check_result: Should an exception be thrown on failed command.
  Returns:
    The stdout of the command, the error code.
  Raises:
    RuntimeError: running a command resulted in an error.
  """
  process = subprocess.Popen(command, stdout=subprocess.PIPE, cwd=location)
  output, err = process.communicate()
  if check_result and (process.returncode or err):
    raise RuntimeError(
        'Error: %s\n running command: %s\n return code: %s\n out %s\n' %
        (err, command, process.returncode, output))
  if output is not None:
    output = output.decode('ascii')
  return output, process.returncode
if __name__ == '__main__':
  main()
================================================
FILE: infra/base-images/base-builder/detect_repo_test.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Test the functionality of the detect_repo module.
This will consist of the following functional tests:
  1. Determine if an OSS-Fuzz project's main repo can be detected from example
  commits.
  2. Determine if an OSS-Fuzz project's main repo can be detected from a
  repo name.
"""
import os
import re
import sys
import tempfile
import unittest
from unittest import mock
import detect_repo
# Appending to path for access to repo_manager module.
# pylint: disable=wrong-import-position
sys.path.append(
os.path.dirname(os.path.dirname(os.path.dirname(
os.path.abspath(__file__)))))
import repo_manager
import test_repos
# pylint: enable=wrong-import-position
class TestCheckForRepoName(unittest.TestCase):
  """Tests for check_for_repo_name."""
  @mock.patch('os.path.exists', return_value=True)
  @mock.patch('detect_repo.execute',
              return_value=('https://github.com/google/syzkaller/', None))
  def test_go_get_style_url(self, _, __):
    """Tests that check_for_repo_name works on repos that were downloaded using
    go get."""
    self.assertTrue(detect_repo.check_for_repo_name('fake-path', 'syzkaller'))
  @mock.patch('os.path.exists', return_value=True)
  @mock.patch('detect_repo.execute',
              return_value=('https://github.com/google/syzkaller', None))
  def test_missing_git_and_slash_url(self, _, __):
    """Tests that check_for_repo_name works on repos whose URLs do not end in
    ".git" or "/"."""
    self.assertTrue(detect_repo.check_for_repo_name('fake-path', 'syzkaller'))
  @mock.patch('os.path.exists', return_value=True)
  @mock.patch('detect_repo.execute',
              return_value=('https://github.com/google/syzkaller.git', None))
  def test_normal_style_repo_url(self, _, __):
    """Tests that check_for_repo_name works on normally cloned repos."""
    self.assertTrue(detect_repo.check_for_repo_name('fake-path', 'syzkaller'))
@unittest.skipIf(not os.getenv('INTEGRATION_TESTS'),
                 'INTEGRATION_TESTS=1 not set')
class DetectRepoIntegrationTest(unittest.TestCase):
  """Class to test the functionality of the detect_repo module."""
  def test_infer_main_repo_from_commit(self):
    """Tests that the main repo can be inferred based on an example commit."""
    with tempfile.TemporaryDirectory() as tmp_dir:
      # Construct example repos to check for commits.
      for test_repo in test_repos.TEST_REPOS:
        repo_manager.clone_repo_and_get_manager(test_repo.git_url, tmp_dir)
        self.check_with_repo(test_repo.git_url,
                             test_repo.git_repo_name,
                             tmp_dir,
                             commit=test_repo.old_commit)
  def test_infer_main_repo_from_name(self):
    """Tests that the main project repo can be inferred from a repo name."""
    with tempfile.TemporaryDirectory() as tmp_dir:
      for test_repo in test_repos.TEST_REPOS:
        repo_manager.clone_repo_and_get_manager(test_repo.git_url, tmp_dir)
        self.check_with_repo(test_repo.git_url, test_repo.git_repo_name,
                             tmp_dir)
  def check_with_repo(self, repo_origin, repo_name, tmp_dir, commit=None):
    """Checks the detect repo's main method for a specific set of inputs.
    Args:
      repo_origin: URL of the git repo.
      repo_name: The name of the directory it is cloned to.
      tmp_dir: The location of the directory of git repos to be searched.
      commit: The commit that should be used to look up the repo.
    """
    command = ['python3', 'detect_repo.py', '--src_dir', tmp_dir]
    if commit:
      command += ['--example_commit', commit]
    else:
      command += ['--repo_name', repo_name]
    out, _ = detect_repo.execute(command,
                                 location=os.path.dirname(
                                     os.path.realpath(__file__)))
    match = re.search(r'\bDetected repo: ([^ ]+) ([^ ]+)', out.rstrip())
    if match and match.group(1) and match.group(2):
      self.assertEqual(match.group(1), repo_origin)
      self.assertEqual(match.group(2), os.path.join(tmp_dir, repo_name))
    else:
      self.assertIsNone(repo_origin)
      self.assertIsNone(repo_name)
if __name__ == '__main__':
unittest.main()
================================================
FILE: infra/base-images/base-builder/go_utils.sh
================================================
#!/bin/bash -eu
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Adds a fuzzer to a JSON list stored in $OUT
# so we can easily check later whether a fuzzer
# is a std lib fuzzer.
add_to_list_of_native_fuzzers() {
local new_element="$1"
local file="$OUT/native_go_fuzzers.json"
if [ -z "$new_element" ]; then
echo "Usage: add_to_list_of_native_fuzzers \"element to add\""
return 1
fi
# Ensure the directory exists
if [ ! -d "$(dirname "$file")" ]; then
echo "Error: Directory $(dirname "$file") does not exist."
return 1
fi
# Initialize the file if it doesn't exist or is empty
if [ ! -s "$file" ]; then
echo "[]" > "$file"
fi
# Append the new element to the list using jq
jq --arg item "$new_element" '. += [$item]' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
# Save a key-value pair to a JSON file. We use this to
# store the fuzzer function name with the fuzzer
# executable name; we need the function name in the
# coverage build.
save_function_name() {
local key="$1"
local value="$2"
local file="$3"
if [ -z "$key" ] || [ -z "$value" ] || [ -z "$file" ]; then
echo "Usage: save_function_name <key> <value> <file>"
return 1
fi
# If file doesn't exist or is empty, initialize it as empty object
if [ ! -s "$file" ]; then
echo "{}" > "$file"
fi
# Update or add the key-value pair
jq --arg k "$key" --arg v "$value" '.[$k] = $v' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
function build_native_go_fuzzer_legacy() {
fuzzer=$1
function=$2
path=$3
tags="-tags gofuzz"
if [[ $SANITIZER == *coverage* ]]; then
current_dir=$(pwd)
mkdir $OUT/rawfuzzers || true
cd $abs_file_dir
go test $tags -c -run $fuzzer -o $OUT/$fuzzer -cover
cp "${fuzzer_filename}" "${OUT}/rawfuzzers/${fuzzer}"
fuzzed_repo=$(go list $tags -f {{.Module}} "$path")
abspath_repo=`go list -m $tags -f {{.Dir}} $fuzzed_repo || go list $tags -f {{.Dir}} $fuzzed_repo`
# give equivalence to absolute paths in another file, as go test -cover uses golangish pkg.Dir
echo "s=$fuzzed_repo"="$abspath_repo"= > $OUT/$fuzzer.gocovpath
cd $current_dir
else
go-118-fuzz-build $tags -o $fuzzer.a -func $function $abs_file_dir
$CXX $CXXFLAGS $LIB_FUZZING_ENGINE $fuzzer.a -o $OUT/$fuzzer
fi
}
function build_native_go_fuzzer() {
fuzzer=$1
function=$2
abs_path=$3
package_path=$4
tags="-tags gofuzz"
if [[ $SANITIZER == *coverage* ]]; then
function_names_file="$OUT/fuzzer_function_names.json"
# Save the current dir to return later
current_dir=$(pwd)
fuzzed_repo=$(go list $tags -f {{.Module}} "$abs_path")
cd $abs_file_dir
go test $tags \
-c \
-o "$OUT/$fuzzer" \
-coverpkg="$fuzzed_repo/..." \
-covermode=atomic \
"$package_path"
save_function_name "$fuzzer" "$function" "$function_names_file"
abspath_repo=`go list -m $tags -f {{.Dir}} $fuzzed_repo || go list $tags -f {{.Dir}} $fuzzed_repo`
# give equivalence to absolute paths in another file, as go test -cover uses golangish pkg.Dir
echo "s=$fuzzed_repo"="$abspath_repo"= > $OUT/$fuzzer.gocovpath
add_to_list_of_native_fuzzers "${fuzzer}"
# Store the function signature in $OUT/fuzzer-parameters.json
# so we can read it when running helper.py coverage. We need
# this to convert the corpus into a format the test can read.
convertLibFuzzerTestcaseToStdLibGo \
-write-params \
-file $fuzzer_filename \
-fuzzer-func $function \
-fuzzerBinaryName $fuzzer \
-json-out $OUT/fuzzer-parameters.json
cd $current_dir
else
go-118-fuzz-build_v2 $tags -o $fuzzer.a -func $function $abs_file_dir
$CXX $CXXFLAGS $LIB_FUZZING_ENGINE $fuzzer.a -o $OUT/$fuzzer
fi
}
================================================
FILE: infra/base-images/base-builder/indexer/README.md
================================================
# Indexer snapshot builds
This directory provides the tooling to build "indexed" OSS-Fuzz builds:
snapshots that bundle the binary, its source code, and a source code index.
Snapshots are also built by our infrastructure and available at
`gs://clusterfuzz-builds/indexer_indexes`.
## Building project snapshots
```bash
python infra/helper.py build_image
python infra/helper.py index
# Only build snapshots for `target1` and `target2`.
python infra/helper.py index --targets 'target1,target2'
# Drop into /bin/bash instead of automatically running /opt/indexer/index_build.py.
python infra/helper.py index --shell
# Add additional docker args.
python infra/helper.py index --docker_arg="-eFOO=123" --docker_arg="-eBAR=456"
# Pass through flags to the entrypoint.
python infra/helper.py index -- --target-args '123'
```
The resulting snapshots will be found in `/build/out/`.
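
As a rough illustration of what the `--targets` allowlist means, the sketch below mirrors the `INDEXER_TARGETS` handling in `clang_wrapper.py`: the value is split on commas, empty entries are dropped, and a linker output is indexed only if its file name is in the list. With no allowlist, only outputs that link a fuzzing engine are indexed. That `--targets` is forwarded as `INDEXER_TARGETS` is an assumption here, and `parse_targets` and `should_index` are hypothetical helper names, not part of the tooling.

```python
from pathlib import Path


def parse_targets(raw: str) -> list[str]:
    """Splits a comma-separated allowlist, dropping empty entries."""
    return [t for t in raw.split(",") if t]


def should_index(
    output_file: Path, targets: list[str], links_fuzzing_engine: bool
) -> bool:
    """Decides whether a linker output would be indexed (hypothetical helper)."""
    if targets:
        # With an allowlist, only the listed binary names are indexed.
        return output_file.name in targets
    # Without an allowlist, only fuzz targets are indexed.
    return links_fuzzing_engine
```

Note that the match is on the file name only, so `/out/target1` and `/work/target1` are treated the same in this sketch.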
## Development
For faster local development on the scripts in this directory, pass `--dev` to
the index command. This mounts everything inside this directory over the
scripts in the base image.
If the `indexer` binary does not exist, a prebuilt binary is
[downloaded](https://clusterfuzz-builds.storage.googleapis.com/oss-fuzz-artifacts/indexer).
```bash
cd $OSS_FUZZ_CHECKOUT
python infra/helper.py index --dev
```
The resulting snapshots will be found in `/build/out/`.
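
When iterating on `clang_wrapper.py` with `--dev`, it can help to know how compile command fragments are stitched back together. A minimal sketch, modeled on `read_cdb_fragments` in that file: each fragment file holds one or more JSON objects, each followed by `,\n`, and a file that does not end in that delimiter is treated as incomplete. The function names below are hypothetical, and the retry logic of the real wrapper is left out.

```python
import json

# Delimiter that terminates every entry in a fragment file.
CDB_FRAGMENT_DELIMITER = ",\n"


def split_fragment(data: str) -> list[str]:
    """Returns the raw JSON entries of one fragment file."""
    if not data.endswith(CDB_FRAGMENT_DELIMITER):
        raise ValueError("incomplete CDB fragment")
    # The trailing delimiter leaves an empty last element; drop it.
    return data.split(CDB_FRAGMENT_DELIMITER)[:-1]


def join_fragments(files: list[str]) -> list[dict]:
    """Combines the contents of several fragment files into one JSON array."""
    entries: list[str] = []
    for data in files:
        entries.extend(split_fragment(data))
    return json.loads("[" + CDB_FRAGMENT_DELIMITER.join(entries) + "]")
```

Splitting on the delimiter assumes that no entry contains `,\n` itself, which holds for compact single-line JSON objects.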
## Testing
We have some basic tests to make sure the snapshots we're generating have the correct format.
For this to work, you need to run `python infra/helper.py index --dev` at least
once, or make sure you have the `indexer` binary in this directory.
```bash
sudo INDEX_BUILD_TESTS=1 python3 -m unittest index_build_test
```
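
The `INDEX_BUILD_TESTS=1` prefix suggests the tests are gated on an environment variable. A sketch of that pattern, assuming it follows the same `unittest.skipIf` idiom used for `INTEGRATION_TESTS` elsewhere in this repository. `ExampleGatedTest` is a hypothetical stand-in, not the real `index_build_test` class.

```python
import os
import unittest


@unittest.skipIf(
    not os.getenv("INDEX_BUILD_TESTS"), "INDEX_BUILD_TESTS=1 not set"
)
class ExampleGatedTest(unittest.TestCase):
    """Hypothetical test class that only runs when the variable is set."""

    def test_placeholder(self):
        self.assertTrue(True)


def run_gated_example() -> unittest.TestResult:
    """Runs the example class and returns the result for inspection."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleGatedTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the decorator is evaluated when the class is defined, the variable has to be set before the test module is imported, which is why it is passed on the command line.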
================================================
FILE: infra/base-images/base-builder/indexer/clang_wrapper.py
================================================
#!/usr/bin/env python3
# Copyright 2026 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Compiler Wrapper.
This is copied into the OSS-Fuzz container image and run there as part of the
instrumentation process.
"""
from collections.abc import Iterator, MutableSequence, Sequence
import contextlib
import dataclasses
import fcntl
import hashlib
import json
import os
from pathlib import Path # pylint: disable=g-importing-member
import shlex
import shutil
import subprocess
import sys
import time
from typing import Any, Iterable, Set
import dwarf_info
import index_build
_LLVM_READELF_PATH = "/usr/local/bin/llvm-readelf"
_INDEXER_PATH = "/opt/indexer/indexer"
_IGNORED_DEPS_PATH = os.path.join(
os.path.dirname(_INDEXER_PATH), "ignored_deps.json"
)
_INTERNAL_PATHS = ("/src/llvm-project/",)
# When we notice a project using these flags,
# we should figure out how to handle them.
_DISALLOWED_CLANG_FLAGS = (
"-fdebug-prefix-map=",
"-ffile-prefix-map=",
)
# Chromium GN builds use these flags with a period to make paths relative to
# the out directory. This is OK.
_ALLOWED_CLANG_FLAGS_ONLY_WITH_PERIOD = (
"-fdebug-compilation-dir=",
"-ffile-compilation-dir=",
)
_IGNORED_FILES = (
# This file seems to cause a crash in the indexer, as well as performance
# issues.
"simdutf.cpp",
)
_INDEXER_THREADS_PER_MERGE_QUEUE = 16
_INDEXER_PER_THREAD_MEMORY = 2 * 1024**3 # 2 GiB
_CDB_FRAGMENT_DELIMITER = ",\n"
SRC = Path(os.getenv("SRC", "/src"))
# On OSS-Fuzz build infra, $OUT is not /out.
OUT = Path(os.getenv("OUT", "/out"))
INDEXES_PATH = Path(os.getenv("INDEXES_PATH", "/indexes"))
FUZZER_ENGINE = os.getenv("LIB_FUZZING_ENGINE", "/usr/lib/libFuzzingEngine.a")
def _get_available_memory() -> int:
"""Returns the available memory in bytes."""
with open("/proc/meminfo", "r") as f:
for line in f:
if line.startswith("MemAvailable:"):
return int(line.split()[1]) * 1024
raise RuntimeError("Failed to get available memory")
def rewrite_argv0(argv: Sequence[str], clang_toolchain: str) -> list[str]:
"""Rewrite argv[0] to point to the real clang location."""
# We do this because we've set PATH to our wrapper.
rewritten = [os.path.join(clang_toolchain, "bin", os.path.basename(argv[0]))]
rewritten.extend(argv[1:])
return rewritten
def execute(argv: Sequence[str], clang_toolchain: str) -> None:
argv = rewrite_argv0(argv, clang_toolchain)
print("About to execute...", argv)
os.execv(argv[0], tuple(argv))
def run(argv: Sequence[str], clang_toolchain: str) -> None:
argv = rewrite_argv0(argv, clang_toolchain)
print("About to run...", argv)
ret = subprocess.run(argv, check=False)
if ret.returncode != 0:
sys.exit(ret.returncode)
def sha256(file: Path) -> str:
hash_value = hashlib.sha256()
with open(file, "rb") as f:
# We can't use hashlib.file_digest here because OSS-Fuzz is still on
# Python 3.10.
for chunk in iter(lambda: f.read(4096), b""):
hash_value.update(chunk)
return hash_value.hexdigest()
def get_flag_value(argv: Sequence[str], flag: str) -> str | None:
for i in range(len(argv) - 1):
if argv[i] == flag:
return argv[i + 1]
elif flag == "-o" and argv[i].startswith(flag):
return argv[i][2:]
return None
def remove_flag_if_present(argv: Iterable[str], flag: str) -> list[str]:
return [arg for arg in argv if arg != flag]
def remove_flag_and_value(argv: list[str], flag: str) -> list[str]:
"""Removes a flag and its value (as a separate token; --a=b is not supported)."""
for i in range(len(argv) - 1):
if argv[i] == flag:
return argv[:i] + argv[i + 2 :]
return argv
def parse_dependency_file(
file_path: Path, output_file: Path, ignored_deps: frozenset[str]
) -> Sequence[str]:
"""Parses the dependency file generated by the linker."""
output_file = output_file.resolve()
with file_path.open("r") as f:
lines = [line.strip() for line in f]
# The first line should have the format "/path/to/file: \"
# Make sure the binary name matches.
if output_file.name != Path(lines[0].split(":")[0].strip()).name:
raise RuntimeError(
f"dependency file has invalid first line: {lines[0]}. "
f"Expected to see {output_file.name}."
)
deps = []
ignored_dep_paths = ["/usr", "/clang", "/lib"]
for line in lines[1:]:
if not line:
break
if line.endswith(" \\"):
line = line[:-2]
dep = os.path.realpath(line)
# We don't care about system-wide dependencies.
if any([True for p in ignored_dep_paths if dep.startswith(p)]):
continue
if dep in ignored_deps:
continue
deps.append(dep)
return deps
def files_by_creation_time(folder_path: Path) -> Sequence[Path]:
files = [path for path in folder_path.iterdir() if path.is_file()]
files.sort(key=os.path.getctime)
return files
def _wait_for_cdb_fragment(file: Path) -> Sequence[str]:
"""Returns the CDB fragment from the given file, waiting if needed."""
num_retries = 3
for i in range(1 + num_retries):
data = file.read_text()
if data.endswith(_CDB_FRAGMENT_DELIMITER):
return data.split(_CDB_FRAGMENT_DELIMITER)[:-1]
if i < num_retries:
print(
f"WARNING: CDB fragment {file} appears to be invalid: {data}, "
f"sleeping for 2^{i+1} seconds before retrying.",
file=sys.stderr,
)
time.sleep(2 ** (i + 1))
else:
error = f"CDB fragment {file} is invalid even after retries: {data}"
if "test.c" in file.name or "conftest.c" in file.name:
# Some build systems seem to have a weird issue where the autotools
# generated `test.c` or `conftest.c` for testing compilers doesn't
# result in valid cdb fragments.
print(f"WARNING: {error}", file=sys.stderr)
else:
raise RuntimeError(error)
return ()
def read_cdb_fragments(cdb_path: Path) -> Any:
"""Iterates through the CDB fragments to reconstruct the compile commands."""
files = files_by_creation_time(cdb_path)
contents = []
for file in files:
# Don't read previously generated linker commands files.
if file.name.endswith("_linker_commands.json"):
continue
if not file.name.endswith(".json"):
continue
fragments = _wait_for_cdb_fragment(file)
contents.extend(fragments)
contents = _CDB_FRAGMENT_DELIMITER.join(contents)
contents = "[" + contents + "]"
return json.loads(contents)
def _index_dir_path(output_file: Path) -> Path:
"""Returns the path to the index directory for the given output binary."""
# This mirrors the absolute path of the output file.
absolute_path = (Path(os.getcwd()) / output_file).resolve()
return INDEXES_PATH / absolute_path.relative_to("/")
def run_indexer(
output_file: Path,
build_id: str,
linker_commands: dict[str, Any],
allow_errors: bool = False,
):
"""Run the indexer."""
# Use a build-specific compile commands directory, since there could be
# parallel linking happening at the same time.
compile_commands_dir = INDEXES_PATH / f"compile_commands_{build_id}"
try:
compile_commands_dir.mkdir(exist_ok=False)
except FileExistsError:
# Somehow we've already seen this link command, don't try to redo the
# indexing.
# TODO: check if this is the safest behaviour.
print(
f"WARNING: Compile commands directory {compile_commands_dir} "
"already created.",
file=sys.stderr,
)
return
# Indexes can be built incrementally, so use the same directory for each
# output binary.
index_dir = _index_dir_path(output_file)
index_dir.mkdir(parents=True, exist_ok=True)
# Symlink by build ID, because `index_build.py` relies on build IDs to match
# the binaries (which may have moved around) to indexes.
build_id_symlink = INDEXES_PATH / build_id
if not build_id_symlink.exists():
os.symlink(index_dir, build_id_symlink)
if not linker_commands["compile_commands"]:
# Nothing to index.
return
with (compile_commands_dir / "compile_commands.json").open("wt") as f:
json.dump(linker_commands["compile_commands"], f, indent=2)
with (compile_commands_dir / "full_compile_commands.json").open("wt") as f:
json.dump(linker_commands["full_compile_commands"], f, indent=2)
# Auto-tune the number of threads and merge queues according to the number
# of cores and available memory.
# Note: this might require further tuning -- this might not work well if there
# are multiple binaries being linked/indexed at the same time.
num_cores = len(os.sched_getaffinity(0))
num_threads = max(
1, min(_get_available_memory() // _INDEXER_PER_THREAD_MEMORY, num_cores)
)
merge_queues = max(1, num_threads // _INDEXER_THREADS_PER_MERGE_QUEUE)
# TODO: b/447468859 - Use database_only once users are ready.
cmd = [
_INDEXER_PATH,
"--build_dir",
compile_commands_dir,
"--index_dir",
index_dir.as_posix(),
"--source_dir",
SRC.as_posix(),
"--index_threads",
str(num_threads),
"--merge_queues",
str(merge_queues),
]
if (index_dir / "db.sqlite").exists():
cmd.append("--delta")
if allow_errors:
cmd.append("--ignore_indexing_errors")
result = subprocess.run(cmd, check=False, capture_output=True)
if result.returncode != 0:
raise RuntimeError(
"Running indexer failed\n"
f"stdout:\n```\n{result.stdout.decode()}\n```\n"
f"stderr:\n```\n{result.stderr.decode()}\n```\n"
)
def check_fuzzing_engine_and_fix_argv(argv: MutableSequence[str]) -> bool:
"""Check if this command is linking in a fuzzing engine."""
# Also fix up incorrect link flags so we link in the correct fuzzing
# engine.
fuzzing_engine_in_argv = False
idx = 0
for arg in argv[:]:
if arg == "-fsanitize=fuzzer":
argv[idx] = "-lFuzzingEngine"
fuzzing_engine_in_argv = True
elif arg == "-fsanitize=fuzzer-no-link":
argv.remove("-fsanitize=fuzzer-no-link")
idx -= 1
elif arg.startswith("-fsanitize="):
# This could be -fsanitize=address,fuzzer.
sanitize_vals = arg.split("=")[1].split(",")
if "fuzzer" in sanitize_vals:
sanitize_vals.remove("fuzzer")
arg = "-fsanitize=" + ",".join(sanitize_vals)
fuzzing_engine_in_argv = True
elif "fuzzer-no-link" in sanitize_vals:
sanitize_vals.remove("fuzzer-no-link")
arg = "-fsanitize=" + ",".join(sanitize_vals)
argv[idx] = arg
if fuzzing_engine_in_argv:
idx += 1
argv.insert(idx, "-lFuzzingEngine")
idx += 1
if "libFuzzingEngine.a" in arg or "-lFuzzingEngine" in arg:
fuzzing_engine_in_argv = True
return fuzzing_engine_in_argv
def _has_disallowed_clang_flags(argv: Sequence[str]) -> bool:
"""Checks if the command line arguments contain disallowed flags."""
if any(arg.startswith(_DISALLOWED_CLANG_FLAGS) for arg in argv):
return True
if any(
arg.startswith(_ALLOWED_CLANG_FLAGS_ONLY_WITH_PERIOD)
and not arg.endswith("=.")
for arg in argv
):
return True
return False
@dataclasses.dataclass(frozen=True)
class FilteredCompileCommands:
filtered_compile_commands: Sequence[dict[str, str]]
unused_cu_paths: Set[Path]
unused_cc_paths: Set[Path]
def _filter_compile_commands(
elf_path: Path, compile_commands: Sequence[dict[str, str]]
) -> FilteredCompileCommands:
"""Keeps only the compile commands that match a DWARF compilation unit.
Args:
elf_path: The path to the ELF file.
compile_commands: The compile commands to filter.
Returns:
The filtered compile commands.
"""
compilation_units = dwarf_info.get_all_compilation_units(elf_path)
cu_paths = set([Path(cu.compdir) / cu.name for cu in compilation_units])
used_cu_paths = set()
filtered_compile_commands = []
unused_cc_paths = set()
for compile_command in compile_commands:
if (
"-ffile-compilation-dir=." in compile_command["arguments"]
or "-fdebug-compilation-dir=." in compile_command["arguments"]
):
# Handle build systems that make their debug paths relative.
directory = Path(".")
else:
directory = Path(compile_command["directory"])
cc_path = Path(directory / compile_command["file"])
if cc_path in cu_paths and cc_path.name not in _IGNORED_FILES:
filtered_compile_commands.append(compile_command)
used_cu_paths.add(cc_path)
else:
unused_cc_paths.add(cc_path)
unused_cu_paths = cu_paths - used_cu_paths
return FilteredCompileCommands(
filtered_compile_commands=filtered_compile_commands,
unused_cu_paths=unused_cu_paths,
unused_cc_paths=unused_cc_paths,
)
def _write_filter_log(
filter_log_file: Path,
filtered_compile_commands: FilteredCompileCommands,
) -> None:
"""Writes the filter log file."""
with open(filter_log_file, "wt") as f:
f.write("The following files were not used in the final binary:\n")
for cc_path in sorted(filtered_compile_commands.unused_cc_paths):
f.write(f"\t{cc_path}\n")
f.write(
"The following compilation units were not matched with any compile"
" commands:\n"
)
for cu_path in sorted(filtered_compile_commands.unused_cu_paths):
if cu_path.as_posix().startswith(_INTERNAL_PATHS):
continue
f.write(f"\t{cu_path}\n")
def expand_rsp_file(argv: Sequence[str]) -> list[str]:
# https://llvm.org/docs/CommandLine.html#response-files
expanded = []
for arg in argv:
if arg.startswith("@"):
with open(arg[1:], "r") as f:
expanded_args = shlex.split(f.read())
expanded.extend(expanded_args)
else:
expanded.append(arg)
return expanded
def force_optimization_flag(argv: Sequence[str]) -> list[str]:
"""Forces -O0 in the given argument list."""
args = []
for arg in argv:
if arg.startswith("-O") and arg != "-O0":
arg = "-O0"
args.append(arg)
return args
def fix_coverage_flags(
argv: Sequence[str], expected_coverage_flags: str
) -> list[str]:
"""Makes sure that the right coverage flags are set."""
args = []
for arg in argv:
# Some projects use -fsanitize-coverage-allowlist/ignorelist to optimize
# fuzzing feedback. For the indexer case, we would prefer to have all code
# instrumented, so we remove these flags.
# Some projects hardcode -fsanitize-coverage= options that cause conflicts
# with our indexer / tracer options.
if (arg.startswith("-fsanitize-coverage-allowlist=") or
arg.startswith("-fsanitize-coverage-ignorelist=") or
arg.startswith("-fsanitize-coverage=")):
continue
args.append(arg)
args.append(expected_coverage_flags)
return args
@contextlib.contextmanager
def _file_lock(lock_path: Path):
"""Context manager for acquiring an exclusive file lock."""
fd = os.open(lock_path.as_posix(), os.O_CREAT | os.O_RDWR)
fcntl.flock(fd, fcntl.LOCK_EX)
try:
yield
finally:
fcntl.flock(fd, fcntl.LOCK_UN)
os.close(fd)
def merge_incremental_cdb(cdb_path: Path, merged_cdb_path: Path) -> None:
"""Merges new CDB fragments into the incremental CDB."""
# Map of output file to the path of the file in the incremental CDB.
# Use the output file path as the key for merging.
existing_output_files: dict[Path, Path] = {}
def load_cdbs(directory: Path) -> Iterator[tuple[Path, dict[str, Any]]]:
for file in directory.iterdir():
if file.suffix != ".json":
continue
if file.name.endswith("_linker_commands.json"):
continue
fragments_data = _wait_for_cdb_fragment(file)
for fragment_data in fragments_data:
fragment = json.loads(fragment_data)
if "output" not in fragment:
continue
yield file, fragment
# We could be running multiple linking steps in parallel, so serialize merges.
with _file_lock(merged_cdb_path / ".lock"):
# Load existing CDB fragments, and build the map of output file -> fragment.
for file, fragment in load_cdbs(merged_cdb_path):
output_path = Path(fragment["directory"]) / fragment["output"]
existing_output_files[output_path] = file
# Load new CDB fragments, replacing existing fragments for the same output
# file.
for file, fragment in load_cdbs(cdb_path):
output_path = Path(fragment["directory"]) / fragment["output"]
if output_path in existing_output_files:
# Remove existing entry for the output file.
os.unlink(existing_output_files[output_path])
del existing_output_files[output_path]
shutil.copy2(file, merged_cdb_path / file.name)
def main(argv: list[str]) -> None:
compile_settings = index_build.read_compile_settings()
argv = expand_rsp_file(argv)
argv = remove_flag_if_present(argv, "-gline-tables-only")
argv = force_optimization_flag(argv)
argv = fix_coverage_flags(argv, compile_settings.coverage_flags)
if _has_disallowed_clang_flags(argv):
raise ValueError("Disallowed clang flags found, aborting.")
# TODO: b/441872725 - Migrate more flags to be appended in the clang wrapper
# instead.
cdb_path = index_build.OUT / "cdb"
argv.extend(("-gen-cdb-fragment-path", cdb_path.as_posix()))
argv.extend((
"-isystem",
(
f"{compile_settings.clang_toolchain}/lib/clang/"
f"{compile_settings.clang_version}"
),
"-resource-dir",
(
f"{compile_settings.clang_toolchain}/lib/clang/"
f"{compile_settings.clang_version}"
),
))
if "-E" in argv:
# Preprocessor-only invocation.
modified_argv = remove_flag_and_value(argv, "-gen-cdb-fragment-path")
execute(modified_argv, compile_settings.clang_toolchain)
fuzzing_engine_in_argv = check_fuzzing_engine_and_fix_argv(argv)
indexer_targets: list[str] = [
t for t in os.getenv("INDEXER_TARGETS", "").split(",") if t
]
# If we are linking, collect the relevant flags and dependencies.
output_file = get_flag_value(argv, "-o")
if not output_file:
execute(argv, compile_settings.clang_toolchain) # Missing output file
output_file = Path(output_file)
if output_file.name.endswith(".o"):
execute(argv, compile_settings.clang_toolchain) # Not a real linker command
if indexer_targets:
if output_file.name not in indexer_targets:
# Not a relevant linker command
print(f"Not indexing as {output_file} is not in the allowlist")
execute(argv, compile_settings.clang_toolchain)
elif not fuzzing_engine_in_argv:
# Not a fuzz target.
execute(argv, compile_settings.clang_toolchain)
print(f"Linking {argv}")
# We can now run the linker and look at the output of some files.
dependency_file = (cdb_path / output_file.name).with_suffix(".deps")
why_extract_file = (cdb_path / output_file.name).with_suffix(".why_extract")
argv.append("-fuse-ld=lld")
argv.append(f"-Wl,--dependency-file={dependency_file}")
argv.append(f"-Wl,--why-extract={why_extract_file}")
argv.append("-Wl,--build-id")
# We force lld, but it doesn't include this dir by default.
argv.append("-L/usr/local/lib")
argv.append("-Qunused-arguments")
if compile_settings.coverage_flags == index_build.TRACING_COVERAGE_FLAGS:
argv.append("/opt/indexer/coverage.o")
run(argv, compile_settings.clang_toolchain)
build_id = index_build.get_build_id(output_file)
assert build_id is not None
output_hash = sha256(output_file)
with open(_IGNORED_DEPS_PATH) as f:
ignored_deps = frozenset(json.load(f)["deps"])
deps = parse_dependency_file(dependency_file, output_file, ignored_deps)
obj_deps = [dep for dep in deps if dep.endswith(".o")]
ar_deps = [dep for dep in deps if dep.endswith(".a") and dep != FUZZER_ENGINE]
archive_deps = []
for archive in ar_deps:
res = subprocess.run(["ar", "-t", archive], capture_output=True, check=True)
archive_deps += [dep.decode() for dep in res.stdout.splitlines()]
# Incremental index building relies on merging all new compilation fragments
# since the initial indexing.
cdb_fragments_dir = cdb_path
if _index_dir_path(output_file).exists():
merge_incremental_cdb(cdb_path, index_build.INCREMENTAL_CDB_PATH)
cdb_fragments_dir = index_build.INCREMENTAL_CDB_PATH
# We only care about the compile commands that emitted an output file.
full_compile_commands = [
cc for cc in read_cdb_fragments(cdb_fragments_dir) if "output" in cc
]
# Discard compile commands that didn't end up in the final binary.
filtered_compile_commands = _filter_compile_commands(
output_file, full_compile_commands
)
linker_commands = {
"output": output_file.as_posix(),
"directory": os.getcwd(),
"deps": obj_deps + archive_deps,
"args": argv,
"sha256": output_hash,
"gnu_build_id": build_id,
"compile_commands": filtered_compile_commands.filtered_compile_commands,
"full_compile_commands": full_compile_commands,
}
filter_log_file = Path(cdb_path) / f"{build_id}_filter_log.txt"
_write_filter_log(filter_log_file, filtered_compile_commands)
if not os.getenv("INDEXER_BINARIES_ONLY"):
is_custom_toolchain = (
compile_settings.clang_toolchain != index_build.DEFAULT_CLANG_TOOLCHAIN
)
run_indexer(
output_file, build_id, linker_commands, allow_errors=is_custom_toolchain
)
linker_commands = json.dumps(linker_commands)
commands_path = Path(cdb_path) / f"{build_id}_linker_commands.json"
commands_path.write_text(linker_commands)
if __name__ == "__main__":
main(sys.argv)
================================================
FILE: infra/base-images/base-builder/indexer/clang_wrapper_test.py
================================================
#!/usr/bin/env python3
# Copyright 2026 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""clang_wrapper tests."""
import json
import pathlib
import clang_wrapper
import unittest
class ClangWrapperTest(unittest.TestCase):
def test_force_optimization_flag_no_optimization(self):
"""Tests that optimization flags are not forced when not present."""
argv = ["clang", "-c", "test.c", "-o", "test.o"]
modified_argv = clang_wrapper.force_optimization_flag(argv)
self.assertCountEqual(modified_argv, argv)
def test_force_optimization_flag(self):
"""Tests that optimization flags are forced when present."""
argv = ["clang", "-O2", "-c", "test.c", "-o", "test.o", "-O1"]
modified_argv = clang_wrapper.force_optimization_flag(argv)
self.assertCountEqual(
modified_argv, ["clang", "-O0", "-c", "test.c", "-o", "test.o", "-O0"]
)
def test_remove_invalid_coverage_flags(self):
"""Tests that invalid coverage flags are removed."""
argv = [
"clang",
"-foo",
"-fsanitize-coverage-allowlist=allowlist",
"-fsanitize-coverage-ignorelist=ignorelist",
"-fsanitize-coverage=edge",
"-c",
"test.c",
]
modified_argv = clang_wrapper.fix_coverage_flags(
argv, "-fsanitize-coverage=bb,no-prune,trace-pc-guard"
)
self.assertCountEqual(
modified_argv,
[
"clang",
"-foo",
"-c",
"test.c",
"-fsanitize-coverage=bb,no-prune,trace-pc-guard",
],
)
def test_merge_incremental_cdb(self):
"""Tests that incremental cdb is merged correctly."""
cdb_path = pathlib.Path(self.create_tempdir().full_path)
merged_cdb_path = pathlib.Path(self.create_tempdir().full_path)
old_cdb_fragments = {
"test.c.123.json": {
"directory": "/build",
"file": "test.c",
"output": "test.o",
"arguments": ["-c", "test.c"],
},
"test.c.455.json": {
"directory": "/build/subdir",
"file": "test.c",
"output": "test.o",
"arguments": ["-c", "test.c"],
},
"foo.c.455.json": {
"directory": "/build",
"file": "foo.c",
"output": "foo.o",
"arguments": ["-c", "foo.c"],
},
"foo.123_linker_commands.json": {"invalid": "foo"},
}
new_cdb_fragments = {
"test.c.aaa.json": [{
"directory": "/build/subdir",
"file": "test.c",
"output": "test.o",
"arguments": ["-c", "test.c"],
}],
"bar.c.bbb.json": [
{
"directory": "/build/subdir",
"file": "bar.c",
"output": "bar.o",
"arguments": ["-c", "bar.c"],
},
{
"directory": "/build/subdir",
"file": "bar2.c",
"output": "bar2.o",
"arguments": ["-c", "bar2.c"],
},
],
}
for cdb_fragment_path, cdb_fragment in old_cdb_fragments.items():
suffix = (
",\n"
if not cdb_fragment_path.endswith("_linker_commands.json")
else ""
)
(merged_cdb_path / cdb_fragment_path).write_text(
json.dumps(cdb_fragment) + suffix
)
for cdb_fragment_path, cdb_fragment in new_cdb_fragments.items():
(cdb_path / cdb_fragment_path).write_text(
",\n".join([json.dumps(frag) for frag in cdb_fragment]) + ",\n"
)
(cdb_path / "not_a_json").write_text("not a json")
clang_wrapper.merge_incremental_cdb(cdb_path, merged_cdb_path)
self.assertCountEqual(
merged_cdb_path.iterdir(),
[
pathlib.Path(merged_cdb_path) / ".lock",
pathlib.Path(merged_cdb_path) / "test.c.123.json",
pathlib.Path(merged_cdb_path) / "test.c.aaa.json",
pathlib.Path(merged_cdb_path) / "foo.c.455.json",
pathlib.Path(merged_cdb_path) / "foo.123_linker_commands.json",
pathlib.Path(merged_cdb_path) / "bar.c.bbb.json",
],
)
def test_merge_incremental_cdb_duplicate_outputs(self):
"""Tests that incremental cdb is merged correctly with duplicate outputs."""
cdb_path = pathlib.Path(self.create_tempdir().full_path)
merged_cdb_path = pathlib.Path(self.create_tempdir().full_path)
fragment1 = {
"directory": "/build",
"file": "test.c",
"output": "test.o",
}
(merged_cdb_path / "1.json").write_text(json.dumps(fragment1) + ",\n")
fragment2 = {
"directory": "/build",
"file": "test.c",
"output": "test.o",
}
(cdb_path / "2.json").write_text(json.dumps(fragment2) + ",\n")
(cdb_path / "3.json").write_text(json.dumps(fragment2) + ",\n")
clang_wrapper.merge_incremental_cdb(cdb_path, merged_cdb_path)
self.assertCountEqual(
merged_cdb_path.iterdir(),
[
merged_cdb_path / ".lock",
merged_cdb_path / "2.json",
merged_cdb_path / "3.json",
],
)
self.assertFalse((merged_cdb_path / "1.json").exists())
if __name__ == "__main__":
unittest.main()
================================================
FILE: infra/base-images/base-builder/indexer/coverage.cc
================================================
// Copyright 2026 Google LLC.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
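#include <fcntl.h>
#include <sanitizer/common_interface_defs.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>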
namespace {
constexpr int kMaxTraceSize = 64 * 1024;
struct CoverageData {
void* pcs[kMaxTraceSize];
size_t idx;
// TODO: b/441647761 - Handle multiple threads.
pid_t main_thread_id;
bool finished;
};
static CoverageData* coverage_data;
bool IsStandardLibrary(const char* file_path) {
return (strstr(file_path, "include/c++/v1") ||
strstr(file_path, "src/libcxx/include") ||
strstr(file_path, "src/libcxx/staging/include") ||
strncmp(file_path, "/usr/include", 12) == 0 ||
strstr(file_path, "libc++/src/include") ||
strstr(file_path, "/absl/"));
}
void WriteTrace() {
coverage_data->finished = true;
char* trace_dump_file = getenv("TRACE_DUMP_FILE");
if (!trace_dump_file) {
return;
}
int fd = open(trace_dump_file, O_WRONLY | O_CREAT | O_TRUNC, 0644);
if (fd == -1) {
perror("open");
exit(EXIT_FAILURE);
}
// TODO: b/441647761 - This format likely needs iteration. This just prints
// symbolized function names, but this could still be ambiguous.
for (size_t i = 0; i < coverage_data->idx; ++i) {
char symbol[1024];
char file_path[1024];
// This always null terminates.
__sanitizer_symbolize_pc(coverage_data->pcs[i], "%f", symbol,
sizeof(symbol));
__sanitizer_symbolize_pc(coverage_data->pcs[i], "%s", file_path,
sizeof(file_path));
if (IsStandardLibrary(file_path)) continue;
write(fd, symbol, strlen(symbol));
write(fd, "\n", 1);
}
close(fd);
}
pid_t GetTID() { return static_cast<pid_t>(syscall(SYS_gettid)); }
void Init() {
coverage_data = static_cast<CoverageData*>(malloc(sizeof(CoverageData)));
coverage_data->finished = false;
coverage_data->idx = 0;
// For now, only record PCs from the main thread.
coverage_data->main_thread_id = GetTID();
// Dump coverage on exit.
atexit(WriteTrace);
__sanitizer_set_death_callback(WriteTrace);
}
} // namespace
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t* start,
uint32_t* stop) {
Init();
static uint32_t N; // Counter for the guards.
if (start == stop || *start) return;
for (uint32_t* x = start; x < stop; x++) *x = ++N;
}
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t* guard) {
thread_local bool in_callback = false;
if (!coverage_data || coverage_data->finished) return;
if (*guard == 0) return;
if (in_callback) return;
in_callback = true;
class ResetInCallback {
public:
~ResetInCallback() { in_callback = false; }
} reset_in_callback;
thread_local pid_t thread_id = GetTID();
if (thread_id != coverage_data->main_thread_id) {
return;
}
if (coverage_data->idx >= kMaxTraceSize) {
return;
}
*guard = 0; // Don't trace the same PC more than once.
coverage_data->pcs[coverage_data->idx++] =
reinterpret_cast<void*>(__builtin_return_address(0));
}
================================================
FILE: infra/base-images/base-builder/indexer/dwarf_info.py
================================================
#!/usr/bin/env python3
# Copyright 2026 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""DWARF info parser for ELF files."""
import dataclasses
import io
import os
from typing import Sequence
from absl import logging
from elftools.elf import elffile
_IGNORED_UNIT_TYPES = ("DW_UT_type", "DW_UT_split_type")
@dataclasses.dataclass
class CompilationUnit:
"""Represents a DWARF compilation unit.
Attributes:
producer: The producer of the compilation unit.
name: The name of the compilation unit.
compdir: The compilation directory of the compilation unit.
language: The language of the compilation unit.
apple_flags: Flags used in the compilation unit (if compiled with `-glldb`).
isysroot: The isysroot of the compilation unit.
"""
producer: str
name: str
compdir: str
language: int
apple_flags: str | None
isysroot: str | None
def get_all_compilation_units(
elf_file_path: os.PathLike[str],
) -> list[CompilationUnit]:
"""Parses compilation units from an ELF file.
Args:
elf_file_path: The path to the ELF file.
Returns:
A list of CompilationUnit objects.
"""
result = []
with open(elf_file_path, "rb") as f:
elf_file = elffile.ELFFile(f)
if not elf_file.has_dwarf_info():
logging.error("No DWARF info found in %s", elf_file_path)
return []
dwarf_info = elf_file.get_dwarf_info()
for compilation_unit in dwarf_info.iter_CUs():
if compilation_unit.header.version < 5:
# Only DWARF5 has a unit_type field in the header.
# For older versions, we do a best effort approach.
logging.warning(
"[!] Compilation Unit with unsupported DWARF version %d",
compilation_unit.header.version,
)
elif compilation_unit.header.unit_type in _IGNORED_UNIT_TYPES:
# Type units are not interesting for us.
continue
elif compilation_unit.header.unit_type not in (
"DW_UT_compile",
"DW_UT_partial",
):
raise ValueError(
"Unsupported DWARF compilation unit type"
f" {compilation_unit.header.unit_type}"
)
top_debug_info_entry = compilation_unit.get_top_DIE()
if top_debug_info_entry.tag != "DW_TAG_compile_unit":
logging.error("Top DIE is not a full compile unit")
producer = top_debug_info_entry.attributes[
"DW_AT_producer"
].value.decode()
name = top_debug_info_entry.attributes["DW_AT_name"].value.decode()
language = top_debug_info_entry.attributes["DW_AT_language"].value
compdir = top_debug_info_entry.attributes["DW_AT_comp_dir"].value.decode()
# When using `-glldb`, the compile flags are stored
# in the DW_AT_APPLE_flags attribute
apple_flags = None
if top_debug_info_entry.attributes.get("DW_AT_APPLE_flags", None):
apple_flags = top_debug_info_entry.attributes[
"DW_AT_APPLE_flags"
].value.decode()
isysroot = None
if top_debug_info_entry.attributes.get("DW_AT_LLVM_isysroot", None):
isysroot = top_debug_info_entry.attributes[
"DW_AT_LLVM_isysroot"
].value.decode()
result.append(
CompilationUnit(
producer=producer,
name=name,
compdir=compdir,
language=language,
apple_flags=apple_flags,
isysroot=isysroot,
)
)
return result
def parse_clang_record_command_line_value(command: str) -> Sequence[str]:
"""Parses the value of a `-frecord-command-line` entry from clang.
Separate arguments within a command line are combined with spaces.
Spaces and backslashes within an argument are escaped with backslashes.
Args:
command: The command line string to split.
Returns:
A sequence of strings, each representing a single argument.
Raises:
ValueError: If the command line contains an invalid escape sequence.
ValueError: If the command line contains an empty argument.
"""
value = io.StringIO(command)
args = []
current_arg = ""
while True:
c = value.read(1)
match c:
case "":
# We found the end of the string.
break
case "\\":
# We found a backslash, the next character should be either a space or
# another backslash.
c = value.read(1)
if c not in (" ", "\\"):
raise ValueError(f"Invalid Escape Sequence: \\{c}")
current_arg += c
case " ":
# unescaped spaces separate arguments.
if not current_arg:
raise ValueError("Arguments should not be empty.")
args.append(current_arg)
current_arg = ""
case _:
# Anything else is part of the current argument.
current_arg += c
if not current_arg:
raise ValueError("Last argument should not be empty.")
args.append(current_arg)
return args
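# Illustrative sketch, not part of the indexer: the escaping rules described in
# the docstring of parse_clang_record_command_line_value, shown as a round trip.
# `escape_args` and `split_args` are hypothetical helper names. `split_args` is
# a compact restatement of the rules above so the example can run on its own.

```python
def escape_args(args):
    """Joins arguments with spaces, escaping backslashes and spaces."""
    escaped = []
    for arg in args:
        # Escape backslashes first so the escapes added for spaces survive.
        escaped.append(arg.replace("\\", "\\\\").replace(" ", "\\ "))
    return " ".join(escaped)


def split_args(command):
    """Splits on unescaped spaces; a backslash escapes a space or backslash."""
    args, current, chars = [], "", iter(command)
    for c in chars:
        if c == "\\":
            nxt = next(chars, "")
            if nxt not in (" ", "\\"):
                raise ValueError(f"Invalid escape sequence: \\{nxt}")
            current += nxt
        elif c == " ":
            if not current:
                raise ValueError("Arguments should not be empty.")
            args.append(current)
            current = ""
        else:
            current += c
    if not current:
        raise ValueError("Last argument should not be empty.")
    args.append(current)
    return args


original = ["clang", "-DNAME=a b", "C:\\dir\\file.c"]
encoded = escape_args(original)
# Decoding the encoded command line recovers the original argument list.
assert split_args(encoded) == original
```

# Note that an empty argument has no representation in this format, which is
# why the parser rejects it.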
================================================
FILE: infra/base-images/base-builder/indexer/dwarf_info_diff.py
================================================
#!/usr/bin/env python3
# Copyright 2026 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Diffs compile commands generated via DWARF info and compilation databases."""
import collections
from collections.abc import Sequence
import json
import pathlib
from absl import app
from absl import flags
from absl import logging
import dwarf_info
_BINARY_PATH = flags.DEFINE_string(
"binary_path", None, "Path to the binary file.", required=True
)
_COMPILE_COMMANDS_PATH = flags.DEFINE_string(
"compile_commands_path",
None,
"Path to the compile commands file.",
required=True,
)
def main(argv: Sequence[str]) -> None:
if len(argv) > 1:
raise app.UsageError("Too many command-line arguments.")
binary_path = pathlib.Path(_BINARY_PATH.value)
compilation_units = dwarf_info.get_all_compilation_units(binary_path)
logging.info("Found %d compilation units.", len(compilation_units))
# Question 1: Do we have repeated CU names in the binary?
cu_files = collections.Counter([cu.name for cu in compilation_units])
logging.info("Most Common CU names: %s", cu_files.most_common(1))
libs = binary_path.parent / "lib"
for lib in libs.iterdir():
new_cus = dwarf_info.get_all_compilation_units(lib)
logging.info("Found %d compilation units in %s", len(new_cus), lib)
compilation_units.extend(new_cus)
with open(_COMPILE_COMMANDS_PATH.value, "r") as f:
compile_commands = json.load(f)
# Question 2: Do we have repeated files in the compile commands?
cc_files = collections.Counter([cc["file"] for cc in compile_commands])
logging.info("Most Common commands files: %s", cc_files.most_common(1))
cc_files = set(cc_files)
cu_files = set(cu_files)
for file in cc_files - cu_files:
logging.info("File not found in CU: %s", file)
for file in cu_files - cc_files:
logging.info("File not found in CC: %s", file)
for file in cu_files.intersection(cc_files):
logging.info("File found in both: %s", file)
if __name__ == "__main__":
app.run(main)
================================================
FILE: infra/base-images/base-builder/indexer/fuzzing_engine.cc
================================================
// Copyright 2026 Google LLC.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t n);
extern "C" __attribute__((weak)) int LLVMFuzzerInitialize(int* argc,
char*** argv);
// Projects can call LLVMFuzzerMutate, but should only do it from
// LLVMFuzzerCustomMutator, which should be called from the fuzzing engine (we
// don't need to).
extern "C" size_t LLVMFuzzerMutate([[maybe_unused]] uint8_t* Data,
[[maybe_unused]] size_t Size,
[[maybe_unused]] size_t MaxSize) {
fprintf(stderr, "LLVMFuzzerMutate was called. This should never happen.\n");
__builtin_trap();
}
int main(int argc, char* argv[]) {
if (LLVMFuzzerInitialize) {
LLVMFuzzerInitialize(&argc, &argv);
}
if (argc != 2) {
// Special-case because curl invokes the fuzzer binaries without arguments
// during make, and will fail if they don't return success.
if (strstr(argv[0], "curl_fuzzer")) {
fprintf(stderr, "Exiting early for curl_fuzzer\n");
exit(EXIT_SUCCESS);
}
fprintf(stderr, "Usage: %s \n", argv[0]);
exit(EXIT_FAILURE);
}
int fd = open(argv[1], O_RDONLY);
if (fd == -1) {
perror("open");
exit(EXIT_FAILURE);
}
struct stat st;
if (fstat(fd, &st) == -1) {
perror("stat");
exit(EXIT_FAILURE);
}
size_t size = static_cast<size_t>(st.st_size);
uint8_t* data = static_cast<uint8_t*>(malloc(size));
if (!data) {
perror("malloc");
exit(EXIT_FAILURE);
}
size_t bytes_read = 0;
while (bytes_read < size) {
ssize_t res = read(fd, data + bytes_read, size - bytes_read);
if (res == -1) {
perror("read");
exit(EXIT_FAILURE);
}
if (res == 0) {
fprintf(stderr, "Unexpected EOF.\n");
exit(EXIT_FAILURE);
}
bytes_read += static_cast<size_t>(res);
}
close(fd);
int res = LLVMFuzzerTestOneInput(data, size);
free(data);
return res;
}
================================================
FILE: infra/base-images/base-builder/indexer/ignored_deps.json
================================================
{
"deps" : [
]
}
================================================
FILE: infra/base-images/base-builder/indexer/index_build.py
================================================
#!/usr/bin/env python3
# Copyright 2026 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""This runs the actual build process to generate a snapshot."""
import argparse
import dataclasses
import hashlib
import json
import logging
import os
import pathlib
from pathlib import Path # pylint: disable=g-importing-member
import shlex
import shutil
import stat
import subprocess
import tempfile
from typing import Any, Sequence
import manifest_types
PROJECT = Path(os.getenv('PROJECT_NAME', 'project')).name
SNAPSHOT_DIR = Path('/snapshot')
SRC = Path(os.getenv('SRC', '/src'))
# On OSS-Fuzz build infra, $OUT is not /out.
OUT = Path(os.getenv('OUT', '/out'))
INDEXES_PATH = Path(os.getenv('INDEXES_PATH', '/indexes'))
INCREMENTAL_CDB_PATH = Path('/incremental_cdb')
_GCC_BASE_PATH = Path('/usr/lib/gcc/x86_64-linux-gnu')
_LD_BINARY = 'ld-linux-x86-64.so.2'
_LD_PATH = Path('/lib64') / _LD_BINARY
_DEFAULT_GCC_VERSION = '9'
_LLVM_READELF_PATH = '/usr/local/bin/llvm-readelf'
DEFAULT_COVERAGE_FLAGS = '-fsanitize-coverage=bb,no-prune,trace-pc-guard'
TRACING_COVERAGE_FLAGS = '-fsanitize-coverage=func,trace-pc-guard'
DEFAULT_FUZZING_ENGINE = 'fuzzing_engine.cc'
DEFAULT_CLANG_TOOLCHAIN = '/usr/local'
_CLANG_TOOLCHAIN = Path(os.getenv('CLANG_TOOLCHAIN', DEFAULT_CLANG_TOOLCHAIN))
_TOOLCHAIN_WITH_WRAPPER = Path('/opt/toolchain')
INDEXER_DIR = Path(__file__).parent
# Some build systems isolate the compiler environment from the parent process,
# so we can't always rely on using environment variables to pass settings to the
# wrapper. Get around this by writing to a file instead.
COMPILE_SETTINGS_PATH = INDEXER_DIR / 'compile_settings.json'
CLANG_TOOLCHAIN_BINARY_PREFIXES = (
'clang-',
'ld',
'lld',
'llvm-',
)
EXTRA_CFLAGS = (
'-fno-omit-frame-pointer '
'-DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION '
'-O0 -glldb '
'-fsanitize=address '
'-Wno-invalid-offsetof '
'{coverage_flags} '
'-Qunused-arguments '
)
@dataclasses.dataclass(slots=True, frozen=True)
class CompileSettings:
coverage_flags: str
clang_toolchain: str
clang_version: str
def read_compile_settings() -> CompileSettings:
"""Gets compile settings from file."""
with COMPILE_SETTINGS_PATH.open('r') as f:
settings_dict = json.load(f)
return CompileSettings(**settings_dict)
def write_compile_settings(compile_settings: CompileSettings) -> None:
"""Writes compile settings to file."""
with COMPILE_SETTINGS_PATH.open('w') as f:
json.dump(dataclasses.asdict(compile_settings), f)
def set_env_vars(coverage_flags: str):
"""Set up build environment variables."""
os.environ['SANITIZER'] = 'address'
# Prevent ASan leak checker from running on `configure` script targets.
# At the time of writing, this helps prevent a slowdown in `hunspell` build.
os.environ['ASAN_OPTIONS'] = 'detect_leaks=0'
os.environ['FUZZING_ENGINE'] = 'none'
os.environ['LIB_FUZZING_ENGINE'] = '/usr/lib/libFuzzingEngine.a'
os.environ['FUZZING_LANGUAGE'] = 'c++'
os.environ['CXX'] = 'clang++'
os.environ['CC'] = 'clang'
os.environ['COMPILING_PROJECT'] = 'True'
# Force users of clang to use our wrapper. This fixes e.g. libcups.
os.environ['PATH'] = (
f"{_TOOLCHAIN_WITH_WRAPPER / 'bin'}:{os.environ.get('PATH')}"
)
existing_cflags = os.environ.get('CFLAGS', '')
extra_cflags = EXTRA_CFLAGS.format(coverage_flags=coverage_flags)
os.environ['CFLAGS'] = f'{existing_cflags} {extra_cflags}'.strip()
def set_up_wrapper_dir():
"""Sets up a shadow toolchain.
This sets up our clang wrapper for clang/clang++ and symlinks everything else
to point to the real toolchain.
"""
if _TOOLCHAIN_WITH_WRAPPER.exists():
shutil.rmtree(_TOOLCHAIN_WITH_WRAPPER)
_TOOLCHAIN_WITH_WRAPPER.mkdir(parents=True)
# Set up symlinks to toolchain binaries.
wrapper_bin_dir = _TOOLCHAIN_WITH_WRAPPER / 'bin'
wrapper_bin_dir.mkdir()
for name in os.listdir(_CLANG_TOOLCHAIN / 'bin'):
# Symlink clang/llvm toolchain binaries, except for clang itself.
# We have to be careful not to symlink other unrelated binaries, since other
# parts of the build process may wrap those binaries (e.g.
# make_build_replayable.py in OSS-Fuzz).
if not name.startswith(CLANG_TOOLCHAIN_BINARY_PREFIXES):
continue
os.symlink(_CLANG_TOOLCHAIN / 'bin' / name, wrapper_bin_dir / name)
os.symlink(_CLANG_TOOLCHAIN / 'lib', _TOOLCHAIN_WITH_WRAPPER / 'lib')
# Set up our compiler wrappers.
os.symlink(INDEXER_DIR / 'clang_wrapper.py', wrapper_bin_dir / 'clang')
os.symlink(INDEXER_DIR / 'clang_wrapper.py', wrapper_bin_dir / 'clang++')
@dataclasses.dataclass(slots=True, frozen=True)
class BinaryMetadata:
binary_config: manifest_types.CommandLineBinaryConfig
build_id: str
build_id_matches: bool
compile_commands: list[dict[str, Any]]
def _get_build_id_from_elf_notes(elf_file: Path, contents: bytes) -> str | None:
"""Extracts the build id from the ELF notes of a binary.
The ELF notes are obtained with
`llvm-readelf --notes --elf-output-style=JSON`.
Args:
elf_file: The ELF file name.
contents: The contents of the ELF notes, as a JSON string.
Returns:
The build id, or None if it could not be found.
"""
try:
elf_data = json.loads(contents)
except json.JSONDecodeError:
logging.error('failed to decode ELF notes for %s', elf_file)
return None
assert elf_data
# Example output of llvm-readelf JSON output for llvm 19+:
# [
# {
# "FileSummary": {
# "File": "binary",
# "Format": "elf64-x86-64",
# "Arch": "x86_64",
# "AddressSize": "64bit",
# "LoadName": ""
# },
# "NoteSections": [
# {
# "NoteSection": {
# "Name": ".note.gnu.property",
# "Offset": 904,
# "Size": 32,
# "Notes": [
# {
# "Owner": "GNU",
# "Data size": 16,
# "Type": "NT_GNU_PROPERTY_TYPE_0 (property note)",
# "Property": [
# "x86 ISA needed: x86-64-baseline"
# ]
# }
# ]
# }
# },
# {
# "NoteSection": {
# "Name": ".note.gnu.build-id",
# "Offset": 936,
# "Size": 36,
# "Notes": [
# {
# "Owner": "GNU",
# "Data size": 20,
# "Type": "NT_GNU_BUILD_ID (unique build ID bitstring)",
# "Build ID": "182a06c3dca5ee4d7e9c1d94b432c8bd9279438f"
# }
# ]
# }
# },
# {
# "NoteSection": {
# "Name": ".note.ABI-tag",
# "Offset": 1630064,
# "Size": 32,
# "Notes": [
# {
# "Owner": "GNU",
# "Data size": 16,
# "Type": "NT_GNU_ABI_TAG (ABI version tag)",
# "OS": "Linux",
# "ABI": "3.2.0"
# }
# ]
# }
# }
# ]
# }
# ]
for file_info in elf_data:
if 'Notes' in file_info:
# llvm < 19
for note_entry in file_info['Notes']:
note_section = note_entry['NoteSection']
if note_section['Name'] == '.note.gnu.build-id':
note_details = note_section['Note']
if 'Build ID' in note_details:
return note_details['Build ID']
elif 'NoteSections' in file_info:
# llvm 19+
for note_entry in file_info['NoteSections']:
note_section = note_entry['NoteSection']
if note_section['Name'] == '.note.gnu.build-id':
note_details = note_section['Notes']
for note_detail in note_details:
if 'Build ID' in note_detail:
return note_detail['Build ID']
else:
raise ValueError('Unknown ELF notes format.')
return None
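# Illustrative sketch, not part of the indexer: the two llvm-readelf JSON
# layouts handled above, reduced to the fields that matter for the build ID.
# `build_id_from_notes` is a hypothetical name. The llvm 19+ sample is trimmed
# from the example in the comment above. The llvm < 19 sample is an assumption
# inferred from the keys that branch reads ("Notes" and a single "Note" dict).

```python
import json


def build_id_from_notes(contents):
    """Returns the first 'Build ID' in llvm-readelf JSON notes, or None."""
    for file_info in json.loads(contents):
        if "NoteSections" in file_info:  # llvm 19+
            entries = file_info["NoteSections"]
        elif "Notes" in file_info:  # llvm < 19
            entries = file_info["Notes"]
        else:
            raise ValueError("Unknown ELF notes format.")
        for entry in entries:
            section = entry["NoteSection"]
            if section["Name"] != ".note.gnu.build-id":
                continue
            # llvm 19+ stores a list under "Notes"; older output stores a
            # single dict under "Note".
            notes = section.get("Notes") or [section.get("Note", {})]
            for note in notes:
                if "Build ID" in note:
                    return note["Build ID"]
    return None


BUILD_ID = "182a06c3dca5ee4d7e9c1d94b432c8bd9279438f"
NEW_LAYOUT = json.dumps([{
    "FileSummary": {"File": "binary"},
    "NoteSections": [{"NoteSection": {
        "Name": ".note.gnu.build-id",
        "Notes": [{"Owner": "GNU", "Build ID": BUILD_ID}],
    }}],
}])
OLD_LAYOUT = json.dumps([{
    "Notes": [{"NoteSection": {
        "Name": ".note.gnu.build-id",
        "Note": {"Owner": "GNU", "Build ID": BUILD_ID},
    }}],
}])

assert build_id_from_notes(NEW_LAYOUT) == BUILD_ID
assert build_id_from_notes(OLD_LAYOUT) == BUILD_ID
```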
def _get_clang_version(toolchain: Path) -> str:
"""Returns the clang version."""
clang = toolchain / 'bin' / 'clang'
clang_version = subprocess.run(
[clang, '-dumpversion'], capture_output=True, check=True, text=True
).stdout
return clang_version.split('.')[0]
def get_build_id(elf_file: Path) -> str | None:
"""This invokes llvm-readelf to get the build ID of the given ELF file."""
ret = subprocess.run(
[
_LLVM_READELF_PATH,
'--notes',
'--elf-output-style=JSON',
elf_file.as_posix(),
],
capture_output=True,
check=False,
)
if ret.returncode != 0:
return None
return _get_build_id_from_elf_notes(elf_file, ret.stdout)
def find_fuzzer_binaries(out_dir: Path, build_id: str) -> Sequence[Path]:
"""Finds all files under out_dir whose build ID matches build_id."""
binaries = []
for root, _, files in os.walk(out_dir):
for file in files:
file_path = Path(root, file)
if get_build_id(file_path) == build_id:
binaries.append(file_path)
return binaries
def enumerate_build_targets(
binary_config: manifest_types.CommandLineBinaryConfig,
) -> Sequence[BinaryMetadata]:
"""Enumerates the build targets in the project.
Args:
binary_config: The binary config applied to all targets.
Returns:
A sequence of target descriptions, in BinaryMetadata form.
"""
logging.info('enumerate_build_targets')
linker_json_paths = list((OUT / 'cdb').glob('*_linker_commands.json'))
logging.info('Found %i linker JSON files.', len(linker_json_paths))
binary_to_build_metadata: dict[str, BinaryMetadata] = {}
for linker_json_path in linker_json_paths:
build_id = linker_json_path.name.split('_')[0]
with linker_json_path.open('rt') as f:
data = json.load(f)
binary_path = Path(data['output'])
name = binary_path.name
# Some projects may move build files around, so being more careful about
# the binary path and checking the build id should improve the success
# rate.
if (OUT / name).exists():
# Just because the name matches, doesn't mean it's the right one for
# this linker command.
# Only set this if we haven't already found an exact build ID match.
# We can't always rely on build ID matching, because some builds will
# modify the binary after the linker runs.
if (
name in binary_to_build_metadata
and binary_to_build_metadata[name].build_id_matches
):
continue
build_id_matches = build_id == get_build_id(binary_path)
target_binary_config = manifest_types.CommandLineBinaryConfig(
**dict(binary_config.to_dict(), binary_name=name)
)
binary_to_build_metadata[name] = BinaryMetadata(
binary_config=target_binary_config,
compile_commands=data['compile_commands'],
build_id=build_id,
build_id_matches=build_id_matches,
)
else:
logging.info('trying to find %s with build id %s', name, build_id)
binary_paths = find_fuzzer_binaries(OUT, build_id)
logging.info('found matching binaries: %s', binary_paths)
if not binary_paths:
logging.error('could not find %s with build id %s', name, build_id)
continue
for binary_path in binary_paths:
compile_commands = data['compile_commands']
target_binary_config = manifest_types.CommandLineBinaryConfig(
**dict(binary_config.to_dict(), binary_name=binary_path.name)
)
binary_to_build_metadata[binary_path.name] = BinaryMetadata(
binary_config=target_binary_config,
compile_commands=compile_commands,
build_id=build_id,
build_id_matches=True,
)
return tuple(binary_to_build_metadata.values())
def copy_fuzzing_engine(fuzzing_engine: str) -> Path:
"""Copy fuzzing engine."""
# Not every project saves source to $SRC/$PROJECT_NAME
fuzzing_engine_dir = SRC / PROJECT
if not fuzzing_engine_dir.exists():
fuzzing_engine_dir = SRC / 'fuzzing_engine'
fuzzing_engine_dir.mkdir(exist_ok=True)
shutil.copy(f'/opt/indexer/{fuzzing_engine}', fuzzing_engine_dir)
return fuzzing_engine_dir
def _get_latest_gcc_version() -> str:
"""Finds the latest GCC version installed.
Defaults to '9' for backward compatibility if detection fails.
Returns:
The latest GCC version found, or the default.
"""
if _GCC_BASE_PATH.exists():
versions = []
for d in _GCC_BASE_PATH.iterdir():
if d.is_dir() and d.name.isdigit():
versions.append(int(d.name))
if versions:
return str(max(versions))
return _DEFAULT_GCC_VERSION
def build_project(
targets_to_index: Sequence[str] | None = None,
compile_args: Sequence[str] | None = None,
binaries_only: bool = False,
coverage_flags: str = DEFAULT_COVERAGE_FLAGS,
):
"""Build the actual project."""
set_env_vars(coverage_flags)
if targets_to_index:
os.environ['INDEXER_TARGETS'] = ','.join(targets_to_index)
if binaries_only:
os.environ['INDEXER_BINARIES_ONLY'] = '1'
clang_version = _get_clang_version(_CLANG_TOOLCHAIN)
write_compile_settings(
CompileSettings(
coverage_flags=coverage_flags,
clang_toolchain=_CLANG_TOOLCHAIN.as_posix(),
clang_version=clang_version,
)
)
fuzzing_engine_dir = copy_fuzzing_engine(DEFAULT_FUZZING_ENGINE)
gcc_version = _get_latest_gcc_version()
build_fuzzing_engine_command = [
f'{_CLANG_TOOLCHAIN}/bin/clang++',
'-c',
'-Wall',
'-Wextra',
'-pedantic',
'-std=c++20',
'-fno-rtti',
'-fno-exceptions',
'-glldb',
'-O0',
(fuzzing_engine_dir / DEFAULT_FUZZING_ENGINE).as_posix(),
'-o',
f'{OUT}/fuzzing_engine.o',
'-gen-cdb-fragment-path',
f'{OUT}/cdb',
'-Qunused-arguments',
f'-isystem {_CLANG_TOOLCHAIN}/lib/clang/{clang_version}',
'-I',
f'/usr/lib/gcc/x86_64-linux-gnu/{gcc_version}/../../../../include/c++/{gcc_version}',
'-I',
f'/usr/lib/gcc/x86_64-linux-gnu/{gcc_version}/../../../../include/x86_64-linux-gnu/c++/{gcc_version}',
'-I',
f'/usr/lib/gcc/x86_64-linux-gnu/{gcc_version}/../../../../include/c++/{gcc_version}/backward',
'-I',
f'{_CLANG_TOOLCHAIN}/lib/clang/{clang_version}/include',
'-I',
f'{_CLANG_TOOLCHAIN}/include',
'-I',
'/usr/include/x86_64-linux-gnu',
'-I',
'/usr/include',
]
subprocess.run(build_fuzzing_engine_command, check=True, cwd='/opt/indexer')
build_cov_instrumentation_command = [
f'{_CLANG_TOOLCHAIN}/bin/clang++',
'-fno-rtti',
'-fno-exceptions',
'-c',
'/opt/indexer/coverage.cc',
]
subprocess.run(
build_cov_instrumentation_command, check=True, cwd='/opt/indexer'
)
ar_cmd = [
'ar',
'rcs',
'/opt/indexer/fuzzing_engine.a',
f'{OUT}/fuzzing_engine.o',
]
subprocess.run(ar_cmd, check=True)
lib_fuzzing_engine = '/usr/lib/libFuzzingEngine.a'
if os.path.exists(lib_fuzzing_engine):
os.remove(lib_fuzzing_engine)
os.symlink('/opt/indexer/fuzzing_engine.a', lib_fuzzing_engine)
compile_command = ['/usr/local/bin/compile']
if compile_args:
compile_command.extend(compile_args)
subprocess.run(compile_command, check=True)
def test_target(
target: BinaryMetadata,
) -> bool:
"""Tests a single target."""
target_path = OUT / target.binary_config.binary_name
result = subprocess.run(
[str(target_path)], stderr=subprocess.PIPE, check=False
)
expected_error = f'Usage: {target_path} \n'
if expected_error not in result.stderr.decode() or result.returncode != 1:
logging.error(
'Target %s failed to run: %s',
target_path,
result.stderr.decode(),
)
return False
return True
def set_interpreter(target_path: Path, lib_mount_path: pathlib.PurePath):
subprocess.run(
[
'patchelf',
'--set-interpreter',
(lib_mount_path / _LD_BINARY).as_posix(),
target_path.as_posix(),
],
check=True,
)
def set_target_rpath(binary_artifact: Path, lib_mount_path: pathlib.PurePath):
subprocess.run(
[
'patchelf',
'--set-rpath',
lib_mount_path,
'--force-rpath',
binary_artifact.as_posix(),
],
check=True,
)
def copy_shared_libraries(
fuzz_target_path: Path, libs_path: Path, lib_mount_path: pathlib.PurePath
) -> None:
"""Copies the shared libraries to the shared directory."""
env = os.environ.copy()
env['LD_TRACE_LOADED_OBJECTS'] = '1'
env['LD_BIND_NOW'] = '1'
# TODO: Should we take ld.so from interp?
res = subprocess.run(
[_LD_PATH.as_posix(), fuzz_target_path.as_posix()],
capture_output=True,
env=env,
check=True,
)
output = res.stdout.decode()
if 'statically linked' in output:
return
# Example output:
# linux-vdso.so.1 => (0x00007f40afc0f000)
# linux-vdso.so.1 (0x00007f76b9377000)
# lib foo.so => /tmp/sharedlib/lib foo.so (0x00007f76b9367000)
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f76b9157000)
# /lib64/ld-linux-x86-64.so.2 (0x00007f76b9379000)
#
# The lines that do not have a => should be skipped.
# The dynamic linker should always be copied.
# The lines that have a => could contain a space, but we copy whatever on the
# right side of the =>, removing the load address.
shutil.copy2(_LD_PATH, libs_path / _LD_PATH.name)
lines = output.splitlines()
for line in lines:
if '=>' not in line:
continue
parts = line.split('=>')
lib_name = parts[0].strip()
right_side = parts[1].strip().rsplit(' ', maxsplit=1)[0].strip()
if not right_side:
continue
library_path = Path(right_side)
logging.info('Copying %s => %s', lib_name, library_path)
if library_path.is_relative_to(libs_path):
# This can happen if the project build is doing the same thing as us and
# already copied the library to the library_path.
continue
try:
dst = libs_path / library_path.name
# Need to preserve world writeable permissions.
shutil.copy2(library_path, dst)
# If we don't do this, our shared objects load the system's shared
# objects. What about their shared objects you may ask? Well they
# will all be from this directory where every so has the directory
# as its rpath.
set_target_rpath(dst, lib_mount_path)
except FileNotFoundError as e:
logging.exception('Could not copy %s', library_path)
raise e
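# Illustrative sketch, not part of the indexer: extracting the resolved library
# paths from loader trace output such as the example in copy_shared_libraries.
# `resolved_library_paths` is a hypothetical name. This variant strips the
# trailing load address with a regular expression instead of rsplit, so it is
# a different technique from the function above, and it does not handle
# "not found" entries.

```python
import re

# Matches a trailing load address such as " (0x00007f76b9367000)".
_LOAD_ADDRESS = re.compile(r"\s*\(0x[0-9a-f]+\)$")


def resolved_library_paths(trace_output):
    """Returns the paths to the right of '=>', without load addresses."""
    paths = []
    for line in trace_output.splitlines():
        if "=>" not in line:
            # Entries such as the dynamic linker itself have no '=>'.
            continue
        _, right = line.split("=>", maxsplit=1)
        path = _LOAD_ADDRESS.sub("", right.strip()).strip()
        if path:
            # Library names and paths may contain spaces; keep them intact.
            paths.append(path)
    return paths


SAMPLE = "\n".join([
    "linux-vdso.so.1 =>  (0x00007f40afc0f000)",
    "linux-vdso.so.1 (0x00007f76b9377000)",
    "lib foo.so => /tmp/sharedlib/lib foo.so (0x00007f76b9367000)",
    "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f76b9157000)",
    "/lib64/ld-linux-x86-64.so.2 (0x00007f76b9379000)",
])

assert resolved_library_paths(SAMPLE) == [
    "/tmp/sharedlib/lib foo.so",
    "/lib/x86_64-linux-gnu/libc.so.6",
]
```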
def archive_target(target: BinaryMetadata, file_extension: str) -> Path | None:
"""Archives a single target in the project using the exported rootfs."""
logging.info('archive_target %s', target.binary_config.binary_name)
index_dir = INDEXES_PATH / target.build_id
if not index_dir.exists():
logging.error("didn't find index dir %s", index_dir)
return None
source_map = subprocess.run(
['srcmap'], capture_output=True, check=True
).stdout
target_hash = hashlib.sha256(source_map).hexdigest()[:16]
name = f'{PROJECT}.{target.binary_config.binary_name}'
uuid = f'{PROJECT}.{target.binary_config.binary_name}.{target_hash}'
lib_mount_path = pathlib.Path('/tmp') / (uuid + '_lib')
libs_path = OUT / 'lib'
# Keep a backup of the original 'lib' dir, in case the upstream project also
# bundles libs using the same directory name.
libs_backup_path = OUT / 'lib.backup'
if libs_path.exists():
shutil.copytree(libs_path, libs_backup_path)
else:
libs_path.mkdir(parents=False)
target_path = OUT / target.binary_config.binary_name
copy_shared_libraries(target_path, libs_path, lib_mount_path)
# We may want to eventually re-enable SRC copying (with some filtering to only
# include source files).
with tempfile.TemporaryDirectory() as empty_src_dir, \
tempfile.TemporaryDirectory() as backup_dir:
# Make a backup of the target binary so we can undo the rpath/interpreter
# changes in OUT.
backup_path = Path(backup_dir) / target_path.name
shutil.copy2(target_path, backup_path)
# This is to handle `target_path` being a hard link, where other target
# binaries share the same inode.
os.unlink(target_path)
shutil.copy2(backup_path, target_path)
set_interpreter(target_path, lib_mount_path)
set_target_rpath(target_path, lib_mount_path)
archive_path = SNAPSHOT_DIR / f'{uuid}{file_extension}'
# For `/` in $PROJECT.
archive_path.parent.mkdir(parents=True, exist_ok=True)
manifest_types.Manifest(
name=name,
uuid=uuid,
binary_config=target.binary_config,
source_map=manifest_types.source_map_from_dict(json.loads(source_map)),
lib_mount_path=lib_mount_path,
).save_build(
source_dir=Path(empty_src_dir),
build_dir=OUT,
index_dir=index_dir,
archive_path=archive_path,
out_dir=OUT,
)
shutil.move(backup_path, target_path)
logging.info('Wrote archive to: %s', archive_path)
shutil.rmtree(libs_path)
if libs_backup_path.exists():
shutil.move(libs_backup_path, libs_path)
return archive_path
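# A minimal sketch of how the snapshot identifier in archive_target() is
# composed, assuming the `srcmap` output is already available as bytes.
# `snapshot_uuid` and the sample payload are hypothetical names used for
# illustration only; they are not part of index_build.py.

```python
import hashlib
import json


def snapshot_uuid(project: str, binary_name: str, source_map: bytes) -> str:
  """Joins project, binary name and the first 16 hex chars of the sha256."""
  target_hash = hashlib.sha256(source_map).hexdigest()[:16]
  return f'{project}.{binary_name}.{target_hash}'


# Hypothetical srcmap payload; the real bytes come from the `srcmap` tool.
sample_map = json.dumps(
    {'/src/expat': {'type': 'git', 'url': 'https://example.com/expat.git',
                    'rev': 'abc'}}
).encode('utf-8')
uuid = snapshot_uuid('expat', 'xml_parse_fuzzer_UTF-8', sample_map)
print(uuid)
```

# Because the hash depends only on the source map, rebuilding the same target
# from the same sources should yield the same identifier.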
def test_and_archive(
binary_config: manifest_types.CommandLineBinaryConfig,
targets_to_index: Sequence[str] | None,
file_extension: str,
):
"""Test target and archive."""
targets = enumerate_build_targets(binary_config)
if targets_to_index:
targets = [
t for t in targets if t.binary_config.binary_name in targets_to_index
]
missing_targets = set(targets_to_index) - set(
t.binary_config.binary_name for t in targets
)
if missing_targets:
raise ValueError(f'Could not find specified targets {missing_targets}.')
logging.info('targets %s', targets)
for target in targets:
try:
# Check that the target binary behaves like a fuzz target,
# unless the caller specifically asked for a list of targets.
if not targets_to_index and not test_target(target):
# TODO: Figure out if this is a good idea. It lets some targets pass that
# should, but also lets some targets pass that shouldn't.
continue
except Exception: # pylint: disable=broad-exception-caught
logging.exception('Error testing target.')
continue
archive_target(target, file_extension)
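# The allowlist handling in test_and_archive() can be sketched in isolation as
# follows. This is an illustrative stand-in that works on plain strings rather
# than build-target objects; `select_targets` is a hypothetical helper.

```python
from typing import Optional, Sequence


def select_targets(
    available: Sequence[str], requested: Optional[Sequence[str]]
) -> list[str]:
  """Filters `available` by `requested`; raises if a requested name is absent."""
  if not requested:
    return list(available)
  selected = [name for name in available if name in requested]
  missing = set(requested) - set(selected)
  if missing:
    raise ValueError(f'Could not find specified targets {missing}.')
  return selected


built = ['xml_parse_fuzzer_UTF-8', 'xml_parsebuffer_fuzzer_UTF-8']
selected = select_targets(built, ['xml_parse_fuzzer_UTF-8'])
print(selected)
```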
def clear_out():
"""Clean up the OUT directory."""
for i in OUT.iterdir():
if i.is_dir():
shutil.rmtree(i)
else:
i.unlink()
def main():
logging.basicConfig(level=logging.INFO)
parser = argparse.ArgumentParser(description='Index builder.')
parser.add_argument(
'-t',
'--targets',
help=(
'Comma separated list of targets to build for. '
'If this is omitted, snapshots are built for all fuzz targets. '
'If specified, this can include binaries which are not fuzz targets '
'(e.g., CLI targets which are built as part of the build '
'integration).'
),
)
parser.add_argument(
'--targets-all-index',
action='store_true',
help=(
'When -t/--targets is set, allow the indexer to run on all of them, '
'but only archive snapshots for the specified targets. This is '
'useful to save some time for projects where the binary name during '
'build time does not match the final name in the output directory.'
),
)
parser.add_argument(
'--target-args',
default=None,
help=(
'Arguments to pass to the target when executing it. '
'This string is shell-escaped (interpreted with `shlex.split`). '
'The input-file placeholder (manifest_types.INPUT_FILE) will be '
'replaced with the input path. '
'Note: This is deprecated, use --target-arg instead.'
),
)
parser.add_argument(
'--target-arg',
action='append',
help=(
'An argument to pass to the target binary. '
'The input-file placeholder (manifest_types.INPUT_FILE) will be '
'replaced with the input path. '
'If you want to pass custom args, pass --harness-kind=binary as well.'
),
)
parser.add_argument(
'--target-env',
action='append',
default=[],
help=(
'Environment variables (key=value) to pass to the target when '
'executing it. The input-file placeholder '
'(manifest_types.INPUT_FILE) in a value will be replaced with the '
'input path.'
),
)
parser.add_argument(
'--binary-config',
default=None,
help=(
'JSON serialized OSS_FUZZ BinaryConfig object containing '
'binary_args, binary_env, harness_kind, etc. If this value is set, '
'redundant flags like target-arg, etc., may not be used. '
'The binary_name field of this BinaryConfig object is ignored, all '
'other fields will be applied to all targets.'
),
)
parser.add_argument(
'--no-clear-out',
action='store_true',
help='Do not clear out the OUT directory before building.',
)
parser.add_argument(
'--compile-arg',
action='append',
help='An argument to pass to the `compile` script.',
)
parser.add_argument(
'--compressed',
action='store_true',
help='Use gzipped tar (.tgz) for the output snapshot',
)
parser.add_argument(
'--binaries-only',
action='store_true',
help='Build target binaries only, and not index archives.',
)
parser.add_argument(
'--harness-kind',
choices=[str(x) for x in manifest_types.HarnessKind],
default=manifest_types.HarnessKind.LIBFUZZER,
help=(
'The harness kind to use for the fuzz target. In order to pass custom'
' args, set this to binary.'
),
)
parser.add_argument(
'--tracing-instrumentation',
action='store_true',
help='Enable tracing coverage instrumentation.',
)
args = parser.parse_args()
INDEXES_PATH.mkdir(exist_ok=True)
INCREMENTAL_CDB_PATH.mkdir(exist_ok=True)
# Clean up the existing OUT by default, otherwise we may run into various
# build errors.
if not args.no_clear_out:
clear_out()
if args.target_args and args.target_arg:
raise ValueError(
'Only one of --target-args or --target-arg can be specified.'
)
if args.binary_config:
if (
args.target_arg
or args.target_args
or args.target_env
or args.harness_kind != manifest_types.HarnessKind.LIBFUZZER
):
raise ValueError(
'If --binary-config is specified, redundant flags may not be set.'
)
binary_config = manifest_types.BinaryConfig.from_dict(
json.loads(args.binary_config)
)
if (
binary_config.kind != manifest_types.BinaryConfigKind.OSS_FUZZ
or not isinstance(binary_config, manifest_types.CommandLineBinaryConfig)
):
raise ValueError(
'Only OSS_FUZZ binary configs are supported with --binary-config.'
)
else:
if args.target_arg:
target_args = args.target_arg
elif args.target_args:
logging.warning('--target-args is deprecated, use --target-arg instead.')
target_args = shlex.split(args.target_args)
else:
logging.info('No target args specified.')
target_args = []
if args.target_env:
target_env = manifest_types.parse_env(args.target_env)
else:
logging.info('No target env specified.')
target_env = {}
harness_kind = manifest_types.HarnessKind(args.harness_kind)
match harness_kind:
case manifest_types.HarnessKind.LIBFUZZER:
if target_args and target_args != [manifest_types.INPUT_FILE]:
raise ValueError(
'Unsupported target args for harness_kind libfuzzer:'
f' {target_args}'
)
target_args = [manifest_types.INPUT_FILE]
case _:
pass
binary_config = manifest_types.CommandLineBinaryConfig(
kind=manifest_types.BinaryConfigKind.OSS_FUZZ,
binary_name='oss-fuzz', # The name will be replaced with the target.
binary_args=target_args,
binary_env=target_env,
harness_kind=harness_kind,
)
targets_to_index = None
if args.targets:
targets_to_index = args.targets.split(',')
for directory in ['aflplusplus', 'fuzztest', 'honggfuzz', 'libfuzzer']:
path = os.path.join(os.environ['SRC'], directory)
shutil.rmtree(path, ignore_errors=True)
# Initially, we put snapshots directly in /out. This caused a bug where each
# snapshot was added to the next because they contain the contents of /out.
SNAPSHOT_DIR.mkdir(exist_ok=True)
# We don't have an existing /out dir on oss-fuzz's build infra.
OUT.mkdir(parents=True, exist_ok=True)
set_up_wrapper_dir()
build_project(
None if args.targets_all_index else targets_to_index,
args.compile_arg,
args.binaries_only,
TRACING_COVERAGE_FLAGS
if args.tracing_instrumentation
else DEFAULT_COVERAGE_FLAGS,
)
if not args.binaries_only:
file_extension = '.tgz' if args.compressed else '.tar'
test_and_archive(binary_config, targets_to_index, file_extension)
for snapshot in SNAPSHOT_DIR.iterdir():
shutil.move(str(snapshot), OUT)
# By default, this directory has o-rwx and its contents can't be deleted
# by a non-root user from outside the container. The rest of the files are
# unaffected because to delete a file, a write permission on its enclosing
# directory is sufficient regardless of the owner.
cdb_dir = OUT / 'cdb'
try:
cdb_dir.chmod(
cdb_dir.stat().st_mode | stat.S_IROTH | stat.S_IWOTH | stat.S_IXOTH
)
except OSError:
pass
if __name__ == '__main__':
main()
================================================
FILE: infra/base-images/base-builder/indexer/index_build_test.py
================================================
#!/usr/bin/env python3
# Copyright 2026 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""index_build tests.
This is only runnable on OSS-Fuzz infrastructure.
"""
import json
import os
import pathlib
import subprocess
import tarfile
from typing import Sequence
import unittest
import manifest_types
THIS_DIR = pathlib.Path(__file__).parent
OSS_FUZZ_DIR = THIS_DIR.parent.parent.parent.parent
@unittest.skipUnless(
os.getenv('INDEX_BUILD_TESTS'), 'Tests do not run on infra'
)
class IndexBuildTest(unittest.TestCase):
def _build_project(
self, project: str, *additional_args, compressed: bool
) -> Sequence[pathlib.Path]:
subprocess.run(
('python3', 'infra/helper.py', 'build_image', '--no-pull', project),
cwd=OSS_FUZZ_DIR,
check=True,
)
out_dir = OSS_FUZZ_DIR / f'build/out/{project}'
docker_args = [
'docker',
'run',
'--rm',
'-e',
f'PROJECT_NAME={project}',
'-v',
f'{THIS_DIR}:/opt/indexer',
'-v',
f'{out_dir}:/out',
f'gcr.io/oss-fuzz/{project}',
'/opt/indexer/index_build.py',
]
if additional_args:
docker_args.extend(additional_args)
file_suffix = '.tar'
if compressed:
docker_args.append('--compressed')
file_suffix = '.tgz'
subprocess.run(docker_args, cwd=OSS_FUZZ_DIR, check=True)
return [
file
for file in out_dir.iterdir()
if file.suffix == file_suffix and file.name.startswith(project)
]
def _check_archive(self, archive_path: pathlib.Path):
has_obj_lib = False
has_idx_sqlite = False
has_idx_absolute = False
has_idx_relative = False
manifest = None
print(f'Testing {archive_path}')
with tarfile.open(archive_path) as tar:
members = tar.getmembers()
for member in members:
if member.name.startswith('obj/lib/'):
has_obj_lib = True
if member.name.startswith('idx/absolute/'):
has_idx_absolute = True
if member.name.startswith('idx/relative/'):
has_idx_relative = True
if member.name == 'idx/db.sqlite':
has_idx_sqlite = True
if member.name == 'manifest.json':
file = tar.extractfile(member)
self.assertIsNotNone(file)
if file: # Make type checkers happy.
manifest = json.load(file)
self.assertTrue(manifest['lib_mount_path'])
self.assertIsNotNone(
tar.getmember('obj/' + manifest['binary_config']['binary_name'])
)
self.assertEqual(
manifest['binary_config']['binary_args'],
[manifest_types.INPUT_FILE],
)
self.assertTrue(has_obj_lib, 'obj/lib/ was not found in the archive.')
self.assertTrue(
has_idx_sqlite, 'idx/db.sqlite was not found in the archive.'
)
self.assertTrue(
has_idx_absolute, 'idx/absolute/ was not found in the archive.'
)
self.assertTrue(
has_idx_relative, 'idx/relative/ was not found in the archive.'
)
self.assertIsNotNone(
manifest, 'manifest.json was not found or is empty in the archive.'
)
def test_basic_build(self):
"""Test basic build."""
for compressed in (False, True):
archives = self._build_project('expat', compressed=compressed)
self.assertGreater(len(archives), 0)
for archive in archives:
self._check_archive(archive)
def test_build_with_target_allowlist(self):
"""Test basic build with target allowlist."""
for compressed in (False, True):
archives = self._build_project(
'expat',
'--targets',
'xml_parse_fuzzer_UTF-8',
compressed=compressed,
)
self.assertEqual(len(archives), 1)
self.assertIn('xml_parse_fuzzer_UTF-8', archives[0].name)
for archive in archives:
self._check_archive(archive)
================================================
FILE: infra/base-images/base-builder/indexer/manifest_constants.py
================================================
#!/usr/bin/env python3
# Copyright 2026 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Constants pertaining to snapshot manifests."""
import pathlib
Path = pathlib.Path
# Source directory.
SRC_DIR = Path("src")
# Object directory.
OBJ_DIR = Path("obj")
# Directory for indexer data.
INDEX_DIR = Path("idx")
# Relative source file root in the index.
INDEX_RELATIVE_SOURCES = INDEX_DIR / "relative"
# Absolute source file root in the index.
INDEX_ABSOLUTE_SOURCES = INDEX_DIR / "absolute"
# The index database filename.
INDEX_DB = Path("db.sqlite")
# Library directory, where shared libraries are copied - inside obj.
LIB_DIR = OBJ_DIR / "lib"
# Manifest location
MANIFEST_PATH = Path("manifest.json")
# Where archive version 1 expects the lib directory to be mounted.
LIB_MOUNT_PATH_V1 = Path("/ossfuzzlib")
# Will be replaced with the input file for target execution.
INPUT_FILE = ""
# A file the target can write output to.
OUTPUT_FILE = ""
# Will be replaced with any dynamic arguments.
DYNAMIC_ARGS = ""
================================================
FILE: infra/base-images/base-builder/indexer/manifest_types.py
================================================
#!/usr/bin/env python3
# Copyright 2026 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Classes and tools to build an indexer snapshot according to the spec.
A snapshot is a tarball containing the following:
- source files
- build artifacts (e.g. object files, shared libraries)
- indexer artifacts (e.g. clang command lines, symbol files)
- the manifest.json file, according to the Manifest class below.
"""
import dataclasses
import enum
import inspect
import io
import json
import logging
import os
import pathlib
import shlex
import shutil
import tarfile
import tempfile
from typing import Any, Callable, Mapping, Self, Sequence
import urllib.request
import manifest_constants
SRC_DIR = manifest_constants.SRC_DIR
OBJ_DIR = manifest_constants.OBJ_DIR
INDEX_DIR = manifest_constants.INDEX_DIR
INDEX_DB = manifest_constants.INDEX_DB
LIB_DIR = manifest_constants.LIB_DIR
MANIFEST_PATH = manifest_constants.MANIFEST_PATH
LIB_MOUNT_PATH_V1 = manifest_constants.LIB_MOUNT_PATH_V1
INPUT_FILE = manifest_constants.INPUT_FILE
OUTPUT_FILE = manifest_constants.OUTPUT_FILE
DYNAMIC_ARGS = manifest_constants.DYNAMIC_ARGS
# Min archive version we currently support.
_MIN_SUPPORTED_ARCHIVE_VERSION = 1
# The current version of the build archive format.
ARCHIVE_VERSION = 5
# OSS-Fuzz $OUT dir.
OUT = pathlib.Path(os.getenv("OUT", "/out"))
# OSS-Fuzz coverage info.
_COVERAGE_INFO_URL = (
"https://storage.googleapis.com/oss-fuzz-coverage/"
f"latest_report_info/{os.getenv('PROJECT_NAME')}.json"
)
class RepositoryType(enum.StrEnum):
"""The type of repository."""
GIT = enum.auto()
SVN = enum.auto()
HG = enum.auto()
@dataclasses.dataclass(frozen=True)
class SourceRef:
"""The reference to a source code repository.
Attributes:
type: The type of repository.
url: The URL of the repository.
rev: The revision of the repository.
"""
type: RepositoryType
url: str
rev: str
@classmethod
def from_dict(cls, data: dict[str, Any]) -> Self:
"""Creates a SourceRef object from a deserialized dict."""
return SourceRef(
url=data["url"], rev=data["rev"], type=RepositoryType(data["type"])
)
@dataclasses.dataclass(frozen=True)
class Reproducibility:
"""A report of how reproducible a known bug is."""
# How many of the trials succeeded in reproducing the behavior?
success_count: int = 0
# How many reproduction trials were attempted?
trial_count: int = 0
@classmethod
def from_dict(cls, data: dict[str, Any]) -> Self:
"""Creates a Reproducibility object from a deserialized dict."""
return Reproducibility(
success_count=data["success_count"],
trial_count=data["trial_count"],
)
class BinaryConfigKind(enum.StrEnum):
"""The kind of binary configurations."""
OSS_FUZZ = enum.auto()
BINARY = enum.auto()
def validate_in(self, options: list[Self]):
if self not in options:
raise ValueError(
f"Expected one of the following binary config kinds: {options}, "
f"but got {self}"
)
@dataclasses.dataclass(frozen=True, kw_only=True)
class BinaryConfig:
"""Base binary configuration.
Attributes:
kind: The kind of binary configuration.
binary_name: The name of the executable file.
"""
kind: BinaryConfigKind
binary_name: str
@property
def uses_stdin(self) -> bool:
"""Whether the binary uses stdin."""
del self
return False
@classmethod
def from_dict(cls, config_dict: Mapping[str, Any]) -> Self:
"""Deserializes the correct `BinaryConfig` subclass from a dict."""
mapping = {
BinaryConfigKind.OSS_FUZZ: CommandLineBinaryConfig,
BinaryConfigKind.BINARY: CommandLineBinaryConfig,
}
kind = config_dict["kind"]
if kind not in mapping:
raise ValueError(f"Unknown BinaryConfigKind: {kind}")
val = config_dict
if isinstance(val.get("binary_args"), str):
logging.warning(
"BinaryConfig: binary_args is type string instead of list."
" This is deprecated. Converting to list. Args: %s",
val["binary_args"],
)
val = dict(val, binary_args=shlex.split(val["binary_args"]))
return mapping[kind].from_dict(val)
def to_dict(self) -> dict[str, Any]:
"""Converts a BinaryConfig object to a serializable dict."""
return dataclasses.asdict(self)
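# BinaryConfig.from_dict() accepts a deprecated string form of `binary_args`
# and converts it with shlex.split. A small sketch of that conversion on a
# plain dict follows; the sample values are made up for illustration.

```python
import shlex

# Hypothetical legacy config where binary_args is still a single string.
legacy = {
    'kind': 'binary',
    'binary_name': 'demo',
    'binary_args': "--flag 'two words' input",
}

# dict(mapping, key=value) builds a copy, so the caller's dict is untouched.
upgraded = dict(legacy, binary_args=shlex.split(legacy['binary_args']))
print(upgraded['binary_args'])
```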
class HarnessKind(enum.StrEnum):
"""The target/harness kind."""
LIBFUZZER = enum.auto()
BINARY = enum.auto()
# The target is a JavaScript shell that consumes JavaScript code.
JS = enum.auto()
@dataclasses.dataclass(frozen=True, kw_only=True)
class CommandLineBinaryConfig(BinaryConfig):
"""Configuration for a command-line userspace binary."""
binary_args: list[str]
# Additional environment variables to pass to the binary. They will overwrite
# any existing environment variables with the same name.
# Input replacement works on these variables as well.
binary_env: dict[str, str] = dataclasses.field(default_factory=dict)
harness_kind: HarnessKind
# Whether to filter the compile commands to only include object files that
# are directly linked into the target binary. Should usually be true but
# some targets like V8 require this to be false, see b/433718862.
filter_compile_commands: bool = True
@property
def uses_stdin(self) -> bool:
"""Whether the binary uses stdin."""
return manifest_constants.INPUT_FILE not in self.binary_args
@classmethod
def from_dict(cls, config_dict: Mapping[str, Any]) -> Self:
"""Deserializes the `CommandLineBinaryConfig` from a dict."""
kind = BinaryConfigKind(config_dict["kind"])
kind.validate_in([BinaryConfigKind.OSS_FUZZ, BinaryConfigKind.BINARY])
# Default to "binary" for backwards compatibility.
harness_kind = HarnessKind(
config_dict.get("harness_kind", HarnessKind.BINARY)
)
return CommandLineBinaryConfig(
kind=kind,
harness_kind=harness_kind,
binary_name=config_dict["binary_name"],
binary_args=config_dict["binary_args"],
binary_env=config_dict.get("binary_env", {}),
filter_compile_commands=config_dict.get(
"filter_compile_commands", True
),
)
def _get_sqlite_db_user_version(sqlite_db_path: pathlib.Path) -> int:
"""Retrieves `PRAGMA user_version;` value without connecting to the database."""
with sqlite_db_path.open("rb") as stream:
# https://www.sqlite.org/pragma.html#pragma_user_version - a big-endian
# 32-bit number at offset 60 of the database header.
too_small_error = ValueError(
f"The file '{sqlite_db_path}' is too small for an SQLite database."
)
try:
stream.seek(60)
except OSError as e:
raise too_small_error from e
version_bytes = stream.read(4)
if len(version_bytes) < 4:
raise too_small_error
return int.from_bytes(version_bytes, byteorder="big")
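# A self-contained sketch of the header read performed above, checked against
# the value SQLite itself reports. `read_user_version` is a hypothetical
# stand-in for the private helper, and the demo database is a throwaway file.

```python
import pathlib
import sqlite3
import tempfile


def read_user_version(db_path: pathlib.Path) -> int:
  """Reads the 4-byte big-endian user version at header offset 60."""
  with db_path.open('rb') as stream:
    stream.seek(60)
    raw = stream.read(4)
  if len(raw) < 4:
    raise ValueError(f'{db_path} is too small for an SQLite database.')
  return int.from_bytes(raw, byteorder='big')


with tempfile.TemporaryDirectory() as tmp_dir:
  demo_db = pathlib.Path(tmp_dir) / 'db.sqlite'
  conn = sqlite3.connect(demo_db)
  conn.execute('CREATE TABLE t (x INTEGER)')
  conn.execute('PRAGMA user_version = 7')
  conn.commit()
  from_pragma = conn.execute('PRAGMA user_version').fetchone()[0]
  conn.close()
  from_header = read_user_version(demo_db)

print(from_pragma, from_header)
```

# Reading the header directly avoids opening a database connection when only
# the schema version is needed.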
@dataclasses.dataclass(frozen=True)
class Manifest:
"""Contains general meta-information about the snapshot."""
# The name of the target.
name: str
# A unique identifier for the snapshot (not necessarily a valid UUID).
uuid: str
# A fixed path that shared libraries stored at `./obj/lib` should be mounted
# at before running the target.
lib_mount_path: pathlib.Path | None
# The binary configuration used to build the snapshot.
binary_config: BinaryConfig
# The path prefix of the actual build directory (e.g., a temporary file in
# the build host). It's used during replay to remove noisy source-file
# prefixes from reports.
source_dir_prefix: str | None = None
# The reproducibility information about the bug in this snapshot.
reproducibility: Reproducibility | None = None
# Example source map:
# {
# "/src/hunspell": {
# "type": "git",
# "url": "https://github.com/hunspell/hunspell.git",
# "rev": "a9b7270c1c2832312cfb20c3d1cf5c5080bf221b"
# }
# }
source_map: dict[pathlib.Path, SourceRef] | None = None
# Version of the manifest spec.
version: int = ARCHIVE_VERSION
# Version of the index database schema.
index_db_version: int | None = None
@classmethod
def from_dict(cls, data: dict[str, Any]) -> Self:
"""Creates a Manifest object from a deserialized dict."""
if data["version"] == 1:
lib_mount_path = LIB_MOUNT_PATH_V1
else:
lib_mount_path = _get_mapped(data, "lib_mount_path", pathlib.Path)
if data["version"] < 3:
if not isinstance(data.get("binary_args"), str):
raise RuntimeError(
"binary_args must be a string in version 1 and 2, but got"
f" {type(data.get('binary_args'))}"
)
binary_args = _get_mapped(data, "binary_args", shlex.split)
else:
binary_args = data.get("binary_args")
if data["version"] < 4:
binary_config = CommandLineBinaryConfig(
kind=BinaryConfigKind.BINARY,
binary_name=data["binary_name"],
binary_args=binary_args or [],
harness_kind=HarnessKind.BINARY,
binary_env={},
)
else:
binary_config = _get_mapped(data, "binary_config", BinaryConfig.from_dict)
version = data["version"]
if _MIN_SUPPORTED_ARCHIVE_VERSION <= version <= ARCHIVE_VERSION:
# Upgrade archive version - we have upgraded all necessary fields.
version = ARCHIVE_VERSION
else:
logging.warning(
"Unsupported manifest version %s detected. Not upgrading.", version
)
return Manifest(
version=version,
index_db_version=data.get("index_db_version"),
name=data["name"],
uuid=data["uuid"],
lib_mount_path=lib_mount_path,
source_map=_get_mapped(data, "source_map", source_map_from_dict),
source_dir_prefix=data.get("source_dir_prefix"),
reproducibility=_get_mapped(
data, "reproducibility", Reproducibility.from_dict
),
binary_config=binary_config,
)
def to_dict(self) -> dict[str, Any]:
"""Converts a Manifest object to a serializable dict."""
data = dataclasses.asdict(self)
data["binary_config"] = self.binary_config.to_dict()
data["lib_mount_path"] = _get_mapped(
data, "lib_mount_path", lambda x: x.as_posix()
)
data["source_map"] = _get_mapped(data, "source_map", source_map_to_dict)
return data
def validate(self) -> None:
"""Validates the manifest with some simple checks.
Raises:
RuntimeError: If the manifest is invalid.
"""
if self.version < _MIN_SUPPORTED_ARCHIVE_VERSION:
raise RuntimeError(
f"Build archive version too low: {self.version}. Supporting at"
f" least {_MIN_SUPPORTED_ARCHIVE_VERSION}."
)
if self.version > ARCHIVE_VERSION:
raise RuntimeError(
f"Build archive version too high: {self.version}. Only supporting"
f" up to {ARCHIVE_VERSION}."
)
if self.version == 1 and LIB_MOUNT_PATH_V1 != self.lib_mount_path:
raise RuntimeError(
"Build archive with version 1 has an alternative lib_mount_path set"
f" ({self.lib_mount_path}). This is not a valid archive."
)
if not self.name or not self.uuid or not self.binary_config:
raise RuntimeError(
"Attempting to load a manifest with missing fields. Expected all"
f" fields to be set, but got {self}"
)
if self.source_map is not None:
for _, ref in self.source_map.items():
if not ref.url:
raise RuntimeError(
"Attempting to load a manifest with a source map entry with an"
f" empty URL. Source map entry: {ref}"
)
# check very simple basic types.
for k, v in inspect.get_annotations(type(self)).items():
if not isinstance(v, type):
continue
if not isinstance(getattr(self, k), v):
raise RuntimeError(
f"Type mismatch for field {k}: expected {v}, got"
f" {type(getattr(self, k))}"
)
# We updated from string to list in version 3, make sure this propagated.
binary_config = self.binary_config
if hasattr(binary_config, "binary_args"):
if not isinstance(binary_config.binary_args, list):
raise RuntimeError(
"Type mismatch for field binary_config.binary_args: expected list,"
f" got {type(binary_config.binary_args)}"
)
def save_build(
self,
*,
source_dir: pathlib.PurePath | None,
build_dir: pathlib.PurePath,
index_dir: pathlib.PurePath,
archive_path: pathlib.PurePath,
out_dir: pathlib.PurePath = pathlib.Path("/out"),
overwrite: bool = True,
) -> Self:
"""Saves a build archive with this Manifest."""
if os.path.exists(archive_path) and not overwrite:
raise FileExistsError(f"Not overwriting existing archive {archive_path}")
self.validate()
with tempfile.NamedTemporaryFile() as tmp:
mode = "w:gz" if archive_path.suffix.endswith("gz") else "w"
with tarfile.open(tmp.name, mode) as tar:
def _save_dir(
path: pathlib.PurePath,
prefix: pathlib.Path,
exclude_build_artifacts: bool = False,
only_include_target: str | None = None,
):
prefix = prefix.as_posix() + "/"
for root, _, files in os.walk(path):
for file in files:
if file.endswith("_seed_corpus.zip"):
# Don't copy over the seed corpus -- it's not necessary.
continue
if "/.git/" in root or root.endswith("/.git"):
# Skip the .git directory -- it can be large.
continue
file = pathlib.Path(root, file)
if exclude_build_artifacts and _is_elf(file):
continue
if only_include_target and _is_elf(file):
# Skip ELF files that aren't the relevant target (unless it's a
# shared library).
if (
file.name != only_include_target
and ".so" not in file.name
and not file.absolute().is_relative_to(out_dir / "lib")
):
continue
tar.add(
# Don't try to replicate symlinks in the tarfile, because they
# can lead to various issues (e.g. absolute symlinks).
file.resolve().as_posix(),
arcname=prefix + str(file.relative_to(path)),
)
dumped_self = self
if self.index_db_version is None:
index_db_version = _get_sqlite_db_user_version(
pathlib.Path(index_dir) / INDEX_DB
)
dumped_self = dataclasses.replace(
self, index_db_version=index_db_version
)
# Make sure the manifest is the first file in the archive to avoid
# seeking when we only need the manifest.
_add_string_to_tar(
tar,
MANIFEST_PATH.as_posix(),
json.dumps(
dumped_self.to_dict(),
indent=2,
),
)
# Make sure the index databases (the only files directly in `INDEX_DIR`)
# are early in the archive for the same reason.
_save_dir(index_dir, INDEX_DIR)
if source_dir:
_save_dir(source_dir, SRC_DIR, exclude_build_artifacts=True)
# Only include the relevant target for the snapshot, to save on disk
# space.
_save_dir(
build_dir,
OBJ_DIR,
only_include_target=self.binary_config.binary_name,
)
if self.binary_config.kind == BinaryConfigKind.OSS_FUZZ:
copied_files = [tar_info.name for tar_info in tar.getmembers()]
try:
report_missing_source_files(
self.binary_config.binary_name, copied_files, tar
)
except Exception as e: # pylint: disable=broad-except
logging.warning("Failed to report missing source files: %s", e)
shutil.copyfile(tmp.name, archive_path)
return dumped_self
def report_missing_source_files(
binary_name: str, copied_files: list[str], tar: tarfile.TarFile
):
"""Saves a report of missing source files to the snapshot tarball."""
copied_files = {_get_comparable_path(file) for file in copied_files}
covered_files = {
_get_comparable_path(path): path
for path in get_covered_files(binary_name)
}
missing = set(covered_files) - copied_files
if not missing:
return
logging.info("Reporting missing files: %s", missing)
missing_report_lines = sorted([covered_files[k] for k in missing])
report_name = f"{binary_name}_missing_files.txt"
tar_info = tarfile.TarInfo(name=report_name)
missing_report = " ".join(missing_report_lines)
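missing_report_bytes = missing_report.encode("utf-8")
# TarFile.addfile() copies exactly `tar_info.size` bytes from the file object,
# so the size must be set or the report is stored empty.
tar_info.size = len(missing_report_bytes)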
tar.addfile(tarinfo=tar_info, fileobj=io.BytesIO(missing_report_bytes))
with open(os.path.join(OUT, report_name), "w") as fp:
fp.write(missing_report)
def _get_comparable_path(path: str) -> tuple[str, str]:
return os.path.basename(os.path.dirname(path)), os.path.basename(path)
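# The comparison key above is (parent directory name, file name), which lets
# archive paths and coverage paths with different roots be matched. A minimal
# sketch with made-up paths; `comparable_path` mirrors the helper above.

```python
import os


def comparable_path(path: str) -> tuple[str, str]:
  """Returns (parent directory name, file name) for `path`."""
  return os.path.basename(os.path.dirname(path)), os.path.basename(path)


# Hypothetical inputs: one file made it into the archive, two were covered.
copied = {comparable_path(p) for p in ['src/expat/lib/xmlparse.c']}
covered = {
    comparable_path(p): p
    for p in ['/src/expat/lib/xmlparse.c', '/src/expat/lib/xmlrole.c']
}
missing = sorted(covered[key] for key in set(covered) - copied)
print(missing)
```

# Note that two different files sharing both parent directory name and file
# name would collide under this key, so the match is approximate.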
def get_covered_files(target: str) -> Sequence[str]:
"""Returns the files covered by fuzzing on OSS-Fuzz by the target."""
with urllib.request.urlopen(_COVERAGE_INFO_URL) as resp:
latest_info = json.load(resp)
stats_url = latest_info.get("fuzzer_stats_dir").replace(
"gs://", "https://storage.googleapis.com/"
)
target_url = f"{stats_url}/{target}.json"
with urllib.request.urlopen(target_url) as resp:
target_cov = json.load(resp)
files = target_cov["data"][0]["files"]
return [
file["filename"]
for file in files
if file["summary"]["regions"]["covered"]
]
def _get_mapped(
data: dict[str, Any], key: str, mapper: Callable[[Any], Any]
) -> Any | None:
"""Get a value from a dict and apply a mapper to it, if it's not None."""
value = data.get(key)
if value is None:
return None
return mapper(value)
def source_map_from_dict(data: dict[str, Any]) -> dict[pathlib.Path, SourceRef]:
"""Converts a path: obj dict to a dictionary of SourceRef objects."""
return {pathlib.Path(x): SourceRef.from_dict(y) for x, y in data.items()}
def source_map_to_dict(
x: dict[pathlib.Path, SourceRef],
) -> dict[str, Any]:
"""Converts a dictionary of SourceRef objects to a string: obj dict."""
return {k.as_posix(): v for k, v in x.items()}
def _add_string_to_tar(tar: tarfile.TarFile, name: str, data: str) -> None:
bytesio = io.BytesIO(data.encode("utf-8"))
tar_info = tarfile.TarInfo(name)
tar_info.size = len(bytesio.getvalue())
tar.addfile(tarinfo=tar_info, fileobj=bytesio)
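# A sketch of the same technique against an in-memory archive, showing that
# the member can be read back once `size` is set. `add_string` is a
# hypothetical copy of the helper above, and the payload is illustrative.

```python
import io
import tarfile


def add_string(tar: tarfile.TarFile, name: str, data: str) -> None:
  """Adds `data` to `tar` as a regular file called `name`."""
  payload = data.encode('utf-8')
  info = tarfile.TarInfo(name)
  info.size = len(payload)  # addfile() copies exactly this many bytes.
  tar.addfile(tarinfo=info, fileobj=io.BytesIO(payload))


buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode='w') as tar:
  add_string(tar, 'manifest.json', '{"name": "demo"}')

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode='r') as tar:
  first_member = tar.getmembers()[0].name
  content = tar.extractfile('manifest.json').read().decode('utf-8')

print(first_member, content)
```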
def _is_elf(path: pathlib.PurePath) -> bool:
"""Checks if a file is an ELF file."""
try:
with open(path, "rb") as f:
return f.read(4) == b"\x7fELF"
except OSError:
# Can happen if the file is a symlink, etc.
return False
def parse_env(env_list: list[str]) -> dict[str, str]:
"""Helper function to parse environment variables from a list.
Args:
env_list: A list of environment variables in the format of "key=value".
Returns:
A dictionary of environment variables.
Raises:
ValueError: If a key is empty or defined more than once.
"""
env = {}
def assert_key_valid(key: str) -> None:
if not key:
raise ValueError("Environment variable key is empty.")
# Reject keys that were already defined earlier in the list.
if key in env:
raise ValueError(
f"Environment variable key {key} is defined twice. "
f"Existing value: {env[key]}, new value: {value}."
)
for entry in env_list:
if "=" not in entry:
logging.warning(
"Environment variable string is not in the format of 'key=value': %s",
entry,
)
key, _, value = entry.partition("=")
assert_key_valid(key)
env[key] = value
return env
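# A condensed sketch of the key=value parsing above, assuming the same rules:
# split on the first "=", reject empty keys and reject duplicates.
# `parse_env_sketch` is a hypothetical name and omits the logging.

```python
def parse_env_sketch(env_list: list[str]) -> dict[str, str]:
  """Parses 'key=value' strings, splitting on the first '=' only."""
  env: dict[str, str] = {}
  for entry in env_list:
    key, _, value = entry.partition('=')
    if not key:
      raise ValueError('Environment variable key is empty.')
    if key in env:
      raise ValueError(f'Environment variable key {key} is defined twice.')
    env[key] = value
  return env


parsed = parse_env_sketch(['ASAN_OPTIONS=detect_leaks=0', 'EMPTY='])
print(parsed)
```

# Using str.partition keeps any further "=" characters inside the value.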
================================================
FILE: infra/base-images/base-builder/indexer/utils.py
================================================
#!/usr/bin/env python3
# Copyright 2026 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Utils for snapshotting shared libraries."""
from collections.abc import Mapping, Sequence
import dataclasses
import os
import pathlib
import re
import subprocess
from typing import Final, Protocol
from absl import logging
from google3.pyglib import gfile
LD_BINARY_PATH_X86_64: Final[pathlib.Path] = (
pathlib.Path("/lib64/ld-linux-x86-64.so.2")
)
LD_BINARY_PATH_X86: Final[pathlib.Path] = pathlib.Path("/lib32/ld-linux.so.2")
@dataclasses.dataclass(frozen=True)
class SharedLibrary:
  """A shared library with its name and path."""
  name: str
  path: pathlib.Path
def _parse_ld_trace_output(
    output: str, ld_binary_path: pathlib.Path
) -> Sequence[SharedLibrary]:
  """Parses the output of `LD_TRACE_LOADED_OBJECTS=1 ld.so`."""
  if "statically linked" in output:
    return []
  # Example output:
  # linux-vdso.so.1 => (0x00007f40afc0f000)
  # linux-vdso.so.1 (0x00007f76b9377000)
  # lib foo.so => /tmp/sharedlib/lib foo.so (0x00007f76b9367000)
  # libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f76b9157000)
  # /lib64/ld-linux-x86-64.so.2 (0x00007f76b9379000)
  # The last line can also be:
  # /grte/lib64/lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
  # (0x00007f76b9379000)
  #
  # The lines that do not have a => should be skipped.
  # The dynamic linker should always be copied AND have its executable bit set.
  # The lines that have a => could contain a space, but we copy whatever is on
  # the right side of the =>, removing the load address.
  shared_libraries = [
      SharedLibrary(name=ld_binary_path.name, path=ld_binary_path)
  ]
  for lib_name, lib_path in re.findall(r"(\S+) => .*?(\S+) \(", output):
    lib_path = pathlib.Path(lib_path)
    if lib_path == ld_binary_path:
      continue
    shared_libraries.append(SharedLibrary(name=lib_name, path=lib_path))
  return shared_libraries
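To make the matching in `_parse_ld_trace_output` concrete, here is a small standalone sketch that applies the same regular expression to sample text modelled on the example in the comment. The sample addresses and the name `parse_pairs` are illustrative assumptions, not part of the module.

```python
import re

# Sample text modelled on the example output quoted in the comment above.
SAMPLE_OUTPUT = "\n".join([
    "\tlinux-vdso.so.1 (0x00007f76b9377000)",
    "\tlib foo.so => /tmp/sharedlib/lib foo.so (0x00007f76b9367000)",
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f76b9157000)",
    "\t/lib64/ld-linux-x86-64.so.2 (0x00007f76b9379000)",
])

# Same pattern as the one passed to re.findall in _parse_ld_trace_output.
PATTERN = re.compile(r"(\S+) => .*?(\S+) \(")


def parse_pairs(output: str) -> list[tuple[str, str]]:
    """Returns (name, path) pairs for every line that contains ' => '."""
    return PATTERN.findall(output)


if __name__ == "__main__":
    for name, path in parse_pairs(SAMPLE_OUTPUT):
        print(name, "->", path)
```

Lines without `=>` produce no match, so the vdso and bare dynamic-linker lines are skipped. Because both groups use `\S+`, an entry whose name or path contains a space appears to be captured only from its last whitespace-delimited token (`foo.so` in the sample).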
class CommandRunner(Protocol):
  """Runs `command` with environment `env` and returns its stdout."""
  def __call__(
      self,
      command: Sequence[str | os.PathLike[str]],
      env: Mapping[str, str] | None = None,
  ) -> bytes:
    pass
def run_subprocess(
    command: Sequence[str | os.PathLike[str]],
    env: Mapping[str, str] | None = None,
) -> bytes:
  return subprocess.run(
      command,
      capture_output=True,
      env=env,
      check=True,
  ).stdout
def get_shared_libraries(
    binary_path: os.PathLike[str],
    command_runner: CommandRunner = run_subprocess,
    ld_binary_path: pathlib.Path = LD_BINARY_PATH_X86_64,
) -> Sequence[SharedLibrary]:
  """Enumerates the shared libraries required by the given binary."""
  env = os.environ | {
      "LD_TRACE_LOADED_OBJECTS": "1",
      "LD_BIND_NOW": "1",
  }
  stdout_bytes = command_runner([ld_binary_path, binary_path], env=env)
  return _parse_ld_trace_output(stdout_bytes.decode(), ld_binary_path)
def copy_shared_libraries(
    libraries: Sequence[SharedLibrary], dst_path: pathlib.Path
) -> None:
  """Copies the shared libraries to the shared directory."""
  for lib in libraries:
    try:
      logging.info("Copying %s => %s", lib.name, lib.path)
      gfile.Copy(lib.path, dst_path / lib.path.name, overwrite=True, mode=0o755)
    except gfile.GOSError:
      logging.exception("Could not copy %s to %s", lib.path, dst_path)
      raise
def patch_binary_rpath_and_interpreter(
    binary_path: os.PathLike[str],
    lib_mount_path: pathlib.Path,
    ld_binary_path: pathlib.Path = LD_BINARY_PATH_X86_64,
):
  """Patches the binary rpath and interpreter."""
  subprocess.run(
      [
          "patchelf",
          "--set-rpath",
          lib_mount_path.as_posix(),
          "--force-rpath",
          binary_path,
      ],
      check=True,
  )
  subprocess.run(
      [
          "patchelf",
          "--set-interpreter",
          (lib_mount_path / ld_binary_path.name).as_posix(),
          binary_path,
      ],
      check=True,
  )
def get_library_mount_path(binary_id: str) -> pathlib.Path:
  return pathlib.Path("/tmp") / (binary_id + "_lib")
def report_progress(stage: str, is_done: bool = False) -> None:
  """Reports progress of a stage of the snapshotting process."""
  logging.info("%s%s", stage, "..." if not is_done else "")
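Since `get_shared_libraries` takes its command runner as a parameter typed by the `CommandRunner` protocol, a test can pass a stand-in that never spawns a process. The following is a minimal sketch of that idea under stated assumptions: `FakeRunner` and `trace_loaded_objects` are hypothetical names, and the protocol is restated in simplified form so the block runs on its own.

```python
from collections.abc import Mapping, Sequence
from typing import Optional, Protocol


class CommandRunner(Protocol):
    """Simplified restatement of the protocol used by get_shared_libraries."""

    def __call__(
        self, command: Sequence[str], env: Optional[Mapping[str, str]] = None
    ) -> bytes:
        ...


class FakeRunner:
    """Hypothetical test double: records each call and returns canned stdout."""

    def __init__(self, stdout: bytes) -> None:
        self.stdout = stdout
        self.calls = []

    def __call__(self, command, env=None) -> bytes:
        self.calls.append((list(command), dict(env or {})))
        return self.stdout


def trace_loaded_objects(
    binary: str,
    runner: CommandRunner,
    ld: str = "/lib64/ld-linux-x86-64.so.2",
) -> str:
    """Hypothetical caller: runs the dynamic linker in trace mode via `runner`."""
    env = {"LD_TRACE_LOADED_OBJECTS": "1", "LD_BIND_NOW": "1"}
    return runner([ld, binary], env=env).decode()


if __name__ == "__main__":
    fake = FakeRunner(b"\tstatically linked\n")
    print(repr(trace_loaded_objects("/bin/true", fake)))
    print(fake.calls)
```

Structural typing means `FakeRunner` needs no inheritance from `CommandRunner`; any callable with a compatible signature satisfies the protocol.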
================================================
FILE: infra/base-images/base-builder/install_deps.sh
================================================
#!/bin/bash -eux
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Install base-builder's dependencies in an architecture-aware way.
case $(uname -m) in
x86_64)
dpkg --add-architecture i386
;;
esac
apt-get update && \
apt-get install -y \
binutils-dev \
build-essential \
curl \
wget \
git \
jq \
patchelf \
rsync \
subversion \
zip
case $(uname -m) in
x86_64)
apt-get install -y libc6-dev-i386
;;
esac
================================================
FILE: infra/base-images/base-builder/install_deps_ubuntu-20-04.sh
================================================
#!/bin/bash -eux
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Install base-builder's dependencies in an architecture-aware way.
case $(uname -m) in
x86_64)
dpkg --add-architecture i386
;;
esac
apt-get update && \
apt-get install -y \
binutils-dev \
build-essential \
curl \
wget \
git \
jq \
patchelf \
rsync \
subversion \
zip
case $(uname -m) in
x86_64)
apt-get install -y libc6-dev-i386
;;
esac
================================================
FILE: infra/base-images/base-builder/install_deps_ubuntu-24-04.sh
================================================
#!/bin/bash -eux
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Install base-builder's dependencies in an architecture-aware way.
case $(uname -m) in
x86_64)
dpkg --add-architecture i386
;;
esac
apt-get update && \
apt-get install -y \
binutils-dev \
build-essential \
curl \
wget \
git \
jq \
patchelf \
rsync \
subversion \
zip
case $(uname -m) in
x86_64)
apt-get install -y libc6-dev-i386
;;
esac
# Ubuntu 24.04 does not have lcab. Install an older .deb from Ubuntu repos.
curl -LO https://mirrors.edge.kernel.org/ubuntu/pool/universe/l/lcab/lcab_1.0b12-7_amd64.deb && \
apt-get install -y ./lcab_1.0b12-7_amd64.deb && \
rm lcab_1.0b12-7_amd64.deb
# Create a custom apt configuration to allow downgrades and non-interactive installs.
cat <<EOF > /etc/apt/apt.conf.d/99-oss-fuzz-apt-defaults
// OSS-Fuzz custom apt configuration.
// Automatically allow downgrades and assume "yes" to all prompts.
APT::Get::Allow-Downgrades "true";
APT::Get::Assume-Yes "true";
EOF
================================================
FILE: infra/base-images/base-builder/install_go.sh
================================================
#!/bin/bash -eux
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
cd /tmp
export GOROOT=/root/.go
wget https://go.dev/dl/go1.25.0.linux-amd64.tar.gz
mkdir temp-go
tar -C temp-go/ -xzf go1.25.0.linux-amd64.tar.gz
mkdir /root/.go/
mv temp-go/go/* /root/.go/
rm -rf temp-go
echo 'Set "GOPATH=/root/go"'
echo 'Set "PATH=$PATH:/root/.go/bin:$GOPATH/bin"'
go install github.com/mdempsky/go114-fuzz-build@latest
ln -s $GOPATH/bin/go114-fuzz-build $GOPATH/bin/go-fuzz
# Build signal handler
if [ -f "$GOPATH/gosigfuzz/gosigfuzz.c" ]; then
clang -c $GOPATH/gosigfuzz/gosigfuzz.c -o $GOPATH/gosigfuzz/gosigfuzz.o
fi
cd /tmp
git clone https://github.com/AdamKorcz/go-118-fuzz-build
cd go-118-fuzz-build
go build
mv go-118-fuzz-build $GOPATH/bin/
# Build v2 binaries
git checkout v2
go build .
mv go-118-fuzz-build $GOPATH/bin/go-118-fuzz-build_v2
pushd cmd/convertLibFuzzerTestcaseToStdLibGo
go build . && mv convertLibFuzzerTestcaseToStdLibGo $GOPATH/bin/
popd
pushd cmd/addStdLibCorpusToFuzzer
go build . && mv addStdLibCorpusToFuzzer $GOPATH/bin/
popd
================================================
FILE: infra/base-images/base-builder/install_java.sh
================================================
#!/bin/bash -eux
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Install OpenJDK 17 and trim its size by removing unused components. This enables using Jazzer's mutation framework.
cd /tmp
curl --silent -L -O https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.16+8/OpenJDK17U-jdk_x64_linux_hotspot_17.0.16_8.tar.gz && \
mkdir -p $JAVA_HOME
tar -xz --strip-components=1 -f OpenJDK17U-jdk_x64_linux_hotspot_17.0.16_8.tar.gz --directory $JAVA_HOME && \
rm -f OpenJDK17U-jdk_x64_linux_hotspot_17.0.16_8.tar.gz
rm -rf $JAVA_HOME/jmods $JAVA_HOME/lib/src.zip
# Install OpenJDK 15 and trim its size by removing unused components. Some projects only run with Java 15.
curl --silent -L -O https://download.java.net/java/GA/jdk15.0.2/0d1cfde4252546c6931946de8db48ee2/7/GPL/openjdk-15.0.2_linux-x64_bin.tar.gz && \
mkdir -p $JAVA_15_HOME
tar -xz --strip-components=1 -f openjdk-15.0.2_linux-x64_bin.tar.gz --directory $JAVA_15_HOME && \
rm -f openjdk-15.0.2_linux-x64_bin.tar.gz
rm -rf $JAVA_15_HOME/jmods $JAVA_15_HOME/lib/src.zip
================================================
FILE: infra/base-images/base-builder/install_javascript.sh
================================================
#!/bin/bash -eux
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# see installation instructions: https://github.com/nodesource/distributions#available-architectures
apt-get update
apt-get install -y ca-certificates curl gnupg
mkdir -p /etc/apt/keyrings
curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
NODE_MAJOR=20
echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | tee /etc/apt/sources.list.d/nodesource.list
apt-get update
apt-get install nodejs -y
================================================
FILE: infra/base-images/base-builder/install_python.sh
================================================
#!/bin/bash -eux
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
echo "ATHERIS INSTALL"
unset CFLAGS CXXFLAGS
# PYI_STATIC_ZLIB=1 is needed for installing pyinstaller 5.0
export PYI_STATIC_ZLIB=1
LIBFUZZER_LIB=$( echo /usr/local/lib/clang/*/lib/x86_64-unknown-linux-gnu/libclang_rt.fuzzer_no_main.a ) pip3 install -v --no-cache-dir "atheris>=2.3.0" "pyinstaller==6.10.0" "setuptools==72.1.0" "coverage==6.3.2"
rm -rf /tmp/*
================================================
FILE: infra/base-images/base-builder/install_ruby.sh
================================================
#!/bin/bash -eux
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
echo "Starting ruby installation"
RUBY_VERSION=3.3.1
RUBY_DEPS="binutils xz-utils libyaml-dev libffi-dev zlib1g-dev"
apt update && apt install -y $RUBY_DEPS
curl -O https://cache.ruby-lang.org/pub/ruby/3.3/ruby-$RUBY_VERSION.tar.gz
tar -xvf ruby-$RUBY_VERSION.tar.gz
cd ruby-$RUBY_VERSION
./configure --disable-install-doc --disable-install-rdoc --disable-install-capi
make -j$(nproc)
make install
cd ../
# Clean up the sources.
rm -rf ./ruby-$RUBY_VERSION ruby-$RUBY_VERSION.tar.gz
echo "Finished installing ruby"
================================================
FILE: infra/base-images/base-builder/install_rust.sh
================================================
#!/bin/bash -eux
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
curl https://sh.rustup.rs | sh -s -- -y --default-toolchain=$RUSTUP_TOOLCHAIN --profile=minimal
cargo install cargo-fuzz --locked && rm -rf /rust/registry
# Needed to recompile rust std library for MSAN
rustup component add rust-src
cp -r /usr/local/lib/x86_64-unknown-linux-gnu/* /usr/local/lib/
================================================
FILE: infra/base-images/base-builder/install_swift.sh
================================================
#!/bin/bash -eux
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
SWIFT_PACKAGES="wget \
binutils \
git \
gnupg2 \
libc6-dev \
libcurl4 \
libedit2 \
libgcc-9-dev \
libpython2.7 \
libsqlite3-0 \
libstdc++-9-dev \
libxml2 \
libz3-dev \
pkg-config \
tzdata \
uuid-dev \
zlib1g-dev"
SWIFT_SYMBOLIZER_PACKAGES="build-essential make cmake ninja-build git python3 g++-multilib binutils-dev zlib1g-dev"
apt-get update && apt install -y $SWIFT_PACKAGES && \
apt install -y $SWIFT_SYMBOLIZER_PACKAGES --no-install-recommends
wget -q https://download.swift.org/swift-6.1.3-release/ubuntu2004/swift-6.1.3-RELEASE/swift-6.1.3-RELEASE-ubuntu20.04.tar.gz
tar xzf swift-6.1.3-RELEASE-ubuntu20.04.tar.gz
cp -r swift-6.1.3-RELEASE-ubuntu20.04/usr/* /usr/
rm -rf swift-6.1.3-RELEASE-ubuntu20.04.tar.gz swift-6.1.3-RELEASE-ubuntu20.04/
# TODO: Move to a separate work dir
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout 63bf228450b8403e0c5e828d276be47ffbcd00d0 # TODO: Keep in sync with base-clang.
git apply ../llvmsymbol.diff --verbose
cmake -G "Ninja" \
-DLIBCXX_ENABLE_SHARED=OFF \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_TARGETS_TO_BUILD=X86 \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_BUILD_TESTS=OFF \
-DLLVM_INCLUDE_TESTS=OFF llvm
ninja -j$(nproc) llvm-symbolizer
cp bin/llvm-symbolizer /usr/local/bin/llvm-symbolizer-swift
cd $SRC
rm -rf llvm-project llvmsymbol.diff
# TODO: Cleanup packages
apt-get remove --purge -y wget
apt-get autoremove -y
================================================
FILE: infra/base-images/base-builder/install_swift_ubuntu-20-04.sh
================================================
#!/bin/bash -eux
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
SWIFT_PACKAGES="wget \
binutils \
git \
gnupg2 \
libc6-dev \
libcurl4 \
libedit2 \
libgcc-9-dev \
libpython2.7 \
libsqlite3-0 \
libstdc++-9-dev \
libxml2 \
libz3-dev \
pkg-config \
tzdata \
uuid-dev \
zlib1g-dev"
SWIFT_SYMBOLIZER_PACKAGES="build-essential make cmake ninja-build git python3 g++-multilib binutils-dev zlib1g-dev"
apt-get update && apt install -y $SWIFT_PACKAGES && \
apt install -y $SWIFT_SYMBOLIZER_PACKAGES --no-install-recommends
wget -q https://download.swift.org/swift-6.1.3-release/ubuntu2004/swift-6.1.3-RELEASE/swift-6.1.3-RELEASE-ubuntu20.04.tar.gz
tar xzf swift-6.1.3-RELEASE-ubuntu20.04.tar.gz
cp -r swift-6.1.3-RELEASE-ubuntu20.04/usr/* /usr/
rm -rf swift-6.1.3-RELEASE-ubuntu20.04.tar.gz swift-6.1.3-RELEASE-ubuntu20.04/
# TODO: Move to a separate work dir
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout 63bf228450b8403e0c5e828d276be47ffbcd00d0 # TODO: Keep in sync with base-clang.
git apply ../llvmsymbol.diff --verbose
cmake -G "Ninja" \
-DLIBCXX_ENABLE_SHARED=OFF \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_TARGETS_TO_BUILD=X86 \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_BUILD_TESTS=OFF \
-DLLVM_INCLUDE_TESTS=OFF llvm
ninja -j$(nproc) llvm-symbolizer
cp bin/llvm-symbolizer /usr/local/bin/llvm-symbolizer-swift
cd $SRC
rm -rf llvm-project llvmsymbol.diff
# TODO: Cleanup packages
apt-get remove --purge -y wget
apt-get autoremove -y
================================================
FILE: infra/base-images/base-builder/install_swift_ubuntu-24-04.sh
================================================
#!/bin/bash -eux
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Detect Ubuntu version
source /etc/os-release
if [[ "$VERSION_ID" == "20.04" ]]; then
SWIFT_PACKAGES="wget \
binutils \
git \
gnupg2 \
libc6-dev \
libcurl4 \
libedit2 \
libgcc-9-dev \
libpython2.7 \
libsqlite3-0 \
libstdc++-9-dev \
libxml2 \
libz3-dev \
pkg-config \
tzdata \
uuid-dev \
zlib1g-dev"
SWIFT_URL="https://download.swift.org/swift-6.1.3-release/ubuntu2004/swift-6.1.3-RELEASE/swift-6.1.3-RELEASE-ubuntu20.04.tar.gz"
SWIFT_DIR="swift-6.1.3-RELEASE-ubuntu20.04"
elif [[ "$VERSION_ID" == "24.04" ]]; then
SWIFT_PACKAGES="wget \
binutils \
git \
gnupg2 \
libc6-dev \
libcurl4-openssl-dev \
libedit2 \
libgcc-13-dev \
libncurses-dev \
libpython3-dev \
libsqlite3-0 \
libstdc++-13-dev \
libxml2-dev \
libz3-dev \
pkg-config \
tzdata \
zip \
unzip \
zlib1g-dev"
SWIFT_URL="https://download.swift.org/swift-6.1.3-release/ubuntu2404/swift-6.1.3-RELEASE/swift-6.1.3-RELEASE-ubuntu24.04.tar.gz"
SWIFT_DIR="swift-6.1.3-RELEASE-ubuntu24.04"
else
echo "Unsupported Ubuntu version: $VERSION_ID"
exit 1
fi
SWIFT_SYMBOLIZER_PACKAGES="build-essential make cmake ninja-build git python3 g++-multilib binutils-dev zlib1g-dev"
apt-get update && apt install -y $SWIFT_PACKAGES && \
apt install -y $SWIFT_SYMBOLIZER_PACKAGES --no-install-recommends
wget -q $SWIFT_URL
tar xzf $(basename $SWIFT_URL)
cp -r $SWIFT_DIR/usr/* /usr/
rm -rf $(basename $SWIFT_URL) $SWIFT_DIR
# TODO: Move to a separate work dir
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout 63bf228450b8403e0c5e828d276be47ffbcd00d0 # TODO: Keep in sync with base-clang.
git apply ../llvmsymbol.diff --verbose
cmake -G "Ninja" \
-DLIBCXX_ENABLE_SHARED=OFF \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_TARGETS_TO_BUILD=X86 \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_BUILD_TESTS=OFF \
-DLLVM_INCLUDE_TESTS=OFF llvm
ninja -j$(nproc) llvm-symbolizer
cp bin/llvm-symbolizer /usr/local/bin/llvm-symbolizer-swift
cd $SRC
rm -rf llvm-project llvmsymbol.diff
# TODO: Cleanup packages
apt-get remove --purge -y wget
apt-get autoremove -y
================================================
FILE: infra/base-images/base-builder/jcc/build_jcc.bash
================================================
#!/bin/bash -eu
#
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
go build jcc.go
go build jcc2.go
gsutil cp jcc gs://clusterfuzz-builds/jcc/clang++-jcc
gsutil cp jcc gs://clusterfuzz-builds/jcc/clang-jcc
gsutil cp jcc2 gs://clusterfuzz-builds/jcc/clang++-jcc2
gsutil cp jcc2 gs://clusterfuzz-builds/jcc/clang-jcc2
================================================
FILE: infra/base-images/base-builder/jcc/go.mod
================================================
module github.com/google/jcc
go 1.21
================================================
FILE: infra/base-images/base-builder/jcc/jcc.go
================================================
// Copyright 2023 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package main
import (
"bytes"
"fmt"
"log"
"os"
"os/exec"
"path/filepath"
)
func ExecBuildCommand(bin string, args []string) (int, string, string) {
// Executes the original command.
cmd := exec.Command(bin, args...)
var outb, errb bytes.Buffer
cmd.Stdout = &outb
cmd.Stderr = &errb
cmd.Stdin = os.Stdin
cmd.Run()
return cmd.ProcessState.ExitCode(), outb.String(), errb.String()
}
func Compile(bin string, args []string) (int, string, string) {
// Run the actual command.
return ExecBuildCommand(bin, args)
}
func AppendStringToFile(filepath, new_content string) error {
// Appends |new_content| to the content of |filepath|.
file, err := os.OpenFile(filepath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
if err != nil {
return err
}
defer file.Close()
_, err = file.WriteString(new_content)
return err
}
func WriteStdErrOut(args []string, outstr string, errstr string) {
// Prints |outstr| to stdout, prints |errstr| to stderr, and saves |errstr| to err.log.
fmt.Print(outstr)
fmt.Fprint(os.Stderr, errstr)
// Record what compile args produced the error and the error itself in log file.
AppendStringToFile("/tmp/err.log", fmt.Sprintf("%s\n", args)+errstr)
}
func main() {
f, err := os.OpenFile("/tmp/jcc.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
if err != nil {
log.Println(err)
}
defer f.Close()
if _, err := f.WriteString(fmt.Sprintf("%s\n", os.Args)); err != nil {
log.Println(err)
}
args := os.Args[1:]
basename := filepath.Base(os.Args[0])
isCPP := basename == "clang++-jcc"
newArgs := args
var bin string
if isCPP {
bin = "clang++"
} else {
bin = "clang"
}
fullCmdArgs := append([]string{bin}, newArgs...)
retcode, out, errstr := Compile(bin, newArgs)
WriteStdErrOut(fullCmdArgs, out, errstr)
os.Exit(retcode)
}
================================================
FILE: infra/base-images/base-builder/jcc/jcc2.go
================================================
// Copyright 2024 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package main
import (
"bytes"
"encoding/json"
"errors"
"fmt"
"io/fs"
"io/ioutil"
"log"
"os"
"os/exec"
"path/filepath"
"regexp"
"strings"
)
var MaxMissingHeaderFiles = 10
var CppifyHeadersMagicString = "\n/* JCCCppifyHeadersMagicString */\n"
func CopyFile(src string, dst string) {
contents, err := ioutil.ReadFile(src)
if err != nil {
panic(err)
}
err = ioutil.WriteFile(dst, contents, 0644)
if err != nil {
panic(err)
}
}
func TryFixCCompilation(cmdline []string) ([]string, int, string, string) {
var newFile string = ""
for i, arg := range cmdline {
if !strings.HasSuffix(arg, ".c") {
continue
}
if _, err := os.Stat(arg); errors.Is(err, os.ErrNotExist) {
continue
}
newFile = strings.TrimSuffix(arg, ".c")
newFile += ".cpp"
CopyFile(arg, newFile)
CppifyHeaderIncludesFromFile(newFile)
cmdline[i] = newFile
break
}
if newFile == "" {
return []string{}, 1, "", ""
}
cppBin := "clang++"
newCmdline := []string{"-stdlib=libc++"}
newCmdline = append(cmdline, newCmdline...)
newFullArgs := append([]string{cppBin}, newCmdline...)
retcode, out, err := Compile(cppBin, newCmdline)
if retcode == 0 {
return newFullArgs, retcode, out, err
}
correctedCmdline, corrected, _ := CorrectMissingHeaders(cppBin, newCmdline)
if corrected {
return append([]string{cppBin}, correctedCmdline...), 0, "", ""
}
return newFullArgs, retcode, out, err
}
func ExtractMissingHeader(compilerOutput string) (string, bool) {
r := regexp.MustCompile(`fatal error: ['|<](?P<header>[a-zA-z0-9\/\.]+)['|>] file not found`)
matches := r.FindStringSubmatch(compilerOutput)
if len(matches) == 0 {
return "", false
}
return matches[1], true
}
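As an illustration of what `ExtractMissingHeader` looks for, here is a Python transliteration of its pattern applied to a clang-style diagnostic. This is a sketch rather than the project's code: the function name `extract_missing_header`, the group name `header`, and the sample diagnostics are assumptions.

```python
import re

# Python transliteration of the Go pattern; the group name "header" is assumed.
MISSING_HEADER_RE = re.compile(
    r"fatal error: ['|<](?P<header>[a-zA-z0-9/.]+)['|>] file not found"
)


def extract_missing_header(compiler_output: str) -> tuple[str, bool]:
    """Returns (header, True) for the first missing-header diagnostic found."""
    match = MISSING_HEADER_RE.search(compiler_output)
    if match is None:
        return "", False
    return match.group("header"), True


if __name__ == "__main__":
    stderr = "foo.c:1:10: fatal error: 'missing/header.h' file not found"
    print(extract_missing_header(stderr))
    print(extract_missing_header("foo.c:3:1: error: unknown type name 'x'"))
```

The character class admits letters, digits, `/`, `.`, and the punctuation between `Z` and `a` (including `_`), so a header name containing other characters, such as `-`, does not match this pattern.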
func ReplaceMissingHeaderInFile(srcFilename, curHeader, replacementHeader string) error {
srcFile, err := os.Open(srcFilename)
if err != nil {
return err
}
srcBytes, err := ioutil.ReadAll(srcFile)
if err != nil {
return err
}
src := string(srcBytes)
newSrc := ReplaceMissingHeader(src, curHeader, replacementHeader)
b := []byte(newSrc)
err = ioutil.WriteFile(srcFilename, b, 0644)
if err != nil {
return err
}
return nil
}
func ReplaceMissingHeader(src, curHeader, replacementHeader string) string {
re := regexp.MustCompile(`#include ["|<]` + curHeader + `["|>]\n`)
replacement := "#include \"" + replacementHeader + "\"\n"
return re.ReplaceAllString(src, replacement)
}
func GetHeaderCorrectedFilename(compilerErr string) (string, string, bool) {
re := regexp.MustCompile(`(?P<buggyFile>[a-z\/\-\_0-9A-z\.]+):.* fatal error: .* file not found`)
matches := re.FindStringSubmatch(compilerErr)
if len(matches) < 2 {
return "", "", false
}
oldFilename := matches[1]
base := filepath.Base(oldFilename)
root := filepath.Dir(oldFilename)
newFilename := root + "/jcc-corrected-" + base
return oldFilename, newFilename, true
}
func GetHeaderCorrectedCmd(cmd []string, compilerErr string) ([]string, string, error) {
oldFilename, newFilename, success := GetHeaderCorrectedFilename(compilerErr)
if !success {
return cmd, "", errors.New("Couldn't find buggy file")
}
// Make new cmd.
newCmd := make([]string, len(cmd))
for i, part := range cmd {
newCmd[i] = part
}
found := false
for i, filename := range newCmd {
if filename == oldFilename {
newCmd[i] = newFilename
found = true
break
}
}
CopyFile(oldFilename, newFilename)
if found {
return newCmd, newFilename, nil
}
return cmd, "", errors.New("Couldn't find file")
}
func CorrectMissingHeaders(bin string, cmd []string) ([]string, bool, error) {
_, _, stderr := Compile(bin, cmd)
cmd, correctedFilename, err := GetHeaderCorrectedCmd(cmd, stderr)
if err != nil {
return cmd, false, err
}
for i := 0; i < MaxMissingHeaderFiles; i++ {
fixed, hasBrokenHeaders := TryCompileAndFixHeadersOnce(bin, cmd, correctedFilename)
if fixed {
return cmd, true, nil
}
if !hasBrokenHeaders {
return cmd, false, nil
}
}
return cmd, false, nil
}
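func ExecBuildCommand(bin string, args []string) (int, string, string) {
	// Executes the original command.
	cmd := exec.Command(bin, args...)
	var outb, errb bytes.Buffer
	cmd.Stdout = &outb
	cmd.Stderr = &errb
	cmd.Stdin = os.Stdin
	if err := cmd.Run(); err != nil && cmd.ProcessState == nil {
		// The process never started (for example, the binary was not found),
		// so there is no ProcessState to read an exit code from.
		return 1, outb.String(), errb.String() + err.Error() + "\n"
	}
	return cmd.ProcessState.ExitCode(), outb.String(), errb.String()
}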
func Compile(bin string, args []string) (int, string, string) {
// Run the actual command.
return ExecBuildCommand(bin, args)
}
func TryCompileAndFixHeadersOnce(bin string, cmd []string, filename string) (fixed, hasBrokenHeaders bool) {
	retcode, _, stderr := Compile(bin, cmd)
	if retcode == 0 {
		return true, false
	}
	missingHeader, isMissing := ExtractMissingHeader(stderr)
	if !isMissing {
		return false, false
	}
	newHeaderPath, found := FindMissingHeader(missingHeader)
	if !found {
		return false, true
	}
	ReplaceMissingHeaderInFile(filename, missingHeader, newHeaderPath)
	return false, true
}
func FindMissingHeader(missingHeader string) (string, bool) {
envVar := "JCC_MISSING_HEADER_SEARCH_PATH"
var searchPath string
searchPath, exists := os.LookupEnv(envVar)
if !exists {
searchPath = "/src"
}
searchPath, _ = filepath.Abs(searchPath)
var headerLocation string
missingHeader = "/" + missingHeader
find := func(path string, d fs.DirEntry, err error) error {
if err != nil {
return err
}
if d.IsDir() {
return nil
}
if strings.HasSuffix(path, missingHeader) {
headerLocation = path
return nil
}
return nil
}
filepath.WalkDir(searchPath, find)
if headerLocation == "" {
return "", false
}
return headerLocation, true
}
func CppifyHeaderIncludesFromFile(srcFile string) error {
contentsBytes, err := ioutil.ReadFile(srcFile)
if err != nil {
return err
}
contents := string(contentsBytes[:])
contents, err = CppifyHeaderIncludes(contents)
if err != nil {
return err
}
b := []byte(contents)
err = ioutil.WriteFile(srcFile, b, 0644)
return err
}
func CppifyHeaderIncludes(contents string) (string, error) {
	shouldCppify, exists := os.LookupEnv("JCC_CPPIFY_PROJECT_HEADERS")
	if !exists || shouldCppify == "0" {
		return contents, nil
	}
	if strings.Contains(contents, CppifyHeadersMagicString) {
		return contents, nil
	}
	// The capture group is the quoted header path.
	re := regexp.MustCompile(`\#include \"(.+)\"\n`)
	matches := re.FindAllStringSubmatch(contents, -1)
	if len(matches) == 0 {
		// Nothing to wrap: return the input unchanged so callers that write the
		// result back to disk do not truncate the file.
		return contents, nil
	}
	for i, match := range matches {
		if i == 0 {
			// So we don't cppify twice.
			contents += CppifyHeadersMagicString
		}
		oldStr := match[0]
		replacement := "extern \"C\" {\n#include \"" + match[1] + "\"\n}\n"
		contents = strings.Replace(contents, oldStr, replacement, 1)
		if contents == "" {
			panic("Failed to replace")
		}
	}
	return contents, nil
}
func AppendStringToFile(filepath, new_content string) error {
// Appends |new_content| to the content of |filepath|.
file, err := os.OpenFile(filepath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
if err != nil {
return err
}
defer file.Close()
_, err = file.WriteString(new_content)
return err
}
func WriteStdErrOut(args []string, outstr string, errstr string) {
// Prints |outstr| to stdout, prints |errstr| to stderr, and saves |errstr| to err.log.
fmt.Print(outstr)
fmt.Fprint(os.Stderr, errstr)
// Record what compile args produced the error and the error itself in log file.
AppendStringToFile("/workspace/err.log", fmt.Sprintf("%s\n", args)+errstr)
}
func main() {
	f, err := os.OpenFile("/tmp/jcc.log", os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		log.Println(err)
	}
	defer f.Close()
	if _, err := f.WriteString(fmt.Sprintf("%s\n", os.Args)); err != nil {
		log.Println(err)
	}
	args := os.Args[1:]
	// Guard the index: jcc may be invoked without any arguments.
	if len(args) > 0 && args[0] == "unfreeze" {
		fmt.Println("unfreeze")
		unfreeze()
	}
	basename := filepath.Base(os.Args[0])
	isCPP := basename == "clang++-jcc"
	newArgs := append(args, "-w")
	var bin string
	if isCPP {
		bin = "clang++"
		// Append to newArgs (not args) so the -w flag added above is kept.
		newArgs = append(newArgs, "-stdlib=libc++")
	} else {
		bin = "clang"
	}
	fullCmdArgs := append([]string{bin}, newArgs...)
	if IsCompilingTarget(fullCmdArgs) {
		WriteTargetArgsAndCommitImage(fullCmdArgs)
		os.Exit(0)
	}
	retcode, out, errstr := Compile(bin, newArgs)
	WriteStdErrOut(fullCmdArgs, out, errstr)
	os.Exit(retcode)
}
type BuildCommand struct {
CWD string `json:"CWD"`
CMD []string `json:"CMD"`
}
func WriteTargetArgsAndCommitImage(cmdline []string) {
	log.Println("WRITE COMMAND")
	wd, _ := os.Getwd()
	buildcmd := BuildCommand{
		CWD: wd,
		CMD: cmdline,
	}
	jsonData, err := json.Marshal(buildcmd)
	if err != nil {
		log.Println(err)
	}
	// WriteFile truncates an existing statefile, so a shorter command never
	// leaves stale bytes from a previous run behind.
	if err := ioutil.WriteFile("/out/statefile.json", jsonData, 0644); err != nil {
		log.Println(err)
	}
	hostname, _ := os.Hostname()
	dockerArgs := []string{"commit", hostname, "frozen"}
	cmd := exec.Command("docker", dockerArgs...)
	var outb, errb bytes.Buffer
	cmd.Stdout = &outb
	cmd.Stderr = &errb
	cmd.Stdin = os.Stdin
	if err := cmd.Run(); err != nil {
		log.Println(err)
	}
	fmt.Println(outb.String(), errb.String())
	fmt.Println("COMMIT IMAGE")
}
func IsCompilingTarget(cmdline []string) bool {
for _, arg := range cmdline {
// This can fail if people do crazy things they aren't supposed
// to such as using some other means to link in libFuzzer.
if arg == "-fsanitize=fuzzer" {
return true
}
if arg == "-lFuzzingEngine" {
return true
}
}
return false
}
func parseCommand(command string) (string, []string) {
args := strings.Fields(command)
commandBin := args[0]
commandArgs := args[1:]
return commandBin, commandArgs
}
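func unfreeze() {
	content, err := ioutil.ReadFile("/out/statefile.json")
	if err != nil {
		log.Fatal(err)
	}
	var command BuildCommand
	if err := json.Unmarshal(content, &command); err != nil {
		log.Fatal(err)
	}
	if len(command.CMD) == 0 {
		log.Fatal("statefile.json contains no command")
	}
	if err := os.Chdir(command.CWD); err != nil {
		log.Fatal(err)
	}
	// Use the recorded argv directly; joining and re-splitting on whitespace
	// would break arguments that contain spaces.
	ExecBuildCommand(command.CMD[0], command.CMD[1:])
	os.Exit(0)
}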
================================================
FILE: infra/base-images/base-builder/jcc/jcc_test.go
================================================
package main
import (
"fmt"
"os"
"strings"
"testing"
)
func TestExtractMissingHeader(t *testing.T) {
missingHeaderMessage := `path/to/file.cpp:8:10: fatal error: 'missingheader.h' file not found
#include "missingheader.h"
^~~~~~~~~~~~
1 error generated.
`
res, _ := ExtractMissingHeader(missingHeaderMessage)
expected := "missingheader.h"
if strings.Compare(res, expected) != 0 {
t.Errorf("Got: %s. Expected: %s.", res, expected)
}
}
func TestGetHeaderCorrectedFilename(t *testing.T) {
missingHeaderMessage := `path/to/file.cpp:8:10: fatal error: 'missingheader.h' file not found
#include "missingheader.h"
^~~~~~~~~~~~
1 error generated.
`
_, correctedFilename, _ := GetHeaderCorrectedFilename(missingHeaderMessage)
expected := "path/to/jcc-corrected-file.cpp"
if strings.Compare(correctedFilename, expected) != 0 {
t.Errorf("Got: %s. Expected: %s.", correctedFilename, expected)
}
}
func TestFindMissingHeader(t *testing.T) {
pwd, _ := os.Getwd()
t.Setenv("JCC_MISSING_HEADER_SEARCH_PATH", pwd)
location, _ := FindMissingHeader("header.h")
expected := pwd + "/testdata/path/to/header.h"
if strings.Compare(location, expected) != 0 {
t.Errorf("Got: %s. Expected: %s.", location, expected)
}
}
func TestCorrectMissingHeaders(t *testing.T) {
	pwd, _ := os.Getwd()
	t.Setenv("JCC_MISSING_HEADER_SEARCH_PATH", pwd)
	cfile := pwd + "/testdata/cfile.c"
	cmd := [4]string{"-fsanitize=address", cfile, "-o", "/tmp/blah"}
	// CorrectMissingHeaders returns (cmd, fixed, err).
	_, res, err := CorrectMissingHeaders("clang", cmd[:])
	if !res {
		fmt.Println(err)
		t.Errorf("Expected successful compilation")
	}
}
func TestGetHeaderCorrectedCmd(t *testing.T) {
compilerErr := `testdata/cpp.cc:8:10: fatal error: 'missingheader.h' file not found
#include "missingheader.h"
^~~~~~~~~~~~
1 error generated.
`
cmd := [3]string{"-fsanitize=address", "file.cpp", "path/to/cpp.cc"}
expectedFixedCmd := [3]string{"-fsanitize=address", "file.cpp", "path/to/jcc-corrected-cpp.cc"}
fixedCmd, _, _ := GetHeaderCorrectedCmd(cmd[:], compilerErr)
if strings.Compare(fixedCmd[1], expectedFixedCmd[1]) != 0 {
t.Errorf("Expected %s, got: %s", expectedFixedCmd, fixedCmd)
}
}
func TestCppifyHeaderIncludes(t *testing.T) {
t.Setenv("JCC_CPPIFY_PROJECT_HEADERS", "1")
src := `// Copyright blah
#include
#include "fuzz.h"
#include "x/y.h"
extern "C" LLVMFuzzerTestOneInput(uint8_t* data, size_t sz) {
return 0;
}`
newFile, _ := CppifyHeaderIncludes(src)
expected := `// Copyright blah
#include
extern "C" {
#include "fuzz.h"
}
extern "C" {
#include "x/y.h"
}
extern "C" LLVMFuzzerTestOneInput(uint8_t* data, size_t sz) {
return 0;
}
/* JCCCppifyHeadersMagicString */
`
if strings.Compare(newFile, expected) != 0 {
t.Errorf("Expected: %s, got: %s", expected, newFile)
}
}
func TestCppifyHeaderIncludesShouldnt(t *testing.T) {
src := `// Copyright blah
#include
#include "fuzz.h"
#include "x/y.h"
extern "C" LLVMFuzzerTestOneInput(uint8_t* data, size_t sz) {
return 0;
}`
newFile, _ := CppifyHeaderIncludes(src)
if strings.Compare(newFile, src) != 0 {
t.Errorf("Expected: %s. Got: %s", src, newFile)
}
}
func TestCppifyHeaderIncludesAlready(t *testing.T) {
src := `// Copyright blah
#include
#include "fuzz.h"
#include "x/y.h"
extern "C" LLVMFuzzerTestOneInput(uint8_t* data, size_t sz) {
return 0;
}
/* JCCCppifyHeadersMagicString */
`
newFile, _ := CppifyHeaderIncludes(src)
if strings.Compare(newFile, src) != 0 {
t.Errorf("Expected %s, got: %s", src, newFile)
}
}
func TestExtractMissingHeaderNonHeaderFailure(t *testing.T) {
missingHeaderMessage := `clang: error: no such file or directory: 'x'
clang: error: no input files`
header, res := ExtractMissingHeader(missingHeaderMessage)
if res {
t.Errorf("Expected no match, got: %s", header)
}
}
func TestReplaceMissingHeader(t *testing.T) {
cfile := `// Copyright 2035 Robots
#include
#include
// Some libraries like OpenSSL will use brackets for their own headers.
#include <missingheader.h>
int LLVMFuzzerTestOneInput(uint8_t* data, size_t size) {
return 0;
}
`
res := ReplaceMissingHeader(cfile, "missingheader.h", "path/to/includes/missingheader.h")
expected := `// Copyright 2035 Robots
#include
#include
// Some libraries like OpenSSL will use brackets for their own headers.
#include "path/to/includes/missingheader.h"
int LLVMFuzzerTestOneInput(uint8_t* data, size_t size) {
return 0;
}
`
if strings.Compare(res, expected) != 0 {
t.Errorf("Got: %s. Expected: %s.", res, expected)
}
}
================================================
FILE: infra/base-images/base-builder/jcc/testdata/.gitignore
================================================
jcc-corrected-cfile.c
jcc-corrected-cfile.cpp
================================================
FILE: infra/base-images/base-builder/jcc/testdata/cfile.c
================================================
// Copyright 2023 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "header.h"
int main() {
return 0;
}
================================================
FILE: infra/base-images/base-builder/jcc/testdata/cpp.cc
================================================
// Copyright 2023 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "header.h"
int main() {
return 0;
}
================================================
FILE: infra/base-images/base-builder/jcc/testdata/path/to/header.h
================================================
// Copyright 2023 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
int xhg(void);
================================================
FILE: infra/base-images/base-builder/llvmsymbol.diff
================================================
diff --git a/llvm/lib/DebugInfo/Symbolize/CMakeLists.txt b/llvm/lib/DebugInfo/Symbolize/CMakeLists.txt
index acfb3bd0e..a499ee2e0 100644
--- a/llvm/lib/DebugInfo/Symbolize/CMakeLists.txt
+++ b/llvm/lib/DebugInfo/Symbolize/CMakeLists.txt
@@ -12,4 +12,8 @@ add_llvm_component_library(LLVMSymbolize
Object
Support
Demangle
- )
+
+ LINK_LIBS
+ /usr/lib/swift_static/linux/libswiftCore.a
+ /usr/lib/x86_64-linux-gnu/libstdc++.so.6
+)
diff --git a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
index fb4875f79..0030769ee 100644
--- a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
+++ b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
@@ -36,6 +36,13 @@
#include
#include
+
+extern "C" char *swift_demangle(const char *mangledName,
+ size_t mangledNameLength,
+ char *outputBuffer,
+ size_t *outputBufferSize,
+ uint32_t flags);
+
namespace llvm {
namespace symbolize {
@@ -678,6 +685,14 @@ LLVMSymbolizer::DemangleName(const std::string &Name,
free(DemangledName);
return Result;
}
+ if (!Name.empty() && Name.front() == '$') {
+ char *DemangledName = swift_demangle(Name.c_str(), Name.length(), 0, 0, 0);
+ if (DemangledName) {
+ std::string Result = DemangledName;
+ free(DemangledName);
+ return Result;
+ }
+ }
if (DbiModuleDescriptor && DbiModuleDescriptor->isWin32Module())
return std::string(demanglePE32ExternCFunc(Name));
================================================
FILE: infra/base-images/base-builder/make_build_replayable.py
================================================
#!/usr/bin/env python3
# Copyright 2025 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#!/usr/bin/env python
import os
import shutil
_REAL_SUFFIX = '.real'
_WRAPPER_TEMPLATE = """#!/usr/bin/env python3
import sys
import os
def main():
  target = sys.argv[0] + '.real'
  {contents}
  os.execv(target, sys.argv)
if __name__ == '__main__':
  main()
"""


def create_wrapper(contents: str):
  return _WRAPPER_TEMPLATE.format(contents=contents)
def main():
  dummy_script_content = '#!/bin/sh'
  dummy_scripts = [
      '/usr/bin/autoconf',
      '/usr/bin/autoheader',
      '/usr/bin/autom4te',
      '/usr/bin/automake',
      '/usr/bin/autopoint',
      '/usr/bin/autoreconf',
      '/usr/bin/autoscan',
      '/usr/bin/autoupdate',
      # Applying patches is not idempotent.
      '/usr/bin/patch',
  ]
  for script_path in dummy_scripts:
    with open(script_path, 'w') as f:
      f.write(dummy_script_content)
    os.chmod(script_path, 0o755)

  files_to_move = (
      '/usr/bin/cmake',
      '/usr/local/bin/cmake',
      '/bin/sh',
      '/bin/bash',
      '/usr/bin/git',
      '/usr/bin/ln',
      '/usr/bin/make',
      '/usr/bin/meson',
      '/usr/bin/mkdir',
      '/usr/bin/zip',
  )
  for src in files_to_move:
    if os.path.exists(src):
      shutil.move(src, src + _REAL_SUFFIX)

  # Create a shell wrapper that stubs out `configure` and `autogen`.
  with open('/bin/sh', 'w') as f:
    f.write(
        create_wrapper("""
  if any(os.path.basename(arg) in ('configure', 'autogen.sh') for arg in sys.argv[1:]):
    sys.exit(0)
"""))
  shutil.copyfile('/bin/sh', '/bin/bash')

  # Stub out `make clean`.
  with open('/usr/bin/make', 'w') as f:
    f.write(
        create_wrapper("""
  if any(arg == 'clean' for arg in sys.argv[1:]):
    sys.exit(0)
"""))

  # Stub out `meson setup`.
  with open('/usr/bin/meson', 'w') as f:
    f.write(
        create_wrapper("""
  if any(arg == 'setup' for arg in sys.argv[1:]):
    sys.exit(0)
"""))

  # Stub out cmake, but allow cmake --build, --install, -E (command mode), -P
  # (script mode).
  with open('/usr/bin/cmake', 'w') as f:
    f.write(
        create_wrapper("""
  if not any(arg in ('--build', '--install', '-E', '-P', '--version') for arg in sys.argv[1:]):
    sys.exit(0)
"""))
  shutil.copyfile('/usr/bin/cmake', '/usr/local/bin/cmake')

  # Add -p to mkdir calls to allow it to be run twice.
  with open('/usr/bin/mkdir', 'w') as f:
    f.write(
        create_wrapper("""
  if not any(arg == '-p' for arg in sys.argv[1:]):
    sys.argv.insert(1, '-p')
"""))

  # Don't zip something that already exists.
  with open('/usr/bin/zip', 'w') as f:
    f.write(
        create_wrapper("""
  if (any(arg.endswith('.zip') and os.path.exists(arg) for arg in sys.argv[1:])):
    sys.exit(0)
"""))

  # Add -f to ln.
  with open('/usr/bin/ln', 'w') as f:
    f.write(
        create_wrapper("""
  if not any(arg == '-f' for arg in sys.argv[1:]):
    sys.argv.insert(1, '-f')
"""))

  # Don't allow git `reset` or `clean` or `apply`.
  # reset/clean might remove build artifacts.
  # clone is not idempotent.
  # applying patches is not idempotent.
  with open('/usr/bin/git', 'w') as f:
    f.write(
        create_wrapper("""
  if any(arg in ('clean', 'clone', 'reset', 'apply', 'submodule') for arg in sys.argv[1:]):
    sys.exit(0)
"""))

  for file_path in files_to_move:
    if os.path.exists(file_path):
      os.chmod(file_path, 0o755)


if __name__ == '__main__':
  main()
================================================
FILE: infra/base-images/base-builder/ossfuzz_coverage_runner.go
================================================
// Copyright 2020 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package mypackagebeingfuzzed
import (
"io/ioutil"
"os"
"runtime/pprof"
"testing"
)
func TestFuzzCorpus(t *testing.T) {
dir := os.Getenv("FUZZ_CORPUS_DIR")
if dir == "" {
t.Logf("No fuzzing corpus directory set")
return
}
infos, err := ioutil.ReadDir(dir)
if err != nil {
t.Logf("Not fuzzing corpus directory %s", err)
return
}
filename := ""
defer func() {
if r := recover(); r != nil {
t.Error("Fuzz panicked in "+filename, r)
}
}()
profname := os.Getenv("FUZZ_PROFILE_NAME")
if profname != "" {
f, err := os.Create(profname + ".cpu.prof")
if err != nil {
t.Logf("error creating profile file %s\n", err)
} else {
_ = pprof.StartCPUProfile(f)
}
}
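for i := range infos {
// Join with an explicit separator so FUZZ_CORPUS_DIR works with or
// without a trailing slash.
filename = dir + string(os.PathSeparator) + infos[i].Name()
data, err := ioutil.ReadFile(filename)
if err != nil {
t.Error("Failed to read corpus file", err)
// Skip unreadable entries instead of fuzzing with nil data.
continue
}
FuzzFunction(data)
}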
if profname != "" {
pprof.StopCPUProfile()
f, err := os.Create(profname + ".heap.prof")
if err != nil {
t.Logf("error creating heap profile file %s\n", err)
}
if err = pprof.WriteHeapProfile(f); err != nil {
t.Logf("error writing heap profile file %s\n", err)
}
f.Close()
}
}
================================================
FILE: infra/base-images/base-builder/precompile_afl
================================================
#!/bin/bash -eu
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
echo "Precompiling AFLplusplus"
pushd $SRC/aflplusplus > /dev/null
make clean
# Unset CFLAGS and CXXFLAGS while building AFL since we don't want to slow it
# down with sanitizers.
SAVE_CXXFLAGS=$CXXFLAGS
SAVE_CFLAGS=$CFLAGS
unset CXXFLAGS
unset CFLAGS
export AFL_IGNORE_UNKNOWN_ENVS=1
make clean
AFL_NO_X86=1 PYTHON_INCLUDE=/ make
make -C utils/aflpp_driver
popd > /dev/null
echo "Done."
================================================
FILE: infra/base-images/base-builder/precompile_centipede
================================================
#!/bin/bash -eu
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
echo -n "Precompiling centipede"
# Build Centipede with bazel.
cd "$SRC/fuzztest/centipede/"
apt-get update && apt-get install libssl-dev -y
unset CXXFLAGS CFLAGS
# We need to use an older version of BAZEL because fuzztest relies on WORKSPACE
# Ref: https://github.com/google/oss-fuzz/pull/12838#issue-2733821058
export USE_BAZEL_VERSION=7.4.0
echo 'build --cxxopt=-stdlib=libc++ --linkopt=-lc++' >> /tmp/centipede.bazelrc
bazel --bazelrc=/tmp/centipede.bazelrc build -c opt :all
unset USE_BAZEL_VERSION
# Prepare the weak symbols:
# This is necessary because we compile the target binary and the intermediate
# auxiliary binaries with the same cflags. The auxiliary binaries do not need
# data-flow tracing flags, but will still throw errors when they cannot find
# the corresponding functions.
# The weak symbols provide fake implementations for intermediate binaries.
$CXX "$SRC/fuzztest/centipede/weak_sancov_stubs.cc" -c -o "$SRC/fuzztest/centipede/weak.o"
echo 'Removing extra stuff leftover to avoid bloating image.'
rm -rf /clang-*.tgz /clang
BAZEL_BIN_REAL_DIR=$(readlink -f $CENTIPEDE_BIN_DIR)
rm -rf $CENTIPEDE_BIN_DIR
mkdir -p $CENTIPEDE_BIN_DIR
mv $BAZEL_BIN_REAL_DIR/centipede/{centipede,libcentipede_runner.pic.a} $CENTIPEDE_BIN_DIR
rm -rf /root/.cache
echo 'Done.'
================================================
FILE: infra/base-images/base-builder/precompile_honggfuzz
================================================
#!/bin/bash -eu
# Copyright 2019 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
echo "Precompiling honggfuzz"
export BUILD_OSSFUZZ_STATIC=true
PACKAGES=(
libunwind8-dev
libblocksruntime-dev
liblzma-dev
libiberty-dev
zlib1g-dev
pkg-config)
apt-get update && apt-get install -y ${PACKAGES[@]}
pushd $SRC/honggfuzz > /dev/null
make clean
# These CFLAGs match honggfuzz's default, with the exception of -mtune to
# improve portability and `-D_HF_LINUX_NO_BFD` to remove assembly instructions
# from the filenames.
sed -i 's/-Werror//g' Makefile
CC=clang CFLAGS="-O3 -funroll-loops -D_HF_LINUX_NO_BFD -Wno-unterminated-string-initialization -Wno-error" make
# libhfuzz.a will be added by CC/CXX linker directly during linking,
# but it's defined here to satisfy the build infrastructure
ar rcs honggfuzz.a libhfuzz/*.o libhfcommon/*.o
popd > /dev/null
apt-get remove -y --purge ${PACKAGES[@]}
apt-get autoremove -y
echo "Done."
================================================
FILE: infra/base-images/base-builder/precompile_honggfuzz_ubuntu_20_04
================================================
#!/bin/bash -eux
# Copyright 2019 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
echo "Precompiling honggfuzz"
export BUILD_OSSFUZZ_STATIC=true
PACKAGES=(
libunwind8-dev
libblocksruntime-dev
liblzma-dev
libiberty-dev
zlib1g-dev
pkg-config)
apt-get update && apt-get install -y ${PACKAGES[@]}
pushd $SRC/honggfuzz > /dev/null
make clean
# These CFLAGs match honggfuzz's default, with the exception of -mtune to
# improve portability and `-D_HF_LINUX_NO_BFD` to remove assembly instructions
# from the filenames.
sed -i 's/-Werror//g' Makefile
CC=clang CFLAGS="-O3 -funroll-loops -D_HF_LINUX_NO_BFD -Wno-unterminated-string-initialization -Wno-error" make
# libhfuzz.a will be added by CC/CXX linker directly during linking,
# but it's defined here to satisfy the build infrastructure
ar rcs honggfuzz.a libhfuzz/*.o libhfcommon/*.o
popd > /dev/null
apt-get remove -y --purge ${PACKAGES[@]}
apt-get autoremove -y
echo "Done."
================================================
FILE: infra/base-images/base-builder/precompile_honggfuzz_ubuntu_24_04
================================================
#!/bin/bash -eux
# Copyright 2019 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
echo "Precompiling honggfuzz"
export BUILD_OSSFUZZ_STATIC=true
PACKAGES=(
libunwind8-dev
libblocksruntime-dev
liblzma-dev
libiberty-dev
zlib1g-dev
pkg-config)
apt-get update && apt-get install -y ${PACKAGES[@]}
pushd $SRC/honggfuzz > /dev/null
make clean
# These CFLAGs match honggfuzz's default, with the exception of -mtune to
# improve portability and `-D_HF_LINUX_NO_BFD` to remove assembly instructions
# from the filenames.
sed -i 's/-Werror//g' Makefile
CC=clang CFLAGS="-O3 -funroll-loops -D_HF_LINUX_NO_BFD -Wno-unterminated-string-initialization -Wno-error" make LDFLAGS="-lBlocksRuntime -lunwind-ptrace -lunwind-generic"
# libhfuzz.a will be added by CC/CXX linker directly during linking,
# but it's defined here to satisfy the build infrastructure
ar rcs honggfuzz.a libhfuzz/*.o libhfcommon/*.o
popd > /dev/null
apt-get remove -y --purge ${PACKAGES[@]}
apt-get autoremove -y
echo "Done."
================================================
FILE: infra/base-images/base-builder/python_coverage_helper.py
================================================
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Extracts file paths to copy files from pyinstaller-generated executables"""
import os
import sys
import shutil
import zipfile
# Finds all *.toc files in ./workpath and reads these files in order to
# identify Python files associated with a pyinstaller packaged executable.
# Copies all of the Python files to a temporary directory (/medio) following
# the original directory structure.
def get_all_files_from_toc(toc_file, file_path_set):
  """
  Extract filepaths from a .toc file and add to file_path_set
  """
  with open(toc_file, 'rb') as toc_file_fd:
    for line in toc_file_fd:
      try:
        line = line.decode()
      except:  # pylint:disable=bare-except
        continue
      if '.py' not in line:
        continue
      split_line = line.split(' ')
      for word in split_line:
        word = word.replace('\'', '').replace(',', '').replace('\n', '')
        if '.py' not in word:
          continue
        # Check if .egg is in the path and if so we need to split it
        if os.path.isfile(word):
          file_path_set.add(word)
        elif '.egg' in word:  # check if this is an egg
          egg_path_split = word.split('.egg')
          if len(egg_path_split) != 2:
            continue
          egg_path = egg_path_split[0] + '.egg'
          if not os.path.isfile(egg_path):
            continue
          print('Unzipping contents of %s' % egg_path)
          # We have an egg. This needs to be unzipped and then replaced
          # with the unzipped data.
          tmp_dir_name = 'zipdcontents'
          if os.path.isdir(tmp_dir_name):
            shutil.rmtree(tmp_dir_name)
          # unzip egg and replace path with unzipped content
          with zipfile.ZipFile(egg_path, 'r') as zip_f:
            zip_f.extractall(tmp_dir_name)
          os.remove(egg_path)
          shutil.copytree(tmp_dir_name, egg_path)
          # Now the lines should be accessible, so check again
          if os.path.isfile(word):
            file_path_set.add(word)
def create_file_structure_from_tocs(work_path, out_path):
  """
  Extract the Python files that are added as paths in the output of
  a pyinstaller operation. The files are determined by reading through
  all of the *.toc files in the workpath of pyinstaller.
  The files will be copied into the out_path using a similar file path
  as they originally are. If any archive (.egg) files are present in the
  .toc files, then unzip the archives and substitute the archive for the
  unzipped content, i.e. we will extract the archives and collect the source
  files.
  """
  print('Extracts files from the pyinstaller workpath')
  file_path_set = set()
  for path1 in os.listdir(work_path):
    full_path = os.path.join(work_path, path1)
    if not os.path.isdir(full_path):
      continue
    # We have a directory
    for path2 in os.listdir(full_path):
      if not '.toc' in path2:
        continue
      full_toc_file = os.path.join(full_path, path2)
      get_all_files_from_toc(full_toc_file, file_path_set)
  for file_path in file_path_set:
    relative_src = file_path[1:] if file_path[0] == '/' else file_path
    dst_path = os.path.join(out_path, relative_src)
    os.makedirs(os.path.dirname(dst_path), exist_ok=True)
    shutil.copy(file_path, dst_path)
def main():
  """
  Main handler.
  """
  if len(sys.argv) != 3:
    print('Use: python3 python_coverage_helper.py pyinstaller_workpath '
          'destination_for_output')
    sys.exit(1)
  work_path = sys.argv[1]
  out_path = sys.argv[2]
  create_file_structure_from_tocs(work_path, out_path)


if __name__ == '__main__':
  main()
================================================
FILE: infra/base-images/base-builder/replay_build.sh
================================================
#!/bin/bash -x
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
if [ ! -f /usr/bin/bash.real ]; then
# Only run this once.
python /usr/local/bin/make_build_replayable.py
fi
. $SRC/build.sh "$@"
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/.gitignore
================================================
dist
pysecsan.egg-info*
build
.venv
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/LICENSE
================================================
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/README.md
================================================
# pysecsan
Security sanitizers for vulnerability detection during runtime.
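
The tests in this package enable the sanitizers by calling `pysecsan.add_hooks()` once before fuzzing starts. The hooks wrap functions such as `os.system` so that a check runs before (and optionally after) the original call. The block below is a minimal, self-contained sketch of that wrapping idea. It does not import `pysecsan`, and the names `record_call`, `shout`, and `seen` are hypothetical and exist only for illustration.

```python
import functools


def add_hook(function, pre_exec_hook=None, post_exec_hook=None):
    """Wrap `function` so that hooks run before and/or after it."""
    if pre_exec_hook is None and post_exec_hook is None:
        raise ValueError('At least one hook is required')

    @functools.wraps(function)
    def run(*args, **kwargs):
        # The pre hook sees the same arguments as the original function.
        if pre_exec_hook is not None:
            pre_exec_hook(*args, **kwargs)
        ret = function(*args, **kwargs)
        # The post hook also receives the return value and may replace it
        # by returning something other than None.
        if post_exec_hook is not None:
            replacement = post_exec_hook(ret, *args, **kwargs)
            if replacement is not None:
                ret = replacement
        return ret

    return run


seen = []  # Hypothetical log of arguments observed by the pre hook.


def record_call(text):
    """Hypothetical pre hook: remember what was passed in."""
    seen.append(text)


def shout(text):
    """Hypothetical function to be hooked."""
    return text.upper()


hooked_shout = add_hook(shout, pre_exec_hook=record_call)
result = hooked_shout('hello')
```

A real sanitizer hook would inspect the arguments and abort the process when it finds fuzzer-controlled data, instead of just recording them.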
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/pyproject.toml
================================================
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
[project]
name = "pysecsan"
version = "0.1.0"
authors = [
{ name="David Korczynski", email="david@adalogics.com" },
]
description = "Sanitizers to detect security vulnerabilities at runtime."
readme = "README.md"
requires-python = ">=3.7"
classifiers = [
"Programming Language :: Python :: 3",
"License :: OSI Approved :: Apache Software License",
"Operating System :: OS Independent",
]
[project.urls]
"Homepage" = "https://github.com/google/oss-fuzz/tree/master/infra/sanitizers/pysecsan"
"Bug Tracker" = "https://github.com/google/oss-fuzz/issues"
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/pysecsan/__init__.py
================================================
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Glue for pysecsan library."""
# Import sanlib and expose only the needed functionality by way of __all__
from .sanlib import *
# pylint: disable=undefined-all-variable
__all__ = ['add_hooks']
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/pysecsan/command_injection.py
================================================
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Sanitizers for capturing code injections."""
from typing import Optional
from pysecsan import sanlib
def get_all_substr_prefixes(main_str, sub_str):
"""Yields the prefix of main_str preceding each occurrence of sub_str."""
idx = 0
while True:
idx = main_str.find(sub_str, idx)
if idx == -1:
return
yield main_str[0:idx]
# Increase idx by the length of the substring from the current position
# where an occurrence of the substring was found.
idx += len(sub_str)
# pylint: disable=unsubscriptable-object
def check_code_injection_match(elem, check_unquoted=False) -> Optional[str]:
"""Identify whether elem is an injection match."""
# Check exact match
if elem == 'exec-sanitizer':
return 'Explicit command injection found.'
# Check potential for injecting into a string
if 'FROMFUZZ' in elem:
if check_unquoted:
# return true if any index is unquoted
for sub_str in get_all_substr_prefixes(elem, 'FROMFUZZ'):
if sub_str.count('\"') % 2 == 0:
return 'Fuzzer controlled content in data. Code injection potential.'
# Return None if all fuzzer taints were quoted
return None
return 'Fuzzer-controlled data in command string. Injection potential.'
return None
# pylint: disable=invalid-name
def hook_pre_exec_subprocess_Popen(cmd, **kwargs):
"""Hook for subprocess.Popen."""
arg_shell = 'shell' in kwargs and kwargs['shell']
# Command injections depend on whether the first argument is a list of
# strings or a string. Handle this now.
# Example: tests/poe/ansible-runner-cve-2021-4041
if isinstance(cmd, str):
res = check_code_injection_match(cmd, check_unquoted=True)
if res is not None:
# if shell arg is true and string is tainted and unquoted that's a
# definite code injection.
if arg_shell is True:
sanlib.abort_with_issue('Code injection in Popen', 'Command injection')
# It's a maybe: will not report this to avoid false positives.
# TODO: add more precise detection here.
# Check for hg command injection
# Example: tests/poe/libvcs-cve-2022-21187
if cmd[0] == 'hg':
# Check if the arguments are controlled by the fuzzer, and this given
# arg is not preceded by --
found_dashes = False
for idx in range(1, len(cmd)):
if cmd[idx] == '--':
found_dashes = True
if not found_dashes and check_code_injection_match(cmd[idx]):
sanlib.abort_with_issue(
'command injection likely by way of mercurial. The following '
f'command {str(cmd)} is executed, and if you substitute {cmd[idx]} '
'with \"--config=alias.init=!touch HELLO_PY\" then you will '
'create HELLO_PY', 'Command injection')
def hook_pre_exec_os_system(cmd):
"""Hook for os.system."""
res = check_code_injection_match(cmd)
if res is not None:
sanlib.abort_with_issue(f'code injection by way of os.system\n{res}',
'Command injection')
def hook_pre_exec_eval(cmd, *args, **kwargs):
"""Hook for eval. Experimental atm."""
res = check_code_injection_match(cmd, check_unquoted=True)
if res is not None:
sanlib.abort_with_issue(f'Potential code injection by way of eval\n{res}',
'Command injection')
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/pysecsan/redos.py
================================================
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Sanitizer for regular expression dos."""
# pylint: disable=protected-access
import time
import os
from pysecsan import sanlib
START_RE_TIME = None
# Hooks for regular expressions.
# Main problem is to identify ReDOS attempts. This is a non-trivial task
# - https://arxiv.org/pdf/1701.04045.pdf
# - https://dl.acm.org/doi/pdf/10.1145/3236024.3236027
# and the current approach is simply to check for excessive computing time.
# In essence, this is a refinement of the fuzzer's traditional timeout
# checker, which in effect detects these types of attacks by way of
# timeouts.
#
# Perhaps the smartest would be to use something like e.g.
# https://github.com/doyensec/regexploit to scan the regex patterns.
# Other heuristics without going too technical on identifying super-linear
# regexes:
# - check
# - if 'taint' exists in re.compile(xx)
# - check
# - for backtracking possibility in PATTERN within re.compile(PATTERN)
# - and
# - 'taint' in findall(XX) calls.
# pylint: disable=global-statement
def hook_post_exec_re_pattern_findall(self, re_str):
"""Hook post execution of re.compile().findall()."""
_ = self # Satisfy lint
global START_RE_TIME
try:
endtime = time.time() - START_RE_TIME
if endtime > 4:
sanlib.abort_with_issue(f'Potential ReDOS attack.\n {re_str}', 'ReDOS')
except (NameError, TypeError):
sanlib.sanitizer_log(
'starttime is not set, which it should have. Error in PySecSan',
sanlib.LOG_INFO)
os._exit(1)
def hook_pre_exec_re_pattern_findall(self, string):
"""Hook pre execution of re.pattern().findall()."""
_ = (self, string) # Satisfy lint
global START_RE_TIME
START_RE_TIME = time.time()
def hook_post_exec_re_compile(retval, pattern, flags=None):
"""Hook for re.compile post execution to hook returned objects functions."""
_ = (pattern, flags) # Satisfy lint
sanlib.sanitizer_log('Inside of post compile hook', sanlib.LOG_DEBUG)
wrapper_object = sanlib.create_object_wrapper(
findall=(hook_pre_exec_re_pattern_findall,
hook_post_exec_re_pattern_findall))
hooked_object = wrapper_object(retval)
return hooked_object
def hook_pre_exec_re_compile(pattern, flags=None):
"""Check if tainted input exists in pattern. If so, likely chance of making
ReDOS possible."""
_ = (pattern, flags) # Satisfy lint
sanlib.sanitizer_log('Inside re compile hook', sanlib.LOG_DEBUG)
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/pysecsan/sanlib.py
================================================
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Core routines for pysecsan library."""
# pylint: disable=protected-access
import re
import os
import functools
import subprocess
import traceback
import importlib.util
from typing import Any, Callable, Optional
from pysecsan import command_injection, redos, yaml_deserialization
LOG_DEBUG = 0
LOG_INFO = 1
PYSECSAN_LOG_LVL = LOG_INFO
# Message that will be printed to stdout when an issue is found.
PYSECSAN_BUG_LABEL = r'===BUG DETECTED: PySecSan:'
# pylint: disable=global-statement
def sanitizer_log(msg, log_level, force=False, log_prefix=True):
"""Helper printing function."""
global PYSECSAN_LOG_LVL
if log_level >= PYSECSAN_LOG_LVL or force:
if log_prefix:
print(f'[PYSECSAN] {msg}')
else:
print(f'{msg}')
def sanitizer_log_always(msg, log_prefix=True):
"""Wrapper for sanitizer logging. Will always log."""
sanitizer_log(msg, 0, force=True, log_prefix=log_prefix)
def is_module_present(mod_name):
"""Identify if module is importable."""
# pylint: disable=deprecated-method
return importlib.util.find_spec(mod_name) is not None
def _log_bug(bug_title):
sanitizer_log_always('%s %s ===' % (PYSECSAN_BUG_LABEL, bug_title),
log_prefix=False)
def abort_with_issue(msg, bug_title):
"""Print message, display stacktrace and force process exit.
Use this function for signalling an issue is found and use the messages
logged from this function to determine if a fuzzer found a bug.
"""
# Show breaker string using an ASAN approach (uses 65 =)
sanitizer_log_always("=" * 65, log_prefix=False)
# Log issue message
_log_bug(bug_title)
sanitizer_log_always(msg)
# Log stacktrace
sanitizer_log_always("Stacktrace:")
traceback.print_stack()
# Force exit
# Use os._exit here to force exit. sys.exit will exit
# by throwing a SystemExit exception which the interpreter
# handles by exiting. However, code may catch this exception,
# and thus to avoid this we exit the process without exceptions.
# pylint: disable=protected-access
sanitizer_log_always("Exiting")
os._exit(1)
def is_exact_taint(stream) -> bool:
"""Checks if stream is an exact match for taint from fuzzer."""
# The fuzzer has to get 8 characters right. This may be a bit much,
# however, when found it shows a high level of control over the data.
if stream == 'FROMFUZZ':
return True
return False
def create_object_wrapper(**methods):
"""Hooks functions in an object.
This is needed for hooking built-in types and object attributes.
Example use case is if we want to find ReDOS vulnerabilities, that
have a pattern of
```
import re
r = re.compile(REGEX)
for _ in r.findall(...)
```
In the above case r.findall is a reference to
re.Pattern.findall, which is a built-in type that is non-writeable.
In order to hook such calls we need to wrap the object, and also hook the
re.compile function to return the wrapped/hooked object.
"""
class Wrapper():
"""Wrap an object by hiding attributes."""
def __init__(self, instance):
object.__setattr__(self, 'instance', instance)
def __setattr__(self, name, value):
object.__setattr__(object.__getattribute__(self, 'instance'), name, value)
def __getattribute__(self, name):
instance = object.__getattribute__(self, 'instance')
def _hook_func(self, pre_hook, post_hook, orig, *args, **kargs):
if pre_hook is not None:
pre_hook(self, *args, **kargs)
# No need to pass instance here because when we extracted
# the function we used instance.__getattribute__(name) which
# seems to include it. I think.
orig_retval = orig(*args, **kargs)
if post_hook is not None:
post_hook(self, *args, **kargs)
return orig_retval
# If this is a wrapped method, return a bound method
if name in methods:
pre_hook = methods[name][0]
post_hook = methods[name][1]
orig = instance.__getattribute__(name)
return (lambda *args, **kargs: _hook_func(self, pre_hook, post_hook,
orig, *args, **kargs))
# Otherwise, just return attribute of instance
return instance.__getattribute__(name)
return Wrapper
# pylint: disable=unsubscriptable-object
def add_hook(function: Callable[[Any], Any],
pre_exec_hook: Optional[Callable[[Any], Any]] = None,
post_exec_hook: Optional[Callable[[Any], Any]] = None):
"""Hook a function.
Hooks can be placed pre and post function call. At least one hook is
needed.
This hooking is intended for non-object hooks. In order to hook functions
in objects the `create_object_wrapper` function is used in combination
with function hooking initialisation functions post execution.
"""
if pre_exec_hook is None and post_exec_hook is None:
raise Exception('Some hooks must be included')
@functools.wraps(function)
def run(*args, **kwargs):
sanitizer_log(f'Hook start {str(function)}', LOG_DEBUG)
# Call hook
if pre_exec_hook is not None:
pre_exec_hook(*args, **kwargs)
# Call the original function in the event the hook did not indicate
# failure.
ret = function(*args, **kwargs)
# Post execution hook. Overwrite return value if anything is returned
# by post hook.
if post_exec_hook is not None:
tmp_ret = post_exec_hook(ret, *args, **kwargs)
if tmp_ret is not None:
sanitizer_log('Overwriting return value', LOG_DEBUG)
ret = tmp_ret
sanitizer_log(f'Hook end {str(function)}', LOG_DEBUG)
return ret
return run
def add_hooks():
"""Sets up hooks."""
sanitizer_log('Starting', LOG_INFO)
os.system = add_hook(os.system,
pre_exec_hook=command_injection.hook_pre_exec_os_system)
subprocess.Popen = add_hook(
subprocess.Popen,
pre_exec_hook=command_injection.hook_pre_exec_subprocess_Popen)
__builtins__['eval'] = add_hook(
__builtins__['eval'], pre_exec_hook=command_injection.hook_pre_exec_eval)
re.compile = add_hook(re.compile,
pre_exec_hook=redos.hook_pre_exec_re_compile,
post_exec_hook=redos.hook_post_exec_re_compile)
# Hack to determine if yaml is eligible, because pkg_resources does
# not seem to work from pyinstaller.
# pylint: disable=import-outside-toplevel
if is_module_present('yaml'):
import yaml
sanitizer_log('Hooking pyyaml.load', LOG_DEBUG)
yaml.load = add_hook(
yaml.load,
pre_exec_hook=yaml_deserialization.hook_pre_exec_pyyaml_load,
)
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/pysecsan/yaml_deserialization.py
================================================
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Catches vulnerable yaml deserializations that can potentially lead to
arbitrary code execution."""
from pysecsan import sanlib
try:
import yaml
# pylint: disable=broad-except
except Exception:
pass
def hook_pre_exec_pyyaml_load(stream, loader):
"""Hook for yaml.load.
Exits if the loader is the unsafe loader or the vanilla loader and the stream
passed to the loader is controlled by the fuzz data.
"""
# Ensure loader is the unsafe loader or vanilla loader
if loader not in (yaml.loader.Loader, yaml.loader.UnsafeLoader):
return
# Check for exact taint in stream
if sanlib.is_exact_taint(stream):
msg = (
'Yaml deserialization issue.\n'
'Unsafe deserialization can be used to execute arbitrary commands.\n')
sanlib.abort_with_issue(msg, 'Yaml deserialisation')
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/setup.py
================================================
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Config for installing python as package."""
from setuptools import setup, find_packages
setup(name='pysecsan',
version='0.1',
author='David Korczynski',
author_email='david@adalogics.com',
packages=find_packages(exclude=['tests']))
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/README.md
================================================
# Tests including Proof of Exploits
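
The test fuzzers plant the marker string `FROMFUZZ` in the commands they pass to the hooked functions, and the sanitizer treats the marker as evidence of fuzzer-controlled data. For string commands it also checks whether the marker sits outside double quotes by counting the quotes that precede it. The block below is a standalone sketch of that quote-parity heuristic, written for illustration and not imported from `pysecsan`. The names `prefixes_before_marker` and `has_unquoted_marker` are hypothetical.

```python
MARKER = 'FROMFUZZ'


def prefixes_before_marker(text, marker=MARKER):
    """Yield the part of `text` that precedes each occurrence of `marker`."""
    idx = 0
    while True:
        idx = text.find(marker, idx)
        if idx == -1:
            return
        yield text[:idx]
        # Continue searching after the occurrence that was just found.
        idx += len(marker)


def has_unquoted_marker(text, marker=MARKER):
    """Return True if any marker occurrence has an even number of double
    quotes before it, i.e. it is not inside a double-quoted string."""
    return any(
        prefix.count('"') % 2 == 0
        for prefix in prefixes_before_marker(text, marker))


unquoted = has_unquoted_marker('ls -la FROMFUZZ')
quoted = has_unquoted_marker('echo "FROMFUZZ"')
```

The heuristic only tracks double quotes, so it is an approximation of shell quoting rather than a full parser.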
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/eval_command_injection.py
================================================
#!/usr/bin/python3
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fuzzer targeting command injection of eval."""
# pylint: disable=eval-used
import sys
import atheris
import pysecsan
pysecsan.add_hooks()
def list_files_perhaps(param, magicval):
"""Pass fuzzer data into eval."""
if len(param) < 3:
return
if magicval == 1337:
try:
eval("FROMFUZZ")
except ValueError:
pass
def test_one_input(data):
"""Fuzzer entrypoint."""
fdp = atheris.FuzzedDataProvider(data)
list_files_perhaps(fdp.ConsumeUnicodeNoSurrogates(24),
fdp.ConsumeIntInRange(500, 1500))
def main():
"""Set up and start fuzzing."""
atheris.instrument_all()
atheris.Setup(sys.argv, test_one_input, enable_python_coverage=True)
atheris.Fuzz()
if __name__ == '__main__':
main()
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/os_command_injection.py
================================================
#!/usr/bin/python3
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fuzzer targeting command injection of os.system."""
import os
import sys
import atheris
import pysecsan
def list_files_perhaps(param, magicval):
"""Pass fuzzer data into os.system."""
if 'B' not in param:
return
if magicval == 1338:
os.system('exec-san')
elif magicval == 1339:
os.system('ls -la FROMFUZZ')
else:
os.system('ls -la ./')
def test_one_input(data):
"""Fuzzer entrypoint."""
fdp = atheris.FuzzedDataProvider(data)
list_files_perhaps(fdp.ConsumeUnicodeNoSurrogates(24),
fdp.ConsumeIntInRange(500, 1500))
def main():
"""Set up and start fuzzing."""
pysecsan.add_hooks()
atheris.instrument_all()
atheris.Setup(sys.argv, test_one_input, enable_python_coverage=True)
atheris.Fuzz()
if __name__ == '__main__':
main()
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/poe/ansible-runner-cve-2021-4041/build.sh
================================================
#!/bin/bash -eu
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
python3 -m pip install pysecsan
git clone https://github.com/ansible/ansible-runner/
cd ansible-runner
git checkout cdc0961df51fa1e10b44371944aafe5ae140b98c
python3 -m pip install .
cd ..
python3 fuzz_ansible_runner.py
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/poe/ansible-runner-cve-2021-4041/fuzz_ansible_runner.py
================================================
#!/usr/bin/python3
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Targets: https://github.com/advisories/GHSA-6j58-grhv-2769."""
import sys
import atheris
import pexpect
import pysecsan
import ansible_runner
from ansible_runner.config.runner import RunnerConfig
pysecsan.add_hooks()
def test_one_input(data):
"""Fuzzer entrypoint."""
fdp = atheris.FuzzedDataProvider(data)
conf = RunnerConfig('/tmp/')
conf.suppress_ansible_output = True
conf.expect_passwords = {pexpect.TIMEOUT: None, pexpect.EOF: None}
conf.cwd = str('/tmp/')
conf.env = {}
conf.job_timeout = 10
conf.idle_timeout = 0
conf.pexpect_timeout = 2.
conf.pexpect_use_poll = True
conf.command = 'from_fuzzer'
runner = ansible_runner.Runner(conf)
runner.resource_profiling = True
# rc.resource_profiling_base_cgroup = "; exec-san"
assistance = True
if assistance and fdp.ConsumeIntInRange(1, 100) > 80:
conf.resource_profiling_base_cgroup = 'FROMFUZZ'
else:
conf.resource_profiling_base_cgroup = fdp.ConsumeUnicodeNoSurrogates(24)
try:
runner.run()
except (RuntimeError, ValueError, TypeError) as _:
pass
def main():
"""Set up and start fuzzing."""
atheris.instrument_all()
atheris.Setup(sys.argv, test_one_input, enable_python_coverage=True)
atheris.Fuzz()
if __name__ == '__main__':
main()
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/poe/libvcs-cve-2022-21187/build.sh
================================================
#!/bin/bash -eu
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
python3 -m pip install pysecsan
python3 -m pip install libvcs==0.11.0
python3 ./fuzz_libvcs.py
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/poe/libvcs-cve-2022-21187/fuzz_libvcs.py
================================================
#!/usr/bin/python3
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Targets https://github.com/advisories/GHSA-mv2w-4jqc-6fg4."""
import sys
import atheris
import pysecsan
from libvcs.shortcuts import create_repo
pysecsan.add_hooks()
def test_one_input(data):
"""Target code injection in libvcs."""
fdp = atheris.FuzzedDataProvider(data)
mercurial_repo = create_repo(url=fdp.ConsumeUnicodeNoSurrogates(128),
vcs='hg',
repo_dir='./')
try:
mercurial_repo.update_repo()
except (ValueError, FileNotFoundError) as exception:
_ = exception # Satisfy lint
def main():
"""Set up and start fuzzing."""
atheris.instrument_all()
atheris.Setup(sys.argv, test_one_input, enable_python_coverage=True)
atheris.Fuzz()
if __name__ == '__main__':
main()
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/poe/python-ldap-GHSL-2021-117/build.sh
================================================
#!/bin/bash -eu
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
python3 -m pip install pysecsan
git clone https://github.com/python-ldap/python-ldap
cd python-ldap
git checkout 404c36b702c5b3a7e60729745c8bda16098b1472
python3 -m pip install .
cd ../
python3 ./fuzz_ldap.py
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/poe/python-ldap-GHSL-2021-117/fuzz_ldap.py
================================================
#!/usr/bin/python3
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Targets: https://github.com/python-ldap/python-ldap/security/advisories/GHSA-r8wq-qrxc-hmcm""" # pylint: disable=line-too-long
import sys
import atheris
import pysecsan
import ldap.schema
pysecsan.add_hooks()
def test_one_input(data):
"""Fuzzer targeting regex dos in ldap."""
fdp = atheris.FuzzedDataProvider(data)
try:
ldap.schema.split_tokens(fdp.ConsumeUnicodeNoSurrogates(1024))
except ValueError:
pass
def main():
"""Set up and start fuzzing."""
atheris.instrument_all()
atheris.Setup(sys.argv, test_one_input, enable_python_coverage=True)
atheris.Fuzz()
if __name__ == '__main__':
main()
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/poe/pytorch-lightning-1.5.10/build.sh
================================================
#!/bin/bash -eu
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
git clone --depth 1 --branch 1.5.10 https://github.com/PyTorchLightning/pytorch-lightning.git
cd pytorch-lightning
python3 -m pip install .
cd ../
python3 ./fuzz_pytorch_lightning.py
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/poe/pytorch-lightning-1.5.10/fuzz_pytorch_lightning.dict
================================================
"os.system('exec-sanitizer')"
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/poe/pytorch-lightning-1.5.10/fuzz_pytorch_lightning.py
================================================
#!/usr/local/bin/python3
#
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Exploit pytorch lightning with fuzzer's input as a random env variable.
This PoC is extended from a report in GitHub Advisory Database:
https://github.com/advisories/GHSA-r5qj-cvf9-p85h
The original report documents an exploit using a specific environment variable,
we show a way to achieve the same exploit with an arbitrary env variable.
"""
import os
import sys
import atheris
import pysecsan
pysecsan.add_hooks()
with atheris.instrument_imports():
  from pytorch_lightning import Trainer
  from pytorch_lightning.utilities.argparse import parse_env_variables


def prepare_fuzzing_input(data):
  """Prepare the data needed by the exploit with input data from fuzzers."""
  data = data.replace(b'\0', b'')
  env_name = 'AN_ARBITRARY_ENV_NAME'
  return data, env_name


def exploit_target(env_value, env_name):
  """This target is based on a snippet from the official documentation of
  `parse_env_variables`:
  https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.utilities.argparse.html # pylint: disable=line-too-long
  It might not be the most realistic example,
  but serves as a PoC to show that SystemSan works for Python."""
  os.environb[env_name.encode()] = env_value
  parse_env_variables(Trainer, template=env_name)


def TestOneInput(data):  # pylint: disable=invalid-name
  """Exploit the target only with input data from fuzzers."""
  env_value, env_name = prepare_fuzzing_input(data)
  exploit_target(env_value, env_name)


def main():
  """Fuzz target with atheris."""
  atheris.Setup(sys.argv, TestOneInput)
  atheris.Fuzz()


if __name__ == '__main__':
  main()
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/subprocess_popen_injection.py
================================================
#!/usr/bin/python3
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fuzzer displaying insecure use of subprocess.Popen."""
import sys
import subprocess
import atheris
import pysecsan
def list_files_perhaps(param):
  """Insecure call to Popen."""
  try:
    subprocess.Popen(' '.join(['ls', '-la', param]), shell=True)
  except ValueError:
    pass


def test_one_input(data):
  """Fuzzer entrypoint."""
  fdp = atheris.FuzzedDataProvider(data)
  if fdp.ConsumeIntInRange(1, 10) == 5:
    list_files_perhaps('FROMFUZZ')
  else:
    list_files_perhaps('.')


def main():
  """Set up and start fuzzing."""
  pysecsan.add_hooks()
  atheris.instrument_all()
  atheris.Setup(sys.argv, test_one_input, enable_python_coverage=True)
  atheris.Fuzz()


if __name__ == '__main__':
  main()
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/yaml_deserialization_general.py
================================================
#!/usr/bin/python3
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fuzzer for insecure yaml deserialization."""
import sys
import yaml
import atheris
import pysecsan
def serialize_with_tainted_data(param):
  """Hit insecure yaml function."""
  try:
    yaml.load(param, yaml.Loader)
  except yaml.YAMLError:
    pass


def test_one_input(data):
  """Fuzzer routine."""
  fdp = atheris.FuzzedDataProvider(data)
  serialize_with_tainted_data(fdp.ConsumeUnicodeNoSurrogates(32))


def main():
  """Set up and start fuzzing."""
  pysecsan.add_hooks()
  atheris.instrument_all()
  atheris.Setup(sys.argv, test_one_input, enable_python_coverage=True)
  atheris.Fuzz()


if __name__ == '__main__':
  main()
================================================
FILE: infra/base-images/base-builder/sanitizers/pysecsan/tests/yaml_deserialization_simple.py
================================================
#!/usr/bin/python3
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fuzzer triggering insecure yaml serialization."""
import sys
import yaml
import atheris
import pysecsan
def serialize_with_tainted_data(param, magicval):
  """Pass data to insecure yaml functions."""
  if magicval == 1337:
    try:
      yaml.load(param, yaml.Loader)
    except yaml.YAMLError:
      pass
  elif magicval == 1338:
    try:
      yaml.load('FROMFUZZ', yaml.Loader)
    except yaml.YAMLError:
      pass


def test_one_input(data):
  """Fuzzer entrypoint."""
  fdp = atheris.FuzzedDataProvider(data)
  serialize_with_tainted_data(fdp.ConsumeUnicodeNoSurrogates(32),
                              fdp.ConsumeIntInRange(500, 1500))


def main():
  """Set up and start fuzzing."""
  pysecsan.add_hooks()
  atheris.instrument_all()
  atheris.Setup(sys.argv, test_one_input, enable_python_coverage=True)
  atheris.Fuzz()


if __name__ == '__main__':
  main()
================================================
FILE: infra/base-images/base-builder/srcmap
================================================
#!/bin/bash -eux
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Determine srcmap of checked out source code
SRCMAP=$(tempfile)
echo "{}" > $SRCMAP
# $1 - json file, $2 - jq program
function jq_inplace() {
F=$(tempfile) && cat $1 | jq "$2" > $F && mv $F $1
}
PATHS_TO_SCAN="$SRC"
if [[ $FUZZING_LANGUAGE == "go" ]]; then
PATHS_TO_SCAN="$PATHS_TO_SCAN $GOPATH"
fi
# Git
for DOT_GIT_DIR in $(find $PATHS_TO_SCAN -name ".git" -type d); do
GIT_DIR=$(dirname $DOT_GIT_DIR)
cd $GIT_DIR
GIT_URL=$(git config --get remote.origin.url)
GIT_REV=$(git rev-parse HEAD)
jq_inplace $SRCMAP ".\"$GIT_DIR\" = { type: \"git\", url: \"$GIT_URL\", rev: \"$GIT_REV\" }"
done
# Subversion
for DOT_SVN_DIR in $(find $PATHS_TO_SCAN -name ".svn" -type d); do
SVN_DIR=$(dirname $DOT_SVN_DIR)
cd $SVN_DIR
SVN_URL=$(svn info | grep "^URL:" | sed 's/URL: //g')
SVN_REV=$(svn info -r HEAD | grep "^Revision:" | sed 's/Revision: //g')
jq_inplace $SRCMAP ".\"$SVN_DIR\" = { type: \"svn\", url: \"$SVN_URL\", rev: \"$SVN_REV\" }"
done
# Mercurial
for DOT_HG_DIR in $(find $PATHS_TO_SCAN -name ".hg" -type d); do
HG_DIR=$(dirname $DOT_HG_DIR)
cd $HG_DIR
HG_URL=$(hg paths default)
HG_REV=$(hg --debug id -r. -i)
jq_inplace $SRCMAP ".\"$HG_DIR\" = { type: \"hg\", url: \"$HG_URL\", rev: \"$HG_REV\" }"
done
if [ "${OSSFUZZ_REVISION-}" != "" ]; then
jq_inplace $SRCMAP ".\"/src\" = { type: \"git\", url: \"https://github.com/google/oss-fuzz.git\", rev: \"$OSSFUZZ_REVISION\" }"
fi
cat $SRCMAP
rm $SRCMAP
================================================
FILE: infra/base-images/base-builder/test_data/culprit-commit.txt
================================================
ac9ee01fcbfac745aaedca0393a8e1c8a33acd8d is the first bad commit
commit ac9ee01fcbfac745aaedca0393a8e1c8a33acd8d
Author: John Doe
Date: Tue Aug 6 08:41:53 2019 +0000
[compiler-rt] Implement getrandom interception
Summary:
Straightforward implementation of `getrandom` syscall and libc
hooks.
Test Plan: Local MSAN failures caused by uninstrumented `getrandom`
calls stop failing.
Patch by John Doe 3.
Reviewers: jonhdoe2, johndoe
Reviewed By: johndoe
Subscribers: johndoe4, johndoe5, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D65551
llvm-svn: 367999
:040000 040000 8db10511ca83cc7b0265c7703684cd386350151b 62508fdc5e8919bbb2a0bd185cc109868192cdb0 M compiler-rt
bisect run success
================================================
FILE: infra/base-images/base-builder/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-clang:ubuntu-20-04
COPY install_deps_ubuntu-20-04.sh install_swift_ubuntu-20-04.sh /
RUN /install_deps_ubuntu-20-04.sh
# Build and install latest Python 3.11.
ENV PYTHON_VERSION 3.11.13
RUN PYTHON_DEPS="\
zlib1g-dev \
libncurses5-dev \
libgdbm-dev \
libnss3-dev \
libssl-dev \
libsqlite3-dev \
libreadline-dev \
libffi-dev \
libbz2-dev \
liblzma-dev" && \
unset CFLAGS CXXFLAGS && \
apt-get install -y $PYTHON_DEPS && \
cd /tmp && \
curl -O https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tar.xz && \
tar -xvf Python-$PYTHON_VERSION.tar.xz && \
cd Python-$PYTHON_VERSION && \
./configure --enable-optimizations --enable-shared && \
make -j$(nproc) && \
make install && \
ldconfig && \
ln -s /usr/local/bin/python3 /usr/local/bin/python && \
cd .. && \
rm -r /tmp/Python-$PYTHON_VERSION.tar.xz /tmp/Python-$PYTHON_VERSION && \
rm -rf /usr/local/lib/python${PYTHON_VERSION%.*}/test && \
python3 -m ensurepip && \
python3 -m pip install --upgrade pip && \
apt-get remove -y $PYTHON_DEPS # https://github.com/google/oss-fuzz/issues/3888
ENV CCACHE_VERSION 4.10.2
RUN cd /tmp && curl -OL https://github.com/ccache/ccache/releases/download/v$CCACHE_VERSION/ccache-$CCACHE_VERSION.tar.xz && \
tar -xvf ccache-$CCACHE_VERSION.tar.xz && cd ccache-$CCACHE_VERSION && \
mkdir build && cd build && \
export LDFLAGS='-lpthread' && \
cmake -D CMAKE_BUILD_TYPE=Release .. && \
make -j && make install && \
rm -rf /tmp/ccache-$CCACHE_VERSION /tmp/ccache-$CCACHE_VERSION.tar.xz
# Install six for Bazel rules, along with absl-py and pyelftools.
RUN unset CFLAGS CXXFLAGS && pip3 install -v --no-cache-dir \
six==1.15.0 absl-py==2.3.0 pyelftools==0.32 && rm -rf /tmp/*
# Install Bazel through Bazelisk, which automatically fetches the latest Bazel version.
ENV BAZELISK_VERSION 1.9.0
RUN curl -L https://github.com/bazelbuild/bazelisk/releases/download/v$BAZELISK_VERSION/bazelisk-linux-amd64 -o /usr/local/bin/bazel && \
chmod +x /usr/local/bin/bazel
# Default build flags for various sanitizers.
ENV SANITIZER_FLAGS_address "-fsanitize=address -fsanitize-address-use-after-scope"
ENV SANITIZER_FLAGS_hwaddress "-fsanitize=hwaddress -fuse-ld=lld -Wno-unused-command-line-argument"
# Set of '-fsanitize' flags matches '-fno-sanitize-recover' + 'unsigned-integer-overflow'.
ENV SANITIZER_FLAGS_undefined "-fsanitize=array-bounds,bool,builtin,enum,function,integer-divide-by-zero,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unsigned-integer-overflow,unreachable,vla-bound,vptr -fno-sanitize-recover=array-bounds,bool,builtin,enum,function,integer-divide-by-zero,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unreachable,vla-bound,vptr"
# Don't include "function" since it is unsupported on aarch64.
ENV SANITIZER_FLAGS_undefined_aarch64 "-fsanitize=array-bounds,bool,builtin,enum,integer-divide-by-zero,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unsigned-integer-overflow,unreachable,vla-bound,vptr -fno-sanitize-recover=array-bounds,bool,builtin,enum,integer-divide-by-zero,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unreachable,vla-bound,vptr"
ENV SANITIZER_FLAGS_memory "-fsanitize=memory -fsanitize-memory-track-origins"
ENV SANITIZER_FLAGS_thread "-fsanitize=thread"
ENV SANITIZER_FLAGS_introspector "-O0 -flto -fno-inline-functions -fuse-ld=gold -Wno-unused-command-line-argument"
# Do not use any sanitizers in the coverage build.
ENV SANITIZER_FLAGS_coverage ""
# We use unsigned-integer-overflow as an additional coverage signal and have to
# suppress error messages. See https://github.com/google/oss-fuzz/issues/910.
ENV UBSAN_OPTIONS="silence_unsigned_overflow=1"
# To suppress warnings from binaries running during compilation.
ENV DFSAN_OPTIONS='warn_unimplemented=0'
# Default build flags for coverage feedback.
ENV COVERAGE_FLAGS="-fsanitize=fuzzer-no-link"
# Use '-Wno-unused-command-line-argument' to suppress "warning: -ldl: 'linker' input unused"
# messages which are treated as errors by some projects.
ENV COVERAGE_FLAGS_coverage "-fprofile-instr-generate -fcoverage-mapping -pthread -Wl,--no-as-needed -Wl,-ldl -Wl,-lm -Wno-unused-command-line-argument"
# Default sanitizer, fuzzing engine and architecture to use.
ENV SANITIZER="address"
ENV FUZZING_ENGINE="libfuzzer"
ENV ARCHITECTURE="x86_64"
# DEPRECATED - NEW CODE SHOULD NOT USE THIS. OLD CODE SHOULD STOP. Please use
# LIB_FUZZING_ENGINE instead.
# Path to fuzzing engine library to support some old users of
# LIB_FUZZING_ENGINE.
ENV LIB_FUZZING_ENGINE_DEPRECATED="/usr/lib/libFuzzingEngine.a"
# Argument passed to compiler to link against fuzzing engine.
# Defaults to the path, but is "-fsanitize=fuzzer" in libFuzzer builds.
ENV LIB_FUZZING_ENGINE="/usr/lib/libFuzzingEngine.a"
# TODO: remove after tpm2 catchup.
ENV FUZZER_LDFLAGS ""
WORKDIR $SRC
COPY afl_llvm22_patch.diff $SRC/
RUN git clone https://github.com/AFLplusplus/AFLplusplus.git aflplusplus && \
cd aflplusplus && \
git checkout eadc8a2a7e0fa0338802ee6254bf296489ce4fd7 && \
wget --no-check-certificate -O oss.sh https://raw.githubusercontent.com/vanhauser-thc/binary_blobs/master/oss.sh && \
git apply $SRC/afl_llvm22_patch.diff && \
rm -rf .git && \
chmod 755 oss.sh
# Do precompiles before copying other scripts for better cache efficiency.
COPY precompile_afl /usr/local/bin/
RUN precompile_afl
RUN cd $SRC && \
curl -L -O https://github.com/google/honggfuzz/archive/oss-fuzz.tar.gz && \
mkdir honggfuzz && \
cd honggfuzz && \
tar -xz --strip-components=1 -f $SRC/oss-fuzz.tar.gz && \
rm -rf examples $SRC/oss-fuzz.tar.gz
COPY precompile_honggfuzz_ubuntu_20_04 /usr/local/bin/
RUN precompile_honggfuzz_ubuntu_20_04
RUN cd $SRC && \
git clone https://github.com/google/fuzztest && \
cd fuzztest && \
git checkout a37d133f714395cabc20dd930969a889495c9f53 && \
rm -rf .git
ENV CENTIPEDE_BIN_DIR=$SRC/fuzztest/bazel-bin
COPY precompile_centipede /usr/local/bin/
RUN precompile_centipede
COPY sanitizers /usr/local/lib/sanitizers
COPY bazel_build_fuzz_tests \
cargo \
compile \
compile_afl \
compile_centipede \
compile_honggfuzz \
compile_fuzztests.sh \
compile_go_fuzzer \
compile_javascript_fuzzer \
compile_libfuzzer \
compile_native_go_fuzzer \
compile_native_go_fuzzer_v2 \
go_utils.sh \
compile_python_fuzzer \
debug_afl \
# Go, JavaScript, Java, Python, Ruby, Rust, and Swift installation scripts.
install_go.sh \
install_javascript.sh \
install_java.sh \
install_python.sh \
install_ruby.sh \
install_rust.sh \
install_swift_ubuntu-20-04.sh \
make_build_replayable.py \
python_coverage_helper.py \
replay_build.sh \
srcmap \
write_labels.py \
unshallow_repos.py \
/usr/local/bin/
# TODO: Build this as part of a multi-stage build.
ADD https://commondatastorage.googleapis.com/clusterfuzz-builds/jcc/clang-jcc /usr/local/bin/
ADD https://commondatastorage.googleapis.com/clusterfuzz-builds/jcc/clang++-jcc /usr/local/bin
ADD https://commondatastorage.googleapis.com/clusterfuzz-builds/jcc/clang-jcc2 /usr/local/bin/
ADD https://commondatastorage.googleapis.com/clusterfuzz-builds/jcc/clang++-jcc2 /usr/local/bin
RUN chmod +x /usr/local/bin/clang-jcc /usr/local/bin/clang++-jcc /usr/local/bin/clang-jcc2 /usr/local/bin/clang++-jcc2
COPY llvmsymbol.diff $SRC
COPY detect_repo.py /opt/cifuzz/
COPY bazel.bazelrc /root/.bazelrc
# Set up ccache binary and cache directory.
# /ccache/bin will contain the compiler wrappers, and /ccache/cache will
# contain the actual cache, which can be saved.
# To use this, set PATH=/ccache/bin:$PATH.
RUN mkdir -p /ccache/bin && mkdir -p /ccache/cache && \
ln -s /usr/local/bin/ccache /ccache/bin/clang && \
ln -s /usr/local/bin/ccache /ccache/bin/clang++ && \
ln -s /usr/local/bin/ccache /ccache/bin/clang-jcc && \
ln -s /usr/local/bin/ccache /ccache/bin/clang++-jcc
ENV CCACHE_DIR /ccache/cache
# Don't check that the compiler is the same, so we can switch between jcc and
# clang under the hood and re-use the same build cache.
ENV CCACHE_COMPILERCHECK none
ENV CCACHE_COMPILERTYPE clang
# Build newer patchelf than the one available from Ubuntu.
RUN cd /tmp && git clone https://github.com/NixOS/patchelf && \
apt-get update && apt-get install -y autoconf && \
cd patchelf && git checkout 523f401584d9584e76c9c77004e7abeb9e6c4551 && \
unset CFLAGS && export CXXFLAGS='-stdlib=libc++' && export LDFLAGS='-lpthread' && \
./bootstrap.sh && ./configure && make && \
cp /tmp/patchelf/src/patchelf /usr/local/bin && \
rm -rf /tmp/patchelf && apt-get remove -y autoconf
COPY indexer /opt/indexer
COPY --from=gcr.io/oss-fuzz-base/indexer /indexer/build/indexer /opt/indexer/indexer
RUN chmod a+x /opt/indexer/indexer /opt/indexer/index_build.py
CMD ["compile"]
================================================
FILE: infra/base-images/base-builder/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-clang:ubuntu-24-04
COPY install_deps_ubuntu-24-04.sh install_swift_ubuntu-24-04.sh /
RUN /install_deps_ubuntu-24-04.sh
# Build and install latest Python 3.11.
ENV PYTHON_VERSION 3.11.13
RUN PYTHON_DEPS="\
zlib1g-dev \
libncurses-dev \
libgdbm-dev \
libnss3-dev \
libssl-dev \
libsqlite3-dev \
libreadline-dev \
libffi-dev \
libbz2-dev \
liblzma-dev" && \
unset CFLAGS CXXFLAGS && \
apt-get install -y $PYTHON_DEPS && \
cd /tmp && \
curl -O https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tar.xz && \
tar -xvf Python-$PYTHON_VERSION.tar.xz && \
cd Python-$PYTHON_VERSION && \
./configure --enable-optimizations --enable-shared && \
make -j$(nproc) && \
make install && \
ldconfig && \
ln -s /usr/local/bin/python3 /usr/local/bin/python && \
cd .. && \
rm -r /tmp/Python-$PYTHON_VERSION.tar.xz /tmp/Python-$PYTHON_VERSION && \
rm -rf /usr/local/lib/python${PYTHON_VERSION%.*}/test && \
python3 -m ensurepip && \
python3 -m pip install --upgrade pip && \
apt-get remove -y $PYTHON_DEPS # https://github.com/google/oss-fuzz/issues/3888
ENV CCACHE_VERSION 4.10.2
RUN cd /tmp && curl -OL https://github.com/ccache/ccache/releases/download/v$CCACHE_VERSION/ccache-$CCACHE_VERSION.tar.xz && \
tar -xvf ccache-$CCACHE_VERSION.tar.xz && cd ccache-$CCACHE_VERSION && \
mkdir build && cd build && \
export LDFLAGS='-lpthread' && \
cmake -D CMAKE_BUILD_TYPE=Release .. && \
make -j && make install && \
rm -rf /tmp/ccache-$CCACHE_VERSION /tmp/ccache-$CCACHE_VERSION.tar.xz
# Install six for Bazel rules, along with absl-py and pyelftools.
RUN unset CFLAGS CXXFLAGS && pip3 install -v --no-cache-dir \
six==1.15.0 absl-py==2.3.0 pyelftools==0.32 && rm -rf /tmp/*
# Install Bazel through Bazelisk, which automatically fetches the latest Bazel version.
ENV BAZELISK_VERSION 1.9.0
RUN curl -L https://github.com/bazelbuild/bazelisk/releases/download/v$BAZELISK_VERSION/bazelisk-linux-amd64 -o /usr/local/bin/bazel && \
chmod +x /usr/local/bin/bazel
# Default build flags for various sanitizers.
ENV SANITIZER_FLAGS_address "-fsanitize=address -fsanitize-address-use-after-scope"
ENV SANITIZER_FLAGS_hwaddress "-fsanitize=hwaddress -fuse-ld=lld -Wno-unused-command-line-argument"
# Set of '-fsanitize' flags matches '-fno-sanitize-recover' + 'unsigned-integer-overflow'.
ENV SANITIZER_FLAGS_undefined "-fsanitize=array-bounds,bool,builtin,enum,function,integer-divide-by-zero,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unsigned-integer-overflow,unreachable,vla-bound,vptr -fno-sanitize-recover=array-bounds,bool,builtin,enum,function,integer-divide-by-zero,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unreachable,vla-bound,vptr"
# Don't include "function" since it is unsupported on aarch64.
ENV SANITIZER_FLAGS_undefined_aarch64 "-fsanitize=array-bounds,bool,builtin,enum,integer-divide-by-zero,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unsigned-integer-overflow,unreachable,vla-bound,vptr -fno-sanitize-recover=array-bounds,bool,builtin,enum,integer-divide-by-zero,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unreachable,vla-bound,vptr"
ENV SANITIZER_FLAGS_memory "-fsanitize=memory -fsanitize-memory-track-origins"
ENV SANITIZER_FLAGS_thread "-fsanitize=thread"
ENV SANITIZER_FLAGS_introspector "-O0 -flto -fno-inline-functions -fuse-ld=gold -Wno-unused-command-line-argument"
# Do not use any sanitizers in the coverage build.
ENV SANITIZER_FLAGS_coverage ""
# We use unsigned-integer-overflow as an additional coverage signal and have to
# suppress error messages. See https://github.com/google/oss-fuzz/issues/910.
ENV UBSAN_OPTIONS="silence_unsigned_overflow=1"
# To suppress warnings from binaries running during compilation.
ENV DFSAN_OPTIONS='warn_unimplemented=0'
# Default build flags for coverage feedback.
ENV COVERAGE_FLAGS="-fsanitize=fuzzer-no-link"
# Use '-Wno-unused-command-line-argument' to suppress "warning: -ldl: 'linker' input unused"
# messages which are treated as errors by some projects.
ENV COVERAGE_FLAGS_coverage "-fprofile-instr-generate -fcoverage-mapping -pthread -Wl,--no-as-needed -Wl,-ldl -Wl,-lm -Wno-unused-command-line-argument"
# Default sanitizer, fuzzing engine and architecture to use.
ENV SANITIZER="address"
ENV FUZZING_ENGINE="libfuzzer"
ENV ARCHITECTURE="x86_64"
# DEPRECATED - NEW CODE SHOULD NOT USE THIS. OLD CODE SHOULD STOP. Please use
# LIB_FUZZING_ENGINE instead.
# Path to fuzzing engine library to support some old users of
# LIB_FUZZING_ENGINE.
ENV LIB_FUZZING_ENGINE_DEPRECATED="/usr/lib/libFuzzingEngine.a"
# Argument passed to compiler to link against fuzzing engine.
# Defaults to the path, but is "-fsanitize=fuzzer" in libFuzzer builds.
ENV LIB_FUZZING_ENGINE="/usr/lib/libFuzzingEngine.a"
# TODO: remove after tpm2 catchup.
ENV FUZZER_LDFLAGS ""
WORKDIR $SRC
COPY afl_llvm22_patch.diff $SRC/
RUN git clone https://github.com/AFLplusplus/AFLplusplus.git aflplusplus && \
cd aflplusplus && \
git checkout eadc8a2a7e0fa0338802ee6254bf296489ce4fd7 && \
wget --no-check-certificate -O oss.sh https://raw.githubusercontent.com/vanhauser-thc/binary_blobs/master/oss.sh && \
git apply $SRC/afl_llvm22_patch.diff && \
rm -rf .git && \
chmod 755 oss.sh
# Do precompiles before copying other scripts for better cache efficiency.
COPY precompile_afl /usr/local/bin/
RUN precompile_afl
RUN cd $SRC && \
curl -L -O https://github.com/google/honggfuzz/archive/oss-fuzz.tar.gz && \
mkdir honggfuzz && \
cd honggfuzz && \
tar -xz --strip-components=1 -f $SRC/oss-fuzz.tar.gz && \
rm -rf examples $SRC/oss-fuzz.tar.gz
COPY precompile_honggfuzz_ubuntu_24_04 /usr/local/bin/
RUN precompile_honggfuzz_ubuntu_24_04
RUN cd $SRC && \
git clone https://github.com/google/fuzztest && \
cd fuzztest && \
git checkout a37d133f714395cabc20dd930969a889495c9f53 && \
rm -rf .git
ENV CENTIPEDE_BIN_DIR=$SRC/fuzztest/bazel-bin
COPY precompile_centipede /usr/local/bin/
RUN precompile_centipede
COPY sanitizers /usr/local/lib/sanitizers
COPY bazel_build_fuzz_tests \
cargo \
compile \
compile_afl \
compile_centipede \
compile_honggfuzz \
compile_fuzztests.sh \
compile_go_fuzzer \
compile_javascript_fuzzer \
compile_libfuzzer \
compile_native_go_fuzzer \
compile_native_go_fuzzer_v2 \
go_utils.sh \
compile_python_fuzzer \
debug_afl \
# Go, JavaScript, Java, Python, Ruby, Rust, and Swift installation scripts.
install_go.sh \
install_javascript.sh \
install_java.sh \
install_python.sh \
install_ruby.sh \
install_rust.sh \
install_swift_ubuntu-24-04.sh \
make_build_replayable.py \
python_coverage_helper.py \
replay_build.sh \
srcmap \
write_labels.py \
unshallow_repos.py \
/usr/local/bin/
# TODO: Build this as part of a multi-stage build.
ADD https://commondatastorage.googleapis.com/clusterfuzz-builds/jcc/clang-jcc /usr/local/bin/
ADD https://commondatastorage.googleapis.com/clusterfuzz-builds/jcc/clang++-jcc /usr/local/bin
ADD https://commondatastorage.googleapis.com/clusterfuzz-builds/jcc/clang-jcc2 /usr/local/bin/
ADD https://commondatastorage.googleapis.com/clusterfuzz-builds/jcc/clang++-jcc2 /usr/local/bin
RUN chmod +x /usr/local/bin/clang-jcc /usr/local/bin/clang++-jcc /usr/local/bin/clang-jcc2 /usr/local/bin/clang++-jcc2
COPY indexer /opt/indexer
COPY --from=gcr.io/oss-fuzz-base/indexer:ubuntu-24-04 /indexer/build/indexer /opt/indexer/indexer
RUN chmod a+x /opt/indexer/indexer /opt/indexer/index_build.py
COPY llvmsymbol.diff $SRC
COPY detect_repo.py /opt/cifuzz/
COPY bazel.bazelrc /root/.bazelrc
# Set up ccache binary and cache directory.
# /ccache/bin will contain the compiler wrappers, and /ccache/cache will
# contain the actual cache, which can be saved.
# To use this, set PATH=/ccache/bin:$PATH.
RUN mkdir -p /ccache/bin && mkdir -p /ccache/cache && \
ln -s /usr/local/bin/ccache /ccache/bin/clang && \
ln -s /usr/local/bin/ccache /ccache/bin/clang++ && \
ln -s /usr/local/bin/ccache /ccache/bin/clang-jcc && \
ln -s /usr/local/bin/ccache /ccache/bin/clang++-jcc
ENV CCACHE_DIR /ccache/cache
# Don't check that the compiler is the same, so we can switch between jcc and
# clang under the hood and re-use the same build cache.
ENV CCACHE_COMPILERCHECK none
ENV CCACHE_COMPILERTYPE clang
# Build newer patchelf than the one available from Ubuntu.
RUN cd /tmp && git clone https://github.com/NixOS/patchelf && \
apt-get update && apt-get install -y autoconf && \
cd patchelf && git checkout 523f401584d9584e76c9c77004e7abeb9e6c4551 && \
unset CFLAGS && export CXXFLAGS='-stdlib=libc++' && export LDFLAGS='-lpthread' && \
./bootstrap.sh && ./configure && make && \
cp /tmp/patchelf/src/patchelf /usr/local/bin && \
rm -rf /tmp/patchelf && apt-get remove -y autoconf
CMD ["compile"]
================================================
FILE: infra/base-images/base-builder/unshallow_repos.py
================================================
#!/usr/bin/env python3
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Script to unshallow repositories."""
import argparse
import os
import pathlib
import re
import subprocess
SRC = pathlib.Path(os.getenv('SRC', '/src'))
def main():
  """Unshallows repos under $SRC whose origin matches a given URL."""
  parser = argparse.ArgumentParser(description='Unshallows repositories.')
  parser.add_argument('repos', nargs='+', help='Repo URLs')
  args = parser.parse_args()

  repos = set()
  for repo in args.repos:
    repos.add(_normalize_repo(repo))

  for subdir in SRC.iterdir():
    if (subdir / '.git').exists():
      repo = subprocess.check_output(['git', 'remote', 'get-url', 'origin'],
                                     cwd=subdir).decode().strip()
      if _normalize_repo(repo) in repos:
        if not _is_shallow_repo(subdir):
          continue

        print(f'Unshallowing {repo} at {subdir}.')
        subprocess.check_call(['git', 'fetch', '--unshallow'], cwd=subdir)


def _normalize_repo(repo: str) -> str:
  # Strip an optional trailing '.git' suffix and an optional trailing slash.
  # The dot is escaped so that only a literal '.git' is removed.
  return re.sub(r'(\.git)?/?$', '', repo)


def _is_shallow_repo(directory: pathlib.Path):
  return subprocess.check_output(
      ['git', 'rev-parse', '--is-shallow-repository'],
      cwd=directory).decode().strip() == 'true'


if __name__ == '__main__':
  main()
================================================
FILE: infra/base-images/base-builder/write_labels.py
================================================
#!/usr/bin/env python3
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Script for writing from project.yaml to .labels file."""
import os
import json
import sys
def main():
  """Writes labels."""
  if len(sys.argv) != 3:
    print('Usage: write_labels.py labels_json out_dir', file=sys.stderr)
    sys.exit(1)

  labels_by_target = json.loads(sys.argv[1])
  out = sys.argv[2]

  for target_name, labels in labels_by_target.items():
    # Skip over wildcard value applying to all fuzz targets
    if target_name == '*':
      continue

    with open(os.path.join(out, target_name + '.labels'), 'w') as file_handle:
      file_handle.write('\n'.join(labels))


if __name__ == '__main__':
  main()
================================================
FILE: infra/base-images/base-builder-fuzzbench/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-builder-fuzzbench
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-builder-fuzzbench` were successfully built. These images install dependencies for FuzzBench, a service for evaluating fuzzers. The build process required several modifications to the `fuzzbench_install_dependencies` script to handle package version incompatibilities and differences between Ubuntu 20.04 and 24.04.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-builder-fuzzbench:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-builder-fuzzbench:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
The `ubuntu-24-04` image includes newer versions of many packages, including Python development libraries. The `fuzzbench_install_dependencies` script was updated to handle these differences.
## Dockerfile Analysis
The Dockerfiles for both versions have the following key differences:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding `oss-fuzz/base-builder` tag (`ubuntu-20-04` or `ubuntu-24-04`).
* **Dependency Installation:** The `fuzzbench_install_dependencies` script was modified to:
  * Update the `pytype` version to `2024.4.11`.
  * Update the `Orange3` package version to `3.39.0`.
  * Add version detection logic to install the correct Python development packages for each Ubuntu version.
  * Install `lsb-release` to support the version detection logic.
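
The version detection listed above can be sketched in Python. This is only an illustration: the real logic is a shell `if` on `lsb_release -rs` in `fuzzbench_install_dependencies_ubuntu_24_04`, and the helper name `select_apt_packages` is hypothetical. The package names are taken from the install scripts in this directory.

```python
"""Illustrative sketch of per-release package selection (hypothetical helper)."""

# Packages the install scripts request on every Ubuntu release.
COMMON_PACKAGES = [
    'gcc', 'gfortran', 'libopenblas-dev', 'liblapack-dev', 'libpq-dev'
]


def select_apt_packages(ubuntu_release: str) -> list:
  """Returns the apt packages for a given `lsb_release -rs` value."""
  if ubuntu_release == '20.04':
    # Ubuntu 20.04 still ships the unversioned package names.
    python_packages = ['python-dev', 'cython']
  else:
    # Newer releases only provide the Python 3 variants.
    python_packages = ['python3-dev', 'cython3']
  return COMMON_PACKAGES + python_packages


print(select_apt_packages('24.04'))
```

Keeping the shared packages in one list means only the two Python-specific names vary between releases, which mirrors the two branches of the shell script.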
================================================
FILE: infra/base-images/base-builder-fuzzbench/Dockerfile
================================================
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder
# Copy/Run this now to make the cache more resilient.
COPY fuzzbench_install_dependencies /usr/local/bin
RUN fuzzbench_install_dependencies
ENV OSS_FUZZ_ON_DEMAND=1
COPY fuzzbench_build fuzzbench_run_fuzzer fuzzbench_measure /usr/local/bin/
================================================
FILE: infra/base-images/base-builder-fuzzbench/fuzzbench_build
================================================
#! /bin/bash -eux
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# TODO(metzman): Do this in a docker image so we don't need to waste time
# reinstalling.
PYTHONPATH=$FUZZBENCH_PATH python3 -B -u -c "from fuzzers.$FUZZING_ENGINE import fuzzer; fuzzer.build()"
if [ "$FUZZING_ENGINE" = "coverage" ]; then
  cd $OUT
  mkdir -p filestore/oss-fuzz-on-demand/coverage-binaries
  # We expect an error regarding leading slashes. Just assume this step succeeds.
  # TODO(metzman): Fix this when I get a chance.
  tar -czvf filestore/oss-fuzz-on-demand/coverage-binaries/coverage-build-$PROJECT.tar.gz * /src /work || exit 0
fi
================================================
FILE: infra/base-images/base-builder-fuzzbench/fuzzbench_install_dependencies
================================================
#! /bin/bash -eux
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
apt-get update && apt-get install -y gcc gfortran python-dev libopenblas-dev liblapack-dev cython libpq-dev
wget -O /tmp/requirements.txt https://raw.githubusercontent.com/google/fuzzbench/master/requirements.txt
pip3 install pip --upgrade
CFLAGS= CXXFLAGS= pip3 install -r /tmp/requirements.txt
rm /tmp/requirements.txt
================================================
FILE: infra/base-images/base-builder-fuzzbench/fuzzbench_install_dependencies_ubuntu_20_04
================================================
#! /bin/bash -eux
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
apt-get update && apt-get install -y gcc gfortran python-dev libopenblas-dev liblapack-dev cython libpq-dev
wget -O /tmp/requirements.txt https://raw.githubusercontent.com/google/fuzzbench/master/requirements.txt
pip3 install pip --upgrade
CFLAGS= CXXFLAGS= pip3 install -r /tmp/requirements.txt
rm /tmp/requirements.txt
================================================
FILE: infra/base-images/base-builder-fuzzbench/fuzzbench_install_dependencies_ubuntu_24_04
================================================
#! /bin/bash -eux
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
if [[ $(lsb_release -rs) == "20.04" ]]; then
  apt-get update && apt-get install -y gcc gfortran python-dev libopenblas-dev liblapack-dev cython libpq-dev
else
  apt-get update && apt-get install -y gcc gfortran python3-dev libopenblas-dev liblapack-dev cython3 libpq-dev
fi
wget -O /tmp/requirements.txt https://raw.githubusercontent.com/google/fuzzbench/master/requirements.txt
pip3 install pip --upgrade
CFLAGS= CXXFLAGS= pip3 install -r /tmp/requirements.txt
rm /tmp/requirements.txt
================================================
FILE: infra/base-images/base-builder-fuzzbench/fuzzbench_measure
================================================
#! /bin/bash -eux
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# TODO(metzman): Make these configurable.
export DB_PATH=$OUT/experiment.db
export SNAPSHOT_PERIOD=30
export EXPERIMENT_FILESTORE=$OUT/filestore
export MAX_TOTAL_TIME=120
export EXPERIMENT=oss-fuzz-on-demand
rm -f $DB_PATH
# FUZZER=mopt BENCHMARK=skcms
export SQL_DATABASE_URL=sqlite:///$DB_PATH
cd $FUZZBENCH_PATH
PYTHONPATH=. python3 -B experiment/measurer/standalone.py $MAX_TOTAL_TIME
================================================
FILE: infra/base-images/base-builder-fuzzbench/fuzzbench_run_fuzzer
================================================
#! /bin/bash -eux
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
export RUNNER_NICENESS="-5"
export EXPERIMENT_FILESTORE=$OUT/filestore
export EXPERIMENT=oss-fuzz-on-demand
export OSS_FUZZ_ON_DEMAND=1
export OUTPUT_CORPUS_DIR=/output-corpus
export SEED_CORPUS_DIR=/input-corpus
mkdir $SEED_CORPUS_DIR
rm -rf $OUTPUT_CORPUS_DIR
mkdir $OUTPUT_CORPUS_DIR
export FUZZER=$FUZZING_ENGINE
export SNAPSHOT_PERIOD=$((MAX_TOTAL_TIME / 3))
export TRIAL_ID=1
export FORCE_LOCAL=1
# BENCHMARK, FUZZ_TARGET
cd $OUT
# Prevent permissions issues with pyc files and docker.
cp -r $FUZZBENCH_PATH /tmp/fuzzbench
PYTHONPATH=/tmp/fuzzbench nice -n $RUNNER_NICENESS python3 -B -u /tmp/fuzzbench/experiment/runner.py
cat $EXPERIMENT_FILESTORE/$EXPERIMENT/experiment-folders/$BENCHMARK-$FUZZER/trial-$TRIAL_ID/results/fuzzer-log.txt
================================================
FILE: infra/base-images/base-builder-fuzzbench/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-20-04
# Copy/Run this now to make the cache more resilient.
COPY fuzzbench_install_dependencies_ubuntu_20_04 /usr/local/bin
RUN fuzzbench_install_dependencies_ubuntu_20_04
ENV OSS_FUZZ_ON_DEMAND=1
COPY fuzzbench_build fuzzbench_run_fuzzer fuzzbench_measure /usr/local/bin/
================================================
FILE: infra/base-images/base-builder-fuzzbench/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-24-04
# Copy/Run this now to make the cache more resilient.
COPY fuzzbench_install_dependencies_ubuntu_24_04 /usr/local/bin
RUN fuzzbench_install_dependencies_ubuntu_24_04
ENV OSS_FUZZ_ON_DEMAND=1
COPY fuzzbench_build fuzzbench_run_fuzzer fuzzbench_measure /usr/local/bin/
================================================
FILE: infra/base-images/base-builder-go/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-builder-go
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-builder-go` were successfully built. These images install the Go programming language and related fuzzing tools on top of the `base-builder` image. The build process for both versions is nearly identical, with the primary difference being the base image used.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-builder-go:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-builder-go:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
There are no significant package differences introduced in this build stage, as the dependencies are inherited from the `base-builder` image.
## Dockerfile Analysis
The Dockerfiles for both versions are very similar and perform the following actions:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding `oss-fuzz/base-builder` tag (`ubuntu-20-04` or `ubuntu-24-04`).
* **Go Installation:** The `install_go.sh` script is used to download and install Go.
* **Go Fuzzing Tools:** The Dockerfiles install several Go-based fuzzing tools, including `go114-fuzz-build` and `go-118-fuzz-build`.
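
The naming convention behind the build status table and the `FROM` instructions can be summarized with a small Python sketch. The helper `describe_variant` is hypothetical and not part of the repository; the tag, Dockerfile, and registry strings are the ones that appear in this changelog and in the Dockerfiles.

```python
"""Illustrative sketch of the per-version naming convention (hypothetical helper)."""

BASE_REGISTRY = 'gcr.io/oss-fuzz-base'


def describe_variant(image: str, ubuntu_version: str) -> dict:
  """Returns the tag, Dockerfile name and FROM line for one image variant."""
  return {
      'tag': f'oss-fuzz/{image}:{ubuntu_version}',
      'dockerfile': f'{ubuntu_version}.Dockerfile',
      'from': f'FROM {BASE_REGISTRY}/base-builder:{ubuntu_version}',
  }


for version in ('ubuntu-20-04', 'ubuntu-24-04'):
  print(describe_variant('base-builder-go', version))
```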
================================================
FILE: infra/base-images/base-builder-go/Dockerfile
================================================
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder
# Set up Golang environment variables (copied from /root/.bash_profile).
ENV GOPATH /root/go
# /root/.go/bin is for the standard Go binaries (i.e. go, gofmt, etc).
# $GOPATH/bin is for the binaries from the dependencies installed via "go get".
ENV PATH $PATH:/root/.go/bin:$GOPATH/bin
COPY gosigfuzz.c $GOPATH/gosigfuzz/
RUN install_go.sh
# TODO(jonathanmetzman): Install this file using install_go.sh.
COPY ossfuzz_coverage_runner.go \
$GOPATH/
================================================
FILE: infra/base-images/base-builder-go/gosigfuzz.c
================================================
/*
* Copyright 2023 Google LLC
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
* http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <signal.h>
#include <stdlib.h>
/*
 * Re-registers the handler currently installed for signum with SA_ONSTACK
 * added, so that it runs on the alternate signal stack as the Go runtime
 * expects.
 */
static void fixSignalHandler(int signum) {
  struct sigaction new_action;
  struct sigaction old_action;
  sigemptyset (&new_action.sa_mask);
  sigaction (signum, NULL, &old_action);
  new_action.sa_flags = old_action.sa_flags | SA_ONSTACK;
  new_action.sa_sigaction = old_action.sa_sigaction;
  new_action.sa_handler = old_action.sa_handler;
  sigaction (signum, &new_action, NULL);
}
static void FixStackSignalHandler() {
  fixSignalHandler(SIGSEGV);
  fixSignalHandler(SIGABRT);
  fixSignalHandler(SIGALRM);
  fixSignalHandler(SIGINT);
  fixSignalHandler(SIGTERM);
  fixSignalHandler(SIGBUS);
  fixSignalHandler(SIGFPE);
  fixSignalHandler(SIGXFSZ);
  fixSignalHandler(SIGUSR1);
  fixSignalHandler(SIGUSR2);
}
int LLVMFuzzerInitialize(int *argc, char ***argv) {
  FixStackSignalHandler();
  return 0;
}
================================================
FILE: infra/base-images/base-builder-go/ossfuzz_coverage_runner.go
================================================
// Copyright 2020 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package mypackagebeingfuzzed
import (
	"io/fs"
	"io/ioutil"
	"os"
	"path/filepath"
	"runtime/pprof"
	"testing"
)
func TestFuzzCorpus(t *testing.T) {
	dir := os.Getenv("FUZZ_CORPUS_DIR")
	if dir == "" {
		t.Logf("No fuzzing corpus directory set")
		return
	}
	filename := ""
	defer func() {
		if r := recover(); r != nil {
			t.Error("Fuzz panicked in "+filename, r)
		}
	}()
	profname := os.Getenv("FUZZ_PROFILE_NAME")
	if profname != "" {
		f, err := os.Create(profname + ".cpu.prof")
		if err != nil {
			t.Logf("error creating profile file %s\n", err)
		} else {
			_ = pprof.StartCPUProfile(f)
		}
	}
	_, err := ioutil.ReadDir(dir)
	if err != nil {
		t.Logf("Not fuzzing corpus directory %s", err)
		return
	}
	// recurse for regressions subdirectory
	err = filepath.Walk(dir, func(fname string, info fs.FileInfo, err error) error {
		if err != nil {
			// info may be nil when Walk reports an error for this path.
			return err
		}
		if info.IsDir() {
			return nil
		}
		data, err := ioutil.ReadFile(fname)
		if err != nil {
			t.Error("Failed to read corpus file", err)
			return err
		}
		filename = fname
		FuzzFunction(data)
		return nil
	})
	if err != nil {
		t.Error("Failed to run corpus", err)
	}
	if profname != "" {
		pprof.StopCPUProfile()
		f, err := os.Create(profname + ".heap.prof")
		if err != nil {
			t.Logf("error creating heap profile file %s\n", err)
		}
		if err = pprof.WriteHeapProfile(f); err != nil {
			t.Logf("error writing heap profile file %s\n", err)
		}
		f.Close()
	}
}
================================================
FILE: infra/base-images/base-builder-go/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-20-04
# Set up Golang environment variables (copied from /root/.bash_profile).
ENV GOPATH /root/go
# /root/.go/bin is for the standard Go binaries (i.e. go, gofmt, etc).
# $GOPATH/bin is for the binaries from the dependencies installed via "go get".
ENV PATH $PATH:/root/.go/bin:$GOPATH/bin
COPY gosigfuzz.c $GOPATH/gosigfuzz/
RUN install_go.sh
# TODO(jonathanmetzman): Install this file using install_go.sh.
COPY ossfuzz_coverage_runner.go \
$GOPATH/
================================================
FILE: infra/base-images/base-builder-go/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-24-04
# Set up Golang environment variables (copied from /root/.bash_profile).
ENV GOPATH /root/go
# /root/.go/bin is for the standard Go binaries (i.e. go, gofmt, etc).
# $GOPATH/bin is for the binaries from the dependencies installed via "go get".
ENV PATH $PATH:/root/.go/bin:$GOPATH/bin
COPY gosigfuzz.c $GOPATH/gosigfuzz/
RUN install_go.sh
# TODO(jonathanmetzman): Install this file using install_go.sh.
COPY ossfuzz_coverage_runner.go \
$GOPATH/
================================================
FILE: infra/base-images/base-builder-javascript/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-builder-javascript
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-builder-javascript` were successfully built. These images install Node.js and other JavaScript-related tools on top of the `base-builder` image. The build process for both versions is straightforward and relies on the `install_javascript.sh` script, which is compatible with both base images.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-builder-javascript:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-builder-javascript:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
Both versions install Node.js 20.x, so the Node.js version itself does not differ. The underlying dependencies may differ due to the base image.
## Dockerfile Analysis
The Dockerfiles for both versions are very similar and perform the following actions:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding `oss-fuzz/base-builder` tag (`ubuntu-20-04` or `ubuntu-24-04`).
* **Node.js Installation:** The `install_javascript.sh` script is used to add the Node.js repository and install Node.js.
================================================
FILE: infra/base-images/base-builder-javascript/Dockerfile
================================================
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder
RUN install_javascript.sh
================================================
FILE: infra/base-images/base-builder-javascript/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-20-04
RUN install_javascript.sh
================================================
FILE: infra/base-images/base-builder-javascript/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-24-04
RUN install_javascript.sh
================================================
FILE: infra/base-images/base-builder-jvm/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-builder-jvm
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-builder-jvm` were successfully built. These images install the Java Development Kit (JDK) and the Jazzer fuzzer on top of the `base-builder` image. The `ubuntu-24-04` build required fixing syntax errors in the Dockerfile, specifically missing line continuation characters (`\`). After these corrections, both builds completed successfully.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-builder-jvm:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-builder-jvm:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
Both versions install OpenJDK 17 and 15, so the Java versions do not differ. The underlying dependencies may differ due to the base image.
## Dockerfile Analysis
The Dockerfiles for both versions perform the following actions and differ mainly in the base image:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding `oss-fuzz/base-builder` tag (`ubuntu-20-04` or `ubuntu-24-04`).
* **Java Installation:** The `install_java.sh` script is used to download and install OpenJDK 17 and 15.
* **Jazzer Installation:** Both versions clone the Jazzer repository and build it using Bazel.
* **Dockerfile Syntax:** The `ubuntu-24-04` Dockerfile had syntax errors that were corrected.
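
Missing line continuation characters of the kind described above can be caught with a simple heuristic. The Python sketch below is an assumption about how such a check might look, not the method used for this fix: the function `find_missing_continuations` is hypothetical, and it only flags lines around a dangling `&&`.

```python
"""Heuristic check for missing Dockerfile line continuations (illustrative)."""


def find_missing_continuations(dockerfile_text: str) -> list:
  """Returns 1-based numbers of lines that likely lack a trailing backslash.

  Heuristic (an assumption): a line ending in `&&` is an incomplete shell
  command, and a line starting with `&&` implies that the previous line
  should have ended with a backslash.
  """
  suspects = set()
  lines = dockerfile_text.splitlines()
  for index, line in enumerate(lines):
    stripped = line.strip()
    if stripped.endswith('&&'):
      suspects.add(index + 1)
    elif stripped.startswith('&&') and index > 0:
      previous = lines[index - 1].strip()
      if previous and not previous.endswith('\\'):
        # Report the previous line, which is the one missing the backslash.
        suspects.add(index)
  return sorted(suspects)


sample = '\n'.join([
    'RUN git clone https://example.com/repo &&',  # Missing backslash.
    '    cd repo && \\',
    '    git checkout main',
])
print(find_missing_continuations(sample))  # → [1]
```

The check is deliberately narrow: it does not parse Dockerfile syntax, so multi-line instructions that break without `&&` would go unnoticed.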
================================================
FILE: infra/base-images/base-builder-jvm/Dockerfile
================================================
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder AS base
ENV JAVA_HOME /usr/lib/jvm/java-17-openjdk-amd64
ENV JAVA_15_HOME /usr/lib/jvm/java-15-openjdk-amd64
ENV JVM_LD_LIBRARY_PATH $JAVA_HOME/lib/server
ENV PATH $PATH:$JAVA_HOME/bin
ENV JAZZER_API_PATH "/usr/local/lib/jazzer_api_deploy.jar"
ENV JAZZER_JUNIT_PATH "/usr/local/bin/jazzer_junit.jar"
RUN install_java.sh
RUN chmod 777 /usr/local/bin && chmod 777 /usr/local/lib
FROM base AS builder
RUN useradd -m jazzer_user
USER jazzer_user
WORKDIR $SRC
# Install Jazzer version 0.26.0
RUN git clone https://github.com/CodeIntelligenceTesting/jazzer && \
cd jazzer && \
git checkout 86378b7a20f08165e72a45582d6a9a5091212318
WORKDIR $SRC/jazzer
RUN echo "build --java_runtime_version=local_jdk_17" >> .bazelrc \
&& echo "build --cxxopt=-stdlib=libc++" >> .bazelrc \
&& echo "build --linkopt=-lc++" >> .bazelrc
RUN bazel build \
//src/main/java/com/code_intelligence/jazzer:jazzer_standalone_deploy.jar \
//deploy:jazzer-api \
//deploy:jazzer-junit \
//launcher:jazzer
RUN cp $(bazel cquery --output=files //src/main/java/com/code_intelligence/jazzer:jazzer_standalone_deploy.jar) /usr/local/bin/jazzer_agent_deploy.jar && \
cp $(bazel cquery --output=files //launcher:jazzer) /usr/local/bin/jazzer_driver && \
cp $(bazel cquery --output=files //deploy:jazzer-api) $JAZZER_API_PATH && \
cp $(bazel cquery --output=files //deploy:jazzer-junit) $JAZZER_JUNIT_PATH
FROM base AS final
COPY --from=builder /usr/local/bin/jazzer_agent_deploy.jar /usr/local/bin/jazzer_agent_deploy.jar
COPY --from=builder /usr/local/bin/jazzer_driver /usr/local/bin/jazzer_driver
COPY --from=builder $JAZZER_API_PATH $JAZZER_API_PATH
COPY --from=builder $JAZZER_JUNIT_PATH $JAZZER_JUNIT_PATH
RUN chmod 755 /usr/local/bin && chmod 755 /usr/local/lib
WORKDIR $SRC
================================================
FILE: infra/base-images/base-builder-jvm/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-20-04 AS base
ENV JAVA_HOME /usr/lib/jvm/java-17-openjdk-amd64
ENV JAVA_15_HOME /usr/lib/jvm/java-15-openjdk-amd64
ENV JVM_LD_LIBRARY_PATH $JAVA_HOME/lib/server
ENV PATH $PATH:$JAVA_HOME/bin
ENV JAZZER_API_PATH "/usr/local/lib/jazzer_api_deploy.jar"
ENV JAZZER_JUNIT_PATH "/usr/local/bin/jazzer_junit.jar"
RUN install_java.sh
RUN chmod 777 /usr/local/bin && chmod 777 /usr/local/lib
FROM base AS builder
RUN useradd -m jazzer_user
USER jazzer_user
WORKDIR $SRC
RUN git clone https://github.com/CodeIntelligenceTesting/jazzer && \
cd jazzer && \
git checkout 11b42852df4344737df54a380c2f522025bb4e84
WORKDIR $SRC/jazzer
RUN echo "build --java_runtime_version=local_jdk_17" >> .bazelrc \
&& echo "build --cxxopt=-stdlib=libc++" >> .bazelrc \
&& echo "build --linkopt=-lc++" >> .bazelrc
RUN bazel build \
//src/main/java/com/code_intelligence/jazzer:jazzer_standalone_deploy.jar \
//deploy:jazzer-api \
//deploy:jazzer-junit \
//launcher:jazzer
RUN cp $(bazel cquery --output=files //src/main/java/com/code_intelligence/jazzer:jazzer_standalone_deploy.jar) /usr/local/bin/jazzer_agent_deploy.jar && \
cp $(bazel cquery --output=files //launcher:jazzer) /usr/local/bin/jazzer_driver && \
cp $(bazel cquery --output=files //deploy:jazzer-api) $JAZZER_API_PATH && \
cp $(bazel cquery --output=files //deploy:jazzer-junit) $JAZZER_JUNIT_PATH
FROM base AS final
COPY --from=builder /usr/local/bin/jazzer_agent_deploy.jar /usr/local/bin/jazzer_agent_deploy.jar
COPY --from=builder /usr/local/bin/jazzer_driver /usr/local/bin/jazzer_driver
COPY --from=builder $JAZZER_API_PATH $JAZZER_API_PATH
COPY --from=builder $JAZZER_JUNIT_PATH $JAZZER_JUNIT_PATH
RUN chmod 755 /usr/local/bin && chmod 755 /usr/local/lib
WORKDIR $SRC
================================================
FILE: infra/base-images/base-builder-jvm/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-24-04 AS base
ENV JAVA_HOME /usr/lib/jvm/java-17-openjdk-amd64
ENV JAVA_15_HOME /usr/lib/jvm/java-15-openjdk-amd64
ENV JVM_LD_LIBRARY_PATH $JAVA_HOME/lib/server
ENV PATH $PATH:$JAVA_HOME/bin
ENV JAZZER_API_PATH "/usr/local/lib/jazzer_api_deploy.jar"
ENV JAZZER_JUNIT_PATH "/usr/local/bin/jazzer_junit.jar"
RUN install_java.sh
RUN chmod 777 /usr/local/bin && chmod 777 /usr/local/lib
FROM base AS builder
RUN useradd -m jazzer_user
USER jazzer_user
WORKDIR $SRC
RUN git clone https://github.com/CodeIntelligenceTesting/jazzer && \
cd jazzer && \
git checkout 11b42852df4344737df54a380c2f522025bb4e84
WORKDIR $SRC/jazzer
RUN echo "build --java_runtime_version=local_jdk_17" >> .bazelrc \
&& echo "build --cxxopt=-stdlib=libc++" >> .bazelrc \
&& echo "build --linkopt=-lc++" >> .bazelrc
RUN bazel build \
//src/main/java/com/code_intelligence/jazzer:jazzer_standalone_deploy.jar \
//deploy:jazzer-api \
//deploy:jazzer-junit \
//launcher:jazzer
RUN cp $(bazel cquery --output=files //src/main/java/com/code_intelligence/jazzer:jazzer_standalone_deploy.jar) /usr/local/bin/jazzer_agent_deploy.jar && \
cp $(bazel cquery --output=files //launcher:jazzer) /usr/local/bin/jazzer_driver && \
cp $(bazel cquery --output=files //deploy:jazzer-api) $JAZZER_API_PATH && \
cp $(bazel cquery --output=files //deploy:jazzer-junit) $JAZZER_JUNIT_PATH
FROM base AS final
COPY --from=builder /usr/local/bin/jazzer_agent_deploy.jar /usr/local/bin/jazzer_agent_deploy.jar
COPY --from=builder /usr/local/bin/jazzer_driver /usr/local/bin/jazzer_driver
COPY --from=builder $JAZZER_API_PATH $JAZZER_API_PATH
COPY --from=builder $JAZZER_JUNIT_PATH $JAZZER_JUNIT_PATH
RUN chmod 755 /usr/local/bin && chmod 755 /usr/local/lib
WORKDIR $SRC
================================================
FILE: infra/base-images/base-builder-python/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-builder-python
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-builder-python` were successfully built. These images install Python-specific fuzzing tools, including Atheris and Coverage, on top of the `base-builder` image. The Dockerfile structure was refactored to support multi-version builds by creating separate Dockerfiles for each Ubuntu version and updating the `FROM` instruction accordingly.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-builder-python:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-builder-python:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
Both versions install Atheris, PyInstaller, and other Python packages; they differ in the Python version and in the underlying system libraries.
## Dockerfile Analysis
Key points about the Dockerfiles for the two versions:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding `oss-fuzz/base-builder` tag (`ubuntu-20-04` or `ubuntu-24-04`).
* **Python Fuzzing Tools:** The `install_python.sh` script is used to install Atheris and other Python fuzzing tools.
* **Refactoring:** The original `Dockerfile` was renamed to `ubuntu-20-04.Dockerfile`, and a new `ubuntu-24-04.Dockerfile` was created to support the multi-version build strategy.
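As a rough illustration of the multi-version layout, the sketch below prints a `docker build` command for each Ubuntu version. The `build_cmd` helper, the assumption that it runs from the image's own directory, and the choice to print rather than execute the command are all illustrative; the project's build tooling may invoke Docker differently.

```shell
#!/usr/bin/env bash
# Sketch: print the docker build command for one Ubuntu version of an image.
# Assumptions: run from the image directory; tags follow the Build Status table.
set -eu

build_cmd() {
  # $1 = image name, $2 = Ubuntu version tag (also the Dockerfile prefix)
  local image="$1" version="$2"
  printf '%s\n' "docker build -f ${version}.Dockerfile -t ${image}:${version} ."
}

for version in ubuntu-20-04 ubuntu-24-04; do
  build_cmd "oss-fuzz/base-builder-python" "$version"
done
```

Printing the command first makes it easy to review the Dockerfile and tag pairing before running a build.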
================================================
FILE: infra/base-images/base-builder-python/Dockerfile
================================================
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder
RUN install_python.sh
================================================
FILE: infra/base-images/base-builder-python/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-20-04
RUN install_python.sh
================================================
FILE: infra/base-images/base-builder-python/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-24-04
RUN install_python.sh
================================================
FILE: infra/base-images/base-builder-ruby/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-builder-ruby
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-builder-ruby` were successfully built. These images install Ruby and the Ruzzy fuzzer on top of the `base-builder` image. The Dockerfile structure was refactored to support multi-version builds by creating separate Dockerfiles for each Ubuntu version and updating the `FROM` instruction accordingly.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-builder-ruby:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-builder-ruby:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
Both versions install Ruby 3.3.1; the primary difference is the underlying system libraries and dependencies provided by each Ubuntu release.
## Dockerfile Analysis
Key points about the Dockerfiles for the two versions:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding `oss-fuzz/base-builder` tag (`ubuntu-20-04` or `ubuntu-24-04`).
* **Ruby Installation:** The `install_ruby.sh` script is used to download and install Ruby.
* **Ruzzy Installation:** Both versions clone the Ruzzy repository and install it using `gem`.
* **Refactoring:** The original `Dockerfile` was renamed to `ubuntu-20-04.Dockerfile`, and a new `ubuntu-24-04.Dockerfile` was created to support the multi-version build strategy.
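Both Dockerfiles also copy the `ruzzy-build` helper into `/usr/bin/`. That helper names the generated wrapper script after the fuzz target with its last three characters removed. The sketch below reproduces only that naming step. The `harness_name` function and the sample path are hypothetical, and the sketch uses the POSIX form `${name%???}` in place of the script's bash-only `${name::-3}`, which has the same effect for names of at least three characters.

```shell
#!/usr/bin/env bash
# Sketch: derive the wrapper name the way ruzzy-build does.
# harness_name is a hypothetical helper, not part of the image.
set -eu

harness_name() {
  local fuzz_target
  # Strip the directory part of the path.
  fuzz_target="$(basename "$1")"
  # Drop the last three characters (expected to be the ".rb" extension).
  printf '%s\n' "${fuzz_target%???}"
}

# Hypothetical fuzz target path used only for illustration.
harness_name /src/project/fuzz_parse.rb
```

Because the suffix is removed by length rather than by pattern, a target that does not end in `.rb` would produce an unexpected wrapper name.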
================================================
FILE: infra/base-images/base-builder-ruby/Dockerfile
================================================
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder
RUN git clone https://github.com/trailofbits/ruzzy.git $SRC/ruzzy
RUN /usr/local/bin/install_ruby.sh
RUN /usr/local/bin/gem update --system 3.5.11
# Install ruzzy
WORKDIR $SRC/ruzzy
# The MAKE variable allows overwriting the make command at runtime. This forces the
# Ruby C extension to respect ENV variables when compiling, like CC, CFLAGS, etc.
ENV MAKE="make --environment-overrides V=1"
RUN CC="clang" \
CXX="clang++" \
LDSHARED="clang -shared" \
LDSHAREDXX="clang++ -shared" \
gem build
RUN MAKE="make --environment-overrides V=1" \
CC="clang" \
CXX="clang++" \
LDSHARED="clang -shared" \
LDSHAREDXX="clang++ -shared" \
CXXFLAGS="-fPIC" \
CFLAGS="-fPIC" \
RUZZY_DEBUG=1 gem install --install-dir /install/ruzzy --development --verbose ruzzy-*.gem
ENV LDSHARED="$CC -shared"
ENV LDSHAREDXX="$CXX -shared"
ENV GEM_HOME="$OUT/fuzz-gem"
ENV GEM_PATH="/install/ruzzy"
COPY ruzzy-build /usr/bin/ruzzy-build
================================================
FILE: infra/base-images/base-builder-ruby/ruzzy-build
================================================
#!/usr/bin/env bash
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
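# Name of the Ruby fuzz target file, without its directory.
fuzz_target=$(basename "$1")
echo "BASENAME: $fuzz_target ---"
# Drop the last three characters (expected to be ".rb") to name the wrapper.
harness_sh=${fuzz_target::-3}
cp "$1" "$OUT/$fuzz_target"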
echo """#!/usr/bin/env bash
# LLVMFuzzerTestOneInput for fuzzer detection.
this_dir=\$(dirname \"\$0\")
export GEM_HOME=\$this_dir/fuzz-gem
ruzzy \$this_dir/$fuzz_target \$@
""" > $OUT/$harness_sh
chmod +x $OUT/$harness_sh
================================================
FILE: infra/base-images/base-builder-ruby/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-20-04
RUN git clone https://github.com/trailofbits/ruzzy.git $SRC/ruzzy
RUN /usr/local/bin/install_ruby.sh
RUN /usr/local/bin/gem update --system 3.5.11
# Install ruzzy
WORKDIR $SRC/ruzzy
# The MAKE variable allows overwriting the make command at runtime. This forces the
# Ruby C extension to respect ENV variables when compiling, like CC, CFLAGS, etc.
ENV MAKE="make --environment-overrides V=1"
RUN CC="clang" \
CXX="clang++" \
LDSHARED="clang -shared" \
LDSHAREDXX="clang++ -shared" \
gem build
RUN MAKE="make --environment-overrides V=1" \
CC="clang" \
CXX="clang++" \
LDSHARED="clang -shared" \
LDSHAREDXX="clang++ -shared" \
CXXFLAGS="-fPIC" \
CFLAGS="-fPIC" \
RUZZY_DEBUG=1 gem install --install-dir /install/ruzzy --development --verbose ruzzy-*.gem
ENV LDSHARED="$CC -shared"
ENV LDSHAREDXX="$CXX -shared"
ENV GEM_HOME="$OUT/fuzz-gem"
ENV GEM_PATH="/install/ruzzy"
COPY ruzzy-build /usr/bin/ruzzy-build
================================================
FILE: infra/base-images/base-builder-ruby/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-24-04
RUN git clone https://github.com/trailofbits/ruzzy.git $SRC/ruzzy
RUN /usr/local/bin/install_ruby.sh
RUN /usr/local/bin/gem update --system 3.5.11
# Install ruzzy
WORKDIR $SRC/ruzzy
# The MAKE variable allows overwriting the make command at runtime. This forces the
# Ruby C extension to respect ENV variables when compiling, like CC, CFLAGS, etc.
ENV MAKE="make --environment-overrides V=1"
RUN CC="clang" \
CXX="clang++" \
LDSHARED="clang -shared" \
LDSHAREDXX="clang++ -shared" \
gem build
RUN MAKE="make --environment-overrides V=1" \
CC="clang" \
CXX="clang++" \
LDSHARED="clang -shared" \
LDSHAREDXX="clang++ -shared" \
CXXFLAGS="-fPIC" \
CFLAGS="-fPIC" \
RUZZY_DEBUG=1 gem install --install-dir /install/ruzzy --development --verbose ruzzy-*.gem
ENV LDSHARED="$CC -shared"
ENV LDSHAREDXX="$CXX -shared"
ENV GEM_HOME="$OUT/fuzz-gem"
ENV GEM_PATH="/install/ruzzy"
COPY ruzzy-build /usr/bin/ruzzy-build
================================================
FILE: infra/base-images/base-builder-rust/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-builder-rust
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-builder-rust` were successfully built. These images are used for building Rust-based fuzzers and contain the necessary toolchain.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-builder-rust:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-builder-rust:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
Both Dockerfiles pin the same Rust toolchain, so the differences between the two images come from the system packages and libraries inherited from the corresponding `base-builder` image.
## Dockerfile Analysis
Key points about the Dockerfiles for the two versions:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding `oss-fuzz/base-builder` tag (`ubuntu-20-04` or `ubuntu-24-04`).
* **Dependency Installation:** The `install_rust.sh` script installs the Rust toolchain named by `RUSTUP_TOOLCHAIN`, which both Dockerfiles set to `nightly-2024-07-12`.
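One way to confirm which toolchain each Dockerfile pins is to read the `ENV RUSTUP_TOOLCHAIN` line directly. The following is a minimal sketch; the `pinned_toolchain` helper is hypothetical and assumes the variable is set on a single `ENV` line, as in the Dockerfiles in this directory.

```shell
#!/usr/bin/env bash
# Sketch: report the Rust toolchain pinned by each Dockerfile given as an argument.
# pinned_toolchain is a hypothetical helper, not part of the image.
set -eu

pinned_toolchain() {
  # Print the value of the last `ENV RUSTUP_TOOLCHAIN ...` line in file "$1".
  # Accepts both the `ENV KEY value` and `ENV KEY=value` forms.
  sed -n 's/^ENV RUSTUP_TOOLCHAIN[ =]\(.*\)$/\1/p' "$1" | tail -n 1
}

for dockerfile in "$@"; do
  printf '%s: %s\n' "$dockerfile" "$(pinned_toolchain "$dockerfile")"
done
```

Saved under a name of your choosing, the script could be run with `ubuntu-20-04.Dockerfile ubuntu-24-04.Dockerfile` as arguments to compare the two pins side by side.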
================================================
FILE: infra/base-images/base-builder-rust/Dockerfile
================================================
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder
ENV CARGO_HOME=/rust
ENV RUSTUP_HOME=/rust/rustup
ENV PATH=$PATH:/rust/bin
# Set up custom environment variable for source code copy for coverage reports
ENV OSSFUZZ_RUSTPATH /rust
# Force rustup to ignore `rust-toolchain` and `rust-toolchain.toml` files by
# manually specifying what toolchain to use. Note that this environment variable
# is additionally used by `install_rust.sh` as the toolchain to install.
# cf https://rust-lang.github.io/rustup/overrides.html
ENV RUSTUP_TOOLCHAIN nightly-2025-09-05
# Configure the linker used by default for x86_64 linux to be `clang` instead of
# rustc's default of `cc` which is able to find custom-built libraries like
# `libc++` by default more easily.
ENV CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER clang
RUN install_rust.sh
================================================
FILE: infra/base-images/base-builder-rust/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-20-04
ENV CARGO_HOME=/rust
ENV RUSTUP_HOME=/rust/rustup
ENV PATH=$PATH:/rust/bin
# Set up custom environment variable for source code copy for coverage reports
ENV OSSFUZZ_RUSTPATH /rust
# Force rustup to ignore `rust-toolchain` and `rust-toolchain.toml` files by
# manually specifying what toolchain to use. Note that this environment variable
# is additionally used by `install_rust.sh` as the toolchain to install.
# cf https://rust-lang.github.io/rustup/overrides.html
ENV RUSTUP_TOOLCHAIN nightly-2024-07-12
# Configure the linker used by default for x86_64 linux to be `clang` instead of
# rustc's default of `cc` which is able to find custom-built libraries like
# `libc++` by default more easily.
ENV CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER clang
RUN install_rust.sh
================================================
FILE: infra/base-images/base-builder-rust/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-24-04
ENV CARGO_HOME=/rust
ENV RUSTUP_HOME=/rust/rustup
ENV PATH=$PATH:/rust/bin
# Set up custom environment variable for source code copy for coverage reports
ENV OSSFUZZ_RUSTPATH /rust
# Force rustup to ignore `rust-toolchain` and `rust-toolchain.toml` files by
# manually specifying what toolchain to use. Note that this environment variable
# is additionally used by `install_rust.sh` as the toolchain to install.
# cf https://rust-lang.github.io/rustup/overrides.html
ENV RUSTUP_TOOLCHAIN nightly-2024-07-12
# Configure the linker used by default for x86_64 linux to be `clang` instead of
# rustc's default of `cc` which is able to find custom-built libraries like
# `libc++` by default more easily.
ENV CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER clang
RUN install_rust.sh
================================================
FILE: infra/base-images/base-builder-swift/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-builder-swift
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-builder-swift` were successfully built. These images install Swift and related tools on top of the `base-builder` image. The build process for both versions is straightforward: each Dockerfile runs a version-specific install script (`install_swift_ubuntu-20-04.sh` or `install_swift_ubuntu-24-04.sh`), and the `ubuntu-24-04` Dockerfile additionally copies `llvmsymbol.diff` into `/src/` before running it.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-builder-swift:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-builder-swift:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
There are no significant package differences introduced in this build stage, as the dependencies are inherited from the `base-builder` image.
## Dockerfile Analysis
The Dockerfiles for both versions are very similar and perform the following actions:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding `oss-fuzz/base-builder` tag (`ubuntu-20-04` or `ubuntu-24-04`).
* **Swift Installation:** A version-specific script (`install_swift_ubuntu-20-04.sh` or `install_swift_ubuntu-24-04.sh`) is used to download and install Swift.
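Both Dockerfiles also copy `precompile_swift` into `/usr/local/bin/`. That script assembles `SWIFTFLAGS` from `SANITIZER`, `CFLAGS`, and `CXXFLAGS`. The sketch below mirrors that flag assembly as a standalone function so the result can be inspected outside the image. The `swift_flags` helper and the sample flag values are assumptions for illustration, and the sketch omits the script's `llvm-symbolizer` copy step.

```shell
#!/usr/bin/env bash
# Sketch of the SWIFTFLAGS assembly performed by precompile_swift.
# swift_flags is a hypothetical helper, not part of the image.
set -eu

swift_flags() {
  # $1 = sanitizer name, $2 = C flags, $3 = C++ flags
  local sanitizer="$1" cflags="$2" cxxflags="$3" f
  local flags="-Xswiftc -parse-as-library -Xswiftc -static-stdlib --static-swift-stdlib"
  if [ "$sanitizer" = "coverage" ]; then
    # Coverage builds add profiling flags and skip the C/C++ flag forwarding.
    flags="$flags -Xswiftc -profile-generate -Xswiftc -profile-coverage-mapping -Xswiftc -sanitize=fuzzer"
  else
    flags="$flags -Xswiftc -sanitize=fuzzer,$sanitizer --sanitize=$sanitizer"
    # Forward each C and C++ flag to the Swift build (word splitting is intended).
    for f in $cflags; do flags="$flags -Xcc=$f"; done
    for f in $cxxflags; do flags="$flags -Xcxx=$f"; done
  fi
  printf '%s\n' "$flags"
}

# Sample values chosen for illustration only.
swift_flags address "-O1 -g" "-stdlib=libc++"
```

Note that in the coverage branch the C and C++ flags are not forwarded, which matches the structure of the original script.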
================================================
FILE: infra/base-images/base-builder-swift/Dockerfile
================================================
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder
RUN install_swift.sh
COPY precompile_swift /usr/local/bin/
================================================
FILE: infra/base-images/base-builder-swift/llvmsymbol.diff
================================================
diff --git a/llvm/lib/DebugInfo/Symbolize/CMakeLists.txt b/llvm/lib/DebugInfo/Symbolize/CMakeLists.txt
index acfb3bd0e..a499ee2e0 100644
--- a/llvm/lib/DebugInfo/Symbolize/CMakeLists.txt
+++ b/llvm/lib/DebugInfo/Symbolize/CMakeLists.txt
@@ -12,4 +12,8 @@ add_llvm_component_library(LLVMSymbolize
Object
Support
Demangle
- )
+
+ LINK_LIBS
+ /usr/lib/swift_static/linux/libswiftCore.a
+ /usr/lib/x86_64-linux-gnu/libstdc++.so.6
+)
diff --git a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
index fb4875f79..0030769ee 100644
--- a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
+++ b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
@@ -36,6 +36,13 @@
#include
#include
+
+extern "C" char *swift_demangle(const char *mangledName,
+ size_t mangledNameLength,
+ char *outputBuffer,
+ size_t *outputBufferSize,
+ uint32_t flags);
+
namespace llvm {
namespace symbolize {
@@ -678,6 +685,14 @@ LLVMSymbolizer::DemangleName(const std::string &Name,
free(DemangledName);
return Result;
}
+ if (!Name.empty() && Name.front() == '$') {
+ char *DemangledName = swift_demangle(Name.c_str(), Name.length(), 0, 0, 0);
+ if (DemangledName) {
+ std::string Result = DemangledName;
+ free(DemangledName);
+ return Result;
+ }
+ }
if (DbiModuleDescriptor && DbiModuleDescriptor->isWin32Module())
return std::string(demanglePE32ExternCFunc(Name));
================================================
FILE: infra/base-images/base-builder-swift/precompile_swift
================================================
#!/bin/bash -eu
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
cp /usr/local/bin/llvm-symbolizer-swift $OUT/llvm-symbolizer
export SWIFTFLAGS="-Xswiftc -parse-as-library -Xswiftc -static-stdlib --static-swift-stdlib"
if [ "$SANITIZER" = "coverage" ]
then
export SWIFTFLAGS="$SWIFTFLAGS -Xswiftc -profile-generate -Xswiftc -profile-coverage-mapping -Xswiftc -sanitize=fuzzer"
else
export SWIFTFLAGS="$SWIFTFLAGS -Xswiftc -sanitize=fuzzer,$SANITIZER --sanitize=$SANITIZER"
for f in $CFLAGS; do
export SWIFTFLAGS="$SWIFTFLAGS -Xcc=$f"
done
for f in $CXXFLAGS; do
export SWIFTFLAGS="$SWIFTFLAGS -Xcxx=$f"
done
fi
================================================
FILE: infra/base-images/base-builder-swift/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-20-04
RUN install_swift_ubuntu-20-04.sh
COPY precompile_swift /usr/local/bin/
================================================
FILE: infra/base-images/base-builder-swift/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-24-04
COPY llvmsymbol.diff /src/
RUN install_swift_ubuntu-24-04.sh
COPY precompile_swift /usr/local/bin/
================================================
FILE: infra/base-images/base-clang/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-clang
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-clang` were successfully built. Both images install Clang and its dependencies on top of the corresponding `base-image`. The build process for both versions is complex, involving the checkout and compilation of a specific LLVM revision. The primary differences between the two versions are the base image used and the script for checking out and building LLVM.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-clang:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-clang:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
The package differences are numerous due to the different base Ubuntu versions. The `ubuntu-24-04` image uses newer versions of essential build tools and libraries, such as `g++`, `python3`, and `zlib1g-dev`.
## Dockerfile Analysis
The Dockerfiles for both versions have the following key differences:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding `oss-fuzz/base-image` tag (`ubuntu-20-04` or `ubuntu-24-04`).
* **LLVM Build Script:** The `ubuntu-24-04` Dockerfile uses a new script, `checkout_build_install_llvm_24.04.sh`, to handle the LLVM build process, while the `ubuntu-20-04` Dockerfile uses `checkout_build_install_llvm.sh`. This is necessary to accommodate changes in the build environment and dependencies between the two Ubuntu versions.
================================================
FILE: infra/base-images/base-clang/Dockerfile
================================================
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Docker image with clang installed.
FROM gcr.io/oss-fuzz-base/base-image
ARG arch=x86_64
ENV FUZZINTRO_OUTDIR=$SRC
# Install newer cmake.
# Many projects, as well as recent clang versions, need a newer cmake.
ENV CMAKE_VERSION 3.29.2
RUN apt-get update && apt-get install -y wget sudo && \
wget -q https://github.com/Kitware/CMake/releases/download/v$CMAKE_VERSION/cmake-$CMAKE_VERSION-Linux-$arch.sh && \
chmod +x cmake-$CMAKE_VERSION-Linux-$arch.sh && \
./cmake-$CMAKE_VERSION-Linux-$arch.sh --skip-license --prefix="/usr/local" && \
rm cmake-$CMAKE_VERSION-Linux-$arch.sh && \
SUDO_FORCE_REMOVE=yes apt-get autoremove --purge -y wget sudo && \
rm -rf /usr/local/doc/cmake /usr/local/bin/cmake-gui
COPY checkout_build_install_llvm.sh /root/
# Keep all steps in the same script to decrease the number of intermediate
# layers in the Docker image.
ARG FULL_LLVM_BUILD
RUN FULL_LLVM_BUILD=$FULL_LLVM_BUILD /root/checkout_build_install_llvm.sh
RUN rm /root/checkout_build_install_llvm.sh
# Setup the environment.
ENV CC "clang"
ENV CXX "clang++"
ENV CCC "clang++"
# FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION is described at
# https://llvm.org/docs/LibFuzzer.html#fuzzer-friendly-build-mode
# The implicit-function-declaration and implicit-int errors are downgraded to a
# warning, to allow compiling legacy code.
# See https://releases.llvm.org/16.0.0/tools/clang/docs/ReleaseNotes.html#potentially-breaking-changes
# Same for deprecated-declarations, int-conversion,
# incompatible-function-pointer-types, enum-constexpr-conversion,
# vla-cxx-extension
ENV CFLAGS -O1 \
-fno-omit-frame-pointer \
-gline-tables-only \
-Wno-error=incompatible-function-pointer-types \
-Wno-error=int-conversion \
-Wno-error=deprecated-declarations \
-Wno-error=implicit-function-declaration \
-Wno-error=implicit-int \
-Wno-error=unknown-warning-option \
-Wno-error=vla-cxx-extension \
-DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
ENV CXXFLAGS_EXTRA "-stdlib=libc++"
ENV CXXFLAGS "$CFLAGS $CXXFLAGS_EXTRA"
================================================
FILE: infra/base-images/base-clang/README.md
================================================
# base-builder-clang
## Regular build
```
docker build -t gcr.io/oss-fuzz-base/base-clang .
```
## Full build
For a build that includes all binaries and libraries, with everything built against libcxx, run:
```
docker build -t gcr.io/oss-fuzz-base/base-clang-full --build-arg FULL_LLVM_BUILD=1 .
```
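
The Dockerfile forwards the `FULL_LLVM_BUILD` build argument to the build script as an environment variable, and the script treats any non-empty value as a request for a full build. The sketch below mirrors that check; the function name `build_mode` is hypothetical and not part of the script.

```shell
# Hypothetical helper mirroring the script's test on FULL_LLVM_BUILD.
# The body runs in a subshell so that `set -u` does not leak to the caller.
# "${FULL_LLVM_BUILD-}" expands to an empty string when the variable is
# unset, which keeps the test safe under `set -u` (the script runs with -eux).
build_mode() (
  set -u
  if [ -n "${FULL_LLVM_BUILD-}" ]; then
    echo "full"
  else
    echo "regular"
  fi
)

( unset FULL_LLVM_BUILD; build_mode )   # prints: regular
( FULL_LLVM_BUILD=1; build_mode )       # prints: full
```

Because the check is only for non-emptiness, `--build-arg FULL_LLVM_BUILD=0` would also select the full build.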
================================================
FILE: infra/base-images/base-clang/checkout_build_install_llvm.sh
================================================
#!/bin/bash -eux
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
NPROC=$(nproc)
# Set this to get a full build with all binaries and libraries, as well as
# everything built with libcxx.
if [ -z "${FULL_LLVM_BUILD-}" ]; then
FULL_LLVM_BUILD=
fi
TARGET_TO_BUILD=
case $(uname -m) in
x86_64)
TARGET_TO_BUILD=X86
ARCHITECTURE_DEPS="g++-multilib"
# Use chromium's clang revision.
export CC=$WORK/llvm-stage1/bin/clang
export CXX=$WORK/llvm-stage1/bin/clang++
;;
aarch64)
TARGET_TO_BUILD=AArch64
# g++ multilib is not needed on AArch64 because we don't care about i386.
# We need to install clang and lld using apt because the binary downloaded
# from Chrome's developer tools doesn't support AArch64.
# TODO(metzman): Make x86_64 use the distro's clang for consistency once
# we support AArch64 fully.
ARCHITECTURE_DEPS="clang lld g++"
export CC=clang
export CXX=clang++
;;
*)
echo "Error: unsupported target $(uname -m)"
exit 1
;;
esac
INTROSPECTOR_DEP_PACKAGES="texinfo bison flex"
# zlib1g-dev is needed for llvm-profdata to handle coverage data from rust compiler
LLVM_DEP_PACKAGES="build-essential make ninja-build git python3 python3-distutils binutils-dev zlib1g-dev $ARCHITECTURE_DEPS $INTROSPECTOR_DEP_PACKAGES"
apt-get update && apt-get install -y $LLVM_DEP_PACKAGES --no-install-recommends
# For manual bumping.
# On each bump a full trial run for everything (fuzzing engines, sanitizers,
# languages, projects, ...) is needed.
# Check that CMAKE_VERSION in infra/base-images/base-clang/Dockerfile was released
# recently enough to fully support this clang version.
OUR_LLVM_REVISION=cb2f0d0a5f14
mkdir $SRC/chromium_tools
cd $SRC/chromium_tools
git clone https://chromium.googlesource.com/chromium/src/tools/clang
cd clang
# Pin clang script due to https://github.com/google/oss-fuzz/issues/7617
OUR_CLANG_REVISION=063d3766486a820c708e888d737b004d11543410
git checkout $OUR_CLANG_REVISION
LLVM_SRC=$SRC/llvm-project
# Checkout
CHECKOUT_RETRIES=10
function clone_with_retries {
REPOSITORY=$1
LOCAL_PATH=$2
CHECKOUT_RETURN_CODE=1
# Disable exit on error since we might encounter some failures while retrying.
set +e
for i in $(seq 1 $CHECKOUT_RETRIES); do
rm -rf $LOCAL_PATH
git clone $REPOSITORY $LOCAL_PATH
CHECKOUT_RETURN_CODE=$?
if [ $CHECKOUT_RETURN_CODE -eq 0 ]; then
break
fi
done
# Re-enable exit on error. If checkout failed, script will exit.
set -e
return $CHECKOUT_RETURN_CODE
}
clone_with_retries https://github.com/llvm/llvm-project.git $LLVM_SRC
git -C $LLVM_SRC checkout $OUR_LLVM_REVISION
echo "Using LLVM revision: $OUR_LLVM_REVISION"
# Prepare fuzz introspector.
echo "Installing fuzz introspector"
FUZZ_INTROSPECTOR_CHECKOUT=341ebbd72bc9116733bcfcfab5adfd7f9b633e07
git clone https://github.com/ossf/fuzz-introspector.git /fuzz-introspector
cd /fuzz-introspector
git checkout $FUZZ_INTROSPECTOR_CHECKOUT
git submodule init
git submodule update
echo "Applying introspector changes"
OLD_WORKING_DIR=$PWD
cd $LLVM_SRC
cp -rf /fuzz-introspector/frontends/llvm/include/llvm/Transforms/FuzzIntrospector/ ./llvm/include/llvm/Transforms/FuzzIntrospector
cp -rf /fuzz-introspector/frontends/llvm/lib/Transforms/FuzzIntrospector ./llvm/lib/Transforms/FuzzIntrospector
# LLVM currently does not support dynamically loading LTO passes. Thus, we
# hardcode it into Clang instead. Ref: https://reviews.llvm.org/D77704
/fuzz-introspector/frontends/llvm/patch-llvm.sh
cd $OLD_WORKING_DIR
mkdir -p $WORK/llvm-stage2 $WORK/llvm-stage1
python3 $SRC/chromium_tools/clang/scripts/update.py --output-dir $WORK/llvm-stage1
cd $WORK/llvm-stage2
if [[ -n "$FULL_LLVM_BUILD" ]]; then
# Bootstrap libc++ so we can build llvm with it.
cmake -G "Ninja" \
-DLIBCXX_ENABLE_SHARED=OFF \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
-DLLVM_TARGETS_TO_BUILD="$TARGET_TO_BUILD" \
-DLLVM_BINUTILS_INCDIR="/usr/include/" \
-DLIBCXXABI_USE_LLVM_UNWINDER=OFF \
$LLVM_SRC/llvm
ninja runtimes -j $NPROC
ninja install-runtimes
# Make libc++ discoverable by the linker.
export LIBRARY_PATH=/usr/local/lib/x86_64-unknown-linux-gnu/
fi
# Note: LLVM_ENABLE_LIBCXX=ON doesn't break the build even if libcxx doesn't
# exist.
cmake -G "Ninja" \
-DLIBCXX_ENABLE_SHARED=OFF \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF \
-DLLVM_ENABLE_LIBCXX=ON \
-DLLVM_ENABLE_WARNINGS=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_RUNTIMES="compiler-rt;libcxx;libcxxabi" \
-DLLVM_TARGETS_TO_BUILD="$TARGET_TO_BUILD" \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_BINUTILS_INCDIR="/usr/include/" \
-DLIBCXXABI_USE_LLVM_UNWINDER=OFF \
$LLVM_SRC/llvm
ninja -j $NPROC
ninja install
rm -rf $WORK/llvm-stage1 $WORK/llvm-stage2
# libFuzzer sources.
cp -r $LLVM_SRC/compiler-rt/lib/fuzzer $SRC/libfuzzer
# Use the clang we just built from now on.
export CC=clang
export CXX=clang++
function free_disk_space {
rm -rf $LLVM_SRC $SRC/chromium_tools
apt-get autoremove --purge -y $LLVM_DEP_PACKAGES
if [[ -n "$FULL_LLVM_BUILD" ]]; then
return 0
fi
# Delete unneeded parts of LLVM to reduce image size.
# See https://github.com/google/oss-fuzz/issues/5170
LLVM_TOOLS_TMPDIR=/tmp/llvm-tools
mkdir $LLVM_TOOLS_TMPDIR
# Move binaries with llvm- prefix that we want into LLVM_TOOLS_TMPDIR.
mv \
/usr/local/bin/llvm-ar \
/usr/local/bin/llvm-as \
/usr/local/bin/llvm-config \
/usr/local/bin/llvm-cov \
/usr/local/bin/llvm-link \
/usr/local/bin/llvm-objcopy \
/usr/local/bin/llvm-nm \
/usr/local/bin/llvm-profdata \
/usr/local/bin/llvm-ranlib \
/usr/local/bin/llvm-symbolizer \
/usr/local/bin/llvm-undname \
/usr/local/bin/llvm-readelf \
/usr/local/bin/llvm-readobj \
$LLVM_TOOLS_TMPDIR
# Delete remaining llvm- binaries.
rm -rf /usr/local/bin/llvm-*
# Restore the llvm- binaries we want to keep.
mv $LLVM_TOOLS_TMPDIR/* /usr/local/bin/
rm -rf $LLVM_TOOLS_TMPDIR
# Remove binaries from LLVM build that we don't need.
rm -f \
/usr/local/bin/bugpoint \
/usr/local/bin/llc \
/usr/local/bin/lli \
/usr/local/bin/clang-check \
/usr/local/bin/clang-refactor \
/usr/local/bin/clang-offload-wrapper \
/usr/local/bin/clang-offload-bundler \
/usr/local/bin/clang-repl \
/usr/local/bin/clang-check \
/usr/local/bin/clang-refactor \
/usr/local/bin/c-index-test \
/usr/local/bin/clang-rename \
/usr/local/bin/clang-scan-deps \
/usr/local/bin/clang-extdef-mapping \
/usr/local/bin/diagtool \
/usr/local/bin/sanstats \
/usr/local/bin/dsymutil \
/usr/local/bin/verify-uselistorder \
/usr/local/bin/clang-format
# Remove unneeded clang libs, CMake files from LLVM build, lld libs, and the
# libraries.
# Note: we need fuzzer_no_main libraries for atheris. Don't delete.
rm -rf \
/usr/local/lib/libclang* \
/usr/local/lib/liblld* \
/usr/local/lib/cmake/
}
if [ "$TARGET_TO_BUILD" == "AArch64" ]
then
free_disk_space
# Exit now on AArch64. We don't need to rebuild libc++ because on AArch64 we
# do not support MSAN nor do we care about i386.
exit 0
fi
function cmake_libcxx {
extra_args="$@"
cmake -G "Ninja" \
-DLIBCXX_ENABLE_SHARED=OFF \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF \
-DLIBCXXABI_USE_LLVM_UNWINDER=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PIC=ON \
-DLLVM_TARGETS_TO_BUILD="$TARGET_TO_BUILD" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
-DLLVM_BINUTILS_INCDIR="/usr/include/" \
$extra_args \
-S $LLVM_SRC/runtimes
}
# 32-bit libraries.
mkdir -p $WORK/i386
cd $WORK/i386
cmake_libcxx \
-DCMAKE_INSTALL_PREFIX=/usr/i386/ \
-DCMAKE_C_FLAGS="-m32" \
-DCMAKE_CXX_FLAGS="-m32"
ninja -j $NPROC cxx
ninja install-cxx
rm -rf $WORK/i386
# MemorySanitizer instrumented libraries.
mkdir -p $WORK/msan
cd $WORK/msan
# https://github.com/google/oss-fuzz/issues/1099
cat <<EOF > $WORK/msan/ignorelist.txt
fun:__gxx_personality_*
EOF
cmake_libcxx \
-DLLVM_USE_SANITIZER=Memory \
-DCMAKE_INSTALL_PREFIX=/usr/msan/ \
-DCMAKE_CXX_FLAGS="-fsanitize-ignorelist=$WORK/msan/ignorelist.txt"
ninja -j $NPROC cxx
ninja install-cxx
rm -rf $WORK/msan
free_disk_space
================================================
FILE: infra/base-images/base-clang/checkout_build_install_llvm_ubuntu_20_04.sh
================================================
#!/bin/bash -eux
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
NPROC=$(nproc)
# Set this to get a full build with all binaries and libraries, as well as
# everything built with libcxx.
if [ -z "${FULL_LLVM_BUILD-}" ]; then
FULL_LLVM_BUILD=
fi
TARGET_TO_BUILD=
case $(uname -m) in
x86_64)
TARGET_TO_BUILD=X86
ARCHITECTURE_DEPS="g++-multilib"
# Use chromium's clang revision.
export CC=$WORK/llvm-stage1/bin/clang
export CXX=$WORK/llvm-stage1/bin/clang++
;;
aarch64)
TARGET_TO_BUILD=AArch64
# g++ multilib is not needed on AArch64 because we don't care about i386.
# We need to install clang and lld using apt because the binary downloaded
# from Chrome's developer tools doesn't support AArch64.
# TODO(metzman): Make x86_64 use the distro's clang for consistency once
# we support AArch64 fully.
ARCHITECTURE_DEPS="clang lld g++"
export CC=clang
export CXX=clang++
;;
*)
echo "Error: unsupported target $(uname -m)"
exit 1
;;
esac
INTROSPECTOR_DEP_PACKAGES="texinfo bison flex"
# zlib1g-dev is needed for llvm-profdata to handle coverage data from rust compiler
LLVM_DEP_PACKAGES="build-essential make ninja-build git python3 python3-distutils binutils-dev zlib1g-dev $ARCHITECTURE_DEPS $INTROSPECTOR_DEP_PACKAGES"
apt-get update && apt-get install -y $LLVM_DEP_PACKAGES --no-install-recommends
# For manual bumping.
# On each bump a full trial run for everything (fuzzing engines, sanitizers,
# languages, projects, ...) is needed.
# Check that CMAKE_VERSION in infra/base-images/base-clang/Dockerfile was released
# recently enough to fully support this clang version.
OUR_LLVM_REVISION=cb2f0d0a5f14
mkdir $SRC/chromium_tools
cd $SRC/chromium_tools
git clone https://chromium.googlesource.com/chromium/src/tools/clang
cd clang
# Pin clang script due to https://github.com/google/oss-fuzz/issues/7617
OUR_CLANG_REVISION=063d3766486a820c708e888d737b004d11543410
git checkout $OUR_CLANG_REVISION
LLVM_SRC=$SRC/llvm-project
# Checkout
CHECKOUT_RETRIES=10
function clone_with_retries {
REPOSITORY=$1
LOCAL_PATH=$2
CHECKOUT_RETURN_CODE=1
# Disable exit on error since we might encounter some failures while retrying.
set +e
for i in $(seq 1 $CHECKOUT_RETRIES); do
rm -rf $LOCAL_PATH
git clone $REPOSITORY $LOCAL_PATH
CHECKOUT_RETURN_CODE=$?
if [ $CHECKOUT_RETURN_CODE -eq 0 ]; then
break
fi
done
# Re-enable exit on error. If checkout failed, script will exit.
set -e
return $CHECKOUT_RETURN_CODE
}
clone_with_retries https://github.com/llvm/llvm-project.git $LLVM_SRC
git -C $LLVM_SRC checkout $OUR_LLVM_REVISION
echo "Using LLVM revision: $OUR_LLVM_REVISION"
# Prepare fuzz introspector.
echo "Installing fuzz introspector"
FUZZ_INTROSPECTOR_CHECKOUT=341ebbd72bc9116733bcfcfab5adfd7f9b633e07
git clone https://github.com/ossf/fuzz-introspector.git /fuzz-introspector
cd /fuzz-introspector
git checkout $FUZZ_INTROSPECTOR_CHECKOUT
git submodule init
git submodule update
echo "Applying introspector changes"
OLD_WORKING_DIR=$PWD
cd $LLVM_SRC
cp -rf /fuzz-introspector/frontends/llvm/include/llvm/Transforms/FuzzIntrospector/ ./llvm/include/llvm/Transforms/FuzzIntrospector
cp -rf /fuzz-introspector/frontends/llvm/lib/Transforms/FuzzIntrospector ./llvm/lib/Transforms/FuzzIntrospector
# LLVM currently does not support dynamically loading LTO passes. Thus, we
# hardcode it into Clang instead. Ref: https://reviews.llvm.org/D77704
/fuzz-introspector/frontends/llvm/patch-llvm.sh
cd $OLD_WORKING_DIR
mkdir -p $WORK/llvm-stage2 $WORK/llvm-stage1
python3 $SRC/chromium_tools/clang/scripts/update.py --output-dir $WORK/llvm-stage1
cd $WORK/llvm-stage2
if [[ -n "$FULL_LLVM_BUILD" ]]; then
# Bootstrap libc++ so we can build llvm with it.
cmake -G "Ninja" \
-DLIBCXX_ENABLE_SHARED=OFF \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
-DLLVM_TARGETS_TO_BUILD="$TARGET_TO_BUILD" \
-DLLVM_BINUTILS_INCDIR="/usr/include/" \
-DLIBCXXABI_USE_LLVM_UNWINDER=OFF \
$LLVM_SRC/llvm
ninja runtimes -j $NPROC
ninja install-runtimes
# Make libc++ discoverable by the linker.
export LIBRARY_PATH=/usr/local/lib/x86_64-unknown-linux-gnu/
fi
# Note: LLVM_ENABLE_LIBCXX=ON doesn't break the build even if libcxx doesn't
# exist.
cmake -G "Ninja" \
-DLIBCXX_ENABLE_SHARED=OFF \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF \
-DLLVM_ENABLE_LIBCXX=ON \
-DLLVM_ENABLE_WARNINGS=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_RUNTIMES="compiler-rt;libcxx;libcxxabi" \
-DLLVM_TARGETS_TO_BUILD="$TARGET_TO_BUILD" \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_BINUTILS_INCDIR="/usr/include/" \
-DLIBCXXABI_USE_LLVM_UNWINDER=OFF \
$LLVM_SRC/llvm
ninja -j $NPROC
ninja install
rm -rf $WORK/llvm-stage1 $WORK/llvm-stage2
# libFuzzer sources.
cp -r $LLVM_SRC/compiler-rt/lib/fuzzer $SRC/libfuzzer
# Use the clang we just built from now on.
export CC=clang
export CXX=clang++
function free_disk_space {
rm -rf $LLVM_SRC $SRC/chromium_tools
apt-get autoremove --purge -y $LLVM_DEP_PACKAGES
if [[ -n "$FULL_LLVM_BUILD" ]]; then
return 0
fi
# Delete unneeded parts of LLVM to reduce image size.
# See https://github.com/google/oss-fuzz/issues/5170
LLVM_TOOLS_TMPDIR=/tmp/llvm-tools
mkdir $LLVM_TOOLS_TMPDIR
# Move binaries with llvm- prefix that we want into LLVM_TOOLS_TMPDIR.
mv \
/usr/local/bin/llvm-ar \
/usr/local/bin/llvm-as \
/usr/local/bin/llvm-config \
/usr/local/bin/llvm-cov \
/usr/local/bin/llvm-link \
/usr/local/bin/llvm-objcopy \
/usr/local/bin/llvm-nm \
/usr/local/bin/llvm-profdata \
/usr/local/bin/llvm-ranlib \
/usr/local/bin/llvm-symbolizer \
/usr/local/bin/llvm-undname \
/usr/local/bin/llvm-readelf \
/usr/local/bin/llvm-readobj \
$LLVM_TOOLS_TMPDIR
# Delete remaining llvm- binaries.
rm -rf /usr/local/bin/llvm-*
# Restore the llvm- binaries we want to keep.
mv $LLVM_TOOLS_TMPDIR/* /usr/local/bin/
rm -rf $LLVM_TOOLS_TMPDIR
# Remove binaries from LLVM build that we don't need.
rm -f \
/usr/local/bin/bugpoint \
/usr/local/bin/llc \
/usr/local/bin/lli \
/usr/local/bin/clang-check \
/usr/local/bin/clang-refactor \
/usr/local/bin/clang-offload-wrapper \
/usr/local/bin/clang-offload-bundler \
/usr/local/bin/clang-repl \
/usr/local/bin/clang-check \
/usr/local/bin/clang-refactor \
/usr/local/bin/c-index-test \
/usr/local/bin/clang-rename \
/usr/local/bin/clang-scan-deps \
/usr/local/bin/clang-extdef-mapping \
/usr/local/bin/diagtool \
/usr/local/bin/sanstats \
/usr/local/bin/dsymutil \
/usr/local/bin/verify-uselistorder \
/usr/local/bin/clang-format
# Remove unneeded clang libs, CMake files from LLVM build, lld libs, and the
# libraries.
# Note: we need fuzzer_no_main libraries for atheris. Don't delete.
rm -rf \
/usr/local/lib/libclang* \
/usr/local/lib/liblld* \
/usr/local/lib/cmake/
}
if [ "$TARGET_TO_BUILD" == "AArch64" ]
then
free_disk_space
# Exit now on AArch64. We don't need to rebuild libc++ because on AArch64 we
# do not support MSAN nor do we care about i386.
exit 0
fi
function cmake_libcxx {
extra_args="$@"
cmake -G "Ninja" \
-DLIBCXX_ENABLE_SHARED=OFF \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF \
-DLIBCXXABI_USE_LLVM_UNWINDER=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PIC=ON \
-DLLVM_TARGETS_TO_BUILD="$TARGET_TO_BUILD" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
-DLLVM_BINUTILS_INCDIR="/usr/include/" \
$extra_args \
-S $LLVM_SRC/runtimes
}
# 32-bit libraries.
mkdir -p $WORK/i386
cd $WORK/i386
cmake_libcxx \
-DCMAKE_INSTALL_PREFIX=/usr/i386/ \
-DCMAKE_C_FLAGS="-m32" \
-DCMAKE_CXX_FLAGS="-m32"
ninja -j $NPROC cxx
ninja install-cxx
rm -rf $WORK/i386
# MemorySanitizer instrumented libraries.
mkdir -p $WORK/msan
cd $WORK/msan
# https://github.com/google/oss-fuzz/issues/1099
cat <<EOF > $WORK/msan/ignorelist.txt
fun:__gxx_personality_*
EOF
cmake_libcxx \
-DLLVM_USE_SANITIZER=Memory \
-DCMAKE_INSTALL_PREFIX=/usr/msan/ \
-DCMAKE_CXX_FLAGS="-fsanitize-ignorelist=$WORK/msan/ignorelist.txt"
ninja -j $NPROC cxx
ninja install-cxx
rm -rf $WORK/msan
free_disk_space
================================================
FILE: infra/base-images/base-clang/checkout_build_install_llvm_ubuntu_24_04.sh
================================================
#!/bin/bash -eux
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
NPROC=$(nproc)
# Set this to get a full build with all binaries and libraries, as well as
# everything built with libcxx.
if [ -z "${FULL_LLVM_BUILD-}" ]; then
FULL_LLVM_BUILD=
fi
TARGET_TO_BUILD=
case $(uname -m) in
x86_64)
TARGET_TO_BUILD=X86
ARCHITECTURE_DEPS="g++-multilib"
# Use chromium's clang revision.
export CC=$WORK/llvm-stage1/bin/clang
export CXX=$WORK/llvm-stage1/bin/clang++
;;
aarch64)
TARGET_TO_BUILD=AArch64
# g++ multilib is not needed on AArch64 because we don't care about i386.
# We need to install clang and lld using apt because the binary downloaded
# from Chrome's developer tools doesn't support AArch64.
# TODO(metzman): Make x86_64 use the distro's clang for consistency once
# we support AArch64 fully.
ARCHITECTURE_DEPS="clang lld g++"
export CC=clang
export CXX=clang++
;;
*)
echo "Error: unsupported target $(uname -m)"
exit 1
;;
esac
INTROSPECTOR_DEP_PACKAGES="texinfo bison flex"
# zlib1g-dev is needed for llvm-profdata to handle coverage data from rust compiler
LLVM_DEP_PACKAGES="build-essential make ninja-build git python3 python3-setuptools binutils-dev zlib1g-dev $ARCHITECTURE_DEPS $INTROSPECTOR_DEP_PACKAGES"
apt-get update && apt-get install -y $LLVM_DEP_PACKAGES --no-install-recommends
# For manual bumping.
# On each bump a full trial run for everything (fuzzing engines, sanitizers,
# languages, projects, ...) is needed.
# Check that CMAKE_VERSION in infra/base-images/base-clang/Dockerfile was released
# recently enough to fully support this clang version.
OUR_LLVM_REVISION=cb2f0d0a5f14
mkdir $SRC/chromium_tools
cd $SRC/chromium_tools
git clone https://chromium.googlesource.com/chromium/src/tools/clang
cd clang
# Pin clang script due to https://github.com/google/oss-fuzz/issues/7617
git checkout 063d3766486a820c708e888d737b004d11543410
LLVM_SRC=$SRC/llvm-project
# Checkout
CHECKOUT_RETRIES=10
function clone_with_retries {
REPOSITORY=$1
LOCAL_PATH=$2
CHECKOUT_RETURN_CODE=1
# Disable exit on error since we might encounter some failures while retrying.
set +e
for i in $(seq 1 $CHECKOUT_RETRIES); do
rm -rf $LOCAL_PATH
git clone $REPOSITORY $LOCAL_PATH
CHECKOUT_RETURN_CODE=$?
if [ $CHECKOUT_RETURN_CODE -eq 0 ]; then
break
fi
done
# Re-enable exit on error. If checkout failed, script will exit.
set -e
return $CHECKOUT_RETURN_CODE
}
clone_with_retries https://github.com/llvm/llvm-project.git $LLVM_SRC
git -C $LLVM_SRC checkout $OUR_LLVM_REVISION
echo "Using LLVM revision: $OUR_LLVM_REVISION"
# Prepare fuzz introspector.
echo "Installing fuzz introspector"
FUZZ_INTROSPECTOR_CHECKOUT=341ebbd72bc9116733bcfcfab5adfd7f9b633e07
git clone https://github.com/ossf/fuzz-introspector.git /fuzz-introspector
cd /fuzz-introspector
git checkout $FUZZ_INTROSPECTOR_CHECKOUT
git submodule init
git submodule update
# For fuzz introspector.
echo "Applying introspector changes"
OLD_WORKING_DIR=$PWD
cd $LLVM_SRC
cp -rf /fuzz-introspector/frontends/llvm/include/llvm/Transforms/FuzzIntrospector/ ./llvm/include/llvm/Transforms/FuzzIntrospector
cp -rf /fuzz-introspector/frontends/llvm/lib/Transforms/FuzzIntrospector ./llvm/lib/Transforms/FuzzIntrospector
# LLVM currently does not support dynamically loading LTO passes. Thus, we
# hardcode it into Clang instead. Ref: https://reviews.llvm.org/D77704
/fuzz-introspector/frontends/llvm/patch-llvm.sh
cd $OLD_WORKING_DIR
mkdir -p $WORK/llvm-stage2 $WORK/llvm-stage1
python3 $SRC/chromium_tools/clang/scripts/update.py --output-dir $WORK/llvm-stage1
cd $WORK/llvm-stage2
if [[ -n "$FULL_LLVM_BUILD" ]]; then
# Bootstrap libc++ so we can build llvm with it.
cmake -G "Ninja" \
-DLIBCXX_ENABLE_SHARED=OFF \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
-DLLVM_TARGETS_TO_BUILD="$TARGET_TO_BUILD" \
-DLLVM_BINUTILS_INCDIR="/usr/include/" \
-DLIBCXXABI_USE_LLVM_UNWINDER=OFF \
$LLVM_SRC/llvm
ninja runtimes -j $NPROC
ninja install-runtimes
# Make libc++ discoverable by the linker.
export LIBRARY_PATH=/usr/local/lib/x86_64-unknown-linux-gnu/
fi
# Note: LLVM_ENABLE_LIBCXX=ON doesn't break the build even if libcxx doesn't
# exist.
cmake -G "Ninja" \
-DLIBCXX_ENABLE_SHARED=OFF \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF \
-DLLVM_ENABLE_LIBCXX=ON \
-DLLVM_ENABLE_WARNINGS=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_RUNTIMES="compiler-rt;libcxx;libcxxabi" \
-DLLVM_TARGETS_TO_BUILD="$TARGET_TO_BUILD" \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_BINUTILS_INCDIR="/usr/include/" \
-DLIBCXXABI_USE_LLVM_UNWINDER=OFF \
$LLVM_SRC/llvm
ninja -j $NPROC
ninja install
rm -rf $WORK/llvm-stage1 $WORK/llvm-stage2
# libFuzzer sources.
cp -r $LLVM_SRC/compiler-rt/lib/fuzzer $SRC/libfuzzer
# Use the clang we just built from now on.
export CC=clang
export CXX=clang++
function free_disk_space {
rm -rf $LLVM_SRC $SRC/chromium_tools
apt-get autoremove --purge -y $LLVM_DEP_PACKAGES
if [[ -n "$FULL_LLVM_BUILD" ]]; then
return 0
fi
# Delete unneeded parts of LLVM to reduce image size.
# See https://github.com/google/oss-fuzz/issues/5170
LLVM_TOOLS_TMPDIR=/tmp/llvm-tools
mkdir $LLVM_TOOLS_TMPDIR
# Move binaries with llvm- prefix that we want into LLVM_TOOLS_TMPDIR.
mv \
/usr/local/bin/llvm-ar \
/usr/local/bin/llvm-as \
/usr/local/bin/llvm-config \
/usr/local/bin/llvm-cov \
/usr/local/bin/llvm-objcopy \
/usr/local/bin/llvm-nm \
/usr/local/bin/llvm-profdata \
/usr/local/bin/llvm-ranlib \
/usr/local/bin/llvm-symbolizer \
/usr/local/bin/llvm-undname \
/usr/local/bin/llvm-readelf \
/usr/local/bin/llvm-readobj \
$LLVM_TOOLS_TMPDIR
# Delete remaining llvm- binaries.
rm -rf /usr/local/bin/llvm-*
# Restore the llvm- binaries we want to keep.
mv $LLVM_TOOLS_TMPDIR/* /usr/local/bin/
rm -rf $LLVM_TOOLS_TMPDIR
# Remove binaries from LLVM build that we don't need.
rm -f \
/usr/local/bin/bugpoint \
/usr/local/bin/llc \
/usr/local/bin/lli \
/usr/local/bin/clang-check \
/usr/local/bin/clang-refactor \
/usr/local/bin/clang-offload-wrapper \
/usr/local/bin/clang-offload-bundler \
/usr/local/bin/clang-repl \
/usr/local/bin/clang-check \
/usr/local/bin/clang-refactor \
/usr/local/bin/c-index-test \
/usr/local/bin/clang-rename \
/usr/local/bin/clang-scan-deps \
/usr/local/bin/clang-extdef-mapping \
/usr/local/bin/diagtool \
/usr/local/bin/sanstats \
/usr/local/bin/dsymutil \
/usr/local/bin/verify-uselistorder \
/usr/local/bin/clang-format
# Remove unneeded clang libs, CMake files from LLVM build, lld libs, and the
# libraries.
# Note: we need fuzzer_no_main libraries for atheris. Don't delete.
rm -rf \
/usr/local/lib/libclang* \
/usr/local/lib/liblld* \
/usr/local/lib/cmake/
}
if [ "$TARGET_TO_BUILD" == "AArch64" ]
then
free_disk_space
# Exit now on AArch64. We don't need to rebuild libc++ because on AArch64 we
# do not support MSAN nor do we care about i386.
exit 0
fi
function cmake_libcxx {
extra_args="$@"
cmake -G "Ninja" \
-DLIBCXX_ENABLE_SHARED=OFF \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF \
-DLIBCXXABI_USE_LLVM_UNWINDER=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PIC=ON \
-DLLVM_TARGETS_TO_BUILD="$TARGET_TO_BUILD" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
-DLLVM_BINUTILS_INCDIR="/usr/include/" \
$extra_args \
-S $LLVM_SRC/runtimes
}
# 32-bit libraries.
mkdir -p $WORK/i386
cd $WORK/i386
cmake_libcxx \
-DCMAKE_INSTALL_PREFIX=/usr/i386/ \
-DCMAKE_C_FLAGS="-m32" \
-DCMAKE_CXX_FLAGS="-m32"
ninja -j $NPROC cxx
ninja install-cxx
rm -rf $WORK/i386
# MemorySanitizer instrumented libraries.
mkdir -p $WORK/msan
cd $WORK/msan
# https://github.com/google/oss-fuzz/issues/1099
cat <<EOF > $WORK/msan/ignorelist.txt
fun:__gxx_personality_*
EOF
cmake_libcxx \
-DLLVM_USE_SANITIZER=Memory \
-DCMAKE_INSTALL_PREFIX=/usr/msan/ \
-DCMAKE_CXX_FLAGS="-fsanitize-ignorelist=$WORK/msan/ignorelist.txt"
ninja -j $NPROC cxx
ninja install-cxx
rm -rf $WORK/msan
free_disk_space
================================================
FILE: infra/base-images/base-clang/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Docker image with clang installed.
FROM gcr.io/oss-fuzz-base/base-image
ARG arch=x86_64
ENV FUZZINTRO_OUTDIR=$SRC
# Install newer cmake.
# Many projects, as well as recent clang versions, need a newer cmake.
ENV CMAKE_VERSION 3.29.2
RUN apt-get update && apt-get install -y wget sudo && \
wget -q https://github.com/Kitware/CMake/releases/download/v$CMAKE_VERSION/cmake-$CMAKE_VERSION-Linux-$arch.sh && \
chmod +x cmake-$CMAKE_VERSION-Linux-$arch.sh && \
./cmake-$CMAKE_VERSION-Linux-$arch.sh --skip-license --prefix="/usr/local" && \
rm cmake-$CMAKE_VERSION-Linux-$arch.sh && \
SUDO_FORCE_REMOVE=yes apt-get autoremove --purge -y wget sudo && \
rm -rf /usr/local/doc/cmake /usr/local/bin/cmake-gui
COPY checkout_build_install_llvm_ubuntu_20_04.sh /root/
# Keep all steps in the same script to decrease the number of intermediate
# layers in the Docker image.
ARG FULL_LLVM_BUILD
RUN FULL_LLVM_BUILD=$FULL_LLVM_BUILD /root/checkout_build_install_llvm_ubuntu_20_04.sh
RUN rm /root/checkout_build_install_llvm_ubuntu_20_04.sh
# Setup the environment.
ENV CC "clang"
ENV CXX "clang++"
ENV CCC "clang++"
# FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION is described at
# https://llvm.org/docs/LibFuzzer.html#fuzzer-friendly-build-mode
# The implicit-function-declaration and implicit-int errors are downgraded to a
# warning, to allow compiling legacy code.
# See https://releases.llvm.org/16.0.0/tools/clang/docs/ReleaseNotes.html#potentially-breaking-changes
# Same for deprecated-declarations, int-conversion,
# incompatible-function-pointer-types, enum-constexpr-conversion,
# vla-cxx-extension
ENV CFLAGS -O1 \
-fno-omit-frame-pointer \
-gline-tables-only \
-Wno-error=incompatible-function-pointer-types \
-Wno-error=int-conversion \
-Wno-error=deprecated-declarations \
-Wno-error=implicit-function-declaration \
-Wno-error=implicit-int \
-Wno-error=unknown-warning-option \
-Wno-error=vla-cxx-extension \
-DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
ENV CXXFLAGS_EXTRA "-stdlib=libc++"
ENV CXXFLAGS "$CFLAGS $CXXFLAGS_EXTRA"
================================================
FILE: infra/base-images/base-clang/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Docker image with clang installed.
FROM gcr.io/oss-fuzz-base/base-image:ubuntu-24-04
ARG arch=x86_64
ENV FUZZINTRO_OUTDIR=$SRC
# Install newer cmake.
# Many projects, as well as recent clang versions, need a newer cmake.
ENV CMAKE_VERSION 3.29.2
RUN apt-get update && apt-get install -y wget sudo && \
wget -q https://github.com/Kitware/CMake/releases/download/v$CMAKE_VERSION/cmake-$CMAKE_VERSION-Linux-$arch.sh && \
chmod +x cmake-$CMAKE_VERSION-Linux-$arch.sh && \
./cmake-$CMAKE_VERSION-Linux-$arch.sh --skip-license --prefix="/usr/local" && \
rm cmake-$CMAKE_VERSION-Linux-$arch.sh && \
SUDO_FORCE_REMOVE=yes apt-get autoremove --purge -y wget sudo && \
rm -rf /usr/local/doc/cmake /usr/local/bin/cmake-gui
COPY checkout_build_install_llvm_ubuntu_24_04.sh /root/
RUN chmod +x /root/checkout_build_install_llvm_ubuntu_24_04.sh
# Keep all steps in the same script to decrease the number of intermediate
# layers in the Dockerfile.
ARG FULL_LLVM_BUILD
RUN FULL_LLVM_BUILD=$FULL_LLVM_BUILD /root/checkout_build_install_llvm_ubuntu_24_04.sh
RUN rm /root/checkout_build_install_llvm_ubuntu_24_04.sh
# Setup the environment.
ENV CC "clang"
ENV CXX "clang++"
ENV CCC "clang++"
# FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION is described at
# https://llvm.org/docs/LibFuzzer.html#fuzzer-friendly-build-mode
# The implicit-function-declaration and implicit-int errors are downgraded to a
# warning, to allow compiling legacy code.
# See https://releases.llvm.org/16.0.0/tools/clang/docs/ReleaseNotes.html#potentially-breaking-changes
# Same for deprecated-declarations, int-conversion,
# incompatible-function-pointer-types, enum-constexpr-conversion,
# vla-cxx-extension
ENV CFLAGS -O1 \
-fno-omit-frame-pointer \
-gline-tables-only \
-Wno-error=incompatible-function-pointer-types \
-Wno-error=int-conversion \
-Wno-error=deprecated-declarations \
-Wno-error=implicit-function-declaration \
-Wno-error=implicit-int \
-Wno-error=unknown-warning-option \
-Wno-error=vla-cxx-extension \
-DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
ENV CXXFLAGS_EXTRA "-stdlib=libc++"
ENV CXXFLAGS "$CFLAGS $CXXFLAGS_EXTRA"
================================================
FILE: infra/base-images/base-image/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-image
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-image` were successfully built. Both images are based on their respective Ubuntu versions and include essential packages for the fuzzing environment. The primary difference between the two is the version of `libgcc-dev` used, which is `libgcc-9-dev` for Ubuntu 20.04 and `libgcc-13-dev` for Ubuntu 24.04.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-image:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-image:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
| Package | Ubuntu 20.04 Version | Ubuntu 24.04 Version | Notes |
| --- | --- | --- | --- |
| `libgcc-9-dev` | Installed | - | Specific to Ubuntu 20.04 |
| `libgcc-13-dev` | - | Installed | Specific to Ubuntu 24.04 |
## Dockerfile Analysis
The Dockerfiles for both versions are very similar, with the main differences being:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding Ubuntu version (`ubuntu:20.04` or `ubuntu:24.04`).
* **Package Installation:** The `apt-get install` command is updated to install the correct version of `libgcc-dev` for each Ubuntu release.
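
The package difference described above can be sketched as a small shell helper. This is only an illustration: the function name `libgcc_dev_package` is hypothetical and is not part of the OSS-Fuzz build scripts. The two version-to-package pairs come from the table in this changelog.

```shell
# Hypothetical helper (not part of OSS-Fuzz): map an Ubuntu release to the
# libgcc development package listed in the "Key Differences" table.
libgcc_dev_package() {
  case "$1" in
    20.04) echo "libgcc-9-dev" ;;
    24.04) echo "libgcc-13-dev" ;;
    *) echo "unsupported Ubuntu version: $1" >&2; return 1 ;;
  esac
}

# Prints the package list that the ubuntu-24-04 Dockerfile installs.
echo "apt-get install -y libc6-dev binutils $(libgcc_dev_package 24.04) tzdata"
```

Keeping the mapping in one place makes it easier to see that the two Dockerfiles differ only in the parent image and this one package name.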
================================================
FILE: infra/base-images/base-image/Dockerfile
================================================
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Base image for all other images.
ARG parent_image=ubuntu:20.04@sha256:4a45212e9518f35983a976eead0de5eecc555a2f047134e9dd2cfc589076a00d
FROM $parent_image
ENV DEBIAN_FRONTEND noninteractive
# Install tzdata to match ClusterFuzz
# (https://github.com/google/oss-fuzz/issues/9280).
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y libc6-dev binutils libgcc-9-dev tzdata && \
apt-get autoremove -y
ENV OUT=/out
ENV SRC=/src
ENV WORK=/work
ENV PATH="$PATH:/out"
ENV HWASAN_OPTIONS=random_tags=0
RUN mkdir -p $OUT $SRC $WORK && chmod a+rwx $OUT $SRC $WORK
================================================
FILE: infra/base-images/base-image/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Base image for all other images.
ARG parent_image=ubuntu:20.04@sha256:4a45212e9518f35983a976eead0de5eecc555a2f047134e9dd2cfc589076a00d
FROM $parent_image
ENV DEBIAN_FRONTEND noninteractive
# Install tzdata to match ClusterFuzz
# (https://github.com/google/oss-fuzz/issues/9280).
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y libc6-dev binutils libgcc-9-dev tzdata && \
apt-get autoremove -y
ENV OUT=/out
ENV SRC=/src
ENV WORK=/work
ENV PATH="$PATH:/out"
ENV HWASAN_OPTIONS=random_tags=0
RUN mkdir -p $OUT $SRC $WORK && chmod a+rwx $OUT $SRC $WORK
================================================
FILE: infra/base-images/base-image/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Base image for all other images.
ARG parent_image=ubuntu:24.04@sha256:9cbed754112939e914291337b5e554b07ad7c392491dba6daf25eef1332a22e8
FROM $parent_image
ENV DEBIAN_FRONTEND noninteractive
# Install tzdata to match ClusterFuzz
# (https://github.com/google/oss-fuzz/issues/9280).
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y libc6-dev binutils libgcc-13-dev tzdata && \
apt-get autoremove -y
ENV OUT=/out
ENV SRC=/src
ENV WORK=/work
ENV PATH="$PATH:/out"
ENV HWASAN_OPTIONS=random_tags=0
RUN mkdir -p $OUT $SRC $WORK && chmod a+rwx $OUT $SRC $WORK
================================================
FILE: infra/base-images/base-runner/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-runner
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-runner` were successfully built. These images are used to run fuzzers and contain the necessary runtime dependencies. The initial build failed due to incorrect paths in the `COPY` instructions, which was resolved by making the paths relative to the build context.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-runner:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-runner:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
The `ubuntu-24-04` image includes newer versions of many packages, including Python, Java, and Node.js. The specific versions of other tools and libraries also differ due to the updated base image.
## Dockerfile Analysis
The Dockerfiles for both versions have the following key differences:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding `oss-fuzz/base-image` tag (`ubuntu-20-04` or `ubuntu-24-04`).
* **Dependency Installation:** The `install_deps.sh` script is used to install a base set of dependencies, which differ between the two versions.
* **COPY Instructions:** The `COPY` instructions were corrected to use paths relative to the build context.
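
The `COPY` path problem mentioned above can be caught before a build with a quick check. The following is a minimal sketch under stated assumptions: `check_copy_sources` is a hypothetical helper, not an OSS-Fuzz tool, and it only understands single-line `COPY` instructions whose sources are plain paths without spaces or wildcards.

```shell
# Hypothetical helper (not part of OSS-Fuzz): report COPY sources that do not
# exist under the build context directory.
check_copy_sources() {
  local dockerfile=$1 context=$2
  local keyword rest src
  # Keep single-line COPY instructions only: skip "COPY --from=..." (which
  # copies from another build stage) and lines continued with a trailing
  # backslash.
  grep -E '^COPY[[:space:]]+[^-]' "$dockerfile" | grep -v '\\$' |
    while read -r keyword rest; do
      # The last word is the destination; everything before it is a source.
      for src in ${rest% *}; do
        [ -e "$context/$src" ] || echo "missing from build context: $src"
      done
    done
}
```

Running `check_copy_sources ubuntu-24-04.Dockerfile .` from the image directory would print one line per source that cannot be found relative to the context, and nothing when all single-line `COPY` sources resolve. Multi-line `COPY` instructions are skipped by this sketch and would need separate handling.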
================================================
FILE: infra/base-images/base-runner/Dockerfile
================================================
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Build rust stuff in its own image. We only need the resulting binaries.
# Keeping the rust toolchain in the image wastes 1 GB.
FROM gcr.io/oss-fuzz-base/base-image AS temp-runner-binary-builder
RUN apt-get update && apt-get install -y cargo libyaml-dev
RUN cargo install rustfilt
# Using multi-stage build to copy some LLVM binaries needed in the runner image.
FROM gcr.io/oss-fuzz-base/base-clang AS base-clang
FROM gcr.io/oss-fuzz-base/base-builder-ruby AS base-ruby
# The base builder image compiles a specific Python version. Using a multi-stage build
# to copy that same Python interpreter into the runner image saves build time and keeps
# the Python versions in sync.
FROM gcr.io/oss-fuzz-base/base-builder AS base-builder
# Real image that will be used later.
FROM gcr.io/oss-fuzz-base/base-image
COPY --from=temp-runner-binary-builder /root/.cargo/bin/rustfilt /usr/local/bin
# Copy the binaries needed for code coverage and crash symbolization.
COPY --from=base-clang /usr/local/bin/llvm-cov \
/usr/local/bin/llvm-profdata \
/usr/local/bin/llvm-symbolizer \
/usr/local/bin/
# Copy the pre-compiled Python binaries and libraries
COPY --from=base-builder /usr/local/bin/python3.11 /usr/local/bin/python3.11
COPY --from=base-builder /usr/local/lib/libpython3.11.so.1.0 /usr/local/lib/libpython3.11.so.1.0
COPY --from=base-builder /usr/local/include/python3.11 /usr/local/include/python3.11
COPY --from=base-builder /usr/local/lib/python3.11 /usr/local/lib/python3.11
COPY --from=base-builder /usr/local/bin/pip3 /usr/local/bin/pip3
# Create symbolic links to ensure compatibility
RUN ldconfig && \
ln -s /usr/local/bin/python3.11 /usr/local/bin/python3 && \
ln -s /usr/local/bin/python3.11 /usr/local/bin/python
COPY install_deps.sh /
RUN /install_deps.sh && rm /install_deps.sh
ENV CODE_COVERAGE_SRC=/opt/code_coverage
# Pin coverage to the same as in the base builder:
# https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-builder/install_python.sh#L22
RUN git clone https://chromium.googlesource.com/chromium/src/tools/code_coverage $CODE_COVERAGE_SRC && \
cd /opt/code_coverage && \
git checkout edba4873b5e8a390e977a64c522db2df18a8b27d && \
pip3 install wheel && \
# If version "Jinja2==2.10" is in requirements.txt, bump it to a patch version that
# supports upgrading its MarkupSafe dependency to a Python 3.11 compatible release:
sed -i 's/Jinja2==2.10/Jinja2==2.10.3/' requirements.txt && \
pip3 install -r requirements.txt && \
pip3 install MarkupSafe==2.0.1 && \
pip3 install coverage==6.3.2
# Default environment options for various sanitizers.
# Note that these match the settings used in ClusterFuzz and
# shouldn't be changed unless a corresponding change is made on
# ClusterFuzz side as well.
ENV ASAN_OPTIONS="alloc_dealloc_mismatch=0:allocator_may_return_null=1:allocator_release_to_os_interval_ms=500:check_malloc_usable_size=0:detect_container_overflow=1:detect_odr_violation=0:detect_leaks=1:detect_stack_use_after_return=1:fast_unwind_on_fatal=0:handle_abort=1:handle_segv=1:handle_sigill=1:max_uar_stack_size_log=16:print_scariness=1:quarantine_size_mb=10:strict_memcmp=1:strip_path_prefix=/workspace/:symbolize=1:use_sigaltstack=1:dedup_token_length=3"
ENV MSAN_OPTIONS="print_stats=1:strip_path_prefix=/workspace/:symbolize=1:dedup_token_length=3"
ENV UBSAN_OPTIONS="print_stacktrace=1:print_summary=1:silence_unsigned_overflow=1:strip_path_prefix=/workspace/:symbolize=1:dedup_token_length=3"
ENV FUZZER_ARGS="-rss_limit_mb=2560 -timeout=25"
ENV AFL_FUZZER_ARGS="-m none"
# Set up Golang environment variables (copied from /root/.bash_profile).
ENV GOPATH /root/go
# /root/.go/bin is for the standard Go binaries (i.e. go, gofmt, etc).
# $GOPATH/bin is for the binaries from the dependencies installed via "go get".
ENV PATH $PATH:$GOPATH/bin
COPY gocoverage $GOPATH/gocoverage
COPY install_go.sh /
RUN /install_go.sh && rm -rf /install_go.sh /root/.go
# Install OpenJDK and trim its size by removing unused components.
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
ENV JAVA_15_HOME=/usr/lib/jvm/java-15-openjdk-amd64
ENV JVM_LD_LIBRARY_PATH=$JAVA_HOME/lib/server
ENV PATH=$PATH:$JAVA_HOME/bin
COPY install_java.sh /
RUN /install_java.sh && rm /install_java.sh
# Install JaCoCo for JVM coverage.
RUN wget https://repo1.maven.org/maven2/org/jacoco/org.jacoco.cli/0.8.7/org.jacoco.cli-0.8.7-nodeps.jar -O /opt/jacoco-cli.jar && \
wget https://repo1.maven.org/maven2/org/jacoco/org.jacoco.agent/0.8.7/org.jacoco.agent-0.8.7-runtime.jar -O /opt/jacoco-agent.jar && \
echo "37df187b76888101ecd745282e9cd1ad4ea508d6 /opt/jacoco-agent.jar" | shasum --check && \
echo "c1814e7bba5fd8786224b09b43c84fd6156db690 /opt/jacoco-cli.jar" | shasum --check
COPY install_javascript.sh /
RUN /install_javascript.sh && rm /install_javascript.sh
# Copy built ruby. It is up to the fuzzing harnesses
# themselves to set GEM_HOME and GEM_PATH appropriately, as this depends
# on how the harnesses are packaged.
COPY --from=base-ruby /usr/local/bin/ruby /usr/local/bin/ruby
COPY --from=base-ruby /usr/local/bin/gem /usr/local/bin/gem
COPY --from=base-ruby /usr/local/lib/ruby /usr/local/lib/ruby
COPY --from=base-ruby /usr/local/include/ruby-3.3.0 /usr/local/include/ruby-3.3.0
RUN apt-get update && apt-get install -y luarocks
# Do this last to make developing these files easier/faster due to caching.
COPY bad_build_check \
coverage \
coverage_helper \
download_corpus \
jacoco_report_converter.py \
nyc_report_converter.py \
rcfilt \
reproduce \
run_fuzzer \
parse_options.py \
generate_differential_cov_report.py \
profraw_update.py \
targets_list \
test_all.py \
test_one.py \
python_coverage_runner_help.py \
/usr/local/bin/
================================================
FILE: infra/base-images/base-runner/README.md
================================================
# base-runner
> Base image for fuzzer runners.
```bash
docker run -ti gcr.io/oss-fuzz-base/base-runner
```
## Commands
| Command | Description |
|---------|-------------|
| `reproduce ` | build all fuzz targets and run specified one with testcase `/testcase` and given options. |
| `run_fuzzer ` | runs specified fuzzer combining options with `.options` file |
| `test_all.py` | runs every binary in `/out` as a fuzzer for a while to ensure it works. |
| `coverage ` | generate a coverage report for the given fuzzer. |
## Examples
- *Reproduce using latest OSS-Fuzz build:*
docker run --rm -ti -v <testcase_path>:/testcase gcr.io/oss-fuzz/$PROJECT_NAME reproduce <fuzzer_name>
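
The reproduce command above can be wrapped in a small shell function when it is needed repeatedly. This is a sketch only: `reproduce_cmd` is a hypothetical name, and the project, testcase path, and fuzzer name in the usage line are illustrative values, not real artifacts. The function prints the command instead of running it so it can be inspected first.

```shell
# Hypothetical wrapper (not part of OSS-Fuzz): print the docker command that
# reproduces a testcase with the latest OSS-Fuzz build of a project.
reproduce_cmd() {
  local project=$1 testcase=$2 fuzzer=$3
  printf 'docker run --rm -ti -v %s:/testcase gcr.io/oss-fuzz/%s reproduce %s\n' \
    "$testcase" "$project" "$fuzzer"
}

# Illustrative values only.
reproduce_cmd zlib /tmp/testcase zlib_uncompress_fuzzer
```

To execute the printed command directly, its output could be passed to `sh -c`, assuming the testcase path contains no characters that need quoting.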
================================================
FILE: infra/base-images/base-runner/bad_build_check
================================================
#!/bin/bash -u
# Copyright 2017 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# A minimal number of runs to test fuzz target with a non-empty input.
MIN_NUMBER_OF_RUNS=4
# The "example" target has 73 PCs with ASan, 65 with UBSan, and 6648 with MSan.
# Real world targets have greater values (arduinojson: 407, zlib: 664).
# Mercurial's bdiff_fuzzer has 116 PCs when built with ASan.
THRESHOLD_FOR_NUMBER_OF_EDGES=100
# A fuzz target is supposed to have at least two functions, such as
# LLVMFuzzerTestOneInput and an API that is being called from there.
THRESHOLD_FOR_NUMBER_OF_FUNCTIONS=2
# Threshold values for different sanitizers used by instrumentation checks.
ASAN_CALLS_THRESHOLD_FOR_ASAN_BUILD=1000
ASAN_CALLS_THRESHOLD_FOR_NON_ASAN_BUILD=0
# The value below can definitely be higher (like 500-1000), but avoid being too
# aggressive here while still evaluating the DFT-based fuzzing approach.
DFSAN_CALLS_THRESHOLD_FOR_DFSAN_BUILD=100
DFSAN_CALLS_THRESHOLD_FOR_NON_DFSAN_BUILD=0
MSAN_CALLS_THRESHOLD_FOR_MSAN_BUILD=1000
# Some engines (e.g. honggfuzz) may make a very small number of calls to msan
# for memory poisoning.
MSAN_CALLS_THRESHOLD_FOR_NON_MSAN_BUILD=3
# Usually, a non UBSan build (e.g. ASan) has 165 calls to UBSan runtime. The
# majority of targets built with UBSan have 200+ UBSan calls, but there are
# some very small targets that may have < 200 UBSan calls even in a UBSan build.
# Use the threshold value of 168 (slightly > 165) for UBSan build.
UBSAN_CALLS_THRESHOLD_FOR_UBSAN_BUILD=168
# It would be risky to use the threshold value close to 165 for non UBSan build,
# as UBSan runtime may change any time and thus we could have different number
# of calls to UBSan runtime even in ASan build. With that, we use the threshold
# value of 200 that would detect unnecessary UBSan instrumentation in the vast
# majority of targets, except for a handful of very small ones, which would not be
# a big concern either way as the overhead for them would not be significant.
UBSAN_CALLS_THRESHOLD_FOR_NON_UBSAN_BUILD=200
# ASan builds on i386 generally have about 250 UBSan runtime calls.
if [[ $ARCHITECTURE == 'i386' ]]
then
UBSAN_CALLS_THRESHOLD_FOR_NON_UBSAN_BUILD=280
fi
# Verify that the given fuzz target is correctly built to run with a particular
# engine.
function check_engine {
local FUZZER=$1
local FUZZER_NAME=$(basename $FUZZER)
local FUZZER_OUTPUT="/tmp/$FUZZER_NAME.output"
local CHECK_FAILED=0
if [[ "$FUZZING_ENGINE" == libfuzzer ]]; then
# Store fuzz target's output into a temp file to be used for further checks.
$FUZZER -- -seed=1337 -runs=$MIN_NUMBER_OF_RUNS &>$FUZZER_OUTPUT
CHECK_FAILED=$(egrep "ERROR: no interesting inputs were found. Is the code instrumented" -c $FUZZER_OUTPUT)
if (( $CHECK_FAILED > 0 )); then
echo "BAD BUILD: $FUZZER does not seem to have coverage instrumentation."
cat $FUZZER_OUTPUT
# Bail out as the further check does not make any sense, there are 0 PCs.
return 1
fi
local NUMBER_OF_EDGES=$(grep -Po "INFO: Loaded [[:digit:]]+ module.*\(.*(counters|guards)\):[[:space:]]+\K[[:digit:]]+" $FUZZER_OUTPUT)
# If a fuzz target fails to start, grep won't find anything, so bail out early to let check_startup_crash deal with it.
[[ -z "$NUMBER_OF_EDGES" ]] && return
if (( $NUMBER_OF_EDGES < $THRESHOLD_FOR_NUMBER_OF_EDGES )); then
echo "BAD BUILD: $FUZZER seems to have only partial coverage instrumentation."
fi
elif [[ "$FUZZING_ENGINE" == afl ]]; then
AFL_FORKSRV_INIT_TMOUT=30000 AFL_NO_UI=1 SKIP_SEED_CORPUS=1 timeout --preserve-status -s INT 35s run_fuzzer $FUZZER_NAME &>$FUZZER_OUTPUT
CHECK_PASSED=$(egrep "All set and ready to roll" -c $FUZZER_OUTPUT)
if (( $CHECK_PASSED == 0 )); then
echo "BAD BUILD: fuzzing $FUZZER with afl-fuzz failed."
cat $FUZZER_OUTPUT
return 1
fi
elif [[ "$FUZZING_ENGINE" == honggfuzz ]]; then
SKIP_SEED_CORPUS=1 timeout --preserve-status -s INT 20s run_fuzzer $FUZZER_NAME &>$FUZZER_OUTPUT
CHECK_PASSED=$(egrep "^Sz:[0-9]+ Tm:[0-9]+" -c $FUZZER_OUTPUT)
if (( $CHECK_PASSED == 0 )); then
echo "BAD BUILD: fuzzing $FUZZER with honggfuzz failed."
cat $FUZZER_OUTPUT
return 1
fi
elif [[ "$FUZZING_ENGINE" == dataflow ]]; then
$FUZZER &> $FUZZER_OUTPUT
local NUMBER_OF_FUNCTIONS=$(grep -Po "INFO:\s+\K[[:digit:]]+(?=\s+instrumented function.*)" $FUZZER_OUTPUT)
[[ -z "$NUMBER_OF_FUNCTIONS" ]] && NUMBER_OF_FUNCTIONS=0
if (( $NUMBER_OF_FUNCTIONS < $THRESHOLD_FOR_NUMBER_OF_FUNCTIONS )); then
echo "BAD BUILD: $FUZZER does not seem to be properly built in 'dataflow' config."
cat $FUZZER_OUTPUT
return 1
fi
elif [[ "$FUZZING_ENGINE" == centipede \
&& ("${HELPER:-}" == True || "$SANITIZER" == none ) ]]; then
# Performs run test on unsanitized binaries with auxiliary sanitized
# binaries if they are built with helper.py.
# Performs run test on unsanitized binaries without auxiliary sanitized
# binaries if they are from trial build and production build.
# TODO(Dongge): Support run test with sanitized binaries for trial and
# production build.
SKIP_SEED_CORPUS=1 timeout --preserve-status -s INT 20s run_fuzzer $FUZZER_NAME &>$FUZZER_OUTPUT
CHECK_PASSED=$(egrep "\[S0.0] begin-fuzz: ft: 0 corp: 0/0" -c $FUZZER_OUTPUT)
if (( $CHECK_PASSED == 0 )); then
echo "BAD BUILD: fuzzing $FUZZER with centipede failed."
cat $FUZZER_OUTPUT
return 1
fi
fi
return 0
}
# Verify that the given fuzz target has been built properly and works.
function check_startup_crash {
local FUZZER=$1
local FUZZER_NAME=$(basename $FUZZER)
local FUZZER_OUTPUT="/tmp/$FUZZER_NAME.output"
local CHECK_PASSED=0
if [[ "$FUZZING_ENGINE" = libfuzzer ]]; then
# Skip seed corpus as there is another explicit check that uses seed corpora.
SKIP_SEED_CORPUS=1 run_fuzzer $FUZZER_NAME -seed=1337 -runs=$MIN_NUMBER_OF_RUNS &>$FUZZER_OUTPUT
CHECK_PASSED=$(egrep "Done $MIN_NUMBER_OF_RUNS runs" -c $FUZZER_OUTPUT)
elif [[ "$FUZZING_ENGINE" = afl ]]; then
AFL_FORKSRV_INIT_TMOUT=30000 AFL_NO_UI=1 SKIP_SEED_CORPUS=1 timeout --preserve-status -s INT 35s run_fuzzer $FUZZER_NAME &>$FUZZER_OUTPUT
if [ $(egrep "target binary (crashed|terminated)" -c $FUZZER_OUTPUT) -eq 0 ]; then
CHECK_PASSED=1
fi
elif [[ "$FUZZING_ENGINE" = dataflow ]]; then
# TODO(https://github.com/google/oss-fuzz/issues/1632): add check for
# binaries compiled with dataflow engine when the interface becomes stable.
CHECK_PASSED=1
else
# TODO: add checks for other fuzzing engines if possible.
CHECK_PASSED=1
fi
if [ "$CHECK_PASSED" -eq "0" ]; then
echo "BAD BUILD: $FUZZER seems to have either startup crash or exit:"
cat $FUZZER_OUTPUT
return 1
fi
return 0
}
# Mixed sanitizers check for ASan build.
function check_asan_build {
local FUZZER=$1
local ASAN_CALLS=$2
local DFSAN_CALLS=$3
local MSAN_CALLS=$4
local UBSAN_CALLS=$5
# Perform all the checks for more detailed error message.
if (( $ASAN_CALLS < $ASAN_CALLS_THRESHOLD_FOR_ASAN_BUILD )); then
echo "BAD BUILD: $FUZZER does not seem to be compiled with ASan."
return 1
fi
if (( $DFSAN_CALLS > $DFSAN_CALLS_THRESHOLD_FOR_NON_DFSAN_BUILD )); then
echo "BAD BUILD: ASan build of $FUZZER seems to be compiled with DFSan."
return 1
fi
if (( $MSAN_CALLS > $MSAN_CALLS_THRESHOLD_FOR_NON_MSAN_BUILD )); then
echo "BAD BUILD: ASan build of $FUZZER seems to be compiled with MSan."
return 1
fi
if (( $UBSAN_CALLS > $UBSAN_CALLS_THRESHOLD_FOR_NON_UBSAN_BUILD )); then
echo "BAD BUILD: ASan build of $FUZZER seems to be compiled with UBSan."
return 1
fi
return 0
}
# Mixed sanitizers check for DFSan build.
function check_dfsan_build {
local FUZZER=$1
local ASAN_CALLS=$2
local DFSAN_CALLS=$3
local MSAN_CALLS=$4
local UBSAN_CALLS=$5
# Perform all the checks for more detailed error message.
if (( $ASAN_CALLS > $ASAN_CALLS_THRESHOLD_FOR_NON_ASAN_BUILD )); then
echo "BAD BUILD: DFSan build of $FUZZER seems to be compiled with ASan."
return 1
fi
if (( $DFSAN_CALLS < $DFSAN_CALLS_THRESHOLD_FOR_DFSAN_BUILD )); then
echo "BAD BUILD: $FUZZER does not seem to be compiled with DFSan."
return 1
fi
if (( $MSAN_CALLS > $MSAN_CALLS_THRESHOLD_FOR_NON_MSAN_BUILD )); then
echo "BAD BUILD: DFSan build of $FUZZER seems to be compiled with MSan."
return 1
fi
if (( $UBSAN_CALLS > $UBSAN_CALLS_THRESHOLD_FOR_NON_UBSAN_BUILD )); then
echo "BAD BUILD: DFSan build of $FUZZER seems to be compiled with UBSan."
return 1
fi
return 0
}
# Mixed sanitizers check for MSan build.
function check_msan_build {
local FUZZER=$1
local ASAN_CALLS=$2
local DFSAN_CALLS=$3
local MSAN_CALLS=$4
local UBSAN_CALLS=$5
# Perform all the checks for more detailed error message.
if (( $ASAN_CALLS > $ASAN_CALLS_THRESHOLD_FOR_NON_ASAN_BUILD )); then
echo "BAD BUILD: MSan build of $FUZZER seems to be compiled with ASan."
return 1
fi
if (( $DFSAN_CALLS > $DFSAN_CALLS_THRESHOLD_FOR_NON_DFSAN_BUILD )); then
echo "BAD BUILD: MSan build of $FUZZER seems to be compiled with DFSan."
return 1
fi
if (( $MSAN_CALLS < $MSAN_CALLS_THRESHOLD_FOR_MSAN_BUILD )); then
echo "BAD BUILD: $FUZZER does not seem to be compiled with MSan."
return 1
fi
if (( $UBSAN_CALLS > $UBSAN_CALLS_THRESHOLD_FOR_NON_UBSAN_BUILD )); then
echo "BAD BUILD: MSan build of $FUZZER seems to be compiled with UBSan."
return 1
fi
return 0
}
# Mixed sanitizers check for UBSan build.
function check_ubsan_build {
local FUZZER=$1
local ASAN_CALLS=$2
local DFSAN_CALLS=$3
local MSAN_CALLS=$4
local UBSAN_CALLS=$5
if [[ "$FUZZING_ENGINE" != libfuzzer ]]; then
# Ignore UBSan checks for fuzzing engines other than libFuzzer because:
# A) we (probably) are not going to use those with UBSan
# B) such builds show indistinguishable number of calls to UBSan
return 0
fi
# Perform all the checks for more detailed error message.
if (( $ASAN_CALLS > $ASAN_CALLS_THRESHOLD_FOR_NON_ASAN_BUILD )); then
echo "BAD BUILD: UBSan build of $FUZZER seems to be compiled with ASan."
return 1
fi
if (( $DFSAN_CALLS > $DFSAN_CALLS_THRESHOLD_FOR_NON_DFSAN_BUILD )); then
echo "BAD BUILD: UBSan build of $FUZZER seems to be compiled with DFSan."
return 1
fi
if (( $MSAN_CALLS > $MSAN_CALLS_THRESHOLD_FOR_NON_MSAN_BUILD )); then
echo "BAD BUILD: UBSan build of $FUZZER seems to be compiled with MSan."
return 1
fi
if (( $UBSAN_CALLS < $UBSAN_CALLS_THRESHOLD_FOR_UBSAN_BUILD )); then
echo "BAD BUILD: $FUZZER does not seem to be compiled with UBSan."
return 1
fi
}
# Verify that the given fuzz target is compiled with correct sanitizer.
function check_mixed_sanitizers {
local FUZZER=$1
local result=0
local CALL_INSN=
if [ "${FUZZING_LANGUAGE:-}" = "jvm" ]; then
# Sanitizer runtime is linked into the Jazzer driver, so this check does not
# apply.
return 0
fi
if [ "${FUZZING_LANGUAGE:-}" = "javascript" ]; then
# Jazzer.js currently does not support using sanitizers with native Node.js addons.
# This is not relevant anyway since supporting this will be done by preloading
# the sanitizers in the wrapper script starting Jazzer.js.
return 0
fi
if [ "${FUZZING_LANGUAGE:-}" = "ruby" ]; then
return 0
fi
if [ "${FUZZING_LANGUAGE:-}" = "python" ]; then
# Sanitizer runtime is loaded via LD_PRELOAD, so this check does not apply.
return 0
fi
# luzer-based tests.
# Sanitizer runtime is loaded via LD_PRELOAD, so this check does
# not apply.
egrep luarocks $FUZZER && return 0;
# For fuzztest fuzzers point to the binary instead of launcher script.
if [[ $FUZZER == *"@"* ]]; then
FUZZER=${FUZZER%%@*}
fi
CALL_INSN=
if [[ $ARCHITECTURE == "x86_64" ]]
then
CALL_INSN="callq?\s+[0-9a-f]+\s+<"
elif [[ $ARCHITECTURE == "i386" ]]
then
CALL_INSN="call\s+[0-9a-f]+\s+<"
elif [[ $ARCHITECTURE == "aarch64" ]]
then
CALL_INSN="bl\s+[0-9a-f]+\s+<"
else
echo "UNSUPPORTED ARCHITECTURE"
exit 1
fi
local ASAN_CALLS=$(objdump -dC $FUZZER | egrep "${CALL_INSN}__asan" -c)
local DFSAN_CALLS=$(objdump -dC $FUZZER | egrep "${CALL_INSN}__dfsan" -c)
local MSAN_CALLS=$(objdump -dC $FUZZER | egrep "${CALL_INSN}__msan" -c)
local UBSAN_CALLS=$(objdump -dC $FUZZER | egrep "${CALL_INSN}__ubsan" -c)
if [[ "$SANITIZER" = address ]]; then
check_asan_build $FUZZER $ASAN_CALLS $DFSAN_CALLS $MSAN_CALLS $UBSAN_CALLS
result=$?
elif [[ "$SANITIZER" = dataflow ]]; then
check_dfsan_build $FUZZER $ASAN_CALLS $DFSAN_CALLS $MSAN_CALLS $UBSAN_CALLS
result=$?
elif [[ "$SANITIZER" = memory ]]; then
check_msan_build $FUZZER $ASAN_CALLS $DFSAN_CALLS $MSAN_CALLS $UBSAN_CALLS
result=$?
elif [[ "$SANITIZER" = undefined ]]; then
check_ubsan_build $FUZZER $ASAN_CALLS $DFSAN_CALLS $MSAN_CALLS $UBSAN_CALLS
result=$?
elif [[ "$SANITIZER" = thread ]]; then
# TODO(metzman): Implement this.
result=0
fi
return $result
}
# Verify that the given fuzz target doesn't crash on the seed corpus.
function check_seed_corpus {
local FUZZER=$1
local FUZZER_NAME="$(basename $FUZZER)"
local FUZZER_OUTPUT="/tmp/$FUZZER_NAME.output"
if [[ "$FUZZING_ENGINE" != libfuzzer ]]; then
return 0
fi
# Set up common fuzzing arguments, otherwise "run_fuzzer" errors out.
if [ -z "$FUZZER_ARGS" ]; then
export FUZZER_ARGS="-rss_limit_mb=2560 -timeout=25"
fi
bash -c "run_fuzzer $FUZZER_NAME -runs=0" &> $FUZZER_OUTPUT
# Don't output anything if fuzz target hasn't crashed.
if [ $? -ne 0 ]; then
echo "BAD BUILD: $FUZZER has a crashing input in its seed corpus:"
cat $FUZZER_OUTPUT
return 1
fi
return 0
}
function check_architecture {
local FUZZER=$1
local FUZZER_NAME=$(basename $FUZZER)
if [ "${FUZZING_LANGUAGE:-}" = "jvm" ]; then
# The native dependencies of a JVM project are not packaged, but loaded
# dynamically at runtime and thus cannot be checked here.
return 0;
fi
if [ "${FUZZING_LANGUAGE:-}" = "javascript" ]; then
# Jazzer.js fuzzers are wrapper scripts that start the fuzz target with
# the Jazzer.js CLI.
return 0;
fi
if [ "${FUZZING_LANGUAGE:-}" = "ruby" ]; then
return 0;
fi
if [ "${FUZZING_LANGUAGE:-}" = "python" ]; then
FUZZER=${FUZZER}.pkg
fi
# luzer-based tests.
egrep luarocks $FUZZER && return 0;
# For fuzztest fuzzers point to the binary instead of launcher script.
if [[ $FUZZER == *"@"* ]]; then
FUZZER=${FUZZER%%@*}
fi
FILE_OUTPUT=$(file $FUZZER)
if [[ $ARCHITECTURE == "x86_64" ]]
then
echo $FILE_OUTPUT | grep "x86-64" > /dev/null
elif [[ $ARCHITECTURE == "i386" ]]
then
echo $FILE_OUTPUT | grep "80386" > /dev/null
elif [[ $ARCHITECTURE == "aarch64" ]]
then
echo $FILE_OUTPUT | grep "aarch64" > /dev/null
else
echo "UNSUPPORTED ARCHITECTURE"
return 1
fi
result=$?
if [[ $result != 0 ]]
then
echo "BAD BUILD: $FUZZER is not built for architecture: $ARCHITECTURE"
echo "file command output: $FILE_OUTPUT"
echo "check_mixed_sanitizers test will fail."
fi
return $result
}
function main {
local FUZZER=$1
local AUXILIARY_FUZZER=${2:-}
local checks_failed=0
local result=0
export RUN_FUZZER_MODE="batch"
check_engine $FUZZER
result=$?
checks_failed=$(( $checks_failed + $result ))
check_architecture $FUZZER
result=$?
checks_failed=$(( $checks_failed + $result ))
if [[ "$FUZZING_ENGINE" == centipede \
&& "$SANITIZER" != none && "${HELPER:-}" == True ]]; then
check_mixed_sanitizers $AUXILIARY_FUZZER
else
check_mixed_sanitizers $FUZZER
fi
result=$?
checks_failed=$(( $checks_failed + $result ))
check_startup_crash $FUZZER
result=$?
checks_failed=$(( $checks_failed + $result ))
# TODO: re-enable after introducing bug auto-filing for bad builds.
# check_seed_corpus $FUZZER
return $checks_failed
}
if [ $# -ne 1 -a $# -ne 2 ]; then
echo "Usage: $0 <fuzz_target_path> [<auxiliary_fuzz_target_path>]"
exit 1
fi
# Fuzz target path.
FUZZER=$1
AUXILIARY_FUZZER=${2:-}
main $FUZZER $AUXILIARY_FUZZER
exit $?
================================================
FILE: infra/base-images/base-runner/coverage
================================================
#!/bin/bash -u
# Copyright 2018 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
cd $OUT
if (( $# > 0 )); then
FUZZ_TARGETS="$@"
else
FUZZ_TARGETS="$(find . -maxdepth 1 -type f -executable -printf '%P\n' | \
grep -v -x -F \
-e 'llvm-symbolizer' \
-e 'jazzer_agent_deploy.jar' \
-e 'jazzer_driver' \
-e 'jazzer_driver_with_sanitizer' \
-e 'sanitizer_with_fuzzer.so')"
fi
COVERAGE_OUTPUT_DIR=${COVERAGE_OUTPUT_DIR:-$OUT}
DUMPS_DIR="$COVERAGE_OUTPUT_DIR/dumps"
FUZZERS_COVERAGE_DUMPS_DIR="$DUMPS_DIR/fuzzers_coverage"
MERGED_COVERAGE_DIR="$COVERAGE_OUTPUT_DIR/merged_coverage"
FUZZER_STATS_DIR="$COVERAGE_OUTPUT_DIR/fuzzer_stats"
TEXTCOV_REPORT_DIR="$COVERAGE_OUTPUT_DIR/textcov_reports"
LOGS_DIR="$COVERAGE_OUTPUT_DIR/logs"
REPORT_ROOT_DIR="$COVERAGE_OUTPUT_DIR/report"
REPORT_BY_TARGET_ROOT_DIR="$COVERAGE_OUTPUT_DIR/report_target"
PLATFORM=linux
REPORT_PLATFORM_DIR="$COVERAGE_OUTPUT_DIR/report/$PLATFORM"
for directory in $DUMPS_DIR $FUZZER_STATS_DIR $LOGS_DIR $REPORT_ROOT_DIR $TEXTCOV_REPORT_DIR\
$REPORT_PLATFORM_DIR $REPORT_BY_TARGET_ROOT_DIR $FUZZERS_COVERAGE_DUMPS_DIR $MERGED_COVERAGE_DIR; do
rm -rf $directory
mkdir -p $directory
done
PROFILE_FILE="$DUMPS_DIR/merged.profdata"
SUMMARY_FILE="$REPORT_PLATFORM_DIR/summary.json"
COVERAGE_TARGET_FILE="$FUZZER_STATS_DIR/coverage_targets.txt"
# Use path mapping, as $SRC directory from the builder is copied into $OUT/$SRC.
PATH_EQUIVALENCE_ARGS="-path-equivalence=/,$OUT"
# It's important to use $COVERAGE_EXTRA_ARGS as the last argument, because it
# can contain paths to source files / directories which are positional args.
LLVM_COV_COMMON_ARGS="$PATH_EQUIVALENCE_ARGS \
-ignore-filename-regex=.*src/libfuzzer/.* $COVERAGE_EXTRA_ARGS"
# Options to extract branch coverage.
BRANCH_COV_ARGS="--show-branches=count --show-expansions"
# Timeout for running a single fuzz target.
TIMEOUT=1h
# This will be used by llvm-cov command to generate the actual report.
objects=""
# Number of CPUs available, this is needed for running tests in parallel.
# Set the max number of parallel jobs to be the CPU count and a max of 10.
NPROC=$(nproc)
MAX_PARALLEL_COUNT=10
CORPUS_DIR=${CORPUS_DIR:-"/corpus"}
function run_fuzz_target {
local target=$1
# '%1m' will produce separate dump files for every object. For example, if a
# fuzz target loads a shared library, we will have dumps for both of them.
local profraw_file="$DUMPS_DIR/$target.%1m.profraw"
local profraw_file_mask="$DUMPS_DIR/$target.*.profraw"
local profdata_file="$DUMPS_DIR/$target.profdata"
local corpus_real="$CORPUS_DIR/${target}"
# -merge=1 requires an output directory, create a new, empty dir for that.
local corpus_dummy="$OUT/dummy_corpus_dir_for_${target}"
rm -rf $corpus_dummy && mkdir -p $corpus_dummy
# Use -merge=1 instead of -runs=0 because merge is crash resistant and lets us
# collect coverage from all corpus files even if some of them are crash inputs.
# Merge should not introduce any significant overhead compared to -runs=0,
# because (A) corpora are already minimized; (B) we do not use sancov, and so
# libFuzzer always finishes merge with an empty output dir.
# Use 100s timeout instead of 25s as code coverage builds can be very slow.
local args="-merge=1 -timeout=100 $corpus_dummy $corpus_real"
export LLVM_PROFILE_FILE=$profraw_file
timeout $TIMEOUT $OUT/$target $args &> $LOGS_DIR/$target.log
cov_retcode=$?
target_error_log="$LOGS_DIR/${target}_error.log"
grep -E "^==[0-9]+== ERROR: libFuzzer:" "$LOGS_DIR/$target.log" > "$target_error_log"
grep_retcode=$?
if (( $cov_retcode != 0 || $grep_retcode == 0 )); then
echo "Error occurred while running $target:"
echo "Cov returncode: $cov_retcode, grep returncode: $grep_retcode"
cat $LOGS_DIR/$target.log
fi
rm -rf $corpus_dummy
if (( $(du -c $profraw_file_mask | tail -n 1 | cut -f 1) == 0 )); then
# Skip fuzz targets that failed to produce profile dumps.
return 0
fi
# If necessary translate to latest profraw version.
if [[ $target == *"@"* ]]; then
# Extract fuzztest binary name from fuzztest wrapper script.
target=(${target//@/ }[0])
fi
profraw_update.py $OUT/$target -i $profraw_file_mask
llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
# Delete unnecessary and (potentially) large .profraw files.
rm $profraw_file_mask
shared_libraries=$(coverage_helper shared_libs -build-dir=$OUT -object=$target)
llvm-cov export -summary-only -instr-profile=$profdata_file -object=$target \
$shared_libraries $LLVM_COV_COMMON_ARGS > $FUZZER_STATS_DIR/$target.json
# If grep returned zero an error was matched.
if (( $cov_retcode != 0 || $grep_retcode == 0 )); then
echo "Coverage error, creating log file: $FUZZER_STATS_DIR/${target}_error.log"
mv "$target_error_log" "$FUZZER_STATS_DIR/${target}_error.log";
fi
# For introspector.
llvm-cov show -instr-profile=$profdata_file -object=$target -line-coverage-gt=0 $shared_libraries $BRANCH_COV_ARGS $LLVM_COV_COMMON_ARGS > ${TEXTCOV_REPORT_DIR}/$target.covreport
}
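# Illustrative sketch (not part of the original script). The fuzztest handling
# above uses the bash idiom target=(${target//@/ }[0]): every "@" is replaced
# with a space, the result is word-split into an array, and a plain $target
# then expands to the first element. The trailing "[0]" ends up attached to
# the last word, which is harmless because only the first element is read.
# The Python below mirrors that behavior, assuming wrapper names look like
# "binary@Suite.Test"; fuzztest_binary_name is a hypothetical helper name.

```python
def fuzztest_binary_name(target: str) -> str:
    """Return the part of a fuzztest wrapper name before the first '@'."""
    # bash word-splitting also breaks on whitespace, so split on both.
    parts = target.replace('@', ' ').split()
    return parts[0] if parts else ''


print(fuzztest_binary_name('my_fuzztest@Suite.Case'))
```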
# The native_go_fuzzers.json file contains a list of
# fuzzers that are std lib harnesses. This function
# reads that json list to see if the name of a fuzzer
# exists in that list.
is_std_lib_fuzzer() {
local element="$1"
local file="$OUT/native_go_fuzzers.json"
if [ -z "$element" ]; then
echo "Usage: is_std_lib_fuzzer \"element to check\""
return 1
fi
if [ ! -s "$file" ]; then
echo "false"
return 0
fi
if jq -e --arg item "$element" 'index($item) != null' "$file" > /dev/null; then
echo "true"
else
echo "false"
fi
}
# get_function_name reads a value from a simple json format:
#
# {
# "fuzzerBinaryName1": "FuzzFunctionName1",
# "fuzzerBinaryName2": "FuzzFunctionName2"
# }
#
# In this case, we call `get_function_name fuzzerBinaryName1
# $OUT/fuzzer_function_names.json` to get "FuzzFunctionName1".
# We need this when setting up a temporary directory for
# getting the coverage.
# Go reads corpus files from `testdata/fuzz/FuzzFunctionName1`,
# so we need the function name instead of the executable binary name.
get_function_name() {
local key="$1"
local file="$2"
if [ -z "$key" ] || [ -z "$file" ]; then
echo "Usage: get_function_name <key> <file>"
return 1
fi
# If file doesn't exist or is empty
if [ ! -s "$file" ]; then
echo "Error: File '$file' does not exist or is empty."
return 1
fi
# Use jq to extract the value
local result
result=$(jq -r --arg k "$key" '.[$k] // empty' "$file")
if [ -z "$result" ]; then
echo "Error: Key '$key' not found in '$file'."
return 1
else
echo "$result"
fi
}
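# Illustrative sketch (not part of the original script). get_function_name
# reads one value from a flat JSON object with jq's '.[$k] // empty' filter
# and treats an empty result as "key not found". A rough Python equivalent
# is shown below, assuming the file holds a single JSON object as described
# in the comment above; lookup_function_name is a hypothetical name.

```python
import json


def lookup_function_name(mapping_json: str, key: str) -> str:
    """Return mapping[key], failing the same way the shell helper does."""
    mapping = json.loads(mapping_json)
    value = mapping.get(key)
    # jq's // operator falls through on null and false; the shell code
    # additionally rejects an empty string via its -z test.
    if value is None or value is False or value == '':
        raise KeyError(f"Key '{key}' not found")
    return value


SAMPLE = '{"fuzzerBinaryName1": "FuzzFunctionName1"}'
print(lookup_function_name(SAMPLE, 'fuzzerBinaryName1'))
```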
function run_go_fuzz_target {
local target=$1
echo "Running go target $target"
export FUZZ_CORPUS_DIR="$CORPUS_DIR/${target}/"
export FUZZ_PROFILE_NAME="$DUMPS_DIR/$target.perf"
# "NAME" in testdata/fuzz/NAME needs to be the function
# name of the fuzz test in the compiled binary. We have
# stored that name in $OUT/fuzzer_function_names.json.
# We also need this parameter for go-fuzz fuzzers, to
# run only the single function. Otherwise, the test may
# run other tests that are included in the binary
function_names_file="$OUT/fuzzer_function_names.json"
function_name=$(get_function_name "$target" "$function_names_file")
if [[ "$(is_std_lib_fuzzer "${target}")" == "true" ]]; then
# Create the corpus dir testdata/fuzz/NAME
mkdir -p "${OUT}/testdata/fuzz/${function_name}"
# Now we copy the corpus generated at runtime into FUZZ_CORPUS_DIR.
# In the process, we convert it from a raw byte slice to files of
# the std lib Go corpus file format, eg:
#
# go test fuzz v1
# int8(42)
# string("hello")
#
# The convertLibFuzzerTestcaseToStdLibGo binary handles the entire
# process for that.
convertLibFuzzerTestcaseToStdLibGo -convert-seeds \
-params-json $OUT/fuzzer-parameters.json \
-fuzzer-func "${target}" \
-fuzzerBinaryName $target \
-seeds-dir "${FUZZ_CORPUS_DIR}" \
-out-dir "${OUT}/testdata/fuzz/${function_name}"
pushd $OUT
timeout "$TIMEOUT" \
"$OUT/$target" \
-test.run="^${function_name}\$" \
-test.coverprofile "$DUMPS_DIR/$target.profdata" \
&> "$LOGS_DIR/$target.log"
if (( $? != 0 )); then
echo "Error occurred while running std lib fuzzer $target:"
cat $LOGS_DIR/$target.log
fi
popd
# cleanup after native go fuzzers
rm -r "${OUT}/testdata/fuzz/${function_name}"
# The std lib fuzzers are renamed to "*_fuzz_.go" during "infra/helper.py build_fuzzers".
# They are therefore referred to as "*_fuzz_.go" in the profdata files.
# Since the copies named "*_fuzz_.go" do not exist in the file tree during
# the coverage build, we change the references in the .profdata files
# to the original file names.
sed -i "s/_libFuzzer.go/_test.go/g" $DUMPS_DIR/$target.profdata
else
timeout "$TIMEOUT" \
"$OUT/$target" \
-test.run="^${function_name}\$" \
-test.coverprofile "$DUMPS_DIR/$target.profdata" \
&> "$LOGS_DIR/$target.log"
if (( $? != 0 )); then
echo "Error occurred while running $target:"
cat $LOGS_DIR/$target.log
fi
fi
# translate from golangish paths to current absolute paths
cat $OUT/$target.gocovpath | while read i; do sed -i $i $DUMPS_DIR/$target.profdata; done
# cf PATH_EQUIVALENCE_ARGS
sed -i 's=/='$OUT'/=' $DUMPS_DIR/$target.profdata
$SYSGOPATH/bin/gocovsum $DUMPS_DIR/$target.profdata > $FUZZER_STATS_DIR/$target.json
}
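# Illustrative sketch (not part of the original script). The final sed call
# in run_go_fuzz_target, sed -i 's=/='$OUT'/=', uses "=" as its delimiter and
# has no "g" flag, so it rewrites only the first "/" on each line. The Python
# below shows the same per-line substitution. The sample profile lines are
# assumptions made for illustration, and prefix_first_slash is a hypothetical
# helper name.

```python
def prefix_first_slash(line: str, out_dir: str) -> str:
    """Insert out_dir in front of the first '/' of a line, if there is one."""
    return line.replace('/', out_dir + '/', 1)


for sample in ('mode: set', '/src/project/parser.go:10.2,12.3 1 1'):
    print(prefix_first_slash(sample, '/out'))
```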
function run_python_fuzz_target {
local target=$1
local zipped_sources="$DUMPS_DIR/$target.deps.zip"
local corpus_real="$CORPUS_DIR/${target}"
# Write dummy stats file
echo "{}" > "$FUZZER_STATS_DIR/$target.json"
# Run fuzzer
$OUT/$target $corpus_real -atheris_runs=$(ls -la $corpus_real | wc -l) > $LOGS_DIR/$target.log 2>&1
if (( $? != 0 )); then
echo "Error happened getting coverage of $target"
echo "This is likely because Atheris did not exit gracefully"
cat $LOGS_DIR/$target.log
return 0
fi
mv .coverage $OUT/.coverage_$target
}
function run_java_fuzz_target {
local target=$1
local exec_file="$DUMPS_DIR/$target.exec"
local class_dump_dir="$DUMPS_DIR/${target}_classes/"
mkdir "$class_dump_dir"
local corpus_real="$CORPUS_DIR/${target}"
# -merge=1 requires an output directory, create a new, empty dir for that.
local corpus_dummy="$OUT/dummy_corpus_dir_for_${target}"
rm -rf $corpus_dummy && mkdir -p $corpus_dummy
# Use 100s timeout instead of 25s as code coverage builds can be very slow.
local jacoco_args="destfile=$exec_file,classdumpdir=$class_dump_dir,excludes=com.code_intelligence.jazzer.*\\:com.sun.tools.attach.VirtualMachine"
local args="-merge=1 -timeout=100 --nohooks \
--additional_jvm_args=-javaagent\\:/opt/jacoco-agent.jar=$jacoco_args \
$corpus_dummy $corpus_real"
timeout $TIMEOUT $OUT/$target $args &> $LOGS_DIR/$target.log
if (( $? != 0 )); then
echo "Error occurred while running $target:"
cat $LOGS_DIR/$target.log
fi
if (( $(du -c $exec_file | tail -n 1 | cut -f 1) == 0 )); then
# Skip fuzz targets that failed to produce .exec files.
echo "$target failed to produce .exec file."
return 0
fi
# Generate XML report only as input to jacoco_report_converter.
# Source files are not needed for the summary.
local xml_report="$DUMPS_DIR/${target}.xml"
local summary_file="$FUZZER_STATS_DIR/$target.json"
java -jar /opt/jacoco-cli.jar report $exec_file \
--xml $xml_report \
--classfiles $class_dump_dir
# Write llvm-cov summary file.
jacoco_report_converter.py $xml_report $summary_file
}
function run_javascript_fuzz_target {
local target=$1
local corpus_real="$CORPUS_DIR/${target}"
# -merge=1 requires an output directory, create a new, empty dir for that.
local corpus_dummy="$OUT/dummy_corpus_dir_for_${target}"
rm -rf $corpus_dummy && mkdir -p $corpus_dummy
# IstanbulJS currently does not work when the tested program creates
# subprocesses. For this reason, we first minimize the corpus removing
# any crashing inputs so that we can report source-based code coverage
# with a single sweep over the minimized corpus
local merge_args="-merge=1 -timeout=100 $corpus_dummy $corpus_real"
timeout $TIMEOUT $OUT/$target $merge_args &> $LOGS_DIR/$target.log
# nyc saves the coverage reports in a directory with the default name "coverage"
local coverage_dir="$DUMPS_DIR/coverage_dir_for_${target}"
rm -rf $coverage_dir && mkdir -p $coverage_dir
local nyc_json_coverage_file="$coverage_dir/coverage-final.json"
local nyc_json_summary_file="$coverage_dir/coverage-summary.json"
local args="-runs=0 $corpus_dummy"
local jazzerjs_args="--coverage --coverageDirectory $coverage_dir --coverageReporters json --coverageReporters json-summary"
JAZZERJS_EXTRA_ARGS=$jazzerjs_args $OUT/$target $args &> $LOGS_DIR/$target.log
if (( $? != 0 )); then
echo "Error occurred while running $target:"
cat $LOGS_DIR/$target.log
fi
if [ ! -s $nyc_json_coverage_file ]; then
# Skip fuzz targets that failed to produce coverage-final.json file.
echo "$target failed to produce coverage-final.json file."
return 0
fi
cp $nyc_json_coverage_file $FUZZERS_COVERAGE_DUMPS_DIR/$target.json
local summary_file="$FUZZER_STATS_DIR/$target.json"
nyc_report_converter.py $nyc_json_summary_file $summary_file
}
function generate_html {
local profdata=$1
local shared_libraries=$2
local objects=$3
local output_dir=$4
rm -rf "$output_dir"
mkdir -p "$output_dir/$PLATFORM"
local llvm_cov_args="-instr-profile=$profdata $objects $LLVM_COV_COMMON_ARGS"
llvm-cov show -format=html -output-dir=$output_dir -Xdemangler rcfilt $llvm_cov_args
# Export coverage summary in JSON format.
local summary_file=$output_dir/$PLATFORM/summary.json
llvm-cov export -summary-only $llvm_cov_args > $summary_file
coverage_helper -v post_process -src-root-dir=/ -summary-file=$summary_file \
-output-dir=$output_dir $PATH_EQUIVALENCE_ARGS
}
export SYSGOPATH=$GOPATH
export GOPATH=$OUT/$GOPATH
# Run each fuzz target, generate raw coverage dumps.
for fuzz_target in $FUZZ_TARGETS; do
# Test if fuzz target is a golang one.
if [[ $FUZZING_LANGUAGE == "go" ]]; then
# Continue if not a fuzz target.
if [[ $FUZZING_ENGINE != "none" ]]; then
grep "FUZZ_CORPUS_DIR" $fuzz_target > /dev/null 2>&1 || grep "testing\.T" $fuzz_target > /dev/null 2>&1 || continue
fi
# Log the target in the targets file.
echo ${fuzz_target} >> $COVERAGE_TARGET_FILE
# Run the coverage collection.
run_go_fuzz_target $fuzz_target &
elif [[ $FUZZING_LANGUAGE == "python" ]]; then
echo "Entering python fuzzing"
# Log the target in the targets file.
echo ${fuzz_target} >> $COVERAGE_TARGET_FILE
# Run the coverage collection.
run_python_fuzz_target $fuzz_target
elif [[ $FUZZING_LANGUAGE == "jvm" ]]; then
# Continue if not a fuzz target.
if [[ $FUZZING_ENGINE != "none" ]]; then
grep "LLVMFuzzerTestOneInput" $fuzz_target > /dev/null 2>&1 || continue
fi
echo "Running $fuzz_target"
# Log the target in the targets file.
echo ${fuzz_target} >> $COVERAGE_TARGET_FILE
# Run the coverage collection.
run_java_fuzz_target $fuzz_target &
elif [[ $FUZZING_LANGUAGE == "javascript" ]]; then
# Continue if not a fuzz target.
if [[ $FUZZING_ENGINE != "none" ]]; then
grep "LLVMFuzzerTestOneInput" $fuzz_target > /dev/null 2>&1 || continue
fi
echo "Running $fuzz_target"
# Log the target in the targets file.
echo ${fuzz_target} >> $COVERAGE_TARGET_FILE
# Run the coverage collection.
run_javascript_fuzz_target $fuzz_target &
else
# Continue if not a fuzz target.
if [[ $FUZZING_ENGINE != "none" ]]; then
grep "LLVMFuzzerTestOneInput" $fuzz_target > /dev/null 2>&1 || continue
fi
echo "Running $fuzz_target"
# Log the target in the targets file.
echo ${fuzz_target} >> $COVERAGE_TARGET_FILE
# Run the coverage collection.
run_fuzz_target $fuzz_target &
# Rewrite object if it's a FUZZTEST target.
if [[ $fuzz_target == *"@"* ]]; then
# Extract fuzztest binary name from fuzztest wrapper script.
fuzz_target=(${fuzz_target//@/ }[0])
fi
if [[ -z $objects ]]; then
# The first object needs to be passed without -object= flag.
objects="$fuzz_target"
else
objects="$objects -object=$fuzz_target"
fi
fi
# Limit the number of processes to be spawned.
n_child_proc=$(jobs -rp | wc -l)
while [[ "$n_child_proc" -eq "$NPROC" || "$n_child_proc" -gt "$MAX_PARALLEL_COUNT" ]]; do
sleep 4
n_child_proc=$(jobs -rp | wc -l)
done
done
# Wait for background processes to finish.
wait
if [[ $FUZZING_LANGUAGE == "go" ]]; then
echo $DUMPS_DIR
$SYSGOPATH/bin/gocovmerge $DUMPS_DIR/*.profdata > fuzz.cov
gotoolcover -html=fuzz.cov -o $REPORT_ROOT_DIR/index.html
$SYSGOPATH/bin/gocovsum fuzz.cov > $SUMMARY_FILE
cp $REPORT_ROOT_DIR/index.html $REPORT_PLATFORM_DIR/index.html
$SYSGOPATH/bin/pprof-merge $DUMPS_DIR/*.perf.cpu.prof
mv merged.data $REPORT_ROOT_DIR/cpu.prof
$SYSGOPATH/bin/pprof-merge $DUMPS_DIR/*.perf.heap.prof
mv merged.data $REPORT_ROOT_DIR/heap.prof
#TODO some proxy for go tool pprof -http=127.0.0.1:8001 $DUMPS_DIR/cpu.prof
echo "Finished generating code coverage report for Go fuzz targets."
elif [[ $FUZZING_LANGUAGE == "python" ]]; then
# Extract source files from all dependency zip folders
mkdir -p /pythoncovmergedfiles/medio
PYCOVDIR=/pycovdir/
mkdir $PYCOVDIR
for fuzzer in $FUZZ_TARGETS; do
fuzzer_deps=${fuzzer}.pkg.deps.zip
unzip $OUT/${fuzzer_deps}
rsync -r ./medio /pythoncovmergedfiles/medio
rm -rf ./medio
# Translate paths in unzipped folders to paths that we can use
mv $OUT/.coverage_$fuzzer .coverage
python3 /usr/local/bin/python_coverage_runner_help.py translate /pythoncovmergedfiles/medio
cp .new_coverage $PYCOVDIR/.coverage_$fuzzer
cp .new_coverage $OUT/coverage_d_$fuzzer
done
# Combine coverage
cd $PYCOVDIR
python3 /usr/local/bin/python_coverage_runner_help.py combine .coverage_*
python3 /usr/local/bin/python_coverage_runner_help.py html
# Produce all_cov file used by fuzz introspector.
python3 /usr/local/bin/python_coverage_runner_help.py json -o ${TEXTCOV_REPORT_DIR}/all_cov.json
# Generate .json with similar format to llvm-cov output.
python3 /usr/local/bin/python_coverage_runner_help.py \
convert-to-summary-json ${TEXTCOV_REPORT_DIR}/all_cov.json $SUMMARY_FILE
# Copy coverage data out
cp htmlcov/status.json ${TEXTCOV_REPORT_DIR}/html_status.json
mv htmlcov/* $REPORT_PLATFORM_DIR/
mv .coverage_* $REPORT_PLATFORM_DIR/
elif [[ $FUZZING_LANGUAGE == "jvm" ]]; then
# From this point on the script does not tolerate any errors.
set -e
# Merge .exec files from the individual targets.
jacoco_merged_exec=$DUMPS_DIR/jacoco.merged.exec
java -jar /opt/jacoco-cli.jar merge $DUMPS_DIR/*.exec \
--destfile $jacoco_merged_exec
# Prepare classes directory for jacoco process
classes_dir=$DUMPS_DIR/classes
mkdir $classes_dir
# Only copy class files found in $OUT/$SRC to ensure they were
# compiled from the project itself, avoiding inclusion of
# dependency classes. This also includes the fuzzer classes.
find "$OUT/$SRC" -type f -name "*.class" | while read -r class_file; do
# Skip module-info.class
if [[ "$(basename "$class_file")" == "module-info.class" ]]; then
continue
fi
# Use javap to extract the fully qualified name of the class and copy it to $classes_dir
fqn=$(javap -verbose "$class_file" 2>/dev/null | grep "this_class:" | grep -oP '(?<=// ).*')
if [ -n "$fqn" ]; then
mkdir -p $classes_dir/$(dirname $fqn)
cp $class_file $classes_dir/$fqn.class
fi
done
# Heuristically determine source directories based on Maven structure.
# Always include the $SRC root as it likely contains the fuzzer sources.
sourcefiles_args=(--sourcefiles $OUT/$SRC)
source_dirs=$(find $OUT/$SRC -type d -name 'java')
for source_dir in $source_dirs; do
sourcefiles_args+=(--sourcefiles "$source_dir")
done
# Generate HTML and XML reports.
xml_report=$REPORT_PLATFORM_DIR/index.xml
java -jar /opt/jacoco-cli.jar report $jacoco_merged_exec \
--html $REPORT_PLATFORM_DIR \
--xml $xml_report \
--classfiles $classes_dir \
"${sourcefiles_args[@]}"
# Also serve the raw exec file and XML report, which can be useful for
# automated analysis.
cp $jacoco_merged_exec $REPORT_PLATFORM_DIR/jacoco.exec
cp $xml_report $REPORT_PLATFORM_DIR/jacoco.xml
cp $xml_report $TEXTCOV_REPORT_DIR/jacoco.xml
# Write llvm-cov summary file.
jacoco_report_converter.py $xml_report $SUMMARY_FILE
set +e
elif [[ $FUZZING_LANGUAGE == "javascript" ]]; then
# From this point on the script does not tolerate any errors.
set -e
json_report=$MERGED_COVERAGE_DIR/coverage.json
nyc merge $FUZZERS_COVERAGE_DUMPS_DIR $json_report
nyc report -t $MERGED_COVERAGE_DIR --report-dir $REPORT_PLATFORM_DIR --reporter=html --reporter=json-summary
nyc_json_summary_file=$REPORT_PLATFORM_DIR/coverage-summary.json
# Write llvm-cov summary file.
nyc_report_converter.py $nyc_json_summary_file $SUMMARY_FILE
set +e
else
# From this point on the script does not tolerate any errors.
set -e
# Merge all dumps from the individual targets.
rm -f $PROFILE_FILE
llvm-profdata merge -sparse $DUMPS_DIR/*.profdata -o $PROFILE_FILE
# TODO(mmoroz): add script from Chromium for rendering directory view reports.
# The first path in $objects does not have -object= prefix (llvm-cov format).
shared_libraries=$(coverage_helper shared_libs -build-dir=$OUT -object=$objects)
objects="$objects $shared_libraries"
generate_html $PROFILE_FILE "$shared_libraries" "$objects" "$REPORT_ROOT_DIR"
# Per target reports.
for fuzz_target in $FUZZ_TARGETS; do
if [[ $fuzz_target == *"@"* ]]; then
profdata_path=$DUMPS_DIR/$fuzz_target.profdata
report_dir=$REPORT_BY_TARGET_ROOT_DIR/$fuzz_target
# Extract fuzztest binary name from fuzztest wrapper script.
fuzz_target=(${fuzz_target//@/ }[0])
else
profdata_path=$DUMPS_DIR/$fuzz_target.profdata
report_dir=$REPORT_BY_TARGET_ROOT_DIR/$fuzz_target
fi
if [[ ! -f "$profdata_path" ]]; then
echo "WARNING: $fuzz_target has no profdata generated."
continue
fi
generate_html $profdata_path "$shared_libraries" "$fuzz_target" "$report_dir"
done
set +e
fi
# Make sure report is readable.
chmod -R +r $REPORT_ROOT_DIR $REPORT_BY_TARGET_ROOT_DIR
find $REPORT_ROOT_DIR $REPORT_BY_TARGET_ROOT_DIR -type d -exec chmod +x {} +
# HTTP_PORT is optional.
set +u
if [[ -n $HTTP_PORT ]]; then
# Serve the report locally.
echo "Serving the report on http://127.0.0.1:$HTTP_PORT/linux/index.html"
cd $REPORT_ROOT_DIR
python3 -m http.server $HTTP_PORT
fi
================================================
FILE: infra/base-images/base-runner/coverage_helper
================================================
#!/bin/bash -u
# Copyright 2018 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
python3 $CODE_COVERAGE_SRC/coverage_utils.py $@
================================================
FILE: infra/base-images/base-runner/download_corpus
================================================
#!/bin/bash -u
# Copyright 2018 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
if (( $# < 1 )); then
echo "Usage: $0 \"path_download_to url_download_from\" (can be repeated)" >&2
exit 1
fi
for pair in "$@"; do
read path url <<< "$pair"
wget -q -O $path $url
done
# Always exit with 0 as we do not track wget return codes and should not rely
# on the latest command execution.
exit 0
================================================
FILE: infra/base-images/base-runner/generate_differential_cov_report.py
================================================
#!/usr/bin/env python3
#
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Script for generating differential coverage reports.
generate_differential_cov_report.py <minuend_dir> <subtrahend_dir> \
  <difference_dir>
"""
import os
import shutil
import subprocess
import sys
class ProfData:
  """Class representing a profdata file."""
  def __init__(self, text):
    self.function_profs = []
    for function_prof in text.split('\n\n'):
      if not function_prof:
        continue
      self.function_profs.append(FunctionProf(function_prof))
  def to_string(self):
    """Convert back to a string."""
    return '\n'.join(
        [function_prof.to_string() for function_prof in self.function_profs])
  def find_function(self, function, idx=None):
    """Find the same function in this profdata."""
    if idx is not None:
      try:
        possibility = self.function_profs[idx]
        if function.func_hash == possibility.func_hash:
          return possibility
      except IndexError:
        pass
    for function_prof in self.function_profs:
      if function_prof.func_hash == function.func_hash:
        return function_prof
    return None
  def subtract(self, subtrahend):
    """Subtract subtrahend from this profdata."""
    for idx, function_prof in enumerate(self.function_profs):
      subtrahend_function_prof = subtrahend.find_function(function_prof, idx)
      function_prof.subtract(subtrahend_function_prof)
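# Illustrative sketch (not part of the original module). find_function first
# tries the position hint and only then falls back to a linear scan, matching
# on the function hash alone. The standalone version below works on a plain
# list of hash strings; find_by_hash is a hypothetical name, and unlike the
# original it rejects negative hints instead of letting them wrap around.

```python
from typing import Optional, Sequence


def find_by_hash(hashes: Sequence[str],
                 wanted: str,
                 hint: Optional[int] = None) -> Optional[int]:
    """Return the index of `wanted` in `hashes`, checking `hint` first."""
    if hint is not None and 0 <= hint < len(hashes) and hashes[hint] == wanted:
        return hint
    for idx, candidate in enumerate(hashes):
        if candidate == wanted:
            return idx
    return None


print(find_by_hash(['a', 'b', 'c'], 'c', hint=0))
```

# The hint pays off when both profiles list functions in the same order,
# which avoids a full scan for every function.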
class FunctionProf:
  """Profile of a function."""
  FUNC_HASH_COMMENT_LINE = '# Func Hash:'
  NUM_COUNTERS_COMMENT_LINE = '# Num Counters:'
  COUNTER_VALUES_COMMENT_LINE = '# Counter Values:'
  def __init__(self, text):
    print(text)
    lines = text.splitlines()
    self.function = lines[0]
    assert self.FUNC_HASH_COMMENT_LINE == lines[1]
    self.func_hash = lines[2]
    assert self.NUM_COUNTERS_COMMENT_LINE == lines[3]
    self.num_counters = int(lines[4])
    assert self.COUNTER_VALUES_COMMENT_LINE == lines[5]
    self.counter_values = [1 if int(line) else 0 for line in lines[6:]]
  def to_string(self):
    """Convert back to text."""
    lines = [
        self.function,
        self.FUNC_HASH_COMMENT_LINE,
        self.func_hash,
        self.NUM_COUNTERS_COMMENT_LINE,
        str(self.num_counters),
        self.COUNTER_VALUES_COMMENT_LINE,
    ] + [str(num) for num in self.counter_values]
    return '\n'.join(lines)
  def subtract(self, subtrahend_prof):
    """Subtract this other function from this function."""
    if not subtrahend_prof:
      print(self.function, 'has no subtrahend')
      # Nothing to subtract.
      return
    self.counter_values = [
        max(counter1 - counter2, 0) for counter1, counter2 in zip(
            self.counter_values, subtrahend_prof.counter_values)
    ]
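# Illustrative sketch (not part of the original module). FunctionProf reduces
# every counter to 0 or 1 when parsing, so subtraction yields 1 only for
# counters hit in the minuend but not in the subtrahend. The record below
# follows the layout that FunctionProf.__init__ asserts on; its name, hash
# and counter values are made up for illustration.

```python
SAMPLE_RECORD = """main
# Func Hash:
12345
# Num Counters:
3
# Counter Values:
7
0
2"""


def parse_counters(record: str) -> list:
    """Return the counters of one text record, reduced to 0 or 1."""
    lines = record.splitlines()
    return [1 if int(line) else 0 for line in lines[6:]]


def subtract_counters(minuend: list, subtrahend: list) -> list:
    """Keep only counters covered by minuend and not by subtrahend."""
    return [max(a - b, 0) for a, b in zip(minuend, subtrahend)]


covered = parse_counters(SAMPLE_RECORD)
print(subtract_counters(covered, [1, 0, 0]))
```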
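def get_profdata_files(directory):
  """Returns profdata files in |directory|, sorted by name."""
  profdatas = []
  for filename in os.listdir(directory):
    filename = os.path.join(directory, filename)
    if filename.endswith('.profdata'):
      profdatas.append(filename)
  # os.listdir() order is arbitrary; sort so that the minuend and subtrahend
  # lists pair up by file name when they are zipped together later.
  return sorted(profdatas)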
def convert_profdata_to_text(profdata):
  """Convert a profdata binary file to a profdata text file."""
  profdata_text = f'{profdata}.txt'
  if os.path.exists(profdata_text):
    os.remove(profdata_text)
  command = [
      'llvm-profdata', 'merge', '-j=1', '-sparse', profdata, '--text', '-o',
      profdata_text
  ]
  print(command)
  subprocess.run(command, check=True)
  return profdata_text
def convert_text_profdata_to_bin(profdata_text):
  """Convert a profdata text file to a profdata binary file."""
  profdata = profdata_text.replace('.txt', '').replace('.profdata',
                                                       '') + '.profdata'
  print('bin profdata', profdata)
  if os.path.exists(profdata):
    os.remove(profdata)
  command = [
      'llvm-profdata', 'merge', '-j=1', '-sparse', profdata_text, '-o', profdata
  ]
  print(command)
  subprocess.run(command, check=True)
  return profdata
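# Illustrative sketch (not part of the original module). The output name in
# convert_text_profdata_to_bin is derived with two str.replace calls, which
# strip every occurrence of ".txt" and ".profdata", not just the suffix. The
# helper below isolates that derivation; binary_profdata_name is a
# hypothetical name and the paths are made up.

```python
def binary_profdata_name(profdata_text: str) -> str:
    """Map '<name>.profdata.txt' to '<name>.profdata' as the module does."""
    stem = profdata_text.replace('.txt', '').replace('.profdata', '')
    return stem + '.profdata'


print(binary_profdata_name('/diff/fuzzer.profdata.txt'))
# A directory component containing ".txt" is rewritten as well:
print(binary_profdata_name('/data.txt/fuzzer.profdata.txt'))
```

# If such directory names can occur, trimming only the suffix would be a
# safer choice; the original code does not do that.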
def get_difference(minuend_filename, subtrahend_filename):
  """Subtract subtrahend_filename from minuend_filename."""
  with open(minuend_filename, 'r', encoding='utf-8') as minuend_file:
    print('minuend', minuend_filename)
    minuend = ProfData(minuend_file.read())
  with open(subtrahend_filename, 'r', encoding='utf-8') as subtrahend_file:
    print('subtrahend', subtrahend_filename)
    subtrahend = ProfData(subtrahend_file.read())
  minuend.subtract(subtrahend)
  return minuend
def profdatas_to_objects(profdatas):
  """Get the corresponding objects for each profdata."""
  return [
      os.path.splitext(os.path.basename(profdata))[0] for profdata in profdatas
  ]
def generate_differential_cov_reports(minuend_profdatas, subtrahend_profdatas,
                                      difference_dir):
  """Calculate the differences between all profdatas and generate differential
  coverage reports."""
  profdata_objects = profdatas_to_objects(minuend_profdatas)
  real_profdata_objects = [
      binobject for binobject in profdata_objects if binobject != 'merged'
  ]
  for minuend, subtrahend, binobject in zip(minuend_profdatas,
                                            subtrahend_profdatas,
                                            profdata_objects):
    minuend_text = convert_profdata_to_text(minuend)
    subtrahend_text = convert_profdata_to_text(subtrahend)
    difference = get_difference(minuend_text, subtrahend_text)
    basename = os.path.basename(minuend_text)
    difference_text = os.path.join(difference_dir, basename)
    with open(difference_text, 'w', encoding='utf-8') as file_handle:
      file_handle.write(difference.to_string())
    difference_profdata = convert_text_profdata_to_bin(difference_text)
    if not difference_profdata.endswith('merged.profdata'):
      generate_html_report(difference_profdata, [binobject],
                           os.path.join(difference_dir, binobject))
    else:
      generate_html_report(difference_profdata, real_profdata_objects,
                           os.path.join(difference_dir, 'merged'))
def generate_html_report(profdata, objects, directory):
  """Generate an HTML coverage report."""
  # TODO(metzman): Deal with shared libs.
  html_dir = os.path.join(directory, 'reports')
  if os.path.exists(html_dir):
    # html_dir is a directory, so os.remove() would fail on it.
    shutil.rmtree(html_dir)
  os.makedirs(html_dir)
  out_dir = os.getenv('OUT', '/out')
  command = [
      'llvm-cov', 'show', f'-path-equivalence=/,{out_dir}', '-format=html',
      '-Xdemangler', 'rcfilt', f'-instr-profile={profdata}'
  ]
  objects = [os.path.join(out_dir, binobject) for binobject in objects]
  command += objects + ['-o', html_dir]
  print(' '.join(command))
  subprocess.run(command, check=True)
def main():
  """Generate differential coverage reports."""
  if len(sys.argv) != 4:
    print(f'Usage: {sys.argv[0]} <minuend_dir> <subtrahend_dir> '
          '<difference_dir>')
    # Exit here; otherwise the sys.argv lookups below raise IndexError.
    sys.exit(1)
  minuend_dir = sys.argv[1]
  subtrahend_dir = sys.argv[2]
  difference_dir = sys.argv[3]
  if os.path.exists(difference_dir):
    shutil.rmtree(difference_dir)
  os.makedirs(difference_dir, exist_ok=True)
  minuend_profdatas = get_profdata_files(minuend_dir)
  subtrahend_profdatas = get_profdata_files(subtrahend_dir)
  generate_differential_cov_reports(minuend_profdatas, subtrahend_profdatas,
                                    difference_dir)
if __name__ == '__main__':
  main()
================================================
FILE: infra/base-images/base-runner/gocoverage/go.mod
================================================
module oss-fuzz.com/gocoverage
go 1.14
require (
github.com/google/pprof v0.0.0-20210226084205-cbba55b83ad5
golang.org/x/tools v0.1.0
)
================================================
FILE: infra/base-images/base-runner/gocoverage/go.sum
================================================
github.com/chzyer/logex v1.1.10/go.mod h1:+Ywpsq7O8HXn0nuIou7OrIPyXbp3wmkHB+jjWRnGsAI=
github.com/chzyer/readline v0.0.0-20180603132655-2972be24d48e/go.mod h1:nSuG5e5PlCu98SY8svDHJxuZscDgtXS6KTTbou5AhLI=
github.com/chzyer/test v0.0.0-20180213035817-a1ea475d72b1/go.mod h1:Q3SI9o4m/ZMnBNeIyt5eFwwo7qiLfzFZmjNmxjkiQlU=
github.com/google/pprof v0.0.0-20210226084205-cbba55b83ad5 h1:zIaiqGYDQwa4HVx5wGRTXbx38Pqxjemn4BP98wpzpXo=
github.com/google/pprof v0.0.0-20210226084205-cbba55b83ad5/go.mod h1:kpwsk12EmLew5upagYY7GY0pfYCcupk39gWOCRROcvE=
github.com/ianlancetaylor/demangle v0.0.0-20200824232613-28f6c0f3b639/go.mod h1:aSSvb/t6k1mPoxDqO4vJh6VOCGPwU4O0C2/Eqndh1Sc=
github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191204072324-ce4227a45e2e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210119212857-b64e53b001e4/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.1.0 h1:po9/4sTYwZU9lPhi1tOrb4hCv3qrhiQ77LZfGa2OjwY=
golang.org/x/tools v0.1.0/go.mod h1:xkSsbof2nBLbhDlRMhhhyNLN/zl3eTqcnHD5viDpcZ0=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
================================================
FILE: infra/base-images/base-runner/gocoverage/gocovmerge/LICENSE
================================================
Copyright (c) 2015, Wade Simmons
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
================================================
FILE: infra/base-images/base-runner/gocoverage/gocovmerge/gocovmerge.go
================================================
// gocovmerge takes the results from multiple `go test -coverprofile` runs and
// merges them into one profile
package main
import (
"flag"
"fmt"
"io"
"log"
"os"
"sort"
"golang.org/x/tools/cover"
)
func mergeProfiles(p *cover.Profile, merge *cover.Profile) {
if p.Mode != merge.Mode {
log.Fatalf("cannot merge profiles with different modes")
}
// Since the blocks are sorted, we can keep track of where the last block
// was inserted and only look at the blocks after that as targets for merge
startIndex := 0
for _, b := range merge.Blocks {
startIndex = mergeProfileBlock(p, b, startIndex)
}
}
func mergeProfileBlock(p *cover.Profile, pb cover.ProfileBlock, startIndex int) int {
sortFunc := func(i int) bool {
pi := p.Blocks[i+startIndex]
return pi.StartLine >= pb.StartLine && (pi.StartLine != pb.StartLine || pi.StartCol >= pb.StartCol)
}
i := 0
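// Skip the fast-path probe when startIndex is already past the last block;
// calling sortFunc(0) there would index p.Blocks out of range.
if startIndex >= len(p.Blocks) || !sortFunc(i) {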
i = sort.Search(len(p.Blocks)-startIndex, sortFunc)
}
i += startIndex
if i < len(p.Blocks) && p.Blocks[i].StartLine == pb.StartLine && p.Blocks[i].StartCol == pb.StartCol {
if p.Blocks[i].EndLine != pb.EndLine || p.Blocks[i].EndCol != pb.EndCol {
log.Fatalf("OVERLAP MERGE: %v %v %v", p.FileName, p.Blocks[i], pb)
}
switch p.Mode {
case "set":
p.Blocks[i].Count |= pb.Count
case "count", "atomic":
p.Blocks[i].Count += pb.Count
default:
log.Fatalf("unsupported covermode: '%s'", p.Mode)
}
} else {
if i > 0 {
pa := p.Blocks[i-1]
if pa.EndLine >= pb.EndLine && (pa.EndLine != pb.EndLine || pa.EndCol > pb.EndCol) {
log.Fatalf("OVERLAP BEFORE: %v %v %v", p.FileName, pa, pb)
}
}
if i < len(p.Blocks)-1 {
pa := p.Blocks[i+1]
if pa.StartLine <= pb.StartLine && (pa.StartLine != pb.StartLine || pa.StartCol < pb.StartCol) {
log.Fatalf("OVERLAP AFTER: %v %v %v", p.FileName, pa, pb)
}
}
p.Blocks = append(p.Blocks, cover.ProfileBlock{})
copy(p.Blocks[i+1:], p.Blocks[i:])
p.Blocks[i] = pb
}
return i + 1
}
func addProfile(profiles []*cover.Profile, p *cover.Profile) []*cover.Profile {
i := sort.Search(len(profiles), func(i int) bool { return profiles[i].FileName >= p.FileName })
if i < len(profiles) && profiles[i].FileName == p.FileName {
mergeProfiles(profiles[i], p)
} else {
profiles = append(profiles, nil)
copy(profiles[i+1:], profiles[i:])
profiles[i] = p
}
return profiles
}
func dumpProfiles(profiles []*cover.Profile, out io.Writer) {
if len(profiles) == 0 {
return
}
fmt.Fprintf(out, "mode: %s\n", profiles[0].Mode)
for _, p := range profiles {
for _, b := range p.Blocks {
fmt.Fprintf(out, "%s:%d.%d,%d.%d %d %d\n", p.FileName, b.StartLine, b.StartCol, b.EndLine, b.EndCol, b.NumStmt, b.Count)
}
}
}
func main() {
flag.Parse()
var merged []*cover.Profile
for _, file := range flag.Args() {
profiles, err := cover.ParseProfiles(file)
if err != nil {
log.Fatalf("failed to parse profiles: %v", err)
}
for _, p := range profiles {
merged = addProfile(merged, p)
}
}
dumpProfiles(merged, os.Stdout)
}
================================================
FILE: infra/base-images/base-runner/gocoverage/gocovsum/gocovsum.go
================================================
// Copyright 2023 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package main
import (
"encoding/json"
"flag"
"fmt"
"log"
"go/ast"
"go/parser"
"go/token"
"golang.org/x/tools/cover"
)
type CoverageTotal struct {
Count int `json:"count"`
Covered int `json:"covered"`
Uncovered int `json:"notcovered"`
Percent float64 `json:"percent"`
}
type CoverageTotals struct {
Functions CoverageTotal `json:"functions,omitempty"`
Lines CoverageTotal `json:"lines,omitempty"`
Regions CoverageTotal `json:"regions,omitempty"`
Instantiations CoverageTotal `json:"instantiations,omitempty"`
Branches CoverageTotal `json:"branches,omitempty"`
}
type CoverageFile struct {
Summary CoverageTotals `json:"summary,omitempty"`
Filename string `json:"filename,omitempty"`
}
type CoverageData struct {
Totals CoverageTotals `json:"totals,omitempty"`
Files []CoverageFile `json:"files,omitempty"`
}
type PositionInterval struct {
start token.Position
end token.Position
}
type CoverageSummary struct {
Data []CoverageData `json:"data,omitempty"`
Type string `json:"type,omitempty"`
Version string `json:"version,omitempty"`
}
func isFunctionCovered(s token.Position, e token.Position, blocks []cover.ProfileBlock) bool {
for _, b := range blocks {
if b.StartLine >= s.Line && b.StartLine <= e.Line && b.EndLine >= s.Line && b.EndLine <= e.Line {
if b.Count > 0 {
return true
}
}
}
return false
}
func computePercent(s *CoverageTotals) {
if s.Regions.Count > 0 {
s.Regions.Percent = float64(100*s.Regions.Covered) / float64(s.Regions.Count)
}
if s.Lines.Count > 0 {
s.Lines.Percent = float64(100*s.Lines.Covered) / float64(s.Lines.Count)
}
if s.Functions.Count > 0 {
s.Functions.Percent = float64(100*s.Functions.Covered) / float64(s.Functions.Count)
}
}
func main() {
flag.Parse()
if len(flag.Args()) != 1 {
log.Fatalf("needs exactly one argument")
}
profiles, err := cover.ParseProfiles(flag.Args()[0])
if err != nil {
log.Fatalf("failed to parse profiles: %v", err)
}
r := CoverageSummary{}
r.Type = "oss-fuzz.go.coverage.json.export"
r.Version = "2.0.1"
r.Data = make([]CoverageData, 1)
for _, p := range profiles {
fset := token.NewFileSet() // positions are relative to fset
f, err := parser.ParseFile(fset, p.FileName, nil, 0)
if err != nil {
log.Printf("failed to parse go file: %v", err)
continue
}
fileCov := CoverageFile{}
fileCov.Filename = p.FileName
ast.Inspect(f, func(n ast.Node) bool {
switch x := n.(type) {
case *ast.FuncLit:
startf := fset.Position(x.Pos())
endf := fset.Position(x.End())
fileCov.Summary.Functions.Count++
if isFunctionCovered(startf, endf, p.Blocks) {
fileCov.Summary.Functions.Covered++
} else {
fileCov.Summary.Functions.Uncovered++
}
case *ast.FuncDecl:
startf := fset.Position(x.Pos())
endf := fset.Position(x.End())
fileCov.Summary.Functions.Count++
if isFunctionCovered(startf, endf, p.Blocks) {
fileCov.Summary.Functions.Covered++
} else {
fileCov.Summary.Functions.Uncovered++
}
}
return true
})
for _, b := range p.Blocks {
fileCov.Summary.Regions.Count++
if b.Count > 0 {
fileCov.Summary.Regions.Covered++
} else {
fileCov.Summary.Regions.Uncovered++
}
fileCov.Summary.Lines.Count += b.NumStmt
if b.Count > 0 {
fileCov.Summary.Lines.Covered += b.NumStmt
} else {
fileCov.Summary.Lines.Uncovered += b.NumStmt
}
}
r.Data[0].Totals.Regions.Count += fileCov.Summary.Regions.Count
r.Data[0].Totals.Regions.Covered += fileCov.Summary.Regions.Covered
r.Data[0].Totals.Regions.Uncovered += fileCov.Summary.Regions.Uncovered
r.Data[0].Totals.Lines.Count += fileCov.Summary.Lines.Count
r.Data[0].Totals.Lines.Covered += fileCov.Summary.Lines.Covered
r.Data[0].Totals.Lines.Uncovered += fileCov.Summary.Lines.Uncovered
r.Data[0].Totals.Functions.Count += fileCov.Summary.Functions.Count
r.Data[0].Totals.Functions.Covered += fileCov.Summary.Functions.Covered
r.Data[0].Totals.Functions.Uncovered += fileCov.Summary.Functions.Uncovered
computePercent(&fileCov.Summary)
r.Data[0].Files = append(r.Data[0].Files, fileCov)
}
computePercent(&r.Data[0].Totals)
o, err := json.Marshal(r)
if err != nil {
log.Fatalf("failed to generate json: %v", err)
}
// Use Print, not Printf: the JSON may contain '%' (for example in file names).
fmt.Print(string(o))
}
================================================
FILE: infra/base-images/base-runner/gocoverage/pprof-merge/LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "{}"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright {yyyy} {name of copyright owner}
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: infra/base-images/base-runner/gocoverage/pprof-merge/main.go
================================================
// Copyright 2019 Google Inc. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package main
import (
"flag"
"log"
"os"
"github.com/google/pprof/profile"
)
var (
output string
)
func main() {
flag.StringVar(&output, "o", "merged.data", "")
flag.Parse()
// Use the non-flag arguments so that "-o <file>" is not parsed as a profile.
files := flag.Args()
if len(files) == 0 {
log.Fatal("Give profiles files as arguments")
}
var profiles []*profile.Profile
for _, fname := range files {
f, err := os.Open(fname)
if err != nil {
log.Fatalf("Cannot open profile file at %q: %v", fname, err)
}
p, err := profile.Parse(f)
if err != nil {
log.Fatalf("Cannot parse profile at %q: %v", fname, err)
}
profiles = append(profiles, p)
}
merged, err := profile.Merge(profiles)
if err != nil {
log.Fatalf("Cannot merge profiles: %v", err)
}
// Truncate an existing output file so stale trailing bytes cannot corrupt it.
out, err := os.OpenFile(output, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0755)
if err != nil {
log.Fatalf("Cannot open output to write: %v", err)
}
if err := merged.Write(out); err != nil {
log.Fatalf("Cannot write merged profile to file: %v", err)
}
if err := out.Close(); err != nil {
log.Printf("Error when closing the output file: %v", err)
}
}
================================================
FILE: infra/base-images/base-runner/install_deps.sh
================================================
#!/bin/bash -eux
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Install dependencies in a platform-aware way.
apt-get update && apt-get install -y \
binutils \
file \
ca-certificates \
fonts-dejavu \
git \
libcap2 \
rsync \
unzip \
jq \
wget \
zip --no-install-recommends
case $(uname -m) in
x86_64)
# We only need to worry about i386 if we are on x86_64.
apt-get install -y lib32gcc1 libc6-i386
;;
esac
================================================
FILE: infra/base-images/base-runner/install_deps_ubuntu_20_04.sh
================================================
#!/bin/bash -eux
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Install dependencies in a platform-aware way.
apt-get update && apt-get install -y \
binutils \
file \
ca-certificates \
fonts-dejavu \
git \
libcap2 \
rsync \
unzip \
jq \
wget \
zip --no-install-recommends
case $(uname -m) in
x86_64)
# We only need to worry about i386 if we are on x86_64.
apt-get install -y lib32gcc1 libc6-i386
;;
esac
================================================
FILE: infra/base-images/base-runner/install_deps_ubuntu_24_04.sh
================================================
#!/bin/bash -eux
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Install dependencies in a platform-aware way.
apt-get update && apt-get install -y \
binutils \
file \
ca-certificates \
fonts-dejavu \
git \
libcap2 \
rsync \
unzip \
jq \
wget \
libunwind-dev \
libblocksruntime-dev \
zip --no-install-recommends
# libunwind-dev and libblocksruntime-dev are needed for Honggfuzz.
case $(uname -m) in
x86_64)
# We only need to worry about i386 if we are on x86_64.
if grep -q '24.04' /etc/os-release; then
apt-get install -y lib32gcc-s1 libc6-i386
else
apt-get install -y lib32gcc1 libc6-i386
fi
;;
esac
================================================
FILE: infra/base-images/base-runner/install_go.sh
================================================
#!/bin/bash -eux
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Install go on x86_64, don't do anything on ARM.
case $(uname -m) in
x86_64)
# Download and install Go.
export GOROOT=/root/.go
wget https://go.dev/dl/go1.25.0.linux-amd64.tar.gz
mkdir temp-go
tar -C temp-go/ -xzf go1.25.0.linux-amd64.tar.gz
mkdir $GOROOT
mv temp-go/go/* /root/.go/
rm -rf temp-go
echo 'Set "GOPATH=/root/go"'
echo 'Set "PATH=$PATH:/root/.go/bin:$GOPATH/bin"'
# Set up Golang coverage modules.
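# Log where the gocoverage sources are. Plain find avoids treating the path as
# a printf format string and does not fail when nothing is found.
find . -name gocoverage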
cd $GOPATH/gocoverage && /root/.go/bin/go install ./...
cd /root/.go/src/cmd/cover && /root/.go/bin/go build && mv cover $GOPATH/bin/gotoolcover
pushd /tmp
git clone --depth=1 https://github.com/AdamKorcz/go-118-fuzz-build --branch=v2
cd go-118-fuzz-build/cmd/convertLibFuzzerTestcaseToStdLibGo
/root/.go/bin/go build .
mv convertLibFuzzerTestcaseToStdLibGo $GOPATH/bin/
popd
;;
aarch64)
# Don't install go because installer is not provided.
echo "Not installing go: aarch64."
;;
*)
echo "Error: unsupported architecture: $(uname -m)"
exit 1
;;
esac
================================================
FILE: infra/base-images/base-runner/install_java.sh
================================================
#!/bin/bash -eux
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Install java in a platform-aware way.
ARCHITECTURE=
case $(uname -m) in
x86_64)
ARCHITECTURE=x64
;;
aarch64)
ARCHITECTURE=aarch64
;;
*)
echo "Error: unsupported architecture: $(uname -m)"
exit 1
;;
esac
wget -q https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.16+8/OpenJDK17U-jdk_"$ARCHITECTURE"_linux_hotspot_17.0.16_8.tar.gz -O /tmp/openjdk-17.0.16_linux-"$ARCHITECTURE"_bin.tar.gz
wget -q https://download.java.net/java/GA/jdk15.0.2/0d1cfde4252546c6931946de8db48ee2/7/GPL/openjdk-15.0.2_linux-"$ARCHITECTURE"_bin.tar.gz -O /tmp/openjdk-15.0.2_linux-"$ARCHITECTURE"_bin.tar.gz
cd /tmp
mkdir -p $JAVA_HOME
tar -xz --strip-components=1 -f openjdk-17.0.16_linux-"$ARCHITECTURE"_bin.tar.gz --directory $JAVA_HOME
rm -f openjdk-17.0.16_linux-"$ARCHITECTURE"_bin.tar.gz
rm -rf $JAVA_HOME/jmods $JAVA_HOME/lib/src.zip
# Install OpenJDK 15 and trim its size by removing unused components. Some projects only run with Java 15.
mkdir -p $JAVA_15_HOME
tar -xz --strip-components=1 -f openjdk-15.0.2_linux-"$ARCHITECTURE"_bin.tar.gz --directory $JAVA_15_HOME
rm -f openjdk-15.0.2_linux-"$ARCHITECTURE"_bin.tar.gz
rm -rf $JAVA_15_HOME/jmods $JAVA_15_HOME/lib/src.zip
================================================
FILE: infra/base-images/base-runner/install_javascript.sh
================================================
#!/bin/bash -eux
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# see installation instructions: https://github.com/nodesource/distributions#available-architectures
apt-get update
apt-get install -y ca-certificates curl gnupg
mkdir -p /etc/apt/keyrings
curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
NODE_MAJOR=20
echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | tee /etc/apt/sources.list.d/nodesource.list
apt-get update
apt-get install nodejs -y
# Install the latest version of nyc for source-based coverage reporting.
npm install --global nyc
================================================
FILE: infra/base-images/base-runner/jacoco_report_converter.py
================================================
#!/usr/bin/env python3
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Helper script for creating an llvm-cov style JSON summary from a JaCoCo XML
report."""
import json
import os
import sys
import xml.etree.ElementTree as ET
def convert(xml):
"""Turns a JaCoCo XML report into an llvm-cov JSON summary."""
summary = {
'type': 'oss-fuzz.java.coverage.json.export',
'version': '1.0.0',
'data': [{
'totals': {},
'files': [],
}],
}
report = ET.fromstring(xml)
totals = make_element_summary(report)
summary['data'][0]['totals'] = totals
# Since Java compilation does not track source file location, we match
# coverage info to source files via the full class name, e.g. we search for
# a path in /out/src ending in foo/bar/Baz.java for the class foo.bar.Baz.
# Under the assumptions that a given project only ever contains a single
# version of a class and that no class name appears as a suffix of another
# class name, we can assign coverage info to every source file matched in that
# way.
src_files = list_src_files()
for class_element in report.findall('./package/class'):
# Skip fuzzer classes
if is_fuzzer_class(class_element):
continue
# Skip non class elements
if 'sourcefilename' not in class_element.attrib:
continue
class_name = class_element.attrib['name']
package_name = os.path.dirname(class_name)
basename = class_element.attrib['sourcefilename']
# This path is 'foo/Bar.java' for the class element
# <class name="foo/Bar" sourcefilename="Bar.java">.
canonical_path = os.path.join(package_name, basename)
class_summary = make_element_summary(class_element)
for src_file in relative_to_src_path(src_files, canonical_path):
summary['data'][0]['files'].append({
'filename': src_file,
'summary': class_summary,
})
return json.dumps(summary)
def list_src_files():
"""Returns a map from basename to a list of /src/... paths for all files in $OUT/$SRC."""
filename_to_paths = {}
out_path = os.environ['OUT'] + '/'
src_path = os.environ['SRC']
src_in_out = out_path + src_path
for dirpath, _, filenames in os.walk(src_in_out):
for filename in filenames:
full_path = dirpath + '/' + filename
# Map /out//src/... to /src/...
file_path = full_path[len(out_path):]
filename_to_paths.setdefault(filename, []).append(file_path)
return filename_to_paths
def is_fuzzer_class(class_element):
  """Check if the class is a fuzzer class."""
  method_element = class_element.find('./method[@name="fuzzerTestOneInput"]')
  # Compare against None: an Element without child elements is falsy.
  return method_element is not None
def relative_to_src_path(src_files, canonical_path):
"""Returns all paths in src_files ending in canonical_path."""
basename = os.path.basename(canonical_path)
if basename not in src_files:
return []
candidate_paths = src_files[basename]
return [
path for path in candidate_paths if path.endswith("/" + canonical_path)
]
def make_element_summary(element):
"""Returns a coverage summary for an element in the XML report."""
summary = {}
function_counter = element.find('./counter[@type=\'METHOD\']')
summary['functions'] = make_counter_summary(function_counter)
line_counter = element.find('./counter[@type=\'LINE\']')
summary['lines'] = make_counter_summary(line_counter)
# JaCoCo tracks branch coverage, which counts the covered control-flow edges
# between llvm-cov's regions instead of the covered regions themselves. For
# non-trivial code parts, the difference is usually negligible. However, if
# all methods of a class consist of a single region only (no branches),
# JaCoCo does not report any branch coverage even if there is instruction
# coverage. Since this would give incorrect results for CI Fuzz purposes, we
# increase the regions counter by 1 if there is any amount of instruction
# coverage.
instruction_counter = element.find('./counter[@type=\'INSTRUCTION\']')
has_some_coverage = instruction_counter is not None and int(
instruction_counter.attrib["covered"]) > 0
branch_covered_adjustment = 1 if has_some_coverage else 0
region_counter = element.find('./counter[@type=\'BRANCH\']')
summary['regions'] = make_counter_summary(
region_counter, covered_adjustment=branch_covered_adjustment)
return summary
def make_counter_summary(counter_element, covered_adjustment=0):
"""Turns a JaCoCo <counter> element into an llvm-cov totals entry."""
summary = {}
covered = covered_adjustment
missed = 0
if counter_element is not None:
covered += int(counter_element.attrib['covered'])
missed += int(counter_element.attrib['missed'])
summary['covered'] = covered
summary['notcovered'] = missed
summary['count'] = summary['covered'] + summary['notcovered']
if summary['count'] != 0:
summary['percent'] = (100.0 * summary['covered']) / summary['count']
else:
summary['percent'] = 0
return summary
def main():
"""Produces an llvm-cov style JSON summary from a JaCoCo XML report."""
if len(sys.argv) != 3:
sys.stderr.write('Usage: %s <jacoco_xml_report> <json_summary_output>\n' %
sys.argv[0])
return 1
with open(sys.argv[1], 'r') as xml_file:
xml_report = xml_file.read()
json_summary = convert(xml_report)
with open(sys.argv[2], 'w') as json_file:
json_file.write(json_summary)
return 0
if __name__ == '__main__':
sys.exit(main())
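
The counter arithmetic in make_counter_summary and the BRANCH adjustment in make_element_summary can be tried in isolation. The following is a minimal sketch, not the converter itself: `SAMPLE_CLASS` is a hand-written, hypothetical JaCoCo fragment, and `counter_summary` and `region_summary` are illustrative names that restate the logic above.

```python
import xml.etree.ElementTree as ET

# Hypothetical JaCoCo fragment for illustration; real reports carry more data.
SAMPLE_CLASS = """
<class name="foo/Bar" sourcefilename="Bar.java">
  <counter type="INSTRUCTION" missed="2" covered="8"/>
  <counter type="LINE" missed="1" covered="3"/>
  <counter type="METHOD" missed="0" covered="1"/>
</class>
""".strip()


def counter_summary(element, counter_type, covered_adjustment=0):
  """Summarizes one JaCoCo counter in llvm-cov 'totals' form."""
  counter = element.find("./counter[@type='%s']" % counter_type)
  covered = covered_adjustment
  missed = 0
  if counter is not None:
    covered += int(counter.attrib['covered'])
    missed += int(counter.attrib['missed'])
  count = covered + missed
  percent = (100.0 * covered) / count if count != 0 else 0
  return {
      'covered': covered,
      'notcovered': missed,
      'count': count,
      'percent': percent,
  }


def region_summary(element):
  """Maps BRANCH to 'regions', adding 1 if any instruction was covered."""
  instruction = element.find("./counter[@type='INSTRUCTION']")
  has_some_coverage = (instruction is not None and
                       int(instruction.attrib['covered']) > 0)
  return counter_summary(element,
                         'BRANCH',
                         covered_adjustment=1 if has_some_coverage else 0)


element = ET.fromstring(SAMPLE_CLASS)
lines = counter_summary(element, 'LINE')
regions = region_summary(element)
```

The sample class has no BRANCH counter, so its region summary consists only of the adjustment: one covered region out of one.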
================================================
FILE: infra/base-images/base-runner/nyc_report_converter.py
================================================
#!/usr/bin/env python3
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Helper script for creating a llvm-cov style JSON summary from a nyc
JSON summary."""
import json
import sys
def convert(nyc_json_summary):
"""Turns a nyc JSON report into a llvm-cov JSON summary."""
summary = {
'type':
'oss-fuzz.javascript.coverage.json.export',
'version':
'1.0.0',
'data': [{
'totals':
file_summary(nyc_json_summary['total']),
'files': [{
'filename': src_file,
'summary': file_summary(nyc_json_summary[src_file])
} for src_file in nyc_json_summary if src_file != 'total'],
}],
}
return json.dumps(summary)
def file_summary(nyc_file_summary):
"""Returns a summary for a given file in the nyc JSON summary report."""
return {
'functions': element_summary(nyc_file_summary['functions']),
'lines': element_summary(nyc_file_summary['lines']),
'regions': element_summary(nyc_file_summary['branches'])
}
def element_summary(element):
"""Returns a summary of a coverage element in the nyc JSON summary
of the file"""
return {
'count': element['total'],
'covered': element['covered'],
'notcovered': element['total'] - element['covered'] - element['skipped'],
'percent': element['pct'] if element['pct'] != 'Unknown' else 0
}
def main():
"""Produces a llvm-cov style JSON summary from a nyc JSON summary."""
if len(sys.argv) != 3:
sys.stderr.write('Usage: %s <nyc_json_summary> <json_summary_output>\n' %
sys.argv[0])
return 1
with open(sys.argv[1], 'r') as nyc_json_summary_file:
nyc_json_summary = json.load(nyc_json_summary_file)
json_summary = convert(nyc_json_summary)
with open(sys.argv[2], 'w') as json_output_file:
json_output_file.write(json_summary)
return 0
if __name__ == '__main__':
sys.exit(main())
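
The per-element mapping in element_summary is easiest to see on concrete numbers. The sketch below restates that function and applies it to two hypothetical entries shaped like the `lines` and `branches` fields of an nyc JSON summary. The sample values are invented for illustration.

```python
def element_summary(element):
  """Maps one nyc coverage element to llvm-cov style fields."""
  return {
      'count': element['total'],
      'covered': element['covered'],
      # Skipped items count as neither covered nor uncovered.
      'notcovered': element['total'] - element['covered'] - element['skipped'],
      # nyc reports 'Unknown' when there is nothing to measure.
      'percent': element['pct'] if element['pct'] != 'Unknown' else 0,
  }


# Hypothetical sample values, not taken from a real report.
SAMPLE_LINES = {'total': 10, 'covered': 7, 'skipped': 1, 'pct': 70}
EMPTY_BRANCHES = {'total': 0, 'covered': 0, 'skipped': 0, 'pct': 'Unknown'}

lines_summary = element_summary(SAMPLE_LINES)
branches_summary = element_summary(EMPTY_BRANCHES)
```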
================================================
FILE: infra/base-images/base-runner/parse_options.py
================================================
#!/usr/bin/env python3
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Helper script for parsing custom fuzzing options."""
import configparser
import sys
def parse_options(options_file_path, options_section):
"""Parses the given file and returns options from the given section."""
parser = configparser.ConfigParser()
parser.read(options_file_path)
if not parser.has_section(options_section):
return None
options = parser[options_section]
if options_section == 'libfuzzer':
options_string = ' '.join(
'-%s=%s' % (key, value) for key, value in options.items())
else:
# Sanitizer options.
options_string = ':'.join(
'%s=%s' % (key, value) for key, value in options.items())
return options_string
def main():
"""Processes the arguments and prints the options in the correct format."""
if len(sys.argv) < 3:
sys.stderr.write('Usage: %s <options_file> <options_section>\n' %
sys.argv[0])
return 1
options = parse_options(sys.argv[1], sys.argv[2])
if options is not None:
print(options)
return 0
if __name__ == "__main__":
sys.exit(main())
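
To illustrate the two output formats, the sketch below applies the same formatting rules to an in-memory options file. `SAMPLE_OPTIONS` and `format_options` are hypothetical names used only for this example, and the text is parsed with `read_string` instead of reading a file from disk.

```python
import configparser

# Hypothetical contents of a <fuzzer>.options file.
SAMPLE_OPTIONS = """
[libfuzzer]
max_len = 1024
dict = example.dict

[asan]
detect_leaks = 0
"""


def format_options(options_text, section):
  """Formats one section as libFuzzer flags or as a sanitizer options string."""
  parser = configparser.ConfigParser()
  parser.read_string(options_text)
  if not parser.has_section(section):
    return None
  items = parser[section].items()
  if section == 'libfuzzer':
    # libFuzzer flags are space-separated and prefixed with a single dash.
    return ' '.join('-%s=%s' % (key, value) for key, value in items)
  # Sanitizer options are colon-separated key=value pairs.
  return ':'.join('%s=%s' % (key, value) for key, value in items)


libfuzzer_flags = format_options(SAMPLE_OPTIONS, 'libfuzzer')
asan_options = format_options(SAMPLE_OPTIONS, 'asan')
msan_options = format_options(SAMPLE_OPTIONS, 'msan')
```

A missing section yields `None`, which is why run_fuzzer can test the script's output for emptiness before extending `ASAN_OPTIONS` and friends.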
================================================
FILE: infra/base-images/base-runner/profraw_update.py
================================================
#!/usr/bin/env python3
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Helper script for upgrading a profraw file to latest version."""
from collections import namedtuple
import struct
import subprocess
import sys
HeaderGeneric = namedtuple('HeaderGeneric', 'magic version')
HeaderVersion9 = namedtuple(
'HeaderVersion9',
'BinaryIdsSize DataSize PaddingBytesBeforeCounters CountersSize \
PaddingBytesAfterCounters NumBitmapBytes PaddingBytesAfterBitmapBytes NamesSize CountersDelta BitmapDelta NamesDelta ValueKindLast'
)
PROFRAW_MAGIC = 0xff6c70726f667281
def relativize_address(data, offset, databegin, sect_prf_cnts, sect_prf_data):
"""Turns an absolute offset into a relative one."""
value = struct.unpack('Q', data[offset:offset + 8])[0]
if sect_prf_cnts <= value < sect_prf_data:
# If the value is an address in the right section, make it relative.
value = (value - databegin) & 0xffffffffffffffff
value = struct.pack('Q', value)
for i in range(8):
data[offset + i] = value[i]
# address was made relative
return True
# no changes done
return False
def upgrade(data, sect_prf_cnts, sect_prf_data):
"""Upgrades profraw data, given the addresses of the profile sections."""
generic_header = HeaderGeneric._make(struct.unpack('QQ', data[:16]))
if generic_header.magic != PROFRAW_MAGIC:
raise Exception('Bad magic.')
base_version = generic_header.version
if base_version >= 9:
# Nothing to do.
return data
if base_version < 5 or base_version == 6:
raise Exception('Unhandled version.')
if generic_header.version == 5:
generic_header = generic_header._replace(version=7)
# Upgrade from version 5 to 7 by adding binaryids field.
data = data[:8] + struct.pack('Q', generic_header.version) + struct.pack(
'Q', 0) + data[16:]
if generic_header.version == 7:
# cf https://reviews.llvm.org/D111123
generic_header = generic_header._replace(version=8)
data = data[:8] + struct.pack('Q', generic_header.version) + data[16:]
if generic_header.version == 8:
# see https://reviews.llvm.org/D138846
generic_header = generic_header._replace(version=9)
# Upgrade from version 8 to 9 by adding NumBitmapBytes, PaddingBytesAfterBitmapBytes and BitmapDelta fields.
data = data[:8] + struct.pack(
'Q', generic_header.version) + data[16:56] + struct.pack(
'QQ', 0, 0) + data[56:72] + struct.pack('Q', 0) + data[72:]
v9_header = HeaderVersion9._make(struct.unpack('QQQQQQQQQQQQ', data[16:112]))
if base_version <= 8 and v9_header.BinaryIdsSize % 8 != 0:
# Adds padding for binary ids.
# cf commit b9f547e8e51182d32f1912f97a3e53f4899ea6be
# cf https://reviews.llvm.org/D110365
padlen = 8 - (v9_header.BinaryIdsSize % 8)
# Insert the padding right after the existing (unpadded) binary ids.
data = data[:112 + v9_header.BinaryIdsSize] + bytes(
padlen) + data[112 + v9_header.BinaryIdsSize:]
# Record the padded size in the header and in the parsed copy used below.
v9_header = v9_header._replace(BinaryIdsSize=v9_header.BinaryIdsSize +
padlen)
data = data[:16] + struct.pack('Q', v9_header.BinaryIdsSize) + data[24:]
if base_version <= 8:
offset = 112 + v9_header.BinaryIdsSize
for d in range(v9_header.DataSize):
# Add BitmapPtr and aligned u32(NumBitmapBytes)
data = data[:offset + 3 * 8] + struct.pack(
'Q', 0) + data[offset + 3 * 8:offset + 6 * 8] + struct.pack(
'Q', 0) + data[offset + 6 * 8:]
value = struct.unpack('Q',
data[offset + 2 * 8:offset + 3 * 8])[0] - 16 * d
data = data[:offset + 2 * 8] + struct.pack('Q',
value) + data[offset + 3 * 8:]
offset += 8 * 8
if base_version >= 8:
# Nothing more to do.
return data
# The last changes are related to the bump from version 7 to version 8, which
# made CountersPtr relative.
dataref = sect_prf_data
# 80 is offset of CountersDelta.
if not relativize_address(data, 80, dataref, sect_prf_cnts, sect_prf_data):
return data
offset = 112 + v9_header.BinaryIdsSize
# This also works for C+Rust binaries compiled with
# clang-14/rust-nightly-clang-13.
for _ in range(v9_header.DataSize):
# 16 is the offset of CounterPtr in ProfrawData structure.
relativize_address(data, offset + 16, dataref, sect_prf_cnts, sect_prf_data)
# We need this because of CountersDelta -= sizeof(*SrcData);
# seen in __llvm_profile_merge_from_buffer.
dataref += 44 + 2 * (v9_header.ValueKindLast + 1)
if was8:
#profraw9 added RelativeBitmapPtr and NumBitmapBytes (8+4 rounded up to 16)
dataref -= 16
# This is the size of one ProfrawData structure.
offset += 44 + 2 * (v9_header.ValueKindLast + 1)
return data
def main():
"""Helper script for upgrading a profraw file to latest version."""
if len(sys.argv) < 3:
sys.stderr.write('Usage: %s <binary> options? <profraw>...\n' % sys.argv[0])
return 1
# First find llvm profile sections addresses in the elf, quick and dirty.
process = subprocess.Popen(['readelf', '-S', sys.argv[1]],
stdout=subprocess.PIPE)
output, _ = process.communicate()
# stderr is not captured, so detect failure through the exit status.
if process.returncode != 0:
print('readelf failed')
return 2
for line in iter(output.split(b'\n')):
if b'__llvm_prf_cnts' in line:
sect_prf_cnts = int(line.split()[3], 16)
elif b'__llvm_prf_data' in line:
sect_prf_data = int(line.split()[3], 16)
out_name = "default.profup"
in_place = False
start = 2
if sys.argv[2] == "-i":
in_place = True
start = start + 1
elif sys.argv[2] == "-o":
out_name = sys.argv[3]
start = 4
if len(sys.argv) < start:
sys.stderr.write('Usage: %s <binary> options <profraw>...\n' % sys.argv[0])
return 1
for i in range(start, len(sys.argv)):
# Then open and read the input profraw file.
with open(sys.argv[i], 'rb') as input_file:
profraw_base = bytearray(input_file.read())
# Do the upgrade, returning a bytes object.
profraw_latest = upgrade(profraw_base, sect_prf_cnts, sect_prf_data)
# Write the output to the file given to the command line.
if in_place:
out_name = sys.argv[i]
with open(out_name, 'wb') as output_file:
output_file.write(profraw_latest)
return 0
if __name__ == '__main__':
sys.exit(main())
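
The header handling in upgrade relies on the first 16 bytes of a profraw file holding two 64-bit values, the magic and the version. The sketch below shows that step on a synthetic buffer. `read_generic_header` and `bump_version` are hypothetical helper names, and the `b'payload!'` bytes stand in for the rest of a file.

```python
from collections import namedtuple
import struct

HeaderGeneric = namedtuple('HeaderGeneric', 'magic version')
PROFRAW_MAGIC = 0xff6c70726f667281


def read_generic_header(data):
  """Parses the magic and version from the first 16 bytes."""
  header = HeaderGeneric._make(struct.unpack('QQ', data[:16]))
  if header.magic != PROFRAW_MAGIC:
    raise ValueError('Bad magic.')
  return header


def bump_version(data, new_version):
  """Rewrites only the version field and leaves all other bytes untouched."""
  return data[:8] + struct.pack('Q', new_version) + data[16:]


# Synthetic buffer: a valid magic, version 7, and placeholder payload bytes.
sample = bytearray(struct.pack('QQ', PROFRAW_MAGIC, 7) + b'payload!')
upgraded = bump_version(sample, 8)
header = read_generic_header(upgraded)
```

Upgrades that add header fields follow the same slicing pattern but splice extra packed values between the slices, which is why every later offset shifts.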
================================================
FILE: infra/base-images/base-runner/python_coverage_runner_help.py
================================================
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Helper to manage coverage.py related operations. Does two main
things: (1) pass commands into the coverage.py library and (2)
translate .coverage created from a pyinstaller executable into
paths that match local files. This is needed for html report creation.
"""
import os
import re
import json
import sys
from coverage.cmdline import main as coverage_main
from coverage.data import CoverageData
def should_exclude_file(filepath):
"""Returns whether the path should be excluded from the coverage report."""
# Skip all atheris code
if "atheris" in filepath:
return True
# Filter out all standard python libraries
if ('/usr/local/lib/python' in filepath and
'site-packages' not in filepath and 'dist-packages' not in filepath):
return True
# Avoid all PyInstaller modules.
if 'PyInstaller' in filepath:
return True
return False
def translate_lines(cov_data, new_cov_data, all_file_paths):
"""
Translate lines in a .coverage file created by coverage.py such that
the file paths point to local files instead. This is needed when collecting
coverage from executables created by pyinstaller.
"""
for pyinstaller_file_path in cov_data.measured_files():
stripped_py_file_path = pyinstaller_file_path
if stripped_py_file_path.startswith('/tmp/_MEI'):
stripped_py_file_path = '/'.join(stripped_py_file_path.split('/')[3:])
if stripped_py_file_path.startswith('/out/'):
stripped_py_file_path = stripped_py_file_path.replace('/out/', '')
# Check if this file exists in our file paths:
for local_file_path in all_file_paths:
if should_exclude_file(local_file_path):
continue
if local_file_path.endswith(stripped_py_file_path):
print('Found matching: %s' % (local_file_path))
new_cov_data.add_lines(
{local_file_path: cov_data.lines(pyinstaller_file_path)})
def translate_coverage(all_file_paths):
"""
Translate pyinstaller-generated file paths in .coverage (produced by
coverage.py) into local file paths. Place result in .new_coverage.
"""
covdata_pre_translation = CoverageData('.coverage')
covdata_post_translation = CoverageData('.new_coverage')
covdata_pre_translation.read()
translate_lines(covdata_pre_translation, covdata_post_translation,
all_file_paths)
covdata_post_translation.write()
def convert_coveragepy_cov_to_summary_json(src, dst):
"""
Converts a json file produced by coveragepy into a summary.json file
similar to llvm-cov output. `src` is the source coveragepy json file,
`dst` is the destination json file, which will be overwritten.
"""
dst_dict = {'data': [{'files': {}}]}
lines_covered = 0
lines_count = 0
with open(src, "r") as src_f:
src_json = json.loads(src_f.read())
if 'files' in src_json:
for elem in src_json.get('files'):
if 'summary' not in src_json['files'][elem]:
continue
src_dict = src_json['files'][elem]['summary']
count = src_dict['covered_lines'] + src_dict['missing_lines']
covered = src_dict['covered_lines']
notcovered = src_dict['missing_lines']
percent = src_dict['percent_covered']
# Accumulate line coverage
lines_covered += covered
lines_count += count
dst_dict['data'][0]['files'][elem] = {
'summary': {
'lines': {
'count': count,
'covered': covered,
'notcovered': notcovered,
'percent': percent
}
}
}
if lines_count > 0:
lines_covered_percent = 100.0 * lines_covered / lines_count
else:
lines_covered_percent = 0.0
dst_dict['data'][0]['totals'] = {
'branches': {
'count': 0,
'covered': 0,
'notcovered': 0,
'percent': 0.0
},
'functions': {
'count': 0,
'covered': 0,
'percent': 0.0
},
'instantiations': {
'count': 0,
'covered': 0,
'percent': 0.0
},
'lines': {
'count': lines_count,
'covered': lines_covered,
'percent': lines_covered_percent
},
'regions': {
'count': 0,
'covered': 0,
'notcovered': 0,
'percent': 0.0
}
}
with open(dst, 'w') as dst_f:
dst_f.write(json.dumps(dst_dict))
def main():
"""
Main handler.
"""
if sys.argv[1] == 'translate':
print('Translating the coverage')
files_path = sys.argv[2]
all_file_paths = list()
for root, _, files in os.walk(files_path):
for relative_file_path in files:
abs_file_path = os.path.abspath(os.path.join(root, relative_file_path))
all_file_paths.append(abs_file_path)
print('Done with path walk')
translate_coverage(all_file_paths)
elif sys.argv[1] == 'convert-to-summary-json':
src = sys.argv[2]
dst = sys.argv[3]
convert_coveragepy_cov_to_summary_json(src, dst)
else:
# Pass commands into coverage package
sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
sys.exit(coverage_main())
if __name__ == '__main__':
main()
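
The path matching in translate_lines can be illustrated without a `.coverage` file. The sketch below restates the prefix stripping and the suffix comparison. `strip_pyinstaller_prefix` and `find_local_matches` are hypothetical names, and the sample paths are invented.

```python
def strip_pyinstaller_prefix(path):
  """Removes the PyInstaller extraction dir or the /out/ prefix from a path."""
  if path.startswith('/tmp/_MEI'):
    # '/tmp/_MEIxxxx/pkg/mod.py' -> 'pkg/mod.py'
    path = '/'.join(path.split('/')[3:])
  if path.startswith('/out/'):
    path = path.replace('/out/', '')
  return path


def find_local_matches(measured_path, local_paths):
  """Returns the local paths that end with the stripped measured path."""
  suffix = strip_pyinstaller_prefix(measured_path)
  return [path for path in local_paths if path.endswith(suffix)]


# Hypothetical local source tree.
LOCAL_PATHS = [
    '/src/project/mypkg/mod.py',
    '/src/project/mypkg/other.py',
]

stripped = strip_pyinstaller_prefix('/tmp/_MEI12345/mypkg/mod.py')
matches = find_local_matches('/tmp/_MEI12345/mypkg/mod.py', LOCAL_PATHS)
```

Because the comparison is a plain suffix match, two local files that share the same trailing path would both receive the coverage lines.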
================================================
FILE: infra/base-images/base-runner/rcfilt
================================================
#!/bin/bash -u
# Copyright 2020 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Symbol demangling for both C++ and Rust
#
################################################################################
# simply pipe
rustfilt | c++filt -n
================================================
FILE: infra/base-images/base-runner/reproduce
================================================
#!/bin/bash -eux
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FUZZER=$1
shift
if [ ! -v TESTCASE ]; then
TESTCASE="/testcase"
fi
if [ ! -f $TESTCASE ]; then
echo "Error: $TESTCASE not found, use: docker run -v <testcase_path>:$TESTCASE ..."
exit 1
fi
export RUN_FUZZER_MODE="interactive"
export FUZZING_ENGINE="libfuzzer"
export SKIP_SEED_CORPUS="1"
run_fuzzer $FUZZER $@ $TESTCASE
================================================
FILE: infra/base-images/base-runner/run_fuzzer
================================================
#!/bin/bash -eu
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Fuzzer runner. Appends .options arguments and seed corpus to users args.
# Usage: $0 <fuzzer_name> <fuzzer_args>
sysctl -w vm.mmap_rnd_bits=28
export PATH=$OUT:$PATH
cd $OUT
DEBUGGER=${DEBUGGER:-}
FUZZER=$1
shift
# This env var is set by CIFuzz. CIFuzz fills this directory with the corpus
# from ClusterFuzz.
CORPUS_DIR=${CORPUS_DIR:-}
if [ -z "$CORPUS_DIR" ]
then
CORPUS_DIR="/tmp/${FUZZER}_corpus"
rm -rf $CORPUS_DIR && mkdir -p $CORPUS_DIR
fi
SANITIZER=${SANITIZER:-}
if [ -z $SANITIZER ]; then
# If $SANITIZER is not specified (e.g. calling from `reproduce` command), it
# is not important and can be set to any value.
SANITIZER="default"
fi
if [[ "$RUN_FUZZER_MODE" = interactive ]]; then
FUZZER_OUT="$OUT/${FUZZER}_${FUZZING_ENGINE}_${SANITIZER}_out"
else
FUZZER_OUT="/tmp/${FUZZER}_${FUZZING_ENGINE}_${SANITIZER}_out"
fi
function get_dictionary() {
local options_file="$FUZZER.options"
local dict_file="$FUZZER.dict"
local dict=""
if [[ -f "$options_file" ]]; then
dict=$(sed -n 's/^\s*dict\s*=\s*\(.*\)/\1/p' "$options_file" | tail -1)
fi
if [[ -z "$dict" && -f "$dict_file" ]]; then
dict="$dict_file"
fi
[[ -z "$dict" ]] && return
if [[ "$FUZZING_ENGINE" = "libfuzzer" ]]; then
printf -- "-dict=%s" "$dict"
elif [[ "$FUZZING_ENGINE" = "afl" ]]; then
printf -- "-x %s" "$dict"
elif [[ "$FUZZING_ENGINE" = "honggfuzz" ]]; then
printf -- "--dict %s" "$dict"
elif [[ "$FUZZING_ENGINE" = "centipede" ]]; then
printf -- "--dictionary %s" "$dict"
else
printf "Unexpected FUZZING_ENGINE: %s, ignoring\n" "$FUZZING_ENGINE" >&2
fi
}
function get_extra_binaries() {
[[ "$FUZZING_ENGINE" != "centipede" ]] && return
extra_binaries="$OUT/__centipede_${SANITIZER}/${FUZZER}"
if compgen -G "$extra_binaries" >> /dev/null; then
printf -- "--extra_binaries %s" \""$extra_binaries\""
fi
}
rm -rf $FUZZER_OUT && mkdir -p $FUZZER_OUT
SEED_CORPUS="${FUZZER}_seed_corpus.zip"
# TODO: Investigate why this code block is skipped
# by all default fuzzers in bad_build_check.
# They all set SKIP_SEED_CORPUS=1.
if [ -f $SEED_CORPUS ] && [ -z ${SKIP_SEED_CORPUS:-} ]; then
echo "Using seed corpus: $SEED_CORPUS"
unzip -o -d ${CORPUS_DIR}/ $SEED_CORPUS > /dev/null
fi
OPTIONS_FILE="${FUZZER}.options"
CUSTOM_LIBFUZZER_OPTIONS=""
if [ -f $OPTIONS_FILE ]; then
custom_asan_options=$(parse_options.py $OPTIONS_FILE asan)
if [ ! -z $custom_asan_options ]; then
export ASAN_OPTIONS="$ASAN_OPTIONS:$custom_asan_options"
fi
custom_msan_options=$(parse_options.py $OPTIONS_FILE msan)
if [ ! -z $custom_msan_options ]; then
export MSAN_OPTIONS="$MSAN_OPTIONS:$custom_msan_options"
fi
custom_ubsan_options=$(parse_options.py $OPTIONS_FILE ubsan)
if [ ! -z $custom_ubsan_options ]; then
export UBSAN_OPTIONS="$UBSAN_OPTIONS:$custom_ubsan_options"
fi
CUSTOM_LIBFUZZER_OPTIONS=$(parse_options.py $OPTIONS_FILE libfuzzer)
fi
if [[ "$FUZZING_ENGINE" = afl ]]; then
# Set afl++ environment options.
export ASAN_OPTIONS="$ASAN_OPTIONS:abort_on_error=1:symbolize=0:detect_odr_violation=0:"
export MSAN_OPTIONS="$MSAN_OPTIONS:exit_code=86:symbolize=0"
export UBSAN_OPTIONS="$UBSAN_OPTIONS:symbolize=0"
export AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=1
export AFL_SKIP_CPUFREQ=1
export AFL_TRY_AFFINITY=1
export AFL_FAST_CAL=1
export AFL_CMPLOG_ONLY_NEW=1
export AFL_FORKSRV_INIT_TMOUT=30000
export AFL_IGNORE_PROBLEMS=1
export AFL_IGNORE_UNKNOWN_ENVS=1
# If $OUT/afl_cmplog.txt is present this means the target was compiled for
# CMPLOG. So we have to add the proper parameters to afl-fuzz.
test -e "$OUT/afl_cmplog.txt" && AFL_FUZZER_ARGS="$AFL_FUZZER_ARGS -c $OUT/$FUZZER"
# If $OUT/afl++.dict exists, load it as a dictionary for afl-fuzz.
test -e "$OUT/afl++.dict" && AFL_FUZZER_ARGS="$AFL_FUZZER_ARGS -x $OUT/afl++.dict"
# Ensure timeout is a bit larger than 1sec as some of the OSS-Fuzz fuzzers
# are slower than this.
AFL_FUZZER_ARGS="$AFL_FUZZER_ARGS -t 5000+"
# AFL expects at least 1 file in the input dir.
echo input > ${CORPUS_DIR}/input
CMD_LINE="$OUT/afl-fuzz $AFL_FUZZER_ARGS -i $CORPUS_DIR -o $FUZZER_OUT $(get_dictionary) $* -- $OUT/$FUZZER"
echo afl++ setup:
env|grep AFL_
cat "$OUT/afl_options.txt"
elif [[ "$FUZZING_ENGINE" = honggfuzz ]]; then
# Honggfuzz expects at least 1 file in the input dir.
echo input > $CORPUS_DIR/input
# --exit_upon_crash: exit upon the first crash seen
# -V: verify crashes
# -R (report): save report file to this location
# -W (working dir): where the crashes go
# -v (verbose): don't use VTE UI, just stderr
# -z: use software-instrumentation of clang (trace-pc-guard....)
# -P: use persistent mode of fuzzing (i.e. LLVMFuzzerTestOneInput)
# -f: location of the initial (and destination) file corpus
# -n: number of fuzzing threads (and processes)
CMD_LINE="$OUT/honggfuzz -n 1 --exit_upon_crash -V -R /tmp/${FUZZER}_honggfuzz.report -W $FUZZER_OUT -v -z -P -f \"$CORPUS_DIR\" $(get_dictionary) $* -- \"$OUT/$FUZZER\""
if [[ $(LC_ALL=C grep -P "\x01_LIBHFUZZ_NETDRIVER_BINARY_SIGNATURE_\x02\xFF" "$FUZZER" ) ]]; then
# Honggfuzz Netdriver port. This must match the port in Clusterfuzz.
export HFND_TCP_PORT=8666
fi
elif [[ "$FUZZING_ENGINE" = centipede ]]; then
# Create the work and corpus directory for Centipede.
CENTIPEDE_WORKDIR="${CENTIPEDE_WORKDIR:-$OUT}"
# Centipede only saves crashes to crashes/ in workdir.
rm -rf $FUZZER_OUT
# --workdir: Dir that stores corpus&features in Centipede's own format.
# --corpus_dir: Location of the initial (and destination) file corpus.
# --fork_server=1: Execute the target(s) via the fork server.
# --exit_on_crash=1: Exit on the first crash.
# --timeout=1200: The process that executes target binary will abort
# if an input runs >= 1200s.
# --rss_limit_mb=4096: Limit RSS to 4096 MB.
# --address_space_limit_mb=5120: Limit the address space to 5120 MB.
# --binary: The target binary under test without sanitizer.
# --extra_binaries: The target binaries under test with sanitizers.
CMD_LINE="$OUT/centipede --workdir=$CENTIPEDE_WORKDIR --corpus_dir=\"$CORPUS_DIR\" --fork_server=1 --exit_on_crash=1 --timeout=1200 --rss_limit_mb=4096 --address_space_limit_mb=5120 $(get_dictionary) --binary=\"$OUT/${FUZZER}\" $(get_extra_binaries) $*"
else
CMD_LINE="$OUT/$FUZZER -- $FUZZER_ARGS $*"
if [ -z "${SKIP_SEED_CORPUS:-}" ]; then
CMD_LINE="$CMD_LINE $CORPUS_DIR"
fi
if [[ ! -z ${CUSTOM_LIBFUZZER_OPTIONS} ]]; then
CMD_LINE="$CMD_LINE $CUSTOM_LIBFUZZER_OPTIONS"
fi
if [[ ! "$CMD_LINE" =~ "-dict=" ]]; then
if [ -f "$FUZZER.dict" ]; then
CMD_LINE="$CMD_LINE -dict=$FUZZER.dict"
fi
fi
CMD_LINE="$CMD_LINE < /dev/null"
fi
echo $CMD_LINE
# Unset OUT so the fuzz target can't rely on it.
unset OUT
if [ ! -z "$DEBUGGER" ]; then
CMD_LINE="$DEBUGGER $CMD_LINE"
fi
bash -c "$CMD_LINE"
================================================
FILE: infra/base-images/base-runner/ruzzy
================================================
#!/usr/bin/env bash
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
ASAN_OPTIONS="allocator_may_return_null=1:detect_leaks=0:use_sigaltstack=0" LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
ruby "$@"
================================================
FILE: infra/base-images/base-runner/targets_list
================================================
#!/bin/bash
for binary in $(find $OUT/ -executable -type f); do
[[ "$binary" != *.so ]] || continue
[[ $(basename "$binary") != jazzer_driver* ]] || continue
file "$binary" | grep -e ELF -e "shell script" > /dev/null 2>&1 || continue
grep "LLVMFuzzerTestOneInput" "$binary" > /dev/null 2>&1 || continue
basename "$binary"
done
================================================
FILE: infra/base-images/base-runner/test_all.py
================================================
#!/usr/bin/env python3
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Does bad_build_check on all fuzz targets in $OUT."""
import contextlib
import multiprocessing
import os
import re
import subprocess
import stat
import sys
import tempfile
BASE_TMP_FUZZER_DIR = '/tmp/not-out'
EXECUTABLE = stat.S_IEXEC | stat.S_IXGRP | stat.S_IXOTH
IGNORED_TARGETS = [
r'do_stuff_fuzzer', r'checksum_fuzzer', r'fuzz_dump', r'fuzz_keyring',
r'xmltest', r'fuzz_compression_sas_rle', r'ares_.*_fuzzer'
]
IGNORED_TARGETS_RE = re.compile('^' + r'$|^'.join(IGNORED_TARGETS) + '$')
def move_directory_contents(src_directory, dst_directory):
"""Moves contents of |src_directory| to |dst_directory|."""
# Use mv because mv preserves file permissions. If we don't preserve file
# permissions that can mess up CheckFuzzerBuildTest in cifuzz_test.py and
# other cases where one is calling test_all on files not in OSS-Fuzz's real
# out directory.
src_contents = [
os.path.join(src_directory, filename)
for filename in os.listdir(src_directory)
]
command = ['mv'] + src_contents + [dst_directory]
subprocess.check_call(command)
def is_elf(filepath):
"""Returns True if |filepath| is an ELF file."""
result = subprocess.run(['file', filepath],
stdout=subprocess.PIPE,
check=False)
return b'ELF' in result.stdout
def is_shell_script(filepath):
"""Returns True if |filepath| is a shell script."""
result = subprocess.run(['file', filepath],
stdout=subprocess.PIPE,
check=False)
return b'shell script' in result.stdout
def find_fuzz_targets(directory):
"""Returns paths to fuzz targets in |directory|."""
# TODO(https://github.com/google/oss-fuzz/issues/4585): Use libClusterFuzz for
# this.
fuzz_targets = []
for filename in os.listdir(directory):
path = os.path.join(directory, filename)
if filename == 'llvm-symbolizer':
continue
if filename.startswith('afl-'):
continue
if filename.startswith('jazzer_'):
continue
if not os.path.isfile(path):
continue
if not os.stat(path).st_mode & EXECUTABLE:
continue
# Fuzz targets can either be ELF binaries or shell scripts (e.g. wrapper
# scripts for Python and JVM targets or rules_fuzzing builds with runfiles
# trees).
if not is_elf(path) and not is_shell_script(path):
continue
if os.getenv('FUZZING_ENGINE') not in {'none', 'wycheproof'}:
with open(path, 'rb') as file_handle:
binary_contents = file_handle.read()
if b'LLVMFuzzerTestOneInput' not in binary_contents:
continue
fuzz_targets.append(path)
return fuzz_targets
def do_bad_build_check(fuzz_target):
"""Runs bad_build_check on |fuzz_target|. Returns a
subprocess.CompletedProcess."""
print('INFO: performing bad build checks for', fuzz_target)
if centipede_needs_auxiliaries():
print('INFO: Finding Centipede\'s auxiliary for target', fuzz_target)
auxiliary_path = find_centipede_auxiliary(fuzz_target)
print('INFO: Using auxiliary binary:', auxiliary_path)
auxiliary = [auxiliary_path]
else:
auxiliary = []
command = ['bad_build_check', fuzz_target] + auxiliary
with tempfile.TemporaryDirectory() as temp_centipede_workdir:
# Do this so that centipede doesn't fill up the disk during bad build check
env = os.environ.copy()
env['CENTIPEDE_WORKDIR'] = temp_centipede_workdir
return subprocess.run(command,
stderr=subprocess.PIPE,
stdout=subprocess.PIPE,
env=env,
check=False)
def get_broken_fuzz_targets(bad_build_results, fuzz_targets):
"""Returns a list of broken fuzz targets and their process results in
|fuzz_targets| where each item in |bad_build_results| is the result of
bad_build_check on the corresponding element in |fuzz_targets|."""
broken = []
for result, fuzz_target in zip(bad_build_results, fuzz_targets):
if result.returncode != 0:
broken.append((fuzz_target, result))
return broken
def has_ignored_targets(out_dir):
"""Returns True if |out_dir| has any fuzz targets we are supposed to ignore
bad build checks of."""
out_files = set(os.listdir(out_dir))
for filename in out_files:
if re.match(IGNORED_TARGETS_RE, filename):
return True
return False
@contextlib.contextmanager
def use_different_out_dir():
"""Context manager that moves OUT to subdirectory of BASE_TMP_FUZZER_DIR. This
is useful for catching hardcoding. Note that this sets the environment
variable OUT and therefore must be run before multiprocessing.Pool is created.
Resets OUT at the end."""
# Use a fake OUT directory to catch path hardcoding that breaks on
# ClusterFuzz.
initial_out = os.getenv('OUT')
os.makedirs(BASE_TMP_FUZZER_DIR, exist_ok=True)
# Use a random subdirectory of BASE_TMP_FUZZER_DIR to allow running multiple
# instances of test_all in parallel (useful for integration testing).
with tempfile.TemporaryDirectory(dir=BASE_TMP_FUZZER_DIR) as out:
# Set this so that run_fuzzer which is called by bad_build_check works
# properly.
os.environ['OUT'] = out
# We move the contents of the directory because we can't move the
# directory itself because it is a mount.
move_directory_contents(initial_out, out)
try:
yield out
finally:
move_directory_contents(out, initial_out)
os.environ['OUT'] = initial_out
def test_all_outside_out(allowed_broken_targets_percentage):
"""Wrapper around test_all that changes OUT and returns the result."""
with use_different_out_dir() as out:
return test_all(out, allowed_broken_targets_percentage)
def centipede_needs_auxiliaries():
"""Checks if auxiliaries are needed for Centipede."""
# Centipede always requires unsanitized binaries as the main fuzz targets,
# and separate sanitized binaries as auxiliaries.
# 1. Building sanitized binaries with helper.py (i.e., local or GitHub CI):
# Unsanitized ones will be built automatically into the same docker container.
# Script bad_build_check tests both
# a) If main fuzz targets can run with the auxiliaries, and
# b) If the auxiliaries are built with the correct sanitizers.
# 2. In Trial build and production build:
# Two kinds of binaries will be in separated buckets / docker containers.
# Script bad_build_check tests either
# a) If the unsanitized binaries can run without the sanitized ones, or
# b) If the sanitized binaries are built with the correct sanitizers.
return (os.getenv('FUZZING_ENGINE') == 'centipede' and
os.getenv('SANITIZER') != 'none' and os.getenv('HELPER') == 'True')
def find_centipede_auxiliary(main_fuzz_target_path):
"""Finds the sanitized binary path that corresponds to
|main_fuzz_target_path| for bad_build_check."""
target_dir, target_name = os.path.split(main_fuzz_target_path)
sanitized_binary_dir = os.path.join(target_dir,
f'__centipede_{os.getenv("SANITIZER")}')
sanitized_binary_path = os.path.join(sanitized_binary_dir, target_name)
if os.path.isfile(sanitized_binary_path):
return sanitized_binary_path
# Neither of the following two should ever happen, returns None to indicate
# an error.
if os.path.isdir(sanitized_binary_dir):
print('ERROR: Unable to identify Centipede\'s sanitized target '
f'{sanitized_binary_path} in {os.listdir(sanitized_binary_dir)}')
else:
print('ERROR: Unable to identify Centipede\'s sanitized target directory '
f'{sanitized_binary_dir} in {os.listdir(target_dir)}')
return None
def test_all(out, allowed_broken_targets_percentage): # pylint: disable=too-many-return-statements
"""Do bad_build_check on all fuzz targets."""
# TODO(metzman): Refactor so that we can convert test_one to python.
fuzz_targets = find_fuzz_targets(out)
if not fuzz_targets:
print('ERROR: No fuzz targets found.')
return False
if centipede_needs_auxiliaries():
for fuzz_target in fuzz_targets:
if not find_centipede_auxiliary(fuzz_target):
print(f'ERROR: Couldn\'t find auxiliary for {fuzz_target}.')
return False
pool = multiprocessing.Pool()
bad_build_results = pool.map(do_bad_build_check, fuzz_targets)
pool.close()
pool.join()
broken_targets = get_broken_fuzz_targets(bad_build_results, fuzz_targets)
broken_targets_count = len(broken_targets)
if not broken_targets_count:
return True
print('Retrying failed fuzz targets sequentially', broken_targets_count)
pool = multiprocessing.Pool(1)
retry_targets = []
for broken_target, result in broken_targets:
retry_targets.append(broken_target)
bad_build_results = pool.map(do_bad_build_check, retry_targets)
pool.close()
pool.join()
broken_targets = get_broken_fuzz_targets(bad_build_results, retry_targets)
broken_targets_count = len(broken_targets)
if not broken_targets_count:
return True
print('Broken fuzz targets', broken_targets_count)
total_targets_count = len(fuzz_targets)
broken_targets_percentage = 100 * broken_targets_count / total_targets_count
for broken_target, result in broken_targets:
print(broken_target)
# Use write because we can't print binary strings.
sys.stdout.buffer.write(result.stdout + result.stderr + b'\n')
if broken_targets_percentage > allowed_broken_targets_percentage:
print('ERROR: {broken_targets_percentage}% of fuzz targets seem to be '
'broken. See the list above for detailed information.'.format(
broken_targets_percentage=broken_targets_percentage))
if has_ignored_targets(out):
print('Build check automatically passing because of ignored targets.')
return True
return False
print('{total_targets_count} fuzzers total, {broken_targets_count} '
'seem to be broken ({broken_targets_percentage}%).'.format(
total_targets_count=total_targets_count,
broken_targets_count=broken_targets_count,
broken_targets_percentage=broken_targets_percentage))
return True
def get_allowed_broken_targets_percentage():
"""Returns the value of the environment variable
'ALLOWED_BROKEN_TARGETS_PERCENTAGE' as an int or returns a reasonable
default."""
return int(os.getenv('ALLOWED_BROKEN_TARGETS_PERCENTAGE') or '10')
def main():
"""Does bad_build_check on all fuzz targets in parallel. Returns 0 on success.
Returns 1 on failure."""
allowed_broken_targets_percentage = get_allowed_broken_targets_percentage()
if not test_all_outside_out(allowed_broken_targets_percentage):
return 1
return 0
if __name__ == '__main__':
sys.exit(main())
================================================
FILE: infra/base-images/base-runner/test_all_test.py
================================================
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Tests test_all.py"""
import unittest
from unittest import mock
import test_all
class TestTestAll(unittest.TestCase):
"""Tests for the test_all function."""
@mock.patch('test_all.find_fuzz_targets', return_value=[])
@mock.patch('builtins.print')
def test_test_all_no_fuzz_targets(self, mock_print, _):
"""Tests that test_all returns False when there are no fuzz targets."""
outdir = '/out'
allowed_broken_targets_percentage = 0
self.assertFalse(
test_all.test_all(outdir, allowed_broken_targets_percentage))
mock_print.assert_called_with('ERROR: No fuzz targets found.')
if __name__ == '__main__':
unittest.main()
================================================
FILE: infra/base-images/base-runner/test_one.py
================================================
#!/usr/bin/env python3
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""Does bad_build_check on a fuzz target in $OUT."""
import os
import sys
import test_all
def test_one(fuzz_target):
"""Does bad_build_check on one fuzz target. Returns True on success."""
with test_all.use_different_out_dir():
fuzz_target_path = os.path.join(os.environ['OUT'], fuzz_target)
result = test_all.do_bad_build_check(fuzz_target_path)
if result.returncode != 0:
sys.stdout.buffer.write(result.stdout + result.stderr + b'\n')
return False
return True
def main():
"""Does bad_build_check on one fuzz target. Returns 1 on failure, 0 on
success."""
if len(sys.argv) != 2:
print(f'Usage: {sys.argv[0]} <fuzz_target_binary>')
return 1
fuzz_target_binary = sys.argv[1]
return 0 if test_one(fuzz_target_binary) else 1
if __name__ == '__main__':
sys.exit(main())
================================================
FILE: infra/base-images/base-runner/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Build rust stuff in its own image. We only need the resulting binaries.
# Keeping the rust toolchain in the image wastes 1 GB.
FROM gcr.io/oss-fuzz-base/base-image:ubuntu-20-04 AS temp-runner-binary-builder
RUN apt-get update && apt-get install -y cargo libyaml-dev
RUN cargo install rustfilt
# Using multi-stage build to copy some LLVM binaries needed in the runner image.
FROM gcr.io/oss-fuzz-base/base-clang:ubuntu-20-04 AS base-clang
FROM gcr.io/oss-fuzz-base/base-builder-ruby:ubuntu-20-04 AS base-ruby
# The base builder image compiles a specific Python version. Using a multi-stage build
# to copy that same Python interpreter into the runner image saves build time and keeps
# the Python versions in sync.
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-20-04 AS base-builder
# Real image that will be used later.
FROM gcr.io/oss-fuzz-base/base-image:ubuntu-20-04
COPY --from=temp-runner-binary-builder /root/.cargo/bin/rustfilt /usr/local/bin
# Copy the binaries needed for code coverage and crash symbolization.
COPY --from=base-clang /usr/local/bin/llvm-cov \
/usr/local/bin/llvm-profdata \
/usr/local/bin/llvm-symbolizer \
/usr/local/bin/
# Copy the pre-compiled Python binaries and libraries
COPY --from=base-builder /usr/local/bin/python3.11 /usr/local/bin/python3.11
COPY --from=base-builder /usr/local/lib/libpython3.11.so.1.0 /usr/local/lib/libpython3.11.so.1.0
COPY --from=base-builder /usr/local/include/python3.11 /usr/local/include/python3.11
COPY --from=base-builder /usr/local/lib/python3.11 /usr/local/lib/python3.11
COPY --from=base-builder /usr/local/bin/pip3 /usr/local/bin/pip3
# Create symbolic links to ensure compatibility
RUN ldconfig && \
ln -s /usr/local/bin/python3.11 /usr/local/bin/python3 && \
ln -s /usr/local/bin/python3.11 /usr/local/bin/python
COPY install_deps_ubuntu_20_04.sh /
RUN /install_deps_ubuntu_20_04.sh && rm /install_deps_ubuntu_20_04.sh
ENV CODE_COVERAGE_SRC=/opt/code_coverage
# Pin coverage to the same as in the base builder:
# https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-builder/install_python.sh#L22
RUN git clone https://chromium.googlesource.com/chromium/src/tools/code_coverage $CODE_COVERAGE_SRC && \
cd /opt/code_coverage && \
git checkout edba4873b5e8a390e977a64c522db2df18a8b27d && \
pip3 install wheel && \
# If version "Jinja2==2.10" is in requirements.txt, bump it to a patch version that
# supports upgrading its MarkupSafe dependency to a Python 3.11 compatible release:
sed -i 's/Jinja2==2.10/Jinja2==2.10.3/' requirements.txt && \
pip3 install -r requirements.txt && \
pip3 install MarkupSafe==2.0.1 && \
pip3 install coverage==6.3.2
# Default environment options for various sanitizers.
# Note that these match the settings used in ClusterFuzz and
# shouldn't be changed unless a corresponding change is made on
# ClusterFuzz side as well.
ENV ASAN_OPTIONS="alloc_dealloc_mismatch=0:allocator_may_return_null=1:allocator_release_to_os_interval_ms=500:check_malloc_usable_size=0:detect_container_overflow=1:detect_odr_violation=0:detect_leaks=1:detect_stack_use_after_return=1:fast_unwind_on_fatal=0:handle_abort=1:handle_segv=1:handle_sigill=1:max_uar_stack_size_log=16:print_scariness=1:quarantine_size_mb=10:strict_memcmp=1:strip_path_prefix=/workspace/:symbolize=1:use_sigaltstack=1:dedup_token_length=3"
ENV MSAN_OPTIONS="print_stats=1:strip_path_prefix=/workspace/:symbolize=1:dedup_token_length=3"
ENV UBSAN_OPTIONS="print_stacktrace=1:print_summary=1:silence_unsigned_overflow=1:strip_path_prefix=/workspace/:symbolize=1:dedup_token_length=3"
ENV FUZZER_ARGS="-rss_limit_mb=2560 -timeout=25"
ENV AFL_FUZZER_ARGS="-m none"
# Set up Golang environment variables (copied from /root/.bash_profile).
ENV GOPATH=/root/go
# /root/.go/bin is for the standard Go binaries (i.e. go, gofmt, etc).
# $GOPATH/bin is for the binaries from the dependencies installed via "go get".
ENV PATH=$PATH:$GOPATH/bin
COPY gocoverage $GOPATH/gocoverage
COPY install_go.sh /
RUN /install_go.sh && rm -rf /install_go.sh /root/.go
# Install OpenJDK and trim its size by removing unused components.
# JAVA_HOME points at OpenJDK 17; JAVA_15_HOME points at OpenJDK 15.
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
ENV JAVA_15_HOME=/usr/lib/jvm/java-15-openjdk-amd64
ENV JVM_LD_LIBRARY_PATH=$JAVA_HOME/lib/server
ENV PATH=$PATH:$JAVA_HOME/bin
COPY install_java.sh /
RUN /install_java.sh && rm /install_java.sh
# Install JaCoCo for JVM coverage.
RUN wget https://repo1.maven.org/maven2/org/jacoco/org.jacoco.cli/0.8.7/org.jacoco.cli-0.8.7-nodeps.jar -O /opt/jacoco-cli.jar && \
wget https://repo1.maven.org/maven2/org/jacoco/org.jacoco.agent/0.8.7/org.jacoco.agent-0.8.7-runtime.jar -O /opt/jacoco-agent.jar && \
echo "37df187b76888101ecd745282e9cd1ad4ea508d6 /opt/jacoco-agent.jar" | shasum --check && \
echo "c1814e7bba5fd8786224b09b43c84fd6156db690 /opt/jacoco-cli.jar" | shasum --check
COPY install_javascript.sh /
RUN /install_javascript.sh && rm /install_javascript.sh
# Copy built ruby. It is up to the fuzzing harnesses
# themselves to set GEM_HOME and GEM_PATH appropriately, as this depends
# on how the harnesses are packaged.
COPY --from=base-ruby /usr/local/bin/ruby /usr/local/bin/ruby
COPY --from=base-ruby /usr/local/bin/gem /usr/local/bin/gem
COPY --from=base-ruby /usr/local/lib/ruby /usr/local/lib/ruby
COPY --from=base-ruby /usr/local/include/ruby-3.3.0 /usr/local/include/ruby-3.3.0
RUN apt-get update && apt-get install -y luarocks
# Do this last to make developing these files easier/faster due to caching.
COPY bad_build_check \
coverage \
coverage_helper \
download_corpus \
jacoco_report_converter.py \
nyc_report_converter.py \
rcfilt \
reproduce \
run_fuzzer \
parse_options.py \
generate_differential_cov_report.py \
profraw_update.py \
targets_list \
test_all.py \
test_one.py \
python_coverage_runner_help.py \
/usr/local/bin/
================================================
FILE: infra/base-images/base-runner/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
# Build rust stuff in its own image. We only need the resulting binaries.
# Keeping the rust toolchain in the image wastes 1 GB.
FROM gcr.io/oss-fuzz-base/base-image:ubuntu-24-04 AS temp-runner-binary-builder
RUN apt-get update && apt-get install -y cargo libyaml-dev
RUN cargo install rustfilt
# Using multi-stage build to copy some LLVM binaries needed in the runner image.
FROM gcr.io/oss-fuzz-base/base-clang:ubuntu-24-04 AS base-clang
FROM gcr.io/oss-fuzz-base/base-builder-ruby:ubuntu-24-04 AS base-ruby
# The base builder image compiles a specific Python version. Using a multi-stage build
# to copy that same Python interpreter into the runner image saves build time and keeps
# the Python versions in sync.
FROM gcr.io/oss-fuzz-base/base-builder:ubuntu-24-04 AS base-builder
# Real image that will be used later.
FROM gcr.io/oss-fuzz-base/base-image:ubuntu-24-04
COPY --from=temp-runner-binary-builder /root/.cargo/bin/rustfilt /usr/local/bin
# Copy the binaries needed for code coverage and crash symbolization.
COPY --from=base-clang /usr/local/bin/llvm-cov \
/usr/local/bin/llvm-profdata \
/usr/local/bin/llvm-symbolizer \
/usr/local/bin/
# Copy the pre-compiled Python binaries and libraries
COPY --from=base-builder /usr/local/bin/python3.11 /usr/local/bin/python3.11
COPY --from=base-builder /usr/local/lib/libpython3.11.so.1.0 /usr/local/lib/libpython3.11.so.1.0
COPY --from=base-builder /usr/local/include/python3.11 /usr/local/include/python3.11
COPY --from=base-builder /usr/local/lib/python3.11 /usr/local/lib/python3.11
COPY --from=base-builder /usr/local/bin/pip3 /usr/local/bin/pip3
# Create symbolic links to ensure compatibility
RUN ldconfig && \
ln -s /usr/local/bin/python3.11 /usr/local/bin/python3 && \
ln -s /usr/local/bin/python3.11 /usr/local/bin/python
COPY install_deps_ubuntu_24_04.sh /
RUN /install_deps_ubuntu_24_04.sh && rm /install_deps_ubuntu_24_04.sh
ENV CODE_COVERAGE_SRC=/opt/code_coverage
# Pin coverage to the same as in the base builder:
# https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-builder/install_python.sh#L22
RUN git clone https://chromium.googlesource.com/chromium/src/tools/code_coverage $CODE_COVERAGE_SRC && \
cd /opt/code_coverage && \
git checkout edba4873b5e8a390e977a64c522db2df18a8b27d && \
pip3 install wheel && \
# If version "Jinja2==2.10" is in requirements.txt, bump it to a patch version that
# supports upgrading its MarkupSafe dependency to a Python 3.11 compatible release:
sed -i 's/Jinja2==2.10/Jinja2==2.10.3/' requirements.txt && \
pip3 install -r requirements.txt && \
pip3 install MarkupSafe==2.0.1 && \
pip3 install coverage==6.3.2
# Default environment options for various sanitizers.
# Note that these match the settings used in ClusterFuzz and
# shouldn't be changed unless a corresponding change is made on
# ClusterFuzz side as well.
ENV ASAN_OPTIONS="alloc_dealloc_mismatch=0:allocator_may_return_null=1:allocator_release_to_os_interval_ms=500:check_malloc_usable_size=0:detect_container_overflow=1:detect_odr_violation=0:detect_leaks=1:detect_stack_use_after_return=1:fast_unwind_on_fatal=0:handle_abort=1:handle_segv=1:handle_sigill=1:max_uar_stack_size_log=16:print_scariness=1:quarantine_size_mb=10:strict_memcmp=1:strip_path_prefix=/workspace/:symbolize=1:use_sigaltstack=1:dedup_token_length=3"
ENV MSAN_OPTIONS="print_stats=1:strip_path_prefix=/workspace/:symbolize=1:dedup_token_length=3"
ENV UBSAN_OPTIONS="print_stacktrace=1:print_summary=1:silence_unsigned_overflow=1:strip_path_prefix=/workspace/:symbolize=1:dedup_token_length=3"
ENV FUZZER_ARGS="-rss_limit_mb=2560 -timeout=25"
ENV AFL_FUZZER_ARGS="-m none"
# Set up Golang environment variables (copied from /root/.bash_profile).
ENV GOPATH=/root/go
# /root/.go/bin is for the standard Go binaries (i.e. go, gofmt, etc).
# $GOPATH/bin is for the binaries from the dependencies installed via "go get".
ENV PATH=$PATH:$GOPATH/bin
COPY gocoverage $GOPATH/gocoverage
COPY install_go.sh /
RUN /install_go.sh && rm -rf /install_go.sh /root/.go
# Install OpenJDK and trim its size by removing unused components.
# JAVA_HOME points at OpenJDK 17; JAVA_15_HOME points at OpenJDK 15.
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
ENV JAVA_15_HOME=/usr/lib/jvm/java-15-openjdk-amd64
ENV JVM_LD_LIBRARY_PATH=$JAVA_HOME/lib/server
ENV PATH=$PATH:$JAVA_HOME/bin
COPY install_java.sh /
RUN /install_java.sh && rm /install_java.sh
# Install JaCoCo for JVM coverage.
RUN wget https://repo1.maven.org/maven2/org/jacoco/org.jacoco.cli/0.8.7/org.jacoco.cli-0.8.7-nodeps.jar -O /opt/jacoco-cli.jar && \
wget https://repo1.maven.org/maven2/org/jacoco/org.jacoco.agent/0.8.7/org.jacoco.agent-0.8.7-runtime.jar -O /opt/jacoco-agent.jar && \
echo "37df187b76888101ecd745282e9cd1ad4ea508d6 /opt/jacoco-agent.jar" | shasum --check && \
echo "c1814e7bba5fd8786224b09b43c84fd6156db690 /opt/jacoco-cli.jar" | shasum --check
COPY install_javascript.sh /
RUN /install_javascript.sh && rm /install_javascript.sh
# Copy built ruby. It is up to the fuzzing harnesses
# themselves to set GEM_HOME and GEM_PATH appropriately, as this depends
# on how the harnesses are packaged.
COPY --from=base-ruby /usr/local/bin/ruby /usr/local/bin/ruby
COPY --from=base-ruby /usr/local/bin/gem /usr/local/bin/gem
COPY --from=base-ruby /usr/local/lib/ruby /usr/local/lib/ruby
COPY --from=base-ruby /usr/local/include/ruby-3.3.0 /usr/local/include/ruby-3.3.0
RUN apt-get update && apt-get install -y luarocks
# Do this last to make developing these files easier/faster due to caching.
COPY bad_build_check \
coverage \
coverage_helper \
download_corpus \
jacoco_report_converter.py \
nyc_report_converter.py \
rcfilt \
reproduce \
run_fuzzer \
parse_options.py \
generate_differential_cov_report.py \
profraw_update.py \
targets_list \
test_all.py \
test_one.py \
python_coverage_runner_help.py \
/usr/local/bin/
================================================
FILE: infra/base-images/base-runner-debug/CHANGELOG.md
================================================
# Docker Image Version Changelog: oss-fuzz/base-runner-debug
## Analysis Summary
The `ubuntu-20-04` and `ubuntu-24-04` images for `oss-fuzz/base-runner-debug` were successfully built. These images are used for debugging fuzzers and contain tools like GDB and Valgrind. The Dockerfile structure was refactored to support multi-version builds by creating separate Dockerfiles for each Ubuntu version and updating the `FROM` instruction accordingly.
## Build Status
| Image Tag | Dockerfile | Status |
| --- | --- | --- |
| `oss-fuzz/base-runner-debug:ubuntu-20-04` | `ubuntu-20-04.Dockerfile` | Success |
| `oss-fuzz/base-runner-debug:ubuntu-24-04` | `ubuntu-24-04.Dockerfile` | Success |
## Package Comparison
### Key Differences (Ubuntu 20.04 vs. Ubuntu 24.04)
The `ubuntu-24-04` image includes newer versions of GDB, Valgrind, and other debugging tools.
## Dockerfile Analysis
The Dockerfiles for the two versions compare as follows:
* **Base Image:** The `FROM` instruction in each Dockerfile points to the corresponding `oss-fuzz/base-runner` tag (`ubuntu-20-04` or `ubuntu-24-04`).
* **GDB Installation:** Both versions download and build GDB from source.
* **Refactoring:** The original `Dockerfile` was split into `ubuntu-20-04.Dockerfile` and `ubuntu-24-04.Dockerfile` to support the multi-version build strategy.
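
As an illustration of the multi-version layout described above, the following is a minimal sketch of how the two images might be built from this directory. The `build_commands` helper is hypothetical (it is not part of the repository), and it only prints the `docker build` invocations instead of running them. The image names are taken from the Build Status table, and the build context (`.`) is assumed to be this directory.

```shell
#!/bin/bash
# Hypothetical helper, not part of OSS-Fuzz: prints one `docker build`
# command per Ubuntu version so the sketch is safe to run anywhere.
build_commands() {
  # Image name as listed in the Build Status table above.
  local image="oss-fuzz/base-runner-debug"
  local version
  for version in ubuntu-20-04 ubuntu-24-04; do
    # Each version has its own Dockerfile and gets a matching tag.
    echo "docker build -f ${version}.Dockerfile -t ${image}:${version} ."
  done
}

build_commands
```

To actually build the images under these assumptions, the printed lines could be piped to a shell, for example `build_commands | bash`.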
================================================
FILE: infra/base-images/base-runner-debug/Dockerfile
================================================
# Copyright 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-runner
RUN apt-get update && apt-get install -y valgrind zip
# Installing GDB 12, re https://github.com/google/oss-fuzz/issues/7513.
RUN apt-get install -y build-essential libgmp-dev && \
wget https://ftp.gnu.org/gnu/gdb/gdb-12.1.tar.xz && \
tar -xf gdb-12.1.tar.xz && cd gdb-12.1 && ./configure && \
make -j $(expr $(nproc) / 2) && make install && cd .. && \
rm -rf gdb-12.1* && apt-get remove --purge -y build-essential libgmp-dev
================================================
FILE: infra/base-images/base-runner-debug/ubuntu-20-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-runner:ubuntu-20-04
RUN apt-get update && apt-get install -y valgrind zip
# Installing GDB 12, re https://github.com/google/oss-fuzz/issues/7513.
RUN apt-get install -y build-essential libgmp-dev && \
wget https://ftp.gnu.org/gnu/gdb/gdb-12.1.tar.xz && \
tar -xf gdb-12.1.tar.xz && cd gdb-12.1 && ./configure && \
make -j $(expr $(nproc) / 2) && make install && cd .. && \
rm -rf gdb-12.1* && apt-get remove --purge -y build-essential libgmp-dev
================================================
FILE: infra/base-images/base-runner-debug/ubuntu-24-04.Dockerfile
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-runner:ubuntu-24-04
RUN apt-get update && apt-get install -y valgrind zip
# Installing GDB 12, re https://github.com/google/oss-fuzz/issues/7513.
RUN apt-get install -y build-essential libgmp-dev && \
wget https://ftp.gnu.org/gnu/gdb/gdb-12.1.tar.xz && \
tar -xf gdb-12.1.tar.xz && cd gdb-12.1 && ./configure && \
make -j $(expr $(nproc) / 2) && make install && cd .. && \
rm -rf gdb-12.1* && apt-get remove --purge -y build-essential libgmp-dev
================================================
FILE: infra/base-images/list_images.py
================================================
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
"""
Helper script to print the official list of base images.
This script serves as the single source of truth for shell scripts,
avoiding logic duplication.
"""
import os
import sys
# Add the path to the `functions` directory to import the `base_images` module.
FUNCTIONS_DIR = os.path.abspath(
    os.path.join(os.path.dirname(__file__), '..', 'build', 'functions'))
sys.path.append(FUNCTIONS_DIR)
import base_images
for image_config in base_images.BASE_IMAGE_DEFS:
  # Exclude 'base-clang-full' as it is a special case not intended for
  # the general build script.
  if image_config.get('name', '') != 'base-clang-full':
    print(image_config.get('name', ''))
================================================
FILE: infra/bisector.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Uses bisection to determine which commit a bug was introduced and fixed.
This module takes a high and a low commit SHA, a repo name, and a bug.
The module bisects the high and low commit SHA searching for the location
where the bug was introduced. It also looks for where the bug was fixed.
NOTE: Needs to be run from root of the OSS-Fuzz source checkout.
Typical usage example:
  python3 infra/bisector.py
    --type regressed
    --old_commit 1e403e9259a1abedf108ab86f711ba52c907226d
    --new_commit f79be4f2330f4b89ea2f42e1c44ca998c59a0c0f
    --fuzz_target rules_fuzzer
    --project_name yara
    --test_case_path infra/yara_testcase
    --sanitizer address
"""
import argparse
import collections
import logging
import os
import sys
import tempfile
import build_specified_commit
import helper
import repo_manager
import utils
Result = collections.namedtuple('Result', ['repo_url', 'commit'])
START_MARKERS = [
'==ERROR',
'==WARNING',
]
END_MARKERS = [
'SUMMARY:',
]
DEDUP_TOKEN_MARKER = 'DEDUP_TOKEN:'
class BisectError(Exception):
  """Bisection error."""
  def __init__(self, message, repo_url):
    super().__init__(message)
    self.repo_url = repo_url
def main():
  """Finds the commit SHA where an error was initially introduced."""
  logging.getLogger().setLevel(logging.INFO)
  utils.chdir_to_root()
  parser = argparse.ArgumentParser(
      description='git bisection for finding introduction of bugs')
  parser.add_argument('--project_name',
                      help='The name of the project where the bug occurred.',
                      required=True)
  parser.add_argument('--new_commit',
                      help='The newest commit SHA to be bisected.',
                      required=True)
  parser.add_argument('--old_commit',
                      help='The oldest commit SHA to be bisected.',
                      required=True)
  parser.add_argument('--fuzz_target',
                      help='The name of the fuzzer to be built.',
                      required=True)
  parser.add_argument('--test_case_path',
                      help='The path to test case.',
                      required=True)
  parser.add_argument('--engine',
                      help='The default is "libfuzzer".',
                      default='libfuzzer')
  parser.add_argument('--sanitizer',
                      default='address',
                      help='The default is "address".')
  parser.add_argument('--type',
                      choices=['regressed', 'fixed'],
                      help='The bisection type.',
                      required=True)
  parser.add_argument('--architecture', default='x86_64')
  args = parser.parse_args()
  build_data = build_specified_commit.BuildData(project_name=args.project_name,
                                                engine=args.engine,
                                                sanitizer=args.sanitizer,
                                                architecture=args.architecture)
  result = bisect(args.type, args.old_commit, args.new_commit,
                  args.test_case_path, args.fuzz_target, build_data)
  if not result.commit:
    logging.error('No error was found in commit range %s:%s', args.old_commit,
                  args.new_commit)
    return 1
  if result.commit == args.old_commit:
    # Note the trailing space after "in": the two literals are concatenated.
    logging.error(
        'Bisection Error: Both the first and the last commits in '
        'the given range have the same behavior, bisection is not possible.')
    return 1
  if args.type == 'regressed':
    print('Error was introduced at commit %s' % result.commit)
  elif args.type == 'fixed':
    print('Error was fixed at commit %s' % result.commit)
  return 0
def _get_dedup_token(output):
  """Get dedup token."""
  for line in output.splitlines():
    token_location = line.find(DEDUP_TOKEN_MARKER)
    if token_location == -1:
      continue
    return line[token_location + len(DEDUP_TOKEN_MARKER):].strip()
  return None
def _check_for_crash(project_name, fuzz_target, testcase_path):
  """Runs the testcase and returns the crash's dedup token.
  Returns None if the run failed, if no sanitizer report was found, or if
  the report contains no dedup token.
  """
  def docker_run(args, **kwargs):
    del kwargs
    command = ['docker', 'run', '--rm', '--privileged']
    if sys.stdin.isatty():
      command.append('-i')
    return utils.execute(command + args)
  logging.info('Checking for crash')
  out, err, return_code = helper.reproduce_impl(
      project=helper.Project(project_name),
      fuzzer_name=fuzz_target,
      valgrind=False,
      env_to_add=[],
      fuzzer_args=[],
      testcase_path=testcase_path,
      run_function=docker_run,
      err_result=(None, None, None))
  if return_code is None:
    return None
  logging.info('stdout =\n%s', out)
  logging.info('stderr =\n%s', err)
  # A crash report must contain both a start marker and an end marker.
  # pylint: disable=unsupported-membership-test
  has_start_marker = any(
      marker in out or marker in err for marker in START_MARKERS)
  has_end_marker = any(marker in out or marker in err for marker in END_MARKERS)
  if not has_start_marker or not has_end_marker:
    return None
  return _get_dedup_token(out + err)
# pylint: disable=too-many-locals
# pylint: disable=too-many-arguments
# pylint: disable=too-many-statements
def _bisect(bisect_type, old_commit, new_commit, testcase_path, fuzz_target,
            build_data):
  """Perform the bisect."""
  # pylint: disable=too-many-branches
  base_builder_repo = build_specified_commit.load_base_builder_repo()
  with tempfile.TemporaryDirectory() as tmp_dir:
    repo_url, repo_path = build_specified_commit.detect_main_repo(
        build_data.project_name, commit=new_commit)
    if not repo_url or not repo_path:
      raise ValueError('Main git repo can not be determined.')
    if old_commit == new_commit:
      raise BisectError('old_commit is the same as new_commit', repo_url)
    # Copy /src from the built Docker container to ensure all dependencies
    # exist. This will be mounted when running them.
    host_src_dir = build_specified_commit.copy_src_from_docker(
        build_data.project_name, tmp_dir)
    bisect_repo_manager = repo_manager.RepoManager(
        os.path.join(host_src_dir, os.path.basename(repo_path)))
    bisect_repo_manager.fetch_all_remotes()
    commit_list = bisect_repo_manager.get_commit_list(new_commit, old_commit)
    old_idx = len(commit_list) - 1
    new_idx = 0
    logging.info('Testing against new_commit (%s)', commit_list[new_idx])
    if not build_specified_commit.build_fuzzers_from_commit(
        commit_list[new_idx],
        bisect_repo_manager,
        host_src_dir,
        build_data,
        base_builder_repo=base_builder_repo):
      raise BisectError('Failed to build new_commit', repo_url)
    if bisect_type == 'fixed':
      should_crash = False
    elif bisect_type == 'regressed':
      should_crash = True
    else:
      raise BisectError('Invalid bisect type ' + bisect_type, repo_url)
    expected_error = _check_for_crash(build_data.project_name, fuzz_target,
                                      testcase_path)
    logging.info('new_commit result = %s', expected_error)
    if not should_crash and expected_error:
      logging.warning('new_commit crashed but shouldn\'t have. '
                      'Continuing to see if stack changes.')
    range_valid = False
    for _ in range(2):
      logging.info('Testing against old_commit (%s)', commit_list[old_idx])
      if not build_specified_commit.build_fuzzers_from_commit(
          commit_list[old_idx],
          bisect_repo_manager,
          host_src_dir,
          build_data,
          base_builder_repo=base_builder_repo):
        raise BisectError('Failed to build old_commit', repo_url)
      if _check_for_crash(build_data.project_name, fuzz_target,
                          testcase_path) == expected_error:
        logging.warning('old_commit %s had same result as new_commit %s',
                        old_commit, new_commit)
        # Try again on a slightly older commit.
        old_commit = bisect_repo_manager.get_parent(old_commit, 64)
        if not old_commit:
          break
        commit_list = bisect_repo_manager.get_commit_list(
            new_commit, old_commit)
        old_idx = len(commit_list) - 1
        continue
      range_valid = True
      break
    if not range_valid:
      raise BisectError('old_commit had same result as new_commit', repo_url)
    # Binary search: commit_list is ordered newest (index 0) to oldest.
    while old_idx - new_idx > 1:
      curr_idx = (old_idx + new_idx) // 2
      logging.info('Testing against %s (idx=%d)', commit_list[curr_idx],
                   curr_idx)
      if not build_specified_commit.build_fuzzers_from_commit(
          commit_list[curr_idx],
          bisect_repo_manager,
          host_src_dir,
          build_data,
          base_builder_repo=base_builder_repo):
        # Treat build failures as if we couldn't repro.
        # TODO(ochang): retry nearby commits?
        old_idx = curr_idx
        continue
      current_error = _check_for_crash(build_data.project_name, fuzz_target,
                                       testcase_path)
      logging.info('Current result = %s', current_error)
      if expected_error == current_error:
        new_idx = curr_idx
      else:
        old_idx = curr_idx
    return Result(repo_url, commit_list[new_idx])
# pylint: disable=too-many-locals
# pylint: disable=too-many-arguments
def bisect(bisect_type, old_commit, new_commit, testcase_path, fuzz_target,
           build_data):
  """From a commit range, this function calculates which commit introduced
  (or fixed) a specific error from a fuzz testcase_path.
  Args:
    bisect_type: The type of the bisect ('regressed' or 'fixed').
    old_commit: The oldest commit in the error regression range.
    new_commit: The newest commit in the error regression range.
    testcase_path: The file path of the test case that triggers the error.
    fuzz_target: The name of the fuzzer to be tested.
    build_data: a class holding all of the input parameters for bisection.
  Returns:
    A Result namedtuple (repo_url, commit), where commit is the SHA found by
    the bisection.
  Raises:
    ValueError: when a repo url can't be determined from the project.
    BisectError: when a commit fails to build or the range is not bisectable.
  """
  try:
    return _bisect(bisect_type, old_commit, new_commit, testcase_path,
                   fuzz_target, build_data)
  finally:
    # Clean up projects/ as _bisect may have modified it.
    oss_fuzz_repo_manager = repo_manager.RepoManager(helper.OSS_FUZZ_DIR)
    oss_fuzz_repo_manager.git(['reset', 'projects'])
    oss_fuzz_repo_manager.git(['checkout', 'projects'])
    oss_fuzz_repo_manager.git(['clean', '-fxd', 'projects'])
if __name__ == '__main__':
  main()
================================================
FILE: infra/bisector_test.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Test the functionality of bisection module:
1) Test a known case where an error appears in a regression range.
2) Bisect can handle incorrect inputs.
IMPORTANT: This test needs to be run with root privileges.
"""
import os
import unittest
import bisector
import build_specified_commit
import test_repos
# Necessary because __file__ changes with os.chdir
TEST_DIR_PATH = os.path.dirname(os.path.realpath(__file__))
@unittest.skip('Test is too long to be run with presubmit.')
class BisectIntegrationTests(unittest.TestCase):
  """Class to test the functionality of bisection method."""
  BISECT_TYPE = 'regressed'
  def test_bisect_invalid_repo(self):
    """Test the bisection method on a project that does not exist."""
    test_repo = test_repos.INVALID_REPO
    build_data = build_specified_commit.BuildData(
        project_name=test_repo.project_name,
        engine='libfuzzer',
        sanitizer='address',
        architecture='x86_64')
    with self.assertRaises(ValueError):
      bisector.bisect(self.BISECT_TYPE, test_repo.old_commit,
                      test_repo.new_commit, test_repo.testcase_path,
                      test_repo.fuzz_target, build_data)
  def test_bisect(self):
    """Test the bisect method on example projects."""
    for test_repo in test_repos.TEST_REPOS:
      if test_repo.new_commit:
        build_data = build_specified_commit.BuildData(
            project_name=test_repo.project_name,
            engine='libfuzzer',
            sanitizer='address',
            architecture='x86_64')
        result = bisector.bisect(self.BISECT_TYPE, test_repo.old_commit,
                                 test_repo.new_commit, test_repo.testcase_path,
                                 test_repo.fuzz_target, build_data)
        self.assertEqual(result.commit, test_repo.intro_commit)
if __name__ == '__main__':
  # Change to oss-fuzz main directory so helper.py runs correctly.
  if os.getcwd() != os.path.dirname(TEST_DIR_PATH):
    os.chdir(os.path.dirname(TEST_DIR_PATH))
  unittest.main()
================================================
FILE: infra/build/blog/.gitignore
================================================
oss-fuzz-blog
hugo-coder
================================================
FILE: infra/build/blog/Dockerfile
================================================
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM python:3.11-bullseye
# Allow statements and log messages to immediately appear in the Knative logs
ENV PYTHONUNBUFFERED True
RUN mkdir -p hugo-bin && \
cd hugo-bin && \
wget https://github.com/gohugoio/hugo/releases/download/v0.126.1/hugo_extended_0.126.1_linux-amd64.tar.gz && \
tar -xzf hugo_extended_0.126.1_linux-amd64.tar.gz
ENV PATH="${PATH}:/hugo-bin/"
RUN git clone https://github.com/luizdepra/hugo-coder hugo-coder && \
cd hugo-coder && \
git checkout 759cc945636473d251a28597e2007cbb7d11631d # 17th May 2024
COPY content /content
COPY hugo.toml /hugo.toml
COPY build_blog.sh /build_blog.sh
RUN /build_blog.sh
CMD exec python3 -m http.server 8011 -d /oss-fuzz-blog/page/public
================================================
FILE: infra/build/blog/build_blog.sh
================================================
#!/bin/bash -eux
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
BASE=$PWD
if [ -d "${BASE}/hugo-coder" ]
then
echo "Local version of hugo exists. Using this."
else
# When writing the blog outside of Docker we clone the hugo-coder theme here.
git clone https://github.com/luizdepra/hugo-coder hugo-coder
cd hugo-coder
git checkout 759cc945636473d251a28597e2007cbb7d11631d # 17th May 2024
cd ../
fi
# Build the site
if [ -d "${BASE}/oss-fuzz-blog" ]
then
rm -rf ${BASE}/oss-fuzz-blog
fi
mkdir oss-fuzz-blog
cd oss-fuzz-blog
hugo new site page
cd page
git init
# Copy over our content
cp -rf ${BASE}/hugo-coder themes/hugo-coder
cp $BASE/hugo.toml .
rm -rf ./content
cp -rf $BASE/content .
# Build the site
hugo -D
# Uncomment the following to launch site automatically
#python3 -m http.server 8011 -d ./oss-fuzz-blog/page/public
================================================
FILE: infra/build/blog/content/about.md
================================================
+++
title = "About"
description = "OSS-Fuzz's blog"
date = "2024-05-20"
aliases = ["about-us", "about-oss-fuzz", "contact"]
author = "OSS-Fuzz maintainers"
+++
This is a blog for updates, research and initiatives of the OSS-Fuzz project.
OSS-Fuzz is an open source fuzzing framework focused on large scale fuzzing
of open source projects. The efforts described in this blog focus on this
domain and include both feature updates to OSS-Fuzz itself and insights
into research and development efforts of OSS-Fuzz.
================================================
FILE: infra/build/blog/content/posts/introducing-java-auto-harnessing.md
================================================
+++
authors = ["OSS-Fuzz Maintainers"]
title = "Introducing Java fuzz harness synthesis using LLMs"
date = "2024-09-05"
description = "Introducing LLM-based harness generation for Java OSS-Fuzz projects."
categories = [
"Fuzzing",
"Fuzzing synthesis",
"LLM",
"Automated fuzzing",
"Java",
"Java automatic fuzzing",
]
+++
# Introduction
The primary objective of OSS-Fuzz-gen is to automate the fuzzing process for open-source software.
In our previous blog posts ([1](https://security.googleblog.com/2023/08/ai-powered-fuzzing-breaking-bug-hunting.html),[2](https://blog.oss-fuzz.com/posts/introducing-llm-based-harness-synthesis-for-unfuzzed-projects/)), we've demonstrated promising results using large language models (LLMs) to enhance existing C/C++ OSS-Fuzz projects and explored the potential of leveraging LLMs for initial OSS-Fuzz integrations.
In this blog post, we explore how we can extend this work to another language (Java), and the unique challenges we encountered while building Java-specific capabilities into our existing OSS-Fuzz-gen workflow:
1. Extracting program analysis data from Java projects.
2. Generating LLM prompts based on program analysis of the targeted Java projects.
# Java fuzz harness sample and outline
To illustrate the typical structure of a Java fuzz harness, consider the following
example targeting the [Jettison](https://github.com/google/oss-fuzz/tree/master/projects/jettison) project, specifically the constructor of the
`MappedXMLStreamReader` class. This constructor requires a `JSONObject` as an argument,
which the harness instantiates using fuzz data provided by the `FuzzedDataProvider`
object. Moreover, since `MappedXMLStreamReader` is a resource class implementing the
`AutoCloseable` interface, the harness must invoke the close method on the instantiated
object to prevent memory exhaustion. Failure to do so would result in memory
leaks during each fuzz iteration.
The harness also needs to handle `JSONException` and `XMLStreamException` exceptions,
as these are valid exceptions that the target class may throw. If these exceptions
are not appropriately caught, they would be incorrectly reported as issues, leading
to false positives. It is also important to note that the methods targeted by the
harness are publicly accessible, as otherwise the harness wouldn’t build successfully.
```java
import com.code_intelligence.jazzer.api.FuzzedDataProvider;
import org.codehaus.jettison.mapped.MappedXMLStreamReader;
import org.codehaus.jettison.json.JSONObject;
import org.codehaus.jettison.json.JSONException;
import javax.xml.stream.XMLStreamException;

public class JsonFuzzer {
  public static void fuzzerInitialize() {
  }

  public static void fuzzerTearDown() {
  }

  public static void fuzzerTestOneInput(FuzzedDataProvider data) {
    try {
      JSONObject jsonObject = new JSONObject();
      for (int i = 0; i < 10; ++i) {
        try {
          jsonObject.put(data.consumeString(10), data.consumeString(10));
        } catch (JSONException e) {
          // handle exception
        }
      }
      MappedXMLStreamReader reader = new MappedXMLStreamReader(jsonObject);
      reader.close();
    } catch (JSONException | XMLStreamException e) {
      // handle exception
    }
  }
}
```
These aspects are central to the generation of Java fuzzing harnesses. Although
we may see similar code structures in a C++ harness, one observation from our
Java fuzzing automation efforts is that, without contextual information about
the aforementioned parts, LLMs are likely to generate harnesses that fail to
build or that produce false positives. To this end, a significant part of
enabling Java fuzzing harness synthesis by way of LLMs has been to provide
enough context to the LLM so it is aware of these constraints.
# Challenges faced integrating Java into OSS-Fuzz-gen
The above example highlights several common characteristics of Java harnesses,
and throughout our efforts we identified the need for specific handling of these
within our prompt. This includes specific consideration of the following attributes:
## 1. Object creation and constructors
Fuzzing Java targets almost always requires creating and managing objects, and this involves calling constructors, managing object lifecycles, and ensuring objects are in valid states before invoking methods. Because precise object management is crucial, it is important to provide context about object creation to LLMs so they can generate fuzzing harnesses that use the correct constructors or static methods.
For example, our auto-generation capabilities support generating harnesses targeting both static methods and object instance methods. Whenever the target is an instance method, we provide details about the constructors associated with the given class, along with descriptions of the types of the constructor arguments. This constructor section, which we add to the LLM prompt, lists the constructors and methods in the target code that create and initialize objects of the type the target method belongs to, together with guidelines for using them.
A sample of this section is shown below, where the goal of the constructor section is to provide context for the LLM on how to instantiate a `DiffRowGenerator` object.
```sh
DiffRowGenerator.Builder.build()
You MUST call the STATIC method DiffRowGenerator.create() to retrieve an instance of DiffRowGenerator.Builder before invoking DiffRowGenerator.Builder.build() to generate a com.github.difflib.text.DiffRowGenerator instance.
```
For the full prompt and the harness generated by the prompt, please see the following [Gist](https://gist.github.com/DavidKorczynski/d16bf21a433931d6c8be9f5a4048f48e).
## 2. Exception handling
Exceptions are prevalent in Java. Although they are also prevalent in C++, we found a need for additional exception handling when auto-generating Java harnesses. We anticipate that this is, to some extent, because Java fuzzing often revolves around harnesses that are meant to flag any uncaught exceptions in the target code, whereas the predominant goal of C++ fuzzing is to capture memory corruption issues. To this end, we added a specific guide on which exceptions a Java harness needs to catch, as displayed in the prompt snippet below:
```sh
The tag contains a list of exceptions thrown by the target method that you MUST catch.
...
jakarta.mail.internet.AddressException
```
To extract the exceptions that a harness should catch, we rely on reachability analysis from Fuzz Introspector, which extracts the exceptions a given function can explicitly throw. The primary objective of including this section is to minimize the number of false positives arising from expected exceptions and to catch all checked exceptions, thereby preventing compilation errors in the generated harness.
## 3. Resource object closing
In Java fuzzing we must manage and close resources for classes that implement `AutoCloseable`
to prevent memory leaks and resource exhaustion. Java relies on the garbage collector
for memory management, so a harness generally doesn’t need to worry about out-of-memory
issues or memory leaks. However, for classes that implement the `AutoCloseable` interface,
such as file streams, network connections, or database handles, the garbage collector won’t
free up their allocated memory. To avoid memory leaks and out-of-memory issues, we therefore
need to tell the LLM whenever `AutoCloseable` objects are involved and give it guidance
for closing the objects correctly. To address this, we add to the LLM prompt general guidance
on the need for closing `AutoCloseable` objects, as well as specific guidance whenever
we encounter objects that implement this interface, as shown by the prompt snippet below:
```sh
...
You MUST invoke the close method of the org.codehaus.jettison.mapped.MappedXMLStreamReader objects in the finally block after the target method is invoked.
You MUST invoke the close method of any resource class objects that implements the java.lang.AutoCloseable interface in the finally block after the target method is invoked.
...
```
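As a rough illustration of how such guidance could be assembled, the following Python sketch emits one specific sentence per class that implements `AutoCloseable`, followed by the general sentence. The function name `close_guidance` and the `class_interfaces` mapping are hypothetical and only serve this example; they are not taken from OSS-Fuzz-gen. The sentence templates are copied from the prompt snippet above.

```python
AUTO_CLOSEABLE = 'java.lang.AutoCloseable'

# General guidance, quoted from the prompt snippet above.
GENERAL_GUIDANCE = (
    'You MUST invoke the close method of any resource class objects that '
    'implements the java.lang.AutoCloseable interface in the finally block '
    'after the target method is invoked.')


def close_guidance(class_interfaces: dict[str, set[str]]) -> list[str]:
  """Builds close() guidance lines for a prompt (hypothetical helper).

  class_interfaces maps a fully qualified class name to the set of
  interfaces it implements; this input shape is an assumption.
  """
  lines = []
  for class_name, interfaces in sorted(class_interfaces.items()):
    if AUTO_CLOSEABLE in interfaces:
      lines.append(
          f'You MUST invoke the close method of the {class_name} objects '
          'in the finally block after the target method is invoked.')
  # The general guidance is always included, after the specific lines.
  lines.append(GENERAL_GUIDANCE)
  return lines


example = {
    'org.codehaus.jettison.mapped.MappedXMLStreamReader': {AUTO_CLOSEABLE},
    'org.codehaus.jettison.json.JSONObject': set(),
}
for line in close_guidance(example):
  print(line)
```

Only classes that implement the interface receive a specific line, which keeps the prompt short for targets that hold no closeable resources.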
## 4. Choosing suitable targets
A central theme when auto-generating fuzzing harnesses is to identify entry points in the target code that are relevant fuzz targets. In general, OSS-Fuzz-gen does this by identifying target functions that exhibit a lot of complexity but have zero or low code coverage from the existing OSS-Fuzz harnesses. This works well for identifying targets that, if fuzzed correctly, will yield a lot of code coverage. The idea translates well to Java, in that we are interested in fuzzing targets that are high in the function call tree of the target codebase.
However, we found that we need additional filtering mechanisms when choosing target method candidates, due to the language features of Java such as polymorphism, method scope and more. In addition to this, because Java targets often have several thousand methods that are potential candidates, we found a stronger need for more carefully choosing which candidates may be viable targets. For example, in addition to the existing candidate choosing mechanisms we have in OSS-Fuzz-gen, we added filtering logic for Java methods that only includes methods if they:
- Are publicly accessible.
- Are not part of the JVM library.
- Are not part of an enum class.
- Are not called by any existing fuzzing harnesses.
- Are not part of any exception or testing class and do not contain the words “test”, “exception” or “error” in the function name.
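The filtering rules above can be sketched as a simple predicate. This is a minimal Python illustration, not OSS-Fuzz-gen's actual implementation: the `MethodInfo` record, its field names, and the package prefixes used to approximate "part of the JVM library" are all assumptions made for this example. The word check is applied to the class and method name together, as an approximation of the last rule.

```python
from dataclasses import dataclass


# Hypothetical record describing a candidate method; the field names are
# assumptions for this sketch, not OSS-Fuzz-gen's real data model.
@dataclass
class MethodInfo:
  class_name: str
  method_name: str
  is_public: bool
  is_enum_class: bool
  reached_by_existing_harness: bool


# Assumed package prefixes used to approximate "part of the JVM library".
JVM_PREFIXES = ('java.', 'javax.', 'jdk.', 'sun.')
EXCLUDED_WORDS = ('test', 'exception', 'error')


def is_candidate(method: MethodInfo) -> bool:
  """Returns True if the method passes all of the filters listed above."""
  if not method.is_public:
    return False
  if method.class_name.startswith(JVM_PREFIXES):
    return False
  if method.is_enum_class:
    return False
  if method.reached_by_existing_harness:
    return False
  # Approximate the exception/testing-class rule with a substring check on
  # the fully qualified method name.
  full_name = f'{method.class_name}.{method.method_name}'.lower()
  return not any(word in full_name for word in EXCLUDED_WORDS)


candidates = [
    MethodInfo('com.github.difflib.DiffUtils', 'diff', True, False, False),
    MethodInfo('com.example.ParserTest', 'run', True, False, False),
    MethodInfo('java.util.ArrayList', 'add', True, False, False),
]
print([m.method_name for m in candidates if is_candidate(m)])
```

With these sample inputs only `diff` survives: the second entry is dropped by the word filter and the third by the JVM prefix check.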
## 5. Random objects and primitive data
Java harnesses often have to generate complex types as input to the target methods. These types are often created either through a sequence of different object instantiations or by using helper methods provided by helper classes of the Jazzer fuzzing framework. We found the need to provide further guidance on how to instantiate the arguments of a given function, as well as guidance on generating simple types such as strings seeded with fuzz data. To this end, we include in each prompt a section on how to instantiate the arguments of a target method, as shown by the snippet below:
```sh
1. Argument #0 requires a java.util.List instance with a generic type of String. You MUST create an empty java.util.List instance, then fill the list with multiple DIFFERENT String objects generated by FuzzedDataProvider::consumeString(int) or FuzzedDataProvider::consumeAsciiString(int) or FuzzedDataProvider::consumeRemainingAsString() or FuzzedDataProvider::consumeRemainingAsAsciiString() or FuzzedDataProvider::pickValue(String[]) methods.
2. Argument #1 requires a com.github.difflib.patch.Patch instance with a generic type of String. You MUST create two empty java.util.List instance, then fill the two lists with multiple DIFFERENT String objects generated by FuzzedDataProvider::consumeString(int) or FuzzedDataProvider::consumeAsciiString(int) or FuzzedDataProvider::consumeRemainingAsString() or FuzzedDataProvider::consumeRemainingAsAsciiString() or FuzzedDataProvider::pickValue(String[]) methods. After the two lists creation, use these newly created lists to invoke the STATIC method com.github.difflib.DiffUtils.diff(java.util.List,java.util.List) to generate a com.github.difflib.patch.Patch instance with generic type of String.
```
The section outlines both how to create primitive types using the `FuzzedDataProvider` exposed by the fuzzing engine, as well as guidelines on how to create higher-level types such as the `difflib.patch.Patch` as shown in the second argument in the above snippet.
## 6. General Java fuzzing requirements
There are several fuzzing engines for Java, such as JQF, Jazzer, and JavaFuzz, each with its own unique structure and methodology, unlike the more standardized engines used for C/C++ fuzzing. Currently, OSS-Fuzz supports Java fuzzing exclusively through the Jazzer engine, so it is essential for OSS-Fuzz-gen to provide guidelines that enable LLMs to generate harnesses following Jazzer's specific structure for direct use in OSS-Fuzz. We found a need to guide the LLM towards generating Jazzer-friendly harnesses by providing a Java-specific introduction section in the LLM prompt, as well as a section on general Java fuzzing guidelines.
An example of the general guidelines is shown in the snippet below:
```sh
...
The generated fuzzing harness should be wrapped with the tag.
NEVER use any methods from the java.lang.Random class in the generated code.
NEVER use any classes or methods in the java.lang.reflect package in the generated code.
NEVER use the @FuzzTest annotation for specifying the fuzzing method.
Please avoid using any multithreading or multi-processing approach.
Please add import statements for necessary classes, except for classes in the java.lang package.
You MUST create the object before calling the target method.
You MUST catch java.lang.RuntimeException.
Please use HeaderTokenizerFuzzer as the Java class name.
```
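Rules of this kind can also be checked mechanically after generation. The following is a minimal sketch of such a check, not something the post describes as part of OSS-Fuzz-gen: the rule names, the `check_harness` function and the regular expressions are all hypothetical, and the check is purely textual.

```python
import re

# Hypothetical rule set derived from the guidelines quoted above.
# The quoted guideline names java.lang.Random; this sketch simply looks
# for the bare class name "Random" as a whole word.
FORBIDDEN_PATTERNS = {
    "reflection": re.compile(r"\bjava\.lang\.reflect\b"),
    "@FuzzTest annotation": re.compile(r"@FuzzTest\b"),
    "Random": re.compile(r"\bRandom\b"),
}


def check_harness(source: str, expected_class: str) -> list:
    """Return the names of the guideline violations found in a Java harness."""
    problems = [
        name for name, pattern in FORBIDDEN_PATTERNS.items()
        if pattern.search(source)
    ]
    # The prompt fixes the class name, e.g. HeaderTokenizerFuzzer.
    if not re.search(rf"\bclass\s+{re.escape(expected_class)}\b", source):
        problems.append("unexpected class name")
    # Jazzer's static entry point used by the harnesses in this post.
    if "fuzzerTestOneInput" not in source:
        problems.append("missing fuzzerTestOneInput entry point")
    return problems


good = (
    "public class HeaderTokenizerFuzzer { "
    "public static void fuzzerTestOneInput(FuzzedDataProvider data) {} }"
)
bad = "import java.lang.reflect.Method; public class Foo { @FuzzTest void f() {} }"
print(check_harness(good, "HeaderTokenizerFuzzer"))
print(check_harness(bad, "HeaderTokenizerFuzzer"))
```

A textual check like this can only flag obvious violations; whether a harness actually builds and explores code still has to be established by compiling and running it.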
# Results
The Java harness generation logic is built into OSS-Fuzz-gen, which means we can run it at scale. To test our approach, we ran harness generation on a total of 106 existing Java project integrations in OSS-Fuzz. An overview of the results is shown in the table below. In total, we tried to synthesize harnesses for 592 targets, meaning that we identified 592 interesting Java methods to fuzz. Of the synthesized harnesses, 280 built successfully, and 102 of those 280 had an edge-coverage delta of more than zero. In other words, 102 harnesses explored code, whereas the remaining 178 either ran into an exception in the first iteration or failed to explore code incrementally.
| Total number of projects | Total harnesses synthesized | Harnesses successfully built | Harness build success rate | Harnesses with edge coverage delta above 0 |
|--------------------------|----------------------------|--------------------|-----------------|----------------|
| 106 | 592 | 280 | 47.30% | 102 |
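The percentages implied by these counts can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers reported above; the variable names are ours.

```python
# Counts reported in the results table.
total_harnesses = 592
built = 280
with_coverage = 102

# Share of synthesized harnesses that built successfully.
build_rate = built / total_harnesses
# Share of built harnesses that had an edge-coverage delta above zero.
coverage_rate_of_built = with_coverage / built
# Share of all synthesized harnesses that had an edge-coverage delta above zero.
coverage_rate_overall = with_coverage / total_harnesses
# Built harnesses that did not explore code.
no_exploration = built - with_coverage

print(f"build success rate:         {build_rate:.2%}")
print(f"coverage among built:       {coverage_rate_of_built:.2%}")
print(f"coverage among synthesized: {coverage_rate_overall:.2%}")
print(f"built without exploration:  {no_exploration}")
```

So roughly a third of the harnesses that built, and about one in six of all synthesized harnesses, went on to explore code.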
During our evaluation, several of the harnesses reported bugs, which we triaged. Two of the issues were deemed to be security issues (awaiting response), and four were considered reliability bugs. All issues were reported to the upstream maintainers. In the following, we’ll go through the four harnesses that found reliability bugs (uncaught exceptions).
## Reliability bug 1: uncaught exception in jakarta.mail
Reported [here](https://github.com/jakartaee/mail-api/issues/734)
Generated harness:
```java
import com.code_intelligence.jazzer.api.FuzzedDataProvider;
import jakarta.mail.internet.ParameterList;
import jakarta.mail.internet.ParseException;
public class HeaderTokenizerFuzzer {
  public static void fuzzerInitialize() {
  }
  public static void fuzzerTearDown() {
  }
  public static void fuzzerTestOneInput(FuzzedDataProvider data) {
    try {
      for (int i = 0; i < 10; i++) {
        String var_0 = data.consumeRemainingAsString();
        ParameterList parameterList = new ParameterList(var_0);
      }
    } catch (ParseException e) {
    }
  }
}
```
Execution log and bug trace:
```sh
#2 INITED cov: 34 ft: 34 corp: 1/1b exec/s: 0 rss: 911Mb
#6 NEW cov: 51 ft: 69 corp: 2/3b lim: 4 exec/s: 0 rss: 911Mb L: 2/2 MS: 8 ChangeByte-Custom-ChangeBit-Custom-CopyPart-Custom-InsertByte-Custom-
#12 NEW cov: 60 ft: 79 corp: 3/5b lim: 4 exec/s: 0 rss: 911Mb L: 2/2 MS: 2 InsertByte-Custom-
#14 NEW cov: 62 ft: 82 corp: 4/8b lim: 4 exec/s: 0 rss: 911Mb L: 3/3 MS: 4 ShuffleBytes-Custom-InsertByte-Custom-
#15 NEW cov: 63 ft: 83 corp: 5/10b lim: 4 exec/s: 0 rss: 911Mb L: 2/3 MS: 2 ChangeByte-Custom-
#16 REDUCE cov: 63 ft: 83 corp: 5/9b lim: 4 exec/s: 0 rss: 911Mb L: 2/2 MS: 2 EraseBytes-Custom-
#25 NEW cov: 64 ft: 84 corp: 6/11b lim: 4 exec/s: 0 rss: 911Mb L: 2/2 MS: 8 ShuffleBytes-Custom-ChangeByte-Custom-ChangeByte-Custom-CopyPart-Custom-
#54 NEW cov: 71 ft: 91 corp: 7/14b lim: 4 exec/s: 0 rss: 911Mb L: 3/3 MS: 8 ChangeBinInt-Custom-CrossOver-Custom-InsertByte-Custom-ChangeBit-Custom-
#60 NEW cov: 73 ft: 95 corp: 8/16b lim: 4 exec/s: 0 rss: 911Mb L: 2/3 MS: 2 CopyPart-Custom-
…
#232899 REDUCE cov: 247 ft: 1033 corp: 288/8318b lim: 493 exec/s: 116449 rss: 936Mb L: 14/260 MS: 2 EraseBytes-Custom-
#234061 REDUCE cov: 247 ft: 1033 corp: 288/8317b lim: 501 exec/s: 117030 rss: 936Mb L: 13/260 MS: 4 ChangeBit-Custom-EraseBytes-Custom-
#234662 REDUCE cov: 247 ft: 1033 corp: 288/8309b lim: 501 exec/s: 117331 rss: 936Mb L: 33/260 MS: 2 EraseBytes-Custom-
#236110 REDUCE cov: 247 ft: 1033 corp: 288/8305b lim: 509 exec/s: 118055 rss: 936Mb L: 25/260 MS: 6 ShuffleBytes-Custom-CMP-Custom-EraseBytes-Custom- DE: "*0*"-
#236377 REDUCE cov: 247 ft: 1033 corp: 288/8304b lim: 509 exec/s: 118188 rss: 936Mb L: 42/260 MS: 4 ChangeBit-Custom-EraseBytes-Custom-
#236528 REDUCE cov: 247 ft: 1033 corp: 288/8303b lim: 509 exec/s: 118264 rss: 936Mb L: 8/260 MS: 2 EraseBytes-Custom-
#238094 NEW cov: 247 ft: 1037 corp: 289/8336b lim: 517 exec/s: 119047 rss: 936Mb L: 33/260 MS: 2 CopyPart-Custom-
== Java Exception: java.util.ConcurrentModificationException
at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1584)
at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1607)
at jakarta.mail.internet.ParameterList.combineMultisegmentNames(ParameterList.java:408)
at jakarta.mail.internet.ParameterList.<init>(ParameterList.java:309)
at HeaderTokenizerFuzzer.fuzzerTestOneInput(HeaderTokenizerFuzzer.java:16)
```
## Reliability bug 2: uncaught exception in jettison.json
Reported [here](https://github.com/jettison-json/jettison/issues/96)
Generated harness:
```java
import com.code_intelligence.jazzer.api.FuzzedDataProvider;
import org.codehaus.jettison.json.JSONArray;
import org.codehaus.jettison.json.JSONException;
import org.codehaus.jettison.json.JSONTokener;
public class JsonFuzzer {
  public static void fuzzerInitialize() {
  }
  public static void fuzzerTearDown() {
  }
  public static void fuzzerTestOneInput(FuzzedDataProvider data) {
    try {
      JSONTokener jSONTokener = new JSONTokener(data.consumeRemainingAsString());
      JSONArray jSONArray = new JSONArray(jSONTokener);
    } catch (JSONException e) {
    }
  }
}
```
Execution log and bug trace:
```sh
#1729 INITED cov: 88 ft: 341 corp: 67/2733b exec/s: 0 rss: 954Mb
#1731 NEW cov: 90 ft: 343 corp: 68/2742b lim: 567 exec/s: 0 rss: 954Mb L: 9/558 MS: 4 CopyPart-Custom-ManualDict-Custom- DE: "}="-
#1758 NEW cov: 92 ft: 345 corp: 69/2747b lim: 567 exec/s: 0 rss: 954Mb L: 5/558 MS: 4 PersAutoDict-Custom-CrossOver-Custom- DE: "}="-
#1759 REDUCE cov: 92 ft: 345 corp: 69/2745b lim: 567 exec/s: 0 rss: 954Mb L: 9/558 MS: 2 EraseBytes-Custom-
#1763 NEW cov: 94 ft: 347 corp: 70/2820b lim: 567 exec/s: 0 rss: 954Mb L: 75/558 MS: 8 ChangeASCIIInt-Custom-EraseBytes-Custom-ShuffleBytes-Custom-CopyPart-Custom-
#1779 NEW cov: 96 ft: 349 corp: 71/2853b lim: 567 exec/s: 0 rss: 954Mb L: 33/558 MS: 2 ManualDict-Custom- DE: "\""-
#1804 NEW cov: 98 ft: 351 corp: 72/2864b lim: 567 exec/s: 0 rss: 954Mb L: 11/558 MS: 10 CMP-Custom-ChangeBit-Custom-ChangeBinInt-Custom-InsertByte-Custom-ChangeByte-Custom- DE: "
…
at org.codehaus.jettison.json.JSONArray.<init>(JSONArray.java:145)
at org.codehaus.jettison.json.JSONTokener.newJSONArray(JSONTokener.java:436)
at org.codehaus.jettison.json.JSONTokener.nextValue(JSONTokener.java:342)
at org.codehaus.jettison.json.JSONArray.<init>(JSONArray.java:145)
at org.codehaus.jettison.json.JSONTokener.newJSONArray(JSONTokener.java:436)
at org.codehaus.jettison.json.JSONTokener.nextValue(JSONTokener.java:342)
at org.codehaus.jettison.json.JSONArray.<init>(JSONArray.java:145)
at org.codehaus.jettison.json.JSONTokener.newJSONArray(JSONTokener.java:436)
at org.codehaus.jettison.json.JSONTokener.nextValue(JSONTokener.java:342)
at org.codehaus.jettison.json.JSONArray.<init>(JSONArray.java:145)
at org.codehaus.jettison.json.JSONTokener.newJSONArray(JSONTokener.java:436)
at org.codehaus.jettison.json.JSONTokener.nextValue(JSONTokener.java:342)
at org.codehaus.jettison.json.JSONArray.<init>(JSONArray.java:145)
at org.codehaus.jettison.json.JSONTokener.newJSONArray(JSONTokener.java:436)
at org.codehaus.jettison.json.JSONTokener.nextValue(JSONTokener.java:342)
at org.codehaus.jettison.json.JSONArray.<init>(JSONArray.java:145)
at org.codehaus.jettison.json.JSONTokener.newJSONArray(JSONTokener.java:436)
at org.codehaus.jettison.json.JSONTokener.nextValue(JSONTokener.java:342)
at org.codehaus.jettison.json.JSONArray.<init>(JSONArray.java:145)
at org.codehaus.jettison.json.JSONTokener.newJSONArray(JSONTokener.java:436)
at org.codehaus.jettison.json.JSONTokener.nextValue(JSONTokener.java:342)
at org.codehaus.jettison.json.JSONArray.<init>(JSONArray.java:145)
at org.codehaus.jettison.json.JSONTokener.newJSONArray(JSONTokener.java:436)
at org.codehaus.jettison.json.JSONTokener.nextValue(JSONTokener.java:342)
at org.codehaus.jettison.json.JSONArray.<init>(JSONArray.java:145)
at org.codehaus.jettison.json.JSONTokener.newJSONArray(JSONTokener.java:436)
at org.codehaus.jettison.json.JSONTokener.nextValue(JSONTokener.java:342)
at org.codehaus.jettison.json.JSONArray.<init>(JSONArray.java:145)
at org.codehaus.jettison.json.JSONTokener.newJSONArray(JSONTokener.java:436)
at org.codehaus.jettison.json.JSONTokener.nextValue(JSONTokener.java:342)
at org.codehaus.jettison.json.JSONArray.<init>(JSONArray.java:145)
at JsonFuzzer.fuzzerTestOneInput(JsonFuzzer.java:17)
```
## Reliability bug 3: uncaught exception in sqlite-jdbc
Reported [here](https://github.com/xerial/sqlite-jdbc/issues/1141)
Generated harness:
```java
import com.code_intelligence.jazzer.api.FuzzedDataProvider;
import org.sqlite.ExtendedCommand;
public class SqliteConnectionFuzzer {
  public static void fuzzerInitialize() {
    // Initializing objects for fuzzing
  }
  public static void fuzzerTearDown() {
    // Tear down objects after fuzzing
  }
  public static void fuzzerTestOneInput(FuzzedDataProvider data) {
    // Use the FuzzedDataProvider object to generate random data for fuzzing
    String string0 = data.consumeRemainingAsString();
    for (int i = 0; i < 100; i++) {
      for (int j = 0; j < 100; j++) {
        // Fuzz by invoking the target method with random parameters / objects generated above.
        ExtendedCommand.removeQuotation(string0);
      }
    }
  }
}
```
Execution log and bug trace:
```sh
#2 INITED cov: 9 ft: 9 corp: 1/1b exec/s: 0 rss: 942Mb
#486 NEW cov: 10 ft: 10 corp: 2/3b lim: 8 exec/s: 486 rss: 1042Mb L: 2/2 MS: 8 ChangeByte-Custom-ChangeBit-Custom-CopyPart-Custom-ChangeBit-Custom-
== Java Exception: java.lang.StringIndexOutOfBoundsException: begin 1, end 0, length 1
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3734)
at java.base/java.lang.String.substring(String.java:1903)
at org.sqlite.ExtendedCommand.removeQuotation(ExtendedCommand.java:57)
at SqliteConnectionFuzzer.fuzzerTestOneInput(SqliteConnectionFuzzer.java:19)
```
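The exception message (`begin 1, end 0, length 1`) tells us that a one-character input reached a `substring(1, length - 1)` call. The sketch below reproduces that failure mode in Python. It is a hypothetical re-creation inferred from the trace, not the sqlite-jdbc source: `java_substring` mimics the bounds check of Java's `String.substring`, and the quote-detection condition in `remove_quotation` is an assumption.

```python
def java_substring(s: str, begin: int, end: int) -> str:
    """Mimic the bounds checking of Java's String.substring(begin, end)."""
    if begin < 0 or begin > end or end > len(s):
        raise IndexError(f"begin {begin}, end {end}, length {len(s)}")
    return s[begin:end]


def remove_quotation(s: str) -> str:
    """Hypothetical stand-in for a method that strips surrounding quotes."""
    # Assumed condition: the string starts and ends with the same quote
    # character. A single quote character satisfies both checks at once.
    if s and s[0] in "\"'" and s[-1] == s[0]:
        return java_substring(s, 1, len(s) - 1)
    return s


print(remove_quotation("'abc'"))  # quotes are stripped
try:
    remove_quotation("'")  # a lone quote character
except IndexError as error:
    print("uncaught in the harness:", error)
```

Under these assumptions, requiring a length of at least two before stripping the quotes would avoid the exception.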
## Reliability bug 4: uncaught exception in jolt
Reported [here](https://github.com/bazaarvoice/jolt/issues/1268)
Generated harness:
```java
import com.bazaarvoice.jolt.removr.spec.RemovrCompositeSpec;
import com.code_intelligence.jazzer.api.FuzzedDataProvider;
import java.util.Map;
import java.util.HashMap;
public class JsonUtilsFuzzer {
  public static void fuzzerInitialize() {
  }
  public static void fuzzerTearDown() {
  }
  public static void fuzzerTestOneInput(FuzzedDataProvider data) {
    try {
      String var_0 = data.consumeString(1024);
      Map var_1 = new HashMap();
      for (int i = 0; i < data.consumeInt(0, 10); i++) {
        var_1.put(data.consumeString(1024), data.consumeString(1024));
      }
      RemovrCompositeSpec target = new RemovrCompositeSpec(var_0, var_1);
    } catch (java.lang.UnsupportedOperationException e) {
    }
  }
}
```
Execution log and bug trace:
```sh
#358 NEW cov: 52 ft: 67 corp: 12/35b lim: 4 exec/s: 0 rss: 990Mb L: 4/4 MS: 6 ChangeBinInt-Custom-PersAutoDict-Custom-CopyPart-Custom- DE: "\000\000"-
#411 NEW cov: 54 ft: 70 corp: 13/39b lim: 4 exec/s: 0 rss: 990Mb L: 4/4 MS: 6 CrossOver-Custom-ShuffleBytes-Custom-CopyPart-Custom-
#468 REDUCE cov: 54 ft: 70 corp: 13/38b lim: 4 exec/s: 0 rss: 990Mb L: 2/4 MS: 4 ChangeBit-Custom-CrossOver-Custom-
#469 REDUCE cov: 54 ft: 70 corp: 13/37b lim: 4 exec/s: 0 rss: 990Mb L: 2/4 MS: 2 CrossOver-Custom-
== Java Exception: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
at com.bazaarvoice.jolt.common.pathelement.StarDoublePathElement.<init>(StarDoublePathElement.java:54)
at com.bazaarvoice.jolt.removr.spec.RemovrSpec.parse(RemovrSpec.java:55)
at com.bazaarvoice.jolt.removr.spec.RemovrSpec.<init>(RemovrSpec.java:36)
at com.bazaarvoice.jolt.removr.spec.RemovrCompositeSpec.<init>(RemovrCompositeSpec.java:59)
at JsonUtilsFuzzer.fuzzerTestOneInput(JsonUtilsFuzzer.java:20)
```
# Conclusions and future work
In this blog post we have introduced our initial efforts toward automated Java fuzzing. We described the challenges we faced while integrating Java support into OSS-Fuzz-gen, including how we pick interesting candidates and how we provide Java-specific context in the LLM prompts. The approach is built into our existing pipelines in OSS-Fuzz-gen, which enables us to do harness generation at scale for all OSS-Fuzz Java projects. The approach has shown promising results, including code coverage gains across a large part of the Java projects as well as reports of security and reliability issues.
We will continue our efforts in automated Java fuzzer generation and are actively exploring new avenues for prompt generation. The goal is to provide reliable and clear harness suggestions to OSS-Fuzz users. We are also exploring combining our Java efforts with approaches for generating OSS-Fuzz integrations from scratch, as described in a previous [blog post](https://blog.oss-fuzz.com/posts/introducing-llm-based-harness-synthesis-for-unfuzzed-projects/).
The efforts described in this blog post are available in our OSS-Fuzz-gen repository [https://github.com/google/oss-fuzz-gen](https://github.com/google/oss-fuzz-gen) and we invite contributions from the community to further automated Java harness generation.
================================================
FILE: infra/build/blog/content/posts/introducing-llm-based-harness-synthesis-for-unfuzzed-projects.md
================================================
+++
authors = ["OSS-Fuzz Maintainers"]
title = "Introducing LLM-based harness synthesis for unfuzzed projects"
date = "2024-05-27"
description = "Introducing LLM-based harness generation for unfuzzed projects."
categories = [
"Fuzzing",
"Fuzzing synthesis",
"LLM",
"Automated fuzzing",
]
+++
# Introduction
As part of the OSS-Fuzz-Gen project, we’ve been working on generating fuzzing harnesses for OSS-Fuzz projects with the goal of improving fuzzing coverage and unearthing more vulnerabilities.
Previously published results from this ongoing work, described in our [blog post](https://security.googleblog.com/2023/08/ai-powered-fuzzing-breaking-bug-hunting.html), were promising, with absolute coverage increases of up to 35% across over 160 OSS-Fuzz projects and [6 new vulnerabilities](https://github.com/google/oss-fuzz-gen/?tab=readme-ov-file#bugs-discovered) discovered. However, this work only applied to projects already integrated into OSS-Fuzz, as it uses the existing fuzzing build setup scripts in the given OSS-Fuzz project.
Recently, we experimented with generating fuzzing harnesses for arbitrary C/C++ software projects, using the same LLM techniques.
The primary goal of our efforts is to take a GitHub repository as input and output an OSS-Fuzz project as well as a ClusterFuzzLite project with a meaningful fuzz harness. In this blog post we describe how we automatically build projects, how we generate fuzzing harnesses using LLMs, and how these are evaluated. We also list a selection of 15 projects for which we generated OSS-Fuzz/ClusterFuzzLite integrations and upstreamed the results.
# Generating OSS-Fuzz integrations with LLM harness synthesis
The high-level process for generating fuzzing harnesses from scratch takes as input a URL to a GitHub project and then follows a four-step approach:
1. Build generator: Try building the project using a set of pre-defined auto-build heuristics and capture the output of the build heuristics. If no build succeeds, do not continue.
2. Fuzz Introspector build: For each successful build, rebuild the project under analysis with [Fuzz Introspector](https://github.com/ossf/fuzz-introspector) in order to extract a myriad of program analysis data, output as a Fuzz Introspector report.
3. LLM-based harness generation: Synthesize harnesses by way of LLMs, where the prompts are based on the program analysis data from the Fuzz Introspector report.
4. Harness Building: For each generated harness, build it using the build scripts generated from step (1) and run each harness for a number of seconds to evaluate its runtime performance. Log results from runtime for later inspection. For each harness wrap it in an appropriate OSS-Fuzz and ClusterFuzzLite project.
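The control flow of these four steps can be sketched as follows. This is an illustrative outline only: the `Integration` type, the `generate_integrations` function and the four callables passed into it are hypothetical names, not the OSS-Fuzz-gen API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Integration:
    """One candidate integration: a build script, a harness and its run log."""
    build_script: str
    harness: str
    run_log: str


def generate_integrations(
    repo_url: str,
    try_builds: Callable[[str], List[str]],
    introspect: Callable[[str, str], dict],
    synthesize: Callable[[dict], List[str]],
    build_and_run: Callable[[str, str], Optional[str]],
) -> List[Integration]:
    integrations: List[Integration] = []
    # Step 1: try the pre-defined build heuristics; stop if none succeeds.
    build_scripts = try_builds(repo_url)
    if not build_scripts:
        return integrations
    for script in build_scripts:
        # Step 2: rebuild with Fuzz Introspector to obtain a report.
        report = introspect(repo_url, script)
        # Step 3: ask the LLM for harnesses based on the report.
        for harness in synthesize(report):
            # Step 4: build and run the harness; None means the build failed.
            log = build_and_run(script, harness)
            if log is not None:
                integrations.append(Integration(script, harness, log))
    return integrations


# Usage with stand-in callables that only imitate the real steps.
result = generate_integrations(
    "https://github.com/example/project",
    try_builds=lambda url: ["build-cmake.sh"],
    introspect=lambda url, script: {"functions": ["parse"]},
    synthesize=lambda report: [f"harness for {name}" for name in report["functions"]],
    build_and_run=lambda script, harness: "cov: 10",
)
print(result)
```

Passing the steps in as callables keeps the outline independent of how each step is actually implemented.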
The output of the above is a set of OSS-Fuzz/ClusterFuzzLite projects with LLM-generated harnesses, build scripts, Dockerfiles and output from runtime evaluations. The following figure visualizes the approach, and we will now go into further details with each of the above steps.

## Step 1: Auto-build target project
The first step is to build the target project. For C/C++ projects this is a non-trivial problem because, in comparison to several managed languages, there is limited consensus on how to build projects. There are multiple build systems, e.g. Make, CMake, Bazel, Ninja and so on; some projects rely on third-party dependencies being installed on the system that builds the project; and some projects rely on multiple commands to create the build artifacts. In addition, to build the code in a fuzzer-friendly manner we need to ensure that specific compilers and certain compiler flags, e.g. those that enable sanitizers, are used for the compilation.
Our strategy for auto-building projects in a fuzzer-friendly way is to create a set of generalized build scripts by abstracting the existing build scripts in OSS-Fuzz. These generalized build scripts are template-like and include, for example, general build approaches based on Make and CMake, as well as compiling source files directly. We call these generalized build scripts “build heuristics”. The build heuristics also include features for building the target code statically, since this is a requirement of OSS-Fuzz, and techniques for disabling certain options that may be available in the target project’s build setup. We added these because we observed several libraries whose default options were not fuzz-compatible, and disabling those options allowed the target projects to build successfully.
Upon successful execution of a build template, we search for the binary artifacts created by the build. Specifically, we are interested in the static archives produced by the build since to run on OSS-Fuzz it is preferred to link harnesses statically. We consider each build that produces at least one static archive to be a successful build, and each successful build is used for further processing in the next steps. The output of this step is a bash build script for each successful build template.
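The success criterion described above (at least one static archive produced) can be sketched as a directory scan. The function names are hypothetical, and identifying archives by the `.a` suffix is an assumption that matches the `.a` files in the build scripts shown in this post.

```python
import os
import tempfile


def find_static_archives(build_dir: str) -> list:
    """Return the sorted paths of all *.a files below build_dir."""
    archives = []
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            if name.endswith(".a"):
                archives.append(os.path.join(root, name))
    return sorted(archives)


def build_succeeded(build_dir: str) -> bool:
    """A build counts as successful if it produced at least one static archive."""
    return len(find_static_archives(build_dir)) > 0


# Demo on a throwaway directory that stands in for a finished build tree.
with tempfile.TemporaryDirectory() as tmp:
    libs_dir = os.path.join(tmp, "lib", ".libs")
    os.makedirs(libs_dir)
    open(os.path.join(libs_dir, "libdemo.a"), "wb").close()
    open(os.path.join(tmp, "main.o"), "wb").close()
    found = [os.path.basename(path) for path in find_static_archives(tmp)]
    succeeded = build_succeeded(tmp)

print(found, succeeded)
```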
To provide intuition for how the build scripts look, consider the following two examples.
Example 1, lorawan-parser build script ([PR](https://github.com/JiapengLi/lorawan-parser/pull/17)):
```sh
autoreconf -fi
./configure
make
$CC $CFLAGS $LIB_FUZZING_ENGINE $SRC/fuzzer.c -Wl,--whole-archive $SRC/lorawan-parser/lw/.libs/liblorawan.a -Wl,--whole-archive $SRC/lorawan-parser/lib/libloragw$SRC/.libs/libloragw.a -Wl,--whole-archive $SRC/lorawan-parser/lib/.libs/lib.a -Wl,--allow-multiple-definition -I$SRC/lorawan-parser/util/parser -I$SRC/lorawan-parser/lib/libloragw/inc -I$SRC/lorawan-parser/lib -I$SRC/lorawan-parser/lw -o $OUT/fuzzer
```
Example 2, simpleson build script ([PR](https://github.com/gregjesl/simpleson/pull/40)):
```sh
mkdir fuzz-build
cd fuzz-build
cmake -DCMAKE_VERBOSE_MAKEFILE=ON ../
make V=1 || true
$CXX $CXXFLAGS $LIB_FUZZING_ENGINE $SRC/fuzzer.cpp -Wl,--whole-archive $SRC/simpleson/fuzz-build/libsimpleson.a -Wl,--allow-multiple-definition -I$SRC/simpleson/ -o $OUT/fuzzer
```
## Step 2: Extract program analysis data using Fuzz Introspector
The next step is to extract data about the program under analysis so that we can use it in a programmatic manner. We need this for two reasons. First, in order to select functions that are good candidates for fuzzing in the target project. Second, we need to be able to programmatically describe the program under analysis in a way that allows us to generate LLM prompts that describe the source code in a human-readable manner.
To achieve this we build the target under analysis, using the build scripts from the previous step, in combination with Fuzz Introspector. Fuzz Introspector is an LLVM-based program analysis tool that extracts a lot of data useful for fuzz introspection and also program analysis in general. For example, for each function in the target project Fuzz Introspector provides data such as function signature, cross-reference information, source code, cyclomatic complexity, call tree and more. This is all useful for our LLM-based harness generation since the goal is to present the LLM with a prompt that gives a precise technical description of the target under analysis. The output of this step is an introspector report for each build script.
The JSON snippet below is a sample subset of the data that Fuzz Introspector provides for the [json_validate](https://github.com/JiapengLi/lorawan-parser/blob/010a8f16074fb9a004b812e0289c5bc527e548ba/lib/json.c#L433) function in the [lorawan-parser](https://github.com/JiapengLi/lorawan-parser) project. The full data is available in this [Gist](https://gist.github.com/DavidKorczynski/f573d074d7745f351f76fcecd4a51930), and this type of data is provided for each function in a target codebase. We have stripped a number of keys from the JSON output in the sample to focus primarily on the data that we use in OSS-Fuzz-gen. Specifically, during OSS-Fuzz-gen harness synthesis we use the data below:
1. Cyclomatic complexity data to highlight functions of interest.
2. Callsite to identify sample locations a given function is used.
3. Debug information to provide the prompt with program context about the target function.
4. Source code location to extract source.
This is a target function that our approach successfully generated an OSS-Fuzz integration for.
```json
{
  "Func name": "json_validate",
  "Functions filename": "/src/test-fuzz-build-2/./lib/json.c",
  "Function call depth": 7,
  "Cyclomatic complexity": 4,
  "Functions reached": 45,
  "Reached by functions": 0,
  "Accumulated cyclomatic complexity": 212,
  "ArgNames": [
    "json"
  ],
  "callsites": {
    "skip_space": [
      "./lib/json.c#json_validate:437",
      "./lib/json.c#json_validate:441"
    ],
    "parse_value": [
      "./lib/json.c#json_validate:438"
    ]
  },
  "source_line_begin": 434,
  "source_line_end": 446,
  "function_signature": "bool json_validate(const char *)",
  "debug_function_info": {
    "name": "json_validate",
    "is_public": 0,
    "is_private": 0,
    "func_signature_elems": {
      "return_type": [
        "DW_TAG_base_type",
        "bool"
      ],
      "params": [
        [
          "DW_TAG_pointer_type",
          "DW_TAG_const_type",
          "char"
        ]
      ]
    },
    "source": {
      "source_file": "/src/test-fuzz-build-2/lib/json.c",
      "source_line": "433"
    },
    "return_type": "bool",
    "args": [
      "const char *"
    ]
  }
},
```
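To illustrate how per-function records like this one can be used to highlight functions of interest, the sketch below ranks functions by accumulated cyclomatic complexity. The ranking rule and the `rank_candidates` name are assumptions for illustration, not the selection logic of OSS-Fuzz-gen. The `json_validate` values come from the sample above; the two helper records are made up.

```python
def rank_candidates(functions: list, top_n: int = 5) -> list:
    """Order function records by accumulated cyclomatic complexity, highest first."""
    ranked = sorted(
        functions,
        key=lambda record: record["Accumulated cyclomatic complexity"],
        reverse=True,
    )
    return ranked[:top_n]


functions = [
    {   # Hypothetical record.
        "Func name": "helper_a",
        "Cyclomatic complexity": 3,
        "Accumulated cyclomatic complexity": 3,
        "function_signature": "void helper_a(void)",
    },
    {   # Values taken from the sample above.
        "Func name": "json_validate",
        "Cyclomatic complexity": 4,
        "Accumulated cyclomatic complexity": 212,
        "function_signature": "bool json_validate(const char *)",
    },
    {   # Hypothetical record.
        "Func name": "helper_b",
        "Cyclomatic complexity": 9,
        "Accumulated cyclomatic complexity": 40,
        "function_signature": "int helper_b(int)",
    },
]

top = [record["Func name"] for record in rank_candidates(functions, top_n=2)]
print(top)
```

A function with low complexity of its own can still rank highly when it reaches a lot of code, as `json_validate` does with 45 reached functions.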
## Step 3: LLM-based harness synthesis
The next step is to use LLMs to generate fuzzing harnesses. To do this, we have implemented several “harness-generators” that take the introspector reports as input and use them to create human-readable (LLM-readable) prompts that direct the LLM towards creating fuzz harnesses. The high-level idea is to generate textual descriptions of the target functions that are likely to lead the LLM to produce a good harness. To this end, for each function we consider a good candidate for fuzzing, the prompts can include:
- Description of the target function’s signature, with complete types.
- Description of specifically which header files are available in the target project.
- Examples of cross-references that use the target function to present sample code patterns involving the target function.
- The actual source code of the target function.
- Basic guidance for the LLM, such as the need for wrapping the harness in `LLVMFuzzerTestOneInput`.
The output of this step is a set of fuzzing harnesses produced by LLMs. Specifically, we generate Y harnesses, where Y is the number of functions to target in the program under analysis.
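A prompt assembled from the ingredients listed above could look roughly like the following sketch. The `build_prompt` function, its parameters and the wording of the prompt are hypothetical; the actual prompts are in the Gist linked in this post. The sample header, cross-reference and source text are placeholders.

```python
def build_prompt(signature: str, headers: list, cross_references: list, source: str) -> str:
    """Assemble a textual prompt from program analysis data about one function."""
    sections = [
        "Write a fuzzing harness for the function described below. "
        "The harness must be wrapped in LLVMFuzzerTestOneInput.",
        f"Target function signature:\n{signature}",
        "Header files available in the project:\n"
        + "\n".join(f"- {header}" for header in headers),
    ]
    if cross_references:
        sections.append(
            "Examples of code that calls the target function:\n"
            + "\n\n".join(cross_references)
        )
    sections.append(f"Source code of the target function:\n{source}")
    return "\n\n".join(sections)


prompt = build_prompt(
    signature="bool json_validate(const char *)",
    headers=["lib/json.h"],                        # placeholder header list
    cross_references=[],                           # no callers in this example
    source="bool json_validate(const char *json) { /* ... */ }",  # placeholder
)
print(prompt)
```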
The harnesses that perform the best for each project are in general those that target high-level functions in the target project, where the given function accepts fairly raw input data and does not rely on a complex initialization setup. This includes top-level functions that, e.g., parse a given string, read a certain file and similar. The interesting part of the LLM harness synthesis is that the LLM generates the correct initialization logic, provides the correct data types seeded with fuzz data and also provides correct cleanup logic. To this end, we consider our approach likely to be useful for a lot of open source projects that are by nature fuzzing-friendly targets.
The following GitHub [Gist](https://gist.github.com/DavidKorczynski/bc386b88eab43931338971cff4d4655b) contains three sample prompts generated by the prompt-generation logic, and also the corresponding harnesses as output by the LLM. These are all prompts that generated successful harnesses, in that the harnesses exercised a meaningful portion of code in the target and were determined to be good harnesses after human investigation.
The following examples show fuzzing harnesses generated by our approach.
Example 1, nanosvg harness ([PR](https://github.com/google/oss-fuzz/pull/11944)):
```c
#include
#include
#include
#include
#include "nanosvgrast.h"
#include "nanosvg.h"
#include "stb_image_write.h"
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // Ensure null-terminated string
  char* data_copy = (char*)malloc(size + 1);
  memcpy(data_copy, data, size);
  data_copy[size] = '\0';
  // Dummy arguments
  const char* dummy_filename = "dummy.svg";
  float dummy_value = 1.0f;
  // Call the target function
  NSVGimage* result = nsvgParse(data_copy, dummy_filename, dummy_value);
  // Free memory
  if (result) {
    nsvgDelete(result);
  }
  free(data_copy);
  return 0;
}
```
Example 2, tinyexpr harness ([PR](https://github.com/google/oss-fuzz/pull/11944)):
```c
#include
#include
#include
#include
#include "minctest.h"
#include "tinyexpr.h"
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size == 0) {
    return 0;
  }
  char *input = (char*)malloc(size + 1);
  if (!input) {
    return 0;
  }
  memcpy(input, data, size);
  input[size] = '\0';
  te_variable vars[] = {{ "x", 0 }};
  int error;
  te_expr *result = te_compile(input, vars, 1, &error);
  free(input);
  te_free(result);
  return 0;
}
```
Example 3, simpleson harness ([PR](https://github.com/gregjesl/simpleson/pull/40)):
```cpp
#include
#include
#include
#include
#include "json.h"
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size == 0) {
    return 0;
  }
  // Copy input data to a null-terminated string
  char* input = new char[size + 1];
  memcpy(input, data, size);
  input[size] = '\0';
  try {
    json::jobject::parse(input);
  } catch (...) {
    // Catch all exceptions thrown by the target code
  }
  delete[] input;
  return 0;
}
```
## Step 4: Build and run each generated project
The final step is to verify that the build scripts combined with the fuzzing harnesses can build actual executables and, for each successfully built executable, that the harness is meaningful. Specifically, from step (1) we have a set of build scripts that successfully build the project under analysis, and from step (3) we have, for each successful build script, a set of harnesses that target the project under analysis. We now combine these by running the build script and creating a command for building a given harness against the output static libraries.
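The shape of such a harness build command can be seen in the example build scripts in Step 1. The sketch below assembles a command of that shape from a harness path, a list of static archives and a list of include directories. The function name and parameters are hypothetical; only the flag layout is taken from the examples in this post.

```python
def harness_link_command(harness_path, archives, include_dirs, cxx=False):
    """Build the compile-and-link command for one harness as a shell string."""
    compiler, flags = ("$CXX", "$CXXFLAGS") if cxx else ("$CC", "$CFLAGS")
    parts = [compiler, flags, "$LIB_FUZZING_ENGINE", harness_path]
    # Link every static archive produced by the project build.
    for archive in archives:
        parts += ["-Wl,--whole-archive", archive]
    parts.append("-Wl,--allow-multiple-definition")
    # Make the project's headers visible to the harness.
    parts += [f"-I{directory}" for directory in include_dirs]
    parts += ["-o", "$OUT/fuzzer"]
    return " ".join(parts)


command = harness_link_command(
    "$SRC/fuzzer.cpp",
    archives=["$SRC/simpleson/fuzz-build/libsimpleson.a"],
    include_dirs=["$SRC/simpleson/"],
    cxx=True,
)
print(command)
```

With these inputs the function reproduces the last line of the simpleson build script shown in Step 1.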
For each successfully built harness, we run the harness for a set period of time (40 seconds) to collect runtime logs, and we wrap the relevant artifacts in an OSS-Fuzz project as well as a ClusterFuzzLite project that can be run directly using the OSS-Fuzz infrastructure. The runtime logs are then used for later inspection to determine whether a harness was good, using factors such as edge coverage and how long the harness ran without running into any issues. At this stage, a human then verifies whether the integration meets the standard of an OSS-Fuzz/ClusterFuzzLite integration.
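As an illustration of how runtime logs can be turned into the factors mentioned above, the sketch below extracts edge coverage from libFuzzer status lines and flags runs that reported an error. The `summarize_run` function, the returned keys and the use of the substring `ERROR:` as a crash marker are assumptions for this sketch, and the sample log values are illustrative.

```python
import re

# Matches libFuzzer status lines such as "#486 NEW cov: 10 ft: 10 ...".
STATUS_LINE = re.compile(r"^#(\d+)\s+(\w+)\s+cov:\s+(\d+)")


def summarize_run(log: str) -> dict:
    """Summarize edge coverage and crash status from a fuzzer run log."""
    coverage = []
    for line in log.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            coverage.append(int(match.group(3)))
    return {
        "initial_cov": coverage[0] if coverage else 0,
        "final_cov": coverage[-1] if coverage else 0,
        "cov_delta": (coverage[-1] - coverage[0]) if coverage else 0,
        # Assumed marker: sanitizer and libFuzzer error reports contain "ERROR:".
        "crashed": "ERROR:" in log,
    }


sample_log = """\
#2 INITED cov: 9 ft: 9 corp: 1/1b exec/s: 0 rss: 942Mb
#486 NEW cov: 10 ft: 10 corp: 2/3b lim: 8 exec/s: 486 rss: 1042Mb
"""
summary = summarize_run(sample_log)
print(summary)
```

A coverage delta of zero or an early error would mark a harness for closer inspection rather than automatic acceptance.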
# Results
The goal of our efforts is to enable continuous fuzzing for arbitrary open source projects. We ran our approach on a benchmark of C/C++ projects and have captured a subset of the successful integrations in order to integrate them into OSS-Fuzz or upstream them as ClusterFuzzLite integrations.
During the initial runs of the fuzz harnesses, three memory corruption issues and a couple of memory leaks were reported. We manually debugged the issues to create fixes and reported the issues upstream. The memory corruption issues were all heap-based read buffer overflows, where the heap aspect comes from the harnesses allocating data on the heap. An interesting aspect of one of the discovered issues is that the project already had CodeQL running as part of its continuous integration workflow. However, CodeQL did not find the issue reported by the automatically generated fuzzing setup.
In addition to the issues reported, we also collected code coverage for the projects based on the coverage achieved from a 40 second run. The table below contains references to the 15 projects that we upstreamed for continuous fuzzing. These project integrations were all fully automatically generated, with the caveat that for some of the pull requests we made follow-up commits that only addressed cosmetic changes, such as beautifying the code and making the build scripts leaner.
| Target GitHub repository | Integration PR | Code coverage | Issues found and fixed |
| ------------- | ------- | ------- | ------- |
| https://github.com/memononen/nanosvg | [PR](https://github.com/google/oss-fuzz/pull/11944) | 41% | |
| https://github.com/skeeto/pdjson | [PR](https://github.com/skeeto/pdjson/pull/33) | 78% | |
| https://github.com/gregjesl/simpleson | [PR](https://github.com/gregjesl/simpleson/pull/40) | 35% | [PR](https://github.com/gregjesl/simpleson/pull/39) |
| https://github.com/kgabis/parson | [PR](https://github.com/kgabis/parson/pull/214) | 42% | |
| https://github.com/rafagafe/tiny-json | [PR](https://github.com/rafagafe/tiny-json/pull/18) | 85% | |
| https://github.com/kosma/minmea | [PR](https://github.com/kosma/minmea/pull/79) | 37% | |
| https://github.com/marcobambini/sqlite-createtable-parser | [PR](https://github.com/marcobambini/sqlite-createtable-parser/pull/5) | 14% | [PR](https://github.com/marcobambini/sqlite-createtable-parser/pull/6) |
| https://github.com/benoitc/http-parser | [PR](https://github.com/benoitc/http-parser/pull/102) | 1.5% | [PR](https://github.com/benoitc/http-parser/pull/103) |
| https://github.com/orangeduck/mpc | [PR](https://github.com/orangeduck/mpc/pull/169) | 49% | |
| https://github.com/JiapengLi/lorawan-parser | [PR](https://github.com/JiapengLi/lorawan-parser/pull/17) | 11% | |
| https://github.com/argtable/argtable3 | [PR](https://github.com/argtable/argtable3/pull/96) | 0.8% | |
| https://github.com/h2o/picohttpparser | [PR](https://github.com/h2o/picohttpparser/pull/83) | 41% | |
| https://github.com/ndevilla/iniparser | [PR](https://github.com/ndevilla/iniparser/pull/161) | 46% | |
| https://github.com/codeplea/tinyexpr | [PR](https://github.com/codeplea/tinyexpr/pull/114) | 34% | |
| https://github.com/vincenthz/libjson | [PR](https://github.com/vincenthz/libjson/pull/28) | 10% | |
# Continuing the LLM harness synthesis loop
The LLM-based harness synthesis does not stop once a target project has been integrated into OSS-Fuzz. In fact, at this point the existing capabilities of OSS-Fuzz-gen come into play and continuously, on a weekly basis, experiment with new harnesses for the target project that take into account the project’s current coverage status on OSS-Fuzz. In particular, OSS-Fuzz-gen will analyze which are the most promising new targets for a given OSS-Fuzz project and use our extensive LLM-based harness synthesis to evaluate, test and run new harnesses for that project. To this end, the synthesis will continue and improve as a given project is continuously fuzzed.
The [overall goal we’re working towards](https://security.googleblog.com/2023/08/ai-powered-fuzzing-breaking-bug-hunting.html) is to provide a fully automated solution to improve the security of projects with fuzzing, from the initial build integration through continuous fuzz harness generation and bug reporting and triage to automatic patching.
## Contribute to the efforts
The efforts described in this blog post are open sourced in [OSS-Fuzz-gen](https://github.com/google/oss-fuzz-gen/tree/main/experimental/c-cpp). We invite contributions, and would like to highlight specific efforts that will have a positive impact on our OSS-Fuzz from scratch generation:
- Adding new build heuristics to enable compilation and fuzz introspector analysis of new projects. The key here is that any improvements in this context will open up analysis of new open source projects, and will ultimately be a positive sum outcome.
- Adding additional prompt generators with a given example of success. The approach described in this blog runs each prompt generator without affecting the other prompt generation approaches, so there is no need to worry about causing regressions in our existing prompt generators. As such, contributions are welcome for any new prompt generation approach that successfully creates harnesses for a project where the existing prompt generation approaches fall short. We encourage the use of program analysis data from Fuzz Introspector in order to provide source code context.
================================================
FILE: infra/build/blog/content/posts/oss-fuzz-hello-world.md
================================================
+++
authors = ["OSS-Fuzz Maintainers"]
title = "OSS-Fuzz blog"
date = "2024-05-20"
description = "Introduction to the OSS-Fuzz blog"
tags = [
"fuzzing",
"blogging",
]
categories = [
"introduction",
]
+++
Welcome to the new OSS-Fuzz blog! Here we will publish feature
updates, research efforts and all things OSS-Fuzz related.
================================================
FILE: infra/build/blog/content/posts/oss-fuzz-integrations-via-agent-based-build-generation.md
================================================
+++
authors = ["OSS-Fuzz Maintainers"]
title = "OSS-Fuzz integrations via agent-based build generation"
date = "2025-05-25"
description = "OSS-Fuzz integrations via agent-based build generation."
categories = [
"Fuzzing",
"Fuzzing synthesis",
"LLM",
"Automated fuzzing",
"Automated build script generation",
]
+++
# Introduction
As part of the [OSS-Fuzz-Gen](https://github.com/google/oss-fuzz-gen) project we have been working on making it easier for maintainers to integrate projects into OSS-Fuzz. Since OSS-Fuzz requires a specific build script in a format that uses the OSS-Fuzz build environment, the problem of automating OSS-Fuzz integrations is largely split in two parts: (1) creating a script for building the target project and relevant fuzzing harnesses and (2) creating the actual fuzzing harness. As such, any solution looking to automate OSS-Fuzz integrations must be able to solve both of these problems for a diverse set of open source projects.
The key goal for automating OSS-Fuzz integration is to support a full workflow that takes as input a project, e.g. GitHub repository, that is not integrated into OSS-Fuzz and outputs a generated OSS-Fuzz project with a working build script and one or more fuzzing harnesses. Furthermore, this workflow should be easily accessible and deployable, so open source maintainers can quickly leverage its features.
The primary focus of OSS-Fuzz-Gen has so far been on generating fuzzing harnesses for existing OSS-Fuzz projects. In a previous [blog post](https://blog.oss-fuzz.com/posts/introducing-llm-based-harness-synthesis-for-unfuzzed-projects/) we documented an end-to-end approach for OSS-Fuzz integrations. However, several significant limitations meant that this approach did not sufficiently solve the problems described above. Specifically, the build script generation relied on a template-based strategy, which limited its ability to create build scripts for a diverse set of projects. In this blog post, we present two improvements towards an end-to-end OSS-Fuzz integration workflow:
1) An agentic LLM-based approach for build script generation.
2) A CLI tool making this easy to access and run.
# Overview and sample run
The main goal of our approach is to automate the full end-to-end generation of an OSS-Fuzz project, by simply providing as input one or more Git repositories and then outputting a list of one or more OSS-Fuzz integrations. We expose these capabilities as a CLI in a Python package which makes it easy to install and run. To demonstrate the capabilities and give an intuition for the workflow, consider the below sample which generates an OSS-Fuzz project, including fuzzing harnesses, for [https://github.com/zserge/jsmn](https://github.com/zserge/jsmn).
```sh
# Prepare virtual environment
python3.11 -m virtualenv .venv
. .venv/bin/activate
# Clone OSS-Fuzz-gen
git clone https://github.com/google/oss-fuzz-gen
cd oss-fuzz-gen
# Install OSS-Fuzz-gen
python3 -m pip install .
# Generate fuzzers for https://github.com/zserge/jsmn
echo "https://github.com/zserge/jsmn" > input.txt
# Run the generation
# Setup Vertex AI access: https://github.com/google/oss-fuzz-gen/blob/main/USAGE.md#llm-access
oss-fuzz-generator generate-full -i input.txt -m vertex_ai_gemini-2-flash-chat --agent -w work-1
# List the files of generated project
$ ls final-oss-fuzz-projects/jsmn-agent/
build.sh Dockerfile empty-fuzzer.0.c empty-fuzzer.1.c project.yaml
```
The first step is to install the Python package, which is currently done by cloning OSS-Fuzz-Gen and then installing it using `python3 -m pip install .`. The installed package includes a CLI tool, `oss-fuzz-generator`, which exposes the OSS-Fuzz project generation capabilities. The only step following this is to set up your LLM environment and use the `generate-full` command. This command will generate a build script as well as fuzzing harnesses, and also merge all successful fuzzing harnesses into one single OSS-Fuzz project.
More specifically, `oss-fuzz-generator generate-full` will perform three main steps for each of the repositories in the `input.txt` file:
1) Generate a build script that will compile fuzzers of a given project
2) If step 1 was successful, generate fuzzing harnesses for the project
3) Merge successful fuzzing harnesses into a single OSS-Fuzz project
In this blog post, we will focus on step 1 above.
# Agent-based build generation
The agent-based approach to generating build scripts relies on three central components. First, an initial prompt that outlines the overall task and constraints for creating build scripts for a given arbitrary repository. Second, an agent that communicates with the LLM and executes arbitrary commands, provided by the LLM, within the environment where the build script will be run, allowing the LLM to explore the runtime environment. Third, a process for running generated build scripts as well as executing generated fuzzers to guide the output from the LLM. The overall algorithm for the agentic build generation workflow is as follows:
```
prompt = prepare_initial_prompt(target_repository)
llm_client = llm_start_chat()
while should_keep_going():
    llm_response = llm_client.chat(prompt)
    res = parse_llm_response(llm_response)
    if res.is_commands():
        # The LLM asked to run commands in the build environment.
        output = execute_commands(res.get_commands())
    elif res.has_build_script():
        output = build_and_run_fuzzer(res.get_build_script(), res.get_fuzz_harness())
        if output.has_successful_build_script():
            # Success in harness generation
            return output
    else:
        # Failure happened in parsing LLM output
        exit()
    # Prepare a next prompt for the LLM to chat.
    prompt = prepare_next_prompt(output)
```
The algorithm returns a successful build script if the `return output` line is reached. Specifically, this line is reached when the LLM has created a build script with an accompanying fuzz harness that can successfully build and link against the code of the target repository.
There are four key functions in the build generation algorithm:
- *parse_llm_response*: takes as input the raw text returned by the LLM and converts it into either (1) a list of commands to execute in the runtime environment where the fuzzers are built or (2) a build script with supplied fuzzing harness source code. The initial prompt created by `prepare_initial_prompt` instructs the LLM to generate text that conforms to a given format, such as by wrapping output in XML tags.
- *execute_commands*: if the LLM returns a list of commands as determined by `parse_llm_response`, then this function executes these commands in the runtime environment in which the build script is to be run. The main point of this is that it enables the LLM to explore, understand and test the runtime environment. Results from running the commands are returned to the LLM and since the agent runs in a loop, the LLM can continuously issue commands, interpret the output and act accordingly.
- *build_and_run_fuzzer*: if the LLM returns a build script and potentially a fuzzing harness (it may be empty) as determined by the `parse_llm_response`, then this function will build these artifacts in the runtime environment. The output of the build will be analysed, and, if a successful harness was built then the process is considered completed. In case the build was not successful, then the output of executing the build script is saved and will eventually be passed back to the LLM.
- *prepare_next_prompt*: if the iteration did not result in a successful build script, which happens either when the LLM returned a set of commands or when the returned build script failed, the output of that iteration is used to construct the next prompt for the LLM. Specifically, we use the output from `execute_commands` or `build_and_run_fuzzer` as input to the next prompt. The prompt itself wraps the output in a lightweight textual description and is then passed to `llm_client.chat` in the next agent iteration.
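To make the parsing step more concrete, the following is a minimal sketch of what a `parse_llm_response` could look like in Python. It is not the OSS-Fuzz-Gen implementation: the tag names `<command>`, `<build_script>` and `<fuzzer>`, the `ParsedResponse` class and the `_extract` helper are assumptions made for illustration, chosen only to match the method names used in the algorithm above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedResponse:
    """Result of parsing one LLM reply (hypothetical structure)."""
    commands: list = field(default_factory=list)
    build_script: str = ""
    fuzz_harness: str = ""

    def is_commands(self):
        return bool(self.commands)

    def has_build_script(self):
        return bool(self.build_script)


def _extract(tag, text):
    """Return the stripped contents of every <tag>...</tag> pair in text."""
    pattern = rf"<{tag}>(.*?)</{tag}>"
    return [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]


def parse_llm_response(text):
    """Classify a raw LLM reply as commands, a build script, or neither.

    The tag names used here are assumptions, not the real prompt format.
    """
    scripts = _extract("build_script", text)
    harnesses = _extract("fuzzer", text)
    return ParsedResponse(
        commands=_extract("command", text),
        build_script=scripts[0] if scripts else "",
        fuzz_harness=harnesses[0] if harnesses else "",
    )
```

With this shape, a reply containing neither commands nor a build script makes both `is_commands()` and `has_build_script()` return false, which corresponds to the parsing-failure branch of the algorithm.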
On success, the output of the above algorithm is a build script that can produce a successful fuzzing harness. The next step is then to use this build script as the foundation of an OSS-Fuzz project, and use OSS-Fuzz-Gen’s core capabilities of generating fuzzing harnesses for an OSS-Fuzz project.
# Empirical evaluation
To test our approach, we ran this on a set of GitHub repositories written in C/C++. In total we analysed 225 repositories which generated a total of 88 OSS-Fuzz integrations with valid build scripts. We furthermore ran the harness generation capabilities of OSS-Fuzz-Gen on these projects to confirm the validity of the build scripts. In this blog post we will focus on the results of the build generation and discuss bug-finding capabilities in a future blog post.
We found that the approach has many strengths in that it overcame hurdles posed by many of the projects, and that it’s able to create build scripts our template-based approach is not. In this section we will go through a few samples of the build scripts to highlight the capabilities and limitations of the approach.
### Libcypher-parser
Target: [https://github.com/cleishm/libcypher-parser](https://github.com/cleishm/libcypher-parser)
The build script generated for this project is:
```sh
# Build script for libcypher-parser with fuzzing harness
# Install necessary dependencies
apt-get update && apt-get install -y make autoconf automake libtool curl
# Install autoconf 2.71
cd /tmp
curl -O http://ftp.gnu.org/gnu/autoconf/autoconf-2.71.tar.gz
tar -xzf autoconf-2.71.tar.gz
cd autoconf-2.71
./configure
make
make install
# Install leg tool
cd /tmp
curl -L -O https://www.piumarta.com/software/peg/peg-0.1.18.tar.gz
tar -xzf peg-0.1.18.tar.gz
cd peg-0.1.18
make
make install
# Build libcypher-parser
cd $SRC/libcypher-parser
./autogen.sh
./configure --enable-static --disable-shared
make
# Copy the static library to the output directory
cp lib/src/.libs/libcypher-parser.a $OUT/
# Compile the fuzzing harness
for fuzzer in $(find $SRC -maxdepth 1 -name 'empty-fuzzer.*'); do
fuzzer_basename=$(basename $fuzzer .c)
$CC $CFLAGS -I$SRC/libcypher-parser/lib/src $fuzzer -o $OUT/${fuzzer_basename} \
-L$OUT -lcypher-parser -Wl,--whole-archive $LIB_FUZZING_ENGINE -Wl,--no-whole-archive
done
```
Overall, this is a fairly complex build in that it installs a version of autoconf different from the one already in the runtime environment, and also the “peg” tool. To identify that the “leg”/“peg” tool is needed, the agent had to interpret the output of building the project without it, as doing so leads to the following error when configuring libcypher-parser:
```sh
...
checking for leg... no
configure: error: leg is required: please install http://piumarta.com/software/peg/
ERROR:__main__:Building fuzzers failed.
```
### Yams
Target: [https://github.com/jpsim/Yams](https://github.com/jpsim/Yams)
The build script produced in this case is as follows:
```sh
#!/bin/bash
# Set up build environment
cd $SRC/Yams/Sources/CYaml
mkdir -p build
cd build
cmake .. -DCMAKE_C_COMPILER=$CC -DCMAKE_CXX_COMPILER=$CXX
make
# Compile the fuzzing harness
$CC $CFLAGS -I$SRC/Yams/Sources/CYaml/include -I$SRC $SRC/empty-fuzzer.c -o $OUT/empty-fuzzer -L. -Wl,--whole-archive libCYaml.a -Wl,--no-whole-archive $LIB_FUZZING_ENGINE
```
The build script is a simple cmake build script, but it passes arguments to the cmake command to ensure the target builds with the compilers specified by the OSS-Fuzz runtime environment. Interestingly, the building and linking of the fuzz harness in this context is not done in a loop. This works in principle, in that the project is able to build and link a fuzzing harness, but is limited in that it is unable to build and link an arbitrary number of fuzzing harnesses.
### Moment
Target: [https://github.com/kmussel/Moment](https://github.com/kmussel/Moment)
This is a library that has not been active for 10 years, so a particular challenge of this project is building a potentially old project. The build script generated is the following:
```sh
#!/bin/bash
# Install necessary tools
apt-get update && apt-get install -y bison flex
# Set source and output directories
SRC_DIR="$SRC/Moment"
OUT_DIR="$OUT"
# Preprocess parseIt.ym to remove the Foundation import
sed '/#import /d' $SRC_DIR/parseIt.ym > $SRC_DIR/parseIt_preprocessed.ym
# Use bison to generate parseIt.c and y.tab.h
bison -d -o $SRC_DIR/parseIt.c $SRC_DIR/parseIt_preprocessed.ym
# Rename the generated header to y.tab.h
mv $SRC_DIR/parseIt.h $SRC_DIR/y.tab.h
# Use flex to generate tokeIt.c
flex -o $SRC_DIR/tokeIt.c $SRC_DIR/tokeIt.l
# Compile the source files into object files
$CC $CFLAGS -c $SRC_DIR/TimeParser.c -o TimeParser.o
$CC $CFLAGS -c $SRC_DIR/parseIt.c -o parseIt.o
$CC $CFLAGS -I$SRC_DIR -c $SRC_DIR/tokeIt.c -o tokeIt.o
# Archive the object files into a static library
llvm-ar rcs libmoment.a TimeParser.o parseIt.o tokeIt.o
# Compile the fuzzing harness and link with the static library
$CC $CFLAGS -I$SRC_DIR $SRC/empty-fuzzer.c -o $OUT_DIR/empty-fuzzer -L. libmoment.a $LIB_FUZZING_ENGINE
```
Similar to libcypher-parser above, the impressive aspects of this build script are its complexity and its ability to download and install additional packages and use their tools as part of the build. The build script adjusts a file in the target project using `sed` and runs two tools, bison and flex, to generate code needed for the compilation. Furthermore, the project itself does not have any build system files, e.g. a `Makefile`, so the build agent falls back to compiling the source files directly.
# Limitations and future work
During our empirical evaluation we observed several limitations and possible improvements.
### Build scripts that succeed but avoid target source code
We observed several cases where the agent produced build scripts that avoid building the target source code altogether and only build an empty fuzzing harness. This goes undetected because the approach currently performs no post-processing analysis to validate that the target source code is part of the resulting binary. As a consequence, when the process moves on to harness generation, the build script cannot support the workflow because it never builds the target source code. Although this was a rare occurrence, a solution is required: either more rigorous post-processing to validate the completeness of the generated build script, or the ability to adjust the build script later in the end-to-end workflow.
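One possible form of such a post-processing check is to inspect the symbols of the built fuzzer and confirm that functions from the target project are present. The sketch below is an illustration under stated assumptions rather than part of OSS-Fuzz-Gen: it takes the text output of `nm --defined-only <binary>` and a list of expected target function names (how that list is obtained is left open), and both helper names are hypothetical.

```python
def defined_functions(nm_output):
    """Collect names of code (text-section) symbols from `nm` output."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols have three fields: address, type, name.
        # Types "T" and "t" mark global and local code symbols.
        if len(parts) == 3 and parts[1] in ("T", "t"):
            names.add(parts[2])
    return names


def target_functions_in_binary(nm_output, target_functions):
    """Return the expected target functions that appear in the binary."""
    return sorted(defined_functions(nm_output) & set(target_functions))
```

An empty result would suggest that the harness was linked without the target code. The check is only a heuristic: inlined functions and mangled C++ names would need extra handling.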
### Build scripts that can only build a single fuzzing harness
The approach instructs the LLMs to generate build scripts that can build an arbitrary number of fuzzing harnesses. This is to make it possible to use the build script for building and linking any number of fuzzing harnesses that the harness-generation part of OSS-Fuzz-gen produces. It is likely OSS-Fuzz-gen will end up producing more than one valuable fuzzing harness, and the build script should be able to build them all at the same time. We observed that this constraint is not always accepted and some build scripts end up with the capability of only building a single fuzzing harness, e.g. due to lack of a loop in the build instructions. The problem in this case is that the user of the tool will have to adjust the build script such that it can build an arbitrary number of fuzzing harnesses, assuming that two or more fuzzing harnesses are needed for the final OSS-Fuzz integration.
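A lightweight way to flag such build scripts is a textual heuristic over the generated script. The sketch below is illustrative only and not part of OSS-Fuzz-Gen: the function name is hypothetical, and it assumes harness files share a common prefix such as `empty-fuzzer`, as in the samples shown earlier. It reports a script as able to build multiple harnesses when it contains a shell loop and a wildcard pattern over the harness prefix.

```python
import re


def builds_multiple_harnesses(build_script, harness_prefix="empty-fuzzer"):
    """Heuristically decide whether a build script loops over harness files."""
    # A line that starts a shell loop, e.g. `for fuzzer in ...; do`.
    has_loop = re.search(r"^\s*(for|while)\b", build_script,
                         flags=re.MULTILINE) is not None
    # A wildcard on the harness prefix, e.g. `empty-fuzzer.*`.
    glob = re.escape(harness_prefix) + r"[^\s'\"]*\*"
    has_glob = re.search(glob, build_script) is not None
    return has_loop and has_glob
```

Applied to the samples above, the libcypher-parser script would pass this check while the Yams script would be flagged. A script can of course loop in ways this pattern does not recognise, so a negative result is a prompt for manual review rather than proof.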
### Integrating into target codebase’s build configurations
The current approach often yields build scripts that explicitly link fuzzing harnesses, using the CXX or CC environment variables, against static libraries built earlier in the build script. As such, the build scripts are composed of a set of commands as well as fuzzing harness source code. Another approach would be to integrate the building of the fuzzers into the build system of the target repository, such as by extending Makefiles and the like. Although this won’t make any practical difference as such, it may make the approach more friendly towards developers.
### Diagnosis and conclusions on failed generation
In the case where the build script generation fails, there is at the moment no real explanation as to why it failed. An extension to the workflow is to have another agent, or similar, that analyses why the build failed. For example, it is valuable to know whether the failure has a hard reason, such as the code not being compilable in the relevant build runtime, or whether the LLM simply wasn’t capable of finding a proper solution. A hard reason is in a sense a positive conclusion, since the tool can explicitly tell the user that the target project is not compatible with OSS-Fuzz (for example, if it’s a Windows-only project).
# Conclusion
In this blog post we have introduced a new capability of [OSS-Fuzz-Gen](https://github.com/google/oss-fuzz-gen) for producing OSS-Fuzz project integrations from scratch via an agent-based approach to build script generation. This is available through a CLI tool and only a single command is required for generating an OSS-Fuzz project. We ran the approach against a set of 225 projects which resulted in 88 OSS-Fuzz build scripts, and to highlight the capabilities we went through a set of these as well as identified limitations and future work.
The tooling is available on GitHub at [https://github.com/google/oss-fuzz-gen/tree/main/experimental/end_to_end](https://github.com/google/oss-fuzz-gen/tree/main/experimental/end_to_end) and we encourage users to try and run the tooling on their projects of interest. We are always happy to hear about projects where the build generation is not working and welcome users to submit this information as GitHub issues in the above repository.
================================================
FILE: infra/build/blog/hugo.toml
================================================
baseURL = ''
languageCode = 'en-us'
title = 'OSS-Fuzz blog'
theme = 'hugo-coder'
languagecode = "en"
defaultcontentlanguage = "en"
paginate = 20
[markup.highlight]
style = "github-dark"
[params]
info = "Fuzzing Open Source Software at Scale"
description = "OSS-Fuzz fuzzing blog"
keywords = "fuzzing, vulnerability research, open source, security"
avatarurl = "images/oss-fuzz-logo.png"
faviconSVG = "/img/favicon.svg"
favicon_32 = "/img/favicon-32x32.png"
favicon_16 = "/img/favicon-16x16.png"
since = 2024
enableTwemoji = true
colorScheme = "auto"
hidecolorschemetoggle = false
[taxonomies]
category = "categories"
series = "series"
tag = "tags"
author = "authors"
# Social links
[[params.social]]
name = "Github"
icon = "fa-brands fa-github fa-2x"
weight = 1
url = "https://github.com/google/oss-fuzz"
# Menu links
[[menu.main]]
name = "Blog"
weight = 1
url = "posts/"
[[menu.main]]
name = "About"
weight = 2
url = "about/"
================================================
FILE: infra/build/build_status/Dockerfile
================================================
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################
FROM gcr.io/oss-fuzz-base/base-runner
RUN mkdir -p /opt/oss-fuzz/infra/build_status
COPY infra/build/functions/* /opt/oss-fuzz/infra/build_status/
COPY infra/build/build_status/* /opt/oss-fuzz/infra/build_status/
RUN pip3 install -r /opt/oss-fuzz/infra/build_status/requirements.txt
ENTRYPOINT [ "python3", "/opt/oss-fuzz/infra/build_status/update_build_status.py" ]
================================================
FILE: infra/build/build_status/cloudbuild.yaml
================================================
steps:
- name: 'gcr.io/cloud-builders/docker'
  args:
  - build
  - '-t'
  - gcr.io/oss-fuzz-base/build-status
  - '-f'
  - infra/build/build_status/Dockerfile
  - .
- name: 'gcr.io/oss-fuzz-base/build-status'
  args: []
  timeout: 14400s
options:
  logging: CLOUD_LOGGING_ONLY
timeout: 14400s
================================================
FILE: infra/build/build_status/fuzz_introspector_page_gen.py
================================================
#!/usr/bin/python3
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Logic to create Fuzz Introspector overview page."""
import json
from urllib.request import urlopen
from bs4 import BeautifulSoup
TABLE_HEAD = """
"""
def refine_percentage_string(percentage_string):
  """Shortens a string to at most 5 characters and prepends a zero if necessary.
  We need to prepend the zero to make sorting in the final table accurate.
  """
  percentage_string = percentage_string.replace("%", "")
  if len(percentage_string.split(".")[0]) == 1:
    percentage_string = "0" + percentage_string
  if len(percentage_string) > 5:
    percentage_string = percentage_string[:5]
  # Check if the percentage is within the range [0.0 : 100.0].
  # Some old reports from 2022 have deprecated data, which we do not want to
  # display.
  float_val = float(percentage_string)
  if float_val < 0.0 or float_val > 100.0:
    # Raise exception to make the code display '-' elements.
    raise Exception('Out of range numbers')
  return percentage_string + "%"
def fetch_fuzz_introspector_summary(report_url):
  """Given a URL to an introspector report, returns a dictionary with data
  from the report. This includes, fuzzer count, reachability and code
  coverage.
  """
  # Extract json summary file.
  summary_url = report_url.replace('fuzz_report.html', 'summary.json')
  response = urlopen(summary_url)
  json_data = json.loads(response.read())
  # 1) Extract fuzzer count. This corresponds to all but two elements at the
  # top level of the dictionary.
  fuzzer_count = len(json_data) - 2
  # 2) Extract reachability count.
  reached_stats = "0.0%"
  if 'MergedProjectProfile' in json_data:
    if 'stats' in json_data['MergedProjectProfile']:
      merged_profile = json_data['MergedProjectProfile']
      reached_stats = merged_profile['stats']['reached-complexity-percentage']
      reached_stats = refine_percentage_string(str(reached_stats))
  # Extract code coverage stats.
  # Momentarily, we will get this from the HTML page because it's not yet
  # in the summary.json. This will change in the near future, but in the
  # spirit of time we keep it like this for now.
  fuzz_report_html = urlopen(report_url).read()
  soup = BeautifulSoup(fuzz_report_html, 'html.parser')
  target_divs = soup.findAll('text', {'class': 'percentage'})
  # The code coverage is the third instance of this text class.
  raw_code_coverage = target_divs[2].string.strip()
  code_coverage = refine_percentage_string(raw_code_coverage)
  return {
      'fuzzer_count': fuzzer_count,
      'project_complexity_reached': reached_stats,
      'code_coverage': code_coverage
  }
def get_fuzzer_introspector_project_summary(report_url):
  """Return dictionary containing summary of fuzz introspector project."""
  try:
    results_dict = fetch_fuzz_introspector_summary(report_url)
  except Exception:  # pylint: disable=broad-except
    results_dict = {
        'fuzzer_count': '-',
        'project_complexity_reached': '-',
        'code_coverage': '-'
    }
  return results_dict
def get_fuzz_introspector_row(project, report_url):
"""Creates a single row in the Fuzz Introspector HTML table."""
project_summary = get_fuzzer_introspector_project_summary(report_url)
return ("