Repository: microsoft/sca-fuzzer
Branch: main
Commit: 7cc132332d6a
Files: 378
Total size: 2.7 MB

Directory structure:
gitextract_198ykafh/
├── .editorconfig
├── .github/
│   ├── CODEOWNERS
│   └── workflows/
│       ├── kmodule-build.yaml
│       └── python-lint-and-test.yaml
├── .gitignore
├── .gitmodules
├── .pylintrc
├── AUTHORS
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── SECURITY.md
├── demo/
│   ├── README.md
│   ├── big-fuzz.yaml
│   ├── detect-foreshadow.yaml
│   ├── detect-mds.yaml
│   ├── detect-sco.yaml
│   ├── detect-v1-store.yaml
│   ├── detect-v1-var.yaml
│   ├── detect-v1.yaml
│   ├── detect-v4.yaml
│   ├── detect-zdi.yaml
│   ├── tsa-l1d/
│   │   ├── config.yaml
│   │   └── template.asm
│   └── tsa-sq/
│       ├── config.yaml
│       └── template.asm
├── docs/
│   ├── assets/
│   │   ├── branches.drawio
│   │   ├── dr-instrumentation.drawio
│   │   ├── dr-model.drawio
│   │   ├── fuzzing-flow.drawio
│   │   ├── tsa-sq-template.drawio
│   │   └── unicorn-model-state-machine.drawio
│   ├── faq/
│   │   └── general.md
│   ├── glossary.md
│   ├── howto/
│   │   ├── ask-a-question.md
│   │   ├── choose-contract.md
│   │   ├── design-campaign.md
│   │   ├── interpret-results.md
│   │   ├── minimize.md
│   │   ├── root-cause-a-violation.md
│   │   ├── use-macros.md
│   │   └── use-templates.md
│   ├── index.md
│   ├── internals/
│   │   ├── architecture/
│   │   │   ├── analysis.md
│   │   │   ├── code.md
│   │   │   ├── data.md
│   │   │   ├── exec.md
│   │   │   ├── fuzz.md
│   │   │   ├── isa.md
│   │   │   ├── logging.md
│   │   │   ├── mini.md
│   │   │   ├── model.md
│   │   │   └── overview.md
│   │   ├── code-structure.md
│   │   ├── contributing/
│   │   │   ├── code-style.md
│   │   │   ├── general.md
│   │   │   ├── git.md
│   │   │   └── overview.md
│   │   ├── index.md
│   │   └── model-backends/
│   │       ├── model-dr.md
│   │       └── model-unicorn.md
│   ├── intro/
│   │   ├── 01-overview.md
│   │   ├── 02-install.md
│   │   ├── 03-primer.md
│   │   ├── 04-tutorials.md
│   │   ├── start-here.md
│   │   └── tutorials/
│   │       ├── 01-first-fuzz.md
│   │       ├── 02-first-vuln.md
│   │       ├── 03-faults.md
│   │       ├── 04-isolation.md
│   │       ├── 05-extending.md
│   │       └── tsa-sq.md
│   ├── ref/
│   │   ├── artifact-file-formats.md
│   │   ├── binary-formats.md
│   │   ├── cli.md
│   │   ├── config.md
│   │   ├── index.md
│   │   ├── macros.md
│   │   ├── minimization-passes.md
│   │   ├── modes.md
│   │   ├── papers.md
│   │   ├── registers.md
│   │   ├── runtime-statistic.md
│   │   └── sandbox.md
│   ├── structure.md
│   ├── stylesheets/
│   │   └── extra.css
│   └── topics/
│       ├── actors.md
│       ├── contracts.md
│       ├── models.md
│       ├── test-case-generation.md
│       └── trace-analysis.md
├── mkdocs.yml
├── pyproject.toml
├── revizor.py
├── rvzr/
│   ├── __init__.py
│   ├── analyser.py
│   ├── arch/
│   │   ├── __init__.py
│   │   ├── arm64/
│   │   │   ├── __init__.py
│   │   │   ├── asm_parser.py
│   │   │   ├── config.py
│   │   │   ├── executor.py
│   │   │   ├── fuzzer.py
│   │   │   ├── generator.py
│   │   │   ├── get_spec.py
│   │   │   └── target_desc.py
│   │   └── x86/
│   │       ├── __init__.py
│   │       ├── asm_parser.py
│   │       ├── config.py
│   │       ├── executor.py
│   │       ├── fuzzer.py
│   │       ├── generator.py
│   │       ├── get_spec.py
│   │       └── target_desc.py
│   ├── asm_parser.py
│   ├── cli.py
│   ├── code_generator.py
│   ├── config.py
│   ├── data_generator.py
│   ├── elf_parser.py
│   ├── executor.py
│   ├── executor_km/
│   │   ├── .clang-format
│   │   ├── .gitignore
│   │   ├── Makefile
│   │   ├── arm64/
│   │   │   ├── asm_snippets.h
│   │   │   ├── entry_exit_points.h
│   │   │   ├── exception.S
│   │   │   ├── fault_handler.c
│   │   │   ├── macros.c
│   │   │   ├── page_tables_guest.c
│   │   │   ├── perf_counters.c
│   │   │   ├── registers.h
│   │   │   └── special_registers.c
│   │   ├── code_loader.c
│   │   ├── data_loader.c
│   │   ├── include/
│   │   │   ├── actor.h
│   │   │   ├── asm_snippets.h
│   │   │   ├── code_loader.h
│   │   │   ├── data_loader.h
│   │   │   ├── fault_handler.h
│   │   │   ├── hardware_desc.h
│   │   │   ├── input_parser.h
│   │   │   ├── macro_expansion.h
│   │   │   ├── main.h
│   │   │   ├── measurement.h
│   │   │   ├── page_tables_common.h
│   │   │   ├── page_tables_guest.h
│   │   │   ├── page_tables_host.h
│   │   │   ├── perf_counters.h
│   │   │   ├── sandbox_constants.h
│   │   │   ├── sandbox_manager.h
│   │   │   ├── shortcuts.h
│   │   │   ├── special_registers.h
│   │   │   ├── svm.h
│   │   │   ├── svm_constants.h
│   │   │   ├── test_case_parser.h
│   │   │   ├── vmx.h
│   │   │   └── vmx_config.h
│   │   ├── input_parser.c
│   │   ├── macro_expansion.c
│   │   ├── main.c
│   │   ├── measurement.c
│   │   ├── page_tables_host.c
│   │   ├── readme.md
│   │   ├── sandbox_manager.c
│   │   ├── test_case_parser.c
│   │   └── x86/
│   │       ├── asm_snippets.h
│   │       ├── entry_exit_points.h
│   │       ├── fault_handlers.S
│   │       ├── idt.c
│   │       ├── macros.c
│   │       ├── page_tables_guest.c
│   │       ├── perf_counters.c
│   │       ├── registers.h
│   │       ├── special_registers.c
│   │       ├── svm.c
│   │       └── vmx.c
│   ├── factory.py
│   ├── fuzzer.py
│   ├── instruction_spec.py
│   ├── isa_spec.py
│   ├── logs.py
│   ├── model.py
│   ├── model_dynamorio/
│   │   ├── Makefile
│   │   ├── __init__.py
│   │   ├── adapter/
│   │   │   ├── .clang-format
│   │   │   ├── .clang-tidy
│   │   │   ├── CMakeLists.txt
│   │   │   ├── main.c
│   │   │   ├── parser.c
│   │   │   ├── parser.h
│   │   │   ├── rcbf.h
│   │   │   ├── rdbf.h
│   │   │   ├── sandbox.c
│   │   │   ├── sandbox.h
│   │   │   ├── sandbox_const.h
│   │   │   └── test_case_entry.S
│   │   ├── backend/
│   │   │   ├── .clang-format
│   │   │   ├── .clang-tidy
│   │   │   ├── CMakeLists.txt
│   │   │   ├── cli.cpp
│   │   │   ├── dispatcher.cpp
│   │   │   ├── factory.cpp
│   │   │   ├── include/
│   │   │   │   ├── cli.hpp
│   │   │   │   ├── dispatcher.hpp
│   │   │   │   ├── factory.hpp
│   │   │   │   ├── logger.hpp
│   │   │   │   ├── observables.hpp
│   │   │   │   ├── speculator_abc.hpp
│   │   │   │   ├── speculators/
│   │   │   │   │   ├── cond.hpp
│   │   │   │   │   └── seq.hpp
│   │   │   │   ├── taint_tracker.hpp
│   │   │   │   ├── tracer_abc.hpp
│   │   │   │   ├── tracers/
│   │   │   │   │   ├── ct.hpp
│   │   │   │   │   ├── ind.hpp
│   │   │   │   │   └── pc.hpp
│   │   │   │   ├── types/
│   │   │   │   │   ├── debug_trace.hpp
│   │   │   │   │   ├── decoder.hpp
│   │   │   │   │   ├── file_buffer.hpp
│   │   │   │   │   ├── input_taint.hpp
│   │   │   │   │   ├── store_log.hpp
│   │   │   │   │   └── trace.hpp
│   │   │   │   └── util.hpp
│   │   │   ├── logger.cpp
│   │   │   ├── model.cpp
│   │   │   ├── speculator_abc.cpp
│   │   │   ├── speculators/
│   │   │   │   ├── cond.cpp
│   │   │   │   └── seq.cpp
│   │   │   ├── taint_tracker.cpp
│   │   │   ├── tracer_abc.cpp
│   │   │   ├── tracers/
│   │   │   │   ├── ct.cpp
│   │   │   │   ├── ind.cpp
│   │   │   │   └── pc.cpp
│   │   │   └── util.cpp
│   │   ├── model.py
│   │   └── trace_decoder.py
│   ├── model_unicorn/
│   │   ├── __init__.py
│   │   ├── coverage.py
│   │   ├── execution_context.py
│   │   ├── interpreter.py
│   │   ├── model.py
│   │   ├── speculator_abc.py
│   │   ├── speculators_basic.py
│   │   ├── speculators_fault.py
│   │   ├── speculators_vs.py
│   │   ├── taint_tracker.py
│   │   └── tracer.py
│   ├── postprocessing/
│   │   ├── __init__.py
│   │   ├── analysis_passes.py
│   │   ├── input_passes.py
│   │   ├── instruction_passes.py
│   │   ├── minimizer.py
│   │   ├── pass_abc.py
│   │   └── progress_printer.py
│   ├── py.typed
│   ├── sandbox.py
│   ├── stats.py
│   ├── target_desc.py
│   ├── tc_components/
│   │   ├── __init__.py
│   │   ├── actor.py
│   │   ├── instruction.py
│   │   ├── test_case_binary.py
│   │   ├── test_case_code.py
│   │   └── test_case_data.py
│   ├── traces.py
│   └── unicorn.pyi
└── tests/
    ├── .coveragerc
    ├── .gitignore
    ├── __init__.py
    ├── acceptance.bats
    ├── arm64/
    │   ├── asm/
    │   │   ├── actor_switch.asm
    │   │   ├── asm_basic.asm
    │   │   ├── asm_multiactor.asm
    │   │   ├── asm_symbol.asm
    │   │   ├── calls.asm
    │   │   ├── direct_jumps.asm
    │   │   ├── fault-div-zero-speculation.asm
    │   │   ├── fault_undefined_opcode.asm
    │   │   ├── macro_fault_handler.asm
    │   │   ├── model_flags_match.asm
    │   │   ├── model_match.asm
    │   │   ├── model_match_memory.asm
    │   │   ├── model_match_xmm.asm
    │   │   └── spectre_v1.asm
    │   ├── configs/
    │   │   ├── arch-actors.yaml
    │   │   ├── arch-faults.yaml
    │   │   ├── arch.yaml
    │   │   ├── archdiff.yaml
    │   │   ├── base-and-simd-categories.yaml
    │   │   ├── common.yaml
    │   │   ├── ct-cond.yaml
    │   │   ├── ct-seq.yaml
    │   │   ├── exceptions.yaml
    │   │   └── fault-handler.yaml
    │   ├── min_arm64.json
    │   ├── model_common.py
    │   ├── unit_generators.py
    │   └── unit_isa_loader.py
    ├── kernel_module.bats
    ├── pre-release.sh
    ├── quick-test.sh
    ├── runtests.sh
    ├── scripts/
    │   ├── create_rcbf_file.py
    │   └── create_rdbf_file.py
    ├── unit_analyser.py
    ├── unit_docs.py
    ├── unit_fuzzer.py
    ├── unit_isa_loader.py
    ├── unit_stats.py
    ├── unit_tc_components.py
    ├── unit_traces.py
    └── x86_tests/
        ├── __init__.py
        ├── asm/
        │   ├── actor_switch.asm
        │   ├── asm_basic.asm
        │   ├── asm_multiactor.asm
        │   ├── asm_symbol.asm
        │   ├── calls.asm
        │   ├── direct_jumps.asm
        │   ├── fault-div-overflow-speculation.asm
        │   ├── fault-div-zero-speculation.asm
        │   ├── fault_INT1.asm
        │   ├── fault_INT3.asm
        │   ├── fault_UD.asm
        │   ├── fault_load.asm
        │   ├── fault_ooo_mem_access.asm
        │   ├── fault_rmw.asm
        │   ├── macro_fault_handler.asm
        │   ├── minimization-after.asm
        │   ├── minimization-before.asm
        │   ├── model_flags_match.asm
        │   ├── model_match.asm
        │   ├── model_match_memory.asm
        │   ├── model_match_xmm.asm
        │   ├── spectre_ret.asm
        │   ├── spectre_v1.1.asm
        │   ├── spectre_v1.asm
        │   ├── spectre_v1_arch.asm
        │   ├── spectre_v1_independent.asm
        │   ├── spectre_v1_n2.asm
        │   ├── spectre_v2.asm
        │   ├── spectre_v4.asm
        │   └── vm_switch.asm
        ├── configs/
        │   ├── arch-actors.yaml
        │   ├── arch-dr.yaml
        │   ├── arch-faults.yaml
        │   ├── arch.yaml
        │   ├── archdiff.yaml
        │   ├── base-and-simd-categories.yaml
        │   ├── base-categories.yaml
        │   ├── common.yaml
        │   ├── copy.yaml
        │   ├── ct-cond.yaml
        │   ├── ct-deh.yaml
        │   ├── ct-seq.yaml
        │   ├── div-detect.yaml
        │   ├── div-verif.yaml
        │   ├── exceptions.yaml
        │   ├── fault-handler.yaml
        │   ├── l1tf-p-verif.yaml
        │   ├── l1tf-p.yaml
        │   ├── l1tf-w-verif.yaml
        │   ├── l1tf-w.yaml
        │   ├── meltdown-verif.yaml
        │   ├── meltdown.yaml
        │   ├── mpx-verif.yaml
        │   ├── mpx.yaml
        │   ├── ssbp-detect.yaml
        │   ├── ssbp-verif.yaml
        │   └── vm-switch.yaml
        ├── min_x86.json
        ├── model_common.py
        ├── unit_dr_decoder.py
        ├── unit_fuzzer.py
        ├── unit_generators.py
        ├── unit_isa_loader.py
        ├── unit_model.py
        └── unit_taint_tracker.py


================================================
FILE CONTENTS
================================================

================================================
FILE: .editorconfig
================================================
# https://editorconfig.org/
root = true

[*]
indent_style = space
indent_size = 4
insert_final_newline = true
trim_trailing_whitespace = true
end_of_line = lf
charset = utf-8
max_line_length = 100

[*.json]
indent_size = 2
keep_blank_lines_in_code = 0
keep_indents_on_empty_lines = false
keep_line_breaks = true
space_after_colon = true
space_after_comma = true
space_before_colon = true
space_before_comma = false
spaces_within_braces = false
spaces_within_brackets = false
wrap_long_lines = false
insert_final_newline = ignore

[Makefile]
indent_style = tab

[{*.bash,*.zsh,*.sh,*.bats}]
tab_width = 4
binary_ops_start_line = false
keep_column_alignment_padding = false
minify_program = false
redirect_followed_by_space = false
switch_cases_indented = false

[{*.yml,*.yaml}]
indent_size = 2
keep_indents_on_empty_lines = false
keep_line_breaks = true


================================================
FILE: .github/CODEOWNERS
================================================
* @OleksiiOleksenko


================================================
FILE: .github/workflows/kmodule-build.yaml
================================================
# This workflow will build the kernel module on multiple Ubuntu versions

name: Kmodule Build

on:
  push:
    branches:
      - main
      - main-fixes
      - pre-release
      - dev
  pull_request:
    branches:
      - main
      - main-fixes
      - pre-release
      - dev

jobs:
  km_build:
    permissions:
      contents: read
    strategy:
      fail-fast: false
      matrix:
        include:
          - runner: ubuntu-latest
            name: x86_latest
          - runner: ubuntu-22.04
            name: x86_backward_compatible
          - runner: ubuntu-24.04-arm
            name: arm_latest
          - runner: ubuntu-22.04-arm
            name: arm_backward_compatible

    runs-on: ${{ matrix.runner }}
    name: km_build_${{ matrix.name }}

    steps:
      - uses: actions/checkout@v4

      - name: Install kernel headers
        run: sudo apt-get update && sudo apt-get install -y linux-headers-$(uname -r) linux-headers-generic

      - name: Build kernel module
        run: |
          set -o pipefail
          cd rvzr/executor_km
          make VMBUILD=1 2>&1 | tee build.log
          if grep -q "Error" build.log; then
            echo "Build failed"
            exit 1
          fi


================================================
FILE: .github/workflows/python-lint-and-test.yaml
================================================
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: Python Lint and Test

permissions:
  contents: read

on:
  push:
    branches:
      - main
      - main-fixes
      - pre-release
      - dev
  pull_request:
    branches:
      - main
      - main-fixes
      - pre-release
      - dev

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.9", "3.13"]

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install flake8 mypy pylint
          python -m pip install .
      - name: Run
        run: |
          ./tests/runtests.sh --skip-km-tests


================================================
FILE: .gitignore
================================================
cmake-build-*/
build/
.vscode/
.mypy_cache/
.lsync*
venv/
**/__pycache__/
base.json
rvzr/arch/x86/*.json
*.code-workspace
*.o
rvzr/generated.asm
generated.asm
generated
rvzr/executor_km/.cache.mk
rvzr/executor_km/measurement.o.ur-safe
dbg/
site
dist/
.cache/
.claude/


================================================
FILE: .gitmodules
================================================


================================================
FILE: .pylintrc
================================================
[MAIN]

# Analyse import fallback blocks. This can be used to support both Python 2 and
# 3 compatible code, which means that the block might have code that exists
# only in one or another interpreter, leading to false positives when analysed.
analyse-fallback-blocks=no

# Clear in-memory caches upon conclusion of linting. Useful if running pylint
# in a server-like mode.
clear-cache-post-run=no

# Load and enable all available extensions. Use --list-extensions to see a list
# all available extensions.
#enable-all-extensions=

# In error mode, messages with a category besides ERROR or FATAL are
# suppressed, and no reports are done by default. Error mode is compatible with
# disabling specific errors.
#errors-only=

# Always return a 0 (non-error) status code, even if lint errors are found.
# This is primarily useful in continuous integration scripts.
#exit-zero=

# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code.
extension-pkg-allow-list=

# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code. (This is an alternative name to extension-pkg-allow-list
# for backward compatibility.)
extension-pkg-whitelist=

# Return non-zero exit code if any of these messages/categories are detected,
# even if score is above --fail-under value. Syntax same as enable. Messages
# specified are enabled, while categories only check already-enabled messages.
fail-on=

# Specify a score threshold under which the program will exit with error.
fail-under=10

# Interpret the stdin as a python script, whose filename needs to be passed as
# the module_or_package argument.
#from-stdin=

# Files or directories to be skipped. They should be base names, not paths.
ignore=CVS

# Add files or directories matching the regular expressions patterns to the
# ignore-list. The regex matches against paths and can be in Posix or Windows
# format. Because '\\' represents the directory delimiter on Windows systems,
# it can't be used as an escape character.
ignore-paths=

# Files or directories matching the regular expression patterns are skipped.
# The regex matches against base names, not paths. The default value ignores
# Emacs file locks
ignore-patterns=^\.#

# List of module names for which member attributes should not be checked and
# will not be imported (useful for modules/projects where namespaces are
# manipulated during runtime and thus existing member attributes cannot be
# deduced by static analysis). It supports qualified module names, as well as
# Unix pattern matching.
ignored-modules=

# Python code to execute, usually for sys.path manipulation such as
# pygtk.require().
#init-hook=

# Use multiple processes to speed up Pylint. Specifying 0 will auto-detect the
# number of processors available to use, and will cap the count on Windows to
# avoid hangs.
jobs=1

# Control the amount of potential inferred values when inferring a single
# object. This can help the performance when dealing with large functions or
# complex, nested conditions.
limit-inference-results=100

# List of plugins (as comma separated values of python module names) to load,
# usually to register additional checkers.
load-plugins=

# Pickle collected data for later comparisons.
persistent=yes

# Resolve imports to .pyi stubs if available. May reduce no-member messages and
# increase not-an-iterable messages.
prefer-stubs=no

# Minimum Python version to use for version dependent checks. Will default to
# the version used to run pylint.
py-version=3.12

# Discover python modules and packages in the file system subtree.
recursive=no

# Add paths to the list of the source roots. Supports globbing patterns. The
# source root is an absolute path or a path relative to the current working
# directory used to determine a package namespace for modules located under the
# source root.
source-roots=

# Allow loading of arbitrary C extensions. Extensions are imported into the
# active Python interpreter and may run arbitrary code.
unsafe-load-any-extension=no

# In verbose mode, extra non-checker-related info will be displayed.
#verbose=


[BASIC]

# Naming style matching correct argument names.
argument-naming-style=snake_case

# Regular expression matching correct argument names. Overrides argument-
# naming-style. If left empty, argument names will be checked with the set
# naming style.
#argument-rgx=

# Naming style matching correct attribute names.
attr-naming-style=snake_case

# Regular expression matching correct attribute names. Overrides attr-naming-
# style. If left empty, attribute names will be checked with the set naming
# style.
#attr-rgx=

# Bad variable names which should always be refused, separated by a comma.
bad-names=foo,
          bar,
          baz,
          toto,
          tutu,
          tata

# Bad variable names regexes, separated by a comma. If names match any regex,
# they will always be refused
bad-names-rgxs=

# Naming style matching correct class attribute names.
class-attribute-naming-style=any

# Regular expression matching correct class attribute names. Overrides class-
# attribute-naming-style. If left empty, class attribute names will be checked
# with the set naming style.
#class-attribute-rgx=

# Naming style matching correct class constant names.
class-const-naming-style=UPPER_CASE

# Regular expression matching correct class constant names. Overrides class-
# const-naming-style. If left empty, class constant names will be checked with
# the set naming style.
#class-const-rgx=

# Naming style matching correct class names.
class-naming-style=PascalCase

# Regular expression matching correct class names. Overrides class-naming-
# style. If left empty, class names will be checked with the set naming style.
#class-rgx=

# Naming style matching correct constant names.
const-naming-style=UPPER_CASE

# Regular expression matching correct constant names. Overrides const-naming-
# style. If left empty, constant names will be checked with the set naming
# style.
#const-rgx=

# Minimum line length for functions/classes that require docstrings, shorter
# ones are exempt.
docstring-min-length=-1

# Naming style matching correct function names.
function-naming-style=snake_case

# Regular expression matching correct function names. Overrides function-
# naming-style. If left empty, function names will be checked with the set
# naming style.
#function-rgx=

# Good variable names which should always be accepted, separated by a comma.
good-names=i,
           j,
           k,
           ex,
           Run,
           _

# Good variable names regexes, separated by a comma. If names match any regex,
# they will always be accepted
good-names-rgxs=

# Include a hint for the correct naming format with invalid-name.
include-naming-hint=no

# Naming style matching correct inline iteration names.
inlinevar-naming-style=any

# Regular expression matching correct inline iteration names. Overrides
# inlinevar-naming-style. If left empty, inline iteration names will be checked
# with the set naming style.
#inlinevar-rgx=

# Naming style matching correct method names.
method-naming-style=snake_case

# Regular expression matching correct method names. Overrides method-naming-
# style. If left empty, method names will be checked with the set naming style.
#method-rgx=

# Naming style matching correct module names.
module-naming-style=snake_case

# Regular expression matching correct module names. Overrides module-naming-
# style. If left empty, module names will be checked with the set naming style.
#module-rgx=

# Colon-delimited sets of names that determine each other's naming style when
# the name regexes allow several styles.
name-group=

# Regular expression which should only match function or class names that do
# not require a docstring.
no-docstring-rgx=^_

# List of decorators that produce properties, such as abc.abstractproperty. Add
# to this list to register other decorators that produce valid properties.
# These decorators are taken in consideration only for invalid-name.
property-classes=abc.abstractproperty

# Regular expression matching correct type alias names. If left empty, type
# alias names will be checked with the set naming style.
#typealias-rgx=

# Regular expression matching correct type variable names. If left empty, type
# variable names will be checked with the set naming style.
#typevar-rgx=

# Naming style matching correct variable names.
variable-naming-style=snake_case

# Regular expression matching correct variable names. Overrides variable-
# naming-style. If left empty, variable names will be checked with the set
# naming style.
#variable-rgx=


[CLASSES]

# Warn about protected attribute access inside special methods
check-protected-access-in-special-methods=no

# List of method names used to declare (i.e. assign) instance attributes.
defining-attr-methods=__init__,
                      __new__,
                      setUp,
                      asyncSetUp,
                      __post_init__

# List of member names, which should be excluded from the protected access
# warning.
exclude-protected=_asdict,_fields,_replace,_source,_make,os._exit

# List of valid names for the first argument in a class method.
valid-classmethod-first-arg=cls

# List of valid names for the first argument in a metaclass class method.
valid-metaclass-classmethod-first-arg=mcs


[DESIGN]

# List of regular expressions of class ancestor names to ignore when counting
# public methods (see R0903)
exclude-too-few-public-methods=

# List of qualified class names to ignore when counting class parents (see
# R0901)
ignored-parents=

# Maximum number of arguments for function / method.
max-args=9
# NOTE: non-default (5) because we rely on data classes with many attributes

# Maximum number of attributes for a class (see R0902).
max-attributes=12
# NOTE: non-default (5) because we rely on data classes with many attributes

# Maximum number of boolean expressions in an if statement (see R0916).
max-bool-expr=5

# Maximum number of branch for function / method body.
max-branches=12

# Maximum number of locals for function / method body.
max-locals=15

# Maximum number of parents for a class (see R0901).
max-parents=7

# Maximum number of public methods for a class (see R0904).
max-public-methods=20

# Maximum number of return / yield for function / method body.
max-returns=6

# Maximum number of statements in function / method body.
max-statements=50

# Minimum number of public methods for a class (see R0903).
min-public-methods=1


[EXCEPTIONS]

# Exceptions that will emit a warning when caught.
overgeneral-exceptions=builtins.BaseException,builtins.Exception


[FORMAT]

# Expected format of line ending, e.g. empty (any line ending), LF or CRLF.
expected-line-ending-format=

# Regexp for a line that is allowed to be longer than the limit.
ignore-long-lines=^\s*(# )?<?https?://\S+>?$

# Number of spaces of indent required inside a hanging or continued line.
indent-after-paren=4

# String used as indentation unit. This is usually "    " (4 spaces) or "\t" (1
# tab).
indent-string='    '

# Maximum number of characters on a single line.
max-line-length=100

# Maximum number of lines in a module.
max-module-lines=1000

# Allow the body of a class to be on the same line as the declaration if body
# contains single statement.
single-line-class-stmt=no

# Allow the body of an if to be on the same line as the test if there is no
# else.
single-line-if-stmt=no


[IMPORTS]

# List of modules that can be imported at any level, not just the top level
# one.
allow-any-import-level=

# Allow explicit reexports by alias from a package __init__.
allow-reexport-from-package=no

# Allow wildcard imports from modules that define __all__.
allow-wildcard-with-all=no

# Deprecated modules which should not be used, separated by a comma.
deprecated-modules=

# Output a graph (.gv or any supported image format) of external dependencies
# to the given file (report RP0402 must not be disabled).
ext-import-graph=

# Output a graph (.gv or any supported image format) of all (i.e. internal and
# external) dependencies to the given file (report RP0402 must not be
# disabled).
import-graph=

# Output a graph (.gv or any supported image format) of internal dependencies
# to the given file (report RP0402 must not be disabled).
int-import-graph=

# Force import order to recognize a module as part of the standard
# compatibility libraries.
known-standard-library=

# Force import order to recognize a module as part of a third party library.
known-third-party=enchant

# Couples of modules and preferred modules, separated by a comma.
preferred-modules=


[LOGGING]

# The type of string formatting that logging methods do. `old` means using %
# formatting, `new` is for `{}` formatting.
logging-format-style=old

# Logging modules to check that the string format arguments are in logging
# function parameter format.
logging-modules=logging


[MESSAGES CONTROL]

# Only show warnings with the listed confidence levels. Leave empty to show
# all. Valid levels: HIGH, CONTROL_FLOW, INFERENCE, INFERENCE_FAILURE,
# UNDEFINED.
confidence=HIGH,
           CONTROL_FLOW,
           INFERENCE,
           INFERENCE_FAILURE,
           UNDEFINED

# Disable the message, report, category or checker with the given id(s). You
# can either give multiple identifiers separated by comma (,) or put this
# option multiple times (only on the command line, not in the configuration
# file where it should appear only once). You can also use "--disable=all" to
# disable everything first and then re-enable specific checks. For example, if
# you want to run only the similarities checker, you can use "--disable=all
# --enable=similarities". If you want to run only the classes checker, but have
# no Warning level messages displayed, use "--disable=all --enable=classes
# --disable=W".
disable=W0511, # disable warnings on FIXME tag
        # invalid-name: we actively use Final to define read-only attributes,
        # so using UPPERCASE everywhere would lead to messy code
        c0103,
        # use-yield-from: the replacement does not always produce the same result functionally
        # and it breaks the code, so we disable this warning
        r1737,
        # unspecified-encoding: Revizor runs only on Linux, so we don't need to specify encoding
        w1514,
        # too-many-positional-arguments
        # NOTE: we use data classes with many attributes
        r0917,
        # too-few-public-methods: we use data classes with many attributes
        r0903,
        # raise-missing-from
        w0707,
        # consider-using-sys-exit: just meh
        r1722,

# Enable the message, report, category or checker with the given id(s). You can
# either give multiple identifier separated by comma (,) or put this option
# multiple time (only on the command line, not in the configuration file where
# it should appear only once). See also the "--disable" option for examples.
enable=


[METHOD_ARGS]

# List of qualified names (i.e., library.method) which require a timeout
# parameter e.g. 'requests.api.get,requests.api.post'
timeout-methods=requests.api.delete,requests.api.get,requests.api.head,requests.api.options,requests.api.patch,requests.api.post,requests.api.put,requests.api.request


[MISCELLANEOUS]

# List of note tags to take in consideration, separated by a comma.
notes=FIXME,
      XXX,
      TODO

# Regular expression of note tags to take in consideration.
notes-rgx=


[REFACTORING]

# Maximum number of nested blocks for function / method body
max-nested-blocks=5

# Complete name of functions that never returns. When checking for
# inconsistent-return-statements if a never returning function is called then
# it will be considered as an explicit return statement and no message will be
# printed.
never-returning-functions=sys.exit,argparse.parse_error

# Let 'consider-using-join' be raised when the separator to join on would be
# non-empty (resulting in expected fixes of the type: ``"- " + "
# - ".join(items)``)
suggest-join-with-non-empty-separator=yes


[REPORTS]

# Python expression which should return a score less than or equal to 10. You
# have access to the variables 'fatal', 'error', 'warning', 'refactor',
# 'convention', and 'info' which contain the number of messages in each
# category, as well as 'statement' which is the total number of statements
# analyzed. This score is used by the global evaluation report (RP0004).
evaluation=max(0, 0 if fatal else 10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10))

# Template used to display messages. This is a python new-style format string
# used to format the message information. See doc for all details.
msg-template=

# Set the output format. Available formats are: text, parseable, colorized,
# json2 (improved json format), json (old json format) and msvs (visual
# studio). You can also give a reporter class, e.g.
# mypackage.mymodule.MyReporterClass.
#output-format=

# Tells whether to display a full report or only the messages.
reports=no

# Activate the evaluation score.
score=yes


[SIMILARITIES]

# Comments are removed from the similarity computation
ignore-comments=yes

# Docstrings are removed from the similarity computation
ignore-docstrings=yes

# Imports are removed from the similarity computation
ignore-imports=yes

# Signatures are removed from the similarity computation
ignore-signatures=yes

# Minimum lines number of a similarity.
min-similarity-lines=8


[SPELLING]

# Limits count of emitted suggestions for spelling mistakes.
max-spelling-suggestions=4

# Spelling dictionary name. No available dictionaries : You need to install
# both the python package and the system dependency for enchant to work.
spelling-dict=

# List of comma separated words that should be considered directives if they
# appear at the beginning of a comment and should not be checked.
spelling-ignore-comment-directives=fmt: on,fmt: off,noqa:,noqa,nosec,isort:skip,mypy:

# List of comma separated words that should not be checked.
spelling-ignore-words=

# A path to a file that contains the private dictionary; one word per line.
spelling-private-dict-file=

# Tells whether to store unknown words to the private dictionary (see the
# --spelling-private-dict-file option) instead of raising a message.
spelling-store-unknown-words=no


[STRING]

# This flag controls whether inconsistent-quotes generates a warning when the
# character used as a quote delimiter is used inconsistently within a module.
check-quote-consistency=no

# This flag controls whether the implicit-str-concat should generate a warning
# on implicit string concatenation in sequences defined over several lines.
check-str-concat-over-line-jumps=no


[TYPECHECK]

# List of decorators that produce context managers, such as
# contextlib.contextmanager. Add to this list to register other decorators that
# produce valid context managers.
contextmanager-decorators=contextlib.contextmanager

# List of members which are set dynamically and missed by pylint inference
# system, and so shouldn't trigger E1101 when accessed. Python regular
# expressions are accepted.
generated-members=

# Tells whether to warn about missing members when the owner of the attribute
# is inferred to be None.
ignore-none=yes

# This flag controls whether pylint should warn about no-member and similar
# checks whenever an opaque object is returned when inferring. The inference
# can return multiple potential results while evaluating a Python object, but
# some branches might not be evaluated, which results in partial inference. In
# that case, it might be useful to still emit no-member and other checks for
# the rest of the inferred objects.
ignore-on-opaque-inference=yes

# List of symbolic message names to ignore for Mixin members.
ignored-checks-for-mixins=no-member,
                          not-async-context-manager,
                          not-context-manager,
                          attribute-defined-outside-init

# List of class names for which member attributes should not be checked (useful
# for classes with dynamically set attributes). This supports the use of
# qualified names.
ignored-classes=optparse.Values,thread._local,_thread._local,argparse.Namespace

# Show a hint with possible names when a member name was not found. The aspect
# of finding the hint is based on edit distance.
missing-member-hint=yes

# The minimum edit distance a name should have in order to be considered a
# similar match for a missing member name.
missing-member-hint-distance=1

# The total number of similar names that should be taken in consideration when
# showing a hint for a missing member.
missing-member-max-choices=1

# Regex pattern to define which classes are considered mixins.
mixin-class-rgx=.*[Mm]ixin

# List of decorators that change the signature of a decorated function.
signature-mutators=


[VARIABLES]

# List of additional names supposed to be defined in builtins. Remember that
# you should avoid defining new builtins when possible.
additional-builtins=

# Tells whether unused global variables should be treated as a violation.
allow-global-unused-variables=yes

# List of names allowed to shadow builtins
allowed-redefined-builtins=

# List of strings which can identify a callback function by name. A callback
# name must start or end with one of those strings.
callbacks=cb_,
          _cb

# A regular expression matching the name of dummy variables (i.e. expected to
# not be used).
dummy-variables-rgx=_+$|(_[a-zA-Z0-9_]*[a-zA-Z0-9]+?$)|dummy|^ignored_|^unused_

# Argument names that match this expression will be ignored.
ignored-argument-names=_.*|^ignored_|^unused_

# Tells whether we should check for unused import in __init__ files.
init-import=no

# List of qualified module names which can have objects that can redefine
# builtins.
redefining-builtins-modules=six.moves,past.builtins,future.builtins,builtins,io


================================================
FILE: AUTHORS
================================================
Here is an inevitably incomplete list of MUCH-APPRECIATED CONTRIBUTORS:

Oleksii Oleksenko
Boris Koepf
Emanuele Vannacci
Jana Hofmann
Connor Shugg
Marco Guarnieri
Flavien Solt
Brian Fu
Alvise de Faveri Tron


================================================
FILE: CHANGELOG.md
================================================
# Changelog

All notable changes to Revizor will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.0.0] - 2026-01-10

### TL;DR

This release contains a major refactoring of the codebase, including many of the core modules.
This breaks compatibility with previous versions, hence the major version bump.

In addition, several significant enhancements have been made:

- ARM64 is now fully supported.
- New DynamoRIO-based model backend has been added, which vastly improves ISA coverage on x86.
- The documentation has been fully restructured and expanded.
### Added #### ARM64 Support - Full hardware tracing support for ARM64 CPUs (#137) - ARM64 executor, fuzzer, and code generator implementations - ARM64 test suite with acceptance and unit tests - ARM64 ISA specification and target description #### DynamoRIO Model Backend - New DynamoRIO-based model backend added, which completely re-implements the leakage modeling functionality - New tracers: indirect memory access (IND) tracer and poisoning of faulty loads (#133) - Contract-based input generation for DynamoRIO backend (#138) #### Documentation - Complete documentation restructure with tutorials, reference guides, and topic guides - Five comprehensive tutorials covering the first fuzzing campaign, vulnerability detection, fault handling, isolation, and extending Revizor - Detailed primer on contracts and leakage models - In-depth guides on choosing contracts, designing campaigns, interpreting results, and root-causing violations - Architecture overview with detailed diagrams - DynamoRIO backend instrumentation diagrams - Sandbox and binary format documentation - Actor and test case generation topics - Glossary of key terms #### Demos and Examples - TSA-L1D demo configuration and template - TSA-SQ demo files - Improved detection demos for various Spectre variants #### Testing and Development - Unified tests for Unicorn and DynamoRIO backends - Unit tests for traces, stats, and test case components - Utility scripts for generating RCBF/RDBF test files - Interface to run individual testing stages - Improved test coverage and CI integration #### Misc. Features - Special value generation option for input data (not just random values) - More verbose configuration error messages - Better visibility for warnings in logger output - Support for FS/GS segment register instructions in ISA specification - Input differential minimization for observer actors ### Changed **WARNING**: This release contains breaking changes!
The release introduces a complete refactoring of the code structure, including many of the core modules. See docs/internals/architecture/overview.md for details. #### Code Structure - Renamed source directory from src/ to rvzr/ for better compliance with Python packaging standards - Encapsulated all core components into dedicated modules (sandbox.py, actor.py, etc.) - Moved all test case components into a dedicated directory rvzr/tc_components - Refactored fuzzer.py to isolate the multi-stage filtering logic into a dedicated class - Isolated utility classes into dedicated modules stats.py and logs.py - Unicorn-based backend split into logical classes: Tracer, Speculator, TaintTracker, etc. (rvzr/model_unicorn) - Reorganized into architecture-specific subdirectories (rvzr/arch/x86, rvzr/arch/arm64) - Minimizer refactored to encapsulate each pass into a separate class (rvzr/postprocessing) - Executor KM is now shared between x86 and ARM to avoid code duplication - Consistent naming conventions for generators across architectures - Improved code style and formatting #### Configuration Options - Many config options have been renamed during the refactoring process - Refer to the updated documentation (`docs/ref/config.md`) for the new option names and their usage. #### ISA Spec Format - Renamed several fields in the JSON produced by the `download_spec` command #### Testing Infrastructure - Cleaner interface for test scripts - GitHub Actions aligned with internal test scripts #### Documentation Structure - Reorganized into intro/, howto/, ref/, topics/, and internals/ sections - Split architecture documentation into per-module pages - Updated navigation structure in MkDocs ### Deprecated - MPX support --- ## [1.3.2] - 2024-09-12 See git history for changes in version 1.3.2 and earlier.
[1.3.3]: https://github.com/microsoft/side-channel-fuzzer/compare/v1.3.2...v1.3.3 [1.3.2]: https://github.com/microsoft/side-channel-fuzzer/releases/tag/v1.3.2 ================================================ FILE: CODE_OF_CONDUCT.md ================================================ # Microsoft Open Source Code of Conduct This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). Resources: - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/) - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns ================================================ FILE: CONTRIBUTING.md ================================================ # Contributing As an open source project, Revizor welcomes contributions and suggestions. ## Contributor License Agreement and Code of Conduct Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com. When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA. This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. 
## Contribution Guidelines Please refer to the [Guide to Contributing](https://microsoft.github.io/side-channel-fuzzer/internals/contributing/overview/) for an overview of how to contribute. ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) Microsoft Corporation. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE ================================================ FILE: README.md ================================================ # Revizor ![GitHub](https://img.shields.io/github/license/microsoft/side-channel-fuzzer) ![GitHub all releases](https://img.shields.io/github/downloads/microsoft/side-channel-fuzzer/total) ![GitHub contributors](https://img.shields.io/github/contributors/microsoft/side-channel-fuzzer) ![PyPI](https://img.shields.io/pypi/v/revizor-fuzzer?label=PyPI) ![PyPI - Downloads](https://img.shields.io/pypi/dm/revizor-fuzzer?label=%22PyPI%20Downloads%22) Revizor is a security-oriented fuzzer for detecting information leaks in CPUs, such as [Spectre and Meltdown](https://meltdownattack.com/). It tests CPUs against [Leakage Contracts](https://arxiv.org/abs/2006.03841) and searches for unexpected leaks. ## Getting Started and Documentation You can find a quick start guide at [Quick Start](https://microsoft.github.io/side-channel-fuzzer/intro/start-here/). For detailed information on how to use Revizor, see [Documentation Pages](https://microsoft.github.io/side-channel-fuzzer/structure/). For information on how to contribute to Revizor, see [CONTRIBUTING.md](CONTRIBUTING.md). ## Need Help with Revizor? If you find a bug in Revizor, don't hesitate to [open an issue](https://github.com/microsoft/side-channel-fuzzer/issues). If something is confusing or you need help in using Revizor, we have a [discussion page](https://github.com/microsoft/side-channel-fuzzer/discussions). ## Citing Revizor To cite this project, you can use any of the following references: 1. 
Original paper that introduced the concept of Model-based Relation Testing as well as the Revizor tool: Oleksii Oleksenko, Christof Fetzer, Boris Köpf, Mark Silberstein. "[Revizor: Testing Black-box CPUs against Speculation Contracts](https://www.microsoft.com/en-us/research/publication/revizor-testing-black-box-cpus-against-speculation-contracts/)" in Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022. 2. Theoretical foundations of leakage contracts: Marco Guarnieri, Boris Köpf, Jan Reineke, and Pepe Vila. "[Hardware-software contracts for secure speculation](https://www.microsoft.com/en-us/research/publication/hardware-software-contracts-for-secure-speculation/)" in Proceedings of the 2021 IEEE Symposium on Security and Privacy (SP), 2021. 3. Accessible summary of the two papers above, in a journal format: Oleksii Oleksenko, Christof Fetzer, Boris Köpf, Mark Silberstein. "Revizor: Testing Black-box CPUs against Speculation Contracts". In IEEE Micro, 2023. 4. Paper that introduced speculation filtering, observation filtering, and contract-based input generation: Oleksii Oleksenko, Marco Guarnieri, Boris Köpf, and Mark Silberstein. "[Hide and Seek with Spectres: Efficient discovery of speculative information leaks with random testing](https://www.microsoft.com/en-us/research/publication/hide-and-seek-with-spectres-efficient-discovery-of-speculative-information-leaks-with-random-testing/)" in Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP), 2023. 5. Paper that introduced exception-based testing (i.e., a focus on Meltdown and Foreshadow) into Revizor: Jana Hofmann, Emanuele Vannacci, Cédric Fournet, Boris Köpf, and Oleksii Oleksenko.
"[Speculation at Fault: Modeling and Testing Microarchitectural Leakage of CPU Exceptions.](https://www.usenix.org/conference/usenixsecurity23/presentation/hofmann)" in Proceedings of 32nd USENIX Security Symposium (USENIX Security), 2023. 6. Paper that introduced testing of cross-VM and user-kernel leaks in Revizor, as well as presented TSA attacks on AMD CPUs: Oleksii Oleksenko, Flavien Solt, Cédric Fournet, Jana Hofmann, Boris Köpf, and Stavros Volos. "[Enter, Exit, Page Fault, Leak: Testing Isolation Boundaries for Microarchitectural Leaks](https://www.microsoft.com/en-us/research/wp-content/uploads/2025/07/Enter-Exit-SP26.pdf)" (to be published) in Proceedings of the 2026 IEEE Symposium on Security and Privacy (SP), 2026. ## Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies. ================================================ FILE: SECURITY.md ================================================ ## Security Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/). 
If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below. ## Reporting Security Issues **Please do not report security vulnerabilities through public GitHub issues.** Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report). If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc). You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc). Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue: * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.) * Full paths of source file(s) related to the manifestation of the issue * The location of the affected source code (tag/branch/commit or direct URL) * Any special configuration required to reproduce the issue * Step-by-step instructions to reproduce the issue * Proof-of-concept or exploit code (if possible) * Impact of the issue, including how an attacker might exploit the issue This information will help us triage your report more quickly. If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. 
Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs. ## Preferred Languages We prefer all communications to be in English. ## Policy Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd). ================================================ FILE: demo/README.md ================================================ This directory contains a set of demo configurations for fuzzing various known CPU vulnerabilities using Revizor. Each config here is intentionally made to detect only one type of vulnerability. For example, if you fuzz an Intel CPU with `detect-v1.yaml`, you will likely detect an instance of Spectre V1. (Of course, there is always a chance that you will find a new, previously unknown vulnerability with this config, but the likelihood is rather low.) The commands below assume that the ISA spec (downloaded via `rvzr download_spec`) is stored in `base.json`. ## [Spectre V1](https://meltdownattack.com/) ``` rvzr fuzz -s base.json -c demo/detect-v1.yaml -i 50 -n 10000 ``` Expected duration - several seconds. ## Spectre V1 (store variant) ``` rvzr fuzz -s base.json -c demo/detect-v1-store.yaml -i 50 -n 10000 ``` Expected duration - several seconds. ## Spectre V1-Var (descriptions [here](https://dl.acm.org/doi/10.1145/3503222.3507729) and [here](https://eprint.iacr.org/2022/715.pdf)) ``` rvzr fuzz -s base.json -c demo/detect-v1-var.yaml -i 50 -n 10000 ``` Expected duration - several hours. ## [MDS](https://mdsattacks.com/) or [LVI-Null](https://lviattack.eu/), depending on the CPU model Note: only Intel CPUs. ``` rvzr fuzz -s base.json -c demo/detect-mds.yaml -i 50 -n 10000 ``` Expected duration - several minutes.
## Spectre V4 ([description](https://www.cyberus-technology.de/posts/2018-05-22-intel-store-load-spectre-vulnerability.html)) ``` rvzr fuzz -s base.json -c demo/detect-v4.yaml -i 50 -n 10000 ``` Expected duration - 5-20 minutes. ## Zero Divisor Injection (ZDI) Note: only Intel CPUs. ``` rvzr fuzz -s base.json -c demo/detect-zdi.yaml -i 50 -n 10000 ``` Expected duration - several minutes. ## String Comparison Overrun (SCO) ``` rvzr fuzz -s base.json -c demo/detect-sco.yaml -i 50 -n 10000 ``` Expected duration - several minutes. ## Foreshadow (simplified version) Note: only Intel CPUs. ``` rvzr fuzz -s base.json -c demo/detect-foreshadow.yaml -i 50 -n 10000 ``` Expected duration - several minutes. ## Transient Scheduler Attack, Store Queue variant (TSA-SQ) Note: only AMD CPUs vulnerable to TSA. ``` rvzr tfuzz -s base.json -c demo/tsa-sq/config.yaml -t demo/tsa-sq/template.asm -i 50 -n 10000 ``` Expected duration - several minutes. ## Transient Scheduler Attack, L1D Cache variant (TSA-L1D) Note: only AMD CPUs vulnerable to TSA. ``` rvzr tfuzz -s base.json -c demo/tsa-l1d/config.yaml -t demo/tsa-l1d/template.asm -i 50 -n 10000 ``` Expected duration - several minutes.
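If you want to run several of the demos above in sequence, a small driver script can assemble the same commands for you. The following is a minimal, hypothetical sketch and not part of Revizor: the `DEMOS` table, `build_command`, `main`, and the `--run` flag are illustrative names, and the script assumes that `rvzr` is on your `PATH` and that the ISA spec is stored in `base.json`, as stated above. By default it only prints the commands (a dry run).

```python
"""Hypothetical helper (not part of Revizor) that builds the demo commands.

Assumes `rvzr` is on PATH and the ISA spec is in base.json, as in this README.
"""
import shlex
import subprocess
import sys
from typing import List

SPEC = "base.json"

# Demo name -> (config, template); template is None for plain `rvzr fuzz` demos
DEMOS = {
    "v1": ("demo/detect-v1.yaml", None),
    "v1-store": ("demo/detect-v1-store.yaml", None),
    "v1-var": ("demo/detect-v1-var.yaml", None),
    "mds": ("demo/detect-mds.yaml", None),
    "v4": ("demo/detect-v4.yaml", None),
    "zdi": ("demo/detect-zdi.yaml", None),
    "sco": ("demo/detect-sco.yaml", None),
    "foreshadow": ("demo/detect-foreshadow.yaml", None),
    "tsa-sq": ("demo/tsa-sq/config.yaml", "demo/tsa-sq/template.asm"),
    "tsa-l1d": ("demo/tsa-l1d/config.yaml", "demo/tsa-l1d/template.asm"),
}


def build_command(name: str, spec: str = SPEC, inputs: int = 50,
                  programs: int = 10000) -> List[str]:
    """Return the argv for one demo, in the same order as the README commands."""
    config, template = DEMOS[name]
    # Template-based demos use `rvzr tfuzz` plus an extra `-t <template>` argument
    cmd = ["rvzr", "tfuzz" if template else "fuzz", "-s", spec, "-c", config]
    if template:
        cmd += ["-t", template]
    cmd += ["-i", str(inputs), "-n", str(programs)]
    return cmd


def main(argv: List[str]) -> None:
    # Dry run by default: print each command; pass --run to actually execute them
    run = "--run" in argv
    names = [arg for arg in argv if arg != "--run"] or list(DEMOS)
    for name in names:
        cmd = build_command(name)
        print(shlex.join(cmd))
        if run:
            # Run demos one after another; a failing demo does not stop the rest
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, saved under a hypothetical name such as `run_demos.py`, `python run_demos.py v1 v4` would print the two commands, and `python run_demos.py --run v4` would execute only the Spectre V4 demo. Keeping the command construction in a separate function makes it easy to check the generated arguments without launching the fuzzer.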
================================================ FILE: demo/big-fuzz.yaml ================================================ instruction_set: x86-64 # Model contract_observation_clause: ct contract_execution_clause: - seq # Actors actors: - main: - data_properties: - present: true # Executor executor_mode: P+P x86_executor_enable_ssbp_patch: true # Program generator program_size: 64 avg_mem_accesses: 16 max_bb_per_function: 1 # straight-line code only min_bb_per_function: 1 min_successors_per_bb: 1 max_successors_per_bb: 1 instruction_categories: - BASE-BINARY - BASE-BITBYTE - BASE-CMOV - BASE-COND_BR - BASE-CONVERT - BASE-DATAXFER - BASE-FLAGOP - BASE-LOGICAL - BASE-MISC - BASE-NOP - BASE-WIDENOP - BASE-POP - BASE-PUSH - BASE-SEMAPHORE - BASE-SETCC # - BASE-STRINGOP # commented out as it triggers a known information leak - LONGMODE-CONVERT - LONGMODE-DATAXFER - LONGMODE-SEMAPHORE # - LONGMODE-STRINGOP # commented out as it triggers a known information leak - SSE-DATAXFER - SSE-LOGICAL_FP - SSE-MISC - SSE-SSE # Input generator data_generator_entropy_bits: 24 inputs_per_class: 2 # Fuzzer enable_speculation_filter: true enable_observation_filter: true enable_fast_path_model: true coverage_type: model_instructions # Output color: true logging_modes: - info - stat - dbg_generator # - dbg_timestamp # - dbg_violation # - dbg_dump_htraces # - dbg_dump_ctraces # - dbg_dump_traces_unlimited # - dbg_model - dbg_coverage # - dbg_priming # - dbg_executor_raw ================================================ FILE: demo/detect-foreshadow.yaml ================================================ # This demo illustrates detection of Foreshadow # contract contract_observation_clause: loads+stores+pc contract_execution_clause: - delayed-exception-handling # tested instructions instruction_categories: - BASE-BINARY - BASE-BITBYTE - BASE-CMOV - BASE-CONVERT - BASE-DATAXFER - BASE-LOGICAL - BASE-MISC - BASE-NOP - BASE-POP - BASE-PUSH - BASE-SETCC instruction_blocklist_append: - DIV - IDIV 
actors: - main: - data_properties: - present: false - writable: false # misc. fuzzing configuration enable_speculation_filter: true enable_observation_filter: true program_size: 16 avg_mem_accesses: 8 inputs_per_class: 2 executor_warmups: 2 x86_disable_div64: false ================================================ FILE: demo/detect-mds.yaml ================================================ # contract contract_observation_clause: ct contract_execution_clause: - seq-assist # tested instructions instruction_categories: - BASE-BITBYTE - BASE-CMOV - BASE-LOGICAL # environment actors: - main: - data_properties: - accessed: False # fuzzing configuration enable_speculation_filter: true enable_observation_filter: true program_size: 20 avg_mem_accesses: 10 inputs_per_class: 2 program_generator_seed: 955240 ================================================ FILE: demo/detect-sco.yaml ================================================ # contract contract_observation_clause: ct contract_execution_clause: - seq # tested instructions instruction_categories: - BASE-BITBYTE - BASE-CMOV - BASE-LOGICAL - BASE-STRINGOP - BASE-FLAGOP # fuzzing configuration enable_speculation_filter: true enable_observation_filter: true program_size: 20 avg_mem_accesses: 10 inputs_per_class: 2 program_generator_seed: 910000 ================================================ FILE: demo/detect-v1-store.yaml ================================================ file: !include detect-v1.yaml # prevent speculative stores from being observed contract_observation_clause: ct-nonspecstore contract_execution_clause: - cond ================================================ FILE: demo/detect-v1-var.yaml ================================================ file: !include detect-v1.yaml # contract # contract_observation_clause: ct contract_execution_clause: - cond # analyser_subsets_is_violation: false # # tested instructions # instruction_categories: # - BASE-BITBYTE # - BASE-COND_BR # - BASE-CMOV # - BASE-LOGICAL # # fuzzing 
configuration # enable_speculation_filter: true # enable_observation_filter: true # data_generator_entropy_bits: 16 # min_bb_per_function: 2 # max_bb_per_function: 2 # program_size: 20 # avg_mem_accesses: 10 # inputs_per_class: 2 ================================================ FILE: demo/detect-v1.yaml ================================================ # contract contract_observation_clause: loads+stores+pc contract_execution_clause: - no_speculation # tested instructions instruction_categories: - BASE-BINARY - BASE-BITBYTE - BASE-CMOV - BASE-COND_BR - BASE-CONVERT - BASE-DATAXFER - BASE-LOGICAL - BASE-MISC - BASE-NOP - BASE-POP - BASE-PUSH - BASE-SETCC # fuzzing configuration enable_speculation_filter: true enable_observation_filter: true program_size: 16 avg_mem_accesses: 8 inputs_per_class: 2 program_generator_seed: 100 ================================================ FILE: demo/detect-v4.yaml ================================================ # contract contract_observation_clause: ct contract_execution_clause: - seq # tested instructions instruction_categories: - BASE-BITBYTE - BASE-CMOV - BASE-LOGICAL # environment x86_executor_enable_ssbp_patch: false # fuzzing configuration enable_speculation_filter: true enable_observation_filter: true program_size: 20 avg_mem_accesses: 10 inputs_per_class: 2 # reduce entropy (not strictly required for detection, but makes the demo finish faster) data_generator_entropy_bits: 10 program_generator_seed: 1000000 ================================================ FILE: demo/detect-zdi.yaml ================================================ # contract contract_observation_clause: ct contract_execution_clause: - seq # tested instructions instruction_categories: - BASE-BITBYTE - BASE-BINARY - BASE-CMOV - BASE-LOGICAL # fuzzing configuration enable_speculation_filter: true enable_observation_filter: true program_size: 64 avg_mem_accesses: 24 inputs_per_class: 2 program_generator_seed: 252633 x86_disable_div64: false 
================================================ FILE: demo/tsa-l1d/config.yaml ================================================ instruction_set: x86-64 instruction_categories: - BASE-BINARY - BASE-BITBYTE - BASE-CMOV - BASE-COND_BR - BASE-CONVERT - BASE-DATAXFER - BASE-FLAGOP - BASE-LOGICAL - BASE-MISC - BASE-NOP - BASE-POP - BASE-PUSH - BASE-SEMAPHORE - BASE-SETCC - BASE-WIDENOP actors: - main: - mode: "host" - privilege_level: "kernel" - vmvictim: - mode: "guest" - privilege_level: "kernel" - vm: - mode: "guest" - observer: true - privilege_level: "kernel" - data_properties: - writable: false contract_observation_clause: ct-ni max_bb_per_function: 1 executor_mode: F+R executor_sample_sizes: - 15 - 40 - 160 - 320 executor_filtering_repetitions: 5 x86_enable_hpa_gpa_collisions: true program_generator_seed: 20000000 data_generator_seed: 1000000 inputs_per_class: 2 analyser_stat_threshold: 0.1 # enable_speculation_filter: true enable_observation_filter: true enable_fast_path_model: true # color: true logging_modes: - info # - stat ================================================ FILE: demo/tsa-l1d/template.asm ================================================ .intel_syntax noprefix # ----------------------------- Hypervisor (Host) ---------------------------- .section .data.main .function_main_0: # observer start .macro.set_h2g_target.vm.function_vm_0: .macro.set_g2h_target.main.function_main_1: .macro.switch_h2g.vm.0: .function_main_1: .macro.landing_g2h.main_1: .macro.set_h2g_target.vmvictim.function_vmvictim_0: .macro.set_g2h_target.main.function_main_2: .macro.switch_h2g.vmvictim.0: .function_main_2: .macro.landing_g2h.main_2: .macro.set_h2g_target.vm.function_vm_1: .macro.set_g2h_target.main.function_main_3: xor rax, rax # noremove xor rbx, rbx # noremove xor rcx, rcx # noremove xor rdx, rdx # noremove xor rsi, rsi # noremove xor rdi, rdi # noremove # insert flushing patches here .patch_placeholder: .macro.switch_h2g.vm.1: .function_main_3: 
.macro.landing_g2h.main_3: .macro.fault_handler: .patch_placeholder_fault_handler: .macro.set_h2g_target.vm.function_vm_2: .macro.set_g2h_target.main.function_main_4: .macro.switch_h2g.vm.2: .function_main_4: .macro.landing_g2h.main_4: nop # ----------------------------- VM - Victim ---------------------------------- .section .data.vmvictim .function_vmvictim_0: .macro.landing_h2g.vmvictim_0: # secret injection .macro.random_instructions.64.32.main_1: .macro.switch_g2h.main.vmvictim_0: lfence # ----------------------------- VM - Observer -------------------------------- .section .data.vm .function_vm_0: .macro.landing_h2g.vm_0: .macro.measurement_start: .macro.switch_g2h.main.vm_0: lfence .function_vm_1: .macro.landing_h2g.vm_1: xor rax, rax # noremove mov rax, qword ptr [r14 + 0x2000] # noremove mov rbx, qword ptr [r14 + 0x2008] # noremove mov rcx, qword ptr [r14 + 0x2010] # noremove mov rdx, qword ptr [r14 + 0x2018] # noremove mov rsi, qword ptr [r14 + 0x2020] # noremove mov rdi, qword ptr [r14 + 0x2028] # noremove mfence # noremove # secret retrieval .macro.random_instructions.64.32.vm_1: # make sure the model doesn't attempt to go further than this point lfence # noremove .macro.measurement_end.vm_1: .macro.switch_g2h.main.1: lfence .function_vm_2: .macro.landing_h2g.vm_2: .macro.measurement_end.vm_2: .macro.switch_g2h.main.2: lfence # ----------------------------- Exit ----------------------------------------- .section .data.main .test_case_exit: ================================================ FILE: demo/tsa-sq/config.yaml ================================================ instruction_set: x86-64 instruction_categories: - BASE-BINARY - BASE-BITBYTE - BASE-CMOV - BASE-COND_BR - BASE-CONVERT - BASE-DATAXFER - BASE-FLAGOP - BASE-LOGICAL - BASE-MISC - BASE-NOP - BASE-POP - BASE-PUSH - BASE-SEMAPHORE - BASE-SETCC - BASE-WIDENOP faults_allowlist: - user-to-kernel-access actors: - main: - mode: "host" - privilege_level: "kernel" - fault_blocklist: - 
user-to-kernel-access - user: - mode: "host" - observer: true - privilege_level: "user" - data_properties: - present: true contract_observation_clause: ct-ni max_bb_per_function: 1 executor_mode: F+R executor_sample_sizes: - 15 - 40 - 160 - 320 executor_filtering_repetitions: 5 x86_enable_hpa_gpa_collisions: true program_generator_seed: 20000000 data_generator_seed: 1000000 inputs_per_class: 2 analyser_stat_threshold: 0.2 # enable_speculation_filter: true enable_observation_filter: true enable_fast_path_model: true # color: true logging_modes: - info # - stat ================================================ FILE: demo/tsa-sq/template.asm ================================================ .intel_syntax noprefix # ----------------------------- Kernel-mode Actor (Victim) ------------------- .section .data.main .function_main_0: # observer start .macro.set_k2u_target.user.function_user_0: .macro.set_u2k_target.main.function_main_1: .macro.switch_k2u.user.0: .function_main_1: .macro.landing_u2k.main_1: # secret injection .macro.random_instructions.64.32.main_1: .macro.set_k2u_target.user.function_user_1: .macro.set_u2k_target.main.function_main_2: .macro.switch_k2u.user.1: .function_main_2: .macro.landing_u2k.main_2: .macro.fault_handler: .macro.set_k2u_target.user.function_user_2: .macro.set_u2k_target.main.function_main_3: .macro.switch_k2u.user.2: .function_main_3: .macro.landing_u2k.main_3: nop # ----------------------------- User-mode Actor ------------------------------ .section .data.user .function_user_0: .macro.landing_k2u.user_0: .macro.measurement_start: .macro.switch_u2k.main.user_0: lfence .function_user_1: .macro.landing_k2u.user_1: xor rax, rax # noremove mov rax, qword ptr [r14 + 0x2000] # noremove mov rbx, qword ptr [r14 + 0x2008] # noremove mov rcx, qword ptr [r14 + 0x2010] # noremove mov rdx, qword ptr [r14 + 0x2018] # noremove mov rsi, qword ptr [r14 + 0x2020] # noremove mov rdi, qword ptr [r14 + 0x2028] # noremove lfence # secret retrieval 
.macro.random_instructions.64.32.user_1:
# make sure the model doesn't attempt to go further than this point
lfence # noremove
.macro.measurement_end.user_1:
.macro.switch_u2k.main.1:
lfence
.function_user_2:
.macro.landing_k2u.user_2:
.macro.measurement_end.user_2:
.macro.switch_u2k.main.2:
lfence

# ----------------------------- Exit -----------------------------------------
.section .data.main
.test_case_exit:

================================================
FILE: docs/assets/branches.drawio
================================================
[compressed draw.io diagram data omitted]

================================================
FILE: docs/assets/dr-instrumentation.drawio
================================================

================================================
FILE: docs/assets/dr-model.drawio
================================================

================================================
FILE: docs/assets/fuzzing-flow.drawio
================================================
[compressed draw.io diagram data omitted]

================================================
FILE: docs/assets/tsa-sq-template.drawio
================================================
[compressed draw.io diagram data omitted]

================================================
FILE: docs/assets/unicorn-model-state-machine.drawio
================================================
[compressed draw.io diagram data omitted]

================================================
FILE: docs/faq/general.md
================================================
# General FAQ

## Overview

#### What is Revizor? {#what-is-revizor}
: Revizor is a security-oriented fuzzer designed to detect microarchitectural information leaks in CPUs. It automatically generates random test programs, executes them on real hardware, and compares the observed behavior against a formal model to identify unexpected information leakage through side channels like those exploited by Spectre and Meltdown attacks.

#### Who is Revizor for? {#who-uses-revizor}
: Revizor is primarily designed for CPU security researchers and hardware vendors interested in identifying and mitigating microarchitectural vulnerabilities. It may also be useful for system developers and security professionals who want to assess the security of the hardware platforms they work with.

#### How does Revizor differ from other hardware fuzzers (e.g., SiliFuzz)? {#how-does-revizor-differ-from-other-hardware-fuzzers}
: Most existing hardware fuzzers focus on finding functional bugs, such as incorrect instruction execution or crashes. Revizor, on the other hand, is specifically designed to find security vulnerabilities related to microarchitectural side channels. It uses a model-based approach to define what information is allowed to leak and tests whether the CPU adheres to these specifications.

: See [Revizor at a Glance](../intro/01-overview.md) for a more detailed introduction.

#### How is Revizor different from constant-time testing tools (e.g., Microwalk)? {#how-does-revizor-differ-from-ct-testing-tools}
: Constant-time testing tools like Microwalk focus on verifying that software implementations do not leak sensitive information through timing variations. They analyze the execution of programs to ensure that their timing behavior is independent of secret data.

: Revizor, in contrast, tests the CPU hardware itself for microarchitectural information leaks. It tests whether the CPU behaves as expected, regardless of the software running on it.

#### What CPUs does Revizor support? {#supported-cpus}
: Revizor currently supports testing on x86-64 CPUs from Intel and AMD, as well as ARM CPUs.

#### Does Revizor detect only those leaks that are described in the contract? {#leaks-described-in-contract}
: No! It is a common misconception that Revizor can only find leaks that are explicitly described in the contract. In reality, it is the opposite: the contract defines what Revizor should *not* report as a leak, which allows the tool to filter out known types of leakage and focus on finding unexpected leaks that violate the contract. This is how Revizor is able to discover new vulnerabilities even in completely black-box CPUs.

---

## Installing Revizor

#### What operating system is required to run Revizor? {#required-os}
: You will need Linux.

#### Do I need a specific Linux distribution/version? {#specific-linux-distro}
: No, Revizor should work on any reasonably recent Linux. If you encounter issues, that's most likely a bug that we would like to hear about. Please report any problems on our [GitHub Issues page](https://github.com/microsoft/side-channel-fuzzer/issues).

#### Does Revizor require root or administrator privileges? {#requires-root}
: Yes. Revizor's executor is implemented as a kernel module that requires loading into the kernel and accessing hardware performance counters. Both operations require root privileges.
Additionally, some system configuration steps recommended for optimal performance (like disabling hyperthreading) require administrative access.

#### Can I run Revizor in a virtual machine? {#run-on-vms}
: Unfortunately, no. Revizor requires direct access to the CPU's performance monitoring unit (PMU) to accurately measure side-channel leakage. Running Revizor inside a virtual machine would introduce additional layers of abstraction and interference that could distort the measurements and lead to inaccurate results. You need to run Revizor on a bare-metal installation of Linux.

---

## Running Revizor

#### Can Revizor affect system stability? {#safety}
: Although this is extremely unlikely, Revizor can affect the stability of the host operating system. Revizor executes randomly generated code in kernel space, which means that a misconfiguration or bug can crash the system and potentially lead to data loss. However, it does not intentionally perform any operations that would damage hardware.

: You should never run Revizor on production machines or on systems containing important data without backups. Always use a dedicated testing machine.

#### How long does it take to find a vulnerability? {#time-to-find}
: This varies significantly with the complexity of the experiment. Typical times range from minutes to weeks.

#### Can Revizor test my own assembly programs or does it only generate random ones? {#test-custom-programs}
: It can do both. Revizor can test custom assembly programs using the `-t` flag. You can provide your own test case program in assembly format, and Revizor will execute it with randomly generated inputs to check for contract violations. This is useful when you want to verify specific code patterns or investigate potential vulnerabilities in particular instruction sequences.

: See the [CLI Reference](../ref/cli.md) for details on the `-t` option.

#### How many computational resources does a typical fuzzing campaign require? {#resource-requirements}
: Resource requirements vary significantly based on the fuzzing configuration. A typical campaign runs continuously for hours to weeks. The primary variables affecting performance are the number of inputs per test case, the sample sizes for hardware measurements, and the complexity of the ISA subset being tested. Larger sample sizes increase accuracy but reduce throughput. Most campaigns run on standard server or workstation hardware, with no special requirements beyond a supported CPU architecture.

: See [How to Design a Fuzzing Campaign](../howto/design-campaign.md) for guidance on balancing performance and detection effectiveness.

---

## Violations

#### Are false positives common? How does Revizor handle them? {#false-positives}
: No, unless Revizor is misconfigured. Revizor uses a multi-stage filtering pipeline to eliminate false positives caused by noise and non-deterministic hardware behavior, which removes the vast majority of spurious violations. However, if Revizor is misconfigured (e.g., the sample sizes are too small), false positives can still occur due to noise in hardware measurements. These are relatively easy to identify because they tend to be unstable and non-reproducible.

: See [How to Interpret Violation Results](../howto/interpret-results.md#evaluating-violation-quality) for guidance on evaluating violation quality and handling false positives.

#### Can Revizor automatically generate exploits or proof-of-concept code? {#generate-exploits}
: No. Revizor detects violations of the leakage contract by identifying test cases where hardware behavior differs from the contract's predictions. While it provides the test program and inputs that trigger the violation, it does not automatically generate working exploits. The violation artifacts serve as evidence of unexpected leakage and a starting point for manual security analysis.
You can use the minimization feature to simplify the test case, making it easier to understand and, potentially, to develop into a proof of concept.

: See [How to Minimize Test Cases](../howto/minimize.md) for details on simplifying violations.

#### How do I know if a detected violation is actually exploitable? {#exploitability}
: Determining exploitability requires manual analysis of the violation. Start by reproducing the violation to confirm that it is stable, then use the minimization feature to simplify the test case. Next, analyze the minimized program to understand what information is leaking and through which side channel. Root-cause analysis involves examining the assembly code, understanding the data dependencies, and determining whether an attacker could control the leaked information to extract sensitive data. Not all violations are practically exploitable, but every violation indicates a deviation from the specified security contract.

: See [How to Root-Cause a Violation](../howto/root-cause-a-violation.md) for systematic analysis techniques.

#### Is Revizor deterministic? Can I reproduce results? {#reproducibility}
: Contract traces are fully deterministic: the same program with the same inputs always produces identical contract traces. Hardware traces, however, contain inherent non-determinism due to timing variations, cache state, and other microarchitectural effects. Revizor handles this through statistical analysis of multiple samples. Violations are reproducible when the same test program and inputs consistently show the same distributional differences in hardware traces. The violation artifact includes all files needed (program, inputs, configuration) to reproduce a detected violation, and Revizor provides a dedicated reproduce mode for verification.

: See [Execution Modes](../ref/modes.md) for details on the reproduce mode.

---

## Development and Contribution

#### Is Revizor actively maintained? {#maintenance-status}
: Yes.
Revizor is actively maintained and continues to receive updates, bug fixes, and new features. The project has an active GitHub repository with recent commits and ongoing development.

#### Can I contribute to Revizor? {#contributing}
: Yes, we welcome contributions from the community! You can contribute by reporting issues, suggesting new features, improving documentation, or submitting code changes through pull requests. Please refer to our [Contribution Guidelines](../internals/index.md) for instructions on how to get started.

================================================
FILE: docs/glossary.md
================================================
# Glossary

This glossary defines key terms used throughout the Revizor documentation. The entries are ordered so that more fundamental concepts appear first, building up to more complex ideas, so you should be able to get a good understanding of the terminology by reading the glossary top-down.

---

#### Noninterference
: A formal property that captures perfect confidentiality, stating that changes in secret data have no observable effect on public outputs. A program satisfies noninterference if variations in secret inputs cause no differences in public outputs. In Revizor's context, this property is checked with respect to side-channel observations and speculation contracts.

!!! info "Related Documentation"
    - [Primer: Information-Flow Properties](intro/03-primer.md#information-flow-properties)
    - [Primer: Noninterference Definition](intro/03-primer.md#noninterference-definition-and-examples)

---

#### Information Flow
: The movement of data through a computation. Information-flow security is concerned with how data moves through a system and how it can be observed by an attacker. For example, if a program contains a data-dependent memory access `array[secret_index]`, the value of `secret_index` influences which memory location is accessed.
In turn, if the attacker can observe the cache lines accessed by this program, the execution of the array access will reveal (leak) information about `secret_index` through side channels. This creates an information flow from the secret data (`secret_index`) to the attacker's observations (cache state).

!!! info "Related Documentation"
    - [Primer: Information-Flow Properties](intro/03-primer.md#information-flow-properties)
    - [Primer: Side Channels](intro/03-primer.md#beyond-direct-outputs-side-channels)

---

#### Speculation Contract (aka Leakage Contract)
: A formalization of how we expect the CPU to behave and what information we expect it to leak when any given program is executed. It is a simplified, deterministic model of the CPU hardware, designed to capture the information that a given program could leak over side channels when executed with given inputs. A speculation contract defines two key aspects for every instruction: an observation clause (describing what data is exposed) and an execution clause (describing how hardware optimizations like speculative execution affect the instruction). Speculation contracts intentionally overestimate possible leaks so that their traces are conservative and deterministic.

!!! info "Related Documentation"
    - [Topic: Contracts](topics/contracts.md)
    - [Primer: Speculation Contracts](intro/03-primer.md#speculation-contracts-dealing-with-the-complexity-of-modern-hardware)
    - [How-to: Choose a Contract](howto/choose-contract.md)

---

#### Observation Clause
: Part of a speculation contract that specifies what information an instruction exposes through side channels when executed. For example, an observation clause might specify that a load instruction exposes the memory address it accesses.

!!! info "Related Documentation"
    - [Topic: Contracts - Contract Structure](topics/contracts.md#contract-structure)
    - [Primer: Speculation Contracts](intro/03-primer.md#speculation-contracts-dealing-with-the-complexity-of-modern-hardware)

---

#### Execution Clause
: Part of a speculation contract that specifies how hardware optimizations (particularly speculative execution) affect an instruction's semantics. For example, an execution clause might specify that a conditional branch can be mispredicted, causing execution to continue down the wrong path.

!!! info "Related Documentation"
    - [Topic: Contracts - Contract Structure](topics/contracts.md#contract-structure)
    - [Primer: Speculation Contracts](intro/03-primer.md#speculation-contracts-dealing-with-the-complexity-of-modern-hardware)

---

#### Leakage Model
: An implementation of a speculation contract. This model is used to compare the actual CPU behavior against the specification defined by the contract. It predicts what information flow is allowed through side channels for any given test case.

!!! info "Related Documentation"
    - [Topic: Leakage Models](topics/models.md)
    - [Internals: Model Architecture](internals/architecture/model.md)
    - [Internals: Unicorn Backend](internals/model-backends/model-unicorn.md)
    - [Internals: DynamoRIO Backend](internals/model-backends/model-dr.md)

---

#### Contract Trace (CTrace)
: The output of a leakage model. A CTrace is a recording of all information exposed when a given program is executed on the leakage model (e.g., a sequence of accessed memory addresses). This trace represents the expected information flow according to the contract.

!!! info "Related Documentation"
    - [Topic: Contracts - Contract Traces](topics/contracts.md#contract-traces)
    - [Topic: Leakage Models - Trace Representation](topics/models.md#trace-representation)
    - [Topic: Trace Analysis](topics/trace-analysis.md)

---

#### Executor
: The component responsible for running programs on real hardware and collecting attacker-observable microarchitectural changes. This component acts as the counterpart to the leakage model: while the model represents our expectations of the CPU behavior, the executor captures the actual behavior of the CPU under test.

!!! info "Related Documentation"
    - [Internals: Executor Architecture](internals/architecture/exec.md)
    - [Reference: Configuration Options](ref/config.md)

---

#### Hardware Trace (HTrace)
: The output of the executor. An HTrace is a recording of microarchitectural state changes (like cache evictions, readings of the time stamp counter, etc.) observed during a program execution. These traces capture the information flows on the CPU under test, both the expected and the unexpected ones.

!!! info "Related Documentation"
    - [Topic: Trace Analysis](topics/trace-analysis.md)
    - [Internals: Executor Architecture](internals/architecture/exec.md)

---

#### Test Case Program
: A small assembly program, either generated automatically by Revizor or written manually by the user. Test case programs are executed on the target CPU to collect hardware traces, and on the leakage model to collect contract traces.

!!! info "Related Documentation"
    - [Topic: Test Case Generation](topics/test-case-generation.md)
    - [Internals: Code Generator Architecture](internals/architecture/code.md)
    - [Reference: Binary Formats - RCBF](ref/binary-formats.md)

---

#### Test Case Data (aka Test Case Input)
: A blob of data used to initialize memory and registers for the execution of a test case program. Test case data can be generated automatically by Revizor or provided manually by the user.

!!! info "Related Documentation"
    - [Topic: Test Case Generation](topics/test-case-generation.md)
    - [Internals: Data Generator Architecture](internals/architecture/data.md)
    - [Reference: Binary Formats - RDBF](ref/binary-formats.md)

---

#### Sandbox (or Test Case Sandbox)
: An isolated execution environment where test case programs are run on the target CPU and on the model. On the technical level, a sandbox consists of a dedicated region of memory where the test case program and data are loaded, as well as a set of mechanisms that isolate the test case execution from the rest of the system (e.g., disabling interrupts, overriding MSRs, etc.).

!!! info "Related Documentation"
    - [Reference: Sandbox](ref/sandbox.md)
    - [Reference: Registers](ref/registers.md)

---

#### Model-based Relational Testing (MRT)
: The core methodology of Revizor. It involves randomly generating test programs and inputs to them, executing them with the executor and the model, collecting the corresponding hardware and contract traces, identifying the information flows in both, and comparing them to find unexpected leaks.

!!! info "Related Documentation"
    - [Primer: Model-Based Relational Testing](intro/03-primer.md#model-based-relational-testing-and-revizor)
    - [Topic: Trace Analysis](topics/trace-analysis.md)
    - [Internals: Fuzzer Architecture](internals/architecture/fuzz.md)

---

#### Violation
: A situation where hardware traces expose some information that is not exposed in the contract traces for the same test case. This indicates that the CPU is leaking information not specified by the contract, which may represent a security vulnerability.

!!! info "Related Documentation"
    - [Topic: Trace Analysis](topics/trace-analysis.md)
    - [Primer: Contract Violation](intro/03-primer.md#building-and-testing-speculation-contracts)
    - [How-to: Root-Cause a Violation](howto/root-cause-a-violation.md)

---

#### Violation Artifact (aka Contract Counterexample)
: A bundle consisting of a test case program, two inputs that trigger the violation (plus extra inputs to set the microarchitectural state, if needed), the corresponding hardware and contract traces, and a collection of configuration files to reproduce the violation. Violation artifacts are generated automatically by Revizor when a violation is detected.

!!! info "Related Documentation"
    - [Reference: Binary Formats](ref/binary-formats.md)
    - [How-to: Root-Cause a Violation](howto/root-cause-a-violation.md)
    - [How-to: Minimize Test Cases](howto/minimize.md)

---

#### Minimization
: A post-processing mode that takes a violation artifact and performs transformation passes to simplify the program and data while preserving the violation. The goal is to produce a minimal artifact that is easier to understand and analyze, using program passes (instruction removal/simplification), input passes (sequence/diff minimization), and analysis passes (source analysis).

!!! info "Related Documentation"
    - [How-to: Minimize Test Cases](howto/minimize.md)
    - [Reference: Minimization Passes](ref/minimization-passes.md)
    - [Internals: Minimization Architecture](internals/architecture/mini.md)

---

#### Multi-stage Filtering
: A pipeline of validation stages applied to potential violations to rule out false positives. A violation must survive all stages to be reported.

!!! info "Related Documentation"
    - [Internals: Fuzzer Architecture](internals/architecture/fuzz.md)
    - [Reference: Configuration Options](ref/config.md)

---

#### Priming Test
: One of the most important validation stages.
It addresses the following problem: when hardware traces are collected for a sequence of many inputs, executing the program with earlier inputs affects the microarchitectural state seen by later inputs (e.g., the branch predictor state). This can cause false positives, where two inputs that should be indistinguishable according to the contract produce different hardware traces simply because they were executed in different microarchitectural states (e.g., one input triggered a misprediction while the other did not). Such cases do not represent a real violation, because the difference in traces is caused by the order of executions rather than by the difference in data. The priming test mitigates this problem by re-executing the violating inputs in a different order, swapping the positions of the inputs that trigger the violation. If the violation disappears when the order is swapped, the difference in traces was due to inconsistent microarchitectural state rather than a true violation. Otherwise, we have evidence that the violation is genuine.

!!! info "Related Documentation"
    - [Reference: Configuration Options - enable_priming](ref/config.md#enable_priming)
    - [Internals: Fuzzer Architecture](internals/architecture/fuzz.md)

---

#### Contract Compliance
: A CPU complies with a speculation contract if, for all possible programs and input pairs that produce identical contract traces, the corresponding hardware traces are also identical. This ensures that the contract captures all information that the hardware can leak. While testing all possible programs is infeasible, Revizor approximates this by randomly sampling the search space with a large number of test cases.

!!! info "Related Documentation"
    - [Topic: Trace Analysis - Contract Compliance Property](topics/trace-analysis.md#contract-compliance-property)
    - [Topic: Contracts - Contract Compliance](topics/contracts.md#contract-compliance)
    - [Primer: Contract Compliance](intro/03-primer.md#building-and-testing-speculation-contracts)

---

#### Contract Equivalence Class (ContractEqClass)
: A group of inputs that produce identical contract traces for a given test case program. According to the contract (the leakage model), these inputs should be indistinguishable when executed on the hardware.

!!! info "Related Documentation"
    - [Topic: Trace Analysis - Deterministic Trace Comparison](topics/trace-analysis.md#deterministic-trace-comparison)
    - [Internals: Analyser Architecture](internals/architecture/analysis.md)

---

#### Hardware Equivalence Class (HardwareEqClass)
: A group of inputs that produce statistically similar hardware traces for a given test case program. That is, these inputs are indistinguishable in practice, when executed on the real hardware.

!!! info "Related Documentation"
    - [Topic: Trace Analysis - Statistical Trace Comparison](topics/trace-analysis.md#statistical-trace-comparison)
    - [Internals: Analyser Architecture](internals/architecture/analysis.md)

---

#### Boosting (aka Contract-driven Input Generation)
: A data generation optimization technique that uses taint analysis to generate inputs more likely to trigger contract violations. The boosted generator identifies which input bytes affect the contract trace and generates new inputs by mutating the non-tainted bytes. This way, we can deterministically and efficiently create any number of inputs that produce the same contract trace (i.e., form one ContractEqClass), increasing the chances of finding violations.

!!! info "Related Documentation"
    - [Internals: Data Generator Architecture](internals/architecture/data.md)
    - [Reference: Configuration Options](ref/config.md)

---

#### Fuzzer
: The main orchestrator in Revizor that manages the core components (CodeGenerator, DataGenerator, Model, Executor, and Analyser) and coordinates the fuzzing loop. When a potential violation is found, the Fuzzer runs it through a multi-stage filtering pipeline to eliminate false positives.

!!! info "Related Documentation"
    - [Internals: Fuzzer Architecture](internals/architecture/fuzz.md)
    - [Reference: Configuration Options](ref/config.md)

---

#### Analyser
: The component that compares contract traces with hardware traces to detect violations. It uses an equivalence class approach: it groups inputs by contract traces (ContractEqClasses) and then checks whether any ContractEqClass splits into multiple hardware equivalence classes (HardwareEqClasses), which indicates a violation.

!!! info "Related Documentation"
    - [Topic: Trace Analysis](topics/trace-analysis.md)
    - [Internals: Analyser Architecture](internals/architecture/analysis.md)
    - [Reference: Configuration Options - analyser](ref/config.md)

---

#### Actor
: A partition of the sandbox representing a distinct execution context with specific isolation properties (e.g., a VM). An actor encompasses a code region, a data region with configurable permissions, and an execution context (CPU mode, privilege level, and system configuration). Actors enable testing for information leaks across different security domains.

!!! info "Related Documentation"
    - [Topic: Actors](topics/actors.md)
    - [Reference: Sandbox](ref/sandbox.md)

---

#### Actor Non-Interference
: A specialized testing mode in Revizor where, on top of testing for standard contract violations, the tool also checks that there are no information flows between different actors in a multi-actor test case.
This mode is used to verify isolation properties between security domains, ensuring that secret data in one actor does not influence observable behavior in another actor.

!!! info "Related Documentation"
    - [Topic: Actors](topics/actors.md)

---

#### Observer Actor
: An actor marked as an observer in the configuration, representing an attacker that can observe data leaks in multi-actor testing scenarios. It is used in conjunction with the Actor Non-Interference mode to check that secret data in other actors does not influence the traces in the observer actor.

!!! info "Related Documentation"
    - [Topic: Actors](topics/actors.md)
    - [Reference: Configuration Options](ref/config.md)

---

#### RCBF (Revizor Code Binary Format)
: A custom binary format used to transfer test case programs between Revizor components. The format contains a header, an actor table, a symbol table, metadata, and code sections for each actor.

!!! info "Related Documentation"
    - [Reference: Binary Formats - RCBF](ref/binary-formats.md)

---

#### RDBF (Revizor Data Binary Format)
: A custom binary format used to transfer input data between Revizor components. The format contains initialization data for sandbox memory and registers, and can combine multiple inputs into a single file for batch processing.

!!! info "Related Documentation"
    - [Reference: Binary Formats - RDBF](ref/binary-formats.md)

---

#### Template
: An assembly file that combines regular assembly instructions with placeholders to define the structure of a test case for the code generator. Templates are used in a special template mode of Revizor, where programs are generated by populating the placeholders with random instructions instead of being generated from scratch.

!!! info "Related Documentation"
    - [How-to: Use Templates](howto/use-templates.md)
    - [Reference: Configuration Options](ref/config.md)

---

#### Macro
: A special pseudo-instruction in test case programs that can be treated differently depending on whether the test case is executed by the model or by the executor. One prominent example is VM transition macros, which handle switching between actors. A special type of macro is also used to implement the placeholders in templates.

!!! info "Related Documentation"
    - [How-to: Use Macros](howto/use-macros.md)
    - [Reference: Macro Reference](ref/macros.md)

---

================================================
FILE: docs/howto/ask-a-question.md
================================================
# Ask a Question

If you have a question about Revizor, there are several ways to reach out to us:

* For **any questions, no matter how big or small,** feel free to post them in our community [Zulip chat](https://rvzr.zulipchat.com/), where the community and developers can assist you.
* Alternatively, you can start a discussion on our [GitHub Discussions page](https://github.com/microsoft/side-channel-fuzzer/discussions) (this is preferable for longer questions that may require more in-depth answers).

Bug reports should be submitted to our [GitHub Issues page](https://github.com/microsoft/side-channel-fuzzer/issues).

For general information about Revizor, please refer to our [FAQ](../faq/general.md) page.

================================================
FILE: docs/howto/choose-contract.md
================================================
# How to Choose a Contract

This guide helps you select the appropriate [contract](../glossary.md#speculation-contract) for your fuzzing campaign. The contract determines which microarchitectural leaks Revizor will report as violations, making it a critical configuration choice that affects both what you find and how efficiently you find it.

!!! note "Prerequisites"
    Before choosing a contract, you should understand what contracts are and how they work. Read the [Contracts](../topics/contracts.md) topic guide if you need background on contract structure and purpose.

## Standard Fuzzing with CT-SEQ

Use CT-SEQ for most fuzzing campaigns. This contract assumes nothing about the target CPU except the presence of CPU caches, making it a zero-knowledge baseline for detecting unknown vulnerabilities. With CT-SEQ, Revizor reports any information leaks beyond the most trivial non-speculative cache accesses.

Configure CT-SEQ by setting the [observation clause](../glossary.md#observation-clause) to `ct` and the [execution clause](../glossary.md#execution-clause) to `seq`:

```yaml
contract_observation_clause: ct
contract_execution_clause:
  - seq
```

CT-SEQ provides the strictest security guarantees and will detect the widest range of vulnerabilities. Start with this contract unless you have specific reasons to use a different one.

## Continuing After Finding a Violation

When you find a violation with CT-SEQ and want to continue testing for additional vulnerabilities, you have two approaches.

The simpler and more efficient approach is to blocklist the instruction that triggered the violation. Use the [`instruction_blocklist_append`](../ref/config.md#instruction_blocklist_append) configuration option to exclude specific instructions from testing. For example, if a branch misprediction caused the violation, blocklist all conditional branch instructions:

```yaml
contract_observation_clause: ct
contract_execution_clause:
  - seq
instruction_blocklist_append:
  - jne
  - je
  # add other branch instructions
```

This approach lets you continue using CT-SEQ's fast and efficient detection while avoiding repeated reports of the same root cause.

Alternatively, you can incorporate the newly discovered speculation source into the contract by switching to a different execution clause.
For violations caused by branch mispredictions, switch to the COND execution clause:

```yaml
contract_observation_clause: ct
contract_execution_clause:
  - cond
```

The CT-COND contract models speculative execution from branch mispredictions as expected behavior. Revizor will no longer report violations from this source, allowing you to search for other types of leaks in the same instruction set.

## Testing with Exceptions

If your fuzzing campaign includes code that may raise exceptions such as page faults or general protection faults, these exceptions will likely cause trivial violations under CT-SEQ. Because modern CPUs execute instructions out of order, instructions after a faulting instruction may begin executing before the CPU recognizes the exception. These subsequent instructions can leak information not predicted by CT-SEQ's strictly sequential model. These violations typically represent known artifacts of out-of-order execution rather than genuine security issues.

To suppress such trivial reports, use the CT-DEH contract instead. This contract models delayed exception handling, allowing instructions after a faulting instruction to execute transiently before the exception is handled:

```yaml
contract_observation_clause: ct
contract_execution_clause:
  - deh
```

CT-DEH remains strict about other speculation sources while accommodating the expected behavior around exceptions.

## Testing Cross-Domain Isolation

When testing isolation between security domains, such as kernel versus user mode or host versus guest execution, use the Actor Non-Interference contract (CT-NI). This contract changes the security property being tested. Instead of only checking that inputs with identical [contract traces](../glossary.md#contract-trace) produce equivalent [hardware traces](../glossary.md#hardware-trace), CT-NI adds a second requirement: the hardware traces observed by attacker actors must not depend on data from victim actors.

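The non-interference requirement can be illustrated with a minimal Python sketch. This is not Revizor's implementation: the record layout `(attacker_data, victim_data, trace)` and the function `non_interference_violations` are hypothetical, and the sketch compares traces exactly, whereas real hardware traces are noisy and Revizor compares them statistically. Records are grouped by the attacker's own data; within a group only the victim's data varies, so any change in the attacker-observed trace counts as a leak.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

# Hypothetical record: (attacker_data, victim_data, trace observed by the attacker).
Record = Tuple[bytes, bytes, Hashable]


def non_interference_violations(records: Iterable[Record]) -> Dict[bytes, Set[Hashable]]:
    """Return the attacker inputs whose observed trace depends on victim data.

    Records sharing the same attacker data differ only in victim data, so the
    non-interference property requires all of them to yield a single trace.
    """
    traces_by_attacker_data: Dict[bytes, Set[Hashable]] = defaultdict(set)
    for attacker_data, _victim_data, trace in records:
        traces_by_attacker_data[attacker_data].add(trace)
    # Keep only the groups in which the victim's data changed what the attacker saw.
    return {data: traces for data, traces in traces_by_attacker_data.items() if len(traces) > 1}


# Traces are written as cache-set bitmaps, similar to Revizor's console output.
records = [
    (b"attacker-A", b"secret-0", "^...^..."),
    (b"attacker-A", b"secret-1", "^...^..."),  # same trace: no leak for attacker input A
    (b"attacker-B", b"secret-0", "^^..^..."),
    (b"attacker-B", b"secret-1", "^^...^.."),  # trace depends on the secret: leak
]
print(non_interference_violations(records))
```

Only `attacker-B` is reported, because its trace changes when only the victim's secret changes.
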
Configure CT-NI with the following observation clause:

```yaml
contract_observation_clause: ct-ni
```

You must also configure actors properly, designating which actors are observers (attackers) and which are victims. See [Actors](../topics/actors.md) for details on actor configuration.

## Investigating Known Vulnerabilities

When investigating variants of known vulnerabilities, use a contract that models the specific vulnerability class you are studying. For Spectre V1 variant analysis, use the COND execution clause to model branch mispredictions as expected behavior:

```yaml
contract_observation_clause: ct
contract_execution_clause:
  - cond
```

This configuration lets you explore whether other instructions or gadget patterns can be exploited through branch misprediction without being distracted by the original Spectre V1 finding.

For other vulnerability classes, choose the execution clause that models the corresponding speculation mechanism. See the [Configuration Reference](../ref/config.md#contract_execution_clause) for a list of available execution clauses and their intended use cases.

## What's Next?

- Topic: [Contracts](../topics/contracts.md) - Understanding contract structure and behavior
- How-to: [Design a Fuzzing Campaign](design-campaign.md) - Complete campaign planning including contract selection
- Reference: [Configuration Options](../ref/config.md) - Complete list of contract and configuration parameters
- Glossary: [Contract](../glossary.md#speculation-contract), [Observation Clause](../glossary.md#observation-clause), [Execution Clause](../glossary.md#execution-clause)

================================================
FILE: docs/howto/design-campaign.md
================================================
# How to Design a Fuzzing Campaign

This guide shows you how to design and configure a fuzzing campaign for detecting speculative execution vulnerabilities.
A campaign consists of three components: a configuration file (YAML), command-line arguments, and optionally a template file (ASM).

!!! note "Prerequisites"
    - Revizor installed and the executor kernel module loaded
    - Basic understanding of [contracts](../topics/contracts.md) and what you want to test

## Select Instruction Set

Choose which instruction subset to test. Smaller subsets are more effective because violations are found faster and root-cause analysis is simpler. For comprehensive ISA coverage, split testing into multiple targeted campaigns rather than running a single large campaign.

Specify instruction categories in your configuration file using `instruction_categories`:

```yaml
instruction_categories:
  - BASE-BINARY    # arithmetic instructions
  - BASE-STRINGOP  # string operations
  - BASE-LOGIC     # logical operations
```

Verify which instructions are included by enabling debug logging:

```yaml
logging_modes: ['info', 'stat', 'dbg_generator']
```

For fine-grained control over the instruction set, see the [Configuration Reference](../ref/config.md#instruction_categories).

## Configure Exception Testing

Enable exception testing using the `faults_allowlist` option:

```yaml
faults_allowlist:
  - div-by-zero  # division by zero exceptions
```

Ensure the corresponding instructions are included in your instruction set. For example, `div-by-zero` requires division instructions in the tested pool.

For testing Meltdown or Foreshadow-like vulnerabilities, configure memory access permissions through actor-specific `data_properties` and `data_ept_properties`:

```yaml
actors:
  - main:
      data_properties:
        present: false   # trigger page faults
        writable: false  # trigger write protection faults
```

See the [Sandbox Reference](../ref/sandbox.md) for details on memory permissions and the [Configuration Reference](../ref/config.md#faults_allowlist) for all exception handling options.
## Configure Actors for Multi-Domain Testing

For cross-domain leakage testing, define [actors](../glossary.md#actor) to represent different security domains:

```yaml
actors:
  - main:
      mode: host
      privilege_level: kernel
  - guest:
      mode: guest
      privilege_level: kernel
      observer: true
```

Create corresponding template files to specify transition sequences between actors. See [Actors](../topics/actors.md) for detailed instructions.

## Select Contract

Choose a [contract](../glossary.md#speculation-contract) that defines what execution behavior constitutes a violation. Contract selection depends on whether you are testing cross-domain leakage and which known vulnerabilities you want to filter out. For detailed guidance on selecting the appropriate contract for your testing scenario, see [How to Choose a Contract](choose-contract.md).

Example configuration:

```yaml
contract_observation_clause: ct
contract_execution_clause:
  - seq
```

See the [Configuration Reference](../ref/config.md#contract_observation_clause) for all available contract options.

## Configure Noise Threshold

Adjust noise tolerance based on your system characteristics. Higher thresholds and larger sample sizes both reduce false positives, but higher thresholds may miss subtle leaks and larger sample sizes slow down fuzzing. Lower thresholds increase sensitivity but may produce false positives on noisy systems.

For high-noise systems:

```yaml
analyser_stat_threshold: 0.5  # conservative threshold
executor_sample_sizes: [50, 100, 500, 1000]
```

For low-noise systems:

```yaml
analyser_stat_threshold: 0.1  # sensitive threshold
executor_sample_sizes: [10, 50, 100]
```

Start with low-noise settings and increase thresholds if you encounter non-reproducible violations. See the [Trace Analysis Guide](../topics/trace-analysis.md#statistical-trace-comparison) for more information on noise handling.

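To build intuition for the sensitivity trade-off, the following Python sketch compares the trace histograms of two inputs and flags a difference only when a distance exceeds the threshold. The distance used here (total variation distance) and the helper names are assumptions chosen for illustration; Revizor's analyser may use a different statistic, so treat the numbers as qualitative only.

```python
from collections import Counter
from typing import Hashable, Mapping


def total_variation(a: Mapping[Hashable, int], b: Mapping[Hashable, int]) -> float:
    """Total variation distance between two trace histograms (0 = identical, 1 = disjoint)."""
    total_a, total_b = sum(a.values()), sum(b.values())
    traces = set(a) | set(b)
    return 0.5 * sum(abs(a.get(t, 0) / total_a - b.get(t, 0) / total_b) for t in traces)


def distributions_differ(a: Mapping[Hashable, int], b: Mapping[Hashable, int], threshold: float) -> bool:
    """Illustrative decision rule (assumption): report a difference only above the threshold."""
    return total_variation(a, b) > threshold


# Histograms map a hardware trace (abbreviated to a letter) to its number of occurrences.
noisy_pair = (Counter({"A": 480, "B": 20}), Counter({"A": 470, "B": 30}))     # noise only
subtle_pair = (Counter({"A": 350, "B": 150}), Counter({"A": 150, "B": 350}))  # weak leak
clear_pair = (Counter({"A": 497, "B": 3}), Counter({"B": 2, "C": 498}))       # strong leak

for threshold in (0.1, 0.5):
    flagged = [name for name, pair in
               (("noisy", noisy_pair), ("subtle", subtle_pair), ("clear", clear_pair))
               if distributions_differ(*pair, threshold)]
    print(f"threshold={threshold}: {flagged}")
```

With a threshold of 0.1 the sketch flags the subtle and the clear pair; with 0.5 it flags only the clear pair. The noisy pair is ignored at both settings, which mirrors the advice above: raise the threshold on noisy machines, at the cost of missing weaker leaks.
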
## Enable Reproducibility

Set deterministic seeds to make the campaign reproducible:

```yaml
program_generator_seed: 12345  # deterministic program generation
data_generator_seed: 67890     # deterministic input generation
```

Reproducible campaigns are essential for debugging and comparing results across different runs.

## Configure Test Case Shape

Control the structure of generated test cases:

```yaml
program_size: 64          # instructions per program
avg_mem_accesses: 32      # average memory accesses
min_bb_per_function: 1    # minimum basic blocks per function
max_bb_per_function: 2    # maximum basic blocks per function
min_successors_per_bb: 1  # minimum successors per basic block
max_successors_per_bb: 1  # maximum successors per basic block
```

Larger programs can expose more complex interactions but take longer to analyze. Start with smaller programs and increase the size if needed.

## Use Templates for Targeted Testing

Use templates when targeting specific microarchitectural scenarios. Templates define fixed assembly structures with random instruction insertion, allowing you to focus on specific patterns while maintaining variability.

Example template:

```asm
.section .data.main
.function_main_0:
    # Fixed initialization
    mov rax, 0

    # Random instruction sequence
    .macro.random_instructions.32.0:

    # Fixed measurement
    .macro.measurement_start:
    mov rbx, [r14]
    .macro.measurement_end:

.test_case_exit:
```

See [How to Use Templates](use-templates.md) for detailed template syntax and the [Macro Reference](../ref/macros.md) for available macros.

## Complete Example

This campaign tests whether division-by-zero exceptions cause unexpected information leakage on the target CPU. It focuses on simple arithmetic instructions to isolate exception handling behavior and answers the question: "Does division by zero on this CPU leak information through microarchitectural side channels?"

The configuration assumes a CPU with relatively low non-determinism, using moderate sample sizes and a statistical threshold (0.2) slightly more conservative than the low-noise setting. The campaign uses the CT-DEH (Delayed Exception Handling) contract to filter out trivial violations caused by out-of-order handling of the exception. Test cases are kept small (32 instructions, no branches) to simplify analysis and accelerate violation detection. Each campaign iteration generates 100 different inputs per test case to explore various data-dependent behaviors around division operations.

```yaml
# Instruction selection
instruction_categories:
  - BASE-BINARY

# Exception handling
faults_allowlist:
  - div-by-zero

# Contract
contract_observation_clause: ct
contract_execution_clause:
  - deh

# Noise handling
analyser_stat_threshold: 0.2
executor_sample_sizes: [10, 50, 100, 500]

# Reproducibility
program_generator_seed: 12345
data_generator_seed: 67890

# Test case shape: 32 instructions with no branches
program_size: 32
avg_mem_accesses: 16
min_bb_per_function: 1
max_bb_per_function: 1

# Single actor
actors:
  - main:
      mode: host
      privilege_level: kernel
      data_properties:  # no page faults
        present: true
        writable: true

# Debugging
logging_modes: ['info', 'stat', 'dbg_generator']
```

Launch the campaign:

```bash
rvzr fuzz -s base.json -c config.yaml -n 100000 -i 100 -w ./violations --timeout 3600
```

## What's Next?
- How-to: [Choose a Contract](choose-contract.md) - Select the appropriate contract for your testing scenario
- How-to: [Use Templates](use-templates.md) - Create targeted test cases
- How-to: [Interpret Results](interpret-results.md) - Understand fuzzing output
- Topic: [Actors](../topics/actors.md) - Configure multi-domain testing
- Topic: [Contracts](../topics/contracts.md) - Understanding leakage contracts
- Topic: [Test Case Generation](../topics/test-case-generation.md) - How test cases are generated
- Reference: [Configuration Options](../ref/config.md) - Complete configuration reference
- Reference: [CLI Reference](../ref/cli.md) - Command-line interface reference

================================================
FILE: docs/howto/interpret-results.md
================================================
# How to Interpret Violation Results

So you've run a fuzzing campaign and found a violation. Now what?

This guide helps you understand and validate violations detected by Revizor. It explains the structure of violation artifacts, how to reproduce violations, and how to interpret the output to determine whether a violation is genuine and worth investigating.

!!! info "Prerequisites"
    Before starting, ensure you have:

    - Revizor installed and functional on the target system
    - A violation directory (`violation-`) produced during fuzzing
    - The configuration file (`config.yaml`) used in the original fuzzing campaign
    - Access to the same hardware where the violation was detected

## Violation Message

When Revizor detects a violation during fuzzing, it prints a summary message to the console similar to this:

```plaintext
(venv-3.12) main ➜ revizor ./revizor.py fuzz -s base.json -c demo/detect-v1.yaml -n 1000 -i 100 -w ./
INFO: [prog_gen] Setting program_generator_seed to random value: 599740
INFO: [fuzzer] Starting at 15:39:42
17 ( 2%)| Stats: Cls:0/0,In:200,R:7,SF:10,OF:7,Fst:0,CN:0,CT:0,P1:0,CS:0,P2:0,V:0> Priming 27 . to 500

================================ Violations detected ==========================
Violation Details:
-----------------------------------------------------------------------------------
HTrace                                                           | ID:92 | ID:192|
-----------------------------------------------------------------------------------
^...^...................^...........^.........^................. | 497   | 0     |
^...^........................................................... | 3     | 2     |
^^..^...........................................^.........^..... | 0     | 498   |

================================ Statistics ===================================

Test Cases: 18
Inputs per test case: 200.0
Violations: 1
Effectiveness:
  Total Cls: 98.0
  Effective Cls: 98.0
Discarded Test Cases:
  Speculation Filter: 10
  Observation Filter: 7
  Fast Path: 0
  Max Nesting Check: 0
  Tainting Check: 0
  Early Priming Check: 0
  Large Sample Check: 0
  Priming Check: 0

Duration: 40.5
Finished at 15:40:23
```

Most of the output consists of statistics, which are largely irrelevant for interpreting the violation itself. You can find a detailed explanation of the runtime statistics in the [Statistics Reference](../ref/runtime-statistic.md). The relevant part for interpreting the violation is the `Violation Details` section:

```
-----------------------------------------------------------------------------------
HTrace                                                           | ID:92 | ID:192|
-----------------------------------------------------------------------------------
^...^...................^...........^.........^................. | 497   | 0     |
^...^........................................................... | 3     | 2     |
^^..^...........................................^.........^..... | 0     | 498   |
```

This section summarizes the hardware trace samples recorded for the inputs that triggered the violation. Let's break it down.

### Violating Inputs

```
| ID:92 | ID:192|
```

This block tells us which inputs produced the violation. In this case, these are inputs 92 and 192.
You can find them in the violation artifact directory as `input_92.bin` and `input_192.bin`.

### Hardware Traces

```
^...^...................^...........^.........^.................
^...^...........................................................
^^..^...........................................^.........^.....
```

This block shows a visual representation of all observed hardware traces for these inputs. In this example, we used Revizor's default P+P (Prime+Probe) cache side channel tracer, which records the state of the L1D cache after a test case execution. The `^` character indicates that the corresponding cache set was accessed (the test case program evicted a line that had been primed in that set), while the `.` character indicates that the cache set was not accessed. The complete line is a bitmap of all 64 L1D cache sets available on the target machine, numbered left to right from 0 to 63. Accordingly, the first line is interpreted as follows:

```
    Set 4 accessed                  Set 36 accessed
    |                               |         Set 46 accessed
    |                               |         |
^...^...................^...........^.........^.................
|                       |
Set 0 accessed          Set 24 accessed
```

meaning that cache sets with IDs 0, 4, 24, 36, and 46 were accessed in this hardware trace.

!!! tip "Colors!"
    Enable `color: true` in the configuration file to improve readability of hardware trace visualizations.

### Trace Distribution

```
... | 497   | 0     |
... | 3     | 2     |
... | 0     | 498   |
```

Finally, this block shows the [statistical distribution](../topics/trace-analysis.md#statistical-trace-comparison) of hardware traces for each input. For example, input 92 produced the first hardware trace 497 times (out of a total of 500 measurements), while input 192 never produced that trace. Instead, input 192 produced the third hardware trace 498 times.

### Analysis

By looking at this table, we can deduce two important facts about the violation:

1. There is a clear difference in the sample distributions for the two inputs. This indicates a genuine violation rather than random noise.
2. The dominant (most frequently observed) hardware traces of the two inputs differ in which cache sets were accessed. This is an indirect clue that the test case had a data-dependent memory access pattern that was not predicted by the contract (likely due to speculative execution).

## Violation Artifact

When Revizor detects a violation, it creates a directory named `violation-`, with the following structure:

```
violation-/
├── program.asm
├── input_0.bin
├── input_1.bin
├── ...
├── report.txt
├── org-config.yaml
├── reproduce.yaml
└── minimize.yaml
```

The `program.asm` file holds the test case program that triggered the violation. The `input_*.bin` files contain the input sequence that exposed the leak. The `report.txt` file provides additional details, including hardware and contract traces. The configuration files include `org-config.yaml` (the original configuration), `reproduce.yaml` (for reproducing the violation), and `minimize.yaml` (for test case minimization).

Before proceeding with analysis, locate this directory and verify that all required files are present.

## Reproducing the Violation

It is usually a good idea to first reproduce the violation outside of the fuzzing campaign. This confirms that the violation is stable and not a transient artifact of noise or a misconfiguration of the fuzzer.

```bash
rvzr reproduce -s base.json -c ./violation-/reproduce.yaml \
    -t ./violation-/program.asm -i ./violation-/input_*.bin
```

If Revizor prints "Violation detected" in the output, the violation reproduced successfully. The distribution of hardware traces should roughly match the original violation. Significant differences may indicate a bug or a misconfiguration in the fuzzer (e.g., mismatched random seeds).

Non-reproducible violations should be rare, typically no more than one or two per machine per week of fuzzing. If your campaign produces more, adjust the configuration file to increase noise tolerance.
See the [configuration options reference](../ref/config.md) for details on noise-related parameters.

## Evaluating Violation Quality

Several factors determine whether a violation is worth investigating further.

*Reproducibility* is the most important criterion. Violations that consistently reproduce across multiple runs indicate stable, genuine leaks. Sporadic violations that appear and disappear may be false positives caused by noise. In such cases, consider adjusting the noise tolerance settings ([`analyser_stat_threshold`](../ref/config.md#analyser_stat_threshold) and/or [`executor_sample_sizes`](../ref/config.md#executor_sample_sizes)) in the configuration file and rerunning the fuzzing campaign.

*Trace distribution* provides additional insight. Clean violations show clear separation between inputs with consistent occurrence counts. Messy violations with overlapping traces or highly variable counts suggest non-determinism and may be harder to analyze. In such cases, consider collecting more samples per input by increasing the [`executor_sample_sizes`](../ref/config.md#executor_sample_sizes) configuration option (note: this slows down fuzzing).

Finally, *the hardware trace pattern* can be informative as well. There is no hard rule here, but if you see many accessed cache sets while the configuration is supposed to limit memory accesses to only a few, some CPU feature may be creating additional noise beyond what the statistical analyzer can filter out. In practice, this is often due to prefetchers. It is typically a good idea to disable them, unless you are specifically testing for prefetcher-related leaks.

## Next Steps

Once you have confirmed that a violation is reproducible and worth investigating, proceed to minimize the violation artifacts and root-cause the leak. See the [How to Minimize Test Cases](minimize.md) and [How to Root-Cause a Violation](root-cause-a-violation.md) guides for detailed instructions.
## See Also

- [How to Root-Cause a Violation](root-cause-a-violation.md) - Systematic analysis of confirmed violations
- [How to Design a Fuzzing Campaign](design-campaign.md) - Tuning fuzzer parameters for better results
- [How to Minimize Test Cases](minimize.md) - Simplifying violation artifacts for analysis
- [Configuration Options](../ref/config.md) - Detailed configuration parameter reference
- [Execution Modes](../ref/modes.md) - Understanding reproduce mode and other execution modes
- [Trace Analysis and Violation Detection](../topics/trace-analysis.md) - How Revizor detects and analyzes violations
- [Contracts and Leakage Models](../topics/contracts.md) - Understanding contract semantics

================================================
FILE: docs/howto/minimize.md
================================================
# How to Minimize Test Cases

This guide describes test case minimization, which reduces the complexity of violation artifacts by simplifying the test program and the input sequence while preserving the violation. This is typically a post-processing step performed after a fuzzing campaign has detected a violation, with the goal of producing a minimal test case suitable for human analysis and root-cause investigation.

Minimization is performed by Revizor's `minimize` mode, which post-processes a violation through a series of transformation passes that simplify both the test program and the input sequence.

!!! note "Related Documentation"
    For a complete list of available passes and their detailed descriptions, see the [Minimization Passes reference](../ref/minimization-passes.md).

!!! info "Prerequisites"
    Before starting, ensure you have:

    - Revizor installed and functional on the target system
    - A violation directory (`violation-`) produced during fuzzing
    - The configuration file (`config.yaml`) used in the original fuzzing campaign
    - Access to the same hardware where the violation was detected

## Basic Usage

Run the minimizer with the following syntax:

```bash
rvzr minimize -s -c -t -o \
    -i --input-outdir --num-attempts \
    [pass_options]
```

Parameters:

- `-s`: Path to ISA specification (e.g., `base.json`)
- `-c`: Path to configuration file (typically `minimize.yaml` from the violation directory)
- `-t`: Path to test program (typically `program.asm` from the violation directory)
- `-o`: Output path for the minimized program
- `-i`: Number of inputs in the sequence (must match the original fuzzing campaign)
- `--input-outdir`: Directory to store minimized input files
- `--num-attempts`: Number of minimization iterations to perform
- `[pass_options]`: Enable specific minimization passes (see [Minimization Passes](../ref/minimization-passes.md))

Example command (assuming a violation directory named `violation-0000-0000`):

```bash
rvzr minimize -s base.json -c violation-0000-0000/minimize.yaml -t violation-0000-0000/program.asm \
    -i 25 --input-outdir ./min-inputs --num-attempts 10 --enable-instruction-pass 1 \
    -o min.asm
```

This command generates an input sequence of 25 inputs based on the seed in `violation-0000-0000/minimize.yaml`, applies the instruction removal pass 10 times to simplify `program.asm`, and writes the minimized program to `min.asm`. The simplified input sequence is stored in `./min-inputs`.

## Interpreting the Output

Each minimization pass prints progress indicators to the console as it executes. Understanding this output helps verify that minimization is progressing correctly.
### Program Pass Output

Program passes display one character per instruction to indicate success or failure:

- `.` indicates the pass succeeded on this instruction (e.g., the instruction was successfully removed)
- `-` indicates the pass failed on this instruction (e.g., removing this instruction breaks the violation)

Example output when running `--enable-instruction-pass`:

```
[Pass 2] Instruction Removal Pass
.............-.....--.-------..----
```

The pass iterates from the end of the program to the beginning, so the leftmost character corresponds to the last instruction; read the output from right to left to see the results in program order. In this example, the pass successfully removed the last 13 instructions, failed on the 14th instruction from the end, succeeded on the 15th, and so on.

### Input Pass Output

The `input-diff` pass uses a memory-map visualization to show minimization progress. Each character represents one 8-byte chunk of the input sequence (each `+0x40` column holds eight characters):

- `.` indicates zeroing the chunk succeeded
- `+` indicates copying the chunk from the first input to the second succeeded
- `=` indicates the chunk was already identical in both inputs
- `^` indicates the pass could not minimize this chunk (it remains different between inputs)

Example output from `--enable-input-diff-pass`:

```
Address    +0x0     +0x40    +0x80    +0xc0    +0x100   +0x140   +0x180   +0x1c0
0x00000000 ........ ........ ........ ........ ........ ........ ........ ........
0x00000200 ........ ........ ........ ........ ........ ........ ........ ........
0x00000400 ........ ........ ........ ........ ........ ........ ........ ........
0x00000600 ........ ........ ........ ........ ........ ........ ........ ........
0x00000800 ........ ........ ........ ........ ........ ........ ........ ........
0x00000a00 ........ ........ ........ ........ ........ ........ ........ ........
0x00000c00 ........ ........ ........ ........ ........ ........ ........ ........
0x00000e00 ........ ........ ........ ........ ........ ........ ........ ........
0x00001000 ........ ........ ........ ........ ........ ........ ........ ........
0x00001200 ........ ........ ........ ........ ........ ........ ........ ........
0x00001400 ........ ........ ........ ........ ........ ........ ........ ........
0x00001600 ........ ........ ........ ........ ........ ........ ........ ........
0x00001800 ........ ........ ........ ........ ........ ........ ........ ........
0x00001a00 ........ ........ ........ ........ ........ ........ ........ ........
0x00001c00 ........ ........ ........ ........ ........ ........ ........ ........
0x00001e00 ........ ........ ........ ........ ........ ........ ........ ........
0x00002000 ====^=..
0x00002040 ........ ........ ........ ........
> Result: Leaked 1 bytes
> Addresses: ['0x2020']
```

This output shows that the pass successfully minimized most input differences. The chunk at address `0x2020` (marked with `^`) remains different between the two inputs and likely contributes to the violation. The chunks at addresses `0x2000`-`0x2018` and `0x2028` (marked with `=`) were already identical.

### Comment Pass Output

Enable `--enable-comment-pass` to annotate the minimized program with analysis information. The pass inserts comments indicating which memory accesses contributed to the violation, making it easier to identify the root cause.

Comment format:

```
# mem access: [input1_id] [load_addr]-[store_addr] CL [cache_set_id]:[cache_line_offset] | [input2_id] [load_addr]-[store_addr] CL [cache_set_id]:[cache_line_offset]
```

Each comment shows the memory addresses accessed by an instruction when executed with the two inputs that triggered the violation. The comment includes both the virtual addresses and their corresponding L1D cache set IDs and cache line offsets.

Example comment:

```asm
# mem access: [1] 0x800-0x800 CL 32:0 | [11] 0x710-0x710 CL 28:10
```

This indicates that when executed with input 1, the instruction accessed address `0x800` (cache set 32, offset 0), and when executed with input 11, it accessed address `0x710` (cache set 28, offset `0x10`).
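The set and offset in these annotations can be reproduced with simple address arithmetic. This is a minimal sketch, assuming 64-byte cache lines and 64 L1D sets, so that set IDs repeat every `0x1000` bytes; this matches the `id = (addr % 0x1000) // 64` formula used in [How to Root-Cause a Violation](root-cause-a-violation.md). These are assumptions about the target CPU, not values read from Revizor. Printing the offset in hexadecimal reproduces the `28:10` of the example above, which suggests that the annotation prints the offset in hex; treat that as an inference. `cache_location` is a hypothetical helper.

```python
CACHE_LINE_SIZE = 64  # bytes per cache line (assumed)
NUM_SETS = 64         # number of L1D sets (assumed); set IDs repeat every 0x1000 bytes


def cache_location(addr: int) -> tuple[int, int]:
    """Return (cache set ID, byte offset within the cache line) for an address."""
    set_id = (addr % (CACHE_LINE_SIZE * NUM_SETS)) // CACHE_LINE_SIZE
    offset = addr % CACHE_LINE_SIZE
    return set_id, offset


for addr in (0x800, 0x710):
    set_id, offset = cache_location(addr)
    # Offset printed in hex, which matches the "28:10" in the example annotation
    print(f"{addr:#x}: CL {set_id}:{offset:x}")
# 0x800: CL 32:0
# 0x710: CL 28:10
```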
The different cache sets accessed with the two inputs (32 vs. 28) likely contributed to the violation.

## Complete Workflow Example

This example demonstrates a typical minimization workflow. Assume a fuzzing campaign detected a violation:

```bash
rvzr fuzz -s base.json -c config.yaml -n 1000 -i 25 -w .
```

The fuzzer created a violation directory (e.g., `violation-000000-000000`) containing the test case artifacts.

### Step 1: Minimize the Program

Apply all program passes to simplify the test case while preserving the violation:

```bash
rvzr minimize -s base.json -c ./violation-000000-000000/minimize.yaml \
    -t ./violation-000000-000000/program.asm \
    -o min.asm -i 25 --num-attempts 3 \
    --enable-instruction-pass 1 \
    --enable-simplification-pass 1 \
    --enable-nop-pass 1 \
    --enable-constant-pass 1 \
    --enable-mask-pass 1 \
    --enable-label-pass 1
```

### Step 2: Verify Program Minimization

Confirm that the minimized program still triggers the violation:

```bash
rvzr fuzz -s base.json -c ./violation-000000-000000/minimize.yaml -t min.asm -i 25
```

If the violation is no longer detected, reduce `--num-attempts` or disable some passes, then retry Step 1.

### Step 3: Minimize Inputs and Add Annotations

Apply the input and analysis passes to further simplify the test case and add helpful comments:

```bash
rvzr minimize -s base.json -c ./violation-000000-000000/minimize.yaml \
    -t min.asm -o commented.asm -i 25 \
    --input-outdir ./inputs \
    --enable-input-diff-pass 1 \
    --enable-input-seq-pass 1 \
    --enable-comment-pass 1
```

### Step 4: Verify Complete Minimization

Reproduce the violation with the minimized program and inputs:

```bash
rvzr reproduce -s base.json -c ./violation-000000-000000/reproduce.yaml \
    -t commented.asm -i ./inputs/min_input*.bin
```

If successful, the minimized test case in `commented.asm` and `./inputs/` is ready for detailed analysis. The annotated comments will help identify the root cause of the violation.

!!! tip "Troubleshooting Failed Minimization"
    If minimization breaks the violation, try these adjustments:

    - Reduce `--num-attempts` to perform fewer iterations
    - Disable aggressive passes like `--enable-simplification-pass`
    - Minimize the program before minimizing inputs
    - Check that `data_generator_seed` matches the original fuzzing campaign

## What's Next?

Once a violation is minimized, the next step is typically to analyze it manually to understand the root cause. The [How to Root-Cause a Violation](root-cause-a-violation.md) guide is dedicated to this topic.

## See Also

- [Minimization Passes](../ref/minimization-passes.md) - Complete list of available passes and their options
- [CLI Reference](../ref/cli.md) - Full command-line interface documentation
- [Execution Modes](../ref/modes.md) - Overview of all Revizor execution modes
- [Configuration Options](../ref/config.md) - Configuration file reference including `data_generator_seed`
- [How to Design a Fuzzing Campaign](design-campaign.md) - Set up effective fuzzing campaigns
- [How to Interpret Results](interpret-results.md) - Understand fuzzing outputs and violation reports
- [Trace Analysis and Violation Detection](../topics/trace-analysis.md) - Understanding how violations are detected

================================================
FILE: docs/howto/root-cause-a-violation.md
================================================
# How to Root-Cause a Violation

This guide describes in detail how to identify the root cause of confirmed contract violations. It shows a typical workflow and some useful techniques for analyzing violation artifacts and isolating the specific CPU behavior that leads to information leakage.

!!! warning "Art, Not Science"
    Root-causing violations is more art than science. The techniques described here are not guaranteed to work in every situation because violations can arise from a wide variety of complex CPU behaviors. Use your intuition and knowledge of microarchitecture to guide your analysis. Experiment with different approaches and document what works best for you.

!!! info "Prerequisites"
    This guide assumes you have already finished a [fuzzing campaign](design-campaign.md) and [minimized the violation artifacts](minimize.md).

## Locate the Violation Files

We will explore root-cause analysis through a concrete example: a CT-SEQ contract violation on an x86-64 CPU. We will be working with:

- The violation artifact in `violation-0000-0000/` produced during fuzzing
- A minimized version of the violation program in `min.asm` produced by the minimizer
- A set of minimized input files in `./inputs/min_input_*.bin` produced by the minimizer
- The configuration file `config.yaml` used during fuzzing

## Gather Insights from the Minimizer

A good starting point is to examine the output of the minimizer, especially from the input minimization passes. These passes attempt to reduce the differences between the inputs that trigger the violation, and thus they often highlight the specific data values that leak and that impact the violation. Below is an example of the printed summary from the differential input minimizer:

```
[PASS 2] Differential Input Minimizer
> Minimizing the difference between inputs 1 and 11
Address    +0x0     +0x40    +0x80    +0xc0    +0x100   +0x140   +0x180   +0x1c0
0x00000000 ........ ........ ........ ........ ........ ........ ........ ........
0x00000200 ........ =....... ........ ........ ........ ........ ........ ........
0x00000400 ........ ........ ........ ........ ........ ........ ........ ........
0x00000600 ........ ........ ........ ........ ........ ........ ........ ........
0x00000800 ........ ........ ........ ........ ........ ........ ........ ........
0x00000a00 ........ ........ ........ ........ ........ ........ ........ ........
0x00000c00 ........ ........ ........ ........ ........ ........ ........ ........
0x00000e00 ........ ........ ........ ........ ........ ........ ........ ........
0x00001000 ........ ........ ........ ........ ........ ........ ........ ........
0x00001200 ........ ........ ........ ........ ........ ........ ........ ........
0x00001400 ........ ........ ........ ........ ........ ........ ........ ........
0x00001600 ........ ........ ........ ........ ........ ........ ........ ........
0x00001800 ........ ........ ........ ........ ........ ........ ........ ........
0x00001a00 ........ ........ ........ ........ ........ ........ ........ ........
0x00001c00 ........ ........ ........ ........ ........ ........ ........ ........
0x00001e00 ........ ........ ........ ........ ........ ........ ........ ........
0x00002000 .....^..
0x00002040 ........ ........ ........ ........
> Result: Leaked 1 bytes
> Addresses: ['0x2028']
```

The minimizer goes through the pair of inputs that trigger the violation (inputs #1 and #11 in this case) and tries to minimize the differences between them:

* If both inputs already have identical values at a given address, the minimizer prints `=` for that address. In this example, this is the case for address `0x240`.
* Next, the pass attempts to zero out one 8-byte chunk at a time in both inputs. If the violation persists, the minimizer prints `.` for that address. In this example, most of the addresses are zeroed out.
* Next, the pass attempts to copy the chunk from input #1 into the same address in input #11. If the violation persists, the minimizer prints `+` for that address. This example does not have such cases.
* If both attempts fail, the pass restores the original values at the given address, prints `^`, and moves to the next address. In this example, the minimizer restored the original value at address `0x2028`.

The interpretation of these results is case-specific, but generally, the values marked with `+` or `=` are those that create the conditions for leakage, and the addresses marked with `^` are those whose values leak.
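The map can also be decoded mechanically when it is too large to read by eye. This is a minimal sketch, assuming the layout visible in the summary above: each row starts with its base address, each column covers `0x40` bytes, and each of the eight characters in a column stands for one 8-byte chunk (so `.....^..` in row `0x2000` marks `0x2028`). `parse_input_map` and `addresses_marked` are hypothetical helpers, not Revizor APIs.

```python
import re

COLUMN_SIZE = 0x40  # bytes covered by one column of the map
CHUNK_SIZE = 8      # bytes per character (eight characters per 0x40 column)

ROW_RE = re.compile(r"^0x([0-9a-fA-F]+)\s+(.*)$")


def parse_input_map(text: str) -> dict[int, str]:
    """Map each address shown in a differential-minimizer summary to its marker."""
    markers = {}
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match is None:
            continue  # header, "[PASS ...]" and "> ..." lines
        row_base = int(match.group(1), 16)
        for col, group in enumerate(match.group(2).split()):
            for i, marker in enumerate(group):
                markers[row_base + col * COLUMN_SIZE + i * CHUNK_SIZE] = marker
    return markers


def addresses_marked(markers: dict[int, str], symbol: str) -> list[int]:
    """Sorted addresses whose marker equals `symbol` (e.g., '^' for leaked values)."""
    return sorted(addr for addr, marker in markers.items() if marker == symbol)


summary = """\
Address    +0x0     +0x40
0x00000200 ........ =.......
0x00002000 .....^..
> Addresses: ['0x2028']"""
markers = parse_input_map(summary)
print([hex(a) for a in addresses_marked(markers, "^")])  # ['0x2028']
print([hex(a) for a in addresses_marked(markers, "=")])  # ['0x240']
```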
In this example, the minimizer found that this test case leaks the value at address `0x2028` (used to initialize RDI). The minimizer also found that address `0x240` must contain specific non-zero values that are the same in both inputs. This address in the input is used to initialize the corresponding offset in the sandbox of actor 0. See [Sandbox Memory Layout](https://microsoft.github.io/side-channel-fuzzer/user/sandbox/) for more details about register and memory initialization.

!!! tip "Minimizer Behavior"
    Ideally, the minimizer should be able to reduce the leakage to a single address. If more than a couple of addresses leak, it typically indicates that the violation is non-deterministic, and it might be a good idea to re-run the program minimizer or to change the configuration to increase the number of attempts or the noise threshold. If *no* addresses leak, this is a certain sign that something went wrong; re-run the minimizer.

## Add Comments to the Minimized Program

Run the minimizer again with the `comment` pass enabled to annotate the minimized program with memory access information. This will help you map hardware traces to specific instructions in the program.

```bash
rvzr minimize -s base.json -c ./violation-0000-0000/minimize.yaml \
    -t min.asm -o commented.asm -i \
    --enable-comment-pass 1
```

## Insert Speculation Fences

To isolate speculative behavior, add fences:

```bash
rvzr minimize -s base.json -c ./violation-0000-0000/minimize.yaml \
    -t commented.asm -o fenced.asm -i \
    --enable-fence-pass 1
```

This pass will attempt to insert an `LFENCE` after every instruction in the program and check whether the violation still occurs. In the resulting file (`fenced.asm`), the region *without* fences is the one that causes the violation. The remaining instructions only set up the data for the violation and are likely irrelevant to the leak itself.

!!! warning "Unexpected Fence Insertion Results"
    If an `LFENCE` is inserted after *every* instruction in the test case and the violation still occurs, this is most likely due to a bug in the model or the executor. If you are using a custom model, consider checking the model for correctness. If you haven't made changes to the Revizor source code, please open an issue in the [bug tracker](https://github.com/microsoft/side-channel-fuzzer/issues).

## Map Hardware Traces to the Minimized Program and Data

When both the program and its inputs are minimized, you should be able to identify which instructions caused the cache accesses in the hardware traces and which data was leaked. When we run the `reproduce` command with the minimized program and inputs, we see the following hardware traces:

```bash
rvzr reproduce -s base.json -c ./violation-0000-0000/reproduce.yaml \
    -t commented.asm -i ./inputs/min_input*.bin
...
================================ Violations detected ==========================
-----------------------------------------------------------------------------------
HTrace | ID:1 | ID:11 |
-----------------------------------------------------------------------------------
^...............................................^............... | 420 | 0 |
^............................................................... | 80 | 0 |
^..............^................................................ | 0 | 500 |
```

!!! tip "Input IDs"
    If the input IDs have changed after minimization, you can either exclude some of the inputs from the arguments of the `reproduce` command, or re-run the minimizer with fewer passes.

We see that the hardware traces have been significantly simplified compared to the original violation, and now there are at most two accessed cache sets in each trace: 0 and 48 for input #1, and 0 and 15 for input #11. This is a good sign: the minimization was successful.
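Reading set indices off a 64-character trace by eye is error-prone. This is a minimal sketch that lists the accessed sets and the sets that differ between two traces, assuming (as in the traces above) that character `i` of an HTrace stands for cache set `i` and that `^` marks an access. `accessed_sets` and `differing_sets` are hypothetical helpers; the trace strings are rebuilt programmatically to match the rows shown above.

```python
def accessed_sets(htrace: str) -> set[int]:
    """Return the indices of the cache sets marked with '^' in an HTrace string."""
    return {i for i, c in enumerate(htrace) if c == "^"}


def differing_sets(trace_a: str, trace_b: str) -> set[int]:
    """Cache sets accessed in exactly one of the two traces."""
    return accessed_sets(trace_a) ^ accessed_sets(trace_b)


# Rebuilt to match the first (input #1) and third (input #11) trace rows above
trace_1 = "^" + "." * 47 + "^" + "." * 15
trace_11 = "^" + "." * 14 + "^" + "." * 48

print(sorted(accessed_sets(trace_1)), sorted(accessed_sets(trace_11)))  # [0, 48] [0, 15]
print(sorted(differing_sets(trace_1, trace_11)))  # [15, 48]
```

The sets that appear in only one trace (here 15 and 48) are the ones worth mapping back to instructions.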
We can also tell that the only difference between the two traces is the accessed cache set: 48 vs. 15. This is the cache set that causes the violation, so we should aim to find the instruction that performs this access. To do so, let's look at the contents of the `commented.asm` file. This file contains the minimized program with comments that show which memory addresses or cache lines are accessed by each instruction.

```assembly
; ... skipped header ...
1.  and rax, 0b1111111111111  # instrumentation
2.  lfence
3.  mov edx, dword ptr [r14 + rax]
4.  # mem access: [1] 0x0 cl 0:0 | [11] 0x0 cl 0:0
5.  or cx, 0b1000  # instrumentation
6.  and cl, 0b11111000  # instrumentation
7.  and dx, 0b11  #
8.  and rsi, 0b1111111111111  #
9.  add cl, 39  #
10. mov rbx, 0b1111111111111  #
11. bt si, dx
12. jbe .bb_0.1
13. jmp .exit_0
14. .bb_0.1:
15. mov ecx, edi
16. and rcx, 0b1111111111000  # instrumentation
17. mov byte ptr [r14 + rcx], 88
; ... skipped footer ...
```

This program contains only two memory accesses, at lines 3 and 17. The [annotation](minimize.md#comment-pass-output) at line 4 tells us that the `mov` instruction accesses memory offset `0x0` both when executed with input 1 (`[1]`) and when executed with input 11 (`[11]`). The notation `0:0` stands for cache set `0` and cache line offset `0`. This information lets us map this instruction to the first access in the hardware trace:

```plaintext
^...............................................^...............
|
This eviction maps to `mov edx, dword ptr [r14 + rax]` at line 3
```

The second memory access (line 17) does not have an annotation, which implies that the contract model has not executed this instruction with the inputs provided. It does not, however, mean that the CPU has not executed this instruction: there is a chance that it was executed speculatively. This is a typical scenario in violations detected by Revizor.
If we look at the instructions prior to the memory access, we can see the `jbe` instruction at line 12. It is a conditional jump, and thus a common source of speculation (branch prediction). This type of speculation is not permitted by the target contract (CT-SEQ), so it could cause a violation. From this, we can hypothesize that the memory access at line 17 is speculative and is the one causing the second cache access:

```plaintext
Inputs [1]:
                                                Hypothesis: This eviction maps to `mov` at line 17
                                                |
^...............................................^...............

Inputs [11]:
^..............^................................................
               |
               Hypothesis: This eviction maps to `mov` at line 17
```

To check whether our hypothesis is correct, let's cross-reference this information with the leaked bytes from the differential input minimizer:

```plaintext
; .. skip zero bytes
0x00002000 .....^..
0x00002040 ........ ........ ........ ........
> Result: Leaked 1 bytes
> Addresses: ['0x2028']
```

This summary tells us that `rdi` has different values in inputs #1 and #11. At the same time, the first time `rdi` is used in the program is at line 15, where it is copied into `ecx` (and thus `rcx`), which is then used as part of the address in the memory access at line 17. This would make the speculative memory access at line 17 access different addresses with the two inputs, and would explain the difference between the hardware traces.

At this point, the hypothesis is more-or-less confirmed, and we can declare that the root cause of the leak was the misprediction of the `jbe` branch at line 12, which caused the speculative execution of the memory access at line 17, which in turn leaked the value of `rdi`.

If we want to further increase our confidence, we can manually inspect the contents of the inputs at address `0x2028` to see if the values correspond to the cache set IDs that we observe in the hardware traces.
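The check boils down to a small computation: read the 8-byte little-endian value at offset `0x2028` of each input file, apply the `0b1111111111000` mask from line 16, and map the result to a cache set with `id = (addr % 0x1000) // 64`. This is a minimal sketch, assuming that the input files are raw binary images whose offsets match the addresses in the minimizer summary (as the `hexdump` output for offset `0x2020` suggests), and assuming the file names used in this guide. `read_u64` and `leaked_cache_set` are hypothetical helpers, not Revizor APIs.

```python
import struct
from pathlib import Path

LEAK_OFFSET = 0x2028            # input offset reported by the differential minimizer
ADDRESS_MASK = 0b1111111111000  # mask applied by `and rcx, ...` at line 16
CACHE_LINE_SIZE = 64
SET_PERIOD = 0x1000             # set IDs repeat every 4 KiB: id = (addr % 0x1000) // 64


def read_u64(path: Path, offset: int) -> int:
    """Read a little-endian 64-bit value at `offset` of a binary input file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return struct.unpack("<Q", f.read(8))[0]


def leaked_cache_set(value: int) -> int:
    """Apply the program's mask and map the resulting address to a cache set ID.

    `mov ecx, edi` keeps only the low 32 bits, which does not matter here
    because the mask fits in 13 bits.
    """
    addr = value & ADDRESS_MASK
    return (addr % SET_PERIOD) // CACHE_LINE_SIZE


for name in ("min_input_0001.bin", "min_input_0011.bin"):
    path = Path("inputs") / name
    if path.is_file():  # only does something next to the minimizer output
        value = read_u64(path, LEAK_OFFSET)
        print(f"{name}: {value:#x} -> set {leaked_cache_set(value)}")
```

With the values from this example, the sketch yields sets 48 and 15, the same numbers derived by hand from the `hexdump` output.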
To inspect the inputs manually, run the `hexdump` command on the input files:

```bash
$ hexdump -C ./inputs/min_input_0001.bin | grep 2020
00002020  00 00 00 00 00 00 00 00  1e 1c 4a 00 1e 1c 4a 00  |..........J...J.|
$ hexdump -C ./inputs/min_input_0011.bin | grep 2020
00002020  00 00 00 00 00 00 00 00  c8 13 58 00 c8 13 58 00  |..........X...X.|
```

The values at offset `0x2028`, read as little-endian 64-bit integers, are `0x4a1c1e004a1c1e` for input #1 and `0x5813c8005813c8` for input #11. These are masked with `0b1111111111000` by the `and` at line 16 and become `7192` and `5064`, respectively. If we translate these values to cache set IDs (`id = (addr % 0x1000) // 64`), we get `48` and `15`. These values match the cache set IDs that we observe in the hardware traces, which confirms our hypothesis.

If we want even more confidence, we can manually modify the input files (e.g., with the `hexedit` tool) and check whether the hardware traces change when we modify the value of `rdi`.

---

## Modify the Program

In many cases, the minimization process will not provide as clear a result as in the example above, and you will not be able to form a specific hypothesis about the root cause of the violation. In such cases, you can try to modify the program in various ways and check whether the violation still occurs. There are no strict rules on which modifications to make, and you will have to rely on your intuition and knowledge of the target microarchitecture, but here are some general guidelines:

1. **Simplify Instructions**: Start by manually replacing instructions in `min.asm` with simpler ones. For example, replace complex instructions that have memory operands with simple loads or stores.
2. **Increase/Decrease Aliasing**: Try changing the addresses of memory accesses so that they match (or stop matching, if they already do) the addresses of other instructions. Such aliasing often triggers speculation (e.g., in Speculative Store Bypass or MDS attacks).
3. **Add/Remove Dependent Instructions**: If you have a hypothesis about which instruction triggers speculation, try adding or removing data-dependent instructions before it. This may change the size of the speculative window and the hardware traces, which will give you more insight into the violation.
4. **Change Memory Permissions**: If the violation is related to memory accesses, try changing the permissions of the memory regions that are accessed by the program. For example, if the memory is read-only, try changing it to read-write. If the violation disappears, it might indicate that the violation is related to the permission checks in the CPU.
5. **Change Instruction Operands**: Try changing operands to add or remove data dependencies between instructions. For example, if you have a sequence of two moves `mov rax, [rax]; mov rbx, [rax]`, try changing the second move to `mov rbx, [rbx]` to check whether the violation still occurs without the data dependency between the instructions.

After each modification, run the `reproduce` command to see if the violation still occurs:

```bash
rvzr reproduce -s base.json -c ./violation-/reproduce.yaml \
    -t modified.asm -i ./inputs/min_input*.bin
```

!!! tip "Share Your Findings"
    If you find any other strategies that work well, please consider sharing them by opening a pull request to this documentation. We would love to hear about your experiences and learn from them.
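When you try many modifications, the `reproduce` step can be scripted. This is a minimal sketch with several assumptions: `rvzr` is on `PATH`, the input glob is expanded in Python instead of the shell, and a reproduced violation prints the `Violations detected` banner shown earlier in this guide. That last point is an assumption about the output format, not a documented interface. The helper names and the variant file names are hypothetical.

```python
import glob
import subprocess


def reproduce_command(program: str, config: str,
                      inputs_glob: str = "./inputs/min_input*.bin",
                      isa: str = "base.json") -> list[str]:
    """Build the `rvzr reproduce` argument list used in this guide."""
    inputs = sorted(glob.glob(inputs_glob))  # the shell normally expands this glob
    return ["rvzr", "reproduce", "-s", isa, "-c", config, "-t", program, "-i", *inputs]


def violation_detected(output: str) -> bool:
    """Assumption: a reproduced violation prints the 'Violations detected' banner."""
    return "Violations detected" in output


def try_variants(programs: list[str], config: str) -> dict[str, bool]:
    """Run `reproduce` on each modified program and record whether the violation persists."""
    results = {}
    for program in programs:
        proc = subprocess.run(reproduce_command(program, config),
                              capture_output=True, text=True)
        results[program] = violation_detected(proc.stdout + proc.stderr)
    return results
```

For example, `try_variants(["modified-1.asm", "modified-2.asm"], "./violation-0000-0000/reproduce.yaml")` returns a mapping from each variant to whether the violation was still reported, which makes it easy to compare several edits in one run.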
## See Also

- [How to Interpret Violation Results](interpret-results.md) - Understanding and validating violations before root-cause analysis
- [How to Minimize Test Cases](minimize.md) - Complete minimization workflow and pass descriptions
- [Minimization Passes](../ref/minimization-passes.md) - Reference documentation for all minimization passes
- [Configuration Options](../ref/config.md) - Configuration parameters for reproduction and minimization
- [Command-Line Interface](../ref/cli.md) - Complete CLI reference for all execution modes
- [Sandbox Memory Layout](../ref/sandbox.md) - Understanding input file structure and register initialization
- [Trace Analysis and Violation Detection](../topics/trace-analysis.md) - How Revizor detects violations
- [Contracts and Leakage Models](../topics/contracts.md) - Understanding contract semantics

================================================
FILE: docs/howto/use-macros.md
================================================
# How To Use Macros

This document explains the concept of macros in Revizor and describes how to create test cases that use macros. Macros are especially useful in the template-based mode of Revizor, so if you are not familiar with that mode, check out the [Template-Based Mode](../howto/use-templates.md) documentation as well.

## What is a macro?

Macros in Revizor are special pseudo-instructions that provide a flexible way to insert complex operations into test cases. They appear as labels of a special format in the assembly code, but the model and the executor dynamically expand them into actual implementations during execution.
Macros solve two key challenges, especially in the context of multi-domain testing:

* Structuring: Enable insertion of pre-defined instruction sequences (like domain transitions or microarchitectural isolation primitives) within randomized test contexts.
* Unification: Allow the same test case template to be instantiated differently across executor and model stages, accommodating differences in ISA support.

## Why use macros?

Macros exist to provide extra flexibility and convenience when creating test cases. Certain operations are cumbersome or impractical to express directly in assembly code, and macros abstract away these complexities.

## Macro Definition and Usage

### Assembly Syntax

Macros use the standard assembly syntax of a label, with the `.macro` prefix:

```assembly
.macro.macro_name.argument1.argument2.argument3.argument4:
```

A macro can take at most four arguments. The arguments are strictly static; Revizor does not support dynamic arguments in macros, such as registers or memory addresses.

### Example Usage

A user can create a test case program where only a subset of instructions is measured by using the `measurement_start` and `measurement_end` macros:

```asm
.intel_syntax noprefix
.section .data.main
... ; non-measured code here
.macro.measurement_start:
... ; measured code here
.macro.measurement_end:
... ; non-measured code here
.test_case_exit:
```

Revizor automatically replaces the macros with no-op placeholders of an ISA-dependent size and records the location and the arguments of the macros in the test case metadata. When the executor and the model run the test case, they recognize these macros and execute the corresponding logic. The logic can be configurable: for example, when the user has set `executor_mode: P+P` (Prime+Probe), the `measurement_start` macro corresponds to the Prime stage of the measurement, and `measurement_end` corresponds to the Probe stage.
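To make the label format concrete, here is a sketch of splitting a macro label into its name and static arguments while enforcing the four-argument limit. `parse_macro_label` is a hypothetical helper for illustration. It is not Revizor's own parser, and it assumes that arguments never contain dots.

```python
MACRO_PREFIX = ".macro."
MAX_MACRO_ARGS = 4


def parse_macro_label(label: str) -> tuple[str, list[str]]:
    """Split a label like `.macro.name.arg1.arg2:` into (name, [args])."""
    label = label.strip()
    if not label.startswith(MACRO_PREFIX) or not label.endswith(":"):
        raise ValueError(f"not a macro label: {label!r}")
    name, *args = label[len(MACRO_PREFIX):-1].split(".")
    if not name:
        raise ValueError("macro name is empty")
    if len(args) > MAX_MACRO_ARGS:
        raise ValueError(f"too many arguments ({len(args)} > {MAX_MACRO_ARGS})")
    return name, args


print(parse_macro_label(".macro.measurement_start:"))         # ('measurement_start', [])
print(parse_macro_label(".macro.random_instructions.10.0:"))  # ('random_instructions', ['10', '0'])
```

The arguments are kept as strings because they are static tokens in the assembly text; interpreting them (for example, as instruction counts in `.macro.random_instructions.10.0` from the template guide) is up to each macro.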
See [Implementation Overview](#implementation-overview) for details on how macros are implemented in the executor and model.

## Implementation Overview

### Internal Representation of Macros

Revizor internally replaces all macros with a no-op placeholder of a fixed size (8 bytes for x86-64, 12 bytes for ARM64). This placeholder maintains the original instruction flow while allowing the executor and model to recognize and handle macros dynamically.

The macro location, type, and arguments are stored in the test case metadata, namely in the `SYMBOL TABLE` section of the [RCBF File Format](../ref/binary-formats.md), where `owner` is set to the ID of the actor that contains the macro, `offset` is the offset of the macro placeholder in the actor's code section, `id` is the macro type (defined in [src/x86/executor/include/macro_loader.h](https://github.com/microsoft/side-channel-fuzzer/blob/main/src/x86/executor/include/macro_loader.h)), and `args` is a compressed representation of the macro arguments.

### Macros in Executor

Each actor's code section contains a dedicated memory region for macros. During test case initialization, the executor copies the implementations of all macros into this region and replaces the macro placeholders with direct jumps to the corresponding implementations. The executor also appends a jump to the end of each macro implementation that returns control flow to the original instruction sequence.

For example, if we have a simple test case like this:

```asm
.macro.measurement_start:
... ; some code here
.macro.measurement_end:
.test_case_exit:
```

The executor will expand it as follows:

```asm
jmp measurement_start_impl
lfence
.l1:
... ; some code here
jmp measurement_end_impl
lfence
.l2:
.test_case_exit:

.macro_code_section:
measurement_start_impl:
... ; sequence of instructions that implements the macro
jmp .l1 ; return to the instruction after the placeholder
measurement_end_impl:
... ; sequence of instructions that implements the macro
jmp .l2 ; return to the instruction after the placeholder
```

Note that the executor also inserts an LFENCE barrier after each macro jump. This prevents the CPU from speculatively executing the instructions that follow the jump (straight-line speculation), which could interfere with the measurement process.

### Macros in Model

In the model, macros are implemented as dynamic callbacks. The model executes a hook function on every instruction, checking whether the current instruction matches an entry in the symbol table. If a match is found, the model invokes the corresponding callback function to emulate the macro behavior.

================================================
FILE: docs/howto/use-templates.md
================================================
# How to Use Templates

Template-based mode (`tfuzz`) enables targeted testing of specific CPU scenarios by using predefined assembly templates that get expanded with random instructions. This mode narrows down the fuzzing space to focus on particular interaction patterns while maintaining randomization within those patterns.

## Overview

Template-based mode generates test cases from assembly templates containing macros that get dynamically expanded during generation. Templates define the structure and flow of test cases while allowing specific sections to be populated with random instructions based on configuration.

## Command Line Usage

Template-based mode is invoked using the `rvzr tfuzz` command. The invocation is almost identical to the normal `rvzr fuzz` mode, but it takes an additional `-t` (or `--template`) parameter to specify the assembly template file.

Invocation example:

```bash
rvzr tfuzz -t template.asm -c config.yaml -s base.json -n 10 -i 100
```

where `template.asm` is the template file.
## Template Structure

Templates are assembly files that combine:

- Regular assembly instructions
- Macros (special pseudo-instructions as described in [Macros](../ref/macros.md))

Example template:

```asm
.intel_syntax noprefix
.section .data.main

.macro.random_instructions.10.0:  ; Replaced with 10 random instructions
div rbx                           ; Divides rdx:rax by rbx; these registers may be set by the random instructions
jmp .test_case_exit               ; Jump to exit point if no exception occurs

.fault_handler:
.macro.random_instructions.10.1:  ; Generate 10 random instructions executed when a fault occurs

.test_case_exit:
```

Revizor will take this template and replace each `.macro.random_instructions.N` label with N random instructions from the instruction pool defined in the configuration file. A new test case is generated this way in each fuzzing round, allowing for a wide variety of test cases while still adhering to the structure defined in the template. For example, if `-n 10` is specified, the generator will produce 10 test cases based on the template, each with a different random instruction sequence.

================================================
FILE: docs/index.md
================================================
---
title: "Revizor"
hide:
  - navigation
  - toc
---

Hardware fuzzing for the age of speculation

- __:fontawesome-solid-arrow-right: Get Started__

    ---

    Welcome to the Revizor documentation! Whether you're a new user looking to get started or a developer interested in contributing, you'll find all the information you need here.

    [Start Here](intro/start-here.md){ .md-button .md-button--primary } [Learn Revizor](intro/01-overview.md){ .md-button } [Ask a Question](howto/ask-a-question.md){ .md-button } [Cite Revizor](ref/papers.md){ .md-button }

- __:fontawesome-solid-code: Source Code__

    ---

    The Revizor project lives on GitHub. Explore the source code, report issues, and contribute to the project.

    [GitHub](https://github.com/microsoft/side-channel-fuzzer){ .md-button } [Contributing](internals/contributing/overview.md){ .md-button } [Bug Reports](https://github.com/microsoft/side-channel-fuzzer/issues){ .md-button } [Explore Docs](structure.md){ .md-button }

- __:fontawesome-solid-comments: Join the Community__

    ---

    Join the Revizor community to get help, discuss ideas, suggest features, and share your experiences.

    [Zulip Community](https://rvzr.zulipchat.com/){ .md-button } [GitHub Discussions](https://github.com/microsoft/side-channel-fuzzer/discussions){ .md-button }

---

## __:fontawesome-solid-bug: Trophies__{ .trophies-header }

#### Transient Scheduler Attack - L1 Cache (TSA-L1)

=== "Description"

    A speculative leak affecting AMD Family 19h processors where false completions in load instructions can leak data from the L1 data cache across security boundaries. The attack exploits the linear-address-based microtag system used for L1 cache lookups: when a load finds a matching microtag entry but the L1 does not contain valid data, invalid data from the matching microtag entry is used in a false completion. This leak enables information disclosure between kernel and userspace, between hypervisor and guest, across different applications or VMs, and from SEV-SNP VMs to the host.

=== "CVE"

    [CVE-2024-36357](https://nvd.nist.gov/vuln/detail/CVE-2024-36357)

=== "Links"

    * More details in: [Enter, Exit, Page Fault, Leak: Testing Isolation Boundaries for Microarchitectural Leaks](https://aka.ms/enter-exit-leak)
    * AMD Security Advisory: [Advisory](https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7029.html)

#### Transient Scheduler Attack - Store Queue (TSA-SQ)

=== "Description"

    A speculative leak affecting AMD Family 19h processors where false completions in Store-To-Load Forwarding operations can leak data from previous store instructions. When a load matches an older store's address but the store data is not yet available, a false completion occurs using invalid data from a previously executed store that occupied the same store queue entry. This effect enables information leakage from the OS kernel to user applications, from the hypervisor to guests, and, to a lesser extent, between applications.
=== "CVE"

    [CVE-2024-36350](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-36350)

=== "Links"

    * More details in: [Enter, Exit, Page Fault, Leak: Testing Isolation Boundaries for Microarchitectural Leaks](https://aka.ms/enter-exit-leak)
    * AMD Security Advisory: [Advisory](https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7029.html)

#### Control Register Speculation

=== "Description"

    A speculative leak affecting AMD processors where user processes can speculatively infer control register values even when User Mode Instruction Prevention (UMIP) is enabled. This bypasses intended security boundaries by allowing unprivileged code to access system-level configuration information through speculative channels.

=== "CVE"

    [CVE-2024-36348](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-36348)

=== "Links"

    * More details in: [Enter, Exit, Page Fault, Leak: Testing Isolation Boundaries for Microarchitectural Leaks](https://aka.ms/enter-exit-leak)
    * AMD Security Advisory: [Advisory](https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7029.html)

#### TSC_AUX Speculation

=== "Description"

    A speculative leak affecting AMD processors that permits user processes to infer the Time Stamp Counter Auxiliary (TSC_AUX) register value even when direct reads are disabled.

=== "CVE"

    [CVE-2024-36349](https://nvd.nist.gov/vuln/detail/CVE-2024-36349)

=== "Links"

    * More details in: [Enter, Exit, Page Fault, Leak: Testing Isolation Boundaries for Microarchitectural Leaks](https://aka.ms/enter-exit-leak)
    * AMD Security Advisory: [Advisory](https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7029.html)

#### Divider State Sampling (DSS)

=== "Description"

    A speculative leak where division-by-zero operations can transiently return values that depend on previous division operations. The leaked state persists across privilege boundaries.
    The discovery of the leak triggered patches to the Linux kernel and to other operating systems.

=== "CVE"

    [CVE-2023-20588](https://nvd.nist.gov/vuln/detail/CVE-2023-20588)

=== "Links"

    More details in: [Speculation at Fault](https://www.usenix.org/system/files/usenixsecurity23-hofmann.pdf)

#### String Comparison Overrun (SCO)

=== "Description"

    Revizor discovered that string operations on Intel and AMD CPUs (in particular, string comparison and string scan) can speculatively bypass the bounds of their target strings, which permits an attacker to leak data from out-of-bounds memory locations.

=== "Links"

    More details in: [Hide & Seek with Spectres](https://www.microsoft.com/en-us/research/publication/hide-and-seek-with-spectres-efficient-discovery-of-speculative-information-leaks-with-random-testing/)

#### Zero Dividend Injection (ZDI)

=== "Description"

    64-bit division operations on Intel CPUs can speculatively ignore the upper bits of the divisor, thus producing an incorrect computational result. This speculation can potentially impact the security of cryptographic algorithms that use division to implement modulo operations.

=== "Links"

    More details in: [Hide & Seek with Spectres](https://www.microsoft.com/en-us/research/publication/hide-and-seek-with-spectres-efficient-discovery-of-speculative-information-leaks-with-random-testing/)

#### Read-Modify-Write Speculation

=== "Description"

    A new variant of Microarchitectural Data Sampling (MDS) where a store operation to read-only memory triggers speculative behavior. When a read-modify-write instruction (like XADD) attempts to access read-only memory, it speculatively returns stale data from internal CPU buffers, even though the read itself would be permitted.
=== "Links"

    More details in: [Speculation at Fault](https://www.usenix.org/system/files/usenixsecurity23-hofmann.pdf)

#### Non-canonical Store Forwarding

=== "Description"

    A speculative leak where stores to non-canonical addresses can be forwarded to subsequent loads from the canonical versions of those addresses. This means that even though a store operation fails due to an invalid address format, its data can still be transiently accessed by later instructions using a related valid address.

=== "Links"

    More details in: [Speculation at Fault](https://www.usenix.org/system/files/usenixsecurity23-hofmann.pdf)

#### Variable-latency Spectre

=== "Description"

    A variant of the Spectre vulnerability in which the leakage is caused by a race condition that appears when a speculative memory access is data-dependent on a variable-latency instruction. This race condition can expose the operands of the variable-latency instruction.

=== "Links"

    More details in [the Revizor paper](https://www.microsoft.com/en-us/research/publication/revizor-testing-black-box-cpus-against-speculation-contracts/)

#### Store-based Spectre V1

=== "Description"

    Several defense proposals (e.g., STT, KLEESpectre) assumed that stores do not modify the cache state until they retire. We used Revizor to validate this assumption and discovered that it does not hold on recent Intel CPUs (e.g., CoffeeLake).

=== "Links"

    More details in [the Revizor paper](https://www.microsoft.com/en-us/research/publication/revizor-testing-black-box-cpus-against-speculation-contracts/)

#### Speculative Store with Forwarding

=== "Description"

    Revizor discovered that two consecutive loads from the same address can speculatively return two different values if one of them receives a forwarded value from a store while the other load experiences a speculative store bypass. This combination exposes more information to the attacker than the original store bypass.
=== "Links"

    More details in [the appendix to the Revizor paper](https://www.microsoft.com/en-us/research/publication/revizor-testing-black-box-cpus-against-speculation-contracts/)

================================================
FILE: docs/internals/architecture/analysis.md
================================================
|                  |                    |
| ---------------- | ------------------ |
| Module           | `rvzr/analyser.py` |
| Public interface | `Analyser`         |
| Inputs           | `CTrace`, `HTrace` |
| Outputs          | `Violation`        |

The Analyser compares contract traces with hardware traces to detect violations. The core principle: inputs with identical CTraces should produce equivalent HTraces. When they don't, a contract violation has occurred.

```text
For all inputs i, j:
    if CTrace(i) == CTrace(j) and HTrace(i) != HTrace(j):
        → Violation detected
```

Analyser implementations:

Different analysers define "equivalent HTrace" differently:

- `MergedBitmapAnalyser` (default) — Merges samples using bitwise OR and compares the resulting bitmaps. Intended for cache-based channels.
- `SetAnalyser` — Compares sets of unique samples.
- `MWUAnalyser` — Uses the Mann-Whitney U statistical test. Intended for timing-based channels.
- `ChiSquaredAnalyser` — Uses a chi-squared test for distribution differences.

================================================
FILE: docs/internals/architecture/code.md
================================================
# Test Case Code Generation

|                  |                          |
| ---------------- | ------------------------ |
| Module           | `rvzr/code_generator.py` |
| Public interface | `CodeGenerator`          |
| Inputs           | `InstructionSet`         |
| Outputs          | `TestCaseProgram`        |

This module generates random assembly programs for testing. The generator creates programs designed to trigger speculative execution and expose microarchitectural leaks.

### Generation process

1. Create control flow graph — Generate a random Directed Acyclic Graph (DAG) of basic blocks. The DAG structure prevents infinite loops while allowing branches and mispredictions.
2. Add jump instructions — Insert conditional and unconditional jumps at block boundaries to connect the blocks according to the DAG.
3. Fill basic blocks — Populate blocks with random instructions from the tested instruction pool, respecting instruction frequencies and operand constraints.
4. Instrument — (Optionally) Prevent faults by masking memory addresses, avoiding division by zero, and ensuring all accesses stay within the sandbox.
5. Assemble — Convert to binary and extract metadata.
6. Transform into RCBF — Serialize the test case into Revizor's custom binary format ([RCBF](../../ref/binary-formats.md)) for execution.

### Test case representation

```text
TestCaseProgram
├─ CodeSection (one per actor)
│  └─ Function
│     └─ BasicBlock
│        └─ InstructionNode
│           └─ Instruction
│              └─ Operand
└─ TestCaseBinary
   └─ SymbolTable
```

### Variants

Architecture-specific implementations of the code generator exist for x86 and ARM64, named `X86Generator` and `ARM64Generator` in `rvzr/arch/*/code_generator.py`.

================================================
FILE: docs/internals/architecture/data.md
================================================
# Test Case Data Generation

|                  |                          |
| ---------------- | ------------------------ |
| Module           | `rvzr/data_generator.py` |
| Public interface | `DataGenerator`          |
| Inputs           | `Config`                 |
| Outputs          | `InputData`              |

`DataGenerator` generates input data that is used to initialize registers and memory before executing a test case, on both the model and the target hardware.

## Generation modes

Two input generation modes are supported:

### Standard generation

Interface: `DataGenerator.generate(...)`

This method creates fully random inputs using a PRNG. It can optionally reduce entropy (to increase trace collisions) or inject special values (zeros, boundary values) to trigger edge cases.
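To make the two options of standard generation concrete, here is a minimal sketch. It assumes that an input is simply a list of 64-bit words, that "reducing entropy" means keeping only the low bits of each word, and that the special values are a small fixed set. `generate_standard`, `entropy_bits`, `special_rate`, and `SPECIAL_VALUES` are hypothetical names for illustration, not Revizor's `DataGenerator` API.

```python
import random

# Hypothetical set of "special" 64-bit values used to provoke edge cases.
SPECIAL_VALUES = [0, 1, 1 << 63, (1 << 64) - 1]


def generate_standard(seed, n_inputs, n_words, entropy_bits=64, special_rate=0.0):
    """Sketch of standard input generation (not Revizor's actual implementation).

    Each input is a list of n_words 64-bit integers. With entropy_bits < 64,
    only the low bits are random, so different inputs collide more often.
    With special_rate > 0, some words are replaced by edge-case values.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible inputs
    mask = (1 << entropy_bits) - 1
    inputs = []
    for _ in range(n_inputs):
        words = []
        for _ in range(n_words):
            if rng.random() < special_rate:
                words.append(rng.choice(SPECIAL_VALUES))
            else:
                words.append(rng.getrandbits(64) & mask)
        inputs.append(words)
    return inputs
```

For example, `generate_standard(seed=42, n_inputs=4, n_words=8, entropy_bits=2)` produces words in the range 0 to 3 only, so many inputs share values; calling it twice with the same seed returns identical inputs.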
### Boosted generation

Interface: `DataGenerator.generate_boosted(...)`

Boosted generation solves the following challenge: to detect a violation via relational non-interference testing, we always need at least two inputs that produce identical contract traces (see [Trace Analysis](overview.md#6-trace-analysis)). Generating such contract-equivalent inputs through pure randomness is extremely inefficient because the entropy of contract traces is usually very high, and thus most random inputs produce unique traces.

Boosted generation addresses this by leveraging dynamic taint analysis on the model side. It works as follows. First, Revizor produces a set of random inputs using standard generation. Then, it executes the test case with each input on the model and performs backward taint analysis to identify which input bytes affect the contract trace (tainted) and which don't (untainted). This produces a set of `InputTaint` objects that map input bytes to their taint status. These taint maps are fed back into the `generate_boosted()` method, which creates new inputs such that the tainted bytes remain fixed while the untainted bytes are randomized.

```text
Original InputData → Model → InputTaint → N contract-equivalent inputs
```

Such "boosted" inputs are guaranteed to produce the same contract trace as the original input while still being mostly random.

## Data Representation

Each input is represented as an `InputData` object, which is a numpy structured array containing the following data for each actor in the test case:

- Memory contents
- General-purpose registers
- SIMD registers
- Flags and special registers

This object can be serialized into Revizor's custom binary format ([RDBF](../../ref/binary-formats.md)) for consumption by the model and executor.
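The boosting step can be illustrated with a small sketch. It assumes that an input is a flat byte string and that a taint map is a list of booleans of the same length (`True` means the byte influences the contract trace). `generate_boosted_sketch` is a hypothetical name; the real `generate_boosted()` works on `InputData` and `InputTaint` objects, which are richer than this.

```python
import random


def generate_boosted_sketch(original, taint, n_copies, seed=0):
    """Create n_copies variants of `original` (a bytes object).

    Tainted bytes (taint[i] is True) are copied unchanged because they may
    influence the contract trace. Untainted bytes are re-randomized because,
    assuming the taint analysis is correct, they do not affect the trace.
    """
    if len(original) != len(taint):
        raise ValueError("taint map must cover every input byte")
    rng = random.Random(seed)
    boosted = []
    for _ in range(n_copies):
        new = bytes(
            b if is_tainted else rng.randrange(256)
            for b, is_tainted in zip(original, taint)
        )
        boosted.append(new)
    return boosted
```

With `original = bytes([0x10, 0x20, 0x30, 0x40])` and `taint = [True, False, True, False]`, every variant keeps `0x10` and `0x30` at positions 0 and 2, while positions 1 and 3 vary. The correctness of the result therefore rests entirely on the taint analysis being conservative.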
================================================
FILE: docs/internals/architecture/exec.md
================================================
# Hardware Tracing

|                  |                                         |
| ---------------- | --------------------------------------- |
| Module           | `rvzr/executor.py`, `rvzr/executor_km/` |
| Public interface | `Executor`                              |
| Inputs           | `TestCaseProgram`, `InputData`          |
| Outputs          | `HTrace`                                |

## Executor

The Executor runs test cases on real hardware and collects hardware traces (HTraces) using side-channel measurements. It uses a two-layer architecture: Python code communicates with a kernel module that performs measurements in kernel space.

```text
Python (executor.py)
├─ X86IntelExecutor
├─ X86AMDExecutor
└─ ARM64Executor
        │
        │ /sys/rvzr_executor/ interface
        ▼
Kernel Module (executor_km/)
```

## HTrace representation

The `HTrace` class (`rvzr/traces.py`) represents hardware traces collected during execution. The executor produces one `HTrace` object per program-input pair, meaning that for each `TestCaseProgram` execution with each `InputData` input, one `HTrace` is generated.

Each `HTrace` encapsulates multiple measurement results (samples), because the executor typically repeats the execution several times and each execution produces one measurement sample. Such repeated measurements allow us to apply statistical methods when comparing noisy hardware traces (see [Trace Analysis](analysis.md)).
The structure of an `HTrace` is as follows:

```text
HTrace
└─ Array[RawHTraceSample]
   ├─ trace       Main measurement (cache bitmap, timestamp, or registers)
   └─ pfc0-pfc4   Performance counter values
```

================================================
FILE: docs/internals/architecture/fuzz.md
================================================
# Orchestration Module

|                  |                                          |
| ---------------- | ---------------------------------------- |
| Module           | `rvzr/fuzzer.py`                         |
| Public interface | `Fuzzer`                                 |
| Inputs           | `Config`, `InstructionSet`, ASM Template |
| Outputs          | Violation artifact, logs                 |

The `Fuzzer` class is the main coordinator. It manages the core components (`CodeGenerator`, `DataGenerator`, `Model`, `Executor`, and `Analyser`) and orchestrates the fuzzing loop.

## Main workflow

```text
Fuzzer.start()
└─> for each test case:
    ├─> CodeGenerator.create_test_case()  → TestCaseProgram
    ├─> DataGenerator.generate()          → List[InputData]
    └─> Fuzzer.fuzzing_round(program, inputs)
        ├─> Model.trace_test_case()       → List[CTrace]
        ├─> Executor.trace_test_case()    → List[HTrace]
        ├─> Analyser.filter_violations()  → List[Violation]
        └─> if violation: multi-stage filtering pipeline
```

## Multi-stage filtering

When a potential violation is found, the Fuzzer runs it through several validation stages. Each stage modifies parameters and re-checks the violation to rule out false positives:

1. `fast` — Initial fast detection using minimal speculative nesting on the model side and a small sample size on the executor side
2. `nesting` — Re-collect ctraces with the model using full speculative nesting. This rules out false positives caused by incomplete speculation modeling
3. `taint_mistake` — Re-collect ctraces for the boosted inputs to rule out boosting-based generation mistakes
4. `priming` — Perform a so-called "priming test" (swap the order of violating inputs) to rule out false positives caused by inconsistent microarchitectural state across executions
5. `noise` — Increase the sample size on the executor side to increase statistical confidence and rule out noise-induced violations
6. `arch_mismatch` — Compare the architectural output (i.e., register/memory states) of the model and executor to rule out violations caused by functional mismatches (e.g., by bugs in the model or executor)

If a violation survives all stages, Revizor saves a reproduction package (called "violation artifact") containing the test case, inputs, configuration, and a detailed report.

## Fuzzer variants

The `Fuzzer` class is abstract. There are several variants modifying the baseline logic:

- `X86Fuzzer` / `ARM64Fuzzer` — Architecture-specific implementations
- `ArchitecturalFuzzer` — Validates model correctness (i.e., performs stage 6 `arch_mismatch` for all test cases, even non-violating ones)
- `ArchDiffFuzzer` — Completely discards the model, and instead compares two hardware executions, one with a normal test case and one with a speculation fence added after every instruction. This variant is used to detect speculation-induced architectural bugs, like Zenbleed.

================================================
FILE: docs/internals/architecture/isa.md
================================================
# Instruction Set Specification

|                  |                    |
| ---------------- | ------------------ |
| Module           | `rvzr/isa_spec.py` |
| Public interface | `InstructionSet`   |
| Inputs           | `base.json`        |
| Outputs          | `InstructionSet`   |

This module manages the instruction set available for fuzzing. It loads ISA definitions from a JSON file (`base.json`) and applies user-configured filters to create a pool of allowed instructions. Each instruction is represented by an `InstructionSpec` containing the instruction name and category, operand specifications, and instruction properties.

Processing pipeline:

1. Load ISA specification from JSON
2. Apply filters (allowlist, blocklist, categories, register restrictions)
3. Remove duplicates
4. Categorize instructions by type (control flow, memory access, etc.)

================================================
FILE: docs/internals/architecture/logging.md
================================================
# Logging

|                  |                               |
| ---------------- | ----------------------------- |
| Module           | `rvzr/logs.py`                |
| Public interface | `FuzzLogger`, etc.            |
| Inputs           | N/A                           |
| Outputs          | Log messages (stdout, stderr) |

Revizor uses a centralized logging system with configurable verbosity. The system uses the Borg pattern to share state across modules.

Available logging modes:

- `info` — General messages and progress
- `stat` — Statistics
- `dbg_*` — Debug modes for specific components

Logging components:

- Basic functions: `error()`, `warning()`, `inform()`, `dbg()`
- Module-specific loggers: `FuzzLogger`, `GeneratorLogger`, `ISALogger`, `ExecutorLogger`, `AnalyserLogger`

================================================
FILE: docs/internals/architecture/mini.md
================================================
# Post-violation Analysis

|                  |                                 |
| ---------------- | ------------------------------- |
| Module           | `rvzr/postprocessing/`          |
| Public interface | `Minimizer`                     |
| Inputs           | Violation artifact (.asm, .bin) |
| Outputs          | Minimized test case and inputs  |

After confirming a violation, users can run post-processing to simplify the test case and identify the root cause. The postprocessing module applies minimization passes that reduce complexity while preserving the violation.
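The core idea behind instruction-level minimization (remove one piece at a time and keep the removal only if the violation survives) can be sketched as a greedy loop. In this sketch the test case is a plain list of instruction strings, and `still_violates` is a hypothetical callback standing in for re-running the fuzzing round on the reduced program. It illustrates greedy reduction in general, not the actual `Minimizer` or `InstructionRemovalPass` code.

```python
from typing import Callable, List


def remove_instructions(program: List[str],
                        still_violates: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop instructions while the (hypothetical) violation check passes."""
    current = list(program)
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if still_violates(candidate):
            current = candidate  # instruction i was not needed; retry the same index
        else:
            i += 1  # instruction i is essential; keep it and move on
    return current
```

With a toy oracle that reports a violation whenever both `jz .l1` and `mov rax, [rbx]` are present, the program `["add rax, 1", "jz .l1", "nop", "mov rax, [rbx]", "sub rcx, 2"]` is reduced to `["jz .l1", "mov rax, [rbx]"]`. Each removal costs one oracle call, which is why real minimization passes are expensive on hardware.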
Class hierarchy:

```text
Minimizer
└─ Orchestrates passes, manages files

BaseMinimizationPass
├─ Instruction passes (modify code)
├─ Data passes (modify inputs)
└─ Analysis passes (add annotations)
```

Instruction passes (operate on test case code):

- `InstructionRemovalPass` — Remove instructions one at a time to find essential ones
- `NopReplacementPass` — Replace with NOPs (preserves alignment)
- `InstructionSimplificationPass` — Replace complex instructions with simpler ones
- `ConstantSimplificationPass` — Simplify immediate values
- `MaskSimplificationPass` — Simplify bitmasks
- `LabelRemovalPass` — Remove unused labels
- `FenceInsertionPass` — Insert fences to identify speculation boundaries

Data passes (operate on inputs):

- `DifferentialInputMinimizerPass` — Use delta debugging to find minimal byte differences
- `InputSequenceMinimizationPass` — Reduce number of inputs

Analysis passes (add annotations):

- `AddViolationCommentsPass` — Annotate assembly with memory addresses from execution

================================================
FILE: docs/internals/architecture/model.md
================================================
# Contract Tracing

|                  |                                |
| ---------------- | ------------------------------ |
| Module           | `rvzr/model.py`                |
| Public interface | `Model`                        |
| Inputs           | `TestCaseProgram`, `InputData` |
| Outputs          | `CTrace`                       |

## Model

The Model executes test cases according to a leakage contract and produces contract traces (CTraces). These represent the information expected to leak during execution, including speculative execution.

Revizor supports two model backends:

- **Unicorn**: This backend is based on the [Unicorn CPU emulator](https://www.unicorn-engine.org/). It implements the contract by hooking into instruction execution and memory access events. Documentation is provided in [Unicorn Backend](../model-backends/model-unicorn.md).
- **DynamoRIO**: This backend uses [DynamoRIO](https://dynamorio.org/) for dynamic binary instrumentation. It instruments the test case to insert hooks for tracing and speculation simulation. Documentation is provided in [DynamoRIO Backend](../model-backends/model-dr.md).

Both implement the same interface defined by the abstract `Model` class.

## Contract Trace Representation

A `CTrace` is a sequence of typed observations representing leaked information:

```text
CTrace
└─ List[CTraceEntry]
   ├─ mem   Memory address
   ├─ pc    Program counter
   ├─ val   Data value
   ├─ reg   Register value
   └─ ind   Indirect branch target
```

CTraces use `xxhash` for fast equality checking, enabling efficient grouping into equivalence classes.

================================================
FILE: docs/internals/architecture/overview.md
================================================
# Architecture Overview & Code Structure

This document introduces Revizor's architecture and key components. It is designed to provide an overview of how the codebase is organized and how the main pieces work together.

!!! info "Prerequisites"
    This document assumes familiarity with the concepts of side-channel attacks, speculative execution, and [Speculation Contracts and Model-based Relational Testing (MRT)](../../topics/contracts.md).

## How Revizor Works

Revizor detects CPU security vulnerabilities using Model-based Relational Testing (MRT). The core idea is to compare what a CPU should leak (according to a leakage model) with what it actually leaks during execution.

Basic process:

1. Generate random assembly programs
2. Execute them on both a leakage model and real hardware
3. Compare the observed hardware behavior with the model's predictions
4. If they match, the CPU behaves as expected (discard the test)
5. If they differ, a potential vulnerability has been found

The leakage model acts as a reference model of the expected CPU behavior. If the real CPU leaks more information than the model predicts (i.e., if it diverges from the reference), this indicates a potential security vulnerability.
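The five-step process above can be summarized as a loop. This is a simplified sketch rather than the `Fuzzer` implementation: `generate_program`, `generate_inputs`, `model_trace`, and `hw_trace` are hypothetical callables standing in for the code generator, data generator, model, and executor, and "differ" is reduced to the relational check of two inputs with equal contract traces but different hardware traces.

```python
from itertools import combinations


def mrt_loop(n_rounds, generate_program, generate_inputs, model_trace, hw_trace):
    """Return the first (program, input_a, input_b) violating the contract, or None."""
    for _ in range(n_rounds):
        program = generate_program()                          # 1. random program
        inputs = generate_inputs(program)
        ctraces = [model_trace(program, x) for x in inputs]   # 2. run on the model
        htraces = [hw_trace(program, x) for x in inputs]      # 2. run on the hardware
        for i, j in combinations(range(len(inputs)), 2):      # 3. compare
            if ctraces[i] == ctraces[j] and htraces[i] != htraces[j]:
                return program, inputs[i], inputs[j]          # 5. potential vulnerability
        # 4. no mismatch: discard this test case and continue
    return None
```

As a toy usage example, if the model predicts that only the parity of an input leaks (`lambda p, x: x % 2`) but the "hardware" exposes the whole value (`lambda p, x: x`), then inputs `[0, 1, 2, 3]` yield the counterexample `("p", 0, 2)` for a program `"p"`.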
For details on how leakage models work, see [Speculation Contracts](../../topics/contracts.md).

Revizor runs the following loop until it finds a violation or completes the configured number of test cases:

![architecture](../../assets/fuzzing-flow.png)

## 1. Initialization

This step runs once at startup. Revizor reads the fuzzing configuration, which specifies:

- Target CPU architecture
- ISA (instruction set) specification
- Which instructions to test
- Which side channels to monitor
- Other fuzzing parameters

The `cli.py` module handles command-line arguments and creates the main objects: `InstructionSet` (from `isa_spec.py`), `Config` (from `config.py`), and `Fuzzer` (from `fuzzer.py`).

## 2. Code Generation

Each fuzzing round starts by generating a random test program. This is an assembly program with semi-random control flow, built from a pool of allowed instructions.

The code generator can be configured to control the shape of the control flow graph, which instructions to include, and how often each instruction appears. It also (optionally) instruments the program to prevent faults like division by zero.

The `Fuzzer` calls `CodeGenerator.create_test_case()` (in `code_generator.py`), which returns a `TestCaseProgram` object representing the generated assembly program.

## 3. Data Generation

Next, Revizor generates random inputs for the test program. Each input contains initial values for registers and memory. These values are pseudo-random but use fixed seeds for reproducibility.

The `DataGenerator` class (in `data_generator.py`) creates these inputs and returns them as `InputData` objects. See [binary formats](../../ref/binary-formats.md#revizor-data-binary-format-rdbf) for the structure of input data.

## 3.5 Test Case Filtering (Optional)

Some test cases are unlikely to reveal vulnerabilities, so Revizor can filter them out early to save time. This is optional and disabled by default.
Two filters are available:

- Speculation filter: Uses performance counters to check if the test case triggers branch mispredictions. Without mispredictions, the test cannot expose speculative leaks.
- Observation filter: Compares the original test case with a "fenced" version (with serialization instructions added). If both produce identical traces, speculation left no observable effects.

These filters are implemented in architecture-specific fuzzer classes (like `X86Fuzzer` in `rvzr/arch/x86/fuzzer.py`).

## 4. Model Execution

The model executes the test program with each generated input and produces contract traces (CTraces). These traces represent what the model predicts should leak during execution.

The `Model` class (in `model.py`) provides two key methods:

- `load_test_case()`: Loads the program into the model
- `trace_test_case()`: Executes the program with each input and returns CTraces

Revizor supports multiple model backends: [Unicorn](../model-backends/model-unicorn.md) (CPU emulator) and [DynamoRIO](../model-backends/model-dr.md) (dynamic instrumentation). Both implement the same interface.

## 5. Hardware Execution

The executor runs the test program on the target hardware with each input and collects hardware traces (HTraces). A hardware trace is a set of observable microarchitectural effects (like cache state or timing) caused by the test case execution. Traces are typically collected using side-channel techniques (e.g., Prime+Probe, Flush+Reload) or by reading performance counters.

To ensure that the measurements reflect the test case execution (rather than noise), the executor creates a controlled measurement environment by disabling interrupts, flushing caches, and repeating executions multiple times.

The `Executor` class (in `executor.py`) works through a kernel module (`executor_km/`) that performs measurements in kernel space. It provides the same interface as the model: `load_test_case()` and `trace_test_case()`.

## 6. Trace Analysis

The analyser compares contract traces (what should leak) with hardware traces (what actually leaked) to detect violations. Instead of directly comparing traces, it uses an equivalence class approach.

How it works:

1. Group by contract: Inputs with identical CTraces form a ContractEqClass. According to the model, these inputs should be indistinguishable.
2. Group by hardware: Within each ContractEqClass, inputs with similar HTraces form HardwareEqClasses. These inputs are indistinguishable on the real hardware.
3. Detect violations: If a ContractEqClass splits into multiple HardwareEqClasses, a violation has occurred. The model says the inputs should look the same, but the hardware reveals differences between them.

This approach focuses on information leakage rather than exact trace values, and it essentially implements a non-interference check (see [Theoretical Foundations](../../topics/contracts.md)).

The `Analyser` class (in `analyser.py`) implements this logic in its `filter_violations()` method.

## 7. Post-violation Analysis

When Revizor detects a potential violation, it runs additional tests to filter out false positives. These tests modify execution parameters and verify that the violation still occurs. See [post-violation tests](mini.md) for details.

If the violation survives all filters, Revizor reports it to the user and saves reproduction artifacts. The user can then use [minimization tools](../../howto/minimize.md) to simplify the test case and identify the root cause.

The post-violation logic is implemented in `Fuzzer.fuzzing_round()`, and the `FuzzLogger` class handles reporting.
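The equivalence-class check from step 6 can be sketched as follows. The sketch assumes that each contract trace is a hashable value and that each hardware trace is a list of integer bitmaps, one per repeated measurement. Merging samples with bitwise OR loosely mirrors the description of `MergedBitmapAnalyser`, but `merge_samples` and `find_violations` are illustrative names, not Revizor's API.

```python
from collections import defaultdict
from functools import reduce
from operator import or_


def merge_samples(samples):
    """Collapse the repeated measurements of one input into a single bitmap."""
    return reduce(or_, samples, 0)


def find_violations(ctraces, htraces):
    """Return the ContractEqClasses (lists of input indices) that split on hardware.

    ctraces[i] is the contract trace of input i; htraces[i] is its list of samples.
    """
    contract_classes = defaultdict(list)
    for idx, ctrace in enumerate(ctraces):
        contract_classes[ctrace].append(idx)  # 1. group by contract

    violations = []
    for members in contract_classes.values():
        if len(members) < 2:
            continue  # a singleton class cannot reveal a difference
        hw_classes = defaultdict(list)
        for idx in members:
            hw_classes[merge_samples(htraces[idx])].append(idx)  # 2. group by hardware
        if len(hw_classes) > 1:
            violations.append(members)  # 3. the class split: violation
    return violations
```

For example, with contract traces `["A", "A", "B", "B"]` and hardware samples `[[0b0001, 0b0010], [0b0011], [0b0100], [0b1000]]`, inputs 0 and 1 merge to the same bitmap, while inputs 2 and 3 do not, so the result is `[[2, 3]]`.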
================================================
FILE: docs/internals/code-structure.md
================================================
# Code Structure

The Revizor codebase is organized into the following main directories:

```text
rvzr/                 Main source code directory containing core fuzzing logic
├── *.py              Core modules that implement main fuzzing components
├── tc_components/    Test case representation objects (code and data)
├── model_unicorn/    Unicorn-based leakage model
├── model_dynamorio/  DynamoRIO-based leakage model
├── executor_km/      Kernel module that implements the hardware executor
├── postprocessing/   Minimization utilities for contract counterexamples
└── arch/             Architecture-specific implementations (x86/ and arm64/)
tests/                Unit and integration tests
docs/                 Documentation files
```

The main entry point is `rvzr/cli.py`, which parses command-line arguments and initializes the `Fuzzer` object.

================================================
FILE: docs/internals/contributing/code-style.md
================================================
# Code Style

Please follow these coding standards when writing code for inclusion in Revizor.

## Python

* Unless otherwise specified, follow PEP 8. But remember that PEP 8 is only a guide, so respect the style of the surrounding code as a primary goal.
* An exception to PEP 8 is our rule on line length. Don’t limit lines of code to 79 characters if it means the code looks significantly uglier or is harder to read. We allow up to 100 characters.
* All files should pass the `flake8` linter. Use all default settings except for the line width (`--max-line-length 100`).
* The Python and C files use 4 spaces for indentation, and YAML uses 2 spaces.
* The project repository includes an .editorconfig file. We recommend using a text editor with EditorConfig support to avoid indentation and whitespace issues.
* Use underscores, not camelCase, for variable, function and method names (e.g., `poll.get_unique_voters()`, not `poll.getUniqueVoters()`).
* Use InitialCaps for class names (or for factory functions that return classes).
* In docstrings, follow PEP 257.

## C

* All files should be formatted using `clang-format`. The settings are included in the `.clang-format` files in the directories with C files. Just run the formatter with: `clang-format -i *.c`

## Misc

* Remove import statements that are no longer used when you change code. flake8 will identify these imports for you. If an unused import needs to remain for backwards-compatibility, mark the end of the line with `# NOQA` to silence the flake8 warning.
* Systematically remove all trailing whitespace from your code, as it adds unnecessary bytes, adds visual clutter to patches, and can occasionally cause unnecessary merge conflicts. Some IDEs can be configured to remove it automatically, and most VCS tools can be set to highlight it in diff outputs.

================================================
FILE: docs/internals/contributing/general.md
================================================
# General Development Guidelines

## Testing

To run automated tests you will need to install a few more dependencies:

* [Bash Automated Testing System](https://bats-core.readthedocs.io/en/latest/index.html)
* [mypy](https://mypy.readthedocs.io/en/latest/getting_started.html#installing-and-running-mypy)
* [flake8](https://flake8.pycqa.org/en/latest/index.html)

With the dependencies installed, you can run the tests with:

```bash
./tests/runtests.sh
```

Note that some of the acceptance tests are microarchitecture-dependent. These tests are labeled "Detection" (e.g., `"Detection [spectre-type] Spectre V1; load variant"`), and they may fail if the CPU under test does not have a given vulnerability. Generally, if a few of these tests fail, it is not a problem, but if all of them (or a significant portion) fail, it indicates an issue with the fuzzer.
## Submitting Patches

To submit a patch, use the following procedure:

* Fork Revizor on GitHub: [https://docs.github.com/en/github/getting-started-with-github/fork-a-repo](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo)
* Create a topic branch:

  ```bash
  git checkout -b my_branch
  ```

* Make and commit your changes in the new branch
* Make sure all tests pass (see [Testing](#testing))
* Make sure your code follows the guidelines in [Code Style](code-style.md)
* Push to your branch:

  ```bash
  git push origin my_branch
  ```

* Initiate a pull request on GitHub: [https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request)
* Wait for the PR to get reviewed and merged

================================================
FILE: docs/internals/contributing/git.md
================================================
# Git Workflow Guidelines

## Git Messages

We use the following conventions for commit messages:

```
: []
```

Where:

* ``: The scope of the change.
* ``: The type of the change.
* ``: A short description of the change.

### Scopes

The following scopes are typical:

| Scope      | Description                                                      |
| ---------- | ---------------------------------------------------------------- |
| `all`      | Changes that affect the entire project (e.g., major refactoring) |
| `root`     | Root directory changes (e.g., readme, git, author list)          |
| `fuzz`     | Changes to the core fuzzer algorithm                             |
| `cli`      | Changes to the command-line interface                            |
| `exec`     | Changes to the executor                                          |
| `model`    | Changes to the model                                             |
| `analyser` | Changes to the analyser                                          |
| `mini`     | Changes to the postprocessor (i.e., minimizer)                   |
| `code_gen` | Changes to the program generator                                 |
| `data_gen` | Changes to the input generator                                   |
| `tests`    | Changes to the tests                                             |
| `isa`      | Changes to the ISA loader or to `get_spec` files                 |

If a commit covers several scopes, use the most relevant one.
If a commit targets a specific architecture (e.g., x86), add the architecture to the scope (e.g., `fuzz/x86`).

### Types

Use one of the following types:

| Type     | Description                                                                   |
| -------- | ----------------------------------------------------------------------------- |
| `feat`   | A new feature.                                                                |
| `fix`    | A bug fix.                                                                    |
| `docs`   | Documentation changes.                                                        |
| `chore`  | Changes to the build process or auxiliary tools.                              |
| `ft`     | Fault tolerance changes (e.g., adding error handling or recovery mechanisms). |
| `refact` | Refactoring of the codebase. This includes code style changes.                |
| `perf`   | Performance improvements.                                                     |
| `revert` | Reverts a previous commit.                                                    |

If possible, use only these types. If you need a different type, please discuss it with a maintainer.

## Git Branches

We follow the workflow described in [gitworkflows](https://git-scm.com/docs/gitworkflows), with a few modifications.

![branching workflow](../../assets/branches.png)

We use the following branches for graduation:

* `main`: The latest release. This branch should always be stable, and it is the last branch to receive changes.
* `main-fixes`: Commits that go into the next maintenance release. This branch is created from the last release branch.
* `dev`: The development branch. This branch is the first to receive changes.

Commits should be merged upwards:

* `dev` -> `pre-release` -> `main`
* For hotfixes, `main-fixes` -> `main` AND `main-fixes` -> `pre-release`

For working on unstable code (e.g., in-progress features or bug fixes), use either forks or feature branches. Use a fork if you are the only one working on the feature, and open a pull request to merge the changes back into the main repository. Use a feature branch if multiple people are working on the feature; in that case, name the branch `feature-` or `bugfix-`, and make sure to branch from `dev`.

The only exception is the `gh-pages` branch, which is used for the project's website.
This branch is used by automated tools and should never be used for development.

================================================
FILE: docs/internals/contributing/overview.md
================================================
# Guide to Contributing

This document provides an overview of how to contribute to the Revizor project.

## What can I contribute?

Revizor is an open-source project, and we welcome contributions of all kinds. You don't have to be an expert in hardware security or fuzzing to contribute! Even small contributions are valuable. Here are some ways you can help:

* :fontawesome-solid-bug: Report Issues: The easiest way to contribute is by reporting issues you encounter while using Revizor. Try following the introductory [guides and tutorials](../../intro/start-here.md), and if you find any issues, bugs, or unclear documentation, please report them on our [GitHub Issues page](https://github.com/microsoft/side-channel-fuzzer/issues).
* :fontawesome-solid-pencil: Improve Documentation: You can also contribute by improving the documentation. If you find any gaps or outdated information, or even typos, feel free to submit a pull request with your improvements.
* :fontawesome-solid-code: Code Contributions: If you're interested in coding, you can contribute new features, fix bugs, or enhance existing functionality. Check out the [issue tracker](https://github.com/microsoft/side-channel-fuzzer/issues) for open issues and feature requests.
* :fontawesome-solid-lightbulb: New Features: Finally, if you have expertise in hardware security, fuzzing, or related areas, consider contributing new features and enhancements to Revizor (see [ideas for contributions](#ideas-for-contributions) if you need inspiration).

## Reporting Bugs and Issues

To report a bug or an issue, please use the [GitHub Issues page](https://github.com/microsoft/side-channel-fuzzer/issues).
If you're reporting a simple bug, a short description of the problem and of the environment in which it occurred (Revizor version, target architecture, OS, etc.) is sufficient. For more complex issues, especially those related to the fuzzing process, also include the configuration file and the command-line arguments you used. The recommended report template is as follows:

```
## Description
A clear and concise description of what the bug is.

## To Reproduce
1. Go to '...'
2. Run '...'
3. See error

## Expected behavior
A clear and concise description of what you expected to happen.

## Environment
- Revizor version:
- Architecture:
- OS: ...

## Additional context
Add any other context about the problem here.

## Attachments
- Configuration file used:
- Command-line arguments:
- Logs or error messages:
```

## Submitting Patches

To submit a patch, be it to the code or to the documentation, use the following procedure:

* [Fork Revizor on GitHub](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo)
* Create a topic branch (`git checkout -b my_branch`)
* Make and commit your changes in the new branch
* Make sure all tests pass (`./tests/runtests.sh `) and that the code is formatted according to the [Code Style](code-style.md) guidelines.
* Push to your branch (`git push origin my_branch`)
* [Initiate a pull request on GitHub](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request)
* Wait for the PR to get reviewed and merged

#### Contributor License Agreement and Code of Conduct

Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit [https://cla.opensource.microsoft.com](https://cla.opensource.microsoft.com).
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Ideas for Contributions

If you're looking for ideas and inspiration on how you can meaningfully extend and improve Revizor, here are some suggestions:

---

#### Add Support for New Instructions

There are many specialized instructions that Revizor does not yet fully support. Implementing support for them can improve the coverage and effectiveness of the fuzzer, and as a bonus, you might discover new types of information leaks in the process. Such instructions include, but are not limited to:

* Floating-point instructions (either x87 or SSE/AVX)
* Segment-based memory accesses or instructions that manipulate segment registers
* Complex control-flow instructions (e.g., `call`, `ret`, indirect jumps)
* MMX instructions

---

#### Make Generators Smarter

Both code and data generators can be improved in various ways to produce more effective test cases. The bar is fairly low here, as current generators are fully random.
Ideas include:

* Bias generators to produce values that are more likely to trigger edge cases (e.g., boundary values, special bit patterns)
* Implement the ability to control the frequency of certain instruction types in generated programs
* Implement mutation-based generation strategies that modify existing test cases to explore new behaviors

If you decide to work on any of these or have your own ideas, please discuss them with us first by reaching out on [GitHub Discussions](https://github.com/microsoft/side-channel-fuzzer/discussions) or opening a draft pull request. This way, we can ensure that your efforts align with the project's goals and avoid duplicated work.

---

#### Improve Reporting Tools

The current logging and debugging tools in Revizor are relatively basic. Enhancing these utilities for better readability and usability can significantly help users understand fuzzing results and diagnose issues. Ideas include:

* Refactor the logging module to output a live dashboard, similar to the status output of other fuzzers like AFL or libFuzzer
* Make the debugging output more readable when debugging models

---

#### Implement New Measurement Modes

Revizor currently collects side-channel observations primarily through cache measurements or by recording the execution time of test programs. Implementing additional measurement modes can help uncover new types of leaks and improve the fuzzer's effectiveness. New measurement modes could include:

* Instruction cache measurements (e.g., using I-cache side channels)
* Contention-based measurements (e.g., measuring resource contention on the memory bus)
* Performance counter-based measurements (i.e., reading directly from CPU performance counters)

Beyond that, if you're brave enough, you could attempt to implement concurrent measurement modes, for example, by running each actor in a test case on a different core or SMT thread.
This is a complex task that requires significant changes to the executor, and it might require new techniques for dealing with non-determinism and imprecise synchronization. But if successful, it could open up new avenues for discovering cross-core or cross-thread leaks. You might even make a paper out of it.

---

#### Implement Coverage-Guided Fuzzing

Another interesting avenue for exploration is coverage-guided fuzzing. Currently, Revizor runs in a fully random mode, without collecting any feedback during fuzzing. Implementing coverage-guided fuzzing techniques could significantly improve the efficiency of the fuzzer. Ideas include:

* Proxy-based coverage metrics, where an emulator or a simulator is used as a proxy for CPU coverage. That is, the fuzzer would run test cases on an emulator, collect software coverage information (which edges of the emulator code were executed), and use it to guide the generation of new test cases.
* Specification-based coverage metrics, where a formal specification of the instructions (e.g., written in the ARM Architecture Specification Language) is used to determine edge cases in the execution of instructions. The fuzzer would then aim to cover all possible behaviors defined in the specification.

================================================
FILE: docs/internals/index.md
================================================
# Developer Documentation

This section provides technical documentation for developers contributing to Revizor.
## Development Guidelines

- [Guide to Contributing](contributing/overview.md): Overview of the contribution process and resources
- [General Guidelines](contributing/general.md): Development environment setup, testing procedures, contribution workflow
- [Code Style](contributing/code-style.md): Formatting conventions for Python and C code, naming conventions
- [Git Workflow](contributing/git.md): Branch management, commit message format, merge procedures

## Architecture and Modules

- [Code Structure](code-structure.md): Organization of the source code directory and key modules
- [Overview](architecture/overview.md): High-level system architecture and component interaction
- [Orchestration](architecture/fuzz.md): Main fuzzing loop and coordination between components
- [ISA Specification](architecture/isa.md): Instruction set architecture definitions and JSON-based specification format
- [Test Case Code Generation](architecture/code.md): Program generation algorithm and relevant classes
- [Test Case Data Generation](architecture/data.md): Data generation algorithm and relevant classes
- [Hardware Tracing](architecture/exec.md): Execution of test cases on the target HW and hardware trace collection
- [Contract Tracing](architecture/model.md): Leakage modeling and contract trace generation (high-level overview; implementation details in backend-specific pages)
- [Trace Analysis](architecture/analysis.md): Comparison of contract and hardware traces to detect violations
- [Minimization](architecture/mini.md): Post-detection reduction of test cases to minimal reproducing examples
- [Logging](architecture/logging.md): Logging infrastructure and debugging facilities

## Contract Modeling Backends

Revizor supports two different backends for contract-based leakage modeling.
They are documented in the following pages:

- [Unicorn Backend](model-backends/model-unicorn.md): Backend based on the Unicorn CPU emulator
- [DynamoRIO Backend](model-backends/model-dr.md): Backend based on the DynamoRIO dynamic binary instrumentation engine

================================================
FILE: docs/internals/model-backends/model-dr.md
================================================
# DynamoRIO-based Model Backend

This document describes the DynamoRIO-based model. Like any other model, this backend is responsible for collecting contract traces for generated test cases.

## Design Overview

This backend is composed of several parts:

* The Python adapter (`rvzr/model_dynamorio/model.py`) is responsible for receiving a test case from Revizor, transforming it into a format that the backend can execute, triggering the backend to execute the test case, and returning the collected contract traces to Revizor.
* The Test Case Loader (`rvzr/model_dynamorio/adapter.c`) is a C program that loads a test case program and a batch of inputs into its memory and executes the test case program with each input in sequence.
* The DynamoRIO components (`rvzr/model_dynamorio/backend`) are executed together with the test case loader, and they instrument the loader binary to collect contract traces. These components can be roughly divided into instrumentation-time components, which are responsible for modifying the binary, and execution-time components, which implement the model logic (i.e., the contract).

[![DynamoRIO-based Model Backend](../../assets/dr-model.png)](../../assets/dr-model.png)

## Python Adapter

Revizor communicates with the backend through a Python adapter (`rvzr/model_dynamorio/model.py:DynamoRIOModel`). At the beginning of the fuzzing process, Revizor configures the backend by calling the `configure_clauses` method. This configuration is later passed down to the backend when the test case is executed.
During the fuzzing process, Revizor sends test cases to the backend by calling the `load_test_case` method, and then triggers the backend to execute the test case by calling the `trace_test_case` method. Internally, `trace_test_case` calls the backend to execute the test case and collect the contract traces. The adapter then parses the traces and returns them to the caller.

The `trace_test_case` method implements the following algorithm:

- Convert the test case program and inputs into RCBF and RDBF files, respectively
- For each input, call the test case loader with the RCBF and RDBF files. Attach the DynamoRIO backend to the call so that the binary instrumentation is performed:

  ```shell
  ~/.local/dynamorio/drrun -c ~/.local/dynamorio/libdr_model.so --tracer -- ~/.local/dynamorio/adapter
  ```

- Parse contract traces from the backend and convert them into `CTrace` objects
- Return the list of collected `CTrace` objects to the caller (usually, `fuzzer.py`)

## Test Case Loader

Since the test cases produced by Revizor are raw binaries, they cannot be executed directly (e.g., they are not linked against `libc`). The test case loader (`rvzr/model_dynamorio/adapter.c`) is a simple C program that solves this problem by providing a wrapper around the test case binary. The loader implements the following algorithm:

- Receive the test case binary and an input from the Python adapter via CLI arguments
- Load the test case binary and the input into dedicated memory regions
- Print the addresses of the test case and input memory regions (for trace normalization)
- Initialize registers based on the input
- Jump to the test case binary entry point
- Return

## DynamoRIO Tool

The DynamoRIO tool (`rvzr/model_dynamorio/backend`) is responsible for instrumenting the test case loader binary and collecting contract traces.

### Implementation Overview

All instrumentation logic is implemented as a DynamoRIO client.
In particular, `model.cpp` contains the event callbacks that are executed at instrumentation time, while `dispatcher.cpp` contains the body of the callbacks that are inserted by the DR client and executed before every instruction at runtime. Finally, the `Dispatcher` object holds the state that is shared between instrumentation-time and execution-time callbacks.

The following figure provides an overview of the implementation.

[![DynamoRIO Instrumentation Overview](../../assets/dr-instrumentation.png)](../../assets/dr-instrumentation.png)

1. `dr_client_main()` is responsible for installing the initial instrumentation callbacks that hook all relevant DR events (`module_load`, `bb_translation`, exceptions, and the `exit` event).
2. `dr_client_main()` also sets the name of the function to instrument (passed by `cli.cpp`).
3. On `module_load`, the instrumentation checks whether the loaded module contains the target function. If it does, the callback registers a `drwrap` callback (`event_instrumentation_start`) that will be executed at the start of the target function.
4. Once the target function is called, `event_instrumentation_start` saves the return address in a global object (`instrumented_func`) and calls `start()` on the dispatcher.
5. From that moment on, our client instruments every translated basic block. In particular:
   - a `dispatch_callback()` is inserted before every instruction;
   - an `exit_callback()` is inserted at the function exit point (i.e., the previously saved return address).
6. These callbacks are executed at runtime with the following effects:
   - `dispatch_callback()` implements the observation and execution clauses (see the next section);
   - `exit_callback()` checks the current speculation state before exiting:
     - speculative exits cause a rollback;
     - an architectural exit stops the instrumentation.

Finally, exceptions and the `exit` event are also forwarded to the dispatcher:

- Speculative **exceptions** cause a rollback, while architectural ones are forwarded to the target program.
- The **exit** event stops instrumentation and flushes all logs (in case the exit callback has not been executed architecturally).

### Instrumentation Components

The instrumentation components modify the binary of the test case loader by adding a call to the `dispatch_callback` function before every instruction in the binary (more specifically, before every instruction in the `test_case_entry` function of the loader).

The tool interacts with DynamoRIO through the `model.cpp` module. This module registers an event for entering `test_case_entry`, which triggers a flush of DynamoRIO's internal code fragment cache and starts the instrumentation. The module also registers an event for every instruction in `test_case_entry`, and this event in turn calls `Dispatch::instrument_instruction()`. Finally, exceptions are hooked and passed to the dispatcher through `Dispatch::handle_exception()`, which can decide to either handle the signal (e.g., on speculative paths) or forward it to the test case (e.g., for architectural exceptions).

The `Dispatch` class implements the actual instrumentation logic. When the `instrument_instruction()` method is called, it inserts a clean call to the `dispatch_callback` function before the instruction. The call receives the PC and opcode of the instruction as arguments. DynamoRIO also automatically saves the complete register state before the call, thus making it available to `dispatch_callback`.
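To make the split between instrumentation time and execution time more concrete, here is a toy Python model of it. It is an illustration under simplifying assumptions, not the backend's code: it does not use DynamoRIO's API, a "basic block" is just a list of hypothetical `Instr` objects, and `instrument_block`, `set_reg`, `run`, and this `Dispatcher` class are invented names. Only the idea of a callback inserted before every instruction, which receives the PC and opcode and can read the current registers, comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Regs = Dict[str, int]
Effect = Callable[[Regs], None]


@dataclass
class Instr:
    """Toy instruction (hypothetical): its PC, opcode string, and effect on registers."""
    pc: int
    opcode: str
    effect: Effect


def set_reg(name: str, value: int) -> Effect:
    """Build an effect that writes `value` into register `name`."""
    def effect(regs: Regs) -> None:
        regs[name] = value
    return effect


@dataclass
class Dispatcher:
    """Toy stand-in for the state shared by instrumentation- and execution-time code."""
    active: bool = False
    log: List[Tuple[int, str, Regs]] = field(default_factory=list)

    def start(self) -> None:
        # Analogous to start() being called once the target function is entered.
        self.active = True

    def dispatch_callback(self, pc: int, opcode: str, regs: Regs) -> None:
        # Execution time: runs *before* the instruction, so it sees the register
        # state the instruction is about to consume. Here it only logs a snapshot.
        if self.active:
            self.log.append((pc, opcode, dict(regs)))


def instrument_block(block: List[Instr], dispatcher: Dispatcher) -> List[Effect]:
    """Instrumentation time: emit a callback step before every original instruction."""
    steps: List[Effect] = []
    for instr in block:
        # Default arguments bind pc/opcode now, much like arguments baked into a clean call.
        steps.append(lambda regs, pc=instr.pc, op=instr.opcode:
                     dispatcher.dispatch_callback(pc, op, regs))
        steps.append(instr.effect)
    return steps


def run(steps: List[Effect], regs: Regs) -> None:
    """Execution time: execute the instrumented block step by step."""
    for step in steps:
        step(regs)


if __name__ == "__main__":
    dispatcher = Dispatcher()
    block = [Instr(0x1000, "mov rax, 5", set_reg("rax", 5)),
             Instr(0x1004, "mov rbx, 7", set_reg("rbx", 7))]
    steps = instrument_block(block, dispatcher)  # done once, at "translation" time
    dispatcher.start()
    run(steps, {"rax": 0, "rbx": 0})
    for entry in dispatcher.log:
        print(entry)
```

Running it shows each callback observing the registers as they were just before its instruction: `rax` is still 0 at `0x1000` and already 5 at `0x1004`. The instrumentation step runs once per block, while the callbacks run on every execution, which mirrors why the real backend keeps its shared state in a separate object.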
### Execution-Time Components

The execution-time components implement the contract logic and are triggered by the `dispatch_callback` function. In its current state, the dispatch callback invokes only two classes, Tracer and Speculator, which implement the observation and execution clauses, respectively. Optionally, each component can log additional events (e.g., speculation rollbacks or the current register state) through a shared `Logger` component.

Subclasses of `TracerABC` record contract-relevant information via the `observe_instruction` and `observe_mem_access` methods. For example, `TracerCT` implements the `CT` observation clause by recording the PC of instructions upon `observe_instruction` and the address of memory accesses upon `observe_mem_access`. Currently, `observe_exception` simply adds a special entry to the trace to indicate that the program ended due to an (architectural) exception.

Subclasses of `SpeculatorABC` implement the contract speculation logic. For example, `SpeculatorCond` implements `speculate_instruction`. When this method is called with a branch instruction, the class takes a checkpoint of the process state, flips the branch condition (i.e., modifies the `FLAGS` register), and continues the execution. During the simulated speculation, each call to `speculate_instruction` counts the number of executed instructions, and when the number reaches the limit (e.g., 256), the class restores the checkpoint and continues the execution from the original state. (The actual algorithm is more complex, but this is the general idea.)

When the instrumentation ends (as determined by `model.cpp`), the tracer's `tracing_finalized` method is called, during which any remaining traces are flushed into the trace file, together with an "End Of Trace" entry. The Python adapter then reads the trace file, decodes it, and returns the corresponding `CTrace` to Revizor.
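The interplay between the tracer and the speculator can be approximated by the following Python sketch. It is a self-contained toy, not the backend's C++ interface: it borrows the names `TracerCT`, `SpeculatorCond`, `observe_instruction`, and `observe_mem_access` for readability, but the three-opcode instruction format, `take_checkpoint`, `rollback`, `run`, and `PROGRAM` are hypothetical. It also simplifies the real algorithm: the branch is flipped by picking the other successor instead of modifying `FLAGS`, and nested speculation is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Regs = Dict[str, int]
Trace = List[Tuple[str, int]]


@dataclass
class Instr:
    op: str           # "load", "jz", or "exit"
    args: tuple = ()  # load: (reg,); jz: (reg, target_index)


class TracerCT:
    """Toy CT-like observation clause: record every PC and every load address."""

    def __init__(self) -> None:
        self.trace: Trace = []

    def observe_instruction(self, pc: int) -> None:
        self.trace.append(("pc", pc))

    def observe_mem_access(self, address: int) -> None:
        self.trace.append(("mem", address))


class SpeculatorCond:
    """Toy conditional-branch speculation with checkpoint/rollback (no nesting)."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.checkpoint: Optional[Tuple[int, Regs]] = None
        self.executed = 0

    def speculating(self) -> bool:
        return self.checkpoint is not None

    def take_checkpoint(self, resume_pc: int, regs: Regs) -> None:
        # Remember where the correct path resumes and a copy of the registers.
        self.checkpoint = (resume_pc, dict(regs))
        self.executed = 0

    def rollback(self) -> Tuple[int, Regs]:
        assert self.checkpoint is not None
        resume_pc, regs = self.checkpoint
        self.checkpoint = None
        return resume_pc, regs


def run(program: List[Instr], regs: Regs, window: int) -> Trace:
    """Execute `program` on a copy of `regs` and return its contract trace."""
    tracer, spec = TracerCT(), SpeculatorCond(window)
    regs, pc = dict(regs), 0
    while True:
        instr = program[pc]
        if spec.speculating():
            # Roll back when the window is exhausted or speculation reaches the exit.
            if spec.executed == spec.window or instr.op == "exit":
                pc, regs = spec.rollback()
                continue
            spec.executed += 1
        tracer.observe_instruction(pc)
        if instr.op == "exit":
            return tracer.trace
        if instr.op == "load":
            tracer.observe_mem_access(regs[instr.args[0]])
            pc += 1
        elif instr.op == "jz":
            reg, target = instr.args
            taken = regs[reg] == 0
            correct_pc = target if taken else pc + 1
            wrong_pc = pc + 1 if taken else target
            if spec.speculating():
                pc = correct_pc  # no nested speculation in this toy
            else:
                # Checkpoint the correct path, then execute the mispredicted one.
                spec.take_checkpoint(correct_pc, regs)
                pc = wrong_pc
        else:
            raise ValueError(f"unknown op {instr.op!r}")


# Toy Spectre-V1-like test case: the load is skipped architecturally when rax == 0.
PROGRAM = [Instr("jz", ("rax", 2)), Instr("load", ("rbx",)), Instr("exit")]
```

With `window=0` (no speculation), the inputs `{"rax": 0, "rbx": 64}` and `{"rax": 0, "rbx": 128}` produce identical traces, `[("pc", 0), ("pc", 2)]`. With `window=4`, the mispredicted path executes the load, so the traces differ in their `("mem", ...)` entry. This is the kind of difference that separates a non-speculative contract from one that permits conditional-branch speculation.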
### Standalone Usage

The DR tool can be used as a standalone tool to collect the runtime trace of any program, independently of the rest of Revizor's infrastructure. A typical invocation looks like this:

```shell
~/.local/dynamorio/drrun -c ~/.local/dynamorio/libdr_model.so --tracer --speculator -- ls /dev/null
```

By default, this instruments `ls` from `__libc_start_main` until the end of the program, runs it with `/dev/null` as an argument, and generates a binary file called `rvzr_trace.dat` that contains the collected trace. To list the other available flags, run `~/.local/dynamorio/drrun -c ~/.local/dynamorio/libdr_model.so -h`.

The trace file location can be changed by adding `--trace-output `. Additionally, the tool can dump the trace in human-readable format to STDOUT using the `--print-trace` flag.

To decode and analyze the trace file, downstream tools should always use the `TraceDecoder` class provided by `trace_decoder.py`. For internal use, this module also provides a simple entry point for printing traces:

```bash
python3 trace_decoder.py rvzr_trace.dat
```

#### Debugging

Attaching a debugger like GDB to the DR tool is not always the best debugging option, as the program has three separate states:

1. the state of the program being instrumented (e.g., `ls`)
2. the state of the DR client (`libdr_model.so`) instrumentation
3. the state of DynamoRIO itself (`drrun`)

More information about debugging DR clients can be found in the [DynamoRIO debugging documentation](https://dynamorio.org/page_debugging.html). For our instrumentation, other (possibly simpler) options are available:

1. **Inspecting Debug Traces**: the DR tool can optionally log extra information, e.g.,
the complete state of the register file before each instruction, each value read from and written to memory, and speculation events like checkpoints and rollbacks, in a separate debug trace:
   - This option can be enabled using `--log-level `
   - By default, the tool dumps debug entries to `rzvr_dbg_trace.dat` in binary format; to change the path of the debug trace file, use `--debug-output `
   - `--print-debug-trace` can be used to pretty-print debug entries to STDOUT during execution
   - `trace_decoder.py` also provides a decoder for debug entries
   - **WARNING:** debug traces can become very large, especially for nested speculation
2. **Running DynamoRIO with logging**: DynamoRIO can also produce its own logs (see the DR documentation):

   ```shell
   ~/.local/dynamorio/drrun -debug -loglevel 3 -c ~/.local/dynamorio/libdr_model.so --tracer --speculator -- ls /dev/null
   ```

================================================
FILE: docs/internals/model-backends/model-unicorn.md
================================================
# Unicorn Backend

Unicorn backend architecture:

```text
UnicornModel (main orchestrator)
├─ UnicornTracer          Records observations (PC, memory addresses, etc.)
├─ UnicornSpeculator      Simulates speculative execution
├─ UnicornTaintTracker    Tracks data flow for boosted input generation
├─ ExtraInterpreter       Handles features Unicorn doesn't support
└─ InstructionCoverage    Tracks which instructions were tested
```

Key components:

- `UnicornModel`: Manages the emulator and coordinates components through hooks on instruction and memory events.
- `UnicornTracer`: Implements the observation clause of the contract. Different tracers record different information (program counters, memory addresses, data values).
- `UnicornSpeculator`: Implements the speculation clause using checkpoint-rollback mechanisms. When speculation is triggered (branch misprediction, CPU exception), it saves the state and executes speculatively up to a window limit (default 250 instructions).
  It rolls back on serializing instructions or when the window expires.
- `UnicornTaintTracker`: Performs dynamic taint analysis to identify which input bytes affect the contract trace. Used for boosted input generation.

================================================
FILE: docs/intro/01-overview.md
================================================
# Revizor at a Glance

## What is Revizor?

Revizor is a security-oriented fuzzer that detects microarchitectural information leaks in CPUs (the vulnerabilities behind attacks like Spectre and Meltdown). It tests processors "blindly," requiring no prior knowledge of specific flaws or hardware internals. Instead, it compares actual CPU behaviour against a [*leakage contract*](../glossary.md#speculation-contract-aka-leakage-contract): a specification defining known sources of information leakage. Any discrepancy reveals a potential vulnerability.

## What Problems Does Revizor Solve?

Modern CPUs achieve their speed through speculative execution, out-of-order processing, complex caching, and other microarchitectural optimizations. These optimizations create side channels (timing variations, cache-state changes, buffer contention) that can leak sensitive data. Such leaks are notoriously difficult to catch: they cause no crashes, depend on precise timing, and emerge only under specific conditions. Revizor automates the detection of these elusive side-channel leaks.

Specifically, Revizor addresses several key challenges:

* **Automated discovery**: Finding side-channel attacks manually demands deep (often undocumented) microarchitectural knowledge and extensive trial and error. Revizor automates this process, systematically exploring the CPU's behaviour by probing the microarchitecture with many automatically generated test cases.
* **Variant analysis**: Side-channel vulnerabilities spawn many variants. Revizor can search for new attack vectors that might bypass existing patches.
* **Validation of mitigations**: Vendor patches meant to close side channels have sometimes proven incomplete. Revizor can test whether fixes actually eliminate the leakage.

## Quick Example: Detecting Spectre V1

To illustrate how Revizor works, consider a simple fuzzing campaign that detects a known vulnerability present in most modern CPUs: Spectre V1.

!!! info "Prerequisites"
    Before running this example, ensure you have Revizor installed and set up correctly. Follow the [Installation Guide](02-install.md) if you haven't done so already.

We will use the configuration file `demo/detect-v1.yaml`. This config file tells Revizor to test a small subset of the x86-64 ISA (arithmetic instructions and conditional branches) against a contract stating that the CPU should not speculate and should only leak the program counter and the addresses of loads and stores. As most modern CPUs implement branch prediction, we expect to see a violation of this contract.

Run the fuzzer with the following command:

```bash
$ rvzr fuzz -s base.json -n 1000 -i 100 -c demo/detect-v1.yaml -w ./
```

After a short while, you should see output similar to this:

```
INFO: [prog_gen] Setting program_generator_seed to random value: 562112
INFO: [fuzzer] Starting at 14:00:51
13 ( 2%)| Stats: Cls:100/100,In:200,R:9,SF:5,OF:6,Fst:2...
================================ Violations detected ==========================
Violation Details:
-----------------------------------------------------------------------------------
HTrace                                                           | ID:71 | ID:171|
-----------------------------------------------------------------------------------
^................^..^.....^.....................^.....^......... | 599   | 0     |
^...................^.....^..................................... | 28    | 23    |
^................^..^.....^.......^...^......................... | 0     | 604   |
================================ Statistics ===================================
Test Cases: 14
Inputs per test case: 200.0
Violations: 1
Effectiveness:
  Total Cls: 100.0
  Effective Cls: 100.0
Discarded Test Cases:
  Speculation Filter: 5
  Observation Filter: 6
  Fast Path: 2
  Max Nesting Check: 0
  Tainting Check: 0
  Early Priming Check: 0
  Large Sample Check: 0
  Priming Check: 0
Duration: 52.1
Finished at 14:01:43
```

This message indicates that Revizor found a [violation](../glossary.md#violation) of the specified contract, and the tool will store the corresponding [violation artifact](../glossary.md#violation-artifact-aka-contract-counterexample) in `./violation-/`.

What happened here is that Revizor generated a series of random [test programs](../glossary.md#test-case-program), executed them both on the target CPU and on a reference model that implements the contract, collected the side-channel observations on both sides, and compared them. In this case, one of the generated test programs produced two different [hardware traces](../glossary.md#hardware-trace-htrace) for two different inputs, while the model (contract) produced the same trace for both inputs. This discrepancy indicates that the CPU leaked information through microarchitectural side channels in a way that violates the specified contract.

The violation artifact (`./violation-/`) contains the assembly file `program.asm` that surfaced the violation, the sequence of [inputs](../glossary.md#test-case-data-aka-test-case-input) to this program (`input_*.bin`), and some details about the violation in `report.txt`.

If we inspect the assembly code in `program.asm` and analyze the violation, we will most likely find a gadget that implements a typical Spectre V1 pattern: a conditional branch followed by a speculative memory access that leaks data through the cache.
This is the most likely outcome because this pattern is statistically very common for the given configuration. For example, the program may look like this (simplified for illustration):

```assembly
.section .data.main
...
jnp .bb_0.1                  // conditional branch
jmp .exit_0
.bb_0.1:
...
or byte ptr [r14 + rcx], al  // data-dependent memory access
...
.exit_0:
.test_case_exit:
```

!!! info "On violation analysis"
    This example was intentionally chosen to have a straightforward output that directly corresponds to a known vulnerability pattern. In practice, analyzing violations can be more complex, especially for novel or less understood leaks. We won't go into the details of the analysis here, as it is a relatively complex topic; refer to [this guide](../howto/root-cause-a-violation.md) if you want to dive into the details.

The power of this approach is that Revizor doesn't need to know the specific vulnerability it's looking for. It simply tests whether the CPU matches the expected security specification. When it finds a discrepancy, that's a potential vulnerability worth investigating.

## What's Next?

Now that you understand what Revizor is and what it does, here are your next steps:

* **Dive Deeper into Concepts**: For a more detailed explanation of the information flow analysis used in Revizor, the concepts of leakage contracts, and other related topics, see the [Core Concepts Guide](03-primer.md).
* **Follow a Tutorial**: Our [step-by-step tutorial series](./04-tutorials.md) guides you through detecting your first vulnerability, understanding the results, and designing effective fuzzing campaigns.
* **Explore the Glossary**: Familiarize yourself with key terms and definitions in the [Glossary](../glossary.md) to better understand Revizor's terminology (we have quite a few unique terms!).
* **Get Help**: If you run into issues or have questions, visit our [FAQ](../faq/general.md) for common questions, or [ask a question](../howto/ask-a-question.md) to reach out to the community.

================================================
FILE: docs/intro/02-install.md
================================================
# Installation

**Warning**: Revizor runs randomly-generated code in kernel space. This means that a misconfiguration (or a bug) can crash the system and potentially lead to data loss. Make sure you're not running Revizor on a production machine, and that you have a backup of your data.

### 1. Requirements

**Hardware**: x86-64 or ARM64 CPU. Specifically:

* All Intel and AMD x86-64 CPUs are supported.
* Some ARM CPUs are also supported, namely Microsoft Cobalt and Raspberry Pi. Other ARM CPUs may work, but are not officially supported.

**No virtualization**: You will need a bare-metal OS installation. Testing from inside a VM is not supported.

**OS**: The target machine has to be running Linux v4.15 or later.

### 2. Python Package

The preferred installation method is using `pip` within a virtual environment. The Python version must be 3.9 or later.

```bash
sudo apt install python3.9 python3.9-venv
/usr/bin/python3.9 -m pip install virtualenv
/usr/bin/python3.9 -m virtualenv ~/venv-revizor
source ~/venv-revizor/bin/activate
pip install revizor-fuzzer
```

### 3. Executor

In addition to the Python package, you will need to build and install the executor, which is a kernel module.

```bash
# building a kernel module requires kernel headers
sudo apt-get install linux-headers-$(uname -r) linux-headers-generic

# get the source code
git clone https://github.com/microsoft/side-channel-fuzzer.git

# build executor
cd side-channel-fuzzer/rvzr/executor_km
make uninstall  # this command will print an error message, but that's ok!
make clean
make
make install
```

### 4. (Optional) DynamoRIO Backend

If you want to use the DynamoRIO-based model, it has to be installed separately:

```bash
# install dependencies
sudo apt-get install cmake g++ g++-multilib doxygen git zlib1g-dev libunwind-dev libsnappy-dev liblz4-dev

# install DynamoRIO and the model (run from the root of the cloned repository)
make -C rvzr/model_dynamorio

# check installation
~/.local/dynamorio/drrun -c ~/.local/dynamorio/libdr_model.so --list-tracers -- ls
# expected output:
# ct
# ...
# /dev/null
```

### 5. Download ISA spec

```bash
rvzr download_spec -a x86-64 --extensions ALL_SUPPORTED --outfile base.json

# Alternatively, use the following command to include system instructions;
# however, mind that testing these instructions may crash the system if misconfigured!
# rvzr download_spec -a x86-64 --extensions ALL_AND_UNSAFE --outfile base.json
```

### 6. Test the Installation

To make sure that the installation was successful, run the following command from the root of the cloned repository:

```bash
./tests/quick-test.sh
# The expected output is:
# Detection: OK
# Filtering: OK
```

If you see any other output, check whether the previous steps were executed correctly. If you still have issues, please [open an issue](https://github.com/microsoft/side-channel-fuzzer/issues).

### 7. (Optional) System Configuration

External processes can interfere with Revizor's measurements. To minimize this interference, we recommend the following system configuration:

* Disable Hyperthreading (BIOS option);
* Disable Turbo Boost (BIOS option);
* Boot the kernel on a single core (add `maxcpus=1` to the [Linux boot parameters](https://wiki.ubuntu.com/Kernel/KernelBootParameters)).

If you skip these steps, Revizor may produce false positives, especially if you use a low value for [`executor_sample_sizes`](../ref/config.md#executor-configuration). However, a large sample size (> 300-400) usually mitigates this issue.
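To build intuition for why a larger sample size reduces false positives, the C sketch below implements a deliberately simplified, hypothetical comparison of hardware traces; it is not Revizor's actual statistical test. It assumes that each input is measured `samples` times and that a hardware trace counts as significant for an input only if it occurs in at least a fraction `min_fraction` of those measurements. Two inputs are reported as leaking differently if their sets of significant traces differ. The function name, the parameters, and the threshold are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical comparison of two hardware-trace histograms.
 * counts_a[i] and counts_b[i] hold how many times trace i was observed for
 * input A and input B; each input was measured `samples` times.
 * A trace is "significant" if it makes up at least `min_fraction` of the
 * samples. Returns true if the sets of significant traces differ, i.e., if
 * the two inputs look like they leak differently. */
bool htraces_differ(const unsigned *counts_a, const unsigned *counts_b,
                    size_t num_traces, unsigned samples, double min_fraction) {
    assert(samples > 0);
    double threshold = min_fraction * samples;
    for (size_t i = 0; i < num_traces; i++) {
        bool significant_a = counts_a[i] >= threshold;
        bool significant_b = counts_b[i] >= threshold;
        if (significant_a != significant_b) {
            /* a frequent trace for one input is rare for the other */
            return true;
        }
    }
    return false;
}
```

Applied to the counts from the Spectre V1 example in the overview (599, 28 and 0 occurrences for input 71 versus 0, 23 and 604 for input 171, i.e., 627 samples each) with a 10% threshold, the sketch reports a difference. With only 5 samples per input, however, a single noisy measurement (counts 4/1 versus 5/0) already crosses the threshold and produces a false positive, whereas with 500 samples the same single outlier (499/1 versus 500/0) stays below it.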
================================================ FILE: docs/intro/03-primer.md ================================================ # Primer: Speculation Contracts and Model-Based Relational Testing Below is a brief primer on the theoretical foundations of speculation contracts and model-based relational testing—concepts that underlie the Revizor tool. This primer provides a high-level overview of the topic, introducing the concepts of noninterference, speculation contracts, and model compliance. This document is intended for those new to the topic, particularly people without a background in information-flow analysis. For a more detailed and technical explanation, refer to the [original contracts paper](https://arxiv.org/pdf/2006.03841). ## Information-Flow Properties We will start with the basics: the concepts of confidentiality and [noninterference](../glossary.md#noninterference), which are fundamental to understanding how [speculation contracts](../glossary.md#speculation-contract-aka-leakage-contract) work. Traditionally, security mechanisms like access control and encryption have focused on protecting data at rest or in transit. However, these mechanisms do not address the problem of [**information flow**](../glossary.md#information-flow) within a system. For example, consider a program that reads a secret input and then writes it to a public output, such as a web server that logs failed login attempts along with the username and masked password entered. Even if the program is secure in the sense that it does not allow unauthorized access to the secret data, it may still leak the secret through its public output, such as logging "User admin failed login with password starting with 'P@ss'" — revealing partial information about the secret password. This is where **information-flow security** comes into play. Information-flow security is concerned with how data moves through a computation and how it can be observed by an attacker. 
The goal is to ensure that secret information does not leak to observers who are unauthorized to access it. An **end-to-end confidentiality policy** might be stated as: *“No secret input data can be inferred by an attacker through observations of system output.”* In other words, even if an adversary can see all public outputs of a computation, they should learn nothing about the secret inputs. **Information-flow properties** generally classify program variables or inputs/outputs into security levels (e.g., `Secret` and `Public`). The key property for confidentiality is that *no information flows from Secret to Public.* But how can information flow? There are two primary routes: - **Explicit flows:** These occur when confidential data is directly assigned or passed into a public variable or output. For example, in code, writing `public = secret` is an explicit flow from a secret variable to a public variable (an obvious violation of confidentiality). Any mechanism that directly transfers the bits of a secret into a publicly observable sink is an explicit flow. Such flows are usually straightforward to detect. - **Implicit flows:** These occur indirectly, through the control structure of the program. An implicit flow arises when the *control path* taken by a program (e.g., which branch of an `if` or how many loop iterations) depends on a secret, thereby implicitly leaking information. === "Example 1: Implicit Flow" : Consider this pseudocode example: ```c if (Sec == 0) { Pub = 0; } else { Pub = 1; } ``` Here `Sec` is a secret input and `Pub` is a public output. There is no direct assignment of `Sec` to `Pub`. However, an observer of `Pub` can deduce information about `Sec`. In fact, this program sets `Pub` to 0 if `Sec` was 0; otherwise, it sets `Pub` to 1—effectively copying the one-bit information “is Sec zero?” into `Pub`. This is an implicit flow of information from `Sec` to `Pub` through the control structure (the `if` condition on `Sec`). 
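An implicit flow can be exposed without inspecting the code: run the program on several secrets and compare what an observer of `Pub` sees. The C sketch below does this for Example 1 over a small, hand-picked range of secrets. The function names are illustrative, and checking a finite range of secrets can only reveal a leak, never prove its absence.

```c
#include <assert.h>
#include <stdbool.h>

/* Example 1: Pub is never assigned Sec directly, yet it depends on Sec
 * through the branch condition (an implicit flow). */
int implicit_flow(int sec) {
    int pub;
    if (sec == 0) {
        pub = 0;
    } else {
        pub = 1;
    }
    return pub;
}

/* A program that ignores its secret, for comparison. */
int constant_output(int sec) {
    (void)sec;
    return 0;
}

/* Runs `program` on every secret in [lo, hi] and reports whether all runs
 * produce the same public output. A single differing run is enough to show
 * that information about the secret reaches the public output. */
bool same_output_for_all_secrets(int (*program)(int), int lo, int hi) {
    assert(lo <= hi);
    int reference = program(lo);
    for (int sec = lo + 1; sec <= hi; sec++) {
        if (program(sec) != reference) {
            return false;
        }
    }
    return true;
}
```

For `implicit_flow`, the runs with `sec = 0` and any nonzero secret disagree, so the check fails, while `constant_output` passes for every range.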
## Noninterference: Definition and Examples

**Noninterference** is a formal property that captures the idea of perfect confidentiality: changes in secret data have *no observable effect* on public outputs. This property can be stated as: *"a system is noninterferent if variations in Secret inputs cause no differences in Public outputs"*. Equivalently, confidential inputs do not interfere with the publicly visible state of the system.

To make this more concrete, imagine we run a program twice with two different secret inputs but the same public inputs. If **no attacker can distinguish** the two runs by observing anything public, then the program satisfies the noninterference property. The “attacker” here is assumed to have complete access to all public outputs of a program `P`, which are formalized as a function `PublicOut`:

```
output = PublicOut(P, Sec, Pub)
```

Noninterference essentially demands that for any two secrets `Sec1` and `Sec2` and any public input `Pub`, the program’s behavior from an attacker’s perspective is identical when run on `(Sec1, Pub)` versus `(Sec2, Pub)`:

=== "Definition 1: Noninterference"

    : A program `P` is noninterferent if, for all
public inputs `Pub` and all pairs of secret inputs `Sec1`, `Sec2`, it holds that
`PublicOut(P, Sec1, Pub) = PublicOut(P, Sec2, Pub)`. Here are some examples to illustrate this principle: === "Example 2: Interfering program" : Suppose our program simply copies a secret to output: ```c void copy(int* sec, int* output) { *output = *sec; } ``` : Running it with two different secrets clearly yields different public outputs (e.g., `output` becomes 5 in one run and 7 in another). An attacker would distinguish these runs, so the program is **not** noninterferent—it blatantly leaks information. --- === "Example 3: Noninterfering program" : A trivial example of a noninterferent program is one that produces no output dependent on the secret. For instance: ```c void assign_zero(int* sec, int* output) { *output = 0; } ``` : This program ignores secret `sec` entirely and always sets the public output `output` to 0. No matter what the secret input is, the public output is constant (0), so an attacker gains no information about `sec`. Indeed, any two runs are indistinguishable (both runs output 0). This satisfies noninterference (albeit by doing nothing useful with the secret). --- === "Example 4: Allowed benign dependency" : It is possible for a program to use secret data internally yet still be noninterferent as long as the final public outputs don’t reveal those secrets. For instance: ```c void mask_secret(int* sec, int* output) { int temp = *sec; temp = temp * 0; // multiply secret by 0 *output = temp; } ``` : Here the program *did* read the secret (`sec`) and even manipulated it, but it “washed out” the secret by multiplying by 0. The value assigned to `output` is always 0. From an external view, this is just like the previous example—no dependence of `output` on `sec`. Noninterference is concerned only with *what can be observed by the attacker*, not with whether the program internally used the secret. As long as any use of the secret eventually has no effect on outputs, the policy holds. 
    : Naturally, this example is not useful either, as it does nothing with the secret. In practice, however, there are techniques to ensure noninterference while still making use of secret data for useful computations. We won't go into these techniques here, as they are beyond the scope of this primer.

---

One important insight is that noninterference is relative to a given specification of what is “observable.” If you consider only the functional outputs as observable, a program might be noninterferent in that model. But if in reality the attacker can observe more (e.g., the execution time of a program), then a program that is secure in theory might be insecure in practice. This leads us to examine how *side channels* break the assumptions of basic noninterference.

## Beyond Direct Outputs: Side Channels

The original works on information-flow properties focused on direct outputs of a program (e.g., writing to a file or a network socket). However, in practice, attackers can extract information from more than just the “official” outputs of a program. For example, the attacker might observe how long a computation takes or measure the power consumption of a device. These additional sources of information are called **side channels**. Side channels are unintended channels through which secret data can be inferred by observing the system’s behavior, even if the direct outputs are secure. Since they can reveal information about the secret inputs, we must include them in the definition of noninterference. Just as we defined `PublicOut(P, Sec, Pub)` as the observable output, we can define `Trace` as the observable side-channel information for a given program `P`:

```
trace = Trace(P, Sec, Pub)
```

For example, a trace might be the execution time of the program or its cache access pattern. Noninterference then requires that the traces of two runs with different secrets, `(Sec1, Pub)` versus `(Sec2, Pub)`, are indistinguishable to an attacker.
This is a stronger requirement than just looking at the functional outputs.

=== "Definition 2: Side-Channel Noninterference"

    : Given a side channel that produces a trace `Trace`, a program `P` is noninterferent with respect to this side channel if, for all public inputs `Pub` and all pairs of secret inputs `Sec1`, `Sec2`, it holds that
`Trace(P, Sec1, Pub) = Trace(P, Sec2, Pub)`.

Here are some examples of side channels and how they can violate noninterference:

=== "Example 5A: Timing side channel"

    : Consider a program that compares a password with a user’s input:

    ```c
    #include <stdbool.h>
    #include <string.h>

    bool check_password(const char *attempt, const char *pswd) {
        for (size_t i = 0; i < strlen(pswd); i++) {
            if (attempt[i] != pswd[i]) {
                return false;  // mismatch found, return early
            }
        }
        return true;  // all characters matched
    }
    ```

    : If the attacker can measure how long the function takes to reject a guess, they can infer the password one character at a time. This leakage surfaces as a violation of the noninterference property with respect to timing observations.

    : A counterexample to Definition 2 could be as follows. Let's use the same public input (the attempt) with two different secrets (the password):

    : - `input1={attempt="aaa", pswd="abc"}`
    : - `input2={attempt="aaa", pswd="aab"}`

    : If the trace is the number of loop iterations that complete before the function returns (a proxy for its execution time), the traces of these inputs will be:

    : - `trace1 = Trace(check_password, input1) = 1`
    : - `trace2 = Trace(check_password, input2) = 2`

    : These inputs constitute a violation of Definition 2, as `trace1 != trace2` even though the two inputs have the same public values.

---

=== "Example 5B: Timing side channel - Password length"

    : Noninterference is able to model different kinds of secret-dependent leaks. Take, for example, a patched version of the previous program:

    ```c
    #include <stdbool.h>
    #include <string.h>

    bool check_password(const char *attempt, const char *pswd) {
        size_t la = strlen(attempt), lp = strlen(pswd);
        size_t len = la < lp ? la : lp;  // min(length(attempt), length(pswd))
        bool same = true;
        for (size_t i = 0; i < len; i++) {
            same = same && (attempt[i] == pswd[i]);  // the loop always runs to completion
        }
        return same;
    }
    ```

    : In this version, there is no early exit, yet the attacker is still able to infer the _length_ of the password through a side channel. This is captured by the following counterexample:

    : - `input1={attempt="aaaaaa", pswd="b"}`, `trace1 = 1`
    : - `input2={attempt="aaaaaa", pswd="bbb"}`, `trace2 = 3`

    : This shows that the program still violates Definition 2.
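The two timing counterexamples can be reproduced mechanically. The C sketch below instruments both versions of `check_password` so that each returns its trace instead of the comparison result, assuming (as in the examples) that the trace is the number of loop iterations completed before the function returns. The `*_trace` function names are hypothetical, and each function keeps only the parts of the original code that affect the trace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Example 5A, instrumented: counts the loop iterations that complete
 * before the early return (a stand-in for execution time). */
int early_exit_trace(const char *attempt, const char *pswd) {
    assert(attempt != NULL && pswd != NULL);
    int iterations = 0;
    for (size_t i = 0; i < strlen(pswd); i++) {
        if (attempt[i] != pswd[i]) {
            return iterations; /* mismatch: the early exit stops the count */
        }
        iterations++;
    }
    return iterations;
}

/* Example 5B, instrumented: the loop always runs min(len(attempt),
 * len(pswd)) times, so the trace depends only on the string lengths.
 * The character comparison is omitted because it does not affect the count. */
int no_early_exit_trace(const char *attempt, const char *pswd) {
    assert(attempt != NULL && pswd != NULL);
    size_t la = strlen(attempt), lp = strlen(pswd);
    size_t len = la < lp ? la : lp;
    int iterations = 0;
    for (size_t i = 0; i < len; i++) {
        iterations++;
    }
    return iterations;
}
```

Calling `early_exit_trace` on the two inputs of Example 5A yields the traces 1 and 2, while `no_early_exit_trace` yields 1 and 3 for the inputs of Example 5B. The Example 5A pair produces identical traces (3 and 3) under the patched version, which is why the patch removes the character-by-character leak but not the length leak.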
---

=== "Example 6: Cache side channel"

    : Consider a program that uses a secret value to index into an array, as in the following code:

    ```c
    int multiply(const char *array, int pub, int sec) {
        char x = array[sec];
        return x * pub;
    }
    ```

    : A co-located attacker could observe the cache access pattern of the program by using a Prime+Probe or Flush+Reload attack. Such traces can reveal the addresses accessed by the program and thus leak the secret value. This leakage would violate the noninterference property with respect to cache observations.

    : A violation could be surfaced by two inputs:

    : - `input1={array=0x10000, pub=1, sec=0x40}`
    : - `input2={array=0x10000, pub=1, sec=0x80}`

    : Let's assume a direct-mapped cache with 64-byte lines and 64 sets (0x1000 bytes in total), so that the ID of the cache line used by an access to address `addr` is `line_id = (addr % 0x1000) // 0x40`. Since the array access in the first line of `multiply` accesses two different addresses for the two inputs, the inputs also produce two different traces:

    : - `trace1 = Trace(multiply, input1) = ((0x10000 + 0x40) % 0x1000) // 0x40 = 1`
    : - `trace2 = Trace(multiply, input2) = ((0x10000 + 0x80) % 0x1000) // 0x40 = 2`

    : Since the two inputs agree on the public values (`array` and `pub`) but differ in the secret `sec`, and yet produce different cache traces, they constitute a violation of Definition 2.

## Challenges of Side-Channel Noninterference

While conceptually sound, the above formalization of side-channel noninterference is too simplistic to faithfully capture the side effects of program execution on modern, highly optimized hardware, especially CPUs. There are two key challenges:

- *Challenge 1 - Noisy and Non-Deterministic Traces*: The traces observed by the attacker over a side channel are typically noisy, non-deterministic, and depend on the microarchitectural state of the CPU.
For example, cache access patterns can be influenced by other programs running on the machine, the operating system and its interrupts, and can depend on microarchitectural buffers like store buffers or branch history tables. This means that the `Trace` function is not a simple deterministic function of the program inputs, but a complex function of many factors, some of which affect the result concurrently and in a non-deterministic fashion. - *Challenge 2 - Unknown Side Channels*: Modern CPUs have a plethora of side channels, including cache timing, branch prediction, and many others. To ensure complete confidentiality, we need to check that the program does not leak information over *any* of them. This is a challenging task, as we do not know the full set of possible side channels when it comes to commercial hardware with proprietary microarchitectures. For example, a CPU might have an obscure microarchitectural optimization that vastly expands possibilities for information leaks, as was the case with Spectre and Meltdown vulnerabilities. Not including this optimization will undermine the noninterference analysis. Therefore, to test for noninterference comprehensively, we need a way to discover and reason about all possible side channels that could leak information. The next two sections discuss how speculation contracts address these challenges. ## Speculation Contracts: Dealing with the Complexity of Modern Hardware As a solution to the first challenge, Guarnieri et al. (2021) introduced the concept of **speculation contracts**. A speculation contract is a simplified and deterministic model of the hardware, designed to capture the information that a given program *could* leak over side channels when executed with the given inputs. The key term here is "could"—the contract is not meant to exactly predict the side-channel traces, but instead, it errs on the side of caution, overestimating the possible leaks to achieve deterministic and noise-free traces. 
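To make the idea of an executable contract concrete, the C sketch below hand-codes a model for the two functions used in Examples 7 and 8 further down, under the memory-observation contracts defined there (MEM-SEQ and MEM-COND). It is a simplified illustration rather than how Revizor builds its models (in practice, models are usually built on a modified CPU emulator, as Example 7 notes). The observation clause for loads is a helper that records the load address, and the MEM-COND execution clause is inlined by hand: the function runs the mispredicted side of the branch before the correct one. The simulated array, its base address `0x10000`, and all function names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACE 16

/* Contract trace: the addresses exposed by observation clauses, in order. */
typedef struct {
    uint64_t addrs[MAX_TRACE];
    size_t len;
} ctrace_t;

/* Hypothetical simulated array: data in `mem`, located at address `base`.
 * The model only computes addresses; it never dereferences `base`. */
typedef struct {
    uint64_t base;
    const char *mem;
} sim_array_t;

/* Demo memory: array[1] holds 10, as assumed in Example 8. */
static const char demo_mem[4] = {0, 10, 0, 0};

sim_array_t demo_array(void) {
    sim_array_t arr = {0x10000, demo_mem};
    return arr;
}

/* Observation clause for loads: expose the address, then return the value. */
static char observe_load(ctrace_t *t, sim_array_t arr, int index) {
    assert(t->len < MAX_TRACE);
    t->addrs[t->len++] = arr.base + (uint64_t)index;
    return arr.mem[index];
}

/* `multiply` (Example 7) under MEM-SEQ: loads expose addresses, no speculation. */
ctrace_t multiply_mem_seq(sim_array_t arr, int pub, int sec) {
    ctrace_t t = {.len = 0};
    char x = observe_load(&t, arr, sec);
    char y = observe_load(&t, arr, pub);
    (void)(x * y); /* the return value is not part of the contract trace */
    return t;
}

/* `conditional_multiply` (Example 8) under MEM-COND: the branch first takes
 * the mispredicted side, then the speculative state is rolled back (the
 * trace is kept) and the correct side executes. */
ctrace_t conditional_multiply_mem_cond(sim_array_t arr, int pub, int sec) {
    ctrace_t t = {.len = 0};
    int z = observe_load(&t, arr, pub);
    int condition = z < 10;
    if (!condition) {
        (void)observe_load(&t, arr, sec); /* speculative run of the if-body */
    }
    if (condition) {
        z *= observe_load(&t, arr, sec); /* architectural run of the if-body */
    }
    (void)z;
    return t;
}

/* Two contract traces are equal if they expose the same addresses in order. */
int ctrace_equal(ctrace_t a, ctrace_t b) {
    if (a.len != b.len) {
        return 0;
    }
    for (size_t i = 0; i < a.len; i++) {
        if (a.addrs[i] != b.addrs[i]) {
            return 0;
        }
    }
    return 1;
}
```

For the inputs of Example 7, `multiply_mem_seq(demo_array(), 1, 2)` records `0x10002` followed by `0x10001`, and the trace for `sec = 3` differs only in its first entry. In `conditional_multiply_mem_cond`, the load of `array[sec]` is recorded only because of the modeled misprediction: with `array[1] = 10`, the correct path never executes the body of the `if`.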
A speculation contract works by defining two key aspects for every instruction in the CPU's ISA: 1. [**Observation Clause**](../glossary.md#observation-clause): For each instruction that may have an observable side effect, the contract declares an observation clause. It describes the data exposed by the instruction. 2. [**Execution Clause**](../glossary.md#execution-clause): For each instruction whose semantics may be affected by hardware optimizations (e.g., speculative execution), the contract declares an execution clause. It describes the effect of such optimizations, but without specifying the exact mechanism of the optimization. At a high level, a contract implements a function `ContractTrace` that maps a program `P` and its inputs `Sec, Pub` to a [contract trace](../glossary.md#contract-trace-ctrace) `ctrace`. It is essentially a conservative approximation of the `Trace` function. ``` ctrace = ContractTrace(P, Sec, Pub) ``` The contract trace is a sequence of all data that is exposed when a program is executed according to a contract. It captures the side-channel observations that *could be visible* if the CPU followed the speculation contract's rules for a given program execution. Accordingly, the noninterference property is redefined in terms of the contract trace: === "Definition 3: Contract Noninterference" : Given a contract that produces a contract trace `ContractTrace`, a program `P` is noninterferent with respect to this contract if,
for all public inputs `Pub` and all secret inputs `Sec1`, `Sec2`, it holds that
`ContractTrace(P, Sec1, Pub) = ContractTrace(P, Sec2, Pub)`.

The following examples illustrate how a contract can be used to model side-channel leaks on a CPU.

=== "Example 7: Memory Observation Contract, MEM-SEQ"

    : Let's imagine a CPU with a shared data cache and no other optimizations (i.e., no speculation). A co-located attacker can recover the addresses of loads/stores by observing which of the cache sets changed their state via a cache timing side-channel attack (e.g., Prime+Probe). We can encode these expectations in an observation clause for loads and stores by specifying that they expose their address. Since the CPU does not speculate, the execution clause for all instructions is empty. We call this contract MEM-SEQ (memory leakage with sequential execution), and it can be summarized as a table:

    |       | Observation Clause | Execution Clause |
    | ----- | ------------------ | ---------------- |
    | Load  | Expose Address     | -                |
    | Store | Expose Address     | -                |
    | Other | -                  | -                |

    : Note that MEM-SEQ intentionally overestimates the leaks by assuming that the attacker observes the complete addresses of loads/stores (in contrast to the subset of address bits that is actually leaked in practice) and that *all* loads/stores are observable (in reality, some might be masked by noise or other factors). This overestimation ensures that the contract is conservative and captures all possible corner cases.

    : Let's now consider how we can produce a contract trace using MEM-SEQ.
We will use a slightly modified version of the `multiply` function from Example 6: ```c int multiply(const char *array, int pub, int sec) { char x = array[sec]; // MEM-SEQ exposes: &array[sec] char y = array[pub]; // MEM-SEQ exposes: &array[pub] return x * y; } ``` : The inputs are: : - `input1 = {array=0x10000, pub=1, sec=2}` : - `input2 = {array=0x10000, pub=1, sec=3}` : The model collects a trace by executing the program line-by-line according to the rules in the table above (in practice, this is usually done using a modified CPU emulator). The first line has a load from memory, so the model records the address `&array[sec]` as exposed. The second line has another load, so the model records the address `&array[pub]` as exposed. The contract traces for this program would be: : - `ctrace1 = ContractTrace(multiply, input1) = [0x10002, 0x10001]` : - `ctrace2 = ContractTrace(multiply, input2) = [0x10003, 0x10001]` : Finally, this model can be used to check for noninterference by comparing contract traces according to Definition 3. In this case, we have two inputs with matching public values and different secrets, and they produced different contract traces, `ctrace1 != ctrace2`. This constitutes a violation and means that the `multiply` function is not noninterferent with respect to MEM-SEQ. --- === "Example 8: Branch Prediction Contract, MEM-COND" : Now let's consider a more complex scenario, with a CPU that implements branch prediction—a common form of speculative execution. In this case, the CPU may incorrectly predict branch targets and execute instructions that are not part of the correct control flow. We can model this behavior in a contract by introducing an execution clause for conditional jumps that specifies the mispredicted target. To make the example useful, we will assume that the CPU also has a data cache, so the observation clause for loads and stores remains the same as in MEM-SEQ. 
We call this contract MEM-COND (memory leakage with conditional branch misprediction).

    |            | Observation Clause | Execution Clause  |
    |------------|--------------------|-------------------|
    | Load       | Expose Address     | -                 |
    | Store      | Expose Address     | -                 |
    | Cond. Jump | -                  | Mispredict Target |
    | Other      | -                  | -                 |

    : As a target program, we will use the following function:

    ```c
    int conditional_multiply(char *array, int pub, int sec) {
        int z = array[pub];   // MEM-COND exposes: &array[pub]
        if (z < 10) {         // MEM-COND mispredicts (assume z = 10)
            z *= array[sec];  // MEM-COND exposes: &array[sec]
        }
        return z;
    }
    ```

    : and a pair of inputs with the same public values but different secrets:

    : - `input1 = {array=0x10000, pub=1, sec=2}`
    : - `input2 = {array=0x10000, pub=1, sec=3}`

    : The first line of `conditional_multiply` has a load, so it exposes its address, `&array[pub]`. For the sake of this example, let's assume this load returns `10`, so the body of the `if` is not supposed to execute. However, according to MEM-COND, the branch first takes the wrong (mispredicted) target, so the model executes the third line anyway. This line is a load, so it exposes the address `&array[sec]`. The model then rolls back the mispredicted execution, the correct path skips the body of the `if`, and the function returns. The resulting traces are:

    : - `ctrace1 = ContractTrace(conditional_multiply, input1) = [0x10001, 0x10002]`
    : - `ctrace2 = ContractTrace(conditional_multiply, input2) = [0x10001, 0x10003]`

    : Again, the traces are different, so the program violates noninterference with respect to MEM-COND.

    : Notably, however, these two inputs would *not* violate noninterference with respect to MEM-SEQ, as the branch at line 2 would not be mispredicted, and the traces would be identical:

    : `ctrace_mem_seq1 = ctrace_mem_seq2 = [0x10001]`

## Building and Testing Speculation Contracts

Speculation contracts are typically built by hand, with the initial versions based on public knowledge of the CPU's microarchitecture and its side-channel vulnerabilities.
However, in the case of commercial CPUs, the exact details of the microarchitecture are often proprietary and not publicly disclosed. In these cases, the contract could (and often will) be incomplete. This is where testing speculation contracts becomes crucial: the initial "draft" of a contract is tested against the real hardware to check that it captures all side-channel leaks that the CPU exhibits. If the contract misses something, it is refined based on the test results, and the process is repeated until the contract is deemed safe to use.

But how do we test a speculation contract? A naive approach would be to directly compare the traces produced by the model with the traces collected from the real CPU for the same program and inputs. However, this approach is generally not feasible, because contract traces intentionally overestimate hardware traces, so mismatches are expected. Moreover, the model might expose information in a different form than the real hardware (e.g., the model might expose load/store addresses, while the hardware exposes cache set indexes), which often makes a direct comparison impossible.

Instead, a more precise approach is to compare *the information contained in the traces*. The idea is to check that the information exposed by the model is a superset of the information exposed by the real hardware. This is done by verifying that all inputs that produce identical contract traces for a given program also produce identical hardware traces. If this property holds for all possible programs and inputs (ignoring the complexity question for now), then any program that is noninterferent with respect to the speculation contract is guaranteed to be noninterferent with respect to the real hardware. At this point, the model is safe to use as a proxy for the real hardware when analyzing side-channel leaks.
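The check described above can be sketched for a single program and a fixed public input. The C sketch below reuses the setting of Example 6: the program performs one load from `array + sec`, the hardware trace is the cache line index under the direct-mapped cache assumed there, and two candidate contracts are compared: one exposing the full load address (as MEM-SEQ does) and a deliberately weak one exposing nothing. The function names and the exhaustive search over a small secret range are illustrative assumptions; Revizor itself samples programs and inputs randomly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t (*trace_fn)(uint64_t addr);

/* Hardware trace from Example 6: index of the cache line in a direct-mapped
 * cache with 64-byte lines and 0x1000 bytes of total capacity. */
uint64_t htrace_cache_line(uint64_t addr) {
    return (addr % 0x1000) / 0x40;
}

/* Contract exposing the full load address (MEM-SEQ style). */
uint64_t ctrace_full_address(uint64_t addr) {
    return addr;
}

/* Deliberately weak contract that exposes nothing about the load. */
uint64_t ctrace_nothing(uint64_t addr) {
    (void)addr;
    return 0;
}

/* Checks the compliance condition for the single load `array[sec]` with a
 * fixed public base address: every pair of secrets in [0, num_secrets) with
 * equal contract traces must also have equal hardware traces. Returns false
 * as soon as a counterexample (a contract violation) is found. */
bool complies(trace_fn ctrace, uint64_t base, uint64_t num_secrets) {
    for (uint64_t s1 = 0; s1 < num_secrets; s1++) {
        for (uint64_t s2 = s1 + 1; s2 < num_secrets; s2++) {
            uint64_t a1 = base + s1, a2 = base + s2;
            if (ctrace(a1) == ctrace(a2) &&
                htrace_cache_line(a1) != htrace_cache_line(a2)) {
                return false;
            }
        }
    }
    return true;
}
```

With `ctrace_full_address`, the check passes because distinct secrets always yield distinct contract traces, whereas `ctrace_nothing` fails on the first pair of secrets whose loads map to different cache lines (here, `sec = 0` and `sec = 0x40`). The full-address contract also distinguishes addresses that share a cache line, such as `0x10000` and `0x11000`; this is the intentional overestimation discussed above, and it does not break compliance.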
To formalize this idea, let's introduce a new function `HardwareTrace` that denotes the [hardware trace](../glossary.md#hardware-trace-htrace) collected from the real hardware. It takes an extra argument `Ctx` to capture the fact that real-world hardware traces depend on the microarchitectural state (e.g., on the state of branch predictors or caches).

=== "Definition 4: [Contract Compliance](../glossary.md#contract-compliance)"

    : A CPU complies with a speculation contract if, for all programs `P`, all input pairs `(Sec1, Pub), (Sec2, Pub)`, and all initial microarchitectural states `Ctx`, if `ContractTrace(P, Sec1, Pub) = ContractTrace(P, Sec2, Pub)`, then `HardwareTrace(P, Sec1, Pub, Ctx) = HardwareTrace(P, Sec2, Pub, Ctx)`.

Negating this definition gives the notion of a contract violation:

=== "Definition 5: [Contract Violation](../glossary.md#violation)"

    : A CPU violates a speculation contract if there exists a program `P`, a microarchitectural state `Ctx`, and two inputs `(Sec1, Pub), (Sec2, Pub)` such that `ContractTrace(P, Sec1, Pub) = ContractTrace(P, Sec2, Pub)` and
`HardwareTrace(P, Sec1, Pub, Ctx) != HardwareTrace(P, Sec2, Pub, Ctx)`.

We call the tuple `(P, Ctx, (Sec1, Pub), (Sec2, Pub))` a [**contract counterexample**](../glossary.md#violation-artifact-aka-contract-counterexample). A counterexample demonstrates that an adversary can learn more information from hardware traces than the contract specifies, and thus points to a potential microarchitectural leak that the contract does not account for. The goal of Revizor is to find such counterexamples.

## [Model-Based Relational Testing](../glossary.md#model-based-relational-testing-mrt) and Revizor

Revizor applies the principles above and provides a framework for building executable speculation contracts, together with a mechanism for testing real hardware (currently only CPUs) against these contracts by searching for contract counterexamples, as in Definition 5. However, certain issues appear when the theory from the previous section is applied in practice, and we had to address them in Revizor.

The first issue is the search space: testing all possible programs and inputs is infeasible. We mitigate this issue with a sampling-based approach, similar to fuzzing, where we approximate the complete search space via random sampling. Specifically, Revizor generates small (50-100 instructions long) programs, creates random inputs for them, collects both the contract and hardware traces for these inputs, and checks whether any of the traces constitute a contract counterexample. This process is called [*Model-based Relational Testing*](../glossary.md#model-based-relational-testing-mrt), and it is detailed further in the [Architecture Overview](../internals/architecture/overview.md). This approach works well in practice because any given hardware optimization can typically be triggered by many different programs, and we need to find only one instance to detect a violation.
Evidence of this is the [list of trophies](https://microsoft.github.io/side-channel-fuzzer/) that Revizor has already amassed. The second issue we encountered is nondeterminism. As mentioned earlier, hardware traces can be non-deterministic due to various factors like interrupts or other programs running on the machine. To handle this, we use statistical methods: Revizor collects hardware traces for each program-input pair multiple times and then compares their distributions. If the distributions of the traces are statistically similar, Revizor considers the traces to be equivalent. This approach helps us account for noise in the hardware traces while still making reliable decisions about contract compliance. For more details, see [Architecture Overview](../internals/architecture/overview.md). ## Conclusion In this primer, we have introduced the concepts of noninterference, side channels, and speculation contracts, which all underlie the design of Revizor: - The hardware fuzzer in Revizor uses speculation contracts and the concept of noninterference (1) to detect unexpected side channels and dangerous microarchitectural optimizations in commercial CPUs, and (2) to aid in building sound leakage models for those CPUs. - The software fuzzer in Revizor (*NOTE: currently under construction*) uses the leakage models produced by the hardware fuzzer, and applies the principles of noninterference testing to detect side-channel vulnerabilities in real-world software. With these two components, we aim to provide a comprehensive tool for discovering and mitigating side-channel vulnerabilities in software, one that can handle even the most obscure and complex microarchitectural optimizations in modern hardware. --- ## Sources and Further Reading - A. Sabelfeld and A. C. Myers. *Language-Based Information-Flow Security*. IEEE Journal on Selected Areas in Communications, 21(1), 2003. (Survey of information-flow security, implicit/explicit flows, covert channels, etc.) - J. A.
Goguen and J. Meseguer. *Security Policies and Security Models*. IEEE Symposium on Security and Privacy, 1982. (Origin of noninterference as a security policy formalism.) - J. B. Almeida et al. *Verifying Constant-Time Implementations*. USENIX Security Symposium, 2016. (Constant-time programming principles and the ct-verif tool for automated verification.) - M. Guarnieri, B. Köpf, J. Reineke, P. Vila. *Hardware-Software Contracts for Secure Speculation*. IEEE Symposium on Security and Privacy, 2021. (Original paper on speculation contracts.) - O. Oleksenko, C. Fetzer, B. Köpf, M. Silberstein. *Revizor: Testing Black-box CPUs against Speculation Contracts*. ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022. (Paper describing Model-based Relational Testing and Revizor.) ================================================ FILE: docs/intro/04-tutorials.md ================================================ # Starting with Tutorials Let's learn by example. This is a starting point for a tutorial series that will teach you how to use Revizor for testing CPUs, from the most basic cases, to detection of Spectre and Meltdown, to building custom campaigns for detecting new vulnerabilities, and up to building custom extensions for Revizor for the most advanced cases. !!! note "Prerequisites" Before proceeding with this tutorial, ensure that you have completed the installation steps outlined in the [Installation Guide](02-install.md). !!! question "Need Help?" - **Questions about the tutorial?** Check the [FAQ](../faq/general.md) or open a [GitHub discussion](https://github.com/microsoft/side-channel-fuzzer/discussions) - **Found a bug?** Report it in [GitHub issues](https://github.com/microsoft/side-channel-fuzzer/issues) ### Let's get started! Ready to dive in? 
* [Tutorial 1](./tutorials/01-first-fuzz.md) - run your first fuzzing campaign with Revizor * [Tutorial 2](./tutorials/02-first-vuln.md) - find your first microarchitectural vulnerability with Revizor * [Tutorial 3](./tutorials/03-faults.md) - learn how to test faults and exceptions with Revizor * [Tutorial 4](./tutorials/04-isolation.md) - explore how to test domain isolation boundaries * [Tutorial 5](./tutorials/05-extending.md) - extend Revizor with custom features ================================================ FILE: docs/intro/start-here.md ================================================ # Getting started New to Revizor? Or to side-channel testing in general? You've come to the right place: read this material to quickly get up and running. ## Introductory Materials * [Revizor at a Glance](01-overview.md): Understand what Revizor is, what problems it solves, and see a quick example of violation detection. * [Installation Guide](02-install.md): Get Revizor installed on your system and verify your setup. * [Core Concepts](03-primer.md): Learn about contracts, traces, speculation, and other fundamental concepts needed to use Revizor effectively. * [Tutorial Series](04-tutorials.md): Follow a series of hands-on tutorials that walk you through running your first tests and detecting violations, and ramp up all the way to root-cause analysis and the design of custom campaigns. * [Glossary](../glossary.md): A quick reference for key terms used throughout the documentation. ## Research Interested in the academic research behind Revizor? Check out the papers listed in the [Research Papers](../ref/papers.md) section. ## Need Help? [Ask a Question](../howto/ask-a-question.md) about Revizor if you need assistance or have any questions. ================================================ FILE: docs/intro/tutorials/01-first-fuzz.md ================================================ # Tutorial 1: Your First Fuzz This is the first part of the tutorial on the basic usage of Revizor.
### Overview In this first tutorial, we'll start with a baseline experiment to verify your Revizor installation and familiarize yourself with the basic workflow. This tutorial walks you through a simple fuzzing campaign that should find no violations. The goal of this first campaign is verification, not vulnerability detection. We'll deliberately choose an instruction set that should not trigger speculation on Intel or AMD CPUs—specifically, simple arithmetic operations without any branches or memory speculation sources. Since there are no conditional branches to mispredict and no page faults to speculate around, we expect the CPU to execute sequentially without any speculative side effects. This baseline is useful for two reasons. First, it confirms your installation is working correctly. If the fuzzer crashes or behaves unexpectedly, you'll know there's a setup issue rather than discovering problems later during more complex campaigns. Second, it establishes what "no violations" looks like, so you can recognize the difference when you do find a vulnerability in the next tutorial. ### Create your first configuration file Revizor's behavior is controlled by a YAML configuration file that specifies which instructions to test and what contract to check against. Create a file named `config.yaml` with the following content: ```yaml # tested instructions instruction_categories: - BASE-BINARY # prevent branch generation max_bb_per_function: 1 min_bb_per_function: 1 # contract contract_observation_clause: loads+stores+pc contract_execution_clause: - no_speculation ``` Let's understand each section. The `instruction_categories` field tells Revizor which instructions to include in generated test cases. We're using `BASE-BINARY`, which includes only arithmetic and logical operations like `add`, `sub`, `and`, `xor`, and `mov`. These operations are data-processing instructions that don't involve control flow or special memory access patterns. 
The `max_bb_per_function` and `min_bb_per_function` settings both set to 1 ensure that Revizor generates programs with exactly one basic block—meaning no branches at all. This simplifies our test cases to pure arithmetic sequences, eliminating any possibility of branch misprediction. The contract configuration section is set to use the simplest contract, CT-SEQ. This contract assumes nothing about the target CPU except the presence of CPU caches, making it a zero-knowledge baseline for detecting unknown vulnerabilities. With CT-SEQ, Revizor reports any information leaks beyond the most trivial non-speculative cache accesses. For a complete reference of all configuration options, see the [Configuration Reference](../../ref/config.md). ### Run the Campaign Let's run the fuzzer with your baseline configuration: ```bash rvzr fuzz -s base.json -c config.yaml -n 100 -i 50 -w . ``` This command tells Revizor to execute 100 test cases (`-n 100`) with 50 inputs per test case (`-i 50`), using the ISA specification from `base.json` and your configuration file. The `-w .` flag specifies the working directory for saving any violations. You'll see output similar to this: ``` INFO: [fuzzer] Starting at 14:32:18 100 (100%)| Stats: Cls:50/50,In:100,R:5,SF:0,OF:0,Fst:0,CN:0,CT:0,P1:0,CS:0,P2:0,V:0 ================================ No Violations detected =========================== ``` The campaign should complete in under a minute with no violations detected. This is exactly what we expect—our simple arithmetic instructions don't trigger speculation, so the hardware behaves according to the strict sequential contract. ### Interpret the statistics Let's examine the statistics line to understand what Revizor is reporting: ``` 100 (100%)| Stats: Cls:50/50,In:100,R:5,SF:0,OF:0,Fst:0,CN:0,CT:0,P1:0,CS:0,P2:0,V:0 ``` #### `100 (100%)` This part shows we completed all 100 test cases. This number was continuously updated while the fuzzer was running. 
#### `Cls:50/50` These numbers indicate the number of [equivalence classes](../../glossary.md#contract-equivalence-class) formed by the inputs. The first number is the number of effective classes (classes with more than one input), and the second is the total number of classes. If you don't understand what all of this means, that's OK. The only important factors are: - if both numbers are equal (or at least close), and they are also equal to the number of inputs that you've set via the `-i` command-line argument: everything is going well. - if the numbers are different, it means either a misconfiguration or an issue with the input generator. Ensure that the `input_per_class` config option is `> 1`. - if the numbers are equal, but they are both considerably lower than the number of inputs set via `-i`: you're using an overly simple fuzzing configuration, and you're unlikely to find anything with it. None of the issues above should happen if you're using the config file from this tutorial. If they do, double-check your installation. #### `R:5` This is an indirect indicator of the level of noise on the system. More concretely, it is the average sample size used by the executor. It is an adaptive number, which increases when the tool starts to encounter false positives caused by noise. This number should be relatively small. If you see it going above the 10-20 range, it is likely because something is polluting the measurements. Consider applying the suggestions [here](../02-install.md#7-optional-system-configuration). #### `SF:0,OF:0,Fst:0,CN:0,CT:0,P1:0,CS:0,P2:0` These numbers are the statistics on the effectiveness of various optimizations used by Revizor, such as speculation and observation filtering. You can ignore these numbers for now, as they are useful only when you're trying to optimize the performance of the fuzzer. If you're still curious, though, see the [Fuzzing Statistics Reference](../../ref/runtime-statistic.md).
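The rules of thumb above can be turned into a quick sanity check. This is a minimal sketch, not part of Revizor: `parse_stats` and `check_stats` are hypothetical helpers that assume the statistics line has exactly the format shown in this tutorial. The thresholds are illustrative guesses: `min_class_ratio` stands in for "considerably lower", and `max_sample_size` uses the upper end of the 10-20 range for `R`.

```python
def parse_stats(line: str) -> dict:
    """Parse a progress line such as '100 (100%)| Stats: Cls:50/50,In:100,R:5,...'.

    Fields of the form 'A/B' become (A, B) tuples; all others become ints.
    Assumes the format shown in this tutorial (hypothetical helper).
    """
    stats_part = line.split("Stats:", 1)[1].strip()
    stats = {}
    for field in stats_part.split(","):
        key, value = field.split(":", 1)
        if "/" in value:
            effective, total = value.split("/")
            stats[key] = (int(effective), int(total))
        else:
            stats[key] = int(value)
    return stats


def check_stats(stats: dict, num_inputs: int,
                min_class_ratio: float = 0.5, max_sample_size: int = 20) -> list:
    """Apply the tutorial's rules of thumb; returns a list of warning strings.

    num_inputs is the value passed via -i. The thresholds are assumptions.
    """
    warnings = []
    effective, total = stats["Cls"]
    if effective != total:
        # Effective and total class counts differ: misconfiguration or input generator issue.
        warnings.append("Cls numbers differ: check input_per_class and the input generator")
    elif total < min_class_ratio * num_inputs:
        # Too few classes compared to -i: the configuration is likely too simple.
        warnings.append("few equivalence classes: configuration may be too simple")
    if stats["R"] > max_sample_size:
        # A large adaptive sample size hints at measurement noise.
        warnings.append("high average sample size: the system may be noisy")
    return warnings
```

For the baseline campaign above, passing the final statistics line and `num_inputs=50` yields an empty list of warnings.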
### Understand what this means The successful completion of this baseline campaign tells you several things. Your Revizor installation is working correctly: the fuzzer can generate test cases, execute them on your hardware, collect traces, and analyze the results. Your system is stable enough for fuzzing: there's no excessive noise preventing measurement. The kernel module loaded correctly and can execute test programs in the sandbox environment. !!! success "Setup Verified" If you've successfully completed this baseline campaign with no violations, your Revizor installation is ready for real vulnerability detection. You can now proceed to Tutorial 2 with confidence. !!! warning "Troubleshooting Common Issues" If the fuzzer crashes or produces errors, check these common problems: **Module not loaded**: Ensure the kernel module is loaded with `lsmod | grep rvzr_executor`. If not, run `cd rvzr/executor_km && make && sudo make install`. **Permission denied**: Revizor needs root privileges to access performance counters. Check that your user account on the system has `sudo` privileges. **ISA specification missing**: If you see "base.json not found", run `rvzr download_spec` first to download the instruction set specification. ### What's Next? You've finished the first tutorial. Congrats! If you're ready to go further and start detecting violations, proceed to [Tutorial 2](./02-first-vuln.md). ================================================ FILE: docs/intro/tutorials/02-first-vuln.md ================================================ # Tutorial 2: Detecting Your First Vulnerability This tutorial is the first step into actual vulnerability detection. You'll learn how to set up a fuzzing campaign that tests conditional branches, and, most likely, it will end with the detection of Spectre V1. ### Testing Workflow Before we begin with actual testing, let's take a step back and consider what a typical testing workflow looks like.
The process of using Revizor normally consists of the following steps: 1. **Design the campaign** by selecting which instructions to test and choosing an appropriate contract that defines what behavior we consider a violation. 2. **Create a configuration file** that captures these decisions. 3. **Run the fuzzer** to generate and execute random test cases. 4. **Validate the violation** to ensure it's genuine and not a false positive. 5. **Minimize the test case** to remove unnecessary complexity, making it easier to understand. 6. **Analyze the minimized program** to identify the root cause of the vulnerability. In the following, we will go step-by-step through this workflow. ### Plan the campaign Let's imagine we have a new CPU and want to determine if conditional branches produce any information leakage on it. These instructions are infamous for causing Spectre V1, so it is always useful to start with them when testing a new CPU. The first step is planning our fuzzing campaign strategically. For effective testing, we'll focus on a minimal instruction subset rather than the entire ISA. Spectre V1 requires only two capabilities: conditional branches (to trigger misprediction) and memory accesses (to leak information through side channels). By limiting our instruction set to just arithmetic operations (many of which take memory operands, and therefore also provide the memory accesses) and conditional branches, we accomplish two goals. First, the fuzzer will find violations faster because there are fewer instruction combinations to explore. Second, when we do find a violation, it will be much easier to analyze because the test case will be simpler. !!! warning Note that this focused approach is *not* representative of a real fuzzing campaign. This tutorial is intentionally simplified to help with understanding. In a real campaign, you'll need to find a balance between a broad scope (which increases the chances of finding unknown vulnerabilities) and a focus on specific CPU features (which simplifies root-cause analysis).
For more guidance on campaign design, see [How to Design a Fuzzing Campaign](../../howto/design-campaign.md). We'll pair this minimal instruction set with the strictest possible contract—one that forbids any speculation whatsoever. This means Revizor will flag any speculative behavior as a violation. While this contract is more restrictive than what modern CPUs actually guarantee, it's perfect for our purposes. Since we're only testing conditional branches and simple arithmetic, any speculation we detect will almost certainly be Spectre V1. With this campaign plan, we are trying to answer a specific question: "Does this CPU leak information through conditional branches?" ### Create the configuration file Now that we've planned our campaign, let's translate it into a configuration file. Create a YAML file with the following content: ```yaml # tested instructions instruction_categories: - BASE-BINARY - BASE-COND_BR # contract contract_observation_clause: loads+stores+pc contract_execution_clause: - no_speculation # enable perf. optimizations enable_speculation_filter: true enable_observation_filter: true enable_fast_path_model: true ``` The `instruction_categories` section implements our decision to use a minimal instruction set. We're including `BASE-BINARY` for arithmetic operations like addition and comparison, and `BASE-COND_BR` for conditional branches like `jz` and `jne`. These two categories give the fuzzer everything it needs to express Spectre V1 patterns. The contract configuration consists of two clauses. The `contract_observation_clause` tells Revizor what microarchitectural side effects to track. We're using `loads+stores+pc`, which observes memory access addresses and the program counter—exactly what an attacker would monitor through cache timing attacks. The `contract_execution_clause` defines what execution behavior is allowed. By setting it to `no_speculation`, we're telling Revizor that any speculative execution is a violation. 
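Conceptually, the check that Revizor performs against this contract follows the definition of a contract violation from the [Core Concepts](../03-primer.md) primer: inputs whose contract traces are equal must also produce equal hardware traces. The sketch below illustrates that relational check under loudly simplified assumptions. Contract traces are treated as arbitrary hashable values, each input's hardware traces are a `collections.Counter` of trace bitmaps collected over repeated measurements, and two inputs are considered different when their most frequent (dominant) hardware traces differ. This dominant-trace comparison is a stand-in for Revizor's actual statistical test, and `find_violations` is a hypothetical name, not a Revizor API.

```python
from collections import Counter, defaultdict
from itertools import combinations


def find_violations(ctraces: dict, htraces: dict) -> list:
    """Return pairs of input IDs that violate the contract.

    ctraces: input ID -> contract trace (any hashable value)
    htraces: input ID -> Counter mapping hardware traces to how often they were observed
    """
    # Group inputs into contract equivalence classes (equal contract traces).
    classes = defaultdict(list)
    for input_id, ctrace in ctraces.items():
        classes[ctrace].append(input_id)

    violations = []
    for members in classes.values():
        # Within a class, every pair of inputs must look the same on hardware.
        for a, b in combinations(sorted(members), 2):
            dominant_a = htraces[a].most_common(1)[0][0]
            dominant_b = htraces[b].most_common(1)[0][0]
            if dominant_a != dominant_b:
                violations.append((a, b))
    return violations
```

Real hardware traces are noisy, which is why Revizor compares full distributions rather than a single dominant trace; the simplification here only keeps the relational structure of the check visible.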
The performance optimization flags at the bottom significantly speed up fuzzing without affecting correctness. The `enable_speculation_filter` skips test cases that don't trigger speculation at all. The `enable_observation_filter` skips test cases that leave no observable traces. The `enable_fast_path_model` allows Revizor to reuse contract traces across similar inputs, reducing the model execution overhead. For a complete reference of all configuration options, see the [Configuration Reference](../../ref/config.md). ### Run the fuzzer Now we're ready to start fuzzing. Run Revizor with the following command: ``` ./revizor.py fuzz -s base.json -c config.yaml -n 1000 -i 10 -w . ``` This command tells Revizor to run 1000 test cases (`-n 1000`), with 10 inputs per test case (`-i 10`), using the ISA specification from `base.json` (`-s`) and our configuration file (`-c`). The `-w .` flag tells Revizor to save any violations it finds to the current directory. As the fuzzer runs, you'll see a continuously updating progress line: ``` 50 ( 5%)| Stats: Cls:10/10,In:20,R:7,SF:38,OF:6,Fst:6,CN:0,CT:0,P1:0,CS:0,P2:0,V:0 ``` ### View the detected violation After a minute or so, you should see a violation. It will be reported in a format similar to this: ``` ================================ Violations detected ========================== Violation Details: ----------------------------------------------------------------------------------- HTrace | ID:4 | ID:14 | ----------------------------------------------------------------------------------- ^......^...^........^.................^...........^............. | 626 | 0 | ^......^...^........^........................................... | 1 | 18 | ^^.....^...^........^....^...................................... | 0 | 609 | ``` Excellent! We've successfully detected a contract violation. Let's understand what this violation report is telling us. The report shows us the violation details in a table format. 
The header row displays the input IDs that triggered the violation, in this case inputs 4 and 14: `| ID:4 | ID:14 |` These are two inputs from our test case that the contract predicted would behave identically, but the hardware traces show they behaved differently. The three rows below show the different hardware traces that were observed: ``` ^......^...^........^.................^...........^............. ^......^...^........^........................................... ^^.....^...^........^....^...................................... ``` Each row represents a distinct cache access pattern, visualized as a bitmap where `^` marks an accessed cache set and `.` marks an untouched one. We're using Prime+Probe cache side channel measurements (default), so each position in the bitmap corresponds to one of the 64 cache sets in the L1D cache. (A cache set is a group of cache slots shared by all memory lines whose addresses map to the same set index: when the CPU accesses memory at a particular address, the data goes into the specific cache set determined by bits of that address.) For example, the first trace reads like this: ``` Cache Set 0 accessed | Cache Set 11 accessed | | Cache set 38 accessed | | | ^......^...^........^.................^...........^............. | | | | | Cache Set 50 accessed | Cache Set 20 accessed Cache Set 7 accessed ``` Finally, the numbers in the columns tell us how often each trace appeared for each input: ``` ... | 626 | 0 | ... | 1 | 18 | ... | 0 | 609 | ``` Looking at the first hardware trace, we see that it appeared 626 times for input 4 but never for input 14. The third trace shows the opposite pattern: it appeared 0 times for input 4 but 609 times for input 14. This clear separation in the distributions strongly suggests that this is a genuine violation rather than random noise. What we're seeing is a data-dependent cache access pattern. The test case accessed different cache lines depending on the input data, creating an observable side channel.
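The bitmap notation can be decoded mechanically. The sketch below uses hypothetical helpers (not a Revizor API) and assumes the format described above: one character per L1D cache set, with `^` for an accessed set and `.` for an untouched one. For the first trace of the report, `accessed_sets` returns `[0, 7, 11, 20, 38, 50]`, which matches the annotated example. `diff_sets` lists the sets that distinguish two traces.

```python
from __future__ import annotations


def accessed_sets(htrace: str) -> list[int]:
    """Return the indices of cache sets marked as accessed ('^') in a trace bitmap."""
    return [index for index, mark in enumerate(htrace) if mark == "^"]


def diff_sets(htrace_a: str, htrace_b: str) -> tuple[list[int], list[int]]:
    """Return (sets accessed only in trace a, sets accessed only in trace b)."""
    sets_a = set(accessed_sets(htrace_a))
    sets_b = set(accessed_sets(htrace_b))
    return sorted(sets_a - sets_b), sorted(sets_b - sets_a)
```

Applied to the first and third traces of the report, `diff_sets` returns `([38, 50], [1, 25])`. These are the cache sets whose accesses depend on the input.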
We don't know yet what caused this channel, but we can already tell that it's likely to be caused by speculation; non-speculative cache accesses are permitted by our reference contract, so they wouldn't be reported as violations. For more details on interpreting violation reports, see [How to Interpret Violation Results](../../howto/interpret-results.md). ### Violation Artifact The artifact for this violation is stored in a directory named `violation-`: ```bash $ ls -l violation-251203-103338 input_0000.bin input_0004.bin input_0008.bin input_0012.bin input_0016.bin minimize.yaml reproduce.yaml input_0001.bin input_0005.bin input_0009.bin input_0013.bin input_0017.bin org-config.yaml input_0002.bin input_0006.bin input_0010.bin input_0014.bin input_0018.bin program.asm input_0003.bin input_0007.bin input_0011.bin input_0015.bin input_0019.bin report.txt ``` The `program.asm` file holds the test case program that triggered the violation. The `input_*.bin` files contain the input sequence that exposed the leak. The `report.txt` file provides additional details including hardware and contract traces. The configuration files include `org-config.yaml` (the original configuration), `reproduce.yaml` (for reproducing the violation), and `minimize.yaml` (for test case minimization). ### Validate the violation Let's verify this violation is genuine and reproducible. 
First, we'll move the violation artifacts to a simpler path: ```bash mv violation-251203-103338 ./violation ``` Now we'll reproduce the violation using the saved artifacts: ```bash ./revizor.py reproduce -s base.json -c ./violation/reproduce.yaml -t ./violation/program.asm -i ./violation/input*.bin ``` If the violation is genuine, we should see Revizor report it again: ``` ================================ Violations detected ========================== Violation Details: ----------------------------------------------------------------------------------- HTrace | ID:4 | ID:14 | ----------------------------------------------------------------------------------- ^......^...^........^.................^...........^............. | 626 | 0 | ^......^...^........^........................................... | 1 | 20 | ^^.....^...^........^....^...................................... | 0 | 607 | ``` Perfect! The hardware traces and their frequencies are roughly the same as before, confirming that this is a stable, reproducible violation. !!! tip "Dealing with False Positives" In most cases, violations are genuine. However, if you're on a high-noise system, you might occasionally see non-reproducible violations. If this happens, adjust the noise tolerance by increasing `analyser_stat_threshold` or `executor_sample_sizes` in your configuration file (see the [Configuration Reference](../../ref/config.md) for details), then rerun the fuzzer. Also, consider mitigating the noise, for example by disabling hyperthreading or by turning prefetchers off. ### Minimize the test case Now that we've confirmed the violation is real, let's simplify it for easier analysis. The minimizer will systematically remove unnecessary instructions while keeping the violation reproducible. Use the following command. We won't go into its details now, as they are not essential for this tutorial. If you're curious, check our [How to Minimize](../../howto/minimize.md) guide.
```bash ./revizor.py minimize -s base.json \ -c ./violation/minimize.yaml -t ./violation/program.asm \ -o ./violation/min.asm -i 10 --num-attempts 3 \ --enable-instruction-pass 1 \ --enable-simplification-pass 1 \ --enable-nop-pass 1 \ --enable-constant-pass 1 \ --enable-mask-pass 1 \ --enable-label-pass 1 ``` We'll see the minimization progress as it works through multiple passes: ``` [PASS 1] Reproducing the violation > Violation reproduced. Proceeding with minimization > Violating input IDs: [4, 14] [INFO] Minimization attempt 1/3 [PASS 2] Instruction Removal Pass ........---...-- [PASS 3] Instruction Simplification Pass --..- [PASS 4] NOP Replacement Pass (and so on...) ``` This process typically takes 5-10 minutes. Each `.` indicates a failed removal attempt (the violation disappeared), while each `-` shows a successful simplification (the violation persisted with fewer instructions). After it finishes, we'll find the minimized program in `./violation/min.asm`. ``` asm .intel_syntax noprefix .section .data.main .function_0: .macro.measurement_start: nop qword ptr [rax + 0xff] add al, -118 # instrumentation and rdi, 0b1111111111100 # instrumentation adc al, byte ptr [r14 + rdi] mov rax, -1332388169 imul eax, eax, -75 and rcx, 0b1111111111000 # instrumentation add dword ptr [r14 + rcx], eax and rax, 0b1111111111000 # instrumentation imul qword ptr [r14 + rax] and rcx, 0b1111111000000 # instrumentation lock inc qword ptr [r14 + rcx] and rdi, 0b1111111111000 # instrumentation add byte ptr [r14 + rdi], al sub dl, al jp .bb_0.1 jmp .exit_0 .bb_0.1: and rbx, 0b1111111111000 # instrumentation cmp dword ptr [r14 + rbx], eax and rdi, 0b1111111111000 # instrumentation cmp qword ptr [r14 + rdi], rbx and rbx, 0b1111111000000 # instrumentation lock sub word ptr [r14 + rbx], dx and rbx, 0b1111111111000 # instrumentation dec word ptr [r14 + rbx] and rsi, 0b1111111111000 # instrumentation neg qword ptr [r14 + rsi] and rbx, 0b1111111111000 # instrumentation adc ax, word ptr 
[r14 + rbx] .exit_0: .macro.measurement_end: nop qword ptr [rax + 0xff] .section .data.main .test_case_exit:nop ``` Let's verify that the minimized program still triggers the violation: ``` bash $ ./revizor.py reproduce -s base.json -c ./violation/reproduce.yaml -t ./violation/min.asm -i ./violation/input*.bin INFO: [prog_gen] Setting program_generator_seed to random value: 112509 INFO: [fuzzer] Starting at 11:04:52 > Entering slow path...> Priming 1 > Increasing sample size... to 50> Increasing sample size... to 100> Increasing sample size... to 500> Priming 1 ================================ Violations detected ========================== Violation Details: ----------------------------------------------------------------------------------- HTrace | ID:5 | ID:15 | ----------------------------------------------------------------------------------- ^^................^..^.......................................... | 404 | 15 | ^^.........^....^.^..^.......................................... | 223 | 0 | ^^................^..^.......^....................^............. | 0 | 612 | ``` Excellent! The violation still reproduces with the minimized program. We've successfully reduced the test case while preserving the vulnerability. The program is still fairly complex, though. Let's run input minimization to identify exactly which values are being leaked. ### Analyze the leak through input minimization ```bash $ ./revizor.py minimize -s base.json -c ./violation/minimize.yaml -t ./violation/min.asm -o ./violation/min.asm -i 25 --input-outdir ./violation/min-inputs \ --enable-input-diff-pass 1 \ --enable-input-seq-pass 1 \ --enable-instruction-pass false ``` Among other information, the minimizer prints the leaked values: ``` > Minimizing the difference between inputs 2 and 3 Address +0x0 +0x40 +0x80 +0xc0 +0x100 +0x140 +0x180 +0x1c0 0x00000000 ........ ....=... ........ ........ ........ ........ ........ ........ 0x00000200 ........ ........ ........ ........ ........
........ ........ ........ 0x00000400 ........ ........ ........ ........ ........ ........ ........ ........ 0x00000600 ........ ........ ........ ........ ........ ........ ........ ........ 0x00000800 ........ ........ ........ ........ ........ ........ ........ ........ 0x00000a00 ........ ........ ........ ........ ........ ........ ........ ........ 0x00000c00 ........ ........ ........ ........ ........ ........ ........ ........ 0x00000e00 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001000 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001200 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001400 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001600 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001800 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001a00 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001c00 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001e00 ........ ........ ........ ........ ........ ........ ........ ........ 0x00002000 ....^... 0x00002040 ........ ........ ........ ........ > Result: Leaked 1 bytes > Addresses: ['0x2020'] ``` We learn two things from this output: - Most of the input has been successfully zeroed out (`.`), which means it is likely irrelevant to the leak. - The only remaining non-zero value is at address `0x2020` (marked with `^`). This is likely the leaked value. To understand how this address maps to the test case, we need to look at the layout of the input, described [here](../../ref/artifact-file-formats.md). We can see that the leak is within the GPR region of actor 0 (the only actor in this test case). Specifically, the offset `0x2020 - 0x2000 = 0x20` is the slot used to initialize RSI: GPRs are stored in 8-byte slots ordered as `rax`, `rbx`, `rcx`, `rdx`, `rsi`, `rdi`, `flags`, `rsp`, so offset `0x20` is the fifth slot.
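The offset arithmetic above can be scripted, together with the address masking that the instrumentation applies before each memory access (for RSI, the `and rsi, 0b1111111111000` instruction in the minimized program). This is an illustration under stated assumptions, not a Revizor API, and all helper names are hypothetical. It assumes the layout described in this section: actor 0's GPR region starts at `0x2000`, and each register occupies one 8-byte little-endian slot in the listed order. It also assumes an L1D cache with 64 sets of 64-byte lines and a page-aligned sandbox base in `r14`.

```python
import struct

# Assumed input layout (see the discussion above): actor 0's GPRs start at 0x2000,
# one 8-byte little-endian slot per register, in this order.
GPR_REGION_START = 0x2000
GPR_SLOT_SIZE = 8
GPR_ORDER = ["rax", "rbx", "rcx", "rdx", "rsi", "rdi", "flags", "rsp"]


def register_at(address: int) -> str:
    """Map an input address inside the GPR region to the register it initializes."""
    offset = address - GPR_REGION_START
    if not 0 <= offset < GPR_SLOT_SIZE * len(GPR_ORDER):
        raise ValueError(f"address {address:#x} is outside the GPR region")
    return GPR_ORDER[offset // GPR_SLOT_SIZE]


def read_register(input_bytes: bytes, register: str) -> int:
    """Read a register's initial value from the raw contents of an input file."""
    offset = GPR_REGION_START + GPR_ORDER.index(register) * GPR_SLOT_SIZE
    (value,) = struct.unpack_from("<Q", input_bytes, offset)  # little-endian u64
    return value


def cache_set_of_access(value: int, mask: int,
                        line_size: int = 64, num_sets: int = 64) -> int:
    """L1D cache set touched by [r14 + (value & mask)], assuming r14 is page-aligned."""
    offset = value & mask  # emulate the `and reg, mask` instrumentation
    return (offset // line_size) % num_sets
```

For example, `register_at(0x2020)` returns `'rsi'`. Reading an input file with `open(path, "rb").read()` and passing its contents to `read_register(data, "rsi")` followed by `cache_set_of_access(value, 0b1111111111000)` reproduces the hand calculation carried out below for the two violating inputs.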
Now we just need to find how the test case uses RSI (possibly speculatively), and we will have a good idea of the root-cause of the leak. Let's inspect the minimized program in `./violation/min.asm`: ``` asm linenums="1" .intel_syntax noprefix .section .data.main .function_0: .macro.measurement_start: nop qword ptr [rax + 0xff] add al, -118 and rdi, 0b1111111111100 adc al, byte ptr [r14 + rdi] # mem access: [5] 0x1578 cl 21:56 | [15] 0x1578 cl 21:56 mov rax, -1332388169 imul eax, eax, -75 and rcx, 0b1111111111000 add dword ptr [r14 + rcx], eax # mem access: [5] 0x2498-0x2498 cl 18:24 | [15] 0x2498-0x2498 cl 18:24 and rax, 0b1111111111000 imul qword ptr [r14 + rax] # mem access: [5] 0x1060 cl 1:32 | [15] 0x1060 cl 1:32 and rcx, 0b1111111000000 lock inc qword ptr [r14 + rcx] # mem access: [5] 0x2480-0x2480 cl 18:0 | [15] 0x2480-0x2480 cl 18:0 and rdi, 0b1111111111000 add byte ptr [r14 + rdi], al # mem access: [5] 0x1578-0x1578 cl 21:56 | [15] 0x1578-0x1578 cl 21:56 sub dl, al jp .bb_0.1 jmp .exit_0 .bb_0.1: and rbx, 0b1111111111000 cmp dword ptr [r14 + rbx], eax and rdi, 0b1111111111000 cmp qword ptr [r14 + rdi], rbx and rbx, 0b1111111000000 lock sub word ptr [r14 + rbx], dx and rbx, 0b1111111111000 dec word ptr [r14 + rbx] and rsi, 0b1111111111000 neg qword ptr [r14 + rsi] # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< HERE: RSI is used here and rbx, 0b1111111111000 adc ax, word ptr [r14 + rbx] .exit_0: .macro.measurement_end: nop qword ptr [rax + 0xff] .section .data.main .test_case_exit:nop ``` We can see that RSI is used in the instruction at line 36: ``` asm neg qword ptr [r14 + rsi] ``` That already gives most of the information we need. We can see a clear Spectre V1 pattern here: 1. There is a conditional branch at line 24 (`jp .bb_0.1`) 2. And a load of a previously-unused value on a mispredicted path (line 36) To verify that, let's inspect the actual value of RSI in the violating inputs (inputs 2 and 3 according to the minimizer output above). 
We can use `hexdump` for that: ``` bash $ hexdump -C ./violation/min-inputs/min_input_0002.bin | grep 2020 00002020 93 22 00 00 93 22 00 00 00 00 00 00 00 00 00 00 |."..."..........| $ hexdump -C ./violation/min-inputs/min_input_0003.bin | grep 2020 00002020 40 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 |@...@...........| ``` So the values of RSI were (the bytes are stored in little-endian order): - Input 2: `rsi=0x0000229300002293` - Input 3: `rsi=0x0000004000000040` These values were masked by the instruction at line 35: ``` asm and rsi, 0b1111111111000 # instrumentation ``` This means that the values of RSI used by the memory access at line 36 were: - Input 2: `0x0000229300002293 & 0b1111111111000 = 0x290` - Input 3: `0x0000004000000040 & 0b1111111111000 = 0x040` All memory accesses within the test case are relative to `r14`, which is page-aligned and points to the base of the sandbox memory. Therefore, we can calculate the IDs of the cache lines accessed by the instruction at line 36 as follows: - Input 2: cache line ID = `0x290 // 0x40 = 0xa = 10` - Input 3: cache line ID = `0x040 // 0x40 = 0x1 = 1` So, if our hypothesis is correct, the hardware traces of the violation should show cache lines 10 and 1 being accessed when executing inputs 2 and 3, respectively. Let's verify this by running Revizor in reproduce mode: ``` $ ./revizor.py reproduce -s base.json -c ./violation/reproduce.yaml -t ./violation/min.asm -i ./violation/min-inputs/min_input_*.bin ----------------------------------------------------------------------------------- HTrace | ID:2 | ID:3 | ----------------------------------------------------------------------------------- ^^........^..................................................... | 626 | 0 | ^^.............................................................. | 1 | 627 | ``` The first hardware trace (dominant for input 2) is: ``` ^^........^.....................................................
|| | || Cache set 10 accessed |Cache set 1 Cache set 0 accessed ``` The second hardware trace (dominant for input 3) is: ``` ^^.............................................................. || | Cache set 1 accessed Cache set 0 accessed ``` Indeed, our hypothesis is confirmed! The instruction at line 36 accessed different cache lines depending on the value of RSI, and it was executed on the mispredicted path of the conditional branch at line 24. This tells us that the root cause of the leak was a misprediction of a conditional branch, which led to a speculative leak of a value (RSI) through a data access. ### Summary Congratulations! We've successfully detected and analyzed a Spectre V1 vulnerability from start to finish. !!! success "What We've Learned" In this section, we've walked through the complete workflow for detecting speculative execution vulnerabilities: - **Strategic planning**: Choosing a minimal instruction set and appropriate contract focused our search - **Violation detection**: Revizor found the vulnerability automatically in under two minutes - **Validation**: Reproduction confirmed the violation was genuine and stable - **Minimization**: We reduced a complex test case to its essential components - **Root-cause analysis**: By examining register values and cache access patterns, we identified the exact mechanism of the leak This same workflow applies to discovering and analyzing any speculative execution vulnerability. ### What's Next? Proceed to [Tutorial 3](./03-faults.md) to see how the same principles can be applied to detect more complex vulnerabilities based on CPU exceptions and faults. ================================================ FILE: docs/intro/tutorials/03-faults.md ================================================ # Tutorial 3: Testing faults with Revizor Having detected Spectre V1, let's now apply the same methodology to find a different vulnerability class.
Meltdown-style vulnerabilities exploit speculative execution around exception handling rather than branch misprediction. !!! important This tutorial assumes familiarity with sandboxed execution and the memory layout of the sandbox. If you haven't read about them yet, please refer to the [Sandbox Reference](../../ref/sandbox.md) and the [Actors and Isolation Topic Guide](../../topics/actors.md) before proceeding. ### Plan the campaign The key difference in this campaign is the speculation source. Instead of conditional branches, we'll test page faults. Meltdown and related vulnerabilities occur when a CPU speculatively executes instructions that follow a faulting memory access, potentially leaking data from inaccessible memory regions. From a practical standpoint, this means we need to configure the [sandbox](../../ref/sandbox.md) so that the test case can trigger page faults. Namely, we will make one of the test case's memory pages non-present, so that any access to it faults. ### Create the configuration file Our configuration for this campaign makes three important changes from the Spectre V1 setup. First, we remove `BASE-COND_BR` from the instruction categories since we already know conditional branches cause Spectre V1 violations. This focuses our testing on other speculation sources. Second, we add an `actors` section with `data_properties` to configure the sandbox memory layout. Revizor's sandbox allocates each actor two 4KB memory regions: a main area with normal read-write permissions and a faulty area where we can configure special permissions. By setting `present: false` in the data properties, we mark the faulty area as non-present in the page tables. When the test case attempts to access this region, the CPU will raise a page fault, giving us the exception-based speculation source we want to test. Third, we change the contract execution clause to `delayed-exception-handling`.
Modern CPUs implement out-of-order execution, so data-independent instructions after a fault may execute before the exception is recognized. This is expected behavior and would cause trivial violations under the strict `no_speculation` contract. The `delayed-exception-handling` clause accommodates this expected speculation, allowing Revizor to focus on more interesting leaks. For more details on contract selection, see [How to Choose a Contract](../../howto/choose-contract.md). ```yaml # contract contract_observation_clause: loads+stores+pc contract_execution_clause: - delayed-exception-handling # tested instructions instruction_categories: - BASE-BINARY # - BASE-COND_BR actors: - main: - data_properties: - present: false enable_speculation_filter: true enable_observation_filter: true enable_fast_path_model: true ``` ### Run the fuzzer With the configuration ready, let's run the fuzzer. ``` $ ./revizor.py fuzz -s base.json -c dbg/tut/2.yaml -n 1000 -i 20 -w . INFO: [fuzzer] Starting at 12:05:26 66 ( 7%)| Stats: Cls:19/19,In:40,R:19,SF:0,OF:0,Fst:6,CN:60,CT:0,P1:0,CS:0,P2:0,V:0 ``` Notice in the statistics that `SF:0,OF:0`—unlike the Spectre V1 campaign, none of our test cases are filtered by the speculation or observation filters since every test case with a page fault exhibits speculation. Eventually (after a few minutes), Revizor detects a violation: ``` ================================ Violations detected ========================== Violation Details: ----------------------------------------------------------------------------------- HTrace | ID:3 | ID:23 | ----------------------------------------------------------------------------------- ^^.^.......^.........^..^.........................^............^ | 627 | 0 | ^^.^...^...^............^.........................^............^ | 0 | 627 | ``` The output is similar to what we saw in the Spectre V1 campaign, so we won't go into the details of reading the violation report again. 
The key takeaway is that we've successfully detected a contract violation, and the hardware traces show different cache access patterns for the two inputs. ### Validate the violation As before, we validate the violation by reproducing it: ``` $ ./revizor.py reproduce -s base.json -c ./violation/reproduce.yaml -t ./violation/program.asm -i ./violation/input*.bin ``` The output should be similar to the original: ``` ================================ Violations detected ========================== Violation Details: ----------------------------------------------------------------------------------- HTrace | ID:3 | ID:23 | ----------------------------------------------------------------------------------- ^^.^.......^.........^..^.........................^............^ | 627 | 0 | ^^.^...^...^............^.........................^............^ | 0 | 627 | ``` Great! The violation reproduces successfully, confirming it's genuine. ### Minimize the test case Now we minimize the test case to make it easier to analyze: ``` ./revizor.py minimize -s base.json -c ./violation/minimize.yaml -t ./violation/program.asm -o ./violation/min.asm -i 10 --num-attempts 3 \ --enable-instruction-pass 1 \ --enable-simplification-pass 1 \ --enable-nop-pass 1 \ --enable-constant-pass 1 \ --enable-mask-pass 1 \ --enable-label-pass 1 ``` After the minimization completes, verify that the minimized program still reproduces the violation: ``` ./revizor.py reproduce -s base.json -c ./violation/reproduce.yaml -t ./violation/min.asm -i ./violation/input*.bin INFO: [prog_gen] Setting program_generator_seed to random value: 578824 INFO: [fuzzer] Starting at 12:14:08 > Entering slow path...> Priming 6 > Increasing sample size... to 50> Increasing sample size... to 100> Increasing sample size... 
to 500> Priming 6 ================================ Violations detected ========================== Violation Details: ----------------------------------------------------------------------------------- HTrace | ID:11 | ID:31 | ----------------------------------------------------------------------------------- ^^.^.......^...^........^.........................^...^........^ | 627 | 0 | ^^.^.......^...^........^.........................^............^ | 0 | 627 | ``` ### Identify the leaked value Next, we minimize the inputs to identify which specific values are being leaked: ``` ./revizor.py minimize -s base.json -c ./violation/minimize.yaml -t ./violation/min.asm -o ./violation/min.asm -i 10 --input-outdir ./violation/min-inputs \ --enable-input-diff-pass 1 \ --enable-input-seq-pass 1 \ --enable-comment-pass 1 \ --enable-instruction-pass false (skipping output for brevity) > Minimizing the difference between inputs 0 and 1 Address +0x0 +0x40 +0x80 +0xc0 +0x100 +0x140 +0x180 +0x1c0 0x00000000 ........ ........ ........ ........ ........ ........ ........ ........ 0x00000200 ........ ........ ........ ........ ........ ........ ........ ........ 0x00000400 ........ ........ ........ ........ ........ ........ ........ ........ 0x00000600 ........ ........ ........ ........ ........ ........ ........ ........ 0x00000800 ........ ........ ........ ........ ........ ........ ........ ........ 0x00000a00 ........ ........ ........ ........ ........ ........ ........ ........ 0x00000c00 ........ ........ ........ ........ ........ ........ ........ ........ 0x00000e00 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001000 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001200 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001400 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001600 ........ ........ ........ ........ ........ ........ ........ ........ 
0x00001800 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001a00 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001c00 ........ ........ ........ ........ ........ ........ ........ ........ 0x00001e00 ........ ........ ........ ........ ........ ........ ........ ........ 0x00002000 ..=.=^.. 0x00002040 ........ ........ ........ ........ > Result: Leaked 1 bytes > Addresses: ['0x2028'] > Saving new inputs in '/home/t-oleksenkoo/revizor/violation/min-inputs' > Violating input IDs: [5, 15] ``` Key takeaways: - The leaked value originates from address `0x2028` in the input, which corresponds to offset `0x28` in the GPR initialization region of the sandbox memory, used to initialize the `RDI` register. - Two other values in the input were not zeroed out, which indicates they are somehow relevant to triggering the violation. Namely, those are offsets `0x10` and `0x20`, which correspond to `RCX` and `RSI`. ### Perform root-cause analysis With the minimized program and inputs, we can now investigate the root cause. 
The minimized program is as follows: ``` asm linenums="1" .intel_syntax noprefix .section .data.main .function_0: .macro.measurement_start: nop qword ptr [rax + 0xff] and rsi, 0b1111111111000 # instrumentation add rdi, qword ptr [r14 + rsi] add cl, dl and rcx, 0b1111111111000 # instrumentation add qword ptr [r14 + rcx], rbx and rbx, 0b1111111111000 # instrumentation add dword ptr [r14 + rbx], ecx and rax, 0b1111111111000 # instrumentation cmp dword ptr [r14 + rax], ecx and rdi, 0b1111111111000 # instrumentation or byte ptr [r14 + rdi], 1 # instrumentation # <<<<<<<<<<<<<<< HERE: RDI is used here mov ax, 1 # instrumentation div byte ptr [r14 + rdi] # <<<<<<<<<<<<<<< HERE: RDI is used here and rsi, 0b1111111111000 # instrumentation sub byte ptr [r14 + rsi], bl and rcx, 0b1111111111000 # instrumentation sub al, byte ptr [r14 + rcx] and rcx, 0b1111111111000 # instrumentation mul qword ptr [r14 + rcx] and rax, 0b1111111000000 # instrumentation lock sub word ptr [r14 + rax], -128 .macro.measurement_end: nop qword ptr [rax + 0xff] .section .data.main .test_case_exit:nop ``` RDI is used as a memory address in two places: 1. Line 15: `or byte ptr [r14 + rdi], 1` (a read-modify-write) 2. Line 17: `div byte ptr [r14 + rdi]` (a read) This is a clear data-dependent pattern, which explains why RDI is being leaked. Normally, however, such patterns should not be reported as violations of CT-DEH (our selected contract): its observation clause (`loads+stores+pc`) exposes the addresses of loads and stores, so leaking RDI through these addresses is permitted as long as the model also executes these instructions. So if the violation was reported, it means these instructions were not executed by the model. Let's investigate why. We will inspect how the model executes this program.
To this end, we will add a debug flag to the config file: ```yaml logging_modes: - dbg_model ``` Then, we will reproduce the violation again, now with a verbose log of test case execution on the model: ``` ./revizor.py reproduce -s base.json -c ./violation/reproduce.yaml -t ./violation/min.asm -i ./violation/min-inputs/min_input_0000.bin ##### Input 0 ##### 0x0 : macro .measurement_start, .noarg rax=0x0000000000000000 rbx=0x0000000000000000 rcx=0x0000d04a0000d04a rdx=0x0000000000000000 rsi=0x0000d0510000d051 rdi=0x000056b8000056b8 flags=0b000000000010 xmm0=0x00000000000000000000000000000000 xmm1=0x00000000000000000000000000000000 xmm2=0x00000000000000000000000000000000 xmm3=0x00000000000000000000000000000000 xmm4=0x00000000000000000000000000000000 xmm5=0x00000000000000000000000000000000 xmm6=0x00000000000000000000000000000000 xmm7=0x00000000000000000000000000000000 0x8 : and rsi, 0b1111111111000 rax=0x0000000000000000 rbx=0x0000000000000000 rcx=0x0000d04a0000d04a rdx=0x0000000000000000 rsi=0x0000d0510000d051 rdi=0x000056b8000056b8 flags=0b000000000010 xmm0=0x00000000000000000000000000000000 xmm1=0x00000000000000000000000000000000 xmm2=0x00000000000000000000000000000000 xmm3=0x00000000000000000000000000000000 xmm4=0x00000000000000000000000000000000 xmm5=0x00000000000000000000000000000000 xmm6=0x00000000000000000000000000000000 xmm7=0x00000000000000000000000000000000 0xf : add rdi, [r14 +rsi] rax=0x0000000000000000 rbx=0x0000000000000000 rcx=0x0000d04a0000d04a rdx=0x0000000000000000 rsi=0x0000000000001050 rdi=0x000056b8000056b8 flags=0b000000000110 xmm0=0x00000000000000000000000000000000 xmm1=0x00000000000000000000000000000000 xmm2=0x00000000000000000000000000000000 xmm3=0x00000000000000000000000000000000 xmm4=0x00000000000000000000000000000000 xmm5=0x00000000000000000000000000000000 xmm6=0x00000000000000000000000000000000 xmm7=0x00000000000000000000000000000000 > load from +0x2050 value 0x0 EXCEPTION #13: Read from non-readable memory (UC_ERR_READ_PROT) 0x13: 
[transient, nesting = 1] add cl, dl rax=0x0000000000000000 rbx=0x0000000000000000 rcx=0x0000d04a0000d04a rdx=0x0000000000000000 rsi=0x0000000000001050 rdi=0x000056b8000056b8 flags=0b000000000110 xmm0=0x00000000000000000000000000000000 xmm1=0x00000000000000000000000000000000 xmm2=0x00000000000000000000000000000000 xmm3=0x00000000000000000000000000000000 xmm4=0x00000000000000000000000000000000 xmm5=0x00000000000000000000000000000000 xmm6=0x00000000000000000000000000000000 xmm7=0x00000000000000000000000000000000 0x15: [transient, nesting = 1] and rcx, 0b1111111111000 rax=0x0000000000000000 rbx=0x0000000000000000 rcx=0x0000d04a0000d04a rdx=0x0000000000000000 rsi=0x0000000000001050 rdi=0x000056b8000056b8 flags=0b000000000010 xmm0=0x00000000000000000000000000000000 xmm1=0x00000000000000000000000000000000 xmm2=0x00000000000000000000000000000000 xmm3=0x00000000000000000000000000000000 xmm4=0x00000000000000000000000000000000 xmm5=0x00000000000000000000000000000000 xmm6=0x00000000000000000000000000000000 xmm7=0x00000000000000000000000000000000 0x1c: [transient, nesting = 1] add [r14 +rcx], rbx rax=0x0000000000000000 rbx=0x0000000000000000 rcx=0x0000000000001048 rdx=0x0000000000000000 rsi=0x0000000000001050 rdi=0x000056b8000056b8 flags=0b000000000110 xmm0=0x00000000000000000000000000000000 xmm1=0x00000000000000000000000000000000 xmm2=0x00000000000000000000000000000000 xmm3=0x00000000000000000000000000000000 xmm4=0x00000000000000000000000000000000 xmm5=0x00000000000000000000000000000000 xmm6=0x00000000000000000000000000000000 xmm7=0x00000000000000000000000000000000 > load from +0x2048 value 0x0 EXCEPTION #13: Read from non-readable memory (UC_ERR_READ_PROT) ROLLBACK to 0x7f ``` This log shows in detail which instructions from the test case were executed by the model, whether they were transient or non-transient, and the register/memory state before each instruction. 
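Before reading the log in detail, it helps to see which instructions of the minimized program depend on the first faulting load, `add rdi, qword ptr [r14 + rsi]`. The sketch below is a simplified, hypothetical register-level dependency tracker, not Revizor's model implementation: it ignores flags and memory (including the second fault visible in the log) and uses hand-written read/write sets for each instruction.

```python
# Simplified register-level view of the minimized program (lines 6-17).
# Each entry: (instruction, registers written, registers read).
# Flags and memory are ignored; partial writes list the register as read too.
PROGRAM = [
    ("add rdi, qword ptr [r14 + rsi]", {"rdi"}, {"rdi", "rsi"}),  # faulting load
    ("add cl, dl", {"rcx"}, {"rcx", "rdx"}),
    ("and rcx, 0b1111111111000", {"rcx"}, {"rcx"}),
    ("add qword ptr [r14 + rcx], rbx", set(), {"rcx", "rbx"}),
    ("and rbx, 0b1111111111000", {"rbx"}, {"rbx"}),
    ("add dword ptr [r14 + rbx], ecx", set(), {"rbx", "rcx"}),
    ("and rax, 0b1111111111000", {"rax"}, {"rax"}),
    ("cmp dword ptr [r14 + rax], ecx", set(), {"rax", "rcx"}),
    ("and rdi, 0b1111111111000", {"rdi"}, {"rdi"}),
    ("or byte ptr [r14 + rdi], 1", set(), {"rdi"}),
    ("mov ax, 1", {"rax"}, {"rax"}),
    ("div byte ptr [r14 + rdi]", {"rax"}, {"rax", "rdi"}),
]


def skipped_after_fault(program, fault_index):
    """Indices of instructions that (transitively) depend on the faulting load."""
    tainted = set(program[fault_index][1])  # destination of the faulting load
    skipped = []
    for index in range(fault_index + 1, len(program)):
        _, written, read = program[index]
        if read & tainted:
            skipped.append(index)
            tainted |= written  # outputs of a skipped instruction are unknown too
        else:
            tainted -= written  # outputs no longer depend on the fault
    return skipped


for index in skipped_after_fault(PROGRAM, 0):
    print(PROGRAM[index][0])
```

Under these assumptions, the tracker flags `and rdi, ...`, `or byte ptr [r14 + rdi], 1`, and `div byte ptr [r14 + rdi]`, i.e., exactly the RDI-dependent instructions that the contract model does not execute.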
We can see that, early in the execution of the test case, a page fault occurs when trying to read from memory at address `0x2050`. This is because of the configuration we're using, where the second page of the sandbox memory (the faulty page) is made inaccessible. Accordingly, since we're using the `delayed-exception-handling` execution clause, the model will not execute any instructions that are data-dependent on this faulting load. This includes the two instructions that use RDI as an address (lines 15 and 17), since RDI was computed based on the value loaded from address `0x2050`. From this, we can conclude that the CPU implements some form of speculation on page faults: the RDI-dependent instructions were not supposed to be executed, yet we see leakage of RDI in the cache traces nonetheless. To understand what specific value is returned speculatively, we can manually modify the test case and replace the instructions after the faulting load with a gadget that specifically leaks RDI: ``` asm linenums="1" .intel_syntax noprefix .section .data.main .macro.measurement_start: nop qword ptr [rax + 0xff] and rsi, 0b1111111111000 # instrumentation mov rdi, qword ptr [r14 + rsi] and rdi, 0b111111111111 # mask the value of RDI mov rdi, qword ptr [r14 + rdi] .macro.measurement_end: nop qword ptr [rax + 0xff] .test_case_exit: ``` We will also enable another debug mode to see the hardware traces even when no violation is detected: ```yaml logging_modes: # - dbg_model - dbg_dump_htraces ``` Then, we can run the modified test case: ``` $ ./revizor.py reproduce -s base.json -c ./violation/reproduce.yaml \ -t ./violation/min.asm -i ./violation/min-inputs/min_input_0000.bin ================================ Collected Traces ============================= - Input 0: HTr: ^^.^.......^............^.........................^............^ [10] Feedback: (816, 685, 64, 0, 0) ``` We see that multiple cache lines were accessed, so it is hard to pinpoint the exact one that belongs to the speculative
leak. (These additional accesses are likely caused by the page walk triggered by the page fault.) We can identify the specific cache line by further modifying the test case to add a hard-coded offset to the speculative memory access, e.g., changing the last load of the gadget to: ``` asm mov rdi, qword ptr [r14 + rdi + 0x100] ``` Then, we can run it again and see how the hardware trace changes: ``` ./revizor.py reproduce -s base.json -c ./violation/reproduce.yaml -t ./violation/min.asm -i ./violation/min-inputs/min_input_0000.bin ================================ Collected Traces ============================= - Input 0: HTr: ^^.^^......^............^.........................^............^ [10] Feedback: (816, 685, 71, 0, 0) ``` Let's compare it side-by-side with the previous trace: ``` Before: ^^.^.......^............^.........................^............^ After: ^^.^^......^............^.........................^............^ | + Added cache set access due to +0x100 offset (cache set ID 4) ``` This shows that the speculative access used cache set ID 4. From this, we can do a simple calculation to deduce the value of RDI that was used for the memory access: ``` Cache ID = 4 Cache Line Size = 0x40 Hardcoded Offset = 0x100 Speculative Address = (Cache ID * Cache Line Size) = rdi + Hardcoded Offset // ignore r14 => rdi_masked = (Cache ID * Cache Line Size) - Hardcoded Offset = (4 * 0x40) - 0x100 = 0x0 ``` Now we know that the masked value of RDI used in the speculative access was `0x0`. The remaining step is to figure out what the value of RDI was before masking. For that, we can shift the pre-mask value of RDI right by 12 bits (since the mask is `0b111111111111` = 0xfff, i.e., 12 bits) and repeat the procedure. We do this for 6 shift amounts in total (0, 12, 24, 36, 48, and 60 bits) to cover the whole 64-bit value at cache-line granularity: within each 12-bit chunk, the lowest 6 bits only select a byte inside a 64-byte line and therefore do not show up in the trace.
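The trace comparison and the arithmetic above can be expressed as a few small functions. This is a minimal sketch with hypothetical helper names, assuming one trace character per L1D cache set (64 sets), 64-byte lines, and a page-aligned `r14`, so that set indices wrap every 4 KB; the recovered value is only known at 64-byte granularity.

```python
CACHE_LINE_SIZE = 0x40
NUM_SETS = 64                            # one trace character per cache set
SET_SPAN = NUM_SETS * CACHE_LINE_SIZE    # 4 KB: set indices wrap at this distance


def new_sets(before: str, after: str) -> list:
    """Cache sets marked '^' in `after` but not in `before`."""
    assert len(before) == len(after)
    return [i for i, (b, a) in enumerate(zip(before, after)) if a == "^" and b != "^"]


def masked_value(cache_set: int, offset: int) -> int:
    """Solve cache_set * 0x40 == masked + offset (mod 4 KB); low 6 bits are unknown."""
    return (cache_set * CACHE_LINE_SIZE - offset) % SET_SPAN


def visible_chunk(value: int, shift: int) -> int:
    """Cache set selected by (value >> shift) & 0xfff, i.e., bits 6-11 of the chunk."""
    return ((value >> shift) & 0xFFF) // CACHE_LINE_SIZE


before = "^^.^.......^............^.........................^............^"
after = before[:4] + "^" + before[5:]    # the "After" trace differs only in set 4
sets = new_sets(before, after)
print(sets)                                # [4]
print(hex(masked_value(sets[0], 0x100)))   # 0x0
print([visible_chunk(0, s) for s in range(0, 64, 12)])  # [0, 0, 0, 0, 0, 0]
```

The modulo in `masked_value` is an addition to the calculation in the text; it only matters when `masked + offset` crosses a 4 KB boundary.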
The resulting traces are as follows: ``` no shift: ^^.^.......^............^.........................^............^ 12 bits: ^^.^.......^............^.........................^............^ 24 bits: ^^.^.......^............^.........................^............^ 36 bits: ^^.^.......^............^.........................^............^ 48 bits: ^^.^.......^............^.........................^............^ 60 bits: ^^.^.......^............^.........................^............^ ``` We can see that in all cases, the cache set accessed is 0, which means that the masked value of RDI was always 0, regardless of how much we shifted it. This tells us that the faulting load returned 0 speculatively, which reveals to us the root cause of the violation. This is an instance of a previously-discovered vulnerability called LVI-Null, which we have successfully and independently rediscovered using Revizor! !!! success "What We've Learned" In this section, we applied the same systematic workflow to a different vulnerability class: - **Flexible configuration**: By changing just a few configuration options (removing branches, adding page faults, adjusting the contract), we refocused our search entirely - **Contract selection matters**: The `delayed-exception-handling` contract helped filter out trivial violations while exposing genuine leaks - **Deep analysis techniques**: We manually modified test cases and used offset manipulation to precisely identify what value the CPU returned speculatively The same workflow—plan, configure, fuzz, validate, minimize, analyze—works across all speculative execution vulnerability classes. ### What's Next? Proceed to [Tutorial 4](./04-isolation.md) to see how we can go even further and start testing high-level isolation properties. 
================================================ FILE: docs/intro/tutorials/04-isolation.md ================================================ # Tutorial 4: Testing Security Domain Isolation with Revizor In the previous tutorials, we used random test generation to find Spectre V1 and LVI-Null by testing against contracts. While contract violations are interesting, the most critical security issues often arise from failures in isolation between different security domains—such as user vs kernel mode, or different virtual machines. In this tutorial, we'll explore how to use Revizor's template-based fuzzing and multi-actor testing features to evaluate isolation guarantees. Specifically, we'll test whether privileged kernel code can leak information to unprivileged user code through speculative execution. ### Preliminaries Through this tutorial, you should become familiar with three concepts: actors, templates, and actor non-interference. These concepts are covered in detail in the [Topic Guide: Actors](../../topics/actors.md) and [Howto: Use Templates](../../howto/use-templates.md), but we'll provide a brief overview here. **Actors** are an abstraction that separates a test case into multiple components, each with its own code, execution context, privilege level, and memory space. This allows us to model scenarios where different parts of the test case run under different security domains. For example, we can define a `kernel` actor that runs in kernel mode and a `user` actor that runs in user mode. While they will have separate memory spaces and are isolated through privilege separation by the CPU, information could still leak from the kernel actor to the user actor through side channels; Revizor helps us detect such leaks. **Templates** are assembly files that define the high-level structure of test cases. They allow us to specify hard-coded parts of the test case and its actors, while still leaving room for random instruction generation. 
Templates are essential for testing isolation because they define how different actors interact. For example, a template can specify that the user actor calls into the kernel actor, which processes secret data, and then returns control to the user actor for observation. This structure is unlikely to be generated through pure randomness, so templates enable targeted testing of specific attack patterns. **Actor Non-Interference Contract** is a specialized contract that checks whether one actor's execution can influence another actor's observations. In our case, we want to ensure that the kernel actor's processing of secret data does not affect what the user actor can observe through side channels. If the user actor's hardware trace differs based on the kernel actor's secret data, that's a non-interference violation, indicating a potential isolation failure. ### Plan the campaign Let's imagine we want to test whether a CPU properly enforces isolation between kernel and user mode. Specifically, we want to check if privileged kernel code can leak information to unprivileged user code through speculative execution side channels—attacks like Meltdown exploit exactly this type of isolation failure. For this campaign, we'll use a two-actor setup: a kernel actor (the victim) that processes secret data, and a user actor (the attacker) that attempts to observe those secrets through side channels. Rather than relying on pure random generation, we'll use a template that explicitly encodes the interaction pattern: the kernel processes data, then transfers control to user mode, where observation code runs. This template-based approach ensures we're testing the specific isolation boundary we care about. We'll pair this multi-actor test structure with the Actor Non-Interference contract. This contract checks whether the user actor's hardware traces (cache state, timing, etc.) differ based on the kernel actor's input data. 
If they do, it means information crossed the privilege boundary: a clear isolation failure. Unlike model-based contracts that compare hardware against an idealized model, non-interference testing directly checks that one actor cannot observe another actor's secrets, which is precisely the security property we want to enforce. With this campaign plan, we are trying to answer a specific question: "Can unprivileged code observe secrets from privileged code through speculative side channels?" ### Create the configuration file ```yaml # contract for isolation testing contract_observation_clause: ct-ni # instruction categories instruction_categories: - BASE-BINARY # actor configuration actors: - main: - privilege_level: "kernel" - observer: false - user: - privilege_level: "user" - observer: true # filters enable_speculation_filter: true enable_observation_filter: true enable_fast_path_model: true ``` This configuration introduces several important concepts. The `contract_observation_clause` is set to `ct-ni`, which tells Revizor to use the Actor Non-Interference check instead of the standard model-based testing. The `actors` section defines two execution contexts. The `main` actor runs in kernel mode (`privilege_level: "kernel"`) and has `observer: false`, meaning it's the victim whose secrets might leak. The `user` actor runs in user mode (`privilege_level: "user"`) and has `observer: true`, meaning it's the attacker trying to observe kernel secrets through side channels. For more details on actor configuration, see [Topic Guide: Actors](../../topics/actors.md). ### Create the template Now we need a template that exercises the kernel-user boundary.
Create `template.asm`: ``` asm .intel_syntax noprefix # ----------------------------- Kernel-mode Actor (Victim) ------------------- .section .data.main .function_main_1: # random code of the victim .macro.random_instructions.16.8.main_1: # switch to user actor to observe .macro.set_k2u_target.user.function_user_1: .macro.set_u2k_target.main.function_main_2: .macro.switch_k2u.user.1: .macro.fault_handler: # one more call to the user to complete the measurement in case of a fault .macro.set_k2u_target.user.function_user_2: .macro.set_u2k_target.main.function_main_3: .macro.switch_k2u.user.2: # return point for the user actor .function_main_2: .macro.landing_u2k.main_2: # exit .function_main_3: .macro.landing_u2k.main_3: nop # ----------------------------- User-mode Actor ------------------------------ .section .data.user .function_user_1: # reset registers to ensure we're not observing leftover state .macro.landing_k2u.user_1: xor rax, rax # noremove mov rax, qword ptr [r14 + 0x2000] # noremove mov rbx, qword ptr [r14 + 0x2008] # noremove mov rcx, qword ptr [r14 + 0x2010] # noremove mov rdx, qword ptr [r14 + 0x2018] # noremove mov rsi, qword ptr [r14 + 0x2020] # noremove mov rdi, qword ptr [r14 + 0x2028] # noremove lfence # attacker code to observe side effects .macro.measurement_start: .macro.random_instructions.16.8.user_1: .macro.measurement_end.1: # switch back to kernel actor .macro.switch_u2k.main.1: # second measurement call; for the cases when the first one was bypassed by a fault .function_user_2: .macro.landing_k2u.user_2: .macro.measurement_end.2: .macro.switch_u2k.main.2: lfence # ----------------------------- Exit ----------------------------------------- .section .data.main .test_case_exit: ``` Let's break down this template block by block to understand how it orchestrates the kernel-user isolation test: **Kernel Actor - Initial Execution (`function_main_1`)** ```asm .section .data.main .function_main_1: .macro.random_instructions.16.8.main_1: ``` The 
template begins in the kernel actor's code space (`.section .data.main`). The `.macro.random_instructions.16.8.main_1` macro generates 16 random instructions with an average of 8 memory accesses. This randomized kernel code represents the victim's execution. **Transition Setup - Kernel to User** ```asm .macro.set_k2u_target.user.function_user_1: .macro.set_u2k_target.main.function_main_2: .macro.switch_k2u.user.1: ``` These macros configure and execute a privilege level transition. The `set_k2u_target` macro specifies that when dropping to user mode, execution should begin at `function_user_1` in the `user` actor. The `set_u2k_target` macro specifies that when returning to kernel mode, execution should resume at `function_main_2` in the `main` actor. Finally, `switch_k2u` performs the actual privilege drop, transferring control to user mode. The `.1` suffix is a unique label for this transition. **Fault Handler and Second Transition - Kernel to User Again** ```asm .macro.fault_handler: .macro.set_k2u_target.user.function_user_2: .macro.set_u2k_target.main.function_main_3: .macro.switch_k2u.user.2: ``` The `fault_handler` macro designates this location as the exception handler: if any faults occur during execution (in either actor), control transfers here. From the handler, the kernel performs another transition to user mode, this time to `function_user_2`, with `function_main_3` as the return point. This is necessary because, if the random code in the user actor triggers a fault, the first `measurement_end` may never be reached, and the hardware trace would be corrupted. By splitting the measurement into two parts, we ensure that even if a fault occurs during the first measurement, we can still capture whatever trace was collected up to that point. **Kernel Actor - Return Point (`function_main_2`)** ```asm .function_main_2: .macro.landing_u2k.main_2: ``` This is where the kernel resumes when the user actor returns control without a fault. The `landing_u2k` macro handles the privilege escalation transition, restoring the kernel execution context. Execution then falls through to `function_main_3`.
**Kernel Actor - Exit (`function_main_3`)**

```asm
.function_main_3:
    .macro.landing_u2k.main_3:
    nop
```

The final kernel return point. After the second user-mode measurement completes, execution returns here and falls through to the test case exit.

**User Actor - First Observation (`function_user_1`)**

```asm
.section .data.user
.function_user_1:
    .macro.landing_k2u.user_1:
    xor rax, rax  # noremove
    mov rax, qword ptr [r14 + 0x2000]  # noremove
    mov rbx, qword ptr [r14 + 0x2008]  # noremove
    ...
    lfence
```

This is where the attacker code executes. The `landing_k2u` macro handles the privilege drop transition, setting up the user execution context. The explicit register initialization loads fresh values from memory (via `r14`, which points to the sandbox memory). The `# noremove` comments prevent Revizor's minimization passes from removing these instructions, because they are essential for resetting the architectural state. The `lfence` ensures these loads complete before observation begins, preventing them from affecting the measurement.

**User Actor - Measurement**

```asm
.macro.measurement_start:
.macro.random_instructions.16.8.user_1:
.macro.measurement_end.1:
```

The `measurement_start` macro marks where hardware trace collection begins. Only code between `measurement_start` and `measurement_end` contributes to the observed side-channel trace. The random instructions here represent attacker code that might be sensitive to cache state, timing variations, or other microarchitectural side effects left by the kernel's execution. The `.1` suffix distinguishes this measurement from the second one.

**User Actor - Return to Kernel**

```asm
.macro.switch_u2k.main.1:
```

The `switch_u2k` macro performs a privilege escalation, returning control to the kernel actor. This transition was pre-configured earlier by the `set_u2k_target` macro.
**User Actor - Second Observation (`function_user_2`)**

```asm
.function_user_2:
    .macro.landing_k2u.user_2:
    .macro.measurement_end.2:
    .macro.switch_u2k.main.2:
    lfence
```

The second user-mode entry point completes the measurement that was started in `function_user_1`.

### Run the isolation test

Execute the multi-actor fuzzing campaign:

```bash
./revizor.py tfuzz -s base.json -c config.yaml -t template.asm -n 1000 -i 10 -w .
```

We're running 1000 test cases with 10 inputs each. Multi-actor testing often requires more iterations to find violations, because we're looking for interactions between actors, which adds complexity.

The fuzzer will run and search for isolation violations. On most systems, you will not find a violation, as isolation mechanisms are generally robust. We will need to try harder to find issues.

```
Duration: 60.5
Finished at 08:44:40
```

### Adding Faults

Now let's add a little more complexity to the experiment. We will make the attacker "active" by allowing the user actor to try to access the memory of the kernel actor. This simulates an attacker that attempts to read privileged memory, which should be blocked by the CPU's privilege separation.

To do this, we will use a generator pass that is specifically designed for this purpose. The `user-to-kernel-access` pass randomly selects a memory access from the user actor's code and modifies it to access the kernel actor's memory space. This creates a faulting access that the CPU should prevent.

Update the configuration file to include this generator pass:

```yaml
faults_allowlist:
  - user-to-kernel-access

# actor configuration
actors:
  - main:
    - privilege_level: "kernel"
    - observer: false
    - fault_blocklist:
      - user-to-kernel-access
  - user:
    - privilege_level: "user"
    - observer: true
```

Note that we also added a `fault_blocklist` to the kernel actor. This prevents redundant work on the generator side: there is no point in making the kernel access its own memory.
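To make the interplay between the global allowlist and the per-actor blocklist explicit, here is a minimal sketch. It assumes that the YAML above is loaded into plain Python lists and dicts. `effective_faults` is a hypothetical helper written for this tutorial, not a Revizor function. It subtracts each actor's `fault_blocklist` from the global `faults_allowlist`:

```python
from typing import Dict, Set


def effective_faults(config: dict) -> Dict[str, Set[str]]:
    """Return, for each actor, the faults the generator may inject into its code."""
    allowed = set(config.get("faults_allowlist", []))
    result: Dict[str, Set[str]] = {}
    for actor_entry in config.get("actors", []):
        # each list item is a single-key mapping: {actor_name: [{option: value}, ...]}
        for name, options in actor_entry.items():
            blocked: Set[str] = set()
            for option in options:
                blocked.update(option.get("fault_blocklist", []))
            result[name] = allowed - blocked
    return result


# Python equivalent of the YAML above, in the shape a YAML loader typically returns
config = {
    "faults_allowlist": ["user-to-kernel-access"],
    "actors": [
        {"main": [{"privilege_level": "kernel"},
                  {"observer": False},
                  {"fault_blocklist": ["user-to-kernel-access"]}]},
        {"user": [{"privilege_level": "user"},
                  {"observer": True}]},
    ],
}

for actor, faults in effective_faults(config).items():
    print(actor, sorted(faults))
```

With this configuration, only the `user` actor is left with `user-to-kernel-access`, which is the intended outcome.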
### Run the fuzzer with faults enabled

Run the fuzzer again with the updated configuration:

```bash
./revizor.py tfuzz -s base.json -c config.yaml -t template.asm -n 5000 -i 10 -w .
```

This time, with the user actor actively trying to access kernel memory, we have a higher chance of provoking isolation violations. If you're testing a system vulnerable to Meltdown, you should see a violation reported:

```
================================ Violations detected ==========================
Violation Details:
-----------------------------------------------------------------------------------
HTrace                                                           | ID:3 | ID:13 |
-----------------------------------------------------------------------------------
^^.^.......^.........^..^.........................^............^ | 627  | 0     |
^^.^...^...^............^.........................^............^ | 0    | 627   |
```

Validate and minimize the violation, as we've done in the previous tutorials. As a result, you should obtain a minimized test case that contains a typical Meltdown pattern: the user actor attempts to read kernel memory, which causes a fault, but speculative execution allows some of the kernel data to leak through side channels and thus impact the user's hardware traces.

!!! success "What We've Learned"
    In this tutorial, we've progressed from random fuzzing to structured testing:

    - **Templates provide structure**: When testing specific attack scenarios, templates let us encode the essential pattern while still benefiting from randomization
    - **Macros control generation**: The macro system gives fine-grained control over what code gets generated and where
    - **Multi-actor testing**: Revizor can test isolation between different privilege levels or security domains using the actor system
    - **Noninterference contract**: This specialized contract detects when one actor's data influences another actor's observations

### What's Next?

This concludes our tutorials on using Revizor for security testing.
Note that all examples in the tutorials were simplified for clarity. If you wish to explore more realistic scenarios, refer to our guide on [Design a Campaign](../../howto/design-campaign.md) or check out the advanced tutorial on [Detecting TSA-SQ](./tsa-sq.md).

Proceed to [Tutorial 5](./05-extending.md) to learn how you can extend various components of Revizor to fit your research needs.

================================================
FILE: docs/intro/tutorials/05-extending.md
================================================
# Tutorial 5: Extending Revizor

In this tutorial, we will switch gears: instead of using Revizor's existing components, we will extend Revizor by adding custom functionality to some of its core modules.

## Workflow

The general workflow for extending any part of Revizor is as follows:

- Subclass the existing module or interface you want to extend. For a list of all interfaces, refer to the [Architecture Overview](../../internals/architecture/overview.md) document.
- Implement your custom logic by overriding the necessary methods.
- Register your new class in the factory so that Revizor can access the new implementation. This also enables users to select the new implementation via a config file.
- Add new configuration options if your extension requires additional parameters.

## Changing Data Generation Algorithm

As our first example, we will modify the data (input) generation algorithm used by Revizor. By default, Revizor generates random input data for each test case. However, in some scenarios, it may be beneficial to generate inputs that contain extreme values (e.g., minimum or maximum integers) to test edge cases in the microarchitecture. We will implement this feature.

The data generation logic is defined by the `DataGenerator` interface, with its default implementation located in `rvzr/data_generator.py`. We will create a new subclass of `DataGenerator` that generates minimum or maximum integer values with a configurable probability.
Implement the new generation algorithm by overriding the generation logic of the default `DataGenerator` class.

``` python
# rvzr/data_generator.py
# (assumes np, Final, List, Tuple, CONF, and InputData are imported at the top of the module)
class MinMaxIntGenerator(DataGenerator):
    """
    A variant of DataGenerator that generates minimum or maximum integer values
    with a configurable probability.
    """
    int_sizes: Final[List[int]] = [8, 16, 32, 64]

    def __init__(self, seed: int):
        super().__init__(seed)
        self._probability_of_max = CONF.input_gen_probability_of_minmax

    def _generate_one(self, state: int, n_actors: int) -> Tuple[InputData, int]:
        input_ = InputData(n_actors)
        input_.seed = state
        per_actor_data_size = input_.itemsize // 8
        rng = np.random.default_rng(seed=state)

        for i in range(n_actors):
            # generate random data
            data = rng.integers(
                self.max_input_value, size=per_actor_data_size, dtype=np.uint64)  # type: ignore

            # if the probability of max is 0, we're done
            if self._probability_of_max == 0:
                input_.set_actor_data(i, data)
                continue

            # otherwise, with a given probability, set some values to min or max int
            for val_id in range(per_actor_data_size):
                roll = rng.random()
                if roll > self._probability_of_max:
                    continue
                # draw from the seeded `rng` (not the global `random` module),
                # so that the same seed always reproduces the same input
                int_size = int(rng.choice(self.int_sizes))
                int_sign = bool(rng.integers(2))
                value = 2 ** (int_size - 1)
                if not int_sign:
                    value = -value
                # store the 64-bit two's-complement bit pattern of `value`
                data[val_id] = np.uint64(value & 0xFFFFFFFFFFFFFFFF)

            input_.set_actor_data(i, data)

        return input_, state + 1
```

We now need to let Revizor know about the existence of this new class. This is done via the factory module `rvzr/factory.py`:

``` python
# rvzr/factory.py
_DATA_GENERATORS: Dict[str, Type[data_generator.DataGenerator]] = {
    'random': data_generator.DataGenerator,
    'minmax': data_generator.MinMaxIntGenerator,  # <<<<<<<<<<<<<<<< ADDED LINE
}
```

Finally, our implementation used a new config option (`input_gen_probability_of_minmax`) to control the probability of generating extreme values.
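Before wiring in the option, it may help to see which bit patterns the `value & 0xFFFFFFFFFFFFFFFF` step of `_generate_one` produces. The following standalone sketch reimplements only that step. The `encode` helper is hypothetical and not part of Revizor. It packs each value little-endian, which is the byte order x86 uses in memory and therefore the order `hexdump` displays:

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def encode(int_size: int, positive: bool) -> bytes:
    """Mimic the min/max step of `_generate_one` for a single value."""
    value = 2 ** (int_size - 1)
    if not positive:
        value = -value
    # the mask yields the 64-bit two's-complement pattern (sign extension for negatives);
    # "<Q" packs it as an unsigned 64-bit little-endian integer
    return struct.pack("<Q", value & MASK64)


for size in (8, 16, 32, 64):
    for positive in (True, False):
        print(f"int{size:<2} {'+' if positive else '-'}  {encode(size, positive).hex(' ')}")
```

Note that the positive branch yields 2^(n-1). This is one above the n-bit signed maximum, and when truncated to n bits it has the same pattern as the signed minimum. For 64-bit values, both branches produce the same bytes.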
We need to register this new option in the configuration module `rvzr/config.py`:

``` python
# rvzr/config.py
class Config:
    ...
    input_gen_probability_of_minmax: float = 0.5  # <<<<<<<<<<<<<<<< ADDED LINE
```

That's it. That's all it takes to change the data generation algorithm in Revizor. Now, let's test the implementation:

``` yaml
# config.yaml
data_generator: minmax
input_gen_probability_of_minmax: 0.7
```

Run Revizor with the new configuration:

``` shell
./revizor.py generate -s base.json -c config.yaml -w ./ -n 1 -i 1
```

Check that the new generator was applied:

```
$ hexdump -C ./tc0/input0.bin | head -10
00000000  80 ff ff ff ff ff ff ff  00 00 00 00 00 00 00 80  |................|
00000010  00 80 ff ff ff ff ff ff  2a 35 00 00 00 00 00 00  |........*5......|
00000020  80 00 00 00 00 00 00 00  00 80 00 00 00 00 00 00  |................|
00000030  c4 83 00 00 00 00 00 00  37 26 00 00 00 00 00 00  |........7&......|
00000040  36 d5 00 00 00 00 00 00  00 80 ff ff ff ff ff ff  |6...............|
00000050  41 27 00 00 00 00 00 00  00 80 00 00 00 00 00 00  |A'..............|
00000060  32 69 00 00 00 00 00 00  64 b0 00 00 00 00 00 00  |2i......d.......|
00000070  00 80 ff ff ff ff ff ff  7c d7 00 00 00 00 00 00  |........|.......|
00000080  00 00 00 00 00 00 00 80  00 80 00 00 00 00 00 00  |................|
00000090  31 86 00 00 00 00 00 00  f9 f4 00 00 00 00 00 00  |1...............|
```

Success! The generated input data contains extreme integer values. Examples are `80 ff ff ff ff ff ff ff`, which is the sign-extended 8-bit minimum (-128), and `00 00 00 00 00 00 00 80`, which is 2^63. This shows that our new data generator is working as expected.

## Adding a Code Generation Pass

We will now explore the other part of the test case generation pipeline: generation of test case programs (code). In this example, we will add a new code generation pass that replaces all register operands in the test case with a fixed register (`RAX`).

!!! note
    Frankly, it is not a very useful generation pass, but it serves the purpose of demonstration. The same principles apply to more complex generation passes.
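Before touching Revizor's classes, the following toy model may help illustrate what such a pass does. The `Instruction`, `RegisterOp`, and `MemoryOp` classes and the `rax_pass` function below are simplified, hypothetical stand-ins, not Revizor's actual types. The real pass operates on `TestCaseProgram` objects, but the traversal follows the same idea:

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class RegisterOp:
    value: str


@dataclass
class MemoryOp:
    address: str  # e.g. "r14 + rdi"; not a RegisterOp, so the pass ignores it


@dataclass
class Instruction:
    name: str
    operands: List[Union[RegisterOp, MemoryOp]] = field(default_factory=list)

    def __str__(self) -> str:
        ops = [op.value if isinstance(op, RegisterOp) else f"[{op.address}]"
               for op in self.operands]
        return f"{self.name} {', '.join(ops)}".strip()


def rax_pass(program: List[List[Instruction]]) -> None:
    """Rewrite every register operand in every basic block to RAX, in place."""
    for basic_block in program:
        for inst in basic_block:
            for op in inst.operands:
                if isinstance(op, RegisterOp):
                    op.value = "rax"


program = [[
    Instruction("cmp", [RegisterOp("rbx"), RegisterOp("rcx")]),
    Instruction("add", [MemoryOp("r14 + rdi"), RegisterOp("rdx")]),
]]
rax_pass(program)
for inst in program[0]:
    print(inst)
```

In this model, registers that appear inside memory operands are not `RegisterOp` instances, so the pass leaves them unchanged.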
We will follow the same steps as before. The code pass interface is located in `rvzr/code_generator.py` as the `Pass` class. We will create a new subclass of it and, since we are creating an ISA-specific pass, we will place it into `rvzr/arch/x86/generator.py`.

``` python
# rvzr/arch/x86/generator.py
class _X86RaxPass(Pass):
    """
    Demonstration-only pass that replaces all register operands with RAX.
    """

    def run_on_test_case(self, test_case: TestCaseProgram) -> None:
        for bb in test_case.iter_basic_blocks():
            for node in bb.iter_nodes():
                inst = node.instruction
                for op in inst.operands:
                    if isinstance(op, RegisterOp):
                        op.value = "rax"
```

Register the new class with the generator:

``` python
# rvzr/arch/x86/generator.py
class X86Generator(CodeGenerator):
    ...
        self._passes = [
            _X86PatchUndefinedFlagsPass(self._instruction_set, self),
            _X86SandboxPass(self._target_desc, self._faults),
            _X86PatchUndefinedResultPass(),
            _X86RaxPass(),  # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ADDED LINE
        ]
```

That's it. Now, let's test our new code generation pass by running Revizor again:

```
./revizor.py generate -w . -n 1 -i 1 -s base.json
```

Check the generated program:

```
$ cat tc0/program.asm | head -10
.intel_syntax noprefix
.section .data.main
.function_0:
.bb_0.0:
.macro.measurement_start: nop qword ptr [rax + 0xff]
and rax, 0b1111111111000 # instrumentation
lock add byte ptr [r14 + rdi], rax
cmp rax, 106
or rax, 0b1000000000000000000000000000000 # instrumentation
bsr rax, rax
```

As we can see, all register operands have been replaced with `RAX`, confirming that our new code generation pass is functioning correctly. Registers inside memory operands (e.g., `rdi` in `[r14 + rdi]`) are left unchanged, because the pass only modifies `RegisterOp` operands.

================================================
FILE: docs/intro/tutorials/tsa-sq.md
================================================
# Tutorial: Detecting TSA-SQ with Revizor

This tutorial demonstrates how we used Revizor to detect TSA-SQ (Transient Scheduler Attack - Store Queue), a microarchitectural vulnerability discovered in AMD Zen4 processors.
We'll walk through the design rationale behind the fuzzing campaign configuration and template, explaining how each component contributes to successful vulnerability detection.

You can reproduce this campaign using the provided configuration and template files, which are available in the Revizor repository under `demo/tsa-sq/`.

!!! info "Prerequisites"
    To follow this tutorial, you should have:

    - Non-virtualized access to an AMD Zen4 processor for testing
    - A working installation of Revizor. See the [installation guide](../02-install.md) for setup instructions.
    - Basic understanding of Revizor's fuzzing framework, in particular the concepts of [model-based relational testing](../03-primer.md), [actors](../../topics/actors.md), [templates](../../howto/use-templates.md), and [macros](../../ref/macros.md).
    - Familiarity with microarchitectural vulnerabilities and side-channel attacks

## Background: Understanding TSA-SQ

Before diving into the Revizor configuration, let's briefly understand what TSA-SQ is.

According to the [AMD security bulletin](https://www.amd.com/content/dam/amd/en/documents/resources/bulletin/technical-guidance-for-mitigating-transient-scheduler-attacks.pdf), TSA-SQ exploits timing variations in the CPU's store queue during "false completion" events. A load instruction may match the address of an older store whose data isn't yet available. In that case, it may complete falsely using stale data from a previous store that occupied the same store queue entry. This creates timing differences that an attacker can observe to infer information about previous stores, even across privilege levels.

The key insight is that an unprivileged user process can potentially observe timing variations that depend on data from kernel stores, creating a kernel-to-user information leak channel.

## Design Rationale

When this campaign was designed, we were not yet aware of the TSA-SQ vulnerability (in fact, the vulnerability was discovered as a *result* of this campaign).
Therefore, the campaign design is not specifically tailored to detect TSA-SQ. Instead, it stress-tests the general isolation between kernel and user modes in a way that could reveal microarchitectural vulnerabilities.

## Threat Model and Actor Configuration

Our fuzzing campaign targets a common and high-impact threat model: a malicious user process attempting to extract sensitive data from the kernel. This scenario is particularly relevant for privilege escalation attacks where an attacker seeks to leak kernel secrets.

The actor section of `config.yaml` reflects this threat model:

```yaml
actors:
  - main:
    - mode: "host"
    - privilege_level: "kernel"
    ...
  - user:
    - observer: true
    - mode: "host"
    - privilege_level: "user"
```

The `main` actor represents the victim kernel, while the `user` actor represents the attacker. The `observer: true` flag designates the user actor as the attacker attempting to extract information. This configuration, in combination with the noninterference contract, tells Revizor that any information leakage from `main` to `user` should be flagged as a violation.

## Template Design: Simulating Attack Patterns

The template structure follows the typical flow of a microarchitectural side-channel attack, specifically implementing a Flush+Reload pattern across privilege transitions.

![tsa-sq-template.png](../../assets/tsa-sq-template.png)

You can find the complete template in [`template.asm`](https://github.com/microsoft/side-channel-fuzzer/blob/main/demo/tsa-sq/template.asm). Let's examine each phase:

**Phase 1: Setup and Flush (function_main_0 and function_user_0)**

The first stage represents the attacker preparing the microarchitectural state for measurements. The first action in the template is in `function_user_0`, where the `user` actor initializes the microarchitectural state by flushing the cache lines that will be used for measurements.
This is done using the `measurement_start` macro, which is translated into the Flush stage of a Flush+Reload attack. Revizor does this translation automatically based on the `executor_mode: F+R` setting in the configuration file.

Note that the template does not actually start from the `function_user_0` actor function. Instead, it starts with `function_main_0`, which is a function belonging to the `main` actor. This is because Revizor requires the entry point of the test case to be within the `main` actor's code.

**Phase 2: Secret Injection (function_main_1)**

After the initial setup, the attacker transitions to the victim and lets it perform some computations on the victim's secret data. The victim actor executes a sequence of random instructions in `function_main_1`, which simulates the kernel performing operations on sensitive data.

Here, "random instructions" means a sequence of instructions that is randomly generated in each fuzzing round (i.e., each generated test case will have a different sequence of instructions in `function_main_1`). This randomness is crucial because it allows us to test a wide range of ways in which secret data can impact the microarchitectural state, without knowing a priori which specific instruction sequences might trigger a leak. This was one of the key factors that allowed us to discover TSA vulnerabilities without knowing about them beforehand.

**Phase 3: Secret Extraction (function_user_1)**

Back in user mode, we first re-initialize the architectural state (the registers) to eliminate any architectural information flow between the actors. Otherwise, such flows could lead to false positives in the analysis, because Revizor is unable to distinguish between architectural and microarchitectural information flows. To be precise, Revizor would be able to distinguish them with a more subtle contract, but re-initializing the registers is a simpler solution.
```assembly
xor rax, rax  # noremove
mov rax, qword ptr [r14 + 0x2000]  # noremove
mov rbx, qword ptr [r14 + 0x2008]  # noremove
# ... more register initialization
```

After that, the attacker executes another sequence of random instructions, which simulates the user process attempting to access the sensitive data that was just processed by the kernel. Note that this sequence may include an attempt to access kernel memory from user mode (see `user-to-kernel-access` in the fault allowlist in the configuration). As we found out post factum, this is not strictly necessary for TSA-SQ, but it helps to create complex microarchitectural conditions that can trigger the leak.

Depending on whether the random instruction sequence triggers a fault, one of two things happens. Either the user actor switches to kernel mode explicitly (using the `switch_u2k.main.user_1` macro), or the CPU transfers control to the fault handler (the `fault_handler` macro in `function_main_2`). In this experiment, we were not particularly interested in fault handling, so both paths lead to the same point in the template.

**Phase 4: State Measurement (function_user_2)**

Finally, the "Reload" stage in `function_user_2` measures which cache lines were accessed by the random code in the previous stage. The kernel's secret data may somehow influence which cache lines are accessed. In that case, the "Reload" measurements differ between inputs, the hardware traces diverge, and Revizor ultimately detects a violation.
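The following sketch shows how a Reload measurement might be rendered as the `^`/`.` trace strings that appear in Revizor's violation reports. It is a simplified model built on stated assumptions. The monitored region is taken to be one 4 KiB page split into 64-byte cache lines, and character *i* stands for cache line *i*. `fr_trace` is a hypothetical helper, not Revizor's executor code, and real measurements are noisy and repeated many times:

```python
from typing import Iterable

PAGE_SIZE = 4096
CACHE_LINE = 64
N_LINES = PAGE_SIZE // CACHE_LINE  # 64 cache lines per page (assumption)


def fr_trace(accessed_offsets: Iterable[int]) -> str:
    """Render the cache lines touched by the given page offsets as a '^'/'.' string.

    Flush: every line starts as "not cached" (False).
    Reload: a line that was accessed in between is reported as a hit ('^').
    """
    hits = [False] * N_LINES
    for offset in accessed_offsets:
        if not 0 <= offset < PAGE_SIZE:
            raise ValueError(f"offset {offset:#x} is outside the monitored page")
        hits[offset // CACHE_LINE] = True
    return "".join("^" if hit else "." for hit in hits)


# Two runs that differ only in one secret-dependent access (line 3 vs. line 4)
trace_a = fr_trace([0x0, 0x40, 0xC0])
trace_b = fr_trace([0x0, 0x40, 0x100])
print(trace_a)
print(trace_b)
print("diverging traces:", trace_a != trace_b)
```

If the contract traces of these two runs were equal, such a divergence is exactly what Revizor would report as a violation.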
## Configuration Overview

Beyond the actor configuration, `config.yaml` contains several other important settings that guide the fuzzing campaign, as described next:

* **Contract**: The contract configuration specifies what information leakage we consider acceptable:

    ```yaml
    contract_observation_clause: ct
    contract_execution_clause:
      - noninterference
    ```

    The `noninterference` execution clause implements the security property that observer actors cannot learn information about non-observer actors through microarchitectural channels. Combined with the `ct` (constant-time) observation clause, this allows the observer to see memory access patterns and control flow but prohibits leakage of raw data values.

* **Exceptions**: The configuration includes `user-to-kernel-access` in the fault allowlist, which enables testing for Meltdown-type vulnerabilities. This was part of our original experimental design, when we didn't yet know that TSA existed. Revizor's program generator will randomly select memory accesses in the user actor and modify them to target kernel memory, triggering page faults.

    Interestingly, this exception-based approach helped discover TSA-SQ. The false completion events in the store queue can lead to timing differences in subsequent instructions. The faults then provide a constant-time reference point that transforms these timing differences into persistent cache state. Namely, when a variable-latency instruction is executed concurrently with a faulting instruction, it creates a race condition. The cache impact of the variable-latency instruction can then be influenced by whether the faulting instruction completes before or after it.

    Note the fault configuration quirk: we enable `user-to-kernel-access` globally but block it specifically for the main actor using `fault_blocklist`. This is the only way to enable a fault for a specific actor, because Revizor does not allow faults to be allow-listed for a specific actor.
* **Statistical Analysis**: The statistical analysis parameters balance sensitivity with noise tolerance:

    ```yaml
    analyser_stat_threshold: 0.05
    executor_sample_sizes: [15, 40, 160, 320]
    ```

    The low threshold of 0.05 makes the analysis sensitive to subtle timing differences. The adaptive sample sizes allow Revizor to start with quick tests and increase precision when potential violations are detected.

* **Instruction Set**: The instruction set is defined as `x86-64` because we are targeting AMD CPUs. The instruction categories include all base instructions, which allows for a wide range of microarchitectural interactions in the randomly generated code. Ideally, we would include even more categories, such as SIMD extensions and other advanced instructions, but Revizor does not yet support them (coming soon, though!).

## Running the Campaign

With the configuration and template in place, we can run the detection campaign using Revizor's `tfuzz` command. This command generates test cases based on the provided template and configuration, executes them, and analyzes the results for violations.

```shell
./revizor.py tfuzz -s base.json --save-violations t -w ./results/ \
    -c config.yaml -t template.asm -n 100000 -i 25
```

This runs 100,000 test cases with 25 inputs each. The `--save-violations` flag preserves any detected violations for later analysis.

When TSA-SQ is present, you'll eventually see output similar to:

```
================================ Violations detected ==========================
Contract trace: 14140085380608124960 (hash)
Hardware traces:
  Input group 1: [11]
  Input group 2: [36]
  ^^^.........^.................................^^................ [287 | 36  ]
  ^^^.........^.................................^................. [31  | 284 ]
```

Inputs 11 and 36 have the same contract trace hash but produce different hardware trace patterns. This indicates that the CPU is leaking information that the noninterference contract does not predict.
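The check behind this report can be sketched as follows. Inputs that produce the same contract trace are grouped together, and any group whose members produce different hardware traces is a counterexample to the contract. This is a simplification: `find_violations` is a hypothetical helper, not Revizor code. Revizor's analyser also repeats measurements and applies a statistical test (see `analyser_stat_threshold` above), whereas this sketch uses plain string equality:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def find_violations(contract_traces: Sequence[int],
                    hw_traces: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (i, j) input pairs with equal contract traces but different hardware traces."""
    # group input IDs by their contract trace (e.g., the trace hash)
    groups: Dict[int, List[int]] = defaultdict(list)
    for input_id, ctrace in enumerate(contract_traces):
        groups[ctrace].append(input_id)

    violations: List[Tuple[int, int]] = []
    for members in groups.values():
        reference = members[0]
        for other in members[1:]:
            # the contract predicts identical observations within a group
            if hw_traces[other] != hw_traces[reference]:
                violations.append((reference, other))
    return violations


# inputs 0 and 2 share a contract trace but produce different hardware traces
contract = [111, 222, 111]
hardware = ["^^.^" + "." * 60, "^..." + "." * 60, "^^.." + "." * 60]
print(find_violations(contract, hardware))  # -> [(0, 2)]
```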
On our machine, the campaign typically takes about 5 hours to detect a leak. Your mileage may vary depending on the CPU model and due to the inherent randomness of the process.

## Verifying Genuine Violations

To confirm that a detected violation is genuine, reproduce it using:

```bash
./revizor.py reproduce -s base.json -c ./results/violation-*/reproduce.yaml \
    -t ./results/violation-*/program.asm -i ./results/violation-*/input_*.bin
```

A genuine violation will reproduce consistently across multiple runs with the same statistical pattern, confirming that the timing differences represent a real microarchitectural information leak.

The next step is to do root-cause analysis of the violation, which is beyond the scope of this tutorial. See [Root-Causing a Violation Detected by Revizor](../../howto/root-cause-a-violation.md) for details on this process.

================================================
FILE: docs/ref/artifact-file-formats.md
================================================
# Artifact File Formats

This document describes the structure of the violation artifact files stored by Revizor when it detects a contract violation.

## Program Artifact Format

The program artifact is stored as an assembly file named `program.asm` in the violation directory (e.g., `violation-/program.asm`). The file uses Intel syntax and is structured around actors, with each actor's code placed in a separate section.

The program artifact is structured as follows:

```asm
.intel_syntax noprefix  # Required: Use Intel syntax
.test_case_enter:       # Required: marks the beginning of the test case

.section .data.main     # Start of "main" actor section
...                     # Instructions for main actor,
                        # including possible control transfers to other actors
.test_case_exit:        # Required: marks the end of the test case;
                        # Must be within the "main" actor section

.section .data.actor2   # Start of "actor2" actor section
...                     # Instructions for actor2
```

## Input Data Artifact Format

The inputs to the program are stored as binary files in the violation directory, named according to their order in the input sequence (e.g., `violation-/input_004.bin`). The format mimics the layout of the [sandbox memory](sandbox.md). The only exception is that some sections are removed because they are irrelevant for input data (e.g., the MACRO STACK and the padding areas).

The layout of the input data files is as follows:

| Offset (within actor) | Actor ID | Section Name | Size, B |
| --------------------- | -------- | ------------ | ------- |
| 0x0                   | ACTOR 0  | MAIN AREA    | 0x1000  |
| 0x1000                |          | FAULTY AREA  | 0x1000  |
| 0x2000                |          | GPR AREA     | 0x40    |
| 0x2040                |          | SIMD AREA    | 0x100   |
| 0x2140                |          | (unused)     | 0xec0   |
| 0x0                   | ACTOR 1  | MAIN AREA    | 0x1000  |
| 0x1000                |          | FAULTY AREA  | 0x1000  |
| 0x2000                |          | GPR AREA     | 0x40    |
| 0x2040                |          | SIMD AREA    | 0x100   |
| 0x2140                |          | (unused)     | 0xec0   |
| ...                   | ...      | ...          | ...     |

Offsets are relative to the start of each actor's block. Since each block spans 0x3000 bytes, the block of actor N starts at file offset N × 0x3000.

================================================
FILE: docs/ref/binary-formats.md
================================================
# Binary Formats in Revizor

!!! info "Advanced Topic"
    This is an advanced topic describing internal implementation details of Revizor. You are unlikely to need this information unless you are extending or modifying Revizor's core components.

This document describes the structure of the custom binary formats that Revizor uses to transfer test cases and their data between components. Specifically, these formats carry generated test cases and their inputs to the executor kernel module and to the DynamoRIO-based model backend.

Such custom formats are necessary because the components are implemented in different programming languages and different technologies, so passing objects directly is not possible.
Using one of the standard formats (e.g., ELF) is also not an option, because test cases in Revizor have a special structure that the standard formats do not support (e.g., multiple actors in different execution modes, some instructions are macros, etc.). The formats are designed to be as simple as possible to minimize the overhead of serialization and deserialization.

## Revizor Code Binary Format (RCBF)

RCBF is a structured representation of the complete test case binary, together with its metadata. The structure is as follows:

``` yaml title="RCBF Structure" linenums="1"
HEADER (16 bytes total)
    n_actors: 8 bytes                  # Number of actors in the test case (also equals the number of code sections)
    n_symbols: 8 bytes                 # Number of symbols in the test case

ACTOR TABLE (48 x n_actors bytes)
    actor_entry:                       # (repeated n_actors times)
        id: 8 bytes                    # Unique identifier for the actor
        mode: 8 bytes                  # Execution mode of the actor
        pl: 8 bytes                    # Protection level
        data_permissions: 8 bytes      # Data access permissions
        data_ept_permissions: 8 bytes  # EPT (Extended Page Table) data permissions
        code_permissions: 8 bytes      # Code execution permissions

SYMBOL TABLE (32 x n_symbols bytes)
    symbol_entry:                      # (repeated n_symbols times)
        owner: 8 bytes                 # ID of the actor that owns this symbol
        offset: 8 bytes                # Offset of the symbol within its section
        id: 8 bytes                    # Symbol's unique identifier
        args: 8 bytes                  # Number of arguments the symbol takes (relevant for macros)

METADATA (24 x n_actors bytes)
    metadata_entry:
        owner: 8 bytes                 # ID of the actor that owns this section
        size: 8 bytes                  # Size of the code section in bytes
        reserved: 8 bytes              # Reserved for future use

DATA (8 kB x n_actors)
    code_section:                      # (repeated n_actors times)
        code: 8 kB                     # Actual assembled binary code for the section
```

The file begins with a header containing the number of actors (which is also the number of sections) and the number of symbols in the test case.
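As an illustration, the header, actor table, and symbol table can be decoded with Python's standard `struct` module. This sketch rests on assumptions that this document does not state: all fields are unsigned 64-bit little-endian integers, and the tables follow each other with no padding. `parse_rcbf_tables` is a hypothetical helper, not Revizor code:

```python
import struct
from typing import Dict, List

HEADER = struct.Struct("<2Q")        # n_actors, n_symbols
ACTOR_ENTRY = struct.Struct("<6Q")   # one actor_entry
SYMBOL_ENTRY = struct.Struct("<4Q")  # one symbol_entry
ACTOR_FIELDS = ("id", "mode", "pl", "data_permissions",
                "data_ept_permissions", "code_permissions")
SYMBOL_FIELDS = ("owner", "offset", "id", "args")


def parse_rcbf_tables(blob: bytes) -> Dict[str, List[Dict[str, int]]]:
    """Decode the RCBF header, actor table, and symbol table from `blob`."""
    n_actors, n_symbols = HEADER.unpack_from(blob, 0)
    pos = HEADER.size  # 16 bytes

    actors = []
    for _ in range(n_actors):
        actors.append(dict(zip(ACTOR_FIELDS, ACTOR_ENTRY.unpack_from(blob, pos))))
        pos += ACTOR_ENTRY.size  # 48 bytes per actor

    symbols = []
    for _ in range(n_symbols):
        symbols.append(dict(zip(SYMBOL_FIELDS, SYMBOL_ENTRY.unpack_from(blob, pos))))
        pos += SYMBOL_ENTRY.size  # 32 bytes per symbol

    return {"actors": actors, "symbols": symbols}


# Build a tiny synthetic blob: one actor and one symbol at offset 0x40
blob = (HEADER.pack(1, 1)
        + ACTOR_ENTRY.pack(0, 0, 0, 0, 0, 0)
        + SYMBOL_ENTRY.pack(0, 0x40, 1, 0))
print(parse_rcbf_tables(blob))
```

The `struct` sizes (16, 48, and 32 bytes) match the table sizes given in the structure listing, which serves as a quick sanity check of the layout.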
The term "symbol" in this context refers to any location in the test case that can be referenced. Two common types of symbols are functions (specifically, function entry points) and macros.

Next, the file contains the actor table, which is an array of actor metadata entries, one for each actor in the test case. Each entry contains the actor's ID, execution mode, protection level, data permissions, EPT data permissions, and code permissions.

After the actor table, the file contains the symbol table, which is an array of symbol entries, one for each symbol in the test case. Each symbol entry contains the following:

- the ID of the actor (and thus of the section) to which the symbol belongs
- the offset of the symbol within that section
- the symbol's ID
- the number of arguments the symbol takes (if it is a macro)

The file continues with the table of metadata for each section in the test case. Each metadata entry contains the ID of the actor that owns the section and the size of the section.

Finally, the file contains a sequence of code sections, one for each actor in the test case. These sections contain the actual assembled binary code for each of the sections in the test case.

## Revizor Data Binary Format (RDBF)

RDBF is a structured representation of the data used to initialize sandbox memory and registers before executing the test case.

Note that this format combines multiple inputs into a single file. A single test case program is typically executed multiple times with different inputs, so it is more efficient to send a batch of inputs at once.
``` yaml title="RDBF Structure" linenums="1" HEADER (16 bytes) n_actors: 8 bytes # Number of Actors in the test case (also equals the number of data sections per input) n_inputs: 8 bytes # Number of inputs in the batch METADATA (16 x n_actors bytes) metadata_entry: # (repeated n_actors x n_inputs times) section_size: 8 bytes # Size of the data section reserved: 8 bytes # Reserved for future use DATA (12 x n_actors x n_inputs KB) input: # (repeated n_inputs times) data_section: # (repeated n_actors times) main_area: 4 KB # Main data area faulty_area: 4 KB # Faulty page area reg_init_region: 4 KB # Register initialization area ``` The file begins with a header containing the number of actors (equal to the number of data sections per input) and the number of inputs in the batch. Next, the file contains the table of metadata for each data section, which only contains the size of the section. Finally, the file contains a sequence of data sections, one for each combination of actor and input in the batch. The data sections are arranged to mirror the data layout in the sandbox memory (see the [sandbox memory layout](sandbox.md) document for more information). ================================================ FILE: docs/ref/cli.md ================================================ # Command-Line Interface This document provides a complete reference for all command-line options accepted by the `rvzr` command (or `./revizor.py` if running directly from the source tree). !!! note "CLI vs Configuration Files" Revizor is controlled via two interfaces: command-line arguments and a configuration file. Command-line arguments specify the mode of operation and set high-level parameters (e.g., file paths, number of fuzzing rounds), while the configuration file specifies the details of the fuzzing campaign (e.g., the target contract, generation parameters, etc.). This document focuses on the former; for information on configuration files, see the [configuration documentation](config.md).
## General Syntax The general syntax of the command line is: ``` rvzr MODE [OPTIONS] # Where MODE can be: # fuzz fuzzing mode # tfuzz template fuzzing mode # reproduce reproduce mode # minimize test case minimization mode # analyse stand-alone trace analysis mode # generate stand-alone generation mode # download_spec call the script that downloads the instruction set specification ``` The available options depend on the selected mode. See [Execution Modes](modes.md) for descriptions of each mode's purpose and behavior. For example, a typical way to run Revizor is in fuzzing mode with a command like this: ```bash rvzr fuzz -s base.json -n 100 -i 10 -c config.yaml -w ./violations ``` This command will run the fuzzer for 100 iterations (i.e., 100 test cases), with 10 inputs per test case. The fuzzer will use the ISA spec stored in the `base.json` file, and will read the configuration from `config.yaml`. If the fuzzer finds a violation, it will be stored in the `./violations` directory. ## Fuzzing Mode Command-line arguments supported in `fuzz` mode: ``` -h, --help show this help message and exit -c CONFIG, --config CONFIG Path to the configuration file (YAML) that will be used during fuzzing. -I INCLUDE_DIR, --include-dir INCLUDE_DIR Path to the directory containing configuration files that included by the main configuration file (received via --config). -s INSTRUCTION_SET, --instruction-set INSTRUCTION_SET Path to the instruction set specification (JSON) file. -n NUM_TEST_CASES, --num-test-cases NUM_TEST_CASES Number of test cases. -i NUM_INPUTS, --num-inputs NUM_INPUTS Number of inputs per test case. -w WORKING_DIRECTORY, --working-directory WORKING_DIRECTORY -t TESTCASE, --testcase TESTCASE Use an existing test case [DEPRECATED - see reproduce] --timeout TIMEOUT Run fuzzing with a time limit [seconds]. No timeout when set to zero. 
--nonstop Don't stop after detecting an unexpected result --save-violations SAVE_VIOLATIONS If set, store all detected violations in working directory. ``` ## Template Fuzzing Mode Command-line arguments supported in `tfuzz` mode: ``` -h, --help show this help message and exit -c CONFIG, --config CONFIG Path to the configuration file (YAML) that will be used during fuzzing. -I INCLUDE_DIR, --include-dir INCLUDE_DIR Path to the directory containing configuration files that included by the main configuration file (received via --config). -s INSTRUCTION_SET, --instruction-set INSTRUCTION_SET Path to the instruction set specification (JSON) file. -n NUM_TEST_CASES, --num-test-cases NUM_TEST_CASES Number of test cases. -i NUM_INPUTS, --num-inputs NUM_INPUTS Number of inputs per test case. -w WORKING_DIRECTORY, --working-directory WORKING_DIRECTORY -t TEMPLATE, --template TEMPLATE The template to use for generating test cases --timeout TIMEOUT Run fuzzing with a time limit [seconds]. No timeout when set to zero. --nonstop Don't stop after detecting an unexpected result --save-violations SAVE_VIOLATIONS If set, store all detected violations in working directory. ``` ## Reproduce Mode Command-line arguments supported in `reproduce` mode: ``` -h, --help show this help message and exit -c CONFIG, --config CONFIG Path to the configuration file (YAML) that will be used during fuzzing. -I INCLUDE_DIR, --include-dir INCLUDE_DIR Path to the directory containing configuration files that included by the main configuration file (received via --config). -s INSTRUCTION_SET, --instruction-set INSTRUCTION_SET Path to the instruction set specification (JSON) file. -t TESTCASE, --testcase TESTCASE Path to the test case -i [INPUTS ...], --inputs [INPUTS ...] Path to the directory with inputs -n NUM_INPUTS, --num-inputs NUM_INPUTS Number of inputs per test case. 
[IGNORED if --input-dir is set] ``` ## Minimize Mode Command-line arguments supported in `minimize` mode: ``` -h, --help show this help message and exit -c CONFIG, --config CONFIG Path to the configuration file (YAML) that will be used during fuzzing. -I INCLUDE_DIR, --include-dir INCLUDE_DIR Path to the directory containing configuration files that included by the main configuration file (received via --config). -s INSTRUCTION_SET, --instruction-set INSTRUCTION_SET Path to the instruction set specification (JSON) file. --testcase TESTCASE, -t TESTCASE Path to the test case program that needs to be minimized. -i NUM_INPUTS, --num-inputs NUM_INPUTS Number of inputs to the program that will be used during minimization. --testcase-outfile TESTCASE_OUTFILE, -o TESTCASE_OUTFILE Output path for the minimized test case program. --input-outdir INPUT_OUTDIR Output directory for storing minimized inputs. --num-attempts NUM_ATTEMPTS Number of attempts to minimize the test case. --enable- Enable a specific pass during minimization. ``` See also the [minimization documentation](minimization-passes.md) for a list of available minimization passes. ## Stand-alone Trace Analysis Command-line arguments supported in `analyse` mode: ``` -h, --help show this help message and exit -c CONFIG, --config CONFIG Path to the configuration file (YAML) that will be used during fuzzing. -I INCLUDE_DIR, --include-dir INCLUDE_DIR Path to the directory containing configuration files that included by the main configuration file (received via --config). -s INSTRUCTION_SET, --instruction-set INSTRUCTION_SET Path to the instruction set specification (JSON) file. --ctraces CTRACES --htraces HTRACES ``` ## Stand-alone Generation Command-line arguments supported in `generate` mode: ``` -h, --help show this help message and exit -c CONFIG, --config CONFIG Path to the configuration file (YAML) that will be used during fuzzing. 
-I INCLUDE_DIR, --include-dir INCLUDE_DIR Path to the directory containing configuration files that included by the main configuration file (received via --config). -s INSTRUCTION_SET, --instruction-set INSTRUCTION_SET Path to the instruction set specification (JSON) file. -r SEED, --seed SEED Add seed to generate test case. -n NUM_TEST_CASES, --num-test-cases NUM_TEST_CASES Number of test cases. -i NUM_INPUTS, --num-inputs NUM_INPUTS Number of inputs per test case. -w WORKING_DIRECTORY, --working-directory WORKING_DIRECTORY --permit-overwrite Permit overwriting existing files. ``` ## Download Instruction Set Specification The following command-line arguments are supported in `download_spec` mode: ``` -h, --help show this help message and exit -a ARCHITECTURE, --architecture ARCHITECTURE The ISA to download the specification for (e.g., x86-64) --outfile OUTFILE, -o OUTFILE The destination file to save the downloaded specification. --extensions [EXTENSIONS ...] List of ISA extensions to include in the specification (e.g., SSE, VTX) ``` ================================================ FILE: docs/ref/config.md ================================================ # Configuration Options Below is a list of the available configuration options for Revizor, which are passed to Revizor via a configuration file. For an example of how to write the config file, see [demo/big-fuzz.yaml](https://github.com/microsoft/side-channel-fuzzer/tree/main/demo/big-fuzz.yaml). ## Fuzzing Configuration #### `fuzzer` : :material-water:`basic` Select the variant of the fuzzer to be used. === "Syntax" ```yaml fuzzer: ``` === "Available Options" `basic` | `architectural` | `archdiff` === "Options Explained" * `basic` - normal model-based fuzzing. A violation in this mode indicates that the CPU exposes more information than predicted by the contract. This option should be used in most testing campaigns. * `architectural` - self-fuzzing for architectural mismatches between the model and the executor.
This option should be used for testing the fuzzer itself, i.e., a violation in this mode indicates a bug in the fuzzer rather than a bug in the CPU. This is useful when running the fuzzer with a previously untested instruction set, or when a new contract is implemented. * `archdiff` - fuzzing for architectural invariants. This is a special mode targeted at semi-microarchitectural violations, similar to ZenBleed. This mode is experimental and should be used with caution. #### `enable_priming` : :material-water: `True` This option enables or disables priming. It should be set to `True` in most cases, as priming is crucial for eliminating false positives. : **What is priming?**: Priming solves the following problem: Revizor collects hardware traces for inputs in a sequence, and the microarchitectural state is not reset between the inputs. This means that the microarchitectural state for the input at, for example, position 100 is different from the state for the input at position 200. Accordingly, the hardware traces for these inputs may differ simply because the measurements are taken in different microarchitectural contexts. : To address this issue, we use priming, which swaps the positions of the two inputs in the sequence and re-runs the test. For example, if the original sequence is `(i1 . . . i99,i100,i101 . . . i199,i200)`, the priming sequence will be `(i1 . . . i99,i200,i101 . . . i199,i100)`. If the violation persists in this sequence, it is a true positive. If the violation disappears, it is a false positive, and it will be discarded. === "Syntax" ```yaml enable_priming: ``` #### `enable_speculation_filter` : :material-water: `False` If enabled, Revizor will not consider test cases that do not trigger speculation. : This option is useful for improving the throughput of the fuzzer, but it can discard potential violations if the leakage is not caused by speculation.
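Returning to the input swap described under `enable_priming`: the reordering itself can be sketched in a few lines of Python. The helper `priming_sequence` is hypothetical, and the sketch only builds the reordered sequence; re-running it on the CPU and re-checking the violation are not shown.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def priming_sequence(inputs: Sequence[T], pos_a: int, pos_b: int) -> List[T]:
    """Return a copy of `inputs` in which the inputs at pos_a and pos_b trade places.

    Every other input keeps its position, so the input moved to pos_a runs
    after exactly the same predecessors as the input originally at pos_a.
    """
    primed = list(inputs)
    primed[pos_a], primed[pos_b] = primed[pos_b], primed[pos_a]
    return primed


# The example from the text, with 0-based positions: i100 is at index 99, i200 at index 199.
original = [f"i{k}" for k in range(1, 201)]
primed = priming_sequence(original, 99, 199)
```

After the swap, `i200` is measured in the context that `i100` originally had, which is what allows the fuzzer to tell a context-induced trace difference from a genuine one.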
=== "Syntax" ```yaml enable_speculation_filter: ``` #### `enable_observation_filter` : :material-water: `False` If enabled, Revizor will not consider test cases that do not leave speculative traces. This is achieved by pre-filtering: for each test case, Revizor adds an `LFENCE` after each instruction and compares the resulting hardware traces with the original ones. If the traces are identical, the test case is discarded without further processing. : This option is useful for improving the throughput of the fuzzer, but it can discard potential violations if the leakage is not caused by speculation. === "Syntax" ```yaml enable_observation_filter: ``` #### `enable_fast_path_model` : :material-water: `True` If enabled, the fuzzer will assume that all boosted inputs produce the same contract trace, and thus it will re-use the contract trace of the original input for all its boosted variants. This is normally a valid assumption to make if the taint tracker in the model does not contain bugs. : This option is a pure performance optimization. It only impacts the speed of fuzzing, not its correctness. === "Syntax" ```yaml enable_fast_path_model: ``` #### `color` : :material-water: `False` If enabled, the output will be colored. This option helps a lot with readability, but may produce corrupted output when redirected to a file. === "Syntax" ```yaml color: ``` #### `logging_modes` : :material-water: `['info', 'stat']` Control the information logged by Revizor. === "Syntax" ```yaml logging_modes: - - ...
``` === "Available Options" `info` | `stat` | `dbg_timestamp` | `dbg_violation` | `dbg_dump_htraces` | `dbg_dump_ctraces` | `dbg_dump_traces_unlimited` | `dbg_executor_raw` | `dbg_model` | `dbg_coverage` | `dbg_generator` | `dbg_priming` | `dbg_isa_filter` === "Options Explained" * `info` - general information about the progress of fuzzing; * `stat` - statistics at the end of the fuzzing campaign; * `dbg_timestamp` - print the timestamp every 1000 test cases during the fuzzing process; * `dbg_violation` - upon detecting a violation, print detailed information about it; * `dbg_dump_htraces` - print the first 100 hardware traces for every test case; * `dbg_dump_ctraces` - print the first 100 contract traces for every test case; * `dbg_dump_traces_unlimited` - print ALL traces (use carefully, produces LOTS of text); * `dbg_executor_raw` - print hardware traces for every stage of the fuzzing process; this differs from `dbg_dump_htraces` in that it also prints the traces collected by the speculation/observation filters and at every iteration of multi-sample collection; * `dbg_model` - print detailed information about EVERY instruction executed on the model (use carefully, produces LOTS of text); * `dbg_coverage` - store instruction coverage information; * `dbg_generator` - print the list of instructions used to generate test cases; * `dbg_priming` - print information about the priming process; only useful for debugging the priming mechanism itself; * `dbg_isa_filter` - when rvzr loads information about the instruction set (normally, from `base.json`), it filters out some of the instructions, either because of the config options provided by the user, or because some instructions are known to cause issues in the model or executor. This debug option prints the list of instructions that were filtered out, along with the reason for filtering them out. #### `multiline_output` : :material-water: `False` If enabled, each output message will be printed on a separate line.
Otherwise, the fuzzing progress continuously overwrites the same line (works only in a terminal). === "Syntax" ```yaml multiline_output: ``` ## Program Generator Configuration #### `generator` : :material-water: `random` Select the type of program generator to be used. === "Syntax" ```yaml generator: ``` === "Available Options" `random` === "Options Explained" * `random` - generate random assembly programs. This is the only supported option at the moment. #### `instruction_set` : :octicons-cpu-24: The instruction set under test. === "Syntax" ```yaml instruction_set: ``` === "Available Options" `x86-64` | `arm64` #### `instruction_categories` : :octicons-cpu-24: Select a list of instruction categories to be used when generating programs. This list effectively filters the instructions in the ISA descriptor file (e.g., `base.json`) passed via the command line (`-s`), keeping only those that belong to the listed categories. !!! info "Priority" This list has lower priority than both `instruction_blocklist` and `instruction_allowlist`. The resulting instruction pool is: `all from(instruction_categories) - instruction_blocklist + instruction_allowlist` === "Syntax" ```yaml instruction_categories: - - ... ``` === "Available Options" Any category in the ISA descriptor file (`base.json`). #### `instruction_blocklist` : :octicons-cpu-24: A list of instructions that will **not** be used for generating programs. This list filters out instructions from `instruction_categories`, but not from `instruction_allowlist`. !!! info "Priority" This list has lower priority than `instruction_allowlist`. The resulting instruction pool is: `all from(instruction_categories) - instruction_blocklist + instruction_allowlist` !!! warning "Danger Zone" This option has a somewhat sensible default value for each supported architecture, selected to avoid known-bad instructions. Thus, setting this option explicitly is inadvisable. Prefer using `instruction_blocklist_append` to add more instructions to the default blocklist.
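The pool formula `all from(instruction_categories) - instruction_blocklist + instruction_allowlist` maps directly onto set operations. The Python sketch below is illustrative only: it models the ISA descriptor as a simple name-to-category mapping (the real `base.json` lists many operand variants per instruction), and the instruction and category names in the example are hypothetical.

```python
from typing import Dict, Iterable, Set


def instruction_pool(
    isa: Dict[str, str],
    categories: Iterable[str],
    blocklist: Iterable[str],
    allowlist: Iterable[str],
) -> Set[str]:
    """Compute: all from(categories) - blocklist + allowlist.

    `isa` maps an instruction name to its category; this is a simplification
    of the ISA descriptor file, which lists many operand variants per name.
    """
    selected = set(categories)
    from_categories = {name for name, category in isa.items() if category in selected}
    # The allowlist is applied last, so it wins over the blocklist; only
    # instructions that exist in the descriptor can be added.
    return (from_categories - set(blocklist)) | (set(allowlist) & set(isa))


# Hypothetical descriptor contents, for illustration only.
isa = {"ADD": "BASE-BINARY", "DIV": "BASE-BINARY", "LFENCE": "BASE-MISC", "CPUID": "BASE-MISC"}
pool = instruction_pool(isa, ["BASE-BINARY"], ["DIV"], ["LFENCE"])
```

Here `DIV` is removed by the blocklist and `LFENCE` is added by the allowlist even though its category is not selected, leaving `{"ADD", "LFENCE"}`. Listing `DIV` in both the blocklist and the allowlist would keep it, because the allowlist is applied last.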
=== "Syntax" ```yaml instruction_blocklist: - - ... ``` === "Available Options" Any instruction in the ISA descriptor file (`base.json`). #### `instruction_blocklist_append` : :material-water: `[]` A list of instructions that will be appended to the default blocklist for the target ISA. This option is identical to `instruction_blocklist`, but the list is added to the default instead of replacing it. !!! info "Priority" This list has lower priority than `instruction_allowlist`. The resulting instruction pool is: `all from(instruction_categories) - instruction_blocklist + instruction_allowlist` === "Syntax" ```yaml instruction_blocklist_append: - - ... ``` === "Available Options" Any instruction in the ISA descriptor file (`base.json`). #### `instruction_allowlist` : :material-water: `[]` A list of instructions to use for generating programs. !!! info "Priority" This list has priority over `instruction_categories` and over `instruction_blocklist`, thus adding instructions on top of the categories. The resulting instruction pool is: `all from(instruction_categories) - instruction_blocklist + instruction_allowlist` === "Syntax" ```yaml instruction_allowlist: - - ... ``` === "Available Options" Any instruction in the ISA descriptor file (`base.json`). #### `program_generator_seed` : :material-water: `0` Seed of the program generator (aka code generator). If set to zero, a random seed will be used for each run. === "Syntax" ```yaml program_generator_seed: ``` #### `program_size` : :material-water: `24` Number of instructions in the test case programs to be produced by the code generator. Note that the actual size might be larger because of the instrumentation. === "Syntax" ```yaml program_size: ``` #### `avg_mem_accesses` : :material-water: `12` Average number of memory accesses in the test case programs to be produced by the code generator. The actual number will be random, but the average over all programs will be close to this value. 
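One simple way to obtain a random number of memory accesses whose average matches `avg_mem_accesses` is to decide independently, for each instruction slot, whether it accesses memory with probability `avg_mem_accesses / program_size`. This is an assumption made for illustration; the text does not say how Revizor's generator does it, and `choose_memory_slots` is a hypothetical helper.

```python
import random
from typing import List


def choose_memory_slots(program_size: int, avg_mem_accesses: int, rng: random.Random) -> List[bool]:
    """Mark each instruction slot as a memory access with probability avg/size.

    The count per program is binomially distributed with mean
    program_size * p, which equals avg_mem_accesses when avg <= size.
    """
    p = min(1.0, avg_mem_accesses / program_size)
    return [rng.random() < p for _ in range(program_size)]


# Defaults from this page: program_size = 24, avg_mem_accesses = 12.
rng = random.Random(0)  # fixed seed for reproducibility
counts = [sum(choose_memory_slots(24, 12, rng)) for _ in range(2000)]
mean = sum(counts) / len(counts)
```

Individual programs vary in their number of memory accesses, while the mean over many programs stays close to 12.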
=== "Syntax" ```yaml avg_mem_accesses: ``` #### `min_bb_per_function` : :material-water: `1` Minimum number of basic blocks per function in generated programs. === "Syntax" ```yaml min_bb_per_function: ``` #### `max_bb_per_function` : :material-water: `2` Maximum number of basic blocks per function in generated programs. === "Syntax" ```yaml max_bb_per_function: ``` #### `min_successors_per_bb` : :material-water: `2` Minimum number of successors for each basic block in generated programs. !!! note "Hint, not a rule" This option is a *hint*; it could be overridden: * if the instruction set does not have the necessary instructions to satisfy it; * if a certain number of successors is required for correctness; * if `min_successors_per_bb` > `max_successors_per_bb` (the value is then overridden with `max_successors_per_bb`). === "Syntax" ```yaml min_successors_per_bb: ``` #### `max_successors_per_bb` : :material-water: `2` Maximum number of successors for each basic block in generated programs. !!! note "Hint, not a rule" This option is a *hint*; it could be overridden: * if the instruction set does not have the necessary instructions to satisfy it; * if a certain number of successors is required for correctness. === "Syntax" ```yaml max_successors_per_bb: ``` #### `register_allowlist` : :material-water: `[]` A list of registers that **can** be used for generating programs. !!! info "Priority" This list has higher priority than `register_blocklist`. The resulting list is: `(all registers - register_blocklist) + register_allowlist`. === "Syntax" ```yaml register_allowlist: - - ... ``` === "Available Options" Any register supported by the target CPU. #### `register_blocklist` : :octicons-cpu-24: A list of registers that will **not** be used for generating programs. !!! info "Priority" This list has lower priority than `register_allowlist`. The resulting list is: `(all registers - register_blocklist) + register_allowlist`. !!!
warning "Danger Zone" The default value of this option includes registers that are reserved for internal use by the executor, and thus should not be used by generated programs. Modifying this option may lead to a full system crash. === "Syntax" ```yaml register_blocklist: - - ... ``` === "Available Options" Any register supported by the target CPU. #### `faults_allowlist` : :material-water: `[]` By default, the generator produces programs that never trigger exceptions. This option modifies this behavior by permitting the generator to produce 'unsafe' instruction sequences that could potentially trigger an exception. The model and executor will also be configured to handle these exceptions gracefully. === "Syntax" ```yaml faults_allowlist: - - ... ``` === "Available Options" `div-by-zero` | `div-overflow` | `opcode-undefined` | `breakpoint` | `debug-register` | `non-canonical-access` | `user-to-kernel-access` === "Options Explained" * `div-by-zero` - generate divisions with an unmasked divisor, which can cause a division-by-zero exception. * `div-overflow` - generate divisions with an unmasked dividend, which can cause an overflow exception. * `opcode-undefined` - generate undefined opcodes, which can cause an undefined-opcode exception. * `breakpoint` - generate breakpoints, which can cause INT3 exceptions. * `debug-register` - generate instructions that cause INT1 exceptions. * `non-canonical-access` - randomly select a memory access in a generated program and instrument it to access a non-canonical address. * `user-to-kernel-access` - randomly select memory access instructions in user-privilege actors and instrument them to access the memory of the kernel actor (actor 0). This creates cross-privilege-level memory access patterns useful for detecting CPU vulnerabilities like Meltdown. Requires at least one actor with `privilege_level: user`. The instrumentation modifies both the memory operands and the sandboxing masks to ensure that the accesses target the kernel's FAULTY data area.
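For reference on `non-canonical-access`: on x86-64 with 48-bit virtual addresses (4-level paging), an address is canonical only if bits 63 through 47 are all equal; with 5-level paging the boundary moves to bit 56. The Python helpers below are hypothetical and only illustrate the rule. How Revizor's instrumentation actually chooses the non-canonical address is not described on this page.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) of a 64-bit address are all equal."""
    addr &= (1 << 64) - 1
    upper = addr >> (va_bits - 1)  # the bits that must be a sign extension
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper in (0, all_ones)


def make_non_canonical(addr: int) -> int:
    """Flip bit 63; for a canonical address the upper bits then disagree."""
    return addr ^ (1 << 63)
```

For example, `0x00007FFFFFFFFFFF` and `0xFFFF800000000000` are the last lower-half and first upper-half canonical addresses, while `0x0000800000000000` falls in the non-canonical gap between them. For 5-level paging, call `is_canonical(addr, va_bits=57)`.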
## Actor Configuration All actors are defined in the `actors` list, with the following syntax: ```yaml actors: - : : : : : ... - : ... ... ``` The following options are available for each actor: #### `mode` : :material-water: `host` The execution mode of the actor. === "Syntax" ```yaml actors: - : mode: ``` === "Available Options" `host` | `guest` === "Options Explained" * `host` - the actor runs in the normal, non-virtualized mode. * `guest` - the actor runs in a VM (one VM per actor). #### `privilege_level` : :material-water: `kernel` The privilege level of the actor. === "Syntax" ```yaml actors: - : privilege_level: ``` === "Available Options" `user` | `kernel` === "Options Explained" * `user` - the actor runs in user mode (CPL=3). * `kernel` - the actor runs in kernel mode (CPL=0). #### `data_properties` : :material-water: (see below) The properties of the data memory used by the actor. These properties are applied only to the faulty page of the actor's data region (see [sandbox](../ref/sandbox.md) for details). : Note that the above properties are set in the host page tables for actors with `mode: host`, and in the guest page tables for actors with `mode: guest`. === "Syntax" ```yaml actors: - : data_properties: present: writable: user: accessed: dirty: executable: reserved_bit: randomized: ``` === "Available Options" `present` | `writable` | `user` | `accessed` | `dirty` | `executable` | `reserved_bit` | `randomized` === "Options Explained" * `present` [default: True] - the value of the Present bit in the page table entry. * `writable` [default: True] - the value of the Writable bit in the page table entry. * `user` [default: False] - the value of the User/Supervisor bit in the page table entry. * `accessed` [default: True] - the value of the Accessed bit in the page table entry. * `dirty` [default: True] - the value of the Dirty bit in the page table entry. * `executable` [default: False] - the value of the Executable bit in the page table entry. 
* `reserved_bit` [default: False] - the value of the Reserved bit in the page table entry. * `randomized` [default: False] - if true, the values of the above properties will be randomized for each test case. #### `data_ept_properties` : :material-water: (see below) The properties of the EPT entry used by the actor (on Intel) or the NPT entry (on AMD). These properties are applied only to the faulty page of the actor's data region (see [sandbox](../ref/sandbox.md) for details). : This option has no effect on actors with `mode: host`. === "Syntax" ```yaml actors: - : data_ept_properties: present: writable: executable: accessed: dirty: user: reserved_bit: randomized: ``` === "Available Options" `present` | `writable` | `executable` | `accessed` | `dirty` | `user` | `reserved_bit` | `randomized` === "Options Explained" * `present` [default: True] - the value of the Present bit in the EPT/NPT entry. * `writable` [default: True] - the value of the Writable bit in the EPT/NPT entry. * `executable` [default: False] - the value of the Executable bit in the EPT/NPT entry. * `accessed` [default: True] - the value of the Accessed bit in the EPT/NPT entry. * `dirty` [default: True] - the value of the Dirty bit in the EPT/NPT entry. * `user` [default: False] - the value of the User/Supervisor bit in the EPT/NPT entry. * `reserved_bit` [default: False] - the value of the Reserved bit in the EPT/NPT entry. * `randomized` [default: False] - if true, the values of the above properties will be randomized for each test case. #### `observer` : :material-water: `False` If enabled, the actor acts as an observer, i.e., it models an attacker. This option is only used with the noninterference contract (observation clause `ct-ni`), and it is ignored otherwise. === "Syntax" ```yaml actors: - : observer: ``` #### `instruction_blocklist` : :material-water: `[]` Actor-specific instruction blocklist.
: This option is useful when writing a test case template that uses multiple actors, where some actors should use a different set of instructions than the others. For example, privileged instructions can be blocked for low-privilege actors. !!! info "Priority" This list has priority over the global `instruction_blocklist` and modifies the instruction pool for the specific actor. === "Syntax" ```yaml actors: - : instruction_blocklist: - - ... ``` #### `fault_blocklist` : :material-water: `[]` Actor-specific fault blocklist. : For example, when using `user-to-kernel-access`, you typically want to add it to the kernel actor's `fault_blocklist` so that the kernel actor's own memory accesses are not instrumented (an access by the kernel to its own memory would not be a cross-privilege access). !!! info "Priority" This list has priority over the global `faults_allowlist` and modifies the fault-inducing instrumentation for the specific actor. === "Syntax" ```yaml actors: - : fault_blocklist: - - ... ``` === "Available Options" See [`faults_allowlist`](#faults_allowlist) for the list of available faults. ## Data Generator Configuration #### `data_generator` : :material-water: `random` Select the method of test case data generation. === "Syntax" ```yaml data_generator: ``` === "Available Options" `random` === "Options Explained" * `random` - generate random input data for the test cases. This is the only supported option at the moment. #### `data_generator_seed` : :material-water: `10` Seed of the test case data generator. If set to zero, a random seed will be used for each run. === "Syntax" ```yaml data_generator_seed: ``` #### `data_generator_entropy_bits` : :material-water: `31` Entropy, in bits, of the random values created by the data generator.
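One plausible reading of `data_generator_entropy_bits`, assumed here for illustration rather than taken from Revizor's source, is that each generated value carries that many random low-order bits, with the remaining bits set to zero. Under that reading, lowering the value shrinks the space of possible values and makes collisions between inputs more likely. `random_value` is a hypothetical helper.

```python
import random


def random_value(entropy_bits: int, rng: random.Random) -> int:
    """Draw a value with `entropy_bits` random low-order bits (upper bits zero)."""
    if not 1 <= entropy_bits <= 31:
        raise ValueError("data_generator_entropy_bits must be in [1, 31]")
    return rng.getrandbits(entropy_bits)


rng = random.Random(10)  # fixed seed for reproducibility
values = [random_value(4, rng) for _ in range(1000)]
```

With 4 bits of entropy, only 16 distinct values are possible, so repeated values across 1000 draws are unavoidable; with the default of 31 bits, there are more than two billion possibilities.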
=== "Syntax" ```yaml data_generator_entropy_bits: ``` === "Allowed Values" Integer in the range `[1, 31]` #### `input_gen_probability_of_special_value` : :material-water: `0.05` When set to a non-zero value, the data generator will occasionally produce special values (such as zero or MAX_INT) alongside random values, with the frequency controlled by this probability. These special values help exercise fast-path optimizations in the microarchitecture. === "Syntax" ```yaml input_gen_probability_of_special_value: ``` === "Allowed Values" Float in the range `[0.0, 1.0]` #### `inputs_per_class` : :material-water: `2` Number of inputs generated for each input class via input boosting (aka Contract-Driven Input Generation). For the explanation of the input classes and the generation algorithm, see [this paper](https://arxiv.org/pdf/2301.07642), Section 4.D. Contract-driven Input Generator. === "Syntax" ```yaml inputs_per_class: ``` ## Contract Configuration #### `contract_execution_clause` : :material-water: `['seq']` The execution clause of the contract. Multiple clauses can be combined to form a more permissive contract. === "Syntax" ```yaml contract_execution_clause: - ``` === "Available Options" `seq` | `no_speculation` | `seq-assist` | `cond` | `conditional_br_misprediction` | `bpas` | `nullinj-fault` | `nullinj-assist` | `delayed-exception-handling` | `div-zero` | `div-overflow` | `meltdown` | `fault-skip` | `noncanonical` | `vspec-ops-div` | `vspec-ops-memory-faults` | `vspec-ops-memory-assists` | `vspec-ops-gp` | `vspec-all-div` | `vspec-all-memory-faults` | `vspec-all-memory-assists` === "Options Explained" * `seq` - sequential execution. * `no_speculation` - sequential execution. Synonym for `seq`. * `seq-assist` - sequential execution with possible microcode assists. * `cond` - permitted misprediction of conditional branches. * `conditional_br_misprediction` - permitted misprediction of conditional branches. Synonym for `cond`. 
* `bpas` - permitted speculative store bypass. * `nullinj-fault` - page faults are permitted to speculatively return zero. * `nullinj-assist` - microcode assists are permitted to speculatively return zero. * `delayed-exception-handling` - upon an exception or a fault, data-independent instructions that follow the exception are allowed to execute speculatively. * `meltdown` - permission-based page faults are permitted to speculatively return the value stored in memory. * `fault-skip` - upon a fault, the faulting instruction is speculatively skipped. * `noncanonical` - permitted speculative non-canonical memory accesses. * `vspec*` - experimental contracts for value speculation. See [this paper](https://www.usenix.org/system/files/usenixsecurity23-hofmann.pdf) for details. * `div-zero` - experimental contract; do not use. * `div-overflow` - experimental contract; do not use. #### `contract_observation_clause` : :material-water: `ct` The observation clause of the contract. In most cases, the default value should be used. === "Syntax" ```yaml contract_observation_clause: ``` === "Available Options" `none` | `l1d` | `memory` | `pc` | `ct` | `loads+stores+pc` | `ct-nonspecstore` | `ctr` | `arch` | `tct` | `tcto` | `ct-ni` === "Options Explained" * `none` - the model observes nothing. Useful for testing the fuzzer. * `l1d` - the model observes the addresses of data accesses, adjusted to imitate the L1D cache trace. It has very few real applications and should generally be avoided. * `memory` - the model observes the addresses of data accesses. * `ct` (constant time tracer) - the model observes the addresses of data accesses and the control flow. * `loads+stores+pc` - the model observes the addresses of data accesses and the control flow. Synonym for `ct`. * `ct-nonspecstore` - the model observes the addresses of data accesses and the control flow, but does not observe the addresses of stores during speculation.
* `ctr` - the model observes the addresses of data accesses and the control flow, as well as the values of the general-purpose registers. * `arch` - the model observes the addresses of data accesses and the control flow, as well as the values loaded from memory. This clause imitates the security guarantees provided by secure speculation mechanisms like STT. * `tct` (truncated constant time tracer) - the model observes the addresses of memory accesses and the program counter at cache-line granularity. * `tcto` (truncated constant time tracer with overflows) - the model observes the addresses of memory accesses and the program counter at cache-line granularity, and additionally observes cache line overflows. * `ct-ni` - (only available in a multi-actor context) when executing actors with `observer: false`, the model observes the same data as with `ct`. When executing actors with `observer: true`, the model observes the complete memory of the actor as well as its register values. #### `model_backend` : :material-water: `unicorn` The backend used to implement the contract model. === "Syntax" ```yaml model_backend: ``` === "Available Options" `dummy` | `unicorn` | `dynamorio` === "Options Explained" * `unicorn` - use the Unicorn-based implementation of the model. This backend is more mature and feature-rich, but it supports a considerably smaller set of instructions than DynamoRIO (essentially, only the base x86 or ARM instruction sets, without any extensions). * `dynamorio` - use the DynamoRIO-based implementation of the model. This backend is less mature and supports fewer contracts and features than Unicorn, but it can handle a much larger set of instructions, including complex extensions like AVX-512 on x86-64. It is also generally faster than Unicorn, especially when testing large test cases or running with many inputs per test case. * `dummy` - use a dummy model. This model always returns the same (empty) contract trace and, as such, will not produce meaningful results.
This option is useful, however, when root-causing violations, because it allows collecting hardware traces without running the model, which makes it possible to trace instructions that are not supported by any of the backends. #### `model_min_nesting` : :material-water: `1` Minimum number of nested mispredictions in the model. This value is used to generate the contract traces on the fast path of the fuzzer. Choose a small value when speculation is rare, and a larger value when speculation is common. : This option is a pure performance optimization. It only impacts the speed of fuzzing, not its correctness. === "Syntax" ```yaml model_min_nesting: ``` #### `model_max_nesting` : :material-water: `30` Maximum number of nested mispredictions in the model. This value is used to generate the contract traces on the slow path of the fuzzer, i.e., when a potential violation is detected and the fuzzer tries to check if it is a true positive. : In contrast to `model_min_nesting`, this option could cause false positives if set too low. Thus, it is advisable to set it to a value high enough to cover all possible nested mispredictions in the test cases. Leave the default unless you are sure that a lower value is sufficient. === "Syntax" ```yaml model_max_nesting: ``` #### `model_max_spec_window` : :material-water: `250` Size of the speculation window in the model. : This option sets a trade-off between accuracy and performance. A larger speculation window avoids potential false positives due to inaccurate modelling of the speculation, but it also slows down the model execution. Leave the default unless you are sure that a different value is needed. === "Syntax" ```yaml model_max_spec_window: ``` ## Executor Configuration #### `executor` : :octicons-cpu-24: ISA-specific version of the executor to use. The default value is auto-detected based on `cpuinfo`; change it only if the auto-detection fails.
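As a rough illustration of what detection "based on `cpuinfo`" can look like, the Python sketch below maps fields of Linux's `/proc/cpuinfo` to one of the `executor` values. The helper name `guess_executor` and the exact mapping are assumptions made for this example, not Revizor's own detection code. The vendor strings `GenuineIntel` and `AuthenticAMD` are the standard x86 `vendor_id` values, and the arm64 Linux kernel reports `CPU architecture: 8`.

```python
from typing import Optional


def guess_executor(cpuinfo: str) -> Optional[str]:
    """Map the text of /proc/cpuinfo to an `executor` config value.

    Hypothetical helper for illustration only; Revizor's actual
    auto-detection logic may differ.
    """
    for line in cpuinfo.splitlines():
        # Lines look like "vendor_id\t: GenuineIntel"
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "vendor_id":
            if value == "GenuineIntel":
                return "x86-64-intel"
            if value == "AuthenticAMD":
                return "x86-64-amd"
        # arm64 kernels have no vendor_id field but print "CPU architecture: 8"
        if key == "CPU architecture" and value == "8":
            return "arm64"
    # Unknown CPU: this is the case where `executor` must be set explicitly
    return None
```

On a Linux host, calling `guess_executor(open("/proc/cpuinfo").read())` returns the guessed value, and a `None` result corresponds to the situation in which the option should be set manually in the config file.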
=== "Syntax" ```yaml executor: ``` === "Available Options" `x86-64-intel` | `x86-64-amd` | `arm64` #### `executor_mode` : :material-water: `P+P` Method of collecting hardware traces in the executor. The method determines the contents of hardware traces. === "Syntax" ```yaml executor_mode: ``` === "Available Options" `P+P` | `F+R` | `E+R` | `PP+P` | `TSC` === "Options Explained" * `P+P` - prime and probe side-channel attack. The hardware traces contain the cache sets that were accessed during the execution of the test case. * `F+R` - flush and reload side-channel attack. The hardware traces contain the memory addresses that were accessed during the execution of the test case. * `E+R` - evict and reload side-channel attack. The hardware traces contain the cache sets that were accessed during the execution of the test case. * `PP+P` - partial prime and probe (i.e., leave a subset of cache lines unprimed). The hardware traces contain the cache sets that were accessed during the execution of the test case. * `TSC` - use the `RDTSCP` instruction to measure the execution time of test cases. The hardware traces contain the execution time, in cycles. #### `executor_warmups` : :material-water: `5` Number of warmup rounds executed before starting to collect hardware traces. === "Syntax" ```yaml executor_warmups: ``` #### `executor_sample_sizes` : :material-water: `[10, 50, 100, 500]` A list of sample sizes to be used during the measurements. !!! info "Clarification" The executor normally performs measurements multiple times for each test case in order to collect a sample of hardware traces. This allows Revizor to tolerate noise and non-determinism in the measurements by applying statistical methods for comparing the traces. For performance reasons, Revizor does not immediately use a large sample size. Instead, it starts with a small sample, collects the traces, and checks if a violation is detected.
If no violation is detected, the executor assumes that the test case is safe, and moves on to the next one. If a violation is detected, however, the executor tries to reproduce it with larger sample sizes. This option defines the list of sample sizes through which Revizor will iterate in this process. To make it sensible, the list should be sorted in ascending order with a reasonable gap between the sizes. === "Syntax" ```yaml executor_sample_sizes: - - ... ``` #### `executor_filtering_repetitions` : :material-water: `10` The sample size to be used by the speculation and observation filters. === "Syntax" ```yaml executor_filtering_repetitions: ``` #### `executor_taskset` : :material-water: `0` The CPU core ID which the executor will use for running test cases. That is, the executor process will be pinned to this core. === "Syntax" ```yaml executor_taskset: ``` #### `enable_pre_run_flush` : :material-water: `True` If enabled, the executor will do its best to flush the microarchitectural state before running test cases. === "Syntax" ```yaml enable_pre_run_flush: ``` ## Analyser Configuration #### `analyser` : :material-water: `chi2` The type of the analyser that is used to compare hardware traces against contract traces. === "Syntax" ```yaml analyser: ``` === "Available Options" `chi2` | `mwu` | `sets` | `bitmaps` === "Options Explained" * `sets` - combine the hardware traces for each input into a set. A violation is reported if two inputs in the same contract-equivalence class have different sets of hardware traces. * `bitmaps` - combine the hardware traces for each input into a bitmap. A violation is reported if two inputs in the same contract-equivalence class have different bitmaps of hardware traces. * `chi2` - use the chi-squared homogeneity test to compare the hardware traces of inputs in the same contract-equivalence class. This test effectively checks if the hardware traces from two different inputs come from the same distribution. 
A violation is reported if the test fails. * `mwu` - [experimental; both false positives and negatives are possible] use the Mann-Whitney U test to compare the hardware traces of inputs in the same contract-equivalence class. This test effectively checks if the hardware traces from two different inputs come from the same distribution. A violation is reported if the test fails. #### `analyser_subsets_is_violation` : :material-water: `False` This option is relevant only for the `sets` and `bitmaps` analysers. If enabled, the analyser will not label hardware traces as mismatching if they form a subset relation. === "Syntax" ```yaml analyser_subsets_is_violation: ``` #### `analyser_outliers_threshold` : :material-water: `0.1` This option is relevant only for the `sets` and `bitmaps` analysers. The analyser will ignore the hardware traces that appear in less than this fraction of the sampled traces. === "Syntax" ```yaml analyser_outliers_threshold: ``` #### `analyser_stat_threshold` : :material-water: `0.5` This option is relevant only for the `chi2` and `mwu` analysers. The threshold for the statistical tests. If the (normalized) test statistic for a pair of hardware traces is below the threshold, the traces are considered equivalent. : For the chi2 test, the threshold is applied to `statistics / (len(htrace1) + len(htrace2))`. : For the mwu test, the threshold is applied to the p-value. === "Syntax" ```yaml analyser_stat_threshold: ``` ## Miscellaneous Configuration #### `coverage_type` : :material-water: `none` The type of coverage tracking. === "Syntax" ```yaml coverage_type: ``` === "Available Options" `none` | `model_instructions` === "Options Explained" * `none` - disable coverage tracking. * `model_instructions` - track how many times the model executed each instruction in the target ISA. #### `minimizer_retries` : :material-water: `1` Number of minimization retries.
When the minimizer performs a check to reduce a test case, the check is attempted this number of times, and it succeeds if at least one attempt is successful. === "Syntax" ```yaml minimizer_retries: ``` ## Unique x86-64 Options #### `x86_executor_enable_ssbp_patch` : :material-water: `True` Enable a microcode patch against Speculative Store Bypass, if available. === "Syntax" ```yaml x86_executor_enable_ssbp_patch: ``` #### `x86_executor_enable_prefetcher` : :material-water: `False` Enable all prefetchers, if the software controls are available. === "Syntax" ```yaml x86_executor_enable_prefetcher: ``` #### `x86_disable_div64` : :material-water: `True` Do not generate 64-bit division instructions. Useful for avoiding certain types of speculation that are specific to 64-bit division. === "Syntax" ```yaml x86_disable_div64: ``` #### `x86_enable_hpa_gpa_collisions` : :material-water: `False` When a test case contains at least one guest actor, allocate its memory in the guest physical address space to match the corresponding host physical addresses of the main actor. Useful for testing Foreshadow-like leaks. === "Syntax" ```yaml x86_enable_hpa_gpa_collisions: ``` #### `x86_generator_align_locks` : :material-water: `True` When generating memory accesses with locks, apply instrumentation to align the locks to 8 bytes. Useful for avoiding faults on unaligned accesses. === "Syntax" ```yaml x86_generator_align_locks: ``` --- ## What's Next?
- [Command Line Interface](cli.md) - CLI options and modes - [demo/big-fuzz.yaml](https://github.com/microsoft/side-channel-fuzzer/tree/main/demo/big-fuzz.yaml) - Comprehensive example configuration - [demo/](https://github.com/microsoft/side-channel-fuzzer/tree/main/demo/) - Example configurations for various scenarios ================================================ FILE: docs/ref/index.md ================================================ # Reference Documentation Complete technical reference for all Revizor components, commands, configuration options, and formats. ## User-Facing Components * [Command Line Interface](cli.md) Complete reference for all `rvzr` command-line options and arguments. Covers common options and mode-specific parameters. * [Execution Modes](modes.md) Detailed specifications for all execution modes: fuzzing, template fuzzing, reproduce, minimize, analyse, generate, and download_spec. * [Configuration Options](config.md) Complete reference for all configuration file parameters organized by component: fuzzer, generator, executor, model, analyser, and actors. * [Macros Reference](macros.md) Complete reference for all template macros including measurement control, fault handling, code generation, and actor transitions. * [Minimization Passes](minimization-passes.md) Complete list of available minimization passes for reducing test case complexity while preserving violations. ## Architecture & Internals Low-level technical references for Revizor's internal components. * [Binary Formats](binary-formats.md) Specifications for Revizor's binary file formats: RCBF (Revizor Contract Binary Format) and RDBF (Revizor DynamoRIO Binary Format). * [Registers](registers.md) Register specifications and conventions for x86-64 and ARM64 architectures. * [Sandbox](sandbox.md) Memory layout and sandboxing mechanisms used during test execution. 
================================================ FILE: docs/ref/macros.md ================================================ # Macros This document provides a complete reference for all macros available in Revizor. !!! note "Related Documentation" This document is intended as a reference; if you're looking for a practical guide on how to use macros, please refer to [How-To: Use Macros](../howto/use-macros.md). ## Overview Macros are special pseudo-instructions in assembly test cases that appear as labels with the `.macro` prefix. They are dynamically expanded into actual implementations during execution by the model and executor. Macros enable complex operations like domain transitions, measurement control, and random code generation within test cases. Macros accept up to four static arguments. Arguments are strictly static (either a constant integer or a string); dynamic values (registers, memory addresses) are not supported. === "Syntax" ```assembly .macro.....: ``` === "Example" ```assembly ; Macro to switch execution to ; a function called `function_user_0` that belongs to the actor `user` .macro.switch.user.function_user_0: ``` ## Measurement Macros Control the start and end of hardware and contract trace collection. #### `measurement_start` : Begins hardware and contract trace collection. Instructions before this macro are executed but not included in the contract/hardware traces. === "Syntax" ```assembly .macro.measurement_start: ; alternative .macro.measurement_start.