Repository: N64Recomp/N64Recomp
Branch: main
Commit: 81213c1831fa
Files: 30
Total size: 629.4 KB

Directory structure:
gitextract_8ah6s3g4/
├── .github/
│   └── workflows/
│       └── validate.yml
├── .gitignore
├── .gitmodules
├── CMakeLists.txt
├── LICENSE
├── LiveRecomp/
│   ├── live_generator.cpp
│   └── live_recompiler_test.cpp
├── OfflineModRecomp/
│   └── main.cpp
├── README.md
├── RSPRecomp/
│   └── src/
│       └── rsp_recomp.cpp
├── RecompModMerger/
│   └── main.cpp
├── RecompModTool/
│   └── main.cpp
├── include/
│   ├── recomp.h
│   └── recompiler/
│       ├── context.h
│       ├── generator.h
│       ├── live_recompiler.h
│       └── operations.h
└── src/
    ├── analysis.cpp
    ├── analysis.h
    ├── cgenerator.cpp
    ├── config.cpp
    ├── config.h
    ├── elf.cpp
    ├── main.cpp
    ├── mdebug.cpp
    ├── mdebug.h
    ├── mod_symbols.cpp
    ├── operations.cpp
    ├── recompilation.cpp
    └── symbol_lists.cpp

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/validate.yml
================================================
name: validate
on:
  push:
    branches:
      - main
  pull_request:
    types: [opened, synchronize]
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        type: [ Debug, Release ]
        # macos-15-intel is intel, macos-14 is arm, blaze/ubuntu-22.04 is arm
        os: [ ubuntu-latest, windows-latest, macos-15-intel, macos-14, blaze/ubuntu-22.04 ]
    name: ${{ matrix.os }} (${{ (matrix.os == 'macos-14' || matrix.os == 'blaze/ubuntu-22.04') && 'arm64' || 'x64' }}, ${{ matrix.type }})
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          submodules: true
      - name: ccache
        uses: hendrikmuhs/ccache-action@v1.2
        with:
          key: ${{ matrix.os }}-N64Recomp-ccache-${{ matrix.type }}
      - name: Install Windows Dependencies
        if: runner.os == 'Windows'
        run: |
          choco install ninja
          Remove-Item -Path "C:\ProgramData\Chocolatey\bin\ccache.exe" -Force -ErrorAction SilentlyContinue
      - name: Install Linux Dependencies
        if: runner.os == 'Linux'
        run: |
          sudo apt-get update
          sudo apt-get install -y ninja-build
      - name: Install macOS Dependencies
        if: runner.os == 'macOS'
        run: |
          brew install ninja
      - name: Configure Developer Command Prompt
        if: runner.os == 'Windows'
        uses: ilammy/msvc-dev-cmd@v1
      - name: Build N64Recomp (Unix)
        if: runner.os != 'Windows'
        run: |-
          # enable ccache
          export PATH="/usr/lib/ccache:/usr/local/opt/ccache/libexec:$PATH"
          
          cmake -DCMAKE_BUILD_TYPE=${{ matrix.type }} -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_MAKE_PROGRAM=ninja -G Ninja -S . -B cmake-build
          cmake --build cmake-build --config ${{ matrix.type }} -j $(nproc)
      - name: Upload Build Artifacts (Unix)
        if: runner.os != 'Windows'
        uses: actions/upload-artifact@v4
        with:
          name: N64Recomp-${{ runner.os }}-${{ runner.arch }}-${{ matrix.type }}
          path: |
            ./cmake-build/LiveRecompTest
            ./cmake-build/N64Recomp
            ./cmake-build/OfflineModRecomp
            ./cmake-build/RecompModTool
            ./cmake-build/RSPRecomp
      - name: Build N64Recomp (Windows)
        if: runner.os == 'Windows'
        run: |-
          # enable ccache
          $env:PATH = "$env:USERPROFILE/.cargo/bin;$env:PATH"

          cmake -DCMAKE_BUILD_TYPE=${{ matrix.type }} -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_MAKE_PROGRAM=ninja -G Ninja -S . -B cmake-build
          cmake --build cmake-build --config ${{ matrix.type }}
      - name: Upload Build Artifacts (Windows)
        if: runner.os == 'Windows'
        uses: actions/upload-artifact@v4
        with:
          name: N64Recomp-${{ runner.os }}-${{ runner.arch }}-${{ matrix.type }}
          path: |
            ./cmake-build/LiveRecompTest.exe
            ./cmake-build/N64Recomp.exe
            ./cmake-build/OfflineModRecomp.exe
            ./cmake-build/RecompModTool.exe
            ./cmake-build/RSPRecomp.exe


================================================
FILE: .gitignore
================================================
# VSCode file settings
.vscode/settings.json
.vscode/c_cpp_properties.json

# Input elf and rom files
*.elf
*.z64

# Local working data
tests

# Linux build output
build/
*.o

# Windows build output
*.exe

# User-specific files
*.rsuser
*.suo
*.user
*.userosscache
*.sln.docstates

# Build results
[Dd]ebug/
[Dd]ebugPublic/
[Rr]elease/
[Rr]eleases/
x64/
x86/
[Ww][Ii][Nn]32/
[Aa][Rr][Mm]/
[Aa][Rr][Mm]64/
bld/
[Bb]in/
[Oo]bj/
[Ll]og/
[Ll]ogs/

# Visual Studio 2015/2017 cache/options directory
.vs/

# Runtime files
imgui.ini
rt64.log
.idea
cmake-build*
.DS_Store


================================================
FILE: .gitmodules
================================================
[submodule "lib/rabbitizer"]
	path = lib/rabbitizer
	url = https://github.com/Decompollaborate/rabbitizer
[submodule "lib/ELFIO"]
	path = lib/ELFIO
	url = https://github.com/serge1/ELFIO
[submodule "lib/fmt"]
	path = lib/fmt
	url = https://github.com/fmtlib/fmt
[submodule "lib/tomlplusplus"]
	path = lib/tomlplusplus
	url = https://github.com/marzer/tomlplusplus
[submodule "lib/sljit"]
	path = lib/sljit
	url = https://github.com/zherczeg/sljit


================================================
FILE: CMakeLists.txt
================================================
cmake_minimum_required(VERSION 3.20)
set(CMAKE_C_STANDARD 17)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
# set(CMAKE_CXX_VISIBILITY_PRESET hidden)

# Rabbitizer
project(rabbitizer)
add_library(rabbitizer STATIC)

target_sources(rabbitizer PRIVATE
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/cplusplus/src/analysis/LoPairingInfo.cpp"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/cplusplus/src/analysis/RegistersTracker.cpp"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/cplusplus/src/instructions/InstrId.cpp"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/cplusplus/src/instructions/InstrIdType.cpp"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/cplusplus/src/instructions/InstructionBase.cpp"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/cplusplus/src/instructions/InstructionCpu.cpp"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/cplusplus/src/instructions/InstructionR3000GTE.cpp"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/cplusplus/src/instructions/InstructionR5900.cpp"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/cplusplus/src/instructions/InstructionRsp.cpp"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/analysis/RabbitizerLoPairingInfo.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/analysis/RabbitizerRegistersTracker.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/analysis/RabbitizerTrackedRegisterState.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/common/RabbitizerConfig.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/common/RabbitizerVersion.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/common/Utils.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstrCategory.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstrDescriptor.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstrId.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstrIdType.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstrSuffix.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstructionCpu/RabbitizerInstructionCpu_OperandType.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstructionR3000GTE/RabbitizerInstructionR3000GTE.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstructionR3000GTE/RabbitizerInstructionR3000GTE_OperandType.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstructionR3000GTE/RabbitizerInstructionR3000GTE_ProcessUniqueId.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstructionR5900/RabbitizerInstructionR5900.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstructionR5900/RabbitizerInstructionR5900_OperandType.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstructionR5900/RabbitizerInstructionR5900_ProcessUniqueId.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstructionRsp/RabbitizerInstructionRsp.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstructionRsp/RabbitizerInstructionRsp_OperandType.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstructionRsp/RabbitizerInstructionRsp_ProcessUniqueId.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstruction/RabbitizerInstruction.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstruction/RabbitizerInstruction_Disassemble.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstruction/RabbitizerInstruction_Examination.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstruction/RabbitizerInstruction_Operand.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerInstruction/RabbitizerInstruction_ProcessUniqueId.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerRegister.c"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/src/instructions/RabbitizerRegisterDescriptor.c")

target_include_directories(rabbitizer PUBLIC
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/include"
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/cplusplus/include")

target_include_directories(rabbitizer PRIVATE
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/rabbitizer/tables")

# fmtlib
add_subdirectory(lib/fmt)

# tomlplusplus
set(TOML_ENABLE_FORMATTERS OFF)
add_subdirectory(lib/tomlplusplus)

# Hardcoded symbol lists (separate library to not force a dependency on N64Recomp)
project(SymbolLists)
add_library(SymbolLists)

target_sources(SymbolLists PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/src/symbol_lists.cpp
)

target_include_directories(SymbolLists PUBLIC
    "${CMAKE_CURRENT_SOURCE_DIR}/include"
)

# N64 recompiler core library
project(N64Recomp)
add_library(N64Recomp)

target_sources(N64Recomp PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/src/analysis.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/src/operations.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/src/cgenerator.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/src/recompilation.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/src/mod_symbols.cpp
)

target_include_directories(N64Recomp PUBLIC
    "${CMAKE_CURRENT_SOURCE_DIR}/include"
)

target_link_libraries(N64Recomp SymbolLists fmt rabbitizer tomlplusplus::tomlplusplus)

# N64 recompiler elf parsing
project(N64RecompElf)
add_library(N64RecompElf)

target_sources(N64RecompElf PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/src/elf.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/src/mdebug.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/src/symbol_lists.cpp
)

target_include_directories(N64RecompElf PUBLIC
    "${CMAKE_CURRENT_SOURCE_DIR}/include"
)

target_include_directories(N64RecompElf PRIVATE
    "${CMAKE_CURRENT_SOURCE_DIR}/lib/ELFIO"
)

target_link_libraries(N64RecompElf fmt)

# N64 recompiler executable
project(N64RecompCLI)
add_executable(N64RecompCLI)

target_sources(N64RecompCLI PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/src/config.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/src/main.cpp
)

target_include_directories(N64RecompCLI PRIVATE
    "${CMAKE_CURRENT_SOURCE_DIR}/include"
)

target_link_libraries(N64RecompCLI fmt rabbitizer tomlplusplus::tomlplusplus N64Recomp N64RecompElf)
set_target_properties(N64RecompCLI PROPERTIES OUTPUT_NAME N64Recomp)

# RSP recompiler
project(RSPRecomp)
add_executable(RSPRecomp)

target_include_directories(RSPRecomp PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}/include")

target_link_libraries(RSPRecomp fmt rabbitizer tomlplusplus::tomlplusplus)

target_sources(RSPRecomp PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/RSPRecomp/src/rsp_recomp.cpp)

# Mod tool
project(RecompModTool)
add_executable(RecompModTool)

target_sources(RecompModTool PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/src/config.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/src/mod_symbols.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/RecompModTool/main.cpp
)

target_include_directories(RecompModTool PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/lib/ELFIO
)

target_link_libraries(RecompModTool fmt tomlplusplus::tomlplusplus N64RecompElf)

# Offline mod recompiler
project(OfflineModRecomp)
add_executable(OfflineModRecomp)

target_sources(OfflineModRecomp PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/src/config.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/OfflineModRecomp/main.cpp
)

target_link_libraries(OfflineModRecomp fmt rabbitizer tomlplusplus::tomlplusplus N64Recomp)

# Mod combiner
project(RecompModMerger)
add_executable(RecompModMerger)

target_sources(RecompModMerger PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/src/config.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/RecompModMerger/main.cpp
)

target_link_libraries(RecompModMerger N64Recomp)

# Live recompiler
project(LiveRecomp)
add_library(LiveRecomp)

target_sources(LiveRecomp PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/LiveRecomp/live_generator.cpp
    ${CMAKE_CURRENT_SOURCE_DIR}/lib/sljit/sljit_src/sljitLir.c
)

target_include_directories(LiveRecomp PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/lib/sljit/sljit_src
)

target_link_libraries(LiveRecomp N64Recomp)

# Live recompiler test
project(LiveRecompTest)
add_executable(LiveRecompTest)

target_sources(LiveRecompTest PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/LiveRecomp/live_recompiler_test.cpp
)

target_include_directories(LiveRecompTest PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/lib/sljit/sljit_src
)

target_link_libraries(LiveRecompTest LiveRecomp)


================================================
FILE: LICENSE
================================================
The MIT License (MIT)

Copyright (c) 2024 Wiseguy

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.


================================================
FILE: LiveRecomp/live_generator.cpp
================================================
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

#include "fmt/format.h"
#include "fmt/ostream.h"

#include "recompiler/live_recompiler.h"
#include "recomp.h"

#include "sljitLir.h"

static_assert(sizeof(void*) >= sizeof(sljit_uw), "`void*` must be able to hold a `sljit_uw` value for rewritable jumps!");

constexpr uint64_t rdram_offset = 0xFFFFFFFF80000000ULL;

void N64Recomp::live_recompiler_init() {
    RabbitizerConfig_Cfg.pseudos.pseudoMove = false;
    RabbitizerConfig_Cfg.pseudos.pseudoBeqz = false;
    RabbitizerConfig_Cfg.pseudos.pseudoBnez = false;
    RabbitizerConfig_Cfg.pseudos.pseudoNot = false;
    RabbitizerConfig_Cfg.pseudos.pseudoBal = false;
}

namespace Registers {
    constexpr int rdram = SLJIT_S0; // stores (rdram - rdram_offset)
    constexpr int ctx = SLJIT_S1; // stores ctx
    constexpr int c1cs = SLJIT_S2; // stores the cop1 control/status value (Operand::Cop1cs)
    constexpr int hi = SLJIT_S3; // stores hi (Operand::Hi)
    constexpr int lo = SLJIT_S4; // stores lo (Operand::Lo)
    constexpr int arithmetic_temp1 = SLJIT_R0;
    constexpr int arithmetic_temp2 = SLJIT_R1;
    constexpr int arithmetic_temp3 = SLJIT_R2;
    constexpr int arithmetic_temp4 = SLJIT_R3;
}

struct InnerCall {
    size_t target_func_index;
    sljit_jump* jump;
};

struct ReferenceSymbolCall {
    N64Recomp::SymbolReference reference;
    sljit_jump* jump;
};

struct SwitchErrorJump {
    uint32_t instr_vram;
    uint32_t jtbl_vram;
    sljit_jump* jump;
};

struct N64Recomp::LiveGeneratorContext {
    std::string function_name;
    std::unordered_map<std::string, sljit_label*> labels;
    std::unordered_map<std::string, std::vector<sljit_jump*>> pending_jumps;
    std::vector<sljit_label*> func_labels;
    std::vector<InnerCall> inner_calls;
    std::vector<std::vector<std::string>> switch_jump_labels;
    // See LiveGeneratorOutput::jump_tables for info. Contains sljit labels so they can be linked after recompilation.
    std::vector<std::pair<std::vector<sljit_label*>, std::unique_ptr<void*[]>>> unlinked_jump_tables;
    // Jump tables for the current function being recompiled.
    std::vector<std::unique_ptr<void*[]>> pending_jump_tables;
    // See LiveGeneratorOutput::reference_symbol_jumps for info.
    std::vector<std::pair<ReferenceJumpDetails, sljit_jump*>> reference_symbol_jumps;
    // See LiveGeneratorOutput::import_jumps_by_index for info.
    std::unordered_multimap<size_t, sljit_jump*> import_jumps_by_index;
    std::vector<SwitchErrorJump> switch_error_jumps;
    sljit_jump* cur_branch_jump;
};

N64Recomp::LiveGenerator::LiveGenerator(size_t num_funcs, const LiveGeneratorInputs& inputs) : inputs(inputs) {
    compiler = sljit_create_compiler(nullptr);
    context = std::make_unique<LiveGeneratorContext>();
    context->func_labels.resize(num_funcs);
    errored = false;
}

N64Recomp::LiveGenerator::~LiveGenerator() {
    if (compiler != nullptr) {
        sljit_free_compiler(compiler);
        compiler = nullptr;
    }
}

N64Recomp::LiveGeneratorOutput N64Recomp::LiveGenerator::finish() {
    LiveGeneratorOutput ret{};
    if (errored) {
        ret.good = false;
        return ret;
    }
    
    ret.good = true;

    // Populate all the pending inner function calls.
    for (const InnerCall& call : context->inner_calls) {
        sljit_label* target_func_label = context->func_labels[call.target_func_index];

        // Generation isn't valid if the target function wasn't recompiled.
        if (target_func_label == nullptr) {
            return { };
        }

        sljit_set_label(call.jump, target_func_label);
    }

    // Generate the switch error jump targets and assign the jump labels.
    if (!context->switch_error_jumps.empty()) {
        // Allocate the function name and place it in the literals.
        char* func_name = new char[context->function_name.size() + 1];
        memcpy(func_name, context->function_name.c_str(), context->function_name.size());
        func_name[context->function_name.size()] = '\x00';
        ret.string_literals.emplace_back(func_name);

        std::vector<sljit_jump*> switch_error_return_jumps{};
        switch_error_return_jumps.resize(context->switch_error_jumps.size());

        // Generate and assign the labels for the switch error jumps.
        for (size_t i = 0; i < context->switch_error_jumps.size(); i++) {
            const auto& cur_error_jump = context->switch_error_jumps[i];

            // Generate a label and assign it to the jump.
            sljit_set_label(cur_error_jump.jump, sljit_emit_label(compiler));

            // Load the arguments (function name, vram, jump table address)
            sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, SLJIT_IMM, sljit_sw(func_name));
            sljit_emit_op1(compiler, SLJIT_MOV32, SLJIT_R1, 0, SLJIT_IMM, sljit_sw(cur_error_jump.instr_vram));
            sljit_emit_op1(compiler, SLJIT_MOV32, SLJIT_R2, 0, SLJIT_IMM, sljit_sw(cur_error_jump.jtbl_vram));
            
            // Call switch_error.
            sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS3V(P, 32, 32), SLJIT_IMM, sljit_sw(inputs.switch_error));

            // Jump to the return statement.
            switch_error_return_jumps[i] = sljit_emit_jump(compiler, SLJIT_JUMP);
        }

        // Generate the return statement.
        sljit_label* return_label = sljit_emit_label(compiler);
        sljit_emit_return_void(compiler);

        // Assign the label for all the return jumps.
        for (sljit_jump* cur_jump : switch_error_return_jumps) {
            sljit_set_label(cur_jump, return_label);
        }
    }
    context->switch_error_jumps.clear();

    // Generate the code.
    ret.code = sljit_generate_code(compiler, 0, NULL);
    ret.code_size = sljit_get_generated_code_size(compiler);
    ret.functions.resize(context->func_labels.size());

    // Get the function addresses.
    for (size_t func_index = 0; func_index < ret.functions.size(); func_index++) {
        sljit_label* func_label = context->func_labels[func_index];

        // If the function wasn't recompiled, don't populate its address.
        if (func_label != nullptr) {
            ret.functions[func_index] = reinterpret_cast<recomp_func_t*>(sljit_get_label_addr(func_label));
        }
    }
    context->func_labels.clear();

    // Get the reference symbol jump instruction addresses.
    ret.reference_symbol_jumps.resize(context->reference_symbol_jumps.size());
    for (size_t jump_index = 0; jump_index < context->reference_symbol_jumps.size(); jump_index++) {
        ReferenceJumpDetails& details = context->reference_symbol_jumps[jump_index].first;
        sljit_jump* jump = context->reference_symbol_jumps[jump_index].second;

        ret.reference_symbol_jumps[jump_index].first = details;
        ret.reference_symbol_jumps[jump_index].second = reinterpret_cast<void*>(jump->addr);
    }
    context->reference_symbol_jumps.clear();
    
    // Get the import jump instruction addresses.
    ret.import_jumps_by_index.reserve(context->import_jumps_by_index.size());
    for (auto& [jump_index, jump] : context->import_jumps_by_index) {
        ret.import_jumps_by_index.emplace(jump_index, reinterpret_cast<void*>(jump->addr));
    }
    context->import_jumps_by_index.clear();

    // Populate label addresses for the jump tables and place them in the output.
    for (auto& [labels, jump_table] : context->unlinked_jump_tables) {
        for (size_t entry_index = 0; entry_index < labels.size(); entry_index++) {
            sljit_label* cur_label = labels[entry_index];
            jump_table[entry_index] = reinterpret_cast<void*>(sljit_get_label_addr(cur_label));
        }
        ret.jump_tables.emplace_back(std::move(jump_table));
    }
    context->unlinked_jump_tables.clear();

    ret.executable_offset = sljit_get_executable_offset(compiler);

    sljit_free_compiler(compiler);
    compiler = nullptr;
    errored = false;

    return ret;
}

N64Recomp::LiveGeneratorOutput::~LiveGeneratorOutput() {
    if (code != nullptr) {
        sljit_free_code(code, nullptr);
        code = nullptr;
    }
}

size_t N64Recomp::LiveGeneratorOutput::num_reference_symbol_jumps() const {
    return reference_symbol_jumps.size();
}

void N64Recomp::LiveGeneratorOutput::set_reference_symbol_jump(size_t jump_index, recomp_func_t* func) {
    const auto& jump_entry = reference_symbol_jumps[jump_index];
    sljit_set_jump_addr(reinterpret_cast<sljit_uw>(jump_entry.second), reinterpret_cast<sljit_uw>(func), executable_offset);
}

N64Recomp::ReferenceJumpDetails N64Recomp::LiveGeneratorOutput::get_reference_symbol_jump_details(size_t jump_index) {
    return reference_symbol_jumps[jump_index].first;
}

void N64Recomp::LiveGeneratorOutput::populate_import_symbol_jumps(size_t import_index, recomp_func_t* func) {
    auto find_range = import_jumps_by_index.equal_range(import_index);
    for (auto it = find_range.first; it != find_range.second; ++it) {
        sljit_set_jump_addr(reinterpret_cast<sljit_uw>(it->second), reinterpret_cast<sljit_uw>(func), executable_offset);
    }
}

constexpr int get_gpr_context_offset(int gpr_index) {
    return offsetof(recomp_context, r0) + sizeof(recomp_context::r0) * gpr_index;
}

constexpr int get_fpr_single_context_offset(int fpr_index) {
    return offsetof(recomp_context, f0.fl) + sizeof(recomp_context::f0) * fpr_index;
}

constexpr int get_fpr_double_context_offset(int fpr_index) {
    return offsetof(recomp_context, f0.d) + sizeof(recomp_context::f0) * fpr_index;
}

constexpr bool is_fpr_u32l(N64Recomp::Operand operand) {
    return
        operand == N64Recomp::Operand::FdU32L ||
        operand == N64Recomp::Operand::FsU32L ||
        operand == N64Recomp::Operand::FtU32L;
}

constexpr void get_fpr_u32l_context_offset(int fpr_index, sljit_compiler* compiler, int odd_float_address_register, sljit_sw& out, sljit_sw& outw) {
    if (fpr_index & 1) {
        assert(compiler != nullptr);
        // Load ctx->f_odd into the address register.
        sljit_emit_op1(compiler, SLJIT_MOV_P, odd_float_address_register, 0, SLJIT_MEM1(Registers::ctx), offsetof(recomp_context, f_odd));
        // sljit_emit_op0(compiler, SLJIT_BREAKPOINT);
        out = SLJIT_MEM1(odd_float_address_register);
        // Set a memory offset of ((fpr_index - 1) * 2) * sizeof(*f_odd).
        outw = ((fpr_index - 1) * 2) * sizeof(*recomp_context::f_odd);
    }
    else {
        out = SLJIT_MEM1(Registers::ctx);
        outw = offsetof(recomp_context, f0.u32l) + sizeof(recomp_context::f0) * fpr_index;
    }
}

constexpr int get_fpr_u64_context_offset(int fpr_index) {
    return offsetof(recomp_context, f0.u64) + sizeof(recomp_context::f0) * fpr_index;
}

void get_gpr_values(int gpr, sljit_sw& out, sljit_sw& outw) {
    if (gpr == 0) {
        out = SLJIT_IMM;
        outw = 0;
    }
    else {
        out = SLJIT_MEM1(Registers::ctx);
        outw = get_gpr_context_offset(gpr);
    }
}

bool get_operand_values(N64Recomp::Operand operand, const N64Recomp::InstructionContext& context, sljit_sw& out, sljit_sw& outw,
    sljit_compiler* compiler, int odd_float_address_register
)
{
    using namespace N64Recomp;

    switch (operand) {
        case Operand::Rd:
            get_gpr_values(context.rd, out, outw);
            break;
        case Operand::Rs:
            get_gpr_values(context.rs, out, outw);
            break;
        case Operand::Rt:
            get_gpr_values(context.rt, out, outw);
            break;
        case Operand::Fd:
            out = SLJIT_MEM1(Registers::ctx);
            outw = get_fpr_single_context_offset(context.fd);
            break;
        case Operand::Fs:
            out = SLJIT_MEM1(Registers::ctx);
            outw = get_fpr_single_context_offset(context.fs);
            break;
        case Operand::Ft:
            out = SLJIT_MEM1(Registers::ctx);
            outw = get_fpr_single_context_offset(context.ft);
            break;
        case Operand::FdDouble:
            out = SLJIT_MEM1(Registers::ctx);
            outw = get_fpr_double_context_offset(context.fd);
            break;
        case Operand::FsDouble:
            out = SLJIT_MEM1(Registers::ctx);
            outw = get_fpr_double_context_offset(context.fs);
            break;
        case Operand::FtDouble:
            out = SLJIT_MEM1(Registers::ctx);
            outw = get_fpr_double_context_offset(context.ft);
            break;
        case Operand::FdU32L:
            get_fpr_u32l_context_offset(context.fd, compiler, odd_float_address_register, out, outw);
            break;
        case Operand::FsU32L:
            get_fpr_u32l_context_offset(context.fs, compiler, odd_float_address_register, out, outw);
            break;
        case Operand::FtU32L:
            get_fpr_u32l_context_offset(context.ft, compiler, odd_float_address_register, out, outw);
            break;
        case Operand::FdU32H:
            assert(false);
            return false;
        case Operand::FsU32H:
            assert(false);
            return false;
        case Operand::FtU32H:
            assert(false);
            return false;
        case Operand::FdU64:
            out = SLJIT_MEM1(Registers::ctx);
            outw = get_fpr_u64_context_offset(context.fd);
            break;
        case Operand::FsU64:
            out = SLJIT_MEM1(Registers::ctx);
            outw = get_fpr_u64_context_offset(context.fs);
            break;
        case Operand::FtU64:
            out = SLJIT_MEM1(Registers::ctx);
            outw = get_fpr_u64_context_offset(context.ft);
            break;
        case Operand::ImmU16:
            out = SLJIT_IMM;
            outw = (sljit_sw)(uint16_t)context.imm16;
            break;
        case Operand::ImmS16:
            out = SLJIT_IMM;
            outw = (sljit_sw)(int16_t)context.imm16;
            break;
        case Operand::Sa:
            out = SLJIT_IMM;
            outw = context.sa;
            break;
        case Operand::Sa32:
            out = SLJIT_IMM;
            outw = context.sa + 32;
            break;
        case Operand::Cop1cs:
            out = Registers::c1cs;
            outw = 0;
            break;
        case Operand::Hi:
            out = Registers::hi;
            outw = 0;
            break;
        case Operand::Lo:
            out = Registers::lo;
            outw = 0;
            break;
        case Operand::Zero:
            out = SLJIT_IMM;
            outw = 0;
            break;
    }
    return true;
}

bool outputs_to_zero(N64Recomp::Operand output, const N64Recomp::InstructionContext& ctx) {
    if (output == N64Recomp::Operand::Rd && ctx.rd == 0) {
        return true;
    }
    if (output == N64Recomp::Operand::Rt && ctx.rt == 0) {
        return true;
    }
    if (output == N64Recomp::Operand::Rs && ctx.rs == 0) {
        return true;
    }
    return false;
}

void N64Recomp::LiveGenerator::process_binary_op(const BinaryOp& op, const InstructionContext& ctx) const {
    // Skip instructions that output to $zero
    if (outputs_to_zero(op.output, ctx)) {
        return;
    }
    
    // Float u32l input operands are not allowed in a binary operation.
    if (is_fpr_u32l(op.operands.operands[0]) || is_fpr_u32l(op.operands.operands[1])) {
        assert(false);
        errored = true;
        return;
    }

    // A float u32l output operand is only allowed for lwc1, which has an op type of LW.
    if (is_fpr_u32l(op.output) && op.type != BinaryOpType::LW) {
        assert(false);
        errored = true;
        return;
    }

    sljit_sw dst;
    sljit_sw dstw;
    sljit_sw src1;
    sljit_sw src1w;
    sljit_sw src2;
    sljit_sw src2w;
    bool output_good = get_operand_values(op.output, ctx, dst, dstw, compiler, Registers::arithmetic_temp2);
    bool input0_good = get_operand_values(op.operands.operands[0], ctx, src1, src1w, nullptr, 0);
    bool input1_good = get_operand_values(op.operands.operands[1], ctx, src2, src2w, nullptr, 0);

    if (!output_good || !input0_good || !input1_good) {
        assert(false);
        errored = true;
        return;
    }

    // If a relocation is present, perform the relocation and change src2/src2w to use the relocated value.
    if (ctx.reloc_type != RelocType::R_MIPS_NONE) {
        // Only allow LO16 relocations.
        if (ctx.reloc_type != RelocType::R_MIPS_LO16) {
            assert(false);
            errored = true;
            return;
        }
        // Only allow relocations on immediates.
        if (src2 != SLJIT_IMM) {
            assert(false);
            errored = true;
            return;
        }
        // Only allow relocations on loads and adds.
        switch (op.type) {
            case BinaryOpType::LD:
            case BinaryOpType::LW:
            case BinaryOpType::LWU:
            case BinaryOpType::LH:
            case BinaryOpType::LHU:
            case BinaryOpType::LB:
            case BinaryOpType::LBU:
            case BinaryOpType::LDL:
            case BinaryOpType::LDR:
            case BinaryOpType::LWL:
            case BinaryOpType::LWR:
            case BinaryOpType::Add64:
            case BinaryOpType::Add32:
                break;
            default:
                // Relocations aren't allowed on this instruction.
                assert(false);
                errored = true;
                return;
        }
        // Load the relocated address into temp1.
        load_relocated_address(ctx, Registers::arithmetic_temp1);
        // Extract the LO16 value from the full address (sign extended lower 16 bits).
        sljit_emit_op1(compiler, SLJIT_MOV_S16, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0);
        // Replace the immediate input (src2) with the LO16 value.
        src2 = Registers::arithmetic_temp1;
        src2w = 0;
    }

    // TODO validate that the unary ops are valid for the current binary op.
    if (op.operands.operand_operations[0] != UnaryOpType::None &&
        op.operands.operand_operations[0] != UnaryOpType::ToU64 &&
        op.operands.operand_operations[0] != UnaryOpType::ToS64 &&
        op.operands.operand_operations[0] != UnaryOpType::ToU32)
    {
        assert(false);
        errored = true;
        return;
    }
    
    if (op.operands.operand_operations[1] != UnaryOpType::None &&
        op.operands.operand_operations[1] != UnaryOpType::ToU64 &&
        op.operands.operand_operations[1] != UnaryOpType::ToS64 &&
        op.operands.operand_operations[1] != UnaryOpType::Mask5 && // Only for 32-bit shifts
        op.operands.operand_operations[1] != UnaryOpType::Mask6) // Only for 64-bit shifts
    {
        assert(false);
        errored = true;
        return;
    }

    bool cmp_unsigned = op.operands.operand_operations[0] != UnaryOpType::ToS64;

    auto sign_extend_and_store = [dst, dstw, this]() {
        // Sign extend the result.
        sljit_emit_op1(this->compiler, SLJIT_MOV_S32, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0);
        // Store the result back into the context.
        sljit_emit_op1(this->compiler, SLJIT_MOV_P, dst, dstw, Registers::arithmetic_temp1, 0);
    };

    auto do_op32 = [src1, src1w, src2, src2w, this, &sign_extend_and_store](sljit_s32 op) {
        sljit_emit_op2(this->compiler, op, Registers::arithmetic_temp1, 0, src1, src1w, src2, src2w);
        sign_extend_and_store();
    };

    auto do_op64 = [dst, dstw, src1, src1w, src2, src2w, this](sljit_s32 op) {
        sljit_emit_op2(this->compiler, op, dst, dstw, src1, src1w, src2, src2w);
    };

    auto do_float_op = [dst, dstw, src1, src1w, src2, src2w, this](sljit_s32 op) {
        sljit_emit_fop2(this->compiler, op, dst, dstw, src1, src1w, src2, src2w);
    };

    auto do_load_op = [dst, dstw, src1, src1w, src2, src2w, this](sljit_s32 op, int address_xor) {
        // TODO 0 immediate optimization.

        // Add the base and immediate into the arithmetic temp.
        sljit_emit_op2(compiler, SLJIT_ADD, Registers::arithmetic_temp1, 0, src1, src1w, src2, src2w);

        if (address_xor != 0) {
            // xor the address with the specified amount
            sljit_emit_op2(compiler, SLJIT_XOR, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0, SLJIT_IMM, address_xor);
        }
        
        // Load the value at rdram + address into the arithmetic temp with the given operation to allow for sign-extension or zero-extension.
        sljit_emit_op1(compiler, op, Registers::arithmetic_temp1, 0, SLJIT_MEM2(Registers::rdram, Registers::arithmetic_temp1), 0);

        // Move the arithmetic temp into the destination.
        sljit_emit_op1(compiler, SLJIT_MOV, dst, dstw, Registers::arithmetic_temp1, 0);
    };
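    /*
    Illustration only, not part of the generator. The address_xor values passed to do_load_op
    (3 for byte loads, 2 for halfword loads, 0 for word loads) are consistent with rdram keeping
    each big-endian guest word in host byte order. A minimal sketch of that addressing scheme,
    assuming a little-endian host and a buffer size that is a multiple of 4. The names
    make_word_swapped, load_u8 and load_u16 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: lay out big-endian guest bytes so that every aligned
// 32-bit word reads back correctly as a native word on a little-endian host.
// Assumes guest.size() is a multiple of 4.
std::vector<uint8_t> make_word_swapped(const std::vector<uint8_t>& guest) {
    std::vector<uint8_t> host(guest.size());
    for (std::size_t i = 0; i < guest.size(); i++) {
        host[i ^ 3] = guest[i];
    }
    return host;
}

// Byte load: xor the guest address with 3 to find the byte inside the swapped word.
uint8_t load_u8(const std::vector<uint8_t>& host, uint32_t addr) {
    return host[addr ^ 3];
}

// Halfword load: xor the guest address with 2, then read a native 16-bit value.
// Assumes addr is 2-byte aligned and the host is little-endian.
uint16_t load_u16(const std::vector<uint8_t>& host, uint32_t addr) {
    uint16_t value;
    std::memcpy(&value, &host[addr ^ 2], sizeof(value));
    return value;
}
```

    With guest bytes 11 22 33 44, load_u8 at guest address 0 yields 0x11 and load_u16 at guest
    address 2 yields 0x3344, which matches what a big-endian CPU would read.
    */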

    auto do_compare_op = [cmp_unsigned, dst, dstw, src1, src1w, src2, src2w, this](sljit_s32 op_unsigned, sljit_s32 op_signed) {
        // Pick the operation based on the signedness of the comparison.
        sljit_s32 op = cmp_unsigned ? op_unsigned : op_signed;

        // Pick the flags to set based on the operation.
        sljit_s32 flags;
        if (op <= SLJIT_NOT_ZERO) {
            flags = SLJIT_SET_Z;
        } else
        {
            flags = SLJIT_SET(op);
        }

        // Perform a subtraction with the determined flag.
        sljit_emit_op2u(compiler, SLJIT_SUB | flags, src1, src1w, src2, src2w);
        
        // Move the operation's flag into the destination.
        sljit_emit_op_flags(compiler, SLJIT_MOV, dst, dstw, op);
    };

    auto do_float_compare_op = [dst, dstw, src1, src1w, src2, src2w, this](sljit_s32 flag_op, sljit_s32 set_op, bool double_precision) {
        // Pick the comparison operation based on the precision of the operands.
        sljit_s32 compare_op = set_op | (double_precision ? SLJIT_CMP_F64 : SLJIT_CMP_F32);

        // Perform the comparison with the determined operation.
        // Float comparisons use fop1 and put the left hand side in dst.
        sljit_emit_fop1(compiler, compare_op, src1, src1w, src2, src2w);
        
        // Move the operation's flag into the destination.
        sljit_emit_op_flags(compiler, SLJIT_MOV, dst, dstw, flag_op);
    };

    auto do_unaligned_load_op = [dst, dstw, src1, src1w, src2, src2w, this](bool left, bool doubleword) {
        // TODO 0 immediate optimization.

        // Determine the shift direction to use for calculating the mask and shifting the loaded value.
        sljit_sw shift_op = left ? SLJIT_SHL : SLJIT_LSHR;
        // Determine the operation's word size.
        sljit_sw word_size = doubleword ? 8 : 4;

        // Add the base and immediate into the temp1.
        // addr = base + offset
        sljit_emit_op2(compiler, SLJIT_ADD, Registers::arithmetic_temp1, 0, src1, src1w, src2, src2w);

        // Mask the address with the alignment mask to get the misalignment and put it in temp2.
        // misalignment = addr & (word_size - 1);
        sljit_emit_op2(compiler, SLJIT_AND, Registers::arithmetic_temp2, 0, Registers::arithmetic_temp1, 0, SLJIT_IMM, word_size - 1);

        // Mask the address with ~alignment_mask to get the aligned address and put it in temp1.
        // addr = addr & ~(word_size - 1);
        sljit_emit_op2(compiler, SLJIT_AND, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0, SLJIT_IMM, ~(word_size - 1));

        // Load the value at rdram + aligned address into temp1 (sign-extended for word loads).
        // loaded_value = *addr
        if (doubleword) {
            // Rotate the loaded doubleword by 32 bits to swap the two words into the right order.
            sljit_emit_op2(compiler, SLJIT_ROTL, Registers::arithmetic_temp1, 0, SLJIT_MEM2(Registers::rdram, Registers::arithmetic_temp1), 0, SLJIT_IMM, 32);
        }
        else {
            // Use MOV_S32 to sign-extend the loaded word.
            sljit_emit_op1(compiler, SLJIT_MOV_S32, Registers::arithmetic_temp1, 0, SLJIT_MEM2(Registers::rdram, Registers::arithmetic_temp1), 0);
        }

        // Invert the misalignment if this is a right load.
        if (!left) {
            // misalignment = word_size - 1 - misalignment
            sljit_emit_op2(compiler, SLJIT_SUB, Registers::arithmetic_temp2, 0, SLJIT_IMM, word_size - 1, Registers::arithmetic_temp2, 0);
        }

        // Calculate the misalignment shift and put it into temp2.
        // misalignment_shift = misalignment * 8
        sljit_emit_op2(compiler, SLJIT_SHL, Registers::arithmetic_temp2, 0, Registers::arithmetic_temp2, 0, SLJIT_IMM, 3);

        // Calculate the misalignment mask and put it into temp3. Use a 32-bit shift if this is a 32-bit operation.
        // misalignment_mask = word(-1) SHIFT misalignment_shift
        sljit_emit_op2(compiler, doubleword ? shift_op : (shift_op | SLJIT_32),
            Registers::arithmetic_temp3, 0,
            SLJIT_IMM, doubleword ? uint64_t(-1) : uint32_t(-1),
            Registers::arithmetic_temp2, 0);

        if (!doubleword) {
            // Sign extend the misalignment mask.
            // misalignment_mask = ((uint64_t)(int32_t)misalignment_mask)
            sljit_emit_op1(compiler, SLJIT_MOV_S32, Registers::arithmetic_temp3, 0, Registers::arithmetic_temp3, 0);
        }

        // Shift the loaded value by the misalignment shift and put it into temp1.
        // loaded_value SHIFT misalignment_shift
        sljit_emit_op2(compiler, shift_op, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp2, 0);

        if (left && !doubleword) {
            // Sign extend the loaded value.
            // loaded_value = (uint64_t)(int32_t)loaded_value
            sljit_emit_op1(compiler, SLJIT_MOV_S32, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0);
        }

        // Mask the shifted loaded value by the misalignment mask.
        // loaded_value &= misalignment_mask
        sljit_emit_op2(compiler, SLJIT_AND, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp3, 0);

        // Invert the misalignment mask and store it into temp3.
        // misalignment_mask = ~misalignment_mask
        sljit_emit_op2(compiler, SLJIT_XOR, Registers::arithmetic_temp3, 0, Registers::arithmetic_temp3, 0, SLJIT_IMM, sljit_sw(-1));

        // Mask the initial value (stored in the destination) with the misalignment mask and place it into temp3.
        // masked_value = initial_value & misalignment_mask
        sljit_emit_op2(compiler, SLJIT_AND, Registers::arithmetic_temp3, 0, dst, dstw, Registers::arithmetic_temp3, 0);

        // Combine the masked initial value with the shifted loaded value and store it in the destination.
        // out = masked_value | loaded_value
        sljit_emit_op2(compiler, SLJIT_OR, dst, dstw, Registers::arithmetic_temp3, 0, Registers::arithmetic_temp1, 0);
    };
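    /*
    Illustration only, not part of the generator. The sequence emitted by do_unaligned_load_op
    for the 32-bit case (LWL/LWR) can be modelled in plain C++ as below. This is a sketch that
    follows the commented steps above one by one. The names sext32 and model_unaligned_load32
    are hypothetical, and aligned_word stands for the value already read from the aligned
    address.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 32 bits of a value to 64 bits.
static uint64_t sext32(uint64_t value) {
    return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(value))));
}

// Hypothetical model of LWL (left == true) and LWR (left == false).
uint64_t model_unaligned_load32(uint64_t initial, uint32_t aligned_word, uint32_t addr, bool left) {
    // misalignment = addr & (word_size - 1)
    uint32_t misalignment = addr & 3u;
    // loaded_value = *addr, sign-extended
    uint64_t loaded = sext32(aligned_word);
    // Invert the misalignment for a right load.
    if (!left) {
        misalignment = 3u - misalignment;
    }
    // misalignment_shift = misalignment * 8
    uint32_t shift = misalignment * 8u;
    // 32-bit mask shift, then sign extension of the mask.
    uint32_t mask32 = left ? (0xFFFFFFFFu << shift) : (0xFFFFFFFFu >> shift);
    uint64_t mask = sext32(mask32);
    // 64-bit shift of the loaded value, sign-extended again for left loads.
    loaded = left ? (loaded << shift) : (loaded >> shift);
    if (left) {
        loaded = sext32(loaded);
    }
    // Keep the loaded bytes and merge them with the untouched bytes of the register.
    loaded &= mask;
    return (initial & ~mask) | loaded;
}
```

    For an unaligned word starting one byte into the aligned word 0x11223344 and ending in the
    first byte of 0x55667788, an LWL at offset 1 followed by an LWR at offset 4 assembles
    0x22334455 in this model.
    */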

    switch (op.type) {
        // Addition/subtraction
        case BinaryOpType::Add32:
            do_op32(SLJIT_ADD32);
            break;
        case BinaryOpType::Sub32:
            do_op32(SLJIT_SUB32);
            break;
        case BinaryOpType::Add64:
            do_op64(SLJIT_ADD);
            break;
        case BinaryOpType::Sub64:
            do_op64(SLJIT_SUB);
            break;

        // Float arithmetic
        case BinaryOpType::AddFloat:
            do_float_op(SLJIT_ADD_F32);
            break;
        case BinaryOpType::AddDouble:
            do_float_op(SLJIT_ADD_F64);
            break;
        case BinaryOpType::SubFloat:
            do_float_op(SLJIT_SUB_F32);
            break;
        case BinaryOpType::SubDouble:
            do_float_op(SLJIT_SUB_F64);
            break;
        case BinaryOpType::MulFloat:
            do_float_op(SLJIT_MUL_F32);
            break;
        case BinaryOpType::MulDouble:
            do_float_op(SLJIT_MUL_F64);
            break;
        case BinaryOpType::DivFloat:
            do_float_op(SLJIT_DIV_F32);
            break;
        case BinaryOpType::DivDouble:
            do_float_op(SLJIT_DIV_F64);
            break;

        // Bitwise
        case BinaryOpType::And64:
            do_op64(SLJIT_AND);
            break;
        case BinaryOpType::Or64:
            do_op64(SLJIT_OR);
            break;
        case BinaryOpType::Nor64:
            // Bitwise or the two registers and move the result into the temp, then invert the result and move it into the destination.
            sljit_emit_op2(this->compiler, SLJIT_OR, Registers::arithmetic_temp1, 0, src1, src1w, src2, src2w);
            sljit_emit_op2(this->compiler, SLJIT_XOR, dst, dstw, Registers::arithmetic_temp1, 0, SLJIT_IMM, sljit_sw(-1));
            break;
        case BinaryOpType::Xor64:
            do_op64(SLJIT_XOR);
            break;
        case BinaryOpType::Sll32:
            // TODO only mask if the second input's op is Mask5.
            do_op32(SLJIT_MSHL32);
            break;
        case BinaryOpType::Sll64:
            // TODO only mask if the second input's op is Mask6.
            do_op64(SLJIT_MSHL);
            break;
        case BinaryOpType::Srl32:
            // TODO only mask if the second input's op is Mask5.
            do_op32(SLJIT_MLSHR32);
            break;
        case BinaryOpType::Srl64:
            // TODO only mask if the second input's op is Mask6.
            do_op64(SLJIT_MLSHR);
            break;
        case BinaryOpType::Sra32:
            // Hardware bug: The input is not masked to 32 bits before right shifting, so bits from the upper half of the register will bleed into the lower half.
            // This means we have to use a 64-bit shift and manually mask the input before shifting.
            // TODO only mask if the second input's op is Mask5.
            sljit_emit_op2(this->compiler, SLJIT_AND32, Registers::arithmetic_temp1, 0, src2, src2w, SLJIT_IMM, 0b11111);
            sljit_emit_op2(this->compiler, SLJIT_MASHR, Registers::arithmetic_temp1, 0, src1, src1w, Registers::arithmetic_temp1, 0);
            sign_extend_and_store();
            break;
        case BinaryOpType::Sra64:
            // TODO only mask if the second input's op is Mask6.
            do_op64(SLJIT_MASHR);
            break;

        // Comparisons
        case BinaryOpType::Equal:
            do_compare_op(SLJIT_EQUAL, SLJIT_EQUAL);
            break;
        case BinaryOpType::NotEqual:
            do_compare_op(SLJIT_NOT_EQUAL, SLJIT_NOT_EQUAL);
            break;
        case BinaryOpType::Less:
            do_compare_op(SLJIT_LESS, SLJIT_SIG_LESS);
            break;
        case BinaryOpType::LessEq:
            do_compare_op(SLJIT_LESS_EQUAL, SLJIT_SIG_LESS_EQUAL);
            break;
        case BinaryOpType::Greater:
            do_compare_op(SLJIT_GREATER, SLJIT_SIG_GREATER);
            break;
        case BinaryOpType::GreaterEq:
            do_compare_op(SLJIT_GREATER_EQUAL, SLJIT_SIG_GREATER_EQUAL);
            break;
        case BinaryOpType::EqualFloat:
            do_float_compare_op(SLJIT_F_EQUAL, SLJIT_SET_F_EQUAL, false);
            break;
        case BinaryOpType::LessFloat:
            do_float_compare_op(SLJIT_F_LESS, SLJIT_SET_F_LESS, false);
            break;
        case BinaryOpType::LessEqFloat:
            do_float_compare_op(SLJIT_F_LESS_EQUAL, SLJIT_SET_F_LESS_EQUAL, false);
            break;
        case BinaryOpType::EqualDouble:
            do_float_compare_op(SLJIT_F_EQUAL, SLJIT_SET_F_EQUAL, true);
            break;
        case BinaryOpType::LessDouble:
            do_float_compare_op(SLJIT_F_LESS, SLJIT_SET_F_LESS, true);
            break;
        case BinaryOpType::LessEqDouble:
            do_float_compare_op(SLJIT_F_LESS_EQUAL, SLJIT_SET_F_LESS_EQUAL, true);
            break;
        case BinaryOpType::False:
            // Load 0 into condition destination
            sljit_emit_op1(compiler, SLJIT_MOV, dst, dstw, SLJIT_IMM, 0);
            break;

        // Loads
        case BinaryOpType::LD:
            // Add the base and immediate into the arithmetic temp.
            sljit_emit_op2(compiler, SLJIT_ADD, Registers::arithmetic_temp1, 0, src1, src1w, src2, src2w);
        
            // Load the value at rdram + address into the arithmetic temp and rotate it by 32 bits to swap the two words into the right order.
            sljit_emit_op2(compiler, SLJIT_ROTL, Registers::arithmetic_temp1, 0, SLJIT_MEM2(Registers::rdram, Registers::arithmetic_temp1), 0, SLJIT_IMM, 32);

            // Move the arithmetic temp into the destination.
            sljit_emit_op1(compiler, SLJIT_MOV, dst, dstw, Registers::arithmetic_temp1, 0);
            break;
        case BinaryOpType::LW:
            do_load_op(SLJIT_MOV_S32, 0);
            break;
        case BinaryOpType::LWU:
            do_load_op(SLJIT_MOV_U32, 0);
            break;
        case BinaryOpType::LH:
            do_load_op(SLJIT_MOV_S16, 2);
            break;
        case BinaryOpType::LHU:
            do_load_op(SLJIT_MOV_U16, 2);
            break;
        case BinaryOpType::LB:
            do_load_op(SLJIT_MOV_S8, 3);
            break;
        case BinaryOpType::LBU:
            do_load_op(SLJIT_MOV_U8, 3);
            break;
        case BinaryOpType::LDL:
            do_unaligned_load_op(true, true);
            break;
        case BinaryOpType::LDR:
            do_unaligned_load_op(false, true);
            break;
        case BinaryOpType::LWL:
            do_unaligned_load_op(true, false);
            break;
        case BinaryOpType::LWR:
            do_unaligned_load_op(false, false);
            break;
        default:
            assert(false);
            errored = true;
            return;
    }
}

// TODO these four operations should use banker's rounding, but roundeven is C23 so it's unavailable here.
int32_t do_round_w_s(float num) {
    return lroundf(num);
}

int32_t do_round_w_d(double num) {
    return lround(num);
}

int64_t do_round_l_s(float num) {
    return llroundf(num);
}

int64_t do_round_l_d(double num) {
    return llround(num);
}
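
/*
Illustration only, not part of the generator. The TODO above notes that the round operations
should use banker's rounding (round half to even), whereas lround and llround round halfway
cases away from zero. One possible approach that does not need C23's roundeven is std::lrint,
which rounds according to the current floating point rounding mode. The sketch below assumes
that mode is the default FE_TONEAREST, and the name round_half_even_w_d is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical alternative to do_round_w_d. Rounds halfway cases to the nearest
// even integer, provided the rounding mode is the default FE_TONEAREST.
int32_t round_half_even_w_d(double num) {
    return static_cast<int32_t>(std::lrint(num));
}
```

Under that assumption, 2.5 rounds to 2 and 3.5 rounds to 4, while std::lround(2.5) gives 3.
Out-of-range and NaN inputs are not handled in this sketch.
*/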

int32_t do_ceil_w_s(float num) {
    return (int32_t)ceilf(num);
}

int32_t do_ceil_w_d(double num) {
    return (int32_t)ceil(num);
}

int64_t do_ceil_l_s(float num) {
    return (int64_t)ceilf(num);
}

int64_t do_ceil_l_d(double num) {
    return (int64_t)ceil(num);
}

int32_t do_floor_w_s(float num) {
    return (int32_t)floorf(num);
}

int32_t do_floor_w_d(double num) {
    return (int32_t)floor(num);
}

int64_t do_floor_l_s(float num) {
    return (int64_t)floorf(num);
}

int64_t do_floor_l_d(double num) {
    return (int64_t)floor(num);
}

void N64Recomp::LiveGenerator::load_relocated_address(const InstructionContext& ctx, int reg) const {
    // Get the pointer to the section address.
    int32_t* section_addr_ptr = (ctx.reloc_tag_as_reference ? inputs.reference_section_addresses : inputs.local_section_addresses) + ctx.reloc_section_index;

    // Load the section's address into the target register.
    sljit_emit_op1(compiler, SLJIT_MOV_S32, reg, 0, SLJIT_MEM0(), sljit_sw(section_addr_ptr));

    // Don't emit the add if the offset is zero (small optimization).
    if (ctx.reloc_target_section_offset != 0) {
        // Add the reloc section offset to the section's address and put the result back in the target register.
        sljit_emit_op2(compiler, SLJIT_ADD, reg, 0, reg, 0, SLJIT_IMM, ctx.reloc_target_section_offset);
    }
}

void N64Recomp::LiveGenerator::process_unary_op(const UnaryOp& op, const InstructionContext& ctx) const {
    // Skip instructions that output to $zero
    if (outputs_to_zero(op.output, ctx)) {
        return;
    }

    // A unary op may have a float u32l as the source or destination, but not both.
    if (is_fpr_u32l(op.input) && is_fpr_u32l(op.output)) {
        assert(false);
        errored = true;
        return;
    }

    sljit_sw dst;
    sljit_sw dstw;
    sljit_sw src;
    sljit_sw srcw;
    bool output_good = get_operand_values(op.output, ctx, dst, dstw, compiler, Registers::arithmetic_temp3);
    bool input_good = get_operand_values(op.input, ctx, src, srcw, compiler, Registers::arithmetic_temp3);

    if (!output_good || !input_good) {
        assert(false);
        errored = true;
        return;
    }

    // If a relocation is needed for the input operand, perform the relocation and store the result directly.
    if (ctx.reloc_type != RelocType::R_MIPS_NONE) {
        // Only allow relocation of lui with an immediate.
        if (op.operation != UnaryOpType::Lui || op.input != Operand::ImmU16) {
            assert(false);
            errored = true;
            return;
        }
        // Only allow HI16 relocs.
        if (ctx.reloc_type != RelocType::R_MIPS_HI16) {
            assert(false);
            errored = true;
            return;
        }
        // Load the relocated address into temp1.
        load_relocated_address(ctx, Registers::arithmetic_temp1);

        // HI16 reloc on a lui
        // The 32-bit address (a) is equal to section address + section offset
        // The 16-bit immediate is equal to (a - (int16_t)a) >> 16
        // Therefore, the register should be set to (int32_t)(a - (int16_t)a) as the shifts cancel out and the lower 16 bits are zero.

        // Extract a sign extended 16-bit value from the lower half of the relocated address and put it in temp2.
        sljit_emit_op1(compiler, SLJIT_MOV_S16, Registers::arithmetic_temp2, 0, Registers::arithmetic_temp1, 0);

        // Subtract the sign extended 16-bit value from the full address to get the HI16 value and place it in the destination.
        sljit_emit_op2(compiler, SLJIT_SUB, dst, dstw, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp2, 0);
        return;
    }
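
    /*
    Illustration only, not part of the generator. The HI16 derivation in the comment above pairs
    with the LO16 handling in process_binary_op and process_store_op: the low half is the
    sign-extended lower 16 bits of the address, and the high half is the address minus that
    value, so the two always add back up to the full address. A minimal sketch in plain C++,
    where HiLo and split_hi_lo are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

struct HiLo {
    uint32_t hi; // Value a relocated lui would place in the register (lower 16 bits are zero).
    int16_t lo;  // Sign-extended immediate used by the paired load, store or add.
};

// Hypothetical helper: split a 32-bit address into its HI16 and LO16 parts.
HiLo split_hi_lo(uint32_t address) {
    int16_t lo = static_cast<int16_t>(address & 0xFFFFu);
    // Unsigned arithmetic wraps, so this is (a - (int16_t)a) modulo 2^32.
    uint32_t hi = address - static_cast<uint32_t>(static_cast<int32_t>(lo));
    return {hi, lo};
}
```

    For the address 0x80108000 the low half is -32768, so the high half becomes 0x80110000
    rather than 0x80100000. This carry is why the subtraction is needed instead of a plain mask.
    */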

    sljit_s32 jit_op = SLJIT_BREAKPOINT;

    bool float_op = false;
    bool func_float_op = false;

    auto emit_s_func = [this, src, srcw, dst, dstw, &func_float_op](float (*func)(float)) {
        func_float_op = true;

        sljit_emit_fop1(compiler, SLJIT_MOV_F32, SLJIT_FR0, 0, src, srcw);
        sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS1(F32, F32), SLJIT_IMM, sljit_sw(func));
        sljit_emit_fop1(compiler, SLJIT_MOV_F32, dst, dstw, SLJIT_RETURN_FREG, 0);
    };

    auto emit_d_func = [this, src, srcw, dst, dstw, &func_float_op](double (*func)(double)) {
        func_float_op = true;

        sljit_emit_fop1(compiler, SLJIT_MOV_F64, SLJIT_FR0, 0, src, srcw);
        sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS1(F64, F64), SLJIT_IMM, sljit_sw(func));
        sljit_emit_fop1(compiler, SLJIT_MOV_F64, dst, dstw, SLJIT_RETURN_FREG, 0);
    };

    auto emit_l_from_s_func = [this, src, srcw, dst, dstw, &func_float_op](int64_t (*func)(float)) {
        func_float_op = true;

        sljit_emit_fop1(compiler, SLJIT_MOV_F32, SLJIT_FR0, 0, src, srcw);
        sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS1(P, F32), SLJIT_IMM, sljit_sw(func));
        sljit_emit_op1(compiler, SLJIT_MOV, dst, dstw, SLJIT_RETURN_REG, 0);
    };

    auto emit_w_from_s_func = [this, src, srcw, dst, dstw, &func_float_op](int32_t (*func)(float)) {
        func_float_op = true;

        sljit_emit_fop1(compiler, SLJIT_MOV_F32, SLJIT_FR0, 0, src, srcw);
        sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS1(32, F32), SLJIT_IMM, sljit_sw(func));
        sljit_emit_op1(compiler, SLJIT_MOV_S32, dst, dstw, SLJIT_RETURN_REG, 0);
    };

    auto emit_l_from_d_func = [this, src, srcw, dst, dstw, &func_float_op](int64_t (*func)(double)) {
        func_float_op = true;

        sljit_emit_fop1(compiler, SLJIT_MOV_F64, SLJIT_FR0, 0, src, srcw);
        sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS1(P, F64), SLJIT_IMM, sljit_sw(func));
        sljit_emit_op1(compiler, SLJIT_MOV, dst, dstw, SLJIT_RETURN_REG, 0);
    };

    auto emit_w_from_d_func = [this, src, srcw, dst, dstw, &func_float_op](int32_t (*func)(double)) {
        func_float_op = true;

        sljit_emit_fop1(compiler, SLJIT_MOV_F64, SLJIT_FR0, 0, src, srcw);
        sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS1(32, F64), SLJIT_IMM, sljit_sw(func));
        sljit_emit_op1(compiler, SLJIT_MOV_S32, dst, dstw, SLJIT_RETURN_REG, 0);
    };

    switch (op.operation) {
        case UnaryOpType::Lui:
            if (src != SLJIT_IMM) {
                assert(false);
                errored = true;
                break;
            }
            src = SLJIT_IMM;
            srcw = (sljit_sw)(int32_t)(srcw << 16);
            jit_op = SLJIT_MOV;
            break;
        case UnaryOpType::NegateFloat:
            jit_op = SLJIT_NEG_F32;
            float_op = true;
            break;
        case UnaryOpType::NegateDouble:
            jit_op = SLJIT_NEG_F64;
            float_op = true;
            break;
        case UnaryOpType::AbsFloat:
            jit_op = SLJIT_ABS_F32;
            float_op = true;
            break;
        case UnaryOpType::AbsDouble:
            jit_op = SLJIT_ABS_F64;
            float_op = true;
            break;
        case UnaryOpType::SqrtFloat:
            emit_s_func(sqrtf);
            break;
        case UnaryOpType::SqrtDouble:
            emit_d_func(sqrt);
            break;
        case UnaryOpType::ConvertSFromW:
            jit_op = SLJIT_CONV_F32_FROM_S32;
            float_op = true;
            break;
        case UnaryOpType::ConvertWFromS:
            emit_w_from_s_func(do_cvt_w_s);
            break;
        case UnaryOpType::ConvertDFromW:
            jit_op = SLJIT_CONV_F64_FROM_S32;
            float_op = true;
            break;
        case UnaryOpType::ConvertWFromD:
            emit_w_from_d_func(do_cvt_w_d);
            break;
        case UnaryOpType::ConvertDFromS:
            jit_op = SLJIT_CONV_F64_FROM_F32;
            float_op = true;
            break;
        case UnaryOpType::ConvertSFromD:
            // SLJIT_CONV_F32_FROM_F64 uses the current rounding mode, just as CVT_S_D does.
            jit_op = SLJIT_CONV_F32_FROM_F64;
            float_op = true;
            break;
        case UnaryOpType::ConvertDFromL:
            jit_op = SLJIT_CONV_F64_FROM_SW;
            float_op = true;
            break;
        case UnaryOpType::ConvertLFromD:
            emit_l_from_d_func(do_cvt_l_d);
            break;
        case UnaryOpType::ConvertSFromL:
            jit_op = SLJIT_CONV_F32_FROM_SW;
            float_op = true;
            break;
        case UnaryOpType::ConvertLFromS:
            emit_l_from_s_func(do_cvt_l_s);
            break;
        case UnaryOpType::TruncateWFromS:
            // SLJIT_CONV_S32_FROM_F32 rounds towards zero, just as TRUNC_W_S does.
            jit_op = SLJIT_CONV_S32_FROM_F32;
            float_op = true;
            break;
        case UnaryOpType::TruncateWFromD:
            // SLJIT_CONV_S32_FROM_F64 rounds towards zero, just as TRUNC_W_D does.
            jit_op = SLJIT_CONV_S32_FROM_F64;
            float_op = true;
            break;
        case UnaryOpType::TruncateLFromS:
            // SLJIT_CONV_SW_FROM_F32 rounds towards zero, just as TRUNC_L_S does.
            jit_op = SLJIT_CONV_SW_FROM_F32;
            float_op = true;
            break;
        case UnaryOpType::TruncateLFromD:
            // SLJIT_CONV_SW_FROM_F64 rounds towards zero, just as TRUNC_L_D does.
            jit_op = SLJIT_CONV_SW_FROM_F64;
            float_op = true;
            break;
        case UnaryOpType::RoundWFromS:
            emit_w_from_s_func(do_round_w_s);
            break;
        case UnaryOpType::RoundWFromD:
            emit_w_from_d_func(do_round_w_d);
            break;
        case UnaryOpType::RoundLFromS:
            emit_l_from_s_func(do_round_l_s);
            break;
        case UnaryOpType::RoundLFromD:
            emit_l_from_d_func(do_round_l_d);
            break;
        case UnaryOpType::CeilWFromS:
            emit_w_from_s_func(do_ceil_w_s);
            break;
        case UnaryOpType::CeilWFromD:
            emit_w_from_d_func(do_ceil_w_d);
            break;
        case UnaryOpType::CeilLFromS:
            emit_l_from_s_func(do_ceil_l_s);
            break;
        case UnaryOpType::CeilLFromD:
            emit_l_from_d_func(do_ceil_l_d);
            break;
        case UnaryOpType::FloorWFromS:
            emit_w_from_s_func(do_floor_w_s);
            break;
        case UnaryOpType::FloorWFromD:
            emit_w_from_d_func(do_floor_w_d);
            break;
        case UnaryOpType::FloorLFromS:
            emit_l_from_s_func(do_floor_l_s);
            break;
        case UnaryOpType::FloorLFromD:
            emit_l_from_d_func(do_floor_l_d);
            break;
        case UnaryOpType::None:
            // Only write 32 bits if the output is a fpr u32l operand.
            if (is_fpr_u32l(op.output)) {
                jit_op = SLJIT_MOV32;
            }
            else {
                jit_op = SLJIT_MOV;
            }
            break;
        case UnaryOpType::ToS32:
        case UnaryOpType::ToInt32:
            // sljit won't emit a sign extension with SLJIT_MOV_32 if the destination is memory,
            // so emit an explicit move into a register and set that register as the new src.
            sljit_emit_op1(compiler, SLJIT_MOV_S32, Registers::arithmetic_temp1, 0, src, srcw);
            // Replace the original input with the temporary register.
            src = Registers::arithmetic_temp1;
            srcw = 0;
            jit_op = SLJIT_MOV;
            break;
        // Unary ops that can't be used as a standalone operation
        case UnaryOpType::ToU32:
        case UnaryOpType::ToS64:
        case UnaryOpType::ToU64:
        case UnaryOpType::Mask5:
        case UnaryOpType::Mask6:
            assert(false && "Unsupported unary op");
            errored = true;
            return;
    }

    if (func_float_op) {
        // Already handled by the lambda.
    }
    else if (float_op) {
        sljit_emit_fop1(compiler, jit_op, dst, dstw, src, srcw);
    }
    else {
        sljit_emit_op1(compiler, jit_op, dst, dstw, src, srcw);
    }
}

void N64Recomp::LiveGenerator::process_store_op(const StoreOp& op, const InstructionContext& ctx) const {
    sljit_sw src;
    sljit_sw srcw;
    sljit_sw imm = (sljit_sw)(int16_t)ctx.imm16;

    get_operand_values(op.value_input, ctx, src, srcw, compiler, Registers::arithmetic_temp2);

    // Only LO16 relocs are valid on stores.
    if (ctx.reloc_type != RelocType::R_MIPS_NONE && ctx.reloc_type != RelocType::R_MIPS_LO16) {
        assert(false);
        errored = true;
        return;
    }

    if (ctx.reloc_type == RelocType::R_MIPS_LO16) {
        // Load the relocated address into temp1.
        load_relocated_address(ctx, Registers::arithmetic_temp1);
        // Extract the LO16 value from the full address (sign extended lower 16 bits).
        sljit_emit_op1(compiler, SLJIT_MOV_S16, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0);
        // Add the base register (rs) to the LO16 immediate.
        sljit_emit_op2(compiler, SLJIT_ADD, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0, SLJIT_MEM1(Registers::ctx), get_gpr_context_offset(ctx.rs));
    }
    else {
        // TODO 0 immediate optimization.

        // Add the base register (rs) and the immediate to get the address and store it in the arithmetic temp.
        sljit_emit_op2(compiler, SLJIT_ADD, Registers::arithmetic_temp1, 0, SLJIT_MEM1(Registers::ctx), get_gpr_context_offset(ctx.rs), SLJIT_IMM, imm);
    }

    auto do_unaligned_store_op = [src, srcw, this](bool left, bool doubleword) {
        // Determine the shift direction to use for calculating the mask and shifting the input value.
        sljit_sw shift_op = left ? SLJIT_LSHR : SLJIT_SHL;
        // Determine the operation's word size.
        sljit_sw word_size = doubleword ? 8 : 4;

        // Mask the address with the alignment mask to get the misalignment and put it in temp2.
        // misalignment = addr & (word_size - 1);
        sljit_emit_op2(compiler, SLJIT_AND, Registers::arithmetic_temp2, 0, Registers::arithmetic_temp1, 0, SLJIT_IMM, word_size - 1);

        // Mask the address with ~alignment_mask to get the aligned address and put it in temp1.
        // addr = addr & ~(word_size - 1);
        sljit_emit_op2(compiler, SLJIT_AND, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0, SLJIT_IMM, ~(word_size - 1));

        // Load the word at rdram + aligned address into temp3.
        // loaded_value = *addr
        if (doubleword) {
            // Rotate the loaded doubleword by 32 bits to swap the two words into the right order.
            sljit_emit_op2(compiler, SLJIT_ROTL, Registers::arithmetic_temp3, 0, SLJIT_MEM2(Registers::rdram, Registers::arithmetic_temp1), 0, SLJIT_IMM, 32);
        }
        else {
            // Use MOV_S32 to sign-extend the loaded word.
            sljit_emit_op1(compiler, SLJIT_MOV_S32, Registers::arithmetic_temp3, 0, SLJIT_MEM2(Registers::rdram, Registers::arithmetic_temp1), 0);
        }

        // Invert the misalignment if this is a right store.
        if (!left) {
            // misalignment = word_size - 1 - misalignment
            sljit_emit_op2(compiler, SLJIT_SUB, Registers::arithmetic_temp2, 0, SLJIT_IMM, word_size - 1, Registers::arithmetic_temp2, 0);
        }

        // Calculate the misalignment shift and put it into temp2.
        // misalignment_shift = misalignment * 8
        sljit_emit_op2(compiler, SLJIT_SHL, Registers::arithmetic_temp2, 0, Registers::arithmetic_temp2, 0, SLJIT_IMM, 3);

        // Shift the input value by the misalignment shift and put it into temp4.
        // input_value SHIFT= misalignment_shift
        sljit_emit_op2(compiler, shift_op, Registers::arithmetic_temp4, 0, src, srcw, Registers::arithmetic_temp2, 0);

        // Calculate the misalignment mask and put it into temp2. Use a 32-bit shift if this is a 32-bit operation.
        // misalignment_mask = word(-1) SHIFT misalignment_shift
        sljit_emit_op2(compiler, doubleword ? shift_op : (shift_op | SLJIT_32),
            Registers::arithmetic_temp2, 0,
            SLJIT_IMM, doubleword ? uint64_t(-1) : uint32_t(-1),
            Registers::arithmetic_temp2, 0);

        // Mask the input value with the misalignment mask and place it into temp4.
        // masked_value = shifted_value & misalignment_mask
        sljit_emit_op2(compiler, SLJIT_AND, Registers::arithmetic_temp4, 0, Registers::arithmetic_temp4, 0, Registers::arithmetic_temp2, 0);

        // Invert the misalignment mask and store it into temp2.
        // misalignment_mask = ~misalignment_mask
        sljit_emit_op2(compiler, SLJIT_XOR, Registers::arithmetic_temp2, 0, Registers::arithmetic_temp2, 0, SLJIT_IMM, sljit_sw(-1));

        // Mask the loaded value by the inverted misalignment mask.
        // loaded_value &= misalignment_mask
        sljit_emit_op2(compiler, SLJIT_AND, Registers::arithmetic_temp3, 0, Registers::arithmetic_temp3, 0, Registers::arithmetic_temp2, 0);

        // Combine the masked input value with the masked loaded value and store it in the destination.
        // out = masked_value | loaded_value
        if (doubleword) {
            // Combine the values into a temp so that it can be rotated to the correct word order.
            sljit_emit_op2(compiler, SLJIT_OR, Registers::arithmetic_temp4, 0, Registers::arithmetic_temp4, 0, Registers::arithmetic_temp3, 0);
            sljit_emit_op2(compiler, SLJIT_ROTL, SLJIT_MEM2(Registers::rdram, Registers::arithmetic_temp1), 0, Registers::arithmetic_temp4, 0, SLJIT_IMM, 32);
        }
        else {
            sljit_emit_op2(compiler, SLJIT_OR32, SLJIT_MEM2(Registers::rdram, Registers::arithmetic_temp1), 0, Registers::arithmetic_temp4, 0, Registers::arithmetic_temp3, 0);
        }
    };

    switch (op.type) {
        case StoreOpType::SD:
        case StoreOpType::SDC1:        
            // Rotate the source value by 32 bits to swap the words and store it at address + rdram.
            sljit_emit_op2(compiler, SLJIT_ROTL, SLJIT_MEM2(Registers::rdram, Registers::arithmetic_temp1), 0, src, srcw, SLJIT_IMM, 32);
            break;
        case StoreOpType::SDL:
            do_unaligned_store_op(true, true);
            break;
        case StoreOpType::SDR:
            do_unaligned_store_op(false, true);
            break;
        case StoreOpType::SW:
        case StoreOpType::SWC1:
            // store the 32-bit value at address + rdram
            sljit_emit_op1(compiler, SLJIT_MOV_U32, SLJIT_MEM2(Registers::rdram, Registers::arithmetic_temp1), 0, src, srcw);
            break;
        case StoreOpType::SWL:
            do_unaligned_store_op(true, false);
            break;
        case StoreOpType::SWR:
            do_unaligned_store_op(false, false);
            break;
        case StoreOpType::SH:
            // xor the address with 2
            sljit_emit_op2(compiler, SLJIT_XOR, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0, SLJIT_IMM, 2);
            // store the 16-bit value at address + rdram
            sljit_emit_op1(compiler, SLJIT_MOV_U16, SLJIT_MEM2(Registers::rdram, Registers::arithmetic_temp1), 0, src, srcw);
            break;
        case StoreOpType::SB:
            // xor the address with 3
            sljit_emit_op2(compiler, SLJIT_XOR, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0, SLJIT_IMM, 3);
            // store the 8-bit value at address + rdram
            sljit_emit_op1(compiler, SLJIT_MOV_U8, SLJIT_MEM2(Registers::rdram, Registers::arithmetic_temp1), 0, src, srcw);
            break;
    }
}

void N64Recomp::LiveGenerator::emit_function_start(const std::string& function_name, size_t func_index) const {
    context->function_name = function_name;
    context->func_labels[func_index] = sljit_emit_label(compiler);
    // sljit_emit_op0(compiler, SLJIT_BREAKPOINT);
    sljit_emit_enter(compiler, 0, SLJIT_ARGS2V(P, P), 4 | SLJIT_ENTER_FLOAT(1), 5 | SLJIT_ENTER_FLOAT(0), 0);
    sljit_emit_op2(compiler, SLJIT_SUB, Registers::rdram, 0, Registers::rdram, 0, SLJIT_IMM, rdram_offset);
    
    // Check if this function's entry is hooked and emit the hook call if so.
    auto find_hook_it = inputs.entry_func_hooks.find(func_index);
    if (find_hook_it != inputs.entry_func_hooks.end()) {
        // Load rdram and ctx into R0 and R1.
        sljit_emit_op2(compiler, SLJIT_ADD, SLJIT_R0, 0, Registers::rdram, 0, SLJIT_IMM, rdram_offset);
        sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, Registers::ctx, 0);
        // Load the hook's index into R2.
        sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R2, 0, SLJIT_IMM, find_hook_it->second);
        sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS3V(P, P, W), SLJIT_IMM, sljit_sw(inputs.run_hook));
    }
}

void N64Recomp::LiveGenerator::emit_function_end() const {
    // Check that all jumps have been paired to a label.
    if (!context->pending_jumps.empty()) {
        assert(false);
        errored = true;
    }
    
    // Populate the labels for pending switches and move them into the unlinked jump tables.
    bool invalid_switch = false;
    for (size_t switch_index = 0; switch_index < context->switch_jump_labels.size(); switch_index++) {
        const std::vector<std::string>& cur_labels = context->switch_jump_labels[switch_index];
        std::vector<sljit_label*> cur_label_addrs{};
        cur_label_addrs.resize(cur_labels.size());
        for (size_t case_index = 0; case_index < cur_labels.size(); case_index++) {
            // Find the label.
            auto find_it = context->labels.find(cur_labels[case_index]);
            if (find_it == context->labels.end()) {
                // Label not found, invalid switch.
                // Track this in a variable instead of returning immediately so that the pending labels are still cleared.
                invalid_switch = true;
                break;
            }
            cur_label_addrs[case_index] = find_it->second;
        }
        context->unlinked_jump_tables.emplace_back(
            std::make_pair<std::vector<sljit_label*>, std::unique_ptr<void*[]>>(
                std::move(cur_label_addrs),
                std::move(context->pending_jump_tables[switch_index])
            )
        );
    }
    context->switch_jump_labels.clear();
    context->pending_jump_tables.clear();

    // Clear the labels to prevent labels from one function being jumped to by another.
    context->labels.clear();

    if (invalid_switch) {
        assert(false);
        errored = true;
    }
}

void N64Recomp::LiveGenerator::emit_function_call_lookup(uint32_t addr) const {
    // Load the address immediate into the first argument. 
    sljit_emit_op1(compiler, SLJIT_MOV32, SLJIT_R0, 0, SLJIT_IMM, int32_t(addr));
    
    // Call get_function.
    sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS1(P, 32), SLJIT_IMM, sljit_sw(inputs.get_function));
    
    // Copy the return value into R3 so that it can be used for icall
    sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R3, 0, SLJIT_RETURN_REG, 0);
    
    // Load rdram and ctx into R0 and R1.
    sljit_emit_op2(compiler, SLJIT_ADD, SLJIT_R0, 0, Registers::rdram, 0, SLJIT_IMM, rdram_offset);
    sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, Registers::ctx, 0);

    // Call the function.
    sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS2V(P, P), SLJIT_R3, 0);
}

void N64Recomp::LiveGenerator::emit_function_call_by_register(int reg) const {
    // Load the register's value into the first argument. 
    sljit_emit_op1(compiler, SLJIT_MOV32, SLJIT_R0, 0, SLJIT_MEM1(Registers::ctx), get_gpr_context_offset(reg));

    // Call get_function.
    sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS1(P, 32), SLJIT_IMM, sljit_sw(inputs.get_function));

    // Copy the return value into R3 so that it can be used for icall
    sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R3, 0, SLJIT_RETURN_REG, 0);

    // Load rdram and ctx into R0 and R1.
    sljit_emit_op2(compiler, SLJIT_ADD, SLJIT_R0, 0, Registers::rdram, 0, SLJIT_IMM, rdram_offset);
    sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, Registers::ctx, 0);

    // Call the function.
    sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS2V(P, P), SLJIT_R3, 0);
}

void N64Recomp::LiveGenerator::emit_function_call_reference_symbol(const Context&, uint16_t section_index, size_t symbol_index, uint32_t target_section_offset) const {
    // Load rdram and ctx into R0 and R1.
    sljit_emit_op2(compiler, SLJIT_ADD, SLJIT_R0, 0, Registers::rdram, 0, SLJIT_IMM, rdram_offset);
    sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, Registers::ctx, 0);
    // sljit_emit_op0(compiler, SLJIT_BREAKPOINT);
    // Call the function and save the jump to set its label later on.
    sljit_jump* call_jump = sljit_emit_call(compiler, SLJIT_CALL | SLJIT_REWRITABLE_JUMP, SLJIT_ARGS2V(P, P));
    // Set a dummy jump value, this will get replaced during reference/import symbol jump population.
    if (section_index == N64Recomp::SectionImport) {
        sljit_set_target(call_jump, sljit_uw(-1));
        context->import_jumps_by_index.emplace(symbol_index, call_jump);
    }
    else {
        sljit_set_target(call_jump, sljit_uw(-2));
        context->reference_symbol_jumps.emplace_back(std::make_pair(
            ReferenceJumpDetails{
                .section = section_index,
                .section_offset = target_section_offset
            },
            call_jump
        ));
    }
}

void N64Recomp::LiveGenerator::emit_function_call(const Context&, size_t function_index) const {
    // Load rdram and ctx into R0 and R1.
    sljit_emit_op2(compiler, SLJIT_ADD, SLJIT_R0, 0, Registers::rdram, 0, SLJIT_IMM, rdram_offset);
    sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, Registers::ctx, 0);
    // Call the function and save the jump to set its label later on.
    sljit_jump* call_jump = sljit_emit_call(compiler, SLJIT_CALL, SLJIT_ARGS2V(P, P));
    context->inner_calls.emplace_back(InnerCall{ .target_func_index = function_index, .jump = call_jump });
}

void N64Recomp::LiveGenerator::emit_named_function_call(const std::string& function_name) const {
    (void)function_name;
    // The live recompiler can't call functions by name. This is only used for statics, so it's not an issue.
    assert(false);
    errored = true;
}

void N64Recomp::LiveGenerator::emit_goto(const std::string& target) const {
    sljit_jump* jump = sljit_emit_jump(compiler, SLJIT_JUMP);
    // Check if the label already exists.
    auto find_it = context->labels.find(target);
    if (find_it != context->labels.end()) {
        sljit_set_label(jump, find_it->second);
    }
    // It doesn't, so queue this as a pending jump to be resolved later.
    else {
        context->pending_jumps[target].push_back(jump);
    }
}

void N64Recomp::LiveGenerator::emit_label(const std::string& label_name) const {
    sljit_label* label = sljit_emit_label(compiler);

    // Check if there are any pending jumps for this label and assign them if so.
    auto find_it = context->pending_jumps.find(label_name);
    if (find_it != context->pending_jumps.end()) {
        for (sljit_jump* jump : find_it->second) {
            sljit_set_label(jump, label);
        }

        // Remove the pending jumps for this label.
        context->pending_jumps.erase(find_it);
    }

    context->labels.emplace(label_name, label);
}

void N64Recomp::LiveGenerator::emit_jtbl_addend_declaration(const JumpTable& jtbl, int reg) const {
    (void)jtbl;
    (void)reg;
    // Nothing to do here, the live recompiler performs a subtraction to get the switch's case.
}

void N64Recomp::LiveGenerator::emit_branch_condition(const ConditionalBranchOp& op, const InstructionContext& ctx) const {
    // Make sure there's no pending jump.
    if(context->cur_branch_jump != nullptr) {
        assert(false);
        errored = true;
        return;
    }

    // Branch conditions do not allow unary ops, except for ToS64 on the first operand to indicate the branch comparison is signed.
    if(op.operands.operand_operations[0] != UnaryOpType::None && op.operands.operand_operations[0] != UnaryOpType::ToS64) {
        assert(false);
        errored = true;
        return;
    }

    if (op.operands.operand_operations[1] != UnaryOpType::None) {
        assert(false);
        errored = true;
        return;
    }

    // Branch conditions do not allow float u32l operands.
    if (is_fpr_u32l(op.operands.operands[0]) || is_fpr_u32l(op.operands.operands[1])) {
        assert(false);
        errored = true;
        return;
    }

    sljit_s32 condition_type;
    bool cmp_signed = op.operands.operand_operations[0] == UnaryOpType::ToS64;
    // Comparisons need to be inverted because the generator is expected to emit a code block that only runs if
    // the condition is met, so the emitted jump must skip that block when the condition isn't met.
    switch (op.comparison) {
        case BinaryOpType::Equal:
            condition_type = SLJIT_NOT_EQUAL;
            break;
        case BinaryOpType::NotEqual:
            condition_type = SLJIT_EQUAL;
            break;
        case BinaryOpType::GreaterEq:
            if (cmp_signed) {
                condition_type = SLJIT_SIG_LESS;
            }
            else {
                condition_type = SLJIT_LESS;
            }
            break;
        case BinaryOpType::Greater:
            if (cmp_signed) {
                condition_type = SLJIT_SIG_LESS_EQUAL;
            }
            else {
                condition_type = SLJIT_LESS_EQUAL;
            }
            break;
        case BinaryOpType::LessEq:
            if (cmp_signed) {
                condition_type = SLJIT_SIG_GREATER;
            }
            else {
                condition_type = SLJIT_GREATER;
            }
            break;
        case BinaryOpType::Less:
            if (cmp_signed) {
                condition_type = SLJIT_SIG_GREATER_EQUAL;
            }
            else {
                condition_type = SLJIT_GREATER_EQUAL;
            }
            break;
        default:
            assert(false && "Invalid branch condition comparison operation!");
            errored = true;
            return;
    }
    sljit_sw src1;
    sljit_sw src1w;
    sljit_sw src2;
    sljit_sw src2w;

    get_operand_values(op.operands.operands[0], ctx, src1, src1w, nullptr, 0);
    get_operand_values(op.operands.operands[1], ctx, src2, src2w, nullptr, 0);

    // Relocations aren't valid on conditional branches.
    if(ctx.reloc_type != RelocType::R_MIPS_NONE) {
        assert(false);
        errored = true;
        return;
    }

    // Create a compare jump and track it as the pending branch jump.
    context->cur_branch_jump = sljit_emit_cmp(compiler, condition_type, src1, src1w, src2, src2w);
}

void N64Recomp::LiveGenerator::emit_branch_close() const {
    // Make sure there's a pending branch jump.
    if(context->cur_branch_jump == nullptr) {
        assert(false);
        errored = true;
        return;
    }

    // Assign a label at this point to the pending branch jump and clear it.
    sljit_set_label(context->cur_branch_jump, sljit_emit_label(compiler));
    context->cur_branch_jump = nullptr;
}

void N64Recomp::LiveGenerator::emit_switch(const Context& recompiler_context, const JumpTable& jtbl, int reg) const {
    // Populate the switch's labels.
    std::vector<std::string> cur_labels{};
    cur_labels.resize(jtbl.entries.size());
    for (size_t i = 0; i < cur_labels.size(); i++) {
        cur_labels[i] = fmt::format("L_{:08X}", jtbl.entries[i]);
    }
    context->switch_jump_labels.emplace_back(std::move(cur_labels));

    // Allocate the jump table.
    std::unique_ptr<void* []> cur_jump_table = std::make_unique<void* []>(jtbl.entries.size());

    /// Codegen

    // Load the jump target register. The lw instruction was patched into an addiu, so this holds
    // the address of the jump table entry instead of the actual jump target.
    sljit_emit_op1(compiler, SLJIT_MOV, Registers::arithmetic_temp1, 0, SLJIT_MEM1(Registers::ctx), get_gpr_context_offset(reg));
    // Subtract the jump table's address from the jump target to get the jump table addend.
    // Sign extend the jump table address to 64 bits so that the entire register's contents are used instead of just the lower 32 bits.
    const auto& jtbl_section = recompiler_context.sections[jtbl.section_index];
    if (jtbl_section.relocatable) {
        // Make a dummy instruction context to pass to `load_relocated_address`.
        InstructionContext dummy_context{};
        
        // Get the relocated address of the jump table.
        uint32_t section_offset = jtbl.vram - jtbl_section.ram_addr;

        // Get the section index to use for relocation at runtime.
        uint16_t reloc_section_index = jtbl.section_index;
        if (!inputs.original_section_indices.empty()) {
            reloc_section_index = inputs.original_section_indices[reloc_section_index];
        }

        // Populate the necessary fields of the dummy context and load the relocated address into temp2.
        dummy_context.reloc_section_index = reloc_section_index;
        dummy_context.reloc_target_section_offset = section_offset;
        load_relocated_address(dummy_context, Registers::arithmetic_temp2);

        // Subtract the relocated jump table start address from the loaded address. 
        sljit_emit_op2(compiler, SLJIT_SUB, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp2, 0);
    }
    else {
        sljit_emit_op2(compiler, SLJIT_SUB, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0, SLJIT_IMM, (sljit_sw)((int32_t)jtbl.vram));
    }
    
    // Bounds check the addend. If it's greater than or equal to the jump table size (entries * sizeof(u32)) then jump to the switch error.
    sljit_jump* switch_error_jump = sljit_emit_cmp(compiler, SLJIT_GREATER_EQUAL, Registers::arithmetic_temp1, 0, SLJIT_IMM, jtbl.entries.size() * sizeof(uint32_t));
    context->switch_error_jumps.emplace_back(SwitchErrorJump{.instr_vram = jtbl.jr_vram, .jtbl_vram = jtbl.vram, .jump = switch_error_jump});

    // Multiply the jump table addend by 2 to get the addend for the real jump table. (4 bytes per entry to 8 bytes per entry).
    sljit_emit_op2(compiler, SLJIT_ADD, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0, Registers::arithmetic_temp1, 0);
    // Load the real jump table address.
    sljit_emit_op1(compiler, SLJIT_MOV, Registers::arithmetic_temp2, 0, SLJIT_IMM, (sljit_sw)cur_jump_table.get());
    // Load the real jump entry.
    sljit_emit_op1(compiler, SLJIT_MOV, Registers::arithmetic_temp1, 0, SLJIT_MEM2(Registers::arithmetic_temp1, Registers::arithmetic_temp2), 0);
    // Jump to the loaded entry.
    sljit_emit_ijump(compiler, SLJIT_JUMP, Registers::arithmetic_temp1, 0);

    // Move the jump table into the pending jump tables.
    context->pending_jump_tables.emplace_back(std::move(cur_jump_table));
}

void N64Recomp::LiveGenerator::emit_case(int case_index, const std::string& target_label) const {
    (void)case_index;
    (void)target_label;
    // Nothing to do here, the jump table is built in emit_switch.
}

void N64Recomp::LiveGenerator::emit_switch_error(uint32_t instr_vram, uint32_t jtbl_vram) const {
    (void)instr_vram;
    (void)jtbl_vram;
    // Nothing to do here, the switch error jump is recorded in emit_switch.
}

void N64Recomp::LiveGenerator::emit_switch_close() const {
    // Nothing to do here, the jump table is built in emit_switch.
}

void N64Recomp::LiveGenerator::emit_return(const Context& context, size_t func_index) const {
    (void)context;
    
    // Check if this function's return is hooked and emit the hook call if so.
    auto find_hook_it = inputs.return_func_hooks.find(func_index);
    if (find_hook_it != inputs.return_func_hooks.end()) {
        // Load rdram and ctx into R0 and R1.
        sljit_emit_op2(compiler, SLJIT_ADD, SLJIT_R0, 0, Registers::rdram, 0, SLJIT_IMM, rdram_offset);
        sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, Registers::ctx, 0);
        // Load the return hook's index into R2.
        sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R2, 0, SLJIT_IMM, find_hook_it->second);
        sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS3V(P, P, W), SLJIT_IMM, sljit_sw(inputs.run_hook));
    }
    sljit_emit_return_void(compiler);
}

void N64Recomp::LiveGenerator::emit_check_fr(int fpr) const {
    (void)fpr;
    // Nothing to do here.
}

void N64Recomp::LiveGenerator::emit_check_nan(int fpr, bool is_double) const {
    (void)fpr;
    (void)is_double;
    // Nothing to do here.
}

void N64Recomp::LiveGenerator::emit_cop0_status_read(int reg) const {
    // Skip the read if the target is the zero register.
    if (reg != 0) {
        // Load ctx into R0.
        sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, Registers::ctx, 0);

        // Call cop0_status_read.
        sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS1V(P), SLJIT_IMM, sljit_sw(inputs.cop0_status_read));

        // Store the result in the output register.
        sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_MEM1(Registers::ctx), get_gpr_context_offset(reg), SLJIT_R0, 0);
    }
}

void N64Recomp::LiveGenerator::emit_cop0_status_write(int reg) const {
    sljit_sw src;
    sljit_sw srcw;
    get_gpr_values(reg, src, srcw);
    
    // Load ctx and the input register value into R0 and R1
    sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, Registers::ctx, 0);
    sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, src, srcw);

    // Call cop0_status_write.
    sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS2V(P,32), SLJIT_IMM, sljit_sw(inputs.cop0_status_write));
}

void N64Recomp::LiveGenerator::emit_cop1_cs_read(int reg) const {
    // Skip the read if the target is the zero register.
    if (reg != 0) {
        sljit_sw dst;
        sljit_sw dstw;
        get_gpr_values(reg, dst, dstw);

        // Call get_cop1_cs.
        sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS0(32), SLJIT_IMM, sljit_sw(get_cop1_cs));

        // Sign extend the result into a temp register.
        sljit_emit_op1(compiler, SLJIT_MOV_S32, Registers::arithmetic_temp1, 0, SLJIT_RETURN_REG, 0);

        // Move the sign extended result into the destination.
        sljit_emit_op1(compiler, SLJIT_MOV, dst, dstw, Registers::arithmetic_temp1, 0);
    }
}

void N64Recomp::LiveGenerator::emit_cop1_cs_write(int reg) const {
    sljit_sw src;
    sljit_sw srcw;
    get_gpr_values(reg, src, srcw);

    // Load the input register value into R0.
    sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, src, srcw);

    // Call set_cop1_cs.
    sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS1V(32), SLJIT_IMM, sljit_sw(set_cop1_cs));
}

void N64Recomp::LiveGenerator::emit_muldiv(InstrId instr_id, int reg1, int reg2) const {
    sljit_sw src1;
    sljit_sw src1w;
    sljit_sw src2;
    sljit_sw src2w;
    get_gpr_values(reg1, src1, src1w);
    get_gpr_values(reg2, src2, src2w);
    
    auto do_mul32_op = [src1, src1w, src2, src2w, this](bool is_signed) {
        // Load the two inputs into the multiplication input registers (R0/R1).
        if (is_signed) {
            // 32-bit signed multiplication is really 64 bits * 35 bits, so load accordingly.
            sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, src1, src1w); 

            // Sign extend to 35 bits by shifting left by 64 - 35 and then shifting right by the same amount.
            sljit_emit_op2(compiler, SLJIT_SHL, SLJIT_R1, 0, src2, src2w, SLJIT_IMM, 64 - 35);
            sljit_emit_op2(compiler, SLJIT_ASHR, SLJIT_R1, 0, SLJIT_R1, 0, SLJIT_IMM, 64 - 35);
        }
        else {
            sljit_emit_op1(compiler, SLJIT_MOV_U32, SLJIT_R0, 0, src1, src1w);
            sljit_emit_op1(compiler, SLJIT_MOV_U32, SLJIT_R1, 0, src2, src2w);
        }

        // Perform the multiplication.
        sljit_emit_op0(compiler, is_signed ? SLJIT_LMUL_SW : SLJIT_LMUL_UW);

        // Move the results into hi and lo with sign extension.
        sljit_emit_op2(compiler, SLJIT_ASHR, Registers::hi, 0, SLJIT_R0, 0, SLJIT_IMM, 32);
        sljit_emit_op1(compiler, SLJIT_MOV_S32, Registers::lo, 0, SLJIT_R0, 0);
    };
    
    auto do_mul64_op = [src1, src1w, src2, src2w, this](bool is_signed) {
        // Load the two inputs into the multiplication input registers (R0/R1).
        sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, src1, src1w); 
        sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, src2, src2w);

        // Perform the multiplication.
        sljit_emit_op0(compiler, is_signed ? SLJIT_LMUL_SW : SLJIT_LMUL_UW);

        // Move the results into hi and lo.
        sljit_emit_op1(compiler, SLJIT_MOV, Registers::hi, 0, SLJIT_R1, 0);
        sljit_emit_op1(compiler, SLJIT_MOV, Registers::lo, 0, SLJIT_R0, 0);
    };
    
    auto do_div_op = [src1, src1w, src2, src2w, this](bool doubleword, bool is_signed) {
        // Pick the division opcode based on the bit width and signedness.
        // Note that the 64-bit division opcode is used for 32-bit signed division to match hardware behavior and prevent overflow.
        sljit_sw div_opcode = doubleword ?
            (is_signed ? SLJIT_DIVMOD_SW : SLJIT_DIVMOD_UW) :
            (is_signed ? SLJIT_DIVMOD_SW : SLJIT_DIVMOD_U32);

        // Pick the move opcode to use for loading the operands.
        sljit_sw load_opcode = doubleword ? SLJIT_MOV :
            (is_signed ? SLJIT_MOV_S32 : SLJIT_MOV_U32);

        // Pick the move opcode to use for saving the results.
        sljit_sw save_opcode = doubleword ? SLJIT_MOV : SLJIT_MOV_S32;

        // Load the two inputs into R0 and R1 (the numerator and denominator).
        sljit_emit_op1(compiler, load_opcode, SLJIT_R0, 0, src1, src1w); 

        // TODO figure out 32-bit signed division behavior when inputs aren't properly sign extended.
        // if (!doubleword && is_signed) {
        //     // Sign extend to 35 bits by shifting left by 64 - 35 and then shifting right by the same amount.
        //     sljit_emit_op2(compiler, SLJIT_SHL, SLJIT_R1, 0, src2, src2w, SLJIT_IMM, 64 - 35);
        //     sljit_emit_op2(compiler, SLJIT_ASHR, SLJIT_R1, 0, SLJIT_R1, 0, SLJIT_IMM, 64 - 35);
        // }
        // else {
            sljit_emit_op1(compiler, load_opcode, SLJIT_R1, 0, src2, src2w);
        // }

        // Prevent overflow on 64-bit signed division.
        if (doubleword && is_signed) {
            // If the numerator is INT64_MIN and the denominator is -1, an overflow will occur. To prevent an exception and
            // behave as the original hardware would, check whether either of those conditions is false.
            // If neither condition is false (i.e. both are true), set the denominator to 1.

            // Xor the numerator with INT64_MIN. This will be zero if they're equal.
            sljit_emit_op2(compiler, SLJIT_XOR, Registers::arithmetic_temp3, 0, Registers::arithmetic_temp1, 0, SLJIT_IMM, sljit_sw(INT64_MIN));

            // Invert the denominator. This will be zero if it's -1.
            sljit_emit_op2(compiler, SLJIT_XOR, Registers::arithmetic_temp4, 0, Registers::arithmetic_temp2, 0, SLJIT_IMM, sljit_sw(-1)); 

            // Or the results of the previous two calculations and set the zero flag. This will be zero if both conditions were met.
            sljit_emit_op2(compiler, SLJIT_OR | SLJIT_SET_Z, Registers::arithmetic_temp3, 0, Registers::arithmetic_temp3, 0, Registers::arithmetic_temp4, 0);

            // If the zero flag is set, meaning both conditions were true, replace the denominator with 1.
            // i.e. conditionally move an immediate of 1 into the denominator (R1) if the result of the OR was zero.
            sljit_emit_select(compiler, SLJIT_ZERO, SLJIT_R1, SLJIT_IMM, 1, SLJIT_R1);
        }

        // If the denominator is 0, skip the division and jump to the special handling for that case.
        sljit_jump* jump_skip_division = sljit_emit_cmp(compiler, SLJIT_EQUAL, SLJIT_R1, 0, SLJIT_IMM, 0);

        // Perform the division.
        sljit_emit_op0(compiler, div_opcode);

        // Extract the remainder and quotient into the high and low registers respectively.
        sljit_emit_op1(compiler, save_opcode, Registers::hi, 0, SLJIT_R1, 0);
        sljit_emit_op1(compiler, save_opcode, Registers::lo, 0, SLJIT_R0, 0);

        // Jump to the end of this routine.
        sljit_jump* jump_to_end = sljit_emit_jump(compiler, SLJIT_JUMP);

        // Emit a label and set it as the target of the jump if the denominator was zero.
        sljit_label* after_division = sljit_emit_label(compiler);
        sljit_set_label(jump_skip_division, after_division);

        // Move the numerator into hi.
        sljit_emit_op1(compiler, save_opcode, Registers::hi, 0, SLJIT_R0, 0);

        if (is_signed) {
            // Calculate the negative signum of the numerator and place it in lo.
            // neg_signum = ((int64_t)(~x) >> 63) | 1
            // A shift of 63 works for both widths because 32-bit signed inputs were sign-extended to 64 bits when loaded.
            sljit_emit_op2(compiler, SLJIT_XOR, Registers::lo, 0, SLJIT_R0, 0, SLJIT_IMM, sljit_sw(-1));
            sljit_emit_op2(compiler, SLJIT_ASHR, Registers::lo, 0, Registers::lo, 0, SLJIT_IMM, 64 - 1);
            sljit_emit_op2(compiler, SLJIT_OR, Registers::lo, 0, Registers::lo, 0, SLJIT_IMM, 1);
        }
        else {
            // Move -1 into lo.
            sljit_emit_op1(compiler, SLJIT_MOV, Registers::lo, 0, SLJIT_IMM, sljit_sw(-1));
        }

        // Emit a label and set it as the target of the jump after the division.
        sljit_label* end_label = sljit_emit_label(compiler);
        sljit_set_label(jump_to_end, end_label);
    };
    

    switch (instr_id) {
        case InstrId::cpu_mult:
            do_mul32_op(true);
            break;
        case InstrId::cpu_multu:
            do_mul32_op(false);
            break;
        case InstrId::cpu_dmult:
            do_mul64_op(true);
            break;
        case InstrId::cpu_dmultu:
            do_mul64_op(false);
            break;
        case InstrId::cpu_div:
            do_div_op(false, true);
            break;
        case InstrId::cpu_divu:
            do_div_op(false, false);
            break;
        case InstrId::cpu_ddiv:
            do_div_op(true, true);
            break;
        case InstrId::cpu_ddivu:
            do_div_op(true, false);
            break;
        default:
            assert(false && "Invalid mul/div instruction id!");
            break;
    }
}

void N64Recomp::LiveGenerator::emit_syscall(uint32_t instr_vram) const {
    // Load rdram and ctx into R0 and R1.
    sljit_emit_op2(compiler, SLJIT_ADD, SLJIT_R0, 0, Registers::rdram, 0, SLJIT_IMM, rdram_offset);
    sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, Registers::ctx, 0);
    // Load the vram into R2.
    sljit_emit_op1(compiler, SLJIT_MOV32, SLJIT_R2, 0, SLJIT_IMM, instr_vram);
    // Call syscall_handler.
    sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS3V(P, P, 32), SLJIT_IMM, sljit_sw(inputs.syscall_handler));
}

void N64Recomp::LiveGenerator::emit_do_break(uint32_t instr_vram) const {
    // Load the vram into R0.
    sljit_emit_op1(compiler, SLJIT_MOV32, SLJIT_R0, 0, SLJIT_IMM, instr_vram);
    // Call do_break.
    sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS1V(32), SLJIT_IMM, sljit_sw(inputs.do_break));
}

void N64Recomp::LiveGenerator::emit_pause_self() const {
    // Load rdram into R0.
    sljit_emit_op2(compiler, SLJIT_ADD, SLJIT_R0, 0, Registers::rdram, 0, SLJIT_IMM, rdram_offset);
    // Call pause_self.
    sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS1V(P), SLJIT_IMM, sljit_sw(inputs.pause_self));
}

void N64Recomp::LiveGenerator::emit_trigger_event(uint32_t event_index) const {
    // Load rdram and ctx into R0 and R1.
    sljit_emit_op2(compiler, SLJIT_ADD, SLJIT_R0, 0, Registers::rdram, 0, SLJIT_IMM, rdram_offset);
    sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, Registers::ctx, 0);
    // Load the global event index into R2.
    sljit_emit_op1(compiler, SLJIT_MOV32, SLJIT_R2, 0, SLJIT_IMM, event_index + inputs.base_event_index);
    // Call trigger_event.
    sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_ARGS3V(P,P,32), SLJIT_IMM, sljit_sw(inputs.trigger_event));
}

void N64Recomp::LiveGenerator::emit_comment(const std::string& comment) const {
    (void)comment;
    // Nothing to do here.
}

bool N64Recomp::recompile_function_live(LiveGenerator& generator, const Context& context, size_t function_index, std::ostream& output_file, std::span<std::vector<uint32_t>> static_funcs_out, bool tag_reference_relocs) {
    return recompile_function_custom(generator, context, function_index, output_file, static_funcs_out, tag_reference_relocs);
}

N64Recomp::ShimFunction::ShimFunction(recomp_func_ext_t* to_shim, uintptr_t value) {
    sljit_compiler* compiler = sljit_create_compiler(nullptr);

    // Create the function.
    sljit_label* func_label = sljit_emit_label(compiler);
    sljit_emit_enter(compiler, 0, SLJIT_ARGS2V(P_R, P_R), 3, 0, 0);

    // Move the provided value into the third argument.
    sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R2, 0, SLJIT_IMM, sljit_sw(value));

    // Tail call the provided function.
    sljit_emit_icall(compiler, SLJIT_CALL | SLJIT_CALL_RETURN, SLJIT_ARGS3V(P, P, W), SLJIT_IMM, sljit_sw(to_shim));

    // Generate the function's code and get the address to the function.
    code = sljit_generate_code(compiler, 0, nullptr);
    func = reinterpret_cast<recomp_func_t*>(sljit_get_label_addr(func_label));

    // Cleanup.
    sljit_free_compiler(compiler);
}

N64Recomp::ShimFunction::~ShimFunction() {
    sljit_free_code(code, nullptr);
    code = nullptr;
    func = nullptr;
}


================================================
FILE: LiveRecomp/live_recompiler_test.cpp
================================================
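#include <cassert>
#include <cfenv>
#include <chrono>
#include <cinttypes>
#include <cstdio>
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>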

#include "sljitLir.h"
#include "recompiler/live_recompiler.h"
#include "recomp.h"

static std::vector<uint8_t> read_file(const std::filesystem::path& path, bool& found) {
    std::vector<uint8_t> ret;
    found = false;

    std::ifstream file{ path, std::ios::binary};

    if (file.good()) {
        file.seekg(0, std::ios::end);
        ret.resize(file.tellg());
        file.seekg(0, std::ios::beg);

        file.read(reinterpret_cast<char*>(ret.data()), ret.size());
        found = true;
    }

    return ret;
}


uint32_t read_u32_swap(const std::vector<uint8_t>& vec, size_t offset) {
    return byteswap(*reinterpret_cast<const uint32_t*>(&vec[offset]));
}

uint32_t read_u32(const std::vector<uint8_t>& vec, size_t offset) {
    return *reinterpret_cast<const uint32_t*>(&vec[offset]);
}

std::vector<uint8_t> rdram;

void byteswap_copy(uint8_t* dst, uint8_t* src, size_t count) {
    for (size_t i = 0; i < count; i++) {
        dst[i ^ 3] = src[i];
    }
}

bool byteswap_compare(uint8_t* a, uint8_t* b, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (a[i ^ 3] != b[i]) {
            return false;
        }
    }
    return true;
}

enum class TestError {
    Success,
    FailedToOpenInput,
    FailedToRecompile,
    UnknownStructType,
    DataDifference
};

struct TestStats {
    TestError error;
    uint64_t codegen_microseconds;
    uint64_t execution_microseconds;
    uint64_t code_size;
};

void write1(uint8_t* rdram, recomp_context* ctx) {
    MEM_B(0, ctx->r4) = 1;
}

recomp_func_t* test_get_function(int32_t vram) {
    if (vram == 0x80100000) {
        return write1;
    }
    assert(false);
    return nullptr;
}

void test_switch_error(const char* func, uint32_t vram, uint32_t jtbl) {
    printf("  Switch-case out of bounds in %s at 0x%08X for jump table at 0x%08X\n", func, vram, jtbl);
}

TestStats run_test(const std::filesystem::path& tests_dir, const std::string& test_name) {
    std::filesystem::path input_path = tests_dir / (test_name + "_data.bin");
    std::filesystem::path data_dump_path = tests_dir / (test_name + "_data_out.bin");

    bool found;
    std::vector<uint8_t> file_data = read_file(input_path, found);

    if (!found) {
        printf("Failed to open file: %s\n", input_path.string().c_str());
        return { TestError::FailedToOpenInput };
    }

    // Parse the test file.
    uint32_t text_offset = read_u32_swap(file_data, 0x00);
    uint32_t text_length = read_u32_swap(file_data, 0x04);
    uint32_t init_data_offset = read_u32_swap(file_data, 0x08);
    uint32_t good_data_offset = read_u32_swap(file_data, 0x0C);
    uint32_t data_length = read_u32_swap(file_data, 0x10);
    uint32_t text_address = read_u32_swap(file_data, 0x14);
    uint32_t data_address = read_u32_swap(file_data, 0x18);
    uint32_t next_struct_address = read_u32_swap(file_data, 0x1C);

    recomp_context ctx{};

    byteswap_copy(&rdram[text_address - 0x80000000], &file_data[text_offset], text_length);
    byteswap_copy(&rdram[data_address - 0x80000000], &file_data[init_data_offset], data_length);

    // Build recompiler context.
    N64Recomp::Context context{};

    // Move the file data into the context.
    context.rom = std::move(file_data);

    context.sections.resize(2);
    // Create a section for the function to exist in.
    context.sections[0].ram_addr = text_address;
    context.sections[0].rom_addr = text_offset;
    context.sections[0].size = text_length;
    context.sections[0].name = ".text";
    context.sections[0].executable = true;
    context.sections[0].relocatable = true;
    context.section_functions.resize(context.sections.size());
    // Create a section for .data (used for relocations)
    context.sections[1].ram_addr = data_address;
    context.sections[1].rom_addr = init_data_offset;
    context.sections[1].size = data_length;
    context.sections[1].name = ".data";
    context.sections[1].executable = false;
    context.sections[1].relocatable = true;

    size_t start_func_index;
    uint32_t function_desc_address = 0;
    uint32_t reloc_desc_address = 0;

    // Read any extra structs.
    while (next_struct_address != 0) {
        uint32_t cur_struct_address = next_struct_address;
        uint32_t struct_type = read_u32_swap(context.rom, next_struct_address + 0x00);
        next_struct_address = read_u32_swap(context.rom, next_struct_address + 0x04);

        switch (struct_type) {
            case 1: // Function desc
                function_desc_address = cur_struct_address;
                break;
            case 2: // Relocation
                reloc_desc_address = cur_struct_address;
                break;
            default:
                printf("Unknown struct type %u\n", struct_type);
                return { TestError::UnknownStructType };
        }
    }

    // Check if a function description exists.
    if (function_desc_address == 0) {
        // No function description, so treat the whole thing as one function.

        // Get the function's instruction words.
        std::vector<uint32_t> text_words{};
        text_words.resize(text_length / sizeof(uint32_t));
        for (size_t i = 0; i < text_words.size(); i++) {
            text_words[i] = read_u32(context.rom, text_offset + i * sizeof(uint32_t));
        }

        // Add the function to the context.
        context.functions_by_vram[text_address].emplace_back(context.functions.size());
        context.section_functions.emplace_back(context.functions.size());
        context.sections[0].function_addrs.emplace_back(text_address);
        context.functions.emplace_back(
            text_address,
            text_offset,
            text_words,
            "test_func",
            0
        );
        start_func_index = 0;
    }
    else {
        // Use the function description.
        uint32_t num_funcs = read_u32_swap(context.rom, function_desc_address + 0x08);
        start_func_index = read_u32_swap(context.rom, function_desc_address + 0x0C);

        for (size_t func_index = 0; func_index < num_funcs; func_index++) {
            uint32_t cur_func_address = read_u32_swap(context.rom, function_desc_address + 0x10 + 0x00 + 0x08 * func_index);
            uint32_t cur_func_length = read_u32_swap(context.rom, function_desc_address + 0x10 + 0x04 + 0x08 * func_index);
            uint32_t cur_func_offset = cur_func_address - text_address + text_offset;

            // Get the function's instruction words.
            std::vector<uint32_t> text_words{};
            text_words.resize(cur_func_length / sizeof(uint32_t));
            for (size_t i = 0; i < text_words.size(); i++) {
                text_words[i] = read_u32(context.rom, cur_func_offset + i * sizeof(uint32_t));
            }

            // Add the function to the context.
            context.functions_by_vram[cur_func_address].emplace_back(context.functions.size());
            context.section_functions.emplace_back(context.functions.size());
            context.sections[0].function_addrs.emplace_back(cur_func_address);
            context.functions.emplace_back(
                cur_func_address,
                cur_func_offset,
                std::move(text_words),
                "test_func_" + std::to_string(func_index),
                0
            );
        }
    }

    // Check if a relocation description exists.
    if (reloc_desc_address != 0) {
        uint32_t num_relocs = read_u32_swap(context.rom, reloc_desc_address + 0x08);
        for (uint32_t reloc_index = 0; reloc_index < num_relocs; reloc_index++) {
            uint32_t cur_desc_address = reloc_desc_address + 0x0C + reloc_index * 4 * sizeof(uint32_t);
            uint32_t reloc_type = read_u32_swap(context.rom, cur_desc_address + 0x00);
            uint32_t reloc_section = read_u32_swap(context.rom, cur_desc_address + 0x04);
            uint32_t reloc_address = read_u32_swap(context.rom, cur_desc_address + 0x08);
            uint32_t reloc_target_offset = read_u32_swap(context.rom, cur_desc_address + 0x0C);

            context.sections[0].relocs.emplace_back(N64Recomp::Reloc{
                .address = reloc_address,
                .target_section_offset = reloc_target_offset,
                .symbol_index = 0,
                .target_section = static_cast<uint16_t>(reloc_section),
                .type = static_cast<N64Recomp::RelocType>(reloc_type),
                .reference_symbol = false
            });
        }
    }

    std::vector<std::vector<uint32_t>> dummy_static_funcs{};
    std::vector<int32_t> section_addresses{};
    section_addresses.emplace_back(text_address);
    section_addresses.emplace_back(data_address);

    auto before_codegen = std::chrono::system_clock::now();

    N64Recomp::LiveGeneratorInputs generator_inputs {
        .switch_error = test_switch_error,
        .get_function = test_get_function,
        .reference_section_addresses = nullptr,
        .local_section_addresses = section_addresses.data()
    };

    // Create the sljit compiler and the generator.
    N64Recomp::LiveGenerator generator{ context.functions.size(), generator_inputs };

    for (size_t func_index = 0; func_index < context.functions.size(); func_index++) {
        std::ostringstream dummy_ostream{};

        //sljit_emit_op0(compiler, SLJIT_BREAKPOINT);

        if (!N64Recomp::recompile_function_live(generator, context, func_index, dummy_ostream, dummy_static_funcs, true)) {
            return { TestError::FailedToRecompile };
        }
    }

    // Generate the code.
    N64Recomp::LiveGeneratorOutput output = generator.finish();

    auto after_codegen = std::chrono::system_clock::now();

    auto before_execution = std::chrono::system_clock::now();

    int old_rounding = fegetround();

    // Run the generated code.
    ctx.r29 = 0xFFFFFFFF80000000 + rdram.size() - 0x10; // Set the stack pointer.
    output.functions[start_func_index](rdram.data(), &ctx);

    fesetround(old_rounding);

    auto after_execution = std::chrono::system_clock::now();

    // Check the result of running the code.
    bool good = byteswap_compare(&rdram[data_address - 0x80000000], &context.rom[good_data_offset], data_length);

    // Dump the data if the results don't match.
    if (!good) {
        std::ofstream data_dump_file{ data_dump_path, std::ios::binary };
        std::vector<uint8_t> data_swapped;
        data_swapped.resize(data_length);
        byteswap_copy(data_swapped.data(), &rdram[data_address - 0x80000000], data_length);
        data_dump_file.write(reinterpret_cast<char*>(data_swapped.data()), data_length);
        return { TestError::DataDifference };
    }

    // Return the test's stats.
    TestStats ret{};
    ret.error = TestError::Success;
    ret.codegen_microseconds = std::chrono::duration_cast<std::chrono::microseconds>(after_codegen - before_codegen).count();
    ret.execution_microseconds = std::chrono::duration_cast<std::chrono::microseconds>(after_execution - before_execution).count();
    ret.code_size = output.code_size;

    return ret;
}

int main(int argc, const char** argv) {
    if (argc < 3) {
        printf("Usage: %s [test directory] [test 1] ...\n", argv[0]);
        return EXIT_SUCCESS;
    }

    N64Recomp::live_recompiler_init();

    rdram.resize(0x8000000);

    // Skip the first argument (program name) and second argument (test directory).
    int count = argc - 1 - 1;
    int passed_count = 0;

    std::vector<size_t> failed_tests{};

    for (size_t test_index = 0; test_index < count; test_index++) {
        const char* cur_test_name = argv[2 + test_index];
        printf("Running test: %s\n", cur_test_name);
        TestStats stats = run_test(argv[1], cur_test_name);

        switch (stats.error) {
        case TestError::Success:
            printf("  Success\n");
            printf("  Generated %" PRIu64 " bytes in %" PRIu64 " microseconds and ran in %" PRIu64 " microseconds\n",
                stats.code_size, stats.codegen_microseconds, stats.execution_microseconds);
            passed_count++;
            break;
        case TestError::FailedToOpenInput:
            printf("  Failed to open input data file\n");
            break;
        case TestError::FailedToRecompile:
            printf("  Failed to recompile\n");
            break;
        case TestError::UnknownStructType:
            printf("  Unknown additional data struct type in test data\n");
            break;
        case TestError::DataDifference:
            printf("  Output data did not match, dumped to file\n");
            break;
        }

        if (stats.error != TestError::Success) {
            failed_tests.emplace_back(test_index);
        }

        printf("\n");
    }

    printf("Passed %d/%d tests\n", passed_count, count);
    if (!failed_tests.empty()) {
        printf("  Failed: ");
        for (size_t i = 0; i < failed_tests.size(); i++) {
            size_t test_index = failed_tests[i];

            printf("%s", argv[2 + test_index]);
            if (i != failed_tests.size() - 1) {
                printf(", ");
            }
        }
        printf("\n");
    }
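    // Report failure through the exit code so scripts and CI can detect failed tests.
    return failed_tests.empty() ? EXIT_SUCCESS : EXIT_FAILURE;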
}


================================================
FILE: OfflineModRecomp/main.cpp
================================================
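#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <span>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>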

#include "recompiler/context.h"
#include "rabbitizer.hpp"

static std::vector<uint8_t> read_file(const std::filesystem::path& path, bool& found) {
    std::vector<uint8_t> ret;
    found = false;

    std::ifstream file{ path, std::ios::binary};

    if (file.good()) {
        file.seekg(0, std::ios::end);
        ret.resize(file.tellg());
        file.seekg(0, std::ios::beg);

        file.read(reinterpret_cast<char*>(ret.data()), ret.size());
        found = true;
    }

    return ret;
}

int main(int argc, const char** argv) {
    if (argc != 5) {
        printf("Usage: %s [mod symbol file] [mod binary file] [recomp symbols file] [output C file]\n", argv[0]);
        return EXIT_SUCCESS;
    }
    bool found;
    std::vector<uint8_t> symbol_data = read_file(argv[1], found);
    if (!found) {
        fprintf(stderr, "Failed to open symbol file\n");
        return EXIT_FAILURE;
    }

    std::vector<uint8_t> rom_data = read_file(argv[2], found);
    if (!found) {
        fprintf(stderr, "Failed to open mod binary file\n");
        return EXIT_FAILURE;
    }

    std::span<const char> symbol_data_span { reinterpret_cast<const char*>(symbol_data.data()), symbol_data.size() };

    std::vector<uint8_t> dummy_rom{};
    N64Recomp::Context reference_context{};
    if (!N64Recomp::Context::from_symbol_file(argv[3], std::move(dummy_rom), reference_context, false)) {
        fprintf(stderr, "Failed to load provided function reference symbol file\n");
        return EXIT_FAILURE;
    }

    //for (const std::filesystem::path& cur_data_sym_path : data_reference_syms_file_paths) {
    //    if (!reference_context.read_data_reference_syms(cur_data_sym_path)) {
    //        printf("Failed to load provided data reference symbol file\n");
    //        return EXIT_FAILURE;
    //    }
    //}

    std::unordered_map<uint32_t, uint16_t> sections_by_vrom{};
    for (uint16_t section_index = 0; section_index < reference_context.sections.size(); section_index++) {
        sections_by_vrom[reference_context.sections[section_index].rom_addr] = section_index;
    }

    N64Recomp::Context mod_context;

    N64Recomp::ModSymbolsError error = N64Recomp::parse_mod_symbols(symbol_data_span, rom_data, sections_by_vrom, mod_context);
    if (error != N64Recomp::ModSymbolsError::Good) {
        fprintf(stderr, "Error parsing mod symbols: %d\n", (int)error);
        return EXIT_FAILURE;
    }

    mod_context.import_reference_context(reference_context);

    // Populate R_MIPS_26 reloc symbol indices. Start by building a map of vram address to matching reference symbols.
    std::unordered_map<uint32_t, std::vector<size_t>> reference_symbols_by_vram{};
    for (size_t reference_symbol_index = 0; reference_symbol_index < mod_context.num_regular_reference_symbols(); reference_symbol_index++) {
        const auto& sym = mod_context.get_regular_reference_symbol(reference_symbol_index);
        uint16_t section_index = sym.section_index;
        if (section_index != N64Recomp::SectionAbsolute) {
            uint32_t section_vram = mod_context.get_reference_section_vram(section_index);
            reference_symbols_by_vram[section_vram + sym.section_offset].push_back(reference_symbol_index);
        }
    }
    
    // Use the mapping to populate the symbol index for every R_MIPS_26 reference symbol reloc. 
    for (auto& section : mod_context.sections) {
        for (auto& reloc : section.relocs) {
            if (reloc.type == N64Recomp::RelocType::R_MIPS_26 && reloc.reference_symbol) {
                if (mod_context.is_regular_reference_section(reloc.target_section)) {
                    uint32_t section_vram = mod_context.get_reference_section_vram(reloc.target_section);
                    uint32_t target_vram = section_vram + reloc.target_section_offset;

                    auto find_funcs_it = reference_symbols_by_vram.find(target_vram);
                    bool found = false;
                    if (find_funcs_it != reference_symbols_by_vram.end()) {
                        for (size_t symbol_index : find_funcs_it->second) {
                            const auto& cur_symbol = mod_context.get_reference_symbol(reloc.target_section, symbol_index);
                            if (cur_symbol.section_index == reloc.target_section) {
                                reloc.symbol_index = symbol_index;
                                found = true;
                                break;
                            }
                        }
                    }
                    if (!found) {
                        fprintf(stderr, "Failed to find R_MIPS_26 relocation target in section %d with vram 0x%08X\n", reloc.target_section, target_vram);
                        return EXIT_FAILURE;
                    }
                }
            }
        }
    }

    mod_context.rom = std::move(rom_data);

    std::vector<std::vector<uint32_t>> static_funcs_by_section{};
    static_funcs_by_section.resize(mod_context.sections.size());

    const char* output_file_path = argv[4];
    std::ofstream output_file { output_file_path };

    RabbitizerConfig_Cfg.pseudos.pseudoMove = false;
    RabbitizerConfig_Cfg.pseudos.pseudoBeqz = false;
    RabbitizerConfig_Cfg.pseudos.pseudoBnez = false;
    RabbitizerConfig_Cfg.pseudos.pseudoNot = false;
    RabbitizerConfig_Cfg.pseudos.pseudoBal = false;

    output_file << "#include \"mod_recomp.h\"\n\n";

    // Write the API version.
    output_file << "RECOMP_EXPORT uint32_t recomp_api_version = 1;\n\n";

    output_file << "// Values populated by the runtime:\n\n";

    // Write import function pointer array and defines (i.e. `#define testmod_inner_import imported_funcs[0]`)
    output_file << "// Array of pointers to imported functions with defines to alias their names.\n";
    size_t num_imports = mod_context.import_symbols.size();
    for (size_t import_index = 0; import_index < num_imports; import_index++) {
        const auto& import = mod_context.import_symbols[import_index];
        output_file << "#define " << import.base.name << " imported_funcs[" << import_index << "]\n";
    }

    output_file << "RECOMP_EXPORT recomp_func_t* imported_funcs[" << std::max(size_t{1}, num_imports) << "] = {0};\n";
    output_file << "\n";

    // Use reloc list to write reference symbol function pointer array and defines (i.e. `#define func_80102468 reference_symbol_funcs[0]`)
    output_file << "// Array of pointers to functions from the original ROM with defines to alias their names.\n";
    std::unordered_set<std::string> written_reference_symbols{};
    size_t num_reference_symbols = 0;
    for (const auto& section : mod_context.sections) {
        for (const auto& reloc : section.relocs) {
            if (reloc.type == N64Recomp::RelocType::R_MIPS_26 && reloc.reference_symbol && mod_context.is_regular_reference_section(reloc.target_section)) {
                const auto& sym = mod_context.get_reference_symbol(reloc.target_section, reloc.symbol_index);

                // Prevent writing multiple of the same define. This means there are duplicate symbols in the array if a function is called more than once,
                // but only the first of each set of duplicates is referenced. This is acceptable, since offline mod recompilation is mainly meant for debug purposes.
                if (!written_reference_symbols.contains(sym.name)) {
                    output_file << "#define " << sym.name << " reference_symbol_funcs[" << num_reference_symbols << "]\n";
                    written_reference_symbols.emplace(sym.name);
                }
                num_reference_symbols++;
            }
        }
    }
    // C doesn't allow 0-sized arrays, so always add at least one member to all arrays. The actual size will be pulled from the mod symbols.
    output_file << "RECOMP_EXPORT recomp_func_t* reference_symbol_funcs[" << std::max(size_t{1},num_reference_symbols) << "] = {0};\n\n";

    // Write the base event index (added to this mod's internal event indices to get global ones).
    output_file << "// Base global event index for this mod's events.\n";
    output_file << "RECOMP_EXPORT uint32_t base_event_index;\n\n";

    // Write the event trigger function pointer.
    output_file << "// Pointer to the runtime function for triggering events.\n";
    output_file << "RECOMP_EXPORT void (*recomp_trigger_event)(uint8_t* rdram, recomp_context* ctx, uint32_t) = NULL;\n\n";

    // Write the get_function pointer.
    output_file << "// Pointer to the runtime function for looking up functions from vram address.\n";
    output_file << "RECOMP_EXPORT recomp_func_t* (*get_function)(int32_t vram) = NULL;\n\n";

    // Write the cop0_status_write pointer.
    output_file << "// Pointer to the runtime function for performing a cop0 status register write.\n";
    output_file << "RECOMP_EXPORT void (*cop0_status_write)(recomp_context* ctx, gpr value) = NULL;\n\n";

    // Write the cop0_status_read pointer.
    output_file << "// Pointer to the runtime function for performing a cop0 status register read.\n";
    output_file << "RECOMP_EXPORT gpr (*cop0_status_read)(recomp_context* ctx) = NULL;\n\n";

    // Write the switch_error pointer.
    output_file << "// Pointer to the runtime function for reporting switch case errors.\n";
    output_file << "RECOMP_EXPORT void (*switch_error)(const char* func, uint32_t vram, uint32_t jtbl) = NULL;\n\n";

    // Write the do_break pointer.
    output_file << "// Pointer to the runtime function for handling the break instruction.\n";
    output_file << "RECOMP_EXPORT void (*do_break)(uint32_t vram) = NULL;\n\n";

    // Write the reference_section_addresses pointer.
    output_file << "// Pointer to the runtime's array of loaded section addresses for the base ROM.\n";
    output_file << "RECOMP_EXPORT int32_t* reference_section_addresses = NULL;\n\n";

    // Write the local section addresses pointer array.
    size_t num_sections = mod_context.sections.size();
    output_file << "// Array of this mod's loaded section addresses.\n";
    output_file << "RECOMP_EXPORT int32_t section_addresses[" << std::max(size_t{1}, num_sections) << "] = {0};\n\n";

    // Create a set of the export indices to avoid renaming them.
    std::unordered_set<size_t> export_indices{mod_context.exported_funcs.begin(), mod_context.exported_funcs.end()};

    // Name all the functions in a first pass so function calls emitted in the second are correct. Also emit function prototypes.
    output_file << "// Function prototypes.\n";
    for (size_t func_index = 0; func_index < mod_context.functions.size(); func_index++) {
        auto& func = mod_context.functions[func_index];
        // Don't rename exports since they already have a name from the mod symbol file.
        if (!export_indices.contains(func_index)) {
            func.name = "mod_func_" + std::to_string(func_index);
        }
        output_file << "RECOMP_FUNC void " << func.name << "(uint8_t* rdram, recomp_context* ctx);\n";
    }
    output_file << "\n";

    // Perform a second pass for recompiling all the functions.
    for (size_t func_index = 0; func_index < mod_context.functions.size(); func_index++) {
        if (!N64Recomp::recompile_function(mod_context, func_index, output_file, static_funcs_by_section, true)) {
            output_file.close();
            std::error_code ec;
            std::filesystem::remove(output_file_path, ec);
            return EXIT_FAILURE;
        }
    }

    return EXIT_SUCCESS;
}


================================================
FILE: README.md
================================================
# N64: Recompiled
N64: Recompiled is a tool to statically recompile N64 binaries into C code that can be compiled for any platform. This can be used for ports or tools as well as for simulating behaviors significantly faster than interpreters or dynamic recompilation can. More widely, it can be used in any context where you want to run some part of an N64 binary in a standalone environment.

This is not the first project that uses static recompilation on game console binaries. A well-known example is [jamulator](https://github.com/andrewrk/jamulator), which targets NES binaries. Additionally, this is not even the first project to apply static recompilation to N64-related projects: the [IDO static recompilation](https://github.com/decompals/ido-static-recomp) project recompiles the SGI IRIX IDO compiler for modern systems to facilitate matching decompilation of N64 games. This project works similarly to the IDO static recomp project in some ways, and that project was my main inspiration for making this.

## Table of Contents
* [How it Works](#how-it-works)
* [Overlays](#overlays)
* [How to Use](#how-to-use)
* [Single File Output Mode](#single-file-output-mode-for-patches)
* [RSP Microcode Support](#rsp-microcode-support)
* [Planned Features](#planned-features)
* [Building](#building)

## How it Works
The recompiler works by accepting a list of symbols and metadata alongside the binary with the goal of splitting the input binary into functions that are each individually recompiled into a C function, named according to the metadata.

Instructions are processed one-by-one and corresponding C code is emitted as each one gets processed. This translation is very literal in order to keep complexity low. For example, the instruction `addiu $r4, $r4, 0x20`, which adds `0x20` to the 32-bit value in the low bytes of register `$r4` and stores the sign extended 64-bit result in `$r4`, gets recompiled into `ctx->r4 = ADD32(ctx->r4, 0X20);` The `jal` (jump-and-link) instruction is recompiled directly into a function call, and `j` or `b` instructions (unconditional jumps and branches) that can be identified as tail-call optimizations are recompiled into function calls as well. Branch delay slots are handled by duplicating instructions as necessary. There are other specific behaviors for certain instructions, such as the recompiler attempting to turn a `jr` instruction into a switch-case statement if it can tell that it's being used with a jump table. The recompiler has mostly been tested on binaries built with old MIPS compilers (e.g. mips gcc 2.7.2 and IDO) as well as modern clang targeting mips. Modern mips gcc may trip up the recompiler due to certain optimizations it can do, but those cases can probably be avoided by setting specific compilation flags.
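
As a rough illustration of the `addiu` translation described above, the following C++ sketch models a 32-bit add whose result is sign extended to 64 bits. The names `gpr`, `add32`, `ExampleContext` and `example_addiu` are assumptions made for this example only; the real `ADD32` macro and context struct are supplied by the runtime and may be defined differently.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 64-bit general purpose register value.
using gpr = uint64_t;

// Sketch of the behavior described for `addiu`: add the low 32 bits of both
// operands, then sign extend the 32-bit result to 64 bits.
constexpr gpr add32(gpr a, gpr b) {
    const uint32_t sum = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
    return static_cast<gpr>(static_cast<int64_t>(static_cast<int32_t>(sum)));
}

// Minimal context holding only the register used in the example.
struct ExampleContext {
    gpr r4;
};

// Rough equivalent of the recompiled `addiu $r4, $r4, 0x20`.
inline void example_addiu(ExampleContext* ctx) {
    ctx->r4 = add32(ctx->r4, 0x20);
}
```

Because only the low 32 bits take part in the addition, a sum that crosses `0x7FFFFFFF` comes out as a negative 32-bit value and is widened with its upper 32 bits set, matching how the N64's CPU keeps 32-bit results in 64-bit registers.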

Every output function created by the recompiler is currently emitted into its own file. An option may be provided in the future to group functions together into output files, which should help improve build times of the recompiler output by reducing file I/O in the build process.

Recompiler output can be compiled with any C compiler (tested with msvc, gcc and clang). The output is expected to be used with a runtime that can provide the necessary functionality and macro implementations to run it. A runtime is provided in [N64ModernRuntime](https://github.com/N64Recomp/N64ModernRuntime) which can be seen in action in the [Zelda 64: Recompiled](https://github.com/Zelda64Recomp/Zelda64Recomp) project.

## Overlays
Statically linked and relocatable overlays can both be handled by this tool. In both cases, the tool emits function lookups for jump-and-link-register instructions (i.e. calls through function pointers or virtual functions), which the provided runtime can implement using any sort of lookup table. For example, the instruction `jalr $25` would get recompiled as `LOOKUP_FUNC(ctx->r25)(rdram, ctx);`. The runtime can then keep track of which program sections are loaded and the address each one is loaded at in order to determine which function to run whenever a lookup is triggered at runtime.
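
One possible shape for such a lookup table is sketched below in C++. Everything here (`Context`, `RecompFunc`, `FunctionTable`, `call_through_r25`) is a hypothetical illustration of the idea and not the API of N64ModernRuntime; a real runtime would most likely register whole sections at once rather than individual functions.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical, heavily trimmed stand-in for the recompiler's context struct.
struct Context {
    std::uint64_t r2;
    std::uint64_t r25;
};

// Signature shared by every recompiled function in this sketch.
using RecompFunc = void (*)(std::uint8_t* rdram, Context* ctx);

// Hypothetical runtime-side table mapping loaded addresses to functions.
class FunctionTable {
public:
    // Called when a section is loaded so that its functions resolve.
    void add(std::uint32_t address, RecompFunc func) { funcs_[address] = func; }

    // Called when a section is unloaded so stale addresses stop resolving.
    void remove(std::uint32_t address) { funcs_.erase(address); }

    // Returns nullptr when nothing is loaded at the address.
    RecompFunc lookup(std::uint64_t reg_value) const {
        // Registers are 64 bits wide; the address lives in the low 32 bits.
        const auto it = funcs_.find(static_cast<std::uint32_t>(reg_value));
        return it == funcs_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<std::uint32_t, RecompFunc> funcs_;
};

// Equivalent of `LOOKUP_FUNC(ctx->r25)(rdram, ctx);` built on the table above.
inline void call_through_r25(const FunctionTable& table, std::uint8_t* rdram, Context* ctx) {
    const RecompFunc func = table.lookup(ctx->r25);
    assert(func != nullptr && "jalr target is not in a loaded section");
    func(rdram, ctx);
}
```

Because the table is keyed on the address a function currently occupies, loading an overlay at a different address only requires re-registering its functions; the recompiled code itself stays unchanged.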

For relocatable overlays, the tool will modify supported instructions possessing relocation data (`lui`, `addiu`, load and store instructions) by emitting an extra macro that enables the runtime to relocate the instruction's immediate value field. For example, the instruction `lui $24, 0x80C0` in a section beginning at address `0x80BFA100` with a relocation against a symbol with an address of `0x80BFA730` will get recompiled as `ctx->r24 = S32(RELOC_HI16(1754, 0X630) << 16);`, where 1754 is the index of this section. The runtime can then implement the `RELOC_HI16` and `RELOC_LO16` macros in order to handle modifying the immediate based on the current loaded address of the section.
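
The arithmetic behind the two macros can be sketched as follows, using the numbers from the example above. The `SectionAddresses` table and the `relocated_address`, `reloc_hi16` and `reloc_lo16` functions are assumptions made for this sketch; the real runtime may store section addresses and define its macros differently.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical table: section index -> address the section is loaded at.
using SectionAddresses = std::unordered_map<std::uint32_t, std::uint32_t>;

// Current address of `offset` bytes into the given section.
inline std::uint32_t relocated_address(const SectionAddresses& sections,
                                       std::uint32_t section_index,
                                       std::uint32_t offset) {
    const auto it = sections.find(section_index);
    assert(it != sections.end() && "section is not loaded");
    return it->second + offset;
}

// Upper half for `lui`. The low half is sign-extended when it is later added
// by `addiu` or a load/store, so 0x8000 is added first to compensate.
inline std::uint32_t reloc_hi16(const SectionAddresses& sections,
                                std::uint32_t section_index,
                                std::uint32_t offset) {
    return (relocated_address(sections, section_index, offset) + 0x8000u) >> 16;
}

// Lower half for `addiu` and for load/store offsets.
inline std::uint32_t reloc_lo16(const SectionAddresses& sections,
                                std::uint32_t section_index,
                                std::uint32_t offset) {
    return relocated_address(sections, section_index, offset) & 0xFFFFu;
}
```

With section 1754 loaded at its original address `0x80BFA100`, offset `0x630` yields a high half of `0x80C0` and a low half of `0xA730`, which matches the original `lui $24, 0x80C0`. If the section were loaded at `0x80800000` instead, the same offset would yield `0x8080` and `0x0630`.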

Support for relocations for TLB mapping is coming in the future, which will add the ability to provide a list of MIPS32 relocations so that the runtime can relocate them on load. Combining this with the functionality used for relocatable overlays should allow running most TLB mapped code without incurring a performance penalty on every RAM access.

## How to Use
The recompiler is configured with a toml file, whose path is the first argument provided to the recompiler. The toml is where you specify input and output file paths, as well as optionally stub out specific functions, skip recompilation of specific functions, and patch single instructions in the target binary. There is also planned functionality to be able to emit hooks in the recompiler output by adding them to the toml (the `[[patches.func]]` and `[[patches.hook]]` sections of the linked toml below), but this is currently unimplemented. Documentation on every option that the recompiler provides is not currently available, but an example toml can be found in the Zelda 64: Recompiled project [here](https://github.com/Mr-Wiseguy/Zelda64Recomp/blob/dev/us.rev1.toml).

Currently, the only way to provide the required metadata is by passing an elf file to this tool. The easiest way to get such an elf is to set up a disassembly or decompilation of the target binary, but there will be support for providing the metadata via a custom format to bypass the need to do so in the future.

## Single File Output Mode (for Patches)
This tool can also be configured to recompile in "single file output" mode via an option in the configuration toml. This will emit all of the functions in the provided elf into a single output file. The purpose of this mode is to be able to compile patched versions of functions from the target binary.

This mode can be combined with the functionality provided by almost all linkers (ld, lld, MSVC's link.exe, etc.) to replace functions from the original recompiler output with modified versions. Those linkers only look for symbols in a static library if they weren't already found in a previous input file, so providing the recompiled patches to the linker before providing the original recompiler output will result in the patches taking priority over functions with the same names from the original recompiler output.

This saves a tremendous amount of time while iterating on patches for the target binary, as you can bypass rerunning the recompiler on the target binary as well as compiling the original recompiler output. An example of using this single file output mode for that purpose can be found in the Zelda 64: Recompiled project [here](https://github.com/Mr-Wiseguy/Zelda64Recomp/blob/dev/patches.toml), with the corresponding Makefile that gets used to build the elf for those patches [here](https://github.com/Mr-Wiseguy/Zelda64Recomp/blob/dev/patches/Makefile).

## RSP Microcode Support
RSP microcode can also be recompiled with this tool. Currently there is no support for recompiling RSP overlays, but it may be added in the future if desired. Documentation on how to use this functionality will be coming soon.

## Planned Features
* Custom metadata format to provide symbol names, relocations, and any other necessary data in order to operate without an elf
* Emitting multiple functions per output file to speed up compilation
* Support for recording MIPS32 relocations to allow runtimes to relocate them for TLB mapping
* Ability to recompile into a dynamic language (such as Lua) to be able to load code at runtime for mod support

## Building
This project can be built with CMake 3.20 or above and a C++ compiler that supports C++20. This repo uses git submodules, so be sure to clone recursively (`git clone --recurse-submodules`) or initialize submodules recursively after cloning (`git submodule update --init --recursive`). From there, building is identical to any other cmake project, e.g. run `cmake` in the target build folder and point it at the root of this repo, then run `cmake --build .` from that target folder.

## Libraries Used
* [rabbitizer](https://github.com/Decompollaborate/rabbitizer) for instruction decoding/analysis
* [ELFIO](https://github.com/serge1/ELFIO) for elf parsing
* [toml11](https://github.com/ToruNiina/toml11) for toml parsing
* [fmtlib](https://github.com/fmtlib/fmt)


================================================
FILE: RSPRecomp/src/rsp_recomp.cpp
================================================
#include <optional>
#include <fstream>
#include <array>
#include <vector>
#include <unordered_set>
#include <unordered_map>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <iostream>
#include <filesystem>
#include "rabbitizer.hpp"
#include "fmt/format.h"
#include "fmt/ostream.h"
#include <toml++/toml.hpp>

using InstrId = rabbitizer::InstrId::UniqueId;
using Cop0Reg = rabbitizer::Registers::Rsp::Cop0;
constexpr size_t instr_size = sizeof(uint32_t);
constexpr uint32_t rsp_mem_mask = 0x1FFF;

// Can't use rabbitizer's operand types because we need to be able to provide a register reference or a register index
enum class RspOperand {
    None,
    Vt,
    VtIndex,
    Vd,
    Vs,
    VsIndex,
    De,
    Rt,
    Rs,
    Imm7,
};

std::unordered_map<InstrId, std::array<RspOperand, 3>> vector_operands{
    // Vt, Rs, Imm
    { InstrId::rsp_lbv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_ldv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_lfv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_lhv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_llv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_lpv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_lqv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_lrv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_lsv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_luv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    // { InstrId::rsp_lwv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}}, // Not in rabbitizer
    { InstrId::rsp_sbv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_sdv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_sfv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_shv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_slv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_spv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_sqv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_srv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_ssv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_suv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_swv, {RspOperand::Vt, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_stv, {RspOperand::VtIndex, RspOperand::Rs, RspOperand::Imm7}},
    { InstrId::rsp_ltv, {RspOperand::VtIndex, RspOperand::Rs, RspOperand::Imm7}},

    // Vd, Vs, Vt
    { InstrId::rsp_vabs,    {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vadd,    {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vaddc,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vand,    {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vch,     {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vcl,     {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vcr,     {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_veq,     {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vge,     {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vlt,     {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmacf,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmacu,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmadh,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmadl,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmadm,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmadn,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmrg,    {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmudh,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmudl,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmudm,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmudn,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vne,     {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vnor,    {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vnxor,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vor,     {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vsub,    {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vsubc,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmulf,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmulu,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vmulq,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vnand,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vxor,    {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}},
    { InstrId::rsp_vsar,    {RspOperand::Vd, RspOperand::Vs, RspOperand::None}},
    { InstrId::rsp_vmacq,   {RspOperand::Vd, RspOperand::None, RspOperand::None}},
    // { InstrId::rsp_vzero,   {RspOperand::Vd, RspOperand::Vs, RspOperand::Vt}}, unused pseudo
    { InstrId::rsp_vrndn,   {RspOperand::Vd, RspOperand::VsIndex, RspOperand::Vt}},
    { InstrId::rsp_vrndp,   {RspOperand::Vd, RspOperand::VsIndex, RspOperand::Vt}},

    // Vd, De, Vt
    { InstrId::rsp_vmov,    {RspOperand::Vd, RspOperand::De, RspOperand::Vt}},
    { InstrId::rsp_vrcp,    {RspOperand::Vd, RspOperand::De, RspOperand::Vt}},
    { InstrId::rsp_vrcpl,   {RspOperand::Vd, RspOperand::De, RspOperand::Vt}},
    { InstrId::rsp_vrcph,   {RspOperand::Vd, RspOperand::De, RspOperand::Vt}},
    { InstrId::rsp_vrsq,    {RspOperand::Vd, RspOperand::De, RspOperand::Vt}},
    { InstrId::rsp_vrsql,   {RspOperand::Vd, RspOperand::De, RspOperand::Vt}},
    { InstrId::rsp_vrsqh,   {RspOperand::Vd, RspOperand::De, RspOperand::Vt}},

    // Rt, Vs
    { InstrId::rsp_mfc2,    {RspOperand::Rt, RspOperand::Vs, RspOperand::None}},
    { InstrId::rsp_mtc2,    {RspOperand::Rt, RspOperand::Vs, RspOperand::None}},

    // Nop
    { InstrId::rsp_vnop,    {RspOperand::None, RspOperand::None, RspOperand::None}}
};

std::string_view ctx_gpr_prefix(int reg) {
    if (reg != 0) {
        return "r";
    }
    return "";
}

uint32_t expected_c0_reg_value(int cop0_reg) {
    switch (static_cast<Cop0Reg>(cop0_reg)) {
    case Cop0Reg::RSP_COP0_SP_STATUS:
        return 0; // None of the flags in RSP status are set
    case Cop0Reg::RSP_COP0_SP_DMA_FULL:
        return 0; // Pretend DMAs complete instantly
    case Cop0Reg::RSP_COP0_SP_DMA_BUSY:
        return 0; // Pretend DMAs complete instantly
    case Cop0Reg::RSP_COP0_SP_SEMAPHORE:
        return 0; // Always acquire the semaphore
    case Cop0Reg::RSP_COP0_DPC_STATUS:
        return 0; // Good enough for the microcodes that would be recompiled (i.e. non-graphics ones)
    default:
        fmt::print(stderr, "Unhandled mfc0: {}\n", cop0_reg);
        throw std::runtime_error("Unhandled mfc0");
    }
}

std::string_view c0_reg_write_action(int cop0_reg) {
    switch (static_cast<Cop0Reg>(cop0_reg)) {
    case Cop0Reg::RSP_COP0_SP_SEMAPHORE:
        return ""; // Ignore semaphore functionality
    case Cop0Reg::RSP_COP0_SP_STATUS:
        return ""; // Ignore writes to the status flags since yielding is ignored
    case Cop0Reg::RSP_COP0_SP_DRAM_ADDR:
        return "SET_DMA_DRAM";
    case Cop0Reg::RSP_COP0_SP_MEM_ADDR:
        return "SET_DMA_MEM";
    case Cop0Reg::RSP_COP0_SP_RD_LEN:
        return "DO_DMA_READ";
    case Cop0Reg::RSP_COP0_SP_WR_LEN:
        return "DO_DMA_WRITE";
    default:
        fmt::print(stderr, "Unhandled mtc0: {}\n", cop0_reg);
        throw std::runtime_error("Unhandled mtc0");
    }

}

bool is_c0_reg_write_dma_read(int cop0_reg) {
    return static_cast<Cop0Reg>(cop0_reg) == Cop0Reg::RSP_COP0_SP_RD_LEN;
}

std::optional<int> get_rsp_element(const rabbitizer::InstructionRsp& instr) {
    if (instr.hasOperand(rabbitizer::OperandType::rsp_vt_elementhigh)) {
        return instr.GetRsp_elementhigh();
    } else if (instr.hasOperand(rabbitizer::OperandType::rsp_vt_elementlow) || instr.hasOperand(rabbitizer::OperandType::rsp_vs_index)) {
        return instr.GetRsp_elementlow();
    }

    return std::nullopt;
}

bool rsp_ignores_element(InstrId id) {
    return id == InstrId::rsp_vmacq || id == InstrId::rsp_vnop;
}

struct BranchTargets {
    std::unordered_set<uint32_t> direct_targets;
    std::unordered_set<uint32_t> indirect_targets;
};

BranchTargets get_branch_targets(const std::vector<rabbitizer::InstructionRsp>& instrs) {
    BranchTargets ret;
    for (const auto& instr : instrs) {
        if (instr.isJumpWithAddress() || instr.isBranch()) {
            ret.direct_targets.insert(instr.getBranchVramGeneric() & rsp_mem_mask);
        }
        if (instr.doesLink()) {
            ret.indirect_targets.insert(instr.getVram() + 2 * instr_size);
        }
    }
    return ret;
}

struct ResumeTargets {
    std::unordered_set<uint32_t> non_delay_targets;
    std::unordered_set<uint32_t> delay_targets;
};

void get_overlay_swap_resume_targets(const std::vector<rabbitizer::InstructionRsp>& instrs, ResumeTargets& targets) {
    bool is_delay_slot = false;
    for (const auto& instr : instrs) {
        InstrId instr_id = instr.getUniqueId();
        int rd = (int)instr.GetO32_rd();

        if (instr_id == InstrId::rsp_mtc0 && is_c0_reg_write_dma_read(rd)) {
            uint32_t vram = instr.getVram();

            targets.non_delay_targets.insert(vram);

            if (is_delay_slot) {
                targets.delay_targets.insert(vram);
            }
        }

        is_delay_slot = instr.hasDelaySlot();
    }
}

bool process_instruction(size_t instr_index, const std::vector<rabbitizer::InstructionRsp>& instructions, std::ofstream& output_file, const BranchTargets& branch_targets, const std::unordered_set<uint32_t>& unsupported_instructions, const ResumeTargets& resume_targets, bool has_overlays, bool indent, bool in_delay_slot) {
    const auto& instr = instructions[instr_index];

    uint32_t instr_vram = instr.getVram();
    InstrId instr_id = instr.getUniqueId();

    // Skip labels if we're duplicating an instruction into a delay slot
    if (!in_delay_slot) {
        // Print a label if one exists here
        if (branch_targets.direct_targets.contains(instr_vram) || branch_targets.indirect_targets.contains(instr_vram)) {
            fmt::print(output_file, "L_{:04X}:\n", instr_vram);
        }
    }

    uint16_t branch_target = instr.getBranchVramGeneric() & rsp_mem_mask;

    // Output a comment with the original instruction
    if (instr.isBranch() || instr_id == InstrId::rsp_j) {
        fmt::print(output_file, "    // {}\n", instr.disassemble(0, fmt::format("L_{:04X}", branch_target)));
    } else if (instr_id == InstrId::rsp_jal) {
        fmt::print(output_file, "    // {}\n", instr.disassemble(0, fmt::format("0x{:04X}", branch_target)));
    } else {
        fmt::print(output_file, "    // {}\n", instr.disassemble(0));
    }

    auto print_indent = [&]() {
        fmt::print(output_file, "    ");
    };

    auto print_line = [&]<typename... Ts>(fmt::format_string<Ts...> fmt_str, Ts&& ...args) {
        print_indent();
        fmt::print(output_file, fmt_str, std::forward<Ts>(args)...);
        fmt::print(output_file, ";\n");
    };

    auto print_branch_condition = [&]<typename... Ts>(fmt::format_string<Ts...> fmt_str, Ts&& ...args) {
        fmt::print(output_file, fmt_str, std::forward<Ts>(args)...);
        fmt::print(output_file, " ");
    };

    auto print_unconditional_branch = [&]<typename... Ts>(fmt::format_string<Ts...> fmt_str, Ts&& ...args) {
        // Emit the delay slot instruction (if there is one) before the jump itself
        if (instr_index < instructions.size() - 1) {
            process_instruction(instr_index + 1, instructions, output_file, branch_targets, unsupported_instructions, resume_targets, has_overlays, false, true);
        }
        print_indent();
        fmt::print(output_file, fmt_str, std::forward<Ts>(args)...);
        fmt::print(output_file, ";\n");
    };

    auto print_branch = [&]<typename... Ts>(fmt::format_string<Ts...> fmt_str, Ts&& ...args) {
        fmt::print(output_file, "{{\n    ");
        // Emit the delay slot instruction (if there is one) inside the taken-branch block
        if (instr_index < instructions.size() - 1) {
            process_instruction(instr_index + 1, instructions, output_file, branch_targets, unsupported_instructions, resume_targets, has_overlays, true, true);
        }
        fmt::print(output_file, "        ");
        fmt::print(output_file, fmt_str, std::forward<Ts>(args)...);
        fmt::print(output_file, ";\n    }}\n");
    };

    if (indent) {
        print_indent();
    }

    // Replace unsupported instructions with early returns
    if (unsupported_instructions.contains(instr_vram)) {
        print_line("return RspExitReason::Unsupported");
        if (indent) {
            print_indent();
        }
    }

    int rd = (int)instr.GetO32_rd();
    int rs = (int)instr.GetO32_rs();
    int base = rs;
    int rt = (int)instr.GetO32_rt();
    int sa = (int)instr.Get_sa();

    int fd = (int)instr.GetO32_fd();
    int fs = (int)instr.GetO32_fs();
    int ft = (int)instr.GetO32_ft();

    uint16_t imm = instr.Get_immediate();

    std::string unsigned_imm_string = fmt::format("{:#X}", imm);
    std::string signed_imm_string = fmt::format("{:#X}", (int16_t)imm);

    auto rsp_element = get_rsp_element(instr);

    // If this instruction is in the vector operand table then emit the appropriate function call for its implementation
    auto operand_find_it = vector_operands.find(instr_id);
    if (operand_find_it != vector_operands.end()) {
        const auto& operands = operand_find_it->second;
        int vd = (int)instr.GetRsp_vd();
        int vs = (int)instr.GetRsp_vs();
        int vt = (int)instr.GetRsp_vt();
        std::string operand_string = "";
        for (RspOperand operand : operands) {
            switch (operand) {
                case RspOperand::Vt:
                    operand_string += fmt::format("rsp.vpu.r[{}], ", vt);
                    break;
                case RspOperand::VtIndex:
                    operand_string += fmt::format("{}, ", vt);
                    break;
                case RspOperand::Vd:
                    operand_string += fmt::format("rsp.vpu.r[{}], ", vd);
                    break;
                case RspOperand::Vs:
                    operand_string += fmt::format("rsp.vpu.r[{}], ", vs);
                    break;
                case RspOperand::VsIndex:
                    operand_string += fmt::format("{}, ", vs);
                    break;
                case RspOperand::De:
                    operand_string += fmt::format("{}, ", instr.GetRsp_de() & 7);
                    break;
                case RspOperand::Rt:
                    operand_string += fmt::format("{}{}, ", ctx_gpr_prefix(rt), rt);
                    break;
                case RspOperand::Rs:
                    operand_string += fmt::format("{}{}, ", ctx_gpr_prefix(rs), rs);
                    break;
                case RspOperand::Imm7:
                    // Sign extend the 7-bit immediate
                    operand_string += fmt::format("{:#X}, ", ((int8_t)(imm << 1)) >> 1);
                    break;
                case RspOperand::None:
                    break;
            }
        }
        // Trim the trailing comma off the operands
        if (operand_string.size() > 0) {
            operand_string = operand_string.substr(0, operand_string.size() - 2);
        }
        std::string uppercase_name = "";
        std::string lowercase_name = instr.getOpcodeName();
        uppercase_name.reserve(lowercase_name.size() + 1);
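        for (char c : lowercase_name) {
            // std::toupper requires a value representable as unsigned char
            uppercase_name += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }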
        if (rsp_ignores_element(instr_id)) {
            print_line("rsp.{}({})", uppercase_name, operand_string);
        } else {
            print_line("rsp.{}<{}>({})", uppercase_name, rsp_element.value(), operand_string);
        }
    }
    // Otherwise, implement the instruction directly
    else {
        switch (instr_id) {
        case InstrId::rsp_nop:
            fmt::print(output_file, "\n");
            break;
            // Arithmetic
        case InstrId::rsp_lui:
            print_line("{}{} = S32({} << 16)", ctx_gpr_prefix(rt), rt, unsigned_imm_string);
            break;
        case InstrId::rsp_add:
        case InstrId::rsp_addu:
            if (rd == 0) {
                fmt::print(output_file, "\n");
                break;
            }
            print_line("{}{} = RSP_ADD32({}{}, {}{})", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rs), rs, ctx_gpr_prefix(rt), rt);
            break;
        case InstrId::rsp_negu: // pseudo instruction for subu x, 0, y
        case InstrId::rsp_sub:
        case InstrId::rsp_subu:
            print_line("{}{} = RSP_SUB32({}{}, {}{})", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rs), rs, ctx_gpr_prefix(rt), rt);
            break;
        case InstrId::rsp_addi:
        case InstrId::rsp_addiu:
            print_line("{}{} = RSP_ADD32({}{}, {})", ctx_gpr_prefix(rt), rt, ctx_gpr_prefix(rs), rs, signed_imm_string);
            break;
        case InstrId::rsp_and:
            if (rd == 0) {
                fmt::print(output_file, "\n");
                break;
            }
            print_line("{}{} = {}{} & {}{}", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rs), rs, ctx_gpr_prefix(rt), rt);
            break;
        case InstrId::rsp_andi:
            print_line("{}{} = {}{} & {}", ctx_gpr_prefix(rt), rt, ctx_gpr_prefix(rs), rs, unsigned_imm_string);
            break;
        case InstrId::rsp_or:
            print_line("{}{} = {}{} | {}{}", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rs), rs, ctx_gpr_prefix(rt), rt);
            break;
        case InstrId::rsp_ori:
            print_line("{}{} = {}{} | {}", ctx_gpr_prefix(rt), rt, ctx_gpr_prefix(rs), rs, unsigned_imm_string);
            break;
        case InstrId::rsp_nor:
            print_line("{}{} = ~({}{} | {}{})", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rs), rs, ctx_gpr_prefix(rt), rt);
            break;
        case InstrId::rsp_xor:
            print_line("{}{} = {}{} ^ {}{}", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rs), rs, ctx_gpr_prefix(rt), rt);
            break;
        case InstrId::rsp_xori:
            print_line("{}{} = {}{} ^ {}", ctx_gpr_prefix(rt), rt, ctx_gpr_prefix(rs), rs, unsigned_imm_string);
            break;
        case InstrId::rsp_sll:
            print_line("{}{} = S32({}{}) << {}", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rt), rt, sa);
            break;
        case InstrId::rsp_sllv:
            print_line("{}{} = S32({}{}) << ({}{} & 31)", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rt), rt, ctx_gpr_prefix(rs), rs);
            break;
        case InstrId::rsp_sra:
            print_line("{}{} = S32(RSP_SIGNED({}{}) >> {})", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rt), rt, sa);
            break;
        case InstrId::rsp_srav:
            print_line("{}{} = S32(RSP_SIGNED({}{}) >> ({}{} & 31))", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rt), rt, ctx_gpr_prefix(rs), rs);
            break;
        case InstrId::rsp_srl:
            print_line("{}{} = S32(U32({}{}) >> {})", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rt), rt, sa);
            break;
        case InstrId::rsp_srlv:
            print_line("{}{} = S32(U32({}{}) >> ({}{} & 31))", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rt), rt, ctx_gpr_prefix(rs), rs);
            break;
        case InstrId::rsp_slt:
            print_line("{}{} = RSP_SIGNED({}{}) < RSP_SIGNED({}{}) ? 1 : 0", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rs), rs, ctx_gpr_prefix(rt), rt);
            break;
        case InstrId::rsp_slti:
            print_line("{}{} = RSP_SIGNED({}{}) < {} ? 1 : 0", ctx_gpr_prefix(rt), rt, ctx_gpr_prefix(rs), rs, signed_imm_string);
            break;
        case InstrId::rsp_sltu:
            print_line("{}{} = {}{} < {}{} ? 1 : 0", ctx_gpr_prefix(rd), rd, ctx_gpr_prefix(rs), rs, ctx_gpr_prefix(rt), rt);
            break;
        case InstrId::rsp_sltiu:
            print_line("{}{} = {}{} < {} ? 1 : 0", ctx_gpr_prefix(rt), rt, ctx_gpr_prefix(rs), rs, signed_imm_string);
            break;
            // Loads
            // TODO ld
        case InstrId::rsp_lw:
            print_line("{}{} = RSP_MEM_W_LOAD({}, {}{})", ctx_gpr_prefix(rt), rt, signed_imm_string, ctx_gpr_prefix(base), base);
            break;
        case InstrId::rsp_lh:
            print_line("{}{} = RSP_MEM_H_LOAD({}, {}{})", ctx_gpr_prefix(rt), rt, signed_imm_string, ctx_gpr_prefix(base), base);
            break;
        case InstrId::rsp_lb:
            print_line("{}{} = RSP_MEM_B({}, {}{})", ctx_gpr_prefix(rt), rt, signed_imm_string, ctx_gpr_prefix(base), base);
            break;
        case InstrId::rsp_lhu:
            print_line("{}{} = RSP_MEM_HU_LOAD({}, {}{})", ctx_gpr_prefix(rt), rt, signed_imm_string, ctx_gpr_prefix(base), base);
            break;
        case InstrId::rsp_lbu:
            print_line("{}{} = RSP_MEM_BU({}, {}{})", ctx_gpr_prefix(rt), rt, signed_imm_string, ctx_gpr_prefix(base), base);
            break;
            // Stores
        case InstrId::rsp_sw:
            print_line("RSP_MEM_W_STORE({}, {}{}, {}{})", signed_imm_string, ctx_gpr_prefix(base), base, ctx_gpr_prefix(rt), rt);
            break;
        case InstrId::rsp_sh:
            print_line("RSP_MEM_H_STORE({}, {}{}, {}{})", signed_imm_string, ctx_gpr_prefix(base), base, ctx_gpr_prefix(rt), rt);
            break;
        case InstrId::rsp_sb:
            print_line("RSP_MEM_B({}, {}{}) = {}{}", signed_imm_string, ctx_gpr_prefix(base), base, ctx_gpr_prefix(rt), rt);
            break;
            // Branches
        case InstrId::rsp_j:
        case InstrId::rsp_b:
            print_unconditional_branch("goto L_{:04X}", branch_target);
            break;
        case InstrId::rsp_jal:
            print_line("{}{} = 0x{:04X}", ctx_gpr_prefix(31), 31, instr_vram + 2 * instr_size);
            print_unconditional_branch("goto L_{:04X}", branch_target);
            break;
        case InstrId::rsp_jr:
            print_line("jump_target = {}{}", ctx_gpr_prefix(rs), rs);
            print_line("debug_file = __FILE__; debug_line = __LINE__");
            print_unconditional_branch("goto do_indirect_jump");
            break;
        case InstrId::rsp_jalr:
            print_line("jump_target = {}{}; {}{} = 0x{:04X}", ctx_gpr_prefix(rs), rs, ctx_gpr_prefix(rd), rd, instr_vram + 2 * instr_size);
            print_line("debug_file = __FILE__; debug_line = __LINE__");
            print_unconditional_branch("goto do_indirect_jump");
            break;
        case InstrId::rsp_bne:
            print_indent();
            print_branch_condition("if ({}{} != {}{})", ctx_gpr_prefix(rs), rs, ctx_gpr_prefix(rt), rt);
            print_branch("goto L_{:04X}", branch_target);
            break;
        case InstrId::rsp_beq:
            print_indent();
            print_branch_condition("if ({}{} == {}{})", ctx_gpr_prefix(rs), rs, ctx_gpr_prefix(rt), rt);
            print_branch("goto L_{:04X}", branch_target);
            break;
        case InstrId::rsp_bgez:
            print_indent();
            print_branch_condition("if (RSP_SIGNED({}{}) >= 0)", ctx_gpr_prefix(rs), rs);
            print_branch("goto L_{:04X}", branch_target);
            break;
        case InstrId::rsp_bgtz:
            print_indent();
            print_branch_condition("if (RSP_SIGNED({}{}) > 0)", ctx_gpr_prefix(rs), rs);
            print_branch("goto L_{:04X}", branch_target);
            break;
        case InstrId::rsp_blez:
            print_indent();
            print_branch_condition("if (RSP_SIGNED({}{}) <= 0)", ctx_gpr_prefix(rs), rs);
            print_branch("goto L_{:04X}", branch_target);
            break;
        case InstrId::rsp_bltz:
            print_indent();
            print_branch_condition("if (RSP_SIGNED({}{}) < 0)", ctx_gpr_prefix(rs), rs);
            print_branch("goto L_{:04X}", branch_target);
            break;
        case InstrId::rsp_break:
            print_line("return RspExitReason::Broke");
            break;
        case InstrId::rsp_mfc0:
            print_line("{}{} = {}", ctx_gpr_prefix(rt), rt, expected_c0_reg_value(rd));
            break;
        case InstrId::rsp_mtc0:
            {
                std::string_view write_action = c0_reg_write_action(rd);
                if (has_overlays && is_c0_reg_write_dma_read(rd)) {
                    // DMA read, do overlay swap if reading into IMEM
                    fmt::print(output_file, 
                        "    if (dma_mem_address & 0x1000) {{\n"
                        "        ctx->resume_address = 0x{:04X};\n"
                        "        ctx->resume_delay = {};\n"
                        "        goto do_overlay_swap;\n"
                        "    }}\n",
                        instr_vram, in_delay_slot ? "true" : "false");
                }
                if (!write_action.empty()) {
                    print_line("{}({}{})", write_action, ctx_gpr_prefix(rt), rt);
                }
                break;
            }
        default:
            fmt::print(stderr, "Unhandled instruction: {}\n", instr.getOpcodeName());
            assert(false);
            return false;
        }
    }

    // Write overlay swap resume labels
    if (in_delay_slot) {
        if (resume_targets.delay_targets.contains(instr_vram)) {
            fmt::print(output_file, "R_{:04X}_delay:\n", instr_vram);
        }
    } else {
        if (resume_targets.non_delay_targets.contains(instr_vram)) {
            fmt::print(output_file, "R_{:04X}:\n", instr_vram);
        }
    }

    return true;
}

void write_indirect_jumps(std::ofstream& output_file, const BranchTargets& branch_targets, const std::string& output_function_name) {
    fmt::print(output_file,
        "do_indirect_jump:\n"
        "    switch ((jump_target | 0x1000) & {:#X}) {{ \n", rsp_mem_mask);
    for (uint32_t branch_target: branch_targets.indirect_targets) {
        fmt::print(output_file, "        case 0x{0:04X}: goto L_{0:04X};\n", branch_target);
    }
    fmt::print(output_file,
        "    }}\n"
        "    printf(\"Unhandled jump target 0x%04X in microcode {}, coming from [%s:%d]\\n\", jump_target, debug_file, debug_line);\n"
        "    printf(\"Register dump: r0  = %08X r1  = %08X r2  = %08X r3  = %08X r4  = %08X r5  = %08X r6  = %08X r7  = %08X\\n\"\n"
        "           \"               r8  = %08X r9  = %08X r10 = %08X r11 = %08X r12 = %08X r13 = %08X r14 = %08X r15 = %08X\\n\"\n"
        "           \"               r16 = %08X r17 = %08X r18 = %08X r19 = %08X r20 = %08X r21 = %08X r22 = %08X r23 = %08X\\n\"\n"
        "           \"               r24 = %08X r25 = %08X r26 = %08X r27 = %08X r28 = %08X r29 = %08X r30 = %08X r31 = %08X\\n\",\n"
        "           0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r13, r14, r15, r16,\n"
        "           r17, r18, r19, r20, r21, r22, r23, r24, r25, r26, r27, r28, r29, r30, r31);\n"
        "    return RspExitReason::UnhandledJumpTarget;\n", output_function_name);
}

void write_overlay_swap_return(std::ofstream& output_file) {
    fmt::print(output_file,
        "do_overlay_swap:\n"
        "                    ctx->r1 = r1;   ctx->r2 = r2;   ctx->r3 = r3;   ctx->r4 = r4;   ctx->r5 = r5;   ctx->r6 = r6;   ctx->r7 = r7;\n"
        "    ctx->r8 = r8;   ctx->r9 = r9;   ctx->r10 = r10; ctx->r11 = r11; ctx->r12 = r12; ctx->r13 = r13; ctx->r14 = r14; ctx->r15 = r15;\n"
        "    ctx->r16 = r16; ctx->r17 = r17; ctx->r18 = r18; ctx->r19 = r19; ctx->r20 = r20; ctx->r21 = r21; ctx->r22 = r22; ctx->r23 = r23;\n"
        "    ctx->r24 = r24; ctx->r25 = r25; ctx->r26 = r26; ctx->r27 = r27; ctx->r28 = r28; ctx->r29 = r29; ctx->r30 = r30; ctx->r31 = r31;\n"
        "    ctx->dma_mem_address = dma_mem_address;\n"
        "    ctx->dma_dram_address = dma_dram_address;\n"
        "    ctx->jump_target = jump_target;\n"
        "    ctx->rsp = rsp;\n"
        "    return RspExitReason::SwapOverlay;\n");
}

#ifdef _MSC_VER
inline uint32_t byteswap(uint32_t val) {
    return _byteswap_ulong(val);
}
#else
constexpr uint32_t byteswap(uint32_t val) {
    return __builtin_bswap32(val);
}
#endif

struct RSPRecompilerOverlayConfig {
    size_t offset;
    size_t size;
};

struct RSPRecompilerOverlaySlotConfig {
    size_t text_address;
    std::vector<RSPRecompilerOverlayConfig> overlays;
};

struct RSPRecompilerConfig {
    size_t text_offset;
    size_t text_size;
    size_t text_address;
    std::filesystem::path rom_file_path;
    std::filesystem::path output_file_path;
    std::string output_function_name;
    std::vector<uint32_t> extra_indirect_branch_targets;
    std::unordered_set<uint32_t> unsupported_instructions;
    std::vector<RSPRecompilerOverlaySlotConfig> overlay_slots;
};

std::filesystem::path concat_if_not_empty(const std::filesystem::path& parent, const std::filesystem::path& child) {
    if (!child.empty()) {
        return parent / child;
    }
    return child;
}

template <typename T>
std::vector<T> toml_to_vec(const toml::array* array) {
    std::vector<T> ret;

    // Reserve room for all the elements in the array.
    ret.reserve(array->size());
    array->for_each([&ret](auto&& el) {
        if constexpr (toml::is_integer<decltype(el)>) {
            ret.push_back(*el);
        }
    });

    return ret;
}

template <typename T>
std::unordered_set<T> toml_to_set(const toml::array* array) {
    std::unordered_set<T> ret;

    array->for_each([&ret](auto&& el) {
        if constexpr (toml::is_integer<decltype(el)>) {
            ret.insert(*el);
        }
    });

    return ret;
}

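// Reads the recompiler config from a toml file. text_offset, text_size, text_address, rom_file_path,
// output_file_path and output_function_name are required; extra_indirect_branch_targets,
// unsupported_instructions and overlay_slots are optional. Relative paths are resolved against the
// directory containing the config file. Returns false and leaves `out` untouched on failure.
bool read_config(const std::filesystem::path& config_path, RSPRecompilerConfig& out) {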
    RSPRecompilerConfig ret{};

    try {
        const toml::table config_data = toml::parse_file(config_path.u8string());
        std::filesystem::path basedir = std::filesystem::path{ config_path }.parent_path();

        std::optional<uint32_t> text_offset = config_data["text_offset"].value<uint32_t>();
        if (text_offset.has_value()) {
            ret.text_offset = text_offset.value();
        }
        else {
            throw toml::parse_error("Missing text_offset in config file", config_data.source());
        }

        std::optional<uint32_t> text_size = config_data["text_size"].value<uint32_t>();
        if (text_size.has_value()) {
            ret.text_size = text_size.value();
        }
        else {
            throw toml::parse_error("Missing text_size in config file", config_data.source());
        }

        std::optional<uint32_t> text_address = config_data["text_address"].value<uint32_t>();
        if (text_address.has_value()) {
            ret.text_address = text_address.value();
        }
        else {
            throw toml::parse_error("Missing text_address in config file", config_data.source());
        }

        std::optional<std::string> rom_file_path = config_data["rom_file_path"].value<std::string>();
        if (rom_file_path.has_value()) {
            ret.rom_file_path = concat_if_not_empty(basedir, rom_file_path.value());
        }
        else {
            throw toml::parse_error("Missing rom_file_path in config file", config_data.source());
        }

        std::optional<std::string> output_file_path = config_data["output_file_path"].value<std::string>();
        if (output_file_path.has_value()) {
            ret.output_file_path = concat_if_not_empty(basedir, output_file_path.value());
        }
        else {
            throw toml::parse_error("Missing output_file_path in config file", config_data.source());
        }

        std::optional<std::string> output_function_name = config_data["output_function_name"].value<std::string>();
        if (output_function_name.has_value()) {
            ret.output_function_name = output_function_name.value();
        }
        else {
            throw toml::parse_error("Missing output_function_name in config file", config_data.source());
        }

        // Extra indirect branch targets (optional)
        const toml::node_view branch_targets_data = config_data["extra_indirect_branch_targets"];
        if (branch_targets_data.is_array()) {
            const toml::array* branch_targets_array = branch_targets_data.as_array();
            ret.extra_indirect_branch_targets = toml_to_vec<uint32_t>(branch_targets_array);
        }

        // Unsupported_instructions (optional)
        const toml::node_view unsupported_instructions_data = config_data["unsupported_instructions"];
        if (unsupported_instructions_data.is_array()) {
            const toml::array* unsupported_instructions_array = unsupported_instructions_data.as_array();
            ret.unsupported_instructions = toml_to_set<uint32_t>(unsupported_instructions_array);
        }

        // Overlay slots (optional)
        const toml::node_view overlay_slots = config_data["overlay_slots"];
        if (overlay_slots.is_array()) {
            const toml::array* overlay_slots_array = overlay_slots.as_array();

            int slot_idx = 0;
            overlay_slots_array->for_each([&](toml::table slot){
                RSPRecompilerOverlaySlotConfig slot_config;

                std::optional<uint32_t> text_address = slot["text_address"].value<uint32_t>();
                if (text_address.has_value()) {
                    slot_config.text_address = text_address.value();
                }
                else {
                    throw toml::parse_error(
                        fmt::format("Missing text_address in config file at overlay slot {}", slot_idx).c_str(), 
                        config_data.source());
                }

                // Overlays per slot
                const toml::node_view overlays = slot["overlays"];
                if (overlays.is_array()) {
                    const toml::array* overlay_array = overlays.as_array();

                    int overlay_idx = 0;
                    overlay_array->for_each([&](toml::table overlay){
                        RSPRecompilerOverlayConfig overlay_config;
                        
                        std::optional<uint32_t> offset = overlay["offset"].value<uint32_t>();
                        if (offset.has_value()) {
                            overlay_config.offset = offset.value();
                        }
                        else {
                            throw toml::parse_error(
                                fmt::format("Missing offset in config file at overlay slot {} overlay {}", slot_idx, overlay_idx).c_str(), 
                                config_data.source());
                        }

                        std::optional<uint32_t> size = overlay["size"].value<uint32_t>();
                        if (size.has_value()) {
                            overlay_config.size = size.value();

                            if ((size.value() % sizeof(uint32_t)) != 0) {
                                throw toml::parse_error(
                                    fmt::format("Overlay size must be a multiple of {} in config file at overlay slot {} overlay {}", sizeof(uint32_t), slot_idx, overlay_idx).c_str(), 
                                    config_data.source());
                            }
                        }
                        else {
                            throw toml::parse_error(
                                fmt::format("Missing size in config file at overlay slot {} overlay {}", slot_idx, overlay_idx).c_str(), 
                                config_data.source());
                        }

                        slot_config.overlays.push_back(overlay_config);
                        overlay_idx++;
                    });
                }
                else {
                    throw toml::parse_error(
                        fmt::format("Missing overlays in config file at overlay slot {}", slot_idx).c_str(), 
                        config_data.source());
                }

                ret.overlay_slots.push_back(slot_config);
                slot_idx++;
            });
        }

    }
    catch (const toml::parse_error& err) {
        std::cerr << "Syntax error parsing toml: " << *err.source().path << " (" << err.source().begin <<  "):\n" << err.description() << std::endl;
        return false;
    }

    out = ret;
    return true;
}

struct FunctionPermutation {
    std::vector<rabbitizer::InstructionRsp> instrs;
    std::vector<uint32_t> permutation;
};

struct Permutation {
    std::vector<uint32_t> instr_words;
    std::vector<uint32_t> permutation;
};

struct Overlay {
    std::vector<uint32_t> instr_words;
};

struct OverlaySlot {
    uint32_t offset;
    std::vector<Overlay> overlays;
};

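// Advances `current` to the next combination of overlay indices, treating it as a mixed-radix counter
// whose digit i ranges over [0, option_lengths[i]). The last slot is the least significant digit.
// Returns false once the counter wraps back around to all zeroes. `current` must not be empty.
bool next_permutation(const std::vector<uint32_t>& option_lengths, std::vector<uint32_t>& current) {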
    current[current.size() - 1] += 1;

    size_t i = current.size() - 1;
    while (current[i] == option_lengths[i]) {
        current[i] = 0;
        if (i == 0) {
            return false;
        }

        current[i - 1] += 1;
        i--;
    }

    return true;
}

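// Builds one instruction word list per combination of overlays by copying the base microcode
// and overwriting each slot's region with the overlay selected for that slot.
void permute(const std::vector<uint32_t>& base_words, const std::vector<OverlaySlot>& overlay_slots, std::vector<Permutation>& permutations) {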
    auto current = std::vector<uint32_t>(overlay_slots.size(), 0);
    auto slot_options = std::vector<uint32_t>(overlay_slots.size(), 0);

    for (size_t i = 0; i < overlay_slots.size(); i++) {
        slot_options[i] = overlay_slots[i].overlays.size();
    }

    do {
        Permutation permutation = {
            .instr_words = std::vector<uint32_t>(base_words),
            .permutation = std::vector<uint32_t>(current)
        };

        for (size_t i = 0; i < overlay_slots.size(); i++) {
            const OverlaySlot &slot = overlay_slots[i];
            const Overlay &overlay = slot.overlays[current[i]];

            uint32_t word_offset = slot.offset / sizeof(uint32_t);

            size_t size_needed = word_offset + overlay.instr_words.size();
            if (permutation.instr_words.size() < size_needed) {
                // Grow the vector so that the copy below stays within its bounds.
                permutation.instr_words.resize(size_needed);
            }

            std::copy(overlay.instr_words.begin(), overlay.instr_words.end(), permutation.instr_words.data() + word_offset);
        }

        permutations.push_back(permutation);
    } while (next_permutation(slot_options, current));
}

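// Builds a function name suffix by concatenating the selected overlay index of each slot.
// Note that no separator is written between indices.
std::string make_permutation_string(const std::vector<uint32_t>& permutation) {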
    std::string str = "";

    for (uint32_t opt : permutation) {
        str += std::to_string(opt);
    }

    return str;
}

void create_overlay_swap_function(const std::string& function_name, std::ofstream& output_file, const std::vector<FunctionPermutation>& permutations, const RSPRecompilerConfig& config) {
    // Includes and permutation protos
    fmt::print(output_file, 
        "#include <map>\n"
        "#include <vector>\n\n"
        "using RspUcodePermutationFunc = RspExitReason(uint8_t* rdram, RspContext* ctx);\n\n"
        "RspExitReason {}(uint8_t* rdram, RspContext* ctx);\n",
        config.output_function_name + "_initial");

    for (const auto& permutation : permutations) {
        fmt::print(output_file, "RspExitReason {}(uint8_t* rdram, RspContext* ctx);\n",
            config.output_function_name + make_permutation_string(permutation.permutation));
    }
    fmt::print(output_file, "\n");

    // IMEM -> slot index mapping
    fmt::print(output_file, 
        "static const std::map<uint32_t, uint32_t> imemToSlot = {{\n");
    for (size_t i = 0; i < config.overlay_slots.size(); i++) {
        const RSPRecompilerOverlaySlotConfig& slot = config.overlay_slots[i];

        uint32_t imemAddress = slot.text_address & rsp_mem_mask;
        fmt::print(output_file, "    {{ 0x{:04X}, {} }},\n",
            imemAddress, i);
    }
    fmt::print(output_file, "}};\n\n");

    // ucode offset -> overlay index mapping (per slot)
    fmt::print(output_file, 
        "static const std::vector<std::map<uint32_t, uint32_t>> offsetToOverlay = {{\n");
    for (const auto& slot : config.overlay_slots) {
        fmt::print(output_file, "    {{\n");
        for (size_t i = 0; i < slot.overlays.size(); i++) {
            const RSPRecompilerOverlayConfig& overlay = slot.overlays[i];

            fmt::print(output_file, "        {{ 0x{:04X}, {} }},\n",
                overlay.offset, i);
        }
        fmt::print(output_file, "    }},\n");
    }
    fmt::print(output_file, "}};\n\n");

    // Permutation function pointers
    fmt::print(output_file, 
        "static RspUcodePermutationFunc* permutations[] = {{\n");
    for (const auto& permutation : permutations) {
        fmt::print(output_file, "    {},\n",
            config.output_function_name + make_permutation_string(permutation.permutation));
    }
    fmt::print(output_file, "}};\n\n");

    // Main function
    fmt::print(output_file,
        "RspExitReason {}(uint8_t* rdram, uint32_t ucode_addr) {{\n"
        "    RspContext ctx{{}};\n",
        config.output_function_name);
    
    std::string slots_init_str = "";
    for (size_t i = 0; i < config.overlay_slots.size(); i++) {
        if (i > 0) {
            slots_init_str += ", ";
        }

        slots_init_str += "0";
    }

    fmt::print(output_file, "    uint32_t slots[] = {{{}}};\n\n",
        slots_init_str);

    fmt::print(output_file, "    RspExitReason exitReason = {}(rdram, &ctx);\n\n",
        config.output_function_name + "_initial");
    

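    // Build the expression that flattens the per-slot overlay indices into an index into the permutations array.
    // The last slot varies fastest, which matches the order in which permute generates the permutations.
    std::string perm_index_str = "";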
    for (size_t i = 0; i < config.overlay_slots.size(); i++) {
        if (i > 0) {
            perm_index_str += " + ";
        }

        uint32_t multiplier = 1;
        for (size_t k = i + 1; k < config.overlay_slots.size(); k++) {
            multiplier *= config.overlay_slots[k].overlays.size();
        }

        perm_index_str += fmt::format("slots[{}] * {}", i, multiplier);
    }
    
    fmt::print(output_file,
        "    while (exitReason == RspExitReason::SwapOverlay) {{\n"
        "        uint32_t slot = imemToSlot.at(ctx.dma_mem_address);\n"
        "        uint32_t overlay = offsetToOverlay.at(slot).at(ctx.dma_dram_address - ucode_addr);\n"
        "        slots[slot] = overlay;\n"
        "\n"
        "        RspUcodePermutationFunc* permutationFunc = permutations[{}];\n"
        "        exitReason = permutationFunc(rdram, &ctx);\n"
        "    }}\n\n"
        "    return exitReason;\n"
        "}}\n\n",
        perm_index_str);
}

void create_function(const std::string& function_name, std::ofstream& output_file, const std::vector<rabbitizer::InstructionRsp>& instrs, const RSPRecompilerConfig& config, const ResumeTargets& resume_targets, bool is_permutation, bool is_initial) {
    // Collect indirect jump targets (return addresses for linked jumps)
    BranchTargets branch_targets = get_branch_targets(instrs);

    // Add any additional indirect branch targets that may not be found directly in the code (e.g. from a jump table)
    for (uint32_t target : config.extra_indirect_branch_targets) {
        branch_targets.indirect_targets.insert(target);
    }
    
    // Write function
    if (is_permutation) {
        fmt::print(output_file,
            "RspExitReason {}(uint8_t* rdram, RspContext* ctx) {{\n"
            "    uint32_t                 r1 = ctx->r1,   r2 = ctx->r2,   r3 = ctx->r3,   r4 = ctx->r4,   r5 = ctx->r5,   r6 = ctx->r6,   r7 = ctx->r7;\n"
            "    uint32_t  r8 = ctx->r8,  r9 = ctx->r9,   r10 = ctx->r10, r11 = ctx->r11, r12 = ctx->r12, r13 = ctx->r13, r14 = ctx->r14, r15 = ctx->r15;\n"
            "    uint32_t r16 = ctx->r16, r17 = ctx->r17, r18 = ctx->r18, r19 = ctx->r19, r20 = ctx->r20, r21 = ctx->r21, r22 = ctx->r22, r23 = ctx->r23;\n"
            "    uint32_t r24 = ctx->r24, r25 = ctx->r25, r26 = ctx->r26, r27 = ctx->r27, r28 = ctx->r28, r29 = ctx->r29, r30 = ctx->r30, r31 = ctx->r31;\n"
            "    uint32_t dma_mem_address = ctx->dma_mem_address, dma_dram_address = ctx->dma_dram_address, jump_target = ctx->jump_target;\n"
            "    const char * debug_file = NULL; int debug_line = 0;\n"
            "    RSP rsp = ctx->rsp;\n", function_name);

        // Write jumps to resume targets
        if (!is_initial) {
            fmt::print(output_file,
                "    if (ctx->resume_delay) {{\n"
                "        switch (ctx->resume_address) {{\n");
            
            for (uint32_t address : resume_targets.delay_targets) {
                fmt::print(output_file, "            case 0x{0:04X}: goto R_{0:04X}_delay;\n", 
                    address);
            }
            
            fmt::print(output_file,
                "        }}\n"
                "    }} else {{\n"
                "        switch (ctx->resume_address) {{\n");
            
            for (uint32_t address : resume_targets.non_delay_targets) {
                fmt::print(output_file, "            case 0x{0:04X}: goto R_{0:04X};\n", 
                    address);
            }

            fmt::print(output_file,
                "        }}\n"
                "    }}\n"
                "    printf(\"Unhandled resume target 0x%04X (delay slot: %d) in microcode {}\\n\", ctx->resume_address, ctx->resume_delay);\n"
                "    return RspExitReason::UnhandledResumeTarget;\n",
                config.output_function_name);
        }

        fmt::print(output_file, "    r1 = 0xFC0;\n");
    } else {
        fmt::print(output_file,
            "RspExitReason {}(uint8_t* rdram, [[maybe_unused]] uint32_t ucode_addr) {{\n"
            "    uint32_t           r1 = 0,  r2 = 0,  r3 = 0,  r4 = 0,  r5 = 0,  r6 = 0,  r7 = 0;\n"
            "    uint32_t  r8 = 0,  r9 = 0, r10 = 0, r11 = 0, r12 = 0, r13 = 0, r14 = 0, r15 = 0;\n"
            "    uint32_t r16 = 0, r17 = 0, r18 = 0, r19 = 0, r20 = 0, r21 = 0, r22 = 0, r23 = 0;\n"
            "    uint32_t r24 = 0, r25 = 0, r26 = 0, r27 = 0, r28 = 0, r29 = 0, r30 = 0, r31 = 0;\n"
            "    uint32_t dma_mem_address = 0, dma_dram_address = 0, jump_target = 0;\n"
            "    const char * debug_file = NULL; int debug_line = 0;\n"
            "    RSP rsp{{}};\n"
            "    r1 = 0xFC0;\n", function_name);
    }
    // Write each instruction
    for (size_t instr_index = 0; instr_index < instrs.size(); instr_index++) {
        process_instruction(instr_index, instrs, output_file, branch_targets, config.unsupported_instructions, resume_targets, is_permutation, false, false);
    }

    // Terminate instruction code with a return to indicate that the microcode has run past its end
    fmt::print(output_file, "    return RspExitReason::ImemOverrun;\n");

    // Write the section containing the indirect jump table
    write_indirect_jumps(output_file, branch_targets, config.output_function_name);

    // Write routine for returning for an overlay swap
    if (is_permutation) {
        write_overlay_swap_return(output_file);
    }

    // End the function
    fmt::print(output_file, "}}\n");
}

int main(int argc, const char** argv) {
    if (argc != 2) {
        fmt::print("Usage: {} [config file]\n", argv[0]);
        std::exit(EXIT_SUCCESS);
    }

    RSPRecompilerConfig config;
    if (!read_config(std::filesystem::path{argv[1]}, config)) {
        fmt::print("Failed to parse config file {}\n", argv[1]);
        std::exit(EXIT_FAILURE);
    }

    std::vector<uint32_t> instr_words{};
    std::vector<OverlaySlot> overlay_slots{};
    instr_words.resize(config.text_size / sizeof(uint32_t));
    {
        std::ifstream rom_file{ config.rom_file_path, std::ios_base::binary };

        if (!rom_file.good()) {
            fmt::print(stderr, "Failed to open rom file\n");
            return EXIT_FAILURE;
        }

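        // Only read whole instruction words so that the read can't overrun the buffer when text_size isn't a multiple of 4.
        rom_file.seekg(config.text_offset);
        rom_file.read(reinterpret_cast<char*>(instr_words.data()), instr_words.size() * sizeof(uint32_t));

        if (!rom_file.good()) {
            fmt::print(stderr, "Failed to read microcode text from rom file\n");
            return EXIT_FAILURE;
        }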

        for (const RSPRecompilerOverlaySlotConfig &slot_config : config.overlay_slots) {
            OverlaySlot slot{};
            slot.offset = (slot_config.text_address - config.text_address) & rsp_mem_mask;

            for (const RSPRecompilerOverlayConfig &overlay_config : slot_config.overlays) {
                Overlay overlay{};
                overlay.instr_words.resize(overlay_config.size / sizeof(uint32_t));

                rom_file.seekg(config.text_offset + overlay_config.offset);
                rom_file.read(reinterpret_cast<char*>(overlay.instr_words.data()), overlay_config.size);

                slot.overlays.push_back(overlay);
            }

            overlay_slots.push_back(slot);
        }
    }

    // Create overlay permutations
    std::vector<Permutation> permutations{};
    if (!overlay_slots.empty()) {
        permute(instr_words, overlay_slots, permutations);
    }

    // Disable appropriate pseudo instructions
    RabbitizerConfig_Cfg.pseudos.pseudoMove = false;
    RabbitizerConfig_Cfg.pseudos.pseudoBeqz = false;
    RabbitizerConfig_Cfg.pseudos.pseudoBnez = false;
    RabbitizerConfig_Cfg.pseudos.pseudoNot = false;

    // Decode the instruction words into instructions
    std::vector<rabbitizer::InstructionRsp> instrs{};
    instrs.reserve(instr_words.size());
    uint32_t vram = config.text_address & rsp_mem_mask;
    for (uint32_t instr_word : instr_words) {
        const rabbitizer::InstructionRsp& instr = instrs.emplace_back(byteswap(instr_word), vram);
        vram += instr_size;
    }

    std::vector<FunctionPermutation> func_permutations{};
    func_permutations.reserve(permutations.size());
    for (const Permutation& permutation : permutations) {
        FunctionPermutation func = {
            .permutation = std::vector<uint32_t>(permutation.permutation)
        };

        func.instrs.reserve(permutation.instr_words.size());
        uint32_t vram = config.text_address & rsp_mem_mask;
        for (uint32_t instr_word : permutation.instr_words) {
            const rabbitizer::InstructionRsp& instr = func.instrs.emplace_back(byteswap(instr_word), vram);
            vram += instr_size;
        }

        func_permutations.emplace_back(func);
    }

    // Determine all possible overlay swap resume targets
    ResumeTargets resume_targets{};
    for (const FunctionPermutation& permutation : func_permutations) {
        get_overlay_swap_resume_targets(permutation.instrs, resume_targets);
    }

    // Open output file and write beginning
    std::filesystem::create_directories(std::filesystem::path{ config.output_file_path }.parent_path());
    std::ofstream output_file(config.output_file_path);
    fmt::print(output_file,
        "#include \"librecomp/rsp.hpp\"\n"
        "#include \"librecomp/rsp_vu_impl.hpp\"\n");
    
    // Write function(s)
    if (overlay_slots.empty()) {
        create_function(config.output_function_name, output_file, instrs, config, resume_targets, false, false);
    } else {
        create_overlay_swap_function(config.output_function_name, output_file, func_permutations, config);
        create_function(config.output_function_name + "_initial", output_file, instrs, config, ResumeTargets{}, true, true);

        for (const auto& permutation : func_permutations) {
            create_function(config.output_function_name + make_permutation_string(permutation.permutation), 
                output_file, permutation.instrs, config, resume_targets, true, false);
        }
    }

    return 0;
}


================================================
FILE: RecompModMerger/main.cpp
================================================
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <span>
#include <string>
#include <vector>

#include "recompiler/context.h"

template <typename T>
bool read_file(const std::filesystem::path& p, std::vector<T>& out) {
    static_assert(sizeof(T) == 1);
    std::vector<T> ret{};

    std::ifstream input_file{p, std::ios::binary};
    if (!input_file.good()) {
        return false;
    }
    
    input_file.seekg(0, std::ios::end);
    ret.resize(input_file.tellg());
    input_file.seekg(0, std::ios::beg);

    input_file.read(reinterpret_cast<char*>(ret.data()), ret.size());

    out = std::move(ret);

    return true;
}

bool write_file(const std::filesystem::path& p, std::span<char> in) {
    std::ofstream out{ p, std::ios::binary };
    if (!out.good()) {
        return false;
    }

    out.write(in.data(), in.size());
    return true;
}

std::span<uint8_t> reinterpret_span_u8(std::span<char> s) {
    return std::span(reinterpret_cast<uint8_t*>(s.data()), s.size());
}

std::span<char> reinterpret_span_char(std::span<uint8_t> s) {
    return std::span(reinterpret_cast<char*>(s.data()), s.size());
}

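// Appends the contents of `in` to `out`, remapping every index (rom offsets, sections, functions,
// dependencies, imports and events) so that it stays valid in the merged context.
bool copy_into_context(N64Recomp::Context& out, const N64Recomp::Context& in) {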
    size_t rom_offset = out.rom.size();
    size_t section_offset = out.sections.size();
    size_t function_offset = out.functions.size();
    size_t event_offset = out.event_symbols.size();
    
    // Append the input rom to the end of the output rom.
    out.rom.insert(out.rom.end(), in.rom.begin(), in.rom.end());

    // Merge dependencies from the input. Copy new ones and remap existing ones.
    std::vector<size_t> new_dependency_indices(in.dependencies.size());
    for (size_t dep_index = 0; dep_index < in.dependencies.size(); dep_index++) {
        const std::string& dep = in.dependencies[dep_index];
        auto find_dep_it = out.dependencies_by_name.find(dep);
        if (find_dep_it != out.dependencies_by_name.end()) {
            new_dependency_indices[dep_index] = find_dep_it->second;
        }
        else {
            out.dependencies_by_name[dep] = out.dependencies.size();
            new_dependency_indices[dep_index] = out.dependencies.size();
            out.dependencies.emplace_back(dep);
        }
    }

    // Merge imports from the input. Copy new ones and remap existing ones.
    std::vector<size_t> new_import_indices(in.import_symbols.size());
    for (size_t import_index = 0; import_index < in.import_symbols.size(); import_index++) {
        const N64Recomp::ImportSymbol& sym = in.import_symbols[import_index];
        size_t dependency_index = new_dependency_indices[sym.dependency_index];

        size_t original_import_index = (size_t)-1;
        
        // Check if any import symbols have the same dependency index and symbol name.
        for (size_t i = 0; i < out.import_symbols.size(); i++) {
            const N64Recomp::ImportSymbol& sym_out = out.import_symbols[i];
            if (sym_out.dependency_index == dependency_index && sym_out.base.name == sym.base.name) {
                original_import_index = i;
                break;
            }
        }

        if (original_import_index != (size_t)-1) {
            new_import_indices[import_index] = original_import_index;
        }
        else {
            new_import_indices[import_index] = out.import_symbols.size();
            N64Recomp::ImportSymbol new_sym{};
            new_sym.dependency_index = dependency_index;
            new_sym.base.name = sym.base.name;
            out.import_symbols.emplace_back(std::move(new_sym));
        }
    }

    // Merge dependency events from the input. Copy new ones and remap existing ones.
    std::vector<size_t> new_dependency_event_indices(in.dependency_events.size());
    for (size_t dependency_event_index = 0; dependency_event_index < in.dependency_events.size(); dependency_event_index++) {
        const N64Recomp::DependencyEvent& event = in.dependency_events[dependency_event_index];
        size_t dependency_index = new_dependency_indices[event.dependency_index];

        size_t original_event_index = (size_t)-1;

        // Check if any dependency events have the same dependency index and event name.
        for (size_t i = 0; i < out.dependency_events.size(); i++) {
            const N64Recomp::DependencyEvent& event_out = out.dependency_events[i];
            if (event_out.dependency_index == dependency_index && event_out.event_name == event.event_name) {
                original_event_index = i;
                break;
            }
        }

        if (original_event_index != (size_t)-1) {
            new_dependency_event_indices[dependency_event_index] = original_event_index;
        }
        else {
            new_dependency_event_indices[dependency_event_index] = out.dependency_events.size();
            out.dependency_events.emplace_back(N64Recomp::DependencyEvent{ .dependency_index = dependency_index, .event_name = event.event_name });
        }
    }

    // Copy every section from the input.
    for (size_t section_index = 0; section_index < in.sections.size(); section_index++) {
        const N64Recomp::Section& section = in.sections[section_index];

        size_t out_section_index = section_offset + section_index;
        N64Recomp::Section& section_out = out.sections.emplace_back(section);
        section_out.rom_addr += rom_offset;
        section_out.name = "";

        // Adjust the section index of all the section's relocs.
        for (N64Recomp::Reloc& reloc : section_out.relocs) {
            if (reloc.target_section == N64Recomp::SectionAbsolute) {
                printf("Internal error: reloc in section %zu references an absolute symbol and should have been relocated already. Please report this issue.\n",
                    section_index);
                // Nothing to do for absolute relocs.
            }
            else if (reloc.target_section == N64Recomp::SectionImport) {
                // symbol_index indexes context.import_symbols
                reloc.symbol_index = new_import_indices[reloc.symbol_index];
            }
            else if (reloc.target_section == N64Recomp::SectionEvent) {
                // symbol_index indexes context.event_symbols
                reloc.symbol_index += event_offset;
            }
            else if (reloc.reference_symbol) {
                // symbol_index indexes context.reference_symbols
                // Nothing to do here, reference section indices will remain unchanged.
            }
            else {
                reloc.target_section += section_offset;
            }
        }
    }

    out.section_functions.resize(out.sections.size());

    // Copy every function from the input.
    for (size_t func_index = 0; func_index < in.functions.size(); func_index++) {
        const N64Recomp::Function& func = in.functions[func_index];

        size_t out_func_index = function_offset + func_index;
        N64Recomp::Function& function_out = out.functions.emplace_back(func);

        function_out.section_index += section_offset;
        function_out.rom += rom_offset;
        // functions_by_name unused
        out.functions_by_vram[function_out.vram].push_back(out_func_index);

        out.section_functions[function_out.section_index].push_back(out_func_index);
    }

    // Copy replacements from the input.
    for (size_t replacement_index = 0; replacement_index < in.replacements.size(); replacement_index++) {
        const N64Recomp::FunctionReplacement& replacement = in.replacements[replacement_index];
        N64Recomp::FunctionReplacement& replacement_out = out.replacements.emplace_back(replacement);
        replacement_out.func_index += function_offset;
    }

    // Copy hooks from the input.
    for (size_t hook_index = 0; hook_index < in.hooks.size(); hook_index++) {
        const N64Recomp::FunctionHook& hook = in.hooks[hook_index];
        N64Recomp::FunctionHook& hook_out = out.hooks.emplace_back(hook);
        hook_out.func_index += function_offset;
    }

    // Copy callbacks from the input.
    for (size_t callback_index = 0; callback_index < in.callbacks.size(); callback_index++) {
        const N64Recomp::Callback& callback = in.callbacks[callback_index];
        // Bind by reference so the remapped index is written to the element stored in out.callbacks.
        N64Recomp::Callback& callback_out = out.callbacks.emplace_back(callback);
        callback_out.dependency_event_index = new_dependency_event_indices[callback_out.dependency_event_index];
    }

    // Copy exports from the input.
    for (size_t exported_func : in.exported_funcs) {
        out.exported_funcs.push_back(exported_func + function_offset);
    }

    // Copy events from the input.
    for (const N64Recomp::EventSymbol& event_sym : in.event_symbols) {
        out.event_symbols.emplace_back(event_sym);
    }

    return true;
}

int main(int argc, const char** argv) {
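    if (argc != 8) {
        // Wrong argument count is a usage error: report it on stderr and return a failing exit code.
        fprintf(stderr, "Usage: %s <function symbol toml> <symbol file 1> <binary 1> <symbol file 2> <binary 2> <output symbol file> <output binary file>\n", argv[0]);
        return EXIT_FAILURE;
    }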

    const char* function_symbol_toml_path = argv[1];
    const char* sym_file_path_1 = argv[2];
    const char* binary_path_1 = argv[3];
    const char* sym_file_path_2 = argv[4];
    const char* binary_path_2 = argv[5];
    const char* output_sym_path = argv[6];
    const char* output_binary_path = argv[7];

    // Load the symbol and binary files.
    std::vector<char> sym_file_1;
    if (!read_file(sym_file_path_1, sym_file_1)) {
        fprintf(stderr, "Error reading file %s\n", sym_file_path_1);
        return EXIT_FAILURE;
    }

    std::vector<uint8_t> binary_1;
    if (!read_file(binary_path_1, binary_1)) {
        fprintf(stderr, "Error reading file %s\n", binary_path_1);
        return EXIT_FAILURE;
    }

    std::vector<char> sym_file_2;
    if (!read_file(sym_file_path_2, sym_file_2)) {
        fprintf(stderr, "Error reading file %s\n", sym_file_path_2);
        return EXIT_FAILURE;
    }

    std::vector<uint8_t> binary_2;
    if (!read_file(binary_path_2, binary_2)) {
        fprintf(stderr, "Error reading file %s\n", binary_path_2);
        return EXIT_FAILURE;
    }
    
    N64Recomp::ModSymbolsError err;

    // Parse the symbol toml.
    std::vector<uint8_t> dummy_rom{};
    N64Recomp::Context reference_context{};
    if (!N64Recomp::Context::from_symbol_file(function_symbol_toml_path, std::move(dummy_rom), reference_context, false)) {
        fprintf(stderr, "Failed to load provided function reference symbol file\n");
        return EXIT_FAILURE;
    }

    // Build a reference section lookup of rom address.
    std::unordered_map<uint32_t, uint16_t> sections_by_rom{};
    for (size_t section_index = 0; section_index < reference_context.sections.size(); section_index++) {
        sections_by_rom[reference_context.sections[section_index].rom_addr] = section_index;
    }

    // Parse the two contexts.
    N64Recomp::Context context1{};
    err = N64Recomp::parse_mod_symbols(sym_file_1, binary_1, sections_by_rom, context1);
    if (err != N64Recomp::ModSymbolsError::Good) {
        fprintf(stderr, "Error parsing mod symbols %s\n", sym_file_path_1);
        return EXIT_FAILURE;
    }
    context1.rom = std::move(binary_1);

    N64Recomp::Context context2{};
    err = N64Recomp::parse_mod_symbols(sym_file_2, binary_2, sections_by_rom, context2);
    if (err != N64Recomp::ModSymbolsError::Good) {
        fprintf(stderr, "Error parsing mod symbols %s\n", sym_file_path_2);
        return EXIT_FAILURE;
    }
    context2.rom = std::move(binary_2);

    N64Recomp::Context merged{};
    merged.import_reference_context(reference_context);

    if (!copy_into_context(merged, context1)) {
        fprintf(stderr, "Failed to merge first mod into output\n");
        return EXIT_FAILURE;
    }
    if (!copy_into_context(merged, context2)) {
        fprintf(stderr, "Failed to merge second mod into output\n");
        return EXIT_FAILURE;
    }

    std::vector<uint8_t> syms_out = N64Recomp::symbols_to_bin_v1(merged);

    if (!write_file(output_sym_path, reinterpret_span_char(syms_out))) {
        fprintf(stderr, "Failed to write symbol file to %s\n", output_sym_path);
        return EXIT_FAILURE;
    }

    if (!write_file(output_binary_path, reinterpret_span_char(std::span{ merged.rom }))) {
        fprintf(stderr, "Failed to write binary file to %s\n", output_binary_path);
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}
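
The dependency event and callback handling in copy_into_context above follows a deduplicate-and-remap pattern: each incoming record is either matched to an existing one or appended, and a lookup table translates old indices to new ones. The following is a minimal standalone sketch of that pattern, not the tool's actual code. `Event`, `Callback`, `merge_events` and `append_callbacks` are hypothetical stand-ins for illustration, and the sketch assumes the dependency indices of the incoming events have already been translated into the merged numbering.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a dependency event; only the two compared fields are modeled.
struct Event {
    std::size_t dependency_index;
    std::string event_name;
};

// Hypothetical stand-in for a callback record that points at a dependency event.
struct Callback {
    std::size_t function_index;
    std::size_t dependency_event_index;
};

// Appends each event from `in` to `out` unless an equal event already exists,
// and returns a table mapping every input index to its index in `out`.
std::vector<std::size_t> merge_events(std::vector<Event>& out, const std::vector<Event>& in) {
    constexpr std::size_t not_found = static_cast<std::size_t>(-1);
    std::vector<std::size_t> remap(in.size());
    for (std::size_t i = 0; i < in.size(); i++) {
        std::size_t found = not_found;
        for (std::size_t j = 0; j < out.size(); j++) {
            if (out[j].dependency_index == in[i].dependency_index &&
                out[j].event_name == in[i].event_name) {
                found = j;
                break;
            }
        }
        if (found != not_found) {
            remap[i] = found;       // Reuse the existing entry.
        } else {
            remap[i] = out.size();  // The new entry lands at the current end.
            out.push_back(in[i]);
        }
    }
    return remap;
}

// Copies callbacks into `out`, rewriting their event indices through `remap`.
void append_callbacks(std::vector<Callback>& out, const std::vector<Callback>& in,
                      const std::vector<std::size_t>& remap) {
    for (const Callback& cb : in) {
        // emplace_back returns a reference (C++17); binding it by reference
        // makes the assignment below modify the element stored in the vector.
        Callback& cb_out = out.emplace_back(cb);
        assert(cb_out.dependency_event_index < remap.size());
        cb_out.dependency_event_index = remap[cb_out.dependency_event_index];
    }
}
```

The linear search keeps the sketch short and is adequate for small event lists; a hash map keyed on the (dependency index, event name) pair would be one alternative if the lists were large.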


================================================
FILE: RecompModTool/main.cpp
================================================
#include <array>
#include <fstream>
#include <filesystem>
#include <iostream>
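#include <numeric>
#include <cctype>
#include <charconv>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>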
#include "fmt/format.h"
#include "fmt/ostream.h"
#include "recompiler/context.h"
#include <toml++/toml.hpp>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#else
#include <unistd.h>
#include <sys/wait.h>
#endif

constexpr std::string_view symbol_filename = "mod_syms.bin";
constexpr std::string_view binary_filename = "mod_binary.bin";
constexpr std::string_view manifest_filename = "mod.json";

struct ModManifest {
    std::string mod_id;
    std::string version_string;
    std::string display_name;
    std::string description;
    std::string short_description;
    std::vector<std::string> authors;
    std::string game_id;
    std::string minimum_recomp_version;
    std::unordered_map<std::string, std::vector<std::string>> native_libraries;
    bool custom_gamemode = false;
    std::vector<toml::table> config_options;
    std::vector<std::string> dependencies;
    std::vector<std::string> full_dependency_strings;
    std::vector<std::string> optional_dependencies;
    std::vector<std::string> full_optional_dependency_strings;
};

struct ModInputs {
    std::filesystem::path elf_path;
    std::string mod_filename;
    std::filesystem::path func_reference_syms_file_path;
    std::vector<std::filesystem::path> data_reference_syms_file_paths;
    std::vector<std::filesystem::path> additional_files;
};

struct ModConfig {
    ModManifest manifest;
    ModInputs inputs;
};

static std::filesystem::path concat_if_not_empty(const std::filesystem::path& parent, const std::filesystem::path& child) {
    if (child.is_absolute()) {
        return child;
    }
    if (!child.empty()) {
        return parent / child;
    }
    return child;
}

static bool validate_version_string(std::string_view str, bool& has_label) {
    std::array<size_t, 2> period_indices;
    size_t num_periods = 0;
    size_t cur_pos = 0;
    uint16_t major;
    uint16_t minor;
    uint16_t patch;

    // Find the 2 required periods.
    cur_pos = str.find('.', cur_pos);
    period_indices[0] = cur_pos;
    cur_pos = str.find('.', cur_pos + 1);
    period_indices[1] = cur_pos;

    // Check that both were found.
    if (period_indices[0] == std::string::npos || period_indices[1] == std::string::npos) {
        return false;
    }

    // Parse the 3 numbers formed by splitting the string via the periods.
    std::array<std::from_chars_result, 3> parse_results; 
    std::array<size_t, 3> parse_starts { 0, period_indices[0] + 1, period_indices[1] + 1 };
    std::array<size_t, 3> parse_ends { period_indices[0], period_indices[1], str.size() };
    parse_results[0] = std::from_chars(str.data() + parse_starts[0], s
SYMBOL INDEX (228 symbols across 23 files)

FILE: LiveRecomp/live_generator.cpp
  type Registers (line 26) | namespace Registers {
  type InnerCall (line 38) | struct InnerCall {
  type ReferenceSymbolCall (line 43) | struct ReferenceSymbolCall {
  type SwitchErrorJump (line 48) | struct SwitchErrorJump {
  type N64Recomp::LiveGeneratorContext (line 54) | struct N64Recomp::LiveGeneratorContext {
  function get_gpr_context_offset (line 229) | constexpr int get_gpr_context_offset(int gpr_index) {
  function get_fpr_single_context_offset (line 233) | constexpr int get_fpr_single_context_offset(int fpr_index) {
  function get_fpr_double_context_offset (line 237) | constexpr int get_fpr_double_context_offset(int fpr_index) {
  function is_fpr_u32l (line 241) | constexpr bool is_fpr_u32l(N64Recomp::Operand operand) {
  function get_fpr_u32l_context_offset (line 249) | constexpr void get_fpr_u32l_context_offset(int fpr_index, sljit_compiler...
  function get_fpr_u64_context_offset (line 265) | constexpr int get_fpr_u64_context_offset(int fpr_index) {
  function get_gpr_values (line 269) | void get_gpr_values(int gpr, sljit_sw& out, sljit_sw& outw) {
  function get_operand_values (line 280) | bool get_operand_values(N64Recomp::Operand operand, const N64Recomp::Ins...
  function outputs_to_zero (line 386) | bool outputs_to_zero(N64Recomp::Operand output, const N64Recomp::Instruc...
  function do_round_w_s (line 831) | int32_t do_round_w_s(float num) {
  function do_round_w_d (line 835) | int32_t do_round_w_d(double num) {
  function do_round_l_s (line 839) | int64_t do_round_l_s(float num) {
  function do_round_l_d (line 843) | int64_t do_round_l_d(double num) {
  function do_ceil_w_s (line 847) | int32_t do_ceil_w_s(float num) {
  function do_ceil_w_d (line 851) | int32_t do_ceil_w_d(double num) {
  function do_ceil_l_s (line 855) | int64_t do_ceil_l_s(float num) {
  function do_ceil_l_d (line 859) | int64_t do_ceil_l_d(double num) {
  function do_floor_w_s (line 863) | int32_t do_floor_w_s(float num) {
  function do_floor_w_d (line 867) | int32_t do_floor_w_d(double num) {
  function do_floor_l_s (line 871) | int64_t do_floor_l_s(float num) {
  function do_floor_l_d (line 875) | int64_t do_floor_l_d(double num) {

FILE: LiveRecomp/live_recompiler_test.cpp
  function read_file (line 10) | static std::vector<uint8_t> read_file(const std::filesystem::path& path,...
  function read_u32_swap (line 29) | uint32_t read_u32_swap(const std::vector<uint8_t>& vec, size_t offset) {
  function read_u32 (line 33) | uint32_t read_u32(const std::vector<uint8_t>& vec, size_t offset) {
  function byteswap_copy (line 39) | void byteswap_copy(uint8_t* dst, uint8_t* src, size_t count) {
  function byteswap_compare (line 45) | bool byteswap_compare(uint8_t* a, uint8_t* b, size_t count) {
  type TestError (line 54) | enum class TestError {
  type TestStats (line 62) | struct TestStats {
  function write1 (line 69) | void write1(uint8_t* rdram, recomp_context* ctx) {
  function recomp_func_t (line 73) | recomp_func_t* test_get_function(int32_t vram) {
  function test_switch_error (line 81) | void test_switch_error(const char* func, uint32_t vram, uint32_t jtbl) {
  function TestStats (line 85) | TestStats run_test(const std::filesystem::path& tests_dir, const std::st...
  function main (line 301) | int main(int argc, const char** argv) {

FILE: OfflineModRecomp/main.cpp
  function read_file (line 9) | static std::vector<uint8_t> read_file(const std::filesystem::path& path,...
  function main (line 27) | int main(int argc, const char** argv) {

FILE: RSPRecomp/src/rsp_recomp.cpp
  type RspOperand (line 21) | enum class RspOperand {
  function ctx_gpr_prefix (line 117) | std::string_view ctx_gpr_prefix(int reg) {
  function expected_c0_reg_value (line 124) | uint32_t expected_c0_reg_value(int cop0_reg) {
  function c0_reg_write_action (line 143) | std::string_view c0_reg_write_action(int cop0_reg) {
  function is_c0_reg_write_dma_read (line 164) | bool is_c0_reg_write_dma_read(int cop0_reg) {
  function get_rsp_element (line 168) | std::optional<int> get_rsp_element(const rabbitizer::InstructionRsp& ins...
  function rsp_ignores_element (line 178) | bool rsp_ignores_element(InstrId id) {
  type BranchTargets (line 182) | struct BranchTargets {
  function BranchTargets (line 187) | BranchTargets get_branch_targets(const std::vector<rabbitizer::Instructi...
  type ResumeTargets (line 200) | struct ResumeTargets {
  function get_overlay_swap_resume_targets (line 205) | void get_overlay_swap_resume_targets(const std::vector<rabbitizer::Instr...
  function process_instruction (line 225) | bool process_instruction(size_t instr_index, const std::vector<rabbitize...
  function write_indirect_jumps (line 576) | void write_indirect_jumps(std::ofstream& output_file, const BranchTarget...
  function write_overlay_swap_return (line 595) | void write_overlay_swap_return(std::ofstream& output_file) {
  function byteswap (line 610) | inline uint32_t byteswap(uint32_t val) {
  function byteswap (line 614) | constexpr uint32_t byteswap(uint32_t val) {
  type RSPRecompilerOverlayConfig (line 619) | struct RSPRecompilerOverlayConfig {
  type RSPRecompilerOverlaySlotConfig (line 624) | struct RSPRecompilerOverlaySlotConfig {
  type RSPRecompilerConfig (line 629) | struct RSPRecompilerConfig {
  function concat_if_not_empty (line 641) | std::filesystem::path concat_if_not_empty(const std::filesystem::path& p...
  function toml_to_vec (line 649) | std::vector<T> toml_to_vec(const toml::array* array) {
  function toml_to_set (line 664) | std::unordered_set<T> toml_to_set(const toml::array* array) {
  function read_config (line 676) | bool read_config(const std::filesystem::path& config_path, RSPRecompiler...
  type FunctionPermutation (line 824) | struct FunctionPermutation {
  type Permutation (line 829) | struct Permutation {
  type Overlay (line 834) | struct Overlay {
  type OverlaySlot (line 838) | struct OverlaySlot {
  function next_permutation (line 843) | bool next_permutation(const std::vector<uint32_t>& option_lengths, std::...
  function permute (line 860) | void permute(const std::vector<uint32_t>& base_words, const std::vector<...
  function make_permutation_string (line 892) | std::string make_permutation_string(const std::vector<uint32_t> permutat...
  function create_overlay_swap_function (line 902) | void create_overlay_swap_function(const std::string& function_name, std:...
  function create_function (line 1004) | void create_function(const std::string& function_name, std::ofstream& ou...
  function main (line 1087) | int main(int argc, const char** argv) {

FILE: RecompModMerger/main.cpp
  function read_file (line 7) | bool read_file(const std::filesystem::path& p, std::vector<T>& out) {
  function write_file (line 27) | bool write_file(const std::filesystem::path& p, std::span<char> in) {
  function reinterpret_span_u8 (line 37) | std::span<uint8_t> reinterpret_span_u8(std::span<char> s) {
  function reinterpret_span_char (line 41) | std::span<char> reinterpret_span_char(std::span<uint8_t> s) {
  function copy_into_context (line 45) | bool copy_into_context(N64Recomp::Context& out, const N64Recomp::Context...
  function main (line 209) | int main(int argc, const char** argv) {

FILE: RecompModTool/main.cpp
  type ModManifest (line 25) | struct ModManifest {
  type ModInputs (line 43) | struct ModInputs {
  type ModConfig (line 51) | struct ModConfig {
  function concat_if_not_empty (line 56) | static std::filesystem::path concat_if_not_empty(const std::filesystem::...
  function validate_version_string (line 66) | static bool validate_version_string(std::string_view str, bool& has_labe...
  function validate_dependency_string (line 122) | static bool validate_dependency_string(const std::string& val, size_t& n...
  function T (line 167) | static T read_toml_value(const toml::table& data, std::string_view key, ...
  function get_toml_path_array (line 208) | static std::vector<std::filesystem::path> get_toml_path_array(const toml...
  function validate_config_option (line 225) | bool validate_config_option(const toml::table& option) {
  function ModManifest (line 230) | ModManifest parse_mod_config_manifest(const std::filesystem::path& based...
  function ModInputs (line 377) | ModInputs parse_mod_config_inputs(const std::filesystem::path& basedir, ...
  function ModConfig (line 431) | ModConfig parse_mod_config(const std::filesystem::path& config_path, boo...
  function round_up_16 (line 475) | static inline uint32_t round_up_16(uint32_t value) {
  function parse_callback_name (line 479) | bool parse_callback_name(std::string_view data, std::string& dependency_...
  function string_vector_to_toml (line 498) | toml::array string_vector_to_toml(const std::vector<std::string>& input) {
  function write_manifest (line 506) | void write_manifest(const std::filesystem::path& path, const ModManifest...
  function build_mod_context (line 563) | N64Recomp::Context build_mod_context(const N64Recomp::Context& input_con...
  function create_mod_zip (line 1010) | bool create_mod_zip(const std::filesystem::path& output_dir, const ModCo...
  function main (line 1128) | int main(int argc, const char** argv) {

FILE: include/recomp.h
  function DMULT (line 41) | static inline void DMULT(int64_t a, int64_t b, int64_t* lo64, int64_t* h...
  function DMULTU (line 48) | static inline void DMULTU(uint64_t a, uint64_t b, uint64_t* lo64, uint64...
  function DMULT (line 61) | static inline void DMULT(int64_t a, int64_t b, int64_t* lo64, int64_t* h...
  function DMULTU (line 65) | static inline void DMULTU(uint64_t a, uint64_t b, uint64_t* lo64, uint64...
  function DDIV (line 73) | static inline void DDIV(int64_t a, int64_t b, int64_t* quot, int64_t* re...
  function DDIVU (line 79) | static inline void DDIVU(uint64_t a, uint64_t b, uint64_t* quot, uint64_...
  type gpr (line 84) | typedef uint64_t gpr;
  function load_doubleword (line 115) | static inline uint64_t load_doubleword(uint8_t* rdram, gpr reg, gpr offs...
  function gpr (line 126) | static inline gpr do_lwl(uint8_t* rdram, gpr initial_value, gpr offset, ...
  function gpr (line 143) | static inline gpr do_lwr(uint8_t* rdram, gpr initial_value, gpr offset, ...
  function do_swl (line 160) | static inline void do_swl(uint8_t* rdram, gpr offset, gpr reg, gpr val) {
  function do_swr (line 175) | static inline void do_swr(uint8_t* rdram, gpr offset, gpr reg, gpr val) {
  function gpr (line 190) | static inline gpr do_ldl(uint8_t* rdram, gpr initial_value, gpr offset, ...
  function gpr (line 206) | static inline gpr do_ldr(uint8_t* rdram, gpr initial_value, gpr offset, ...
  function do_sdl (line 222) | static inline void do_sdl(uint8_t* rdram, gpr offset, gpr reg, gpr val) {
  function do_sdr (line 243) | static inline void do_sdr(uint8_t* rdram, gpr offset, gpr reg, gpr val) {
  function get_cop1_cs (line 264) | static inline uint32_t get_cop1_cs() {
  function set_cop1_cs (line 288) | static inline void set_cop1_cs(uint32_t val) {
  function do_cvt_w_s (line 364) | static inline int32_t do_cvt_w_s(float val) {
  function do_cvt_l_s (line 372) | static inline int64_t do_cvt_l_s(float val) {
  function do_cvt_w_d (line 380) | static inline int32_t do_cvt_w_d(double val) {
  function do_cvt_l_d (line 388) | static inline int64_t do_cvt_l_d(double val) {
  type fpr (line 401) | typedef union {
  type recomp_context (line 414) | typedef struct {

FILE: include/recompiler/context.h
  function byteswap (line 15) | inline uint32_t byteswap(uint32_t val) {
  function byteswap (line 19) | constexpr uint32_t byteswap(uint32_t val) {
  function namespace (line 24) | namespace N64Recomp {
  type FunctionHook (line 204) | struct FunctionHook {
  function class (line 211) | class Context {
  function ModSymbolsError (line 596) | enum class ModSymbolsError {
  function validate_mod_id (line 652) | inline bool validate_mod_id(const std::string& str) {

FILE: include/recompiler/generator.h
  type InstructionContext (line 8) | struct InstructionContext {
  function class (line 28) | class Generator {

FILE: include/recompiler/live_recompiler.h
  type sljit_compiler (line 8) | struct sljit_compiler
  type LiveGeneratorContext (line 11) | struct LiveGeneratorContext
  type ReferenceJumpDetails (line 12) | struct ReferenceJumpDetails {
  type LiveGeneratorOutput (line 16) | struct LiveGeneratorOutput {
  function num_reference_symbol_jumps (line 41) | size_t num_reference_symbol_jumps() const;

FILE: include/recompiler/operations.h
  type class (line 12) | enum class
  type class (line 25) | enum class
  type class (line 69) | enum class
  function Operand (line 127) | enum class Operand {

FILE: src/analysis.cpp
  type RegState (line 13) | struct RegState {
    method RegState (line 32) | RegState() = default;
    method invalidate (line 34) | void invalidate() {
  function analyze_instruction (line 59) | bool analyze_instruction(const rabbitizer::InstructionCpu& instr, const ...

FILE: src/analysis.h
  type AbsoluteJump (line 10) | struct AbsoluteJump {

FILE: src/cgenerator.cpp
  type BinaryOpFields (line 9) | struct BinaryOpFields { std::string func_string; std::string infix_strin...
  function gpr_to_string (line 81) | static std::string gpr_to_string(int gpr_index) {
  function fpr_to_string (line 88) | static std::string fpr_to_string(int fpr_index) {
  function fpr_double_to_string (line 92) | static std::string fpr_double_to_string(int fpr_index) {
  function fpr_u32l_to_string (line 96) | static std::string fpr_u32l_to_string(int fpr_index) {
  function fpr_u64_to_string (line 105) | static std::string fpr_u64_to_string(int fpr_index) {
  function unsigned_reloc (line 109) | static std::string unsigned_reloc(const N64Recomp::InstructionContext& c...
  function signed_reloc (line 122) | static std::string signed_reloc(const N64Recomp::InstructionContext& con...
  type StoreSyntax (line 595) | enum class StoreSyntax {

FILE: src/config.cpp
  function concat_if_not_empty (line 8) | std::filesystem::path concat_if_not_empty(const std::filesystem::path& p...
  function get_manual_funcs (line 15) | std::vector<N64Recomp::ManualFunction> get_manual_funcs(const toml::arra...
  function get_data_syms_paths (line 41) | std::vector<std::filesystem::path> get_data_syms_paths(const toml::array...
  function get_stubbed_funcs (line 58) | std::vector<std::string> get_stubbed_funcs(const toml::table* patches_da...
  function get_ignored_funcs (line 84) | std::vector<std::string> get_ignored_funcs(const toml::table* patches_da...
  function get_renamed_funcs (line 107) | std::vector<std::string> get_renamed_funcs(const toml::table* patches_da...
  function get_func_sizes (line 130) | std::vector<N64Recomp::FunctionSize> get_func_sizes(const toml::array* f...
  function get_instruction_patches (line 160) | std::vector<N64Recomp::InstructionPatch> get_instruction_patches(const t...
  function get_function_hooks (line 204) | std::vector<N64Recomp::FunctionTextHook> get_function_hooks(const toml::...
  function get_mdebug_mappings (line 248) | void get_mdebug_mappings(const toml::array* mdebug_mappings_array,
  function reloc_type_from_name (line 540) | N64Recomp::RelocType reloc_type_from_name(const std::string& reloc_type_...

FILE: src/config.h
  type InstructionPatch (line 10) | struct InstructionPatch {
  type FunctionTextHook (line 16) | struct FunctionTextHook {
  type FunctionSize (line 22) | struct FunctionSize {

FILE: src/elf.cpp
  function read_symbols (line 11) | bool read_symbols(N64Recomp::Context& context, const ELFIO::elfio& elf_f...
  type SegmentEntry (line 203) | struct SegmentEntry {
  function get_segment (line 209) | std::optional<size_t> get_segment(const std::vector<SegmentEntry>& segme...
  function setup_context_for_elf (line 619) | static void setup_context_for_elf(N64Recomp::Context& context, const ELF...

FILE: src/main.cpp
  function add_manual_functions (line 16) | void add_manual_functions(N64Recomp::Context& context, const std::vector...
  function read_list_file (line 75) | bool read_list_file(const std::filesystem::path& filename, std::vector<s...
  function compare_files (line 90) | bool compare_files(const std::filesystem::path& file1_path, const std::f...
  function recompile_single_function (line 114) | bool recompile_single_function(const N64Recomp::Context& context, size_t...
  function dump_context (line 160) | void dump_context(const N64Recomp::Context& context, const std::unordere...
  function read_file (line 253) | static std::vector<uint8_t> read_file(const std::filesystem::path& path) {
  function main (line 269) | int main(int argc, char** argv) {

FILE: src/mdebug.cpp
  type MDebugSymbol (line 8) | struct MDebugSymbol {
  type MDebugFile (line 19) | struct MDebugFile {
  class MDebugInfo (line 24) | class MDebugInfo {
    method MDebugInfo (line 26) | MDebugInfo(const N64Recomp::ElfParsingConfig& config, const char* mdeb...
    method is_identifier_char (line 119) | bool is_identifier_char(char c) {
    method sanitize_section_name (line 135) | std::string sanitize_section_name(std::string section_name) {
    method populate_context (line 151) | bool populate_context(const N64Recomp::ElfParsingConfig& elf_config, N...
    method print (line 496) | void print() {
    method good (line 505) | bool good() {
    method add_file (line 509) | void add_file(std::string&& filename) {
    method add_symbol (line 515) | void add_symbol(size_t file_index, MDebugSymbol&& sym) {
  function get_func (line 526) | bool get_func(const char *mdata, const MDebug::HDRR& hdrr, const MDebug:...
  function read_mdebug (line 548) | void read_mdebug(N64Recomp::Context& context, ELFIO::section* mdebug_sec...

FILE: src/mdebug.h
  function namespace (line 12) | namespace N64Recomp
  type PDR (line 137) | struct PDR {
  type LANG (line 187) | enum LANG {
  function get_fMerge (line 202) | struct FDR {
  function swap (line 251) | void swap() {
  function std (line 273) | inline std::span<const AUX> get_auxs(const std::vector<AUX>& all_auxs) c...
  function std (line 277) | inline std::span<const PDR> get_pdrs(const std::vector<PDR>& all_pdrs) c...
  function std (line 281) | inline std::span<const SYMR> get_symrs(std::span<const SYMR> all_symrs) ...
  function relocate (line 301) | struct HDRR {
  function std (line 370) | inline std::vector<FDR> read_fdrs(const char* data) {

FILE: src/mod_symbols.cpp
  type FileHeader (line 5) | struct FileHeader {
  type FileSubHeaderV1 (line 10) | struct FileSubHeaderV1 {
  type SectionFlags (line 23) | enum class SectionFlags : uint32_t {
  type SectionHeaderV1 (line 28) | struct SectionHeaderV1 {
  type FuncV1 (line 38) | struct FuncV1 {
  type RelocV1 (line 50) | struct RelocV1 {
  type DependencyV1 (line 57) | struct DependencyV1 {
  type ImportV1 (line 63) | struct ImportV1 {
  type DependencyEventV1 (line 69) | struct DependencyEventV1 {
  type ReplacementV1 (line 75) | struct ReplacementV1 {
  type ExportV1 (line 82) | struct ExportV1 {
  type CallbackV1 (line 88) | struct CallbackV1 {
  type EventV1 (line 93) | struct EventV1 {
  type HookV1 (line 98) | struct HookV1 {
  function T (line 106) | const T* reinterpret_data(std::span<const char> data, size_t& offset, si...
  function check_magic (line 116) | bool check_magic(const FileHeader* header) {
  function round_up_4 (line 123) | static inline uint32_t round_up_4(uint32_t value) {
  function parse_v1 (line 127) | bool parse_v1(std::span<const char> data, const std::unordered_map<uint3...
  function vec_put (line 527) | void vec_put(std::vector<uint8_t>& vec, const T* data) {
  function vec_put (line 533) | void vec_put(std::vector<uint8_t>& vec, const std::string& data) {

FILE: src/operations.cpp
  type N64Recomp (line 3) | namespace N64Recomp {

FILE: src/recompilation.cpp
  type JalResolutionResult (line 16) | enum class JalResolutionResult {
  function JalResolutionResult (line 24) | JalResolutionResult resolve_jal(const N64Recomp::Context& context, size_...
  function ctx_gpr_prefix (line 107) | std::string_view ctx_gpr_prefix(int reg) {
  function process_instruction (line 115) | bool process_instruction(GeneratorType& generator, const N64Recomp::Cont...
  function recompile_function_impl (line 763) | bool recompile_function_impl(GeneratorType& generator, const N64Recomp::...
Condensed preview — 30 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (665K chars).
[
  {
    "path": ".github/workflows/validate.yml",
    "chars": 3264,
    "preview": "name: validate\non:\n  push:\n    branches:\n      - main\n  pull_request:\n    types: [opened, synchronize]\nconcurrency:\n  gr"
  },
  {
    "path": ".gitignore",
    "chars": 564,
    "preview": "# VSCode file settings\n.vscode/settings.json\n.vscode/c_cpp_properties.json\n\n# Input elf and rom files\n*.elf\n*.z64\n\n# Loc"
  },
  {
    "path": ".gitmodules",
    "chars": 447,
    "preview": "[submodule \"lib/rabbitizer\"]\n\tpath = lib/rabbitizer\n\turl = https://github.com/Decompollaborate/rabbitizer\n[submodule \"li"
  },
  {
    "path": "CMakeLists.txt",
    "chars": 8484,
    "preview": "cmake_minimum_required(VERSION 3.20)\nset(CMAKE_C_STANDARD 17)\nset(CMAKE_CXX_STANDARD 20)\nset(CMAKE_CXX_STANDARD_REQUIRED"
  },
  {
    "path": "LICENSE",
    "chars": 1074,
    "preview": "The MIT License (MIT)\n\nCopyright (c) 2024 Wiseguy\n\nPermission is hereby granted, free of charge, to any person obtaining"
  },
  {
    "path": "LiveRecomp/live_generator.cpp",
    "chars": 83606,
    "preview": "#include <cassert>\n#include <fstream>\n#include <unordered_map>\n#include <cmath>\n\n#include \"fmt/format.h\"\n#include \"fmt/o"
  },
  {
    "path": "LiveRecomp/live_recompiler_test.cpp",
    "chars": 13194,
    "preview": "#include <fstream>\n#include <chrono>\n#include <filesystem>\n#include <cinttypes>\n\n#include \"sljitLir.h\"\n#include \"recompi"
  },
  {
    "path": "OfflineModRecomp/main.cpp",
    "chars": 11526,
    "preview": "#include <filesystem>\n#include <fstream>\n#include <vector>\n#include <span>\n\n#include \"recompiler/context.h\"\n#include \"ra"
  },
  {
    "path": "README.md",
    "chars": 9145,
    "preview": "# N64: Recompiled\nN64: Recompiled is a tool to statically recompile N64 binaries into C code that can be compiled for an"
  },
  {
    "path": "RSPRecomp/src/rsp_recomp.cpp",
    "chars": 52034,
    "preview": "#include <optional>\n#include <fstream>\n#include <array>\n#include <vector>\n#include <unordered_set>\n#include <unordered_m"
  },
  {
    "path": "RecompModMerger/main.cpp",
    "chars": 12372,
    "preview": "#include <cstdio>\n#include <fstream>\n\n#include \"recompiler/context.h\"\n\ntemplate <typename T>\nbool read_file(const std::f"
  },
  {
    "path": "RecompModTool/main.cpp",
    "chars": 57623,
    "preview": "#include <array>\n#include <fstream>\n#include <filesystem>\n#include <iostream>\n#include <numeric>\n#include <cctype>\n#incl"
  },
  {
    "path": "include/recomp.h",
    "chars": 14845,
    "preview": "#ifndef __RECOMP_H__\n#define __RECOMP_H__\n\n#include <stdlib.h>\n#include <stdint.h>\n#include <math.h>\n#include <fenv.h>\n#"
  },
  {
    "path": "include/recompiler/context.h",
    "chars": 26578,
    "preview": "#ifndef __RECOMP_PORT__\n#define __RECOMP_PORT__\n\n#include <span>\n#include <string_view>\n#include <cstdint>\n#include <uti"
  },
  {
    "path": "include/recompiler/generator.h",
    "chars": 6320,
    "preview": "#ifndef __GENERATOR_H__\n#define __GENERATOR_H__\n\n#include \"recompiler/context.h\"\n#include \"operations.h\"\n\nnamespace N64R"
  },
  {
    "path": "include/recompiler/live_recompiler.h",
    "chars": 8510,
    "preview": "#ifndef __LIVE_RECOMPILER_H__\n#define __LIVE_RECOMPILER_H__\n\n#include <unordered_map>\n#include \"recompiler/generator.h\"\n"
  },
  {
    "path": "include/recompiler/operations.h",
    "chars": 5037,
    "preview": "#ifndef __OPERATIONS_H__\n#define __OPERATIONS_H__\n\n#include <unordered_map>\n\n#include \"rabbitizer.hpp\"\n\nnamespace N64Rec"
  },
  {
    "path": "src/analysis.cpp",
    "chars": 14764,
    "preview": "#include <set>\n#include <algorithm>\n\n#include \"rabbitizer.hpp\"\n#include \"fmt/format.h\"\n\n#include \"recompiler/context.h\"\n"
  },
  {
    "path": "src/analysis.h",
    "chars": 638,
    "preview": "#ifndef __RECOMP_ANALYSIS_H__\n#define __RECOMP_ANALYSIS_H__\n\n#include <cstdint>\n#include <vector>\n\n#include \"recompiler/"
  },
  {
    "path": "src/cgenerator.cpp",
    "chars": 27992,
    "preview": "#include <cassert>\n#include <fstream>\n\n#include \"fmt/format.h\"\n#include \"fmt/ostream.h\"\n\n#include \"recompiler/generator."
  },
  {
    "path": "src/config.cpp",
    "chars": 38621,
    "preview": "#include <iostream>\n\n#include <toml++/toml.hpp>\n#include \"fmt/format.h\"\n#include \"config.h\"\n#include \"recompiler/context"
  },
  {
    "path": "src/config.h",
    "chars": 2692,
    "preview": "#ifndef __RECOMP_CONFIG_H__\n#define __RECOMP_CONFIG_H__\n\n#include <cstdint>\n#include <filesystem>\n#include <vector>\n#inc"
  },
  {
    "path": "src/elf.cpp",
    "chars": 36336,
    "preview": "#include <optional>\n\n#include \"fmt/format.h\"\n// #include \"fmt/ostream.h\"\n\n#include \"recompiler/context.h\"\n#include \"elfi"
  },
  {
    "path": "src/main.cpp",
    "chars": 49574,
    "preview": "#include <cstdio>\n#include <cstdlib>\n#include <unordered_set>\n#include <span>\n#include <filesystem>\n#include <optional>\n"
  },
  {
    "path": "src/mdebug.cpp",
    "chars": 31393,
    "preview": "#include <algorithm>\n#include <unordered_map>\n\n#include \"fmt/format.h\"\n\n#include \"mdebug.h\"\n\nstruct MDebugSymbol {\n    s"
  },
  {
    "path": "src/mdebug.h",
    "chars": 18367,
    "preview": "#ifndef __RECOMP_MDEBUG_H__\n#define __RECOMP_MDEBUG_H__\n\n#include <cassert>\n#include <cstdint>\n#include <span>\n#include "
  },
  {
    "path": "src/mod_symbols.cpp",
    "chars": 34899,
    "preview": "#include <cstring>\n\n#include \"recompiler/context.h\"\n\nstruct FileHeader {\n    char magic[8]; // N64RSYMS\n    uint32_t ver"
  },
  {
    "path": "src/operations.cpp",
    "chars": 23155,
    "preview": "#include \"recompiler/operations.h\"\n\nnamespace N64Recomp {\n    const std::unordered_map<InstrId, UnaryOp> unary_ops {\n   "
  },
  {
    "path": "src/recompilation.cpp",
    "chars": 37005,
    "preview": "#include <vector>\n#include <set>\n#include <unordered_set>\n#include <unordered_map>\n#include <cassert>\n\n#include \"rabbiti"
  },
  {
    "path": "src/symbol_lists.cpp",
    "chars": 14479,
    "preview": "#include \"recompiler/context.h\"\n\nconst std::unordered_set<std::string> N64Recomp::reimplemented_funcs {\n    // OS initia"
  }
]
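
Each entry in the preview above carries three keys: "path", "chars", and "preview". As a minimal sketch of how a consumer might model and summarize those entries, the C++ below mirrors the keys in a struct. `PreviewEntry`, `total_chars`, `largest_entry`, and `sample_entries` are hypothetical names introduced for illustration and are not part of N64Recomp or of the extraction tool. The three sample rows copy their path and character count from the listing, with the preview text shortened.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of one preview entry; field names follow the JSON keys.
struct PreviewEntry {
    std::string path;     // repository-relative file path
    std::size_t chars;    // character count reported for the file
    std::string preview;  // leading snippet of the file contents
};

// Sum the "chars" field over all entries.
std::size_t total_chars(const std::vector<PreviewEntry>& entries) {
    std::size_t total = 0;
    for (const PreviewEntry& entry : entries) {
        total += entry.chars;
    }
    return total;
}

// Return a pointer to the entry with the largest "chars" value,
// or nullptr when the list is empty.
const PreviewEntry* largest_entry(const std::vector<PreviewEntry>& entries) {
    const PreviewEntry* best = nullptr;
    for (const PreviewEntry& entry : entries) {
        if (best == nullptr || entry.chars > best->chars) {
            best = &entry;
        }
    }
    return best;
}

// Three rows taken from the listing (preview text shortened for brevity).
std::vector<PreviewEntry> sample_entries() {
    return {
        {"LICENSE", 1074, "The MIT License (MIT)"},
        {"src/analysis.h", 638, "#ifndef __RECOMP_ANALYSIS_H__"},
        {"src/config.h", 2692, "#ifndef __RECOMP_CONFIG_H__"},
    };
}
```

Calling `total_chars(sample_entries())` adds 1074, 638, and 2692. Loading the real JSON would need a parser of your choice, which this sketch deliberately leaves out.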

About this extraction

This page contains the full source code of the N64Recomp/N64Recomp GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 30 files (629.4 KB), approximately 155.1k tokens, and a symbol index with 228 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
