[
  {
    "path": ".dockerignore",
    "content": "**/__pycache__\r\n**/.venv\r\n**/.classpath\r\n**/.dockerignore\r\n**/.env\r\n**/.git\r\n**/.gitignore\r\n**/.project\r\n**/.settings\r\n**/.toolstarget\r\n**/.vs\r\n**/.vscode\r\n**/*.*proj.user\r\n**/*.dbmdl\r\n**/*.jfm\r\n**/bin\r\n**/charts\r\n**/docker-compose*\r\n**/compose*\r\n**/Dockerfile*\r\n**/node_modules\r\n**/npm-debug.log\r\n**/obj\r\n**/secrets.dev.yaml\r\n**/values.dev.yaml\r\n**/results\r\n**/.cache\r\n**/.changed_files_cache\r\n**/prototyping\r\n**/node_modules\r\nLICENSE\r\nREADME.md\r\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "content": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: ''\nlabels: ''\nassignees: ''\n\n---\n\n**Describe the bug**\nA clear and concise description of what the bug is.\n\n**To Reproduce**\nSteps to reproduce the behavior:\n1. Go to '...'\n2. Click on '....'\n3. Scroll down to '....'\n4. See error\n\n**Expected behavior**\nA clear and concise description of what you expected to happen.\n\n**Screenshots**\nIf applicable, add screenshots to help explain your problem.\n\n**Desktop (please complete the following information):**\n - OS: [e.g. iOS]\n - Browser [e.g. chrome, safari]\n - Version [e.g. 22]\n\n**Smartphone (please complete the following information):**\n - Device: [e.g. iPhone6]\n - OS: [e.g. iOS8.1]\n - Browser [e.g. stock browser, safari]\n - Version [e.g. 22]\n\n**Additional context**\nAdd any other context about the problem here.\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "content": "---\nname: Feature request\nabout: Suggest an idea for this project\ntitle: ''\nlabels: ''\nassignees: ''\n\n---\n\n**Is your feature request related to a problem? Please describe.**\nA clear and concise description of what the problem is. Ex. I'm always frustrated when [...]\n\n**Describe the solution you'd like**\nA clear and concise description of what you want to happen.\n\n**Describe alternatives you've considered**\nA clear and concise description of any alternative solutions or features you've considered.\n\n**Additional context**\nAdd any other context or screenshots about the feature request here.\n"
  },
  {
    "path": ".gitignore",
    "content": "acab\nautopsy*\n__pycache__\nlibvmdl\nsleuthkit-*\n.cache\n.DS_Store\n.vscode\n*.txt\ndiffs/\nnode_modules\n.changed_files_cache/\nresults/\nfrontend/yarn.lock\nfeedback.md\npreview.sh\nvolatility3/\nbuild/\n*volatilitycache*\nprofile.pstats\nvolatility3\nlibvmdk/\nfrontend/public/json\nfrontend/build/json\n!requirements.txt\n"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "content": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nWe as members, contributors, and leaders pledge to make participation in our\ncommunity a harassment-free experience for everyone, regardless of age, body\nsize, visible or invisible disability, ethnicity, sex characteristics, gender\nidentity and expression, level of experience, education, socio-economic status,\nnationality, personal appearance, race, religion, or sexual identity\nand orientation.\n\nWe pledge to act and interact in ways that contribute to an open, welcoming,\ndiverse, inclusive, and healthy community.\n\n## Our Standards\n\nExamples of behavior that contributes to a positive environment for our\ncommunity include:\n\n* Demonstrating empathy and kindness toward other people\n* Being respectful of differing opinions, viewpoints, and experiences\n* Giving and gracefully accepting constructive feedback\n* Accepting responsibility and apologizing to those affected by our mistakes,\n  and learning from the experience\n* Focusing on what is best not just for us as individuals, but for the\n  overall community\n\nExamples of unacceptable behavior include:\n\n* The use of sexualized language or imagery, and sexual attention or\n  advances of any kind\n* Trolling, insulting or derogatory comments, and personal or political attacks\n* Public or private harassment\n* Publishing others' private information, such as a physical or email\n  address, without their explicit permission\n* Other conduct which could reasonably be considered inappropriate in a\n  professional setting\n\n## Enforcement Responsibilities\n\nCommunity leaders are responsible for clarifying and enforcing our standards of\nacceptable behavior and will take appropriate and fair corrective action in\nresponse to any behavior that they deem inappropriate, threatening, offensive,\nor harmful.\n\nCommunity leaders have the right and responsibility to remove, edit, or reject\ncomments, commits, code, wiki edits, issues, and other contributions that are\nnot aligned to this Code of Conduct, and will communicate reasons for moderation\ndecisions when appropriate.\n\n## Scope\n\nThis Code of Conduct applies within all community spaces, and also applies when\nan individual is officially representing the community in public spaces.\nExamples of representing our community include using an official e-mail address,\nposting via an official social media account, or acting as an appointed\nrepresentative at an online or offline event.\n\n## Enforcement\n\nInstances of abusive, harassing, or otherwise unacceptable behavior may be\nreported to the community leaders responsible for enforcement at\n127803604+vmdiff@users.noreply.github.com.\nAll complaints will be reviewed and investigated promptly and fairly.\n\nAll community leaders are obligated to respect the privacy and security of the\nreporter of any incident.\n\n## Enforcement Guidelines\n\nCommunity leaders will follow these Community Impact Guidelines in determining\nthe consequences for any action they deem in violation of this Code of Conduct:\n\n### 1. Correction\n\n**Community Impact**: Use of inappropriate language or other behavior deemed\nunprofessional or unwelcome in the community.\n\n**Consequence**: A private, written warning from community leaders, providing\nclarity around the nature of the violation and an explanation of why the\nbehavior was inappropriate. A public apology may be requested.\n\n### 2. Warning\n\n**Community Impact**: A violation through a single incident or series\nof actions.\n\n**Consequence**: A warning with consequences for continued behavior. No\ninteraction with the people involved, including unsolicited interaction with\nthose enforcing the Code of Conduct, for a specified period of time. This\nincludes avoiding interactions in community spaces as well as external channels\nlike social media. Violating these terms may lead to a temporary or\npermanent ban.\n\n### 3. Temporary Ban\n\n**Community Impact**: A serious violation of community standards, including\nsustained inappropriate behavior.\n\n**Consequence**: A temporary ban from any sort of interaction or public\ncommunication with the community for a specified period of time. No public or\nprivate interaction with the people involved, including unsolicited interaction\nwith those enforcing the Code of Conduct, is allowed during this period.\nViolating these terms may lead to a permanent ban.\n\n### 4. Permanent Ban\n\n**Community Impact**: Demonstrating a pattern of violation of community\nstandards, including sustained inappropriate behavior,  harassment of an\nindividual, or aggression toward or disparagement of classes of individuals.\n\n**Consequence**: A permanent ban from any sort of public interaction within\nthe community.\n\n## Attribution\n\nThis Code of Conduct is adapted from the [Contributor Covenant][homepage],\nversion 2.0, available at\nhttps://www.contributor-covenant.org/version/2/0/code_of_conduct.html.\n\nCommunity Impact Guidelines were inspired by [Mozilla's code of conduct\nenforcement ladder](https://github.com/mozilla/diversity).\n\n[homepage]: https://www.contributor-covenant.org\n\nFor answers to common questions about this code of conduct, see the FAQ at\nhttps://www.contributor-covenant.org/faq. Translations are available at\nhttps://www.contributor-covenant.org/translations.\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "content": "## Contributing\n\n* I’m not going be working on/maintaining vmdiff for at least 12 months, maybe ever\n* I’d _love_ for someone to steal this genius idea, either forking the prototype, or making their own\n\n## Future work\n\n* If a Windows disk has corrupted sectors, `dfvfs` can’t read those sectors. This comes up a lot, and while you can run `chkdsk` on the VM to get around it, it would be nice to not have to.\n* It would be nice to be able to diff snapshots of your actual computer, not a virtual machine, but this is hard without external storage\n  * The two snapshots of your disk may not fit on your disk itself, to say nothing of the memory snapshots\n\n* See the [blog post](https://community.atlassian.com/t5/Trust-Security-articles/Introducing-vmdiff-a-tool-to-find-everything-that-changes-on/ba-p/2321969) for allll the good details\n"
  },
  {
    "path": "Dockerfile",
    "content": "FROM node:lts-alpine as frontend\r\nWORKDIR /app\r\nENV PATH /app/node_modules/.bin:$PATH\r\nCOPY frontend ./\r\nRUN yarn install --production\r\nRUN yarn build --production\r\n\r\n\r\n# For more information, please refer to https://aka.ms/vscode-docker-python\r\nFROM python:3.8-slim\r\n\r\n\r\nEXPOSE 5000\r\n\r\n# Keeps Python from generating .pyc files in the container\r\nENV PYTHONDONTWRITEBYTECODE=1\r\n\r\n# Turns off buffering for easier container logging\r\nENV PYTHONUNBUFFERED=1\r\n\r\n# Install pip requirements\r\nCOPY server_requirements.txt .\r\nRUN python -m pip install -r server_requirements.txt\r\n\r\nWORKDIR /app\r\nCOPY --from=frontend /app/build /react-build/\r\nCOPY backend/ backend/\r\nCOPY server.py .\r\nCOPY config.py .\r\n\r\n\r\n# Creates a non-root user with an explicit UID and adds permission to access the /app folder\r\n# For more info, please refer to https://aka.ms/vscode-docker-python-configure-containers\r\nRUN adduser -u 5678 --disabled-password --gecos \"\" appuser && chown -R appuser /app\r\nUSER appuser\r\n\r\n# During debugging, this entry point will be overridden. For more information, please refer to https://aka.ms/vscode-docker-python-debug\r\nCMD [\"gunicorn\", \"--bind\", \"0.0.0.0:5000\", \"server:app\"]\r\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2023 vmdiff\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# vmdiff\n\n![logo](https://community.atlassian.com/t5/image/serverpage/image-id/250140i6BA42D04B2F49CE1/image-dimensions/280x210?v=v2)\n\nA tool to compare virtual machine snapshots, allowing you to see everything that changes on your computer.\n\n## Blog post\nThere's also a delightful [companion blog post](https://community.atlassian.com/t5/Trust-Security-articles/Introducing-vmdiff-a-tool-to-find-everything-that-changes-on/ba-p/2321969) with more context :))\n\n## Features\n\n* Accepts two Windows or macOS virtual machine snapshots (`.vmdk` and `.vmem` files)\n* Diffs all files on both disks, line-by line (including deleted files). If it’s not in the list, it didn’t happen\n* Diffs memory (running processes, command lines, and environment variables) on Windows\n* Diffs also available to search/process via terminal as local directories (think `grep`)\n* Runs on Windows, macOS, Linux\n\n![Demo](https://community.atlassian.com/t5/image/serverpage/image-id/250126i9D3D94314406622B/image-dimensions/749x376?v=v2)\n\n![Process tree](https://community.atlassian.com/t5/image/serverpage/image-id/250138iB53029B9F025028D/image-size/large?v=v2&px=999)\n\n![Terminal parsing](https://community.atlassian.com/t5/image/serverpage/image-id/250129i6BE4A67E932C3C34/image-size/large?v=v2&px=999)\n\n## Installation\n\n```shell\ngit clone github.com/vmdiff/vmdiff-prototype\ncd vmdiff-prototype\n```\n\n### Install Docker\n\nDocker will need to be installed and running, since `vmdiff` uses `docker-compose`.\n\n### Install dependencies for the CLI\n\n```shell\npip install -r requirements.txt\n```\n\n## Usage\n\nYou'll need a directory in which the virtual machine snapshots (`.vmdk` and `.vmem` files) are all stored.\nFor [VMWare](https://kb.vmware.com/s/article/1003880), the default directories are:\n\n* `C:\\Users\\<username>\\My Documents\\My Virtual Machines\\<VM name>\\` (Windows)\n* `~/Virtual Machines.localized/<VM name>/` (macOS)\n* `~/vmware/` (Linux)\n\n```shell\n$ ./vmdiff --help\n                                                                                                                              \n Usage: vmdiff [OPTIONS] INPUT_DIR                                                                                            \n                                                                                                                              \n                                                                                                                              \n Generate and view diffs for .vmdk and .vmem files.                                                                           \n EXAMPLES:                                                                                                                    \n                                                                                                                              \n What snapshots do I have to choose from?                                                                                     \n     ./vmdiff \"~/Virtual Machines.localized/VMName/\" --list-snapshots                                                         \n                                                                                                                              \n Diff snapshots 1 and 2                                                                                                       \n     ./vmdiff \"~/Virtual Machines.localized/VMName/\" --from-snapshot 1 --to-snapshot 2                                        \n                                                                                                                              \n Don't prompt me for a partition, I know it's partition 4                                                                     \n     ./vmdiff \"~/Virtual Machines.localized/VMName/\" --from-snapshot 1 --to-snapshot 2 --partition 4                          \n                                                                                                                              \n Diff generic VMDK files, not necessarily from a snapshot                                                                     \n     ./vmdiff ~/dir-with-vmdk-files/ --from-disk disk1.vmdk --to-disk disk2.vmdk --no-use-memory                              \n                                                                                                                              \n Only show files that have changed in the user's home directory                                                               \n     ./vmdiff \"~/Virtual Machines.localized/VMName/\" --from-snapshot 1 --to-snapshot 2 --filter-path \"/home/username/\"        \n                                                                                                                              \n Ignore .log and .txt files                                                                                                   \n     ./vmdiff \"~/Virtual Machines.localized/VMName/\" --from-snapshot 1 --to-snapshot 2 --filter-path \"/home/username/\"        \n --ignore-path \".*\\.log\" --ignore-path \".*\\.txt\"                                                                              \n                                                                                                                              \n╭─ Input and output ─────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ *    input_dir      DIRECTORY  Path to virtual machine directory, or any directory containing .vmdk/.vmem files.           │\n│                                [required]                                                                                  │\n╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ --list-snapshots  -l        Show information about the VM snapshots in INPUT_DIR, e.g. the files belonging to each         │\n│                             snapshot.                                                                                      │\n│ --debug                     Enable debug logging.                                                                          │\n│ --help                      Show this message and exit.                                                                    │\n╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n╭─ Input and output ─────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ --from-disk      -fd      PATH  Path (or filename) of first chronological disk snapshot.                                   │\n│ --to-disk        -td      PATH  Path (or filename) of second chronological disk snapshot.                                  │\n│ --from-memory    -fm      PATH  Path (or filename) of first chronological memory snapshot.                                 │\n│ --to-memory      -tm      PATH  Path (or filename) of second chronological memory snapshot.                                │\n│ --from-snapshot  -fs      TEXT  First chronological snapshot ID obtained via --list-snapshots.                             │\n│ --to-snapshot    -ts      TEXT  Second chronological snapshot ID obtained via --list-snapshots.                            │\n│ --partition      -p       TEXT  Disk Partition ID to use. If not set, show partitions and ask which one to use via STDIN.  │\n╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n╭─ Configuring ──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ --ignore-path     -i                         TEXT  List of disk path regular expressions to ignore when diffing. Multiple  │\n│                                                    values accepted via e.g. \"--ignore-path /path/one --ignore-path         │\n│                                                    /path/two\"                                                              │\n│ --filter-path     -f                         TEXT  List of disk path regular expressions. Only these paths will be         │\n│                                                    processed. Multiple values accepted via e.g. \"--filter-path /path/one   │\n│                                                    --filter-path /path/two\"                                                │\n│                                                    [default: /, \\]                                                         │\n│ --ignore-process  -I                         TEXT  Regular expression to ignore when diffing process names. Note that only │\n│                                                    the first 14 characters of the process name are processed (by           │\n│                                                    Volatility).                                                            │\n│ --cache               --no-cache                   Whether to cache results based on input filenames and config options.   │\n│                                                    [default: cache]                                                        │\n│ --use-memory          --no-use-memory              Whether to process/diff memory. [default: use-memory]                   │\n│ --use-disk            --no-use-disk                Whether to process/diff disks. [default: use-disk]                      │\n│ --include-binary      --no-include-binary          Whether to also process and diff binary files.                          │\n│                                                    [default: no-include-binary]                                            │\n╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n╭─ Display ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ --show  -s        Open browser and show diff viewer UI.                                                                    │\n╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n```\n\n### Typical usage\n\nWhich snapshots do I have to choose from?\n\n```shell\n./vmdiff \"~/Virtual Machines.localized/VMName/\" --list-snapshots\n                     Found snapshots in ~/Virtual Machines.localized/VirtualMachine.vmwarevm\n┏━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃    ┃ Parent ┃                     ┃                             ┃                            ┃                             ┃\n┃ ID ┃ ID     ┃ Creation time       ┃ Disk file                   ┃ Memory file                ┃ Description                 ┃\n┡━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n│ 1  │        │ 2022-11-17 13:24:39 │ VirtualMachine-disk1.vmdk   │ VirtualMachine-Snapshot1.… │ Initial Snapshot            │\n│ 2  │ 1      │ 2022-11-17 13:39:40 │ VirtualMachine-disk1-00000… │ VirtualMachine-Snapshot2.… │ Snapshot after changes made │\n└────┴────────┴─────────────────────┴─────────────────────────────┴────────────────────────────┴─────────────────────────────┘\n```\n\nLet's diff snapshots 1 and 2 (this will prompt you for which partition to use on STDIN unless you use `--partition`)\n\n```shell\n./vmdiff \"~/Virtual Machines.localized/VMName/\" --from-snapshot 1 --to-snapshot 2\n```\n\nNow let's view the diffs in browser:\n\n```shell\n./vmdiff \"~/Virtual Machines.localized/VMName/\" --from-snapshot 1 --to-snapshot 2 --show\n```\n\nThe UI will then be running on `http://localhost:5000`\n\n### Browse the diffs via shell\n\nThe raw diffs are available in a directory structure mirroring the VM in the `results/` directory\n\n## How it works\n\n![Diagram](https://community.atlassian.com/t5/image/serverpage/image-id/250141i2CC67D463D148789/image-size/large?v=v2&px=999)\n\n### Tech Stack\n\n* [Typer](https://typer.tiangolo.com/) (CLI)\n* docker-compose\n* Volatility (to parse memory images)\n* [dfvfs](https://github.com/log2timeline/dfvfs) (to parse disk images)\n* Custom fork of [pyvmdk](https://github.com/libyal/libvmdk) (enables .vmdk delta disks for snapshots)\n* React + TypeScript + Ant Design (frontend)\n* grep (Searching diffs via command line)\n\n## Contributing\n\n* I’m not going be working on/maintaining vmdiff for at least 12 months, maybe ever\n* I’d _love_ for someone to steal this genius idea, either forking the prototype, or making their own\n\n## Future work\n\n* If a Windows disk has corrupted sectors, `dfvfs` can’t read those sectors. This comes up a lot, and while you can run `chkdsk` on the VM to get around it, it would be nice to not have to.\n* It would be nice to be able to diff snapshots of your actual computer, not a virtual machine, but this is hard without external storage\n  * The two snapshots of your disk may not fit on your disk itself, to say nothing of the memory snapshots\n"
  },
  {
    "path": "__init__.py",
    "content": ""
  },
  {
    "path": "backend/Dockerfile",
    "content": "# For more information, please refer to https://aka.ms/vscode-docker-python\r\nFROM ubuntu:22.04\r\n\r\n# Keeps Python from generating .pyc files in the container\r\nENV PYTHONDONTWRITEBYTECODE=1\r\n\r\n# Turns off buffering for easier container logging\r\nENV PYTHONUNBUFFERED=1\r\n\r\nRUN apt-get update && apt-get install -y \\\r\n    gcc \\ \r\n    python3 \\\r\n    python3-pip\r\n# python3-dfvfs \r\n\r\n\r\n# Install pip requirements\r\nCOPY backend/requirements.txt .\r\n\r\nRUN pip3 install dfvfs==20220816\r\n\r\n# CRIME TIME\r\n# Copy in our patched pyvmdk with delta disk support, putting it in the same directory as `vmdk_file_io.py` \r\nCOPY backend/pyvmdk_delta.py /usr/local/lib/python3.10/dist-packages/dfvfs/file_io\r\nCOPY backend/vmdk_file_io.py /usr/local/lib/python3.10/dist-packages/dfvfs/file_io\r\n\r\n\r\n# Put config.py in the same relative location as outside the containers so it can be imported.\r\nCOPY config.py /\r\n\r\nWORKDIR /backend\r\n\r\n\r\n# Creates a non-root user with an explicit UID and adds permission to access the /app folder\r\n# For more info, please refer to https://aka.ms/vscode-docker-python-configure-containers\r\n# RUN adduser -u 5678 --disabled-password --gecos \"\" appuser && chown -R appuser /app\r\n# USER appuser\r\n\r\n# During debugging, this entry point will be overridden. For more information, please refer to https://aka.ms/vscode-docker-python-debug\r\nCMD [\"python3\", \"vmdiff.py\"]\r\n"
  },
  {
    "path": "backend/__init__.py",
    "content": ""
  },
  {
    "path": "backend/diff_tree.py",
    "content": "import utils\n\n\nclass DiffTree(object):\n\n    def __init__(self, differ):\n        self.nodes = {}\n        self.children_map = {}\n\n        # Parents of leaf nodes only\n        self.leaf_parents = {}\n\n        # Create the nested array structure that will be the tree.\n        self.tree = []\n        self.root = None\n\n        self.node_parent_ids = {}\n\n        self.differ = differ\n\n        self.create_file_tree()\n\n    def merge(self, other):\n        \"\"\"Combine this diff tree with another (only so it can be cached/uncached)\"\"\"\n        self.nodes.update(other.nodes)\n        self.children_map.update(other.children_map)\n        self.tree.extend(other.tree)\n        return self\n\n    def get_tree(self):\n        return self.tree\n\n    def get_children_map(self):\n        return self.children_map\n\n    def get_children(self, parent_node):\n        key = parent_node[\"key\"]\n        if key not in self.children_map:\n            self.children_map[key] = []\n        return self.children_map[key]\n\n    def add_child(self, parent_node, child):\n        key = parent_node[\"key\"]\n        if key not in self.children_map:\n            self.children_map[key] = []\n        self.children_map[key].append(child)\n\n    def create_node(self, path: str, is_dir=True, is_leaf=False):\n        \"\"\"Create a node, allowing for children to be added later.\"\"\"\n\n        if path in self.nodes:\n            return\n\n        if self.differ.diff_type == \"disk\":\n            p = utils.ensure_posix(path)\n            parent_id = None if p.parent == p else str(p.parent)\n\n            if parent_id is None:\n                text = \"/\"\n            else:\n                text = p.name\n            key = str(p)\n        elif self.differ.diff_type == \"process\":\n            node_id = path\n            text = node_id\n            pid = node_id.split(\"-\")[-1]\n            key = pid\n\n        # Defaults (for created parent nodes, mostly)\n        status = \"unchanged\"\n        lines_added = 0\n        lines_removed = 0\n\n        diff = self.differ.diff(path)\n\n        if diff is not None:\n            status = diff.status\n            lines_added = diff.lines_added\n            lines_removed = diff.lines_removed\n\n            if diff.title:\n                text = diff.title\n\n            # This gets fixed later.\n            is_leaf = not diff.is_dir\n            is_dir = diff.is_dir\n\n            ppid = diff.ppid\n            if ppid is not None:\n                # Save which node is this node's parent, if any.\n                self.node_parent_ids[pid] = ppid\n\n        node = {\n            \"title\": text,\n            \"key\": key,\n            \"isLeaf\": is_leaf,\n            \"isDirectory\": is_dir,\n            \"children\": [],\n            \"status\": status,\n            \"linesAdded\": lines_added,\n            \"linesRemoved\": lines_removed,\n            \"numChildren\": 0,\n            \"numDirectChildren\": 0,\n        }\n        self.nodes[key] = node\n        return node\n\n    def create_root_process_node(self):\n        key = \"Processes\"\n        node = {\n            \"title\": key,\n            \"key\": key,\n            \"isLeaf\": False,\n            \"isDirectory\": False,\n            \"children\": [],\n            \"status\": \"modified\",\n            \"linesAdded\": 0,\n            \"linesRemoved\": 0,\n            \"numChildren\": 0,\n            \"numDirectChildren\": 0,\n        }\n        self.nodes[key] = node\n        self.node_parent_ids[key] = key\n        self.root = node\n        return node\n\n    def get_parent(self, node):\n        if self.differ.diff_type == \"disk\":\n            p = utils.ensure_posix(node[\"key\"])\n            parent_path = str(p.parent)\n            parent_node = self.nodes.get(parent_path)\n            return parent_node\n\n        elif self.differ.diff_type == \"process\":\n            parent_id = self.node_parent_ids.get(node[\"key\"])\n            parent = self.nodes.get(parent_id)\n            if parent is None:\n                return self.root\n            return parent\n\n    def create_file_tree(self):\n\n        # If we're calling this function a second time, we don't need to do anything, the tree is already generated.\n        if len(self.tree) > 0:\n            return\n\n        def create_parent_nodes(path: str):\n\n            p = utils.ensure_posix(path)\n            parent_paths = p.parents\n            for parent_path in parent_paths:\n                parent_path = str(parent_path)\n                if parent_path not in self.nodes:\n                    self.nodes[parent_path] = self.create_node(\n                        parent_path, is_dir=True)\n\n        paths = self.differ.diffs.keys()\n\n        # Create flat node index.\n        for path in paths:\n            if self.differ.diff_type == \"disk\":\n                create_parent_nodes(path)\n\n            self.create_node(path)\n\n        if self.differ.diff_type == \"process\" and len(self.differ.diffs) > 0:\n            self.create_root_process_node()\n\n        # Link up the nodes to their parents\n        for path, node in sorted(self.nodes.items()):\n\n            parent_node = self.get_parent(node)\n\n            # If this node is the root, just add it.\n            if parent_node == node:\n                self.root = node\n                self.tree.append(node)\n                continue\n\n            # Otherwise, insert this node underneath the parent node.\n            # Sorting paths guarantees that parents are inserted first, then children.\n            if parent_node:\n                # Link this node to its parent\n                self.add_child(parent_node, node)\n\n        if len(self.tree) > 0:\n            root = self.tree[0]\n        else:\n            root = []\n\n        if len(self.tree) == 0:\n            return []\n        # Fix the tree\n        for node in reversed(list(self.traverse(root))):\n            # Directories without children should be leaves.\n            children = self.get_children(node)\n            if len(children) == 0:\n                node[\"isLeaf\"] = True\n            else:\n                node[\"isLeaf\"] = False\n                # Count the number of file descendants of each node.\n                for child in children:\n                    num_child_children = 0\n                    # Don't count directories as children.\n                    if not child[\"isDirectory\"]:\n                        num_child_children += 1\n                        node[\"numDirectChildren\"] += 1\n\n                    num_child_children += child[\"numChildren\"]\n\n                    node[\"numChildren\"] += num_child_children\n\n        return\n\n    def traverse(self, node):\n        yield node\n        for child in self.get_children(node):\n            yield from self.traverse(child)\n"
  },
  {
    "path": "backend/diffcache.py",
    "content": "import pathlib\nimport os\nimport logging\nimport json\n\nimport unified_diff\nimport utils\n\nDIR_META_FILENAME = \".__this_directory__\"\n\n\nclass DiffCache(object):\n\n    def __init__(self, run_disk_path, run_tree_path, run_process_path=None):\n        self.run_path = pathlib.Path(run_disk_path)\n        self.tree_path = pathlib.Path(run_tree_path)\n        self.run_process_path = pathlib.Path(str(run_process_path))\n        if run_process_path:\n            os.makedirs(self.run_process_path, exist_ok=True)\n\n    def cache_results(self, results):\n        \"\"\"Create output directory, and write the same filesystem into it as in the results\"\"\"\n\n        os.makedirs(self.run_path, exist_ok=True)\n        # Sort by path, so we only create parent directories after children.\n        for path, diff in sorted(results.items(), key=lambda tup: tup[0]):\n\n            path = utils.ensure_posix(path)\n\n            if diff.is_dir:\n                path = path / pathlib.Path(DIR_META_FILENAME)\n\n            root, *relative_disk_path = path.parts\n\n            relative_disk_path = pathlib.Path(\n                relative_disk_path[0]).joinpath(*relative_disk_path[1:])\n\n            result_path = self.run_path / pathlib.Path(relative_disk_path)\n\n            try:\n                # Create the parent directories\n                result_path.parent.mkdir(parents=True, exist_ok=True)\n            except FileExistsError:\n                # This means a path has changed from a directory to a file.\n                # Whatever, tho\n                # Limitation: Let's keep it as a directory\n                result_path.parent.rename(\n                    result_path.parent.with_suffix(\".__renamed__\"))\n\n                result_path.parent.mkdir(parents=True, exist_ok=True)\n\n                logging.warning(\n                    f\"Ignoring file exists error when creating parents for {str(result_path)}, overwriting parent file with directory.\")\n\n            if result_path.is_dir():\n                result_path = result_path.with_suffix(\".__directory_as_file__\")\n                logging.warning(\n                    f\"Path has changed from directory to file (or vice versa), writing as {str(result_path)}\")\n\n            # Write the diff file.\n            with open(result_path, \"w\") as f:\n                f.writelines(diff.diff_lines)\n\n    def ensure_posix(self, path):\n        if path.startswith(\"\\\\\"):\n            # Force POSIX path so that we can create the directory structure in the Docker container, even if the path is Windows.\n            path = pathlib.PureWindowsPath(path).as_posix()\n        path = pathlib.Path(path)\n        return path\n\n    def cache_process_results(self, results):\n        for pid, diff in results.items():\n\n            filename = pid\n\n            result_path = self.run_process_path / filename\n\n            # Write the diff file.\n            with open(result_path, \"w\") as f:\n                f.writelines(diff.diff_lines)\n\n    def get_process_diff_from_cache(self, pid):\n        filename = pid\n        result_path = self.run_process_path / filename\n        try:\n            with open(result_path, \"r\") as f:\n                lines = f.readlines()\n                diff = unified_diff.UnifiedDiff(lines)\n                return diff\n        except FileNotFoundError:\n            print(f\"Process diff cache not found: {result_path}\")\n            return None\n\n    def get_diff_from_cache(self, vm_path):\n\n        if not self.run_path.exists:\n            return None\n\n        vm_path = utils.ensure_posix(vm_path)\n\n        # Slice off the root (and drive on Windows) from the vm path, so it's not an absolute path\n        cache_path = self.run_path.joinpath(*vm_path.parts[1:])\n        is_dir = False\n        # If this was a directory on the VM, the diff is stored in a file called DIR_META_FILENAME\n        if cache_path.joinpath(DIR_META_FILENAME).exists():\n            is_dir = True\n            cache_path = cache_path.joinpath(DIR_META_FILENAME)\n\n        if not cache_path.is_file():\n            return None\n\n        with open(cache_path) as f:\n            lines = f.readlines()\n            diff = unified_diff.UnifiedDiff(lines, is_dir)\n            return diff\n\n    def get_diff(self, key):\n        # If the key is a process ID (numeric)\n        if key.isdigit():\n            return self.get_process_diff_from_cache(key)\n        else:\n            return self.get_diff_from_cache(key)\n\n    def cache_exists(self):\n        return self.run_path.exists() and self.tree_cache_exists()\n\n    def process_cache_exists(self):\n        return self.run_process_path is not None and self.run_process_path.exists()\n\n    def get_cached_results(self):\n\n        if not self.cache_exists():\n            raise RuntimeError(f\"Cache path {self.run_path} does not exist!\")\n\n        results = {}\n\n        logging.info(f\"Loading from diff cache {self.run_dir}\")\n        for path, subdirs, files in os.walk(self.run_path):\n            for filename in files:\n                is_dir = False\n                if filename == DIR_META_FILENAME:\n                    is_dir = True\n                filepath = os.path.join(path, filename)\n                with open(filepath) as f:\n                    lines = f.readlines()\n                    diff = unified_diff.UnifiedDiff(lines, is_dir)\n                    relative_path = pathlib.Path(\n                        filepath).relative_to(self.run_path)\n                    if is_dir:\n                        # Remove dir suffix if this is a dir\n                        relative_path = relative_path.parent\n\n                    original_path = os.path.join(\"/\", relative_path)\n\n                    results[original_path] = diff\n\n        return results\n\n    def tree_cache_exists(self):\n        return (self.tree_path / \"tree.json\").exists()\n\n    def cache_tree(self, tree):\n        os.makedirs(self.tree_path, exist_ok=True)\n        with open(self.tree_path / \"tree.json\", \"w\") as f:\n            json.dump(tree.get_tree(), f)\n        with open(self.tree_path / \"children.json\", \"w\") as f:\n            json.dump(tree.get_children_map(), f)\n\n    def get_tree_data_from_cache(self):\n        with open(self.tree_path / \"tree.json\", \"r\") as f:\n            tree = json.load(f)\n        with open(self.tree_path / \"children.json\", \"r\") as f:\n            children_map = json.load(f)\n\n        return tree, children_map\n"
  },
  {
    "path": "backend/diskdiff.py",
    "content": "\nimport difflib\nimport hashlib\n\nimport logging\nimport stat as statlib\n\nimport sys\nimport os\nimport inspect\n\nimport unified_diff\n\n\n# Hacks to import the config from the parent directory.\ncurrentdir = os.path.dirname(os.path.abspath(\n    inspect.getfile(inspect.currentframe())))\nparentdir = os.path.dirname(currentdir)\nsys.path.insert(0, parentdir)\n\nimport config  # noqa\n\n\nclass DiskDiffer(object):\n    # Class constant that defines the default read buffer size.\n    _READ_BUFFER_SIZE = 16 * 1024 * 1024\n\n    MAX_SIZE = 1024 * 1024 * 2  # 2MB\n\n    _STAT_ATTRIBUTES = set([\n        \"type\",\n        \"owner_identifier\",\n        \"group_identifier\",\n        \"mode\",\n    ])\n\n    _TIME_ATTRIBUTES = set([\n        \"access_time\",\n        \"added_time\",\n        \"change_time\",\n        \"creation_time\",\n        \"modification_time\",\n    ])\n\n    _ATTRIBUTE_ATTRIBUTES = set([\n        \"name\",\n    ])\n    diff_type = \"disk\"\n\n    def __init__(self, a_file_lister, b_file_lister,\n                 use_stat=True,\n                 use_times=True,\n                 use_attributes=True,\n                 use_contents=True,\n                 ignore_binary=True,\n                 ignore_directories=False,\n                 ignore_contents_unchanged=False,\n                 show_times=False,\n                 only_changed_files=False,\n                 **kwargs):\n        \"\"\"\n        a: { path: str -> file_entry FileEntry }\n        b: { path: str -> file_entry FileEntry }\n        \"\"\"\n        # Save options for creating unique caches later.\n        self.init_options = locals()\n\n        self.a_file_lister = a_file_lister\n        self.b_file_lister = b_file_lister\n\n        self.a_file_map = {}\n        self.b_file_map = {}\n\n        self.use_stat = use_stat\n        self.use_times = use_times\n        self.use_attributes = use_attributes\n        self.use_contents = use_contents\n        self.ignore_binary = ignore_binary\n        self.ignore_directories = ignore_directories\n        self.ignore_contents_unchanged = ignore_contents_unchanged\n        self.show_times = show_times\n        self.only_changed_files = only_changed_files\n\n        self.changed_file_paths = set()\n\n        self.diffs = {}\n\n    def get_a_file(self, path):\n\n        file_lister_cache_hit = self.a_file_lister.file_entries.get(path)\n\n        if file_lister_cache_hit:\n            return file_lister_cache_hit\n\n        if path in self.a_file_map:\n            return self.a_file_map[path]\n\n        file_entry = self.a_file_lister.GetFileEntry(path)\n\n        self.a_file_map[path] = file_entry\n\n        return file_entry\n\n    def get_b_file(self, path):\n\n        file_lister_cache_hit = self.b_file_lister.file_entries.get(path)\n\n        if file_lister_cache_hit:\n            return file_lister_cache_hit\n\n        if path in self.b_file_map:\n            return self.b_file_map[path]\n\n        file_entry = self.b_file_lister.GetFileEntry(path)\n\n        self.b_file_map[path] = file_entry\n\n        return file_entry\n\n    def get_file(self, path):\n        \"\"\"Just get the file, don't care whether it's from before or after\"\"\"\n        b_file = self.get_b_file(path)\n        if b_file:\n            return b_file\n        return self.get_a_file(path)\n\n    def diff_all(self):\n        # Step 1, find files which are different\n        changed_file_paths = self.get_changed_files()\n        results = {}\n\n        for path in changed_file_paths:\n            if self._should_ignore(path):\n                continue\n\n            result = self.diff(path)\n\n            if result is None:\n                logging.debug(f\"Ignoring diffing (no diff): {path}\")\n                continue\n\n            virtual_path = path\n\n            results[virtual_path] = result\n\n        return results\n\n    def diff(self, path):\n        \"\"\"\n            Returns:\n                (virtual_path: str, merged_diff: list) | None\n        \"\"\"\n        if path in self.diffs:\n            return self.diffs[path]\n\n        if self._should_ignore(path):\n            return None\n\n        # Step 2, diff those files\n        # (Get diffable attributes, then return diff for each one)\n        a_file = self.get_a_file(path)\n        b_file = self.get_b_file(path)\n\n        stat_diff = times_diff = attribute_diff = contents_diff = []\n        diff_kwargs = self._make_diff_kwargs(path)\n\n        if self.use_stat:\n            stat_diff = list(difflib.unified_diff(\n                self.get_stat_sequence(\n                    a_file), self.get_stat_sequence(b_file),\n                **diff_kwargs\n            ))\n\n        if self.show_times:\n            times_diff = list(difflib.unified_diff(\n                self.get_times_sequence(\n                    a_file), self.get_times_sequence(b_file),\n                **diff_kwargs\n            ))\n\n        if self.use_attributes:\n            attribute_diff = list(difflib.unified_diff(\n                self.get_attribute_sequence(\n                    a_file), self.get_attribute_sequence(b_file),\n                **diff_kwargs\n            ))\n\n        has_contents = a_file is not None and a_file.IsFile(\n        ) or b_file is not None and b_file.IsFile()\n\n        # We're not ignoring binary if we're here, so treat the files as if they might be binary.\n        if self.use_contents and has_contents:\n            # Don't try and diff files larger than MAX_SIZE\n            if (a_file and a_file.size > self.MAX_SIZE) or (b_file and b_file.size > self.MAX_SIZE):\n                logging.info(f\"Generating generic diff: (too big): {path}\")\n                size = b_file.size if b_file else a_file.size\n                contents_diff = [\n                    f\"--- {path}\\n\",\n                    f\"+++ {path}\\n\",\n                    \"@@ 0,0 +0,0 @@\\n\",\n                    # Note the extra space for Unified Diff format.\n                    f\" File too large to diff ({size}B)\\n\"\n                ]\n\n            # If the file is binary, diff it as binary.\n            elif not self.ignore_binary:\n                files = [a_file, b_file]\n                existing_files = [f for f in files if f is not None]\n                binary_files = [\n                    self._is_binary(f) for f in existing_files\n                ]\n\n                # If the files are both binary (or one is None and the other is binary), diff them as binary\n                if all(binary_files):\n                    if self._compare_binaries(a_file, b_file):\n                        contents_diff = [\n                            f\"--- {path}\\n\",\n                            f\"+++ {path}\\n\",\n                            \"@@ 0,0 +0,0 @@\\n\",\n                            \" Binary files differ\\n\"\n                        ]\n            else:\n                # If at least one file is not binary, do a real diff.\n                # If only one is binary, just consider it the string \"Binary File\"\n                a_contents_sequence = self.get_contents_sequence(\n                    a_file)\n                b_contents_sequence = self.get_contents_sequence(\n                    b_file)\n\n                # If both are nonbinary (😎😎😎) diff them as text\n                contents_diff = list(difflib.unified_diff(\n                    a_contents_sequence,\n                    b_contents_sequence,\n                    **self._make_diff_kwargs(path)))\n\n        if not any((stat_diff, times_diff, attribute_diff, contents_diff)):\n            logging.debug(f\"Ignoring (no diff): {path}\")\n            return None\n\n        # If it's a file, and the contents are unchanged, ignore it.\n        # (Don't ignore directories though, because they don't have contents.)\n        if not self.get_file(path).IsDirectory() and not contents_diff and self.ignore_contents_unchanged:\n            return None\n\n        merged_diff = self.merge_diffs(\n            stat_diff, times_diff, attribute_diff, contents_diff)\n\n        # Add headers to conform with git diff format and look pretty for diff2html\n        init_header = f\"diff --git {path} {path}\"\n\n        added_removed_header = \"\"\n        if a_file is None:\n            mode = b_file.GetStatAttribute().mode\n            if mode is not None:\n                mode = format(mode, \"o\")\n            else:\n                mode = \"<unknown>\"\n            added_removed_header = f\"new file mode {mode}\"\n\n        if b_file is None:\n            mode = a_file.GetStatAttribute().mode\n            if mode is not None:\n                mode = format(mode, \"o\")\n            else:\n                mode = \"<unknown>\"\n            added_removed_header = f\"deleted file mode {mode}\"\n\n        self.add_header(merged_diff, added_removed_header)\n        self.add_header(merged_diff, init_header)\n\n        diff = unified_diff.UnifiedDiff(\n            merged_diff, is_dir=self.get_file(path).IsDirectory())\n\n        self.diffs[path] = diff\n        return diff\n\n    def _should_ignore(self, path):\n\n        if not path:\n            return True\n\n        a_file = self.get_a_file(path)\n        b_file = self.get_b_file(path)\n\n        if self.ignore_directories and (a_file and a_file.IsDirectory() or b_file and b_file.IsDirectory()):\n            logging.info(f\"Ignoring (directory): {path}\")\n            return True\n\n        a_is_binary = self._is_binary(a_file)\n        b_is_binary = self._is_binary(b_file)\n\n        # Ignore this file if it is or was binary\n        if self.ignore_binary and (a_is_binary or b_is_binary):\n            logging.info(f\"Ignoring (binary): {path}\")\n            return True\n\n        return False\n\n    def _make_diff_kwargs(self, path, pseudo_file_type=None):\n        kwargs = {\n            \"n\": 0\n        }\n\n        from_path = path\n        to_path = path\n\n        # Add pseudo file types (e.g. \"stat\", \"attributes\")\n        if pseudo_file_type:\n            from_path = f\"{from_path}.{pseudo_file_type}\"\n            to_path = f\"{to_path}.{pseudo_file_type}\"\n\n        kwargs[\"fromfile\"] = from_path\n        kwargs[\"tofile\"] = to_path\n\n        return kwargs\n\n    def add_header(self, delta, header):\n        \"\"\"Add an arbitrary header to a delta (sequence of diff lines)\"\"\"\n\n        if not delta or not header:\n            return\n\n        header_line = f\"{header}\\n\"\n\n        delta.insert(0, header_line)\n\n    def merge_diffs(self, stat_diff, times_diff, attribute_diff, contents_diff):\n        \"\"\"Merge all the diffs into one, adding the metadata diffs as their own special hunks\"\"\"\n        stat_hunk = self.create_hunk_diff(stat_diff, \"stat attributes\")\n        times_hunk = self.create_hunk_diff(times_diff, \"file times\")\n        attribute_hunk = self.create_hunk_diff(\n            attribute_diff, \"extended file attributes\")\n\n        for diff in (stat_diff, times_diff, attribute_diff, contents_diff):\n            if diff:\n                # --- a/file\n                # +++ b/file\n                headers = diff[:2]\n\n        if contents_diff:\n            contents_diff_hunks = contents_diff[2:]\n        else:\n            contents_diff_hunks = []\n\n        # Insert the metadata hunks into the content diff, before everything else (even the first hunk in the content diff)\n        merged_diff = []\n        merged_diff.extend(headers)\n        merged_diff.extend(stat_hunk)\n        merged_diff.extend(times_hunk)\n        merged_diff.extend(attribute_hunk)\n        merged_diff.extend(contents_diff_hunks)\n\n        return merged_diff\n\n    def create_hunk_diff(self, diff, name):\n        if not diff:\n            return []\n        headers, content = self.split_diff(diff)\n\n        # --- a/file\n        # +++ b/file\n        # @@ hunk header @@\n        hunk_header = headers[-1].rstrip(\"\\n\")\n        hunk_header = [f\"{hunk_header} {name}\\n\"]\n        hunk_diff = hunk_header + content\n        return hunk_diff\n\n    def split_diff(self, diff):\n        \"\"\"Return (headers: list, content: list)\"\"\"\n\n        return diff[:3], diff[3:]\n\n    def equal(self, file1, file2):\n        \"\"\"Compares two file_entry objects\"\"\"\n\n        if file1.size != file2.size:\n            return False\n\n        # Compare stat\n        if self.use_stat and not self._equal_stat(file1, file2):\n            return False\n\n        # Compare times\n        if self.use_times and not self._equal_times(file1, file2):\n            return False\n\n        # Compare attributes\n        if self.use_attributes and not self._equal_attributes(file1, file2):\n            return False\n\n        # TODO: Optionally diff hashes\n\n        return True\n\n    def _is_binary(self, file):\n\n        if file is None:\n            return False\n        textchars = bytearray({7, 8, 9, 10, 12, 13, 27}\n                              | set(range(0x20, 0x100)) - {0x7f})  # noqa\n\n        file_obj = file.GetFileObject()\n        if file_obj is None:\n            return False\n        try:\n            header = file_obj.read(512)\n            file_obj.seek(0)\n\n            try:\n                header.decode(\"utf8\", errors=\"strict\")\n            except UnicodeDecodeError:\n                return True\n\n            return bool(header.translate(None, textchars))\n\n        except OSError:\n            logging.warning(f\"Failed to read {file.path_spec.location}\")\n            return True\n\n    def _compare_binaries(self, file1, file2):\n\n        return self._hash_file(file1) == self._hash_file(file2)\n\n    def _hash_file(self, file_entry):\n        \"\"\"Calculates a message digest hash of the data of the file entry.\n\n        Args:\n        file_entry (dfvfs.FileEntry): file entry.\n\n        Returns:\n        str: digest hash or None.\n        \"\"\"\n        if file_entry is None:\n            return None\n\n        if file_entry.IsDevice() or file_entry.IsPipe() or file_entry.IsSocket():\n            # Ignore devices, FIFOs/pipes and sockets.\n            return None\n\n        hash_context = hashlib.sha256()\n\n        try:\n            file_object = file_entry.GetFileObject()\n        except IOError as exception:\n            logging.warning((\n                'Unable to open path specification:\\n{0:s}'\n                'with error: {1!s}').format(file_entry.path_spec.location, exception))\n            return None\n\n        if not file_object:\n            return None\n\n        try:\n            data = file_object.read(self._READ_BUFFER_SIZE)\n            while data:\n                hash_context.update(data)\n                data = file_object.read(self._READ_BUFFER_SIZE)\n        except IOError as exception:\n            logging.warning((\n                'Unable to read from path specification:\\n{0:s}'\n                'with error: {1!s}').format(file_entry.path_spec.location, exception))\n            return None\n\n        return hash_context.hexdigest()\n\n    def get_stat_sequence(self, file):\n        if file is None:\n            return []\n\n        stat = file.GetStatAttribute()\n        out = []\n        for attr in self._STAT_ATTRIBUTES:\n            value = getattr(stat, attr)\n            if value and attr == \"mode\":\n                value = statlib.filemode(value)\n\n            line = f\"{attr}: {value}\\n\"\n            out.append(line)\n        return out\n\n    def get_times_sequence(self, file):\n        if file is None:\n            return []\n        out = []\n        for attr in self._TIME_ATTRIBUTES:\n            line = f\"{attr}: {getattr(file, attr).CopyToDateTimeStringISO8601()}\\n\"\n            out.append(line)\n        return out\n\n    def get_attribute_sequence(self, file):\n\n        def _get_attribute_value(attribute):\n\n            # macOS dfvfs\n            if hasattr(attribute, \"read\"):\n                attribute_value = attribute.read().decode(errors=\"ignore\")\n                return attribute_value\n\n            # Windows dfvfs\n            elif hasattr(attribute, \"name\"):\n                attribute_value = attribute.name\n                return attribute_value\n\n            return None\n\n        if file is None:\n            return []\n        out = []\n\n        for attribute in file.attributes:\n            attribute_value = _get_attribute_value(attribute)\n            if attribute_value:\n                line = f\"{attribute.name}: {attribute_value}\\n\"\n                out.append(line)\n        return out\n\n    def get_contents_sequence(self, file):\n        if file is None:\n            return []\n\n        if not self.ignore_binary and self._is_binary(file):\n            return [\"<Binary file>\\n\"]\n\n        file_obj = file.GetFileObject()\n\n        if file_obj is None:\n            return []\n\n        contents = file_obj.read().decode(\"utf8\", \"ignore\")\n\n        lines = []\n        # Make sure all lines end with newlines, to conform with diff format.\n\n        for line in contents.split(\"\\n\"):\n            lines.append(line + \"\\n\")\n\n        return lines\n\n    def _equal_stat(self, file1, file2):\n\n        stat1 = file1.GetStatAttribute()\n        stat2 = file2.GetStatAttribute()\n        for attr in self._STAT_ATTRIBUTES:\n            if getattr(stat1, attr) != getattr(stat2, attr):\n                return False\n\n        return True\n\n    def _equal_times(self, file1, file2):\n\n        for attr in self._TIME_ATTRIBUTES:\n            if getattr(file1, attr) != getattr(file2, attr):\n                return False\n\n        return True\n\n    def _equal_attributes(self, file1, file2):\n\n        if file1.number_of_attributes != file2.number_of_attributes:\n            return False\n\n        for attr1, attr2 in zip(file1.attributes, file2.attributes):\n\n            # Only check the attributes we care about when considering equality.\n            # (We have literally invented prejudice today boys.)\n\n            for attr in self._ATTRIBUTE_ATTRIBUTES:\n                if hasattr(attr1, attr) and hasattr(attr2, attr):\n                    if getattr(attr1, attr) != getattr(attr2, attr):\n                        return False\n\n        return True\n\n    def get_run_id(self):\n        return config.RUN_ID\n\n    def get_changed_files(self):\n\n        if self.changed_file_paths:\n            return self.changed_file_paths\n\n        # Otherwise, we need to list the files in A and B first\n        # This is the slowest part.\n        self.a_file_lister.ListFileEntries()\n        self.b_file_lister.ListFileEntries()\n\n        # If path doesn't exist, consider it different\n\n        changed_file_paths = set()\n\n        a_paths_set = set(self.a_file_lister.file_entries.keys())\n        b_paths_set = set(self.b_file_lister.file_entries.keys())\n        self.added_files = b_paths_set - a_paths_set\n        self.deleted_files = a_paths_set - b_paths_set\n\n        if not self.only_changed_files:\n            changed_file_paths = changed_file_paths | self.added_files | self.deleted_files\n\n        # Get all files in A but not B (and vice versa), and consider them different\n        remaining_paths = a_paths_set & b_paths_set\n\n        # These paths are guaranteed to be in both A and B\n        for path in remaining_paths:\n            a_file = self.get_a_file(path)\n            b_file = self.get_b_file(path)\n\n            if not self.equal(a_file, b_file):\n                changed_file_paths.add(path)\n\n        logging.info(f\"Files (from): {len(a_paths_set)}\")\n        logging.info(f\"Files (to): {len(b_paths_set)}\")\n        logging.info(f\"Files (both): {len(remaining_paths)}\")\n        logging.info(f\"Files added: {len(self.added_files)}\")\n        logging.info(f\"Files deleted: {len(self.deleted_files)}\")\n        logging.info(\n            f\"Files changed (including binary): {len(changed_file_paths)}\")\n        logging.debug(\"Changed files: \")\n        logging.debug(changed_file_paths)\n\n        self.changed_file_paths = changed_file_paths\n\n        return self.changed_file_paths\n"
  },
  {
    "path": "backend/file_entry_lister.py",
    "content": "import re\nimport logging\n\n\nfrom dfvfs.helpers import volume_scanner\nfrom dfvfs.lib import definitions as dfvfs_definitions\nfrom dfvfs.lib import errors\nfrom dfvfs.resolver import resolver\nfrom dfvfs.path import factory\n\n\nclass FileEntryLister(volume_scanner.VolumeScanner):\n    \"\"\"File entry lister.\"\"\"\n\n    _NON_PRINTABLE_CHARACTERS = list(range(0, 0x20)) + list(range(0x7f, 0xa0))\n    _ESCAPE_CHARACTERS = str.maketrans({\n        value: '\\\\x{0:02x}'.format(value)\n        for value in _NON_PRINTABLE_CHARACTERS})\n\n    def __init__(self, source, volume_scanner_options, mediator=None, ignore_dirs=None, allow_dirs=None):\n        \"\"\"Initializes a file entry lister.\n\n        Args:\n          mediator (VolumeScannerMediator): a volume scanner mediator.\n        \"\"\"\n        super(FileEntryLister, self).__init__(mediator=mediator)\n\n        if ignore_dirs is None:\n            ignore_dirs = set()\n        if allow_dirs is None:\n            allow_dirs = set([\"/\"])\n\n        self.allow_dirs = allow_dirs\n        self.ignore_dirs = ignore_dirs\n\n        self._list_only_files = False\n\n        self.base_path_specs = self.GetBasePathSpecs(\n            source, options=volume_scanner_options)\n\n        self.source = source\n\n        if not self.base_path_specs:\n            raise Exception(\n                f'{source}: No supported file system found in source.')\n\n        # TODO: Support multiple base path specs\n        self.base_path_spec = self.base_path_specs[0]\n        self.file_system = resolver.Resolver.OpenFileSystem(\n            self.base_path_spec)\n\n        self.file_entries = {}\n\n    def _GetDisplayPath(self, path_spec, path_segments, data_stream_name):\n        \"\"\"Retrieves a path to display.\n\n        Args:\n          path_spec (dfvfs.PathSpec): path specification of the file entry.\n          path_segments (list[str]): path segments of the full path of the file\n              entry.\n          data_stream_name (str): name of the data stream.\n\n        Returns:\n          str: path to display.\n        \"\"\"\n        display_path = ''\n\n        if path_spec.HasParent():\n            parent_path_spec = path_spec.parent\n            if parent_path_spec and parent_path_spec.type_indicator in (\n                    dfvfs_definitions.PARTITION_TABLE_TYPE_INDICATORS):\n                display_path = ''.join(\n                    [display_path, parent_path_spec.location])\n\n        path_segments = [\n            segment.translate(self._ESCAPE_CHARACTERS) for segment in path_segments]\n        display_path = ''.join([display_path, '/'.join(path_segments)])\n\n        if data_stream_name:\n            data_stream_name = data_stream_name.translate(\n                self._ESCAPE_CHARACTERS)\n            display_path = ':'.join([display_path, data_stream_name])\n\n        return display_path or '/'\n\n    def _ShouldListDir(self, file_entry):\n\n        location = file_entry.path_spec.location\n\n        for allow_dir in self.allow_dirs:\n            if location.startswith(allow_dir) or allow_dir.startswith(location):\n                for ignore_dir in self.ignore_dirs:\n                    # Convert to raw string so backslashes aren't interpreted as escapes.\n                    ignore_dir = repr(ignore_dir).strip(\"'\")\n                    if re.search(ignore_dir, location):\n                        return False\n                return True\n\n        return False\n\n    def _ListFileEntry(\n            self, file_entry):\n        \"\"\"Lists a file entry.\n\n        Args:\n          file_entry (dfvfs.FileEntry): file entry to list.\n        \"\"\"\n        def _dedup_backslashes(path):\n            return path.replace(\"\\\\\\\\\", \"\\\\\")\n\n        location = file_entry.path_spec.location\n        if location.startswith(\"\\\\\"):\n\n            location = _dedup_backslashes(location)\n\n        self.file_entries[location] = file_entry\n\n        try:\n            for sub_file_entry in file_entry.sub_file_entries:\n\n                if not self._ShouldListDir(sub_file_entry):\n                    continue\n\n                self._ListFileEntry(sub_file_entry)\n\n        except OSError as e:\n            if \"unable to read MFT entry:\" in str(e):\n                logging.error(\n                    f\"{self.source}: Unable to list subdirectories for {location}: MFT is corrupted. Try chkdsk first?\")\n            else:\n                logging.error(\n                    f\"{self.source}: Unable to list subdirectories for {location}\")\n                logging.debug(\n                    f\"{self.source}: {e}\")\n\n    def ListFileEntries(self):\n        \"\"\"Lists file entries in the base path specification.\"\"\"\n        for base_path_spec in self.base_path_specs:\n            self.file_system = resolver.Resolver.OpenFileSystem(base_path_spec)\n            file_entry = resolver.Resolver.OpenFileEntry(base_path_spec)\n\n            if file_entry is None:\n                logging.warning(\n                    'Unable to open base path specification:\\n{0:s}'.format(\n                        base_path_spec))\n                return\n\n            self._ListFileEntry(file_entry)\n\n    def GetFileEntry(self, path):\n\n        for base_path_spec in self.base_path_specs:\n            path_spec = factory.Factory.NewPathSpec(\n                base_path_spec.type_indicator, location=path, parent=self.base_path_spec.parent)\n            try:\n                file_entry = resolver.Resolver.OpenFileEntry(path_spec)\n                if file_entry:\n                    return file_entry\n            except errors.BackEndError:\n                logging.warning(\n                    f\"{base_path_spec.location}: Unable to open file: {path}\")\n\n        return None\n"
  },
  {
    "path": "backend/memdiff.py",
    "content": "\n\nimport collections\nimport difflib\nimport json\nimport re\nimport logging\n\nimport unified_diff\n\n\nclass MemoryDiffer(object):\n    # TODO: Inherit from a shared \"Differ\" class\n    diff_type = \"process\"\n\n    def __init__(self, from_pslist, to_pslist, from_envars=None, to_envars=None, from_cmdline=None, to_cmdline=None, ignore_regex=\"\"):\n\n        self.ignore_regex = ignore_regex\n        self.from_procs = self._list_by_id(from_pslist)\n        self.to_procs = self._list_by_id(to_pslist)\n\n        self.add_envars(from_envars, to_envars)\n        self.add_cmdline(from_cmdline, to_cmdline)\n\n        self.all_pids = set(self.from_procs.keys()) | set(self.to_procs.keys())\n        self.diffs = {}\n\n    def diff_all(self):\n        if self.diffs:\n            return self.diffs\n\n        for pid in self.all_pids:\n            diff = self.diff(pid)\n            if diff:\n                self.diffs[pid] = diff\n        return self.diffs\n\n    def diff(self, pid):\n\n        if pid in self.diffs:\n            return self.diffs[pid]\n\n        from_proc = self.from_procs.get(pid, \"\")\n        to_proc = self.to_procs.get(pid, \"\")\n\n        # Ignore \"Required memory <address> is not valid (process exited?)\" errors.\n        if to_proc and \"is not valid (process exited?)\" in to_proc[\"CommandLine\"]:\n            to_proc = \"\"\n        if from_proc and \"is not valid (process exited?)\" in from_proc[\"CommandLine\"]:\n            from_proc = \"\"\n\n        kwargs = {}\n        fromfile = self._make_title(from_proc)\n        tofile = self._make_title(to_proc)\n\n        from_name = fromfile.split(\"-\")[0] if fromfile else \"\"\n        to_name = tofile.split(\"-\")[0] if tofile else \"\"\n\n        if self.ignore_regex:\n            # Ignore this proceses if the to or from process name matches the supplied regex.\n            if (from_name and re.search(self.ignore_regex, from_name)):\n                logging.info(\n                    f\"Ignoring due to filter regex: {from_name}\")\n                from_proc = \"\"\n            if (to_name and re.search(self.ignore_regex, to_name)):\n                logging.info(\n                    f\"Ignoring due to filter regex: {to_name}\")\n                to_proc = \"\"\n\n        # Use the other filename if one of the filenames is empty (because this is an added or deleted file)\n\n        fromfile = fromfile or tofile\n        tofile = tofile or fromfile\n\n        kwargs[\"fromfile\"] = fromfile\n        kwargs[\"tofile\"] = tofile\n\n        # Number of lines of context to show (show the entire process)\n        kwargs[\"n\"] = 999\n\n        result = list(difflib.unified_diff(\n            self._to_string(from_proc),\n            self._to_string(to_proc),\n            **kwargs\n        ))\n\n        if not result:\n            return None\n        # Add headers to conform with git diff format and look pretty for diff2html\n        init_header = f\"diff --git {fromfile} {tofile}\"\n\n        is_added = not from_proc and to_proc\n        is_removed = not to_proc and from_proc\n\n        added_removed_header = \"\"\n        if is_added:\n            added_removed_header = \"new file\"\n\n        if is_removed:\n            added_removed_header = \"deleted file\"\n\n        self.add_header(result, added_removed_header)\n        self.add_header(result, init_header)\n\n        ppid = self.get_ppid(pid)\n        title = self._make_title(to_proc or from_proc)\n        diff = unified_diff.UnifiedDiff(result, ppid=ppid, title=title)\n        return diff\n\n    def add_envars(self, from_envars, to_envars):\n\n        if not from_envars and to_envars:\n            return\n\n        def _add_envars_to_procs(envars, procs):\n\n            # Group vars by PID\n            pid_vars = collections.defaultdict(dict)\n            for var in envars:\n                key = var[\"Variable\"]\n                value = var[\"Value\"]\n                pid = str(var[\"PID\"])\n                pid_vars[pid][key] = value\n\n            # Add vars dict to PID in procs\n            for pid in pid_vars:\n                procs[pid][\"EnvironmentVariables\"] = pid_vars[pid]\n\n        _add_envars_to_procs(from_envars, self.from_procs)\n        _add_envars_to_procs(to_envars, self.to_procs)\n\n    def add_cmdline(self, from_cmdline, to_cmdline):\n\n        if not from_cmdline and to_cmdline:\n            return\n\n        def _add_cmdline_to_procs(cmdlines, procs):\n\n            # Group vars by PID\n            for cmdline in cmdlines:\n                args = cmdline[\"Args\"]\n                pid = str(cmdline[\"PID\"])\n\n                # Add \"Args\" field to existing processes by PID\n                procs[pid][\"CommandLine\"] = args\n\n        _add_cmdline_to_procs(from_cmdline, self.from_procs)\n        _add_cmdline_to_procs(to_cmdline, self.to_procs)\n\n    def _make_id(self, proc):\n        if not proc:\n            return \"\"\n        pid = str(proc[\"PID\"])\n        return pid\n\n    def _make_title(self, proc):\n        if not proc:\n            return \"\"\n        pid = proc[\"PID\"]\n        name = proc[\"ImageFileName\"]\n        return f\"{name}-{pid}\"\n\n    def _list_by_id(self, pslist):\n        procs = {}\n        for proc in pslist:\n            process_id = self._make_id(proc)\n            # Ignore \"Threads\" value, since it changes a lot and isn't worth diffing on.\n\n            del proc[\"Threads\"]\n\n            procs[process_id] = proc\n        return procs\n\n    def _to_string(self, proc):\n        if not proc:\n            return \"\"\n        return [line + \"\\n\" for line in json.dumps(proc,\n                                                   separators=(',', ': '),\n                                                   sort_keys=True,\n                                                   indent=4).split(\"\\n\")]\n\n    def get_ppid(self, pid):\n        from_proc = self.from_procs.get(pid)\n        to_proc = self.to_procs.get(pid)\n\n        # Select whichever one isn't none, defaulting to to_proc.\n        proc = to_proc or from_proc\n\n        ppid = str(proc.get(\"PPID\", \"\"))\n        return ppid\n\n    def add_header(self, delta, header):\n        \"\"\"Add an arbitrary header to a delta (sequence of diff lines)\"\"\"\n\n        if not delta or not header:\n            return\n\n        header_line = f\"{header}\\n\"\n\n        delta.insert(0, header_line)\n"
  },
  {
    "path": "backend/pyvmdk_delta.py",
    "content": "\nimport pyvmdk\nimport os\n\n\nclass handle(object):\n\n    \"\"\"Trick dfvfs into keeping the parent handles in scope by storing them in this object, which is going to masquerade as a pyvmdk.handle\"\"\"\n\n    # The list of parent handles. Even though we never read from this list, storing parent handles in it keeps them in scope, preventing them from being deallocated.\n    parent_handles = []\n\n    def __init__(self):\n        self.parent = None\n        self._handle = pyvmdk.handle()\n\n    def open(self, path):\n        \"\"\"Open a handle to a VMDK path\n            AND open any parent delta files\n            AND open extent data files for all VMDK files\"\"\"\n\n        self._handle.open(path)\n        self._handle.open_extent_data_files()\n\n        parent_filename = self._handle.get_parent_filename()\n\n        # If this disk is a delta disk, set its parent.\n        if parent_filename:\n\n            # Delta disks contain the filename to their parent disk, not the full path,\n            # so we expect the parent disk to be in the same directory.\n            parent_path = os.path.join(os.path.dirname(path), parent_filename)\n\n            parent_handle = handle()\n            # The parent disk may itself be a child of another disk, so recurse.\n            parent_handle.open(parent_path)\n\n            self.parent_handles.append(parent_handle)\n\n            self._handle.set_parent(parent_handle._handle)\n\n    def __getattribute__(self, name):\n\n        # Hard code the list of attributes, because try/except is slow.\n        if name in (\"__getattribute__\", \"_handle\", \"open\", \"parent\", \"__init__\", \"parent_handles\"):\n            return object.__getattribute__(self, name)\n        else:\n            return getattr(self._handle, name)\n"
  },
  {
    "path": "backend/requirements.txt",
    "content": "libvmdk-python\ndfvfs==20220816"
  },
  {
    "path": "backend/unified_diff.py",
    "content": "\nclass UnifiedDiff(object):\n\n    def __init__(self, diff_lines, is_dir=None, ppid=None, title=None):\n        self.diff_lines = diff_lines\n        self._iter = iter(diff_lines)\n        self.is_dir = is_dir\n\n        self.title = title\n\n        # Parent PID if this is a process node.\n        self.ppid = ppid\n\n        header = diff_lines[1]\n        if header.startswith(\"new\"):\n            self.status = \"added\"\n        elif header.startswith(\"deleted\"):\n            self.status = \"removed\"\n        else:\n            self.status = \"modified\"\n\n        self.lines_added = 0\n        self.lines_removed = 0\n        for line in diff_lines:\n            # Ignore --- and +++ lines\n            if line.startswith(\"+\") and not line.startswith(\"++\"):\n                self.lines_added += 1\n            if line.startswith(\"-\") and not line.startswith(\"--\"):\n                self.lines_removed += 1\n\n    def __next__(self):\n        return next(self._iter)\n"
  },
  {
    "path": "backend/utils.py",
    "content": "import pathlib \n\ndef ensure_posix(path):\n    if path.startswith(\"\\\\\"):\n        # Force POSIX path so that we can create the directory structure in the Docker container, even if the path is Windows.\n        path = pathlib.PureWindowsPath(path).as_posix()\n    path = pathlib.Path(path)\n    return path"
  },
  {
    "path": "backend/vmdiff.py",
    "content": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\"\"\"Script to list file entries.\"\"\"\n\nfrom dfvfs.helpers import command_line\nfrom dfvfs.helpers import volume_scanner\nimport memdiff\nimport diff_tree\nimport diffcache\nimport diskdiff\nimport file_entry_lister\nimport logging\nimport sys\nimport os\nimport json\nimport inspect\n\nimport hashlib\n\n\n# Hacks to import the config from the parent directory.\ncurrentdir = os.path.dirname(os.path.abspath(\n    inspect.getfile(inspect.currentframe())))\nparentdir = os.path.dirname(currentdir)\nsys.path.insert(0, parentdir)\n\nimport config  # noqa\n\nlogging.basicConfig(\n    format='[%(asctime)s]:%(levelname)s:%(message)s', level=config.LOG_LEVEL)\n\n\nclass CachingStdinInputReader(command_line.StdinInputReader):\n    \"\"\"Remembers the last input, so it can be reused.\"\"\"\n\n    def __init__(self, encoding='utf-8'):\n        \"\"\"\n        Args:\n            encoding (Optional[str]): input encoding.\n        \"\"\"\n        super(CachingStdinInputReader, self).__init__(encoding=encoding)\n        self.last_input = None\n\n    def Read(self):\n        self.last_input = super(CachingStdinInputReader, self).Read()\n        return self.last_input\n\n\ndef load_memory_results():\n    memory_run_name = f\"{config.FROM_MEMORY_IMAGE_FILENAME}__{config.TO_MEMORY_IMAGE_FILENAME}\"\n    memory_run_path = os.path.join(\n        config.RESULTS_DIR, \"memory\", memory_run_name)\n    results = {}\n    for plugin in config.MEMORY_PLUGINS:\n\n        from_plugin_path = os.path.join(memory_run_path, f\"from-{plugin}.json\")\n        to_plugin_path = os.path.join(memory_run_path, f\"to-{plugin}.json\")\n        with open(from_plugin_path) as f:\n            from_plugin = json.load(f)\n        with open(to_plugin_path) as f:\n            to_plugin = json.load(f)\n\n        results[plugin] = (from_plugin, to_plugin)\n\n    return results\n\n\ndef dump_api_data(cache):\n\n    run_path = cache.tree_path\n\n    dump_dir = run_path / \"json\"\n    children_dir = dump_dir / \"children\"\n    diff_dir = dump_dir / \"diff\"\n\n    if config.USE_CACHE and dump_dir.exists():\n        return\n\n    logging.info(f\"Generating API data for static site: {dump_dir}\")\n\n    os.makedirs(dump_dir, exist_ok=True)\n    os.makedirs(children_dir, exist_ok=True)\n    os.makedirs(diff_dir, exist_ok=True)\n\n    logging.info(f\"Dumping API data to {dump_dir}\")\n\n    tree, children_map = cache.get_tree_data_from_cache()\n\n    json.dump(tree, open(dump_dir / \"changed_files\", \"w\"))\n\n    # Dump all the data with url encoded keys, so we can serve it statically later\n    for key, children in children_map.items():\n\n        # Make sure to also encode the \"/\" character.\n        filename = hashlib.sha1(key.encode(\"utf8\")).hexdigest()\n\n        path = children_dir / filename\n        json.dump(children, open(path, \"w\"))\n\n        # Get the diff and dump it too.\n        diff = cache.get_diff(key)\n        if diff is None:\n            result = None\n        else:\n            result = diff.diff_lines\n\n        if result:\n            path = diff_dir / filename\n            json.dump(result, open(path, \"w\"))\n\n\ndef Main():\n\n    # Leave Blank or invalid for interactive prompt\n    partition = config.PARTITION\n    VOLUMES = \"all\"\n\n    logging.basicConfig(\n        level=logging.INFO, format='[%(levelname)s] %(message)s')\n\n    caching_input_reader = CachingStdinInputReader()\n    mediator = command_line.CLIVolumeScannerMediator(\n        input_reader=caching_input_reader)\n\n    volume_scanner_options = volume_scanner.VolumeScannerOptions()\n    volume_scanner_options.partitions = mediator.ParseVolumeIdentifiersString(\n        partition)\n\n    volume_scanner_options.volumes = mediator.ParseVolumeIdentifiersString(\n        VOLUMES)\n\n    # Init disk file listers.\n    parent_lister = file_entry_lister.FileEntryLister(\n        config.FROM_DISK_PATH, volume_scanner_options, mediator=mediator, ignore_dirs=config.ignore_dirs, allow_dirs=config.allow_dirs)\n\n    partition_input = partition\n    if not partition_input:\n        # Get the input the user gave the first time, if any.\n        partition_input = caching_input_reader.last_input\n\n    volume_scanner_options.partitions = list(mediator.ParseVolumeIdentifiersString(\n        partition_input))\n\n    delta_lister = file_entry_lister.FileEntryLister(\n        config.TO_DISK_PATH, volume_scanner_options, mediator=mediator, ignore_dirs=config.ignore_dirs, allow_dirs=config.allow_dirs)\n\n    # ls parition to make sure it's the right one:\n    if not partition:\n        entries = list(\n            parent_lister.file_system.GetRootFileEntry().sub_file_entries)\n        ls_root = [e.name for e in entries]\n        logging.info(f\"Partition {partition} root files: {ls_root}\")\n\n    diff_config = config.diff_config\n    differ = diskdiff.DiskDiffer(\n        parent_lister, delta_lister,\n        **diff_config\n    )\n\n    USE_CACHE = config.USE_CACHE\n\n    run_process_path = config.RUN_MEMORY_PATH if config.USE_MEMORY else None\n\n    cache = diffcache.DiffCache(\n        config.RUN_DISK_PATH, config.RUN_TREE_PATH, run_process_path)\n\n    if USE_CACHE and cache.cache_exists() and (not config.USE_MEMORY or (config.USE_MEMORY and cache.process_cache_exists())):\n        # Slice off the leading \"/\" and trailing \"/disk\"\n        results_dir = os.path.join(*cache.run_path.parts[1:-1])\n        logging.info(f\"Results already cached at: {str(results_dir)}\")\n        # The diffs can be accessed via cache.get_diff_from_cache(path)\n    else:\n        logging.info(\"No cache found, diffing... \")\n\n        if config.USE_DISK:\n            logging.info(\"Diffing disk... \")\n\n            # Get results and cache them.\n            differ.get_changed_files()\n            results = differ.diff_all()\n\n            if not results:\n                logging.info(\"No disk differences found.\")\n            cache.cache_results(results)\n\n            # Now render the tree\n            disk_tree = diff_tree.DiffTree(differ)\n\n        if config.USE_MEMORY:\n            logging.info(\"Diffing memory... \")\n\n            plugin_results = load_memory_results()\n            from_pslist, to_pslist = plugin_results.get(\n                \"windows.pslist.PsList\")\n            from_envars, to_envars = plugin_results.get(\n                \"windows.envars.Envars\")\n            from_cmdline, to_cmdline = plugin_results.get(\n                \"windows.cmdline.CmdLine\")\n\n            # Load pslists already provided by memory-processing.\n            mem_differ = memdiff.MemoryDiffer(from_pslist,\n                                              to_pslist,\n                                              from_envars=from_envars,\n                                              to_envars=to_envars,\n                                              from_cmdline=from_cmdline,\n                                              to_cmdline=to_cmdline,\n                                              ignore_regex=config.IGNORE_PROCESSES_REGEX)\n\n            memdiffs = mem_differ.diff_all()\n            if not memdiffs:\n                logging.info(\"No memory differences found.\")\n            cache.cache_process_results(memdiffs)\n\n            mem_tree = diff_tree.DiffTree(mem_differ)\n\n        if config.USE_DISK and config.USE_MEMORY:\n            merged_tree = disk_tree.merge(mem_tree)\n        elif config.USE_DISK:\n            merged_tree = disk_tree\n        elif config.USE_MEMORY:\n            merged_tree = mem_tree\n        else:\n            raise RuntimeError(\n                \"Must set either USE_DISK or USE_MEMORY, otherwise what am I supposed to diff, huh wise guy\")\n\n        logging.debug(merged_tree.children_map)\n\n        cache.cache_tree(merged_tree)\n        dump_api_data(cache)\n\n        logging.info(f\"Saved results to {cache.run_path}\")\n\n    return cache\n\n\nif __name__ == '__main__':\n    Main()\n"
  },
  {
    "path": "backend/vmdk_file_io.py",
    "content": "# -*- coding: utf-8 -*-\n\"\"\"The VMDK image file-like object.\"\"\"\n# Copy of `vmdk_file_io` we patch and copy into the docker container.\n\n# This is the patch.\nimport pyvmdk_delta as pyvmdk\n\nfrom dfvfs.file_io import file_object_io\nfrom dfvfs.lib import errors\nfrom dfvfs.path import factory as path_spec_factory\nfrom dfvfs.resolver import resolver\n\n\nclass VMDKFile(file_object_io.FileObjectIO):\n    \"\"\"File input/output (IO) object using pyvmdk.\"\"\"\n\n    def _OpenFileObject(self, path_spec):\n        \"\"\"Opens the file-like object defined by path specification.\n\n        Args:\n          path_spec (PathSpec): path specification.\n\n        Returns:\n          pyvmdk.handle: a file-like object.\n\n        Raises:\n          IOError: if the file-like object could not be opened.\n          OSError: if the file-like object could not be opened.\n          PathSpecError: if the path specification is incorrect.\n        \"\"\"\n        if not path_spec.HasParent():\n            raise errors.PathSpecError(\n                'Unsupported path specification without parent.')\n\n        parent_path_spec = path_spec.parent\n\n        parent_location = getattr(parent_path_spec, 'location', None)\n        if not parent_location:\n            raise errors.PathSpecError(\n                'Unsupported parent path specification without location.')\n\n        # Note that we cannot use pyvmdk's open_extent_data_files_as_file_objects\n        # function since it does not handle the file system abstraction dfVFS\n        # provides.\n\n        file_system = resolver.Resolver.OpenFileSystem(\n            parent_path_spec, resolver_context=self._resolver_context)\n\n        file_object = resolver.Resolver.OpenFileObject(\n            parent_path_spec, resolver_context=self._resolver_context)\n\n        vmdk_handle = pyvmdk.handle()\n        vmdk_handle.open(parent_location)\n\n        return vmdk_handle\n\n    def open_extent_data_files(self, vmdk_handle, parent_path_spec):\n\n        parent_location = getattr(parent_path_spec, 'location', None)\n        file_system = resolver.Resolver.OpenFileSystem(\n            parent_path_spec, resolver_context=self._resolver_context)\n\n        parent_location_path_segments = file_system.SplitPath(parent_location)\n        extent_data_files = []\n        for extent_descriptor in iter(vmdk_handle.extent_descriptors):\n            extent_data_filename = extent_descriptor.filename\n\n            _, path_separator, filename = extent_data_filename.rpartition('/')\n            if not path_separator:\n                _, path_separator, filename = extent_data_filename.rpartition(\n                    '\\\\')\n\n            if not path_separator:\n                filename = extent_data_filename\n\n            # The last parent location path segment contains the extent data filename.\n            # Since we want to check if the next extent data file exists we remove\n            # the previous one form the path segments list and add the new filename.\n            # After that the path segments list can be used to create the location\n            # string.\n            parent_location_path_segments.pop()\n            parent_location_path_segments.append(filename)\n            extent_data_file_location = file_system.JoinPath(\n                parent_location_path_segments)\n\n            # Note that we don't want to set the keyword arguments when not used\n            # because the path specification base class will check for unused\n            # keyword arguments and raise.\n            kwargs = path_spec_factory.Factory.GetProperties(parent_path_spec)\n\n            kwargs['location'] = extent_data_file_location\n            if parent_path_spec.parent is not None:\n                kwargs['parent'] = parent_path_spec.parent\n\n            extent_data_file_path_spec = path_spec_factory.Factory.NewPathSpec(\n                parent_path_spec.type_indicator, **kwargs)\n\n            if not file_system.FileEntryExistsByPathSpec(extent_data_file_path_spec):\n                break\n\n            extent_data_files.append(extent_data_file_path_spec)\n\n        if len(extent_data_files) != vmdk_handle.number_of_extents:\n            raise IOError('Unable to locate all extent data files.')\n\n        file_objects = []\n        for extent_data_file_path_spec in extent_data_files:\n            file_object = resolver.Resolver.OpenFileObject(\n                extent_data_file_path_spec, resolver_context=self._resolver_context)\n            file_objects.append(file_object)\n\n        vmdk_handle.open_extent_data_files_as_file_objects(file_objects)\n\n    def get_size(self):\n        \"\"\"Retrieves the size of the file-like object.\n\n        Returns:\n          int: size of the file-like object data.\n\n        Raises:\n          IOError: if the file-like object has not been opened.\n          OSError: if the file-like object has not been opened.\n        \"\"\"\n        if not self._is_open:\n            raise IOError('Not opened.')\n\n        return self._file_object.get_media_size()\n"
  },
  {
    "path": "config.py",
    "content": "import os\nimport hashlib\nimport logging\nimport json\n\n\ndef as_bool(var):\n    if var is None:\n        return False\n\n    val = var.lower()\n    if val == \"false\":\n        return False\n\n    if val == \"true\":\n        return True\n\n    logging.debug(str(os.environ))\n    raise RuntimeError(\n        f\"Environment variable with value {var} is neither True nor False\")\n\n\n# Read config vars dynamically fron environment (set in `.env`)\ndiff_config_keys = [key for key in os.environ if key.startswith(\"DIFF_\")]\n\n# Convert environment variable format (DIFF_USE_ATTRIBUTES) to variable name format for diskdiff.py (use_attributes)\ndiff_config = {\n    key[5:].lower(): as_bool(os.environ[key])\n    for key in diff_config_keys\n}\ndev = \"_DEV\" if os.environ.get(\"VMDIFF_DEV\") else \"\"\n\nfilter_path_json = os.environ.get(\"FILTER_PATH_JSON\")\nignore_path_json = os.environ.get(\"IGNORE_PATH_JSON\")\n\nallow_dirs = json.loads(filter_path_json) if filter_path_json else []\nignore_dirs = json.loads(ignore_path_json) if ignore_path_json else []\n\nIGNORE_PROCESSES_REGEX = os.environ.get(\"IGNORE_PROCESSES_REGEX\")\n\nPARTITION = os.environ.get(\"PARTITION_IDENTIFIER\")\n\n\nMEMORY_PLUGINS = os.environ.get(\"MEMORY_PLUGINS\").split()\n\nFROM_DISK_IMAGE_FILENAME = os.environ.get(\n    \"FROM_DISK_IMAGE_FILENAME\")\nTO_DISK_IMAGE_FILENAME = os.environ.get(\n    \"TO_DISK_IMAGE_FILENAME\")\n\nUSE_DISK = False\nif FROM_DISK_IMAGE_FILENAME and TO_DISK_IMAGE_FILENAME and as_bool(os.environ.get(\"USE_DISK\")):\n    USE_DISK = True\n\n\nUSE_CACHE = as_bool(os.environ.get(\"USE_CACHE\"))\n\n\nSNAPSHOT_DIR = os.environ.get(f\"SNAPSHOT_DIR{dev}\")\n\nFROM_DISK_PATH = os.path.join(SNAPSHOT_DIR, FROM_DISK_IMAGE_FILENAME)\nTO_DISK_PATH = os.path.join(SNAPSHOT_DIR, TO_DISK_IMAGE_FILENAME)\n\n\nFROM_MEMORY_IMAGE_FILENAME = os.environ.get(\"FROM_MEMORY_IMAGE_FILENAME\")\nTO_MEMORY_IMAGE_FILENAME = os.environ.get(\"TO_MEMORY_IMAGE_FILENAME\")\n\nUSE_MEMORY = False\n\nif FROM_MEMORY_IMAGE_FILENAME and TO_MEMORY_IMAGE_FILENAME and as_bool(os.environ.get(\"USE_MEMORY\")):\n    USE_MEMORY = True\n\nRESULTS_DIR = os.environ[f\"RESULTS_DIR{dev}\"]\nREACT_BUILD_DIR = os.environ[f\"REACT_BUILD_DIR{dev}\"]\n\nLOG_LEVEL = logging.DEBUG if dev else logging.INFO\n\n\ndef get_run_id():\n    opts_bitfield = \"\".join(\n        [\"1\" if opt else \"0\" for opt in sorted(diff_config.values())])\n\n    dir_opts = \"\".join(sorted(allow_dirs)) + \"\".join(sorted(ignore_dirs))\n\n    config_str = opts_bitfield + dir_opts\n    config_hash = hashlib.sha1(config_str.encode()).hexdigest()[:10]\n\n    if USE_DISK:\n        filename = f\"{FROM_DISK_IMAGE_FILENAME}--{TO_DISK_IMAGE_FILENAME}--{config_hash}\"\n    else:\n        filename = f\"{FROM_MEMORY_IMAGE_FILENAME}--{TO_MEMORY_IMAGE_FILENAME}--{config_hash}\"\n\n    return filename\n\n\nRUN_ID = get_run_id()\nRUN_PATH = os.path.join(RESULTS_DIR, RUN_ID)\nRUN_DISK_PATH = os.path.join(RUN_PATH, \"disk\")\nRUN_MEMORY_PATH = os.path.join(RUN_PATH, \"memory\")\nRUN_TREE_PATH = os.path.join(RUN_PATH, \"tree\")\n"
  },
  {
    "path": "docker-compose.yml",
    "content": "version: '3.4'\r\n\r\nservices:\r\n  vmdiff:\r\n    image: vmdiff/vmdiff\r\n    build:\r\n      context: ./\r\n      dockerfile: ./backend/Dockerfile\r\n    tty: true\r\n    env_file:\r\n      - .env\r\n    volumes:\r\n      - ./backend:/backend\r\n      - ./results:$RESULTS_DIR\r\n  memdiff:\r\n    image: vmdiff/memory-processor\r\n    build:\r\n      context: ./\r\n      dockerfile: ./memory-processing/Dockerfile\r\n    env_file:\r\n      - .env\r\n    volumes:\r\n      - ./memory-processing:/memdiff\r\n      - ./memory-processing/volatilitycache:/home/unprivileged/.cache/volatility3\r\n      - ./results:$RESULTS_DIR\r\n  app:\r\n    image: vmdiff/vmdiff-app\r\n    build:\r\n      context: .\r\n      dockerfile: ./Dockerfile\r\n    env_file:\r\n      - .env\r\n    volumes:\r\n      - ./results:$RESULTS_DIR\r\n    ports:\r\n      - \"5000:5000\"\r\n"
  },
  {
    "path": "frontend/.dockerignore",
    "content": "**/.classpath\r\n**/.dockerignore\r\n**/.env\r\n**/.git\r\n**/.gitignore\r\n**/.project\r\n**/.settings\r\n**/.toolstarget\r\n**/.vs\r\n**/.vscode\r\n**/*.*proj.user\r\n**/*.dbmdl\r\n**/*.jfm\r\n**/charts\r\n**/docker-compose*\r\n**/compose*\r\n**/Dockerfile*\r\n**/node_modules\r\n**/npm-debug.log\r\n**/obj\r\n**/secrets.dev.yaml\r\n**/values.dev.yaml\r\nREADME.md\r\n"
  },
  {
    "path": "frontend/.gitignore",
    "content": "# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.\n\n# dependencies\n/node_modules\n/.pnp\n.pnp.js\n\n# testing\n/coverage\n\n# production\n/build\n\n# misc\n.DS_Store\n.env.local\n.env.development.local\n.env.test.local\n.env.production.local\n\nnpm-debug.log*\nyarn-debug.log*\nyarn-error.log*\n"
  },
  {
    "path": "frontend/README.md",
    "content": "# Getting Started with Create React App\n\nThis project was bootstrapped with [Create React App](https://github.com/facebook/create-react-app).\n\n## Available Scripts\n\nIn the project directory, you can run:\n\n### `yarn start`\n\nRuns the app in the development mode.\\\nOpen [http://localhost:3000](http://localhost:3000) to view it in the browser.\n\nThe page will reload if you make edits.\\\nYou will also see any lint errors in the console.\n\n### `yarn test`\n\nLaunches the test runner in the interactive watch mode.\\\nSee the section about [running tests](https://facebook.github.io/create-react-app/docs/running-tests) for more information.\n\n### `yarn build`\n\nBuilds the app for production to the `build` folder.\\\nIt correctly bundles React in production mode and optimizes the build for the best performance.\n\nThe build is minified and the filenames include the hashes.\\\nYour app is ready to be deployed!\n\nSee the section about [deployment](https://facebook.github.io/create-react-app/docs/deployment) for more information.\n\n### `yarn eject`\n\n**Note: this is a one-way operation. Once you `eject`, you can’t go back!**\n\nIf you aren’t satisfied with the build tool and configuration choices, you can `eject` at any time. This command will remove the single build dependency from your project.\n\nInstead, it will copy all the configuration files and the transitive dependencies (webpack, Babel, ESLint, etc) right into your project so you have full control over them. All of the commands except `eject` will still work, but they will point to the copied scripts so you can tweak them. At this point you’re on your own.\n\nYou don’t have to ever use `eject`. The curated feature set is suitable for small and middle deployments, and you shouldn’t feel obligated to use this feature. However we understand that this tool wouldn’t be useful if you couldn’t customize it when you are ready for it.\n\n## Learn More\n\nYou can learn more in the [Create React App documentation](https://facebook.github.io/create-react-app/docs/getting-started).\n\nTo learn React, check out the [React documentation](https://reactjs.org/).\n"
  },
  {
    "path": "frontend/package.json",
    "content": "{\n  \"name\": \"vmdiff-regrets\",\n  \"version\": \"0.1.0\",\n  \"private\": true,\n  \"homepage\": \".\",\n  \"dependencies\": {\n    \"@testing-library/jest-dom\": \"^5.14.1\",\n    \"@testing-library/react\": \"^13.0.0\",\n    \"@testing-library/user-event\": \"^13.2.1\",\n    \"@types/jest\": \"^27.0.1\",\n    \"@types/react\": \"^18.0.0\",\n    \"@types/react-dom\": \"^18.0.0\",\n    \"antd\": \"^4.23.5\",\n    \"crypto\": \"^1.0.1\",\n    \"crypto-browserify\": \"^3.12.0\",\n    \"crypto-hash\": \"^2.0.1\",\n    \"diff2html\": \"^3.4.19\",\n    \"node-polyfill-webpack-plugin\": \"^2.0.1\",\n    \"react\": \"^18.2.0\",\n    \"react-dom\": \"^18.2.0\",\n    \"react-scripts\": \"5.0.1\",\n    \"typescript\": \"^4.4.2\",\n    \"web-vitals\": \"^2.1.0\"\n  },\n  \"scripts\": {\n    \"start\": \"GENERATE_SOURCEMAP=false react-scripts start\",\n    \"build\": \"GENERATE_SOURCEMAP=false react-scripts build\",\n    \"test\": \"react-scripts test\",\n    \"eject\": \"react-scripts eject\"\n  },\n  \"eslintConfig\": {\n    \"extends\": [\n      \"react-app\",\n      \"react-app/jest\"\n    ]\n  },\n  \"browserslist\": {\n    \"production\": [\n      \">0.2%\",\n      \"not dead\",\n      \"not op_mini all\"\n    ],\n    \"development\": [\n      \"last 1 chrome version\",\n      \"last 1 firefox version\",\n      \"last 1 safari version\"\n    ]\n  },\n  \"devDependencies\": {\n    \"@types/node\": \"^18.11.19\"\n  }\n}"
  },
  {
    "path": "frontend/public/index.html",
    "content": "<!DOCTYPE html>\n<html lang=\"en\">\n\n<head>\n  <meta charset=\"utf-8\" />\n  <link rel=\"icon\" href=\"%PUBLIC_URL%/favicon.ico\" />\n  <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n  <meta name=\"theme-color\" content=\"#000000\" />\n  <meta name=\"description\" content=\"Web site created using create-react-app\" />\n  <link rel=\"apple-touch-icon\" href=\"%PUBLIC_URL%/logo192.png\" />\n  <!--\n      manifest.json provides metadata used when your web app is installed on a\n      user's mobile device or desktop. See https://developers.google.com/web/fundamentals/web-app-manifest/\n    -->\n  <link rel=\"manifest\" href=\"%PUBLIC_URL%/manifest.json\" />\n  <!--\n      Notice the use of %PUBLIC_URL% in the tags above.\n      It will be replaced with the URL of the `public` folder during the build.\n      Only files inside the `public` folder can be referenced from the HTML.\n\n      Unlike \"/favicon.ico\" or \"favicon.ico\", \"%PUBLIC_URL%/favicon.ico\" will\n      work correctly both with client-side routing and a non-root public URL.\n      Learn how to configure a non-root public URL by running `npm run build`.\n    -->\n  <title>🔥vmdiff🔥 (beta)</title>\n</head>\n\n<body>\n  <noscript>You need to enable JavaScript to run this app.</noscript>\n  <div id=\"root\"></div>\n  <!--\n      This HTML file is a template.\n      If you open it directly in the browser, you will see an empty page.\n\n      You can add webfonts, meta tags, or analytics to this file.\n      The build step will place the bundled scripts into the <body> tag.\n\n      To begin the development, run `npm start` or `yarn start`.\n      To create a production bundle, use `npm run build` or `yarn build`.\n    -->\n</body>\n\n</html>"
  },
  {
    "path": "frontend/public/manifest.json",
    "content": "{\n  \"short_name\": \"vmdiff\",\n  \"name\": \"vmdiff\",\n  \"icons\": [\n    {\n      \"src\": \"favicon.ico\",\n      \"sizes\": \"64x64 32x32 24x24 16x16\",\n      \"type\": \"image/x-icon\"\n    }\n  ],\n  \"start_url\": \".\",\n  \"display\": \"standalone\",\n  \"theme_color\": \"#000000\",\n  \"background_color\": \"#ffffff\"\n}\n"
  },
  {
    "path": "frontend/src/App.css",
    "content": "@import '~antd/dist/antd.css';\n\ndiv.ant-tree-treenode {\n    width: max-content;\n}\n"
  },
  {
    "path": "frontend/src/App.test.tsx",
    "content": "import React from 'react';\nimport { render, screen } from '@testing-library/react';\nimport App from './App';\n\ntest('renders learn react link', () => {\n  render(<App />);\n  const linkElement = screen.getByText(/learn react/i);\n  expect(linkElement).toBeInTheDocument();\n});\n"
  },
  {
    "path": "frontend/src/App.tsx",
    "content": "import React, { useEffect, useState } from 'react';\n\nimport { FolderOutlined, ExperimentOutlined } from '@ant-design/icons';\n\n\nimport type { DataNode } from 'antd/es/tree';\n\nimport { Tree, Layout } from 'antd';\n\nimport { Typography, Space } from 'antd';\n\n\nimport * as Diff2Html from \"diff2html\";\nimport \"diff2html/bundles/css/diff2html.min.css\";\n\nimport './App.css';\nimport { sha1 } from 'crypto-hash'\n\nconst { Title } = Typography;\n\nconst { Header, Content, Sider } = Layout;\nconst { DirectoryTree } = Tree;\n\n\ntype DiffNodeProps = {\n  status: string,\n  linesAdded: number,\n  linesRemoved: number,\n  numChildren: number,\n  numDirectChildren: number,\n  isDirectory: boolean\n};\n\ntype DiffNode = DataNode & Partial<DiffNodeProps>;\n\nlet DEMO = !(process.env.VMDIFF_DEMO === \"false\")\nlet BASE_URL = \"\"\n\n// Serve from the cached /json directory if this is a demo, otherwise from the localhost server directly.\n// It's always a demo, though.\nif (DEMO) {\n  BASE_URL = window.location.pathname + \"json\";\n}\n\n\nconst colours: any = {\n  added: \"#52c41a\",\n  removed: \"#eb2f96\",\n  modified: \"#d0b44c\",\n  unchanged: \"#333\"\n}\n\nconst initTreeData: DiffNode[] = [];\n\nconst getInitTreeData = (): Promise<DiffNode[]> => {\n\n  return fetch(BASE_URL + \"/changed_files\").then((response) => {\n    return response.json()\n  });\n}\n\nconst getChildrenData = (key: React.Key): Promise<DiffNode[]> => {\n\n  if (DEMO) {\n    const hasher = sha1(String(key))\n\n    return hasher.then((hash) => {\n      return fetch(BASE_URL + `/children/` + hash).then((response) => {\n        return response.json()\n      });\n    });\n\n  } else {\n    return fetch(BASE_URL + `/children?` + new URLSearchParams({\n      key: String(key)\n    })).then((response) => {\n      return response.json()\n    });\n  }\n}\nconst getDiffString = (key: React.Key): Promise<string[]> => {\n\n  if (DEMO) {\n    const hasher = sha1(String(key))\n\n    return hasher.then((hash) => {\n      return fetch(BASE_URL + `/diff/` + hash).then((response) => {\n        return response.json()\n      });\n    });\n\n  } else {\n\n    return fetch(BASE_URL + `/diff?` + new URLSearchParams({\n      key: String(key)\n    })).then((response) => {\n      return response.json()\n    });\n  }\n}\n\nconst treeMap = new Map<React.Key, DiffNode>();\n\nconst cache = (nodes: DiffNode[]): void => {\n  nodes.map((node) => {\n    treeMap.set(node.key, node)\n    if (node.children) {\n      cache(node.children)\n    }\n    return null\n  })\n}\n// Cache the initial tree\ncache(initTreeData);\n\n\nconst setIcon = (node: DiffNode): DiffNode => {\n  if (node.isDirectory && node.isLeaf) {\n    node.icon = <FolderOutlined />\n  }\n  return node\n\n}\nconst iconifyAll = () => {\n  treeMap.forEach((value, key) => {\n    treeMap.set(key, setIcon(value));\n  })\n}\n\n// It's just a simple demo. You can use tree map to optimize update perf.\nfunction updateTreeData(\n  list: DiffNode[],\n  key: React.Key,\n  children: DiffNode[]\n): DiffNode[] {\n  iconifyAll();\n  return list.map((node) => {\n    if (node.key === key) {\n      return {\n        ...node,\n        children,\n      };\n    } else if (node.children) {\n      return {\n        ...node,\n        children: updateTreeData(node.children, key, children),\n      };\n    }\n    return node;\n  });\n}\n\nconst getDiffHtml = (key: React.Key): Promise<string> => {\n\n  return getDiffString(key).then((diffLines) => {\n\n    const unifiedDiffString = diffLines.join(\"\");\n    const diffHtml = Diff2Html.html(\n      unifiedDiffString,\n      {\n        drawFileList: false,\n        matching: \"lines\",\n        outputFormat: \"line-by-line\",\n        renderNothingWhenEmpty: false\n      }\n    );\n    return diffHtml\n  })\n\n}\n\nconst App: React.FC = () => {\n  const [treeData, setTreeData] = useState<DiffNode[] | undefined>(undefined);\n  const [expandedKeys, setExpandedKeys] = useState<React.Key[]>([]);\n  const [, setLoadedKeys] = useState<React.Key[]>([]);\n  const [autoExpandParent, setAutoExpandParent] = useState(true);\n  const [diff, setDiff] = useState(\"\");\n  const [collapsed, setCollapsed] = useState(true);\n\n  useEffect(() => {\n    getInitTreeData().then((data) => {\n      cache(data)\n      iconifyAll()\n      const newExpandedKeys: React.Key[] = []\n      const newLoadedKeys: React.Key[] = []\n      treeMap.forEach((value, key) => {\n        newLoadedKeys.push(key)\n\n        // Nodes to leave collapsed initially\n        if (value.children !== undefined && value.children.length > 0) {\n\n          // If all children are leaves\n          let allChildrenLeaves = true\n          for (const child of value.children) {\n            if (!child.isLeaf) {\n              allChildrenLeaves = false;\n              break;\n            }\n          }\n\n          if (!allChildrenLeaves && value.numDirectChildren! < 10 && newExpandedKeys.length < 1000) {\n            newExpandedKeys.push(key)\n          }\n        }\n\n      })\n      setExpandedKeys(newExpandedKeys)\n      setLoadedKeys(newLoadedKeys)\n      setTreeData(data)\n    })\n  }, []);\n\n\n\n  const onExpand = (expandedKeys: React.Key[], { node }: { expanded: boolean, node: DiffNode }): any => {\n\n    setExpandedKeys(expandedKeys)\n    setAutoExpandParent(false);\n  }\n\n\n  const shouldAutoExpand = (key: React.Key): boolean => {\n    const node = treeMap.get(key);\n\n    // Always expand if there are just empty folders underneath.\n    if (node?.numChildren === 0) {\n      return true;\n    }\n    if (node?.numDirectChildren! > 10) {\n      return false;\n    }\n    // Does the key have all leaf children?\n    if (node !== undefined && node.children?.every(child => { return child.isLeaf })) {\n      // TODO: Find a way to measure how many nodes are showing, not expanded\n      if ((treeMap.size + node.children!.length) < 20) {\n        console.log(`(${expandedKeys.length}) allowing expand of ${key}`)\n        return true;\n      }\n      console.log(` (${expandedKeys.length}) not expanding ${key}`)\n    }\n    return false;\n  }\n\n  const expand = (key: React.Key) => {\n    if (!(key in expandedKeys)) {\n      setExpandedKeys((prev) => [...prev, key]);\n    }\n  }\n\n  const onSelect = (selectedKeys: React.Key[]): any => {\n    const key = selectedKeys[0];\n\n    getDiffHtml(key).then((html) => {\n      setDiff(html);\n    });\n\n  }\n\n\n\n  const onLoadData = ({ key, children }: any) =>\n    new Promise<void>(resolve => {\n      console.log(key)\n      if (children != null && children.length > 0) {\n        // Do nothing if the node has children already somehow (double expand?)\n        resolve();\n        return;\n      }\n\n      setTimeout(() => {\n        // Load the children of this node.\n        getChildrenData(key).then((children) => {\n          cache(children)\n          setTreeData(origin =>\n            origin === undefined ? undefined :\n              updateTreeData(origin, key, children)\n          );\n          children.forEach((child) => {\n            if (!child.isLeaf && shouldAutoExpand(child.key)) {\n              expand(child.key);\n            }\n          })\n          resolve();\n        })\n        resolve();\n      })\n    });\n\n  const renderTitle = (node: DiffNode): React.ReactNode | undefined => {\n    const titleTextStyle = {\n      color: colours[node.status!],\n      filter: \"brightness(0.8)\"\n    }\n    const numChildrenStyle = {\n      color: \"#aaa\",\n      \"marginLeft\": \"5px\"\n    }\n\n    const linesAddedStyle = {\n      color: colours[\"added\"],\n      filter: \"brightness(0.55)\"\n    }\n    const linesRemovedStyle = {\n      color: colours[\"removed\"],\n      filter: \"brightness(0.55)\"\n    }\n    const linesChangedStyle = {\n      \"marginLeft\": \"0.3rem\",\n      opacity: \"80%\"\n    }\n\n    const showLineStats = (node.linesAdded !== 0 || node.linesRemoved !== 0) && !node.isDirectory\n\n    return <span className=\"node-title\">\n      {/* {node.status === \"added\" ? <PlusSquareTwoTone twoToneColor={colours.added} /> : null}\n          {node.status === \"removed\" ? <MinusSquareTwoTone twoToneColor={colours.removed} /> : null}\n          {node.status === \"modified\" ? <RightSquareTwoTone twoToneColor={colours.modified} /> : null} */}\n      <span className=\"node-name\" style={titleTextStyle} >{String(node.title)}</span>\n      {node.numChildren !== undefined && node.numChildren > 0 && !expandedKeys.includes(node.key) ? <span style={numChildrenStyle}>({node.numChildren})</span> : null}\n      {showLineStats ? <span style={linesChangedStyle}>\n        {node.linesAdded !== 0 ? <span style={linesAddedStyle}>+{node.linesAdded}</span> : null}{node.linesAdded !== 0 && node.linesRemoved !== 0 ? \",\" : null}\n        {node.linesRemoved !== 0 ? <span style={linesRemovedStyle}>-{node.linesRemoved}</span> : null}\n      </span>\n        : null}\n    </span>\n  }\n\n  return (<Layout>\n    <Header style={{ position: 'sticky', top: 0, zIndex: 1, width: '100%' }}>\n      <Space>\n        <Typography>\n\n          <Title level={2} style={{\n            color: \"#fff\"\n          }}>\n            <Space>\n              <ExperimentOutlined\n                size={30}\n              />\n              🔥vmdiff🔥\n            </Space>\n          </Title>\n        </Typography>\n      </Space >\n\n    </Header >\n    <Layout hasSider className=\"site-layout\" style={{}}>\n\n      <Sider collapsible collapsed={collapsed} onCollapse={value => setCollapsed(value)}\n        theme={\"light\"}\n        collapsedWidth={\"30vw\"}\n        width={\"60vw\"}\n        style={{\n          overflow: 'scroll',\n          height: '100vh',\n          marginBottom: '50px',\n        }}>\n\n        <div id=\"components-tree-demo-dynamic\"\n          style={{\n            height: \"100%\"\n          }}\n        >\n          <DirectoryTree\n            loadData={onLoadData}\n            expandedKeys={expandedKeys}\n            treeData={treeData}\n            onExpand={onExpand}\n            onSelect={onSelect}\n            titleRender={renderTitle}\n            virtual={true}\n            blockNode={false}\n            autoExpandParent={autoExpandParent}\n            defaultExpandParent={true}\n            style={{\n              height: \"100%\"\n            }}\n          />\n        </div>\n\n      </Sider>\n      <Content style={{ margin: '24px 16px 0', overflow: 'initial' }}>\n\n        <div className=\"site-layout-background\" style={{ padding: 24, textAlign: 'center' }}>\n          <div id=\"code-diff\" dangerouslySetInnerHTML={{ __html: diff }}>\n          </div>\n        </div>\n      </Content>\n    </Layout>\n  </Layout >\n\n  )\n};\n\nexport default App;\n\n"
  },
  {
    "path": "frontend/src/index.css",
    "content": "body {\n  margin: 0;\n  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Oxygen',\n    'Ubuntu', 'Cantarell', 'Fira Sans', 'Droid Sans', 'Helvetica Neue',\n    sans-serif;\n  -webkit-font-smoothing: antialiased;\n  -moz-osx-font-smoothing: grayscale;\n}\n\ncode {\n  font-family: source-code-pro, Menlo, Monaco, Consolas, 'Courier New',\n    monospace;\n}"
  },
  {
    "path": "frontend/src/index.tsx",
    "content": "import React from 'react';\nimport ReactDOM from 'react-dom/client';\nimport './index.css';\nimport App from './App';\nimport reportWebVitals from './reportWebVitals';\n\nconst root = ReactDOM.createRoot(\n  document.getElementById('root') as HTMLElement\n);\nroot.render(\n  <React.StrictMode>\n    <App />\n  </React.StrictMode>\n);\n\n// If you want to start measuring performance in your app, pass a function\n// to log results (for example: reportWebVitals(console.log))\n// or send to an analytics endpoint. Learn more: https://bit.ly/CRA-vitals\nreportWebVitals();\n"
  },
  {
    "path": "frontend/src/react-app-env.d.ts",
    "content": "/// <reference types=\"react-scripts\" />\n"
  },
  {
    "path": "frontend/src/reportWebVitals.ts",
    "content": "import { ReportHandler } from 'web-vitals';\n\nconst reportWebVitals = (onPerfEntry?: ReportHandler) => {\n  if (onPerfEntry && onPerfEntry instanceof Function) {\n    import('web-vitals').then(({ getCLS, getFID, getFCP, getLCP, getTTFB }) => {\n      getCLS(onPerfEntry);\n      getFID(onPerfEntry);\n      getFCP(onPerfEntry);\n      getLCP(onPerfEntry);\n      getTTFB(onPerfEntry);\n    });\n  }\n};\n\nexport default reportWebVitals;\n"
  },
  {
    "path": "frontend/src/setupTests.ts",
    "content": "// jest-dom adds custom jest matchers for asserting on DOM nodes.\n// allows you to do things like:\n// expect(element).toHaveTextContent(/react/i)\n// learn more: https://github.com/testing-library/jest-dom\nimport '@testing-library/jest-dom';\n"
  },
  {
    "path": "frontend/tsconfig.json",
    "content": "{\n  \"compilerOptions\": {\n    \"target\": \"es5\",\n    \"lib\": [\n      \"dom\",\n      \"dom.iterable\",\n      \"esnext\"\n    ],\n    \"allowJs\": true,\n    \"skipLibCheck\": true,\n    \"esModuleInterop\": true,\n    \"allowSyntheticDefaultImports\": true,\n    \"strict\": true,\n    \"forceConsistentCasingInFileNames\": true,\n    \"noFallthroughCasesInSwitch\": true,\n    \"module\": \"esnext\",\n    \"moduleResolution\": \"node\",\n    \"resolveJsonModule\": true,\n    \"isolatedModules\": true,\n    \"noEmit\": true,\n    \"jsx\": \"react-jsx\"\n  },\n  \"include\": [\n    \"src\"\n  ]\n}"
  },
  {
    "path": "memory-processing/Dockerfile",
    "content": "# For more information, please refer to https://aka.ms/vscode-docker-python\nFROM sk4la/volatility3\n\n# Keeps Python from generating .pyc files in the container\nENV PYTHONDONTWRITEBYTECODE=1\n\n# Turns off buffering for easier container logging\nENV PYTHONUNBUFFERED=1\n\n\nWORKDIR /memdiff\nRUN mkdir -p volatilitycache\nRUN mkdir -p results\nCOPY memory-processing/memdiff.sh .\n\n# Creates a non-root user with an explicit UID and adds permission to access the /app folder\n# For more info, please refer to https://aka.ms/vscode-docker-python-configure-containers\nUSER root\nRUN chown -R unprivileged /memdiff\nRUN chmod +x memdiff.sh\n\nRUN mkdir -p /disk\nRUN mkdir -p /memory\n\n# RUN touch /disk/from\n# RUN touch /disk/to\n\n# RUN touch /memory/from\n# RUN touch /memory/to\n\n# USER appuser\nUSER unprivileged\n\n# ENTRYPOINT [ \"/usr/bin/dumb-init\", \"--\", \"volatility3\" ]\nENTRYPOINT [ \"/bin/sh\", \"memdiff.sh\" ]\n"
  },
  {
    "path": "memory-processing/memdiff.sh",
    "content": "#! /bin/bash\nif [ \"${#FROM_MEMORY_IMAGE_FILENAME}\" -lt 1 ]; then\n    echo \"No memory image filename given by ${FROM_MEMORY_IMAGE_FILENAME}, skipping memory analysis\"\n    exit 0\nfi\n\nPLUGINS=$MEMORY_PLUGINS\n\nRUN_NAME=\"${FROM_MEMORY_IMAGE_FILENAME}__${TO_MEMORY_IMAGE_FILENAME}\"\nRUN_DIR=\"/results/memory/$RUN_NAME\"\nFROM_OUTPUT_PATH_TEMPLATE=\"$RUN_DIR/from\"\nTO_OUTPUT_PATH_TEMPLATE=\"$RUN_DIR/to\"\nmkdir -p \"$RUN_DIR\"\n\nfor plugin in $PLUGINS; do\n\n    FROM_OUTPUT_FILENAME=\"$FROM_OUTPUT_PATH_TEMPLATE-$plugin.json\"\n    TO_OUTPUT_FILENAME=\"$TO_OUTPUT_PATH_TEMPLATE-$plugin.json\"\n\n    if [ ! -s \"$FROM_OUTPUT_FILENAME\" ]; then\n        volatility3 --cache-path ./volatilitycache -o . -f \"/snapshots/$FROM_MEMORY_IMAGE_FILENAME\" --renderer json $plugin | tee \"$FROM_OUTPUT_FILENAME\"\n    fi\n\n    if [ ! -s \"$TO_OUTPUT_FILENAME\" ]; then\n        volatility3 --cache-path ./volatilitycache -o . -f \"/snapshots/$TO_MEMORY_IMAGE_FILENAME\" --renderer json $plugin | tee \"$TO_OUTPUT_FILENAME\"\n    fi\ndone\n"
  },
  {
    "path": "requirements.txt",
    "content": "# To ensure app dependencies are ported from your virtual environment/host machine into your container, run 'pip freeze > requirements.txt' in the terminal to overwrite this file\ntyper[all]==0.7.0\n"
  },
  {
    "path": "server.py",
    "content": "import inspect\nimport os\nimport sys\nimport time\n\nimport logging\n\nfrom flask import Flask, jsonify, request, render_template, send_from_directory\n\nimport config\n\n# Python Crimes to import from a local module\ncurrentdir = os.path.dirname(os.path.abspath(\n    inspect.getfile(inspect.currentframe())))\nbackend_dir = os.path.join(currentdir, \"backend\")\nsys.path.insert(0, backend_dir)\n\n\nif config.dev:\n    from backend import vmdiff\n    vmdiff.Main()\n\ntry:\n    import diffcache  # noqa\nexcept ImportError:\n    from backend import diffcache\n\n\nREACT_BUILD_DIR = config.REACT_BUILD_DIR\n\napp = Flask(\n    __name__, static_folder=f\"{REACT_BUILD_DIR}/static\", template_folder=f\"{REACT_BUILD_DIR}\")\n\nlogging.info(f\"Waiting for results at {config.RUN_TREE_PATH}....\")\n\nif not os.path.exists(config.RUN_TREE_PATH):\n    logging.critical(\n        f\"No results found at {config.RUN_TREE_PATH}. Generate results first?\")\n    sys.exit(1)\n\ncache = diffcache.DiffCache(\n    config.RUN_DISK_PATH, config.RUN_TREE_PATH, config.RUN_MEMORY_PATH)\nwhile True:\n    try:\n        tree, children_map = cache.get_tree_data_from_cache()\n        break\n    except FileNotFoundError:\n        time.sleep(3)\n\n\nlogging.debug(f\"Tree: {len(tree)}, children: {len(children_map)}\")\n\n\n@app.route(\"/children\")\ndef get_children_handler():\n    key = request.args.get(\"key\")\n\n    node_children = children_map[key]\n    response = jsonify(node_children)\n    response.headers.add('Access-Control-Allow-Origin', '*')\n\n    return response\n\n\n@app.route(\"/diff\")\ndef get_diff():\n\n    key = request.args.get(\"key\")\n\n    diff = cache.get_diff(key)\n    if diff is None:\n        logging.warning(f\"No diff found for {key}\")\n        result = None\n    else:\n        result = diff.diff_lines\n\n    response = jsonify(result)\n    response.headers.add('Access-Control-Allow-Origin', '*')\n\n    # to start with, just return the directories, and let the user expand out the files.\n    return response\n\n\n@app.route(\"/changed_files\")\ndef get_changed_files():\n    response = jsonify(tree)\n    response.headers.add('Access-Control-Allow-Origin', '*')\n\n    # To start with, just return the directories, and let the user expand out the files.\n    return response\n\n\n@app.route(\"/json/<path:path>\")\ndef json(path):\n    json_dir = f\"{cache.tree_path}/json\"\n    return send_from_directory(json_dir, path)\n\n\n@app.route(\"/\")\ndef index():\n    return render_template(\"index.html\")\n\n\nif __name__ == \"__main__\":\n\n    app.run(\"0.0.0.0\", debug=True)\n"
  },
  {
    "path": "vmdiff",
    "content": "#! env python3\n\"\"\"\nvmdiff CLI\n\"\"\"\n\n__author__ = \"Atlassian Icarus Labs\"\n__version__ = \"0.1.0\"\n__license__ = \"MIT\"\n\nfrom typing import Optional, List\nimport typer\nimport pathlib\nimport subprocess\nimport sys\nimport shlex\n\nimport os\nimport re\n\nimport json\n\nfrom datetime import datetime\nfrom struct import unpack, pack\n\n\nfrom rich.table import Table\nfrom rich import print\n\n\napp = typer.Typer()\n\ninput_path_options = {\n    \"exists\": True,\n    \"rich_help_panel\": \"Input and output\",\n    \"show_default\": False\n}\n\n\ndef main(\n    input_dir: pathlib.Path = typer.Argument(..., help=\"Path to virtual machine directory, or any directory containing .vmdk/.vmem files.\",\n                                             file_okay=False, **input_path_options),\n    from_disk: pathlib.Path = typer.Option(\n        None, \"--from-disk\", \"-fd\", help=\"Path (or filename) of first chronological disk snapshot.\",\n        **input_path_options),\n    to_disk: pathlib.Path = typer.Option(\n        None, \"--to-disk\", \"-td\", help=\"Path (or filename) of second chronological disk snapshot.\",\n        **input_path_options),\n    from_memory: pathlib.Path = typer.Option(\n        None, \"--from-memory\", \"-fm\", help=\"Path (or filename) of first chronological memory snapshot.\",\n        **input_path_options),\n    to_memory: pathlib.Path = typer.Option(\n        None, \"--to-memory\", \"-tm\", help=\"Path (or filename) of second chronological memory snapshot.\",\n        **input_path_options),\n    from_snapshot: str = typer.Option(\n        None, \"--from-snapshot\", \"-fs\", help=\"First chronological snapshot ID obtained via --list-snapshots.\", rich_help_panel=\"Input and output\", show_default=False),\n    to_snapshot: str = typer.Option(\n        None, \"--to-snapshot\", \"-ts\", help=\"Second chronological snapshot ID obtained via --list-snapshots.\", rich_help_panel=\"Input and output\", show_default=False),\n    list_snapshots: bool = typer.Option(\n        False, \"--list-snapshots\", \"-l\", help=\"Show information about the VM snapshots in INPUT_DIR, e.g. the files belonging to each snapshot.\"),\n    ignore_path: Optional[List[str]] = typer.Option(\n        [], \"--ignore-path\", \"-i\", help=\"List of disk path regular expressions to ignore when diffing. Multiple values accepted via e.g. \\\"--ignore-path /path/one --ignore-path /path/two\\\"\", rich_help_panel=\"Configuring\"),\n    filter_path: Optional[List[str]] = typer.Option(\n        [\"/\", \"\\\\\"], \"--filter-path\", \"-f\", help=\"List of disk path regular expressions. Only these paths will be processed. Multiple values accepted via e.g. \\\"--filter-path /path/one --filter-path /path/two\\\"\", rich_help_panel=\"Configuring\"),\n    ignore_processes: Optional[str] = typer.Option(\n        \"\", \"--ignore-process\", \"-I\", help=\"Regular expression to ignore when diffing process names. Note that only the first 14 characters of the process name are processed (by Volatility).\", rich_help_panel=\"Configuring\"),\n    cache: bool = typer.Option(\n        True, help=\"Whether to cache results based on input filenames and config options.\", rich_help_panel=\"Configuring\"),\n    partition: str = typer.Option(\n        \"\", \"--partition\", \"-p\", help=\"Disk Partition ID to use. If not set, show partitions and ask which one to use via STDIN.\", rich_help_panel=\"Input and output\", show_default=False),\n    use_memory: bool = typer.Option(\n        True, help=\"Whether to process/diff memory.\", rich_help_panel=\"Configuring\"),\n    use_disk: bool = typer.Option(\n        True, help=\"Whether to process/diff disks.\", rich_help_panel=\"Configuring\"),\n    include_binary: bool = typer.Option(\n        None, help=\"Whether to also process and diff binary files.\", rich_help_panel=\"Configuring\"),\n    show: bool = typer.Option(\n        None, \"--show\", \"-s\", help=\"Open browser and show diff viewer UI.\", rich_help_panel=\"Display\"),\n    debug: bool = typer.Option(\n        None, \"--debug\", help=\"Enable debug logging.\"),\n\n):\n    \"\"\"\n    \\b\n    Generate and view diffs for .vmdk and .vmem files.\n\n    \\b\n    EXAMPLES:\n\n    \\b\n    What snapshots do I have to choose from?\n        ./vmdiff \"~/Virtual Machines.localized/VMName/\" --list-snapshots\n    \\b\n    Diff snapshots 1 and 2\n        ./vmdiff \"~/Virtual Machines.localized/VMName/\" --from-snapshot 1 --to-snapshot 2\n    \\b\n    Don't prompt me for a partition, I know it's partition 4\n        ./vmdiff \"~/Virtual Machines.localized/VMName/\" --from-snapshot 1 --to-snapshot 2 --partition 4\n    \\b\n    Diff generic VMDK files, not necessarily from a snapshot\n        ./vmdiff ~/dir-with-vmdk-files/ --from-disk disk1.vmdk --to-disk disk2.vmdk --no-use-memory\n    \\b\n    Only show files that have changed in the user's home directory\n        ./vmdiff \"~/Virtual Machines.localized/VMName/\" --from-snapshot 1 --to-snapshot 2 --filter-path \"/home/username/\"\n    \\b\n    Ignore .log and .txt files\n        ./vmdiff \"~/Virtual Machines.localized/VMName/\" --from-snapshot 1 --to-snapshot 2 --filter-path \"/home/username/\" --ignore-path \".*\\.log\" --ignore-path \".*\\.txt\"\n\n    \"\"\"\n\n    def run_command(command, description, env):\n        if debug:\n            print(command)\n        subprocess.run(command, stdin=sys.stdin,\n                       stdout=sys.stdout, stderr=sys.stderr, shell=True, check=True, env=env)\n\n    file_opts = (from_disk, to_disk, from_memory, to_memory)\n    disk_opts = (from_disk, to_disk)\n    memory_opts = (from_memory, to_memory)\n    snapshot_opts = (from_snapshot, to_snapshot)\n\n    if list_snapshots or (from_snapshot and to_snapshot):\n        if any(file_opts):\n            raise typer.BadParameter(\n                \"--list-snapshots and --from/to-snapshot cannot be specified with any of --from-disk, --to-disk, --from-memory, --to-memory\")\n        if list_snapshots:\n            table, _ = do_list_snapshots(input_dir)\n            print(table)\n            return\n    \n    # If no specific opts given, list the snapshots.\n    if not any(file_opts) and not any(disk_opts) and not any(memory_opts) and not any(snapshot_opts):\n        table, _ = do_list_snapshots(input_dir)\n        print(table)\n        return\n\n    if any(snapshot_opts):\n        if not all(snapshot_opts):\n            raise typer.BadParameter(\n                \"Need both --to-snapshot and --from-snapshot when using one.\")\n\n        _, snapshots = do_list_snapshots(input_dir)\n        from_disk = snapshots[from_snapshot][\"disk_filename\"]\n        to_disk = snapshots[to_snapshot][\"disk_filename\"]\n        from_memory = snapshots[from_snapshot][\"memory_filename\"]\n        to_memory = snapshots[to_snapshot][\"memory_filename\"]\n    else:\n        if any(disk_opts) and not all(disk_opts):\n            raise typer.BadParameter(\n                \"Need both --to-disk and --from-disk.\")\n        else:\n            from_disk = from_disk.name\n            to_disk = to_disk.name\n        if any(memory_opts) and not all(memory_opts):\n            raise typer.BadParameter(\n                \"Need both --to-memory and --from-memory.\")\n\n        else:\n            from_memory = from_memory.name\n            to_memory = to_memory.name\n\n    filter_path_json = json.dumps(filter_path)\n    ignore_path_json = json.dumps(ignore_path)\n\n    # Unset paths if not used, so config.py resolves USE_DISK and USE_MEMORY correctly.\n    if not use_disk:\n        from_disk = to_disk = \"\"\n    if not use_memory:\n        from_memory = to_memory = \"\"\n\n    # Convert to filenames, not file paths.\n    env_var_mapping = {\n        \"FROM_DISK_IMAGE_FILENAME\": from_disk,\n        \"TO_DISK_IMAGE_FILENAME\": to_disk,\n        \"FROM_MEMORY_IMAGE_FILENAME\": from_memory,\n        \"TO_MEMORY_IMAGE_FILENAME\": to_memory,\n        \"SNAPSHOT_DIR\": input_dir,\n        \"FILTER_PATH_JSON\": filter_path_json,\n        \"IGNORE_PATH_JSON\": ignore_path_json,\n        \"IGNORE_PROCESSES_REGEX\": ignore_processes,\n        \"PARTITION_IDENTIFIER\": partition,\n        \"USE_CACHE\": str(cache),\n        \"USE_DISK\": str(use_disk),\n        \"USE_MEMORY\": str(use_memory),\n        \"DIFF_IGNORE_BINARY\": str(not include_binary),\n        \"VMDIFF_DEV\": str(debug)\n    }\n\n    env = os.environ.copy()\n    env.update(env_var_mapping)\n\n    # Generate the docker compose run CLI args to mount the files.\n    volume_maps = [\n        f\"{input_dir}:/snapshots\"\n    ]\n\n    volume_args_list = []\n    for volume_map in volume_maps:\n        volume_args_list.append(\"-v\")\n        volume_args_list.append(shlex.quote(volume_map))\n\n    parts = \"docker compose --env-file .env run -i\".split(\n        \" \")\n    parts.extend(volume_args_list)\n\n    parts.extend([\"memdiff\"])\n    command = \" \".join(parts)\n    if use_memory and not show:\n        run_command(command, \"[green] :gear: Processing memory dump...\", env)\n\n    parts[-1] = \"vmdiff\"\n    command = \" \".join(parts)\n    if not show:\n        if use_disk:\n            message = \"[green] :gear: Reading and diffing virtual disks...\"\n        else:\n            message = \"[green] :gear: Diffing memory...\"\n        run_command(\n            command, message, env)\n\n        print(\"Now run with --show to display results in browser\")\n\n    if show:\n        command = \"docker compose --env-file .env up app\"\n        print(\"[green] :gear: Serving results on http://localhost:5000\")\n        run_command(\n            command, \"[green] :gear: Serving results on localhost:5000...\", env)\n\n\ndef do_list_snapshots(snapshot_dir):\n\n    contents = os.listdir(snapshot_dir)\n    vmsd_filename = None\n    for filename in contents:\n        if filename.endswith(\".vmsd\"):\n            vmsd_filename = filename\n\n    if vmsd_filename is None:\n        raise typer.BadParameter(\n            \"Couldn't find .vmsd file in input directory, so can't list snapshots.\")\n\n    vmsd_path = os.path.join(snapshot_dir, vmsd_filename)\n    vmsd = parse_vmsd(vmsd_path)\n\n    table = Table(title=f\"Found snapshots in {snapshot_dir}\")\n\n    table.add_column(\"ID\", style=\"bold\")\n    table.add_column(\"Parent ID\", style=\"bold\", max_width=6)\n    table.add_column(\"Creation time\", style=\"yellow\", no_wrap=True)\n    table.add_column(\"Disk file\", style=\"magenta\", )\n    table.add_column(\"Memory file\", style=\"magenta\")\n    table.add_column(\"Description\", style=\"green\")\n\n    # Sort snapshots by create time.\n    for sid, snapshot in sorted(vmsd.items(), key=lambda tup: tup[1].get(\"create_time\")):\n\n        create_time = snapshot.get(\"create_time\")\n        disk_filename = snapshot.get(\"disk_filename\")\n        memory_filename = snapshot.get(\"memory_filename\")\n\n        description = snapshot.get(\"displayName\")\n        table.add_row(sid, snapshot.get(\"parent\"), create_time,\n                      disk_filename, memory_filename, description)\n\n    return table, vmsd\n\n\ndef parse_vmsd(vmsd_path: os.PathLike):\n    def convert_time(low, high):\n        low = int(low)\n        high = int(high)\n        combinedTimeMsec = float(\n            (high * 2**32) + unpack('I', pack('i', low))[0])\n        combinedTimeSec = combinedTimeMsec / 1000000\n        timestamp = datetime.fromtimestamp(combinedTimeSec)\n\n        return timestamp.strftime('%Y-%m-%d %H:%M:%S')\n\n    with open(vmsd_path) as f:\n        lines = f.readlines()\n        sid2uid = {}\n        snapshots = {}\n        for line in lines:\n            # Ignore encoding.\n            if line.startswith(\".encoding\"):\n                continue\n\n            LINE = re.compile(r'(?P<key>(\\w+\\.?)+) = \"(?P<value>[^\"]+)\"')\n\n            match = re.search(LINE, line)\n            key = match.group(\"key\")\n\n            keys = key.split(\".\")\n            subkey = keys[-1]\n            value = match.group(\"value\")\n\n            # Ignore \"snapshot\" rather than \"snapshot0, snapshot1\", etc.\n            sid_match = re.match(r\"snapshot(\\d+)$\", keys[0])\n            if not sid_match:\n                continue\n            sid = sid_match.group(0)\n\n            if subkey == \"uid\":\n                uid = value\n                sid2uid[sid] = uid\n                snapshots[uid] = {}\n            else:\n                uid = sid2uid[sid]\n\n            if subkey == \"fileName\":\n                subkey = \"disk_filename\"\n            # It's fiiiiine.\n            if subkey == \"filename\":\n                subkey = \"memory_filename\"\n                # The .vmsd file lists memory dumps as .vmsn, but we're interested in the actual .vmem dumps.\n                value = value.replace(\".vmsn\", \".vmem\")\n\n            snapshots[uid][subkey] = value\n\n        for sid, snapshot in snapshots.items():\n\n            create_time = convert_time(\n                snapshot[\"createTimeLow\"], snapshot[\"createTimeHigh\"])\n            snapshot[\"create_time\"] = create_time\n        return snapshots\n\n\nif __name__ == \"__main__\":\n    typer.run(main)\n"
  }
]