Repository: mgeeky/msidump
Branch: main
Commit: 40833694ebba
Files: 4
Total size: 75.6 KB

Directory structure:
gitextract_xq7srhum/

├── README.md
├── msidump.py
├── requirements.txt
└── test-cases/
    └── README.md

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
# `msidump`

**MSI Dump** - a tool that analyzes malicious MSI installation packages, extracts files, streams and binary data, and incorporates a YARA scanner.

On macro-enabled Office documents we can quickly use [oletools mraptor](https://github.com/decalage2/oletools/blob/master/oletools/mraptor.py) to determine whether a document is malicious. If we want to dissect it further, we can bring in [oletools olevba](https://github.com/decalage2/oletools/blob/master/oletools/olevba.py) or [oledump](https://github.com/DidierStevens/DidierStevensSuite/blob/master/oledump.py).

To dissect malicious MSI files, so far we had only one tool, the reliable and trustworthy [lessmsi](https://github.com/activescott/lessmsi).
However, `lessmsi` doesn't implement the features I was looking for:

- Quick triage
- Binary data extraction
- YARA scanning

This is where `msidump` comes into play.


## Features

This tool helps with quick triage as well as detailed examination of corpora of malicious MSIs.
It lets us:

- Quickly determine whether a file is suspicious or not
- List all MSI tables as well as dump specific records
- Extract Binary data, all files from CABs, scripts from CustomActions
- Scan all inner data and records with YARA rules
- Use `file`/MIME type deduction to determine inner data type

It was created as a companion tool to the blog post I released here:

- [MSI Shenanigans. Part 1 - Offensive Capabilities Overview](https://mgeeky.tech/msi-shenanigans-part-1/)
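
The inner data type deduction from the feature list can be illustrated with a small sketch. The tool itself relies on the `magic` module for this; the snippet below is only a simplified stand-in that checks a few well-known leading byte signatures. The names `SIGNATURES` and `sniff_type` are hypothetical and do not exist in `msidump.py`.

```python
# Simplified stand-in for libmagic: map well-known leading bytes to a coarse label.
# SIGNATURES and sniff_type are illustrative names, not part of msidump.py.
SIGNATURES = (
    (b'MZ', 'PE executable'),
    (b'MSCF', 'MS Cabinet archive'),
    (b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1', 'OLE compound file'),
    (b'PK\x03\x04', 'ZIP archive'),
)

def sniff_type(data: bytes) -> str:
    """Return a coarse type label for a blob based on its leading bytes."""
    for magic_bytes, label in SIGNATURES:
        if data.startswith(magic_bytes):
            return label
    return 'unknown'

if __name__ == '__main__':
    print(sniff_type(b'MSCF\x00\x00\x00\x00'))
    print(sniff_type(b'Dim x\r\nSet x = CreateObject("WScript.Shell")'))
```

Signature checks like these only cover binary formats. Scripts such as VBScript or JScript have no fixed header, which is presumably why the tool falls back to keyword heuristics for them.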


### Limitations

- The program is still in an early alpha version, so things are expected to break and the triaging/parsing logic is expected to change
- Due to this tool's heavy reliance on Win32 COM `WindowsInstaller.Installer` interfaces, currently **it is not possible to support native Linux** platforms. Maybe `wine python msidump.py` could help, but I haven't tried that yet.


## Use Cases

1. Perform quick triage of a suspicious MSI augmented with a YARA rule:

```
cmd> python msidump.py evil.msi -y rules.yara
```

![1.png](img/1.png)

Here we can see that the input MSI is injected with suspicious **VBScript** and contains numerous executables.


2. Now we want to take a closer look at this VBScript by extracting only that record.

We see from the triage table that it was present in the `Binary` table. Let's get it:

```
python msidump.py putty-backdoored.msi -l binary -i UBXtHArj
```

We can specify which record to dump either by its name/ID or by its index number (here that would be 7).

![2.png](img/2.png)
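
The "name/ID or index number" selection can be sketched roughly as follows, assuming records are dictionaries whose first column holds the record name and that index numbers are 1-based. The helper name `select_record` and the sample data are hypothetical, used only for illustration.

```python
def select_record(records, selector):
    """Pick one record by 1-based index, or by the value of its first column."""
    try:
        index = int(selector)
    except ValueError:
        # Not a number: compare case-insensitively against the first column.
        wanted = str(selector).lower()
        for rec in records:
            values = list(rec.values())
            if values and str(values[0]).lower() == wanted:
                return rec
        return None

    if 1 <= index <= len(records):
        return records[index - 1]
    return None

if __name__ == '__main__':
    # Made-up sample rows standing in for entries of the Binary table.
    sample = [
        {'name': 'WixUI_Bmp_Banner', 'data': b'...'},
        {'name': 'UBXtHArj', 'data': b'...'},
    ]
    print(select_record(sample, '2'))
    print(select_record(sample, 'ubxtharj'))
```

Trying the numeric interpretation first means a record whose name consists only of digits cannot be selected by name in this sketch.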

Let's have a look at another example. This time there is an executable stored in the `Binary` table that will be executed during installation:

![3.png](img/3.png)

To extract that file we're going to use:

```
python msidump.py evil2.msi -x binary -i lmskBju -O extracted
```

Where:
- `-x binary` tells the tool to extract the contents of the `Binary` table
- `-i lmskBju` specifies exactly which record to extract
- `-O extracted` sets the output directory

![4.png](img/4.png)


For the best output experience, run the tool in a **maximized console window** or redirect output to a file:

```
python msidump.py [...] -o analysis.log
```

## Full Usage

```
PS D:\> python .\msidump.py --help
options:
  -h, --help            show this help message and exit

Required arguments:
  infile                Input MSI file (or directory) for analysis.

Options:
  -q, --quiet           Surpress banner and unnecessary information. In triage mode, will display only verdict.
  -v, --verbose         Verbose mode.
  -d, --debug           Debug mode.
  -N, --nocolor         Dont use colors in text output.
  -n PRINT_LEN, --print-len PRINT_LEN
                        When previewing data - how many bytes to include in preview/hexdump. Default: 128
  -f {text,json,csv}, --format {text,json,csv}
                        Output format: text, json, csv. Default: text
  -o path, --outfile path
                        Redirect program output to this file.
  -m, --mime            When sniffing inner data type, report MIME types

Analysis Modes:
  -l what, --list what  List specific table contents. See help message to learn what can be listed.
  -x what, --extract what
                        Extract data from MSI. For what can be extracted, refer to help message.

Analysis Specific options:
  -i number|name, --record number|name
                        Can be a number or name. In --list mode, specifies which record to dump/display entirely. In --extract mode dumps only this particular record to --outdir
  -O path, --outdir path
                        When --extract mode is used, specifies output location where to extract data.
  -y path, --yara path  Path to YARA rule/directory with rules. YARA will be matched against Binary data, streams and inner files

------------------------------------------------------

- What can be listed:
    --list CustomAction     - Specific table
    --list Registry,File    - List multiple tables
    --list stats            - Print MSI database statistics
    --list all              - All tables and their contents
    --list olestream        - Prints all OLE streams & storages.
                              To display CABs embedded in MSI try: --list _Streams
    --list cabs             - Lists embedded CAB files
    --list binary           - Lists binary data embedded in MSI for its own purposes.
                              That typically includes EXEs, DLLs, VBS/JS scripts, etc

- What can be extracted:
    --extract all           - Extracts Binary data, all files from CABs, scripts from CustomActions
    --extract binary        - Extracts Binary data
    --extract files         - Extracts files
    --extract cabs          - Extracts cabinets
    --extract scripts       - Extracts scripts

------------------------------------------------------
```

## TODO

- Triaging logic is still a bit flaky and I'm not very proud of it, so it will be subject to repeated redesigns and refinements
- Test it on a wider corpus of samples
- Add support for password-protected input ZIP archives
- Add support for ingesting an entire directory of YARA rules instead of working with a single file only
- Currently, the tool matches malicious `CustomAction Type`s by comparing them against known numbers, which is prone to evasion.
  - It needs to be reworked to properly consume the Type number and decompose it [into flags](https://learn.microsoft.com/en-us/windows/win32/msi/summary-list-of-all-custom-action-types)
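
The flag decomposition mentioned in the last item could look roughly like the sketch below. It is not part of the tool: the function name `decompose_custom_action_type` and the flag labels are made up for illustration, and the bit values follow the Windows Installer custom action option flags as I understand them (low six bits for the base type, higher bits for return processing, scheduling and in-script options).

```python
# Illustrative sketch only; names and labels are assumptions, not msidump.py code.
BASE_TYPE_MASK = 0x3F  # source + target kind, e.g. 6 = VBScript in Binary table

OPTION_FLAGS = {
    0x0040: 'continue (ignore return code)',
    0x0080: 'async',
    0x0800: 'no-impersonate',
    0x1000: '64-bit script',
    0x2000: 'hide target',
    0x4000: 'terminal-server aware',
    0x8000: 'patch uninstall',
}

IN_SCRIPT = 0x0400
SCHEDULING_MASK = 0x0300

def decompose_custom_action_type(type_number):
    """Split a CustomAction Type number into (base_type, [option labels])."""
    base = type_number & BASE_TYPE_MASK
    flags = [name for bit, name in OPTION_FLAGS.items() if type_number & bit]
    sched = type_number & SCHEDULING_MASK

    if type_number & IN_SCRIPT:
        # With the in-script bit set, bits 0x100/0x200 select rollback/commit.
        in_script_kinds = {0x000: 'deferred', 0x100: 'rollback', 0x200: 'commit'}
        flags.append(in_script_kinds.get(sched, 'in-script (unknown)'))
    elif sched:
        # Without it, the same bits are scheduling options for immediate actions.
        scheduling = {0x100: 'first-sequence', 0x200: 'once-per-process', 0x300: 'client-repeat'}
        flags.append(scheduling[sched])

    return base, flags

if __name__ == '__main__':
    for number in (1126, 3298, 226):
        print(number, decompose_custom_action_type(number))
```

With such a decomposition, triage could match on the base type alone (for example 38 for inline VBScript), so changing the option bits would no longer evade detection.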


## Tool's Name

Apparently, when naming my tool, I didn't think to check whether the name was already taken.
There is another tool named `msidump` that is part of the [msitools](https://gitlab.gnome.org/GNOME/msitools) GNOME package:

- [msidump](https://wiki.gnome.org/msitools)

---

### ☕ Show Support ☕

This and other projects are the outcome of sleepless nights and **plenty of hard work**. If you like what I do and appreciate that I always give back to the community,
[consider buying me a coffee](https://github.com/sponsors/mgeeky) _(or better, a beer)_ just to say thank you! 💪

---

```
Mariusz Banach / mgeeky, (@mariuszbit)
<mb [at] binary-offensive.com>
```


================================================
FILE: msidump.py
================================================
#!/usr/bin/python3
#
# Written by Mariusz Banach <mb@binary-offensive.com>, @mariuszbit / mgeeky
#

import sys
import os
import re
import glob
import pefile
import argparse
import hashlib
import random
import string
import tempfile
import textwrap
import cabarchive
import shutil
import atexit
import urllib.parse
from collections import OrderedDict
from textwrap import fill

if sys.platform != 'win32':
    print('\n\n[!] FATAL: This script can only be used in Windows system as it works with Win32 COM/OLE interfaces.\n\n')
    # Stop here: the pywin32 imports below would fail on non-Windows platforms anyway.
    sys.exit(1)

import pythoncom
import win32com.client
from win32com.shell import shell, shellcon
from win32com.client import constants

USE_SSDEEP = False

try:
    import ssdeep
    USE_SSDEEP = True
except:
    quiet = False
    # for a in sys.argv:
    #     if a == '-q' or a == '--quiet':
    #         quiet = True
    #         break
    # if not quiet:
    #     print("[!] 'ssdeep' not installed. Will not use it.")

try:
    import colorama
    import magic
    import yara
    import olefile
    from prettytable import PrettyTable

except ImportError as e:
    print(f'\n[!] Requirements not installed: {e}\n\tInstall them with:\n\tcmd> pip install -r requirements.txt\n')
    sys.exit(1)

#########################################################

VERSION = '0.2'

#########################################################

options = {
    'debug'     : False,
    'verbose'   : False,
    'format'    : 'text',
}

logger = None

try:
    colorama.init()
except:
    pass

class Logger:
    colors_map = {
        'red':      colorama.Fore.RED, 
        'green':    colorama.Fore.GREEN, 
        'yellow':   colorama.Fore.YELLOW,
        'blue':     colorama.Fore.BLUE, 
        'magenta':  colorama.Fore.MAGENTA, 
        'cyan':     colorama.Fore.CYAN,
        'white':    colorama.Fore.WHITE, 
        'grey':     colorama.Fore.WHITE,
        'reset':    colorama.Style.RESET_ALL,
    }
    
    def __init__(self, opts):
        self.opts = opts

    @staticmethod
    def colorize(txt, col):
        if type(txt) is not str:
            txt = str(txt)
        if not col in Logger.colors_map.keys() or options.get('nocolor', False):
            return txt
        return Logger.colors_map[col] + txt + Logger.colors_map['reset']

    @staticmethod
    def stripColors(txt):
        ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
        result = ansi_escape.sub('', txt)
        return result

    def fatal(self, txt):
        self.text('[!] ' + txt, color='red')
        sys.exit(1)

    def info(self, txt):
        self.text('[.] ' + txt, color='yellow')

    def err(self, txt):
        self.text('[-] ' + txt, color='red')

    def ok(self, txt):
        self.text('[+] ' + txt, color='green')

    def verbose(self, txt):
        if self.opts.get('verbose', False) or self.opts.get('debug', False):
            self.text('[>] ' + txt, color='cyan')

    def dbg(self, txt):
        if self.opts.get('debug', False):
            self.text('[dbg] ' + txt, color='magenta')

    def text(self, txt, color='none'):
        if color != 'none':
            txt = Logger.colorize(txt, color)

        if not self.opts.get('quiet', False):
            print(txt)


class MSIDumper:
    # https://learn.microsoft.com/pl-pl/windows/win32/msi/custom-action-return-processing-options?redirectedfrom=MSDN
    CustomActionReturnType = {
        'check' : 0,
        'ignore' : 64,
        'asyncWait' : 128,
        'asyncNoWait' : 192,
    }

    # https://learn.microsoft.com/en-us/windows/win32/msi/custom-action-execution-scheduling-options
    CustomActionExecuteType = {
        'always' : 0,
        'firstSequence' : 256,
        'oncePerProcess' : 512,
        'clientRepeat' : 768
    }

    #
    # https://learn.microsoft.com/en-us/windows/win32/msi/custom-action-in-script-execution-options
    # Deferred, rollback and commit custom actions can only be placed between InstallInitialize and InstallFinalize
    #
    CustomActionInScriptExecute = {
        'immediate' : 0,
        'deferred' : 1024,
        'rollback' : 1280,
        'commit' : 1536,
        'deferred-no-impersonate' : 3072,
        'rollback-no-impersonate' : 3328,
        'commit-no-impersonate' : 3584,
    }

    # https://learn.microsoft.com/en-us/windows/win32/msi/summary-list-of-all-custom-action-types
    CustomActionNativeTypes = {
        'dll-in-binary-table' : 1,
        'exe-in-binary-table' : 2,
        'jscript-in-binary-table' : 5,
        'vbscript-in-binary-table' : 6,
        'dll-installed-with-product' : 17,
        'exe-installed-with-product' : 18,
        'jscript-installed-with-product' : 21,
        'vbscript-installed-with-product' : 22,
        'exe-with-directory-path-in-target' : 34,
        'directory-set' : 35,
        'jscript-in-sequence-table' : 37,
        'vbscript-in-sequence-table' : 38,
        'exe-command-line' : 50,
        'jscript-with-funcname-in-property' : 53,
        'vbscript-with-funcname-in-property' : 54,
    }

    OpenMode = {
        'msiOpenDatabaseModeReadOnly' : 0,
        'msiOpenDatabaseModeTransact' : 1,
    }

    SkipColumns = (
        'extendedtype',
    )

    ListModes = (
        'all', 'olestream', 'cabs', 'binary', 'stats', 'olestreams',
    )

    ExtractModes = (
        'all', 'binary', 'files', 'cabs', 'scripts',
    )

    KnownCOMErrors = {
        0x80004005 : 'Could not process input database',
    }

    KnownTables = (
        'ActionText', 'AdminExecuteSequence', 'AdminUISequence', 'AdvtExecuteSequence', 'AdvtUISequence', 
        'AppId', 'AppSearch', 'BBControl', 'Billboard', 'Binary', 'BindImage', 'CCPSearch', 'CheckBox', 
        'Class', 'ComboBox', 'CompLocator', 'Complus', 'Component', 'Condition', 'Control', 'ControlCondition',
         'ControlEvent', 'CreateFolder', 'CustomAction', 'Dialog', 'Directory', 'DrLocator', 
         'DuplicateFile', 'Environment', 'Error', 'EventMapping', 'Extension', 'Feature', 'FeatureComponents', 
         'File', 'FileSFPCatalog', 'Font', 'Icon', 'IniFile', 'IniLocator', 'InstallExecuteSequence', 
         'InstallUISequence', 'IsolatedComponent', 'LaunchCondition', 'ListBox', 'ListView', 'LockPermissions', 
         'Media', 'MIME', 'MoveFile', 'MsiAssembly', 'MsiAssemblyName', 'MsiDigitalCertificate', 
         'MsiDigitalSignature', 'MsiEmbeddedChainer', 'MsiEmbeddedUI', 'MsiFileHash', 'MsiLockPermissionsEx', 
         'MsiPackageCertificate', 'MsiPatchCertificate', 'MsiPatchHeaders', 'MsiPatchMetadata', 'MsiPatchOldAssemblyFile', 
         'MsiPatchOldAssemblyName', 'MsiPatchSequence', 'MsiServiceConfig', 'MsiServiceConfigFailureActions', 
         'MsiSFCBypass', 'MsiShortcutProperty', 'ODBCAttribute', 'ODBCDataSource', 'ODBCDriver', 'ODBCSourceAttribute', 
         'ODBCTranslator', 'Patch', 'PatchPackage', 'ProgId', 'Property', 'PublishComponent', 'RadioButton', 
         'Registry', 'RegLocator', 'RemoveFile', 'RemoveIniFile', 'RemoveRegistry', 'ReserveCost', 'SelfReg', 
         'ServiceControl', 'ServiceInstall', 'SFPCatalog', 'Shortcut', 'Signature', 'TextStyle', 'TypeLib', 'UIText', 
         'Upgrade', 'Verb', '_Columns', '_Storages', '_Streams', '_Tables', '_TransformView', '_Validation',
    )

    ImportantTables = (
        'CustomAction', 'InstallExecuteSequence', '_Streams', 'Media', 'InstallUISequence', 'Binary', '_TransformView',
        'Component', 'Registry', 'Shortcut', 'RemoveFile', 'File',
    )

    SuspiciousTables = (
        'CustomAction', 'Binary', '_Streams', 
    )

    #
    # Approach based on assessing CustomAction Type numbers is prone to being evaded.
    # TODO: Rework it to properly consume Type number and decompose it onto flags:
    #  https://learn.microsoft.com/en-us/windows/win32/msi/summary-list-of-all-custom-action-types
    #
    CustomActionTypes = {
        'Execute' : {
            'color' : 'red',
            'types': (1250, 3298, 226),
            'desc' : 'Will execute system commands or other executables',
        },
        'VBScript' : {
            'color' : 'red',
            'types': (1126, 102),
            'desc' : 'Will run VBScript in-memory',
        }, 
        'JScript' : {
            'color' : 'red',
            'types': (1125, 101),
            'desc' : 'Will run JScript in-memory',
        },
        'Run-Exe' : {
            'color' : 'red',
            'types': (1218, 194),
            'desc' : 'Will extract executable from inner Binary table, drop it to:\n  C:\\Windows\\Installer\\MSIXXXX.tmp\nand then run it.',
        },
        'Load-DLL' : {
            'color' : 'red',
            'types': (65, ),
            'desc' : 'Will load DLL in memory and invoke its exported function.\nThat may also include .NET DLL',
        },
        'Run-Dropped-File' : {
            'color' : 'red',
            'types': (1746,),
            'desc' : 'Will run file extracted as a result of installation',
        },
        'Set-Directory' : {
            'color' : 'cyan',
            'types': (51,),
            'desc' : 'Will set Directory to a specific path',
        },
    }

    MimeTypesThatIncreasSuspiciousScore = (
        "application/hta",
        "application/js",
        "application/msword",
        "application/vnd.ms-excel",
        "application/vnd.ms-powerpoint",
        "application/vns.ms-appx",
        "application/x-ms-shortcut",
        "application/x-vbs",
        'application/vnd.ms-excel', 
        'application/vnd.openxmlformats-officedocument.presentationml.presentation', 
        'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', 
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'application/x-dosexec',
    )

    RecognizedInnerFileTypes = {
        'cabinet' : {
            'indicator' : 'MS Cabinet archive (.CAB)',
            'safe-extension' : '.cab',
            'color' : 'yellow',
            'magic' : ('Microsoft Cabinet',)
        },
        'executable' : {
            'indicator' : 'PE executable (EXE)',
            'safe-extension' : '.exe.bin',
            'color' : 'red',
            'magic' : (
                'executable (console)', 
                'executable (GUI)', 
            )
        },
        'dll' : {
            'indicator' : 'PE executable (DLL)',
            'safe-extension' : '.dll.bin',
            'color' : 'red',
            'magic' : (
                'executable (DLL)', 
            )
        },
        'unsure-executable' : {
            'indicator' : 'PE executable (?)',
            'safe-extension' : '.exe.bin',
            'color' : 'red',
            'min-keywords' : 3,
            'keywords' : (
                'This program', 'cannot be', 'run in', 'dos mode',
            ),
        },
        'unsure-cabinet' : {
            'indicator' : 'CAB archive (?)',
            'safe-extension' : '.cab',
            'color' : 'yellow',
            'min-keywords' : 1,
            'keywords' : (
                'MSCF',
            ),
        },
        'unsure-vbscript' : {
            'indicator' : 'VBScript (?)',
            'safe-extension' : '.vbs.bin',
            'color' : 'red',
            'printable' : True,
            'min-keywords' : 3,
            'keywords' : (
                'dim', 'function ', 'sub ', 'createobject', 'getobject', 'with', 'string',
                'object', 'set', 'then', 'end if', 'end function', 'end sub'
            ),
            'not-keywords' : (
                '<?xml',
            )
        },
        'unsure-jscript' : {
            'indicator' : 'JScript (?)',
            'safe-extension' : '.js.bin',
            'color' : 'red',
            'printable' : True,
            'min-keywords' : 3,
            'keywords' : (
                'var', 'activexobject', 'try {', 'try{', '}catch', '} catch', 'return ',
            'function ',
            ),
            'not-keywords' : (
            )
        }
    }

    DangerousExtensions = (
        '.lnk', '.exe', '.cpl', '.xll', '.url', '.vbs', '.ps1', '.bat', '.psm', 
        '.wsc', '.wsf', '.dll', '.js', '.vbe', '.jse', '.hta', '.msi', '.cmd',
    )

    TableSortBy = {
        'InstallExecuteSequence' : 2,
        'InstallUISequence' : 2,
        'File' : 7,
        'Feature' : 4,
        'Media' : 0,
    }

    DefaultTableWidth = 128

    def __init__(self, options, logger):
        self.options = options
        self.logger = logger
        self.disinfectionMode = False
        self.report = []
        self.infile = ''
        self.csvDelim = ','
        self.maxWidth = self.options.get('print_len', -1)
        self.format = self.options.get('format', 'text')
        self.errorsCache = set()
        self.nativedb = None
        self.outdir = ''
        self.verdict = f'[.] Verdict: {Logger.colorize("Benign", "green")}'
        self.installer = None
        self.extractedCount = 0
        self.grade = 0

        self.specificTableAlignment = {
            'stats' : {
                'type' : 'r',
                'value' : 'l',
            },
            'report' : {
                'description': 'l',
                'context': 'l',
            }
        }

    @staticmethod
    def isprintable(data):
        if type(data) is str:
            data = data.encode()
        for a in data:
            if a not in string.printable.encode():
                return False
        return True

    @staticmethod
    def fromHexdumpToRaw(txt):
        raw = []
        if not re.match(r'[0-9a-f]+ \| [0-9a-f]{2}.*', txt.split('\n')[0], re.I):
            return txt.encode()

        for line in txt.split('\n'):
            line = line.strip()

            if re.match(r'[0-9a-f]+ \| [0-9a-f]{2}.*', line, re.I):
                parts = line.split('|')
                bytesPart = parts[1].strip()

                for m in re.finditer(r'([0-9a-f]{2})', bytesPart, re.I):
                    raw.append(int(m.group(1), 16))
        return bytes(raw)

    @staticmethod
    def hexdump(data, addr = 0, num = 0):
        s = ''
        n = 0
        lines = []
        if num == 0: num = len(data)

        if len(data) == 0:
            return '<empty>'

        if type(data) is str:
            data = data.encode()

        for i in range(0, num, 16):
            line = ''
            line += '%04x | ' % (addr + i)
            n += 16

            for j in range(n-16, n):
                if j >= len(data): break
                line += '%02x ' % (int(data[j]) & 0xff)

            line += ' ' * (3 * 16 + 7 - len(line)) + ' | '

            for j in range(n-16, n):
                if j >= len(data): break
                c = data[j] if not (data[j] < 0x20 or data[j] > 0x7e) else '.'
                line += '%c' % c

            lines.append(line)
        return '\n'.join(lines)

    def parseCOMException(self, message, error, additional=''):
        # Convert the signed HRESULT values into their unsigned 32-bit representation
        code = error.hresult & 0xFFFFFFFF
        code2 = 0

        try:
            code2 = error.excepinfo[-1] & 0xFFFFFFFF
        except:
            pass

        if code2 != 0:
            if code in MSIDumper.KnownCOMErrors:
                additional += MSIDumper.KnownCOMErrors[code]

            if code2 in MSIDumper.KnownCOMErrors:
                additional += MSIDumper.KnownCOMErrors[code2]

            self.logger.err(f'''{message}:

    {error}

    HRESULT 1: 0x{code:08X}          <-- General exception code

    HRESULT 2: 0x{code2:08X}          <-- COM exception code. Google up that error number: 
                                        https://google.com/?q={urllib.parse.quote_plus(f"COM exception 0x{code2:08X}")}

    {additional}
''')

        else:
            if code in MSIDumper.KnownCOMErrors:
                additional += MSIDumper.KnownCOMErrors[code]

            self.logger.err(f'''{message}:

    {error}

    HRESULT: 0x{code:08X}          <-- General exception code

    {additional}
''')

    def open(self, infile):
        self.infile = os.path.abspath(os.path.normpath(infile))
        self.outdir = os.path.abspath(os.path.normpath(self.options.get('outdir', '')))

        if not os.path.isfile(self.infile):
            self.logger.fatal(f'Input file does not exist: {self.infile}')

        mode = MSIDumper.OpenMode['msiOpenDatabaseModeReadOnly']

        if self.disinfectionMode:
            self.logger.fatal('MSI Disinfection is not yet implemented.')
            mode = MSIDumper.OpenMode['msiOpenDatabaseModeTransact']

        self.initCOM()

        try:
            self.logger.dbg(f'Opening database {self.infile} ...')
            self.nativedb = self.installer.OpenDatabase(
                self.infile, 
                mode
            )

            return True

        except pythoncom.com_error as error:
            if self.options['debug']:
                self.parseCOMException(
                    message=f"Could not open MSI database natively via COM",
                    error=error
                )

            return False

    def close(self):
        if self.nativedb is not None:
            self.nativedb = None
        
        if self.installer is not None:
            try:
                self.installer.Release()
            except:
                pass

            self.installer = None

    def initCOM(self):
        if self.installer is not None:
            return

        try:
            #
            # Logic borrowed from:
            #   https://github.com/orestis/python/blob/master/Tools/msi/msilib.py#L60
            #

            self.logger.dbg('Initializing COM and instantiating WindowsInstaller.Installer ...')
            pythoncom.CoInitialize()

            win32com.client.gencache.EnsureModule('{000C1092-0000-0000-C000-000000000046}', 1033, 1, 0)

            self.installer = win32com.client.Dispatch(
                'WindowsInstaller.Installer',
                resultCLSID='{000C1090-0000-0000-C000-000000000046}'
            )

            if self.installer is None:
                self.logger.fatal('Could not instantiate WindowsInstaller.Installer!')

        except Exception as e:
            self.logger.fatal(f'Could not instantiate WindowsInstaller.Installer. Exception:\n\n\t{e}')

    def collectEntries(self, table, dontSort = False):
        entries = []

        try:
            entries = self._collectEntries(
                table, 
                dontSort
            )
        except Exception as e:
            self.logger.dbg(f'Error: Could not collect entries from table {table} (it may be missing or empty): {e}')

            if self.options.get('debug', False) and table.lower() != '_streams':
                raise

        return entries

    def _collectEntries(self, table, dontSort = False):
        assert self.nativedb is not None, "Database is not opened"
        entries = []

        view = self.nativedb.OpenView(f"SELECT * FROM {table}")
        view.Execute(None)

        types = view.ColumnInfo(constants.msiColumnInfoTypes)
        names = view.ColumnInfo(constants.msiColumnInfoNames)
        columns = []

        for i in range(1, types.FieldCount+1):
            t = types.StringData(i)
            n = names.StringData(i)

            if t[0] in 'slSL':
                columns.append((n, 'str'))
            elif t[0] in 'iI':
                columns.append((n, 'int'))
            elif t[0] == 'v':
                columns.append((n, 'bin'))
            else:
                self.logger.dbg(f'Unsupported column type: table {table}, column: {i}. Type: {t}, Name: {n}')
                columns.append((n, '?'))

        while True:
            r = view.Fetch() 
            if not r:
                break

            rec = OrderedDict()
            for i in range(1, r.FieldCount+1):
                val = None
                name = columns[i-1][0]

                if r.IsNull(i):
                    val = ''

                elif columns[i-1][1] == 'str': 
                    try:
                        val = r.StringData(i)

                    except Exception as e:
                        txt = f'Could not convert {table} column {columns[i-1][0]} value to string (type: {columns[i-1][1]}): {e}'
                        if txt not in self.errorsCache:
                            self.logger.dbg(txt)
                            self.errorsCache.add(txt)
                        val = ''

                elif columns[i-1][1] == 'int': 
                    try:
                        val = r.IntegerData(i)
                    except Exception as e:
                        txt = f'Could not convert {table} column {columns[i-1][0]} value to integer (type: {columns[i-1][1]}): {e}'
                        if txt not in self.errorsCache:
                            self.logger.dbg(txt)
                            self.errorsCache.add(txt)
                        val = 0

                elif columns[i-1][1] == 'bin': 
                    size = r.DataSize(i)
                    val = r.ReadStream(i, size, constants.msiReadStreamBytes)

                rec[columns[i-1][0].lower()] = val

            entries.append(rec)

        view.Close()

        if not dontSort and table in MSIDumper.TableSortBy:
            entries = sorted(entries, key=lambda x: list(x.values())[MSIDumper.TableSortBy[table]] )

        self.logger.dbg(f'Collected {len(entries)} entries from {table} ...')
        return entries

    def getMaxValueFromTable(self, table, columnNum):
        maxVal = -1
        entries = self.collectEntries(table)

        for entry in entries:
            if maxVal < entry[columnNum]:
                maxVal = entry[columnNum]

        return maxVal

    def analyse(self):
        assert self.nativedb is not None, "Database is not opened"

        try:
            ret = self.analysisWorker()

            if self.grade > 0:
                self.verdict = f'[.] Verdict: {Logger.colorize("SUSPICIOUS", "red")}'

            self.logger.verbose(f'Verdict grade: {self.grade}')

            return ret

        except Exception as e:
            if self.nativedb is not None:
                self.nativedb = None

            if self.options['debug']: 
                raise
            else:
                self.logger.err(f'Could not analyse input MSI. Enable --debug to learn more. Exception: {e}')

            return False

        finally:
            pass

    def listTable(self, table):
        if ',' in table:
            output = ''
            tables = table.split(',')
            for t in tables:
                output += f'{Logger.colorize("[+]", "green")} Listing: {Logger.colorize(t, "green")}\n\n'

                out = self._listTable(t)
                if out is not None:
                    output += str(out) + '\n'

            return output
        else:
            return self._listTable(table)

    def _listTable(self, table):
        assert self.nativedb is not None, "Database is not opened"

        records = None

        if table == 'streams':  table = '_Streams'
        if table == 'stream':   table = '_Streams'
        if table == 'binary':   table = 'Binary'
        if table == 'cabs':     table = 'Media'
        if table == 'olestreams':table = 'olestream'

        if table.lower() not in [x.lower() for x in MSIDumper.KnownTables + MSIDumper.ListModes]:
            tb = PrettyTable(['1','2','3'])
            tb.header = False
            vals = list(MSIDumper.KnownTables + MSIDumper.ListModes)
            i = 0
            while i + 3 < len(vals):
                tb.add_row([vals[i+0], vals[i+1], vals[i+2]])
                i += 3

            if i < len(vals):
                for j in range(len(vals)-i):
                    tb.add_row([vals[i+j], '', ''])

            self.logger.fatal(f'Unsupported --list setting: {table}\n    Pick one/combination of following --list values:\n\n{tb}\n')

        if table.lower() in [x.lower() for x in MSIDumper.KnownTables]:
            try:
                if table not in MSIDumper.KnownTables:
                    for t in MSIDumper.KnownTables:
                        if table.lower() == t.lower():
                            table = t
                            break

                index = self.options.get('record', -1)
                if index != -1:
                    records0 = self.collectEntries(table)

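                    try:
                        index = int(index)
                        # Record numbers are 1-based, so valid values are 1..len(records0)
                        if index < 1 or index > len(records0):
                            self.logger.fatal(f'Invalid --record specified. There were only {len(records0)} records returned from {table}.\n\t\tUse value between --record 1 and --record {len(records0)}')
                        records = [ records0[index-1], ]
                    except ValueError:
                        # --record was not a number, so look the record up by its name/ID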
                        records = []
                        for a in records0:
                            vals = list(a.values())
                            if len(vals) > 0 and vals[0].lower() == index.lower():
                                records.append(a)
                                break

                        if len(records) == 0:
                            self.logger.fatal(f'Invalid --record specified. Could not find {table} record entry based on its index number nor ID name.')
                else:
                    records = self.collectEntries(table)  
     
            except Exception as e:
                self.logger.err(f'Exception occurred while enumerating {table} entries: {e}')

                if self.options.get('debug', False):
                    raise
        else:
            table = table.lower()

            try:
                if table == 'stats':
                    records = self.collectStats()
                elif table == 'all':
                    return self.collectAll()
                elif table == 'olestream':
                    records = self.collectStreams()
                else:
                    self.logger.fatal(f'Unsupported --list setting: {table}')

            except Exception as e:
                self.logger.err(f'Exception occurred while pulling MSI metadata {table}: {e}')

                if self.options.get('debug', False):
                    raise

        if records is not None:
            self.tableSpecificHighlighting(table, records)
            return self.printTable(table, records)

        else:
            if table in MSIDumper.KnownTables:
                return f'No records found in {Logger.colorize(table, "green")} table.'
            else:
                return f'No {Logger.colorize(table, "green")} metadata was extracted.'

    def tableSpecificHighlighting(self, table, records):
        if table.lower() == 'customaction':
            for i in range(len(records)):
                rec = records[i]
                for k, v in rec.items():
                    if k == 'type':
                        col = ''
                        for a, b in MSIDumper.CustomActionTypes.items():
                            if v in b['types']:
                                col = b['color']
                                break
                        if col != '':
                            records[i][k] = Logger.colorize(v, col)
                            records[i]['source'] = Logger.colorize(records[i]['source'], col)

        if table.lower() == 'binary':
            for i in range(len(records)):
                records[i]['Magic type'] = self.sniffDataType(records[i]['data'], color=True)
        
    def extract(self, what):
        assert self.nativedb is not None, "Database is not opened"

        what = what.lower()

        if what == 'script':
            what = 'scripts'

        if what not in [x.lower() for x in MSIDumper.ExtractModes]:
            self.logger.fatal(f'Unsupported --extract setting: {what}')

        # os.path.abspath('') already resolves to the current directory, so pick the fallback before normalizing.
        outdir = self.options.get('outdir', '') or os.getcwd()
        self.outdir = os.path.normpath(os.path.abspath(outdir))

        os.makedirs(self.outdir, exist_ok=True)

        if what == 'all':
            return self.extractAll()
        elif what == 'binary':
            return self.extractBinary()
        elif what == 'files':
            return self.extractFiles()
        elif what == 'cabs':
            return self.extractCABs()
        elif what == 'scripts':
            return self.extractScripts()

    def extractAll(self):
        output = ''

        outs = self.extractBinary()
        if len(outs) > 0:
            output += outs + '\n'
        
        outs = self.extractFiles()
        if len(outs) > 0:
            output += outs + '\n'

        outs = self.extractCABs()
        if len(outs) > 0:
            output += outs + '\n'

        outs = self.extractScripts()
        if len(outs) > 0:
            output += outs + '\n'

        output += f'\nExtracted in total {self.extractedCount} objects.\n'

        return output

    def sanitizeName(self, name):
        windowsNames = (
            'CON', 'PRN', 'AUX', 'NUL', 'COM1', 'COM2', 
            'COM3', 'COM4', 'COM5', 'COM6', 'COM7', 
            'COM8', 'COM9', 'LPT1', 'LPT2', 'LPT3', 'LPT4', 
            'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9', 
        )

        for a in ('..', '\\', '/', '"', "'", '?', '*', ':'):
            name = name.replace(a, '')

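        # Reserved DOS device names are only a problem when they make up the whole stem;
        # removing them as substrings would mangle names such as "ICON" or "CONFIG".
        if name.split('.')[0].upper() in windowsNames:
            name = '_' + name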
        
        if len(name) == 0:
            name = 'bin-' + ''.join(random.choices(string.ascii_uppercase + string.digits, k=5))

        return name

    def extractBinary(self):
        binary = self.collectEntries('Binary')
        num = 0
        output = ''

        self.logger.verbose('Extracting data from Binary table...')

        if len(binary) == 0:
            self.logger.err('Input MSI does not contain any embedded Binary data.')

        for elem in binary:
            sniffed = self.sniffDataType(elem['data'])
            name = self.sanitizeName(elem['name']) + self.sniffDataExt(sniffed)
            outp = os.path.join(self.outdir, name)

            with open(outp, 'wb') as f:
                f.write(elem['data'].encode())

            num += 1
            output += f'\n{Logger.colorize("[+]","green")} Extracted {Logger.colorize(len(elem["data"]),"green")} bytes of {Logger.colorize(elem["name"],"green")} object to: {Logger.colorize(outp,"yellow")}'

        self.extractedCount += num
        if num > 0 and self.options.get('extract', '') != 'all':
            output += f'\n\nExtracted in total {num} objects.\n'

        return output

    def extractCab(self, infile, outdir, files):
        with open(infile, "rb") as f:
            arc = cabarchive.CabArchive(f.read())

        self.logger.verbose('Extracting Cabinets from MSI...')

        output = f'Extracting files from CAB ({infile}):\n\n'
        num = 0

        for k, v in arc.items():
            fn = v.filename

            for _file in files:
                if fn == _file['file']:
                    fn = _file['filename']

            p, ext = os.path.splitext(fn)
            if ext.lower() in MSIDumper.DangerousExtensions:
                fn += '.bin'

            lp = os.path.join(outdir, fn)

            # lp already contains outdir, so its parent directory is all that is needed.
            lp1 = os.path.dirname(lp)
            if not os.path.isdir(lp1):
                output += f'\t{Logger.colorize("[+]","green")} Creating temp dir: {lp1}\n'
                os.makedirs(lp1, exist_ok=True)

            output += f'{Logger.colorize("[+]","green")} {v.filename:20} => {lp}\n'
            with open(lp, 'wb') as f:
                f.write(v.buf)
                num += 1

        return num, output

    def extractFiles(self, overrideOutdir=''):
        outdir = self.outdir
        if len(overrideOutdir) > 0:
            dirpath = overrideOutdir
        else:
            dirpath = tempfile.mkdtemp()

        self.outdir = dirpath
        self.extractCABs()
        self.outdir = outdir

        self.logger.verbose('Extracting files from MSI...')

        cabsNum = 0
        num = 0
        output = ''
        files = self.collectEntries('File')

        path = os.path.join(dirpath, '*.cab')
        for cab in glob.glob(path, recursive=True):
            # glob.glob() already returns the full path of each match.
            cabPath = cab
            cabsNum += 1
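            # The temporary directory is removed below, so only extract into dirpath
            # when the caller chose it; otherwise use the real output directory.
            target = dirpath if len(overrideOutdir) > 0 else outdir
            outp = os.path.join(target, os.path.splitext(os.path.basename(cabPath))[0])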

            try:
                num0, output0 = self.extractCab(cabPath, outp, files)
                num += num0
                output += output0

            except Exception as e:
                self.logger.err(f'Could not extract files from CABinet: {cabPath}. Error: {e}')
                if self.options.get('debug', False):
                    raise
            finally:
                if os.path.isfile(cabPath):
                    os.remove(cabPath)

        if dirpath != overrideOutdir:
            shutil.rmtree(dirpath)

        self.extractedCount += num
        if num > 0 and self.options.get('extract', '') != 'all':
            output += f'\nExtracted in total {num} files from {cabsNum} cabinets.\n'

        return output

    def extractCABs(self):
        binary = self.collectEntries('Binary')
        num = 0
        output = ''

        if len(binary) == 0:
            self.logger.err('Input MSI does not contain any embedded Binary data.')

        for elem in binary:
            sniffed = self.sniffDataType(elem['data'])
            if '.cab' not in sniffed.lower():
                continue

            name = self.sanitizeName(elem['name']) + '.cab'
            outp = os.path.join(self.outdir, name)

            with open(outp, 'wb') as f:
                f.write(elem['data'].encode())

            num += 1

        # source: https://github.com/decalage2/oletools/blob/master/oletools/oledir.py#L245
        ole = olefile.OleFileIO(self.infile)
        for entry in ole.listdir():
            name = entry[-1]
            name = repr(name)[1:-1]
            entry_id = ole._find(entry)
            try:
                size = ole.get_size(entry)
            except:
                size = '-'

            data0 = ole.openstream(entry).getvalue()
            data = data0.decode(errors='ignore')

            sniffed = self.sniffDataType(data)
            if '.cab' not in sniffed.lower():
                continue

            fname = f'ole-stream-{entry_id}.cab'
            outp = os.path.join(self.outdir, fname)

            with open(outp, 'wb') as f:
                f.write(data0)

            num += 1
            # Report the OLE stream itself; `elem` belongs to the Binary table loop above.
            output += f'\n{Logger.colorize("[+]","green")} Extracted {Logger.colorize(len(data0), "green")} bytes of {Logger.colorize(name,"green")} OLE stream to: {Logger.colorize(outp,"yellow")}'

        self.extractedCount += num
        if num > 0 and self.options.get('extract', '') != 'all':
            output += f'\n\nExtracted in total {num} objects.\n'

        return output

    def extractScripts(self):
        binary = self.collectEntries('Binary')
        actions = self.collectEntries('CustomAction')
        num = 0
        output = ''

        self.logger.verbose('Extracting scripts from CustomAction and Binary tables...')

        if len(binary) == 0:
            self.logger.err('Input MSI does not contain any embedded Binary data.')

        for elem in actions:
            sniffed = self.sniffDataType(elem['target'])
            if 'vbscript' not in sniffed.lower() and 'jscript' not in sniffed.lower():
                continue

            name = self.sanitizeName(elem['action'])
            outp = os.path.join(self.outdir, name) + self.sniffDataExt(sniffed)

            with open(outp, 'wb') as f:
                f.write(elem['target'].encode())

            num += 1
            output += f'\n{Logger.colorize("[+]","green")} Extracted {Logger.colorize(len(elem["target"]),"green")} bytes of {Logger.colorize(elem["action"],"green")} CustomAction script to: {Logger.colorize(outp,"yellow")}'

        for elem in binary:
            sniffed = self.sniffDataType(elem['data'])
            if 'vbscript' not in sniffed.lower() and 'jscript' not in sniffed.lower():
                continue
                
            name = self.sanitizeName(elem['name'])
            outp = os.path.join(self.outdir, name) + self.sniffDataExt(sniffed)

            with open(outp, 'wb') as f:
                f.write(elem['data'].encode())

            num += 1
            output += f'\n{Logger.colorize("[+]","green")} Extracted {Logger.colorize(len(elem["data"]),"green")} bytes of {Logger.colorize(elem["name"],"green")} binary object script to: {Logger.colorize(outp,"yellow")}'

        self.extractedCount += num
        if num > 0 and self.options.get('extract', '') != 'all':
            output += f'\n\nExtracted in total {num} objects.\n'

        return output

    def formatTable(self, tbl, table, records):
        if self.maxWidth > -1 and len(records) > 0:
            for k in records[0].keys():
                tbl._max_width[k] = self.maxWidth

        tbl.align['YARA Results'] = 'l'

        if table.lower() in self.specificTableAlignment.keys():
            for k, v in self.specificTableAlignment[table.lower()].items():
                tbl.align[k] = v

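        # Look the table up case-insensitively so the membership test and the indexing agree.
        sortBy = {k.lower(): v for k, v in MSIDumper.TableSortBy.items()}
        if table.lower() in sortBy and len(records) > 0:
            tbl.sortby = list(records[0].keys())[sortBy[table.lower()]]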

        return tbl

    def collectAll(self):
        output = ''

        self.logger.info('Dumping all MSI tables...')

        for table in MSIDumper.KnownTables:
            recs = self.collectEntries(table)

            if not self.options.get('verbose', False) and len(recs) == 0 and table not in MSIDumper.ImportantTables:
                continue

            output += '\n\n'
            output += Logger.colorize(f'===============[ {table} : {len(recs)} records ]===============', 'green')
            output += '\n\n'
            output += self.printTable(table, recs)

        return output
    
    def collectStreams(self):
        records = []

        ole = olefile.OleFileIO(self.infile)
        for entry in ole.listdir(storages=True):
            name = entry[-1]
            name = repr(name)[1:-1]
            entry_id = ole._find(entry)
            try:
                size = ole.get_size(entry)
            except:
                size = '-'
            typeid = ole.get_type(entry)
            clsid = ole.getclsid(entry)
            
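            # listdir(storages=True) also yields storages, which cannot be opened as streams.
            sniffed = ''
            if typeid == olefile.STGTY_STREAM:
                data0 = ole.openstream(entry).getvalue()
                data = data0.decode(errors='ignore')
                sniffed = self.sniffDataType(data, color=True)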

            records.append({
                'entry_id' : entry_id,
                'data type' : sniffed,
                'name' : Logger.colorize(name, 'yellow'),
                'size' : size,
                'typeid' : typeid,
                'CLSID' : clsid,
            })

        return sorted(records, key=lambda x: x['entry_id'])

    def collectStats(self):
        records = []
        hashes = (
            'md5', 'sha1', 'sha256', 'ssdeep'
        )

        self.logger.info('Computing MSI file hashes...')

        with open(self.infile, 'rb') as f:
            data = f.read()

            for h in hashes:
                if h == 'ssdeep':
                    if USE_SSDEEP:
                        hsh = ssdeep.hash(data)
                    else:
                        hsh = 'err: ssdeep module not installed'
                else:
                    m = hashlib.new(h)
                    m.update(data)
                    hsh = m.hexdigest()

                records.append({
                    'type' : Logger.colorize(f'Hash {h}', 'cyan'),
                    'value' : Logger.colorize(hsh, 'cyan'),
                })

        del data

        self.logger.info('Collecting MSI tables stats...')

        for table in MSIDumper.KnownTables:
            recs = self.collectEntries(table)
            val = f'{len(recs)} records'

            if table in MSIDumper.SuspiciousTables:
                table = Logger.colorize(table, 'red')
                val = Logger.colorize(val, 'red')

            elif table in MSIDumper.ImportantTables:
                table = Logger.colorize(table, 'yellow')
                val = Logger.colorize(val, 'yellow')

            else:
                if len(recs) == 0 and not self.options.get('verbose', False):
                    continue

            records.append({
                'type' : table,
                'value' : val,
            })

        return records

    def analysisWorker(self):
        self.processActions()
        self.lookForIOCs()

        return self.printReport()

    def normalizeDataForOutput(self, val, num=0, table=''):
        if num == 0:
            num = self.options.get('print_len', MSIDumper.DefaultTableWidth)

        if num != -1:
            val = val[:num]

        printable = MSIDumper.isprintable(val)

        if not printable and table not in ('olestream', ):
            printable2 = MSIDumper.isprintable(Logger.stripColors(val))
            if not printable2:
                val = MSIDumper.hexdump(val) + '\n'

        return val
    
    def cleanString(self, txt):
        txt = txt.replace('\r', '')
        txt = txt.replace('\t', '  ')

        if self.options.get('format', 'text') in ('csv', 'json'):
            txt = Logger.stripColors(txt)
            txt = ''.join(filter(lambda x: x in string.printable, txt))
            txt = txt.replace('\n', ' ')
            # The fourth positional argument of re.sub() is `count`, not `flags`.
            txt = re.sub(r'\s+', ' ', txt)
        
        return txt

    def printTable(self, table, records):
        if len(records) == 0:
            return f'\n\nNo records found in table {Logger.colorize(table, "green")}.'

        yaraColumn = ''
        self.logger.dbg(f'Dumping {table} table results...')

        rules = None
        if len(self.options.get('yara', '')) > 0 and table != 'YARA Results':
            yaraColumn = 'YARA Results'
            matchesReport = []
            rules = self.initYara()

        # --record may be an index (int) or an ID name (str); avoid calling len() on an int.
        record = self.options.get('record', -1)
        if len(records) == 1 and record not in (-1, None, ''):
            output = ''

            for k, v in records[0].items():
                k0 = Logger.colorize(k, "green")
                output += f'\n- {k0:20} : '

                if type(v) is str:
                    v = self.normalizeDataForOutput(v, -1, table=table)

                    if len(v) < 50:
                        output += v
                    else:
                        spacer = Logger.colorize('=' * MSIDumper.DefaultTableWidth, 'yellow')
                        output += '\n\n' + spacer + '\n\n' + v + '\n\n' + spacer + '\n'
                else:
                    output += str(v)

                if table.lower() in ('binary', ):
                    output += '\n'

            output += '\n'

            if len(yaraColumn) > 0:
                k0 = Logger.colorize(yaraColumn, "green")
                output += f'\n- {k0:20} : '

                for k, v in records[0].items(): 
                    if type(v) is not str:
                        continue
                    matches = rules.match(data = v)
                    if matches:
                        ms = ''
                        for m in matches:
                            ms += f'- {m.rule}\n'
                        output += Logger.colorize(f'YARA rule match on column {k}:', 'green') + '\n' + ms + '\n'
        else:
            output = ''
            numCol = ['#',]
            yarCol = []
            if table == 'olestream':
                numCol = []

            if len(yaraColumn) > 0:
                yarCol = [yaraColumn, ]

            tbl = PrettyTable(numCol + list(records[0].keys()) + yarCol)
            num = 0

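            index = self.options.get('record', -1)
            if index != -1:
                # --record may also be an ID name, in which case numbering starts from 1.
                try:
                    num = int(index) - 1
                except (TypeError, ValueError):
                    num = 0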

            tbl = self.formatTable(tbl, table, records)

            for rec in records:
                num += 1
                vals = []
                i = 0
                for v in [num, ] + list(rec.values()):
                    if i == 0 and 'entry_id' in rec.keys():
                        i += 1
                        continue
                    if type(v) is str:
                        v = self.normalizeDataForOutput(v, table=table)
                        s = self.cleanString(v).strip()
                        n = ''

                        if table.lower() in ('binary', ):
                            n = '\n'

                        vals.append(s + n)
                    else:
                        vals.append(v)
                    i += 1

                if len(yaraColumn) > 0:
                    i = 0
                    val = ''
                    for v in list(rec.values()): 
                        if type(v) is not str:
                            i += 1
                            continue
                        matches = rules.match(data = v)
                        if matches:
                            ms = ''
                            for m in matches:
                                ms += f'- {m.rule}\n'
                            k = list(rec.keys())[i]
                            val += Logger.colorize(f'YARA rule match on column {k}:', 'green') + '\n' + ms + '\n'
                        i += 1
                    vals.append(val)

                if self.options['format'] == 'csv':
                    tbl.add_row([str(x).replace(self.csvDelim, '') for x in vals])
                else:
                    tbl.add_row(vals)

            if self.options['format'] == 'text':
                output += str(tbl)

                if table != 'YARA Results':
                    output += f'\n\n[.] Found {Logger.colorize(str(len(records)), "green")} records in {Logger.colorize(table, "green")} table.'

                output += '\n'

            elif self.options['format'] == 'json':
                output += str(tbl.get_json_string())
            
            elif self.options['format'] == 'csv':
                output += str(tbl.get_csv_string(delimiter=self.csvDelim, escapechar='\\'))
            
            # elif self.options['format'] == 'html':
            #     output += str(tbl.get_html_string())
            
        return output

    def printReport(self):
        output = ''
        cols = [
            '#',
            'threat',
            'location',
            'context',
            'description'
        ]
        tbl = PrettyTable(cols)
        tbl = self.formatTable(tbl, 'report', self.report)

        num = 0

        for report in self.report:
            num += 1
            rec = [
                num,
                report['name'],
                report['location'],
                report['context'],
                report['desc'],
            ]
            vals = []
            for v in rec:
                if type(v) is str:
                    vals.append(self.cleanString(v))
                else:
                    vals.append(v)

            if self.options['format'] == 'csv':
                tbl.add_row([str(x).replace(self.csvDelim, '') for x in vals])
            else:
                tbl.add_row(vals)

        if self.options['format'] == 'text':
            output += str(tbl)

        elif self.options['format'] == 'json':
            output += str(tbl.get_json_string())
        
        elif self.options['format'] == 'csv':
            output += str(tbl.get_csv_string(delimiter=self.csvDelim, escapechar='\\'))
        
        # elif self.options['format'] == 'html':
        #     output += str(tbl.get_html_string())

        return output

    def printRecord(self, rec, indent=''):
        out = ''
        keyLen = -1

        if type(rec) is str:
            return rec

        for k, v in rec.items():
            if len(k) > keyLen:
                keyLen = len(Logger.colorize(k, 'yellow')) + 1

        if self.format == 'text':
            for k, v in rec.items():
                if k.lower() in MSIDumper.SkipColumns:
                    continue

                if type(v) is str or type(v) is bytes:
                    printable = MSIDumper.isprintable(v)

                    if not printable and v[0] != '\x1b':
                        v = '\n\n' + MSIDumper.hexdump(v) + '\n'

                    if self.options.get('record', -1) == -1 and len(v) > 256: 
                        v = '\n\n' + v[:256].strip() + '\n\t[CUT FOR BREVITY]\n'

                k = Logger.colorize(k, 'yellow')
                out += indent + f'- {k:{keyLen}}: {v}\n'

        elif self.format == 'csv':
            out = self.csvDelim.join([str(x).replace(self.csvDelim, '')[:self.maxWidth] for x in rec.values()])

        return out

    @staticmethod
    def isValidPE(data):
        pe = None
        try:
            pe = pefile.PE(data=data.encode(), fast_load=True)
            _format = MSIDumper.RecognizedInnerFileTypes['executable']['indicator']

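            # The DLL flag lives in FILE_HEADER.Characteristics. DllCharacteristics is
            # non-zero for most modern EXEs too (ASLR, DEP), so it cannot tell them apart.
            if pe.is_dll():
                _format = MSIDumper.RecognizedInnerFileTypes['dll']['indicator']

            # The PE object is closed in the finally block below.
            return (True, _format)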
        except pefile.PEFormatError as e:
            logger.dbg(f'pefile error: {e}')
            return (False, '')
        finally:
            if pe:
                pe.close()

    def sniffDataExt(self, sniffed):
        for k, v in MSIDumper.RecognizedInnerFileTypes.items():
            if v['indicator'].lower() == sniffed.lower():
                return MSIDumper.RecognizedInnerFileTypes[k]['safe-extension']

        return ''

    def gradeFoundIndicator(self, indicator, data='', color='', mime=''):
        if color != '':
            if color == 'red':
                return 1
        
        if mime != '' and mime.lower() in MSIDumper.MimeTypesThatIncreasSuspiciousScore:
            return 1

        return 0

    def sniffDataType(self, data, color=False):
        mime = self.options.get('mime', False)
        magicOut = 'data'
        try:
            magicOut = magic.from_buffer(data, mime=mime)
        except Exception as e:
            self.logger.dbg(f'Magic failed fingerprinting data: {e}')

        pe, petype = MSIDumper.isValidPE(data)
        if pe:
            # Default to the PE type so `indicator` is always bound.
            indicator = petype
            if mime and magicOut in ('data', 'application/octet-stream'):
                indicator = 'application/x-dosexec'
            if color:
                indicator = Logger.colorize(petype, 'red')
            self.grade += self.gradeFoundIndicator(indicator, data, color='red')
            return indicator

        for format, predicate in MSIDumper.RecognizedInnerFileTypes.items():
            indicator = predicate.get('indicator', '')
            predColor = predicate.get('color', '')

            if format == 'unsure-executable':
                if data[:2] != 'MZ' and data[:2] != 'ZM':
                    continue
            elif format == 'unsure-cabinet':
                if data[:4] != 'MSCF':
                    continue

            if mime:
                indicator = magicOut

            if color:
                indicator = Logger.colorize(indicator, predColor)
                
            magicVals = predicate.get('magic', [])
            if len(magicVals) > 0:
                for m in magicVals:
                    if m.lower() in magicOut.lower():
                        self.grade += self.gradeFoundIndicator(indicator, data, color=predColor)
                        return indicator

            keywords = predicate.get('keywords', [])
            minkeywords = predicate.get('min-keywords', 0)
            
            printable = predicate.get('printable', 0)
            printableMet = False
            if printable:
                if MSIDumper.isprintable(data):
                    printableMet = True

            if printable and not printableMet:
                continue

            if len(keywords) > 0 and minkeywords > 0:
                skip = False
                found = 0
                for keyword in keywords:
                    if re.search(r'\b' + re.escape(keyword) + r'\b', data, re.I):
                        found += 1

                if found >= minkeywords:
                    foundNots = 0
                    notkeywords = predicate.get('not-keywords', [])

                    if len(notkeywords) > 0:
                        for keyword in notkeywords:
                            if re.search(r'\b' + re.escape(keyword) + r'\b', data, re.I):
                                foundNots += 1

                    if foundNots == 0:
                        self.grade += self.gradeFoundIndicator(indicator, data, color=predColor)
                        return indicator

        if magicOut == 'data':
            return ''

        return magicOut

    def lookForIOCs(self):
        binary = self.collectEntries('Binary')
        customActions = self.collectEntries('CustomAction')
        i = 0

        streams = self.collectEntries('_Streams')
        if len(streams) == 0:
            self.report.append({
                'name' : Logger.colorize('Missing _Streams', 'yellow'),
                'location' : f'_Streams table',
                'context' : '',
                'desc' : f'Typically MSIs contain a _Streams table referring to .CAB archives.\nThis sample, however, did not contain such a table, making it unusual/mangled.\n',
            })

        for data in binary:
            i += 1
            sniffed = self.sniffDataType(data['data'], color=True)

            if len(sniffed) > 0:
                data['size'] = len(data['data'])
                runByCa = False
                desc = ''

                # Use a separate counter so the Binary table index `i` is not clobbered.
                caNum = 0
                for ca in customActions:
                    caNum += 1
                    if ca['source'] == data['name']:
                        runByCa = True
                        desc = f'\nThat data will be used during installation by CustomAction {Logger.colorize(caNum, "yellow")}. {Logger.colorize(ca["action"], "yellow")}'
                        break

                if not runByCa:
                    self.grade -= 1
                    sniffed = Logger.stripColors(sniffed)
                    sniffed = Logger.colorize(sniffed, 'yellow')
                    desc = '\nHowever that data doesn\'t seem to be used in CustomActions, decreasing impact.'

                self.report.append({
                    'name' : sniffed,
                    'location' : f'Binary table',
                    'context' : self.printRecord(data),
                    'desc' : f'MSI contains {sniffed} data in Binary table entry {Logger.colorize(str(i), "yellow")}. {Logger.colorize(data["name"], "yellow")}' + desc,
                })

    def processActions(self):
        actions = self.collectEntries('CustomAction')
        execSeq = self.collectEntries('InstallExecuteSequence')
        uiSeq = self.collectEntries('InstallUISequence')

        for action in actions:
            self.logger.dbg(f'Parsing CustomAction {action["action"]} ...')

            for suspAction, data in MSIDumper.CustomActionTypes.items():
                if action['type'] in data['types']:
                    desc = data['desc']
                    color = MSIDumper.CustomActionTypes[suspAction].get('color', 'white')

                    fieldToHighlight = ''

                    if 'vbscript' in suspAction.lower() or 'jscript' in suspAction.lower():
                        if len(action['source']) > 0:
                            fieldToHighlight = 'source'
                            self.grade += self.gradeFoundIndicator(suspAction, color=color)
                            desc += f".\nScript is located in {Logger.colorize(action['source'],'yellow')} Binary table record."

                    elif 'run-dll' in suspAction.lower():
                        fieldToHighlight = 'source'
                        self.grade += self.gradeFoundIndicator(suspAction, color=color)
                        desc += f".\nDLL is located in {Logger.colorize(action['source'],'yellow')} Binary table record."
                    
                    elif 'run-exe' in suspAction.lower():
                        fieldToHighlight = 'source'
                        self.grade += self.gradeFoundIndicator(suspAction, color=color)
                        desc += f".\nEXE is located in {Logger.colorize(action['source'],'yellow')} Binary table record."

                    elif 'set-directory' in suspAction.lower():
                        fieldToHighlight = 'target'

                    elif 'execute' in suspAction.lower():
                        fieldToHighlight = 'target'
                        self.grade += self.gradeFoundIndicator(suspAction, color=color)
                        desc += f".\nCommand that will be executed:\ncmd> {Logger.colorize(action['target'],'red')}"

                    foundInSeq = False
                    for seq in execSeq:
                        if seq['action'] == action['action']:
                            foundInSeq = True
                            cond = ''
                            if len(seq['condition']) > 0:
                                cond = f" with condition:\n- {Logger.colorize(seq['condition'],'yellow')}"

                            desc += f"\nThat action is scheduled to run in {Logger.colorize('InstallExecuteSequence','yellow')} table" + cond + '\n'
                            break

                    for seq in uiSeq:
                        if seq['action'] == action['action']:
                            foundInSeq = True
                            cond = ''
                            if len(seq['condition']) > 0:
                                cond = f" with condition:\n- {Logger.colorize(seq['condition'],'yellow')}"

                            desc += f"\nThat action is scheduled to run in {Logger.colorize('InstallUISequence','yellow')} table" + cond + '\n'
                            break

                    if not foundInSeq:
                        self.grade -= 1
                        color = 'yellow'
                        desc += '\nHowever that action doesn\'t seem to be invoked anywhere, decreasing impact.'

                    if len(fieldToHighlight) > 0:
                        action[fieldToHighlight] = Logger.colorize(action[fieldToHighlight], color)

                    self.report.append({
                        'name' : Logger.colorize(suspAction, color),
                        'location' : f'CustomAction table',
                        'context' : self.printRecord(action),
                        'desc' : desc
                    })
                    break

    def initYara(self):
        yaraPath = self.options.get('yara', '')
        if len(yaraPath) == 0:
            return None

        yaraPath = os.path.abspath(os.path.normpath(yaraPath))

        if not os.path.isfile(yaraPath) and not os.path.isdir(yaraPath):
            self.logger.fatal(f'Specified --yara path does not exist.')

        rules = None
        try:
            rules = yara.compile(yaraPath)
        except Exception as e:
            self.logger.fatal(f'Could not compile YARA rules. Exception: {e}')

        return rules

    def yaraScan(self, scanBinary=True, scanActions=True, scanFiles=True):
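        matchesReport = []
        output = ''
        rules = self.initYara()
        if rules is None:
            # No --yara path was supplied, so there is nothing to scan with.
            return output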

        if scanBinary:
            binary = self.collectEntries('Binary')
            output = ''

            if len(binary) > 0:
                i = 0
                for elem in binary:
                    i += 1
                    matches = rules.match(data = elem['data'].encode())
                    if matches:
                        matchesReport.append({
                            'where' : f'Binary record {Logger.colorize(str(i), "yellow")}. {Logger.colorize(elem["name"], "yellow")}',
                            'rules' : '\n'.join([x.rule for x in matches])
                        })

        if scanActions:
            actions = self.collectEntries('CustomAction')
            output = ''

            if len(actions) > 0:
                i = 0
                for elem in actions:
                    sniffed = self.sniffDataType(elem['target'])
                    if 'vbscript' not in sniffed.lower() and 'jscript' not in sniffed.lower():
                        continue
                    i += 1
                    # CustomAction records carry 'action' and 'target' fields, not 'name' and 'data'.
                    matches = rules.match(data = elem['target'])
                    if matches:
                        matchesReport.append({
                            'where' : f'CustomAction record {Logger.colorize(str(i), "yellow")}. {Logger.colorize(elem["action"], "yellow")}',
                            'rules' : '\n'.join([x.rule for x in matches])
                        })

        if scanFiles:
            # Pre-set dirpath so the finally block is safe even if mkdtemp() raises.
            dirpath = ''
            try:
                dirpath = tempfile.mkdtemp()
                self.logger.verbose(f'Extracting all files from MSI into temp dir: {dirpath} ...')

                out = self.extractFiles(overrideOutdir = dirpath)

                for _file in glob.glob(os.path.join(dirpath, '**/*.*'), recursive=True):
                    path = os.path.join(dirpath, _file)

                    matches = rules.match(path)
                    if matches:
                        matchesReport.append({
                            'where' : f'File extracted from MSI: {Logger.colorize(os.path.basename(path), "yellow")}',
                            'rules' : '\n'.join([x.rule for x in matches])
                        })

            except Exception as e:
                self.logger.err(f'Could not extract files from MSI for YARA scanning. Exception: {e}')
                if self.options.get('debug', False):
                    raise

            finally:
                if os.path.isdir(dirpath):
                    shutil.rmtree(dirpath)

        if len(matchesReport) > 0:
            output += Logger.colorize(f'[+] Got {len(matchesReport)} YARA rules matches on this MSI:\n\n', 'green')
            output += self.printTable('YARA Results', matchesReport)

        return output

def getoptions():
    global logger
    global options

    epilog = f'''

------------------------------------------------------

- What can be listed:
    --list CustomAction     - Specific table
    --list Registry,File    - List multiple tables
    --list stats            - Print MSI database statistics
    --list all              - All tables and their contents
    --list olestream        - Prints all OLE streams & storages. 
                              To display CABs embedded in MSI try: --list _Streams
    --list cabs             - Lists embedded CAB files
    --list binary           - Lists binary data embedded in MSI for its own purposes.
                              That typically includes EXEs, DLLs, VBS/JS scripts, etc

- What can be extracted:
    --extract all           - Extracts Binary data, all files from CABs, scripts from CustomActions
    --extract binary        - Extracts Binary data
    --extract files         - Extracts files
    --extract cabs          - Extracts cabinets
    --extract scripts       - Extracts scripts

------------------------------------------------------

'''

    usage = '\nUsage: msidump.py [options] <infile.msi>\n'
    opts = argparse.ArgumentParser(
        usage=usage,
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=textwrap.dedent(epilog)
    )

    req = opts.add_argument_group('Required arguments')
    req.add_argument('infile', help='Input MSI file (or directory) for analysis.')
    
    opt = opts.add_argument_group('Options')
    opt.add_argument('-q', '--quiet', default=False, action='store_true', help='Suppress banner and unnecessary information. In triage mode, will display only verdict.')
    opt.add_argument('-v', '--verbose', default=False, action='store_true', help='Verbose mode.')
    opt.add_argument('-d', '--debug', default=False, action='store_true', help='Debug mode.')
    opt.add_argument('-N', '--nocolor', default=False, action='store_true', help="Don't use colors in text output.")
    opt.add_argument('-n', '--print-len', default=MSIDumper.DefaultTableWidth, type=int, help='When previewing data - how many bytes to include in preview/hexdump. Default: 128')
    opt.add_argument('-f', '--format', default='text', choices=['text', 'json', 'csv'], help='Output format: text, json, csv. Default: text')
    opt.add_argument('-o', '--outfile', metavar='path', default='', help='Redirect program output to this file.')
    opt.add_argument('-m', '--mime', default=False, action='store_true', help='When sniffing inner data type, report MIME types')
    
    mod = opts.add_argument_group('Analysis Modes')
    mod.add_argument('-l', '--list', metavar='what', default='', help='List specific table contents. See help message to learn what can be listed.')
    mod.add_argument('-x', '--extract', metavar='what', default='', help='Extract data from MSI. For what can be extracted, refer to help message.')

    spec = opts.add_argument_group('Analysis Specific options')
    spec.add_argument('-i', '--record', metavar='number|name', type=str, default=-1, help='Can be a number or name. In --list mode, specifies which record to dump/display entirely. In --extract mode dumps only this particular record to --outdir')
    spec.add_argument('-O', '--outdir', metavar='path', default='', help='When --extract mode is used, specifies output location where to extract data.')
    spec.add_argument('-y', '--yara', metavar='path', default='', help='Path to YARA rule/directory with rules. YARA will be matched against Binary data, streams and inner files')

    args = opts.parse_args()
    options.update(vars(args))

    logger = Logger(options)

    if len(args.list) > 0:
        if args.list.lower() not in [x.lower() for x in MSIDumper.ListModes + MSIDumper.KnownTables] and ',' not in args.list:
            logger.err(f'WARNING: Requested {args.list} table is not recognized: parser will probably crash!')

    args.infile = os.path.abspath(os.path.normpath(args.infile))

    if not os.path.isfile(args.infile) and not os.path.isdir(args.infile):
        logger.fatal(f'Input path does not exist: {args.infile}')

    exclusive = sum([len(args.list) > 0, len(args.extract) > 0])
    if exclusive > 1:
        logger.fatal(f'--list and --extract are mutually exclusive options. Pick one.')

    if len(args.extract) > 0 and len(args.outdir) == 0:
        logger.fatal('-O/--outdir telling where to extract files to is required when working in --extract mode.')

    options.update(vars(args))
    return args

@atexit.register
def goodbye():
    try:
        colorama.deinit()
    except Exception:
        pass

def terminalWidth():
    n = shutil.get_terminal_size((80, 20))  # pass fallback
    return n.columns

def banner():
    print(rf'''
                   _     _                       
     _ __ ___  ___(_) __| |_   _ _ __ ___  _ __  
    | '_ ` _ \/ __| |/ _` | | | | '_ ` _ \| '_ \ 
    | | | | | \__ \ | (_| | |_| | | | | | | |_) |
    |_| |_| |_|___/_|\__,_|\__,_|_| |_| |_| .__/ 
                                        |_|    
    version: {Logger.colorize(VERSION, "green")}
    author : Mariusz Banach (mgeeky, @mariuszbit)
             binary-offensive.com
''')

def processFile(args, path):
    msir = MSIDumper(options, logger)

    if not msir.open(path):
        logger.err(f'Could not open database (use -d to learn more): {path}')
        return ''

    report = ''
    if not args.quiet and args.format == 'text':
        report += f'{Logger.colorize("[+]","green")} Analyzing : {path}\n\n'

    if len(args.list) > 0:
        report += msir.listTable(args.list)

    elif len(args.extract) > 0:
        report += msir.extract(args.extract)

    else:
        rep = msir.analyse()

        if len(args.yara) > 0:
            rep += '\n\n' + msir.yaraScan()

        if not args.quiet:
            report += str(rep)

            if args.format == 'text':
                report += '\n\n' + msir.verdict.strip() + '\n'

        elif args.format == 'text':
            verd = msir.verdict.strip()
            pos = verd.find(':')
            if pos != -1:
                verd = verd[pos+1:].strip()

            report += verd + ' : ' + path

    if args.format == 'text':
        logger.ok(f'Database processed : {path}')
    msir.close()

    return report

def processDir(args, infile):
    report = ''

    logger.verbose(f'Process files from directory: {infile}')

    for file in glob.glob(os.path.join(infile, '**/**'), recursive=True):
        path = os.path.join(infile, file)
        if os.path.isfile(path):
            try:
                report += processFile(args, path)
                report += '\n\n'

            except Exception as e:
                logger.err('Analysis of "{}" failed. Exception: {}'.format(
                    path, str(e)
                ))

    return report

def main():
    global options
    args = getoptions()
    if not args:
        return False

    if not args.quiet and args.format == 'text':
        banner()

    if len(args.outfile) > 0:
        options['nocolor'] = True

    options['max_width'] = terminalWidth()

    if os.path.isfile(args.infile):
        report = processFile(args, args.infile)

    else:
        report = processDir(args, args.infile)

    if len(args.outfile) > 0:
        with open(args.outfile, 'wb') as f:
            rep = Logger.stripColors(report)
            f.write(rep.encode())
    else:
        print(report)

if __name__ == '__main__':
    main()


================================================
FILE: requirements.txt
================================================
olefile
colorama
yara-python
prettytable>=3.5
pefile
cabarchive
pywin32
python-magic
python-magic-bin; sys_platform == "win32" or sys_platform == "darwin"

# ssdeep is optional
#ssdeep 

================================================
FILE: test-cases/README.md
================================================
## msidump test cases

- `sample1-run-autoruns64.msi.bin` - launches MS Sysinternals Autoruns64.exe from `C:\Windows\Installer\MSXXXX.msi`
- `sample2-run-calc-script.msi.bin` - executes a VBScript that runs `calc` via the `Wscript.Shell.Exec` method
- `sample3-run-calc-shellcode-via-dotnet.msi.bin` - bundles a specially crafted CustomAction .NET DLL that, when executed, runs shellcode which spawns `calc`
- `sample4-customaction-run-calc.msi.bin` - a simple MSI that runs system commands after installation completes; here it runs `calc`
- `putty-backdoored.msi.bin` - runs `calc` during PuTTY installation

All these installers install themselves to the `%LOCALAPPDATA%\VcRedist` directory.
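
The samples listed above can be triaged in one pass. The following is a minimal sketch, not part of the repository: the helper name `triage_commands` is hypothetical, and it assumes that `msidump.py` sits in the current directory and accepts the `.msi.bin` file names as-is. It only builds the command lines; the `-q` flag is taken from the tool's help text, where it limits triage output to the verdict.

```python
import glob
import os
import sys


def triage_commands(sample_dir, script='msidump.py'):
    """Build one quiet-mode msidump command line per *.msi.bin sample.

    `script` is an assumed location of msidump.py; adjust it as needed.
    """
    pattern = os.path.join(sample_dir, '*.msi.bin')
    # -q asks msidump to print only the verdict in triage mode.
    return [[sys.executable, script, '-q', path]
            for path in sorted(glob.glob(pattern))]


if __name__ == '__main__':
    # Print the commands instead of running them, so nothing is executed by accident.
    for cmd in triage_commands('test-cases'):
        print(' '.join(cmd))
```

According to its help text, `msidump.py` also accepts a directory as input, so passing the `test-cases` directory directly should be an equivalent shortcut.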

You can uninstall them with:

```
msiexec /q /x file.msi
```
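
When several samples were installed, the same uninstall command can be generated per file. This is a hedged sketch: the helper names `uninstall_command` and `uninstall` are hypothetical, and the only thing taken from this README is the `msiexec /q /x <file>` invocation. It defaults to a dry run and never calls `msiexec` outside Windows.

```python
import subprocess
import sys


def uninstall_command(msi_path):
    """Return the argument list for `msiexec /q /x <msi_path>`."""
    return ['msiexec', '/q', '/x', msi_path]


def uninstall(msi_path, dry_run=True):
    """Uninstall one package, or only return the command when dry_run is set."""
    cmd = uninstall_command(msi_path)
    if dry_run or sys.platform != 'win32':
        # msiexec exists only on Windows; elsewhere just hand back the command.
        return cmd
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == '__main__':
    print(' '.join(uninstall('file.msi')))
```

Keeping `dry_run=True` as the default makes it easy to review the commands before anything is removed.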
