Repository: nlitsme/pyidbutil
Branch: master
Commit: e77d0e79e5c1
Files: 9
Total size: 114.7 KB

Directory structure:
gitextract_dtt79ccf/

├── LICENSE
├── README.md
├── idaunpack.py
├── idblib.py
├── idbtool.py
├── setup.cfg
├── test_idblib.py
├── tree-walking.py
└── tstbs.py

================================================
FILE CONTENTS
================================================

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2020 Willem Hengeveld <itsme@xs4all.nl>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
IDBTOOL
=======

A tool for extracting information from IDA databases.
`idbtool` knows how to handle databases from all IDA versions since v2.0, both `i64` and `idb` files.
You can also use `idbtool` to recover information from unclosed databases.

`idbtool` works without change with IDA v7.0.


Much faster than loading a file in IDA
--------------------------------------

With idbtool you can search thousands of .idb files in seconds.

More precisely: on my laptop it takes:

 *  1.5 seconds to extract 143 idc scripts from 119 idb and i64 files.
 *  3.8 seconds to print idb info for 441 files.
 *  5.6 seconds to extract 281 enums containing 4726 members from 35 files.
 * 67.8 seconds to extract 5942 structs containing 33672 members from 265 files.

Loading an approximately 5 Gbyte idb file in IDA takes about 45 minutes,
while idb3.h takes basically no time at all: no more than a few milliseconds.


Download
========

Two versions of this tool exist:

One written in python
 * https://github.com/nlitsme/pyidbutil

One written in C++
 * https://github.com/nlitsme/idbutil

Both repositories contain a library which can be used for reading `.idb` or `.i64` files.


Usage
=====

Usage: 

    idbtool [options] [database file(s)]

 * `-n` or `--names`  will list all named values in the database.
 * `-s` or `--scripts` will list all scripts stored in the database.
 * `-u` or `--structs` will list all structs stored in the database.
 * `-e` or `--enums` will list all enums stored in the database.
 * `--imports` will list all imported symbols from the database.
 * `--funcdirs` will list function folders stored in the database.
 * `-i` or `--info` will print some general info about the database. 
 * `-d` or `--pagedump`  dump btree page tree contents.
 * `--inc`, `--dec` list all records in ascending / descending order.
 * `-q` or `--query` search specific records in the database.
 * `-m` or `--limit` limit the number of results returned by `-q`.
 * `-id0`, `-id1` dump only one specific section.
 * `--i64`, `--i32` tell idbtool that the specified file is from a 64 or 32 bit database.
 * `--recover` group files from an unpacked database.
 * `--classify` summarizes node usage in the database
 * `--dump`  hexdump the original binary data

query
-----

Queries need to be specified last on the commandline.

example:

    idbtool [database file(s)]  --query  "Root Node;V"

Will list the source binary for all the databases specified on the commandline.

A query is a string with the following format:

 * [==,<=,>=,<,>]  - optional relation, default: ==
 * a base node key:
    * a DOT followed by the numeric value of the nodeid.
    * a HASH followed by the numeric value of the system-nodeid.
    * a QUESTION followed by the name of the node. -> a 'N'ame node
    * the name of the node.  -> the name is resolved, results in a '.'Dot node
 * an optional tag ( A for Alt, S for Supval, etc )
 * an optional index value

example queries:
 * `Root Node;V` -> prints record containing the source binary name
 * `?Root Node` -> prints the Name record pointing to the root
 * `>Root Node` -> prints the first 10 records starting with the root node id.
 * `<Root Node` -> prints the 10 records starting with the record before the rootnode.
 * `.0xff000001;N` -> prints the rootnode name entry.
 * `#1;N` -> prints the rootnode name entry.
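
As a rough illustration of the format described above, the sketch below splits a query string
into its relation, the kind of base node key, the base value, and the remaining tag/index part.
`parse_query` is a hypothetical helper, not the parser used by `idbtool`, and it assumes that `;`
separates the base node key from the tag, as the example queries suggest.

```python
# Hypothetical helper for illustration only; not part of idbtool or idblib.
RELATIONS = ("==", "<=", ">=", "<", ">")  # two-character relations are tried first


def parse_query(query):
    """Return (relation, kind, base, rest) for a query string."""
    relation = "=="  # the documented default
    for rel in RELATIONS:
        if query.startswith(rel):
            relation = rel
            query = query[len(rel):]
            break

    # Assumption: ';' separates the base node key from the tag and index.
    base, _, rest = query.partition(";")

    if base.startswith("."):
        # DOT: numeric node id
        kind, base = "nodeid", int(base[1:], 0)
    elif base.startswith("#"):
        # HASH: numeric system node id
        kind, base = "system", int(base[1:], 0)
    elif base.startswith("?"):
        # QUESTION: look up the 'N'ame record itself
        kind, base = "name-record", base[1:]
    else:
        # plain name: to be resolved to a '.'Dot node
        kind, base = "name", base
    return relation, kind, base, rest


print(parse_query("Root Node;V"))
print(parse_query("<#0xc00000"))
```

Trying the two-character relations first keeps `<=` from being read as `<` followed by a name starting with `=`.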

List the highest node and following record in the database in two different ways,
the first: starting at the first record below `ffc00000`, and listing the next.
The second: starting at the first record after `ffc00000`, and listing the previous:
 * `--query "<#0xc00000"  --limit 2 --inc -v`
 * `--query ">#0xc00000"  --limit 2 --dec -v`

Note that this should be the nodeid in the `$ MAX NODE` record.
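
The examples above imply that a `#` system-nodeid is an offset into a reserved range of node ids:
`#1` and `.0xff000001` select the same record, and `#0xc00000` corresponds to `ffc00000`.
The sketch below shows how such a key could be built, following the `"." + nodeid + tag` layout
with a big-endian nodeid that is described in the `idblib.py` module docstring. The helper names
are hypothetical, and the base value used for 64 bit databases is an assumption extrapolated from
the 32 bit examples.

```python
import struct


def system_nodeid(n, wordsize=4):
    """Map a '#' system-nodeid to a full node id.

    Assumption: the reserved range starts at 0xFF followed by zero bytes,
    which matches the 32 bit examples in this README.
    """
    return (0xFF << (wordsize * 8 - 8)) + n


def node_key(nodeid, tag=None, wordsize=4):
    """Build a raw key: '.' + big-endian nodeid + optional one-letter tag."""
    fmt = ">L" if wordsize == 4 else ">Q"
    key = b"." + struct.pack(fmt, nodeid)
    if tag is not None:
        key += tag.encode("ascii")
    return key


print("%x" % system_nodeid(0xc00000))
print(node_key(system_nodeid(1), "N"))
```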

List the last two records:
 * `--limit 2 --dec  -v`

List the first two records, the `$ MAX LINK` and `$ MAX NODE` records:
 * `--limit 2 --inc -v`


A full database dump
--------------------

Several methods exist for printing all records in the database. This may be useful if
you want to investigate more of IDA's internals, but it can also be useful for recovering
data from corrupted databases.

 * `--inc`, `--dec` can be used to enumerate all b-tree records in either forward, or backward direction.
    * add `-v` to get a prettier key/value output
 * `--id0`  walks the page tree, instead of the record tree, printing the contents of each page
 * `--pagedump` linearly skip through the file, this will also reveal information in deleted pages.

naked files
===========

When IDA or your computer crashed while working on a disassembly, and you did not yet save the database,
you are left with a couple of files with extensions like `.id0`, `.id1`, `.nam`, etc.

These files are the unpacked database, I call them `naked` files.

Using the `--filetype` and `--i64` or `--i32` options you can inspect these `naked` files individually,
or use the `--recover` option to view them together as a complete database.
`idbtool` will figure out automatically which files would belong together.

`idbtool` can figure out the bitsize of the database from an `.id0` file, but not (yet) from the others.
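
One plausible way to decide which `naked` files belong together is to group them by their path
without the extension. The sketch below does that for the six section extensions listed in
`RecoverIDBFile.id2ext` in `idblib.py`. `group_naked_files` is a hypothetical helper and not
necessarily how `idbtool` does the grouping; each group has the extension-to-path shape that
`RecoverIDBFile` looks up in its `dbfiles` argument.

```python
import os
from collections import defaultdict

# Section extensions, in the order used by RecoverIDBFile.id2ext
NAKED_EXTS = ('.id0', '.id1', '.nam', '.seg', '.til', '.id2')


def group_naked_files(paths):
    """Group paths by basename; returns {basepath: {ext: path}}."""
    groups = defaultdict(dict)
    for path in paths:
        base, ext = os.path.splitext(path)
        ext = ext.lower()
        if ext in NAKED_EXTS:
            # files with other extensions are ignored
            groups[base][ext] = path
    return dict(groups)


print(group_naked_files(["work/fw.id0", "work/fw.id1", "work/fw.nam", "work/notes.txt"]))
```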


LIBRARY
=======

The file `idblib.py` contains a library.
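
To give an impression of what the library does when it opens a file, the sketch below repeats the
first step of `IDBFile.__init__`: checking the magic and locating the file version in the header.
`peek_idb_header` is a hypothetical stand-alone helper written for this README. It only mirrors the
header fields that `idblib.py` reads and does not parse section offsets or checksums.

```python
import struct


def peek_idb_header(hdrdata):
    """Return (magic, fileversion) from the first bytes of an .idb/.i64 file."""
    magic = hdrdata[0:4].decode('utf-8', 'ignore')
    if magic not in ('IDA0', 'IDA1', 'IDA2'):
        raise ValueError("invalid file magic")

    # Six dwords and a word, starting at offset 6, as in IDBFile.__init__.
    values = struct.unpack_from("<6LH", hdrdata, 6)
    if values[5] != 0xaabbccdd:
        # no marker: idblib treats this as file version 0
        return magic, 0
    return magic, values[6]


# A synthetic header for demonstration; real files carry section offsets here.
hdr = b"IDA1" + b"\x00\x00" + struct.pack("<5L", 0x100, 0, 0, 0, 0)
hdr += struct.pack("<LH", 0xaabbccdd, 6) + b"\x00" * 64
print(peek_idb_header(hdr))
```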


TODO
====

 * add option to list all comments stored in the database
 * add option to list flags for a list of addresses.

Author
======

Willem Hengeveld <itsme@xs4all.nl>


================================================
FILE: idaunpack.py
================================================
"""
`idaunpack` is a tool to aid in decoding packed data structures from an
IDA idb or i64 database.
"""
from __future__ import print_function, division
import struct
import re
import sys
from binascii import a2b_hex, b2a_hex
from idblib import IdaUnpacker

def dump_packed(data, wordsize, pattern):
    p = IdaUnpacker(wordsize, data)
    if pattern:
        for c in pattern:
            if p.eof():
                print("EOF")
                break
            if c == 'H':
                val = p.next16()
                fmt = "%04x"
            elif c == 'L':
                val = p.next32()
                fmt = "%08x"
            elif c == 'Q':
                val = p.next64()
                fmt = "%016x"
            elif c == 'W':
                val = p.nextword()
                if wordsize==4:
                    fmt = "[%08x]"
                else:
                    fmt = "[%016x]"
            else:
                raise Exception("unknown pattern: %s" % c)
            print(fmt % val, end=" ")

    while not p.eof():
        val = p.next32()
        print("%08x" % val, end=" ")

    print()

def unhex(hextxt):
    return a2b_hex(re.sub(r'\W+', '', hextxt, flags=re.DOTALL))

def main():
    import argparse
    parser = argparse.ArgumentParser(description='idaunpack')
    parser.add_argument('--verbose', '-v', action='store_true')
    parser.add_argument('--debug', action='store_true', help='abort on exceptions.')
    parser.add_argument('--pattern', '-p', type=str, help='unpack pattern: sequence of H, L, Q, W')
    parser.add_argument('-4', '-3', '-32', const=4, dest='wordsize', action='store_const', help='use 32 bit words')
    parser.add_argument('-8', '-6', '-64', const=8, dest='wordsize', action='store_const', help='use 64 bit words')
    parser.add_argument('--wordsize', '-w', type=int, help='specify wordsize')
    parser.add_argument('hexconsts', nargs='*', type=str)

    args = parser.parse_args()
    if args.wordsize is None:
        args.wordsize = 4

    for x in args.hexconsts:
        dump_packed(unhex(x), args.wordsize, args.pattern)

if __name__ == '__main__':
    main()


================================================
FILE: idblib.py
================================================
"""
idblib - a module for reading hex-rays Interactive DisAssembler databases

Supports database versions starting with IDA v2.0

IDA v1.x  is not supported, that was an entirely different file format.
IDA v2.x  databases are organised as several files, in a directory
IDA v3.x  databases are bundled into .idb files
IDA v4 .. v6  various improvements, like databases larger than 4Gig, and 64 bit support.

Copyright (c) 2016 Willem Hengeveld <itsme@xs4all.nl>


An IDB file can contain up to 6 sections:
    id0  the main database
    id1  contains flags for each byte - what is returned by idc.GetFlags(ea)
    nam  contains a list of addresses of named items
    seg  .. only in older databases
    til  type info
    id2  ?

The id0 database is a simple key/value database, much like leveldb

types of records:

Some bookkeeping:

    "$ MAX NODE" -> the highest numbered node value in use.

A list of names:

    "N" + name  -> the node id for that name.

names are both user/disassembler symbols assigned to addresses
in the disassembled code, and IDA internals, like lists of items,
For example: '$ structs', or 'Root Node'.

The main part:

    "." + nodeid + tag + index

This maps directly onto the idasdk netnode interface.
The size of the nodeid and index is 32bits for .idb files and 64 bits for .i64 files.
The nodeid and index are encoded as bigendian numbers in the key, and as little endian
numbers in (most of) the values.


"""
from __future__ import division, print_function, absolute_import, unicode_literals
import struct
import binascii
import re
import os

#############################################################################
# some code to make this library run with both python2 and python3
#############################################################################

import sys
if sys.version_info[0] == 3:
    long = int
else:
    bytes = bytearray

try:
    cmp(1, 2)
except NameError:
    # python3 does not have cmp
    def cmp(a, b): return (a > b) - (a < b)


class cachedproperty(object):
    ## .. only works with python3 somehow. -- todo: figure out why not with python2
    def __init__(self, method):
        self.method = method
        self.name = '_' + method.__name__
    def __get__(self, obj, cls):
        if not hasattr(obj, self.name):
            value = self.method(obj)
            setattr(obj, self.name, value)
        else:
            value = getattr(obj, self.name)
        return value


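def strz(b, o):
    # return the NUL-terminated string at offset o;
    # without a terminator, take everything up to the end of the buffer.
    e = b.find(b'\x00', o)
    if e < 0:
        e = len(b)
    return b[o:e].decode('utf-8', 'ignore')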

def makeStringIO(data):
    if sys.version_info[0] == 2:
        from StringIO import StringIO
        return StringIO(data)
    else:
        from io import BytesIO
        return BytesIO(data)


#############################################################################
# some utility functions
#############################################################################


def nonefmt(fmt, item):
    # helper for outputting None without raising an error
    if item is None:
        return "-"
    return fmt % item


def hexdump(data):
    if data is None:
        return
    return binascii.b2a_hex(data).decode('utf-8')


#############################################################################


class FileSection(object):
    """
    Presents a file like object which is a section of a larger file.

    `fh` is expected to have a seek and read method.


    This class is used to access a section (e.g. the .id0 file) of a larger file (e.g. the .idb file)
    and make read/seek behave as if it were a separate file.
    """
    def __init__(self, fh, start, end):
        self.fh = fh
        self.start = start
        self.end = end

        self.curpos = 0
        self.fh.seek(self.start)

    def read(self, size=None):
        want = self.end - self.start - self.curpos
        if size is not None and want > size:
            want = size

        if want <= 0:
            return b""

        # make sure filepointer is at correct position since we are sharing the fh object with others.
        self.fh.seek(self.curpos + self.start)
        data = self.fh.read(want)
        self.curpos += len(data)
        return data

    def seek(self, offset, *args):
        def isvalidpos(offset):
            return 0 <= offset <= self.end - self.start

        if len(args) == 0:
            whence = 0
        else:
            whence = args[0]
        if whence == 0:
            if not isvalidpos(offset):
                print("invalid seek: from %x to SET:%x" % (self.curpos, offset))
                raise Exception("illegal offset")
            self.curpos = offset
        elif whence == 1:
            if not isvalidpos(self.curpos + offset):
                raise Exception("illegal offset")
            self.curpos += offset
        elif whence == 2:
            if not isvalidpos(self.end - self.start + offset):
                raise Exception("illegal offset")
            self.curpos = self.end - self.start + offset
        self.fh.seek(self.curpos + self.start)

    def tell(self):
        return self.curpos


class IdaUnpacker:
    """
    Decodes packed ida structures.
    This is used, among other places, in struct definitions and .id2 files.

    Related sdk functions: pack_dd, unpack_dd, etc.
    """
    def __init__(self, wordsize, data):
        self.wordsize = wordsize
        self.data = data
        self.o = 0

    def eof(self):
        return self.o >= len(self.data)
    def have(self, n):
        return self.o+n <= len(self.data)

    def nextword(self):
        """
        Return an unsigned word-sized integer from the buffer
        """
        if self.wordsize == 4:
            return self.next32()
        elif self.wordsize == 8:
            return self.next64()
        else:
            raise Exception("unsupported wordsize")

    def nextwordsigned(self):
        """
        Return a signed word-sized integer from the buffer
        """
        if self.wordsize == 4:
            val = self.next32()
            if val < 0x80000000:
                return val
            return val - 0x100000000
        elif self.wordsize == 8:
            val = self.next64()
            if val < 0x8000000000000000:
                return val
            return val - 0x10000000000000000
        else:
            raise Exception("unsupported wordsize")


    def next64(self):
        if self.eof():
            return None
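        lo = self.next32()
        hi = self.next32()
        if lo is None or hi is None:
            # truncated or malformed input: report it the same way next32 does
            return None
        return (hi<<32) | lo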

    def next16(self):
        """
        Return a packed 16 bit integer from the buffer
        """
        if self.eof():
            return None
        byte = self.data[self.o:self.o+1]
        if byte == b'\xff':
            # a 16 bit value:
            # 1111 1111 xxxx xxxx xxxx xxxx 
            if self.o+3 > len(self.data):
                return None
            val, = struct.unpack_from(">H", self.data, self.o+1)
            self.o += 3
            return val
        elif byte < b'\x80':
            # a 7 bit value:
            # 0xxx xxxx
            self.o += 1
            val, = struct.unpack("B", byte)
            return val
        elif byte < b'\xc0':
            # a 14 bit value:
            # 10xx xxxx xxxx xxxx
            if self.o+2 > len(self.data):
                return None
            val, = struct.unpack_from(">H", self.data, self.o)
            self.o += 2
            return val&0x3FFF
        else:
            return None

    def next8(self):
        if self.eof():
            return None
        byte = self.data[self.o:self.o+1]
        self.o += 1
        val, = struct.unpack("B", byte)

        return val

    def next32(self):
        """
        Return a packed integer from the buffer
        """
        if self.eof():
            return None
        byte = self.data[self.o:self.o+1]
        if byte == b'\xff':
            # a 32 bit value:
            # 1111 1111 xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx
            if self.o+5 > len(self.data):
                return None
            val, = struct.unpack_from(">L", self.data, self.o+1)
            self.o += 5
            return val
        elif byte < b'\x80':
            # a 7 bit value:
            # 0xxx xxxx
            self.o += 1
            val, = struct.unpack("B", byte)
            return val
        elif byte < b'\xc0':
            # a 14 bit value:
            # 10xx xxxx xxxx xxxx
            if self.o+2 > len(self.data):
                return None
            val, = struct.unpack_from(">H", self.data, self.o)
            self.o += 2
            return val&0x3FFF
        elif byte < b'\xe0':
            # a 29 bit value:
            # 110x xxxx xxxx xxxx xxxx xxxx xxxx xxxx
            if self.o+4 > len(self.data):
                return None
            val, = struct.unpack_from(">L", self.data, self.o)
            self.o += 4
            return val&0x1FFFFFFF
        else:
            return None

    def bytes(self, n):
        """
        Return fixed length string from buffer
        """
        if not self.have(n):
            return None
        data = self.data[self.o : self.o+n]
        self.o += n
        return data


class IDBFile(object):
    """
    Provide access to the various sections in an .idb file.

    Usage:

    idb = IDBFile(fhandle)
    id0 = idb.getsection(ID0File)

    ID0File is expected to have a class property 'INDEX'

# v1..v5  id1 and nam files start with 'Va0' .. 'Va4'
# v6      id1 and nam files start with 'VA*'
# til files start with 'IDATIL'
# id2 files start with 'IDAS\x1d\xa5\x55\x55'

    """
    def __init__(self, fh):
        """ constructor takes a filehandle """
        self.fh = fh
        self.fh.seek(0)
        hdrdata = self.fh.read(0x100)

        self.magic = hdrdata[0:4].decode('utf-8', 'ignore')
        if self.magic not in ('IDA0', 'IDA1', 'IDA2'):
            raise Exception("invalid file magic")

        values = struct.unpack_from("<6LH6L", hdrdata, 6)
        if values[5] != 0xaabbccdd:
            fileversion = 0
            offsets = list(values[0:5])
            offsets.append(0)
            checksums = [0 for _ in range(6)]
        else:
            fileversion = values[6]

            if fileversion < 5:
                offsets = list(values[0:5])
                checksums = list(values[8:13])
                idsofs, idscheck = struct.unpack_from("<LH" if fileversion == 1 else "<LL", hdrdata, 56)
                offsets.append(idsofs)
                checksums.append(idscheck)

                # note: filever 4  has '0x5c', zeros, md5, more zeroes
            elif fileversion == 6:
                values = struct.unpack_from("<QQLLHQQQ5LQL", hdrdata, 6)
                offsets = [values[_] for _ in (0, 1, 5, 6, 7, 13)]
                checksums = [values[_] for _ in (8, 9, 10, 11, 12, 14)]
            elif fileversion == 910:
                """
                +00: "IDA2", 0, 0
                +06: headersize
                +0e: datastart
                +16: aabbccdd00000000
                +1e: version
                +20: compression
                +21: 6 qwords   section-size
                +5d: md5
                """
                values = struct.unpack_from("<3QHB6Q", hdrdata, 6)
                offsets = [values[1]]
                self.sizes = values[5:]
                
                for s in self.sizes:
                    offsets.append(offsets[-1]+s)
                checksums = [0] * len(offsets)
                self.compression = values[4]
                if self.compression:
                    raise Exception("compression not supported for v910")
            else:
                raise Exception("unknown file version")

        # offsets now has offsets to the various idb parts
        #  id0, id1, nam, seg, til, id2 ( = sparse file )
        self.offsets = offsets
        self.checksums = checksums
        self.fileversion = fileversion

    def getsectioninfo(self, i):
        """
        Returns a tuple with section parameters by index.

        The parameters are:
         * compression flag
         * data offset
         * data size
         * data checksum

        Sections are stored in a fixed order: id0, id1, nam, seg, til, id2
        """
        if not 0 <= i < len(self.offsets):
            return 0, 0, 0, 0

        if self.offsets[i] == 0:
            return 0, 0, 0, 0

        self.fh.seek(self.offsets[i])
        if self.fileversion < 5:
            comp, size = struct.unpack("<BL", self.fh.read(5))
            ofs = self.offsets[i] + 5
        elif self.fileversion == 6:
            comp, size = struct.unpack("<BQ", self.fh.read(9))
            ofs = self.offsets[i] + 9
        elif self.fileversion == 910:
            comp = 0
            size = self.sizes[i]
            ofs = self.offsets[i]
        else:
            raise Exception("unhandled file version")
        return comp, ofs, size, self.checksums[i]

    def getpart(self, ix):
        """
        Returns a fileobject for the specified section.

        This method optionally decompresses the data found in the .idb file,
        and returns a file-like object, with seek, read, tell.
        """
        if self.offsets[ix] == 0:
            return

        comp, ofs, size, checksum = self.getsectioninfo(ix)

        fh = FileSection(self.fh, ofs, ofs + size)
        if comp == 2:
            import zlib
            # very old databases used a different compression scheme:
            wbits = -15 if self.magic == 'IDA0' else 15

            fh = makeStringIO(zlib.decompress(fh.read(size), wbits))
        elif comp == 0:
            pass
        else:
            raise Exception("unsupported section encoding: %02x" % comp)
        return fh

    def getsection(self, cls):
        """
        Constructs an object for the specified section.
        """
        return cls(self, self.getpart(cls.INDEX))


class RecoverIDBFile:
    """
    RecoverIDBFile has the same interface as IDBFile, but expects the database to be split over several files.

    This is useful for opening  IDAv2.x databases, or for recovering data from unclosed databases.
    """
    id2ext = ['.id0', '.id1', '.nam', '.seg', '.til', '.id2']

    def __init__(self, args, basepath, dbfiles):
        if args.i64:
            self.magic = 'IDA2'
        else:
            self.magic = 'IDA1'
        self.basepath = basepath
        self.dbfiles = dbfiles
        self.fileversion = 0

    def getsectioninfo(self, i):
        if not 0 <= i < len(self.id2ext):
            return 0, 0, 0, 0
        ext = self.id2ext[i]
        if ext not in self.dbfiles:
            return 0, 0, 0, 0
        return 0, 0, os.path.getsize(self.dbfiles[ext]), 0

    def getpart(self, ix):
        if not 0 <= ix < len(self.id2ext):
            return None
        ext = self.id2ext[ix]
        if ext not in self.dbfiles:
            print("can't find %s" % ext)
            return None
        return open(self.dbfiles[ext], "rb")

    def getsection(self, cls):
        part = self.getpart(cls.INDEX)
        if part:
            return cls(self, part)


def binary_search(a, k):
    """
    Do a binary search in an array of objects ordered by '.key'

    returns the largest index for which:  a[i].key <= k

    like c++: a.upperbound(k)--
    """
    first, last = 0, len(a)
    while first < last:
        mid = (first + last) >> 1
        if k < a[mid].key:
            last = mid
        else:
            first = mid + 1
    return first - 1


"""
################################################################################

I would have liked to make these classes nested classes of BTree, but
the problem is that there is no way for a nested-nested class
of BTree to refer back to a toplevel nested class of BTree.
So I am moving these outside of BTree, so I can use them as baseclasses
in the various page implementations.

class BTree:
    class BaseEntry(object): pass
    class BasePage(object): pass
    class Page15(BasePage):
        class Entry(BTree.BaseEntry):
            pass

>>> NameError: name 'BTree' is not defined

"""


class BaseIndexEntry(object):
    """
    Baseclass for Index Entries.

    Index entries have a key + value, and a page containing keys larger than that key
    in this index entry.

    """
    def __init__(self, data):
        ofs = self.recofs
        if self.recofs < 6:
            # reading an invalid page...
            self.val = self.key = None
            return

        keylen, = struct.unpack_from("<H", data, ofs) ; ofs += 2
        self.key = data[ofs:ofs + keylen]  ; ofs += keylen
        vallen, = struct.unpack_from("<H", data, ofs) ; ofs += 2
        self.val = data[ofs:ofs + vallen]  ; ofs += vallen

    def __repr__(self):
        return "%06x: %s = %s" % (self.page, hexdump(self.key), hexdump(self.val))


class BaseLeafEntry(BaseIndexEntry):
    """
    Baseclass for Leaf Entries

    Leaf entries have a key + value, and an `indent`

    The `indent` is there to save space in the index, since subsequent keys
    usually are very similar.
    The indent specifies the offset where this key is different from the previous key
    """
    def __init__(self, key, data):
        """ leaf entries get the previous key as an argument. """
        super(BaseLeafEntry, self).__init__(data)
        self.key = key[:self.indent] + self.key

    def __repr__(self):
        return " %02x:%02x: %s = %s" % (self.unknown1, self.unknown, hexdump(self.key), hexdump(self.val))


class BTree(object):
    """
    BTree is the IDA main database engine.
    It allows the user to do a binary search for records with
    a specified key relation ( >, <, ==, >=, <= )
    """
    class BasePage(object):
        """
        Baseclass for Pages. For the various btree versions (1.5, 1.6 and 2.0)
        there are subclasses which specify the exact layout of the page header,
        and index / leaf entries.

        Leaf pages don't have a 'preceeding' page pointer.

        """
        def __init__(self, data, entsize, entfmt):
            self.preceeding, self.count = struct.unpack_from(entfmt, data)
            if self.preceeding:
                entrytype = self.IndexEntry
            else:
                entrytype = self.LeafEntry

            self.index = []
            key = b""
            for i in range(self.count):
                ent = entrytype(key, data, entsize * (1 + i))
                self.index.append(ent)
                key = ent.key
            self.unknown, self.freeptr = struct.unpack_from(entfmt, data, entsize * (1 + self.count))

        def find(self, key):
            """
            Searches pages for key, returns relation to key:

            recurse -> found a next level index page to search for key.
                       also returns the next level page nr
            gt -> found a value with a key greater than the one searched for.
            lt -> found a value with a key less than the one searched for.
            eq -> found a value with a key equal to the one searched for.
                       gt, lt and eq return the index for the key found.

            # for an index entry: the key is 'less' than anything in the page pointed to.
            """
            i = binary_search(self.index, key)
            if i < 0:
                if self.isindex():
                    return ('recurse', -1)
                return ('gt', 0)
            if self.index[i].key == key:
                return ('eq', i)
            if self.isindex():
                return ('recurse', i)
            return ('lt', i)

        def getpage(self, ix):
            """ For Indexpages, returns the page ptr for the specified entry """
            return self.preceeding if ix < 0 else self.index[ix].page

        def getkey(self, ix):
            """ For all page types, returns the key for the specified entry """
            return self.index[ix].key

        def getval(self, ix):
            """ For all page types, returns the value for the specified entry """
            return self.index[ix].val

        def isleaf(self):
            """ True when this is a Leaf Page """
            return self.preceeding == 0

        def isindex(self):
            """ True when this is an Index Page """
            return self.preceeding != 0

        def __repr__(self):
            return ("leaf" if self.isleaf() else ("index<%d>" % self.preceeding)) + repr(self.index)

    ######################################################
    # Page objects for the various versions of the database
    ######################################################
    class Page15(BasePage):
        """ v1.5 b-tree page """
        class IndexEntry(BaseIndexEntry):
            def __init__(self, key, data, ofs):
                self.page, self.recofs = struct.unpack_from("<HH", data, ofs)
                self.recofs += 1   # skip unused zero byte in each key/value record
                super(self.__class__, self).__init__(data)

        class LeafEntry(BaseLeafEntry):
            def __init__(self, key, data, ofs):
                self.indent, self.unknown, self.recofs = struct.unpack_from("<BBH", data, ofs)
                self.unknown1 = 0
                self.recofs += 1   # skip unused zero byte in each key/value record
                super(self.__class__, self).__init__(key, data)

        def __init__(self, data):
            super(self.__class__, self).__init__(data, 4, "<HH")

    class Page16(BasePage):
        """ v1.6 b-tree page """
        class IndexEntry(BaseIndexEntry):
            def __init__(self, key, data, ofs):
                self.page, self.recofs = struct.unpack_from("<LH", data, ofs)
                self.recofs += 1   # skip unused zero byte in each key/value record
                super(self.__class__, self).__init__(data)

        class LeafEntry(BaseLeafEntry):
            def __init__(self, key, data, ofs):
                self.indent, self.unknown1, self.unknown, self.recofs = struct.unpack_from("<BBHH", data, ofs)
                self.recofs += 1   # skip unused zero byte in each key/value record
                super(self.__class__, self).__init__(key, data)

        def __init__(self, data):
            super(self.__class__, self).__init__(data, 6, "<LH")

    class Page20(BasePage):
        """ v2.0 b-tree page """
        class IndexEntry(BaseIndexEntry):
            def __init__(self, key, data, ofs):
                self.page, self.recofs = struct.unpack_from("<LH", data, ofs)
                # unused zero byte is no longer there in v2.0 b-tree
                super(self.__class__, self).__init__(data)

        class LeafEntry(BaseLeafEntry):
            def __init__(self, key, data, ofs):
                self.indent, self.unknown, self.recofs = struct.unpack_from("<HHH", data, ofs)
                self.unknown1 = 0
                super(self.__class__, self).__init__(key, data)

        def __init__(self, data):
            super(self.__class__, self).__init__(data, 6, "<LH")

    class Cursor:
        """
        A Cursor object represents a position in the b-tree.

        It has methods for moving to the next or previous item,
        and methods for retrieving the key and value at the current position.

        The position is represented as a stack (list) of (page, index) tuples,
        from the root page down to the current page.
        """
        def __init__(self, db, stack):
            self.db = db
            self.stack = stack

        def next(self):
            """ move cursor to next entry """
            page, ix = self.stack.pop()
            if page.isleaf():
                # from leaf move towards root
                ix += 1
                while self.stack and ix == len(page.index):
                    page, ix = self.stack.pop()
                    ix += 1
                if ix < len(page.index):
                    self.stack.append((page, ix))
            else:
                # from node move towards leaf
                self.stack.append((page, ix))
                page = self.db.readpage(page.getpage(ix))
                while page.isindex():
                    ix = -1
                    self.stack.append((page, ix))
                    page = self.db.readpage(page.getpage(ix))
                ix = 0
                self.stack.append((page, ix))

        def prev(self):
            """ move cursor to the previous entry """
            page, ix = self.stack.pop()
            ix -= 1
            if page.isleaf():
                # move towards root, until non 'prec' item found
                while self.stack and ix < 0:
                    page, ix = self.stack.pop()
                if ix >= 0:
                    self.stack.append((page, ix))
            else:
                # move towards leaf
                self.stack.append((page, ix))
                while page.isindex():
                    page = self.db.readpage(page.getpage(ix))
                    ix = len(page.index) - 1
                    self.stack.append((page, ix))

        def eof(self):
            return len(self.stack) == 0

        def getkey(self):
            """ return the key value pointed to by the cursor """
            page, ix = self.stack[-1]
            return page.getkey(ix)

        def getval(self):
            """ return the data value pointed to by the cursor """
            page, ix = self.stack[-1]
            return page.getval(ix)

        def __repr__(self):
            return "cursor:" + repr(self.stack)

    def __init__(self, fh):
        """ BTree constructor - takes a filehandle """
        self.fh = fh

        self.fh.seek(0)
        data = self.fh.read(64)

        if data[13:].startswith(b"B-tree v 1.5 (C) Pol 1990"):
            self.parseheader15(data)
            self.page = self.Page15
            self.version = 15
        elif data[19:].startswith(b"B-tree v 1.6 (C) Pol 1990"):
            self.parseheader16(data)
            self.page = self.Page16
            self.version = 16
        elif data[19:].startswith(b"B-tree v2"):
            self.parseheader16(data)
            self.page = self.Page20
            self.version = 20
        else:
            print("unknown btree: %s" % hexdump(data))
            raise Exception("unknown b-tree")

    def parseheader15(self, data):
        self.firstfree, self.pagesize, self.firstindex, self.reccount, self.pagecount = struct.unpack_from("<HHHLH", data, 0)

    def parseheader16(self, data):
        # v16 and v20 both have the same header format
        self.firstfree, self.pagesize, self.firstindex, self.reccount, self.pagecount = struct.unpack_from("<LHLLL", data, 0)

    def readpage(self, nr):
        self.fh.seek(nr * self.pagesize)
        return self.page(self.fh.read(self.pagesize))

    def find(self, rel, key):
        """
        Searches for a record with the specified relation to the key.

        A cursor object is returned. Call getkey() / getval() on the cursor
        to retrieve the actual key and value,
        or call cursor.next() / cursor.prev() to enumerate records.
        The returned cursor may be at eof() when no matching record exists.

        'eq'  -> record with key equal to `key`, None is returned when not found
        'le'  -> last record with key <= `key`
        'ge'  -> first record with key >= `key`
        'lt'  -> last record with key < `key`
        'gt'  -> first record with key > `key`
        """

        # descend tree to leaf nearest to the `key`
        page = self.readpage(self.firstindex)
        stack = []
        while len(stack) < 256:
            act, ix = page.find(key)
            stack.append((page, ix))
            if act != 'recurse':
                break
            page = self.readpage(page.getpage(ix))

        if len(stack) == 256:
            raise Exception("b-tree corrupted")
        cursor = BTree.Cursor(self, stack)

        # now correct for what was actually asked.
        if act == rel:
            pass
        elif rel == 'eq' and act != 'eq':
            return None
        elif rel in ('ge', 'le') and act == 'eq':
            pass
        elif rel in ('gt', 'ge') and act == 'lt':
            cursor.next()
        elif rel == 'gt' and act == 'eq':
            cursor.next()
        elif rel in ('lt', 'le') and act == 'gt':
            cursor.prev()
        elif rel == 'lt' and act == 'eq':
            cursor.prev()

        return cursor

    def dump(self):
        """ raw dump of all records in the b-tree """
        print("pagesize=%08x, reccount=%08x, pagecount=%08x" % (self.pagesize, self.reccount, self.pagecount))
        self.dumpfree()
        self.dumptree(self.firstindex)

    def dumpfree(self):
        """ list all free pages """
        fmt = "L" if self.version > 15 else "H"
        hdrsize = 8 if self.version > 15 else 4
        pn = self.firstfree
        if pn == 0:
            print("no free pages")
            return
        while pn:
            self.fh.seek(pn * self.pagesize)
            data = self.fh.read(self.pagesize)
            if len(data) == 0:
                print("could not read FREE data at page %06x" % pn)
                break
            count, nextfree = struct.unpack_from("<" + (fmt * 2), data)
            freepages = list(struct.unpack_from("<" + (fmt * count), data, hdrsize))
            freepages.insert(0, pn)
            for pn in freepages:
                self.fh.seek(pn * self.pagesize)
                data = self.fh.read(self.pagesize)
                print("%06x: free: %s" % (pn, hexdump(data[:64])))
            pn = nextfree

    def dumpindented(self, pn, indent=0):
        """
        Dump all nodes of the current page with keys indented, showing how the `indent`
        feature works
        """
        page = self.readpage(pn)
        print("  " * indent, page)
        if page.isindex():
            print("  " * indent, end="")
            self.dumpindented(page.preceeding, indent + 1)
            for p in range(len(page.index)):
                print("  " * indent, end="")
                self.dumpindented(page.getpage(p), indent + 1)

    def dumptree(self, pn):
        """
        Walks entire tree, dumping all records on each page
        in sequential order
        """
        page = self.readpage(pn)
        print("%06x: preceeding = %06x, reccount = %04x" % (pn, page.preceeding, page.count))
        for ent in page.index:
            print("    %s" % ent)
        if page.preceeding:
            self.dumptree(page.preceeding)
            for ent in page.index:
                self.dumptree(ent.page)

    def pagedump(self):
        """
        Dump the contents of all pages, ignoring links between pages.
        This enables you to view the contents of pages which have become
        lost due to data corruption.
        """
        self.fh.seek(self.pagesize)
        pn = 1
        while True:
            try:
                pagedata = self.fh.read(self.pagesize)
                if len(pagedata) == 0:
                    break
                elif len(pagedata) != self.pagesize:
                    print("%06x: incomplete - %d bytes ( pagesize = %d )" % (pn, len(pagedata), self.pagesize))
                    break
                elif pagedata == b'\x00' * self.pagesize:
                    print("%06x: empty" % (pn))
                else:
                    page = self.page(pagedata)

                    print("%06x: preceeding = %06x, reccount = %04x" % (pn, page.preceeding, page.count))
                    for ent in page.index:
                        print("    %s" % ent)
            except Exception as e:
                print("%06x: ERROR decoding as B-tree page: %s" % (pn, e))
            pn += 1
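

# Illustration only, not used by the library: a minimal sketch of the relation
# semantics documented in BTree.find(), shown on a plain sorted list of keys
# instead of b-tree pages. The name `_example_find_relation` is hypothetical,
# and an index outside the list stands in for a cursor at eof().
def _example_find_relation(keys, rel, key):
    """ return the index in the sorted list `keys` selected by (rel, key), or None """
    from bisect import bisect_left, bisect_right
    if rel == 'eq':
        i = bisect_left(keys, key)
        return i if i < len(keys) and keys[i] == key else None
    if rel == 'ge':
        i = bisect_left(keys, key)       # first key >= key
    elif rel == 'gt':
        i = bisect_right(keys, key)      # first key > key
    elif rel == 'le':
        i = bisect_right(keys, key) - 1  # last key <= key
    elif rel == 'lt':
        i = bisect_left(keys, key) - 1   # last key < key
    else:
        raise ValueError("unknown relation: %r" % rel)
    return i if 0 <= i < len(keys) else None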


class ID0File(object):
    """
    Reads .id0 or 0.ida files, containing a v1.5, v1.6 or v2.0 b-tree database.

    This is basically the low level netnode interface from the IDA SDK.

    There are two major groups of nodes in the database:

    name lookup:
    key = "N"+name  -> value = littleendian(nodeid)

    node data:
    key = "."+bigendian(nodeid)+char(tag)+bigendian(value)
    key = "."+bigendian(nodeid)+char(tag)+string
    key = "."+bigendian(nodeid)+char(tag)

    and some special nodes for bookkeeping:
    "$ MAX LINK"
    "$ MAX NODE"
    "$ NET DESC"

    Very old databases also have name entries with a lowercase 'n',
    and corresponding '-'+value nodes.
    I am not sure what those are for.

    Several items have specially named nodes, like "$ structs", "$ enums", "Root Node".

    nodeByName(name)  returns the nodeid for a name
    bytes(nodeid, tag, val)  returns the value for a specific node.

    """
    INDEX = 0

    def __init__(self, idb, fh):
        self.btree = BTree(fh)

        self.wordsize = None
        self.maxnode = None

        if idb.magic == 'IDA2':
            # .i64 files use 64 bit values for some things.
            self.wordsize = 8
        elif idb.magic in ('IDA0', 'IDA1'):
            self.wordsize = 4
        else:
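            # determine wordsize from value of '$ MAX NODE'
            c = self.btree.find('eq', b'$ MAX NODE')
            if c and not c.eof():
                val = c.getval()
                self.wordsize = len(val)
                if self.wordsize in (4, 8):
                    # store maxnode as an integer, like the defaults set below
                    self.maxnode, = struct.unpack("<L" if self.wordsize == 4 else "<Q", val)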

        if self.wordsize not in (4, 8):
            print("Can not determine wordsize for database - assuming 32 bit")
            self.wordsize = 4

        if self.wordsize == 4:
            self.nodebase = 0xFF000000
            if not self.maxnode:
                self.maxnode = self.nodebase + 0x0FFFFF
            self.fmt = "L"
        else:
            self.nodebase = 0xFF00000000000000
            if not self.maxnode:
                self.maxnode = self.nodebase + 0x0FFFFFFF

            self.fmt = "Q"

        # set the keyformat for this database
        self.keyfmt = ">s" + self.fmt + "s" + self.fmt

    @cachedproperty
    def root(self): return self.nodeByName("Root Node")

    # note: versions before 4.7 used a short instead of a long
    # and stored the versions with one minor digit ( 43 ) , instead of two ( 480 )
    @cachedproperty
    def idaver(self): return self.int(self.root, 'A', -1)

    @cachedproperty
    def idbparams(self): return self.bytes(self.root, 'S', 0x41b994)
    @cachedproperty
    def idaverstr(self): return self.string(self.root, 'S', 1303)
    @cachedproperty
    def nropens(self): return self.int(self.root, 'A', -4)
    @cachedproperty
    def creationtime(self): return self.int(self.root, 'A', -2)
    @cachedproperty
    def originmd5(self): return self.bytes(self.root, 'S', 1302)
    @cachedproperty
    def somecrc(self): return self.int(self.root, 'A', -5)

    def prettykey(self, key):
        """
        returns the key in a readable format.
        """
        f = list(self.decodekey(key))
        f[0] = f[0].decode('utf-8')
        if len(f) > 2 and type(f[2]) == bytes:
            f[2] = f[2].decode('utf-8')

        if f[0] == '.':
            if len(f) == 2:
                return "%s%16x" % tuple(f)
            elif len(f) == 3:
                return "%s%16x %s" % tuple(f)
            elif len(f) == 4:
                if f[2] == 'H' and type(f[3]) in (str, bytes):
                    f[3] = f[3].decode('utf-8')
                    return "%s%16x %s '%s'" % tuple(f)
                elif type(f[3]) in (int, long):
                    return "%s%16x %s %x" % tuple(f)
                else:
                    f[3] = hexdump(f[3])
                    return "%s%16x %s %s" % tuple(f)
        elif f[0] in ('N', 'n', '$'):
            if type(f[1]) in (int, long):
                return "%s %x %16x" % tuple(f)
            else:
                return "%s'%s'" % tuple(f)
        elif f[0] == '-':
            return "%s %x" % tuple(f)

        return hexdump(key)

    def prettyval(self, val):
        """
        returns the value in a readable format.
        """
        if len(val) == self.wordsize and val[-1:] in (b'\x00', b'\xff'):
            return "%x" % struct.unpack("<" + self.fmt, val)
        if len(val) == self.wordsize and re.search(b'[\x00-\x08\x0b\x0c\x0e-\x1f]', val, re.DOTALL):
            return "%x" % struct.unpack("<" + self.fmt, val)
        if len(val) < 2 or not re.match(b'^[\x09\x0a\x0d\x20-\xff]+.$', val, re.DOTALL):
            return hexdump(val)
        val = val.replace(b"\n", b"\\n")
        return "'%s'" % val.decode('utf-8', 'ignore')

    def nodeByName(self, name):
        """ Return a nodeid by name """
        # note: really long names are encoded differently:
        #  'N'+'\x00'+pack('Q', nameid)  => ofs
        #  and  (ofs, 'N') -> nameid

        # at nodebase ( 0xFF000000, 'S', 0x100*nameid )  there is a series of blobs for max 0x80000 sized names.
        cur = self.btree.find('eq', self.namekey(name))
        if cur:
            return struct.unpack('<' + self.fmt, cur.getval())[0]

    def namekey(self, name):
        if type(name) in (int, long):
            return struct.pack("<sB" + self.fmt, b'N', 0, name)
        return b'N' + name.encode('utf-8')

    def makekey(self, *args):
        """
        Return a binary key for the nodeid, tag and optional value

        makekey(node)
        makekey(node, tag)
        makekey(node, tag, stringvalue)
        makekey(node, tag, intvalue)
        """
        if len(args) > 1:
            # utf-8 encode the tag
            args = args[:1] + (args[1].encode('utf-8'),) + args[2:]

        if len(args) == 3 and type(args[-1]) == str:
            # node.tag.string type keys
            return struct.pack(self.keyfmt[:1 + len(args)], b'.', *args[:-1]) + args[-1].encode('utf-8')
        elif len(args) == 3 and type(args[-1]) == type(-1) and args[-1] < 0:
            # negative values -> need lowercase fmt char
            return struct.pack(self.keyfmt[:1 + len(args)] + self.fmt.lower(), b'.', *args)
        else:
            # node.tag.value type keys
            return struct.pack(self.keyfmt[:2 + len(args)], b'.', *args)

    def decodekey(self, key):
        """
        Splits a key into a tuple, one of:
           ( [ 'N', 'n', '$' ],  0,   bignameid )
           ( [ 'N', 'n', '$' ],  name  )
           ( '-',  id )
           ( '.',  id )
           ( '.',  id,  tag )
           ( '.',  id,  tag, value )
           ( '.',  id,  'H', name  )
        """
        if key[:1] in (b'n', b'N', b'$'):
            if key[1:2] == b"\x00" and len(key) == 2 + self.wordsize:
                return struct.unpack(">sB" + self.fmt, key)
            else:
                return key[:1], key[1:].decode('utf-8', 'ignore')
        if key[:1] == b'-':
            return struct.unpack(">s" + self.fmt, key)
        if len(key) == 1 + self.wordsize:
            return struct.unpack(self.keyfmt[:3], key)
        if len(key) == 1 + self.wordsize + 1:
            return struct.unpack(self.keyfmt[:4], key)
        if len(key) == 1 + 2 * self.wordsize + 1:
            return struct.unpack(self.keyfmt[:5], key)
        if len(key) > 1 + self.wordsize + 1:
            f = struct.unpack_from(self.keyfmt[:4], key)
            return f + (key[2 + self.wordsize:], )
        raise Exception("unknown key format")

    def bytes(self, *args):
        """ return a raw value for the given arguments """
        if len(args) == 1 and isinstance(args[0], BTree.Cursor):
            cur = args[0]
        else:
            cur = self.btree.find('eq', self.makekey(*args))

        if cur:
            return cur.getval()

    def int(self, *args):
        """
        Return the integer stored in the specified node.

        Any type of integer will be decoded: byte, short, long, long long

        """
        data = self.bytes(*args)
        if data is not None:
            if len(data) == 1:
                return struct.unpack("<B", data)[0]
            if len(data) == 2:
                return struct.unpack("<H", data)[0]
            if len(data) == 4:
                return struct.unpack("<L", data)[0]
            if len(data) == 8:
                return struct.unpack("<Q", data)[0]
            print("can't get int from %s" % hexdump(data))

    def string(self, *args):
        """ return string stored in node """
        data = self.bytes(*args)
        if data is not None:
            return data.rstrip(b"\x00").decode('utf-8')

    def name(self, id):
        """
        resolves a name, both short and long names.
        """
        data = self.bytes(id, 'N')
        if not data:
            print("%x has no name" % id)
            return
        if data[:1] == b'\x00':
            nameid, = struct.unpack_from(">" + self.fmt, data, 1)
            nameblob = self.blob(self.nodebase, 'S', nameid * 256, nameid * 256 + 32)
            return nameblob.rstrip(b"\x00").decode('utf-8')
        return data.rstrip(b"\x00").decode('utf-8')

    def blob(self, nodeid, tag, start=0, end=0xFFFFFFFF):
        """
        Blobs are stored in sequential nodes
        with increasing index values.

        Most blobs, like scripts, start at index 0,
        long names start at a specified offset.
        """
        startkey = self.makekey(nodeid, tag, start)
        endkey = self.makekey(nodeid, tag, end)
        cur = self.btree.find('ge', startkey)
        data = b''
        # stop at the end of the b-tree, the cursor has no current key there
        while not cur.eof() and cur.getkey() <= endkey:
            data += cur.getval()
            cur.next()
        return data
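

# Illustration only, not used by the library: a minimal sketch of the node key
# layout which ID0File.makekey() builds for non-negative integer values:
# "." + bigendian(nodeid) + tag + bigendian(value).
# Because the integers are big-endian, a bytewise comparison of two keys
# orders them by nodeid, then tag, then value.
# The name `_example_node_key` and its `wordsize` parameter are hypothetical.
def _example_node_key(nodeid, tag=None, value=None, wordsize=4):
    """ return a binary key for the nodeid, optional tag and optional int value """
    import struct
    fmt = "L" if wordsize == 4 else "Q"
    key = struct.pack(">s" + fmt, b'.', nodeid)
    if tag is not None:
        key += tag.encode('utf-8')
    if value is not None:
        key += struct.pack(">" + fmt, value)
    return key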


class ID1File(object):
    """
    Reads .id1 or 1.IDA files, containing byte flags

    This is basically the information for the .idc GetFlags(ea),
    FirstSeg(), NextSeg(ea), SegStart(ea), SegEnd(ea) functions
    """
    INDEX = 1

    class SegInfo:
        def __init__(self, startea, endea, offset):
            self.startea = startea
            self.endea = endea
            self.offset = offset

    def __init__(self, idb, fh):
        if idb.magic == 'IDA2':
            wordsize, fmt = 8, "Q"
        else:
            wordsize, fmt = 4, "L"
        # todo: verify wordsize using the following heuristic:
        #  L -> starting at: seglistofs + nsegs*seginfosize  are all zero
        #  L -> starting at seglistofs .. nsegs*seginfosize every even word must be unique

        self.fh = fh
        fh.seek(0)
        hdrdata = fh.read(32)
        magic = hdrdata[:4]
        if magic in (b'Va4\x00', b'Va3\x00', b'Va2\x00', b'Va1\x00', b'Va0\x00'):
            nsegments, npages = struct.unpack_from("<HH", hdrdata, 4)
            #  filesize / npages == 0x2000  for all cases
            seglistofs = 8
            seginfosize = 3
        elif magic == b'VA*\x00':
            always3, nsegments, always2k, npages = struct.unpack_from("<LLLL", hdrdata, 4)
            if always3 != 3:
                print("ID1: first dword != 3: %08x" % always3)
            if always2k != 0x800:
                print("ID1: third dword != 2k: %08x" % always2k)
            seglistofs = 20
            seginfosize = 2
        else:
            raise Exception("unknown id1 magic: %s" % hexdump(magic))

        self.seglist = []
        # Va0  - ida v3.0.5
        # Va3  - ida v3.6
        fh.seek(seglistofs)
        if magic in (b'Va4\x00', b'Va3\x00', b'Va2\x00', b'Va1\x00', b'Va0\x00'):
            segdata = fh.read(nsegments * 3 * wordsize)
            for o in range(nsegments):
                startea, endea, id1ofs = struct.unpack_from("<" + fmt + fmt + fmt, segdata, o * seginfosize * wordsize)
                self.seglist.append(self.SegInfo(startea, endea, id1ofs))
        elif magic == b'VA*\x00':
            segdata = fh.read(nsegments * 2 * wordsize)
            id1ofs = 0x2000
            for o in range(nsegments):
                startea, endea = struct.unpack_from("<" + fmt + fmt, segdata, o * seginfosize * wordsize)
                self.seglist.append(self.SegInfo(startea, endea, id1ofs))
                id1ofs += 4 * (endea - startea)

    def is32bit_heuristic(self, fh, seglistofs):
        fh.seek(seglistofs)
        # todo: verify wordsize using the following heuristic:
        #  L -> starting at: seglistofs + nsegs*seginfosize  are all zero
        #  L -> starting at seglistofs .. nsegs*seginfosize every even word must be unique

    def dump(self):
        """ print the flags of the first and last addresses of each segment """
        for seg in self.seglist:
            print("==== %08x-%08x" % (seg.startea, seg.endea))
            if seg.endea - seg.startea < 30:
                for ea in range(seg.startea, seg.endea):
                    print("    %08x: %08x" % (ea, self.getFlags(ea)))
            else:
                for ea in range(seg.startea, seg.startea + 10):
                    print("    %08x: %08x" % (ea, self.getFlags(ea)))
                print("...")
                for ea in range(seg.endea - 10, seg.endea):
                    print("    %08x: %08x" % (ea, self.getFlags(ea)))

    def find_segment(self, ea):
        """ do a linear search for the given address in the segment list """
        for seg in self.seglist:
            if seg.startea <= ea < seg.endea:
                return seg

    def getFlags(self, ea):
        seg = self.find_segment(ea)
        if not seg:
            return 0
        self.fh.seek(seg.offset + 4 * (ea - seg.startea))
        return struct.unpack("<L", self.fh.read(4))[0]

    def firstSeg(self):
        return self.seglist[0].startea

    def nextSeg(self, ea):
        for i, seg in enumerate(self.seglist):
            if seg.startea <= ea < seg.endea:
                if i + 1 < len(self.seglist):
                    return self.seglist[i + 1].startea
                else:
                    return

    def segStart(self, ea):
        seg = self.find_segment(ea)
        if not seg:
            return
        return seg.startea

    def segEnd(self, ea):
        seg = self.find_segment(ea)
        if not seg:
            return
        return seg.endea
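

# Illustration only, not used by the library: a minimal sketch of where
# ID1File expects the flags of an address in a 'VA*' file. The flags start at
# file offset 0x2000, each address takes 4 bytes, and the segments are
# stored back to back in segment list order.
# The name `_example_flag_offset` is hypothetical.
def _example_flag_offset(segments, ea):
    """ `segments` is a list of (startea, endea) tuples, returns a file offset or None """
    ofs = 0x2000
    for startea, endea in segments:
        if startea <= ea < endea:
            return ofs + 4 * (ea - startea)
        ofs += 4 * (endea - startea)
    return None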


class NAMFile(object):
    """ reads .nam or NAMES.IDA files, containing ptrs to named items """
    INDEX = 2

    def __init__(self, idb, fh):
        if idb.magic == 'IDA2':
            wordsize, fmt = 8, "Q"
        else:
            wordsize, fmt = 4, "L"

        self.fh = fh
        fh.seek(0)
        hdrdata = fh.read(64)
        magic = hdrdata[:4]
        # Va0  - ida v3.0.5
        # Va1  - ida v3.6
        if magic in (b'Va4\x00', b'Va3\x00', b'Va2\x00', b'Va1\x00', b'Va0\x00'):
            always1, npages, always0, nnames, pagesize = struct.unpack_from("<HH" + fmt + fmt + "L", hdrdata, 4)
            if always1 != 1: print("nam: first hw = %d" % always1)
            if always0 != 0: print("nam: third dw = %d" % always0)
        elif magic == b'VA*\x00':
            always3, always1, always2k, npages, always0, nnames = struct.unpack_from("<LLLL" + fmt + "L", hdrdata, 4)
            if always3 != 3: print("nam: 3 hw = %d" % always3)
            if always1 != 1: print("nam: 1 hw = %d" % always1)
            if always0 != 0: print("nam: 0 dw = %d" % always0)
            if always2k != 0x800: print("nam: 2k dw = %d" % always2k)
            pagesize = 0x2000
        else:
            raise Exception("unknown nam magic: %s" % hexdump(magic))
        if idb.magic == 'IDA2':
            nnames >>= 1
        self.wordsize = wordsize
        self.wordfmt = fmt
        self.npages = npages   # needed by dump()
        self.nnames = nnames
        self.pagesize = pagesize

    def dump(self):
        print("nam: nnames=%d, npages=%d, pagesize=%08x" % (self.nnames, self.npages, self.pagesize))

    def allnames(self):
        self.fh.seek(self.pagesize)
        n = 0
        while n < self.nnames:
            data = self.fh.read(self.pagesize)
            want = min(self.nnames - n, self.pagesize // self.wordsize)
            ofslist = struct.unpack_from("<%d%s" % (want, self.wordfmt), data, 0)
            for ea in ofslist:
                yield ea
            n += want


class SEGFile(object):
    """ reads .seg or $SEGS.IDA files.  """
    INDEX = 3

    def __init__(self, idb, fh):
        pass


class TILFile(object):
    """ reads .til files """
    INDEX = 4

    def __init__(self, idb, fh):
        pass
# note: v3 databases had a .reg instead of .til


class ID2File(object):
    """
    Reads .id2 files

    ID2 sections contain packed data, resulting in triples
    of unknown use.
    """
    INDEX = 5

    def __init__(self, idb, fh):
        pass


class Struct:
    """
    Decodes info for structures

    (structnode, N)          = structname
    (structnode, D, address) = xref-type
    (structnode, M, 0)       = packed struct info
    (structnode, S, 27)      = packed value(addr, byte)
    """
    class Member:
        """
           (membernode, N)          = struct.member-name
           (membernode, A, 3)       = structid+1
           (membernode, A, 8)       = 
           (membernode, A, 11)      = enumid+1
           (membernode, A, 16)      = flag?  -- 4:variable length flag?
           (membernode, S, 0x3000)  = type (set with 'Y')
           (membernode, S, 0x3001)  = names used in 'type'
           (membernode, S, 5)       = array type?
           (membernode, S, 9)       = offset-type
           (membernode, D, address) = xref-type
           (membernode, d, structid) = xref-type   -- for sub-structs
        """
        def __init__(self, id0, spec):
            self._id0 = id0
            self._nodeid = spec.nextword() +  self._id0.nodebase
            self.skip = spec.nextword()
            self.size = spec.nextword()
            self.flags = spec.next32()
            self.props = spec.next32()
            self.ofs = None
        @cachedproperty
        def name(self): return self._id0.name(self._nodeid)
        @cachedproperty
        def enumid(self): return self._id0.int(self._nodeid, 'A', 11)
        @cachedproperty
        def stringtype(self): return self._id0.int(self._nodeid, 'A', 16)
        @cachedproperty
        def structid(self): return self._id0.int(self._nodeid, 'A', 3)
        def comment(self, repeatable=False):
            """ return the regular or the repeatable comment; a method, since a property can not take an argument """
            return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
        @cachedproperty
        def ptrinfo(self): return self._id0.bytes(self._nodeid, 'S', 9)
        @cachedproperty
        def typeinfo(self): return self._id0.bytes(self._nodeid, 'S', 0x3000)

    def __init__(self, id0, nodeid):
        self._id0 = id0
        self._nodeid = nodeid

        spec = self._id0.blob(self._nodeid, 'M')
        p = IdaUnpacker(self._id0.wordsize, spec)
        if self._id0.idaver >= 40:
            #    1 = SF_VAR, 2 = SF_UNION, 4 = SF_HASHUNI, 8 = SF_NOLIST, 0x10 = SF_TYPLIB, 0x20 = SF_HIDDEN, 0x40 = SF_FRAME, 0xF80 = SF_ALIGN, 0x1000 = SF_GHOST
            self.flags = p.next32()
        else:
            self.flags = 0

        nmembers = p.next32()

        self.members = []
        o = 0
        for i in range(nmembers):
            m = Struct.Member(self._id0, p)
            m.ofs = o
            o += m.size

            self.members.append(m)

        self.extra = []
        while not p.eof():
            self.extra.append(p.next32())

    def comment(self, repeatable=False):
        """ return the regular or the repeatable comment; a method, since a property can not take an argument """
        return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
    @cachedproperty
    def name(self): return self._id0.name(self._nodeid)

    def __iter__(self):
        for m in self.members:
            yield m


class Enum:
    """
       (enumnode, N)     = enum-name
       (enumnode, A, -1) = nr of values
       (enumnode, A, -3) = representation
       (enumnode, A, -5) = flags: bitfield, hidden, ...
       (enumnode, A, -8) = 
       (enumnode, E, value) = valuenode + 1
        
    """
    class Member:
        """
           (membernode, N)      = membername
           (membernode, A, -2)  = enumnode + 1
           (membernode, A, -3)  = member value
        """
        def __init__(self, id0, nodeid):
            self._id0 = id0
            self._nodeid = nodeid

        @cachedproperty
        def value(self): return self._id0.int(self._nodeid, 'A', -3)
        def comment(self, repeatable=False):
            """ return the regular or the repeatable comment; a method, since a property can not take an argument """
            return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
        @cachedproperty
        def name(self): return self._id0.name(self._nodeid)

    def __init__(self, id0, nodeid):
        self._id0 = id0
        self._nodeid = nodeid

    @cachedproperty
    def count(self): return self._id0.int(self._nodeid, 'A', -1)
    @cachedproperty
    def representation(self): return self._id0.int(self._nodeid, 'A', -3)

    # flags>>3 -> width
    # flags&1 -> bitfield
    @cachedproperty
    def flags(self): return self._id0.int(self._nodeid, 'A', -5)

    def comment(self, repeatable=False):
        """ return the regular or the repeatable comment; a method, since a property can not take an argument """
        return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
    @cachedproperty
    def name(self): return self._id0.name(self._nodeid)

    def __iter__(self):
        startkey = self._id0.makekey(self._nodeid, 'E')
        endkey = self._id0.makekey(self._nodeid, 'F')
        cur = self._id0.btree.find('ge', startkey)
        while not cur.eof() and cur.getkey() < endkey:
            yield Enum.Member(self._id0, self._id0.int(cur) - 1)
            cur.next()


class Bitfield:
    class Member:
        def __init__(self, id0, nodeid):
            self._id0 = id0
            self._nodeid = nodeid

        @cachedproperty
        def value(self): return self._id0.int(self._nodeid, 'A', -3)
        @cachedproperty
        def mask(self): return self._id0.int(self._nodeid, 'A', -6) - 1
        def comment(self, repeatable=False):
            """ return the regular or the repeatable comment; a method, since a property can not take an argument """
            return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
        @cachedproperty
        def name(self): return self._id0.name(self._nodeid)

    class Mask:
        def __init__(self, id0, nodeid, mask):
            self._id0 = id0
            self._nodeid = nodeid
            self.mask = mask

        def comment(self, repeatable=False):
            """ return the regular or the repeatable comment; a method, since a property can not take an argument """
            return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
        @cachedproperty
        def name(self): return self._id0.name(self._nodeid)

        def __iter__(self):
            """
            Enumerates all Members of this Mask
            """
            startkey = self._id0.makekey(self._nodeid, 'E')
            endkey = self._id0.makekey(self._nodeid, 'F')
            cur = self._id0.btree.find('ge', startkey)
            while not cur.eof() and cur.getkey() < endkey:
                yield Bitfield.Member(self._id0, self._id0.int(cur) - 1)
                cur.next()


    def __init__(self, id0, nodeid):
        self._id0 = id0
        self._nodeid = nodeid

    @cachedproperty
    def count(self): return self._id0.int(self._nodeid, 'A', -1)
    @cachedproperty
    def representation(self): return self._id0.int(self._nodeid, 'A', -3)
    @cachedproperty
    def flags(self): return self._id0.int(self._nodeid, 'A', -5)

    def comment(self, repeatable=False):
        """ return the regular or the repeatable comment; a method, since a property can not take an argument """
        return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
    @cachedproperty
    def name(self): return self._id0.name(self._nodeid)

    def __iter__(self):
        """
        Enumerates all Masks
        """
        startkey = self._id0.makekey(self._nodeid, 'm')
        endkey = self._id0.makekey(self._nodeid, 'n')
        cur = self._id0.btree.find('ge', startkey)
        while cur.getkey() < endkey:
            key = self._id0.decodekey(cur.getkey())
            yield Bitfield.Mask(self._id0, self._id0.int(cur) - 1, key[-1])
            cur.next()

class IDBParams:
    def __init__(self, id0, data):
        self._id0 = id0
        magic, self.version = struct.unpack_from("<3sH", data, 0)
        if self.version<700:
            cpu, self.idpflags, self.demnames, self.filetype, self.coresize, self.corestart, self.ostype, self.apptype = struct.unpack_from("<8sBBH" + (id0.fmt * 2) + "HH", data, 5)
            self.cpu = strz(cpu, 0)
        else:
            p = IdaUnpacker(id0.wordsize, data[5:])
            cpulen = p.next32()
            self.cpu = p.bytes(cpulen)
            genflags = p.next32()
            self.idpflags = p.next32()
            self.demnames = 0
            changecount = p.next32()
            self.filetype = p.next32()
            self.ostype = p.next32()
            self.apptype = p.next32()
            asmtype = p.next32()
            specsegs = p.next32()
            specsegs = p.next32()
            aflags = p.next32()
            aflags2 = p.next32()
            base = p.nextword()
            startss = p.nextword()
            startcs = p.nextword()
            startip = p.nextword()
            startea = p.nextword()
            startsp = p.nextword()
            main = p.nextword()
            minea = p.nextword()
            maxea = p.nextword()

            self.coresize = 0
            self.corestart = 0

class Script:
    def __init__(self, id0, nodeid):
        self._id0 = id0
        self._nodeid = nodeid

    @cachedproperty
    def name(self): return self._id0.string(self._nodeid, 'S', 0)
    @cachedproperty
    def language(self): return self._id0.string(self._nodeid, 'S', 1)
    @cachedproperty
    def body(self): return strz(self._id0.blob(self._nodeid, 'X'), 0)

class Segment:
    """
    Decodes a value from "$ segs", see segment_t in segment.hpp for details.
    """
    def __init__(self, id0, spec):
        self._id0 = id0
        p = IdaUnpacker(id0.wordsize, spec)
        self.startea = p.nextword()
        self.size = p.nextword()
        self.name_id = p.nextword()
        self.class_id = p.nextword()
        self.orgbase = p.nextword()
        self.unknown = p.next16()
        self.align = p.next8()
        self.comb = p.next8()
        self.perm = p.next8()
        self.bitness = p.next8()
        self.flags = p.next8()
        self.selector = p.nextword()
        self.defsr = [p.nextword() for _ in range(16)]
        self.color = p.next32()


================================================
FILE: idbtool.py
================================================
#!/usr/bin/python3
"""
Tool for querying information from Hex-Rays .idb and .i64 files
without launching IDA.

Copyright (c) 2016 Willem Hengeveld <itsme@xs4all.nl>
"""

# todo:
#  '$ segs'
#      S <segaddr> = packed(startea, size, ....)
#  '$ srareas'
#      a <addr>    = packed(startea, size, flag, flag)  -- includes functions
#      b <addr>    = packed(startea, size, flag, flag)  -- segment
#      c <addr>    = packed(startea, size, flag, flag)  -- same as 'b'
#       
from __future__ import division, print_function, absolute_import, unicode_literals
import sys
import os
if sys.version_info[0] == 2:
    import scandir
    os.scandir = scandir.scandir
if sys.version_info[0] == 2:
    reload(sys)
    sys.setdefaultencoding('utf-8')

if sys.version_info[0] == 2:
    stdout = sys.stdout
else:
    stdout = sys.stdout.buffer

import struct
import binascii
import argparse
import itertools
from collections import defaultdict

import re

from datetime import datetime

import idblib
from idblib import hexdump


def timestring(t):
    if t == 0:
        return "....-..-.. ..:..:.."
    return datetime.strftime(datetime.fromtimestamp(t), "%Y-%m-%d %H:%M:%S")


def strz(b, o):
    return b[o:b.find(b'\x00', o)].decode('utf-8', 'ignore')

def nonefmt(fmt, num):
    if num is None:
        return "-"
    return fmt % num

######### license encoding ################


def decryptuser(data):
    """
    The '$ original user' node is encrypted with Hex-Rays' private key.
    Hence we can easily decrypt it using the public key, but not change it to something else.
    We can however copy the entry from another database, or just replace it with garbage.

    The node contains 128 bytes encrypted license, followed by 32 bytes zero.

    Note: I found several ida55 databases online where this does not work.
    Possibly these were created using a cracked version of IDA.
    """
    data = int(binascii.b2a_hex(data[127::-1]), 16)
    user = pow(data, 0x13, 0x93AF7A8E3A6EB93D1B4D1FB7EC29299D2BC8F3CE5F84BFE88E47DDBDD5550C3CE3D2B16A2E2FBD0FBD919E8038BB05752EC92DD1498CB283AA087A93184F1DD9DD5D5DF7857322DFCD70890F814B58448071BBABB0FC8A7868B62EB29CC2664C8FE61DFBC5DB0EE8BF6ECF0B65250514576C4384582211896E5478F95C42FDED)
    user = binascii.a2b_hex("%0256x" % user)
    return user[1:]


def licensestring(lic):
    """ decode a license blob """
    if not lic:
        return
    if len(lic) < 127:
        print("too short license format: %s" % binascii.b2a_hex(lic))
        return
    elif len(lic) > 127 and sum(lic[127:]) != 0:
        print("too long license format: %s" % binascii.b2a_hex(lic))
        return

    if struct.unpack_from("<L", lic, 106)[0]:
        print("unknown license format: %s" % binascii.b2a_hex(lic))
        return

    # first 2 bytes probably a checksum

    licver, = struct.unpack_from("<H", lic, 2)
    time, = struct.unpack_from("<L", lic, 4)

    # new 'Freeware version'  has licver == 0 as well, but is new format anyway, it is recognizable by time==0x10000
    if licver == 0 and time != 0x10000:
        if time:
            """
            # up to and including ida v5.2

            +00:  int16 checksum?
            +02:  int16 zero
            +04:  int32 unix timestamp
            +08:  byte[8]  zero
            +10:  int32 flags
            +14:  char[107]  license text
            """

            licflags, = struct.unpack_from("<L", lic, 16)
            licensee = strz(lic, 20)
            return "%s [%08x]  %s" % (timestring(time), licflags, licensee)
        else:
            """
            +00: byte[0x13]  zero
            +13: int32 ?
            +17: int32 timestamp
            +1b: byte[8]  zero
            +23: int32 flags
            +27: char[88]  license text
            """
            unk, = struct.unpack_from("<L", lic, 0x13)
            time, = struct.unpack_from("<L", lic, 0x17)
            licflags, = struct.unpack_from("<L", lic, 0x23)
            licensee = strz(lic, 0x27)

            return "%s [%08x] (%08x)  %s" % (timestring(time), licflags, unk, licensee)
    else:
        """
        # since ida v5.3

        +00: int16 checksum?
        +02: int16 idaversion
        +04: int32 ? small number, 1 or 2.
        +08: int64 ? -1  or big number,  maybe license flags?
        +10: int32 timestamp
        +14: int32  zero
        +18: int32  sometimes another timestamp
        +1c: byte[6]  license id
        +22: char[*] license text   ( v5.3-v5.x : 93 chars,  v6.0: 77 chars, v6.5: 69 chars )
        +67: int64 ?  since ida v6.50
        +6f: byte[16] hash   .. since ida v6.00
        """
        time1, = struct.unpack_from("<L", lic, 16)
        time2, = struct.unpack_from("<L", lic, 16 + 8)
        licid = "%02X-%02X%02X-%02X%02X-%02X" % struct.unpack_from("6B", lic, 28)
        licensee = strz(lic, 34)
        return "v%04d %s .. %s  %s  %s" % (licver, timestring(time1), timestring(time2), licid, licensee)
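

# Illustrative sketch, not used by the tool: packs a synthetic blob in the
# "since ida v5.3" layout documented in licensestring() and reads the fields
# back from the same offsets.  The helper name and every field value here are
# made up for demonstration; real blobs come from the license nodes.
def _example_new_license_layout():
    import struct  # local import keeps the sketch self-contained

    blob = bytearray(127)
    struct.pack_into("<H", blob, 0x02, 690)             # +02: idaversion
    struct.pack_into("<L", blob, 0x10, 1500000000)      # +10: timestamp
    struct.pack_into("6B", blob, 0x1c, 0x48, 0x12, 0x34, 0x56, 0x78, 0x9A)  # +1c: license id
    blob[0x22:0x22 + 7] = b"Example"                    # +22: license text, NUL padded
    blob = bytes(blob)

    licver, = struct.unpack_from("<H", blob, 0x02)
    time1, = struct.unpack_from("<L", blob, 0x10)
    licid = "%02X-%02X%02X-%02X%02X-%02X" % struct.unpack_from("6B", blob, 0x1c)
    licensee = blob[0x22:blob.find(b'\x00', 0x22)].decode('utf-8', 'ignore')
    return licver, time1, licid, licensee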


def dumpuser(id0):
    """ dump the original, and current database user """
    orignode = id0.nodeByName('$ original user')
    if orignode:
        user0 = id0.bytes(orignode, 'S', 0)
        if user0:
            if user0.find(b'\x00\x00\x00\x00') >= 128:
                user0 = decryptuser(user0)
            else:
                user0 = user0[:127]
            # user0 has 128 bytes rsa encrypted license, followed by 32 bytes zero
            print("orig: %s" % licensestring(user0))
        # ida9 has S10+S11 == license json
        user10 = id0.blob(orignode, 'S', 16)
        if user10:
            import json
            user10 = json.loads(user10)
            print("orig: %s" % user10)
    curnode = id0.nodeByName('$ user1')
    if curnode:
        user1 = id0.bytes(curnode, 'S', 0)
        print("user: %s" % licensestring(user1))


######### idb summary #########


filetypelist = [
    "MS DOS EXE File",
    "MS DOS COM File",
    "Binary File",
    "MS DOS Driver",
    "New Executable (NE)",
    "Intel Hex Object File",
    "MOS Technology Hex Object File",
    "Linear Executable (LX)",
    "Linear Executable (LE)",
    "Netware Loadable Module (NLM)",
    "Common Object File Format (COFF)",
    "Portable Executable (PE)",
    "Object Module Format",
    "R-records",
    "ZIP file (this file is never loaded to IDA database)",
    "Library of OMF Modules",
    "ar library",
    "file is loaded using LOADER DLL",
    "Executable and Linkable Format (ELF)",
    "Watcom DOS32 Extender (W32RUN)",
    "Linux a.out (AOUT)",
    "PalmPilot program file",
    "MS DOS EXE File",
    "MS DOS COM File",
    "AIX ar library",
    "Mac OS X Mach-O file",
]


def dumpinfo(id0):
    """ print various infos on the idb file """
    def ftstring(ft):
        # filetype 0 is a valid entry (the old-style MS DOS EXE type)
        if 0 <= ft < len(filetypelist):
            return "%02x:%s" % (ft, filetypelist[ft])
        return "%02x:unknown" % ft

    def decodebitmask(fl, bitnames):
        l = []
        knownbits = 0
        for bit, name in enumerate(bitnames):
            if fl & (1 << bit) and name is not None:
                l.append(name)
                knownbits |= 1 << bit
        if fl & ~knownbits:
            l.append("unknown_%x" % (fl & ~knownbits))
        return ",".join(l)

    def osstring(fl):
        return decodebitmask(fl, ['msdos', 'win', 'os2', 'netw', 'unix', 'other'])

    def appstring(fl):
        return decodebitmask(fl, ['console', 'graphics', 'exe', 'dll', 'driver', '1thread', 'mthread', '16bit', '32bit', '64bit'])

    ldr = id0.nodeByName("$ loader name")
    if ldr:
        print("loader: %s %s" % (id0.string(ldr, 'S', 0), id0.string(ldr, 'S', 1)))

    if not id0.root:
        print("database has no RootNode")
        return

    if id0.idbparams:
        params = idblib.IDBParams(id0, id0.idbparams)
        print("cpu: %s, version=%d, filetype=%s, ostype=%s, apptype=%s, core:%x, size:%x" % (params.cpu, params.version, ftstring(params.filetype), osstring(params.ostype), appstring(params.apptype), params.corestart, params.coresize))

    print("idaver=%s: %s" % (nonefmt("%04d", id0.idaver), id0.idaverstr))

    srcmd5 = id0.originmd5
    print("nopens=%s, ctime=%s, crc=%s, md5=%s" % (nonefmt("%d", id0.nropens), nonefmt("%08x", id0.creationtime), nonefmt("%08x", id0.somecrc), hexdump(srcmd5) if srcmd5 else "-"))

    dumpuser(id0)


def dumpnames(args, id0, nam):
    for ea in nam.allnames():
        print("%08x: %s" % (ea, id0.name(ea)))


def dumpscript(id0, node):
    """ dump all stored scripts """
    s = idblib.Script(id0, node)

    print("======= %s %s =======" % (s.language, s.name))
    print(s.body)


def dumpstructmember(m):
    """
    Dump info for a struct member.
    """
    print("     %02x %02x %08x %02x: %-40s" % (m.skip, m.size, m.flags, m.props, m.name), end="")
    if m.enumid:
        print(" enum %08x" % m.enumid, end="")
    if m.structid:
        print(" struct %08x" % m.structid, end="")
    if m.ptrinfo:
        # packed
        # note: 64bit nrs are stored low32, high32
        #  flags1, target, base, delta, flags2

        # flags1:
        #   0=off8  1=off16 2=off32 3=low8  4=low16 5=high8 6=high16 9=off64
        #   0x10 = targetaddr, 0x20 = baseaddr, 0x40 = delta, 0x80 = base is plainnum
        # flags2:
        #   1=image is off, 0x10 = subtract, 0x20 = signed operand
        print(" ptr %s" % m.ptrinfo, end="")
    if m.typeinfo:
        print(" type %s" % m.typeinfo, end="")
    print()


def dumpstruct(id0, node):
    """
    dump all info for the struct defined by `node`
    """
    s = idblib.Struct(id0, node)


    print("struct %s, 0x%x" % (s.name, s.flags))
    for m in s:
        dumpstructmember(m)

def dumpbitmember(m):
    print("        %08x %s" % (m.value or 0, m.name))
def dumpmask(m):
    print("    mask %08x %s" % (m.mask, m.name))
    for member in m:
        dumpbitmember(member)
def dumpbitfield(id0, node):
    b = idblib.Bitfield(id0, node)
    print("bitfield %s, %s, %s, %s" % (b.name, nonefmt("0x%x", b.count), nonefmt("0x%x", b.representation), nonefmt("0x%x", b.flags)))
    for m in b:
        dumpmask(m)

def dumpenummember(m):
    """
    Print information on a single enum member
    """
    print("    %08x %s" % (m.value or 0, m.name))

def dumpenum(id0, node):
    """
    Dump all info for the enum defined by `node`
    """
    e = idblib.Enum(id0, node)
    if e.flags and e.flags&1:
        dumpbitfield(id0, node)
        return
    print("enum %s, %s, %s, %s" % (e.name, nonefmt("0x%x", e.count), nonefmt("0x%x", e.representation), nonefmt("0x%x", e.flags)))

    for m in e:
        dumpenummember(m)


def dumpimport(id0, node):
    # Note that '$ imports' is a list where the actual nodes
    # are stored in the list, therefore we add '1' to the node here.

    # first the named imports
    startkey = id0.makekey(node+1, 'S')
    endkey = id0.makekey(node+1, 'T')
    cur = id0.btree.find('ge', startkey)
    while cur.getkey() < endkey:
        txt = id0.string(cur)
        key = cur.getkey()
        ea = id0.decodekey(key)[3]
        print("%08x: %s" % (ea, txt))
        cur.next()

    # then list the imports by ordinal
    startkey = id0.makekey(node+1, 'A')
    endkey = id0.makekey(node+1, 'B')
    cur = id0.btree.find('ge', startkey)
    while cur.getkey() < endkey:
        ordinal = id0.decodekey(cur.getkey())[3]
        ea = id0.int(cur)
        print("%08x: (ord%04d) %s" % (ea, ordinal, id0.name(ea)))
        cur.next()


def enumlist(id0, listname, callback):
    """
    Lists are all stored in a similar way.

    (listnode, 'N')           = listname
    (listnode, 'A', -1)       = list size      <-- not for '$ scriptsnippets'
    (listnode, 'A', seqnr)    = itemnode+1

    (listnode, 'Y', itemnode) = seqnr          <-- only with '$ enums'

    (listnode, 'Y', 0)        = list size      <-- only '$ scriptsnippets'
    (listnode, 'Y', 1)        = ?              <-- only '$ scriptsnippets'

    (listnode, 'S', seqnr)    = dllname        <-- only '$ imports'

    """
    listnode = id0.nodeByName(listname)
    if not listnode:
        return

    startkey = id0.makekey(listnode, 'A')
    endkey = id0.makekey(listnode, 'A', 0xFFFFFFFF)
    cur = id0.btree.find('ge', startkey)
    while cur.getkey() < endkey:
        item = id0.int(cur)
        callback(id0, item - 1)
        cur.next()


def listfuncdirs(id0):
    listnode = id0.nodeByName('$ dirtree/funcs')
    if not listnode:
        return

    dir_id = 0
    while True:
        start = dir_id * 0x10000
        end = start + 0xFFFF
        data = id0.blob(listnode, 'S', start, end)
        if data == b'':
            break
        dumpfuncdir(id0, dir_id, data)
        dir_id += 1


def dumpfuncdir(id0, dir_index, data):
    terminate = data.find(b'\0', 1)
    name = data[1:terminate].decode('utf-8')

    p = idblib.IdaUnpacker(id0.wordsize, data[terminate+1:])
    parent = p.nextword()
    unk = p.next32()
    
    if data[0] == 0:  # IDA 7.5
        subdir_count = p.next32()
        subdirs = []
        while subdir_count:
            subdir_id = p.nextwordsigned()
            if subdirs:
                subdir_id = subdirs[-1] + subdir_id
            subdirs.append(subdir_id)
            subdir_count -= 1

        func_count = p.next32()
        funcs = []
        while func_count:
            func_id = p.nextwordsigned()
            if funcs:
                func_id = funcs[-1] + func_id
            funcs.append(func_id)
            func_count -= 1

    elif data[0] == 1:  # IDA 7.6
        children_count = p.next32()
        children = []
        for i in range(children_count):
            next_child = p.nextwordsigned()
            if children:
                next_child += children[-1]
            children.append(next_child)

        subdir_count = p.next32()
        children_count -= subdir_count
        childtype_counts = [subdir_count]
        while children_count:
            childtype_count = p.next32()
            children_count -= childtype_count
            childtype_counts.append(childtype_count)

        subdirs = []
        funcs = []
        i = 0
        parsing_subdirs = True  # switch back and forth
        for childtype_count in childtype_counts:
            for _ in range(childtype_count):
                if parsing_subdirs:
                    subdirs.append(children[i])
                else:
                    funcs.append(children[i])
                i += 1
            parsing_subdirs = not parsing_subdirs
    else:
        raise NotImplementedError('unsupported funcdir schema')

    if not p.eof():
        raise Exception('not EOF after dir parsed')

    print("dir %d = %s" % (dir_index, name))
    print("  parent = %d" % parent)
    print("  subdirs:")
    for subdir in subdirs:
        print("    %d" % subdir)
    print("  functions:")
    for func in funcs:
        print("    0x%x" % func)
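

# Illustrative sketch, not used by the tool: the IDA 7.6 branch of
# dumpfuncdir() stores child ids as deltas relative to the previous child,
# followed by run lengths that alternate between subdirs and functions,
# starting with subdirs.  The helper name and the sample values are made up.
def _example_decode_funcdir_children(deltas, run_lengths):
    # undo the delta encoding: each entry is relative to the previous one
    children = []
    for delta in deltas:
        if children:
            delta += children[-1]
        children.append(delta)

    # hand out the children to alternating runs: subdirs, funcs, subdirs, ...
    subdirs, funcs = [], []
    pos = 0
    is_subdir = True
    for count in run_lengths:
        chunk = children[pos:pos + count]
        if is_subdir:
            subdirs.extend(chunk)
        else:
            funcs.extend(chunk)
        pos += count
        is_subdir = not is_subdir
    return subdirs, funcs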


def printent(args, id0, c):
    if args.verbose:
        print("%s = %s" % (id0.prettykey(c.getkey()), id0.prettyval(c.getval())))
    else:
        print("%s = %s" % (hexdump(c.getkey()), hexdump(c.getval())))


def createkey(args, id0, base, tag, ix):
    """

    parse base node specification:

    '?<name>' -> explicit N<name> key
    '#<number>' -> relative to nodebase
    '.<number>' -> absolute nodeid

    '<name>'  -> lookup by name.

    """
    if base[:1] == '?':
        return id0.namekey(base[1:])

    if re.match(r'^#(?:0[xX][0-9a-fA-F]+|\d+)$', base):
        nodeid = int(base[1:], 0) + id0.nodebase
    elif re.match(r'^\.(?:0[xX][0-9a-fA-F]+|\d+)$', base):
        nodeid = int(base[1:], 0)
    else:
        nodeid = id0.nodeByName(base)
        if nodeid and args.verbose > 1:
            print("found node %x for %s" % (nodeid, base))
    if nodeid is None:
        print("Could not find '%s'" % base)
        return

    s = [nodeid]
    if tag is not None:
        s.append(tag)
        if ix is not None:
            try:
                ix = int(ix, 0)
            except ValueError:
                # not a number: keep ix as a hash string
                pass
            s.append(ix)

    return id0.makekey(*s)


def enumeratecursor(args, c, onerec, callback):
    """
    Enumerate cursor in direction specified by `--dec` or `--inc`,
    taking into account the optional limit set by `--limit`

    Output according to verbosity level set by `--verbose`.
    """
    limit = args.limit
    while c and not c.eof() and (limit is None or limit > 0):
        callback(c)
        if args.dec:
            c.prev()
        else:
            c.next()
        if limit is not None:
            limit -= 1
        elif onerec:
            break


def id0query(args, id0, query):
    """
    Queries start with an optional operator: <, <=, >, >=, = or ==.

    The operator is followed by a base node specification, see `createkey`
    for the accepted forms: '<name>', '#<number>', '.<number>' or '?<name>'.
    Names are anything which can be found under the name tree in the database.

    After the base there is optionally a ';', followed by a node tag,
    and another ';', followed by an index or hash string.

    """

    xlatop = {'=': 'eq', '==': 'eq', '>': 'gt', '<': 'lt', '>=': 'ge', '<=': 'le'}

    SEP = r";"
    m = re.match(r'^([=<>]=?)?(.+?)(?:' + SEP + r'(\w+)(?:' + SEP + r'(.+))?)?$', query)
    op = m.group(1) or "=="
    base = m.group(2)
    tag = m.group(3)  # optional ;tag
    ix = m.group(4)   # optional ;ix

    op = xlatop[op]

    key = createkey(args, id0, base, tag, ix)
    if key is None:
        # createkey already reported that the base node was not found
        return

    c = id0.btree.find(op, key)

    enumeratecursor(args, c, op=='eq', lambda c:printent(args, id0, c))
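

# Illustrative sketch, not used by the tool: shows how the query pattern in
# id0query() splits a query string.  The helper name and the sample queries
# are made up; no database is needed.
def _example_split_query(query):
    import re  # local import keeps the sketch self-contained

    xlatop = {'=': 'eq', '==': 'eq', '>': 'gt', '<': 'lt', '>=': 'ge', '<=': 'le'}
    m = re.match(r'^([=<>]=?)?(.+?)(?:;(\w+)(?:;(.+))?)?$', query)
    if not m:
        return None
    # the operator defaults to '==', tag and index are None when omitted
    return xlatop[m.group(1) or '=='], m.group(2), m.group(3), m.group(4)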


def getsegs(id0):
    """
    Returns a list of all segments.
    """
    seglist = []
    node = id0.nodeByName('$ segs')
    if not node:
        # callers iterate over the result, so return an empty list, not None
        return seglist
    startkey = id0.makekey(node, 'S')
    endkey = id0.makekey(node, 'T')
    cur = id0.btree.find('ge', startkey)
    while cur.getkey() < endkey:
        s = idblib.Segment(id0, cur.getval())
        seglist.append(s)
        cur.next()

    return seglist


def listsegments(id0):
    """
    Print a summary of all segments found in the IDB.
    """
    ssnode = id0.nodeByName('$ segstrings')
    if not ssnode:
        print("can't find '$ segstrings' node")
        return
    segstrings = id0.blob(ssnode, 'S')
    p = idblib.IdaUnpacker(id0.wordsize, segstrings)
    unk = p.next32()
    nextid = p.next32()
    slist = []
    while not p.eof():
        slen = p.next32()
        if slen is None:
            break
        name = p.bytes(slen)
        if name is None:
            break
        slist.append(name.decode('utf-8', 'ignore'))

    segs = getsegs(id0)
    for s in segs:
        print("%08x - %08x  %s" % (s.startea, s.startea+s.size, slist[s.name_id-1]))

def classifynodes(args, id0):
    """
    Attempt to classify all nodes in the IDA database.

    Note: this does not work for very old dbs
    """
    nodetype = {}
    tagstats = defaultdict(lambda : defaultdict(int))

    segs = getsegs(id0)

    print("node: %x .. %x" % (id0.nodebase, id0.maxnode))

    def addstat(nodetype, k):
        if len(k)<3:
            print("??? strange, expected longer key - %s" % k)
            return
        tag = k[2].decode('utf-8')
        if len(k)==3:
            tagstats[nodetype][(tag, )] += 1
        elif len(k)==4:
            value = k[3]
            if type(value)==int:
                if isaddress(value):
                    tagstats[nodetype][(tag, 'addr')] += 1
                elif isnode(value):
                    tagstats[nodetype][(tag, 'node')] += 1
                else:
                    if value >= id0.maxnode:
                        value -= pow(0x100, id0.wordsize)
                    tagstats[nodetype][(tag, value)] += 1
            else:
                tagstats[nodetype][(tag, 'string')] += 1
        else:
            print("??? strange, expected shorter key - %s" % k)
            return

    def isaddress(addr):
        for s in segs:
            if s.startea <= addr < s.startea+s.size:
                return True
        return False

    def isnode(addr):
        return id0.nodebase <= addr <= id0.maxnode

    def processbitfieldvalue(v):
        nodetype[v._nodeid] = 'bitfieldvalue'

    def processbitfieldmask(m):
        nodetype[m._nodeid] = 'bitfieldmask'

        for member in m:
            processbitfieldvalue(member)

    def processbitfield(id0, node):
        nodetype[node] = 'bitfield'

        b = idblib.Bitfield(id0, node)
        for m in b:
            processbitfieldmask(m)


    def processenummember(m):
        nodetype[m._nodeid] = 'enummember'

    def processenums(id0, node):
        nodetype[node] = 'enum'

        e = idblib.Enum(id0, node)
        if e.flags and e.flags&1:
            processbitfield(id0, node)
            return

        for m in e:
            processenummember(m)

    def processstructmember(m, typename):
        nodetype[m._nodeid] = typename

    def processstructs(id0, node, typename):
        nodetype[node] = typename
        s = idblib.Struct(id0, node)

        for m in s:
            processstructmember(m, typename+"member")

    def processscripts(id0, node):
        nodetype[node] = 'script'

    def processaddr(id0, cur):
        k = id0.decodekey(cur.getkey())
        if len(k)==4 and k[2:4] == (b'A', 2):
            nodetype[id0.int(cur)-1] = 'hexrays'

        addstat('addr', k)

    def processfunc(id0, funcspec):
        p = idblib.IdaUnpacker(id0.wordsize, funcspec)

        funcstart = p.nextword()
        funcsize = p.nextword()
        flags = p.next16()
        if flags is None:
            return
        if flags&0x8000:   # is tail
            return

        node = p.nextword()

        if node<0xFFFFFF and node!=0:
            processstructs(id0, node + id0.nodebase, "frame")

    def processimport(id0, node):
        print("imp %08x" % node)
        startkey = id0.makekey(node+1, 'A')
        endkey = id0.makekey(node+1, 'B')
        cur = id0.btree.find('ge', startkey)
        while cur.getkey() < endkey:
            dllnode = id0.int(cur)
            nodetype[dllnode] = 'import'
            cur.next()


    # mark enums, structs, scripts.
    enumlist(id0, '$ enums', processenums)
    enumlist(id0, '$ structs', lambda id0, node : processstructs(id0, node, "struct"))
    enumlist(id0, '$ scriptsnippets', processscripts)
    enumlist(id0, '$ imports', processimport)

    # enum functions, scan for stackframes
    funcsnode = id0.nodeByName('$ funcs')
    if funcsnode:
        startkey = id0.makekey(funcsnode, 'S')
        endkey = id0.makekey(funcsnode, 'T')
        cur = id0.btree.find('ge', startkey)
        while cur.getkey() < endkey:
            processfunc(id0, cur.getval())
            cur.next()

    clinode = id0.nodeByName('$ cli')
    if clinode:
        for letter in "ABCDEFGHIJKMcio":
            startkey = id0.makekey(clinode, letter)
            endkey = id0.makekey(clinode, chr(ord(letter)+1))
            cur = id0.btree.find('ge', startkey)
            while cur.getkey() < endkey:
                nodetype[id0.int(cur)] = 'cli.'+letter
                cur.next()


    # enum addresses, scan for hex-rays nodes
    startkey = b'.'
    endkey = id0.makekey(id0.nodebase)
    cur = id0.btree.find('ge', startkey)
    while cur.getkey() < endkey:
        processaddr(id0, cur)
        cur.next()

    # addresses above node list
    startkey = id0.makekey(id0.maxnode+1)
    endkey = b'/'
    cur = id0.btree.find('ge', startkey)
    while cur.getkey() < endkey:
        processaddr(id0, cur)
        cur.next()

    # scan for unmarked nodes
    #  $ fr[0-9a-f]+\.\w+
    #  $ fr[0-9a-f]+\. [rs]
    #  $ F[0-9A-F]+\.\w+
    #  $ Stack of \w+
    #  Stack[0000007C]
    #  xrefs to \w+

    startkey = id0.makekey(id0.nodebase)
    endkey = id0.makekey(id0.maxnode+1)
    cur = id0.btree.find('ge', startkey)
    while cur.getkey() < endkey:
        k = id0.decodekey(cur.getkey())
        node = k[1]
        if node not in nodetype:
            nodetype[node] = "unknown"
        if nodetype[node] == "unknown" and k[2] == b'N':
            name = cur.getval().rstrip(b'\x00')
            if re.match(br'\$ fr[0-9a-f]+\.\w+$', name):
                name = 'fr-type-functionframe'
            elif re.match(br'\$ fr[0-9a-f]+\. [rs]$', name):
                name = 'fr-type-functionframe'
            elif re.match(br'\$ F[0-9A-F]+\.\w+$', name):
                name = 'F-type-functionframe'
            elif name.startswith(b'Stack of '):
                name = 'stack-type-functionframe'
            elif name.startswith(b'Stack['):
                name = 'old-stack-type-functionframe'
            elif name.startswith(b'xrefs to '):
                name = 'old-xrefs'
            else:
                name = name.decode('utf-8', 'ignore')
            nodetype[node] = name

        cur.next()

    # output node classification
    if args.verbose:
        for k, v in sorted(nodetype.items(), key=lambda kv:kv[0]):
            print("%08x: %s" % (k, v))

    # summarize tags per nodetype
    startkey = id0.makekey(id0.nodebase)
    endkey = id0.makekey(id0.maxnode+1)
    cur = id0.btree.find('ge', startkey)
    while cur.getkey() < endkey:
        k = id0.decodekey(cur.getkey())
        node = k[1]
        nt = nodetype[node]

        addstat(nt, k)

        cur.next()

    # output tag statistics
    for nt, ntstats in sorted(tagstats.items(), key=lambda kv:kv[0]):
        print("====== %s =====" % nt)
        for k, v in ntstats.items():
            if len(k)==1:
                print("%5d - %s" % (v, k[0]))
            elif len(k)==2 and type(k[1])==type(1):
                print("%5d - %s %8x" % (v, k[0], k[1]))
            elif type(k[1])==type(1):
                print("%5d - %s %8x %s" % (v, k[0], k[1], k[2:]))
            else:
                print("%5d - %s %s %s" % (v, k[0], k[1], k[2:]))


def processid0(args, id0):
    if args.info:
        dumpinfo(id0)

    if args.pagedump:
        id0.btree.pagedump()

    if args.query:
        for query in args.query:
            id0query(args, id0, query)
    elif args.id0:
        id0.btree.dump()
    elif args.inc:
        c = id0.btree.find('ge', b'')
        enumeratecursor(args, c, False, lambda c:printent(args, id0, c))
    elif args.dec:
        c = id0.btree.find('le', b'\x80')
        enumeratecursor(args, c, False, lambda c:printent(args, id0, c))


def hexascdumprange(id1, a, b):
    line = asc = ""
    for ea in range(a, b):
        if len(line)==0:
            line = "%08x:" % ea
        byte = id1.getFlags(ea)&0xFF
        line += " %02x" % byte
        asc += chr(byte) if 32<=byte<127 else '.'

        if len(line) == 9 + 3*16:
            line += " " + asc
            print(line)
            line = asc = ""
    if len(line):
        while len(line) < 9 + 3*16:
            line += "   "
        line += " " + asc
        print(line)


def saverange(id1, a, b, fh):
    buf = bytes()
    for ea in range(a, b):
        byte = id1.getFlags(ea)&0xFF
        buf += struct.pack("B", byte)

        if len(buf) == 65536:
            fh.write(buf)
            buf = bytes()

    if buf:
        fh.write(buf)


def processid1(args, id1):
    if args.id1:
        id1.dump()
    elif args.dump or args.dumpraw:
        # both ends of the range are required, since both are converted below
        m = re.match(r'^(\d\w*)-(\d\w*)$', args.dump or args.dumpraw)
        if not m:
            raise Exception("--dump requires a byte range")
        a = int(m.group(1), 0)
        b = int(m.group(2), 0)

        if args.dumpraw:
            saverange(id1, a, b, stdout)
        else:
            hexascdumprange(id1, a, b)


def processid2(args, id2):
    pass


def processnam(args, nam):
    pass


def processtil(args, til):
    pass


def processseg(args, seg):
    pass


def processidb(args, idb):
    if args.verbose > 1:
        print("magic=%s, filever=%d" % (idb.magic, idb.fileversion))
        for i in range(6):
            comp, ofs, size, checksum = idb.getsectioninfo(i)
            if ofs:
                part = idb.getpart(i)
                print("%2d: %02x, %08x %8x [%08x]:  %s" % (i, comp, ofs, size, checksum, hexdump(part.read(256))))

    nam = idb.getsection(idblib.NAMFile)
    id0 = idb.getsection(idblib.ID0File)
    id1 = idb.getsection(idblib.ID1File)
    processid0(args, id0)
    processid1(args, id1)
    processid2(args, idb.getsection(idblib.ID2File))
    processnam(args, nam)
    processtil(args, idb.getsection(idblib.TILFile))
    processseg(args, idb.getsection(idblib.SEGFile))

    if args.names:
        dumpnames(args, id0, nam)
    if args.classify:
        classifynodes(args, id0)

    if args.scripts:
        enumlist(id0, '$ scriptsnippets', dumpscript)
    if args.structs:
        enumlist(id0, '$ structs', dumpstruct)
    if args.enums:
        enumlist(id0, '$ enums', dumpenum)
    if args.funcdirs:
        listfuncdirs(id0)
    if args.imports:
        enumlist(id0, '$ imports', dumpimport)
    if args.segs:
        listsegments(id0)


def processfile(args, filetypehint, fh):
    class DummyIDB:
        def __init__(idb, args):
            if args.i64:
                idb.magic = 'IDA2'
            elif args.i32:
                idb.magic = 'IDA1'
            else:
                idb.magic = None

    try:
        magic = fh.read(64)
        fh.seek(-64, 1)
        if magic.startswith(b"Va") or magic.startswith(b"VA"):
            idb = DummyIDB(args)
            if filetypehint == 'id1':
                processid1(args, idblib.ID1File(idb, fh))
            elif filetypehint == 'nam':
                processnam(args, idblib.NAMFile(idb, fh))
            elif filetypehint == 'seg':
                processseg(args, idblib.SEGFile(idb, fh))
            else:
                print("unknown VA type file: %s" % hexdump(magic))
        elif magic.startswith(b"IDAS"):
            processid2(args, idblib.ID2File(DummyIDB(args), fh))
        elif magic.startswith(b"IDATIL"):
            processtil(args, idblib.TILFile(DummyIDB(args), fh))
        elif magic.startswith(b"IDA"):
            processidb(args, idblib.IDBFile(fh))
        elif magic.find(b'B-tree v') > 0:
            processid0(args, idblib.ID0File(DummyIDB(args), fh))

    except Exception as e:
        print("ERROR %s" % e)
        if args.debug:
            raise


def recover_database(args, basepath, dbfiles):
    processidb(args, idblib.RecoverIDBFile(args, basepath, dbfiles))


def DirEnumerator(args, path):
    """
    Enumerate all files / links in a directory,
    optionally recursing into subdirectories,
    or ignoring links.
    """
    for d in os.scandir(path):
        try:
            if d.name == '.' or d.name == '..':
                pass
            elif d.is_symlink() and args.skiplinks:
                pass
            elif d.is_file():
                yield d.path
            elif d.is_dir() and args.recurse:
                for f in DirEnumerator(args, d.path):
                    yield f
        except Exception as e:
            print("EXCEPTION %s accessing %s/%s" % (e, path, d.name))


def EnumeratePaths(args, paths):
    """
    Enumerate all paths and files given on the command line,
    optionally recursing into subdirectories.
    """
    for fn in paths:
        try:
            # 3 - for ftp://, 4 for http://, 5 for https://
            if fn.find("://") in (3, 4, 5):
                yield fn
            elif os.path.islink(fn) and args.skiplinks:
                pass
            elif os.path.isdir(fn) and args.recurse:
                for f in DirEnumerator(args, fn):
                    yield f
            elif os.path.isfile(fn):
                yield fn
        except Exception as e:
            print("EXCEPTION %s accessing %s" % (e, fn))


def filetype_from_name(fn):
    i = max(fn.rfind('.'), fn.rfind('/'))
    return fn[i + 1:].lower()


def isv2name(name):
    return name.lower() in ('$segregs.ida', '$segs.ida', '0.ida', '1.ida', 'ida.idl', 'names.ida')


def isv3ext(ext):
    return ext.lower() in ('.id0', '.id1', '.id2', '.nam', '.til')


def xlatv2name(name):
    oldnames = {
        '$segregs.ida': 'reg',
        '$segs.ida': 'seg',
        '0.ida': 'id0',
        '1.ida': 'id1',
        'ida.idl': 'idl',
        'names.ida': 'nam',
    }

    return oldnames.get(name.lower())


def main():
    parser = argparse.ArgumentParser(description='idbtool - print info from hex-rays IDA .idb and .i64 files',
                                     formatter_class=argparse.RawDescriptionHelpFormatter,
                                     epilog="""
idbtool can process complete .idb and .i64 files, but also naked .id0, .id1, .nam, .til files.
All versions since IDA v2.0 are supported.

Queries start with an optional operator: <,<=,>,>=,==.
Followed by either a name or address or nodeid.
Addresses are specified as a sequence of hexadecimal characters.
Node ids may be specified either as the full node id, starting with ff00,
or starting with a '_'.
Names are anything which can be found under the name tree in the database.

After the name/addr/node there is optionally a slash, followed by a node tag,
and another slash, followed by an index or hash string.

Multiple queries can be specified, terminated by another option, or `--`.
Add `-v` for pretty printed keys and values.

Examples:

  idbtool -v --query "$ user1;S;0" -- x.idb
  idbtool -v --limit 4 --query ">#0xa" -- x.idb
  idbtool -v --limit 5 --query ">Root Node;S;0" -- x.idb
  idbtool -v --limit 10 --query ">Root Node;S" -- x.idb
  idbtool -v --query ".0xff000001;N" -- x.idb
""")
    parser.add_argument('--verbose', '-v', action='count', default=0)
    parser.add_argument('--recurse', '-r', action='store_true', help='recurse into directories')
    parser.add_argument('--skiplinks', '-L', action='store_true', help='skip symbolic links')
    parser.add_argument('--filetype', '-t', type=str, help='specify filetype when loading `naked` id1,nam or seg files')
    parser.add_argument('--i64', '-i64', action='store_true', help='specify that `naked` file is from a 64 bit database')
    parser.add_argument('--i32', '-i32', action='store_true', help='specify that `naked` file is from a 32 bit database')

    parser.add_argument('--names', '-n', action='store_true', help='print names')
    parser.add_argument('--scripts', '-s', action='store_true', help='print scripts')
    parser.add_argument('--structs', '-u', action='store_true', help='print structs')
    # parser.add_argument('--comments', '-c', action='store_true', help='print comments')
    parser.add_argument('--enums', '-e', action='store_true', help='print enums and bitfields')
    parser.add_argument('--imports', action='store_true', help='print imports')
    parser.add_argument('--segs', action='store_true', help='print segments')
    parser.add_argument('--funcdirs', action='store_true', help='print function dirs (folders)')
    parser.add_argument('--info', '-i', action='store_true', help='database info')
    parser.add_argument('--inc', action='store_true', help='dump id0 records by cursor increment')
    parser.add_argument('--dec', action='store_true', help='dump id0 records by cursor decrement')
    parser.add_argument('--id0', "-id0", action='store_true', help='dump id0 records, by walking the page tree')
    parser.add_argument('--id1', "-id1", action='store_true', help='dump id1 records')
    parser.add_argument('--dump', type=str, help='hexdump id1 bytes', metavar='FROM-UNTIL')
    parser.add_argument('--dumpraw', type=str, help='output id1 bytes', metavar='FROM-UNTIL')
    parser.add_argument('--pagedump', "-d", action='store_true', help='dump all btree pages, including any that might have become inaccessible due to data corruption.')
    parser.add_argument('--classify', action='store_true', help='Classify nodes found in the database.')

    parser.add_argument('--query', "-q", type=str, nargs='*', help='search the id0 file for a specific record.')
    parser.add_argument('--limit', '-m', type=int, help='Max nr of records to return for a query.')

    parser.add_argument('--recover', action='store_true', help='recover idb from unpacked files, of v2 database')
    parser.add_argument('--debug', action='store_true')

    parser.add_argument('FILES', type=str, nargs='*', help='Files')

    args = parser.parse_args()

    if args.FILES:
        dbs = dict()

        for fn in EnumeratePaths(args, args.FILES):
            basepath, filename = os.path.split(fn)
            if isv2name(filename):
                d = dbs.setdefault(basepath, dict())
                d[xlatv2name(filename)] = fn
                print("%s -> %s : %s" % (xlatv2name(filename), basepath, filename))
            else:
                basepath, ext = os.path.splitext(fn)
                if isv3ext(ext):
                    d = dbs.setdefault(basepath, dict())
                    d[ext.lower()] = fn

            if not args.dumpraw:
                print("\n==> " + fn + " <==\n")

            try:
                filetype = args.filetype or filetype_from_name(fn)
                with open(fn, "rb") as fh:
                    processfile(args, filetype, fh)
            except Exception as e:
                print("ERROR: %s" % e)
                if args.debug:
                    raise

        if args.recover:
            for basepath, dbfiles in dbs.items():
                if len(dbfiles) > 1:
                    try:
                        print("\n==> " + basepath + " <==\n")
                        recover_database(args, basepath, dbfiles)
                    except Exception as e:
                        print("ERROR: %s" % e)
    else:
        print("==> STDIN <==")
        processfile(args, args.filetype, sys.stdin.buffer)


if __name__ == '__main__':
    main()


================================================
FILE: setup.cfg
================================================
[flake8]
ignore = E402,E501,E731


================================================
FILE: test_idblib.py
================================================
import unittest
from idblib import FileSection, binary_search, makeStringIO


class TestFileSection(unittest.TestCase):
    """ unittest for FileSection object """
    def test_file(self):
        s = makeStringIO(b"0123456789abcdef")
        fh = FileSection(s, 3, 11)
        self.assertEqual(fh.read(3), b"345")
        self.assertEqual(fh.read(8), b"6789a")
        self.assertEqual(fh.read(8), b"")

        fh.seek(-1, 2)
        self.assertEqual(fh.read(8), b"a")
        fh.seek(3)
        self.assertEqual(fh.read(2), b"67")
        fh.seek(-2, 1)
        self.assertEqual(fh.read(2), b"67")
        fh.seek(2, 1)
        self.assertEqual(fh.read(2), b"a")

        fh.seek(8)
        self.assertEqual(fh.read(1), b"")
        with self.assertRaises(Exception):
            fh.seek(9)


class TestBinarySearch(unittest.TestCase):
    """ unittests for binary_search """
    class Object:
        def __init__(self, num):
            self.key = num

        def __repr__(self):
            return "o(%d)" % self.key

    def test_bs(self):
        obj = self.Object
        lst = [obj(_) for _ in (2, 3, 5, 6)]
        self.assertEqual(binary_search(lst, 1), -1)
        self.assertEqual(binary_search(lst, 2), 0)
        self.assertEqual(binary_search(lst, 3), 1)
        self.assertEqual(binary_search(lst, 4), 1)
        self.assertEqual(binary_search(lst, 5), 2)
        self.assertEqual(binary_search(lst, 6), 3)
        self.assertEqual(binary_search(lst, 7), 3)

    def test_emptylist(self):
        obj = self.Object
        lst = []
        self.assertEqual(binary_search(lst, 1), -1)

    def test_oneelem(self):
        obj = self.Object
        lst = [obj(1)]
        self.assertEqual(binary_search(lst, 0), -1)
        self.assertEqual(binary_search(lst, 1), 0)
        self.assertEqual(binary_search(lst, 2), 0)

    def test_twoelem(self):
        obj = self.Object
        lst = [obj(1), obj(3)]
        self.assertEqual(binary_search(lst, 0), -1)
        self.assertEqual(binary_search(lst, 1), 0)
        self.assertEqual(binary_search(lst, 2), 0)
        self.assertEqual(binary_search(lst, 3), 1)
        self.assertEqual(binary_search(lst, 4), 1)

    def test_listsize(self):
        obj = self.Object
        for l in range(3, 32):
            lst = [obj(_ + 1) for _ in range(l)]
            lst = lst[:1] + lst[2:]
            self.assertEqual(binary_search(lst, 0), -1)
            self.assertEqual(binary_search(lst, 1), 0)
            self.assertEqual(binary_search(lst, 2), 0)
            self.assertEqual(binary_search(lst, 3), 1)
            self.assertEqual(binary_search(lst, l - 1), l - 3)
            self.assertEqual(binary_search(lst, l), l - 2)
            self.assertEqual(binary_search(lst, l + 1), l - 2)
            self.assertEqual(binary_search(lst, l + 2), l - 2)


================================================
FILE: tree-walking.py
================================================
"""
Copyright (c) 2016 Willem Hengeveld <itsme@xs4all.nl>

Experiment in btree walking


                   *-------->[00]
         *------>[02]---+    [01]
root ->[08]---+  [05]-+ |
       [17]-+ |       | +--->[03]
            | |       |      [04]
            | |       |
            | |       +----->[06]
            | |              [07]
            | |
            | |    *-------->[09]
            | +->[11]---+    [10]
            |    [14]-+ |
            |         | +--->[12]
            |         |      [13]
            |         |
            |         +----->[15]
            |                [16]
            |
            |      *-------->[18]
            +--->[20]---+    [19]
                 [23]-+ |
                      | +--->[21]
                      |      [22]
                      |
                      +----->[24]
                             [25]


decrement from 08 : ix-- -> getpage, ix=len-1 -> getpage -> ix=len-1
decrement from 17 : ix-- -> getpage, ix=len-1 -> getpage -> ix=len-1
decrement from 02 : ix-- -> getpage, ix=len-1
decrement from 05 : ix-- -> getpage, ix=len-1

decrement from 01  : ix-- -> ix>=0 -> use key at ix
decrement from 03  : ix-- -> <0 -> pop -> ix>=0 -> use key at ix
decrement from 09  : ix-- -> <0 -> pop -> ix<0 -> pop -> ix>=0 -> use key at ix

increment from 09  : ix++
increment from 10  : ix++  -> ix==len(index)  -> pop: ix==-1  -> ix++ -> ix==0  -> use
increment from 11  : recurse, ix=0  -> use
increment from 08  : recurse, ix=-1 -> recurse, ix=0 -> use
increment from 07  : ix++ -> ix==len(index) -> pop,    ix++ -> ix==len -> pop -> ix++ -> ix==0 -> use
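
A minimal sketch of the key order these rules walk through. It is not part of
this script: pages are modelled as plain lists and tuples (an assumption for
illustration) instead of the LeafPage / IndexPage classes below.

```python
import itertools


def build(depth, width, counter):
    # Hypothetical page model, for illustration only:
    #   leaf page  = list of keys
    #   index page = (preceding subpage, [(key, subpage), ...])
    if depth == 0:
        return ["%02d" % next(counter) for _ in range(width)]
    preceding = build(depth - 1, width, counter)
    entries = []
    for _ in range(width):
        # the key is numbered before the subpage that holds the larger keys
        key = "%02d" % next(counter)
        entries.append((key, build(depth - 1, width, counter)))
    return (preceding, entries)


def inorder(page):
    # leaf: keys are already in order
    if isinstance(page, list):
        for key in page:
            yield key
        return
    # index: preceding subpage first, then each key followed by its subpage
    preceding, entries = page
    for key in inorder(preceding):
        yield key
    for key, sub in entries:
        yield key
        for k in inorder(sub):
            yield k


tree = build(2, 2, itertools.count())
keys = list(inorder(tree))
root_keys = [key for key, _ in tree[1]]
print(root_keys)
print(" ".join(keys))
```

Each increment in the rules above moves one position forward in this
sequence, and each decrement moves one position back.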
"""
from __future__ import division, print_function, absolute_import, unicode_literals

# shape of the tree
# a <2,2>  tree is basically like the tree pictured in the ascii art above.
TREEDEPTH = 2
NODEWIDTH = 2


def binary_search(a, k):
    # like C++ std::upper_bound(k) - 1: index of the last entry with key <= k, or -1 if there is none
    first, last = 0, len(a)
    while first < last:
        mid = (first + last) >> 1
        if k < a[mid].key:
            last = mid
        else:
            first = mid + 1
    return first - 1


class Entry(object):
    """
    a key/value entry from a b-tree page
    """
    def __init__(self, key, val):
        self.key = key
        self.val = val

    def __repr__(self):
        return "%s=%d" % (self.key, self.val)


class BasePage(object):
    """
    BasePage has methods common to both leaf and index pages
    """
    def __init__(self, kv):
        self.index = []
        for k, v in kv:
            self.index.append(Entry(k, v))

    def find(self, key):
        i = binary_search(self.index, key)
        if i < 0:
            if self.isindex():
                return ('recurse', -1)
            return ('gt', 0)
        if self.index[i].key == key:
            return ('eq', i)
        if self.isindex():
            return ('recurse', i)
        return ('lt', i)

    def getkey(self, ix):
        return self.index[ix].key

    def getval(self, ix):
        return self.index[ix].val

    def isleaf(self):
        return self.preceeding is None

    def isindex(self):
        return self.preceeding is not None

    def __repr__(self):
        return ("leaf" if self.isleaf() else ("index<%d>" % self.preceeding)) + repr(self.index)


class LeafPage(BasePage):
    """ a leaf page in the b-tree """
    def __init__(self, kv):
        # name the class explicitly: super(self.__class__, ...) recurses forever once subclassed
        super(LeafPage, self).__init__(kv)
        self.preceeding = None


class IndexPage(BasePage):
    """
    An index page in the b-tree.
    This page has a preceeding page plus several key+subpage pairs.
    For each key+subpage: all keys in the subpage are greater than the key
    """
    def __init__(self, preceeding, kv):
        # name the class explicitly: super(self.__class__, ...) recurses forever once subclassed
        super(IndexPage, self).__init__(kv)
        self.preceeding = preceeding

    def getpage(self, ix):
        return self.preceeding if ix < 0 else self.index[ix].val


class Cursor:
    """
    A Cursor object represents a position in the b-tree.

    It has methods for moving to the next or previous item.
    And methods for retrieving the key and value of the current position
    """
    def __init__(self, db, stack):
        self.db = db
        self.stack = stack

    def next(self):
        page, ix = self.stack.pop()
        if page.isleaf():
            # from leaf move towards root
            ix += 1
            while self.stack and ix == len(page.index):
                page, ix = self.stack.pop()
                ix += 1
            if ix < len(page.index):
                self.stack.append((page, ix))
        else:
            # from node move towards leaf
            self.stack.append((page, ix))
            page = self.db.readpage(page.getpage(ix))
            while page.isindex():
                ix = -1
                self.stack.append((page, ix))
                page = self.db.readpage(page.getpage(ix))
            ix = 0
            self.stack.append((page, ix))

        self.verify()

    def prev(self):
        page, ix = self.stack.pop()
        ix -= 1
        if page.isleaf():
            # move towards root, until non 'prec' item found
            while self.stack and ix < 0:
                page, ix = self.stack.pop()
            if ix >= 0:
                self.stack.append((page, ix))
        else:
            # move towards leaf
            self.stack.append((page, ix))
            while page.isindex():
                page = self.db.readpage(page.getpage(ix))
                ix = len(page.index) - 1
                self.stack.append((page, ix))

        self.verify()

    def verify(self):
        """ verify cursor state consistency """
        if len(self.stack) == 3:
            if not self.stack[-1][0].isleaf():
                print("WARN no leaf")
        elif len(self.stack) > 3:
            print("WARN: stack too large")

        if len(self.stack) >= 2:
            if self.stack[0][0] == self.stack[1][0]:
                print("WARN: identical index pages on stack")
            if not self.stack[0][0].isindex():
                print("WARN: expected root=index")
            if not self.stack[1][0].isindex():
                print("WARN: expected 2nd=index")

    def eof(self):
        return len(self.stack) == 0

    def getkey(self):
        page, ix = self.stack[-1]
        return page.getkey(ix)

    def getval(self):
        page, ix = self.stack[-1]
        return page.getval(ix)

    def __repr__(self):
        return "cursor:" + repr(self.stack)


class Btree:
    """
    A B-tree implementation
    """
    def __init__(self):
        self.pages = []
        self.generate(TREEDEPTH, NODEWIDTH)

    def manual(self):
        """ manually construct the ascii art tree """
        for i in range(9):
            self.pages.append(LeafPage((("%02d" % (3 * i), 0), ("%02d" % (3 * i + 1), 0))))
        for i in range(3):
            self.pages.append(IndexPage(3 * i, (("%02d" % (9 * i + 2), 3 * i + 1), ("%02d" % (9 * i + 5), 3 * i + 2))))
        self.pages.append(IndexPage(9, (("08", 10), ("17", 11))))
        self.rootindex = len(self.pages) - 1

    def generate(self, depth, nodesize):
        """ automatically generate the tree in the ascii art above """

        def namegen():
            i = 0
            while True:
                yield "%03d" % i
                i += 1

        self.rootindex = self.construct(namegen(), depth, nodesize)
        print("%d pages" % (len(self.pages)))

    def construct(self, namegen, depth, nodesize):
        if depth:
            return self.createindex(namegen, depth, nodesize)
        else:
            return self.createleaf(namegen, nodesize)

    def createindex(self, namegen, depth, nodesize):
        page = IndexPage(self.construct(namegen, depth - 1, nodesize),
                         [(next(namegen), self.construct(namegen, depth - 1, nodesize)) for _ in range(nodesize)])
        self.pages.append(page)
        return len(self.pages) - 1

    def createleaf(self, namegen, nodesize):
        page = LeafPage([(next(namegen), 0) for _ in range(nodesize)])
        self.pages.append(page)
        return len(self.pages) - 1

    def readpage(self, pn):
        return self.pages[pn]

    def find(self, key):
        """
        Find a node in the tree, returns the relation to the wanted key plus the cursor:
        'eq' for equal, 'lt' when the found key is less than the wanted key,
        or 'gt' when the found key is greater than the wanted key.
        """
        page = self.readpage(self.rootindex)
        stack = []
        while True:
            act, ix = page.find(key)
            stack.append((page, ix))
            if act != 'recurse':
                break
            page = self.readpage(page.getpage(ix))
        return act, Cursor(self, stack)

    def dumptree(self, pn, indent=0):
        """ dump all nodes of the current b-tree """
        page = self.readpage(pn)
        print("  " * indent, page)
        if page.isindex():
            print("  " * indent, end="")
            self.dumptree(page.preceeding, indent + 1)
            for p in range(len(page.index)):
                print("  " * indent, end="")
                self.dumptree(page.getpage(p), indent + 1)


db = Btree()
print("<<")
db.dumptree(db.rootindex)
print(">>")


for i in range(NODEWIDTH * len(db.pages)):
    print("--------- %03d" % i)
    act, cursor = db.find("%03d" % i)
    print("found", act, cursor.getkey(), cursor)
    cursor.prev()
    if not cursor.eof():
        print("prev:", "..", cursor.getkey(), cursor)
    else:
        print("prev:  EOF", cursor)

for i in range(NODEWIDTH * len(db.pages)):
    print("--------- %03d" % i)
    act, cursor = db.find("%03d" % i)
    print("found", act, cursor.getkey(), cursor)
    cursor.next()
    if not cursor.eof():
        print("next:", "..", cursor.getkey(), cursor)
    else:
        print("next:  EOF", cursor)

for k in ('', '0', '1', '2', '3', '000', '010', '020', '100'):
    print("--------- %s" % k)
    act, cursor = db.find(k)
    print(cursor)
    print(act, cursor.getkey(), end=" next=")
    cursor.next()
    if cursor.eof():
        print("EOF")
    else:
        print(cursor.getkey())

act, cursor = db.find("000")
print("get000", end=" ")
for i in range(NODEWIDTH * len(db.pages)):
    cursor.next()
    if cursor.eof():
        print("EOF")
    else:
        print("-> %s" % cursor.getkey(), end=" ")
print()

act, cursor = db.find("025")
print("get025", end=" ")
for i in range(NODEWIDTH * len(db.pages)):
    cursor.prev()
    if cursor.eof():
        print("EOF")
    else:
        print("-> %s" % cursor.getkey(), end=" ")
print()


================================================
FILE: tstbs.py
================================================
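def binary_search(a, k):
    """Return the index of the last element <= k, or -1 if every element is > k."""
    # like C++ std::upper_bound(k) - 1
    first, last = 0, len(a)
    while first < last:
        mid = (first + last) >> 1
        if k < a[mid]:
            last = mid
        else:
            first = mid + 1
    return first - 1


# print the result for keys below, inside and above the list
for x in range(8):
    print(x, binary_search([2, 3, 5, 6], x))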