Repository: nlitsme/pyidbutil
Branch: master
Commit: e77d0e79e5c1
Files: 9
Total size: 114.7 KB
Directory structure:
gitextract_dtt79ccf/
├── LICENSE
├── README.md
├── idaunpack.py
├── idblib.py
├── idbtool.py
├── setup.cfg
├── test_idblib.py
├── tree-walking.py
└── tstbs.py
================================================
FILE CONTENTS
================================================
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2020 Willem Hengeveld <itsme@xs4all.nl>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
IDBTOOL
=======
A tool for extracting information from IDA databases.
`idbtool` knows how to handle databases from all IDA versions since v2.0, both `i64` and `idb` files.
You can also use `idbtool` to recover information from unclosed databases.
`idbtool` works without change with IDA v7.0.
Much faster than loading a file in IDA
--------------------------------------
With idbtool you can search thousands of .idb files in seconds.
More precisely: on my laptop it takes:
* 1.5 seconds to extract 143 idc scripts from 119 idb and i64 files.
* 3.8 seconds to print idb info for 441 files.
* 5.6 seconds to extract 281 enums containing 4726 members from 35 files.
* 67.8 seconds to extract 5942 structs containing 33672 members from 265 files.
Loading a approximately 5 Gbyte idb file in IDA, takes about 45 minutes.
While idb3.h takes basically no time at all, no more than a few milliseconds.
Download
========
Two versions of this tool exist:
One written in python
* https://github.com/nlitsme/pyidbutil
One written in C++
* https://github.com/nlitsme/idbutil
Both repositories contain a library which can be used for reading `.idb` or `.i64` files.
Usage
=====
Usage:
idbtool [options] [database file(s)]
* `-n` or `--names` will list all named values in the database.
* `-s` or `--scripts` will list all scripts stored in the database.
* `-u` or `--structs` will list all structs stored in the database.
* `-e` or `--enums` will list all enums stored in the database.
* `--imports` will list all imported symbols from the database.
* `--funcdirs` will list function folders stored in the database.
* `-i` or `--info` will print some general info about the database.
* `-d` or `--pagedump` dump btree page tree contents.
* `--inc`, `--dec` list all records in ascending / descending order.
* `-q` or `--query` search specific records in the database.
* `-m` or `--limit` limit the number of results returned by `-q`.
* `-id0`, `-id1` dump only one specific section.
* `--i64`, `--i32` tell idbtool that the specified file is from a 64 or 32 bit database.
* `--recover` group files from an unpacked database.
* `--classify` summarizes node usage in the database
* `--dump` hexdump the original binary data
query
-----
Queries need to be specified last on the commandline.
example:
idbtool [database file(s)] --query "Root Node;V"
Will list the source binary for all the databases specified on the commandline.
A query is a string with the following format:
* [==,<=,>=,<,>] - optional relation, default: ==
* a base node key:
* a DOT followed by the numeric value of the nodeid.
* a HASH followed by the numeric value of the system-nodeid.
* a QUESTION followed by the name of the node. -> a 'N'ame node
* the name of the node. -> the name is resolved, results in a '.'Dot node
* an optional tag ( A for Alt, S for Supval, etc )
* an optional index value
example queries:
* `Root Node;V` -> prints record containing the source binary name
* `?Root Node` -> prints the Name record pointing to the root
* `>Root Node` -> prints the first 10 records starting with the root node id.
* `<Root Node` -> prints the 10 records startng with the recordsbefore the rootnode.
* `.0xff000001;N` -> prints the rootnode name entry.
* `#1;N` -> prints the rootnode name entry.
List the highest node and following record in the database in two different ways,
the first: starting at the first record below `ffc00000`, and listing the next.
The second: starting at the first record after `ffc00000`, and listing the previous:
* `--query "<#0xc00000" --limit 2 --inc -v`
* `--query ">#0xc00000" --limit 2 --dec -v`
Note that this should be the nodeid in the `$ MAX NODE` record.
List the last two records:
* `--limit 2 --dec -v`
List the first two records, the `$ MAX LINK` and `$ MAX NODE` records:
* `--limit 2 --inc -v`
A full database dump
--------------------
Several methods exist for printing all records in the database. This may be useful if
you want to investigate more of IDA''s internals. But can also be useful in recovering
data from corrupted databases.
* `--inc`, `--dec` can be used to enumerate all b-tree records in either forward, or backward direction.
* add `-v` to get a prettier key/value output
* `--id0` walks the page tree, instead of the record tree, printing the contents of each page
* `--pagedump` linearly skip through the file, this will also reveal information in deleted pages.
naked files
===========
When IDA or your computer crashed while working on a disassembly, and you did not yet save the database,
you are left with a couple of files with extensions like `.id0`, `.id1`, `.nam`, etc.
These files are the unpacked database, i call them `naked` files.
Using the `--filetype` and `--i64` or `--i32` options you can inspect these `naked` files individually.
or use the `--recover` option to view them as a complete database together.
`idbtool` will figure out automatically which files would belong together.
`idbtool` can figure out the bitsize of the database from an `.id0` file, but not(yet) from the others.
LIBRARY
=======
The file `idblib.py` contains a library.
TODO
====
* add option to list all comments stored in the database
* add option to list flags for a list of addresses.
Author
======
Willem Hengeveld <itsme@xs4all.nl>
================================================
FILE: idaunpack.py
================================================
"""
`idaunpack` is a tool to aid in decoding packed data structures from an
IDA idb or i64 database.
"""
from __future__ import print_function, division
import struct
import re
import sys
from binascii import a2b_hex, b2a_hex
from idblib import IdaUnpacker
def dump_packed(data, wordsize, pattern):
p = IdaUnpacker(wordsize, data)
if pattern:
for c in pattern:
if p.eof():
print("EOF")
break
if c == 'H':
val = p.next16()
fmt = "%04x"
elif c == 'L':
val = p.next32()
fmt = "%08x"
elif c == 'Q':
val = p.next64()
fmt = "%016x"
elif c == 'W':
val = p.nextword()
if wordsize==4:
fmt = "[%08x]"
else:
fmt = "[%016x]"
else:
raise Exception("unknown pattern: %s" % c)
print(fmt % val, end=" ")
while not p.eof():
val = p.next32()
print("%08x" % val, end=" ")
print()
def unhex(hextxt):
return a2b_hex(re.sub(r'\W+', '', hextxt, flags=re.DOTALL))
def main():
import argparse
parser = argparse.ArgumentParser(description='idaunpack')
parser.add_argument('--verbose', '-v', action='store_true')
parser.add_argument('--debug', action='store_true', help='abort on exceptions.')
parser.add_argument('--pattern', '-p', type=str, help='unpack pattern: sequence of H, L, Q, W')
parser.add_argument('-4', '-3', '-32', const=4, dest='wordsize', action='store_const', help='use 32 bit words')
parser.add_argument('-8', '-6', '-64', const=8, dest='wordsize', action='store_const', help='use 64 bit words')
parser.add_argument('--wordsize', '-w', type=int, help='specify wordsize')
parser.add_argument('hexconsts', nargs='*', type=str)
args = parser.parse_args()
if args.wordsize is None:
args.wordsize = 4
for x in args.hexconsts:
dump_packed(unhex(x), args.wordsize, args.pattern)
if __name__ == '__main__':
main()
================================================
FILE: idblib.py
================================================
"""
idblib - a module for reading hex-rays Interactive DisAssembler databases
Supports database versions starting with IDA v2.0
IDA v1.x is not supported, that was an entirely different file format.
IDA v2.x databases are organised as several files, in a directory
IDA v3.x databases are bundled into .idb files
IDA v4 .. v6 various improvements, like databases larger than 4Gig, and 64 bit support.
Copyright (c) 2016 Willem Hengeveld <itsme@xs4all.nl>
An IDB file can contain up to 6 sections:
id0 the main database
id1 contains flags for each byte - what is returned by idc.GetFlags(ea)
nam contains a list of addresses of named items
seg .. only in older databases
til type info
id2 ?
The id0 database is a simple key/value database, much like leveldb
types of records:
Some bookkeeping:
"$ MAX NODE" -> the highest numbered node value in use.
A list of names:
"N" + name -> the node id for that name.
names are both user/disassembler symbols assigned to addresses
in the disassembled code, and IDA internals, like lists of items,
For example: '$ structs', or 'Root Node'.
The main part:
"." + nodeid + tag + index
This maps directly onto the idasdk netnode interface.
The size of the nodeid and index is 32bits for .idb files and 64 bits for .i64 files.
The nodeid and index are encoded as bigendian numbers in the key, and as little endian
numbers in (most of) the values.
"""
from __future__ import division, print_function, absolute_import, unicode_literals
import struct
import binascii
import re
import os
#############################################################################
# some code to make this library run with both python2 and python3
#############################################################################
import sys
if sys.version_info[0] == 3:
long = int
else:
bytes = bytearray
try:
cmp(1, 2)
except:
# python3 does not have cmp
def cmp(a, b): return (a > b) - (a < b)
class cachedproperty(object):
## .. only works with python3 somehow. -- todo: figure out why not with python2
def __init__(self, method):
self.method = method
self.name = '_' + method.__name__
def __get__(self, obj, cls):
if not hasattr(obj, self.name):
value = self.method(obj)
setattr(obj, self.name, value)
else:
value = getattr(obj, self.name)
return value
def strz(b, o):
return b[o:b.find(b'\x00', o)].decode('utf-8', 'ignore')
def makeStringIO(data):
if sys.version_info[0] == 2:
from StringIO import StringIO
return StringIO(data)
else:
from io import BytesIO
return BytesIO(data)
#############################################################################
# some utility functions
#############################################################################
def nonefmt(fmt, item):
# helper for outputting None without raising an error
if item is None:
return "-"
return fmt % item
def hexdump(data):
if data is None:
return
return binascii.b2a_hex(data).decode('utf-8')
#############################################################################
class FileSection(object):
"""
Presents a file like object which is a section of a larger file.
`fh` is expected to have a seek and read method.
This class is used to access a section (e.g. the .id0 file) of a larger file (e.g. the .idb file)
and make read/seek behave as if it were a separate file.
"""
def __init__(self, fh, start, end):
self.fh = fh
self.start = start
self.end = end
self.curpos = 0
self.fh.seek(self.start)
def read(self, size=None):
want = self.end - self.start - self.curpos
if size is not None and want > size:
want = size
if want <= 0:
return b""
# make sure filepointer is at correct position since we are sharing the fh object with others.
self.fh.seek(self.curpos + self.start)
data = self.fh.read(want)
self.curpos += len(data)
return data
def seek(self, offset, *args):
def isvalidpos(offset):
return 0 <= offset <= self.end - self.start
if len(args) == 0:
whence = 0
else:
whence = args[0]
if whence == 0:
if not isvalidpos(offset):
print("invalid seek: from %x to SET:%x" % (self.curpos, offset))
raise Exception("illegal offset")
self.curpos = offset
elif whence == 1:
if not isvalidpos(self.curpos + offset):
raise Exception("illegal offset")
self.curpos += offset
elif whence == 2:
if not isvalidpos(self.end - self.start + offset):
raise Exception("illegal offset")
self.curpos = self.end - self.start + offset
self.fh.seek(self.curpos + self.start)
def tell(self):
return self.curpos
class IdaUnpacker:
"""
Decodes packed ida structures.
This is used o.a. in struct definitions, and .id2 files
Related sdk functions: pack_dd, unpack_dd, etc.
"""
def __init__(self, wordsize, data):
self.wordsize = wordsize
self.data = data
self.o = 0
def eof(self):
return self.o >= len(self.data)
def have(self, n):
return self.o+n <= len(self.data)
def nextword(self):
"""
Return an unsigned word-sized integer from the buffer
"""
if self.wordsize == 4:
return self.next32()
elif self.wordsize == 8:
return self.next64()
else:
raise Exception("unsupported wordsize")
def nextwordsigned(self):
"""
Return a signed word-sized integer from the buffer
"""
if self.wordsize == 4:
val = self.next32()
if val < 0x80000000:
return val
return val - 0x100000000
elif self.wordsize == 8:
val = self.next64()
if val < 0x8000000000000000:
return val
return val - 0x10000000000000000
else:
raise Exception("unsupported wordsize")
def next64(self):
if self.eof():
return None
lo = self.next32()
hi = self.next32()
return (hi<<32) | lo
def next16(self):
"""
Return a packed 16 bit integer from the buffer
"""
if self.eof():
return None
byte = self.data[self.o:self.o+1]
if byte == b'\xff':
# a 16 bit value:
# 1111 1111 xxxx xxxx xxxx xxxx
if self.o+3 > len(self.data):
return None
val, = struct.unpack_from(">H", self.data, self.o+1)
self.o += 3
return val
elif byte < b'\x80':
# a 7 bit value:
# 0xxx xxxx
self.o += 1
val, = struct.unpack("B", byte)
return val
elif byte < b'\xc0':
# a 14 bit value:
# 10xx xxxx xxxx xxxx
if self.o+2 > len(self.data):
return None
val, = struct.unpack_from(">H", self.data, self.o)
self.o += 2
return val&0x3FFF
else:
return None
def next8(self):
if self.eof():
return None
byte = self.data[self.o:self.o+1]
self.o += 1
val, = struct.unpack("B", byte)
return val
def next32(self):
"""
Return a packed integer from the buffer
"""
if self.eof():
return None
byte = self.data[self.o:self.o+1]
if byte == b'\xff':
# a 32 bit value:
# 1111 1111 xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx
if self.o+5 > len(self.data):
return None
val, = struct.unpack_from(">L", self.data, self.o+1)
self.o += 5
return val
elif byte < b'\x80':
# a 7 bit value:
# 0xxx xxxx
self.o += 1
val, = struct.unpack("B", byte)
return val
elif byte < b'\xc0':
# a 14 bit value:
# 10xx xxxx xxxx xxxx
if self.o+2 > len(self.data):
return None
val, = struct.unpack_from(">H", self.data, self.o)
self.o += 2
return val&0x3FFF
elif byte < b'\xe0':
# a 29 bit value:
# 110x xxxx xxxx xxxx xxxx xxxx xxxx xxxx
if self.o+4 > len(self.data):
return None
val, = struct.unpack_from(">L", self.data, self.o)
self.o += 4
return val&0x1FFFFFFF
else:
return None
def bytes(self, n):
"""
Return fixed length string from buffer
"""
if not self.have(n):
return None
data = self.data[self.o : self.o+n]
self.o += n
return data
class IDBFile(object):
"""
Provide access to the various sections in an .idb file.
Usage:
idb = IDBFile(fhandle)
id0 = idb.getsection(ID0File)
ID0File is expected to have a class property 'INDEX'
# v1..v5 id1 and nam files start with 'Va0' .. 'Va4'
# v6 id1 and nam files start with 'VA*'
# til files start with 'IDATIL'
# id2 files start with 'IDAS\x1d\xa5\x55\x55'
"""
def __init__(self, fh):
""" constructor takes a filehandle """
self.fh = fh
self.fh.seek(0)
hdrdata = self.fh.read(0x100)
self.magic = hdrdata[0:4].decode('utf-8', 'ignore')
if self.magic not in ('IDA0', 'IDA1', 'IDA2'):
raise Exception("invalid file magic")
values = struct.unpack_from("<6LH6L", hdrdata, 6)
if values[5] != 0xaabbccdd:
fileversion = 0
offsets = list(values[0:5])
offsets.append(0)
checksums = [0 for _ in range(6)]
else:
fileversion = values[6]
if fileversion < 5:
offsets = list(values[0:5])
checksums = list(values[8:13])
idsofs, idscheck = struct.unpack_from("<LH" if fileversion == 1 else "<LL", hdrdata, 56)
offsets.append(idsofs)
checksums.append(idscheck)
# note: filever 4 has '0x5c', zeros, md5, more zeroes
elif fileversion == 6:
values = struct.unpack_from("<QQLLHQQQ5LQL", hdrdata, 6)
offsets = [values[_] for _ in (0, 1, 5, 6, 7, 13)]
checksums = [values[_] for _ in (8, 9, 10, 11, 12, 14)]
elif fileversion == 910:
"""
+00: "IDA2", 0, 0
+06: headersize
+0e: datastart
+16: aabbccdd00000000
+1e: version
+20: compression
+21: 6 qwords section-size
+5d: md5
"""
values = struct.unpack_from("<3QHB6Q", hdrdata, 6)
offsets = [values[1]]
self.sizes = values[5:]
for s in self.sizes:
offsets.append(offsets[-1]+s)
checksums = [0] * len(offsets)
self.compression = values[4]
if self.compression:
raise Exception("compression not supported for v910")
else:
raise Exception("unknown file version")
# offsets now has offsets to the various idb parts
# id0, id1, nam, seg, til, id2 ( = sparse file )
self.offsets = offsets
self.checksums = checksums
self.fileversion = fileversion
def getsectioninfo(self, i):
"""
Returns a tuple with section parameters by index.
The parameteres are:
* compression flag
* data offset
* data size
* data checksum
Sections are stored in a fixed order: id0, id1, nam, seg, til, id2
"""
if not 0 <= i < len(self.offsets):
return 0, 0, 0, 0
if self.offsets[i] == 0:
return 0, 0, 0, 0
self.fh.seek(self.offsets[i])
if self.fileversion < 5:
comp, size = struct.unpack("<BL", self.fh.read(5))
ofs = self.offsets[i] + 5
elif self.fileversion == 6:
comp, size = struct.unpack("<BQ", self.fh.read(9))
ofs = self.offsets[i] + 9
elif self.fileversion == 910:
comp = 0
size = self.sizes[i]
ofs = self.offsets[i]
else:
raise Exception("unhandled file version")
return comp, ofs, size, self.checksums[i]
def getpart(self, ix):
"""
Returns a fileobject for the specified section.
This method optionally decompresses the data found in the .idb file,
and returns a file-like object, with seek, read, tell.
"""
if self.offsets[ix] == 0:
return
comp, ofs, size, checksum = self.getsectioninfo(ix)
fh = FileSection(self.fh, ofs, ofs + size)
if comp == 2:
import zlib
# very old databases used a different compression scheme:
wbits = -15 if self.magic == 'IDA0' else 15
fh = makeStringIO(zlib.decompress(fh.read(size), wbits))
elif comp == 0:
pass
else:
raise Exception("unsupported section encoding: %02x" % comp)
return fh
def getsection(self, cls):
"""
Constructs an object for the specified section.
"""
return cls(self, self.getpart(cls.INDEX))
class RecoverIDBFile:
"""
RecoverIDBFile has the same interface as IDBFile, but expects the database to be split over several files.
This is useful for opening IDAv2.x databases, or for recovering data from unclosed databases.
"""
id2ext = ['.id0', '.id1', '.nam', '.seg', '.til', '.id2']
def __init__(self, args, basepath, dbfiles):
if args.i64:
self.magic = 'IDA2'
else:
self.magic = 'IDA1'
self.basepath = basepath
self.dbfiles = dbfiles
self.fileversion = 0
def getsectioninfo(self, i):
if not 0 <= i < len(self.id2ext):
return 0, 0, 0, 0
ext = self.id2ext[i]
if ext not in self.dbfiles:
return 0, 0, 0, 0
return 0, 0, os.path.getsize(self.dbfiles[ext]), 0
def getpart(self, ix):
if not 0 <= ix < len(self.id2ext):
return None
ext = self.id2ext[ix]
if ext not in self.dbfiles:
print("can't find %s" % ext)
return None
return open(self.dbfiles[ext], "rb")
def getsection(self, cls):
part = self.getpart(cls.INDEX)
if part:
return cls(self, part)
def binary_search(a, k):
"""
Do a binary search in an array of objects ordered by '.key'
returns the largest index for which: a[i].key <= k
like c++: a.upperbound(k)--
"""
first, last = 0, len(a)
while first < last:
mid = (first + last) >> 1
if k < a[mid].key:
last = mid
else:
first = mid + 1
return first - 1
"""
################################################################################
I would have liked to make these classes a nested class of BTree, but
the problem is than there is no way for a nested-nested class
of BTree to refer back to a toplevel nested class of BTree.
So moving these outside of BTree so i can use them as baseclasses
in the various page implementations
class BTree:
class BaseEntry(object): pass
class BasePage(object): pass
class Page15(BasePage):
class Entry(BTree.BaseEntry):
pass
>>> NameError: name 'BTree' is not defined
"""
class BaseIndexEntry(object):
"""
Baseclass for Index Entries.
Index entries have a key + value, and a page containing keys larger than that key
in this index entry.
"""
def __init__(self, data):
ofs = self.recofs
if self.recofs < 6:
# reading an invalid page...
self.val = self.key = None
return
keylen, = struct.unpack_from("<H", data, ofs) ; ofs += 2
self.key = data[ofs:ofs + keylen] ; ofs += keylen
vallen, = struct.unpack_from("<H", data, ofs) ; ofs += 2
self.val = data[ofs:ofs + vallen] ; ofs += vallen
def __repr__(self):
return "%06x: %s = %s" % (self.page, hexdump(self.key), hexdump(self.val))
class BaseLeafEntry(BaseIndexEntry):
"""
Baseclass for Leaf Entries
Leaf entries have a key + value, and an `indent`
The `indent` is there to save space in the index, since subsequent keys
usually are very similar.
The indent specifies the offset where this key is different from the previous key
"""
def __init__(self, key, data):
""" leaf entries get the previous key a an argument. """
super(BaseLeafEntry, self).__init__(data)
self.key = key[:self.indent] + self.key
def __repr__(self):
return " %02x:%02x: %s = %s" % (self.unknown1, self.unknown, hexdump(self.key), hexdump(self.val))
class BTree(object):
"""
BTree is the IDA main database engine.
It allows the user to do a binary search for records with
a specified key relation ( >, <, ==, >=, <= )
"""
class BasePage(object):
"""
Baseclass for Pages. for the various btree versions ( 1.5, 1.6 and 2.0 )
there are subclasses which specify the exact layout of the page header,
and index / leaf entries.
Leaf pages don't have a 'preceeding' page pointer.
"""
def __init__(self, data, entsize, entfmt):
self.preceeding, self.count = struct.unpack_from(entfmt, data)
if self.preceeding:
entrytype = self.IndexEntry
else:
entrytype = self.LeafEntry
self.index = []
key = b""
for i in range(self.count):
ent = entrytype(key, data, entsize * (1 + i))
self.index.append(ent)
key = ent.key
self.unknown, self.freeptr = struct.unpack_from(entfmt, data, entsize * (1 + self.count))
def find(self, key):
"""
Searches pages for key, returns relation to key:
recurse -> found a next level index page to search for key.
also returns the next level page nr
gt -> found a value with a key greater than the one searched for.
lt -> found a value with a key less than the one searched for.
eq -> found a value with a key equal to the one searched for.
gt, lt and eq return the index for the key found.
# for an index entry: the key is 'less' than anything in the page pointed to.
"""
i = binary_search(self.index, key)
if i < 0:
if self.isindex():
return ('recurse', -1)
return ('gt', 0)
if self.index[i].key == key:
return ('eq', i)
if self.isindex():
return ('recurse', i)
return ('lt', i)
def getpage(self, ix):
""" For Indexpages, returns the page ptr for the specified entry """
return self.preceeding if ix < 0 else self.index[ix].page
def getkey(self, ix):
""" For all page types, returns the key for the specified entry """
return self.index[ix].key
def getval(self, ix):
""" For all page types, returns the value for the specified entry """
return self.index[ix].val
def isleaf(self):
""" True when this is a Leaf Page """
return self.preceeding == 0
def isindex(self):
""" True when this is an Index Page """
return self.preceeding != 0
def __repr__(self):
return ("leaf" if self.isleaf() else ("index<%d>" % self.preceeding)) + repr(self.index)
######################################################
# Page objects for the various versions of the database
######################################################
class Page15(BasePage):
""" v1.5 b-tree page """
class IndexEntry(BaseIndexEntry):
def __init__(self, key, data, ofs):
self.page, self.recofs = struct.unpack_from("<HH", data, ofs)
self.recofs += 1 # skip unused zero byte in each key/value record
super(self.__class__, self).__init__(data)
class LeafEntry(BaseLeafEntry):
def __init__(self, key, data, ofs):
self.indent, self.unknown, self.recofs = struct.unpack_from("<BBH", data, ofs)
self.unknown1 = 0
self.recofs += 1 # skip unused zero byte in each key/value record
super(self.__class__, self).__init__(key, data)
def __init__(self, data):
super(self.__class__, self).__init__(data, 4, "<HH")
class Page16(BasePage):
""" v1.6 b-tree page """
class IndexEntry(BaseIndexEntry):
def __init__(self, key, data, ofs):
self.page, self.recofs = struct.unpack_from("<LH", data, ofs)
self.recofs += 1 # skip unused zero byte in each key/value record
super(self.__class__, self).__init__(data)
class LeafEntry(BaseLeafEntry):
def __init__(self, key, data, ofs):
self.indent, self.unknown1, self.unknown, self.recofs = struct.unpack_from("<BBHH", data, ofs)
self.recofs += 1 # skip unused zero byte in each key/value record
super(self.__class__, self).__init__(key, data)
def __init__(self, data):
super(self.__class__, self).__init__(data, 6, "<LH")
class Page20(BasePage):
""" v2.0 b-tree page """
class IndexEntry(BaseIndexEntry):
def __init__(self, key, data, ofs):
self.page, self.recofs = struct.unpack_from("<LH", data, ofs)
# unused zero byte is no longer there in v2.0 b-tree
super(self.__class__, self).__init__(data)
class LeafEntry(BaseLeafEntry):
def __init__(self, key, data, ofs):
self.indent, self.unknown, self.recofs = struct.unpack_from("<HHH", data, ofs)
self.unknown1 = 0
super(self.__class__, self).__init__(key, data)
def __init__(self, data):
super(self.__class__, self).__init__(data, 6, "<LH")
class Cursor:
"""
A Cursor object represents a position in the b-tree.
It has methods for moving to the next or previous item.
And methods for retrieving the key and value of the current position
The position is represented as a list of (page, index) tuples
"""
def __init__(self, db, stack):
self.db = db
self.stack = stack
def next(self):
""" move cursor to next entry """
page, ix = self.stack.pop()
if page.isleaf():
# from leaf move towards root
ix += 1
while self.stack and ix == len(page.index):
page, ix = self.stack.pop()
ix += 1
if ix < len(page.index):
self.stack.append((page, ix))
else:
# from node move towards leaf
self.stack.append((page, ix))
page = self.db.readpage(page.getpage(ix))
while page.isindex():
ix = -1
self.stack.append((page, ix))
page = self.db.readpage(page.getpage(ix))
ix = 0
self.stack.append((page, ix))
def prev(self):
""" move cursor to the previous entry """
page, ix = self.stack.pop()
ix -= 1
if page.isleaf():
# move towards root, until non 'prec' item found
while self.stack and ix < 0:
page, ix = self.stack.pop()
if ix >= 0:
self.stack.append((page, ix))
else:
# move towards leaf
self.stack.append((page, ix))
while page.isindex():
page = self.db.readpage(page.getpage(ix))
ix = len(page.index) - 1
self.stack.append((page, ix))
def eof(self):
return len(self.stack) == 0
def getkey(self):
""" return the key value pointed to by the cursor """
page, ix = self.stack[-1]
return page.getkey(ix)
def getval(self):
""" return the data value pointed to by the cursor """
page, ix = self.stack[-1]
return page.getval(ix)
def __repr__(self):
return "cursor:" + repr(self.stack)
def __init__(self, fh):
""" BTree constructor - takes a filehandle """
self.fh = fh
self.fh.seek(0)
data = self.fh.read(64)
if data[13:].startswith(b"B-tree v 1.5 (C) Pol 1990"):
self.parseheader15(data)
self.page = self.Page15
self.version = 15
elif data[19:].startswith(b"B-tree v 1.6 (C) Pol 1990"):
self.parseheader16(data)
self.page = self.Page16
self.version = 16
elif data[19:].startswith(b"B-tree v2"):
self.parseheader16(data)
self.page = self.Page20
self.version = 20
else:
print("unknown btree: %s" % hexdump(data))
raise Exception("unknown b-tree")
def parseheader15(self, data):
self.firstfree, self.pagesize, self.firstindex, self.reccount, self.pagecount = struct.unpack_from("<HHHLH", data, 0)
def parseheader16(self, data):
# v16 and v20 both have the same header format
self.firstfree, self.pagesize, self.firstindex, self.reccount, self.pagecount = struct.unpack_from("<LHLLL", data, 0)
def readpage(self, nr):
self.fh.seek(nr * self.pagesize)
return self.page(self.fh.read(self.pagesize))
def find(self, rel, key):
"""
Searches for a record with the specified relation to the key
A cursor object is returned, the user can call getkey, getval on the cursor
to retrieve the actual value.
or call cursor.next() / cursor.prev() to enumerate values.
'eq' -> record equal to the key, None when not found
'le' -> last record with key <= to key
'ge' -> first record with key >= to key
'lt' -> last record with key < to key
'gt' -> first record with key > to key
"""
# descend tree to leaf nearest to the `key`
page = self.readpage(self.firstindex)
stack = []
while len(stack) < 256:
act, ix = page.find(key)
stack.append((page, ix))
if act != 'recurse':
break
page = self.readpage(page.getpage(ix))
if len(stack) == 256:
raise Exception("b-tree corrupted")
cursor = BTree.Cursor(self, stack)
# now correct for what was actually asked.
if act == rel:
pass
elif rel == 'eq' and act != 'eq':
return None
elif rel in ('ge', 'le') and act == 'eq':
pass
elif rel in ('gt', 'ge') and act == 'lt':
cursor.next()
elif rel == 'gt' and act == 'eq':
cursor.next()
elif rel in ('lt', 'le') and act == 'gt':
cursor.prev()
elif rel == 'lt' and act == 'eq':
cursor.prev()
return cursor
def dump(self):
""" raw dump of all records in the b-tree """
print("pagesize=%08x, reccount=%08x, pagecount=%08x" % (self.pagesize, self.reccount, self.pagecount))
self.dumpfree()
self.dumptree(self.firstindex)
def dumpfree(self):
""" list all free pages """
fmt = "L" if self.version > 15 else "H"
hdrsize = 8 if self.version > 15 else 4
pn = self.firstfree
if pn == 0:
print("no free pages")
return
while pn:
self.fh.seek(pn * self.pagesize)
data = self.fh.read(self.pagesize)
if len(data) == 0:
print("could not read FREE data at page %06x" % pn)
break
count, nextfree = struct.unpack_from("<" + (fmt * 2), data)
freepages = list(struct.unpack_from("<" + (fmt * count), data, hdrsize))
freepages.insert(0, pn)
for pn in freepages:
self.fh.seek(pn * self.pagesize)
data = self.fh.read(self.pagesize)
print("%06x: free: %s" % (pn, hexdump(data[:64])))
pn = nextfree
def dumpindented(self, pn, indent=0):
"""
Dump all nodes of the current page with keys indented, showing how the `indent`
feature works
"""
page = self.readpage(pn)
print(" " * indent, page)
if page.isindex():
print(" " * indent, end="")
self.dumpindented(page.preceeding, indent + 1)
for p in range(len(page.index)):
print(" " * indent, end="")
self.dumpindented(page.getpage(p), indent + 1)
def dumptree(self, pn):
"""
Walks entire tree, dumping all records on each page
in sequential order
"""
page = self.readpage(pn)
print("%06x: preceeding = %06x, reccount = %04x" % (pn, page.preceeding, page.count))
for ent in page.index:
print(" %s" % ent)
if page.preceeding:
self.dumptree(page.preceeding)
for ent in page.index:
self.dumptree(ent.page)
def pagedump(self):
"""
dump the contents of all pages, ignoring links between pages,
this will enable you to view contents of pages which have become
lost due to datacorruption.
"""
self.fh.seek(self.pagesize)
pn = 1
while True:
try:
pagedata = self.fh.read(self.pagesize)
if len(pagedata) == 0:
break
elif len(pagedata) != self.pagesize:
print("%06x: incomplete - %d bytes ( pagesize = %d )" % (pn, len(pagedata), self.pagesize))
break
elif pagedata == b'\x00' * self.pagesize:
print("%06x: empty" % (pn))
else:
page = self.page(pagedata)
print("%06x: preceeding = %06x, reccount = %04x" % (pn, page.preceeding, page.count))
for ent in page.index:
print(" %s" % ent)
except Exception as e:
print("%06x: ERROR decoding as B-tree page: %s" % (pn, e))
pn += 1
class ID0File(object):
"""
Reads .id0 or 0.ida files, containing a v1.5, v1.6 or v2.0 b-tree database.
This is basically the low level netnode interface from the idasdk.
There are two major groups of nodes in the database:
key = "N"+name -> value = littleendian(nodeid)
key = "."+bigendian(nodeid)+char(tag)+bigendian(value)
key = "."+bigendian(nodeid)+char(tag)+string
key = "."+bigendian(nodeid)+char(tag)
and some special nodes for bookkeeping:
"$ MAX LINK"
"$ MAX NODE"
"$ NET DESC"
Very old databases also have name entries with a lowercase 'n',
and corresponding '-'+value nodes.
I am not sure what those are for.
several items have specially named nodes, like "$ structs", "$ enums", "Root Node"
nodeByName(name) returns the nodeid for a name
bytes(nodeid, tag, val) returns the value for a specific node.
"""
INDEX = 0
def __init__(self, idb, fh):
self.btree = BTree(fh)
self.wordsize = None
self.maxnode = None
if idb.magic == 'IDA2':
# .i64 files use 64 bit values for some things.
self.wordsize = 8
elif idb.magic in ('IDA0', 'IDA1'):
self.wordsize = 4
else:
# determine wordsize from value of '$ MAX NODE'
c = self.btree.find('eq', b'$ MAX NODE')
if c and not c.eof():
self.maxnode = c.getval()
self.wordsize = len(c.getval())
if self.wordsize not in (4, 8):
print("Can not determine wordsize for database - assuming 32 bit")
self.wordsize = 4
if self.wordsize == 4:
self.nodebase = 0xFF000000
if not self.maxnode:
self.maxnode = self.nodebase + 0x0FFFFF
self.fmt = "L"
else:
self.nodebase = 0xFF00000000000000
if not self.maxnode:
self.maxnode = self.nodebase + 0x0FFFFFFF
self.fmt = "Q"
# set the keyformat for this database
self.keyfmt = ">s" + self.fmt + "s" + self.fmt
@cachedproperty
def root(self): return self.nodeByName("Root Node")
# note: versions before 4.7 used a short instead of a long
# and stored the versions with one minor digit ( 43 ) , instead of two ( 480 )
@cachedproperty
def idaver(self): return self.int(self.root, 'A', -1)
@cachedproperty
def idbparams(self): return self.bytes(self.root, 'S', 0x41b994)
@cachedproperty
def idaverstr(self): return self.string(self.root, 'S', 1303)
@cachedproperty
def nropens(self): return self.int(self.root, 'A', -4)
@cachedproperty
def creationtime(self): return self.int(self.root, 'A', -2)
@cachedproperty
def originmd5(self): return self.bytes(self.root, 'S', 1302)
@cachedproperty
def somecrc(self): return self.int(self.root, 'A', -5)
def prettykey(self, key):
"""
returns the key in a readable format.
"""
f = list(self.decodekey(key))
f[0] = f[0].decode('utf-8')
if len(f) > 2 and type(f[2]) == bytes:
f[2] = f[2].decode('utf-8')
if f[0] == '.':
if len(f) == 2:
return "%s%16x" % tuple(f)
elif len(f) == 3:
return "%s%16x %s" % tuple(f)
elif len(f) == 4:
if f[2] == 'H' and type(f[3]) in (str, bytes):
f[3] = f[3].decode('utf-8')
return "%s%16x %s '%s'" % tuple(f)
elif type(f[3]) in (int, long):
return "%s%16x %s %x" % tuple(f)
else:
f[3] = hexdump(f[3])
return "%s%16x %s %s" % tuple(f)
elif f[0] in ('N', 'n', '$'):
if type(f[1]) in (int, long):
return "%s %x %16x" % tuple(f)
else:
return "%s'%s'" % tuple(f)
elif f[0] == '-':
return "%s %x" % tuple(f)
return hexdump(key)
def prettyval(self, val):
"""
returns the value in a readable format.
"""
if len(val) == self.wordsize and val[-1:] in (b'\x00', b'\xff'):
return "%x" % struct.unpack("<" + self.fmt, val)
if len(val) == self.wordsize and re.search(b'[\x00-\x08\x0b\x0c\x0e-\x1f]', val, re.DOTALL):
return "%x" % struct.unpack("<" + self.fmt, val)
if len(val) < 2 or not re.match(b'^[\x09\x0a\x0d\x20-\xff]+.$', val, re.DOTALL):
return hexdump(val)
val = val.replace(b"\n", b"\\n")
return "'%s'" % val.decode('utf-8', 'ignore')
def nodeByName(self, name):
""" Return a nodeid by name """
# note: really long names are encoded differently:
# 'N'+'\x00'+pack('Q', nameid) => ofs
# and (ofs, 'N') -> nameid
# at nodebase ( 0xFF000000, 'S', 0x100*nameid ) there is a series of blobs for max 0x80000 sized names.
cur = self.btree.find('eq', self.namekey(name))
if cur:
return struct.unpack('<' + self.fmt, cur.getval())[0]
def namekey(self, name):
if type(name) in (int, long):
return struct.pack("<sB" + self.fmt, b'N', 0, name)
return b'N' + name.encode('utf-8')
def makekey(self, *args):
"""
Return a binary key for the nodeid, tag and optional value
makekey(node)
makekey(node, tag)
makekey(node, tag, stringvalue)
makekey(node, tag, intvalue)
"""
if len(args) > 1:
# utf-8 encode the tag
args = args[:1] + (args[1].encode('utf-8'),) + args[2:]
if len(args) == 3 and type(args[-1]) == str:
# node.tag.string type keys
return struct.pack(self.keyfmt[:1 + len(args)], b'.', *args[:-1]) + args[-1].encode('utf-8')
elif len(args) == 3 and type(args[-1]) == type(-1) and args[-1] < 0:
# negative values -> need lowercase fmt char
return struct.pack(self.keyfmt[:1 + len(args)] + self.fmt.lower(), b'.', *args)
else:
# node.tag.value type keys
return struct.pack(self.keyfmt[:2 + len(args)], b'.', *args)
def decodekey(self, key):
"""
splits a key in a tuple, one of:
( [ 'N', 'n', '$' ], 0, bignameid )
( [ 'N', 'n', '$' ], name )
( '-', id )
( '.', id )
( '.', id, tag )
( '.', id, tag, value )
( '.', id, 'H', name )
"""
if key[:1] in (b'n', b'N', b'$'):
if key[1:2] == b"\x00" and len(key) == 2 + self.wordsize:
return struct.unpack(">sB" + self.fmt, key)
else:
return key[:1], key[1:].decode('utf-8', 'ignore')
if key[:1] == b'-':
return struct.unpack(">s" + self.fmt, key)
if len(key) == 1 + self.wordsize:
return struct.unpack(self.keyfmt[:3], key)
if len(key) == 1 + self.wordsize + 1:
return struct.unpack(self.keyfmt[:4], key)
if len(key) == 1 + 2 * self.wordsize + 1:
return struct.unpack(self.keyfmt[:5], key)
if len(key) > 1 + self.wordsize + 1:
f = struct.unpack_from(self.keyfmt[:4], key)
return f + (key[2 + self.wordsize:], )
raise Exception("unknown key format")
def bytes(self, *args):
""" return a raw value for the given arguments """
if len(args) == 1 and isinstance(args[0], BTree.Cursor):
cur = args[0]
else:
cur = self.btree.find('eq', self.makekey(*args))
if cur:
return cur.getval()
def int(self, *args):
"""
Return the integer stored in the specified node.
Any type of integer will be decoded: byte, short, long, long long
"""
data = self.bytes(*args)
if data is not None:
if len(data) == 1:
return struct.unpack("<B", data)[0]
if len(data) == 2:
return struct.unpack("<H", data)[0]
if len(data) == 4:
return struct.unpack("<L", data)[0]
if len(data) == 8:
return struct.unpack("<Q", data)[0]
print("can't get int from %s" % hexdump(data))
def string(self, *args):
""" return string stored in node """
data = self.bytes(*args)
if data is not None:
return data.rstrip(b"\x00").decode('utf-8')
def name(self, id):
"""
resolves a name, both short and long names.
"""
data = self.bytes(id, 'N')
if not data:
print("%x has no name" % id)
return
if data[:1] == b'\x00':
nameid, = struct.unpack_from(">" + self.fmt, data, 1)
nameblob = self.blob(self.nodebase, 'S', nameid * 256, nameid * 256 + 32)
return nameblob.rstrip(b"\x00").decode('utf-8')
return data.rstrip(b"\x00").decode('utf-8')
def blob(self, nodeid, tag, start=0, end=0xFFFFFFFF):
"""
Blobs are stored in sequential nodes
with increasing index values.
most blobs, like scripts start at index
0, long names start at a specified
offset.
"""
startkey = self.makekey(nodeid, tag, start)
endkey = self.makekey(nodeid, tag, end)
cur = self.btree.find('ge', startkey)
data = b''
while cur.getkey() <= endkey:
data += cur.getval()
cur.next()
return data
class ID1File(object):
"""
Reads .id1 or 1.IDA files, containing byte flags
This is basically the information for the .idc GetFlags(ea),
FirstSeg(), NextSeg(ea), SegStart(ea), SegEnd(ea) functions
"""
INDEX = 1
class SegInfo:
def __init__(self, startea, endea, offset):
self.startea = startea
self.endea = endea
self.offset = offset
def __init__(self, idb, fh):
if idb.magic == 'IDA2':
wordsize, fmt = 8, "Q"
else:
wordsize, fmt = 4, "L"
# todo: verify wordsize using the following heuristic:
# L -> starting at: seglistofs + nsegs*seginfosize are all zero
# L -> starting at seglistofs .. nsegs*seginfosize every even word must be unique
self.fh = fh
fh.seek(0)
hdrdata = fh.read(32)
magic = hdrdata[:4]
if magic in (b'Va4\x00', b'Va3\x00', b'Va2\x00', b'Va1\x00', b'Va0\x00'):
nsegments, npages = struct.unpack_from("<HH", hdrdata, 4)
# filesize / npages == 0x2000 for all cases
seglistofs = 8
seginfosize = 3
elif magic == b'VA*\x00':
always3, nsegments, always2k, npages = struct.unpack_from("<LLLL", hdrdata, 4)
if always3 != 3:
print("ID1: first dword != 3: %08x" % always3)
if always2k != 0x800:
print("ID1: third dword != 2k: %08x" % always2k)
seglistofs = 20
seginfosize = 2
else:
raise Exception("unknown id1 magic: %s" % hexdump(magic))
self.seglist = []
# Va0 - ida v3.0.5
# Va3 - ida v3.6
fh.seek(seglistofs)
if magic in (b'Va4\x00', b'Va3\x00', b'Va2\x00', b'Va1\x00', b'Va0\x00'):
segdata = fh.read(nsegments * 3 * wordsize)
for o in range(nsegments):
startea, endea, id1ofs = struct.unpack_from("<" + fmt + fmt + fmt, segdata, o * seginfosize * wordsize)
self.seglist.append(self.SegInfo(startea, endea, id1ofs))
elif magic == b'VA*\x00':
segdata = fh.read(nsegments * 2 * wordsize)
id1ofs = 0x2000
for o in range(nsegments):
startea, endea = struct.unpack_from("<" + fmt + fmt, segdata, o * seginfosize * wordsize)
self.seglist.append(self.SegInfo(startea, endea, id1ofs))
id1ofs += 4 * (endea - startea)
def is32bit_heuristic(self, fh, seglistofs):
fh.seek(seglistofs)
# todo: verify wordsize using the following heuristic:
# L -> starting at: seglistofs + nsegs*seginfosize are all zero
# L -> starting at seglistofs .. nsegs*seginfosize every even word must be unique
def dump(self):
""" print first and last bits for each segment """
for seg in self.seglist:
print("==== %08x-%08x" % (seg.startea, seg.endea))
if seg.endea - seg.startea < 30:
for ea in range(seg.startea, seg.endea):
print(" %08x: %08x" % (ea, self.getFlags(ea)))
else:
for ea in range(seg.startea, seg.startea + 10):
print(" %08x: %08x" % (ea, self.getFlags(ea)))
print("...")
for ea in range(seg.endea - 10, seg.endea):
print(" %08x: %08x" % (ea, self.getFlags(ea)))
def find_segment(self, ea):
""" do a linear search for the given address in the segment list """
for seg in self.seglist:
if seg.startea <= ea < seg.endea:
return seg
def getFlags(self, ea):
seg = self.find_segment(ea)
if not seg:
return 0
self.fh.seek(seg.offset + 4 * (ea - seg.startea))
return struct.unpack("<L", self.fh.read(4))[0]
def firstSeg(self):
return self.seglist[0].startea
def nextSeg(self, ea):
for i, seg in enumerate(self.seglist):
if seg.startea <= ea < seg.endea:
if i + 1 < len(self.seglist):
return self.seglist[i + 1].startea
else:
return
def segStart(self, ea):
seg = self.find_segment(ea)
if not seg:
return
return seg.startea
def segEnd(self, ea):
seg = self.find_segment(ea)
if not seg:
return
return seg.endea
class NAMFile(object):
""" reads .nam or NAMES.IDA files, containing ptrs to named items """
INDEX = 2
def __init__(self, idb, fh):
if idb.magic == 'IDA2':
wordsize, fmt = 8, "Q"
else:
wordsize, fmt = 4, "L"
self.fh = fh
fh.seek(0)
hdrdata = fh.read(64)
magic = hdrdata[:4]
# Va0 - ida v3.0.5
# Va1 - ida v3.6
if magic in (b'Va4\x00', b'Va3\x00', b'Va2\x00', b'Va1\x00', b'Va0\x00'):
always1, npages, always0, nnames, pagesize = struct.unpack_from("<HH" + fmt + fmt + "L", hdrdata, 4)
if always1 != 1: print("nam: first hw = %d" % always1)
if always0 != 0: print("nam: third dw = %d" % always0)
elif magic == b'VA*\x00':
always3, always1, always2k, npages, always0, nnames = struct.unpack_from("<LLLL" + fmt + "L", hdrdata, 4)
if always3 != 3: print("nam: 3 hw = %d" % always3)
if always1 != 1: print("nam: 1 hw = %d" % always1)
if always0 != 0: print("nam: 0 dw = %d" % always0)
if always2k != 0x800: print("nam: 2k dw = %d" % always2k)
pagesize = 0x2000
else:
raise Exception("unknown nam magic: %s" % hexdump(magic))
if idb.magic == 'IDA2':
nnames >>= 1
self.wordsize = wordsize
self.wordfmt = fmt
self.nnames = nnames
self.pagesize = pagesize
def dump(self):
print("nam: nnames=%d, npages=%d, pagesize=%08x" % (self.nnames, self.npages, self.pagesize))
def allnames(self):
self.fh.seek(self.pagesize)
n = 0
while n < self.nnames:
data = self.fh.read(self.pagesize)
want = min(self.nnames - n, int(self.pagesize / self.wordsize))
ofslist = struct.unpack_from("<%d%s" % (want, self.wordfmt), data, 0)
for ea in ofslist:
yield ea
n += want
class SEGFile(object):
""" reads .seg or $SEGS.IDA files. """
INDEX = 3
def __init__(self, idb, fh):
pass
class TILFile(object):
""" reads .til files """
INDEX = 4
def __init__(self, idb, fh):
pass
# note: v3 databases had a .reg instead of .til
class ID2File(object):
"""
Reads .id2 files
ID2 sections contain packed data, resulting in tripples
of unknown use.
"""
INDEX = 5
def __init__(self, idb, fh):
pass
class Struct:
"""
Decodes info for structures
(structnode, N) = structname
(structnode, D, address) = xref-type
(structnode, M, 0) = packed struct info
(structnode, S, 27) = packed value(addr, byte)
"""
class Member:
"""
(membernode, N) = struct.member-name
(membernode, A, 3) = structid+1
(membernode, A, 8) =
(membernode, A, 11) = enumid+1
(membernode, A, 16) = flag? -- 4:variable length flag?
(membernode, S, 0x3000) = type (set with 'Y')
(membernode, S, 0x3001) = names used in 'type'
(membernode, S, 5) = array type?
(membernode, S, 9) = offset-type
(membernode, D, address) = xref-type
(membernode, d, structid) = xref-type -- for sub-structs
"""
def __init__(self, id0, spec):
self._id0 = id0
self._nodeid = spec.nextword() + self._id0.nodebase
self.skip = spec.nextword()
self.size = spec.nextword()
self.flags = spec.next32()
self.props = spec.next32()
self.ofs = None
@cachedproperty
def name(self): return self._id0.name(self._nodeid)
@cachedproperty
def enumid(self): return self._id0.int(self._nodeid, 'A', 11)
@cachedproperty
def stringtype(self): return self._id0.int(self._nodeid, 'A', 16)
@cachedproperty
def structid(self): return self._id0.int(self._nodeid, 'A', 3)
@cachedproperty
def comment(self, repeatable): return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
@cachedproperty
def ptrinfo(self): return self._id0.bytes(self._nodeid, 'S', 9)
@cachedproperty
def typeinfo(self): return self._id0.bytes(self._nodeid, 'S', 0x3000)
def __init__(self, id0, nodeid):
self._id0 = id0
self._nodeid = nodeid
spec = self._id0.blob(self._nodeid, 'M')
p = IdaUnpacker(self._id0.wordsize, spec)
if self._id0.idaver >= 40:
# 1 = SF_VAR, 2 = SF_UNION, 4 = SF_HASHUNI, 8 = SF_NOLIST, 0x10 = SF_TYPLIB, 0x20 = SF_HIDDEN, 0x40 = SF_FRAME, 0xF80 = SF_ALIGN, 0x1000 = SF_GHOST
self.flags = p.next32()
else:
self.flags = 0
nmembers = p.next32()
self.members = []
o = 0
for i in range(nmembers):
m = Struct.Member(self._id0, p)
m.ofs = o
o += m.size
self.members.append(m)
self.extra = []
while not p.eof():
self.extra.append(p.next32())
@cachedproperty
def comment(self, repeatable): return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
@cachedproperty
def name(self): return self._id0.name(self._nodeid)
def __iter__(self):
for m in self.members:
yield m
class Enum:
"""
(enumnode, N) = enum-name
(enumnode, A, -1) = nr of values
(enumnode, A, -3) = representation
(enumnode, A, -5) = flags: bitfield, hidden, ...
(enumnode, A, -8) =
(enumnode, E, value) = valuenode + 1
"""
class Member:
"""
(membernode, N) = membername
(membernode, A, -2) = enumnode + 1
(membernode, A, -3) = member value
"""
def __init__(self, id0, nodeid):
self._id0 = id0
self._nodeid = nodeid
@cachedproperty
def value(self): return self._id0.int(self._nodeid, 'A', -3)
@cachedproperty
def comment(self, repeatable): return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
@cachedproperty
def name(self): return self._id0.name(self._nodeid)
def __init__(self, id0, nodeid):
self._id0 = id0
self._nodeid = nodeid
@cachedproperty
def count(self): return self._id0.int(self._nodeid, 'A', -1)
@cachedproperty
def representation(self): return self._id0.int(self._nodeid, 'A', -3)
# flags>>3 -> width
# flags&1 -> bitfield
@cachedproperty
def flags(self): return self._id0.int(self._nodeid, 'A', -5)
@cachedproperty
def comment(self, repeatable): return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
@cachedproperty
def name(self): return self._id0.name(self._nodeid)
def __iter__(self):
startkey = self._id0.makekey(self._nodeid, 'E')
endkey = self._id0.makekey(self._nodeid, 'F')
cur = self._id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
yield Enum.Member(self._id0, self._id0.int(cur) - 1)
cur.next()
class Bitfield:
class Member:
def __init__(self, id0, nodeid):
self._id0 = id0
self._nodeid = nodeid
@cachedproperty
def value(self): return self._id0.int(self._nodeid, 'A', -3)
@cachedproperty
def mask(self): return self._id0.int(self._nodeid, 'A', -6) - 1
@cachedproperty
def comment(self, repeatable): return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
@cachedproperty
def name(self): return self._id0.name(self._nodeid)
class Mask:
def __init__(self, id0, nodeid, mask):
self._id0 = id0
self._nodeid = nodeid
self.mask = mask
@cachedproperty
def comment(self, repeatable): return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
@cachedproperty
def name(self): return self._id0.name(self._nodeid)
def __iter__(self):
"""
Enumerates all Masks
"""
startkey = self._id0.makekey(self._nodeid, 'E')
endkey = self._id0.makekey(self._nodeid, 'F')
cur = self._id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
yield Bitfield.Member(self._id0, self._id0.int(cur) - 1)
cur.next()
def __init__(self, id0, nodeid):
self._id0 = id0
self._nodeid = nodeid
@cachedproperty
def count(self): return self._id0.int(self._nodeid, 'A', -1)
@cachedproperty
def representation(self): return self._id0.int(self._nodeid, 'A', -3)
@cachedproperty
def flags(self): return self._id0.int(self._nodeid, 'A', -5)
@cachedproperty
def comment(self, repeatable): return self._id0.string(self._nodeid, 'S', 1 if repeatable else 0)
@cachedproperty
def name(self): return self._id0.name(self._nodeid)
def __iter__(self):
"""
Enumerates all Masks
"""
startkey = self._id0.makekey(self._nodeid, 'm')
endkey = self._id0.makekey(self._nodeid, 'n')
cur = self._id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
key = self._id0.decodekey(cur.getkey())
yield Bitfield.Mask(self._id0, self._id0.int(cur) - 1, key[-1])
cur.next()
class IDBParams:
def __init__(self, id0, data):
self._id0 = id0
magic, self.version, = struct.unpack_from("<3sH", data, 0)
if self.version<700:
cpu, self.idpflags, self.demnames, self.filetype, self.coresize, self.corestart, self.ostype, self.apptype = struct.unpack_from("<8sBBH" + (id0.fmt * 2) + "HH", data, 5)
self.cpu = strz(cpu, 0)
else:
p = IdaUnpacker(id0.wordsize, data[5:])
cpulen = p.next32()
self.cpu = p.bytes(cpulen)
genflags = p.next32()
self.idpflags = p.next32()
self.demnames = 0
changecount = p.next32()
self.filetype = p.next32()
self.ostype = p.next32()
self.apptype = p.next32()
asmtype = p.next32()
specsegs = p.next32()
specsegs = p.next32()
aflags = p.next32()
aflags2 = p.next32()
base = p.nextword()
startss = p.nextword()
startcs = p.nextword()
startip = p.nextword()
startea = p.nextword()
startsp = p.nextword()
main = p.nextword()
minea = p.nextword()
maxea = p.nextword()
self.coresize = 0
self.corestart = 0
class Script:
def __init__(self, id0, nodeid):
self._id0 = id0
self._nodeid = nodeid
@cachedproperty
def name(self): return self._id0.string(self._nodeid, 'S', 0)
@cachedproperty
def language(self): return self._id0.string(self._nodeid, 'S', 1)
@cachedproperty
def body(self): return strz(self._id0.blob(self._nodeid, 'X'), 0)
class Segment:
"""
Decodes a value from "$ segs", see segment_t in segment.hpp for details.
"""
def __init__(self, id0, spec):
self._id0 = id0
p = IdaUnpacker(id0.wordsize, spec)
self.startea = p.nextword()
self.size = p.nextword()
self.name_id = p.nextword()
self.class_id = p.nextword()
self.orgbase = p.nextword()
self.unknown = p.next16()
self.align = p.next8()
self.comb = p.next8()
self.perm = p.next8()
self.bitness = p.next8()
self.flags = p.next8()
self.selector = p.nextword()
self.defsr = [p.nextword() for _ in range(16)]
self.color = p.next32()
================================================
FILE: idbtool.py
================================================
#!/usr/bin/python3
"""
Tool for querying information from Hexrays .idb and .i64 files
without launching IDA.
Copyright (c) 2016 Willem Hengeveld <itsme@xs4all.nl>
"""
# todo:
# '$ segs'
# S <segaddr> = packed(startea, size, ....)
# '$ srareas'
# a <addr> = packed(startea, size, flag, flag) -- includes functions
# b <addr> = packed(startea, size, flag, flag) -- segment
# c <addr> = packed(startea, size, flag, flag) -- same as 'b'
#
from __future__ import division, print_function, absolute_import, unicode_literals
import sys
import os
if sys.version_info[0] == 2:
import scandir
os.scandir = scandir.scandir
if sys.version_info[0] == 2:
reload(sys)
sys.setdefaultencoding('utf-8')
if sys.version_info[0] == 2:
stdout = sys.stdout
else:
stdout = sys.stdout.buffer
import struct
import binascii
import argparse
import itertools
from collections import defaultdict
import re
from datetime import datetime
import idblib
from idblib import hexdump
def timestring(t):
if t == 0:
return "....-..-.. ..:..:.."
return datetime.strftime(datetime.fromtimestamp(t), "%Y-%m-%d %H:%M:%S")
def strz(b, o):
return b[o:b.find(b'\x00', o)].decode('utf-8', 'ignore')
def nonefmt(fmt, num):
if num is None:
return "-"
return fmt % num
######### license encoding ################
def decryptuser(data):
"""
The '$ original user' node is encrypted with hexray's private key.
Hence we can easily decrypt it, but not change it to something else.
We can however copy the entry from another database, or just replace it with garbage.
The node contains 128 bytes encrypted license, followed by 32 bytes zero.
Note: i found several ida55 databases online where this does not work.
possible these were created using a cracked version of IDA.
"""
data = int(binascii.b2a_hex(data[127::-1]), 16)
user = pow(data, 0x13, 0x93AF7A8E3A6EB93D1B4D1FB7EC29299D2BC8F3CE5F84BFE88E47DDBDD5550C3CE3D2B16A2E2FBD0FBD919E8038BB05752EC92DD1498CB283AA087A93184F1DD9DD5D5DF7857322DFCD70890F814B58448071BBABB0FC8A7868B62EB29CC2664C8FE61DFBC5DB0EE8BF6ECF0B65250514576C4384582211896E5478F95C42FDED)
user = binascii.a2b_hex("%0256x" % user)
return user[1:]
def licensestring(lic):
""" decode a license blob """
if not lic:
return
if len(lic) < 127:
print("too short license format: %s" % binascii.b2a_hex(lic))
return
elif len(lic) > 127 and sum(lic[127:]) != 0:
print("too long license format: %s" % binascii.b2a_hex(lic))
return
if struct.unpack_from("<L", lic, 106)[0]:
print("unknown license format: %s" % binascii.b2a_hex(lic))
return
# first 2 bytes probably a checksum
licver, = struct.unpack_from("<H", lic, 2)
time, = struct.unpack_from("<L", lic, 4)
# new 'Freeware version' has licver == 0 as well, but is new format anyway, it is recognizable by time==0x10000
if licver == 0 and time != 0x10000:
if time:
"""
# up to and including ida v5.2
+00: int16 checksum?
+02: int16 zero
+04: int32 unix timestamp
+08: byte[8] zero
+10: int32 flags
+14: char[107] license text
"""
licflags, = struct.unpack_from("<L", lic, 16)
licensee = strz(lic, 20)
return "%s [%08x] %s" % (timestring(time), licflags, licensee)
else:
"""
+00: byte[0x13] zero
+13: int32 ?
+17: int32 timestamp
+1b: byte[8] zero
+23: int32 flags
+27: char[88] license text
"""
unk, = struct.unpack_from("<L", lic, 0x13)
time, = struct.unpack_from("<L", lic, 0x17)
licflags, = struct.unpack_from("<L", lic, 0x23)
licensee = strz(lic, 0x27)
return "%s [%08x] (%08x) %s" % (timestring(time), licflags, unk, licensee)
else:
"""
# since ida v5.3
+00: int16 checksum?
+02: int16 idaversion
+04: int32 ? small number, 1 or 2.
+08: int64 ? -1 or big number, maybe license flags?
+10: int32 timestamp
+14: int32 zero
+18: int32 sometimes another timestamp
+1c: byte[6] license id
+22: char[*] license text ( v5.3-v5.x : 93 chars, v6.0: 77 chars, v6.5: 69 chars )
+67: int64 ? since ida v6.50
+6f: byte[16] hash .. since ida v6.00
"""
time1, = struct.unpack_from("<L", lic, 16)
time2, = struct.unpack_from("<L", lic, 16 + 8)
licid = "%02X-%02X%02X-%02X%02X-%02X" % struct.unpack_from("6B", lic, 28)
licensee = strz(lic, 34)
return "v%04d %s .. %s %s %s" % (licver, timestring(time1), timestring(time2), licid, licensee)
def dumpuser(id0):
""" dump the original, and current database user """
orignode = id0.nodeByName('$ original user')
if orignode:
user0 = id0.bytes(orignode, 'S', 0)
if user0:
if user0.find(b'\x00\x00\x00\x00') >= 128:
user0 = decryptuser(user0)
else:
user0 = user0[:127]
# user0 has 128 bytes rsa encrypted license, followed by 32 bytes zero
print("orig: %s" % licensestring(user0))
# ida9 has S10+S11 == license json
user10 = id0.blob(orignode, 'S', 16)
if user10:
import json
user10 = json.loads(user10)
print("orig: %s" % user10)
curnode = id0.nodeByName('$ user1')
if curnode:
user1 = id0.bytes(curnode, 'S', 0)
print("user: %s" % licensestring(user1))
######### idb summary #########
filetypelist = [
"MS DOS EXE File",
"MS DOS COM File",
"Binary File",
"MS DOS Driver",
"New Executable (NE)",
"Intel Hex Object File",
"MOS Technology Hex Object File",
"Linear Executable (LX)",
"Linear Executable (LE)",
"Netware Loadable Module (NLM)",
"Common Object File Format (COFF)",
"Portable Executable (PE)",
"Object Module Format",
"R-records",
"ZIP file (this file is never loaded to IDA database)",
"Library of OMF Modules",
"ar library",
"file is loaded using LOADER DLL",
"Executable and Linkable Format (ELF)",
"Watcom DOS32 Extender (W32RUN)",
"Linux a.out (AOUT)",
"PalmPilot program file",
"MS DOS EXE File",
"MS DOS COM File",
"AIX ar library",
"Mac OS X Mach-O file",
]
def dumpinfo(id0):
""" print various infos on the idb file """
def ftstring(ft):
if 0 < ft < len(filetypelist):
return "%02x:%s" % (ft, filetypelist[ft])
return "%02x:unknown" % ft
def decodebitmask(fl, bitnames):
l = []
knownbits = 0
for bit, name in enumerate(bitnames):
if fl & (1 << bit) and name is not None:
l.append(name)
knownbits |= 1 << bit
if fl & ~knownbits:
l.append("unknown_%x" % (fl & ~knownbits))
return ",".join(l)
def osstring(fl):
return decodebitmask(fl, ['msdos', 'win', 'os2', 'netw', 'unix', 'other'])
def appstring(fl):
return decodebitmask(fl, ['console', 'graphics', 'exe', 'dll', 'driver', '1thread', 'mthread', '16bit', '32bit', '64bit'])
ldr = id0.nodeByName("$ loader name")
if ldr:
print("loader: %s %s" % (id0.string(ldr, 'S', 0), id0.string(ldr, 'S', 1)))
if not id0.root:
print("database has no RootNode")
return
if id0.idbparams:
params = idblib.IDBParams(id0, id0.idbparams)
print("cpu: %s, version=%d, filetype=%s, ostype=%s, apptype=%s, core:%x, size:%x" % (params.cpu, params.version, ftstring(params.filetype), osstring(params.ostype), appstring(params.apptype), params.corestart, params.coresize))
print("idaver=%s: %s" % (nonefmt("%04d", id0.idaver), id0.idaverstr))
srcmd5 = id0.originmd5
print("nopens=%s, ctime=%s, crc=%s, md5=%s" % (nonefmt("%d", id0.nropens), nonefmt("%08x", id0.creationtime), nonefmt("%08x", id0.somecrc), hexdump(srcmd5) if srcmd5 else "-"))
dumpuser(id0)
def dumpnames(args, id0, nam):
for ea in nam.allnames():
print("%08x: %s" % (ea, id0.name(ea)))
def dumpscript(id0, node):
""" dump all stored scripts """
s = idblib.Script(id0, node)
print("======= %s %s =======" % (s.language, s.name))
print(s.body)
def dumpstructmember(m):
"""
Dump info for a struct member.
"""
print(" %02x %02x %08x %02x: %-40s" % (m.skip, m.size, m.flags, m.props, m.name), end="")
if m.enumid:
print(" enum %08x" % m.enumid, end="")
if m.structid:
print(" struct %08x" % m.structid, end="")
if m.ptrinfo:
# packed
# note: 64bit nrs are stored low32, high32
# flags1, target, base, delta, flags2
# flags1:
# 0=off8 1=off16 2=off32 3=low8 4=low16 5=high8 6=high16 9=off64
# 0x10 = targetaddr, 0x20 = baseaddr, 0x40 = delta, 0x80 = base is plainnum
# flags2:
# 1=image is off, 0x10 = subtract, 0x20 = signed operand
print(" ptr %s" % m.ptrinfo, end="")
if m.typeinfo:
print(" type %s" % m.typeinfo, end="")
print()
def dumpstruct(id0, node):
"""
dump all info for the struct defined by `node`
"""
s = idblib.Struct(id0, node)
print("struct %s, 0x%x" % (s.name, s.flags))
for m in s:
dumpstructmember(m)
def dumpbitmember(m):
print(" %08x %s" % (m.value or 0, m.name))
def dumpmask(m):
print(" mask %08x %s" % (m.mask, m.name))
for m in m:
dumpbitmember(m)
def dumpbitfield(id0, node):
b = idblib.Bitfield(id0, node)
print("bitfield %s, %s, %s, %s" % (b.name, nonefmt("0x%x", b.count), nonefmt("0x%x", b.representation), nonefmt("0x%x", b.flags)))
for m in b:
dumpmask(m)
def dumpenummember(m):
"""
Print information on a single enum member
"""
print(" %08x %s" % (m.value or 0, m.name))
def dumpenum(id0, node):
"""
Dump all info for the enum defined by `node`
"""
e = idblib.Enum(id0, node)
if e.flags and e.flags&1:
dumpbitfield(id0, node)
return
print("enum %s, %s, %s, %s" % (e.name, nonefmt("0x%x", e.count), nonefmt("0x%x", e.representation), nonefmt("0x%x", e.flags)))
for m in e:
dumpenummember(m)
def dumpimport(id0, node):
# Note that '$ imports' is a list where the actual nodes
# are stored in the list, therefore we add '1' to the node here.
# first the named imports
startkey = id0.makekey(node+1, 'S')
endkey = id0.makekey(node+1, 'T')
cur = id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
txt = id0.string(cur)
key = cur.getkey()
ea = id0.decodekey(key)[3]
print("%08x: %s" % (ea, txt))
cur.next()
# then list the imports by ordinal
startkey = id0.makekey(node+1, 'A')
endkey = id0.makekey(node+1, 'B')
cur = id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
ordinal = id0.decodekey(cur.getkey())[3]
ea = id0.int(cur)
print("%08x: (ord%04d) %s" % (ea, ordinal, id0.name(ea)))
cur.next()
def enumlist(id0, listname, callback):
"""
Lists are all stored in a similar way.
(listnode, 'N') = listname
(listnode, 'A', -1) = list size <-- not for '$ scriptsnippets'
(listnode, 'A', seqnr) = itemnode+1
(listnode, 'Y', itemnode) = seqnr <-- only with '$ enums'
(listnode, 'Y', 0) = list size <-- only '$ scriptsnippets'
(listnode, 'Y', 1) = ? <-- only '$ scriptsnippets'
(listnode, 'S', seqnr) = dllname <-- only '$ imports'
"""
listnode = id0.nodeByName(listname)
if not listnode:
return
startkey = id0.makekey(listnode, 'A')
endkey = id0.makekey(listnode, 'A', 0xFFFFFFFF)
cur = id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
item = id0.int(cur)
callback(id0, item - 1)
cur.next()
def listfuncdirs(id0):
listnode = id0.nodeByName('$ dirtree/funcs')
if not listnode:
return
dir_id = 0
while True:
start = dir_id * 0x10000
end = start + 0xFFFF
data = id0.blob(listnode, 'S', start, end)
if data == b'':
break
dumpfuncdir(id0, dir_id, data)
dir_id += 1
def dumpfuncdir(id0, dir_index, data):
terminate = data.find(b'\0', 1)
name = data[1:terminate].decode('utf-8')
p = idblib.IdaUnpacker(id0.wordsize, data[terminate+1:])
parent = p.nextword()
unk = p.next32()
if data[0] == 0: # IDA 7.5
subdir_count = p.next32()
subdirs = []
while subdir_count:
subdir_id = p.nextwordsigned()
if subdirs:
subdir_id = subdirs[-1] + subdir_id
subdirs.append(subdir_id)
subdir_count -= 1
func_count = p.next32()
funcs = []
while func_count:
func_id = p.nextwordsigned()
if funcs:
func_id = funcs[-1] + func_id
funcs.append(func_id)
func_count -= 1
elif data[0] == 1: # IDA 7.6
children_count = p.next32()
children = []
for i in range(children_count):
next_child = p.nextwordsigned()
if children:
next_child += children[-1]
children.append(next_child)
subdir_count = p.next32()
children_count -= subdir_count
childtype_counts = [subdir_count]
while children_count:
childtype_count = p.next32()
children_count -= childtype_count
childtype_counts.append(childtype_count)
subdirs = []
funcs = []
i = 0
parsing_subdirs = True # switch back and forth
for childtype_count in childtype_counts:
for _ in range(childtype_count):
if parsing_subdirs:
subdirs.append(children[i])
else:
funcs.append(children[i])
i += 1
parsing_subdirs = not parsing_subdirs
else:
raise NotImplementedError('unsupported funcdir schema')
if not p.eof():
raise Exception('not EOF after dir parsed')
print("dir %d = %s" % (dir_index, name))
print(" parent = %d" % parent)
print(" subdirs:")
for subdir in subdirs:
print(" %d" % subdir)
print(" functions:")
for func in funcs:
print(" 0x%x" % func)
def printent(args, id0, c):
if args.verbose:
print("%s = %s" % (id0.prettykey(c.getkey()), id0.prettyval(c.getval())))
else:
print("%s = %s" % (hexdump(c.getkey()), hexdump(c.getval())))
def createkey(args, id0, base, tag, ix):
"""
parse base node specification:
'?<name>' -> explicit N<name> key
'#<number>' -> relative to nodebase
'.<number>' -> absolute nodeid
'<name>' -> lookup by name.
"""
if base[:1] == '?':
return id0.namekey(base[1:])
if re.match(r'^#(?:0[xX][0-9a-fA-F]+|\d+)$', base):
nodeid = int(base[1:], 0) + id0.nodebase
elif re.match(r'^\.(?:0[xX][0-9a-fA-F]+|\d+)$', base):
nodeid = int(base[1:], 0)
else:
nodeid = id0.nodeByName(base)
if nodeid and args.verbose > 1:
print("found node %x for %s" % (nodeid, base))
if nodeid is None:
print("Could not find '%s'" % base)
return
s = [nodeid]
if tag is not None:
s.append(tag)
if ix is not None:
try:
ix = int(ix, 0)
except:
pass
s.append(ix)
return id0.makekey(*s)
def enumeratecursor(args, c, onerec, callback):
"""
Enumerate cursor in direction specified by `--dec` or `--inc`,
taking into account the optional limit set by `--limit`
Output according to verbosity level set by `--verbose`.
"""
limit = args.limit
while c and not c.eof() and (limit is None or limit > 0):
callback(c)
if args.dec:
c.prev()
else:
c.next()
if limit is not None:
limit -= 1
elif onerec:
break
def id0query(args, id0, query):
"""
queries start with an optional operator: <,<=,>,>=,==
followed by either a name or address or nodeid
Addresses are specified as a sequence of hexadecimal charaters.
Nodeid's may be specified either as the full node id, starting with ff00,
or starting with a '_'
Names are anything which can be found under the name tree in the database.
after the name/addr/node there is optionally a slash, followed by a node tag,
and another slash, followed by a index or hash string.
"""
xlatop = {'=': 'eq', '==': 'eq', '>': 'gt', '<': 'lt', '>=': 'ge', '<=': 'le'}
SEP = r";"
m = re.match(r'^([=<>]=?)?(.+?)(?:' + SEP + r'(\w+)(?:' + SEP + r'(.+))?)?$', query)
op = m.group(1) or "=="
base = m.group(2)
tag = m.group(3) # optional ;tag
ix = m.group(4) # optional ;ix
op = xlatop[op]
c = id0.btree.find(op, createkey(args, id0, base, tag, ix))
enumeratecursor(args, c, op=='eq', lambda c:printent(args, id0, c))
def getsegs(id0):
"""
Returns a list of all segments.
"""
seglist = []
node = id0.nodeByName('$ segs')
if not node:
return
startkey = id0.makekey(node, 'S')
endkey = id0.makekey(node, 'T')
cur = id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
s = idblib.Segment(id0, cur.getval())
seglist.append(s)
cur.next()
return seglist
def listsegments(id0):
"""
Print a summary of all segments found in the IDB.
"""
ssnode = id0.nodeByName('$ segstrings')
if not ssnode:
print("can't find '$ segstrings' node")
return
segstrings = id0.blob(ssnode, 'S')
p = idblib.IdaUnpacker(id0.wordsize, segstrings)
unk = p.next32()
nextid = p.next32()
slist = []
while not p.eof():
slen = p.next32()
if slen is None:
break
name = p.bytes(slen)
if name is None:
break
slist.append(name.decode('utf-8', 'ignore'))
segs = getsegs(id0)
for s in segs:
print("%08x - %08x %s" % (s.startea, s.startea+s.size, slist[s.name_id-1]))
def classifynodes(args, id0):
"""
Attempt to classify all nodes in the IDA database.
Note: this does not work for very old dbs
"""
nodetype = {}
tagstats = defaultdict(lambda : defaultdict(int))
segs = getsegs(id0)
print("node: %x .. %x" % (id0.nodebase, id0.maxnode))
def addstat(nodetype, k):
if len(k)<3:
print("??? strange, expected longer key - %s" % k)
return
tag = k[2].decode('utf-8')
if len(k)==3:
tagstats[nodetype][(tag, )] += 1
elif len(k)==4:
value = k[3]
if type(value)==int:
if isaddress(value):
tagstats[nodetype][(tag, 'addr')] += 1
elif isnode(value):
tagstats[nodetype][(tag, 'node')] += 1
else:
if value >= id0.maxnode:
value -= pow(0x100, id0.wordsize)
tagstats[nodetype][(tag, value)] += 1
else:
tagstats[nodetype][(tag, 'string')] += 1
else:
print("??? strange, expected shorter key - %s" % k)
return
def isaddress(addr):
for s in segs:
if s.startea <= addr < s.startea+s.size:
return True
def isnode(addr):
return id0.nodebase <= addr <= id0.maxnode
def processbitfieldvalue(v):
nodetype[v._nodeid] = 'bitfieldvalue'
def processbitfieldmask(m):
nodetype[m._nodeid] = 'bitfieldmask'
for m in m:
processbitfieldvalue(m)
def processbitfield(id0, node):
nodetype[node] = 'bitfield'
b = idblib.Bitfield(id0, node)
for m in b:
processbitfieldmask(m)
def processenummember(m):
nodetype[m._nodeid] = 'enummember'
def processenums(id0, node):
nodetype[node] = 'enum'
e = idblib.Enum(id0, node)
if e.flags&1:
processbitfield(id0, node)
return
for m in e:
processenummember(m)
def processstructmember(m, typename):
nodetype[m._nodeid] = typename
def processstructs(id0, node, typename):
nodetype[node] = typename
s = idblib.Struct(id0, node)
for m in s:
processstructmember(m, typename+"member")
def processscripts(id0, node):
nodetype[node] = 'script'
def processaddr(id0, cur):
k = id0.decodekey(cur.getkey())
if len(k)==4 and k[2:4] == (b'A', 2):
nodetype[id0.int(cur)-1] = 'hexrays'
addstat('addr', k)
def processfunc(id0, funcspec):
p = idblib.IdaUnpacker(id0.wordsize, funcspec)
funcstart = p.nextword()
funcsize = p.nextword()
flags = p.next16()
if flags is None:
return
if flags&0x8000: # is tail
return
node = p.nextword()
if node<0xFFFFFF and node!=0:
processstructs(id0, node + id0.nodebase, "frame")
def processimport(id0, node):
print("imp %08x" % node)
startkey = id0.makekey(node+1, 'A')
endkey = id0.makekey(node+1, 'B')
cur = id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
dllnode = id0.int(cur)
nodetype[dllnode] = 'import'
cur.next()
# mark enums, structs, scripts.
enumlist(id0, '$ enums', processenums)
enumlist(id0, '$ structs', lambda id0, node : processstructs(id0, node, "struct"))
enumlist(id0, '$ scriptsnippets', processscripts)
enumlist(id0, '$ imports', processimport)
# enum functions, scan for stackframes
funcsnode = id0.nodeByName('$ funcs')
startkey = id0.makekey(funcsnode, 'S')
endkey = id0.makekey(funcsnode, 'T')
cur = id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
processfunc(id0, cur.getval())
cur.next()
clinode = id0.nodeByName('$ cli')
if clinode:
for letter in "ABCDEFGHIJKMcio":
startkey = id0.makekey(clinode, letter)
endkey = id0.makekey(clinode, chr(ord(letter)+1))
cur = id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
nodetype[id0.int(cur)] = 'cli.'+letter
cur.next()
# enum addresses, scan for hex-rays nodes
startkey = b'.'
endkey = id0.makekey(id0.nodebase)
cur = id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
processaddr(id0, cur)
cur.next()
# addresses above node list
startkey = id0.makekey(id0.maxnode+1)
endkey = b'/'
cur = id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
processaddr(id0, cur)
cur.next()
# scan for unmarked nodes
# $ fr[0-9a-f]+\.\w+
# $ fr[0-9a-f]+\. [rs]
# $ F[0-9A-F]+\.\w+
# $ Stack of \w+
# Stack[0000007C]
# xrefs to \w+
startkey = id0.makekey(id0.nodebase)
endkey = id0.makekey(id0.maxnode+1)
cur = id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
k = id0.decodekey(cur.getkey())
node = k[1]
if node not in nodetype:
nodetype[node] = "unknown"
if nodetype[node] == "unknown" and k[2] == b'N':
name = cur.getval().rstrip(b'\x00')
if re.match(br'\$ fr[0-9a-f]+\.\w+$', name):
name = 'fr-type-functionframe'
elif re.match(br'\$ fr[0-9a-f]+\. [rs]$', name):
name = 'fr-type-functionframe'
elif re.match(br'\$ F[0-9A-F]+\.\w+$', name):
name = 'F-type-functionframe'
elif name.startswith(b'Stack of '):
name = 'stack-type-functionframe'
elif name.startswith(b'Stack['):
name = 'old-stack-type-functionframe'
elif name.startswith(b'xrefs to '):
name = 'old-xrefs'
else:
name = name.decode('utf-8', 'ignore')
nodetype[node] = name
cur.next()
# output node classification
if args.verbose:
for k, v in sorted(nodetype.items(), key=lambda kv:kv[0]):
print("%08x: %s" % (k, v))
# summarize tags per nodetype
startkey = id0.makekey(id0.nodebase)
endkey = id0.makekey(id0.maxnode+1)
cur = id0.btree.find('ge', startkey)
while cur.getkey() < endkey:
k = id0.decodekey(cur.getkey())
node = k[1]
nt = nodetype[node]
addstat(nt, k)
cur.next()
# output tag statistics
for nt, ntstats in sorted(tagstats.items(), key=lambda kv:kv[0]):
print("====== %s =====" % nt)
for k, v in ntstats.items():
if len(k)==1:
print("%5d - %s" % (v, k[0]))
elif len(k)==2 and type(k[1])==type(1):
print("%5d - %s %8x" % (v, k[0], k[1]))
elif type(k[1])==type(1):
print("%5d - %s %8x %s" % (v, k[0], k[1], k[2:]))
else:
print("%5d - %s %s %s" % (v, k[0], k[1], k[2:]))
def processid0(args, id0):
if args.info:
dumpinfo(id0)
if args.pagedump:
id0.btree.pagedump()
if args.query:
for query in args.query:
id0query(args, id0, query)
elif args.id0:
id0.btree.dump()
elif args.inc:
c = id0.btree.find('ge', b'')
enumeratecursor(args, c, False, lambda c:printent(args, id0, c))
elif args.dec:
c = id0.btree.find('le', b'\x80')
enumeratecursor(args, c, False, lambda c:printent(args, id0, c))
def hexascdumprange(id1, a, b):
line = asc = ""
for ea in range(a, b):
if len(line)==0:
line = "%08x:" % ea
byte = id1.getFlags(ea)&0xFF
line += " %02x" % byte
asc += chr(byte) if 32<byte<127 else '.'
if len(line) == 9 + 3*16:
line += " " + asc
print(line)
line = asc = ""
if len(line):
while len(line) < 9 + 3*16:
line += " "
line += " " + asc
print(line)
def saverange(id1, a, b, fh):
buf = bytes()
for ea in range(a, b):
byte = id1.getFlags(ea)&0xFF
buf += struct.pack("B", byte)
if len(buf) == 65536:
fh.write(buf)
buf = bytes()
if buf:
fh.write(buf)
def processid1(args, id1):
if args.id1:
id1.dump()
elif args.dump or args.dumpraw:
m = re.match(r'^(\d\w*)-(\d\w*)?$', args.dump or args.dumpraw)
if not m:
raise Exception("--dump requires a byte range")
a = int(m.group(1), 0)
b = int(m.group(2), 0)
if args.dumpraw:
saverange(id1, a, b, stdout)
else:
hexascdumprange(id1, a, b)
def processid2(args, id2):
pass
def processnam(args, nam):
pass
def processtil(args, til):
pass
def processseg(args, seg):
pass
def processidb(args, idb):
if args.verbose > 1:
print("magic=%s, filever=%d" % (idb.magic, idb.fileversion))
for i in range(6):
comp, ofs, size, checksum = idb.getsectioninfo(i)
if ofs:
part = idb.getpart(i)
print("%2d: %02x, %08x %8x [%08x]: %s" % (i, comp, ofs, size, checksum, hexdump(part.read(256))))
nam = idb.getsection(idblib.NAMFile)
id0 = idb.getsection(idblib.ID0File)
id1 = idb.getsection(idblib.ID1File)
processid0(args, id0)
processid1(args, id1)
processid2(args, idb.getsection(idblib.ID2File))
processnam(args, nam)
processtil(args, idb.getsection(idblib.TILFile))
processseg(args, idb.getsection(idblib.SEGFile))
if args.names:
dumpnames(args, id0, nam)
if args.classify:
classifynodes(args, id0)
if args.scripts:
enumlist(id0, '$ scriptsnippets', dumpscript)
if args.structs:
enumlist(id0, '$ structs', dumpstruct)
if args.enums:
enumlist(id0, '$ enums', dumpenum)
if args.funcdirs:
listfuncdirs(id0)
if args.imports:
enumlist(id0, '$ imports', dumpimport)
if args.segs:
listsegments(id0)
def processfile(args, filetypehint, fh):
class DummyIDB:
def __init__(idb, args):
if args.i64:
idb.magic = 'IDA2'
elif args.i32:
idb.magic = 'IDA1'
else:
idb.magic = None
try:
magic = fh.read(64)
fh.seek(-64, 1)
if magic.startswith(b"Va") or magic.startswith(b"VA"):
idb = DummyIDB(args)
if filetypehint == 'id1':
processid1(args, idblib.ID1File(idb, fh))
elif filetypehint == 'nam':
processnam(args, idblib.NAMFile(idb, fh))
elif filetypehint == 'seg':
processseg(args, idblib.SEGFile(idb, fh))
else:
print("unknown VA type file: %s" % hexdump(magic))
elif magic.startswith(b"IDAS"):
processid2(args, idblib.ID2File(DummyIDB(args), fh))
elif magic.startswith(b"IDATIL"):
processtil(args, idblib.ID2File(DummyIDB(args), fh))
elif magic.startswith(b"IDA"):
processidb(args, idblib.IDBFile(fh))
elif magic.find(b'B-tree v') > 0:
processid0(args, idblib.ID0File(DummyIDB(args), fh))
except Exception as e:
print("ERROR %s" % e)
if args.debug:
raise
def recover_database(args, basepath, dbfiles):
processidb(args, idblib.RecoverIDBFile(args, basepath, dbfiles))
def DirEnumerator(args, path):
"""
Enumerate all files / links in a directory,
optionally recursing into subdirectories,
or ignoring links.
"""
for d in os.scandir(path):
try:
if d.name == '.' or d.name == '..':
pass
elif d.is_symlink() and args.skiplinks:
pass
elif d.is_file():
yield d.path
elif d.is_dir() and args.recurse:
for f in DirEnumerator(args, d.path):
yield f
except Exception as e:
print("EXCEPTION %s accessing %s/%s" % (e, path, d.name))
def EnumeratePaths(args, paths):
"""
Enumerate all paths, files from the commandline
optionally recursing into subdirectories.
"""
for fn in paths:
try:
# 3 - for ftp://, 4 for http://, 5 for https://
if fn.find("://") in (3, 4, 5):
yield fn
if os.path.islink(fn) and args.skiplinks:
pass
elif os.path.isdir(fn) and args.recurse:
for f in DirEnumerator(args, fn):
yield f
elif os.path.isfile(fn):
yield fn
except Exception as e:
print("EXCEPTION %s accessing %s" % (e, fn))
def filetype_from_name(fn):
i = max(fn.rfind('.'), fn.rfind('/'))
return fn[i + 1:].lower()
def isv2name(name):
return name.lower() in ('$segregs.ida', '$segs.ida', '0.ida', '1.ida', 'ida.idl', 'names.ida')
def isv3ext(ext):
return ext.lower() in ('.id0', '.id1', '.id2', '.nam', '.til')
def xlatv2name(name):
oldnames = {
'$segregs.ida': 'reg',
'$segs.ida': 'seg',
'0.ida': 'id0',
'1.ida': 'id1',
'ida.idl': 'idl',
'names.ida': 'nam',
}
return oldnames.get(name.lower())
def main():
parser = argparse.ArgumentParser(description='idbtool - print info from hex-rays IDA .idb and .i64 files',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
idbtool can process complete .idb and .i64 files, but also naked .id0, .id1, .nam, .til files.
All versions since IDA v2.0 are supported.
Queries start with an optional operator: <,<=,>,>=,==.
Followed by either a name or address or nodeid.
Addresses are specified as a sequence of hexadecimal charaters.
Nodeid's may be specified either as the full node id, starting with ff00,
or starting with a '_'.
Names are anything which can be found under the name tree in the database.
After the name/addr/node there is optionally a slash, followed by a node tag,
and another slash, followed by a index or hash string.
Multiple queries can be specified, terminated by another option, or `--`.
Add `-v` for pretty printed keys and values.
Examples:
idbtool -v --query "$ user1;S;0" -- x.idb
idbtool -v --limit 4 --query ">#0xa" -- x.idb
idbtool -v --limit 5 --query ">Root Node;S;0" -- x.idb
idbtool -v --limit 10 --query ">Root Node;S" -- x.idb
idbtool -v --query ".0xff000001;N" -- x.idb
""")
parser.add_argument('--verbose', '-v', action='count', default=0)
parser.add_argument('--recurse', '-r', action='store_true', help='recurse into directories')
parser.add_argument('--skiplinks', '-L', action='store_true', help='skip symbolic links')
parser.add_argument('--filetype', '-t', type=str, help='specify filetype when loading `naked` id1,nam or seg files')
parser.add_argument('--i64', '-i64', action='store_true', help='specify that `naked` file is from a 64 bit database')
parser.add_argument('--i32', '-i32', action='store_true', help='specify that `naked` file is from a 32 bit database')
parser.add_argument('--names', '-n', action='store_true', help='print names')
parser.add_argument('--scripts', '-s', action='store_true', help='print scripts')
parser.add_argument('--structs', '-u', action='store_true', help='print structs')
# parser.add_argument('--comments', '-c', action='store_true', help='print comments')
parser.add_argument('--enums', '-e', action='store_true', help='print enums and bitfields')
parser.add_argument('--imports', action='store_true', help='print imports')
parser.add_argument('--segs', action='store_true', help='print segments')
parser.add_argument('--funcdirs', action='store_true', help='print function dirs (folders)')
parser.add_argument('--info', '-i', action='store_true', help='database info')
parser.add_argument('--inc', action='store_true', help='dump id0 records by cursor increment')
parser.add_argument('--dec', action='store_true', help='dump id0 records by cursor decrement')
parser.add_argument('--id0', "-id0", action='store_true', help='dump id0 records, by walking the page tree')
parser.add_argument('--id1', "-id1", action='store_true', help='dump id1 records')
parser.add_argument('--dump', type=str, help='hexdump id1 bytes', metavar='FROM-UNTIL')
parser.add_argument('--dumpraw', type=str, help='output id1 bytes', metavar='FROM-UNTIL')
parser.add_argument('--pagedump', "-d", action='store_true', help='dump all btree pages, including any that might have become inaccessible due to datacorruption.')
parser.add_argument('--classify', action='store_true', help='Classify nodes found in the database.')
parser.add_argument('--query', "-q", type=str, nargs='*', help='search the id0 file for a specific record.')
parser.add_argument('--limit', '-m', type=int, help='Max nr of records to return for a query.')
parser.add_argument('--recover', action='store_true', help='recover idb from unpacked files, of v2 database')
parser.add_argument('--debug', action='store_true')
parser.add_argument('FILES', type=str, nargs='*', help='Files')
args = parser.parse_args()
if args.FILES:
dbs = dict()
for fn in EnumeratePaths(args, args.FILES):
basepath, filename = os.path.split(fn)
if isv2name(filename):
d = dbs.setdefault(basepath, dict())
d[xlatv2name(filename)] = fn
print("%s -> %s : %s" % (xlatv2name(filename), basepath, filename))
else:
basepath, ext = os.path.splitext(fn)
if isv3ext(ext):
d = dbs.setdefault(basepath, dict())
d[ext.lower()] = fn
if not args.dumpraw:
print("\n==> " + fn + " <==\n")
try:
filetype = args.filetype or filetype_from_name(fn)
with open(fn, "rb") as fh:
processfile(args, filetype, fh)
except Exception as e:
print("ERROR: %s" % e)
if args.debug:
raise
if args.recover:
for basepath, dbfiles in dbs.items():
if len(dbfiles) > 1:
try:
print("\n==> " + basepath + " <==\n")
recover_database(args, basepath, dbfiles)
except Exception as e:
print("ERROR: %s" % e)
else:
print("==> STDIN <==")
processfile(args, args.filetype, sys.stdin.buffer)
if __name__ == '__main__':
main()
================================================
FILE: setup.cfg
================================================
[flake8]
ignore = E402,E501,E731
================================================
FILE: test_idblib.py
================================================
import unittest
from idblib import FileSection, binary_search, makeStringIO
class TestFileSection(unittest.TestCase):
""" unittest for FileSection object """
def test_file(self):
s = makeStringIO(b"0123456789abcdef")
fh = FileSection(s, 3, 11)
self.assertEqual(fh.read(3), b"345")
self.assertEqual(fh.read(8), b"6789a")
self.assertEqual(fh.read(8), b"")
fh.seek(-1, 2)
self.assertEqual(fh.read(8), b"a")
fh.seek(3)
self.assertEqual(fh.read(2), b"67")
fh.seek(-2, 1)
self.assertEqual(fh.read(2), b"67")
fh.seek(2, 1)
self.assertEqual(fh.read(2), b"a")
fh.seek(8)
self.assertEqual(fh.read(1), b"")
with self.assertRaises(Exception):
fh.seek(9)
class TestBinarySearch(unittest.TestCase):
""" unittests for binary_search """
class Object:
def __init__(self, num):
self.key = num
def __repr__(self):
return "o(%d)" % self.num
def test_bs(self):
obj = self.Object
lst = [obj(_) for _ in (2, 3, 5, 6)]
self.assertEqual(binary_search(lst, 1), -1)
self.assertEqual(binary_search(lst, 2), 0)
self.assertEqual(binary_search(lst, 3), 1)
self.assertEqual(binary_search(lst, 4), 1)
self.assertEqual(binary_search(lst, 5), 2)
self.assertEqual(binary_search(lst, 6), 3)
self.assertEqual(binary_search(lst, 7), 3)
def test_emptylist(self):
obj = self.Object
lst = []
self.assertEqual(binary_search(lst, 1), -1)
def test_oneelem(self):
obj = self.Object
lst = [obj(1)]
self.assertEqual(binary_search(lst, 0), -1)
self.assertEqual(binary_search(lst, 1), 0)
self.assertEqual(binary_search(lst, 2), 0)
def test_twoelem(self):
obj = self.Object
lst = [obj(1), obj(3)]
self.assertEqual(binary_search(lst, 0), -1)
self.assertEqual(binary_search(lst, 1), 0)
self.assertEqual(binary_search(lst, 2), 0)
self.assertEqual(binary_search(lst, 3), 1)
self.assertEqual(binary_search(lst, 4), 1)
def test_listsize(self):
obj = self.Object
for l in range(3, 32):
lst = [obj(_ + 1) for _ in range(l)]
lst = lst[:1] + lst[2:]
self.assertEqual(binary_search(lst, 0), -1)
self.assertEqual(binary_search(lst, 1), 0)
self.assertEqual(binary_search(lst, 2), 0)
self.assertEqual(binary_search(lst, 3), 1)
self.assertEqual(binary_search(lst, l - 1), l - 3)
self.assertEqual(binary_search(lst, l), l - 2)
self.assertEqual(binary_search(lst, l + 1), l - 2)
self.assertEqual(binary_search(lst, l + 2), l - 2)
================================================
FILE: tree-walking.py
================================================
"""
Copyright (c) 2016 Willem Hengeveld <itsme@xs4all.nl>
Experiment in btree walking
*-------->[00]
*------>[02]---+ [01]
root ->[08]---+ [05]-+ |
[17]-+ | | +--->[03]
| | | [04]
| | |
| | +----->[06]
| | [07]
| |
| | *-------->[09]
| +->[11]---+ [10]
| [14]-+ |
| | +--->[12]
| | [13]
| |
| +----->[15]
| [16]
|
| *-------->[18]
+--->[20]---+ [19]
[23]-+ |
| +--->[21]
| [22]
|
+----->[24]
[25]
decrement from 08 : ix-- -> getpage, ix=len-1 -> getpage -> ix=len-1
decrement from 17 : ix-- -> getpage, ix=len-1 -> getpage -> ix=len-1
decrement from 02 : ix-- -> getpage, ix=len-1
decrement from 05 : ix-- -> getpage, ix=len-1
decrement from 01 : ix-- -> ix>=0 -> use key at ix
decrement from 03 : ix-- -> <0 -> pop -> ix>=0 -> use key at ix
decrement from 09 : ix-- -> <0 -> pop -> ix<0 -> pop -> ix>=0 -> use key at ix
increment from 09 : ix++
increment from 10 : ix++ -> ix==len(index) -> pop: ix==-1 -> ix++ -> ix==0 -> use
increment from 11 : recurse, ix=0 -> use
increment from 08 : recurse, ix=-1 -> recurse, ix=0 -> use
increment from 07 : ix++ -> ix==len(index) -> pop, ix++ -> ix==len -> pop -> ix++ -> ix==0 -> use
"""
from __future__ import division, print_function, absolute_import, unicode_literals
# shape of the tree
# a <2,2> tree is basically like the tree pictured in the ascii art above.
TREEDEPTH = 2
NODEWIDTH = 2
def binary_search(a, k):
# c++: a.upperbound(k)--
first, last = 0, len(a)
while first < last:
mid = (first + last) >> 1
if k < a[mid].key:
last = mid
else:
first = mid + 1
return first - 1
class Entry(object):
"""
a key/value entry from a b-tree page
"""
def __init__(self, key, val):
self.key = key
self.val = val
def __repr__(self):
return "%s=%d" % (self.key, self.val)
class BasePage(object):
"""
BasePage has methods common to both leaf and index pages
"""
def __init__(self, kv):
self.index = []
for k, v in kv:
self.index.append(Entry(k, v))
def find(self, key):
i = binary_search(self.index, key)
if i < 0:
if self.isindex():
return ('recurse', -1)
return ('gt', 0)
if self.index[i].key == key:
return ('eq', i)
if self.isindex():
return ('recurse', i)
return ('lt', i)
def getkey(self, ix):
return self.index[ix].key
def getval(self, ix):
return self.index[ix].val
def isleaf(self):
return self.preceeding is None
def isindex(self):
return self.preceeding is not None
def __repr__(self):
return ("leaf" if self.isleaf() else ("index<%d>" % self.preceeding)) + repr(self.index)
class LeafPage(BasePage):
""" a leaf page in the b-tree """
def __init__(self, kv):
super(self.__class__, self).__init__(kv)
self.preceeding = None
class IndexPage(BasePage):
"""
An index page in the b-tree.
This page has a preceeding page plus several key+subpage pairs.
For each key+subpage: all keys in the subpage are greater than the key
"""
def __init__(self, preceeding, kv):
super(self.__class__, self).__init__(kv)
self.preceeding = preceeding
def getpage(self, ix):
return self.preceeding if ix < 0 else self.index[ix].val
class Cursor:
"""
A Cursor object represents a position in the b-tree.
It has methods for moving to the next or previous item.
And methods for retrieving the key and value of the current position
"""
def __init__(self, db, stack):
self.db = db
self.stack = stack
def next(self):
page, ix = self.stack.pop()
if page.isleaf():
# from leaf move towards root
ix += 1
while self.stack and ix == len(page.index):
page, ix = self.stack.pop()
ix += 1
if ix < len(page.index):
self.stack.append((page, ix))
else:
# from node move towards leaf
self.stack.append((page, ix))
page = self.db.readpage(page.getpage(ix))
while page.isindex():
ix = -1
self.stack.append((page, ix))
page = self.db.readpage(page.getpage(ix))
ix = 0
self.stack.append((page, ix))
self.verify()
def prev(self):
page, ix = self.stack.pop()
ix -= 1
if page.isleaf():
# move towards root, until non 'prec' item found
while self.stack and ix < 0:
page, ix = self.stack.pop()
if ix >= 0:
self.stack.append((page, ix))
else:
# move towards leaf
self.stack.append((page, ix))
while page.isindex():
page = self.db.readpage(page.getpage(ix))
ix = len(page.index) - 1
self.stack.append((page, ix))
self.verify()
def verify(self):
""" verify cursor state consistency """
if len(self.stack) == 3:
if not self.stack[-1][0].isleaf():
print("WARN no leaf")
elif len(self.stack) > 3:
print("WARN: stack too large")
if len(self.stack) >= 2:
if self.stack[0][0] == self.stack[1][0]:
print("WARN: identical index pages on stack")
if not self.stack[0][0].isindex():
print("WARN: expected root=index")
if not self.stack[1][0].isindex():
print("WARN: expected 2nd=index")
def eof(self):
return len(self.stack) == 0
def getkey(self):
page, ix = self.stack[-1]
return page.getkey(ix)
def getval(self):
page, ix = self.stack[-1]
return page.getval(ix)
def __repr__(self):
return "cursor:" + repr(self.stack)
class Btree:
"""
A B-tree implementation
"""
def __init__(self):
self.pages = []
self.generate(TREEDEPTH, NODEWIDTH)
def manual(self):
""" manually construct the ascii art tree """
for i in range(9):
self.pages.append(LeafPage((("%02d" % (3 * i), 0), ("%02d" % (3 * i + 1), 0))))
for i in range(3):
self.pages.append(IndexPage(3 * i, (("%02d" % (9 * i + 2), 3 * i + 1), ("%02d" % (9 * i + 5), 3 * i + 2))))
self.pages.append(IndexPage(9, (("08", 10), ("17", 11))))
self.rootindex = len(self.pages) - 1
def generate(self, depth, nodesize):
""" automatically generate the try in the ascii art above """
def namegen():
i = 0
while True:
yield "%03d" % i
i += 1
self.rootindex = self.construct(namegen(), depth, nodesize)
print("%d pages" % (len(self.pages)))
def construct(self, namegen, depth, nodesize):
if depth:
return self.createindex(namegen, depth, nodesize)
else:
return self.createleaf(namegen, nodesize)
def createindex(self, namegen, depth, nodesize):
page = IndexPage(self.construct(namegen, depth - 1, nodesize),
[(next(namegen), self.construct(namegen, depth - 1, nodesize)) for _ in range(nodesize)])
self.pages.append(page)
return len(self.pages) - 1
def createleaf(self, namegen, nodesize):
page = LeafPage([(next(namegen), 0) for _ in range(nodesize)])
self.pages.append(page)
return len(self.pages) - 1
def readpage(self, pn):
return self.pages[pn]
def find(self, key):
"""
Find a node in the tree, returns the cursor plus the reletion to the wanted key:
'eq' for equal, 'lt' when the found key is less than the wanted key,
or 'gt' when the found key is greater than the wanted key.
"""
page = self.readpage(self.rootindex)
stack = []
while True:
act, ix = page.find(key)
stack.append((page, ix))
if act != 'recurse':
break
page = self.readpage(page.getpage(ix))
return act, Cursor(self, stack)
def dumptree(self, pn, indent=0):
""" dump all nodes of the current b-tree """
page = self.readpage(pn)
print(" " * indent, page)
if page.isindex():
print(" " * indent, end="")
self.dumptree(page.preceeding, indent + 1)
for p in range(len(page.index)):
print(" " * indent, end="")
self.dumptree(page.getpage(p), indent + 1)
db = Btree()
print("<<")
db.dumptree(db.rootindex)
print(">>")
for i in range(NODEWIDTH * len(db.pages)):
print("--------- %03d" % i)
act, cursor = db.find("%03d" % i)
print("found", act, cursor.getkey(), cursor)
cursor.prev()
if not cursor.eof():
print("prev:", "..", cursor.getkey(), cursor)
else:
print("prev: EOF", cursor)
for i in range(NODEWIDTH * len(db.pages)):
print("--------- %03d" % i)
act, cursor = db.find("%03d" % i)
print("found", act, cursor.getkey(), cursor)
cursor.next()
if not cursor.eof():
print("next:", "..", cursor.getkey(), cursor)
else:
print("next: EOF", cursor)
for k in ('', '0', '1', '2', '3', '000', '010', '020', '100'):
print("--------- %s" % k)
act, cursor = db.find(k)
print(cursor)
print(act, cursor.getkey(), end=" next=")
cursor.next()
if cursor.eof():
print("EOF")
else:
print(cursor.getkey())
act, cursor = db.find("000")
print("get000", end=" ")
for i in range(NODEWIDTH * len(db.pages)):
cursor.next()
if cursor.eof():
print("EOF")
else:
print("-> %s" % cursor.getkey(), end=" ")
print()
act, cursor = db.find("025")
print("get025", end=" ")
for i in range(NODEWIDTH * len(db.pages)):
cursor.prev()
if cursor.eof():
print("EOF")
else:
print("-> %s" % cursor.getkey(), end=" ")
print()
================================================
FILE: tstbs.py
================================================
def binary_search(a, k):
# c++: a.upperbound(k)--
first, last = 0, len(a)
while first<last:
mid = (first+last)>>1
if k < a[mid]:
last = mid
else:
first = mid+1
return first-1
for x in range(8):
print(x, binary_search([2,3,5,6], x))
gitextract_dtt79ccf/ ├── LICENSE ├── README.md ├── idaunpack.py ├── idblib.py ├── idbtool.py ├── setup.cfg ├── test_idblib.py ├── tree-walking.py └── tstbs.py
SYMBOL INDEX (281 symbols across 6 files)
FILE: idaunpack.py
function dump_packed (line 12) | def dump_packed(data, wordsize, pattern):
function unhex (line 44) | def unhex(hextxt):
function main (line 47) | def main():
FILE: idblib.py
function cmp (line 69) | def cmp(a, b): return (a > b) - (a < b)
class cachedproperty (line 72) | class cachedproperty(object):
method __init__ (line 74) | def __init__(self, method):
method __get__ (line 77) | def __get__(self, obj, cls):
function strz (line 86) | def strz(b, o):
function makeStringIO (line 89) | def makeStringIO(data):
function nonefmt (line 103) | def nonefmt(fmt, item):
function hexdump (line 110) | def hexdump(data):
class FileSection (line 119) | class FileSection(object):
method __init__ (line 129) | def __init__(self, fh, start, end):
method read (line 137) | def read(self, size=None):
method seek (line 151) | def seek(self, offset, *args):
method tell (line 174) | def tell(self):
class IdaUnpacker (line 178) | class IdaUnpacker:
method __init__ (line 185) | def __init__(self, wordsize, data):
method eof (line 190) | def eof(self):
method have (line 192) | def have(self, n):
method nextword (line 195) | def nextword(self):
method nextwordsigned (line 206) | def nextwordsigned(self):
method next64 (line 224) | def next64(self):
method next16 (line 231) | def next16(self):
method next8 (line 263) | def next8(self):
method next32 (line 272) | def next32(self):
method bytes (line 312) | def bytes(self, n):
class IDBFile (line 323) | class IDBFile(object):
method __init__ (line 340) | def __init__(self, fh):
method getsectioninfo (line 401) | def getsectioninfo(self, i):
method getpart (line 434) | def getpart(self, ix):
method getsection (line 459) | def getsection(self, cls):
class RecoverIDBFile (line 466) | class RecoverIDBFile:
method __init__ (line 474) | def __init__(self, args, basepath, dbfiles):
method getsectioninfo (line 483) | def getsectioninfo(self, i):
method getpart (line 491) | def getpart(self, ix):
method getsection (line 500) | def getsection(self, cls):
function binary_search (line 506) | def binary_search(a, k):
class BaseIndexEntry (line 545) | class BaseIndexEntry(object):
method __init__ (line 553) | def __init__(self, data):
method __repr__ (line 565) | def __repr__(self):
class BaseLeafEntry (line 569) | class BaseLeafEntry(BaseIndexEntry):
method __init__ (line 579) | def __init__(self, key, data):
method __repr__ (line 584) | def __repr__(self):
class BTree (line 588) | class BTree(object):
class BasePage (line 594) | class BasePage(object):
method __init__ (line 603) | def __init__(self, data, entsize, entfmt):
method find (line 618) | def find(self, key):
method getpage (line 642) | def getpage(self, ix):
method getkey (line 646) | def getkey(self, ix):
method getval (line 650) | def getval(self, ix):
method isleaf (line 654) | def isleaf(self):
method isindex (line 658) | def isindex(self):
method __repr__ (line 662) | def __repr__(self):
class Page15 (line 668) | class Page15(BasePage):
class IndexEntry (line 670) | class IndexEntry(BaseIndexEntry):
method __init__ (line 671) | def __init__(self, key, data, ofs):
class LeafEntry (line 676) | class LeafEntry(BaseLeafEntry):
method __init__ (line 677) | def __init__(self, key, data, ofs):
method __init__ (line 683) | def __init__(self, data):
class Page16 (line 686) | class Page16(BasePage):
class IndexEntry (line 688) | class IndexEntry(BaseIndexEntry):
method __init__ (line 689) | def __init__(self, key, data, ofs):
class LeafEntry (line 694) | class LeafEntry(BaseLeafEntry):
method __init__ (line 695) | def __init__(self, key, data, ofs):
method __init__ (line 700) | def __init__(self, data):
class Page20 (line 703) | class Page20(BasePage):
class IndexEntry (line 705) | class IndexEntry(BaseIndexEntry):
method __init__ (line 706) | def __init__(self, key, data, ofs):
class LeafEntry (line 711) | class LeafEntry(BaseLeafEntry):
method __init__ (line 712) | def __init__(self, key, data, ofs):
method __init__ (line 717) | def __init__(self, data):
class Cursor (line 720) | class Cursor:
method __init__ (line 729) | def __init__(self, db, stack):
method next (line 733) | def next(self):
method prev (line 755) | def prev(self):
method eof (line 773) | def eof(self):
method getkey (line 776) | def getkey(self):
method getval (line 781) | def getval(self):
method __repr__ (line 786) | def __repr__(self):
method __init__ (line 789) | def __init__(self, fh):
method parseheader15 (line 812) | def parseheader15(self, data):
method parseheader16 (line 815) | def parseheader16(self, data):
method readpage (line 819) | def readpage(self, nr):
method find (line 823) | def find(self, rel, key):
method dump (line 870) | def dump(self):
method dumpfree (line 876) | def dumpfree(self):
method dumpindented (line 899) | def dumpindented(self, pn, indent=0):
method dumptree (line 913) | def dumptree(self, pn):
method pagedump (line 927) | def pagedump(self):
class ID0File (line 956) | class ID0File(object):
method __init__ (line 987) | def __init__(self, idb, fh):
method root (line 1025) | def root(self): return self.nodeByName("Root Node")
method idaver (line 1030) | def idaver(self): return self.int(self.root, 'A', -1)
method idbparams (line 1033) | def idbparams(self): return self.bytes(self.root, 'S', 0x41b994)
method idaverstr (line 1035) | def idaverstr(self): return self.string(self.root, 'S', 1303)
method nropens (line 1037) | def nropens(self): return self.int(self.root, 'A', -4)
method creationtime (line 1039) | def creationtime(self): return self.int(self.root, 'A', -2)
method originmd5 (line 1041) | def originmd5(self): return self.bytes(self.root, 'S', 1302)
method somecrc (line 1043) | def somecrc(self): return self.int(self.root, 'A', -5)
method prettykey (line 1045) | def prettykey(self, key):
method prettyval (line 1078) | def prettyval(self, val):
method nodeByName (line 1091) | def nodeByName(self, name):
method namekey (line 1102) | def namekey(self, name):
method makekey (line 1107) | def makekey(self, *args):
method decodekey (line 1130) | def decodekey(self, key):
method bytes (line 1159) | def bytes(self, *args):
method int (line 1169) | def int(self, *args):
method string (line 1188) | def string(self, *args):
method name (line 1194) | def name(self, id):
method blob (line 1208) | def blob(self, nodeid, tag, start=0, end=0xFFFFFFFF):
class ID1File (line 1228) | class ID1File(object):
class SegInfo (line 1237) | class SegInfo:
method __init__ (line 1238) | def __init__(self, startea, endea, offset):
method __init__ (line 1243) | def __init__(self, idb, fh):
method is32bit_heuristic (line 1289) | def is32bit_heuristic(self, fh, seglistofs):
method dump (line 1295) | def dump(self):
method find_segment (line 1309) | def find_segment(self, ea):
method getFlags (line 1315) | def getFlags(self, ea):
method firstSeg (line 1322) | def firstSeg(self):
method nextSeg (line 1325) | def nextSeg(self, ea):
method segStart (line 1333) | def segStart(self, ea):
method segEnd (line 1339) | def segEnd(self, ea):
class NAMFile (line 1346) | class NAMFile(object):
method __init__ (line 1350) | def __init__(self, idb, fh):
method dump (line 1382) | def dump(self):
method allnames (line 1385) | def allnames(self):
class SEGFile (line 1397) | class SEGFile(object):
method __init__ (line 1401) | def __init__(self, idb, fh):
class TILFile (line 1405) | class TILFile(object):
method __init__ (line 1409) | def __init__(self, idb, fh):
class ID2File (line 1414) | class ID2File(object):
method __init__ (line 1423) | def __init__(self, idb, fh):
class Struct (line 1427) | class Struct:
class Member (line 1436) | class Member:
method __init__ (line 1450) | def __init__(self, id0, spec):
method name (line 1459) | def name(self): return self._id0.name(self._nodeid)
method enumid (line 1461) | def enumid(self): return self._id0.int(self._nodeid, 'A', 11)
method stringtype (line 1463) | def stringtype(self): return self._id0.int(self._nodeid, 'A', 16)
method structid (line 1465) | def structid(self): return self._id0.int(self._nodeid, 'A', 3)
method comment (line 1467) | def comment(self, repeatable): return self._id0.string(self._nodeid,...
method ptrinfo (line 1469) | def ptrinfo(self): return self._id0.bytes(self._nodeid, 'S', 9)
method typeinfo (line 1471) | def typeinfo(self): return self._id0.bytes(self._nodeid, 'S', 0x3000)
method __init__ (line 1473) | def __init__(self, id0, nodeid):
method comment (line 1501) | def comment(self, repeatable): return self._id0.string(self._nodeid, '...
method name (line 1503) | def name(self): return self._id0.name(self._nodeid)
method __iter__ (line 1505) | def __iter__(self):
class Enum (line 1510) | class Enum:
class Member (line 1520) | class Member:
method __init__ (line 1526) | def __init__(self, id0, nodeid):
method value (line 1531) | def value(self): return self._id0.int(self._nodeid, 'A', -3)
method comment (line 1533) | def comment(self, repeatable): return self._id0.string(self._nodeid,...
method name (line 1535) | def name(self): return self._id0.name(self._nodeid)
method __init__ (line 1537) | def __init__(self, id0, nodeid):
method count (line 1542) | def count(self): return self._id0.int(self._nodeid, 'A', -1)
method representation (line 1544) | def representation(self): return self._id0.int(self._nodeid, 'A', -3)
method flags (line 1549) | def flags(self): return self._id0.int(self._nodeid, 'A', -5)
method comment (line 1552) | def comment(self, repeatable): return self._id0.string(self._nodeid, '...
method name (line 1554) | def name(self): return self._id0.name(self._nodeid)
method __iter__ (line 1556) | def __iter__(self):
class Bitfield (line 1565) | class Bitfield:
class Member (line 1566) | class Member:
method __init__ (line 1567) | def __init__(self, id0, nodeid):
method value (line 1572) | def value(self): return self._id0.int(self._nodeid, 'A', -3)
method mask (line 1574) | def mask(self): return self._id0.int(self._nodeid, 'A', -6) - 1
method comment (line 1576) | def comment(self, repeatable): return self._id0.string(self._nodeid,...
method name (line 1578) | def name(self): return self._id0.name(self._nodeid)
class Mask (line 1580) | class Mask:
method __init__ (line 1581) | def __init__(self, id0, nodeid, mask):
method comment (line 1587) | def comment(self, repeatable): return self._id0.string(self._nodeid,...
method name (line 1589) | def name(self): return self._id0.name(self._nodeid)
method __iter__ (line 1591) | def __iter__(self):
method __init__ (line 1603) | def __init__(self, id0, nodeid):
method count (line 1608) | def count(self): return self._id0.int(self._nodeid, 'A', -1)
method representation (line 1610) | def representation(self): return self._id0.int(self._nodeid, 'A', -3)
method flags (line 1612) | def flags(self): return self._id0.int(self._nodeid, 'A', -5)
method comment (line 1615) | def comment(self, repeatable): return self._id0.string(self._nodeid, '...
method name (line 1617) | def name(self): return self._id0.name(self._nodeid)
method __iter__ (line 1619) | def __iter__(self):
class IDBParams (line 1631) | class IDBParams:
method __init__ (line 1632) | def __init__(self, id0, data):
class Script (line 1667) | class Script:
method __init__ (line 1668) | def __init__(self, id0, nodeid):
method name (line 1673) | def name(self): return self._id0.string(self._nodeid, 'S', 0)
method language (line 1675) | def language(self): return self._id0.string(self._nodeid, 'S', 1)
method body (line 1677) | def body(self): return strz(self._id0.blob(self._nodeid, 'X'), 0)
class Segment (line 1679) | class Segment:
method __init__ (line 1683) | def __init__(self, id0, spec):
FILE: idbtool.py
function timestring (line 46) | def timestring(t):
function strz (line 52) | def strz(b, o):
function nonefmt (line 55) | def nonefmt(fmt, num):
function decryptuser (line 63) | def decryptuser(data):
function licensestring (line 80) | def licensestring(lic):
function dumpuser (line 155) | def dumpuser(id0):
function dumpinfo (line 212) | def dumpinfo(id0):
function dumpnames (line 256) | def dumpnames(args, id0, nam):
function dumpscript (line 261) | def dumpscript(id0, node):
function dumpstructmember (line 269) | def dumpstructmember(m):
function dumpstruct (line 294) | def dumpstruct(id0, node):
function dumpbitmember (line 305) | def dumpbitmember(m):
function dumpmask (line 307) | def dumpmask(m):
function dumpbitfield (line 311) | def dumpbitfield(id0, node):
function dumpenummember (line 317) | def dumpenummember(m):
function dumpenum (line 323) | def dumpenum(id0, node):
function dumpimport (line 337) | def dumpimport(id0, node):
function enumlist (line 363) | def enumlist(id0, listname, callback):
function listfuncdirs (line 392) | def listfuncdirs(id0):
function dumpfuncdir (line 408) | def dumpfuncdir(id0, dir_index, data):
function printent (line 480) | def printent(args, id0, c):
function createkey (line 487) | def createkey(args, id0, base, tag, ix):
function enumeratecursor (line 527) | def enumeratecursor(args, c, onerec, callback):
function id0query (line 547) | def id0query(args, id0, query):
function getsegs (line 579) | def getsegs(id0):
function listsegments (line 598) | def listsegments(id0):
function classifynodes (line 624) | def classifynodes(args, id0):
function processid0 (line 857) | def processid0(args, id0):
function hexascdumprange (line 877) | def hexascdumprange(id1, a, b):
function saverange (line 897) | def saverange(id1, a, b, fh):
function processid1 (line 911) | def processid1(args, id1):
function processid2 (line 927) | def processid2(args, id2):
function processnam (line 931) | def processnam(args, nam):
function processtil (line 935) | def processtil(args, til):
function processseg (line 939) | def processseg(args, seg):
function processidb (line 943) | def processidb(args, idb):
function processfile (line 981) | def processfile(args, filetypehint, fh):
function recover_database (line 1019) | def recover_database(args, basepath, dbfiles):
function DirEnumerator (line 1023) | def DirEnumerator(args, path):
function EnumeratePaths (line 1044) | def EnumeratePaths(args, paths):
function filetype_from_name (line 1065) | def filetype_from_name(fn):
function isv2name (line 1070) | def isv2name(name):
function isv3ext (line 1074) | def isv3ext(ext):
function xlatv2name (line 1078) | def xlatv2name(name):
function main (line 1091) | def main():
FILE: test_idblib.py
class TestFileSection (line 5) | class TestFileSection(unittest.TestCase):
method test_file (line 7) | def test_file(self):
class TestBinarySearch (line 29) | class TestBinarySearch(unittest.TestCase):
class Object (line 31) | class Object:
method __init__ (line 32) | def __init__(self, num):
method __repr__ (line 35) | def __repr__(self):
method test_bs (line 38) | def test_bs(self):
method test_emptylist (line 49) | def test_emptylist(self):
method test_oneelem (line 54) | def test_oneelem(self):
method test_twoelem (line 61) | def test_twoelem(self):
method test_listsize (line 70) | def test_listsize(self):
FILE: tree-walking.py
function binary_search (line 58) | def binary_search(a, k):
class Entry (line 70) | class Entry(object):
method __init__ (line 74) | def __init__(self, key, val):
method __repr__ (line 78) | def __repr__(self):
class BasePage (line 82) | class BasePage(object):
method __init__ (line 86) | def __init__(self, kv):
method find (line 91) | def find(self, key):
method getkey (line 103) | def getkey(self, ix):
method getval (line 106) | def getval(self, ix):
method isleaf (line 109) | def isleaf(self):
method isindex (line 112) | def isindex(self):
method __repr__ (line 115) | def __repr__(self):
class LeafPage (line 119) | class LeafPage(BasePage):
method __init__ (line 121) | def __init__(self, kv):
class IndexPage (line 126) | class IndexPage(BasePage):
method __init__ (line 132) | def __init__(self, preceeding, kv):
method getpage (line 136) | def getpage(self, ix):
class Cursor (line 140) | class Cursor:
method __init__ (line 147) | def __init__(self, db, stack):
method next (line 151) | def next(self):
method prev (line 174) | def prev(self):
method verify (line 193) | def verify(self):
method eof (line 209) | def eof(self):
method getkey (line 212) | def getkey(self):
method getval (line 216) | def getval(self):
method __repr__ (line 220) | def __repr__(self):
class Btree (line 224) | class Btree:
method __init__ (line 228) | def __init__(self):
method manual (line 232) | def manual(self):
method generate (line 241) | def generate(self, depth, nodesize):
method construct (line 253) | def construct(self, namegen, depth, nodesize):
method createindex (line 259) | def createindex(self, namegen, depth, nodesize):
method createleaf (line 265) | def createleaf(self, namegen, nodesize):
method readpage (line 270) | def readpage(self, pn):
method find (line 273) | def find(self, key):
method dumptree (line 289) | def dumptree(self, pn, indent=0):
FILE: tstbs.py
function binary_search (line 1) | def binary_search(a, k):
Condensed preview — 9 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (123K chars).
[
{
"path": "LICENSE",
"chars": 1091,
"preview": "MIT License\n\nCopyright (c) 2020 Willem Hengeveld <itsme@xs4all.nl>\n\nPermission is hereby granted, free of charge, to any"
},
{
"path": "README.md",
"chars": 5397,
"preview": "IDBTOOL\n=======\n\nA tool for extracting information from IDA databases.\n`idbtool` knows how to handle databases from all "
},
{
"path": "idaunpack.py",
"chars": 2132,
"preview": "\"\"\"\n`idaunpack` is a tool to aid in decoding packed data structures from an\nIDA idb or i64 database.\n\"\"\"\nfrom __future__"
},
{
"path": "idblib.py",
"chars": 57549,
"preview": "\"\"\"\nidblib - a module for reading hex-rays Interactive DisAssembler databases\n\nSupports database versions starting with "
},
{
"path": "idbtool.py",
"chars": 37577,
"preview": "#!/usr/bin/python3\n\"\"\"\nTool for querying information from Hexrays .idb and .i64 files\nwithout launching IDA.\n\nCopyright "
},
{
"path": "setup.cfg",
"chars": 34,
"preview": "[flake8]\nignore = E402,E501,E731\n\n"
},
{
"path": "test_idblib.py",
"chars": 2818,
"preview": "import unittest\nfrom idblib import FileSection, binary_search, makeStringIO\n\n\nclass TestFileSection(unittest.TestCase):\n"
},
{
"path": "tree-walking.py",
"chars": 10588,
"preview": "\"\"\"\nCopyright (c) 2016 Willem Hengeveld <itsme@xs4all.nl>\n\nExperiment in btree walking\n\n\n *-------->[0"
},
{
"path": "tstbs.py",
"chars": 300,
"preview": "def binary_search(a, k):\n # c++: a.upperbound(k)--\n first, last = 0, len(a)\n while first<last:\n mid = (f"
}
]
About this extraction
This page contains the full source code of the nlitsme/pyidbutil GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 9 files (114.7 KB), approximately 32.0k tokens, and a symbol index with 281 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.