Repository: rogerhu/gdb-heap
Branch: master
Commit: 8f7986c90754
Files: 24
Total size: 191.2 KB

Directory structure:
gitextract_kppz_9el/
├── .gitignore
├── ChangeLog.rst
├── LICENSE-lgpl-2.1.txt
├── LICENSE-python.txt
├── LICENSE.txt
├── README.md
├── gdbheap.py
├── heap/
│   ├── __init__.py
│   ├── commands.py
│   ├── compat.py
│   ├── cplusplus.py
│   ├── cpython.py
│   ├── glibc.py
│   ├── gobject.py
│   ├── history.py
│   ├── parser.py
│   ├── pypy.py
│   ├── query.py
│   └── sqlite.py
├── make-release.sh
├── object-sizes.py
├── resultparser.py
├── run-gdb-heap
└── selftest.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*~
*.pyc
*.pyo
gdb-heap-*.tar.bz2
test_*

================================================
FILE: ChangeLog.rst
================================================
==========
Change Log
==========

* Since glibc v2.15, there can now be multiple allocation arenas to support multi-threaded environments (http://stackoverflow.com/questions/10706466/how-does-malloc-work-in-a-multithreaded-environment). A new command, "heap arenas", lets you see how many arenas exist and their respective address locations.

================================================
FILE: LICENSE-lgpl-2.1.txt
================================================
GNU LESSER GENERAL PUBLIC LICENSE
Version 2.1, February 1999

Copyright (C) 1991, 1999 Free Software Foundation, Inc. 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. [This is the first released version of the Lesser GPL. It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.]

Preamble

The licenses for most software are designed to take away your freedom to share and change it.
By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below. When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things. To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights. We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library. 
To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others. Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances. 
For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License. In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system. Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library. The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run. GNU LESSER GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you". A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. 
The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does. 1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. 
You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. This option is useful when you wish to copy part of the code of the Library into a program that is not a library. 4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. 5. 
A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law. If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. 6. 
As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with. 
c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. 7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. 
b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License. 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. 
If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. 
The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Libraries If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License). To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. <one line to give the library's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. 
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Also add information on how to contact you by electronic and paper mail. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker. <signature of Ty Coon>, 1 April 1990 Ty Coon, President of Vice That's all there is to it! ================================================ FILE: LICENSE-python.txt ================================================ A. HISTORY OF THE SOFTWARE ========================== Python was created in the early 1990s by Guido van Rossum at Stichting Mathematisch Centrum (CWI, see http://www.cwi.nl) in the Netherlands as a successor of a language called ABC. Guido remains Python's principal author, although it includes many contributions from others. In 1995, Guido continued his work on Python at the Corporation for National Research Initiatives (CNRI, see http://www.cnri.reston.va.us) in Reston, Virginia where he released several versions of the software. In May 2000, Guido and the Python core development team moved to BeOpen.com to form the BeOpen PythonLabs team. In October of the same year, the PythonLabs team moved to Digital Creations (now Zope Corporation, see http://www.zope.com). In 2001, the Python Software Foundation (PSF, see http://www.python.org/psf/) was formed, a non-profit organization created specifically to own Python-related Intellectual Property. Zope Corporation is a sponsoring member of the PSF. All Python releases are Open Source (see http://www.opensource.org for the Open Source Definition). 
Historically, most, but not all, Python releases have also been GPL-compatible; the table below summarizes the various releases.

    Release         Derived     Year        Owner       GPL-
                    from                                compatible? (1)

    0.9.0 thru 1.2              1991-1995   CWI         yes
    1.3 thru 1.5.2  1.2         1995-1999   CNRI        yes
    1.6             1.5.2       2000        CNRI        no
    2.0             1.6         2000        BeOpen.com  no
    1.6.1           1.6         2001        CNRI        yes (2)
    2.1             2.0+1.6.1   2001        PSF         no
    2.0.1           2.0+1.6.1   2001        PSF         yes
    2.1.1           2.1+2.0.1   2001        PSF         yes
    2.2             2.1.1       2001        PSF         yes
    2.1.2           2.1.1       2002        PSF         yes
    2.1.3           2.1.2       2002        PSF         yes
    2.2.1           2.2         2002        PSF         yes
    2.2.2           2.2.1       2002        PSF         yes
    2.2.3           2.2.2       2003        PSF         yes
    2.3             2.2.2       2002-2003   PSF         yes
    2.3.1           2.3         2002-2003   PSF         yes
    2.3.2           2.3.1       2002-2003   PSF         yes
    2.3.3           2.3.2       2002-2003   PSF         yes
    2.3.4           2.3.3       2004        PSF         yes
    2.3.5           2.3.4       2005        PSF         yes
    2.4             2.3         2004        PSF         yes
    2.4.1           2.4         2005        PSF         yes
    2.4.2           2.4.1       2005        PSF         yes
    2.4.3           2.4.2       2006        PSF         yes
    2.4.4           2.4.3       2006        PSF         yes
    2.5             2.4         2006        PSF         yes
    2.5.1           2.5         2007        PSF         yes
    2.5.2           2.5.1       2008        PSF         yes
    2.5.3           2.5.2       2008        PSF         yes
    2.6             2.5         2008        PSF         yes
    2.6.1           2.6         2008        PSF         yes
    2.6.2           2.6.1       2009        PSF         yes
    2.6.3           2.6.2       2009        PSF         yes
    2.6.4           2.6.3       2009        PSF         yes
    2.6.5           2.6.4       2010        PSF         yes

Footnotes:

(1) GPL-compatible doesn't mean that we're distributing Python under the GPL. All Python licenses, unlike the GPL, let you distribute a modified version without making your changes open source. The GPL-compatible licenses make it possible to combine Python with other software that is released under the GPL; the others don't.

(2) According to Richard Stallman, 1.6.1 is not GPL-compatible, because its license has a choice of law clause. According to CNRI, however, Stallman's lawyer has told CNRI's lawyer that 1.6.1 is "not incompatible" with the GPL.

Thanks to the many outside volunteers who have worked under Guido's direction to make these releases possible.

B. TERMS AND CONDITIONS FOR ACCESSING OR OTHERWISE USING PYTHON
===============================================================

PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2
--------------------------------------------

1. 
This LICENSE AGREEMENT is between the Python Software Foundation ("PSF"), and the Individual or Organization ("Licensee") accessing and otherwise using this software ("Python") in source or binary form and its associated documentation. 2. Subject to the terms and conditions of this License Agreement, PSF hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python alone or in any derivative version, provided, however, that PSF's License Agreement and PSF's notice of copyright, i.e., "Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 Python Software Foundation; All Rights Reserved" are retained in Python alone or in any derivative version prepared by Licensee. 3. In the event Licensee prepares a derivative work that is based on or incorporates Python or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to Python. 4. PSF is making Python available to Licensee on an "AS IS" basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 6. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 7. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between PSF and Licensee. 
This License Agreement does not grant permission to use PSF trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party. 8. By copying, installing or otherwise using Python, Licensee agrees to be bound by the terms and conditions of this License Agreement. BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0 ------------------------------------------- BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1 1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the Individual or Organization ("Licensee") accessing and otherwise using this software in source or binary form and its associated documentation ("the Software"). 2. Subject to the terms and conditions of this BeOpen Python License Agreement, BeOpen hereby grants Licensee a non-exclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use the Software alone or in any derivative version, provided, however, that the BeOpen Python License is retained in the Software, alone or in any derivative version prepared by Licensee. 3. BeOpen is making the Software available to Licensee on an "AS IS" basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 5. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 6. 
This License Agreement shall be governed by and interpreted in all respects by the law of the State of California, excluding conflict of law provisions. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between BeOpen and Licensee. This License Agreement does not grant permission to use BeOpen trademarks or trade names in a trademark sense to endorse or promote products or services of Licensee, or any third party. As an exception, the "BeOpen Python" logos available at http://www.pythonlabs.com/logos.html may be used according to the permissions granted on that web page. 7. By copying, installing or otherwise using the software, Licensee agrees to be bound by the terms and conditions of this License Agreement. CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1 --------------------------------------- 1. This LICENSE AGREEMENT is between the Corporation for National Research Initiatives, having an office at 1895 Preston White Drive, Reston, VA 20191 ("CNRI"), and the Individual or Organization ("Licensee") accessing and otherwise using Python 1.6.1 software in source or binary form and its associated documentation. 2. Subject to the terms and conditions of this License Agreement, CNRI hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python 1.6.1 alone or in any derivative version, provided, however, that CNRI's License Agreement and CNRI's notice of copyright, i.e., "Copyright (c) 1995-2001 Corporation for National Research Initiatives; All Rights Reserved" are retained in Python 1.6.1 alone or in any derivative version prepared by Licensee. Alternately, in lieu of CNRI's License Agreement, Licensee may substitute the following text (omitting the quotes): "Python 1.6.1 is made available subject to the terms and conditions in CNRI's License Agreement. 
This Agreement together with Python 1.6.1 may be located on the Internet using the following unique, persistent identifier (known as a handle): 1895.22/1013. This Agreement may also be obtained from a proxy server on the Internet using the following URL: http://hdl.handle.net/1895.22/1013". 3. In the event Licensee prepares a derivative work that is based on or incorporates Python 1.6.1 or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to Python 1.6.1. 4. CNRI is making Python 1.6.1 available to Licensee on an "AS IS" basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 6. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 7. This License Agreement shall be governed by the federal intellectual property law of the United States, including without limitation the federal copyright law, and, to the extent such U.S. federal law does not apply, by the law of the Commonwealth of Virginia, excluding Virginia's conflict of law provisions. 
Notwithstanding the foregoing, with regard to derivative works based on Python 1.6.1 that incorporate non-separable material that was previously distributed under the GNU General Public License (GPL), the law of the Commonwealth of Virginia shall govern this License Agreement only as to issues arising under or with respect to Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between CNRI and Licensee. This License Agreement does not grant permission to use CNRI trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party. 8. By clicking on the "ACCEPT" button where indicated, or by copying, installing or otherwise using Python 1.6.1, Licensee agrees to be bound by the terms and conditions of this License Agreement. ACCEPT CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2 -------------------------------------------------- Copyright (c) 1991 - 1995, Stichting Mathematisch Centrum Amsterdam, The Netherlands. All rights reserved. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Stichting Mathematisch Centrum or CWI not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. 
STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ================================================ FILE: LICENSE.txt ================================================ gdb-heap is licensed under the LGPLv2.1, with the exception of heap/python.py, which is licensed under the PSF license. ================================================ FILE: README.md ================================================ gdb-heap ======== Original fork derived from `https://fedorahosted.org/gdb-heap/`. This repo is now considered the official repository for the gdb-heap library. Installation instructions ------------------------- 1. To get this module working with Ubuntu 16.04, make sure you have the following packages installed: ``` sudo apt-get install libc6-dev libc6-dbg python-gi libglib2.0-0-dbg python-ply ``` The original forked version assumes an "import gdb" module, which resides in "/usr/share/glib-2.0/gdb" as part of the `libglib2.0-0-dbg` package. In earlier versions of Ubuntu, this library is located in the `libglib2.0-dev` package. There is also a conflict with the python-gobject-2 package, which provides deprecated Python bindings for the GObject library. This package installs a glib/ directory into /usr/lib/python2.7/dist-packages (e.g. /usr/lib/python2.7/dist-packages/glib/option.py), on which many Gtk-related modules depend. You therefore need to make sure that /usr/share/glib-2.0/gdb appears first on sys.path (see code example).
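The sys.path precedence described above is easy to demonstrate outside of gdb. The snippet below is a minimal, self-contained sketch; the temporary directory names and the `whichglib` module are invented stand-ins for /usr/share/glib-2.0/gdb and the conflicting python-gobject-2 copy. Python imports a module from whichever matching sys.path entry comes first, which is why the glib gdb-helpers directory must be inserted at position 0.

```python
# Minimal sketch of why sys.path order matters; the directory names and
# the "whichglib" module are invented stand-ins, not the real packages.
import os
import sys
import tempfile

def make_module(root, body):
    os.makedirs(root, exist_ok=True)
    with open(os.path.join(root, "whichglib.py"), "w") as f:
        f.write(body)

tmp = tempfile.mkdtemp()
helpers = os.path.join(tmp, "glib-gdb-helpers")   # stand-in for /usr/share/glib-2.0/gdb
dist = os.path.join(tmp, "dist-packages")         # stand-in for the conflicting copy

make_module(helpers, "ORIGIN = 'gdb helpers'\n")
make_module(dist, "ORIGIN = 'python-gobject-2'\n")

sys.path.append(dist)         # dist-packages is typically already on sys.path
sys.path.insert(0, helpers)   # put the gdb helpers first

import whichglib

# The first matching entry on sys.path wins:
print(whichglib.ORIGIN)
```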
You'll also want to install python-dbg, since that package comes with the debugging symbols for the stock Python 2.7, as well as a python-dbg binary compiled with the --with-pydebug option that will only work with C extension modules compiled against the /usr/include/python2.7_d headers. NOTE: The Python binary that accompanies Ubuntu distributions is built with link-time optimization. As a result, many of the Python data structures are optimized out, which prevents gdb-heap from properly categorizing them. To take advantage of this capability, you will need to download the Python source and recompile without the -flto option in the CFLAGS/LDFLAGS configuration options. The standard configure script does not normally enable this option, so simply compiling should do the trick. (If you want to have SSL support in this binary, make sure to edit Modules/Setup.dist). The python-dbg binary is compiled with the Py_TRACE_REFS conditional via the --with-pydebug option, which modifies the internal Python data structures and adds two pointers to every base PyObject, preventing previously compiled C extensions from being used. Using your own compiled version of Python is therefore the way to go if you want to take advantage of the categorize features of gdb-heap and/or inspect the internal memory structures of Python. 2. Create a file that will help automate the loading of the gdbheap library: gdb-heap-commands: ``` python import sys sys.path.insert(0, "/usr/share/glib-2.0/gdb") sys.path.append("/usr/share/glib-2.0/gdb") sys.path.append("/home/rhu/projects/gdb-heap") import gdbheap end ``` To attach to an existing process, you can execute as follows: ```bash sudo gdb -p 7458 -x ~/gdb-heap-commands ``` To take a core dump of a process, you can do the following: ``` 1) sudo gdb -p <pid> 2) Type "generate-core-file" at the GDB prompt. 3) Wait awhile (and be careful not to hit enter again, since that will repeat the same command) 4) Copy the core file somewhere.
``` You can then use gdb to attach to this core file: ```bash sudo gdb python <core file> -x ~/gdb-heap-commands ``` Commands to run --------------- ``` heap - print a report on memory usage, by category heap sizes - print a report on memory usage, by sizes heap used - print used heap chunks heap free - print free heap chunks heap all - print all heap chunks heap log - print a log of recorded heap states heap label - record the current state of the heap for later comparison heap diff - compare two states of the heap heap select - query used heap chunks hexdump [-c] - print a hexdump, starting at the specified region of memory (show hex characters with the -c option) heap arenas - print glibc arenas heap arena - select glibc arena number ``` Useful resources ---------------- * http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2011-dude-where-s-my-ram-a-deep-dive-into-how-python-uses-memory-4896725 (Dude - Where's My RAM? A deep dive into how Python uses memory - David Malcolm's PyCon 2011 video talk) * http://dmalcolm.fedorapeople.org/presentations/PyCon-US-2011/GdbPythonPresentation/GdbPython.html (David Malcolm's PyCon 2011 slides) * http://code.woboq.org/userspace/glibc/malloc/malloc.c.html (glibc malloc.c implementation) * Malloc per-thread arenas in glibc (http://siddhesh.in/journal/2012/10/24/malloc-per-thread-arenas-in-glibc/) * Understanding the heap by breaking it (http://www.blackhat.com/presentations/bh-usa-07/Ferguson/Whitepaper/bh-usa-07-ferguson-WP.pdf) * Building your own Python version for an easier debugging experience (http://hustoknow.blogspot.com/2014/06/how-to-troubleshoot-your-python.html) ================================================ FILE: gdbheap.py ================================================ # Copyright (C) 2010 David Hugh Malcolm # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or
(at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA from heap.commands import register_commands # Register the commands with gdb: register_commands() ================================================ FILE: heap/__init__.py ================================================ # Copyright (C) 2010 David Hugh Malcolm # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. 
# # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA from collections import namedtuple try: import gdb # We defer most type lookups to when they're needed, since they'll fail if the # DWARF data for the relevant DSO hasn't been loaded yet, which is typically # the case for an executable dynamically linked against glibc type_void_ptr = gdb.lookup_type('void').pointer() type_char_ptr = gdb.lookup_type('char').pointer() type_unsigned_char_ptr = gdb.lookup_type('unsigned char').pointer() sizeof_ptr = type_void_ptr.sizeof if sizeof_ptr == 4: def fmt_addr(addr): return '0x%08x' % addr else: # Assume 64-bit: def fmt_addr(addr): return '0x%016x' % addr except ImportError: # Support importing heap.parser from outside gdb pass class WrongInferiorProcess(RuntimeError): def __init__(self, hint): self.hint = hint NUM_HEXDUMP_BYTES = 20 __type_cache = {} def caching_lookup_type(typename): '''Adds caching to gdb.lookup_type(), whilst still raising RuntimeError if the type isn't found.''' if typename in __type_cache: gdbtype = __type_cache[typename] if gdbtype: return gdbtype raise RuntimeError('(cached) Could not find type "%s"' % typename) try: if 0: print('type cache miss: %r' % typename) gdbtype = gdb.lookup_type(typename).strip_typedefs() except RuntimeError as e: # did not find the type: add a None to the cache gdbtype = None __type_cache[typename] = gdbtype if gdbtype: return gdbtype raise RuntimeError('Could not find type "%s"' % typename) def array_length(_gdbval): '''Given a gdb.Value that's an array, determine the number of elements in the array''' arr_size = _gdbval.type.sizeof elem_size = _gdbval[0].type.sizeof return arr_size // elem_size def offsetof(typename, fieldname): '''Get the offset (in bytes) from the start of the given type to the given field''' # This is a transliteration to gdb's python API of: #
(int)(void*)&((#typename*)NULL)->#fieldname) t = caching_lookup_type(typename).pointer() v = gdb.Value(0) v = v.cast(t) field = v[fieldname].cast(type_void_ptr) return int(field.address) class MissingDebuginfo(RuntimeError): def __init__(self, module): self.module = module def check_missing_debuginfo(err, module): assert(isinstance(err, RuntimeError)) if err.args[0] == 'Attempt to extract a component of a value that is not a (null).': # Then we likely are trying to extract a field from a struct but don't # have the DWARF description of the fields of the struct loaded: raise MissingDebuginfo(module) class WrappedValue(object): """ Base class, wrapping an underlying gdb.Value adding various useful methods, and allowing subclassing """ def __init__(self, gdbval): self._gdbval = gdbval # __getattr__ just made it too confusing #def __getattr__(self, attr): # return WrappedValue(self.val[attr]) def field(self, attr): return self._gdbval[attr] def __str__(self): return str(self._gdbval) # See http://sourceware.org/gdb/onlinedocs/gdb/Values-From-Inferior.html#Values-From-Inferior @property def address(self): return self._gdbval.address @property def is_optimized_out(self): return self._gdbval.is_optimized_out @property def type(self): return self._gdbval.type @property def dynamic_type(self): return self._gdbval.dynamic_type @property def is_lazy(self): return self._gdbval.is_lazy def dereference(self): return WrappedValue(self._gdbval.dereference()) # def address(self): # return int(self._gdbval.cast(type_void_ptr)) def is_null(self): return int(self._gdbval) == 0 class WrappedPointer(WrappedValue): def as_address(self): return int(self._gdbval.cast(type_void_ptr)) def __str__(self): return ('<%s for inferior 0x%x>' % (self.__class__.__name__, self.as_address() ) ) def cast(self, type_): return WrappedPointer(self._gdbval.cast(type_)) def categorize_refs(self, usage_set, level=0, detail=None): '''Hook for categorizing references known by the type this points to''' # do 
nothing by default: pass def fmt_size(size): ''' Pretty-formatting of numeric values: return a string, subdividing the digits into groups of three, using commas ''' s = str(size) result = '' while len(s)>3: result = ',' + s[-3:] + result s = s[0:-3] result = s + result return result def as_hexdump_char(b): '''Given a byte, return a string for use by hexdump, converting non-printable/non-ASCII values as a period''' if b>=0x20 and b < 0x80: return chr(b) else: return '.' def sign(amt): if amt >= 0: return '+' else: return '' # the '-' sign will come from the numeric repr class Category(namedtuple('Category', ('domain', 'kind', 'detail'))): ''' Categorization of an in-use area of memory domain: high-level grouping e.g. "python", "C++", etc kind: type information, appropriate to the domain e.g. a class/type Domain Meaning of 'kind' ------ ----------------- 'C++' the C++ class 'python' the python class 'cpython' C structure/type (implementation detail within Python) 'pyarena' Python memory allocator detail: additional detail ''' def __new__(_cls, domain, kind, detail=None): return tuple.__new__(_cls, (domain, kind, detail)) def __str__(self): return '%s:%s:%s' % (self.domain, self.kind, self.detail) class Usage(object): # Information about an in-use area of memory slots = ('start', 'size', 'category', 'level', 'hd', 'obj') def __init__(self, start, size, category=None, level=None, hd=None, obj=None): assert isinstance(start, int) assert isinstance(size, int) if category: assert isinstance(category, Category) self.start = start self.size = size self.category = category self.level = level self.hd = hd self.obj = obj def __repr__(self): result = 'Usage(%s, %s' % (hex(self.start), hex(self.size)) if self.category: result += ', %r' % (self.category, ) if self.hd: result += ', hd=%r' % self.hd if self.obj: result += ', obj=%r' % self.obj return result + ')' def ensure_category(self, usage_set=None): if self.category is None: self.category = categorize(self, usage_set) def 
ensure_hexdump(self): if self.hd is None: self.hd = hexdump_as_bytes(self.start, NUM_HEXDUMP_BYTES) def hexdump_as_bytes(addr, size, chars_only=True): addr = gdb.Value(addr).cast(type_unsigned_char_ptr) bytebuf = [] for j in range(size): ptr = addr + j b = int(ptr.dereference()) bytebuf.append(b) result = '' if not chars_only: result += ' '.join(['%02x' % b for b in bytebuf]) + ' |' result += ''.join([as_hexdump_char(b) for b in bytebuf]) result += '|' return (result) def hexdump_as_int(addr, count): addr = gdb.Value(addr).cast(caching_lookup_type('unsigned long').pointer()) bytebuf = [] longbuf = [] for j in range(count): ptr = addr + j long = ptr.dereference() longbuf.append(long) bptr = gdb.Value(ptr).cast(type_unsigned_char_ptr) for i in range(sizeof_ptr): bytebuf.append(int((bptr + i).dereference())) return (' '.join([fmt_addr(int(long)) for long in longbuf]) + ' |' + ''.join([as_hexdump_char(b) for b in bytebuf]) + '|') class Table(object): '''A table of text/numbers that knows how to print itself''' def __init__(self, columnheadings=None, rows=[]): self.numcolumns = len(columnheadings) self.columnheadings = columnheadings self.rows = [] self._colsep = ' ' def add_row(self, row): assert len(row) == self.numcolumns self.rows.append(row) def write(self, out): colwidths = self._calc_col_widths() self._write_row(out, colwidths, self.columnheadings) self._write_separator(out, colwidths) for row in self.rows: self._write_row(out, colwidths, row) def _calc_col_widths(self): result = [] for colIndex in range(self.numcolumns): result.append(self._calc_col_width(colIndex)) return result def _calc_col_width(self, idx): cells = [str(row[idx]) for row in self.rows] heading = self.columnheadings[idx] return max([len(c) for c in (cells + [heading])]) def _write_row(self, out, colwidths, values): for i, (value, width) in enumerate(zip(values, colwidths)): if i > 0: out.write(self._colsep) formatString = "%%%ds" % width # to generate e.g.
"%20s" out.write(formatString % value) out.write('\n') def _write_separator(self, out, colwidths): for i, width in enumerate(colwidths): if i > 0: out.write(self._colsep) out.write('-' * width) out.write('\n') class UsageSet(object): def __init__(self, usage_list): self.usage_list = usage_list # Ensure we can do fast lookups: self.usage_by_address = dict([(int(u.start), u) for u in usage_list]) def set_addr_category(self, addr, category, level=0, visited=None, debug=False): '''Attempt to mark the given address as being of the given category, whilst maintaining a set of addresses already visited, to try to stop infinite graph traversal''' if visited: if addr in visited: if debug: print('addr 0x%x already visited (for category %r)' % (addr, category)) return False visited.add(addr) if addr in self.usage_by_address: if debug: print('addr 0x%x found (for category %r, level=%i)' % (addr, category, level)) u = self.usage_by_address[addr] # Bail if we already have a more detailed categorization for the # address: if level <= u.level: if debug: print ('addr 0x%x already has category %r (level %r)' % (addr, u.category, u.level)) return False u.category = category u.level = level return True else: if debug: print('addr 0x%x not found (for category %r)' % (addr, category)) class PythonCategorizer(object): ''' Logic for categorizing buffers owned by Python objects. (Done as an object to capture the type-lookup state) ''' def __init__(self): '''This will raise a RuntimeError if the types aren't available (e.g.
not a python app, or debuginfo not available''' self._type_PyDictObject_ptr = caching_lookup_type('PyDictObject').pointer() self._type_PyListObject_ptr = caching_lookup_type('PyListObject').pointer() self._type_PySetObject_ptr = caching_lookup_type('PySetObject').pointer() self._type_PyUnicodeObject_ptr = caching_lookup_type('PyUnicodeObject').pointer() self._type_PyCodeObject_ptr = caching_lookup_type('PyCodeObject').pointer() self._type_PyGC_Head = caching_lookup_type('PyGC_Head') @classmethod def make(cls): '''Try to make a PythonCategorizer, if debuginfo is available; otherwise return None''' try: return cls() except RuntimeError: return None def categorize(self, u, usage_set): '''Try to categorize a Usage instance within an UsageSet (which could lead to further categorization)''' c = u.category if c.domain != 'python': return False if u.obj: if u.obj.categorize_refs(usage_set): return True if c.kind == 'list': list_ptr = gdb.Value(u.start + self._type_PyGC_Head.sizeof).cast(self._type_PyListObject_ptr) ob_item = int(list_ptr['ob_item']) usage_set.set_addr_category(ob_item, Category('cpython', 'PyListObject ob_item table', None)) return True elif c.kind == 'set': set_ptr = gdb.Value(u.start + self._type_PyGC_Head.sizeof).cast(self._type_PySetObject_ptr) table = int(set_ptr['table']) usage_set.set_addr_category(table, Category('cpython', 'PySetObject setentry table', None)) return True if c.kind == 'code': # Python 2.6's PyCode_Type doesn't have Py_TPFLAGS_HAVE_GC: code_ptr = gdb.Value(u.start).cast(self._type_PyCodeObject_ptr) co_code = int(code_ptr['co_code']) usage_set.set_addr_category(co_code, Category('python', 'str', 'bytecode'), # FIXME: on py3k this should be bytes level=1) return True elif c.kind == 'sqlite3.Statement': ptr_type = caching_lookup_type('pysqlite_Statement').pointer() obj_ptr = gdb.Value(u.start).cast(ptr_type) #print obj_ptr.dereference() from heap.sqlite import categorize_sqlite3 for fieldname, catname, fn in (('db', 'sqlite3', 
categorize_sqlite3), ('st', 'sqlite3_stmt', None)): field_ptr = int(obj_ptr[fieldname]) # sqlite's src/mem1.c adds a sqlite3_int64 (size) to the front # of the allocation, so we need to look 8 bytes earlier to find # the malloc-ed region: malloc_ptr = field_ptr - 8 # print u, fieldname, category, field_ptr if usage_set.set_addr_category(malloc_ptr, Category('sqlite3', catname)): if fn: fn(field_ptr, usage_set, set()) return True elif c.kind == 'rpm.hdr': ptr_type = caching_lookup_type('struct hdrObject_s').pointer() if ptr_type: obj_ptr = gdb.Value(u.start).cast(ptr_type) # print obj_ptr.dereference() h = obj_ptr['h'] if usage_set.set_addr_category(int(h), Category('rpm', 'Header', None)): blob = h['blob'] usage_set.set_addr_category(int(blob), Category('rpm', 'Header blob', None)) elif c.kind == 'rpm.mi': ptr_type = caching_lookup_type('struct rpmmiObject_s').pointer() if ptr_type: obj_ptr = gdb.Value(u.start).cast(ptr_type) print(obj_ptr.dereference()) mi = obj_ptr['mi'] if usage_set.set_addr_category(int(mi), Category('rpm', 'rpmdbMatchIterator', None)): pass #blob = h['blob'] #usage_set.set_addr_category(int(blob), 'rpm Header blob') # Not categorized: return False def _get_register_state(): from heap.compat import execute return execute('thread apply all info registers') __cached_usage_list = None __cached_reg_state = None def lazily_get_usage_list(): '''Lazily do a full-graph categorization, getting a list of Usage instances''' global __cached_usage_list global __cached_reg_state reg_state = _get_register_state() # print 'reg_state', reg_state if __cached_usage_list and __cached_reg_state: # Verify that the inferior process hasn't changed state since the cache # was populated.
# Something of a hack: verify that all registers have the same values: if reg_state == __cached_reg_state: # We can use the cache: # print 'USING THE CACHE' return __cached_usage_list # print 'REGENERATING THE CACHE' # Do the work: usage_list = list(iter_usage_with_progress()) categorize_usage_list(usage_list) # Update the cache: __cached_usage_list = usage_list __cached_reg_state = reg_state return __cached_usage_list def categorize_usage_list(usage_list): '''Do a "full-graph" categorization of the given list of Usage instances For example, if p is a (PyDictObject*), then mark p->ma_table and p->ma_mask accordingly ''' usage_set = UsageSet(usage_list) visited = set() # Precompute some types, if available: pycategorizer = PythonCategorizer.make() for u in ProgressNotifier(iter(usage_list), 'Blocks analyzed'): # Cover the simple cases, where the category can be figured out directly: u.ensure_category(usage_set) # Cross-references: if u.obj: if u.obj.categorize_refs(usage_set): continue # Try to categorize buffers used by python objects: if pycategorizer: if pycategorizer.categorize(u, usage_set): continue from heap.cpython import python_categorization python_categorization(usage_set) def categorize(u, usage_set): '''Given an in-use block, try to guess what it's being used for If usage_set is provided, this categorization may lead to further categorizations''' from heap.cpython import as_python_object, obj_addr_to_gc_addr addr, size = u.start, u.size pyop = as_python_object(addr) if pyop: u.obj = pyop try: return pyop.categorize() except (RuntimeError, UnicodeEncodeError, UnicodeDecodeError): # If something went wrong, assume that this wasn't really a python # object, and fall through: print("couldn't categorize pyop:", pyop) pass # PyPy detection: from heap.pypy import pypy_categorizer cat = pypy_categorizer(addr, size) if cat: return cat # C++ detection: only enabled if we can capture "execute"; there seems to # be a bad interaction between pagination and 
redirection: all output from # "heap" disappears in the fallback form of execute, unless we "set pagination off" from heap.compat import has_gdb_execute_to_string # Disable for now, see https://bugzilla.redhat.com/show_bug.cgi?id=620930 if False: # has_gdb_execute_to_string: from heap.cplusplus import get_class_name cpp_cls = get_class_name(addr, size) if cpp_cls: return Category('C++', cpp_cls) # GObject detection: from heap.gobject import as_gtype_instance ginst = as_gtype_instance(addr, size) if ginst: u.obj = ginst return ginst.categorize() s = as_nul_terminated_string(addr, size) if s and len(s) > 2: return Category('C', 'string data') # Uncategorized: return Category('uncategorized', '', '%s bytes' % size) def as_nul_terminated_string(addr, size): # Does this look like a NUL-terminated string? ptr = gdb.Value(addr).cast(type_char_ptr) try: s = ptr.string(encoding='ascii') return s except (RuntimeError, UnicodeDecodeError): # Probably not string data: return None class ProgressNotifier(object): '''Wrap an iterable with progress notification to stdout''' def __init__(self, inner, msg): self.inner = inner self.count = 0 self.msg = msg def __iter__(self): return self def __next__(self): self.count += 1 if 0 == self.count % 10000: print(self.msg, self.count) return self.inner.__next__() def iter_usage_with_progress(): return ProgressNotifier(iter_usage(), 'Blocks retrieved') class CachedInferiorState(object): """ Cached state containing information scraped from the inferior process """ def __init__(self): self._arena_detectors = [] def add_arena_detector(self, detector): self._arena_detectors.append(detector) def detect_arena(self, ptr, chunksize): '''Detect if this ptr returned by malloc is in use by any of the layered allocation schemes, returning arena object if it is, None if not''' for detector in self._arena_detectors: arena = detector.as_arena(ptr, chunksize) if arena: return arena # Not found: return None def iter_usage(): # Iterate through glibc, and 
within that, within Python arena blocks, as appropriate from heap.glibc import glibc_arenas ms = glibc_arenas.get_ms() cached_state = CachedInferiorState() from heap.cpython import ArenaDetection as CPythonArenaDetection, PyArenaPtr, ArenaObject try: cpython_arenas = CPythonArenaDetection() cached_state.add_arena_detector(cpython_arenas) except WrongInferiorProcess: pass from heap.pypy import ArenaDetection as PyPyArenaDetection try: pypy_arenas = PyPyArenaDetection() cached_state.add_arena_detector(pypy_arenas) except WrongInferiorProcess: pass for i, chunk in enumerate(ms.iter_mmap_chunks()): mem_ptr = chunk.as_mem() chunksize = chunk.chunksize() arena = cached_state.detect_arena(mem_ptr, chunksize) if arena: for u in arena.iter_usage(): yield u else: yield Usage(int(mem_ptr), chunksize) for chunk in ms.iter_sbrk_chunks(): mem_ptr = chunk.as_mem() chunksize = chunk.chunksize() if chunk.is_inuse(): arena = cached_state.detect_arena(mem_ptr, chunksize) if arena: for u in arena.iter_usage(): yield u else: yield Usage(int(mem_ptr), chunksize) def looks_like_ptr(value): '''Does this gdb.Value pointer's value look reasonable? For use when casting a block of memory to a structure with pointer fields within that block of memory. ''' # NULL is acceptable; assume that it's 0 on every arch we care about if value == 0: return True # Assume that pointers aren't allocated in the bottom 1MB of a process' # address space: if value < (1024 * 1024): return False # Assume that if it got this far, that it's valid: return True ================================================ FILE: heap/commands.py ================================================ # Copyright (C) 2010 David Hugh Malcolm # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

import gdb
import re
import sys

from heap.glibc import glibc_arenas
from heap.history import history, Snapshot, Diff

from heap import lazily_get_usage_list, \
    fmt_size, fmt_addr, \
    categorize, categorize_usage_list, Usage, \
    hexdump_as_bytes, \
    Table, \
    MissingDebuginfo

def need_debuginfo(f):
    def g(self, args, from_tty):
        try:
            return f(self, args, from_tty)
        except MissingDebuginfo as e:
            print('Missing debuginfo for %s' % e.module)
            print('Suggested fix:')
            print(' debuginfo-install %s' % e.module)
    return g

class Heap(gdb.Command):
    'Print a report on memory usage, by category'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap",
                              gdb.COMMAND_DATA,
                              prefix=True)

    @need_debuginfo
    def invoke(self, args, from_tty):
        total_by_category = {}
        count_by_category = {}
        total_size = 0
        total_count = 0
        try:
            usage_list = list(lazily_get_usage_list())
            for u in usage_list:
                u.ensure_category()
                total_size += u.size
                if u.category in total_by_category:
                    total_by_category[u.category] += u.size
                else:
                    total_by_category[u.category] = u.size
                total_count += 1
                if u.category in count_by_category:
                    count_by_category[u.category] += 1
                else:
                    count_by_category[u.category] = 1
        except KeyboardInterrupt:
            pass # FIXME

        t = Table(['Domain', 'Kind', 'Detail', 'Count', 'Allocated size'])
        for category in sorted(total_by_category.keys(),
                               key=total_by_category.get,
                               reverse=True):
            detail = category.detail
            if not detail:
                detail = ''
            t.add_row([category.domain,
                       category.kind,
                       detail,
                       fmt_size(count_by_category[category]),
                       fmt_size(total_by_category[category]),
                       ])
        t.add_row(['', '', 'TOTAL', fmt_size(total_count), fmt_size(total_size)])
        t.write(sys.stdout)
        print()

class HeapSizes(gdb.Command):
    'Print a report on memory usage, by sizes'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap sizes",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        ms = glibc_arenas.get_ms()
        chunks_by_size = {}
        num_chunks = 0
        total_size = 0
        try:
            for chunk in ms.iter_chunks():
                if not chunk.is_inuse():
                    continue
                size = int(chunk.chunksize())
                num_chunks += 1
                total_size += size
                if size in chunks_by_size:
                    chunks_by_size[size] += 1
                else:
                    chunks_by_size[size] = 1
        except KeyboardInterrupt:
            pass # FIXME

        t = Table(['Chunk size', 'Num chunks', 'Allocated size'])
        for size in sorted(chunks_by_size.keys(),
                           key=lambda s1: chunks_by_size[s1] * s1,
                           reverse=True):
            t.add_row([fmt_size(size),
                       chunks_by_size[size],
                       fmt_size(chunks_by_size[size] * size)])
        t.add_row(['TOTALS', num_chunks, fmt_size(total_size)])
        t.write(sys.stdout)
        print()

class HeapUsed(gdb.Command):
    'Print used heap chunks'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap used",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        print('Used chunks of memory on heap')
        print('-----------------------------')
        ms = glibc_arenas.get_ms()
        for i, chunk in enumerate(ms.iter_chunks()):
            if not chunk.is_inuse():
                continue
            size = chunk.chunksize()
            mem = chunk.as_mem()
            u = Usage(mem, size)
            category = categorize(u, None)
            hd = hexdump_as_bytes(mem, 32)
            print ('%6i: %s -> %s %8i bytes %20s |%s'
                   % (i,
                      fmt_addr(chunk.as_mem()),
                      fmt_addr(chunk.as_mem()+size-1),
                      size, category, hd))
        print()

class HeapFree(gdb.Command):
    'Print free heap chunks'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap free",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        print('Free chunks of memory on heap')
        print('-----------------------------')
        ms = glibc_arenas.get_ms()
        total_size = 0
        for i, chunk in enumerate(ms.iter_free_chunks()):
            size = chunk.chunksize()
            total_size += size
            mem = chunk.as_mem()
            u = Usage(mem, size)
            category = categorize(u, None)
            hd = hexdump_as_bytes(mem, 32)
            print ('%6i: %s -> %s %8i bytes %20s |%s'
                   % (i,
                      fmt_addr(chunk.as_mem()),
                      fmt_addr(chunk.as_mem()+size-1),
                      size, category, hd))
        print("Total size: %s" % total_size)

class HeapAll(gdb.Command):
    'Print all heap chunks'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap all",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        print('All chunks of memory on heap (both used and free)')
        print('-------------------------------------------------')
        ms = glibc_arenas.get_ms()
        for i, chunk in enumerate(ms.iter_chunks()):
            size = chunk.chunksize()
            if chunk.is_inuse():
                kind = ' inuse'
            else:
                kind = ' free'
            print ('%i: %s -> %s %s: %i bytes (%s)'
                   % (i,
                      fmt_addr(chunk.as_address()),
                      fmt_addr(chunk.as_address()+size-1),
                      kind, size, chunk))
        print()

class HeapLog(gdb.Command):
    'Print a log of recorded heap states'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap log",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        h = history
        if len(h.snapshots) == 0:
            print('(no history)')
            return
        for i in range(len(h.snapshots), 0, -1):
            s = h.snapshots[i-1]
            print('Label %i "%s" at %s' % (i, s.name, s.time))
            print(' ', s.summary())
            if i > 1:
                prev = h.snapshots[i-2]
                d = Diff(prev, s)
                print()
                print(' ', d.stats())
            print()

class HeapLabel(gdb.Command):
    'Record the current state of the heap for later comparison'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap label",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        s = history.add(args)
        print(s.summary())

class HeapDiff(gdb.Command):
    'Compare two states of the heap'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap diff",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        h = history
        if len(h.snapshots) == 0:
            print('(no history)')
            return
        prev = h.snapshots[-1]
        curr = Snapshot.current('current')
        d = Diff(prev, curr)
        print('Changes from %s to %s' % (prev.name, curr.name))
        print(' ', d.stats())
        print()
        print('\n'.join([' ' + line for line in d.as_changes().splitlines()]))

class HeapSelect(gdb.Command):
    'Query used heap chunks'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap select",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        from heap.query import do_query
        from heap.parser import ParserError
        try:
            do_query(args)
        except ParserError as e:
            print(e)

class Hexdump(gdb.Command):
    'Print a hexdump, starting at the specific region of memory'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "hexdump",
                              gdb.COMMAND_DATA)

    def invoke(self, args, from_tty):
        print(repr(args))
        arg_list = gdb.string_to_argv(args)
        chars_only = True
        if len(arg_list) == 2:
            addr_arg = arg_list[0]
            chars_only = True if arg_list[1] == '-c' else False
        else:
            addr_arg = args

        if addr_arg.startswith('0x'):
            addr = int(addr_arg, 16)
        else:
            addr = int(addr_arg)

        # assume that paging will cut in and the user will quit at some point:
        size = 32
        while True:
            hd = hexdump_as_bytes(addr, size, chars_only=chars_only)
            print ('%s -> %s %s' % (fmt_addr(addr), fmt_addr(addr + size -1), hd))
            addr += size

class HeapArenas(gdb.Command):
    'Display heap arenas available'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap arenas",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        for n, arena in enumerate(glibc_arenas.arenas):
            print("Arena #%d: %s" % (n, arena.address))

class HeapArenaSelect(gdb.Command):
    'Select heap arena'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap arena",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        arena_num = int(args)
        glibc_arenas.cur_arena = glibc_arenas.arenas[arena_num]
        print("Arena set to %s" % glibc_arenas.cur_arena.address)

def register_commands():
    # Register the commands with gdb
    Heap()
    HeapSizes()
    HeapUsed()
    HeapFree()
    HeapAll()
    HeapLog()
    HeapLabel()
    HeapDiff()
    HeapSelect()
    HeapArenas()
    HeapArenaSelect()
    Hexdump()

    from heap.cpython import register_commands as register_cpython_commands
    register_cpython_commands()

================================================
FILE: heap/compat.py
================================================
# Copyright (C) 2010 David Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

'''
gdb versions vary greatly, this is a central place to deal with varying
capabilities of the underlying gdb and its python bindings
'''
import gdb

# gdb.execute's to_string keyword argument was added between F13 and F14.
# See https://bugzilla.redhat.com/show_bug.cgi?id=610241
has_gdb_execute_to_string = True
try:
    # This will either capture the result, or fail before executing,
    # so in neither case should we get noise on stdout:
    gdb.execute('info registers', to_string=True)
except TypeError:
    has_gdb_execute_to_string = False

def execute(command):
    '''Equivalent to gdb.execute(to_string=True), returning the output as
    a string rather than logging it to stdout.
    On gdb versions lacking this capability, it uses redirection and
    temporary files to achieve the same result'''
    if has_gdb_execute_to_string:
        return gdb.execute(command, to_string = True)
    else:
        import tempfile
        f = tempfile.NamedTemporaryFile('r', delete=True)
        gdb.execute("set logging off")
        gdb.execute("set logging redirect off")
        gdb.execute("set logging file %s" % f.name)
        gdb.execute("set logging redirect on")
        gdb.execute("set logging on")
        gdb.execute(command)
        gdb.execute("set logging off")
        gdb.execute("set logging redirect off")
        result = f.read()
        f.close()
        return result

def dump():
    print ('Does gdb.execute have a "to_string" keyword argument? : %s'
           % has_gdb_execute_to_string)

================================================
FILE: heap/cplusplus.py
================================================
# Copyright (C) 2010 David Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

# C++ support

import re

import gdb

from heap import caching_lookup_type, looks_like_ptr
from heap.compat import execute

void_ptr_ptr = caching_lookup_type('void').pointer().pointer()

def get_class_name(addr, size):
    # Try to detect a vtable ptr at the top of this object:
    vtable = gdb.Value(addr).cast(void_ptr_ptr).dereference()
    if not looks_like_ptr(vtable):
        return None

    info = execute('info sym (void *)0x%x' % int(vtable))
    # "vtable for Foo + 8 in section .rodata of /home/david/heap/test_cplusplus"
    m = re.match(r'vtable for (.*) \+ (.*)', info)
    if m:
        return m.group(1)
    # Not matched:
    return None

def as_cplusplus_object(addr, size):
    print(get_class_name(addr, size))
    pass

================================================
FILE: heap/cpython.py
================================================
'''
This file is licensed under the PSF license
'''
import sys

import gdb

from heap import WrappedPointer, caching_lookup_type, Usage, \
    type_void_ptr, fmt_addr, Category, looks_like_ptr, \
    WrongInferiorProcess, Table

SIZEOF_VOID_P = type_void_ptr.sizeof

# Transliteration from Python's obmalloc.c:
ALIGNMENT             = 8
ALIGNMENT_SHIFT       = 3
ALIGNMENT_MASK        = (ALIGNMENT - 1)

# Return the number of bytes in size class I:
def INDEX2SIZE(I):
    return (I + 1) << ALIGNMENT_SHIFT

SYSTEM_PAGE_SIZE      = (4 * 1024)
SYSTEM_PAGE_SIZE_MASK = (SYSTEM_PAGE_SIZE - 1)
ARENA_SIZE            = (256 << 10)
POOL_SIZE             = SYSTEM_PAGE_SIZE
POOL_SIZE_MASK        = SYSTEM_PAGE_SIZE_MASK

def ROUNDUP(x):
    return (x + ALIGNMENT_MASK) & ~ALIGNMENT_MASK

def POOL_OVERHEAD():
    return ROUNDUP(caching_lookup_type('struct pool_header').sizeof)

class PyArenaPtr(WrappedPointer):
    # Wrapper around a (void*) that's a Python arena's buffer (the
    # arena->address, as opposed to the (struct arena_object*) itself)
    @classmethod
    def from_addr(cls, p, arenaobj):
        ptr = gdb.Value(p)
        ptr = ptr.cast(type_void_ptr)
        return cls(ptr, arenaobj)

    def __init__(self, gdbval, arenaobj):
        WrappedPointer.__init__(self, gdbval)

        assert(isinstance(arenaobj, ArenaObject))
        self.arenaobj = arenaobj

        # obmalloc.c sets up arenaobj->pool_address to the first pool
        # address, aligning it to POOL_SIZE_MASK:
        self.initial_pool_addr = self.as_address()
        self.num_pools = ARENA_SIZE // POOL_SIZE
        self.excess = self.initial_pool_addr & POOL_SIZE_MASK
        if self.excess != 0:
            self.num_pools -= 1
            self.initial_pool_addr += POOL_SIZE - self.excess

    def __str__(self):
        return ('PyArenaPtr([%s->%s], %i pools [%s->%s], excess: %i tracked by %s)'
                % (fmt_addr(self.as_address()),
                   fmt_addr(self.as_address() + ARENA_SIZE - 1),
                   self.num_pools,
                   fmt_addr(self.initial_pool_addr),
                   fmt_addr(self.initial_pool_addr
                            + (self.num_pools * POOL_SIZE) - 1),
                   self.excess,
                   self.arenaobj
                   )
                )

    def iter_pools(self):
        '''Yield a sequence of PyPoolPtr, representing all of the pools
        within this arena'''
        # print 'num_pools:', num_pools
        pool_addr = self.initial_pool_addr
        for idx in range(self.num_pools):
            # "pool_address" is a high-water-mark for activity within the arena;
            # pools at this location or beyond haven't been initialized yet:
            if pool_addr >= self.arenaobj.pool_address:
                return

            pool = PyPoolPtr.from_addr(pool_addr)
            yield pool
            pool_addr += POOL_SIZE

    def iter_usage(self):
        '''Yield a series of Usage instances'''
        if self.excess != 0:
            # FIXME: this size is wrong
            yield Usage(self.as_address(), self.excess,
                        Category('pyarena', 'alignment wastage'))

        for pool in self.iter_pools():
            # print 'pool:', pool
            for u in pool.iter_usage():
                yield u

        # FIXME: unused space (if any) between pool_address and the alignment top
        # if self.excess != 0:
        #     # FIXME: this address is wrong
        #     yield Usage(self.as_address(), self.excess,
        #                 Category('pyarena', 'alignment wastage'))

class PyPoolPtr(WrappedPointer):
    # Wrapper around Python's obmalloc.c: poolp: (struct pool_header *)
    @classmethod
    def from_addr(cls, p):
        ptr = gdb.Value(p)
        ptr = ptr.cast(cls.gdb_type())
        return cls(ptr)

    def __str__(self):
        return ('PyPoolPtr([%s->%s: %d blocks of size %i bytes))'
                % (fmt_addr(self.as_address()),
                   fmt_addr(self.as_address() + POOL_SIZE - 1),
                   self.num_blocks(), self.block_size()))

    @classmethod
    def gdb_type(cls):
        # Deferred lookup of the "poolp" type:
        return caching_lookup_type('poolp')

    def block_size(self):
        return INDEX2SIZE(self.field('szidx'))

    def num_blocks(self):
        firstoffset = self._firstoffset()
        maxnextoffset = self._maxnextoffset()
        offsetrange = maxnextoffset - firstoffset
        return offsetrange // self.block_size() # FIXME: not exactly correct

    def _firstoffset(self):
        return POOL_OVERHEAD()

    def _maxnextoffset(self):
        return POOL_SIZE - self.block_size()

    def iter_blocks(self):
        '''Yield all blocks within this pool, whether free or in use'''
        size = self.block_size()
        maxnextoffset = self._maxnextoffset()
        # print initnextoffset, maxnextoffset
        offset = self._firstoffset()
        base_addr = self.as_address()
        while offset <= maxnextoffset:
            yield (base_addr + offset, size)
            offset += size

    def iter_usage(self):
        # The struct pool_header at the front:
        yield Usage(self.as_address(),
                    POOL_OVERHEAD(),
                    Category('pyarena', 'pool_header overhead'))

        fb = list(self.iter_free_blocks())
        for (start, size) in fb:
            yield Usage(start, size, Category('pyarena', 'freed pool chunk'))

        for (start, size) in self.iter_used_blocks():
            if (start, size) not in fb:
                yield Usage(start, size)
                #, 'python pool: ' + categorize(start, size, None))

        # FIXME: yield any wastage at the end

    def iter_free_blocks(self):
        '''Yield the sequence of free blocks within this pool.
        Doesn't include the areas after nextoffset that have never been
        allocated'''
        # print self._gdbval.dereference()
        size = self.block_size()
        freeblock = self.field('freeblock')
        _type_block_ptr_ptr = caching_lookup_type('unsigned char').pointer().pointer()
        # Walk the singly-linked list of free blocks for this chunk
        while int(freeblock) != 0:
            # print 'freeblock:', (fmt_addr(int(freeblock)), int(size))
            yield (int(freeblock), int(size))
            freeblock = freeblock.cast(_type_block_ptr_ptr).dereference()

    def _free_blocks(self):
        # Get the set of addresses of free blocks
        return set([addr for addr, size in self.iter_free_blocks()])

    def iter_used_blocks(self):
        '''Yield the sequence of currently in-use blocks within this pool'''
        # We'll filter out the free blocks from the list:
        free_block_addresses = self._free_blocks()

        size = self.block_size()
        initnextoffset = self._firstoffset()
        nextoffset = self.field('nextoffset')
        # print initnextoffset, nextoffset
        offset = initnextoffset
        base_addr = self.as_address()

        # Iterate upwards until you reach "pool->nextoffset": blocks beyond
        # that point have never been allocated:
        while offset < nextoffset:
            addr = base_addr + offset

            # Filter out those within this pool's linked list of free blocks:
            if int(addr) not in free_block_addresses:
                yield (int(addr), int(size))

            offset += size

Py_TPFLAGS_HEAPTYPE = (1 << 9)

Py_TPFLAGS_INT_SUBCLASS      = (1 << 23)
Py_TPFLAGS_LONG_SUBCLASS     = (1 << 24)
Py_TPFLAGS_LIST_SUBCLASS     = (1 << 25)
Py_TPFLAGS_TUPLE_SUBCLASS    = (1 << 26)
Py_TPFLAGS_STRING_SUBCLASS   = (1 << 27)
Py_TPFLAGS_UNICODE_SUBCLASS  = (1 << 28)
Py_TPFLAGS_DICT_SUBCLASS     = (1 << 29)
Py_TPFLAGS_BASE_EXC_SUBCLASS = (1 << 30)
Py_TPFLAGS_TYPE_SUBCLASS     = (1 << 31)

class PyObjectPtr(WrappedPointer):
    @classmethod
    def from_pyobject_ptr(cls, addr):
        ob_type = addr['ob_type']
        tp_flags = ob_type['tp_flags']

        if tp_flags & Py_TPFLAGS_HEAPTYPE:
            return HeapTypeObjectPtr(addr)

        if tp_flags & Py_TPFLAGS_UNICODE_SUBCLASS:
            return PyUnicodeObjectPtr(addr.cast(caching_lookup_type('PyUnicodeObject').pointer()))

        if tp_flags & Py_TPFLAGS_DICT_SUBCLASS:
            return PyDictObjectPtr(addr.cast(caching_lookup_type('PyDictObject').pointer()))

        tp_name = ob_type['tp_name'].string()
        if tp_name == 'instance':
            __type_PyInstanceObjectPtr = caching_lookup_type('PyInstanceObject').pointer()
            return PyInstanceObjectPtr(addr.cast(__type_PyInstanceObjectPtr))

        return PyObjectPtr(addr)

    def type(self):
        return PyTypeObjectPtr(self.field('ob_type'))

    def safe_tp_name(self):
        try:
            return self.type().field('tp_name').string()
        except (RuntimeError, UnicodeDecodeError):
            # Can't even read the object at all?
            return 'unknown'

    def categorize(self):
        # Python objects will be categorized as ("python", tp_name), but
        # old-style classes have to do more work
        return Category('python', self.safe_tp_name())

    def as_malloc_addr(self):
        addr = int(self._gdbval)
        ob_type = self._gdbval['ob_type']
        tp_flags = ob_type['tp_flags']
        if tp_flags & Py_TPFLAGS_: # FIXME
            return obj_addr_to_gc_addr(addr)
        else:
            return addr

# Taken from my libpython.py code in python's Tools/gdb/libpython.py
# FIXME: ideally should share code somehow
def _PyObject_VAR_SIZE(typeobj, nitems):
    type_size_t = caching_lookup_type('size_t')
    return ( ( typeobj.field('tp_basicsize') +
               nitems * typeobj.field('tp_itemsize') +
               (SIZEOF_VOID_P - 1)
             ) & ~(SIZEOF_VOID_P - 1)
           ).cast(type_size_t)

def int_from_int(gdbval):
    return int(gdbval)

class PyUnicodeObjectPtr(PyObjectPtr):
    """
    Class wrapping a gdb.Value that's a PyUnicodeObject* within the process
    being debugged.
    """
    _typename = 'PyUnicodeObject'

    def categorize_refs(self, usage_set, level=0, detail=None):
        m_str = int(self.field('str'))
        usage_set.set_addr_category(m_str,
                                    Category('cpython', 'PyUnicodeObject buffer', detail),
                                    level)
        return True

class PyDictObjectPtr(PyObjectPtr):
    """
    Class wrapping a gdb.Value that's a PyDictObject* i.e. a dict instance
    within the process being debugged.
""" _typename = 'PyDictObject' def categorize_refs(self, usage_set, level=0, detail=None): ma_table = int(self.field('ma_table')) usage_set.set_addr_category(ma_table, Category('cpython', 'PyDictEntry table', detail), level) return True class PyInstanceObjectPtr(PyObjectPtr): _typename = 'PyInstanceObject' def cl_name(self): in_class = self.field('in_class') # cl_name is a python string, not a char*; rely on # prettyprinters for now: cl_name = str(in_class['cl_name'])[1:-1] return cl_name def categorize(self): return Category('python', self.cl_name(), 'old-style') def categorize_refs(self, usage_set, level=0, detail=None): cl_name = self.cl_name() # print 'cl_name', cl_name # Visit the in_dict: in_dict = self.field('in_dict') # print 'in_dict', in_dict dict_detail = '%s.__dict__' % cl_name # Mark the ptr as being a dictionary, adding detail usage_set.set_addr_category(obj_addr_to_gc_addr(in_dict), Category('cpython', 'PyDictObject', dict_detail), level=1) # Visit ma_table: _type_PyDictObject_ptr = caching_lookup_type('PyDictObject').pointer() in_dict = in_dict.cast(_type_PyDictObject_ptr) ma_table = int(in_dict['ma_table']) # Record details: usage_set.set_addr_category(ma_table, Category('cpython', 'PyDictEntry table', dict_detail), level=2) return True class PyTypeObjectPtr(PyObjectPtr): _typename = 'PyTypeObject' class HeapTypeObjectPtr(PyObjectPtr): _typename = 'PyObject' def categorize_refs(self, usage_set, level=0, detail=None): attr_dict = self.get_attr_dict() if attr_dict: # Mark the dictionary's "detail" with our typename # gdb.execute('print (PyObject*)0x%x' % int(attr_dict._gdbval)) usage_set.set_addr_category(obj_addr_to_gc_addr(attr_dict._gdbval), Category('python', 'dict', '%s.__dict__' % self.safe_tp_name()), level=level+1) # and mark the dict's PyDictEntry with our typename: attr_dict.categorize_refs(usage_set, level=level+1, detail='%s.__dict__' % self.safe_tp_name()) return True def get_attr_dict(self): ''' Get the PyDictObject ptr representing the 
        attribute dictionary (or None if there's a problem)
        '''
        from heap import type_char_ptr
        try:
            typeobj = self.type()
            dictoffset = int_from_int(typeobj.field('tp_dictoffset'))
            if dictoffset != 0:
                if dictoffset < 0:
                    type_PyVarObject_ptr = caching_lookup_type('PyVarObject').pointer()
                    tsize = int_from_int(self._gdbval.cast(type_PyVarObject_ptr)['ob_size'])
                    if tsize < 0:
                        tsize = -tsize
                    size = _PyObject_VAR_SIZE(typeobj, tsize)
                    dictoffset += size
                    assert dictoffset > 0
                    if dictoffset % SIZEOF_VOID_P != 0:
                        # Corrupt somehow?
                        return None

                dictptr = self._gdbval.cast(type_char_ptr) + dictoffset
                PyObjectPtrPtr = caching_lookup_type('PyObject').pointer().pointer()
                dictptr = dictptr.cast(PyObjectPtrPtr)
                return PyObjectPtr.from_pyobject_ptr(dictptr.dereference())
        except RuntimeError:
            # Corrupt data somewhere; fail safe
            pass

        # Not found, or some kind of error:
        return None

def is_pyobject_ptr(addr):
    try:
        _type_pyop = caching_lookup_type('PyObject').pointer()
        _type_pyvarop = caching_lookup_type('PyVarObject').pointer()
    except RuntimeError:
        # not linked against python
        return None

    pyop = gdb.Value(addr).cast(_type_pyop)
    try:
        ob_refcnt = pyop['ob_refcnt']
        if ob_refcnt >= 0 and ob_refcnt < 0xffff:
            obtype = pyop['ob_type']
            if obtype != 0:
                type_refcnt = obtype.cast(_type_pyop)['ob_refcnt']
                if type_refcnt > 0 and type_refcnt < 0xffff:
                    type_ob_size = obtype.cast(_type_pyvarop)['ob_size']
                    if type_ob_size > 0xffff:
                        return 0
                    for fieldname in ('tp_del', 'tp_mro', 'tp_init', 'tp_getset'):
                        if not looks_like_ptr(obtype[fieldname]):
                            return 0
                    # Then this looks like a Python object:
                    return PyObjectPtr.from_pyobject_ptr(pyop)
    except (RuntimeError, UnicodeDecodeError):
        pass # Not a python object (or corrupt)

    # Doesn't look like a python object, implicit return None

def obj_addr_to_gc_addr(addr):
    '''Given a PyObject* address, convert to a PyGC_Head* address (i.e.
    the allocator's view of the same)'''
    # print 'obj_addr_to_gc_addr(%s)' % fmt_addr(int(addr))
    _type_PyGC_Head = caching_lookup_type('PyGC_Head')
    return int(addr) - _type_PyGC_Head.sizeof

def as_python_object(addr):
    '''Given an address of an allocation, determine if it holds a PyObject,
    or a PyGC_Head

    Return a WrappedPointer for the PyObject* if it does (which might have a
    different location c.f. when PyGC_Head was allocated)

    Return None if it doesn't look like a PyObject*'''
    # Try casting to PyObject* ?
    # FIXME: what about the debug allocator?
    try:
        _type_pyop = caching_lookup_type('PyObject').pointer()
        _type_PyGC_Head = caching_lookup_type('PyGC_Head')
    except RuntimeError:
        # not linked against python
        return None
    pyop = is_pyobject_ptr(addr)
    if pyop:
        return pyop
    else:
        # maybe a GC type:
        _type_PyGC_Head_ptr = _type_PyGC_Head.pointer()
        gc_ptr = gdb.Value(addr).cast(_type_PyGC_Head_ptr)
        # print gc_ptr.dereference()
        PYGC_REFS_REACHABLE = -3
        if gc_ptr['gc']['gc_refs'] == PYGC_REFS_REACHABLE: # FIXME: need to cover other values
            pyop = is_pyobject_ptr(gdb.Value(addr + _type_PyGC_Head.sizeof))
            if pyop:
                return pyop
    # Doesn't look like a python object, implicit return None

class ArenaObject(WrappedPointer):
    '''
    Wrapper around Python's struct arena_object*
    Note that this is record-keeping for an arena, not the memory itself
    '''
    @classmethod
    def iter_arenas(cls):
        try:
            val_arenas = gdb.parse_and_eval('arenas')
            val_maxarenas = gdb.parse_and_eval('maxarenas')
        except RuntimeError:
            # Not linked against python, or no debug information:
            raise WrongInferiorProcess('cpython')

        try:
            for i in range(val_maxarenas):
                # Look up "&arenas[i]":
                obj = ArenaObject(val_arenas[i].address)

                # obj->address == 0 indicates an unused entry within the "arenas" array:
                if obj.address != 0:
                    yield obj
        except RuntimeError:
            # pypy also has a symbol named "arenas", of type "long unsigned int * volatile"
            # For now, ignore it:
            return

    @property # need to override the base property
    def address(self):
        return self.field('address')

    def __init__(self, gdbval):
        WrappedPointer.__init__(self, gdbval)

        # Cache some values:
        # This is the high-water mark: at this point and beyond, the bytes of
        # memory are untouched since malloc:
        self.pool_address = self.field('pool_address')

class ArenaDetection(object):
    '''Detection of CPython arenas, done as an object so that we can cache
    state'''
    def __init__(self):
        self.arenaobjs = list(ArenaObject.iter_arenas())

    def as_arena(self, ptr, chunksize):
        '''Detect if this ptr returned by malloc is in use as a Python arena,
        returning PyArenaPtr if it is, None if not'''
        # Fast rejection of too-small chunks:
        if chunksize < (256 * 1024):
            return None

        for arenaobj in self.arenaobjs:
            if ptr == arenaobj.address:
                # Found it:
                return PyArenaPtr.from_addr(ptr, arenaobj)

        # Not found:
        return None

def python_categorization(usage_set):
    # special-cased categorization for CPython

    # The Objects/stringobject.c:interned dictionary is typically large,
    # with its PyDictEntry table occupying 200k on a 64-bit build of python 2.6
    # Identify it:
    try:
        val_interned = gdb.parse_and_eval('interned')
        pyop = PyDictObjectPtr.from_pyobject_ptr(val_interned)
        ma_table = int(pyop.field('ma_table'))
        usage_set.set_addr_category(ma_table,
                                    Category('cpython', 'PyDictEntry table', 'interned'),
                                    level=1)
    except RuntimeError:
        pass

    # Various kinds of per-type optimized allocator
    # See Modules/gcmodule.c:clear_freelists

    # The Objects/intobject.c: block_list
    try:
        val_block_list = gdb.parse_and_eval('block_list')
        if str(val_block_list.type.target()) != 'PyIntBlock':
            raise RuntimeError
        while int(val_block_list) != 0:
            usage_set.set_addr_category(int(val_block_list),
                                        Category('cpython', '_intblock', ''),
                                        level=0)
            val_block_list = val_block_list['next']
    except RuntimeError:
        pass

    # The Objects/floatobject.c: block_list
    # TODO: how to get at this?
    # multiple vars named "block_list"

    # Objects/methodobject.c: PyCFunction_ClearFreeList
    # "free_list" of up to 256 PyCFunctionObject, but they're still of
    # that type

    # Objects/classobject.c: PyMethod_ClearFreeList
    # "free_list" of up to 256 PyMethodObject, but they're still of that type

    # Objects/frameobject.c: PyFrame_ClearFreeList
    # "free_list" of up to 300 PyFrameObject, but they're still of that type

    # Objects/tupleobject.c: array of free_list: up to 2000 free tuples of each
    # size from 1-20 (using ob_item[0] to chain up); singleton for size 0; they
    # are still tuples when deallocated, though

    # Objects/unicodeobject.c:
    # "free_list" of up to 1024 PyUnicodeObject, with the "str" buffer
    # optionally preserved also for lengths up to 9
    # They're all still of type "unicode" when free
    # Singletons for the empty unicode string, and for the first 256 code
    # points (Latin-1)

# New gdb commands, specific to CPython

from heap.commands import need_debuginfo

class HeapCPythonAllocators(gdb.Command):
    "For CPython: display information on the allocators"
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap cpython-allocators",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        t = Table(columnheadings=('struct arena_object*',
                                  '256KB buffer location',
                                  'Free pools'))
        for arena in ArenaObject.iter_arenas():
            t.add_row([fmt_addr(arena.as_address()),
                       fmt_addr(arena.address),
                       '%i / %i ' % (arena.field('nfreepools'),
                                     arena.field('ntotalpools'))
                       ])
        print('Objects/obmalloc.c: %i arenas' % len(t.rows))
        t.write(sys.stdout)
        print()

def register_commands():
    HeapCPythonAllocators()

================================================
FILE: heap/glibc.py
================================================
# Copyright (C) 2010 David Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

'''
gdb 7 hooks for glibc's heap implementation

See /usr/src/debug/glibc-*/malloc/
e.g. /usr/src/debug/glibc-2.11.1/malloc/malloc.h
and  /usr/src/debug/glibc-2.11.1/malloc/malloc.c

This file is licensed under the LGPLv2.1
'''
import re

import gdb

from heap import WrappedPointer, WrappedValue, caching_lookup_type, \
    type_char_ptr, check_missing_debuginfo, array_length, offsetof

class MChunkPtr(WrappedPointer):
    '''Wrapper around glibc's mchunkptr

    Note:
      as_address() gives the address of the chunk as seen by the malloc implementation
      as_mem() gives the address as seen by the user of malloc'''

    # size field is or'ed with PREV_INUSE when previous adjacent chunk in use
    PREV_INUSE = 0x1

    # /* extract inuse bit of previous chunk */
    # #define prev_inuse(p)       ((p)->size & PREV_INUSE)

    # size field is or'ed with IS_MMAPPED if the chunk was obtained with mmap()
    IS_MMAPPED = 0x2

    # /* check for mmap()'ed chunk */
    # #define chunk_is_mmapped(p) ((p)->size & IS_MMAPPED)

    # size field is or'ed with NON_MAIN_ARENA if the chunk was obtained
    # from a non-main arena.  This is only set immediately before handing
    # the chunk to the user, if necessary.
    NON_MAIN_ARENA = 0x4

    # /* check for chunk from non-main arena */
    # #define chunk_non_main_arena(p) ((p)->size & NON_MAIN_ARENA)

    SIZE_BITS = (PREV_INUSE|IS_MMAPPED|NON_MAIN_ARENA)

    @classmethod
    def gdb_type(cls):
        # Deferred lookup of the "mchunkptr" type:
        return caching_lookup_type('mchunkptr')

    def size(self):
        if not(hasattr(self, '_cached_size')):
            self._cached_size = int(self.field('mchunk_size'))
        return self._cached_size

    def chunksize(self):
        return self.size() & ~(self.SIZE_BITS)

    def has_flag(self, flag):
        return self.size() & flag

    def has_PREV_INUSE(self):
        return self.has_flag(self.PREV_INUSE)

    def has_IS_MMAPPED(self):
        return self.has_flag(self.IS_MMAPPED)

    def has_NON_MAIN_ARENA(self):
        return self.has_flag(self.NON_MAIN_ARENA)

    def __str__(self):
        result = ('<%s chunk=0x%x mem=0x%x'
                  % (self.__class__.__name__,
                     self.as_address(),
                     self.as_mem()))
        if self.has_PREV_INUSE():
            result += ' PREV_INUSE'
        else:
            result += ' prev_size=%i' % self.field('mchunk_prev_size')
        if self.has_NON_MAIN_ARENA():
            result += ' NON_MAIN_ARENA'
        if self.has_IS_MMAPPED():
            result += ' IS_MMAPPED'
        else:
            if self.is_inuse():
                result += ' inuse'
            else:
                result += ' free'
        SIZE_SZ = caching_lookup_type('size_t').sizeof
        result += ' chunksize=%i memsize=%i>' % (self.chunksize(),
                                                 self.chunksize() - (2 * SIZE_SZ))
        return result

    def as_mem(self):
        # Analog of chunk2mem: the address as seen by the program (e.g. malloc)
        SIZE_SZ = caching_lookup_type('size_t').sizeof
        return self.as_address() + (2 * SIZE_SZ)

    def is_inuse(self):
        # Is this chunk in use?
        if self.has_IS_MMAPPED():
            return True
        # Analog of:
        #   #define inuse(p)
        #   ((((mchunkptr)(((char*)(p))+((p)->size & ~SIZE_BITS)))->size) & PREV_INUSE)
        nc = self.next_chunk()
        return nc.has_PREV_INUSE()

    def next_chunk(self):
        # Analog of:
        #   #define next_chunk(p) ((mchunkptr)( ((char*)(p)) + ((p)->size & ~SIZE_BITS) ))
        ptr = self._gdbval.cast(type_char_ptr)
        cs = self.chunksize()
        ptr += cs
        ptr = ptr.cast(MChunkPtr.gdb_type())
        #print('next_chunk returning: 0x%x' % ptr)
        return MChunkPtr(ptr)

    def prev_chunk(self):
        # Analog of:
        #   #define prev_chunk(p) ((mchunkptr)( ((char*)(p)) - ((p)->prev_size) ))
        ptr = self._gdbval.cast(type_char_ptr)
        # Use the same field naming as size() and __str__ above
        # ('prev_size' was renamed to 'mchunk_prev_size' in glibc 2.25):
        ptr -= self.field('mchunk_prev_size')
        ptr = ptr.cast(MChunkPtr.gdb_type())
        return MChunkPtr(ptr)


class MBinPtr(MChunkPtr):
    # Wrapper around an "mbinptr"

    @classmethod
    def gdb_type(cls):
        # Deferred lookup of the "mbinptr" type:
        return caching_lookup_type('mbinptr')

    def first(self):
        return MChunkPtr(self.field('fd'))

    def last(self):
        return MChunkPtr(self.field('bk'))


class MFastBinPtr(MChunkPtr):
    # Wrapper around an "mfastbinptr"
    pass


class MallocState(WrappedValue):
    # Wrapper around struct malloc_state, as defined in malloc.c

    def fastbin(self, idx):
        return MFastBinPtr(self.field('fastbinsY')[idx])

    def bin_at(self, i):
        # addressing -- note that bin_at(0) does not exist
        #   (mbinptr) (((char *) &((m)->bins[((i) - 1) * 2]))
        #              - offsetof (struct malloc_chunk, fd))
        ptr = self.field('bins')[(i-1)*2]
        #print('001', ptr)
        ptr = ptr.address
        #print('002', ptr)
        ptr = ptr.cast(type_char_ptr)
        #print('003', ptr)
        ptr -= offsetof('struct malloc_chunk', 'fd')
        #print('004', ptr)
        ptr = ptr.cast(MBinPtr.gdb_type())
        #print('005', ptr)
        return MBinPtr(ptr)

    def iter_chunks(self):
        '''Yield a sequence of MChunkPtr corresponding to all chunks of memory
        in the heap (both used and free), in order of ascending address'''
        for c in self.iter_mmap_chunks():
            yield c
        for c in self.iter_sbrk_chunks():
            yield c

    def iter_mmap_chunks(self):
        for inf in gdb.inferiors():
            for (start, end) in iter_mmap_heap_chunks(inf.pid):
                # print("Trying 0x%x-0x%x" % (start, end))
                try:
                    chunk = MChunkPtr(gdb.Value(start).cast(MChunkPtr.gdb_type()))
                    # Does this look like the first chunk within a range of
                    # mmap address space?
                    #print('0x%x' % (chunk.as_address() + chunk.chunksize()))
                    if (not chunk.has_NON_MAIN_ARENA() and chunk.has_IS_MMAPPED()
                        and chunk.as_address() + chunk.chunksize() <= end):
                        # Iterate upwards until you reach "end" of mmap space:
                        while chunk.as_address() < end and chunk.has_IS_MMAPPED():
                            yield chunk
                            # print('0x%x' % chunk.as_address(), chunk)
                            chunk = chunk.next_chunk()
                except RuntimeError:
                    pass

    def iter_sbrk_chunks(self):
        '''Yield a sequence of MChunkPtr corresponding to all chunks of memory
        in the heap (both used and free), in order of ascending address, for
        those from sbrk_base upwards'''
        # FIXME: this is currently a hack; I need to verify my logic here

        # As I understand it, it's only possible to navigate the following ways:
        #
        # For a chunk with PREV_INUSE:0, then prev_size is valid, and can be used
        # to subtract down to the start of that chunk
        # For a chunk with PREV_INUSE:1, then prev_size is not readable (reading it
        # could lead to SIGSEGV), and it's not possible to get at the size of the
        # previous chunk.
        # For a free chunk, we have next/prev pointers to a doubly-linked list
        # of other free chunks.
        # For a chunk, we have the size, and that size gives us the address
        # of the next chunk in RAM
        # So if we know the address of the first chunk, then we can use this
        # to iterate upwards through RAM, and thus iterate over all of the chunks

        # Start at "mp_.sbrk_base"
        chunk = MChunkPtr(gdb.Value(sbrk_base()).cast(MChunkPtr.gdb_type()))
        # sbrk_base is NULL when no small allocations have happened:
        if chunk.as_address() > 0:
            # Iterate upwards until you reach "top":
            top = int(self.field('top'))
            while chunk.as_address() != top:
                yield chunk
                # print('0x%x' % chunk.as_address(), chunk)
                try:
                    chunk = chunk.next_chunk()
                except RuntimeError:
                    break

    def iter_free_chunks(self):
        '''Yield a sequence of MChunkPtr (some of which may be MFastBinPtr),
        corresponding to the free chunks of memory'''
        # Account for top:
        print('top')
        yield MChunkPtr(self.field('top'))

        NFASTBINS = self.NFASTBINS()
        # Traverse fastbins:
        for i in range(0, int(NFASTBINS)):
            print('fastbin %i' % i)
            p = self.fastbin(i)
            while not p.is_null():
                yield p
                p = MFastBinPtr(p.field('fd'))
        # for (p = fastbin (av, i); p != 0; p = p->fd) {
        #   ++nfastblocks;
        #   fastavail += chunksize(p);
        # }
        # }

        # Must keep this in-sync with malloc.c:
        # FIXME: can we determine this dynamically from within gdb?
        NBINS = 128
        # Traverse regular bins:
        for i in range(1, NBINS):
            print('regular bin %i' % i)
            b = self.bin_at(i)
            #print('b: %s' % b)
            p = b.last()
            n = 0
            #print('p:', p)
            while p.as_address() != b.as_address():
                #print('n:', n)
                #print('b:', b)
                #print('p:', p)
                n += 1
                yield p
                p = MChunkPtr(p.field('bk'))
        # for (p = last(b); p != b; p = p->bk) {
        #   ++nblocks;
        #   avail += chunksize(p);
        # }
        # }

    def NFASTBINS(self):
        fastbinsY = self.field('fastbinsY')
        return array_length(fastbinsY)


class MallocPar(WrappedValue):
    # Wrapper around static struct malloc_par mp_

    @classmethod
    def get(cls):
        # It's a singleton:
        gdbval = gdb.parse_and_eval('mp_')
        return MallocPar(gdbval)


def sbrk_base():
    mp_ = MallocPar.get()
    try:
        return int(mp_.field('sbrk_base'))
    except RuntimeError as e:
        check_missing_debuginfo(e, 'glibc')
        raise e

"""
"""

# See malloc.c:
#   struct mallinfo mALLINFo(mstate av)
#   {
#     struct mallinfo mi;
#     size_t i;
#     mbinptr b;
#     mchunkptr p;
#     INTERNAL_SIZE_T avail;
#     INTERNAL_SIZE_T fastavail;
#     int nblocks;
#     int nfastblocks;
#
#     /* Ensure initialization */
#     if (av->top == 0)  malloc_consolidate(av);
#
#     check_malloc_state(av);
#
#     /* Account for top */
#     avail = chunksize(av->top);
#     nblocks = 1;  /* top always exists */
#
#     /* traverse fastbins */
#     nfastblocks = 0;
#     fastavail = 0;
#
#     for (i = 0; i < NFASTBINS; ++i) {
#       for (p = fastbin (av, i); p != 0; p = p->fd) {
#         ++nfastblocks;
#         fastavail += chunksize(p);
#       }
#     }
#
#     avail += fastavail;
#
#     /* traverse regular bins */
#     for (i = 1; i < NBINS; ++i) {
#       b = bin_at(av, i);
#       for (p = last(b); p != b; p = p->bk) {
#         ++nblocks;
#         avail += chunksize(p);
#       }
#     }
#
#     mi.smblks = nfastblocks;
#     mi.ordblks = nblocks;
#     mi.fordblks = avail;
#     mi.uordblks = av->system_mem - avail;
#     mi.arena = av->system_mem;
#     mi.hblks = mp_.n_mmaps;
#     mi.hblkhd = mp_.mmapped_mem;
#     mi.fsmblks = fastavail;
#     mi.keepcost = chunksize(av->top);
#     mi.usmblks = mp_.max_total_mem;
#     return mi;
#   }


def iter_mmap_heap_chunks(pid):
    '''Try to locate the memory-mapped
heap allocations for the given process (by PID) by reading /proc/PID/maps

    Yield a sequence of (start, end) pairs'''
    for line in open('/proc/%i/maps' % pid):
        # print(line)
        # e.g.:
        # 38e441e000-38e441f000 rw-p 0001e000 fd:01 1087  /lib64/ld-2.11.1.so
        # 38e441f000-38e4420000 rw-p 00000000 00:00 0
        hexd = r'[0-9a-f]'
        hexdigits = '(' + hexd + '+)'
        m = re.match(hexdigits + '-' + hexdigits
                     + r' ([r\-][w\-][x\-][ps]) ' + hexdigits
                     + r' (..:..) (\d+)\s+(.*)',
                     line)
        if m:
            # print(m.groups())
            start, end, perms, offset, dev, inode, pathname = m.groups()
            # PROT_READ, PROT_WRITE, MAP_PRIVATE:
            if perms == 'rw-p':
                if offset == '00000000': # FIXME bits?
                    if dev == '00:00': # FIXME
                        if inode == '0': # FIXME
                            if pathname == '': # FIXME
                                # print('heap line?:', line)
                                # print(m.groups())
                                start, end = [int(m.group(i), 16) for i in (1, 2)]
                                yield (start, end)
        else:
            print('unmatched :', line)


class GlibcArenas(object):
    def __init__(self):
        self.main_arena = self.get_main_arena()
        self.cur_arena = self.get_ms(self.main_arena)
        self.get_arenas()

    def get_main_arena(self):
        return gdb.parse_and_eval("main_arena")

    def get_ms(self, arena_dereference=None):
        if arena_dereference:
            ms = MallocState(arena_dereference)
        else:
            ms = self.cur_arena
        return ms

    def get_arenas(self):
        ar_ptr = self.get_ms(self.main_arena)
        self.arenas = []
        while True:
            self.arenas.append(ar_ptr)
            # The arenas form a circular singly-linked list; stop if an arena's
            # "next" pointer leads back to itself (otherwise this would loop
            # forever, repeatedly appending the same arena):
            if ar_ptr.address == ar_ptr.field('next'):
                return
            ar_ptr = self.get_ms(ar_ptr.field('next').dereference())
            if ar_ptr.address == self.main_arena.address:
                return

glibc_arenas = GlibcArenas()


================================================
FILE: heap/gobject.py
================================================
# Copyright (C) 2010 David Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
# # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA import re import sys import gdb from heap import WrappedPointer, WrappedValue, caching_lookup_type, type_char_ptr, Category # Use glib's pretty-printers: dir_ = '/usr/share/glib-2.0/gdb' if not dir_ in sys.path: sys.path.insert(0, dir_) from glib_gdb import read_global_var, g_quark_to_string # This was adapted from glib's gobject.py:g_type_to_name def get_typenode_for_gtype(gtype): def lookup_fundamental_type(typenode): if typenode == 0: return None val = read_global_var("static_fundamental_type_nodes") if val == None: return None # glib has an address() call here on the end, which looks wrong # (i) it's an attribute, not a method # (ii) it converts a TypeNode* to a TypeNode** return val[typenode >> 2] gtype = int(gtype) typenode = gtype - gtype % 4 if typenode > (255 << 2): return gdb.Value(typenode).cast (gdb.lookup_type("TypeNode").pointer()) else: return lookup_fundamental_type (typenode) def is_typename_castable(typename): if typename.startswith('Gtk'): return True if typename.startswith('Gdk'): return True if typename.startswith('GType'): return True if typename.startswith('Pango'): return True if typename.startswith('GVfs'): return True return False class GTypeInstancePtr(WrappedPointer): @classmethod def from_gtypeinstance_ptr(cls, addr, typenode): typename = cls.get_type_name(typenode) if typename: cls = cls.get_class_for_typename(typename) return cls(addr, typenode, typename) @classmethod def get_class_for_typename(cls, typename): '''Get the GTypeInstance subclass for the given type name''' if 
typename in typemap: return typemap[typename] return GTypeInstancePtr def __init__(self, addr, typenode, typename): # Try to cast the ptr to the named type: addr = gdb.Value(addr) try: if is_typename_castable(typename): # This requires, say, gtk2-debuginfo: ptr_type = caching_lookup_type(typename).pointer() addr = addr.cast(ptr_type) #print typename, addr.dereference() #if typename == 'GdkPixbuf': # print 'GOT PIXELS', addr['pixels'] except RuntimeError as e: pass #print addr, e WrappedPointer.__init__(self, addr) self.typenode = typenode self.typename = typename """ try: print 'self', self print 'self.typename', self.typename print 'typenode', typenode print 'typenode.type', typenode.type print 'typenode.dereference()', typenode.dereference() print except: print 'got here' raise """ def categorize(self): return Category('GType', self.typename, '') @classmethod def get_type_name(cls, typenode): return g_quark_to_string(typenode["qname"]) class GdkColormapPtr(GTypeInstancePtr): def categorize_refs(self, usage_set, level=0, detail=None): # print 'got here 46' pass # GdkRgbInfo is stored as qdata on a GdkColormap class GdkImagePtr(GTypeInstancePtr): def categorize_refs(self, usage_set, level=0, detail=None): priv_type = caching_lookup_type('GdkImagePrivateX11').pointer() priv_data = WrappedPointer(self._gdbval['windowing_data'].cast(priv_type)) usage_set.set_addr_category(priv_data.as_address(), Category('GType', 'GdkImagePrivateX11', ''), level=level+1, debug=True) ximage = WrappedPointer(priv_data.field('ximage')) dims = '%sw x %sh x %sbpp' % (ximage.field('width'), ximage.field('height'), ximage.field('depth')) usage_set.set_addr_category(ximage.as_address(), Category('X11', 'Image', dims), level=level+2, debug=True) usage_set.set_addr_category(int(ximage.field('data')), Category('X11', 'Image data', dims), level=level+2, debug=True) class GdkPixbufPtr(GTypeInstancePtr): def categorize_refs(self, usage_set, level=0, detail=None): dims = '%sw x %sh' % 
(self._gdbval['width'], self._gdbval['height']) usage_set.set_addr_category(int(self._gdbval['pixels']), Category('GType', 'GdkPixbuf pixels', dims), level=level+1, debug=True) class PangoCairoFcFontMapPtr(GTypeInstancePtr): def categorize_refs(self, usage_set, level=0, detail=None): # This gives us access to the freetype library: FT_Library = WrappedPointer(self._gdbval['library']) # This is actually a "struct FT_LibraryRec_", in FreeType's # include/freetype/internal/ftobjs.h # print FT_Library._gdbval.dereference() usage_set.set_addr_category(FT_Library.as_address(), Category('FreeType', 'Library', ''), level=level+1, debug=True) usage_set.set_addr_category(int(FT_Library.field('raster_pool')), Category('FreeType', 'raster_pool', ''), level=level+2, debug=True) # potentially we could look at FT_Library['memory'] typemap = { 'GdkColormap':GdkColormapPtr, 'GdkImage':GdkImagePtr, 'GdkPixbuf':GdkPixbufPtr, 'PangoCairoFcFontMap':PangoCairoFcFontMapPtr, } def as_gtype_instance(addr, size): #type_GObject_ptr = caching_lookup_type('GObject').pointer() try: type_GTypeInstance_ptr = caching_lookup_type('GTypeInstance').pointer() except RuntimeError: # Not linked against GLib? return None gobj = gdb.Value(addr).cast(type_GTypeInstance_ptr) try: gtype = gobj['g_class']['g_type'] #print 'gtype', gtype typenode = get_typenode_for_gtype(gtype) # If I remove the next line, we get errors like: # Cannot access memory at address 0xd1a712caa5b6e5c0 # Does this line give us an early chance to raise an exception? #print 'typenode', typenode # It appears to be in the coercion to boolean here: # if typenode: if typenode is not None: #print 'typenode.dereference()', typenode.dereference() return GTypeInstancePtr.from_gtypeinstance_ptr(addr, typenode) except RuntimeError: # Any random buffer that we point this at that isn't a GTypeInstance (or # GObject) is likely to raise a RuntimeError at some point in the above pass return None # FIXME: currently this ignores G_SLICE # e.g. 
use # G_SLICE=always-malloc # to override this ================================================ FILE: heap/history.py ================================================ # Copyright (C) 2010 David Hugh Malcolm # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA import datetime from heap import iter_usage_with_progress, fmt_size, fmt_addr, sign class Snapshot(object): '''Snapshot of the state of the heap''' def __init__(self, name, time): self.name = name self.time = time self._all_usage = set() self._totalsize = 0 self._num_usage = 0 def _add_usage(self, u): self._all_usage.add(u) self._totalsize += u.size self._num_usage += 1 return u @classmethod def current(cls, name): result = cls(name, datetime.datetime.now()) for i, u in enumerate(iter_usage_with_progress()): u.ensure_category() u.ensure_hexdump() result._add_usage(u) return result def total_size(self): '''Get total allocated size, in bytes''' return self._totalsize def summary(self): return '%s allocated, in %i blocks' % (fmt_size(self.total_size()), self._num_usage) def size_by_address(self, address): return self._chunk_by_address[address].size class History(object): '''History of snapshots of the state of the heap''' def __init__(self): self.snapshots = [] def add(self, name): s = Snapshot.current(name) self.snapshots.append(s) return s class 
Diff(object):
    '''Differences between two states of the heap'''
    def __init__(self, old, new):
        self.old = old
        self.new = new

        self.new_minus_old = self.new._all_usage - self.old._all_usage
        self.old_minus_new = self.old._all_usage - self.new._all_usage

    def stats(self):
        size_change = self.new.total_size() - self.old.total_size()
        count_change = self.new._num_usage - self.old._num_usage
        # Note: the block count is a plain integer, not a byte size:
        return "%s%s bytes, %s%i blocks" % (sign(size_change),
                                            fmt_size(size_change),
                                            sign(count_change),
                                            abs(count_change))

    def as_changes(self):
        result = self.chunk_report('Free-d blocks', self.old, self.old_minus_new)
        result += self.chunk_report('New blocks', self.new, self.new_minus_old)
        # FIXME: add changed chunks
        return result

    def chunk_report(self, title, snapshot, set_of_usage):
        result = '%s:\n' % title
        if len(set_of_usage) == 0:
            result += '  (none)\n'
            return result
        # Sort by start address (Python 3: use a key function, not a cmp function):
        for usage in sorted(set_of_usage, key=lambda u: u.start):
            result += ('  %s -> %s %8i bytes %20s |%s\n'
                       % (fmt_addr(usage.start),
                          fmt_addr(usage.start + usage.size-1),
                          usage.size,
                          usage.category,
                          usage.hd))
        return result

history = History()


================================================
FILE: heap/parser.py
================================================
# Copyright (C) 2010 David Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
# # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # Query language for the heap # Uses "ply", so we'll need python-ply on Fedora # Split into tokenizer, then grammar, then external interface ############################################################################ # Tokenizer: ############################################################################ import ply.lex as lex reserved = ['AND', 'OR', 'NOT'] tokens = [ 'ID','LITERAL_NUMBER', 'LITERAL_STRING', 'LPAREN','RPAREN', 'COMPARISON' ] + reserved t_LPAREN = r'\(' t_RPAREN = r'\)' def t_ID(t): r'[a-zA-Z_][a-zA-Z_0-9]*' # Check for reserved words (case insensitive): if t.value.upper() in reserved: t.type = t.value.upper() else: t.type = 'ID' return t def t_COMPARISON(t): r'<=|<|==|=|!=|>=|>' return t def t_LITERAL_NUMBER(t): r'(0x[0-9a-fA-F]+|\d+)' try: if t.value.startswith('0x'): t.value = int(t.value, 16) else: t.value = int(t.value) except ValueError: raise ParserError(t.value) return t def t_LITERAL_STRING(t): r'"([^"]*)"' # Drop the quotes: t.value = t.value[1:-1] return t # Ignored characters t_ignore = " \t" def t_newline(t): r'\n+' t.lexer.lineno += t.value.count("\n") def t_error(t): print("Illegal character '%s'" % t.value[0]) t.lexer.skip(1) lexer = lex.lex() ############################################################################ # Grammar: ############################################################################ import ply.yacc as yacc precedence = ( ('left', 'AND', 'OR'), ('left', 'NOT'), ('left', 'COMPARISON'), ) from heap.query import Constant, And, Or, Not, GetAttr, \ Comparison__le__, Comparison__lt__, Comparison__eq__, \ Comparison__ne__, Comparison__ge__, Comparison__gt__ def p_expression_number(t): 'expression : LITERAL_NUMBER' t[0] = Constant(t[1]) def p_expression_string(t): 'expression : LITERAL_STRING' t[0] = 
Constant(t[1]) def p_comparison(t): 'expression : expression COMPARISON expression' classes = { '<=' : Comparison__le__, '<' : Comparison__lt__, '==' : Comparison__eq__, '=' : Comparison__eq__, '!=' : Comparison__ne__, '>=' : Comparison__ge__, '>' : Comparison__gt__ } cls = classes[t[2]] t[0] = cls(t[1], t[3]) def p_and(t): 'expression : expression AND expression' t[0] = And(t[1], t[3]) def p_or(t): 'expression : expression OR expression' t[0] = Or(t[1], t[3]) def p_not(t): 'expression : NOT expression' t[0] = Not(t[2]) def p_expression_group(t): 'expression : LPAREN expression RPAREN' t[0] = t[2] def p_expression_name(t): 'expression : ID' attrname = t[1] attrnames = ('domain', 'kind', 'detail', 'addr', 'start', 'size') if attrname not in attrnames: raise ParserError.from_production(t, attrname, ('Unknown attribute "%s" (supported are %s)' % (attrname, ','.join(attrnames)))) t[0] = GetAttr(attrname) class ParserError(Exception): @classmethod def from_production(cls, p, val, msg): return ParserError(p.lexer.lexdata, p.lexer.lexpos - len(val), val, msg) @classmethod def from_token(cls, t, msg="Parse error"): return ParserError(t.lexer.lexdata, t.lexer.lexpos - len(t.value), t.value, msg) def __init__(self, input_, pos, value, msg): self.input_ = input_ self.pos = pos self.value = value self.msg = msg def __str__(self): return ('%s at "%s":\n%s\n%s' % (self.msg, self.value, self.input_, ' '*self.pos + '^'*len(self.value))) def p_error(t): raise ParserError.from_token(t) ############################################################################ # Interface: ############################################################################ # Entry point: def parse_query(s): #try: parser = yacc.yacc(debug=0, write_tables=0) return parser.parse(s)#, debug=1) #except ParserError, e: # print 'foo', e def test_lexer(s): lexer.input(s) while True: tok = lexer.token() if not tok: break print(tok) ================================================ FILE: heap/pypy.py 
================================================ import gdb from heap import WrappedPointer, caching_lookup_type, Usage, \ type_void_ptr, fmt_addr, Category, looks_like_ptr, \ WrongInferiorProcess def pypy_categorizer(addr, size): return None class ArenaCollection(WrappedPointer): # Corresponds to pypy/rpython/memory/gc/minimarkpage.py:ArenaCollection def get_arenas(self): # Yield a sequence of (struct pypy_ArenaReference0*) gdb.Value instances # representing the arenas current_arena = self.field('ac_inst_current_arena') # print "self.field('ac_inst_current_arena'): %s" % self.field('ac_inst_current_arena') if current_arena: yield ArenaReference(current_arena) # print "self.field('ac_inst_arenas_lists'):%s" % self.field('ac_inst_arenas_lists') #for arena in : arena = self.field('ac_inst_arenas_lists') #while arena: # yield ArenaReference(arena) # arena = arena.dereference()['ac_inst_nextarena'] class ArenaReference(WrappedPointer): def iter_usage(self): # print 'got PyPy arena within allocations' return [] # FIXME class ArenaDetection(object): '''Detection of PyPy arenas, done as an object so that we can cache state''' def __init__(self): try: ac_global = gdb.parse_and_eval('pypy_g_pypy_rpython_memory_gc_minimarkpage_ArenaCollect') except RuntimeError: # Not PyPy? 
raise WrongInferiorProcess('pypy') self._ac = ArenaCollection(ac_global.address) self._arena_refs = [] self._malloc_ptrs = {} for ar in self._ac.get_arenas(): print(ar) print(ar._gdbval.dereference()) self._arena_refs.append(ar) # ar_base : address as returned by malloc self._malloc_ptrs[int(ar.field('ar_base'))] = ar print(self._malloc_ptrs) def as_arena(self, ptr, chunksize): if ptr in self._malloc_ptrs: return self._malloc_ptrs[ptr] return None ================================================ FILE: heap/query.py ================================================ # Copyright (C) 2010 David Hugh Malcolm # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. 
# # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA import sys class Expression(object): def eval_(self, u): raise NotImplementedError def __eq__(self, other): return (self.__class__ == other.__class__ and self.__dict__ == other.__dict__) class Constant(Expression): def __init__(self, value): self.value = value def __repr__(self): return 'Constant(%r)' % (self.value,) def eval_(self, u): return self.value class GetAttr(Expression): def __init__(self, attrname): self.attrname = attrname def __repr__(self): return 'GetAttr(%r)' % (self.attrname,) def eval_(self, u): if self.attrname in ('domain', 'kind', 'detail'): if u.category == None: u.ensure_category() return getattr(u.category, self.attrname) return getattr(u, self.attrname) class BinaryOp(Expression): def __init__(self, lhs, rhs): self.lhs = lhs self.rhs = rhs class Comparison(BinaryOp): def __init__(self, lhs, rhs): BinaryOp.__init__(self, lhs, rhs) def __repr__(self): return '%s(%r, %r)' % (self.__class__.__name__, self.lhs, self.rhs) def eval_(self, u): lhs_val = self.lhs.eval_(u) rhs_val = self.rhs.eval_(u) return self.cmp_(lhs_val, rhs_val) def cmp_(self, lhs, rhs): raise NotImplementedError class Comparison__le__(Comparison): def cmp_(self, lhs, rhs): return lhs <= rhs class Comparison__lt__(Comparison): def cmp_(self, lhs, rhs): return lhs < rhs class Comparison__eq__(Comparison): def cmp_(self, lhs, rhs): return lhs == rhs class Comparison__ne__(Comparison): def cmp_(self, lhs, rhs): return lhs != rhs class Comparison__ge__(Comparison): def cmp_(self, lhs, rhs): return lhs >= rhs class Comparison__gt__(Comparison): def cmp_(self, lhs, rhs): return lhs > rhs class And(BinaryOp): def __repr__(self): return 'And(%r, %r)' % (self.lhs, self.rhs) def eval_(self, u): # Short-circuit evaluation: if not self.lhs.eval_(u): return False return 
self.rhs.eval_(u) class Or(BinaryOp): def __repr__(self): return 'Or(%r, %r)' % (self.lhs, self.rhs) def eval_(self, u): # Short-circuit evaluation: if self.lhs.eval_(u): return True return self.rhs.eval_(u) class Not(Expression): def __init__(self, inner): self.inner = inner def __repr__(self): return 'Not(%r)' % (self.inner, ) def eval_(self, u): return not self.inner.eval_(u) class Column(object): def __init__(self, name, getter, formatter): self.name = name self.getter = getter self.formatter = formatter class Query(object): def __init__(self, filter_): self.filter_ = filter_ def __iter__(self): from heap import iter_usage_with_progress, lazily_get_usage_list if True: # 2-pass, but the expensive first pass may be cached usage_list = lazily_get_usage_list() for u in usage_list: if self.filter_.eval_(u): yield u else: # 1-pass: # This may miss blocks that are only categorized w.r.t. to other # blocks: for u in iter_usage_with_progress(): if self.filter_.eval_(u): yield u def do_query(args): from heap import fmt_addr, Table from heap.parser import parse_query if args == '': # if no query supplied, select everything: filter_ = Constant(True) else: filter_ = parse_query(args) if False: print(args) print(filter_) columns = [Column('Start', lambda u: u.start, fmt_addr), Column('End', lambda u: u.start + u.size - 1, fmt_addr ), Column('Domain', lambda u: u.category.domain, None), Column('Kind', lambda u: u.category.kind, None), Column('Detail', lambda u: u.category.detail, None), Column('Hexdump', lambda u: u.hexdump, None), ] t = Table([col.name for col in columns]) for u in Query(filter_): u.ensure_hexdump() u.ensure_category() if u.category: domain = u.category.domain kind = u.category.kind detail = u.category.detail if not detail: detail = '' else: domain = '' kind = '' detail = '' t.add_row([fmt_addr(u.start), fmt_addr(u.start + u.size - 1), domain, kind, detail, u.hd]) t.write(sys.stdout) print() ================================================ FILE: 
heap/sqlite.py ================================================ # Copyright (C) 2010 David Hugh Malcolm # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA from heap import Category, caching_lookup_type import gdb def categorize_sqlite3(addr, usage_set, visited): # "struct sqlite3" is defined in src/sqliteInt.h, which is an internal header ptr_type = caching_lookup_type('sqlite3').pointer() obj_ptr = gdb.Value(addr).cast(ptr_type) # print obj_ptr.dereference() aDb = obj_ptr['aDb'] Db_addr = int(aDb) Db_malloc_addr = Db_addr - 8 if usage_set.set_addr_category(Db_malloc_addr, Category('sqlite3', 'struct Db', None), visited): print(aDb['pBt'].dereference()) # FIXME ================================================ FILE: make-release.sh ================================================ # Utility to help dmalcolm make releases: VERSION=$1 git clone git://git.fedorahosted.org/gdb-heap.git pushd gdb-heap git tag -a -m "$VERSION" $VERSION # FIXME: pushing this isn't working for some reason popd mv gdb-heap gdb-heap-${VERSION} tar cfvj gdb-heap-${VERSION}.tar.bz2 gdb-heap-${VERSION} scp gdb-heap-${VERSION}.tar.bz2 dmalcolm@fedorahosted.org:gdb-heap rm -rf gdb-heap-${VERSION} ================================================ FILE: object-sizes.py ================================================ # Copyright (C) 2010 David 
Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

# This is a support script for selftest.py
# It creates various kinds of object, so that we can verify that gdb-heap
# detects them (and their supporting buffers)

# Four different kinds of (x, y) coordinate:
try:
    from collections import namedtuple
    NamedTuple = namedtuple('NamedTuple', ('x', 'y'))
except ImportError:
    NamedTuple = None

class OldStyle:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class NewStyle(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

class NewStyleWithSlots(object):
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

objs = []
types = [OldStyle, NewStyle, NewStyleWithSlots]
if NamedTuple:
    types.append(NamedTuple)
for impl in types:
    objs.append(impl(x=3, y=4))
print(objs)

# Test creating an object with more than 8 attributes, so that the __dict__
# has an external PyDictEntry buffer.
# We will test to see if this is detectable in the selftest.
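The comment above relies on CPython dicts outgrowing their small initial hash table once they hold more than a handful of entries. As a rough standalone illustration (not part of object-sizes.py; exact byte counts and the exact growth threshold vary across CPython versions, so only the relative growth is asserted):

```python
import sys

# A 9-entry dict, like the 9-attribute instances created below, needs a
# larger hash table than a 3-entry dict, so sys.getsizeof() reports more:
small = {k: None for k in 'abc'}
big = {k: None for k in 'abcdefghi'}   # 9 entries
print(sys.getsizeof(small), sys.getsizeof(big))
assert sys.getsizeof(big) > sys.getsizeof(small)
```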
class OldStyleManyAttribs:
    def __init__(self, **kwargs):
        self.__dict__ = kwargs

class NewStyleManyAttribs(object):
    def __init__(self, **kwargs):
        self.__dict__ = kwargs

# Create instances with 9 attributes:
old_style_many = OldStyleManyAttribs(**dict(zip('abcdefghi', range(9))))
new_style_many = NewStyleManyAttribs(**dict(zip('abcdefghi', range(9))))

# Ensure that we have a set object that uses an externally allocated
# buffer, so that we can verify that these are detected.  To do this,
# we need a set with more than PySet_MINSIZE members (which is 8):
large_set = set(range(64))
large_frozenset = frozenset(range(64))

import sqlite3
db = sqlite3.connect(':memory:')
c = db.cursor()

# Create table
c.execute('''CREATE TABLE dummy(foo TEXT, bar TEXT, v REAL)''')

# Insert a row of data
c.execute("INSERT INTO dummy VALUES ('ostrich', 'elephant', 42.0)")

# Save (commit) the changes
db.commit()

# Don't close "c"; we want to see the objects in memory

# Ensure that the selftest's breakpoint on builtin_id is hit:
id(42)


================================================
FILE: resultparser.py
================================================
# Classes for working with the textual table output from gdb-heap

import unittest
import re

from collections import namedtuple

def indent(str_):
    return '\n'.join([(' ' * 4) + line
                      for line in str_.splitlines()])

class ColumnNotFound(Exception):
    def __init__(self, colname, table):
        self.colname = colname
        self.table = table
    def __str__(self):
        return ('ColumnNotFound(%s) in:\n%s'
                % (self.colname, indent(str(self.table))))

class RowNotFound(Exception):
    def __init__(self, criteria, table):
        self.criteria = criteria
        self.table = table
    def __str__(self):
        return ('RowNotFound(%s) in:\n%s'
                % (self.criteria, indent(str(self.table))))

class Criteria(object):
    '''A list of (colname, value) criteria for searching rows in a table'''
    def __init__(self, table, kvs):
        self.kvs = kvs
        self._by_index = [(table.find_col(attrname), value)
                          for attrname, value in kvs]

    def __str__(self):
        return 'Criteria(%s)' % ','.join('%r=%r' % (attrname, value)
                                         for attrname, value in self.kvs)

    def is_matched_by(self, row):
        for colindex, value in self._by_index:
            if row[colindex] != value:
                return False
        return True

class ParsedTable(object):
    '''Parses output from heap.Table, for use in writing selftests'''

    @classmethod
    def parse_lines(cls, data):
        '''Parse the lines in the string, returning a list of ParsedTable instances'''
        result = []
        lines = data.splitlines()
        start = 0
        while start < len(lines):
            sep_line = cls._find_separator_line(lines[start:])
            if sep_line:
                sep_index, colmetrics = sep_line
                t = ParsedTable(sep_index, colmetrics, lines[start:])
                result.append(t)
                start += t.sep_index + 1 + len(t.rows)
            else:
                break
        return result

    # Column metrics:
    ColMetric = namedtuple('ColMetric', ('offset', 'width'))

    def __init__(self, sep_index, colmetrics, lines):
        self.sep_index, self.colmetrics = sep_index, colmetrics

        # Parse column headings:
        header_index = self.sep_index - 1
        self.colnames = self._split_cells(lines[header_index])

        # Parse rows:
        self.rows = []
        for line in lines[self.sep_index + 1:]:
            if line == '':
                break
            self.rows.append(self._split_cells(line))

        self.rawdata = '\n'.join(lines[header_index:header_index+len(self.rows)+2])

    def __str__(self):
        return self.rawdata

    def as_rst_grid_table(self):
        def _get_separator_row(colwidths, sepchar):
            return '+' + ('+'.join([sepchar * width
                                    for width in colwidths])) + '+\n'
        def _get_row(values, colwidths):
            row = '|'
            cells = []
            for value, width in zip(values, colwidths):
                if value is None:
                    cells.append(' ' * width)
                else:
                    formatString = "%%%ds" % width # to generate e.g. "%20s"
                    cells.append(formatString % value)
            row += '|'.join([cell for cell in cells])
            row += '|\n'
            return row
        colwidths = [colmetric.width for colmetric in self.colmetrics]
        result = _get_separator_row(colwidths, '-')
        result += _get_row(self.colnames, colwidths)
        result += _get_separator_row(colwidths, '=')
        for row in self.rows:
            result += _get_row(row, colwidths)
            result += _get_separator_row(colwidths, '-')
        return result

    def get_cell(self, x, y):
        return self.rows[y][x]

    def find_col(self, colname):
        # Find the index of the column with the given name
        for x, col in enumerate(self.colnames):
            if colname == col:
                return x
        raise ColumnNotFound(colname, self)

    def find_row(self, kvs):
        # Find the first row matching the criteria, or raise RowNotFound
        criteria = Criteria(self, kvs)
        for row in self.rows:
            if criteria.is_matched_by(row):
                return row
        raise RowNotFound(criteria, self)

    def find_cell(self, kvs, attr2name):
        criteria = Criteria(self, kvs)
        row = self.find_row(kvs)
        return row[self.find_col(attr2name)]

    def _get_cell_value(self, cellstr):
        if cellstr == '':
            return None
        # Remove ',' separators from numbers, and treat as decimal:
        m = re.match('^([0-9,]+)$', cellstr)
        if m:
            return int(cellstr.replace(',', ''))
        # Hexadecimal values:
        m = re.match('^(0x[0-9a-f]+)$', cellstr)
        if m:
            return int(cellstr, 16)
        # Keep as a str:
        return cellstr

    def _split_cells(self, line):
        row = []
        for col in self.colmetrics:
            cellstr = line[col.offset: col.offset+col.width].lstrip()
            cellvalue = self._get_cell_value(cellstr)
            row.append(cellvalue)
        return tuple(row)

    @classmethod
    def _find_separator_line(cls, lines):
        # Look for the separator line
        # Return (index, tuple of ColMetric)
        for i, line in enumerate(lines):
            if line.startswith('-'):
                widths = [len(frag) for frag in line.split('  ')]
                coldata = []
                offset = 0
                for width in widths:
                    coldata.append(cls.ColMetric(offset=offset, width=width))
                    offset += width + 2
                return (i, tuple(coldata))

# Test data for table parsing (edited fragment of output during development):
test_table = '''
junk line
       Domain        Kind                 Detail  Count  Allocated size
-------------  ----------  ---------------------  -----  --------------
       python         str                          3,891         234,936
uncategorized                        98312 bytes      1          98,312
uncategorized                         1544 bytes     43          66,392
uncategorized                         6152 bytes     10          61,520
       python       tuple                          1,421          54,168
                                                             0xdeadbeef
                                           TOTAL  9,377         857,592

another junk line

another table
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
        16         100           1,600
        24          50           1,200
    TOTALS         150           2,800

more junk
'''

class ParserTests(unittest.TestCase):
    def test_table_data(self):
        tables = ParsedTable.parse_lines(test_table)
        self.assertEquals(len(tables), 2)

        pt = tables[0]
        # Verify column names:
        self.assertEquals(pt.colnames,
                          ('Domain', 'Kind', 'Detail', 'Count', 'Allocated size'))

        # Verify (x,y) lookup, and type conversions:
        self.assertEquals(pt.get_cell(0, 0), 'python')
        self.assertEquals(pt.get_cell(1, 3), None)
        self.assertEquals(pt.get_cell(4, 5), 0xdeadbeef)
        self.assertEquals(pt.get_cell(4, 6), 857592)

        # Verify searching by value:
        self.assertEquals(pt.find_col('Count'), 3)
        self.assertEquals(pt.find_row([('Allocated size', 54168),]),
                          ('python', 'tuple', None, 1421, 54168))
        self.assertEquals(pt.find_cell([('Kind', 'str'),], 'Count'), 3891)

        # Error-checking:
        self.assertRaises(ColumnNotFound,
                          pt.find_col,
                          'Ensure that a non-existent column raises an error')
        self.assertRaises(RowNotFound,
                          pt.find_row, [('Count', -1)])

        # Verify that "rawdata" contains the correct string data:
        self.assert_(pt.rawdata.startswith('       Domain'))
        self.assert_(pt.rawdata.endswith('857,592'))

        # Test the second table:
        pt = tables[1]
        self.assertEquals(pt.colnames,
                          ('Chunk size', 'Num chunks', 'Allocated size'))
        self.assertEquals(pt.get_cell(2, 2), 2800)
        self.assert_(pt.rawdata.startswith('Chunk size'))
        self.assert_(pt.rawdata.endswith('2,800'))

    def test_multiple_tables(self):
        tables = ParsedTable.parse_lines(test_table * 5)
        self.assertEquals(len(tables), 10)

    def test_rst(self):
        tables = ParsedTable.parse_lines(test_table)
        self.assertEquals(len(tables), 2)
        pt = tables[0]
        rst_text = pt.as_rst_grid_table()
        exp = (
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|       Domain|      Kind|               Detail|Count|Allocated size|\n'
            '+=============+==========+=====================+=====+==============+\n'
            '|       python|       str|                     | 3891|        234936|\n'
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|uncategorized|          |          98312 bytes|    1|         98312|\n'
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|uncategorized|          |           1544 bytes|   43|         66392|\n'
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|uncategorized|          |           6152 bytes|   10|         61520|\n'
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|       python|     tuple|                     | 1421|         54168|\n'
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|             |          |                     |     |    3735928559|\n'
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|             |          |                TOTAL| 9377|        857592|\n'
            '+-------------+----------+---------------------+-----+--------------+\n')
        self.assertEquals(rst_text, exp)

if __name__ == "__main__":
    unittest.main()


================================================
FILE: run-gdb-heap
================================================
#!/bin/bash
# Handy script for launching a program under gdb, whilst wiring up gdb to use
# the working copy of gdb-heap
# Typical usage:
#   ./run-gdb-heap python
PYTHONPATH="$(pwd)" \
gdb \
  --eval-command="python import gdbheap" \
  --args $*


================================================
FILE: selftest.py
================================================
# Copyright (C) 2010 David Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

# Verify that gdb can print information on the heap of an inferior process
#
# Adapted from Python's Lib/test/test_gdb.py, which in turn was adapted from
# similar work in Unladen Swallow's Lib/test/test_jit_gdb.py

import os
import re
from subprocess import Popen, PIPE, call as subprocess_call
import sys
import unittest
import random

from test.test_support import run_unittest, findfile

if sys.maxint == 0x7fffffff:
    _32bit = True
else:
    _32bit = False

try:
    gdb_version, _ = Popen(["gdb", "--version"], stdout=PIPE).communicate()
except OSError:
    # This is what "no gdb" looks like.  There may, however, be other
    # errors that manifest this way too.
    raise unittest.SkipTest("Couldn't find gdb on the path")

gdb_version_number = re.search(r"^GNU gdb [^\d]*(\d+)\.", gdb_version)
if int(gdb_version_number.group(1)) < 7:
    raise unittest.SkipTest("gdb versions before 7.0 didn't support python embedding"
                            " Saw:\n" + gdb_version)

# Verify that "gdb" was built with the embedded python support enabled:
cmd = "--eval-command=python import sys; print sys.version_info"
p = Popen(["gdb", "--batch", cmd], stdout=PIPE)
gdbpy_version, _ = p.communicate()
if gdbpy_version == '':
    raise unittest.SkipTest("gdb not built with embedded python support")

class TestSource(object):
    '''Programmatically construct C source code for a test program that
    calls into the heap'''
    def __init__(self):
        self.decls = ''
        self.operations = ''
        self.num_ptrs = 0
        self.indent = '  '

    def add_line(self, code):
        self.operations += self.indent + code + '\n'

    def add_malloc(self, size, debug=False, typename=None):
        self.num_ptrs += 1
        varname = 'ptr%03i' % self.num_ptrs
        if typename:
            cast = '(%s)' % typename
        else:
            typename = 'void *'
            cast = ''
        self.add_line('%s%s = %smalloc(0x%x); /* %i */'
                      % (typename, varname, cast, size, size))
        if debug:
            self.add_line('printf(__FILE__ ":%%i:%s=%%p\\n", __LINE__, %s);'
                          % (varname, varname))
            self.add_line('fflush(stdout);')
        return varname

    def add_realloc(self, varname, size, debug=False):
        self.num_ptrs += 1
        new_varname = 'ptr%03i' % self.num_ptrs
        self.add_line('void *%s = realloc(%s, 0x%x);'
                      % (new_varname, varname, size))
        if debug:
            self.add_line('printf(__FILE__ ":%%i:%s=%%p\\n", __LINE__, %s);'
                          % (new_varname, new_varname))
            self.add_line('fflush(stdout);')
        return new_varname

    def add_free(self, varname, debug=False):
        self.add_line('free(%s);' % varname)

    def add_breakpoint(self):
        self.add_line('__asm__ __volatile__ ("int $03");')

    def as_c_source(self):
        result = '''
#include <stdio.h>
#include <stdlib.h>
'''
        result += self.decls
        result += '''
int main (int argc, char **argv)
{
''' + self.operations + '''
  return 0;
}
'''
        return result

class TestProgram(object):
    def __init__(self, name, source, is_cplusplus=False):
        self.name = name
        self.source = source
        if is_cplusplus:
            self.srcname = '%s.cc' % self.name
            compiler = 'g++'
        else:
            self.srcname = '%s.c' % self.name
            compiler = 'gcc'
        f = open(self.srcname, 'w')
        f.write(source)
        f.close()
        c = subprocess_call([compiler,
                             # We want debug information:
                             '-g',
                             # Name of the binary:
                             '-o', self.name,
                             # The source file:
                             self.srcname])
        # Check exit status:
        assert(c == 0)
        # Check that the binary exists:
        assert(os.path.exists(self.name))

from resultparser import ParsedTable, RowNotFound, test_table

class DebuggerTests(unittest.TestCase):
    """Test that the debugger can debug the heap"""

    def run_gdb(self, *args):
        """Runs gdb with the command line given by *args.  Returns its stdout, stderr"""
        out, err = Popen(args, stdout=PIPE, stderr=PIPE).communicate()
        return out, err

    def requires_binary(self, binary):
        # Slightly complicated: gdb will look for the binary within the PWD
        # as well as within the $PATH
        if os.path.exists(binary):
            # It's either an absolute or relative path, and directly exists:
            return
        p = Popen(['which', binary], stdout=PIPE, stderr=PIPE)
        out, err = p.communicate()
        if p.returncode == 0:
            # It's in the $PATH
            return
        raise unittest.SkipTest("%s not found" % binary)

    def command_test(self, progargs, commands, breakpoint=None):
        self.requires_binary(progargs[0])

        # Run under gdb, hit the breakpoint, then run our "heap" command:
        commands = ['python sys.path.append(".") ; import gdbheap'] + commands
        args = ["gdb", "--batch"]
        args += ['--eval-command=%s' % cmd for cmd in commands]
        args += ["--args"] + progargs
        # print args
        # print ' '.join(args)

        # Use "args" to invoke gdb, capturing stdout, stderr:
        out, err = self.run_gdb(*args)

        # Ignore some noise on stderr due to a pending breakpoint:
        if breakpoint:
            err = err.replace('Function "%s" not defined.\n' % breakpoint, '')

        # Ensure no unexpected error messages:
        if err != '':
            print out
            print err
            self.fail('stderr from gdb was non-empty: %r' % err)

        return out

    def program_test(self, name, source, commands, is_cplusplus=False):
        p = TestProgram(name, source, is_cplusplus)
        return self.command_test([p.name], commands)

    def test_no_allocations(self):
        # Verify handling of an inferior process that doesn't use the heap
        src = TestSource()
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_no_allocations', source,
                                commands=['run', 'heap sizes'])
        self.assert_('''
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
    TOTALS           0               0
''' in out)

    def test_small_allocations(self):
        src = TestSource()
        # 100 allocations each of sizes in the range 1-15
        for i in range(100):
            for size in range(1, 16):
                src.add_malloc(size)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_small_allocations', source,
                                commands=['run', 'heap sizes'])
        if _32bit:
            exp = '''
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
        16        1200          19,200
        24         300           7,200
    TOTALS        1500          26,400
'''
        else:
            exp = '''
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
        32        1500          48,000
    TOTALS        1500          48,000
'''
        self.assert_(exp in out, out)

    def test_large_allocations(self):
        # 10 allocations each of sizes in the range 1MB through 10MB:
        src = TestSource()
        for i in range(10):
            size = 1024 * 1024 * (i+1)
            src.add_malloc(size)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_large_allocations', source,
                                commands=['run', 'heap sizes'])
        self.assert_('''
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
10,489,856           1      10,489,856
 9,441,280           1       9,441,280
 8,392,704           1       8,392,704
 7,344,128           1       7,344,128
 6,295,552           1       6,295,552
 5,246,976           1       5,246,976
 4,198,400           1       4,198,400
 3,149,824           1       3,149,824
 2,101,248           1       2,101,248
 1,052,672           1       1,052,672
    TOTALS          10      57,712,640
''' in out)

    def test_mixed_allocations(self):
        # Compile test program
        source = '''
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
  int i;
  void *ptrs[100];

  /* Some small allocations: */
  for (i=0; i < 100; i++) {
    ptrs[i] = malloc(256);
    printf("malloc returned %p\\n", ptrs[i]);
    fflush(stdout);
  }

  /* Free one of the small allocations: */
  free(ptrs[50]);

  void* ptr1 = malloc(1000);
  void* ptr2 = malloc(1000);
  void* ptr3 = malloc(256000); /* large allocation */

  /* Directly insert a breakpoint: */
  __asm__ __volatile__ ("int $03");

  return 0;
}
'''
        out = self.program_test('test_simple', source,
                                commands=['run', 'heap sizes'])
        #print out
        # Verify the result
        if _32bit:
            exp = '''
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
   258,048           1         258,048
       264          99          26,136
     1,008           2           2,016
    TOTALS         102         286,200
'''
        else:
            exp = '''
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
   258,048           1         258,048
       272          99          26,928
     1,008           2           2,016
    TOTALS         102         286,992
'''
        self.assert_(exp in out, out)

    def random_size(self):
        size = random.randint(1, 64)
        if random.randint(0, 5) == 0:
            size *= 1024
            size += random.randint(0, 1023)
        if random.randint(0, 5) == 0:
            size *= 256
            size += random.randint(0, 255)
        return size

    def test_random_allocations(self):
        # Fuzz-testing: lots of allocations (of various sizes)
        # and deallocations
        src = TestSource()
        sizes = {}
        live_blocks = set()
        for i in range(100):
            action = random.randint(1, 100)
            # 70% chance of malloc:
            if action <= 70:
                size = self.random_size()
                varname = src.add_malloc(size, debug=True)
                sizes[varname] = size
                live_blocks.add(varname)
            if len(live_blocks) > 0:
                # 10% chance of realloc:
                if action in range(71, 80):
                    size = self.random_size()
                    old_varname = random.sample(live_blocks, 1)[0]
                    live_blocks.remove(old_varname)
                    new_varname = src.add_realloc(old_varname, size, debug=True)
                    sizes[new_varname] = size
                    live_blocks.add(new_varname)
                # 20% chance of freeing something:
                elif action > 80:
                    varname = random.sample(live_blocks, 1)[0]
                    live_blocks.remove(varname)
                    src.add_free(varname)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_random_allocations', source,
                                commands=(['run'] + ['heap select', 'cont'] * 100))
        # We have 100 states of the inferior process; check that each was
        # reported as we expected it to be:
        tables = ParsedTable.parse_lines(out)
        self.assertEqual(len(tables), 100)
        for i in range(100):
            heap_select_out = tables[i]
            #print heap_select_out
            reported_addrs = set([heap_select_out.get_cell(0, y)
                                  for y in range(len(heap_select_out.rows))])
            #print reported_addrs
            # FIXME: do some verification at each breakpoint: check that the
            # reported values correspond to what we expect

    def test_random_buffers(self):
        # Fuzz-testing: try to break the heuristics by throwing random bytes
        # at them.  Note that we do the randomization at the python level when
        # generating the C code, so that the result of running any given C code
        # is entirely reproducible
        src = TestSource()
        for i in range(100):
            varname = src.add_malloc(256, typename='unsigned char*')
            for offset in range(256):
                value = random.randint(0, 255)
                src.add_line('%s[%i]=0x%02x;' % (varname, offset, value))
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_random_buffers', source,
                                commands=['run', 'heap'])
        # print out

    def test_cplusplus(self):
        '''Verify that we can detect and categorize instances of C++ classes'''
        # Note that C++ detection is currently disabled due to a bug in execution capture
        src = TestSource()
        src.decls += '''
class Foo {
public:
    virtual ~Foo() {}
    int f1;
    int f2;
};

class Bar : Foo {
public:
    virtual ~Bar() {}
    int f1;
    // Ensure that Bar has a different allocated size to Foo, on every arch:
    int buffer[256];
};
'''
        for i in range(100):
            src.add_line('{Foo *f = new Foo();}')
            if i % 2:
                src.add_line('{Bar *b = new Bar();}')
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_cplusplus', source,
                                is_cplusplus=True,
                                commands=['run', 'heap sizes', 'heap'])
        tables = ParsedTable.parse_lines(out)
        heap_sizes_out = tables[0]
        heap_out = tables[1]

        # We ought to have 150 live blocks on the heap:
        self.assertHasRow(heap_out,
                          [('Detail', 'TOTAL'), ('Count', 150)])

        # Use the differing counts of the blocks to locate the objects
        # FIXME: change the "Domain" values below and add "Kind" once C++
        # identification is re-enabled:
        self.assertHasRow(heap_out,
                          [('Count', 100), ('Domain', 'uncategorized')])
        self.assertHasRow(heap_out,
                          [('Count', 50), ('Domain', 'uncategorized')])

    def test_history(self):
        src = TestSource()
        src.add_malloc(100)
        src.add_malloc(100)
        src.add_malloc(100)
        src.add_breakpoint()
        src.add_malloc(200)
        src.add_malloc(200)
        src.add_malloc(200)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_history', source,
                                commands=['run',
                                          'heap sizes',
                                          'heap label foo',
                                          'cont',
                                          'heap log',
                                          'heap diff'])
        #print out
        # FIXME

    def assertHasRow(self, table, kvs):
        # table.find_row will raise a RowNotFound exception if there's a problem
        return table.find_row(kvs)

    def assertFoundCategory(self, table, domain, kind, detail=None):
        # Ensure that the result table has a row of the given category
        # (or raise RowNotFound)
        kvs = [('Domain', domain),
               ('Kind', kind)]
        if detail:
            kvs.append( ('Detail', detail) )
        self.assertHasRow(table, kvs)

    def test_assertions(self):
        # Ensure that the domain-specific assertions work
        tables = ParsedTable.parse_lines(test_table)
        self.assertEquals(len(tables), 2)
        pt = tables[0]
        self.assertHasRow(pt,
                          [('Domain', 'python'), ('Kind', 'str')])
        self.assertRaises(RowNotFound,
                          lambda: self.assertHasRow(pt, [('Domain', 'ruby')]))
        self.assertFoundCategory(pt, 'python', 'str')
        self.assertRaises(RowNotFound,
                          lambda: self.assertFoundCategory(pt, 'ruby', 'class'))

    def test_gobject(self):
        out = self.command_test(['gtk-demo'],
                                commands=['set breakpoint pending yes',
                                          'set environment G_SLICE=always-malloc', # for now
                                          'break gtk_main',
                                          'run',
                                          'heap',
                                          ])
        # print out
        tables = ParsedTable.parse_lines(out)
        heap_out = tables[0]

        # Ensure that instances of GObject classes are categorized:
        self.assertFoundCategory(heap_out, 'GType', 'GtkTreeView')
        self.assertFoundCategory(heap_out, 'GType', 'GtkLabel')

        # Ensure that instances of fundamental boxed types are categorized:
        self.assertFoundCategory(heap_out, 'GType', 'gchar')
        self.assertFoundCategory(heap_out, 'GType', 'guint')

        # Ensure that the code detected buffers used by the GLib/GTK types:
        self.assertFoundCategory(heap_out, 'GType', 'GdkPixbuf pixels', '107w x 140h')

        # GdkImage -> X11 Images -> data:
        self.assertFoundCategory(heap_out, 'GType', 'GdkImage')
        self.assertFoundCategory(heap_out, 'X11', 'Image')
        if False:
            # Only seen whilst using X forwarded over ssh:
            self.assertFoundCategory(heap_out, 'X11', 'Image data')
        # In both above rows, "Detail" contains the exact dimensions, but these
        # seem to vary with the resolution of the display the test is run
        # against

        # FreeType:
        # These seem to be highly dependent on the environment; I originally
        # developed this whilst using X forwarded over ssh
        if False:
            self.assertFoundCategory(heap_out, 'GType', 'PangoCairoFcFontMap')
            self.assertFoundCategory(heap_out, 'FreeType', 'Library')
            self.assertFoundCategory(heap_out, 'FreeType', 'raster_pool')

    def test_python2(self):
        self._impl_test_python('python2', py3k=False)

    def test_python3(self):
        self._impl_test_python('python3', py3k=True)

    def _impl_test_python(self, pyruntime, py3k):
        # Test that we can debug CPython's memory usage, for a given runtime

        # Invoke a test python script, stopping at a breakpoint
        out = self.command_test([pyruntime, 'object-sizes.py'],
                                commands=['set breakpoint pending yes',
                                          'break builtin_id',
                                          'run',
                                          'heap cpython-allocators',
                                          'heap',
                                          'heap select kind="PyListObject ob_item table"'],
                                breakpoint='builtin_id')
        # Re-enable this for debugging:
        # print out

        tables = ParsedTable.parse_lines(out)

        # Verify that "cpython-allocators" works:
        allocators_out = tables[0]
        self.assertEquals(allocators_out.colnames,
                          ('struct arena_object*', '256KB buffer location', 'Free pools'))
        # print allocators_out
        # self.assertHasRow(allocators_out,
        #                   kvs = [('Domain', 'cpython'),
        #                          ('Kind', 'PyListObject ob_item table')])

        heap_out = tables[1]

        # Verify that "select" works for a category that's only detectable
        # w.r.t. other categories:
        select_out = tables[2]
        # print select_out
        self.assertHasRow(select_out,
                          kvs = [('Domain', 'cpython'),
                                 ('Kind', 'PyListObject ob_item table')])

        # Ensure that the code detected instances of various python types we
        # expect to be present:
        for kind in ('str', 'list', 'tuple', 'dict', 'type', 'code',
                     'set', 'frozenset', 'function', 'module', 'frame',
                     ):
            self.assertFoundCategory(heap_out, 'python', kind)
        if py3k:
            self.assertFoundCategory(heap_out, 'python', 'bytes')
        else:
            self.assertFoundCategory(heap_out, 'python', 'unicode')

        # Ensure that the blocks of int allocations are detected:
        if not py3k:
            self.assertFoundCategory(heap_out, 'cpython', '_intblock', '')

        # Ensure that bytecode "strings" are marked as such:
        self.assertFoundCategory(heap_out, 'python', 'str', 'bytecode') # FIXME

        # Ensure that old-style classes are printed with a meaningful name
        # (i.e. not just "type"):
        if not py3k:
            for clsname in ('OldStyle', 'OldStyleManyAttribs'):
                self.assertFoundCategory(heap_out, 'python', clsname, 'old-style')
                # ...and that their instance dicts are marked:
                self.assertFoundCategory(heap_out, 'cpython', 'PyDictObject',
                                         '%s.__dict__' % clsname)
            # ...and that an old-style instance with enough attributes to require a
            # separate PyDictEntry buffer for its __dict__ has that buffer marked
            # with the typename:
            self.assertFoundCategory(heap_out, 'cpython', 'PyDictEntry table',
                                     'OldStyleManyAttribs.__dict__')

        # Likewise for new-style classes:
        for clsname in ('NewStyle', 'NewStyleManyAttribs'):
            self.assertHasRow(heap_out,
                              [('Domain', 'python'),
                               ('Kind', clsname),
                               ('Detail', None)])
            self.assertFoundCategory(heap_out, 'python', 'dict',
                                     '%s.__dict__' % clsname)
        self.assertFoundCategory(heap_out, 'cpython', 'PyDictEntry table',
                                 'NewStyleManyAttribs.__dict__')

        # Ensure that the code detected buffers used by python types:
        for kind in ('PyDictEntry table', 'PyListObject ob_item table',
                     'PySetObject setentry table', 'PyUnicodeObject buffer',
                     'PyDictEntry table'):
            self.assertFoundCategory(heap_out, 'cpython', kind)

        # and of other types:
        self.assertFoundCategory(heap_out, 'C', 'string data')
        self.assertFoundCategory(heap_out, 'pyarena', 'pool_header overhead')

        # Ensure that the "interned" table is identified (it's typically
        # at least 200k on a 64-bit build):
        self.assertHasRow(heap_out,
                          [('Domain', 'cpython'),
                           ('Kind', 'PyDictEntry table'),
                           ('Detail', 'interned'),
                           ('Count', 1)])

        # Ensure that we detect python sqlite3 objects:
        for kind in ('sqlite3.Connection', 'sqlite3.Statement', 'sqlite3.Cache'):
            self.assertFoundCategory(heap_out, 'python', kind)
        # ...and that we detect underlying sqlite3 buffers:
        for kind in ('sqlite3', 'sqlite3_stmt'):
            self.assertFoundCategory(heap_out, 'sqlite3', kind)

    def test_pypy(self):
        # Try to investigate memory usage of pypy-c
        # Developed using pypy-1.4.1 as packaged on Fedora.
        #
        # In order to get meaningful data, let's try to trap the exit point
        # of pypy-c within gdb.
        #
        # For now, lets try to put a breakpoint in this location within the
        # generated "pypy_g_entry_point" C function:
        #   print_stats:158 : debug_stop("jit-summary")
        out = self.command_test(['pypy', 'object-sizes.py'],
                                commands=['set breakpoint pending yes',
                                          'break pypy_debug_stop',
                                          'condition 1 0==strcmp(category, "jit-summary")',
                                          'run',
                                          'heap',
                                          ])
        tables = ParsedTable.parse_lines(out)
        select_out = tables[0]

    def test_select(self):
        # Ensure that "heap select" with no query does something sane
        src = TestSource()
        for i in range(3):
            src.add_malloc(1024)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_select', source,
                                commands=['run',
                                          'heap select',
                                          ])
        tables = ParsedTable.parse_lines(out)
        select_out = tables[0]
        # The "heap select" command should select all blocks:
        self.assertEquals(select_out.colnames,
                          ('Start', 'End', 'Domain', 'Kind', 'Detail', 'Hexdump'))
        self.assertEquals(len(select_out.rows), 3)

        # Test that syntax errors are well handled:
        out = self.program_test('test_select', source,
                                commands=['run',
                                          'heap select I AM A SYNTAX ERROR',
                                          ])
        errmsg = '''
Parse error at "AM":
I AM A SYNTAX ERROR
  ^^
'''
        if errmsg not in out:
            self.fail('Did not find expected "ParseError" message in:\n%s' % out)

        # Test that unknown attributes are well-handled:
        out = self.program_test('test_select', source,
                                commands=['run',
                                          'heap select NOT_AN_ATTRIBUTE > 42',
                                          ])
        errmsg = '''
Unknown attribute "NOT_AN_ATTRIBUTE" (supported are domain,kind,detail,addr,start,size) at "NOT_AN_ATTRIBUTE":
NOT_AN_ATTRIBUTE > 42
^^^^^^^^^^^^^^^^
'''
        if errmsg not in out:
            self.fail('Did not find expected "Unknown attribute" error message in:\n%s' % out)

        # Ensure that ply did not create debug files (ticket #12)
        for filename in ('parser.out', 'parsetab.py'):
            if os.path.exists(filename):
                self.fail('Unexpectedly found file %r' % filename)

    def test_select_by_size(self):
        src = TestSource()
        # Allocate ten 1kb blocks, nine 2kb blocks, etc, down to one 10kb
        # block, so that we can easily query them by size:
        for i in range(10):
            for j in range(10-i):
                size = 1024 * (i+1)
                src.add_malloc(size)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_select_by_size', source,
                                commands=['run',
                                          'heap',
                                          'heap select size >= 10240', # (parsed as "largest_out" below)
                                          'heap select size < 2048', # (parsed as "smallest_out" below)
                                          'heap select size >= 4096 and size < 8192', # (parsed as "middle_out" below)
                                          ])
        tables = ParsedTable.parse_lines(out)
        heap_out = tables[0]
        largest_out = tables[1]
        smallest_out = tables[2]
        middle_out = tables[3]

        # The "heap" command should find all the allocations:
        self.assertHasRow(heap_out,
                          [('Detail', 'TOTAL'), ('Count', 55)])

        # The query for the largest should find just one allocation:
        self.assertEquals(len(largest_out.rows), 1)

        # The query for the smallest should find ten allocations:
        self.assertEquals(len(smallest_out.rows), 10)

        # The middle query [4096, 8192) should capture the following
        # allocations:
        #   7 of (4*1024), 6 of (5*1024), 5 of (6*1024) and 4 of (7*1024),
        # giving a total count of 7+6+5+4 = 22
        self.assertEquals(len(middle_out.rows), 22)

    def test_select_by_category(self):
        out = self.command_test(['python', '-c', 'id(42)'],
                                commands=['set breakpoint pending yes',
                                          'break builtin_id',
                                          'run',
                                          'heap select domain="python" and kind="str" and size > 512'],
                                breakpoint='builtin_id')
        tables = ParsedTable.parse_lines(out)
        select_out = tables[0]
        # Ensure that the filtering mechanism worked:
        if len(select_out.rows) < 10:
            self.fail("Didn't find any large python strings"
                      " (has something gone wrong?) in: %s" % select_out)
        for row in select_out.rows:
            self.assertEquals(row[2], 'python')
            self.assertEquals(row[3], 'str')

    def test_heap_used(self):
        # Ensure that "heap used" works
        src = TestSource()
        for i in range(3):
            src.add_malloc(1024)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_heap_used', source,
                                commands=['run',
                                          'heap used',
                                          ])
        # FIXME: do some verification of the output

    def test_heap_all(self):
        # Ensure that "heap all" works
        src = TestSource()
        for i in range(3):
            src.add_malloc(1024)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_heap_all', source,
                                commands=['run',
                                          'heap all',
                                          ])
        # FIXME: do some verification of the output

from heap.parser import parse_query
from heap.query import Constant, And, Or, Not, GetAttr, \
    Comparison__le__, Comparison__lt__, Comparison__eq__, \
    Comparison__ne__, Comparison__ge__, Comparison__gt__

class QueryParsingTests(unittest.TestCase):
    def assertParsesTo(self, s, result):
        self.assertEquals(parse_query(s), result)

    def test_simple_comparisons(self):
        self.assertParsesTo('size >= 1024',
                            Comparison__ge__(GetAttr('size'), Constant(1024)))

        # Check that hexadecimal numeric literals are parsed:
        self.assertParsesTo('addr > 0xbf70ffff',
                            Comparison__gt__(GetAttr('addr'), Constant(0xbf70ffff)))

        # Check that string literals are parsed:
        self.assertParsesTo('kind == "str"',
                            Comparison__eq__(GetAttr('kind'), Constant('str')))

        # Check "and":
        self.assertParsesTo('kind == "str" and size > 1024',
                            And(Comparison__eq__(GetAttr('kind'), Constant('str')),
                                Comparison__gt__(GetAttr('size'), Constant(1024))))

        # Check "not":
        self.assertParsesTo('size > 10000 and not domain="uncategorized"',
                            And(Comparison__gt__(GetAttr('size'), Constant(10000)),
                                Not(Comparison__eq__(GetAttr('domain'),
                                                     Constant('uncategorized')))))

        # Do we want algebraic support?
        #self.assertParsesTo('size == (256 * 1024)+8',
        #                    Comparison('size', '==', 1024L))

if __name__ == "__main__":
    unittest.main()
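The core parsing idea in resultparser.py is that a dashed separator row (e.g. `----------  ----------  --------------`) encodes the column layout of the whole table: each run of dashes gives one column's offset and width, and every other row can then be sliced at those positions. A simplified standalone sketch of that heuristic (using a regex over dash runs rather than the `ColMetric`/`split`-based computation in `_find_separator_line`; the function names here are illustrative, not part of the repo's API):

```python
import re

def column_metrics(sep_line):
    """Derive (offset, width) column metrics from a dashed separator row,
    e.g. '----------  ----------  --------------'."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r'-+', sep_line)]

def split_cells(line, metrics):
    """Slice a table row into cells using the column metrics."""
    return tuple(line[off:off + width].strip() for off, width in metrics)

sep = '----------  ----------  --------------'
header = 'Chunk size  Num chunks  Allocated size'
# Build the data row by concatenation so the alignment is explicit:
row = '        16' + '  ' + '       100' + '  ' + '         1,600'

metrics = column_metrics(sep)
print(split_cells(header, metrics))  # ('Chunk size', 'Num chunks', 'Allocated size')
print(split_cells(row, metrics))     # ('16', '100', '1,600')
```

Slicing by fixed offsets (rather than splitting rows on whitespace) is what lets cells legitimately contain spaces, such as the `Detail` value `98312 bytes` in the test data above.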