Repository: rogerhu/gdb-heap
Branch: master
Commit: 8f7986c90754
Files: 24
Total size: 191.2 KB

Directory structure:
gitextract_kppz_9el/
├── .gitignore
├── ChangeLog.rst
├── LICENSE-lgpl-2.1.txt
├── LICENSE-python.txt
├── LICENSE.txt
├── README.md
├── gdbheap.py
├── heap/
│   ├── __init__.py
│   ├── commands.py
│   ├── compat.py
│   ├── cplusplus.py
│   ├── cpython.py
│   ├── glibc.py
│   ├── gobject.py
│   ├── history.py
│   ├── parser.py
│   ├── pypy.py
│   ├── query.py
│   └── sqlite.py
├── make-release.sh
├── object-sizes.py
├── resultparser.py
├── run-gdb-heap
└── selftest.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*~
*.pyc
*.pyo
gdb-heap-*.tar.bz2
test_*

================================================
FILE: ChangeLog.rst
================================================
==========
Change Log
==========

* Since glibc v2.15, there can now be multiple allocation arenas to support multi-threaded environments (http://stackoverflow.com/questions/10706466/how-does-malloc-work-in-a-multithreaded-environment). A new command, "heap arenas", lets you see how many arenas exist and their respective address locations.

================================================
FILE: LICENSE-lgpl-2.1.txt
================================================
GNU LESSER GENERAL PUBLIC LICENSE
Version 2.1, February 1999

Copyright (C) 1991, 1999 Free Software Foundation, Inc. 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. [This is the first released version of the Lesser GPL. It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.]

Preamble

The licenses for most software are designed to take away your freedom to share and change it.
By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below. When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things. To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights. We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library. 
To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others. Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances. 
For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License. In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system. Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library. The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run. GNU LESSER GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you". A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. 
The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does. 1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. 
You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. This option is useful when you wish to copy part of the code of the Library into a program that is not a library. 4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. 5. 
A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law. If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. 6. 
As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with. 
c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. 7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. 
b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License. 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. 
If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. 
The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Libraries If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License). To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. <one line to give the library's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. 
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Also add information on how to contact you by electronic and paper mail. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker. <signature of Ty Coon>, 1 April 1990 Ty Coon, President of Vice That's all there is to it! ================================================ FILE: LICENSE-python.txt ================================================ A. HISTORY OF THE SOFTWARE ========================== Python was created in the early 1990s by Guido van Rossum at Stichting Mathematisch Centrum (CWI, see http://www.cwi.nl) in the Netherlands as a successor of a language called ABC. Guido remains Python's principal author, although it includes many contributions from others. In 1995, Guido continued his work on Python at the Corporation for National Research Initiatives (CNRI, see http://www.cnri.reston.va.us) in Reston, Virginia where he released several versions of the software. In May 2000, Guido and the Python core development team moved to BeOpen.com to form the BeOpen PythonLabs team. In October of the same year, the PythonLabs team moved to Digital Creations (now Zope Corporation, see http://www.zope.com). In 2001, the Python Software Foundation (PSF, see http://www.python.org/psf/) was formed, a non-profit organization created specifically to own Python-related Intellectual Property. Zope Corporation is a sponsoring member of the PSF. All Python releases are Open Source (see http://www.opensource.org for the Open Source Definition). 
Historically, most, but not all, Python releases have also been GPL-compatible; the table below summarizes the various releases.

    Release         Derived     Year        Owner       GPL-
                    from                                compatible? (1)

    0.9.0 thru 1.2              1991-1995   CWI         yes
    1.3 thru 1.5.2  1.2         1995-1999   CNRI        yes
    1.6             1.5.2       2000        CNRI        no
    2.0             1.6         2000        BeOpen.com  no
    1.6.1           1.6         2001        CNRI        yes (2)
    2.1             2.0+1.6.1   2001        PSF         no
    2.0.1           2.0+1.6.1   2001        PSF         yes
    2.1.1           2.1+2.0.1   2001        PSF         yes
    2.2             2.1.1       2001        PSF         yes
    2.1.2           2.1.1       2002        PSF         yes
    2.1.3           2.1.2       2002        PSF         yes
    2.2.1           2.2         2002        PSF         yes
    2.2.2           2.2.1       2002        PSF         yes
    2.2.3           2.2.2       2003        PSF         yes
    2.3             2.2.2       2002-2003   PSF         yes
    2.3.1           2.3         2002-2003   PSF         yes
    2.3.2           2.3.1       2002-2003   PSF         yes
    2.3.3           2.3.2       2002-2003   PSF         yes
    2.3.4           2.3.3       2004        PSF         yes
    2.3.5           2.3.4       2005        PSF         yes
    2.4             2.3         2004        PSF         yes
    2.4.1           2.4         2005        PSF         yes
    2.4.2           2.4.1       2005        PSF         yes
    2.4.3           2.4.2       2006        PSF         yes
    2.4.4           2.4.3       2006        PSF         yes
    2.5             2.4         2006        PSF         yes
    2.5.1           2.5         2007        PSF         yes
    2.5.2           2.5.1       2008        PSF         yes
    2.5.3           2.5.2       2008        PSF         yes
    2.6             2.5         2008        PSF         yes
    2.6.1           2.6         2008        PSF         yes
    2.6.2           2.6.1       2009        PSF         yes
    2.6.3           2.6.2       2009        PSF         yes
    2.6.4           2.6.3       2009        PSF         yes
    2.6.5           2.6.4       2010        PSF         yes

Footnotes:

(1) GPL-compatible doesn't mean that we're distributing Python under the GPL. All Python licenses, unlike the GPL, let you distribute a modified version without making your changes open source. The GPL-compatible licenses make it possible to combine Python with other software that is released under the GPL; the others don't.

(2) According to Richard Stallman, 1.6.1 is not GPL-compatible, because its license has a choice of law clause. According to CNRI, however, Stallman's lawyer has told CNRI's lawyer that 1.6.1 is "not incompatible" with the GPL.

Thanks to the many outside volunteers who have worked under Guido's direction to make these releases possible.

B. TERMS AND CONDITIONS FOR ACCESSING OR OTHERWISE USING PYTHON
===============================================================

PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2
--------------------------------------------

1. 
This LICENSE AGREEMENT is between the Python Software Foundation ("PSF"), and the Individual or Organization ("Licensee") accessing and otherwise using this software ("Python") in source or binary form and its associated documentation. 2. Subject to the terms and conditions of this License Agreement, PSF hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python alone or in any derivative version, provided, however, that PSF's License Agreement and PSF's notice of copyright, i.e., "Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 Python Software Foundation; All Rights Reserved" are retained in Python alone or in any derivative version prepared by Licensee. 3. In the event Licensee prepares a derivative work that is based on or incorporates Python or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to Python. 4. PSF is making Python available to Licensee on an "AS IS" basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 6. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 7. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between PSF and Licensee. 
This License Agreement does not grant permission to use PSF trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party. 8. By copying, installing or otherwise using Python, Licensee agrees to be bound by the terms and conditions of this License Agreement. BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0 ------------------------------------------- BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1 1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the Individual or Organization ("Licensee") accessing and otherwise using this software in source or binary form and its associated documentation ("the Software"). 2. Subject to the terms and conditions of this BeOpen Python License Agreement, BeOpen hereby grants Licensee a non-exclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use the Software alone or in any derivative version, provided, however, that the BeOpen Python License is retained in the Software, alone or in any derivative version prepared by Licensee. 3. BeOpen is making the Software available to Licensee on an "AS IS" basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 5. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 6. 
This License Agreement shall be governed by and interpreted in all respects by the law of the State of California, excluding conflict of law provisions. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between BeOpen and Licensee. This License Agreement does not grant permission to use BeOpen trademarks or trade names in a trademark sense to endorse or promote products or services of Licensee, or any third party. As an exception, the "BeOpen Python" logos available at http://www.pythonlabs.com/logos.html may be used according to the permissions granted on that web page. 7. By copying, installing or otherwise using the software, Licensee agrees to be bound by the terms and conditions of this License Agreement. CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1 --------------------------------------- 1. This LICENSE AGREEMENT is between the Corporation for National Research Initiatives, having an office at 1895 Preston White Drive, Reston, VA 20191 ("CNRI"), and the Individual or Organization ("Licensee") accessing and otherwise using Python 1.6.1 software in source or binary form and its associated documentation. 2. Subject to the terms and conditions of this License Agreement, CNRI hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python 1.6.1 alone or in any derivative version, provided, however, that CNRI's License Agreement and CNRI's notice of copyright, i.e., "Copyright (c) 1995-2001 Corporation for National Research Initiatives; All Rights Reserved" are retained in Python 1.6.1 alone or in any derivative version prepared by Licensee. Alternately, in lieu of CNRI's License Agreement, Licensee may substitute the following text (omitting the quotes): "Python 1.6.1 is made available subject to the terms and conditions in CNRI's License Agreement. 
This Agreement together with Python 1.6.1 may be located on the Internet using the following unique, persistent identifier (known as a handle): 1895.22/1013. This Agreement may also be obtained from a proxy server on the Internet using the following URL: http://hdl.handle.net/1895.22/1013". 3. In the event Licensee prepares a derivative work that is based on or incorporates Python 1.6.1 or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to Python 1.6.1. 4. CNRI is making Python 1.6.1 available to Licensee on an "AS IS" basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 6. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 7. This License Agreement shall be governed by the federal intellectual property law of the United States, including without limitation the federal copyright law, and, to the extent such U.S. federal law does not apply, by the law of the Commonwealth of Virginia, excluding Virginia's conflict of law provisions. 
Notwithstanding the foregoing, with regard to derivative works based on Python 1.6.1 that incorporate non-separable material that was previously distributed under the GNU General Public License (GPL), the law of the Commonwealth of Virginia shall govern this License Agreement only as to issues arising under or with respect to Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between CNRI and Licensee. This License Agreement does not grant permission to use CNRI trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party. 8. By clicking on the "ACCEPT" button where indicated, or by copying, installing or otherwise using Python 1.6.1, Licensee agrees to be bound by the terms and conditions of this License Agreement. ACCEPT CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2 -------------------------------------------------- Copyright (c) 1991 - 1995, Stichting Mathematisch Centrum Amsterdam, The Netherlands. All rights reserved. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Stichting Mathematisch Centrum or CWI not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. 
STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ================================================ FILE: LICENSE.txt ================================================ gdb-heap is licensed under the LGPLv2.1, with the exception of heap/python.py, which is licensed under the PSF license. ================================================ FILE: README.md ================================================ gdb-heap ======== Original fork derived from `https://fedorahosted.org/gdb-heap/`. This repo is now considered the official repository for the gdb-heap library. Installation instructions ------------------------- 1. To get this module working with Ubuntu 16.04, make sure you have the following packages installed: ``` sudo apt-get install libc6-dev libc6-dbg python-gi libglib2.0-0-dbg python-ply ``` The original forked version assumes an "import gdb" module, which resides in "/usr/share/glib-2.0/gdb" as part of the `libglib2.0-0-dbg` package. In earlier versions of Ubuntu, this library is located in the `libglib2.0-dev` package. There is also a conflict with the python-gobject-2 package, which provides deprecated Python bindings for the GObject library. This package installs a glib/ directory into /usr/lib/python2.7/dist-packages (e.g. /usr/lib/python2.7/dist-packages/glib/option.py), on which many Gtk-related modules depend. You therefore need to make sure that /usr/share/glib-2.0/gdb appears first on sys.path (see code example).
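The sys.path precedence described above is easy to demonstrate outside of gdb. The snippet below is a minimal, self-contained sketch; the temporary directory names and the `whichglib` module are invented stand-ins for /usr/share/glib-2.0/gdb and the conflicting python-gobject-2 copy. Python imports a module from whichever matching sys.path entry comes first, which is why the glib gdb-helpers directory must be inserted at position 0.

```python
# Minimal sketch of why sys.path order matters; the directory names and
# the "whichglib" module are invented stand-ins, not the real packages.
import os
import sys
import tempfile

def make_module(root, body):
    os.makedirs(root, exist_ok=True)
    with open(os.path.join(root, "whichglib.py"), "w") as f:
        f.write(body)

tmp = tempfile.mkdtemp()
helpers = os.path.join(tmp, "glib-gdb-helpers")   # stand-in for /usr/share/glib-2.0/gdb
dist = os.path.join(tmp, "dist-packages")         # stand-in for the conflicting copy

make_module(helpers, "ORIGIN = 'gdb helpers'\n")
make_module(dist, "ORIGIN = 'python-gobject-2'\n")

sys.path.append(dist)         # dist-packages is typically already on sys.path
sys.path.insert(0, helpers)   # put the gdb helpers first

import whichglib

# The first matching entry on sys.path wins:
print(whichglib.ORIGIN)
```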
You'll also want to install python-dbg, since that package comes with the debugging symbols for the stock Python 2.7, as well as a python-dbg binary compiled with the --with-pydebug option that will only work with C extension modules compiled against the /usr/include/python2.7_d headers. NOTE: The Python binary that accompanies Ubuntu distributions is built with link-time optimization. As a result, many of the Python data structures are optimized out, which prevents gdb-heap from properly categorizing them. To take advantage of this capability, you will need to download the Python source and recompile without the -flto option in the CFLAGS/LDFLAGS configuration options. The standard configure script does not normally enable this option, so simply compiling should do the trick. (If you want to have SSL support in this binary, make sure to edit Modules/Setup.dist). The python-dbg binary is compiled with the Py_TRACE_REFS conditional via the --with-pydebug option, which modifies the internal Python data structures and adds two pointers to every base PyObject, preventing previously compiled C extensions from being used. Using your own compiled version of Python is therefore the way to go if you want to take advantage of the categorize features of gdb-heap and/or inspect the internal memory structures of Python. 2. Create a file that will help automate the loading of the gdbheap library: gdb-heap-commands: ``` python import sys sys.path.insert(0, "/usr/share/glib-2.0/gdb") sys.path.append("/usr/share/glib-2.0/gdb") sys.path.append("/home/rhu/projects/gdb-heap") import gdbheap end ``` To attach to an existing process, you can execute as follows: ```bash sudo gdb -p 7458 -x ~/gdb-heap-commands ``` To take a core dump of a process, you can do the following: ``` 1) sudo gdb -p <pid> 2) Type "generate-core-file" at the GDB prompt. 3) Wait awhile (and be careful not to hit enter again, since that will repeat the same command) 4) Copy the core file somewhere.
``` You can then use gdb to attach to this core file: ```bash sudo gdb python <core file> -x ~/gdb-heap-commands ``` Commands to run --------------- ``` heap - print a report on memory usage, by category heap sizes - print a report on memory usage, by sizes heap used - print used heap chunks heap free - print free heap chunks heap all - print all heap chunks heap log - print a log of recorded heap states heap label - record the current state of the heap for later comparison heap diff - compare two states of the heap heap select - query used heap chunks hexdump [-c] - print a hexdump, starting at the specified region of memory (show hex characters with the -c option) heap arenas - print glibc arenas heap arena - select glibc arena number ``` Useful resources ---------------- * http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2011-dude-where-s-my-ram-a-deep-dive-into-how-python-uses-memory-4896725 (Dude - Where's My RAM? A deep dive into how Python uses memory - David Malcolm's PyCon 2011 video talk) * http://dmalcolm.fedorapeople.org/presentations/PyCon-US-2011/GdbPythonPresentation/GdbPython.html (David Malcolm's PyCon 2011 slides) * http://code.woboq.org/userspace/glibc/malloc/malloc.c.html (glibc malloc.c implementation) * Malloc per-thread arenas in glibc (http://siddhesh.in/journal/2012/10/24/malloc-per-thread-arenas-in-glibc/) * Understanding the heap by breaking it (http://www.blackhat.com/presentations/bh-usa-07/Ferguson/Whitepaper/bh-usa-07-ferguson-WP.pdf) * Building your own Python version for an easier debugging experience (http://hustoknow.blogspot.com/2014/06/how-to-troubleshoot-your-python.html) ================================================ FILE: gdbheap.py ================================================ # Copyright (C) 2010 David Hugh Malcolm # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or
(at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA from heap.commands import register_commands # Register the commands with gdb: register_commands() ================================================ FILE: heap/__init__.py ================================================ # Copyright (C) 2010 David Hugh Malcolm # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. 
# # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA from collections import namedtuple try: import gdb # We defer most type lookups to when they're needed, since they'll fail if the # DWARF data for the relevant DSO hasn't been loaded yet, which is typically # the case for an executable dynamically linked against glibc type_void_ptr = gdb.lookup_type('void').pointer() type_char_ptr = gdb.lookup_type('char').pointer() type_unsigned_char_ptr = gdb.lookup_type('unsigned char').pointer() sizeof_ptr = type_void_ptr.sizeof if sizeof_ptr == 4: def fmt_addr(addr): return '0x%08x' % addr else: # Assume 64-bit: def fmt_addr(addr): return '0x%016x' % addr except ImportError: # Support importing heap.parser from outside gdb pass class WrongInferiorProcess(RuntimeError): def __init__(self, hint): self.hint = hint NUM_HEXDUMP_BYTES = 20 __type_cache = {} def caching_lookup_type(typename): '''Adds caching to gdb.lookup_type(), whilst still raising RuntimeError if the type isn't found.''' if typename in __type_cache: gdbtype = __type_cache[typename] if gdbtype: return gdbtype raise RuntimeError('(cached) Could not find type "%s"' % typename) try: if 0: print('type cache miss: %r' % typename) gdbtype = gdb.lookup_type(typename).strip_typedefs() except RuntimeError as e: # did not find the type: add a None to the cache gdbtype = None __type_cache[typename] = gdbtype if gdbtype: return gdbtype raise RuntimeError('Could not find type "%s"' % typename) def array_length(_gdbval): '''Given a gdb.Value that's an array, determine the number of elements in the array''' arr_size = _gdbval.type.sizeof elem_size = _gdbval[0].type.sizeof return arr_size // elem_size def offsetof(typename, fieldname): '''Get the offset (in bytes) from the start of the given type to the given field''' # This is a transliteration to gdb's python API of: #
(int)(void*)&((#typename*)NULL)->#fieldname) t = caching_lookup_type(typename).pointer() v = gdb.Value(0) v = v.cast(t) field = v[fieldname].cast(type_void_ptr) return int(field.address) class MissingDebuginfo(RuntimeError): def __init__(self, module): self.module = module def check_missing_debuginfo(err, module): assert(isinstance(err, RuntimeError)) if err.args[0] == 'Attempt to extract a component of a value that is not a (null).': # Then we likely are trying to extract a field from a struct but don't # have the DWARF description of the fields of the struct loaded: raise MissingDebuginfo(module) class WrappedValue(object): """ Base class, wrapping an underlying gdb.Value adding various useful methods, and allowing subclassing """ def __init__(self, gdbval): self._gdbval = gdbval # __getattr__ just made it too confusing #def __getattr__(self, attr): # return WrappedValue(self.val[attr]) def field(self, attr): return self._gdbval[attr] def __str__(self): return str(self._gdbval) # See http://sourceware.org/gdb/onlinedocs/gdb/Values-From-Inferior.html#Values-From-Inferior @property def address(self): return self._gdbval.address @property def is_optimized_out(self): return self._gdbval.is_optimized_out @property def type(self): return self._gdbval.type @property def dynamic_type(self): return self._gdbval.dynamic_type @property def is_lazy(self): return self._gdbval.is_lazy def dereference(self): return WrappedValue(self._gdbval.dereference()) # def address(self): # return int(self._gdbval.cast(type_void_ptr)) def is_null(self): return int(self._gdbval) == 0 class WrappedPointer(WrappedValue): def as_address(self): return int(self._gdbval.cast(type_void_ptr)) def __str__(self): return ('<%s for inferior 0x%x>' % (self.__class__.__name__, self.as_address() ) ) def cast(self, type_): return WrappedPointer(self._gdbval.cast(type_)) def categorize_refs(self, usage_set, level=0, detail=None): '''Hook for categorizing references known by the type this points to''' # do 
nothing by default: pass def fmt_size(size): ''' Pretty-formatting of numeric values: return a string, subdividing the digits into groups of three, using commas ''' s = str(size) result = '' while len(s)>3: result = ',' + s[-3:] + result s = s[0:-3] result = s + result return result def as_hexdump_char(b): '''Given a byte, return a string for use by hexdump, converting non-printable/non-ASCII values as a period''' if b>=0x20 and b < 0x80: return chr(b) else: return '.' def sign(amt): if amt >= 0: return '+' else: return '' # the '-' sign will come from the numeric repr class Category(namedtuple('Category', ('domain', 'kind', 'detail'))): ''' Categorization of an in-use area of memory domain: high-level grouping e.g. "python", "C++", etc kind: type information, appropriate to the domain e.g. a class/type Domain Meaning of 'kind' ------ ----------------- 'C++' the C++ class 'python' the python class 'cpython' C structure/type (implementation detail within Python) 'pyarena' Python memory allocator detail: additional detail ''' def __new__(_cls, domain, kind, detail=None): return tuple.__new__(_cls, (domain, kind, detail)) def __str__(self): return '%s:%s:%s' % (self.domain, self.kind, self.detail) class Usage(object): # Information about an in-use area of memory slots = ('start', 'size', 'category', 'level', 'hd', 'obj') def __init__(self, start, size, category=None, level=None, hd=None, obj=None): assert isinstance(start, int) assert isinstance(size, int) if category: assert isinstance(category, Category) self.start = start self.size = size self.category = category self.level = level self.hd = hd self.obj = obj def __repr__(self): result = 'Usage(%s, %s' % (hex(self.start), hex(self.size)) if self.category: result += ', %r' % (self.category, ) if self.hd: result += ', hd=%r' % self.hd if self.obj: result += ', obj=%r' % self.obj return result + ')' def ensure_category(self, usage_set=None): if self.category is None: self.category = categorize(self, usage_set) def 
ensure_hexdump(self): if self.hd is None: self.hd = hexdump_as_bytes(self.start, NUM_HEXDUMP_BYTES) def hexdump_as_bytes(addr, size, chars_only=True): addr = gdb.Value(addr).cast(type_unsigned_char_ptr) bytebuf = [] for j in range(size): ptr = addr + j b = int(ptr.dereference()) bytebuf.append(b) result = '' if not chars_only: result += ' '.join(['%02x' % b for b in bytebuf]) + ' |' result += ''.join([as_hexdump_char(b) for b in bytebuf]) result += '|' return (result) def hexdump_as_int(addr, count): addr = gdb.Value(addr).cast(caching_lookup_type('unsigned long').pointer()) bytebuf = [] longbuf = [] for j in range(count): ptr = addr + j long = ptr.dereference() longbuf.append(long) bptr = gdb.Value(ptr).cast(type_unsigned_char_ptr) for i in range(sizeof_ptr): bytebuf.append(int((bptr + i).dereference())) return (' '.join([fmt_addr(int(long)) for long in longbuf]) + ' |' + ''.join([as_hexdump_char(b) for b in bytebuf]) + '|') class Table(object): '''A table of text/numbers that knows how to print itself''' def __init__(self, columnheadings=None, rows=[]): self.numcolumns = len(columnheadings) self.columnheadings = columnheadings self.rows = [] self._colsep = ' ' def add_row(self, row): assert len(row) == self.numcolumns self.rows.append(row) def write(self, out): colwidths = self._calc_col_widths() self._write_row(out, colwidths, self.columnheadings) self._write_separator(out, colwidths) for row in self.rows: self._write_row(out, colwidths, row) def _calc_col_widths(self): result = [] for colIndex in range(self.numcolumns): result.append(self._calc_col_width(colIndex)) return result def _calc_col_width(self, idx): cells = [str(row[idx]) for row in self.rows] heading = self.columnheadings[idx] return max([len(c) for c in (cells + [heading])]) def _write_row(self, out, colwidths, values): for i, (value, width) in enumerate(zip(values, colwidths)): if i > 0: out.write(self._colsep) formatString = "%%%ds" % width # to generate e.g.
"%20s" out.write(formatString % value) out.write('\n') def _write_separator(self, out, colwidths): for i, width in enumerate(colwidths): if i > 0: out.write(self._colsep) out.write('-' * width) out.write('\n') class UsageSet(object): def __init__(self, usage_list): self.usage_list = usage_list # Ensure we can do fast lookups: self.usage_by_address = dict([(int(u.start), u) for u in usage_list]) def set_addr_category(self, addr, category, level=0, visited=None, debug=False): '''Attempt to mark the given address as being of the given category, whilst maintaining a set of addresses already visited, to try to stop infinite graph traversal''' if visited: if addr in visited: if debug: print('addr 0x%x already visited (for category %r)' % (addr, category)) return False visited.add(addr) if addr in self.usage_by_address: if debug: print('addr 0x%x found (for category %r, level=%i)' % (addr, category, level)) u = self.usage_by_address[addr] # Bail if we already have a more detailed categorization for the # address: if level <= u.level: if debug: print ('addr 0x%x already has category %r (level %r)' % (addr, u.category, u.level)) return False u.category = category u.level = level return True else: if debug: print('addr 0x%x not found (for category %r)' % (addr, category)) class PythonCategorizer(object): ''' Logic for categorizing buffers owned by Python objects. (Done as an object to capture the type-lookup state) ''' def __init__(self): '''This will raise a RuntimeError if the types aren't available (e.g.
not a python app, or debuginfo not available''' self._type_PyDictObject_ptr = caching_lookup_type('PyDictObject').pointer() self._type_PyListObject_ptr = caching_lookup_type('PyListObject').pointer() self._type_PySetObject_ptr = caching_lookup_type('PySetObject').pointer() self._type_PyUnicodeObject_ptr = caching_lookup_type('PyUnicodeObject').pointer() self._type_PyCodeObject_ptr = caching_lookup_type('PyCodeObject').pointer() self._type_PyGC_Head = caching_lookup_type('PyGC_Head') @classmethod def make(cls): '''Try to make a PythonCategorizer, if debuginfo is available; otherwise return None''' try: return cls() except RuntimeError: return None def categorize(self, u, usage_set): '''Try to categorize a Usage instance within an UsageSet (which could lead to further categorization)''' c = u.category if c.domain != 'python': return False if u.obj: if u.obj.categorize_refs(usage_set): return True if c.kind == 'list': list_ptr = gdb.Value(u.start + self._type_PyGC_Head.sizeof).cast(self._type_PyListObject_ptr) ob_item = int(list_ptr['ob_item']) usage_set.set_addr_category(ob_item, Category('cpython', 'PyListObject ob_item table', None)) return True elif c.kind == 'set': set_ptr = gdb.Value(u.start + self._type_PyGC_Head.sizeof).cast(self._type_PySetObject_ptr) table = int(set_ptr['table']) usage_set.set_addr_category(table, Category('cpython', 'PySetObject setentry table', None)) return True if c.kind == 'code': # Python 2.6's PyCode_Type doesn't have Py_TPFLAGS_HAVE_GC: code_ptr = gdb.Value(u.start).cast(self._type_PyCodeObject_ptr) co_code = int(code_ptr['co_code']) usage_set.set_addr_category(co_code, Category('python', 'str', 'bytecode'), # FIXME: on py3k this should be bytes level=1) return True elif c.kind == 'sqlite3.Statement': ptr_type = caching_lookup_type('pysqlite_Statement').pointer() obj_ptr = gdb.Value(u.start).cast(ptr_type) #print obj_ptr.dereference() from heap.sqlite import categorize_sqlite3 for fieldname, catname, fn in (('db', 'sqlite3', 
categorize_sqlite3), ('st', 'sqlite3_stmt', None)): field_ptr = int(obj_ptr[fieldname]) # sqlite's src/mem1.c adds a sqlite3_int64 (size) to the front # of the allocation, so we need to look 8 bytes earlier to find # the malloc-ed region: malloc_ptr = field_ptr - 8 # print u, fieldname, category, field_ptr if usage_set.set_addr_category(malloc_ptr, Category('sqlite3', catname)): if fn: fn(field_ptr, usage_set, set()) return True elif c.kind == 'rpm.hdr': ptr_type = caching_lookup_type('struct hdrObject_s').pointer() if ptr_type: obj_ptr = gdb.Value(u.start).cast(ptr_type) # print obj_ptr.dereference() h = obj_ptr['h'] if usage_set.set_addr_category(int(h), Category('rpm', 'Header', None)): blob = h['blob'] usage_set.set_addr_category(int(blob), Category('rpm', 'Header blob', None)) elif c.kind == 'rpm.mi': ptr_type = caching_lookup_type('struct rpmmiObject_s').pointer() if ptr_type: obj_ptr = gdb.Value(u.start).cast(ptr_type) print(obj_ptr.dereference()) mi = obj_ptr['mi'] if usage_set.set_addr_category(int(mi), Category('rpm', 'rpmdbMatchIterator', None)): pass #blob = h['blob'] #usage_set.set_addr_category(int(blob), 'rpm Header blob') # Not categorized: return False def _get_register_state(): from heap.compat import execute return execute('thread apply all info registers') __cached_usage_list = None __cached_reg_state = None def lazily_get_usage_list(): '''Lazily do a full-graph categorization, getting a list of Usage instances''' global __cached_usage_list global __cached_reg_state reg_state = _get_register_state() # print 'reg_state', reg_state if __cached_usage_list and __cached_reg_state: # Verify that the inferior process hasn't changed state since the cache # was populated.
# Something of a hack: verify that all registers have the same values: if reg_state == __cached_reg_state: # We can use the cache: # print 'USING THE CACHE' return __cached_usage_list # print 'REGENERATING THE CACHE' # Do the work: usage_list = list(iter_usage_with_progress()) categorize_usage_list(usage_list) # Update the cache: __cached_usage_list = usage_list __cached_reg_state = reg_state return __cached_usage_list def categorize_usage_list(usage_list): '''Do a "full-graph" categorization of the given list of Usage instances For example, if p is a (PyDictObject*), then mark p->ma_table and p->ma_mask accordingly ''' usage_set = UsageSet(usage_list) visited = set() # Precompute some types, if available: pycategorizer = PythonCategorizer.make() for u in ProgressNotifier(iter(usage_list), 'Blocks analyzed'): # Cover the simple cases, where the category can be figured out directly: u.ensure_category(usage_set) # Cross-references: if u.obj: if u.obj.categorize_refs(usage_set): continue # Try to categorize buffers used by python objects: if pycategorizer: if pycategorizer.categorize(u, usage_set): continue from heap.cpython import python_categorization python_categorization(usage_set) def categorize(u, usage_set): '''Given an in-use block, try to guess what it's being used for If usage_set is provided, this categorization may lead to further categorizations''' from heap.cpython import as_python_object, obj_addr_to_gc_addr addr, size = u.start, u.size pyop = as_python_object(addr) if pyop: u.obj = pyop try: return pyop.categorize() except (RuntimeError, UnicodeEncodeError, UnicodeDecodeError): # If something went wrong, assume that this wasn't really a python # object, and fall through: print("couldn't categorize pyop:", pyop) pass # PyPy detection: from heap.pypy import pypy_categorizer cat = pypy_categorizer(addr, size) if cat: return cat # C++ detection: only enabled if we can capture "execute"; there seems to # be a bad interaction between pagination and 
redirection: all output from # "heap" disappears in the fallback form of execute, unless we "set pagination off" from heap.compat import has_gdb_execute_to_string # Disable for now, see https://bugzilla.redhat.com/show_bug.cgi?id=620930 if False: # has_gdb_execute_to_string: from heap.cplusplus import get_class_name cpp_cls = get_class_name(addr, size) if cpp_cls: return Category('C++', cpp_cls) # GObject detection: from heap.gobject import as_gtype_instance ginst = as_gtype_instance(addr, size) if ginst: u.obj = ginst return ginst.categorize() s = as_nul_terminated_string(addr, size) if s and len(s) > 2: return Category('C', 'string data') # Uncategorized: return Category('uncategorized', '', '%s bytes' % size) def as_nul_terminated_string(addr, size): # Does this look like a NUL-terminated string? ptr = gdb.Value(addr).cast(type_char_ptr) try: s = ptr.string(encoding='ascii') return s except (RuntimeError, UnicodeDecodeError): # Probably not string data: return None class ProgressNotifier(object): '''Wrap an iterable with progress notification to stdout''' def __init__(self, inner, msg): self.inner = inner self.count = 0 self.msg = msg def __iter__(self): return self def __next__(self): self.count += 1 if 0 == self.count % 10000: print(self.msg, self.count) return self.inner.__next__() def iter_usage_with_progress(): return ProgressNotifier(iter_usage(), 'Blocks retrieved') class CachedInferiorState(object): """ Cached state containing information scraped from the inferior process """ def __init__(self): self._arena_detectors = [] def add_arena_detector(self, detector): self._arena_detectors.append(detector) def detect_arena(self, ptr, chunksize): '''Detect if this ptr returned by malloc is in use by any of the layered allocation schemes, returning arena object if it is, None if not''' for detector in self._arena_detectors: arena = detector.as_arena(ptr, chunksize) if arena: return arena # Not found: return None def iter_usage(): # Iterate through glibc, and 
within that, within Python arena blocks, as appropriate from heap.glibc import glibc_arenas ms = glibc_arenas.get_ms() cached_state = CachedInferiorState() from heap.cpython import ArenaDetection as CPythonArenaDetection, PyArenaPtr, ArenaObject try: cpython_arenas = CPythonArenaDetection() cached_state.add_arena_detector(cpython_arenas) except WrongInferiorProcess: pass from heap.pypy import ArenaDetection as PyPyArenaDetection try: pypy_arenas = PyPyArenaDetection() cached_state.add_arena_detector(pypy_arenas) except WrongInferiorProcess: pass for i, chunk in enumerate(ms.iter_mmap_chunks()): mem_ptr = chunk.as_mem() chunksize = chunk.chunksize() arena = cached_state.detect_arena(mem_ptr, chunksize) if arena: for u in arena.iter_usage(): yield u else: yield Usage(int(mem_ptr), chunksize) for chunk in ms.iter_sbrk_chunks(): mem_ptr = chunk.as_mem() chunksize = chunk.chunksize() if chunk.is_inuse(): arena = cached_state.detect_arena(mem_ptr, chunksize) if arena: for u in arena.iter_usage(): yield u else: yield Usage(int(mem_ptr), chunksize) def looks_like_ptr(value): '''Does this gdb.Value pointer's value look reasonable? For use when casting a block of memory to a structure with pointer fields within that block of memory. ''' # NULL is acceptable; assume that it's 0 on every arch we care about if value == 0: return True # Assume that pointers aren't allocated in the bottom 1MB of a process' # address space: if value < (1024 * 1024): return False # Assume that if it got this far, that it's valid: return True ================================================ FILE: heap/commands.py ================================================ # Copyright (C) 2010 David Hugh Malcolm # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

import gdb
import re
import sys

from heap.glibc import glibc_arenas
from heap.history import history, Snapshot, Diff

from heap import lazily_get_usage_list, \
    fmt_size, fmt_addr, \
    categorize, categorize_usage_list, Usage, \
    hexdump_as_bytes, \
    Table, \
    MissingDebuginfo

def need_debuginfo(f):
    def g(self, args, from_tty):
        try:
            return f(self, args, from_tty)
        except MissingDebuginfo as e:
            print('Missing debuginfo for %s' % e.module)
            print('Suggested fix:')
            print(' debuginfo-install %s' % e.module)
    return g

class Heap(gdb.Command):
    'Print a report on memory usage, by category'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap",
                              gdb.COMMAND_DATA,
                              prefix=True)

    @need_debuginfo
    def invoke(self, args, from_tty):
        total_by_category = {}
        count_by_category = {}
        total_size = 0
        total_count = 0
        try:
            usage_list = list(lazily_get_usage_list())
            for u in usage_list:
                u.ensure_category()
                total_size += u.size
                if u.category in total_by_category:
                    total_by_category[u.category] += u.size
                else:
                    total_by_category[u.category] = u.size
                total_count += 1
                if u.category in count_by_category:
                    count_by_category[u.category] += 1
                else:
                    count_by_category[u.category] = 1
        except KeyboardInterrupt:
            pass # FIXME

        t = Table(['Domain', 'Kind', 'Detail', 'Count', 'Allocated size'])
        for category in sorted(total_by_category.keys(),
                               key=total_by_category.get,
                               reverse=True):
            detail = category.detail
            if not detail:
                detail = ''
            t.add_row([category.domain,
                       category.kind,
                       detail,
                       fmt_size(count_by_category[category]),
                       fmt_size(total_by_category[category]),
                       ])
        t.add_row(['', '', 'TOTAL', fmt_size(total_count), fmt_size(total_size)])
        t.write(sys.stdout)
        print()

class HeapSizes(gdb.Command):
    'Print a report on memory usage, by sizes'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap sizes",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        ms = glibc_arenas.get_ms()
        chunks_by_size = {}
        num_chunks = 0
        total_size = 0
        try:
            for chunk in ms.iter_chunks():
                if not chunk.is_inuse():
                    continue
                size = int(chunk.chunksize())
                num_chunks += 1
                total_size += size
                if size in chunks_by_size:
                    chunks_by_size[size] += 1
                else:
                    chunks_by_size[size] = 1
        except KeyboardInterrupt:
            pass # FIXME

        t = Table(['Chunk size', 'Num chunks', 'Allocated size'])
        for size in sorted(chunks_by_size.keys(),
                           key=lambda s1: chunks_by_size[s1] * s1,
                           reverse=True):
            t.add_row([fmt_size(size),
                       chunks_by_size[size],
                       fmt_size(chunks_by_size[size] * size)])
        t.add_row(['TOTALS', num_chunks, fmt_size(total_size)])
        t.write(sys.stdout)
        print()

class HeapUsed(gdb.Command):
    'Print used heap chunks'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap used",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        print('Used chunks of memory on heap')
        print('-----------------------------')
        ms = glibc_arenas.get_ms()
        for i, chunk in enumerate(ms.iter_chunks()):
            if not chunk.is_inuse():
                continue
            size = chunk.chunksize()
            mem = chunk.as_mem()
            u = Usage(mem, size)
            category = categorize(u, None)
            hd = hexdump_as_bytes(mem, 32)
            print ('%6i: %s -> %s %8i bytes %20s |%s'
                   % (i,
                      fmt_addr(chunk.as_mem()),
                      fmt_addr(chunk.as_mem()+size-1),
                      size, category, hd))
        print()

class HeapFree(gdb.Command):
    'Print free heap chunks'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap free",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        print('Free chunks of memory on heap')
        print('-----------------------------')
        ms = glibc_arenas.get_ms()
        total_size = 0
        for i, chunk in enumerate(ms.iter_free_chunks()):
            size = chunk.chunksize()
            total_size += size
            mem = chunk.as_mem()
            u = Usage(mem, size)
            category = categorize(u, None)
            hd = hexdump_as_bytes(mem, 32)
            print ('%6i: %s -> %s %8i bytes %20s |%s'
                   % (i,
                      fmt_addr(chunk.as_mem()),
                      fmt_addr(chunk.as_mem()+size-1),
                      size, category, hd))
        print("Total size: %s" % total_size)

class HeapAll(gdb.Command):
    'Print all heap chunks'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap all",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        print('All chunks of memory on heap (both used and free)')
        print('-------------------------------------------------')
        ms = glibc_arenas.get_ms()
        for i, chunk in enumerate(ms.iter_chunks()):
            size = chunk.chunksize()
            if chunk.is_inuse():
                kind = ' inuse'
            else:
                kind = ' free'
            print ('%i: %s -> %s %s: %i bytes (%s)'
                   % (i,
                      fmt_addr(chunk.as_address()),
                      fmt_addr(chunk.as_address()+size-1),
                      kind, size, chunk))
        print()

class HeapLog(gdb.Command):
    'Print a log of recorded heap states'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap log",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        h = history
        if len(h.snapshots) == 0:
            print('(no history)')
            return
        for i in range(len(h.snapshots), 0, -1):
            s = h.snapshots[i-1]
            print('Label %i "%s" at %s' % (i, s.name, s.time))
            print(' ', s.summary())
            if i > 1:
                prev = h.snapshots[i-2]
                d = Diff(prev, s)
                print()
                print(' ', d.stats())
            print()

class HeapLabel(gdb.Command):
    'Record the current state of the heap for later comparison'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap label",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        s = history.add(args)
        print(s.summary())

class HeapDiff(gdb.Command):
    'Compare two states of the heap'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap diff",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        h = history
        if len(h.snapshots) == 0:
            print('(no history)')
            return
        prev = h.snapshots[-1]
        curr = Snapshot.current('current')
        d = Diff(prev, curr)
        print('Changes from %s to %s' % (prev.name, curr.name))
        print(' ', d.stats())
        print()
        print('\n'.join([' ' + line for line in d.as_changes().splitlines()]))

class HeapSelect(gdb.Command):
    'Query used heap chunks'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap select",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        from heap.query import do_query
        from heap.parser import ParserError
        try:
            do_query(args)
        except ParserError as e:
            print(e)

class Hexdump(gdb.Command):
    'Print a hexdump, starting at the specific region of memory'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "hexdump",
                              gdb.COMMAND_DATA)

    def invoke(self, args, from_tty):
        print(repr(args))
        arg_list = gdb.string_to_argv(args)
        chars_only = True
        if len(arg_list) == 2:
            addr_arg = arg_list[0]
            chars_only = True if arg_list[1] == '-c' else False
        else:
            addr_arg = args

        if addr_arg.startswith('0x'):
            addr = int(addr_arg, 16)
        else:
            addr = int(addr_arg)

        # assume that paging will cut in and the user will quit at some point:
        size = 32
        while True:
            hd = hexdump_as_bytes(addr, size, chars_only=chars_only)
            print ('%s -> %s %s' % (fmt_addr(addr), fmt_addr(addr + size -1), hd))
            addr += size

class HeapArenas(gdb.Command):
    'Display heap arenas available'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap arenas",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        for n, arena in enumerate(glibc_arenas.arenas):
            print("Arena #%d: %s" % (n, arena.address))

class HeapArenaSelect(gdb.Command):
    'Select heap arena'
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap arena",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        arena_num = int(args)
        glibc_arenas.cur_arena = glibc_arenas.arenas[arena_num]
        print("Arena set to %s" % glibc_arenas.cur_arena.address)

def register_commands():
    # Register the commands with gdb
    Heap()
    HeapSizes()
    HeapUsed()
    HeapFree()
    HeapAll()
    HeapLog()
    HeapLabel()
    HeapDiff()
    HeapSelect()
    HeapArenas()
    HeapArenaSelect()
    Hexdump()

    from heap.cpython import register_commands as register_cpython_commands
    register_cpython_commands()

================================================
FILE: heap/compat.py
================================================
# Copyright (C) 2010 David Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

'''
gdb versions vary greatly, this is a central place to deal with varying
capabilities of the underlying gdb and its python bindings
'''
import gdb

# gdb.execute's to_string keyword argument was added between F13 and F14.
# See https://bugzilla.redhat.com/show_bug.cgi?id=610241
has_gdb_execute_to_string = True
try:
    # This will either capture the result, or fail before executing,
    # so in neither case should we get noise on stdout:
    gdb.execute('info registers', to_string=True)
except TypeError:
    has_gdb_execute_to_string = False

def execute(command):
    '''Equivalent to gdb.execute(to_string=True), returning the output as
    a string rather than logging it to stdout.
    On gdb versions lacking this capability, it uses redirection and
    temporary files to achieve the same result'''
    if has_gdb_execute_to_string:
        return gdb.execute(command, to_string = True)
    else:
        import tempfile
        f = tempfile.NamedTemporaryFile('r', delete=True)
        gdb.execute("set logging off")
        gdb.execute("set logging redirect off")
        gdb.execute("set logging file %s" % f.name)
        gdb.execute("set logging redirect on")
        gdb.execute("set logging on")
        gdb.execute(command)
        gdb.execute("set logging off")
        gdb.execute("set logging redirect off")
        result = f.read()
        f.close()
        return result

def dump():
    print ('Does gdb.execute have a "to_string" keyword argument? : %s'
           % has_gdb_execute_to_string)

================================================
FILE: heap/cplusplus.py
================================================
# Copyright (C) 2010 David Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

# C++ support

import re

import gdb

from heap import caching_lookup_type, looks_like_ptr
from heap.compat import execute

void_ptr_ptr = caching_lookup_type('void').pointer().pointer()

def get_class_name(addr, size):
    # Try to detect a vtable ptr at the top of this object:
    vtable = gdb.Value(addr).cast(void_ptr_ptr).dereference()
    if not looks_like_ptr(vtable):
        return None

    info = execute('info sym (void *)0x%x' % int(vtable))
    # "vtable for Foo + 8 in section .rodata of /home/david/heap/test_cplusplus"
    m = re.match(r'vtable for (.*) \+ (.*)', info)
    if m:
        return m.group(1)
    # Not matched:
    return None

def as_cplusplus_object(addr, size):
    print(get_class_name(addr, size))
    pass

================================================
FILE: heap/cpython.py
================================================
'''
This file is licensed under the PSF license
'''
import sys

import gdb

from heap import WrappedPointer, caching_lookup_type, Usage, \
    type_void_ptr, fmt_addr, Category, looks_like_ptr, \
    WrongInferiorProcess, Table

SIZEOF_VOID_P = type_void_ptr.sizeof

# Transliteration from Python's obmalloc.c:
ALIGNMENT             = 8
ALIGNMENT_SHIFT       = 3
ALIGNMENT_MASK        = (ALIGNMENT - 1)

# Return the number of bytes in size class I:
def INDEX2SIZE(I):
    return (I + 1) << ALIGNMENT_SHIFT

SYSTEM_PAGE_SIZE      = (4 * 1024)
SYSTEM_PAGE_SIZE_MASK = (SYSTEM_PAGE_SIZE - 1)
ARENA_SIZE            = (256 << 10)
POOL_SIZE             = SYSTEM_PAGE_SIZE
POOL_SIZE_MASK        = SYSTEM_PAGE_SIZE_MASK

def ROUNDUP(x):
    return (x + ALIGNMENT_MASK) & ~ALIGNMENT_MASK

def POOL_OVERHEAD():
    return ROUNDUP(caching_lookup_type('struct pool_header').sizeof)

class PyArenaPtr(WrappedPointer):
    # Wrapper around a (void*) that's a Python arena's buffer (the
    # arena->address, as opposed to the (struct arena_object*) itself)
    @classmethod
    def from_addr(cls, p, arenaobj):
        ptr = gdb.Value(p)
        ptr = ptr.cast(type_void_ptr)
        return cls(ptr, arenaobj)

    def __init__(self, gdbval, arenaobj):
        WrappedPointer.__init__(self, gdbval)

        assert(isinstance(arenaobj, ArenaObject))
        self.arenaobj = arenaobj

        # obmalloc.c sets up arenaobj->pool_address to the first pool
        # address, aligning it to POOL_SIZE_MASK:
        self.initial_pool_addr = self.as_address()
        self.num_pools = ARENA_SIZE // POOL_SIZE
        self.excess = self.initial_pool_addr & POOL_SIZE_MASK
        if self.excess != 0:
            self.num_pools -= 1
            self.initial_pool_addr += POOL_SIZE - self.excess

    def __str__(self):
        return ('PyArenaPtr([%s->%s], %i pools [%s->%s], excess: %i tracked by %s)'
                % (fmt_addr(self.as_address()),
                   fmt_addr(self.as_address() + ARENA_SIZE - 1),
                   self.num_pools,
                   fmt_addr(self.initial_pool_addr),
                   fmt_addr(self.initial_pool_addr
                            + (self.num_pools * POOL_SIZE) - 1),
                   self.excess,
                   self.arenaobj
                   )
                )

    def iter_pools(self):
        '''Yield a sequence of PyPoolPtr, representing all of the pools
        within this arena'''
        # print 'num_pools:', num_pools
        pool_addr = self.initial_pool_addr
        for idx in range(self.num_pools):
            # "pool_address" is a high-water-mark for activity within the arena;
            # pools at this location or beyond haven't been initialized yet:
            if pool_addr >= self.arenaobj.pool_address:
                return

            pool = PyPoolPtr.from_addr(pool_addr)
            yield pool
            pool_addr += POOL_SIZE

    def iter_usage(self):
        '''Yield a series of Usage instances'''
        if self.excess != 0:
            # FIXME: this size is wrong
            yield Usage(self.as_address(), self.excess,
                        Category('pyarena', 'alignment wastage'))

        for pool in self.iter_pools():
            # print 'pool:', pool
            for u in pool.iter_usage():
                yield u

        # FIXME: unused space (if any) between pool_address and the alignment top
        # if self.excess != 0:
        #     # FIXME: this address is wrong
        #     yield Usage(self.as_address(), self.excess,
        #                 Category('pyarena', 'alignment wastage'))

class PyPoolPtr(WrappedPointer):
    # Wrapper around Python's obmalloc.c: poolp: (struct pool_header *)
    @classmethod
    def from_addr(cls, p):
        ptr = gdb.Value(p)
        ptr = ptr.cast(cls.gdb_type())
        return cls(ptr)

    def __str__(self):
        return ('PyPoolPtr([%s->%s: %d blocks of size %i bytes))'
                % (fmt_addr(self.as_address()),
                   fmt_addr(self.as_address() + POOL_SIZE - 1),
                   self.num_blocks(), self.block_size()))

    @classmethod
    def gdb_type(cls):
        # Deferred lookup of the "poolp" type:
        return caching_lookup_type('poolp')

    def block_size(self):
        return INDEX2SIZE(self.field('szidx'))

    def num_blocks(self):
        firstoffset = self._firstoffset()
        maxnextoffset = self._maxnextoffset()
        offsetrange = maxnextoffset - firstoffset
        return offsetrange // self.block_size() # FIXME: not exactly correct

    def _firstoffset(self):
        return POOL_OVERHEAD()

    def _maxnextoffset(self):
        return POOL_SIZE - self.block_size()

    def iter_blocks(self):
        '''Yield all blocks within this pool, whether free or in use'''
        size = self.block_size()
        maxnextoffset = self._maxnextoffset()
        # print initnextoffset, maxnextoffset
        offset = self._firstoffset()
        base_addr = self.as_address()
        while offset <= maxnextoffset:
            yield (base_addr + offset, size)
            offset += size

    def iter_usage(self):
        # The struct pool_header at the front:
        yield Usage(self.as_address(),
                    POOL_OVERHEAD(),
                    Category('pyarena', 'pool_header overhead'))

        fb = list(self.iter_free_blocks())
        for (start, size) in fb:
            yield Usage(start, size, Category('pyarena', 'freed pool chunk'))

        for (start, size) in self.iter_used_blocks():
            if (start, size) not in fb:
                yield Usage(start, size)
                #, 'python pool: ' + categorize(start, size, None))

        # FIXME: yield any wastage at the end

    def iter_free_blocks(self):
        '''Yield the sequence of free blocks within this pool.
        Doesn't include the areas after nextoffset that have never been
        allocated'''
        # print self._gdbval.dereference()
        size = self.block_size()
        freeblock = self.field('freeblock')
        _type_block_ptr_ptr = caching_lookup_type('unsigned char').pointer().pointer()
        # Walk the singly-linked list of free blocks for this chunk
        while int(freeblock) != 0:
            # print 'freeblock:', (fmt_addr(int(freeblock)), int(size))
            yield (int(freeblock), int(size))
            freeblock = freeblock.cast(_type_block_ptr_ptr).dereference()

    def _free_blocks(self):
        # Get the set of addresses of free blocks
        return set([addr for addr, size in self.iter_free_blocks()])

    def iter_used_blocks(self):
        '''Yield the sequence of currently in-use blocks within this pool'''
        # We'll filter out the free blocks from the list:
        free_block_addresses = self._free_blocks()

        size = self.block_size()
        initnextoffset = self._firstoffset()
        nextoffset = self.field('nextoffset')
        # print initnextoffset, nextoffset
        offset = initnextoffset
        base_addr = self.as_address()

        # Iterate upwards until you reach "pool->nextoffset": blocks beyond
        # that point have never been allocated:
        while offset < nextoffset:
            addr = base_addr + offset

            # Filter out those within this pool's linked list of free blocks:
            if int(addr) not in free_block_addresses:
                yield (int(addr), int(size))

            offset += size

Py_TPFLAGS_HEAPTYPE = (1 << 9)

Py_TPFLAGS_INT_SUBCLASS      = (1 << 23)
Py_TPFLAGS_LONG_SUBCLASS     = (1 << 24)
Py_TPFLAGS_LIST_SUBCLASS     = (1 << 25)
Py_TPFLAGS_TUPLE_SUBCLASS    = (1 << 26)
Py_TPFLAGS_STRING_SUBCLASS   = (1 << 27)
Py_TPFLAGS_UNICODE_SUBCLASS  = (1 << 28)
Py_TPFLAGS_DICT_SUBCLASS     = (1 << 29)
Py_TPFLAGS_BASE_EXC_SUBCLASS = (1 << 30)
Py_TPFLAGS_TYPE_SUBCLASS     = (1 << 31)

class PyObjectPtr(WrappedPointer):
    @classmethod
    def from_pyobject_ptr(cls, addr):
        ob_type = addr['ob_type']
        tp_flags = ob_type['tp_flags']

        if tp_flags & Py_TPFLAGS_HEAPTYPE:
            return HeapTypeObjectPtr(addr)

        if tp_flags & Py_TPFLAGS_UNICODE_SUBCLASS:
            return PyUnicodeObjectPtr(addr.cast(caching_lookup_type('PyUnicodeObject').pointer()))

        if tp_flags & Py_TPFLAGS_DICT_SUBCLASS:
            return PyDictObjectPtr(addr.cast(caching_lookup_type('PyDictObject').pointer()))

        tp_name = ob_type['tp_name'].string()
        if tp_name == 'instance':
            __type_PyInstanceObjectPtr = caching_lookup_type('PyInstanceObject').pointer()
            return PyInstanceObjectPtr(addr.cast(__type_PyInstanceObjectPtr))

        return PyObjectPtr(addr)

    def type(self):
        return PyTypeObjectPtr(self.field('ob_type'))

    def safe_tp_name(self):
        try:
            return self.type().field('tp_name').string()
        except (RuntimeError, UnicodeDecodeError):
            # Can't even read the object at all?
            return 'unknown'

    def categorize(self):
        # Python objects will be categorized as ("python", tp_name), but
        # old-style classes have to do more work
        return Category('python', self.safe_tp_name())

    def as_malloc_addr(self):
        addr = int(self._gdbval)
        ob_type = self._gdbval['ob_type']
        tp_flags = ob_type['tp_flags']
        if tp_flags & Py_TPFLAGS_: # FIXME
            return obj_addr_to_gc_addr(addr)
        else:
            return addr

# Taken from my libpython.py code in python's Tools/gdb/libpython.py
# FIXME: ideally should share code somehow
def _PyObject_VAR_SIZE(typeobj, nitems):
    type_size_t = caching_lookup_type('size_t')
    return ( ( typeobj.field('tp_basicsize') +
               nitems * typeobj.field('tp_itemsize') +
               (SIZEOF_VOID_P - 1)
             ) & ~(SIZEOF_VOID_P - 1)
           ).cast(type_size_t)

def int_from_int(gdbval):
    return int(gdbval)

class PyUnicodeObjectPtr(PyObjectPtr):
    """
    Class wrapping a gdb.Value that's a PyUnicodeObject* within the process
    being debugged.
    """
    _typename = 'PyUnicodeObject'

    def categorize_refs(self, usage_set, level=0, detail=None):
        m_str = int(self.field('str'))
        usage_set.set_addr_category(m_str,
                                    Category('cpython', 'PyUnicodeObject buffer', detail),
                                    level)
        return True

class PyDictObjectPtr(PyObjectPtr):
    """
    Class wrapping a gdb.Value that's a PyDictObject* i.e. a dict instance
    within the process being debugged.
""" _typename = 'PyDictObject' def categorize_refs(self, usage_set, level=0, detail=None): ma_table = int(self.field('ma_table')) usage_set.set_addr_category(ma_table, Category('cpython', 'PyDictEntry table', detail), level) return True class PyInstanceObjectPtr(PyObjectPtr): _typename = 'PyInstanceObject' def cl_name(self): in_class = self.field('in_class') # cl_name is a python string, not a char*; rely on # prettyprinters for now: cl_name = str(in_class['cl_name'])[1:-1] return cl_name def categorize(self): return Category('python', self.cl_name(), 'old-style') def categorize_refs(self, usage_set, level=0, detail=None): cl_name = self.cl_name() # print 'cl_name', cl_name # Visit the in_dict: in_dict = self.field('in_dict') # print 'in_dict', in_dict dict_detail = '%s.__dict__' % cl_name # Mark the ptr as being a dictionary, adding detail usage_set.set_addr_category(obj_addr_to_gc_addr(in_dict), Category('cpython', 'PyDictObject', dict_detail), level=1) # Visit ma_table: _type_PyDictObject_ptr = caching_lookup_type('PyDictObject').pointer() in_dict = in_dict.cast(_type_PyDictObject_ptr) ma_table = int(in_dict['ma_table']) # Record details: usage_set.set_addr_category(ma_table, Category('cpython', 'PyDictEntry table', dict_detail), level=2) return True class PyTypeObjectPtr(PyObjectPtr): _typename = 'PyTypeObject' class HeapTypeObjectPtr(PyObjectPtr): _typename = 'PyObject' def categorize_refs(self, usage_set, level=0, detail=None): attr_dict = self.get_attr_dict() if attr_dict: # Mark the dictionary's "detail" with our typename # gdb.execute('print (PyObject*)0x%x' % int(attr_dict._gdbval)) usage_set.set_addr_category(obj_addr_to_gc_addr(attr_dict._gdbval), Category('python', 'dict', '%s.__dict__' % self.safe_tp_name()), level=level+1) # and mark the dict's PyDictEntry with our typename: attr_dict.categorize_refs(usage_set, level=level+1, detail='%s.__dict__' % self.safe_tp_name()) return True def get_attr_dict(self): ''' Get the PyDictObject ptr representing the 
        attribute dictionary (or None if there's a problem)
        '''
        from heap import type_char_ptr
        try:
            typeobj = self.type()
            dictoffset = int_from_int(typeobj.field('tp_dictoffset'))
            if dictoffset != 0:
                if dictoffset < 0:
                    type_PyVarObject_ptr = caching_lookup_type('PyVarObject').pointer()
                    tsize = int_from_int(self._gdbval.cast(type_PyVarObject_ptr)['ob_size'])
                    if tsize < 0:
                        tsize = -tsize
                    size = _PyObject_VAR_SIZE(typeobj, tsize)
                    dictoffset += size
                    assert dictoffset > 0
                    if dictoffset % SIZEOF_VOID_P != 0:
                        # Corrupt somehow?
                        return None

                dictptr = self._gdbval.cast(type_char_ptr) + dictoffset
                PyObjectPtrPtr = caching_lookup_type('PyObject').pointer().pointer()
                dictptr = dictptr.cast(PyObjectPtrPtr)
                return PyObjectPtr.from_pyobject_ptr(dictptr.dereference())
        except RuntimeError:
            # Corrupt data somewhere; fail safe
            pass

        # Not found, or some kind of error:
        return None

def is_pyobject_ptr(addr):
    try:
        _type_pyop = caching_lookup_type('PyObject').pointer()
        _type_pyvarop = caching_lookup_type('PyVarObject').pointer()
    except RuntimeError:
        # not linked against python
        return None

    pyop = gdb.Value(addr).cast(_type_pyop)
    try:
        ob_refcnt = pyop['ob_refcnt']
        if ob_refcnt >= 0 and ob_refcnt < 0xffff:
            obtype = pyop['ob_type']
            if obtype != 0:
                type_refcnt = obtype.cast(_type_pyop)['ob_refcnt']
                if type_refcnt > 0 and type_refcnt < 0xffff:
                    type_ob_size = obtype.cast(_type_pyvarop)['ob_size']
                    if type_ob_size > 0xffff:
                        return 0
                    for fieldname in ('tp_del', 'tp_mro', 'tp_init', 'tp_getset'):
                        if not looks_like_ptr(obtype[fieldname]):
                            return 0
                    # Then this looks like a Python object:
                    return PyObjectPtr.from_pyobject_ptr(pyop)
    except (RuntimeError, UnicodeDecodeError):
        pass # Not a python object (or corrupt)

    # Doesn't look like a python object, implicit return None

def obj_addr_to_gc_addr(addr):
    '''Given a PyObject* address, convert to a PyGC_Head* address (i.e.
    the allocator's view of the same)'''
    # print 'obj_addr_to_gc_addr(%s)' % fmt_addr(int(addr))
    _type_PyGC_Head = caching_lookup_type('PyGC_Head')
    return int(addr) - _type_PyGC_Head.sizeof

def as_python_object(addr):
    '''Given an address of an allocation, determine if it holds a PyObject,
    or a PyGC_Head

    Return a WrappedPointer for the PyObject* if it does (which might have a
    different location c.f. when PyGC_Head was allocated)

    Return None if it doesn't look like a PyObject*'''
    # Try casting to PyObject* ?
    # FIXME: what about the debug allocator?
    try:
        _type_pyop = caching_lookup_type('PyObject').pointer()
        _type_PyGC_Head = caching_lookup_type('PyGC_Head')
    except RuntimeError:
        # not linked against python
        return None
    pyop = is_pyobject_ptr(addr)
    if pyop:
        return pyop
    else:
        # maybe a GC type:
        _type_PyGC_Head_ptr = _type_PyGC_Head.pointer()
        gc_ptr = gdb.Value(addr).cast(_type_PyGC_Head_ptr)
        # print gc_ptr.dereference()
        PYGC_REFS_REACHABLE = -3
        if gc_ptr['gc']['gc_refs'] == PYGC_REFS_REACHABLE: # FIXME: need to cover other values
            pyop = is_pyobject_ptr(gdb.Value(addr + _type_PyGC_Head.sizeof))
            if pyop:
                return pyop
    # Doesn't look like a python object, implicit return None

class ArenaObject(WrappedPointer):
    '''
    Wrapper around Python's struct arena_object*
    Note that this is record-keeping for an arena, not the memory itself
    '''
    @classmethod
    def iter_arenas(cls):
        try:
            val_arenas = gdb.parse_and_eval('arenas')
            val_maxarenas = gdb.parse_and_eval('maxarenas')
        except RuntimeError:
            # Not linked against python, or no debug information:
            raise WrongInferiorProcess('cpython')

        try:
            for i in range(val_maxarenas):
                # Look up "&arenas[i]":
                obj = ArenaObject(val_arenas[i].address)

                # obj->address == 0 indicates an unused entry within the "arenas" array:
                if obj.address != 0:
                    yield obj
        except RuntimeError:
            # pypy also has a symbol named "arenas", of type "long unsigned int * volatile"
            # For now, ignore it:
            return

    @property # need to override the base property
    def address(self):
        return self.field('address')

    def __init__(self, gdbval):
        WrappedPointer.__init__(self, gdbval)

        # Cache some values:
        # This is the high-water mark: at this point and beyond, the bytes of
        # memory are untouched since malloc:
        self.pool_address = self.field('pool_address')

class ArenaDetection(object):
    '''Detection of CPython arenas, done as an object so that we can cache
    state'''
    def __init__(self):
        self.arenaobjs = list(ArenaObject.iter_arenas())

    def as_arena(self, ptr, chunksize):
        '''Detect if this ptr returned by malloc is in use as a Python arena,
        returning PyArenaPtr if it is, None if not'''
        # Fast rejection of too-small chunks:
        if chunksize < (256 * 1024):
            return None

        for arenaobj in self.arenaobjs:
            if ptr == arenaobj.address:
                # Found it:
                return PyArenaPtr.from_addr(ptr, arenaobj)

        # Not found:
        return None

def python_categorization(usage_set):
    # special-cased categorization for CPython

    # The Objects/stringobject.c:interned dictionary is typically large,
    # with its PyDictEntry table occupying 200k on a 64-bit build of python 2.6
    # Identify it:
    try:
        val_interned = gdb.parse_and_eval('interned')
        pyop = PyDictObjectPtr.from_pyobject_ptr(val_interned)
        ma_table = int(pyop.field('ma_table'))
        usage_set.set_addr_category(ma_table,
                                    Category('cpython', 'PyDictEntry table', 'interned'),
                                    level=1)
    except RuntimeError:
        pass

    # Various kinds of per-type optimized allocator
    # See Modules/gcmodule.c:clear_freelists

    # The Objects/intobject.c: block_list
    try:
        val_block_list = gdb.parse_and_eval('block_list')
        if str(val_block_list.type.target()) != 'PyIntBlock':
            raise RuntimeError
        while int(val_block_list) != 0:
            usage_set.set_addr_category(int(val_block_list),
                                        Category('cpython', '_intblock', ''),
                                        level=0)
            val_block_list = val_block_list['next']
    except RuntimeError:
        pass

    # The Objects/floatobject.c: block_list
    # TODO: how to get at this?
    # multiple vars named "block_list"

    # Objects/methodobject.c: PyCFunction_ClearFreeList
    # "free_list" of up to 256 PyCFunctionObject, but they're still of
    # that type

    # Objects/classobject.c: PyMethod_ClearFreeList
    # "free_list" of up to 256 PyMethodObject, but they're still of that type

    # Objects/frameobject.c: PyFrame_ClearFreeList
    # "free_list" of up to 300 PyFrameObject, but they're still of that type

    # Objects/tupleobject.c: array of free_list: up to 2000 free tuples of each
    # size from 1-20 (using ob_item[0] to chain up); singleton for size 0; they
    # are still tuples when deallocated, though

    # Objects/unicodeobject.c:
    # "free_list" of up to 1024 PyUnicodeObject, with the "str" buffer
    # optionally preserved also for lengths up to 9
    # They're all still of type "unicode" when free
    # Singletons for the empty unicode string, and for the first 256 code
    # points (Latin-1)

# New gdb commands, specific to CPython

from heap.commands import need_debuginfo

class HeapCPythonAllocators(gdb.Command):
    "For CPython: display information on the allocators"
    def __init__(self):
        gdb.Command.__init__ (self,
                              "heap cpython-allocators",
                              gdb.COMMAND_DATA)

    @need_debuginfo
    def invoke(self, args, from_tty):
        t = Table(columnheadings=('struct arena_object*',
                                  '256KB buffer location',
                                  'Free pools'))
        for arena in ArenaObject.iter_arenas():
            t.add_row([fmt_addr(arena.as_address()),
                       fmt_addr(arena.address),
                       '%i / %i ' % (arena.field('nfreepools'),
                                     arena.field('ntotalpools'))
                       ])
        print('Objects/obmalloc.c: %i arenas' % len(t.rows))
        t.write(sys.stdout)
        print()

def register_commands():
    HeapCPythonAllocators()

================================================
FILE: heap/glibc.py
================================================
# Copyright (C) 2010 David Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

'''
gdb 7 hooks for glibc's heap implementation

See /usr/src/debug/glibc-*/malloc/
e.g. /usr/src/debug/glibc-2.11.1/malloc/malloc.h
and  /usr/src/debug/glibc-2.11.1/malloc/malloc.c

This file is licensed under the LGPLv2.1
'''
import re

import gdb

from heap import WrappedPointer, WrappedValue, caching_lookup_type, \
    type_char_ptr, check_missing_debuginfo, array_length, offsetof

class MChunkPtr(WrappedPointer):
    '''Wrapper around glibc's mchunkptr

    Note:
      as_address() gives the address of the chunk as seen by the malloc implementation
      as_mem() gives the address as seen by the user of malloc'''

    # size field is or'ed with PREV_INUSE when previous adjacent chunk in use
    PREV_INUSE = 0x1

    # /* extract inuse bit of previous chunk */
    # #define prev_inuse(p)       ((p)->size & PREV_INUSE)

    # size field is or'ed with IS_MMAPPED if the chunk was obtained with mmap()
    IS_MMAPPED = 0x2

    # /* check for mmap()'ed chunk */
    # #define chunk_is_mmapped(p) ((p)->size & IS_MMAPPED)

    # size field is or'ed with NON_MAIN_ARENA if the chunk was obtained
    # from a non-main arena.  This is only set immediately before handing
    # the chunk to the user, if necessary.
    NON_MAIN_ARENA = 0x4

    # /* check for chunk from non-main arena */
    # #define chunk_non_main_arena(p) ((p)->size & NON_MAIN_ARENA)

    SIZE_BITS = (PREV_INUSE|IS_MMAPPED|NON_MAIN_ARENA)

    @classmethod
    def gdb_type(cls):
        # Deferred lookup of the "mchunkptr" type:
        return caching_lookup_type('mchunkptr')

    def size(self):
        if not(hasattr(self, '_cached_size')):
            self._cached_size = int(self.field('mchunk_size'))
        return self._cached_size

    def chunksize(self):
        return self.size() & ~(self.SIZE_BITS)

    def has_flag(self, flag):
        return self.size() & flag

    def has_PREV_INUSE(self):
        return self.has_flag(self.PREV_INUSE)

    def has_IS_MMAPPED(self):
        return self.has_flag(self.IS_MMAPPED)

    def has_NON_MAIN_ARENA(self):
        return self.has_flag(self.NON_MAIN_ARENA)

    def __str__(self):
        result = ('<%s chunk=0x%x mem=0x%x'
                  % (self.__class__.__name__,
                     self.as_address(),
                     self.as_mem()))
        if self.has_PREV_INUSE():
            result += ' PREV_INUSE'
        else:
            result += ' prev_size=%i' % self.field('mchunk_prev_size')
        if self.has_NON_MAIN_ARENA():
            result += ' NON_MAIN_ARENA'
        if self.has_IS_MMAPPED():
            result += ' IS_MMAPPED'
        else:
            if self.is_inuse():
                result += ' inuse'
            else:
                result += ' free'
        SIZE_SZ = caching_lookup_type('size_t').sizeof
        result += ' chunksize=%i memsize=%i>' % (self.chunksize(),
                                                 self.chunksize() - (2 * SIZE_SZ))
        return result

    def as_mem(self):
        # Analog of chunk2mem: the address as seen by the program (e.g. malloc)
        SIZE_SZ = caching_lookup_type('size_t').sizeof
        return self.as_address() + (2 * SIZE_SZ)

    def is_inuse(self):
        # Is this chunk in use?
        if self.has_IS_MMAPPED():
            return True
        # Analog of:
        #   #define inuse(p)
        #   ((((mchunkptr)(((char*)(p))+((p)->size & ~SIZE_BITS)))->size) & PREV_INUSE)
        nc = self.next_chunk()
        return nc.has_PREV_INUSE()

    def next_chunk(self):
        # Analog of:
        #   #define next_chunk(p) ((mchunkptr)( ((char*)(p)) + ((p)->size & ~SIZE_BITS) ))
        ptr = self._gdbval.cast(type_char_ptr)
        cs = self.chunksize()
        ptr += cs
        ptr = ptr.cast(MChunkPtr.gdb_type())
        #print('next_chunk returning: 0x%x' % ptr)
        return MChunkPtr(ptr)

    def prev_chunk(self):
        # Analog of:
        #   #define prev_chunk(p) ((mchunkptr)( ((char*)(p)) - ((p)->prev_size) ))
        ptr = self._gdbval.cast(type_char_ptr)
        # Use the same field naming as size() and __str__ above
        # ('prev_size' was renamed to 'mchunk_prev_size' in glibc 2.25):
        ptr -= self.field('mchunk_prev_size')
        ptr = ptr.cast(MChunkPtr.gdb_type())
        return MChunkPtr(ptr)


class MBinPtr(MChunkPtr):
    # Wrapper around an "mbinptr"

    @classmethod
    def gdb_type(cls):
        # Deferred lookup of the "mbinptr" type:
        return caching_lookup_type('mbinptr')

    def first(self):
        return MChunkPtr(self.field('fd'))

    def last(self):
        return MChunkPtr(self.field('bk'))


class MFastBinPtr(MChunkPtr):
    # Wrapper around an "mfastbinptr"
    pass


class MallocState(WrappedValue):
    # Wrapper around struct malloc_state, as defined in malloc.c

    def fastbin(self, idx):
        return MFastBinPtr(self.field('fastbinsY')[idx])

    def bin_at(self, i):
        # addressing -- note that bin_at(0) does not exist
        #   (mbinptr) (((char *) &((m)->bins[((i) - 1) * 2]))
        #              - offsetof (struct malloc_chunk, fd))
        ptr = self.field('bins')[(i-1)*2]
        #print('001', ptr)
        ptr = ptr.address
        #print('002', ptr)
        ptr = ptr.cast(type_char_ptr)
        #print('003', ptr)
        ptr -= offsetof('struct malloc_chunk', 'fd')
        #print('004', ptr)
        ptr = ptr.cast(MBinPtr.gdb_type())
        #print('005', ptr)
        return MBinPtr(ptr)

    def iter_chunks(self):
        '''Yield a sequence of MChunkPtr corresponding to all chunks of memory
        in the heap (both used and free), in order of ascending address'''
        for c in self.iter_mmap_chunks():
            yield c
        for c in self.iter_sbrk_chunks():
            yield c

    def iter_mmap_chunks(self):
        for inf in gdb.inferiors():
            for (start, end) in iter_mmap_heap_chunks(inf.pid):
                # print("Trying 0x%x-0x%x" % (start, end))
                try:
                    chunk = MChunkPtr(gdb.Value(start).cast(MChunkPtr.gdb_type()))
                    # Does this look like the first chunk within a range of
                    # mmap address space?
                    #print('0x%x' % (chunk.as_address() + chunk.chunksize()))
                    if (not chunk.has_NON_MAIN_ARENA() and chunk.has_IS_MMAPPED()
                        and chunk.as_address() + chunk.chunksize() <= end):
                        # Iterate upwards until you reach "end" of mmap space:
                        while chunk.as_address() < end and chunk.has_IS_MMAPPED():
                            yield chunk
                            # print('0x%x' % chunk.as_address(), chunk)
                            chunk = chunk.next_chunk()
                except RuntimeError:
                    pass

    def iter_sbrk_chunks(self):
        '''Yield a sequence of MChunkPtr corresponding to all chunks of memory
        in the heap (both used and free), in order of ascending address, for
        those from sbrk_base upwards'''
        # FIXME: this is currently a hack; I need to verify my logic here

        # As I understand it, it's only possible to navigate the following ways:
        #
        # For a chunk with PREV_INUSE:0, then prev_size is valid, and can be used
        # to subtract down to the start of that chunk
        # For a chunk with PREV_INUSE:1, then prev_size is not readable (reading it
        # could lead to SIGSEGV), and it's not possible to get at the size of the
        # previous chunk.
        # For a free chunk, we have next/prev pointers to a doubly-linked list
        # of other free chunks.
        # For a chunk, we have the size, and that size gives us the address
        # of the next chunk in RAM
        # So if we know the address of the first chunk, then we can use this
        # to iterate upwards through RAM, and thus iterate over all of the chunks

        # Start at "mp_.sbrk_base"
        chunk = MChunkPtr(gdb.Value(sbrk_base()).cast(MChunkPtr.gdb_type()))
        # sbrk_base is NULL when no small allocations have happened:
        if chunk.as_address() > 0:
            # Iterate upwards until you reach "top":
            top = int(self.field('top'))
            while chunk.as_address() != top:
                yield chunk
                # print('0x%x' % chunk.as_address(), chunk)
                try:
                    chunk = chunk.next_chunk()
                except RuntimeError:
                    break

    def iter_free_chunks(self):
        '''Yield a sequence of MChunkPtr (some of which may be MFastBinPtr),
        corresponding to the free chunks of memory'''
        # Account for top:
        print('top')
        yield MChunkPtr(self.field('top'))

        NFASTBINS = self.NFASTBINS()
        # Traverse fastbins:
        for i in range(0, int(NFASTBINS)):
            print('fastbin %i' % i)
            p = self.fastbin(i)
            while not p.is_null():
                yield p
                p = MFastBinPtr(p.field('fd'))
        # for (p = fastbin (av, i); p != 0; p = p->fd) {
        #   ++nfastblocks;
        #   fastavail += chunksize(p);
        # }
        # }

        # Must keep this in-sync with malloc.c:
        # FIXME: can we determine this dynamically from within gdb?
        NBINS = 128
        # Traverse regular bins:
        for i in range(1, NBINS):
            print('regular bin %i' % i)
            b = self.bin_at(i)
            #print('b: %s' % b)
            p = b.last()
            n = 0
            #print('p:', p)
            while p.as_address() != b.as_address():
                #print('n:', n)
                #print('b:', b)
                #print('p:', p)
                n += 1
                yield p
                p = MChunkPtr(p.field('bk'))
        # for (p = last(b); p != b; p = p->bk) {
        #   ++nblocks;
        #   avail += chunksize(p);
        # }
        # }

    def NFASTBINS(self):
        fastbinsY = self.field('fastbinsY')
        return array_length(fastbinsY)


class MallocPar(WrappedValue):
    # Wrapper around static struct malloc_par mp_

    @classmethod
    def get(cls):
        # It's a singleton:
        gdbval = gdb.parse_and_eval('mp_')
        return MallocPar(gdbval)


def sbrk_base():
    mp_ = MallocPar.get()
    try:
        return int(mp_.field('sbrk_base'))
    except RuntimeError as e:
        check_missing_debuginfo(e, 'glibc')
        raise e

"""
"""

# See malloc.c:
#   struct mallinfo mALLINFo(mstate av)
#   {
#     struct mallinfo mi;
#     size_t i;
#     mbinptr b;
#     mchunkptr p;
#     INTERNAL_SIZE_T avail;
#     INTERNAL_SIZE_T fastavail;
#     int nblocks;
#     int nfastblocks;
#
#     /* Ensure initialization */
#     if (av->top == 0)  malloc_consolidate(av);
#
#     check_malloc_state(av);
#
#     /* Account for top */
#     avail = chunksize(av->top);
#     nblocks = 1;  /* top always exists */
#
#     /* traverse fastbins */
#     nfastblocks = 0;
#     fastavail = 0;
#
#     for (i = 0; i < NFASTBINS; ++i) {
#       for (p = fastbin (av, i); p != 0; p = p->fd) {
#         ++nfastblocks;
#         fastavail += chunksize(p);
#       }
#     }
#
#     avail += fastavail;
#
#     /* traverse regular bins */
#     for (i = 1; i < NBINS; ++i) {
#       b = bin_at(av, i);
#       for (p = last(b); p != b; p = p->bk) {
#         ++nblocks;
#         avail += chunksize(p);
#       }
#     }
#
#     mi.smblks = nfastblocks;
#     mi.ordblks = nblocks;
#     mi.fordblks = avail;
#     mi.uordblks = av->system_mem - avail;
#     mi.arena = av->system_mem;
#     mi.hblks = mp_.n_mmaps;
#     mi.hblkhd = mp_.mmapped_mem;
#     mi.fsmblks = fastavail;
#     mi.keepcost = chunksize(av->top);
#     mi.usmblks = mp_.max_total_mem;
#     return mi;
#   }


def iter_mmap_heap_chunks(pid):
    '''Try to locate the memory-mapped
heap allocations for the given process (by PID) by reading /proc/PID/maps

    Yield a sequence of (start, end) pairs'''
    for line in open('/proc/%i/maps' % pid):
        # print(line)
        # e.g.:
        # 38e441e000-38e441f000 rw-p 0001e000 fd:01 1087  /lib64/ld-2.11.1.so
        # 38e441f000-38e4420000 rw-p 00000000 00:00 0
        hexd = r'[0-9a-f]'
        hexdigits = '(' + hexd + '+)'
        m = re.match(hexdigits + '-' + hexdigits
                     + r' ([r\-][w\-][x\-][ps]) ' + hexdigits
                     + r' (..:..) (\d+)\s+(.*)',
                     line)
        if m:
            # print(m.groups())
            start, end, perms, offset, dev, inode, pathname = m.groups()
            # PROT_READ, PROT_WRITE, MAP_PRIVATE:
            if perms == 'rw-p':
                if offset == '00000000': # FIXME bits?
                    if dev == '00:00': # FIXME
                        if inode == '0': # FIXME
                            if pathname == '': # FIXME
                                # print('heap line?:', line)
                                # print(m.groups())
                                start, end = [int(m.group(i), 16) for i in (1, 2)]
                                yield (start, end)
        else:
            print('unmatched :', line)


class GlibcArenas(object):
    def __init__(self):
        self.main_arena = self.get_main_arena()
        self.cur_arena = self.get_ms(self.main_arena)
        self.get_arenas()

    def get_main_arena(self):
        return gdb.parse_and_eval("main_arena")

    def get_ms(self, arena_dereference=None):
        if arena_dereference:
            ms = MallocState(arena_dereference)
        else:
            ms = self.cur_arena
        return ms

    def get_arenas(self):
        ar_ptr = self.get_ms(self.main_arena)
        self.arenas = []
        while True:
            self.arenas.append(ar_ptr)
            # The arenas form a circular singly-linked list; stop if an arena's
            # "next" pointer leads back to itself (otherwise this would loop
            # forever, repeatedly appending the same arena):
            if ar_ptr.address == ar_ptr.field('next'):
                return
            ar_ptr = self.get_ms(ar_ptr.field('next').dereference())
            if ar_ptr.address == self.main_arena.address:
                return

glibc_arenas = GlibcArenas()


================================================
FILE: heap/gobject.py
================================================
# Copyright (C) 2010 David Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
# # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA import re import sys import gdb from heap import WrappedPointer, WrappedValue, caching_lookup_type, type_char_ptr, Category # Use glib's pretty-printers: dir_ = '/usr/share/glib-2.0/gdb' if not dir_ in sys.path: sys.path.insert(0, dir_) from glib_gdb import read_global_var, g_quark_to_string # This was adapted from glib's gobject.py:g_type_to_name def get_typenode_for_gtype(gtype): def lookup_fundamental_type(typenode): if typenode == 0: return None val = read_global_var("static_fundamental_type_nodes") if val == None: return None # glib has an address() call here on the end, which looks wrong # (i) it's an attribute, not a method # (ii) it converts a TypeNode* to a TypeNode** return val[typenode >> 2] gtype = int(gtype) typenode = gtype - gtype % 4 if typenode > (255 << 2): return gdb.Value(typenode).cast (gdb.lookup_type("TypeNode").pointer()) else: return lookup_fundamental_type (typenode) def is_typename_castable(typename): if typename.startswith('Gtk'): return True if typename.startswith('Gdk'): return True if typename.startswith('GType'): return True if typename.startswith('Pango'): return True if typename.startswith('GVfs'): return True return False class GTypeInstancePtr(WrappedPointer): @classmethod def from_gtypeinstance_ptr(cls, addr, typenode): typename = cls.get_type_name(typenode) if typename: cls = cls.get_class_for_typename(typename) return cls(addr, typenode, typename) @classmethod def get_class_for_typename(cls, typename): '''Get the GTypeInstance subclass for the given type name''' if 
typename in typemap: return typemap[typename] return GTypeInstancePtr def __init__(self, addr, typenode, typename): # Try to cast the ptr to the named type: addr = gdb.Value(addr) try: if is_typename_castable(typename): # This requires, say, gtk2-debuginfo: ptr_type = caching_lookup_type(typename).pointer() addr = addr.cast(ptr_type) #print typename, addr.dereference() #if typename == 'GdkPixbuf': # print 'GOT PIXELS', addr['pixels'] except RuntimeError as e: pass #print addr, e WrappedPointer.__init__(self, addr) self.typenode = typenode self.typename = typename """ try: print 'self', self print 'self.typename', self.typename print 'typenode', typenode print 'typenode.type', typenode.type print 'typenode.dereference()', typenode.dereference() print except: print 'got here' raise """ def categorize(self): return Category('GType', self.typename, '') @classmethod def get_type_name(cls, typenode): return g_quark_to_string(typenode["qname"]) class GdkColormapPtr(GTypeInstancePtr): def categorize_refs(self, usage_set, level=0, detail=None): # print 'got here 46' pass # GdkRgbInfo is stored as qdata on a GdkColormap class GdkImagePtr(GTypeInstancePtr): def categorize_refs(self, usage_set, level=0, detail=None): priv_type = caching_lookup_type('GdkImagePrivateX11').pointer() priv_data = WrappedPointer(self._gdbval['windowing_data'].cast(priv_type)) usage_set.set_addr_category(priv_data.as_address(), Category('GType', 'GdkImagePrivateX11', ''), level=level+1, debug=True) ximage = WrappedPointer(priv_data.field('ximage')) dims = '%sw x %sh x %sbpp' % (ximage.field('width'), ximage.field('height'), ximage.field('depth')) usage_set.set_addr_category(ximage.as_address(), Category('X11', 'Image', dims), level=level+2, debug=True) usage_set.set_addr_category(int(ximage.field('data')), Category('X11', 'Image data', dims), level=level+2, debug=True) class GdkPixbufPtr(GTypeInstancePtr): def categorize_refs(self, usage_set, level=0, detail=None): dims = '%sw x %sh' % 
(self._gdbval['width'], self._gdbval['height']) usage_set.set_addr_category(int(self._gdbval['pixels']), Category('GType', 'GdkPixbuf pixels', dims), level=level+1, debug=True) class PangoCairoFcFontMapPtr(GTypeInstancePtr): def categorize_refs(self, usage_set, level=0, detail=None): # This gives us access to the freetype library: FT_Library = WrappedPointer(self._gdbval['library']) # This is actually a "struct FT_LibraryRec_", in FreeType's # include/freetype/internal/ftobjs.h # print FT_Library._gdbval.dereference() usage_set.set_addr_category(FT_Library.as_address(), Category('FreeType', 'Library', ''), level=level+1, debug=True) usage_set.set_addr_category(int(FT_Library.field('raster_pool')), Category('FreeType', 'raster_pool', ''), level=level+2, debug=True) # potentially we could look at FT_Library['memory'] typemap = { 'GdkColormap':GdkColormapPtr, 'GdkImage':GdkImagePtr, 'GdkPixbuf':GdkPixbufPtr, 'PangoCairoFcFontMap':PangoCairoFcFontMapPtr, } def as_gtype_instance(addr, size): #type_GObject_ptr = caching_lookup_type('GObject').pointer() try: type_GTypeInstance_ptr = caching_lookup_type('GTypeInstance').pointer() except RuntimeError: # Not linked against GLib? return None gobj = gdb.Value(addr).cast(type_GTypeInstance_ptr) try: gtype = gobj['g_class']['g_type'] #print 'gtype', gtype typenode = get_typenode_for_gtype(gtype) # If I remove the next line, we get errors like: # Cannot access memory at address 0xd1a712caa5b6e5c0 # Does this line give us an early chance to raise an exception? #print 'typenode', typenode # It appears to be in the coercion to boolean here: # if typenode: if typenode is not None: #print 'typenode.dereference()', typenode.dereference() return GTypeInstancePtr.from_gtypeinstance_ptr(addr, typenode) except RuntimeError: # Any random buffer that we point this at that isn't a GTypeInstance (or # GObject) is likely to raise a RuntimeError at some point in the above pass return None # FIXME: currently this ignores G_SLICE # e.g. 
use # G_SLICE=always-malloc # to override this ================================================ FILE: heap/history.py ================================================ # Copyright (C) 2010 David Hugh Malcolm # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA import datetime from heap import iter_usage_with_progress, fmt_size, fmt_addr, sign class Snapshot(object): '''Snapshot of the state of the heap''' def __init__(self, name, time): self.name = name self.time = time self._all_usage = set() self._totalsize = 0 self._num_usage = 0 def _add_usage(self, u): self._all_usage.add(u) self._totalsize += u.size self._num_usage += 1 return u @classmethod def current(cls, name): result = cls(name, datetime.datetime.now()) for i, u in enumerate(iter_usage_with_progress()): u.ensure_category() u.ensure_hexdump() result._add_usage(u) return result def total_size(self): '''Get total allocated size, in bytes''' return self._totalsize def summary(self): return '%s allocated, in %i blocks' % (fmt_size(self.total_size()), self._num_usage) def size_by_address(self, address): return self._chunk_by_address[address].size class History(object): '''History of snapshots of the state of the heap''' def __init__(self): self.snapshots = [] def add(self, name): s = Snapshot.current(name) self.snapshots.append(s) return s class 
Diff(object):
    '''Differences between two states of the heap'''
    def __init__(self, old, new):
        self.old = old
        self.new = new

        self.new_minus_old = self.new._all_usage - self.old._all_usage
        self.old_minus_new = self.old._all_usage - self.new._all_usage

    def stats(self):
        size_change = self.new.total_size() - self.old.total_size()
        count_change = self.new._num_usage - self.old._num_usage
        # Note: the block count is a plain integer, not a byte size:
        return "%s%s bytes, %s%i blocks" % (sign(size_change),
                                            fmt_size(size_change),
                                            sign(count_change),
                                            abs(count_change))

    def as_changes(self):
        result = self.chunk_report('Free-d blocks', self.old, self.old_minus_new)
        result += self.chunk_report('New blocks', self.new, self.new_minus_old)
        # FIXME: add changed chunks
        return result

    def chunk_report(self, title, snapshot, set_of_usage):
        result = '%s:\n' % title
        if len(set_of_usage) == 0:
            result += '  (none)\n'
            return result
        # Sort by start address (Python 3: use a key function, not a cmp function):
        for usage in sorted(set_of_usage, key=lambda u: u.start):
            result += ('  %s -> %s %8i bytes %20s |%s\n'
                       % (fmt_addr(usage.start),
                          fmt_addr(usage.start + usage.size-1),
                          usage.size,
                          usage.category,
                          usage.hd))
        return result

history = History()


================================================
FILE: heap/parser.py
================================================
# Copyright (C) 2010 David Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
# # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # Query language for the heap # Uses "ply", so we'll need python-ply on Fedora # Split into tokenizer, then grammar, then external interface ############################################################################ # Tokenizer: ############################################################################ import ply.lex as lex reserved = ['AND', 'OR', 'NOT'] tokens = [ 'ID','LITERAL_NUMBER', 'LITERAL_STRING', 'LPAREN','RPAREN', 'COMPARISON' ] + reserved t_LPAREN = r'\(' t_RPAREN = r'\)' def t_ID(t): r'[a-zA-Z_][a-zA-Z_0-9]*' # Check for reserved words (case insensitive): if t.value.upper() in reserved: t.type = t.value.upper() else: t.type = 'ID' return t def t_COMPARISON(t): r'<=|<|==|=|!=|>=|>' return t def t_LITERAL_NUMBER(t): r'(0x[0-9a-fA-F]+|\d+)' try: if t.value.startswith('0x'): t.value = int(t.value, 16) else: t.value = int(t.value) except ValueError: raise ParserError(t.value) return t def t_LITERAL_STRING(t): r'"([^"]*)"' # Drop the quotes: t.value = t.value[1:-1] return t # Ignored characters t_ignore = " \t" def t_newline(t): r'\n+' t.lexer.lineno += t.value.count("\n") def t_error(t): print("Illegal character '%s'" % t.value[0]) t.lexer.skip(1) lexer = lex.lex() ############################################################################ # Grammar: ############################################################################ import ply.yacc as yacc precedence = ( ('left', 'AND', 'OR'), ('left', 'NOT'), ('left', 'COMPARISON'), ) from heap.query import Constant, And, Or, Not, GetAttr, \ Comparison__le__, Comparison__lt__, Comparison__eq__, \ Comparison__ne__, Comparison__ge__, Comparison__gt__ def p_expression_number(t): 'expression : LITERAL_NUMBER' t[0] = Constant(t[1]) def p_expression_string(t): 'expression : LITERAL_STRING' t[0] = 
Constant(t[1]) def p_comparison(t): 'expression : expression COMPARISON expression' classes = { '<=' : Comparison__le__, '<' : Comparison__lt__, '==' : Comparison__eq__, '=' : Comparison__eq__, '!=' : Comparison__ne__, '>=' : Comparison__ge__, '>' : Comparison__gt__ } cls = classes[t[2]] t[0] = cls(t[1], t[3]) def p_and(t): 'expression : expression AND expression' t[0] = And(t[1], t[3]) def p_or(t): 'expression : expression OR expression' t[0] = Or(t[1], t[3]) def p_not(t): 'expression : NOT expression' t[0] = Not(t[2]) def p_expression_group(t): 'expression : LPAREN expression RPAREN' t[0] = t[2] def p_expression_name(t): 'expression : ID' attrname = t[1] attrnames = ('domain', 'kind', 'detail', 'addr', 'start', 'size') if attrname not in attrnames: raise ParserError.from_production(t, attrname, ('Unknown attribute "%s" (supported are %s)' % (attrname, ','.join(attrnames)))) t[0] = GetAttr(attrname) class ParserError(Exception): @classmethod def from_production(cls, p, val, msg): return ParserError(p.lexer.lexdata, p.lexer.lexpos - len(val), val, msg) @classmethod def from_token(cls, t, msg="Parse error"): return ParserError(t.lexer.lexdata, t.lexer.lexpos - len(t.value), t.value, msg) def __init__(self, input_, pos, value, msg): self.input_ = input_ self.pos = pos self.value = value self.msg = msg def __str__(self): return ('%s at "%s":\n%s\n%s' % (self.msg, self.value, self.input_, ' '*self.pos + '^'*len(self.value))) def p_error(t): raise ParserError.from_token(t) ############################################################################ # Interface: ############################################################################ # Entry point: def parse_query(s): #try: parser = yacc.yacc(debug=0, write_tables=0) return parser.parse(s)#, debug=1) #except ParserError, e: # print 'foo', e def test_lexer(s): lexer.input(s) while True: tok = lexer.token() if not tok: break print(tok) ================================================ FILE: heap/pypy.py 
================================================ import gdb from heap import WrappedPointer, caching_lookup_type, Usage, \ type_void_ptr, fmt_addr, Category, looks_like_ptr, \ WrongInferiorProcess def pypy_categorizer(addr, size): return None class ArenaCollection(WrappedPointer): # Corresponds to pypy/rpython/memory/gc/minimarkpage.py:ArenaCollection def get_arenas(self): # Yield a sequence of (struct pypy_ArenaReference0*) gdb.Value instances # representing the arenas current_arena = self.field('ac_inst_current_arena') # print "self.field('ac_inst_current_arena'): %s" % self.field('ac_inst_current_arena') if current_arena: yield ArenaReference(current_arena) # print "self.field('ac_inst_arenas_lists'):%s" % self.field('ac_inst_arenas_lists') #for arena in : arena = self.field('ac_inst_arenas_lists') #while arena: # yield ArenaReference(arena) # arena = arena.dereference()['ac_inst_nextarena'] class ArenaReference(WrappedPointer): def iter_usage(self): # print 'got PyPy arena within allocations' return [] # FIXME class ArenaDetection(object): '''Detection of PyPy arenas, done as an object so that we can cache state''' def __init__(self): try: ac_global = gdb.parse_and_eval('pypy_g_pypy_rpython_memory_gc_minimarkpage_ArenaCollect') except RuntimeError: # Not PyPy? 
raise WrongInferiorProcess('pypy') self._ac = ArenaCollection(ac_global.address) self._arena_refs = [] self._malloc_ptrs = {} for ar in self._ac.get_arenas(): print(ar) print(ar._gdbval.dereference()) self._arena_refs.append(ar) # ar_base : address as returned by malloc self._malloc_ptrs[int(ar.field('ar_base'))] = ar print(self._malloc_ptrs) def as_arena(self, ptr, chunksize): if ptr in self._malloc_ptrs: return self._malloc_ptrs[ptr] return None ================================================ FILE: heap/query.py ================================================ # Copyright (C) 2010 David Hugh Malcolm # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. 
# # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA import sys class Expression(object): def eval_(self, u): raise NotImplementedError def __eq__(self, other): return (self.__class__ == other.__class__ and self.__dict__ == other.__dict__) class Constant(Expression): def __init__(self, value): self.value = value def __repr__(self): return 'Constant(%r)' % (self.value,) def eval_(self, u): return self.value class GetAttr(Expression): def __init__(self, attrname): self.attrname = attrname def __repr__(self): return 'GetAttr(%r)' % (self.attrname,) def eval_(self, u): if self.attrname in ('domain', 'kind', 'detail'): if u.category == None: u.ensure_category() return getattr(u.category, self.attrname) return getattr(u, self.attrname) class BinaryOp(Expression): def __init__(self, lhs, rhs): self.lhs = lhs self.rhs = rhs class Comparison(BinaryOp): def __init__(self, lhs, rhs): BinaryOp.__init__(self, lhs, rhs) def __repr__(self): return '%s(%r, %r)' % (self.__class__.__name__, self.lhs, self.rhs) def eval_(self, u): lhs_val = self.lhs.eval_(u) rhs_val = self.rhs.eval_(u) return self.cmp_(lhs_val, rhs_val) def cmp_(self, lhs, rhs): raise NotImplementedError class Comparison__le__(Comparison): def cmp_(self, lhs, rhs): return lhs <= rhs class Comparison__lt__(Comparison): def cmp_(self, lhs, rhs): return lhs < rhs class Comparison__eq__(Comparison): def cmp_(self, lhs, rhs): return lhs == rhs class Comparison__ne__(Comparison): def cmp_(self, lhs, rhs): return lhs != rhs class Comparison__ge__(Comparison): def cmp_(self, lhs, rhs): return lhs >= rhs class Comparison__gt__(Comparison): def cmp_(self, lhs, rhs): return lhs > rhs class And(BinaryOp): def __repr__(self): return 'And(%r, %r)' % (self.lhs, self.rhs) def eval_(self, u): # Short-circuit evaluation: if not self.lhs.eval_(u): return False return 
self.rhs.eval_(u) class Or(BinaryOp): def __repr__(self): return 'Or(%r, %r)' % (self.lhs, self.rhs) def eval_(self, u): # Short-circuit evaluation: if self.lhs.eval_(u): return True return self.rhs.eval_(u) class Not(Expression): def __init__(self, inner): self.inner = inner def __repr__(self): return 'Not(%r)' % (self.inner, ) def eval_(self, u): return not self.inner.eval_(u) class Column(object): def __init__(self, name, getter, formatter): self.name = name self.getter = getter self.formatter = formatter class Query(object): def __init__(self, filter_): self.filter_ = filter_ def __iter__(self): from heap import iter_usage_with_progress, lazily_get_usage_list if True: # 2-pass, but the expensive first pass may be cached usage_list = lazily_get_usage_list() for u in usage_list: if self.filter_.eval_(u): yield u else: # 1-pass: # This may miss blocks that are only categorized w.r.t. to other # blocks: for u in iter_usage_with_progress(): if self.filter_.eval_(u): yield u def do_query(args): from heap import fmt_addr, Table from heap.parser import parse_query if args == '': # if no query supplied, select everything: filter_ = Constant(True) else: filter_ = parse_query(args) if False: print(args) print(filter_) columns = [Column('Start', lambda u: u.start, fmt_addr), Column('End', lambda u: u.start + u.size - 1, fmt_addr ), Column('Domain', lambda u: u.category.domain, None), Column('Kind', lambda u: u.category.kind, None), Column('Detail', lambda u: u.category.detail, None), Column('Hexdump', lambda u: u.hexdump, None), ] t = Table([col.name for col in columns]) for u in Query(filter_): u.ensure_hexdump() u.ensure_category() if u.category: domain = u.category.domain kind = u.category.kind detail = u.category.detail if not detail: detail = '' else: domain = '' kind = '' detail = '' t.add_row([fmt_addr(u.start), fmt_addr(u.start + u.size - 1), domain, kind, detail, u.hd]) t.write(sys.stdout) print() ================================================ FILE: 
heap/sqlite.py ================================================ # Copyright (C) 2010 David Hugh Malcolm # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA from heap import Category, caching_lookup_type import gdb def categorize_sqlite3(addr, usage_set, visited): # "struct sqlite3" is defined in src/sqliteInt.h, which is an internal header ptr_type = caching_lookup_type('sqlite3').pointer() obj_ptr = gdb.Value(addr).cast(ptr_type) # print obj_ptr.dereference() aDb = obj_ptr['aDb'] Db_addr = int(aDb) Db_malloc_addr = Db_addr - 8 if usage_set.set_addr_category(Db_malloc_addr, Category('sqlite3', 'struct Db', None), visited): print(aDb['pBt'].dereference()) # FIXME ================================================ FILE: make-release.sh ================================================ # Utility to help dmalcolm make releases: VERSION=$1 git clone git://git.fedorahosted.org/gdb-heap.git pushd gdb-heap git tag -a -m "$VERSION" $VERSION # FIXME: pushing this isn't working for some reason popd mv gdb-heap gdb-heap-${VERSION} tar cfvj gdb-heap-${VERSION}.tar.bz2 gdb-heap-${VERSION} scp gdb-heap-${VERSION}.tar.bz2 dmalcolm@fedorahosted.org:gdb-heap rm -rf gdb-heap-${VERSION} ================================================ FILE: object-sizes.py ================================================ # Copyright (C) 2010 David 
Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

# This is a support script for selftest.py
# It creates various kinds of object, so that we can verify that gdb-heap
# detects them (and their supporting buffers)

# Four different kinds of (x, y) coordinate:
try:
    from collections import namedtuple
    NamedTuple = namedtuple('NamedTuple', ('x', 'y'))
except ImportError:
    NamedTuple = None

class OldStyle:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class NewStyle(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

class NewStyleWithSlots(object):
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

objs = []
types = [OldStyle, NewStyle, NewStyleWithSlots]
if NamedTuple:
    types.append(NamedTuple)
for impl in types:
    objs.append(impl(x=3, y=4))
print(objs)

# Test creating an object with more than 8 attributes, so that the __dict__
# has an external PyDictEntry buffer.
# We will test to see if this is detectable in the selftest.
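The comment above relies on CPython dicts outgrowing their small initial hash table once they hold more than a handful of entries. As a rough standalone illustration (not part of object-sizes.py; exact byte counts and the exact growth threshold vary across CPython versions, so only the relative growth is asserted):

```python
import sys

# A 9-entry dict, like the 9-attribute instances created below, needs a
# larger hash table than a 3-entry dict, so sys.getsizeof() reports more:
small = {k: None for k in 'abc'}
big = {k: None for k in 'abcdefghi'}   # 9 entries
print(sys.getsizeof(small), sys.getsizeof(big))
assert sys.getsizeof(big) > sys.getsizeof(small)
```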
class OldStyleManyAttribs:
    def __init__(self, **kwargs):
        self.__dict__ = kwargs

class NewStyleManyAttribs(object):
    def __init__(self, **kwargs):
        self.__dict__ = kwargs

# Create instances with 9 attributes:
old_style_many = OldStyleManyAttribs(**dict(zip('abcdefghi', range(9))))
new_style_many = NewStyleManyAttribs(**dict(zip('abcdefghi', range(9))))

# Ensure that we have a set object that uses an externally allocated
# buffer, so that we can verify that these are detected.  To do this,
# we need a set with more than PySet_MINSIZE members (which is 8):
large_set = set(range(64))
large_frozenset = frozenset(range(64))

import sqlite3
db = sqlite3.connect(':memory:')
c = db.cursor()

# Create table
c.execute('''CREATE TABLE dummy(foo TEXT, bar TEXT, v REAL)''')

# Insert a row of data
c.execute("INSERT INTO dummy VALUES ('ostrich', 'elephant', 42.0)")

# Save (commit) the changes
db.commit()

# Don't close "c"; we want to see the objects in memory

# Ensure that the selftest's breakpoint on builtin_id is hit:
id(42)


================================================
FILE: resultparser.py
================================================
# Classes for working with the textual table output from gdb-heap

import unittest
import re

from collections import namedtuple

def indent(str_):
    return '\n'.join([(' ' * 4) + line
                      for line in str_.splitlines()])

class ColumnNotFound(Exception):
    def __init__(self, colname, table):
        self.colname = colname
        self.table = table
    def __str__(self):
        return ('ColumnNotFound(%s) in:\n%s'
                % (self.colname, indent(str(self.table))))

class RowNotFound(Exception):
    def __init__(self, criteria, table):
        self.criteria = criteria
        self.table = table
    def __str__(self):
        return ('RowNotFound(%s) in:\n%s'
                % (self.criteria, indent(str(self.table))))

class Criteria(object):
    '''A list of (colname, value) criteria for searching rows in a table'''
    def __init__(self, table, kvs):
        self.kvs = kvs
        self._by_index = [(table.find_col(attrname), value)
                          for attrname, value in kvs]

    def __str__(self):
        return 'Criteria(%s)' % ','.join('%r=%r' % (attrname, value)
                                         for attrname, value in self.kvs)

    def is_matched_by(self, row):
        for colindex, value in self._by_index:
            if row[colindex] != value:
                return False
        return True

class ParsedTable(object):
    '''Parses output from heap.Table, for use in writing selftests'''

    @classmethod
    def parse_lines(cls, data):
        '''Parse the lines in the string, returning a list of ParsedTable instances'''
        result = []
        lines = data.splitlines()
        start = 0
        while start < len(lines):
            sep_line = cls._find_separator_line(lines[start:])
            if sep_line:
                sep_index, colmetrics = sep_line
                t = ParsedTable(sep_index, colmetrics, lines[start:])
                result.append(t)
                start += t.sep_index + 1 + len(t.rows)
            else:
                break
        return result

    # Column metrics:
    ColMetric = namedtuple('ColMetric', ('offset', 'width'))

    def __init__(self, sep_index, colmetrics, lines):
        self.sep_index, self.colmetrics = sep_index, colmetrics

        # Parse column headings:
        header_index = self.sep_index - 1
        self.colnames = self._split_cells(lines[header_index])

        # Parse rows:
        self.rows = []
        for line in lines[self.sep_index + 1:]:
            if line == '':
                break
            self.rows.append(self._split_cells(line))

        self.rawdata = '\n'.join(lines[header_index:header_index+len(self.rows)+2])

    def __str__(self):
        return self.rawdata

    def as_rst_grid_table(self):
        def _get_separator_row(colwidths, sepchar):
            return '+' + ('+'.join([sepchar * width
                                    for width in colwidths])) + '+\n'
        def _get_row(values, colwidths):
            row = '|'
            cells = []
            for value, width in zip(values, colwidths):
                if value is None:
                    cells.append(' ' * width)
                else:
                    formatString = "%%%ds" % width # to generate e.g. "%20s"
                    cells.append(formatString % value)
            row += '|'.join([cell for cell in cells])
            row += '|\n'
            return row
        colwidths = [colmetric.width for colmetric in self.colmetrics]
        result = _get_separator_row(colwidths, '-')
        result += _get_row(self.colnames, colwidths)
        result += _get_separator_row(colwidths, '=')
        for row in self.rows:
            result += _get_row(row, colwidths)
            result += _get_separator_row(colwidths, '-')
        return result

    def get_cell(self, x, y):
        return self.rows[y][x]

    def find_col(self, colname):
        # Find the index of the column with the given name
        for x, col in enumerate(self.colnames):
            if colname == col:
                return x
        raise ColumnNotFound(colname, self)

    def find_row(self, kvs):
        # Find the first row matching the criteria, or raise RowNotFound
        criteria = Criteria(self, kvs)
        for row in self.rows:
            if criteria.is_matched_by(row):
                return row
        raise RowNotFound(criteria, self)

    def find_cell(self, kvs, attr2name):
        criteria = Criteria(self, kvs)
        row = self.find_row(kvs)
        return row[self.find_col(attr2name)]

    def _get_cell_value(self, cellstr):
        if cellstr == '':
            return None
        # Remove ',' separators from numbers, and treat as decimal:
        m = re.match('^([0-9,]+)$', cellstr)
        if m:
            return int(cellstr.replace(',', ''))
        # Hexadecimal values:
        m = re.match('^(0x[0-9a-f]+)$', cellstr)
        if m:
            return int(cellstr, 16)
        # Keep as a str:
        return cellstr

    def _split_cells(self, line):
        row = []
        for col in self.colmetrics:
            cellstr = line[col.offset: col.offset+col.width].lstrip()
            cellvalue = self._get_cell_value(cellstr)
            row.append(cellvalue)
        return tuple(row)

    @classmethod
    def _find_separator_line(cls, lines):
        # Look for the separator line
        # Return (index, tuple of ColMetric)
        for i, line in enumerate(lines):
            if line.startswith('-'):
                widths = [len(frag) for frag in line.split('  ')]
                coldata = []
                offset = 0
                for width in widths:
                    coldata.append(cls.ColMetric(offset=offset, width=width))
                    offset += width + 2
                return (i, tuple(coldata))

# Test data for table parsing (edited fragment of output during development):
test_table = '''
junk line
       Domain        Kind                 Detail  Count  Allocated size
-------------  ----------  ---------------------  -----  --------------
       python         str                          3,891         234,936
uncategorized                        98312 bytes      1          98,312
uncategorized                         1544 bytes     43          66,392
uncategorized                         6152 bytes     10          61,520
       python       tuple                          1,421          54,168
                                                             0xdeadbeef
                                           TOTAL  9,377         857,592

another junk line

another table
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
        16         100           1,600
        24          50           1,200
    TOTALS         150           2,800

more junk
'''

class ParserTests(unittest.TestCase):
    def test_table_data(self):
        tables = ParsedTable.parse_lines(test_table)
        self.assertEquals(len(tables), 2)

        pt = tables[0]
        # Verify column names:
        self.assertEquals(pt.colnames,
                          ('Domain', 'Kind', 'Detail', 'Count', 'Allocated size'))

        # Verify (x,y) lookup, and type conversions:
        self.assertEquals(pt.get_cell(0, 0), 'python')
        self.assertEquals(pt.get_cell(1, 3), None)
        self.assertEquals(pt.get_cell(4, 5), 0xdeadbeef)
        self.assertEquals(pt.get_cell(4, 6), 857592)

        # Verify searching by value:
        self.assertEquals(pt.find_col('Count'), 3)
        self.assertEquals(pt.find_row([('Allocated size', 54168),]),
                          ('python', 'tuple', None, 1421, 54168))
        self.assertEquals(pt.find_cell([('Kind', 'str'),], 'Count'), 3891)

        # Error-checking:
        self.assertRaises(ColumnNotFound,
                          pt.find_col,
                          'Ensure that a non-existent column raises an error')
        self.assertRaises(RowNotFound,
                          pt.find_row, [('Count', -1)])

        # Verify that "rawdata" contains the correct string data:
        self.assert_(pt.rawdata.startswith('       Domain'))
        self.assert_(pt.rawdata.endswith('857,592'))

        # Test the second table:
        pt = tables[1]
        self.assertEquals(pt.colnames,
                          ('Chunk size', 'Num chunks', 'Allocated size'))
        self.assertEquals(pt.get_cell(2, 2), 2800)
        self.assert_(pt.rawdata.startswith('Chunk size'))
        self.assert_(pt.rawdata.endswith('2,800'))

    def test_multiple_tables(self):
        tables = ParsedTable.parse_lines(test_table * 5)
        self.assertEquals(len(tables), 10)

    def test_rst(self):
        tables = ParsedTable.parse_lines(test_table)
        self.assertEquals(len(tables), 2)
        pt = tables[0]
        rst_text = pt.as_rst_grid_table()
        exp = (
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|       Domain|      Kind|               Detail|Count|Allocated size|\n'
            '+=============+==========+=====================+=====+==============+\n'
            '|       python|       str|                     | 3891|        234936|\n'
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|uncategorized|          |          98312 bytes|    1|         98312|\n'
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|uncategorized|          |           1544 bytes|   43|         66392|\n'
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|uncategorized|          |           6152 bytes|   10|         61520|\n'
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|       python|     tuple|                     | 1421|         54168|\n'
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|             |          |                     |     |    3735928559|\n'
            '+-------------+----------+---------------------+-----+--------------+\n'
            '|             |          |                TOTAL| 9377|        857592|\n'
            '+-------------+----------+---------------------+-----+--------------+\n')
        self.assertEquals(rst_text, exp)

if __name__ == "__main__":
    unittest.main()


================================================
FILE: run-gdb-heap
================================================
#!/bin/bash
# Handy script for launching a program under gdb, whilst wiring up gdb to use
# the working copy of gdb-heap
# Typical usage:
#   ./run-gdb-heap python
PYTHONPATH="$(pwd)" \
gdb \
  --eval-command="python import gdbheap" \
  --args $*


================================================
FILE: selftest.py
================================================
# Copyright (C) 2010 David Hugh Malcolm
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

# Verify that gdb can print information on the heap of an inferior process
#
# Adapted from Python's Lib/test/test_gdb.py, which in turn was adapted from
# similar work in Unladen Swallow's Lib/test/test_jit_gdb.py

import os
import re
from subprocess import Popen, PIPE, call as subprocess_call
import sys
import unittest
import random

from test.test_support import run_unittest, findfile

if sys.maxint == 0x7fffffff:
    _32bit = True
else:
    _32bit = False

try:
    gdb_version, _ = Popen(["gdb", "--version"], stdout=PIPE).communicate()
except OSError:
    # This is what "no gdb" looks like.  There may, however, be other
    # errors that manifest this way too.
    raise unittest.SkipTest("Couldn't find gdb on the path")

gdb_version_number = re.search(r"^GNU gdb [^\d]*(\d+)\.", gdb_version)
if int(gdb_version_number.group(1)) < 7:
    raise unittest.SkipTest("gdb versions before 7.0 didn't support python embedding"
                            " Saw:\n" + gdb_version)

# Verify that "gdb" was built with the embedded python support enabled:
cmd = "--eval-command=python import sys; print sys.version_info"
p = Popen(["gdb", "--batch", cmd], stdout=PIPE)
gdbpy_version, _ = p.communicate()
if gdbpy_version == '':
    raise unittest.SkipTest("gdb not built with embedded python support")

class TestSource(object):
    '''Programmatically construct C source code for a test program that
    calls into the heap'''
    def __init__(self):
        self.decls = ''
        self.operations = ''
        self.num_ptrs = 0
        self.indent = '  '

    def add_line(self, code):
        self.operations += self.indent + code + '\n'

    def add_malloc(self, size, debug=False, typename=None):
        self.num_ptrs += 1
        varname = 'ptr%03i' % self.num_ptrs
        if typename:
            cast = '(%s)' % typename
        else:
            typename = 'void *'
            cast = ''
        self.add_line('%s%s = %smalloc(0x%x); /* %i */'
                      % (typename, varname, cast, size, size))
        if debug:
            self.add_line('printf(__FILE__ ":%%i:%s=%%p\\n", __LINE__, %s);'
                          % (varname, varname))
            self.add_line('fflush(stdout);')
        return varname

    def add_realloc(self, varname, size, debug=False):
        self.num_ptrs += 1
        new_varname = 'ptr%03i' % self.num_ptrs
        self.add_line('void *%s = realloc(%s, 0x%x);'
                      % (new_varname, varname, size))
        if debug:
            self.add_line('printf(__FILE__ ":%%i:%s=%%p\\n", __LINE__, %s);'
                          % (new_varname, new_varname))
            self.add_line('fflush(stdout);')
        return new_varname

    def add_free(self, varname, debug=False):
        self.add_line('free(%s);' % varname)

    def add_breakpoint(self):
        self.add_line('__asm__ __volatile__ ("int $03");')

    def as_c_source(self):
        result = '''
#include <stdio.h>
#include <stdlib.h>
'''
        result += self.decls
        result += '''
int main (int argc, char **argv)
{
''' + self.operations + '''
  return 0;
}
'''
        return result

class TestProgram(object):
    def __init__(self, name, source, is_cplusplus=False):
        self.name = name
        self.source = source
        if is_cplusplus:
            self.srcname = '%s.cc' % self.name
            compiler = 'g++'
        else:
            self.srcname = '%s.c' % self.name
            compiler = 'gcc'
        f = open(self.srcname, 'w')
        f.write(source)
        f.close()
        c = subprocess_call([compiler,
                             # We want debug information:
                             '-g',
                             # Name of the binary:
                             '-o', self.name,
                             # The source file:
                             self.srcname])
        # Check exit status:
        assert(c == 0)
        # Check that the binary exists:
        assert(os.path.exists(self.name))

from resultparser import ParsedTable, RowNotFound, test_table

class DebuggerTests(unittest.TestCase):
    """Test that the debugger can debug the heap"""

    def run_gdb(self, *args):
        """Runs gdb with the command line given by *args.  Returns its stdout, stderr"""
        out, err = Popen(args, stdout=PIPE, stderr=PIPE).communicate()
        return out, err

    def requires_binary(self, binary):
        # Slightly complicated: gdb will look for the binary within the PWD
        # as well as within the $PATH
        if os.path.exists(binary):
            # It's either an absolute or relative path, and directly exists:
            return
        p = Popen(['which', binary], stdout=PIPE, stderr=PIPE)
        out, err = p.communicate()
        if p.returncode == 0:
            # It's in the $PATH
            return
        raise unittest.SkipTest("%s not found" % binary)

    def command_test(self, progargs, commands, breakpoint=None):
        self.requires_binary(progargs[0])

        # Run under gdb, hit the breakpoint, then run our "heap" command:
        commands = ['python sys.path.append(".") ; import gdbheap'] + commands
        args = ["gdb", "--batch"]
        args += ['--eval-command=%s' % cmd for cmd in commands]
        args += ["--args"] + progargs
        # print args
        # print ' '.join(args)

        # Use "args" to invoke gdb, capturing stdout, stderr:
        out, err = self.run_gdb(*args)

        # Ignore some noise on stderr due to a pending breakpoint:
        if breakpoint:
            err = err.replace('Function "%s" not defined.\n' % breakpoint, '')

        # Ensure no unexpected error messages:
        if err != '':
            print out
            print err
            self.fail('stderr from gdb was non-empty: %r' % err)

        return out

    def program_test(self, name, source, commands, is_cplusplus=False):
        p = TestProgram(name, source, is_cplusplus)
        return self.command_test([p.name], commands)

    def test_no_allocations(self):
        # Verify handling of an inferior process that doesn't use the heap
        src = TestSource()
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_no_allocations', source,
                                commands=['run', 'heap sizes'])
        self.assert_('''
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
    TOTALS           0               0
''' in out)

    def test_small_allocations(self):
        src = TestSource()
        # 100 allocations each of sizes in the range 1-15
        for i in range(100):
            for size in range(1, 16):
                src.add_malloc(size)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_small_allocations', source,
                                commands=['run', 'heap sizes'])
        if _32bit:
            exp = '''
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
        16        1200          19,200
        24         300           7,200
    TOTALS        1500          26,400
'''
        else:
            exp = '''
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
        32        1500          48,000
    TOTALS        1500          48,000
'''
        self.assert_(exp in out, out)

    def test_large_allocations(self):
        # 10 allocations each of sizes in the range 1MB through 10MB:
        src = TestSource()
        for i in range(10):
            size = 1024 * 1024 * (i+1)
            src.add_malloc(size)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_large_allocations', source,
                                commands=['run', 'heap sizes'])
        self.assert_('''
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
10,489,856           1      10,489,856
 9,441,280           1       9,441,280
 8,392,704           1       8,392,704
 7,344,128           1       7,344,128
 6,295,552           1       6,295,552
 5,246,976           1       5,246,976
 4,198,400           1       4,198,400
 3,149,824           1       3,149,824
 2,101,248           1       2,101,248
 1,052,672           1       1,052,672
    TOTALS          10      57,712,640
''' in out)

    def test_mixed_allocations(self):
        # Compile test program
        source = '''
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
  int i;
  void *ptrs[100];

  /* Some small allocations: */
  for (i=0; i < 100; i++) {
    ptrs[i] = malloc(256);
    printf("malloc returned %p\\n", ptrs[i]);
    fflush(stdout);
  }

  /* Free one of the small allocations: */
  free(ptrs[50]);

  void* ptr1 = malloc(1000);
  void* ptr2 = malloc(1000);
  void* ptr3 = malloc(256000); /* large allocation */

  /* Directly insert a breakpoint: */
  __asm__ __volatile__ ("int $03");

  return 0;
}
'''
        out = self.program_test('test_simple', source,
                                commands=['run', 'heap sizes'])
        #print out
        # Verify the result
        if _32bit:
            exp = '''
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
   258,048           1         258,048
       264          99          26,136
     1,008           2           2,016
    TOTALS         102         286,200
'''
        else:
            exp = '''
Chunk size  Num chunks  Allocated size
----------  ----------  --------------
   258,048           1         258,048
       272          99          26,928
     1,008           2           2,016
    TOTALS         102         286,992
'''
        self.assert_(exp in out, out)

    def random_size(self):
        size = random.randint(1, 64)
        if random.randint(0, 5) == 0:
            size *= 1024
            size += random.randint(0, 1023)
        if random.randint(0, 5) == 0:
            size *= 256
            size += random.randint(0, 255)
        return size

    def test_random_allocations(self):
        # Fuzz-testing: lots of allocations (of various sizes)
        # and deallocations
        src = TestSource()
        sizes = {}
        live_blocks = set()
        for i in range(100):
            action = random.randint(1, 100)
            # 70% chance of malloc:
            if action <= 70:
                size = self.random_size()
                varname = src.add_malloc(size, debug=True)
                sizes[varname] = size
                live_blocks.add(varname)
            if len(live_blocks) > 0:
                # 10% chance of realloc:
                if action in range(71, 80):
                    size = self.random_size()
                    old_varname = random.sample(live_blocks, 1)[0]
                    live_blocks.remove(old_varname)
                    new_varname = src.add_realloc(old_varname, size, debug=True)
                    sizes[new_varname] = size
                    live_blocks.add(new_varname)
                # 20% chance of freeing something:
                elif action > 80:
                    varname = random.sample(live_blocks, 1)[0]
                    live_blocks.remove(varname)
                    src.add_free(varname)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_random_allocations', source,
                                commands=(['run'] + ['heap select', 'cont'] * 100))
        # We have 100 states of the inferior process; check that each was
        # reported as we expected it to be:
        tables = ParsedTable.parse_lines(out)
        self.assertEqual(len(tables), 100)
        for i in range(100):
            heap_select_out = tables[i]
            #print heap_select_out
            reported_addrs = set([heap_select_out.get_cell(0, y)
                                  for y in range(len(heap_select_out.rows))])
            #print reported_addrs
            # FIXME: do some verification at each breakpoint: check that the
            # reported values correspond to what we expect

    def test_random_buffers(self):
        # Fuzz-testing: try to break the heuristics by throwing random bytes
        # at them.  Note that we do the randomization at the python level when
        # generating the C code, so that the result of running any given C code
        # is entirely reproducible
        src = TestSource()
        for i in range(100):
            varname = src.add_malloc(256, typename='unsigned char*')
            for offset in range(256):
                value = random.randint(0, 255)
                src.add_line('%s[%i]=0x%02x;' % (varname, offset, value))
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_random_buffers', source,
                                commands=['run', 'heap'])
        # print out

    def test_cplusplus(self):
        '''Verify that we can detect and categorize instances of C++ classes'''
        # Note that C++ detection is currently disabled due to a bug in execution capture
        src = TestSource()
        src.decls += '''
class Foo {
public:
    virtual ~Foo() {}
    int f1;
    int f2;
};

class Bar : Foo {
public:
    virtual ~Bar() {}
    int f1;
    // Ensure that Bar has a different allocated size to Foo, on every arch:
    int buffer[256];
};
'''
        for i in range(100):
            src.add_line('{Foo *f = new Foo();}')
            if i % 2:
                src.add_line('{Bar *b = new Bar();}')
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_cplusplus', source,
                                is_cplusplus=True,
                                commands=['run', 'heap sizes', 'heap'])
        tables = ParsedTable.parse_lines(out)
        heap_sizes_out = tables[0]
        heap_out = tables[1]

        # We ought to have 150 live blocks on the heap:
        self.assertHasRow(heap_out,
                          [('Detail', 'TOTAL'), ('Count', 150)])

        # Use the differing counts of the blocks to locate the objects
        # FIXME: change the "Domain" values below and add "Kind" once C++
        # identification is re-enabled:
        self.assertHasRow(heap_out,
                          [('Count', 100), ('Domain', 'uncategorized')])
        self.assertHasRow(heap_out,
                          [('Count', 50), ('Domain', 'uncategorized')])

    def test_history(self):
        src = TestSource()
        src.add_malloc(100)
        src.add_malloc(100)
        src.add_malloc(100)
        src.add_breakpoint()
        src.add_malloc(200)
        src.add_malloc(200)
        src.add_malloc(200)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_history', source,
                                commands=['run',
                                          'heap sizes',
                                          'heap label foo',
                                          'cont',
                                          'heap log',
                                          'heap diff'])
        #print out
        # FIXME

    def assertHasRow(self, table, kvs):
        # table.find_row will raise a RowNotFound exception if there's a problem
        return table.find_row(kvs)

    def assertFoundCategory(self, table, domain, kind, detail=None):
        # Ensure that the result table has a row of the given category
        # (or raise RowNotFound)
        kvs = [('Domain', domain),
               ('Kind', kind)]
        if detail:
            kvs.append( ('Detail', detail) )
        self.assertHasRow(table, kvs)

    def test_assertions(self):
        # Ensure that the domain-specific assertions work
        tables = ParsedTable.parse_lines(test_table)
        self.assertEquals(len(tables), 2)
        pt = tables[0]
        self.assertHasRow(pt,
                          [('Domain', 'python'), ('Kind', 'str')])
        self.assertRaises(RowNotFound,
                          lambda: self.assertHasRow(pt, [('Domain', 'ruby')]))
        self.assertFoundCategory(pt, 'python', 'str')
        self.assertRaises(RowNotFound,
                          lambda: self.assertFoundCategory(pt, 'ruby', 'class'))

    def test_gobject(self):
        out = self.command_test(['gtk-demo'],
                                commands=['set breakpoint pending yes',
                                          'set environment G_SLICE=always-malloc', # for now
                                          'break gtk_main',
                                          'run',
                                          'heap',
                                          ])
        # print out
        tables = ParsedTable.parse_lines(out)
        heap_out = tables[0]

        # Ensure that instances of GObject classes are categorized:
        self.assertFoundCategory(heap_out, 'GType', 'GtkTreeView')
        self.assertFoundCategory(heap_out, 'GType', 'GtkLabel')

        # Ensure that instances of fundamental boxed types are categorized:
        self.assertFoundCategory(heap_out, 'GType', 'gchar')
        self.assertFoundCategory(heap_out, 'GType', 'guint')

        # Ensure that the code detected buffers used by the GLib/GTK types:
        self.assertFoundCategory(heap_out, 'GType', 'GdkPixbuf pixels', '107w x 140h')

        # GdkImage -> X11 Images -> data:
        self.assertFoundCategory(heap_out, 'GType', 'GdkImage')
        self.assertFoundCategory(heap_out, 'X11', 'Image')
        if False:
            # Only seen whilst using X forwarded over ssh:
            self.assertFoundCategory(heap_out, 'X11', 'Image data')
        # In both above rows, "Detail" contains the exact dimensions, but these
        # seem to vary with the resolution of the display the test is run
        # against

        # FreeType:
        # These seem to be highly dependent on the environment; I originally
        # developed this whilst using X forwarded over ssh
        if False:
            self.assertFoundCategory(heap_out, 'GType', 'PangoCairoFcFontMap')
            self.assertFoundCategory(heap_out, 'FreeType', 'Library')
            self.assertFoundCategory(heap_out, 'FreeType', 'raster_pool')

    def test_python2(self):
        self._impl_test_python('python2', py3k=False)

    def test_python3(self):
        self._impl_test_python('python3', py3k=True)

    def _impl_test_python(self, pyruntime, py3k):
        # Test that we can debug CPython's memory usage, for a given runtime

        # Invoke a test python script, stopping at a breakpoint
        out = self.command_test([pyruntime, 'object-sizes.py'],
                                commands=['set breakpoint pending yes',
                                          'break builtin_id',
                                          'run',
                                          'heap cpython-allocators',
                                          'heap',
                                          'heap select kind="PyListObject ob_item table"'],
                                breakpoint='builtin_id')
        # Re-enable this for debugging:
        # print out

        tables = ParsedTable.parse_lines(out)

        # Verify that "cpython-allocators" works:
        allocators_out = tables[0]
        self.assertEquals(allocators_out.colnames,
                          ('struct arena_object*', '256KB buffer location', 'Free pools'))
        # print allocators_out
        # self.assertHasRow(allocators_out,
        #                   kvs = [('Domain', 'cpython'),
        #                          ('Kind', 'PyListObject ob_item table')])

        heap_out = tables[1]

        # Verify that "select" works for a category that's only detectable
        # w.r.t. other categories:
        select_out = tables[2]
        # print select_out
        self.assertHasRow(select_out,
                          kvs = [('Domain', 'cpython'),
                                 ('Kind', 'PyListObject ob_item table')])

        # Ensure that the code detected instances of various python types we
        # expect to be present:
        for kind in ('str', 'list', 'tuple', 'dict', 'type', 'code',
                     'set', 'frozenset', 'function', 'module', 'frame',
                     ):
            self.assertFoundCategory(heap_out, 'python', kind)
        if py3k:
            self.assertFoundCategory(heap_out, 'python', 'bytes')
        else:
            self.assertFoundCategory(heap_out, 'python', 'unicode')

        # Ensure that the blocks of int allocations are detected:
        if not py3k:
            self.assertFoundCategory(heap_out, 'cpython', '_intblock', '')

        # Ensure that bytecode "strings" are marked as such:
        self.assertFoundCategory(heap_out, 'python', 'str', 'bytecode') # FIXME

        # Ensure that old-style classes are printed with a meaningful name
        # (i.e. not just "type"):
        if not py3k:
            for clsname in ('OldStyle', 'OldStyleManyAttribs'):
                self.assertFoundCategory(heap_out, 'python', clsname, 'old-style')
                # ...and that their instance dicts are marked:
                self.assertFoundCategory(heap_out, 'cpython', 'PyDictObject',
                                         '%s.__dict__' % clsname)
            # ...and that an old-style instance with enough attributes to require a
            # separate PyDictEntry buffer for its __dict__ has that buffer marked
            # with the typename:
            self.assertFoundCategory(heap_out, 'cpython', 'PyDictEntry table',
                                     'OldStyleManyAttribs.__dict__')

        # Likewise for new-style classes:
        for clsname in ('NewStyle', 'NewStyleManyAttribs'):
            self.assertHasRow(heap_out,
                              [('Domain', 'python'),
                               ('Kind', clsname),
                               ('Detail', None)])
            self.assertFoundCategory(heap_out, 'python', 'dict',
                                     '%s.__dict__' % clsname)
        self.assertFoundCategory(heap_out, 'cpython', 'PyDictEntry table',
                                 'NewStyleManyAttribs.__dict__')

        # Ensure that the code detected buffers used by python types:
        for kind in ('PyDictEntry table', 'PyListObject ob_item table',
                     'PySetObject setentry table', 'PyUnicodeObject buffer',
                     'PyDictEntry table'):
            self.assertFoundCategory(heap_out, 'cpython', kind)

        # and of other types:
        self.assertFoundCategory(heap_out, 'C', 'string data')
        self.assertFoundCategory(heap_out, 'pyarena', 'pool_header overhead')

        # Ensure that the "interned" table is identified (it's typically
        # at least 200k on a 64-bit build):
        self.assertHasRow(heap_out,
                          [('Domain', 'cpython'),
                           ('Kind', 'PyDictEntry table'),
                           ('Detail', 'interned'),
                           ('Count', 1)])

        # Ensure that we detect python sqlite3 objects:
        for kind in ('sqlite3.Connection', 'sqlite3.Statement', 'sqlite3.Cache'):
            self.assertFoundCategory(heap_out, 'python', kind)
        # ...and that we detect underlying sqlite3 buffers:
        for kind in ('sqlite3', 'sqlite3_stmt'):
            self.assertFoundCategory(heap_out, 'sqlite3', kind)

    def test_pypy(self):
        # Try to investigate memory usage of pypy-c
        # Developed using pypy-1.4.1 as packaged on Fedora.
        #
        # In order to get meaningful data, let's try to trap the exit point
        # of pypy-c within gdb.
        #
        # For now, lets try to put a breakpoint in this location within the
        # generated "pypy_g_entry_point" C function:
        #   print_stats:158 : debug_stop("jit-summary")
        out = self.command_test(['pypy', 'object-sizes.py'],
                                commands=['set breakpoint pending yes',
                                          'break pypy_debug_stop',
                                          'condition 1 0==strcmp(category, "jit-summary")',
                                          'run',
                                          'heap',
                                          ])
        tables = ParsedTable.parse_lines(out)
        select_out = tables[0]

    def test_select(self):
        # Ensure that "heap select" with no query does something sane
        src = TestSource()
        for i in range(3):
            src.add_malloc(1024)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_select', source,
                                commands=['run',
                                          'heap select',
                                          ])
        tables = ParsedTable.parse_lines(out)
        select_out = tables[0]
        # The "heap select" command should select all blocks:
        self.assertEquals(select_out.colnames,
                          ('Start', 'End', 'Domain', 'Kind', 'Detail', 'Hexdump'))
        self.assertEquals(len(select_out.rows), 3)

        # Test that syntax errors are well handled:
        out = self.program_test('test_select', source,
                                commands=['run',
                                          'heap select I AM A SYNTAX ERROR',
                                          ])
        errmsg = '''
Parse error at "AM":
I AM A SYNTAX ERROR
  ^^
'''
        if errmsg not in out:
            self.fail('Did not find expected "ParseError" message in:\n%s' % out)

        # Test that unknown attributes are well-handled:
        out = self.program_test('test_select', source,
                                commands=['run',
                                          'heap select NOT_AN_ATTRIBUTE > 42',
                                          ])
        errmsg = '''
Unknown attribute "NOT_AN_ATTRIBUTE" (supported are domain,kind,detail,addr,start,size) at "NOT_AN_ATTRIBUTE":
NOT_AN_ATTRIBUTE > 42
^^^^^^^^^^^^^^^^
'''
        if errmsg not in out:
            self.fail('Did not find expected "Unknown attribute" error message in:\n%s' % out)

        # Ensure that ply did not create debug files (ticket #12)
        for filename in ('parser.out', 'parsetab.py'):
            if os.path.exists(filename):
                self.fail('Unexpectedly found file %r' % filename)

    def test_select_by_size(self):
        src = TestSource()
        # Allocate ten 1kb blocks, nine 2kb blocks, etc, down to one 10kb
        # block, so that we can easily query them by size:
        for i in range(10):
            for j in range(10-i):
                size = 1024 * (i+1)
                src.add_malloc(size)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_select_by_size', source,
                                commands=['run',
                                          'heap',
                                          'heap select size >= 10240', # (parsed as "largest_out" below)
                                          'heap select size < 2048', # (parsed as "smallest_out" below)
                                          'heap select size >= 4096 and size < 8192', # (parsed as "middle_out" below)
                                          ])
        tables = ParsedTable.parse_lines(out)
        heap_out = tables[0]
        largest_out = tables[1]
        smallest_out = tables[2]
        middle_out = tables[3]

        # The "heap" command should find all the allocations:
        self.assertHasRow(heap_out,
                          [('Detail', 'TOTAL'), ('Count', 55)])

        # The query for the largest should find just one allocation:
        self.assertEquals(len(largest_out.rows), 1)

        # The query for the smallest should find ten allocations:
        self.assertEquals(len(smallest_out.rows), 10)

        # The middle query [4096, 8192) should capture the following
        # allocations:
        #   7 of (4*1024), 6 of (5*1024), 5 of (6*1024) and 4 of (7*1024),
        # giving a total count of 7+6+5+4 = 22
        self.assertEquals(len(middle_out.rows), 22)

    def test_select_by_category(self):
        out = self.command_test(['python', '-c', 'id(42)'],
                                commands=['set breakpoint pending yes',
                                          'break builtin_id',
                                          'run',
                                          'heap select domain="python" and kind="str" and size > 512'],
                                breakpoint='builtin_id')
        tables = ParsedTable.parse_lines(out)
        select_out = tables[0]
        # Ensure that the filtering mechanism worked:
        if len(select_out.rows) < 10:
            self.fail("Didn't find any large python strings"
                      " (has something gone wrong?) in: %s" % select_out)
        for row in select_out.rows:
            self.assertEquals(row[2], 'python')
            self.assertEquals(row[3], 'str')

    def test_heap_used(self):
        # Ensure that "heap used" works
        src = TestSource()
        for i in range(3):
            src.add_malloc(1024)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_heap_used', source,
                                commands=['run',
                                          'heap used',
                                          ])
        # FIXME: do some verification of the output

    def test_heap_all(self):
        # Ensure that "heap all" works
        src = TestSource()
        for i in range(3):
            src.add_malloc(1024)
        src.add_breakpoint()
        source = src.as_c_source()
        out = self.program_test('test_heap_all', source,
                                commands=['run',
                                          'heap all',
                                          ])
        # FIXME: do some verification of the output

from heap.parser import parse_query
from heap.query import Constant, And, Or, Not, GetAttr, \
    Comparison__le__, Comparison__lt__, Comparison__eq__, \
    Comparison__ne__, Comparison__ge__, Comparison__gt__

class QueryParsingTests(unittest.TestCase):
    def assertParsesTo(self, s, result):
        self.assertEquals(parse_query(s), result)

    def test_simple_comparisons(self):
        self.assertParsesTo('size >= 1024',
                            Comparison__ge__(GetAttr('size'), Constant(1024)))

        # Check that hexadecimal numeric literals are parsed:
        self.assertParsesTo('addr > 0xbf70ffff',
                            Comparison__gt__(GetAttr('addr'), Constant(0xbf70ffff)))

        # Check that string literals are parsed:
        self.assertParsesTo('kind == "str"',
                            Comparison__eq__(GetAttr('kind'), Constant('str')))

        # Check "and":
        self.assertParsesTo('kind == "str" and size > 1024',
                            And(Comparison__eq__(GetAttr('kind'), Constant('str')),
                                Comparison__gt__(GetAttr('size'), Constant(1024))))

        # Check "not":
        self.assertParsesTo('size > 10000 and not domain="uncategorized"',
                            And(Comparison__gt__(GetAttr('size'), Constant(10000)),
                                Not(Comparison__eq__(GetAttr('domain'),
                                                     Constant('uncategorized')))))

        # Do we want algebraic support?
        #self.assertParsesTo('size == (256 * 1024)+8',
        #                    Comparison('size', '==', 1024L))

if __name__ == "__main__":
    unittest.main()
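The core parsing idea in resultparser.py is that a dashed separator row (e.g. `----------  ----------  --------------`) encodes the column layout of the whole table: each run of dashes gives one column's offset and width, and every other row can then be sliced at those positions. A simplified standalone sketch of that heuristic (using a regex over dash runs rather than the `ColMetric`/`split`-based computation in `_find_separator_line`; the function names here are illustrative, not part of the repo's API):

```python
import re

def column_metrics(sep_line):
    """Derive (offset, width) column metrics from a dashed separator row,
    e.g. '----------  ----------  --------------'."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r'-+', sep_line)]

def split_cells(line, metrics):
    """Slice a table row into cells using the column metrics."""
    return tuple(line[off:off + width].strip() for off, width in metrics)

sep = '----------  ----------  --------------'
header = 'Chunk size  Num chunks  Allocated size'
# Build the data row by concatenation so the alignment is explicit:
row = '        16' + '  ' + '       100' + '  ' + '         1,600'

metrics = column_metrics(sep)
print(split_cells(header, metrics))  # ('Chunk size', 'Num chunks', 'Allocated size')
print(split_cells(row, metrics))     # ('16', '100', '1,600')
```

Slicing by fixed offsets (rather than splitting rows on whitespace) is what lets cells legitimately contain spaces, such as the `Detail` value `98312 bytes` in the test data above.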