Repository: lowleveldesign/debug-recipes
Branch: main
Commit: 2a6278c1c018
Files: 36
Total size: 320.4 KB

Directory structure:
gitextract_r48gq6i_/
├── .gitignore
├── 404.html
├── CNAME
├── Gemfile
├── LICENSE
├── README.md
├── _config.yml
├── _includes/
│   ├── footer.html
│   └── head.html
├── _layouts/
│   ├── home.html
│   └── posts.html
├── about.md
├── articles.md
├── assets/
│   ├── main.scss
│   └── other/
│       ├── EtwMetadata.ps1.txt
│       ├── WTComTrace.wprp
│       ├── winapi-user32.ps1.txt
│       └── windbg-install.ps1.txt
├── browserconfig.xml
├── guides/
│   ├── com-troubleshooting.md
│   ├── configuring-linux-for-effective-troubleshooting.md
│   ├── configuring-windows-for-effective-troubleshooting.md
│   ├── diagnosing-dotnet-apps.md
│   ├── diagnosing-native-windows-apps.md
│   ├── ebpf.md
│   ├── etw.md
│   ├── gdb.md
│   ├── linux-tracing.md
│   ├── network-tracing-tools.md
│   ├── using-withdll-and-detours-to-trace-winapi.md
│   ├── windbg.md
│   └── windows-performance-counters.md
├── guides.md
├── index.md
├── site.webmanifest
└── tools.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
_site
.sass-cache
.jekyll-cache
.jekyll-metadata
vendor
draft_*

================================================
FILE: 404.html
================================================
---
permalink: /404.html
layout: default
---

404

Page not found :(

The requested page could not be found.

================================================ FILE: CNAME ================================================ wtrace.net ================================================ FILE: Gemfile ================================================ source "https://rubygems.org" # Hello! This is where you manage which Jekyll version is used to run. # When you want to use a different version, change it below, save the # file and run `bundle install`. Run Jekyll with `bundle exec`, like so: # # bundle exec jekyll serve # # This will help ensure the proper Jekyll version is running. # Happy Jekylling! # gem "jekyll", "~> 4.2.0" # This is the default theme for new Jekyll sites. You may change this to anything you like. gem "minima", "~> 2.5" # gem "jekyll-theme-cayman", "~> 0.2.0" # If you want to use GitHub Pages, remove the "gem "jekyll"" above and # uncomment the line below. To upgrade, run `bundle update github-pages`. gem "github-pages", group: :jekyll_plugins # If you have any plugins, put them here! group :jekyll_plugins do gem "jekyll-feed", "~> 0.12" end # Windows and JRuby does not include zoneinfo files, so bundle the tzinfo-data gem # and associated library. platforms :mingw, :x64_mingw, :mswin, :jruby do gem "tzinfo", "~> 1.2" gem "tzinfo-data" end # Performance-booster for watching directories on Windows gem "wdm", "~> 0.1.1", :platforms => [:mingw, :x64_mingw, :mswin] gem "webrick", "~> 1.7" gem "json", "~> 2.7" ================================================ FILE: LICENSE ================================================ Attribution 4.0 International ======================================================================= Creative Commons Corporation ("Creative Commons") is not a law firm and does not provide legal services or legal advice. Distribution of Creative Commons public licenses does not create a lawyer-client or other relationship. Creative Commons makes its licenses and related information available on an "as-is" basis. 
Creative Commons gives no warranties regarding its licenses, any material licensed under their terms and conditions, or any related information. Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible. Using Creative Commons Public Licenses Creative Commons public licenses provide a standard set of terms and conditions that creators and other rights holders may use to share original works of authorship and other material subject to copyright and certain other rights specified in the public license below. The following considerations are for informational purposes only, are not exhaustive, and do not form part of our licenses. Considerations for licensors: Our public licenses are intended for use by those authorized to give the public permission to use material in ways otherwise restricted by copyright and certain other rights. Our licenses are irrevocable. Licensors should read and understand the terms and conditions of the license they choose before applying it. Licensors should also secure all rights necessary before applying our licenses so that the public can reuse the material as expected. Licensors should clearly mark any material not subject to the license. This includes other CC- licensed material, or material used under an exception or limitation to copyright. More considerations for licensors: wiki.creativecommons.org/Considerations_for_licensors Considerations for the public: By using one of our public licenses, a licensor grants the public permission to use the licensed material under specified terms and conditions. If the licensor's permission is not necessary for any reason--for example, because of any applicable exception or limitation to copyright--then that use is not regulated by the license. Our licenses grant only permissions under copyright and certain other rights that a licensor has authority to grant. 
Use of the licensed material may still be restricted for other reasons, including because others have copyright or other rights in the material. A licensor may make special requests, such as asking that all changes be marked or described. Although not required by our licenses, you are encouraged to respect those requests where reasonable. More considerations for the public: wiki.creativecommons.org/Considerations_for_licensees ======================================================================= Creative Commons Attribution 4.0 International Public License By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions. Section 1 -- Definitions. a. Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image. b. Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License. c. 
Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights. d. Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements. e. Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material. f. Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License. g. Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license. h. Licensor means the individual(s) or entity(ies) granting rights under this Public License. i. Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them. j. 
Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world. k. You means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning. Section 2 -- Scope. a. License grant. 1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to: a. reproduce and Share the Licensed Material, in whole or in part; and b. produce, reproduce, and Share Adapted Material. 2. Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions. 3. Term. The term of this Public License is specified in Section 6(a). 4. Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a) (4) never produces Adapted Material. 5. Downstream recipients. a. Offer from the Licensor -- Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License. b. 
No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material. 6. No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i). b. Other rights. 1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise. 2. Patent and trademark rights are not licensed under this Public License. 3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties. Section 3 -- License Conditions. Your exercise of the Licensed Rights is expressly made subject to the following conditions. a. Attribution. 1. If You Share the Licensed Material (including in modified form), You must: a. retain the following if it is supplied by the Licensor with the Licensed Material: i. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated); ii. a copyright notice; iii. 
a notice that refers to this Public License; iv. a notice that refers to the disclaimer of warranties; v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable; b. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and c. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License. 2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information. 3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable. 4. If You Share Adapted Material You produce, the Adapter's License You apply must not prevent recipients of the Adapted Material from complying with this Public License. Section 4 -- Sui Generis Database Rights. Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material: a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database; b. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and c. You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database. For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights. 
Section 5 -- Disclaimer of Warranties and Limitation of Liability. a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. Section 6 -- Term and Termination. a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically. b. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates: 1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or 2. 
upon express reinstatement by the Licensor. For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License. c. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License. d. Sections 1, 5, 6, 7, and 8 survive termination of this Public License. Section 7 -- Other Terms and Conditions. a. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed. b. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License. Section 8 -- Interpretation. a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License. b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions. c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor. d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority. ======================================================================= Creative Commons is not a party to its public licenses. 
Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the “Licensor.” The text of the Creative Commons public licenses is dedicated to the public domain under the CC0 Public Domain Dedication. Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at creativecommons.org/policies, Creative Commons does not authorize the use of the trademark "Creative Commons" or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses. Creative Commons may be contacted at creativecommons.org. ================================================ FILE: README.md ================================================ Debug Recipes ============= It is a repository of my field notes collected while debugging various .NET application problems on Windows (mainly) and Linux. They do not contain much theory but rather describe tools and scripts with some usage examples. :floppy_disk: Old and no longer updated recipes are in the [archived branch](https://github.com/lowleveldesign/debug-recipes/tree/archive). The recipes are available in the guides folder and at **[wtrace.net](https://wtrace.net/guides)** (probably the best way to view them). 
## Troubleshooting guides

- [Diagnosing .NET applications](guides/diagnosing-dotnet-apps.md)
- [Diagnosing native Windows applications](guides/diagnosing-native-windows-apps.md)
- [COM troubleshooting](guides/com-troubleshooting.md)

## Tools usage guides

- [WinDbg usage guide](guides/windbg.md)
- [Event Tracing for Windows (ETW)](guides/etw.md)
- [Using withdll and detours to trace Win API calls](guides/using-withdll-and-detours-to-trace-winapi.md)
- [Windows Performance Counters](guides/windows-performance-counters.md)
- [Network tracing tools](guides/network-tracing-tools.md)

================================================
FILE: _config.yml
================================================
title: wtrace.net
email: contact@wtrace.net
description: >- # this means to ignore newlines until "baseurl:"
  Tools and materials for software and system troubleshooting
baseurl: "" # the subpath of your site, e.g. /blog
url: "https://wtrace.net" # the base hostname & protocol for your site, e.g. http://example.com
youtube_username: "@lowleveldesign"
github_username: lowleveldesign

permalink: pretty

defaults:
  - scope:
      path: ""
      type: "posts"
    values:
      permalink: /:year/:month/:day/:title

# Build settings
theme: minima
plugins:
  - jekyll-feed
  - jekyll-seo-tag
  - jekyll-redirect-from
  - jekyll-sitemap
  - jemoji

header_pages:
  - guides.md
  - tools.md
  - about.md

================================================
FILE: _includes/footer.html
================================================

================================================
FILE: _includes/head.html
================================================
{%- seo -%}
{%- feed_meta -%}
{%- if jekyll.environment == 'production' and site.google_analytics -%}
{%- include google-analytics.html -%}
{%- endif -%}

================================================
FILE: _layouts/home.html
================================================
---
---
{%- include head.html -%}
{%- include header.html -%}
{%- if page.title -%}

{{ page.title }}

{% if page.description %}

{{ page.description }}

{% endif %}
{%- endif -%}
{{ content }}
{%- include footer.html -%} ================================================ FILE: _layouts/posts.html ================================================ --- layout: default ---
{%- if page.title -%}

{{ page.title }}

{%- endif -%} {%- if site.posts.size > 0 -%} {%- endif -%}
================================================ FILE: about.md ================================================ --- layout: page title: About --- I am **Sebastian Solnica**, a software engineer with more than 15 years of experience. My primary interests are debugging, profiling, and application security. I created this website to share tools and resources that can help you in your diagnostic endeavors. I also provide consulting services for troubleshooting .NET applications. If you would like to discuss consulting or contact me for any other reason, please use [the contact form on my blog](https://lowleveldesign.org/about/) or email me at contact@wtrace.net.

Credits: this site uses modified icons from the feather set.

The published guides are licensed under a Creative Commons Attribution 4.0 International License.

================================================ FILE: articles.md ================================================ --- layout: page title: Articles redirect_to: /guides --- ================================================ FILE: assets/main.scss ================================================ --- # Only the main Sass file needs front matter (the dashes are enough) --- $brand-color: #CA4E07; $credits-color: #707070; @import "minima"; body { background-color: #f6f6ef; } pre, code { background: transparent; } .highlighter-rouge .highlight { background: #f9f9f9; } .highlight .c { color: #6c6c62; } .post-title { @include relative-font-size(2.2); letter-spacing: -1px; line-height: 1; @include media-query($on-laptop) { @include relative-font-size(2.0); } } .post-content { table { table-layout: fixed; } table th { text-align: center; } table td { vertical-align: top; } h2, h3 { margin: 15px 0 15px 0; } } .site-title { @include relative-font-size(1.4); font-weight: 700; line-height: $base-line-height * $base-font-size * 2.25; letter-spacing: -1px; margin-bottom: 0; float: left; text-transform: uppercase; &, &:visited { color: $brand-color; } } .site-nav { .page-link { text-transform: uppercase; font-weight: 600; } } .feature-image { background-color: black; background-repeat: no-repeat; margin-bottom: 10px; padding-top: 50px; height: 300px; .wrapper { color: #ffffff; h1 { font-size: 4rem; font-weight: 900; margin-bottom: 0px } p { font-size: 1.2rem; } } } p.credits { color: $credits-color; padding-top: 10px; margin-top: 10px; } ================================================ FILE: assets/other/EtwMetadata.ps1.txt ================================================ $ErrorActionPreference = "Stop" $MetadataFolder = "$env:LOCALAPPDATA\MyEtwMetadata\ById" $MetadataSearchByNameFolder = "$env:LOCALAPPDATA\MyEtwMetadata\ByName" if (-not (Test-Path $MetadataFolder)) { New-Item -ItemType Directory -Path $MetadataFolder | Out-Null } if (-not (Test-Path $MetadataSearchByNameFolder)) { 
    New-Item -ItemType Directory -Path $MetadataSearchByNameFolder | Out-Null
}

# Replaces characters that are not allowed in file names with underscores
function _SanitizeFileName {
    param ([Parameter(Mandatory = $true)]$FileName)

    [System.IO.Path]::GetInvalidFileNameChars() | ForEach-Object -Process {
        $FileName = $FileName.Replace($_, [char]'_')
    }
    $FileName
}

Write-Output "Initializing ETW providers metadata... "

# Save the XML metadata of every registered provider (file name = provider GUID)
# and a name-to-GUID lookup file for each provider
wevtutil.exe ep | ForEach-Object -Process {
    $ProviderName = $_
    Write-Debug $ProviderName
    $xml = $(wevtutil.exe gp /f:xml "$_" 2>$null)
    if ($LASTEXITCODE -eq 0 -and $xml) {
        $metadata = [xml]$xml
        $metadata.Save($(Join-Path -Path $MetadataFolder -ChildPath "$($metadata.provider.guid).xml"));
        $metadata.provider.guid | Out-File $(
            Join-Path -Path $MetadataSearchByNameFolder -ChildPath "$(_SanitizeFileName $ProviderName).txt")
    } else {
        Write-Warning "Invalid metadata for '$ProviderName'"
    }
}

# Returns the provider keywords whose masks are set in $Keywords
function _ResolveKeywords {
    param (
        [Parameter(Mandatory = $true)]$Metadata,
        [Parameter(Mandatory = $true)][ulong]$Keywords
    )

    if ($Metadata.provider.keywords) {
        $Metadata.provider.keywords.keyword | ForEach-Object -Process {
            $MaskValue = [ulong]::Parse($_.mask.TrimStart(@('0', 'x', 'X')), [System.Globalization.NumberStyles]::HexNumber)
            if ($Keywords -band $MaskValue) {
                [PSCustomObject]@{
                    Name  = $_.name
                    Value = $MaskValue
                }
            }
        }
    }
}

# ** EXPORTS **

function Get-EtwProvidersFromWprProfile {
    param (
        [Parameter(Mandatory = $true)][string]$WprProfilePath
    )

    if (-not (Test-Path $MetadataFolder)) {
        Write-Error "No metadata found - please run Initialize-EtwProvidersMetadata first."
    }

    function ParseProvider([Parameter(ValueFromPipeline = $true, Mandatory = $true)]$ProviderData) {
        begin {}

        process {
            $MetadataPath = (Join-Path -Path $MetadataFolder -ChildPath "$($ProviderData.Name).xml")
            if (-not (Test-Path $MetadataPath)) {
                Write-Warning "No metadata found for provider '$($ProviderData.Name)'"
                return
            }

            Write-Debug "Parsing provider '$($ProviderData.Name)'"

            $Metadata = [xml](Get-Content $MetadataPath)

            # No Keywords element means that all keywords are enabled
            [ulong]$Keywords = 0
            if ($ProviderData.Keywords) {
                $ProviderData.Keywords.Keyword.Value | ForEach-Object -Process {
                    $Keywords = $Keywords -bor ([ulong]::Parse($_.TrimStart(@('0', 'x', 'X')), [System.Globalization.NumberStyles]::HexNumber))
                }
            } else {
                $Keywords = [ulong]::MaxValue
            }

            # The condition checks the same CaptureStateOnSave element that is read below
            [ulong]$CaptureOnSaveKeywords = 0
            if ($ProviderData.CaptureStateOnSave) {
                $ProviderData.CaptureStateOnSave.Keyword.Value | ForEach-Object -Process {
                    $CaptureOnSaveKeywords = $CaptureOnSaveKeywords -bor ([ulong]::Parse($_.TrimStart(@('0', 'x', 'X')), [System.Globalization.NumberStyles]::HexNumber))
                }
            }

            [PSCustomObject]@{
                Id                    = $ProviderData.Name
                Name                  = $Metadata.provider.name
                Keywords              = _ResolveKeywords $Metadata $Keywords
                CaptureOnSaveKeywords = _ResolveKeywords $Metadata $CaptureOnSaveKeywords
            }
        }

        end {}
    }

    $xml = [xml](Get-Content $WprProfilePath)

    $xml.WindowsPerformanceRecorder.Profiles.EventProvider | ParseProvider
}

function Get-EtwProviderMetadata {
    param([Parameter(ValueFromPipeline = $true, Mandatory = $true)]$ProviderName)

    # Accept either a provider name (resolved through the by-name folder) or a provider ID
    $ProviderId = $ProviderName
    $Path = $(Join-Path -Path $MetadataSearchByNameFolder -ChildPath "$(_SanitizeFileName $ProviderName).txt")
    if (Test-Path $Path) {
        $ProviderId = Get-Content $Path
    }

    $MetadataPath = (Join-Path -Path $MetadataFolder -ChildPath "$ProviderId.xml")
    if (-not (Test-Path $MetadataPath)) {
        Write-Error "No metadata found for provider '$($ProviderId)'"
    }

    $Metadata = [xml](Get-Content $MetadataPath)

    [PSCustomObject]@{
        Id       = $ProviderId
        Name     = $Metadata.provider.name
        Keywords = _ResolveKeywords $Metadata $([ulong]::MaxValue)
    }
}
================================================ FILE: assets/other/WTComTrace.wprp ================================================ ================================================ FILE: assets/other/winapi-user32.ps1.txt ================================================ $ErrorActionPreference = "Stop" Add-Type -TypeDefinition @" using System; public enum GWL_EXSTYLE : int { WS_EX_DLGMODALFRAME = 0x00000001, WS_EX_NOPARENTNOTIFY = 0x00000004, WS_EX_TOPMOST = 0x00000008, WS_EX_ACCEPTFILES = 0x00000010, WS_EX_TRANSPARENT = 0x00000020, WS_EX_MDICHILD = 0x00000040, WS_EX_TOOLWINDOW = 0x00000080, WS_EX_WINDOWEDGE = 0x00000100, WS_EX_CLIENTEDGE = 0x00000200, WS_EX_CONTEXTHELP = 0x00000400, WS_EX_RIGHT = 0x00001000, WS_EX_LEFT = 0x00000000, WS_EX_RTLREADING = 0x00002000, WS_EX_LTRREADING = 0x00000000, WS_EX_LEFTSCROLLBAR = 0x00004000, WS_EX_RIGHTSCROLLBAR = 0x00000000, WS_EX_CONTROLPARENT = 0x00010000, WS_EX_STATICEDGE = 0x00020000, WS_EX_APPWINDOW = 0x00040000, WS_EX_LAYERED = 0x00080000, WS_EX_NOINHERITLAYOUT = 0x00100000, WS_EX_NOREDIRECTIONBITMAP = 0x00200000, WS_EX_LAYOUTRTL = 0x00400000, WS_EX_COMPOSITED = 0x02000000, WS_EX_NOACTIVATE = 0x08000000 } public enum GWL_STYLE : int { WS_OVERLAPPED = 0x00000000, WS_POPUP = unchecked((int)0x80000000), WS_CHILD = 0x40000000, WS_MINIMIZE = 0x20000000, WS_VISIBLE = 0x10000000, WS_DISABLED = 0x08000000, WS_CLIPSIBLINGS = 0x04000000, WS_CLIPCHILDREN = 0x02000000, WS_MAXIMIZE = 0x01000000, WS_CAPTION = 0x00C00000, WS_BORDER = 0x00800000, WS_DLGFRAME = 0x00400000, WS_VSCROLL = 0x00200000, WS_HSCROLL = 0x00100000, WS_SYSMENU = 0x00080000, WS_THICKFRAME = 0x00040000, // WS_GROUP = 0x00020000, // WS_TABSTOP = 0x00010000, WS_MINIMIZEBOX = 0x00020000, WS_MAXIMIZEBOX = 0x00010000, // WS_TILED = WS_OVERLAPPED, // WS_ICONIC = WS_MINIMIZE, // WS_SIZEBOX = WS_THICKFRAME } public enum SWP : uint { SWP_NOSIZE = 0x0001, SWP_NOMOVE = 0x0002, SWP_NOZORDER = 0x0004, SWP_NOREDRAW = 0x0008, SWP_NOACTIVATE = 0x0010, SWP_FRAMECHANGED = 0x0020, 
    SWP_SHOWWINDOW = 0x0040,
    SWP_HIDEWINDOW = 0x0080,
    SWP_NOCOPYBITS = 0x0100,
    SWP_NOOWNERZORDER = 0x0200,
    SWP_NOSENDCHANGING = 0x0400,
    // SWP_DRAWFRAME = SWP_FRAMECHANGED,
    // SWP_NOREPOSITION = SWP_NOOWNERZORDER,
    SWP_DEFERERASE = 0x2000,
    SWP_ASYNCWINDOWPOS = 0x4000
}
"@

================================================
FILE: assets/other/windbg-install.ps1.txt
================================================
# Script created by @Izybkr (https://github.com/microsoftfeedback/WinDbg-Feedback/issues/19#issuecomment-1513926394),
# with my minor updates to make it work with the latest WinDbg releases.

param(
    $OutDir = ".",
    [ValidateSet("x64", "x86", "arm64")]
    $Arch = "x64"
)

if (!(Test-Path $OutDir)) {
    $null = mkdir $OutDir
}

$ErrorActionPreference = "Stop"

if ($PSVersionTable.PSVersion.Major -le 5) {
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

    # This is a workaround to get better performance on older versions of PowerShell
    $ProgressPreference = 'SilentlyContinue'
}

# Download the appinstaller to find the current uri for the msixbundle
Invoke-WebRequest https://aka.ms/windbg/download -OutFile $OutDir\windbg.appinstaller

# Read the msixbundle uri from the downloaded appinstaller
$msixBundleUri = ([xml](Get-Content $OutDir\windbg.appinstaller)).AppInstaller.MainBundle.Uri

# Download the msixbundle (but name as zip for older versions of Expand-Archive)
Invoke-WebRequest $msixBundleUri -OutFile $OutDir\windbg.zip

# Extract the 3 msix files (plus other files)
Expand-Archive -DestinationPath $OutDir\UnzippedBundle $OutDir\windbg.zip

# Expand the build you want - also renaming the msix to zip for Windows PowerShell
$fileName = switch ($Arch) {
    "x64" { "windbg_win-x64" }
    "x86" { "windbg_win-x86" }
    "arm64" { "windbg_win-arm64" }
}

# Rename msix (for older versions of Expand-Archive) and extract the debugger
Rename-Item "$OutDir\UnzippedBundle\$fileName.msix" "$fileName.zip"
Expand-Archive -DestinationPath "$OutDir\windbg" "$OutDir\UnzippedBundle\$fileName.zip"

# Clean up the temporary files
Remove-Item -Recurse -Force "$OutDir\UnzippedBundle"
Remove-Item -Force "$OutDir\windbg.appinstaller"
Remove-Item -Force "$OutDir\windbg.zip"

# Now you can run:
& $OutDir\windbg\DbgX.Shell.exe

================================================
FILE: browserconfig.xml
================================================
#da532c

================================================
FILE: guides/com-troubleshooting.md
================================================
---
layout: page
title: COM troubleshooting
date: 2023-04-07 08:00:00 +0200
redirect_from:
    - /articles/com-troubleshooting/
    - /articles/com-troubleshooting
---

{% raw %}

**Table of contents:**

- [Quick introduction to COM](#quick-introduction-to-com)
- [COM metadata](#com-metadata)
- [Troubleshooting COM in WinDbg](#troubleshooting-com-in-windbg)
    - [Monitoring COM objects in a process](#monitoring-com-objects-in-a-process)
    - [Tracing COM methods](#tracing-com-methods)
    - [Stopping the COM monitor](#stopping-the-com-monitor)
- [Observing COM interactions outside WinDbg](#observing-com-interactions-outside-windbg)
    - [Windows Performance Recorder \(wpr.exe\)](#windows-performance-recorder-wprexe)
    - [Process Monitor](#process-monitor)
    - [wtrace](#wtrace)
- [Troubleshooting .NET COM interop](#troubleshooting-net-com-interop)
- [Links](#links)

Quick introduction to COM
-------------------------

In COM, everything is about interfaces. In the old days, when various compiler vendors were fighting over whose "standard" was better, the only reliable way to call C++ class methods contained in third-party libraries was to use virtual tables. As its name suggests, a virtual table is a table or, to be precise, a table of addresses (pointers). The "virtual" adjective relates to the fact that our table's addresses point to virtual methods. If you're familiar with object-oriented programming (you plan to debug COM, so you should be!), you probably thought of inheritance and abstract classes. And that's correct!
The abstract class is how we implement interfaces in C++ (to be more precise, [an abstract class with pure virtual methods](https://en.cppreference.com/w/cpp/language/abstract_class)). Now, COM is all about passing pointers to those various virtual tables which happen to have GUID identifiers. The most important interface (parent of all interfaces) is `IUnknown`. Every COM interface must inherit from this interface. Why? For two reasons: to manage the object lifetime and to access all the other interfaces that our object may implement (or, in other words, to find all virtual tables our object is aware of). As this interface is so important, let's have a quick look at its definition: ```cpp struct __declspec(uuid("00000000-0000-0000-C000-000000000046")) IUnknown { public: virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, void **ppvObject) = 0; virtual ULONG STDMETHODCALLTYPE AddRef(void) = 0; virtual ULONG STDMETHODCALLTYPE Release(void) = 0; }; ``` Guess which methods are responsible for lifetime management and which are for interface querying. OK, so we know the declaration, but to debug COM, we need to understand how COM objects are laid out in memory. Let's have a look at a sample Probe class (the snippet comes from [my Protoss COM example repository](https://github.com/lowleveldesign/protoss-com-example)): ```cpp struct __declspec(uuid("59644217-3e52-4202-ba49-f473590cc61a")) IGameObject : public IUnknown { public: virtual HRESULT STDMETHODCALLTYPE get_Name(BSTR* name) = 0; virtual HRESULT STDMETHODCALLTYPE get_Minerals(LONG* minerals) = 0; virtual HRESULT STDMETHODCALLTYPE get_BuildTime(LONG* buildtime) = 0; }; struct __declspec(uuid("246A22D5-CF02-44B2-BF09-AAB95A34E0CF")) IProbe : public IUnknown { public: virtual HRESULT STDMETHODCALLTYPE ConstructBuilding(BSTR building_name, IUnknown **ppUnk) = 0; }; class __declspec(uuid("EFF8970E-C50F-45E0-9284-291CE5A6F771")) Probe final : public IProbe, public IGameObject { ULONG ref_count; /* ...
implementation .... */ }; ``` If we instantiate (more on that later) the Probe class, its layout in memory will look as follows: ``` 0:000> dps 0xfb2f58 L4 00fb2f58 72367744 protoss!Probe::`vftable' 00fb2f5c 7236775c protoss!Probe::`vftable' 00fb2f60 00000001 00fb2f64 fdfdfdfd 0:000> dps 72367744 L4 * IProbe interface 72367744 72341bb3 protoss!ILT+2990(?QueryInterfaceProbeUAGJABU_GUIDPAPAXZ) 72367748 72341ba9 protoss!ILT+2980(?AddRefProbeUAGKXZ) 7236774c 723411ae protoss!ILT+425(?ReleaseProbeUAGKXZ) 72367750 723414d3 protoss!ILT+1230(?ConstructBuildingProbeUAGJPA_WPAPAUIUnknownZ) 0:000> dps 7236775c L6 * IGameObject interface 7236775c 72341e3d protoss!ILT+3640(?QueryInterfaceProbeW3AGJABU_GUIDPAPAXZ) 72367760 723416fe protoss!ILT+1785(?AddRefProbeW3AGKXZ) 72367764 72341096 protoss!ILT+145(?ReleaseProbeW3AGKXZ) 72367768 723415f0 protoss!ILT+1515(?get_NameProbeUAGJPAPA_WZ) 7236776c 723419d8 protoss!ILT+2515(?get_MineralsProbeUAGJPAJZ) 72367770 72341e1a protoss!ILT+3605(?get_BuildTimeProbeUAGJPAJZ) ``` Notice the pointers at the beginning of the object memory. As you can see in the snippet, those pointers reference arrays of function pointers or, as you remember, virtual tables. Each virtual table represents a COM interface, like `IProbe` or `IGameObject` in our case. Let's now briefly discuss the creation of COM objects. We usually start by calling one of the well-known Co-functions to create a COM object. Often, it's either `CoCreateInstance` or `CoGetClassObject`. Those functions perform actions defined in the COM registration (either in a manifest file or in the registry). In the most common (and most straightforward) scenario, they load a DLL and run the exported `DllGetClassObject` function: ```cpp HRESULT DllGetClassObject([in] REFCLSID rclsid, [in] REFIID riid, [out] LPVOID *ppv); ``` On a successful return, the `*ppv` value should point to an address of the virtual table representing a COM interface with the IID equal to `riid`.
And this address will be a part of memory belonging to a COM object of the type identified by the `rclsid`. People often say that COM is complicated. As you just saw, COM fundamentals are clear and straightforward. However, its various implementations might cause a headache. For example, there are myriads of methods in OLE and ActiveX interfaces created to make it possible to drag/drop things between windows, use the clipboard, or embed one control in another. Remember, though, that all those crazy interfaces still need to implement `IUnknown`. And that's the advantage we can take as troubleshooters. It's easy to track new instance creations, interface queries, and interface method calls (often even with their names). That may give us enough insights to debug a problem successfully. ### COM metadata COM metadata, saved in type libraries, provides definitions of COM classes and COM interfaces. Thanks to it, we can decode method names and their argument values without debugging symbols. The tool we usually use to view the type libraries installed in the system is [OleView](https://learn.microsoft.com/en-us/windows/win32/com/ole-com-object-viewer), part of the Windows SDK. OleView has some open-source alternatives, such as [.NET OLE/COM viewer](https://github.com/tyranid/oleviewdotnet) or [OleWoo](https://github.com/leibnitz27/olewoo). [Comon](https://github.com/lowleveldesign/comon) also provides the **!cometa** command, which allows you to use COM metadata without leaving WinDbg. Before the debugging session, it is worth taking a moment to build the cometa database with the **!cometa index** command. The database resides in a temporary folder. It's an SQLite database, so you may copy it between machines. Other comon commands will use the cometa database to resolve class and interface IDs to meaningful names. 
You may also do some basic queries against the database with the **!cometa showc** and **!cometa showi** commands, for example: ``` 0:000> !cometa showi {59644217-3E52-4202-BA49-F473590CC61A} Found: {59644217-3E52-4202-BA49-F473590CC61A} (IGameObject) Methods: - [0] HRESULT QueryInterface(void* this, GUID* riid, void** ppvObject) - [1] ULONG AddRef(void* this) - [2] ULONG Release(void* this) - [3] HRESULT get_Name(void* this, BSTR* Name) - [4] HRESULT get_Minerals(void* this, long* Minerals) - [5] HRESULT get_BuildTime(void* this, long* BuildTime) Registered VTables for IID: - Module: protoss, CLSID: {EFF8970E-C50F-45E0-9284-291CE5A6F771} (Probe), VTable offset: 0x3775c - Module: protoss, CLSID: {F5353C58-CFD9-4204-8D92-D274C7578B53} (Nexus), VTable offset: 0x37710 ``` Troubleshooting COM in WinDbg ----------------------------- ### Monitoring COM objects in a process There are various ways in which COM objects can be created. When a given function creates a COM object, you will see a `void **` as one of its arguments. After a successful call, this pointer will point to a new COM object. Let's check how we can trace such a creation. We will use breakpoints to monitor calls to the `CoCreateInstance(REFCLSID rclsid, LPUNKNOWN pUnkOuter, DWORD dwClsContext, REFIID riid, LPVOID *ppv)` function. We are interested in the class (`rclsid`) and interface (`riid`) values, and the address of the created COM object (`*ppv`). When debugging a 64-bit process, our breakpoint command might look as follows: ``` bp combase!CoCreateInstance ".echo ==== combase!CoCreateInstance ====; dps @rsp L8; dx *(combase!GUID*)@rcx; dx *(combase!GUID*)@r9; .printf /D \"==> obj addr: %p\", poi(@rsp+28);.echo; bp /1 @$ra; g" ``` The `bp /1 @$ra` part creates a one-time breakpoint at a function return address. This second breakpoint will stop the process execution and allow us to examine the results of the function call. 
At that point, the `rax` register will hold the return code (it should be `0` for a successful call), and the created COM object (and thus the interface virtual table pointer) will be at the previously printed object address. For the sake of completeness, let me show you the 32-bit version of this breakpoint: ``` bp combase!CoCreateInstance ".echo ==== combase!CoCreateInstance ====; dps @esp L8; dx **(combase!GUID **)(@esp + 4); dx **(combase!GUID **)(@esp + 0x10); .printf /D \"==> obj addr: %p\", poi(@esp+14);.echo; bp /1 @$ra; g" ``` Creating such breakpoints for various COM functions might be a mundane task, especially when we consider that our only point in doing so is to save the addresses of the virtual tables. **Fortunately, [comon](https://github.com/lowleveldesign/comon) might be of help here**. In-process COM creation usually ends in a call to the `DllGetClassObject` function exported by the DLL implementing a given COM object. After **attaching to a process** (**!comon attach**), comon creates breakpoints on all such functions and checks the results of their executions. It also breaks when a process calls `CoRegisterClassObject`, a function called by out-of-process COM servers to register the COM objects they host. After you attach comon to a debugged process, you should see various log messages showing COM object creations, for example: ``` 0:000> !comon attach COM monitor enabled for the current process. 0:000> g ...
[comon] 0:000 [protoss!DllGetClassObject] CLSID: {EFF8970E-C50F-45E0-9284-291CE5A6F771} (Protoss Probe), IID: {00000001-0000-0000-C000-000000000046} (IClassFactory) -> SUCCESS (0x0) [comon] 0:000 [IClassFactory::CreateInstance] CLSID: {EFF8970E-C50F-45E0-9284-291CE5A6F771} (Protoss Probe), IID: {246A22D5-CF02-44B2-BF09-AAB95A34E0CF} (IProbe) -> SUCCESS (0x0) [comon] 0:000 [IUnknown::QueryInterface] CLSID: {EFF8970E-C50F-45E0-9284-291CE5A6F771} (Protoss Probe), IID: {59644217-3E52-4202-BA49-F473590CC61A} (IGameObject) -> SUCCESS (0x0) [comon] 0:000 [protoss!DllGetClassObject] CLSID: {F5353C58-CFD9-4204-8D92-D274C7578B53} (Protoss Nexus), IID: {00000001-0000-0000-C000-000000000046} (IClassFactory) -> SUCCESS (0x0) [comon] 0:000 [IClassFactory::CreateInstance] CLSID: {F5353C58-CFD9-4204-8D92-D274C7578B53} (Protoss Nexus), IID: {C5F45CBC-4439-418C-A9F9-05AC67525E43} (INexus) -> SUCCESS (0x0) [comon] 0:000 [IUnknown::QueryInterface] CLSID: {F5353C58-CFD9-4204-8D92-D274C7578B53} (Protoss Nexus), IID: {59644217-3E52-4202-BA49-F473590CC61A} (IGameObject) -> SUCCESS (0x0) ... ``` The `QueryInterface` calls will show up only for the first time; it won't be reported if we have the virtual table for a given interface already registered in the cometa database. 
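
The log above shows `QueryInterface` handing out a second interface (`IGameObject`) for an object that was first obtained through `IProbe`. As a rough, portable illustration of why a separate virtual table address is recorded per interface, the sketch below models the `Probe` class without any Windows headers. The `Iid` enum, the `*Like` interface names, the simplified signatures, and the `interface_pointer_distance` helper are assumptions made for this example; they are not part of the Protoss repository or of comon:

```cpp
// Portable model of a COM-style object with two interfaces (assumed names,
// simplified signatures; real COM uses GUIDs, HRESULT and STDMETHODCALLTYPE).
#include <cassert>

enum class Iid { Unknown, Probe, GameObject };  // stand-in for real IIDs

struct IUnknownLike {
    virtual long QueryInterface(Iid iid, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
};
struct IProbeLike : IUnknownLike {
    virtual long ConstructBuilding(const char* name) = 0;
};
struct IGameObjectLike : IUnknownLike {
    virtual long get_Minerals(long* minerals) = 0;
};

class Probe final : public IProbeLike, public IGameObjectLike {
    unsigned long ref_count = 1;  // the creator holds the first reference
public:
    long QueryInterface(Iid iid, void** ppv) override {
        if (iid == Iid::Unknown || iid == Iid::Probe) {
            *ppv = static_cast<IProbeLike*>(this);       // first vtable pointer
        } else if (iid == Iid::GameObject) {
            *ppv = static_cast<IGameObjectLike*>(this);  // second vtable pointer
        } else {
            *ppv = nullptr;
            return -1;                                   // E_NOINTERFACE stand-in
        }
        AddRef();                                        // every handed-out pointer is counted
        return 0;                                        // S_OK stand-in
    }
    unsigned long AddRef() override { return ++ref_count; }
    unsigned long Release() override {
        unsigned long left = --ref_count;
        if (left == 0) delete this;                      // last reference frees the object
        return left;
    }
    long ConstructBuilding(const char*) override { return 0; }
    long get_Minerals(long* minerals) override { *minerals = 50; return 0; }
};

// Returns the byte distance between the two interface pointers of one object.
// A non-zero result means each interface lives at its own address and
// therefore starts with its own virtual table pointer.
long interface_pointer_distance() {
    Probe* probe = new Probe();
    void* as_probe = nullptr;
    void* as_game_object = nullptr;
    probe->QueryInterface(Iid::Probe, &as_probe);
    probe->QueryInterface(Iid::GameObject, &as_game_object);
    assert(as_probe != nullptr && as_game_object != nullptr);

    long distance = static_cast<long>(
        static_cast<char*>(as_game_object) - static_cast<char*>(as_probe));

    // Drop the three references (creation plus two queries).
    static_cast<IProbeLike*>(as_probe)->Release();
    static_cast<IGameObjectLike*>(as_game_object)->Release();
    probe->Release();  // count reaches zero here and the object is deleted
    return distance;
}
```

With the usual compilers the returned distance typically equals the pointer size, which matches the two adjacent `vftable` pointers in the earlier `dps` dump (`00fb2f58` and `00fb2f5c`). The C++ standard does not guarantee the exact value, so treat it as an observation about common ABIs rather than a rule.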
To check the COM objects registered in a given session, run the **!comon status** command, for example: ``` 0:000> !comon status COM monitor is RUNNING COM types recorded for the current process: CLSID: {F5353C58-CFD9-4204-8D92-D274C7578B53} (Nexus) IID: {C5F45CBC-4439-418C-A9F9-05AC67525E43} (INexus), address: 0x723676f8 IID: {00000001-0000-0000-C000-000000000046} (N/A), address: 0x7236694c IID: {59644217-3E52-4202-BA49-F473590CC61A} (IGameObject), address: 0x72367710 CLSID: {EFF8970E-C50F-45E0-9284-291CE5A6F771} (Probe) IID: {00000001-0000-0000-C000-000000000046} (N/A), address: 0x72366968 IID: {59644217-3E52-4202-BA49-F473590CC61A} (IGameObject), address: 0x7236775c IID: {246A22D5-CF02-44B2-BF09-AAB95A34E0CF} (IProbe), address: 0x72367744 ``` The `cometa` show queries now also return information about the registered virtual tables: ``` 0:000> !cometa showc {F5353C58-CFD9-4204-8D92-D274C7578B53} Found: {F5353C58-CFD9-4204-8D92-D274C7578B53} (Nexus) Registered VTables for CLSID: - module: protoss, IID: {00000001-0000-0000-C000-000000000046} (N/A), VTable offset: 0x3694c - module: protoss, IID: {59644217-3E52-4202-BA49-F473590CC61A} (IGameObject), VTable offset: 0x37710 - module: protoss, IID: {C5F45CBC-4439-418C-A9F9-05AC67525E43} (INexus), VTable offset: 0x376f8 ``` ### Tracing COM methods When we know the interface virtual table address, nothing can stop us from creating breakpoints on interface methods :) I will first show you how to do that manually and later present how [comon](https://github.com/lowleveldesign/comon) may help. The first step is to find the offset of our method in the interface definition.
Let's stick to the Protoss COM example and create a breakpoint on the `get_Minerals` method/property from the `IGameObject` interface: ``` 0:000> !cometa showi {59644217-3E52-4202-BA49-F473590CC61A} Found: {59644217-3E52-4202-BA49-F473590CC61A} (IGameObject) Methods: - [0] HRESULT QueryInterface(void* this, GUID* riid, void** ppvObject) - [1] ULONG AddRef(void* this) - [2] ULONG Release(void* this) - [3] HRESULT get_Name(void* this, BSTR* Name) - [4] HRESULT get_Minerals(void* this, long* Minerals) - [5] HRESULT get_BuildTime(void* this, long* BuildTime) Registered VTables for IID: - Module: protoss, CLSID: {EFF8970E-C50F-45E0-9284-291CE5A6F771} (Probe), VTable offset: 0x3775c - Module: protoss, CLSID: {F5353C58-CFD9-4204-8D92-D274C7578B53} (Nexus), VTable offset: 0x37710 ``` We can see that its ordinal number is four, and two virtual tables are registered for our interface (two classes implement it). Let's focus on the `Probe` class. To set a breakpoint on the method, we can use the `bp` command: ``` bp poi(protoss + 0x3775c + 4 * @$ptrsize) ``` Similarly, if we would like to set breakpoints on all the `IGameObject` methods, we might use a loop: ``` .for (r $t0 = 0; @$t0 < 6; r $t0 = @$t0 + 1) { bp poi(protoss + 0x3775c + @$t0 * @$ptrsize) } ``` Instead of setting breakpoints manually, you may use the **!cobp** command from the comon extension. It also creates a breakpoint (you will see it if you run the `bl` command), but on hit, comon will decode the method parameters (for the supported types). It will also automatically create a one-time breakpoint on the method return address, displaying the return code and method out parameter values. The optional parameter lets you decide if you'd like to stop when the cobreakpoint is hit.
An example output might look as follows: ``` 0:000> !cobp --always {EFF8970E-C50F-45E0-9284-291CE5A6F771} {59644217-3E52-4202-BA49-F473590CC61A} get_Name [comon] Breakpoint 18 (address 0x723415f0) created / updated 0:000> g [comon breakpoint] IGameObject::get_Name (iid: {59644217-3E52-4202-BA49-F473590CC61A}, clsid: {EFF8970E-C50F-45E0-9284-291CE5A6F771}) Parameters: - this: 0xfb2f5c (void*) - Name: 0x81fc1c (BSTR*) [out] 0:000> dps 0081fc1c L1 0081fc1c 00000000 0:000> g [comon breakpoint] IGameObject::get_Name (iid: {59644217-3E52-4202-BA49-F473590CC61A}, clsid: {EFF8970E-C50F-45E0-9284-291CE5A6F771}) return Result: 0x0 (HRESULT) Out parameters: - Name: 0x81fc1c (BSTR*) 0:000> du 00f9c6ac 00f9c6ac "Probe" ``` If comon can't decode a given parameter, you may use the **dx** command with combase.dll symbols (one of the rare Microsoft DLLs that comes with private symbols), for example: `dx -r2 (combase!DISPPARAMS *)(*(void **)(@esp+0x18))` or `dx -r1 ((combase!tagVARIANT[3])0x31ec1f0)`. ### Stopping the COM monitor Run the **!comon detach** command to stop the COM monitor. This command will remove all the comon breakpoints and debugging session data, but you can still examine COM metadata with the cometa command. Observing COM interactions outside WinDbg ----------------------------------------- Sometimes we only need basic information about COM interactions, such as which objects are used and how they are launched. While WinDbg can be overkill for such scenarios, there are several simpler tools we can use to collect this additional information. ### Windows Performance Recorder (wpr.exe) Let's begin with wpr.exe, a powerful tool that's likely already installed on your system. WPR requires profile files to configure tracing sessions. 
For basic COM event collection, you can use [the ComTrace.wprp profile](https://raw.githubusercontent.com/microsoft/winget-cli/refs/heads/master/tools/COMTrace/ComTrace.wprp) from [the winget-cli repository](https://github.com/microsoft/winget-cli). I've also created an enhanced profile, adding providers found in the [TSS scripts](https://learn.microsoft.com/en-us/troubleshoot/windows-client/windows-tss/introduction-to-troubleshootingscript-toolset-tss), which you can download **[here](/assets/other/WTComTrace.wprp)**. You can use those profiles either on their own or in combination with other profiles, as shown in the examples below. ```shell # Collect only COM events wpr.exe -start .\WTComTrace.wprp -filemode # Run COM apps ... # Stop the trace when done wpr -stop C:\temp\comtrace.etl # Collect COM events with CPU sampling wpr.exe -start CPU -start .\WTComTrace.wprp -filemode # Run COM apps ... # Stop the trace when done wpr -stop C:\temp\comtrace.etl ``` Some providers are the [legacy WPP providers](https://learn.microsoft.com/en-us/windows-hardware/drivers/devtest/wpp-software-tracing), which require TMF files to read the collected events. Fortunately, the PDB file for combase.dll contains the required TMF data, so we can decode those events. To view the collected data, open the ETL file in **[Windows Performance Analyzer (WPA)](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-analyzer)**. Remember to load symbols first (see [the Windows configuration guide](guides/configuring-windows-for-effective-troubleshooting/#configuring-debug-symbols) for how to configure symbols globally in the system), then navigate to the **Generic Events** category and open the **WPP Trace** view.
### Process Monitor In **[Process Monitor](https://learn.microsoft.com/en-us/sysinternals/downloads/procmon)**, we can include Registry and Process events and events where Path contains `\CLSID\` or `\AppID` strings or ends with `.dll`, as in the image below: ![](/assets/img/procmon-filters.png) The collected events should tell us which COM objects the application instantiated and in which way. For example, if procmon shows a DLL path read from the `InprocServer32` key and then we see this DLL loaded, we may assume that the application created a given COM object (the event call stack may serve as additional proof). If the COM server runs in a standalone process or on a remote machine, other keys will be queried. We may then check the Process Tree or Network events for more details. [COM registry keys official documentation](https://learn.microsoft.com/en-us/windows/win32/com/com-registry-keys) is thorough, so please consult it to learn more. ### wtrace In **[wtrace](https://github.com/lowleveldesign/wtrace)**, we need to pick the proper handlers and define filters. An example command line might look as follows: ```shell wtrace --handlers registry,process,rpc -f 'path ~ \CLSID\' -f 'path ~ \AppID\' -f 'path ~ rpc' -f 'pname = ProtossComClient' ``` As you can see, wtrace may additionally show information about RPC (Remote Procedure Call) events. Troubleshooting .NET COM interop -------------------------------- A native COM object must be wrapped into a Runtime Callable Wrapper (RCW) to be accessible to managed code. RCW binds a managed object (for example, `System.__ComObject`) and a native COM class instance. COM Callable Wrappers (CCW) work in the opposite direction - thanks to them, we may expose .NET objects to the COM world. Interestingly, information about an object's interop usage is stored in the object's SyncBlock.
Therefore, it should not come as a surprise that the **!syncblk** command from [the SOS extension](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/sos-debugging-extension) presents information about RCWs and CCWs: ``` 0:011> !syncblk Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner ----------------------------- Total 5 CCW 1 RCW 0 ComClassFactory 0 Free 3 ``` When we add the **-all** parameter, **!syncblk** will list information about the created SyncBlocks with their corresponding objects, for example: ``` 0:007> !syncblk -all Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner 1 07FF8F54 0 0 00000000 none 030deb48 System.__ComObject 2 07FF8F20 0 0 00000000 none 030deb3c EventTesting 3 00000000 0 0 00000000 none 0 Free 4 00000000 0 0 00000000 none 0 Free 5 00000000 0 0 00000000 none 0 Free ----------------------------- Total 5 CCW 1 RCW 0 ComClassFactory 0 Free 3 ``` Now, we can dump information about managed objects using the **!dumpobj** command, for example: ``` 0:006> !dumpobj 030deb3c Name: EventTesting MethodTable: 08301668 EEClass: 082f7110 CCW: 0833ffe0 Tracked Type: false Size: 12(0xc) bytes File: c:\repos\testing-com-events\bin\NETServer.dll Fields: MT Field Offset Type VT Attr Value Name 0830db50 4000003 4 ...ng+OnEventHandler 0 instance 00000000 onEvent ``` The good news is that the **!dumpobj** command also checks if a given object has a SyncBlock assigned and dumps information from it. In this case, it's the address of CCW. We may get more details about it by using the **!dumpccw** command: ``` 0:011> !dumpccw 08060000 Managed object: 02e6cf88 Outer IUnknown: 00000000 Ref count: 0 Flags: RefCounted Handle: 00D714F8 (WEAK) COM interface pointers: IP MT Type 08060010 080315b0 Server.Contract.IEventTesting ``` Notice here that there is only one interface implemented by the managed object and the CCW is no longer in use by the native code (Ref count equals 0).
Below is an example of a CCW representing a Windows Forms ActiveX control which is still alive and implements more interfaces: ``` 0:014> !dumpccw 0a23fde0 Managed object: 04ee6984 Outer IUnknown: 00000000 Ref count: 7 Flags: RefCounted Handle: 04C716D8 (STRONG) COM interface pointers: IP MT Type 0A23FDF8 09fbbb04 Interop+Ole32+IOleControl 0A23FDC8 09fbbc4c Interop+Ole32+IOleObject 0A23FDCC 09fbbd34 Interop+Ole32+IOleInPlaceObject 0A23FDD0 09fbbde4 Interop+Ole32+IOleInPlaceActiveObject 0A23FDA8 09fbbfa0 Interop+Ole32+IViewObject2 0A23FDB0 09fbc09c Interop+Ole32+IPersistStreamInit 0A23FD4C 09f6485c BullsEyeControlLib.IBullsEye ``` If you would like to dump information about all objects associated with SyncBlocks, you may use the following WinDbg script: ``` .foreach /pS 7 /ps 7 (addr { !syncblk -all }) { !do addr } ``` And to extract only the RCW or CCW addresses, we could use the **!grep** command from [Andrew Richards's awesome PDE extension](https://onedrive.live.com/?authkey=%21AJeSzeiu8SQ7T4w&id=DAE128BD454CF957%217152&cid=DAE128BD454CF957): ``` 0:014> .load PDE.dll 0:014> !grep RCW: .foreach /pS 7 /ps 7 (addr { !syncblk -all }) { !do addr } RCW: 08086d30 0:014> !grep CCW: .foreach /pS 7 /ps 7 (addr { !syncblk -all }) { !do addr } CCW: 08060000 ``` To keep COM objects alive in the managed memory, the .NET Runtime creates handles for them. Those are either strong or ref-counted handles, and we may list them with the **!gchandles** command, for example: ``` 0:011> !gchandles -type refcounted Handle Type Object Size Data Type 00D714F8 RefCounted 02e6cf88 12 0 EventTesting Statistics: MT Count TotalSize Class Name 08031668 1 12 EventTesting Total 1 objects 0:014> !gchandles -type strong Handle Type Object Size Data Type 04C711B4 Strong 030deb48 12 System.__ComObject ... Statistics: MT Count TotalSize Class Name 04ebbf00 1 12 System.__ComObject ...
Total 19 objects ``` Of course, in those lists we will find the objects we already saw in the **!syncblk** output, so it's just another way to find them. It may be useful when tracking, for example, GC leaks. Finally, to find who is keeping our managed object alive, we could use the **!gcroot** command. And it's quite easy to find the GC roots for a particular type with the following script: ``` .foreach (addr { !DumpHeap -short -type System.__ComObject }) { !gcroot addr } ``` Links ----- - ["Essential COM"](https://archive.org/details/essentialcom00boxd) by Don Box - ["Inside OLE"](https://github.com/kraigb/InsideOLE) by Kraig Brockschmidt (Kraig published the whole book with source code on GitHub!) - ["Inside COM+ Base Services"](https://thrysoee.dk/InsideCOM+/) by Guy Eddon and Henry Eddon - ["COM and .NET interoperability"](https://link.springer.com/book/10.1007/978-1-4302-0824-2) and [source code](https://github.com/Apress/com-.net-interoperability) by Andrew Troelsen - [".NET and COM: The Complete Interoperability Guide"](https://books.google.pl/books/about/NET_and_COM.html?id=x2OIPSyFLBcC) by Adam Nathan - [COM+ revisited](https://lowleveldesign.wordpress.com/2022/01/17/com-revisited/) by me :) - [Calling Local Windows RPC Servers from .NET](https://googleprojectzero.blogspot.com/2019/12/calling-local-windows-rpc-servers-from.html) by James Forshaw {% endraw %} ================================================ FILE: guides/configuring-linux-for-effective-troubleshooting.md ================================================ --- layout: page title: Configuring Linux for effective troubleshooting date: 2025-12-26 08:00:00 +0200 --- **Table of contents:** - [Configuring debug symbols](#configuring-debug-symbols) Configuring debug symbols ------------------------- These days many debugging tools can fetch debug symbols from debuginfod servers. 
The [official project page](https://sourceware.org/elfutils/Debuginfod.html) lists the URLs you should use for each supported distribution. For example, in my Arch Linux, the `DEBUGINFOD_URLS` environment variable is set to `https://debuginfod.archlinux.org` by the `/etc/profile.d/debuginfod.sh` script (a part of the libelf package). If you want this variable to be preserved when running commands with sudo, you can add a rule such as the following to a file in `/etc/sudoers.d/` (e.g., `/etc/sudoers.d/debuginfod`): ``` Defaults env_keep += "DEBUGINFOD_URLS" ``` ================================================ FILE: guides/configuring-windows-for-effective-troubleshooting.md ================================================ --- layout: page title: Configuring Windows for effective troubleshooting date: 2023-10-11 08:00:00 +0200 --- **Table of contents:** - [Configuring debug symbols](#configuring-debug-symbols) - [Replacing Task Manager with System Informer](#replacing-task-manager-with-system-informer) - [Installing and configuring Sysinternals Suite](#installing-and-configuring-sysinternals-suite) - [Configuring post-mortem debugging](#configuring-post-mortem-debugging) ## Configuring debug symbols Staring at raw hex numbers is not very helpful for troubleshooting. Therefore, it's essential to take the time to properly configure debug symbols on our system. One effective method is to set the **\_NT\_SYMBOL\_PATH** environment variable. Most troubleshooting tools read its value and utilize the specified symbol stores. I usually configure it to point only to the official Microsoft symbol server, resulting in the following value for the \_NT\_SYMBOL\_PATH variable on my system: `SRV*C:\symbols\dbg*https://msdl.microsoft.com/download/symbols`. Here, `C:\symbols` serves as a cache folder for storing downloaded symbols. I also use `C:\symbols\dbg` if I need to index PDB files for my applications. 
For further information about the \_NT\_SYMBOL\_PATH variable, refer to [the official documentation](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/symbol-path). The symbol path variable is one essential component required for successful symbol resolution. Another critical aspect is the version of **dbghelp.dll** that can work with symbol servers. Unfortunately, the version preinstalled with Windows lacks this feature. To overcome this issue, you can install the **Debugging Tools for Windows** from the [Windows SDK](https://developer.microsoft.com/en-us/windows/downloads/windows-sdk/). Make sure to install both the x86 and x64 versions to enable debugging of both 32- and 64-bit applications. Once installed, certain tools (e.g., System Informer) will automatically select the appropriate dbghelp.dll version, while others will require some configuration, as we'll explore in later sections. ## Replacing Task Manager with System Informer My long-time favorite tool for observing the system and the processes running on it is [System Informer](https://www.systeminformer.com/), formerly known as Process Hacker. It has so many great features that it deserves a guide of its own. The process tree, which shows the process creation and termination events, is much more readable than the flat process list in Task Manager or Resource Monitor. Moreover, System Informer lets you manage services and drivers, and view live network connections. Therefore, I highly recommend opening the Options dialog and replacing Task Manager with it. System Informer does not have an option to set the dbghelp.dll path in its settings, but it will detect it if you have Debugging Tools for Windows installed. So please install them to have Windows stacks correctly resolved. If you have reasons not to use System Informer, you can try [Process Explorer](https://learn.microsoft.com/en-us/sysinternals/downloads/process-explorer).
It does not have as many functionalities as System Informer, but it is still a powerful system monitor. ## Installing and configuring Sysinternals Suite [Sysinternals tools](https://learn.microsoft.com/en-us/sysinternals/) help me diagnose and fix various issues on Windows systems. Most often I use [Process Monitor](https://learn.microsoft.com/en-us/sysinternals/downloads/procmon) to capture and analyze system events, and sometimes that's the only tool I need to solve the problem! Other Sysinternals tools that I frequently use are [DebugView](https://learn.microsoft.com/en-us/sysinternals/downloads/debugview), [ProcDump](https://learn.microsoft.com/en-us/sysinternals/downloads/procdump), and [LiveKd](https://learn.microsoft.com/en-us/sysinternals/downloads/livekd). You can get the entire suite or individual tools from the [SysInternals website](https://learn.microsoft.com/en-us/sysinternals/downloads/) or from [live.sysinternals.com](https://live.sysinternals.com). However, these methods require manual updates when new versions are available. A more convenient way to keep the tools up to date is to install them from [Microsoft Store](https://www.microsoft.com/store/apps/9p7knl5rwt25). To get the most out of Process Monitor and Process Explorer, you need to set up symbol resolution correctly. 
The default settings do not use the Microsoft symbol store, so you need to adjust them in the options or import the registry keys shown below (after installing Debugging Tools for Windows): ``` [HKEY_CURRENT_USER\Software\Sysinternals\Process Explorer] "DbgHelpPath"="C:\\Program Files (x86)\\Windows Kits\\10\\Debuggers\\x64\\dbghelp.dll" "SymbolPath"="SRV*C:\\symbols\\dbg*http://msdl.microsoft.com/download/symbols" [HKEY_CURRENT_USER\Software\Sysinternals\Process Monitor] "DbgHelpPath"="C:\\Program Files (x86)\\Windows Kits\\10\\Debuggers\\x64\\dbghelp.dll" "SymbolPath"="SRV*C:\\symbols\\dbg*http://msdl.microsoft.com/download/symbols" ``` ## Configuring post-mortem debugging We all experience application failures from time to time. When it happens, Windows collects some data about the crash and saves it to the event log. It usually lacks the details required to fully understand the root cause of an issue. Fortunately, we have options to replace this scarce report with, for example, a memory dump. One way to accomplish that is by configuring **Windows Error Reporting**. The commands below will enable minidump collection to the C:\Dumps folder on a process failure: ```shell reg.exe add "HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps" /v DumpType /t REG_DWORD /d 1 /f reg.exe add "HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps" /v DumpFolder /t REG_EXPAND_SZ /d C:\dumps /f ``` The available settings are listed and explained in the [WER documentation](https://learn.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps). Note that by creating a subkey with an application name (for example, `HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\test.exe`), you may customize WER settings per application. [ProcDump](https://learn.microsoft.com/en-us/sysinternals/downloads/procdump) is an alternative to WER.
You could install it as an [automatic debugger](https://learn.microsoft.com/en-us/windows/win32/debug/configuring-automatic-debugging), which Windows will run whenever a critical error occurs in an application. Example install command (-u to uninstall): ```shell procdump -i C:\Dumps ``` These dumps can take up a lot of disk space over time, so you should either delete the old files periodically, or set up a task scheduler job that does it for you. ================================================ FILE: guides/diagnosing-dotnet-apps.md ================================================ --- layout: page title: Diagnosing .NET applications date: 2024-01-01 08:00:00 +0200 --- {% raw %} :point_right: I also authored the **[.NET Diagnostics Expert](https://diagnosticsexpert.com/?utm_source=debugrecipes&utm_medium=banner&utm_campaign=general) course**, available at Dotnetos :hot_pepper: Academy. Apart from the theory, it contains lots of demos and troubleshooting guidelines. Check it out if you're interested in learning .NET troubleshooting. 
:point_left: **Table of contents:** - [General .NET debugging tips](#general-net-debugging-tips) - [Loading the SOS extension into WinDbg](#loading-the-sos-extension-into-windbg) - [Manually loading symbol files for .NET Core](#manually-loading-symbol-files-for-net-core) - [Disabling JIT optimization](#disabling-jit-optimization) - [Decoding managed stacks in Sysinternals](#decoding-managed-stacks-in-sysinternals) - [Check runtime version](#check-runtime-version) - [Debugging/tracing a containerized .NET application \(Docker\)](#debuggingtracing-a-containerized-net-application-docker) - [Diagnosing exceptions or erroneous behavior](#diagnosing-exceptions-or-erroneous-behavior) - [Using Time Travel Debugging \(TTD\)](#using-time-travel-debugging-ttd) - [Collecting a memory dump](#collecting-a-memory-dump) - [Analysing exception information](#analysing-exception-information) - [Diagnosing hangs](#diagnosing-hangs) - [Listing threads call stacks](#listing-threads-call-stacks) - [Finding locks in managed code](#finding-locks-in-managed-code) - [Diagnosing waits or high CPU usage](#diagnosing-waits-or-high-cpu-usage) - [Diagnosing managed memory leaks](#diagnosing-managed-memory-leaks) - [Collecting memory snapshots](#collecting-memory-snapshots) - [Analyzing collected snapshots](#analyzing-collected-snapshots) - [Diagnosing issues with assembly loading](#diagnosing-issues-with-assembly-loading) - [Troubleshooting loading with EventPipes/ETW \(.NET\)](#troubleshooting-loading-with-eventpipesetw-net) - [Troubleshooting loading using ETW \(.NET Framework\)](#troubleshooting-loading-using-etw-net-framework) - [Troubleshooting loading using Fusion log \(.NET Framework\)](#troubleshooting-loading-using-fusion-log-net-framework) - [GAC \(.NET Framework\)](#gac-net-framework) - [Find assembly in cache](#find-assembly-in-cache) - [Uninstall assembly from cache](#uninstall-assembly-from-cache) - [Diagnosing network connectivity issues](#diagnosing-network-connectivity-issues) - 
[.NET Core](#net-core) - [.NET Framework](#net-framework) - [ASP.NET Core](#aspnet-core) - [Collecting ASP.NET Core logs](#collecting-aspnet-core-logs) - [ILogger logs](#ilogger-logs) - [DiagnosticSource logs](#diagnosticsource-logs) - [Collecting ASP.NET Core performance counters](#collecting-aspnet-core-performance-counters) - [ASP.NET \(.NET Framework\)](#aspnet-net-framework) - [Examining ASP.NET process memory \(and dumps\)](#examining-aspnet-process-memory-and-dumps) - [Profiling ASP.NET](#profiling-aspnet) - [Application instrumentation](#application-instrumentation) - [ASP.NET ETW providers](#aspnet-etw-providers) - [Collect events using the Perfecto tool](#collect-events-using-the-perfecto-tool) - [Collect events using FREB](#collect-events-using-freb) ## General .NET debugging tips ### Loading the SOS extension into WinDbg When debugging a **.NET Framework application**, WinDbgX should automatically find the correct version of SOS.dll. If it fails to do so and your .NET Framework version matches the one of the target app, use the following command: ``` .loadby sos mscorwks (.NET 2.0/3.5) .loadby sos clr (.NET 4.0+) ``` For **.NET Core**, you need to download and install the **dotnet-sos** tool. The install command tells you how to load SOS into WinDbg, for example: ``` > dotnet tool install -g dotnet-sos ... > dotnet sos install ... Execute '.load C:\Users\me\.dotnet\sos\sos.dll' to load SOS in your Windows debugger. Cleaning up... SOS install succeeded ``` SOS commands sometimes get overridden by other extensions' help files. In such a case, use the **!sos.help \[cmd\]** command, for example, `!sos.help !savemodule`. ### Manually loading symbol files for .NET Core I noticed that sometimes the Microsoft public symbol servers do not have symbols for .NET Core DLLs. Without them, WinDbg cannot decode native .NET stacks.
Fortunately, we may solve this problem by precaching symbol files using the [dotnet-symbol](https://github.com/dotnet/symstore/tree/master/src/dotnet-symbol) tool. Assuming we set our `_NT_SYMBOL_PATH` to `SRV*C:\symbols\dbg*https://msdl.microsoft.com/download/symbols`, we need to run dotnet-symbol with the **--cache-directory** parameter pointing to our symbol cache folder (for example, `C:\symbols\dbg`): ``` dotnet-symbol --recurse-subdirectories --cache-directory c:\symbols\dbg -o C:\temp\toremove "C:\Program Files\dotnet\shared\Microsoft.NETCore.App\3.0.0\*" ``` We may later remove the `C:\temp\toremove` folder as all PDB files are indexed in the cache directory. The output folder contains both DLL and PDB files, takes lots of space, and is often not required. ### Disabling JIT optimization For **.NET Core**, set the **COMPlus_JITMinOpts** environment variable: ``` export COMPlus_JITMinOpts=1 ``` For **.NET Framework**, you need to create an ini file. The ini file must have the same name as the executable, with only the extension changed to ini; for example, a my.ini file will work with the my.exe application. ``` [.NET Framework Debugging Control] GenerateTrackingInfo=1 AllowOptimize=0 ``` ### Decoding managed stacks in Sysinternals As of version 16.22, **Process Explorer** understands managed stacks and should display them correctly when you double-click on a thread in a process. **Process Monitor**, unfortunately, lacks this feature. Pure managed modules will appear as `` in the call stack view. However, we may fix the problem for the ngened assemblies. First, you need to generate a .pdb file for the ngened assembly, for example, `ngen createPDB c:\Windows\assembly\NativeImages_v4.0.30319_64\mscorlib\e2c5db271896923f5450a77229fb2077\mscorlib.ni.dll c:\symbols\private`. Then make sure you have this path in your `_NT_SYMBOL_PATH` variable, for example, `C:\symbols\private;SRV*C:\symbols\dbg*http://msdl.microsoft.com/download/symbols`.
If procmon still does not resolve the symbols, go to Options - Configure Symbols and reload the dbghelp.dll. I observe this issue in version 3.50. ### Check runtime version For .NET Framework 2.0, you could check the version of mscorwks in the file properties or, if in debugger, using lmmv. For .NET Framework 4.x, you need to check clr.dll (or the Release value under the `HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full` key) and find it in the [Microsoft Docs](https://docs.microsoft.com/en-us/dotnet/framework/migration-guide/versions-and-dependencies). In .NET Core, we could run **dotnet --list-runtimes** command to list the available runtimes. ### Debugging/tracing a containerized .NET application (Docker) With the introduction of EventPipes in .NET Core 2.1, the easiest approach is to create a shared `/tmp` volume and use a sidecar diagnostics container. A sample Dockerfile.netdiag may look as follows: ``` FROM mcr.microsoft.com/dotnet/sdk:5.0 AS base RUN apt-get update && apt-get install -y lldb; \ dotnet tool install -g dotnet-symbol; \ dotnet tool install -g dotnet-sos; \ /root/.dotnet/tools/dotnet-sos install RUN dotnet tool install -g dotnet-counters; \ dotnet tool install -g dotnet-trace; \ dotnet tool install -g dotnet-dump; \ dotnet tool install -g dotnet-gcdump; \ echo 'export PATH="$PATH:/root/.dotnet/tools"' >> /root/.bashrc ENTRYPOINT ["/bin/bash"] ``` You may use it to create a .NET diagnostics Docker image, for example: ``` $ docker build -t netdiag -f .\Dockerfile.netdiag . 
``` Then, create a `/tmp` volume and mount it into your .NET application container, for example: ``` $ docker volume create dotnet-tmp $ docker run --rm --name helloserver --mount "source=dotnet-tmp,target=/tmp" -p 13000:13000 helloserver 13000 ``` And you are ready to run the diagnostics container and diagnose the remote application: ``` $ docker run --rm -it --mount "source=dotnet-tmp,target=/tmp" --pid=container:helloserver netdiag root@d4bfaa3a9322:/# dotnet-trace ps 1 dotnet /usr/share/dotnet/dotnet ``` If you only want to trace the application with **dotnet-trace**, consider using a shorter Dockerfile.nettrace file: ``` FROM mcr.microsoft.com/dotnet/sdk:5.0 AS base RUN dotnet tool install -g dotnet-trace ENTRYPOINT ["/root/.dotnet/tools/dotnet-trace", "collect", "-n", "dotnet", "-o", "/work/trace.nettrace", "@/work/input.rsp"] ``` where input.rsp: ``` --providers Microsoft-Windows-DotNETRuntime:0x14C14FCCBD:4,Microsoft-DotNETCore-SampleProfiler:0xF00000000000:4 ``` The nettrace container will automatically start the tracing session enabling the providers from the input.rsp file. It also assumes the destination process name is dotnet: ``` $ docker build -t nettrace -f .\Dockerfile.nettrace . $ docker run --rm --pid=container:helloserver --mount "source=dotnet-tmp,target=/tmp" -v "$pwd/:/work" -it nettrace Provider Name Keywords Level Enabled By Microsoft-Windows-DotNETRuntime 0x00000014C14FCCBD Informational(4) --providers Microsoft-DotNETCore-SampleProfiler 0x0000F00000000000 Informational(4) --providers Process : /usr/share/dotnet/dotnet Output File : /work/trace.nettrace [00:00:00:02] Recording trace 261.502 (KB) Press or to exit...11 (KB) Stopping the trace. This may take up to minutes depending on the application being traced. ``` ## Diagnosing exceptions or erroneous behavior ### Using Time Travel Debugging (TTD) Time Travel Debugging is an excellent way of troubleshooting errors and exceptions. 
We can step through the code causing the problems at our own pace. I describe TTD in [a WinDbg guide](/guides/windbg). It is my preferred way of debugging issues in applications and I highly recommend giving it a try. ### Collecting a memory dump **[dotnet-dump](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump)** is one of the .NET diagnostics CLI tools. You may download it using curl or wget, for example: `curl -JLO https://aka.ms/dotnet-dump/win-x64`. To create a full memory dump, run one of the commands: ``` dotnet-dump collect -p dotnet-dump collect -n ``` You may create a heap-only memory dump by adding the **--type=Heap** option. **createdump** is located in the same folder as the coreclr library, for example, for .NET 5: `/usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.3/createdump` or `c:\Program Files\dotnet\shared\Microsoft.NETCore.App\5.0.3\createdump.exe`. To create a full memory dump, run **createdump --full {process-id}**. With no options provided, it creates a memory dump with heap memory, which is equivalent to **createdump --withheap {pid}**. The .NET application may run **createdump** automatically on crash. We configure this feature through [environment variables](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/collect-dumps-crash), for example: ```shell # enable memory dump creation on crash set DOTNET_DbgEnableMiniDump=1 # when crashing, create a heap (2) memory dump, (4) for full memory dump set DOTNET_DbgMiniDumpType=2 ``` Apart from the .NET tools described above, you may create memory dumps with tools described in [the guide dedicated to diagnosing native Windows applications](diagnosing-native-windows-apps). As those tools usually do not understand .NET memory layout, I recommend creating full memory dumps to have all the necessary metadata for later analysis.
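
The crash-dump environment variables shown earlier in this section can also be set for a single child process instead of the whole shell. Below is a minimal Python sketch of that idea, not part of the .NET tooling: `crash_dump_env` and `run_with_crash_dumps` are hypothetical helper names, and the only settings used are the two variables and the two dump type values (2 and 4) mentioned above.

```python
import os
import subprocess

# Dump type values as described in this section: 2 = heap dump, 4 = full dump.
HEAP_DUMP = 2
FULL_DUMP = 4


def crash_dump_env(dump_type=HEAP_DUMP, base_env=None):
    """Return a copy of the environment with crash dump collection enabled.

    base_env defaults to the current process environment; the input
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DOTNET_DbgEnableMiniDump"] = "1"
    env["DOTNET_DbgMiniDumpType"] = str(dump_type)
    return env


def run_with_crash_dumps(command, dump_type=HEAP_DUMP):
    """Run the application (given as a list of arguments) with dump
    collection enabled and return its exit code."""
    return subprocess.run(command, env=crash_dump_env(dump_type)).returncode
```

For example, `run_with_crash_dumps(["dotnet", "myapp.dll"], FULL_DUMP)` would start a (hypothetical) `myapp.dll` with full dumps enabled for that process only.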
### Analysing exception information First make sure with the **!Threads** command (SOS) that your current thread is the one with the exception context: ``` 0:000> !Threads ThreadCount: 2 UnstartedThread: 0 BackgroundThread: 1 PendingThread: 0 DeadThread: 0 Hosted Runtime: no ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception 0 1 1ec8 000000000055adf0 2a020 Preemptive 0000000002253560:0000000002253FD0 00000000004fb970 0 Ukn System.ArgumentException 0000000002253438 5 2 1c74 00000000005851a0 2b220 Preemptive 0000000000000000:0000000000000000 00000000004fb970 0 Ukn (Finalizer) ``` In the snippet above we can see that the exception was thrown on the thread no. 0 and this is our currently selected thread (in case it's not, we would use **\~0s** command) so we may use the **!PrintException** command from SOS (alias **!pe**), for example: ``` 0:000> !pe Exception object: 0000000002253438 Exception type: System.ArgumentException Message: v should not be null InnerException: StackTrace (generated): StackTraceString: HResult: 80070057 ``` To see the full managed call stack, use the **!CLRStack** command. By default, the debugger will stop on an unhandled exception. If you want to stop at the moment when an exception is thrown (first-chance exception), run the **sxe clr** command at the beginning of the debugging session. ## Diagnosing hangs We usually start the analysis by looking at the threads running in a process. The call stacks help us identify blocked threads. We can use TTD, thread-time trace, or memory dumps to learn about what threads are doing. In the follow-up sections, I will describe how to find lock objects and relations between threads in memory dumps. ### Listing threads call stacks To list native stacks for all the threads in **WinDbg**, run: **~\*k** or **~\*e!dumpstack**. If you are interested only in managed stacks, you may use the **~\*e!clrstack** SOS command. 
The **dotnet-dump**'s **analyze** command provides a super useful parallel stacks command: ``` > dotnet dump analyze test.dmp > pstacks ________________________________________________ ~~~~ 5cd8 1 System.Threading.Monitor.Enter(Object, Boolean ByRef) 1 deadlock.Program.Lock2() ~~~~ 3e58 1 System.Threading.Monitor.Enter(Object, Boolean ByRef) 1 deadlock.Program.Lock1() 2 System.Threading.Tasks.Task.InnerInvoke() ... 2 System.Threading.ThreadPoolWorkQueue.Dispatch() 2 System.Threading._ThreadPoolWaitCallback.PerformWaitCallback() ``` In **LLDB**, we may show native call stacks for all the threads with the **bt all** command. Unfortunately, if we want to use !dumpstack or !clrstack commands, we need to manually switch between threads with the thread select command. ### Finding locks in managed code You may examine thin locks using **!DumpHeap -thinlocks**. To find all sync blocks, use the **!SyncBlk -all** command. On .NET Framework, you may also use the **!dlk** command from the SOSEX extension. It is pretty good in detecting deadlocks, for example: ``` 0:007> .load sosex 0:007> !dlk Examining SyncBlocks... Scanning for ReaderWriterLock(Slim) instances... Scanning for holders of ReaderWriterLock locks... Scanning for holders of ReaderWriterLockSlim locks... Examining CriticalSections... Scanning for threads waiting on SyncBlocks... Scanning for threads waiting on ReaderWriterLock locks... Scanning for threads waiting on ReaderWriterLocksSlim locks... *** WARNING: Unable to verify checksum for C:\WINDOWS\assembly\NativeImages_v4.0.30319_32\System\3a4f0a84904c4b568b6621b30306261c\System.ni.dll *** WARNING: Unable to verify checksum for C:\WINDOWS\assembly\NativeImages_v4.0.30319_32\System.Transactions\ebef418f08844f99287024d1790a62a4\System.Transactions.ni.dll Scanning for threads waiting on CriticalSections... 
*DEADLOCK DETECTED* CLR thread 0x1 holds the lock on SyncBlock 011e59b0 OBJ:02e93410[System.Object] ...and is waiting on CriticalSection 01216a58 CLR thread 0x3 holds CriticalSection 01216a58 ...and is waiting for the lock on SyncBlock 011e59b0 OBJ:02e93410[System.Object] CLR Thread 0x1 is waiting at clr!CrstBase::SpinEnter+0x92 CLR Thread 0x3 is waiting at System.Threading.Monitor.Enter(System.Object, Boolean ByRef)(+0x17 Native) ``` When debugging locks in code that is using tasks it is often necessary to examine execution contexts assigned to the running threads. I prepared a simple script which lists threads with their execution contexts. You only need (as in previous script) to find the MT of the Thread class in your appdomain, e.g. ``` 0:036> !Name2EE mscorlib.dll System.Threading.Thread Module: 72551000 Assembly: mscorlib.dll Token: 020001d1 MethodTable: 72954960 EEClass: 725bc0c4 Name: System.Threading.Thread ``` And then paste it in the scripts below: x86 version: ``` .foreach ($addr {!DumpHeap -short -mt }) { .printf /D "Thread: %i; Execution context: %p\n", poi(${$addr}+28), poi(${$addr}+8), poi(${$addr}+8) } ``` x64 version: ``` .foreach ($addr {!DumpHeap -short -mt }) { .printf /D "Thread: %i; Execution context: %p\n", poi(${$addr}+4c), poi(${$addr}+10), poi(${$addr}+10) } ``` Notice that the thread number from the output is a managed thread id and to map it to the windbg thread number you need to use the !Threads command. ## Diagnosing waits or high CPU usage Dotnet-trace allows us to enable the runtime CPU sampling provider (**Microsoft-DotNETCore-SampleProfiler**). However, using it might impact application performance as it internally calls **ThreadSuspend::SuspendEE** to suspend managed code execution while collecting the samples. Although it is a sampling profiler, it is a bit special. It runs on a separate thread and collects stacks of all the managed threads, even the waiting ones. This behavior resembles the thread time profiler. 
That is probably why PerfView shows us the **Thread Time** view when opening the .nettrace file. Example collect commands: ```bash dotnet-trace collect --profile cpu-sampling -p 12345 dotnet-trace collect --profile cpu-sampling -- myapp.exe ``` Dotnet-trace does not automatically enable DiagnosticSource or TPL providers. Therefore, if we want to see activities in PerfView, we need to turn them on manually, for example: ```bash dotnet-trace collect --profile cpu-sampling --providers "Microsoft-Diagnostics-DiagnosticSource:0xFFFFFFFFFFFFF7FF:4:FilterAndPayloadSpecs=HttpHandlerDiagnosticListener/System.Net.Http.Request@Activity2Start:Request.RequestUri\nHttpHandlerDiagnosticListener/System.Net.Http.Response@Activity2Stop:Response.StatusCode,System.Threading.Tasks.TplEventSource:1FF:5" -n testapp ``` For diagnosing CPU problems in .NET applications running on Windows, we may also rely on ETW (Event Tracing for Windows). In [a guide dedicated to diagnosing native applications](diagnosing-native-windows-apps), I describe how to collect and analyze ETW traces. On Linux, we additionally have the [perfcollect](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/trace-perfcollect-lttng) script. It is the easiest way to use Linux Kernel perf_events for diagnosing .NET apps. In my tests, however, I found that quite often it did not correctly resolve .NET stacks. To collect CPU samples with perfcollect, use the **perfcollect collect** command. To also enable the Thread Time events, add the **-threadtime** option. If possible, I recommend opening the traces (even the ones from Linux) in PerfView. If that is not an option, try the **view** command of the perfcollect script, for example: ```bash perfcollect view sqrt.trace.zip -graphtype caller ``` Using the **-graphtype** option, we may switch from the top-down view (`caller`) to the bottom-up view (`callee`).
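
The `--providers` strings used in this guide follow a `Name:Keywords:Level:Arguments` pattern, with providers separated by commas. As a rough illustration only, the Python sketch below assembles such a string. `provider_spec` and `providers_argument` are hypothetical helper names, not part of dotnet-trace, and the sketch always renders keywords with a `0x` prefix.

```python
def provider_spec(name, keywords=None, level=None, args=None):
    """Format one provider entry as Name[:Keywords[:Level[:Arguments]]].

    The fields are positional, so a later field requires the earlier ones.
    """
    if level is not None and keywords is None:
        raise ValueError("a level can only follow a keywords mask")
    if args is not None and level is None:
        raise ValueError("arguments can only follow a level")
    parts = [name]
    if keywords is not None:
        parts.append(f"0x{keywords:X}")  # keywords are a hexadecimal bit mask
    if level is not None:
        parts.append(str(level))  # numeric log level, e.g. 4 or 5
    if args is not None:
        parts.append(args)  # passed through unchanged, e.g. "FilterAndPayloadSpecs=..."
    return ":".join(parts)


def providers_argument(specs):
    """Join provider entries into the value passed to --providers."""
    return ",".join(specs)
```

For instance, `providers_argument([provider_spec("Microsoft-DotNETCore-SampleProfiler", 0xF00000000000, 4), provider_spec("System.Threading.Tasks.TplEventSource", 0x1FF, 5)])` yields `Microsoft-DotNETCore-SampleProfiler:0xF00000000000:4,System.Threading.Tasks.TplEventSource:0x1FF:5`.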
## Diagnosing managed memory leaks ### Collecting memory snapshots If we are interested only in GC Heaps, we may create the GC Heap snapshot using **PerfView**: perfview heapsnapshot In GUI, we may use the menu option: **Memory -> Take Heap Snapshot**. For .NET Core applications, we have a CLI tool: **dotnet-gcdump**, which you may get from the https://aka.ms/dotnet-gcdump/runtime-id URL, for example, https://aka.ms/dotnet-gcdump/linux-x64. And to collect the GC dump we need to run one of the commands: ``` dotnet-gcdump -p dotnet-gcdump -n ``` Sometimes managed heap is not enough to diagnose the memory leak. In such situations, we need to create a memory dump, as described in [a guide dedicated to diagnosing native applications](diagnosing-native-windows-apps). ### Analyzing collected snapshots **PerfView** can open GC Heap snapshots and dumps. If you only have a memory dump, you may convert a memory dump file to a PerfView snapshot using **PerfView HeapSnapshotFromProcessDump ProcessDumpFile {DataFile}** or using the GUI options **Memory -> Take Heap Snapshot from Dump**. I would like to bring your attention to an excellent diffing option available for heap snapshots. Imagine you made two heap snapshots of the leaking process: - first named LeakingProcess.gcdump - second (taken a minute later) named LeakingProcess.1.gcdump You may now run PerfView, open two collected snapshots, switch to the LeakingProcess.1.gcdump and under the Diff menu you should see an option to diff this snapshot with the baseline: ![diff option under the menu](/assets/img/perfview-snapshots-diff.png) After you choose it, a new window will pop up with a tree of objects which have changed between the snapshots. Of course, if you have more snapshots you can generate diffs between them all. A really powerful feature! **WinDbg** allows you to analyze the full memory dumps. 
**Make sure that bitness of the dump matches bitness of the debugger.** Then load the SOS extension and identify objects which use most of the memory using **!DumpHeap -stat**. Later, analyze the references using the **!GCRoot** command. Other SOS commands for analyzing the managed heap include: ``` !EEHeap [-gc] [-loader] !HeapStat [-inclUnrooted | -iu] !DumpHeap [-stat] [-strings] [-short] [-min ] [-max ] [-live] [-dead] [-thinlock] [-startAtLowerBound] [-mt ] [-type ] [start [end]] !ObjSize [] !GCRoot [-nostacks] !DumpObject
| !DumpArray
| !DumpVC
``` **dotnet-gcdump** has a **report** command that lists the objects recorded in the GC heaps. The output resembles output from the SOS `!dumpheap` command. ## Diagnosing issues with assembly loading ### Troubleshooting loading with EventPipes/ETW (.NET) The **Loader** keyword (`0x8`) in the **Microsoft-Windows-DotNETRuntime** provider enables events relating to **loading and unloading** of **appdomains**, **assemblies** and **modules**. Starting with **.NET 5**, the new **AssemblyLoader** keyword (`0x4`) gives us a detailed view of the **assembly resolution process**. Additionally, we can group the activity events per assembly using the `ActivityID`. dotnet-trace collect --providers Microsoft-Windows-DotNETRuntime:C -- testapp.exe ### Troubleshooting loading using ETW (.NET Framework) There is a number of ETW events defined under the **Microsoft-Windows-DotNETRuntimePrivate/Binding/** category. We may use, for example, **PerfView** to collect them. Just make sure that you have the .NET check box selected in the collection dialog. Start collection and stop it after the loading exception occurs. Then open the .etl file, go to the **Events** screen and filter them by *binding*. Select all of the events and press ENTER. PerfView will immediately print the instances of the selected events in the grid on the right. You may later search or filter the grid with the help of the search boxes above it. ### Troubleshooting loading using Fusion log (.NET Framework) Fusion log is available in all versions of the .NET Framework. There is a tool named **fuslogvw** in .NET SDK, which you may use to set the Fusion log configuration. Andreas Wäscher implemented an easier-to-use version of this tool, with a modern UI, named [Fusion++](https://github.com/awaescher/Fusion). You may download the precompiled version from the [release page](https://github.com/awaescher/Fusion/releases/). 
If using neither of the above tools is possible (for example, you are in a restricted environment), you may configure the Fusion log through **registry settings**. The root of all the Fusion log settings is **HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Fusion**. When writing to a folder on a hard drive fusion logs are split among categories and processes, e.g.: ``` C:\TEMP\FUSLOGVW ├───Default │ └───powershell.exe └───NativeImage └───powershell.exe ``` Log to exception text: HKEY_LOCAL_MACHINE\software\microsoft\fusion EnableLog REG_DWORD 0x1 or reg delete HKLM\Software\Microsoft\Fusion /va reg add HKLM\Software\Microsoft\Fusion /v EnableLog /t REG_DWORD /d 0x1 Log failures to disk: HKEY_LOCAL_MACHINE\software\microsoft\fusion LogFailures REG_DWORD 0x1 LogPath REG_SZ c:\logs\fuslogvw or reg delete HKLM\Software\Microsoft\Fusion /va reg add HKLM\Software\Microsoft\Fusion /v LogFailures /t REG_DWORD /d 0x1 reg add HKLM\Software\Microsoft\Fusion /v LogPath /t REG_SZ /d "C:\logs\fuslogvw" Log all binds to disk HKEY_LOCAL_MACHINE\software\microsoft\fusion LogPath REG_SZ c:\logs\fuslogvw ForceLog REG_DWORD 0x1 or reg delete HKLM\Software\Microsoft\Fusion /va reg add HKLM\Software\Microsoft\Fusion /v ForceLog /t REG_DWORD /d 0x1 reg add HKLM\Software\Microsoft\Fusion /v LogPath /t REG_SZ /d "C:\logs\fuslogvw" Log disabled HKEY_LOCAL_MACHINE\software\microsoft\fusion LogPath REG_SZ c:\logs\fuslogvw or reg delete HKLM\Software\Microsoft\Fusion /va ### GAC (.NET Framework) For .NET2.0/3.5 Global Assembly Cache was located in **c:\Windows\assembly** folder with a drag/drop option for installing/uninstalling assemblies. Citing [a stackoverflow answer](http://stackoverflow.com/questions/10013047/gacutil-vs-manually-editing-c-windows-assembly): > This functionality is provided by a custom shell extension, shfusion.dll. It flattens the GAC and makes it look like a single folder. 
And takes care of automatically un/registering the assemblies for you when you manipulate the explorer window. So you’re fine doing this. To **disable GAC viewer in Windows Explorer**, add a DWORD value **DisableCacheViewer** set to 1 under the **HKLM\Software\Microsoft\Fusion** key. Note that this will no longer work for .NET 4, as it uses a different folder to store GAC files (**c:\windows\microsoft.net\assembly**) and that folder does not have the same kind of shell extension. Thus, you can see the raw content of it. However, you should not directly use it. It is best to use **gacutil** to manipulate GAC content. Although it’s possible to install an assembly in both GAC folders, as stated [here](http://stackoverflow.com/questions/7095887/registering-the-same-version-of-an-assembly-but-with-different-target-frameworks), I would not consider it a good practice as framework tools can’t deal with it. .NET GAC settings are stored under the registry key: HKLM\Software\Microsoft\Fusion. #### Find assembly in cache We can use the **gacutil /l** command to find an assembly in the GAC. If no name is provided, the command lists all the assemblies in the cache. gacutil /l System.Core The Global Assembly Cache contains the following assemblies: System.Core, Version=3.5.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089, processorArchitecture=MSIL System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089, processorArchitecture=MSIL Number of items = 2 #### Uninstall assembly from cache gacutil /u MyTest.exe ## Diagnosing network connectivity issues ### .NET Core .NET Core provides a number of ETW and EventPipes providers to collect the network tracing events. We can enable the providers in **dotnet-trace**, **PerfView**, or **dotnet-wtrace**. Network ETW providers use only two keywords (`Default = 0x1` and `Debug = 0x2`) and, as usual, we may filter the events by the log level (from 1 (critical) to 5 (verbose)).
In **.NET 5**, the providers were renamed and currently we can use the following names: - `Private.InternalDiagnostics.System.Net.Primitives` - cookie container, cache credentials logs - `Private.InternalDiagnostics.System.Net.Sockets` - logs describing operations on sockets, connection status events, - `Private.InternalDiagnostics.System.Net.NameResolution` - `Private.InternalDiagnostics.System.Net.Mail` - `Private.InternalDiagnostics.System.Net.Requests` - logs from System.Net.Requests classes - `Private.InternalDiagnostics.System.Net.HttpListener` - `Private.InternalDiagnostics.System.Net.WinHttpHandler` - `Private.InternalDiagnostics.System.Net.Http` - HttpClient and HTTP handler logs, authentication events - `Private.InternalDiagnostics.System.Net.Security` - SecureChannel (TLS) events, Windows SSPI logs For previous .NET Core versions, the names were as follows: - `Microsoft-System-Net-Primitives` - `Microsoft-System-Net-Sockets` - `Microsoft-System-Net-NameResolution` - `Microsoft-System-Net-Mail` - `Microsoft-System-Net-Requests` - `Microsoft-System-Net-HttpListener` - `Microsoft-System-Net-WinHttpHandler` - `Microsoft-System-Net-Http` - `Microsoft-System-Net-Security` We may create a network.rsp file that enables all these event sources and the Kestrel one. 
You may use it with **dotnet-trace**, for example: ``` $ dotnet-trace collect -n dotnet @network.rsp ``` The network.rsp file for older .NET Core (before .NET 5) might look as follows: ``` --providers Microsoft-System-Net-Primitives,Microsoft-System-Net-Sockets,Microsoft-System-Net-NameResolution,Microsoft-System-Net-Mail,Microsoft-System-Net-Requests,Microsoft-System-Net-HttpListener,Microsoft-System-Net-WinHttpHandler,Microsoft-System-Net-Http,Microsoft-System-Net-Security,Microsoft-AspNetCore-Server-Kestrel ``` For .NET 5 and newer: ``` --providers Private.InternalDiagnostics.System.Net.Primitives,Private.InternalDiagnostics.System.Net.Sockets,Private.InternalDiagnostics.System.Net.NameResolution,Private.InternalDiagnostics.System.Net.Mail,Private.InternalDiagnostics.System.Net.Requests,Private.InternalDiagnostics.System.Net.HttpListener,Private.InternalDiagnostics.System.Net.WinHttpHandler,Private.InternalDiagnostics.System.Net.Http,Private.InternalDiagnostics.System.Net.Security,Microsoft-AspNetCore-Server-Kestrel ``` I also developed [**dotnet-wtrace**](https://github.com/lowleveldesign/dotnet-wtrace), a lightweight tracer that makes it straightforward to collect .NET events live, including network traces. ### .NET Framework All classes from `System.Net`, if configured properly, may provide a lot of interesting logs through the default System.Diagnostics mechanisms. The list of the available trace sources is available in [Microsoft docs](https://docs.microsoft.com/en-us/dotnet/framework/network-programming/how-to-configure-network-tracing). This is a configuration sample which writes network traces to a file: ```xml ``` These logs may be verbose and numerous; therefore, I suggest starting with the Information level and a smaller number of sources. You may also consider using **EventProviderTraceListener** to make the trace writes faster and less impactful.
An example configuration file with those changes: ```xml ``` And to collect such a trace: ```shell logman start "net-trace-session" -p "{0f09a664-1713-4665-91e8-8d6b8baee030}" -bs 512 -nb 8 64 -o "c:\temp\net-trace.etl" -ets & pause & logman stop net-trace-session -ets ``` ## ASP.NET Core ### Collecting ASP.NET Core logs For low-level network traces, you may enable .NET network providers, as described in the previous section. The ASP.NET Core framework logs events either through **DiagnosticSource** using **Microsoft.AspNetCore** as the source name or through the **ILogger** interface. #### ILogger logs The CreateDefaultBuilder method adds LoggingEventSource (named **Microsoft-Extensions-Logging**) as one of the log outputs. The **FilterSpecs** argument makes it possible to filter the events by logger name and level, for example: ``` Microsoft-Extensions-Logging:5:5:FilterSpecs=webapp.Pages.IndexModel:0 ``` We may define the log message format with keywords (pick one): - 0x1 - enable meta events - 0x2 - enable events with raw arguments - 0x4 - enable events with formatted message (the most readable) - 0x8 - enable events with data serialized to JSON For example, to collect ILogger info messages: `dotnet-trace collect -p PID --providers "Microsoft-Extensions-Logging:0x4:0x4"` #### DiagnosticSource logs To listen to **DiagnosticSource events**, we should enable the **Microsoft-Diagnostics-DiagnosticSource** event source. DiagnosticSource events often contain complex types and we need to use [parser specifications](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/DiagnosticSourceEventSource.cs) to extract the interesting properties.
The **Microsoft-Diagnostics-DiagnosticSource** event source defines some special keywords: - 0x1 - enable diagnostic messages - 0x2 - enable regular events - 0x0800 - disable the shortcut keywords, listed below - 0x1000 - enable activity tracking and basic hosting events (ASP.NET Core) - 0x2000 - enable activity tracking and basic command events (EF Core) Also, we should enable the minimal logging from the **System.Threading.Tasks.TplEventSource** provider to profit from the [activity tracking](https://docs.microsoft.com/en-us/archive/blogs/vancem/exploring-eventsource-activity-correlation-and-causation-features). When our application is hosted on the Kestrel server, we may enable the **Microsoft-AspNetCore-Server-Kestrel** provider to get Kestrel events. The example command below enables all ASP.NET Core event traces and some other useful network event providers. It also adds activity tracking for **HttpClient** requests: ``` > dotnet-trace collect --providers "Private.InternalDiagnostics.System.Net.Security,Private.InternalDiagnostics.System.Net.Sockets,Microsoft-AspNetCore-Server-Kestrel,Microsoft-Diagnostics-DiagnosticSource:0x1003:5:FilterAndPayloadSpecs=\"Microsoft.AspNetCore\nHttpHandlerDiagnosticListener\nHttpHandlerDiagnosticListener/System.Net.Http.Request@Activity2Start:Request.RequestUri\nHttpHandlerDiagnosticListener/System.Net.Http.Response@Activity2Stop:Response.StatusCode\",System.Threading.Tasks.TplEventSource:0x80:4,Microsoft-Extensions-Logging:4:5" -n webapp ``` ### Collecting ASP.NET Core performance counters ASP.NET Core provides some basic performance counters through the **Microsoft.AspNetCore.Hosting** event source. If we are also using Kestrel, we may add some interesting counters by enabling **Microsoft-AspNetCore-Server-Kestrel**: ``` > dotnet-counters monitor "Microsoft.AspNetCore.Hosting" "Microsoft-AspNetCore-Server-Kestrel" -n testapp Press p to pause, r to resume, q to quit.
Status: Running [Microsoft.AspNetCore.Hosting] Current Requests 0 Failed Requests 0 Request Rate (Count / 1 sec) 0 Total Requests 0 [Microsoft-AspNetCore-Server-Kestrel] Connection Queue Length 0 Connection Rate (Count / 1 sec) 0 Current Connections 1 Current TLS Handshakes 0 Current Upgraded Requests (WebSockets) 0 Failed TLS Handshakes 2 Request Queue Length 0 TLS Handshake Rate (Count / 1 sec) 0 Total Connections 7 Total TLS Handshakes 7 ``` ## ASP.NET (.NET Framework) ### Examining ASP.NET process memory (and dumps) Some useful [PSSCOR4](http://www.microsoft.com/en-us/download/details.aspx?id=21255) commands for ASP.NET: ``` !ProcInfo [-env] [-time] [-mem] FindDebugTrue !FindDebugModules [-full] !DumpHttpContext dumps the HttpContexts in the heap. It shows the status of the request and the return code, etc. It also prints out the start time !ASPXPages just calls !DumpHttpContext to print out information on the ASPX pages running on threads. !DumpASPNETCache [-short] [-stat] [-s] !DumpRequestTable [-a] [-p] [-i] [-c] [-m] [-q] [-n] [-e] [-w] [-h] [-r] [-t] [-x] [-dw] [-dh] [-de] [-dx] !DumpHistoryTable [-a] !DumpHistoryTable dumps the aspnet_wp history table. !DumpBuckets dumps entire request table buckets. !GetWorkItems given a CLinkListNode, print out request & work items. 
``` [Netext](http://netext.codeplex.com/) commands for ASP.NET: ``` !whttp [/order] [/running] [/withthread] [/status ] [/notstatus ] [/verb ] [] - dump HttpContext objects !wconfig - dump configuration sections loaded into memory !wruntime - dump all active Http Runtime information ``` ### Profiling ASP.NET ### Application instrumentation Interesting tools and libraries: - [ASP.NET 4.5 page instrumentation mechanism - PageExecutionListener](http://weblogs.asp.net/imranbaloch/archive/2013/11/23/page-instrumentation-in-asp-net-4-5.aspx) - [Glimpse](https://github.com/glimpse/glimpse) - [MiniProfiler](https://miniprofiler.com/) - [Elmah](https://elmah.github.io/) We may also use the ASP.NET trace listener to print diagnostic message to the page trace. In the configuration file below, we configure the Performance TraceSource to pass events to the ASP.NET trace listener. ```xml ``` ### ASP.NET ETW providers ASP.NET ETW providers are defined in the aspnet.mof file in the main .NET Framework folder. They should be installed with the framework: ``` > logman query /providers "ASP.NET Events" Provider GUID ------------------------------------------------------------------------------- ASP.NET Events {AFF081FE-0247-4275-9C4E-021F3DC1DA35} Value Keyword Description ------------------------------------------------------------------------------- 0x0000000000000001 Infrastructure Infrastructure Events 0x0000000000000002 Module Pipeline Module Events 0x0000000000000004 Page Page Events 0x0000000000000008 AppServices Application Services Events Value Level Description ------------------------------------------------------------------------------- 0x01 Fatal Abnormal exit or termination 0x02 Error Severe errors 0x03 Warning Warnings 0x04 Information Information 0x05 Verbose Detailed information ``` If they are not, use mofcomp.exe to install them. 
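
The keyword values in the `logman query` output above are bit flags, so the keyword mask passed to a trace session is simply their bitwise OR. Below is a minimal Python sketch of that arithmetic. The `ASPNET_KEYWORDS` dictionary and the `keyword_mask` helper are illustrative names of my own, not part of logman or any other tool:

```python
# Keyword bits copied from the `logman query /providers "ASP.NET Events"` output.
ASPNET_KEYWORDS = {
    "Infrastructure": 0x1,
    "Module": 0x2,
    "Page": 0x4,
    "AppServices": 0x8,
}


def keyword_mask(*names):
    """Return the bitwise OR of the selected keyword values (hypothetical helper)."""
    mask = 0
    for name in names:
        mask |= ASPNET_KEYWORDS[name]
    return mask


if __name__ == "__main__":
    # Only pipeline module and page events
    print(hex(keyword_mask("Module", "Page")))
    # All four keywords
    print(hex(keyword_mask(*ASPNET_KEYWORDS)))
```

Selecting all four keywords gives `0xf`, while a narrower mask (for example, `0x6` for Module and Page events only) reduces the trace volume.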
To start collecting trace events from the ASP.NET and IIS providers run the following command: ``` logman start aspnettrace -pf ctrl-iis-aspnet.guids -ct perf -o aspnet.etl -ets ``` where the ctrl-iis-aspnet.guids looks as follows: ``` {AFF081FE-0247-4275-9C4E-021F3DC1DA35} 0xf 5 ASP.NET Events {3A2A4E84-4C21-4981-AE10-3FDA0D9B0F83} 0x1ffe 5 IIS: WWW Server ``` And stop it with the command: ``` logman stop aspnettrace -ets ``` ### Collect events using the Perfecto tool Perfecto is a tool that creates an ASP.NET data collector in the system and allows you to generate nice reports of requests made to your ASP.NET application. After installing you can either use the **perfmon** to start the report generation: 1. On perfmon, navigate to the "Performance\Data Collector Sets\User Defined\ASPNET Perfecto" node. 2. Click the "Start the Data Collector Set" button on the tool bar. 3. Wait for/or make requests to the server (more than 10 seconds). 4. Click the "Stop the Data Collector Set" button on the tool bar. 5. Click the "View latest report" button on the tool bar or navigate to the last report at "Performance\Reports\User Defined\ASPNET Perfecto" or **logman**: ``` logman.exe start -n "Service\ASPNET Perfecto" logman.exe stop -n "Service\ASPNET Perfecto" ``` Note: The View commands are also available as toolbar buttons. Sometimes you can see an error like below: ``` Error Code: 0xc0000bf8 Error Message: At least one of the input binary log files contain fewer than two data samples. ``` This usually happens when you collected data too fast. The performance counters are set by default to collect every 10 seconds. So a fast start/stop sequence may end without enough counter data being collected. Always allow more than 10 seconds between a start and stop commands. Or otherwise delete the performance counters collector or change the sample interval. Requirements: 1. Windows >= Vista 2. 
Installed IIS tracing (`dism /online /enable-feature /featurename:IIS-HttpTracing`) ### Collect events using FREB New IIS servers (7.0 up) contain a nice diagnostics functionality called Failed Request Tracing (or **FREB**). You may find a lot of information how to enable it on the [IIS official site](https://www.iis.net/learn/troubleshoot/using-failed-request-tracing/troubleshooting-failed-requests-using-tracing-in-iis) and in my [iis debugging recipe](asp.net/troubleshooting-iis.md). {% endraw %} ================================================ FILE: guides/diagnosing-native-windows-apps.md ================================================ --- layout: page title: Diagnosing native Windows applications date: 2025-05-25 08:00:00 +0200 --- {% raw %} **Table of contents:** - [Debugging process execution](#debugging-process-execution) - [Collecting memory dumps on errors](#collecting-memory-dumps-on-errors) - [Using procdump](#using-procdump) - [Using Windows Error Reporting \(WER\)](#using-windows-error-reporting-wer) - [Automatic dump collection using AeDebug registry key](#automatic-dump-collection-using-aedebug-registry-key) - [Diagnosing waits or high CPU usage](#diagnosing-waits-or-high-cpu-usage) - [Collecting ETW trace](#collecting-etw-trace) - [Anaysing the collected traces](#anaysing-the-collected-traces) - [Diagnosing issues with DLL loading](#diagnosing-issues-with-dll-loading) - [Diagnosing window functions \(user32\)](#diagnosing-window-functions-user32) Debugging process execution --------------------------- Please check [the WinDbg guide](/guides/windbg) where I describe various troubleshooting commands in WinDbg, along with Time Travel Debugging. Collecting memory dumps on errors --------------------------------- ### Using procdump My preferred tool to collect memory dumps is **[procdump](https://learn.microsoft.com/en-us/sysinternals/downloads/procdump)**. 
Observing 1st chance exceptions occurring in a process is often a good way to start diagnosing errors. At this point we don't want to collect any dumps, only logs. We may achieve this by specifying a non-existing exception name in the filter command, for example:

```
C:\Utils> procdump -e 1 -f "DoesNotExist" 8012
...
CLR Version: v4.0.30319

[09:03:27] Exception: E0434F4D.System.NullReferenceException ("Object reference not set to an instance of an object.")
[09:03:28] Exception: E0434F4D.System.NullReferenceException ("Object reference not set to an instance of an object.")
```

We may also observe the logs in procmon. In order to see the procdump log events in **procmon**, remember to add procdump.exe and procdump64.exe to the accepted process names in procmon filters.

To create a full memory dump when `NullReferenceException` occurs use the following command:

```
procdump -ma -e 1 -f "E0434F4D.System.NullReferenceException" 8012
```

For some time now, procdump has used a managed debugger engine when attaching to .NET Framework processes. This is great because we can filter exceptions based on their managed names. Unfortunately, that works only for 1st chance exceptions (at least for .NET 4.0). 2nd chance exceptions are raised out of the .NET Framework and must be handled by a native debugger. Starting from .NET 4.0 it is no longer possible to attach both the managed and the native engine to the same process. Thus, if we want to make a dump on the 2nd chance exception for a .NET application, we need to use the **-g** option in order to force procdump to use the native engine.

### Using Windows Error Reporting (WER)

By default WER takes a dump only when necessary, but this behavior can be configured and we can force WER to always create a dump by setting `HKLM\Software\Microsoft\Windows\Windows Error Reporting\ForceQueue=1` (or `HKEY_CURRENT_USER\Software\Microsoft\Windows\Windows Error Reporting\ForceQueue=1`).
The reports are usually saved at `%LocalAppData%\Microsoft\Windows\WER`, in two directories: `ReportArchive`, when a server is available or `ReportQueue`, when the server is unavailable. If you want to keep the data locally, just set the server to a non-existing machine (for example, `HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\CorporateWERServer=NonExistingServer`). For **system processes** you need to look at `C:\ProgramData\Microsoft\Windows\WER`. In Windows 2003 Server R2 Error Reporting stores errors in the signed-in user's directory (for example, `C:\Documents and Settings\me\Local Settings\Application Data\PCHealth\ErrorRep`). Starting with Windows Server 2008 and Windows Vista with Service Pack 1 (SP1), Windows Error Reporting can be configured to [collect full memory dumps on application crash](https://learn.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps). The registry key enabling this behavior is `HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps`. An example configuration for saving full-memory dumps to the %SYSTEMDRIVE%\dumps folder when the test.exe application fails looks as follows: ``` Windows Registry Editor Version 5.00 [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps] [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\test.exe] "DumpFolder"=hex(2):25,00,53,00,59,00,53,00,54,00,45,00,4d,00,44,00,52,00,49,\ 00,56,00,45,00,25,00,5c,00,64,00,75,00,6d,00,70,00,73,00,00,00 "DumpType"=dword:00000002 ``` With the help of [the WER API](https://learn.microsoft.com/en-us/windows/win32/wer/wer-reference), you may also force WER reports in your custom application or even [register a custom crash handler](https://minidump.net/windows-error-reporting/). 
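
The `hex(2)` notation used for `DumpFolder` in the LocalDumps .reg file above is how regedit exports a `REG_EXPAND_SZ` value: a comma-separated list of UTF-16LE bytes ending with a null terminator. If you want to point the dumps at a different folder, a small Python sketch like the one below can convert between the text and this notation. The helper names are my own, and the sketch assumes the `\` line continuations have already been removed from the byte list:

```python
def to_reg_expand_sz(text):
    """Encode text as the comma-separated byte list used by hex(2) values."""
    data = (text + "\0").encode("utf-16-le")
    return ",".join(f"{b:02x}" for b in data)


def from_reg_expand_sz(hex_list):
    """Decode a hex(2) byte list (line continuations already removed)."""
    data = bytes(int(b, 16) for b in hex_list.split(","))
    return data.decode("utf-16-le").rstrip("\0")


if __name__ == "__main__":
    encoded = to_reg_expand_sz(r"%SYSTEMDRIVE%\dumps")
    print(encoded)
    print(from_reg_expand_sz(encoded))
```

Decoding the bytes from the example configuration gives back `%SYSTEMDRIVE%\dumps`, which is a quick way to verify a .reg file before importing it.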
To **completely disable WER**, create a DWORD value named `Disabled` under the `HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting` key and set it to `1`. For 32-bit apps use the `HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows\Windows Error Reporting` key.

### Automatic dump collection using AeDebug registry key

There is a special [AeDebug](https://learn.microsoft.com/en-us/windows/win32/debug/configuring-automatic-debugging) key in the registry defining what should happen when an unhandled exception occurs in an application. You may find it under the `HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion` key (or `HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Windows NT\CurrentVersion` for 32-bit apps). Its important values include:

- `Debugger` : REG_SZ - application which will be called to handle the problematic process (example value: `procdump.exe -accepteula -j "c:\dumps" %ld %ld %p`), the first %ld parameter is replaced with the process ID and the second with the event handle
- `Auto` : REG_SZ - defines if the debugger runs automatically, without prompting the user (example value: 1)
- `UserDebuggerHotKey` : REG_DWORD - not sure, but it looks like it enables the Debug button on the exception handling message box (example value: 1)

To set **WinDbg** as your default AeDebug debugger, run `windbg -I`. After running this command, WinDbg will launch on application crashes. You may also automate WinDbg to create a memory dump and then allow the process to terminate, for example: `windbg -c ".dump /ma /u c:\dumps\crash.dmp; qd" -p %ld -e %ld -g`.

My favourite tool to use as the automatic debugger is **procdump**. The command line to install it is `procdump -mp -i c:\dumps`, where c:\dumps is the folder where I would like to store the dumps of crashing apps.

Diagnosing waits or high CPU usage
----------------------------------

There are two ways of tracing CPU time.
We could either use CPU sampling or Thread Time profiling. CPU sampling is about collecting samples in intervals: each CPU sample contains an instruction pointer to the currently executing code. Thus, this technique is excellent when diagnosing high CPU usage of an application. It won't work for analyzing waits in the applications. For such scenarios, we should rely on Thread Time profiling. It uses the system scheduler/dispatcher events to get detailed information about application CPU time. When combined with CPU sampling, it is the best non-invasive profiling solution. ### Collecting ETW trace We may use **PerfView** or **wpr.exe** to collect CPU samples and Thread Time events. When collecting CPU samples, PerfView relies on Profile events coming from the Kernel ETW provider which has very low impact on the system overall performance. An example command to start the CPU sampling: ```shell perfview collect -NoGui -KernelEvents:Profile,ImageLoad,Process,Thread -ClrEvents:JITSymbols cpu-collect.etl ``` Alternatively, you may use the Collect dialog. Make sure the Cpu Samples checkbox is selected. To collect Thread Time events, you may use the following command: ```shell perfview collect -NoGui -ThreadTime thread-time-collect.etl ``` The Collect dialog has also the Thread Time checkbox. ### Anaysing the collected traces For analyzing **CPU Samples**, use the **CPU Stacks** view. Always check the number of samples if it corresponds to the tracing time (CPU sampling works when we have enough events). If necessary, zoom into the interesting period using a histogram (select the time and press Alt + R). Checking the **By Name** tab could be enough to find the method responsible for the high CPU Usage (look at the inclusive time and make sure you use correct grouping patterns). When analyzing waits in an application, we should use the **Thread Time Stacks** views. 
The default one, **with StartStop activities**, tries to group the tasks under activities and helps diagnose application activities, such as HTTP requests or database queries. Remember that the exclusive time in the activities view is a sum of all the child tasks. The thread under the activity is the thread on which the task started, not necessarily the one on which it continued.

The **with ReadyThread** view can help when we are looking for thread interactions. For example, we may want to find the thread that released a lock on which a given thread was waiting.

The **Thread Time Stacks** view (with no grouping) is the best one to visualize the application's sequence of actions. Expanding thread nodes in the CallTree could take lots of time, so make sure you use other events (for example, from the Events view) to set the time ranges. As usual, check the grouping patterns.

Diagnosing issues with DLL loading
----------------------------------

Windows Loader snaps are an invaluable source of information when dealing with DLL loading issues. Those are detailed logs of the steps that Windows Loader takes to resolve the application library dependencies. They are one of the available Global Flags that we can set for an executable, so we may use the **gflags.exe** tool to enable them.

![gflags - loader snaps](/assets/img/gflags-loader-snaps.png)

Alternatively, you may modify the process IFEO registry key, for example:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\winver.exe]
"GlobalFlag"=dword:00000002
```

Once enabled, you need to start the failing application under a debugger and the Loader logs should appear in the debug output.

Alternatively, you may collect a procmon or ETW trace and search for any failure in the file events.
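
If you need loader snaps for several executables, the IFEO registry file can be generated instead of edited by hand. The Python sketch below builds such a file for a given image name. `loader_snaps_reg` is a hypothetical helper of my own, and I assume that `0x2` is the only global flag you want to set. Note that a `dword` value in a .reg file is written as exactly eight hexadecimal digits:

```python
FLG_SHOW_LDR_SNAPS = 0x2  # the "Show loader snaps" global flag

IFEO_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
    r"\CurrentVersion\Image File Execution Options"
)


def loader_snaps_reg(image_name, flags=FLG_SHOW_LDR_SNAPS):
    """Build the text of a .reg file that sets GlobalFlag for one image."""
    return (
        "Windows Registry Editor Version 5.00\n\n"
        f"[{IFEO_KEY}\\{image_name}]\n"
        f'"GlobalFlag"=dword:{flags:08x}\n'
    )


if __name__ == "__main__":
    # Print the file content; redirect it to a .reg file and import it with regedit.
    print(loader_snaps_reg("winver.exe"))
```

Remember to remove the `GlobalFlag` value after the investigation, as loader snaps slow down process startup when a debugger is attached.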
Diagnosing window functions (user32) ------------------------------------ The code snippet below contains example commands creating breakpoints to trace window functions: ```cpp # 32-bit bp user32!NtUserSetWindowPos ".printf \"SetWindowPos( hWnd: %p, hWndInsertAfter: %p, X: %d, Y: %d, cx: %d, cy: %d, uFlags: %x )\\n\", poi(@esp+4), poi(@esp+8), poi(@esp+0xC), poi(@esp+0x10), poi(@esp+0x14), poi(@esp+0x18), poi(@esp+0x1C); g" bp user32!NtUserShowWindow ".printf \"ShowWindow( hWnd: %p, nCmdShow: %d )\\n\", poi(@esp+4), poi(@esp+8); g" bp user32!SetWindowLongW ".printf \"SetWindowLongW( hWnd: %p, nIndex: %d, dwNewLong: %p )\\n\", poi(@esp+4), poi(@esp+8), poi(@esp+0xC); g" bp user32!SetForegroundWindow ".printf \"SetForegroundWindow( hWnd: %p )\\n\", poi(@esp+4); g" bp user32!NtUserSetParent ".printf \"SetParent( hWndChild: %p, hWndNewParent: %p )\\n\", poi(@esp+4), poi(@esp+8); g" # 32-bit, but using dx bp user32!NtUserSetWindowPos "dx new { function = \"SetWindowPos\", hWnd = *(void **)(@esp+4), hWndInsertAfter = *(void **)(@esp+8), X = *(int *)(@esp+0xC), Y = *(int *)(@esp+0x10), cx = *(int *)(@esp+0x14), cy = *(int *)(@esp+0x18), uFlags = *(unsigned int *)(@esp+0x1C) }; g" bp user32!NtUserSetForegroundWindow "dx new { function = \"SetForegroundWindow\", hWnd = *(void **)(@esp+4) }; g" bp user32!NtUserShowWindow "dx new { function = \"ShowWindow\", hWnd = *(void **)(@esp+4), nCmdShow = *(int *)(@esp+8) }; g" bp user32!NtUserSetParent "dx new { function = \"SetParent\", hWndChild = *(void **)(@esp+4), hWndNewParent = *(void **)(@esp+8) }; g" bp user32!NtUserSetWindowLong "dx new { function = \"SetWindowLongW\", hWnd = *(void **)(@esp+4), nIndex = *(int *)(@esp+8), dwNewLong = *(long *)(@esp+0xC) }; g" # 64-bit (at function entry the 5th and later arguments start at rsp+0x28, after the return address and the shadow space) bp user32!NtUserSetWindowPos ".printf \"SetWindowPos( hWnd: %p, hWndInsertAfter: %p, X: %d, Y: %d, cx: %d, cy: %d, uFlags: %x )\\n\", @rcx, @rdx, @r8, @r9, poi(@rsp+0x28), poi(@rsp+0x30), poi(@rsp+0x38); g" bp user32!NtUserShowWindow ".printf \"ShowWindow(
hWnd: %p, nCmdShow: %d )\\n\", @rcx, @rdx; g" bp user32!SetWindowLongW ".printf \"SetWindowLongW( hWnd: %p, nIndex: %d, dwNewLong: %p )\\n\", @rcx, @rdx, @r8; g" bp user32!SetForegroundWindow ".printf \"SetForegroundWindow( hWnd: %p )\\n\", @rcx; g" bp user32!NtUserSetParent ".printf \"SetParent( hWndChild: %p, hWndNewParent: %p )\\n\", @rcx, @rdx; g" # 64-bit, but using dx bp user32!NtUserSetWindowPos "dx new { function = \"SetWindowPos\", hWnd = (void *)@rcx, hWndInsertAfter = (void *)@rdx, X = (int)@r8, Y = (int)@r9, cx = *(int *)(@rsp+0x28), cy = *(int *)(@rsp+0x30), uFlags = *(unsigned int *)(@rsp+0x38) }; g" bp user32!SetForegroundWindow "dx new { function = \"SetForegroundWindow\", hWnd = (void *)@rcx }; g" bp user32!NtUserShowWindow "dx new { function = \"ShowWindow\", hWnd = (void *)@rcx, nCmdShow = (int)@rdx }; g" bp user32!_imp_NtUserSetParent "dx new { function = \"SetParent\", hWndChild = (void *)@rcx, hWndNewParent = (void *)@rdx }; g" bp user32!SetWindowLongW "dx new { function = \"SetWindowLongW\", hWnd = (void *)@rcx, nIndex = (int)@rdx, dwNewLong = (long)@r8 }; g" # conditional breakpoints bp user32!PeekMessageW "r $t1 = poi(@esp+4); bp /1 @$ra \".lastevent; dt (combase!tagMSG)@$t1; g\"; g" bp user32!PeekMessageW ".lastevent; r $t1 = poi(@esp+4); r $t2 = poi(@esp+8); .printf \"PeekMessageW(%x, %x)\n\", @$t1, @$t2; ba e1 /1 @$ra \".if (poi(@$t1) == 0x40526) { .lastevent; dt (combase!tagMSG)@$t1; g } .else { g }\"; g" bp user32!PeekMessageW "r $t1 = poi(@esp+4); ba e1 /1 @$ra \".if (poi(@$t1) == 0x7049c) { .lastevent; dt (combase!tagMSG)@$t1; g } .else { g }\"; g" bp user32!SetWindowLongW ".lastevent; dps @esp L4; r $t0 = poi(@esp+c); .if ($t0 = 0) { g }" bp user32!SetWindowLongW ".lastevent; dps @esp L4; r $t0 = poi(@esp+8); .if ($t0 = 0xffffffeb) { r @eip; } .else { g }" bp user32!SetWindowLongW ".lastevent; dps @esp L4; .if (poi(@esp+8) = -2) { r @eip; } .else { g }" ``` When analyzing a TTD trace, it is quicker to list the function calls while 
extracting their parameters to anonymous objects, for example: ```cpp dx -g @$cursession.TTD.Calls("user32!NtUserSetWindowPos").Select(c => new { HWND = c.Parameters[0], WClass = @$scriptContents.findWindow(c.Parameters[0]).className, X = c.Parameters[2], Y = c.Parameters[3], TimeStart = c.TimeStart, SystemTime = c.SystemTimeStart }) dx -g @$cursession.TTD.Calls("user32!SetParentStub").Select(c => new { Child = c.Parameters[0], ChildClass = @$scriptContents.findWindow(c.Parameters[0]).className, Parent = c.Parameters[1], ParentClass = @$scriptContents.findWindow(c.Parameters[1]).className, TimeStart = c.TimeStart, SystemTime = c.SystemTimeStart }) ``` I also created a [**winapi-user32.ps1**](/assets/other/winapi-user32.ps1.txt) script, which decodes some of the window flag values to their text representation, for example: ```sh # load script . winapi-user32.ps1 # decode GWL_STYLE flag Get-EnumFlagsFromMask -Enum ([GWL_STYLE]) -Mask 382664704 # WS_MAXIMIZEBOX # WS_MINIMIZEBOX # WS_THICKFRAME # WS_SYSMENU # WS_DLGFRAME # WS_BORDER # WS_CAPTION # WS_CLIPCHILDREN # WS_CLIPSIBLINGS # WS_VISIBLE # decode GWL_EXSTYLE flag Get-EnumFlagsFromMask -Enum ([GWL_EXSTYLE]) -Mask 262400 # WS_EX_WINDOWEDGE # WS_EX_APPWINDOW Get-EnumFlagsFromMask ([SWP]) 20 # SWP_NOZORDER # SWP_NOACTIVATE ``` {% endraw %} ================================================ FILE: guides/ebpf.md ================================================ --- layout: page title: eBPF date: 2025-12-22 08:00:00 +0200 --- {% raw %} **Table of contents:** - [General information](#general-information) - [bpftrace](#bpftrace) - [Probe Metadata](#probe-metadata) - [Language Syntax](#language-syntax) - [CPU Sampling](#cpu-sampling) - [Available functions](#available-functions) - [My one-liners](#my-one-liners) General information ------------------- [Main project page](https://ebpf.io/) To use eBPF you need to hold the following **required capabilities**: `CAP_BPF`, `CAP_PERFMON` (loading tracing programs), `CAP_NET_ADMIN` 
(loading network programs). bpftrace -------- ### Probe Metadata Information about available probes (instrumentation point for capturing event data) can be retrieved with the **-l** option, e.g.: ```shell bpftrace -l 'tracepoint:syscalls:*execve*' # tracepoint:syscalls:sys_enter_execve # tracepoint:syscalls:sys_enter_execveat # tracepoint:syscalls:sys_exit_execve # tracepoint:syscalls:sys_exit_execveat # and parameters bpftrace -lv 'tracepoint:syscalls:sys_enter_execve*' # tracepoint:syscalls:sys_enter_execve # int __syscall_nr # const char * filename # const char *const * argv # const char *const * envp # tracepoint:syscalls:sys_enter_execveat # int __syscall_nr # int fd # const char * filename # const char *const * argv # const char *const * envp # int flags ``` ### Language Syntax In each event, we can reference one of the **[built-in variables](https://github.com/bpftrace/bpftrace/blob/master/man/adoc/bpftrace.adoc#builtins)**, including: `comm` (process name), `pid`, `tid`, or `args` (a special variable that allows us to access arguments of a given event, e.g., `args.filename` for `tracepoint:syscalls:sys_enter_openat`). Additionally, we can create so-called **scratch variables**, which will only be visible within a given probe: ```perl BEGIN { let $n = (uint8)1; } END { printf("%d", $n) } # error - $n is not available ``` Hashmaps, on the other hand, remain visible throughout the entire script execution: ```perl BEGIN { @myconf["only_stacks"] = (uint8)0; } END { printf("Stats enabled: %d\n", @myconf["only_stats"]); delete(@myconf, "only_stacks"); } ``` We also cannot change the type of either a variable or a value stored in a map, e.g.: ```perl @myconf["pid"] = $# > 1 ? $2 : 0; @myconf["pid"] = "test"; # error ``` The default numeric type is `uint64` and everything is cast to it, e.g.: ```perl # pid will be stored as uint64 even if we cast to int32 BEGIN { @myconf["pid"] = $# > 1 ? 
$2 : 0; } # WARNING: comparison of integers of different signs: 'int32' and 'uint64' can lead to undefined behavior tracepoint:sched:sched_process_fork / @myconf["pid"] != 0 && args.parent_pid == @myconf["pid"] / # OK: tracepoint:sched:sched_process_fork / @myconf["pid"] != 0 && args.parent_pid == (int32)@myconf["pid"] / ``` At the beginning of a script, we can use **preprocessor directives**, but `#define` only works for constants (when I tried to add a function, I got a strange error). In the preamble, we can also create our own types, but it seems we can only use them when working with C pointers. I couldn't initialize a variable of my type (`$t : my_struct = {}`). However, we can use tuples, which should often be sufficient. We reference tuple fields by their numeric index, e.g.: ```perl tracepoint:syscalls:sys_enter_openat { @openat[tid] = (args.dfd, args.filename, args.mode); } tracepoint:syscalls:sys_exit_openat { $eventName = "file_openat"; PRINT_EVENT_COMMONS; $data = @openat[tid]; printf("dfd: %d filename: '%s', mode: %x, ret: %d\n", $data.0, str($data.1), $data.2, args.ret); delete(@openat[tid]); } ``` `BEGIN` is one of the **[available probes](https://github.com/bpftrace/bpftrace/blob/master/man/adoc/bpftrace.adoc#probes)** that allows code execution at the start of a tracing session, e.g.: ```shell bpftrace -e 'BEGIN { printf("hello world\n"); }' # Attaching 1 probe... # hello world # ^C ``` If we prefix a variable name with `@`, we get a hashmap (without a name, we'll use the global hashmap). We can access its keys through square brackets. At the end of tracing, bpftrace outputs all used hashmaps, e.g.: ```perl bpftrace -e 'tracepoint:syscalls:sys_enter_write { @[comm] = count(); }' # Attaching 1 probe... # ^C # # @[rtkit-daemon]: 1 # @[Worker Launcher]: 1 # ... bpftrace -e 'tracepoint:syscalls:sys_enter_write { @write[comm] = count(); } tracepoint:syscalls:sys_enter_writev { @writev[comm] = count(); }' # Attaching 2 probes... 
# ^C # # @write[redshift-gtk]: 2 # @write[syncthing]: 2 # @write[redshift]: 2 # @writev[at-spi2-registr]: 2 # @writev[redshift]: 4 ``` If we add `/ ... /` after a syscall name, we can place **a filter** between the slashes, e.g., `pid == 1234`, to display events only for the process with ID 1234, e.g.: ```perl bpftrace -e 'tracepoint:syscalls:sys_enter_write / comm == "fish" / { @ = count(); }' # Attaching 1 probe... # ^C # # @: 415 ``` We can also use ifs inside action code. `count` is one of the **[map functions](https://github.com/bpftrace/bpftrace/blob/master/man/adoc/bpftrace.adoc#map-functions)** we can use to generate hashmaps. `hist`, `stats` and `avg` are other such functions. ### CPU Sampling bpftrace [supports debuginfod symbols](https://github.com/iovisor/bcc/pull/3393/files) and this is awesome because, for example, ustack or kstack show real stacks. After collecting a trace, it can be converted to a flame graph using scripts from the [FlameGraph](https://github.com/brendangregg/FlameGraph) repository, e.g.: ```perl bpftrace -o test-service.out -q -e 'profile:hz:99 / comm == "test-service" / { @[ustack()] = count(); }' ./stackcollapse-bpftrace.pl test-service.out > test-service.flame ./flamegraph.pl test-service.flame > test-service.flame.svg ``` ### Available functions In the **printf** function, the '-' character before the width means the text will be left-aligned, e.g.: ```perl printf("|%-15s|\n", "TIME"); #|TIME | printf("|%15s|\n", "TIME"); #| TIME| ``` ### My one-liners ```perl # openat with process information and collected stack bpftrace -e 'tracepoint:syscalls:sys_enter_openat / strcontains(comm, "dump-") == 1 / { printf("%d:%s %d %s\n", pid, comm, args.dfd, str(args.filename)); print(ustack()); }' ``` {% endraw %} ================================================ FILE: guides/etw.md ================================================ --- layout: page title: Event Tracing for Windows (ETW) date: 2025-10-02 08:00:00 +0200 redirect_from: - 
/guides/using-etw/ --- {% raw %} **Table of contents:** - [General information](#general-information) - [Tools](#tools) - [Windows Performance Recorder \(WPR\)](#windows-performance-recorder-wpr) - [Profiles](#profiles) - [Starting and stopping the trace](#starting-and-stopping-the-trace) - [Issues](#issues) - [Windows Performance Analyzer \(WPA\)](#windows-performance-analyzer-wpa) - [Installation](#installation) - [Tips on analyzing events](#tips-on-analyzing-events) - [Perfview](#perfview) - [Installation](#installation_1) - [Tips on recording events](#tips-on-recording-events) - [Tips on analyzing events](#tips-on-analyzing-events_1) - [Live view of events](#live-view-of-events) - [Issues](#issues_1) - [logman](#logman) - [Querying providers installed in the system](#querying-providers-installed-in-the-system) - [Starting and stopping the trace](#starting-and-stopping-the-trace_1) - [wevtutil](#wevtutil) - [tracerpt](#tracerpt) - [xperf](#xperf) - [TSS \(TroubleShootingScript toolset\)](#tss-troubleshootingscript-toolset) - [MSO scripts \(PowerShell\)](#mso-scripts-powershell) - [Event types](#event-types) - [Autologger events](#autologger-events) - [System boot events](#system-boot-events) - [File events](#file-events) - [Registry events](#registry-events) - [WPP events](#wpp-events) - [Libraries](#libraries) - [ETW tools and libs \(including EtwEnumerator\)](#etw-tools-and-libs-including-etwenumerator) - [TraceProcessing](#traceprocessing) - [WPRContol](#wprcontol) - [TraceEvent](#traceevent) - [KrabsETW](#krabsetw) - [Performance Logs and Alerts \(PLA\)](#performance-logs-and-alerts-pla) - [System API](#system-api) General information ------------------- When loading **symbols**, the ETW tools and libraries use the **\_NT\_SYMBOL\_PATH** environment variable to download (and cache) the PDB files and **\_NT\_SYMCACHE\_PATH** to store their preprocessed (cached) versions.
An example machine configuration might look as follows:

```shell
setx /M _NT_SYMBOL_PATH "SRV*C:\symbols\dbg*https://msdl.microsoft.com/download/symbols"
setx /M _NT_SYMCACHE_PATH "C:\symcache"
```

On Windows 7 64-bit, to improve stack walking, disable paging of the drivers and kernel-mode system code:

```sh
reg add "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f
# or
wpr.exe -disablepagingexecutive on
```

For **manifest-based providers** set `MatchAnyKeywords` to `0x00` to receive all events. Otherwise, you need to create a bitmask: an event is delivered when at least one of its keyword bits is set in the mask. Additionally, when `MatchAllKeywords` is set, its value is applied to events that passed the `MatchAnyKeywords` test and provides additional AND filtering (all bits of this mask must be set in the event keywords). For **classic providers** set `MatchAnyKeywords` to `0xFFFFFFFF` to receive all events.

Up to 8 sessions may collect manifest-based provider events, but only 1 session may be created for a classic provider (when a new session is created, the provider switches to the new session).

When creating a session, we may also specify the minimal severity level for collected events, where `1` is the critical level and `5` the verbose level (all events are logged).

Tools
-----

### Windows Performance Recorder (WPR)

#### Profiles

As its name suggests, WPR is a tool that records ETW traces and is available on all modern versions of Windows. It is straightforward to use and provides a large number of **ready-to-use tracing profiles**. We can list them with the `-profiles` command and show the details of any profile with the `-profiledetails` command, for example:

```shell
# list available profiles with their short description
wpr.exe -profiles
# ...
#     GeneralProfile    First level triage
#     CPU               CPU usage
#     DiskIO            Disk I/O activity
#     FileIO            File I/O activity
# ...

# show profile details
wpr.exe -profiledetails CPU
# ...
# Profile                 : CPU.Verbose.Memory
#
#   Collector Name        : WPR_initiated_WprApp_WPR System Collector
#   Buffer Size (KB)      : 1024
#   Number of Buffers     : 3258
#   Providers
#     System Keywords
#       CpuConfig
#       CSwitch
#       ...
#       SampledProfile
#       ThreadPriority
#     System Stacks
#       CSwitch
#       ReadyThread
#       SampledProfile
#
#   Collector Name        : WPR_initiated_WprApp_WPR Event Collector
#   Buffer Size (KB)      : 1024
#   Number of Buffers     : 20
#   Providers
#     b7a19fcd-15ba-41ba-a3d7-dc352d5f79ba: : 0xff
#     e7ef96be-969f-414f-97d7-3ddb7b558ccc: 0x2000: 0xff
#     Microsoft-JScript: 0x1: 0xff
#     Microsoft-Windows-BrokerInfrastructure: 0x1: 0xff
#     Microsoft-Windows-DotNETRuntime: 0x20098: 0x05
#     ...
#     Microsoft-Windows-Win32k: 0x80000: 0xff
```

Profiles often come in two versions: verbose and light, and we decide which one to use by appending "Verbose" or "Light" to the main profile name (if we do not specify the version, WPR defaults to "Verbose"), for example:

```sh
wpr.exe -profiledetails CPU.Light
```

The trace can be memory- or file-based, with memory-based being the default. We can switch to the file-based profile by using the `-filemode` option.

If we cannot find a profile for our tracing scenario, we may build a custom one (WPR profile schema is documented [here](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/recording-profile-xml-reference)). It is often easier to base it on one of the existing profiles, which we may extract with the `-exportprofile` command, for example:

```sh
# export the memory-based CPU.Light profile
wpr.exe -exportprofile CPU.Light C:\temp\CPU.light.wprp
# export the file-based CPU.Light profile
wpr.exe -exportprofile CPU.Light C:\temp\CPU.light.wprp -filemode
```

Interestingly, in the XML file, profile names also include the tracing mode, so the memory-based profile will have the name `CPU.Light.Memory`, as you can see in the example below:

```xml
```

An extremely important part of the collector configuration is the buffers.
If we look into the exported profiles, we will find that the number of buffers differs depending on the mode which we use for tracing. Memory-based profiles will use a much higher number of buffers, for example:

```xml
```

The number of buffers also depends on the amount of memory on the host. Because `BufferSize` specifies memory size in KB, the above space is quite large (1GB). In memory mode, we operate on circular in-memory buffers - the system adds new buffers when the previous buffers fill up. When it reaches the maximum, it begins to overwrite events in the oldest buffers. For file-based traces, the number of buffers is much smaller, as we only need to ensure that we are not dropping events because the disk cannot keep up with the write operations.

Apart from keywords and levels, we may **[filter the trace and stack events](https://devblogs.microsoft.com/performance-diagnostics/filtering-events-using-wpr/)** by the event IDs (`EventFilters`, `StackFilters`). Filtering by process name is also possible; however, in my tests I found that the `ProcessExeFilter` works only for processes already running when we start the trace:

```xml
```

Working with WPR profiles is described in detail in a great series of posts on [Microsoft's Performance and Diagnostics blog](https://devblogs.microsoft.com/performance-diagnostics/) and I highly recommend reading them:

- [WPR Start and Stop Commands](https://devblogs.microsoft.com/performance-diagnostics/wpr-start-and-stop-commands/)
- [Authoring custom profiles – Part 1](https://devblogs.microsoft.com/performance-diagnostics/authoring-custom-profiles-part-1/)
- [Authoring Custom Profiles – Part 2](https://devblogs.microsoft.com/performance-diagnostics/authoring-custom-profiles-part-2/)
- [Authoring Custom Profiles – Part 3](https://devblogs.microsoft.com/performance-diagnostics/authoring-custom-profile-part3/)

I also created **an [EtwMetadata.ps1](/assets/other/EtwMetadata.ps1.txt) script that you may use to decode the wprp
files**. For example: ```sh wpr.exe -exportprofile CPU.Light C:\temp\CPU.light.wprp curl.exe -o C:\temp\EtwMetadata.ps1 https://wtrace.net/assets/other/EtwMetadata.ps1.txt . C:\temp\EtwMetadata.ps1 # Initializing ETW providers metadata... Get-EtwProvidersFromWprProfile C:\temp\CPU.light.wprp # WARNING: No metadata found for provider 'b7a19fcd-15ba-41ba-a3d7-dc352d5f79ba' # WARNING: No metadata found for provider 'e7ef96be-969f-414f-97d7-3ddb7b558ccc' # Id Name Keywords # -- ---- -------- # 36b6f488-aad7-48c2-afe3-d4ec2c8b46fa Microsoft-Windows-Performance-Recorder-Control @{Name=PerfStatus; Value=65536} # b675ec37-bdb6-4648-bc92-f3fdc74d3ca2 Microsoft-Windows-Kernel-EventTracing @{Name=ETW_KEYWORD_LOST_EVENT; Val… # 83ed54f0-4d48-4e45-b16e-726ffd1fa4af Microsoft-Windows-Networking-Correlation {@{Name=ActivityTransfer; Value=1}… # d8975f88-7ddb-4ed0-91bf-3adf48c48e0c Microsoft-Windows-RPCSS {@{Name=EpmapDebug; Value=256}, @{… # 6ad52b32-d609-4be9-ae07-ce8dae937e39 Microsoft-Windows-RPC # d49918cf-9489-4bf1-9d7b-014d864cf71f Microsoft-Windows-ProcessStateManager {@{Name=StateChange; Value=1}, @{N… # e6835967-e0d2-41fb-bcec-58387404e25a Microsoft-Windows-BrokerInfrastructure @{Name=BackgroundTask; Value=1} ``` #### Starting and stopping the trace After picking a profile or profiles that we want to use, we can **start a tracing session** with the `-start` command. 
Some examples: ```sh # starts verbose CPU profile wpr.exe -start CPU.verbose # same as above wpr.exe -start CPU # starts light CPU profile wpr.exe -start CPU.light # multiple profiles start wpr.exe -start CPU -start VirtualAllocation -start Network # starts a custom WPRTest.Verbose profile defined in the C:\temp\CustomProfile.wprp file wpr.exe -start "C:\temp\CustomProfile.wprp!WPRTest" -filemode # starts a custom WPRTest.Light profile defined in the C:\temp\CustomProfile.wprp file wpr.exe -start "C:\temp\CustomProfile.wprp!WPRTest.Light" ``` There could be only one WPR trace running in the system and we can check its status using the `-status` command: ```sh wpr -status # Microsoft Windows Performance Recorder Version 10.0.26100 (CoreSystem) # Copyright (c) 2024 Microsoft Corporation. All rights reserved. # # WPR recording is in progress... # # Time since start : 00:00:01 # Dropped event : 0 # Logging mode : File ``` To **terminate the trace** we may use either the `-stop` or the `-cancel` command: ```shell # stopping the trace and saving it to a file with an optional description wpr.exe -stop "C:\temp\testapp-fail.etl" "Abnormal termination of testapp.exe" # cancelling the trace (no trace files will be created) wpr.exe -cancel ``` #### Issues ##### Error 0x80010106 (RPC_E_CHANGED_MODE) If it happens when you run the `-stop` command, use wpr.exe from Windows SDK, build 1950 or later. ##### Error 0xc5580612 If you are using `ProcessExeFilter` in your profile, this error may indicate that a process with a given name is not running when the trace starts (it is thrown by `WindowsPerformanceRecorderControl!WindowsPerformanceRecorder::CControlManager::VerifyAllProvidersEnabled`): ``` An Event session cannot be started without any providers. Profile Id: Wtrace.Verbose.File Error code: 0xc5580612 An Event session cannot be started without any providers. 
```

### Windows Performance Analyzer (WPA)

#### Installation

**Windows Performance Analyzer (wpa.exe)** may be installed from [Microsoft Store](https://apps.microsoft.com/store/detail/windows-performance-analyzer-preview/9N58QRW40DFW?hl=en-sh&gl=sh) (recommended) or as part of the **Windows Performance Toolkit**, included in the [Windows SDK](https://developer.microsoft.com/en-us/windows/downloads/windows-sdk/).

#### Tips on analyzing events

In **CPU Wait analysis**, each row marks a moment when a thread received CPU time ([MS docs](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/cpu-analysis#cpu-usage-precise-graph)) after, for example, waiting on an event object. The `Readying Thread` is the thread that woke up the `New Thread`, and the `Old Thread` is the thread that gave up its place on a CPU to the `New Thread`. The diagram below, from the Microsoft documentation, nicely explains those terms:

![](/assets/img/cpu-usage-precise-diagram.jpg)

Here is an example view of my test GUI app when I call the `Sleep` function after pressing a button:

![](/assets/img/ui-delay-with-cpu-precise.png)

As you can see, the `Wait` column shows the time spent on waiting, while the UI view shows the time when the application was unresponsive.

WPA allows us to **group the call stacks** by tags. The default stacktag list can be found in the `c:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\Catalog\default.stacktags` file.

We may also **extend WPA with our own plugins**. The [SDK repository](https://github.com/microsoft/microsoft-performance-toolkit-sdk/) contains sample extensions. [Wpa.Demystifier](https://github.com/Zhentar/Wpa.Demystifier/tree/master) is another interesting extension to check.
### Perfview #### Installation Could be downloaded from [its release page](https://github.com/microsoft/perfview/releases) or installed with winget: ```sh winget install --id Microsoft.PerfView ``` #### Tips on recording events Most often you will use the Collect dialog, but it is also possible to use PerfView from a command line. An example command collecting traces into a 500MB file (in circular mode) may look as follows: ```sh perfview -AcceptEULA -ThreadTime -CircularMB:500 -Circular:1 -LogFile:perf.output -Merge:TRUE -Zip:TRUE -noView collect ``` A new console window will open with the following text: ``` Pre V4.0 .NET Rundown enabled, Type 'D' to disable and speed up .NET Rundown. Do NOT close this console window. It will leave collection on! Type S to stop collection, 'A' will abort. (Also consider /MaxCollectSec:N) Type 'S' when you are done with tracing and wait (DO NOT CLOSE THE WINDOW) till you see `Press enter to close window`. Then copy the files: PerfViewData.etl.zip and perf.output to the machine when you will perform analysis. ``` If you are also interested in the network traces append the `-NetMonCapture` option. This will generate an additional PerfViewData_netmon.cab file. If we use the EventSource provider and want to collect the call stacks along with the events, we need to append `@StacksEnabled=true` to the provider name, for example: `*EFTrace:@StacksEnabled=true`. #### Tips on analyzing events Select a **time range** and press `Alt+R` to set it for the grid. We may also copy a range, paste it in the Start box and then press Enter to apply it (PerfView should fill the End box). 
The table below contains grouping patterns I use for various analysis targets:

Name | Pattern
-------- | --------
Just my code with folded threads | `[My app + folded threads] \Temporary ASP.NET Files\->;!dynamicClass.S->;!=>OTHER;Thread->AllThreads`
Just my code with folded threads (ASP.NET view) | `[My app + folded threads and ASP.NET requests] Thread -> AllThreads;Request ID * URL: {*}-> URL $1;\Temporary ASP.NET Files\->;!dynamicClass.S->;!=>OTHER`
Just my code with folded threads (Server requests view) | `[My app + folded threads and requests] Thread -> AllThreads;ASP.NET Request: * URL: {*}-> URL $1;\Temporary ASP.NET Files\->;!dynamicClass.S->;!=>OTHER`
Group requests | `^Request ID->ALL Requests`
Group requests by URL | `Request ID * URL:{*}->$1`
Group async calls (by Christophe Nasarre) | `{%}!{%}+<>c__DisplayClass*+<<{%}>b__*>d.MoveNext()->($1) $2 async $3`

When exporting to **Excel**, the data coming from PerfView often does not have valid formatting and contains some strange characters at the beginning or at the end, for example:

```
0000 A0 A0 32 32 34   224
```

We may clean up those values by using the **SUBSTITUTE** function, for example:

```
=SUBSTITUTE(A1,LEFT(A1,1),"")
=SUBSTITUTE(A1,RIGHT(A1,1),"")
```

And later do the usual Copy, Paste as Values operation.

Alternatively, we may copy the values column by column. In that case, PerfView won't insert those special characters.
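If the exported values end up outside Excel (for example, in a text file), the same cleanup can be sketched in C#. This is only an illustration: the `CleanPerfViewValue` helper is hypothetical, and it assumes that the stray `A0` bytes shown in the dump above are non-breaking spaces (U+00A0).

```cs
using System;

// Hypothetical helper: strips non-breaking spaces (U+00A0) and regular spaces
// from both ends of a value copied from PerfView.
static string CleanPerfViewValue(string raw) => raw.Trim('\u00A0', ' ');

// The sample value mirrors the hex dump above: two 0xA0 characters followed by "224"
Console.WriteLine(CleanPerfViewValue("\u00A0\u00A0224")); // prints "224"
```

Unlike the `SUBSTITUTE` formulas, which remove every occurrence of the first (or last) character, `Trim` touches only the ends of the string.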
If we want to open a trace created by PerfView in **WPA**, we need to first convert it, for example:

```sh
perfview /wpr unzip test.etl.zip
# The above command should create two files (.etl and .etl.ngenpdb)
# and we can open wpa
wpa test.etl
```

#### Live view of events

The `Listen` user command enables a live view dump of events in the PerfView log:

```sh
PerfView.exe UserCommand Listen Microsoft-JScript:0x7:Verbose

# inspired by Konrad Kokosa's tweet
PerfView.exe UserCommand Listen Microsoft-Windows-DotNETRuntime:0x1:Verbose:@EventIDsToEnable="1 2"
```

#### Issues

##### Error 0x800700B7 (ERROR_ALREADY_EXISTS)

```
[Kernel Log: C:\tools\PerfViewData.kernel.etl]
Kernel keywords enabled: Default
Aborting tracing for sessions 'NT Kernel Logger' and 'PerfViewSession'.
Insuring .NET Allocation profiler not installed.
Completed: Collecting data C:\tools\PerfViewData.etl (Elapsed Time: 0,858 sec)
Exception Occured: System.Runtime.InteropServices.COMException (0x800700B7): Cannot create a file when that file already exists. (Exception from HRESULT: 0x800700B7)
   at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
   at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.EnableKernelProvider(Keywords flags, Keywords stackCapture)
   at PerfView.CommandProcessor.Start(CommandLineArgs parsedArgs)
   at PerfView.CommandProcessor.Collect(CommandLineArgs parsedArgs)
   at PerfView.MainWindow.c__DisplayClass9.b__7()
   at PerfView.StatusBar.c__DisplayClass8.b__6(Object param0)
An exceptional condition occurred, see log for details.
```

If you receive this error, check with `perfview listsessions` whether a kernel session is still running and, if necessary, stop it with `perfview abort`.
### logman Nowadays, logman will not be our first choice tool to collect ETW trace, but the best thing about it is that it is a built-in tool and has been available in Windows for many years already, so might be the only option if you need to work on a legacy Windows system. #### Querying providers installed in the system Logman is great for querying ETW providers installed in the system or activated in a given process: ```sh # list all providers in the system logman query providers # show details about the ".NET Common Language Runtime" provider logman query providers ".NET Common Language Runtime" # list providers active in a process with ID 808 logman query providers -pid 808 ``` #### Starting and stopping the trace The following commands start and stop a tracing session that is using one provider: ```sh logman start mysession -p {9744AD71-6D44-4462-8694-46BD49FC7C0C} -o "c:\temp\test.etl" -ets & timeout -1 & logman stop mysession -ets ``` For the provider options you may additionally specify the keywords (flags) and levels that will be logged: `-p provider [flags [level]]` You may also use a file with a list of providers: ```sh logman start mysession -pf providers.guids -o c:\temp\test.etl -ets & timeout -1 & logman stop mysession -ets ``` And the `providers.guids` file content is built of lines following the format: `{guid} [flags] [level] [provider name]` (flags, level, and provider name are optional), for example: ``` {AFF081FE-0247-4275-9C4E-021F3DC1DA35} 0xf 5 ASP.NET Events {3A2A4E84-4C21-4981-AE10-3FDA0D9B0F83} 0x1ffe 5 IIS: WWW Server ``` If you want to record events from the **kernel provider** you need to name the session: `NT Kernel Logger`, for example: ```sh logman start "NT Kernel Logger" -p "Windows Kernel Trace" "(process,thread,file,fileio,net)" -o c:\kernel.etl -ets & timeout -1 & logman stop "NT Kernel Logger" -ets ``` To see the available kernel provider keywords, run: ```sh logman query providers "Windows Kernel Trace" # Provider GUID # 
-------------------------------------------------------------------------------
# Windows Kernel Trace                     {9E814AAD-3204-11D2-9A82-006008A86939}
#
# Value               Keyword              Description
# -------------------------------------------------------------------------------
# 0x0000000000000001  process              Process creations/deletions
# 0x0000000000000002  thread               Thread creations/deletions
# ...
```

Additionally, we may change the way events are saved to the file using the `-mode` parameter. For example, to use a circular file with a maximum size of 200MB, we can run the following command:

```sh
logman start "NT Kernel Logger" -p "Windows Kernel Trace" "(process,thread,img)" -o C:\ntlm-kernel.etl -mode circular -max 200 -ets
```

### wevtutil

Wevtutil is a built-in tool that allows us to manage **manifest-based providers (publishers)** installed in our system. Example usages:

```sh
# list all installed publishers
wevtutil ep

# find MSMQ publishers
wevtutil ep | findstr /i msmq

# extract details about a Microsoft-Windows-MSMQ publisher
wevtutil gp Microsoft-Windows-MSMQ /ge /gm /f:xml
```

### tracerpt

Tracerpt is another built-in tool. It may collect ETW traces, but I usually use it only to convert etl files from binary to text format. Example commands:

```sh
# convert etl file to evtx
tracerpt -of EVTX test.etl -o test.evtx -summary test-summary.xml

# dump events to an XML file
tracerpt test.etl -o test.xml -summary test-summary.xml

# dump events to a HTML file
tracerpt.exe '.\NT Kernel Logger.etl' -o -report -f html
```

### xperf

For a long time xperf was the best tool to collect ETW traces, providing ways to configure many aspects of the tracing sessions. It is now considered legacy (with [wpr](#windows-performance-recorder-wpr) being its replacement), but many people still find its command line syntax easier to use than WPR profiles.
Here are some usage examples:

```sh
# list available Kernel Flags
xperf -providers KF
# PROC_THREAD    : Process and Thread create/delete
# LOADER         : Kernel and user mode Image Load/Unload events
# PROFILE        : CPU Sample profile
# CSWITCH        : Context Switch
# ...

# list available Kernel Groups
xperf -providers KG
# Base           : PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+PROFILE+MEMINFO+MEMINFO_WS
# Diag           : PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+DPC+INTERRUPT+CSWITCH+PERF_COUNTER+COMPACT_CSWITCH
# DiagEasy       : PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+DPC+INTERRUPT+CSWITCH+PERF_COUNTER
# ...

# list installed providers
xperf -providers I
# 0063715b-eeda-4007-9429-ad526f62696e : Microsoft-Windows-Services
# 0075e1ab-e1d1-5d1f-35f5-da36fb4f41b1 : Microsoft-Windows-Network-ExecutionContext
# 00b7e1df-b469-4c69-9c41-53a6576e3dad : Microsoft-Windows-Security-IdentityStore
# 01090065-b467-4503-9b28-533766761087 : Microsoft-Windows-ParentalControls
# ...

# start the kernel trace, enabling flags defined in the DiagEasy group
xperf -on DiagEasy
# stop the kernel trace
xperf -stop -d "c:\temp\DiagEasy.etl"

# start the kernel trace with some additional settings and wait for the user to stop it
xperf -on Latency -stackwalk Profile -buffersize 2048 -MaxFile 1024 -FileMode Circular && timeout -1 && xperf -stop -d "C:\highCPUUsage.etl"

# in user-mode tracing you may still use kernel flags and groups but for each user-trace provider
# you need to add some additional parameters: -on (GUID|KnownProviderName)[:Flags[:Level[:0xnnnnnnnn|'stack|[,]sid|[,]tsid']]]
xperf -start ClrRundownSession -on ClrAll:0x118:5+a669021c-c450-4609-a035-5af59af4df18:0x118:5 -f clr_DCend.etl -buffersize 128 -minbuffers 256 -maxbuffers 512
timeout /t 15
xperf -stop ClrSession ClrRundownSession -stop -d cpu_clr.etl

# dump collected events to a text file
xperf -i test.etl -o test.csv
```

Chad Schultz published [many xperf scripts](https://github.com/itoleck/WindowsPerformance/tree/main/ETW/Tools/WPT/Xperf/CaptureScripts) in the
[WindowsPerformance repository](https://github.com/itoleck/WindowsPerformance), so check them out if you are interested in using xperf.

### TSS (TroubleShootingScript toolset)

TSS contains tons of various scripts and ETW is only a part of it. The official TSS documentation is [here](https://learn.microsoft.com/en-us/troubleshoot/windows-client/windows-tss/introduction-to-troubleshootingscript-toolset-tss). Here is an example PowerShell command that downloads the package (from `https://aka.ms/getTSS`), unpacks it to `C:\TSS`, and runs the main script:

```shell
powershell.exe -NoProfile -ExecutionPolicy RemoteSigned -Command "Invoke-WebRequest -Uri https://aka.ms/getTSS -OutFile $env:TEMP\TSS.zip; Unblock-File $env:TEMP\TSS.zip; Expand-Archive -Force -LiteralPath $env:TEMP\TSS.zip -DestinationPath C:\TSS; Remove-Item $env:TEMP\TSS.zip; C:\TSS\TSS.ps1 -ListSupportedTrace"
```

TSS defines many **troubleshooting scenarios** with predefined parameters:

```shell
C:\tSS\TSS.ps1 -ListSupportedScenarioTrace
# ...
# NET_General - collects CommonTask NET, NetshScenario InternetClient_dbg, Procmon, PSR, Video, SDP NET, xray, CollectComponentLog
# ...
```

where:

- `CommonTask` are commands run before and after the scenario (only `NET` in this case)
- `NetshScenario` is the selected netsh scenario (`InternetClient_dbg`)
- `Procmon` will start procmon
- `PSR` will run step recorder
- `Video` will record a video of what the user is doing
- `SDP` (Support Diagnostic Package) and `NET` enable `General`, `SMB`, and `NET` counters
- `xray` runs xray scripts to discover existing problems
- `CollectComponentLog` collects logs of commands run in a given scenario

To start a scenario, we run:

```shell
C:\TSS\TSS.ps1 -Scenario NET_General
```

We may also manually "compose" the TSS command. A nice GUI tool for this purpose is `.\TSSGUI.ps1` (start it from the TSS folder).
We may also list available TSS features:

```shell
C:\TSS\TSS.ps1 -ListSupportedCommands
C:\TSS\TSS.ps1 -ListSupportedControls
C:\TSS\TSS.ps1 -ListSupportedDiag
C:\TSS\TSS.ps1 -ListSupportedLog
C:\TSS\TSS.ps1 -ListSupportedNetshScenario
C:\TSS\TSS.ps1 -ListSupportedNoOptions
C:\TSS\TSS.ps1 -ListSupportedPerfCounters
C:\TSS\TSS.ps1 -ListSupportedScenarioTrace
C:\TSS\TSS.ps1 -ListSupportedSDP
C:\TSS\TSS.ps1 -ListSupportedSetOptions
C:\TSS\TSS.ps1 -ListSupportedTrace
C:\TSS\TSS.ps1 -ListSupportedWPRScenario
C:\TSS\TSS.ps1 -ListSupportedXperfProfile
```

Example commands to check which ETW providers the `NET_COM` component is using:

```shell
.\TSS.ps1 -ListSupportedTrace | select-string "_COM"
# [Component] -NET_COM COM/DCOM/WinRT/PRC component tracing. -EnableCOMDebug will enable further debug logging
# [Component] -UEX_COM COM/DCOM/WinRT/PRC component ETW tracing. -EnableCOMDebug will enable further debug logging
# Usage:
# .\TSS.ps1 - -
# Example: .\TSS.ps1 -UEX_FSLogix -UEX_Logon

.\TSS -ListETWProviders NeT_COM
# List of 20 Provider GUIDs (Flags/Level) for ComponentName: NET_COM
# ==========================================================
# {9474a749-a98d-4f52-9f45-5b20247e4f01}
# {bda92ae8-9f11-4d49-ba1d-a4c2abca692e}
# ...
```

The TSS commands create reports in the `C:\MS_DATA` folder.

To collect the trace in the background, we may use the `-StartNoWait` option and `-Stop` to stop the trace.

If we add the `-StartAutoLogger` option, our trace will start when the system boots. We stop it by calling `TSS.ps1 -Stop`, as usual.
Example commands: ```shell # starting WPR using TSS C:\TSS\TSS.ps1 -WPR CPU -WPROptions "-start Dotnet -start DesktopComposition" # Starting time travel debugging session using TSS # 1234 is the process PID (we may use process name as well, for example winver.exe) C:\TSS\TSS.ps1 -AcceptEula -TTD 1234 ``` ### MSO scripts (PowerShell) [MSO-Scripts repository](https://github.com/microsoft/MSO-Scripts) hosts many interesting PowerShell scripts for working with ETW traces. Event types ----------- ### Autologger events Autologger ETW session collects events appearing after the system start. It can be enabled with wpr: ```sh wpr.exe -boottrace -addboot FileIO ``` Additional information: - [Autologger session](https://learn.microsoft.com/en-us/windows/win32/etw/configuring-and-starting-an-autologger-session) - [Autologger with WPR](https://devblogs.microsoft.com/performance-diagnostics/setting-up-an-autologger-with-wpr/) ### System boot events To collect general profile traces use: ```sh wpr.exe -start generalprofile -onoffscenario boot -numiterations 1 ``` ### File events Described in [a post on my blog](https://lowleveldesign.org/2020/08/15/fixing-empty-paths-in-fileio-events-etw/). ### Registry events Described in [a post on my blog](https://lowleveldesign.org/2020/08/20/monitoring-registry-activity-with-etw/). ### WPP events WPP events are legacy events, for which we need TMF files to decode their payload. TMF may be available as standalone files or they might be embedded into PDB files. For the latter case, we may extract them using **tracepdb.exe**, for example: ```sh tracepdb.exe -f .\combase.pdb -p .\tmfs ``` TMF data is stored as a binary block in the PDB file: ``` 0D9:46A0 BA 00 19 10 20 52 0A 00 01 00 06 00 54 4D 46 3A º... 
R......TMF: 0D9:46B0 00 64 61 66 38 39 65 63 31 2D 64 66 66 32 2D 33 .daf89ec1-dff2-3 0D9:46C0 30 35 35 2D 36 30 61 62 2D 36 33 64 34 63 31 31 055-60ab-63d4c11 0D9:46D0 62 33 64 39 63 20 4F 4C 45 43 4F 4D 20 2F 2F 20 b3d9c OLECOM // 0D9:46E0 53 52 43 3D 63 6F 6D 74 72 61 63 65 77 6F 72 6B SRC=comtracework 0D9:46F0 65 72 2E 63 78 78 20 4D 4A 3D 20 4D 4E 3D 00 23 er.cxx MJ= MN=.# 0D9:4700 74 79 70 65 76 20 63 6F 6D 74 72 61 63 65 77 6F typev comtracewo 0D9:4710 72 6B 65 72 5F 63 78 78 31 38 36 20 31 31 20 22 rker_cxx186 11 " 0D9:4720 25 30 25 31 30 21 73 21 22 20 2F 2F 20 20 20 4C %0%10!s!" // L 0D9:4730 45 56 45 4C 3D 57 41 52 4E 49 4E 47 00 7B 00 6D EVEL=WARNING.{.m 0D9:4740 65 73 73 61 67 65 2C 20 49 74 65 6D 57 53 74 72 essage, ItemWStr 0D9:4750 69 6E 67 20 2D 2D 20 31 30 00 7D 00 BA 00 19 10 ing -- 10.}.º... ``` The GUID at the beginning of the block defines the provider ID and may appear multiple times in the PDB file. Tracepdb uses this ID as the name of the generated TMF file. When decoding WPP events, if we do not configure the `TDH_CONTEXT_WPP_TMFSEARCHPATH`, Tdh functions will look for TMF files in the path specified in the [TRACE_FORMAT_SEARCH_PATH environment variable](https://learn.microsoft.com/en-us/windows/win32/api/tdh/ne-tdh-tdh_context_type). **WPA** has a special view for WPP events and can load the TMF manifests from symbol files, so **remember to first load the symbols**. Libraries --------- This section lists some of the ETW libraries I used with my notes about them. It is not meant to be a comprehensive documentation of those libraries, but rather a list of tips and tricks. ### ETW tools and libs (including EtwEnumerator) [Source code](https://github.com/microsoft/ETW) This C++ library contains code to parse ETW events. The sample EtwEnumerator CLI tool formats events from a binary etl file to their text representation. To build the library run: ```shell cd EtwEnumerator cmake -B bin . 
cmake --build bin
```

The `EtwEnumerator` instance stores information about the currently analyzed event in an efficient way, caching metadata for future processing of similar events. Please check the [README](https://github.com/microsoft/ETW/tree/main/EtwEnumerator).

Below is example C# code that formats an event to a JSON string in the [ETW callback function](https://learn.microsoft.com/en-us/windows/win32/api/evntrace/nc-evntrace-pevent_record_callback):

```cs
EtwStringViewZ etwString;
fixed (char* formatPtr = "[%9]%8.%3::%4 [%1]")
{
    if (!ee->FormatCurrentEvent((ushort*)formatPtr, EtwJsonSuffixFlags.EtwJsonSuffixFlags_Default, &etwString))
    {
        Trace.WriteLine("ERROR");
        return;
    }
}
var s = new string((char*)etwString.Data, 0, (int)etwString.DataLength);
writer.TryWrite(new MessageEvent(s));
```

### TraceProcessing

[Documentation](https://learn.microsoft.com/en-us/windows/apps/trace-processing/) | [Code samples](https://github.com/microsoft/eventtracing-processing-samples)

The TraceProcessing library **categorizes the events and splits them between Trace Processors**. Before processing the trace, we mark the Trace Processors that we want to activate, and, after the analysis finishes, we may query the events they processed, for example:

```cs
using var trace = TraceProcessor.Create(traceFilePath);

var pendingProcesses = trace.UseProcesses();
var pendingFileIO = trace.UseFileIOData();

trace.Process();

var filecopyProcess = pendingProcesses.Result.Processes.Where(p => p.ImageName == "filecopy.exe").First();

var fev = pendingFileIO.Result.CreateFileObjectActivity.First(f => f.IssuingProcess.Id == filecopyProcess.Id
    && f.FileName == "sampling-2-1.etl");

Console.WriteLine($"Create file event: {fev.Path} ({fev.FileObject})");
```

The above code uses the buffered mode of opening a trace file, in which all processed events land in memory (we may notice that the application memory consumption will be really high for bigger traces).
Therefore, for bigger traces we may also use [the streaming mode](https://learn.microsoft.com/en-us/windows/apps/trace-processing/streaming), but not all event types support it. An example session using streaming mode might be coded as follows:

```cs
using var trace = TraceProcessor.Create(traceFilePath);

var pendingProcesses = trace.UseProcesses();

int filecopyProcessId = 0;
long eventCount = 0;
long filecopyEventCount = 0;

// ConsumerSchedule defines when our parser will be called, for example, we may choose
// SecondPass when buffered processors will be available
trace.UseStreaming().UseUnparsedEvents(ConsumerSchedule.Default, context =>
{
    eventCount++;
});

trace.UseStreaming().UseUnparsedEvents(ConsumerSchedule.SecondPass, context =>
{
    if (filecopyProcessId == 0)
    {
        filecopyProcessId = pendingProcesses.Result.Processes.Where(p => p.ImageName == "filecopy.exe").First().Id;
    }
    if (context.Event.ProcessId == filecopyProcessId)
    {
        filecopyEventCount++;
    }
});

trace.Process();

return (filecopyEventCount, eventCount);
```

In my tests, I discovered that the **GenericEvents** processor is not very reliable, as I could not find some of the events (for example, FileIo) that were visible in other tools, but maybe I was doing something wrong :)

### WPRContol

WPRControl is the COM object used by, for example, wpr.exe. Its API is [well-documented](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/wprcontrol-api-reference), with `KernelTraceControl.h` and `WindowsPerformanceRecorderControl.h` headers and IDLs available for our usage.

### TraceEvent

[Source code](https://github.com/microsoft/perfview/tree/main/src/TraceEvent) | [Documentation](https://github.com/microsoft/perfview/tree/main/documentation)

TraceEvent is a huge library which is the tracing engine that PerfView uses for collecting and processing events.
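As a first orientation, a live kernel session with TraceEvent can be sketched as below. This is a minimal sketch and not code taken from PerfView: the session name `traceevent-lab` is made up, and the example assumes the `Microsoft.Diagnostics.Tracing.TraceEvent` NuGet package, an elevated console, and Windows 8 or later (on older systems the kernel provider can only be enabled in the session named `NT Kernel Logger`).

```cs
using System;
using Microsoft.Diagnostics.Tracing;
using Microsoft.Diagnostics.Tracing.Parsers;
using Microsoft.Diagnostics.Tracing.Session;

// "traceevent-lab" is a hypothetical session name chosen for this example
using var session = new TraceEventSession("traceevent-lab");

// Ctrl+C stops the session, which makes Process() below return
Console.CancelKeyPress += (sender, ev) => { ev.Cancel = true; session.Stop(); };

// enable only the process start/stop events of the kernel provider
session.EnableKernelProvider(KernelTraceEventParser.Keywords.Process);

// the callback receives a strongly typed event object that is valid only
// for the duration of the call
session.Source.Kernel.ProcessStart += data =>
{
    Console.WriteLine($"{data.ImageFileName} started with PID {data.ProcessID}");
};

// blocks and dispatches events until the session is stopped
session.Source.Process();
```

The callback prints the values immediately instead of storing the `data` object, which sidesteps the object-reuse problem described next.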
When iterating through collected events, remember to clone the events you need for future processing, as the current `TraceEvent` instance is overwritten in memory by the next analyzed event. For example, the `requestStartEvent` and `requestStopEvent` in the code below will contain invalid data at the end of the loop (we should be calling `ev.Clone()` to save the event):

```cs
TraceEvent? requestStartEvent = null, requestStopEvent = null;
foreach (var ev in traceLog.Events.Where(ev => ev.ProviderGuid == aspNetProviderId))
{
    if (ev.ActivityID == activityIdGuid)
    {
        if (ev.ID == (TraceEventID)2) // Request/Start
        {
            requestStartEvent = ev;
        }
        if (ev.ID == (TraceEventID)3) // Request/Stop
        {
            requestStopEvent = ev;
        }
    }
}
// requestStartEvent and requestStopEvent contain invalid data because the object
// they use internally has been overwritten by later events
```

If you are interested in how the TraceEvent library processes the ETW events, a good place to start is the `ETWTraceEventSource.RawDispatchClassic` event callback function. It uses `TraceEvent.Lookup` to create the final instance of the `TraceEvent` class.

### KrabsETW

[Source code](https://github.com/microsoft/krabsetw)

KrabsETW is used by the Office 365 Security team. An example code to start a live session looks as follows:

```cs
using Microsoft.O365.Security.ETW;
using Microsoft.O365.Security.ETW.Kernel;

using var trace = new KernelTrace("krabsetw-lab");

var processProvider = new ProcessProvider();
processProvider.OnEvent += (record) =>
{
    if (record.Opcode == 0x01)
    {
        var image = record.GetAnsiString("ImageFileName", "Unknown");
        var pid = record.GetUInt32("ProcessId", 0);
        Console.WriteLine($"{image} started with PID {pid}");
    }
};
trace.Enable(processProvider);

Console.CancelKeyPress += (sender, ev) =>
{
    ev.Cancel = true;
    trace.Stop();
};

trace.Start();
```

KrabsETW is implemented in C++/CLI, which complicates the deployment.
Firstly, I needed to add `win-x64` to my csproj file to fix a problem with missing `Ijwhost.dll` library. However, it still produced errors when trimming and the application was failing: ```sh dotnet publish -c release -r win-x64 -p:PublishSingleFile=true -p:PublishTrimmed=true --self-contained -p:IncludeNativeLibrariesForSelfExtract=true # MSBuild version 17.6.8+c70978d4d for .NET # Determining projects to restore... # All projects are up-to-date for restore. # krabsetw-lab -> C:\code\krabsetw-lab\bin\release\net7.0-windows\win-x64\krabsetw-lab.dl # l # Optimizing assemblies for size. This process might take a while. # C:\Users\me\.nuget\packages\microsoft.o365.security.native.etw\4.3.1\lib\net6.0\Microsoft.O365.Security.Native.ETW.dll # : warning IL2104: Assembly 'Microsoft.O365.Security.Native.ETW' produced trim warnings. For more information see https: # //aka.ms/dotnet-illink/libraries [C:\code\krabsetw-lab\krabsetw-lab.csproj] # krabsetw-lab -> C:\code\krabsetw-lab\bin\release\net7.0-windows\win-x64\publish\ ``` ```sh krabsetw-lab.exe # Unhandled exception. System.BadImageFormatException: # File name: 'C:\code\krabsetw-lab\bin\release\net7.0-windows\win-x64\publish\Microsoft.O365.Security.Native.ETW.dll' # at Program.
$(String[] args) ``` When processing events, KrabsETW uses `schema_locator` to cache and decode payload of a given event: ```cpp struct schema_key { guid provider; uint16_t id; uint8_t opcode; uint8_t version; uint8_t level; // ... } inline const PTRACE_EVENT_INFO schema_locator::get_event_schema(const EVENT_RECORD &record) const { // check the cache auto key = schema_key(record); auto& buffer = cache_[key]; if (!buffer) { auto temp = get_event_schema_from_tdh(record); buffer.swap(temp); } return (PTRACE_EVENT_INFO)(buffer.get()); } ``` ### Performance Logs and Alerts (PLA) [Documentation](https://learn.microsoft.com/en-us/previous-versions/windows/desktop/pla/pla-portal) PLA is a COM library used by logman to provide trace collection options. The library registration can be located in the registry: ``` Computer\HKEY_CLASSES_ROOT\CLSID\{03837513-098B-11D8-9414-505054503030} ``` The main DLLs are **pla.dll** and **plasrv.exe**. For example, the `ITraceDataProviderCollection::GetTraceDataProvidersByProcess` method, responsible for querying providers in a process, calls `TraceSession::LoadGuidArray`, which then uses `EnumerateTraceGuidsEx`. ### System API [Documentation](https://learn.microsoft.com/en-us/windows/win32/api/_etw/) Low-level API to collect and analyze traces - all above libraries use these functions. 
{% endraw %} ================================================ FILE: guides/gdb.md ================================================ --- layout: page title: GDB usage guide date: 2025-05-27 08:00:00 +0200 --- {% raw %} **Table of contents:** - [Configuration](#configuration) - [.gdbinit](#gdbinit) - [ptrace capability](#ptrace-capability) - [TUI](#tui) - [Symbols](#symbols) - [Searching for symbols and addresses](#searching-for-symbols-and-addresses) - [Searching for source code](#searching-for-source-code) - [Debugging child processes](#debugging-child-processes) - [Execution Control](#execution-control) - [Process startup](#process-startup) - [Breakpoints and catchpoints](#breakpoints-and-catchpoints) - [Code execution](#code-execution) - [Signals](#signals) - [State Control](#state-control) - [Process information](#process-information) - [Threads](#threads) - [Shared libs](#shared-libs) - [Stack](#stack) - [Code and Assembler](#code-and-assembler) - [Memory](#memory) - [Expressions \(variables, registers, etc.\)](#expressions-variables-registers-etc) - [Extensions](#extensions) - [Python interpreter](#python-interpreter) - [GUI / CUI](#gui-cui) Configuration ------------- ### .gdbinit It's worth enabling the following elements permanently in the `~/.gdbinit` file: ```shell # show disassembly on every stop and use intel syntax set disassembly-flavor intel set disassemble-next-line on # enable debuginfod set debuginfod enable on # stop on forking and exceptions catch fork catch vfork catch throw catch rethrow ``` We may check the debuginfod settings in GDB: ``` (gdb) show debuginfod debuginfod enabled: Debuginfod functionality is currently set to "ask". debuginfod urls: Debuginfod URLs have not been set. debuginfod verbose: Debuginfod verbose output is set to 1. ``` ### ptrace capability To ptrace any process, you may add ptrace capability to gdb: ```shell sudo setcap cap_sys_ptrace=eip $(which gdb) ``` TUI --- This is a windowed interface for GDB. 
We enable/disable it using **Ctrl-x a**. **Ctrl-x 1** enables single-window mode, **Ctrl-x 2** enables two-window mode. The `tui layout` command determines what appears in the windows, e.g.:

```shell
tui layout split
tui layout src
```

**Ctrl-x o** allows us to switch between active debugger windows.

Symbols
-------

### Searching for symbols and addresses

`info address` finds the address associated with a given symbol, `info symbol` finds the symbol associated with a given memory address, for example:

```shell
info address lo_getattr
# Symbol "lo_getattr" is a function at address 0x555555556af0.
info symbol 0x555555556af0
# lo_getattr in section .text of /tmp/passthrough-minimal/passthrough_ll
```

`info types` searches for type declarations (accepts regexes). For functions we have `info functions`, e.g.:

```shell
info functions statx
# All functions matching regular expression "statx":
#
# File ../sysdeps/unix/sysv/linux/statx.c:
# 25: int statx(int, const char *, int, unsigned int, struct statx *);
#
# File ./statx_generic.c:
# 42: static int statx_generic(int, const char *, int, struct statx *, unsigned int);
```

`ptype` allows viewing the definition of a given type. As a parameter we can provide either the type name or a variable of that type, e.g.:

```shell
ptype struct link_map
# type = struct link_map {
#     Elf64_Addr l_addr;
#     char *l_name;
#     Elf64_Dyn *l_ld;
#     struct link_map *l_next;
#     struct link_map *l_prev;
#     struct link_map *l_real;
#     Lmid_t l_ns;
#     ...
# }
```

`info scope` - shows symbols currently available in a given scope, e.g. for function `match_symbol`:

```shell
info scope match_symbol
# Scope for match_symbol:
# Symbol digits is optimized out.
# Symbol _itoa_word is a function at address 0x7ffff7fda11a, length 1.
# Symbol value is multi-location:
#   Range 0x7ffff7fda21c-0x7ffff7fda220: a variable in $rcx
#   Range 0x7ffff7fda240-0x7ffff7fda26f: a variable in $rcx
#   Range 0x7ffff7fda26f-0x7ffff7fda275: a variable in $rdx
# , length 8.
# Symbol buflim is multi-location: # Range 0x7ffff7fda21c-0x7ffff7fda220: a complex DWARF expression: # 0: DW_OP_fbreg -109 # 3: DW_OP_stack_value # # Range 0x7ffff7fda220-0x7ffff7fda275: a variable in $rsi # , length 8. # Symbol base is multi-location: # Range 0x7ffff7fda21c-0x7ffff7fda275: the constant 10 # , length 4. # Symbol upper_case is multi-location: # Range 0x7ffff7fda21c-0x7ffff7fda275: the constant 0 # , length 4. # Symbol digits is optimized out. ``` ### Searching for source code The `directory` command allows us to add additional directories for the source code: ```shell show directories # Source directories searched: $cdir:$cwd directory /tmp/openssl-0.9.8-easy_tls-orig/ # Source directories searched: /tmp/openssl-0.9.8-easy_tls-orig:$cdir:$cwd ``` Debugging child processes ------------------------- `set detach-on-fork off` makes the debugger debug both the parent and its fork. If we don't enable this, we can decide what happens at fork using `set follow-fork-mode`. The `parent` option will cause the debugger to continue debugging the parent. The `child` option will switch to the child. At the moment of fork, it's possible that `continue` won't work. We then need to allow both parent and child to execute simultaneously using `set schedule-multiple on`. We can view currently debugged processes with the `info inferiors` command, and switch between them with `inferior {ID}`. Execution Control ----------------- ### Process startup GDB takes the debugged binary as an argument. Then we can add startup arguments with the `run` command. It also accepts stdin redirection from any file, e.g.: ```shell r mytestapp < /tmp/test_file ``` ### Breakpoints and catchpoints `b func_name` sets a breakpoint on a function, `b file:line_num` sets a breakpoint on a line. Additionally, we have special breakpoints called catchpoints for handling events (somewhat similar to `sxe` in WinDbg), e.g. 
`catch fork` to stop the debugger at fork: ```shell catch fork # Catchpoint 1 (fork) info breakpoints # Num Type Disp Enb Address What # 1 catchpoint keep y fork # 2 breakpoint keep y 0x00000000004044f0 in main at test.c:38 # breakpoint already hit 1 time # Catchpoint 1 (forked process 5473), arch_fork (ctid=0x7ffff7ca8690) at ../sysdeps/unix/sysv/linux/arch-fork.h:50 # 50 ret = INLINE_SYSCALL_CALL (clone, flags, 0, NULL, ctid, 0); # => 0x00007ffff7db9b57 <__GI__Fork+39>: 48 3d 00 f0 ff ff cmp rax,0xfffffffffffff000 # 0x00007ffff7db9b5d <__GI__Fork+45>: 77 39 ja 0x7ffff7db9b98 <__GI__Fork+104> # set detach-on-fork off c # Continuing. # [New inferior 2 (process 5473)] # [Thread debugging using libthread_db enabled] # Using host libthread_db library "/usr/lib/libthread_db.so.1". ``` The `catch` command alone will display available events (similar to `sx` in WinDbg). `rb function_regex` allows setting breakpoints based on regular expressions: ```shell rb ssl_shim::wrapped_.* # Breakpoint 2 at 0x7ffff7f741cd: file src/lib.rs, line 308. # fn ssl_shim::wrapped_SSL_CTX_check_private_key(*mut ssl_shim::ssl::ssl_ctx_st) -> i32; # Breakpoint 3 at 0x7ffff7f74539: file src/lib.rs, line 406. # fn ssl_shim::wrapped_SSL_CTX_ctrl(*mut ssl_shim::ssl::ssl_ctx_st, i32, i64, *mut core::ffi::c_void) -> i64; # Breakpoint 4 at 0x7ffff7f73b4e: file src/lib.rs, line 124. # fn ssl_shim::wrapped_SSL_CTX_free(*mut ssl_shim::ssl::ssl_ctx_st); # Breakpoint 5 at 0x7ffff7f73f0d: file src/lib.rs, line 226. # fn ssl_shim::wrapped_SSL_CTX_get_client_CA_list(*mut ssl_shim::ssl::ssl_ctx_st) -> *mut ssl_shim::ssl::stack_st_X509_NAME; # ... ``` `info break` lists the breakpoints and catchpoints. `disable ID` disables the breakpoint, `enable ID` enables the breakpoint, `del ID` deletes the breakpoint. 
`commands ID` allows us to assign a command to a breakpoint:

```shell
b easy_tls.c:991
info b
# Num Type Disp Enb Address What
# 1 breakpoint keep y easy_tls.c:991
commands 1
# Type commands for breakpoint(s) 1, one per line.
# End with a line saying just "end".
>print r
>end
```

`watch {var}` - break if the value of the variable changes

### Code execution

| Command             | Description |
|---------------------|-------------|
| `r {args}`          | (re)run the program |
| `s`                 | step in |
| `n`                 | step over |
| `u`                 | until the next line (for example, to exit the loop) |
| `c`                 | continue |
| `ret {return_code}` | return from the current function |
| `j {line}`          | jump to a given line |

`info registers` - current register state for the selected stack frame

`info threads` - list the active threads

`thread {num}` - switch focus to thread `{num}`

`info inferiors` - list the debugged processes (when `detach-on-fork` is `off`)

`inferior {num}` - switch focus to a process `{num}`

### Signals

The debugger intercepts some signals (e.g. SIGINT) and handles them. To send such a signal to the application we can use the `signal` command, e.g. `signal SIGINT`.

State Control
-------------

### Process information

We can view currently debugged processes with the `info inferiors` command, and switch between them with `inferior {ID}`. `info proc` and its subcommands provide insight into the internals of the executing process, e.g.:

```shell
info proc
# process 10372
# cmdline = '/tmp/easy_tls_0_9_8o_stripped'
# cwd = '/tmp'
# exe = '/tmp/easy_tls_0_9_8o_stripped'
```

### Threads

`info threads` to list the threads, `thread ID` to switch to a thread. We can also execute a command on all the threads by using `thread apply all`, for example `thread apply all bt`.
### Shared libs

`info proc exe` shows information about the main module

`info sharedlibrary` (`info dll` on Windows targets) - shows the status of loaded libraries

### Stack

`bt` - shows the stack

`f {num}` - selects a stack frame `{num}` as active

`up` or `down` - moves up or down the stack

### Code and Assembler

`list` shows the current location in sources. You can also list a function by passing its name as a parameter.

`disassemble /s` shows the assembly code of the current function along with source code, if available. You can provide start and end addresses of any location in memory as parameters.

### Memory

`x` (examine) reads memory in a given format, for example `x/10fg arr` reads 10 items of type double (8-byte floating-point values) from an array

`info proc mappings` lists memory regions occupied by the process.

### Expressions (variables, registers, etc.)

`info locals` - show local variables

`info args` - show all arguments to the function

`info variables` - show global and static variables

`print EXP` allows executing a given expression and saving the result in history under some variable, e.g.:

```shell
print $rcx
# $1 = 0

# print the first 10 elements of the array arr
p *arr@10
```

`output` works similarly but doesn't save the result in history and doesn't insert a newline character.

In GDB, you may create custom variables with `set`, for example, `set $t = my_var->t`. Each printed value also lands in the value history under a numbered name, which you may use to reference it in later expressions:

```
(gdb) p x
$12 = (int) 2
```

Here `$12` refers to the printed value, and a bare `$` refers to the most recent value in the history.
To print structures we may use GDB functions `display EXP` - display variable on each debugger break (can be called multiple times) `undisp {var}` - do not show the variable any longer Extensions ---------- ### Python interpreter The `python` command starts the Python interpreter, from where we can access the GDB interface through the gdb object, e.g.: ```py python print (gdb.breakpoints()) ``` ### GUI / CUI Interesting extensions: - [gef](https://github.com/hugsy/gef) - [nnd](https://github.com/al13n321/nnd) {% endraw %} ================================================ FILE: guides/linux-tracing.md ================================================ --- layout: page title: Linux Kernel Tracing (/sys/kernel/tracing) date: 2025-12-22 08:00:00 +0200 --- {% raw %} **Table of contents:** - [General information](#general-information) - [Collecting events](#collecting-events) - [Function tracing](#function-tracing) General information ------------------- If `/sys/kernel/tracing` is not available we may **mount it** with the following command: ```shell mount -t tracefs nodev /sys/kernel/tracing ``` Writing to the buffer (trace / trace_pipe) is enabled globally by writing `1` to the file `/sys/kernel/tracing/tracing_on` (default value). If we write `0`, traces are still set up, but the kernel stops writing to the buffer. This is like a pause. Collecting events ----------------- [Official documentation](https://docs.kernel.org/trace/events.html) The list of events is available in the `available_events` file. 
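
Each line of `available_events` has the form `subsystem:event`, so the list is easy to summarize before choosing what to enable. The sketch below is only an illustration: the helper name `group_events` is hypothetical, and the script assumes tracefs is mounted at `/sys/kernel/tracing` and that the file is readable (usually this requires root).

```python
from collections import defaultdict
from pathlib import Path


def group_events(text):
    """Group 'subsystem:event' lines by subsystem (hypothetical helper)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        subsystem, sep, event = line.partition(":")
        if sep:
            groups[subsystem].append(event)
        else:
            # Lines without a subsystem prefix are kept under an empty key
            groups[""].append(line)
    return dict(groups)


if __name__ == "__main__":
    # Assumed mount point; adjust if tracefs is mounted elsewhere
    path = Path("/sys/kernel/tracing/available_events")
    try:
        groups = group_events(path.read_text())
    except OSError as err:
        print(f"cannot read {path}: {err}")
    else:
        for subsystem, events in sorted(groups.items()):
            print(f"{subsystem}: {len(events)} events")
```

The per-subsystem counts give a quick idea of how much output enabling a whole subsystem directory may produce.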
We enable an event by writing its name to the `set_event` file or by writing `1` to the `enable` file for events in the events directory (e.g., `enable` in events/ will enable all events):

```shell
# events only
echo nop > current_tracer
# clear trace
echo > trace
# enable events
echo 1 > /sys/kernel/tracing/events/sched/sched_process_exec/enable
echo 1 > /sys/kernel/tracing/events/sched/sched_process_fork/enable
# read a snapshot periodically (trace) or continuously (trace_pipe)
cat /sys/kernel/tracing/trace
cat /sys/kernel/tracing/trace_pipe
# disable all events
echo 0 > /sys/kernel/tracing/events/enable
```

Using the `trace_event=[event-list]` option in **boot options** we can enable very early tracing.

We can **filter** events by fields using the filter file in the given event's directory (events/). Additionally, filtering by PIDs is possible through the `set_event_pid` file. To automatically **follow forks (add PIDs of child processes) and remove PIDs of processes that have ended**, you can set the `event-fork` option:

```shell
echo 1 > options/event-fork
echo $$
# 3187
echo 3187 > set_event_pid
# clear trace
echo > trace
# start tracing
echo 1 > events/sched/enable
bash
# in bash
# [me@testbox tmp]$ echo $$
# 7519
cat set_event_pid
# 3187
# 7519
cat trace
# disable tracing
echo 0 > events/enable
```

Collected events can be found in `/sys/kernel/tracing/trace` (a buffer of recent events that we read on demand; writing an empty line to it clears the buffer) or `/sys/kernel/tracing/trace_pipe` (an event stream; events disappear after reading).
Description of event fields can be found in the given event's directory, in the `format` file, e.g.:

```shell
cat events/sched/sched_process_exec/format
# name: sched_process_exec
# ID: 322
# format:
#     field:unsigned short common_type; offset:0; size:2; signed:0;
#     field:unsigned char common_flags; offset:2; size:1; signed:0;
#     field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
#     field:int common_pid; offset:4; size:4; signed:1;
#
#     field:__data_loc char[] filename; offset:8; size:4; signed:0;
#     field:pid_t pid; offset:12; size:4; signed:1;
#     field:pid_t old_pid; offset:16; size:4; signed:1;
#
# print fmt: "filename=%s pid=%d old_pid=%d", __get_str(filename), REC->pid, REC->old_pid
```

Function tracing
----------------

[Official documentation](https://docs.kernel.org/trace/ftrace.html)

The function tracing feature should be enabled by default, and it is controlled using the `kernel.ftrace_enabled` global switch. To enable it, run:

```sh
sysctl kernel.ftrace_enabled=1
```

**Events/function calls** can be collected either aggregated (less invasive) or sequentially.

To enable statistics for (selected) kernel functions, we write `1` to `function_profile_enabled`. Statistics are collected for all functions listed in `available_filter_functions`. We can filter these statistics by writing to `set_ftrace_filter` and `set_ftrace_notrace` (function) as well as `set_graph_function` and `set_graph_notrace` (function_graph). PIDs that interest us can be written to `set_ftrace_pid` or `set_ftrace_notrace_pid`. Call statistics can be found in `trace_stat/function`.
Example trace: ```shell echo 2594 > set_ftrace_pid echo 1 > function_profile_enabled # collection time echo 0 > function_profile_enabled cat trace_stat/function* ``` To enable tracing of individual functions, we set the tracer to "function" (and possibly "function_graph") and read calls through `trace_pipe` or `trace`, as with events: ```shell # enabling echo 'tcp*' > set_ftrace_filter && echo function > current_tracer # collecting events from buffer cat trace > /tmp/tcp-trace.txt # disabling echo nop > current_tracer && echo > set_ftrace_filter ``` {% endraw %} ================================================ FILE: guides/network-tracing-tools.md ================================================ --- layout: page title: Network tracing tools date: 2024-01-01 08:00:00 +0200 redirect_from: - /guides/using-network-tracing-tools/ --- - [Testing connectivity](#testing-connectivity) - [Collecting network traces](#collecting-network-traces) - [pktmon \(Windows\)](#pktmon-windows) - [netsh \(Windows\)](#netsh-windows) - [tcpdump \(Linux\)](#tcpdump-linux) - [Measuring network latency](#measuring-network-latency) - [Measuring network bandwidth](#measuring-network-bandwidth) - [Logging HTTP\(S\) requests in a proxy](#logging-https-requests-in-a-proxy) ## Testing connectivity It is a common mistake to rely on ping when testing TCP connections. Ping uses a different protocol (ICMP) and although it is a fine tool to check if there is connectivity between two hosts (assuming ICMP traffic is not blocked), it will not tell us anything about opened TCP ports. 
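
What a TCP-level check has to do is attempt a real connection (the three-way handshake) to the given port. The following minimal Python sketch shows the idea and works on both Linux and Windows; the function name `tcp_port_open` and the default timeout are assumptions made for this illustration, not part of any tool mentioned in this guide.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the full TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed
        return False


if __name__ == "__main__":
    # Example target; replace with the host and port you want to test
    host, port = "example.com", 443
    state = "open" if tcp_port_open(host, port) else "closed or filtered"
    print(f"{host}:{port} is {state}")
```

Note that a `False` result does not distinguish a closed port from a firewall silently dropping packets; the dedicated tools give more detail in such cases.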
On **Linux**, to check if there is anything listening on TCP port 80 on a remote host, you may use **netcat**:

```shell
nc -vnz 192.168.0.20 80
```

On **Windows**, we may use the `Test-NetConnection` (`tnc`) cmdlet, for example:

```sh
tnc example.com -Port 443
# ComputerName : example.com
# RemoteAddress : 23.215.0.138
# RemotePort : 443
# InterfaceAlias : Ethernet
# SourceAddress : 192.168.88.164
# TcpTestSucceeded : True
```

PsPing (a part of the [Sysinternals toolkit](https://technet.microsoft.com/en-us/sysinternals)) also has a few interesting options when it comes to diagnosing network connectivity issues. The simplest usage is just a replacement for the ping.exe tool (performs ICMP ping):

```shell
psping www.google.com
```

By adding a port number at the end of the host we will test a TCP handshake (or discover a closed port on the remote host):

```shell
psping www.google.com:80
```

To test UDP, add the **-u** option on the command line.

## Collecting network traces

Probably the best tool to analyze network traffic is **[Wireshark](https://www.wireshark.org/)**. Of course, Wireshark may also collect network traffic. However, as it's a GUI application, you may have problems running it on servers. On Windows, Wireshark requires an npcap driver which might also generate problems. Therefore, a better choice might be to use command line tools that I discuss later in this section.

Another problem in network traces is that they lack the ID of the process owning the network connection. We might get this information with the help of other tracing tools. For example, in [this blog post](https://lowleveldesign.org/2018/05/11/correlate-pids-with-network-packets-in-wireshark/), I present how to use Process Monitor logs for this purpose.

### pktmon (Windows)

Switching to the command line tools, starting with **Windows 10 (Server 2019)**, we have a new network tracing tool in our arsenal: **pktmon**.
It groups packets per component in the network stack, which is especially helpful when monitoring virtualized applications. Here are some usage examples:

```shell
# List active components in the network stack
pktmon component list

# Create a filter for TCP traffic for the 172.29.235.111 IP and the 8080 port
pktmon filter add -t tcp -i 172.29.235.111 -p 8080

# Show the configured filters
pktmon filter list

# Start the capturing session (-c) for all the components (--comp)
pktmon start -c --comp all && timeout -1 && pktmon stop

# Start the capture session (-c) for all NICs only (--comp), logging the entire
# packets (--pkt-size 0), overwriting the older packets when the output file
# reaches 512MB (-m circular -s 512)
pktmon start -c --comp nics --pkt-size 0 -m circular -s 512 -f c:\network-trace.etl && timeout -1 && pktmon stop
```

We may later convert the etl file to open it in Wireshark:

```shell
pktmon etl2pcap C:\network-trace.etl --out C:\network-trace.pcap
```

If the pcap file contains duplicate network packets, it is probably because the same packets were logged by different network components. We can also use the `--comp` parameter in the `etl2pcap` subcommand to filter the packets, for example:

```shell
pktmon etl2pcap C:\network-trace.etl --out C:\network-trace.pcap --comp 12
```

If you don't know the component number, you may use the `etl2txt` subcommand to list events in text format with their component IDs, and then pick the right component.

### netsh (Windows)

Netsh is another tool we could use for this purpose on Windows (even on **older Windows versions**).
The **netsh trace {start\|stop}** command will create an ETW-based network trace, allowing us to choose from a variety of diagnostics scenarios: ``` > netsh trace show scenarios Available scenarios (18): ------------------------------------------------------------------- AddressAcquisition : Troubleshoot address acquisition-related issues DirectAccess : Troubleshoot DirectAccess related issues FileSharing : Troubleshoot common file and printer sharing problems InternetClient : Diagnose web connectivity issues InternetServer : Set of HTTP service counters L2SEC : Troubleshoot layer 2 authentication related issues LAN : Troubleshoot wired LAN related issues Layer2 : Troubleshoot layer 2 connectivity related issues MBN : Troubleshoot mobile broadband related issues NDIS : Troubleshoot network adapter related issues NetConnection : Troubleshoot issues with network connections P2P-Grouping : Troubleshoot Peer-to-Peer Grouping related issues P2P-PNRP : Troubleshoot Peer Name Resolution Protocol (PNRP) related issues RemoteAssistance : Troubleshoot Windows Remote Assistance related issues Virtualization : Troubleshoot network connectivity issues in virtualization environment WCN : Troubleshoot Windows Connect Now related issues WFP-IPsec : Troubleshoot Windows Filtering Platform and IPsec related issues WLAN : Troubleshoot wireless LAN related issues ``` *NOTE: For DHCP traces you may check netsh dhcpclient trace ... commands. Also LAN and WLAN modes have some tracing capabilities which you may enable with a command netsh (w)lan set tracing mode=yes and stop with a command netsh (w)lan set tracing mode=no* To know exactly which providers are enabled in each scenario use **netsh trace show scenario {scenarioname}**. 
After choosing the right scenario for your diagnostic case, start the trace, for example:

```shell
netsh trace start scenario=InternetClient capture=yes && timeout -1 && netsh trace stop
```

A new .etl file should be created in the output directory (as well as a .cab file with some interesting system logs). If you only need a trace file, you may add the **report=no tracefile=d:\temp\net.etl** parameters.

Some ETW providers do not generate information about the processes related to the specific events (for instance, the WFP provider) - keep this in mind when choosing your own set.

Many interesting capture filters are available; you may use **netsh trace show CaptureFilterHelp** to list them. The most interesting ones include the CaptureInterface, Protocol, Ethernet, IPv4, and IPv6 option sets, for example:

```shell
netsh trace start scenario=InternetClient capture=yes CaptureInterface="Local Area Connection 2" Protocol=TCP Ethernet.Type=IPv4 IPv4.Address=157.59.136.1 maxSize=250 fileMode=circular overwrite=yes traceFile=c:\temp\nettrace.etl
```

We can **convert the generated .etl file to .pcapng** with the [etl2pcapng](https://github.com/microsoft/etl2pcapng) tool, and open it in Wireshark.

### tcpdump (Linux)

The most commonly used tool to collect network traces on Linux is **tcpdump**. The BPF language is quite complex and allows various filtering options. A great explanation of its syntax can be found [here](http://www.biot.com/capstats/bpf.html). Below, you may find example session configurations.

```shell
# View traffic only between two hosts:
tcpdump host 192.168.0.1 and host 192.168.0.2

# View traffic in a particular network:
tcpdump net 192.168.0.0/24

# Dump traffic to a file and rotate it roughly every 1 GB
# (-C counts in megabyte-sized units, not bytes):
tcpdump -C 1024 -w test.pcap
```

## Measuring network latency

On **Windows**, we may use **psping**.
We need to run it in a server mode on the connection target (-f for creating a temporary exception in the Windows Firewall, -s to enable server listening mode):

```shell
psping -f -s 192.168.1.3:4000
```

Then start the client and perform the test:

```shell
psping -l 16k -n 100 192.168.1.3:4000
```

## Measuring network bandwidth

**iperf** is a tool that can measure bandwidth on Windows and Linux. We need to start the iperf server (-s) (the -e option is to enable enhanced output and -l sets the TCP read buffer size):

```shell
iperf -s -l 128k -p 8080 -e
```

Then, for an example test, we may run the client for 30s (-t) using two parallel threads (-P) and showing interval summaries every 2s (-i):

```shell
iperf -c 172.30.102.167 -p 8080 -l 128k -P 2 -i 2 -t 30
```

On **Windows**, we may alternatively use **psping**. Again, we need to run it in a server mode on the connection target (-f for creating a temporary exception in the Windows Firewall, -s to enable server listening mode):

```shell
psping -f -s 192.168.1.3:4000
```

Then start the client and perform the test:

```shell
psping -b -l 16k -n 100 192.168.1.3:4000
```

## Logging HTTP(S) requests in a proxy

If you are on Windows, use the system settings to change the system proxy. On Linux, set the **HTTP_PROXY** and **HTTPS_PROXY** variables, for example:

```bash
export HTTP_PROXY="http://localhost:8080"
export HTTPS_PROXY="http://localhost:8080"
```

When you make a request in code, you should remember to configure its proxy according to the system settings, for example in C#:

```csharp
var request = WebRequest.Create(url);
request.Proxy = WebRequest.GetSystemWebProxy();
request.Method = "POST";
request.ContentType = "application/json; charset=utf-8";
...
```

or in the configuration file:

```xml
```

Then run [Fiddler](http://www.telerik.com/fiddler) (or [Burp Suite](https://portswigger.net/burp/) or any other proxy) and the request data should appear in the sessions window.
Unfortunately, this approach won't work for requests to applications served on the local server. A workaround is to use one of the Fiddler's localhost alternatives in the url: `ipv4.fiddler`, `ipv6.fiddler` or `localhost.fiddler` (more [here](http://docs.telerik.com/fiddler/Configure-Fiddler/Tasks/MonitorLocalTraffic)). **NOTE for WCF clients**: WCF has its own proxy settings, to use the default proxy add an `useDefaultWebProxy=true` attribute to your binding. If you want to trace HTTPS traffic you probably also need to **install the Root CA** of your proxy. On Windows, install the certificate to the Third-Party Root Certification Authorities. On Ubuntu Linux, run the following commands: ```bash sudo mkdir /usr/share/ca-certificates/extra sudo cp mitmproxy.crt /usr/share/ca-certificates/extra/mitmproxy.crt sudo dpkg-reconfigure ca-certificates ``` *NOTE for Python*: if there is Python code that you need to trace, use `export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt` to force Python to validate TLS certs with your system cert store. If you would like to apply custom modifications to the proxied requests, you should consider implementing your own network proxy. I present several C# examples of such proxies in [a blog post](https://lowleveldesign.wordpress.com/2020/02/03/writing-network-proxies-for-development-purposes-in-c/) on my blog. 
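
When requests from a Python script do not show up in the proxy, it may help to check which proxy settings the interpreter actually picks up from the **HTTP_PROXY** and **HTTPS_PROXY** variables. Below is a minimal sketch using only the standard library; the wrapper name `proxies_from_environment` is hypothetical, while `urllib.request.getproxies_environment` is the standard function that scans the `*_proxy` environment variables.

```python
import urllib.request


def proxies_from_environment():
    """Return the scheme -> proxy URL map derived from *_PROXY variables."""
    return urllib.request.getproxies_environment()


if __name__ == "__main__":
    proxies = proxies_from_environment()
    if not proxies:
        print("no proxy variables set")
    for scheme, url in sorted(proxies.items()):
        print(f"{scheme}: {url}")
```

If the expected proxy URL is missing from the output, the variables were probably not exported in the shell that launched the script.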
================================================ FILE: guides/using-withdll-and-detours-to-trace-winapi.md ================================================ --- layout: page title: Using withdll and detours to trace Win API calls date: 2023-11-25 08:00:00 +0200 --- **Table of contents:** - [Introducing withdll](#introducing-withdll) - [Detours syelog library and log collector \(syelogd.exe\)](#detours-syelog-library-and-log-collector-syelogdexe) - [Detours sample libraries that log Win API functions calls](#detours-sample-libraries-that-log-win-api-functions-calls) - [Injecting libraries with withdll](#injecting-libraries-with-withdll) ## Introducing withdll The [Detours](https://github.com/microsoft/Detours) repository contains many interesting samples, some of which could be particularly useful in software troubleshooting. Inspired by one of those samples, named withdll, I created my clone of it in C# with some additional features. In this guide, I will present to you how you may use withdll with Detours samples to collect traces of Win API calls. ## Detours syelog library and log collector (syelogd.exe) Detours developers implemented a logging library, syelog, based on Windows named pipes. As you may see in the sltest example, it is straightforward to use. We may receive the logged messages with the syelogd application (also a Detours sample). Here is the result of running sltest and syelogd in separate console windows: ![](/assets/img/withdll-sltest-sylogd.png) Each syelog message has a timestamp, process ID, facility number, severity code, and the textual message. Syelogd prints them in separate columns in the output. The timestamp could be either absolute (as in the example output) or relative to the last received message if you use the /d option. Having covered the receiver, let us focus on the senders. ## Detours sample libraries that log Win API functions calls The Detours repository contains a few syelog-based tracers. 
The most thorough tracer is [**traceapi**](https://github.com/microsoft/Detours/tree/main/samples/traceapi). It hooks [a vast number of Win32 API functions](https://github.com/microsoft/Detours/blob/main/samples/traceapi/_win32.cpp). More tailored loggers include: - [**tracemem**](https://github.com/microsoft/Detours/tree/main/samples/tracemem) to trace heap allocations - [**tracereg**](https://github.com/microsoft/Detours/tree/main/samples/tracereg) to trace registry operations - [**tracetcp**](https://github.com/microsoft/Detours/tree/main/samples/tracetcp) to trace TCP connections - [**tracessl**](https://github.com/microsoft/Detours/tree/main/samples/tracessl) to trace plain text messages sent over TLS (it hooks EncryptMessage and DecryptMessage functions) And, if we are not satisfied with the examples provided, it is quite easy to create a custom tracer (you may start by adding new hooks to, for example, trcmem.cpp). The last step to start collecting Win API traces is to put the tracing libraries into the memory of the process that we want to analyze. And that is the place where withdll comes to the rescue. ## Injecting libraries with withdll The detours repository already contains a withdll sample that wraps the DetoursCreateProcessWithDlls function and allows you to start a new process with given DLLs injected. Unfortunately, it does not allow injecting DLLs into a running process. I decided to implement this feature in my version of withdll, and, to make it a bit more interesting, I reimplemented it in C#. Thanks to the excellent [win32metadata](https://github.com/microsoft/win32metadata) and [cswin32](https://github.com/microsoft/cswin32) projects, I could [easily generate C# bindings for structures and functions defined in the detours’ header](https://lowleveldesign.wordpress.com/2023/11/23/generating-c-bindings-for-native-windows-libraries/). 
You may download the compiled executable from the [release page](https://github.com/lowleveldesign/withdll/releases). I also added the detours sample tracers and syelogd.exe, so you may quickly run the first tracing session 😊. Withdll is a 64-bit application (compiled with NativeAOT and statically linked with the detours library) but supports both 32-bit and 64-bit targets. An example command line to inject a DLL into a running process with PID 1234 may look as follows: ``` withdll.exe -d trcapi32.dll 1234 ``` And to start, for example, winver.exe with injected traceapi libraries, you may run: ``` withdll.exe -d trcapi64.dll C:\Windows\System32\winver.exe withdll.exe -d trcapi32.dll C:\Windows\SysWow64\winver.exe ``` Please note that you may inject multiple DLLs at once. If you compile a library for 32-bit and 64-bit architectures, add a “bitness suffix” to its base name, and withdll will replace the suffix if the target process is 32-bit. For example, if we have trcapi32.dll and trcapi64.dll in the same folder and we run `withdll.exe -d trcapi64.dll C:\Windows\SysWow64\winver.exe`, winver.exe instance will have trcapi32.dll in its loaded module list. Finally, if you would like to **always inject a DLL into a given application**, you may use the Image File Execution Option registry key. However, to profit from this key, withdll must play the role of a debugger when launching the application. 
Therefore, when defining a Debugger value key, add an additional `--debug` switch to the withdll command, for example: ``` Windows Registry Editor Version 5.00 [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\winver.exe] "Debugger"="c:\\tools\\withdll.exe --debug -d c:\\tools\\trcapi64.dll" ``` I also recorded a short video presenting the usage of withdll with the traceapi sample library: [![Using detours and withdll to trace Win API calls](https://img.youtube.com/vi/q_iBojsF1sA/mqdefault.jpg)](https://www.youtube.com/watch?v=q_iBojsF1sA) ================================================ FILE: guides/windbg.md ================================================ --- layout: page title: WinDbg usage guide date: 2026-02-20 08:00:00 +0200 redirect_from: - /guides/using-ttd/ - /guides/using-windbg/ --- {% raw %} **Table of contents:** - [Installing WinDbg](#installing-windbg) - [WinDbgX \(WinDbgNext, formely WinDbg Preview\)](#windbgx-windbgnext-formely-windbg-preview) - [Classic WinDbg](#classic-windbg) - [Extensions](#extensions) - [Configuring WinDbg](#configuring-windbg) - [Referencing extensions and scripts for easy access](#referencing-extensions-and-scripts-for-easy-access) - [Installing WinDbg as the Windows AE debugger](#installing-windbg-as-the-windows-ae-debugger) - [Controlling the debugging session](#controlling-the-debugging-session) - [Enable local kernel-mode debugging](#enable-local-kernel-mode-debugging) - [Setup Windows Kernel Debugging over network](#setup-windows-kernel-debugging-over-network) - [Debugging system services \(local remote debugging\)](#debugging-system-services-local-remote-debugging) - [Getting information about the debugging session](#getting-information-about-the-debugging-session) - [Symbols and modules](#symbols-and-modules) - [Working with memory](#working-with-memory) - [General memory commands](#general-memory-commands) - [Stack](#stack) - [Variables](#variables) - [Strings](#strings) - 
[Fixed size arrays](#fixed-size-arrays) - [Analyzing exceptions and errors](#analyzing-exceptions-and-errors) - [Reading the exception record](#reading-the-exception-record) - [Find Windows Runtime Error message](#find-windows-runtime-error-message) - [Find the C++ exception object in the SEH exception record](#find-the-c-exception-object-in-the-seh-exception-record) - [Read the Last Windows Error value](#read-the-last-windows-error-value) - [Scanning the stack for native exception records](#scanning-the-stack-for-native-exception-records) - [Finding exception handlers](#finding-exception-handlers) - [Breaking on a specific exception event](#breaking-on-a-specific-exception-event) - [Breaking on a specific Windows Error](#breaking-on-a-specific-windows-error) - [Breaking on a function return](#breaking-on-a-function-return) - [Decoding error numbers](#decoding-error-numbers) - [Diagnosing dead-locks and hangs](#diagnosing-dead-locks-and-hangs) - [Listing threads call stacks](#listing-threads-call-stacks) - [Finding locks in memory dumps](#finding-locks-in-memory-dumps) - [System objects in the debugger](#system-objects-in-the-debugger) - [Processes \(kernel-mode\)](#processes-kernel-mode) - [Handles](#handles) - [Threads](#threads) - [Critical sections](#critical-sections) - [Controlling process execution](#controlling-process-execution) - [Controlling the target \(g, t, p\)](#controlling-the-target-g-t-p) - [Watch trace](#watch-trace) - [Breaking when a specific function is in the call stack](#breaking-when-a-specific-function-is-in-the-call-stack) - [Breaking on a specific function enter and leave](#breaking-on-a-specific-function-enter-and-leave) - [Breaking for all methods in the C++ object virtual table](#breaking-for-all-methods-in-the-c-object-virtual-table) - [Breaking when a user-mode process is created \(kernel-mode\)](#breaking-when-a-user-mode-process-is-created-kernel-mode) - [Setting a user-mode breakpoint in 
kernel-mode](#setting-a-user-mode-breakpoint-in-kernel-mode) - [Scripting the debugger](#scripting-the-debugger) - [Using meta-commands \(legacy way\)](#using-meta-commands-legacy-way) - [Using the dx command](#using-the-dx-command) - [Using variables and creating new objects in the dx query](#using-variables-and-creating-new-objects-in-the-dx-query) - [Using text files](#using-text-files) - [Example queries with explanations](#example-queries-with-explanations) - [Managed application support in the dx queries](#managed-application-support-in-the-dx-queries) - [Using the JavaScript engine](#using-the-javascript-engine) - [Loading a script](#loading-a-script) - [Running a script](#running-a-script) - [Working with types](#working-with-types) - [Accessing the debugger engine objects](#accessing-the-debugger-engine-objects) - [Evaluating expressions in a debugger context](#evaluating-expressions-in-a-debugger-context) - [Debugging a script](#debugging-a-script) - [Launching commands from a script file](#launching-commands-from-a-script-file) - [Time Travel Debugging \(TTD\)](#time-travel-debugging-ttd) - [Installation](#installation) - [Collection](#collection) - [Accessing TTD data](#accessing-ttd-data) - [Querying debugging events](#querying-debugging-events) - [Examining function calls](#examining-function-calls) - [Position in TTD trace](#position-in-ttd-trace) - [Examining memory access](#examining-memory-access) - [Misc tips](#misc-tips) - [Converting a memory dump from one format to another](#converting-a-memory-dump-from-one-format-to-another) - [Loading an arbitrary DLL into WinDbg for analysis](#loading-an-arbitrary-dll-into-windbg-for-analysis) - [Keyboard and mouse shortcuts](#keyboard-and-mouse-shortcuts) - [Running a command for all the processes](#running-a-command-for-all-the-processes) - [Attaching to multiple processes at once](#attaching-to-multiple-processes-at-once) - [Injecting a DLL into a process being 
debugged](#injecting-a-dll-into-a-process-being-debugged) - [Save and reopen formatted WinDbg output](#save-and-reopen-formatted-windbg-output) Installing WinDbg ----------------- There are two versions of WinDbg available nowadays. The modern one, called WinDbgX or WinDbg Preview, and the old one. The modern WinDbg has many interesting features (support for Time-Travel debugging is one of them), so that's the version you probably want to use if you're on a supported system. ### WinDbgX (WinDbgNext, formely WinDbg Preview) On modern systems download the [appinstaller](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/) file and choose Install in the context menu. If you are on Windows Server 2019 and you don't see the Install option in the context menu, there is a big chance you're missing the App Installer package on your system. In that case, you may download and run [this PowerShell script](/assets/other/windbg-install.ps1.txt) ([created by @Izybkr](https://github.com/microsoftfeedback/WinDbg-Feedback/issues/19#issuecomment-1513926394) with my minor updates to make it work with latest WinDbg releases). ### Classic WinDbg If you need to debug on an old system with no support for WinDbgX, you need to download Windows SDK and install the Debugging Tools for Windows feature. Executables will be in the Debuggers folder, for example, `c:\Program Files (x86)\Windows Kits\10\Debuggers`. ### Extensions Some problems may require actions that are challenging to achieve using the default WinDbg commands. One solution is to create a debugger script using the [legacy scripting language](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/command-tokens), the [dx command](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/dx--display-visualizer-variables-), or the [JavaScript Debugger](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/javascript-debugger-scripting). 
Another option is to search for an extension that may already have the desired feature implemented. Here's a list of extensions I use daily when troubleshooting user-mode issues: - [PDE](https://onedrive.live.com/?authkey=%21AJeSzeiu8SQ7T4w&id=DAE128BD454CF957%217152&cid=DAE128BD454CF957) by Andrew Richards - contains lots of useful commands (run `!pde.help` to learn more) - [lldext](https://github.com/lowleveldesign/lldext) - contains my utility commands and scripts - [comon](https://github.com/lowleveldesign/comon) - contains commands to help debug COM services - [MEX](https://www.microsoft.com/en-us/download/details.aspx?id=53304) - another extension with many helper commands (run `!mex.help` to list them) - [dotnet-sos](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-sos) - to debug .NET applications Additionally, you may also check the following repositories containing WinDbg scripts for various problems: - [TimMisiak/WinDbgCookbook](https://github.com/TimMisiak/WinDbgCookbook) - [hugsy/windbg_js_scripts](https://github.com/hugsy/windbg_js_scripts) - [0vercl0k/windbg-scripts](https://github.com/0vercl0k/windbg-scripts) - [yardenshafir/WinDbg_Scripts](https://github.com/yardenshafir/WinDbg_Scripts) Configuring WinDbg ------------------ ### Referencing extensions and scripts for easy access When we use the `.load` or `.scriptload` commands, WinDbg will search for extensions in the following folders: - `{install_folder}\{target_arch}\winxp` - `{install_folder}\{target_arch}\winext` - `{install_folder}\{target_arch}\winext\arcade` - `{install_folder}\{target_arch}\pri` - `{install_folder}\{target_arch}` - `%LOCALAPPDATA%\DBG\EngineExtensions32` or `%LOCALAPPDATA%\DBG\EngineExtensions` (only WinDbgX) - `%PATH%` where target_arch is either x86 or amd64. I usually include the directories containing the JavaScript scripts in the PATH since they are architecture-agnostic. 
As for the 32- and 64-bit DLLs, I store them in EngineExtensions32 and EngineExtensions folders, respectively.

It is also possible to configure [extensions galleries](https://github.com/microsoft/WinDbg-Samples/tree/master/Manifest). Unfortunately, I didn't manage to make it work with my own extensions.

### Installing WinDbg as the Windows AE debugger

The `windbgx -I` command registers WinDbg as the automatic system debugger - it will launch anytime an application crashes. The modified AeDebug registry keys:

```
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug
HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug
```

However, we may also configure those keys manually and use WinDbg to, for example, create a memory dump when the application crashes:

```
REG_SZ Debugger = "C:\Users\me\AppData\Local\Microsoft\WindowsApps\WinDbgX.exe" -c ".dump /ma /u C:\dumps\crash.dmp; qd" -p %ld -e %ld -g
REG_SZ Auto = 1
```

If you omit the -g option, WinDbg will inject a remote thread with a breakpoint instruction, which will hide our original exception. In such a case, you might need to scan the stack to find the original exception record.

Controlling the debugging session
---------------------------------

### Enable local kernel-mode debugging

If you are a software developer, you may not have much experience with kernel debugging. But it can be very useful to know how to inspect kernel objects in some cases. For instance, you can troubleshoot thread waits in kernel-mode more effectively and find out the causes of dead-locks or hangs faster.

To do full kernel debugging (that is, to control the kernel code execution), you need another Windows machine. But if you just want to analyze the kernel internal memory, you can enable local kernel debugging on your own machine. This is how you do it:

```shell
bcdedit /debug on
```

After a restart, you should be able to attach to your local kernel from WinDbg.
Another option is to use [LiveKd](https://learn.microsoft.com/en-us/sysinternals/downloads/livekd), which creates a snapshot of the kernel memory and attaches a debugger to it. It is also capable of creating a kernel memory dump for later analysis. An example command to create such a dump looks as follows:

```shell
livekd -accepteula -b -vsym -k "c:\Program Files (x86)\Windows Kits\10\Debuggers\x64\kd.exe" -o c:\tmp\kernel.dmp
```

**You don't need to boot the system in debugging mode to use livekd.** So it is safe to use even in production environments.

### Setup Windows Kernel Debugging over network

Turn on network debugging (HOSTIP is the address of the machine on which we will run the debugger):

```sh
bcdedit /dbgsettings NET HOSTIP:192.168.0.2 PORT:60000
# Key=3ma3qyz02ptls.23uxbvnd0e2zh.1gnwiqb6v3mpb.mjltos9cf63x

bcdedit /debug {current} on
# The operation completed successfully.
```

Then, on the host machine, run windbg, select **Attach to kernel**, and fill in the port and key textboxes.

When debugging a **Hyper-V Gen 2 VM**, remember to turn off secure boot:

```sh
Set-VMFirmware -VMName "Windows 2012 R2" -EnableSecureBoot Off -Confirm
```

If you are hosting your guest on **QEMU KVM** and want to use network debugging, you need to either create your VM as a Generic one (not Windows) or update the VM configuration XML, changing the vendor_id under the hyperv node, for example:

```xml
win2k19
```

I highly recommend checking [this post by the OSR team](https://www.osr.com/blog/2021/10/05/using-windbg-over-kdnet-on-qemu-kvm/) describing why those changes are required and revealing some details about the inner workings of kdnet.

### Debugging system services (local remote debugging)

Attaching a debugger to a Windows service running in session 0 should not be a problem, assuming you have the SeDebugPrivilege and can access the service. However, debugging the service startup process can be challenging.
I typically use a WinDbg remote session over named pipes along with the Image File Execution Options registry key. The approach involves starting the service under a debugger (using the -server option), both running in session 0, and then connecting to the debugger server from a local debugger instance. Here is an example registry configuration for testservice.exe:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\testservice.exe]
"Debugger"="windbgx.exe -Q -server npipe:pipe=svcpipe"
```

When testservice starts, the debugger server will wait for the client to connect. You may start the client with the following command:

```sh
windbgx -remote "npipe:pipe=svcpipe,server=localhost"
```

If the Service Control Manager stops the service before you manage to connect to it, you may need to adjust the service start timeout. For example, to set it to 3 minutes (180000 ms), use the following registry configuration:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control]
"ServicesPipeTimeout"=dword:0002bf20
```

To terminate the entire session and exit the debugging server, use the `q` command. To exit from one debugging client without terminating the server, you must issue a command from that specific client. If this client is KD or CDB, use the CTRL+B key to exit. If you are using a script to run KD or CDB, use `.remote_exit`.

### Getting information about the debugging session

The `|` command displays a path to the process image. You may run `vercommand` to check how the debugger was launched. The `vertarget` command shows the OS version, the process lifetime, and more, for example, the dump time when debugging a memory dump. The `.time` command displays information about the system time variable (session time).

`.lastevent` shows the last reason why the debugger stopped and `.eventlog` displays the recent events.
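A side note on the `ServicesPipeTimeout` value from the system services section: the .reg file expects the timeout as a hexadecimal DWORD in milliseconds, which is easy to get wrong by hand. A minimal Python sketch of the conversion follows; the helper name `services_pipe_timeout_dword` is hypothetical and exists only for this illustration.

```python
def services_pipe_timeout_dword(minutes: float) -> str:
    """Format a timeout given in minutes as a .reg DWORD (milliseconds, hex)."""
    milliseconds = int(minutes * 60 * 1000)
    if not 0 <= milliseconds <= 0xFFFFFFFF:
        raise ValueError("timeout does not fit in a 32-bit DWORD")
    # .reg files use eight lowercase hex digits after the "dword:" prefix
    return f"dword:{milliseconds:08x}"


print(services_pipe_timeout_dword(3))  # dword:0002bf20
```

The printed value matches the 3-minute example in the registry snippet of that section.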
Symbols and modules ------------------- The `lm` command lists all modules with symbol load info. To examine a specific module, use `lmvm {module-name}`. To find out if a given address belongs to any of the loaded dlls you may use the `!dlls -c {addr}` command. Another way would be to use the `lma {addr}` command. The `.sympath` command shows the symbol search path and allows its modification. Use `.reload /f {module-name}` to reload symbols for a given module. The `x {module-name}!{function}` command resolves a function address, and `ln {address}` finds the nearest symbol. In WinDbgX, we may also list and filter modules with the `@$curprocess.Modules` property. Some usage examples: ```shell # Display information about the win32u module dx @$curprocess.Modules["win32u.dll"] # Show public exports of the win32u module dx @$curprocess.Modules["win32u.dll"].Contents.Exports # List modules with information if they have combase.dll as a direct import dx -g @$curprocess.Modules.Select(m => new { Name = m.Name, HasCombase = m.Contents.Imports.Any(i => i.ModuleName == "combase.dll") }) ``` **When we don't have access to the symbol server**, we may create a list of required symbols with `symchk.exe` (part of the Debugging Tools for Windows installation) and download them later on a different host. 
First, we need to prepare the manifest, for example: ```shell symchk /id test.dmp /om test.dmp.sym /s C:\non-existing ``` Then copy it to the machine with the symbol server access, and download the required symbols, for example: ```shell symchk /im test.dmp.sym /s SRV*C:\symbols*https://msdl.microsoft.com/download/symbols ``` If **you want to add a PDB file (or files) to an existing symbol store**, you may use the `symstore add` command, for example: ```sh # /r – recursive, /o – verbose output, /f – a path to files to index (@ symbol before the name denotes a file which contains the list of files) # /s – root directory of the store, /t – product name, /v – version, /c – comment symstore add /r /o /f C:\src\myapp\bin /s \\symsrv\symbols\ /t myapp /v '1.0.1' /c 'rel 1.0.1' ``` Working with memory ------------------- ### General memory commands The `!address` command shows information about a specific region of memory, for example: ```shell !address 0x00fd7df8 # Usage: Image # Base Address: 00fd6000 # End Address: 00fdc000 # Region Size: 00006000 ( 24.000 kB) # State: 00001000 MEM_COMMIT # Protect: 00000002 PAGE_READONLY # Type: 01000000 MEM_IMAGE # Allocation Base: 00fb0000 # Allocation Protect: 00000080 PAGE_EXECUTE_WRITECOPY # ... 
``` Additionally, it can display regions of memory of specific type, for example: ```shell !address -f:FileMap # BaseAddr EndAddr+1 RgnSize Type State Protect Usage # ----------------------------------------------------------------------------------------------- # 9a0000 9b0000 10000 MEM_MAPPED MEM_COMMIT PAGE_READWRITE MappedFile "PageFile" # 9b0000 9b1000 1000 MEM_MAPPED MEM_COMMIT PAGE_READONLY MappedFile "PageFile" !address -f:MEM_MAPPED # BaseAddr EndAddr+1 RgnSize Type State Protect Usage # ----------------------------------------------------------------------------------------------- # 9a0000 9b0000 10000 MEM_MAPPED MEM_COMMIT PAGE_READWRITE MappedFile "PageFile" # 9b0000 9b1000 1000 MEM_MAPPED MEM_COMMIT PAGE_READONLY MappedFile "PageFile" ``` ### Stack Stack grows from high to low addresses. Thus, when you see addresses bigger than the frame base (such as ebp+C) they usually refer to the function arguments. Smaller addresses (such as ebp-20) usually refer to local function variables. To display stack frames use the `k` command. The `kP` command will additionally print function arguments if private symbols are available. The `kbM` command outputs stack frames with first three parameters passed on the stack (those will be first three parameters of the function in x86). When there are many threads running in a process it's common that some of them have the same call stacks. To better organize call stacks we can use the `!uniqstack` command. Adding -b parameter adds first three parameters to the output, -v displays all parameters (but requires private symbols). To switch a local context to a different stack frame we can use the `.frame` command: ```shell .frame [/c] [/r] [FrameNumber] .frame [/c] [/r] = BasePtr [FrameIncrement] .frame [/c] [/r] = BasePtr StackPtr InstructionPtr ``` The `!for_each_frame` extension command enables you to execute a single command repeatedly, once for each frame in the stack. 
In WinDbgX, we may access the call stack frames using `@$curstack.Frames`, for example:

```shell
dx @$curstack.Frames
# @$curstack.Frames
#     [0x0] : ntdll!LdrpDoDebuggerBreak + 0x30 [Switch To]
#     [0x1] : ntdll!LdrpInitializeProcess + 0x1cfa [Switch To]

dx @$curstack.Frames[0].Attributes
#     InstructionOffset : 0x7ffa1102b784
#     ReturnOffset : 0x7ffa1102e9d6
#     FrameOffset : 0xea5055f370
#     StackOffset : 0xea5055f340
#     FuncTableEntry : 0x0
#     Virtual : 1
#     FrameNumber : 0x0
#     SourceInformation
```

### Variables

When you have private symbols, you may list local variables with the `dv` command.

Additionally, the `dt` command allows you to work with type symbols. You may either list them, e.g., `dt notepad!g_*`, or dump a data address using a given type format, e.g., `dt nt!_PEB 0x13123`.

The `dx` command allows you to dump local variables or read them from any place in the memory. It uses navigation expressions just like Visual Studio (you may define your own .natvis files). You load the interesting .natvis file with the `.nvload` command.

`#FIELD_OFFSET(Type, Field)` is an interesting operator which returns the offset of the field in the type, e.g., `? #FIELD_OFFSET(_PEB, ImageSubsystemMajorVersion)`.

### Strings

The `!du` command from the [PDE extension](https://onedrive.live.com/redir?resid=DAE128BD454CF957!7152&authkey=!AJeSzeiu8SQ7T4w&ithint=folder%2czip) shows strings up to 4GB (the default du command stops when it hits the range limit).

The PDE extension also contains the `!ssz` command to look for zero-terminated (either unicode or ascii) strings. To change a text in memory use `!ezu`, for example: `ezu "test string"`. The extension works on committed memory.

Another interesting command is `!grep`, which allows you to filter the output of other commands: `!grep _NT !peb`.

### Fixed size arrays

Printing an array of a specific size with dx might be tricky.
The code below shows two ways of printing a fixed-size char array:

```sh
dx (*((char (*)[16])0x31aa5526)),c
# (*((jvm!char (*)[16])0x31aa5526)),c [Type: char [16]]
#     [0] : 106 'j' [Type: char]
#     ...
#     [15] : 116 't' [Type: char]

dx ((char*)0x31aa5526),16c
# ((char*)0x31aa5526),16c : 0x31aa5526 [Type: char *]
#     [0] : 106 'j' [Type: char]
#     ...
#     [15] : 116 't' [Type: char]
```

Alternatively, we could use `db 0x31aa5526 L10`.

Analyzing exceptions and errors
-------------------------------

### Reading the exception record

The `.ecxr` debugger command instructs the debugger to restore the thread context to its state when the initial fault happened. When dispatching a SEH exception, the OS builds an internal structure called an `exception record`. It also conveniently saves the thread context at the time of the initial fault in a context record structure.

```cpp
typedef struct _EXCEPTION_RECORD {
    DWORD ExceptionCode;
    DWORD ExceptionFlags;
    struct _EXCEPTION_RECORD *ExceptionRecord;
    PVOID ExceptionAddress;
    DWORD NumberParameters;
    ULONG_PTR ExceptionInformation[EXCEPTION_MAXIMUM_PARAMETERS];
} EXCEPTION_RECORD;
```

`.lastevent` will also show you information about the last error that occurred (if the debugger stopped because of an error).
You may then examine the exception record using the `.exr` command, for example: ```sh .lastevent # Last event: 15ae8.133b4: CLR exception - code e0434f4d (first/second chance not available) # debugger time: Thu Jul 30 19:23:53.169 2015 (UTC + 2:00) .exr -1 # ExceptionAddress: 000007fe9b17f963 # ExceptionCode: e0434f4d (CLR exception) # ExceptionFlags: 00000000 # NumberParameters: 0 ``` If we look at the raw memory, we will find that .exr changes the order of the EXCEPTION_RECORD fields, for example: ```sh .exr 0430af24 # ExceptionAddress: abe8f04d # ExceptionCode: c0000005 (Access violation) # ExceptionFlags: 00000000 # NumberParameters: 2 # Parameter[0]: 00000000 # Parameter[1]: abe8f04d ``` ``` 0430af24 c0000005 <- exception code 0430af28 00000000 0430af2c 00000000 0430af30 abe8f04d <- exception address (code address) 0430af34 00000002 <- parameters number 0430af38 00000000 0430af3c abe8f04d ``` ### Find Windows Runtime Error message If you need to diagnose Windows Runtime Error for example: `(2f88.3358): Windows Runtime Originate Error - code 40080201 (first chance)`, you may enable first chance notification for this error: `sxe 40080201`. When stopped, retrieve the exception context, and the third parameter should contain an error message: ```sh .exr -1 # ExceptionAddress: 77942822 (KERNELBASE!RaiseException+0x00000062) # ExceptionCode: 40080201 (Windows Runtime Originate Error) # ExceptionFlags: 00000000 # NumberParameters: 3 # Parameter[0]: 80040155 # Parameter[1]: 00000052 # Parameter[2]: 0dddf680 du 0dddf680 # 0dddf680 "Failed to find proxy registratio" # 0dddf6c0 "n for IID: {xxxxxxxx-xxxx-xxxx-x" # 0dddf700 "xxx-xxxxxxxxxxxx}." 
``` We may automate this step by using the `$exr_param2` pseudo-register: ```sh sxe -c "du @$exr_param2 L40; g" 40080201 ``` ### Find the C++ exception object in the SEH exception record *(Tested on MSVC140)* If it's the first chance exception, we can find the exception record at the top of the stack: ```sh dps @esp # 00f3fb28 7657ec52 KERNELBASE!RaiseException+0x62 # 00f3fb2c 00f3fb30 # 00f3fb30 e06d7363 # 00f3fb34 00000001 # 00f3fb38 00000000 # 00f3fb3c 7657ebf0 KERNELBASE!RaiseException # 00f3fb40 00000003 # 00f3fb44 19930520 # 00f3fb48 00f3fbd8 # 00f3fb4c 009ab96c exceptions!_TI3?AVinvalid_argumentstd ``` With dx and the `MSVCP140D!EHExceptionRecord` symbol (without this symbol, we need to get the value from `.exr -1`), we may decode the exception record parameters: ```sh dx -r2 (MSVCP140D!EHExceptionRecord*)0x00f3fb30 # (MSVCP140D!EHExceptionRecord*)0x00f3fb30 : 0xf3fb30 [Type: EHExceptionRecord *] # [+0x000] ExceptionCode : 0xe06d7363 [Type: unsigned long] # [+0x004] ExceptionFlags : 0x1 [Type: unsigned long] # [+0x008] ExceptionRecord : 0x0 [Type: _EXCEPTION_RECORD *] # [+0x00c] ExceptionAddress : 0x7657ebf0 [Type: void *] # [+0x010] NumberParameters : 0x3 [Type: unsigned long] # [+0x014] params [Type: EHExceptionRecord::EHParameters] # [+0x000] magicNumber : 0x19930520 [Type: unsigned long] # [+0x004] pExceptionObject : 0xf3fbd8 [Type: void *] # [+0x008] pThrowInfo : 0x9ab96c [Type: _s_ThrowInfo *] ``` As you can see, the second parameter points to the C++ exception object. 
If we know its type, we may dump its properties, for example:

```sh
dx (exceptions!std::invalid_argument*)0x00f3fbd8
# [+0x004] _Data : __std_exception_data

dx -r1 (*((exceptions!__std_exception_data *)0xf3fbdc))
# (*((exceptions!__std_exception_data *)0xf3fbdc)) [Type: __std_exception_data]
# [+0x000] _What : 0x1449748 : "arg1" [Type: char *]
# [+0x004] _DoFree : true [Type: bool]
```

### Read the Last Windows Error value

To get the last error value for the current thread, we may use the `!gle` or `!teb` command. `!gle` has an additional -all parameter which shows the last errors for all the threads:

```sh
!gle -all
# Last error for thread 0:
# LastErrorValue: (Win32) 0 (0) - The operation completed successfully.
# LastStatusValue: (NTSTATUS) 0xc0000034 - Object Name not found.
#
# Last error for thread 1:
# LastErrorValue: (Win32) 0 (0) - The operation completed successfully.
# LastStatusValue: (NTSTATUS) 0 - STATUS_SUCCESS
```

### Scanning the stack for native exception records

Sometimes, when the memory dump was incorrectly collected, we may not see the exception information and `.exr -1` does not work. When this happens, there is still a chance that the original exception is somewhere in the stack. Using the `.foreach` command, we may scan the stack and try all the addresses to see if any of them is a valid exception record. For example:

```sh
.foreach /ps1 ($addr { dp /c1 @$csp L100 }) { .echo $addr; .exr $addr }
# 0430af24
# ExceptionAddress: abe8f04d
# ExceptionCode: c0000005 (Access violation)
# ExceptionFlags: 00000000
# NumberParameters: 2
# Parameter[0]: 00000000
# Parameter[1]: abe8f04d
```

### Finding exception handlers

To list exception handlers for the currently running method, use the `!exchain` command. Managed exception handlers can be listed using the `!EHInfo` command from the SOS extension.
I present how to use this command to list ASP.NET MVC exception handlers [on my blog](https://lowleveldesign.wordpress.com/2013/04/26/life-of-exception-in-asp-net/).

In 32-bit applications, a pointer to the exception handler is kept in `fs:[0]`. The prolog for a method with exception handling has the following structure:

```
mov eax,fs:[00000000]
push eax
mov fs:[00000000],esp
```

An example session of retrieving the exception handler:

```sh
dd /c1 fs:[0]-8 L10
# 0053:fffffff8 00000000
# 0053:fffffffc 00000000
# 0053:00000000 0072ef74 <-- this is our first exception pointer to a handler
# 0053:00000004 00730000
# 0053:00000008 0072c000

dd /c1 0072ef74-8 L10
# 0072ef6c 0072eefc
# 0072ef70 74275582
# 0072ef74 0072f04c <-- previous handler
# 0072ef78 744048b9 <-- handler address
# 0072ef7c 2778008f
# 0072ef80 00000000
# 0072ef84 0072f058
# 0072ef88 744064f9
```

In 64-bit applications, information about exception handlers is stored in the PE file. We can list them using, for example, the `dumpbin /unwindinfo` command.

### Breaking on a specific exception event

The `sx-` commands define how WinDbg handles exception events that happen in the process lifetime. For example, to stop the debugger when a C++ exception is thrown (first-chance exception), we would use the `sxe eh` command. If we only need information that an exception occurred, we could use the `sxn eh` command. Additionally, the -c parameter lets us run a custom command when the event occurs:

```sh
sxe -c ".lastevent;!pe;!clrstack;g" clr
```

### Breaking on a specific Windows Error

There is a special global variable in ntdll, `g_dwLastErrorToBreakOn`, that you may set to cause a break whenever a given last error code is set by the application.
For example, to break the application execution whenever it reports the `0x4cf` (ERROR_NETWORK_UNREACHABLE) error, run:

```sh
ed ntdll!g_dwLastErrorToBreakOn 0x4cf
```

You may find the list of errors in [the Windows documentation](https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes).

### Breaking on a function return

If we want to break when a function finishes, for example, to analyze its result, we may use a nested one-time breakpoint on the function return address, for example:

```sh
bp kernelbase!CreateFileW "bp /1 $ra \"r @rax\"; g"
```

### Decoding error numbers

If you receive an error message with a cryptic error number, for example: *Compiler Error Message: The compiler failed with error code -1073741502*, you may use the `!error` command:

```sh
!error c0000142
# Error code: (NTSTATUS) 0xc0000142 (3221225794) - {DLL Initialization Failed} Initialization of the dynamic link library %hs failed. The process is terminating abnormally.
```

Even more error codes and error messages are available through the `!pde.err` command from the PDE extension.

If you need to convert an HRESULT to a Windows Error, the following pseudo-code might help:

```cpp
a = hresult & 0x1FF0000
if (a == 0x70000) {
    winerror = hresult & 0xFFFF
} else {
    winerror = hresult
}
```

Converting a Windows Error to an HRESULT is straightforward: `hresult = 0x80070000 | winerror`.

A great **command line tool** for decoding error numbers is [err.exe or Error Code Look-up](https://www.microsoft.com/en-us/download/details.aspx?id=985). It looks for the specific value in Windows headers, additionally performing the conversion to hex, for example:

```sh
err -1073741502
# for decimal -1073741502 / hex 0xc0000142 :
# STATUS_DLL_INIT_FAILED ntstatus.h
# {DLL Initialization Failed}
# Initialization of the dynamic link library %hs failed. The
# process is terminating abnormally.
# ...
```

There is also a subcommand in the Windows built-in `net` command to decode Windows error numbers (and only error numbers), for example:

```sh
net helpmsg 2
# The system cannot find the file specified.
```

Diagnosing dead-locks and hangs
-------------------------------

We usually start the analysis by looking at the threads running in a process. The call stacks help us identify blocked threads. We can use TTD, thread-time trace, or memory dumps to learn what the threads are doing. In the follow-up sections, I will describe how to find lock objects and relations between threads in memory dumps.

### Listing threads call stacks

To list native stacks for all the threads run: `~*k` or `!uniqstacks`.

### Finding locks in memory dumps

There are many types of objects that a thread can wait on. You usually see `WaitForMultipleObjects` on many threads. If you see `RtlWaitForCriticalSection`, it might indicate that the thread is waiting on a critical section. Its address should be in the call stack. We may list the critical sections using the `!cs` command. With the `-s` option, we may examine details of a specific critical section:

```sh
!cs -s 000000001a496f50
# -----------------------------------------
# Critical section = 0x000000001a496f50 (+0x1A496F50)
# DebugInfo = 0x0000000013c9bee0
# LOCKED
# LockCount = 0x0
# WaiterWoken = No
# OwningThread = 0x0000000000001b04
# RecursionCount = 0x1
# LockSemaphore = 0x0
# SpinCount = 0x00000000020007d0
```

LockCount tells you how many threads are currently waiting on a given critical section. OwningThread is the thread that owns the critical section at the time the command is run. You can identify the threads waiting on a given critical section by issuing the `kv` command and looking for the critical section address in the call parameters.

We can also look for **synchronization object handles** using the `!handle` command. For example, we may list all the Mutant objects in a process by using the `!handle 0 f Mutant` command.
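
When a process holds many locked critical sections, matching owners by hand gets tedious. Below is a minimal sketch of a WinDbg JavaScript helper that groups locked critical sections by their owning thread. The function names (`parseCriticalSections`, `groupByOwner`, `lockedSectionsByOwner`) are hypothetical, and the parser assumes the `!cs` output layout shown above (a dashed separator before each record and `Name = 0x...` fields), so it may need adjusting for other debugger versions.

```javascript
"use strict";

// Labels printed by !cs mapped to property names (names chosen for this sketch).
const FIELDS = new Map([
    ["Critical section", "address"],
    ["LockCount", "lockCount"],
    ["OwningThread", "owningThread"],
    ["RecursionCount", "recursionCount"]
]);

// Parses the lines of the !cs output into an array of plain objects.
function parseCriticalSections(lines) {
    const sections = [];
    let current = null;
    for (const raw of lines) {
        const line = String(raw).trim();
        if (/^-{5,}$/.test(line)) {
            // a dashed separator starts a new critical section record
            current = { locked: false };
            sections.push(current);
            continue;
        }
        if (current === null) {
            continue;
        }
        if (line === "LOCKED") {
            current.locked = true;
            continue;
        }
        const m = /^([A-Za-z ]+?)\s*=\s*(0x[0-9a-fA-F]+)/.exec(line);
        if (m && FIELDS.has(m[1])) {
            const key = FIELDS.get(m[1]);
            // keep the address as text, parse the remaining values as numbers
            current[key] = key === "address" ? m[2] : parseInt(m[2], 16);
        }
    }
    return sections;
}

// Groups addresses of locked critical sections by the owning thread ID.
function groupByOwner(sections) {
    const owners = new Map();
    for (const cs of sections) {
        if (!cs.locked || cs.owningThread === undefined) {
            continue;
        }
        if (!owners.has(cs.owningThread)) {
            owners.set(cs.owningThread, []);
        }
        owners.get(cs.owningThread).push(cs.address);
    }
    return owners;
}

// Entry point for the debugger; it needs the WinDbg `host` object,
// so it is only defined here and not called outside of WinDbg.
function lockedSectionsByOwner() {
    const ctl = host.namespace.Debugger.Utility.Control;
    const owners = groupByOwner(parseCriticalSections(ctl.ExecuteCommand("!cs -l")));
    for (const [tid, addresses] of owners) {
        host.diagnostics.debugLog(`thread 0x${tid.toString(16)} owns: ${addresses.join(", ")}\n`);
    }
}
```

After loading the file with `.scriptload`, the helper could be called with `dx @$scriptContents.lockedSectionsByOwner()`. The stack of a reported owner can then be shown with `~~[tid]k`, where `tid` is the hexadecimal thread ID from the output.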
System objects in the debugger ------------------------------ The `!object` command displays some basic information about a kernel object: ```sh !object ffffc30162f26080 # Object: ffffc30162f26080 Type: (ffffc30161891d20) Process # ObjectHeader: ffffc30162f26050 (new version) # HandleCount: 23 PointerCount: 582900 ``` We may then analyze the object header to learn some more details about the object, for example: ```sh dx (nt!_OBJECT_HEADER *)0xffffc30162f26050 # (nt!_OBJECT_HEADER *)0xffffc30162f26050 : 0xffffc30162f26050 [Type: _OBJECT_HEADER *] # [+0x000] PointerCount : 582900 [Type: __int64] # [+0x008] HandleCount : 23 [Type: __int64] # [+0x008] NextToFree : 0x17 [Type: void *] # [+0x010] Lock [Type: _EX_PUSH_LOCK] # [+0x018] TypeIndex : 0x5 [Type: unsigned char] # [+0x019] TraceFlags : 0x0 [Type: unsigned char] # [+0x019 ( 0: 0)] DbgRefTrace : 0x0 [Type: unsigned char] # [+0x019 ( 1: 1)] DbgTracePermanent : 0x0 [Type: unsigned char] # [+0x01a] InfoMask : 0x88 [Type: unsigned char] # [+0x01b] Flags : 0x0 [Type: unsigned char] # [+0x01b ( 0: 0)] NewObject : 0x0 [Type: unsigned char] # [+0x01b ( 1: 1)] KernelObject : 0x0 [Type: unsigned char] # [+0x01b ( 2: 2)] KernelOnlyAccess : 0x0 [Type: unsigned char] # [+0x01b ( 3: 3)] ExclusiveObject : 0x0 [Type: unsigned char] # [+0x01b ( 4: 4)] PermanentObject : 0x0 [Type: unsigned char] # [+0x01b ( 5: 5)] DefaultSecurityQuota : 0x0 [Type: unsigned char] # [+0x01b ( 6: 6)] SingleHandleEntry : 0x0 [Type: unsigned char] # [+0x01b ( 7: 7)] DeletedInline : 0x0 [Type: unsigned char] # [+0x01c] Reserved : 0x62005c [Type: unsigned long] # [+0x020] ObjectCreateInfo : 0xffffc301671872c0 [Type: _OBJECT_CREATE_INFORMATION *] # [+0x020] QuotaBlockCharged : 0xffffc301671872c0 [Type: void *] # [+0x028] SecurityDescriptor : 0xffffd689feeef0ea [Type: void *] # [+0x030] Body [Type: _QUAD] # ObjectType : Process # UnderlyingObject [Type: _EPROCESS] dx -r1 (*((ntkrnlmp!_EPROCESS *)0xffffc30162f26080)) # (*((ntkrnlmp!_EPROCESS 
*)0xffffc30162f26080)) [Type: _EPROCESS]
# [+0x000] Pcb [Type: _KPROCESS]
# [+0x438] ProcessLock [Type: _EX_PUSH_LOCK]
# [+0x440] UniqueProcessId : 0x1488 [Type: void *]
# [+0x448] ActiveProcessLinks [Type: _LIST_ENTRY]
# [+0x458] RundownProtect [Type: _EX_RUNDOWN_REF]
# [+0x460] Flags2 : 0x200d014 [Type: unsigned long]
# [+0x460 ( 0: 0)] JobNotReallyActive : 0x0 [Type: unsigned long]
# [+0x460 ( 1: 1)] AccountingFolded : 0x0 [Type: unsigned long]
# [+0x460 ( 2: 2)] NewProcessReported : 0x1 [Type: unsigned long]
# ...
```

### Processes (kernel-mode)

Each time you break into the kernel-mode debugger, one of the processes will be active. You may learn which one by running the `!process -1 0` command. If you are going to work with the user-mode memory space, you need to reload the process module symbols (otherwise you will see symbols from the last reload). You may do so while switching the process context with one of these commands:

- `.process /i` - /i means invasive debugging and allows you to control the process from the kernel debugger
- `.process /r /p` - /r reloads user-mode symbols after the process context has been set (the behavior is the same as `.reload /user`), /p translates all transition page table entries (PTEs) for this process to physical addresses before access

`!peb` shows loaded modules, environment variables, command-line arguments, and more.

The `!process 0 0 {image}` command finds a process using its image name, e.g.: `!process 0 0 LINQPad.UserQuery.exe`.

When we know the process ID, we may use `!process {PID | address} 0x7` (the 0x7 flag will list all the threads with their stacks).

### Handles

There is a special debugger extension command `!handle` that allows you to find system handles reserved by a process. To list all handles reserved by a process use -1 (in kernel mode) or 0 (in user-mode). You may filter the list by setting a type of a handle:

```shell
!handle 0 1 File
# ...
# Handle 1c0
# Type File
# 7 handles of type File
```

### Threads

The `!thread {addr}` command shows details about a specific thread.

Each thread has its own register values. These values are stored in the CPU registers when the thread is executing and are stored in memory when another thread is executing. You can set the register context using the `.thread` command:

```
.thread [/p [/r] ] [/P] [/w] [Thread]
```

or

```
.trap [Address]
.cxr [Options] [Address]
```

For **WOW64 processes**, the /w parameter (`.thread /w`) will additionally switch to the x86 context. After loading the thread context, the commands operating on the stack should start working (remember to be in the right process context).

**To list all threads** in the current process, use the `~*` command (user-mode). A dot (.) in the first column marks the currently selected thread and a hash (#) marks the thread on which an exception occurred.

`!runaway` shows the time consumed by each thread:

```shell
!runaway 7
# User Mode Time
# Thread Time
# 0:bfc 0 days 0:00:00.031
# 3:10c 0 days 0:00:00.000
# 2:844 0 days 0:00:00.000
# 1:15bc 0 days 0:00:00.000
# Kernel Mode Time
# Thread Time
# 0:bfc 0 days 0:00:00.046
# 3:10c 0 days 0:00:00.000
# 2:844 0 days 0:00:00.000
# 1:15bc 0 days 0:00:00.000
# Elapsed Time
# Thread Time
# 0:bfc 0 days 0:27:19.817
# 1:15bc 0 days 0:27:19.810
# 2:844 0 days 0:27:19.809
# 3:10c 0 days 0:27:19.809
```

`~~[thread-id]` - use this syntax when you would like to refer to a thread by its system thread ID.

`!tls Slot` extension displays a thread local storage slot (or -1 for all slots).

### Critical sections

Display information about a particular critical section: `!critsec {address}`.

`!locks` extension in Ntsdexts.dll displays a list of critical sections associated with the current process.

`!cs -lso [Address]` - display information about critical sections (-l - only locked critical sections, -o - owner's stack, -s - initialization stack, if available)

`!critsec Address` - information about a specific critical section

```sh
!cs -lso
# -----------------------------------------
# DebugInfo = 0x77294380
# Critical section = 0x772920c0 (ntdll!LdrpLoaderLock+0x0)
# LOCKED
# LockCount = 0x10
# WaiterWoken = No
# OwningThread = 0x00002c78
# RecursionCount = 0x1
# LockSemaphore = 0x194
# SpinCount = 0x00000000
# -----------------------------------------
# DebugInfo = 0x00581850
# Critical section = 0x5164a394 (AcLayers!NS_VirtualRegistry::csRegCriticalSection+0x0)
# LOCKED
# LockCount = 0x4
# WaiterWoken = No
# OwningThread = 0x0000206c
# RecursionCount = 0x1
# LockSemaphore = 0x788
# SpinCount = 0x00000000
```

Finally, we may use the raw output:

```shell
dx -r1 ((ole32!_RTL_CRITICAL_SECTION_DEBUG *)0x581850)
# ((ole32!_RTL_CRITICAL_SECTION_DEBUG *)0x581850) : 0x581850 [Type: _RTL_CRITICAL_SECTION_DEBUG *]
# [+0x000] Type : 0x0 [Type: unsigned short]
# [+0x002] CreatorBackTraceIndex : 0x0 [Type: unsigned short]
# [+0x004] CriticalSection : 0x5164a394 [Type: _RTL_CRITICAL_SECTION *]
# [+0x008] ProcessLocksList [Type: _LIST_ENTRY]
# [+0x010] EntryCount : 0x0 [Type: unsigned long]
# [+0x014] ContentionCount : 0x6 [Type: unsigned long]
# [+0x018] Flags : 0x0 [Type: unsigned long]
# [+0x01c] CreatorBackTraceIndexHigh : 0x0 [Type: unsigned short]
# [+0x01e] SpareUSHORT : 0x0 [Type: unsigned short]
```

Controlling process execution
-----------------------------

### Controlling the target (g, t, p)

To run until the current function returns (go up), use the `gu` command. We can go to a specified address using `ga [address]`. We can also step or trace to a specified address using the `pa` and `ta` commands, respectively.

Other useful commands are `pc` and `tc`, which step or trace to the next call instruction, and `pt` and `tt`, which step or trace to the next return instruction.
### Watch trace `wt` is a very powerful command and might be excellent at revealing what the function under the cursor is doing, eg. (-oa displays the actual address of the call sites, -or displays the return register values): ```shell wt -l1 -oa -or # Tracing notepad!NPInit to return address 00007ff6`72c23af5 # 11 0 [ 0] notepad!NPInit # call at 00007ff6`72c27749 # 14 0 [ 1] notepad!_chkstk rax = 1570 # 20 14 [ 0] notepad!NPInit # call at 00007ff6`72c27772 # 11 0 [ 1] USER32!RegisterWindowMessageW rax = c06f # 26 25 [ 0] notepad!NPInit # call at 00007ff6`72c2778f # 11 0 [ 1] USER32!RegisterWindowMessageW rax = c06c # 31 36 [ 0] notepad!NPInit # call at 00007ff6`72c277a5 # 6 0 [ 1] USER32!NtUserGetDC rax = 9011652 # >> More than one level popped 0 -> 0 # 37 42 [ 0] notepad!NPInit # call at 00007ff6`72c277bc # 1635 0 [ 1] notepad!InitStrings rax = 1 # 42 1677 [ 0] notepad!NPInit # call at 00007ff6`72c277d0 # 8 0 [ 1] USER32!LoadCursorW rax = 10007 # 46 1685 [ 0] notepad!NPInit # call at 00007ff6`72c277e4 # 8 0 [ 1] USER32!LoadCursorW rax = 10009 # 50 1693 [ 0] notepad!NPInit # call at 00007ff6`72c277fb # 24 0 [ 1] USER32!LoadAcceleratorsW # 24 0 [ 1] USER32!LoadAcc rax = 0 # 59 1741 [ 0] notepad!NPInit # call at 00007ff6`72c27d84 # 6 0 [ 1] notepad!_security_check_cookie rax = 0 # 69 1747 [ 0] notepad!NPInit # # 1816 instructions were executed in 1815 events (0 from other threads) # # Function Name Invocations MinInst MaxInst AvgInst # USER32!LoadAcc 1 24 24 24 # USER32!LoadAcceleratorsW 1 24 24 24 # USER32!LoadCursorW 2 8 8 8 # USER32!NtUserGetDC 1 6 6 6 # USER32!RegisterWindowMessageW 2 11 11 11 # notepad!InitStrings 1 1635 1635 1635 # notepad!NPInit 1 69 69 69 # notepad!_chkstk 1 14 14 14 # notepad!_security_check_cookie 1 6 6 6 # # 1 system call was executed # # Calls System Call # 1 USER32!NtUserGetDC ``` The first number in the trace output specifies the number of instructions that were executed from the beginning of the trace in a given function (it is always 
incrementing), the second number specifies the number of instructions executed in the child functions (it is also always incrementing), and the third represents the depth of the function in the stack (parameter -l). If the `wt` command does not work, you may achieve similar results manually with the help of the target controlling commands: - stepping until a specified address: `ta`, `pa` - stepping until the next branching instruction: `th`, `ph` - stepping until the next call instruction: `tc`, `pc` - stepping until the next return: `tt`, `pt` - stepping until the next return or call instruction: `tct`, `pct` ### Breaking when a specific function is in the call stack ```shell bp Module!MyFunctionWithConditionalBreakpoint "r $t0 = 0;.foreach (v { k }) { .if ($spat(\"v\", \"*Module!ClassA:MemberFunction*\")) { r $t0 = 1;.break } }; .if($t0 = 0) { gc }" ``` ### Breaking on a specific function enter and leave The trick is to set a one-time breakpoint on the return address (`bp /1 @$ra`) when the main breakpoint is hit, for example: ```shell bp 031a6160 "dt ntdll!_GUID poi(@esp + 8); .printf /D \"==> obj addr: %p\", poi(@esp + C);.echo; bp /1 @$ra; g" bp kernel32!RegOpenKeyExW "du @rdx; bp /1 @$ra \"r @$retreg; g\"; g" ``` ```shell bp kernelbase!CreateFileW ".printf \"CreateFileW('%mu', ...)\", @rcx; bp /1 @$ra \".printf \\\" => %p\\\\n\\\", @rax; g\"; g" bp kernelbase!DeviceIoControl ".printf \"DeviceIoControl(%p, %p, ...)\\n\", @rcx, @rdx; g" bp kernelbase!CloseHandle ".printf \"CloseHandle(%p)\\n\", @rcx;g" ``` Remove the 'g' commands from the above samples if you want the debugger to stop. ### Breaking for all methods in the C++ object virtual table This could be useful when debugging COM interfaces, as in the example below. 
When we know the number of methods in the interface and the address of the virtual table, we may set the breakpoints using the `.for` loop, for example:

```shell
.for (r $t0 = 0; @$t0 < 5; r $t0= @$t0 + 1) { bp poi(5f4d8948 + @$t0 * @$ptrsize) }
```

### Breaking when a user-mode process is created (kernel-mode)

`bp nt!PspInsertProcess`

The breakpoint is hit whenever a new user-mode process is created. To learn which process it is, we may read the `ImageFileName` field of the `_EPROCESS` structure.

```shell
# x64
dt nt!_EPROCESS @rcx ImageFileName
# x86
dt nt!_EPROCESS @eax ImageFileName
```

### Setting a user-mode breakpoint in kernel-mode

You may set a breakpoint in user space, but you need to be in a valid process context:

```shell
!process 0 0 notepad.exe
# PROCESS ffffe0014f80d680
# SessionId: 2 Cid: 0e44 Peb: 7ff7360ef000 ParentCid: 0aac
# DirBase: 2d497000 ObjectTable: ffffc00054529240 HandleCount:
# Image: notepad.exe

.process /i ffffe0014f80d680
# You need to continue execution (press 'g' ) for the context
# to be switched. When the debugger breaks in again, you will be in
# the new process context.

kd> g
```

Then, when you are in the given process context, set the breakpoint:

```shell
.reload /user

!process -1 0
# PROCESS ffffe0014f80d680
# SessionId: 2 Cid: 0e44 Peb: 7ff7360ef000 ParentCid: 0aac
# DirBase: 2d497000 ObjectTable: ffffc00054529240 HandleCount:
# Image: notepad.exe

x kernel32!CreateFileW
# 00007ffa`d8502508 KERNEL32!CreateFileW ()

bp 00007ffa`d8502508
```

An alternative way (which does not require process context switching) is to use data execution breakpoints, e.g.:

```shell
!process 0 0 notepad.exe
# PROCESS ffffe0014ca22480
# SessionId: 2 Cid: 0614 Peb: 7ff73628f000 ParentCid: 0d88
# DirBase: 5607b000 ObjectTable: ffffc0005c2dfc40 HandleCount:
# Image: notepad.exe

.process /r /p ffffe0014ca22480
# Implicit process is now ffffe001`4ca22480
# .cache forcedecodeuser done
# Loading User Symbols
# ..........................
x KERNEL32!CreateFileW # 00007ffa`d8502508 KERNEL32!CreateFileW () ba e1 00007ffa`d8502508 ``` For both those commands you may limit their scope to a particular process using /p switch. Scripting the debugger ---------------------- ### Using meta-commands (legacy way) WinDbg contains several meta-commands (starting with a dot) that allow you to control the debugger actions. The `.expr` command prints the expression evaluator (MASM or C++) that will be used when interpreting the symbols in the executed commands. You may use the /s to change it. The `?` command uses the default evaluator, and `??` always uses the C++ evaluator. Also, you can mix the evaluators in one expression by using `@@c++(expression)` or `@@masm(expression)` syntax, for example: `? @@c++(@$peb->ImageSubsystemMajorVersion) + @@masm(0y1)`. When using `.if` and `.foreach`, sometimes the names are not resolved - use spaces between them. For example, the command would fail if there was no space between poi( and addr in the code below. ```shell .foreach (addr {!DumpHeap -mt 71d75b24 -short}) { .if (dwo(poi( addr + 5c ) + c)) { !do addr } } ``` ### Using the dx command The [dx command](https://learn.microsoft.com/en-us/windows-hardware/drivers/debuggercmds/dx--display-visualizer-variables-) allows us to query the Debugger Object Model. There is a set of root objects from which we may start our query, including `@$cursession`, `@$curprocess`, `@$curthread`, `@$curstack`, or `@$curframe`. `dx Debugger.State` shows the current state of the debugger. 
The -h parameter additionally displays help for the debugger objects, for example:

```shell
dx -h Debugger.State
# Debugger.State [State pertaining to the current execution of the debugger (e.g.: user variables)]
# DebuggerInformation [Debugger variables which are owned by the debugger and can be referenced by a pseudo-register prefix of @$]
# DebuggerVariables [Debugger variables which are owned by the debugger and can be referenced by a pseudo-register prefix of @$]
# FunctionAliases [Functions aliased to names which are accessible via a pseudo-register prefix of @$ or executable via a '!' command prefix]
# PseudoRegisters [Categorizied debugger managed pseudo-registers which can be referenced by a pseudo-register prefix of @$]
# Scripts [Scripts which have been loaded into the debugger and have properties, methods, or other accessible constructs]
# UserVariables [User variables which are maintained by the debugger and can be referenced by a pseudo-register prefix of @$]
# ExtensionGallery [Extension Gallery]
```

If we add the -v parameter, dx will print not only the values of the properties and fields but also the methods we may call on an object:

```shell
dx -v -r1 Debugger.Sessions[0].Processes[15416].Threads[12796]
# Debugger.Sessions[0].Processes[15416].Threads[12796] [Switch To]
# Id : 0x31fc
# Index : 0x0
# Stack
# Registers
# SwitchTo [SwitchTo() - Switch to this thread as the default context]
# Environment
# TTD
# ToDisplayString [ToDisplayString([FormatSpecifier]) - Method which converts the object to its display string representation according to an optional format specifier]
```

#### Using variables and creating new objects in the dx query

In our queries, we may create anonymous objects, lambdas, arrays, and objects of the Debugger Object Model types, for example:

```sh
# Create an anonymous object for each call to RtlSetLastWin32Error that contains TTD time of the call and the error code value
dx -g
@$cursession.TTD.Calls("ntdll!RtlSetLastWin32Error").Select(c => new { TimeStart = c.TimeStart, Error = c.Parameters[0] }) # ========================================= # = = (+) TimeStart = Error = # ========================================= # = [0x0] - 725:3B - 0xbb = # = [0x1] - 725:3D6 - 0x57 = # = [0x2] - 725:4AA - 0x57 = # = [0x3] - 725:EF0 - 0xbb = # .... # Create a simple array containing four numbers dx Debugger.Utility.Collections.CreateArray(1, 2, 3, 4) # Debugger.Utility.Collections.CreateArray(1, 2, 3, 4) # [0x0] : 1 # [0x1] : 2 # [0x2] : 3 # [0x3] : 4 # Create a TTD position object and use it to set the current trace position dx -s @$create("Debugger.Models.TTD.Position", 4173, 75).SeekTo() # Create a lambda function to sum two numbers dx ((x, y) => x + y)(1, 2) # ((x, y) => x + y)(1, 2) : 3 ``` Additionally, we may assign the created object or the result of a dx query to a variable, for example: ```shell # Assign a lambda function to a $sum variable and use it dx @$sum = (x, y) => x + y dx @$sum(1, 2) # @$sum(1, 2) : 3 # Save all calls to the CreateFileW function to the @$calls variable dx @$calls = @$cursession.TTD.Calls("kernelbase!CreateFileW") ``` We may also use variables and pseudo-registers available in the debugger context. You may list them by examining the `Debugger.State.DebuggerVariables`, `Debugger.State.PseudoRegisters`, and `Debugger.State.UserVariables` objects. #### Using text files The `FileSystem` API allows us to access the host file system. To have the full control over the lifetime of the opened file handle, I recommend using the file object explicitly. 
The following code is an example when we read all lines from a file to an array: ```cpp dx @$file = Debugger.Utility.FileSystem.OpenFile("c:\\temp\\test.txt") dx @$lines = Debugger.Utility.FileSystem.CreateTextReader(@$file).ReadLineContents().ToArray() dx @$file.Close() ``` #### Example queries with explanations ```sh # Find kernel32 exports that contain the 'RegGetVal' string (by Tim Misiak) dx @$curprocess.Modules["kernel32"].Contents.Exports.Where(exp => exp.Name.Contains("RegGetVal")) # Show the address of the exported RegGetValueW function (by Tim Misiak) dx -r1 @$curprocess.Modules["kernel32"].Contents.Exports.Single(exp => exp.Name == "RegGetValueW").CodeAddress # Set a breakpoint on every exported function of the bindfltapi module dx @$curprocess.Modules["bindfltapi"].Contents.Exports.Select(m => Debugger.Utility.Control.ExecuteCommand($"bp {m.CodeAddress}")) # Show the number of calls made to functions with names starting from NdrClient in the rpcrt4 module dx -g @$cursession.TTD.Calls("rpcrt4!NdrClient*").GroupBy(c => c.Function).Select(g => new { Function = g.First().Function, Count = g.Count() }) ``` More examples of the dx queries for analysing the TTD traces can be found in the [TTD guide](/guides/using-ttd). #### Managed application support in the dx queries The SOS extension does not currently support the Debugger Object Models, but we can see that some of the debugger objects understand the managed context. 
For example, when we list **stack frames** of a managed process, the method names should be properly decoded: ```shell dx -r1 @$curprocess.Threads[13236].Stack.Frames # @$curprocess.Threads[13236].Stack.Frames # [0x0] : ntdll!NtReadFile + 0x14 [Switch To] # [0x1] : KERNELBASE!ReadFile + 0x7b [Switch To] # [0x2] : System_Console!Interop.Kernel32.ReadFile + 0x84 [Switch To] # [0x3] : System_Console!System.ConsolePal.WindowsConsoleStream.ReadFileNative + 0x60 [Switch To] # [0x4] : System_Console!System.ConsolePal.WindowsConsoleStream.Read + 0x2b [Switch To] # [0x5] : System_Console!System.IO.ConsoleStream.Read + 0x74 [Switch To] # [0x6] : System_Private_CoreLib!System.IO.StreamReader.ReadBuffer + 0x268 [Switch To] # [0x7] : System_Private_CoreLib!System.IO.StreamReader.ReadLine + 0xd3 [Switch To] # [0x8] : System_Console!System.IO.SyncTextReader.ReadLine + 0x3d [Switch To] # [0x9] : System_Console!System.Console.ReadLine + 0x19 [Switch To] # [0xa] : testcs!Program.Main + 0xc6 [Switch To] # ... dx -r1 @$curprocess.Threads[13236].Stack.Frames[10] # @$curprocess.Threads[13236].Stack.Frames[10] : testcs!Program.Main + 0xc6 [Switch To] # LocalVariables # Parameters : () # Attributes dx -r1 @$curprocess.Threads[13236].Stack.Frames[10].LocalVariables # @$curprocess.Threads[13236].Stack.Frames[10].LocalVariables # ex : 0x0 [Type: System.Exception] # slot0 [Type: System.Runtime.CompilerServices.DefaultInterpolatedStringHandler] # ... 
``` Additionally, we may query **the managed heap** (the `ManagedHeap` property is a nice replacement for the `!DumpHeap` command): ```shell dx -r1 @$curprocess.Memory.ManagedHeap # @$curprocess.Memory.ManagedHeap # GCHandles # Objects # ObjectsByType dx -r1 @$curprocess.Memory.ManagedHeap.Objects # @$curprocess.Memory.ManagedHeap.Objects # [0x0] : 0x1ab6fc00020 size = 60 type = int[] # [0x1] : 0x1ab6fc00080 size = 80 type = System.OutOfMemoryException # [0x2] : 0x1ab6fc00100 size = 80 type = System.StackOverflowException # [0x3] : 0x1ab6fc00180 size = 80 type = System.ExecutionEngineException # [0x4] : 0x1ab6fc00200 size = 18 type = System.Object # [0x5] : 0x1ab6fc00218 size = 18 type = System.String # [0x6] : 0x1ab6fc00230 size = 50 type = System.Collections.Generic.Dictionary # [0x7] : 0x1ab6fc00280 size = 48 type = System.String # [...] ``` ### Using the JavaScript engine Links: - [Official Microsoft documentation](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/javascript-debugger-scripting) - [The API reference for the host object](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/native-objects-in-javascript-extensions-debugger-objects) - [Debugger data model, Javascript & x64 exception handling](https://doar-e.github.io/blog/2017/12/01/debugger-data-model) - a great article on scripting the debugger by Alex "0vercl0k" Souchet #### Loading a script The `.scriptproviders` command must include the JavaScript provider in the output. Then we may run a script with the `.scriptrun` command or load it using the `.scriptload` command. The difference is that model modifications made by the `.scriptload` will stay in place until the call to `.scriptunload`. Also, `.scriptrun` will call the `invokeScript` JS function after the usual calls to the root code and the `initializeScript` function. `.scriptlist` lists the loaded scripts. 
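
To see the call order described above in practice, we may load a tiny script that only logs which entry point was called. This is a minimal sketch: the `logn` helper and the `calls` array are names made up for this example, and the `console.log` fallback is an assumption added only so that the file can also be smoke-tested outside the debugger (inside WinDbg the `host` object is always available).

```javascript
"use strict";

// Hypothetical logging helper: uses the debugger log when running in WinDbg
// and falls back to the console elsewhere (e.g., under Node.js).
function logn(msg) {
    if (typeof host !== "undefined") {
        host.diagnostics.debugLog(msg + "\n");
    } else {
        console.log(msg);
    }
}

// Records the order in which the provider calls into the script.
const calls = ["root code"];
logn("root code executed");

// Called by both .scriptload and .scriptrun, right after the root code.
function initializeScript() {
    calls.push("initializeScript");
    logn("initializeScript called");
}

// Called only by .scriptrun, after initializeScript.
function invokeScript() {
    calls.push("invokeScript");
    logn("invokeScript called");
}

// Called when the script is unloaded (.scriptunload).
function uninitializeScript() {
    calls.push("uninitializeScript");
    logn("uninitializeScript called");
}
```

Running the file with `.scriptrun` should print all three messages up to `invokeScript`, while `.scriptload` should stop after `initializeScript` and leave the script on the `.scriptlist` output until `.scriptunload` is called.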
#### Running a script After loading a script file, we may find it in the `Debugger.State.Scripts` list (`.scriptlist` will show it, too): ```shell .scriptload c:\windbg-js\windbg-scripting.js # JavaScript script successfully loaded from 'c:\windbg-js\windbg-scripting.js' dx -r1 Debugger.State.Scripts # Debugger.State.Scripts # windbg-scripting ``` Then we are ready to call any defined public function, for example, logn: ```shell dx Debugger.State.Scripts.@"windbg-scripting".Contents.logn("test") # test Debugger.State.Scripts.@"windbg-scripting".Contents.logn("test") ``` The `@$scriptContents` variable is a shortcut to all the public functions from all the loaded scripts, so our call could be more compact: ```shell dx @$scriptContents.logn("test") # test @$scriptContents.logn("test") ``` #### Working with types The `Number` type in JavaScript has a 53-bit limitation, which prevents us from working with 64-bit types. Fortunately, WinDbg provides us with the `Int64` type with methods for operations on 64-bit numbers such as `getLowPart`, `getHighPart`, `bitwiseAnd`, or `bitwiseShiftLeft` (others in [the documentation](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/javascript-debugger-scripting#work-with-64-bit-values-in-javascript-extensions)). 
It also has a properly implemented `toString` and can be safely used for hexadecimal conversion of data from the debugger, for example:

```js
function initializeScript() {
    return [new host.apiVersionSupport(1, 7),
        new host.functionAlias(runTest, "runTest")]
}

function __hexString2(n) {
    return n.toString(16);
}

function runTest(n) {
    return __hexString2(n);
}
```

```shell
dx @$n = 0xffffffffffffffff
# @$n = 0xffffffffffffffff : -1

!runTest(@$n)
# @$runTest(@$n) : ffffffffffffffff
# Length : 0x10
```

Additionally, the JS provider tracks created `Int64` objects and, if an object for a given value already exists, it will be returned, for example:

```js
const call = host.currentSession.TTD.Calls("combase!CoCreateInstance").First();
const ppv = call.Parameters.ppv.address;
call.TimeEnd.SeekTo();
const cobj = host.evaluateExpression(`*(void **)${ppv}`).address;
const i2 = new host.Int64(cobj);
const m = new Map();
m.set(i2, clsid);
m.set(cobj, clsid);
// the m size is 1
__logn(`m size : ${m.size}`);
```

#### Accessing the debugger engine objects

The `host.namespace` gives us access to the `debuggerRootNamespace` which we normally use with the `dx` command:

```shell
dx @$debuggerRootNamespace
# @$debuggerRootNamespace
# Debugger
```

```js
var ctl = host.namespace.Debugger.Utility.Control;
ctl.ExecuteCommand(".process /p /r " + procId);
```

DML might pollute the command output. If that's the case, you may disable it with the `.prefer_dml 0` command.

#### Evaluating expressions in a debugger context

The `host.evaluateExpression` function allows us to evaluate expressions in the debugger context, for example:

```js
function exc(addr) {
    let exceptionRecord = host.evaluateExpression(`(_EXCEPTION_RECORD*)${addr}`);
    let exceptionCode = host.evaluateExpression(`(DWORD)${exceptionRecord.ExceptionCode}`)
    if (exceptionCode === 0xe06d7363) {
        println("== EH exception ==");
        exceptionRecord.
} else { logn(`Other exception: ${exceptionCode}`) } } ``` It is quite slow, so using it in frequently executed functions is not practical. #### Debugging a script After we loaded the script (`.scriptload`), we may also debug its parts thanks to the `.scriptdebug` command, for example: ```shell .scriptload c:\windbg-js\strings.js .scriptdebug strings.js # *** Inside JS debugger context *** | # ... # [11] NatVis script from 'C:\Program Files\WindowsApps\Microsoft.WinDbg_1.2308.2002.0_x64__8wekyb3d8bbwe\amd64\Visualizers\winrt.natvis' # [12] [*DEBUGGED*] JavaScript script from 'c:\windbg-js\strings.js' # bp logn # Breakpoint 1 set at logn (11:5) bl # Id State Pos # 1 enabled 11:5 # q ``` We are running a debugger in the debugger, so it could be a bit confusing :) After quitting the JavaScript debugger, it will keep the breakpoints information, so when we call our function from the main debugger, we will land in the JavaScript debugger again, for example: ```shell dx @$scriptContents.logn("test") # >>> ****** SCRIPT BREAK strings [Breakpoint 1] ****** # Location: line = 11, column = 5 # Text: log(s + "\n") # # *** Inside JS debugger context *** dv # s = test ``` The number of commands available in the inner JavaScript debugger is quite long and we may list them with the `.help` command. Especially, the evaluate expression (`?` or `??`) are very useful as they allow us to execute any JavaScript expressions and check their results: ```shell ? host # host : {...} # __proto__ : {...} # ... 
# Int64 : function () { [native code] } # parseInt64 : function () { [native code] } # namespace : {...} # evaluateExpression : function () { [native code] } # evaluateExpressionInContext : function () { [native code] } # getModuleSymbol : function () { [native code] } # getModuleContainingSymbol : function () { [native code] } # getModuleContainingSymbolInformation : function () { [native code] } # getModuleSymbolAddress : function () { [native code] } # setModuleSymbol : function () { [native code] } # getModuleType : function () { [native code] } # ... ``` ### Launching commands from a script file We can also execute commands from a script file. We use the `$$` command family for that purpose. The -c option allows us to run a command on a debugger launch. So if we pass the `$$<` command with a file path, windbg will read the file and execute the commands from it as if they were entered manually, for example: ```shell windbgx -c "$$args<` command variant to pass arguments to our script. When analyzing multiple files, I often use PowerShell to call WinDbg with the commands I want to run. In each WinDbg session, I pass the output of the commands to the windbg.log file, for example: ```shell Get-ChildItem .\dumps | % { Start-Process -Wait -FilePath windbg-x64\windbg.exe -ArgumentList @("-loga", "windbg.log", "-y", "`"SRV*C:\dbg\symbols*https://msdl.microsoft.com/download/symbols`"", "-c", "`".exr -1; .ecxr; k; q`"", "-z", $_.FullName) } ``` To make a **comment**, you can use one of the comment commands: `$$ my comment` or `* my comment`. The difference between them is that `*` comments everything till the end of the line, while `$$` comments text till the semicolon (or end of a line), e.g., `r eax; $$ some text; r ebx; * more text; r ecx` will print eax, ebx but not ecx. The `.echo` command ends if the debugger encounters a semicolon (unless the semicolon occurs within a quoted string). 
Time Travel Debugging (TTD)
---------------------------

[TTD](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/time-travel-debugging-overview) is a fantastic way to debug application code. After collecting a debug trace, we may query process memory and function calls, go deeper and deeper into the call stacks if necessary, and jump through various process lifetime events.

### Installation

The collector is installed with WinDbgX and we may enable it when starting a WinDbgX debugging session. Alternatively, we could [install the command-line TTD collector](https://learn.microsoft.com/en-us/windows-hardware/drivers/debuggercmds/time-travel-debugging-ttd-exe-command-line-util#how-to-download-and-install-the-ttdexe-command-line-utility-preferred-method). The PowerShell script published on the linked site is capable of installing TTD even on systems that do not support MSIX installations. The command-line tool is probably the best option when collecting TTD traces on server systems. When done, you may uninstall the driver by using the -cleanup option.

### Collection

If you have WinDbgX, you may use TTD by checking the "Record with Time Travel Debugging" checkbox when you start a new process or attach to a running one. When you stop the TTD trace in WinDbgX, it will terminate the target process (TTD.exe, described later, can detach from a process without killing it).

An alternative to WinDbgX is running the command-line TTD collector.
Some usage examples:

```sh
# launch a new winver.exe process and record the trace in C:\logs
ttd.exe -accepteula -out c:\logs winver.exe

# attach and trace the process with ID 1234 and all its newly started children
ttd.exe -accepteula -children -out c:\logs -attach 1234

# attach and trace the process with ID 1234 to a ring buffer, backed by a trace file of maximum size 1024 MB
ttd.exe -accepteula -ring -maxFile 1024 -out c:\logs -attach 1234

# record a trace of the running and newly started processes, add a timestamp to the trace file names
ttd.exe -accepteula -timestampFilename -out c:\logs -monitor winver.exe
ttd.exe -accepteula -timestampFilename -out c:\logs -monitor app1.exe -monitor app2.exe
```

### Accessing TTD data

We can access TTD objects by querying the TTD property of the session or process objects:

```sh
dx -v @$cursession.TTD
# @$cursession.TTD
#     HeapLookup [Returns a vector of heap blocks that contain the provided address: TTD.Utility.HeapLookup(address)]
#     Calls [Returns call information from the trace for the specified set of methods: TTD.Calls("module!method1", "module!method2", ...)
#         For example: dx @$cursession.TTD.Calls("user32!SendMessageA")]
#     Memory [Returns memory access information for specified address range: TTD.Memory(startAddress, endAddress [, "rwec"])]
#     MemoryForPositionRange [Returns memory access information for specified address range and position range: TTD.MemoryForPositionRange(startAddress, endAddress [, "rwec"], minPosition, maxPosition)]
#     PinObjectPosition [Pins an object to the given time position: TTD.PinObjectPosition(obj, pos)]
#     AsyncQueryEnabled : false
#     Data : Normalized data sources based on the contents of the time travel trace
#     Utility : Methods that can be useful when analyzing time travel traces
#     ToDisplayString [ToDisplayString([FormatSpecifier]) - Method which converts the object to its display string representation according to an optional format specifier]

dx -v @$curprocess.TTD
# @$curprocess.TTD
#     Index
#     Threads
#     Events
#     DebugOutput
#     Lifetime : [66:0, 118A2:0]
#     DefaultMemoryPolicy : GloballyAggressive
#     SetPosition [Sets the debugger to point to the given position on this process.]
#     GatherMemoryUse [0]
#     RecordClients
```

### Querying debugging events

The `@$curprocess.TTD.Events` collection contains [TTD Event objects](https://learn.microsoft.com/en-us/windows-hardware/drivers/debuggercmds/time-travel-debugging-event-objects).
We can use the group query to learn what type of events we have in our trace: ```sh dx -g @$curprocess.TTD.Events.GroupBy(ev => ev.Type).Select(g => new { Type = g.First().Type, Count = g.Count() }) # =========================================================== # = = (+) Type = Count = # =========================================================== # = ["ModuleLoaded"] - ModuleLoaded - 0x23 = # = ["ThreadCreated"] - ThreadCreated - 0x9 = # = ["ThreadTerminated"] - ThreadTerminated - 0x9 = # = ["Exception"] - Exception - 0x4 = # = ["ModuleUnloaded"] - ModuleUnloaded - 0x23 = # =========================================================== ``` Next, we may filter the list for events that interest us, for example, to extract the first [TTD Exception object](https://learn.microsoft.com/en-us/windows-hardware/drivers/debuggercmds/time-travel-debugging-exception-objects), we may run the following query: ```sh dx @$curprocess.TTD.Events.Where(ev => ev.Type == "Exception").Select(ev => ev.Exception).First() # @$curprocess.TTD.Events.Where(ev => ev.Type == "Exception").Select(ev => ev.Exception).First() : Exception 0xE0434352 of type Software at PC: 0X7FF91E0842D0 # Position : 7E7C:0 [Time Travel] # Type : Software # ProgramCounter : 0x7ff91e0842d0 # Code : 0xe0434352 # Flags : 0x1 # RecordAddress : 0x0 # ... ``` ### Examining function calls The `Calls` method of the `TTD` objects allows us to query [function calls](https://learn.microsoft.com/en-us/windows-hardware/drivers/debuggercmds/time-travel-debugging-calls-objects) made in the trace. 
We may use either an address or a symbol name (even with wildcards) as a parameter to the Calls method: ```shell x OLEAUT32!IDispatch_Invoke_Proxy # 75a13bf0 OLEAUT32!IDispatch_Invoke_Proxy (void) # we may use the address of a function dx @$cursession.TTD.Calls(0x75a13bf0).Count() # @$cursession.TTD.Calls(0x75a13bf0).Count() : 0x6a18 # or its symbolic name dx @$cursession.TTD.Calls("OLEAUT32!IDispatch_Invoke_Proxy").Count() # @$cursession.TTD.Calls("OLEAUT32!IDispatch_Invoke_Proxy").Count() : 0x6a18 ``` Thanks to **wildcards**, we can easily get statistics on function calls from a given module or modules (this call might take some time for longer traces): ```shell # Show the number of calls made to functions with names starting from NdrClient in the rpcrt4 module dx -g @$cursession.TTD.Calls("rpcrt4!NdrClient*").GroupBy(c => c.Function).Select(g => new { Function = g.First().Function, Count = g.Count() }) # ============================================================================== # = = (+) Function = Count = # ============================================================================== # = ["RPCRT4!NdrClientCall2"] - RPCRT4!NdrClientCall2 - 0x5 = # = ["RPCRT4!NdrClientInitialize"] - RPCRT4!NdrClientInitialize - 0x5 = # = ["RPCRT4!NdrClientCall3"] - RPCRT4!NdrClientCall3 - 0x8 = # = ["RPCRT4!NdrClientZeroOut"] - RPCRT4!NdrClientZeroOut - 0x1 = # ============================================================================== ``` TimeStart shows the position of a call in a trace and we may use it to jump between different places in the trace. 
SystemTimeStart shows the clock time of a given call: ```shell dx -g @$cursession.TTD.Calls("user32!DialogBox*").Select(c => new { Function = c.Function, TimeStart = c.TimeStart, SystemTimeStart = c.SystemTimeStart }) # ============================================================================================================== # = = (+) Function = (+) TimeStart = (+) SystemTimeStart = # ============================================================================================================== # = [0x0] - USER32!DialogBoxIndirectParamW - 62E569:57 - Friday, February 2, 2024 16:03:39.391 = # = [0x1] - USER32!DialogBoxIndirectParamAorW - 62E569:5C - Friday, February 2, 2024 16:03:39.391 = # = [0x2] - USER32!DialogBox2 - 631C23:102 - Friday, February 2, 2024 16:03:39.791 = ``` Each function call has a Parameters property that gives us access to the function parameters (without private symbols, we can access the first four parameters) of a call: ```shell # Check which LastErrors were set during the call dx -h @$cursession.TTD.Calls("ntdll!RtlSetLastWin32Error").Select(c => c.Parameters[0]).Distinct() # @$cursession.TTD.Calls("ntdll!RtlSetLastWin32Error").Select(c => c.Parameters[0]).Distinct() # [0x0] : 0xbb # [0x1] : 0x57 # [0x2] : 0x0 # [0x3] : 0x7e # [0x4] : 0x3f0 # Find LastError calls when LastError is not zero dx -g @$cursession.TTD.Calls("ntdll!RtlSetLastWin32Error").Where(c => c.Parameters[0] != 0).Select(c => new { TimeStart = c.TimeStart, Error = c.Parameters[0] }) # ========================================= # = = (+) TimeStart = Error = # ========================================= # = [0x0] - 725:3B - 0xbb = # = [0x1] - 725:3D6 - 0x57 = # = [0x2] - 725:4AA - 0x57 = # = [0x3] - 725:EF0 - 0xbb = # .... ``` ### Position in TTD trace The [TTD Position object](https://learn.microsoft.com/en-us/windows-hardware/drivers/debuggercmds/time-travel-debugging-position-objects) describes a moment in time in the trace. 
Its `SeekTo` method allows us to jump to this moment and analyze the process state: ```shell dx -r1 @$create("Debugger.Models.TTD.Position", 34395, 1278) # @$create("Debugger.Models.TTD.Position", 34395, 1278) : 865B:4FE [Time Travel] # Sequence : 0x865b # Steps : 0x4fe # SeekTo [Method which seeks to time position] # ToSystemTime [Method which obtains the approximate system time at a given position] dx -s @$create("Debugger.Models.TTD.Position", 34395, 1278).SeekTo() # (1d30.1b94): Break instruction exception - code 80000003 (first/second chance not available) # Time Travel Position: 865B:4FE ``` Alternatively, we could use `!tt 865B:4FE` to jump to a specific time position. If we are troubleshooting an issue spanning multiple processes, we may simultaneously record TTD traces for all of them, and later, use the TTD Position objects to set the same moment in time in all the traces. It is a very effective technique when debugging locking issues. ### Examining memory access The `Memory` and `MemoryForPositionRange` methods of the TTD Session object return [TTD Memory objects](https://learn.microsoft.com/en-us/windows-hardware/drivers/debuggercmds/time-travel-debugging-memory-objects) describing various operations on the memory. 
For example, the command below shows all the changes to the global GcInProgress variable in a .NET application: ```shell dx -g @$cursession.TTD.Memory(&coreclr!g_pGCHeap->GcInProgress, &coreclr!g_pGCHeap->GcInProgress+4, "w") # ============================================================================================================================================================================================================================================================================================================== # = = (+) EventType = (+) ThreadId = (+) UniqueThreadId = (+) TimeStart = (+) TimeEnd = (+) AccessType = (+) IP = (+) Address = (+) Size = (+) Value = (+) OverwrittenValue = (+) SystemTimeStart = (+) SystemTimeEnd = # ============================================================================================================================================================================================================================================================================================================== # = [0x0] - 0x1 - 0x2c80 - 0x2 - C79:58C - C79:58C - Write - 0x7ff8fdbce0ee - 0x7ff8fe00caf0 - 0x8 - 0x2b4800c9bc0 - 0x0 - poniedziałek, 15 kwietnia 2024 10:14:18.475 - poniedziałek, 15 kwietnia 2024 10:14:18.475 = # = [0x1] - 0x1 - 0x2c80 - 0x2 - 3AF4:5A - 3AF4:5A - Write - 0x7ff8fdcdacc3 - 0x7ff8fe00cae8 - 0x4 - 0x1 - 0x0 - poniedziałek, 15 kwietnia 2024 10:14:20.896 - poniedziałek, 15 kwietnia 2024 10:14:20.896 = # = [0x2] - 0x1 - 0x2c80 - 0x2 - 3B26:E6C - 3B26:E6C - Write - 0x7ff8fdcdacc3 - 0x7ff8fe00cae8 - 0x4 - 0x0 - 0x1 - poniedziałek, 15 kwietnia 2024 10:14:20.910 - poniedziałek, 15 kwietnia 2024 10:14:20.910 = # = [0x3] - 0x1 - 0x2c80 - 0x2 - 87DF:5A - 87DF:5A - Write - 0x7ff8fdcdacc3 - 0x7ff8fe00cae8 - 0x4 - 0x1 - 0x0 - poniedziałek, 15 kwietnia 2024 10:14:24.539 - poniedziałek, 15 kwietnia 2024 10:14:24.539 = # = [0x4] - 0x1 - 0x2c80 - 0x2 - 880C:50C - 880C:50C - Write - 0x7ff8fdcdacc3 - 0x7ff8fe00cae8 - 0x4 - 0x0 - 0x1 - 
poniedziałek, 15 kwietnia 2024 10:14:24.548 - poniedziałek, 15 kwietnia 2024 10:14:24.548 = # = [0x5] - 0x1 - 0x2c80 - 0x2 - 889F:5A - 889F:5A - Write - 0x7ff8fdcdacc3 - 0x7ff8fe00cae8 - 0x4 - 0x1 - 0x0 - poniedziałek, 15 kwietnia 2024 10:14:25.769 - poniedziałek, 15 kwietnia 2024 10:14:25.769 = # ============================================================================================================================================================================================================================================================================================================== ``` The `MemoryForPositionRange` method allows us to additionally limit memory access queries to a specific time-range. It makes sense to use this method for scope-based addresses, such as function parameters or local variables. Below, you may see an example of a query when we list all the places in the CreateFileW function that read the file name (the first argument to the function): ```shell dx -s @$call = @$cursession.TTD.Calls("kernelbase!CreateFileW").First() dx -g @$cursession.TTD.MemoryForPositionRange(@$call.Parameters[0], @$call.Parameters[0] + sizeof(wchar_t), "r", @$call.TimeStart, @$call.TimeEnd) # ====================================================================================================================================================== # = = (+) Position = ThreadId = UniqueThreadId = Address = IP = Size = AccessType = Value = (+) Data = # ====================================================================================================================================================== # = [0x0] - AB:1981 - 0x2018 - 0x2 - 0x236011c33c0 - 0x7ff91e04a836 - 0x2 - Read - 0x55005c003a0043 - {...} = # = [0x1] - AB:1AD4 - 0x2018 - 0x2 - 0x236011c33c0 - 0x7ff91e04b6e1 - 0x2 - Read - 0x55005c003a0043 - {...} = # = [0x2] - AB:1C27 - 0x2018 - 0x2 - 0x236011c33c0 - 0x7ff91e04b796 - 0x2 - Read - 0x55005c003a0043 - {...} = # = [0x3] - AB:1C5E - 0x2018 - 0x2 - 
0x236011c33c0 - 0x7ff91e04bca9 - 0x2 - Read - 0x55005c003a0043 - {...} = # = [0x4] - AB:1CC8 - 0x2018 - 0x2 - 0x236011c33c0 - 0x7ff91e04caa8 - 0x2 - Read - 0x55005c003a0043 - {...} = # = [0x5] - AB:1CCA - 0x2018 - 0x2 - 0x236011c33c0 - 0x7ff91e04caae - 0x2 - Read - 0x55005c003a0043 - {...} = # = [0x6] - AB:1CCF - 0x2018 - 0x2 - 0x236011c33c0 - 0x7ff91e04cabe - 0x2 - Read - 0x55005c003a0043 - {...} = # = [0x7] - AB:1E23 - 0x2018 - 0x2 - 0x236011c33c0 - 0x7ff91e04bd5a - 0x2 - Read - 0x55005c003a0043 - {...} = # = [0x8] - AB:1E2A - 0x2018 - 0x2 - 0x236011c33c0 - 0x7ff91e04bd7b - 0x2 - Read - 0x55005c003a0043 - {...} = # = [0x9] - AB:1E5C - 0x2018 - 0x2 - 0x236011c33c0 - 0x7ff91e04be56 - 0x2 - Read - 0x55005c003a0043 - {...} = # = [0xa] - AB:1E68 - 0x2018 - 0x2 - 0x236011c33c0 - 0x7ff91e04be7a - 0x2 - Read - 0x55005c003a0043 - {...} = # ====================================================================================================================================================== ``` Misc tips --------- ### Converting a memory dump from one format to another When debugging a full memory dump (**/ma**), we may convert it to a smaller memory dump using again the `.dump` command, for example: ```shell .dump /mpi c:\tmp\smaller.dmp ``` ### Loading an arbitrary DLL into WinDbg for analysis WinDbg allows analysis of an arbitrary PE file if we load it as a crash dump (the **Open dump file** menu option or the -z command-line argument), for example: `windbgx -z C:\Windows\System32\shell32.dll`. WinDbg will load a DLL/EXE as a data file. Alternatively, if we want to normally load the DLL, we may use **rundll32.exe** as our debugging target and wait until the DLL gets loaded, for example: `windbgx -c "sxe ld:jscript9.dll;g" rundll32.exe .\jscript9.dll,TestFunction`. The TestFunction in the snippet could be any string. Rundll32.exe loads the DLL before validating the exported function address. 
### Keyboard and mouse shortcuts The **SHIFT + \[UP ARROW\]** completes the current command from previously executed commands (much as F8 in cmd). If you double-click on a word in the command window in WinDbgX, the debugger will **highlight** all occurrences of the selected term. You may highlight other words with different colors if you press the ctrl key when double-clicking on them. To unhighlight a given word, double-click on it again, pressing the ctrl key. ### Running a command for all the processes ```shell dx -r2 @$cursession.Processes.Where(p => p.Name == "test.exe").Select(p => Debugger.Utility.Control.ExecuteCommand("|~[0n" + p.Id + "]s;bp testlib!TestMethod \".lastevent; r @rdx; u poi(@rdx); g\"")) ``` ### Attaching to multiple processes at once In PowerShell: ```shell Get-Process -Name disp+work | where Id -ne 6612 | % { ".attach -b 0n$($_.Id)" } | Out-File -Encoding ascii c:\tmp\attach_all.txt windbgx.exe -c "`$`$ - [General information](#general-information) - [Listing Performance Counters installed in the system](#listing-performance-counters-installed-in-the-system) - [Collecting performance data](#collecting-performance-data) - [Examining the collected performance data](#examining-the-collected-performance-data) - [Using system tools](#using-system-tools) - [Using Log Parser](#using-log-parser) - [Save performance data in SQL Server](#save-performance-data-in-sql-server) - [Fix problems with Performance Counters](#fix-problems-with-performance-counters) - [Corrupted counters](#corrupted-counters) ## General information The Performance Counter selection uses following syntax: `\\Computer\PerfObject(ParentInstance/ObjectInstance#InstanceIndex)\Counter`. In order to match the process instance index with a PID you may use a special counter `\Process(*)\ID Process`. Similar counter (`\.NET CLR Memory(*)\Process ID`) exists for .NET Framework apps. 
If we want to track performance data for a particular process, we should start with collecting data from those two counters, for example: ```shell typeperf -c "\Process(*)\ID Process" -si 1 -sc 1 -f CSV -o pids.txt typeperf -c "\.NET CLR Memory(*)\Process ID" -si 1 -sc 1 -f CSV -o clr-pids.txt ``` An application that supports Performance Counters must have a **Performance** key under the **HKLM\SYSTEM\CurrentControlSet\Services\appname** key. The following example shows the values that you must include for this key. HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Services \application-name \Linkage Export = a REG_MULTI_SZ value that will be passed to the `OpenPerformanceData` function \Performance Library = Name of your performance DLL Open = Name of your Open function in your DLL Collect = Name of your Collect function in your DLL Close = Name of your Close function in your DLL Open Timeout = Timeout when waiting for the `OpenPerformanceData` to finish Collect Timeout = Timeout when waiting for the `CollectPerformanceData` to finish Disable Performance Counters = A value added by system if something is wrong with the library The Performance Counter names and descriptions are stored under the **HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Perflib** key in the registry. HKEY_LOCAL_MACHINE \SOFTWARE \Microsoft \Windows NT \CurrentVersion \Perflib Last Counter = highest counter index Last Help = highest help index \009 Counters = 2 System 4 Memory... Help = 3 The System Object Type... \supported language, other than English Counters = ... Help = ... ## Listing Performance Counters installed in the system To list the available Performance Counters we may use the **Get-Counter** cmdlet in **PowerShell** or the **typeperf** command. For example, below, we look for Performance Counters in the `processor` set: ``` PS> Get-Counter -listset processor CounterSetName : Processor MachineName : . 
CounterSetType : MultiInstance Description : The Processor performance object consists of counters that measure aspects of processor activity. The processor is the part of the computer that performs arithmetic and logical computations, initi ates operations on peripherals, and runs the threads of processes. A computer can have multiple p rocessors. The processor object represents each processor as an instance of the object. Paths : {\Processor(*)\% Processor Time, \Processor(*)\% User Time, \Processor(*)\% Privileged Time, \Proc essor(*)\Interrupts/sec...} PathsWithInstances : {\Processor(0)\% Processor Time, \Processor(1)\% Processor Time, \Processor(_Total)\% Processor Ti me, \Processor(0)\% User Time...} Counter : {\Processor(*)\% Processor Time, \Processor(*)\% User Time, \Processor(*)\% Privileged Time, \Proc essor(*)\Interrupts/sec...} ``` The Get-Counter cmdlet accepts also **wildcards** and is case insensitive so to list Performance Counter sets which starts with `.net` you may issue command: `Get-Counter -listset .net*`. To find all Performance Counters for the `.NET CLR Memory` object using **typeperf**, we could run: ``` > typeperf -q ".NET CLR Memory" \.NET CLR Memory(*)\# Gen 0 Collections \.NET CLR Memory(*)\# Gen 1 Collections ... ``` If we also want to include instance information: ``` > typeperf -qx ".NET CLR Memory" \.NET CLR Memory(_Global_)\# Gen 0 Collections \.NET CLR Memory(powershell)\# Gen 0 Collections \.NET CLR Memory(powershell#1)\# Gen 0 Collections \.NET CLR Memory(_Global_)\# Gen 1 Collections \.NET CLR Memory(powershell)\# Gen 1 Collections ... 
```

Finally, the **lodctr** tool extracts Performance Counters information from the registry:

```
> lodctr /q:".NET CLR Data"
Performance Counter ID Queries [PERFLIB]:
    Base Index: 0x00000737 (1847)
    Last Counter Text ID: 0x0000435A (17242)
    Last Help Text ID: 0x0000435B (17243)

[.NET CLR Data] Performance Counters (Enabled)
    DLL Name: netfxperf.dll
    Open Procedure: OpenPerformanceData
    Collect Procedure: CollectPerformanceData
    Close Procedure: ClosePerformanceData
    First Counter ID: 0x000013A4 (5028)
    Last Counter ID: 0x000013B0 (5040)
    First Help ID: 0x000013A5 (5029)
    Last Help ID: 0x000013B1 (5041)
```

## Collecting performance data

The same tools that we used for querying can also collect Performance Counters data. In **PowerShell**, to collect 20 samples (at a 1-second interval) from all the process counters and save them to a binary file, we could run the following commands:

```shell
(Get-Counter -listset process).Paths > counters.txt
Get-Counter (gc .\counters.txt) -sampleinterval 1 -maxsamples 20 | Export-Counter testdata.blg -FileFormat BLG -Force
```

Another example shows how to collect samples at a 2-second interval until Ctrl+C is pressed:

```shell
Get-Counter (gc .\counters.txt) -sampleinterval 2 -continuous
```

We may achieve the same results with **typeperf**, for example:

```shell
typeperf -cf .\counters.txt -si 1 -o testdata.blg -f BIN -sc 20
typeperf -cf .\counters.txt -si 1
```

Of course, with both PowerShell and typeperf, we may also retrieve data for a single counter:

```shell
typeperf -c "\process(*)\% Processor Time" -si 1 -sc 20 -o testdata.blg -f BIN
```

Finally, we have a GUI tool, **perfmon**, that allows us to pick the interesting counters and present their values in a graph.

We may also trigger a scheduled task when a specific counter threshold is met. You just need to manually create a **User-Created Data Collector** of type **Performance Counter Alert**. You will then be able to select which counter values are interesting for you.
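
Samples written as CSV (for example with `typeperf -f CSV`) are easy to post-process with a short script. The sketch below runs in Node.js and averages every counter column. It assumes the usual PDH-CSV layout (a header row with counter paths, then one row per sample with the timestamp in the first column) and that no field contains the `","` sequence. The sample values and the `parseRow` and `averagePerCounter` helpers are hypothetical.

```javascript
// Hypothetical sample in the assumed PDH-CSV layout: the first column is
// the timestamp, every other column holds the values of one counter.
const sample = String.raw`"(PDH-CSV 4.0)","\\pecet\process(system)\% user time","\\pecet\process(_total)\% user time"
"04/17/2012 06:44:15.000","1.5","10"
"04/17/2012 06:44:16.000","2.5","30"
"04/17/2012 06:44:17.000"," ","20"`;

// Strip the outer quotes and split the row on the "," separators.
function parseRow(line) {
    return line.trim().replace(/^"|"$/g, "").split('","');
}

// Return the average of each counter column, skipping non-numeric samples
// (a blank value, like the one in the last sample row, is ignored).
function averagePerCounter(csv) {
    const [header, ...rows] = csv.split(/\r?\n/).filter((l) => l.trim() !== "");
    const names = parseRow(header).slice(1);
    const sums = names.map(() => 0);
    const counts = names.map(() => 0);
    for (const row of rows) {
        parseRow(row).slice(1).forEach((text, i) => {
            const value = Number.parseFloat(text);
            if (!Number.isNaN(value)) {
                sums[i] += value;
                counts[i] += 1;
            }
        });
    }
    return names.map((name, i) => ({
        name,
        average: counts[i] > 0 ? sums[i] / counts[i] : null,
    }));
}

console.log(averagePerCounter(sample));
```

With the sample data, the first counter averages 2 (the blank sample is skipped) and the second averages 20.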
## Examining the collected performance data

### Using system tools

If we saved the counters data to a binary file, we can open it with **perfmon**:

```shell
perfmon /sys /open "c:\temp\testdata.blg"
```

*REMARK: Remember to specify the full path to the binary file.*

**relog** is a command-line tool to query the collected performance data. For example, to list the Performance Counters available in the input file, run the following command:

```shell
relog -q testdata.blg
```

In PowerShell, the **Import-Counter** cmdlet reads performance data generated by any Performance Counter tool and converts it to performance data objects (the same as generated by the **Get-Counter** command). For example, we may collect binary Performance Counter data with typeperf and convert it using the **Import-Counter** cmdlet:

```shell
typeperf -cf .\counters.txt -si 1 -o testdata.blg -f BIN -sc 20
Import-Counter .\testdata.blg
```

The Import-Counter cmdlet may also show statistics for the performance data file, for example:

```
PS C:\temp> Import-Counter .\testdata.blg -summary

OldestRecord            NewestRecord            SampleCount
------------            ------------            -----------
2012-03-31 15:54:27     2012-03-31 15:54:46     20
```

### Using Log Parser

**[Log Parser Studio](https://techcommunity.microsoft.com/t5/exchange-team-blog/introducing-log-parser-studio/ba-p/601131)** and the command-line **[logparser](https://www.microsoft.com/en-in/download/details.aspx?id=24659)** tool (and library) are great data analysis tools and we may use them to query Performance Counters data as well.
They do not understand the BLG format, so before we can look into the data, we need to convert the BLG file to the CSV format (additional filtering is possible):

```shell
relog -f CSV testdata.blg -o testdata.csv
```

And we are ready to use logparser to parse the data, for example:

```shell
logparser "select * from testdata.csv" -o:DATAGRID
logparser "select top 2 [Event Name], Type, [User Data] into c:\temp\test.csv from dumpfile.csv"
```

To draw a chart presenting the Performance Counters data, use the following syntax:

```shell
logparser "select [time], [\\pecet\process(system)\% user time],[\\pecet\process(_total)\% user time] into test.gif from testdata.csv" -o:CHART
logparser "select to_timestamp(time, 'MM/dd/yyyy HH:mm:ss.ll'), [\\pecet\process(system)\% user time],[\\pecet\process(_total)\% user time] into test.gif from testdata.csv" -o:CHART
```

### Save performance data in SQL Server

To save Performance Counters data in SQL Server, you need to create a new Data Source (ODBC) using the SQL Server driver (SQLSRV32.dll). Then run the relog tool, for example:

```
> relog -f SQL -o SQL:Test!fd .\memperfdata-blog.csv

Input
----------------
File(s):
     .\memperfdata-blog.csv (CSV)

Begin:    2012-4-17 6:44:15
End:      2012-4-17 6:44:25
Samples:  10

100.00%

Output
----------------
File:     SQL:Test!fd

Begin:    2012-4-17 6:44:15
End:      2012-4-17 6:44:25
Samples:  4

The command completed successfully.
```

More information:

- Relog Syntax Examples (for SQL Server)
- SQL Log File Schema

## Fix problems with Performance Counters

### Corrupted counters

Performance Counters might sometimes become corrupted. In such a case, try to locate the last Performance Counter data backup in the C:\Windows\System32 folder. It should have a name similar to **PerfStringBackup.ini**.
Before making any changes, make a backup of your current perf counters:

```
lodctr /S:PerfStringBackup_broken.ini
```

and then restore the counters:

```
lodctr /R:PerfStringBackup.ini
```

{% endraw %}

================================================
FILE: guides.md
================================================
---
layout: page
title: Guides
---

Please first check the [Windows debugging configuration guide](configuring-windows-for-effective-troubleshooting) as it presents fundamental settings and tools for effective troubleshooting on Windows. Similarly, I published the [Linux debugging configuration guide](configuring-linux-for-effective-troubleshooting) (work in progress).

### :triangular_ruler: Troubleshooting scenarios

#### [Diagnosing .NET applications](diagnosing-dotnet-apps)

This guide describes ways of troubleshooting various problems in .NET applications, such as high CPU usage, memory leaks, network issues, etc.

#### [Diagnosing native Windows applications](diagnosing-native-windows-apps)

This guide describes ways of troubleshooting various problems in native applications on Windows, such as high CPU usage, hangs, abnormal terminations, etc.

#### [COM troubleshooting](com-troubleshooting)

A guide presenting troubleshooting techniques and tools (including the [comon extension](https://github.com/lowleveldesign/comon)) useful for debugging COM objects.

### :wrench: Tools usage

#### [WinDbg usage guide](windbg)

My field notes describing usage of WinDbg and WinDbgX (new WinDbg).

#### [GDB usage guide](gdb)

My field notes describing usage of GDB.

#### [Event Tracing for Windows (ETW)](etw)

This guide describes how to collect and analyze ETW traces.

#### [Linux Kernel Tracing](linux-tracing)

The guide presents tracing frameworks available through the `/sys/kernel/tracing` mount point.

#### [eBPF](ebpf)

The guide describes how to use eBPF to trace system and application events.
#### [Network tracing tools](network-tracing-tools) This guide lists various network tools you may use to diagnose connectivity problems and collect network traces on Windows and Linux. #### [Windows Performance Counters](windows-performance-counters) The guide presents how to query Windows Performance Counters and analyze the collected data. #### [Using withdll and detours to trace Win API calls](using-withdll-and-detours-to-trace-winapi) This guide describes how to use [withdll](https://github.com/lowleveldesign/withdll) and [Detours](https://github.com/microsoft/Detours) samples to collect traces of Win API calls. ================================================ FILE: index.md ================================================ --- title: wtrace.net description: Tools and materials for software and system troubleshooting feature_image: /assets/img/background.jpg --- ## Hello fellow troubleshooters! I created this site to share guides and tools that I developed during my career as a software developer and troubleshooter. The [**guides**](/guides/) focus on practical techniques, tools, and scripts with usage examples rather than theoretical concepts. I regularly update them with new discoveries and insights. 
### Quick Links

- [WinDbg usage guide](/guides/windbg)
- [Diagnosing native Windows applications](/guides/diagnosing-native-windows-apps)
- [Diagnosing .NET applications](/guides/diagnosing-dotnet-apps)
- [Network tracing tools](/guides/network-tracing-tools/)
- [Event Tracing for Windows](/guides/etw)

================================================
FILE: site.webmanifest
================================================
{
    "name": "",
    "short_name": "",
    "icons": [
        {
            "src": "/android-chrome-192x192.png",
            "sizes": "192x192",
            "type": "image/png"
        },
        {
            "src": "/android-chrome-512x512.png",
            "sizes": "512x512",
            "type": "image/png"
        }
    ],
    "theme_color": "#ffffff",
    "background_color": "#ffffff",
    "display": "standalone"
}

================================================
FILE: tools.md
================================================
---
layout: page
title: Tools
---

### :feet: Tracing tools

#### [wtrace](https://github.com/lowleveldesign/wtrace)

A command-line tool for live recording ETW trace events on Windows systems. Wtrace collects, among others, File I/O and Registry operations, TCP/IP connections, and RPC calls. Its purpose is to give you some insights into what is happening in the system.

#### [dotnet-wtrace](http://github.com/lowleveldesign/dotnet-wtrace)

A cross-platform command-line tool for live recording .NET trace events. Dotnet-wtrace collects, among others, GC, network, ASP.NET Core, and exception events.

#### [withdll](https://github.com/lowleveldesign/withdll)

A small tool which can inject DLLs into already running and newly started processes. The injected DLL may, for example, trace or patch functions in the remote process.

### :beetle: Debugging tools

#### [lldext](https://github.com/lowleveldesign/lldext) (a WinDbg extension)

The repository contains the source code of a native lldext extension and my various scripts enhancing debugging with WinDbg.
#### [comon](https://github.com/lowleveldesign/comon) (a WinDbg extension) A WinDbg extension showing traces of COM class creations and interface querying. You may use it to investigate various COM issues and better understand application logic.