Repository: phil-opp/blog_os
Branch: main
Commit: 4f524ff4167c
Files: 240
Total size: 5.1 MB

Directory structure:
gitextract_p9hgi7es/
├── .github/
│   ├── FUNDING.yml
│   └── workflows/
│       ├── blog.yml
│       ├── check-links.yml
│       └── scheduled-builds.yml
├── .gitignore
├── LICENSE-APACHE
├── LICENSE-MIT
├── README.md
├── blog/
│   ├── .gitignore
│   ├── before_build.py
│   ├── config.toml
│   ├── content/
│   │   ├── LICENSE-CC-BY-NC
│   │   ├── README.md
│   │   ├── _index.ar.md
│   │   ├── _index.es.md
│   │   ├── _index.fa.md
│   │   ├── _index.fr.md
│   │   ├── _index.ja.md
│   │   ├── _index.ko.md
│   │   ├── _index.md
│   │   ├── _index.pt-BR.md
│   │   ├── _index.ru.md
│   │   ├── _index.zh-CN.md
│   │   ├── _index.zh-TW.md
│   │   ├── edition-1/
│   │   │   ├── _index.md
│   │   │   ├── extra/
│   │   │   │   ├── _index.md
│   │   │   │   ├── cross-compile-binutils.md
│   │   │   │   ├── cross-compile-libcore.md
│   │   │   │   ├── naked-exceptions/
│   │   │   │   │   ├── 01-catching-exceptions/
│   │   │   │   │   │   └── index.md
│   │   │   │   │   ├── 02-better-exception-messages/
│   │   │   │   │   │   └── index.md
│   │   │   │   │   ├── 03-returning-from-exceptions/
│   │   │   │   │   │   └── index.md
│   │   │   │   │   └── _index.md
│   │   │   │   ├── set-up-gdb/
│   │   │   │   │   └── index.md
│   │   │   │   └── talks.md
│   │   │   └── posts/
│   │   │       ├── 01-multiboot-kernel/
│   │   │       │   └── index.md
│   │   │       ├── 02-entering-longmode/
│   │   │       │   └── index.md
│   │   │       ├── 03-set-up-rust/
│   │   │       │   └── index.md
│   │   │       ├── 04-printing-to-screen/
│   │   │       │   └── index.md
│   │   │       ├── 05-allocating-frames/
│   │   │       │   └── index.md
│   │   │       ├── 06-page-tables/
│   │   │       │   └── index.md
│   │   │       ├── 07-remap-the-kernel/
│   │   │       │   └── index.md
│   │   │       ├── 08-kernel-heap/
│   │   │       │   └── index.md
│   │   │       ├── 09-handling-exceptions/
│   │   │       │   └── index.md
│   │   │       ├── 10-double-faults/
│   │   │       │   └── index.md
│   │   │       └── _index.md
│   │   ├── edition-2/
│   │   │   ├── _index.md
│   │   │   ├── extra/
│   │   │   │   ├── _index.md
│   │   │   │   └── building-on-android/
│   │   │   │       └── index.md
│   │   │   └── posts/
│   │   │       ├── 01-freestanding-rust-binary/
│   │   │       │   ├── index.ar.md
│   │   │       │   ├── index.es.md
│   │   │       │   ├── index.fa.md
│   │   │       │   ├── index.fr.md
│   │   │       │   ├── index.ja.md
│   │   │       │   ├── index.ko.md
│   │   │       │   ├── index.md
│   │   │       │   ├── index.pt-BR.md
│   │   │       │   ├── index.ru.md
│   │   │       │   ├── index.zh-CN.md
│   │   │       │   └── index.zh-TW.md
│   │   │       ├── 02-minimal-rust-kernel/
│   │   │       │   ├── _index.md
│   │   │       │   ├── disable-red-zone/
│   │   │       │   │   ├── index.ko.md
│   │   │       │   │   ├── index.md
│   │   │       │   │   ├── index.pt-BR.md
│   │   │       │   │   ├── index.ru.md
│   │   │       │   │   └── index.zh-CN.md
│   │   │       │   ├── disable-simd/
│   │   │       │   │   ├── index.ko.md
│   │   │       │   │   ├── index.md
│   │   │       │   │   ├── index.pt-BR.md
│   │   │       │   │   ├── index.ru.md
│   │   │       │   │   └── index.zh-CN.md
│   │   │       │   ├── index.es.md
│   │   │       │   ├── index.fa.md
│   │   │       │   ├── index.fr.md
│   │   │       │   ├── index.ja.md
│   │   │       │   ├── index.ko.md
│   │   │       │   ├── index.md
│   │   │       │   ├── index.pt-BR.md
│   │   │       │   ├── index.ru.md
│   │   │       │   └── index.zh-CN.md
│   │   │       ├── 03-vga-text-buffer/
│   │   │       │   ├── index.es.md
│   │   │       │   ├── index.fa.md
│   │   │       │   ├── index.fr.md
│   │   │       │   ├── index.ja.md
│   │   │       │   ├── index.ko.md
│   │   │       │   ├── index.md
│   │   │       │   ├── index.pt-BR.md
│   │   │       │   └── index.zh-CN.md
│   │   │       ├── 04-testing/
│   │   │       │   ├── index.es.md
│   │   │       │   ├── index.fa.md
│   │   │       │   ├── index.ja.md
│   │   │       │   ├── index.ko.md
│   │   │       │   ├── index.md
│   │   │       │   ├── index.pt-BR.md
│   │   │       │   └── index.zh-CN.md
│   │   │       ├── 05-cpu-exceptions/
│   │   │       │   ├── index.es.md
│   │   │       │   ├── index.fa.md
│   │   │       │   ├── index.ja.md
│   │   │       │   ├── index.ko.md
│   │   │       │   ├── index.md
│   │   │       │   ├── index.pt-BR.md
│   │   │       │   └── index.zh-CN.md
│   │   │       ├── 06-double-faults/
│   │   │       │   ├── index.es.md
│   │   │       │   ├── index.fa.md
│   │   │       │   ├── index.ja.md
│   │   │       │   ├── index.ko.md
│   │   │       │   ├── index.md
│   │   │       │   ├── index.pt-BR.md
│   │   │       │   └── index.zh-CN.md
│   │   │       ├── 07-hardware-interrupts/
│   │   │       │   ├── index.es.md
│   │   │       │   ├── index.fa.md
│   │   │       │   ├── index.ja.md
│   │   │       │   ├── index.ko.md
│   │   │       │   ├── index.md
│   │   │       │   ├── index.pt-BR.md
│   │   │       │   └── index.zh-CN.md
│   │   │       ├── 08-paging-introduction/
│   │   │       │   ├── index.es.md
│   │   │       │   ├── index.fa.md
│   │   │       │   ├── index.ja.md
│   │   │       │   ├── index.md
│   │   │       │   ├── index.pt-BR.md
│   │   │       │   └── index.zh-CN.md
│   │   │       ├── 09-paging-implementation/
│   │   │       │   ├── index.es.md
│   │   │       │   ├── index.ja.md
│   │   │       │   ├── index.md
│   │   │       │   ├── index.pt-BR.md
│   │   │       │   └── index.zh-CN.md
│   │   │       ├── 10-heap-allocation/
│   │   │       │   ├── index.es.md
│   │   │       │   ├── index.ja.md
│   │   │       │   ├── index.md
│   │   │       │   ├── index.pt-BR.md
│   │   │       │   └── index.zh-CN.md
│   │   │       ├── 11-allocator-designs/
│   │   │       │   ├── index.es.md
│   │   │       │   ├── index.ja.md
│   │   │       │   ├── index.md
│   │   │       │   ├── index.pt-BR.md
│   │   │       │   ├── index.ru.md
│   │   │       │   └── index.zh-CN.md
│   │   │       ├── 12-async-await/
│   │   │       │   ├── index.es.md
│   │   │       │   ├── index.ja.md
│   │   │       │   ├── index.md
│   │   │       │   ├── index.pt-BR.md
│   │   │       │   ├── index.ru.md
│   │   │       │   ├── index.zh-CN.md
│   │   │       │   ├── index.zh-TW.md
│   │   │       │   └── scancode-queue.drawio
│   │   │       ├── _index.ar.md
│   │   │       ├── _index.es.md
│   │   │       ├── _index.fa.md
│   │   │       ├── _index.fr.md
│   │   │       ├── _index.ja.md
│   │   │       ├── _index.ko.md
│   │   │       ├── _index.md
│   │   │       ├── _index.pt-BR.md
│   │   │       ├── _index.ru.md
│   │   │       ├── _index.zh-CN.md
│   │   │       ├── _index.zh-TW.md
│   │   │       └── deprecated/
│   │   │           ├── 04-unit-testing/
│   │   │           │   └── index.md
│   │   │           ├── 05-integration-tests/
│   │   │           │   └── index.md
│   │   │           ├── 10-advanced-paging/
│   │   │           │   └── index.md
│   │   │           └── _index.md
│   │   ├── news/
│   │   │   ├── 2018-03-09-pure-rust.md
│   │   │   └── _index.md
│   │   ├── pages/
│   │   │   ├── _index.md
│   │   │   └── contact.md
│   │   └── status-update/
│   │       ├── 2019-05-01.md
│   │       ├── 2019-06-03.md
│   │       ├── 2019-07-06.md
│   │       ├── 2019-08-02.md
│   │       ├── 2019-09-09.md
│   │       ├── 2019-10-06.md
│   │       ├── 2019-12-02.md
│   │       ├── 2020-01-07.md
│   │       ├── 2020-02-01.md
│   │       ├── 2020-03-02.md
│   │       ├── 2020-04-01/
│   │       │   └── index.md
│   │       └── _index.md
│   ├── diagrams/
│   │   ├── red-zone-overwrite.dia
│   │   ├── red-zone.dia
│   │   └── xmm-overwrite.dia
│   ├── requirements.txt
│   ├── sass/
│   │   └── css/
│   │       └── edition-2/
│   │           └── main.scss
│   ├── static/
│   │   ├── CNAME
│   │   ├── atom.xml/
│   │   │   └── index.html
│   │   ├── css/
│   │   │   └── edition-1/
│   │   │       ├── isso.css
│   │   │       ├── main.css
│   │   │       └── poole.css
│   │   ├── handling-exceptions-with-naked-fns.html
│   │   └── js/
│   │       ├── edition-1/
│   │       │   └── main.js
│   │       └── edition-2/
│   │           └── main.js
│   ├── templates/
│   │   ├── 404.html
│   │   ├── auto/
│   │   │   ├── forks.html
│   │   │   ├── recent-updates.html
│   │   │   ├── stars.html
│   │   │   ├── status-updates-truncated.html
│   │   │   └── status-updates.html
│   │   ├── base.html
│   │   ├── edition-1/
│   │   │   ├── base.html
│   │   │   ├── comments/
│   │   │   │   ├── allocating-frames.html
│   │   │   │   ├── better-exception-messages.html
│   │   │   │   ├── catching-exceptions.html
│   │   │   │   ├── double-faults.html
│   │   │   │   ├── entering-longmode.html
│   │   │   │   ├── handling-exceptions.html
│   │   │   │   ├── kernel-heap.html
│   │   │   │   ├── multiboot-kernel.html
│   │   │   │   ├── page-tables.html
│   │   │   │   ├── printing-to-screen.html
│   │   │   │   ├── remap-the-kernel.html
│   │   │   │   ├── returning-from-exceptions.html
│   │   │   │   └── set-up-rust.html
│   │   │   ├── comments.html
│   │   │   ├── handling-exceptions-with-naked-fns.html
│   │   │   ├── index.html
│   │   │   ├── macros.html
│   │   │   ├── page.html
│   │   │   └── section.html
│   │   ├── edition-2/
│   │   │   ├── base.html
│   │   │   ├── extra.html
│   │   │   ├── index.html
│   │   │   ├── macros.html
│   │   │   ├── page.html
│   │   │   └── section.html
│   │   ├── index.html
│   │   ├── news-page.html
│   │   ├── news-section.html
│   │   ├── plain.html
│   │   ├── redirect-to-frontpage.html
│   │   ├── rss.xml
│   │   ├── section.html
│   │   ├── snippets.html
│   │   ├── status-update-page.html
│   │   └── status-update-section.html
│   └── typos.toml
├── docker/
│   ├── .bash_aliases
│   ├── Dockerfile
│   ├── README.md
│   └── entrypoint.sh
├── giscus.json
└── scripts/
    ├── merge.fish
    └── push.fish

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms

github: [phil-opp] # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
custom: ['https://donorbox.org/phil-opp'] # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
patreon: phil_opp # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: phil-opp # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username

================================================
FILE: .github/workflows/blog.yml
================================================
name: Blog

on:
  push:
    branches:
      - '*'
      - '!staging.tmp'
    tags:
      - '*'
  pull_request:
  schedule:
    - cron: '0 0 1/4 * *' # every 4 days

jobs:
  build_site:
    name: "Zola Build"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1

      - name: 'Download Zola'
        run: curl -sL https://github.com/getzola/zola/releases/download/v0.19.0/zola-v0.19.0-x86_64-unknown-linux-gnu.tar.gz | tar zxv
      - name: 'Install Python Libraries'
        run: python -m pip install --user -r requirements.txt
        working-directory: "blog"
      - name: "Run before_build.py script"
        run: python before_build.py
        working-directory: "blog"
      - name: "Build Site"
        run: ../zola build
        working-directory: "blog"
      - name: Upload Generated Site
        uses: actions/upload-artifact@v4
        with:
          name: generated_site
          path: blog/public

  check_spelling:
    name: "Check Spelling"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1

      - name: Typo Check
        uses: crate-ci/typos@v1.1.9
        with:
          files: blog

  deploy_site:
    name: "Deploy Generated Site"
    runs-on: ubuntu-latest
    needs: [build_site, check_spelling]
    if: github.ref == 'refs/heads/main' && (github.event_name == 'push' || github.event_name == 'schedule')
    steps:
      - name: "Download Generated Site"
        uses: actions/download-artifact@v4
        with:
          name: generated_site
          path: generated_site

      - name: Setup SSH Keys and known_hosts
        run: |
          mkdir -p ~/.ssh
          ssh-keyscan github.com >> ~/.ssh/known_hosts
          ssh-agent -a $SSH_AUTH_SOCK > /dev/null
          ssh-add - <<< "$deploy_key"
          echo "SSH_AUTH_SOCK=$SSH_AUTH_SOCK" >> $GITHUB_ENV
        env:
          SSH_AUTH_SOCK: /tmp/ssh_agent.sock
          deploy_key: ${{ secrets.DEPLOY_SSH_KEY }}

      - name: "Clone blog_os_deploy Repo"
        run: git clone git@github.com:phil-opp/blog_os_deploy.git --branch gh-pages

      - name: "Set Up Git Identity"
        run: |
          git config --local user.name "GitHub Actions Deploy"
          git config --local user.email "github-actions-deploy@phil-opp.com"
        working-directory: "blog_os_deploy"

      - name: "Delete Old Content"
        run: "rm -r ./*"
        working-directory: "blog_os_deploy"

      - name: "Add New Content"
        run: cp -r generated_site/* blog_os_deploy

      - name: "Commit New Content"
        run: |
          git add .
          git commit --allow-empty -m "Deploy ${GITHUB_SHA}

          Deploy of commit https://github.com/phil-opp/blog_os/commit/${GITHUB_SHA}"
        working-directory: "blog_os_deploy"

      - name: "Show Changes"
        run: "git show"
        working-directory: "blog_os_deploy"

      - name: "Push Changes"
        run: "git push"
        working-directory: "blog_os_deploy"

================================================
FILE: .github/workflows/check-links.yml
================================================
name: Check Links

on:
  push:
    branches:
      - "*"
      - "!staging.tmp"
    tags:
      - "*"
  pull_request:
  schedule:
    - cron: "0 0 1/4 * *" # every 4 days

jobs:
  zola_check:
    name: "Zola Link Check"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1

      - name: "Download Zola"
        run: curl -sL https://github.com/getzola/zola/releases/download/v0.19.0/zola-v0.19.0-x86_64-unknown-linux-gnu.tar.gz | tar zxv

      - name: "Run zola check"
        run: ../zola check
        working-directory: "blog"

================================================
FILE: .github/workflows/scheduled-builds.yml
================================================
name: Build code on schedule

on:
  schedule:
    - cron: '40 1 * * *' # every day at 1:40

jobs:
  trigger-build:
    name: Trigger Build
    strategy:
      matrix:
        branch: [
          post-01,
          post-02,
          post-03,
          post-04,
          post-05,
          post-06,
          post-07,
          post-08,
          post-09,
          post-10,
          post-11,
          post-12,
        ]
    runs-on: ubuntu-latest
    steps:
      - name: Invoke workflow
        uses: benc-uk/workflow-dispatch@v1.1
        with:
          workflow: Code
          token: ${{ secrets.SCHEDULED_BUILDS_TOKEN }}
          ref: ${{ matrix.branch }}

================================================
FILE: .gitignore
================================================
code

================================================
FILE: LICENSE-APACHE
================================================
Apache License
Version 2.0, January 2004
https://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. 
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. 
Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. 

END OF TERMS AND CONDITIONS

================================================
FILE: LICENSE-MIT
================================================
The MIT License (MIT)

Copyright (c) 2019 Philipp Oppermann

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

================================================
FILE: README.md
================================================
# Blog OS

This repository contains the source code for the _Writing an OS in Rust_ series at [os.phil-opp.com](https://os.phil-opp.com).

If you have questions, open an issue or chat with us [on Gitter](https://gitter.im/phil-opp/blog_os).

## Where is the code?

The code for each post lives in a separate git branch. This makes it possible to see the intermediate state after each post.

**The code for the latest post is available [here][latest-post].**

[latest-post]: https://github.com/phil-opp/blog_os/tree/post-12

You can find the branch for each post by following the `(source code)` link in the [post list](#posts) below. The branches are named `post-XX` where `XX` is the post number, for example `post-03` for the _VGA Text Mode_ post or `post-07` for the _Hardware Interrupts_ post. For build instructions, see the Readme of the respective branch.

You can check out a branch in a subdirectory using [git worktree]:

[git worktree]: https://git-scm.com/docs/git-worktree

```
git worktree add code post-10
```

The above command creates a subdirectory named `code` that contains the code for the 10th post ("Heap Allocation").

## Posts

The goal of this project is to provide step-by-step tutorials in individual blog posts. We currently have the following set of posts:

**Bare Bones:**

- [A Freestanding Rust Binary](https://os.phil-opp.com/freestanding-rust-binary/) ([source code](https://github.com/phil-opp/blog_os/tree/post-01))
- [A Minimal Rust Kernel](https://os.phil-opp.com/minimal-rust-kernel/) ([source code](https://github.com/phil-opp/blog_os/tree/post-02))
- [VGA Text Mode](https://os.phil-opp.com/vga-text-mode/) ([source code](https://github.com/phil-opp/blog_os/tree/post-03))
- [Testing](https://os.phil-opp.com/testing/) ([source code](https://github.com/phil-opp/blog_os/tree/post-04))

**Interrupts:**

- [CPU Exceptions](https://os.phil-opp.com/cpu-exceptions/) ([source code](https://github.com/phil-opp/blog_os/tree/post-05))
- [Double Faults](https://os.phil-opp.com/double-fault-exceptions/) ([source code](https://github.com/phil-opp/blog_os/tree/post-06))
- [Hardware Interrupts](https://os.phil-opp.com/hardware-interrupts/) ([source code](https://github.com/phil-opp/blog_os/tree/post-07))

**Memory Management:**

- [Introduction to Paging](https://os.phil-opp.com/paging-introduction/) ([source code](https://github.com/phil-opp/blog_os/tree/post-08))
- [Paging Implementation](https://os.phil-opp.com/paging-implementation/) ([source code](https://github.com/phil-opp/blog_os/tree/post-09))
- [Heap Allocation](https://os.phil-opp.com/heap-allocation/) ([source code](https://github.com/phil-opp/blog_os/tree/post-10))
- [Allocator Designs](https://os.phil-opp.com/allocator-designs/) ([source code](https://github.com/phil-opp/blog_os/tree/post-11))

**Multitasking:**

- [Async/Await](https://os.phil-opp.com/async-await/) ([source code](https://github.com/phil-opp/blog_os/tree/post-12))

## First Edition Posts

The current version of the blog is already the second edition. The first edition is outdated and no longer maintained, but might still be useful. The posts of the first edition are:

**Bare Bones:**

- [A Minimal x86 Kernel](https://os.phil-opp.com/multiboot-kernel.html) ([source code](https://github.com/phil-opp/blog_os/tree/first_edition_post_1))
- [Entering Long Mode](https://os.phil-opp.com/entering-longmode.html) ([source code](https://github.com/phil-opp/blog_os/tree/first_edition_post_2))
- [Set Up Rust](https://os.phil-opp.com/set-up-rust.html) ([source code](https://github.com/phil-opp/blog_os/tree/first_edition_post_3))
- [Printing to Screen](https://os.phil-opp.com/printing-to-screen.html) ([source code](https://github.com/phil-opp/blog_os/tree/first_edition_post_4))

**Memory Management:**

- [Allocating Frames](https://os.phil-opp.com/allocating-frames.html) ([source code](https://github.com/phil-opp/blog_os/tree/first_edition_post_5))
- [Page Tables](https://os.phil-opp.com/modifying-page-tables.html) ([source code](https://github.com/phil-opp/blog_os/tree/first_edition_post_6))
- [Remap the Kernel](https://os.phil-opp.com/remap-the-kernel.html) ([source code](https://github.com/phil-opp/blog_os/tree/first_edition_post_7))
- [Kernel Heap](https://os.phil-opp.com/kernel-heap.html) ([source code](https://github.com/phil-opp/blog_os/tree/first_edition_post_8))

**Exceptions:**

- [Handling Exceptions](https://os.phil-opp.com/handling-exceptions.html) ([source code](https://github.com/phil-opp/blog_os/tree/first_edition_post_9))
- [Double Faults](https://os.phil-opp.com/double-faults.html) ([source code](https://github.com/phil-opp/blog_os/tree/first_edition_post_10))

**Additional Resources:**

- [Cross Compile Binutils](https://os.phil-opp.com/cross-compile-binutils.html)
- [Cross Compile libcore](https://os.phil-opp.com/cross-compile-libcore.html)
- [Set Up GDB](https://os.phil-opp.com/set-up-gdb)
- [Handling Exceptions using Naked Functions](https://os.phil-opp.com/handling-exceptions-with-naked-fns.html)
    - [Catching Exceptions](https://os.phil-opp.com/catching-exceptions.html) ([source code](https://github.com/phil-opp/blog_os/tree/catching_exceptions))
    - [Better Exception Messages](https://os.phil-opp.com/better-exception-messages.html) ([source code](https://github.com/phil-opp/blog_os/tree/better_exception_messages))
    - [Returning from Exceptions](https://os.phil-opp.com/returning-from-exceptions.html) ([source code](https://github.com/phil-opp/blog_os/tree/returning_from_exceptions))

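
The branch naming scheme described under "Where is the code?" can be captured in a few lines of Python. This is only an illustrative sketch and not part of the repository: the helper names `branch_name` and `tree_url` are hypothetical, and the first-edition pattern `first_edition_post_N` is inferred from the source code links in the lists above (the three naked-exception extras use their own branch names and are not covered).

```python
# Hypothetical helpers, not part of blog_os: map a post number to the
# branch that holds its code and to the matching GitHub tree URL.

REPO_URL = "https://github.com/phil-opp/blog_os"


def branch_name(post: int, edition: int = 2) -> str:
    """Return the branch name for a post of the given edition."""
    if post < 1:
        raise ValueError("post numbers start at 1")
    if edition == 2:
        # Second edition: zero-padded two-digit number, e.g. post-03.
        return f"post-{post:02d}"
    if edition == 1:
        # First edition (inferred from the README links): no padding.
        return f"first_edition_post_{post}"
    raise ValueError("edition must be 1 or 2")


def tree_url(post: int, edition: int = 2) -> str:
    """Return the GitHub URL of the branch for a post."""
    return f"{REPO_URL}/tree/{branch_name(post, edition)}"


if __name__ == "__main__":
    print(branch_name(3))
    print(tree_url(10, edition=1))
```

The branch name returned by `branch_name(10)` is the same `post-10` that the `git worktree add code post-10` command above checks out.
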
## License

This project, with the exception of the `blog/content` folder, is licensed under either of

- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or https://www.apache.org/licenses/LICENSE-2.0)
- MIT license ([LICENSE-MIT](LICENSE-MIT) or https://opensource.org/licenses/MIT)

at your option.

For licensing of the `blog/content` folder, see the [`blog/content/README.md`](blog/content/README.md).

### Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

================================================
FILE: blog/.gitignore
================================================
/public
zola

================================================
FILE: blog/before_build.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import io
import urllib.error
import urllib.request
import datetime
from github import Github

g = Github()

one_month_ago = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=32)

def filter_date(issue):
    return issue.closed_at > one_month_ago

def format_number(number):
    if number > 1000:
        return u"{:.1f}k".format(float(number) / 1000)
    else:
        return u"{}".format(number)

with io.open("templates/auto/recent-updates.html", 'w', encoding='utf8') as recent_updates:
    recent_updates.truncate()

    relnotes_issues = g.search_issues("is:merged", repo="phil-opp/blog_os", type="pr", label="relnotes")[:100]
    recent_relnotes_issues = list(filter(filter_date, relnotes_issues))

    if len(recent_relnotes_issues) == 0:
        recent_updates.write(u"No notable updates recently.")
    else:
        recent_updates.write(u"")

repo = g.get_repo("phil-opp/blog_os")

with io.open("templates/auto/stars.html", 'w', encoding='utf8') as stars:
    stars.truncate()
    stars.write(format_number(repo.stargazers_count))

with io.open("templates/auto/forks.html", 'w', encoding='utf8') as forks:
    forks.truncate()
    forks.write(format_number(repo.forks_count))

# query "This Month in Rust OSDev" posts

lines = []
year = 2020
month = 4
while True:
    url = "https://rust-osdev.com/this-month/" + str(year) + "-" + str(month).zfill(2) + "/"
    try:
        urllib.request.urlopen(url)
    except urllib.error.HTTPError as e:
        break

    month_str = datetime.date(1900, month, 1).strftime('%B')

    link = 'This Month in Rust OSDev (' + month_str + " " + str(year) + ") "
    lines.append(u"<li>" + link + "</li>\n")

    month = month + 1
    if month > 12:
        month = 1
        year = year + 1

lines.reverse()

with io.open("templates/auto/status-updates.html", 'w', encoding='utf8') as status_updates:
    status_updates.truncate()

    for line in lines:
        status_updates.write(line)

with io.open("templates/auto/status-updates-truncated.html", 'w', encoding='utf8') as status_updates:
    status_updates.truncate()

    for index, line in enumerate(lines):
        if index == 5:
            break
        status_updates.write(line)

================================================
FILE: blog/config.toml
================================================
base_url = "https://os.phil-opp.com"
title = "Writing an OS in Rust"
description = "This blog series creates a small operating system in the Rust programming language. Each post is a small tutorial and includes all needed code."

generate_feeds = true
feed_filenames = ["rss.xml"]
compile_sass = true
minify_html = false

ignored_content = ["*/README.md", "*/LICENSE-CC-BY-NC"]

[markdown]
highlight_code = true
highlight_theme = "visual-studio-dark"
smart_punctuation = true

[link_checker]
skip_prefixes = [
    "https://crates.io/crates", # see https://github.com/rust-lang/crates.io/issues/788
    "https://www.amd.com/system/files/TechDocs/", # seems to have problems with PDFs
    "https://developer.apple.com/library/archive/qa/qa1118/_index.html", # results in a 401 (I don't know why)
    "https://github.com", # rate limiting often leads to "Error 429 Too Many Requests"
    "https://www.linkedin.com/", # seems to send invalid HTTP status codes
]
skip_anchor_prefixes = [
    "https://github.com/", # see https://github.com/getzola/zola/issues/805
    "https://docs.rs/x86_64/0.1.2/src/", # source code highlight
    "https://doc.rust-jp.rs/book-ja/", # seems like Zola has problems with Japanese anchor names
    "https://doc.rust-jp.rs/edition-guide/rust-2018", # seems like Zola has problems with Japanese anchor names
    "https://doc.rust-jp.rs/rust-nomicon-ja/", # seems like Zola has problems with Japanese anchor names
]

[extra]
subtitle = "Philipp Oppermann's blog"
author = { name = "Philipp Oppermann" }
default_language = "en"
languages = [
    "en",
    "ar",
    "es",
    "fa",
    "fr",
    "ja",
    "ko",
    "pt-BR",
    "ru",
    "zh-CN",
    "zh-TW",
]

[translations]
lang_name = "English (original)"
toc = "Table of Contents"
all_posts = "« All Posts"
comments = "Comments"
comments_notice = "Please leave your comments in English if possible."
readmore = "read more »"
not_translated = "(This post is not translated yet.)"
translated_content = "Translated Content:"
translated_content_notice = "This is a community translation of the _original.title_ post. It might be incomplete, outdated or contain errors. Please report any issues!"
translated_by = "Translation by"
translation_contributors = "With contributions from"
word_separator = "and"
support_me = """

    Support Me

    Creating and maintaining this blog and the associated libraries is a lot of work, but I really enjoy doing it. By supporting me, you allow me to invest more time in new content, new features, and continuous maintenance. The best way to support me is to sponsor me on GitHub. Thank you!

    """
comment_note = """
Do you have a problem, want to share feedback, or discuss further ideas? Feel free to leave a comment here! Please stick to English and follow Rust's code of conduct. This comment thread directly maps to a discussion on GitHub, so you can also comment there if you prefer.
"""

# Chinese (simplified)
[languages.zh-CN]
title = "Writing an OS in Rust"
description = "This blog series creates a small operating system in the Rust programming language. Each post is a small tutorial and includes all needed code."

[languages.zh-CN.translations]
lang_name = "Chinese (simplified)"
toc = "目录"
all_posts = "« 所有文章"
comments = "评论"
comments_notice = "请尽可能使用英语评论。"
readmore = "更多 »"
not_translated = "(该文章还没有被翻译。)"
translated_content = "翻译内容:"
translated_content_notice = "这是对原文章 _original.title_ 的社区中文翻译。它可能不完整,过时或者包含错误。可以在 这个 Issue 上评论和提问!"
translated_by = "翻译者:"
translation_contributors = "With contributions from"
word_separator = "和"
support_me = """

    支持我

    创建和维护这个博客以及相关的库带来了十分庞大的工作量,即便我十分热爱它们,仍然需要你们的支持。通过赞助我,可以让我有能投入更多时间与精力在创造新内容,开发新功能上。赞助我最好的办法是通过sponsor me on GitHub. 十分感谢各位!

    """ comment_note = """ 你有问题需要解决,想要分享反馈,或者讨论更多的想法吗?请随时在这里留下评论!请使用尽量使用英文并遵循 Rust 的 code of conduct. 这个讨论串将与 discussion on GitHub 直接连接,所以你也可以直接在那边发表评论 """ # Chinese (traditional) [languages.zh-TW] title = "Writing an OS in Rust" description = "This blog series creates a small operating system in the Rust programming language. Each post is a small tutorial and includes all needed code." [languages.zh-TW.translations] lang_name = "Chinese (traditional)" toc = "目錄" all_posts = "« 所有文章" comments = "評論" comments_notice = "請儘可能使用英語評論。" readmore = "更多 »" not_translated = "(該文章還沒有被翻譯。)" translated_content = "翻譯內容:" translated_content_notice = "這是對原文章 _original.title_ 的社區中文翻譯。它可能不完整,過時或者包含錯誤。可以在 這個 Issue 上評論和提問!" translated_by = "翻譯者:" translation_contributors = "With contributions from" word_separator = "和" support_me = """

    Support Me

    Creating and maintaining this blog and the associated libraries is a lot of work, but I really enjoy doing it. By supporting me, you allow me to invest more time in new content, new features, and continuous maintenance. The best way to support me is to sponsor me on GitHub. Thank you!

    """ comment_note = """ Do you have a problem, want to share feedback, or discuss further ideas? Feel free to leave a comment here! Please stick to English and follow Rust's code of conduct. This comment thread directly maps to a discussion on GitHub, so you can also comment there if you prefer. """ # Japanese [languages.ja] title = "Writing an OS in Rust" description = "This blog series creates a small operating system in the Rust programming language. Each post is a small tutorial and includes all needed code." [languages.ja.translations] lang_name = "Japanese" toc = "目次" all_posts = "« すべての記事へ" comments = "コメント" comments_notice = "可能な限りコメントは英語で残すようにしてください。" readmore = "もっと読む »" not_translated = "(この記事はまだ翻訳されていません。)" translated_content = "この記事は翻訳されたものです:" translated_content_notice = "この記事は_original.title_をコミュニティの手により翻訳したものです。そのため、翻訳が完全・最新でなかったり、原文にない誤りを含んでいる可能性があります。問題があればこのissue上で報告してください!" translated_by = "翻訳者:" translation_contributors = "With contributions from" word_separator = "及び" support_me = """

    Support Me

    Creating and maintaining this blog and the associated libraries is a lot of work, but I really enjoy doing it. By supporting me, you allow me to invest more time in new content, new features, and continuous maintenance. The best way to support me is to sponsor me on GitHub. Thank you!

    """ comment_note = """ Do you have a problem, want to share feedback, or discuss further ideas? Feel free to leave a comment here! Please stick to English and follow Rust's code of conduct. This comment thread directly maps to a discussion on GitHub, so you can also comment there if you prefer. """ # Persian [languages.fa] title = "Writing an OS in Rust" description = "This blog series creates a small operating system in the Rust programming language. Each post is a small tutorial and includes all needed code." [languages.fa.translations] lang_name = "Persian" toc = "فهرست مطالب" all_posts = "« همه پست‌ها" comments = "نظرات" comments_notice = "لطفا نظرات خود را در صورت امکان به انگلیسی بنویسید." readmore = "ادامه‌مطلب»" not_translated = "(.این پست هنوز ترجمه نشده است)" translated_content = "محتوای ترجمه شده:" translated_content_notice = "این یک ترجمه از جامعه کاربران برای پست _original.title_ است. ممکن است ناقص، منسوخ شده یا دارای خطا باشد. لطفا هر گونه مشکل را در این ایشو گزارش دهید!" translated_by = "ترجمه توسط" translation_contributors = "With contributions from" word_separator = "و" support_me = """

    Support Me

    Creating and maintaining this blog and the associated libraries is a lot of work, but I really enjoy doing it. By supporting me, you allow me to invest more time in new content, new features, and continuous maintenance. The best way to support me is to sponsor me on GitHub. Thank you!

    """ comment_note = """ Do you have a problem, want to share feedback, or discuss further ideas? Feel free to leave a comment here! Please stick to English and follow Rust's code of conduct. This comment thread directly maps to a discussion on GitHub, so you can also comment there if you prefer. """ # Russian [languages.ru] title = "Writing an OS in Rust" description = "This blog series creates a small operating system in the Rust programming language. Each post is a small tutorial and includes all needed code." [languages.ru.translations] lang_name = "Russian" toc = "Содержание" all_posts = "« Все посты" comments = "Комментарии" comments_notice = "Пожалуйста, оставляйте комментарии на английском по возможности." readmore = "читать дальше »" not_translated = "(Этот пост еще не переведен.)" translated_content = "Переведенное содержание:" translated_content_notice = "Это перевод сообщества поста _original.title_. Он может быть неполным, устаревшим или содержать ошибки. Пожалуйста, сообщайте о любых проблемах!" translated_by = "Перевод сделан" translation_contributors = "With contributions from" word_separator = "и" support_me = """

    Support Me

    Creating and maintaining this blog and the associated libraries is a lot of work, but I really enjoy doing it. By supporting me, you allow me to invest more time in new content, new features, and continuous maintenance. The best way to support me is to sponsor me on GitHub. Thank you!

    """ comment_note = """ Do you have a problem, want to share feedback, or discuss further ideas? Feel free to leave a comment here! Please stick to English and follow Rust's code of conduct. This comment thread directly maps to a discussion on GitHub, so you can also comment there if you prefer. """ # French [languages.fr] title = "Writing an OS in Rust" description = "This blog series creates a small operating system in the Rust programming language. Each post is a small tutorial and includes all needed code." [languages.fr.translations] lang_name = "French" toc = "Table des matières" all_posts = "« Tous les articles" comments = "Commentaires" comments_notice = "Veuillez commenter en Anglais si possible." readmore = "Voir plus »" not_translated = "(Cet article n'est pas encore traduit.)" translated_content = "Contenu traduit : " translated_content_notice = "Ceci est une traduction communautaire de l'article _original.title_. Il peut être incomplet, obsolète ou contenir des erreurs. Veuillez signaler les quelconques problèmes !" translated_by = "Traduit par : " translation_contributors = "With contributions from" word_separator = "et" support_me = """

    Support Me

    Creating and maintaining this blog and the associated libraries is a lot of work, but I really enjoy doing it. By supporting me, you allow me to invest more time in new content, new features, and continuous maintenance. The best way to support me is to sponsor me on GitHub. Thank you!

    """ comment_note = """ Do you have a problem, want to share feedback, or discuss further ideas? Feel free to leave a comment here! Please stick to English and follow Rust's code of conduct. This comment thread directly maps to a discussion on GitHub, so you can also comment there if you prefer. """ # Korean [languages.ko] title = "Writing an OS in Rust" description = "This blog series creates a small operating system in the Rust programming language. Each post is a small tutorial and includes all needed code." [languages.ko.translations] lang_name = "Korean" toc = "목차" all_posts = "« 모든 게시글" comments = "댓글" comments_notice = "댓글은 가능하면 영어로 작성해주세요." readmore = "더 읽기 »" not_translated = "(아직 번역이 완료되지 않은 게시글입니다)" translated_content = "번역된 내용 : " translated_content_notice = "이것은 커뮤니티 멤버가 _original.title_ 포스트를 번역한 글입니다. 부족한 설명이나 오류, 혹은 시간이 지나 더 이상 유효하지 않은 정보를 발견하시면 제보해주세요!" translated_by = "번역한 사람 : " translation_contributors = "With contributions from" word_separator = "와" support_me = """

    Support Me

    Creating and maintaining this blog and the associated libraries is a lot of work, but I really enjoy doing it. By supporting me, you allow me to invest more time in new content, new features, and continuous maintenance. The best way to support me is to sponsor me on GitHub. Thank you!

    """ comment_note = """ Do you have a problem, want to share feedback, or discuss further ideas? Feel free to leave a comment here! Please stick to English and follow Rust's code of conduct. This comment thread directly maps to a discussion on GitHub, so you can also comment there if you prefer. """ [languages.ar] title = "Writing an OS in Rust" [languages.ar.translations] lang_name = "Arabic" toc = "Table of Contents" all_posts = "« All Posts" comments = "Comments" comments_notice = "Please leave your comments in English if possible." readmore = "read more »" not_translated = "(This post is not translated yet.)" translated_content = "Translated Content:" translated_content_notice = "This is a community translation of the _original.title_ post. It might be incomplete, outdated or contain errors. Please report any issues!" translated_by = "Translation by" translation_contributors = "With contributions from" word_separator = "and" support_me = """

    Support Me

    Creating and maintaining this blog and the associated libraries is a lot of work, but I really enjoy doing it. By supporting me, you allow me to invest more time in new content, new features, and continuous maintenance. The best way to support me is to sponsor me on GitHub. Thank you!

    """ comment_note = """ Do you have a problem, want to share feedback, or discuss further ideas? Feel free to leave a comment here! Please stick to English and follow Rust's code of conduct. This comment thread directly maps to a discussion on GitHub, so you can also comment there if you prefer. """ # Spanish [languages.es] title = "Writing an OS in Rust" description = "This blog series creates a small operating system in the Rust programming language. Each post is a small tutorial and includes all needed code." [languages.es.translations] lang_name = "Spanish" toc = "Tabla de Contenidos" all_posts = "« Todos los Posts" comments = "Comentarios" comments_notice = "Por favor deja tus comentarios en inglés si es posible." readmore = "leer más »" not_translated = "(Este post aún no está traducido.)" translated_content = "Contenido Traducido:" translated_content_notice = "Esta es una traducción comunitaria del post _original.title_. Puede estar incompleta, desactualizada o contener errores. ¡Por favor reporta cualquier problema!" translated_by = "Traducción por" translation_contributors = "Con contribuciones de" word_separator = "y" support_me = """

    Apóyame

    Crear y mantener este blog y las bibliotecas asociadas es mucho trabajo, pero realmente disfruto haciéndolo. Al apoyarme, me permites invertir más tiempo en nuevo contenido, nuevas características y mantenimiento continuo. La mejor manera de apoyarme es patrocinarme en GitHub. ¡Gracias!

    """ comment_note = """ ¿Tienes algún problema, quieres compartir comentarios o discutir más ideas? ¡No dudes en dejar un comentario aquí! Por favor, utiliza inglés y sigue el código de conducta de Rust. Este hilo de comentarios se vincula directamente con una discusión en GitHub, así que también puedes comentar allí si lo prefieres. """ # Portuguese (Brazil) [languages.pt-BR] title = "Escrevendo um OS em Rust" description = "Esta série de blog cria um pequeno sistema operacional na linguagem de programação Rust. Cada post é um pequeno tutorial e inclui todo o código necessário." [languages.pt-BR.translations] lang_name = "Português (Brasil)" toc = "Tabela de Conteúdos" all_posts = "« Todos os Posts" comments = "Comentários" comments_notice = "Por favor, deixe seus comentários em inglês se possível." readmore = "ler mais »" not_translated = "(Esta postagem ainda não foi traduzida.)" translated_content = "Conteúdo Traduzido:" translated_content_notice = "Esta é uma tradução comunitária do post _original.title_. Pode estar incompleta, desatualizada ou conter erros. Por favor, reporte qualquer problema!" translated_by = "Traduzido por" translation_contributors = "Com contribuições de" word_separator = "e" support_me = """

    Apoie-me

    Criar e manter este blog e as bibliotecas associadas dá muito trabalho, mas eu realmente gosto de fazê-lo. Ao me apoiar, você me permite investir mais tempo em novo conteúdo, novos recursos e manutenção contínua. A melhor forma de me apoiar é me patrocinar no GitHub. Obrigado!

    """ comment_note = """ Teve algum problema, quer deixar um feedback ou discutir mais ideias? Fique à vontade para deixar um comentário aqui! Por favor, use o inglês e siga o código de conduta do Rust. Este tópico de comentários está diretamente vinculado a uma discussão no GitHub, então você também pode comentar lá se preferir. """ ================================================ FILE: blog/content/LICENSE-CC-BY-NC ================================================ Creative Commons Attribution-NonCommercial 4.0 International Public License By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-NonCommercial 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions. Section 1 – Definitions. Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image. Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License. 
Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights. Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements. Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material. Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License. Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license. Licensor means the individual(s) or entity(ies) granting rights under this Public License. NonCommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. For purposes of this Public License, the exchange of the Licensed Material for other material subject to Copyright and Similar Rights by digital file-sharing or similar means is NonCommercial provided there is no payment of monetary compensation in connection with the exchange. 
Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them. Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world. You means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning. Section 2 – Scope. License grant. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to: reproduce and Share the Licensed Material, in whole or in part, for NonCommercial purposes only; and produce, reproduce, and Share Adapted Material for NonCommercial purposes only. Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions. Term. The term of this Public License is specified in Section 6(a). Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. 
The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material. Downstream recipients. Offer from the Licensor – Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License. No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material. No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i). Other rights. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise. Patent and trademark rights are not licensed under this Public License. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. 
In all other cases the Licensor expressly reserves any right to collect such royalties, including when the Licensed Material is used other than for NonCommercial purposes. Section 3 – License Conditions. Your exercise of the Licensed Rights is expressly made subject to the following conditions. Attribution. If You Share the Licensed Material (including in modified form), You must: retain the following if it is supplied by the Licensor with the Licensed Material: identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated); a copyright notice; a notice that refers to this Public License; a notice that refers to the disclaimer of warranties; a URI or hyperlink to the Licensed Material to the extent reasonably practicable; indicate if You modified the Licensed Material and retain an indication of any previous modifications; and indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable. If You Share Adapted Material You produce, the Adapter's License You apply must not prevent recipients of the Adapted Material from complying with this Public License. Section 4 – Sui Generis Database Rights. 
Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material: for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database for NonCommercial purposes only; if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database. For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights. Section 5 – Disclaimer of Warranties and Limitation of Liability. Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You. 
To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. Section 6 – Term and Termination. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates: automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or upon express reinstatement by the Licensor. For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License. Sections 1, 5, 6, 7, and 8 survive termination of this Public License. Section 7 – Other Terms and Conditions. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed. 
Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License. Section 8 – Interpretation. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority. ================================================ FILE: blog/content/README.md ================================================ # Blog Content This folder contains the content for the _"Writing an OS in Rust"_ blog. ## License This folder is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, available in [LICENSE-CC-BY-NC](LICENSE-CC-BY-NC) or under . All _code examples_ between markdown code blocks denoted by three backticks (\`\`\`) are additionally licensed under either of - Apache License, Version 2.0 ([LICENSE-APACHE](../../LICENSE-APACHE) or https://www.apache.org/licenses/LICENSE-2.0) - MIT license ([LICENSE-MIT](../../LICENSE-MIT) or https://opensource.org/licenses/MIT) at your option. 
### Contribution Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be licensed as above, without any additional terms or conditions. ================================================ FILE: blog/content/_index.ar.md ================================================ +++ template = "edition-2/index.html" +++

    كتابة نظام تشغيل بلغة Rust

    تنشئ سلسلة المدونات هذه نظام تشغيل صغير بلغة البرمجة [Rust](https://www.rust-lang.org/). كل منشور هو عبارة عن برنامج تعليمي صغير ويتضمن كل الشيفرة المطلوبة، لذا يمكنك المتابعة إذا أردت. الكود المصدري متاح أيضًا في مستودع [GitHub](https://github.com/phil-opp/blog_os) المقابل. آخر منشور:
    ================================================ FILE: blog/content/_index.es.md ================================================ +++ template = "edition-2/index.html" +++

    Escribiendo un sistema operativo en Rust

    Esta serie de blogs crea un pequeño sistema operativo en el [lenguaje de programación Rust](https://www.rust-lang.org/). Cada publicación es un pequeño tutorial e incluye todo el código necesario, para que puedas seguir los pasos si lo deseas. El código fuente también está disponible en el [repositorio correspondiente de GitHub](https://github.com/phil-opp/blog_os). Última publicación:
    ================================================ FILE: blog/content/_index.fa.md ================================================ +++ template = "edition-2/index.html" +++

    نوشتن یک سیستم عامل با راست

    این مجموعه بلاگ یک سیستم عامل کوچک در [زبان برنامه نویسی Rust](https://www.rust-lang.org/) ایجاد می کند. هر پست یک آموزش کوچک است و شامل تمام کدهای مورد نیاز است ، بنابراین اگر دوست دارید می توانید آن را دنبال کنید. کد منبع نیز در [مخزن گیت‌هاب](https://github.com/phil-opp/blog_os) مربوطه موجود است. اخرین پست:
    ================================================ FILE: blog/content/_index.fr.md ================================================ +++ template = "edition-2/index.html" +++

    Écrire un OS en Rust

    L'objectif de ce blog est de créer un petit système d'exploitation avec le [langage de programmation Rust](https://www.rust-lang.org/). Chaque article est un petit tutoriel et comprend tout le code nécessaire, vous pouvez donc essayer en même temps si vous le souhaitez. Le code source est aussi disponible dans le [dépôt GitHub](https://github.com/phil-opp/blog_os) correspondant. Dernier article :
    ================================================ FILE: blog/content/_index.ja.md ================================================ +++ template = "edition-2/index.html" +++

    RustでOSを書く

    このブログシリーズでは、ちょっとしたオペレーティングシステムを[Rustプログラミング言語](https://www.rust-lang.org/)を使って作ります。それぞれの記事が小さなチュートリアルになっており、必要なコードも全て記事内に記されているので、一つずつ読み進めて行けば理解できるでしょう。対応した[GitHubリポジトリ](https://github.com/phil-opp/blog_os)でソースコードを見ることもできます。 最新記事:
    ================================================ FILE: blog/content/_index.ko.md ================================================ +++ template = "edition-2/index.html" +++

    Rust로 OS 구현하기

    이 블로그 시리즈는 [Rust 프로그래밍 언어](https://www.rust-lang.org/)로 작은 OS를 구현하는 것을 주제로 합니다. 각 포스트는 구현에 필요한 소스 코드를 포함한 작은 튜토리얼 형식으로 구성되어 있습니다. 소스 코드는 이 블로그의 [Github 저장소](https://github.com/phil-opp/blog_os)에서도 확인하실 수 있습니다. 최신 포스트:
    ================================================ FILE: blog/content/_index.md ================================================ +++ template = "edition-2/index.html" +++

    Writing an OS in Rust

    This blog series creates a small operating system in the [Rust programming language](https://www.rust-lang.org/). Each post is a small tutorial and includes all needed code, so you can follow along if you like. The source code is also available in the corresponding [Github repository](https://github.com/phil-opp/blog_os). Latest post:
    ================================================ FILE: blog/content/_index.pt-BR.md ================================================ +++ template = "edition-2/index.html" +++

    Escrevendo um OS em Rust

    Esta série de posts do blog cria um pequeno sistema operacional na [linguagem de programação Rust](https://www.rust-lang.org/). Cada post é um pequeno tutorial e inclui todo o código necessário, então você pode acompanhar se quiser. O código-fonte também está disponível no [repositório Github](https://github.com/phil-opp/blog_os) correspondente. Último post:
    ================================================ FILE: blog/content/_index.ru.md ================================================ +++ template = "edition-2/index.html" +++

    Собственная операционная система на Rust

    Этот блог посвящен написанию маленькой операционной системы на [языке программирования Rust](https://www.rust-lang.org/). Каждый пост — это маленькое руководство, включающее в себя весь необходимый код, — вы сможете следовать ему, если пожелаете. Исходный код также доступен в соотвестующем [репозитории на Github](https://github.com/phil-opp/blog_os). Последний пост:
    ================================================ FILE: blog/content/_index.zh-CN.md ================================================ +++ template = "edition-2/index.html" +++

    用Rust写一个操作系统

    这个博客系列用[Rust编程语言](https://www.rust-lang.org/)编写了一个小操作系统。每篇文章都是一个小教程,并且包含了所有代码,你可以跟着一起学习。源代码也放在了[Github 仓库](https://github.com/phil-opp/blog_os)。 最新文章:
    ================================================ FILE: blog/content/_index.zh-TW.md ================================================ +++ template = "edition-2/index.html" +++

    Writing an OS in Rust

    This blog series creates a small operating system in the [Rust programming language](https://www.rust-lang.org/). Each post is a small tutorial and includes all needed code, so you can follow along if you like. The source code is also available in the corresponding [Github repository](https://github.com/phil-opp/blog_os). Latest post:
    ================================================ FILE: blog/content/edition-1/_index.md ================================================ +++ title = "First Edition" template = "edition-1/index.html" aliases = ["first-edition/index.html"] +++ ================================================ FILE: blog/content/edition-1/extra/_index.md ================================================ +++ title = "Extra Content" insert_anchor_links = "left" render = false sort_by = "weight" +++ ================================================ FILE: blog/content/edition-1/extra/cross-compile-binutils.md ================================================ +++ title = "Cross Compile Binutils" template = "plain.html" path = "cross-compile-binutils" weight = 2 +++ The [GNU Binutils] are a collection of various binary tools such as `ld`, `as`, `objdump`, or `readelf`. These tools are platform-specific, so you need to compile them again if your host system and target system are different. In our case, we need `ld` and `objdump` for the x86_64 architecture. [GNU Binutils]: https://www.gnu.org/software/binutils/ ## Building Setup First, you need to download a current binutils version from [here][download] \(the latest one is near the bottom). After extracting, you should have a folder named `binutils-2.X` where `X` is for example `25.1`. Now you can create and switch to a new folder for building (recommended): [download]: ftp://sourceware.org/pub/binutils/snapshots ```bash mkdir build-binutils cd build-binutils ``` ## Configuration We execute binutils's `configure` and pass a lot of arguments to it (replace the `X` with the version number): ```bash ../binutils-2.X/configure --target=x86_64-elf --prefix="$HOME/opt/cross" \ --disable-nls --disable-werror \ --disable-gdb --disable-libdecnumber --disable-readline --disable-sim ``` - The `target` argument specifies the x86_64 target architecture. - The `prefix` argument selects the installation directory; you can change it if you like.
But be careful that you do not overwrite your system's binutils. - The `disable-nls` flag disables native language support (so you'll always get English error messages). It also reduces build dependencies. - The `disable-werror` flag prevents compiler warnings from being treated as errors during the build. - The last line disables features we don't need to reduce compile time. ## Building it Now we can build and install it to the location supplied as `prefix` (it will take a while): ```bash make make install ``` Now you should have multiple `x86_64-elf-XXX` files in `$HOME/opt/cross/bin`. ## Adding it to the PATH To use the tools from the command line easily, you should add the `bin` folder to your PATH: ```bash export PATH="$HOME/opt/cross/bin:$PATH" ``` If you add this line to your shell startup file (e.g. `.bashrc`), the `x86_64-elf-XXX` commands are always available. ================================================ FILE: blog/content/edition-1/extra/cross-compile-libcore.md ================================================ +++ title = "Cross Compiling: libcore" template = "plain.html" path = "cross-compile-libcore" weight = 3 +++ If you get an `error: can't find crate for 'core'`, you're probably compiling for a different target (e.g. you're passing the `target` option to `cargo build`). Now the compiler complains that it can't find the `core` library. This document gives a quick overview of how to fix this problem. For more details, see the [rust-cross] project. [rust-cross]: https://github.com/japaric/rust-cross ## Libcore The core library is a dependency-free library that is added implicitly when using `#![no_std]`. It provides basic standard library features like `Option` or `Iterator`. The core library is installed together with the Rust compiler (just like the std library). But the installed libcore is specific to your architecture. If you aren't working on x86_64 Linux and pass `‑‑target x86_64‑unknown‑linux‑gnu` to cargo, it can't find an x86_64 libcore. To fix this, you can either use `rustup` or `xargo`.
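To make the role of libcore a bit more concrete, here is a minimal sketch of code that only relies on items defined in `core` (slices, `Iterator`, `Option`). The function names `sum_even` and `first_even` are made up for illustration and do not appear in the project. The `#![no_std]` attribute is left out so that the snippet can be tried on a host system, but nothing in it should need `std`.

```rust
/// Sums all even values in the slice.
/// `Iterator::filter` and `Iterator::sum` are provided by `core`.
fn sum_even(values: &[u32]) -> u32 {
    values.iter().filter(|v| **v % 2 == 0).sum()
}

/// Returns the first even value, or `None` if there is none.
/// `Option` is provided by `core` as well.
fn first_even(values: &[u32]) -> Option<u32> {
    values.iter().cloned().find(|&v| v % 2 == 0)
}
```

Code like this compiles for any target, as long as a matching libcore is available for it.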
## rustup Thanks to [rustup], cross-compilation for [official target triples] is pretty easy today: Just execute `rustup target add x86_64-unknown-linux-gnu`. [rustup]: https://rustup.rs [official target triples]: https://github.com/japaric/rust-cross#the-target-triple ## xargo If you're using a _custom target specification_, the `rustup` method doesn't work. Instead, you can use [xargo]. Xargo is a wrapper for cargo that eases cross compilation. We can install it by executing: ``` cargo install xargo ``` If the installation fails, make sure that you have `cmake` and the OpenSSL headers installed. For more details, see the xargo's [dependency section]. [xargo]: https://github.com/japaric/xargo [dependency section]: https://github.com/japaric/xargo#dependencies Xargo is “a drop-in replacement for cargo”, so every cargo command also works with `xargo`. You can do e.g. `xargo --help`, `xargo clean`, or `xargo doc`. However, the `build` command gains additional functionality: `xargo build` will automatically cross compile the `core` library (and a few other libraries such as `alloc`) when compiling for custom targets. [xargo]: https://github.com/japaric/xargo So if your custom target file is named `your-cool-target.json`, you can compile your code using xargo through `xargo build --target your-cool-target` (note the omitted extension). ================================================ FILE: blog/content/edition-1/extra/naked-exceptions/01-catching-exceptions/index.md ================================================ +++ title = "Catching Exceptions" weight = 1 path = "catching-exceptions" aliases = ["catching-exceptions.html"] date = 2016-05-28 template = "edition-1/page.html" [extra] updated = "2016-06-25" +++ In this post, we start exploring exceptions. We set up an interrupt descriptor table and add handler functions. At the end of this post, our kernel will be able to catch divide-by-zero faults. As always, the complete source code is on [GitHub]. 
Please file [issues] for any problems, questions, or improvement suggestions. There is also a comment section at the end of this page. [GitHub]: https://github.com/phil-opp/blog_os/tree/catching_exceptions [issues]: https://github.com/phil-opp/blog_os/issues > **Note**: This post describes how to handle exceptions using naked functions (see [“Handling Exceptions with Naked Functions”] for an overview). Our new way of handling exceptions can be found in the [“Handling Exceptions”] post. [“Handling Exceptions with Naked Functions”]: @/edition-1/extra/naked-exceptions/_index.md [“Handling Exceptions”]: @/edition-1/posts/09-handling-exceptions/index.md ## Exceptions An exception signals that something is wrong with the current instruction. For example, the CPU issues an exception if the current instruction tries to divide by 0. When an exception occurs, the CPU interrupts its current work and immediately calls a specific exception handler function, depending on the exception type. We've already seen several types of exceptions in our kernel: - **Invalid Opcode**: This exception occurs when the current instruction is invalid. For example, this exception occurred when we tried to use SSE instructions before enabling SSE. Without SSE, the CPU didn't know the `movups` and `movaps` instructions, so it threw an exception when it stumbled over them. - **Page Fault**: A page fault occurs on illegal memory accesses, for example if the current instruction tries to read from an unmapped page or tries to write to a read-only page. - **Double Fault**: When an exception occurs, the CPU tries to call the corresponding handler function. If another exception occurs _while calling the exception handler_, the CPU raises a double fault exception. This exception also occurs when there is no handler function registered for an exception. - **Triple Fault**: If an exception occurs while the CPU tries to call the double fault handler function, it issues a fatal _triple fault_.
We can't catch or handle a triple fault. Most processors react by resetting themselves and rebooting the operating system. This causes the bootloops we experienced in the previous posts. For the full list of exceptions check out the [OSDev wiki][exceptions]. [exceptions]: https://wiki.osdev.org/Exceptions ### The Interrupt Descriptor Table In order to catch and handle exceptions, we have to set up a so-called _Interrupt Descriptor Table_ (IDT). In this table we can specify a handler function for each CPU exception. The hardware uses this table directly, so we need to follow a predefined format. Each entry must have the following 16-byte structure: Type| Name | Description ----|--------------------------|----------------------------------- u16 | Function Pointer [0:15] | The lower bits of the pointer to the handler function. u16 | GDT selector | Selector of a code segment in the GDT. u16 | Options | (see below) u16 | Function Pointer [16:31] | The middle bits of the pointer to the handler function. u32 | Function Pointer [32:63] | The remaining bits of the pointer to the handler function. u32 | Reserved | The options field has the following format: Bits | Name | Description ------|-----------------------------------|----------------------------------- 0-2 | Interrupt Stack Table Index | 0: Don't switch stacks, 1-7: Switch to the n-th stack in the Interrupt Stack Table when this handler is called. 3-7 | Reserved | 8 | 0: Interrupt Gate, 1: Trap Gate | If this bit is 0, interrupts are disabled when this handler is called. 9-11 | must be one | 12 | must be zero | 13‑14 | Descriptor Privilege Level (DPL) | The minimal privilege level required for calling this handler. 15 | Present | Each exception has a predefined IDT index. For example the invalid opcode exception has table index 6 and the page fault exception has table index 14. Thus, the hardware can automatically load the corresponding IDT entry for each exception. 
The [Exception Table][exceptions] in the OSDev wiki shows the IDT indexes of all exceptions in the “Vector nr.” column. When an exception occurs, the CPU roughly does the following: 1. Read the corresponding entry from the Interrupt Descriptor Table (IDT). For example, the CPU reads the entry at index 14 when a page fault occurs. 2. Check if the entry is present. Raise a double fault if not. 3. Push some registers on the stack, including the instruction pointer and the [EFLAGS] register. (We will use these values in a future post.) 4. Disable interrupts if the entry is an interrupt gate (bit 40 not set). 5. Load the specified GDT selector into the CS register. 6. Jump to the specified handler function. [EFLAGS]: https://en.wikipedia.org/wiki/FLAGS_register ## Handling Exceptions Let's try to catch and handle CPU exceptions. We start by creating a new `interrupts` module with an `idt` submodule: ``` rust // in src/lib.rs ... mod interrupts; ... ``` ``` rust // src/interrupts/mod.rs mod idt; ``` Now we create types for the IDT and its entries: ```rust // src/interrupts/idt.rs use x86_64::instructions::segmentation; use x86_64::structures::gdt::SegmentSelector; use x86_64::PrivilegeLevel; pub struct Idt([Entry; 16]); #[derive(Debug, Clone, Copy)] #[repr(C, packed)] pub struct Entry { pointer_low: u16, gdt_selector: SegmentSelector, options: EntryOptions, pointer_middle: u16, pointer_high: u32, reserved: u32, } ``` The IDT is variable sized and can have up to 256 entries. We only need the first 16 entries in this post, so we define the table as `[Entry; 16]`. The remaining 240 handlers are treated as non-present by the CPU. The `Entry` type is the translation of the above table to Rust. The `repr(C, packed)` attribute ensures that the compiler keeps the field ordering and does not add any padding between them. Instead of describing the `gdt_selector` as a plain `u16`, we use the `SegmentSelector` type of the `x86_64` crate.
We also merge bits 32 to 47 into an `options` field, because Rust has no `u3` or `u1` type. The `EntryOptions` type is described below: ### Entry Options The `EntryOptions` type has the following skeleton: ``` rust #[derive(Debug, Clone, Copy)] pub struct EntryOptions(u16); impl EntryOptions { fn new() -> Self {...} pub fn set_present(&mut self, present: bool) {...} pub fn disable_interrupts(&mut self, disable: bool) {...} pub fn set_privilege_level(&mut self, dpl: u16) {...} pub fn set_stack_index(&mut self, index: u16) {...} } ``` The implementations of these methods need to modify the correct bits of the `u16` without touching the other bits. For example, we would need the following bit-fiddling to set the stack index: ``` rust self.0 = (self.0 & 0xfff8) | stack_index; ``` Or alternatively: ``` rust self.0 = (self.0 & (!0b111)) | stack_index; ``` Or: ``` rust self.0 = ((self.0 >> 3) << 3) | stack_index; ``` Well, none of these variants is really _readable_ and it's very easy to make mistakes somewhere. Therefore I created a `BitField` trait that provides the following [Range]-based API: [Range]: https://doc.rust-lang.org/nightly/core/ops/struct.Range.html ``` rust self.0.set_bits(0..3, stack_index); ``` I think it is much more readable, since we abstracted away all bit-masking details. The `BitField` trait is contained in the [bit_field] crate. (It's pretty new, so it might still contain bugs.) To add it as a dependency, we run `cargo add bit_field` and add `extern crate bit_field;` to our `src/lib.rs`.
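The three bit-fiddling variants above can be compared side by side in a small standalone sketch. The function names are invented for this illustration, and `set_bits` is a hand-written stand-in for a range-based setter; it is an assumption about how such a helper could look, not the code of the `bit_field` crate.

```rust
use std::ops::Range;

/// First variant from the text: clear bits 0..3 with a hex mask.
fn with_hex_mask(options: u16, stack_index: u16) -> u16 {
    (options & 0xfff8) | stack_index
}

/// Second variant: build the mask by negating the three low bits.
fn with_negated_mask(options: u16, stack_index: u16) -> u16 {
    (options & !0b111) | stack_index
}

/// Third variant: shift the low bits out and shift zeros back in.
fn with_shifts(options: u16, stack_index: u16) -> u16 {
    ((options >> 3) << 3) | stack_index
}

/// Hypothetical range-based setter (not the `bit_field` implementation).
/// Assumes `range.start < 16` and `range.end <= 16`.
fn set_bits(value: u16, range: Range<u8>, bits: u16) -> u16 {
    let width = range.end - range.start;
    // `width` one-bits, moved to the start of the range.
    let mask = (((1u32 << width) - 1) as u16) << range.start;
    // Clear the target bits, then insert the new ones.
    (value & !mask) | ((bits << range.start) & mask)
}
```

For a stack index that fits into three bits, all four functions should produce the same value. Unlike the first three, the range-based helper also masks the new bits, so an index that is too large cannot spill into the neighboring fields.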
[bit_field]: https://crates.io/crates/bit_field Now we can use the trait to implement the methods of `EntryOptions`: ```rust // in src/interrupts/idt.rs use bit_field::BitField; #[derive(Debug, Clone, Copy)] pub struct EntryOptions(u16); impl EntryOptions { fn minimal() -> Self { let mut options = 0; options.set_bits(9..12, 0b111); // 'must-be-one' bits EntryOptions(options) } fn new() -> Self { let mut options = Self::minimal(); options.set_present(true).disable_interrupts(true); options } pub fn set_present(&mut self, present: bool) -> &mut Self { self.0.set_bit(15, present); self } pub fn disable_interrupts(&mut self, disable: bool) -> &mut Self { self.0.set_bit(8, !disable); self } pub fn set_privilege_level(&mut self, dpl: u16) -> &mut Self { self.0.set_bits(13..15, dpl); self } pub fn set_stack_index(&mut self, index: u16) -> &mut Self { self.0.set_bits(0..3, index); self } } ``` Note that the ranges are _exclusive_ of the upper bound. The `minimal` function creates an `EntryOptions` value with only the “must-be-one” bits set. The `new` function, on the other hand, chooses reasonable defaults: It sets the present bit (why would you want to create a non-present entry?) and disables interrupts (normally we don't want our exception handlers to be interrupted). By returning the self pointer from the `set_*` methods, we allow easy method chaining such as `options.set_present(true).disable_interrupts(true)`. ### Creating IDT Entries Now we can add a function to create new IDT entries: ```rust impl Entry { fn new(gdt_selector: SegmentSelector, handler: HandlerFunc) -> Self { let pointer = handler as u64; Entry { gdt_selector: gdt_selector, pointer_low: pointer as u16, pointer_middle: (pointer >> 16) as u16, pointer_high: (pointer >> 32) as u32, options: EntryOptions::new(), reserved: 0, } } } ``` We take a GDT selector and a handler function as arguments and create a new IDT entry for it. The `HandlerFunc` type is described below.
It is a function pointer that can be converted to a `u64`. We choose the lower 16 bits for `pointer_low`, the next 16 bits for `pointer_middle` and the remaining 32 bits for `pointer_high`. For the options field we choose our default options, i.e. present with interrupts disabled. ### The Handler Function Type The `HandlerFunc` type is a type alias for a function type: ``` rust pub type HandlerFunc = extern "C" fn() -> !; ``` It needs to be a function with a defined [calling convention], as it is called directly by the hardware. The C calling convention is the de facto standard in OS development, so we're using it, too. The function takes no arguments, since the hardware doesn't supply any arguments when jumping to the handler function. [calling convention]: https://en.wikipedia.org/wiki/Calling_convention It is important that the function is [diverging], i.e. it must never return. The reason is that the hardware doesn't _call_ the handler functions, it just _jumps_ to them after pushing some values to the stack. So our stack might look different: [diverging]: https://doc.rust-lang.org/rust-by-example/fn/diverging.html ![normal function return vs interrupt function return](normal-vs-interrupt-function-return.svg) If our handler function returned normally, it would try to pop the return address from the stack. But it might get some completely different value then. For example, the CPU pushes an error code for some exceptions. Bad things would happen if we interpreted this error code as a return address and jumped to it. Therefore interrupt handler functions must diverge[^fn-must-diverge]. [^fn-must-diverge]: Another reason is that we overwrite the current register values by executing the handler function. Thus, the interrupted function loses its state and can't proceed anyway.
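The way `Entry::new` distributes the handler address over `pointer_low`, `pointer_middle`, and `pointer_high` can be tried out in isolation. The following is only a sketch: `split_pointer` and `join_pointer` are hypothetical helper names that are not part of the post's code, and the address used in the usage note is an arbitrary example value.

```rust
/// Splits a 64-bit handler address into the three pointer fields
/// of an IDT entry, using the same casts and shifts as `Entry::new`.
fn split_pointer(pointer: u64) -> (u16, u16, u32) {
    let low = pointer as u16; // bits 0..16
    let middle = (pointer >> 16) as u16; // bits 16..32
    let high = (pointer >> 32) as u32; // bits 32..64
    (low, middle, high)
}

/// Reassembles the address from the three fields, which is roughly
/// what the CPU has to do when it reads an entry.
fn join_pointer(low: u16, middle: u16, high: u32) -> u64 {
    (low as u64) | ((middle as u64) << 16) | ((high as u64) << 32)
}
```

Splitting `0xffff_8000_0012_3456` this way yields `0x3456`, `0x0012`, and `0xffff_8000`, and joining the three parts gives back the original address. The `as` casts simply truncate to the lower bits, which is why no explicit masking is needed.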
### IDT methods Let's add a function to create new interrupt descriptor tables: ```rust impl Idt { pub fn new() -> Idt { Idt([Entry::missing(); 16]) } } impl Entry { fn missing() -> Self { Entry { gdt_selector: SegmentSelector::new(0, PrivilegeLevel::Ring0), pointer_low: 0, pointer_middle: 0, pointer_high: 0, options: EntryOptions::minimal(), reserved: 0, } } } ``` The `missing` function creates a non-present `Entry`. We could choose any values for the pointer and GDT selector fields as long as the present bit is not set. However, a table with only non-present entries is not very useful. So we create a `set_handler` method to add new handler functions: ```rust impl Idt { pub fn set_handler(&mut self, entry: u8, handler: HandlerFunc) -> &mut EntryOptions { self.0[entry as usize] = Entry::new(segmentation::cs(), handler); &mut self.0[entry as usize].options } } ``` The method overwrites the specified entry with the given handler function. We use the `segmentation::cs` function of the [x86_64 crate] to get the current code segment selector. There's no need for different kernel code segments in long mode, so the current `cs` value should always be the right choice. [x86_64 crate]: https://docs.rs/x86_64 By returning a mutable reference to the entry's options, we allow the caller to override the default settings. For example, the caller could add a non-present entry by executing: `idt.set_handler(11, handler_fn).set_present(false)`. ### Loading the IDT Now we're able to create new interrupt descriptor tables with registered handler functions. We just need a way to load an IDT, so that the CPU uses it. The x86 architecture uses a special register to store the active IDT and its length. In order to load a new IDT we need to update this register through the [lidt] instruction.
[lidt]: https://www.felixcloutier.com/x86/lgdt:lidt The `lidt` instruction expects a pointer to a special data structure, which specifies the start address of the IDT and its length: Type | Name | Description --------|---------|----------------------------------- u16 | Limit | The maximum addressable byte in the table. Equal to the table size in bytes minus 1. u64 | Offset | Virtual start address of the table. This structure is already contained [in the x86_64 crate], so we don't need to create it ourselves. The same is true for the [lidt function]. So we just need to put the pieces together to create a `load` method: [in the x86_64 crate]: https://docs.rs/x86_64/0.1.0/x86_64/instructions/tables/struct.DescriptorTablePointer.html [lidt function]: https://docs.rs/x86_64/0.1.0/x86_64/instructions/tables/fn.lidt.html ```rust impl Idt { pub fn load(&self) { use x86_64::instructions::tables::{DescriptorTablePointer, lidt}; use core::mem::size_of; let ptr = DescriptorTablePointer { base: self as *const _ as u64, limit: (size_of::<Self>() - 1) as u16, }; unsafe { lidt(&ptr) }; } } ``` The method does not need to modify the IDT, so it takes `self` by immutable reference. First, we create a `DescriptorTablePointer` and then we pass it to `lidt`. The `lidt` function expects that the `base` field has the type `u64`, therefore we need to cast the `self` pointer. For calculating the `limit` we use [mem::size_of]. The additional `-1` is needed because the limit field has to be the maximum addressable byte (inclusive bound). We need an unsafe block around `lidt`, because the function assumes that the specified handler addresses are valid. [mem::size_of]: https://doc.rust-lang.org/nightly/core/mem/fn.size_of.html #### Safety But can we really guarantee that handler addresses are always valid? Let's see: - The `Idt::new` function creates a new table populated with non-present entries. There's no way to set these entries to present from outside of this module, so this function is fine.
- The `set_handler` method allows us to overwrite a specified entry and point it to some handler function. Rust's type system guarantees that function pointers are always valid (as long as no `unsafe` is involved), so this function is fine, too. There are no other public functions in the `idt` module (except `load`), so it should be safe… right? Wrong! Imagine the following scenario: ```rust pub fn init() { load_idt(); cause_page_fault(); } fn load_idt() { let mut idt = idt::Idt::new(); idt.set_handler(14, page_fault_handler); idt.load(); } fn cause_page_fault() { let x = [1, 2, 3, 4, 5, 6, 7, 8, 9]; unsafe { *(0xdeadbeaf as *mut u64) = x[4] }; } ``` This won't work. If we're lucky, we get a triple fault and a boot loop. If we're unlucky, our kernel does strange things and fails at some completely unrelated place. So what's the problem here? Well, we construct an IDT _on the stack_ and load it. It is perfectly valid until the end of the `load_idt` function. But as soon as the function returns, its stack frame can be reused by other functions. Thus, the IDT gets overwritten by the stack frame of the `cause_page_fault` function. So when the page fault occurs and the CPU tries to read the entry, it only sees some garbage values and issues a double fault, which escalates to a triple fault and a CPU reset. Now imagine that the `cause_page_fault` function declared an array of pointers instead. If the present bit happened to be set, the CPU would jump to some random pointer and interpret random memory as code. This would be a clear violation of memory safety. #### Fixing the load method So how do we fix it? We could make the load function itself `unsafe` and push the unsafety to the caller. However, there is a much better solution in this case. In order to see it, we formulate the requirement for the `load` method: > The referenced IDT must be valid until a new IDT is loaded. We can't know when the next IDT will be loaded. Maybe never.
So in the worst case: > The referenced IDT must be valid as long as our kernel runs. This is exactly the definition of a [static lifetime]. So we can easily ensure that the IDT lives long enough by adding a `'static` requirement to the signature of the `load` function: [static lifetime]: https://doc.rust-lang.org/rust-by-example/scope/lifetime/static_lifetime.html ```rust pub fn load(&'static self) {...} // ^^^^^^^ ensure that the IDT reference has the 'static lifetime ``` That's it! Now the Rust compiler ensures that the above error can't happen anymore: ``` error: `idt` does not live long enough --> src/interrupts/mod.rs:78:5 78 |> idt.load(); |> ^^^ note: reference must be valid for the static lifetime... note: ...but borrowed value is only valid for the block suffix following statement 0 at 75:34 --> src/interrupts/mod.rs:75:35 75 |> let mut idt = idt::Idt::new(); |> ^ ``` ### A static IDT So a valid IDT needs to have the `'static` lifetime. We can either create a `static` IDT or [deliberately leak a Box][into_raw]. We will most likely only need a single IDT for the foreseeable future, so let's try the `static` approach: [into_raw]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.into_raw ```rust // in src/interrupts/mod.rs static IDT: idt::Idt = { let mut idt = idt::Idt::new(); idt.set_handler(0, divide_by_zero_handler); idt }; extern "C" fn divide_by_zero_handler() -> ! { println!("EXCEPTION: DIVIDE BY ZERO"); loop {} } ``` We register a single handler function for a [divide by zero error] \(index 0). Like the name says, this exception occurs when dividing a number by 0. Thus we have an easy way to test our new exception handler. [divide by zero error]: https://wiki.osdev.org/Exceptions#Division_Error However, it doesn't work this way: ``` error: calls in statics are limited to constant functions, struct and enum constructors [E0015] ... error: blocks in statics are limited to items and tail expressions [E0016] ... 
error: references in statics may only refer to immutable values [E0017] ... ``` The reason is that the Rust compiler is not able to evaluate the value of the `static` at compile time. Maybe it will work someday when `const` functions become more powerful. But until then, we have to find another solution. #### Lazy Statics to the Rescue Fortunately the `lazy_static` macro exists. Instead of evaluating a `static` at compile time, the macro performs the initialization when the `static` is referenced for the first time. Thus, we can do almost everything in the initialization block and are even able to read runtime values. Let's add the `lazy_static` crate to our project: ```rust // in src/lib.rs #[macro_use] extern crate lazy_static; ``` ```toml # in Cargo.toml [dependencies.lazy_static] version = "0.2.1" features = ["spin_no_std"] ``` We need the `spin_no_std` feature, since we don't link the standard library. With `lazy_static`, we can define our IDT without problems: ```rust // in src/interrupts/mod.rs lazy_static! { static ref IDT: idt::Idt = { let mut idt = idt::Idt::new(); idt.set_handler(0, divide_by_zero_handler); idt }; } ``` Now we're ready to load our IDT! To do so, we add an `interrupts::init` function: ```rust // in src/interrupts/mod.rs pub fn init() { IDT.load(); } ``` We don't need our `assert_has_not_been_called` macro here, since nothing bad happens when `init` is called twice. It just reloads the same IDT again. ## Testing it Now we should be able to catch divide-by-zero faults! Let's try it in our `rust_main`: ```rust // in src/lib.rs pub extern "C" fn rust_main(...) { ... memory::init(boot_info); // initialize our IDT interrupts::init(); // provoke a divide-by-zero fault 42 / 0; println!("It did not crash!"); loop {} } ``` When we run it, we get a runtime panic: ``` PANIC in src/lib.rs at line 57: attempted to divide by zero ``` That's not our exception handler. The reason is that Rust itself checks for a possible division by zero and panics in that case.
So in order to raise a divide-by-zero error in the CPU, we need to bypass the Rust compiler somehow. ### Inline Assembly In order to cause a divide-by-zero exception, we need to execute a [div] or [idiv] assembly instruction with operand 0. We could write a small assembly function and call it from our Rust code. An easier way is to use Rust's [inline assembly] macro. [div]: https://www.felixcloutier.com/x86/div [idiv]: https://www.felixcloutier.com/x86/idiv [inline assembly]: https://doc.rust-lang.org/1.10.0/book/inline-assembly.html Inline assembly allows us to write raw x86 assembly within a Rust function. The feature is unstable, so we need to add `#![feature(asm)]` to our `src/lib.rs`. Then we're able to write a `divide_by_zero` function: ```rust fn divide_by_zero() { unsafe { asm!("mov dx, 0; div dx" ::: "ax", "dx" : "volatile", "intel") } } ``` Let's try to decode it: - The `asm!` macro emits raw assembly instructions, so it's `unsafe` to use it. - We insert two assembly instructions here: `mov dx, 0` and `div dx`. The former loads a 0 into the `dx` register (a subset of `rdx`) and the latter divides the `ax` register by `dx`. (The `div` instruction always implicitly operates on the `ax` register). - The colons are separators. After the first `:` we could specify output operands and after the second `:` we could specify input operands. We need neither, so we leave these areas empty. - After the third colon, we specify the so-called _clobbers_. These tell the compiler that our assembly modifies the values of some registers. Otherwise, the compiler assumes that the registers preserve their value. In our case, we clobber `dx` (we load 0 to it) and `ax` (the `div` instruction places the result in it). - The last block (after the 4th colon) specifies some options. The `volatile` option tells the compiler: “This code has side effects. Do not delete it and do not move it elsewhere”. In our case, the “side effect” is the divide-by-zero exception. 
Finally, the `intel` option allows us to use the Intel assembly syntax instead of the default AT&T syntax. Let's use our new `divide_by_zero` function to raise a CPU exception: ```rust // in src/lib.rs pub extern "C" fn rust_main(...) { ... // provoke a divide-by-zero fault divide_by_zero(); println!("It did not crash!"); loop {} } ``` It works! We see an `EXCEPTION: DIVIDE BY ZERO` message at the bottom of our screen: ![QEMU screenshot with `EXCEPTION: DIVIDE BY ZERO` message](qemu-divide-error-println.png) ## What's next? We've successfully caught our first exception! However, our `EXCEPTION: DIVIDE BY ZERO` message doesn't contain much information about the cause of the exception. The next post improves the situation by printing, among other things, the current stack pointer and the address of the instruction that caused the exception. We will also explore other exceptions such as page faults, for which the CPU pushes an _error code_ on the stack. ================================================ FILE: blog/content/edition-1/extra/naked-exceptions/02-better-exception-messages/index.md ================================================ +++ title = "Better Exception Messages" weight = 2 path = "better-exception-messages" aliases = ["better-exception-messages.html"] date = 2016-08-03 template = "edition-1/page.html" [extra] updated = "2016-11-01" +++ In this post, we explore exceptions in more detail. Our goal is to print additional information when an exception occurs, for example the values of the instruction and stack pointer. In the course of this, we will explore inline assembly and naked functions. We will also add a handler function for page faults and read the associated error code. As always, the complete source code is on [GitHub]. Please file [issues] for any problems, questions, or improvement suggestions. There is also a [gitter chat] and a comment section at the end of this page.
[GitHub]: https://github.com/phil-opp/blog_os/tree/better_exception_messages [issues]: https://github.com/phil-opp/blog_os/issues [gitter chat]: https://gitter.im/phil-opp/blog_os > **Note**: This post describes how to handle exceptions using naked functions (see [“Handling Exceptions with Naked Functions”] for an overview). Our new way of handling exceptions can be found in the [“Handling Exceptions”] post. [“Handling Exceptions with Naked Functions”]: @/edition-1/extra/naked-exceptions/_index.md [“Handling Exceptions”]: @/edition-1/posts/09-handling-exceptions/index.md ## Exceptions in Detail An exception signals that something is wrong with the currently-executed instruction. Whenever an exception occurs, the CPU interrupts its current work and starts an internal exception routine. This routine involves reading the interrupt descriptor table and invoking the registered handler function. But first, the CPU pushes several pieces of information onto the stack, which describe the current state and provide information about the cause of the exception: ![exception stack frame](exception-stack-frame.svg) The pushed information contains the instruction and stack pointers, the current CPU flags, and (for some exceptions) an error code, which contains further information about the cause of the exception. Let's look at the fields in detail: - First, the CPU aligns the stack pointer on a 16-byte boundary. This allows the handler function to use SSE instructions, some of which expect such an alignment. - After that, the CPU pushes the stack segment descriptor (SS) and the old stack pointer (from before the alignment) onto the stack. This allows us to restore the previous stack pointer when we want to resume the interrupted program. - Then the CPU pushes the contents of the [RFLAGS] register. This register contains various state information of the interrupted program. For example, it indicates if interrupts were enabled and whether the result of the last arithmetic instruction was zero.
- Next the CPU pushes the instruction pointer and its code segment descriptor onto the stack. This tells us the address of the last executed instruction, which caused the exception. - Finally, the CPU pushes an error code for some exceptions. This error code only exists for exceptions such as page faults or general protection faults and provides additional information. For example, it tells us whether a page fault was caused by a read or a write request. [RFLAGS]: https://en.wikipedia.org/wiki/FLAGS_register ## Printing the Exception Stack Frame Let's create a struct that represents the exception stack frame: ```rust // in src/interrupts/mod.rs #[derive(Debug)] #[repr(C)] struct ExceptionStackFrame { instruction_pointer: u64, code_segment: u64, cpu_flags: u64, stack_pointer: u64, stack_segment: u64, } ``` The divide-by-zero fault pushes no error code, so we leave it out for now. Note that the stack grows downwards in memory, so we need to declare the fields in reverse order (compared to the figure above). Now we need a way to find the memory address of this stack frame. When we look at the above graphic again, we see that the start address of the exception stack frame is the new stack pointer. So we just need to read the value of `rsp` at the very beginning of our handler function: ```rust // in src/interrupts/mod.rs extern "C" fn divide_by_zero_handler() -> ! { let stack_frame: &ExceptionStackFrame; unsafe { asm!("mov $0, rsp" : "=r"(stack_frame) ::: "intel"); } println!("\nEXCEPTION: DIVIDE BY ZERO\n{:#?}", stack_frame); loop {} } ``` We're using [inline assembly] here to load the value from the `rsp` register into `stack_frame`. The syntax is a bit strange, so here's a quick explanation: [inline assembly]: https://doc.rust-lang.org/1.10.0/book/inline-assembly.html - The `asm!` macro emits raw assembly instructions. This is the only way to read raw register values in Rust. - We insert a single assembly instruction: `mov $0, rsp`. 
It moves the value of `rsp` to some register (the `$0` is a placeholder for an arbitrary register, which gets filled by the compiler). - The colons are separators. After the first colon, the `asm!` macro expects output operands. We're specifying our `stack_frame` variable as a single output operand here. The `=r` tells the compiler that it should use any register for the first placeholder `$0`. - After the second colon, we can specify input operands. We don't need any, therefore we leave it empty. - After the third colon, the macro expects so called [clobbers]. We don't change any register values, so we leave it empty too. - The last block (after the 4th colon) specifies options. The `intel` option tells the compiler that our code is in Intel assembly syntax (instead of the default AT&T syntax). [clobbers]: https://doc.rust-lang.org/1.10.0/book/inline-assembly.html#clobbers So the inline assembly loads the stack pointer value to `stack_frame` at the very beginning of our function. Thus we have a pointer to the exception stack frame and are able to pretty-print its `Debug` formatting through the `{:#?}` argument. ### Testing it Let's try it by executing `make run`: ![qemu printing an ExceptionStackFrame with strange values](qemu-print-stack-frame-try.png) Those `ExceptionStackFrame` values look very wrong. The instruction pointer definitely shouldn't be 1 and the code segment should be `0x8` instead of some big number. So what's going on here? ### Debugging It seems like we somehow got the pointer wrong. The `ExceptionStackFrame` type and our inline assembly seem correct, so something must be modifying `rsp` before we load it into `stack_frame`. Let's see what's happening by looking at the disassembly of our function: ``` > objdump -d build/kernel-x86_64.bin | grep -A20 "divide_by_zero_handler" [...] 
000000000010ced0 <_ZN7blog_os10interrupts22divide_by_zero_handler17h62189e8E>: 10ced0: 55 push %rbp 10ced1: 48 89 e5 mov %rsp,%rbp 10ced4: 48 81 ec b0 00 00 00 sub $0xb0,%rsp 10cedb: 48 8d 45 98 lea -0x68(%rbp),%rax 10cedf: 48 b9 1d 1d 1d 1d 1d movabs $0x1d1d1d1d1d1d1d1d,%rcx 10cee6: 1d 1d 1d 10cee9: 48 89 4d 98 mov %rcx,-0x68(%rbp) 10ceed: 48 89 4d f8 mov %rcx,-0x8(%rbp) 10cef1: 48 89 e1 mov %rsp,%rcx 10cef4: 48 89 4d f8 mov %rcx,-0x8(%rbp) 10cef8: ... [...] ``` Our `divide_by_zero_handler` starts at address `0x10ced0`. Let's look at the instruction at address `0x10cef1`: ``` mov %rsp,%rcx ``` This is our inline assembly instruction, which loads the stack pointer into the `stack_frame` variable. It just looks a bit different, since it's in AT&T syntax and contains `rcx` instead of our `$0` placeholder. It moves `rsp` to `rcx`, and then the next instruction (`mov %rcx,-0x8(%rbp)`) moves `rcx` to the variable on the stack. We can clearly see the problem here: The compiler inserted various other instructions before our inline assembly. These instructions modify the stack pointer so that we don't read the original `rsp` value and get a wrong pointer. But why is the compiler doing this? The reason is that we need some place on the stack to store things like variables. Therefore the compiler inserts a so-called _[function prologue]_, which prepares the stack and reserves space for all variables. In our case, the compiler subtracts from the stack pointer to make room for i.a. our `stack_frame` variable. This prologue is the first thing in every function and comes before every other code. So in order to correctly load the exception frame pointer, we need some way to circumvent the automatic prologue generation. [function prologue]: https://en.wikipedia.org/wiki/Function_prologue ### Naked Functions Fortunately there is a way to disable the prologue: [naked functions]. A naked function has no prologue and immediately starts with the first instruction of its body. 
However, most Rust code requires the prologue. Therefore naked functions should only contain inline assembly. [naked functions]: https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md A naked function looks like this (note the `#[naked]` attribute): ```rust #[naked] extern "C" fn naked_function_example() { unsafe { asm!("mov rax, 0x42" ::: "rax" : "intel"); }; } ``` Naked functions are highly unstable, so we need to add `#![feature(naked_functions)]` to our `src/lib.rs`. If you want to try it, insert it in `src/lib.rs` and call it from `rust_main`. When we inspect the disassembly, we see that the function prologue is missing: ``` > objdump -d build/kernel-x86_64.bin | grep -A5 "naked_function_example" [...] 000000000010df90 <_ZN7blog_os22naked_function_example17ha9f733dfe42b595dE>: 10df90: 48 c7 c0 2a 00 00 00 mov $0x42,%rax 10df97: c3 retq 10df98: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) 10df9f: 00 ``` It contains just the specified inline assembly and a return instruction (you can ignore the junk values after the return statement). So let's try to use a naked function to retrieve the exception frame pointer. ### A Naked Exception Handler We can't use Rust code in naked functions, but we still want to use Rust in our exception handler. Therefore we split our handler function in two parts. A main exception handler in Rust and a small naked wrapper function, which just loads the exception frame pointer and then calls the main handler. Our new two-stage exception handler looks like this: ```rust // in src/interrupts/mod.rs #[naked] extern "C" fn divide_by_zero_wrapper() -> ! { unsafe { asm!(/* load exception frame pointer and call main handler */); } } extern "C" fn divide_by_zero_handler(stack_frame: &ExceptionStackFrame) -> ! 
{ println!("\nEXCEPTION: DIVIDE BY ZERO\n{:#?}", unsafe { &*stack_frame }); loop {} } ``` The naked wrapper function retrieves the exception stack frame pointer and then calls the `divide_by_zero_handler` with the pointer as argument. We can't use Rust code in naked functions, so we need to do both things in inline assembly. Retrieving the pointer to the exception stack frame is easy: We just need to load it from the `rsp` register. Our wrapper function has no prologue (it's naked), so we can be sure that nothing modifies the register before we read it. Calling the main handler is a bit more complicated, since we need to pass the argument correctly. Our main handler uses the C calling convention, which specifies that the first argument is passed in the `rdi` register. So we need to load the pointer value into `rdi` and then use the `call` instruction to call `divide_by_zero_handler`. Translated to assembly, it looks like this: ```nasm mov rdi, rsp call divide_by_zero_handler ``` It moves the exception stack frame pointer from `rsp` to `rdi`, where the first argument is expected, and then calls the main handler. Let's create the corresponding inline assembly to complete our wrapper function: ```rust #[naked] extern "C" fn divide_by_zero_wrapper() -> ! { unsafe { asm!("mov rdi, rsp; call $0" :: "i"(divide_by_zero_handler as extern "C" fn(_) -> !) : "rdi" : "intel"); } } ``` Instead of `call divide_by_zero_handler`, we use a placeholder again. The reason is Rust's name mangling, which changes the name of the `divide_by_zero_handler` function. To circumvent this, we pass a function pointer as input parameter (after the second colon). The `"i"` tells the compiler that it is an immediate value, which can be directly inserted for the placeholder. We also specify a clobber after the third colon, which tells the compiler that we change the value of the `rdi` register.
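The hand-off performed by the wrapper can also be pictured without any assembly. The following is a minimal userspace sketch, not the kernel code from this post. It assumes a hypothetical `fake_stack` array standing in for the memory that `rsp` points to, a hypothetical `simulated_wrapper` in place of the naked function, and a handler that returns a value instead of diverging so that the result can be inspected.

```rust
// Same field order as the struct defined earlier in this post.
#[allow(dead_code)] // only `instruction_pointer` is read in this sketch
#[derive(Debug)]
#[repr(C)]
struct ExceptionStackFrame {
    instruction_pointer: u64,
    code_segment: u64,
    cpu_flags: u64,
    stack_pointer: u64,
    stack_segment: u64,
}

// Stand-in for the main handler. It returns the instruction pointer
// instead of diverging (assumption made for this sketch only).
extern "C" fn handler(frame: &ExceptionStackFrame) -> u64 {
    frame.instruction_pointer
}

// Stand-in for the naked wrapper: `stack_top` plays the role of `rsp`.
fn simulated_wrapper(stack_top: &[u64; 5]) -> u64 {
    // An explicitly typed function pointer, like the `"i"` operand.
    let handler_ptr = handler as extern "C" fn(&ExceptionStackFrame) -> u64;
    // Reinterpret the five stack words as a frame. This is sound here
    // because both types are 40 bytes large and 8-byte aligned.
    let frame = unsafe { &*(stack_top.as_ptr() as *const ExceptionStackFrame) };
    // Equivalent of `mov rdi, rsp; call handler`.
    handler_ptr(frame)
}

fn main() {
    // Made-up values in the order rip, cs, rflags, rsp, ss.
    let fake_stack: [u64; 5] = [0x110970, 0x8, 0x200006, 0x118d60, 0x0];
    println!("handler saw rip = {:#x}", simulated_wrapper(&fake_stack));
}
```

The explicit cast to a function pointer type mirrors the cast inside the `"i"` operand: it turns the function item into a plain address that a `call` can use.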
### Intrinsics::Unreachable When we try to compile it, we get the following error: ``` error: computation may converge in a function marked as diverging --> src/interrupts/mod.rs:23:1 |> 23 |> extern "C" fn divide_by_zero_wrapper() -> ! { |> ^ ``` The reason is that we marked our `divide_by_zero_wrapper` function as diverging (the `!`). We call another diverging function in inline assembly, so it is clear that the function diverges. However, the Rust compiler doesn't understand inline assembly, so it doesn't know that. To fix this, we tell the compiler that all code after the `asm!` macro is unreachable: ```rust #[naked] extern "C" fn divide_by_zero_wrapper() -> ! { unsafe { asm!("mov rdi, rsp; call $0" :: "i"(divide_by_zero_handler as extern "C" fn(_) -> !) : "rdi" : "intel"); ::core::intrinsics::unreachable(); } } ``` The [intrinsics::unreachable] function is unstable, so we need to add `#![feature(core_intrinsics)]` to our `src/lib.rs`. It is just an annotation for the compiler and produces no real code. (Not to be confused with the [unreachable!] macro, which is completely different!) [intrinsics::unreachable]: https://doc.rust-lang.org/nightly/core/intrinsics/fn.unreachable.html [unreachable!]: https://doc.rust-lang.org/nightly/core/macro.unreachable!.html ### It works! The last step is to update the interrupt descriptor table (IDT) to use our new wrapper function: ```rust // in src/interrupts/mod.rs lazy_static! { static ref IDT: idt::Idt = { let mut idt = idt::Idt::new(); idt.set_handler(0, divide_by_zero_wrapper); // changed idt }; } ``` Now we see a correct exception stack frame when we execute `make run`: ![QEMU showing correct divide by zero stack frame](qemu-divide-by-zero-stack-frame.png) ## Testing on real Hardware Virtual machines such as QEMU are very convenient to quickly test our kernel. However, they might behave a bit different than real hardware in some situations. So we should test our kernel on real hardware, too. 
Let's do it by burning it to a USB stick: ``` > sudo dd if=build/os-x86_64.iso of=/dev/sdX && sync ``` Replace `sdX` with the device name of your USB stick. But **be careful**! The command will erase everything on that device. Now we should be able to boot from this USB stick. When we do it, we see that it works fine on real hardware, too. Great! However, this section wouldn't exist if there weren't a problem. To trigger this problem, we add some example code to the start of our `divide_by_zero_handler`: ```rust // in src/interrupts/mod.rs extern "C" fn divide_by_zero_handler(...) { let x = (1u64, 2u64, 3u64); let y = Some(x); for i in (0..100).map(|z| (z, z - 1)) {} println!(...); loop {} } ``` This is just some garbage code that doesn't do anything useful. When we try it in QEMU using `make run`, it still works fine. However, when we burn it to a USB stick again and boot from it on real hardware, we see that our computer reboots just before printing the exception message. So our code, which worked well in QEMU, _causes a triple fault_ on real hardware. What's happening? ### Reproducing the Bug in QEMU Debugging on a real machine is difficult. Fortunately there is a way to reproduce this bug in QEMU: We use Linux's [Kernel-based Virtual Machine] \(KVM) by passing the `-enable-kvm` flag: [Kernel-based Virtual Machine]: https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine ``` > qemu-system-x86_64 -cdrom build/os-x86_64.iso -enable-kvm ``` Now QEMU triple faults as well. This should make debugging much easier. ### Debugging QEMU's `-d int`, which prints every exception, doesn't seem to work in KVM mode. However `-d cpu_reset` still works. It prints the complete CPU state whenever the CPU resets.
Let's try it: ``` > qemu-system-x86_64 -cdrom build/os-x86_64.iso -enable-kvm -d cpu_reset CPU Reset (CPU 0) EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0 [...] CPU Reset (CPU 0) EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 [...] CPU Reset (CPU 0) RAX=0000000000118cb8 RBX=0000000000000800 RCX=1d1d1d1d1d1d1d1d RDX=0..0000000 RSI=0000000000112cd0 RDI=0000000000118d38 RBP=0000000000118d28 RSP=0..0118c68 R8 =0000000000000000 R9 =0000000000000100 R10=0000000000118700 R11=0..0118a00 R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0..0000000 RIP=000000000010cf08 RFL=00210002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 [...] ``` The first two resets occur while the CPU is still in 32-bit mode (`EAX` instead of `RAX`), so we ignore them. The third reset is the interesting one, because it occurs in 64-bit mode. The register dump tells us that the instruction pointer (`rip`) was `0x10cf08` just before the reset. This might be the address of the instruction that caused the triple fault. We can find the corresponding instruction by disassembling our kernel: ``` objdump -d build/kernel-x86_64.bin | grep "10cf08:" 10cf08: 0f 29 45 b0 movaps %xmm0,-0x50(%rbp) ``` The [movaps] instruction is an [SSE] instruction that moves aligned 128bit values. It can fail for a number of reasons: [movaps]: https://www.felixcloutier.com/x86/movaps [SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions 1. For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. 2. For an illegal address in the SS segment. 3. If a memory operand is not aligned on a 16-byte boundary. 4. For a page fault. 5. If TS in CR0 is set. 
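Condition 3 in the list above is the only one that can be checked with plain arithmetic once the register values are known. As a minimal sketch (the helper name `is_aligned` is an assumption of this example, not part of the kernel), we can take `RBP` from the third reset dump and the `-0x50` displacement from the `movaps` instruction and test the resulting address:

```rust
/// Returns true if `addr` is a multiple of `align`.
/// `align` must be a power of two (hypothetical helper for this sketch).
fn is_aligned(addr: u64, align: u64) -> bool {
    assert!(align.is_power_of_two());
    addr & (align - 1) == 0
}

fn main() {
    let rbp: u64 = 0x118d28; // RBP from the third `CPU Reset` dump
    let operand = rbp - 0x50; // address written by `movaps %xmm0,-0x50(%rbp)`
    println!(
        "operand {:#x}: 16-byte aligned = {}, remainder = {}",
        operand,
        is_aligned(operand, 16),
        operand % 16
    );
}
```

Since `0x50` is itself a multiple of 16, the operand address is aligned exactly when `rbp` is aligned.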
The segment registers contain no meaningful values in long mode, so they can't contain illegal addresses. We did not change the TS bit in [CR0] and there is no reason for a page fault either. So it has to be option 3. [CR0]: https://en.wikipedia.org/wiki/Control_register#CR0 ### 16-byte Alignment Some SSE instructions such as `movaps` require that memory operands are 16-byte aligned. In our case, the instruction is `movaps %xmm0,-0x50(%rbp)`, which writes to address `rbp - 0x50`. Therefore `rbp` needs to be 16-byte aligned. Let's look at the above `-d cpu_reset` dump again and check the value of `rbp`: ``` CPU Reset (CPU 0) RAX=[...] RBX=[...] RCX=[...] RDX=[...] RSI=[...] RDI=[...] RBP=0000000000118d28 RSP=[...] ... ``` `RBP` is `0x118d28`, which is _not_ 16-byte aligned. So this is the reason for the triple fault. (It seems like QEMU doesn't check the alignment for `movaps`, but real hardware of course does.) But how did we end up with a misaligned `rbp` register? ### The Base Pointer In order to solve this mystery, we need to look at the disassembly of the preceding code: ``` > objdump -d build/kernel-x86_64.bin | grep -B10 "10cf08:" 000000000010cee0 <_ZN7blog_os10interrupts22divide_by_zero_handler17hE>: 10cee0: 55 push %rbp 10cee1: 48 89 e5 mov %rsp,%rbp 10cee4: 48 81 ec c0 00 00 00 sub $0xc0,%rsp 10ceeb: 48 8d 45 90 lea -0x70(%rbp),%rax 10ceef: 48 b9 1d 1d 1d 1d 1d movabs $0x1d1d1d1d1d1d1d1d,%rcx 10cef6: 1d 1d 1d 10cef9: 48 89 4d 90 mov %rcx,-0x70(%rbp) 10cefd: 48 89 7d f8 mov %rdi,-0x8(%rbp) 10cf01: 0f 10 05 a8 51 00 00 movups 0x51a8(%rip),%xmm0 10cf08: 0f 29 45 b0 movaps %xmm0,-0x50(%rbp) ``` At the last line we have the `movaps` instruction, which caused the triple fault. The exception occurs inside our `divide_by_zero_handler` function. We see that `rbp` is loaded with the value of `rsp` at the beginning (at `0x10cee1`). The `rbp` register holds the so-called _base pointer_, which points to the beginning of the stack frame. 
It is used in the rest of the function to address variables and other values on the stack. The base pointer is initialized directly from the stack pointer (`rsp`) after pushing the old base pointer. There is no special alignment code, so the compiler blindly assumes that `(rsp - 8)`[^fn-rsp-8] is always 16-byte aligned. This seems to be wrong in our case. But why does the compiler assume this? [^fn-rsp-8]: By pushing the old base pointer, `rsp` is updated to `rsp-8`. ### Calling Conventions The reason is that our exception handler is defined as an `extern "C"` function, which specifies that it's using the C [calling convention]. On x86_64 Linux, the C calling convention is specified by the System V AMD64 ABI ([PDF][system v abi]). Section 3.2.2 defines the following: [calling convention]: https://en.wikipedia.org/wiki/X86_calling_conventions [system v abi]: https://web.archive.org/web/20160801075139/https://www.x86-64.org/documentation/abi.pdf > The end of the input argument area shall be aligned on a 16 byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 when control is transferred to the function entry point. The “end of the input argument area” refers to the last stack-passed argument (in our case there aren't any). So the stack pointer must be 16 byte aligned whenever we `call` a C-compatible function. The `call` instruction then pushes the return address on the stack so that “the value (%rsp + 8) is a multiple of 16 when control is transferred to the function entry point”. _Summary_: The calling convention requires a 16 byte aligned stack pointer before `call` instructions. The compiler relies on this requirement, but we broke it somehow. Thus the generated code triple faults due to a misaligned memory address in the `movaps` instruction. ### Fixing the Alignment In order to fix this bug, we need to make sure that the stack pointer is correctly aligned before calling `extern "C"` functions.
Let's summarize the stack pointer modifications that occur before the exception handler is called: 1. The CPU aligns the stack pointer to a 16 byte boundary. 2. The CPU pushes `ss`, `rsp`, `rflags`, `cs`, and `rip`. So it pushes five 8 byte registers, which makes `rsp` misaligned. 3. The wrapper function calls `divide_by_zero_handler` with a misaligned stack pointer. The problem is that we're pushing an uneven number of 8 byte registers. Thus we need to align the stack pointer again before the `call` instruction: ```rust #[naked] extern "C" fn divide_by_zero_wrapper() -> ! { unsafe { asm!("mov rdi, rsp sub rsp, 8 // align the stack pointer call $0" :: "i"(divide_by_zero_handler as extern "C" fn(_) -> !) : "rdi" : "intel"); ::core::intrinsics::unreachable(); } } ``` The additional `sub rsp, 8` instruction aligns the stack pointer to a 16 byte boundary. Now it should work on real hardware (and in QEMU KVM mode) again. ## A Handler Macro The next step is to add handlers for other exceptions. However, we would need wrapper functions for them too. To avoid this code duplication, we create a `handler` macro that creates the wrapper functions for us: ```rust // in src/interrupts/mod.rs macro_rules! handler { ($name: ident) => {{ #[naked] extern "C" fn wrapper() -> ! { unsafe { asm!("mov rdi, rsp sub rsp, 8 // align the stack pointer call $0" :: "i"($name as extern "C" fn( &ExceptionStackFrame) -> !) : "rdi" : "intel"); ::core::intrinsics::unreachable(); } } wrapper }} } ``` The macro takes a single Rust identifier (`ident`) as argument and expands to a `{}` block (hence the double braces). The block defines a new wrapper function that calls the function `$name` and passes a pointer to the exception stack frame. Note that we're fixing the argument type to `&ExceptionStackFrame`. If we used a `_` like before, the passed function could accept an arbitrary argument, which would lead to ugly bugs at runtime. 
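The two macro techniques described above, expanding to a block that evaluates to a locally defined function and pinning the handler signature with a cast, can be tried in an ordinary program. The following is a minimal sketch under loud assumptions: it contains no inline assembly, the inner `wrapper` receives the frame as a normal argument, and the handlers return a `u64` instead of diverging so that the result can be checked.

```rust
#[allow(dead_code)] // only `instruction_pointer` is read in this sketch
#[repr(C)]
struct ExceptionStackFrame {
    instruction_pointer: u64,
    code_segment: u64,
    cpu_flags: u64,
    stack_pointer: u64,
    stack_segment: u64,
}

// Simplified stand-in for the `handler!` macro (no assembly involved).
macro_rules! handler {
    ($name:ident) => {{
        // Each expansion lives in its own block, so several `wrapper`
        // functions can coexist without name clashes.
        extern "C" fn wrapper(frame: &ExceptionStackFrame) -> u64 {
            // The cast fails to compile if `$name` has another signature.
            let target = $name as extern "C" fn(&ExceptionStackFrame) -> u64;
            target(frame)
        }
        // The block evaluates to a pointer to the wrapper function.
        wrapper as extern "C" fn(&ExceptionStackFrame) -> u64
    }};
}

extern "C" fn divide_by_zero_handler(frame: &ExceptionStackFrame) -> u64 {
    frame.instruction_pointer
}

extern "C" fn invalid_opcode_handler(frame: &ExceptionStackFrame) -> u64 {
    frame.instruction_pointer + 1
}

fn main() {
    let frame = ExceptionStackFrame {
        instruction_pointer: 0x1000,
        code_segment: 0x8,
        cpu_flags: 0x2,
        stack_pointer: 0x2000,
        stack_segment: 0x0,
    };
    let first = handler!(divide_by_zero_handler);
    let second = handler!(invalid_opcode_handler);
    println!("{:#x} {:#x}", first(&frame), second(&frame));
}
```

Passing a function with a different argument list to this `handler!` is rejected at compile time by the cast, which is the same protection the fixed `&ExceptionStackFrame` argument type gives the real macro.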
Now we can remove the `divide_by_zero_wrapper` and use our new `handler!` macro instead: ```rust // in src/interrupts/mod.rs lazy_static! { static ref IDT: idt::Idt = { let mut idt = idt::Idt::new(); idt.set_handler(0, handler!(divide_by_zero_handler)); // new idt }; } ``` Note that the `handler!` macro needs to be defined above the static `IDT`, because macros are only available after their definition. ### Invalid Opcode Exception With the `handler!` macro we can create new handler functions easily. For example, we can add a handler for the invalid opcode exception as follows: ```rust // in src/interrupts/mod.rs lazy_static! { static ref IDT: idt::Idt = { let mut idt = idt::Idt::new(); idt.set_handler(0, handler!(divide_by_zero_handler)); idt.set_handler(6, handler!(invalid_opcode_handler)); // new idt }; } extern "C" fn invalid_opcode_handler(stack_frame: &ExceptionStackFrame) -> ! { let stack_frame = unsafe { &*stack_frame }; println!("\nEXCEPTION: INVALID OPCODE at {:#x}\n{:#?}", stack_frame.instruction_pointer, stack_frame); loop {} } ``` Invalid opcode faults have the vector number 6, so we set IDT entry 6. This time we additionally print the address of the invalid instruction. We can test our new handler with the special [ud2] instruction, which generates an invalid opcode: [ud2]: https://www.felixcloutier.com/x86/ud ```rust // in src/lib.rs #[no_mangle] pub extern "C" fn rust_main(multiboot_information_address: usize) { ... // initialize our IDT interrupts::init(); // provoke an invalid opcode exception unsafe { asm!("ud2") }; println!("It did not crash!"); loop {} } ``` ## Exceptions with Error Codes When a divide-by-zero exception occurs, we immediately know the reason: Someone tried to divide by zero. In contrast, there are faults with many possible causes. For example, a page fault occurs on many occasions: When accessing a non-present page, when writing to a read-only page, when the page table is malformed, etc.
In order to differentiate these causes, the CPU pushes an additional error code onto the stack for such exceptions, which gives additional information. ### A new Macro Since the CPU pushes an additional error code, the stack frame is different and our `handler!` macro is not applicable. Therefore we create a new `handler_with_error_code!` macro for them: ```rust // in src/interrupts/mod.rs macro_rules! handler_with_error_code { ($name: ident) => {{ #[naked] extern "C" fn wrapper() -> ! { unsafe { asm!("pop rsi // pop error code into rsi mov rdi, rsp sub rsp, 8 // align the stack pointer call $0" :: "i"($name as extern "C" fn( &ExceptionStackFrame, u64) -> !) : "rdi","rsi" : "intel"); ::core::intrinsics::unreachable(); } } wrapper }} } ``` The difference to the `handler!` macro is the additional error code argument. The CPU pushes the error code last, so we pop it right at the beginning of the wrapper function. We pop it into `rsi` because the C calling convention expects the second argument in it. ### A Page Fault Handler Let's write a page fault handler which analyzes and prints the error code: ```rust // in src/interrupts/mod.rs extern "C" fn page_fault_handler(stack_frame: &ExceptionStackFrame, error_code: u64) -> ! { println!( "\nEXCEPTION: PAGE FAULT with error code {:?}\n{:#?}", error_code, unsafe { &*stack_frame }); loop {} } ``` We need to register our new handler function in the static interrupt descriptor table (IDT): ```rust // in src/interrupts/mod.rs lazy_static! { static ref IDT: idt::Idt = { let mut idt = idt::Idt::new(); idt.set_handler(0, handler!(divide_by_zero_handler)); idt.set_handler(6, handler!(invalid_opcode_handler)); // new idt.set_handler(14, handler_with_error_code!(page_fault_handler)); idt }; } ``` Page faults have the vector number 14, so we set the 14th IDT entry. 
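The stack pointer arithmetic behind both wrapper macros can be replayed step by step. The sketch below is only a simulation with assumed names (`rsp_before_call` and its parameters are hypothetical): it tracks the value of `rsp` from the moment the exception occurs until the wrapper executes `call`, with and without the `sub rsp, 8` fix.

```rust
/// Simulates `rsp` right before the wrapper's `call` instruction.
/// - `interrupted_rsp`: stack pointer of the interrupted code
/// - `has_error_code`: true for exceptions such as page faults
/// - `realign`: whether the wrapper executes `sub rsp, 8`
fn rsp_before_call(interrupted_rsp: u64, has_error_code: bool, realign: bool) -> u64 {
    let mut rsp = interrupted_rsp & !0xf; // CPU aligns to a 16 byte boundary
    rsp -= 5 * 8; // CPU pushes ss, rsp, rflags, cs, and rip
    if has_error_code {
        rsp -= 8; // CPU pushes the error code
        rsp += 8; // wrapper: `pop rsi`
    }
    if realign {
        rsp -= 8; // wrapper: `sub rsp, 8`
    }
    rsp
}

fn main() {
    let start = 0x118d6c; // arbitrary example value
    for &has_error_code in &[false, true] {
        for &realign in &[false, true] {
            let rsp = rsp_before_call(start, has_error_code, realign);
            println!(
                "error code: {:5}, realign: {:5} -> rsp % 16 = {}",
                has_error_code,
                realign,
                rsp % 16
            );
        }
    }
}
```

Because the error code is popped again before the frame pointer is loaded, both macros end up in the same situation: five 8 byte values remain on the stack, so the extra `sub rsp, 8` is needed in either case.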
#### Testing it Let's test our new page fault handler by provoking a page fault in our main function: ```rust // in src/lib.rs #[no_mangle] pub extern "C" fn rust_main(multiboot_information_address: usize) { ... // initialize our IDT interrupts::init(); // provoke a page fault unsafe { *(0xdeadbeaf as *mut u64) = 42 }; println!("It did not crash!"); loop {} } ``` We get the following output: ![QEMU: page fault with error code 2 and stack frame dump](qemu-page-fault-handler.png) ### The Page Fault Error Code “Error code 2” is not really a useful error message. Let's improve this by creating a `PageFaultErrorCode` type: ```rust // in src/interrupts/mod.rs bitflags! { struct PageFaultErrorCode: u64 { const PROTECTION_VIOLATION = 1 << 0; const CAUSED_BY_WRITE = 1 << 1; const USER_MODE = 1 << 2; const MALFORMED_TABLE = 1 << 3; const INSTRUCTION_FETCH = 1 << 4; } } ``` - When the `PROTECTION_VIOLATION` flag is set, the page fault was caused e.g. by a write to a read-only page. If it's not set, it was caused by accessing a non-present page. - The `CAUSED_BY_WRITE` flag specifies if the fault was caused by a write (if set) or a read (if not set). - The `USER_MODE` flag is set when the fault occurred in non-privileged mode. - The `MALFORMED_TABLE` flag is set when the page table entry has a 1 in a reserved field. - When the `INSTRUCTION_FETCH` flag is set, the page fault occurred while fetching the next instruction. Now we can improve our page fault error message by using the new `PageFaultErrorCode`. We also print the accessed memory address: ```rust extern "C" fn page_fault_handler(stack_frame: &ExceptionStackFrame, error_code: u64) -> ! { use x86_64::registers::control_regs; println!( "\nEXCEPTION: PAGE FAULT while accessing {:#x}\ \nerror code: {:?}\n{:#?}", unsafe { control_regs::cr2() }, PageFaultErrorCode::from_bits(error_code).unwrap(), unsafe { &*stack_frame }); loop {} } ``` The `from_bits` function tries to convert the `u64` into a `PageFaultErrorCode`.
We use `unwrap` to panic if the error code has invalid bits set, since this indicates an error in our `PageFaultErrorCode` definition or a stack corruption. We also print the contents of the `cr2` register. It contains the accessed memory address, which was the cause of the page fault. Now we get a useful error message when a page fault occurs, which allows us to debug it more easily: ![QEMU: output is now `PAGE FAULT with error code CAUSED_BY_WRITE`](qemu-page-fault-error-code.png) As expected, the page fault was caused by a write to `0xdeadbeaf`. The `PROTECTION_VIOLATION` flag is not set, so the accessed page was not present. ## What's next? Now we're able to catch and analyze various exceptions. The next step is to _resolve_ exceptions, if possible. An example is [demand paging]: The OS swaps out memory pages to disk so that a page fault occurs when the page is accessed the next time. In that case, the OS can resolve the exception by bringing the page back into memory. Afterwards, the OS resumes the interrupted program as if nothing had happened. [demand paging]: https://en.wikipedia.org/wiki/Demand_paging The next post will implement the first portion of demand paging: saving and restoring the complete state of a program. This will allow us to transparently interrupt and resume programs in the future. ================================================ FILE: blog/content/edition-1/extra/naked-exceptions/03-returning-from-exceptions/index.md ================================================ +++ title = "Returning from Exceptions" weight = 3 path = "returning-from-exceptions" aliases = ["returning-from-exceptions.html"] date = 2016-09-21 template = "edition-1/page.html" [extra] updated = "2016-11-01" +++ In this post, we learn how to return from exceptions correctly. In the course of this, we will explore the `iretq` instruction, the C calling convention, multimedia registers, and the red zone. As always, the complete source code is on [GitHub].
Please file [issues] for any problems, questions, or improvement suggestions. There is also a [gitter chat] and a comment section at the end of this page. [GitHub]: https://github.com/phil-opp/blog_os/tree/returning_from_exceptions [issues]: https://github.com/phil-opp/blog_os/issues [gitter chat]: https://gitter.im/phil-opp/blog_os > **Note**: This post describes how to handle exceptions using naked functions (see [“Handling Exceptions with Naked Functions”] for an overview). Our new way of handling exceptions can be found in the [“Handling Exceptions”] post. [“Handling Exceptions with Naked Functions”]: @/edition-1/extra/naked-exceptions/_index.md [“Handling Exceptions”]: @/edition-1/posts/09-handling-exceptions/index.md ## Introduction Most exceptions are fatal and can't be resolved. For example, we can't return from a divide-by-zero exception in a reasonable way. However, there are some exceptions that we can resolve: Imagine a system that uses [memory mapped files]: We map a file into the virtual address space without loading it into memory. Whenever we access a part of the file for the first time, a page fault occurs. However, this page fault is not fatal. We can resolve it by loading the corresponding page from disk into memory and setting the `present` flag in the page table. Then we can return from the page fault handler and restart the failed instruction, which now successfully accesses the file data. [memory mapped files]: https://en.wikipedia.org/wiki/Memory-mapped_file Memory mapped files are completely out of scope for us right now (we have neither a file concept nor a hard disk driver). So we need an exception that we can resolve easily so that we can return from it in a reasonable way. Fortunately, there is an exception that needs no resolution at all: the breakpoint exception. ## The Breakpoint Exception The breakpoint exception is the perfect exception to test our upcoming return-from-exception logic. 
Its only purpose is to temporarily pause a program when the breakpoint instruction `int3` is executed.

The breakpoint exception is commonly used in debuggers: When the user sets a breakpoint, the debugger overwrites the corresponding instruction with the `int3` instruction so that the CPU throws the breakpoint exception when it reaches that line. When the user wants to continue the program, the debugger replaces the `int3` instruction with the original instruction again and continues the program. For more details, see the [How debuggers work] series.

[How debuggers work]: https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints

For our use case, we don't need to overwrite any instructions (it wouldn't even be possible since we [set the page table flags] to read-only). Instead, we just want to print a message when the breakpoint instruction is executed and then continue the program.

[set the page table flags]: @/edition-1/posts/07-remap-the-kernel/index.md#using-the-correct-flags

### Catching Breakpoints
Let's start by defining a handler function for the breakpoint exception:

```rust
// in src/interrupts/mod.rs

extern "C" fn breakpoint_handler(stack_frame: &ExceptionStackFrame) -> !
{
    let stack_frame = unsafe { &*stack_frame };
    println!("\nEXCEPTION: BREAKPOINT at {:#x}\n{:#?}",
        stack_frame.instruction_pointer, stack_frame);
    loop {}
}
```

We print an error message and also output the instruction pointer and the rest of the stack frame. Note that this function does _not_ return yet, since our `handler!` macro still requires a diverging function.

We need to register our new handler function in the interrupt descriptor table (IDT):

```rust
// in src/interrupts/mod.rs

lazy_static! {
    static ref IDT: idt::Idt = {
        let mut idt = idt::Idt::new();

        idt.set_handler(0, handler!(divide_by_zero_handler));
        idt.set_handler(3, handler!(breakpoint_handler)); // new
        idt.set_handler(6, handler!(invalid_opcode_handler));
        idt.set_handler(14, handler_with_error_code!(page_fault_handler));

        idt
    };
}
```

We set the IDT entry with number 3 since it's the vector number of the breakpoint exception.

#### Testing it
In order to test it, we insert an `int3` instruction in our `rust_main`:

```rust
// in src/lib.rs
...
#[macro_use] // needed for the `int!` macro
extern crate x86_64;
...

#[no_mangle]
pub extern "C" fn rust_main(...) {
    ...
    interrupts::init();

    // trigger a breakpoint exception
    unsafe { int!(3) };

    println!("It did not crash!");
    loop {}
}
```

When we execute `make run`, we see the following:

![QEMU showing `EXCEPTION: BREAKPOINT at 0x110970` and a dump of the exception stack frame](qemu-breakpoint-handler.png)

It works! Now we “just” need to return from the breakpoint handler somehow so that we see the `It did not crash` message again.

## Returning from Exceptions
So how do we return from exceptions? To make it easier, we look at a normal function return first:

![function stack frame](function-stack-frame.svg)

When calling a function, the `call` instruction pushes the return address on the stack. When the called function is finished, it can return to the parent function through the `ret` instruction, which pops the return address from the stack and then jumps to it.

The exception stack frame, in contrast, looks a bit different:

![exception stack frame](exception-stack-frame.svg)

Instead of pushing a return address, the CPU pushes the stack and instruction pointers (with their segment descriptors), the RFLAGS register, and an optional error code. It also aligns the stack pointer to a 16 byte boundary before pushing values.

So we can't use a normal `ret` instruction, since it expects a different stack frame layout.
Instead, there is a special instruction for returning from exceptions: `iretq`.

### The `iretq` Instruction
The `iretq` instruction is the one and only way to return from exceptions and is specifically designed for this purpose. The AMD64 instruction manual ([PDF][amd-manual]) even demands that `iretq` “_must_ be used to terminate the exception or interrupt handler associated with the exception”.

[amd-manual]: https://www.amd.com/system/files/TechDocs/24594.pdf

IRETQ restores `rip`, `cs`, `rflags`, `rsp`, and `ss` from the values saved on the stack and thus continues the interrupted program. The instruction does not handle the optional error code, so it must be popped from the stack before.

We see that `iretq` treats the stored instruction pointer as return address. For most exceptions, the stored `rip` points to the instruction that caused the fault. So by executing `iretq`, we restart the failing instruction. This makes sense because we should have resolved the exception when returning from it, so the instruction should no longer fail (e.g. the accessed part of the memory mapped file is now present in memory).

The situation is a bit different for the breakpoint exception, since it needs no resolution. Restarting the `int3` instruction wouldn't make sense, since it would cause a new breakpoint exception and we would enter an endless loop. For this reason the hardware designers decided that the stored `rip` should point to the next instruction after the `int3` instruction.

Let's check this for our breakpoint handler. Remember, the handler printed the following message (see the image above):

```
EXCEPTION: BREAKPOINT at 0x110970
```

So let's disassemble the instruction at `0x110970` and its predecessor:

```bash
> objdump -d build/kernel-x86_64.bin | grep -B1 "110970:"
11096f: cc                      int3
110970: 48 c7 01 2a 00 00 00    movq   $0x2a,(%rcx)
```

We see that `0x110970` indeed points to the next instruction after `int3`.
So we can simply jump to the stored instruction pointer when we want to return from the breakpoint exception.

### Implementation
Let's update our `handler!` macro to support non-diverging exception handlers:

```rust
// in src/interrupts/mod.rs

macro_rules! handler {
    ($name: ident) => {{
        #[naked]
        extern "C" fn wrapper() -> ! {
            unsafe {
                asm!("mov rdi, rsp
                      sub rsp, 8 // align the stack pointer
                      call $0"
                      :: "i"($name as extern "C" fn(
                          &ExceptionStackFrame)) // no longer diverging
                      : "rdi" : "intel", "volatile");

                // new
                asm!("add rsp, 8 // undo stack pointer alignment
                      iretq"
                      :::: "intel", "volatile");
                ::core::intrinsics::unreachable();
            }
        }
        wrapper
    }}
}
```

When an exception handler returns from the `call` instruction, we use the `iretq` instruction to continue the interrupted program. Note that we need to undo the stack pointer alignment before, so that `rsp` points to the end of the exception stack frame again.

We've changed the handler function type, so we need to adjust our existing exception handlers:

```diff
// in src/interrupts/mod.rs

extern "C" fn divide_by_zero_handler(
-    stack_frame: &ExceptionStackFrame) -> ! {...}
+    stack_frame: &ExceptionStackFrame) {...}

extern "C" fn invalid_opcode_handler(
-    stack_frame: &ExceptionStackFrame) -> ! {...}
+    stack_frame: &ExceptionStackFrame) {...}

extern "C" fn breakpoint_handler(
-    stack_frame: &ExceptionStackFrame) -> ! {
+    stack_frame: &ExceptionStackFrame) {
    println!(...);
-    loop {}
}
```

Note that we also removed the `loop {}` at the end of our `breakpoint_handler` so that it no longer diverges. The `divide_by_zero_handler` and the `invalid_opcode_handler` still diverge (albeit the new function type would allow a return).

### Testing
Let's try our new `iretq` logic:

![QEMU output with `EXCEPTION BREAKPOINT` and `EXCEPTION PAGE FAULT` but no `It did not crash`](qemu-breakpoint-return-page-fault.png)

Instead of the expected _“It did not crash”_ message after the breakpoint exception, we get a page fault.
The strange thing is that our kernel tried to access address `0x1`, which should never happen. So it seems like we messed up something important.

### Debugging
Let's debug it using GDB. For that we execute `make debug` in one terminal (which starts QEMU with the `-s -S` flags) and then `make gdb` (which starts and connects GDB) in a second terminal. For more information about GDB debugging, check out our [Set Up GDB] guide.

[Set Up GDB]: @/edition-1/extra/set-up-gdb/index.md

First we want to check if our `iretq` was successful. Therefore we set a breakpoint on the `println!("It did not crash!")` statement in `src/lib.rs`. Let's assume that it's on line 61:

```
(gdb) break blog_os/src/lib.rs:61
Breakpoint 1 at 0x110a95: file /home/.../blog_os/src/lib.rs, line 61.
```

This line is after the `int3` instruction, so we know that the `iretq` succeeded when the breakpoint is hit. To test this, we continue the execution:

```
(gdb) continue
Continuing.

Breakpoint 1, blog_os::rust_main (multiboot_information_address=1539136)
    at /home/.../blog_os/src/lib.rs:61
61	    println!("It did not crash!");
```

It worked! So our kernel successfully returned from the `int3` instruction, which means that the `iretq` itself works.

However, when we `continue` the execution again, we get the page fault. So the exception occurs somewhere in the `println` logic. This means that it occurs in code generated by the compiler (and not e.g. in inline assembly). But the compiler should never access `0x1`, so how is this happening?

The answer is that we've used the wrong _calling convention_ for our exception handlers. Thus, we violate some compiler invariants so that the code that works fine without intermediate exceptions starts to violate memory safety when it's executed after a breakpoint exception.

## Calling Conventions
Exceptions are quite similar to function calls: The CPU jumps to the first instruction of the (handler) function and executes the function.
Afterwards, if the function is not diverging, the CPU jumps to the return address and continues the execution of the parent function.

However, there is a major difference between exceptions and function calls: A function call is invoked voluntarily by a compiler-inserted `call` instruction, while an exception might occur at _any_ instruction. In order to understand the consequences of this difference, we need to examine function calls in more detail.

[Calling conventions] specify the details of a function call. For example, they specify where function parameters are placed (e.g. in registers or on the stack) and how results are returned. On x86_64 Linux, the following rules apply for C functions (specified in the [System V ABI]):

[Calling conventions]: https://en.wikipedia.org/wiki/Calling_convention
[System V ABI]: https://refspecs.linuxbase.org/elf/gabi41.pdf

- the first six integer arguments are passed in registers `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`
- additional arguments are passed on the stack
- results are returned in `rax` and `rdx`

Note that Rust does not follow the C ABI (in fact, [there isn't even a Rust ABI yet][rust abi]). So these rules apply only to functions declared as `extern "C" fn`.

[rust abi]: https://github.com/rust-lang/rfcs/issues/600

### Preserved and Scratch Registers
The calling convention divides the registers in two parts: _preserved_ and _scratch_ registers.

The values of the preserved registers must remain unchanged across function calls. So a called function (the _“callee”_) is only allowed to overwrite these registers if it restores their original values before returning. Therefore these registers are called _“callee-saved”_. A common pattern is to save these registers to the stack at the function's beginning and restore them just before returning.

In contrast, a called function is allowed to overwrite scratch registers without restrictions.
If the caller wants to preserve the value of a scratch register across a function call, it needs to back it up and restore it (e.g. by pushing it to the stack before the function call). So the scratch registers are _caller-saved_.

On x86_64, the C calling convention specifies the following preserved and scratch registers:

preserved registers | scratch registers
---|---
`rbp`, `rbx`, `rsp`, `r12`, `r13`, `r14`, `r15` | `rax`, `rcx`, `rdx`, `rsi`, `rdi`, `r8`, `r9`, `r10`, `r11`
_callee-saved_ | _caller-saved_

The compiler knows these rules, so it generates the code accordingly. For example, most functions begin with a `push rbp`, which backs up `rbp` on the stack (because it's a callee-saved register).

### The Exception Calling Convention
In contrast to function calls, exceptions can occur on _any_ instruction. In most cases we don't even know at compile time if the generated code will cause an exception. For example, the compiler can't know if an instruction causes a stack overflow or another page fault.

Since we don't know when an exception occurs, we can't back up any registers beforehand. This means that we can't use a calling convention that relies on caller-saved registers for our exception handlers. But we do so at the moment: Our exception handlers are declared as `extern "C" fn` and thus use the C calling convention.

So here is what happens:

- `rust_main` is executing; it writes some memory address into `rax`.
- The `int3` instruction causes a breakpoint exception.
- Our `breakpoint_handler` prints to the screen and assumes that it can overwrite `rax` freely (since it's a scratch register). Somehow the value `0` ends up in `rax`.
- We return from the breakpoint exception using `iretq`.
- `rust_main` continues and accesses the memory address in `rax`.
- The CPU tries to access address `0x1`, which causes a page fault.

So our exception handler erroneously assumes that the scratch registers were saved by the caller.
But the caller (`rust_main`) couldn't save any registers since it didn't know that an exception would occur. So nobody saves `rax` and the other scratch registers, which leads to the page fault.

The problem is that we use a calling convention with caller-saved registers for our exception handlers. Instead, we need a calling convention that preserves _all registers_. In other words, all registers must be callee-saved:

```rust
extern "all-registers-callee-saved" fn exception_handler() {...}
```

Unfortunately, Rust does not support such a calling convention. It was [proposed once][interrupt calling conventions], but did not get accepted for various reasons. The primary reason was that such calling conventions can be simulated by writing a naked wrapper function.

(Remember: [Naked functions] are functions without prologue and can contain only inline assembly. They were discussed in the [previous post][naked fn post].)

[interrupt calling conventions]: https://github.com/rust-lang/rfcs/pull/1275
[Naked functions]: https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md
[naked fn post]: @/edition-1/extra/naked-exceptions/02-better-exception-messages/index.md#naked-functions

### A naked wrapper function

Such a naked wrapper function might look like this:

```rust
#[naked]
extern "C" fn calling_convention_wrapper() {
    unsafe {
        asm!("
            push rax
            push rcx
            push rdx
            push rsi
            push rdi
            push r8
            push r9
            push r10
            push r11

            // TODO: call exception handler with C calling convention

            pop r11
            pop r10
            pop r9
            pop r8
            pop rdi
            pop rsi
            pop rdx
            pop rcx
            pop rax
        " :::: "intel", "volatile");
    }
}
```

This wrapper function saves all _scratch_ registers to the stack before calling the exception handler and restores them afterwards. Note that we `pop` the registers in reverse order.

We don't need to back up _preserved_ registers since they are callee-saved in the C calling convention. Thus, the compiler already takes care of preserving their values.
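
The save/restore discipline of such a wrapper can be modeled in ordinary Rust that runs on the host. The following is only a toy sketch: the `Cpu` type, `clobbering_handler`, and `run_exception` are hypothetical names invented for illustration, and the "registers" are plain array slots rather than real hardware state. It shows that a handler which freely overwrites the scratch registers corrupts the interrupted code unless the values are pushed before and popped in reverse order afterwards.

```rust
/// Number of scratch (caller-saved) registers pushed by the wrapper:
/// rax, rcx, rdx, rsi, rdi, r8, r9, r10, r11.
const SCRATCH_COUNT: usize = 9;

/// Hypothetical toy CPU: only the scratch registers and a stack.
struct Cpu {
    scratch: [u64; SCRATCH_COUNT],
    stack: Vec<u64>,
}

impl Cpu {
    /// Push all scratch registers, mirroring the `push` sequence.
    fn save_scratch_registers(&mut self) {
        for i in 0..SCRATCH_COUNT {
            self.stack.push(self.scratch[i]);
        }
    }

    /// Pop them in reverse order, mirroring the `pop` sequence.
    fn restore_scratch_registers(&mut self) {
        for i in (0..SCRATCH_COUNT).rev() {
            self.scratch[i] = self.stack.pop().expect("stack underflow");
        }
    }
}

/// A handler using the C calling convention may overwrite every scratch register.
fn clobbering_handler(cpu: &mut Cpu) {
    cpu.scratch = [0; SCRATCH_COUNT];
}

/// Runs the handler either bare or inside the save/restore wrapper and returns
/// the scratch registers that the interrupted code sees afterwards.
fn run_exception(initial: [u64; SCRATCH_COUNT], wrapped: bool) -> [u64; SCRATCH_COUNT] {
    let mut cpu = Cpu { scratch: initial, stack: Vec::new() };
    if wrapped {
        cpu.save_scratch_registers();
    }
    clobbering_handler(&mut cpu);
    if wrapped {
        cpu.restore_scratch_registers();
    }
    cpu.scratch
}

fn main() {
    let initial = [0xb8000, 1, 2, 3, 4, 5, 6, 7, 8];
    println!("bare:    {:x?}", run_exception(initial, false));
    println!("wrapped: {:x?}", run_exception(initial, true));
}
```

In the bare case all values are lost, which corresponds to the corrupted `rax` in our page fault. In the wrapped case the interrupted code sees its original values again.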
### Fixing our Handler Macro
Let's update our handler macro to fix the calling convention problem. To do so, we need to back up and restore all scratch registers. For that we create two new macros:

```rust
// in src/interrupts/mod.rs

macro_rules! save_scratch_registers {
    () => {
        asm!("push rax
              push rcx
              push rdx
              push rsi
              push rdi
              push r8
              push r9
              push r10
              push r11
        " :::: "intel", "volatile");
    }
}

macro_rules! restore_scratch_registers {
    () => {
        asm!("pop r11
              pop r10
              pop r9
              pop r8
              pop rdi
              pop rsi
              pop rdx
              pop rcx
              pop rax
            " :::: "intel", "volatile");
    }
}
```

We need to declare these macros _above_ our `handler` macro, since macros are only available after their declaration.

Now we can use these macros to fix our `handler!` macro:

```rust
// in src/interrupts/mod.rs

macro_rules! handler {
    ($name: ident) => {{
        #[naked]
        extern "C" fn wrapper() -> ! {
            unsafe {
                save_scratch_registers!();
                asm!("mov rdi, rsp
                      add rdi, 9*8 // calculate exception stack frame pointer
                      // sub rsp, 8 (stack is aligned already)
                      call $0"
                      :: "i"($name as
                             extern "C" fn(&ExceptionStackFrame))
                      : "rdi" : "intel", "volatile");

                restore_scratch_registers!();
                asm!("
                      // add rsp, 8 (undo stack alignment; not needed anymore)
                      iretq"
                      :::: "intel", "volatile");
                ::core::intrinsics::unreachable();
            }
        }
        wrapper
    }}
}
```

It's important that we save the registers first, before we modify any of them. After the `call` instruction (but before `iretq`) we restore the registers again. Because we're now changing `rsp` (by pushing the register values) before we load it into `rdi`, we would get a wrong exception stack frame pointer. Therefore we need to adjust it by adding the number of bytes we push. We push 9 registers that are 8 bytes each, so `9 * 8` bytes in total.

Note that we no longer need to manually align the stack pointer, because we're pushing an odd number of registers in `save_scratch_registers`. Thus the stack pointer already has the required 16-byte alignment.
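
The pointer arithmetic behind these two claims can be checked with a small host-side calculation. This is just a sketch under the assumptions stated in this post: the CPU aligns the stack pointer to 16 bytes and then pushes five 8-byte values for an exception without error code. The function name `wrapper_pointers` and the example address are made up for illustration.

```rust
/// Bytes the CPU pushes for an exception without error code:
/// ss, rsp, rflags, cs, rip (5 values of 8 bytes each).
const EXCEPTION_FRAME_BYTES: u64 = 5 * 8;
/// Bytes pushed by the scratch register save sequence (9 registers of 8 bytes each).
const SCRATCH_BYTES: u64 = 9 * 8;

/// Returns (rsp at the `call`, value loaded into rdi) for a given stack
/// pointer of the interrupted code.
fn wrapper_pointers(interrupted_rsp: u64) -> (u64, u64) {
    // The CPU first aligns the stack pointer down to a 16-byte boundary...
    let aligned = interrupted_rsp & !0xf;
    // ...and then pushes the exception stack frame.
    let rsp_at_entry = aligned - EXCEPTION_FRAME_BYTES;
    // The wrapper pushes the nine scratch registers.
    let rsp_at_call = rsp_at_entry - SCRATCH_BYTES;
    // `mov rdi, rsp` followed by `add rdi, 9*8` points rdi back at the frame.
    let rdi = rsp_at_call + SCRATCH_BYTES;
    (rsp_at_call, rdi)
}

fn main() {
    let (rsp_at_call, rdi) = wrapper_pointers(0x100ff8);
    println!("rsp at call: {:#x} (16-byte aligned: {})", rsp_at_call, rsp_at_call % 16 == 0);
    println!("rdi:         {:#x}", rdi);
}
```

Since 5 + 9 = 14 values of 8 bytes were pushed below a 16-byte boundary, the stack pointer at the `call` is a multiple of 16 again, and `rdi` equals the stack pointer at wrapper entry, which is the start of the exception stack frame.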
### Testing it again
Let's test it again with our corrected `handler!` macro:

![QEMU output with `EXCEPTION BREAKPOINT` and `It did not crash`](qemu-breakpoint-return.png)

The page fault is gone and we see the _“It did not crash”_ message again!

So the page fault occurred because our exception handler didn't preserve the scratch register `rax`. Our new `handler!` macro fixes this problem by saving all scratch registers (including `rax`) before calling exception handlers. Thus, `rax` still contains the valid memory address when `rust_main` continues execution.

## Multimedia Registers
When we discussed calling conventions above, we assumed that an x86_64 CPU only has the following 16 registers: `rax`, `rbx`, `rcx`, `rdx`, `rsi`, `rdi`, `rsp`, `rbp`, `r8`, `r9`, `r10`, `r11`, `r12`, `r13`, `r14`, and `r15`. These registers are called _general purpose registers_ since each of them can be used for arithmetic and load/store instructions.

However, modern CPUs also have a set of _special purpose registers_, which can be used to improve performance in several use cases. On x86_64, the most important set of special purpose registers are the _multimedia registers_. These registers are larger than the general purpose registers and can be used to speed up audio/video processing or matrix calculations. For example, we could use them to add two 4-dimensional vectors _in a single CPU instruction_:

![`(1,2,3,4) + (5,6,7,8) = (6,8,10,12)`](vector-addition.png)

Such multimedia instructions are called [Single Instruction Multiple Data (SIMD)] instructions, because they simultaneously perform an operation (e.g. addition) on multiple data words. Good compilers are able to transform normal loops into such SIMD code automatically. This process is called [auto-vectorization] and can lead to huge performance improvements.

[Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD
[auto-vectorization]: https://en.wikipedia.org/wiki/Automatic_vectorization

However, auto-vectorization causes a problem for us: Most of the multimedia registers are caller-saved. According to our discussion of calling conventions above, this means that our exception handlers erroneously assume that they are allowed to overwrite them without preserving their values.

We don't use any multimedia registers explicitly, but the Rust compiler might auto-vectorize our code (including the exception handlers). Thus we could silently clobber the multimedia registers, which leads to the same problems as above:

![example: program uses mm0, mm1, and mm2. Then the exception handler clobbers mm1.](xmm-overwrite.svg)

This example shows a program that is using the first three multimedia registers (`mm0` to `mm2`). At some point, an exception occurs and control is transferred to the exception handler. The exception handler uses `mm1` for its own data and thus overwrites the previous value. When the exception is resolved, the CPU continues the interrupted program again. However, the program is now corrupt since it relies on the original `mm1` value.

### Saving and Restoring Multimedia Registers
In order to fix this problem, we need to back up all caller-saved multimedia registers before we call the exception handler. The problem is that the set of multimedia registers varies between CPUs. There are different standards:

- [MMX]: The MMX instruction set was introduced in 1997 and defines eight 64 bit registers called `mm0` through `mm7`. These registers are just aliases for the registers of the [x87 floating point unit].
- [SSE]: The _Streaming SIMD Extensions_ instruction set was introduced in 1999. Instead of re-using the floating point registers, it adds a completely new register set. The sixteen new registers are called `xmm0` through `xmm15` and are 128 bits each.
- [AVX]: The _Advanced Vector Extensions_ are extensions that further increase the size of the multimedia registers. The new registers are called `ymm0` through `ymm15` and are 256 bits each. They extend the `xmm` registers, so e.g. `xmm0` is the lower half of `ymm0`.

[MMX]: https://en.wikipedia.org/wiki/MMX_(instruction_set)
[x87 floating point unit]: https://en.wikipedia.org/wiki/X87
[SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
[AVX]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions

The Rust compiler (and LLVM) assume that the `x86_64-unknown-linux-gnu` target supports only MMX and SSE, so we don't need to save the `ymm0` through `ymm15`. But we need to save `xmm0` through `xmm15` and also `mm0` through `mm7`. There is a special instruction to do this: [fxsave]. This instruction saves the floating point and multimedia state to a given address. It needs _512 bytes_ to store that state.

[fxsave]: https://www.felixcloutier.com/x86/fxsave

In order to save/restore the multimedia registers, we _could_ add new macros:

```rust
macro_rules! save_multimedia_registers {
    () => {
        asm!("sub rsp, 512
              fxsave [rsp]
        " :::: "intel", "volatile");
    }
}

macro_rules! restore_multimedia_registers {
    () => {
        asm!("fxrstor [rsp]
              add rsp, 512
            " :::: "intel", "volatile");
    }
}
```

First, we reserve the 512 bytes on the stack and then we use `fxsave` to back up the multimedia registers. In order to restore them later, we use the [fxrstor] instruction. Note that `fxsave` and `fxrstor` require a 16 byte aligned memory address.

[fxrstor]: https://www.felixcloutier.com/x86/fxrstor

However, _we won't do it that way_. The problem is the large amount of memory required. We will reuse the same code when we handle hardware interrupts in a future post. So for each mouse click, pressed key, or arrived network packet we need to write 512 bytes to memory. This would be a huge performance problem.

Fortunately, there exists an alternative solution.
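
To get a feeling for the numbers, the size and alignment requirements of the `fxsave` area can be expressed as a Rust type and checked on the host. This is a sketch for illustration only: `FxSaveArea` and `bytes_saved_per_interrupt` are hypothetical names that are not part of our kernel, and nothing here executes `fxsave` itself.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical buffer type for the state written by `fxsave`:
/// 512 bytes that must start at a 16-byte aligned address.
#[allow(dead_code)]
#[repr(C, align(16))]
struct FxSaveArea([u8; 512]);

/// Bytes written to the stack per exception or interrupt when we save the
/// nine scratch registers and optionally the full `fxsave` state.
fn bytes_saved_per_interrupt(with_fxsave: bool) -> usize {
    let scratch = 9 * size_of::<u64>();
    if with_fxsave {
        scratch + size_of::<FxSaveArea>()
    } else {
        scratch
    }
}

fn main() {
    println!("size:  {} bytes", size_of::<FxSaveArea>());
    println!("align: {} bytes", align_of::<FxSaveArea>());
    println!("without fxsave: {} bytes", bytes_saved_per_interrupt(false));
    println!("with fxsave:    {} bytes", bytes_saved_per_interrupt(true));
}
```

Under these assumptions, saving the multimedia state would raise the amount of data written per interrupt from 72 bytes to 584 bytes, roughly an eightfold increase.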
### Disabling Multimedia Extensions
We just disable MMX, SSE, and all the other fancy multimedia extensions in our kernel[^fn-userspace-sse]. This way, our exception handlers won't clobber the multimedia registers because they won't use them at all.

[^fn-userspace-sse]: Userspace programs will still be able to use the multimedia registers.

This solution has its own disadvantages, of course. For example, it leads to slower kernel code because the compiler can't perform any auto-vectorization optimizations. But it's still the faster solution (since we save many memory accesses) and most kernels do it this way (including Linux).

So how do we disable MMX and SSE? Well, we just tell the compiler that our target system doesn't support it. Since the very beginning, we've been compiling our kernel for the `x86_64-unknown-linux-gnu` target. This worked fine so far, but now we want a different target without support for multimedia extensions. We can do so by creating a _target configuration file_.

### Target Specifications
In order to disable the multimedia extensions for our kernel, we need to compile for a custom target. We want a target that is equal to `x86_64-unknown-linux-gnu`, but without MMX and SSE support. Rust allows us to specify such a target using a JSON configuration file.

A minimal target specification that describes the `x86_64-unknown-linux-gnu` target looks like this:

```json
{
  "llvm-target": "x86_64-unknown-linux-gnu",
  "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
  "target-endian": "little",
  "target-pointer-width": "64",
  "target-c-int-width": "32",
  "arch": "x86_64",
  "os": "none"
}
```

The `llvm-target` field specifies the target triple that is passed to LLVM. We want to derive a 64-bit Linux target, so we choose `x86_64-unknown-linux-gnu`. The `data-layout` field is also passed to LLVM and specifies how data should be laid out in memory. It consists of various specifications separated by a `-` character.
For example, the `e` means little endian and `S128` specifies that the stack should be 128 bits (= 16 byte) aligned. The format is described in detail in the [LLVM documentation][data layout] but there shouldn't be a reason to change this string.

The other fields are used for conditional compilation. This allows crate authors to use `cfg` variables to write special code depending on the OS or the architecture. There isn't any up-to-date documentation about these fields but the [corresponding source code][target specification] is quite readable.

[data layout]: https://llvm.org/docs/LangRef.html#data-layout
[target specification]: https://github.com/rust-lang/rust/blob/c772948b687488a087356cb91432425662e034b9/src/librustc_back/target/mod.rs#L194-L214

#### Disabling MMX and SSE
In order to disable the multimedia extensions, we create a new target named `x86_64-blog_os`. To describe this target, we create a file named `x86_64-blog_os.json` in the project root with the following content:

```json
{
  "llvm-target": "x86_64-unknown-linux-gnu",
  "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
  "target-endian": "little",
  "target-pointer-width": "64",
  "target-c-int-width": "32",
  "arch": "x86_64",
  "os": "none",
  "features": "-mmx,-sse"
}
```

It's equal to the `x86_64-unknown-linux-gnu` target but has one additional option: `"features": "-mmx,-sse"`. So we added two target _features_: `-mmx` and `-sse`. The minus prefix defines that our target does _not_ support this feature. So by specifying `-mmx` and `-sse`, we disable the default `mmx` and `sse` features.

In order to compile for the new target, we need to adjust our Makefile:

```diff
# in `Makefile`
arch ?= x86_64
-target ?= $(arch)-unknown-linux-gnu
+target ?= $(arch)-blog_os
...
```

The new target name (`x86_64-blog_os`) is the file name of the JSON configuration file without the `.json` extension.
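
The meaning of the prefixes in the `features` string can be illustrated with a few lines of Rust. This is a hypothetical helper written only for this explanation; it is neither how rustc nor how LLVM parses the string. It merely splits a comma-separated list into names that are switched on (plus prefix) and names that are switched off (minus prefix).

```rust
/// Splits an LLVM-style feature string into (enabled, disabled) feature names.
/// Entries without a `+` or `-` prefix are ignored in this sketch.
fn split_features(features: &str) -> (Vec<&str>, Vec<&str>) {
    let mut enabled = Vec::new();
    let mut disabled = Vec::new();
    for entry in features.split(',').map(str::trim) {
        if let Some(name) = entry.strip_prefix('+') {
            enabled.push(name);
        } else if let Some(name) = entry.strip_prefix('-') {
            disabled.push(name);
        }
    }
    (enabled, disabled)
}

fn main() {
    let (enabled, disabled) = split_features("-mmx,-sse");
    println!("enabled: {:?}, disabled: {:?}", enabled, disabled);
}
```

For our target file, both `mmx` and `sse` end up in the disabled list and nothing is explicitly enabled.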
### Cross compilation
Let's check whether our kernel still works with the new target:

```
> make run
   Compiling raw-cpuid v2.0.1
   Compiling rlibc v0.1.5
   Compiling x86 v0.7.1
   Compiling spin v0.3.5
error[E0463]: can't find crate for `core`

error: aborting due to previous error

Build failed, waiting for other jobs to finish...
...
Makefile:52: recipe for target 'cargo' failed
make: *** [cargo] Error 101
```

It doesn't compile anymore. The error tells us that the Rust compiler no longer finds the core library.

The [core library] is implicitly linked to all `no_std` crates and contains things such as `Result`, `Option`, and iterators. We've used that library without problems since [the very beginning], so why is it no longer available?

[core library]: https://doc.rust-lang.org/nightly/core/index.html
[the very beginning]: @/edition-1/posts/03-set-up-rust/index.md

The problem is that the core library is distributed together with the Rust compiler as a _precompiled_ library. So it is only valid for the host triple, which is `x86_64-unknown-linux-gnu` in our case. If we want to compile code for other targets, we need to recompile `core` for these targets first.

#### Xargo
That's where [xargo] comes in. It is a wrapper for cargo that eases cross compilation. We can install it by executing:

[xargo]: https://github.com/japaric/xargo

```
cargo install xargo
```

Xargo depends on the rust source code, which we can install with `rustup component add rust-src`.

Xargo is “a drop-in replacement for cargo”, so every cargo command also works with `xargo`. You can do e.g. `xargo --help`, `xargo clean`, or `xargo doc`. However, the `build` command gains additional functionality: `xargo build` will automatically cross compile the `core` library when compiling for custom targets.

That's exactly what we want, so we change one letter in our Makefile:

```diff
# in `Makefile`
...

cargo:
-	@cargo build --target $(target)
+	@xargo build --target $(target)
...
```

Now the build goes through `xargo`, which should fix the compilation error. Let's try it out:

```
> make run
   Compiling core v0.0.0 (file:///home/…/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore)
LLVM ERROR: SSE register return with SSE disabled
error: Could not compile `core`.
```

Well, we get a different error now, so it seems like we're making progress :). It seems like there is a “SSE register return” although SSE is disabled. But what's an “SSE register return”?

### SSE Register Return
Remember when we discussed calling conventions above? The calling convention defines which registers are used for return values. Well, the [System V ABI] defines that `xmm0` should be used for returning floating point values. So somewhere in the `core` library a function returns a float and LLVM doesn't know what to do. The ABI says “use `xmm0`” but the target specification says “don't use `xmm` registers”.

In order to fix this problem, we need to change our float ABI. The idea is to avoid normal hardware-supported floats and use a pure software implementation instead. We can do so by enabling the `soft-float` feature for our target. For that, we edit `x86_64-blog_os.json`:

```json
{
  "llvm-target": "x86_64-unknown-linux-gnu",
  ...
  "features": "-mmx,-sse,+soft-float"
}
```

The plus prefix tells LLVM to enable the `soft-float` feature.
Let's try `make run` again:

```
> make run
   Compiling core v0.0.0 (file:///…/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore)
    Finished release [optimized] target(s) in 21.95 secs
   Compiling spin v0.4.5
   Compiling once v0.3.2
   Compiling x86 v0.8.0
   Compiling bitflags v0.9.1
   Compiling raw-cpuid v2.0.1
   Compiling rlibc v0.1.5
   Compiling linked_list_allocator v0.2.3
   Compiling volatile v0.1.0
   Compiling bitflags v0.4.0
   Compiling bit_field v0.5.0
   Compiling spin v0.3.5
   Compiling multiboot2 v0.1.0
   Compiling lazy_static v0.2.2
   Compiling hole_list_allocator v0.1.0 (file:///…/libs/hole_list_allocator)
   Compiling blog_os v0.1.0 (file:///…)
error[E0463]: can't find crate for `alloc`
  --> src/lib.rs:33:1
   |
33 | extern crate alloc;
   | ^^^^^^^^^^^^^^^^^^^ can't find crate

error: aborting due to previous error
```

We see that `xargo` now compiles the `core` crate in release mode. Then it starts the normal cargo build. Cargo then recompiles all dependencies, since it needs to generate different code for the new target.

However, the build still fails. The reason is that xargo only installs `core` by default, but we also need the `alloc` crate.
We can enable it by creating a file named `Xargo.toml` with the following contents:

```toml
# Xargo.toml

[target.x86_64-blog_os.dependencies]
alloc = {}
```

Now xargo compiles `alloc`, too:

```
> make run
   Compiling core v0.0.0 (file:///…/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore)
   Compiling std_unicode v0.0.0 (file:///…/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd_unicode)
   Compiling alloc v0.0.0 (file:///…/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc)
    Finished release [optimized] target(s) in 28.84 secs
   Compiling blog_os v0.1.0 (file:///…/Documents/blog_os/master)
warning: unused variable: `allocator` […]
warning: unused variable: `frame` […]

    Finished debug [unoptimized + debuginfo] target(s) in 1.75 secs
```

It worked! Now we have a kernel that never touches the multimedia registers! We can verify this by executing:

```
> objdump -d build/kernel-x86_64.bin | grep "mm[0-9]"
```

If the command produces no output, our kernel uses neither MMX (`mm0` – `mm7`) nor SSE (`xmm0` – `xmm15`) registers.

So now our return-from-exception logic works without problems in _most_ cases. However, there is still a pitfall hidden in the C calling convention, which might cause hideous bugs in some rare cases.

## The Red Zone
The [red zone] is an optimization of the [System V ABI] that allows functions to temporarily use the 128 bytes below their stack frame without adjusting the stack pointer:

[red zone]: https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64#the-red-zone

![stack frame with red zone](red-zone.svg)

The image shows the stack frame of a function with `n` local variables. On function entry, the stack pointer is adjusted to make room on the stack for the local variables.

The red zone is defined as the 128 bytes below the adjusted stack pointer. The function can use this area for temporary data that's not needed across function calls.
Thus, the two instructions for adjusting the stack pointer can be avoided in some cases (e.g. in small leaf functions).

However, this optimization leads to huge problems with exceptions. Let's assume that an exception occurs while a function uses the red zone:

![red zone overwritten by exception handler](red-zone-overwrite.svg)

The CPU and the exception handler overwrite the data in the red zone. But this data is still needed by the interrupted function. So the function won't work correctly anymore when we return from the exception handler. It might fail or cause another exception, but it could also lead to strange bugs that [take weeks to debug].

[take weeks to debug]: https://forum.osdev.org/viewtopic.php?t=21720

### Adjusting our Exception Handler?
The problem is that the [System V ABI] demands that the red zone _“shall not be modified by signal or interrupt handlers.”_ Our current exception handlers do not respect this. We could try to fix it by subtracting 128 from the stack pointer before pushing anything:

```nasm
sub rsp, 128
save_scratch_registers()
...
call ...
...
restore_scratch_registers()
add rsp, 128

iretq
```

_This will not work._ The problem is that the CPU pushes the exception stack frame before even calling our handler function. So the CPU itself will clobber the red zone and there is nothing we can do about that.

So our only chance is to disable the red zone.

### Disabling the Red Zone
The red zone is a property of our target, so in order to disable it we edit our `x86_64-blog_os.json` one last time:

```json
{
  "llvm-target": "x86_64-unknown-linux-gnu",
  ...
  "features": "-mmx,-sse,+soft-float",
  "disable-redzone": true
}
```

We add one additional option at the end: `"disable-redzone": true`. As you might guess, this option disables the red zone optimization.

Now we have a kernel without a red zone!

## Exceptions with Error Codes
We're now able to correctly return from exceptions without error codes.
However, we still can't return from exceptions that push an error code (e.g. page faults). Let's fix that by updating our `handler_with_error_code` macro:

```rust
// in src/interrupts/mod.rs

macro_rules! handler_with_error_code {
    ($name: ident) => {{
        #[naked]
        extern "C" fn wrapper() -> ! {
            unsafe {
                asm!("pop rsi // pop error code into rsi
                      mov rdi, rsp
                      sub rsp, 8 // align the stack pointer
                      call $0"
                      :: "i"($name as extern "C" fn(
                          &ExceptionStackFrame, u64))
                      : "rdi","rsi" : "intel");
                asm!("iretq" :::: "intel", "volatile");
                ::core::intrinsics::unreachable();
            }
        }
        wrapper
    }}
}
```

First, we change the type of the handler function: no more `-> !`, so it no longer needs to diverge. We also add an `iretq` instruction at the end.

Now we can make our `page_fault_handler` non-diverging:

```diff
// in src/interrupts/mod.rs

 extern "C" fn page_fault_handler(stack_frame: &ExceptionStackFrame,
-                                 error_code: u64) -> ! { ... }
+                                 error_code: u64) { ... }
```

However, now we have the same problem as above: The handler function will overwrite the scratch registers and cause bugs when returning. Let's fix this by invoking `save_scratch_registers` at the beginning:

```rust
// in src/interrupts/mod.rs

macro_rules! handler_with_error_code {
    ($name: ident) => {{
        #[naked]
        extern "C" fn wrapper() -> ! {
            unsafe {
                save_scratch_registers!();
                asm!("pop rsi // pop error code into rsi
                      mov rdi, rsp
                      add rdi, 10*8 // calculate exception stack frame pointer
                      sub rsp, 8 // align the stack pointer
                      call $0
                      add rsp, 8 // undo stack pointer alignment
                      " :: "i"($name as extern "C" fn(
                          &ExceptionStackFrame, u64))
                      : "rdi","rsi" : "intel");
                restore_scratch_registers!();
                asm!("iretq" :::: "intel", "volatile");
                ::core::intrinsics::unreachable();
            }
        }
        wrapper
    }}
}
```

Now we back up the scratch registers to the stack right at the beginning and restore them just before the `iretq`.
Like in the `handler` macro, we now need to add `10*8` to `rdi` in order to get the correct exception stack frame pointer (`save_scratch_registers` pushes nine 8-byte registers, plus the error code). We also need to undo the stack pointer alignment after the `call` [^fn-stack-alignment].

[^fn-stack-alignment]: The stack alignment is actually wrong here, since we additionally pushed an odd number of registers. However, the `pop rsi` is wrong too, since the error code is no longer at the top of the stack. When we fix that problem, the stack alignment becomes correct again. So I left it in to keep things simple.

Now we have one last bug: We `pop` the error code into `rsi`, but the error code is no longer at the top of the stack (since `save_scratch_registers` pushed 9 registers on top of it). So we need to do it differently:

```rust
// in src/interrupts/mod.rs

macro_rules! handler_with_error_code {
    ($name: ident) => {{
        #[naked]
        extern "C" fn wrapper() -> ! {
            unsafe {
                save_scratch_registers!();
                asm!("mov rsi, [rsp + 9*8] // load error code into rsi
                      mov rdi, rsp
                      add rdi, 10*8 // calculate exception stack frame pointer
                      sub rsp, 8 // align the stack pointer
                      call $0
                      add rsp, 8 // undo stack pointer alignment
                      " :: "i"($name as extern "C" fn(
                          &ExceptionStackFrame, u64))
                      : "rdi","rsi" : "intel");
                restore_scratch_registers!();
                asm!("add rsp, 8 // pop error code
                      iretq" :::: "intel", "volatile");
                ::core::intrinsics::unreachable();
            }
        }
        wrapper
    }}
}
```

Instead of using `pop`, we calculate the error code address manually (`save_scratch_registers` pushes nine 8-byte registers) and load it into `rsi` using a `mov`. So now the error code stays on the stack. But `iretq` doesn't handle the error code, so we need to pop it before invoking `iretq`.

Phew! That was a lot of fiddling with assembly. Let's test if it still works.

### Testing
First, we test if the exception stack frame pointer and the error code are still correct:

```rust
// in rust_main in src/lib.rs

...
unsafe { int!(3) };

// provoke a page fault
unsafe { *(0xdeadbeaf as *mut u64) = 42; }

println!("It did not crash!");
loop {}
```

This should cause the following error message:

```
EXCEPTION: PAGE FAULT while accessing 0xdeadbeaf
error code: CAUSED_BY_WRITE
ExceptionStackFrame {
    instruction_pointer: 1114753,
    code_segment: 8,
    cpu_flags: 2097158,
    stack_pointer: 1171104,
    stack_segment: 16
}
```

The error code should still be `CAUSED_BY_WRITE` and the exception stack frame values should also be correct (e.g. `code_segment` should be 8 and `stack_segment` should be 16).

#### Returning from Page Faults
Let's see what happens if we comment out the trailing `loop` in our page fault handler:

![QEMU printing the same page fault message again and again](qemu-page-fault-return.png)

We see that the same error message is printed over and over again. Here is what happens:

- The CPU executes `rust_main` and tries to access `0xdeadbeaf`. This causes a page fault.
- The page fault handler prints an error message and returns without fixing the cause of the exception (`0xdeadbeaf` is still inaccessible).
- The CPU restarts the instruction that caused the page fault and thus tries to access `0xdeadbeaf` again. Of course, this causes a page fault again.
- The page fault handler prints the error message and returns.

… and so on. Thus, our code indefinitely jumps between the page fault handler and the instruction that accesses `0xdeadbeaf`.

This is a good thing! It means that our `iretq` logic is working correctly, since it returns to the correct instruction every time. So our `handler_with_error_code` macro seems to be correct.

## What's next?
We are now able to catch exceptions and to return from them. However, there are still exceptions that completely crash our kernel by causing a [triple fault]. In the next post, we will fix this issue by handling a special type of exception: the [double fault]. Thus, we will be able to avoid random reboots in our kernel.
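
As a recap of the offsets used in the final `handler_with_error_code` macro, the following Rust sketch models the stack right after `save_scratch_registers!()` has run. It is only a model for illustration, not kernel code: the stack is a plain vector of 8-byte words, index 0 stands for the word at `rsp`, and the helper names (`stack_after_save`, `load`, `error_code_offset`, `frame_offset`) are invented here.

```rust
/// Number of registers pushed by `save_scratch_registers!()`.
const SCRATCH_REGISTERS: usize = 9;
/// Size of one stack slot in bytes.
const WORD: usize = 8;

/// Models the stack right after the scratch registers were saved: index 0 is
/// the word at `rsp`, higher indices are higher addresses. The saved register
/// values are placeholders (zero), since only their count matters here.
fn stack_after_save(error_code: u64, frame: [u64; 5]) -> Vec<u64> {
    let mut words = vec![0u64; SCRATCH_REGISTERS]; // pushed last, so nearest to rsp
    words.push(error_code); // pushed by the CPU after the exception stack frame
    words.extend_from_slice(&frame); // rip, cs, rflags, rsp, ss (ascending addresses)
    words
}

/// Equivalent of `mov reg, [rsp + byte_offset]` on the modelled stack.
fn load(words: &[u64], byte_offset: usize) -> u64 {
    words[byte_offset / WORD]
}

/// Byte offset of the error code relative to `rsp`, i.e. `9*8`.
fn error_code_offset() -> usize {
    SCRATCH_REGISTERS * WORD
}

/// Byte offset of the exception stack frame relative to `rsp`, i.e. `10*8`.
fn frame_offset() -> usize {
    (SCRATCH_REGISTERS + 1) * WORD
}
```

Feeding in the frame values from the page fault output above together with an error code of `2` (bit 1, the write bit of the x86 page fault error code), `load(&stack, error_code_offset())` yields the error code and `load(&stack, frame_offset())` yields the instruction pointer, which is why the macro uses `[rsp + 9*8]` and `add rdi, 10*8`.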

[triple fault]: https://en.wikipedia.org/wiki/Triple_fault
[double fault]: https://en.wikipedia.org/wiki/Double_fault

================================================
FILE: blog/content/edition-1/extra/naked-exceptions/_index.md
================================================
+++
title = "Handling Exceptions using naked Functions"
sort_by = "weight"
template = "edition-1/handling-exceptions-with-naked-fns.html"
insert_anchor_links = "left"
aliases = ["first-edition/extra/naked-exceptions/index.html"]
+++

================================================
FILE: blog/content/edition-1/extra/set-up-gdb/index.md
================================================
+++
title = "Set Up GDB"
template = "plain.html"
path = "set-up-gdb"
aliases = ["set-up-gdb.html"]
weight = 4
+++

There are a lot of things that can go wrong when developing an OS. So it's a good idea to add a debugger to our toolset, which allows us to set breakpoints and examine variables. We will use [GDB](https://www.gnu.org/software/gdb/), as QEMU supports it out of the box.

### QEMU parameters
To make QEMU listen for a gdb connection, we add the `-s` flag to the `run` target in our Makefile:

```make
run: $(iso)
	@qemu-system-x86_64 -cdrom $(iso) -s
```

This allows us to connect a debugger at any time, for example to investigate why a panic occurred.

To wait for a debugger connection on startup, we add a `debug` target to the Makefile:

```make
debug: $(iso)
	@qemu-system-x86_64 -cdrom $(iso) -s -S
```

It is identical to the `run` target except for the additional `-S` flag. This flag causes QEMU to freeze on startup and wait until a debugger is connected. Now it _should_ be possible to connect gdb.

### The annoying issue
Unfortunately gdb has an issue with the switch to long mode. If we connect when the CPU is already in long mode, everything works fine.
But if we use `make debug` and thus connect right at the start, we get an error when we set a breakpoint in 64-bit mode:

```
Remote 'g' packet reply is too long: [a very long number]
```

This issue has been known [since 2012][gdb issue patch], but it is still not fixed. Maybe we can find the reason in the [issue thread][gdb issue thread]:

[gdb issue patch]: https://web.archive.org/web/20190114181420/https://www.cygwin.com/ml/gdb-patches/2012-03/msg00116.html
[gdb issue thread]: https://sourceware.org/bugzilla/show_bug.cgi?id=13984#c11

> from my (limited) experience, unless you ping the gdb-patches list weekly, this patch is more likely to remain forgotten :-)

Pretty frustrating, especially since the patch is [very small][gdb patch commit].

[gdb patch commit]: https://github.com/phil-opp/binutils-gdb/commit/9e88c451844ad38bb82fe77d1f388c87c41b4520

### Building the patched GDB
So the only way to use gdb with `make debug` is to build a modified gdb version that includes the patch. I created a repository with the patched GDB to make this easy. Just follow [the build instructions].

[the build instructions]: https://github.com/phil-opp/binutils-gdb#gdb-for-64-bit-rust-operating-systems

### Connecting GDB
Now you should have a `rust-os-gdb` subfolder. In its `bin` directory you will find the `gdb` executable and the `rust-gdb` script, which [improves rendering of Rust types]. To make it easy to use for our OS, we add a `make gdb` target to our Makefile:

[improves rendering of Rust types]: https://michaelwoerister.github.io/2015/03/27/rust-xxdb.html

```make
gdb:
	@rust-os-gdb/bin/rust-gdb "build/kernel-x86_64.bin" -ex "target remote :1234"
```

It loads the debug information from our kernel binary and connects to the `localhost:1234` port, on which QEMU listens by default.

### Using GDB
After connecting to QEMU, you can use various gdb commands to control execution and examine data. All commands can be abbreviated as long as they are still unique.
For example, you can write `c` or `cont` instead of `continue`. The most important commands are:

- `help` or `h`: Show the help.
- `break` or `b`: Set a breakpoint. It is possible to break on functions such as `rust_main` or on source lines such as `lib.rs:42`. You can use tab for autocompletion and omit parts of the path as long as it's still unique. To modify breakpoints, you can use `disable`, `enable`, and `delete` plus the breakpoint number.
- `continue` or `c`: Continue execution until a breakpoint is reached.
- `next` or `n`: Step over the current line and break on the next line of the function. Sometimes this doesn't work in Rust OSes.
- `step` or `s`: Step into the current line, i.e. jump to the called function. Sometimes this doesn't work in Rust OSes.
- `list` or `l`: Show the source code around the current position.
- `print` or `p`: Print the value of a variable. You can use C's `*` and `&` operators. To print in hexadecimal, use `p/x`.
- `tui enable`: Enable the text user interface, which provides a graphical interface (see below). To disable it again, run `tui disable`.

![gdb text user interface](gdb-tui-screenshot.png)

Of course there are many more commands. Feel free to send a PR if you think this list is missing something important. For a more complete GDB overview, check out [Beej's Quick Guide][bggdb] or the [website for Harvard's CS161 course][CS161].

[bggdb]: https://beej.us/guide/bggdb/
[CS161]: https://www.eecs.harvard.edu/~cs161/resources/gdb.html

================================================
FILE: blog/content/edition-1/extra/talks.md
================================================
+++
title = "Talks"
path = "talks"
template = "plain.html"
weight = 1
+++

## 2018

- “The Rust Way Of OS Development” at HTWG Konstanz, May 30, 2018: [slides](https://phil-opp.github.io/talk-konstanz-may-2018/) [pdf](https://phil-opp.github.io/talk-konstanz-may-2018/talk.pdf)

## 2017

- “Open Source OS Development in Rust” at HTWG Konstanz, May 22, 2017: [slides](https://phil-opp.github.io/talk-konstanz-may-2017/)

================================================
FILE: blog/content/edition-1/posts/01-multiboot-kernel/index.md
================================================
+++
title = "A minimal Multiboot Kernel"
weight = 1
path = "multiboot-kernel"
aliases = ["multiboot-kernel.html", "/2015/08/18/multiboot-kernel/", "/rust-os/multiboot-kernel.html"]
date = 2015-08-18
template = "edition-1/page.html"
+++

This post explains how to create a minimal x86 operating system kernel using the Multiboot standard. In fact, it will just boot and print `OK` to the screen. In subsequent blog posts we will extend it using the [Rust] programming language.

[Rust]: https://www.rust-lang.org/

I tried to explain everything in detail and to keep the code as simple as possible. If you have any questions, suggestions or other issues, please leave a comment or [create an issue] on GitHub. The source code is available in a [repository][source code], too.

[create an issue]: https://github.com/phil-opp/blog_os/issues
[source code]: https://github.com/phil-opp/blog_os/tree/first_edition_post_1/src/arch/x86_64

Note that this tutorial is written mainly for Linux. For some known problems on OS X see the comment section and [this issue][mac os issue].
If you want to use a virtual Linux machine, you can find instructions and a Vagrantfile in Ashley Williams's [x86-kernel repository].

[mac os issue]: https://github.com/phil-opp/blog_os/issues/55
[x86-kernel repository]: https://github.com/ashleygwilliams/x86-kernel

## Overview
When you turn on a computer, it loads the [BIOS] from some special flash memory. The BIOS runs self-test and initialization routines for the hardware, then it looks for bootable devices. If it finds one, control is transferred to its _bootloader_, which is a small portion of executable code stored at the device's beginning. The bootloader has to determine the location of the kernel image on the device and load it into memory. It also needs to switch the CPU to the so-called [protected mode] because x86 CPUs start in the very limited [real mode] by default (to be compatible with programs from 1978).

[BIOS]: https://en.wikipedia.org/wiki/BIOS
[protected mode]: https://en.wikipedia.org/wiki/Protected_mode
[real mode]: https://wiki.osdev.org/Real_Mode

We won't write a bootloader because that would be a complex project on its own (if you really want to do it, check out [_Rolling Your Own Bootloader_]). Instead we will use one of the [many well-tested bootloaders][bootloader comparison] out there to boot our kernel from a CD-ROM. But which one?

[_Rolling Your Own Bootloader_]: https://wiki.osdev.org/Rolling_Your_Own_Bootloader
[bootloader comparison]: https://en.wikipedia.org/wiki/Comparison_of_boot_loaders

## Multiboot
Fortunately there is a bootloader standard: the [Multiboot Specification][multiboot]. Our kernel just needs to indicate that it supports Multiboot and every Multiboot-compliant bootloader can boot it. We will use the Multiboot 2 specification ([PDF][Multiboot 2]) together with the well-known [GRUB 2] bootloader.

[multiboot]: https://en.wikipedia.org/wiki/Multiboot_Specification
[multiboot 2]: https://nongnu.askapache.com/grub/phcoder/multiboot.pdf
[grub 2]: https://wiki.osdev.org/GRUB_2

To indicate our Multiboot 2 support to the bootloader, our kernel must start with a _Multiboot Header_, which has the following format:

Field         | Type            | Value
------------- | --------------- | ----------------------------------------
magic number  | u32             | `0xE85250D6`
architecture  | u32             | `0` for i386, `4` for MIPS
header length | u32             | total header size, including tags
checksum      | u32             | `-(magic + architecture + header_length)`
tags          | variable        |
end tag       | (u16, u16, u32) | `(0, 0, 8)`

Converted to an x86 assembly file it looks like this (Intel syntax):

```nasm
section .multiboot_header
header_start:
    dd 0xe85250d6                ; magic number (multiboot 2)
    dd 0                         ; architecture 0 (protected mode i386)
    dd header_end - header_start ; header length
    ; checksum
    dd 0x100000000 - (0xe85250d6 + 0 + (header_end - header_start))

    ; insert optional multiboot tags here

    ; required end tag
    dw 0    ; type
    dw 0    ; flags
    dd 8    ; size
header_end:
```

If you don't know x86 assembly, here is a quick guide:

- the header will be written to a section named `.multiboot_header` (we need this later)
- `header_start` and `header_end` are _labels_ that mark a memory location; we use them to calculate the header length easily
- `dd` stands for `define double` (32bit) and `dw` stands for `define word` (16bit). They just output the specified 32bit/16bit constant.
- the additional `0x100000000` in the checksum calculation is a small hack[^fn-checksum_hack] to avoid an assembler warning

We can already _assemble_ this file (which I called `multiboot_header.asm`) using `nasm`.
It produces a flat binary by default, so the resulting file just contains our 24 bytes (in little endian if you work on an x86 machine):

```
> nasm multiboot_header.asm
> hexdump -x multiboot_header
0000000    50d6    e852    0000    0000    0018    0000    af12    17ad
0000010    0000    0000    0008    0000
0000018
```

## The Boot Code
To boot our kernel, we must add some code that the bootloader can call. Let's create a file named `boot.asm`:

```nasm
global start

section .text
bits 32
start:
    ; print `OK` to screen
    mov dword [0xb8000], 0x2f4b2f4f
    hlt
```

There are some new commands:

- `global` exports a label (makes it public). As `start` will be the entry point of our kernel, it needs to be public.
- the `.text` section is the default section for executable code
- `bits 32` specifies that the following lines are 32-bit instructions. It's needed because the CPU is still in [Protected mode] when GRUB starts our kernel. When we switch to [Long mode] in the [next post] we can use `bits 64` (64-bit instructions).
- the `mov dword` instruction moves the 32bit constant `0x2f4b2f4f` to the memory at address `0xb8000` (it prints `OK` to the screen, an explanation follows in the next posts)
- `hlt` is the halt instruction and causes the CPU to stop

Through assembling, viewing and disassembling we can see the CPU [Opcodes] in action:

[Opcodes]: https://en.wikipedia.org/wiki/Opcode

```
> nasm boot.asm
> hexdump -x boot
0000000    05c7    8000    000b    2f4b    2f4f    00f4
000000b
> ndisasm -b 32 boot
00000000  C70500800B004B2F  mov dword [dword 0xb8000],0x2f4b2f4f
         -4F2F
0000000A  F4                hlt
```

## Building the Executable
To boot our executable later through GRUB, it should be an [ELF] executable. So we want `nasm` to create ELF [object files] instead of plain binaries. To do that, we simply pass the `‑f elf64` argument to it.

[ELF]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
[object files]: https://wiki.osdev.org/Object_Files

To create the ELF _executable_, we need to [link] the object files together.
We use a custom [linker script] named `linker.ld`:

[link]: https://en.wikipedia.org/wiki/Linker_(computing)
[linker script]: https://sourceware.org/binutils/docs/ld/Scripts.html

```ld
ENTRY(start)

SECTIONS {
    . = 1M;

    .boot :
    {
        /* ensure that the multiboot header is at the beginning */
        *(.multiboot_header)
    }

    .text :
    {
        *(.text)
    }
}
```

Let's translate it:

- `start` is the entry point; the bootloader will jump to it after loading the kernel
- `. = 1M;` sets the load address of the first section to 1 MiB, which is a conventional place to load a kernel[^Linker 1M]
- the executable will have two sections: `.boot` at the beginning and `.text` afterwards
- the `.text` output section contains all input sections named `.text`
- Sections named `.multiboot_header` are added to the first output section (`.boot`) to ensure they are at the beginning of the executable. This is necessary because GRUB expects to find the Multiboot header very early in the file.

So let's create the ELF object files and link them using our new linker script:

```
> nasm -f elf64 multiboot_header.asm
> nasm -f elf64 boot.asm
> ld -n -o kernel.bin -T linker.ld multiboot_header.o boot.o
```

It's important to pass the `-n` (or `--nmagic`) flag to the linker, which disables the automatic section alignment in the executable. Otherwise the linker may page align the `.boot` section in the executable file. If that happens, GRUB isn't able to find the Multiboot header because it isn't at the beginning anymore.
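
To see why the header has to sit near the start of the file, here is a minimal Rust sketch of the kind of search a Multiboot 2 bootloader performs. It is not GRUB's actual code. It assumes the rules from the Multiboot 2 specification (the header lies completely within the first 32768 bytes of the image and is 8-byte aligned), and the names `find_multiboot2_header`, `read_u32_le` and `minimal_header` are made up for this illustration:

```rust
/// Magic number that starts every Multiboot 2 header.
const MULTIBOOT2_MAGIC: u32 = 0xE852_50D6;
/// Assumption taken from the Multiboot 2 specification: the header must lie
/// completely within the first 32768 bytes of the image.
const SEARCH_LIMIT: usize = 32768;

/// Reads a little-endian `u32` at `offset`, or `None` if it is out of bounds.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let chunk = bytes.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

/// Hypothetical helper: returns the file offset of the first valid header.
fn find_multiboot2_header(image: &[u8]) -> Option<usize> {
    let limit = image.len().min(SEARCH_LIMIT);
    // The header is 8-byte aligned, so only those offsets are candidates.
    (0..limit).step_by(8).find(|&offset| {
        let magic = read_u32_le(image, offset);
        let arch = read_u32_le(image, offset + 4);
        let length = read_u32_le(image, offset + 8);
        let checksum = read_u32_le(image, offset + 12);
        match (magic, arch, length, checksum) {
            (Some(magic), Some(arch), Some(length), Some(checksum)) => {
                // The four fields must add up to zero modulo 2^32.
                let sum = magic
                    .wrapping_add(arch)
                    .wrapping_add(length)
                    .wrapping_add(checksum);
                magic == MULTIBOOT2_MAGIC && sum == 0 && offset + length as usize <= limit
            }
            _ => false,
        }
    })
}

/// Builds the 24-byte header from `multiboot_header.asm` (no optional tags).
fn minimal_header() -> Vec<u8> {
    let length: u32 = 24;
    let checksum = MULTIBOOT2_MAGIC.wrapping_add(length).wrapping_neg();
    let mut header = Vec::new();
    for field in [MULTIBOOT2_MAGIC, 0, length, checksum] {
        header.extend_from_slice(&field.to_le_bytes());
    }
    header.extend_from_slice(&0u16.to_le_bytes()); // end tag: type
    header.extend_from_slice(&0u16.to_le_bytes()); // end tag: flags
    header.extend_from_slice(&8u32.to_le_bytes()); // end tag: size
    header
}
```

With a header placed at file offset `0x80`, the search succeeds. If the header is pushed out to an offset such as `0x9000`, it lies beyond the search window and the function returns `None`, which is the situation the `-n` flag is meant to prevent.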

We can use `objdump` to print the sections of the generated executable and verify that the `.boot` section has a low file offset:

```
> objdump -h kernel.bin
kernel.bin:     file format elf64-x86-64

Sections:
Idx Name      Size      VMA               LMA               File off  Algn
  0 .boot     00000018  0000000000100000  0000000000100000  00000080  2**0
              CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text     0000000b  0000000000100020  0000000000100020  000000a0  2**4
              CONTENTS, ALLOC, LOAD, READONLY, CODE
```

_Note_: The `ld` and `objdump` commands are platform specific. If you're _not_ working on an x86_64 architecture, you will need to [cross compile binutils]. Then use `x86_64‑elf‑ld` and `x86_64‑elf‑objdump` instead of `ld` and `objdump`.

[cross compile binutils]: @/edition-1/extra/cross-compile-binutils.md

## Creating the ISO
All PC BIOSes know how to boot from a CD-ROM, so we want to create a bootable CD-ROM image, containing our kernel and the GRUB bootloader's files, in a single file called an [ISO](https://en.wikipedia.org/wiki/ISO_image). Make the following directory structure and copy the `kernel.bin` to the right place:

```
isofiles
└── boot
    ├── grub
    │   └── grub.cfg
    └── kernel.bin
```

The `grub.cfg` specifies the file name of our kernel and its Multiboot 2 compliance. It looks like this:

```
set timeout=0
set default=0

menuentry "my os" {
    multiboot2 /boot/kernel.bin
    boot
}
```

Now we can create a bootable image using the command:

```
grub-mkrescue -o os.iso isofiles
```

_Note_: `grub-mkrescue` causes problems on some platforms. If it does not work for you, try the following steps:

- try to run it with `--verbose`
- make sure `xorriso` is installed (`xorriso` or `libisoburn` package)
- If you're using an EFI system, `grub-mkrescue` tries to create an EFI image by default. You can either pass `-d /usr/lib/grub/i386-pc` to avoid EFI or install the `mtools` package to get a working EFI image
- on some systems the command is named `grub2-mkrescue`

## Booting
Now it's time to boot our OS.
We will use [QEMU]:

[QEMU]: https://en.wikipedia.org/wiki/QEMU

```
qemu-system-x86_64 -cdrom os.iso
```

![qemu output](qemu-ok.png)

Notice the green `OK` in the upper left corner. If it does not work for you, take a look at the comment section.

Let's summarize what happens:

1. the BIOS loads the bootloader (GRUB) from the virtual CD-ROM (the ISO)
2. the bootloader reads the kernel executable and finds the Multiboot header
3. it copies the `.boot` and `.text` sections to memory (to addresses `0x100000` and `0x100020`)
4. it jumps to the entry point (`0x100020`, you can obtain it through `objdump -f`)
5. our kernel prints the green `OK` and stops the CPU

You can test it on real hardware, too. Just burn the ISO to a disk or USB stick and boot from it.

## Build Automation
Right now we need to execute four commands in the right order every time we change a file. That's bad. So let's automate the build using a `Makefile`. But first we should create a clean directory structure for our source files to separate the architecture-specific files:

```
…
├── Makefile
└── src
    └── arch
        └── x86_64
            ├── multiboot_header.asm
            ├── boot.asm
            ├── linker.ld
            └── grub.cfg
```

The Makefile looks like this (indented with tabs instead of spaces):

```Makefile
arch ?= x86_64
kernel := build/kernel-$(arch).bin
iso := build/os-$(arch).iso

linker_script := src/arch/$(arch)/linker.ld
grub_cfg := src/arch/$(arch)/grub.cfg
assembly_source_files := $(wildcard src/arch/$(arch)/*.asm)
assembly_object_files := $(patsubst src/arch/$(arch)/%.asm, \
	build/arch/$(arch)/%.o, $(assembly_source_files))

.PHONY: all clean run iso

all: $(kernel)

clean:
	@rm -r build

run: $(iso)
	@qemu-system-x86_64 -cdrom $(iso)

iso: $(iso)

$(iso): $(kernel) $(grub_cfg)
	@mkdir -p build/isofiles/boot/grub
	@cp $(kernel) build/isofiles/boot/kernel.bin
	@cp $(grub_cfg) build/isofiles/boot/grub
	@grub-mkrescue -o $(iso) build/isofiles 2> /dev/null
	@rm -r build/isofiles

$(kernel): $(assembly_object_files) $(linker_script)
	@ld -n -T $(linker_script) -o $(kernel) $(assembly_object_files)

# compile assembly files
build/arch/$(arch)/%.o: src/arch/$(arch)/%.asm
	@mkdir -p $(shell dirname $@)
	@nasm -felf64 $< -o $@
```

Some comments (see the [Makefile tutorial] if you don't know `make`):

- the `$(wildcard src/arch/$(arch)/*.asm)` chooses all assembly files in the `src/arch/$(arch)` directory, so you don't have to update the Makefile when you add a file
- the `patsubst` operation for `assembly_object_files` just translates `src/arch/$(arch)/XYZ.asm` to `build/arch/$(arch)/XYZ.o`
- the `$<` and `$@` in the assembly target are [automatic variables]
- if you're using [cross-compiled binutils][cross compile binutils] just replace `ld` with `x86_64‑elf‑ld`

[automatic variables]: https://www.gnu.org/software/make/manual/html_node/Automatic-Variables.html

Now we can invoke `make` and all updated assembly files are compiled and linked. The `make iso` command also creates the ISO image and `make run` will additionally start QEMU.

## What's next?
In the [next post] we will create a page table and do some CPU configuration to switch to the 64-bit [long mode].

[next post]: @/edition-1/posts/02-entering-longmode/index.md
[long mode]: https://en.wikipedia.org/wiki/Long_mode

## Footnotes
[^fn-checksum_hack]: The formula from the table, `-(magic + architecture + header_length)`, creates a negative value that doesn't fit into 32bit. By subtracting from `0x100000000` (= 2^(32)) instead, we keep the value positive without changing its truncated value. Without the additional sign bit(s) the result fits into 32bit and the assembler is happy :).

[^Linker 1M]: We don't want to load the kernel to e.g. `0x0` because there are many special memory areas below the 1MB mark (for example the so-called VGA buffer at `0xb8000`, that we use to print `OK` to the screen).
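
The arithmetic behind the checksum footnote can be checked with a few lines of Rust. This is only a sketch for illustration: the function names are invented here, and `HEADER_LENGTH` assumes the 24-byte header without optional tags.

```rust
const MAGIC: u32 = 0xE852_50D6;
const ARCHITECTURE: u32 = 0;
/// Assumption: the minimal header without optional tags is 24 bytes long.
const HEADER_LENGTH: u32 = 24;

/// Checksum as given in the table: `-(magic + architecture + header_length)`,
/// computed modulo 2^32 with wrapping arithmetic.
fn checksum_from_table() -> u32 {
    MAGIC
        .wrapping_add(ARCHITECTURE)
        .wrapping_add(HEADER_LENGTH)
        .wrapping_neg()
}

/// Checksum as written in the assembly file: subtract the sum from 2^32 in a
/// wider integer type, then keep only the low 32 bits.
fn checksum_from_assembly() -> u32 {
    let sum = u64::from(MAGIC) + u64::from(ARCHITECTURE) + u64::from(HEADER_LENGTH);
    (0x1_0000_0000u64 - sum) as u32
}
```

Both functions return `0x17ADAF12`, the value that appears (in little endian) as `af12 17ad` in the `hexdump` output of the header, and adding it to the other three fields wraps around to zero.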

================================================
FILE: blog/content/edition-1/posts/02-entering-longmode/index.md
================================================
+++
title = "Entering Long Mode"
weight = 2
path = "entering-longmode"
aliases = ["entering-longmode.html", "/2015/08/25/entering-longmode/", "/rust-os/entering-longmode.html"]
date = 2015-08-25
template = "edition-1/page.html"
[extra]
updated = "2015-10-29"
+++

In the [previous post] we created a minimal multiboot kernel. It just prints `OK` and hangs. The goal is to extend it and call 64-bit [Rust] code. But the CPU is currently in [protected mode] and allows only 32-bit instructions and up to 4GiB memory. So we need to set up _Paging_ and switch to the 64-bit [long mode] first.

[previous post]: @/edition-1/posts/01-multiboot-kernel/index.md
[Rust]: https://www.rust-lang.org/
[protected mode]: https://en.wikipedia.org/wiki/Protected_mode
[long mode]: https://en.wikipedia.org/wiki/Long_mode

I tried to explain everything in detail and to keep the code as simple as possible. If you have any questions, suggestions, or issues, please leave a comment or [create an issue] on GitHub. The source code is available in a [repository][source code], too.

[create an issue]: https://github.com/phil-opp/blog_os/issues
[source code]: https://github.com/phil-opp/blog_os/tree/first_edition_post_2/src/arch/x86_64

## Some Tests
To avoid bugs and strange errors on old CPUs we should check if the processor supports every needed feature. If not, the kernel should abort and display an error message. To handle errors easily, we create an error procedure in `boot.asm`. It prints a rudimentary `ERR: X` message, where X is an error code letter, and hangs:

```nasm
; Prints `ERR: ` and the given error code to screen and hangs.
; parameter: error code (in ascii) in al
error:
    mov dword [0xb8000], 0x4f524f45
    mov dword [0xb8004], 0x4f3a4f52
    mov dword [0xb8008], 0x4f204f20
    mov byte  [0xb800a], al
    hlt
```

At address `0xb8000` begins the so-called [VGA text buffer]. It's an array of screen characters that are displayed by the graphics card. A [future post] will cover the VGA buffer in detail and create a Rust interface to it. But for now, manual bit-fiddling is the easiest option.

[VGA text buffer]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode
[future post]: @/edition-1/posts/04-printing-to-screen/index.md

A screen character consists of an 8-bit color code and an 8-bit [ASCII] character. We used the color code `4f` for all characters, which means white text on red background. `0x52` is an ASCII `R`, `0x45` is an `E`, `0x3a` is a `:`, and `0x20` is a space. The second space is overwritten by the given ASCII byte. Finally the CPU is stopped with the `hlt` instruction.

[ASCII]: https://en.wikipedia.org/wiki/ASCII

Now we can add some check _functions_. A function is just a normal label with a `ret` (return) instruction at the end. The `call` instruction can be used to call it. Unlike the `jmp` instruction that just jumps to a memory address, the `call` instruction will push a return address to the stack (and the `ret` will jump to this address). But we don't have a stack yet. The [stack pointer] in the esp register could point to some important data or even invalid memory. So we need to update it and point it to some valid stack memory.

[stack pointer]: https://stackoverflow.com/a/1464052/866447

### Creating a Stack
To create stack memory we reserve some bytes at the end of our `boot.asm`:

```nasm
...
section .bss
stack_bottom:
    resb 64
stack_top:
```

A stack doesn't need to be initialized because we will `pop` only after we have `push`ed before. So storing the stack memory in the executable file would make it unnecessarily large.
By using the [.bss] section and the `resb` (reserve byte) command, we just store the length of the uninitialized data (= 64). When loading the executable, GRUB will create the section of required size in memory.

[.bss]: https://en.wikipedia.org/wiki/.bss

To use the new stack, we update the stack pointer register right after `start`:

```nasm
global start

section .text
bits 32
start:
    mov esp, stack_top

    ; print `OK` to screen
    ...
```

We use `stack_top` because the stack grows downwards: A `push eax` subtracts 4 from `esp` and does a `mov [esp], eax` afterwards (`eax` is a general purpose register).

Now we have a valid stack pointer and are able to call functions. The following check functions are just here for completeness and I won't explain details. Basically they all work the same: They will check for a feature and jump to `error` if it's not available.

### Multiboot check
We rely on some Multiboot features in the next posts. To make sure the kernel was really loaded by a Multiboot-compliant bootloader, we can check the `eax` register. According to the Multiboot specification ([PDF][Multiboot specification]), the bootloader must write the magic value `0x36d76289` to it before loading a kernel. To verify that, we can add a simple function:

```nasm
check_multiboot:
    cmp eax, 0x36d76289
    jne .no_multiboot
    ret
.no_multiboot:
    mov al, "0"
    jmp error
```

We use the `cmp` instruction to compare the value in `eax` to the magic value. If the values are equal, the `cmp` instruction sets the zero flag in the [FLAGS register]. The `jne` (“jump if not equal”) instruction reads this zero flag and jumps to the given address if it's not set. Thus we jump to the `.no_multiboot` label if `eax` does not contain the magic value.

In `.no_multiboot`, we use the `jmp` (“jump”) instruction to jump to our error function. We could just as well use the `call` instruction, which additionally pushes the return address. But the return address is not needed because `error` never returns.
To pass `0` as error code to the `error` function, we move it into `al` before the jump (`error` will read it from there).

[Multiboot specification]: https://nongnu.askapache.com/grub/phcoder/multiboot.pdf
[FLAGS register]: https://en.wikipedia.org/wiki/FLAGS_register

### CPUID check

[CPUID] is a CPU instruction that can be used to get various information about the CPU. But not every processor supports it. CPUID detection is quite laborious, so we just copy a detection function from the [OSDev wiki][CPUID detection]:

[CPUID]: https://wiki.osdev.org/CPUID
[CPUID detection]: https://wiki.osdev.org/Setting_Up_Long_Mode#Detection_of_CPUID

```nasm
check_cpuid:
    ; Check if CPUID is supported by attempting to flip the ID bit (bit 21)
    ; in the FLAGS register. If we can flip it, CPUID is available.

    ; Copy FLAGS in to EAX via stack
    pushfd
    pop eax

    ; Copy to ECX as well for comparing later on
    mov ecx, eax

    ; Flip the ID bit
    xor eax, 1 << 21

    ; Copy EAX to FLAGS via the stack
    push eax
    popfd

    ; Copy FLAGS back to EAX (with the flipped bit if CPUID is supported)
    pushfd
    pop eax

    ; Restore FLAGS from the old version stored in ECX (i.e. flipping the
    ; ID bit back if it was ever flipped).
    push ecx
    popfd

    ; Compare EAX and ECX. If they are equal then that means the bit
    ; wasn't flipped, and CPUID isn't supported.
    cmp eax, ecx
    je .no_cpuid
    ret
.no_cpuid:
    mov al, "1"
    jmp error
```

Basically, the `CPUID` instruction is supported if we can flip some bit in the [FLAGS register]. We can't operate on the flags register directly, so we need to load it into some general purpose register such as `eax` first. The only way to do this is to push the `FLAGS` register on the stack through the `pushfd` instruction and then pop it into `eax`. Equally, we write it back through `push ecx` and `popfd`. To flip the bit we use the `xor` instruction to perform an [exclusive OR]. Finally we compare the two values and jump to `.no_cpuid` if both are equal (`je` – “jump if equal”).
The `.no_cpuid` code just jumps to the `error` function with error code `1`.

Don't worry, you don't need to understand the details.

[exclusive OR]: https://en.wikipedia.org/wiki/Exclusive_or

### Long Mode check

Now we can use CPUID to detect whether long mode can be used. I use code from [OSDev][long mode detection] again:

[long mode detection]: https://wiki.osdev.org/Setting_Up_Long_Mode#Checking_for_long_mode_support

```nasm
check_long_mode:
    ; test if extended processor info is available
    mov eax, 0x80000000    ; implicit argument for cpuid
    cpuid                  ; get highest supported argument
    cmp eax, 0x80000001    ; it needs to be at least 0x80000001
    jb .no_long_mode       ; if it's less, the CPU is too old for long mode

    ; use extended info to test if long mode is available
    mov eax, 0x80000001    ; argument for extended processor info
    cpuid                  ; returns various feature bits in ecx and edx
    test edx, 1 << 29      ; test if the LM-bit is set in the D-register
    jz .no_long_mode       ; If it's not set, there is no long mode
    ret
.no_long_mode:
    mov al, "2"
    jmp error
```

Like many low-level things, CPUID is a bit strange. Instead of taking a parameter, the `cpuid` instruction implicitly uses the `eax` register as argument. To test if long mode is available, we need to call `cpuid` with `0x80000001` in `eax`. This loads some information to the `ecx` and `edx` registers. Long mode is supported if bit 29 of `edx` is set. [Wikipedia][cpuid long mode] has detailed information.

[cpuid long mode]: https://en.wikipedia.org/wiki/CPUID#EAX=8000'0001h:_Extended_Processor_Info_and_Feature_Bits

If you look at the assembly above, you'll probably notice that we call `cpuid` twice. The reason is that the CPUID instruction started with only a few functions and was extended over time. So old processors may not know the `0x80000001` argument at all. To test if they do, we need to invoke `cpuid` with `0x80000000` in `eax` first. It returns the highest supported parameter value in `eax`.
If it's at least `0x80000001`, we can test for long mode as described above. Else the CPU is old and doesn't know what long mode is either. In that case, we directly jump to `.no_long_mode` through the `jb` instruction (“jump if below”).

### Putting it together

We just call these check functions right after start:

```nasm
global start

section .text
bits 32
start:
    mov esp, stack_top

    call check_multiboot
    call check_cpuid
    call check_long_mode

    ; print `OK` to screen
    ...
```

When the CPU doesn't support a needed feature, we get an error message with a unique error code. Now we can start the real work.

## Paging

_Paging_ is a memory management scheme that separates virtual and physical memory. The address space is split into equal sized _pages_ and a _page table_ specifies which virtual page points to which physical page. If you've never heard of paging, you might want to look at the paging introduction ([PDF][paging chapter]) of the [Three Easy Pieces] OS book.

[paging chapter]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-paging.pdf
[Three Easy Pieces]: http://pages.cs.wisc.edu/~remzi/OSTEP/

In long mode, x86 uses a page size of 4096 bytes and a 4 level page table that consists of:

- the Page-Map Level-4 Table (PML4),
- the Page-Directory Pointer Table (PDP),
- the Page-Directory Table (PD),
- and the Page Table (PT).

As I don't like these names, I will call them P4, P3, P2, and P1 from now on.

Each page table contains 512 entries and one entry is 8 bytes, so they fit exactly in one page (`512*8 = 4096`). To translate a virtual address to a physical address the CPU[^hardware_lookup] will do the following[^virtual_physical_translation_source]:

![translation of virtual to physical addresses in 64 bit mode](X86_Paging_64bit.svg)

1. Get the address of the P4 table from the CR3 register
2. Use bits 39-47 (9 bits) as an index into P4 (`2^9 = 512 = number of entries`)
3. Use the following 9 bits as an index into P3
4. Use the following 9 bits as an index into P2
5. Use the following 9 bits as an index into P1
6. Use the last 12 bits as page offset (`2^12 = 4096 = page size`)

But what happens to bits 48-63 of the 64-bit virtual address? Well, they can't be used. The “64-bit” long mode is in fact just a 48-bit mode. The bits 48-63 must be copies of bit 47, so each valid virtual address is still unique. For more information see [Wikipedia][wikipedia_48bit_mode].

[wikipedia_48bit_mode]: https://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details

An entry in the P4, P3, P2, and P1 tables consists of the page aligned 52-bit _physical_ address of the frame or the next page table and the following bits that can be OR-ed in:

| Bit(s) | Name                  | Meaning                                                                                      |
| ------ | --------------------- | -------------------------------------------------------------------------------------------- |
| 0      | present               | the page is currently in memory                                                              |
| 1      | writable              | it's allowed to write to this page                                                           |
| 2      | user accessible       | if not set, only kernel mode code can access this page                                       |
| 3      | write through caching | writes go directly to memory                                                                 |
| 4      | disable cache         | no cache is used for this page                                                               |
| 5      | accessed              | the CPU sets this bit when this page is used                                                 |
| 6      | dirty                 | the CPU sets this bit when a write to this page occurs                                       |
| 7      | huge page/null        | must be 0 in P1 and P4, creates a 1GiB page in P3, creates a 2MiB page in P2                 |
| 8      | global                | page isn't flushed from caches on address space switch (PGE bit of CR4 register must be set) |
| 9-11   | available             | can be used freely by the OS                                                                 |
| 52-62  | available             | can be used freely by the OS                                                                 |
| 63     | no execute            | forbid executing code on this page (the NXE bit in the EFER register must be set)            |

### Set Up Identity Paging

When we switch to long mode, paging will be activated automatically. The CPU will then try to read the instruction at the following address, but this address is now a virtual address. So we need to do _identity mapping_, i.e. map a physical address to the same virtual address.
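
To make the index arithmetic from the translation steps above concrete, here is a minimal Rust sketch. It only models the bit slicing that the CPU performs in hardware; the names `Translation`, `split_virtual_address`, and `is_canonical` are made up for this illustration and are not part of the kernel we are building.

```rust
/// Indices into the four page tables plus the offset within the page.
/// (Hypothetical helper type, used only for this illustration.)
#[derive(Debug, PartialEq)]
struct Translation {
    p4: usize,
    p3: usize,
    p2: usize,
    p1: usize,
    offset: usize,
}

/// Splits a virtual address into the pieces used by the translation steps.
fn split_virtual_address(addr: u64) -> Translation {
    Translation {
        p4: ((addr >> 39) & 0o777) as usize, // bits 39-47
        p3: ((addr >> 30) & 0o777) as usize, // bits 30-38
        p2: ((addr >> 21) & 0o777) as usize, // bits 21-29
        p1: ((addr >> 12) & 0o777) as usize, // bits 12-20
        offset: (addr & 0xfff) as usize,     // bits 0-11
    }
}

/// Returns true if bits 48-63 are copies of bit 47.
fn is_canonical(addr: u64) -> bool {
    // Bit 47 and the 16 bits above it must be all zeros or all ones.
    let upper = addr >> 47;
    upper == 0 || upper == 0x1_ffff
}
```

For example, the VGA buffer address `0xb8000` splits into index 0 for P4, P3, and P2, index 184 for P1, and page offset 0. With identity mapping, the physical address that comes out of such a lookup is the same as the virtual address that went in.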
The `huge page` bit is now very useful to us. It creates a 2MiB (when used in P2) or even a 1GiB page (when used in P3). So we could map the first _gigabytes_ of the kernel with only one P4 and one P3 table by using 1GiB pages. Unfortunately 1GiB pages are a relatively new feature; for example, Intel introduced them in 2010 with the [Westmere architecture]. Therefore we will use 2MiB pages instead to make our kernel compatible with older computers, too.

[Westmere architecture]: https://en.wikipedia.org/wiki/Westmere_(microarchitecture)#Technology

To identity map the first gigabyte of our kernel with 512 2MiB pages, we need one P4, one P3, and one P2 table. Of course we will replace them with finer-grained tables later. But now that we're stuck with assembly, we choose the easiest way.

We can add these three tables at the beginning[^page_table_alignment] of the `.bss` section:

```nasm
...

section .bss
align 4096
p4_table:
    resb 4096
p3_table:
    resb 4096
p2_table:
    resb 4096
stack_bottom:
    resb 64
stack_top:
```

The `resb` command reserves the specified amount of bytes without initializing them, so the 12KiB don't need to be saved in the executable. The `align 4096` ensures that the page tables are page aligned.

When GRUB creates the `.bss` section in memory, it will initialize it to `0`. So the `p4_table` is already valid (it contains 512 non-present entries) but not very useful. To be able to map 2MiB pages, we need to link P4's first entry to the `p3_table` and P3's first entry to the `p2_table`:

```nasm
set_up_page_tables:
    ; map first P4 entry to P3 table
    mov eax, p3_table
    or eax, 0b11 ; present + writable
    mov [p4_table], eax

    ; map first P3 entry to P2 table
    mov eax, p2_table
    or eax, 0b11 ; present + writable
    mov [p3_table], eax

    ; TODO map each P2 entry to a huge 2MiB page
    ret
```

We just set the present and writable bits (`0b11` is a binary number) in the aligned P3 table address and move it to the first 4 bytes of the P4 table.
Then we do the same to link the first P3 entry to the `p2_table`.

Now we need to map P2's first entry to a huge page starting at 0, P2's second entry to a huge page starting at 2MiB, P2's third entry to a huge page starting at 4MiB, and so on. It's time for our first (and only) assembly loop:

```nasm
set_up_page_tables:
    ...
    ; map each P2 entry to a huge 2MiB page
    mov ecx, 0         ; counter variable

.map_p2_table:
    ; map ecx-th P2 entry to a huge page that starts at address 2MiB*ecx
    mov eax, 0x200000  ; 2MiB
    mul ecx            ; start address of ecx-th page
    or eax, 0b10000011 ; present + writable + huge
    mov [p2_table + ecx * 8], eax ; map ecx-th entry

    inc ecx            ; increase counter
    cmp ecx, 512       ; if counter == 512, the whole P2 table is mapped
    jne .map_p2_table  ; else map the next entry

    ret
```

Maybe I should first explain how an assembly loop works. We use the `ecx` register as a counter variable, just like `i` in a for loop. After mapping the `ecx-th` entry, we increase `ecx` by one and jump to `.map_p2_table` again if it's still smaller than 512.

To map a P2 entry we first calculate the start address of its page in `eax`: The `ecx-th` entry needs to be mapped to `ecx * 2MiB`. We use the `mul` operation for that, which multiplies `eax` with the given register and stores the result in `eax` (the upper 32 bits of the product go to `edx`, which stays zero here because the largest address is `511 * 2MiB`). Then we set the `present`, `writable`, and `huge page` bits and write it to the P2 entry. The address of the `ecx-th` entry in P2 is `p2_table + ecx * 8`, because each entry is 8 bytes large.

Now the first gigabyte (512 * 2MiB) of our kernel is identity mapped and thus accessible through the same physical and virtual addresses.

### Enable Paging

To enable paging and enter long mode, we need to do the following:

1. Write the address of the P4 table to the CR3 register (the CPU will look there, see the [paging section](#paging))
2. Enable PAE, since long mode is an extension of [Physical Address Extension] \(PAE)
3. Set the long mode bit in the EFER register
4. Enable Paging

[Physical Address Extension]: https://en.wikipedia.org/wiki/Physical_Address_Extension

The assembly function looks like this (some boring bit-moving to various registers):

```nasm
enable_paging:
    ; load P4 to cr3 register (cpu uses this to access the P4 table)
    mov eax, p4_table
    mov cr3, eax

    ; enable PAE-flag in cr4 (Physical Address Extension)
    mov eax, cr4
    or eax, 1 << 5
    mov cr4, eax

    ; set the long mode bit in the EFER MSR (model specific register)
    mov ecx, 0xC0000080
    rdmsr
    or eax, 1 << 8
    wrmsr

    ; enable paging in the cr0 register
    mov eax, cr0
    or eax, 1 << 31
    mov cr0, eax

    ret
```

The `or eax, 1 << X` is a common pattern. It sets the bit `X` in the `eax` register (`<<` is a left shift). Through `rdmsr` and `wrmsr` it's possible to read/write to the so-called model specific registers at address `ecx` (in this case `ecx` points to the EFER register).

Finally we need to call our new functions in `start`:

```nasm
...
start:
    mov esp, stack_top

    call check_multiboot
    call check_cpuid
    call check_long_mode

    call set_up_page_tables ; new
    call enable_paging     ; new

    ; print `OK` to screen
    mov dword [0xb8000], 0x2f4b2f4f
    hlt
...
```

To test it we execute `make run`. If the green OK is still printed, we have successfully enabled paging!

## The Global Descriptor Table

After enabling Paging, the processor is in long mode. So we can use 64-bit instructions now, right? Wrong. The processor is still in a 32-bit compatibility submode. To actually execute 64-bit code, we need to set up a new Global Descriptor Table.

The Global Descriptor Table (GDT) was used for _Segmentation_ in old operating systems. I won't explain Segmentation but the [Three Easy Pieces] OS book has a good introduction ([PDF][Segmentation chapter]) again.

[Segmentation chapter]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-segmentation.pdf

Today almost everyone uses Paging instead of Segmentation (and so do we). But on x86, a GDT is always required, even when you're not using Segmentation.
GRUB has set up a valid 32-bit GDT for us but now we need to switch to a long mode GDT.

A GDT always starts with a 0-entry and contains an arbitrary number of segment entries afterwards. A 64-bit entry has the following format:

| Bit(s) | Name            | Meaning                                                                                                           |
| ------ | --------------- | ----------------------------------------------------------------------------------------------------------------- |
| 0-41   | ignored         | ignored in 64-bit mode                                                                                            |
| 42     | conforming      | the current privilege level can be higher than the specified level for code segments (else it must match exactly) |
| 43     | executable      | if set, it's a code segment, else it's a data segment                                                             |
| 44     | descriptor type | should be 1 for code and data segments                                                                            |
| 45-46  | privilege       | the [ring level]: 0 for kernel, 3 for user                                                                        |
| 47     | present         | must be 1 for valid selectors                                                                                     |
| 48-52  | ignored         | ignored in 64-bit mode                                                                                            |
| 53     | 64-bit          | should be set for 64-bit code segments                                                                            |
| 54     | 32-bit          | must be 0 for 64-bit segments                                                                                     |
| 55-63  | ignored         | ignored in 64-bit mode                                                                                            |

[ring level]: https://wiki.osdev.org/Security#Rings

We need one code segment; a data segment is not necessary in 64-bit mode. Code segments have the following bits set: _descriptor type_, _present_, _executable_ and the _64-bit_ flag. Translated to assembly the long mode GDT looks like this:

```nasm
section .rodata
gdt64:
    dq 0 ; zero entry
    dq (1<<43) | (1<<44) | (1<<47) | (1<<53) ; code segment
```

We chose the `.rodata` section here because it's initialized read-only data. The `dq` command stands for `define quad` and outputs a 64-bit constant (similar to `dw` and `dd`). And the `(1<<43)` is a bit shift that sets bit 43.

### Loading the GDT

To load our new 64-bit GDT, we have to tell the CPU its address and length. We do this by passing the memory location of a special pointer structure to the `lgdt` (load GDT) instruction.
The pointer structure looks like this:

```nasm
gdt64:
    dq 0 ; zero entry
    dq (1<<43) | (1<<44) | (1<<47) | (1<<53) ; code segment
.pointer:
    dw $ - gdt64 - 1
    dq gdt64
```

The first 2 bytes specify the (GDT length - 1). The `$` is a special symbol that is replaced with the current address (it's equal to `.pointer` in our case). The following 8 bytes specify the GDT address. Labels that start with a dot (such as `.pointer`) are sub-labels of the last label without a dot. To access them, they must be prefixed with the parent label (e.g., `gdt64.pointer`).

Now we can load the GDT in `start`:

```nasm
start:
    ...
    call enable_paging

    ; load the 64-bit GDT
    lgdt [gdt64.pointer]

    ; print `OK` to screen
    ...
```

When you still see the green `OK`, everything went fine and the new GDT is loaded. But we still can't execute 64-bit code: The code selector register `cs` still has the value from the old GDT. To update it, we need to load it with the GDT offset (in bytes) of the desired segment. In our case the code segment starts at byte 8 of the GDT, but we don't want to hardcode that 8 (in case we modify our GDT later). Instead, we add a `.code` label to our GDT that calculates the offset directly from the GDT:

```nasm
section .rodata
gdt64:
    dq 0 ; zero entry
.code: equ $ - gdt64 ; new
    dq (1<<43) | (1<<44) | (1<<47) | (1<<53) ; code segment
.pointer:
    ...
```

We can't just use a normal label here, since we need the table _offset_. We calculate this offset using the current address `$` and set the label to this value using [equ]. Now we can use `gdt64.code` instead of 8 and this label will still work if we modify the GDT.

[equ]: https://www.nasm.us/doc/nasmdoc3.html#section-3.2.4

In order to finally enter the true 64-bit mode, we need to load `cs` with `gdt64.code`. But we can't do it through `mov`. The only way to reload the code selector is a _far jump_ or a _far return_. These instructions work like a normal jump/return but change the code selector.
We use a far jump to a long mode label:

```nasm
global start
extern long_mode_start
...
start:
    ...
    lgdt [gdt64.pointer]

    jmp gdt64.code:long_mode_start
...
```

The actual `long_mode_start` label is defined as `extern`, so it's part of another file. The `jmp gdt64.code:long_mode_start` is the mentioned far jump.

I put the 64-bit code into a new file to separate it from the 32-bit code, so that we can't call the (now invalid) 32-bit code accidentally. The new file (I named it `long_mode_init.asm`) looks like this:

```nasm
global long_mode_start

section .text
bits 64
long_mode_start:
    ; print `OKAY` to screen
    mov rax, 0x2f592f412f4b2f4f
    mov qword [0xb8000], rax
    hlt
```

You should see a green `OKAY` on the screen. Some notes on this last step:

- As the CPU expects 64-bit instructions now, we use `bits 64`
- We can now use the extended registers. Instead of the 32-bit `eax`, `ebx`, etc. we now have the 64-bit `rax`, `rbx`, …
- and we can write these 64-bit registers directly to memory using `mov qword` (quad word)

_Congratulations_! You have successfully wrestled through this CPU configuration and compatibility mode mess :).

#### One Last Thing

Above, we reloaded the code segment register `cs` with the new GDT offset. However, the data segment registers `ss`, `ds`, `es`, `fs`, and `gs` still contain the data segment offsets of the old GDT. This isn't necessarily bad, since they're ignored by almost all instructions in 64-bit mode. However, there are a few instructions that expect a valid data segment descriptor _or the null descriptor_ in those registers. An example is the [iretq] instruction that we'll need in the [_Returning from Exceptions_] post.

[iretq]: @/edition-1/extra/naked-exceptions/03-returning-from-exceptions/index.md#the-iretq-instruction
[_Returning from Exceptions_]: @/edition-1/extra/naked-exceptions/03-returning-from-exceptions/index.md

To avoid future problems, we reload all data segment registers with null:

```nasm
long_mode_start:
    ; load 0 into all data segment registers
    mov ax, 0
    mov ss, ax
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax

    ; print `OKAY` to screen
    ...
```

## What's next?

It's time to finally leave assembly behind and switch to [Rust]. Rust is a systems language without garbage collection that guarantees memory safety. Through a real type system and many abstractions it feels like a high-level language but can still be low-level enough for OS development. The [next post] describes the Rust setup.

[Rust]: https://www.rust-lang.org/
[next post]: @/edition-1/posts/03-set-up-rust/index.md

## Footnotes

[^hardware_lookup]: In the x86 architecture, the page tables are _hardware walked_, so the CPU will look at the table on its own when it needs a translation. Other architectures, for example MIPS, just throw an exception and let the OS translate the virtual address.

[^virtual_physical_translation_source]: Image source: [Wikipedia](https://commons.wikimedia.org/wiki/File:X86_Paging_64bit.svg), with modified font size, page table naming, and removed sign extended bits. The modified file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.

[^page_table_alignment]: Page tables need to be page-aligned as the bits 0-11 are used for flags. By putting these tables at the beginning of `.bss`, the linker can just page align the whole section and we don't have unused padding bytes in between.
================================================
FILE: blog/content/edition-1/posts/03-set-up-rust/index.md
================================================
+++
title = "Set Up Rust"
weight = 3
path = "set-up-rust"
aliases = ["set-up-rust.html", "setup-rust.html", "/2015/09/02/setup-rust/", "/rust-os/setup-rust.html"]
date = 2015-09-02
template = "edition-1/page.html"
[extra]
updated = "2017-04-12"
+++

In the previous posts we created a [minimal Multiboot kernel][multiboot post] and [switched to Long Mode][long mode post]. Now we can finally switch to [Rust] code. Rust is a high-level language without a runtime. It allows us to not link the standard library and write bare metal code. Unfortunately the setup is not quite hassle-free yet.

[multiboot post]: @/edition-1/posts/01-multiboot-kernel/index.md
[long mode post]: @/edition-1/posts/02-entering-longmode/index.md
[Rust]: https://www.rust-lang.org/

This blog post tries to set up Rust step-by-step and point out the different problems. If you have any questions, problems, or suggestions please [file an issue] or create a comment at the bottom. The code from this post is in a [GitHub repository], too.

[file an issue]: https://github.com/phil-opp/blog_os/issues
[GitHub repository]: https://github.com/phil-opp/blog_os/tree/first_edition_post_3

## Installing Rust

We need a nightly compiler, as we will use many unstable features. To manage Rust installations I highly recommend [rustup]. It allows you to install nightly, beta, and stable compilers side-by-side and makes it easy to update them. To use a nightly compiler for the current directory, you can run `rustup override add nightly`. Alternatively, you can add a file called `rust-toolchain` to the project's root directory:

```
nightly
```

[rustup]: https://www.rustup.rs/

## Creating a Cargo project

[Cargo] is Rust's excellent package manager. Normally you would call `cargo new` when you want to create a new project folder.
We can't use it because our folder already exists, so we need to do it manually. Fortunately we only need to add a cargo configuration file named `Cargo.toml`:

[Cargo]: https://doc.crates.io/guide.html

```toml
[package]
name = "blog_os"
version = "0.1.0"
authors = ["Philipp Oppermann "]

[lib]
crate-type = ["staticlib"]
```

The `package` section contains required project metadata such as the [semantic crate version]. The `lib` section specifies that we want to build a static library, i.e. a library that contains all of its dependencies. This is required to link the Rust project with our kernel.

[semantic crate version]: https://doc.rust-lang.org/cargo/reference/manifest.html#the-package-section

Now we place our root source file in `src/lib.rs`:

```rust
#![feature(lang_items)]
#![no_std]

#[no_mangle]
pub extern fn rust_main() {}

#[lang = "eh_personality"] #[no_mangle] pub extern fn eh_personality() {}
#[lang = "panic_fmt"] #[no_mangle] pub extern fn panic_fmt() -> ! {loop{}}
```

Let's break it down:

- `#!` defines an [attribute] of the current module. Since we are at the root module, the attributes apply to the crate itself.
- The `feature` attribute is used to allow the specified _feature-gated_ attributes in this crate. You can't do that in a stable/beta compiler, so this is one reason we need a Rust nightly.
- The `no_std` attribute prevents the automatic linking of the standard library. We can't use `std` because it relies on operating system features like files, system calls, and various device drivers. Remember that currently the only “feature” of our OS is printing `OKAY` :).
- A `#` without a `!` afterwards defines an attribute for the _following_ item (a function in our case).
- The `no_mangle` attribute disables the automatic [name mangling] that Rust uses to get unique function names. We want to do a `call rust_main` from our assembly code, so this function name must stay as it is.
- We mark our main function as `extern` to make it compatible with the standard C [calling convention].
- The `lang` attribute defines a Rust [language item].
- The `eh_personality` function is used for Rust's [unwinding] on `panic!`. We can leave it empty since we don't have any unwinding support in our OS yet.
- The `panic_fmt` function is the entry point on panic. Right now we can't do anything useful, so we just make sure that it doesn't return (required by the `!` return type).

[attribute]: https://doc.rust-lang.org/book/attributes.html
[name mangling]: https://en.wikipedia.org/wiki/Name_mangling
[calling convention]: https://en.wikipedia.org/wiki/Calling_convention
[language item]: https://doc.rust-lang.org/1.10.0/book/lang-items.html
[unwinding]: https://doc.rust-lang.org/nomicon/unwinding.html

## Building Rust

We can now build it using `cargo build`, which creates a static library at `target/debug/libblog_os.a`. However, the resulting library is specific to our _host_ operating system. This is undesirable, because our target system might be different.

Let's define some properties of our target system:

- **x86_64**: Our target CPU is a recent `x86_64` CPU.
- **No operating system**: Our target does not run any operating system (we're currently writing it), so the compiler should not assume any OS-specific functionality.
- **Handles hardware interrupts**: We're writing a kernel, so we'll need to handle asynchronous hardware interrupts at some point. This means that we have to disable a certain stack pointer optimization (the so-called [red zone]), because it would cause stack corruption otherwise.
- **No SSE**: Our target might not have [SSE] support. Even if it does, we probably don't want to use SSE instructions in our kernel, because it makes interrupt handling much slower. We will explain this in detail in the [“Handling Exceptions”] post.
- **No hardware floats**: The `x86_64` architecture uses SSE instructions for floating point operations, which we don't want to use (see the previous point). So we also need to avoid hardware floating point operations in our kernel. Instead, we will use _soft floats_, which are basically software functions that emulate floating point operations using normal integers.

[“Handling Exceptions”]: @/edition-1/posts/09-handling-exceptions/index.md

### Target Specifications

Rust allows us to define [custom targets] through a JSON configuration file. A minimal target specification equal to `x86_64-unknown-linux-gnu` (the default 64-bit Linux target) looks like this:

```json
{
  "llvm-target": "x86_64-unknown-linux-gnu",
  "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
  "linker-flavor": "gcc",
  "target-endian": "little",
  "target-pointer-width": "64",
  "target-c-int-width": "32",
  "arch": "x86_64",
  "os": "linux"
}
```

[custom targets]: https://doc.rust-lang.org/1.1.0/rustc_back/target/

The `llvm-target` field specifies the target triple that is passed to LLVM. [Target triples] are a naming convention that defines the CPU architecture (e.g., `x86_64` or `arm`), the vendor (e.g., `apple` or `unknown`), the operating system (e.g., `windows` or `linux`), and the [ABI] \(e.g., `gnu` or `msvc`). For example, the target triple for 64-bit Linux is `x86_64-unknown-linux-gnu` and for 32-bit Windows the target triple is `i686-pc-windows-msvc`.

[Target triples]: https://llvm.org/docs/LangRef.html#target-triple
[ABI]: https://en.wikipedia.org/wiki/Application_binary_interface

The `data-layout` field is also passed to LLVM and specifies how data should be laid out in memory. It consists of various specifications separated by a `-` character. For example, the `e` means little endian and `S128` specifies that the stack should be 128 bits (= 16 bytes) aligned.
The format is described in detail in the [LLVM documentation][data layout] but there shouldn't be a reason to change this string.

The `linker-flavor` field was recently introduced in [#40018] with the intention to add support for the LLVM linker [LLD], which is platform independent. In the future, this might allow easy cross compilation without the need to install a gcc cross compiler for linking.

[#40018]: https://github.com/rust-lang/rust/pull/40018
[LLD]: https://lld.llvm.org/

The other fields are used for conditional compilation. This allows crate authors to use `cfg` variables to write special code depending on the OS or the architecture. There isn't any up-to-date documentation about these fields but the [corresponding source code][target specification] is quite readable.

[data layout]: https://llvm.org/docs/LangRef.html#data-layout
[target specification]: https://github.com/rust-lang/rust/blob/c772948b687488a087356cb91432425662e034b9/src/librustc_back/target/mod.rs#L194-L214

### A Kernel Target Specification

For our target system, we define the following JSON configuration in a file named `x86_64-blog_os.json`:

```json
{
  "llvm-target": "x86_64-unknown-none",
  "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
  "linker-flavor": "gcc",
  "target-endian": "little",
  "target-pointer-width": "64",
  "target-c-int-width": "32",
  "arch": "x86_64",
  "os": "none",
  "disable-redzone": true,
  "features": "-mmx,-sse,+soft-float"
}
```

As `llvm-target` we use `x86_64-unknown-none`, which defines the `x86_64` architecture, an `unknown` vendor, and no operating system (`none`). The ABI doesn't matter for us, so we just leave it off. The `data-layout` field is just copied from the `x86_64-unknown-linux-gnu` target. We also use the same values for the `target-endian`, `target-pointer-width`, `target-c-int-width`, and `arch` fields. For the `os` field we choose `none`, since our kernel runs on bare metal.
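
As a small illustration of how the `arch` and `os` fields surface in Rust code, here is a hedged sketch using the standard `cfg!` macro and `#[cfg]` attribute. The functions `target_description` and `halt_forever` are hypothetical names invented for this example; they are not part of the kernel and nothing in the following sections depends on them.

```rust
/// Describes the compilation target based on `cfg` values such as
/// `target_arch` and `target_os`. (Hypothetical helper for illustration.)
fn target_description() -> &'static str {
    if cfg!(all(target_arch = "x86_64", target_os = "none")) {
        "bare-metal x86_64"
    } else if cfg!(target_os = "linux") {
        "linux"
    } else {
        "some other target"
    }
}

/// Only compiled when the target has no operating system.
/// (Hypothetical example of OS-specific code.)
#[cfg(target_os = "none")]
#[allow(dead_code)]
fn halt_forever() -> ! {
    loop {}
}
```

When built for a target whose specification says `"os": "none"` and `"arch": "x86_64"`, the first branch would be selected and `halt_forever` would be compiled in; on a Linux host, the second branch applies and `halt_forever` is left out entirely.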
#### The Red Zone

The [red zone] is an optimization of the [System V ABI] that allows functions to temporarily use the 128 bytes below their stack frame without adjusting the stack pointer:

[red zone]: https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64#the-red-zone
[System V ABI]: https://wiki.osdev.org/System_V_ABI

![stack frame with red zone](red-zone.svg)

The image shows the stack frame of a function with `n` local variables. On function entry, the stack pointer is adjusted to make room on the stack for the local variables.

The red zone is defined as the 128 bytes below the adjusted stack pointer. The function can use this area for temporary data that's not needed across function calls. Thus, the two instructions for adjusting the stack pointer can be avoided in some cases (e.g. in small leaf functions).

However, this optimization leads to huge problems with exceptions or hardware interrupts. Let's assume that an exception occurs while a function uses the red zone:

![red zone overwritten by exception handler](red-zone-overwrite.svg)

The CPU and the exception handler overwrite the data in the red zone. But this data is still needed by the interrupted function. So the function won't work correctly anymore when we return from the exception handler. This might lead to strange bugs that [take weeks to debug].

[take weeks to debug]: https://forum.osdev.org/viewtopic.php?t=21720

To avoid such bugs when we implement exception handling in the future, we disable the red zone right from the beginning. This is achieved by adding the `"disable-redzone": true` line to our target configuration file.

#### SIMD Extensions

The `features` field enables/disables target features. We disable the `mmx` and `sse` features by prefixing them with a minus and enable the `soft-float` feature by prefixing it with a plus. The `mmx` and `sse` features determine support for [Single Instruction Multiple Data (SIMD)] instructions, which simultaneously perform an operation (e.g.
addition) on multiple data words. The `x86` architecture supports the following standards:

[Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD

- [MMX]: The _Multi Media Extension_ instruction set was introduced in 1997 and defines eight 64 bit registers called `mm0` through `mm7`. These registers are just aliases for the registers of the [x87 floating point unit].
- [SSE]: The _Streaming SIMD Extensions_ instruction set was introduced in 1999. Instead of re-using the floating point registers, it adds a completely new register set. The sixteen new registers are called `xmm0` through `xmm15` and are 128 bits each.
- [AVX]: The _Advanced Vector Extensions_ are extensions that further increase the size of the multimedia registers. The new registers are called `ymm0` through `ymm15` and are 256 bits each. They extend the `xmm` registers, so e.g. `xmm0` is the lower half of `ymm0`.

[MMX]: https://en.wikipedia.org/wiki/MMX_(instruction_set)
[x87 floating point unit]: https://en.wikipedia.org/wiki/X87
[SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
[AVX]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions

By using such SIMD standards, programs can often be sped up significantly. Good compilers are able to transform normal loops into such SIMD code automatically through a process called [auto-vectorization].

[auto-vectorization]: https://en.wikipedia.org/wiki/Automatic_vectorization

However, the large SIMD registers lead to problems in OS kernels. The reason is that the kernel has to back up all registers that it uses on each hardware interrupt (we will look into this in the [“Handling Exceptions”] post). So if the kernel uses SIMD registers, it has to back up a lot more data, which noticeably decreases performance. To avoid this performance loss, we disable the `sse` and `mmx` features (the `avx` feature is disabled by default).
As noted above, floating point operations on `x86_64` use SSE registers, so floats are no longer usable without SSE. Unfortunately, the Rust core library already uses floats (e.g., it implements traits for `f32` and `f64`), so we need an alternative way to implement float operations. The `soft-float` feature solves this problem by emulating all floating point operations through software functions based on normal integers.

### Compiling
To build our kernel for our new target, we pass the configuration file's name as the `--target` argument. There is currently an [open bug][custom-target-bug] for custom target specifications, so you also need to set the `RUST_TARGET_PATH` environment variable to the current directory, otherwise Rust doesn't find your target. The full command is:

[custom-target-bug]: https://github.com/rust-lang/cargo/issues/4905

```
RUST_TARGET_PATH=$(pwd) cargo build --target x86_64-blog_os
```

However, the following error occurs:

```
error[E0463]: can't find crate for `core`
  |
  = note: the `x86_64-blog_os` target may not be installed
```

The error tells us that the Rust compiler no longer finds the core library. The [core library] is implicitly linked to all `no_std` crates and contains things such as `Result`, `Option`, and iterators.

[core library]: https://doc.rust-lang.org/nightly/core/index.html

The problem is that the core library is distributed together with the Rust compiler as a _precompiled_ library. So it is only valid for the host triple (e.g., `x86_64-unknown-linux-gnu`) but not for our custom target. If we want to compile code for other targets, we need to recompile `core` for these targets first.

#### Xargo
That's where [xargo] comes in. It is a wrapper for cargo that eases cross compilation. We can install it by executing:

[xargo]: https://github.com/japaric/xargo

```
cargo install xargo
```

Xargo depends on the rust source code, which we can install with `rustup component add rust-src`.
Xargo is “a drop-in replacement for cargo”, so every cargo command also works with `xargo`. You can do e.g. `xargo --help`, `xargo clean`, or `xargo doc`. However, the `build` command gains additional functionality: `xargo build` will automatically cross compile the `core` library when compiling for custom targets.

Let's try it:

```bash
> RUST_TARGET_PATH=$(pwd) xargo build --target=x86_64-blog_os
   Compiling core v0.0.0 (file:///…/rust/src/libcore)
    Finished release [optimized] target(s) in 22.87 secs
   Compiling blog_os v0.1.0 (file:///…/blog_os/tags)
    Finished dev [unoptimized + debuginfo] target(s) in 0.29 secs
```

It worked! We see that `xargo` cross-compiled the `core` library for our new custom target and then continued to compile our `blog_os` crate. After compilation, we can find a static library at `target/x86_64-blog_os/debug/libblog_os.a`, which can be linked with our assembly kernel.

## Integrating Rust
Let's try to integrate our Rust library into our assembly kernel so that we can call the `rust_main` function. For that we need to pass the `libblog_os.a` file to the linker, together with the assembly object files.

### Adjusting the Makefile
To build and link the rust library on `make`, we extend our `Makefile` ([full file][github makefile]):

```make
# ...
target ?= $(arch)-blog_os
rust_os := target/$(target)/debug/libblog_os.a
# ...
.PHONY: all clean run iso kernel
# ...
$(kernel): kernel $(rust_os) $(assembly_object_files) $(linker_script)
	@ld -n -T $(linker_script) -o $(kernel) \
		$(assembly_object_files) $(rust_os)

kernel:
	@RUST_TARGET_PATH=$(shell pwd) xargo build --target $(target)
```

We add a new `kernel` target that just executes `xargo build` and modify the `$(kernel)` target to link the created static lib. We also add the new `kernel` target to the `.PHONY` list, since it does not belong to a file with that name.

But now `xargo build` is executed on every `make`, even if no source file was changed.
And the ISO is recreated on every `make iso`/`make run`, too. We could try to avoid this by adding dependencies on all rust source and cargo configuration files to the `kernel` target, but the ISO creation takes only half a second on my machine and most of the time we will have changed a Rust file when we run `make`. So we keep it simple for now and let cargo do the bookkeeping of changed files (it does it anyway).

[github makefile]: https://github.com/phil-opp/blog_os/blob/first_edition_post_3/Makefile

### Calling Rust
Now we can call the main method in `long_mode_start`:

```nasm
bits 64
long_mode_start:
    ...

    ; call the rust main
    extern rust_main     ; new
    call rust_main       ; new

    ; print `OKAY` to screen
    mov rax, 0x2f592f412f4b2f4f
    mov qword [0xb8000], rax
    hlt
```

By defining `rust_main` as `extern` we tell nasm that the function is defined in another file. As the linker takes care of linking them together, we'll get a linker error if we have a typo in the name or forget to mark the rust function as `pub extern`.

If we've done everything right, we should still see the green `OKAY` when executing `make run`. That means that we successfully called the Rust function and returned to assembly.

### Fixing Linker Errors
Now we can try some Rust code:

```rust
pub extern fn rust_main() {
    let x = ["Hello", "World", "!"];
    let y = x;
}
```

When we test it using `make run`, it fails with `undefined reference to 'memcpy'`. The `memcpy` function is one of the basic functions of the C library (`libc`). Usually the `libc` crate is linked to every Rust program together with the standard library, but we opted out through `#![no_std]`. We could try to fix this by adding the [libc crate] as `extern crate`. But `libc` is just a wrapper for the system `libc`, for example `glibc` on Linux, so this won't work for us. Instead we need to recreate the basic `libc` functions such as `memcpy`, `memmove`, `memset`, and `memcmp` in Rust.
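To give an idea of what recreating such a function involves, here is a rough sketch of a byte-wise copy built on raw pointers. The name `sketch_memcpy` is hypothetical. A version that the linker could actually resolve would have to be exported under the C name `memcpy` (with `#[no_mangle]` and a C ABI), which this sketch deliberately avoids so that it doesn't clash with an existing `memcpy` on a hosted system.

```rust
/// Hypothetical stand-in for `memcpy`: copies `n` bytes from `src` to `dest`
/// one byte at a time and returns `dest`, like the C function does.
///
/// # Safety
/// Both pointers must be valid for `n` bytes and the regions must not overlap.
pub unsafe fn sketch_memcpy(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        // raw pointer arithmetic: read byte `i` of `src`, write byte `i` of `dest`
        unsafe { *dest.add(i) = *src.add(i) };
        i += 1;
    }
    dest
}
```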
#### rlibc
Fortunately there already is a crate for that: [rlibc]. When we look at its [source code][rlibc source] we see that it contains no magic, just some [raw pointer] operations in a while loop. To add `rlibc` as a dependency we just need to add two lines to the `Cargo.toml`:

[libc crate]: https://doc.rust-lang.org/1.10.0/libc/index.html

```toml
...
[dependencies]
rlibc = "1.0"
```

and an `extern crate` definition in our `src/lib.rs`:

```rust
...
extern crate rlibc;

#[no_mangle]
pub extern fn rust_main() {
...
```

Now `make run` doesn't complain about `memcpy` anymore. Instead it will show a pile of new ugly linker errors:

```
target/x86_64-blog_os/debug/libblog_os.a(core-92335f822fa6c9a6.0.o):
    In function `_$LT$f32$u20$as$u20$core..num..dec2flt..
        rawfp..RawFloat$GT$::from_int::h50f7952efac3fdca':
    core.cgu-0.rs:(.text._ZN59_$LT$f32$u20$as$u20$core..num..dec2flt..
        rawfp..RawFloat$GT$8from_int17h50f7952efac3fdcaE+0x2):
    undefined reference to `__floatundisf'
target/x86_64-blog_os/debug/libblog_os.a(core-92335f822fa6c9a6.0.o):
    In function `_$LT$f64$u20$as$u20$core..num..dec2flt..rawfp..
        RawFloat$GT$::from_int::h12a81f175246914a':
    core.cgu-0.rs:(.text._ZN59_$LT$f64$u20$as$u20$core..num..dec2flt..rawfp..
        RawFloat$GT$8from_int17h12a81f175246914aE+0x2):
    undefined reference to `__floatundidf'
target/x86_64-blog_os/debug/libblog_os.a(core-92335f822fa6c9a6.0.o):
    In function `core::num::from_str_radix::h09b12650704e0508':
    core.cgu-0.rs:(.text._ZN4core3num14from_str_radix
        17h09b12650704e0508E+0xcf):
    undefined reference to `__muloti4'
...
```

[rlibc]: https://crates.io/crates/rlibc
[rlibc source]: https://github.com/alexcrichton/rlibc/blob/defb486e765846417a8e73329e8c5196f1dca49a/src/lib.rs
[raw pointer]: https://doc.rust-lang.org/book/raw-pointers.html
[crates.io]: https://crates.io

#### --gc-sections
The new errors are linker errors about various missing functions such as `__floatundisf` or `__muloti4`.
These functions are part of LLVM's [`compiler-rt` builtins] and are normally linked by the standard library. For `no_std` crates like ours, one has to link the `compiler-rt` library manually. Unfortunately, this library is implemented in C and the build process is a bit cumbersome. Alternatively, there is the [compiler-builtins] crate that tries to port the library to Rust, but it isn't complete yet.

[`compiler-rt` builtins]: https://compiler-rt.llvm.org/
[compiler-builtins]: https://github.com/rust-lang-nursery/compiler-builtins

In our case, there is a much simpler solution, since our kernel doesn't really need any of those functions yet. So we can just tell the linker to remove unused program sections and hopefully all references to these functions will disappear. Removing unused sections is generally a good idea as it reduces kernel size. The magic linker flag for this is `--gc-sections`, which stands for “garbage collect sections”. Let's add it to the `$(kernel)` target in our `Makefile`:

```make
$(kernel): kernel $(rust_os) $(assembly_object_files) $(linker_script)
	@ld -n --gc-sections -T $(linker_script) -o $(kernel) \
		$(assembly_object_files) $(rust_os)
```

Now we can do a `make run` again and it links without errors. However, it doesn't boot anymore:

```
GRUB error: no multiboot header found.
```

What happened? Well, the linker removed unused sections. And since we don't use the Multiboot section anywhere, `ld` removes it, too. So we need to tell the linker explicitly that it should keep this section. The `KEEP` command does exactly that, so we add it to the linker script (`linker.ld`):

```
.boot :
{
    /* ensure that the multiboot header is at the beginning */
    KEEP(*(.multiboot_header))
}
```

Now everything should work again (the green `OKAY`). But there is another linking issue, which is triggered by some other example code.

#### panic = "abort"

The following snippet still fails:

```rust
    ...
    let test = (0..3).flat_map(|x| 0..x).zip(0..);
```

The error is a linker error again (hence the ugly error message):

```
target/x86_64-blog_os/debug/libblog_os.a(blog_os-b5a29f28b14f1f1f.0.o):
    In function `core::ptr::drop_in_place, core::ops::Range, closure>, core::ops::RangeFrom>>':
    /…/rust/src/libcore/ptr.rs:66:
    undefined reference to `_Unwind_Resume'
target/x86_64-blog_os/debug/libblog_os.a(blog_os-b5a29f28b14f1f1f.0.o):
    In function `core::iter::iterator::Iterator::zip, core::ops::Range, closure>, core::ops::RangeFrom>':
    /…/rust/src/libcore/iter/iterator.rs:389:
    undefined reference to `_Unwind_Resume'
...
```

So the linker can't find a function named `_Unwind_Resume` that is referenced e.g. in `iter/iterator.rs:389` in libcore. This reference is not really there at [line 389][iterator.rs:389] of libcore's `iterator.rs`. Instead, it is a compiler inserted _landing pad_, which is used for panic handling.

[iterator.rs:389]: https://github.com/rust-lang/rust/blob/c58c928e658d2e45f816fd05796a964aa83759da/src/libcore/iter/iterator.rs#L389

By default, the destructors of all stack variables are run when a `panic` occurs. This is called _unwinding_ and allows parent threads to recover from panics. However, it requires a platform specific gcc library, which isn't available in our kernel.

Fortunately, Rust allows us to disable unwinding for our target. For that we add the following line to our `x86_64-blog_os.json` file:

```json
{
  "...",
  "panic-strategy": "abort"
}
```

By setting the [panic strategy] to `abort` instead of the default `unwind`, we disable all unwinding in our kernel.
Let's try `make run` again:

[panic strategy]: https://github.com/nox/rust-rfcs/blob/master/text/1513-less-unwinding.md

```
   Compiling core v0.0.0 (file:///…/rust/src/libcore)
    Finished release [optimized] target(s) in 22.24 secs
    Finished dev [unoptimized + debuginfo] target(s) in 0.5 secs
target/x86_64-blog_os/debug/libblog_os.a(blog_os-b5a29f28b14f1f1f.0.o):
    In function `core::ptr::drop_in_place<…>':
    /…/src/libcore/ptr.rs:66:
    undefined reference to `_Unwind_Resume'
...
```

We see that `xargo` recompiles the `core` crate, but the `_Unwind_Resume` error still occurs. This is because our `blog_os` crate was not recompiled somehow and thus still references the unwinding function. To fix this, we need to force a recompile using `cargo clean`:

```
> cargo clean
> make run
   Compiling rlibc v1.0.0
   Compiling blog_os v0.1.0 (file:///home/philipp/Documents/blog_os/tags)
warning: unused variable: `test` […]

    Finished dev [unoptimized + debuginfo] target(s) in 0.60 secs
```

It worked! We no longer see linker errors and our kernel prints `OKAY` again.

## Hello World!
Finally, it's time for a `Hello World!` from Rust:

```rust
#[no_mangle]
pub extern fn rust_main() {
    // ATTENTION: we have a very small stack and no guard page

    let hello = b"Hello World!";
    let color_byte = 0x1f; // white foreground, blue background

    let mut hello_colored = [color_byte; 24];
    for (i, char_byte) in hello.into_iter().enumerate() {
        hello_colored[i*2] = *char_byte;
    }

    // write `Hello World!` to the center of the VGA text buffer
    let buffer_ptr = (0xb8000 + 1988) as *mut _;
    unsafe { *buffer_ptr = hello_colored };

    loop{}
}
```

Some notes:

- The `b` prefix creates a [byte string], which is just an array of `u8`
- [enumerate] is an `Iterator` method that adds the current index `i` to elements
- `buffer_ptr` is a [raw pointer] that points to the center of the VGA text buffer
- Rust doesn't know the VGA buffer and thus can't guarantee that writing to the `buffer_ptr` is safe (it could point to important data).
So we need to tell Rust that we know what we are doing by using an [unsafe block].

[byte string]: https://doc.rust-lang.org/reference/tokens.html#characters-and-strings
[enumerate]: https://doc.rust-lang.org/nightly/core/iter/trait.Iterator.html#method.enumerate
[unsafe block]: https://doc.rust-lang.org/book/unsafe.html

### Stack Overflows
Since we still use the small 64 byte [stack from the last post], we must be careful not to [overflow] it. Normally, Rust tries to avoid stack overflows through _guard pages_: The page below the stack isn't mapped, so a stack overflow triggers a page fault (instead of silently overwriting random memory). But we can't unmap the page below our stack right now since we currently use only a single big page. Fortunately the stack is located just above the page tables. So some important page table entry would probably get overwritten on stack overflow and then a page fault occurs, too.

[stack from the last post]: @/edition-1/posts/02-entering-longmode/index.md#creating-a-stack
[overflow]: https://en.wikipedia.org/wiki/Stack_overflow

## What's next?
Until now, we have written magic bits to some memory location whenever we wanted to print something to the screen. In the [next post] we create an abstraction for the VGA text buffer that allows us to print strings in different colors and provides a simple interface.

[next post]: @/edition-1/posts/04-printing-to-screen/index.md

================================================
FILE: blog/content/edition-1/posts/04-printing-to-screen/index.md
================================================
+++
title = "Printing to Screen"
weight = 4
path = "printing-to-screen"
aliases = ["printing-to-screen.html", "/2015/10/23/printing-to-screen/", "/rust-os/printing-to-screen.html"]
date = 2015-10-23
template = "edition-1/page.html"

[extra]
updated = "2016-10-31"
+++

In the [previous post] we switched from assembly to [Rust], a systems programming language that provides great safety.
But so far we are using unsafe features like [raw pointers] whenever we want to print to screen. In this post we will create a Rust module that provides a safe and easy-to-use interface for the VGA text buffer. It will support Rust's [formatting macros], too.

[previous post]: @/edition-1/posts/03-set-up-rust/index.md
[Rust]: https://www.rust-lang.org/
[raw pointers]: https://doc.rust-lang.org/book/raw-pointers.html
[formatting macros]: https://doc.rust-lang.org/std/fmt/#related-macros

This post uses recent unstable features, so you need an up-to-date nightly compiler. If you have any questions, problems, or suggestions please [file an issue] or create a comment at the bottom. The code from this post is also available on [GitHub][code repository].

[file an issue]: https://github.com/phil-opp/blog_os/issues
[code repository]: https://github.com/phil-opp/blog_os/tree/first_edition_post_4

## The VGA Text Buffer
The text buffer starts at physical address `0xb8000` and contains the characters displayed on screen. It has 25 rows and 80 columns. Each screen character has the following format:

| Bit(s) | Value            |
| ------ | ---------------- |
| 0-7    | ASCII code point |
| 8-11   | Foreground color |
| 12-14  | Background color |
| 15     | Blink            |

The following colors are available:

| Number | Color      | Number + Bright Bit | Bright Color |
| ------ | ---------- | ------------------- | ------------ |
| 0x0    | Black      | 0x8                 | Dark Gray    |
| 0x1    | Blue       | 0x9                 | Light Blue   |
| 0x2    | Green      | 0xa                 | Light Green  |
| 0x3    | Cyan       | 0xb                 | Light Cyan   |
| 0x4    | Red        | 0xc                 | Light Red    |
| 0x5    | Magenta    | 0xd                 | Pink         |
| 0x6    | Brown      | 0xe                 | Yellow       |
| 0x7    | Light Gray | 0xf                 | White        |

The fourth bit of a color number (value `0x8`) is the _bright bit_, which turns, for example, blue into light blue. It is unavailable for the background color because that bit position (bit 15 of the screen character) controls whether the text blinks. If you want to use a light background color (e.g. white) you have to disable blinking through a [BIOS function][disable blinking].
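The bit layout from the first table can be sketched as a small packing function. This is only an illustration of the format, not code from this post's module, and the name `vga_cell` and its parameters are made up for the example.

```rust
/// Hypothetical helper: packs one screen character into the 16-bit format
/// described above.
///
/// - bits 0-7:   ASCII code point
/// - bits 8-11:  foreground color (4 bits, including the bright bit)
/// - bits 12-14: background color (3 bits)
/// - bit 15:     blink
pub fn vga_cell(ascii: u8, foreground: u8, background: u8, blink: bool) -> u16 {
    let attribute = ((blink as u8) << 7) | ((background & 0x7) << 4) | (foreground & 0xf);
    ((attribute as u16) << 8) | (ascii as u16)
}
```

For example, a white (`0xf`) `O` (ASCII `0x4f`) on a green (`0x2`) background packs to `0x2f4f`, which is the lowest 16 bits of the `0x2f592f412f4b2f4f` constant that the assembly code uses to print `OKAY`.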
## A basic Rust Module

[disable blinking]: http://www.ctyme.com/intr/rb-0117.htm

Now that we know how the VGA buffer works, we can create a Rust module to handle printing:

```rust
// in src/lib.rs
mod vga_buffer;
```

The content of this module can live either in `src/vga_buffer.rs` or `src/vga_buffer/mod.rs`. The latter supports submodules while the former does not. But our module does not need any submodules so we create it as `src/vga_buffer.rs`.

All of the code below goes into our new module (unless specified otherwise).

### Colors
First, we represent the different colors using an enum:

```rust
#[allow(dead_code)]
#[repr(u8)]
pub enum Color {
    Black      = 0,
    Blue       = 1,
    Green      = 2,
    Cyan       = 3,
    Red        = 4,
    Magenta    = 5,
    Brown      = 6,
    LightGray  = 7,
    DarkGray   = 8,
    LightBlue  = 9,
    LightGreen = 10,
    LightCyan  = 11,
    LightRed   = 12,
    Pink       = 13,
    Yellow     = 14,
    White      = 15,
}
```

We use a [C-like enum] here to explicitly specify the number for each color. Because of the `repr(u8)` attribute each enum variant is stored as a `u8`. Actually 4 bits would be sufficient, but Rust doesn't have a `u4` type.

[C-like enum]: https://doc.rust-lang.org/rust-by-example/custom_types/enum/c_like.html

Normally the compiler would issue a warning for each unused variant. By using the `#[allow(dead_code)]` attribute we disable these warnings for the `Color` enum.

To represent a full color code that specifies foreground and background color, we create a newtype on top of `u8`:

```rust
struct ColorCode(u8);

impl ColorCode {
    const fn new(foreground: Color, background: Color) -> ColorCode {
        ColorCode((background as u8) << 4 | (foreground as u8))
    }
}
```

The `ColorCode` contains the full color byte, containing foreground and background color. Blinking is enabled implicitly by using a bright background color (soon we will disable blinking anyway). The `new` function is a [const function] to allow it in static initializers. As `const` functions are unstable we need to add the `const_fn` feature in `src/lib.rs`.
### The Text Buffer

[const function]: https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md

Now we can add structures to represent a screen character and the text buffer:

```rust
#[repr(C)]
struct ScreenChar {
    ascii_character: u8,
    color_code: ColorCode,
}

const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

struct Buffer {
    chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

Since the field ordering in default structs is undefined in Rust, we need the [repr(C\)] attribute. It guarantees that the struct's fields are laid out exactly like in a C struct and thus guarantees the correct field ordering.

[repr(C\)]: https://doc.rust-lang.org/nightly/nomicon/other-reprs.html#reprc

To actually write to screen, we now create a writer type:

```rust
use core::ptr::Unique;

pub struct Writer {
    column_position: usize,
    color_code: ColorCode,
    buffer: Unique<Buffer>,
}
```

The writer will always write to the last line and shift lines up when a line is full (or on `\n`). The `column_position` field keeps track of the current position in the last row. The current foreground and background colors are specified by `color_code` and a pointer to the VGA buffer is stored in `buffer`. To make it possible to create a `static` Writer later, the `buffer` field stores a `Unique<Buffer>` instead of a plain `*mut Buffer`. [Unique] is a wrapper that implements Send/Sync and is thus usable as a `static`. Since it's unstable, you may need to add the `unique` feature to `lib.rs`:

[Unique]: https://doc.rust-lang.org/1.10.0/core/ptr/struct.Unique.html

```rust
// in src/lib.rs
#![feature(unique)]
```

## Printing Characters
Now we can use the `Writer` to modify the buffer's characters.
First we create a method to write a single ASCII byte (it doesn't compile yet):

```rust
impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                if self.column_position >= BUFFER_WIDTH {
                    self.new_line();
                }

                let row = BUFFER_HEIGHT - 1;
                let col = self.column_position;

                let color_code = self.color_code;
                self.buffer().chars[row][col] = ScreenChar {
                    ascii_character: byte,
                    color_code: color_code,
                };
                self.column_position += 1;
            }
        }
    }

    fn buffer(&mut self) -> &mut Buffer {
        unsafe{ self.buffer.as_mut() }
    }

    fn new_line(&mut self) {/* TODO */}
}
```

If the byte is the [newline] byte `\n`, the writer does not print anything. Instead it calls a `new_line` method, which we'll implement later. Other bytes get printed to the screen in the second match case.

[newline]: https://en.wikipedia.org/wiki/Newline

When printing a byte, the writer checks if the current line is full. In that case, a `new_line` call is required first to wrap the line. Then it writes a new `ScreenChar` to the buffer at the current position. Finally, the current column position is advanced.

The `buffer()` auxiliary method converts the raw pointer in the `buffer` field into a safe mutable buffer reference. The unsafe block is needed because the [as_mut()] method of `Unique` is unsafe. But our `buffer()` method itself isn't marked as unsafe, so it must not introduce any unsafety (e.g. cause segfaults). To guarantee that, it's very important that the `buffer` field always points to a valid `Buffer`. It's like a contract that we must uphold every time we create a `Writer`. To ensure that it's not possible to create an invalid `Writer` from outside of the module, the struct must have at least one private field and public creation functions are not allowed either.
### Cannot Move out of Borrowed Content

[as_mut()]: https://doc.rust-lang.org/1.26.0/core/ptr/struct.Unique.html#method.as_mut

When we try to compile it, we get the following error:

```
error[E0507]: cannot move out of borrowed content
  --> src/vga_buffer.rs:79:34
   |
79 |                 let color_code = self.color_code;
   |                                  ^^^^ cannot move out of borrowed content
```

The reason is that Rust _moves_ values by default instead of copying them like other languages. And we cannot move `color_code` out of `self` because we only borrowed `self`. For more information check out the [ownership section] in the Rust book.

[ownership section]: https://doc.rust-lang.org/book/ownership.html
[by reference]: https://rust-lang.github.io/book/ch04-02-references-and-borrowing.html

To fix it, we can implement the [Copy] trait for the `ColorCode` type. The easiest way to do this is to use the built-in [derive macro]:

[Copy]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
[derive macro]: https://doc.rust-lang.org/rust-by-example/custom_types/enum/c_like.html

```rust
#[derive(Debug, Clone, Copy)]
struct ColorCode(u8);
```

We also derive the [Clone] trait, since it's a requirement for `Copy`, and the [Debug] trait, which allows us to print this field for debugging purposes.

[Clone]: https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html
[Debug]: https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html

Now our project should compile again.

However, the [documentation for Copy] says: _“if your type can implement Copy, it should”_. Therefore we also derive Copy for `Color` and `ScreenChar`:

[documentation for Copy]: https://doc.rust-lang.org/core/marker/trait.Copy.html#when-should-my-type-be-copy

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
#[repr(u8)]
pub enum Color {...}

#[derive(Debug, Clone, Copy)]
#[repr(C)]
struct ScreenChar {...}
```

### Try it out!
To write some characters to the screen, you can create a temporary function:

```rust
pub fn print_something() {
    let mut writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::LightGreen, Color::Black),
        buffer: unsafe { Unique::new_unchecked(0xb8000 as *mut _) },
    };

    writer.write_byte(b'H');
}
```

It just creates a new Writer that points to the VGA buffer at `0xb8000`. To use the unstable `Unique::new_unchecked` function, we need to add the feature flag `#![feature(const_unique_new)]` to the top of our `src/lib.rs`.

Then it writes the byte `b'H'` to it. The `b` prefix creates a [byte character], which represents an ASCII code point. When we call `vga_buffer::print_something` in main, an `H` should be printed in the _lower_ left corner of the screen in light green:

[byte character]: https://doc.rust-lang.org/reference/tokens.html#characters-and-strings

![QEMU output with a green `H` in the lower left corner](vga-H-lower-left.png)

### Volatile
We just saw that our `H` was printed correctly. However, it might not work with future Rust compilers that optimize more aggressively.

The problem is that we only write to the `Buffer` and never read from it again. The compiler doesn't know about the side effect that some characters appear on the screen. So it might decide that these writes are unnecessary and can be omitted.

To avoid this erroneous optimization, we need to specify these writes as _[volatile]_. This tells the compiler that the write has side effects and should not be optimized away.

[volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming)

In order to use volatile writes for the VGA buffer, we use the [volatile][volatile crate] library. This _crate_ (this is what packages are called in the Rust world) provides a `Volatile` wrapper type with `read` and `write` methods.
These methods internally use the [read_volatile] and [write_volatile] functions of the standard library and thus guarantee that the reads/writes are not optimized away.

[volatile crate]: https://docs.rs/volatile
[read_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.read_volatile.html
[write_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.write_volatile.html

We can add a dependency on the `volatile` crate by adding it to the `dependencies` section of our `Cargo.toml`:

```toml
# in Cargo.toml

[dependencies]
volatile = "0.1.0"
```

The `0.1.0` is the [semantic] version number. For more information, see the [Specifying Dependencies] guide of the cargo documentation.

[semantic]: https://semver.org/
[Specifying Dependencies]: https://doc.crates.io/specifying-dependencies.html

Now we've declared that our project depends on the `volatile` crate and are able to import it in `src/lib.rs`:

```rust
// in src/lib.rs

extern crate volatile;
```

Let's use it to make writes to the VGA buffer volatile. We update our `Buffer` type as follows:

```rust
// in src/vga_buffer.rs

use volatile::Volatile;

struct Buffer {
    chars: [[Volatile<ScreenChar>; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

Instead of a `ScreenChar`, we're now using a `Volatile<ScreenChar>`. (The `Volatile` type is [generic] and can wrap (almost) any type). This ensures that we can't accidentally write to it through a “normal” write. Instead, we have to use the `write` method now.

[generic]: https://doc.rust-lang.org/book/second-edition/ch10-00-generics.html

This means that we have to update our `Writer::write_byte` method:

```rust
impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                ...

                self.buffer().chars[row][col].write(ScreenChar {
                    ascii_character: byte,
                    color_code: color_code,
                });
                ...
            }
        }
    }
    ...
}
```

Instead of a normal assignment using `=`, we're now using the `write` method. This guarantees that the compiler will never optimize away this write.
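To make the idea behind such a wrapper more tangible, here is a minimal sketch built directly on `read_volatile` and `write_volatile`. The `VolatileCell` type is a hypothetical stand-in written for this illustration. It is not the source of the `volatile` crate, whose real implementation may differ.

```rust
use core::ptr;

/// Hypothetical minimal volatile wrapper (illustration only).
#[repr(transparent)]
pub struct VolatileCell<T: Copy>(T);

impl<T: Copy> VolatileCell<T> {
    pub const fn new(value: T) -> Self {
        VolatileCell(value)
    }

    /// Performs a volatile read that the compiler must not optimize away.
    pub fn read(&self) -> T {
        // safe to call here because `&self.0` is a valid, aligned reference
        unsafe { ptr::read_volatile(&self.0) }
    }

    /// Performs a volatile write that the compiler must not optimize away.
    pub fn write(&mut self, value: T) {
        // safe to call here because `&mut self.0` is a valid, aligned reference
        unsafe { ptr::write_volatile(&mut self.0, value) }
    }
}
```

Because the inner field is private, the only way to change the value from outside is the `write` method, which is what rules out accidental “normal” writes.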
## Printing Strings

To print whole strings, we can convert them to bytes and print them one-by-one:

```rust
// in `impl Writer`
pub fn write_str(&mut self, s: &str) {
    for byte in s.bytes() {
        self.write_byte(byte)
    }
}
```

You can try it yourself in the `print_something` function.

When you print strings with some special characters like `ä` or `λ`, you'll notice that they cause weird symbols on screen. That's because they are represented by multiple bytes in [UTF-8]. By converting them to bytes, we of course get strange results. But since the VGA buffer doesn't support UTF-8, it's not possible to display these characters anyway.

[core tracking issue]: https://github.com/rust-lang/rust/issues/27701
[UTF-8]: https://www.fileformat.info/info/unicode/utf8.htm

### Support Formatting Macros
It would be nice to support Rust's formatting macros, too. That way, we can easily print different types like integers or floats. To support them, we need to implement the [core::fmt::Write] trait. The only required method of this trait is `write_str`, which looks quite similar to our `write_str` method. To implement the trait, we just need to move it into an `impl fmt::Write for Writer` block and add a return type:

```rust
use core::fmt;

impl fmt::Write for Writer {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for byte in s.bytes() {
            self.write_byte(byte)
        }
        Ok(())
    }
}
```

The `Ok(())` is just an `Ok` Result containing the `()` type. We can drop the `pub` because trait methods are always public.

Now we can use Rust's built-in `write!`/`writeln!` formatting macros:

```rust
// in the `print_something` function
use core::fmt::Write;
let mut writer = Writer {...};
writer.write_byte(b'H');
writer.write_str("ello! ");
write!(writer, "The numbers are {} and {}", 42, 1.0/3.0);
```

Now you should see a `Hello! The numbers are 42 and 0.3333333333333333` at the bottom of the screen.
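Since the `Writer` only works inside the kernel, the same pattern can be tried out on a normal host system with an in-memory buffer. The following sketch assumes a made-up `LineBuffer` type with a fixed capacity of 80 bytes. It only demonstrates that implementing `write_str` is enough to make `write!` work.

```rust
use core::fmt::{self, Write};

/// Hypothetical in-memory stand-in for the VGA writer (illustration only).
pub struct LineBuffer {
    bytes: [u8; 80],
    len: usize,
}

impl LineBuffer {
    pub const fn new() -> Self {
        LineBuffer { bytes: [0; 80], len: 0 }
    }

    /// Returns the bytes written so far.
    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes[..self.len]
    }
}

impl Write for LineBuffer {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for byte in s.bytes() {
            // report an error instead of wrapping when the line is full
            if self.len == self.bytes.len() {
                return Err(fmt::Error);
            }
            self.bytes[self.len] = byte;
            self.len += 1;
        }
        Ok(())
    }
}
```

After `write!(buffer, "The numbers are {} and {}", 42, 7)`, the buffer holds the formatted text as bytes. The formatting itself is done entirely by `core::fmt`, and the type only has to accept string slices.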

[core::fmt::Write]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

### Newlines
Right now, we just ignore newlines and characters that don't fit into the line anymore. Instead we want to move every character one line up (the top line gets deleted) and start at the beginning of the last line again. To do this, we add an implementation for the `new_line` method of `Writer`:

```rust
// in `impl Writer`

fn new_line(&mut self) {
    for row in 1..BUFFER_HEIGHT {
        for col in 0..BUFFER_WIDTH {
            let buffer = self.buffer();
            let character = buffer.chars[row][col].read();
            buffer.chars[row - 1][col].write(character);
        }
    }
    self.clear_row(BUFFER_HEIGHT-1);
    self.column_position = 0;
}

fn clear_row(&mut self, row: usize) {/* TODO */}
```

We iterate over all screen characters and move each character one row up. Note that the range notation (`..`) excludes the upper bound. We also omit the 0th row (the first range starts at `1`) because it's the row that is shifted off screen.

Now we only need to implement the `clear_row` method to finish the newline code:

```rust
// in `impl Writer`

fn clear_row(&mut self, row: usize) {
    let blank = ScreenChar {
        ascii_character: b' ',
        color_code: self.color_code,
    };
    for col in 0..BUFFER_WIDTH {
        self.buffer().chars[row][col].write(blank);
    }
}
```

This method clears a row by overwriting all of its characters with a space character.

## Providing an Interface
To provide a global writer that can be used as an interface from other modules, we can add a `static` writer:

```rust
pub static WRITER: Writer = Writer {
    column_position: 0,
    color_code: ColorCode::new(Color::LightGreen, Color::Black),
    buffer: unsafe { Unique::new_unchecked(0xb8000 as *mut _) },
};
```

But we can't use it to print anything! You can try it yourself in the `print_something` function. The reason is that we try to take a mutable reference (`&mut`) to an immutable `static` when calling `WRITER.write_byte`.

To resolve it, we could use a [mutable static].
But then every read and write to it would be unsafe since it could easily introduce data races and other bad things. Using `static mut` is highly discouraged, there are even proposals to [remove it][remove static mut]. [mutable static]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable [remove static mut]: https://internals.rust-lang.org/t/pre-rfc-remove-static-mut/1437 But what are the alternatives? We could try to use a cell type like [RefCell] or even [UnsafeCell] to provide [interior mutability]. But these types aren't [Sync] \(with good reason), so we can't use them in statics. [RefCell]: https://doc.rust-lang.org/nightly/core/cell/struct.RefCell.html [UnsafeCell]: https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html [interior mutability]: https://doc.rust-lang.org/1.30.0/book/first-edition/mutability.html#interior-vs-exterior-mutability [Sync]: https://doc.rust-lang.org/nightly/core/marker/trait.Sync.html To get synchronized interior mutability, users of the standard library can use [Mutex]. It provides mutual exclusion by blocking threads when the resource is already locked. But our basic kernel does not have any blocking support or even a concept of threads, so we can't use it either. However there is a really basic kind of mutex in computer science that requires no operating system features: the [spinlock]. Instead of blocking, the threads simply try to lock it again and again in a tight loop and thus burn CPU time until the mutex is free again. 
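
A spinlock needs nothing more than an atomic flag, which is why it works without any operating system support. The sketch below is a minimal illustration under stated assumptions: `SpinLock` and `with_lock` are hypothetical names, not the API of any particular crate, and the example uses `std` paths so that it compiles on a host (the same atomics and `UnsafeCell` also exist in `core`).

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical minimal spinlock (illustration only).
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: access to `value` is serialized through the `locked` flag.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    /// `const fn`, so the lock can be used to initialize a `static`.
    pub const fn new(value: T) -> Self {
        SpinLock {
            locked: AtomicBool::new(false),
            value: UnsafeCell::new(value),
        }
    }

    /// Runs `f` with exclusive access to the protected value.
    /// Note: if `f` panics, this simple sketch leaves the lock held.
    pub fn with_lock<R, F: FnOnce(&mut T) -> R>(&self, f: F) -> R {
        // Busy-wait until we are the one who flips the flag from false to true.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        // SAFETY: the flag guarantees that no other caller is inside this section.
        let result = f(unsafe { &mut *self.value.get() });
        self.locked.store(false, Ordering::Release);
        result
    }
}
```

A closure-based interface keeps the sketch short. A guard-based interface, where `lock()` returns an object that unlocks on drop, is the more common design because it also releases the lock when a panic unwinds.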

[Mutex]: https://doc.rust-lang.org/nightly/std/sync/struct.Mutex.html
[spinlock]: https://en.wikipedia.org/wiki/Spinlock

To use a spinning mutex, we can add the [spin crate] as a dependency:

[spin crate]: https://crates.io/crates/spin

```toml
# in Cargo.toml
[dependencies]
rlibc = "0.1.4"
spin = "0.4.5"
```

```rust
// in src/lib.rs
extern crate spin;
```

Then we can use the spinning Mutex to add interior mutability to our static writer:

```rust
// in src/vga_buffer.rs again
use spin::Mutex;
...
pub static WRITER: Mutex<Writer> = Mutex::new(Writer {
    column_position: 0,
    color_code: ColorCode::new(Color::LightGreen, Color::Black),
    buffer: unsafe { Unique::new_unchecked(0xb8000 as *mut _) },
});
```

[Mutex::new] is a const function, too, so it can be used in statics.

Now we can easily print from our main function:

[Mutex::new]: https://docs.rs/spin/0.4.5/spin/struct.Mutex.html#method.new

```rust
// in src/lib.rs
pub extern fn rust_main() {
    use core::fmt::Write;
    vga_buffer::WRITER.lock().write_str("Hello again");
    write!(vga_buffer::WRITER.lock(), ", some numbers: {} {}", 42, 1.337);
    loop{}
}
```

Note that we need to import the `Write` trait if we want to use its functions.

## A println macro
Rust's [macro syntax] is a bit strange, so we won't try to write a macro from scratch. Instead we look at the source of the [`println!` macro] in the standard library:

[macro syntax]: https://doc.rust-lang.org/nightly/book/second-edition/appendix-04-macros.html
[`println!` macro]: https://doc.rust-lang.org/nightly/std/macro.println!.html

```rust
macro_rules! println {
    ($fmt:expr) => (print!(concat!($fmt, "\n")));
    ($fmt:expr, $($arg:tt)*) => (print!(concat!($fmt, "\n"), $($arg)*));
}
```

Macros are defined through one or more rules, which are similar to `match` arms. The `println` macro has two rules: The first rule is for invocations with a single argument (e.g. `println!("Hello")`) and the second rule is for invocations with additional parameters (e.g. `println!("{}{}", 4, 2)`).
Both rules simply append a newline character (`\n`) to the format string and then invoke the [`print!` macro], which is defined as: [`print!` macro]: https://doc.rust-lang.org/nightly/std/macro.print!.html ```rust macro_rules! print { ($($arg:tt)*) => ($crate::io::_print(format_args!($($arg)*))); } ``` The macro expands to a call of the [`_print` function] in the `io` module. The [`$crate` variable] ensures that the macro also works from outside the `std` crate. For example, it expands to `::std` when it's used in other crates. The [`format_args` macro] builds a [fmt::Arguments] type from the passed arguments, which is passed to `_print`. The [`_print` function] of libstd is rather complicated, as it supports different `Stdout` devices. We don't need that complexity since we just want to print to the VGA buffer. [`_print` function]: https://github.com/rust-lang/rust/blob/46d39f3329487115e7d7dcd37bc64eea6ef9ba4e/src/libstd/io/stdio.rs#L631 [`$crate` variable]: https://doc.rust-lang.org/1.30.0/book/first-edition/macros.html#the-variable-crate [`format_args` macro]: https://doc.rust-lang.org/nightly/std/macro.format_args.html [fmt::Arguments]: https://doc.rust-lang.org/nightly/core/fmt/struct.Arguments.html To print to the VGA buffer, we just copy the `println!` macro and modify the `print!` macro to use our static `WRITER` instead of `_print`: ```rust // in src/vga_buffer.rs macro_rules! print { ($($arg:tt)*) => ({ use core::fmt::Write; let mut writer = $crate::vga_buffer::WRITER.lock(); writer.write_fmt(format_args!($($arg)*)).unwrap(); }); } ``` Instead of a `_print` function, we call the `write_fmt` method of our static `Writer`. Since we're using a method from the `Write` trait, we need to import it before. The additional `unwrap()` at the end panics if printing isn't successful. But since we always return `Ok` in `write_str`, that should not happen. Note the additional `{}` scope around the macro: We write `=> ({…})` instead of `=> (…)`. 
The additional `{}` prevents the `Write` trait from being silently imported into the parent scope when `print` is used.

### Clearing the screen
We can now use `println!` to add a rather trivial function to clear the screen:

```rust
// in src/vga_buffer.rs
pub fn clear_screen() {
    for _ in 0..BUFFER_HEIGHT {
        println!("");
    }
}
```

### Hello World using `println`
To use `println` in `lib.rs`, we need to import the macros of the VGA buffer module first. Therefore we add a `#[macro_use]` attribute to the module declaration:

```rust
// in src/lib.rs

#[macro_use]
mod vga_buffer;

#[no_mangle]
pub extern fn rust_main() {
    // ATTENTION: we have a very small stack and no guard page
    vga_buffer::clear_screen();
    println!("Hello World{}", "!");

    loop{}
}
```

Since we imported the macros at crate level, they are available in all modules and thus provide an easy and safe interface to the VGA buffer.

As expected, we now see a _“Hello World!”_ on a cleared screen:

![QEMU printing “Hello World!” on a cleared screen](vga-hello-world.png)

### Deadlocks
Whenever we use locks, we must be careful to not accidentally introduce _deadlocks_. A [deadlock] occurs when a thread/program waits for a lock that will never be released. Normally, this happens when multiple threads access multiple locks. For example, when thread A holds lock 1 and tries to acquire lock 2 and -- at the same time -- thread B holds lock 2 and tries to acquire lock 1.

[deadlock]: https://en.wikipedia.org/wiki/Deadlock

However, a deadlock can also occur when a thread tries to acquire the same lock twice. This way we can trigger a deadlock in our VGA driver:

```rust
// in rust_main in src/lib.rs
println!("{}", { println!("inner"); "outer" });
```

The argument passed to `println` is a new block that evaluates to the string _“outer”_ (a block always returns the result of the last expression). But before returning “outer”, the block tries to print the string _“inner”_.
When we try this code in QEMU, we see that neither of the strings are printed. To understand what's happening, we take a look at our `print` macro again: ```rust macro_rules! print { ($($arg:tt)*) => ({ use core::fmt::Write; let mut writer = $crate::vga_buffer::WRITER.lock(); writer.write_fmt(format_args!($($arg)*)).unwrap(); }); } ``` So we _first_ lock the `WRITER` and then we evaluate the arguments using `format_args`. The problem is that the argument in our code example contains another `println`, which tries to lock the `WRITER` again. So now the inner `println` waits for the outer `println` and vice versa. Thus, a deadlock occurs and the CPU spins endlessly. ### Fixing the Deadlock In order to fix the deadlock, we need to evaluate the arguments _before_ locking the `WRITER`. We can do so by moving the locking and printing logic into a new `print` function (like it's done in the standard library): ```rust // in src/vga_buffer.rs macro_rules! print { ($($arg:tt)*) => ({ $crate::vga_buffer::print(format_args!($($arg)*)); }); } pub fn print(args: fmt::Arguments) { use core::fmt::Write; WRITER.lock().write_fmt(args).unwrap(); } ``` Now the macro only evaluates the arguments (through `format_args!`) and passes them to the new `print` function. The `print` function then locks the `WRITER` and prints the formatting arguments using `write_fmt`. So now the arguments are evaluated before locking the `WRITER`. Thus, we fixed the deadlock: ![QEMU printing “inner” and then “outer”](fixed-println-deadlock.png) We see that both “inner” and “outer” are printed. ## What's next? In the next posts we will map the kernel pages correctly so that accessing `0x0` or writing to `.rodata` is not possible anymore. To obtain the loaded kernel sections we will read the Multiboot information structure. Then we will create a paging module and use it to switch to a new page table where the kernel sections are mapped correctly. 
The [next post] describes the Multiboot information structure and creates a frame allocator using the information about memory areas. [next post]: @/edition-1/posts/05-allocating-frames/index.md ## Other Rust OS Projects Now that you know the very basics of OS development in Rust, you should also check out the following projects: - [Rust Bare-Bones Kernel]: A basic kernel with roughly the same functionality as ours. Writes output to the serial port instead of the VGA buffer and maps the kernel to the [higher half] \(instead of our identity mapping). _Note_: You need to [cross compile binutils] to build it (or you create some symbolic links[^fn-symlink] if you're on x86_64). - [RustOS]: More advanced kernel that supports allocation, keyboard inputs, and threads. It also has a scheduler and a basic network driver. - ["Tifflin" Experimental Kernel]: Big kernel project by thepowersgang, that is actively developed and has over 650 commits. It has a separate userspace and supports multiple file systems, even a GUI is included. Needs a cross compiler. - [Redox]: Probably the most complete Rust OS today. It has an active community and over 1000 Github stars. File systems, network, an audio player, a picture viewer, and much more. Just take a look at the [screenshots][redox screenshots]. [Rust Bare-Bones Kernel]: https://github.com/thepowersgang/rust-barebones-kernel [higher half]: https://wiki.osdev.org/Higher_Half_Kernel [cross compile binutils]: @/edition-1/extra/cross-compile-binutils.md [RustOS]: https://github.com/RustOS-Fork-Holding-Ground/RustOS ["Tifflin" Experimental Kernel]:https://github.com/thepowersgang/rust_os [Redox]: https://github.com/redox-os/redox [redox screenshots]: https://github.com/redox-os/redox#what-it-looks-like ## Footnotes [^fn-symlink]: You will need to symlink `x86_64-none_elf-XXX` to `/usr/bin/XXX` where `XXX` is in {`as`, `ld`, `objcopy`, `objdump`, `strip`}. The `x86_64-none_elf-XXX` files must be in some folder that is in your `$PATH`. 
But then you can only build for your x86_64 host architecture, so use this hack only for testing. ================================================ FILE: blog/content/edition-1/posts/05-allocating-frames/index.md ================================================ +++ title = "Allocating Frames" weight = 5 path = "allocating-frames" aliases = ["allocating-frames.html"] date = 2015-11-15 template = "edition-1/page.html" +++ In this post we create an allocator that provides free physical frames for a future paging module. To get the required information about available and used memory we use the Multiboot information structure. Additionally, we improve the `panic` handler to print the corresponding message and source line. The full source code is available on [GitHub][source repo]. Feel free to open issues there if you have any problems or improvements. You can also leave a comment at the bottom. [source repo]: https://github.com/phil-opp/blog_os/tree/first_edition_post_5 ## Preparation We still have a really tiny stack of 64 bytes, which won't suffice for this post. So we increase it to 16kB (four pages) in `boot.asm`: ```asm section .bss ... stack_bottom: resb 4096 * 4 stack_top: ``` ## The Multiboot Information Structure When a Multiboot compliant bootloader loads a kernel, it passes a pointer to a boot information structure in the `ebx` register. We can use it to get information about available memory and loaded kernel sections. First, we need to pass this pointer to our kernel as an argument to `rust_main`. To find out how arguments are passed to functions, we can look at the [calling convention of Linux]: [calling convention of Linux]: https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI > The first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9 So to pass the pointer to our kernel, we need to move it to `rdi` before calling the kernel. 
Since we're not using the `rdi`/`edi` register in our bootstrap code, we can simply set the `edi` register right after booting (in `boot.asm`):

```nasm
start:
    mov esp, stack_top
    mov edi, ebx       ; Move Multiboot info pointer to edi
```

Now we can add the argument to our `rust_main`:

```rust
pub extern fn rust_main(multiboot_information_address: usize) { ... }
```

Instead of writing our own Multiboot module, we use the [multiboot2] crate. It gives us some basic information about mapped kernel sections and available memory. I just wrote it for this blog post since I could not find any other Multiboot 2 crate. It's still incomplete, but it does its job.

[multiboot2]: https://docs.rs/multiboot2

So let's add a dependency on the crate:

```toml
# in Cargo.toml
[dependencies]
...
multiboot2 = "0.1.0"
```

```rust
// in src/lib.rs
extern crate multiboot2;
```

Now we can use it to print available memory areas.

### Available Memory
The boot information structure consists of various _tags_. See section 3.4 of the Multiboot specification ([PDF][multiboot specification]) for a complete list. The _memory map_ tag contains a list of all available RAM areas. Special areas such as the VGA text buffer at `0xb8000` are not available. Note that some of the available memory is already used by our kernel and by the multiboot information structure itself.

[multiboot specification]: https://nongnu.askapache.com/grub/phcoder/multiboot.pdf

To print all available memory areas, we can use the `multiboot2` crate in our `rust_main` as follows:

```rust
let boot_info = unsafe{ multiboot2::load(multiboot_information_address) };
let memory_map_tag = boot_info.memory_map_tag()
    .expect("Memory map tag required");

println!("memory areas:");
for area in memory_map_tag.memory_areas() {
    println!(" start: 0x{:x}, length: 0x{:x}",
        area.base_addr, area.length);
}
```

The `load` function is `unsafe` because it relies on a valid address.
Since the memory tag is not required by the Multiboot specification, the `memory_map_tag()` function returns an `Option`. The `memory_areas()` function returns the desired memory area iterator. The output looks like this: ``` Hello World! memory areas: start: 0x0, length: 0x9fc00 start: 0x100000, length: 0x7ee0000 ``` So we have one area from `0x0` to `0x9fc00`, which is a bit below the 1MiB mark. The second, bigger area starts at 1MiB and contains the rest of available memory. The area from `0x9fc00` to 1MiB is not available since it contains for example the VGA text buffer at `0xb8000`. This is the reason for putting our kernel at 1MiB and not somewhere below. If you give QEMU more than 4GiB of memory by passing `-m 5G`, you get another unusable area below the 4GiB mark. This memory is normally mapped to some hardware devices. See the [OSDev Wiki][Memory_map] for more information. [Memory_map]: https://wiki.osdev.org/Memory_Map_(x86) ### Handling Panics We used `expect` in the code above, which will panic if there is no memory map tag. But our current panic handler just loops without printing any error message. Of course we could replace `expect` by a `match`, but we should fix the panic handler nonetheless: ```rust #[lang = "panic_fmt"] #[no_mangle] pub extern fn panic_fmt() -> ! { println!("PANIC"); loop{} } ``` Now we get a `PANIC` message. But we can do even better. The `panic_fmt` function has actually some arguments: ```rust #[lang = "panic_fmt"] #[no_mangle] pub extern fn panic_fmt(fmt: core::fmt::Arguments, file: &'static str, line: u32) -> ! { println!("\n\nPANIC in {} at line {}:", file, line); println!(" {}", fmt); loop{} } ``` Be careful with these arguments as the compiler does not check the function signature for `lang_items`. Now we get the panic message and the causing source line. You can try it by inserting a `panic` somewhere. 

### Kernel ELF Sections
To read and print the sections of our kernel ELF file, we can use the _Elf-sections_ tag:

```rust
let elf_sections_tag = boot_info.elf_sections_tag()
    .expect("Elf-sections tag required");

println!("kernel sections:");
for section in elf_sections_tag.sections() {
    println!(" addr: 0x{:x}, size: 0x{:x}, flags: 0x{:x}",
        section.addr, section.size, section.flags);
}
```

This should print out the start address and size of all kernel sections. If the section is writable, the `0x1` bit is set in `flags`. The `0x4` bit marks an executable section and the `0x2` bit indicates that the section was loaded in memory. For example, the `.text` section is executable but not writable and the `.data` section just the opposite.

But when we execute it, tons of really small sections are printed. We can use the `objdump -h build/kernel-x86_64.bin` command to list the sections with their names. There seem to be over 200 sections and many of them start with `.text.*` or `.data.rel.ro.local.*`. This is because the Rust compiler puts e.g. each function in its own `.text` subsection. That way, unused functions are removed when the linker omits unused sections.

To merge these subsections, we need to update our linker script:

```
ENTRY(start)

SECTIONS {
    . = 1M;

    .boot :
    {
        KEEP(*(.multiboot_header))
    }

    .text :
    {
        *(.text .text.*)
    }

    .rodata :
    {
        *(.rodata .rodata.*)
    }

    .data.rel.ro :
    {
        *(.data.rel.ro.local*) *(.data.rel.ro .data.rel.ro.*)
    }
}
```

These lines are taken from the default linker script of `ld`, which can be obtained through `ld --verbose`. The `.text` _output_ section now contains all `.text.*` _input_ sections of the static library (and the same applies for the `.rodata` and `.data.rel.ro` sections).

Now there are only 12 sections left and we get a much more useful output:

![qemu output](qemu-memory-areas-and-kernel-sections.png)

If you like, you can compare this output to the `objdump -h build/kernel-x86_64.bin` output.
You will see that the start addresses and sizes match exactly for each section. The sections with flags `0x0` are mostly debug sections, so they don't need to be loaded. And the last few sections of the QEMU output aren't in the `objdump` output because they are special sections such as string tables.

### Start and End of Kernel
We can now use the ELF section tag to calculate the start and end address of our loaded kernel:

```rust
let kernel_start = elf_sections_tag.sections().map(|s| s.addr)
    .min().unwrap();
let kernel_end = elf_sections_tag.sections().map(|s| s.addr + s.size)
    .max().unwrap();
```

The other used memory area is the Multiboot Information structure:

```rust
let multiboot_start = multiboot_information_address;
let multiboot_end = multiboot_start + (boot_info.total_size as usize);
```

Printing these numbers gives us:

```
kernel_start: 0x100000, kernel_end: 0x11a168
multiboot_start: 0x11d400, multiboot_end: 0x11d9c8
```

So the kernel starts at 1MiB (as expected) and is about 105 KiB in size. The multiboot information structure was placed at `0x11d400` by GRUB and needs 1480 bytes. Of course your numbers could be a bit different due to different versions of Rust or GRUB (or some differences in the source code).

## A frame allocator
When using paging, the physical memory is split into equally sized chunks (normally 4096 bytes). Such a chunk is called "physical page" or "frame". These frames can be mapped to any virtual page through page tables. For more information about paging take a peek at the [next post].

We will need a free frame in many cases. For example when we want to increase the size of our future kernel heap. Or when we create a new page table. Or when we add a new kernel thread and thus need to allocate a new stack. So we need some kind of allocator that keeps track of physical frames and gives us a free one when needed.

There are various ways to write such a frame allocator:

We could create some kind of linked list from the free frames.
For example, each frame could begin with a pointer to the next free frame. Since the frames are free, this would not overwrite any data. Our allocator would just save the head of the list and could easily allocate and deallocate frames by updating pointers. Unfortunately, this approach has a problem: It requires reading and writing these free frames. So we would need to map all physical frames to some virtual address, at least temporarily. Another disadvantage is that we need to create this linked list at startup. That implies that we need to set over one million pointers at startup if the machine has 4GiB of RAM.

Another approach is to create some kind of data structure such as a [bitmap or a stack] to manage free frames. We could place it in the already identity mapped area right behind the kernel or multiboot structure. That way we would not need to (temporarily) map each free frame. But it has the same problem of slow initial creation and filling. In fact, we will use this approach in a future post to manage frames that are freed again. But for the initial management of free frames, we use a different method.

[bitmap or a stack]: https://wiki.osdev.org/Page_Frame_Allocation#Physical_Memory_Allocators

In the following, we will use Multiboot's memory map directly. The idea is to maintain a simple counter that starts at frame 0 and only ever increases. If the current frame is available (part of an available area in the memory map) and not used by the kernel or the multiboot structure (we know their start and end addresses), we know that it's free and return it. Else, we increase the counter to the next possibly free frame. That way, we don't need to create a data structure when booting and the physical frames can remain unmapped. The only problem is that we cannot reasonably free frames again, but we will solve that problem in a future post (by adding an intermediate frame stack that saves freed frames).
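
Before tying the idea to the Multiboot types, the counter logic can be sketched on plain numbers. This is a simplified, hypothetical model that runs on a host (it uses `Vec`, so it assumes `std`): `CounterAllocator`, its fields, and the tuple ranges are illustrative names, all values are frame numbers, and all ranges are inclusive.

```rust
/// Hypothetical, simplified model of the counter idea (illustration only).
pub struct CounterAllocator {
    next: usize,
    /// Available areas from the memory map, sorted by start frame.
    available: Vec<(usize, usize)>,
    /// Frames used by the kernel and the Multiboot structure.
    reserved: Vec<(usize, usize)>,
}

impl CounterAllocator {
    pub fn new(available: Vec<(usize, usize)>, reserved: Vec<(usize, usize)>) -> Self {
        CounterAllocator { next: 0, available: available, reserved: reserved }
    }

    pub fn allocate(&mut self) -> Option<usize> {
        loop {
            let next = self.next;
            // First area that still has frames at or above the counter.
            let area = self.available.iter().find(|a| a.1 >= next);
            let (start, _end) = match area {
                Some(&range) => range,
                None => return None, // no free frames left
            };
            if next < start {
                self.next = start; // jump over the unavailable gap
                continue;
            }
            // Skip frames that are already in use.
            let used = self.reserved.iter().find(|r| r.0 <= next && next <= r.1).cloned();
            if let Some((_, last_used)) = used {
                self.next = last_used + 1;
                continue;
            }
            self.next += 1;
            return Some(next);
        }
    }
}
```

With available areas `(0, 3)` and `(8, 9)` and the reserved frame `2`, successive calls return the frames 0, 1, 3, 8, 9 and then `None`. No data structure has to be built up front, but there is also no way to hand a frame back.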

So let's start implementing our memory map based frame allocator.

### A Memory Module
First we create a memory module with a `Frame` type (`src/memory/mod.rs`):

```rust
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Frame {
    number: usize,
}
```

(Don't forget to add the `mod memory` line to `src/lib.rs`.) Instead of e.g. the start address, we just store the frame number. We use `usize` here since the number of frames depends on the memory size. The long `derive` line makes frames printable and comparable.

To make it easy to get the corresponding frame for a physical address, we add a `containing_address` method:

```rust
pub const PAGE_SIZE: usize = 4096;

impl Frame {
    fn containing_address(address: usize) -> Frame {
        Frame{ number: address / PAGE_SIZE }
    }
}
```

We also add a `FrameAllocator` trait:

```rust
pub trait FrameAllocator {
    fn allocate_frame(&mut self) -> Option<Frame>;
    fn deallocate_frame(&mut self, frame: Frame);
}
```

This allows us to create another, more advanced frame allocator in the future.

### The Allocator
Now we can put everything together and create the actual frame allocator. To do so, we create a `src/memory/area_frame_allocator.rs` submodule. The allocator struct looks like this:

```rust
use memory::{Frame, FrameAllocator};
use multiboot2::{MemoryAreaIter, MemoryArea};

pub struct AreaFrameAllocator {
    next_free_frame: Frame,
    current_area: Option<&'static MemoryArea>,
    areas: MemoryAreaIter,
    kernel_start: Frame,
    kernel_end: Frame,
    multiboot_start: Frame,
    multiboot_end: Frame,
}
```

The `next_free_frame` field is a simple counter that is increased every time we return a frame. It's initialized to `0` and every frame below it counts as used. The `current_area` field holds the memory area that contains `next_free_frame`. If `next_free_frame` leaves this area, we will look for the next one in `areas`. When there are no areas left, all frames are used and `current_area` becomes `None`.
The `{kernel, multiboot}_{start, end}` fields are used to avoid returning already used frames.

To implement the `FrameAllocator` trait, we need to implement the allocation and deallocation methods:

```rust
impl FrameAllocator for AreaFrameAllocator {
    fn allocate_frame(&mut self) -> Option<Frame> {
        // TODO (see below)
    }

    fn deallocate_frame(&mut self, frame: Frame) {
        // TODO (see below)
    }
}
```

The `allocate_frame` method looks like this:

```rust
// in `allocate_frame` in `impl FrameAllocator for AreaFrameAllocator`

if let Some(area) = self.current_area {
    // "Clone" the frame to return it if it's free. Frame doesn't
    // implement Clone, but we can construct an identical frame.
    let frame = Frame{ number: self.next_free_frame.number };

    // the last frame of the current area
    let current_area_last_frame = {
        let address = area.base_addr + area.length - 1;
        Frame::containing_address(address as usize)
    };

    if frame > current_area_last_frame {
        // all frames of current area are used, switch to next area
        self.choose_next_area();
    } else if frame >= self.kernel_start && frame <= self.kernel_end {
        // `frame` is used by the kernel
        self.next_free_frame = Frame {
            number: self.kernel_end.number + 1
        };
    } else if frame >= self.multiboot_start && frame <= self.multiboot_end {
        // `frame` is used by the multiboot information structure
        self.next_free_frame = Frame {
            number: self.multiboot_end.number + 1
        };
    } else {
        // frame is unused, increment `next_free_frame` and return it
        self.next_free_frame.number += 1;
        return Some(frame);
    }
    // `frame` was not valid, try it again with the updated `next_free_frame`
    self.allocate_frame()
} else {
    None // no free frames left
}
```

The `choose_next_area` method isn't part of the trait and thus goes into a new `impl AreaFrameAllocator` block:

```rust
// in `impl AreaFrameAllocator`

fn choose_next_area(&mut self) {
    self.current_area = self.areas.clone().filter(|area| {
        let address = area.base_addr + area.length - 1;
        Frame::containing_address(address as usize) >= self.next_free_frame
    }).min_by_key(|area| area.base_addr);

    if let Some(area) = self.current_area {
        let start_frame = Frame::containing_address(area.base_addr as usize);
        if self.next_free_frame < start_frame {
            self.next_free_frame = start_frame;
        }
    }
}
```

This function chooses the area with the minimal base address that still has free frames, i.e. `next_free_frame` is not beyond its last frame. Note that we need to clone the iterator because the [min_by_key] function consumes it. If there are no areas with free frames left, `min_by_key` automatically returns the desired `None`.

[min_by_key]: https://doc.rust-lang.org/nightly/core/iter/trait.Iterator.html#method.min_by_key

If the `next_free_frame` is below the new `current_area`, it needs to be updated to the area's start frame. Else, the `allocate_frame` call could return an unavailable frame.

We don't have a data structure to store free frames, so we can't implement `deallocate_frame` reasonably. Thus we use the `unimplemented` macro, which just panics when the method is called:

```rust
impl FrameAllocator for AreaFrameAllocator {
    fn allocate_frame(&mut self) -> Option<Frame> {
        // described above
    }

    fn deallocate_frame(&mut self, _frame: Frame) {
        unimplemented!()
    }
}
```

Now we only need a constructor function to make the allocator usable:

```rust
// in `impl AreaFrameAllocator`

pub fn new(kernel_start: usize, kernel_end: usize,
      multiboot_start: usize, multiboot_end: usize,
      memory_areas: MemoryAreaIter) -> AreaFrameAllocator
{
    let mut allocator = AreaFrameAllocator {
        next_free_frame: Frame::containing_address(0),
        current_area: None,
        areas: memory_areas,
        kernel_start: Frame::containing_address(kernel_start),
        kernel_end: Frame::containing_address(kernel_end),
        multiboot_start: Frame::containing_address(multiboot_start),
        multiboot_end: Frame::containing_address(multiboot_end),
    };
    allocator.choose_next_area();
    allocator
}
```

Note that we call `choose_next_area` manually here because `allocate_frame` returns `None` as soon as `current_area` is `None`.
So by calling `choose_next_area` we initialize it to the area with the minimal base address. ### Testing it In order to test it in main, we need to [re-export] the `AreaFrameAllocator` in the `memory` module. Then we can create a new allocator: [re-export]: https://doc.rust-lang.org/1.30.0/book/first-edition/crates-and-modules.html#re-exporting-with-pub-use ```rust let mut frame_allocator = memory::AreaFrameAllocator::new( kernel_start as usize, kernel_end as usize, multiboot_start, multiboot_end, memory_map_tag.memory_areas()); ``` Now we can test it by adding some frame allocations: ```rust println!("{:?}", frame_allocator.allocate_frame()); ``` You will see that the frame number starts at `0` and increases steadily, but the kernel and Multiboot frames are left out (you need to allocate many frames to see this since the kernel starts at frame 256). The following `for` loop allocates all frames and prints out the total number of allocated frames: ```rust for i in 0.. { if let None = frame_allocator.allocate_frame() { println!("allocated {} frames", i); break; } } ``` You can try different amounts of memory by passing e.g. `-m 500M` to QEMU. To compare these numbers, [WolframAlpha] can be very helpful. [WolframAlpha]: https://www.wolframalpha.com/input/?i=%2832605+*+4096%29+bytes+in+MiB ## Conclusion Now we have a working frame allocator. It is a bit rudimentary and cannot free frames, but it also is very fast since it reuses the Multiboot memory map and does not need any costly initialization. A future post will build upon this allocator and add a stack-like data structure for freed frames. ## What's next? The [next post] will be about paging again. We will use the frame allocator to create a safe module that allows us to switch page tables and map pages. Then we will use this module and the information from the Elf-sections tag to remap the kernel correctly. 

[next post]: @/edition-1/posts/06-page-tables/index.md

## Recommended Posts
Eric Kidd started the [Bare Metal Rust] series last week. Like this post, it builds upon the code from [Printing to Screen], but tries to support keyboard input instead of wrestling through memory management details.

[Bare Metal Rust]: http://www.randomhacks.net/bare-metal-rust/
[Printing to Screen]: @/edition-1/posts/04-printing-to-screen/index.md

================================================
FILE: blog/content/edition-1/posts/06-page-tables/index.md
================================================
+++
title = "Page Tables"
weight = 6
path = "page-tables"
aliases = ["page-tables.html", "modifying-page-tables.html"]
date = 2015-12-09
template = "edition-1/page.html"
+++

In this post we will create a paging module, which allows us to access and modify the 4-level page table. We will explore recursive page table mapping and use some Rust features to make it safe. Finally we will create functions to translate virtual addresses and to map and unmap pages.

You can find the source code and this post itself on [GitHub][source repository]. Please file an issue there if you have any problems or improvement suggestions. There is also a comment section at the end of this page. Note that this post requires a current Rust nightly.

[source repository]: https://github.com/phil-opp/blog_os/tree/first_edition_post_6

## Paging
_Paging_ is a memory management scheme that separates virtual and physical memory. The address space is split into equal sized _pages_ and _page tables_ specify which virtual page points to which physical frame. For an extensive paging introduction take a look at the paging chapter ([PDF][paging chapter]) of the [Three Easy Pieces] OS book.

[paging chapter]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-paging.pdf
[Three Easy Pieces]: http://pages.cs.wisc.edu/~remzi/OSTEP/

The x86 architecture uses a 4-level page table in 64-bit mode.
A virtual address has the following structure:

![structure of a virtual address on x86](x86_address_structure.svg)

The bits 48–63 are so-called _sign extension_ bits and must be copies of bit 47. The following 36 bits define the page table indexes (9 bits per table) and the last 12 bits specify the offset in the 4KiB page.

Each table has 2^9 = 512 entries and each entry is 8 bytes. Thus a page table fits exactly in one page (4 KiB).

To translate an address, the CPU reads the P4 address from the CR3 register. Then it uses the indexes to walk the tables:

![translation of virtual to physical addresses in 64 bit mode](X86_Paging_64bit.svg)

The P4 entry points to a P3 table, where the next 9 bits of the address are used to select an entry. The P3 entry then points to a P2 table and the P2 entry points to a P1 table. The P1 entry, which is specified through bits 12–20, finally points to the physical frame.

## A Basic Paging Module
Let's create a basic paging module in `memory/paging/mod.rs`:

```rust
use memory::PAGE_SIZE; // needed later

const ENTRY_COUNT: usize = 512;

pub type PhysicalAddress = usize;
pub type VirtualAddress = usize;

pub struct Page {
   number: usize,
}
```

We import the `PAGE_SIZE` and define a constant for the number of entries per table. To make future function signatures more expressive, we can use the type aliases `PhysicalAddress` and `VirtualAddress`. The `Page` struct is similar to the `Frame` struct in the [previous post], but represents a virtual page instead of a physical frame.

[previous post]: @/edition-1/posts/05-allocating-frames/index.md#a-memory-module

### Page Table Entries
To model page table entries, we create a new `entry` submodule:

```rust
use memory::Frame; // needed later

pub struct Entry(u64);

impl Entry {
    pub fn is_unused(&self) -> bool {
        self.0 == 0
    }

    pub fn set_unused(&mut self) {
        self.0 = 0;
    }
}
```

We define that an unused entry is completely 0.
That allows us to distinguish unused entries from other non-present entries in the future. For example, we could define one of the available bits as the `swapped_out` bit for pages that are swapped to disk.

Next we will model the contained physical address and the various flags. Remember, entries have the following format:

Bit(s)                | Name | Meaning
--------------------- | ------ | ----------------------------------
0 | present | the page is currently in memory
1 | writable | it's allowed to write to this page
2 | user accessible | if not set, only kernel mode code can access this page
3 | write through caching | writes go directly to memory
4 | disable cache | no cache is used for this page
5 | accessed | the CPU sets this bit when this page is used
6 | dirty | the CPU sets this bit when a write to this page occurs
7 | huge page/null | must be 0 in P1 and P4, creates a 1GiB page in P3, creates a 2MiB page in P2
8 | global | page isn't flushed from caches on address space switch (PGE bit of CR4 register must be set)
9-11 | available | can be used freely by the OS
12-51 | physical address | the page aligned 52bit physical address of the frame or the next page table
52-62 | available | can be used freely by the OS
63 | no execute | forbid executing code on this page (the NXE bit in the EFER register must be set)

To model the various flags, we will use the [bitflags] crate. To add it as a dependency, add the following to your `Cargo.toml`:

[bitflags]: https://github.com/rust-lang-nursery/bitflags

```toml
[dependencies]
...
bitflags = "0.9.1"
```

To import the macro, we need to use `#[macro_use]` above the `extern crate` definition:

```rust
// in src/lib.rs
#[macro_use]
extern crate bitflags;
```

Now we can model the various flags:

```rust
bitflags! {
    pub struct EntryFlags: u64 {
        const PRESENT =         1 << 0;
        const WRITABLE =        1 << 1;
        const USER_ACCESSIBLE = 1 << 2;
        const WRITE_THROUGH =   1 << 3;
        const NO_CACHE =        1 << 4;
        const ACCESSED =        1 << 5;
        const DIRTY =           1 << 6;
        const HUGE_PAGE =       1 << 7;
        const GLOBAL =          1 << 8;
        const NO_EXECUTE =      1 << 63;
    }
}
```

To extract the flags from the entry we create an `Entry::flags` method that uses [from_bits_truncate]:

[from_bits_truncate]: https://docs.rs/bitflags/0.9.1/bitflags/example_generated/struct.Flags.html#method.from_bits_truncate

```rust
pub fn flags(&self) -> EntryFlags {
    EntryFlags::from_bits_truncate(self.0)
}
```

This allows us to check for flags through the `contains()` function. For example, `flags().contains(PRESENT | WRITABLE)` returns true if the entry contains _both_ flags.

To extract the physical address, we add a `pointed_frame` method:

```rust
pub fn pointed_frame(&self) -> Option<Frame> {
    if self.flags().contains(PRESENT) {
        Some(Frame::containing_address(
            self.0 as usize & 0x000fffff_fffff000
        ))
    } else {
        None
    }
}
```

If the entry is present, we mask bits 12–51 and return the corresponding frame. If the entry is not present, it does not point to a valid frame so we return `None`.

To modify entries, we add a `set` method that updates the flags and the pointed frame:

```rust
pub fn set(&mut self, frame: Frame, flags: EntryFlags) {
    assert!(frame.start_address() & !0x000fffff_fffff000 == 0);
    self.0 = (frame.start_address() as u64) | flags.bits();
}
```

The start address of a frame should be page aligned and smaller than 2^52 (since x86 uses 52bit physical addresses). Since an invalid address could mess up the entry, we add an assertion. To actually set the entry, we just need to `or` the start address and the flag bits.

The missing `Frame::start_address` method is pretty simple:

```rust
use self::paging::PhysicalAddress;

fn start_address(&self) -> PhysicalAddress {
    self.number * PAGE_SIZE
}
```

We add it to the `impl Frame` block in `memory/mod.rs`.
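
To see the bit arithmetic of `set` and `pointed_frame` in isolation, here is a minimal sketch that works on raw `u64` values instead of the `Entry` and `EntryFlags` types. The names `ADDRESS_MASK`, `pack_entry`, and `pointed_address` are made up for this illustration and are not part of the kernel code. Unlike `Entry::set`, which asserts, the sketch returns `None` for an invalid address so that it can be tried on a normal host:

```rust
/// Bits 12-51 of an entry hold the physical start address (assumed name).
const ADDRESS_MASK: u64 = 0x000f_ffff_ffff_f000;

// Plain-integer stand-ins for three of the `EntryFlags` constants.
const PRESENT: u64 = 1 << 0;
const WRITABLE: u64 = 1 << 1;
const NO_EXECUTE: u64 = 1 << 63;

/// Combines a physical start address with flag bits.
/// Returns `None` if the address is not page aligned or uses bits above 51.
fn pack_entry(start_address: u64, flags: u64) -> Option<u64> {
    if start_address & !ADDRESS_MASK != 0 {
        return None;
    }
    Some(start_address | flags)
}

/// Returns the physical start address if the present bit is set.
fn pointed_address(entry: u64) -> Option<u64> {
    if entry & PRESENT != 0 {
        Some(entry & ADDRESS_MASK)
    } else {
        None
    }
}
```

For example, `pack_entry(0x1000, PRESENT | WRITABLE)` yields the raw entry `0x1003`, and `pointed_address(0x1003)` gives back `0x1000`. Because the flags live outside of bits 12–51, masking removes them again, including the `NO_EXECUTE` bit at position 63.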

### Page Tables
To model page tables, we create a basic `Table` struct in a new `table` submodule:

```rust
use memory::paging::entry::*;
use memory::paging::ENTRY_COUNT;

pub struct Table {
    entries: [Entry; ENTRY_COUNT],
}
```

It's just an array of 512 page table entries.

To make the `Table` indexable itself, we can implement the `Index` and `IndexMut` traits:

```rust
use core::ops::{Index, IndexMut};

impl Index<usize> for Table {
    type Output = Entry;

    fn index(&self, index: usize) -> &Entry {
        &self.entries[index]
    }
}

impl IndexMut<usize> for Table {
    fn index_mut(&mut self, index: usize) -> &mut Entry {
        &mut self.entries[index]
    }
}
```

Now it's possible to get the 42nd entry through `some_table[42]`. Of course we could replace `usize` with `u32` or even `u16` here but it would cause more numerical conversions (`x as u16`).

Let's add a method that sets all entries to unused. We will need it when we create new page tables in the future. The method looks like this:

```rust
pub fn zero(&mut self) {
    for entry in self.entries.iter_mut() {
        entry.set_unused();
    }
}
```

Now we can read page tables and retrieve the mapping information. We can also update them through the `IndexMut` trait and the `Entry::set` method. But how do we get references to the various page tables?

We could read the `CR3` register to get the physical address of the P4 table and read its entries to get the P3 addresses. The P3 entries then point to the P2 tables and so on. But this method only works for identity-mapped pages. In the future we will create new page tables, which aren't in the identity-mapped area anymore. Since we can't access them through their physical address, we need a way to map them to virtual addresses.

## Mapping Page Tables
So how do we map the page tables themselves? We don't have that problem for the current P4, P3, and P2 table since they are part of the identity-mapped area, but we need a way to access future tables, too.

One solution is to identity map all page tables.
That way we would not need to differentiate virtual and physical addresses and could easily access the tables. But it clutters the virtual address space and increases fragmentation. And it makes creating page tables much more complicated since we need a physical frame whose corresponding page isn't already used for something else.

An alternative solution is to map the page tables only temporarily. To read/write a page table, we would map it to some free virtual address until we're done. We could use a small pool of such virtual addresses and reuse them for various tables. This method occupies only a few virtual addresses and thus is a good solution for 32-bit systems, which have small address spaces. But it makes things much more complicated since we need to temporarily map up to 4 tables to access a single page. And the temporary mapping requires modification of other page tables, which need to be mapped, too.

We will solve the problem in another way using a trick called _recursive mapping_.

### Recursive Mapping
The trick is to map the P4 table recursively: The last entry doesn't point to a P3 table, but to the P4 table itself. We can use this entry to remove a translation level so that we land on a page table instead. For example, we can “loop” once to access a P1 table:

![access P1 table through recursive paging](recursive_mapping_access_p1.svg)

By selecting the 511th P4 entry, which points to the P4 table itself, the P4 table is used as the P3 table. Similarly, the P3 table is used as a P2 table and the P2 table is treated like a P1 table. Thus the P1 table becomes the target page and can be accessed through the offset.

It's also possible to access P2 tables by looping twice. And if we select the 511th entry three times, we can access and modify P3 tables:

![access P3 table through recursive paging](recursive_mapping_access_p3.svg)

So we just need to specify the desired P3 table in the address through the P1 index.
By choosing the 511th entry multiple times, we stay on the P4 table until the address's P1 index becomes the actual P4 index.

To access the P4 table itself, we loop once more and thus never leave the frame:

![access P4 table through recursive paging](recursive_mapping_access_p4.svg)

So we can access and modify page tables of all levels by just setting one P4 entry once. Most work is done by the CPU; we just use the recursive entry to remove one or more translation levels. It may seem a bit strange at first, but it's a clean and simple solution once you've wrapped your head around it.

By using recursive mapping, each page table is accessible through a unique virtual address. The math checks out, too: If all page tables are used, there is 1 P4 table, 511 P3 tables (the last entry is used for the recursive mapping), `511*512` P2 tables, and `511*512*512` P1 tables. So there are `134217728` page tables altogether. Each page table occupies 4KiB, so we need `134217728 * 4KiB = 512GiB` to store them. That's exactly the amount of memory that can be accessed through one P4 entry since `4KiB per page * 512 P1 entries * 512 P2 entries * 512 P3 entries = 512GiB`.

Of course recursive mapping has some disadvantages, too. It occupies a P4 entry and thus 512GiB of the virtual address space. But since we're in long mode and have a 48-bit address space (256TiB), there are still 255.5TiB left. The bigger problem is that only the active table can be modified by default. To access another table, the recursive entry needs to be replaced temporarily. We will tackle this problem in the next post when we switch to a new page table.

### Implementation
To map the P4 table recursively, we just need to point the 511th entry to the table itself. Of course we could do it in Rust, but it would require some fiddling with unsafe pointers.
It's easier to just add some lines to our boot assembly:

```nasm
mov eax, p4_table
or eax, 0b11 ; present + writable
mov [p4_table + 511 * 8], eax
```

I put it right after the `set_up_page_tables` label, but you can add it wherever you like.

Now we can use special virtual addresses to access the page tables. The P4 table is available at `0xfffffffffffff000`. Let's add a P4 constant to the `table` submodule:

```rust
pub const P4: *mut Table = 0xffffffff_fffff000 as *mut _;
```

Let's switch to the octal system, since it makes more sense for the other special addresses. The P4 address from above is equivalent to `0o177777_777_777_777_777_0000` in octal. You can see that it has index `777` in all tables and offset `0000`. The `177777` bits on the left are the sign extension bits, which are copies of the 47th bit. They are required because x86 only uses 48bit virtual addresses.

The other tables can be accessed through the following addresses:

Table | Address                         | Indexes
----- | ------------------------------- | ----------------------------------
P4    | `0o177777_777_777_777_777_0000` | –
P3    | `0o177777_777_777_777_XXX_0000` | `XXX` is the P4 index
P2    | `0o177777_777_777_XXX_YYY_0000` | like above, and `YYY` is the P3 index
P1    | `0o177777_777_XXX_YYY_ZZZ_0000` | like above, and `ZZZ` is the P2 index

If we look closely, we can see that the P3 address is equal to `(P4 << 9) | XXX_0000`. And the P2 address is calculated through `(P3 << 9) | YYY_0000`. So to get the next address, we need to shift it 9 bits to the left and add the table index.
As a formula:

```
next_table_address = (table_address << 9) | (index << 12)
```

### The `next_table` Methods
Let's add the above formula as a `Table` method:

```rust
fn next_table_address(&self, index: usize) -> Option<usize> {
    let entry_flags = self[index].flags();
    if entry_flags.contains(PRESENT) && !entry_flags.contains(HUGE_PAGE) {
        let table_address = self as *const _ as usize;
        Some((table_address << 9) | (index << 12))
    } else {
        None
    }
}
```

The next table address is only valid if the corresponding entry is present and does not create a huge page. Then we can do some pointer casting to get the table address and use the formula to calculate the next address.

If the index is out of bounds, the function will panic since Rust checks array bounds. The panic is desired here since a wrong index should not be possible and indicates a bug.

To convert the address into references, we add two functions:

```rust
pub fn next_table(&self, index: usize) -> Option<&Table> {
    self.next_table_address(index)
        .map(|address| unsafe { &*(address as *const _) })
}

pub fn next_table_mut(&mut self, index: usize) -> Option<&mut Table> {
    self.next_table_address(index)
        .map(|address| unsafe { &mut *(address as *mut _) })
}
```

We convert the address into raw pointers through `as` casts and then convert them into Rust references through `&*` and `&mut *`. The latter is an `unsafe` operation since Rust can't guarantee that the raw pointer is valid.

Note that `self` stays borrowed as long as the returned reference is valid. This is because of Rust's [lifetime elision] rules. Basically, these rules say that the lifetime of an output reference is the same as the lifetime of the input reference by default.
So the above function signatures are expanded to:

[lifetime elision]: https://doc.rust-lang.org/1.30.0/book/first-edition/lifetimes.html#lifetime-elision

```rust
pub fn next_table<'a>(&'a self, index: usize) -> Option<&'a Table> {...}

pub fn next_table_mut<'a>(&'a mut self, index: usize)
    -> Option<&'a mut Table>
{...}
```

Note the additional lifetime parameters, which are identical for input and output references. That's exactly what we want. It ensures that we can't modify tables as long as we have references to lower tables. For example, it would be very bad if we could unmap a P3 table while we still write to one of its P2 tables.

#### Safety
Now we can start at the `P4` constant and use the `next_table` functions to access the lower tables. And we don't even need `unsafe` blocks to do it! Right now, your alarm bells should be ringing. Thanks to Rust, everything we've done before in this post was completely safe. But we just introduced two unsafe blocks to convince Rust that there are valid tables at the specified addresses. Can we really be sure?

First, these addresses are only valid if the P4 table is mapped recursively. Since the paging module will be the only module that modifies page tables, we can introduce an invariant for the module:

> _The 511th entry of the active P4 table must always be mapped to the active P4 table itself._

So if we switch to another P4 table at some time, it needs to be recursively mapped _before_ it becomes active. As long as we obey this invariant, we can safely use the special addresses. But even with this invariant, there is a big problem with the two methods:

_What happens if we call them on a P1 table?_

Well, they would calculate the address of the next table (which does not exist) and treat it as a page table. Either they construct an invalid address (if `XXX < 0o400`)[^fn-invalid-address] or access the mapped page itself. That way, we could easily corrupt memory or cause CPU exceptions by accident.
So these two functions are not _safe_ in Rust terms. Thus we need to make them `unsafe` functions unless we find some clever solution.

## Some Clever Solution
We can use Rust's type system to statically guarantee that the `next_table` methods can only be called on P4, P3, and P2 tables, but not on a P1 table. The idea is to add a `Level` parameter to the `Table` type and implement the `next_table` methods only for level 4, 3, and 2.

To model the levels we use a trait and empty enums:

```rust
pub trait TableLevel {}

pub enum Level4 {}
pub enum Level3 {}
pub enum Level2 {}
pub enum Level1 {}

impl TableLevel for Level4 {}
impl TableLevel for Level3 {}
impl TableLevel for Level2 {}
impl TableLevel for Level1 {}
```

An empty enum has size zero and disappears completely after compiling. Unlike an empty struct, it's not possible to instantiate an empty enum. Since we will use `TableLevel` and the table levels in exported types, they need to be public.

To differentiate the P1 table from the other tables, we introduce a `HierarchicalLevel` trait, which is a subtrait of `TableLevel`. But we implement it only for the levels 4, 3, and 2:

```rust
pub trait HierarchicalLevel: TableLevel {}

impl HierarchicalLevel for Level4 {}
impl HierarchicalLevel for Level3 {}
impl HierarchicalLevel for Level2 {}
```

Now we add the level parameter to the `Table` type:

```rust
use core::marker::PhantomData;

pub struct Table<L: TableLevel> {
    entries: [Entry; ENTRY_COUNT],
    level: PhantomData<L>,
}
```

We need to add a [PhantomData] field because unused type parameters are not allowed in Rust.

[PhantomData]: https://doc.rust-lang.org/core/marker/struct.PhantomData.html#unused-type-parameters

Since we changed the `Table` type, we need to update every use of it:

```rust
pub const P4: *mut Table<Level4> = 0xffffffff_fffff000 as *mut _;
...
impl<L> Table<L> where L: TableLevel
{
    pub fn zero(&mut self) {...}
}

impl<L> Table<L> where L: HierarchicalLevel
{
    pub fn next_table(&self, index: usize) -> Option<&Table<???>> {...}

    pub fn next_table_mut(&mut self, index: usize) -> Option<&mut Table<???>>
    {...}

    fn next_table_address(&self, index: usize) -> Option<usize> {...}
}

impl<L> Index<usize> for Table<L> where L: TableLevel {...}

impl<L> IndexMut<usize> for Table<L> where L: TableLevel {...}
```

Now the `next_table` methods are only available for P4, P3, and P2 tables. But they have the incomplete return type `Table<???>` now. What should we fill in for the `???`?

For a P4 table we would like to return a `Table<Level3>`, for a P3 table a `Table<Level2>`, and for a P2 table a `Table<Level1>`. So we want to return a table of the _next level_.

We can define the next level by adding an associated type to the `HierarchicalLevel` trait:

```rust
pub trait HierarchicalLevel: TableLevel {
    type NextLevel: TableLevel;
}

impl HierarchicalLevel for Level4 {
    type NextLevel = Level3;
}

impl HierarchicalLevel for Level3 {
    type NextLevel = Level2;
}

impl HierarchicalLevel for Level2 {
    type NextLevel = Level1;
}
```

Now we can replace the `Table<???>` types with `Table<L::NextLevel>` types and our code works as intended. You can try it with a simple test function:

```rust
fn test() {
    let p4 = unsafe { &*P4 };
    p4.next_table(42)
      .and_then(|p3| p3.next_table(1337))
      .and_then(|p2| p2.next_table(0xdeadbeaf))
      .and_then(|p1| p1.next_table(0xcafebabe))
}
```

Most of the indexes are completely out of bounds, so it would panic if it's called. But we don't need to call it since it already fails at compile time:

```
error: no method named `next_table` found for type
  `&memory::paging::table::Table<memory::paging::table::Level1>`
  in the current scope
```

Remember that this is bare metal kernel code. We just used type system magic to make low-level page table manipulations safer. Rust is just awesome!

## Translating Addresses
Now let's do something useful with our new module. We will create a function that translates a virtual address to the corresponding physical address.
We add it to the `paging/mod.rs` module:

```rust
pub fn translate(virtual_address: VirtualAddress) -> Option<PhysicalAddress> {
    let offset = virtual_address % PAGE_SIZE;
    translate_page(Page::containing_address(virtual_address))
        .map(|frame| frame.number * PAGE_SIZE + offset)
}
```

It uses two functions we haven't defined yet: `translate_page` and `Page::containing_address`. Let's start with the latter:

```rust
pub fn containing_address(address: VirtualAddress) -> Page {
    assert!(address < 0x0000_8000_0000_0000 ||
        address >= 0xffff_8000_0000_0000,
        "invalid address: 0x{:x}", address);
    Page { number: address / PAGE_SIZE }
}
```

The assertion is needed because there can be invalid addresses. Addresses on x86 are just 48 bits long and the other bits are just _sign extension_, i.e. a copy of the most significant bit. For example:

```
invalid address: 0x0000_8000_0000_0000
valid address:   0xffff_8000_0000_0000
                        └── bit 47
```

So the address space is split into two halves: the _higher half_ containing addresses with sign extension and the _lower half_ containing addresses without. Everything in between is invalid.

Since we added `containing_address`, we add the inverse method as well (maybe we need it later):

```rust
fn start_address(&self) -> usize {
    self.number * PAGE_SIZE
}
```

The other missing function, `translate_page`, looks like this:

```rust
use memory::Frame;

fn translate_page(page: Page) -> Option<Frame> {
    use self::entry::HUGE_PAGE;

    let p3 = unsafe { &*table::P4 }.next_table(page.p4_index());

    let huge_page = || {
        // TODO
    };

    p3.and_then(|p3| p3.next_table(page.p3_index()))
      .and_then(|p2| p2.next_table(page.p2_index()))
      .and_then(|p1| p1[page.p1_index()].pointed_frame())
      .or_else(huge_page)
}
```

We use an unsafe block to convert the raw `P4` pointer to a reference. Then we use the [Option::and_then] function to go through the four table levels. If some entry along the way is `None`, we check if the page is a huge page through the (unimplemented) `huge_page` closure.
The `Page::p*_index` functions return the different table indexes. They look like this:

[Option::and_then]: https://doc.rust-lang.org/nightly/core/option/enum.Option.html#method.and_then

```rust
fn p4_index(&self) -> usize {
    (self.number >> 27) & 0o777
}
fn p3_index(&self) -> usize {
    (self.number >> 18) & 0o777
}
fn p2_index(&self) -> usize {
    (self.number >> 9) & 0o777
}
fn p1_index(&self) -> usize {
    (self.number >> 0) & 0o777
}
```

### Safety
We use an `unsafe` block to convert the raw `P4` pointer into a shared reference. It's safe because we don't create any `&mut` references to the table right now and don't switch the P4 table either. But as soon as we do something like that, we have to revisit this method.

### Huge Pages

The `huge_page` closure calculates the corresponding frame if huge pages are used. Its content looks like this:

```rust
p3.and_then(|p3| {
    let p3_entry = &p3[page.p3_index()];
    // 1GiB page?
    if let Some(start_frame) = p3_entry.pointed_frame() {
        if p3_entry.flags().contains(HUGE_PAGE) {
            // address must be 1GiB aligned
            assert!(start_frame.number % (ENTRY_COUNT * ENTRY_COUNT) == 0);
            return Some(Frame {
                number: start_frame.number + page.p2_index() *
                        ENTRY_COUNT + page.p1_index(),
            });
        }
    }
    if let Some(p2) = p3.next_table(page.p3_index()) {
        let p2_entry = &p2[page.p2_index()];
        // 2MiB page?
        if let Some(start_frame) = p2_entry.pointed_frame() {
            if p2_entry.flags().contains(HUGE_PAGE) {
                // address must be 2MiB aligned
                assert!(start_frame.number % ENTRY_COUNT == 0);
                return Some(Frame {
                    number: start_frame.number + page.p1_index()
                });
            }
        }
    }
    None
})
```

This function is much longer and more complex than the `translate_page` function itself. To avoid this complexity in the future, we will only work with standard 4KiB pages from now on.
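
The index and huge page arithmetic from this section can be tried outside the kernel. The following is a minimal sketch on plain `usize` values, assuming a 64-bit host. The function names `table_indexes`, `frame_in_2mib_page`, and `frame_in_1gib_page` are invented for this illustration; they only mirror the calculations of the `p*_index` methods and the `huge_page` closure:

```rust
const PAGE_SIZE: usize = 4096;
const ENTRY_COUNT: usize = 512;

/// Splits a virtual address into its (P4, P3, P2, P1) table indexes,
/// using the same shifts and masks as the `Page::p*_index` methods.
fn table_indexes(address: usize) -> (usize, usize, usize, usize) {
    let page_number = address / PAGE_SIZE;
    (
        (page_number >> 27) & 0o777,
        (page_number >> 18) & 0o777,
        (page_number >> 9) & 0o777,
        page_number & 0o777,
    )
}

/// Frame number of a 4KiB page inside a 2MiB huge page that starts at
/// `start_frame` (which must be a multiple of 512).
fn frame_in_2mib_page(start_frame: usize, p1_index: usize) -> usize {
    start_frame + p1_index
}

/// Frame number of a 4KiB page inside a 1GiB huge page that starts at
/// `start_frame` (which must be a multiple of 512 * 512).
fn frame_in_1gib_page(start_frame: usize, p2_index: usize, p1_index: usize) -> usize {
    start_frame + p2_index * ENTRY_COUNT + p1_index
}
```

For example, `table_indexes(42 * 512 * 512 * 4096)` returns `(0, 42, 0, 0)`, so that address selects the P3 entry with index 42. The recursive P4 address `0xffff_ffff_ffff_f000` decomposes into `(511, 511, 511, 511)`, which matches the octal form `777_777_777_777` shown earlier.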

## Mapping Pages
Let's add a function that modifies the page tables to map a `Page` to a `Frame`:

```rust
pub use self::entry::*;
use memory::FrameAllocator;

pub fn map_to<A>(page: Page, frame: Frame, flags: EntryFlags,
                 allocator: &mut A)
    where A: FrameAllocator
{
    let p4 = unsafe { &mut *P4 };
    let mut p3 = p4.next_table_create(page.p4_index(), allocator);
    let mut p2 = p3.next_table_create(page.p3_index(), allocator);
    let mut p1 = p2.next_table_create(page.p2_index(), allocator);

    assert!(p1[page.p1_index()].is_unused());
    p1[page.p1_index()].set(frame, flags | PRESENT);
}
```

We add a re-export for all `entry` types since they are required to call the function. We assert that the page is unmapped and always set the present flag (since it wouldn't make sense to map a page without setting it).

The `Table::next_table_create` method doesn't exist yet. It should return the next table if it exists, or create a new one. For the implementation we need the `FrameAllocator` from the [previous post] and the `Table::zero` method:

```rust
use memory::FrameAllocator;

pub fn next_table_create<A>(&mut self,
                            index: usize,
                            allocator: &mut A)
                            -> &mut Table<L::NextLevel>
    where A: FrameAllocator
{
    if self.next_table(index).is_none() {
        assert!(!self.entries[index].flags().contains(HUGE_PAGE),
                "mapping code does not support huge pages");
        let frame = allocator.allocate_frame().expect("no frames available");
        self.entries[index].set(frame, PRESENT | WRITABLE);
        self.next_table_mut(index).unwrap().zero();
    }
    self.next_table_mut(index).unwrap()
}
```

We can use `unwrap()` here since the next table definitely exists.

### Safety
We used an `unsafe` block in `map_to` to convert the raw `P4` pointer to a `&mut` reference. That's bad. It's now possible that the `&mut` reference is not exclusive, which breaks Rust's guarantees. It's only a matter of time before we run into a data race.
For example, imagine that one thread maps an entry to `frame_A` and another thread (on the same core) tries to map the same entry to `frame_B`.

The problem is that there's no clear _owner_ for the page tables. So let's define page table ownership!

### Page Table Ownership
We define the following:

> A page table owns all of its subtables.

We already obey this rule: To get a reference to a table, we need to borrow it from its parent table through the `next_table` method. But who owns the P4 table?

> The recursively mapped P4 table is owned by an `ActivePageTable` struct.

We just defined some random owner for the P4 table. But it will solve our problems. And it will also provide the interface to other modules.

So let's create the struct:

```rust
use self::table::{Table, Level4};
use core::ptr::Unique;

pub struct ActivePageTable {
    p4: Unique<Table<Level4>>,
}
```

We can't store the `Table<Level4>` directly because it needs to be at a special memory location (like the [VGA text buffer]). We could use a raw pointer or `&mut` instead of [Unique], but Unique indicates ownership better.

[VGA text buffer]: @/edition-1/posts/04-printing-to-screen/index.md#the-text-buffer
[Unique]: https://doc.rust-lang.org/1.10.0/core/ptr/struct.Unique.html

Because the `ActivePageTable` owns the unique recursively mapped P4 table, there must be only one `ActivePageTable` instance. Thus we make the constructor function unsafe:

```rust
impl ActivePageTable {
    pub unsafe fn new() -> ActivePageTable {
        ActivePageTable {
            p4: Unique::new_unchecked(table::P4),
        }
    }
}
```

We add some methods to get P4 references:

```rust
fn p4(&self) -> &Table<Level4> {
    unsafe { self.p4.as_ref() }
}

fn p4_mut(&mut self) -> &mut Table<Level4> {
    unsafe { self.p4.as_mut() }
}
```

Since we will only create valid P4 pointers, the `unsafe` blocks are safe. However, we don't make these functions public since they can be used to make page tables invalid. Only the higher level functions (such as `translate` or `map_to`) should be usable from other modules.
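
The ownership pattern can be tried on a normal host with a simplified stand-in. The sketch below is not the kernel code: `Table` is reduced to an array of raw `u64` entries, `TableOwner` plays the role of `ActivePageTable`, and the stable `NonNull` pointer type is swapped in for the nightly-only `Unique`. The `entry` and `set_entry` methods are hypothetical and exist only to show that all access goes through `&self` or `&mut self`:

```rust
use std::ptr::NonNull;

/// Simplified stand-in for a page table: 512 raw entries.
pub struct Table {
    entries: [u64; 512],
}

/// Stand-in for `ActivePageTable`; it owns the table behind the pointer.
pub struct TableOwner {
    p4: NonNull<Table>,
}

impl TableOwner {
    /// # Safety
    /// `ptr` must point to a valid `Table` that nothing else accesses
    /// while the returned owner exists. Returns `None` for a null pointer.
    pub unsafe fn new(ptr: *mut Table) -> Option<TableOwner> {
        NonNull::new(ptr).map(|p4| TableOwner { p4 })
    }

    // Private accessors: only the methods of this type can reach the table.
    fn p4(&self) -> &Table {
        unsafe { self.p4.as_ref() }
    }

    fn p4_mut(&mut self) -> &mut Table {
        unsafe { self.p4.as_mut() }
    }

    /// Reading only needs a shared borrow of the owner.
    pub fn entry(&self, index: usize) -> u64 {
        self.p4().entries[index]
    }

    /// Writing needs an exclusive borrow, so the borrow checker rules out
    /// a second writer at compile time.
    pub fn set_entry(&mut self, index: usize, value: u64) {
        self.p4_mut().entries[index] = value;
    }
}
```

As long as only one `TableOwner` exists for a given table, the usual borrowing rules apply to the table as well: any number of readers through `&self`, or exactly one writer through `&mut self`. The single `unsafe` constructor is the place where the caller promises that uniqueness.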

Now we can make the `map_to` and `translate` functions safe by making them methods of `ActivePageTable`:

```rust
impl ActivePageTable {
    pub unsafe fn new() -> ActivePageTable {...}

    fn p4(&self) -> &Table<Level4> {...}

    fn p4_mut(&mut self) -> &mut Table<Level4> {...}

    pub fn translate(&self, virtual_address: VirtualAddress)
        -> Option<PhysicalAddress>
    {
        ...
        self.translate_page(...).map(...)
    }

    fn translate_page(&self, page: Page) -> Option<Frame> {
        let p3 = self.p4().next_table(...);
        ...
    }

    pub fn map_to<A>(&mut self,
                     page: Page,
                     frame: Frame,
                     flags: EntryFlags,
                     allocator: &mut A)
        where A: FrameAllocator
    {
        let mut p3 = self.p4_mut().next_table_create(...);
        ...
    }
}
```

Now the `p4()` and `p4_mut()` methods should be the only methods containing an `unsafe` block in the `paging/mod.rs` file.

### More Mapping Functions

For convenience, we add a `map` method that just picks a free frame for us:

```rust
pub fn map<A>(&mut self, page: Page, flags: EntryFlags, allocator: &mut A)
    where A: FrameAllocator
{
    let frame = allocator.allocate_frame().expect("out of memory");
    self.map_to(page, frame, flags, allocator)
}
```

We also add an `identity_map` function to make it easier to remap the kernel in the next post:

```rust
pub fn identity_map<A>(&mut self,
                       frame: Frame,
                       flags: EntryFlags,
                       allocator: &mut A)
    where A: FrameAllocator
{
    let page = Page::containing_address(frame.start_address());
    self.map_to(page, frame, flags, allocator)
}
```

### Unmapping Pages

To unmap a page, we set the corresponding P1 entry to unused:

```rust
fn unmap<A>(&mut self, page: Page, allocator: &mut A)
    where A: FrameAllocator
{
    assert!(self.translate(page.start_address()).is_some());

    let p1 = self.p4_mut()
                 .next_table_mut(page.p4_index())
                 .and_then(|p3| p3.next_table_mut(page.p3_index()))
                 .and_then(|p2| p2.next_table_mut(page.p2_index()))
                 .expect("mapping code does not support huge pages");
    let frame = p1[page.p1_index()].pointed_frame().unwrap();
    p1[page.p1_index()].set_unused();
    // TODO free p(1,2,3) table if empty
    allocator.deallocate_frame(frame);
}
```
The assertion ensures that the page is mapped. Thus the corresponding P1 table and frame must exist for a standard 4KiB page. We set the entry to unused and free the associated frame in the supplied frame allocator.

We can also free the P1, P2, or even P3 table when the last entry is freed. But checking the whole table on every `unmap` would be very expensive. So we leave the `TODO` in place until we find a good solution. I'm open to suggestions :).

_Spoiler_: There is an ugly bug in this function, which we will find in the next section.

## Testing and Bugfixing
To test it, we add a `test_paging` function in `memory/paging/mod.rs`:

```rust
pub fn test_paging<A>(allocator: &mut A)
    where A: FrameAllocator
{
    let mut page_table = unsafe { ActivePageTable::new() };

    // test it
}
```

We borrow the frame allocator since we will need it for the mapping functions. To be able to call that function from main, we need to re-export it in `memory/mod.rs`:

```rust
// in memory/mod.rs
pub use self::paging::test_paging;

// lib.rs
let mut frame_allocator = ...;
memory::test_paging(&mut frame_allocator);
```

### map_to
Let's test the `map_to` function:

```rust
let addr = 42 * 512 * 512 * 4096; // 42nd P3 entry
let page = Page::containing_address(addr);
let frame = allocator.allocate_frame().expect("no more frames");
println!("None = {:?}, map to {:?}",
         page_table.translate(addr),
         frame);
page_table.map_to(page, frame, EntryFlags::empty(), allocator);
println!("Some = {:?}", page_table.translate(addr));
println!("next free frame: {:?}", allocator.allocate_frame());
```

We just map some random page to a free frame. To be able to borrow the page table as `&mut`, we need to make it mutable.

You should see output similar to this:

```
None = None, map to Frame { number: 0 }
Some = Some(0)
next free frame: Some(Frame { number: 3 })
```

It's frame 0 because it's the first frame returned by the frame allocator. Since we map the 42nd P3 entry, the mapping code needs to create a P2 and a P1 table.
So the next free frame returned by the allocator is frame 3.

### unmap

To test the `unmap` function, we unmap the test page so that it translates to `None` again:

```rust
page_table.unmap(Page::containing_address(addr), allocator);
println!("None = {:?}", page_table.translate(addr));
```

It causes a panic since we call the unimplemented `deallocate_frame` method in `unmap`. If we comment this call out, it works without problems. But there is still a bug in this function.

Let's read something from the mapped page (of course before we unmap it again):

```rust
println!("{:#x}", unsafe {
    *(Page::containing_address(addr).start_address() as *const u64)
});
```

Since we don't zero the mapped pages, the output is whatever the frame happened to contain. For me, it's `0xf000ff53f000ff53`.

If `unmap` worked correctly, reading it again after unmapping should cause a page fault. But it doesn't. Instead, it just prints the same number again. When we remove the first read, we get the desired page fault (i.e. QEMU reboots again and again). So this seems to be some cache issue.

An x86 processor has many different caches because always accessing the main memory would be very slow. Most of these caches are completely _transparent_. That means everything works exactly the same as without them, it's just much faster. But there is one cache that needs to be updated manually: the _translation lookaside buffer_.

The translation lookaside buffer, or TLB, caches the translation of virtual to physical addresses. It's filled automatically when a page is accessed. But it's not updated transparently when the mapping of a page changes. This is the reason that we can still access the page even though we unmapped it in the page table.

So to fix our `unmap` function, we need to remove the cached translation from the TLB. We can use the [x86_64][x86_64 crate] crate to do this easily. To add it, we append the following to our `Cargo.toml`:

[x86_64 crate]: https://docs.rs/x86_64

```toml [dependencies] ... 
x86_64 = "0.1.2" ``` Now we can use it to fix `unmap`: ```rust ... p1[page.p1_index()].set_unused(); use x86_64::instructions::tlb; use x86_64::VirtualAddress; tlb::flush(VirtualAddress(page.start_address())); // TODO free p(1,2,3) table if empty //allocator.deallocate_frame(frame); } ``` Now the desired page fault occurs even when we access the page before. ## Conclusion This post has become pretty long. So let's summarize what we've done: - we created a paging module and modeled page tables plus entries - we mapped the P4 page recursively and created `next_table` methods - we used empty enums and associated types to make the `next_table` functions safe - we wrote a function to translate virtual to physical addresses - we created safe functions to map and unmap pages - and we fixed stack overflow and TLB related bugs ## What's next? In the [next post] we will extend this module and add a function to modify inactive page tables. Through that function, we will create a new page table hierarchy that maps the kernel correctly using 4KiB pages. Then we will switch to the new table to get a safer kernel environment. [next post]: @/edition-1/posts/07-remap-the-kernel/index.md Afterwards, we will use this paging module to build a heap allocator. This will allow us to use allocation and collection types such as `Box` and `Vec`. Image sources: [^virtual_physical_translation_source] ## Footnotes [^fn-invalid-address]: If the `XXX` part of the address is smaller than `0o400`, it's binary representation doesn't start with `1`. But the sign extension bits, which should be a copy of that bit, are `1` instead of `0`. Thus the address is not valid. [^virtual_physical_translation_source]: Image sources: Modified versions of an image from [Wikipedia](https://commons.wikimedia.org/wiki/File:X86_Paging_64bit.svg). The modified files are licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license. 
================================================ FILE: blog/content/edition-1/posts/07-remap-the-kernel/index.md ================================================ +++ title = "Remap the Kernel" weight = 7 path = "remap-the-kernel" aliases = ["remap-the-kernel.html"] date = 2016-01-01 template = "edition-1/page.html" [extra] updated = "2016-03-06" +++ In this post we will create a new page table to map the kernel sections correctly. Therefore we will extend the paging module to support modifications of _inactive_ page tables as well. Then we will switch to the new table and secure our kernel stack by creating a guard page. As always, you can find the source code on [GitHub]. Don't hesitate to file issues there if you have any problems or improvement suggestions. There is also a comment section at the end of this page. Note that this post requires a current Rust nightly. [GitHub]: https://github.com/phil-opp/blog_os/tree/first_edition_post_7 ## Motivation In the [previous post], we had a strange bug in the `unmap` function. Its reason was a silent stack overflow, which corrupted the page tables. Fortunately, our kernel stack is right above the page tables so that we noticed the overflow relatively quickly. This won't be the case when we add threads with new stacks in the future. Then a silent stack overflow could overwrite some data without us noticing. But eventually some completely unrelated function fails because a variable changed its value. [previous post]: @/edition-1/posts/06-page-tables/index.md As you can imagine, these kinds of bugs are horrendous to debug. For that reason we will create a new hierarchical page table in this post, which has _guard page_ below the stack. A guard page is basically an unmapped page that causes a page fault when accessed. Thus we can catch stack overflows right when they happen. Also, we will use the [information about kernel sections] to map the various sections individually instead of blindly mapping the first gigabyte. 
To improve safety even further, we will set the correct page table flags for the various sections. Thus it won't be possible to modify the contents of `.text` or to execute code from `.data` anymore. [information about kernel sections]: @/edition-1/posts/05-allocating-frames/index.md#kernel-elf-sections ## Preparation There are many things that can go wrong when we switch to a new table. Therefore it's a good idea to [set up a debugger][set up gdb]. You should not need it when you follow this post, but it's good to know how to debug a problem when it occurs[^fn-debug-notes]. [set up gdb]: @/edition-1/extra/set-up-gdb/index.md We also update the `Page` and `Frame` types to make our lives easier. The `Page` struct gets some derived traits: ```rust // in src/memory/paging/mod.rs #[derive(Debug, Clone, Copy)] pub struct Page { number: usize, } ``` By making it [Copy][Copy trait], we can still use it after passing it to functions such as `map_to`. We also make the `Page::containing_address` public (if it isn't already). [Copy trait]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html The `Frame` type gets a `clone` method too, but it does not implement the [Clone trait]: [Clone trait]: https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html ```rust // in src/memory/mod.rs impl Frame { ... fn clone(&self) -> Frame { Frame { number: self.number } } } ``` The big difference is that this `clone` method is private. If we implemented the [Clone trait], it would be public and usable from other modules. For example they could abuse it to free the same frame twice in the frame allocator. So why do we implement `Copy` for `Page` and make even its constructor public, but keep `Frame` as private as possible? The reason is that we can easily check the status of a `Page` by looking at the page tables. For example, the `map_to` function can easily check that the given page is unused. We can't do that for a `Frame`. 
If we wanted to be sure that a given frame is unused, we would need to look at all mapped _pages_ and verify that none of them is mapped to the given frame. Since this is impractical, we need to rely on the fact that a passed `Frame` is always unused. For that reason it must not be possible to create a new `Frame` or to clone one from other modules. The only valid way to get a frame is to allocate it from a `FrameAllocator`. ## Recap: The Paging Module This post builds upon the post about [page tables][previous post], so let's start by quickly recapitulating what we've done there. We created a `memory::paging` module, which reads and modifies the hierarchical page table through recursive mapping. The owner of the active P4 table and thus all subtables is an `ActivePageTable` struct, which must be instantiated only once. The `ActivePageTable` struct provides the following interface: ```rust /// Translates a virtual to the corresponding physical address. /// Returns `None` if the address is not mapped. pub fn translate(&self, virtual_address: VirtualAddress) -> Option {...} /// Maps the page to the frame with the provided flags. /// The `PRESENT` flag is added by default. Needs a /// `FrameAllocator` as it might need to create new page tables. pub fn map_to(&mut self, page: Page, frame: Frame, flags: EntryFlags, allocator: &mut A) where A: FrameAllocator {...} /// Maps the page to some free frame with the provided flags. /// The free frame is allocated from the given `FrameAllocator`. pub fn map(&mut self, page: Page, flags: EntryFlags, allocator: &mut A) where A: FrameAllocator {...} /// Identity map the the given frame with the provided flags. /// The `FrameAllocator` is used to create new page tables if needed. pub fn identity_map(&mut self, frame: Frame, flags: EntryFlags, allocator: &mut A) where A: FrameAllocator {...} /// Unmaps the given page and adds all freed frames to the given /// `FrameAllocator`. 
fn unmap<A>(&mut self, page: Page, allocator: &mut A) where A: FrameAllocator {...} ```

## Overview

Our goal is to use the `ActivePageTable` functions to map the kernel sections correctly in a new page table. In pseudo code:

```rust
fn remap_the_kernel(boot_info: &BootInformation) {
    let new_table = create_new_table();

    for section in boot_info.elf_sections {
        for frame in section {
            new_table.identity_map(frame, section.flags);
        }
    }

    new_table.activate();
    create_guard_page_for_stack();
}
```

But the `ActivePageTable` methods, as the name suggests, only work for the _active table_. So we would need to activate `new_table` _before_ we use `identity_map`. But this is not possible since it would cause an immediate page fault when the CPU tries to read the next instruction.

So we need a way to use the `ActivePageTable` methods on _inactive_ page tables as well.

## Inactive Tables

Let's start by creating a type for inactive page tables. Like an `ActivePageTable`, an `InactivePageTable` owns a P4 table. The difference is that the inactive P4 table is not used by the CPU.

We create the struct in `memory/paging/mod.rs`:

```rust
pub struct InactivePageTable {
    p4_frame: Frame,
}

impl InactivePageTable {
    pub fn new(frame: Frame) -> InactivePageTable {
        // TODO zero and recursive map the frame
        InactivePageTable { p4_frame: frame }
    }
}
```

Without zeroing, the P4 table contains complete garbage and maps random memory. But we can't zero it right now because the `p4_frame` is not mapped to a virtual address.

Well, maybe it's still part of the identity mapped first gigabyte. Then we could zero it without problems since the physical address would be a valid virtual address, too. But this “solution” is hacky and won't work anymore after this post (since we will remove all needless identity mapping).

Instead, we will temporarily map the frame to some virtual address.

### Temporary Mapping

For this purpose we add a `TemporaryPage` struct.
We create it in a new `temporary_page` submodule to keep the paging module clean. It looks like this:

```rust
// src/memory/paging/mod.rs
mod temporary_page;

// src/memory/paging/temporary_page.rs

use super::Page;

pub struct TemporaryPage {
    page: Page,
}
```

We add methods to temporarily map and unmap the page:

```rust
use super::{ActivePageTable, VirtualAddress};
use memory::Frame;

impl TemporaryPage {
    /// Maps the temporary page to the given frame in the active table.
    /// Returns the start address of the temporary page.
    pub fn map(&mut self, frame: Frame, active_table: &mut ActivePageTable)
        -> VirtualAddress
    {
        use super::entry::WRITABLE;

        assert!(active_table.translate_page(self.page).is_none(),
                "temporary page is already mapped");
        active_table.map_to(self.page, frame, WRITABLE, ???);
        self.page.start_address()
    }

    /// Unmaps the temporary page in the active table.
    pub fn unmap(&mut self, active_table: &mut ActivePageTable) {
        active_table.unmap(self.page, ???)
    }
}
```

The `???` needs to be some `FrameAllocator`. We could just add an additional `allocator` argument but there is a better solution.

It takes advantage of the fact that we always map the same page. So the allocator only needs to hold 3 frames: one P3, one P2, and one P1 table (the P4 table is always mapped). This allows us to create a tiny allocator and add it as a field to the `TemporaryPage` struct itself:

```rust
pub struct TemporaryPage {
    page: Page,
    allocator: TinyAllocator,
}

impl TemporaryPage {
    // as above, but with `&mut self.allocator` instead of `???`
}

struct TinyAllocator([Option<Frame>; 3]);
```

Our tiny allocator just consists of 3 slots to store frames. It will be empty when the temporary page is mapped and full when all corresponding page tables are unmapped.
To turn `TinyAllocator` into a frame allocator, we need to add the trait implementation:

```rust
use memory::FrameAllocator;

impl FrameAllocator for TinyAllocator {
    fn allocate_frame(&mut self) -> Option<Frame> {
        for frame_option in &mut self.0 {
            if frame_option.is_some() {
                return frame_option.take();
            }
        }
        None
    }

    fn deallocate_frame(&mut self, frame: Frame) {
        for frame_option in &mut self.0 {
            if frame_option.is_none() {
                *frame_option = Some(frame);
                return;
            }
        }
        panic!("Tiny allocator can hold only 3 frames.");
    }
}
```

On allocation, we use the [Option::take] function to take an available frame from the first filled slot and on deallocation, we put the frame back into the first free slot.

[Option::take]: https://doc.rust-lang.org/nightly/core/option/enum.Option.html#method.take

To finish the `TinyAllocator`, we add a constructor that fills it from some other allocator:

```rust
impl TinyAllocator {
    fn new<A>(allocator: &mut A) -> TinyAllocator
        where A: FrameAllocator
    {
        let mut f = || allocator.allocate_frame();
        let frames = [f(), f(), f()];
        TinyAllocator(frames)
    }
}
```

We use a little closure here that saves us some typing.

Now our `TemporaryPage` type is nearly complete. We only add one more method for convenience:

```rust
use super::table::{Table, Level1};

/// Maps the temporary page to the given page table frame in the active
/// table. Returns a reference to the now mapped table.
pub fn map_table_frame(&mut self,
                       frame: Frame,
                       active_table: &mut ActivePageTable)
                       -> &mut Table<Level1> {
    unsafe { &mut *(self.map(frame, active_table) as *mut Table<Level1>) }
}
```

This function interprets the given frame as a page table frame and returns a `Table` reference. We return a table of level 1 because it [forbids calling the `next_table` methods][some clever solution]. Calling `next_table` must not be possible since it's not a page of the recursive mapping. To be able to return a `Table<Level1>`, we need to make the `Level1` enum in `memory/paging/table.rs` public.
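To see the slot behaviour of the `TinyAllocator` in isolation, here is a minimal sketch that runs on the host instead of inside the kernel. The `Frame` struct is a stand-in for the kernel type, the frame numbers are arbitrary, and the `stored` and `round_trip` helpers are hypothetical names that exist only for this demonstration:

```rust
// Stand-in for the kernel's `Frame` type; only the number matters here.
#[derive(Debug)]
struct Frame {
    number: usize,
}

// Same shape as the `TinyAllocator` above: three optional frame slots.
struct TinyAllocator([Option<Frame>; 3]);

impl TinyAllocator {
    // Takes the frame out of the first filled slot, if there is one.
    fn allocate_frame(&mut self) -> Option<Frame> {
        self.0.iter_mut().find_map(|slot| slot.take())
    }

    // Puts the frame into the first empty slot.
    fn deallocate_frame(&mut self, frame: Frame) {
        for slot in &mut self.0 {
            if slot.is_none() {
                *slot = Some(frame);
                return;
            }
        }
        panic!("Tiny allocator can hold only 3 frames.");
    }

    // Number of frames currently stored (hypothetical helper for this demo).
    fn stored(&self) -> usize {
        self.0.iter().filter(|slot| slot.is_some()).count()
    }
}

// Simulates the worst case of mapping the temporary page and then handing
// the table frames back. Returns (number of the first frame handed out,
// frames stored while mapped, frames stored after returning them).
fn round_trip() -> (usize, usize, usize) {
    let mut tiny = TinyAllocator([
        Some(Frame { number: 10 }),
        Some(Frame { number: 11 }),
        Some(Frame { number: 12 }),
    ]);

    // Worst case: a new P3, P2, and P1 table are all needed.
    let tables: Vec<Frame> = (0..3).filter_map(|_| tiny.allocate_frame()).collect();
    let first = tables[0].number;
    let while_mapped = tiny.stored();

    // Giving the table frames back refills the slots.
    for frame in tables {
        tiny.deallocate_frame(frame);
    }
    (first, while_mapped, tiny.stored())
}

fn main() {
    let (first, while_mapped, after) = round_trip();
    println!("first frame: {}, stored while mapped: {}, stored afterwards: {}",
             first, while_mapped, after);
}
```

A fourth `deallocate_frame` call would hit the panic, which is the intended signal that more frames came back than the three tables could ever have used.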
[some clever solution]: @/edition-1/posts/06-page-tables/index.md#some-clever-solution

The `unsafe` block is safe since the `VirtualAddress` returned by the `map` function is always valid and the type cast just reinterprets the frame's content.

To complete the `temporary_page` module, we add a `TemporaryPage::new` constructor:

```rust
pub fn new<A>(page: Page, allocator: &mut A) -> TemporaryPage
    where A: FrameAllocator
{
    TemporaryPage {
        page: page,
        allocator: TinyAllocator::new(allocator),
    }
}
```

### Zeroing the InactivePageTable

Now we can use `TemporaryPage` to fix our `InactivePageTable::new` function:

```rust
// in src/memory/paging/mod.rs

use self::temporary_page::TemporaryPage;

impl InactivePageTable {
    pub fn new(frame: Frame,
               active_table: &mut ActivePageTable,
               temporary_page: &mut TemporaryPage)
               -> InactivePageTable {
        {
            let table = temporary_page.map_table_frame(frame.clone(),
                active_table);
            // now we are able to zero the table
            table.zero();
            // set up recursive mapping for the table
            table[511].set(frame.clone(), PRESENT | WRITABLE);
        }
        temporary_page.unmap(active_table);

        InactivePageTable { p4_frame: frame }
    }
}
```

We added two new arguments, `active_table` and `temporary_page`. We need an [inner scope] to ensure that the `table` variable is dropped before we try to unmap the temporary page again. This is required since the `table` variable exclusively borrows `temporary_page` as long as it's alive.

[inner scope]: https://doc.rust-lang.org/rust-by-example/variable_bindings/scope.html

Now we are able to create valid inactive page tables, which are zeroed and recursively mapped. But we still can't modify them. To resolve this problem, we need to look at recursive mapping again.

## Revisiting Recursive Mapping

Recursive mapping works by mapping the last P4 entry to the P4 table itself. Thus we can access the page tables by looping one or more times.
For example, accessing a P3 table requires looping three times:

![access active P3 table through recursive mapping](recursive_mapping_access_p3.svg)

We can use the same mechanism to access inactive tables. The trick is to change the recursive mapping of the active P4 table to point to the inactive P4 table:

![access inactive P3 table through recursive mapping](recursive_mapping_access_p3_inactive_table.svg)

Now the inactive table can be accessed exactly like the active table; even the magic addresses are the same. This allows us to use the `ActivePageTable` interface and the existing mapping methods for inactive tables, too. Note that everything besides the recursive mapping continues to work exactly as before since we've never changed the active table in the CPU.

### Implementation Draft

We add a method to `ActivePageTable` that temporarily changes the recursive mapping and executes a given closure in the new context:

```rust
pub fn with<F>(&mut self,
               table: &mut InactivePageTable,
               f: F)
    where F: FnOnce(&mut ActivePageTable)
{
    use x86_64::instructions::tlb;

    // overwrite recursive mapping
    self.p4_mut()[511].set(table.p4_frame.clone(), PRESENT | WRITABLE);
    tlb::flush_all();

    // execute f in the new context
    f(self);

    // TODO restore recursive mapping to original p4 table
}
```

It overwrites the 511th P4 entry and points it to the inactive table frame. Then it flushes the [translation lookaside buffer (TLB)][TLB], which still contains some old translations. We need to flush all pages that are part of the recursive mapping, so the easiest way is to flush the TLB completely.

[TLB]: https://wiki.osdev.org/TLB

Now that the recursive mapping points to the given inactive table, we execute the closure in the new context. The closure can call all active table methods such as `translate` or `map_to`. It could even call `with` again and chain another inactive table!
Wait… that would not work:

![access inactive P3 table through recursive mapping](recursive_mapping_access_p1_invalid_chaining.svg)

Here the closure called `with` again and thus changed the recursive mapping of the inactive table to point to a second inactive table. Now we want to modify the P1 of the _second_ inactive table, but instead we land on the P1 of the _first_ inactive table since we never follow the pointer to the second table. Only when modifying the P2, P3, or P4 table do we really access the second inactive table. This inconsistency would break our mapping functions completely.

So we should really prohibit the closure from calling `with` again. We could add some runtime assertion that panics when the active table is not recursively mapped anymore. But a cleaner solution is to split off the mapping code from `ActivePageTable` into a new `Mapper` type.

### Refactoring

We start by creating a new `memory/paging/mapper.rs` submodule and moving the `ActivePageTable` struct and its `impl` block to it. Then we rename it to `Mapper` and make all methods public (so we can still use them from the paging module). The `with` method is removed.
After adjusting the imports, the module should look like this:

```rust
// in memory/paging/mod.rs
mod mapper;

// memory/paging/mapper.rs

use super::{VirtualAddress, PhysicalAddress, Page, ENTRY_COUNT};
use super::entry::*;
use super::table::{self, Table, Level4, Level1};
use memory::{PAGE_SIZE, Frame, FrameAllocator};
use core::ptr::Unique;

pub struct Mapper {
    p4: Unique<Table<Level4>>,
}

impl Mapper {
    pub unsafe fn new() -> Mapper {...}

    pub fn p4(&self) -> &Table<Level4> {...}

    // the remaining mapping methods, all public
}
```

Now we create a new `ActivePageTable` struct in `memory/paging/mod.rs`:

```rust
pub use self::mapper::Mapper;
use core::ops::{Deref, DerefMut};

pub struct ActivePageTable {
    mapper: Mapper,
}

impl Deref for ActivePageTable {
    type Target = Mapper;

    fn deref(&self) -> &Mapper {
        &self.mapper
    }
}

impl DerefMut for ActivePageTable {
    fn deref_mut(&mut self) -> &mut Mapper {
        &mut self.mapper
    }
}

impl ActivePageTable {
    unsafe fn new() -> ActivePageTable {
        ActivePageTable {
            mapper: Mapper::new(),
        }
    }

    pub fn with<F>(&mut self,
                   table: &mut InactivePageTable,
                   f: F)
        where F: FnOnce(&mut Mapper) // `Mapper` instead of `ActivePageTable`
    {...}
}
```

The [Deref] and [DerefMut] implementations allow us to use the `ActivePageTable` exactly as before. For example, we can still call `map_to` on it (because of [deref coercions]). But the closure called in the `with` function can no longer invoke `with` again. The reason is that we changed the bound on the generic `F` parameter a bit: instead of an `ActivePageTable`, the closure just gets a `Mapper` as argument.

[Deref]: https://doc.rust-lang.org/nightly/core/ops/trait.Deref.html
[DerefMut]: https://doc.rust-lang.org/nightly/core/ops/trait.DerefMut.html
[deref coercions]: https://doc.rust-lang.org/nightly/book/deref-coercions.html

### Restoring the Recursive Mapping

Right now, the `with` function overwrites the recursive mapping and calls the closure. But it does not restore the previous recursive mapping yet. So let's fix that!
To back up the physical P4 frame of the active table, we can either read it from the 511th P4 entry (before we change it) or from the CR3 control register directly. We will do the latter as it should be faster and we already have an external crate that makes it easy:

```rust
use x86_64::registers::control_regs;
let backup = Frame::containing_address(
    unsafe { control_regs::cr3() } as usize
);
```

Why is it unsafe? Because reading the CR3 register leads to a CPU exception if the processor is not running in kernel mode ([Ring 0]). But this code will always run in kernel mode, so the `unsafe` block is completely safe here.

[Ring 0]: https://wiki.osdev.org/Security#Low-level_Protection_Mechanisms

Now that we have a backup of the original P4 frame, we need a way to restore it after the closure has run. So we need to somehow modify the 511th entry of the original P4 frame, which is still the active table in the CPU. But we can't access it because the recursive mapping now points to the inactive table:

![it's not possible to access the original P4 through recursive mapping anymore](recursive_mapping_inactive_table_scheme.svg)

It's just not possible to access the active P4 entry in 4 steps, so we can't reach it through the 4-level page table.

We could try to overwrite the recursive mapping of the _inactive_ P4 table and point it back to the original P4 frame:

![cyclic map active and inactive P4 tables](cyclic_mapping_inactive_tables.svg)

Now we can reach the active P4 entry in 4 steps and could restore the original mapping in the active table. But this hack has a drawback: The inactive table is now invalid since it is no longer recursively mapped. We would need to fix it by using a temporary page again (as above).

But if we need a temporary page anyway, we can just use it to map the original P4 frame directly. Thus we avoid the above hack and make the code simpler. So let's do it that way.
### Completing the Implementation

The `with` method gets an additional `TemporaryPage` argument, which we use to back up and restore the original recursive mapping:

```rust
pub fn with<F>(&mut self,
               table: &mut InactivePageTable,
               temporary_page: &mut temporary_page::TemporaryPage, // new
               f: F)
    where F: FnOnce(&mut Mapper)
{
    use x86_64::instructions::tlb;
    use x86_64::registers::control_regs;

    {
        let backup = Frame::containing_address(
            control_regs::cr3().0 as usize);

        // map temporary_page to current p4 table
        let p4_table = temporary_page.map_table_frame(backup.clone(), self);

        // overwrite recursive mapping
        self.p4_mut()[511].set(table.p4_frame.clone(), PRESENT | WRITABLE);
        tlb::flush_all();

        // execute f in the new context
        f(self);

        // restore recursive mapping to original p4 table
        p4_table[511].set(backup, PRESENT | WRITABLE);
        tlb::flush_all();
    }

    temporary_page.unmap(self);
}
```

Again, the inner scope is needed to end the borrow of `temporary_page` so that we can unmap it again. Note that we need to flush the TLB another time after we restored the original recursive mapping.

Now the `with` function is ready to be used!

## Remapping the Kernel

Let's tackle the main task of this post: remapping the kernel sections.
For this, we create a `remap_the_kernel` function in `memory/paging/mod.rs`:

```rust
use multiboot2::BootInformation;
use memory::{PAGE_SIZE, Frame, FrameAllocator};

pub fn remap_the_kernel<A>(allocator: &mut A, boot_info: &BootInformation)
    where A: FrameAllocator
{
    let mut temporary_page = TemporaryPage::new(Page { number: 0xcafebabe },
        allocator);

    let mut active_table = unsafe { ActivePageTable::new() };
    let mut new_table = {
        let frame = allocator.allocate_frame().expect("no more frames");
        InactivePageTable::new(frame, &mut active_table, &mut temporary_page)
    };

    active_table.with(&mut new_table, &mut temporary_page, |mapper| {
        let elf_sections_tag = boot_info.elf_sections_tag()
            .expect("Elf sections tag required");

        for section in elf_sections_tag.sections() {
            // TODO mapper.identity_map() all pages of `section`
        }
    });
}
```

First, we create a temporary page at page number `0xcafebabe`. We could use `0xdeadbeaf` or `0x123456789` as well, as long as the page is unused. The `active_table` and the `new_table` are created using their constructor functions.

Then we use the `with` function to temporarily change the recursive mapping and execute the closure as if the `new_table` were active. This allows us to map the sections in the new table without changing the active mapping. To get the kernel sections, we use the [Multiboot information structure].
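To get a feeling for where the temporary page at number `0xcafebabe` ends up, the following host-side sketch splits a page number into its four table indexes, using the same 9-bit-per-level layout as the page table post. The function name `table_indexes` is made up for this illustration, and `u64` is used so that the sketch does not depend on the host's pointer width:

```rust
const PAGE_SIZE: u64 = 4096;

/// Splits a page number into its (P4, P3, P2, P1) table indexes.
/// Each level uses 9 bits, so every index is in the range 0..512.
fn table_indexes(page_number: u64) -> (u64, u64, u64, u64) {
    (
        (page_number >> 27) & 0o777,
        (page_number >> 18) & 0o777,
        (page_number >> 9) & 0o777,
        page_number & 0o777,
    )
}

fn main() {
    let page_number = 0xcafebabe_u64;
    println!("start address: {:#x}", page_number * PAGE_SIZE);
    println!("(P4, P3, P2, P1) = {:?}", table_indexes(page_number));
}
```

The P4 index is 25, far away from the low entries used by the kernel's identity mapping and from the recursive entry 511. Assuming nothing else is mapped under that P4 entry, mapping the temporary page needs a fresh P3, P2, and P1 table, which matches the three slots of the `TinyAllocator`.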
[Multiboot information structure]: @/edition-1/posts/05-allocating-frames/index.md#the-multiboot-information-structure

Let's resolve the above `TODO` by identity mapping the sections:

```rust
for section in elf_sections_tag.sections() {
    use self::entry::WRITABLE;

    if !section.is_allocated() {
        // section is not loaded to memory
        continue;
    }
    assert!(section.start_address() % PAGE_SIZE == 0,
            "sections need to be page aligned");

    println!("mapping section at addr: {:#x}, size: {:#x}",
        section.addr, section.size);

    let flags = WRITABLE; // TODO use real section flags

    let start_frame = Frame::containing_address(section.start_address());
    let end_frame = Frame::containing_address(section.end_address() - 1);
    for frame in Frame::range_inclusive(start_frame, end_frame) {
        mapper.identity_map(frame, flags, allocator);
    }
}
```

We skip all sections that were not loaded into memory (e.g. debug sections). We require that all sections are page aligned because a page must not contain sections with different flags. For example, we would need to set the `EXECUTABLE` and `WRITABLE` flags for a page that contains parts of the `.text` and `.data` section. Thus we could modify the running code or execute bytes from the `.data` section as code.

To map a section, we iterate over all of its frames by using a new `Frame::range_inclusive` function (shown below). Note that the end address is exclusive, so that it's not part of the section anymore (it's the first byte of the next section). Thus we need to subtract 1 to get the `end_frame`.
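The effect of subtracting 1 from the exclusive end address is easiest to see with plain numbers. This is a small host-side sketch with a made-up section (start `0x100000`, size `0x3000`); `frame_number` and `section_frames` are hypothetical helpers that mirror what `Frame::containing_address` does with the address, and the section is assumed to be non-empty:

```rust
use std::ops::RangeInclusive;

const PAGE_SIZE: usize = 4096;

/// Number of the frame that contains the given physical address.
fn frame_number(address: usize) -> usize {
    address / PAGE_SIZE
}

/// Inclusive range of frame numbers covered by a non-empty section.
fn section_frames(start: usize, size: usize) -> RangeInclusive<usize> {
    let end = start + size; // exclusive: first byte after the section
    frame_number(start)..=frame_number(end - 1)
}

fn main() {
    let frames = section_frames(0x100000, 0x3000);
    println!("frames {:?}", frames);
    // Without the `- 1`, the frame after the section would be included too.
    println!("frame of the exclusive end: {}", frame_number(0x100000 + 0x3000));
}
```

The section spans exactly three pages, so the range covers three frames. Using the exclusive end address directly would add a fourth frame that belongs to whatever comes after the section.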
The `Frame::range_inclusive` function looks like this:

```rust
// in src/memory/mod.rs

impl Frame {
    fn range_inclusive(start: Frame, end: Frame) -> FrameIter {
        FrameIter {
            start: start,
            end: end,
        }
    }
}

struct FrameIter {
    start: Frame,
    end: Frame,
}

impl Iterator for FrameIter {
    type Item = Frame;

    fn next(&mut self) -> Option<Frame> {
        if self.start <= self.end {
            let frame = self.start.clone();
            self.start.number += 1;
            Some(frame)
        } else {
            None
        }
    }
}
```

Instead of creating a custom iterator, we could have used the [Range] struct of the standard library. But it requires that we implement the [One] and [Add] traits for `Frame`. Then every module could perform arithmetic operations on frames, for example `let frame3 = frame1 + frame2`. This would violate our safety invariants because `frame3` could be already in use. The `range_inclusive` function does not have these problems because it is only available inside the `memory` module.

[Range]: https://doc.rust-lang.org/nightly/core/ops/struct.Range.html
[One]: https://doc.rust-lang.org/1.10.0/core/num/trait.One.html
[Add]: https://doc.rust-lang.org/nightly/core/ops/trait.Add.html

### Page Align Sections

Right now our sections aren't page aligned, so the assertion in `remap_the_kernel` would fail. We can fix this by making the section size a multiple of the page size. To do this, we add an `ALIGN` statement to all sections in the linker file. For example:

```
SECTIONS {
  . = 1M;

  .text :
  {
    *(.text .text.*)
    . = ALIGN(4K);
  }
}
```

The `.` is the “current location counter” and represents the current virtual address. At the beginning of the `SECTIONS` tag we set it to `1M`, so our kernel starts at 1MiB. We use the [ALIGN][linker align] function to align the current location counter to the next `4K` boundary (`4K` is the page size). Thus the end of the `.text` section, and with it the beginning of the next section, is page aligned.
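What `ALIGN(4K)` does to the location counter can be written down as ordinary arithmetic. The following sketch is not linker code, just the usual round-up-to-a-power-of-two formula; `align_up` is a name chosen for this example and the addresses are arbitrary:

```rust
/// Rounds `address` up to the next multiple of `align`.
/// `align` must be a power of two (4096 for 4KiB pages).
fn align_up(address: usize, align: usize) -> usize {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    (address + align - 1) & !(align - 1)
}

fn main() {
    // An address in the middle of a page is moved to the next page start.
    println!("{:#x}", align_up(0x10ab97, 4096));
    // An address that is already aligned stays where it is.
    println!("{:#x}", align_up(0x10b000, 4096));
}
```

Because an already aligned address is left unchanged, adding the statement to a section that happens to end on a page boundary wastes no space.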
[linker align]: https://www.math.utah.edu/docs/info/ld_3.html#SEC12 To put all sections on their own page, we add the `ALIGN` statement to all of them: ``` /* src/arch/x86_64/linker.ld */ ENTRY(start) SECTIONS { . = 1M; .rodata : { /* ensure that the multiboot header is at the beginning */ KEEP(*(.multiboot_header)) *(.rodata .rodata.*) . = ALIGN(4K); } .text : { *(.text .text.*) . = ALIGN(4K); } .data : { *(.data .data.*) . = ALIGN(4K); } .bss : { *(.bss .bss.*) . = ALIGN(4K); } .got : { *(.got) . = ALIGN(4K); } .got.plt : { *(.got.plt) . = ALIGN(4K); } .data.rel.ro : ALIGN(4K) { *(.data.rel.ro.local*) *(.data.rel.ro .data.rel.ro.*) . = ALIGN(4K); } .gcc_except_table : ALIGN(4K) { *(.gcc_except_table) . = ALIGN(4K); } } ``` Instead of page aligning the `.multiboot_header` section, we merge it into the `.rodata` section. That way, we don't waste a whole page for the few bytes of the Multiboot header. We could merge it into any section, but `.rodata` fits best because it has the same flags (neither writable nor executable). The Multiboot header still needs to be at the beginning of the file, so `.rodata` must be our first section now. ### Testing it Time to test it! 
We re-export the `remap_the_kernel` function from the memory module and call it from `rust_main`: ```rust // in src/memory/mod.rs pub use self::paging::remap_the_kernel; ``` ```rust // in src/lib.rs #[no_mangle] pub extern "C" fn rust_main(multiboot_information_address: usize) { // ATTENTION: we have a very small stack and no guard page // the same as before vga_buffer::clear_screen(); println!("Hello World{}", "!"); let boot_info = unsafe { multiboot2::load(multiboot_information_address) }; let memory_map_tag = boot_info.memory_map_tag() .expect("Memory map tag required"); let elf_sections_tag = boot_info.elf_sections_tag() .expect("Elf sections tag required"); let kernel_start = elf_sections_tag.sections().map(|s| s.addr) .min().unwrap(); let kernel_end = elf_sections_tag.sections().map(|s| s.addr + s.size) .max().unwrap(); let multiboot_start = multiboot_information_address; let multiboot_end = multiboot_start + (boot_info.total_size as usize); println!("kernel start: 0x{:x}, kernel end: 0x{:x}", kernel_start, kernel_end); println!("multiboot start: 0x{:x}, multiboot end: 0x{:x}", multiboot_start, multiboot_end); let mut frame_allocator = memory::AreaFrameAllocator::new( kernel_start as usize, kernel_end as usize, multiboot_start, multiboot_end, memory_map_tag.memory_areas()); // this is the new part memory::remap_the_kernel(&mut frame_allocator, boot_info); println!("It did not crash!"); loop {} } ``` If you see the `It did not crash` message, the kernel survived our page table modifications without causing a CPU exception. But did we map the kernel sections correctly? Let's try it out by switching to the new table! We identity map all kernel sections, so it should work without problems. ## Switching Tables Switching tables is easy. We just need to reload the `CR3` register with the physical address of the new P4 frame. 
We do this in a new `ActivePageTable::switch` method: ```rust // in `impl ActivePageTable` in src/memory/paging/mod.rs pub fn switch(&mut self, new_table: InactivePageTable) -> InactivePageTable { use x86_64::PhysicalAddress; use x86_64::registers::control_regs; let old_table = InactivePageTable { p4_frame: Frame::containing_address( control_regs::cr3().0 as usize ), }; unsafe { control_regs::cr3_write(PhysicalAddress( new_table.p4_frame.start_address() as u64)); } old_table } ``` This function activates the given inactive table and returns the previous active table as an `InactivePageTable`. We don't need to flush the TLB here, as the CPU does it automatically when the P4 table is switched. In fact, the `tlb::flush_all` function, which we used above, does nothing more than [reloading the CR3 register]. [reloading the CR3 register]: https://docs.rs/x86_64/0.1.2/src/x86_64/instructions/tlb.rs.html#11-14 Now we are finally able to switch to the new table. We do it by adding the following lines to our `remap_the_kernel` function: ```rust // in remap_the_kernel in src/memory/paging/mod.rs ... active_table.with(&mut new_table, &mut temporary_page, |mapper| { ... }); let old_table = active_table.switch(new_table); println!("NEW TABLE!!!"); ``` Let's cross our fingers and run it… … and it fails with a boot loop. ### Debugging A QEMU boot loop indicates that some CPU exception occurred. We can see all thrown CPU exceptions by starting QEMU with `-d int`: ```bash > qemu-system-x86_64 -d int -no-reboot -cdrom build/os-x86_64.iso ... check_exception old: 0xffffffff new 0xe 0: v=0e e=0002 i=0 cpl=0 IP=0008:000000000010ab97 pc=000000000010ab97 SP=0010:00000000001182d0 CR2=00000000000b8f00 ... ``` These lines are the important ones. We can read a lot of useful information from them: - `v=0e`: An exception with number `0xe` occurred, which is a page fault according to the [OSDev Wiki][osdev exception overview].
- `e=0002`: The CPU set an [error code][page fault error code], which tells us why the exception occurred. The `0x2` bit tells us that it was caused by a write operation. And since the `0x1` bit is not set, the target page was not present. - `IP=0008:000000000010ab97` or `pc=000000000010ab97`: The program counter register tells us that the exception occurred when the CPU tried to execute the instruction at `0x10ab97`. We can disassemble this address to see the corresponding function. The `0008:` prefix in `IP` indicates the code [GDT segment]. - `SP=0010:00000000001182d0`: The stack pointer was `0x1182d0` (the `0010:` prefix indicates the data [GDT segment]). This tells us whether the stack overflowed. - `CR2=00000000000b8f00`: Finally the most useful register. It tells us which virtual address caused the page fault. In our case it's `0xb8f00`, which is part of the [VGA text buffer]. [osdev exception overview]: https://wiki.osdev.org/Exceptions [page fault]: https://wiki.osdev.org/Exceptions#Page_Fault [page fault error code]: https://wiki.osdev.org/Exceptions#Error_code [GDT segment]: @/edition-1/posts/02-entering-longmode/index.md#loading-the-gdt [VGA text buffer]: @/edition-1/posts/04-printing-to-screen/index.md#the-vga-text-buffer So let's find out which function caused the exception: ``` objdump -d build/kernel-x86_64.bin | grep -B100 "10ab97" ``` We disassemble our kernel and search for `10ab97`. The `-B100` option prints the 100 preceding lines too. The output tells us the responsible function: ``` ... 000000000010aa80 <_ZN10vga_buffer6Writer10write_byte20h4601f5e405b6e89facaE>: 10aa80: 55 push %rbp ... 10ab93: 66 8b 55 aa mov -0x56(%rbp),%dx 10ab97: 66 89 14 48 mov %dx,(%rax,%rcx,2) ``` The reason for the cryptic function name is Rust's [name mangling]. But we can identify the `vga_buffer::Writer::write_byte` function nonetheless.
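The values from the QEMU register dump can also be interpreted mechanically. The following sketch is not part of the kernel; `PageFaultCause`, `decode_page_fault_error`, and `vga_cell` are hypothetical names chosen for illustration. It assumes the standard x86 page fault error code layout (bit 0: protection violation, bit 1: write, bit 2: user mode, bit 3: reserved bit set, bit 4: instruction fetch) and the usual 80x25 VGA text mode with two bytes per cell starting at `0xb8000`:

```rust
/// Decoded view of an x86 page fault error code (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct PageFaultCause {
    /// Bit 0: set if the page was present (protection violation),
    /// clear if the page was not present.
    protection_violation: bool,
    /// Bit 1: set if the access was a write, clear for a read.
    write: bool,
    /// Bit 2: set if the access happened in user mode.
    user_mode: bool,
    /// Bit 3: set if a reserved bit was set in a page table entry.
    reserved_bit: bool,
    /// Bit 4: set if the fault was caused by an instruction fetch.
    instruction_fetch: bool,
}

/// Splits the error code (the `e=` value in the QEMU output) into its flags.
fn decode_page_fault_error(code: u64) -> PageFaultCause {
    PageFaultCause {
        protection_violation: code & 0x1 != 0,
        write: code & 0x2 != 0,
        user_mode: code & 0x4 != 0,
        reserved_bit: code & 0x8 != 0,
        instruction_fetch: code & 0x10 != 0,
    }
}

/// Returns the (row, column) of the VGA text cell that contains `addr`,
/// or `None` if the address lies outside the 80x25 text buffer.
fn vga_cell(addr: usize) -> Option<(usize, usize)> {
    const VGA_START: usize = 0xb8000;
    const COLUMNS: usize = 80;
    const ROWS: usize = 25;
    const BYTES_PER_CELL: usize = 2;

    let vga_end = VGA_START + COLUMNS * ROWS * BYTES_PER_CELL;
    if addr < VGA_START || addr >= vga_end {
        return None;
    }
    let index = (addr - VGA_START) / BYTES_PER_CELL;
    Some((index / COLUMNS, index % COLUMNS))
}
```

For the values from the QEMU output, `decode_page_fault_error(0x2)` reports a write to a non-present page, and `vga_cell(0xb8f00)` yields row 24, column 0, which is the first cell of the last screen line.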
[name mangling]: https://en.wikipedia.org/wiki/Name_mangling So the reason for the page fault is that the `write_byte` function tried to write to the VGA text buffer at `0xb8f00`. Of course this provokes a page fault: We forgot to identity map the VGA buffer in the new page table. The fix is pretty simple: ```rust // in src/memory/paging/mod.rs pub fn remap_the_kernel<A>(allocator: &mut A, boot_info: &BootInformation) where A: FrameAllocator { ... active_table.with(&mut new_table, &mut temporary_page, |mapper| { ... for section in elf_sections_tag.sections() { ... } // identity map the VGA text buffer let vga_buffer_frame = Frame::containing_address(0xb8000); // new mapper.identity_map(vga_buffer_frame, WRITABLE, allocator); // new }); let old_table = active_table.switch(new_table); println!("NEW TABLE!!!"); } ``` Now we should see the `NEW TABLE!!!` message (and also the `It did not crash!` line again). Congratulations! We successfully switched our kernel to a new page table! ### Fixing the Frame Allocator The same problem as above occurs when we try to use our [AreaFrameAllocator] again. Try to add the following to `rust_main` after switching to the new table: [AreaFrameAllocator]: @/edition-1/posts/05-allocating-frames/index.md#the-allocator ```rust // in src/lib.rs pub extern "C" fn rust_main(multiboot_information_address: usize) { ... memory::remap_the_kernel(&mut frame_allocator, boot_info); frame_allocator.allocate_frame(); // new: try to allocate a frame println!("It did not crash!"); ``` This causes the same boot loop as above. The reason is that the `AreaFrameAllocator` uses the memory map of the Multiboot information structure. But we did not map the Multiboot structure, so it causes a page fault.
To fix it, we identity map it as well: ```rust // in `remap_the_kernel` in src/memory/paging/mod.rs active_table.with(&mut new_table, &mut temporary_page, |mapper| { // … identity map the allocated kernel sections // … identity map the VGA text buffer // new: // identity map the multiboot info structure let multiboot_start = Frame::containing_address(boot_info.start_address()); let multiboot_end = Frame::containing_address(boot_info.end_address() - 1); for frame in Frame::range_inclusive(multiboot_start, multiboot_end) { mapper.identity_map(frame, PRESENT, allocator); } }); ``` Normally the multiboot struct fits on one page. But GRUB can place it anywhere, so it could randomly cross a page boundary. Therefore we use `range_inclusive` to be on the safe side. Note that we need to subtract 1 to get the address of the last byte because the end address is exclusive. Now we should be able to allocate frames again. ## Using the Correct Flags Right now, our new table maps all kernel sections as writable and executable. To fix this, we add an `EntryFlags::from_elf_section_flags` function: ```rust // in src/memory/paging/entry.rs use multiboot2::ElfSection; impl EntryFlags { pub fn from_elf_section_flags(section: &ElfSection) -> EntryFlags { use multiboot2::{ELF_SECTION_ALLOCATED, ELF_SECTION_WRITABLE, ELF_SECTION_EXECUTABLE}; let mut flags = EntryFlags::empty(); if section.flags().contains(ELF_SECTION_ALLOCATED) { // section is loaded to memory flags = flags | PRESENT; } if section.flags().contains(ELF_SECTION_WRITABLE) { flags = flags | WRITABLE; } if !section.flags().contains(ELF_SECTION_EXECUTABLE) { flags = flags | NO_EXECUTE; } flags } } ``` It just converts the ELF section flags to page table flags. Now we can use it to fix the `TODO` in our `remap_the_kernel` function: ```rust // in src/memory/paging/mod.rs pub fn remap_the_kernel<A>(allocator: &mut A, boot_info: &BootInformation) where A: FrameAllocator { ...
active_table.with(&mut new_table, &mut temporary_page, |mapper| { ... for section in elf_sections_tag.sections() { ... if !section.is_allocated() { // section is not loaded to memory continue; } ... // this is the new part let flags = EntryFlags::from_elf_section_flags(section); ... for frame in Frame::range_inclusive(start_frame, end_frame) { mapper.identity_map(frame, flags, allocator); } } ... }); ... } ``` But when we test it now, we get a page fault again. We can use the same technique as above to get the responsible function. I won't bother you with the QEMU output and just tell you the results: This time the responsible function is `control_regs::cr3_write()` itself. From the [error code][page fault error code] we learn that it was a page protection violation and caused by “reading a 1 in a reserved field”. So the page table had some reserved bit set that should be always 0. It must be the `NO_EXECUTE` flag, since it's the only new bit that we set in the page table. ### The NXE Bit The reason is that the `NO_EXECUTE` bit must only be used when the `NXE` bit in the [Extended Feature Enable Register] \(EFER) is set. That register is similar to Rust's feature gating and can be used to enable all sorts of advanced CPU features. Since the `NXE` bit is off by default, we caused a page fault when we added the `NO_EXECUTE` bit to the page table. [Extended Feature Enable Register]: https://en.wikipedia.org/wiki/Control_register#EFER So we need to enable the `NXE` bit. For that we use the [x86_64 crate] again: [x86_64 crate]: https://docs.rs/x86_64 ```rust // in lib.rs fn enable_nxe_bit() { use x86_64::registers::msr::{IA32_EFER, rdmsr, wrmsr}; let nxe_bit = 1 << 11; unsafe { let efer = rdmsr(IA32_EFER); wrmsr(IA32_EFER, efer | nxe_bit); } } ``` The unsafe block is needed since accessing the `EFER` register is only allowed in kernel mode. But we are in kernel mode, so everything is fine. 
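The interplay between the two bits can be modeled without touching any real register. The sketch below is only an illustration under stated assumptions: the names `with_nxe` and `uses_reserved_bit` are hypothetical, the `NO_EXECUTE` flag is assumed to be bit 63 of a page table entry, and `NXE` is bit 11 of `EFER` as in the code above.

```rust
/// Bit 11 of the EFER register enables the no-execute feature.
const EFER_NXE: u64 = 1 << 11;
/// Bit 63 of a page table entry is the NO_EXECUTE flag (assumption stated above).
const ENTRY_NO_EXECUTE: u64 = 1 << 63;

/// Returns the given EFER value with the NXE bit set.
/// All other bits are preserved, which is why the register is read first.
fn with_nxe(efer: u64) -> u64 {
    efer | EFER_NXE
}

/// Returns true if the entry sets NO_EXECUTE while NXE is disabled,
/// which is the situation in which the CPU treats bit 63 as a reserved bit.
fn uses_reserved_bit(entry: u64, efer: u64) -> bool {
    entry & ENTRY_NO_EXECUTE != 0 && efer & EFER_NXE == 0
}
```

With an `EFER` value of 0, an entry that has bit 63 set is reported as using a reserved bit. After `with_nxe`, the same entry is accepted. In the kernel, `enable_nxe_bit` is the function that applies this update to the real register.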
When we call this function before calling `remap_the_kernel`, everything should work again. ### The Write Protect Bit Right now, we are still able to modify the `.text` and `.rodata` sections, even though we did not set the `WRITABLE` flag for them. The reason is that the CPU ignores this bit in kernel mode by default. To enable write protection for the kernel as well, we need to set the _Write Protect_ bit in the `CR0` register: ```rust // in lib.rs fn enable_write_protect_bit() { use x86_64::registers::control_regs::{cr0, cr0_write, Cr0}; unsafe { cr0_write(cr0() | Cr0::WRITE_PROTECT) }; } ``` The `cr0` functions are unsafe because accessing the `CR0` register is only allowed in kernel mode. If we haven't forgotten to set the `WRITABLE` flag somewhere, it should still work without crashing. ## Creating a Guard Page The final step is to create a guard page for our kernel stack. The decision to place the kernel stack right above the page tables was already useful to detect a silent stack overflow in the [previous post][silent stack overflow]. Now we profit from it again. Let's look at our assembly `.bss` section again to understand why: [silent stack overflow]: @/edition-1/posts/06-page-tables/index.md ```nasm ; in src/arch/x86_64/boot.asm section .bss align 4096 p4_table: resb 4096 p3_table: resb 4096 p2_table: resb 4096 stack_bottom: resb 4096 * 4 stack_top: ``` The old page tables are right below the stack. They are still identity mapped since they are part of the kernel's `.bss` section. We just need to turn the old `p4_table` into a guard page to secure the kernel stack. That way we even reuse the memory of the old P3 and P2 tables to increase the stack size. So let's implement it: ```rust // in src/memory/paging/mod.rs pub fn remap_the_kernel<A>(allocator: &mut A, boot_info: &BootInformation) where A: FrameAllocator { ...
let old_table = active_table.switch(new_table); println!("NEW TABLE!!!"); // below is the new part // turn the old p4 page into a guard page let old_p4_page = Page::containing_address( old_table.p4_frame.start_address() ); active_table.unmap(old_p4_page, allocator); println!("guard page at {:#x}", old_p4_page.start_address()); } ``` Now we have a very basic guard page: The page below the stack is unmapped, so a stack overflow causes an immediate page fault. Thus, silent stack overflows are no longer possible. Or to be precise, they are improbable. If we have a function with many big stack variables, it's possible that the guard page is missed. For example, the following function could still corrupt memory below the stack: ```rust fn stack_overflow() { let x = [0; 99999]; } ``` This creates a very big array on the stack, which is currently filled from bottom to top. Therefore it misses the guard page and overwrites some memory below the stack. Eventually it hits the bottom of the guard page and causes a page fault. But before, it messes up memory, which is bad. Fortunately, there exists a solution called _stack probes_. The basic idea is to check all required stack pages at the beginning of each function. For example, a function that needs 9000 bytes on the stack would try to access `SP + 0`, `SP + 4096`, and `SP + 2 * 4096` (`SP` is the stack pointer). If the stack is not big enough, the guard page is hit and a page fault occurs. The function can't mess up memory anymore since the stack check occurs right at its start. Unfortunately stack probes require compiler support. They already work on Windows but they don't exist on Linux yet. The problem seems to be in LLVM, which Rust uses as backend. Hopefully it gets resolved soon so that our kernel stack becomes safe. For the current status and more information about stack probes check out the [tracking issue][stack probes issue]. 
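To make the probing idea concrete, here is a small sketch that only computes which offsets would be touched; it does not perform any memory access. The function name `probe_offsets` is hypothetical, and the offsets are given relative to `SP` in the notation used above.

```rust
/// Returns the offsets (relative to the stack pointer) that a stack probe
/// would touch for a frame of `frame_size` bytes: one offset per page.
fn probe_offsets(frame_size: usize, page_size: usize) -> Vec<usize> {
    assert!(page_size > 0, "page size must be non-zero");
    // Start at offset 0 and step one page at a time until the frame is covered.
    (0..frame_size).step_by(page_size).collect()
}
```

For a 9000-byte frame and 4096-byte pages, this yields the offsets 0, 4096, and 8192, matching the example above. The `[0; 99999]` array from `stack_overflow` occupies 399,996 bytes, assuming `i32` elements, so it would need 98 probes.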
[stack probes issue]: https://github.com/rust-lang/rust/issues/16012#issuecomment-160380183 ## What's next? Now that we have a (mostly) safe kernel stack and a working page table module, we can add a virtual memory allocator. The [next post] will explore Rust's allocator API and create a very basic allocator. At the end of that post, we will be able to use Rust's allocation and collections types such as [Box], [Vec], or even [BTreeMap]. [next post]: @/edition-1/posts/08-kernel-heap/index.md [Box]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html [Vec]: https://doc.rust-lang.org/1.10.0/collections/vec/struct.Vec.html [BTreeMap]: https://doc.rust-lang.org/1.10.0/collections/btree_map/struct.BTreeMap.html ## Footnotes [^fn-debug-notes]: For this post the most useful GDB command is probably `p/x *((long int*)0xfffffffffffff000)@512`. It prints all entries of the recursively mapped P4 table by interpreting it as an array of 512 long ints (the `@512` is GDB's array syntax). Of course you can also print other tables by adjusting the address. ================================================ FILE: blog/content/edition-1/posts/08-kernel-heap/index.md ================================================ +++ title = "Kernel Heap" weight = 8 path = "kernel-heap" aliases = ["kernel-heap.html"] date = 2016-04-11 template = "edition-1/page.html" [extra] updated = "2017-11-19" +++ In the previous posts we created a [frame allocator] and a [page table module]. Now we are ready to create a kernel heap and a memory allocator. Thus, we will unlock `Box`, `Vec`, `BTreeMap`, and the rest of the [alloc] crate. [frame allocator]: @/edition-1/posts/05-allocating-frames/index.md [page table module]: @/edition-1/posts/06-page-tables/index.md [alloc]: https://doc.rust-lang.org/nightly/alloc/index.html As always, you can find the complete source code on [GitHub]. Please file [issues] for any problems, questions, or improvement suggestions. 
There is also a comment section at the end of this page. [GitHub]: https://github.com/phil-opp/blog_os/tree/first_edition_post_8 [issues]: https://github.com/phil-opp/blog_os/issues ## Introduction The _heap_ is the memory area for long-lived allocations. The programmer can access it by using types like [Box][Box rustbyexample] or [Vec]. Behind the scenes, the compiler manages that memory by inserting calls to some memory allocator. By default, Rust links to the [jemalloc] allocator (for binaries) or the system allocator (for libraries). However, both rely on [system calls] such as [sbrk] and are thus unusable in our kernel. So we need to create and link our own allocator. [Box rustbyexample]: https://doc.rust-lang.org/rust-by-example/std/box.html [Vec]: https://doc.rust-lang.org/book/vectors.html [jemalloc]: http://jemalloc.net/ [system calls]: https://en.wikipedia.org/wiki/System_call [sbrk]: https://en.wikipedia.org/wiki/Sbrk A good allocator is fast and reliable. It also effectively utilizes the available memory and keeps [fragmentation] low. Furthermore, it works well for concurrent applications and scales to any number of processors. It even optimizes the memory layout with respect to the CPU caches to improve [cache locality] and avoid [false sharing]. [cache locality]: https://www.geeksforgeeks.org/locality-of-reference-and-cache-operation-in-cache-memory/ [fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing) [false sharing]: https://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html These requirements make good allocators pretty complex. For example, [jemalloc] has over 30,000 lines of code. This complexity is out of scope for our kernel, so we will create a much simpler allocator. Nevertheless, it should suffice for the foreseeable future, since we'll allocate only when it's absolutely necessary.
## The Allocator Interface The allocator interface in Rust is defined through the [`Alloc` trait], which looks like this: [`Alloc` trait]: https://doc.rust-lang.org/1.20.0/alloc/allocator/trait.Alloc.html ```rust pub unsafe trait Alloc { unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr>; unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout); […] // about 13 methods with default implementations } ``` The `alloc` method should allocate a memory block with the size and alignment given through the `layout` parameter. The `dealloc` method should free such memory blocks again. Both methods are `unsafe`, as is the trait itself. This has different reasons: - Implementing the `Alloc` trait is unsafe, because the implementation must satisfy a set of contracts. Among other things, pointers returned by `alloc` must point to valid memory and adhere to the `Layout` requirements. - Calling `alloc` is unsafe because the caller must ensure that the passed layout does not have size zero. I think this is because of compatibility reasons with existing C-allocators, where zero-sized allocations are undefined behavior. - Calling `dealloc` is unsafe because the caller must guarantee that the passed parameters adhere to the contract. For example, `ptr` must denote a valid memory block allocated via this allocator. To set the system allocator, the `global_allocator` attribute can be added to a `static` that implements `Alloc` for a shared reference of itself. For example: ```rust #[global_allocator] static MY_ALLOCATOR: MyAllocator = MyAllocator {...}; unsafe impl<'a> Alloc for &'a MyAllocator { unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr> {...} unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) {...} } ``` Note that `Alloc` needs to be implemented for `&MyAllocator`, not for `MyAllocator`.
The reason is that the `alloc` and `dealloc` methods require mutable `self` references, but there's no way to get such a reference safely from a `static`. By requiring implementations for `&MyAllocator`, the global allocator interface avoids this problem and pushes the burden of synchronization onto the user. ## Including the alloc crate The `Alloc` trait is part of the `alloc` crate, which like `core` is a subset of Rust's standard library. Apart from the trait, the crate also contains the standard types that require allocations such as `Box`, `Vec` and `Arc`. We can include it through a simple `extern crate`: ```rust // in src/lib.rs #![feature(alloc)] // the alloc crate is still unstable [...] #[macro_use] extern crate alloc; ``` We don't need to add anything to our Cargo.toml, since the `alloc` crate is part of the standard library and shipped with the Rust compiler. The `alloc` crate provides the [format!] and [vec!] macros, so we use `#[macro_use]` to import them. [format!]: https://doc.rust-lang.org/1.10.0/collections/macro.format!.html [vec!]: https://doc.rust-lang.org/1.10.0/collections/macro.vec!.html When we try to compile our crate now, the following error occurs: ``` error[E0463]: can't find crate for `alloc` --> src/lib.rs:10:1 | 16 | extern crate alloc; | ^^^^^^^^^^^^^^^^^^^ can't find crate ``` The problem is that [`xargo`] only cross compiles `libcore` by default. To also cross compile the `alloc` crate, we need to create a file named `Xargo.toml` in our project root (right next to the `Cargo.toml`) with the following content: [`xargo`]: https://github.com/japaric/xargo ```toml [target.x86_64-blog_os.dependencies] alloc = {} ``` This instructs `xargo` that we also need `alloc`. It still doesn't compile, since we need to define a global allocator in order to use the `alloc` crate: ``` error: no #[default_lib_allocator] found but one is required; is libstd not linked? ``` ## A Bump Allocator For our first allocator, we start simple. 
We create a `memory::heap_allocator` module containing a so-called _bump allocator_: ```rust // in src/memory/mod.rs mod heap_allocator; // in src/memory/heap_allocator.rs use alloc::heap::{Alloc, AllocErr, Layout}; /// A simple allocator that allocates memory linearly and ignores freed memory. #[derive(Debug)] pub struct BumpAllocator { heap_start: usize, heap_end: usize, next: usize, } impl BumpAllocator { pub const fn new(heap_start: usize, heap_end: usize) -> Self { Self { heap_start, heap_end, next: heap_start } } } unsafe impl Alloc for BumpAllocator { unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr> { let alloc_start = align_up(self.next, layout.align()); let alloc_end = alloc_start.saturating_add(layout.size()); if alloc_end <= self.heap_end { self.next = alloc_end; Ok(alloc_start as *mut u8) } else { Err(AllocErr::Exhausted{ request: layout }) } } unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) { // do nothing, leak memory } } ``` We also need to add `#![feature(allocator_api)]` to our `lib.rs`, since the allocator API is still unstable. The `heap_start` and `heap_end` fields contain the start and end address of our kernel heap. The `next` field contains the next free address and is increased after every allocation. To `allocate` a memory block we align the `next` address using the `align_up` function (described below). Then we add up the desired `size` and make sure that we don't exceed the end of the heap. We use a saturating add so that the `alloc_end` cannot overflow, which could lead to an invalid allocation. If everything goes well, we update the `next` address and return a pointer to the start address of the allocation. Otherwise, we return an `AllocErr::Exhausted` error. ### Alignment In order to simplify alignment, we add `align_down` and `align_up` functions: ``` rust /// Align downwards. Returns the greatest x with alignment `align` /// so that x <= addr. The alignment must be a power of 2.
pub fn align_down(addr: usize, align: usize) -> usize { if align.is_power_of_two() { addr & !(align - 1) } else if align == 0 { addr } else { panic!("`align` must be a power of 2"); } } /// Align upwards. Returns the smallest x with alignment `align` /// so that x >= addr. The alignment must be a power of 2. pub fn align_up(addr: usize, align: usize) -> usize { align_down(addr + align - 1, align) } ``` Let's start with `align_down`: If the alignment is a valid power of two (i.e. in `{1,2,4,8,…}`), we use some bitwise operations to return the aligned address. It works because every power of two has exactly one bit set in its binary representation. For example, the numbers `{1,2,4,8,…}` are `{1,10,100,1000,…}` in binary. By subtracting 1 we get `{0,01,011,0111,…}`. These binary numbers have a `1` at exactly the positions that need to be zeroed in `addr`. For example, the last 3 bits need to be zeroed for an alignment of 8. To align `addr`, we create a [bitmask] from `align-1`. We want a `0` at the position of each `1`, so we invert it using `!`. After that, the binary numbers look like this: `{…11111,…11110,…11100,…11000,…}`. Finally, we zero the correct bits using a binary `AND`. [bitmask]: https://en.wikipedia.org/wiki/Mask_(computing) Aligning upwards is simple now. We just increase `addr` by `align-1` and call `align_down`. We add `align-1` instead of `align` because we would otherwise waste `align` bytes for already aligned addresses. ### Reusing Freed Memory The heap memory is limited, so we should reuse freed memory for new allocations. This sounds simple, but is not so easy in practice since allocations can live arbitrarily long (and can be freed in an arbitrary order). This means that we need some kind of data structure to keep track of which memory areas are free and which are in use. This data structure should be very optimized since it causes overheads in both space (i.e. it needs backing memory) and time (i.e.
accessing and organizing it needs CPU cycles). Our bump allocator only keeps track of the next free memory address, which doesn't suffice to keep track of freed memory areas. So our only choice is to ignore deallocations and leak the corresponding memory. Thus our allocator quickly runs out of memory in a real system, but it suffices for simple testing. Later in this post, we will introduce a better allocator that does not leak freed memory. ### Using it as System Allocator Above we saw that we can use a static allocator as system allocator through the `global_allocator` attribute: ```rust #[global_allocator] static ALLOCATOR: MyAllocator = MyAllocator {...}; ``` This requires an implementation of `Alloc` for `&MyAllocator`, i.e. a shared reference. If we try to add such an implementation for our bump allocator (`unsafe impl<'a> Alloc for &'a BumpAllocator`), we have a problem: Our `alloc` method requires updating the `next` field, which is not possible for a shared reference. One solution could be to put the bump allocator behind a Mutex and wrap it into a new type, for which we can implement `Alloc` for a shared reference: ```rust struct LockedBumpAllocator(Mutex<BumpAllocator>); unsafe impl<'a> Alloc for &'a LockedBumpAllocator { unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr> { self.0.lock().alloc(layout) } unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) { self.0.lock().dealloc(ptr, layout) } } ``` However, there is a more interesting solution for our bump allocator that avoids locking altogether. The idea is to exploit that we only need to update a single `usize` field by using an `AtomicUsize` type. This type uses special synchronized hardware instructions to ensure data race freedom without requiring locks. #### A lock-free Bump Allocator A lock-free implementation looks like this: ```rust use core::sync::atomic::{AtomicUsize, Ordering}; /// A simple allocator that allocates memory linearly and ignores freed memory.
#[derive(Debug)] pub struct BumpAllocator { heap_start: usize, heap_end: usize, next: AtomicUsize, } impl BumpAllocator { pub const fn new(heap_start: usize, heap_end: usize) -> Self { // NOTE: requires adding #![feature(const_atomic_usize_new)] to lib.rs Self { heap_start, heap_end, next: AtomicUsize::new(heap_start) } } } unsafe impl<'a> Alloc for &'a BumpAllocator { unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr> { loop { // load current state of the `next` field let current_next = self.next.load(Ordering::Relaxed); let alloc_start = align_up(current_next, layout.align()); let alloc_end = alloc_start.saturating_add(layout.size()); if alloc_end <= self.heap_end { // update the `next` pointer if it still has the value `current_next` let next_now = self.next.compare_and_swap(current_next, alloc_end, Ordering::Relaxed); if next_now == current_next { // next address was successfully updated, allocation succeeded return Ok(alloc_start as *mut u8); } } else { return Err(AllocErr::Exhausted{ request: layout }) } } } unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) { // do nothing, leak memory } } ``` The implementation is a bit more complicated now. First, there is now a `loop` around the whole method body, since we might need multiple tries until we succeed (e.g. if multiple threads try to allocate at the same time). Also, the load operation is an explicit method call now, i.e. `self.next.load(Ordering::Relaxed)` instead of just `self.next`. The ordering parameter makes it possible to restrict the automatic instruction reordering performed by both the compiler and the CPU itself. For example, it is used when implementing locks to ensure that no write to the locked variable happens before the lock is acquired. We don't have such requirements, so we use the less restrictive `Relaxed` ordering. The heart of this lock-free method is the `compare_and_swap` call that updates the `next` address: ```rust ...
let next_now = self.next.compare_and_swap(current_next, alloc_end, Ordering::Relaxed); if next_now == current_next { // next address was successfully updated, allocation succeeded return Ok(alloc_start as *mut u8); } ... ``` Compare-and-swap is a special CPU instruction that updates a variable with a given value if it still contains the value we expect. If it doesn't, it means that another thread updated the value simultaneously, so we need to try again. The important feature is that this happens in a single uninterruptible operation (thus the name `atomic`), so no partial updates or intermediate states are possible. In detail, `compare_and_swap` works by comparing `next` with the first argument and, in case they're equal, updates `next` with the second parameter (the previous value is returned). To find out whether the swap happened, we check the returned previous value of `next`. If it is equal to the first parameter, the values were swapped. Otherwise, we try again in the next loop iteration. #### Setting the Global Allocator Now we can define a static bump allocator, that we can set as system allocator: ```rust pub const HEAP_START: usize = 0o_000_001_000_000_0000; pub const HEAP_SIZE: usize = 100 * 1024; // 100 KiB #[global_allocator] static HEAP_ALLOCATOR: BumpAllocator = BumpAllocator::new(HEAP_START, HEAP_START + HEAP_SIZE); ``` We use `0o_000_001_000_000_0000` as heap start address, which is the address starting at the second `P3` entry. It doesn't really matter which address we choose here as long as it's unused. We use a heap size of 100 KiB, which should be large enough for the near future. Putting the above in the `memory::heap_allocator` module would make most sense, but unfortunately there is currently a [weird bug][global allocator bug] in the global allocator implementation that requires putting the global allocator in the root module. I hope it's fixed soon, but until then we need to put the above lines in `src/lib.rs`.
For that, we need to make the `memory::heap_allocator` module public and add an import for `BumpAllocator`. We also need to add the `#![feature(global_allocator)]` at the top of our `lib.rs`, since the `global_allocator` attribute is still unstable. [global allocator bug]: https://github.com/rust-lang/rust/issues/44113 That's it! We have successfully created and linked a custom system allocator. Now we're ready to test it. ### Testing We should be able to allocate memory on the heap now. Let's try it in our `rust_main`: ```rust // in rust_main in src/lib.rs use alloc::boxed::Box; let heap_test = Box::new(42); ``` When we run it, a triple fault occurs and causes permanent rebooting. Let's try to debug it using QEMU and objdump as described [in the previous post][qemu debugging]: [qemu debugging]: @/edition-1/posts/07-remap-the-kernel/index.md#debugging ``` > qemu-system-x86_64 -d int -no-reboot -cdrom build/os-x86_64.iso … check_exception old: 0xffffffff new 0xe 0: v=0e e=0002 i=0 cpl=0 IP=0008:0000000000102860 pc=0000000000102860 SP=0010:0000000000116af0 CR2=0000000040000000 … ``` Aha! It's a [page fault] \(`v=0e`) and was caused by the code at `0x102860`. The code tried to write (`e=0002`) to address `0x40000000`. This address is `0o_000_001_000_000_0000` in octal, which is the `HEAP_START` address defined above. Of course it page-faults: We have forgotten to map the heap memory to some physical memory. [page fault]: https://wiki.osdev.org/Exceptions#Page_Fault ### Some Refactoring In order to map the heap cleanly, we do a bit of refactoring first. We move all memory initialization from our `rust_main` to a new `memory::init` function.
Now our `rust_main` looks like this:

```rust
// in src/lib.rs

#[no_mangle]
pub extern "C" fn rust_main(multiboot_information_address: usize) {
    // ATTENTION: we have a very small stack and no guard page
    vga_buffer::clear_screen();
    println!("Hello World{}", "!");

    let boot_info = unsafe {
        multiboot2::load(multiboot_information_address)
    };
    enable_nxe_bit();
    enable_write_protect_bit();

    // set up guard page and map the heap pages
    memory::init(boot_info);

    use alloc::boxed::Box;
    let heap_test = Box::new(42);

    println!("It did not crash!");

    loop {}
}
```

The `memory::init` function looks like this:

```rust
// in src/memory/mod.rs

use multiboot2::BootInformation;

pub fn init(boot_info: &BootInformation) {
    let memory_map_tag = boot_info.memory_map_tag().expect(
        "Memory map tag required");
    let elf_sections_tag = boot_info.elf_sections_tag().expect(
        "Elf sections tag required");

    let kernel_start = elf_sections_tag.sections()
        .filter(|s| s.is_allocated()).map(|s| s.addr).min().unwrap();
    let kernel_end = elf_sections_tag.sections()
        .filter(|s| s.is_allocated()).map(|s| s.addr + s.size).max()
        .unwrap();

    println!("kernel start: {:#x}, kernel end: {:#x}",
             kernel_start,
             kernel_end);
    println!("multiboot start: {:#x}, multiboot end: {:#x}",
             boot_info.start_address(),
             boot_info.end_address());

    let mut frame_allocator = AreaFrameAllocator::new(
        kernel_start as usize, kernel_end as usize,
        boot_info.start_address(), boot_info.end_address(),
        memory_map_tag.memory_areas());

    paging::remap_the_kernel(&mut frame_allocator, boot_info);
}
```

We've just moved the code to a new function. However, we've sneaked some improvements in:

- An additional `.filter(|s| s.is_allocated())` in the calculation of `kernel_start` and `kernel_end`. This ignores all sections that aren't loaded to memory (such as debug sections). Thus, the kernel end address is no longer artificially increased by such sections.
- We use the `start_address()` and `end_address()` methods of `boot_info` instead of calculating the addresses manually.
- We use the alternate `{:#x}` form when printing kernel/multiboot addresses. Before, we used `0x{:x}`, which leads to the same result. For a complete list of these “alternate” formatting forms, check out the [std::fmt documentation].

[std::fmt documentation]: https://doc.rust-lang.org/nightly/std/fmt/index.html#sign0

### Safety

It is important that the `memory::init` function is called only once, because it creates a new frame allocator based on kernel and multiboot start/end. When we call it a second time, a new frame allocator is created that reassigns the same frames, even if they are already in use.

In the second call it would use an identical frame allocator to remap the kernel. The `remap_the_kernel` function would request a frame from the frame allocator to create a new page table. But the returned frame is already in use, since we used it to create our current page table in the first call. In order to initialize the new table, the function zeroes it. This is the point where everything breaks, since we zero our current page table. The CPU is unable to read the next instruction and throws a page fault.

So we need to ensure that `memory::init` can only be called once. We could mark it as `unsafe`, which would bring it in line with Rust's memory safety rules. However, that would just push the unsafety to the caller. The caller can still accidentally call the function twice; the only difference is that the mistake needs to happen inside `unsafe` blocks.

A better solution is to insert a check at the function's beginning that panics if the function is called a second time. This approach has a small runtime cost, but we only call it once, so it's negligible. And we avoid two `unsafe` blocks (one at the calling site and one at the function itself), which is always good.

In order to make such checks easy, I created a small crate named [once].
To add it, we run `cargo add once` and add the following to our `src/lib.rs`:

[once]: https://crates.io/crates/once

```rust
// in src/lib.rs

#[macro_use]
extern crate once;
```

The crate provides an [assert_has_not_been_called!] macro (sorry for the long name :D). We can use it to fix the safety problem easily:

[assert_has_not_been_called!]: https://docs.rs/once/0.3.2/once/macro.assert_has_not_been_called!.html

``` rust
// in src/memory/mod.rs

pub fn init(boot_info: &BootInformation) {
    assert_has_not_been_called!("memory::init must be called only once");

    let memory_map_tag = ...
    ...
}
```

That's it. Now our `memory::init` function can only be called once. The macro works by creating a static [AtomicBool] named `CALLED`, which is initialized to `false`. When the macro is invoked, it checks the value of `CALLED` and sets it to `true`. If the value was already `true` before, the macro panics.

[AtomicBool]: https://doc.rust-lang.org/nightly/core/sync/atomic/struct.AtomicBool.html

### Mapping the Heap

Now we're ready to map the heap pages. In order to do it, we need access to the `ActivePageTable` or `Mapper` instance (see the [page table] and [kernel remapping] posts). For that we return it from the `paging::remap_the_kernel` function:

[page table]: @/edition-1/posts/06-page-tables/index.md
[kernel remapping]: @/edition-1/posts/07-remap-the-kernel/index.md

```rust
// in src/memory/paging/mod.rs

pub fn remap_the_kernel<A>(allocator: &mut A, boot_info: &BootInformation)
    -> ActivePageTable // new
    where A: FrameAllocator
{
    ...
    println!("guard page at {:#x}", old_p4_page.start_address());

    active_table // new
}
```

Now we have full page table access in the `memory::init` function. This allows us to map the heap pages to physical frames:

```rust
// in src/memory/mod.rs

pub fn init(boot_info: &BootInformation) {
    ...

    let mut frame_allocator = ...;

    // below is the new part

    let mut active_table = paging::remap_the_kernel(&mut frame_allocator,
        boot_info);

    use self::paging::Page;
    use {HEAP_START, HEAP_SIZE};

    let heap_start_page = Page::containing_address(HEAP_START);
    let heap_end_page = Page::containing_address(HEAP_START + HEAP_SIZE-1);

    for page in Page::range_inclusive(heap_start_page, heap_end_page) {
        active_table.map(page, paging::WRITABLE, &mut frame_allocator);
    }
}
```

The `Page::range_inclusive` function is just a copy of the `Frame::range_inclusive` function:

```rust
// in src/memory/paging/mod.rs

#[derive(…, PartialEq, Eq, PartialOrd, Ord)]
pub struct Page {...}

impl Page {
    ...
    pub fn range_inclusive(start: Page, end: Page) -> PageIter {
        PageIter {
            start: start,
            end: end,
        }
    }
}

pub struct PageIter {
    start: Page,
    end: Page,
}

impl Iterator for PageIter {
    type Item = Page;

    fn next(&mut self) -> Option<Page> {
        if self.start <= self.end {
            let page = self.start;
            self.start.number += 1;
            Some(page)
        } else {
            None
        }
    }
}
```

Now we map the whole heap to physical pages. This needs some time and might introduce a noticeable delay when we increase the heap size in the future. Another drawback is that we consume a large amount of physical frames even though we might not need the whole heap space. We will fix these problems in a future post by mapping the pages lazily.

### It works!

Now `Box` and `Vec` should work. For example:

```rust
// in rust_main in src/lib.rs

use alloc::boxed::Box;
let mut heap_test = Box::new(42);
*heap_test -= 15;
let heap_test2 = Box::new("hello");
println!("{:?} {:?}", heap_test, heap_test2);

let mut vec_test = vec![1,2,3,4,5,6,7];
vec_test[3] = 42;
for i in &vec_test {
    print!("{} ", i);
}
```

We can also use all other types of the `alloc` crate, including:

- the reference counted pointers [Rc] and [Arc]
- the owned string type [String] and the [format!]
macro
- [Linked List]
- the growable ring buffer [VecDeque]
- [BinaryHeap]
- [BTreeMap] and [BTreeSet]

[Rc]: https://doc.rust-lang.org/1.10.0/alloc/rc/
[Arc]: https://doc.rust-lang.org/1.10.0/alloc/arc/
[String]: https://doc.rust-lang.org/1.10.0/collections/string/struct.String.html
[Linked List]: https://doc.rust-lang.org/1.10.0/collections/linked_list/struct.LinkedList.html
[VecDeque]: https://doc.rust-lang.org/1.10.0/collections/vec_deque/struct.VecDeque.html
[BinaryHeap]: https://doc.rust-lang.org/1.10.0/collections/binary_heap/struct.BinaryHeap.html
[BTreeMap]: https://doc.rust-lang.org/1.10.0/collections/btree_map/struct.BTreeMap.html
[BTreeSet]: https://doc.rust-lang.org/1.10.0/collections/btree_set/struct.BTreeSet.html

## A better Allocator

Right now, we leak every freed memory block. Thus, we run out of memory quickly, for example, by creating a new `String` in each iteration of a loop:

```rust
// in rust_main in src/lib.rs

for i in 0..10000 {
    format!("Some String");
}
```

To fix this, we need to create an allocator that keeps track of freed memory blocks and reuses them if possible. This introduces some challenges:

- We need to keep track of a possibly unlimited number of freed blocks. For example, an application could allocate `n` one-byte sized blocks and free every second block, which creates `n/2` freed blocks. We can't rely on any upper bound on the number of freed blocks since `n` could be arbitrarily large.
- We can't use any of the collections from above, since they rely on allocations themselves. (It might be possible as soon as [RFC #1398] is [implemented][#32838], which allows user-defined allocators for specific collection instances.)
- We need to merge adjacent freed blocks if possible. Otherwise, the freed memory is no longer usable for large allocations. We will discuss this point in more detail below.
- Our allocator should search the set of freed blocks quickly and keep fragmentation low.
[RFC #1398]: https://github.com/rust-lang/rfcs/blob/master/text/1398-kinds-of-allocators.md
[#32838]: https://github.com/rust-lang/rust/issues/32838

### Creating a List of freed Blocks

Where do we store the information about an unlimited number of freed blocks? We can't use any fixed size data structure since it could always be too small for some allocation sequences. So we need some kind of dynamically growing set.

One possible solution could be to use an array-like data structure that starts at some unused virtual address. If the array becomes full, we increase its size and map new physical frames as backing storage. This approach would require a large part of the virtual address space since the array could grow significantly. We would need to create a custom implementation of a growable array and manipulate the page tables when deallocating. It would also consume a possibly large number of physical frames as backing storage.

We will choose another solution with different tradeoffs. It's not clearly “better” than the approach above and has significant disadvantages itself. However, it has one big advantage: It does not need any additional physical or virtual memory at all. This makes it less complex since we don't need to manipulate any page tables. The idea is the following:

A freed memory block is not used anymore and no one needs the stored information. It is still mapped to a virtual address and backed by a physical page. So we just store the information about the freed block _in the block itself_. We keep a pointer to the first block and store a pointer to the next block in each block. Thus, we create a singly linked list:

![Linked List Allocator](overview.svg)

In the following, we call a freed block a _hole_. Each hole stores its size and a pointer to the next hole. If a hole is larger than needed, we leave the remaining memory unused. By storing a pointer to the first hole, we are able to traverse the complete list.
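
To make the idea of storing the bookkeeping data _in the freed block itself_ more concrete, here is a minimal sketch. It is not the implementation used later in this post: it models the heap as a plain byte buffer and uses byte offsets instead of real pointers so that it runs on a normal host system. All names (`ToyHeap`, `NO_NEXT`, `write_hole`, `read_hole`, `holes`) are made up for this illustration.

```rust
/// Sentinel offset that means "there is no next hole".
/// (Hypothetical stand-in for an optional pointer.)
const NO_NEXT: u64 = u64::MAX;

/// A toy heap: a byte buffer plus the offset of the first hole.
pub struct ToyHeap {
    memory: Vec<u8>,
    first_hole: u64,
}

impl ToyHeap {
    /// Creates a heap whose memory forms one single large hole.
    pub fn new(size: usize) -> ToyHeap {
        assert!(size >= 16, "a hole needs 16 bytes for its header");
        let mut heap = ToyHeap { memory: vec![0; size], first_hole: 0 };
        heap.write_hole(0, size as u64, NO_NEXT);
        heap
    }

    /// Stores the hole header (size and next offset, 8 bytes each)
    /// inside the free block that starts at `offset`.
    fn write_hole(&mut self, offset: u64, size: u64, next: u64) {
        let o = offset as usize;
        self.memory[o..o + 8].copy_from_slice(&size.to_le_bytes());
        self.memory[o + 8..o + 16].copy_from_slice(&next.to_le_bytes());
    }

    /// Reads the `(size, next)` header stored at `offset`.
    fn read_hole(&self, offset: u64) -> (u64, u64) {
        let o = offset as usize;
        let mut size = [0u8; 8];
        let mut next = [0u8; 8];
        size.copy_from_slice(&self.memory[o..o + 8]);
        next.copy_from_slice(&self.memory[o + 8..o + 16]);
        (u64::from_le_bytes(size), u64::from_le_bytes(next))
    }

    /// Walks the list from the first hole and returns `(offset, size)`
    /// for every hole it encounters.
    pub fn holes(&self) -> Vec<(u64, u64)> {
        let mut result = Vec::new();
        let mut current = self.first_hole;
        while current != NO_NEXT {
            let (size, next) = self.read_hole(current);
            result.push((current, size));
            current = next;
        }
        result
    }
}
```

The sketch needs no storage besides the heap memory and a single "first hole" value, which is exactly the advantage described above. A real kernel allocator would use raw pointers into mapped memory instead of offsets into a `Vec`.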
#### Initialization

When the heap is created, all of its memory is unused. Thus, it forms a single large hole:

![Heap Initialization](initialization.svg)

The optional pointer to the next hole is set to `None`.

#### Allocation

In order to allocate a block of memory, we need to find a hole that satisfies the size and alignment requirements. If the found hole is larger than required, we split it into two smaller holes. For example, when we allocate a 24 byte block right after initialization, we split the single hole into a hole of size 24 and a hole with the remaining size:

![split hole](split-hole.svg)

Then we use the new 24 byte hole to perform the allocation:

![24 bytes allocated](allocate.svg)

To find a suitable hole, we can use several search strategies:

- **best fit**: Search the whole list and choose the _smallest_ hole that satisfies the requirements.
- **worst fit**: Search the whole list and choose the _largest_ hole that satisfies the requirements.
- **first fit**: Search the list from the beginning and choose the _first_ hole that satisfies the requirements.

Each strategy has its advantages and disadvantages. Best fit uses the smallest hole possible and leaves larger holes for large allocations. But splitting the smallest hole might create a tiny hole, which is too small for most allocations. In contrast, the worst fit strategy always chooses the largest hole. Thus, it does not create tiny holes, but it consumes the large block, which might be required for large allocations.

For our use case, the best fit strategy is better than worst fit. The reason is that we have a minimal hole size of 16 bytes, since each hole needs to be able to store a size (8 bytes) and a pointer to the next hole (8 bytes). Thus, even the best fit strategy leads to holes of usable size. Furthermore, we will need to allocate very large blocks occasionally (e.g. for [DMA] buffers).
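
The three search strategies can be sketched in a few lines. This is only an illustration under simplifying assumptions: the holes are given as a slice instead of a linked list, alignment requirements are ignored, and the `Hole` type and the function names are hypothetical.

```rust
/// A free block, described by its start address and size in bytes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Hole {
    pub addr: usize,
    pub size: usize,
}

/// First fit: stop at the first hole that is large enough.
pub fn first_fit(holes: &[Hole], size: usize) -> Option<Hole> {
    holes.iter().copied().find(|h| h.size >= size)
}

/// Best fit: look at all holes and pick the smallest one that is large enough.
pub fn best_fit(holes: &[Hole], size: usize) -> Option<Hole> {
    holes.iter().copied().filter(|h| h.size >= size).min_by_key(|h| h.size)
}

/// Worst fit: look at all holes and pick the largest one that is large enough.
pub fn worst_fit(holes: &[Hole], size: usize) -> Option<Hole> {
    holes.iter().copied().filter(|h| h.size >= size).max_by_key(|h| h.size)
}
```

For holes of size 64, 32, and 512 (in list order) and a 24 byte request, first fit picks the 64 byte hole, best fit the 32 byte hole, and worst fit the 512 byte hole. Note that only `first_fit` can return before it has looked at every hole.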
[DMA]: https://en.wikipedia.org/wiki/Direct_memory_access

However, both best fit and worst fit have a significant problem: They need to scan the whole list for each allocation in order to find the optimal block. This leads to long allocation times if the list is long. The first fit strategy does not have this problem, as it returns as soon as it finds a suitable hole. It is fairly fast for small allocations and might only need to scan the whole list for large allocations.

#### Deallocation

To deallocate a block of memory, we can just insert its corresponding hole somewhere into the list. However, we need to merge adjacent holes. Otherwise, we are unable to reuse the freed memory for larger allocations. For example:

![deallocate memory, which leads to adjacent holes](deallocate.svg)

In order to use these adjacent holes for a large allocation, we need to merge them to a single large hole first:

![merge adjacent holes and allocate large block](merge-holes-and-allocate.svg)

The easiest way to ensure that adjacent holes are always merged is to keep the hole list sorted by address. Thus, we only need to check the predecessor and the successor in the list when we free a memory block. If they are adjacent to the freed block, we merge the corresponding holes. Otherwise, we insert the freed block as a new hole at the correct position.

### Implementation

The detailed implementation would go beyond the scope of this post, since it contains several hidden difficulties. For example:

- Several merge cases: Merge with the previous hole, merge with the next hole, merge with both holes.
- We need to satisfy the alignment requirements, which requires additional splitting logic.
- The minimal hole size of 16 bytes: We must not create smaller holes when splitting a hole.

I created the [linked_list_allocator] crate to handle all of these cases. It consists of a [Heap struct] that provides an `allocate_first_fit` and a `deallocate` method.
It also contains a [LockedHeap] type that wraps `Heap` into a spinlock so that it's usable as a static system allocator. If you are interested in the implementation details, check out the [source code][linked_list_allocator source].

[linked_list_allocator]: https://docs.rs/crate/linked_list_allocator/0.4.1
[Heap struct]: https://docs.rs/linked_list_allocator/0.4.1/linked_list_allocator/struct.Heap.html
[LockedHeap]: https://docs.rs/linked_list_allocator/0.4.1/linked_list_allocator/struct.LockedHeap.html
[linked_list_allocator source]: https://github.com/phil-opp/linked-list-allocator

We need to add the extern crate to our `Cargo.toml` and our `lib.rs`:

``` bash
> cargo add linked_list_allocator
```

```rust
// in src/lib.rs
extern crate linked_list_allocator;
```

Now we can change our global allocator:

```rust
use linked_list_allocator::LockedHeap;

#[global_allocator]
static HEAP_ALLOCATOR: LockedHeap = LockedHeap::empty();
```

We can't initialize the linked list allocator statically, since it needs to initialize the first hole (as described [above](#initialization)). This can't be done at compile time, so the function can't be a `const` function. Therefore we can only create an empty heap and initialize it later at runtime. For that, we add the following lines to our `rust_main` function:

```rust
// in src/lib.rs

#[no_mangle]
pub extern "C" fn rust_main(multiboot_information_address: usize) {
    […]

    // set up guard page and map the heap pages
    memory::init(boot_info);

    // initialize the heap allocator
    unsafe {
        HEAP_ALLOCATOR.lock().init(HEAP_START, HEAP_START + HEAP_SIZE);
    }
    […]
}
```

It is important that we initialize the heap _after_ mapping the heap pages, since the init function writes to the heap memory (the first hole).

Our kernel uses the new allocator now, so we can deallocate memory without leaking it.
The example from above should work now without causing an OOM situation:

```rust
// in rust_main in src/lib.rs

for i in 0..10000 {
    format!("Some String");
}
```

### Performance

The linked list based approach has some performance problems. Each allocation or deallocation might need to scan the complete list of holes in the worst case. However, I think it's good enough for now, since our heap will stay relatively small for the near future. When our allocator becomes a performance problem eventually, we can just replace it with a faster alternative.

## Summary

Now we're able to use heap storage in our kernel without leaking memory. This allows us to effectively process dynamic data such as user supplied strings in the future. We can also use `Rc` and `Arc` to create types with shared ownership. And we have access to various data structures such as `Vec` or `Linked List`, which will make our lives much easier. We even have some well tested and optimized [binary heap] and [B-tree] implementations!

[binary heap]:https://en.wikipedia.org/wiki/Binary_heap
[B-tree]: https://en.wikipedia.org/wiki/B-tree

## What's next?

This post concludes the section about memory management for now. We will revisit this topic eventually, but now it's time to explore other topics. The upcoming posts will be about CPU exceptions and interrupts. We will catch all page, double, and triple faults and create a driver to read keyboard input. The [next post] starts by setting up a so-called _Interrupt Descriptor Table_.

[next post]: @/edition-1/posts/09-handling-exceptions/index.md

================================================
FILE: blog/content/edition-1/posts/09-handling-exceptions/index.md
================================================
+++
title = "Handling Exceptions"
weight = 9
path = "handling-exceptions"
aliases = ["handling-exceptions.html"]
date = 2017-03-26
template = "edition-1/page.html"
+++

In this post, we start exploring CPU exceptions.
Exceptions occur in various erroneous situations, for example when accessing an invalid memory address or when dividing by zero. To catch them, we have to set up an _interrupt descriptor table_ that provides handler functions. At the end of this post, our kernel will be able to catch [breakpoint exceptions] and to resume normal execution afterwards.

[breakpoint exceptions]: https://wiki.osdev.org/Exceptions#Breakpoint

As always, the complete source code is available on [GitHub]. Please file [issues] for any problems, questions, or improvement suggestions. There is also a comment section at the end of this page.

[GitHub]: https://github.com/phil-opp/blog_os/tree/first_edition_post_9
[issues]: https://github.com/phil-opp/blog_os/issues

## Exceptions

An exception signals that something is wrong with the current instruction. For example, the CPU issues an exception if the current instruction tries to divide by 0. When an exception occurs, the CPU interrupts its current work and immediately calls a specific exception handler function, depending on the exception type.

We've already seen several types of exceptions in our kernel:

- **Invalid Opcode**: This exception occurs when the current instruction is invalid. For example, this exception occurred when we tried to use SSE instructions before enabling SSE. Without SSE, the CPU didn't know the `movups` and `movaps` instructions, so it threw an exception when it stumbled over them.
- **Page Fault**: A page fault occurs on illegal memory accesses. For example, if the current instruction tries to read from an unmapped page or tries to write to a read-only page.
- **Double Fault**: When an exception occurs, the CPU tries to call the corresponding handler function. If another exception occurs _while calling the exception handler_, the CPU raises a double fault exception. This exception also occurs when there is no handler function registered for an exception.
- **Triple Fault**: If an exception occurs while the CPU tries to call the double fault handler function, it issues a fatal _triple fault_. We can't catch or handle a triple fault. Most processors react by resetting themselves and rebooting the operating system. This causes the bootloops we experienced in the previous posts.

For the full list of exceptions check out the [OSDev wiki][exceptions].

[exceptions]: https://wiki.osdev.org/Exceptions

### The Interrupt Descriptor Table

In order to catch and handle exceptions, we have to set up a so-called _Interrupt Descriptor Table_ (IDT). In this table we can specify a handler function for each CPU exception. The hardware uses this table directly, so we need to follow a predefined format. Each entry must have the following 16-byte structure:

| Type | Name                     | Description                                                |
| ---- | ------------------------ | ---------------------------------------------------------- |
| u16  | Function Pointer [0:15]  | The lower bits of the pointer to the handler function.     |
| u16  | GDT selector             | Selector of a code segment in the GDT.                     |
| u16  | Options                  | (see below)                                                |
| u16  | Function Pointer [16:31] | The middle bits of the pointer to the handler function.    |
| u32  | Function Pointer [32:63] | The remaining bits of the pointer to the handler function. |
| u32  | Reserved                 |                                                            |

The options field has the following format:

| Bits  | Name                             | Description                                                                                                     |
| ----- | -------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| 0-2   | Interrupt Stack Table Index      | 0: Don't switch stacks, 1-7: Switch to the n-th stack in the Interrupt Stack Table when this handler is called. |
| 3-7   | Reserved                         |                                                                                                                 |
| 8     | 0: Interrupt Gate, 1: Trap Gate  | If this bit is 0, interrupts are disabled when this handler is called.                                          |
| 9-11  | must be one                      |                                                                                                                 |
| 12    | must be zero                     |                                                                                                                 |
| 13‑14 | Descriptor Privilege Level (DPL) | The minimal privilege level required for calling this handler.                                                  |
| 15    | Present                          |                                                                                                                 |

Each exception has a predefined IDT index. For example the invalid opcode exception has table index 6 and the page fault exception has table index 14. Thus, the hardware can automatically load the corresponding IDT entry for each exception. The [Exception Table][exceptions] in the OSDev wiki shows the IDT indexes of all exceptions in the “Vector nr.” column.

When an exception occurs, the CPU roughly does the following:

1. Push some registers on the stack, including the instruction pointer and the [RFLAGS] register. (We will use these values later in this post.)
2. Read the corresponding entry from the Interrupt Descriptor Table (IDT). For example, the CPU reads the 14-th entry when a page fault occurs.
3. Check if the entry is present. Raise a double fault if not.
4. Disable interrupts if the entry is an interrupt gate (bit 40 not set).
5. Load the specified GDT selector into the CS segment.
6. Jump to the specified handler function.

[RFLAGS]: https://en.wikipedia.org/wiki/FLAGS_register

## An IDT Type

Instead of creating our own IDT type, we will use the [`Idt` struct] of the `x86_64` crate, which looks like this:

[`Idt` struct]: https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/struct.Idt.html

``` rust
#[repr(C)]
pub struct Idt {
    pub divide_by_zero: IdtEntry<HandlerFunc>,
    pub debug: IdtEntry<HandlerFunc>,
    pub non_maskable_interrupt: IdtEntry<HandlerFunc>,
    pub breakpoint: IdtEntry<HandlerFunc>,
    pub overflow: IdtEntry<HandlerFunc>,
    pub bound_range_exceeded: IdtEntry<HandlerFunc>,
    pub invalid_opcode: IdtEntry<HandlerFunc>,
    pub device_not_available: IdtEntry<HandlerFunc>,
    pub double_fault: IdtEntry<HandlerFuncWithErrCode>,
    pub invalid_tss: IdtEntry<HandlerFuncWithErrCode>,
    pub segment_not_present: IdtEntry<HandlerFuncWithErrCode>,
    pub stack_segment_fault: IdtEntry<HandlerFuncWithErrCode>,
    pub general_protection_fault: IdtEntry<HandlerFuncWithErrCode>,
    pub page_fault: IdtEntry<PageFaultHandlerFunc>,
    pub x87_floating_point: IdtEntry<HandlerFunc>,
    pub alignment_check: IdtEntry<HandlerFuncWithErrCode>,
    pub machine_check: IdtEntry<HandlerFunc>,
    pub simd_floating_point: IdtEntry<HandlerFunc>,
    pub virtualization: IdtEntry<HandlerFunc>,
    pub security_exception: IdtEntry<HandlerFuncWithErrCode>,
    pub interrupts: [IdtEntry<HandlerFunc>; 224],
    // some fields omitted
}
```

The fields have the type [`IdtEntry<F>`][`IdtEntry`],
which is a struct that represents the fields of an IDT entry (see the table above). The type parameter `F` defines the expected handler function type. We see that some entries require a [`HandlerFunc`] and some entries require a [`HandlerFuncWithErrCode`]. The page fault even has its own special type: [`PageFaultHandlerFunc`].

[`IdtEntry`]: https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/struct.IdtEntry.html
[`HandlerFunc`]: https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/type.HandlerFunc.html
[`HandlerFuncWithErrCode`]: https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/type.HandlerFuncWithErrCode.html
[`PageFaultHandlerFunc`]: https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/type.PageFaultHandlerFunc.html

Let's look at the `HandlerFunc` type first:

```rust
type HandlerFunc = extern "x86-interrupt" fn(_: &mut ExceptionStackFrame);
```

It's a [type alias] for an `extern "x86-interrupt" fn` type. The `extern` keyword defines a function with a [foreign calling convention] and is often used to communicate with C code (`extern "C" fn`). But what is the `x86-interrupt` calling convention?

[type alias]: https://doc.rust-lang.org/book/type-aliases.html
[foreign calling convention]: https://doc.rust-lang.org/1.30.0/book/first-edition/ffi.html#foreign-calling-conventions

## The Interrupt Calling Convention

Exceptions are quite similar to function calls: The CPU jumps to the first instruction of the called function and executes it. Afterwards, if the function is not diverging, the CPU jumps to the return address and continues the execution of the parent function.

However, there is a major difference between exceptions and function calls: A function call is invoked voluntarily by a compiler-inserted `call` instruction, while an exception might occur at _any_ instruction. In order to understand the consequences of this difference, we need to examine function calls in more detail.

[Calling conventions] specify the details of a function call.
For example, they specify where function parameters are placed (e.g. in registers or on the stack) and how results are returned. On x86_64 Linux, the following rules apply for C functions (specified in the [System V ABI]):

[Calling conventions]: https://en.wikipedia.org/wiki/Calling_convention
[System V ABI]: https://refspecs.linuxbase.org/elf/gabi41.pdf

- the first six integer arguments are passed in registers `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`
- additional arguments are passed on the stack
- results are returned in `rax` and `rdx`

Note that Rust does not follow the C ABI (in fact, [there isn't even a Rust ABI yet][rust abi]). So these rules apply only to functions declared as `extern "C" fn`.

[rust abi]: https://github.com/rust-lang/rfcs/issues/600

### Preserved and Scratch Registers

The calling convention divides the registers into two parts: _preserved_ and _scratch_ registers.

The values of _preserved_ registers must remain unchanged across function calls. So a called function (the _“callee”_) is only allowed to overwrite these registers if it restores their original values before returning. Therefore these registers are called _“callee-saved”_. A common pattern is to save these registers to the stack at the function's beginning and restore them just before returning.

In contrast, a called function is allowed to overwrite _scratch_ registers without restrictions. If the caller wants to preserve the value of a scratch register across a function call, it needs to back it up before the call and restore it afterwards (e.g. by pushing it to the stack). So the scratch registers are _caller-saved_.
On x86_64, the C calling convention specifies the following preserved and scratch registers:

| preserved registers                             | scratch registers                                           |
| ----------------------------------------------- | ----------------------------------------------------------- |
| `rbp`, `rbx`, `rsp`, `r12`, `r13`, `r14`, `r15` | `rax`, `rcx`, `rdx`, `rsi`, `rdi`, `r8`, `r9`, `r10`, `r11` |
| _callee-saved_                                  | _caller-saved_                                              |

The compiler knows these rules, so it generates the code accordingly. For example, most functions begin with a `push rbp`, which backs up `rbp` on the stack (because it's a callee-saved register).

### Preserving all Registers

In contrast to function calls, exceptions can occur on _any_ instruction. In most cases we don't even know at compile time if the generated code will cause an exception. For example, the compiler can't know if an instruction causes a stack overflow or a page fault.

Since we don't know when an exception occurs, we can't back up any registers beforehand. This means that we can't use a calling convention that relies on caller-saved registers for exception handlers. Instead, we need a calling convention that preserves _all registers_. The `x86-interrupt` calling convention is such a calling convention, so it guarantees that all register values are restored to their original values on function return.

### The Exception Stack Frame

On a normal function call (using the `call` instruction), the CPU pushes the return address before jumping to the target function. On function return (using the `ret` instruction), the CPU pops this return address and jumps to it. So the stack frame of a normal function call looks like this:

![function stack frame](function-stack-frame.svg)

For exception and interrupt handlers, however, pushing a return address would not suffice, since interrupt handlers often run in a different context (stack pointer, CPU flags, etc.). Instead, the CPU performs the following steps when an interrupt occurs:

1. **Aligning the stack pointer**: An interrupt can occur at any instruction, so the stack pointer can have any value, too. However, some CPU instructions (e.g. some SSE instructions) require that the stack pointer is aligned on a 16 byte boundary, therefore the CPU performs such an alignment right after the interrupt.
2. **Switching stacks** (in some cases): A stack switch occurs when the CPU privilege level changes, for example when a CPU exception occurs in a user mode program. It is also possible to configure stack switches for specific interrupts using the so-called _Interrupt Stack Table_ (described in the next post).
3. **Pushing the old stack pointer**: The CPU pushes the values of the stack pointer (`rsp`) and the stack segment (`ss`) registers at the time when the interrupt occurred (before the alignment). This makes it possible to restore the original stack pointer when returning from an interrupt handler.
4. **Pushing and updating the `RFLAGS` register**: The [`RFLAGS`] register contains various control and status bits. On interrupt entry, the CPU changes some bits and pushes the old value.
5. **Pushing the instruction pointer**: Before jumping to the interrupt handler function, the CPU pushes the instruction pointer (`rip`) and the code segment (`cs`). This is comparable to the return address push of a normal function call.
6. **Pushing an error code** (for some exceptions): For some specific exceptions such as page faults, the CPU pushes an error code, which describes the cause of the exception.
7. **Invoking the interrupt handler**: The CPU reads the address and the segment descriptor of the interrupt handler function from the corresponding field in the IDT. It then invokes this handler by loading the values into the `rip` and `cs` registers.
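
The pushes in steps 3 to 5 determine the memory layout that an exception handler sees. The following sketch models that layout on a normal host system. It is a simplification under stated assumptions: it ignores stack switches, the optional error code, and the `RFLAGS` update, and the names `StackFrameSketch` and `simulate_interrupt_entry` are made up for this illustration (they are not part of the `x86_64` crate).

```rust
/// The five values pushed by the CPU, in ascending address order.
/// The value pushed last (`rip`) ends up at the lowest address.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
pub struct StackFrameSketch {
    pub instruction_pointer: u64, // rip, pushed last
    pub code_segment: u64,        // cs
    pub cpu_flags: u64,           // rflags
    pub stack_pointer: u64,       // rsp at the time of the interrupt
    pub stack_segment: u64,       // ss, pushed first
}

/// Models steps 1 and 3-5: align the stack pointer to 16 bytes, then
/// push `ss`, `rsp`, `rflags`, `cs`, and `rip` (8 bytes each).
/// Returns the new stack pointer and the resulting frame.
pub fn simulate_interrupt_entry(
    rsp: u64,
    ss: u64,
    rflags: u64,
    cs: u64,
    rip: u64,
) -> (u64, StackFrameSketch) {
    // Step 1: round the stack pointer down to a 16 byte boundary.
    let aligned_rsp = rsp & !0xf;
    // Steps 3-5: five 8-byte pushes move the stack pointer down by 40 bytes.
    let new_rsp = aligned_rsp - 5 * 8;
    let frame = StackFrameSketch {
        instruction_pointer: rip,
        code_segment: cs,
        cpu_flags: rflags,
        // The *original* (unaligned) stack pointer is saved.
        stack_pointer: rsp,
        stack_segment: ss,
    };
    (new_rsp, frame)
}
```

For example, an interrupt that arrives while `rsp` is `0x1008` first aligns the stack pointer to `0x1000` and then pushes 40 bytes, so the frame starts at `0xfd8`, while the saved stack pointer inside the frame is still `0x1008`.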
[`RFLAGS`]: https://en.wikipedia.org/wiki/FLAGS_register

So the _exception stack frame_ looks like this:

![exception stack frame](exception-stack-frame.svg)

In the `x86_64` crate, the exception stack frame is represented by the [`ExceptionStackFrame`] struct. It is passed to interrupt handlers as `&mut` and can be used to retrieve additional information about the exception's cause. The struct contains no error code field, since only a few exceptions push an error code. These exceptions use the separate [`HandlerFuncWithErrCode`] function type, which has an additional `error_code` argument.

[`ExceptionStackFrame`]: https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/struct.ExceptionStackFrame.html

### Behind the Scenes

The `x86-interrupt` calling convention is a powerful abstraction that hides almost all of the messy details of the exception handling process. However, sometimes it's useful to know what's happening behind the curtain. Here is a short overview of the things that the `x86-interrupt` calling convention takes care of:

- **Retrieving the arguments**: Most calling conventions expect that the arguments are passed in registers. This is not possible for exception handlers, since we must not overwrite any register values before backing them up on the stack. Instead, the `x86-interrupt` calling convention is aware that the arguments already lie on the stack at a specific offset.
- **Returning using `iretq`**: Since the exception stack frame completely differs from stack frames of normal function calls, we can't return from handler functions through the normal `ret` instruction. Instead, the `iretq` instruction must be used.
- **Handling the error code**: The error code, which is pushed for some exceptions, makes things much more complex. It changes the stack alignment (see the next point) and needs to be popped off the stack before returning. The `x86-interrupt` calling convention handles all that complexity. However, it doesn't know which handler function is used for which exception, so it needs to deduce that information from the number of function arguments. That means that the programmer is still responsible for using the correct function type for each exception. Luckily, the `Idt` type defined by the `x86_64` crate ensures that the correct function types are used.
- **Aligning the stack**: There are some instructions (especially SSE instructions) that require a 16-byte stack alignment. The CPU ensures this alignment whenever an exception occurs, but for some exceptions it destroys it again later when it pushes an error code. The `x86-interrupt` calling convention takes care of this by realigning the stack in this case.

If you are interested in more details: We also have a series of posts that explains exception handling using [naked functions] linked [at the end of this post][too-much-magic].

[naked functions]: https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md
[too-much-magic]: #too-much-magic

## Implementation

Now that we've understood the theory, it's time to handle CPU exceptions in our kernel. We start by creating a new `interrupts` module:

```rust
// in src/lib.rs
...
mod interrupts;
...
```

In the new module, we create an `init` function that creates a new `Idt`:

```rust
// in src/interrupts.rs

use x86_64::structures::idt::Idt;

pub fn init() {
    let mut idt = Idt::new();
}
```

Now we can add handler functions. We start by adding a handler for the [breakpoint exception]. The breakpoint exception is the perfect exception to test exception handling. Its only purpose is to temporarily pause a program when the breakpoint instruction `int3` is executed.
[breakpoint exception]: https://wiki.osdev.org/Exceptions#Breakpoint

The breakpoint exception is commonly used in debuggers: When the user sets a breakpoint, the debugger overwrites the corresponding instruction with the `int3` instruction so that the CPU throws the breakpoint exception when it reaches that line. When the user wants to continue the program, the debugger replaces the `int3` instruction with the original instruction again and continues the program. For more details, see the ["_How debuggers work_"] series.

["_How debuggers work_"]: https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints

For our use case, we don't need to overwrite any instructions (it wouldn't even be possible since we [set the page table flags] to read-only). Instead, we just want to print a message when the breakpoint instruction is executed and then continue the program.

[set the page table flags]: @/edition-1/posts/07-remap-the-kernel/index.md#using-the-correct-flags

So let's create a simple `breakpoint_handler` function and add it to our IDT:

```rust
// in src/interrupts.rs

use x86_64::structures::idt::ExceptionStackFrame;

pub fn init() {
    let mut idt = Idt::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
}

extern "x86-interrupt" fn breakpoint_handler(
    stack_frame: &mut ExceptionStackFrame)
{
    println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame);
}
```

Our handler just outputs a message and pretty-prints the exception stack frame.

When we try to compile it, the following error occurs:

```
error: x86-interrupt ABI is experimental and subject to change (see issue #40180)
  --> src/interrupts.rs:8:1
   |
8  |   extern "x86-interrupt" fn breakpoint_handler(
   |  _^ starting here...
9  | |     stack_frame: &mut ExceptionStackFrame)
10 | | {
11 | |     println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame);
12 | | }
   | |_^ ...ending here
   |
   = help: add #![feature(abi_x86_interrupt)] to the crate attributes to enable
```

This error occurs because the `x86-interrupt` calling convention is still unstable. To use it anyway, we have to explicitly enable it by adding `#![feature(abi_x86_interrupt)]` at the top of our `lib.rs`.

### Loading the IDT

To make the CPU use our new interrupt descriptor table, we need to load it using the [`lidt`] instruction. The `Idt` struct of the `x86_64` crate provides a [`load`][Idt::load] method for that. Let's try to use it:

[`lidt`]: https://www.felixcloutier.com/x86/lgdt:lidt
[Idt::load]: https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/struct.Idt.html#method.load

```rust
pub fn init() {
    let mut idt = Idt::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
    idt.load();
}
```

When we try to compile it now, the following error occurs:

```
error: `idt` does not live long enough
  --> src/interrupts/mod.rs:43:5
   |
43 |     idt.load();
   |     ^^^ does not live long enough
44 | }
   | - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...
```

So the `load` method expects a `&'static self`, that is, a reference that is valid for the complete runtime of the program. The reason is that the CPU will access this table on every interrupt until we load a different IDT. So using a shorter lifetime than `'static` could lead to use-after-free bugs.

In fact, this is exactly what happens here. Our `idt` is created on the stack, so it is only valid inside the `init` function. Afterwards the stack memory is reused for other functions, so the CPU would interpret random stack memory as IDT. Luckily, the `Idt::load` method encodes this lifetime requirement in its function definition, so that the Rust compiler is able to prevent this possible bug at compile time.
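
The effect of a `&'static self` receiver can be tried out in isolation. The following sketch uses a hypothetical `Table` type as a stand-in (it is not the crate's `Idt`, and its `load` only returns an address instead of executing `lidt`). It assumes a hosted environment with the standard library and a Rust version that provides `Box::leak`.

```rust
/// Hypothetical stand-in for a descriptor table; not the `x86_64` crate's `Idt`.
struct Table {
    entries: [u64; 4],
}

impl Table {
    /// Requires a reference that is valid for the rest of the program,
    /// mirroring the `&'static self` receiver described above.
    fn load(&'static self) -> usize {
        // A real implementation would execute `lidt` here; this sketch only
        // returns the address that would be handed to the CPU.
        self as *const Table as usize
    }
}

/// A `static` lives for the whole program, so it can be loaded.
static GLOBAL_TABLE: Table = Table { entries: [0; 4] };

/// A leaked heap allocation is never freed, so it is `'static` too.
fn load_leaked() -> usize {
    let table: &'static Table = Box::leak(Box::new(Table { entries: [1; 4] }));
    table.load()
}

fn main() {
    println!("{:#x}", GLOBAL_TABLE.load());
    println!("{:#x}", load_leaked());

    // A stack-allocated table is rejected at compile time:
    // let local = Table { entries: [0; 4] };
    // local.load(); // error: `local` does not live long enough
}
```

The commented-out lines at the end reproduce the situation from the error message above: the borrow checker refuses the call because `local` is dropped when the function returns.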
In order to fix this problem, we need to store our `idt` at a place where it has a `'static` lifetime. To achieve this, we could either allocate our IDT on the heap using `Box` and then convert it to a `'static` reference or we can store the IDT as a `static`. Let's try the latter:

```rust
static IDT: Idt = Idt::new();

pub fn init() {
    IDT.breakpoint.set_handler_fn(breakpoint_handler);
    IDT.load();
}
```

There are two problems with this. First, statics are immutable, so we can't modify the breakpoint entry from our `init` function. Second, the `Idt::new` function is not a [`const` function], so it can't be used to initialize a `static`. We could solve this problem by using a [`static mut`] of type `Option<Idt>`:

[`const` function]: https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md
[`static mut`]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable

```rust
static mut IDT: Option<Idt> = None;

pub fn init() {
    unsafe {
        // assign to the static instead of shadowing it with a local
        IDT = Some(Idt::new());
        let idt = IDT.as_mut().unwrap();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt.load();
    }
}
```

This variant compiles without errors but it's far from idiomatic. `static mut`s are very prone to data races, so we need an [`unsafe` block] on each access. Also, we need to explicitly `unwrap` the `IDT` on each use, since it might be `None`.

[`unsafe` block]: https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#unsafe-superpowers

#### Lazy Statics to the Rescue

The one-time initialization of statics with non-const functions is a common problem in Rust. Fortunately, there already exists a good solution in a crate named [lazy_static]. This crate provides a `lazy_static!` macro that defines a lazily initialized `static`. Instead of computing its value at compile time, the `static` lazily initializes itself when it's accessed the first time.
Thus, the initialization happens at runtime so that arbitrarily complex initialization code is possible.

[lazy_static]: https://docs.rs/lazy_static/0.2.4/lazy_static/

Let's add the `lazy_static` crate to our project:

```rust
// in src/lib.rs

#[macro_use]
extern crate lazy_static;
```

```toml
# in Cargo.toml

[dependencies.lazy_static]
version = "0.2.4"
features = ["spin_no_std"]
```

We need the `spin_no_std` feature, since we don't link the standard library. We also need the `#[macro_use]` attribute on the `extern crate` line to import the `lazy_static!` macro.

Now we can create our static IDT using `lazy_static`:

```rust
lazy_static! {
    static ref IDT: Idt = {
        let mut idt = Idt::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt
    };
}

pub fn init() {
    IDT.load();
}
```

Note how this solution requires no `unsafe` blocks or `unwrap` calls.

> ##### Aside: How does the `lazy_static!` macro work?
>
> The macro generates a `static` of type `Once`. The [`Once`][spin::Once] type is provided by the `spin` crate and allows deferred one-time initialization. It is implemented using an [`AtomicUsize`] for synchronization and an [`UnsafeCell`] for storing the (possibly uninitialized) value. So this solution also uses `unsafe` behind the scenes, but it is abstracted away in a safe interface.

[spin::Once]: https://docs.rs/spin/0.4.5/spin/struct.Once.html
[`AtomicUsize`]: https://doc.rust-lang.org/nightly/core/sync/atomic/struct.AtomicUsize.html
[`UnsafeCell`]: https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html

### Testing it

Now we should be able to handle breakpoint exceptions! Let's try it in our `rust_main`:

```rust
// in src/lib.rs

pub extern "C" fn rust_main(...) {
    ...
    memory::init(boot_info);

    // initialize our IDT
    interrupts::init();

    // invoke a breakpoint exception
    x86_64::instructions::interrupts::int3();

    println!("It did not crash!");
    loop {}
}
```

When we run it in QEMU now (using `make run`), we see the following:

![QEMU printing `EXCEPTION: BREAKPOINT` and the exception stack frame](qemu-breakpoint-exception.png)

It works! The CPU successfully invokes our breakpoint handler, which prints the message, and then returns to the `rust_main` function, where the `It did not crash!` message is printed.

> **Aside**: If it doesn't work and a boot loop occurs, this might be caused by a kernel stack overflow. Try increasing the stack size to at least 16kB (4096 * 4 bytes) in the `boot.asm` file.

We see that the exception stack frame tells us the instruction and stack pointers at the time when the exception occurred. This information is very useful when debugging unexpected exceptions. For example, we can look at the corresponding assembly line using `objdump`:

```
> objdump -d build/kernel-x86_64.bin | grep -B5 "1140a6:"
00000000001140a0 :
  1140a0:	55                   	push   %rbp
  1140a1:	48 89 e5             	mov    %rsp,%rbp
  1140a4:	50                   	push   %rax
  1140a5:	cc                   	int3
  1140a6:	48 83 c4 08          	add    $0x8,%rsp
```

The `-d` flag disassembles the `code` section and the `-C` flag makes function names more readable by [demangling] them. The `-B` flag of `grep` specifies the number of preceding lines that should be shown (5 in our case).

[demangling]: https://en.wikipedia.org/wiki/Name_mangling

We clearly see the `int3` instruction that caused the breakpoint exception at address `1140a5`. Wait… the stored instruction pointer was `1140a6`, which is a normal `add` operation. What's happening here?

### Faults, Aborts, and Traps

The answer is that the stored instruction pointer only points to the causing instruction for _fault_ type exceptions, but not for _trap_ or _abort_ type exceptions.
The difference between these types is the following:

- **Faults** are exceptions that can be corrected so that the program can continue as if nothing happened. An example is the [page fault], which can often be resolved by loading the accessed page from the disk into memory.
- **Aborts** are fatal exceptions that can't be recovered. Examples are the [machine check exception] and the [double fault].
- **Traps** are only reported to the kernel, but don't hinder the continuation of the program. Examples are the breakpoint exception and the [overflow exception].

[page fault]: https://wiki.osdev.org/Exceptions#Page_Fault
[machine check exception]: https://wiki.osdev.org/Exceptions#Machine_Check
[double fault]: https://wiki.osdev.org/Exceptions#Double_Fault
[overflow exception]: https://wiki.osdev.org/Exceptions#Overflow

The reason for the different instruction pointer values is that the stored value is also the return address. So for faults, the instruction that caused the exception is restarted and might cause the same exception again if it's not resolved. This would not make much sense for traps, since invoking the breakpoint exception again would just cause another breakpoint exception[^fn-breakpoint-restart-use-cases]. Thus the instruction pointer points to the _next_ instruction for these exceptions.

In some cases, the distinction between faults and traps is vague. For example, the [debug exception] behaves like a fault in some cases, but like a trap in others. So to find out the meaning of the saved instruction pointer, it is a good idea to read the official documentation for the exception, which can be found in the [AMD64 manual] in Section 8.2. For example, for the breakpoint exception it says:

[debug exception]: https://wiki.osdev.org/Exceptions#Debug
[AMD64 manual]: https://www.amd.com/system/files/TechDocs/24593.pdf

> `#BP` is a trap-type exception. The saved instruction pointer points to the byte after the `INT3` instruction.
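
The rule for the saved instruction pointer can be written down as a small worked example. This is a simplified sketch: `ExceptionKind` and `saved_instruction_pointer` are hypothetical names, aborts are left out, and a trap is assumed to continue at the directly following instruction, as is the case for `int3`.

```rust
/// Exception classes covered by this sketch (aborts are left out).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExceptionKind {
    Fault,
    Trap,
}

/// Hypothetical helper: the instruction pointer that is saved for an
/// exception raised by the instruction at `addr` with a length of `len` bytes.
fn saved_instruction_pointer(kind: ExceptionKind, addr: u64, len: u64) -> u64 {
    match kind {
        // faults restart the causing instruction
        ExceptionKind::Fault => addr,
        // traps continue with the following instruction
        ExceptionKind::Trap => addr + len,
    }
}

fn main() {
    // `int3` is the single byte `cc` at address 0x1140a5 in the listing above
    let rip = saved_instruction_pointer(ExceptionKind::Trap, 0x1140a5, 1);
    println!("{:#x}", rip);
}
```

For the one-byte `int3` at `1140a5`, the sketch yields `1140a6`, which matches the instruction pointer in the exception stack frame printed by our breakpoint handler.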
The documentation of the [`Idt`] struct and the [OSDev Wiki][osdev wiki exceptions] also contain this information.

[`Idt`]: https://docs.rs/x86_64/0.1.1/x86_64/structures/idt/struct.Idt.html
[osdev wiki exceptions]: https://wiki.osdev.org/Exceptions

## Too much Magic?

The `x86-interrupt` calling convention and the [`Idt`] type made the exception handling process relatively straightforward and painless. If this was too much magic for you and you would like to learn all the gory details of exception handling, we've got you covered: Our [“Handling Exceptions with Naked Functions”] series shows how to handle exceptions without the `x86-interrupt` calling convention and also creates its own `Idt` type. Historically, these posts were the main exception handling posts before the `x86-interrupt` calling convention and the `x86_64` crate existed.

[“Handling Exceptions with Naked Functions”]: @/edition-1/extra/naked-exceptions/_index.md

## What's next?

We've successfully caught our first exception and returned from it! The next step is to add handlers for other common exceptions such as page faults. We also need to make sure that we never cause a [triple fault], since it causes a complete system reset. The next post explains how we can avoid this by correctly catching [double faults].

[triple fault]: https://wiki.osdev.org/Triple_Fault
[double faults]: https://wiki.osdev.org/Double_Fault#Double_Fault

## Footnotes

[^fn-breakpoint-restart-use-cases]: There are valid use cases for restarting an instruction that caused a breakpoint. The most common use case is a debugger: When setting a breakpoint on some code line, the debugger overwrites the corresponding instruction with an `int3` instruction, so that the CPU traps when that line is executed. When the user continues execution, the debugger swaps in the original instruction and continues the program from the replaced instruction.
================================================
FILE: blog/content/edition-1/posts/10-double-faults/index.md
================================================
+++
title = "Double Faults"
weight = 10
path = "double-faults"
aliases = ["double-faults.html"]
date = 2017-01-02
template = "edition-1/page.html"
+++

In this post we explore double faults in detail. We also set up an _Interrupt Stack Table_ to catch double faults on a separate kernel stack. This way, we can completely prevent triple faults, even on kernel stack overflow.

As always, the complete source code is available on [GitHub]. Please file [issues] for any problems, questions, or improvement suggestions. There is also a [gitter chat] and a comment section at the end of this page.

[GitHub]: https://github.com/phil-opp/blog_os/tree/first_edition_post_10
[issues]: https://github.com/phil-opp/blog_os/issues
[gitter chat]: https://gitter.im/phil-opp/blog_os

## What is a Double Fault?

In simplified terms, a double fault is a special exception that occurs when the CPU fails to invoke an exception handler. For example, it occurs when a page fault is triggered but there is no page fault handler registered in the [Interrupt Descriptor Table][IDT] (IDT). So it's kind of similar to catch-all blocks in programming languages with exceptions, e.g. `catch(...)` in C++ or `catch(Exception e)` in Java or C#.

[IDT]: @/edition-1/posts/09-handling-exceptions/index.md#the-interrupt-descriptor-table

A double fault behaves like a normal exception. It has the vector number `8` and we can define a normal handler function for it in the IDT. It is really important to provide a double fault handler, because if a double fault is unhandled a fatal _triple fault_ occurs. Triple faults can't be caught and most hardware reacts with a system reset.
### Triggering a Double Fault

Let's provoke a double fault by triggering an exception for which we didn't define a handler function:

```rust
// in src/lib.rs

#[no_mangle]
pub extern "C" fn rust_main(multiboot_information_address: usize) {
    ...
    // initialize our IDT
    interrupts::init();

    // trigger a page fault
    unsafe {
        *(0xdeadbeaf as *mut u64) = 42;
    };

    println!("It did not crash!");
    loop {}
}
```

We try to write to address `0xdeadbeaf`, but the corresponding page is not present in the page tables. Thus, a page fault occurs. We haven't registered a page fault handler in our [IDT], so a double fault occurs.

When we start our kernel now, we see that it enters an endless boot loop:

![boot loop](boot-loop.gif)

The reason for the boot loop is the following:

1. The CPU tries to write to `0xdeadbeaf`, which causes a page fault.
2. The CPU looks at the corresponding entry in the IDT and sees that the present bit isn't set. Thus, it can't call the page fault handler and a double fault occurs.
3. The CPU looks at the IDT entry of the double fault handler, but this entry is also non-present. Thus, a _triple_ fault occurs.
4. A triple fault is fatal. QEMU reacts to it like most real hardware and issues a system reset.

So in order to prevent this triple fault, we need to either provide a handler function for page faults or a double fault handler. Let's start with the latter, since we want to avoid triple faults in all cases.

### A Double Fault Handler

A double fault is a normal exception with an error code, so we can define its handler function in the same way as our breakpoint handler, with an additional error code argument:

```rust
// in src/interrupts.rs

lazy_static! {
    static ref IDT: idt::Idt = {
        let mut idt = idt::Idt::new();

        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt.double_fault.set_handler_fn(double_fault_handler);

        idt
    };
}

// our new double fault handler
extern "x86-interrupt" fn double_fault_handler(
    stack_frame: &mut ExceptionStackFrame, _error_code: u64)
{
    println!("\nEXCEPTION: DOUBLE FAULT\n{:#?}", stack_frame);
    loop {}
}
```

Our handler prints a short error message and dumps the exception stack frame. The error code of the double fault handler is always zero, so there's no reason to print it.

When we start our kernel now, we should see that the double fault handler is invoked:

![QEMU printing `EXCEPTION: DOUBLE FAULT` and the exception stack frame](qemu-catch-double-fault.png)

It worked! Here is what happens this time:

1. The CPU tries to write to `0xdeadbeaf`, which causes a page fault.
2. Like before, the CPU looks at the corresponding entry in the IDT and sees that the present bit isn't set. Thus, a double fault occurs.
3. The CPU jumps to the – now present – double fault handler.

The triple fault (and the boot-loop) no longer occurs, since the CPU can now call the double fault handler.

That was quite straightforward! So why do we need a whole post for this topic? Well, we're now able to catch _most_ double faults, but there are some cases where our current approach doesn't suffice.

## Causes of Double Faults

Before we look at the special cases, we need to know the exact causes of double faults. Above, we used a pretty vague definition:

> A double fault is a special exception that occurs when the CPU fails to invoke an exception handler.

What does _“fails to invoke”_ mean exactly? The handler is not present? The handler is [swapped out]? And what happens if a handler causes exceptions itself?

[swapped out]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-beyondphys.pdf

For example, what happens if… :

1. a divide-by-zero exception occurs, but the corresponding handler function is swapped out?
2. a page fault occurs, but the page fault handler is swapped out?
3. a divide-by-zero handler causes a breakpoint exception, but the breakpoint handler is swapped out?
4. our kernel overflows its stack and the [guard page] is hit?

[guard page]: @/edition-1/posts/07-remap-the-kernel/index.md#creating-a-guard-page

Fortunately, the AMD64 manual ([PDF][AMD64 manual]) has an exact definition (in Section 8.2.9). According to it, a “double fault exception _can_ occur when a second exception occurs during the handling of a prior (first) exception handler”. The _“can”_ is important: Only very specific combinations of exceptions lead to a double fault. These combinations are:

First Exception | Second Exception
----------------|-----------------
[Divide-by-zero], [Invalid TSS], [Segment Not Present], [Stack-Segment Fault], [General Protection Fault] | [Invalid TSS], [Segment Not Present], [Stack-Segment Fault], [General Protection Fault]
[Page Fault] | [Page Fault], [Invalid TSS], [Segment Not Present], [Stack-Segment Fault], [General Protection Fault]

[Divide-by-zero]: https://wiki.osdev.org/Exceptions#Division_Error
[Invalid TSS]: https://wiki.osdev.org/Exceptions#Invalid_TSS
[Segment Not Present]: https://wiki.osdev.org/Exceptions#Segment_Not_Present
[Stack-Segment Fault]: https://wiki.osdev.org/Exceptions#Stack-Segment_Fault
[General Protection Fault]: https://wiki.osdev.org/Exceptions#General_Protection_Fault
[Page Fault]: https://wiki.osdev.org/Exceptions#Page_Fault

[AMD64 manual]: https://www.amd.com/system/files/TechDocs/24593.pdf

So for example a divide-by-zero fault followed by a page fault is fine (the page fault handler is invoked), but a divide-by-zero fault followed by a general-protection fault leads to a double fault.

With the help of this table, we can answer the first three of the above questions:

1. If a divide-by-zero exception occurs and the corresponding handler function is swapped out, a _page fault_ occurs and the _page fault handler_ is invoked.
2. If a page fault occurs and the page fault handler is swapped out, a _double fault_ occurs and the _double fault handler_ is invoked.
3. If a divide-by-zero handler causes a breakpoint exception, the CPU tries to invoke the breakpoint handler. If the breakpoint handler is swapped out, a _page fault_ occurs and the _page fault handler_ is invoked.

In fact, even the case of a non-present handler follows this scheme: A non-present handler causes a _segment-not-present_ exception. We didn't define a segment-not-present handler, so another segment-not-present exception occurs. According to the table, this leads to a double fault.

### Kernel Stack Overflow

Let's look at the fourth question:

> What happens if our kernel overflows its stack and the [guard page] is hit?

When our kernel overflows its stack and hits the guard page, a _page fault_ occurs. The CPU looks up the page fault handler in the IDT and tries to push the [exception stack frame] onto the stack.
However, our current stack pointer still points to the non-present guard page. Thus, a second page fault occurs, which causes a double fault (according to the above table).

[exception stack frame]: @/edition-1/posts/09-handling-exceptions/index.md#the-exception-stack-frame

So the CPU tries to call our _double fault handler_ now. However, on a double fault the CPU tries to push the exception stack frame, too. Our stack pointer still points to the guard page, so a _third_ page fault occurs, which causes a _triple fault_ and a system reboot. So our current double fault handler can't avoid a triple fault in this case.

Let's try it ourselves! We can easily provoke a kernel stack overflow by calling a function that recurses endlessly:

```rust
// in src/lib.rs

#[no_mangle]
pub extern "C" fn rust_main(multiboot_information_address: usize) {
    ...
    // initialize our IDT
    interrupts::init();

    fn stack_overflow() {
        stack_overflow(); // for each recursion, the return address is pushed
    }

    // trigger a stack overflow
    stack_overflow();

    println!("It did not crash!");
    loop {}
}
```

When we try this code in QEMU, we see that the system enters a boot-loop again.

So how can we avoid this problem? We can't omit the pushing of the exception stack frame, since the CPU itself does it. So we need to ensure somehow that the stack is always valid when a double fault exception occurs. Fortunately, the x86_64 architecture has a solution to this problem.

## Switching Stacks

The x86_64 architecture is able to switch to a predefined, known-good stack when an exception occurs. This switch happens at hardware level, so it can be performed before the CPU pushes the exception stack frame.

This switching mechanism is implemented as an _Interrupt Stack Table_ (IST). The IST is a table of 7 pointers to known-good stacks. In Rust-like pseudo code:

```rust
struct InterruptStackTable {
    stack_pointers: [Option<StackPointer>; 7],
}
```

For each exception handler, we can choose a stack from the IST through the `options` field in the corresponding [IDT entry]. For example, we could use the first stack in the IST for our double fault handler. Then the CPU would automatically switch to this stack whenever a double fault occurs. This switch would happen before anything is pushed, so it would prevent the triple fault.

[IDT entry]: @/edition-1/posts/09-handling-exceptions/index.md#the-interrupt-descriptor-table

### Allocating a new Stack

In order to fill an Interrupt Stack Table later, we need a way to allocate new stacks. Therefore we extend our `memory` module with a new `stack_allocator` submodule:

```rust
// in src/memory/mod.rs

mod stack_allocator;
```

First, we create a new `StackAllocator` struct and a constructor function:

```rust
// in src/memory/stack_allocator.rs

use memory::paging::PageIter;

pub struct StackAllocator {
    range: PageIter,
}

impl StackAllocator {
    pub fn new(page_range: PageIter) -> StackAllocator {
        StackAllocator { range: page_range }
    }
}
```

We create a simple `StackAllocator` that allocates stacks from a given range of pages (`PageIter` is an Iterator over a range of pages; we introduced it [in the kernel heap post]).
[in the kernel heap post]: @/edition-1/posts/08-kernel-heap/index.md#mapping-the-heap

We add an `alloc_stack` method that allocates a new stack:

```rust
// in src/memory/stack_allocator.rs

use memory::paging::{self, Page, ActivePageTable};
use memory::{PAGE_SIZE, FrameAllocator};

impl StackAllocator {
    pub fn alloc_stack<FA: FrameAllocator>(&mut self,
                                           active_table: &mut ActivePageTable,
                                           frame_allocator: &mut FA,
                                           size_in_pages: usize)
                                           -> Option<Stack> {
        if size_in_pages == 0 {
            return None; /* a zero sized stack makes no sense */
        }

        // clone the range, since we only want to change it on success
        let mut range = self.range.clone();

        // try to allocate the stack pages and a guard page
        let guard_page = range.next();
        let stack_start = range.next();
        let stack_end = if size_in_pages == 1 {
            stack_start
        } else {
            // choose the (size_in_pages-2)th element, since index
            // starts at 0 and we already allocated the start page
            range.nth(size_in_pages - 2)
        };

        match (guard_page, stack_start, stack_end) {
            (Some(_), Some(start), Some(end)) => {
                // success! write back updated range
                self.range = range;

                // map stack pages to physical frames
                for page in Page::range_inclusive(start, end) {
                    active_table.map(page, paging::WRITABLE, frame_allocator);
                }

                // create a new stack
                let top_of_stack = end.start_address() + PAGE_SIZE;
                Some(Stack::new(top_of_stack, start.start_address()))
            }
            _ => None, /* not enough pages */
        }
    }
}
```

The method takes mutable references to the [ActivePageTable] and a [FrameAllocator], since it needs to map the new virtual stack pages to physical frames. We define that the stack size is a multiple of the page size.

[ActivePageTable]: @/edition-1/posts/06-page-tables/index.md#page-table-ownership
[FrameAllocator]: @/edition-1/posts/05-allocating-frames/index.md#a-frame-allocator

Instead of operating directly on `self.range`, we [clone] it and only write it back on success. This way, subsequent stack allocations can still succeed if there are pages left (e.g., a call with `size_in_pages = 3` can still succeed after a failed call with `size_in_pages = 100`).

In order to be able to clone `PageIter`, we add a `#[derive(Clone)]` to its definition in `src/memory/paging/mod.rs`. We also need to make the `start_address` method of the `Page` type public (in the same file).

[clone]: https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html#tymethod.clone

The actual allocation is straightforward: First, we choose the next page as [guard page]. Then we choose the next `size_in_pages` pages as stack pages using [Iterator::nth]. If all three variables are `Some`, the allocation succeeded and we map the stack pages to physical frames using [ActivePageTable::map]. The guard page remains unmapped.

[Iterator::nth]: https://doc.rust-lang.org/nightly/core/iter/trait.Iterator.html#method.nth
[ActivePageTable::map]: @/edition-1/posts/06-page-tables/index.md#more-mapping-functions

Finally, we create and return a new `Stack`, which we define as follows:

```rust
// in src/memory/stack_allocator.rs

#[derive(Debug)]
pub struct Stack {
    top: usize,
    bottom: usize,
}

impl Stack {
    fn new(top: usize, bottom: usize) -> Stack {
        assert!(top > bottom);
        Stack {
            top: top,
            bottom: bottom,
        }
    }

    pub fn top(&self) -> usize {
        self.top
    }

    pub fn bottom(&self) -> usize {
        self.bottom
    }
}
```

The `Stack` struct describes a stack through its top and bottom addresses.

#### The Memory Controller

Now we're able to allocate a new double fault stack. However, we add one more level of abstraction to make things easier.
For that we add a new `MemoryController` type to our `memory` module: ```rust // in src/memory/mod.rs pub use self::stack_allocator::Stack; pub struct MemoryController { active_table: paging::ActivePageTable, frame_allocator: AreaFrameAllocator, stack_allocator: stack_allocator::StackAllocator, } impl MemoryController { pub fn alloc_stack(&mut self, size_in_pages: usize) -> Option { let &mut MemoryController { ref mut active_table, ref mut frame_allocator, ref mut stack_allocator } = self; stack_allocator.alloc_stack(active_table, frame_allocator, size_in_pages) } } ``` The `MemoryController` struct holds the three types that are required for `alloc_stack` and provides a simpler interface (only one argument). The `alloc_stack` wrapper just takes the tree types as `&mut` through [destructuring] and forwards them to the `stack_allocator`. The [ref mut]-s are needed to take the inner fields by mutable reference. Note that we're re-exporting the `Stack` type since it is returned by `alloc_stack`. [destructuring]: https://doc.rust-lang.org/1.10.0/book/patterns.html#destructuring [ref mut]: https://doc.rust-lang.org/1.30.0/book/second-edition/ch18-03-pattern-syntax.html#creating-references-in-patterns-with-ref-and-ref-mut The last step is to create a `StackAllocator` and return a `MemoryController` from `memory::init`: ```rust // in src/memory/mod.rs pub fn init(boot_info: &BootInformation) -> MemoryController { ... let stack_allocator = { let stack_alloc_start = heap_end_page + 1; let stack_alloc_end = stack_alloc_start + 100; let stack_alloc_range = Page::range_inclusive(stack_alloc_start, stack_alloc_end); stack_allocator::StackAllocator::new(stack_alloc_range) }; MemoryController { active_table: active_table, frame_allocator: frame_allocator, stack_allocator: stack_allocator, } } ``` We create a new `StackAllocator` with a range of 100 pages starting right after the last heap page. In order to do arithmetic on pages (e.g. 
calculate the hundredth page after `stack_alloc_start`), we implement `Add<usize>` for `Page`:

```rust
// in src/memory/paging/mod.rs

use core::ops::Add;

impl Add<usize> for Page {
    type Output = Page;

    fn add(self, rhs: usize) -> Page {
        Page { number: self.number + rhs }
    }
}
```

#### Allocating a Double Fault Stack
Now we can allocate a new double fault stack by passing the memory controller to our `interrupts::init` function:

```rust
// in src/lib.rs

#[no_mangle]
pub extern "C" fn rust_main(multiboot_information_address: usize) {
    ...

    // set up guard page and map the heap pages
    let mut memory_controller = memory::init(boot_info); // new return type

    // initialize our IDT
    interrupts::init(&mut memory_controller); // new argument

    ...
}

// in src/interrupts.rs

use memory::MemoryController;

pub fn init(memory_controller: &mut MemoryController) {
    let double_fault_stack = memory_controller.alloc_stack(1)
        .expect("could not allocate double fault stack");

    IDT.load();
}
```

We allocate a 4096-byte stack (one page) for our double fault handler. Now we just need some way to tell the CPU that it should use this stack for handling double faults.

### The IST and TSS
The Interrupt Stack Table (IST) is part of a legacy structure called _[Task State Segment]_ \(TSS). The TSS used to hold various information (e.g. processor register state) about a task in 32-bit mode and was, for example, used for [hardware context switching]. However, hardware context switching is no longer supported in 64-bit mode and the format of the TSS changed completely.

[Task State Segment]: https://en.wikipedia.org/wiki/Task_state_segment
[hardware context switching]: https://wiki.osdev.org/Context_Switching#Hardware_Context_Switching

On x86_64, the TSS no longer holds any task specific information at all. Instead, it holds two stack tables (the IST is one of them). The only common field between the 32-bit and 64-bit TSS is the pointer to the [I/O port permissions bitmap].
[I/O port permissions bitmap]: https://en.wikipedia.org/wiki/Task_state_segment#I.2FO_port_permissions

The 64-bit TSS has the following format:

Field  | Type
------ | ----------------
(reserved) | `u32`
Privilege Stack Table | `[u64; 3]`
(reserved) | `u64`
Interrupt Stack Table | `[u64; 7]`
(reserved) | `u64`
(reserved) | `u16`
I/O Map Base Address | `u16`

The _Privilege Stack Table_ is used by the CPU when the privilege level changes. For example, if an exception occurs while the CPU is in user mode (privilege level 3), the CPU normally switches to kernel mode (privilege level 0) before invoking the exception handler. In that case, the CPU would switch to the 0th stack in the Privilege Stack Table (since 0 is the target privilege level). We don't have any user mode programs yet, so we ignore this table for now.

#### Creating a TSS
Let's create a new TSS that contains our double fault stack in its interrupt stack table. For that we need a TSS struct. Fortunately, the `x86_64` crate already contains a [`TaskStateSegment` struct] that we can use:

[`TaskStateSegment` struct]: https://docs.rs/x86_64/0.1.1/x86_64/structures/tss/struct.TaskStateSegment.html

```rust
// in src/interrupts.rs

use x86_64::structures::tss::TaskStateSegment;
```

Let's create a new TSS in our `interrupts::init` function:

```rust
// in src/interrupts.rs

use x86_64::VirtualAddress;

const DOUBLE_FAULT_IST_INDEX: usize = 0;

pub fn init(memory_controller: &mut MemoryController) {
    let double_fault_stack = memory_controller.alloc_stack(1)
        .expect("could not allocate double fault stack");

    let mut tss = TaskStateSegment::new();
    tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX] = VirtualAddress(
        double_fault_stack.top());

    IDT.load();
}
```

We define that the 0th IST entry is the double fault stack (any other IST index would work too). We create a new TSS through the `TaskStateSegment::new` function and load the top address (stacks grow downwards) of the double fault stack into the 0th entry.
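To make the TSS format table more concrete, here is a rough sketch of that layout as a packed Rust struct. The `RawTss` type and its `set_ist`/`ist` methods are hypothetical names chosen for illustration only; this is not the `TaskStateSegment` type of the `x86_64` crate, whose definition may differ.

```rust
// Hypothetical illustration of the 64-bit TSS layout from the format table.
// `RawTss` is NOT the `x86_64` crate's `TaskStateSegment`.
#[allow(dead_code)]
#[repr(C, packed)]
pub struct RawTss {
    reserved_1: u32,
    pub privilege_stack_table: [u64; 3],
    reserved_2: u64,
    pub interrupt_stack_table: [u64; 7],
    reserved_3: u64,
    reserved_4: u16,
    pub iomap_base: u16,
}

impl RawTss {
    /// Creates a TSS with all fields zeroed.
    pub const fn new() -> RawTss {
        RawTss {
            reserved_1: 0,
            privilege_stack_table: [0; 3],
            reserved_2: 0,
            interrupt_stack_table: [0; 7],
            reserved_3: 0,
            reserved_4: 0,
            iomap_base: 0,
        }
    }

    /// Stores the top address of a stack in the given IST slot.
    /// Panics if `index` is not in `0..7`.
    pub fn set_ist(&mut self, index: usize, stack_top: u64) {
        // Plain assignment copies the value; no reference to the
        // (possibly unaligned) packed field is created.
        self.interrupt_stack_table[index] = stack_top;
    }

    /// Returns a copy of the address stored in the given IST slot.
    pub fn ist(&self, index: usize) -> u64 {
        self.interrupt_stack_table[index]
    }
}
```

The field sizes add up to 4 + 24 + 8 + 56 + 8 + 2 + 2 = 104 bytes. The `packed` attribute matters in this sketch: with plain `repr(C)`, the compiler would insert padding after the leading `u32` and the fields would no longer sit at the offsets the table describes.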
#### Loading the TSS
Now that we created a new TSS, we need a way to tell the CPU that it should use it. Unfortunately, this is a bit cumbersome, since the TSS is a Task State _Segment_ (for historical reasons). So instead of loading the table directly, we need to add a new segment descriptor to the [Global Descriptor Table] \(GDT). Then we can load our TSS by invoking the [`ltr` instruction] with the respective GDT index.

[Global Descriptor Table]: https://web.archive.org/web/20190217233448/https://www.flingos.co.uk/docs/reference/Global-Descriptor-Table/
[`ltr` instruction]: https://www.felixcloutier.com/x86/ltr

### The Global Descriptor Table (again)
The Global Descriptor Table (GDT) is a relic that was used for [memory segmentation] before paging became the de facto standard. It is still needed in 64-bit mode for various things such as kernel/user mode configuration or TSS loading.

[memory segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation

We already created a GDT [when switching to long mode]. Back then, we used assembly to create valid code and data segment descriptors, which were required to enter 64-bit mode. We could just edit that assembly file and add an additional TSS descriptor. However, we now have the expressiveness of Rust, so let's do it in Rust instead.

[when switching to long mode]: @/edition-1/posts/02-entering-longmode/index.md#the-global-descriptor-table

We start by creating a new `interrupts::gdt` submodule. For that we need to rename the `src/interrupts.rs` file to `src/interrupts/mod.rs`. Then we can create a new submodule:

```rust
// in src/interrupts/mod.rs

mod gdt;
```

```rust
// src/interrupts/gdt.rs

pub struct Gdt {
    table: [u64; 8],
    next_free: usize,
}

impl Gdt {
    pub fn new() -> Gdt {
        Gdt {
            table: [0; 8],
            next_free: 1,
        }
    }
}
```

We create a simple `Gdt` struct with two fields. The `table` field contains the actual GDT modeled as a `[u64; 8]`.
Theoretically, a GDT can have up to 8192 entries, but this doesn't make much sense in 64-bit mode (since there is no real segmentation support). Eight entries should be more than enough for our system.

The `next_free` field stores the index of the next free entry. We initialize it with `1` since the 0th entry always needs to be 0 in a valid GDT.

#### User and System Segments
There are two types of GDT entries in long mode: user and system segment descriptors. Descriptors for code and data segments are user segment descriptors. They contain no addresses since segments always span the complete address space on x86_64 (real segmentation is no longer supported). Thus, user segment descriptors only contain a few flags (e.g. present or user mode) and fit into a single `u64` entry.

System descriptors such as TSS descriptors are different. They often contain a base address and a limit (e.g. TSS start and length) and thus need more than 64 bits. Therefore, system segments are 128 bits. They are stored as two consecutive entries in the GDT.

Consequently, we model a `Descriptor` as an `enum`:

```rust
// in src/interrupts/gdt.rs

pub enum Descriptor {
    UserSegment(u64),
    SystemSegment(u64, u64),
}
```

The flag bits are common between all descriptor types, so we create a general `DescriptorFlags` type (using the [bitflags] macro):

[bitflags]: https://docs.rs/bitflags/0.9.1/bitflags/macro.bitflags.html

```rust
// in src/interrupts/gdt.rs

bitflags! {
    struct DescriptorFlags: u64 {
        const CONFORMING        = 1 << 42;
        const EXECUTABLE        = 1 << 43;
        const USER_SEGMENT      = 1 << 44;
        const PRESENT           = 1 << 47;
        const LONG_MODE         = 1 << 53;
    }
}
```

We only add flags that are relevant in 64-bit mode. For example, we omit the read/write bit, since it is completely ignored by the CPU in 64-bit mode.
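To see what these flag bits amount to numerically, here is a small sketch that uses plain `u64` constants instead of the `bitflags` macro. The bit positions are the ones from the definition above; the `combine` and `has_flag` helpers are hypothetical names for illustration and are not part of the kernel code.

```rust
// Plain-constant stand-in for the `DescriptorFlags` definition,
// using the same bit positions.
pub const CONFORMING: u64 = 1 << 42;
pub const EXECUTABLE: u64 = 1 << 43;
pub const USER_SEGMENT: u64 = 1 << 44;
pub const PRESENT: u64 = 1 << 47;
pub const LONG_MODE: u64 = 1 << 53;

/// Hypothetical helper: ORs a list of flags into a single `u64`.
pub fn combine(flags: &[u64]) -> u64 {
    flags.iter().fold(0, |acc, flag| acc | flag)
}

/// Hypothetical helper: returns true if every bit of `flag`
/// is set in `descriptor`.
pub fn has_flag(descriptor: u64, flag: u64) -> bool {
    (descriptor & flag) == flag
}
```

For instance, a present, executable user segment in long mode is `combine(&[USER_SEGMENT, PRESENT, EXECUTABLE, LONG_MODE])`, which evaluates to `0x0020_9800_0000_0000`. Since every flag occupies its own bit, the order of the flags does not matter.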
#### Code Segments
We add a function to create kernel mode code segments:

```rust
// in src/interrupts/gdt.rs

impl Descriptor {
    pub fn kernel_code_segment() -> Descriptor {
        let flags = USER_SEGMENT | PRESENT | EXECUTABLE | LONG_MODE;
        Descriptor::UserSegment(flags.bits())
    }
}
```

We set the `USER_SEGMENT` bit to indicate a 64-bit user segment descriptor (otherwise the CPU expects a 128-bit system segment descriptor). The `PRESENT`, `EXECUTABLE`, and `LONG_MODE` bits are also needed for a 64-bit mode code segment.

The data segment registers `ds`, `ss`, and `es` are completely ignored in 64-bit mode, so we don't need any data segment descriptors in our GDT.

#### TSS Segments
A TSS descriptor is a system segment descriptor with the following format:

Bit(s) | Name | Meaning
------ | ---- | -------
0-15 | **limit 0-15** | the first 2 bytes of the TSS's limit
16-39 | **base 0-23** | the first 3 bytes of the TSS's base address
40-43 | **type** | must be `0b1001` for an available 64-bit TSS
44 | zero | must be 0
45-46 | privilege | the [ring level]: 0 for kernel, 3 for user
47 | **present** | must be 1 for valid selectors
48-51 | limit 16-19 | bits 16 to 19 of the segment's limit
52 | available | freely available to the OS
53-54 | ignored |
55 | granularity | if it's set, the limit is the number of pages, else it's a byte number
56-63 | **base 24-31** | the fourth byte of the base address
64-95 | **base 32-63** | the last four bytes of the base address
96-127 | ignored/must be zero | bits 104-108 must be zero, the rest is ignored

[ring level]: https://wiki.osdev.org/Security#Rings

We only need the bold fields for our TSS descriptor. For example, we don't need the `limit 16-19` field since a TSS has a fixed size that is smaller than `2^16`.
Let's add a function to our descriptor that creates a TSS descriptor for a given TSS:

```rust
// in src/interrupts/gdt.rs

use x86_64::structures::tss::TaskStateSegment;

impl Descriptor {
    pub fn tss_segment(tss: &'static TaskStateSegment) -> Descriptor {
        use core::mem::size_of;
        use bit_field::BitField;

        let ptr = tss as *const _ as u64;

        let mut low = PRESENT.bits();
        // base
        low.set_bits(16..40, ptr.get_bits(0..24));
        low.set_bits(56..64, ptr.get_bits(24..32));
        // limit (the `-1` is needed since the bound is inclusive)
        low.set_bits(0..16, (size_of::<TaskStateSegment>() - 1) as u64);
        // type (0b1001 = available 64-bit tss)
        low.set_bits(40..44, 0b1001);

        let mut high = 0;
        high.set_bits(0..32, ptr.get_bits(32..64));

        Descriptor::SystemSegment(low, high)
    }
}
```

The `set_bits` and `get_bits` methods are provided by the [`BitField` trait] of the `bit_field` crate. They allow us to easily get or set specific bits in an integer without using bit masks or shift operations. For example, we can do `x.set_bits(8..12, 10)` instead of `x = (x & 0xfffff0ff) | (10 << 8)`. Note that the value must fit into the given bit range, which is four bits wide here.

[`BitField` trait]: https://docs.rs/bit_field/0.6.0/bit_field/trait.BitField.html#method.get_bit

To link the `bit_field` crate, we modify our `Cargo.toml` and our `src/lib.rs`:

```toml
[dependencies]
bit_field = "0.7.0"
```

```rust
extern crate bit_field;
```

We require the `'static` lifetime for the `TaskStateSegment` reference, since the hardware might access it on every interrupt as long as the OS runs.
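For comparison, the same encoding can be sketched with plain shifts and masks instead of the `bit_field` crate. This is only an illustration under the bit layout of the TSS descriptor table: the `encode_tss_descriptor` function is a hypothetical name, and it takes the base address and limit as plain integers so that it can be tried out on a normal host system.

```rust
/// Hypothetical helper: encodes the two `u64` halves of a 64-bit TSS
/// descriptor from a base address and an inclusive byte limit.
pub fn encode_tss_descriptor(base: u64, limit: u64) -> (u64, u64) {
    // present bit (bit 47)
    let mut low: u64 = 1 << 47;
    // limit 0-15 goes to bits 0-15
    low |= limit & 0xFFFF;
    // base 0-23 goes to bits 16-39
    low |= (base & 0x00FF_FFFF) << 16;
    // type 0b1001 (available 64-bit TSS) goes to bits 40-43
    low |= 0b1001 << 40;
    // base 24-31 goes to bits 56-63
    low |= ((base >> 24) & 0xFF) << 56;

    // base 32-63 goes to bits 64-95, i.e. the lower half of `high`
    let high = base >> 32;

    (low, high)
}
```

With a made-up base address of `0x1234_5678_9ABC_DEF0` and a limit of `103`, the function returns `(0x9A00_89BC_DEF0_0067, 0x1234_5678)`: the three low base bytes `BCDEF0` sit directly above the limit `0067`, the byte `89` carries the present bit and the type, and the fourth base byte `9A` ends up at the very top.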
#### Adding Descriptors to the GDT
In order to add descriptors to the GDT, we add an `add_entry` method:

```rust
// in src/interrupts/gdt.rs

use x86_64::structures::gdt::SegmentSelector;
use x86_64::PrivilegeLevel;

impl Gdt {
    pub fn add_entry(&mut self, entry: Descriptor) -> SegmentSelector {
        let index = match entry {
            Descriptor::UserSegment(value) => self.push(value),
            Descriptor::SystemSegment(value_low, value_high) => {
                let index = self.push(value_low);
                self.push(value_high);
                index
            }
        };
        SegmentSelector::new(index as u16, PrivilegeLevel::Ring0)
    }
}
```

For a user segment we just push the `u64` and remember the index. For a system segment, we push the low and high `u64` and use the index of the low value. We then use this index to return a new [SegmentSelector].

[SegmentSelector]: https://docs.rs/x86/0.8.0/x86/shared/segmentation/struct.SegmentSelector.html#method.new

The `push` method looks like this:

```rust
// in src/interrupts/gdt.rs

impl Gdt {
    fn push(&mut self, value: u64) -> usize {
        if self.next_free < self.table.len() {
            let index = self.next_free;
            self.table[index] = value;
            self.next_free += 1;
            index
        } else {
            panic!("GDT full");
        }
    }
}
```

The method just writes to the `next_free` entry and returns the corresponding index. If there is no free entry left, we panic since this likely indicates a programming error (we should never need to create more than two or three GDT entries for our kernel).

#### Loading the GDT
To load the GDT, we add a new `load` method:

```rust
// in src/interrupts/gdt.rs

impl Gdt {
    pub fn load(&'static self) {
        use x86_64::instructions::tables::{DescriptorTablePointer, lgdt};
        use core::mem::size_of;

        let ptr = DescriptorTablePointer {
            base: self.table.as_ptr() as u64,
            limit: (self.table.len() * size_of::<u64>() - 1) as u16,
        };

        unsafe { lgdt(&ptr) };
    }
}
```

We use the [`DescriptorTablePointer` struct] and the [`lgdt` function] provided by the `x86_64` crate to load our GDT.
Again, we require a `'static` reference since the GDT possibly needs to live for the rest of the run time.

[`DescriptorTablePointer` struct]: https://docs.rs/x86_64/0.1.1/x86_64/instructions/tables/struct.DescriptorTablePointer.html
[`lgdt` function]: https://docs.rs/x86_64/0.1.1/x86_64/instructions/tables/fn.lgdt.html

### Putting it together
We now have a double fault stack and are able to create and load a TSS (which contains an IST). So let's put everything together to catch kernel stack overflows.

We already created a new TSS in our `interrupts::init` function. Now we can load this TSS by creating a new GDT:

```rust
// in src/interrupts/mod.rs

pub fn init(memory_controller: &mut MemoryController) {
    let double_fault_stack = memory_controller.alloc_stack(1)
        .expect("could not allocate double fault stack");

    let mut tss = TaskStateSegment::new();
    tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX] = VirtualAddress(
        double_fault_stack.top());

    let mut gdt = gdt::Gdt::new();
    let code_selector = gdt.add_entry(gdt::Descriptor::kernel_code_segment());
    let tss_selector = gdt.add_entry(gdt::Descriptor::tss_segment(&tss));
    gdt.load();

    IDT.load();
}
```

However, when we try to compile it, the following errors occur:

```
error: `tss` does not live long enough
   --> src/interrupts/mod.rs:118:68
    |
118 |     let tss_selector = gdt.add_entry(gdt::Descriptor::tss_segment(&tss));
    |                                      does not live long enough    ^^^
...
122 | }
    | - borrowed value only lives until here
    |
    = note: borrowed value must be valid for the static lifetime...

error: `gdt` does not live long enough
   --> src/interrupts/mod.rs:119:5
    |
119 |     gdt.load();
    |     ^^^ does not live long enough
...
122 | }
    | - borrowed value only lives until here
    |
    = note: borrowed value must be valid for the static lifetime...
```

The problem is that we require that the TSS and GDT are valid for the rest of the run time (i.e. for the `'static` lifetime).
But our created `tss` and `gdt` live on the stack and are thus destroyed at the end of the `init` function.

So how do we fix this problem? We could allocate our TSS and GDT on the heap using `Box` and use [into_raw] and a bit of `unsafe` to convert them to `&'static` references ([RFC 1233] was closed unfortunately).

Alternatively, we could store them in a `static` somehow. The [`lazy_static` macro] doesn't work here, since we need access to the `MemoryController` for initialization. However, we can use its fundamental building block, the [`spin::Once` type].

[into_raw]: https://doc.rust-lang.org/std/boxed/struct.Box.html#method.into_raw
[RFC 1233]: https://github.com/rust-lang/rfcs/pull/1233
[`lazy_static` macro]: https://docs.rs/lazy_static/0.2.2/lazy_static/
[`spin::Once` type]: https://docs.rs/spin/0.4.5/spin/struct.Once.html

#### spin::Once
Let's try to solve our problem using [`spin::Once`][`spin::Once` type]:

```rust
// in src/interrupts/mod.rs

use spin::Once;

static TSS: Once<TaskStateSegment> = Once::new();
static GDT: Once<gdt::Gdt> = Once::new();
```

The `Once` type allows us to initialize a `static` at runtime. It is safe because the only way to access the static value is through the provided methods ([call_once][Once::call_once], [try][Once::try], and [wait][Once::wait]). Thus, no value can be read before initialization and the value can only be initialized once.

[Once::call_once]: https://docs.rs/spin/0.4.5/spin/struct.Once.html#method.call_once
[Once::try]: https://docs.rs/spin/0.4.5/spin/struct.Once.html#method.try
[Once::wait]: https://docs.rs/spin/0.4.5/spin/struct.Once.html#method.wait

(The `Once` type was added in spin 0.4, so you probably need to update your spin dependency.)
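The idea of initializing a `static` once at runtime and getting a `&'static` reference back can be tried out on a normal host system. The following sketch uses `std::sync::OnceLock` from the standard library as a stand-in for `spin::Once`; this is an analogy only, since our kernel cannot use `std`. The `Tss` struct and the `init_tss` function are hypothetical names for illustration.

```rust
use std::sync::OnceLock;

// Simplified stand-in for a TSS; only the IST is modeled here.
pub struct Tss {
    pub interrupt_stack_table: [u64; 7],
}

// The static starts out empty and is filled on first use.
static TSS: OnceLock<Tss> = OnceLock::new();

/// Hypothetical helper: initializes the static TSS on the first call and
/// returns a `'static` reference. Later calls return the same value and
/// ignore their argument.
pub fn init_tss(double_fault_stack_top: u64) -> &'static Tss {
    TSS.get_or_init(|| {
        let mut tss = Tss { interrupt_stack_table: [0; 7] };
        tss.interrupt_stack_table[0] = double_fault_stack_top;
        tss
    })
}
```

The initialization closure can capture runtime values such as the stack address, which is the property we need here. Calling `init_tss` a second time with a different address does not change the stored value, because the closure only runs once.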
So let's rewrite our `interrupts::init` function to use the static `TSS` and `GDT`:

```rust
pub fn init(memory_controller: &mut MemoryController) {
    let double_fault_stack = memory_controller.alloc_stack(1)
        .expect("could not allocate double fault stack");

    let tss = TSS.call_once(|| {
        let mut tss = TaskStateSegment::new();
        tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX] = VirtualAddress(
            double_fault_stack.top());
        tss
    });

    let gdt = GDT.call_once(|| {
        let mut gdt = gdt::Gdt::new();
        let code_selector = gdt.add_entry(gdt::Descriptor::kernel_code_segment());
        let tss_selector = gdt.add_entry(gdt::Descriptor::tss_segment(&tss));
        gdt
    });
    gdt.load();

    IDT.load();
}
```

Now it should compile again!

#### The Final Steps
We're almost done. We successfully loaded our new GDT, which contains a TSS descriptor. Now there are just a few steps left:

1. We changed our GDT, so we should reload `cs`, the code segment register. This is required since the old segment selector could point to a different GDT descriptor now (e.g. a TSS descriptor).
2. We loaded a GDT that contains a TSS selector, but we still need to tell the CPU that it should use that TSS.
3. As soon as our TSS is loaded, the CPU has access to a valid interrupt stack table (IST). Then we can tell the CPU that it should use our new double fault stack by modifying our double fault IDT entry.

For the first two steps, we need access to the `code_selector` and `tss_selector` variables outside of the closure. We can achieve this by moving the `let` declarations out of the closure:

```rust
// in src/interrupts/mod.rs
pub fn init(memory_controller: &mut MemoryController) {
    use x86_64::structures::gdt::SegmentSelector;
    use x86_64::instructions::segmentation::set_cs;
    use x86_64::instructions::tables::load_tss;
    ...

    let mut code_selector = SegmentSelector(0);
    let mut tss_selector = SegmentSelector(0);
    let gdt = GDT.call_once(|| {
        let mut gdt = gdt::Gdt::new();
        code_selector = gdt.add_entry(gdt::Descriptor::kernel_code_segment());
        tss_selector = gdt.add_entry(gdt::Descriptor::tss_segment(&tss));
        gdt
    });
    gdt.load();

    unsafe {
        // reload code segment register
        set_cs(code_selector);
        // load TSS
        load_tss(tss_selector);
    }

    IDT.load();
}
```

We first initialize the selectors with `SegmentSelector(0)` and then update them from inside the closure (which implicitly borrows them as `&mut`). Now we're able to reload the code segment register using [`set_cs`] and to load the TSS using [`load_tss`].

[`set_cs`]: https://docs.rs/x86_64/0.1.2/x86_64/instructions/segmentation/fn.set_cs.html
[`load_tss`]: https://docs.rs/x86_64/0.1.2/x86_64/instructions/tables/fn.load_tss.html

Now that we loaded a valid TSS and interrupt stack table, we can set the stack index for our double fault handler in the IDT:

```rust
// in src/interrupts/mod.rs

lazy_static! {
    static ref IDT: idt::Idt = {
        let mut idt = idt::Idt::new();
        ...
        unsafe {
            idt.double_fault.set_handler_fn(double_fault_handler)
                .set_stack_index(DOUBLE_FAULT_IST_INDEX as u16);
        }
        ...
    };
}
```

The `set_stack_index` method is unsafe because the caller must ensure that the used index is valid and not already used for another exception.

That's it! Now the CPU should switch to the double fault stack whenever a double fault occurs. Thus, we are able to catch _all_ double faults, including kernel stack overflows:

![QEMU printing `EXCEPTION: DOUBLE FAULT` and a dump of the exception stack frame](qemu-double-fault-on-stack-overflow.png)

From now on we should never see a triple fault again!

## What's next?
Now that we mastered exceptions, it's time to explore another kind of interrupt: interrupts from external devices such as timers, keyboards, or network controllers. These hardware interrupts are very similar to exceptions, e.g. they are also dispatched through the IDT.
However, unlike exceptions, they don't arise directly on the CPU. Instead, an _interrupt controller_ aggregates these interrupts and forwards them to the CPU depending on their priority. In the next posts we will explore the two interrupt controller variants on x86: the [Intel 8259] \(“PIC”) and the [APIC]. This will allow us to react to keyboard and mouse input.

[Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259
[APIC]: https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller

================================================
FILE: blog/content/edition-1/posts/_index.md
================================================
+++
title = "Posts"
sort_by = "weight"
insert_anchor_links = "left"
render = false
+++

================================================
FILE: blog/content/edition-2/_index.md
================================================
+++
title = "Second Edition"
template = "redirect-to-frontpage.html"
aliases = ["second-edition/index.html"]
+++

================================================
FILE: blog/content/edition-2/extra/_index.md
================================================
+++
title = "Extra Content"
insert_anchor_links = "left"
render = false
sort_by = "weight"
page_template = "edition-2/extra.html"
+++

================================================
FILE: blog/content/edition-2/extra/building-on-android/index.md
================================================
+++
title = "Building on Android"
weight = 3
aliases = ["second-edition/extra/building-on-android/index.html"]
+++

I finally managed to get `blog_os` building on my Android phone using [termux](https://termux.com/). This post explains the necessary steps to set it up.
    This post is outdated and the instructions provided here might not work anymore.
Screenshot of the compilation output from Android

### Install Termux and Nightly Rust

First, install [termux](https://termux.com/) from the [Google Play Store](https://play.google.com/store/apps/details?id=com.termux) or from F-Droid. After installing, open it and perform the following steps:

- Install fish shell, set as default shell, and launch it:
    ```
    pkg install fish
    chsh -s fish
    fish
    ```
    This step is of course optional. However, if you continue with bash you will need to adjust some of the following commands to bash syntax.
- Install some basic tools:
    ```
    pkg install wget tar
    ```
- Add the [community repository by its-pointless](https://wiki.termux.com/wiki/Package_Management):
    ```
    wget https://its-pointless.github.io/setup-pointless-repo.sh
    bash setup-pointless-repo.sh
    ```
- Install cargo and a nightly version of rustc:
    ```
    pkg install rustc cargo rustc-nightly
    ```
- Prepend the nightly rustc path to your `PATH` in order to use nightly (fish syntax):
    ```
    set -U fish_user_paths $PREFIX/opt/rust-nightly/bin/ $fish_user_paths
    ```

Now `rustc --version` should work and output a nightly version number.

### Install Git and Clone blog_os

We need something to compile, so let's download the `blog_os` repository:

- Install git:
    ```
    pkg install git
    ```
- Clone the `blog_os` repository:
    ```
    git clone https://github.com/phil-opp/blog_os.git
    ```

If you want to clone/push via SSH, you need to install the `openssh` package: `pkg install openssh`.

### Install Xbuild and Bootimage

Now we're ready to install `cargo xbuild` and `bootimage`:

- Run `cargo install`:
    ```
    cargo install cargo-xbuild bootimage
    ```
- Add the cargo bin directory to your `PATH` (fish syntax):
    ```
    set -U fish_user_paths ~/.cargo/bin/ $fish_user_paths
    ```

Now `cargo xbuild` and `bootimage` should be available. However, building does not work yet because `cargo xbuild` needs access to the Rust source code. By default it tries to use rustup for this, but we have no rustup support, so we need a different way.
### Providing the Rust Source Code

The Rust source code corresponding to our installed nightly is available in the [`its-pointless` repository](https://github.com/its-pointless/its-pointless.github.io):

- Download a tar containing the source code:
    ```
    wget https://github.com/its-pointless/its-pointless.github.io/raw/master/rust-src-nightly.tar.xz
    ```
- Extract it:
    ```
    tar xf rust-src-nightly.tar.xz
    ```
- Set the `XARGO_RUST_SRC` environment variable to tell cargo-xbuild the source path (fish syntax):
    ```
    set -Ux XARGO_RUST_SRC ~/rust-src-nightly/rust-src/lib/rustlib/src/rust/src
    ```

Now cargo-xbuild should no longer complain about a missing `rust-src` component. However, it will throw an I/O error after building the sysroot. The problem is that the downloaded Rust source code has a different structure than the source provided by rustup. We can fix this by adding a symbolic link:

```
ln -s ~/../usr/opt/rust-nightly/bin ~/../usr/opt/rust-nightly/lib/rustlib/aarch64-linux-android/bin
```

Now `cargo xbuild --target x86_64-blog_os.json` and `bootimage build` should both work!

I couldn't get QEMU to run yet, so you won't be able to run your kernel. If you manage to get it working, please tell me :).

================================================
FILE: blog/content/edition-2/posts/01-freestanding-rust-binary/index.ar.md
================================================
+++
title = "A Freestanding Rust Binary"
weight = 1
path = "ar/freestanding-rust-binary"
date = 2018-02-10

[extra]
# Please update this when updating the translation
translation_based_on_commit = "087a464ed77361cff6c459fb42fc655cb9eacbea"
# GitHub usernames of the people that translated this post
translators = ["ZAAFHachemrachid"]
+++

The first step in creating our own operating system kernel is to create a Rust executable that does not link the standard library. This makes it possible to run Rust code on the [bare metal] without an underlying operating system.
[bare metal]: https://en.wikipedia.org/wiki/Bare_machine

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-01`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-01

## Introduction

To write an operating system kernel, we need code that does not depend on any operating system features. This means that we can't use threads, files, heap memory, the network, random numbers, standard output, or any other features requiring OS abstractions or specific hardware. This makes sense, since we're trying to write our own OS and our own drivers.

This means that we can't use most of the [Rust standard library], but there are a lot of Rust features that we _can_ use. For example, we can use [iterators], [closures], [pattern matching], [option] and [result], [string formatting], and of course the [ownership system]. These features make it possible to write a kernel in a very expressive, high-level way without worrying about [undefined behavior] or [memory safety].

[option]: https://doc.rust-lang.org/core/option/
[result]: https://doc.rust-lang.org/core/result/
[Rust standard library]: https://doc.rust-lang.org/std/
[iterators]: https://doc.rust-lang.org/book/ch13-02-iterators.html
[closures]: https://doc.rust-lang.org/book/ch13-01-closures.html
[pattern matching]: https://doc.rust-lang.org/book/ch06-00-enums.html
[string formatting]: https://doc.rust-lang.org/core/macro.write.html
[ownership system]: https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html
[undefined behavior]: https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs
[memory safety]: https://tonyarcieri.com/it-s-time-for-a-memory-safety-intervention

In order to create an OS kernel in Rust, we need to create an executable that can be run without an underlying operating system.
Such an executable is often called a “freestanding” or “bare-metal” executable.

This post describes the necessary steps to create a freestanding Rust binary and explains why the steps are needed. If you're just interested in a minimal example, you can **[jump to the summary](#ملخص)**.

## Disabling the Standard Library

By default, all Rust crates link the [standard library], which depends on the operating system for features such as threads, files, or networking. It also depends on the C standard library `libc`, which closely interacts with OS services. Since our plan is to write an operating system, we can't use any OS-dependent libraries. So we have to disable the automatic inclusion of the standard library through the [`no_std` attribute].

[standard library]: https://doc.rust-lang.org/std/
[`no_std` attribute]: https://doc.rust-lang.org/1.30.0/book/first-edition/using-rust-without-the-standard-library.html

We start by creating a new cargo application project. The easiest way to do this is through the command line:

```
cargo new blog_os --bin --edition 2024
```

I named the project `blog_os`, but of course you can choose your own name. The `--bin` flag specifies that we want to create an executable binary (in contrast to a library) and the `--edition 2024` flag specifies that we want to use the [2024 edition] of Rust for our crate. When we run the command, cargo creates the following directory structure for us:

[2024 edition]: https://doc.rust-lang.org/nightly/edition-guide/rust-2024/index.html

```
blog_os
├── Cargo.toml
└── src
    └── main.rs
```

The `Cargo.toml` file contains the crate configuration, for example the crate name, the author, the [semantic version] number, and the dependencies. The `src/main.rs` file contains the root module of our crate and our `main` function. You can compile your crate through `cargo build` and then run the compiled `blog_os` binary in the `target/debug` subfolder.

[semantic version]: https://semver.org/

### The `no_std` Attribute

Right now our crate implicitly links the standard library.
Let's try to disable this by adding the [`no_std` attribute]:

```rust
// main.rs

#![no_std]

fn main() {
    println!("Hello, world!");
}
```

When we try to build it now (by running `cargo build`), the following error occurs:

```
error: cannot find macro `println!` in this scope
 --> src/main.rs:4:5
  |
4 |     println!("Hello, world!");
  |     ^^^^^^^
```

The reason for this error is that the [`println` macro] is part of the standard library, which we no longer include. So we can no longer print things. This makes sense, since `println` writes to [standard output], which is a special file descriptor provided by the operating system.

[`println` macro]: https://doc.rust-lang.org/std/macro.println.html
[standard output]: https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29

So let's remove the printing and try again with an empty main function:

```rust
// main.rs

#![no_std]

fn main() {}
```

```
> cargo build
error: `#[panic_handler]` function required, but not found
error: language item required, but not found: `eh_personality`
```

Now the compiler is missing a `#[panic_handler]` function and a _language item_.

================================================
FILE: blog/content/edition-2/posts/01-freestanding-rust-binary/index.es.md
================================================
+++
title = "A Freestanding Rust Binary"
weight = 1
path = "es/freestanding-rust-binary"
date = 2018-02-10

[extra]
chapter = "Bare Bones"
# GitHub usernames of the people that translated this post
translators = ["dobleuber"]
translation_contributors = ["richarddalves"]
+++

The first step in creating our own operating system kernel is to create a Rust executable that does not link the standard library. This makes it possible to run Rust code directly on the [bare metal] without an underlying operating system.

[bare metal]: https://en.wikipedia.org/wiki/Bare_machine

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom][al final].
The complete source code for this post can be found in the [`post-01`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[al final]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-01

## Introduction

To write an operating system kernel, we need code that does not depend on any operating system features. This means that we can't use threads, files, heap memory, the network, random numbers, standard output, or any other features requiring OS abstractions or specific hardware. This makes sense, since we're trying to write our own operating system and our own drivers.

This means that we can't use most of the [Rust standard library][biblioteca estándar de Rust], but there are a lot of Rust features that we _can_ use. For example, we can use [iterators][iteradores], [closures], [pattern matching], [option] and [result], [string formatting][formateo de cadenas], and of course the [ownership system][sistema de ownership]. These features make it possible to write a kernel in a very expressive, high-level way without worrying about [undefined behavior][comportamiento indefinido] or [memory safety][seguridad de la memoria].
[option]: https://doc.rust-lang.org/core/option/ [result]: https://doc.rust-lang.org/core/result/ [biblioteca estándar de Rust]: https://doc.rust-lang.org/std/ [iteradores]: https://doc.rust-lang.org/book/ch13-02-iterators.html [closures]: https://doc.rust-lang.org/book/ch13-01-closures.html [pattern matching]: https://doc.rust-lang.org/book/ch06-00-enums.html [formateo de cadenas]: https://doc.rust-lang.org/core/macro.write.html [sistema de ownership]: https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html [comportamiento indefinido]: https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs [seguridad de la memoria]: https://tonyarcieri.com/it-s-time-for-a-memory-safety-intervention Para crear un kernel de sistema operativo en Rust, necesitamos crear un ejecutable que pueda ejecutarse sin un sistema operativo subyacente. Dicho ejecutable se llama frecuentemente un ejecutable “autónomo” o de “bare metal”. Esta publicación describe los pasos necesarios para crear un binario autónomo en Rust y explica por qué son necesarios. Si solo te interesa un ejemplo mínimo, puedes **[saltar al resumen](#resumen)**. ## Deshabilitando la Biblioteca Estándar Por defecto, todos los crates de Rust enlazan con la [biblioteca estándar], que depende del sistema operativo para características como hilos, archivos o redes. También depende de la biblioteca estándar de C, `libc`, que interactúa estrechamente con los servicios del sistema operativo. Como nuestro plan es escribir un sistema operativo, no podemos usar ninguna biblioteca que dependa del sistema operativo. Por lo tanto, tenemos que deshabilitar la inclusión automática de la biblioteca estándar mediante el atributo [`no_std`]. [biblioteca estándar]: https://doc.rust-lang.org/std/ [`no_std`]: https://doc.rust-lang.org/1.30.0/book/first-edition/using-rust-without-the-standard-library.html Comenzamos creando un nuevo proyecto de aplicación en Cargo. 
La forma más fácil de hacerlo es a través de la línea de comandos:

```
cargo new blog_os --bin --edition 2024
```

Nombré el proyecto `blog_os`, pero, por supuesto, puedes elegir tu propio nombre. La bandera `--bin` especifica que queremos crear un binario ejecutable (en contraste con una biblioteca), y la bandera `--edition 2024` indica que queremos usar la [edición 2024] de Rust para nuestro crate. Al ejecutar el comando, Cargo crea la siguiente estructura de directorios para nosotros:

[edición 2024]: https://doc.rust-lang.org/nightly/edition-guide/rust-2024/index.html

```
blog_os
├── Cargo.toml
└── src
    └── main.rs
```

El archivo `Cargo.toml` contiene la configuración del crate, como el nombre del crate, el autor, el número de [versión semántica] y las dependencias. El archivo `src/main.rs` contiene el módulo raíz de nuestro crate y nuestra función `main`. Puedes compilar tu crate utilizando `cargo build` y luego ejecutar el binario compilado `blog_os` ubicado en la subcarpeta `target/debug`.

[versión semántica]: https://semver.org/

### El Atributo `no_std`

Actualmente, nuestro crate enlaza implícitamente con la biblioteca estándar. Intentemos deshabilitar esto añadiendo el [atributo `no_std`][`no_std`]:

```rust
// main.rs

#![no_std]

fn main() {
    println!("Hello, world!");
}
```

Cuando intentamos compilarlo ahora (ejecutando `cargo build`), ocurre el siguiente error:

```
error: cannot find macro `println!` in this scope
 --> src/main.rs:4:5
  |
4 |     println!("Hello, world!");
  |     ^^^^^^^
```

La razón de este error es que la [macro `println`] forma parte de la biblioteca estándar, la cual ya no estamos incluyendo. Por lo tanto, ya no podemos imprimir cosas. Esto tiene sentido, ya que `println` escribe en la [salida estándar], que es un descriptor de archivo especial proporcionado por el sistema operativo.
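
Perder `println` no significa perder el formateo de cadenas: la maquinaria de formateo vive en `core::fmt` y solo necesita un destino que implemente el trait `core::fmt::Write`. El siguiente esbozo parte del supuesto de un búfer de tamaño fijo llamado `FixedBuf` (nombre hipotético, al igual que `format_answer`). Solo ilustra la idea y no es la implementación que usa esta serie:

```rust
use core::fmt::{self, Write};

/// Búfer de tamaño fijo (hipotético) que sirve como destino de escritura
/// en lugar de la salida estándar.
pub struct FixedBuf {
    bytes: [u8; 64],
    len: usize,
}

impl FixedBuf {
    pub const fn new() -> Self {
        FixedBuf { bytes: [0; 64], len: 0 }
    }

    /// Devuelve el texto escrito hasta ahora.
    pub fn as_str(&self) -> &str {
        // `write_str` solo copia valores `&str` completos, por lo que el
        // contenido siempre es UTF-8 válido; `unwrap_or` evita un pánico.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}

impl Write for FixedBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > self.bytes.len() {
            // Sin memoria dinámica no podemos crecer: informamos del error.
            return Err(fmt::Error);
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}

/// Formatea un valor dentro del búfer usando la macro `write!` de `core`.
pub fn format_answer(value: u32) -> FixedBuf {
    let mut buf = FixedBuf::new();
    // Se ignora el error: en este esbozo el texto siempre cabe en 64 bytes.
    let _ = write!(buf, "respuesta: {}", value);
    buf
}
```

Por ejemplo, `format_answer(42).as_str()` produce `"respuesta: 42"` sin tocar ningún descriptor de archivo. En un kernel, el destino sería un dispositivo de salida propio en lugar de un búfer en memoria.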
[macro `println`]: https://doc.rust-lang.org/std/macro.println.html
[salida estándar]: https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29

Así que eliminemos la impresión e intentemos de nuevo con una función `main` vacía:

```rust
// main.rs

#![no_std]

fn main() {}
```

```
> cargo build
error: `#[panic_handler]` function required, but not found
error: language item required, but not found: `eh_personality`
```

Ahora el compilador indica que faltan una función `#[panic_handler]` y un _elemento de lenguaje_ (_language item_).

## Implementación de Panic

El atributo `panic_handler` define la función que el compilador invoca cuando ocurre un [panic]. La biblioteca estándar proporciona su propia función de pánico, pero en un entorno `no_std` debemos definirla nosotros mismos:

[panic]: https://doc.rust-lang.org/stable/book/ch09-01-unrecoverable-errors-with-panic.html

```rust
// en main.rs

use core::panic::PanicInfo;

/// Esta función se llama cuando ocurre un pánico.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

El [parámetro `PanicInfo`][PanicInfo] contiene el archivo y la línea donde ocurrió el panic, así como el mensaje opcional del panic. La función no debería retornar nunca, por lo que se marca como una [función divergente][diverging function] devolviendo el [tipo “never”][“never” type] `!`. Por ahora, no hay mucho que podamos hacer en esta función, así que simplemente entramos en un bucle infinito.

[PanicInfo]: https://doc.rust-lang.org/nightly/core/panic/struct.PanicInfo.html
[diverging function]: https://doc.rust-lang.org/1.30.0/book/first-edition/functions.html#diverging-functions
[“never” type]: https://doc.rust-lang.org/nightly/std/primitive.never.html

## El Elemento de Lenguaje `eh_personality`

Los elementos de lenguaje son elementos especiales (traits, funciones, tipos, etc.) que el compilador requiere internamente.
Por ejemplo, el trait [`Copy`] es un elemento de lenguaje que indica al compilador qué tipos tienen [_semántica de copia_][`Copy`]. Si observamos su [implementación][copy code], veremos que tiene el atributo especial `#[lang = "copy"]`, que lo define como un elemento de lenguaje. [`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html [copy code]: https://github.com/rust-lang/rust/blob/485397e49a02a3b7ff77c17e4a3f16c653925cb3/src/libcore/marker.rs#L296-L299 Aunque es posible proporcionar implementaciones personalizadas de elementos de lenguaje, esto debería hacerse solo como último recurso. La razón es que los elementos de lenguaje son detalles de implementación altamente inestables y ni siquiera están verificados por tipos (el compilador no comprueba si una función tiene los tipos de argumento correctos). Afortunadamente, hay una forma más estable de solucionar el error relacionado con el elemento de lenguaje mencionado. El [elemento de lenguaje `eh_personality`][`eh_personality` language item] marca una función utilizada para implementar el [desenrollado de pila][stack unwinding]. Por defecto, Rust utiliza unwinding para ejecutar los destructores de todas las variables de pila activas en caso de un [pánico][panic]. Esto asegura que toda la memoria utilizada sea liberada y permite que el hilo principal capture el pánico y continúe ejecutándose. Sin embargo, el unwinding es un proceso complicado y requiere algunas bibliotecas específicas del sistema operativo (por ejemplo, [libunwind] en Linux o [manejadores estructurados de excepciones][structured exception handling] en Windows), por lo que no queremos usarlo en nuestro sistema operativo. 
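
Para ver qué hace el unwinding en un programa normal, el siguiente esbozo está pensado para ejecutarse sobre un sistema operativo, con la biblioteca estándar y la estrategia de pánico por defecto (`unwind`). Por tanto, no es código para nuestro kernel. Los nombres `Guard`, `panic_with_guard` y `destructor_ran` son hipotéticos:

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Registra si el destructor de `Guard` llegó a ejecutarse.
static DROPPED: AtomicBool = AtomicBool::new(false);

/// Tipo de ejemplo cuyo destructor deja constancia de su ejecución.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

/// Provoca un pánico mientras `Guard` está vivo en la pila y lo captura.
/// Devuelve `true` si el pánico fue capturado.
pub fn panic_with_guard() -> bool {
    let result = panic::catch_unwind(|| {
        let _guard = Guard;
        panic!("pánico de demostración");
    });
    result.is_err()
}

/// Indica si el unwinding ejecutó el destructor de `Guard`.
pub fn destructor_ran() -> bool {
    DROPPED.load(Ordering::SeqCst)
}
```

Con `panic = "unwind"`, `panic_with_guard()` captura el pánico y `destructor_ran()` pasa a devolver `true`, porque el unwinding recorre la pila y ejecuta los destructores. Con `panic = "abort"` el proceso terminaría en el `panic!` sin ejecutar nada de esto, que es el comportamiento que buscamos para el kernel.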
[`eh_personality` language item]: https://github.com/rust-lang/rust/blob/edb368491551a77d77a48446d4ee88b35490c565/src/libpanic_unwind/gcc.rs#L11-L45
[stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php
[libunwind]: https://www.nongnu.org/libunwind/
[structured exception handling]: https://docs.microsoft.com/en-us/windows/win32/debug/structured-exception-handling
[panic]: https://doc.rust-lang.org/book/ch09-01-unrecoverable-errors-with-panic.html

### Deshabilitando el Unwinding

Existen otros casos de uso en los que el unwinding no es deseable, por lo que Rust proporciona una opción para [abortar en caso de pánico][abort on panic]. Esto desactiva la generación de información de símbolos de unwinding y, por lo tanto, reduce considerablemente el tamaño del binario. Hay múltiples lugares donde podemos deshabilitar el unwinding. La forma más sencilla es agregar las siguientes líneas a nuestro archivo `Cargo.toml`:

```toml
[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
```

Esto establece la estrategia de pánico en `abort` tanto para el perfil `dev` (utilizado en `cargo build`) como para el perfil `release` (utilizado en `cargo build --release`). Ahora, el elemento de lenguaje `eh_personality` ya no debería ser necesario.

[abort on panic]: https://github.com/rust-lang/rust/pull/32900

Ahora hemos solucionado ambos errores anteriores. Sin embargo, si intentamos compilarlo ahora, ocurre otro error:

```
> cargo build
error: requires `start` lang_item
```

Nuestro programa carece del elemento de lenguaje `start`, que define el punto de entrada.

## El Atributo `start`

Podría pensarse que la función `main` es la primera que se ejecuta al correr un programa. Sin embargo, la mayoría de los lenguajes tienen un [sistema de tiempo de ejecución][runtime system], encargado de tareas como la recolección de basura (por ejemplo, en Java) o los hilos de software (por ejemplo, goroutines en Go).
Este sistema de tiempo de ejecución necesita ejecutarse antes de `main`, ya que debe inicializarse. [runtime system]: https://en.wikipedia.org/wiki/Runtime_system En un binario típico de Rust que enlaza con la biblioteca estándar, la ejecución comienza en una biblioteca de tiempo de ejecución de C llamada `crt0` ("C runtime zero"), que configura el entorno para una aplicación en C. Esto incluye la creación de una pila y la colocación de los argumentos en los registros adecuados. Luego, el tiempo de ejecución de C invoca el [punto de entrada del tiempo de ejecución de Rust][rt::lang_start], que está marcado por el elemento de lenguaje `start`. Rust tiene un tiempo de ejecución muy minimalista, que se encarga de tareas menores como configurar los guardias de desbordamiento de pila o imprimir un backtrace en caso de pánico. Finalmente, el tiempo de ejecución llama a la función `main`. [rt::lang_start]: https://github.com/rust-lang/rust/blob/bb4d1491466d8239a7a5fd68bd605e3276e97afb/src/libstd/rt.rs#L32-L73 Nuestro ejecutable autónomo no tiene acceso al tiempo de ejecución de Rust ni a `crt0`, por lo que necesitamos definir nuestro propio punto de entrada. Implementar el elemento de lenguaje `start` no ayudaría, ya que aún requeriría `crt0`. En su lugar, debemos sobrescribir directamente el punto de entrada de `crt0`. ### Sobrescribiendo el Punto de Entrada Para indicar al compilador de Rust que no queremos usar la cadena normal de puntos de entrada, agregamos el atributo `#![no_main]`: ```rust #![no_std] #![no_main] use core::panic::PanicInfo; /// Esta función se llama cuando ocurre un pánico. #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } ``` Podrás notar que eliminamos la función `main`. La razón es que una función `main` no tiene sentido sin un sistema de tiempo de ejecución subyacente que la invoque. 
En su lugar, estamos sobrescribiendo el punto de entrada del sistema operativo con nuestra propia función `_start`:

```rust
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    loop {}
}
```

Al usar el atributo `#[unsafe(no_mangle)]` (la edición 2024 de Rust exige envolver `no_mangle` en `unsafe(...)`), deshabilitamos el [name mangling] para asegurarnos de que el compilador de Rust realmente genere una función con el nombre `_start`. Sin este atributo, el compilador generaría un símbolo críptico como `_ZN3blog_os4_start7hb173fedf945531caE` para dar un nombre único a cada función. Este atributo es necesario porque en el siguiente paso debemos indicar al enlazador el nombre de la función de punto de entrada.

También debemos marcar la función como `extern "C"` para indicar al compilador que debe usar la [convención de llamadas en C][C calling convention] para esta función (en lugar de la convención de llamadas de Rust, que no está especificada). El motivo para nombrar la función `_start` es que este es el nombre predeterminado del punto de entrada en la mayoría de los sistemas.

[name mangling]: https://en.wikipedia.org/wiki/Name_mangling
[C calling convention]: https://en.wikipedia.org/wiki/Calling_convention

El tipo de retorno `!` significa que la función es divergente, es decir, no está permitido que retorne nunca. Esto es necesario porque el punto de entrada no es llamado por ninguna función, sino que es invocado directamente por el sistema operativo o el bootloader. En lugar de retornar, el punto de entrada debería, por ejemplo, invocar la [llamada al sistema `exit`][`exit` system call] del sistema operativo. En nuestro caso, apagar la máquina podría ser una acción razonable, ya que no queda nada por hacer si un binario autónomo retorna. Por ahora, cumplimos con este requisito entrando en un bucle infinito.

[`exit` system call]: https://en.wikipedia.org/wiki/Exit_(system_call)

Cuando ejecutamos `cargo build` ahora, obtenemos un feo error del _linker_ (enlazador).
## Errores del Enlazador

El enlazador es un programa que combina el código generado en un ejecutable. Dado que el formato del ejecutable varía entre Linux, Windows y macOS, cada sistema tiene su propio enlazador que lanza errores diferentes. Sin embargo, la causa fundamental de los errores es la misma: la configuración predeterminada del enlazador asume que nuestro programa depende del tiempo de ejecución de C, lo cual no es cierto.

Para solucionar los errores, necesitamos informar al enlazador que no debe incluir el tiempo de ejecución de C. Esto puede hacerse pasando un conjunto específico de argumentos al enlazador o construyendo para un destino de bare metal.

### Construyendo para un Destino de Bare Metal

Por defecto, Rust intenta construir un ejecutable que pueda ejecutarse en el entorno actual de tu sistema. Por ejemplo, si estás usando Windows en `x86_64`, Rust intenta construir un ejecutable `.exe` para Windows que utilice instrucciones `x86_64`. Este entorno se llama tu sistema "host".

Para describir diferentes entornos, Rust utiliza una cadena llamada [_target triple_]. Puedes ver el _target triple_ de tu sistema host ejecutando `rustc --version --verbose`:

[_target triple_]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple

```
rustc 1.35.0-nightly (474e7a648 2019-04-07)
binary: rustc
commit-hash: 474e7a6486758ea6fc761893b1a49cd9076fb0ab
commit-date: 2019-04-07
host: x86_64-unknown-linux-gnu
release: 1.35.0-nightly
LLVM version: 8.0
```

El resultado anterior es de un sistema Linux `x86_64`. Vemos que la tripleta del `host` es `x86_64-unknown-linux-gnu`, lo que incluye la arquitectura de la CPU (`x86_64`), el proveedor (`unknown`), el sistema operativo (`linux`) y el [ABI] (`gnu`).

[ABI]: https://en.wikipedia.org/wiki/Application_binary_interface

Al compilar para la tripleta del host, el compilador de Rust y el enlazador asumen que hay un sistema operativo subyacente como Linux o Windows que utiliza el tiempo de ejecución de C de forma predeterminada, lo que provoca los errores del enlazador.
Para evitar estos errores, podemos compilar para un entorno diferente que no tenga un sistema operativo subyacente. Un ejemplo de este tipo de entorno bare metal es la tripleta de destino `thumbv7em-none-eabihf`, que describe un sistema [embebido][embedded] basado en [ARM]. Los detalles no son importantes, lo que importa es que la tripleta de destino no tiene un sistema operativo subyacente, lo cual se indica por el `none` en la tripleta de destino. Para poder compilar para este destino, necesitamos agregarlo usando `rustup`: ``` rustup target add thumbv7em-none-eabihf ``` Esto descarga una copia de las bibliotecas estándar (y core) para el sistema. Ahora podemos compilar nuestro ejecutable autónomo para este destino: ``` cargo build --target thumbv7em-none-eabihf ``` Al pasar un argumento `--target`, realizamos un [compilado cruzado][cross compile] de nuestro ejecutable para un sistema bare metal. Dado que el sistema de destino no tiene un sistema operativo, el enlazador no intenta enlazar con el tiempo de ejecución de C, y nuestra compilación se completa sin errores del enlazador. [cross compile]: https://en.wikipedia.org/wiki/Cross_compiler Este es el enfoque que utilizaremos para construir nuestro kernel de sistema operativo. En lugar de `thumbv7em-none-eabihf`, utilizaremos un [destino personalizado][custom target] que describa un entorno bare metal `x86_64`. Los detalles se explicarán en la siguiente publicación. [custom target]: https://doc.rust-lang.org/rustc/targets/custom.html ### Argumentos del Enlazador En lugar de compilar para un sistema bare metal, también es posible resolver los errores del enlazador pasando un conjunto específico de argumentos al enlazador. Este no es el enfoque que usaremos para nuestro kernel, por lo tanto, esta sección es opcional y se proporciona solo para completar. Haz clic en _"Argumentos del Enlazador"_ a continuación para mostrar el contenido opcional.
    Argumentos del Enlazador En esta sección discutimos los errores del enlazador que ocurren en Linux, Windows y macOS, y explicamos cómo resolverlos pasando argumentos adicionales al enlazador. Ten en cuenta que el formato del ejecutable y el enlazador varían entre sistemas operativos, por lo que se requiere un conjunto diferente de argumentos para cada sistema. #### Linux En Linux ocurre el siguiente error del enlazador (resumido): ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start': (.text+0x12): undefined reference to `__libc_csu_fini' /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start': (.text+0x19): undefined reference to `__libc_csu_init' /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start': (.text+0x25): undefined reference to `__libc_start_main' collect2: error: ld returned 1 exit status ``` El problema es que el enlazador incluye por defecto la rutina de inicio del tiempo de ejecución de C, que también se llama `_start`. Esta rutina requiere algunos símbolos de la biblioteca estándar de C `libc` que no incluimos debido al atributo `no_std`, por lo que el enlazador no puede resolver estas referencias. Para solucionar esto, podemos indicar al enlazador que no enlace la rutina de inicio de C pasando la bandera `-nostartfiles`. Una forma de pasar atributos al enlazador a través de Cargo es usar el comando `cargo rustc`. Este comando se comporta exactamente como `cargo build`, pero permite pasar opciones a `rustc`, el compilador subyacente de Rust. `rustc` tiene la bandera `-C link-arg`, que pasa un argumento al enlazador. Combinados, nuestro nuevo comando de compilación se ve así: ``` cargo rustc -- -C link-arg=-nostartfiles ``` ¡Ahora nuestro crate se compila como un ejecutable autónomo en Linux! 
No fue necesario especificar explícitamente el nombre de nuestra función de punto de entrada, ya que el enlazador busca una función con el nombre `_start` por defecto. #### Windows En Windows, ocurre un error del enlazador diferente (resumido): ``` error: linking with `link.exe` failed: exit code: 1561 | = note: "C:\\Program Files (x86)\\…\\link.exe" […] = note: LINK : fatal error LNK1561: entry point must be defined ``` El error "entry point must be defined" significa que el enlazador no puede encontrar el punto de entrada. En Windows, el nombre predeterminado del punto de entrada [depende del subsistema utilizado][windows-subsystems]. Para el subsistema `CONSOLE`, el enlazador busca una función llamada `mainCRTStartup`, y para el subsistema `WINDOWS`, busca una función llamada `WinMainCRTStartup`. Para anular este comportamiento predeterminado y decirle al enlazador que busque nuestra función `_start`, podemos pasar un argumento `/ENTRY` al enlazador: [windows-subsystems]: https://docs.microsoft.com/en-us/cpp/build/reference/entry-entry-point-symbol ``` cargo rustc -- -C link-arg=/ENTRY:_start ``` Por el formato diferente del argumento, podemos ver claramente que el enlazador de Windows es un programa completamente distinto al enlazador de Linux. Ahora ocurre un error diferente del enlazador: ``` error: linking with `link.exe` failed: exit code: 1221 | = note: "C:\\Program Files (x86)\\…\\link.exe" […] = note: LINK : fatal error LNK1221: a subsystem can't be inferred and must be defined ``` Este error ocurre porque los ejecutables de Windows pueden usar diferentes [subsistemas][windows-subsystems]. En programas normales, se infieren dependiendo del nombre del punto de entrada: si el punto de entrada se llama `main`, se usa el subsistema `CONSOLE`, y si el punto de entrada se llama `WinMain`, se usa el subsistema `WINDOWS`. 
Dado que nuestra función `_start` tiene un nombre diferente, necesitamos especificar el subsistema explícitamente: ``` cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console" ``` Aquí usamos el subsistema `CONSOLE`, pero el subsistema `WINDOWS` también funcionaría. En lugar de pasar `-C link-arg` varias veces, podemos usar `-C link-args`, que acepta una lista de argumentos separados por espacios. Con este comando, nuestro ejecutable debería compilarse exitosamente en Windows. #### macOS En macOS, ocurre el siguiente error del enlazador (resumido): ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: entry point (_main) undefined. for architecture x86_64 clang: error: linker command failed with exit code 1 […] ``` Este mensaje de error nos indica que el enlazador no puede encontrar una función de punto de entrada con el nombre predeterminado `main` (por alguna razón, en macOS todas las funciones tienen un prefijo `_`). Para establecer el punto de entrada en nuestra función `_start`, pasamos el argumento del enlazador `-e`: ``` cargo rustc -- -C link-args="-e __start" ``` La bandera `-e` especifica el nombre de la función de punto de entrada. Dado que en macOS todas las funciones tienen un prefijo adicional `_`, necesitamos establecer el punto de entrada en `__start` en lugar de `_start`. Ahora ocurre el siguiente error del enlazador: ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: dynamic main executables must link with libSystem.dylib for architecture x86_64 clang: error: linker command failed with exit code 1 […] ``` macOS [no admite oficialmente binarios enlazados estáticamente] y requiere que los programas enlacen la biblioteca `libSystem` por defecto. 
Para anular esto y enlazar un binario estático, se pasa la bandera `-static` al enlazador: [no admite oficialmente binarios enlazados estáticamente]: https://developer.apple.com/library/archive/qa/qa1118/_index.html ``` cargo rustc -- -C link-args="-e __start -static" ``` Esto aún no es suficiente, ya que ocurre un tercer error del enlazador: ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: library not found for -lcrt0.o clang: error: linker command failed with exit code 1 […] ``` Este error ocurre porque los programas en macOS enlazan con `crt0` (“C runtime zero”) por defecto. Esto es similar al error que tuvimos en Linux y también se puede resolver añadiendo el argumento del enlazador `-nostartfiles`: ``` cargo rustc -- -C link-args="-e __start -static -nostartfiles" ``` Ahora nuestro programa debería compilarse exitosamente en macOS. #### Unificando los Comandos de Construcción Actualmente, tenemos diferentes comandos de construcción dependiendo de la plataforma host, lo cual no es ideal. Para evitar esto, podemos crear un archivo llamado `.cargo/config.toml` que contenga los argumentos específicos de cada plataforma: ```toml # en .cargo/config.toml [target.'cfg(target_os = "linux")'] rustflags = ["-C", "link-arg=-nostartfiles"] [target.'cfg(target_os = "windows")'] rustflags = ["-C", "link-args=/ENTRY:_start /SUBSYSTEM:console"] [target.'cfg(target_os = "macos")'] rustflags = ["-C", "link-args=-e __start -static -nostartfiles"] ``` La clave `rustflags` contiene argumentos que se añaden automáticamente a cada invocación de `rustc`. Para más información sobre el archivo `.cargo/config.toml`, consulta la [documentación oficial](https://doc.rust-lang.org/cargo/reference/config.html). Ahora nuestro programa debería poder construirse en las tres plataformas con un simple `cargo build`. #### ¿Deberías Hacer Esto? Aunque es posible construir un ejecutable autónomo para Linux, Windows y macOS, probablemente no sea una buena idea. 
La razón es que nuestro ejecutable aún espera varias cosas, por ejemplo, que una pila esté inicializada cuando se llama a la función `_start`. Sin el tiempo de ejecución de C, algunos de estos requisitos podrían no cumplirse, lo que podría hacer que nuestro programa falle, por ejemplo, con un error de segmentación. Si deseas crear un binario mínimo que se ejecute sobre un sistema operativo existente, incluir `libc` y configurar el atributo `#[start]` como se describe [aquí](https://doc.rust-lang.org/1.16.0/book/no-stdlib.html) probablemente sea una mejor idea.
## Resumen {#resumen}

Un binario mínimo autónomo en Rust se ve así:

`src/main.rs`:

```rust
#![no_std] // no enlazar con la biblioteca estándar de Rust
#![no_main] // deshabilitar todos los puntos de entrada a nivel de Rust

use core::panic::PanicInfo;

#[unsafe(no_mangle)] // no modificar el nombre de esta función
pub extern "C" fn _start() -> ! {
    // esta función es el punto de entrada, ya que el enlazador busca una función
    // llamada `_start` por defecto
    loop {}
}

/// Esta función se llama cuando ocurre un pánico.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

`Cargo.toml`:

```toml
[package]
name = "crate_name"
version = "0.1.0"
authors = ["Author Name "]

# el perfil usado para `cargo build`
[profile.dev]
panic = "abort" # deshabilitar el desenrollado de la pila en caso de pánico

# el perfil usado para `cargo build --release`
[profile.release]
panic = "abort" # deshabilitar el desenrollado de la pila en caso de pánico
```

Para construir este binario, necesitamos compilar para un destino bare metal, como `thumbv7em-none-eabihf`:

```
cargo build --target thumbv7em-none-eabihf
```

Alternativamente, podemos compilarlo para el sistema host pasando argumentos adicionales al enlazador:

```bash
# Linux
cargo rustc -- -C link-arg=-nostartfiles
# Windows
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
# macOS
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

Ten en cuenta que este es solo un ejemplo mínimo de un binario autónomo en Rust. Este binario espera varias cosas, por ejemplo, que una pila esté inicializada cuando se llama a la función `_start`. **Por lo tanto, para cualquier uso real de un binario como este, se requieren más pasos**.

## ¿Qué sigue?

La [próxima publicación][next post] explica los pasos necesarios para convertir nuestro binario autónomo en un kernel de sistema operativo mínimo.
Esto incluye crear un destino personalizado, combinar nuestro ejecutable con un bootloader y aprender cómo imprimir algo en la pantalla. [next post]: @/edition-2/posts/02-minimal-rust-kernel/index.md ================================================ FILE: blog/content/edition-2/posts/01-freestanding-rust-binary/index.fa.md ================================================ +++ title = " یک باینری مستقل Rust" weight = 1 path = "fa/freestanding-rust-binary" date = 2018-02-10 [extra] # Please update this when updating the translation translation_based_on_commit = "80136cc0474ae8d2da04f391b5281cfcda068c1a" # GitHub usernames of the people that translated this post translators = ["hamidrezakp", "MHBahrampour"] rtl = true +++ اولین قدم برای نوشتن سیستم‌عامل، ساخت یک باینری راست (کلمه: Rust) هست که به کتابخانه استاندارد نیازمند نباشد. این باعث می‌شود تا بتوانیم کد راست را بدون سیستم‌عامل زیرین، بر روی سخت افزار [bare metal] اجرا کنیم. [bare metal]: https://en.wikipedia.org/wiki/Bare_machine این بلاگ بصورت آزاد بر روی [گیت‌هاب] توسعه داده شده. اگر مشکل یا سوالی دارید، لطفاً آن‌جا یک ایشو باز کنید. همچنین می‌توانید [در زیر] این پست کامنت بگذارید. سورس کد کامل این پست را می‌توانید در بِرَنچ [`post-01`][post branch] پیدا کنید. [گیت‌هاب]: https://github.com/phil-opp/blog_os [در زیر]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-01 ## مقدمه برای نوشتن هسته سیستم‌عامل، ما به کدی نیاز داریم که به هیچ یک از ویژگی‌های سیستم‌عامل نیازی نداشته باشد. یعنی نمی‌توانیم از نخ‌ها (ترجمه: Threads)، فایل‌ها، حافظه هیپ (کلمه: Heap)، شبکه، اعداد تصادفی، ورودی استاندارد، یا هر ویژگی دیگری که نیاز به انتزاعات سیستم‌عامل یا سخت‌افزار خاصی داشته، استفاده کنیم. منطقی هم به نظر می‌رسد، چون ما سعی داریم سیستم‌عامل و درایور‌های خودمان را بنویسیم. نداشتن انتزاعات سیستم‌عامل به این معنی هست که نمی‌توانیم از بخش زیادی از [کتابخانه استاندارد راست] استفاده کنیم، اما هنوز بسیاری از ویژگی‌های راست هستند که می‌توانیم از آن‌ها استفاده کنیم. 
به عنوان مثال، می‌توانیم از [iterator] ها، [closure] ها، [pattern matching]، [option]، [result]، [string formatting] و البته [سیستم ownership] استفاده کنیم. این ویژگی‌ها به ما امکان نوشتن هسته به طور رسا، سطح بالا و بدون نگرانی درباره [رفتار تعریف نشده] و [امنیت حافظه] را میدهند. [option]: https://doc.rust-lang.org/core/option/ [result]:https://doc.rust-lang.org/core/result/ [کتابخانه استاندارد راست]: https://doc.rust-lang.org/std/ [iterators]: https://doc.rust-lang.org/book/ch13-02-iterators.html [closures]: https://doc.rust-lang.org/book/ch13-01-closures.html [pattern matching]: https://doc.rust-lang.org/book/ch06-00-enums.html [string formatting]: https://doc.rust-lang.org/core/macro.write.html [سیستم ownership]: https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html [رفتار تعریف نشده]: https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs [امنیت حافظه]: https://tonyarcieri.com/it-s-time-for-a-memory-safety-intervention برای ساختن یک هسته سیستم‌عامل به زبان راست، باید فایل اجرایی‌ای بسازیم که بتواند بدون سیستم‌عامل زیرین اجرا بشود. چنین فایل اجرایی، فایل اجرایی مستقل (ترجمه: freestanding) یا فایل اجرایی “bare-metal” نامیده می‌شود. این پست قدم‌های لازم برای ساخت یک باینری مستقل راست و اینکه چرا این قدم‌ها نیاز هستند را توضیح می‌دهد. اگر علاقه‌ایی به خواندن کل توضیحات ندارید، می‌توانید **[به قسمت خلاصه مراجعه کنید](#summary)**. ## غیر فعال کردن کتابخانه استاندارد به طور پیش‌فرض تمام کِرِیت‌های راست، از [کتابخانه استاندارد] استفاده می‌کنند(لینک به آن دارند)، که به سیستم‌عامل برای قابلیت‌هایی مثل نخ‌ها، فایل‌ها یا شبکه وابستگی دارد. همچنین به کتابخانه استاندارد زبان سی، `libc` هم وابسطه هست که با سرویس‌های سیستم‌عامل تعامل نزدیکی دارند. از آن‌جا که قصد داریم یک سیستم‌عامل بنویسیم، نمی‌توانیم از هیچ کتابخانه‌ایی که به سیستم‌عامل نیاز داشته باشد استفاده کنیم. بنابراین باید اضافه شدن خودکار کتابخانه استاندارد را از طریق [خاصیت `no_std`] غیر فعال کنیم. 
[کتابخانه استاندارد]: https://doc.rust-lang.org/std/ [خاصیت `no_std`]: https://doc.rust-lang.org/1.30.0/book/first-edition/using-rust-without-the-standard-library.html با ساخت یک اپلیکیشن جدید کارگو شروع می‌کنیم. ساده‌ترین راه برای انجام این کار از طریق خط فرمان است: ``` cargo new blog_os --bin --edition 2024 ``` نام پروژه را `blog_os‍` گذاشتم، اما شما می‌توانید نام دلخواه خود را انتخاب کنید. پرچمِ (ترجمه: Flag) `bin--` مشخص می‌کند که ما می‌خواهیم یک فایل اجرایی ایجاد کنیم (به جای یک کتابخانه) و پرچمِ `edition 2024--` مشخص می‌کند که می‌خواهیم از [ویرایش 2024] زبان راست برای کریت خود استفاده کنیم. وقتی دستور را اجرا می‌کنیم، کارگو ساختار پوشه‌های زیر را برای ما ایجاد می‌کند: [ویرایش 2024]: https://doc.rust-lang.org/nightly/edition-guide/rust-2024/index.html ``` blog_os ├── Cargo.toml └── src └── main.rs ``` فایل `Cargo.toml` شامل تنظیمات کریت می‌باشد، به عنوان مثال نام کریت، نام نویسنده، شماره [نسخه سمنتیک] و وابستگی‌ها. فایل `src/main.rs` شامل ماژول ریشه برای کریت ما و تابع `main` است. می‌توانید کریت خود را با دستور `cargo build‍` کامپایل کنید و سپس باینری کامپایل شده ‍‍`blog_os` را در زیرپوشه `target/debug` اجرا کنید. [نسخه سمنتیک]: https://semver.org/ ### خاصیت `no_std` در حال حاظر کریت ما بطور ضمنی به کتابخانه استاندارد لینک دارد. بیایید تا سعی کنیم آن را با اضافه کردن [خاصیت `no_std`] غیر فعال کنیم: ```rust // main.rs #![no_std] fn main() { println!("Hello, world!"); } ``` حالا وقتی سعی می‌کنیم تا بیلد کنیم (با اجرای دستور `cargo build`)، خطای زیر رخ می‌دهد: ``` error: cannot find macro `println!` in this scope --> src/main.rs:4:5 | 4 | println!("Hello, world!"); | ^^^^^^^ ``` دلیل این خطا این هست که [ماکروی `println`]\(ترجمه: macro) جزوی از کتابخانه استاندارد است، که ما دیگر آن را نداریم. بنابراین نمی‌توانیم چیزی را چاپ کنیم. منطقی هست زیرا `println` در [خروجی استاندارد] می‌نویسد، که یک توصیف کننده فایل (ترجمه: File Descriptor) خاص است که توسط سیستم‌عامل ارائه می‌شود. 
[ماکروی `println`]: https://doc.rust-lang.org/std/macro.println.html
[خروجی استاندارد]: https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29

So let's remove the printing and try again with an empty main function:

```rust
// main.rs

#![no_std]

fn main() {}
```

```
> cargo build
error: `#[panic_handler]` function required, but not found
error: language item required, but not found: `eh_personality`
```

Now the compiler is missing a `#[panic_handler]` function and a _language item_.

## Panic Implementation

The `panic_handler` attribute defines the function that the compiler should invoke when a [panic] occurs. The standard library provides its own panic handler function, but in a `no_std` environment we need to define it ourselves:

[panic]: https://doc.rust-lang.org/stable/book/ch09-01-unrecoverable-errors-with-panic.html

```rust
// in main.rs

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

The [`PanicInfo` parameter][PanicInfo] contains the file and line where the panic happened and the optional panic message. The function should never return, so it is marked as a [diverging function] by returning the [“never” type] `!`. There is not much we can do in this function for now, so we just loop indefinitely.

[PanicInfo]: https://doc.rust-lang.org/nightly/core/panic/struct.PanicInfo.html
[diverging function]: https://doc.rust-lang.org/1.30.0/book/first-edition/functions.html#diverging-functions
[“never” type]: https://doc.rust-lang.org/nightly/std/primitive.never.html

## The `eh_personality` Language Item

Language items are special functions and types that are required internally by the compiler. For example, the [`Copy`] trait is a language item that tells the compiler which types have [_copy semantics_][`Copy`]. When we look at the [implementation][copy code], we see that it has the special `#[lang = "copy"]` attribute that defines it as a language item.

[`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
[copy code]: https://github.com/rust-lang/rust/blob/485397e49a02a3b7ff77c17e4a3f16c653925cb3/src/libcore/marker.rs#L296-L299

While it is possible to provide custom implementations of language items, this should only be done as a last resort. The reason is that language items are highly unstable implementation details and are not even type checked (so the compiler doesn't even check whether a function has the right argument types). Fortunately, there is a more stable way to fix the language item error above.

The [`eh_personality` language item] marks a function that is used for implementing [stack unwinding]. By default, Rust uses unwinding to run the destructors of all live stack variables in case of a [panic]. This ensures that all used memory is freed and allows the parent thread to catch the panic and continue execution. Unwinding, however, is a complicated process and requires some OS-specific libraries (e.g. [libunwind] on Linux or [structured exception handling] on Windows), so we don't want to use it for our operating system.

[`eh_personality` language item]: https://github.com/rust-lang/rust/blob/edb368491551a77d77a48446d4ee88b35490c565/src/libpanic_unwind/gcc.rs#L11-L45
[stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php
[libunwind]: https://www.nongnu.org/libunwind/
[structured exception handling]: https://docs.microsoft.com/en-us/windows/win32/debug/structured-exception-handling

### Disabling Unwinding

There are other use cases for which unwinding is undesirable, so Rust provides an option to [abort on panic] instead. This disables the generation of unwinding symbol information and thus considerably reduces the binary size. There are multiple places where we can disable unwinding. The easiest way is to add the following lines to our `Cargo.toml`:

```toml
[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
```

This sets the panic strategy to `abort` for both the `dev` profile (used for `cargo build`) and the `release` profile (used for `cargo build --release`). Now the `eh_personality` language item should no longer be required.

[abort on panic]: https://github.com/rust-lang/rust/pull/32900

We have now fixed both of the above errors. However, if we try to compile now, another error occurs:

```
> cargo build
error: requires `start` lang_item
```

Our program is missing the `start` language item, which defines the entry point.

## The `start` Attribute

One might think that the `main` function is the first function called when a program runs. However, most languages have a [runtime system], which is responsible for things such as garbage collection (e.g. in Java) or software threads (e.g. goroutines in Go). This runtime needs to be called before `main`, since it needs to initialize itself.

[runtime system]: https://en.wikipedia.org/wiki/Runtime_system

In a typical Rust binary that links the standard library, execution starts in a C runtime library called `crt0` (“C runtime zero”), which sets up the environment for a C application. This includes creating a stack and placing the arguments in the right registers. The C runtime then invokes the [entry point of the Rust runtime][rt::lang_start], which is marked by the `start` language item. Rust only has a very minimal runtime, which takes care of some small things such as setting up stack overflow guards or printing a backtrace on panic. The runtime then finally calls the `main` function.

[rt::lang_start]: https://github.com/rust-lang/rust/blob/bb4d1491466d8239a7a5fd68bd605e3276e97afb/src/libstd/rt.rs#L32-L73

Our freestanding executable does not have access to the Rust runtime or `crt0`, so we need to define our own entry point. Implementing the `start` language item wouldn't help, since it would still require `crt0`. Instead, we need to overwrite the `crt0` entry point directly.

### Overwriting the Entry Point

To tell the Rust compiler that we don't want to use the normal entry point chain, we add the `#![no_main]` attribute.

```rust
#![no_std]
#![no_main]

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

You might notice that we removed the `main` function. The reason is that a `main` doesn't make sense without an underlying runtime that calls it. Instead, we are now overwriting the operating system entry point with our own `_start` function:

```rust
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    loop {}
}
```

By using the `#[unsafe(no_mangle)]` attribute, we disable [name mangling] to ensure that the Rust compiler really outputs a function with the name `_start`. Without the attribute, the compiler would generate some cryptic `_ZN3blog_os4_start7hb173fedf945531caE` symbol to give every function a unique name. The attribute is required because we need to tell the name of the entry point function to the linker in the next step.

We also have to mark the function as `extern "C"` to tell the compiler that it should use the [C calling convention] for this function (instead of the unspecified Rust calling convention). The reason for naming the function `_start` is that this is the default entry point name for most systems.

[name mangling]: https://en.wikipedia.org/wiki/Name_mangling
[C calling convention]: https://en.wikipedia.org/wiki/Calling_convention

The `!` return type means that the function is diverging, i.e. not allowed to ever return. This is required because the entry point is not called by any function, but invoked directly by the operating system or bootloader. So instead of returning, the entry point should, for example, invoke the [`exit` system call][فراخوان سیستمی `exit`] of the operating system. In our case, shutting down the machine could be a reasonable action, since there is nothing left to do if a freestanding binary returns. For now, we fulfill the requirement by looping endlessly.
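To make the meaning of the `!` return type more concrete, here is a small sketch that can be compiled as ordinary hosted Rust. The names `halt` and `parse_or_halt` are hypothetical and not part of the kernel code. The example only shows how the compiler treats a diverging function.

```rust
/// Stand-in for "nothing left to do": spins forever and never returns.
pub fn halt() -> ! {
    loop {
        core::hint::spin_loop();
    }
}

/// Parses a decimal number and halts on invalid input.
/// The `Err` arm type-checks because `!` coerces to any type (here `u32`).
pub fn parse_or_halt(input: &str) -> u32 {
    match input.parse::<u32>() {
        Ok(value) => value,
        Err(_) => halt(),
    }
}
```

Because `halt` is declared as `-> !`, the compiler knows that control never continues past the call, so the `match` needs no value for the error case. The same reasoning applies to `_start` and the panic handler: their bodies must end in something that never returns, such as `loop {}`.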
[فراخوان سیستمی `exit`]: https://en.wikipedia.org/wiki/Exit_(system_call)

When we run `cargo build` now, we get an ugly _linker_ error.

## Linker Errors

The linker is a program that combines the generated code into an executable. Since the executable format differs between Linux, Windows, and macOS, each system has its own linker that throws a different error. The fundamental cause of the errors is the same: the default configuration of the linker assumes that our program depends on the C runtime, which it does not.

To solve the errors, we need to tell the linker that it should not include the C runtime. We can do this either by passing a certain set of arguments to the linker or by building for a bare metal target.

### Building for a Bare Metal Target

By default, Rust tries to build an executable that is able to run in your current system environment. For example, if you're using Windows on `x86_64`, Rust tries to build an `.exe` Windows executable that uses `x86_64` instructions. This environment is called your “host” system.

To describe different environments, Rust uses a string called a [_target triple_]. You can see the target triple for your host system by running `rustc --version --verbose`:

[_target triple_]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple

```
rustc 1.35.0-nightly (474e7a648 2019-04-07)
binary: rustc
commit-hash: 474e7a6486758ea6fc761893b1a49cd9076fb0ab
commit-date: 2019-04-07
host: x86_64-unknown-linux-gnu
release: 1.35.0-nightly
LLVM version: 8.0
```

The above output is from an `x86_64` Linux system. We see that the `host` triple is `x86_64-unknown-linux-gnu`, which includes the CPU architecture (`x86_64`), the vendor (`unknown`), the operating system (`linux`), and the [ABI] (`gnu`).

[ABI]: https://en.wikipedia.org/wiki/Application_binary_interface

By compiling for our host triple, the Rust compiler and the linker assume that there is an underlying operating system such as Linux or Windows that uses the C runtime by default, which causes the linker errors. So, to avoid the linker errors, we can compile for a different environment with no underlying operating system.

An example of such a bare metal environment is the `thumbv7em-none-eabihf` target triple, which describes an [embedded] [ARM] system. The details are not important; all that matters is that the target triple has no underlying operating system, which is indicated by the `none` in the target triple. To be able to compile for this target, we need to add it in rustup:

[embedded]: https://en.wikipedia.org/wiki/Embedded_system
[ARM]: https://en.wikipedia.org/wiki/ARM_architecture

```
rustup target add thumbv7em-none-eabihf
```

This downloads a copy of the standard (and core) library for that system. Now we can build our freestanding executable for this target:

```
cargo build --target thumbv7em-none-eabihf
```

By passing a `--target` argument, we [cross compile] our executable for a bare metal target system. Since the target system has no operating system, the linker does not try to link the C runtime and our build succeeds without any linker errors.

[cross compile]: https://en.wikipedia.org/wiki/Cross_compiler

This is the approach that we will use for building our OS kernel. Instead of `thumbv7em-none-eabihf`, we will use a [custom target] that describes an `x86_64` bare metal environment. The details will be explained in the next post.

[custom target]: https://doc.rust-lang.org/rustc/targets/custom.html

### Linker Arguments

Instead of compiling for a bare metal system, it is also possible to resolve the linker errors by passing a certain set of arguments to the linker. This isn't the approach that we will use for our kernel, so this section is optional and only provided for completeness.
In this section, we discuss the linker errors that occur on Linux, Windows, and macOS, and explain how to solve them by passing additional arguments to the linker. Note that the executable format and the linker differ between operating systems, so a different set of arguments is required for each system.

#### Linux

On Linux, the following linker error occurs (shortened):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x12): undefined reference to `__libc_csu_fini'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x19): undefined reference to `__libc_csu_init'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x25): undefined reference to `__libc_start_main'
          collect2: error: ld returned 1 exit status
```

The problem is that the linker includes the startup routine of the C runtime by default, which is also called `_start`. It requires some symbols of the C standard library `libc` that we don't include because of the `no_std` attribute, so the linker can't resolve these references. To solve this, we can tell the linker that it should not link the C startup routine by passing the `-nostartfiles` flag.

One way to pass linker attributes via cargo is the `cargo rustc` command. The command behaves exactly like `cargo build`, but allows passing options to `rustc`, the underlying Rust compiler. `rustc` has the `-C link-arg` flag, which passes an argument to the linker. Combined, our new build command looks like this:

```
cargo rustc -- -C link-arg=-nostartfiles
```

Now our crate builds as a freestanding executable on Linux!

We didn't need to specify the name of our entry point function explicitly, since the linker looks for a function with the name `_start` by default.
#### Windows

On Windows, a different linker error occurs (shortened):

```
error: linking with `link.exe` failed: exit code: 1561
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1561: entry point must be defined
```

The “entry point must be defined” error means that the linker can't find the entry point. On Windows, the default entry point name [depends on the used subsystem][windows-subsystems]. For the `CONSOLE` subsystem, the linker looks for a function named `mainCRTStartup` and for the `WINDOWS` subsystem, it looks for a function named `WinMainCRTStartup`. To override the default and tell the linker to look for our `_start` function instead, we can pass an `/ENTRY` argument to the linker:

[windows-subsystems]: https://docs.microsoft.com/en-us/cpp/build/reference/entry-entry-point-symbol

```
cargo rustc -- -C link-arg=/ENTRY:_start
```

From the different argument format, we clearly see that the Windows linker is a completely different program than the Linux linker.

Now a different linker error occurs:

```
error: linking with `link.exe` failed: exit code: 1221
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1221: a subsystem can't be inferred and must be
          defined
```

This error occurs because Windows executables can use different [subsystems][windows-subsystems]. For normal programs, they are inferred depending on the entry point name: if the entry point is named `main`, the `CONSOLE` subsystem is used, and if the entry point is named `WinMain`, the `WINDOWS` subsystem is used. Since our `_start` function has a different name, we need to specify the subsystem explicitly:

```
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
```

We use the `CONSOLE` subsystem here, but the `WINDOWS` subsystem would work too. Instead of passing `-C link-arg` multiple times, we use `-C link-args`, which takes a space-separated list of arguments.

With this command, our executable should build successfully on Windows.

#### macOS

On macOS, the following linker error occurs (shortened):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: entry point (_main) undefined. for architecture x86_64
          clang: error: linker command failed with exit code 1 […]
```

This error message tells us that the linker can't find an entry point function with the default name `main` (for some reason, all functions are prefixed with a `_` on macOS). To set the entry point to our `_start` function, we pass the `-e` linker argument:

```
cargo rustc -- -C link-args="-e __start"
```

The `-e` flag specifies the name of the entry point function. Since all functions have an additional `_` prefix on macOS, we need to set the entry point to `__start` instead of `_start`.

Now the following linker error occurs:

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: dynamic main executables must link with libSystem.dylib
          for architecture x86_64
          clang: error: linker command failed with exit code 1 […]
```

macOS [does not officially support statically linked binaries] and requires programs to link the `libSystem` library by default. To override this and link a static binary, we pass the `-static` flag to the linker:

[does not officially support statically linked binaries]: https://developer.apple.com/library/archive/qa/qa1118/_index.html

```
cargo rustc -- -C link-args="-e __start -static"
```

This still does not suffice, as a third linker error occurs:

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: library not found for -lcrt0.o
          clang: error: linker command failed with exit code 1 […]
```

This error occurs because programs on macOS link to `crt0` (“C runtime zero”) by default. This is the same kind of error we had on Linux and can also be solved by adding the `-nostartfiles` linker argument:

```
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

Now our program should build successfully on macOS.

#### Unifying the Build Commands

Right now we have different build commands depending on the host platform, which is not ideal. To avoid this, we can create a file named `.cargo/config.toml` that contains the platform-specific arguments:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "linux")']
rustflags = ["-C", "link-arg=-nostartfiles"]

[target.'cfg(target_os = "windows")']
rustflags = ["-C", "link-args=/ENTRY:_start /SUBSYSTEM:console"]

[target.'cfg(target_os = "macos")']
rustflags = ["-C", "link-args=-e __start -static -nostartfiles"]
```

The `rustflags` key contains arguments that are automatically added to every invocation of `rustc`. For more information on the `.cargo/config.toml` file, check out the [official documentation](https://doc.rust-lang.org/cargo/reference/config.html).

Now our program should be buildable on all three platforms with a simple `cargo build`.

#### Should You Do This?

While it's possible to build a freestanding executable for Linux, Windows, and macOS, it's probably not a good idea. The reason is that our executable still expects various things, for example that a stack is initialized when the `_start` function is called. Without the C runtime, some of these requirements might not be fulfilled, which might cause our program to fail, e.g. through a segmentation fault.

If you want to create a minimal binary that runs on top of an existing operating system, including `libc` and setting the `#[start]` attribute as described [here](https://doc.rust-lang.org/1.16.0/book/no-stdlib.html) is probably a better idea.
## Summary {#summary}

A minimal freestanding Rust binary looks like this:

`src/main.rs`:

```rust
#![no_std] // don't link the Rust standard library
#![no_main] // disable all Rust-level entry points

use core::panic::PanicInfo;

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    // this function is the entry point, since the linker looks for a function
    // named `_start` by default
    loop {}
}

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

`Cargo.toml`:

```toml
[package]
name = "crate_name"
version = "0.1.0"
authors = ["Author Name "]

# the profile used for `cargo build`
[profile.dev]
panic = "abort" # disable stack unwinding on panic

# the profile used for `cargo build --release`
[profile.release]
panic = "abort" # disable stack unwinding on panic
```

To build this binary, we need to compile for a bare metal target such as `thumbv7em-none-eabihf`:

```
cargo build --target thumbv7em-none-eabihf
```

Alternatively, we can compile it for the host system by passing additional linker arguments:

```bash
# Linux
cargo rustc -- -C link-arg=-nostartfiles
# Windows
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
# macOS
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

Note that this is just a minimal example of a freestanding Rust binary. This binary expects various things, for example that a stack is initialized when the `_start` function is called. **So for any real use of such a binary, more steps are required**.

## What's Next?

The [next post][پست بعدی] explains the steps needed for turning our freestanding binary into a minimal operating system kernel. This includes creating a custom target, combining our executable with a bootloader, and learning how to print something to the screen.
[پست بعدی]: @/edition-2/posts/02-minimal-rust-kernel/index.fa.md

================================================
FILE: blog/content/edition-2/posts/01-freestanding-rust-binary/index.fr.md
================================================
+++
title = "Un binaire Rust autoporté"
weight = 1
path = "fr/freestanding-rust-binary"
date = 2018-02-10

[extra]
# Please update this when updating the translation
translation_based_on_commit = "3e87916b6c2ed792d1bdb8c0947906aef9013ac1"
# GitHub usernames of the people that translated this post
translators = ["AlexandreMarcq", "alaincao", "richarddalves"]
+++

The first step in creating our own operating system kernel is to create a Rust executable that does not link the standard library. This makes it possible to run Rust code on the [bare machine] without an underlying operating system.

[bare machine]: https://en.wikipedia.org/wiki/Bare_machine

This blog is developed on [GitHub]. If you have a problem or a question, please open an issue there. You can also leave a comment [at the bottom of the page]. The complete source code for this post is available in the [`post-01`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom of the page]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-01

## Introduction

To write an operating system kernel, we need code that does not depend on any operating system features. This means that we can't use threads, files, heap memory, the network, random numbers, standard output, or any other feature requiring OS abstractions or specific hardware. This makes sense, since we're trying to write our own OS and our own drivers.

This means that we can't use most of the [Rust standard library]. There are nevertheless many Rust features that we _can_ use. For example, we can use [iterators], [closures], [pattern matching], [option] and [result], [string formatting], and of course the [ownership system]. These features make it possible to write a kernel in an expressive, high-level way without worrying about [undefined behavior] or [memory safety].

[option]: https://doc.rust-lang.org/core/option/
[result]: https://doc.rust-lang.org/core/result/
[Rust standard library]: https://doc.rust-lang.org/std/
[iterators]: https://doc.rust-lang.org/book/ch13-02-iterators.html
[closures]: https://doc.rust-lang.org/book/ch13-01-closures.html
[pattern matching]: https://doc.rust-lang.org/book/ch06-00-enums.html
[string formatting]: https://doc.rust-lang.org/core/macro.write.html
[ownership system]: https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html
[undefined behavior]: https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs
[memory safety]: https://tonyarcieri.com/it-s-time-for-a-memory-safety-intervention

To create an OS kernel in Rust, we need to create an executable that can run without an underlying operating system. Such an executable is called a “freestanding” or “bare-metal” executable.

This post describes the steps needed to create a freestanding Rust executable and explains why these steps matter. If you're only interested in a minimal example, you can **[jump to the summary](#resume)**.

## Disabling the Standard Library

By default, all Rust crates link the [standard library], which relies on operating system features such as threads, files, and networking. It also links the C standard library `libc`, which interacts closely with the services provided by the OS. Since our plan is to write an operating system, we can't use OS-dependent libraries. So we have to disable the automatic inclusion of the standard library by using the [`no_std` attribute][attribut `no std`].

[standard library]: https://doc.rust-lang.org/std/
[attribut `no std`]: https://doc.rust-lang.org/1.30.0/book/first-edition/using-rust-without-the-standard-library.html

We start by creating a new cargo application project. The easiest way to do this is through the command line:

```
cargo new blog_os --bin --edition 2024
```

I named the project `blog_os`, but you can of course choose whatever name suits you. The `--bin` flag specifies that we want to create an executable (in contrast to a library) and the `--edition 2024` flag specifies that we want to use the [2024 edition] of Rust for our crate. When we run the command, cargo creates the following directory structure for us:

[2024 edition]: https://doc.rust-lang.org/nightly/edition-guide/rust-2024/index.html

```
blog_os
├── Cargo.toml
└── src
    └── main.rs
```

The `Cargo.toml` file contains the crate configuration, for example the crate name, the author, the [semantic version] number, and the dependencies. The `src/main.rs` file contains the root module of our crate and our `main` function. You can compile your crate with `cargo build` and then run the compiled `blog_os` executable in the `target/debug` subfolder.

[semantic version]: https://semver.org/

### The `no_std` Attribute

Right now, our crate implicitly links the standard library.
Let's disable this by adding the [`no_std` attribute][attribut `no std`]:

```rust
// main.rs

#![no_std]

fn main() {
    println!("Hello, world!");
}
```

When we now try to build it (with `cargo build`), the following error occurs:

```
error: cannot find macro `println!` in this scope
 --> src/main.rs:4:5
  |
4 |     println!("Hello, world!");
  |     ^^^^^^^
```

The reason is that the [`println` macro] is part of the standard library, which we can no longer use. So we can no longer print text with it. This makes sense, since `println` writes to [standard output], which is a special file descriptor provided by the operating system.

[`println` macro]: https://doc.rust-lang.org/std/macro.println.html
[standard output]: https://fr.wikipedia.org/wiki/Flux_standard#Sortie_standard

Let's remove the printing and try again with an empty main function:

```rust
// main.rs

#![no_std]

fn main() {}
```

```
> cargo build
error: `#[panic_handler]` function required, but not found
error: language item required, but not found: `eh_personality`
```

Now the compiler needs a `#[panic_handler]` function and a _language item_.

## Panic Implementation

The `panic_handler` attribute defines the function that the compiler should call when a [panic] occurs. The standard library provides its own panic handler function, but in a `no_std` environment we need to define it ourselves:

[panic]: https://doc.rust-lang.org/stable/book/ch09-01-unrecoverable-errors-with-panic.html

```rust
// in main.rs

use core::panic::PanicInfo;

/// This function is called on every panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

The [`PanicInfo` parameter][PanicInfo] contains the file and line where the panic happened and the optional panic message. The function should never return, so it is marked as a [diverging function] by returning the [“never” type] `!`. There is not much we can do in this function for now, so we just loop indefinitely.

[PanicInfo]: https://doc.rust-lang.org/nightly/core/panic/struct.PanicInfo.html
[diverging function]: https://doc.rust-lang.org/1.30.0/book/first-edition/functions.html#diverging-functions
[“never” type]: https://doc.rust-lang.org/nightly/std/primitive.never.html

## The `eh_personality` Language Item

Language items are special elements (traits, functions, types, etc.) that are required internally by the compiler. For example, the [`Copy`] trait is a language item that tells the compiler which types have [copy semantics][`Copy`]. When we look at its [implementation][copy code], we can see that it has the special `#[lang = "copy"]` attribute that defines it as a language item.

[`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
[copy code]: https://github.com/rust-lang/rust/blob/485397e49a02a3b7ff77c17e4a3f16c653925cb3/src/libcore/marker.rs#L296-L299

While it is possible to provide custom implementations of language items, this should only be done as a last resort. The reason is that language items are highly unstable implementation details and are not even type checked (so the compiler doesn't even check whether a function has the right argument types). Fortunately, there is a more robust way to fix the language item error above.

The [`eh_personality` language item] marks a function that is used for implementing [stack unwinding]. By default, Rust uses unwinding to run the destructors of every live stack variable in case of a [panic]. This ensures that all used memory is freed and allows the parent thread to catch the panic and continue execution. Unwinding, however, is a complicated process and requires OS-specific libraries ([libunwind] on Linux or [structured exception handling] on Windows), so we don't want to use it for our operating system.

[`eh_personality` language item]: https://github.com/rust-lang/rust/blob/edb368491551a77d77a48446d4ee88b35490c565/src/libpanic_unwind/gcc.rs#L11-L45
[stack unwinding]: https://docs.microsoft.com/fr-fr/cpp/cpp/exceptions-and-stack-unwinding-in-cpp?view=msvc-160
[libunwind]: https://www.nongnu.org/libunwind/
[structured exception handling]: https://docs.microsoft.com/fr-fr/windows/win32/debug/structured-exception-handling

### Disabling Unwinding

There are other use cases for which unwinding is undesirable, so Rust provides an option to [abort on panic]. This disables the generation of unwinding symbols and thus considerably reduces the size of the executable. There are multiple places where we can disable unwinding. The easiest is to add the following lines to our `Cargo.toml`:

```toml
[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
```

This sets the panic strategy to `abort` for the `dev` profile (used for `cargo build`) and the `release` profile (used for `cargo build --release`). Now the `eh_personality` language item should no longer be required.

[abort on panic]: https://github.com/rust-lang/rust/pull/32900

We have now fixed both of the above errors. However, if we try to compile, another error appears:

```
> cargo build
error: requires `start` lang_item
```

Our program is missing the `start` language item, which defines the entry point.

## The `start` Attribute

One might think that the `main` function is the first function called when a program runs.
Toutefois, la plupart des langages ont un [environnement d'exécution] qui est responsable des tâches telles que le ramassage des miettes (ex: dans Java) ou les fils d'exécution logiciel (ex: les goroutines dans Go). Cet environnement doit être appelé avant `main` puisqu'il a besoin de s'initialiser. [environnement d'exécution]: https://fr.wikipedia.org/wiki/Environnement_d%27ex%C3%A9cution Dans un exécutable Rust classique qui relie la bibliothèque standard, l'exécution commence dans une bibliothèque d'environnement d'exécution C appelé `crt0` (“C runtime zero”). Elle configure l'environnement pour une application C. Cela comprend la création d'une pile et le placement des arguments dans les bons registres. L'environnement d'exécution C appelle ensuite [le point d'entrée de l'environnement d'exécution de Rust][rt::lang_start], qui est marqué par l'objet de langage `start`. Rust possède un environnement d'exécution très minime, qui se charge de petites tâches telles que la configuration des guardes de dépassement de pile ou l'affichage de la trace d'appels lors d'un panic. L'environnement d'exécution finit par appeler la fonction `main`. [rt::lang_start]: https://github.com/rust-lang/rust/blob/bb4d1491466d8239a7a5fd68bd605e3276e97afb/src/libstd/rt.rs#L32-L73 Notre exécutable autoporté n'a pas accès à l'environnement d'exécution de Rust ni à `crt0`. Nous avons donc besoin de définir notre propre point d'entrée. Implémenter l'objet de langage `start` n'aiderait pas car nous aurions toujours besoin de `crt0`. Nous avons plutôt besoin de réécrire le point d'entrée de `crt0` directement. ### Réécrire le Point d'Entrée Pour indiquer au compilateur que nous ne voulons pas utiliser la chaîne de point d'entrée normale, nous ajoutons l'attribut `#![no_main]`. ```rust #![no_std] #![no_main] use core::panic::PanicInfo; /// Cette fonction est appelée à chaque panic. #[panic_handler] fn panic(_info: &PanicInfo) -> ! 
{
    loop {}
}
```

You might notice that we removed the `main` function. The reason is that a `main` doesn't make sense without an underlying runtime that calls it. Instead, we overwrite the operating system entry point with our own `_start` function:

```rust
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    loop {}
}
```

By using the `#[unsafe(no_mangle)]` attribute, we disable [name mangling] to ensure that the Rust compiler really outputs a function with the name `_start`. Without the attribute, the compiler would generate some cryptic symbol such as `_ZN3blog_os4_start7hb173fedf945531caE` to give every function a unique name. The attribute is required because we need to tell the name of the entry point function to the linker in the next step.

We also have to mark the function as `extern "C"` to tell the compiler that it should use the [C calling convention] for this function (instead of the unspecified Rust calling convention). The function is named `_start` because this is the default entry point name for most systems.

[name mangling]: https://fr.wikipedia.org/wiki/D%C3%A9coration_de_nom
[C calling convention]: https://en.wikipedia.org/wiki/Calling_convention

The `!` return type means that the function is diverging, i.e. it is not allowed to ever return. This is required because the entry point is not called by any function, but invoked directly by the operating system or bootloader. So instead of returning, the entry point should, for example, invoke the [`exit` system call][appel système `exit`] of the operating system. In our case, shutting down the machine could be a reasonable action, since there's nothing left to do if a freestanding binary returns. For now, we fulfill the requirement by looping endlessly.
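
As a small illustration of what the `!` return type means to the type checker, here is a minimal sketch written as an ordinary hosted program (it uses the standard library, so it is not part of our freestanding binary). The helper names `fail` and `parse_port` are hypothetical and exist only for this example.

```rust
/// A diverging function: the `!` return type promises that it never returns.
/// `fail` is a hypothetical helper used only for this illustration.
fn fail(reason: &str) -> ! {
    panic!("unrecoverable: {reason}");
}

/// Because `!` coerces to any type, the diverging call is accepted in a
/// match arm where a `u32` is expected.
fn parse_port(input: &str) -> u32 {
    match input.parse::<u32>() {
        Ok(port) => port,
        Err(_) => fail("port is not a number"),
    }
}

fn main() {
    println!("{}", parse_port("8080"));
}
```

If the body of `fail` could reach its end, the compiler would reject it. This is the same guarantee that our `_start` function and panic handler satisfy with `loop {}`.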
[appel système `exit`]: https://fr.wikipedia.org/wiki/Appel_syst%C3%A8me

When we run `cargo build` now, we get an ugly _linker_ error.

## Linker Errors

The linker is a program that combines the generated code into an executable. Since the executable format differs between Linux, Windows, and macOS, each system has its own linker that throws a different error. The fundamental cause of the errors is the same: the default configuration of the linker assumes that our program depends on the C runtime, which it does not.

To solve the errors, we need to tell the linker that it should not include the C runtime. We can do this either by passing a certain set of arguments to the linker or by building for a bare metal target.

### Building for a Bare Metal Target

By default, Rust tries to build an executable that is able to run in your current system environment. For example, if you're using Windows on `x86_64`, Rust tries to build an `.exe` Windows executable that uses `x86_64` instructions. This environment is called your "host" system.

To describe different environments, Rust uses a string called a [_target triple_]. You can see the target triple for your host system by running `rustc --version --verbose`:

[_target triple_]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple

```
rustc 1.35.0-nightly (474e7a648 2019-04-07)
binary: rustc
commit-hash: 474e7a6486758ea6fc761893b1a49cd9076fb0ab
commit-date: 2019-04-07
host: x86_64-unknown-linux-gnu
release: 1.35.0-nightly
LLVM version: 8.0
```

The above output is from a `x86_64` Linux system. We see that the `host` triple is `x86_64-unknown-linux-gnu`, which includes the CPU architecture (`x86_64`), the vendor (`unknown`), the operating system (`linux`), and the [ABI] (`gnu`).
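
To make the structure of such a triple concrete, here is a minimal sketch that splits a four-part triple into the components named above. The `TargetTriple` type and the `parse_triple` function are hypothetical names for this illustration and are not part of `rustc` or Cargo. The sketch also assumes the full `arch-vendor-os-abi` form; real triples sometimes omit a component (for example, `thumbv7em-none-eabihf` has no vendor), which this code does not handle.

```rust
/// The components of a target triple, as described above.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct TargetTriple<'a> {
    arch: &'a str,
    vendor: &'a str,
    os: &'a str,
    abi: Option<&'a str>,
}

/// Splits a triple of the form `arch-vendor-os[-abi]` at the hyphens.
/// Returns `None` if fewer than three components are present.
fn parse_triple(triple: &str) -> Option<TargetTriple<'_>> {
    let mut parts = triple.splitn(4, '-');
    let arch = parts.next()?;
    let vendor = parts.next()?;
    let os = parts.next()?;
    let abi = parts.next(); // the ABI part is optional
    Some(TargetTriple { arch, vendor, os, abi })
}

fn main() {
    println!("{:?}", parse_triple("x86_64-unknown-linux-gnu"));
}
```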
[ABI]: https://fr.wikipedia.org/wiki/Application_binary_interface

By compiling for our host triple, the Rust compiler and the linker assume that there is an underlying operating system such as Linux or Windows that uses the C runtime by default, which causes the linker errors. So, to avoid the linker errors, we can compile for a different environment with no underlying operating system.

An example of such a bare metal environment is the `thumbv7em-none-eabihf` target triple, which describes an [embedded] [ARM] system. The details are not important; all that matters is that the target triple has no underlying operating system, which is indicated by the `none` in the target triple. To be able to compile for this target, we need to add it in rustup:

[embedded]: https://fr.wikipedia.org/wiki/Syst%C3%A8me_embarqu%C3%A9
[ARM]: https://fr.wikipedia.org/wiki/Architecture_ARM

```
rustup target add thumbv7em-none-eabihf
```

This downloads a copy of the standard (and core) library for the system. Now we can build our freestanding executable for this target:

```
cargo build --target thumbv7em-none-eabihf
```

By passing a `--target` argument, we [cross compile] our executable for a bare metal target system. Since the target system has no operating system, the linker does not try to link the C runtime, and our build succeeds without any linker errors.

[cross compile]: https://en.wikipedia.org/wiki/Cross_compiler

This is the approach that we will use for building our OS kernel. Instead of `thumbv7em-none-eabihf`, we will use a [custom target] that describes an `x86_64` bare metal environment. The details will be explained in the next post.
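
The `none` component is also visible to Rust code through conditional compilation. The following is a minimal sketch, meant to be run as an ordinary program on the host, that checks the `target_os` configuration value. It assumes that bare metal targets such as `thumbv7em-none-eabihf` set `target_os = "none"`; the function name `environment` is hypothetical.

```rust
/// Reports whether this code was compiled for a target without an OS.
/// Assumption: bare metal targets report `target_os = "none"`.
fn environment() -> &'static str {
    if cfg!(target_os = "none") {
        "bare metal"
    } else {
        "hosted"
    }
}

fn main() {
    // On a host system such as Linux, Windows, or macOS this prints "hosted".
    println!("compiled for a {} target", environment());
}
```

Because `cfg!` is evaluated at compile time, the answer depends on the `--target` that was passed to the compiler, not on the machine the program later runs on.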
[custom target]: https://doc.rust-lang.org/rustc/targets/custom.html

### Linker Arguments

Instead of compiling for a bare metal system, it is also possible to resolve the linker errors by passing a certain set of arguments to the linker. This isn't the approach that we'll use for our kernel, so this section is optional and only provided for completeness. The optional content appears under _"Linker Arguments"_ below.
Linker Arguments

In this section, we discuss the linker errors that occur on Linux, Windows, and macOS, and explain how to solve them by passing additional arguments to the linker. Note that the executable format and the linker differ between operating systems, so a different set of arguments is required for each system.

#### Linux

On Linux, the following linker error occurs (shortened):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x12): undefined reference to `__libc_csu_fini'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x19): undefined reference to `__libc_csu_init'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x25): undefined reference to `__libc_start_main'
          collect2: error: ld returned 1 exit status
```

The problem is that the linker includes the startup routine of the C runtime by default, which is also called `_start`. It requires some symbols of the C standard library `libc` that we don't include due to the `no_std` attribute, so the linker can't resolve these references. To solve this, we can tell the linker that it should not link the C startup routine by passing the `-nostartfiles` flag.

One way to pass linker arguments via cargo is the `cargo rustc` command. The command behaves exactly like `cargo build`, but also allows passing options to `rustc`, the underlying Rust compiler. `rustc` has the `-C link-arg` flag, which passes an argument to the linker. Combined, our new build command looks like this:

```
cargo rustc -- -C link-arg=-nostartfiles
```

Now our crate builds as a freestanding executable on Linux!
We didn't need to specify the name of our entry point function explicitly, since the linker looks for a function with the name `_start` by default.

#### Windows

On Windows, a different linker error occurs (shortened):

```
error: linking with `link.exe` failed: exit code: 1561
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1561: entry point must be defined
```

This error means that the linker can't find the entry point. On Windows, the default entry point name [depends on the used subsystem][windows-subsystems]. For the `CONSOLE` subsystem, the linker looks for a function named `mainCRTStartup`, and for the `WINDOWS` subsystem, it looks for a function named `WinMainCRTStartup`. To override the default and tell the linker to look for our `_start` function instead, we can pass an `/ENTRY` argument to the linker:

[windows-subsystems]: https://docs.microsoft.com/fr-fr/cpp/build/reference/entry-entry-point-symbol?view=msvc-160

```
cargo rustc -- -C link-arg=/ENTRY:_start
```

From the different argument format, we clearly see that the Windows linker is a completely different program than the Linux linker.

Now a different linker error occurs:

```
error: linking with `link.exe` failed: exit code: 1221
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1221: a subsystem can't be inferred and must be defined
```

This error occurs because Windows executables can use different [subsystems][windows-subsystems]. For normal programs, they are inferred depending on the entry point name: if the entry point is named `main`, the `CONSOLE` subsystem is used. If the entry point is named `WinMain`, the `WINDOWS` subsystem is used.
Since our `_start` function has a different name, we need to specify the subsystem explicitly:

```
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
```

We use the `CONSOLE` subsystem here, but the `WINDOWS` subsystem would work too. Instead of passing `-C link-arg` multiple times, we use `-C link-args`, which takes a space-separated list of arguments.

With this command, our executable should build successfully on Windows.

#### macOS

On macOS, the following linker error occurs (shortened):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: entry point (_main) undefined. for architecture x86_64
          clang: error: linker command failed with exit code 1 […]
```

This error message tells us that the linker can't find an entry point function with the default name `main` (for some reason, all functions are prefixed with a `_` on macOS). To set the entry point to our `_start` function, we pass the `-e` linker argument:

```
cargo rustc -- -C link-args="-e __start"
```

The `-e` flag specifies the name of the entry point function. Since all functions have an additional `_` prefix on macOS, we need to set the entry point to `__start` instead of `_start`.

Now the following linker error occurs:

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: dynamic main executables must link with libSystem.dylib for architecture x86_64
          clang: error: linker command failed with exit code 1 […]
```

macOS [does not officially support statically linked binaries][ne supporte pas officiellement les bibliothèques liées de façon statique] and requires programs to link the `libSystem` library by default.
To override this and link a static binary, we pass the `-static` flag to the linker:

[ne supporte pas officiellement les bibliothèques liées de façon statique]: https://developer.apple.com/library/archive/qa/qa1118/_index.html

```
cargo rustc -- -C link-args="-e __start -static"
```

This still does not suffice, as a third linker error occurs:

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: library not found for -lcrt0.o
          clang: error: linker command failed with exit code 1 […]
```

This error occurs because programs on macOS link to `crt0` (“C runtime zero”) by default. This is similar to the error we had on Linux and can also be solved by adding the `-nostartfiles` linker argument:

```
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

Now our program should build successfully on macOS.

#### Unifying the Build Commands

Right now we have different build commands depending on the host platform, which is not ideal. To avoid this, we can create a file named `.cargo/config.toml` that contains the platform-specific arguments:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "linux")']
rustflags = ["-C", "link-arg=-nostartfiles"]

[target.'cfg(target_os = "windows")']
rustflags = ["-C", "link-args=/ENTRY:_start /SUBSYSTEM:console"]

[target.'cfg(target_os = "macos")']
rustflags = ["-C", "link-args=-e __start -static -nostartfiles"]
```

The `rustflags` key contains arguments that are automatically added to every invocation of `rustc`. For more information on the `.cargo/config.toml` file, check out the [official documentation](https://doc.rust-lang.org/cargo/reference/config.html).

Now our program should be buildable on all three platforms with a simple `cargo build`.

#### Should You Do This?
While it's possible to build a freestanding executable for Linux, Windows, and macOS, it's probably not a good idea. The reason is that our executable still expects various things, for example that a stack is initialized when the `_start` function is called. Without the C runtime, some of these requirements might not be fulfilled, which might cause our program to fail, e.g. through a segmentation fault.

If you want to create a minimal binary that runs on top of an existing operating system, including `libc` and setting the `#[start]` attribute as described [here](https://doc.rust-lang.org/1.16.0/book/no-stdlib.html) is probably a better idea.
## Summary

A minimal freestanding Rust binary looks like this:

`src/main.rs`:

```rust
#![no_std] // don't link the Rust standard library
#![no_main] // disable all Rust-level entry points

use core::panic::PanicInfo;

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    // this function is the entry point, since the linker looks for a function
    // named `_start` by default
    loop {}
}

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

`Cargo.toml`:

```toml
[package]
name = "crate_name"
version = "0.1.0"
authors = ["Author Name"]

# the profile used for `cargo build`
[profile.dev]
panic = "abort" # disable stack unwinding on panic

# the profile used for `cargo build --release`
[profile.release]
panic = "abort" # disable stack unwinding on panic
```

To build this binary, we need to compile for a bare metal target such as `thumbv7em-none-eabihf`:

```
cargo build --target thumbv7em-none-eabihf
```

Alternatively, we can compile it for the host system by passing additional linker arguments:

```bash
# Linux
cargo rustc -- -C link-arg=-nostartfiles
# Windows
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
# macOS
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

Note that this is just a minimal example of a freestanding Rust binary. This binary expects various things, for example, that a stack is initialized when the `_start` function is called. **So for any real use of such a binary, more steps are required.**

## What's next?

The [next post] explains the steps needed to turn our minimal freestanding binary into an operating system kernel.
This includes creating a custom target, combining our executable with a bootloader, and learning how to print something to the screen.

[next post]: @/edition-2/posts/02-minimal-rust-kernel/index.md

================================================
FILE: blog/content/edition-2/posts/01-freestanding-rust-binary/index.ja.md
================================================
+++
title = "A Freestanding Rust Binary"
weight = 1
path = "ja/freestanding-rust-binary"
date = 2018-02-10

[extra]
# Please update this when updating the translation
translation_based_on_commit = "e6c148d6f47bcf8a34916393deaeb7e8da2d5e2a"
# GitHub usernames of the people that translated this post
translators = ["JohnTitor","ic3w1ne"]
+++

The first step in creating our own operating system (OS) kernel is to create a Rust executable that does not link the standard library. This makes it possible to run Rust code on the [bare metal] without an underlying operating system.

[bare metal]: https://en.wikipedia.org/wiki/Bare_machine

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there (translator's note: the link points to the original English version). You can also leave comments [here][comments]. The complete source code for this post can be found in the [`post-01` branch][post branch].

[GitHub]: https://github.com/phil-opp/blog_os
[comments]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-01

## Introduction

To write an operating system kernel, we need code that does not depend on any operating system features. This means that we can't use threads, heap memory, the network, random numbers, standard output, or any other features requiring OS abstractions or specific hardware. This makes sense, since we're trying to write our own OS and our own drivers.

This means that we can't use most of the [Rust standard library], but there are still many Rust features that we _can_ use. For example, we can use [iterators], [closures], [pattern matching], the [`Option`][option] and [`Result`][result] types, [string formatting], and of course the [ownership system]. These features make it possible to write a kernel in a very expressive, high-level way without worrying about [undefined behavior] or [memory safety].

[option]: https://doc.rust-lang.org/core/option/
[result]: https://doc.rust-lang.org/core/result/
[Rust standard library]: https://doc.rust-lang.org/std/
[iterators]:
https://doc.rust-lang.org/book/ch13-02-iterators.html
[closures]: https://doc.rust-lang.org/book/ch13-01-closures.html
[pattern matching]: https://doc.rust-lang.org/book/ch06-00-enums.html
[string formatting]: https://doc.rust-lang.org/core/macro.write.html
[ownership system]: https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html
[undefined behavior]: https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs
[memory safety]: https://tonyarcieri.com/it-s-time-for-a-memory-safety-intervention

To write an OS kernel in Rust, we need to create an execution environment that can run without an underlying operating system. Such an environment is often called "freestanding" or "bare metal".

This post describes the steps necessary to create a freestanding Rust binary and explains why the steps are needed. If you're only interested in a minimal explanation, you can **[jump to the summary](#summary)**.

## Disabling the Standard Library

By default, all Rust crates link the [standard library], which depends on operating system features such as threads, files, and networking. It also depends on the C standard library `libc`, which closely interacts with OS services. Since our plan is to write an operating system, we can't use any OS-dependent libraries. So we have to disable the automatic inclusion of the standard library through the [`no_std` attribute].

[standard library]: https://doc.rust-lang.org/std/
[`no_std` attribute]: https://doc.rust-lang.org/1.30.0/book/first-edition/using-rust-without-the-standard-library.html

We start by creating a new Cargo project. The easiest way to do this is through the command line:

```bash
cargo new blog_os --bin --edition 2024
```

We named the project `blog_os`, but of course you can choose your own name. The `--bin` flag specifies that we want to create an executable binary, and the `--edition 2024` flag explicitly specifies that we want to use the [2024 edition]. When we run the command, Cargo creates the following directory structure for us:

[2024 edition]: https://doc.rust-lang.org/nightly/edition-guide/rust-2024/index.html

```bash
blog_os
├── Cargo.toml
└── src
    └── main.rs
```

The `Cargo.toml` contains the crate configuration, for example the crate name, the author, the [semantic version] number, and dependencies. The `src/main.rs` file contains the root module of our crate and our `main` function. You can compile the crate through `cargo build` and then run the compiled `blog_os` binary in the `target/debug` directory.

[semantic version]: https://semver.org/

### `no_std` Attribute

Right now our crate implicitly links the standard library. Let's try to disable this by adding the [`no_std` attribute]:

```rust
// main.rs

#![no_std]

fn main() {
    println!("Hello, world!");
}
```

When we try to build it now (by running `cargo build`), the following error occurs:

```bash
error: cannot find macro `println!` in this scope
 --> src/main.rs:4:5
  |
4 |     println!("Hello, world!");
  |     ^^^^^^^
```

The reason for this error is that the [`println` macro] is part of the standard library, which we disabled with `no_std`, so we can no longer print things. This makes sense, since `println` writes to [standard output], which is a special file descriptor provided by the operating system.

[`println` macro]: https://doc.rust-lang.org/std/macro.println.html
[standard output]: https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29

So let's remove the `println` call, leave the `main` function empty, and build again:

```rust
// main.rs

#![no_std]

fn main() {}
```

```bash
> cargo build
error: `#[panic_handler]` function required, but not found
error: language item required, but not found: `eh_personality`
```

Now the compiler reports a missing `#[panic_handler]` function and a missing _language item_.

## Panic Implementation

The `panic_handler` attribute defines the function that the compiler should invoke when a [panic] occurs. The standard library provides its own panic handler function, but in a `no_std` environment we need to define it ourselves:

[panic]: https://doc.rust-lang.org/stable/book/ch09-01-unrecoverable-errors-with-panic.html

```rust
// in main.rs

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> !
{
    loop {}
}
```

The [`PanicInfo` parameter] contains the file and line where the panic happened and the optional panic message. The function should never return, so it is marked as a [diverging function] by returning the [“never” type] `!`. There is not much we can do in this function for now, so we just loop indefinitely.

[`PanicInfo` parameter]: https://doc.rust-lang.org/nightly/core/panic/struct.PanicInfo.html
[diverging function]: https://doc.rust-lang.org/1.30.0/book/first-edition/functions.html#diverging-functions
[“never” type]: https://doc.rust-lang.org/nightly/std/primitive.never.html

## `eh_personality` Language Item

Language items are special functions and types that are required internally by the compiler. For example, the [`Copy`] trait is a language item that tells the compiler which types have [_copy semantics_][`Copy`]. When we look at the [implementation][copy code], we see it has the special `#[lang = "copy"]` attribute that defines it as a language item.

[`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
[copy code]: https://github.com/rust-lang/rust/blob/485397e49a02a3b7ff77c17e4a3f16c653925cb3/src/libcore/marker.rs#L296-L299

While providing custom implementations of language items is possible, it should only be done as a last resort. The reason is that language items are highly unstable implementation details and not even type checked (so the compiler doesn't even check whether a function has the right argument types). Fortunately, there is a more stable way to fix the above language item error.

The [`eh_personality` language item] marks a function that is used for implementing [stack unwinding]. By default, Rust uses unwinding to run the destructors of all live stack variables in case of a panic. This ensures that all used memory is freed and allows the parent thread to catch the panic and continue execution. Unwinding, however, is a complicated process and requires some OS-specific libraries (e.g. [libunwind] on Linux or [structured exception handling] on Windows), so we don't want to use it for our operating system.

[`eh_personality` language item]: https://github.com/rust-lang/rust/blob/edb368491551a77d77a48446d4ee88b35490c565/src/libpanic_unwind/gcc.rs#L11-L45
[stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php
[libunwind]: https://www.nongnu.org/libunwind/
[structured exception handling]: https://docs.microsoft.com/en-us/windows/win32/debug/structured-exception-handling

### Disabling Unwinding

There are other use cases as well for which unwinding is undesirable, so Rust provides an option to [abort on
panic] instead. This disables the generation of unwinding symbol information and thus considerably reduces binary size. There are multiple places where we can disable unwinding. The easiest way is to add the following lines to our `Cargo.toml`:

```toml
[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
```

This sets the panic strategy to `abort` for both the `dev` profile (used for `cargo build`) and the `release` profile (used for `cargo build --release`). Now the `eh_personality` language item is no longer required.

[abort on panic]: https://github.com/rust-lang/rust/pull/32900

Now we have fixed both of the above errors. However, if we try to compile it now, another error occurs:

```bash
> cargo build
error: requires `start` lang_item
```

Our program is missing the `start` language item, which defines the entry point.

## `start` attribute

One might think that the `main` function is the first function called when you run a program. However, most languages have a [runtime system], which is responsible for things such as garbage collection (e.g. in Java) or software threads (e.g. goroutines in Go). This runtime needs to be called before `main`, since it needs to initialize itself.

[runtime system]: https://en.wikipedia.org/wiki/Runtime_system

In a typical Rust binary that links the standard library, execution starts in a C runtime library called `crt0` ("C runtime zero"), which sets up the environment for a C application. This includes creating a stack and placing the arguments in the right registers. The C runtime then invokes the [entry point of the Rust runtime][rt::lang_start], which is marked by the `start` language item. Rust only has a very minimal runtime, which takes care of some small things such as setting up stack overflow guards or printing a backtrace on panic. The runtime then finally calls the `main` function.

[rt::lang_start]: https://github.com/rust-lang/rust/blob/bb4d1491466d8239a7a5fd68bd605e3276e97afb/src/libstd/rt.rs#L32-L73

Our freestanding executable does not have access to the Rust runtime or to `crt0`, so we need to define our own entry point. Implementing the `start` language item wouldn't help, since it would still require `crt0`. Instead, we need to overwrite the `crt0` entry point directly.

### Overwriting the Entry Point

To tell the Rust compiler that we don't want to use the normal entry point chain, we add the `#![no_main]` attribute.

```rust
#![no_std]
#![no_main]

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

You might notice that we removed the `main` function. The reason is that a `main` doesn't make sense without an underlying runtime that calls it. Instead, we overwrite the operating system entry point with our own `_start` function:

```rust
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
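{
    loop {}
}
```

By using the `#[unsafe(no_mangle)]` attribute, we disable [name mangling] to ensure that the Rust compiler really outputs a function with the name `_start`. Without the attribute, the compiler would generate some cryptic symbol such as `_ZN3blog_os4_start7hb173fedf945531caE` to give every function a unique name. The attribute is required because we need to tell the name of the entry point function to the linker in the next step.

We also have to mark the function as `extern "C"` to tell the compiler that it should use the [C calling convention] for this function (instead of the unspecified Rust calling convention). The function is named `_start` because this is the default entry point name for most systems.

[name mangling]: https://en.wikipedia.org/wiki/Name_mangling
[C calling convention]: https://en.wikipedia.org/wiki/Calling_convention

The `!` return type means that the function is diverging, i.e. it is not allowed to ever return. This is required because the entry point is not called by any function, but invoked directly by the operating system or bootloader. So instead of returning, the entry point should, for example, invoke the [`exit` system call] of the operating system. In our case, shutting down the machine could be a reasonable action, since there's nothing left to do if a freestanding binary returns. For now, we fulfill the requirement by looping endlessly.

[`exit` system call]: https://en.wikipedia.org/wiki/Exit_(system_call)

When we run `cargo build` now, we get an ugly linker error.

## Linker Errors

The linker is a program that combines the generated code into an executable. Since the executable format differs between Linux, Windows, and macOS, each system has its own linker that throws a different error. The fundamental cause of the errors is the same: the default configuration of the linker assumes that our program depends on the C runtime, which it does not.

To solve the errors, we need to tell the linker that it should not include the C runtime. We can do this either by passing a certain set of arguments to the linker or by building for a bare metal target.

### Building for a Bare Metal Target

By default, Rust tries to build an executable that is able to run in your current system environment. For example, if you're using Windows on `x86_64`, Rust tries to build an `.exe` Windows executable that uses `x86_64` instructions. This environment is called your "host" system.

To describe different environments, Rust uses a string called a [_target triple_]. You can see the target triple for your host system by running `rustc --version --verbose`:

[_target triple_]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple

```bash
rustc 1.35.0-nightly (474e7a648 2019-04-07)
binary: rustc
commit-hash: 474e7a6486758ea6fc761893b1a49cd9076fb0ab
commit-date: 2019-04-07
host: x86_64-unknown-linux-gnu
release: 1.35.0-nightly
LLVM version: 8.0
```

The above output is from a `x86_64` Linux system. The `host` triple is `x86_64-unknown-linux-gnu`, which includes the CPU architecture (`x86_64`), the vendor (`unknown`), the operating system (`linux`), and the [ABI]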
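(`gnu`).

[ABI]: https://en.wikipedia.org/wiki/Application_binary_interface

By compiling for our host triple, the Rust compiler and the linker assume that there is an underlying operating system such as Linux or Windows that uses the C runtime by default, which causes the linker errors. So, to avoid the linker errors, we can compile for a different environment with no underlying operating system.

An example of such a bare metal environment is the `thumbv7em-none-eabihf` target triple, which describes an [embedded] system. The details are not important; all that matters is that the target triple has no underlying operating system, which is indicated by the `none` in the target triple. To be able to compile for this target, we need to add it in rustup:

[embedded]: https://en.wikipedia.org/wiki/Embedded_system

```bash
rustup target add thumbv7em-none-eabihf
```

This downloads a copy of the standard (and core) library for this target triple. Now we can build our freestanding executable for this target:

```bash
cargo build --target thumbv7em-none-eabihf
```

By passing a `--target` argument, we [cross compile] our executable for a bare metal target system. Since the target system has no operating system, the linker does not try to link the C runtime, and our build succeeds without any linker errors.

[cross compile]: https://en.wikipedia.org/wiki/Cross_compiler

This is the approach that we will use for building our OS kernel. Instead of `thumbv7em-none-eabihf`, we will use a [custom target] that describes an `x86_64` bare metal environment. The details will be explained in the next post.

[custom target]: https://doc.rust-lang.org/rustc/targets/custom.html

### Linker Arguments

Instead of compiling for a bare metal target, it is also possible to resolve the linker errors by passing a certain set of arguments to the linker. This isn't the approach that we'll use for our kernel, so this section is optional and only provided for completeness. The optional content appears under "Linker Arguments" below.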
Linker Arguments

In this section, we discuss the linker errors that occur on Linux, Windows, and macOS, and explain how to solve them by passing additional arguments to the linker. Since the executable format and the linker differ between operating systems, a different set of arguments is required for each system.

#### Linux

On Linux, the following linker error occurs (shortened):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x12): undefined reference to `__libc_csu_fini'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x19): undefined reference to `__libc_csu_init'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x25): undefined reference to `__libc_start_main'
          collect2: error: ld returned 1 exit status
```

The problem is that the linker includes the startup routine of the C runtime by default, which is also called `_start`. It requires some symbols of the C standard library `libc` that we don't include due to the `no_std` attribute, so the linker can't resolve these references. To solve this, we can tell the linker that it should not link the C startup routine by passing the `-nostartfiles` flag.

One way to pass linker arguments via Cargo is the `cargo rustc` command. The command behaves exactly like `cargo build`, but also allows passing options to `rustc`, the underlying Rust compiler. `rustc` has the `-C link-arg` flag, which passes an argument to the linker. Our new build command looks like this:

```bash
cargo rustc -- -C link-arg=-nostartfiles
```

Now our crate builds as a freestanding executable on Linux!
We didn't need to specify the name of our entry point function explicitly, since the linker looks for a function with the name `_start` by default.

#### Windows

On Windows, a different linker error occurs (shortened):

```
error: linking with `link.exe` failed: exit code: 1561
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1561: entry point must be defined
```

The "entry point must be defined" error means that the linker can't find the entry point. On Windows, the default entry point name [depends on the used subsystem][windows-subsystems]. For the `CONSOLE` subsystem, the linker looks for a function named `mainCRTStartup`, and for the `WINDOWS` subsystem, it looks for a function named `WinMainCRTStartup`. To override the default and tell the linker to look for our `_start` function instead, we can pass an `/ENTRY` argument to the linker:

[windows-subsystems]: https://docs.microsoft.com/en-us/cpp/build/reference/entry-entry-point-symbol

```bash
cargo rustc -- -C link-arg=/ENTRY:_start
```

From the different argument format, we clearly see that the Windows linker is a completely different program than the Linux linker.

Now a different linker error occurs:

```
error: linking with `link.exe` failed: exit code: 1221
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1221: a subsystem can't be inferred and must be defined
```

This error occurs because Windows executables can use different [subsystems][windows-subsystems]. For normal programs, they are inferred depending on the entry point name: if the entry point is named `main`, the `CONSOLE` subsystem is used, and if the entry point is named `WinMain`, the `WINDOWS` subsystem is used. Since our `_start` function has a different name, we need to specify the subsystem explicitly:

```bash
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
```

We use the `CONSOLE` subsystem here, but the `WINDOWS` subsystem would work too. Instead of passing `-C link-arg` multiple times, we use `-C link-args`, which takes a space-separated list of arguments.

With this command, our executable should build successfully on Windows.

#### macOS

On macOS, the following linker error occurs (shortened):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: entry point (_main) undefined. for architecture x86_64
          clang: error: linker command failed with exit code 1 […]
```

This error message tells us that the linker can't find an entry point function with the default name `main` (for some reason, all functions are prefixed with a `_` on macOS). To set the entry point to our `_start` function, we pass the `-e` linker argument:

```bash
cargo rustc -- -C link-args="-e __start"
```

The `-e` flag specifies the name of the entry point function. Since all functions have an additional `_` prefix on macOS, we need to set the entry point to `__start` instead of `_start`.

Now the following linker error occurs:

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: dynamic main executables must link with libSystem.dylib for architecture x86_64
          clang: error: linker command failed with exit code 1 […]
```

macOS [does not officially support statically linked binaries] and requires programs to link the `libSystem` library by default. To override this and link a static binary, we pass the `-static` flag to the linker:

[does not officially support statically linked binaries]: https://developer.apple.com/library/archive/qa/qa1118/_index.html

```bash
cargo rustc -- -C link-args="-e __start -static"
```

This still does not suffice, as a third linker error occurs:

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: library not found for -lcrt0.o
          clang: error: linker command failed with exit code 1 […]
```

This error occurs because programs on macOS link to `crt0` ("C runtime zero") by default. This is similar to the error we had on Linux and can also be solved by adding the `-nostartfiles` linker argument:

```bash
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

Now our program should build successfully on macOS.

#### Unifying the Build Commands
現時点では、ホストプラットフォームによって異なるビルドコマンドを使っていますが、これは理想的ではありません。これを回避するために、プラットフォーム固有の引数を含む `.cargo/config.toml` というファイルを作成します: ```toml # in .cargo/config.toml [target.'cfg(target_os = "linux")'] rustflags = ["-C", "link-arg=-nostartfiles"] [target.'cfg(target_os = "windows")'] rustflags = ["-C", "link-args=/ENTRY:_start /SUBSYSTEM:console"] [target.'cfg(target_os = "macos")'] rustflags = ["-C", "link-args=-e __start -static -nostartfiles"] ``` `rustflags` には `rustc` を呼び出すたびに自動的に追加される引数が含まれています。`.cargo/config.toml` についての詳細は[公式のドキュメント][official documentation]を確認してください。 [official documentation]: https://doc.rust-lang.org/cargo/reference/config.html これで私達のプログラムは3つすべてのプラットフォーム上で、シンプルに `cargo build` のみでビルドすることができるようになります。 #### 私達はこれをすべきですか? これらの手順で Linux、Windows および macOS 用の独立した実行可能ファイルをビルドすることはできますが、おそらく良い方法ではありません。その理由は、例えば `_start` 関数が呼ばれたときにスタックが初期化されるなど、まだ色々なことを前提としているからです。C ランタイムがなければ、これらの要件のうちいくつかが満たされない可能性があり、セグメンテーション違反(segfault)などによってプログラムが失敗する可能性があります。 もし既存の OS 上で動作する最小限のバイナリを作成したいなら、`libc` を使って `#[start]` attribute を[ここ][no-stdlib]で説明するとおりに設定するのが良いでしょう。 [no-stdlib]: https://doc.rust-lang.org/1.16.0/book/no-stdlib.html
    ## 概要 {#summary} 最小限の独立した Rust バイナリは次のようになります: `src/main.rs`: ```rust #![no_std] // Rust の標準ライブラリにリンクしない #![no_main] // 全ての Rust レベルのエントリポイントを無効にする use core::panic::PanicInfo; #[unsafe(no_mangle)] // この関数の名前修飾をしない pub extern "C" fn _start() -> ! { // リンカはデフォルトで `_start` という名前の関数を探すので、 // この関数がエントリポイントとなる loop {} } /// この関数はパニック時に呼ばれる #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } ``` `Cargo.toml`: ```toml [package] name = "crate_name" version = "0.1.0" authors = ["Author Name "] # the profile used for `cargo build` [profile.dev] panic = "abort" # disable stack unwinding on panic # the profile used for `cargo build --release` [profile.release] panic = "abort" # disable stack unwinding on panic ``` このバイナリをビルドするために、`thumbv7em-none-eabihf` のようなベアメタルターゲット用にコンパイルする必要があります: ```bash cargo build --target thumbv7em-none-eabihf ``` あるいは、追加のリンカ引数を渡してホストシステム用にコンパイルすることもできます: ```bash # Linux cargo rustc -- -C link-arg=-nostartfiles # Windows cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console" # macOS cargo rustc -- -C link-args="-e __start -static -nostartfiles" ``` これは独立した Rust バイナリの最小の例にすぎないことに注意してください。このバイナリは `_start` 関数が呼び出されたときにスタックが初期化されるなど、さまざまなことを前提としています。**そのため、このようなバイナリを実際に使用するには、より多くの手順が必要となります**。 ## `rust-analyzer` を満足させる [`rust-analyzer`](https://rust-analyzer.github.io/) プロジェクトは、エディタでRustコードのコード補完や「定義へ移動」のサポート(およびその他多くの機能)を提供する優れたツールです。 `#![no_std]` プロジェクトでも非常によく動作するため、カーネル開発にも使用することをお勧めします! `rust-analyzer` の [`checkOnSave`](https://rust-analyzer.github.io/book/configuration.html#checkOnSave) 機能(デフォルトで有効)を使用している場合、カーネルの panic 関数に対してエラーが報告されるかもしれません。 ``` found duplicate lang item `panic_impl` ``` このエラーの理由は、`rust-analyzer` がデフォルトで `cargo check --all-targets` を呼び出すためです。これにより、[テスト](https://doc.rust-lang.org/book/ch11-01-writing-tests.html) モードと[ベンチマーク](https://doc.rust-lang.org/rustc/tests/index.html#benchmarks) モードでもバイナリのビルドが試行されます。
    ### 「target」の2つの意味 `--all-targets` フラグは `--target` 引数とは完全に無関係です。 `cargo` における「target」という用語には2つの異なる意味があります。 - `--target` フラグは、`rustc` コンパイラに渡されるべき **[コンパイルターゲット][compile target]** を指定します。これは、私たちのコードを実行するマシンの[ターゲットトリプル][target triple]に設定する必要があります。 - `--all-targets` フラグは、Cargoの **[パッケージターゲット][package target]** を参照します。Cargoパッケージはライブラリとバイナリを同時に持つことができるため、 crate をどのようにビルドしたいかを指定できます。さらに、Cargoには[サンプル](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#examples)、[テスト](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#tests)、[ベンチマーク](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#benchmarks)用のパッケージターゲットもあります。これらのパッケージターゲットは共存できるため、同じ crate をライブラリモードやテストモードなどでビルド/チェックすることができます。 [compile target]: https://doc.rust-lang.org/rustc/targets/index.html [target triple]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple [package target]: https://doc.rust-lang.org/cargo/reference/cargo-targets.html
    デフォルトでは、`cargo check` は _ライブラリ_ と _バイナリ_ のパッケージターゲットのみをビルドします。 しかし、`rust-analyzer` は [`checkOnSave`](https://rust-analyzer.github.io/book/configuration.html#checkOnSave) が有効な場合、デフォルトですべてのパッケージターゲットをチェックすることを選択します。 これが、`cargo check` では見られない上記の `lang item` エラーを `rust-analyzer` が報告する理由です。 `cargo check --all-targets` を実行すると、同じエラーが表示されます。 ``` error[E0152]: found duplicate lang item `panic_impl` --> src/main.rs:13:1 | 13 | / fn panic(_info: &PanicInfo) -> ! { 14 | | loop {} 15 | | } | |_^ | = note: the lang item is first defined in crate `std` (which `test` depends on) = note: first definition in `std` loaded from /home/[...]/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-8df6be531efb3fd0.rlib = note: second definition in the local crate (`blog_os`) ``` 最初の `note` は、 panic 言語アイテムがすでに `std` crate で定義されていることを示しています。`std` は `test` crate の依存関係です。 `test` crate は、[テストモード](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#tests)で crate をビルドするときに自動的に含まれます。 これは、ベアメタル上で標準ライブラリをサポートする方法がない、`#![no_std]` カーネルには適していません。 したがって、このエラーはプロジェクトに関係ないため、安全に無視できます。 このエラーを回避する適切な方法は、`Cargo.toml` でバイナリが `test` モードと `bench` モードでのビルドをサポートしていないことを指定することです。 これは、`Cargo.toml` に `[[bin]]` セクションを追加して、バイナリの[ビルドを設定](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#configuring-a-target)することで行えます。 ```toml # Cargo.toml 内 [[bin]] name = "blog_os" test = false bench = false ``` `bin` の周りの二重括弧は間違いではありません。これはTOML形式で、複数回出現できるキーを定義する方法です。 crate は複数のバイナリを持つことができるため、`[[bin]]` セクションも `Cargo.toml` 内に複数回出現できます。 これが `name` フィールドが必須である理由でもあり、バイナリの名前と一致させる必要があります(これにより `cargo` はどの設定をどのバイナリに適用すべきかを認識します)。 [`test`](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#the-test-field) フィールドと [`bench`](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#the-bench-field) フィールドを `false` に設定することで、テストモードやベンチマークモードでバイナリをビルドしないよう `cargo` に指示します。 これで `cargo check --all-targets` はエラーをスローしなくなり、`rust-analyzer` の `checkOnSave` も満足するはずです。 ## 
次は? [次の記事][next post]では、この独立したバイナリを最小限の OS カーネルにするために必要なステップを説明しています。カスタムターゲットの作成、実行可能ファイルとブートローダの組み合わせ、画面に何か文字を表示する方法について説明しています。 [next post]: @/edition-2/posts/02-minimal-rust-kernel/index.ja.md ================================================ FILE: blog/content/edition-2/posts/01-freestanding-rust-binary/index.ko.md ================================================ +++ title = "Rust로 'Freestanding 실행파일' 만들기" weight = 1 path = "ko/freestanding-rust-binary" date = 2018-02-10 [extra] # Please update this when updating the translation translation_based_on_commit = "c1af4e31b14e562826029999b9ab1dce86396b93" # GitHub usernames of the people that translated this post translators = ["JOE1994", "Quqqu"] +++ 운영체제 커널을 만드는 첫 단계는 표준 라이브러리(standard library)를 링크하지 않는 Rust 실행파일을 만드는 것입니다. 이 실행파일은 운영체제가 없는 [bare metal] 시스템에서 동작할 수 있습니다. [bare metal]: https://en.wikipedia.org/wiki/Bare_machine 이 블로그는 [GitHub 저장소][GitHub]에서 오픈 소스로 개발되고 있으니, 문제나 문의사항이 있다면 저장소의 'Issue' 기능을 이용해 제보해주세요. [페이지 맨 아래][at the bottom]에 댓글을 남기실 수도 있습니다. 이 포스트와 관련된 모든 소스 코드는 저장소의 [`post-01 브랜치`][post branch]에서 확인하실 수 있습니다. [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-01 ## 소개 운영체제 커널을 만드려면 운영체제에 의존하지 않는 코드가 필요합니다. 자세히 설명하자면, 스레드, 파일, 동적 메모리, 네트워크, 난수 생성기, 표준 출력 및 기타 운영체제의 추상화 또는 특정 하드웨어의 기능을 필요로 하는 것들은 전부 사용할 수 없다는 뜻입니다. 우리는 스스로 운영체제 및 드라이버를 직접 구현하려는 상황이니 어찌 보면 당연한 조건입니다. 운영체제에 의존하지 않으려면 [Rust 표준 라이브러리][Rust standard library]의 많은 부분을 사용할 수 없습니다. 그래도 우리가 이용할 수 있는 Rust 언어 자체의 기능들은 많이 남아 있습니다. 예를 들어 [반복자][iterators], [클로저][closures], [패턴 매칭][pattern matching], [option] / [result], [문자열 포맷 설정][string formatting], 그리고 [소유권 시스템][ownership system] 등이 있습니다. 이러한 기능들은 우리가 커널을 작성할 때 [undefined behavior]나 [메모리 안전성][memory safety]에 대한 걱정 없이 큰 흐름 단위의 코드를 작성하는 데에 집중할 수 있도록 해줍니다. 
[option]: https://doc.rust-lang.org/core/option/ [result]:https://doc.rust-lang.org/core/result/ [Rust standard library]: https://doc.rust-lang.org/std/ [iterators]: https://doc.rust-lang.org/book/ch13-02-iterators.html [closures]: https://doc.rust-lang.org/book/ch13-01-closures.html [pattern matching]: https://doc.rust-lang.org/book/ch06-00-enums.html [string formatting]: https://doc.rust-lang.org/core/macro.write.html [ownership system]: https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html [undefined behavior]: https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs [memory safety]: https://tonyarcieri.com/it-s-time-for-a-memory-safety-intervention Rust로 운영체제 커널을 작성하려면, 운영체제 없이도 실행가능한 실행파일이 필요합니다. 이러한 실행파일은 보통 "freestanding 실행파일" 혹은 "bare-metal 실행파일" 이라고 불립니다. 이 포스트에서는 "freestanding 실행 파일" 을 만드는 데 필요한 것들을 여러 단계로 나누고, 각 단계가 왜 필요한지에 대해 설명해드립니다. 중간 과정은 생략하고 그저 최소한의 예제 코드만 확인하고 싶으시면 **[요약 섹션으로 넘어가시면 됩니다](#summary)**. ## Rust 표준 라이브러리 링크 해제하기 모든 Rust 프로그램들은 Rust 표준 라이브러리를 링크하는데, 이 라이브러리는 스레드, 파일, 네트워킹 등의 기능을 제공하기 위해 운영체제에 의존합니다. Rust 표준 라이브러리는 또한 C 표준 라이브러리인 `libc`에도 의존합니다 (`libc`는 운영체제의 여러 기능들을 이용합니다). 우리가 운영체제를 직접 구현하기 위해서는 운영체제를 이용하는 라이브러리들은 사용할 수 없습니다. 그렇기에 우선 [`no_std` 속성][`no_std` attribute]을 이용해 자동으로 Rust 표준 라이브러리가 링크되는 것을 막아야 합니다. [standard library]: https://doc.rust-lang.org/std/ [`no_std` attribute]: https://doc.rust-lang.org/1.30.0/book/first-edition/using-rust-without-the-standard-library.html 제일 먼저 아래의 명령어를 통해 새로운 cargo 애플리케이션 크레이트를 만듭니다. ``` cargo new blog_os --bin --edition 2024 ``` 프로젝트 이름은 `blog_os` 또는 원하시는 이름으로 정해주세요. `--bin` 인자는 우리가 cargo에게 실행 파일 (라이브러리와 대조됨)을 만들겠다고 알려주고, `--edition 2024` 인자는 cargo에게 우리가 [Rust 2024 에디션][2024 edition]을 사용할 것이라고 알려줍니다. 위 명령어를 실행하고 나면, cargo가 아래와 같은 크레이트 디렉토리를 만들어줍니다. 
[2024 edition]: https://doc.rust-lang.org/nightly/edition-guide/rust-2024/index.html ``` blog_os ├── Cargo.toml └── src └── main.rs ``` 크레이트 설정은 `Cargo.toml`에 전부 기록해야 합니다 (크레이트 이름, 크레이트 원작자, [semantic version] 번호, 의존 라이브러리 목록 등). `src/main.rs` 파일에 크레이트 실행 시 맨 처음 호출되는 `main` 함수를 포함한 중추 모듈이 있습니다. `cargo build` 명령어를 통해 크레이트를 빌드하면 `target/debug` 디렉토리에 `blog_os` 실행파일이 생성됩니다. [semantic version]: https://semver.org/ ### `no_std` 속성 현재 우리가 만든 크레이트는 암시적으로 Rust 표준 라이브러리를 링크합니다. 아래와 같이 [`no_std` 속성]을 이용해 더 이상 표준 라이브러리가 링크되지 않게 해줍니다. ```rust // main.rs #![no_std] fn main() { println!("Hello, world!"); } ``` 이제 `cargo build` 명령어를 다시 실행하면 아래와 같은 오류 메세지가 뜰 것입니다: ``` error: cannot find macro `println!` in this scope --> src/main.rs:4:5 | 4 | println!("Hello, world!"); | ^^^^^^^ ``` 이 오류가 뜨는 이유는 [`println` 매크로][`println` macro]를 제공하는 Rust 표준 라이브러리를 우리의 크레이트에 링크하지 않게 되었기 때문입니다. `println`은 [표준 입출력][standard output] (운영체제가 제공하는 특별한 파일 서술자)으로 데이터를 쓰기 때문에, 우리는 이제 `println`을 이용해 메세지를 출력할 수 없습니다. [`println` macro]: https://doc.rust-lang.org/std/macro.println.html [standard output]: https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29 `println` 매크로 호출 코드를 지운 후 크레이트를 다시 빌드해봅시다. ```rust // main.rs #![no_std] fn main() {} ``` ``` > cargo build error: `#[panic_handler]` function required, but not found error: language item required, but not found: `eh_personality` ``` 오류 메세지를 통해 컴파일러가 `#[panic_handler]` 함수와 _language item_ 을 필요로 함을 확인할 수 있습니다. ## 패닉 (Panic) 시 호출되는 함수 구현하기 컴파일러는 [패닉][panic]이 일어날 경우 `panic_handler` 속성이 적용된 함수가 호출되도록 합니다. 표준 라이브러리는 패닉 시 호출되는 함수가 제공되지만, `no_std` 환경에서는 우리가 패닉 시 호출될 함수를 직접 설정해야 합니다. [panic]: https://doc.rust-lang.org/stable/book/ch09-01-unrecoverable-errors-with-panic.html ```rust // in main.rs use core::panic::PanicInfo; /// 패닉이 일어날 경우, 이 함수가 호출됩니다. #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } ``` [`PanicInfo` 인자][PanicInfo]는 패닉이 일어난 파일명, 패닉이 파일 내 몇 번째 줄에서 일어났는지, 그리고 패닉시 전달된 메세지에 대한 정보를 가진 구조체입니다. 
위 `panic` 함수는 절대로 반환하지 않기에 ["never" 타입][“never” type] `!`을 반환하도록 적어 컴파일러에게 이 함수가 [발산 함수][diverging function]임을 알립니다. 당장 이 함수에서 우리가 하고자 하는 일은 없기에 그저 함수가 반환하지 않도록 무한루프를 넣어줍니다.

[PanicInfo]: https://doc.rust-lang.org/nightly/core/panic/struct.PanicInfo.html
[diverging function]: https://doc.rust-lang.org/1.30.0/book/first-edition/functions.html#diverging-functions
[“never” type]: https://doc.rust-lang.org/nightly/std/primitive.never.html

## `eh_personality` Language Item

Language item은 컴파일러가 내부적으로 요구하는 특별한 함수 및 타입들을 가리킵니다. 예를 들어 [`Copy`] 트레잇은 어떤 타입들이 [_copy semantics_][`Copy`] 를 가지는지 컴파일러에게 알려주는 language item 입니다. [`Copy` 트레잇이 구현된 코드][copy code]에 있는 `#[lang = "copy"]` 속성을 통해 이 트레잇이 language item으로 선언되어 있음을 확인할 수 있습니다.

[`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
[copy code]: https://github.com/rust-lang/rust/blob/485397e49a02a3b7ff77c17e4a3f16c653925cb3/src/libcore/marker.rs#L296-L299

임의로 구현한 language item을 사용할 수는 있지만, 위험할 수도 있기에 주의해야 합니다. 그 이유는 language item의 구현 코드가 매우 자주 변경되어 불안정하며, language item에 대해서는 컴파일러가 타입 체크 조차 하지 않기 때문입니다 (예시: language item 함수의 인자 타입이 정확한지 조차 체크하지 않습니다). 임의로 구현한 language item을 이용하는 것보다 더 안정적으로 위의 language item 오류를 고칠 방법이 있습니다.

[`eh_personality` language item]은 [스택 되감기 (stack unwinding)][stack unwinding]을 구현하는 함수를 가리킵니다. 기본적으로 Rust는 [패닉][panic]이 일어났을 때 스택 되감기를 통해 스택에 살아있는 각 변수의 소멸자를 호출합니다. 이를 통해 자식 스레드에서 사용 중이던 모든 메모리 리소스가 반환되고, 부모 스레드가 패닉에 대처한 후 계속 실행될 수 있게 합니다. 스택 되감기는 복잡한 과정으로 이루어지며 운영체제마다 특정한 라이브러리를 필요로 하기에 (예: Linux는 [libunwind], Windows는 [structured exception handling]), 우리가 구현할 운영체제에서는 이 기능을 사용하지 않을 것입니다.

[`eh_personality` language item]: https://github.com/rust-lang/rust/blob/edb368491551a77d77a48446d4ee88b35490c565/src/libpanic_unwind/gcc.rs#L11-L45 [stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php [libunwind]: https://www.nongnu.org/libunwind/ [structured exception handling]: https://docs.microsoft.com/en-us/windows/win32/debug/structured-exception-handling ### 스택 되감기를 해제하는 방법 스택 되감기가 불필요한 상황들이 여럿 있기에, Rust 언어는 [패닉 시 실행 종료][abort on panic] 할 수 있는 선택지를 제공합니다. 이는 스택 되감기에 필요한 심볼 정보 생성을 막아주어 실행 파일의 크기 자체도 많이 줄어들게 됩니다. 스택 되감기를 해제하는 방법은 여러가지 있지만, 가장 쉬운 방법은 `Cargo.toml` 파일에 아래의 코드를 추가하는 것입니다. ```toml [profile.dev] panic = "abort" [profile.release] panic = "abort" ``` 위의 코드를 통해 `dev` 빌드 (`cargo build` 실행)와 `release` 빌드 (`cargo build --release` 실행) 에서 모두 패닉 시 실행이 종료되도록 설정되었습니다. 이제 더 이상 컴파일러가 `eh_personality` language item을 필요로 하지 않습니다. [abort on panic]: https://github.com/rust-lang/rust/pull/32900 위에서 본 오류들을 고쳤지만, 크레이트를 빌드하려고 하면 새로운 오류가 뜰 것입니다: ``` > cargo build error: requires `start` lang_item ``` 우리의 프로그램에는 프로그램 실행 시 최초 실행 시작 지점을 지정해주는 `start` language item이 필요합니다. ## `start` 속성 혹자는 프로그램 실행 시 언제나 `main` 함수가 가장 먼저 호출된다고 생각할지도 모릅니다. 대부분의 프로그래밍 언어들은 [런타임 시스템][runtime system]을 가지고 있는데, 이는 가비지 컬렉션 (예시: Java) 혹은 소프트웨어 스레드 (예시: GoLang의 goroutine) 등의 기능을 담당합니다. 이러한 런타임 시스템은 프로그램 실행 이전에 초기화 되어야 하기에 `main` 함수 호출 이전에 먼저 호출됩니다. [runtime system]: https://en.wikipedia.org/wiki/Runtime_system 러스트 표준 라이브러리를 링크하는 전형적인 러스트 실행 파일의 경우, 프로그램 실행 시 C 런타임 라이브러리인 `crt0` (“C runtime zero”) 에서 실행이 시작됩니다. `crt0`는 C 프로그램의 환경을 설정하고 초기화하는 런타임 시스템으로, 스택을 만들고 프로그램에 주어진 인자들을 적절한 레지스터에 배치합니다. `crt0`가 작업을 마친 후 `start` language item으로 지정된 [Rust 런타임의 실행 시작 함수][rt::lang_start]를 호출합니다. Rust는 최소한의 런타임 시스템을 가지며, 주요 기능은 스택 오버플로우 가드를 초기화하고 패닉 시 역추적 (backtrace) 정보를 출력하는 것입니다. Rust 런타임의 초기화 작업이 끝난 후에야 `main` 함수가 호출됩니다. 
[rt::lang_start]: https://github.com/rust-lang/rust/blob/bb4d1491466d8239a7a5fd68bd605e3276e97afb/src/libstd/rt.rs#L32-L73 우리의 "freestanding 실행 파일" 은 Rust 런타임이나 `crt0`에 접근할 수 없기에, 우리가 직접 프로그램 실행 시작 지점을 지정해야 합니다. `crt0`가 `start` language item을 호출해주는 방식으로 동작하기에, `start` language item을 구현하고 지정하는 것만으로는 문제를 해결할 수 없습니다. 대신 우리가 직접 `crt0`의 시작 지점을 대체할 새로운 실행 시작 지점을 제공해야 합니다. ### 실행 시작 지점 덮어쓰기 `#![no_main]` 속성을 이용해 Rust 컴파일러에게 우리가 일반적인 실행 시작 호출 단계를 이용하지 않겠다고 선언합니다. ```rust #![no_std] #![no_main] use core::panic::PanicInfo; /// 패닉이 일어날 경우, 이 함수가 호출됩니다. #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } ``` `main` 함수가 사라진 것을 눈치채셨나요? `main` 함수를 호출해주는 런타임 시스템이 없는 이상 `main` 함수의 존재도 더 이상 의미가 없습니다. 우리는 운영체제가 호출하는 프로그램 실행 시작 지점 대신 우리의 새로운 `_start` 함수를 실행 시작 지점으로 대체할 것입니다. ```rust #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { loop {} } ``` `#[unsafe(no_mangle)]` 속성을 통해 [name mangling]을 해제하여 Rust 컴파일러가 `_start` 라는 이름 그대로 함수를 만들도록 합니다. 이 속성이 없다면, 컴파일러가 각 함수의 이름을 고유하게 만드는 과정에서 이 함수의 실제 이름을 `_ZN3blog_os4_start7hb173fedf945531caE` 라는 이상한 이름으로 바꿔 생성합니다. 우리가 원하는 실제 시작 지점 함수의 이름을 정확히 알고 있어야 링커 (linker)에도 그 이름을 정확히 전달할 수 있기에 (후속 단계에서 진행) `#[unsafe(no_mangle)]` 속성이 필요합니다. 또한 우리는 이 함수에 `extern "C"`라는 표시를 추가하여 이 함수가 Rust 함수 호출 규약 대신에 [C 함수 호출 규약][C calling convention]을 사용하도록 합니다. 함수의 이름을 `_start`로 지정한 이유는 그저 런타임 시스템들의 실행 시작 함수 이름이 대부분 `_start`이기 때문입니다. [name mangling]: https://en.wikipedia.org/wiki/Name_mangling [C calling convention]: https://en.wikipedia.org/wiki/Calling_convention `!` 반환 타입은 이 함수가 발산 함수라는 것을 의미합니다. 시작 지점 함수는 오직 운영체제나 부트로더에 의해서만 직접 호출됩니다. 따라서 시작 지점 함수는 반환하는 대신 운영체제의 [`exit` 시스템콜][`exit` system call]을 이용해 종료됩니다. 우리의 "freestanding 실행 파일" 은 실행 종료 후 더 이상 실행할 작업이 없기에, 시작 지점 함수가 작업을 마친 후 기기를 종료하는 것이 합리적입니다. 여기서는 일단 `!` 타입의 조건을 만족시키기 위해 무한루프를 넣어 줍니다. [`exit` system call]: https://en.wikipedia.org/wiki/Exit_(system_call) 다시 `cargo build`를 실행하면, 끔찍한 _링커_ 오류를 마주하게 됩니다. ## 링커 오류 링커는 컴파일러가 생성한 코드들을 묶어 실행파일로 만드는 프로그램입니다. 
실행 파일 형식은 Linux, Windows, macOS 마다 전부 다르기에 각 운영체제는 자신만의 링커가 있고 링커마다 다른 오류 메세지를 출력할 것입니다. 오류가 나는 근본적인 원인은 모두 동일한데, 링커는 주어진 프로그램이 C 런타임 시스템을 이용할 것이라고 가정하는 반면 우리의 크레이트는 그렇지 않기 때문입니다. 이 링커 오류를 해결하려면 링커에게 C 런타임을 링크하지 말라고 알려줘야 합니다. 두 가지 방법이 있는데, 하나는 링커에 특정 인자들을 주는 것이고, 또다른 하나는 크레이트 컴파일 대상 기기를 bare metal 기기로 설정하는 것입니다. ### Bare Metal 시스템을 목표로 빌드하기 기본적으로 Rust는 당신의 현재 시스템 환경에서 실행할 수 있는 실행파일을 생성하고자 합니다. 예를 들어 Windows `x86_64` 사용자의 경우, Rust는 `x86_64` 명령어 셋을 사용하는 `.exe` 확장자 실행파일을 생성합니다. 사용자의 기본 시스템 환경을 "호스트" 시스템이라고 부릅니다. 여러 다른 시스템 환경들을 표현하기 위해 Rust는 [_target triple_]이라는 문자열을 이용합니다. 현재 호스트 시스템의 target triple이 궁금하시다면 `rustc --version --verbose` 명령어를 실행하여 확인 가능합니다. [_target triple_]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple ``` rustc 1.35.0-nightly (474e7a648 2019-04-07) binary: rustc commit-hash: 474e7a6486758ea6fc761893b1a49cd9076fb0ab commit-date: 2019-04-07 host: x86_64-unknown-linux-gnu release: 1.35.0-nightly LLVM version: 8.0 ``` 위의 출력 내용은 `x86_64` Linux 시스템에서 얻은 것입니다. 호스트 target triple이 `x86_64-unknown-linux-gnu`으로 나오는데, 이는 CPU 아키텍쳐 정보 (`x86_64`)와 하드웨어 판매자 (`unknown`), 운영체제 (`linux`) 그리고 [응용 프로그램 이진 인터페이스 (ABI)][ABI] (`gnu`) 정보를 모두 담고 있습니다. [ABI]: https://en.wikipedia.org/wiki/Application_binary_interface 우리의 호스트 시스템 triple을 위해 컴파일하는 경우, Rust 컴파일러와 링커는 Linux나 Windows와 같은 운영체제가 있다고 가정하고 또한 운영체제가 C 런타임 시스템을 사용할 것이라고 가정하기 때문에 링커 오류 메세지가 출력된 것입니다. 이런 링커 오류를 피하려면 운영체제가 없는 시스템 환경에서 코드가 구동하는 것을 목표로 컴파일해야 합니다. 운영체제가 없는 bare metal 시스템 환경의 한 예시로 `thumbv7em-none-eabihf` target triple이 있습니다 (이는 [임베디드][embedded] [ARM] 시스템을 가리킵니다). Target triple의 `none`은 시스템에 운영체제가 동작하지 않음을 의미하며, 이 target triple의 나머지 부분의 의미는 아직 모르셔도 괜찮습니다. 이 시스템 환경에서 구동 가능하도록 컴파일하려면 rustup에서 해당 시스템 환경을 추가해야 합니다. [embedded]: https://en.wikipedia.org/wiki/Embedded_system [ARM]: https://en.wikipedia.org/wiki/ARM_architecture ``` rustup target add thumbv7em-none-eabihf ``` 위 명령어를 실행하면 해당 시스템을 위한 Rust 표준 라이브러리 및 코어 라이브러리를 설치합니다. 이제 해당 target triple을 목표로 하는 freestanding 실행파일을 만들 수 있습니다. 
``` cargo build --target thumbv7em-none-eabihf ``` `--target` 인자를 통해 우리가 해당 bare metal 시스템을 목표로 [크로스 컴파일][cross compile]할 것이라는 것을 cargo에게 알려줍니다. 목표 시스템 환경에 운영체제가 없는 것을 링커도 알기 때문에 C 런타임을 링크하려고 시도하지 않으며 이제는 링커 에러 없이 빌드가 성공할 것입니다. [cross compile]: https://en.wikipedia.org/wiki/Cross_compiler 우리는 이 방법을 이용하여 우리의 운영체제 커널을 빌드해나갈 것입니다. 위에서 보인 `thumbv7em-none-eabihf` 시스템 환경 대신 bare metal `x86_64` 시스템 환경을 묘사하는 [커스텀 시스템 환경][custom target]을 설정하여 빌드할 것입니다. 더 자세한 내용은 다음 포스트에서 더 설명하겠습니다. [custom target]: https://doc.rust-lang.org/rustc/targets/custom.html ### 링커 인자 Bare metal 시스템을 목표로 컴파일하는 대신, 링커에게 특정 인자들을 추가로 주어 링커 오류를 해결하는 방법도 있습니다. 이 방법은 앞으로 우리가 작성해나갈 커널 코드를 빌드할 때는 사용하지 않을 것이지만, 더 알고싶어 하실 분들을 위해서 이 섹션을 준비했습니다. 아래의 _"링커 인자"_ 텍스트를 눌러 이 섹션의 내용을 확인하세요.
링커 인자

이 섹션에서는 Linux, Windows 그리고 macOS 각각의 운영체제에서 나타나는 링커 오류에 대해 다루고 각 운영체제마다 링커에 어떤 추가 인자들을 주어 링커 오류를 해결할 수 있는지 설명할 것입니다.

#### Linux

Linux 에서는 아래와 같은 링커 오류 메세지가 출력됩니다 (일부 생략됨):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x12): undefined reference to `__libc_csu_fini'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x19): undefined reference to `__libc_csu_init'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x25): undefined reference to `__libc_start_main'
          collect2: error: ld returned 1 exit status
```

이 상황을 설명하자면 링커가 기본적으로 C 런타임의 실행 시작 루틴을 링크하는데, 이 루틴 역시 `_start`라는 이름을 가집니다. 이 `_start` 루틴은 C 표준 라이브러리 (`libc`)가 포함하는 여러 symbol들을 필요로 하지만, 우리는 `no_std` 속성을 이용해 크레이트에서 `libc`를 링크하지 않기 때문에 링커가 몇몇 symbol들의 출처를 찾지 못하여 위와 같은 링커 오류 메세지가 출력되는 것입니다. 이 문제를 해결하려면, 링커에게 `-nostartfiles` 인자를 전달하여 더 이상 링커가 C 런타임의 실행 시작 루틴을 링크하지 않도록 해야 합니다.

링커에 인자를 전달하는 한 방법은 `cargo rustc` 명령어를 이용하는 것입니다. 이 명령어는 `cargo build`와 유사하게 동작하나, `rustc`(Rust 컴파일러)에 직접 인자를 전달할 수 있게 해줍니다. `rustc`는 `-C link-arg` 인자를 통해 링커에게 인자를 전달할 수 있게 해줍니다. 우리가 이용할 새로운 빌드 명령어는 아래와 같습니다:

```
cargo rustc -- -C link-arg=-nostartfiles
```

이제 우리의 크레이트가 성공적으로 빌드되고 Linux에서 동작하는 freestanding 실행파일이 생성됩니다!

우리는 위의 빌드 명령어에서 실행 시작 함수의 이름을 명시적으로 전달하지 않았는데, 그 이유는 링커가 기본적으로 `_start` 라는 이름의 함수를 찾아 그 함수를 실행 시작 함수로 이용하기 때문입니다.

#### Windows

Windows에서는 다른 링커 오류를 마주하게 됩니다 (일부 생략):

```
error: linking with `link.exe` failed: exit code: 1561
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1561: entry point must be defined
```

오류 메세지 "entry point must be defined"는 링커가 실행 시작 지점을 찾을 수 없다는 것을 알려줍니다. Windows에서는 기본 실행 시작 지점의 이름이 [사용 중인 서브시스템(subsystem)에 따라 다릅니다][windows-subsystems]. `CONSOLE` 서브시스템의 경우 링커가 `mainCRTStartup`이라는 함수를 실행 시작 지점으로 간주하고, `WINDOWS` 서브시스템의 경우 링커가 `WinMainCRTStartup`이라는 이름의 함수를 실행 시작 지점으로 간주합니다.
이러한 기본값을 변경하여 링커가 `_start`라는 이름의 함수를 실행 시작 지점으로 간주하도록 만들려면 링커에 `/ENTRY` 인자를 넘겨주어야 합니다:

[windows-subsystems]: https://docs.microsoft.com/en-us/cpp/build/reference/entry-entry-point-symbol

```
cargo rustc -- -C link-arg=/ENTRY:_start
```

Linux에서와는 다른 인자 형식을 통해 Windows의 링커는 Linux의 링커와 완전히 다른 프로그램이라는 것을 유추할 수 있습니다.

이제 또 다른 링커 오류가 발생합니다:

```
error: linking with `link.exe` failed: exit code: 1221
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1221: a subsystem can't be inferred and must be defined
```

이 오류가 뜨는 이유는 Windows 실행파일들은 여러 가지 [서브시스템][windows-subsystems]을 사용할 수 있기 때문입니다. 일반적인 프로그램들의 경우, 실행 시작 지점 함수의 이름에 따라 어떤 서브시스템을 사용하는지 추론합니다: 실행 시작 지점의 이름이 `main`인 경우 `CONSOLE` 서브시스템이 사용 중이라는 것을 알 수 있으며, 실행 시작 지점의 이름이 `WinMain`인 경우 `WINDOWS` 서브시스템이 사용 중이라는 것을 알 수 있습니다. 우리는 `_start`라는 새로운 이름의 실행 시작 지점을 이용할 것이기에, 우리가 어떤 서브시스템을 사용할 것인지 인자를 통해 명시적으로 링커에게 알려줘야 합니다:

```
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
```

위 명령어에서는 `CONSOLE` 서브시스템을 사용했지만, `WINDOWS` 서브시스템을 적용해도 괜찮습니다. `-C link-arg` 인자를 반복해서 쓰는 대신, `-C link-args`인자를 이용해 여러 인자들을 빈칸으로 구분하여 전달할 수 있습니다.

이 명령어를 통해 우리의 실행 파일을 Windows에서도 성공적으로 빌드할 수 있을 것입니다.

#### macOS

macOS에서는 아래와 같은 링커 오류가 출력됩니다 (일부 생략):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: entry point (_main) undefined. for architecture x86_64
          clang: error: linker command failed with exit code 1 […]
```

위 오류 메세지는 우리에게 링커가 실행 시작 지점 함수의 기본값 이름 `main`을 찾지 못했다는 것을 알려줍니다 (macOS에서는 무슨 이유에서인지 모든 함수들의 이름 맨 앞에 `_` 문자가 앞에 붙습니다). 실행 시작 지점 함수의 이름을 `_start`로 새롭게 지정해주기 위해 아래와 같이 링커 인자 `-e`를 이용합니다:

```
cargo rustc -- -C link-args="-e __start"
```

`-e` 인자를 통해 실행 시작 지점 함수 이름을 설정합니다. macOS에서는 모든 함수의 이름 앞에 추가로 `_` 문자가 붙기에, 실행 시작 지점 함수의 이름을 `_start` 대신 `__start`로 지정해줍니다.

이제 아래와 같은 링커 오류가 나타날 것입니다: ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: dynamic main executables must link with libSystem.dylib for architecture x86_64 clang: error: linker command failed with exit code 1 […] ``` macOS는 [공식적으로는 정적으로 링크된 실행파일을 지원하지 않으며][does not officially support statically linked binaries], 기본적으로 모든 프로그램이 `libSystem` 라이브러리를 링크하도록 요구합니다. 이러한 기본 요구사항을 무시하고 정적으로 링크된 실행 파일을 만드려면 링커에게 `-static` 인자를 주어야 합니다: [does not officially support statically linked binaries]: https://developer.apple.com/library/archive/qa/qa1118/_index.html ``` cargo rustc -- -C link-args="-e __start -static" ``` 아직도 충분하지 않았는지, 세 번째 링커 오류가 아래와 같이 출력됩니다: ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: library not found for -lcrt0.o clang: error: linker command failed with exit code 1 […] ``` 이 오류가 뜨는 이유는 macOS에서 모든 프로그램은 기본적으로 `crt0` (“C runtime zero”)를 링크하기 때문입니다. 이 오류는 우리가 Linux에서 봤던 오류와 유사한 것으로, 똑같이 링커에 `-nostartfiles` 인자를 주어 해결할 수 있습니다: ``` cargo rustc -- -C link-args="-e __start -static -nostartfiles" ``` 이제는 우리의 프로그램을 macOS에서 성공적으로 빌드할 수 있을 것입니다. #### 플랫폼 별 빌드 명령어들을 하나로 통합하기 위에서 살펴본 대로 호스트 플랫폼 별로 상이한 빌드 명령어가 필요한데, `.cargo/config.toml` 이라는 파일을 만들고 플랫폼 마다 필요한 상이한 인자들을 명시하여 여러 빌드 명령어들을 하나로 통합할 수 있습니다. ```toml # in .cargo/config.toml [target.'cfg(target_os = "linux")'] rustflags = ["-C", "link-arg=-nostartfiles"] [target.'cfg(target_os = "windows")'] rustflags = ["-C", "link-args=/ENTRY:_start /SUBSYSTEM:console"] [target.'cfg(target_os = "macos")'] rustflags = ["-C", "link-args=-e __start -static -nostartfiles"] ``` `rustflags`에 포함된 인자들은 `rustc`가 실행될 때마다 자동적으로 `rustc`에 인자로 전달됩니다. `.cargo/config.toml`에 대한 더 자세한 정보는 [공식 안내 문서](https://doc.rust-lang.org/cargo/reference/config.html)를 통해 확인해주세요. 이제 `cargo build` 명령어 만으로 세 가지 플랫폼 어디에서도 우리의 프로그램을 성공적으로 빌드할 수 있습니다. #### 이렇게 하는 것이 괜찮나요? Linux, Windows 또는 macOS 위에서 동작하는 freestanding 실행파일을 빌드하는 것이 가능하긴 해도 좋은 방법은 아닙니다. 
운영체제가 갖춰진 환경을 목표로 빌드를 한다면, 실행 파일 동작 시 다른 많은 조건들이 런타임에 의해 제공될 것이라는 가정 하에 빌드가 이뤄지기 때문입니다 (예: 실행 파일이 `_start` 함수가 호출되는 시점에 이미 스택이 초기화되어있을 것이라고 간주하고 작동합니다). C 런타임 없이는 실행 파일이 필요로 하는 조건들이 갖춰지지 않아 결국 세그멘테이션 오류가 나는 등 프로그램이 제대로 실행되지 못할 수 있습니다. 이미 존재하는 운영체제 위에서 동작하는 최소한의 실행 파일을 만들고 싶다면, `libc`를 링크하고 [이 곳의 설명](https://doc.rust-lang.org/1.16.0/book/no-stdlib.html)에 따라 `#[start]` 속성을 설정하는 것이 더 좋은 방법일 것입니다.
## 요약 {#summary}

아래와 같은 최소한의 코드로 "freestanding" Rust 실행파일을 만들 수 있습니다:

`src/main.rs`:

```rust
#![no_std] // Rust 표준 라이브러리를 링크하지 않도록 합니다
#![no_main] // Rust 언어에서 사용하는 실행 시작 지점 (main 함수)을 사용하지 않습니다

use core::panic::PanicInfo;

#[unsafe(no_mangle)] // 이 함수의 이름을 mangle하지 않습니다
pub extern "C" fn _start() -> ! {
    // 링커는 기본적으로 '_start' 라는 이름을 가진 함수를 실행 시작 지점으로 삼기에,
    // 이 함수는 실행 시작 지점이 됩니다
    loop {}
}

/// 패닉이 일어날 경우, 이 함수가 호출됩니다.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

`Cargo.toml`:

```toml
[package]
name = "crate_name"
version = "0.1.0"
authors = ["Author Name "]

# `cargo build` 실행 시 이용되는 빌드 설정
[profile.dev]
panic = "abort" # 패닉 시 스택 되감기를 하지 않고 바로 프로그램 종료

# `cargo build --release` 실행 시 이용되는 빌드 설정
[profile.release]
panic = "abort" # 패닉 시 스택 되감기를 하지 않고 바로 프로그램 종료
```

이 실행 파일을 빌드하려면, `thumbv7em-none-eabihf`와 같은 bare metal 시스템 환경을 목표로 컴파일해야 합니다:

```
cargo build --target thumbv7em-none-eabihf
```

또다른 방법으로, 각 호스트 시스템마다 추가적인 링커 인자들을 전달해주어 호스트 시스템 환경을 목표로 컴파일할 수도 있습니다:

```bash
# Linux
cargo rustc -- -C link-arg=-nostartfiles
# Windows
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
# macOS
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

주의할 것은 이것이 정말 최소한의 freestanding Rust 실행 파일이라는 것입니다. 실행 파일은 여러 가지 조건들을 가정하는데, 그 예로 실행파일 동작 시 `_start` 함수가 호출될 때 스택이 초기화되어 있을 것을 가정합니다. **이 freestanding 실행 파일을 이용해 실제로 유용한 작업을 처리하려면 아직 더 많은 코드 구현이 필요합니다**.

## 다음 단계는 무엇일까요?

[다음 포스트][next post]에서는 우리의 freestanding 실행 파일을 최소한의 기능을 갖춘 운영체제 커널로 만드는 과정을 단계별로 설명할 것입니다. 예시로 커스텀 시스템 환경을 설정하는 방법, 우리의 실행 파일을 부트로더와 합치는 방법, 그리고 화면에 메세지를 출력하는 방법 등에 대해 다루겠습니다.

[next post]: @/edition-2/posts/02-minimal-rust-kernel/index.md ================================================ FILE: blog/content/edition-2/posts/01-freestanding-rust-binary/index.md ================================================ +++ title = "A Freestanding Rust Binary" weight = 1 path = "freestanding-rust-binary" date = 2018-02-10 [extra] chapter = "Bare Bones" # GitHub usernames of the people that translated this post translators = ["dobleuber"] +++ The first step in creating our own operating system kernel is to create a Rust executable that does not link the standard library. This makes it possible to run Rust code on the [bare metal] without an underlying operating system. [bare metal]: https://en.wikipedia.org/wiki/Bare_machine This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-01`][post branch] branch. [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-01 ## Introduction To write an operating system kernel, we need code that does not depend on any operating system features. This means that we can't use threads, files, heap memory, the network, random numbers, standard output, or any other features requiring OS abstractions or specific hardware. Which makes sense, since we're trying to write our own OS and our own drivers. This means that we can't use most of the [Rust standard library], but there are a lot of Rust features that we _can_ use. For example, we can use [iterators], [closures], [pattern matching], [option] and [result], [string formatting], and of course the [ownership system]. These features make it possible to write a kernel in a very expressive, high level way without worrying about [undefined behavior] or [memory safety]. 
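
To make the list above a bit more concrete, here is a minimal sketch that uses only items from `core`: slice pattern matching, `Result`, `Option`, an iterator chain, and a closure. The `Command` type and the three functions are hypothetical names invented for this illustration and are not part of the `blog_os` code. Since nothing here touches `std`, the same functions should also compile inside a `#![no_std]` crate.

```rust
// Everything here comes from `core`, so no operating system is required.
// All names are hypothetical and exist only for this illustration.

/// A made-up command type, used only to demonstrate pattern matching.
#[derive(Debug, PartialEq)]
pub enum Command {
    Halt,
    Print(u8),
}

/// Decodes raw bytes into a `Command`; `Result` replaces exceptions.
pub fn decode(bytes: &[u8]) -> Result<Command, &'static str> {
    match bytes {
        [0] => Ok(Command::Halt),
        [1, value] => Ok(Command::Print(*value)),
        [] => Err("empty input"),
        _ => Err("unknown command"),
    }
}

/// Iterator chain plus a closure: sums only the even values.
pub fn sum_even(values: &[u32]) -> u32 {
    values.iter().copied().filter(|v| v % 2 == 0).sum()
}

/// `Option` makes the "no element" case explicit instead of using null.
pub fn largest(values: &[u32]) -> Option<u32> {
    values.iter().copied().max()
}
```

None of these functions allocate or call into the operating system, which is why this style of code carries over to a kernel unchanged.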
[option]: https://doc.rust-lang.org/core/option/ [result]:https://doc.rust-lang.org/core/result/ [Rust standard library]: https://doc.rust-lang.org/std/ [iterators]: https://doc.rust-lang.org/book/ch13-02-iterators.html [closures]: https://doc.rust-lang.org/book/ch13-01-closures.html [pattern matching]: https://doc.rust-lang.org/book/ch06-00-enums.html [string formatting]: https://doc.rust-lang.org/core/macro.write.html [ownership system]: https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html [undefined behavior]: https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs [memory safety]: https://tonyarcieri.com/it-s-time-for-a-memory-safety-intervention In order to create an OS kernel in Rust, we need to create an executable that can be run without an underlying operating system. Such an executable is often called a “freestanding” or “bare-metal” executable. This post describes the necessary steps to create a freestanding Rust binary and explains why the steps are needed. If you're just interested in a minimal example, you can **[jump to the summary](#summary)**. ## Disabling the Standard Library By default, all Rust crates link the [standard library], which depends on the operating system for features such as threads, files, or networking. It also depends on the C standard library `libc`, which closely interacts with OS services. Since our plan is to write an operating system, we can't use any OS-dependent libraries. So we have to disable the automatic inclusion of the standard library through the [`no_std` attribute]. [standard library]: https://doc.rust-lang.org/std/ [`no_std` attribute]: https://doc.rust-lang.org/1.30.0/book/first-edition/using-rust-without-the-standard-library.html We start by creating a new cargo application project. The easiest way to do this is through the command line: ``` cargo new blog_os --bin --edition 2024 ``` I named the project `blog_os`, but of course you can choose your own name. 
The `--bin` flag specifies that we want to create an executable binary (in contrast to a library) and the `--edition 2024` flag specifies that we want to use the [2024 edition] of Rust for our crate. When we run the command, cargo creates the following directory structure for us: [2024 edition]: https://doc.rust-lang.org/nightly/edition-guide/rust-2024/index.html ``` blog_os ├── Cargo.toml └── src └── main.rs ``` The `Cargo.toml` contains the crate configuration, for example the crate name, the author, the [semantic version] number, and dependencies. The `src/main.rs` file contains the root module of our crate and our `main` function. You can compile your crate through `cargo build` and then run the compiled `blog_os` binary in the `target/debug` subfolder. [semantic version]: https://semver.org/ ### The `no_std` Attribute Right now our crate implicitly links the standard library. Let's try to disable this by adding the [`no_std` attribute]: ```rust // main.rs #![no_std] fn main() { println!("Hello, world!"); } ``` When we try to build it now (by running `cargo build`), the following error occurs: ``` error: cannot find macro `println!` in this scope --> src/main.rs:4:5 | 4 | println!("Hello, world!"); | ^^^^^^^ ``` The reason for this error is that the [`println` macro] is part of the standard library, which we no longer include. So we can no longer print things. This makes sense, since `println` writes to [standard output], which is a special file descriptor provided by the operating system. 
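
Losing `println!` does not mean losing formatting itself: the `write!` macro and the `core::fmt::Write` trait live in `core`, so only the output destination has to be replaced. The following is a rough sketch, assuming a hypothetical `FixedBuf` type (not part of this post's code) that collects formatted text in a fixed-size array instead of writing to standard output.

```rust
use core::fmt::{self, Write};

/// Hypothetical fixed-capacity text buffer; it needs no heap and no OS.
pub struct FixedBuf {
    bytes: [u8; 32],
    len: usize,
}

impl FixedBuf {
    pub const fn new() -> Self {
        FixedBuf { bytes: [0; 32], len: 0 }
    }

    /// Returns the text written so far.
    pub fn as_str(&self) -> &str {
        // Only complete `&str` values are copied in, so the bytes are valid UTF-8.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}

impl Write for FixedBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > self.bytes.len() {
            // Not enough room: report an error instead of truncating.
            return Err(fmt::Error);
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}
```

With `Write` in scope, a call such as `write!(buf, "x = {}", 5)` formats into the buffer. A kernel can apply the same pattern with a type whose `write_str` sends the bytes to a device instead of an array.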
[`println` macro]: https://doc.rust-lang.org/std/macro.println.html [standard output]: https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29 So let's remove the printing and try again with an empty main function: ```rust // main.rs #![no_std] fn main() {} ``` ``` > cargo build error: `#[panic_handler]` function required, but not found error: language item required, but not found: `eh_personality` ``` Now the compiler is missing a `#[panic_handler]` function and a _language item_. ## Panic Implementation The `panic_handler` attribute defines the function that the compiler should invoke when a [panic] occurs. The standard library provides its own panic handler function, but in a `no_std` environment we need to define it ourselves: [panic]: https://doc.rust-lang.org/stable/book/ch09-01-unrecoverable-errors-with-panic.html ```rust // in main.rs use core::panic::PanicInfo; /// This function is called on panic. #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } ``` The [`PanicInfo` parameter][PanicInfo] contains the file and line where the panic happened and the optional panic message. The function should never return, so it is marked as a [diverging function] by returning the [“never” type] `!`. There is not much we can do in this function for now, so we just loop indefinitely. [PanicInfo]: https://doc.rust-lang.org/nightly/core/panic/struct.PanicInfo.html [diverging function]: https://doc.rust-lang.org/1.30.0/book/first-edition/functions.html#diverging-functions [“never” type]: https://doc.rust-lang.org/nightly/std/primitive.never.html ## The `eh_personality` Language Item Language items are special items (traits, functions, types, etc.) that are required internally by the compiler. For example, the [`Copy`] trait is a language item that tells the compiler which types have [_copy semantics_][`Copy`]. 
When we look at the [implementation][copy code], we see it has the special `#[lang = "copy"]` attribute that defines it as a language item. [`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html [copy code]: https://github.com/rust-lang/rust/blob/485397e49a02a3b7ff77c17e4a3f16c653925cb3/src/libcore/marker.rs#L296-L299 While providing custom implementations of language items is possible, it should only be done as a last resort. The reason is that language items are highly unstable implementation details and not even type checked (so the compiler doesn't even check if a function has the right argument types). Fortunately, there is a more stable way to fix the above language item error. The [`eh_personality` language item] marks a function that is used for implementing [stack unwinding]. By default, Rust uses unwinding to run the destructors of all live stack variables in case of a [panic]. This ensures that all used memory is freed and allows the parent thread to catch the panic and continue execution. Unwinding, however, is a complicated process and requires some OS-specific libraries (e.g. [libunwind] on Linux or [structured exception handling] on Windows), so we don't want to use it for our operating system. [`eh_personality` language item]: https://github.com/rust-lang/rust/blob/edb368491551a77d77a48446d4ee88b35490c565/src/libpanic_unwind/gcc.rs#L11-L45 [stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php [libunwind]: https://www.nongnu.org/libunwind/ [structured exception handling]: https://docs.microsoft.com/en-us/windows/win32/debug/structured-exception-handling ### Disabling Unwinding There are other use cases as well for which unwinding is undesirable, so Rust provides an option to [abort on panic] instead. This disables the generation of unwinding symbol information and thus considerably reduces binary size. There are multiple places where we can disable unwinding. 
The easiest way is to add the following lines to our `Cargo.toml`: ```toml [profile.dev] panic = "abort" [profile.release] panic = "abort" ``` This sets the panic strategy to `abort` for both the `dev` profile (used for `cargo build`) and the `release` profile (used for `cargo build --release`). Now the `eh_personality` language item should no longer be required. [abort on panic]: https://github.com/rust-lang/rust/pull/32900 Now we fixed both of the above errors. However, if we try to compile it now, another error occurs: ``` > cargo build error: requires `start` lang_item ``` Our program is missing the `start` language item, which defines the entry point. ## The `start` attribute One might think that the `main` function is the first function called when you run a program. However, most languages have a [runtime system], which is responsible for things such as garbage collection (e.g. in Java) or software threads (e.g. goroutines in Go). This runtime needs to be called before `main`, since it needs to initialize itself. [runtime system]: https://en.wikipedia.org/wiki/Runtime_system In a typical Rust binary that links the standard library, execution starts in a C runtime library called `crt0` (“C runtime zero”), which sets up the environment for a C application. This includes creating a stack and placing the arguments in the right registers. The C runtime then invokes the [entry point of the Rust runtime][rt::lang_start], which is marked by the `start` language item. Rust only has a very minimal runtime, which takes care of some small things such as setting up stack overflow guards or printing a backtrace on panic. The runtime then finally calls the `main` function. [rt::lang_start]: https://github.com/rust-lang/rust/blob/bb4d1491466d8239a7a5fd68bd605e3276e97afb/src/libstd/rt.rs#L32-L73 Our freestanding executable does not have access to the Rust runtime and `crt0`, so we need to define our own entry point. 
Implementing the `start` language item wouldn't help, since it would still require `crt0`. Instead, we need to overwrite the `crt0` entry point directly. ### Overwriting the Entry Point To tell the Rust compiler that we don't want to use the normal entry point chain, we add the `#![no_main]` attribute. ```rust #![no_std] #![no_main] use core::panic::PanicInfo; /// This function is called on panic. #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } ``` You might notice that we removed the `main` function. The reason is that a `main` doesn't make sense without an underlying runtime that calls it. Instead, we are now overwriting the operating system entry point with our own `_start` function: ```rust #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { loop {} } ``` By using the `#[unsafe(no_mangle)]` attribute, we disable [name mangling] to ensure that the Rust compiler really outputs a function with the name `_start`. Without the attribute, the compiler would generate some cryptic `_ZN3blog_os4_start7hb173fedf945531caE` symbol to give every function a unique name. The attribute is required because we need to tell the name of the entry point function to the linker in the next step. We also have to mark the function as `extern "C"` to tell the compiler that it should use the [C calling convention] for this function (instead of the unspecified Rust calling convention). The reason for naming the function `_start` is that this is the default entry point name for most systems. [name mangling]: https://en.wikipedia.org/wiki/Name_mangling [C calling convention]: https://en.wikipedia.org/wiki/Calling_convention The `!` return type means that the function is diverging, i.e. not allowed to ever return. This is required because the entry point is not called by any function, but invoked directly by the operating system or bootloader. So instead of returning, the entry point should e.g. invoke the [`exit` system call] of the operating system. 
In our case, shutting down the machine could be a reasonable action, since there's nothing left to do if a freestanding binary returns. For now, we fulfill the requirement by looping endlessly. [`exit` system call]: https://en.wikipedia.org/wiki/Exit_(system_call) When we run `cargo build` now, we get an ugly _linker_ error. ## Linker Errors The linker is a program that combines the generated code into an executable. Since the executable format differs between Linux, Windows, and macOS, each system has its own linker that throws a different error. The fundamental cause of the errors is the same: the default configuration of the linker assumes that our program depends on the C runtime, which it does not. To solve the errors, we need to tell the linker that it should not include the C runtime. We can do this either by passing a certain set of arguments to the linker or by building for a bare metal target. ### Building for a Bare Metal Target By default Rust tries to build an executable that is able to run in your current system environment. For example, if you're using Windows on `x86_64`, Rust tries to build an `.exe` Windows executable that uses `x86_64` instructions. This environment is called your "host" system. To describe different environments, Rust uses a string called [_target triple_]. You can see the target triple for your host system by running `rustc --version --verbose`: [_target triple_]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple ``` rustc 1.35.0-nightly (474e7a648 2019-04-07) binary: rustc commit-hash: 474e7a6486758ea6fc761893b1a49cd9076fb0ab commit-date: 2019-04-07 host: x86_64-unknown-linux-gnu release: 1.35.0-nightly LLVM version: 8.0 ``` The above output is from a `x86_64` Linux system. We see that the `host` triple is `x86_64-unknown-linux-gnu`, which includes the CPU architecture (`x86_64`), the vendor (`unknown`), the operating system (`linux`), and the [ABI] (`gnu`). 
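To make the structure of such a triple concrete, here is a small sketch that splits the four-component form shown above into its parts. The `TargetTriple` type and the `parse_triple` function are hypothetical helpers for illustration only. Real target triples can omit components, and this sketch simply rejects those instead of guessing.

```rust
/// Components of a four-part target triple such as `x86_64-unknown-linux-gnu`.
/// Hypothetical helper type, not an API of `rustc` or `cargo`.
#[derive(Debug, PartialEq, Eq)]
pub struct TargetTriple<'a> {
    pub arch: &'a str,
    pub vendor: &'a str,
    pub os: &'a str,
    pub abi: &'a str,
}

/// Splits a triple of the form `arch-vendor-os-abi`.
/// Returns `None` for any other number of components.
pub fn parse_triple(triple: &str) -> Option<TargetTriple<'_>> {
    let mut parts = triple.split('-');
    let parsed = TargetTriple {
        arch: parts.next()?,
        vendor: parts.next()?,
        os: parts.next()?,
        abi: parts.next()?,
    };
    // Anything left over means the input had more than four components.
    if parts.next().is_some() {
        return None;
    }
    Some(parsed)
}
```

Calling `parse_triple("x86_64-unknown-linux-gnu")` yields the architecture, vendor, operating system, and ABI fields described above.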
[ABI]: https://en.wikipedia.org/wiki/Application_binary_interface By compiling for our host triple, the Rust compiler and the linker assume that there is an underlying operating system such as Linux or Windows that uses the C runtime by default, which causes the linker errors. So, to avoid the linker errors, we can compile for a different environment with no underlying operating system. An example of such a bare metal environment is the `thumbv7em-none-eabihf` target triple, which describes an [embedded] [ARM] system. The details are not important, all that matters is that the target triple has no underlying operating system, which is indicated by the `none` in the target triple. To be able to compile for this target, we need to add it in rustup: [embedded]: https://en.wikipedia.org/wiki/Embedded_system [ARM]: https://en.wikipedia.org/wiki/ARM_architecture ``` rustup target add thumbv7em-none-eabihf ``` This downloads a copy of the standard (and core) library for the system. Now we can build our freestanding executable for this target: ``` cargo build --target thumbv7em-none-eabihf ``` By passing a `--target` argument we [cross compile] our executable for a bare metal target system. Since the target system has no operating system, the linker does not try to link the C runtime and our build succeeds without any linker errors. [cross compile]: https://en.wikipedia.org/wiki/Cross_compiler This is the approach that we will use for building our OS kernel. Instead of `thumbv7em-none-eabihf`, we will use a [custom target] that describes a `x86_64` bare metal environment. The details will be explained in the next post. [custom target]: https://doc.rust-lang.org/rustc/targets/custom.html ### Linker Arguments Instead of compiling for a bare metal system, it is also possible to resolve the linker errors by passing a certain set of arguments to the linker. 
This isn't the approach that we will use for our kernel, so this section is optional and only provided for completeness.
In this section we discuss the linker errors that occur on Linux, Windows, and macOS, and explain how to solve them by passing additional arguments to the linker. Note that the executable format and the linker differ between operating systems, so a different set of arguments is required for each system.

#### Linux

On Linux the following linker error occurs (shortened):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x12): undefined reference to `__libc_csu_fini'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x19): undefined reference to `__libc_csu_init'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x25): undefined reference to `__libc_start_main'
          collect2: error: ld returned 1 exit status
```

The problem is that the linker includes the startup routine of the C runtime by default, which is also called `_start`. It requires some symbols of the C standard library `libc` that we don't include due to the `no_std` attribute, so the linker can't resolve these references. To solve this, we can tell the linker that it should not link the C startup routine by passing the `-nostartfiles` flag.

One way to pass linker arguments via cargo is the `cargo rustc` command. The command behaves exactly like `cargo build`, but allows passing options to `rustc`, the underlying Rust compiler. `rustc` has the `-C link-arg` flag, which passes an argument to the linker. Combined, our new build command looks like this:

```
cargo rustc -- -C link-arg=-nostartfiles
```

Now our crate builds as a freestanding executable on Linux!

We didn't need to specify the name of our entry point function explicitly since the linker looks for a function with the name `_start` by default.
#### Windows On Windows, a different linker error occurs (shortened): ``` error: linking with `link.exe` failed: exit code: 1561 | = note: "C:\\Program Files (x86)\\…\\link.exe" […] = note: LINK : fatal error LNK1561: entry point must be defined ``` The "entry point must be defined" error means that the linker can't find the entry point. On Windows, the default entry point name [depends on the used subsystem][windows-subsystems]. For the `CONSOLE` subsystem, the linker looks for a function named `mainCRTStartup` and for the `WINDOWS` subsystem, it looks for a function named `WinMainCRTStartup`. To override the default and tell the linker to look for our `_start` function instead, we can pass an `/ENTRY` argument to the linker: [windows-subsystems]: https://docs.microsoft.com/en-us/cpp/build/reference/entry-entry-point-symbol ``` cargo rustc -- -C link-arg=/ENTRY:_start ``` From the different argument format we clearly see that the Windows linker is a completely different program than the Linux linker. Now a different linker error occurs: ``` error: linking with `link.exe` failed: exit code: 1221 | = note: "C:\\Program Files (x86)\\…\\link.exe" […] = note: LINK : fatal error LNK1221: a subsystem can't be inferred and must be defined ``` This error occurs because Windows executables can use different [subsystems][windows-subsystems]. For normal programs, they are inferred depending on the entry point name: If the entry point is named `main`, the `CONSOLE` subsystem is used, and if the entry point is named `WinMain`, the `WINDOWS` subsystem is used. Since our `_start` function has a different name, we need to specify the subsystem explicitly: ``` cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console" ``` We use the `CONSOLE` subsystem here, but the `WINDOWS` subsystem would work too. Instead of passing `-C link-arg` multiple times, we use `-C link-args` which takes a space separated list of arguments. 
With this command, our executable should build successfully on Windows. #### macOS On macOS, the following linker error occurs (shortened): ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: entry point (_main) undefined. for architecture x86_64 clang: error: linker command failed with exit code 1 […] ``` This error message tells us that the linker can't find an entry point function with the default name `main` (for some reason, all functions are prefixed with a `_` on macOS). To set the entry point to our `_start` function, we pass the `-e` linker argument: ``` cargo rustc -- -C link-args="-e __start" ``` The `-e` flag specifies the name of the entry point function. Since all functions have an additional `_` prefix on macOS, we need to set the entry point to `__start` instead of `_start`. Now the following linker error occurs: ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: dynamic main executables must link with libSystem.dylib for architecture x86_64 clang: error: linker command failed with exit code 1 […] ``` macOS [does not officially support statically linked binaries] and requires programs to link the `libSystem` library by default. To override this and link a static binary, we pass the `-static` flag to the linker: [does not officially support statically linked binaries]: https://developer.apple.com/library/archive/qa/qa1118/_index.html ``` cargo rustc -- -C link-args="-e __start -static" ``` This still does not suffice, as a third linker error occurs: ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: library not found for -lcrt0.o clang: error: linker command failed with exit code 1 […] ``` This error occurs because programs on macOS link to `crt0` (“C runtime zero”) by default. 
This is similar to the error we had on Linux and can also be solved by adding the `-nostartfiles` linker argument:

```
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

Now our program should build successfully on macOS.

#### Unifying the Build Commands

Right now we have different build commands depending on the host platform, which is not ideal. To avoid this, we can create a file named `.cargo/config.toml` that contains the platform-specific arguments:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "linux")']
rustflags = ["-C", "link-arg=-nostartfiles"]

[target.'cfg(target_os = "windows")']
rustflags = ["-C", "link-args=/ENTRY:_start /SUBSYSTEM:console"]

[target.'cfg(target_os = "macos")']
rustflags = ["-C", "link-args=-e __start -static -nostartfiles"]
```

The `rustflags` key contains arguments that are automatically added to every invocation of `rustc`. For more information on the `.cargo/config.toml` file, check out the [official documentation](https://doc.rust-lang.org/cargo/reference/config.html).

Now our program should be buildable on all three platforms with a simple `cargo build`.

#### Should You Do This?

While it's possible to build a freestanding executable for Linux, Windows, and macOS, it's probably not a good idea. The reason is that our executable still expects various things, for example that a stack is initialized when the `_start` function is called. Without the C runtime, some of these requirements might not be fulfilled, which might cause our program to fail, e.g. through a segmentation fault.

If you want to create a minimal binary that runs on top of an existing operating system, including `libc` and setting the `#[start]` attribute as described [here](https://doc.rust-lang.org/1.16.0/book/no-stdlib.html) is probably a better idea.
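As an aside, the three argument sets from the `.cargo/config.toml` above can also be written down in Rust, for example as a helper for a Cargo build script. The sketch below is an alternative under stated assumptions, not the approach this post uses: `link_args` and `cargo_directives` are hypothetical names, and the sketch relies on Cargo's `cargo:rustc-link-arg=FLAG` build-script directive to forward each argument to the linker.

```rust
/// Linker arguments per `target_os`, copied from the config file above.
/// Hypothetical helper; unknown systems get no extra arguments.
pub fn link_args(target_os: &str) -> &'static [&'static str] {
    match target_os {
        "linux" => &["-nostartfiles"],
        "windows" => &["/ENTRY:_start", "/SUBSYSTEM:console"],
        "macos" => &["-e", "__start", "-static", "-nostartfiles"],
        _ => &[],
    }
}

/// Renders one `cargo:rustc-link-arg=...` line per linker argument,
/// in the form a build script would print to standard output.
pub fn cargo_directives(target_os: &str) -> Vec<String> {
    link_args(target_os)
        .iter()
        .map(|arg| format!("cargo:rustc-link-arg={}", arg))
        .collect()
}
```

A `build.rs` could read the `CARGO_CFG_TARGET_OS` environment variable that Cargo sets for build scripts and print each returned line. The caveats from the section above apply unchanged: the resulting executable still lacks the setup that the C runtime normally performs.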
## Summary

A minimal freestanding Rust binary looks like this:

`src/main.rs`:

```rust
#![no_std] // don't link the Rust standard library
#![no_main] // disable all Rust-level entry points

use core::panic::PanicInfo;

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    // this function is the entry point, since the linker looks for a function
    // named `_start` by default
    loop {}
}

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

`Cargo.toml`:

```toml
[package]
name = "crate_name"
version = "0.1.0"
authors = ["Author Name"]

# the profile used for `cargo build`
[profile.dev]
panic = "abort" # disable stack unwinding on panic

# the profile used for `cargo build --release`
[profile.release]
panic = "abort" # disable stack unwinding on panic
```

To build this binary, we need to compile for a bare metal target such as `thumbv7em-none-eabihf`:

```
cargo build --target thumbv7em-none-eabihf
```

Alternatively, we can compile it for the host system by passing additional linker arguments:

```bash
# Linux
cargo rustc -- -C link-arg=-nostartfiles
# Windows
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
# macOS
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

Note that this is just a minimal example of a freestanding Rust binary. This binary expects various things, for example, that a stack is initialized when the `_start` function is called. **So for any real use of such a binary, more steps are required**.

## Making `rust-analyzer` happy

The [`rust-analyzer`](https://rust-analyzer.github.io/) project is a great way to get code completion and "go to definition" support (and many other features) for Rust code in your editor. It works really well for `#![no_std]` projects too, so I recommend using it for kernel development!
If you're using the [`checkOnSave`](https://rust-analyzer.github.io/book/configuration.html#checkOnSave) feature of `rust-analyzer` (enabled by default), it might report an error for the panic function of our kernel: ``` found duplicate lang item `panic_impl` ``` The reason for this error is that `rust-analyzer` invokes `cargo check --all-targets` by default, which also tries to build the binary in [test](https://doc.rust-lang.org/book/ch11-01-writing-tests.html) and [benchmark](https://doc.rust-lang.org/rustc/tests/index.html#benchmarks) mode.
### The two meanings of "target"

The `--all-targets` flag is completely unrelated to the `--target` argument. There are two different meanings of the term "target" in `cargo`:

- The `--target` flag specifies the **[_compilation target_]** that should be passed to the `rustc` compiler. This should be set to the [target triple] of the machine that should run our code.
- The `--all-targets` flag references the **[_package target_]** of Cargo. Cargo packages can be a library and a binary at the same time, so you can specify how you would like to build your crate. In addition, Cargo also has package targets for [examples](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#examples), [tests](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#tests), and [benchmarks](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#benchmarks). These package targets can co-exist, so you can build/check the same crate in e.g. library or test mode.

[_compilation target_]: https://doc.rust-lang.org/rustc/targets/index.html
[target triple]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple
[_package target_]: https://doc.rust-lang.org/cargo/reference/cargo-targets.html
    By default, `cargo check` only builds the _library_ and _binary_ package targets. However, `rust-analyzer` chooses to check all package targets by default when [`checkOnSave`](https://rust-analyzer.github.io/book/configuration.html#checkOnSave) is enabled. This is the reason that `rust-analyzer` reports the above `lang item` error that we don't see in `cargo check`. If we run `cargo check --all-targets`, we see the error too: ``` error[E0152]: found duplicate lang item `panic_impl` --> src/main.rs:13:1 | 13 | / fn panic(_info: &PanicInfo) -> ! { 14 | | loop {} 15 | | } | |_^ | = note: the lang item is first defined in crate `std` (which `test` depends on) = note: first definition in `std` loaded from /home/[...]/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-8df6be531efb3fd0.rlib = note: second definition in the local crate (`blog_os`) ``` The first `note` tells us that the panic language item is already defined in the `std` crate, which is a dependency of the `test` crate. The `test` crate is automatically included when building a crate in [test mode](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#tests). This does not make sense for our `#![no_std]` kernel as there is no way to support the standard library on bare metal. So this error is not relevant to our project and we can safely ignore it. The proper way to avoid this error is to specify in our `Cargo.toml` that our binary does not support building in `test` and `bench` modes. We can do that by adding a `[[bin]]` section to our `Cargo.toml` to [configure the build](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#configuring-a-target) of our binary: ```toml # in Cargo.toml [[bin]] name = "blog_os" test = false bench = false ``` The double-brackets around `bin` are not a mistake, this is how the TOML format defines keys that can appear multiple times. 
Since a crate can have multiple binaries, the `[[bin]]` section can appear multiple times in the `Cargo.toml` as well. This is also the reason for the mandatory `name` field, which needs to match the name of the binary (so that `cargo` knows which settings should be applied to which binary). By setting the [`test`](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#the-test-field) and [`bench` ](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#the-bench-field) fields to `false`, we instruct `cargo` to not build our binary in test or benchmark mode. Now `cargo check --all-targets` should not throw any errors anymore, and the `checkOnSave` implementation of `rust-analyzer` should be happy too. ## What's next? The [next post] explains the steps needed for turning our freestanding binary into a minimal operating system kernel. This includes creating a custom target, combining our executable with a bootloader, and learning how to print something to the screen. [next post]: @/edition-2/posts/02-minimal-rust-kernel/index.md ================================================ FILE: blog/content/edition-2/posts/01-freestanding-rust-binary/index.pt-BR.md ================================================ +++ title = "Um Binário Rust Independente" weight = 1 path = "pt-BR/freestanding-rust-binary" date = 2018-02-10 [extra] chapter = "O Básico" # Please update this when updating the translation translation_based_on_commit = "624f0b7663daca1ce67f297f1c450420fbb4d040" # GitHub usernames of the people that translated this post translators = ["richarddalves"] +++ O primeiro passo para criar nosso próprio kernel de sistema operacional é criar um executável Rust que não vincule a biblioteca padrão. Isso torna possível executar o código Rust no [bare metal] sem um sistema operacional subjacente. [bare metal]: https://en.wikipedia.org/wiki/Bare_machine Este blog é desenvolvido abertamente no [GitHub]. Se você tiver algum problema ou dúvida, abra um issue lá. 
Você também pode deixar comentários [na parte inferior]. O código-fonte completo desta publicação pode ser encontrado na branch [`post-01`][post branch].

[GitHub]: https://github.com/phil-opp/blog_os
[na parte inferior]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-01

## Introdução

Para escrever um kernel de sistema operacional, precisamos de código que não dependa de nenhum recurso do sistema operacional. Isso significa que não podemos usar threads, arquivos, memória heap, rede, números aleatórios, saída padrão ou qualquer outro recurso que exija abstrações do sistema operacional ou hardware específico. Isso faz sentido, já que estamos tentando escrever nosso próprio sistema operacional e nossos próprios drivers.

Isso significa que não podemos usar a maior parte da [biblioteca padrão do Rust][Rust standard library], mas há muitos recursos do Rust que _podemos_ usar. Por exemplo, podemos usar [iteradores], [closures], [pattern matching], [option] e [result], [formatação de string] e, claro, o [sistema de ownership]. Esses recursos tornam possível escrever um kernel de uma maneira muito expressiva e de alto nível, sem nos preocuparmos com [undefined behavior] ou [memory safety].
[option]: https://doc.rust-lang.org/core/option/ [result]:https://doc.rust-lang.org/core/result/ [Rust standard library]: https://doc.rust-lang.org/std/ [iteradores]: https://doc.rust-lang.org/book/ch13-02-iterators.html [closures]: https://doc.rust-lang.org/book/ch13-01-closures.html [pattern matching]: https://doc.rust-lang.org/book/ch06-00-enums.html [formatação de string]: https://doc.rust-lang.org/core/macro.write.html [sistema de ownership]: https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html [undefined behavior]: https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs [memory safety]: https://tonyarcieri.com/it-s-time-for-a-memory-safety-intervention Para criar um kernel de sistema operacional em Rust, precisamos criar um executável que possa ser executado sem um sistema operacional subjacente. Esse executável é frequentemente chamado de executável “autônomo” ou “bare-metal”. Este post descreve as etapas necessárias para criar um binário Rust independente e explica por que essas etapas são necessárias. Se você estiver interessado apenas em um exemplo mínimo, pode **[ir para o resumo](#resumo)**. ## Desativando a biblioteca padrão Por padrão, todos as crates Rust vinculam a [biblioteca padrão], que depende do sistema operacional para recursos como threads, arquivos ou rede. Ela também depende da biblioteca padrão C `libc`, que interage intimamente com os serviços do sistema operacional. Como nosso plano é escrever um sistema operacional, não podemos usar nenhuma biblioteca dependente de um sistema operacional. Portanto, temos que desativar a inclusão automática da biblioteca padrão por meio do [atributo `no_std`]. [biblioteca padrão]: https://doc.rust-lang.org/std/ [atributo `no_std`]: https://doc.rust-lang.org/1.30.0/book/first-edition/using-rust-without-the-standard-library.html Começamos criando um novo projeto de binário cargo. 
A maneira mais fácil de fazer isso é através da linha de comando:

```
cargo new blog_os --bin --edition 2024
```

Eu nomeei o projeto `blog_os`, mas claro que você pode escolher o seu próprio nome. A flag `--bin` especifica que queremos criar um executável binário (em contraste com uma biblioteca) e a flag `--edition 2024` especifica que queremos usar a [edição 2024] de Rust para nossa crate. Quando executamos o comando, o cargo cria a seguinte estrutura de diretório para nós:

[edição 2024]: https://doc.rust-lang.org/nightly/edition-guide/rust-2024/index.html

```
blog_os
├── Cargo.toml
└── src
    └── main.rs
```

O `Cargo.toml` contém a configuração da crate, por exemplo o nome da crate, o autor, o número da [versão semântica] e dependências. O arquivo `src/main.rs` contém o módulo raiz da nossa crate e nossa função `main`. Você pode compilar sua crate através de `cargo build` e então executar o binário compilado `blog_os` na subpasta `target/debug`.

[versão semântica]: https://semver.org/

### O Atributo `no_std`

Agora nossa crate implicitamente vincula a biblioteca padrão. Vamos tentar desativar isso adicionando o [atributo `no_std`]:

```rust
// main.rs

#![no_std]

fn main() {
    println!("Olá, mundo!");
}
```

Quando tentamos compilá-lo agora (executando `cargo build`), o seguinte erro ocorre:

```
error: cannot find macro `println!` in this scope
 --> src/main.rs:4:5
  |
4 |     println!("Olá, mundo!");
  |     ^^^^^^^
```

A razão deste erro é que a [macro `println`] é parte da biblioteca padrão, que não incluímos mais. Então não conseguimos mais imprimir coisas. Isso faz sentido, já que `println` escreve no [standard output], que é um descritor de arquivo especial fornecido pelo sistema operacional.
[macro `println`]: https://doc.rust-lang.org/std/macro.println.html [standard output]: https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29 Então vamos remover o println!() e tentar novamente com uma função main vazia: ```rust // main.rs #![no_std] fn main() {} ``` ``` > cargo build error: `#[panic_handler]` function required, but not found error: language item required, but not found: `eh_personality` ``` Agora o compilador está pedindo uma função `#[panic_handler]` e um _item de linguagem_. ## Implementação de Panic O atributo `panic_handler` define a função que o compilador deve invocar quando ocorre um [panic]. A biblioteca padrão fornece sua própria função de tratamento de panic, mas em um ambiente `no_std` precisamos defini-la nós mesmos: [panic]: https://doc.rust-lang.org/stable/book/ch09-01-unrecoverable-errors-with-panic.html ```rust // in main.rs use core::panic::PanicInfo; /// Esta função é chamada em caso de pânico. #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } ``` O parâmetro [`PanicInfo`][PanicInfo] contém o arquivo e a linha onde o panic aconteceu e a mensagem de panic opcional. A função nunca deve retornar, então é marcada como uma [função divergente] ao retornar o [tipo “never”] `!`. Não há muito que possamos fazer nesta função por enquanto, então apenas fazemos um loop infinito. [PanicInfo]: https://doc.rust-lang.org/nightly/core/panic/struct.PanicInfo.html [função divergente]: https://doc.rust-lang.org/1.30.0/book/first-edition/functions.html#diverging-functions [tipo “never”]: https://doc.rust-lang.org/nightly/std/primitive.never.html ## O Item de Linguagem `eh_personality` Items de linguagem são elementos especiais (traits, funções, tipos, etc.) necessários internamente pelo compilador. Por exemplo, a trait [`Copy`] é um item de linguagem que diz ao compilador quais tipos têm [_semântica de cópia_][`Copy`]. 
When we look at the [implementation][copy code], we see it has the special `#[lang = "copy"]` attribute that defines it as a language item.

[`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
[copy code]: https://github.com/rust-lang/rust/blob/485397e49a02a3b7ff77c17e4a3f16c653925cb3/src/libcore/marker.rs#L296-L299

While providing custom implementations of language items is possible, it should only be done as a last resort. The reason is that language items are highly unstable implementation details and not even type checked (so the compiler doesn't check whether a function has the right argument types). Fortunately, there is a more stable way to fix the above language item error.

The [`eh_personality` language item] marks a function that is used for implementing [stack unwinding]. By default, Rust uses unwinding to run the destructors of all live stack variables in case of a [panic]. This ensures that all used memory is freed and allows the parent thread to catch the panic and continue execution. Unwinding, however, is a complicated process and requires some OS-specific libraries (e.g. [libunwind] on Linux or [structured exception handling] on Windows), so we don't want to use it for our operating system.

[`eh_personality` language item]: https://github.com/rust-lang/rust/blob/edb368491551a77d77a48446d4ee88b35490c565/src/libpanic_unwind/gcc.rs#L11-L45
[stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php
[libunwind]: https://www.nongnu.org/libunwind/
[structured exception handling]: https://docs.microsoft.com/en-us/windows/win32/debug/structured-exception-handling

### Disabling Unwinding

There are other use cases as well for which unwinding is undesirable, so Rust provides an option to [abort on panic][abortar no panic] instead.
This disables the generation of unwinding symbol information and thus considerably reduces binary size. There are multiple places where we can disable unwinding. The easiest way is to add the following lines to our `Cargo.toml`:

```toml
[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
```

This sets the panic strategy to `abort` for both the `dev` profile (used for `cargo build`) and the `release` profile (used for `cargo build --release`). Now the `eh_personality` language item should no longer be required.

[abortar no panic]: https://github.com/rust-lang/rust/pull/32900

We have now fixed both of the above errors. However, if we try to compile it now, another error occurs:

```
> cargo build
error: requires `start` lang_item
```

Our program is missing the `start` language item, which defines the entry point.

## The `start` Attribute

One might think that the `main` function is the first function called when you run a program. However, most languages have a [runtime system], which is responsible for things such as garbage collection (e.g. in Java) or software threads (e.g. goroutines in Go). This runtime needs to be called before `main`, since it needs to initialize itself.

[runtime system]: https://en.wikipedia.org/wiki/Runtime_system

In a typical Rust binary that links the standard library, execution starts in a C runtime library called `crt0` ("C runtime zero"), which sets up the environment for a C application. This includes creating a stack and placing the arguments in the right registers. The C runtime then invokes the [entry point of the Rust runtime][rt::lang_start], which is marked by the `start` language item. Rust only has a very minimal runtime, which takes care of a few small things such as setting up stack overflow guards or printing a backtrace on panic. The runtime then finally calls the `main` function.

[rt::lang_start]: https://github.com/rust-lang/rust/blob/bb4d1491466d8239a7a5fd68bd605e3276e97afb/src/libstd/rt.rs#L32-L73

Our freestanding executable does not have access to the Rust runtime and `crt0`, so we need to define our own entry point. Implementing the `start` language item wouldn't help, since it would still require `crt0`. Instead, we need to overwrite the `crt0` entry point directly.

### Overwriting the Entry Point

To tell the Rust compiler that we don't want to use the normal entry point chain, we add the `#![no_main]` attribute.

```rust
#![no_std]
#![no_main]

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

You might notice that we removed the `main` function. The reason is that a `main` doesn't make sense without an underlying runtime that calls it. Instead, we are now overwriting the operating system entry point with our own `_start` function:

```rust
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    loop {}
}
```

By using the `#[unsafe(no_mangle)]` attribute, we disable [name mangling][mangling de nomes] to ensure that the Rust compiler really outputs a function with the name `_start`. Without the attribute, the compiler would generate some cryptic symbol such as `_ZN3blog_os4_start7hb173fedf945531caE` to give every function a unique name. The attribute is required because we need to tell the name of the entry point function to the linker in the next step.

We also have to mark the function as `extern "C"` to tell the compiler that it should use the [C calling convention][convenção de chamada C] for this function (instead of the unspecified Rust calling convention). The reason for naming the function `_start` is that this is the default entry point name for most systems.
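The effect of the two annotations can also be tried out in an ordinary hosted program. The following is a small sketch under assumptions: the function names are hypothetical and unrelated to the kernel, and it assumes a toolchain recent enough to accept the `#[unsafe(no_mangle)]` syntax used in this post.

```rust
/// Exported under exactly the symbol name `blog_os_demo_add_one`
/// (a hypothetical name chosen only for this sketch) and callable
/// with the C calling convention.
#[unsafe(no_mangle)]
pub extern "C" fn blog_os_demo_add_one(x: u32) -> u32 {
    x + 1
}

/// An ordinary Rust function: its symbol name is mangled and its
/// calling convention is left unspecified by the language.
pub fn add_one_rust(x: u32) -> u32 {
    x + 1
}

/// Calls a function through a C-ABI function pointer, which is roughly
/// how foreign code would invoke an `extern "C"` function.
pub fn call_via_c_abi(f: extern "C" fn(u32) -> u32, x: u32) -> u32 {
    f(x)
}
```

Passing `add_one_rust` to `call_via_c_abi` is rejected by the compiler because the function pointer types differ in their ABI. That mismatch is the reason `_start` must be declared `extern "C"`: whoever jumps to the entry point knows nothing about Rust's internal calling convention.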

[mangling de nomes]: https://en.wikipedia.org/wiki/Name_mangling
[convenção de chamada C]: https://en.wikipedia.org/wiki/Calling_convention

The `!` return type means that the function is diverging, i.e. it is never allowed to return. This is required because the entry point is not called by any function, but invoked directly by the operating system or bootloader. So instead of returning, the entry point should, for example, invoke the [`exit` system call] of the operating system. In our case, shutting down the machine could be a reasonable action, since there's nothing left to do if a freestanding binary returns. For now, we fulfill the requirement by looping endlessly.

[`exit` system call]: https://en.wikipedia.org/wiki/Exit_(system_call)

When we run `cargo build` now, we get an ugly _linker_ error.

## Linker Errors

The linker is a program that combines the generated code into an executable. Since the executable format differs between Linux, Windows, and macOS, each system has its own linker that throws a different error. The fundamental cause of the errors is the same: the default configuration of the linker assumes that our program depends on the C runtime, which it does not.

To solve the errors, we need to tell the linker that it should not include the C runtime. We can do this either by passing a certain set of arguments to the linker or by building for a bare metal target.

### Building for a Bare Metal Target

By default, Rust tries to build an executable that is able to run in your current system environment. For example, if you're using Windows on `x86_64`, Rust tries to build a `.exe` Windows executable that uses `x86_64` instructions. This environment is called your "host" system.

To describe different environments, Rust uses a string called [_target triple_].
You can see the target triple for your host system by running `rustc --version --verbose`:

[_target triple_]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple

```
rustc 1.91.0 (f8297e351 2025-10-28)
binary: rustc
commit-hash: f8297e351a40c1439a467bbbb6879088047f50b3
commit-date: 2025-10-28
host: x86_64-unknown-linux-gnu
release: 1.91.0
LLVM version: 21.1.2
```

The above output is from an `x86_64` Linux system. We see that the `host` triple is `x86_64-unknown-linux-gnu`, which includes the CPU architecture (`x86_64`), the vendor (`unknown`), the operating system (`linux`), and the [ABI] (`gnu`).

[ABI]: https://en.wikipedia.org/wiki/Application_binary_interface

By compiling for our host triple, the Rust compiler and the linker assume that there is an underlying operating system such as Linux or Windows that uses the C runtime by default, which causes the linker errors. So, to avoid the linker errors, we can compile for a different environment with no underlying operating system.

An example of such a bare metal environment is the `thumbv7em-none-eabihf` target triple, which describes an [embedded] [ARM] system. The details are not important, all that matters is that the target triple has no underlying operating system, which is indicated by the `none` in the target triple. To be able to compile for this target, we need to add it in rustup:

[embedded]: https://en.wikipedia.org/wiki/Embedded_system
[ARM]: https://en.wikipedia.org/wiki/ARM_architecture

```
rustup target add thumbv7em-none-eabihf
```

This downloads a copy of the standard (and core) library for the system. Now we can build our freestanding executable for this target:

```
cargo build --target thumbv7em-none-eabihf
```

By passing a `--target` argument, we [cross compile] our executable for a bare metal target system. Since the target system has no operating system, the linker does not try to link the C runtime and our build succeeds without any linker errors.
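The structure of a target triple described above can be illustrated with a few lines of string handling. This is only a sketch: the type and function names are hypothetical, the parser deliberately handles just the four-component form, and real toolchains use more elaborate rules (for example, `thumbv7em-none-eabihf` omits the vendor).

```rust
/// The four components of a full target triple such as
/// `x86_64-unknown-linux-gnu`. Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct Triple<'a> {
    arch: &'a str,
    vendor: &'a str,
    os: &'a str,
    abi: &'a str,
}

/// Splits a triple that has exactly four `-`-separated components.
/// Shorter or longer forms are rejected in this sketch.
fn parse_four_part(triple: &str) -> Option<Triple<'_>> {
    let mut parts = triple.split('-');
    let parsed = Triple {
        arch: parts.next()?,
        vendor: parts.next()?,
        os: parts.next()?,
        abi: parts.next()?,
    };
    // Reject triples with trailing extra components.
    if parts.next().is_some() {
        return None;
    }
    Some(parsed)
}

/// Heuristic from the text: a `none` component signals that there is
/// no underlying operating system.
fn is_bare_metal(triple: &str) -> bool {
    triple.split('-').any(|part| part == "none")
}
```

With these helpers, `is_bare_metal("thumbv7em-none-eabihf")` is true while the Linux host triple from the output above is not, which mirrors why only the former avoids linking the C runtime.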

[cross compile]: https://en.wikipedia.org/wiki/Cross_compiler

This is the approach that we will use for building our OS kernel. Instead of `thumbv7em-none-eabihf`, we will use a [custom target] that describes an `x86_64` bare metal environment. The details will be explained in the next post.

[custom target]: https://doc.rust-lang.org/rustc/targets/custom.html

### Linker Arguments

Instead of compiling for a bare metal system, it is also possible to resolve the linker errors by passing a certain set of arguments to the linker. This isn't the approach that we will use for our kernel, so this section is optional and only provided for completeness. Click on _"Linker Arguments"_ below to show the optional content.

Linker Arguments

In this section, we discuss the linker errors that occur on Linux, Windows, and macOS, and explain how to solve them by passing additional arguments to the linker. Note that the executable format and the linker differ between operating systems, so a different set of arguments is required for each system.

#### Linux

On Linux, the following linker error occurs (shortened):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x12): undefined reference to `__libc_csu_fini'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x19): undefined reference to `__libc_csu_init'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x25): undefined reference to `__libc_start_main'
          collect2: error: ld returned 1 exit status
```

The problem is that the linker includes the startup routine of the C runtime by default, which is also called `_start`. It requires some symbols of the C standard library `libc` that we don't include due to the `no_std` attribute, therefore the linker can't resolve these references. To solve this, we can tell the linker that it should not link the C startup routine by passing the `-nostartfiles` flag.

One way to pass linker arguments via cargo is the `cargo rustc` command. The command behaves exactly like `cargo build`, but allows passing options to `rustc`, the underlying Rust compiler. `rustc` has the `-C link-arg` flag, which passes an argument to the linker. Combined, our new build command looks like this:

```
cargo rustc -- -C link-arg=-nostartfiles
```

Now our crate builds as a freestanding executable on Linux!

We didn't need to specify the name of our entry point function explicitly, since the linker looks for a function with the name `_start` by default.

#### Windows

On Windows, a different linker error occurs (shortened):

```
error: linking with `link.exe` failed: exit code: 1561
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1561: entry point must be defined
```

The "entry point must be defined" error means that the linker can't find the entry point. On Windows, the default entry point name [depends on the used subsystem][windows-subsystems]. For the `CONSOLE` subsystem, the linker looks for a function named `mainCRTStartup`, and for the `WINDOWS` subsystem, it looks for a function named `WinMainCRTStartup`. To override the default and tell the linker to look for our `_start` function instead, we can pass an `/ENTRY` argument to the linker:

[windows-subsystems]: https://docs.microsoft.com/en-us/cpp/build/reference/entry-entry-point-symbol

```
cargo rustc -- -C link-arg=/ENTRY:_start
```

From the different argument format, we clearly see that the Windows linker is a completely different program than the Linux linker.

Now a different linker error occurs:

```
error: linking with `link.exe` failed: exit code: 1221
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1221: a subsystem can't be inferred and must be
          defined
```

This error occurs because Windows executables can use [different subsystems][windows-subsystems]. For normal programs, they are inferred depending on the entry point name: if the entry point is named `main`, the `CONSOLE` subsystem is used, and if the entry point is named `WinMain`, the `WINDOWS` subsystem is used. Since our `_start` function has a different name, we need to specify the subsystem explicitly:

```
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
```

We use the `CONSOLE` subsystem here, but the `WINDOWS` subsystem would work too.
Instead of passing `-C link-arg` multiple times, we use `-C link-args`, which takes a space-separated list of arguments.

With this command, our executable should build successfully on Windows.

#### macOS

On macOS, the following linker error occurs (shortened):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: entry point (_main) undefined. for architecture x86_64
          clang: error: linker command failed with exit code 1 […]
```

This error message tells us that the linker can't find an entry point function with the default name `main` (for some reason, all functions are prefixed with a `_` on macOS). To set the entry point to our `_start` function, we pass the `-e` linker argument:

```
cargo rustc -- -C link-args="-e __start"
```

The `-e` flag specifies the name of the entry point function. Since all functions have an additional `_` prefix on macOS, we need to set the entry point to `__start` instead of `_start`.

Now the following linker error occurs:

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: dynamic main executables must link with libSystem.dylib
          for architecture x86_64
          clang: error: linker command failed with exit code 1 […]
```

macOS [does not officially support statically linked binaries][não oferece suporte oficial a binários vinculados estaticamente] and requires programs to link the `libSystem` library by default.
To override this and link a static binary, we pass the `-static` flag to the linker:

[não oferece suporte oficial a binários vinculados estaticamente]: https://developer.apple.com/library/archive/qa/qa1118/_index.html

```
cargo rustc -- -C link-args="-e __start -static"
```

This still does not suffice, as a third linker error occurs:

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: library not found for -lcrt0.o
          clang: error: linker command failed with exit code 1 […]
```

This error occurs because programs on macOS link to `crt0` ("C runtime zero") by default. This is similar to the error we had on Linux and can also be solved by adding the `-nostartfiles` linker argument:

```
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

Now our program should build successfully on macOS.

#### Unifying the Build Commands

Right now we have different build commands depending on the host platform, which is not ideal. To avoid this, we can create a file named `.cargo/config.toml` that contains the platform-specific arguments:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "linux")']
rustflags = ["-C", "link-arg=-nostartfiles"]

[target.'cfg(target_os = "windows")']
rustflags = ["-C", "link-args=/ENTRY:_start /SUBSYSTEM:console"]

[target.'cfg(target_os = "macos")']
rustflags = ["-C", "link-args=-e __start -static -nostartfiles"]
```

The `rustflags` key contains arguments that are automatically added to every invocation of `rustc`. For more information on the `.cargo/config.toml` file, check out the [official documentation](https://doc.rust-lang.org/cargo/reference/config.html).

Now our program should be buildable on all three platforms with a simple `cargo build`.

#### Should You Do This?

While it's possible to build a freestanding executable for Linux, Windows, and macOS, it's probably not a good idea.
The reason is that our executable still expects various things, for example that a stack is initialized when the `_start` function is called. Without the C runtime, some of these requirements might not be fulfilled, which might cause our program to fail, e.g. through a segmentation fault.

If you want to create a minimal binary that runs on top of an existing operating system, including `libc` and setting the `#[start]` attribute as described [here](https://doc.rust-lang.org/1.16.0/book/no-stdlib.html) is probably a better idea.

## Summary

A minimal freestanding Rust binary looks like this:

`src/main.rs`:

```rust
#![no_std] // don't link the Rust standard library
#![no_main] // disable all Rust-level entry points

use core::panic::PanicInfo;

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    // this function is the entry point, since the linker looks for a function
    // named `_start` by default
    loop {}
}

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

`Cargo.toml`:

```toml
[package]
name = "crate_name"
version = "0.1.0"
authors = ["Author Name "]

# the profile used for `cargo build`
[profile.dev]
panic = "abort" # disable stack unwinding on panic

# the profile used for `cargo build --release`
[profile.release]
panic = "abort" # disable stack unwinding on panic
```

To build this binary, we need to compile for a bare metal target such as `thumbv7em-none-eabihf`:

```
cargo build --target thumbv7em-none-eabihf
```

Alternatively, we can compile it for the host system by passing additional linker arguments:

```bash
# Linux
cargo rustc -- -C link-arg=-nostartfiles
# Windows
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
# macOS
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

Note that this is just a minimal example of a freestanding Rust binary. This binary expects various things, for example, that a stack is initialized when the `_start` function is called. **So for any real use of such a binary, more steps are required**.

## Making `rust-analyzer` Happy

The [`rust-analyzer`](https://rust-analyzer.github.io/) project is a great way to get code completion and "go to definition" support (and many other features) for Rust code in your editor. It works really well for `#![no_std]` projects too, so I recommend using it for kernel development!

If you're using the [`checkOnSave`](https://rust-analyzer.github.io/book/configuration.html#checkOnSave) feature of `rust-analyzer` (enabled by default), it might report an error for the panic function of our kernel:

```
found duplicate lang item `panic_impl`
```

The reason for this error is that `rust-analyzer` invokes `cargo check --all-targets` by default, which also tries to build the binary in [test](https://doc.rust-lang.org/book/ch11-01-writing-tests.html) and [benchmark](https://doc.rust-lang.org/rustc/tests/index.html#benchmarks) mode.

### The two meanings of "target"

The `--all-targets` flag is completely unrelated to the `--target` argument. There are two different meanings of the term "target" in `cargo`:

- The `--target` flag specifies the [_compilation target_] that should be passed to the `rustc` compiler. This should be set to the [target triple] of the machine that should run our code.
- The `--all-targets` flag references the [_package target_] of Cargo. Cargo packages can be a library and binary at the same time, so you can specify in which way you would like to build your crate. In addition, Cargo also has package targets for [examples](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#examples), [tests](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#tests), and [benchmarks](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#benchmarks). These package targets can coexist, so you can build or check the same crate in, for example, library or test mode.

[_compilation target_]: https://doc.rust-lang.org/rustc/targets/index.html
[target triple]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple
[_package target_]: https://doc.rust-lang.org/cargo/reference/cargo-targets.html

By default, `cargo check` only builds the _library_ and _binary_ package targets. However, `rust-analyzer` chooses to check all package targets by default when [`checkOnSave`](https://rust-analyzer.github.io/book/configuration.html#checkOnSave) is enabled. This is the reason that `rust-analyzer` reports the above `lang item` error that we don't see in `cargo check`. If we run `cargo check --all-targets`, we see the error too:

```
error[E0152]: found duplicate lang item `panic_impl`
  --> src/main.rs:13:1
   |
13 | / fn panic(_info: &PanicInfo) -> ! {
14 | |     loop {}
15 | | }
   | |_^
   |
   = note: the lang item is first defined in crate `std` (which `test` depends on)
   = note: first definition in `std` loaded from /home/[...]/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-8df6be531efb3fd0.rlib
   = note: second definition in the local crate (`blog_os`)
```

The first `note` tells us that the panic language item is already defined in the `std` crate, which is a dependency of the `test` crate. The `test` crate is automatically included when building a crate in [test mode](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#tests). This does not make sense for our `#![no_std]` kernel, as there is no way to support the standard library on bare metal. So this error is not relevant to our project and we can safely ignore it.

The proper way to avoid this error is to specify in our `Cargo.toml` that our binary does not support building in `test` and `bench` modes. We can do that by adding a `[[bin]]` section to our `Cargo.toml` to [configure the build](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#configuring-a-target) of our binary:

```toml
# in Cargo.toml

[[bin]]
name = "blog_os"
test = false
bench = false
```

The double brackets around `bin` are not a mistake; this is how the TOML format defines keys that can appear multiple times.
Since a crate can have multiple binaries, the `[[bin]]` section can appear multiple times in the `Cargo.toml` as well. This is also the reason for the mandatory `name` field, which needs to match the name of the binary (so that `cargo` knows which settings should be applied to which binary).

By setting the [`test`](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#the-test-field) and [`bench`](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#the-bench-field) fields to `false`, we instruct `cargo` to not build our binary in test or benchmark mode. Now `cargo check --all-targets` should not throw any errors anymore, and the `checkOnSave` implementation of `rust-analyzer` should be happy too.

## What's next?

The [next post] explains the steps needed for turning our freestanding binary into a minimal operating system kernel. This includes creating a custom target, combining our executable with a bootloader, and learning how to print something to the screen.

[next post]: @/edition-2/posts/02-minimal-rust-kernel/index.md

================================================
FILE: blog/content/edition-2/posts/01-freestanding-rust-binary/index.ru.md
================================================
+++
title = "A Freestanding Rust Binary"
weight = 1
path = "ru/freestanding-rust-binary"
date = 2018-02-10

[extra]
translators = ["MrZloHex"]
+++

The first step in creating our own operating system kernel is to create a Rust executable that does not link the standard library. This is what makes it possible to run Rust code on the [bare metal] without an underlying operating system layer.

[bare metal]: https://en.wikipedia.org/wiki/Bare_machine

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an _issue_ there. You can also leave comments [at the bottom].
The complete source code for this post can be found in the [`post-01`][post branch] branch of the repository.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-01

## Introduction

To write an operating system kernel, we need code that does not depend on an operating system or its features. This means that we can't use threads, files, [heap memory][heap], the network, random numbers, standard output, or any other features that depend on the OS or on specific hardware.

[heap]: https://en.wikipedia.org/wiki/Heap_(data_structure)

This means that we can't use most of the [Rust standard library], but there are many other Rust features that we _can_ use. For example, [iterators], [closures], [pattern matching], [`Option`][option] and [`Result`][result], [string formatting], and, of course, the [ownership system]. These features make it possible to write a kernel in a very expressive, high-level style without worrying about [undefined behavior] or [memory safety].
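As a rough illustration of how much of the language remains available, the following sketch combines iterators, closures, pattern matching, `Option`, and `Result` while relying only on items that live in `core`. The function and type names are hypothetical and not part of the kernel built in this series.

```rust
/// Error type for the sketch below. Plain enums need no allocator or OS.
#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    NotADigit(u8),
    Overflow,
}

/// Parses ASCII decimal digits into a `u32` using an iterator, a closure,
/// pattern matching, and `Result`, all of which are provided by `core`.
fn parse_decimal(bytes: &[u8]) -> Result<u32, ParseError> {
    if bytes.is_empty() {
        return Err(ParseError::Empty);
    }
    bytes.iter().try_fold(0u32, |acc, &b| match b {
        b'0'..=b'9' => acc
            .checked_mul(10)
            .and_then(|v| v.checked_add(u32::from(b - b'0')))
            .ok_or(ParseError::Overflow),
        other => Err(ParseError::NotADigit(other)),
    })
}

/// Returns the first even value, if any, using `Option` instead of a
/// sentinel value or a null pointer.
fn first_even(values: &[u32]) -> Option<u32> {
    values.iter().copied().find(|v| v % 2 == 0)
}
```

Nothing in this snippet touches threads, files, or the heap, so the same code should compile unchanged in a `#![no_std]` crate.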

[option]: https://doc.rust-lang.org/core/option/
[result]: https://doc.rust-lang.org/core/result/
[Rust standard library]: https://doc.rust-lang.org/std/
[iterators]: https://doc.rust-lang.org/book/ch13-02-iterators.html
[closures]: https://doc.rust-lang.org/book/ch13-01-closures.html
[pattern matching]: https://doc.rust-lang.org/book/ch06-00-enums.html
[string formatting]: https://doc.rust-lang.org/core/macro.write.html
[ownership system]: https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html
[undefined behavior]: https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs
[memory safety]: https://tonyarcieri.com/it-s-time-for-a-memory-safety-intervention

To create an OS kernel in Rust, we need to create an executable that can run without an underlying operating system.

This post describes the steps needed to create a freestanding Rust binary and explains why these steps are needed. If you're only interested in a minimal example, you can jump straight to the __[summary](#summary)__.

## Disabling the Standard Library

By default, all Rust crates link the [standard library], which depends on operating system features such as threads, files, and networking. It also depends on the C standard library `libc`, which closely interacts with OS services. Since we want to write an operating system, we can't use any libraries that depend on an operating system. So we have to disable the automatic inclusion of the standard library through the [`no_std` attribute][attribute].

[standard library]: https://doc.rust-lang.org/std/
[attribute]: https://doc.rust-lang.org/1.30.0/book/first-edition/using-rust-without-the-standard-library.html

We start by creating a new cargo project. The easiest way to do this is through the command line:

```
cargo new blog_os --bin --edition 2024
```

I named the project `blog_os`, but of course you can choose your own name.
The `--bin` flag specifies that we want to create an executable binary (rather than a library), and the `--edition 2024` flag specifies that we want to use the [2024 edition of Rust][edition] for our crate. After running the command, cargo creates a directory with the following structure:

[edition]: https://doc.rust-lang.org/nightly/edition-guide/rust-2024/index.html

```
blog_os
├── Cargo.toml
└── src
    └── main.rs
```

The `Cargo.toml` contains the crate's metadata and configuration, such as the _name, author, [semantic version]_, and _dependencies_ on other crates. The `src/main.rs` file contains the root module of our crate and the `main` function. You can compile the crate with `cargo build` and then run the compiled `blog_os` program in the `target/debug` subdirectory.

[semantic version]: https://semver.org/

### The `no_std` Attribute

Right now our crate implicitly links the standard library. We can change this by adding the [`no_std` attribute][attribute]:

```rust
// main.rs

#![no_std]

fn main() {
    println!("Hello, world!");
}
```

If we try to compile the program now (by running `cargo build`), the following error appears:

```
error: cannot find macro `println!` in this scope
 --> src/main.rs:4:5
  |
4 |     println!("Hello, world!");
  |     ^^^^^^^
```

The reason for this error is that the [`println` macro][macro] is part of the standard library, which we have disabled. So we can no longer print anything. This makes sense, since `println` writes to [standard output], which in turn is a special file descriptor provided by the operating system.
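It is worth noting that only the final write to standard output needs the operating system; the formatting machinery itself lives in `core`. The following sketch shows this with a hypothetical fixed-size buffer type (`FixedBuf`, invented for this example) that implements `core::fmt::Write` and therefore works with the `write!` macro without any OS support.

```rust
use core::fmt::{self, Write};

/// A fixed-capacity text buffer standing in for an output device.
/// Hypothetical type for illustration; it needs neither heap nor OS.
struct FixedBuf {
    data: [u8; 64],
    len: usize,
}

impl FixedBuf {
    const fn new() -> Self {
        FixedBuf { data: [0; 64], len: 0 }
    }

    /// Returns the text written so far.
    fn as_str(&self) -> &str {
        // Only complete `&str` values are copied in, so this is valid UTF-8;
        // fall back to an empty string rather than panicking.
        core::str::from_utf8(&self.data[..self.len]).unwrap_or("")
    }
}

impl Write for FixedBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len.checked_add(s.len()).ok_or(fmt::Error)?;
        if end > self.data.len() {
            // Not enough room: report an error instead of truncating.
            return Err(fmt::Error);
        }
        self.data[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}
```

After `write!(buf, "{} + {} = {}", 1, 2, 3)`, the buffer holds the formatted text. A kernel can use the same trait to send formatted output to whatever device it controls.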

[macro]: https://doc.rust-lang.org/std/macro.println.html
[standard output]: https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29

Let's remove the `println` macro and try to compile again:

```rust
// main.rs

#![no_std]

fn main() {}
```

```
> cargo build
error: `#[panic_handler]` function required, but not found
error: language item required, but not found: `eh_personality`
```

Now the compiler can't find a `#[panic_handler]` function and a "language item".

## Panic Implementation

The `panic_handler` attribute defines the function that should be invoked when a [panic] occurs. The standard library provides its own panic handler function, but after disabling the standard library we have to write our own handler:

[panic]: https://doc.rust-lang.org/stable/book/ch09-01-unrecoverable-errors-with-panic.html

```rust
// in main.rs

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

The [`PanicInfo`][PanicInfo] parameter contains the file and line where the panic happened and an optional message explaining it. The function should never return, so it is a [diverging function][diverging functions], which is expressed by returning the ["never" type] `!`. There is not much we can do in this function for now, so we just enter an infinite loop.

[PanicInfo]: https://doc.rust-lang.org/nightly/core/panic/struct.PanicInfo.html
[diverging functions]: https://doc.rust-lang.org/1.30.0/book/first-edition/functions.html#diverging-functions
["never" type]: https://doc.rust-lang.org/nightly/std/primitive.never.html

## The `eh_personality` Language Item

Language items are special functions and types that are required by the compiler. For example, the [`Copy`] trait tells the compiler which types have [_copy semantics_][`Copy`].
When we look at the [implementation][copy code] of this trait, we see the special `#[lang = "copy"]` attribute, which marks the trait as a language item.

[`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
[copy code]: https://github.com/rust-lang/rust/blob/485397e49a02a3b7ff77c17e4a3f16c653925cb3/src/libcore/marker.rs#L296-L299

While it is possible to provide custom implementations of language items, this should only be done as a last resort. The reason is that language items are highly unstable implementation details that are not even type checked (so the compiler doesn't even check whether a function has the right argument types). Fortunately, there is a more stable way to fix the above error.

The [`eh_personality` language item][language item] marks a function that is used for implementing [stack unwinding]. By default, Rust uses unwinding to run the destructors of all _live_ stack variables in case of a [panic]. This ensures that all used memory is freed and allows the parent thread to catch the panic and continue execution. Unwinding, however, is a complicated process and requires some OS-specific libraries (e.g. [libunwind] on Linux or [structured exception handling] on Windows), so we don't want to use it for our operating system.

[language item]: https://github.com/rust-lang/rust/blob/edb368491551a77d77a48446d4ee88b35490c565/src/libpanic_unwind/gcc.rs#L11-L45
[stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php
[libunwind]: https://www.nongnu.org/libunwind/
[structured exception handling]: https://docs.microsoft.com/ru-ru/windows/win32/debug/structured-exception-handling

### Disabling Unwinding

There are other use cases as well for which unwinding is undesirable, so Rust provides an option to [abort on panic] instead.
This disables the generation of unwinding symbol information and thus considerably reduces binary size. There are multiple places where we can disable unwinding. The easiest way is to add the following lines to our `Cargo.toml`:

```toml
[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
```

This sets the panic strategy to `abort` for both the `dev` profile (used for `cargo build`) and the `release` profile (used for `cargo build --release`). Now the `eh_personality` language item should no longer be required.

[abort on panic]: https://github.com/rust-lang/rust/pull/32900

We have now fixed both of the above errors. However, if we try to compile the program now, another error occurs:

```
> cargo build
error: requires `start` lang_item
```

Our program is missing the `start` language item, which defines the entry point of the program.

## The `start` Attribute

One might think that the `main` function is the first function called when a program runs. However, most languages have a [runtime system], which is responsible for things such as garbage collection (e.g. in Java) or software threads (e.g. goroutines in Go). This runtime needs to be called before `main`, since it has to initialize itself.

[runtime system]: https://en.wikipedia.org/wiki/Runtime_system

In a typical Rust binary that links the standard library, execution starts in a C runtime library called `crt0` ("C runtime zero"), which sets up the environment for a C application. This includes creating a stack and placing the arguments in the right registers. The C runtime then invokes the [entry point of the Rust runtime][rt::lang_start], which is marked by the `start` language item. Rust has only a very small runtime, which takes care of a few details such as setting up stack overflow guards or printing a message on panic. The runtime then calls the `main` function.
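The call chain described above can be pictured with a toy model built from ordinary functions. This is only an illustration under loud assumptions: `fake_crt0`, `fake_lang_start`, and `user_main` are hypothetical stand-ins, not the real symbols, and the real stages do far more work than pushing a string into a log.

```rust
/// Records the order in which each startup stage runs.
pub type Log = Vec<&'static str>;

/// Stand-in for the program's `main` function.
fn user_main(log: &mut Log) {
    log.push("main");
}

/// Stand-in for the Rust runtime entry point (the `start` language item).
fn fake_lang_start(main: fn(&mut Log), log: &mut Log) {
    log.push("rust runtime: set up stack overflow guards");
    main(log);
}

/// Stand-in for `crt0`, the first code that runs in a hosted program.
pub fn fake_crt0() -> Log {
    let mut log = Log::new();
    log.push("crt0: create stack, place arguments");
    fake_lang_start(user_main, &mut log);
    log
}
```

The returned log lists the three stages in the order they ran, with `main` last. A freestanding binary has neither of the first two stages, which is why it needs its own entry point.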

[rt::lang_start]: https://github.com/rust-lang/rust/blob/bb4d1491466d8239a7a5fd68bd605e3276e97afb/src/libstd/rt.rs#L32-L73

Our freestanding executable does not have access to the Rust runtime and `crt0`, so we need to define our own entry point. Implementing the `start` language item wouldn't help, since it would still require `crt0`. Instead, we need to overwrite the `crt0` entry point directly.

### Overwriting the Entry Point

To tell the Rust compiler that we don't want to use the normal entry point chain, we add the `#![no_main]` attribute.

```rust
#![no_std]
#![no_main]

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

You might notice that we removed the `main` function. The reason is that a `main` doesn't make sense without an underlying runtime that calls it. Instead, we overwrite the operating system entry point with our own `_start` function:

```rust
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    loop {}
}
```

By using the `#[unsafe(no_mangle)]` attribute, we disable [name mangling] to ensure that the Rust compiler really outputs a function with the name `_start`. Without the attribute, the compiler would generate some cryptic symbol such as `_ZN3blog_os4_start7hb173fedf945531caE` to give every function a unique name. The attribute is required because we need to tell the name of the entry point function to the linker in the next step.

We also have to mark the function as `extern "C"` to tell the compiler that it should use the [C calling convention] for this function (instead of the unspecified Rust calling convention). The reason for naming the function `_start` is that this is the default entry point name for most systems.
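The effect of `#[unsafe(no_mangle)]` can be observed in an ordinary hosted program. The sketch below exports a function under a fixed symbol name and then refers to it purely by that symbol, the way a linker would. The names `blog_os_demo_entry`, `entry_by_symbol`, and `call_entry_by_symbol` are hypothetical, and the function returns a value instead of diverging so that it can be called; the sketch assumes a toolchain recent enough to accept the `#[unsafe(...)]` attribute syntax used in this post.

```rust
/// Exported under the exact symbol name `blog_os_demo_entry` (hypothetical name).
#[unsafe(no_mangle)]
pub extern "C" fn blog_os_demo_entry() -> u32 {
    42
}

// Declare the same symbol as if it lived in foreign code. The linker can only
// resolve this declaration because the definition above is not mangled.
unsafe extern "C" {
    #[link_name = "blog_os_demo_entry"]
    fn entry_by_symbol() -> u32;
}

/// Calls the function through its raw symbol name.
pub fn call_entry_by_symbol() -> u32 {
    // SAFETY: the symbol is defined above with a matching signature and ABI.
    unsafe { entry_by_symbol() }
}
```

If the attribute were removed, the definition would get a mangled name and the declaration would be expected to fail at link time with an undefined symbol. The linker looking for `_start` is in the same position.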

[name mangling]: https://en.wikipedia.org/wiki/Name_mangling
[C calling convention]: https://en.wikipedia.org/wiki/Calling_convention

The `!` return type means that the function is diverging, i.e. not allowed to ever return. This is required because the entry point is not called by any function, but invoked directly by the operating system or bootloader. So instead of returning, the entry point should, for example, invoke the [`exit` system call] of the operating system. In our case, shutting down the machine could be a reasonable action, since there is nothing left to do if a freestanding binary returns. For now, we fulfill the requirement by looping endlessly.

[`exit` system call]: https://en.wikipedia.org/wiki/Exit_(system_call)

When we run `cargo build` now, we get a _linker_ error.

## Linker Errors

The linker is a program that combines the generated code into an executable. Since the executable format differs between Linux, Windows, and macOS, each system has its own linker, and each one reports a different error. The fundamental cause of the errors is the same: the default configuration of the linker assumes that our program depends on the C runtime, which it does not.

To solve the errors, we need to tell the linker that it should not include the C runtime. We can do this either by passing a certain set of arguments to the linker or by building for a bare metal target.

### Building for a Bare Metal Target

By default, Rust tries to build an executable that is able to run in your current system environment. For example, if you're using Windows on `x86_64`, Rust tries to build a `.exe` Windows executable that uses `x86_64` instructions. This environment is called your "host" system.

To describe different environments, Rust uses a string called [_target triple_].
You can see the target triple for your host system by running `rustc --version --verbose`:

[_target triple_]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple

```
rustc 1.35.0-nightly (474e7a648 2019-04-07)
binary: rustc
commit-hash: 474e7a6486758ea6fc761893b1a49cd9076fb0ab
commit-date: 2019-04-07
host: x86_64-unknown-linux-gnu
release: 1.35.0-nightly
LLVM version: 8.0
```

The above output is from an `x86_64` Linux system. We see that the `host` triple is `x86_64-unknown-linux-gnu`, which includes the CPU architecture (`x86_64`), the vendor (`unknown`), the operating system (`linux`), and the [ABI] (`gnu`).

[ABI]: https://en.wikipedia.org/wiki/Application_binary_interface

By compiling for our host triple, the Rust compiler and the linker assume that there is an underlying operating system such as Linux or Windows that uses the C runtime by default, which causes the linker errors. So, to avoid the linker errors, we can compile for a different environment with no underlying operating system.

An example of such a bare metal environment is the `thumbv7em-none-eabihf` target triple, which describes an [ARM] system. The details are not important; all that matters is that the target triple has no underlying operating system, which is indicated by the `none` in the target triple. To be able to compile for this target, we need to add it in rustup:

[ARM]: https://en.wikipedia.org/wiki/ARM_architecture

```
rustup target add thumbv7em-none-eabihf
```

This downloads a copy of the standard (and `core`) library for the system. Now we can build our freestanding executable for this target:

```
cargo build --target thumbv7em-none-eabihf
```

By passing a `--target` argument, we [cross compile] our executable for a bare metal target system. Since the target system has no operating system, the linker does not try to link the C runtime, and our build succeeds without any linker errors.
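The role of the `none` component can be made concrete with a small string helper. This is a rough heuristic for illustration only, not how `rustc` itself classifies targets: triples may omit components (`thumbv7em-none-eabihf` has no vendor field), so the sketch simply looks for a literal `none` after the architecture. The function names `arch` and `is_bare_metal` are hypothetical.

```rust
/// Returns the architecture, i.e. the first component of a target triple.
pub fn arch(triple: &str) -> &str {
    triple.split('-').next().unwrap_or("")
}

/// Heuristic: treat a target as bare metal if any component after the
/// architecture is literally `none`.
pub fn is_bare_metal(triple: &str) -> bool {
    triple.split('-').skip(1).any(|part| part == "none")
}
```

With this helper, `is_bare_metal("thumbv7em-none-eabihf")` is true and `is_bare_metal("x86_64-unknown-linux-gnu")` is false, which matches the distinction the text draws between a bare metal target and a host triple.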

[cross compile]: https://en.wikipedia.org/wiki/Cross_compiler

This is the approach that we will use for building our OS kernel. Instead of `thumbv7em-none-eabihf`, we will use a [custom target] that describes an `x86_64` bare metal environment. The details will be explained in the next post.

[custom target]: https://doc.rust-lang.org/rustc/targets/custom.html

### Linker Arguments

Instead of compiling for a bare metal system, it is also possible to resolve the linker errors by passing a certain set of arguments to the linker. This isn't the approach that we will use for our kernel, so this section is optional and only provided for completeness. Click on _"Linker Arguments"_ below to show the optional content.
Linker Arguments

In this section we discuss the linker errors that occur on Linux, Windows, and macOS, and explain how to solve them by passing additional arguments to the linker. Note that the executable format and the linker differ between operating systems, so a different set of arguments is required for each system.

#### Linux

On Linux the following linker error occurs (shortened):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x12): undefined reference to `__libc_csu_fini'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x19): undefined reference to `__libc_csu_init'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x25): undefined reference to `__libc_start_main'
          collect2: error: ld returned 1 exit status
```

The problem is that the linker includes the startup routine of the C runtime by default, which is also called `_start`. It requires some symbols of the C standard library `libc` that we don't include due to the `no_std` attribute, so the linker cannot resolve these references and reports the errors. To solve this, we can tell the linker that it should not link the C startup routine by passing the `-nostartfiles` flag.

One way to pass linker arguments via cargo is the `cargo rustc` command. The command behaves exactly like `cargo build`, but allows passing options to `rustc`, the underlying Rust compiler. `rustc` has the `-C link-arg` flag, which passes an argument to the linker. Combined, our new build command looks like this:

```
cargo rustc -- -C link-arg=-nostartfiles
```

Now our crate builds as a freestanding executable on Linux!

We didn't need to specify the name of our entry point function explicitly, since the linker looks for a function with the name `_start` by default.

#### Windows

On Windows, a different linker error occurs (shortened):

```
error: linking with `link.exe` failed: exit code: 1561
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1561: entry point must be defined
```

The "entry point must be defined" error means that the linker can't find the entry point. On Windows, the default entry point name [depends on the used subsystem][windows-subsystems]. For the `CONSOLE` subsystem, the linker looks for a function named `mainCRTStartup`, and for the `WINDOWS` subsystem, it looks for a function named `WinMainCRTStartup`. To override the default entry point name with `_start`, we can pass an `/ENTRY` argument to the linker:

[windows-subsystems]: https://docs.microsoft.com/en-us/cpp/build/reference/entry-entry-point-symbol

```
cargo rustc -- -C link-arg=/ENTRY:_start
```

From the different argument format we clearly see that the Windows linker is a completely different program than the Linux linker.

Now a different linker error occurs:

```
error: linking with `link.exe` failed: exit code: 1221
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1221: a subsystem can't be inferred and must be
          defined
```

This error occurs because Windows executables can use different [subsystems][windows-subsystems]. For normal programs, they are inferred from the entry point name: if the entry point is named `main`, the `CONSOLE` subsystem is used, and if the entry point is named `WinMain`, the `WINDOWS` subsystem is used. Since our `_start` function has a different name, we need to specify the subsystem explicitly:

```
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
```

Here we use the `CONSOLE` subsystem, but the `WINDOWS` subsystem would work too. Instead of passing `-C link-arg` multiple times, we use `-C link-args`, which takes a space-separated list of arguments.

With this command, our executable should build successfully on Windows.

#### macOS

On macOS, the following linker error occurs (shortened):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: entry point (_main) undefined. for architecture x86_64
          clang: error: linker command failed with exit code 1 […]
```

This error message tells us that the linker can't find an entry point function with the default name `main` (for some reason, all functions are prefixed with a `_` on macOS). To set the entry point to our `_start` function, we pass the `-e` linker argument:

```
cargo rustc -- -C link-args="-e __start"
```

The `-e` flag specifies the name of the entry point function. Since all functions have an additional `_` prefix on macOS, we need to set the entry point to `__start` instead of `_start`.

Now the following linker error occurs:

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: dynamic main executables must link with libSystem.dylib
          for architecture x86_64
          clang: error: linker command failed with exit code 1 […]
```

macOS [does not officially support statically linked binaries][static binary] and requires programs to link the `libSystem` library by default. To override this and link a static binary, we pass the `-static` flag to the linker:

[static binary]: https://developer.apple.com/library/archive/qa/qa1118/_index.html

```
cargo rustc -- -C link-args="-e __start -static"
```

This still does not suffice, as a third linker error occurs:

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: ld: library not found for -lcrt0.o
          clang: error: linker command failed with exit code 1 […]
```

This error occurs because programs on macOS link to `crt0` ("C runtime zero") by default.
It is similar to the error we had on Linux and can also be solved by adding the `-nostartfiles` linker argument:

```
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

Now our program should build successfully on macOS.

#### Unifying the Build Commands

Right now we have different build commands depending on the host platform, which is not ideal. To avoid this, we can create a file named `.cargo/config.toml` that contains the platform-specific arguments:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "linux")']
rustflags = ["-C", "link-arg=-nostartfiles"]

[target.'cfg(target_os = "windows")']
rustflags = ["-C", "link-args=/ENTRY:_start /SUBSYSTEM:console"]

[target.'cfg(target_os = "macos")']
rustflags = ["-C", "link-args=-e __start -static -nostartfiles"]
```

The `rustflags` key contains arguments that are automatically added to every invocation of `rustc`. For more information on the `.cargo/config.toml` file, check out the [official documentation](https://doc.rust-lang.org/cargo/reference/config.html).

Now our program should be buildable on all three platforms with a simple `cargo build`.

#### Should You Do This?

While it's possible to build a freestanding executable for Linux, Windows, and macOS, it's probably not a good idea. The reason is that our executable still expects various things, for example that a stack is initialized when the `_start` function is called. Without the C runtime, some of these requirements might not be fulfilled, which might cause our program to fail, e.g. through a segmentation fault.

If you want to create a minimal binary that runs on top of an existing operating system, including `libc` and setting the `#[start]` attribute as described [here](https://doc.rust-lang.org/1.16.0/book/no-stdlib.html) is probably a better idea.

## Summary {#summary}

A minimal freestanding Rust binary looks like this:

`src/main.rs`:

```rust
#![no_std] // don't link the Rust standard library
#![no_main] // disable all Rust-level entry points

use core::panic::PanicInfo;

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    // this function is the entry point, since the linker looks for a function
    // named `_start` by default
    loop {}
}

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

`Cargo.toml`:

```toml
[package]
name = "crate_name"
version = "0.1.0"
authors = ["Author Name"]

# the profile used for `cargo build`
[profile.dev]
panic = "abort" # disable stack unwinding on panic

# the profile used for `cargo build --release`
[profile.release]
panic = "abort" # disable stack unwinding on panic
```

To build this binary, we need to compile for a bare metal target such as `thumbv7em-none-eabihf`:

```
cargo build --target thumbv7em-none-eabihf
```

Alternatively, we can compile it for the host system by passing additional linker arguments:

```bash
# Linux
cargo rustc -- -C link-arg=-nostartfiles
# Windows
cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console"
# macOS
cargo rustc -- -C link-args="-e __start -static -nostartfiles"
```

Note that this is just a minimal example of a freestanding Rust binary. This binary expects various things, for example that a stack is initialized when the `_start` function is called. **So for any real use of such a binary, more steps are required**.

## What's next?

The [next post][next post] explains the steps needed to turn our freestanding binary into a minimal operating system kernel. This includes creating a custom target, combining our executable with a bootloader, and learning how to print something to the screen.
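
[next post]: @/edition-2/posts/02-minimal-rust-kernel/index.ru.md

================================================
FILE: blog/content/edition-2/posts/01-freestanding-rust-binary/index.zh-CN.md
================================================
+++
title = "A Freestanding Rust Binary"
weight = 1
path = "zh-CN/freestanding-rust-binary"
date = 2018-02-10

[extra]
# Please update this when updating the translation
translation_based_on_commit = "e6c148d6f47bcf8a34916393deaeb7e8da2d5e2a"
# GitHub usernames of the people that translated this post
translators = ["luojia65", "Rustin-Liu", "TheBegining", "liuyuran","ic3w1ne"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = ["JiangengDong"]
+++

Our first step is to create a Rust executable that does not link the standard library. This makes it possible to run Rust code on the **bare metal** ([bare metal]) without an underlying operating system.

[bare metal]: https://en.wikipedia.org/wiki/Bare_machine

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-01`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-01

## Introduction

To write an operating system kernel, we need code that does not depend on any operating system features. This means that we can't use threads, files, heap memory, the network, random numbers, standard output, or any other features requiring OS abstractions or specific hardware, because we are writing our own OS and our own drivers.

This means that we can't use most of the [Rust standard library](https://doc.rust-lang.org/std/), but there are a lot of Rust features that we can still use. For example, we can use [iterators](https://doc.rust-lang.org/book/ch13-02-iterators.html), [closures](https://doc.rust-lang.org/book/ch13-01-closures.html), [pattern matching](https://doc.rust-lang.org/book/ch06-00-enums.html), [Option](https://doc.rust-lang.org/core/option/), [Result](https://doc.rust-lang.org/core/result/index.html), [string formatting](https://doc.rust-lang.org/core/macro.write.html), and of course the [ownership system](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html). These features make it possible to write an operating system in an expressive, high-level way without worrying about [undefined behavior](https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs) or [memory safety](https://tonyarcieri.com/it-s-time-for-a-memory-safety-intervention).

In order to write an OS kernel in Rust, we need to create an executable that can run without an underlying operating system. Such an executable is often called a **freestanding executable** or a **bare-metal executable**.

In this post, we create a freestanding executable step by step and explain in detail why each step is needed. If you're only interested in the final code, you can jump to the summary section of this post.

## Disabling the Standard Library

By default, all Rust **crates** link the [standard library](https://doc.rust-lang.org/std/), which depends on operating system features such as threads, the file system, and networking. It also depends on the C standard library `libc`, which interacts closely with the operating system. Since our plan is to write our own operating system, we cannot use any OS-dependent libraries, so we have to disable the **automatic inclusion** of the standard library. This can be done with the [`no_std` attribute](https://doc.rust-lang.org/book/first-edition/using-rust-without-the-standard-library.html).

We start by creating a new cargo project. The easiest way to do this is with the following command:

```bash
cargo new blog_os --bin --edition 2024
```

I named the project `blog_os`, but of course you can choose your own name. The `--bin` flag specifies that we want to create an executable (in contrast to a library); cargo applies it by default even when it is not given explicitly. The `--edition 2024` flag specifies that the crate should use the [2024 edition] of Rust; when it is omitted, cargo defaults to the latest edition supported by the locally installed toolchain. When we run the command, cargo creates the following directory structure for us:

[2024 edition]: https://doc.rust-lang.org/nightly/edition-guide/rust-2024/index.html

```
blog_os
├── Cargo.toml
└── src
    └── main.rs
```

The `Cargo.toml` file contains the crate **configuration**, for example the crate name, the author, the [semver version](https://semver.org/), and the dependencies. The `src/main.rs` file contains the **root module** of the crate and the `main` function. We can compile the crate with `cargo build` and then find the compiled `blog_os` binary in the `target/debug` folder.

### The `no_std` Attribute

Right now our crate still implicitly links the standard library. To disable this, we can try adding the [`no_std` attribute](https://doc.rust-lang.org/book/first-edition/using-rust-without-the-standard-library.html):

```rust
// main.rs

#![no_std]

fn main() {
    println!("Hello, world!");
}
```

So far, so good. But when we try to build it with `cargo build`, the following error occurs:

```
error: cannot find macro `println!` in this scope
 --> src\main.rs:4:5
  |
4 |     println!("Hello, world!");
  |     ^^^^^^^
```

The reason for this error is that the [`println!` macro](https://doc.rust-lang.org/std/macro.println.html) is part of the standard library, which our project no longer includes. So we can no longer print strings. This makes sense, since `println!` writes to [standard output](https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29), which relies on a special file descriptor provided by the operating system.

So let's remove that line and try to compile again with an empty `main` function:

```rust
// main.rs

#![no_std]

fn main() {}
```

```
> cargo build
error: `#[panic_handler]` function required, but not found
error: language item required, but not found: `eh_personality`
```

Now the compiler is missing a `#[panic_handler]` function and a **language item**.

## Implementing the Panic Handler

The `panic_handler` attribute defines the function that is invoked when a panic occurs. The standard library provides its own panic handler function, but in a `no_std` environment we need to define one ourselves:

```rust
// in main.rs

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

The parameter of type [PanicInfo](https://doc.rust-lang.org/nightly/core/panic/struct.PanicInfo.html) contains the file and line where the panic happened and an optional panic message. The function never returns, so it is marked as a [diverging function]. The return type of a diverging function is the ["never" type](https://doc.rust-lang.org/nightly/std/primitive.never.html), written `!`. There is not much we can do in this function for now, so we just write an endless `loop {}`.

[diverging function]: https://doc.rust-lang.org/1.30.0/book/first-edition/functions.html#diverging-functions

## The `eh_personality` Language Item

Language items are special functions and types that are required by the compiler. For example, Rust's [Copy](https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html) trait is such a language item; it tells the compiler which types have [copy semantics](https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html). When we look at the [implementation](https://github.com/rust-lang/rust/blob/485397e49a02a3b7ff77c17e4a3f16c653925cb3/src/libcore/marker.rs#L296-L299) of the `Copy` trait, we see the special `#[lang = "copy"]` attribute that defines it as a language item and thereby connects it to the compiler.

We could implement language items ourselves, but this should only be a last resort: language items are highly unstable implementation details and are not type checked at compile time (so the compiler doesn't even ensure that their argument types are correct). Fortunately, there is a more stable way to fix the above language item error.

The function marked by the `eh_personality` language item is used for implementing [stack unwinding](https://www.bogotobogo.com/cplusplus/stackunwinding.php). When the standard library is used, Rust uses unwinding on a panic to run the **destructors** of all live stack variables. This ensures that all used memory is freed and allows the parent thread to catch the panic and continue execution. Unwinding, however, is a complicated process that usually requires OS-specific libraries, such as [libunwind](https://www.nongnu.org/libunwind/) on Linux or [structured exception handling (SEH)](https://docs.microsoft.com/en-us/windows/win32/debug/structured-exception-handling) on Windows, so we don't use it in our own operating system.

### Disabling Unwinding

There are other use cases as well in which unwinding is not needed, so Rust provides an option to [abort on panic](https://github.com/rust-lang/rust/pull/32900) instead. This option disables the generation of unwinding symbol information and thus reduces the size of the resulting binary. There are multiple ways to enable it; the easiest is to add the following lines to our `Cargo.toml`:

```toml
[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
```

This sets the panic strategy to `abort` for both the **dev profile** and the **release profile**. The `dev` profile is used for `cargo build`, and the `release` profile is used for `cargo build --release`. Now the compiler should no longer require an `eh_personality` language item.

We have now fixed both of the above errors and can compile again. However, when we try, a new error occurs:

```bash
> cargo build
error: requires `start` lang_item
```

## The `start` Language Item

Our program is missing the `start` language item, which defines the **entry point** of a program.

One might think that the `main` function is the first function called when a program runs. However, most languages have a [runtime system](https://en.wikipedia.org/wiki/Runtime_system), which is responsible for things such as **garbage collection** (e.g. in Java) or **software threads**, also called green threads (e.g. goroutines in Go). This runtime needs to start before `main`, since it has to initialize itself.

In a typical Rust binary that links the standard library, execution starts in a runtime library called `crt0`. The name `crt0` stands for "C runtime zero"; it sets up an environment suitable for running a C program, which includes creating a stack and passing in the program arguments. After that, this runtime library invokes the [entry point of the Rust runtime](https://github.com/rust-lang/rust/blob/bb4d1491466d8239a7a5fd68bd605e3276e97afb/src/libstd/rt.rs#L32-L73), which is marked by the **`start` language item**. Rust has only a very minimal runtime, which takes care of a few small things such as setting up stack overflow guards and printing a **backtrace** on panic. The runtime then calls the `main` function.

Our freestanding executable does not have access to the Rust runtime or `crt0`, so we need to define our own entry point. Implementing only the `start` language item wouldn't help, since the program would still require `crt0`. Instead, we need to overwrite the `crt0` entry point directly.

### Overwriting the Entry Point

To tell the Rust compiler that we don't want to use the predefined entry point, we add the `#![no_main]` attribute.

```rust
#![no_std]
#![no_main]

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

You might notice that we removed the `main` function. The reason is simple: without an underlying runtime that calls it, a `main` function no longer serves a purpose. To overwrite the operating system entry point, we write a `_start` function instead:

```rust
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    loop {}
}
```

We mark this function with `no_mangle` to disable [name mangling](https://en.wikipedia.org/wiki/Name_mangling) for it. This ensures that the Rust compiler outputs a function with the name `_start`; otherwise, the compiler might generate a symbol such as `_ZN3blog_os4_start7hb173fedf945531caE`, which the linker could not recognize as the entry point.

We also mark the function as `extern "C"` to tell the compiler that it should use the [C calling convention](https://en.wikipedia.org/wiki/Calling_convention) for this function instead of the Rust calling convention. The function is named `_start` because this is the default entry point name on most systems.

As with the `panic` function above, the return type of this function is `!`, which means that the function is diverging, i.e. not allowed to ever return. This is important because the entry point is not called by any function, but invoked directly by the operating system or the **bootloader**. So instead of returning, the entry point should, for example, invoke the [`exit` system call](https://en.wikipedia.org/wiki/Exit_(system_call)) provided by the operating system. In our case of writing an operating system, shutting down the machine would be a reasonable choice, since there is nothing left to do if a freestanding binary returns. For now, we satisfy the return type by looping endlessly.

If we compile the program now, we get a long and rather ugly **linker error**.

## Linker Errors

The **linker** is a program that combines the generated object files into an executable. Different operating systems such as Windows, macOS, and Linux define different executable formats, so each has its own linker, and each linker reports a different error. The fundamental cause of the errors is the same, though: the default configuration of the linker assumes that the program depends on the C runtime, which ours does not.

To solve the errors, we need to tell the linker that it should not include the C runtime. We can do this either by passing specific **linker arguments** or by building for a **bare metal target**.

### Building for a Bare Metal Target

By default, Rust tries to build an executable that fits the current system environment. For example, if you're using Windows on `x86_64`, Rust tries to build a Windows executable with the `.exe` extension that uses the `x86_64` instruction set. This environment is called your **"host" system**.

To describe different environments, Rust uses a string called a **target triple**. To see the target triple of the current system, we can run `rustc --version --verbose`:

```
rustc 1.35.0-nightly (474e7a648 2019-04-07)
binary: rustc
commit-hash: 474e7a6486758ea6fc761893b1a49cd9076fb0ab
commit-date: 2019-04-07
host: x86_64-unknown-linux-gnu
release: 1.35.0-nightly
LLVM version: 8.0
```

The above output is from an `x86_64` Linux system. We see that the `host` field contains the triple `x86_64-unknown-linux-gnu`, which includes the CPU architecture `x86_64`, the vendor `unknown`, the operating system `linux`, and the [ABI](https://en.wikipedia.org/wiki/Application_binary_interface) `gnu`.

When compiling for the triple of the current system, the Rust compiler assumes that there is an underlying operating system such as Windows or Linux that provides the C runtime, which causes the linker errors. So, to avoid these errors, we can pick a different environment that has no underlying operating system.

Such an environment is called a bare metal environment. For example, the target triple `thumbv7em-none-eabihf` describes an ARM [embedded system](https://en.wikipedia.org/wiki/Embedded_system). We don't need to understand its details for now; all that matters is that this environment has no underlying operating system, which is indicated by the `none` in the triple. To compile for this target, we need to add it with rustup:

```
rustup target add thumbv7em-none-eabihf
```

This downloads a copy of the standard library and the `core` library for the target. After that, we can build our freestanding executable for this target:

```
cargo build --target thumbv7em-none-eabihf
```

By passing the `--target` argument, we [cross compile](https://en.wikipedia.org/wiki/Cross_compiler) our program for a bare metal target system. Since the target does not include an operating system, the linker does not try to link the C runtime, so the build completes successfully without any linker errors.

This is the approach we will use for writing our own operating system kernel. Instead of compiling for `thumbv7em-none-eabihf`, we will use a [custom target](https://doc.rust-lang.org/rustc/targets/custom.html) that describes an `x86_64` environment. The details will be explained in the next post.

### Linker Arguments

Instead of compiling for a bare metal system, it is also possible to resolve the linker errors by passing specific arguments to the linker. We won't use this approach later on, so this part is optional and provided only for the sake of completeness.

If you're interested, click on _"Linker Arguments"_ below to show the optional content.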
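Linker Arguments

In this section we discuss the linker errors that occur on Linux, Windows, and macOS, and explain how to solve them by passing additional arguments. Note that the executable format differs between operating systems, so a different set of arguments is required for each system.

#### Linux

On Linux the following linker error occurs (shortened):

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" […]
  = note: /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x12): undefined reference to `__libc_csu_fini'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x19): undefined reference to `__libc_csu_init'
          /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x25): undefined reference to `__libc_start_main'
          collect2: error: ld returned 1 exit status
```

The problem is that the linker includes the C startup routine by default, which is where an entry function named `_start` is defined. That routine depends on some symbols of the C standard library `libc`, which we excluded through the `no_std` attribute, so the linker reports these errors. To solve this, we pass the `-nostartfiles` flag to tell the linker not to link the C startup routine.

Linker arguments can be passed through `cargo rustc`. The command behaves exactly like `cargo build`, but allows passing options to `rustc`, the underlying Rust compiler. `rustc` supports the `-C link-arg` flag, which passes an argument on to the linker. Combined, our build command looks like this:

```
cargo rustc -- -C link-arg=-nostartfiles
```

Now our crate builds as a freestanding executable on Linux.

We didn't need to specify the name of the entry point function explicitly, since the linker looks for a function named `_start` by default.

#### Windows

On Windows the following linker error occurs (shortened):

```
error: linking with `link.exe` failed: exit code: 1561
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1561: entry point must be defined
```

The "entry point must be defined" error means that the linker can't find the entry point. On Windows, the default entry point name [depends on the used subsystem][windows-subsystems]. For the `CONSOLE` subsystem, the linker looks for a function named `mainCRTStartup`, and for the `WINDOWS` subsystem, the entry function is named `WinMainCRTStartup`. To override the default and use our `_start` function instead, we can pass the `/ENTRY` argument to the linker:

[windows-subsystems]: https://docs.microsoft.com/en-us/cpp/build/reference/entry-entry-point-symbol

```
cargo rustc -- -C link-arg=/ENTRY:_start
```

The argument format makes it obvious that the Windows linker is a completely different program than the Linux linker.

Now a different linker error occurs:

```
error: linking with `link.exe` failed: exit code: 1221
  |
  = note: "C:\\Program Files (x86)\\…\\link.exe" […]
  = note: LINK : fatal error LNK1221: a subsystem can't be inferred and must be
          defined
```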
该错误的原因是Windows平台下的可执行文件可以使用不同的[子系统][windows-subsystems]。一般而言,链接器会根据入口函数名来推断子系统:如果入口函数名叫 `main` ,则会使用 `CONSOLE` 子系统;若名叫 `WinMain` ,则会使用 `WINDOWS` 子系统。然而此时我们使用的入口函数名叫 `_start` ,两者都不是,此时就需要显式指定子系统: ``` cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console" ``` 这里我们使用了 `CONSOLE` 子系统,如果使用 `WINDOWS` 子系统其实也可以。与其多次传递 `-C link-arg` 参数,我们可以像上面那样改用 `-C link-args`,它接受一个用引号包裹、以空格分隔的参数列表。 现在我们的程序应该可以在Windows平台成功编译了。 #### macOS 在macOS下,会触发以下链接错误(简化版): ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: entry point (_main) undefined. for architecture x86_64 clang: error: linker command failed with exit code 1 […] ``` 该错误告诉我们链接器找不到入口函数 `main` (由于某些原因,macOS平台下,所有函数都会具有 `_` 前缀)。要重设入口函数名,我们可以传入链接器参数 `-e` : ``` cargo rustc -- -C link-args="-e __start" ``` `-e` 参数可用于重设入口函数名。由于在macOS平台下,所有函数都具有 `_` 前缀,所以需要传入 `__start` ,而不是 `_start` 。 接下来,会出现一个新的链接错误: ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: dynamic main executables must link with libSystem.dylib for architecture x86_64 clang: error: linker command failed with exit code 1 […] ``` macOS [并未官方支持静态链接][does not officially support statically linked binaries] ,并且在默认情况下程序会链接 `libSystem` 库。要覆盖这个设定并进行静态链接,我们可以传入链接器参数 `-static` : [does not officially support statically linked binaries]: https://developer.apple.com/library/archive/qa/qa1118/_index.html ``` cargo rustc -- -C link-args="-e __start -static" ``` 然而问题并没有解决,链接器再次抛出了一个错误: ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: library not found for -lcrt0.o clang: error: linker command failed with exit code 1 […] ``` 该错误的原因是macOS平台下的程序会默认链接 `crt0` (即“C runtime zero”)。 这个错误实际上和Linux平台上的错误类似,可以添加链接器参数 `-nostartfiles` 解决: ``` cargo rustc -- -C link-args="-e __start -static -nostartfiles" ``` 现在,我们的程序可以在macOS下编译成功了。 #### 统一编译命令 经过上面的章节,我们知道了在各个平台使用的编译命令是不同的,这十分不优雅。要解决这个问题,我们可以创建一个 `.cargo/config.toml` 文件,分别配置不同平台下所使用的参数: ```toml # in .cargo/config.toml [target.'cfg(target_os = "linux")']
rustflags = ["-C", "link-arg=-nostartfiles"] [target.'cfg(target_os = "windows")'] rustflags = ["-C", "link-args=/ENTRY:_start /SUBSYSTEM:console"] [target.'cfg(target_os = "macos")'] rustflags = ["-C", "link-args=-e __start -static -nostartfiles"] ``` 对应的 `rustflags` 配置项的值可以自动被填充到 `rustc` 的运行参数中。要寻找 `.cargo/config.toml` 更多的用法,可以看一下 [官方文档](https://doc.rust-lang.org/cargo/reference/config.html)。 现在只需要运行 `cargo build` 即可在全部三个平台编译我们的程序了。 #### 我们真的需要做这些? 尽管我们可以在Linux、Windows和macOS编译出可执行程序,但这可能并非是个好主意。 因为我们的程序少了不少本该存在的东西,比如 `_start` 执行时的栈初始化。 失去了C运行时,部分基于它的依赖项很可能无法正确执行,这会造成程序出现各式各样的异常,比如segmentation fault(段错误)。 如果你希望创建一个基于已存在的操作系统的最小类库,建议引用 `libc` ,阅读 [这里](https://doc.rust-lang.org/1.16.0/book/no-stdlib.html) 并恰当设定 `#[start]` 比较好。
    ## 小结 一个用 Rust 编写的最小化的独立式可执行程序应该长这样: `src/main.rs`: ```rust #![no_std] // 不链接 Rust 标准库 #![no_main] // 禁用所有 Rust 层级的入口点 use core::panic::PanicInfo; #[unsafe(no_mangle)] // 不重整函数名 pub extern "C" fn _start() -> ! { // 因为链接器会寻找一个名为 `_start` 的函数,所以这个函数就是入口点 // 默认命名为 `_start` loop {} } /// 这个函数将在 panic 时被调用 #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } ``` `Cargo.toml`: ```toml [package] name = "crate_name" version = "0.1.0" authors = ["Author Name "] # 使用 `cargo build` 编译时需要的配置 [profile.dev] panic = "abort" # 禁用panic时栈展开 # 使用 `cargo build --release` 编译时需要的配置 [profile.release] panic = "abort" # 禁用 panic 时栈展开 ``` 选用任意一个裸机目标来编译。比如对 `thumbv7em-none-eabihf`,我们使用以下命令: ```bash cargo build --target thumbv7em-none-eabihf ``` 另外,我们也可以选择以本地操作系统为目标进行编译: ```bash # Linux cargo rustc -- -C link-arg=-nostartfiles # Windows cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console" # macOS cargo rustc -- -C link-args="-e __start -static -nostartfiles" ``` 要注意的是,现在我们的代码只是一个 Rust 编写的独立式可执行程序的一个例子。运行这个二进制程序还需要很多准备,比如在 `_start` 函数之前需要一个已经预加载完毕的栈。所以为了真正运行这样的程序,**我们还有很多事情需要做**。 ## 让 `rust-analyzer` 愉快工作 [`rust-analyzer`](https://rust-analyzer.github.io/) 项目是一个很好的工具,可以为你的编辑器中的 Rust 代码提供代码补全和"转到定义"支持(以及许多其他功能)。 它对于 `#![no_std]` 项目也同样出色,所以我推荐在内核开发中使用它! 如果你使用了 `rust-analyzer` 的 [`checkOnSave`](https://rust-analyzer.github.io/book/configuration.html#checkOnSave) 特性(默认开启),它可能会报告我们内核的 panic 函数存在一个错误: ``` found duplicate lang item `panic_impl` ``` 这个错误的原因是 `rust-analyzer` 默认会调用 `cargo check --all-targets`,这也会导致以[测试](https://doc.rust-lang.org/book/ch11-01-writing-tests.html)和[基准测试](https://doc.rust-lang.org/rustc/tests/index.html#benchmarks)模式构建二进制文件。
    ### "Target"的两种含义 `--all-targets` 标志与 `--target` 参数完全无关。 在 `cargo` 中,"target"这个术语有两种不同的含义: - `--target` 标志指定了应该传递给 `rustc` 编译器的 **[编译目标][compilation target]**。这应该设置为运行我们代码的机器的[目标三元组][target triple]。 - `--all-targets` 标志指的是 Cargo 的 **[包目标][package target]**。Cargo 包可以同时是库和二进制文件,因此你可以指定你想要构建 crate 的方式。此外,Cargo 还有用于[示例](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#examples)、[测试](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#tests)和[基准测试](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#benchmarks)的包目标。这些包目标可以共存,因此你可以同时在例如库模式或测试模式下构建/检查同一个 crate。 [compilation target]: https://doc.rust-lang.org/rustc/targets/index.html [target triple]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple [package target]: https://doc.rust-lang.org/cargo/reference/cargo-targets.html
    默认情况下,`cargo check` 只构建 _库(library)_ 和 _二进制(binary)_ 包目标。 然而,当启用 [`checkOnSave`](https://rust-analyzer.github.io/book/configuration.html#checkOnSave) 时,`rust-analyzer` 默认选择检查所有包目标。 这就是 `rust-analyzer` 报告上述我们在 `cargo check` 中没有看到的`语言项(lang item)`错误的原因。 如果我们运行 `cargo check --all-targets`,我们也会看到这个错误: ``` error[E0152]: found duplicate lang item `panic_impl` --> src/main.rs:13:1 | 13 | / fn panic(_info: &PanicInfo) -> ! { 14 | | loop {} 15 | | } | |_^ | = note: the lang item is first defined in crate `std` (which `test` depends on) = note: first definition in `std` loaded from /home/[...]/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-8df6be531efb3fd0.rlib = note: second definition in the local crate (`blog_os`) ``` 第一个 `note` 告诉我们 panic 语言项已经在 `std` crate 中定义了,而 `std` 是 `test` crate 的一个依赖项。 `test` crate 在以[测试模式](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#tests)构建 crate 时会自动被包含进来。 这对于我们的 `#![no_std]` 内核来说不合适,因为在裸机上没有办法支持标准库。 所以这个错误与我们的项目无关,我们可以安全地忽略它。 避免这个错误的正确方法是在我们的 `Cargo.toml` 中指定我们的二进制文件不支持以 `测试(test)` 和 `基准测试(bench)` 模式构建。 我们可以通过添加一个 `[[bin]]` 部分到 `Cargo.toml` 来[构建配置](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#configuring-a-target)我们的二进制文件: ```toml # 在 Cargo.toml 中 [[bin]] name = "blog_os" test = false bench = false ``` `bin` 周围的双括号并非笔误,这是 TOML 格式定义可多次出现的键的方式。 由于一个 crate 可以有多个二进制文件,`[[bin]]` 部分也可以在 `Cargo.toml` 中出现多次。 这也是必须有 `name` 字段的原因,它需要与二进制文件的名称匹配(以便 `cargo` 知道哪些设置应该应用于哪个二进制文件)。 通过将 [`test`](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#the-test-field) 和 [`bench`](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#the-bench-field) 字段设置为 `false`,我们指示 `cargo` 不要以测试或基准测试模式构建我们的二进制文件。 现在 `cargo check --all-targets` 应该不会再抛出任何错误了,`rust-analyzer` 的 `checkOnSave` 也应该愉快工作了。 ## 下篇预览 下一篇文章要做的事情基于我们这篇文章的成果,它将详细讲述编写一个最小的操作系统内核需要的步骤:如何配置特定的编译目标,如何将可执行程序与引导程序拼接,以及如何把一些特定的字符串打印到屏幕上。 [next post]: @/edition-2/posts/02-minimal-rust-kernel/index.md 
================================================ FILE: blog/content/edition-2/posts/01-freestanding-rust-binary/index.zh-TW.md ================================================ +++ title = "獨立的 Rust 二進制檔" weight = 1 path = "zh-TW/freestanding-rust-binary" date = 2018-02-10 [extra] # Please update this when updating the translation translation_based_on_commit = "24d04e0e39a3395ecdce795bab0963cb6afe1bfd" # GitHub usernames of the people that translated this post translators = ["wusyong"] # GitHub usernames of the people that contributed to this translation translation_contributors = ["gnitoahc"] +++ 建立我們自己的作業系統核心的第一步是建立一個不連結標準函式庫的 Rust 執行檔,這使得無需基礎作業系統即可在[裸機][bare metal]上執行 Rust 程式碼。 [bare metal]: https://en.wikipedia.org/wiki/Bare_machine 此網誌在 [GitHub] 上公開開發,如果您有任何問題或疑問,請在那開一個 issue,您也可以在[下面][at the bottom]發表評論,這篇文章的完整開源程式碼可以在 [`post-01`][post branch] 分支中找到。 [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-01 ## 介紹 要編寫作業系統核心,我們需要不依賴於任何作業系統功能的程式碼。這代表我們不能使用執行緒、檔案系統、堆記憶體、網路、隨機數、標準輸出或任何其他需要作業系統抽象或特定硬體的功能。這也是理所當然的,因為我們正在嘗試寫出自己的 OS 和我們的驅動程式。 這意味著我們不能使用大多數的 [Rust 標準函式庫][Rust standard library],但是我們還是可以使用 _很多_ Rust 的功能。比如說我們可以使用[疊代器][iterators]、[閉包][closures]、[模式配對][pattern matching]、[option] 和 [result]、[字串格式化][string formatting],當然還有[所有權系統][ownership system]。這些功能讓我們能夠以非常有表達力且高階的方式編寫核心,而無需擔心[未定義行為][undefined behavior]或[記憶體安全][memory safety]。 [option]: https://doc.rust-lang.org/core/option/ [result]:https://doc.rust-lang.org/core/result/ [Rust standard library]: https://doc.rust-lang.org/std/ [iterators]: https://doc.rust-lang.org/book/ch13-02-iterators.html [closures]: https://doc.rust-lang.org/book/ch13-01-closures.html [pattern matching]: https://doc.rust-lang.org/book/ch06-00-enums.html [string formatting]: https://doc.rust-lang.org/core/macro.write.html [ownership system]: https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html [undefined behavior]: 
https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs [memory safety]: https://tonyarcieri.com/it-s-time-for-a-memory-safety-intervention 為了在 Rust 中建立 OS 核心,我們需要建立一個無須底層作業系統即可運行的執行檔,這類的執行檔通常稱為「獨立式(freestanding)」或「裸機(bare-metal)」的執行檔。 這篇文章描述了建立一個獨立的 Rust 執行檔的必要步驟,並解釋為什麼需要這些步驟。如果您只對簡單的範例感興趣,可以直接跳到 **[總結](#summary)**。 ## 停用標準函式庫 Rust 所有的 crate 在預設情況下都會連結[標準函式庫][standard library],而標準函式庫會依賴作業系統的功能,像是執行緒、檔案系統或是網路。它也會依賴 C 語言的標準函式庫 `libc`,因為其與作業系統緊密相關。既然我們的計劃是編寫自己的作業系統,我們就得用到 [`no_std` 屬性][`no_std` attribute]來停止標準函式庫的自動引用(automatic inclusion)。 [standard library]: https://doc.rust-lang.org/std/ [`no_std` attribute]: https://doc.rust-lang.org/1.30.0/book/first-edition/using-rust-without-the-standard-library.html 我們先從建立一個新的 cargo 專案開始,最簡單的辦法是輸入下面的命令: ``` cargo new blog_os --bin --edition 2024 ``` 我將專案命名為 `blog_os`,當然讀者也可以取自己喜歡的名稱。`--bin` 選項說明我們將要建立一個執行檔(而不是一個函式庫),`--edition 2024` 選項指明我們的 crate 想使用 Rust [2024 版本][2024 edition]。當我們執行這行指令的時候,cargo 會為我們建立以下目錄結構: [2024 edition]: https://doc.rust-lang.org/nightly/edition-guide/rust-2024/index.html ``` blog_os ├── Cargo.toml └── src └── main.rs ``` `Cargo.toml` 包含 crate 的設置,像是 crate 的名稱、作者、[語意化版本][semantic version]以及依賴套件。`src/main.rs` 檔案則包含 crate 的根模組(root module)以及我們的 `main` 函式。您可以用 `cargo build` 編譯您的 crate 然後在 `target/debug` 目錄下運行編譯過後的 `blog_os` 執行檔。 [semantic version]: https://semver.org/lang/zh-TW/ ### no_std 屬性 現在我們的 crate 仍然會隱式地連結標準函式庫。讓我們加上 [`no_std` 屬性][`no_std` attribute] 來停用: ```rust // main.rs #![no_std] fn main() { println!("Hello, world!"); } ``` 當我們嘗試用 `cargo build` 編譯時會出現以下錯誤訊息: ``` error: cannot find macro `println!` in this scope --> src/main.rs:4:5 | 4 | println!("Hello, world!"); | ^^^^^^^ ``` 出現這個錯誤的原因是 [`println` 巨集(macro)][`println` macro]是標準函式庫的一部份,而我們不再包含它,所以我們無法再輸出任何東西。這也是理所當然的,因為 `println` 會寫到[標準輸出][standard output],而這是一個由作業系統提供的特殊檔案描述符。 [`println` macro]: https://doc.rust-lang.org/std/macro.println.html [standard output]:
https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29 所以讓我們移除這行程式碼,然後用空的 main 函式再試一次: ```rust // main.rs #![no_std] fn main() {} ``` ``` > cargo build error: `#[panic_handler]` function required, but not found error: language item required, but not found: `eh_personality` ``` 現在編譯器告訴我們缺少 `#[panic_handler]` 函式以及 _language item_。 ## 實作 panic 處理函式 `panic_handler` 屬性定義了當 [panic] 發生時編譯器需要呼叫的函式。標準函式庫提供了自己的 panic 處理函式,但在 `no_std` 的環境中我們得定義我們自己的: [panic]: https://doc.rust-lang.org/stable/book/ch09-01-unrecoverable-errors-with-panic.html ```rust // main.rs use core::panic::PanicInfo; /// 此函式會在 panic 時呼叫。 #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } ``` [`PanicInfo` 參數][PanicInfo]包含 panic 發生時的檔案、行數以及可選的錯誤訊息。這個函式不會返回,所以它透過返回[“never” 型態][“never” type] `!` 被標記為[發散函式][diverging function]。目前在這個函式中我們沒什麼事可以做,所以我們只需寫一個無限迴圈。 [PanicInfo]: https://doc.rust-lang.org/nightly/core/panic/struct.PanicInfo.html [diverging function]: https://doc.rust-lang.org/1.30.0/book/first-edition/functions.html#diverging-functions [“never” type]: https://doc.rust-lang.org/nightly/std/primitive.never.html ## eh_personality Language Item Language item 是一些編譯器需求的特殊函式或類型。舉例來說,Rust 的 [`Copy`] trait 就是一個 language item,告訴編譯器哪些類型擁有[_複製的語意_][`Copy`]。當我們搜尋 `Copy` trait 的[實作][copy code]時,我們會發現一個特殊的 `#[lang = "copy"]` 屬性將它定義為一個 language item。 我們可以自己實現 language item,但這只應是最後的手段。因為 language item 屬於非常不穩定的實作細節,而且不會做類型檢查(所以編譯器甚至不會確保它們的參數類型是否正確)。幸運的是,我們有更穩定的方式來修復上面關於 language item 的錯誤。 `eh_personality` language item 標記的函式將被用於實作[堆疊回溯][stack unwinding]。在預設情況下當 panic 發生時,Rust 會使用堆疊回溯來執行所有存在堆疊上變數的解構子(destructor)。這確保所有使用的記憶體都被釋放,並讓 parent thread 獲取 panic 資訊並繼續運行。但是堆疊回溯是一個複雜的過程,通常會需要一些 OS 的函式庫如 Linux 的 [libunwind] 或 Windows 的 [structured exception handling]。所以我們並不希望在我們的作業系統中使用它。 [stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php [libunwind]: https://www.nongnu.org/libunwind/ [structured exception handling]:
https://docs.microsoft.com/en-us/windows/win32/debug/structured-exception-handling ### 停用回溯 在某些狀況下回溯可能並不是我們要的功能,因此 Rust 提供了[在 panic 時中止][abort on panic]的選項。這個選項能停用回溯符號資訊的產生,也因此能縮小生成的二進制檔案大小。我們能用許多方式開啟這個選項,而最簡單的方式就是把以下幾行設置加入我們的 `Cargo.toml`: ```toml [profile.dev] panic = "abort" [profile.release] panic = "abort" ``` 這些選項能將 `dev` 設置(用於 `cargo build`)和 `release` 設置(用於 `cargo build --release`)的 panic 策略設為 `abort`。現在編譯器不會再要求我們提供 `eh_personality` language item。 [abort on panic]: https://github.com/rust-lang/rust/pull/32900 現在我們已經修復了上面的錯誤,但是如果我們嘗試編譯的話,又會出現一個新的錯誤: ``` > cargo build error: requires `start` lang_item ``` 我們的程式缺少 `start` 這個用來定義入口點(entry point)的 language item。 ## `start` 屬性 我們通常會認為執行一個程式時,首先被呼叫的是 `main` 函式。但是大多數語言都擁有一個[執行時系統][runtime system],它負責像是垃圾回收(garbage collection,例如 Java)或軟體執行緒(software threads,例如 Go 的 goroutines)之類的工作。這個執行時系統需要在 main 函式前啟動,因為它需要先進行自身的初始化。 [runtime system]: https://en.wikipedia.org/wiki/Runtime_system 在一個典型使用標準函式庫的 Rust 程式中,程式運行是從一個名為 `crt0`(“C runtime zero”)的執行時函式庫開始的,它會設置 C 程式的執行環境。這包含建立堆疊和可執行程式參數的傳入。在這之後,這個執行時函式庫會呼叫 [Rust 的執行時入口點][rt::lang_start],而此處就是由 `start` language item 標記。 Rust 只有一個非常小的執行時系統,負責處理一些小事情,像是設置堆疊溢位防護或是在 panic 時印出回溯訊息。最後執行時系統才會呼叫 main 函式。 [rt::lang_start]: https://github.com/rust-lang/rust/blob/bb4d1491466d8239a7a5fd68bd605e3276e97afb/src/libstd/rt.rs#L32-L73 我們的獨立式可執行檔並沒有辦法存取 Rust 執行時系統或 `crt0`,所以我們需要定義自己的入口點。實作 `start` language item 並沒有用,因為這樣還是會需要 `crt0`。所以我們要做的是直接覆寫 `crt0` 的入口點。 ### 重寫入口點 為了告訴 Rust 編譯器我們不要使用一般的入口點呼叫順序,我們先加上 `#![no_main]` 屬性。 ```rust #![no_std] #![no_main] use core::panic::PanicInfo; /// 此函式會在 panic 時呼叫。 #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } ``` 您可能會注意到我們移除了 `main` 函式,原因是既然沒有了底層的執行時系統來呼叫它,那麼 `main` 也沒必要存在。我們要重寫作業系統的入口點,定義為 `_start` 函式: ```rust #[unsafe(no_mangle)] pub extern "C" fn _start() -> !
{ loop {} } ``` 我們使用 `no_mangle` 屬性來停用[名稱重整][name mangling],確保 Rust 編譯器輸出的函式名稱會是 `_start`。沒有這個屬性的話,編譯器會產生符號像是 `_ZN3blog_os4_start7hb173fedf945531caE` 來讓每個函式的名稱都是獨一無二的。我們會需要這項屬性的原因是因為我們接下來希望連結器能夠呼叫入口點函式的名稱。 我們還將函式標記為 `extern "C"` 來告訴編譯器這個函式應當使用 [C 的呼叫慣例][C calling convention],而不是 Rust 的呼叫慣例。而函式名稱選用 `_start` 的原因是因為這是大多數系統的預設入口點名稱。 [name mangling]: https://en.wikipedia.org/wiki/Name_mangling [C calling convention]: https://en.wikipedia.org/wiki/Calling_convention `!` 返回型態代表這個函式是發散函式,它不允許返回。這是必要的因為入口點不會被任何函式呼叫,只會直接由作業系統或啟動程式(bootloader)執行。所以取代返回值的是入口點需要執行作業系統的 [`exit` 系統呼叫][`exit` system call]。在我們的例子中,關閉機器似乎是個理想的動作,因為獨立的二進制檔案返回後也沒什麼事可做。現在我們先寫一個無窮迴圈來滿足需求。 [`exit` system call]: https://en.wikipedia.org/wiki/Exit_(system_call) 當我們現在運行 `cargo build` 的話會看到很醜的 _連結器_ 錯誤。 ## 連結器錯誤 連結器是用來將產生的程式碼結合起來成為執行檔的程式。因為 Linux、Windows 和 macOS 之間的執行檔格式都不同,每個系統都會有自己的連結器錯誤。不過造成錯誤的原因通常都差不多:連結器預設的設定會認為我們的程式依賴於 C 的執行時系統,但我們並沒有。 為了解決這個錯誤,我們需要告訴連結器它不需要包含 C 的執行時系統。我們可以選擇提供特定的連結器參數設定,或是選擇編譯為裸機目標。 ### 編譯為裸機目標 Rust 在預設情況下會嘗試編譯出符合你目前系統環境的可執行檔。舉例來說,如果你正在 `x86_64` 上使用 Windows,那麼 Rust 就會嘗試編譯出 `.exe`,一個使用 `x86_64` 指令集的 Windows 執行檔。這樣的環境稱之為主機系統(host system)。 為了描述不同環境,Rust 使用 [_target triple_] 的字串。要查看目前系統的 target triple,你可以執行 `rustc --version --verbose`: [_target triple_]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple ``` rustc 1.35.0-nightly (474e7a648 2019-04-07) binary: rustc commit-hash: 474e7a6486758ea6fc761893b1a49cd9076fb0ab commit-date: 2019-04-07 host: x86_64-unknown-linux-gnu release: 1.35.0-nightly LLVM version: 8.0 ``` 上面的輸出訊息來自 `x86_64` 上的 Linux 系統,我們可以看到 `host` 的 target triple 為 `x86_64-unknown-linux-gnu`,分別代表 CPU 架構 (`x86_64`)、供應商 (`unknown`) 以及作業系統 (`linux`) 和 [ABI] (`gnu`)。 [ABI]: https://en.wikipedia.org/wiki/Application_binary_interface 在依據主機的 triple 編譯時,Rust 編譯器和連結器理所當然地會認為預設是底層的作業系統並使用 C 執行時系統,這便是造成錯誤的原因。要避免這項錯誤,我們可以選擇編譯出沒有底層作業系統的不同環境。 其中一個裸機環境的例子是 `thumbv7em-none-eabihf` target triple,它描述了[嵌入式][embedded] [ARM] 系統。其中的細節目前並不重要,我們現在只需要知道沒有底層作業系統的 target triple 是用 `none` 
描述的。想要編譯這樣的目標的話,我們需要將它新增至 rustup: [embedded]: https://en.wikipedia.org/wiki/Embedded_system [ARM]: https://en.wikipedia.org/wiki/ARM_architecture ``` rustup target add thumbv7em-none-eabihf ``` 這會下載一份該系統的標準(以及 core)函式庫,現在我們可以用此目標建立我們的獨立執行檔了: ``` cargo build --target thumbv7em-none-eabihf ``` 我們傳入 `--target` [交叉編譯][cross compile]我們在裸機系統的執行檔。因為目標系統沒有作業系統,連結器不會嘗試連結 C 執行時系統並成功建立,不會產生任何連結器錯誤。 [cross compile]: https://en.wikipedia.org/wiki/Cross_compiler 這將會是我們到時候用來建立自己的作業系統核心的方法。不過我們不會用到 `thumbv7em-none-eabihf`,我們將會使用[自訂目標][custom target]來描述一個 `x86_64` 的裸機環境。 [custom target]: https://doc.rust-lang.org/rustc/targets/custom.html ### 連結器引數 除了編譯裸機系統為目標以外,我們也可以傳入特定的引數組合給連結器來解決錯誤。這不會是我們到時候用在我們核心的方法,所以以下的內容不是必需的,只是用來補齊資訊。點選下面的 _「連結器引數」_ 來顯示額外資訊。
    連結器引數 在這部份我們將討論 Linux、Windows 和 macOS 上發生的連結器錯誤,然後解釋如何傳入額外引數給連結器以解決錯誤。注意執行檔和連結器在不同作業系統之間都會相異,所以不同系統需要傳入不同引數。 #### Linux 以下是 Linux 上會出現的(簡化過)連結器錯誤: ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start': (.text+0x12): undefined reference to `__libc_csu_fini' /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start': (.text+0x19): undefined reference to `__libc_csu_init' /usr/lib/gcc/../x86_64-linux-gnu/Scrt1.o: In function `_start': (.text+0x25): undefined reference to `__libc_start_main' collect2: error: ld returned 1 exit status ``` 問題的原因是因為連結器在一開始包含了 C 的執行時系統,而且剛好也叫做 `_start`。它需要一些 C 標準函式庫 `libc` 提供的符號,但我們用 `no_std` 來停用它了,所以連結器無法找出引用來源。我們可以用 `-nostartfiles` 來告訴連結器一開始不必連結 C 的執行時系統。 要傳入的其中一個方法是透過 cargo 的 `cargo rustc` 命令,此命令行為和 `cargo build` 一樣,不過允許傳入一些選項到 Rust 底層的編譯器 `rustc`。`rustc` 有 `-C link-arg` 的選項會繼續將引數傳到連結器,這樣一來我們的指令會長得像這樣: ``` cargo rustc -- -C link-arg=-nostartfiles ``` 現在我們的 crate 便能產生出 Linux 上的獨立執行檔了! 
我們不必再指明入口點的函式名稱,因為連結器預設會尋找 `_start` 函式。 #### Windows 在 Windows 上會出現不一樣的(簡化過)連結器錯誤: ``` error: linking with `link.exe` failed: exit code: 1561 | = note: "C:\\Program Files (x86)\\…\\link.exe" […] = note: LINK : fatal error LNK1561: entry point must be defined ``` "entry point must be defined" 錯誤表示連結器找不到入口點,在 Windows 上預設的入口點名稱會[依據使用的子系統][windows-subsystems]。如果是 `CONSOLE` 子系統的話,連結器會尋找 `mainCRTStartup` 函式名稱;而 `WINDOWS` 子系統的話則會尋找 `WinMainCRTStartup` 函式名稱。要覆蓋預設的選項並讓連結器尋找我們的 `_start` 函式的話,我們可以傳入 `/ENTRY` 引數給連結器: [windows-subsystems]: https://docs.microsoft.com/en-us/cpp/build/reference/entry-entry-point-symbol ``` cargo rustc -- -C link-arg=/ENTRY:_start ``` 從引數格式來看我們可以清楚理解 Windows 連結器與 Linux 連結器是完全不同的程式。 現在會出現另一個連結器錯誤: ``` error: linking with `link.exe` failed: exit code: 1221 | = note: "C:\\Program Files (x86)\\…\\link.exe" […] = note: LINK : fatal error LNK1221: a subsystem can't be inferred and must be defined ``` 此錯誤出現的原因是因為 Windows 執行檔可以使用不同的[子系統][windows-subsystems]。一般的程式會依據入口點名稱來決定:如果入口點名稱為 `main` 則會使用 `CONSOLE` 子系統;如果入口點名稱為 `WinMain` 則會使用 `WINDOWS` 子系統。由於我們的函式 `_start` 名稱不一樣,我們必須指明子系統: ``` cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console" ``` 我們使用 `CONSOLE` 子系統不過 `WINDOWS` 一樣也可以。與其輸入好多次 `-C link-arg` ,我們可以用 `-C link-args` 來傳入許多引數。 使用此命令後,我們的執行檔應當能成功在 Windows 上建立。 #### macOS 以下是 macOS 上會出現的(簡化過)連結器錯誤: ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: entry point (_main) undefined. 
for architecture x86_64 clang: error: linker command failed with exit code 1 […] ``` 此錯誤訊息告訴我們連結器無法找到入口點函式 `main`,基於某些原因 macOS 上的函式都會加上前綴 `_`。為了設定入口點為我們的函式 `_start`,我們傳入 `-e` 連結器引數: ``` cargo rustc -- -C link-args="-e __start" ``` `-e` 表示入口點的函式名稱,然後由於 macOS 上所有的函式都會加上前綴 `_`,我們需要設置入口點為 `__start` 而不是 `_start`。 接下來會出現另一個連結器錯誤: ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: dynamic main executables must link with libSystem.dylib for architecture x86_64 clang: error: linker command failed with exit code 1 […] ``` macOS [官方並不支援靜態連結執行檔][does not officially support statically linked binaries]且要求程式預設要連結到 `libSystem` 函式庫。要覆蓋這個設定並連結靜態執行檔,我們傳入 `-static` 給連結器: [does not officially support statically linked binaries]: https://developer.apple.com/library/archive/qa/qa1118/_index.html ``` cargo rustc -- -C link-args="-e __start -static" ``` 但這樣還不夠,我們會遇到第三個連結器錯誤: ``` error: linking with `cc` failed: exit code: 1 | = note: "cc" […] = note: ld: library not found for -lcrt0.o clang: error: linker command failed with exit code 1 […] ``` 這個錯誤出現的原因是 macOS 的程式預設都會連結到 `crt0` (“C runtime zero”)。這和我們在 Linux 上遇到的類似,所以也可以用 `-nostartfiles` 連結器引數來解決: ``` cargo rustc -- -C link-args="-e __start -static -nostartfiles" ``` 現在我們的程式應當能成功在 macOS 上建立。 #### 統一建構命令 現在我們得依據主機平台來使用不同的建構命令,這樣感覺不是很理想。我們可以建立個檔案 `.cargo/config.toml` 來解決,裡面會包含平台相關的引數: ```toml # in .cargo/config.toml [target.'cfg(target_os = "linux")'] rustflags = ["-C", "link-arg=-nostartfiles"] [target.'cfg(target_os = "windows")'] rustflags = ["-C", "link-args=/ENTRY:_start /SUBSYSTEM:console"] [target.'cfg(target_os = "macos")'] rustflags = ["-C", "link-args=-e __start -static -nostartfiles"] ``` 如果條件符合,`rustflags` 包含的引數會自動加到每次 `rustc` 的呼叫中。想了解更多關於 `.cargo/config.toml` 的資訊請參考[官方文件](https://doc.rust-lang.org/cargo/reference/config.html)。 這樣一來我們就能同時在三個平台只用 `cargo build` 來建立了。 #### 你該這麼做嗎?
雖然我們可以在 Linux、Windows 和 macOS 上建立獨立執行檔,不過這可能不是好主意。原因是我們的執行檔仍然仰賴一些前提,例如當 `_start` 函式被呼叫時堆疊已經初始化完畢。少了 C 執行時系統,其中一些前提可能無法滿足,導致我們的程式出錯,例如發生 segmentation fault。

如果你想要建立一個運行在現有作業系統上的最小執行檔,改用 `libc` 並如[這裡所述](https://doc.rust-lang.org/1.16.0/book/no-stdlib.html)設置 `#[start]` 屬性可能會是更好的做法。
    ## 總結 {#summary} 一個最小的 Rust 獨立執行檔會看起來像這樣: `src/main.rs`: ```rust #![no_std] // 不連結標準函式庫 #![no_main] // 停用 Rust 層級的入口點 use core::panic::PanicInfo; #[unsafe(no_mangle)] // 不修飾函式名稱 pub extern "C" fn _start() -> ! { // 因為連結器預設會尋找 `_start` 函式名稱 // 所以這個函式就是入口點 loop {} } /// 此函式會在 panic 時呼叫 #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } ``` `Cargo.toml`: ```toml [package] name = "crate_name" version = "0.1.0" authors = ["Author Name "] # `cargo build` 時需要的設置 [profile.dev] panic = "abort" # 停用 panic 時堆疊回溯 # `cargo build --release` 時需要的設置 [profile.release] panic = "abort" # 停用 panic 時堆疊回溯 ``` 要建構出此執行檔,我們需要選擇一個裸機目標來編譯像是 `thumbv7em-none-eabihf`: ``` cargo build --target thumbv7em-none-eabihf ``` 不然我們也可以用主機系統來編譯,不過要加上額外的連結器引數: ```bash # Linux cargo rustc -- -C link-arg=-nostartfiles # Windows cargo rustc -- -C link-args="/ENTRY:_start /SUBSYSTEM:console" # macOS cargo rustc -- -C link-args="-e __start -static -nostartfiles" ``` 注意這只是最小的 Rust 獨立執行檔範例,它還是會依賴一些事情,像是當 `_start` 函式呼叫時堆疊已經初始化完畢。**所以如果想真的使用這樣的執行檔的話還需要更多步驟。** ## 接下來呢? 
[下一篇文章][next post] 將會講解如何將我們的獨立執行檔轉成最小的作業系統核心。這包含建立自訂目標、用啟動程式組合我們的執行檔,還有學習如何輸出一些東西到螢幕上。 [next post]: @/edition-2/posts/02-minimal-rust-kernel/index.md ================================================ FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/_index.md ================================================ +++ title = "Extra Posts for Minimal Rust Kernel" sort_by = "weight" insert_anchor_links = "left" render = false +++ ================================================ FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.ko.md ================================================ +++ title = "Red Zone 기능 해제하기" weight = 1 path = "ko/red-zone" template = "edition-2/extra.html" +++ [red zone]은 [System V ABI]에서 사용 가능한 최적화 기법으로, 스택 포인터를 변경하지 않은 채로 함수들이 임시적으로 스택 프레임 아래의 128 바이트 공간을 사용할 수 있게 해줍니다: [red zone]: https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64#the-red-zone [System V ABI]: https://wiki.osdev.org/System_V_ABI ![stack frame with red zone](red-zone.svg) 위 사진은 `n`개의 지역 변수를 가진 함수의 스택 프레임을 보여줍니다. 함수가 호출되었을 때, 함수의 반환 주소 및 지역 변수들을 스택에 저장할 수 있도록 스택 포인터의 값이 조정됩니다. red zone은 조정된 스택 포인터 아래의 128바이트의 메모리 구간을 가리킵니다. 함수가 또 다른 함수를 호출하지 않는 구간에서만 사용하는 임시 데이터의 경우, 함수가 이 구간에 해당 데이터를 저장하는 데 이용할 수 있습니다. 따라서 스택 포인터를 조정하기 위해 필요한 명령어 두 개를 생략할 수 있는 상황이 종종 있습니다 (예: 다른 함수를 호출하지 않는 함수). 하지만 이 최적화 기법을 사용하는 도중 소프트웨어 예외(exception) 혹은 하드웨어 인터럽트가 일어날 경우 큰 문제가 생깁니다. 함수가 red zone을 사용하던 도중 예외가 발생한 상황을 가정해보겠습니다: ![red zone overwritten by exception handler](red-zone-overwrite.svg) CPU와 예외 처리 핸들러가 red zone에 있는 데이터를 덮어씁니다. 하지만 이 데이터는 인터럽트된 함수가 사용 중이었던 것입니다. 따라서 예외 처리 핸들러로부터 반환하여 다시 인터럽트된 함수가 계속 실행되게 되었을 때 변경된 red zone의 데이터로 인해 함수가 오작동할 수 있습니다. 이런 현상으로 인해 [디버깅하는 데에 몇 주씩 걸릴 수 있는 이상한 버그][take weeks to debug]가 발생할지도 모릅니다. [take weeks to debug]: https://forum.osdev.org/viewtopic.php?t=21720 미래에 예외 처리 로직을 구현할 때 이러한 오류가 일어나는 것을 피하기 위해 우리는 미리 red zone 최적화 기법을 해제한 채로 프로젝트를 진행할 것입니다. 컴파일 대상 환경 설정 파일에 `"disable-redzone": true` 줄을 추가함으로써 해당 기능을 해제할 수 있습니다. 
================================================ FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.md ================================================ +++ title = "Disable the Red Zone" weight = 1 path = "red-zone" template = "edition-2/extra.html" +++ The [red zone] is an optimization of the [System V ABI] that allows functions to temporarily use the 128 bytes below their stack frame without adjusting the stack pointer: [red zone]: https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64#the-red-zone [System V ABI]: https://wiki.osdev.org/System_V_ABI ![stack frame with red zone](red-zone.svg) The image shows the stack frame of a function with `n` local variables. On function entry, the stack pointer is adjusted to make room on the stack for the return address and the local variables. The red zone is defined as the 128 bytes below the adjusted stack pointer. The function can use this area for temporary data that's not needed across function calls. Thus, the two instructions for adjusting the stack pointer can be avoided in some cases (e.g. in small leaf functions). However, this optimization leads to huge problems with exceptions or hardware interrupts. Let's assume that an exception occurs while a function uses the red zone: ![red zone overwritten by exception handler](red-zone-overwrite.svg) The CPU and the exception handler overwrite the data in the red zone. But this data is still needed by the interrupted function. So the function won't work correctly anymore when we return from the exception handler. This might lead to strange bugs that [take weeks to debug]. [take weeks to debug]: https://forum.osdev.org/viewtopic.php?t=21720 To avoid such bugs when we implement exception handling in the future, we disable the red zone right from the beginning. This is achieved by adding the `"disable-redzone": true` line to our target configuration file. 
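
As a rough sketch of where that line sits, the fragment below shows `"disable-redzone"` inside a target specification. Only the `"disable-redzone": true` entry comes from the text above; the `"arch"` and `"os"` entries are assumptions added purely for context, and this is not a complete target file, since a usable one needs further fields that are omitted here.

```json
{
    "arch": "x86_64",
    "os": "none",
    "disable-redzone": true
}
```

With this setting, the compiler should no longer place temporary data below the stack pointer, so an exception handler that pushes onto the interrupted stack cannot clobber it.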
================================================ FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.pt-BR.md ================================================ +++ title = "Desabilitando a Red Zone" weight = 1 path = "pt-BR/red-zone" template = "edition-2/extra.html" [extra] # Please update this when updating the translation translation_based_on_commit = "9d079e6d3e03359469d6cf1759bb1a196d8a11ac" # GitHub usernames of the people that translated this post translators = ["richarddalves"] +++ A [red zone] é uma otimização da [System V ABI] que permite que funções usem temporariamente os 128 bytes abaixo do seu stack frame sem ajustar o ponteiro de pilha: [red zone]: https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64#the-red-zone [System V ABI]: https://wiki.osdev.org/System_V_ABI ![stack frame com red zone](red-zone.svg) A imagem mostra o stack frame de uma função com `n` variáveis locais. Na entrada da função, o ponteiro de pilha é ajustado para abrir espaço na pilha para o endereço de retorno e as variáveis locais. A red zone é definida como os 128 bytes abaixo do ponteiro de pilha ajustado. A função pode usar esta área para dados temporários que não são necessários entre chamadas de função. Assim, as duas instruções para ajustar o ponteiro de pilha podem ser evitadas em alguns casos (por exemplo, em pequenas funções folha). No entanto, esta otimização leva a problemas enormes com exceções ou interrupções de hardware. Vamos assumir que uma exceção ocorre enquanto uma função usa a red zone: ![red zone sobrescrita pelo handler de exceção](red-zone-overwrite.svg) A CPU e o handler de exceção sobrescrevem os dados na red zone. Mas estes dados ainda são necessários pela função interrompida. Então a função não funcionará mais corretamente quando retornarmos do handler de exceção. Isso pode levar a bugs estranhos que [levam semanas para depurar]. 
[levam semanas para depurar]: https://forum.osdev.org/viewtopic.php?t=21720 Para evitar tais bugs quando implementarmos tratamento de exceções no futuro, desabilitamos a red zone logo de início. Isso é alcançado adicionando a linha `"disable-redzone": true` ao nosso arquivo de configuração de alvo. ================================================ FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.ru.md ================================================ +++ title = "Отключение красной зоны" weight = 1 path = "ru/red-zone" template = "edition-2/extra.html" +++ [Красная зона][red zone] — это оптимизация [System V ABI], которая позволяет функциям временно использовать 128 байт ниже своего стекового кадра без корректировки указателя стека: [red zone]: https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64#the-red-zone [System V ABI]: https://wiki.osdev.org/System_V_ABI ![stack frame with red zone](red-zone.svg) На рисунке показан стековый фрейм функции с `n` локальных переменных. При входе в функцию указатель стека корректируется, чтобы освободить место в стеке для адреса возврата и локальных переменных. Красная зона определяется как 128 байт ниже скорректированного указателя стека. Функция может использовать эту зону для временных данных, которые не нужны при всех вызовах функции. Таким образом, в некоторых случаях (например, в небольших листовых функциях) можно обойтись без двух инструкций для корректировки указателя стека. Однако такая оптимизация приводит к огромным проблемам при работе с исключениями или аппаратными прерываниями. Предположим, что во время использования функцией красной зоны происходит исключение: ![red zone overwritten by exception handler](red-zone-overwrite.svg) Процессор и обработчик исключений перезаписывают данные в красной зоне. Но эти данные все еще нужны прерванной функции. Поэтому функция не будет работать правильно, когда мы вернемся из обработчика исключений. 
Это может привести к странным ошибкам, на отладку которых [уйдут недели][take weeks to debug]. [take weeks to debug]: https://forum.osdev.org/viewtopic.php?t=21720 Чтобы избежать подобных ошибок при реализации обработки исключений в будущем, мы отключим красную зону с самого начала. Это достигается путем добавления строки `"disable-redzone": true` в наш целевой конфигурационный файл. ================================================ FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.zh-CN.md ================================================ +++ title = "Disable the Red Zone" weight = 1 path = "zh-CN/red-zone" template = "edition-2/extra.html" +++ [红区][red zone] 是 [System V ABI] 提供的一种优化技术,它使得函数可以在不修改栈指针的前提下,临时使用其栈帧下方的128个字节。 [red zone]: https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64#the-red-zone [System V ABI]: https://wiki.osdev.org/System_V_ABI ![stack frame with red zone](red-zone.svg) 上图展示了一个包含了 `n` 个局部变量的栈帧。当方法开始执行时,栈指针会被调整到一个合适的位置,为返回值和局部变量留出足够的空间。 红区是位于调整后的栈指针下方,长度为128字节的区域,函数会使用这部分空间存储不会被跨函数调用的临时数据。所以在某些情况下(比如逻辑简短的叶函数),红区可以节省用于调整栈指针的两条机器指令。 然而红区优化有时也会引发无法处理的巨大问题(异常或者硬件中断),如果使用红区时发生了某种异常: ![red zone overwritten by exception handler](red-zone-overwrite.svg) CPU和异常处理机制会把红色区域内的数据覆盖掉,但是被中断的函数依然在引用着这些数据。当函数从错误中恢复时,错误的数据就会引发更大的错误,这类错误往往需要[追踪数周][take weeks to debug]才能找到。 [take weeks to debug]: https://forum.osdev.org/viewtopic.php?t=21720 要在编写异常处理机制时避免这些隐蔽而难以追踪的bug,我们需要从一开始就禁用红区优化,具体到配置文件中的配置项,就是 `"disable-redzone": true`。 ================================================ FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.ko.md ================================================ +++ title = "SIMD 해제하기" weight = 2 path = "ko/disable-simd" template = "edition-2/extra.html" +++ [Single Instruction Multiple Data (SIMD)] 명령어들은 여러 데이터 word에 동시에 덧셈 등의 작업을 실행할 수 있으며, 이를 통해 프로그램의 실행 시간을 상당히 단축할 수 있습니다. 
`x86_64` 아키텍처는 다양한 SIMD 표준들을 지원합니다: [Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD - [MMX]: _Multi Media Extension_ 명령어 집합은 1997년에 등장하였으며, `mm0`에서 `mm7`까지 8개의 64비트 레지스터들을 정의합니다. 이 레지스터들은 그저 [x87 부동 소수점 장치][x87 floating point unit]의 레지스터들을 가리키는 별칭입니다. - [SSE]: _Streaming SIMD Extensions_ 명령어 집합은 1999년에 등장하였습니다. 부동 소수점 연산용 레지스터를 재사용하는 대신 새로운 레지스터 집합을 도입했습니다. `xmm0`에서 `xmm15`까지 16개의 새로운 128비트 레지스터를 정의합니다. - [AVX]: _Advanced Vector Extensions_ 은 SSE에 추가로 멀티미디어 레지스터의 크기를 늘리는 확장 표준입니다. `ymm0`에서 `ymm15`까지 16개의 새로운 256비트 레지스터를 정의합니다. `ymm` 레지스터들은 기존의 `xmm` 레지스터를 확장합니다 (`xmm0`이 `ymm0` 레지스터의 하부 절반을 차지하는 식으로 다른 15개의 짝에도 같은 방식의 확장이 적용됩니다). [MMX]: https://en.wikipedia.org/wiki/MMX_(instruction_set) [x87 floating point unit]: https://en.wikipedia.org/wiki/X87 [SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions [AVX]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions 이러한 SIMD 표준들을 사용하면 프로그램 실행 속도를 많이 향상할 수 있는 경우가 많습니다. 우수한 컴파일러는 [자동 벡터화 (auto-vectorization)][auto-vectorization]이라는 과정을 통해 일반적인 반복문을 SIMD 코드로 변환할 수 있습니다. [auto-vectorization]: https://en.wikipedia.org/wiki/Automatic_vectorization 하지만 운영체제 커널은 크기가 큰 SIMD 레지스터들을 사용하기에 문제가 있습니다. 그 이유는 하드웨어 인터럽트가 일어날 때마다 커널이 사용 중이던 레지스터들의 상태를 전부 메모리에 백업해야 하기 때문입니다. 이렇게 하지 않으면 인터럽트 되었던 프로그램의 실행이 다시 진행될 때 인터럽트 당시의 프로그램 상태를 보존할 수가 없습니다. 따라서 커널이 SIMD 레지스터들을 사용하는 경우, 커널이 백업해야 하는 데이터 양이 많이 늘어나게 되어 (512-1600 바이트) 커널의 성능이 눈에 띄게 나빠집니다. 이러한 성능 손실을 피하기 위해서 `sse` 및 `mmx` 기능을 해제하는 것이 바람직합니다 (`avx` 기능은 해제된 상태가 기본 상태입니다). 컴파일 대상 환경 설정 파일의 `features` 필드를 이용해 해당 기능들을 해제할 수 있습니다. `mmx` 및 `sse` 기능을 해제하려면 아래와 같이 해당 기능 이름 앞에 빼기 기호를 붙여주면 됩니다: ```json "features": "-mmx,-sse" ``` ## 부동소수점 (Floating Point) 우리의 입장에서는 안타깝게도, `x86_64` 아키텍처는 부동 소수점 계산에 SSE 레지스터를 사용합니다. 따라서 SSE 기능이 해제된 상태에서 부동 소수점 계산을 컴파일하면 LLVM이 오류를 일으킵니다. Rust의 core 라이브러리는 이미 부동 소수점 숫자들을 사용하기에 (예: `f32` 및 `f64` 에 대한 각종 trait들을 정의함), 우리의 커널에서 부동 소수점 계산을 피하더라도 부동 소수점 계산을 컴파일하는 것을 피할 수 없습니다. 다행히도 LLVM은 `soft-float` 기능을 지원합니다. 
이 기능을 통해 정수 계만으로 모든 부동소수점 연산 결과를 모방하여 산출할 수 있습니다. 일반 부동소수점 계산보다는 느리겠지만, 이 기능을 통해 우리의 커널에서도 SSE 기능 없이 부동소수점을 사용할 수 있습니다. 우리의 커널에서 `soft-float` 기능을 사용하려면 컴파일 대상 환경 설정 파일의 `features` 필드에 덧셈 기호와 함께 해당 기능의 이름을 적어주면 됩니다: ```json "features": "-mmx,-sse,+soft-float" ``` ================================================ FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.md ================================================ +++ title = "Disable SIMD" weight = 2 path = "disable-simd" template = "edition-2/extra.html" +++ [Single Instruction Multiple Data (SIMD)] instructions are able to perform an operation (e.g., addition) simultaneously on multiple data words, which can speed up programs significantly. The `x86_64` architecture supports various SIMD standards: [Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD - [MMX]: The _Multi Media Extension_ instruction set was introduced in 1997 and defines eight 64-bit registers called `mm0` through `mm7`. These registers are just aliases for the registers of the [x87 floating point unit]. - [SSE]: The _Streaming SIMD Extensions_ instruction set was introduced in 1999. Instead of re-using the floating point registers, it adds a completely new register set. The sixteen new registers are called `xmm0` through `xmm15` and are 128 bits each. - [AVX]: The _Advanced Vector Extensions_ are extensions that further increase the size of the multimedia registers. The new registers are called `ymm0` through `ymm15` and are 256 bits each. They extend the `xmm` registers, so e.g. `xmm0` is the lower half of `ymm0`. [MMX]: https://en.wikipedia.org/wiki/MMX_(instruction_set) [x87 floating point unit]: https://en.wikipedia.org/wiki/X87 [SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions [AVX]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions By using such SIMD standards, programs can often speed up significantly. 
Good compilers are able to transform normal loops into such SIMD code automatically through a process called [auto-vectorization].

[auto-vectorization]: https://en.wikipedia.org/wiki/Automatic_vectorization

However, the large SIMD registers lead to problems in OS kernels. The reason is that the kernel has to back up all registers that it uses to memory on each hardware interrupt, because they need to have their original values when the interrupted program continues. So if the kernel uses SIMD registers, it has to back up a lot more data (512–1600 bytes), which noticeably decreases performance. To avoid this performance loss, we want to disable the `sse` and `mmx` features (the `avx` feature is disabled by default). We can do that through the `features` field in our target specification. To disable the `mmx` and `sse` features, we add them prefixed with a minus:

```json
"features": "-mmx,-sse"
```

## Floating Point

Unfortunately for us, the `x86_64` architecture uses SSE registers for floating point operations. Thus, every use of floating point with SSE disabled causes an error in LLVM. The problem is that Rust's core library already uses floats (e.g., it implements traits for `f32` and `f64`), so avoiding floats in our kernel does not suffice. Fortunately, LLVM has support for a `soft-float` feature that emulates all floating point operations through software functions based on normal integers. This makes it possible to use floats in our kernel without SSE; it will just be a bit slower.
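To get a feeling for what "software functions based on normal integers" means, here is a minimal sketch that performs a few very simple `f32` operations purely on the raw IEEE 754 bit pattern. The helper names (`soft_neg`, `soft_abs`, `soft_is_nan`) are hypothetical and exist only for this illustration; the routines that a real soft-float build links against (for example `__addsf3` for `f32` addition) are considerably more involved.

```rust
// Illustrative sketch only: these helpers are NOT the functions LLVM emits
// calls to. They just show that float operations can be expressed with
// integer instructions on the IEEE 754 single-precision bit layout:
//   bit 31 = sign, bits 30..23 = exponent, bits 22..0 = mantissa.

/// Negate a float by flipping its sign bit.
fn soft_neg(bits: u32) -> u32 {
    bits ^ 0x8000_0000
}

/// Take the absolute value by clearing the sign bit.
fn soft_abs(bits: u32) -> u32 {
    bits & 0x7FFF_FFFF
}

/// A value is NaN if the exponent is all ones and the mantissa is non-zero.
fn soft_is_nan(bits: u32) -> bool {
    (bits & 0x7F80_0000) == 0x7F80_0000 && (bits & 0x007F_FFFF) != 0
}

fn main() {
    // `to_bits`/`from_bits` are used here only to move values in and out of
    // the integer representation for checking the results.
    let negated = f32::from_bits(soft_neg(1.5_f32.to_bits()));
    assert_eq!(negated, -1.5);

    let absolute = f32::from_bits(soft_abs((-2.25_f32).to_bits()));
    assert_eq!(absolute, 2.25);

    assert!(soft_is_nan(f32::NAN.to_bits()));
    assert!(!soft_is_nan(1.0_f32.to_bits()));
}
```

Operations such as addition or division follow the same idea, but they additionally have to align exponents, round, and handle special values, which is why the emulated versions are slower than a single SSE instruction.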
To turn on the `soft-float` feature for our kernel, we add it to the `features` line in our target specification, prefixed with a plus: ```json "features": "-mmx,-sse,+soft-float" ``` ================================================ FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.pt-BR.md ================================================ +++ title = "Desabilitando SIMD" weight = 2 path = "pt-BR/disable-simd" template = "edition-2/extra.html" [extra] # Please update this when updating the translation translation_based_on_commit = "9d079e6d3e03359469d6cf1759bb1a196d8a11ac" # GitHub usernames of the people that translated this post translators = ["richarddalves"] +++ Instruções [Single Instruction Multiple Data (SIMD)] são capazes de realizar uma operação (por exemplo, adição) simultaneamente em múltiplas palavras de dados, o que pode acelerar programas significativamente. A arquitetura `x86_64` suporta vários padrões SIMD: [Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD - [MMX]: O conjunto de instruções _Multi Media Extension_ foi introduzido em 1997 e define oito registradores de 64 bits chamados `mm0` até `mm7`. Esses registradores são apenas aliases para os registradores da [unidade de ponto flutuante x87]. - [SSE]: O conjunto de instruções _Streaming SIMD Extensions_ foi introduzido em 1999. Em vez de reutilizar os registradores de ponto flutuante, ele adiciona um conjunto de registradores completamente novo. Os dezesseis novos registradores são chamados `xmm0` até `xmm15` e têm 128 bits cada. - [AVX]: As _Advanced Vector Extensions_ são extensões que aumentam ainda mais o tamanho dos registradores multimídia. Os novos registradores são chamados `ymm0` até `ymm15` e têm 256 bits cada. Eles estendem os registradores `xmm`, então por exemplo `xmm0` é a metade inferior de `ymm0`. 
[MMX]: https://en.wikipedia.org/wiki/MMX_(instruction_set) [unidade de ponto flutuante x87]: https://en.wikipedia.org/wiki/X87 [SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions [AVX]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions Ao usar tais padrões SIMD, programas frequentemente podem acelerar significativamente. Bons compiladores são capazes de transformar loops normais em tal código SIMD automaticamente através de um processo chamado [auto-vetorização]. [auto-vetorização]: https://en.wikipedia.org/wiki/Automatic_vectorization No entanto, os grandes registradores SIMD levam a problemas em kernels de SO. A razão é que o kernel tem que fazer backup de todos os registradores que usa para a memória em cada interrupção de hardware, porque eles precisam ter seus valores originais quando o programa interrompido continua. Então, se o kernel usa registradores SIMD, ele tem que fazer backup de muito mais dados (512-1600 bytes), o que diminui notavelmente o desempenho. Para evitar esta perda de desempenho, queremos desabilitar os recursos `sse` e `mmx` (o recurso `avx` é desabilitado por padrão). Podemos fazer isso através do campo `features` na nossa especificação de alvo. Para desabilitar os recursos `mmx` e `sse`, nós os adicionamos prefixados com um menos: ```json "features": "-mmx,-sse" ``` ## Ponto Flutuante Infelizmente para nós, a arquitetura `x86_64` usa registradores SSE para operações de ponto flutuante. Assim, todo uso de ponto flutuante com SSE desabilitado causa um erro no LLVM. O problema é que a biblioteca core do Rust já usa floats (por exemplo, ela implementa traits para `f32` e `f64`), então evitar floats no nosso kernel não é suficiente. Felizmente, o LLVM tem suporte para um recurso `soft-float` que emula todas as operações de ponto flutuante através de funções de software baseadas em inteiros normais. Isso torna possível usar floats no nosso kernel sem SSE; será apenas um pouco mais lento. 
Para ativar o recurso `soft-float` para o nosso kernel, nós o adicionamos à linha `features` na nossa especificação de alvo, prefixado com um mais: ```json "features": "-mmx,-sse,+soft-float" ``` ================================================ FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.ru.md ================================================ +++ title = "Отключение SIMD" weight = 2 path = "ru/disable-simd" template = "edition-2/extra.html" +++ Инструкции [Single Instruction Multiple Data (SIMD)] способны выполнять операцию (например, сложение) одновременно над несколькими словами данных, что может значительно ускорить работу программ. Архитектура `x86_64` поддерживает различные стандарты SIMD: [Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD - [MMX]: Набор инструкций _Multi Media Extension_ был представлен в 1997 году и определяет восемь 64-битных регистров, называемых `mm0` - `mm7`. Эти регистры являются псевдонимами регистров [x87 блока с плавающей запятой][x87 floating point unit]. - [SSE]: Набор инструкций _Streaming SIMD Extensions_ был представлен в 1999 году. Вместо повторного использования регистров с плавающей запятой он добавляет совершенно новый набор регистров. Шестнадцать новых регистров называются `xmm0` - `xmm15` и имеют размер 128 бит каждый. - [AVX]: _Advanced Vector Extensions_ - это расширения, которые еще больше увеличивают размер мультимедийных регистров. Новые регистры называются `ymm0` - `ymm15` и имеют размер 256 бит каждый. Они расширяют регистры `xmm`, поэтому, например, `xmm0` - это нижняя половина `ymm0`. [MMX]: https://en.wikipedia.org/wiki/MMX_(instruction_set) [x87 floating point unit]: https://en.wikipedia.org/wiki/X87 [SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions [AVX]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions Используя такие стандарты SIMD, программы часто могут значительно ускориться. 
Хорошие компиляторы способны автоматически преобразовывать обычные циклы в такой SIMD-код с помощью процесса, называемого [автовекторизацией][auto-vectorization].

[auto-vectorization]: https://en.wikipedia.org/wiki/Automatic_vectorization

Однако большие регистры SIMD приводят к проблемам в ядрах ОС. Причина в том, что ядро должно создавать резервные копии всех регистров, которые оно использует, в память при каждом аппаратном прерывании, потому что они должны иметь свои первоначальные значения, когда прерванная программа продолжает работу. Поэтому, если ядро использует SIMD-регистры, ему приходится резервировать гораздо больше данных (512-1600 байт), что заметно снижает производительность. Чтобы избежать этого снижения производительности, мы хотим отключить функции `sse` и `mmx` (функция `avx` отключена по умолчанию). Мы можем сделать это через поле `features` в нашей целевой спецификации. Чтобы отключить функции `mmx` и `sse`, мы добавим их с минусом:

```json
"features": "-mmx,-sse"
```

## Числа с плавающей точкой

К сожалению для нас, архитектура `x86_64` использует регистры SSE для операций с числами с плавающей точкой. Таким образом, каждое использование чисел с плавающей точкой с отключенным SSE вызовет ошибку в LLVM. Проблема в том, что библиотека `core` уже использует числа с плавающей точкой (например, в ней реализованы трейты для `f32` и `f64`), поэтому недостаточно избегать чисел с плавающей точкой в нашем ядре. К счастью, LLVM поддерживает функцию `soft-float`, эмулирующую все операции с числами с плавающей точкой через программные функции, основанные на обычных целых числах. Это позволяет использовать числа с плавающей точкой в нашем ядре без SSE, просто это будет немного медленнее.
Чтобы включить функцию `soft-float` для нашего ядра, мы добавим ее в строку `features` в спецификации цели с префиксом плюс: ```json "features": "-mmx,-sse,+soft-float" ``` ================================================ FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.zh-CN.md ================================================ +++ title = "Disable SIMD" weight = 2 path = "zh-CN/disable-simd" template = "edition-2/extra.html" +++ [单指令多数据][Single Instruction Multiple Data (SIMD)] 指令允许在一个操作符(比如加法)内传入多组数据,以此加速程序执行速度。`x86_64` 架构支持多种SIMD标准: [Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD - [MMX]: _多媒体扩展_ 指令集于1997年发布,定义了8个64位寄存器,分别被称为 `mm0` 到 `mm7`,不过,这些寄存器只是 [x87浮点执行单元][x87 floating point unit] 中寄存器的映射而已。 - [SSE]: _流处理SIMD扩展_ 指令集于1999年发布,不同于MMX的复用浮点执行单元,该指令集加入了一个完整的新寄存器组,即被称为 `xmm0` 到 `xmm15` 的16个128位寄存器。 - [AVX]: _先进矢量扩展_ 用于进一步扩展多媒体寄存器的数量,它定义了 `ymm0` 到 `ymm15` 共16个256位寄存器,但是这些寄存器继承于 `xmm`,例如 `xmm0` 寄存器是 `ymm0` 的低128位。 [MMX]: https://en.wikipedia.org/wiki/MMX_(instruction_set) [x87 floating point unit]: https://en.wikipedia.org/wiki/X87 [SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions [AVX]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions 通过应用这些SIMD标准,计算机程序可以显著提高执行速度。优秀的编译器可以将常规循环自动优化为适用SIMD的代码,这种优化技术被称为 [自动矢量化][auto-vectorization]。 [auto-vectorization]: https://en.wikipedia.org/wiki/Automatic_vectorization 尽管如此,SIMD会让操作系统内核出现一些问题。具体来说,就是操作系统在处理硬件中断时,需要保存所有寄存器信息到内存中,在中断结束后再将其恢复以供使用。所以说,如果内核需要使用SIMD寄存器,那么每次处理中断需要备份非常多的数据(512-1600字节),这会显著地降低性能。要避免这部分性能损失,我们需要禁用 `sse` 和 `mmx` 这两个特性(`avx` 默认已禁用)。 我们可以在编译配置文件中的 `features` 配置项做出如下修改,加入以减号为前缀的 `mmx` 和 `sse` 即可: ```json "features": "-mmx,-sse" ``` ## 浮点数 还有一件不幸的事,`x86_64` 架构在处理浮点数计算时,会用到 `sse` 寄存器,因此,禁用SSE的前提下使用浮点数计算LLVM都一定会报错。 更大的问题在于Rust核心库里就存在着为数不少的浮点数运算(如 `f32` 和 `f64` 的数个trait),所以试图避免使用浮点数是不可能的。 幸运的是,LLVM支持 `soft-float` 特性,这个特性可以使用整型运算在软件层面模拟浮点数运算,使得我们为内核关闭SSE成为了可能,只需要牺牲一点点性能。 要为内核打开 `soft-float` 特性,我们只需要在编译配置文件中的 `features` 配置项做出如下修改即可: 
```json
"features": "-mmx,-sse,+soft-float"
```

================================================
FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/index.es.md
================================================
+++
title = "Un Kernel Mínimo en Rust"
weight = 2
path = "es/minimal-rust-kernel"
date = 2018-02-10

[extra]
chapter = "Bare Bones"
# GitHub usernames of the people that translated this post
translators = ["dobleuber"]
+++

En esta publicación, crearemos un kernel mínimo de 64 bits en Rust para la arquitectura x86. Partiremos del [binario Rust autónomo][un binario Rust autónomo] de la publicación anterior para crear una imagen de disco arrancable que imprima algo en la pantalla.

[un binario Rust autónomo]: @/edition-2/posts/01-freestanding-rust-binary/index.md

Este blog se desarrolla abiertamente en [GitHub]. Si tienes problemas o preguntas, por favor abre un issue ahí. También puedes dejar comentarios [al final]. El código fuente completo para esta publicación se encuentra en la rama [`post-02`][post branch].

[GitHub]: https://github.com/phil-opp/blog_os
[al final]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-02

## El Proceso de Arranque {#el-proceso-de-arranque}

Cuando enciendes una computadora, comienza a ejecutar código de firmware almacenado en la [ROM] de la placa madre. Este código realiza una [prueba automática de encendido], detecta la memoria RAM disponible y preinicializa la CPU y el hardware. Después, busca un disco arrancable y comienza a cargar el kernel del sistema operativo.

[ROM]: https://en.wikipedia.org/wiki/Read-only_memory
[prueba automática de encendido]: https://en.wikipedia.org/wiki/Power-on_self-test

En x86, existen dos estándares de firmware: el “Sistema Básico de Entrada/Salida” (**[BIOS]**) y la más reciente “Interfaz de Firmware Extensible Unificada” (**[UEFI]**). El estándar BIOS es antiguo y está desactualizado, pero es simple y está bien soportado en cualquier máquina x86 desde los años 80.
UEFI, en contraste, es más moderno y tiene muchas más funciones, pero es más complejo de configurar (al menos en mi opinión).

[BIOS]: https://en.wikipedia.org/wiki/BIOS
[UEFI]: https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface

Actualmente, solo proporcionamos soporte para BIOS, pero también planeamos agregar soporte para UEFI. Si te gustaría ayudarnos con esto, revisa el [issue en GitHub](https://github.com/phil-opp/blog_os/issues/349).

### Arranque con BIOS

Casi todos los sistemas x86 tienen soporte para arranque con BIOS, incluyendo máquinas más recientes basadas en UEFI que usan un BIOS emulado. Esto es excelente, porque puedes usar la misma lógica de arranque en todas las máquinas del siglo pasado. Sin embargo, esta amplia compatibilidad también es la mayor desventaja del arranque con BIOS, ya que significa que la CPU se coloca en un modo de compatibilidad de 16 bits llamado [modo real] antes de arrancar, para que los bootloaders arcaicos de los años 80 sigan funcionando. Pero comencemos desde el principio: cuando enciendes una computadora, carga el BIOS desde una memoria flash especial ubicada en la placa madre. El BIOS ejecuta rutinas de autoprueba e inicialización del hardware, y luego busca discos arrancables. Si encuentra uno, transfiere el control a su _bootloader_ (_cargador de arranque_), que es una porción de código ejecutable de 512 bytes almacenada al inicio del disco. La mayoría de los bootloaders son más grandes que 512 bytes, por lo que suelen dividirse en una pequeña primera etapa, que cabe en esos 512 bytes, y una segunda etapa que se carga posteriormente. El bootloader debe determinar la ubicación de la imagen del kernel en el disco y cargarla en la memoria. También necesita cambiar la CPU del [modo real] de 16 bits primero al [modo protegido] de 32 bits, y luego al [modo largo] de 64 bits, donde están disponibles los registros de 64 bits y toda la memoria principal.
Su tercera tarea es consultar cierta información (como un mapa de memoria) desde el BIOS y pasársela al kernel del sistema operativo. [modo real]: https://en.wikipedia.org/wiki/Real_mode [modo protegido]: https://en.wikipedia.org/wiki/Protected_mode [modo largo]: https://en.wikipedia.org/wiki/Long_mode [segmentación de memoria]: https://en.wikipedia.org/wiki/X86_memory_segmentation Escribir un bootloader es un poco tedioso, ya que requiere lenguaje ensamblador y muchos pasos poco claros como “escribir este valor mágico en este registro del procesador”. Por ello, no cubrimos la creación de bootloaders en este artículo y en su lugar proporcionamos una herramienta llamada [bootimage] que automatiza el proceso de creación de un bootloader. [bootimage]: https://github.com/rust-osdev/bootimage Si te interesa construir tu propio bootloader: ¡Estén atentos! Un conjunto de artículos sobre este tema está en camino. #### El Estándar Multiboot Para evitar que cada sistema operativo implemente su propio bootloader, que sea compatible solo con un único sistema, la [Free Software Foundation] creó en 1995 un estándar abierto de bootloaders llamado [Multiboot]. El estándar define una interfaz entre el bootloader y el sistema operativo, de modo que cualquier bootloader compatible con Multiboot pueda cargar cualquier sistema operativo compatible con Multiboot. La implementación de referencia es [GNU GRUB], que es el bootloader más popular para sistemas Linux. [Free Software Foundation]: https://en.wikipedia.org/wiki/Free_Software_Foundation [Multiboot]: https://wiki.osdev.org/Multiboot [GNU GRUB]: https://en.wikipedia.org/wiki/GNU_GRUB Para hacer un kernel compatible con Multiboot, solo necesitas insertar un llamado [encabezado Multiboot] al inicio del archivo del kernel. Esto hace que arrancar un sistema operativo desde GRUB sea muy sencillo. 
Sin embargo, GRUB y el estándar Multiboot también tienen algunos problemas: [encabezado Multiboot]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#OS-image-format - Solo soportan el modo protegido de 32 bits. Esto significa que aún tienes que configurar la CPU para cambiar al modo largo de 64 bits. - Están diseñados para simplificar el cargador de arranque en lugar del kernel. Por ejemplo, el kernel necesita vincularse con un [tamaño de página predeterminado ajustado], porque GRUB no puede encontrar el encabezado Multiboot de otro modo. Otro ejemplo es que la [información de arranque], que se pasa al kernel, contiene muchas estructuras dependientes de la arquitectura en lugar de proporcionar abstracciones limpias. - Tanto GRUB como el estándar Multiboot están escasamente documentados. - GRUB necesita instalarse en el sistema host para crear una imagen de disco arrancable a partir del archivo del kernel. Esto dificulta el desarrollo en Windows o Mac. [tamaño de página predeterminado ajustado]: https://wiki.osdev.org/Multiboot#Multiboot_2 [información de arranque]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Boot-information-format Debido a estas desventajas, decidimos no usar GRUB ni el estándar Multiboot. Sin embargo, planeamos agregar soporte para Multiboot a nuestra herramienta [bootimage], para que sea posible cargar tu kernel en un sistema GRUB también. Si te interesa escribir un kernel compatible con Multiboot, revisa la [primera edición] de esta serie de blogs. [primera edición]: @/edition-1/_index.md ### UEFI (Por el momento no proporcionamos soporte para UEFI, ¡pero nos encantaría hacerlo! Si deseas ayudar, por favor háznoslo saber en el [issue de Github](https://github.com/phil-opp/blog_os/issues/349).) ## Un Kernel Mínimo Ahora que tenemos una idea general de cómo arranca una computadora, es momento de crear nuestro propio kernel mínimo. 
Nuestro objetivo es crear una imagen de disco que, al arrancar, imprima “Hello World!” en la pantalla. Para esto, extendemos el [binario Rust autónomo][un binario Rust autónomo] del artículo anterior. Como recordarás, construimos el binario independiente mediante `cargo`, pero dependiendo del sistema operativo, necesitábamos diferentes nombres de punto de entrada y banderas de compilación. Esto se debe a que `cargo` construye por defecto para el _sistema anfitrión_, es decir, el sistema en el que estás ejecutando el comando. Esto no es lo que queremos para nuestro kernel, ya que un kernel que funcione encima, por ejemplo, de Windows, no tiene mucho sentido. En su lugar, queremos compilar para un _sistema destino_ claramente definido.

### Instalación de Rust Nightly {#instalacion-de-rust-nightly}

Rust tiene tres canales de lanzamiento: _stable_, _beta_ y _nightly_. El libro de Rust explica muy bien la diferencia entre estos canales, así que tómate un momento para [revisarlo](https://doc.rust-lang.org/book/appendix-07-nightly-rust.html#choo-choo-release-channels-and-riding-the-trains). Para construir un sistema operativo, necesitaremos algunas características experimentales que solo están disponibles en el canal nightly, por lo que debemos instalar una versión nightly de Rust. Para administrar instalaciones de Rust, recomiendo ampliamente [rustup]. Este permite instalar compiladores nightly, beta y estable lado a lado, y facilita mantenerlos actualizados. Con rustup, puedes usar un compilador nightly en el directorio actual ejecutando `rustup override set nightly`. Alternativamente, puedes agregar un archivo llamado `rust-toolchain` con el contenido `nightly` en el directorio raíz del proyecto. Puedes verificar que tienes una versión nightly instalada ejecutando `rustc --version`: el número de versión debería contener `-nightly` al final.
[rustup]: https://www.rustup.rs/

El compilador nightly nos permite activar varias características experimentales utilizando las llamadas _banderas de características_ al inicio de nuestro archivo. Por ejemplo, podríamos habilitar el macro experimental [`asm!`] para ensamblador en línea agregando `#![feature(asm)]` en la parte superior de nuestro archivo `main.rs`. Ten en cuenta que estas características experimentales son completamente inestables, lo que significa que futuras versiones de Rust podrían cambiarlas o eliminarlas sin previo aviso. Por esta razón, solo las utilizaremos si son absolutamente necesarias.

[`asm!`]: https://doc.rust-lang.org/stable/reference/inline-assembly.html

### Especificación del Objetivo

Cargo soporta diferentes sistemas destino mediante el parámetro `--target`. El destino se describe mediante una _[tripleta de destino]_, que especifica la arquitectura de la CPU, el proveedor, el sistema operativo y el [ABI]. Por ejemplo, la tripleta de destino `x86_64-unknown-linux-gnu` describe un sistema con una CPU `x86_64`, sin un proveedor claro, y un sistema operativo Linux con el ABI GNU. Rust soporta [muchas tripletas de destino diferentes][platform-support], incluyendo `arm-linux-androideabi` para Android o [`wasm32-unknown-unknown` para WebAssembly](https://www.hellorust.com/setup/wasm-target/).

[tripleta de destino]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple
[ABI]: https://stackoverflow.com/a/2456882
[platform-support]: https://forge.rust-lang.org/release/platform-support.html
[custom-targets]: https://doc.rust-lang.org/nightly/rustc/targets/custom.html

Para nuestro sistema destino, sin embargo, requerimos algunos parámetros de configuración especiales (por ejemplo, sin un sistema operativo subyacente), por lo que ninguna de las [tripletas de destino existentes][platform-support] encaja. Afortunadamente, Rust nos permite definir [nuestros propios objetivos][custom-targets] mediante un archivo JSON.
Por ejemplo, un archivo JSON que describe el objetivo `x86_64-unknown-linux-gnu` se ve así: ```json { "llvm-target": "x86_64-unknown-linux-gnu", "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", "arch": "x86_64", "target-endian": "little", "target-pointer-width": 64, "target-c-int-width": 32, "os": "linux", "executables": true, "linker-flavor": "gcc", "pre-link-args": ["-m64"], "morestack": false } ``` La mayoría de los campos son requeridos por LLVM para generar código para esa plataforma. Por ejemplo, el campo [`data-layout`] define el tamaño de varios tipos de enteros, números de punto flotante y punteros. Luego, hay campos que Rust utiliza para la compilación condicional, como `target-pointer-width`. El tercer tipo de campo define cómo debe construirse el crate. Por ejemplo, el campo `pre-link-args` especifica argumentos que se pasan al [linker]. [`data-layout`]: https://llvm.org/docs/LangRef.html#data-layout [linker]: https://en.wikipedia.org/wiki/Linker_(computing) Nuestro kernel también tiene como objetivo los sistemas `x86_64`, por lo que nuestra especificación de objetivo será muy similar a la anterior. Comencemos creando un archivo llamado `x86_64-blog_os.json` (puedes elegir el nombre que prefieras) con el siguiente contenido común: ```json { "llvm-target": "x86_64-unknown-none", "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", "arch": "x86_64", "target-endian": "little", "target-pointer-width": 64, "target-c-int-width": 32, "os": "none", "executables": true } ``` Ten en cuenta que cambiamos el sistema operativo en el campo `llvm-target` y en el campo `os` a `none`, porque nuestro kernel se ejecutará directamente sobre hardware sin un sistema operativo subyacente. 
Agregamos las siguientes entradas relacionadas con la construcción:

```json
"linker-flavor": "ld.lld",
"linker": "rust-lld",
```

En lugar de usar el enlazador predeterminado de la plataforma (que podría no soportar objetivos de Linux), utilizamos el enlazador multiplataforma [LLD] que se incluye con Rust para enlazar nuestro kernel.

[LLD]: https://lld.llvm.org/

```json
"panic-strategy": "abort",
```

Esta configuración especifica que el objetivo no soporta [stack unwinding] en caso de un pánico, por lo que el programa debería abortar directamente. Esto tiene el mismo efecto que la opción `panic = "abort"` en nuestro archivo Cargo.toml, por lo que podemos eliminarla de ahí. (Ten en cuenta que, a diferencia de la opción en Cargo.toml, esta opción del destino también se aplica cuando recompilamos la biblioteca `core` más adelante en este artículo. Por lo tanto, incluso si prefieres mantener la opción en Cargo.toml, asegúrate de incluir esta opción.)

[stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php

```json
"disable-redzone": true,
```

Estamos escribiendo un kernel, por lo que en algún momento necesitaremos manejar interrupciones. Para hacerlo de manera segura, debemos deshabilitar una optimización del puntero de pila llamada _“red zone”_, ya que de lo contrario podría causar corrupción en la pila. Para más información, consulta nuestro artículo sobre cómo [deshabilitar la red zone].

[deshabilitar la red zone]: @/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.md

```json
"features": "-mmx,-sse,+soft-float",
```

El campo `features` habilita o deshabilita características del destino. Deshabilitamos las características `mmx` y `sse` anteponiéndoles un signo menos y habilitamos la característica `soft-float` anteponiéndole un signo más. Ten en cuenta que no debe haber espacios entre las diferentes banderas, ya que de lo contrario LLVM no podrá interpretar correctamente la cadena de características.
Las características `mmx` y `sse` determinan el soporte para instrucciones [Single Instruction Multiple Data (SIMD)], que a menudo pueden acelerar significativamente los programas. Sin embargo, el uso de los registros SIMD en kernels de sistemas operativos genera problemas de rendimiento. Esto se debe a que el kernel necesita restaurar todos los registros a su estado original antes de continuar un programa interrumpido. Esto implica que el kernel debe guardar el estado completo de SIMD en la memoria principal en cada llamada al sistema o interrupción de hardware. Dado que el estado SIMD es muy grande (512–1600 bytes) y las interrupciones pueden ocurrir con mucha frecuencia, estas operaciones adicionales de guardar/restaurar afectan considerablemente el rendimiento. Para evitar esto, deshabilitamos SIMD para nuestro kernel (pero no para las aplicaciones que se ejecutan encima). [Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD Un problema al deshabilitar SIMD es que las operaciones de punto flotante en `x86_64` requieren registros SIMD por defecto. Para resolver este problema, agregamos la característica `soft-float`, que emula todas las operaciones de punto flotante mediante funciones de software basadas en enteros normales. Para más información, consulta nuestro artículo sobre [cómo deshabilitar SIMD](@/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.md). 
#### Juntándolo Todo Nuestro archivo de especificación de objetivo ahora se ve así: ```json { "llvm-target": "x86_64-unknown-none", "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", "arch": "x86_64", "target-endian": "little", "target-pointer-width": 64, "target-c-int-width": 32, "os": "none", "executables": true, "linker-flavor": "ld.lld", "linker": "rust-lld", "panic-strategy": "abort", "disable-redzone": true, "features": "-mmx,-sse,+soft-float" } ``` ### Construyendo nuestro Kernel Compilar para nuestro nuevo objetivo usará convenciones de Linux, ya que la opción de enlazador `ld.lld` instruye a LLVM a compilar con la bandera `-flavor gnu` (para más opciones del enlazador, consulta [la documentación de rustc](https://doc.rust-lang.org/rustc/codegen-options/index.html#linker-flavor)). Esto significa que necesitamos un punto de entrada llamado `_start`, como se describió en el [artículo anterior]: [artículo anterior]: @/edition-2/posts/01-freestanding-rust-binary/index.md ```rust // src/main.rs #![no_std] // no enlazar con la biblioteca estándar de Rust #![no_main] // deshabilitar todos los puntos de entrada a nivel de Rust use core::panic::PanicInfo; /// Esta función se llama cuando ocurre un pánico. #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } #[unsafe(no_mangle)] // no modificar el nombre de esta función pub extern "C" fn _start() -> ! { // esta función es el punto de entrada, ya que el enlazador busca una función // llamada `_start` por defecto loop {} } ``` Ten en cuenta que el punto de entrada debe llamarse `_start` sin importar el sistema operativo anfitrión. Ahora podemos construir el kernel para nuestro nuevo objetivo pasando el nombre del archivo JSON como `--target`: ``` > cargo build --target x86_64-blog_os.json error: `.json` target specs require -Zjson-target-spec ``` ¡Falla! 
The error tells us that custom JSON target specifications are an unstable feature that requires an explicit opt-in. This is because the format of JSON target files is not considered stable yet, so it might change in future Rust versions. See the [tracking issue for custom JSON target specs][json-target-spec-issue] for more information.

[json-target-spec-issue]: https://github.com/rust-lang/rust/issues/151528

#### The `json-target-spec` Option

To enable support for custom JSON target specifications, we need to create a local [cargo] configuration file at `.cargo/config.toml` (the `.cargo` folder should be next to your `src` folder) with the following content:

[cargo]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
```

This enables the unstable `json-target-spec` feature, allowing us to use custom JSON target files.

With this configuration in place, let's try building again:

```
> cargo build --target x86_64-blog_os.json

error[E0463]: can't find crate for `core`
```

Now we see a different error! It tells us that the Rust compiler can no longer find the [`core` library]. This library contains basic Rust types such as `Result`, `Option`, and iterators, and is implicitly linked to all `no_std` crates.

[`core` library]: https://doc.rust-lang.org/nightly/core/index.html

The problem is that the `core` library is distributed together with the Rust compiler as a _precompiled_ library. So it is only valid for supported host triples (e.g., `x86_64-unknown-linux-gnu`) but not for our custom target. If we want to compile code for other targets, we need to recompile `core` for these targets first.

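As a small illustration of what lives in `core`, the following sketch uses only `Option`, `Result`, and iterator adapters, all of which are defined in `core` and merely re-exported by `std`. The helper names are hypothetical and not part of the kernel code; the point is only that code like this needs no operating system support, so it would also compile in a `#![no_std]` crate once `core` is available for the target.

```rust
/// Returns the first non-zero byte of `bytes`, if there is one.
/// Hypothetical helper: uses only slices, iterators, and `Option` from `core`.
pub fn first_nonzero(bytes: &[u8]) -> Option<u8> {
    bytes.iter().copied().find(|&b| b != 0)
}

/// Converts a single ASCII digit to its numeric value without allocating.
/// Returns the offending byte as the error if it is not a digit.
pub fn parse_digit(byte: u8) -> Result<u8, u8> {
    if byte.is_ascii_digit() {
        Ok(byte - b'0')
    } else {
        Err(byte)
    }
}
```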
#### The `build-std` Option

That's where the [`build-std`] feature of cargo comes in. It allows recompiling `core` and other standard library crates on demand, instead of using the precompiled versions shipped with the Rust installation. This feature is not finished yet, so it is marked as "unstable" and only available on [nightly Rust compilers].

[`build-std`]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std
[nightly Rust compilers]: #instalacion-de-rust-nightly

To use the feature, we need to add the following to our [cargo] configuration file at `.cargo/config.toml`:

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std = ["core", "compiler_builtins"]
```

This tells cargo that it should recompile the `core` and `compiler_builtins` libraries. The latter is required because it is a dependency of `core`. In order to recompile these libraries, cargo needs access to the Rust source code, which we can install with `rustup component add rust-src`.

**Note:** The `unstable.build-std` configuration key requires at least the Rust nightly from July 15, 2020.

After setting the `unstable.build-std` configuration key and installing the `rust-src` component, we can rerun our build command:

```
> cargo build --target x86_64-blog_os.json
   Compiling core v0.0.0 (/…/rust/src/libcore)
   Compiling rustc-std-workspace-core v1.99.0 (/…/rust/src/tools/rustc-std-workspace-core)
   Compiling compiler_builtins v0.1.32
   Compiling blog_os v0.1.0 (/…/blog_os)
    Finished dev [unoptimized + debuginfo] target(s) in 0.29 secs
```

We see that `cargo build` now recompiles the `core`, `rustc-std-workspace-core` (a dependency of `compiler_builtins`), and `compiler_builtins` libraries for our custom target.

#### Memory-Related Intrinsics

The Rust compiler assumes that a certain set of built-in functions is available for all systems. Most of these functions are provided by the `compiler_builtins` crate that we just recompiled. However, there are some memory-related functions in that crate that are not enabled by default because they are normally provided by the C library on the system. These functions include `memset`, which sets all bytes in a memory block to a given value, `memcpy`, which copies one memory block to another, and `memcmp`, which compares two memory blocks. While we don't need any of these functions to compile our kernel right now, they will be required as soon as we add more code to it (e.g., when copying structs around).

Since we can't link to the C library of the operating system, we need an alternative way to provide these functions to the compiler. One possible approach could be to implement our own `memset`, `memcpy`, etc. functions and apply the `#[unsafe(no_mangle)]` attribute to them (to avoid the automatic renaming during compilation).
However, this is dangerous since the slightest mistake in the implementation of these functions could lead to undefined behavior. For example, implementing `memcpy` with a `for` loop may result in an infinite recursion because `for` loops implicitly call the [`IntoIterator::into_iter`] trait method, which may call `memcpy` again. So it's a good idea to reuse existing, well-tested implementations instead.

[`IntoIterator::into_iter`]: https://doc.rust-lang.org/stable/core/iter/trait.IntoIterator.html#tymethod.into_iter

Fortunately, the `compiler_builtins` crate already contains implementations for all the needed functions. They are just disabled by default to avoid collisions with the implementations from the C library. We can enable them by setting cargo's [`build-std-features`] flag to `["compiler-builtins-mem"]`. Like the `build-std` flag, this flag can either be passed on the command line as a `-Z` flag or configured in the `unstable` table of the `.cargo/config.toml` file. Since we always want to build with this flag, the config file option makes more sense for us:

[`build-std-features`]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std-features

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std-features = ["compiler-builtins-mem"]
build-std = ["core", "compiler_builtins"]
```

(Support for the `compiler-builtins-mem` feature was only [added in 2020](https://github.com/rust-lang/rust/pull/77284), so you need at least Rust nightly `2020-09-30` for it.)

Behind the scenes, this flag enables the [`mem` feature][característica `mem`] of the `compiler_builtins` crate. The effect of this is that the `#[unsafe(no_mangle)]` attribute is applied to the [`memcpy` etc. implementations][implementaciones de `memcpy`, etc.] of the crate, which makes them available to the linker.

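To make the earlier warning about hand-written memory functions more concrete, here is a rough sketch of what a byte-wise copy routine might look like. It is deliberately named `copy_bytes` (a hypothetical name) rather than `memcpy`, so that it cannot clash with the real symbol, and it uses a `while` loop with an index instead of a `for` loop to avoid going through the iterator machinery. This is only an illustration of the idea, not the implementation used by `compiler_builtins`.

```rust
/// Copies `n` bytes from `src` to `dest`, one byte at a time, and returns `dest`.
///
/// # Safety
/// Both pointers must be valid for `n` bytes and the two regions must not overlap.
pub unsafe fn copy_bytes(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    let mut i = 0;
    // A plain index loop: no iterators, no slices, just pointer arithmetic.
    while i < n {
        unsafe {
            *dest.add(i) = *src.add(i);
        }
        i += 1;
    }
    dest
}
```

Even a loop this simple is not guaranteed to be safe to export as `memcpy`: an optimizing compiler may recognize the copy pattern and replace it with a call to `memcpy` itself, which is one more reason to rely on the implementations that ship with `compiler_builtins`.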
[característica `mem`]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/Cargo.toml#L54-L55
[implementaciones de `memcpy`, etc.]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/src/mem.rs#L12-L69

With this change, our kernel has valid implementations for all compiler-required functions, so it will continue to compile even if our code gets more complex.

#### Set a Default Target

To avoid passing the `--target` parameter on every invocation of `cargo build`, we can override the default target. To do this, we add the following to our [cargo configuration][configuración de cargo] file at `.cargo/config.toml`:

[configuración de cargo]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[build]
target = "x86_64-blog_os.json"
```

This tells `cargo` to use our `x86_64-blog_os.json` target when no explicit `--target` argument is passed. This means that we can now build our kernel with a simple `cargo build`. For more information on cargo configuration options, check out the [official documentation][configuración de cargo].

Our kernel now builds for a bare metal target, but our `_start` entry point, which will be called by the boot loader, is still empty. It's time to output something to the screen from it.

### Printing to Screen

The easiest way to print text to the screen at this stage is the [VGA text buffer][búfer de texto VGA]. It is a special memory area mapped to the VGA hardware that contains the contents displayed on screen. It normally consists of 25 lines that each contain 80 character cells. Each character cell displays an ASCII character with some foreground and background colors.
The screen output looks like this:

[búfer de texto VGA]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode

![screen output for common ASCII characters](https://upload.wikimedia.org/wikipedia/commons/f/f8/Codepage-437.png)

We will discuss the exact layout of the VGA buffer in the next post, where we write a first small driver for it. For printing “Hello World!”, we just need to know that the buffer is located at address `0xb8000` and that each character cell consists of an ASCII byte and a color byte.

The implementation looks like this:

```rust
static HELLO: &[u8] = b"Hello World!";

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    let vga_buffer = 0xb8000 as *mut u8;

    for (i, &byte) in HELLO.iter().enumerate() {
        unsafe {
            *vga_buffer.offset(i as isize * 2) = byte;
            *vga_buffer.offset(i as isize * 2 + 1) = 0xb;
        }
    }

    loop {}
}
```

First, we cast the integer `0xb8000` into a [raw pointer]. Then we [iterate] over the bytes of the [static] `HELLO` [byte string]. We use the [`enumerate`] method to additionally get a running variable `i`. In the body of the `for` loop, we use the [`offset`] method to write the string byte and the corresponding color byte (`0xb` is a light cyan).

[iterate]: https://doc.rust-lang.org/stable/book/ch13-02-iterators.html
[raw pointer]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[static]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime
[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate
[byte string]: https://doc.rust-lang.org/reference/tokens.html#byte-string-literals
[`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset

Note that there's an [`unsafe`] block around all memory writes. The reason is that the Rust compiler can't prove that the raw pointers we create are valid.
They could point anywhere and lead to data corruption. By putting them into an `unsafe` block, we're basically telling the compiler that we are absolutely sure that the operations are valid. Note that an `unsafe` block does not turn off Rust's safety checks. It only allows you to do [five additional things].

[`unsafe`]: https://doc.rust-lang.org/stable/book/ch20-01-unsafe-rust.html
[five additional things]: https://doc.rust-lang.org/stable/book/ch20-01-unsafe-rust.html#unsafe-superpowers

I want to emphasize that **this is not the way we want to do things in Rust**. It's very easy to make mistakes when working with raw pointers inside `unsafe` blocks. For example, we could easily write beyond the buffer's end if we're not careful.

So we want to minimize the use of `unsafe` as much as possible. Rust gives us the ability to do this by creating safe abstractions. For example, we could create a VGA buffer type that encapsulates all unsafety and ensures that it is _impossible_ to do anything wrong from the outside. This way, we would only need minimal amounts of `unsafe` code and could be sure that we don't violate [memory safety]. We will create such a safe VGA buffer abstraction in the next post.

[memory safety]: https://en.wikipedia.org/wiki/Memory_safety

## Running our Kernel

Now that we have an executable that does something perceptible, it is time to run it. First, we need to turn our compiled kernel into a bootable disk image by linking it with a bootloader. Then we can run the disk image in the [QEMU] virtual machine or boot it on real hardware using a USB stick.

### Creating a Bootimage

To turn our compiled kernel into a bootable disk image, we need to link it with a bootloader.
As we learned in the [section about booting], the bootloader is responsible for initializing the CPU and loading our kernel.

[section about booting]: #el-proceso-de-arranque

Instead of writing our own bootloader, which is a project on its own, we use the [`bootloader`] crate. This crate implements a basic BIOS bootloader without any C dependencies, just Rust and inline assembly. To use it for booting our kernel, we need to add a dependency on it:

[`bootloader`]: https://crates.io/crates/bootloader

```toml
# in Cargo.toml

[dependencies]
bootloader = "0.9"
```

**Note:** This post is only compatible with `bootloader v0.9`. Newer versions use a different build system and will result in build errors when following this post.

Adding the bootloader as a dependency is not enough to actually create a bootable disk image. The problem is that we need to link our kernel with the bootloader after compilation, but cargo has no support for [post-build scripts].

[post-build scripts]: https://github.com/rust-lang/cargo/issues/545

To solve this problem, we created a tool named `bootimage` that first compiles the kernel and the bootloader, and then links them together to create a bootable disk image. To install the tool, go into your home directory (or any directory outside of your cargo project) and execute the following command in your terminal:

```
cargo install bootimage
```

For running `bootimage` and building the bootloader, you need to have the `llvm-tools-preview` rustup component installed. You can do so by executing `rustup component add llvm-tools-preview`.

After installing `bootimage` and adding the `llvm-tools-preview` component, you can create a bootable disk image by going back to your cargo project directory and executing:

```
> cargo bootimage
```

We see that the tool recompiles our kernel using `cargo build`, so it will automatically pick up any changes you make. Afterwards, it compiles the bootloader, which might take a while. Like all crate dependencies, it is only built once and then cached, so subsequent builds will be much faster. Finally, `bootimage` combines the bootloader and your kernel into a bootable disk image.

After executing the command, you should see a bootable disk image named `bootimage-blog_os.bin` in your `target/x86_64-blog_os/debug` directory. You can boot it in a virtual machine or copy it to a USB drive to boot it on real hardware. (Note that this is not a CD image, which has a different format, so burning it to a CD doesn't work.)

#### How does it work?

The `bootimage` tool performs the following steps behind the scenes:

- It compiles our kernel to an [ELF] file.
- It compiles the bootloader dependency as a standalone executable.
- It links the bytes of the kernel ELF file to the bootloader.

[ELF]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
[rust-osdev/bootloader]: https://github.com/rust-osdev/bootloader

When booted, the bootloader reads and parses the appended ELF file. It then maps the program segments to virtual addresses in the page tables, zeroes the `.bss` section, and sets up a stack. Finally, it reads the entry point address (our `_start` function) and jumps to it.

### Booting it in QEMU

We can now boot the disk image in a virtual machine. To boot it in [QEMU], execute the following command:
[QEMU]: https://www.qemu.org/

```
> qemu-system-x86_64 -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin
```

This opens a separate window which should look similar to this:

![QEMU showing "Hello World!"](qemu.png)

We see that our "Hello World!" is visible on the screen.

### Real Machine

It is also possible to write the image to a USB stick and boot it on a real machine, **but be very careful** to choose the correct device name, because **everything on that device will be overwritten**:

```
> dd if=target/x86_64-blog_os/debug/bootimage-blog_os.bin of=/dev/sdX && sync
```

Where `sdX` is the device name of your USB stick.

After writing the image to the USB stick, you can run it on real hardware by booting from it. You probably need to use a special boot menu or change the boot order in your BIOS configuration to boot from the USB stick. Note that it currently doesn't work for UEFI machines, since the `bootloader` crate has no UEFI support yet.

### Using `cargo run`

To make it easier to run our kernel in QEMU, we can set the `runner` configuration key for cargo:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "none")']
runner = "bootimage runner"
```

The `target.'cfg(target_os = "none")'` table applies to all targets whose target configuration file's `"os"` field is set to `"none"`. This includes our `x86_64-blog_os.json` target. The `runner` key specifies the command that should be invoked for `cargo run`. The command is run after a successful build, with the executable path passed as the first argument. See the [cargo documentation][configuración de cargo] for more details.

The `bootimage runner` command is specifically designed to be usable as a `runner` executable.
It links the given executable with the project's bootloader dependency and then launches QEMU. See the [Readme of `bootimage`] for more details and possible configuration options.

[Readme of `bootimage`]: https://github.com/rust-osdev/bootimage

Now we can use `cargo run` to compile our kernel and boot it in QEMU.

## What's next?

In the next post, we will explore the VGA text buffer in more detail and write a safe interface for it. We will also add support for the `println` macro.

================================================
FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/index.fa.md
================================================

+++
title = "یک هسته مینیمال با Rust"
weight = 2
path = "fa/minimal-rust-kernel"
date = 2018-02-10

[extra]
# Please update this when updating the translation
translation_based_on_commit = "7212ffaa8383122b1eb07fe1854814f99d2e1af4"
# GitHub usernames of the people that translated this post
translators = ["hamidrezakp", "MHBahrampour"]
rtl = true
+++

In this post, we create a minimal 64-bit Rust kernel for the x86 architecture. We build upon the [freestanding Rust binary][باینری مستقل Rust] from the previous post to create a bootable disk image that prints something to the screen.

[باینری مستقل Rust]: @/edition-2/posts/01-freestanding-rust-binary/index.md

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom] of this post. The complete source code for this post can be found in the [`post-02`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-02

## The Boot Process

When you turn on a computer, it begins executing firmware code that is stored in motherboard [ROM]. This code performs a [power-on self-test], detects available RAM, and pre-initializes the CPU and hardware.
Afterwards, it looks for a bootable disk and starts booting the operating system kernel.

[ROM]: https://en.wikipedia.org/wiki/Read-only_memory
[power-on self-test]: https://en.wikipedia.org/wiki/Power-on_self-test

On x86, there are two firmware standards: the “Basic Input/Output System” (**[BIOS]**) and the newer “Unified Extensible Firmware Interface” (**[UEFI]**). The BIOS standard is old and outdated, but simple and well-supported on any x86 machine since the 1980s. UEFI, in contrast, is more modern and has many more features, but is more complex to set up (at least in my opinion).

[BIOS]: https://en.wikipedia.org/wiki/BIOS
[UEFI]: https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface

Currently, we only provide BIOS support, but support for UEFI is planned, too. If you'd like to help us with this, check out the [GitHub issue](https://github.com/phil-opp/blog_os/issues/349).

### BIOS Boot

Almost all x86 systems have support for BIOS booting, including newer UEFI-based machines that use an emulated BIOS. This is great, because you can use the same boot logic across machines built over the past decades. But this wide compatibility is at the same time the biggest disadvantage of BIOS booting, because it means that the CPU is put into a 16-bit compatibility mode called [real mode] before booting, so that archaic bootloaders from the 1980s still work.

But let's start from the beginning:

When you turn on a computer, it loads the BIOS from some special flash memory located on the motherboard. The BIOS runs self-test and initialization routines of the hardware, then it looks for bootable disks. If it finds one, control is transferred to its _bootloader_, which is a 512-byte portion of executable code stored at the disk's beginning. Most bootloaders are larger than 512 bytes, so bootloaders are commonly split into a small first stage, which fits into 512 bytes, and a second stage, which is subsequently loaded by the first stage.

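The 512-byte limit of the first stage can be illustrated with a short sketch. Note the assumptions: the post does not describe how the BIOS recognizes a bootable sector, so the check below relies on the conventional boot signature of legacy BIOS systems (bytes `0x55` and `0xAA` at offsets 510 and 511), and all names are hypothetical helpers rather than part of the `bootloader` crate.

```rust
/// Size of the region that the BIOS loads as the first bootloader stage.
pub const BOOT_SECTOR_SIZE: usize = 512;

/// Returns `true` if the sector ends with the conventional BIOS boot signature.
/// Assumption (not stated in the post): a legacy BIOS treats a sector as
/// bootable when its last two bytes are 0x55 and 0xAA.
pub fn has_boot_signature(sector: &[u8; BOOT_SECTOR_SIZE]) -> bool {
    sector[510] == 0x55 && sector[511] == 0xAA
}

/// Number of 512-byte sectors that a later stage of `len` bytes occupies on
/// disk, rounded up. The first stage would have to load this many sectors.
pub fn sectors_needed(len: usize) -> usize {
    (len + BOOT_SECTOR_SIZE - 1) / BOOT_SECTOR_SIZE
}
```

This is the reason for the two-stage design: the first stage has to fit its loading logic into a single sector, while later stages can span as many sectors as they need.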
The bootloader has to determine the location of the kernel image on the disk and load it into memory. It also needs to switch the CPU from the 16-bit [real mode] first to the 32-bit [protected mode], and then to the 64-bit [long mode], where 64-bit registers and the complete main memory are available. Its third job is to query certain information (such as a memory map) from the BIOS and pass it to the OS kernel.

[real mode]: https://en.wikipedia.org/wiki/Real_mode
[protected mode]: https://en.wikipedia.org/wiki/Protected_mode
[long mode]: https://en.wikipedia.org/wiki/Long_mode
[memory segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation

Writing a bootloader is a bit cumbersome, as it requires assembly language and a lot of non-insightful steps like “write this magic value to this processor register”. Therefore, we don't cover bootloader creation in this post and instead provide a tool named [bootimage] that automatically adds a bootloader to your kernel.

[bootimage]: https://github.com/rust-osdev/bootimage

If you are interested in building your own bootloader: Stay tuned, a set of posts on this topic is already planned!

#### The Multiboot Standard

To avoid that every operating system implements its own bootloader, which is only compatible with a single OS, the [Free Software Foundation] created an open bootloader standard called [Multiboot] in 1995. The standard defines an interface between the bootloader and the operating system, so that any Multiboot-compliant bootloader can load any Multiboot-compliant operating system. The reference implementation is [GNU GRUB], which is the most popular bootloader for Linux systems.

[Free Software Foundation]: https://en.wikipedia.org/wiki/Free_Software_Foundation
[Multiboot]: https://wiki.osdev.org/Multiboot
[GNU GRUB]: https://en.wikipedia.org/wiki/GNU_GRUB

To make a kernel Multiboot compliant, one just needs to insert a so-called [Multiboot header] at the beginning of the kernel file. This makes it very easy to boot an OS from GRUB.
However, GRUB and the Multiboot standard have some problems too:

[Multiboot header]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#OS-image-format

- They support only the 32-bit protected mode. This means that you still have to do the CPU configuration to switch to the 64-bit long mode.
- They are designed to make the bootloader simple instead of the kernel. For example, the kernel needs to be linked with an [adjusted default page size], because GRUB can't find the Multiboot header otherwise. Another example is that the [boot information], which is passed to the kernel, contains lots of architecture-dependent structures instead of providing clean abstractions.
- Both GRUB and the Multiboot standard are only sparsely documented.
- GRUB needs to be installed on the host system to create a bootable disk image from the kernel file. This makes development on Windows or Mac more difficult.

[adjusted default page size]: https://wiki.osdev.org/Multiboot#Multiboot_2
[boot information]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Boot-information-format

Because of these drawbacks, we decided not to use GRUB or the Multiboot standard. However, we plan to add Multiboot support to our [bootimage] tool, so that it's possible to load your kernel on a GRUB system too. If you're interested in writing a Multiboot compliant kernel, check out the [first edition] of this blog series.

[first edition]: @/edition-1/_index.md

### UEFI

(We don't provide UEFI support at the moment, but we would love to! If you'd like to help, please tell us in the [GitHub issue](https://github.com/phil-opp/blog_os/issues/349).)

## A Minimal Kernel

Now that we roughly know how a computer boots, it's time to create our own minimal kernel. Our goal is to create a disk image that prints “Hello World!” to the screen when booted. For this, we use the [freestanding Rust binary][باینری مستقل Rust] from the previous post.

As you may remember, we built the freestanding binary through `cargo`, but depending on the operating system, we needed different entry point names and compile flags. That's because `cargo` builds for the _host system_ by default, i.e., the system you're using to write the kernel. This isn't something we want for our kernel, because it makes no sense to run our OS kernel on top of another operating system. Instead, we want to compile the kernel for a clearly defined _target system_.

### Installing Rust Nightly {#installing-rust-nightly}

Rust has three release channels: _stable_, _beta_, and _nightly_. The Rust Book explains the difference between these channels really well, so take a minute and [check it out](https://doc.rust-lang.org/book/appendix-07-nightly-rust.html#choo-choo-release-channels-and-riding-the-trains). For building an operating system, we need some experimental features that are only available on the nightly channel, so we need to install a nightly version of Rust.

To manage Rust installations, I highly recommend [rustup]. It allows you to install nightly, beta, and stable compilers side-by-side and makes it easy to update them. With rustup, you can use a nightly compiler for the current directory by running `rustup override set nightly`. Alternatively, you can add a file called `rust-toolchain` with the content `nightly` to the project's root directory. You can check that you have a nightly version installed by running `rustc --version`: The version number should contain `-nightly` at the end.

[rustup]: https://www.rustup.rs/

The nightly compiler allows us to opt in to various experimental features by using so-called _feature flags_ at the top of our file. For example, we could enable the experimental [`asm!` macro] for inline assembly by adding `#![feature(asm)]` to the top of our `main.rs`.
Note that such experimental features are completely unstable, which means that future Rust versions might change or remove them without prior warning. For this reason, we will only use them if absolutely necessary.

[`asm!` macro]: https://doc.rust-lang.org/stable/reference/inline-assembly.html

### Target Specification

Cargo supports different target systems through the `--target` parameter. The target is described by a so-called _[target triple]_, which describes the CPU architecture, the vendor, the operating system, and the [ABI]. For example, the `x86_64-unknown-linux-gnu` target triple describes a system with an `x86_64` CPU, no clear vendor, and a Linux operating system with the GNU ABI. Rust supports [many different target triples][platform-support], including `arm-linux-androideabi` for Android or [`wasm32-unknown-unknown` for WebAssembly](https://www.hellorust.com/setup/wasm-target/).

[target triple]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple
[ABI]: https://stackoverflow.com/a/2456882
[platform-support]: https://forge.rust-lang.org/release/platform-support.html
[custom-targets]: https://doc.rust-lang.org/nightly/rustc/targets/custom.html

For our target system, however, we require some special configuration parameters (e.g., no underlying OS), so none of the [existing target triples][platform-support] fits. Fortunately, Rust allows us to define [our own target][custom-targets] through a JSON file. For example, a JSON file that describes the `x86_64-unknown-linux-gnu` target looks like this:

```json
{
    "llvm-target": "x86_64-unknown-linux-gnu",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "linux",
    "executables": true,
    "linker-flavor": "gcc",
    "pre-link-args": ["-m64"],
    "morestack": false
}
```

Most fields are required by LLVM to generate code for that platform.
For example, the [`data-layout`] field defines the size of various integer, floating point, and pointer types. Then there are fields that Rust uses for conditional compilation, such as `target-pointer-width`. The third kind of field defines how the crate should be built. For example, the `pre-link-args` field specifies arguments passed to the [linker].

[`data-layout`]: https://llvm.org/docs/LangRef.html#data-layout
[linker]: https://en.wikipedia.org/wiki/Linker_(computing)

We also target `x86_64` systems with our kernel, so our target specification will look very similar to the one above. Let's start by creating an `x86_64-blog_os.json` file (choose any name you like) with the common content:

```json
{
    "llvm-target": "x86_64-unknown-none",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "executables": true
}
```

Note that we changed the OS in the `llvm-target` and the `os` field to `none`, because we will run the kernel on bare metal.

We also add the following build-related entries:

```json
"linker-flavor": "ld.lld",
"linker": "rust-lld",
```

Instead of using the platform's default linker (which might not support Linux targets), we use the cross-platform [LLD] linker that is shipped with Rust for linking our kernel.

[LLD]: https://lld.llvm.org/

```json
"panic-strategy": "abort",
```

This setting specifies that the target doesn't support [stack unwinding] on panic, so instead the program should abort directly. This has the same effect as the `panic = "abort"` option in our Cargo.toml, so we can remove it from there. (Note that this target option also applies when we recompile the `core` library later in this post. So make sure to add this option, even if you prefer to keep the Cargo.toml option.)

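As a small illustration of the conditional-compilation fields mentioned in this section, the following sketch shows how Rust code can react to values such as `target-pointer-width` and `os` through `cfg`. The constant and function names are hypothetical and only serve the example; nothing like this is needed for the kernel itself.

```rust
/// Pointer width in bits, chosen at compile time from the target's
/// `target-pointer-width` value.
#[cfg(target_pointer_width = "64")]
pub const POINTER_BITS: u32 = 64;
#[cfg(target_pointer_width = "32")]
pub const POINTER_BITS: u32 = 32;
#[cfg(target_pointer_width = "16")]
pub const POINTER_BITS: u32 = 16;

/// Describes the kind of target the code was compiled for, based on the
/// target's `os` value.
pub fn describe_target() -> &'static str {
    if cfg!(target_os = "none") {
        "bare metal"
    } else {
        "hosted"
    }
}
```

Compiled for our `x86_64-blog_os.json` target, `POINTER_BITS` would be 64 and `describe_target` would return `"bare metal"`, since the specification sets `"os": "none"`.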
[stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php

```json
"disable-redzone": true,
```

We're writing a kernel, so we'll need to handle interrupts at some point. To do that safely, we have to disable a certain stack pointer optimization called the _“red zone”_, because it would cause stack corruption otherwise. For more information, see our separate post about [disabling the red zone].

[disabling the red zone]: @/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.md

```json
"features": "-mmx,-sse,+soft-float",
```

The `features` field enables/disables target features. We disable the `mmx` and `sse` features by prefixing them with a minus and enable the `soft-float` feature by prefixing it with a plus. Note that there must be no spaces between different flags, otherwise LLVM fails to interpret the features string.

The `mmx` and `sse` features determine support for [Single Instruction Multiple Data (SIMD)] instructions, which can often speed up programs significantly. However, using the large SIMD registers in OS kernels leads to performance problems. The reason is that the kernel needs to restore all registers to their original state before continuing an interrupted program. This means that the kernel has to save the complete SIMD state to main memory on each system call or hardware interrupt. Since the SIMD state is very large (512-1600 bytes) and interrupts can occur very often, these additional save/restore operations considerably harm performance. To avoid this, we disable SIMD for our kernel (not for applications running on top!).

[Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD

A problem with disabling SIMD is that floating point operations on `x86_64` require SIMD registers by default.
To solve this problem, we add the `soft-float` feature, which emulates all floating point operations through software functions based on normal integers.

For more information, see our post on [disabling SIMD](@/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.md).

```json
"rustc-abi": "x86-softfloat"
```

As we want to use the `soft-float` feature, we also need to tell the Rust compiler `rustc` that we want to use the corresponding ABI. We can do that by setting the `rustc-abi` field to `x86-softfloat`.

#### Putting it Together

Our target specification file now looks like this:

```json
{
    "llvm-target": "x86_64-unknown-none",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "executables": true,
    "linker-flavor": "ld.lld",
    "linker": "rust-lld",
    "panic-strategy": "abort",
    "disable-redzone": true,
    "features": "-mmx,-sse,+soft-float",
    "rustc-abi": "x86-softfloat"
}
```

### Building our Kernel

Compiling for our new target will use Linux conventions (I'm not quite sure why; I assume it's just LLVM's default). This means that we need an entry point named `_start` as described in the [previous post]:

[previous post]: @/edition-2/posts/01-freestanding-rust-binary/index.md

```rust
// src/main.rs

#![no_std] // don't link the Rust standard library
#![no_main] // disable all Rust-level entry points

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    // this function is the entry point, since the linker looks for a function
    // named `_start` by default
    loop {}
}
```

Note that the entry point needs to be called `_start` regardless of your host OS.
We can now build the kernel for our new target by passing the name of the JSON file as `--target`:

```
> cargo build --target x86_64-blog_os.json

error: `.json` target specs require -Zjson-target-spec
```

It fails! The error tells us that custom JSON target specifications are an unstable feature that requires explicit opt-in. This is because the format of JSON target files is not considered stable yet, so it might change in future Rust versions. See the [tracking issue for custom JSON target specifications][json-target-spec-issue] for more information.

[json-target-spec-issue]: https://github.com/rust-lang/rust/issues/151528

#### The `json-target-spec` Option

To enable support for custom JSON target specifications, we need to create a [cargo configuration][پیکربندی کارگو] file at `.cargo/config.toml` (the `.cargo` folder should be next to your `src` folder) with the following content:

[پیکربندی کارگو]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
```

This enables the unstable `json-target-spec` feature, allowing us to use custom JSON target files.

With this configuration in place, let's try building again:

```
> cargo build --target x86_64-blog_os.json

error[E0463]: can't find crate for `core`
```

Now we see a different error! It tells us that the Rust compiler no longer finds the [`core` library]. This library contains basic Rust types such as `Result`, `Option`, and iterators, and is implicitly linked to all `no_std` crates.

[`core` library]: https://doc.rust-lang.org/nightly/core/index.html

The problem is that the core library is distributed together with the Rust compiler as a _precompiled_ library. So it is only valid for supported host triples (e.g., `x86_64-unknown-linux-gnu`) but not for our custom target. If we want to compile code for other targets, we need to recompile `core` for these targets first.

#### The `build-std` Option

That's where the [`build-std` feature][ویژگی `build-std`] of cargo comes in.
It allows us to recompile `core` and other standard library crates on demand, instead of using the precompiled versions shipped with the Rust installation. This feature is very new and still not finished, so it is marked as "unstable" and only available on [nightly Rust compilers].

[ویژگی `build-std`]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std
[nightly Rust compilers]: #installing-rust-nightly

To use the feature, we need to add the following to our [cargo configuration][پیکربندی کارگو] file at `.cargo/config.toml`:

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std = ["core", "compiler_builtins"]
```

This tells cargo that it should recompile the `core` and `compiler_builtins` libraries. The latter is required because it is a dependency of `core`. In order to recompile these libraries, cargo needs access to the Rust source code, which we can install with `rustup component add rust-src`.
**Note:** The `unstable.build-std` configuration key requires a Rust nightly newer than 2020-07-15.
After setting the `unstable.build-std` configuration key and installing the `rust-src` component, we can rerun our build command:

```
> cargo build --target x86_64-blog_os.json
   Compiling core v0.0.0 (/…/rust/src/libcore)
   Compiling rustc-std-workspace-core v1.99.0 (/…/rust/src/tools/rustc-std-workspace-core)
   Compiling compiler_builtins v0.1.32
   Compiling blog_os v0.1.0 (/…/blog_os)
    Finished dev [unoptimized + debuginfo] target(s) in 0.29 secs
```

We see that `cargo build` now recompiles the `core`, `rustc-std-workspace-core` (a dependency of `compiler_builtins`), and `compiler_builtins` libraries for our custom target.

#### Memory-Related Intrinsics

The Rust compiler assumes that a certain set of built-in functions is available for all systems. Most of these functions are provided by the `compiler_builtins` crate that we just recompiled. However, there are some memory-related functions in that crate that are not enabled by default because they are normally provided by the C library on the system. These functions include `memset`, which sets all bytes in a memory block to a given value, `memcpy`, which copies one memory block to another, and `memcmp`, which compares two memory blocks. While we don't need any of these functions to compile our kernel right now, they will be required as soon as we add some more code to it (e.g. when copying structs around).

Since we can't link to the C library of the operating system, we need an alternative way to provide these functions to the compiler. One possible approach would be to implement our own `memset` etc. functions and apply the `#[unsafe(no_mangle)]` attribute to them (to avoid the automatic renaming during compilation). However, this is dangerous since the slightest mistake in the implementation of these functions could lead to undefined behavior.
For example, implementing `memcpy` with a `for` loop may result in an infinite recursion because `for` loops implicitly call the [`IntoIterator::into_iter`] trait method, which may call `memcpy` again. So it's a good idea to reuse existing, well-tested implementations instead.

[`IntoIterator::into_iter`]: https://doc.rust-lang.org/stable/core/iter/trait.IntoIterator.html#tymethod.into_iter

Fortunately, the `compiler_builtins` crate already contains implementations for all the needed functions; they are just disabled by default so that they don't collide with the implementations from the C library. We can enable them by setting cargo's [`build-std-features`] flag to `["compiler-builtins-mem"]`. Like the `build-std` flag, this flag can be either passed on the command line as a `-Z` flag or configured in the `unstable` table in the `.cargo/config.toml` file. Since we always want to build with this flag, the config file option makes more sense for us:

[`build-std-features`]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std-features

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std-features = ["compiler-builtins-mem"]
build-std = ["core", "compiler_builtins"]
```

Support for the `compiler-builtins-mem` feature was [added in rust-lang/rust#77284](https://github.com/rust-lang/rust/pull/77284), so you need at least Rust nightly `2020-09-30` for it.

Behind the scenes, this flag enables the [`mem` feature][ویژگی `mem`] of the `compiler_builtins` crate. The effect of this is that the `#[unsafe(no_mangle)]` attribute is applied to the [`memcpy` etc. implementations][پیاده‌سازی `memcpy` و بقیه موارد] of the crate, which makes them available to the linker. It's worth noting that these functions are [not optimized][بهینه نشده‌اند] right now, so their performance might not be the best, but at least they are correct. For `x86_64`, there is an open pull request to [optimize these functions using special assembly instructions][memcpy rep movsb].
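To make the warning about hand-written memory intrinsics more concrete, here is a minimal sketch of a byte-wise copy routine. It is an illustration only, not the `compiler_builtins` implementation: the names `naive_memcpy` and `copy_bytes` are hypothetical and were chosen so that the code can be compiled on a normal host without clashing with the real `memcpy` symbol.

```rust
/// Hypothetical byte-wise copy, for illustration only.
///
/// # Safety
/// `dest` and `src` must each be valid for `n` bytes and must not overlap.
pub unsafe fn naive_memcpy(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    let mut i = 0;
    // A plain `while` loop avoids the implicit `IntoIterator::into_iter`
    // call that a `for` loop would introduce.
    while i < n {
        unsafe { *dest.add(i) = *src.add(i) };
        i += 1;
    }
    dest
}

/// Safe wrapper (also hypothetical) that copies as many bytes as fit.
pub fn copy_bytes(dest: &mut [u8], src: &[u8]) {
    let n = dest.len().min(src.len());
    // The two slices cannot overlap because `dest` is borrowed mutably.
    unsafe { naive_memcpy(dest.as_mut_ptr(), src.as_ptr(), n) };
}
```

Even a loop like this is not automatically safe to export as the real `memcpy`: the optimizer may recognize the copy pattern and turn it back into a call to `memcpy`, which is one more reason to rely on the tested implementations in `compiler_builtins`.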
[ویژگی `mem`]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/Cargo.toml#L51-L52
[پیاده‌سازی `memcpy` و بقیه موارد]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/src/mem.rs#L12-L69
[بهینه نشده‌اند]: https://github.com/rust-lang/compiler-builtins/issues/339
[memcpy rep movsb]: https://github.com/rust-lang/compiler-builtins/pull/365

With this change, our kernel has valid implementations for all compiler-required functions, so it will continue to compile even if our code gets more complicated.

#### Set a Default Target

To avoid passing the `--target` parameter on every invocation of `cargo build`, we can override the default target. To do this, we add the following to our [cargo configuration][پیکربندی کارگو] file at `.cargo/config.toml`:

[پیکربندی کارگو]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[build]
target = "x86_64-blog_os.json"
```

This tells `cargo` to use our `x86_64-blog_os.json` target when no explicit `--target` argument is passed. This means that we can now build our kernel with a simple `cargo build`. For more information on cargo configuration options, check out the [official documentation][پیکربندی کارگو].

We are now able to build our kernel for a bare metal target with a simple `cargo build`. However, our `_start` entry point, which will be called by the bootloader, is still empty. It's time that we output something to the screen from it.

### Printing to Screen

The easiest way to print text to the screen at this stage is the [VGA text buffer][بافر متن VGA]. It is a special memory area mapped to the VGA hardware that contains the contents displayed on screen. It normally consists of 25 lines that each contain 80 character cells. Each character cell displays an ASCII character with some foreground and background colors.
The screen output looks like this:

[بافر متن VGA]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode

![screen output for common ASCII characters](https://upload.wikimedia.org/wikipedia/commons/f/f8/Codepage-437.png)

We will discuss the exact layout of the VGA buffer in the next post, where we write a first small driver for it. For printing “Hello World!”, we just need to know that the buffer is located at address `0xb8000` and that each character cell consists of an ASCII byte and a color byte.

The implementation looks like this:

```rust
static HELLO: &[u8] = b"Hello World!";

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    let vga_buffer = 0xb8000 as *mut u8;

    for (i, &byte) in HELLO.iter().enumerate() {
        unsafe {
            *vga_buffer.offset(i as isize * 2) = byte;
            *vga_buffer.offset(i as isize * 2 + 1) = 0xb;
        }
    }

    loop {}
}
```

First, we cast the integer `0xb8000` into a [raw pointer]. Then we [iterate] over the bytes of the [static] `HELLO` [byte string]. We use the [`enumerate`] method to additionally get a running variable `i`. In the body of the for loop, we use the [`offset`] method to write the string byte and the corresponding color byte (`0xb` is a light cyan).

[iterate]: https://doc.rust-lang.org/stable/book/ch13-02-iterators.html
[static]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime
[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate
[byte string]: https://doc.rust-lang.org/reference/tokens.html#byte-string-literals
[raw pointer]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset

Note that there's an [`unsafe`] block around all memory writes. The reason is that the Rust compiler can't prove that the raw pointers we create are valid. They could point anywhere and lead to data corruption.
By putting them into an `unsafe` block, we're basically telling the compiler that we are absolutely sure that the operations are valid. Note that an `unsafe` block does not turn off Rust's safety checks. It only allows you to do [five additional things].

[`unsafe`]: https://doc.rust-lang.org/stable/book/ch19-01-unsafe-rust.html
[five additional things]: https://doc.rust-lang.org/stable/book/ch20-01-unsafe-rust.html#unsafe-superpowers

I want to emphasize that **this is not the way we want to do things in Rust!** It's very easy to mess up when working with raw pointers inside unsafe blocks. For example, we could easily write beyond the buffer's end if we're not careful.

So we want to minimize the use of `unsafe` as much as possible. Rust gives us the ability to do this by creating safe abstractions. For example, we could create a VGA buffer type that encapsulates all unsafety and ensures that it is _impossible_ to do anything wrong from the outside. This way, we would only need minimal amounts of `unsafe` code and can be sure that we don't violate [memory safety]. We will create such a safe VGA buffer abstraction in the next post.

[memory safety]: https://en.wikipedia.org/wiki/Memory_safety

## Running our Kernel

Now that we have an executable that does something perceptible, it is time to run it. First, we need to turn our compiled kernel into a bootable disk image by linking it with a bootloader. Then we can run the disk image in the [QEMU] virtual machine or boot it on real hardware using a USB stick.

### Creating a Bootimage

To turn our compiled kernel into a bootable disk image, we need to link it with a bootloader. As we learned in the [section about booting], the bootloader is responsible for initializing the CPU and loading our kernel.

[section about booting]: #the-boot-process

Instead of writing our own bootloader, which is a project on its own, we use the [`bootloader`] crate.
This crate implements a basic BIOS bootloader without any C dependencies, just Rust and inline assembly. To use it for booting our kernel, we need to add a dependency on it:

[`bootloader`]: https://crates.io/crates/bootloader

```toml
# in Cargo.toml

[dependencies]
bootloader = "0.9"
```

Adding the bootloader as a dependency is not enough to actually create a bootable disk image. The problem is that we need to link our kernel with the bootloader after compilation, but cargo has no support for [post-build scripts].

[post-build scripts]: https://github.com/rust-lang/cargo/issues/545

To solve this problem, we created a tool named `bootimage` that first compiles the kernel and bootloader, and then links them together to create a bootable disk image. To install the tool, execute the following command in your terminal:

```
cargo install bootimage
```

For running `bootimage` and building the bootloader, you need to have the `llvm-tools-preview` rustup component installed. You can do so by executing `rustup component add llvm-tools-preview`.

After installing `bootimage` and adding the `llvm-tools-preview` component, we can create a bootable disk image by executing:

```
> cargo bootimage
```

We see that the tool recompiles our kernel using `cargo build`, so it will automatically pick up any changes you make. Afterwards, it compiles the bootloader, which might take a while. Like all crate dependencies, it is only built once and then cached, so subsequent builds will be much faster. Finally, `bootimage` combines the bootloader and your kernel into a bootable disk image.

After executing the command, you should see a bootable disk image named `bootimage-blog_os.bin` in your `target/x86_64-blog_os/debug` directory. You can boot it in a virtual machine or copy it to a USB drive to boot it on real hardware.
(Note that this is not a CD image, which has a different format, so burning it to a CD doesn't work.)

#### How does it work?

The `bootimage` tool performs the following steps behind the scenes:

- It compiles our kernel to an [ELF] file.
- It compiles the bootloader dependency as a standalone executable.
- It links the bytes of the kernel ELF file to the bootloader.

[ELF]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
[rust-osdev/bootloader]: https://github.com/rust-osdev/bootloader

When booted, the bootloader reads and parses the appended ELF file. It then maps the program segments to virtual addresses in the page tables, zeroes the `.bss` section, and sets up a stack. Finally, it reads the entry point address (our `_start` function) and jumps to it.

### Booting it in QEMU

We can now boot the disk image in a virtual machine. To boot it in [QEMU], execute the following command:

[QEMU]: https://www.qemu.org/

```
> qemu-system-x86_64 -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
```

This opens a separate window that looks like this:

![QEMU showing "Hello World!"](qemu.png)

We see that our “Hello World!” is visible on the screen.

### Real Machine

It is also possible to write it to a USB stick and boot it on a real machine:

```
> dd if=target/x86_64-blog_os/debug/bootimage-blog_os.bin of=/dev/sdX && sync
```

Here, `sdX` is the device name of your USB stick. **Be careful** to choose the correct device name, because everything on that device is overwritten.

After writing the image to the USB stick, you can run it on real hardware by booting from it. You probably need to use a special boot menu or change the boot order in your BIOS configuration to boot from the USB stick. Note that it currently doesn't work for UEFI machines, since the `bootloader` crate has no UEFI support yet.
### Using `cargo run`

To make it easier to run our kernel in QEMU, we can set the `runner` configuration key for cargo:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "none")']
runner = "bootimage runner"
```

The `target.'cfg(target_os = "none")'` table applies to all targets whose target configuration file's `"os"` field is set to `"none"`. This includes our `x86_64-blog_os.json` target. The `runner` key specifies the command that should be invoked for `cargo run`. The command is run after a successful build with the executable path passed as the first argument. See the [cargo documentation][پیکربندی کارگو] for more details.

The `bootimage runner` command is specifically designed to be usable as a `runner` executable. It links the given executable with the project's bootloader dependency and then launches QEMU. See the [Readme of `bootimage`] for more details and possible configuration options.

[Readme of `bootimage`]: https://github.com/rust-osdev/bootimage

Now we can use `cargo run` to compile our kernel and boot it in QEMU.

## What's next?

In the next post, we will explore the VGA text buffer in more detail and write a safe interface for it. We will also add support for the `println` macro.

================================================
FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/index.fr.md
================================================
+++
title = "Un noyau Rust minimal"
weight = 2
path = "fr/minimal-rust-kernel"
date = 2018-02-10

[extra]
# Please update this when updating the translation
translation_based_on_commit = "c689ecf810f8e93f6b2fb3c4e1e8b89b8a0998eb"
# GitHub usernames of the people that translated this post
translators = ["TheMimiCodes", "maximevaillancourt"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = ["alaincao"]
+++

In this post, we create a minimal 64-bit Rust kernel for the x86 architecture.
We build upon the work from the previous post, “[A Freestanding Rust Binary][freestanding Rust binary]”, to create a bootable disk image that prints something to the screen.

[freestanding Rust binary]: @/edition-2/posts/01-freestanding-rust-binary/index.fr.md

This post is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave a comment [at the bottom of the page]. The complete source code for this post can be found in the [`post-02`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom of the page]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-02

## The Boot Process

When you turn on a computer, it begins executing firmware code that is stored in motherboard [ROM]. This code performs a [power-on self-test], detects available RAM, and pre-initializes the CPU and hardware. Afterwards, it looks for a bootable disk and starts booting the operating system kernel.

[ROM]: https://fr.wikipedia.org/wiki/M%C3%A9moire_morte
[power-on self-test]: https://fr.wikipedia.org/wiki/Power-on_self-test_(informatique)

On x86, there are two firmware standards: the “Basic Input/Output System” (**[BIOS]**) and the newer “Unified Extensible Firmware Interface” (**[UEFI]**). The BIOS standard is old and outdated, but simple and well-supported on every x86 machine since the 1980s. UEFI, in contrast, is more modern and has many more features, but is more complex to set up (at least in my opinion).

[BIOS]: https://fr.wikipedia.org/wiki/BIOS_(informatique)
[UEFI]: https://fr.wikipedia.org/wiki/UEFI

Currently, we only provide BIOS support, but support for UEFI is planned, too.
If you'd like to help us with this, check out the [GitHub issue](https://github.com/phil-opp/blog_os/issues/349).

### BIOS Boot

Almost all x86 systems support BIOS booting, including newer UEFI-based machines that use an emulated BIOS. This is great, because you can use the same boot logic across all machines from the last century. But this wide compatibility is at the same time the biggest disadvantage of BIOS booting, because it means that the CPU is put into a 16-bit compatibility mode called _[real mode]_ before booting so that archaic bootloaders from the 1980s still work.

But let's start from the beginning:

When you turn on a computer, it loads the BIOS from some special flash memory located on the motherboard. The BIOS runs self-test and initialization routines of the hardware, then it looks for bootable disks. If it finds one, control is transferred to its _bootloader_, which is a 512-byte portion of executable code stored at the disk's beginning. Most bootloaders are larger than 512 bytes, so they are commonly split into a small first stage, which fits into 512 bytes, and a second stage, which is subsequently loaded by the first stage.

The bootloader has to determine the location of the kernel image on the disk and load it into memory. It also needs to switch the CPU from the 16-bit [real mode] first to the 32-bit [protected mode], and then to the 64-bit [long mode], where 64-bit registers and the complete main memory are available. Its third job is to query certain information (such as a memory map) from the BIOS and pass it to the OS kernel.
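The 512-byte first stage described above can be illustrated with a small host-side check. This is only a sketch under one assumption that the post itself does not state: by the usual BIOS convention, a bootable first sector ends with the two signature bytes `0x55` and `0xAA`. The names `BOOT_SECTOR_SIZE`, `BOOT_SIGNATURE`, and `has_boot_signature` are hypothetical.

```rust
/// Size of the first sector that the BIOS loads, as described in the text.
pub const BOOT_SECTOR_SIZE: usize = 512;

/// Conventional BIOS boot signature in the last two bytes of the sector.
/// (Assumption: general BIOS background, not taken from this post.)
pub const BOOT_SIGNATURE: [u8; 2] = [0x55, 0xAA];

/// Returns `true` if `image` starts with a sector that carries the signature.
pub fn has_boot_signature(image: &[u8]) -> bool {
    match image.get(BOOT_SECTOR_SIZE - 2..BOOT_SECTOR_SIZE) {
        Some(tail) => tail == &BOOT_SIGNATURE[..],
        None => false, // the image is shorter than one sector
    }
}
```

On a development machine, the bytes of a raw disk image could be read with `std::fs::read` and passed to this function as a quick sanity check before trying to boot the image.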
[real mode]: https://fr.wikipedia.org/wiki/Mode_r%C3%A9el
[protected mode]: https://fr.wikipedia.org/wiki/Mode_prot%C3%A9g%C3%A9
[long mode]: https://en.wikipedia.org/wiki/Long_mode
[memory segmentation]: https://fr.wikipedia.org/wiki/Segmentation_(informatique)

Writing a bootloader is a bit cumbersome, as it requires assembly language and a lot of non-insightful steps like “write this magic value to this processor register”. Therefore, we don't cover bootloader creation in this post and instead provide a tool named [bootimage] that automatically prepends a bootloader to your kernel.

[bootimage]: https://github.com/rust-osdev/bootimage

If you are interested in building your own bootloader: stay tuned, a set of posts on this topic is already planned!

#### The Multiboot Standard

To avoid that every operating system implements its own bootloader, which is only compatible with a single OS, the [Free Software Foundation] created an open bootloader standard called [Multiboot] in 1995. The standard defines an interface between the bootloader and the operating system, so that any Multiboot-compliant bootloader can load any Multiboot-compliant operating system. The reference implementation is [GNU GRUB], which is the most popular bootloader for Linux systems.

[Free Software Foundation]: https://fr.wikipedia.org/wiki/Free_Software_Foundation
[Multiboot]: https://wiki.osdev.org/Multiboot
[GNU GRUB]: https://fr.wikipedia.org/wiki/GNU_GRUB

To make a kernel Multiboot compliant, one just needs to insert a so-called [Multiboot header] at the beginning of the kernel file. This makes it very easy to boot an OS from GRUB.
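To give a rough idea of what such a header contains, the following sketch models the three mandatory fields of a Multiboot (version 1) header, assuming the layout from the Multiboot specification: a magic number, a flags word, and a checksum chosen so that all three add up to zero in 32-bit arithmetic. The `MultibootHeader` type and its methods are hypothetical and only illustrate the arithmetic; a real kernel would additionally have to place the header near the start of its binary, which this sketch does not do.

```rust
/// Magic number that identifies a Multiboot (version 1) header.
pub const MULTIBOOT_MAGIC: u32 = 0x1BAD_B002;

/// The three mandatory header fields (hypothetical illustration type).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MultibootHeader {
    pub magic: u32,
    pub flags: u32,
    pub checksum: u32,
}

impl MultibootHeader {
    /// Builds a header whose fields sum to zero modulo 2^32.
    pub const fn new(flags: u32) -> Self {
        let checksum = 0u32.wrapping_sub(MULTIBOOT_MAGIC.wrapping_add(flags));
        Self { magic: MULTIBOOT_MAGIC, flags, checksum }
    }

    /// Checks the magic number and the zero-sum property.
    pub const fn is_valid(&self) -> bool {
        self.magic == MULTIBOOT_MAGIC
            && self.magic.wrapping_add(self.flags).wrapping_add(self.checksum) == 0
    }
}
```

The wrapping operations are used on purpose: the sum is expected to overflow a `u32`, and ordinary `+` would panic on overflow in debug builds.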
However, GRUB and the Multiboot standard have some problems too:

[Multiboot header]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#OS-image-format

- They support only the 32-bit protected mode. This means that you still have to do the CPU configuration to switch to the 64-bit long mode.
- They are designed to make the bootloader simple instead of the kernel. For example, the kernel needs to be linked with an [adjusted default page size], because GRUB can't find the Multiboot header otherwise. Another example is that the [boot information], which is passed to the kernel, contains lots of architecture-dependent structures instead of providing clean abstractions.
- Both GRUB and the Multiboot standard are only sparsely documented.
- GRUB needs to be installed on the host system to create a bootable disk image from the kernel file. This makes development on Windows or Mac more difficult.

[adjusted default page size]: https://wiki.osdev.org/Multiboot#Multiboot_2
[boot information]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Boot-information-format

Because of these drawbacks, we decided not to use GRUB or the Multiboot standard. However, we plan to add Multiboot support to our [bootimage] tool, so that it's possible to load your kernel on a GRUB system too. If you're interested in writing a Multiboot-compliant kernel, check out the [first edition] of this blog series.

[first edition]: @/edition-1/_index.md

### UEFI

(We don't provide UEFI support at the moment, but we would love to! If you'd like to help, please tell us in the [GitHub issue](https://github.com/phil-opp/blog_os/issues/349).)
## A Minimal Kernel

Now that we roughly know how a computer boots, it's time to create our own minimal kernel. Our goal is to create a disk image that prints “Hello World!” to the screen when booted. We do this by extending the [freestanding Rust binary] from the previous post.

As you may remember, we built the freestanding binary through `cargo`, but depending on the operating system, we needed different entry point names and compile flags. That's because `cargo` builds for the _host system_ by default, i.e., the system you're running on. This isn't what we want for our kernel, because a kernel that runs on top of, e.g., Windows does not make much sense. Instead, we want to compile for a clearly defined _target system_.

### Installing Rust Nightly

Rust has three release channels: _stable_, _beta_, and _nightly_. The Rust Book explains the difference between these channels really well, so take a minute and [check it out](https://doc.rust-lang.org/book/appendix-07-nightly-rust.html#choo-choo-release-channels-and-riding-the-trains). For building an operating system, we will need some experimental features that are only available on the nightly channel, so we need to install a nightly version of Rust.

To manage Rust installations, I highly recommend [rustup]. It allows you to install nightly, beta, and stable compilers side-by-side and makes it easy to update them. With rustup, you can use a nightly compiler for the current directory by running `rustup override set nightly`. Alternatively, you can add a file called `rust-toolchain` with the content `nightly` to the project's root directory.
You can check that you have a nightly version installed by running `rustc --version`: the version number should contain `-nightly` at the end.

[rustup]: https://www.rustup.rs/

The nightly compiler allows us to opt in to various experimental features by using so-called _feature flags_ at the top of our file. For example, we could enable the experimental [`asm!` macro] for inline assembly by adding `#![feature(asm)]` to the top of our `main.rs`. Note that such experimental features are completely unstable, which means that future Rust versions might change or remove them without prior warning. For this reason, we will only use them if absolutely necessary.

[`asm!` macro]: https://doc.rust-lang.org/stable/reference/inline-assembly.html

### Target Specification

Cargo supports different target systems through the `--target` parameter. The target is described by a so-called _[target triple]_, which describes the CPU architecture, the vendor, the operating system, and the application binary interface ([ABI]). For example, the `x86_64-unknown-linux-gnu` target triple describes a system with an `x86_64` CPU, no clear vendor, and a Linux operating system with the GNU ABI. Rust supports [many different target triples][platform-support], including `arm-linux-androideabi` for Android or [`wasm32-unknown-unknown` for WebAssembly](https://www.hellorust.com/setup/wasm-target/).
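As a rough illustration of how a target triple breaks down into the components named above, here is a minimal parsing sketch. The `TargetTriple` type and the `parse_triple` function are hypothetical and only handle the canonical `arch-vendor-os[-abi]` form. Real triples are less regular (for example, `arm-linux-androideabi` omits the vendor), so this sketch would mislabel such names.

```rust
/// Components of a canonical `arch-vendor-os[-abi]` triple (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct TargetTriple<'a> {
    pub arch: &'a str,
    pub vendor: &'a str,
    pub os: &'a str,
    pub abi: Option<&'a str>,
}

/// Splits a triple at `-` into its parts; returns `None` if fewer than
/// three non-empty components are present.
pub fn parse_triple(triple: &str) -> Option<TargetTriple<'_>> {
    let mut parts = triple.splitn(4, '-');
    let arch = parts.next()?;
    let vendor = parts.next()?;
    let os = parts.next()?;
    let abi = parts.next(); // the ABI component is optional
    if arch.is_empty() || vendor.is_empty() || os.is_empty() {
        return None;
    }
    Some(TargetTriple { arch, vendor, os, abi })
}
```

With this sketch, `x86_64-unknown-linux-gnu` yields the architecture `x86_64`, the vendor `unknown`, the OS `linux`, and the ABI `gnu`, while a three-part name such as `x86_64-unknown-none` has no ABI component.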

[target triple]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple
[ABI]: https://stackoverflow.com/a/2456882
[platform-support]: https://forge.rust-lang.org/release/platform-support.html
[custom-targets]: https://doc.rust-lang.org/nightly/rustc/targets/custom.html

For our target system, however, we require some special configuration parameters (e.g., no underlying OS), so none of the [existing target triples][platform-support] fits. Fortunately, Rust allows us to define [our own target][custom-targets] through a JSON file. For example, a JSON file that describes the `x86_64-unknown-linux-gnu` target looks like this:

```json
{
    "llvm-target": "x86_64-unknown-linux-gnu",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "linux",
    "executables": true,
    "linker-flavor": "gcc",
    "pre-link-args": ["-m64"],
    "morestack": false
}
```

Most fields are required by LLVM to generate code for that platform. For example, the [`data-layout`] field defines the size of various integer, floating point, and pointer types. Then there are fields that Rust uses for conditional compilation, such as `target-pointer-width`. The third kind of field defines how the crate should be built. For example, the `pre-link-args` field specifies arguments passed to the [linker].

[`data-layout`]: https://llvm.org/docs/LangRef.html#data-layout
[linker]: https://en.wikipedia.org/wiki/Linker_(computing)

We also target `x86_64` systems with our kernel, so our target specification will look very similar to the one above.
Let's start by creating an `x86_64-blog_os.json` file (choose any name you like) with the common content:

```json
{
    "llvm-target": "x86_64-unknown-none",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "executables": true
}
```

Note that we changed the OS in the `llvm-target` and the `os` field to `none`, because we will run on bare metal (i.e., without an underlying operating system).

We add the following build-related entries:

```json
"linker-flavor": "ld.lld",
"linker": "rust-lld",
```

Instead of using the platform's default linker (which might not support Linux targets), we use the cross-platform [LLD] linker that is shipped with Rust for linking our kernel.

[LLD]: https://lld.llvm.org/

```json
"panic-strategy": "abort",
```

This setting specifies that the target doesn't support [stack unwinding] on panic, so instead the program should abort directly. This has the same effect as the `panic = "abort"` option in our Cargo.toml, so we can remove it from there. (Note that, in contrast to the Cargo.toml option, this target option also applies when we recompile the `core` library later in this post. So, even if you prefer to keep the Cargo.toml option, make sure to include this option.)

[stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php

```json
"disable-redzone": true,
```

We're writing a kernel, so we'll need to handle interrupts at some point. To do that safely, we have to disable a certain stack pointer optimization called the _“red zone”_, because it would cause stack corruption otherwise.
For more information, see our separate post about [disabling the red zone].

[disabling the red zone]: @/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.md

```json
"features": "-mmx,-sse,+soft-float",
```

The `features` field enables and disables target features. We disable the `mmx` and `sse` features by prefixing them with a minus and enable the `soft-float` feature by prefixing it with a plus. Note that there must be no spaces between the individual features, otherwise LLVM fails to parse the features string.

The `mmx` and `sse` features determine support for [Single Instruction Multiple Data (SIMD)] instructions, which can often speed up programs significantly. However, using the large SIMD registers in OS kernels leads to performance problems. The reason is that the kernel needs to restore all registers to their original state before continuing an interrupted program. This means that the kernel has to save the complete SIMD state to main memory on each system call or hardware interrupt. Since the SIMD state is very large (512–1600 bytes) and interrupts can occur very often, these additional save/restore operations considerably harm performance. To avoid this, we disable SIMD for our kernel (not for applications running on top!).

[Single Instruction Multiple Data (SIMD)]: https://fr.wikipedia.org/wiki/Single_instruction_multiple_data

A problem with disabling SIMD is that floating point operations on `x86_64` require SIMD registers by default.
To solve this problem, we add the `soft-float` feature, which emulates all floating point operations through software functions based on normal integers.

For more information, see our post on [disabling SIMD](@/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.md).

```json
"rustc-abi": "x86-softfloat"
```

As we want to use the `soft-float` feature, we also need to tell the Rust compiler `rustc` that we want to use the corresponding ABI. We can do that by setting the `rustc-abi` field to `x86-softfloat`.

#### Putting it Together

Our target specification file now looks like this:

```json
{
    "llvm-target": "x86_64-unknown-none",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "executables": true,
    "linker-flavor": "ld.lld",
    "linker": "rust-lld",
    "panic-strategy": "abort",
    "disable-redzone": true,
    "features": "-mmx,-sse,+soft-float",
    "rustc-abi": "x86-softfloat"
}
```

### Building our Kernel

Compiling for our new target will use Linux conventions (I'm not quite sure why; I assume it's just LLVM's default). This means that we need an entry point named `_start` as described in the [previous post]:

[previous post]: @/edition-2/posts/01-freestanding-rust-binary/index.fr.md

```rust
// src/main.rs

#![no_std] // don't link the Rust standard library
#![no_main] // disable all Rust-level entry points

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    // this function is the entry point, since the linker looks for a function
    // named `_start` by default
    loop {}
}
```

Note that the entry point needs to be called `_start` regardless of your host OS.

We can now build the kernel for our new target by passing the name of the JSON file as `--target`:

```
> cargo build --target x86_64-blog_os.json

error: `.json` target specs require -Zjson-target-spec
```

It fails! The error tells us that custom JSON target specifications are an unstable feature that requires explicit opt-in. This is because the format of JSON target files is not yet considered stable, so it might change in future Rust versions. See the [tracking issue for custom JSON target specs][json-target-spec-issue] for more information.

[json-target-spec-issue]: https://github.com/rust-lang/rust/issues/151528

#### The `json-target-spec` Option

To enable support for custom JSON target specifications, we need to create a [cargo configuration] file at `.cargo/config.toml` (the `.cargo` folder should be next to your `src` folder) with the following content:

[cargo configuration]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
```

This enables the unstable `json-target-spec` feature, allowing us to use custom JSON target files.

With this configuration in place, let's try building again:

```
> cargo build --target x86_64-blog_os.json

error[E0463]: can't find crate for `core`
```

Now we see a different error! It tells us that the Rust compiler can no longer find the [`core` library].
This library contains basic Rust types such as `Result`, `Option`, and iterators, and is implicitly linked to all `no_std` crates.

[`core` library]: https://doc.rust-lang.org/nightly/core/index.html

The problem is that the `core` library is distributed together with the Rust compiler as a _precompiled_ library. So it is only valid for supported host triples (e.g., `x86_64-unknown-linux-gnu`) but not for our custom target. If we want to compile code for other targets, we need to recompile `core` for these targets first.

#### The `build-std` Option

That's where the [`build-std` feature] of cargo comes in. It allows recompiling `core` and other standard library crates on demand, instead of using the precompiled versions shipped with the Rust installation. This feature is very new and still not finished, so it is marked as unstable and only available on [nightly Rust compilers].

[`build-std` feature]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std
[nightly Rust compilers]: #installer-une-version-nocturne-de-rust

To use the feature, we need to add the following to our [cargo configuration] file at `.cargo/config.toml`:

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std = ["core", "compiler_builtins"]
```

This tells cargo that it should recompile the `core` and `compiler_builtins` libraries. The latter is required because it is a dependency of `core`. In order to recompile these libraries, cargo needs access to the Rust source code, which we can install with `rustup component add rust-src`.

**Note:** The `unstable.build-std` configuration key requires at least the Rust nightly from 2020-07-15.

After setting the `unstable.build-std` configuration key and installing the `rust-src` component, we can rerun our build command:

```
> cargo build --target x86_64-blog_os.json
   Compiling core v0.0.0 (/…/rust/src/libcore)
   Compiling rustc-std-workspace-core v1.99.0 (/…/rust/src/tools/rustc-std-workspace-core)
   Compiling compiler_builtins v0.1.32
   Compiling blog_os v0.1.0 (/…/blog_os)
    Finished dev [unoptimized + debuginfo] target(s) in 0.29 secs
```

We see that `cargo build` now recompiles the `core`, `rustc-std-workspace-core` (a dependency of `compiler_builtins`), and `compiler_builtins` libraries for our custom target.

#### Memory-Related Intrinsics

The Rust compiler assumes that a certain set of built-in functions is available for all systems. Most of these functions are provided by the `compiler_builtins` crate that we just recompiled. However, there are some memory-related functions in that crate that are not enabled by default because they are normally provided by the C library on the system. These functions include `memset`, which sets all bytes in a memory block to a given value, `memcpy`, which copies one memory block to another, and `memcmp`, which compares two memory blocks. While we didn't need any of these functions to compile our kernel right now, they will be required as soon as we add some more code to it (e.g., when copying structs around).

Since we can't link to the C library of the operating system, we need an alternative way to provide these functions to the compiler. One possible approach would be to implement our own `memset` etc. functions and apply the `#[unsafe(no_mangle)]` attribute to them (to avoid the automatic renaming during compilation).
However, this is dangerous since the slightest mistake in the implementation of these functions could lead to undefined behavior. For example, implementing `memcpy` with a `for` loop may result in an infinite recursion because `for` loops implicitly call the [`IntoIterator::into_iter`] trait method, which may call `memcpy` again. So it's a good idea to reuse existing, well-tested implementations instead.

[`IntoIterator::into_iter`]: https://doc.rust-lang.org/stable/core/iter/trait.IntoIterator.html#tymethod.into_iter

Fortunately, the `compiler_builtins` crate already contains implementations for all the needed functions. They are just disabled by default so that they don't collide with the implementations from the C library. We can enable them by setting cargo's [`build-std-features`] flag to `["compiler-builtins-mem"]`. Like the `build-std` flag, this flag can be either passed on the command line as a `-Z` flag or configured in the `unstable` table in the `.cargo/config.toml` file. Since we always want to build with this flag, the config file option makes more sense for us:

[`build-std-features`]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std-features

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std-features = ["compiler-builtins-mem"]
build-std = ["core", "compiler_builtins"]
```

(Support for the `compiler-builtins-mem` feature was only [added fairly recently](https://github.com/rust-lang/rust/pull/77284), so you need at least Rust nightly `2020-09-30` for it.)

Behind the scenes, this flag enables the [`mem` feature] of the `compiler_builtins` crate. The effect of this is that the `#[unsafe(no_mangle)]` attribute is applied to the [`memcpy` etc. implementations] of the crate, which makes them available to the linker.
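
To illustrate what these three functions do, the following sketch expresses their semantics with safe slice methods. It is a hosted illustration only, not how `compiler_builtins` implements them; the function names `set_bytes`, `copy_bytes`, and `compare_bytes` are made up for this example.

```rust
use core::cmp::Ordering;

/// Same effect as `memset`: sets every byte of `dest` to `value`.
pub fn set_bytes(dest: &mut [u8], value: u8) {
    dest.fill(value);
}

/// Same effect as `memcpy`: copies all bytes of `src` into `dest`.
/// Unlike the C function, this panics if the two lengths differ.
pub fn copy_bytes(dest: &mut [u8], src: &[u8]) {
    dest.copy_from_slice(src);
}

/// Similar to `memcmp`: compares the two blocks byte by byte.
/// For slices of equal length, the result corresponds to the sign
/// of the value that `memcmp` would return.
pub fn compare_bytes(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}
```

In a kernel, the compiler may emit calls to the real `memset`, `memcpy`, and `memcmp` symbols for operations like these, which is why working implementations must be available at link time.
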

[`mem` feature]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/Cargo.toml#L54-L55
[`memcpy` etc. implementations]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/src/mem.rs#L12-L69

With this change, our kernel has valid implementations for all compiler-required functions, so it will continue to compile even if our code gets more complex.

#### Set a Default Target

To avoid passing the `--target` parameter on every invocation of `cargo build`, we can override the default target. To do this, we add the following to our [cargo configuration] file at `.cargo/config.toml`:

[cargo configuration]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[build]
target = "x86_64-blog_os.json"
```

This tells `cargo` to use our `x86_64-blog_os.json` target when no explicit `--target` argument is passed. This means that we can now build our kernel with a simple `cargo build`. For more information on cargo configuration options, check out the [official documentation][cargo configuration].

We are now able to build our kernel for a bare metal target with a simple `cargo build`. However, our `_start` entry point, which will be called by the bootloader, is still empty. It's time to output something to the screen from it.

### Printing to Screen

The easiest way to print text to the screen at this stage is the [VGA text buffer]. It is a special memory area mapped to the VGA hardware that contains the contents displayed on screen. It normally consists of 25 lines that each contain 80 character cells. Each character cell displays an ASCII character with some foreground and background colors.
The screen output looks like this:

[VGA text buffer]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode

![screen output for common ASCII characters](https://upload.wikimedia.org/wikipedia/commons/f/f8/Codepage-437.png)

We will discuss the exact layout of the VGA buffer in the next post, where we write a first small driver for it. For printing “Hello World!”, we just need to know that the buffer is located at address `0xb8000` and that each character cell consists of an ASCII byte and a color byte.

The implementation looks like this:

```rust
static HELLO: &[u8] = b"Hello World!";

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    let vga_buffer = 0xb8000 as *mut u8;

    for (i, &byte) in HELLO.iter().enumerate() {
        unsafe {
            *vga_buffer.offset(i as isize * 2) = byte;
            *vga_buffer.offset(i as isize * 2 + 1) = 0xb;
        }
    }

    loop {}
}
```

First, we cast the integer `0xb8000` into a [raw pointer]. Then we [iterate] over the bytes of the [static] `HELLO` [byte string]. We use the [`enumerate`] method to additionally get a running variable `i`. In the body of the `for` loop, we use the [`offset`] method to write the string byte and the corresponding color byte (`0xb` is a light cyan).

[iterate]: https://doc.rust-lang.org/stable/book/ch13-02-iterators.html
[static]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime
[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate
[byte string]: https://doc.rust-lang.org/reference/tokens.html#byte-string-literals
[raw pointer]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset

Note that there's an [`unsafe`] block around all memory writes.
The reason is that the Rust compiler can't prove that the raw pointers we create are valid. They could point anywhere and lead to data corruption. By putting them into an `unsafe` block, we're basically telling the compiler that we are absolutely sure that the operations are valid. Note that an `unsafe` block does not turn off Rust's safety checks. It only allows you to do [five additional things].

[`unsafe`]: https://doc.rust-lang.org/stable/book/ch19-01-unsafe-rust.html
[five additional things]: https://doc.rust-lang.org/stable/book/ch20-01-unsafe-rust.html#unsafe-superpowers

I want to emphasize that **this is not the way we want to do things in Rust!** It's very easy to mess up when working with raw pointers inside `unsafe` blocks. For example, we could easily write beyond the buffer's end if we're not careful.

So we want to minimize the use of `unsafe` as much as possible. Rust gives us the ability to do this by creating safe abstractions. For example, we could create a VGA buffer type that encapsulates all unsafety and ensures that it is impossible to do anything wrong from the outside. This way, we would only need minimal amounts of `unsafe` code and could be sure that we don't violate [memory safety]. We will create such a safe VGA buffer abstraction in the next post.

[memory safety]: https://en.wikipedia.org/wiki/Memory_safety

## Running our Kernel

Now that we have an executable that does something perceptible, it is time to run it. First, we need to turn our compiled kernel into a bootable disk image by linking it with a bootloader.
Then we can run the disk image in the [QEMU] virtual machine or boot it on real hardware using a USB stick.

### Creating a Bootimage

To turn our compiled kernel into a bootable disk image, we need to link it with a bootloader. As we learned in the [section about booting], the bootloader is responsible for initializing the CPU and loading our kernel.

[section about booting]: #le-processus-d-amorcage

Instead of writing our own bootloader, which is a project on its own, we use the [`bootloader`] crate. This crate implements a basic BIOS bootloader without any C dependencies, just Rust and inline assembly. To use it for booting our kernel, we need to add a dependency on it:

[`bootloader`]: https://crates.io/crates/bootloader

```toml
# in Cargo.toml

[dependencies]
bootloader = "0.9.8"
```

Adding the bootloader as a dependency is not enough to actually create a bootable disk image. The problem is that we need to link our kernel with the bootloader after compilation, but cargo has no support for [post-build scripts].

[post-build scripts]: https://github.com/rust-lang/cargo/issues/545

To solve this problem, we created a tool named `bootimage` that first compiles the kernel and bootloader, and then links them together to create a bootable disk image. To install the tool, execute the following command in your terminal:

```
cargo install bootimage
```

For running `bootimage` and building the bootloader, you need to have the `llvm-tools-preview` rustup component installed. You can install it by executing `rustup component add llvm-tools-preview`.

After installing `bootimage` and adding the `llvm-tools-preview` component, we can create a bootable disk image by executing:

```
> cargo bootimage
```

We see that the tool recompiles our kernel using `cargo build`, so it will automatically pick up any changes you make. Afterwards, it compiles the bootloader, which might take a while. Like all crate dependencies, it is only built once and then cached, so subsequent builds will be much faster. Finally, `bootimage` combines the bootloader and your kernel into a bootable disk image.

After executing the command, you should see a bootable disk image named `bootimage-blog_os.bin` in your `target/x86_64-blog_os/debug` directory. You can boot it in a virtual machine or copy it to a USB drive to boot it on real hardware. (Note that this is not a CD image, which has a different format, so burning it to a CD doesn't work.)

#### How does it work?

The `bootimage` tool performs the following steps behind the scenes:

- It compiles our kernel to an [ELF] file.
- It compiles the bootloader dependency as a standalone executable.
- It links the bytes of the kernel ELF file to the bootloader.

[ELF]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
[rust-osdev/bootloader]: https://github.com/rust-osdev/bootloader

When booted, the bootloader reads and parses the appended ELF file. It then maps the program segments to virtual addresses in the page tables, zeroes the `.bss` section, and sets up a stack. Finally, it reads the entry point address (our `_start` function) and jumps to it.

### Booting it in QEMU

We can now boot the disk image in a virtual machine.
To boot it in [QEMU], execute the following command:

[QEMU]: https://www.qemu.org/

```
> qemu-system-x86_64 -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
```

This opens a separate window which should look similar to this:

![QEMU showing "Hello World!"](qemu.png)

We see that our "Hello World!" is visible on the screen.

### Real Machine

It is also possible to write the disk image to a USB stick and boot it on a real machine, **but be careful** to choose the correct device name, because **everything on that device is overwritten**:

```
> dd if=target/x86_64-blog_os/debug/bootimage-blog_os.bin of=/dev/sdX && sync
```

Where `sdX` is the device name of your USB stick.

After writing the image to the USB stick, you can run it on real hardware by booting from it. You probably need to use a special boot menu or change the boot order in your BIOS configuration to boot from the USB stick. Note that this currently doesn't work for UEFI machines, since the `bootloader` crate has no UEFI support yet.

### Using `cargo run`

To make it easier to run our kernel in QEMU, we can set the `runner` configuration key for cargo:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "none")']
runner = "bootimage runner"
```

The `target.'cfg(target_os = "none")'` table applies to all targets whose target configuration file's `"os"` field is set to `"none"`. This includes our `x86_64-blog_os.json` target. The `runner` key specifies the command that should be invoked for `cargo run`. The command is run after a successful build with the executable path passed as the first argument. See the [cargo documentation][cargo configuration] for more details.
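
To show what "passed as the first argument" means in practice, here is a small sketch of how a hypothetical custom runner program could pick up the executable path. The function name `executable_from_args` is invented for this illustration, and the sketch assumes a runner configured as a single command name; it does not show how `bootimage runner` is implemented.

```rust
use std::path::PathBuf;

/// Returns the path of the compiled executable that cargo appends to the
/// runner command line. The first item of `args` is the runner's own
/// program name, so the executable path is the second item.
/// (Hypothetical helper for a custom runner; not part of `bootimage`.)
pub fn executable_from_args<I>(mut args: I) -> Option<PathBuf>
where
    I: Iterator<Item = String>,
{
    args.next(); // skip the program name
    args.next().map(PathBuf::from)
}
```

A runner binary would typically call this as `executable_from_args(std::env::args())` and then hand the resulting path to whatever tool creates and boots the disk image.
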

The `bootimage runner` command is specifically designed to be usable as a `runner` executable. It links the given executable with the project's bootloader dependency and then launches QEMU. See the [Readme of `bootimage`] for more details and possible configuration options.

[Readme of `bootimage`]: https://github.com/rust-osdev/bootimage

Now we can use `cargo run` to compile our kernel and boot it in QEMU.

## What's next?

In the next post, we will explore the VGA text buffer in more detail and write a safe interface for it. We will also add support for the `println` macro.

================================================
FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/index.ja.md
================================================

+++
title = "Rustでつくる最小のカーネル"
weight = 2
path = "ja/minimal-rust-kernel"
date = 2018-02-10

[extra]
# Please update this when updating the translation
translation_based_on_commit = "7212ffaa8383122b1eb07fe1854814f99d2e1af4"
# GitHub usernames of the people that translated this post
translators = ["swnakamura", "JohnTitor"]
+++

In this post, we create a minimal 64-bit Rust kernel. We build upon the [freestanding Rust binary] from the previous post to create a bootable disk image that prints something to the screen.

[freestanding Rust binary]: @/edition-2/posts/01-freestanding-rust-binary/index.ja.md

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there (translator's note: the links point to the English original). You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-02` branch][post branch].

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-02

## The Boot Process {#the-boot-process}

When you turn on a computer, it begins executing firmware code that is stored in motherboard [ROM]. This code performs a [power-on self-test], detects available RAM, and pre-initializes the CPU and hardware. Afterwards, it looks for a bootable disk and starts booting the operating system kernel.

[ROM]: https://ja.wikipedia.org/wiki/Read_only_memory
[power-on self-test]: https://ja.wikipedia.org/wiki/Power_On_Self_Test

On x86, there are two firmware standards: the "Basic Input/Output System" (**[BIOS]**) and the newer "Unified Extensible Firmware Interface" (**[UEFI]**). The BIOS standard is old and outdated, but simple and well-supported on any x86 machine since the 1980s. UEFI, in contrast, is more modern and has many more features, but is more complex to set up (at least in my opinion).

[BIOS]: https://ja.wikipedia.org/wiki/Basic_Input/Output_System
[UEFI]: https://ja.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface

Currently, this blog only supports BIOS, but support for UEFI is planned, too. If you'd like to help with this, check out the [GitHub issue](https://github.com/phil-opp/blog_os/issues/349).

### BIOS Boot

Almost all x86 systems support BIOS booting, including newer UEFI-based machines that use an emulated BIOS. This is great, because you can use the same boot logic across all machines from the last century. But this wide compatibility is at the same time the biggest disadvantage of BIOS booting, because it means that the CPU is put into a 16-bit compatibility mode called [real mode] before booting so that archaic bootloaders from the 1980s still work.

But let's start from the beginning.

When you turn on a computer, it loads the BIOS from a special flash memory located on the motherboard. The BIOS runs self-test and hardware initialization routines, then it looks for bootable disks. If it finds one, control is transferred to its **bootloader**, which is executable code stored in the first 512 bytes of the disk. Most bootloaders are larger than 512 bytes, so they are commonly split into a small first stage, which fits into 512 bytes, and a second stage, which is loaded by the first stage.

The bootloader has to determine the location of the kernel image on the disk and load it into memory. It also needs to switch the CPU from the 16-bit [real mode] first to the 32-bit [protected mode], and then to the 64-bit [long mode], where 64-bit registers and the complete main memory are available. Its third job is to query certain information (such as a memory map) from the BIOS and pass it to the OS kernel.

[real mode]: https://ja.wikipedia.org/wiki/リアルモード
[protected mode]: https://ja.wikipedia.org/wiki/プロテクトモード
[long mode]: https://en.wikipedia.org/wiki/Long_mode
[memory segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation

Writing a bootloader is a bit cumbersome, as it requires assembly language and a lot of non-insightful steps like "write this value to this processor register without thinking about it". Therefore, we don't cover bootloader creation in this post and instead use a tool named [bootimage] that automatically prepends a bootloader to your kernel.

[bootimage]: https://github.com/rust-osdev/bootimage

If you are interested in building your own bootloader, stay tuned: posts on this topic are already planned!

#### The Multiboot Standard

To avoid that every operating system implements its own bootloader, which is only compatible with a single OS, the [Free Software Foundation] created an open bootloader standard called [Multiboot] in 1995. The standard defines an interface between the bootloader and the operating system, so that any Multiboot-compliant bootloader can load any Multiboot-compliant operating system. The reference implementation is [GNU GRUB], which is the most popular bootloader for Linux systems.

[Free Software Foundation]: https://ja.wikipedia.org/wiki/フリーソフトウェア財団
[Multiboot]: https://wiki.osdev.org/Multiboot
[GNU GRUB]: https://ja.wikipedia.org/wiki/GNU_GRUB

To make a kernel Multiboot compliant, one just needs to insert a so-called [Multiboot header] at the beginning of the kernel file. This makes it very easy to boot an OS from GRUB. However, GRUB and the Multiboot standard have some problems too:

[Multiboot header]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#OS-image-format

- They support only the 32-bit protected mode. This means that you still have to do the CPU configuration to switch to the 64-bit long mode.
- They are designed to make the bootloader simple instead of the kernel. For example, the kernel needs to be linked with an [adjusted default page size], because GRUB can't find the Multiboot header otherwise. Another example is that the [boot information], which is passed to the kernel, contains lots of architecture-dependent structures instead of providing clean abstractions.
- Both GRUB and the Multiboot standard are only sparsely documented.
- GRUB needs to be installed on the host system to create a bootable disk image from the kernel file. This makes development on Windows or Mac more difficult.

[adjusted default page size]: https://wiki.osdev.org/Multiboot#Multiboot_2
[boot information]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Boot-information-format

Because of these drawbacks, we decided not to use GRUB or the Multiboot standard. However, we plan to add Multiboot support to our [bootimage] tool, so that it's possible to load your kernel on a GRUB system too. If you're interested in writing a Multiboot compliant kernel, check out the [first edition] of this blog series.

[first edition]: @/edition-1/_index.md

### UEFI

(We don't provide UEFI support at the moment, but we would love to! If you'd like to help, please tell us in the [GitHub issue](https://github.com/phil-opp/blog_os/issues/349).)

## A Minimal Kernel

Now that we roughly know how a computer boots, it's time to create our own minimal kernel. Our goal is to create a disk image that prints "Hello, World!" to the screen when booted. We do this by building on the [freestanding Rust binary] from the previous post.
As you may remember, we built the freestanding binary through `cargo`, but we needed different entry point names and compile flags depending on the operating system. That's because `cargo` builds for the **host system** by default, i.e. the system you're running on. This isn't what we want for our kernel, because a kernel that runs on top of, say, Windows does not make much sense. Instead, we want to compile for a clearly defined **target system**.

### Installing Rust Nightly {#installing-rust-nightly}

Rust has three release channels: **stable**, **beta**, and **nightly**. The Rust Book explains the difference between these channels really well, so take a minute and [check it out](https://doc.rust-jp.rs/book-ja/appendix-07-nightly-rust.html). To build an operating system, we need some experimental features that are only available on the nightly channel, so we need to install a nightly version of Rust.

To manage Rust installations, I highly recommend [rustup]. It lets you install nightly, beta, and stable compilers side by side and makes it easy to update them. To use a nightly compiler for the current directory, run `rustup override set nightly`. Alternatively, you can put a file called `rust-toolchain` with the content `nightly` in the project's root directory. You can check that you are using a nightly version by running `rustc --version`: the version string should end with `-nightly`.

[rustup]: https://www.rustup.rs/

The nightly compiler allows us to opt in to various experimental features by adding so-called **feature flags** at the top of our file. For example, adding `#![feature(asm)]` to the top of `main.rs` enabled the then-experimental [`asm!` macro] for inline assembly (`asm!` has since been stabilized, so this flag only illustrates the syntax). Note that such experimental features are completely unstable, which means that future Rust versions might change or remove them without prior warning. For this reason, we will only use them when absolutely necessary.

[`asm!` macro]: https://doc.rust-lang.org/stable/reference/inline-assembly.html

### Target Specification

Cargo supports different target systems through the `--target` parameter. A target is described by a so-called [target triple][target triple], which names the CPU architecture, the vendor, the operating system, and the [ABI]. For example, the `x86_64-unknown-linux-gnu` target triple describes a system with an `x86_64` CPU, no specific vendor, and a Linux operating system with the GNU ABI. Rust supports [many different target triples][platform-support], including `arm-linux-androideabi` for Android and [`wasm32-unknown-unknown` for WebAssembly](https://www.hellorust.com/setup/wasm-target/).

[target triple]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple
[ABI]: https://stackoverflow.com/a/2456882
[platform-support]: https://forge.rust-lang.org/release/platform-support.html
[custom-targets]:
https://doc.rust-lang.org/nightly/rustc/targets/custom.html

For our target system, however, we need some special configuration parameters (for example, there is no underlying OS), so none of the [existing target triples][platform-support] fits. Fortunately, Rust allows us to define [our own target][custom-targets] through a JSON file. For example, a JSON file that describes the `x86_64-unknown-linux-gnu` target looks like this:

```json
{
    "llvm-target": "x86_64-unknown-linux-gnu",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "linux",
    "executables": true,
    "linker-flavor": "gcc",
    "pre-link-args": ["-m64"],
    "morestack": false
}
```

Most fields are required by LLVM to generate code for that platform. For example, the [`data-layout`] field defines the sizes of various integer, floating point, and pointer types. Then there are fields that are used for conditional compilation, such as `target-pointer-width`. The third kind of field defines how the crate should be built. For example, the `pre-link-args` field specifies arguments passed to the [linker].

[`data-layout`]: https://llvm.org/docs/LangRef.html#data-layout
[linker]: https://ja.wikipedia.org/wiki/リンケージエディタ

Our kernel also targets `x86_64` systems, so our target specification will look very similar to the one above. Let's start by creating an `x86_64-blog_os.json` file (choose any name you like) with the common content:

```json
{
    "llvm-target": "x86_64-unknown-none",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "executables": true
}
```

Note that we changed the OS in `llvm-target` and set the `os` field to `none`, because we will run on bare metal.

We add the following build-related entries:

```json
"linker-flavor": "ld.lld",
"linker": "rust-lld",
```

Instead of using the platform's default linker (which might not support Linux targets), we use the cross-platform [LLD] linker that is shipped with Rust to link our kernel.

[LLD]: https://lld.llvm.org/

```json
"panic-strategy": "abort",
```

This setting specifies that the target doesn't support [stack unwinding] on panic, so the program should abort directly instead. This has the same effect as the `panic = "abort"` option in our Cargo.toml, so we can remove it from there. (Note that, in contrast to the Cargo.toml option, this target option also applies when we recompile the `core` library later in this post. So even if you prefer to keep the Cargo.toml option, make sure to include this one.)

[stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php

```json
"disable-redzone": true,
```

We're writing a kernel, so we'll need to handle interrupts at some point. To do that safely, we have to disable a certain stack pointer optimization called the **"red zone"**, because it would cause stack corruption otherwise. For more information, see our separate post about [disabling the red zone].

[disabling the red zone]: @/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.md

```json
"features": "-mmx,-sse,+soft-float",
```

The `features` field enables or disables target features. We disable the `mmx` and `sse` features by prefixing them with a minus and enable the `soft-float` feature by prefixing it with a plus. Note that there must be no spaces between the flags; otherwise LLVM fails to interpret the features string.

The `mmx` and `sse` features determine support for [Single Instruction Multiple Data (SIMD)] instructions, which can often speed up programs significantly. However, using the large SIMD registers in an OS kernel leads to performance problems. The reason is that the kernel needs to restore all registers to their original state before continuing an interrupted program. This means that the kernel has to save the complete SIMD state to main memory on each system call or hardware interrupt. Since the SIMD state is very large (512 to 1600 bytes) and interrupts can occur very often, these additional save and restore operations considerably harm performance. To avoid this, we disable SIMD for our kernel (not for applications running on top of it!).

[Single Instruction Multiple Data (SIMD)]: https://ja.wikipedia.org/wiki/SIMD

A problem with disabling SIMD is that floating point operations on `x86_64` require SIMD registers by default. To solve this, we add the `soft-float` feature, which emulates all floating point operations through software functions based on normal integers.

For more information, see our post on [disabling SIMD](@/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.md).

```json
"rustc-abi": "x86-softfloat"
```

As we want to use the `soft-float` feature, we also need to tell the Rust compiler `rustc` that we want to use the corresponding ABI. We can do that by setting the `rustc-abi` field to `x86-softfloat`.
#### Putting It Together

Our target specification file should now look like this:

```json
{
    "llvm-target": "x86_64-unknown-none",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "executables": true,
    "linker-flavor": "ld.lld",
    "linker": "rust-lld",
    "panic-strategy": "abort",
    "disable-redzone": true,
    "features": "-mmx,-sse,+soft-float",
    "rustc-abi": "x86-softfloat"
}
```

### Building our Kernel

Compiling for our new target follows Linux conventions (I'm not sure why; I assume it's just LLVM's default). This means that we need an entry point named `_start`, as described in the [previous post]:

[previous post]: @/edition-2/posts/01-freestanding-rust-binary/index.ja.md

```rust
// src/main.rs

#![no_std] // don't link the Rust standard library
#![no_main] // disable all Rust-level entry points

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    // this function is the entry point, since the linker looks for a function
    // named `_start` by default
    loop {}
}
```

Note that the entry point needs to be called `_start` regardless of your host OS.

We can now build the kernel for our new target by passing the name of the JSON file as `--target`:

```
> cargo build --target x86_64-blog_os.json

error: `.json` target specs require -Zjson-target-spec
```

It fails! The error tells us that custom JSON target specifications are an unstable feature that requires explicit opt-in. The format of JSON target files is not considered stable yet, so it may change in future Rust versions. See the [tracking issue for custom JSON target specs][json-target-spec-issue] for details.

[json-target-spec-issue]: https://github.com/rust-lang/rust/issues/151528

#### The `json-target-spec` Option

To enable support for custom JSON target specifications, we create a [cargo configuration] file at `.cargo/config.toml` (the `.cargo` folder sits next to the `src` folder) with the following content:

[cargo configuration]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
```

This enables the unstable `json-target-spec` feature, which allows us to use custom JSON target files.

With this configuration in place, let's try to build again:

```
> cargo build --target x86_64-blog_os.json

error[E0463]: can't find crate for `core`
```

Now we get a different error! It tells us that the Rust compiler no longer finds the [`core` library]. This library contains basic Rust types such as `Result`, `Option`, and iterators, and it is implicitly linked to all `no_std` crates.

[`core` library]: https://doc.rust-lang.org/nightly/core/index.html

The problem is that the core library is distributed together with the Rust compiler as a precompiled library. It is therefore only valid for supported host triples (e.g., `x86_64-unknown-linux-gnu`), not for our custom target. If we want to compile code for other targets, we need to recompile `core` for those targets first.

#### The `build-std` Option

This is where cargo's [`build-std` feature] comes in. It allows us to recompile `core` and other standard library crates on demand, instead of using the precompiled versions shipped with the Rust installation. The feature is very new and not finished yet, so it is marked as unstable and is only available on [nightly Rust compilers].

[`build-std` feature]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std
[nightly Rust compilers]: #installing-rust-nightly
To use the feature, we add the following to our [cargo configuration] file at `.cargo/config.toml`:

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std = ["core", "compiler_builtins"]
```

This tells cargo to recompile the `core` and `compiler_builtins` libraries. The latter is required because `core` depends on it. To recompile these libraries, cargo needs access to the Rust source code, which we can install with `rustup component add rust-src`.
**Note:** The `unstable.build-std` configuration key requires a Rust nightly from 2020-07-15 or later.

After setting the `unstable.build-std` configuration key and installing the `rust-src` component, we can rerun our build command:

```
> cargo build --target x86_64-blog_os.json
   Compiling core v0.0.0 (/…/rust/src/libcore)
   Compiling rustc-std-workspace-core v1.99.0 (/…/rust/src/tools/rustc-std-workspace-core)
   Compiling compiler_builtins v0.1.32
   Compiling blog_os v0.1.0 (/…/blog_os)
    Finished dev [unoptimized + debuginfo] target(s) in 0.29 secs
```

This time, we see that `cargo build` recompiles `core`, `rustc-std-workspace-core` (a dependency of `compiler_builtins`), and `compiler_builtins` for our custom target.

#### Memory-Related Intrinsics

The Rust compiler assumes that a certain set of built-in functions is available on all systems. Most of these functions are provided by the `compiler_builtins` crate that we just recompiled. However, some memory-related functions in that crate are not enabled by default, because they are normally provided by the system's C library. These include `memset`, which sets all bytes in a memory block to a given value, `memcpy`, which copies one memory block to another, and `memcmp`, which compares two memory blocks. We don't need any of them to compile our kernel right now, but we will need them soon as we add more code (for example, when copying structs around).

Since we can't link to the C library of the operating system, we need another way to provide these functions to the compiler. One possible approach is to implement our own `memset` and friends and apply the `#[unsafe(no_mangle)]` attribute to them (to avoid automatic renaming during compilation). However, this is dangerous, since the slightest mistake in the implementation of these functions could lead to undefined behavior. For example, implementing `memcpy` with a `for` loop may result in infinite recursion, because `for` loops implicitly call the [`IntoIterator::into_iter`] trait method, which may call `memcpy` again. So it's a good idea to reuse existing, well-tested implementations instead.

[`IntoIterator::into_iter`]: https://doc.rust-lang.org/stable/core/iter/trait.IntoIterator.html#tymethod.into_iter

Fortunately, the `compiler_builtins` crate already contains implementations of all the needed functions. They are just disabled by default so that they don't collide with the implementations from the C library. We can enable them by setting cargo's [`build-std-features`] flag to `["compiler-builtins-mem"]`. Like the `build-std` flag, this flag can either be passed on the command line as a `-Z` flag or configured in the `unstable` table of the `.cargo/config.toml` file. Since we always want to build with this flag, the config file option makes more sense for us:

[`build-std-features`]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std-features

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std-features = ["compiler-builtins-mem"]
build-std = ["core", "compiler_builtins"]
```

(Support for the `compiler-builtins-mem` feature was added in [this pull request](https://github.com/rust-lang/rust/pull/77284), so you need a Rust nightly from `2020-09-30` or later.)

Behind the scenes, this flag enables the [`mem` feature] of the `compiler_builtins` crate. As a result, the `#[unsafe(no_mangle)]` attribute is applied to the crate's [`memcpy` etc. implementations], which makes them available to the linker.

[`mem` feature]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/Cargo.toml#L51-L52
[`memcpy` etc. implementations]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/src/mem.rs#L12-L69

With this change, our kernel has valid implementations of all compiler-required functions, so it will continue to compile even as our code gets more complex.

#### Setting a Default Target

To avoid passing the `--target` parameter on every invocation of `cargo build`, we can override the default target. To do this, we add the following to our [cargo configuration] file at `.cargo/config.toml`:

[cargo configuration]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[build]
target = "x86_64-blog_os.json"
```

This tells `cargo` to use our `x86_64-blog_os.json` target when no explicit `--target` argument is passed. This means that we can now build our kernel with a simple `cargo build`. For more information on cargo configuration options, check out the [official documentation][cargo configuration].

We are now able to build our kernel for a bare metal target with a simple `cargo build`. However, our `_start` entry point, which will be called by the bootloader, is still empty. It's time to print something to the screen from it.

### Printing to Screen

The easiest way to print text to the screen at this stage is the [VGA text buffer]. It is a special memory area mapped to the VGA hardware that holds the contents displayed on screen. It normally consists of 25 lines that each contain 80 character cells. Each character cell displays an ASCII character with a foreground and a background color. The screen output looks like this:

[VGA text buffer]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode

![screen output for common ASCII characters](https://upload.wikimedia.org/wikipedia/commons/f/f8/Codepage-437.png)

We will discuss the exact layout of the VGA buffer in the next post, where we write a first small driver for it. For printing "Hello World!", we just need to know that the buffer is located at address `0xb8000` and that each character cell consists of an ASCII byte and a color byte.

The implementation looks like this:

```rust
static HELLO: &[u8] = b"Hello World!";

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    let vga_buffer = 0xb8000 as *mut u8;

    for (i, &byte) in HELLO.iter().enumerate() {
        unsafe {
            *vga_buffer.offset(i as isize * 2) = byte;
            *vga_buffer.offset(i as isize * 2 + 1) = 0xb;
        }
    }

    loop {}
}
```

First, we cast the integer `0xb8000` into a [raw pointer]. Then we [iterate] over the bytes of the [static] `HELLO` [byte string]. We use the [`enumerate`] method to additionally get a running variable `i` that counts the loop iterations. In the body of the loop, we use the [`offset`] method to write the string byte and the corresponding color byte (`0xb` is a light cyan).

[iterate]: https://doc.rust-jp.rs/book-ja/ch13-02-iterators.html
[static]: https://doc.rust-jp.rs/book-ja/ch10-03-lifetime-syntax.html#静的ライフタイム
[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate
[byte string]: https://doc.rust-lang.org/reference/tokens.html#byte-string-literals
[raw pointer]: https://doc.rust-jp.rs/book-ja/ch19-01-unsafe-rust.html#生ポインタを参照外しする
[`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset

Note that there is an [`unsafe`] block around all memory writes. The reason is that the Rust compiler can't prove that the raw pointers we create are valid. They could point anywhere and lead to data corruption. By putting these operations into an `unsafe` block, we are telling the compiler that we are absolutely sure they are valid. Note, however, that an `unsafe` block does not turn off Rust's safety checks. It only allows you to do [five additional things].
**Translator's note:** At the time of translation (2020-10-20), the linked Japanese edition of The Rust Book lists only four such additional capabilities.
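The cell layout used above (one ASCII byte followed by one color byte) can also be tried out on the host without touching `0xb8000`. The following is only a sketch under that assumption: `write_cells` is a hypothetical helper, not code from this series, and it writes into an ordinary byte slice that stands in for the VGA memory. Because it works on a slice, it simply stops at the end of the buffer rather than writing past it.

```rust
/// Bytes per VGA text cell: one ASCII byte plus one color byte.
const BYTES_PER_CELL: usize = 2;

/// Writes `text` into `buffer` as (character, color) pairs, starting at cell 0.
/// Returns the number of cells written and stops early when `buffer` is full.
/// Hypothetical helper: in the kernel, the buffer would be the memory at 0xb8000.
fn write_cells(buffer: &mut [u8], text: &[u8], color: u8) -> usize {
    let mut written = 0;
    for (cell, &byte) in buffer.chunks_exact_mut(BYTES_PER_CELL).zip(text) {
        cell[0] = byte; // character byte
        cell[1] = color; // color byte
        written += 1;
    }
    written
}
```

Calling `write_cells(&mut buf, b"Hello", 0xb)` on an 8-byte array fills four cells and leaves the fifth character unwritten, which is the kind of bounds handling the raw-pointer version does not give us.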

[`unsafe`]: https://doc.rust-jp.rs/book-ja/ch19-01-unsafe-rust.html
[five additional things]: https://doc.rust-jp.rs/book-ja/ch19-01-unsafe-rust.html#unsafeの強大な力superpower

I want to emphasize that **this is not the way we want to do things in Rust!** It's very easy to mess up when working with raw pointers inside unsafe blocks. For example, we could easily write beyond the buffer's end if we're not careful.

So we want to minimize the use of `unsafe` as much as possible. Rust lets us do this by creating safe abstractions. For example, we could create a VGA buffer type that encapsulates all unsafe operations and ensures that it is **impossible** to do anything wrong from the outside. This way, we would only need a minimal amount of `unsafe` and could be sure that we don't violate [memory safety]. We will create such a safe VGA buffer abstraction in the next post.

[memory safety]: https://ja.wikipedia.org/wiki/メモリ安全性

## Running our Kernel

Now that we have an executable that does something perceptible, it is time to run it. First, we need to turn our compiled kernel into a bootable disk image by linking it with a bootloader. Then we can run the disk image in the [QEMU] virtual machine or boot it on real hardware using a USB stick.

### Creating a Bootimage

To turn our compiled kernel into a bootable disk image, we need to link it with a bootloader. As we learned in the [section about booting], the bootloader is responsible for initializing the CPU and loading our kernel.

[section about booting]: #the-boot-process

Writing our own bootloader would be a project of its own, so we use the [`bootloader`] crate instead. This crate implements a basic BIOS bootloader without any C dependencies, using only Rust and inline assembly. To use it for booting our kernel, we need to add a dependency on it:

[`bootloader`]: https://crates.io/crates/bootloader

```toml
# in Cargo.toml

[dependencies]
bootloader = "0.9"
```

Adding the bootloader as a dependency is not enough to actually create a bootable disk image: we also need to link our kernel with the bootloader after compilation. The problem is that cargo has no support for [post-build scripts].

[post-build scripts]: https://github.com/rust-lang/cargo/issues/545

To solve this problem, we created a tool named `bootimage` that first compiles the kernel and the bootloader and then links them together to create a bootable disk image. To install the tool, run the following command in your terminal:

```
cargo install bootimage
```

To run `bootimage` and build the bootloader, you need to have the `llvm-tools-preview` rustup component installed. You can install it by running `rustup component add llvm-tools-preview`.

After installing `bootimage` and adding the `llvm-tools-preview` component, we can create a bootable disk image by running:

```
> cargo bootimage
```

We see that the tool recompiles our kernel using `cargo build`, so it automatically picks up any changes you make. Afterwards, it builds the bootloader, which might take a while. Like all crate dependencies, it is only built once and then cached, so subsequent builds will be much faster. Finally, `bootimage` combines the bootloader and your kernel into a bootable disk image.

After running the command, you should see a bootable disk image named `bootimage-blog_os.bin` in your `target/x86_64-blog_os/debug` directory. You can boot it in a virtual machine or copy it to a USB drive to boot it on real hardware. (Note that this is not a CD image. CD images have a different format, so burning it to a CD doesn't work.)

#### How does it work?

The `bootimage` tool performs the following steps behind the scenes:

- It compiles our kernel to an [ELF] file.
- It compiles the bootloader dependency as a standalone executable.
- It links the bytes of the kernel ELF file to the bootloader.

[ELF]: https://ja.wikipedia.org/wiki/Executable_and_Linkable_Format
[rust-osdev/bootloader]: https://github.com/rust-osdev/bootloader

When booted, the bootloader reads and parses the appended ELF file. It then maps the program segments to virtual addresses in the page tables, zeroes the `.bss` section, and sets up a stack. Finally, it reads the entry point address (our `_start` function) and jumps to it.

### Booting it in QEMU

We can now boot the disk image in a virtual machine. To boot it in [QEMU], run the following command:

[QEMU]: https://www.qemu.org/

```
> qemu-system-x86_64 -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
```

This opens a separate window that looks like this:

![QEMU showing "Hello World!"](qemu.png)

We see that our "Hello World!" is visible on the screen.

### Real Machine

It is also possible to write the image to a USB stick and boot it on a real machine:

```
> dd if=target/x86_64-blog_os/debug/bootimage-blog_os.bin of=/dev/sdX && sync
```

Here, `sdX` is the device name of your USB stick. **Be careful** to choose the correct device name, because everything on that device is overwritten.

After writing the image to the USB stick, you can run it on real hardware by booting from it. You probably need to use a special boot menu or change the boot order in your BIOS configuration to boot from the USB stick. Note that this currently doesn't work on UEFI machines, since the `bootloader` crate does not support UEFI.

### Using `cargo run`

To make it easier to run our kernel in QEMU, we can set the `runner` configuration key for cargo:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "none")']
runner = "bootimage runner"
```

The `target.'cfg(target_os = "none")'` table applies to all targets whose target configuration file's `"os"` field is set to `"none"`. This includes our `x86_64-blog_os.json` target. The `runner` key specifies the command that should be invoked for `cargo run`. The command is run after a successful build with the executable path passed as the first argument. See the [cargo documentation][cargo configuration] for more details.

The `bootimage runner` command is specifically designed to be usable as a `runner` executable. It links the given executable with the project's bootloader dependency and then launches QEMU. See the [Readme of `bootimage`] for more details and possible configuration options.

[Readme of `bootimage`]: https://github.com/rust-osdev/bootimage

Now we can use `cargo run` to compile our kernel and boot it in QEMU.

## What's next?

In the next post, we will explore the VGA text buffer in more detail and write a safe interface for it. We will also add support for the `println` macro.

================================================
FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/index.ko.md
================================================
+++
title = "A Minimal Kernel"
weight = 2
path = "ko/minimal-rust-kernel"
date = 2018-02-10

[extra]
# Please update this when updating the translation
translation_based_on_commit = "c1af4e31b14e562826029999b9ab1dce86396b93"
# GitHub usernames of the people that translated this post
translators = ["JOE1994", "Quqqu"]
+++

In this post, we build a minimal 64-bit Rust kernel for the x86 architecture together. Building on the previous post, [A Freestanding Rust Binary][freestanding Rust binary], we create a bootable disk image and print data to the screen.

[freestanding Rust binary]: @/edition-2/posts/01-freestanding-rust-binary/index.md

This blog is developed as open source in a [GitHub repository][GitHub]. If you run into problems or have questions, please report them through the repository's issue tracker. You can also leave a comment [at the bottom of the page][at the bottom]. All source code for this post can be found in the [`post-02` branch][post branch] of the repository.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-02

## The Boot Process {#the-boot-process}

When you turn on a computer, the very first thing it does is execute firmware code stored in the motherboard's [ROM]. This code performs a [power-on self-test], detects the available RAM, and initializes the CPU and hardware. Afterwards, it looks for a bootable disk and starts booting the operating system kernel.
[ROM]: https://en.wikipedia.org/wiki/Read-only_memory
[power-on self-test]: https://en.wikipedia.org/wiki/Power-on_self-test

On x86 systems, there are two firmware standards: the "Basic Input/Output System" (**[BIOS]**) and the "Unified Extensible Firmware Interface" (**[UEFI]**). The BIOS standard is old, but it is simple and well supported on any x86 hardware released since the 1980s. UEFI is the newer standard and has many more features, but getting it set up and running is more complex (at least in my subjective opinion).

[BIOS]: https://en.wikipedia.org/wiki/BIOS
[UEFI]: https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface

The operating system we build will only support the BIOS standard, but there are plans to support UEFI as well. If you'd like to help with UEFI support, check out the corresponding [GitHub issue](https://github.com/phil-opp/blog_os/issues/349).

### BIOS Boot

Since even recent UEFI-based machines support an emulated BIOS, almost every x86 system in existence supports BIOS booting. Thanks to this, a single implementation of the BIOS boot logic can boot almost every computer built so far. At the same time, this wide compatibility is the biggest weakness of BIOS: to stay backward compatible with old bootloaders from the 1980s, the CPU always has to be put into a 16-bit compatibility mode (called [real mode]) before booting.

Let's look at the BIOS boot process from the first step.

When you turn on your computer, it first loads the BIOS image from special flash memory on the motherboard. The BIOS image performs self-test and hardware initialization routines and then searches for a bootable disk. If there is one, control is handed over to that disk's _bootloader_, a 512-byte executable stored at the very beginning of the disk. Most bootloaders need more than 512 bytes for their logic, so they are split in two: the first stage fits within the first 512 bytes, and the second stage runs after it has been loaded by the first stage.

The bootloader has to find out where on the disk the kernel image is stored and load it into memory. It then switches the CPU from 16-bit [real mode] to 32-bit [protected mode] and afterwards to 64-bit [long mode], from which point 64-bit registers and all of main memory are available. The bootloader's third job is to query required information, such as the memory map, from the BIOS and pass it on to the operating system kernel.
[real mode]: https://en.wikipedia.org/wiki/Real_mode
[protected mode]: https://en.wikipedia.org/wiki/Protected_mode
[long mode]: https://en.wikipedia.org/wiki/Long_mode
[memory segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation

Writing a bootloader is quite tedious, because it requires assembly code and involves many steps whose rationale is hard to grasp at first, such as "store value B in register A". For that reason, this post does not cover writing a bootloader. Instead, we provide a tool called [bootimage] that automatically prepends a bootloader to your kernel.

[bootimage]: https://github.com/rust-osdev/bootimage

If you are interested in writing your own bootloader, stay tuned: several posts on this topic are planned!

#### The Multiboot Standard

If every operating system implemented its bootloader differently, a bootloader that works for one operating system would not be compatible with another. To avoid this, the [Free Software Foundation] developed a bootloader standard called [Multiboot] in 1995. The standard defines how the bootloader and the operating system interact, so that a bootloader following the Multiboot standard works with any operating system that supports it. A prominent implementation of the standard is [GNU GRUB], the most popular bootloader for Linux systems.

[Free Software Foundation]: https://en.wikipedia.org/wiki/Free_Software_Foundation
[Multiboot]: https://wiki.osdev.org/Multiboot
[GNU GRUB]: https://en.wikipedia.org/wiki/GNU_GRUB

To make a kernel support Multiboot, you insert a [Multiboot header] at the very beginning of the kernel file. This makes it very easy to boot an operating system from GRUB. However, GRUB and the Multiboot standard have some problems as well:

[Multiboot header]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#OS-image-format

- They support only the 32-bit protected mode. If you want to use the 64-bit long mode, you have to change the CPU configuration yourself.
- The Multiboot standard and GRUB were designed to keep the bootloader simple, which makes the corresponding kernel-side implementation more cumbersome. For example, the kernel is forced to be linked with an [adjusted default page size], because GRUB cannot find the Multiboot header otherwise. Also, the [boot information] that the bootloader passes to the kernel comes in architecture-dependent structures instead of a standardized form at a suitable level of abstraction.
- GRUB and the Multiboot standard are only sparsely documented.
- GRUB has to be installed on the host system to create a bootable disk image from the kernel file. This makes development on Windows and Mac harder than on Linux.
[adjusted default page size]: https://wiki.osdev.org/Multiboot#Multiboot_2
[boot information]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Boot-information-format

Because of these drawbacks, we will not use GRUB or the Multiboot standard. However, we plan to add Multiboot support to our [bootimage] tool in the future. If you are interested in developing a kernel that supports the Multiboot standard, check out the [first edition] of this blog series.

[first edition]: @/edition-1/_index.md

### UEFI

(We don't support the UEFI standard yet. If you'd like to help us get there, please leave a comment on the [GitHub issue](https://github.com/phil-opp/blog_os/issues/349)!)

## A Minimal Operating System Kernel

Now that we roughly know how a computer boots, it's time to write our own minimal operating system kernel. Our goal is to create a disk image that prints the message "Hello World!" to the screen after booting. We continue from the [freestanding Rust binary] that we built in the previous post.

In the previous post, we built the freestanding binary through `cargo`, and we had to set different entry point names and compiler arguments depending on the host operating system. That was because `cargo` builds for the _host system_ (the system you are running on) by default. Our kernel is not going to run on top of another operating system (Windows, for example), so instead of matching the host system's settings, we compile for a clearly defined _target system_.

### Installing Rust Nightly {#installing-rust-nightly}

Rust is released through three channels: _stable_, _beta_, and _nightly_. The Rust Book has [a chapter that explains the differences between the three channels well](https://doc.rust-lang.org/book/appendix-07-nightly-rust.html#choo-choo-release-channels-and-riding-the-trains). To build an operating system, we need experimental features that are only available on the _nightly_ channel, so you need to install a _nightly_ version of Rust.

I strongly recommend [rustup] for managing multiple Rust installations. With rustup, you can install and update the nightly, beta, and stable compilers. The `rustup override set nightly` command makes the current directory always use the nightly version of Rust. Creating a file named `rust-toolchain` in the project's root directory and writing `nightly` into it has the same effect. You can check whether a nightly version is active by running `rustc --version` (the printed version number should end with `-nightly`).

[rustup]: https://www.rustup.rs/

The nightly compiler lets us opt in to individual experimental features by adding _feature flags_ at the top of the source file.
For example, adding `#![feature(asm)]` to the top of `main.rs` made the [`asm!` macro], which is used for writing inline assembly, available while it was still experimental (`asm!` has since been stabilized, so this flag only illustrates the syntax). Such experimental features are, as the name says, experimental, so future Rust versions might change or remove them without notice. For that reason, we will use them as sparingly as possible.

[`asm!` macro]: https://doc.rust-lang.org/stable/reference/inline-assembly.html

### Target Specification

Cargo supports different target systems through the `--target` argument. A target is described by a so-called _[target triple]_, which identifies the CPU architecture, the vendor, the operating system, and the [ABI]. For example, `x86_64-unknown-linux-gnu` describes a system with an `x86_64` CPU, no specific vendor, the Linux operating system, and the GNU ABI. Rust supports [many different target triples][platform-support], including `arm-linux-androideabi` for Android and [`wasm32-unknown-unknown` for WebAssembly](https://www.hellorust.com/setup/wasm-target/).

[target triple]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple
[ABI]: https://stackoverflow.com/a/2456882
[platform-support]: https://forge.rust-lang.org/release/platform-support.html
[custom-targets]: https://doc.rust-lang.org/nightly/rustc/targets/custom.html

Describing our target environment (one with no underlying operating system) requires some special configuration parameters, so none of the [target triples that Rust supports out of the box][platform-support] fits. Fortunately, Rust allows us to define [our own target][custom-targets] through a JSON file. For example, a JSON file that describes the `x86_64-unknown-linux-gnu` target looks like this:

```json
{
    "llvm-target": "x86_64-unknown-linux-gnu",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "linux",
    "executables": true,
    "linker-flavor": "gcc",
    "pre-link-args": ["-m64"],
    "morestack": false
}
```

Most fields are needed by LLVM to generate code for that platform. For example, the [`data-layout`] field defines the in-memory sizes of various integer, floating point, and pointer types. Then there are fields such as `target-pointer-width` that Rust uses for conditional compilation. The remaining kind of field determines how the crate should be built. For example, the `pre-link-args` field specifies the arguments passed to the [linker].
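As a small illustration of the conditional compilation mentioned above, the following Rust sketch selects a constant based on the `target_pointer_width` configuration value, which the compiler derives from the target's pointer width. The names `POINTER_BYTES` and `pointer_bytes` are invented for this example and do not appear elsewhere in this series.

```rust
/// Pointer size in bytes, selected at compile time from the target's pointer width.
#[cfg(target_pointer_width = "64")]
const POINTER_BYTES: usize = 8;
#[cfg(target_pointer_width = "32")]
const POINTER_BYTES: usize = 4;
#[cfg(target_pointer_width = "16")]
const POINTER_BYTES: usize = 2;

/// Returns the pointer size that was compiled in for the current target.
fn pointer_bytes() -> usize {
    POINTER_BYTES
}
```

Only one of the three constants exists in any given build, so the same source compiles for different targets without runtime checks.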
[`data-layout`]: https://llvm.org/docs/LangRef.html#data-layout
[linker]: https://en.wikipedia.org/wiki/Linker_(computing)

Our kernel will also run on `x86_64` systems, so our target specification (a JSON file) will look very similar to the one above. Start by creating a file named `x86_64-blog_os.json` with the following content:

```json
{
    "llvm-target": "x86_64-unknown-none",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "executables": true
}
```

Since our operating system will run on bare metal, both the OS part of the `llvm-target` field and the `os` field are set to `none`.

We add the following build-related entries:

```json
"linker-flavor": "ld.lld",
"linker": "rust-lld",
```

Instead of the default linker of the current platform, which might not support Linux targets, we use the cross-platform [LLD] linker that is shipped with Rust to link our kernel.

[LLD]: https://lld.llvm.org/

```json
"panic-strategy": "abort",
```

Since the target does not support [stack unwinding] on panic, this setting makes the program abort immediately instead. It has a similar effect to adding `panic = "abort"` to the Cargo.toml file, so you can remove that option from Cargo.toml. (Note, however, that unlike the Cargo.toml option, this setting also applies when we recompile the `core` library later on. So be sure to add it!)

[stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php

```json
"disable-redzone": true,
```

When writing a kernel, we will eventually have to write the logic that handles interrupts. To do this safely, we need to disable a stack pointer optimization called the _"red zone"_ (otherwise it can cause stack memory to be overwritten with values we do not want). For more details, see the post on [disabling the red zone].

[disabling the red zone]: @/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.ko.md

```json
"features": "-mmx,-sse,+soft-float",
```

The `features` field enables or disables features of the target. We disable the `mmx` and `sse` features with the `-` prefix and enable the `soft-float` feature with the `+` prefix. There must be no spaces between the flags inside the `features` string; otherwise LLVM cannot interpret the string correctly.
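To make the format of the features string tangible, here is a small Rust sketch that splits such a string into enabled and disabled names and rejects flags containing whitespace. It is purely illustrative: `split_features` is a made-up function, and this is not how rustc or LLVM actually parse the field.

```rust
/// Splits a feature string such as "-mmx,-sse,+soft-float" into
/// (enabled, disabled) lists. Returns None if a flag contains whitespace
/// or lacks a leading '+' or '-'.
/// Illustrative only; not the parser used by rustc or LLVM.
fn split_features(features: &str) -> Option<(Vec<&str>, Vec<&str>)> {
    let mut enabled = Vec::new();
    let mut disabled = Vec::new();
    for flag in features.split(',') {
        if flag.chars().any(char::is_whitespace) {
            return None; // spaces between flags are not allowed
        }
        if let Some(name) = flag.strip_prefix('+') {
            enabled.push(name);
        } else if let Some(name) = flag.strip_prefix('-') {
            disabled.push(name);
        } else {
            return None; // every flag needs a '+' or '-' prefix
        }
    }
    Some((enabled, disabled))
}
```

For the string used in our target file, this yields `soft-float` as enabled and `mmx` and `sse` as disabled, while an input like `"-mmx, -sse"` is rejected because of the space.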
The `mmx` and `sse` features determine support for [Single Instruction Multiple Data (SIMD)] instructions, which can make programs run much faster. However, using the large SIMD registers in an operating system kernel can cause performance problems. The reason is that the kernel must restore all registers to the state they had right before the interrupt before resuming an interrupted program. If the kernel used SIMD registers, it would have to save the contents of all SIMD registers to main memory on every system call and hardware interrupt. The SIMD state is very large (512 to 1600 bytes) and interrupts can happen frequently, so backing up and restoring the SIMD registers can seriously hurt the kernel's performance. To avoid this, we configure the kernel not to use SIMD instructions (programs running on top of our kernel can of course still use them!).

[Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD

There is one obstacle to disabling SIMD on `x86_64`: floating point operations on `x86_64` use SIMD registers by default. To solve this, we enable the `soft-float` feature, which emulates floating point operations in software using only ordinary integer arithmetic.

For more details, see our post on [disabling SIMD](@/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.ko.md).

```json
"rustc-abi": "x86-softfloat"
```

To use the `soft-float` feature, we also need to tell the Rust compiler `rustc` that we want to use the corresponding ABI. We do that by setting the `rustc-abi` field to `x86-softfloat`.

#### Summary

Our target specification file now looks like this:

```json
{
    "llvm-target": "x86_64-unknown-none",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "executables": true,
    "linker-flavor": "ld.lld",
    "linker": "rust-lld",
    "panic-strategy": "abort",
    "disable-redzone": true,
    "features": "-mmx,-sse,+soft-float",
    "rustc-abi": "x86-softfloat"
}
```

### Building our Kernel

When compiling for our newly defined target, we follow Linux conventions (because LLVM follows them by default).
In other words, as described in the [previous post], we name the entry point `_start`:

[previous post]: @/edition-2/posts/01-freestanding-rust-binary/index.md

```rust
// src/main.rs

#![no_std] // don't link the Rust standard library
#![no_main] // don't use the Rust-level entry point (the main function)

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    // the linker looks for a function named `_start` by default,
    // so this function becomes the entry point
    loop {}
}
```

Remember that the entry point function has to be named `_start` regardless of your host operating system.

We can now build the kernel for our new target by passing the name of the JSON file through the `--target` argument:

```
> cargo build --target x86_64-blog_os.json

error: `.json` target specs require -Zjson-target-spec
```

It fails! The error tells us that custom JSON target specs are an unstable feature that requires explicit opt-in. The format of JSON target files is not considered stable yet, so it may change in future Rust versions. See the [tracking issue for custom JSON target specs][json-target-spec-issue] for details.

[json-target-spec-issue]: https://github.com/rust-lang/rust/issues/151528

#### The `json-target-spec` Feature

To enable support for custom JSON target specs, we need to create a [cargo configuration] file at `.cargo/config.toml` (the `.cargo` folder should sit next to the `src` folder):

[cargo configuration]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
```

This enables the unstable `json-target-spec` feature, which allows us to use custom JSON target files.

With this configuration in place, let's build again:

```
> cargo build --target x86_64-blog_os.json

error[E0463]: can't find crate for `core`
```

Now a different error occurs! It tells us that the Rust compiler no longer finds the [`core` library]. This library contains basic Rust types such as `Result`, `Option`, and iterators, and it is implicitly linked to all `no_std` crates.

[`core` library]: https://doc.rust-lang.org/nightly/core/index.html

The problem is that the core library is distributed together with the Rust compiler as a _precompiled_ library.
`x86_64-unknown-linux-gnu` 등 배포된 라이브러리가 지원하는 컴파일 목표 환경을 위해 빌드하는 경우 문제가 없지만, 우리가 정의한 커스텀 환경을 위해 빌드하는 경우에는 라이브러리를 이용할 수 없습니다. 기본적으로 지원되지 않는 새로운 시스템 환경을 위해 코드를 빌드하기 위해서는 새로운 시스템 환경에서 구동 가능하도록 `core` 라이브러리를 새롭게 빌드해야 합니다. #### `build-std` 기능 이제 cargo의 [`build-std 기능`][`build-std` feature]이 필요한 시점이 왔습니다. Rust 언어 설치파일에 함께 배포된 `core` 및 다른 표준 라이브러리 크레이트 버전을 사용하는 대신, 이 기능을 이용하여 해당 크레이트들을 직접 재컴파일하여 사용할 수 있습니다. 이 기능은 아직 비교적 새로운 기능이며 아직 완성된 기능이 아니기에, "unstable" 한 기능으로 표기되며 [nightly 버전의 Rust 컴파일러][nightly Rust compilers]에서만 이용가능합니다. [`build-std` feature]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std [nightly Rust compilers]: #installing-rust-nightly 해당 기능을 사용하려면, [cargo 설정][cargo configuration] 파일 `.cargo/config.toml`에 아래와 같이 추가해야 합니다: ```toml # .cargo/config.toml 에 들어갈 내용 [unstable] json-target-spec = true build-std = ["core", "compiler_builtins"] ``` 위 설정은 cargo에게 `core`와 `compiler_builtins` 라이브러리를 새로 컴파일하도록 지시합니다. `compiler_builtins`는 `core`가 사용하는 라이브러리입니다. 해당 라이브러리들의 소스 코드가 있어야 새로 컴파일할 수 있기에, `rustup component add rust-src` 명령어를 통해 소스 코드를 설치합니다.

**주의:** `unstable.build-std` 설정 키를 이용하려면 2020-07-15 혹은 그 이후에 출시된 Rust nightly 버전을 사용하셔야 합니다.
    cargo 설정 키 `unstable.build-std`를 설정하고 `rust-src` 컴포넌트를 설치한 후에 다시 빌드 명령어를 실행합니다: ``` > cargo build --target x86_64-blog_os.json Compiling core v0.0.0 (/…/rust/src/libcore) Compiling rustc-std-workspace-core v1.99.0 (/…/rust/src/tools/rustc-std-workspace-core) Compiling compiler_builtins v0.1.32 Compiling blog_os v0.1.0 (/…/blog_os) Finished dev [unoptimized + debuginfo] target(s) in 0.29 secs ``` 이제 `cargo build` 명령어가 `core`, `rustc-std-workspace-core` (`compiler_builtins`가 필요로 하는 라이브러리) 그리고 `compiler_builtins` 라이브러리를 우리의 커스텀 컴파일 대상을 위해 다시 컴파일하는 것을 확인할 수 있습니다. #### 메모리 관련 내장 함수 Rust 컴파일러는 특정 군의 내장 함수들이 (built-in function) 모든 시스템에서 주어진다고 가정합니다. 대부분의 내장 함수들은 우리가 방금 컴파일한 `compiler_builtins` 크레이트가 이미 갖추고 있습니다. 하지만 그중 몇몇 메모리 관련 함수들은 기본적으로 사용 해제 상태가 되어 있는데, 그 이유는 해당 함수들을 호스트 시스템의 C 라이브러리가 제공하는 것이 관례이기 때문입니다. `memset`(메모리 블럭 전체에 특정 값 저장하기), `memcpy` (한 메모리 블럭의 데이터를 다른 메모리 블럭에 옮겨쓰기), `memcmp` (메모리 블럭 두 개의 데이터를 비교하기) 등이 이 분류에 해당합니다. 여태까지는 우리가 이 함수들 중 어느 하나도 사용하지 않았지만, 운영체제 구현을 더 추가하다 보면 필수적으로 사용될 함수들입니다 (예를 들어, 구조체를 복사하여 다른 곳에 저장할 때). 우리는 운영체제의 C 라이브러리를 링크할 수 없기에, 다른 방식으로 이러한 내장 함수들을 컴파일러에 전달해야 합니다. 한 방법은 우리가 직접 `memset` 등의 내장함수들을 구현하고 컴파일 과정에서 함수명이 바뀌지 않도록 `#[unsafe(no_mangle)]` 속성을 적용하는 것입니다. 하지만 이 방법의 경우 우리가 직접 구현한 함수 로직에 아주 작은 실수만 있어도 undefined behavior를 일으킬 수 있기에 위험합니다. 예를 들어 `memcpy`를 구현하는 데에 `for`문을 사용한다면 무한 재귀 루프가 발생할 수 있는데, 그 이유는 `for`문의 구현이 내부적으로 trait 함수인 [`IntoIterator::into_iter`]를 호출하고 이 함수가 다시 `memcpy` 를 호출할 수 있기 때문입니다. 그렇기에 충분히 검증된 기존의 구현 중 하나를 사용하는 것이 바람직합니다. [`IntoIterator::into_iter`]: https://doc.rust-lang.org/stable/core/iter/trait.IntoIterator.html#tymethod.into_iter 다행히도 `compiler_builtins` 크레이트가 이미 필요한 내장함수 구현을 전부 갖추고 있으며, C 라이브러리에서 오는 내장함수 구현과 충돌하지 않도록 사용 해제되어 있었던 것 뿐입니다. cargo의 [`build-std-features`] 플래그를 `["compiler-builtins-mem"]`으로 설정함으로써 `compiler_builtins`에 포함된 내장함수 구현을 사용할 수 있습니다. `build-std` 플래그와 유사하게 이 플래그 역시 커맨드 라인에서 `-Z` 플래그를 이용해 인자로 전달하거나 `.cargo/config.toml`의 `[unstable]` 테이블에서 설정할 수 있습니다. 
우리는 매번 이 플래그를 사용하여 빌드할 예정이기에 `.cargo/config.toml`을 통해 설정을 하는 것이 장기적으로 더 편리할 것입니다: [`build-std-features`]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std-features ```toml # .cargo/config.toml 에 들어갈 내용 [unstable] json-target-spec = true build-std-features = ["compiler-builtins-mem"] build-std = ["core", "compiler_builtins"] ``` (`compiler-builtins-mem` 기능에 대한 지원이 [굉장히 최근에 추가되었기에](https://github.com/rust-lang/rust/pull/77284), Rust nightly `2020-09-30` 이상의 버전을 사용하셔야 합니다.) 이 기능은 `compiler_builtins` 크레이트의 [`mem` 기능 (feature)][`mem` feature]를 활성화 시킵니다. 이는 `#[unsafe(no_mangle)]` 속성이 [`memcpy` 등의 함수 구현][`memcpy` etc. implementations]에 적용되게 하여 링크가 해당 함수들을 식별하고 사용할 수 있게 합니다. [`mem` feature]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/Cargo.toml#L54-L55 [`memcpy` etc. implementations]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/src/mem.rs#L12-L69 이제 우리의 커널은 컴파일러가 요구하는 함수들에 대한 유효한 구현을 모두 갖추게 되었기에, 커널 코드가 더 복잡해지더라도 상관 없이 컴파일하는 데에 문제가 없을 것입니다. #### 기본 컴파일 대상 환경 설정하기 기본 컴파일 대상 환경을 지정하여 설정해놓으면 `cargo build` 명령어를 실행할 때마다 `--target` 인자를 넘기지 않아도 됩니다. [cargo 설정][cargo configuration] 파일인 `.cargo/config.toml`에 아래의 내용을 추가해주세요: [cargo configuration]: https://doc.rust-lang.org/cargo/reference/config.html ```toml # .cargo/config.toml 에 들어갈 내용 [build] target = "x86_64-blog_os.json" ``` 이로써 `cargo`는 명시적으로 `--target` 인자가 주어지지 않으면 `x86_64-blog_os.json`에 명시된 컴파일 대상 환경을 기본 값으로 이용합니다. `cargo build` 만으로 간단히 커널을 빌드할 수 있게 되었습니다. cargo 설정 옵션들에 대해 더 자세한 정보를 원하시면 [공식 문서][cargo configuration]을 확인해주세요. `cargo build`만으로 이제 bare metal 환경을 목표로 커널을 빌드할 수 있지만, 아직 실행 시작 지점 함수 `_start`는 텅 비어 있습니다. 이제 이 함수에 코드를 추가하여 화면에 메세지를 출력해볼 것입니다. ### 화면에 출력하기 현재 단계에서 가장 쉽게 화면에 문자를 출력할 수 있는 방법은 바로 [VGA 텍스트 버퍼][VGA text buffer]를 이용하는 것입니다. 이것은 VGA 하드웨어에 매핑되는 특수한 메모리 영역이며 화면에 출력될 내용이 저장됩니다. 주로 이 버퍼는 주로 25행 80열 (행마다 80개의 문자 저장)로 구성됩니다. 각 문자는 ASCII 문자로서 전경색 혹은 배경색과 함께 화면에 출력됩니다. 
화면 출력 결과의 모습은 아래와 같습니다: [VGA text buffer]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode ![ASCII 문자들을 출력한 화면의 모습](https://upload.wikimedia.org/wikipedia/commons/f/f8/Codepage-437.png) VGA 버퍼가 정확히 어떤 구조를 하고 있는지는 다음 포스트에서 VGA 버퍼 드라이버를 작성하면서 다룰 것입니다. "Hello World!" 메시지를 출력하는 데에는 그저 버퍼의 시작 주소가 `0xb8000`이라는 것, 그리고 각 문자는 ASCII 문자를 위한 1바이트와 색상 표기를 위한 1바이트가 필요하다는 것만 알면 충분합니다. 코드 구현은 아래와 같습니다: ```rust static HELLO: &[u8] = b"Hello World!"; #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { let vga_buffer = 0xb8000 as *mut u8; for (i, &byte) in HELLO.iter().enumerate() { unsafe { *vga_buffer.offset(i as isize * 2) = byte; *vga_buffer.offset(i as isize * 2 + 1) = 0xb; } } loop {} } ``` 우선 정수 `0xb8000`을 [raw 포인터][raw pointer]로 형변환 합니다. 그 다음 [static (정적 변수)][static] [바이트 문자열][byte string] `HELLO`의 반복자를 통해 각 바이트를 읽고, [`enumerate`] 함수를 통해 각 바이트의 문자열 내에서의 인덱스 값 `i`를 얻습니다. for문의 내부에서는 [`offset`] 함수를 통해 VGA 버퍼에 문자열의 각 바이트 및 색상 코드를 저장합니다 (`0xb`: light cyan 색상 코드). [iterate]: https://doc.rust-lang.org/stable/book/ch13-02-iterators.html [static]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime [`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate [byte string]: https://doc.rust-lang.org/reference/tokens.html#byte-string-literals [raw pointer]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer [`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset 메모리 쓰기 작업을 위한 코드 주변에 [`unsafe`] 블록이 있는 것에 주목해주세요. 여기서 `unsafe` 블록이 필요한 이유는 Rust 컴파일러가 우리가 만든 raw 포인터가 유효한 포인터인지 검증할 능력이 없기 때문입니다. `unsafe` 블록 안에 포인터에 대한 쓰기 작업 코드를 적음으로써, 우리는 컴파일러에게 해당 메모리 쓰기 작업이 확실히 안전하다고 선언한 것입니다. `unsafe` 블록이 Rust의 모든 안전성 체크를 해제하는 것은 아니며, `unsafe` 블록 안에서만 [다섯 가지 작업들을 추가적으로][five additional things] 할 수 있습니다. 
[`unsafe`]: https://doc.rust-lang.org/stable/book/ch19-01-unsafe-rust.html [five additional things]: https://doc.rust-lang.org/stable/book/ch20-01-unsafe-rust.html#unsafe-superpowers **이런 식의 Rust 코드를 작성하는 것은 절대 바람직하지 않다는 것을 강조드립니다!** unsafe 블록 안에서 raw pointer를 쓰다보면 메모리 버퍼 크기를 넘어선 메모리 주소에 데이터를 저장하는 등의 실수를 범하기 매우 쉽습니다. 그렇기에 `unsafe` 블록의 사용을 최소화하는 것이 바람직하며, 그렇게 하기 위해 Rust에서 우리는 안전한 추상 계층을 만들어 이용할 수 있습니다. 예를 들어, 모든 위험한 요소들을 전부 캡슐화한 VGA 버퍼 타입을 만들어 외부 사용자가 해당 타입을 사용 중에 메모리 안전성을 해칠 가능성을 _원천 차단_ 할 수 있습니다. 이런 설계를 통해 최소한의 `unsafe` 블록만을 사용하면서 동시에 우리가 [메모리 안전성][memory safety]을 해치는 일이 없을 것이라 자신할 수 있습니다. 이러한 안전한 추상 레벨을 더한 VGA 버퍼 타입은 다음 포스트에서 만들게 될 것입니다. [memory safety]: https://en.wikipedia.org/wiki/Memory_safety ## 커널 실행시키기 이제 우리가 얻은 실행 파일을 실행시켜볼 차례입니다. 우선 컴파일 완료된 커널을 부트로더와 링크하여 부팅 가능한 디스크 이미지를 만들어야 합니다. 그 다음에 해당 디스크 이미지를 QEMU 가상머신에서 실행시키거나 USB 드라이브를 이용해 실제 컴퓨터에서 부팅할 수 있습니다. ### 부팅 가능한 디스크 이미지 만들기 부팅 가능한 디스크 이미지를 만들기 위해서는 컴파일된 커널을 부트로더와 링크해야합니다. [부팅에 대한 섹션][section about booting]에서 알아봤듯이, 부트로더는 CPU를 초기화하고 커널을 불러오는 역할을 합니다. [section about booting]: #the-boot-process 우리는 부트로더를 직접 작성하는 대신에 [`bootloader`] 크레이트를 사용할 것입니다. 이 크레이트는 Rust와 인라인 어셈블리만으로 간단한 BIOS 부트로더를 구현합니다. 운영체제 커널을 부팅하는 데에 이 크레이트를 쓰기 위해 의존 크레이트 목록에 추가해줍니다: [`bootloader`]: https://crates.io/crates/bootloader ```toml # Cargo.toml 에 들어갈 내용 [dependencies] bootloader = "0.9" ``` 부트로더를 의존 크레이트로 추가하는 것만으로는 부팅 가능한 디스크 이미지를 만들 수 없습니다. 커널 컴파일이 끝난 후 커널을 부트로더와 함께 링크할 수 있어야 하는데, cargo는 현재 [빌드 직후 스크립트 실행][post-build scripts] 기능을 지원하지 않습니다. [post-build scripts]: https://github.com/rust-lang/cargo/issues/545 이 문제를 해결하기 위해 저희가 `bootimage` 라는 도구를 만들었습니다. 이 도구는 커널과 부트로더를 각각 컴파일 한 이후에 둘을 링크하여 부팅 가능한 디스크 이미지를 생성해줍니다. 이 도구를 설치하려면 터미널에서 아래의 명령어를 실행해주세요. ``` cargo install bootimage ``` `bootimage` 도구를 실행시키고 부트로더를 빌드하려면 `llvm-tools-preview` 라는 rustup 컴포넌트가 필요합니다. 명령어 `rustup component add llvm-tools-preview`를 통해 해당 컴포넌트를 설치합니다. 
`bootimage` 도구를 설치하고 `llvm-tools-preview` 컴포넌트를 추가하셨다면, 이제 아래의 명령어를 통해 부팅 가능한 디스크 이미지를 만들 수 있습니다: ``` > cargo bootimage ``` 이 도구가 `cargo build`를 통해 커널을 다시 컴파일한다는 것을 확인하셨을 것입니다. 덕분에 커널 코드가 변경되어도 `cargo bootimage` 명령어 만으로도 해당 변경 사항이 바로 빌드에 반영됩니다. 그 다음 단계로 이 도구가 부트로더를 컴파일 할 것인데, 시간이 제법 걸릴 수 있습니다. 일반적인 의존 크레이트들과 마찬가지로 한 번 빌드한 후에 빌드 결과가 캐시(cache)되기 때문에, 두 번째 빌드부터는 소요 시간이 훨씬 적습니다. 마지막 단계로 `bootimage` 도구가 부트로더와 커널을 하나로 합쳐 부팅 가능한 디스크 이미지를 생성합니다. 명령어 실행이 끝난 후, `target/x86_64-blog_os/debug` 디렉토리에 `bootimage-blog_os.bin`이라는 부팅 가능한 디스크 이미지가 생성되어 있을 것입니다. 이것을 가상머신에서 부팅하거나 USB 드라이브에 복사한 뒤 실제 컴퓨터에서 부팅할 수 있습니다 (우리가 만든 디스크 이미지는 CD 이미지와는 파일 형식이 다르기 때문에 CD에 복사해서 부팅하실 수는 없습니다). #### 어떻게 동작하는 걸까요? `bootimage` 도구는 아래의 작업들을 순서대로 진행합니다: - 커널을 컴파일하여 [ELF] 파일 생성 - 부트로더 크레이트를 독립된 실행파일로서 컴파일 - 커널의 ELF 파일을 부트로더에 링크 [ELF]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format [rust-osdev/bootloader]: https://github.com/rust-osdev/bootloader 부팅이 시작되면, 부트로더는 커널의 ELF 파일을 읽고 파싱합니다. 그 다음 프로그램의 세그먼트들을 페이지 테이블의 가상 주소에 매핑하고, `bss` 섹션의 모든 메모리 값을 0으로 초기화하며, 스택을 초기화합니다. 마지막으로, 프로그램 실행 시작 지점의 주소 (`_start` 함수의 주소)에서 제어 흐름이 계속되도록 점프합니다. ### QEMU에서 커널 부팅하기 이제 우리의 커널 디스크 이미지를 가상 머신에서 부팅할 수 있습니다. [QEMU]에서 부팅하려면 아래의 명령어를 실행하세요: [QEMU]: https://www.qemu.org/ ``` > qemu-system-x86_64 -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5] ``` 위 명령어를 실행하면 아래와 같은 새로운 창이 열릴 것입니다: ![QEMU showing "Hello World!"](qemu.png) 화면에 "Hello World!" 메세지가 출력된 것을 확인하실 수 있습니다. ### 실제 컴퓨터에서 부팅하기 USB 드라이브에 우리의 커널을 저장한 후 실제 컴퓨터에서 부팅하는 것도 가능합니다: ``` > dd if=target/x86_64-blog_os/debug/bootimage-blog_os.bin of=/dev/sdX && sync ``` `sdX` 대신 여러분이 소지한 USB 드라이브의 기기명을 입력하시면 됩니다. 해당 기기에 쓰인 데이터는 전부 덮어씌워지기 때문에 정확한 기기명을 입력하도록 주의해주세요. 이미지를 USB 드라이브에 다 덮어썼다면, 이제 실제 하드웨어에서 해당 이미지를 통해 부트하여 실행할 수 있습니다. 아마 특별한 부팅 메뉴를 사용하거나 BIOS 설정에서 부팅 순서를 변경하여 USB로부터 부팅하도록 설정해야 할 것입니다. `bootloader` 크레이트가 아직 UEFI를 지원하지 않기에, UEFI 표준을 사용하는 기기에서는 부팅할 수 없습니다. 
### `cargo run` 명령어 사용하기 QEMU에서 커널을 쉽게 실행할 수 있게 아래처럼 `runner`라는 새로운 cargo 설정 키 값을 추가합니다. ```toml # .cargo/config.toml 에 들어갈 내용 [target.'cfg(target_os = "none")'] runner = "bootimage runner" ``` `target.'cfg(target_os = "none")'`가 붙은 키 값은 `"os"` 필드 설정이 `"none"`으로 되어 있는 컴파일 대상 환경에만 적용됩니다. 따라서 우리의 `x86_64-blog_os.json` 또한 적용 대상에 포함됩니다. `runner` 키 값은 `cargo run` 명령어 실행 시 어떤 명령어를 실행할지 지정합니다. 빌드가 성공적으로 끝난 후에 `runner` 키 값의 명령어가 실행됩니다. [cargo 공식 문서][cargo configuration]를 통해 더 자세한 내용을 확인하실 수 있습니다. 명령어 `bootimage runner`는 프로젝트의 부트로더 라이브러리를 링크한 후에 QEMU를 실행시킵니다. 그렇기에 일반적인 `runner` 실행파일을 실행하듯이 `bootimage runner` 명령어를 사용하실 수 있습니다. [`bootimage` 도구의 Readme 문서][Readme of `bootimage`]를 통해 더 자세한 내용 및 다른 가능한 설정 옵션들을 확인하세요. [Readme of `bootimage`]: https://github.com/rust-osdev/bootimage 이제 `cargo run` 명령어를 통해 우리의 커널을 컴파일하고 QEMU에서 부팅할 수 있습니다. ## 다음 단계는 무엇일까요? 다음 글에서는 VGA 텍스트 버퍼 (text buffer)에 대해 더 알아보고 VGA text buffer와 안전하게 상호작용할 수 있는 방법을 구현할 것입니다. 또한 `println` 매크로를 사용할 수 있도록 기능을 추가할 것입니다. ================================================ FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/index.md ================================================ +++ title = "A Minimal Rust Kernel" weight = 2 path = "minimal-rust-kernel" date = 2018-02-10 [extra] chapter = "Bare Bones" +++ In this post, we create a minimal 64-bit Rust kernel for the x86 architecture. We build upon the [freestanding Rust binary] from the previous post to create a bootable disk image that prints something to the screen. [freestanding Rust binary]: @/edition-2/posts/01-freestanding-rust-binary/index.md This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-02`][post branch] branch. 

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-02

## The Boot Process

When you turn on a computer, it begins executing firmware code that is stored in motherboard [ROM]. This code performs a [power-on self-test], detects available RAM, and pre-initializes the CPU and hardware. Afterwards, it looks for a bootable disk and starts booting the operating system kernel.

[ROM]: https://en.wikipedia.org/wiki/Read-only_memory
[power-on self-test]: https://en.wikipedia.org/wiki/Power-on_self-test

On x86, there are two firmware standards: the “Basic Input/Output System” (**[BIOS]**) and the newer “Unified Extensible Firmware Interface” (**[UEFI]**). The BIOS standard is old and outdated, but simple and well-supported on any x86 machine since the 1980s. UEFI, in contrast, is more modern and has many more features, but is more complex to set up (at least in my opinion).

[BIOS]: https://en.wikipedia.org/wiki/BIOS
[UEFI]: https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface

Currently, we only provide BIOS support, but support for UEFI is planned, too. If you'd like to help us with this, check out the [GitHub issue](https://github.com/phil-opp/blog_os/issues/349).

### BIOS Boot

Almost all x86 systems have support for BIOS booting, including newer UEFI-based machines that use an emulated BIOS. This is great, because you can use the same boot logic across all machines from the last century. But this wide compatibility is at the same time the biggest disadvantage of BIOS booting, because it means that the CPU is put into a 16-bit compatibility mode called [real mode] before booting so that archaic bootloaders from the 1980s would still work.

But let's start from the beginning:

When you turn on a computer, it loads the BIOS from some special flash memory located on the motherboard. The BIOS runs self-test and initialization routines of the hardware, then it looks for bootable disks.
If it finds one, control is transferred to its _bootloader_, which is a 512-byte portion of executable code stored at the disk's beginning. Most bootloaders are larger than 512 bytes, so bootloaders are commonly split into a small first stage, which fits into 512 bytes, and a second stage, which is subsequently loaded by the first stage.

The bootloader has to determine the location of the kernel image on the disk and load it into memory. It also needs to switch the CPU from the 16-bit [real mode] first to the 32-bit [protected mode], and then to the 64-bit [long mode], where 64-bit registers and the complete main memory are available. Its third job is to query certain information (such as a memory map) from the BIOS and pass it to the OS kernel.

[real mode]: https://en.wikipedia.org/wiki/Real_mode
[protected mode]: https://en.wikipedia.org/wiki/Protected_mode
[long mode]: https://en.wikipedia.org/wiki/Long_mode
[memory segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation

Writing a bootloader is a bit cumbersome, as it requires assembly language and a lot of non-insightful steps like “write this magic value to this processor register”. Therefore, we don't cover bootloader creation in this post and instead provide a tool named [bootimage] that automatically prepends a bootloader to your kernel.

[bootimage]: https://github.com/rust-osdev/bootimage

If you are interested in building your own bootloader, stay tuned: a set of posts on this topic is already planned!

#### The Multiboot Standard

To spare every operating system from implementing its own bootloader, which would only be compatible with that single OS, the [Free Software Foundation] created an open bootloader standard called [Multiboot] in 1995. The standard defines an interface between the bootloader and the operating system, so that any Multiboot-compliant bootloader can load any Multiboot-compliant operating system.
The reference implementation is [GNU GRUB], which is the most popular bootloader for Linux systems.

[Free Software Foundation]: https://en.wikipedia.org/wiki/Free_Software_Foundation
[Multiboot]: https://wiki.osdev.org/Multiboot
[GNU GRUB]: https://en.wikipedia.org/wiki/GNU_GRUB

To make a kernel Multiboot-compliant, one just needs to insert a so-called [Multiboot header] at the beginning of the kernel file. This makes it very easy to boot an OS from GRUB. However, GRUB and the Multiboot standard have some problems too:

[Multiboot header]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#OS-image-format

- They support only the 32-bit protected mode. This means that you still have to do the CPU configuration to switch to the 64-bit long mode.
- They are designed to make the bootloader simple instead of the kernel. For example, the kernel needs to be linked with an [adjusted default page size], because GRUB can't find the Multiboot header otherwise. Another example is that the [boot information], which is passed to the kernel, contains lots of architecture-dependent structures instead of providing clean abstractions.
- Both GRUB and the Multiboot standard are only sparsely documented.
- GRUB needs to be installed on the host system to create a bootable disk image from the kernel file. This makes development on Windows or Mac more difficult.

[adjusted default page size]: https://wiki.osdev.org/Multiboot#Multiboot_2
[boot information]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Boot-information-format

Because of these drawbacks, we decided not to use GRUB or the Multiboot standard. However, we plan to add Multiboot support to our [bootimage] tool, so that it's possible to load your kernel on a GRUB system too. If you're interested in writing a Multiboot-compliant kernel, check out the [first edition] of this blog series.
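
To make the idea of a Multiboot header a bit more concrete, here is a minimal sketch of its three mandatory fields and of how the checksum field could be computed. The magic number and the rule that the three fields must add up to zero as 32-bit unsigned integers are taken from the Multiboot (version 1) specification, not from this post. The names `Multiboot1Header` and `multiboot1_checksum` are made up for illustration, and we don't use this code anywhere in the series.

```rust
/// Magic number that identifies a Multiboot (version 1) header.
const MULTIBOOT1_HEADER_MAGIC: u32 = 0x1BAD_B002;

/// Computes the checksum field for the given flags (hypothetical helper).
///
/// The specification requires `magic + flags + checksum` to be zero when the
/// fields are added as 32-bit unsigned integers, so the checksum is the
/// wrapping negation of `magic + flags`.
pub const fn multiboot1_checksum(flags: u32) -> u32 {
    0u32.wrapping_sub(MULTIBOOT1_HEADER_MAGIC.wrapping_add(flags))
}

/// The three mandatory header fields in the order the bootloader expects them.
#[repr(C)]
pub struct Multiboot1Header {
    pub magic: u32,
    pub flags: u32,
    pub checksum: u32,
}

impl Multiboot1Header {
    /// Builds a header whose checksum matches the given flags.
    pub const fn new(flags: u32) -> Self {
        Self {
            magic: MULTIBOOT1_HEADER_MAGIC,
            flags,
            checksum: multiboot1_checksum(flags),
        }
    }
}
```

A real kernel would additionally have to place such a header near the start of the kernel file through a linker script, which is one of the extra steps we avoid by using the `bootimage` tool instead.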

[first edition]: @/edition-1/_index.md

### UEFI

(We don't provide UEFI support at the moment, but we would love to! If you'd like to help, please tell us in the [GitHub issue](https://github.com/phil-opp/blog_os/issues/349).)

## A Minimal Kernel

Now that we roughly know how a computer boots, it's time to create our own minimal kernel. Our goal is to create a disk image that prints “Hello World!” to the screen when booted. We do this by extending the previous post's [freestanding Rust binary].

As you may remember, we built the freestanding binary through `cargo`, but depending on the operating system, we needed different entry point names and compile flags. That's because `cargo` builds for the _host system_ by default, i.e., the system you're running on. This isn't something we want for our kernel, because a kernel that runs on top of, e.g., Windows, does not make much sense. Instead, we want to compile for a clearly defined _target system_.

### Installing Rust Nightly

Rust has three release channels: _stable_, _beta_, and _nightly_. The Rust Book explains the difference between these channels really well, so take a minute and [check it out](https://doc.rust-lang.org/book/appendix-07-nightly-rust.html#choo-choo-release-channels-and-riding-the-trains). For building an operating system, we will need some experimental features that are only available on the nightly channel, so we need to install a nightly version of Rust.

To manage Rust installations, I highly recommend [rustup]. It allows you to install nightly, beta, and stable compilers side-by-side and makes it easy to update them. With rustup, you can use a nightly compiler for the current directory by running `rustup override set nightly`. Alternatively, you can add a file called `rust-toolchain` with the content `nightly` to the project's root directory. You can check that you have a nightly version installed by running `rustc --version`: the version number should contain `-nightly` at the end.
[rustup]: https://www.rustup.rs/ The nightly compiler allows us to opt-in to various experimental features by using so-called _feature flags_ at the top of our file. For example, we could enable the experimental [`asm!` macro] for inline assembly by adding `#![feature(asm)]` to the top of our `main.rs`. Note that such experimental features are completely unstable, which means that future Rust versions might change or remove them without prior warning. For this reason, we will only use them if absolutely necessary. [`asm!` macro]: https://doc.rust-lang.org/stable/reference/inline-assembly.html ### Target Specification Cargo supports different target systems through the `--target` parameter. The target is described by a so-called _[target triple]_, which describes the CPU architecture, the vendor, the operating system, and the [ABI]. For example, the `x86_64-unknown-linux-gnu` target triple describes a system with an `x86_64` CPU, no clear vendor, and a Linux operating system with the GNU ABI. Rust supports [many different target triples][platform-support], including `arm-linux-androideabi` for Android or [`wasm32-unknown-unknown` for WebAssembly](https://www.hellorust.com/setup/wasm-target/). [target triple]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple [ABI]: https://stackoverflow.com/a/2456882 [platform-support]: https://forge.rust-lang.org/release/platform-support.html [custom-targets]: https://doc.rust-lang.org/nightly/rustc/targets/custom.html For our target system, however, we require some special configuration parameters (e.g. no underlying OS), so none of the [existing target triples][platform-support] fits. Fortunately, Rust allows us to define [our own target][custom-targets] through a JSON file. 
For example, a JSON file that describes the `x86_64-unknown-linux-gnu` target looks like this: ```json { "llvm-target": "x86_64-unknown-linux-gnu", "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", "arch": "x86_64", "target-endian": "little", "target-pointer-width": 64, "target-c-int-width": 32, "os": "linux", "executables": true, "linker-flavor": "gcc", "pre-link-args": ["-m64"], "morestack": false } ``` Most fields are required by LLVM to generate code for that platform. For example, the [`data-layout`] field defines the size of various integer, floating point, and pointer types. Then there are fields that Rust uses for conditional compilation, such as `target-pointer-width`. The third kind of field defines how the crate should be built. For example, the `pre-link-args` field specifies arguments passed to the [linker]. [`data-layout`]: https://llvm.org/docs/LangRef.html#data-layout [linker]: https://en.wikipedia.org/wiki/Linker_(computing) We also target `x86_64` systems with our kernel, so our target specification will look very similar to the one above. Let's start by creating an `x86_64-blog_os.json` file (choose any name you like) with the common content: ```json { "llvm-target": "x86_64-unknown-none", "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", "arch": "x86_64", "target-endian": "little", "target-pointer-width": 64, "target-c-int-width": 32, "os": "none", "executables": true } ``` Note that we changed the OS in the `llvm-target` and the `os` field to `none`, because we will run on bare metal. We add the following build-related entries: ```json "linker-flavor": "ld.lld", "linker": "rust-lld", ``` Instead of using the platform's default linker (which might not support Linux targets), we use the cross-platform [LLD] linker that is shipped with Rust for linking our kernel. 
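
Fields such as `target-pointer-width` and `os` are not only consumed by LLVM and the linker. They also show up in Rust code as `cfg` predicates. The following is a small sketch of how code could branch on them; the function names are hypothetical and not part of our kernel.

```rust
/// Pointer width in bits, selected at compile time through the
/// `target_pointer_width` predicate (set from `target-pointer-width`).
#[cfg(target_pointer_width = "64")]
pub const fn pointer_width_bits() -> u32 {
    64
}

/// Variant that is compiled only for 32-bit targets.
#[cfg(target_pointer_width = "32")]
pub const fn pointer_width_bits() -> u32 {
    32
}

/// Variant that is compiled only for 16-bit targets.
#[cfg(target_pointer_width = "16")]
pub const fn pointer_width_bits() -> u32 {
    16
}

/// Returns `true` when compiling for a target whose `os` field is `none`,
/// as is the case for our `x86_64-blog_os.json` specification.
pub const fn is_bare_metal() -> bool {
    cfg!(target_os = "none")
}
```

Only one of the three `pointer_width_bits` variants is compiled for any given target, so for our target the function simply returns `64`.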
[LLD]: https://lld.llvm.org/ ```json "panic-strategy": "abort", ``` This setting specifies that the target doesn't support [stack unwinding] on panic, so instead the program should abort directly. This has the same effect as the `panic = "abort"` option in our Cargo.toml, so we can remove it from there. (Note that, in contrast to the Cargo.toml option, this target option also applies when we recompile the `core` library later in this post. So, even if you prefer to keep the Cargo.toml option, make sure to include this option.) [stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php ```json "disable-redzone": true, ``` We're writing a kernel, so we'll need to handle interrupts at some point. To do that safely, we have to disable a certain stack pointer optimization called the _“red zone”_, because it would cause stack corruption otherwise. For more information, see our separate post about [disabling the red zone]. [disabling the red zone]: @/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.md ```json "features": "-mmx,-sse,+soft-float", ``` The `features` field enables/disables target features. We disable the `mmx` and `sse` features by prefixing them with a minus and enable the `soft-float` feature by prefixing it with a plus. Note that there must be no spaces between different flags, otherwise LLVM fails to interpret the features string. The `mmx` and `sse` features determine support for [Single Instruction Multiple Data (SIMD)] instructions, which can often speed up programs significantly. However, using the large SIMD registers in OS kernels leads to performance problems. The reason is that the kernel needs to restore all registers to their original state before continuing an interrupted program. This means that the kernel has to save the complete SIMD state to main memory on each system call or hardware interrupt. 
Since the SIMD state is very large (512–1600 bytes) and interrupts can occur very often, these additional save/restore operations considerably harm performance. To avoid this, we disable SIMD for our kernel (not for applications running on top!). [Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD A problem with disabling SIMD is that floating point operations on `x86_64` require SIMD registers by default. To solve this problem, we add the `soft-float` feature, which emulates all floating point operations through software functions based on normal integers. For more information, see our post on [disabling SIMD](@/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.md). ```json "rustc-abi": "x86-softfloat" ``` As we want to use the `soft-float` feature, we also need to tell the Rust compiler `rustc` that we want to use the corresponding ABI. We can do that by setting the `rustc-abi` field to `x86-softfloat`. #### Putting it Together Our target specification file now looks like this: ```json { "llvm-target": "x86_64-unknown-none", "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", "arch": "x86_64", "target-endian": "little", "target-pointer-width": 64, "target-c-int-width": 32, "os": "none", "executables": true, "linker-flavor": "ld.lld", "linker": "rust-lld", "panic-strategy": "abort", "disable-redzone": true, "features": "-mmx,-sse,+soft-float", "rustc-abi": "x86-softfloat" } ``` ### Building our Kernel Compiling for our new target will use Linux conventions, since the ld.lld linker-flavor instructs llvm to compile with the `-flavor gnu` flag (for more linker options, see [the rustc documentation](https://doc.rust-lang.org/rustc/codegen-options/index.html#linker-flavor)). 
This means that we need an entry point named `_start` as described in the [previous post]: [previous post]: @/edition-2/posts/01-freestanding-rust-binary/index.md ```rust // src/main.rs #![no_std] // don't link the Rust standard library #![no_main] // disable all Rust-level entry points use core::panic::PanicInfo; /// This function is called on panic. #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } #[unsafe(no_mangle)] // don't mangle the name of this function pub extern "C" fn _start() -> ! { // this function is the entry point, since the linker looks for a function // named `_start` by default loop {} } ``` Note that the entry point needs to be called `_start` regardless of your host OS. We can now build the kernel for our new target by passing the name of the JSON file as `--target`: ``` > cargo build --target x86_64-blog_os.json error: `.json` target specs require -Zjson-target-spec ``` It fails! The error tells us that custom JSON target specifications are an unstable feature that requires explicit opt-in. This is because the format of the JSON target files is not considered stable yet, so changes to it might occur in future versions of Rust. See the [tracking issue for custom JSON target specs][json-target-spec-issue] for more information. [json-target-spec-issue]: https://github.com/rust-lang/rust/issues/151528 #### The `json-target-spec` Option To enable support for custom JSON target specifications, we need to create a local [cargo configuration] file at `.cargo/config.toml` (the `.cargo` folder should be next to your `src` folder) with the following content: [cargo configuration]: https://doc.rust-lang.org/cargo/reference/config.html ```toml # in .cargo/config.toml [unstable] json-target-spec = true ``` This enables the unstable `json-target-spec` feature, allowing us to use custom JSON target files. 
With this configuration in place, let's try building again:

```
> cargo build --target x86_64-blog_os.json
error[E0463]: can't find crate for `core`
```

It still fails, but with a new error. The error tells us that the Rust compiler cannot find the [`core` library]. This library contains basic Rust types such as `Result`, `Option`, and iterators, and is implicitly linked to all `no_std` crates.

[`core` library]: https://doc.rust-lang.org/nightly/core/index.html

The problem is that the `core` library is distributed together with the Rust compiler as a _precompiled_ library. So it is only valid for supported host triples (e.g., `x86_64-unknown-linux-gnu`) but not for our custom target. If we want to compile code for other targets, we need to recompile `core` for these targets first.

#### The `build-std` Option

That's where the [`build-std` feature] of cargo comes in. It allows us to recompile `core` and other standard library crates on demand, instead of using the precompiled versions shipped with the Rust installation. This feature is very new and still not finished, so it is marked as "unstable" and only available on [nightly Rust compilers].

[`build-std` feature]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std
[nightly Rust compilers]: #installing-rust-nightly

To use the feature, we need to add the following to our [cargo configuration] file at `.cargo/config.toml`:

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std = ["core", "compiler_builtins"]
```

This tells cargo that it should recompile the `core` and `compiler_builtins` libraries. The latter is required because it is a dependency of `core`. In order to recompile these libraries, cargo needs access to the Rust source code, which we can install with `rustup component add rust-src`.

**Note:** The `unstable.build-std` configuration key requires at least the Rust nightly from 2020-07-15.
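
As a reminder of what the `core` crate gives us once it is available for our target, here is a small sketch that relies only on items from `core`, such as `Option`, `Result`, and iterators. The functions `average` and `parse_hex` are hypothetical examples and not part of our kernel.

```rust
use core::num::ParseIntError;

/// Averages a slice with iterator adapters from `core`.
/// Returns `None` for an empty slice instead of dividing by zero.
pub fn average(values: &[u32]) -> Option<u32> {
    if values.is_empty() {
        return None;
    }
    // Sum in 64 bits so that adding many `u32` values cannot overflow.
    let sum: u64 = values.iter().map(|&value| u64::from(value)).sum();
    Some((sum / values.len() as u64) as u32)
}

/// Parses a hexadecimal number with an optional `0x` prefix.
/// Both `strip_prefix` and `from_str_radix` are provided by `core`.
pub fn parse_hex(text: &str) -> Result<u64, ParseIntError> {
    let digits = text.strip_prefix("0x").unwrap_or(text);
    u64::from_str_radix(digits, 16)
}
```

Nothing in this sketch needs heap allocation or operating system services, which is why code like it can also be used in a `#![no_std]` crate.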
    After setting the `unstable.build-std` configuration key and installing the `rust-src` component, we can rerun our build command: ``` > cargo build --target x86_64-blog_os.json Compiling core v0.0.0 (/…/rust/src/libcore) Compiling rustc-std-workspace-core v1.99.0 (/…/rust/src/tools/rustc-std-workspace-core) Compiling compiler_builtins v0.1.32 Compiling blog_os v0.1.0 (/…/blog_os) Finished dev [unoptimized + debuginfo] target(s) in 0.29 secs ``` We see that `cargo build` now recompiles the `core`, `rustc-std-workspace-core` (a dependency of `compiler_builtins`), and `compiler_builtins` libraries for our custom target. #### Memory-Related Intrinsics The Rust compiler assumes that a certain set of built-in functions is available for all systems. Most of these functions are provided by the `compiler_builtins` crate that we just recompiled. However, there are some memory-related functions in that crate that are not enabled by default because they are normally provided by the C library on the system. These functions include `memset`, which sets all bytes in a memory block to a given value, `memcpy`, which copies one memory block to another, and `memcmp`, which compares two memory blocks. While we didn't need any of these functions to compile our kernel right now, they will be required as soon as we add some more code to it (e.g. when copying structs around). Since we can't link to the C library of the operating system, we need an alternative way to provide these functions to the compiler. One possible approach for this could be to implement our own `memset` etc. functions and apply the `#[unsafe(no_mangle)]` attribute to them (to avoid the automatic renaming during compilation). However, this is dangerous since the slightest mistake in the implementation of these functions could lead to undefined behavior. 
For example, implementing `memcpy` with a `for` loop may result in an infinite recursion because `for` loops implicitly call the [`IntoIterator::into_iter`] trait method, which may call `memcpy` again. So it's a good idea to reuse existing, well-tested implementations instead.

[`IntoIterator::into_iter`]: https://doc.rust-lang.org/stable/core/iter/trait.IntoIterator.html#tymethod.into_iter

Fortunately, the `compiler_builtins` crate already contains implementations for all the needed functions; they are just disabled by default so that they do not collide with the implementations from the C library. We can enable them by setting cargo's [`build-std-features`] flag to `["compiler-builtins-mem"]`. Like the `build-std` flag, this flag can be either passed on the command line as a `-Z` flag or configured in the `unstable` table in the `.cargo/config.toml` file. Since we always want to build with this flag, the config file option makes more sense for us:

[`build-std-features`]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std-features

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std-features = ["compiler-builtins-mem"]
build-std = ["core", "compiler_builtins"]
```

(Support for the `compiler-builtins-mem` feature was only [added very recently](https://github.com/rust-lang/rust/pull/77284), so you need at least Rust nightly `2020-09-30` for it.)

Behind the scenes, this flag enables the [`mem` feature] of the `compiler_builtins` crate. The effect of this is that the `#[unsafe(no_mangle)]` attribute is applied to the [`memcpy` etc. implementations] of the crate, which makes them available to the linker.

[`mem` feature]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/Cargo.toml#L54-L55
[`memcpy` etc.
implementations]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/src/mem.rs#L12-L69

With this change, our kernel has valid implementations for all compiler-required functions, so it will continue to compile even if our code gets more complex.

#### Set a Default Target

To avoid passing the `--target` parameter on every invocation of `cargo build`, we can override the default target. To do this, we add the following to our [cargo configuration] file at `.cargo/config.toml`:

[cargo configuration]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[build]
target = "x86_64-blog_os.json"
```

This tells `cargo` to use our `x86_64-blog_os.json` target when no explicit `--target` argument is passed. This means that we can now build our kernel with a simple `cargo build`. For more information on cargo configuration options, check out the [official documentation][cargo configuration].

We are now able to build our kernel for a bare metal target with a simple `cargo build`. However, our `_start` entry point, which will be called by the bootloader, is still empty. It's time to output something to the screen from it.

### Printing to Screen

The easiest way to print text to the screen at this stage is the [VGA text buffer]. It is a special memory area mapped to the VGA hardware that contains the contents displayed on screen. It normally consists of 25 lines that each contain 80 character cells. Each character cell displays an ASCII character with some foreground and background colors. The screen output looks like this:

[VGA text buffer]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode

![screen output for common ASCII characters](https://upload.wikimedia.org/wikipedia/commons/f/f8/Codepage-437.png)

We will discuss the exact layout of the VGA buffer in the next post, where we write a first small driver for it.
For printing “Hello World!”, we just need to know that the buffer is located at address `0xb8000` and that each character cell consists of an ASCII byte and a color byte.

The implementation looks like this:

```rust
static HELLO: &[u8] = b"Hello World!";

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    let vga_buffer = 0xb8000 as *mut u8;

    for (i, &byte) in HELLO.iter().enumerate() {
        unsafe {
            *vga_buffer.offset(i as isize * 2) = byte;
            *vga_buffer.offset(i as isize * 2 + 1) = 0xb;
        }
    }

    loop {}
}
```

First, we cast the integer `0xb8000` into a [raw pointer]. Then we [iterate] over the bytes of the [static] `HELLO` [byte string]. We use the [`enumerate`] method to additionally get a running variable `i`. In the body of the for loop, we use the [`offset`] method to write the string byte and the corresponding color byte (`0xb` is a light cyan).

[iterate]: https://doc.rust-lang.org/stable/book/ch13-02-iterators.html
[static]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime
[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate
[byte string]: https://doc.rust-lang.org/reference/tokens.html#byte-string-literals
[raw pointer]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset

Note that there's an [`unsafe`] block around all memory writes. The reason is that the Rust compiler can't prove that the raw pointers we create are valid. They could point anywhere and lead to data corruption. By putting them into an `unsafe` block, we're basically telling the compiler that we are absolutely sure that the operations are valid. Note that an `unsafe` block does not turn off Rust's safety checks. It only allows you to do [five additional things].
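To make the two-byte cell layout concrete, here is a small host-side sketch. It is not kernel code: it writes into an ordinary byte slice that merely stands in for the memory at `0xb8000`, so it runs as a normal program and needs no `unsafe`. The names `write_cells` and `BYTES_PER_CELL` are made up for this illustration, and the 80×25 size is the usual text-mode default mentioned above.

```rust
/// Width of one VGA text cell in bytes: one ASCII byte plus one color byte.
/// (Hypothetical constant name, used only in this sketch.)
const BYTES_PER_CELL: usize = 2;

/// Writes `text` into `buffer` using the two-byte cell layout described above.
/// `buffer` is a plain slice standing in for the real VGA memory.
/// Returns the number of cells written; stops early if the buffer is too small.
fn write_cells(buffer: &mut [u8], text: &[u8], color: u8) -> usize {
    let mut written = 0;
    // Each chunk is exactly one cell: [character byte, color byte].
    for (cell, &byte) in buffer.chunks_exact_mut(BYTES_PER_CELL).zip(text.iter()) {
        cell[0] = byte;
        cell[1] = color;
        written += 1;
    }
    written
}

fn main() {
    // 80 columns * 25 rows * 2 bytes per cell.
    let mut fake_vga = vec![0u8; 80 * 25 * BYTES_PER_CELL];
    let n = write_cells(&mut fake_vga, b"Hello World!", 0xb);
    // prints: wrote 12 cells, first four bytes: [72, 11, 101, 11]
    println!("wrote {} cells, first four bytes: {:?}", n, &fake_vga[..4]);
}
```

The stride of two bytes per character is the same one that the `offset(i as isize * 2)` and `offset(i as isize * 2 + 1)` calls in `_start` express with raw pointers. Unlike the raw-pointer version, the slice carries its length, so writing past the end is impossible here.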
[`unsafe`]: https://doc.rust-lang.org/stable/book/ch20-01-unsafe-rust.html
[five additional things]: https://doc.rust-lang.org/stable/book/ch20-01-unsafe-rust.html#unsafe-superpowers

I want to emphasize that **this is not the way we want to do things in Rust!** It's very easy to mess up when working with raw pointers inside unsafe blocks. For example, we could easily write beyond the buffer's end if we're not careful.

So we want to minimize the use of `unsafe` as much as possible. Rust gives us the ability to do this by creating safe abstractions. For example, we could create a VGA buffer type that encapsulates all unsafety and ensures that it is _impossible_ to do anything wrong from the outside. This way, we would only need minimal amounts of `unsafe` code and can be sure that we don't violate [memory safety]. We will create such a safe VGA buffer abstraction in the next post.

[memory safety]: https://en.wikipedia.org/wiki/Memory_safety

## Running our Kernel

Now that we have an executable that does something perceptible, it is time to run it. First, we need to turn our compiled kernel into a bootable disk image by linking it with a bootloader. Then we can run the disk image in the [QEMU] virtual machine or boot it on real hardware using a USB stick.

### Creating a Bootimage

To turn our compiled kernel into a bootable disk image, we need to link it with a bootloader. As we learned in the [section about booting], the bootloader is responsible for initializing the CPU and loading our kernel.

[section about booting]: #the-boot-process

Instead of writing our own bootloader, which is a project on its own, we use the [`bootloader`] crate. This crate implements a basic BIOS bootloader without any C dependencies, just Rust and inline assembly.
To use it for booting our kernel, we need to add a dependency on it:

[`bootloader`]: https://crates.io/crates/bootloader

```toml
# in Cargo.toml

[dependencies]
bootloader = "0.9"
```

**Note:** This post is only compatible with `bootloader v0.9`. Newer versions use a different build system and will result in build errors when following this post.

Adding the bootloader as a dependency is not enough to actually create a bootable disk image. The problem is that we need to link our kernel with the bootloader after compilation, but cargo has no support for [post-build scripts].

[post-build scripts]: https://github.com/rust-lang/cargo/issues/545

To solve this problem, we created a tool named `bootimage` that first compiles the kernel and bootloader, and then links them together to create a bootable disk image. To install the tool, go into your home directory (or any directory outside of your cargo project) and execute the following command in your terminal:

```
cargo install bootimage
```

For running `bootimage` and building the bootloader, you need to have the `llvm-tools-preview` rustup component installed. You can do so by executing `rustup component add llvm-tools-preview`.

After installing `bootimage` and adding the `llvm-tools-preview` component, you can create a bootable disk image by going back into your cargo project directory and executing:

```
> cargo bootimage
```

We see that the tool recompiles our kernel using `cargo build`, so it will automatically pick up any changes you make. Afterwards, it compiles the bootloader, which might take a while. Like all crate dependencies, it is only built once and then cached, so subsequent builds will be much faster. Finally, `bootimage` combines the bootloader and your kernel into a bootable disk image.

After executing the command, you should see a bootable disk image named `bootimage-blog_os.bin` in your `target/x86_64-blog_os/debug` directory.
You can boot it in a virtual machine or copy it to a USB drive to boot it on real hardware. (Note that this is not a CD image, which has a different format, so burning it to a CD doesn't work.)

#### How does it work?

The `bootimage` tool performs the following steps behind the scenes:

- It compiles our kernel to an [ELF] file.
- It compiles the bootloader dependency as a standalone executable.
- It links the bytes of the kernel ELF file to the bootloader.

[ELF]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
[rust-osdev/bootloader]: https://github.com/rust-osdev/bootloader

When booted, the bootloader reads and parses the appended ELF file. It then maps the program segments to virtual addresses in the page tables, zeroes the `.bss` section, and sets up a stack. Finally, it reads the entry point address (our `_start` function) and jumps to it.

### Booting it in QEMU

We can now boot the disk image in a virtual machine. To boot it in [QEMU], execute the following command:

[QEMU]: https://www.qemu.org/

```
> qemu-system-x86_64 -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin
```

This opens a separate window which should look similar to this:

![QEMU showing "Hello World!"](qemu.png)

We see that our "Hello World!" is visible on the screen.

### Real Machine

It is also possible to write it to a USB stick and boot it on a real machine, **but be careful** to choose the correct device name, because **everything on that device is overwritten**:

```
> dd if=target/x86_64-blog_os/debug/bootimage-blog_os.bin of=/dev/sdX && sync
```

Here, `sdX` is the device name of your USB stick.

After writing the image to the USB stick, you can run it on real hardware by booting from it. You probably need to use a special boot menu or change the boot order in your BIOS configuration to boot from the USB stick. Note that it currently doesn't work for UEFI machines, since the `bootloader` crate has no UEFI support yet.
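As a brief aside on the step "reads the entry point address" from the "How does it work?" section: the following host-side sketch shows where that address lives in a 64-bit little-endian ELF header. It is only an illustration of the file format, not the code used by the `bootloader` crate, which additionally has to parse the program headers. The function name `elf64_entry_point` and the example address are assumptions made for this sketch.

```rust
/// Returns the entry point address stored in the header of a
/// 64-bit little-endian ELF image, or `None` if the header does not match.
/// (Hypothetical helper for illustration; not part of any crate.)
fn elf64_entry_point(image: &[u8]) -> Option<u64> {
    const MAGIC: [u8; 4] = [0x7f, b'E', b'L', b'F'];
    const CLASS_64: u8 = 2; // e_ident[EI_CLASS]: 64-bit objects
    const DATA_LE: u8 = 1; // e_ident[EI_DATA]: little-endian
    // e_entry follows e_ident (16 bytes), e_type (2), e_machine (2), e_version (4).
    const ENTRY_OFFSET: usize = 24;

    // Bail out if the image is too short to contain e_entry.
    let header = image.get(..ENTRY_OFFSET + 8)?;
    if header[..4] != MAGIC[..] || header[4] != CLASS_64 || header[5] != DATA_LE {
        return None;
    }

    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&header[ENTRY_OFFSET..ENTRY_OFFSET + 8]);
    Some(u64::from_le_bytes(bytes))
}

fn main() {
    // Build a fake 64-byte ELF header with a made-up entry point address.
    let mut image = vec![0u8; 64];
    image[..4].copy_from_slice(&[0x7f, b'E', b'L', b'F']);
    image[4] = 2;
    image[5] = 1;
    image[24..32].copy_from_slice(&0x20_1000u64.to_le_bytes());

    // prints: entry point: Some(0x201000)
    println!("entry point: {:#x?}", elf64_entry_point(&image));
}
```

For our kernel, the value read this way would be the address of the `_start` function, which is why the `#[unsafe(no_mangle)]` attribute on it matters: the linker records the address of the symbol named `_start` as the entry point.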
### Using `cargo run`

To make it easier to run our kernel in QEMU, we can set the `runner` configuration key for cargo:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "none")']
runner = "bootimage runner"
```

The `target.'cfg(target_os = "none")'` table applies to all targets whose target configuration file's `"os"` field is set to `"none"`. This includes our `x86_64-blog_os.json` target. The `runner` key specifies the command that should be invoked for `cargo run`. The command is run after a successful build with the executable path passed as the first argument. See the [cargo documentation][cargo configuration] for more details.

The `bootimage runner` command is specifically designed to be usable as a `runner` executable. It links the given executable with the project's bootloader dependency and then launches QEMU. See the [Readme of `bootimage`] for more details and possible configuration options.

[Readme of `bootimage`]: https://github.com/rust-osdev/bootimage

Now we can use `cargo run` to compile our kernel and boot it in QEMU.

## What's next?

In the next post, we will explore the VGA text buffer in more detail and write a safe interface for it. We will also add support for the `println` macro.

================================================
FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/index.pt-BR.md
================================================
+++
title = "Um Kernel Rust Mínimo"
weight = 2
path = "pt-BR/minimal-rust-kernel"
date = 2018-02-10

[extra]
chapter = "O Básico"
# Please update this when updating the translation
translation_based_on_commit = "95d4fbd54c6b0e5a874981558c0cc1fe85d31606"
# GitHub usernames of the people that translated this post
translators = ["richarddalves"]
+++

Neste post, criamos um kernel Rust mínimo de 64 bits para a arquitetura x86. Construímos sobre o [binário Rust independente] do post anterior para criar uma imagem de disco inicializável que imprime algo na tela.
[binário Rust independente]: @/edition-2/posts/01-freestanding-rust-binary/index.pt-BR.md Este blog é desenvolvido abertamente no [GitHub]. Se você tiver algum problema ou dúvida, abra um issue lá. Você também pode deixar comentários [na parte inferior]. O código-fonte completo desta publicação pode ser encontrado na branch [`post-02`][post branch]. [GitHub]: https://github.com/phil-opp/blog_os [na parte inferior]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-02 ## O Processo de Boot Quando você liga um computador, ele começa a executar código de firmware que está armazenado na [ROM] da placa-mãe. Este código executa um [teste automático de inicialização], detecta a RAM disponível e pré-inicializa a CPU e o hardware. Depois, ele procura por um disco inicializável e começa a inicializar o kernel do sistema operacional. [ROM]: https://en.wikipedia.org/wiki/Read-only_memory [teste automático de inicialização]: https://en.wikipedia.org/wiki/Power-on_self-test No x86, existem dois padrões de firmware: o "Basic Input/Output System" (**[BIOS]**) e o mais novo "Unified Extensible Firmware Interface" (**[UEFI]**). O padrão BIOS é antigo e ultrapassado, mas simples e bem suportado em qualquer máquina x86 desde os anos 1980. UEFI, em contraste, é mais moderno e tem muito mais recursos, mas é mais complexo de configurar (na minha opinião, pelo menos). [BIOS]: https://en.wikipedia.org/wiki/BIOS [UEFI]: https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface Atualmente, fornecemos apenas suporte para BIOS, mas suporte para UEFI também está planejado. Se você gostaria de nos ajudar com isso, confira o [issue no Github](https://github.com/phil-opp/blog_os/issues/349). ### Boot BIOS Quase todos os sistemas x86 têm suporte para boot BIOS, incluindo máquinas mais novas baseadas em UEFI que usam um BIOS emulado. Isso é ótimo, porque você pode usar a mesma lógica de boot em todas as máquinas do último século. 
Mas essa ampla compatibilidade é ao mesmo tempo a maior desvantagem do boot BIOS, porque significa que a CPU é colocada em um modo de compatibilidade de 16 bits chamado [modo real] antes do boot, para que bootloaders arcaicos dos anos 1980 ainda funcionem. Mas vamos começar do início: Quando você liga um computador, ele carrega o BIOS de uma memória flash especial localizada na placa-mãe. O BIOS executa rotinas de teste automático e inicialização do hardware, então procura por discos inicializáveis. Se ele encontra um, o controle é transferido para seu _bootloader_, que é uma porção de 512 bytes de código executável armazenado no início do disco. A maioria dos bootloaders é maior que 512 bytes, então os bootloaders são comumente divididos em um primeiro estágio pequeno, que cabe em 512 bytes, e um segundo estágio, que é subsequentemente carregado pelo primeiro estágio. O bootloader tem que determinar a localização da imagem do kernel no disco e carregá-la na memória. Ele também precisa mudar a CPU do [modo real] de 16 bits primeiro para o [modo protegido] de 32 bits, e então para o [modo longo] de 64 bits, onde registradores de 64 bits e a memória principal completa estão disponíveis. Seu terceiro trabalho é consultar certas informações (como um mapa de memória) do BIOS e passá-las ao kernel do SO. [modo real]: https://en.wikipedia.org/wiki/Real_mode [modo protegido]: https://en.wikipedia.org/wiki/Protected_mode [modo longo]: https://en.wikipedia.org/wiki/Long_mode [segmentação de memória]: https://en.wikipedia.org/wiki/X86_memory_segmentation Escrever um bootloader é um pouco trabalhoso, pois requer linguagem assembly e muitos passos pouco intuitivos como "escrever este valor mágico neste registrador do processador". Portanto, não cobrimos a criação de bootloader neste post e em vez disso fornecemos uma ferramenta chamada [bootimage] que anexa automaticamente um bootloader ao seu kernel. 
[bootimage]: https://github.com/rust-osdev/bootimage Se você estiver interessado em construir seu próprio bootloader: Fique ligado, um conjunto de posts sobre este tópico já está planejado! #### O Padrão Multiboot Para evitar que todo sistema operacional implemente seu próprio bootloader, que é compatível apenas com um único SO, a [Free Software Foundation] criou um padrão de bootloader aberto chamado [Multiboot] em 1995. O padrão define uma interface entre o bootloader e o sistema operacional, para que qualquer bootloader compatível com Multiboot possa carregar qualquer sistema operacional compatível com Multiboot. A implementação de referência é o [GNU GRUB], que é o bootloader mais popular para sistemas Linux. [Free Software Foundation]: https://en.wikipedia.org/wiki/Free_Software_Foundation [Multiboot]: https://wiki.osdev.org/Multiboot [GNU GRUB]: https://en.wikipedia.org/wiki/GNU_GRUB Para tornar um kernel compatível com Multiboot, basta inserir um chamado [cabeçalho Multiboot] no início do arquivo do kernel. Isso torna muito fácil inicializar um SO a partir do GRUB. No entanto, o GRUB e o padrão Multiboot também têm alguns problemas: [cabeçalho Multiboot]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#OS-image-format - Eles suportam apenas o modo protegido de 32 bits. Isso significa que você ainda tem que fazer a configuração da CPU para mudar para o modo longo de 64 bits. - Eles são projetados para tornar o bootloader simples em vez do kernel. Por exemplo, o kernel precisa ser vinculado com um [tamanho de página padrão ajustado], porque o GRUB não consegue encontrar o cabeçalho Multiboot caso contrário. Outro exemplo é que as [informações de boot], que são passadas ao kernel, contêm muitas estruturas dependentes de arquitetura em vez de fornecer abstrações limpas. - Tanto o GRUB quanto o padrão Multiboot são documentados apenas esparsamente. 
- O GRUB precisa estar instalado no sistema host para criar uma imagem de disco inicializável a partir do arquivo do kernel. Isso torna o desenvolvimento no Windows ou Mac mais difícil. [tamanho de página padrão ajustado]: https://wiki.osdev.org/Multiboot#Multiboot_2 [informações de boot]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Boot-information-format Por causa dessas desvantagens, decidimos não usar o GRUB ou o padrão Multiboot. No entanto, planejamos adicionar suporte Multiboot à nossa ferramenta [bootimage], para que seja possível carregar seu kernel em um sistema GRUB também. Se você estiver interessado em escrever um kernel compatível com Multiboot, confira a [primeira edição] desta série de blog. [primeira edição]: @/edition-1/_index.md ### UEFI (Não fornecemos suporte UEFI no momento, mas adoraríamos! Se você gostaria de ajudar, por favor nos diga no [issue do Github](https://github.com/phil-opp/blog_os/issues/349).) ## Um Kernel Mínimo Agora que sabemos aproximadamente como um computador inicializa, é hora de criar nosso próprio kernel mínimo. Nosso objetivo é criar uma imagem de disco que imprima um "Hello World!" na tela quando inicializada. Fazemos isso estendendo o [binário Rust independente] do post anterior. Como você deve se lembrar, construímos o binário independente através do `cargo`, mas dependendo do sistema operacional, precisávamos de nomes de ponto de entrada e flags de compilação diferentes. Isso ocorre porque o `cargo` compila para o _sistema host_ por padrão, ou seja, o sistema em que você está executando. Isso não é algo que queremos para nosso kernel, porque um kernel que executa em cima de, por exemplo, Windows, não faz muito sentido. Em vez disso, queremos compilar para um _sistema alvo_ claramente definido. ### Instalando o Rust Nightly O Rust tem três canais de lançamento: _stable_, _beta_ e _nightly_. 
O Livro do Rust explica a diferença entre esses canais muito bem, então dê uma olhada [aqui](https://doc.rust-lang.org/book/appendix-07-nightly-rust.html#choo-choo-release-channels-and-riding-the-trains). Para construir um sistema operacional, precisaremos de alguns recursos experimentais que estão disponíveis apenas no canal nightly, então precisamos instalar uma versão nightly do Rust. Para gerenciar instalações do Rust, eu recomendo fortemente o [rustup]. Ele permite instalar compiladores nightly, beta e stable lado a lado e facilita a atualização deles. Com rustup, você pode usar um compilador nightly para o diretório atual executando `rustup override set nightly`. Alternativamente, você pode adicionar um arquivo chamado `rust-toolchain` com o conteúdo `nightly` ao diretório raiz do projeto. Você pode verificar que tem uma versão nightly instalada executando `rustc --version`: O número da versão deve conter `-nightly` no final. [rustup]: https://www.rustup.rs/ O compilador nightly nos permite optar por vários recursos experimentais usando as chamadas _feature flags_ no topo do nosso arquivo. Por exemplo, poderíamos habilitar a [macro `asm!`] experimental para assembly inline adicionando `#![feature(asm)]` no topo do nosso `main.rs`. Note que tais recursos experimentais são completamente instáveis, o que significa que versões futuras do Rust podem alterá-los ou removê-los sem aviso prévio. Por esta razão, só os usaremos se absolutamente necessário. [macro `asm!`]: https://doc.rust-lang.org/stable/reference/inline-assembly.html ### Especificação de Alvo O Cargo suporta diferentes sistemas alvo através do parâmetro `--target`. O alvo é descrito por uma chamada _[target triple]_, que descreve a arquitetura da CPU, o vendor, o sistema operacional e a [ABI]. Por exemplo, o target triple `x86_64-unknown-linux-gnu` descreve um sistema com uma CPU `x86_64`, sem vendor claro, e um sistema operacional Linux com a ABI GNU. 
O Rust suporta [muitos target triples diferentes][platform-support], incluindo `arm-linux-androideabi` para Android ou [`wasm32-unknown-unknown` para WebAssembly](https://www.hellorust.com/setup/wasm-target/). [target triple]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple [ABI]: https://stackoverflow.com/a/2456882 [platform-support]: https://forge.rust-lang.org/release/platform-support.html [custom-targets]: https://doc.rust-lang.org/nightly/rustc/targets/custom.html Para nosso sistema alvo, no entanto, precisamos de alguns parâmetros de configuração especiais (por exemplo, nenhum SO subjacente), então nenhum dos [target triples existentes][platform-support] se encaixa. Felizmente, o Rust nos permite definir [nosso próprio alvo][custom-targets] através de um arquivo JSON. Por exemplo, um arquivo JSON que descreve o target `x86_64-unknown-linux-gnu` se parece com isto: ```json { "llvm-target": "x86_64-unknown-linux-gnu", "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", "arch": "x86_64", "target-endian": "little", "target-pointer-width": 64, "target-c-int-width": 32, "os": "linux", "executables": true, "linker-flavor": "gcc", "pre-link-args": ["-m64"], "morestack": false } ``` A maioria dos campos é exigida pelo LLVM para gerar código para aquela plataforma. Por exemplo, o campo [`data-layout`] define o tamanho de vários tipos integer, floating point e pointer. Então há campos que o Rust usa para compilação condicional, como `target-pointer-width`. O terceiro tipo de campo define como a crate deve ser construída. Por exemplo, o campo `pre-link-args` especifica argumentos passados ao [linker]. [`data-layout`]: https://llvm.org/docs/LangRef.html#data-layout [linker]: https://en.wikipedia.org/wiki/Linker_(computing) Também visamos sistemas `x86_64` com nosso kernel, então nossa especificação de alvo será muito similar à acima. 
Vamos começar criando um arquivo `x86_64-blog_os.json` (escolha qualquer nome que você goste) com o conteúdo comum: ```json { "llvm-target": "x86_64-unknown-none", "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", "arch": "x86_64", "target-endian": "little", "target-pointer-width": 64, "target-c-int-width": 32, "os": "none", "executables": true } ``` Note que mudamos o SO no `llvm-target` e no campo `os` para `none`, porque executaremos em bare metal. Adicionamos as seguintes entradas relacionadas à compilação: ```json "linker-flavor": "ld.lld", "linker": "rust-lld", ``` Em vez de usar o linker padrão da plataforma (que pode não suportar alvos Linux), usamos o linker multiplataforma [LLD] que vem com o Rust para vincular nosso kernel. [LLD]: https://lld.llvm.org/ ```json "panic-strategy": "abort", ``` Esta configuração especifica que o alvo não suporta [stack unwinding] no panic, então em vez disso o programa deve abortar diretamente. Isso tem o mesmo efeito que a opção `panic = "abort"` no nosso Cargo.toml, então podemos removê-la de lá. (Note que, em contraste com a opção Cargo.toml, esta opção de alvo também se aplica quando recompilamos a biblioteca `core` mais adiante neste post. Então, mesmo se você preferir manter a opção Cargo.toml, certifique-se de incluir esta opção.) [stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php ```json "disable-redzone": true, ``` Estamos escrevendo um kernel, então precisaremos lidar com interrupções em algum momento. Para fazer isso com segurança, temos que desabilitar uma certa otimização do ponteiro de stack chamada _"red zone"_, porque ela causaria corrupção do stack caso contrário. Para mais informações, veja nosso post separado sobre [desabilitando a red zone]. 
[desabilitando a red zone]: @/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.md ```json "features": "-mmx,-sse,+soft-float", ``` O campo `features` habilita/desabilita recursos do alvo. Desabilitamos os recursos `mmx` e `sse` prefixando-os com um menos e habilitamos o recurso `soft-float` prefixando-o com um mais. Note que não deve haver espaços entre flags diferentes, caso contrário o LLVM falha ao interpretar a string de features. Os recursos `mmx` e `sse` determinam suporte para instruções [Single Instruction Multiple Data (SIMD)], que frequentemente podem acelerar programas significativamente. No entanto, usar os grandes registradores SIMD em kernels de SO leva a problemas de desempenho. A razão é que o kernel precisa restaurar todos os registradores ao seu estado original antes de continuar um programa interrompido. Isso significa que o kernel tem que salvar o estado SIMD completo na memória principal em cada chamada de sistema ou interrupção de hardware. Como o estado SIMD é muito grande (512-1600 bytes) e interrupções podem ocorrer com muita frequência, essas operações adicionais de salvar/restaurar prejudicam consideravelmente o desempenho. Para evitar isso, desabilitamos SIMD para nosso kernel (não para aplicações executando em cima!). [Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD Um problema com desabilitar SIMD é que operações de ponto flutuante em `x86_64` exigem registradores SIMD por padrão. Para resolver este problema, adicionamos o recurso `soft-float`, que emula todas as operações de ponto flutuante através de funções de software baseadas em inteiros normais. Para mais informações, veja nosso post sobre [desabilitando SIMD](@/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.md). ```json "rustc-abi": "x86-softfloat" ``` Como queremos usar o recurso `soft-float`, também precisamos dizer ao compilador Rust `rustc` que queremos usar a ABI correspondente. 
Podemos fazer isso definindo o campo `rustc-abi` para `x86-softfloat`. #### Juntando Tudo Nosso arquivo de especificação de alvo agora se parece com isto: ```json { "llvm-target": "x86_64-unknown-none", "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", "arch": "x86_64", "target-endian": "little", "target-pointer-width": 64, "target-c-int-width": 32, "os": "none", "executables": true, "linker-flavor": "ld.lld", "linker": "rust-lld", "panic-strategy": "abort", "disable-redzone": true, "features": "-mmx,-sse,+soft-float", "rustc-abi": "x86-softfloat" } ``` ### Construindo nosso Kernel Compilar para nosso novo alvo usará convenções Linux, já que o linker-flavor ld.lld instrui o llvm a compilar com a flag `-flavor gnu` (para mais opções de linker, veja [a documentação do rustc](https://doc.rust-lang.org/rustc/codegen-options/index.html#linker-flavor)). Isso significa que precisamos de um ponto de entrada chamado `_start` como descrito no [post anterior]: [post anterior]: @/edition-2/posts/01-freestanding-rust-binary/index.pt-BR.md ```rust // src/main.rs #![no_std] // não vincule a biblioteca padrão do Rust #![no_main] // desativar todos os pontos de entrada no nível Rust use core::panic::PanicInfo; /// Esta função é chamada em caso de pânico. #[panic_handler] fn panic(_info: &PanicInfo) -> ! { loop {} } #[unsafe(no_mangle)] // não altere (mangle) o nome desta função pub extern "C" fn _start() -> ! { // essa função é o ponto de entrada, já que o vinculador procura uma função // denominado `_start` por padrão loop {} } ``` Note que o ponto de entrada precisa ser chamado `_start` independentemente do seu SO host. Agora podemos construir o kernel para nosso novo alvo passando o nome do arquivo JSON como `--target`: ``` > cargo build --target x86_64-blog_os.json error: `.json` target specs require -Zjson-target-spec ``` Falha! 
O erro nos diz que especificações de alvo JSON personalizadas são um recurso instável que requer habilitação explícita. Isso ocorre porque o formato dos arquivos JSON de alvo ainda não é considerado estável, então mudanças podem ocorrer em futuras versões do Rust. Consulte a [issue de rastreamento para especificações de alvo JSON personalizadas][json-target-spec-issue] para mais informações. [json-target-spec-issue]: https://github.com/rust-lang/rust/issues/151528 #### A Opção `json-target-spec` Para habilitar o suporte para especificações de alvo JSON personalizadas, precisamos criar um arquivo de [configuração cargo] local em `.cargo/config.toml` (a pasta `.cargo` deve estar ao lado da sua pasta `src`) com o seguinte conteúdo: [configuração cargo]: https://doc.rust-lang.org/cargo/reference/config.html ```toml # em .cargo/config.toml [unstable] json-target-spec = true ``` Isso habilita o recurso instável `json-target-spec`, permitindo-nos usar arquivos JSON de alvo personalizados. Com esta configuração em vigor, vamos tentar construir novamente: ``` > cargo build --target x86_64-blog_os.json error[E0463]: can't find crate for `core` ``` Agora vemos um erro diferente! O erro nos diz que o compilador Rust não consegue mais encontrar a [biblioteca `core`]. Esta biblioteca contém tipos básicos do Rust como `Result`, `Option` e iteradores, e é implicitamente vinculada a todas as crates `no_std`. [biblioteca `core`]: https://doc.rust-lang.org/nightly/core/index.html O problema é que a biblioteca core é distribuída junto com o compilador Rust como uma biblioteca _pré-compilada_. Então ela é válida apenas para target triples host suportados (por exemplo, `x86_64-unknown-linux-gnu`) mas não para nosso alvo customizado. Se quisermos compilar código para outros alvos, precisamos recompilar `core` para esses alvos primeiro. #### A Opção `build-std` É aí que entra o [recurso `build-std`] do cargo. 
It allows recompiling `core` and other standard library crates on demand, instead of using the precompiled versions shipped with the Rust installation. This feature is very new and still not finished, so it is marked as "unstable" and only available on [nightly Rust compilers][compiladores Rust nightly].

[recurso `build-std`]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std
[compiladores Rust nightly]: #instalando-o-rust-nightly

To use the feature, we need to add the following to our [cargo configuration][configuração cargo] file at `.cargo/config.toml`:

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std = ["core", "compiler_builtins"]
```

This tells cargo that it should recompile the `core` and `compiler_builtins` libraries. The latter is required because it is a dependency of `core`. In order to recompile these libraries, cargo needs access to the Rust source code, which we can install with `rustup component add rust-src`.

**Note:** The `unstable.build-std` configuration key requires at least the Rust nightly from 2020-07-15.

After setting the `unstable.build-std` configuration key and installing the `rust-src` component, we can rerun our build command:

```
> cargo build --target x86_64-blog_os.json
   Compiling core v0.0.0 (/…/rust/src/libcore)
   Compiling rustc-std-workspace-core v1.99.0 (/…/rust/src/tools/rustc-std-workspace-core)
   Compiling compiler_builtins v0.1.32
   Compiling blog_os v0.1.0 (/…/blog_os)
    Finished dev [unoptimized + debuginfo] target(s) in 0.29 secs
```

We see that `cargo build` now recompiles the `core`, `rustc-std-workspace-core` (a dependency of `compiler_builtins`), and `compiler_builtins` libraries for our custom target.

#### Memory-Related Intrinsics

The Rust compiler assumes that a certain set of built-in functions is available for all systems. Most of these functions are provided by the `compiler_builtins` crate that we just recompiled. However, there are some memory-related functions in that crate that are not enabled by default because they are normally provided by the C library on the system. These functions include `memset`, which sets all bytes in a memory block to a given value, `memcpy`, which copies one memory block to another, and `memcmp`, which compares two memory blocks. While we didn't need any of these functions to compile our kernel right now, they will be required as soon as we add more code to it (e.g., when copying structs around).

Since we can't link to the C library of the operating system, we need an alternative way to provide these functions to the compiler. One possible approach would be to implement our own `memset` etc. functions and apply the `#[unsafe(no_mangle)]` attribute to them (to avoid the automatic renaming during compilation). However, this is dangerous, since the slightest mistake in the implementation of these functions could lead to undefined behavior.
For example, implementing `memcpy` with a `for` loop may result in infinite recursion because `for` loops implicitly call the [`IntoIterator::into_iter`] trait method, which may call `memcpy` again. So it's a good idea to reuse existing, well-tested implementations instead.

[`IntoIterator::into_iter`]: https://doc.rust-lang.org/stable/core/iter/trait.IntoIterator.html#tymethod.into_iter

Fortunately, the `compiler_builtins` crate already contains implementations for all the needed functions; they are just disabled by default so that they don't collide with the implementations from the C library. We can enable them by setting cargo's [`build-std-features`] flag to `["compiler-builtins-mem"]`. Like the `build-std` flag, this flag can either be passed on the command line as a `-Z` flag or configured in the `unstable` table in the `.cargo/config.toml` file. Since we always want to build with this flag, the config file option makes more sense for us:

[`build-std-features`]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std-features

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std-features = ["compiler-builtins-mem"]
build-std = ["core", "compiler_builtins"]
```

(Support for the `compiler-builtins-mem` feature was [added very recently](https://github.com/rust-lang/rust/pull/77284), so you need at least Rust nightly `2020-09-30` for it.)

Behind the scenes, this flag enables the [`mem` feature][recurso `mem`] of the `compiler_builtins` crate. The effect of this is that the `#[unsafe(no_mangle)]` attribute is applied to the [`memcpy` etc. implementations][implementações `memcpy` etc.] of the crate, which makes them available to the linker.
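
To make the role of these functions more concrete, here is a minimal host-side sketch of the byte-wise semantics of `memcpy`, `memset`, and `memcmp`. The names `copy_bytes`, `fill_bytes`, and `compare_bytes` are hypothetical and chosen so that they do not collide with the real symbols. This is not the `compiler_builtins` implementation, only an illustration of what the compiler expects these functions to do. It uses `while` loops, in line with the remark about `for` loops above.

```rust
/// Copies `n` bytes from `src` to `dest` (the contract of C's `memcpy`).
///
/// # Safety
/// Both pointers must be valid for `n` bytes and the ranges must not overlap.
unsafe fn copy_bytes(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        // SAFETY: the caller guarantees that both ranges are valid for `n` bytes.
        unsafe { *dest.add(i) = *src.add(i) };
        i += 1;
    }
    dest
}

/// Sets `n` bytes starting at `dest` to `value` (the contract of `memset`).
///
/// # Safety
/// `dest` must be valid for writes of `n` bytes.
unsafe fn fill_bytes(dest: *mut u8, value: u8, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        // SAFETY: the caller guarantees that `dest` is valid for `n` bytes.
        unsafe { *dest.add(i) = value };
        i += 1;
    }
    dest
}

/// Compares `n` bytes; the result is negative, zero, or positive (like `memcmp`).
///
/// # Safety
/// Both pointers must be valid for reads of `n` bytes.
unsafe fn compare_bytes(a: *const u8, b: *const u8, n: usize) -> i32 {
    let mut i = 0;
    while i < n {
        // SAFETY: the caller guarantees that both ranges are valid for `n` bytes.
        let (x, y) = unsafe { (*a.add(i), *b.add(i)) };
        if x != y {
            return x as i32 - y as i32;
        }
        i += 1;
    }
    0
}
```

Even in such a small sketch, every line relies on the caller upholding the pointer contracts, which is why reusing the tested implementations is the safer choice for a kernel.
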

[recurso `mem`]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/Cargo.toml#L54-L55
[implementações `memcpy` etc.]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/src/mem.rs#L12-L69

With this change, our kernel has valid implementations for all compiler-required functions, so it will continue to compile even if our code gets more complex.

#### Set a Default Target

To avoid passing the `--target` parameter on every invocation of `cargo build`, we can override the default target. To do this, we add the following to our [cargo configuration][configuração cargo] file at `.cargo/config.toml`:

[configuração cargo]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[build]
target = "x86_64-blog_os.json"
```

This tells `cargo` to use our `x86_64-blog_os.json` target when no explicit `--target` argument is passed. This means that we can now build our kernel with a simple `cargo build`. For more information on cargo configuration options, check out the [official documentation][configuração cargo].

We can now build our kernel for a bare metal target. However, our `_start` entry point, which will be called by the bootloader, is still empty. It's time to output something to the screen from it.

### Printing to Screen

The easiest way to print text to the screen at this stage is the [VGA text buffer][buffer de texto VGA]. It is a special memory area mapped to the VGA hardware that contains the contents displayed on screen. It normally consists of 25 lines that each contain 80 character cells. Each character cell displays an ASCII character with some foreground and background colors.
The screen output looks like this:

[buffer de texto VGA]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode

![screen output for common ASCII characters](https://upload.wikimedia.org/wikipedia/commons/f/f8/Codepage-437.png)

We will discuss the exact layout of the VGA buffer in the next post, where we write a first small driver for it. For printing "Hello World!", we just need to know that the buffer is located at address `0xb8000` and that each character cell consists of an ASCII byte and a color byte.

The implementation looks like this:

```rust
static HELLO: &[u8] = b"Hello World!";

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    let vga_buffer = 0xb8000 as *mut u8;

    for (i, &byte) in HELLO.iter().enumerate() {
        unsafe {
            *vga_buffer.offset(i as isize * 2) = byte;
            *vga_buffer.offset(i as isize * 2 + 1) = 0xb;
        }
    }

    loop {}
}
```

First, we cast the integer `0xb8000` into a [raw pointer][ponteiro bruto]. Then we [iterate][iterar] over the bytes of the [static] `HELLO` [byte string]. We use the [`enumerate`] method to additionally get a running variable `i`. In the body of the for loop, we use the [`offset`] method to write the string byte and the corresponding color byte (`0xb` is a light cyan).

[iterar]: https://doc.rust-lang.org/stable/book/ch13-02-iterators.html
[static]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime
[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate
[byte string]: https://doc.rust-lang.org/reference/tokens.html#byte-string-literals
[ponteiro bruto]: https://doc.rust-lang.org/stable/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset

Note that there's an [`unsafe`] block around all memory writes. The reason is that the Rust compiler can't prove that the raw pointers we create are valid. They could point anywhere and lead to data corruption.
By putting them into an `unsafe` block, we're basically telling the compiler that we are absolutely sure that the operations are valid. Note that an `unsafe` block does not turn off Rust's safety checks. It only allows you to do [five additional things][cinco coisas adicionais].

[`unsafe`]: https://doc.rust-lang.org/stable/book/ch20-01-unsafe-rust.html
[cinco coisas adicionais]: https://doc.rust-lang.org/stable/book/ch20-01-unsafe-rust.html#unsafe-superpowers

I want to emphasize that **this is not the way we want to do things in Rust!** It's very easy to mess up when working with raw pointers inside unsafe blocks. For example, we could easily write beyond the buffer's end if we're not careful.

So we want to minimize the use of `unsafe` as much as possible. Rust gives us the ability to do this by creating safe abstractions. For instance, we could create a VGA buffer type that encapsulates all unsafety and ensures that it is _impossible_ to do anything wrong from the outside. This way, we would only need minimal amounts of `unsafe` code and could be sure that we don't violate [memory safety]. We will create such a safe VGA buffer abstraction in the next post.

[memory safety]: https://en.wikipedia.org/wiki/Memory_safety

## Running our Kernel

Now that we have an executable that does something perceptible, it is time to run it. First, we need to turn our compiled kernel into a bootable disk image by linking it with a bootloader. Then we can run the disk image in the [QEMU] virtual machine or boot it on real hardware using a USB stick.

### Creating a Bootimage

To turn our compiled kernel into a bootable disk image, we need to link it with a bootloader. As we learned in the [section about booting][seção sobre boot], the bootloader is responsible for initializing the CPU and loading our kernel.

[seção sobre boot]: #o-processo-de-boot

Instead of writing our own bootloader, which is a project on its own, we use the [`bootloader`] crate. This crate implements a basic BIOS bootloader without any C dependencies, just Rust and inline assembly. To use it for booting our kernel, we need to add a dependency on it:

[`bootloader`]: https://crates.io/crates/bootloader

```toml
# in Cargo.toml

[dependencies]
bootloader = "0.9"
```

**Note:** This post is only compatible with `bootloader v0.9`. Newer versions use a different build system and will result in build errors when following this post.

Adding the bootloader as a dependency is not enough to actually create a bootable disk image. The problem is that we need to link our kernel with the bootloader after compilation, but cargo has no support for [post-build scripts][scripts pós-compilação].

[scripts pós-compilação]: https://github.com/rust-lang/cargo/issues/545

To solve this problem, we created a tool named `bootimage` that first compiles the kernel and bootloader, and then links them together to create a bootable disk image. To install the tool, go to your home directory (or any directory outside of your cargo project) and execute the following command in your terminal:

```
cargo install bootimage
```

For running `bootimage` and building the bootloader, you need to have the `llvm-tools-preview` rustup component installed. You can do so by executing `rustup component add llvm-tools-preview`.

After installing `bootimage` and adding the `llvm-tools-preview` component, you can create a bootable disk image by going back to your cargo project directory and executing:

```
> cargo bootimage
```

We see that the tool recompiles our kernel using `cargo build`, so it will automatically pick up any changes you make. Afterward, it compiles the bootloader, which might take a while.
Like all crate dependencies, it is only built once and then cached, so subsequent builds will be much faster. Finally, `bootimage` combines the bootloader and your kernel into a bootable disk image.

After executing the command, you should see a bootable disk image named `bootimage-blog_os.bin` in your `target/x86_64-blog_os/debug` directory. You can boot it in a virtual machine or copy it to a USB drive to boot it on real hardware. (Note that this is not a CD image, which has a different format, so burning it to a CD doesn't work).

#### How does it work?

The `bootimage` tool performs the following steps behind the scenes:

- It compiles our kernel to an [ELF] file.
- It compiles the bootloader dependency as a standalone executable.
- It links the bytes of the kernel ELF file to the bootloader.

[ELF]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
[rust-osdev/bootloader]: https://github.com/rust-osdev/bootloader

When booted, the bootloader reads and parses the appended ELF file. It then maps the program segments to virtual addresses in the page tables, zeroes the `.bss` section, and sets up a stack. Finally, it reads the entry point address (our `_start` function) and jumps to it.

### Booting it in QEMU

We can now boot the disk image in a virtual machine. To boot it in [QEMU], execute the following command:

[QEMU]: https://www.qemu.org/

```
> qemu-system-x86_64 -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin
```

This opens a separate window that should look similar to this:

![QEMU showing "Hello World!"](qemu.png)

We see that our "Hello World!" is visible on the screen.
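
To see which bytes the `_start` loop left in the VGA text buffer, the same write pattern can be modeled on the host with an ordinary array in place of the memory at `0xb8000`. This is only a sketch under assumptions: `COLS`, `ROWS`, and `write_cells` are hypothetical names for this illustration, and a plain array stands in for the memory-mapped hardware.

```rust
/// Dimensions of the standard VGA text mode (80 columns, 25 rows).
const COLS: usize = 80;
const ROWS: usize = 25;

/// Writes `text` into a simulated text buffer, starting at the first cell.
/// Each cell occupies two bytes: the character byte, then the color byte.
fn write_cells(buffer: &mut [u8; COLS * ROWS * 2], text: &[u8], color: u8) {
    // `take` stops at the last cell so we never index past the buffer's end.
    for (i, &byte) in text.iter().enumerate().take(COLS * ROWS) {
        buffer[i * 2] = byte; // character byte of cell `i`
        buffer[i * 2 + 1] = color; // color byte of cell `i`
    }
}
```

Calling `write_cells(&mut buffer, b"Hello World!", 0xb)` on a zeroed buffer fills the first 24 bytes with alternating character and color bytes and leaves the rest untouched. Unlike the raw pointer version, the array indexing here is bounds-checked.
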

### Real Machine

It is also possible to write it to a USB stick and boot it on a real machine, **but be careful** to choose the correct device name, because **everything on that device will be overwritten**:

```
> dd if=target/x86_64-blog_os/debug/bootimage-blog_os.bin of=/dev/sdX && sync
```

Where `sdX` is the device name of your USB stick.

After writing the image to the USB stick, you can run it on real hardware by booting from it. You will probably need to use a special boot menu or change the boot order in your BIOS configuration to boot from the USB stick. Note that it currently doesn't work for UEFI machines, since the `bootloader` crate has no UEFI support yet.

### Using `cargo run`

To make it easier to run our kernel in QEMU, we can set the `runner` configuration key for cargo:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "none")']
runner = "bootimage runner"
```

The `target.'cfg(target_os = "none")'` table applies to all targets whose target configuration file has its `"os"` field set to `"none"`. This includes our `x86_64-blog_os.json` target. The `runner` key specifies the command that should be invoked for `cargo run`. The command is run after a successful build with the executable path passed as the first argument. See the [cargo documentation][configuração cargo] for more details.

The `bootimage runner` command is specifically designed to be usable as a `runner` executable. It links the given executable with the project's bootloader dependency and then launches QEMU. See the [Readme of `bootimage`][Readme do `bootimage`] for more details and possible configuration options.

[Readme do `bootimage`]: https://github.com/rust-osdev/bootimage

Now we can use `cargo run` to compile our kernel and boot it in QEMU.

## What's next?
In the next post, we will explore the VGA text buffer in more detail and write a safe interface for it. We will also add support for the `println` macro.

================================================
FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/index.ru.md
================================================
+++
title = "A Minimal Rust Kernel"
weight = 2
path = "ru/minimal-rust-kernel"
date = 2018-02-10

[extra]
translators = ["MrZloHex"]
+++

In this post, we create a minimal 64-bit Rust kernel for the x86_64 architecture. We build upon the [freestanding Rust binary] from the previous post to create a bootable disk image that prints something to the screen.

[freestanding Rust binary]: @/edition-2/posts/01-freestanding-rust-binary/index.ru.md

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an _issue_ there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-02`][post branch] branch of the repository.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-02

## The Boot Process {#the-boot-process}

When you turn on a computer, it begins executing firmware code that is stored in the motherboard's [ROM]. This code performs a [power-on self-test], detects available RAM, and pre-initializes the CPU and hardware. Afterwards, it looks for a bootable disk and starts booting the operating system kernel.

[ROM]: https://en.wikipedia.org/wiki/Read-only_memory
[power-on self-test]: https://en.wikipedia.org/wiki/Power-on_self-test

On x86, there are two firmware standards: the "Basic Input/Output System" (**[BIOS]**) and the newer "Unified Extensible Firmware Interface" (**[UEFI]**). The BIOS standard is old, but simple and well-supported on any x86 machine since the 1980s. UEFI, in contrast, is more modern and has many more features, but is more complex to set up (at least in my opinion).

[BIOS]: https://en.wikipedia.org/wiki/BIOS
[UEFI]: https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface

Currently, we only provide BIOS support, but support for UEFI is planned, too. If you'd like to help us with this, check out the [Github issue](https://github.com/phil-opp/blog_os/issues/349).

### BIOS Boot

Almost all x86 systems support BIOS booting, including newer UEFI-based machines that use an emulated BIOS. This is great, because you can use the same boot logic across all machines from the last century. But this wide compatibility is at the same time the biggest disadvantage of BIOS booting, because it means that the CPU is put into a 16-bit compatibility mode called [real mode] before booting, so that archaic bootloaders from the 1980s still work.

But let's start from the beginning:

When you turn on a computer, it loads the BIOS from a special flash memory located on the motherboard. The BIOS runs self-test and initialization routines for the hardware, then it looks for bootable disks. If it finds one, control is transferred to its _bootloader_, which is a 512-byte portion of executable code stored at the beginning of the disk.
Most bootloaders are larger than 512 bytes, so bootloaders are commonly split into a small first stage, which fits into 512 bytes, and a second stage, which is subsequently loaded by the first stage.

The bootloader has to determine the location of the kernel image on the disk and load it into memory. It also needs to switch the CPU from the 16-bit [real mode] first to the 32-bit [protected mode], and then to the 64-bit [long mode], where 64-bit registers and the complete main memory are available. Its third job is to query certain information (such as a memory map) from the BIOS and pass it to the OS kernel.

[real mode]: https://en.wikipedia.org/wiki/Real_mode
[protected mode]: https://en.wikipedia.org/wiki/Protected_mode
[long mode]: https://en.wikipedia.org/wiki/Long_mode
[memory segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation

Writing a bootloader is a bit cumbersome, as it requires assembly language and a lot of non-insightful steps like "write this magic value to this processor register". Therefore, we don't cover bootloader creation in this post and instead provide a tool named [bootimage] that automatically adds a bootloader to your kernel.

[bootimage]: https://github.com/rust-osdev/bootimage

If you are interested in building your own bootloader: Stay tuned, a set of posts on this topic is already planned!

#### The Multiboot Standard

To avoid every operating system implementing its own bootloader that is only compatible with a single OS, the [Free Software Foundation] created an open bootloader standard called [Multiboot] in 1995. The standard defines an interface between the bootloader and the operating system, so that any Multiboot-compliant bootloader can load any Multiboot-compliant operating system. The reference implementation is [GNU GRUB], which is the most popular bootloader for Linux systems.

[Free Software Foundation]: https://en.wikipedia.org/wiki/Free_Software_Foundation
[Multiboot]: https://wiki.osdev.org/Multiboot
[GNU GRUB]: https://en.wikipedia.org/wiki/GNU_GRUB

To make a kernel Multiboot compliant, one just needs to insert a so-called [Multiboot header] at the beginning of the kernel file. This makes it very easy to boot an OS from GRUB. However, GRUB and the Multiboot standard have some problems too:

[Multiboot header]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#OS-image-format

- They support only the 32-bit protected mode. This means that you still have to do the CPU configuration to switch to the 64-bit long mode.
- They are designed to make the bootloader simple instead of the kernel. For example, the kernel needs to be linked with an [adjusted default page size], because GRUB can't find the Multiboot header otherwise. Another example is that the [boot information], which is passed to the kernel, contains lots of architecture-dependent structures instead of providing clean abstractions.
- Both GRUB and the Multiboot standard are only sparsely documented.
- GRUB needs to be installed on the host system to create a bootable disk image from the kernel file. This makes development on Windows or Mac more difficult.

[adjusted default page size]: https://wiki.osdev.org/Multiboot#Multiboot_2
[boot information]: https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Boot-information-format

Because of these drawbacks, we decided not to use GRUB or the Multiboot standard. However, we plan to add Multiboot support to our [bootimage] tool, so that it's possible to load your kernel on a GRUB system too. If you're interested in writing a Multiboot compliant kernel, check out the [first edition] of this blog series.

[first edition]: @/edition-1/_index.md

### UEFI

(We don't provide UEFI support at the moment, but we would love to!
If you'd like to help, please tell us in the [Github issue](https://github.com/phil-opp/blog_os/issues/349).)

## A Minimal Kernel

Now that we roughly know how a computer boots, it's time to create our own minimal kernel. Our goal is to create a disk image that prints "Hello World!" to the screen when booted. For this, we will use the [freestanding Rust binary] from the previous post.

As you may remember, we built the freestanding binary through `cargo`, but depending on the operating system, we needed different entry point names and compile flags. That's because `cargo` builds for the _host system_ by default, i.e., the system you're running on. This isn't what we want for our kernel, because a kernel that runs on top of, e.g., Windows does not make much sense. Instead, we want to compile for a clearly defined _target system_.

### Installing Rust Nightly {#installing-rust-nightly}

Rust has three release channels: _stable_, _beta_, and _nightly_. The Rust Book explains the difference between these channels really well, so take a minute and [check it out](https://doc.rust-lang.org/book/appendix-07-nightly-rust.html#choo-choo-release-channels-and-riding-the-trains). For building an operating system, we will need some experimental features that are only available on the nightly channel, so we need to install a nightly version of Rust.

To manage Rust installations, I highly recommend [rustup]. It allows you to install nightly, beta, and stable compilers side-by-side and makes it easy to update them. With rustup, you can use a nightly compiler for the current directory by running `rustup override set nightly`. Alternatively, you can add a file called `rust-toolchain` with the content `nightly` to the project's root directory.
You can check that you have a nightly version installed by running `rustc --version`: The version number should contain `-nightly` at the end.

[rustup]: https://www.rustup.rs/

The nightly compiler allows us to opt in to various experimental features by using so-called _feature flags_ at the top of our file. For example, before the [`asm!` macro] for inline assembly was stabilized, it could be enabled by adding `#![feature(asm)]` to the top of our `main.rs`. Note that such experimental features are completely unstable, which means that future Rust versions might change or remove them without prior warning. For this reason, we will only use them if absolutely necessary.

[`asm!` macro]: https://doc.rust-lang.org/stable/reference/inline-assembly.html

### Target Specification

Cargo supports different target systems through the `--target` parameter. The target is described by a so-called _[target triple]_, which describes the CPU architecture, the vendor, the operating system, and the [ABI]. For example, the `x86_64-unknown-linux-gnu` target triple describes a system with an `x86_64` CPU, an unknown vendor, and a Linux operating system with the GNU ABI. Rust supports [many different target triples][platform-support], including `arm-linux-androideabi` for Android or [`wasm32-unknown-unknown` for WebAssembly](https://www.hellorust.com/setup/wasm-target/).

[target triple]: https://clang.llvm.org/docs/CrossCompilation.html#target-triple
[ABI]: https://stackoverflow.com/a/2456882
[platform-support]: https://forge.rust-lang.org/release/platform-support.html
[custom-targets]: https://doc.rust-lang.org/nightly/rustc/targets/custom.html

For our target system, however, we require some special configuration parameters (e.g., no underlying OS), so none of the [existing target triples][platform-support] fits.
Fortunately, Rust allows us to define a [custom target][custom-targets] through a JSON file. For example, a JSON file that describes the `x86_64-unknown-linux-gnu` target looks like this:

```json
{
    "llvm-target": "x86_64-unknown-linux-gnu",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "linux",
    "executables": true,
    "linker-flavor": "gcc",
    "pre-link-args": ["-m64"],
    "morestack": false
}
```

Most fields are required by LLVM to generate code for that platform. For example, the [`data-layout`] field defines the size of various integer, floating point, and pointer types. Then there are fields that Rust uses for conditional compilation, such as `target-pointer-width`. The third kind of field defines how the crate should be built. For example, the `pre-link-args` field specifies arguments passed to the [linker].

[`data-layout`]: https://llvm.org/docs/LangRef.html#data-layout
[linker]: https://en.wikipedia.org/wiki/Linker_(computing)

Our kernel also targets the `x86_64` architecture, so our target specification will look very similar to the one above. Let's start by creating an `x86_64-blog_os.json` file (choose any name you like) with the common content:

```json
{
    "llvm-target": "x86_64-unknown-none",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "executables": true
}
```

Note that we changed the OS in the `llvm-target` and `os` fields to `none`, because we will run on bare metal.
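
As a small illustration of the conditional-compilation remark above, the values from a target specification surface in Rust code as `cfg` options. The sketch below uses the built-in `cfg!` macro with the `target_pointer_width` and `target_os` options; the two function names are hypothetical and exist only for this illustration.

```rust
/// Returns the pointer width the code was compiled for, in bits.
/// The compiler derives `target_pointer_width` from the target specification.
fn pointer_width_bits() -> u32 {
    if cfg!(target_pointer_width = "64") {
        64
    } else if cfg!(target_pointer_width = "32") {
        32
    } else {
        16
    }
}

/// Returns `true` when compiled for a target whose `os` is `none`,
/// such as the bare metal target described in this post.
fn runs_on_bare_metal() -> bool {
    cfg!(target_os = "none")
}
```

Compiled for a regular host target, `runs_on_bare_metal()` returns `false`. Compiled for a target whose `os` field is `none`, the same source returns `true` without any code change.
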

We add the following build-related entries:

```json
"linker-flavor": "ld.lld",
"linker": "rust-lld",
```

Instead of using the platform's default linker (which might not support Linux targets), we use the cross-platform [LLD] linker that is shipped with Rust for linking our kernel.

[LLD]: https://lld.llvm.org/

```json
"panic-strategy": "abort",
```

This setting specifies that the target doesn't support [stack unwinding] on panic, so the program should abort directly instead. This has the same effect as the `panic = "abort"` option in our Cargo.toml, so we can remove it from there. (Note that, in contrast to the Cargo.toml option, this target option also applies when we recompile the `core` library later in this post. So make sure to add this option even if you prefer to keep the Cargo.toml option.)

[stack unwinding]: https://www.bogotobogo.com/cplusplus/stackunwinding.php

```json
"disable-redzone": true,
```

We're writing a kernel, so we'll need to handle interrupts at some point. To do that safely, we have to disable a certain stack pointer optimization called the _"red zone"_, because it would cause stack corruption otherwise. For more information, see our separate post about [disabling the red zone].

[disabling the red zone]: @/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.ru.md

```json
"features": "-mmx,-sse,+soft-float",
```

The `features` field enables/disables target features. We disable the `mmx` and `sse` features by prefixing them with a minus and enable the `soft-float` feature by prefixing it with a plus. Note that there must be no spaces between different flags, otherwise LLVM fails to interpret the features string.

The `mmx` and `sse` features determine support for [Single Instruction Multiple Data (SIMD)] instructions, which can often speed up programs significantly. However, using the large SIMD registers in OS kernels leads to performance problems. The reason is that the kernel needs to restore all registers to their original state before continuing an interrupted program. This means that the kernel has to save the complete SIMD state to main memory on each system call or hardware interrupt. Since the SIMD state is very large (512-1600 bytes) and interrupts can occur very often, these additional save/restore operations considerably harm performance. To avoid this, we disable SIMD for our kernel (not for applications running on top!).

[Single Instruction Multiple Data (SIMD)]: https://en.wikipedia.org/wiki/SIMD

A problem with disabling SIMD is that floating point operations on `x86_64` require SIMD registers by default. To solve this problem, we add the `soft-float` feature, which emulates all floating point operations through software functions based on normal integers.

For more information, see our post on [disabling SIMD](@/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.ru.md).

```json
"rustc-abi": "x86-softfloat"
```

As we want to use the `soft-float` feature, we also need to tell the Rust compiler `rustc` that we want to use the corresponding ABI. We can do that by setting the `rustc-abi` field to `x86-softfloat`.
#### Putting it Together

Our target specification file now looks like this:

```json
{
    "llvm-target": "x86_64-unknown-none",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "executables": true,
    "linker-flavor": "ld.lld",
    "linker": "rust-lld",
    "panic-strategy": "abort",
    "disable-redzone": true,
    "features": "-mmx,-sse,+soft-float",
    "rustc-abi": "x86-softfloat"
}
```

### Building our Kernel

Compiling for our new target will use Linux conventions (I'm not quite sure why; I assume it's just LLVM's default). This means that we need an entry point named `_start`, as described in the [previous post]:

[previous post]: @/edition-2/posts/01-freestanding-rust-binary/index.ru.md

```rust
// src/main.rs

#![no_std] // don't link the Rust standard library
#![no_main] // disable all Rust-level entry points

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    // this function is the entry point, since the linker looks for a function
    // named `_start` by default
    loop {}
}
```

Note that the entry point needs to be called `_start` regardless of your host OS.

We can now build the kernel for our new target by passing the name of the JSON file as `--target`:

```
> cargo build --target x86_64-blog_os.json

error: `.json` target specs require -Zjson-target-spec
```

It fails! The error tells us that custom JSON target specifications are an unstable feature that requires explicit opt-in. This is because the format of JSON target files is not considered stable yet, so it might change in future Rust versions. For more information, see the [tracking issue for custom JSON target specs][json-target-spec-issue].

[json-target-spec-issue]: https://github.com/rust-lang/rust/issues/151528

### The `json-target-spec` Option

To enable support for custom JSON target specifications, we need to create a [cargo configuration] file at `.cargo/config.toml` (the `.cargo` folder should be next to the `src` folder) with the following content:

[cargo configuration]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
```

This enables the unstable `json-target-spec` feature, which allows us to use custom JSON target files.

With this configuration in place, let's try to build again:

```
> cargo build --target x86_64-blog_os.json

error[E0463]: can't find crate for `core`
```

Now we see a different error! It tells us that the Rust compiler can no longer find the [`core` library]. This library contains basic Rust types such as `Result`, `Option`, and iterators, and is implicitly linked to all `no_std` crates.

[`core` library]: https://doc.rust-lang.org/nightly/core/index.html

The problem is that the `core` library is distributed together with the Rust compiler as a _precompiled_ library. So it is only valid for supported host triples (e.g., `x86_64-unknown-linux-gnu`), but not for our custom target. If we want to compile code for other targets, we need to recompile `core` for these targets first.

### The `build-std` Feature

This is where cargo's [`build-std` feature] comes in. It allows us to recompile `core` and other standard library crates on demand, instead of using the precompiled versions shipped with the Rust installation.
This feature is very new and not finished yet, so it is marked as "unstable" and is only available on [nightly Rust].

[`build-std` feature]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std
[nightly Rust]: #installing-rust-nightly

To use the feature, we need to add the following to our [cargo configuration] file at `.cargo/config.toml`:

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std = ["core", "compiler_builtins"]
```

This tells cargo that it should recompile the `core` and `compiler_builtins` libraries. The latter is required because `core` depends on it. In order to recompile these libraries, cargo needs access to the Rust source code, which we can install with `rustup component add rust-src`.
**Note:** The `unstable.build-std` configuration key requires at least the Rust nightly from 2020-07-15.
```
> cargo build --target x86_64-blog_os.json
   Compiling core v0.0.0 (/…/rust/src/libcore)
   Compiling rustc-std-workspace-core v1.99.0 (/…/rust/src/tools/rustc-std-workspace-core)
   Compiling compiler_builtins v0.1.32
   Compiling blog_os v0.1.0 (/…/blog_os)
    Finished dev [unoptimized + debuginfo] target(s) in 0.29 secs
```

We see that `cargo build` now recompiles the `core`, `rustc-std-workspace-core` (a dependency of `compiler_builtins`), and `compiler_builtins` libraries for our custom target.

#### Memory-Related Intrinsics

The Rust compiler assumes that a certain set of built-in functions is available on all systems. Most of these functions are provided by the `compiler_builtins` crate that we just recompiled. However, there are some memory-related functions in that crate that are not enabled by default, because they are normally provided by the C library on the system. These functions include `memset`, which sets all bytes in a memory block to a given value, `memcpy`, which copies one memory block to another, and `memcmp`, which compares two memory blocks. While we didn't need any of these functions to compile our kernel so far, they will be required as soon as we add more code to it (e.g., when copying structs around).

Since we can't link to the C library of the host operating system, we need an alternative way to provide these functions to the compiler. One possible approach would be to implement our own `memset` etc. functions and apply the `#[unsafe(no_mangle)]` attribute to them (to avoid the automatic renaming during compilation). However, this is dangerous, since the slightest mistake in the implementation of these functions could lead to undefined behavior.
For example, implementing `memcpy` with a `for` loop may result in infinite recursion, because `for` loops implicitly call the [`IntoIterator::into_iter`] trait method, which may call `memcpy` again. So it's a good idea to reuse existing, well-tested implementations instead.

[`IntoIterator::into_iter`]: https://doc.rust-lang.org/stable/core/iter/trait.IntoIterator.html#tymethod.into_iter

Fortunately, the `compiler_builtins` crate already contains implementations for all the needed functions; they are just disabled by default so that they don't collide with the implementations from the C library. We can enable them by setting cargo's [`build-std-features`] flag to `["compiler-builtins-mem"]`. Like the `build-std` flag, this flag can either be passed on the command line as a `-Z` flag or configured in the `unstable` table in the `.cargo/config.toml` file. Since we always want to build with this flag, the config file option makes more sense for us:

[`build-std-features`]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std-features

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std-features = ["compiler-builtins-mem"]
build-std = ["core", "compiler_builtins"]
```

(Support for the `compiler-builtins-mem` feature was [added in a separate pull request](https://github.com/rust-lang/rust/pull/77284), so you need at least Rust nightly `2020-09-30` for it.)

Behind the scenes, this flag enables the [`mem` feature] of the `compiler_builtins` crate. The effect is that the `#[unsafe(no_mangle)]` attribute is applied to the [`memcpy` etc. implementations] of that crate, which makes them available to the linker.

[`mem` feature]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/Cargo.toml#L54-L55
[`memcpy` etc. implementations]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/src/mem.rs#L12-L69

With this change, our kernel has valid implementations for all compiler-required functions, so it will continue to compile even as our code gets more complex.

#### Set a Default Target

To avoid passing the `--target` parameter on every invocation of `cargo build`, we can override the default target. To do this, we add the following to our [cargo configuration] file at `.cargo/config.toml`:

```toml
# in .cargo/config.toml

[build]
target = "x86_64-blog_os.json"
```

With this configuration, `cargo` uses our `x86_64-blog_os.json` target whenever no explicit `--target` argument is passed. This means that we can now build our kernel with a simple `cargo build`. For more information on cargo configuration options, check out the [official documentation][cargo configuration].

Our kernel now builds for a bare-metal target with a simple `cargo build`. However, our `_start` entry point, which will be called by the bootloader, is still empty. It's time to output something to the screen.

### Printing to Screen

The easiest way to print text to the screen at this stage is the [VGA text buffer]. It is a special memory area mapped to the VGA hardware that holds the contents displayed on screen. It normally consists of 25 lines that each contain 80 character cells. Each character cell displays an ASCII character with some foreground and background colors. The screen output looks like this:

[VGA text buffer]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode

![screen output for common ASCII characters](https://upload.wikimedia.org/wikipedia/commons/f/f8/Codepage-437.png)

We will discuss the exact layout of the VGA buffer in the next post, where we write a first small driver for it.
For printing "Hello World!", we just need to know that the buffer is located at address `0xb8000` and that each character cell consists of an ASCII byte and a color byte.

The implementation looks like this:

```rust
static HELLO: &[u8] = b"Hello World!";

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    let vga_buffer = 0xb8000 as *mut u8;

    for (i, &byte) in HELLO.iter().enumerate() {
        unsafe {
            *vga_buffer.offset(i as isize * 2) = byte;
            *vga_buffer.offset(i as isize * 2 + 1) = 0xb;
        }
    }

    loop {}
}
```

First, we cast the integer `0xb8000` into a [raw pointer]. Then we [iterate] over the bytes of the [static] `HELLO` [byte string]. We use the [`enumerate`] method to additionally get a running variable `i`. In the body of the for loop, we use the [`offset`] method to write the string byte and the corresponding color byte (`0xb` is a light cyan).

[iterate]: https://doc.rust-lang.org/stable/book/ch13-02-iterators.html
[static]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime
[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate
[byte string]: https://doc.rust-lang.org/reference/tokens.html#byte-string-literals
[raw pointer]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset

Note that there's an [`unsafe`] block around all memory writes. The reason is that the Rust compiler can't prove that the raw pointers we create are valid. They could point anywhere and lead to data corruption. By putting them into an `unsafe` block, we're basically telling the compiler that we are absolutely sure that the operations are valid. Note that an `unsafe` block does not turn off Rust's safety checks. It only allows you to do [five additional things].
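The cell layout described above (one ASCII byte followed by one color byte) can also be sketched without raw pointers. The following `write_cells` function is a hypothetical helper for illustration, not the abstraction developed in the next post. It assumes an ordinary byte slice instead of the memory at `0xb8000`, so it can be tried out on the host, and the slice bounds make it impossible to write past the end of the buffer.

```rust
/// Number of bytes per character cell: one ASCII byte plus one color byte.
const BYTES_PER_CELL: usize = 2;

/// Writes `text` into `buffer` using the two-bytes-per-cell layout and
/// returns the number of characters written.
///
/// Hypothetical helper: `buffer` is a plain slice here. In a kernel, a
/// slice covering the VGA memory would first have to be created from the
/// raw pointer inside an `unsafe` block.
pub fn write_cells(buffer: &mut [u8], text: &[u8], color: u8) -> usize {
    let mut written = 0;
    // `chunks_exact_mut` yields one two-byte cell at a time, and `zip`
    // stops as soon as either the buffer or the text is exhausted.
    for (cell, &byte) in buffer.chunks_exact_mut(BYTES_PER_CELL).zip(text) {
        cell[0] = byte; // ASCII character
        cell[1] = color; // color byte, e.g. 0xb for light cyan
        written += 1;
    }
    written
}
```

With an eight-byte buffer, only the first four characters of `b"Hello"` fit, so the call returns `4` and the remaining character is dropped instead of being written out of bounds.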
[`unsafe`]: https://doc.rust-lang.org/stable/book/ch19-01-unsafe-rust.html
[five additional things]: https://doc.rust-lang.org/stable/book/ch20-01-unsafe-rust.html#unsafe-superpowers

I want to emphasize that **this is not the way we want to do things in Rust!** It's very easy to mess up when working with raw pointers inside `unsafe` blocks. For example, we could easily write beyond the buffer's end if we're not careful.

So we want to minimize the use of `unsafe` as much as possible. Rust gives us the ability to do this by creating safe abstractions. For example, we could create a VGA buffer type that encapsulates all the unsafety and ensures that it is _impossible_ to do anything wrong from the outside. This way, we would only need a minimal number of `unsafe` blocks and could be sure that we don't violate [memory safety]. We will create such a safe VGA buffer abstraction in the next post.

[memory safety]: https://en.wikipedia.org/wiki/Memory_safety

## Running our Kernel

Now that we have an executable that does something perceptible, it is time to run it. First, we need to turn our compiled kernel into a bootable disk image by linking it with a bootloader. Then we can run the disk image in the [QEMU] virtual machine or boot it on real hardware using a USB stick.

### Creating a Bootimage

To turn our compiled kernel into a bootable disk image, we need to link it with a bootloader. As we learned in the [section about booting], the bootloader is responsible for initializing the CPU and loading our kernel.

[section about booting]: #the-boot-process

Instead of writing our own bootloader, which is a project on its own, we use the [`bootloader`] crate. This crate implements a basic BIOS bootloader without any C dependencies, just Rust and inline assembly.
To use it for booting our kernel, we need to add a dependency on it:

[`bootloader`]: https://crates.io/crates/bootloader

```toml
# in Cargo.toml

[dependencies]
bootloader = "0.9"
```

Adding the bootloader as a dependency is not enough to actually create a bootable disk image. The problem is that we need to link our kernel with the bootloader after compilation, but cargo has no support for [post-build scripts].

[post-build scripts]: https://github.com/rust-lang/cargo/issues/545

To solve this problem, we created a tool named `bootimage` that first compiles the kernel and the bootloader, and then links them together to create a bootable disk image. To install the tool, execute the following command in your terminal:

```
cargo install bootimage
```

For running `bootimage` and building the bootloader, you need to have the `llvm-tools-preview` rustup component installed. You can do so by executing `rustup component add llvm-tools-preview`.

After installing `bootimage` and adding the `llvm-tools-preview` component, we can create a bootable disk image by executing:

```
> cargo bootimage
```

We see that the tool recompiles our kernel using `cargo build`, so it will automatically pick up any changes you make. Afterwards, it compiles the bootloader, which might take a while. Like all crate dependencies, it is only built once and then cached, so subsequent builds will be much faster. Finally, `bootimage` combines the bootloader and your kernel into a bootable disk image.

After executing the command, you should see a bootable disk image named `bootimage-blog_os.bin` in your `target/x86_64-blog_os/debug` directory. You can boot it in a virtual machine or copy it to a USB drive to boot it on real hardware. (Note that this is not a CD image, which has a different format, so burning it to a CD doesn't work.)

#### How does it work?
The `bootimage` tool performs the following steps behind the scenes:

- It compiles our kernel to an [ELF] file.
- It compiles the bootloader dependency as a standalone executable.
- It links the bytes of the kernel ELF file to the bootloader.

[ELF]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
[rust-osdev/bootloader]: https://github.com/rust-osdev/bootloader

When booted, the bootloader reads and parses the appended ELF file. It then maps the program segments to virtual addresses in the page tables, zeroes the `.bss` section, and sets up a stack. Finally, it reads the entry point address (our `_start` function) and jumps to it.

### Booting it in QEMU

We can now boot the disk image in a virtual machine. To boot it in [QEMU], execute the following command:

[QEMU]: https://www.qemu.org/

```
> qemu-system-x86_64 -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
```

This opens a separate window that looks like this:

![QEMU showing "Hello World!"](qemu.png)

We see that our "Hello World!" is visible on the screen.

### Real Machine

It is also possible to write the image to a USB stick and boot it on a real machine:

```
> dd if=target/x86_64-blog_os/debug/bootimage-blog_os.bin of=/dev/sdX && sync
```

Here, `sdX` is the device name of your USB stick. **Check carefully** that you chose the correct device name, because everything on that device is overwritten.

After writing the image to the USB stick, you can run it on real hardware by booting from it. You probably need to use a special boot menu or change the boot order in your BIOS configuration to boot from the USB stick. Note that this currently doesn't work on UEFI machines, since the `bootloader` crate has no UEFI support yet.
### Using `cargo run`

To make it easier to run our kernel in QEMU, we can set the `runner` configuration key for cargo:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "none")']
runner = "bootimage runner"
```

The `target.'cfg(target_os = "none")'` table applies to all targets whose target configuration file sets the `"os"` field to `"none"`. This includes our `x86_64-blog_os.json` target. The `runner` key specifies the command that should be invoked for `cargo run`. The command is run after a successful build, with the path to the executable passed as the first argument. See the [cargo documentation][cargo configuration] for more details.

The `bootimage runner` command is specifically designed to be usable as a `runner` executable. It links the given executable with the project's bootloader dependency and then launches QEMU. See the [Readme of `bootimage`] for more details and possible configuration options.

[Readme of `bootimage`]: https://github.com/rust-osdev/bootimage

Now we can use `cargo run` to compile our kernel and boot it in QEMU.

## What's next?

In the next post, we will explore the VGA text buffer in more detail and write a safe interface for it. We will also add support for the `println` macro.
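================================================
FILE: blog/content/edition-2/posts/02-minimal-rust-kernel/index.zh-CN.md
================================================
+++
title = "A Minimal Kernel"
weight = 2
path = "zh-CN/minimal-rust-kernel"
date = 2018-02-10

[extra]
# Please update this when updating the translation
translation_based_on_commit = "096c044b4f3697e91d8e30a2e817e567d0ef21a2"
# GitHub usernames of the people that translated this post
translators = ["luojia65", "Rustin-Liu", "liuyuran"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = ["JiangengDong"]
+++

In this post, we write a minimal 64-bit kernel in Rust for the **x86 architecture**. We start from the [freestanding Rust binary][freestanding-rust-binary] built in the previous post and turn it into our own kernel: it prints a string to the screen and can be packaged into a bootable **disk image**.

[freestanding-rust-binary]: @/edition-2/posts/01-freestanding-rust-binary/index.md

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-02`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-02

## The Boot Process

When you turn on a computer, the **firmware** stored in the motherboard [ROM](https://en.wikipedia.org/wiki/Read-only_memory) starts to run. It performs the [power-on self test](https://en.wikipedia.org/wiki/Power-on_self-test), detects the **available RAM**, and pre-initializes the CPU and other hardware. Afterwards, it looks for a **bootable disk** and starts booting the **kernel** stored on it.

The x86 architecture supports two firmware standards: **BIOS** ([Basic Input/Output System](https://en.wikipedia.org/wiki/BIOS)) and **UEFI** ([Unified Extensible Firmware Interface](https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface)). The BIOS standard is old and outdated, but simple and supported by every x86 machine since the 1980s. UEFI, in contrast, is more modern and has many more features, but is more complex to set up (at least in my opinion).

For now, this post only covers booting through BIOS firmware, but UEFI support is planned as well. If you would like to help us with it, check out this [Github issue](https://github.com/phil-opp/blog_os/issues/349).

### BIOS Boot

Almost all x86 systems support BIOS booting, including newer UEFI-based machines that stay backward compatible through an **emulated BIOS**. This is a good thing, because the same boot logic works on all machines from the last century until today. But this wide compatibility is also the biggest drawback of BIOS booting: before booting, the CPU has to be put into a 16-bit compatibility mode called [real mode](https://en.wikipedia.org/wiki/Real_mode), so that archaic bootloaders from the 1980s still work.

Let's start from the beginning and walk through the BIOS boot process.

When you turn on a computer, it loads the BIOS firmware from a special flash memory on the motherboard. The BIOS runs self-test and hardware initialization routines, then it looks for a bootable disk. If it finds one, control is transferred to its **bootloader**, a 512-byte piece of executable code stored at the beginning of the disk. Most bootloaders are larger than 512 bytes, so they are commonly split into a small **first stage bootloader**, which fits into 512 bytes and is stored at the start of the disk, and a **second stage bootloader**, which is loaded by the first stage afterwards, may be larger, and is stored elsewhere.

The bootloader has to determine the location of the kernel and load it into memory. It also needs to switch the CPU from 16-bit real mode first to 32-bit [protected mode](https://en.wikipedia.org/wiki/Protected_mode) and then to 64-bit [long mode](https://en.wikipedia.org/wiki/Long_mode), where all 64-bit registers and the complete **main memory** are available. Its third job is to query certain information from the BIOS, such as the **memory map**, and pass it on to the kernel.

Writing a bootloader is not an easy task, because it requires assembly language and many steps whose purpose is not obvious, such as writing some **magic number** into a particular register. Therefore, we don't cover how to write a bootloader in this post and instead recommend the [bootimage tool](https://github.com/rust-osdev/bootimage), which automatically and conveniently prepares a bootloader for your kernel.

### The Multiboot Standard

To avoid a situation in which every operating system implements its own bootloader that is only compatible with that single OS, the [Free Software Foundation](https://en.wikipedia.org/wiki/Free_Software_Foundation) published an open bootloader standard called [Multiboot](https://wiki.osdev.org/Multiboot) in 1995. The standard defines a uniform interface between the bootloader and the operating system, so that any Multiboot-compliant bootloader can load any Multiboot-compliant operating system. [GNU GRUB](https://en.wikipedia.org/wiki/GNU_GRUB) is the reference implementation of Multiboot and also one of the most popular bootloaders for Linux systems.

To make a kernel Multiboot compliant, we only need to insert a piece of data called the [Multiboot header](https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#OS-image-format) at the beginning of the kernel file. This makes it very easy to boot an operating system from GRUB. However, GRUB and the Multiboot standard also have some known problems:

1. They support only the 32-bit protected mode. This means that after booting, you still have to configure the CPU to switch to the 64-bit long mode.
2. They are designed to keep the bootloader simple instead of the kernel. For example, the kernel needs to be linked with an adjusted [default page size](https://wiki.osdev.org/Multiboot#Multiboot_2), because otherwise GRUB cannot find the kernel's Multiboot header. Another example is the [boot information](https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Boot-information-format), which contains lots of architecture-dependent data and is passed directly to the operating system at boot time instead of going through a clean abstraction layer.
3. GRUB and the Multiboot standard are only sparsely documented, so reading the documentation requires some experience.
4. GRUB must be installed on the development machine in order to create a bootable disk image. This makes kernel development on Windows or macOS more difficult.

Because of these drawbacks, we decided not to use GRUB or the Multiboot standard. However, Multiboot support is on the roadmap of the bootimage tool, so in principle it will be possible to boot your kernel through GRUB in the future if you use bootimage. If you are interested in writing a Multiboot-compliant kernel, check out the [first edition] of this blog series.

[first edition]: @/edition-1/_index.md

### UEFI

(We don't provide a UEFI tutorial at the moment, but we would love to. If you would like to help, please tell us in the [Github issue](https://github.com/phil-opp/blog_os/issues/349).)

## A Minimal Kernel

Now that we know how a computer boots, it's time to write our own kernel. Our goal is to create a disk image of a kernel that prints "Hello World!" to the screen when booted. We build on the [freestanding Rust binary][freestanding-rust-binary] from the previous post.

As you may remember, we built the freestanding binary through `cargo` in the previous post, but that binary still depended on a specific operating system: depending on the platform, we needed different entry point names and different compile flags. That's because `cargo` builds for the **host system** by default, i.e., the system you are running on. This is not what we want, because our kernel should not sit on top of another operating system; the operating system is exactly what we want to write. Instead, we want to compile for a clearly defined **target system**.

### Installing Rust Nightly

Rust has three **release channels**: stable, beta, and nightly. The Rust Book explains the difference between these channels really well, so have a look [here](https://doc.rust-lang.org/book/appendix-07-nightly-rust.html). For building an operating system, we need some experimental features that are only available on nightly, so we need to install a nightly version of Rust.

To manage Rust installations, I highly recommend [rustup](https://www.rustup.rs/). It allows you to install nightly, beta, and stable compilers side by side and makes it easy to update them. You can run `rustup override set nightly` to use a nightly compiler in the current directory. Alternatively, you can add a file named `rust-toolchain` with the content `nightly` to the project's root directory. To check that you have a nightly version installed, run `rustc --version`: the version number should contain `-nightly` at the end.

The nightly compiler allows us to opt in to various experimental features by adding so-called **feature flags** at the top of our source file. For example, the [`asm!` macro for inline assembly][asm feature] used to be enabled by adding `#![feature(asm)]` to the top of `main.rs` (it has since been stabilized). Note that such experimental features are **unstable**, which means that future Rust versions might change or remove them without prior warning. For this reason, we only use them when absolutely necessary.

[asm feature]: https://doc.rust-lang.org/stable/reference/inline-assembly.html

### Target Specification

Cargo supports different target systems through the `--target` parameter. The target is described by a so-called [target triple](https://clang.llvm.org/docs/CrossCompilation.html#target-triple), which describes the CPU architecture, the vendor, the operating system, and the [Application Binary Interface (ABI)](https://stackoverflow.com/a/2456882). For example, the `x86_64-unknown-linux-gnu` target triple describes a Linux system with an `x86_64` CPU and no specific vendor that follows the GNU ABI. Rust supports [many different target triples](https://forge.rust-lang.org/release/platform-support.html), including `arm-linux-androideabi` for Android and [`wasm32-unknown-unknown` for WebAssembly](https://www.hellorust.com/setup/wasm-target/).

For our target system, however, we need some special configuration (for example, there is no underlying operating system), so none of the [existing target triples](https://forge.rust-lang.org/release/platform-support.html) fits. Fortunately, Rust allows us to define our own target through a JSON file, often called a **target specification**. For example, a specification that describes the `x86_64-unknown-linux-gnu` target looks roughly like this:

```json
{
    "llvm-target": "x86_64-unknown-linux-gnu",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "linux",
    "executables": true,
    "linker-flavor": "gcc",
    "pre-link-args": ["-m64"],
    "morestack": false
}
```

A specification consists of multiple **fields**. Most fields are required by LLVM to generate code for that platform. For example, the `data-layout` field defines the sizes of the various integer, floating point, and pointer types. Then there are fields that Rust uses for conditional compilation, such as `target-pointer-width`. A third kind of field defines how the crate should be built. For example, the `pre-link-args` field specifies arguments that are passed to the [linker](https://en.wikipedia.org/wiki/Linker_(computing)).

Our kernel also targets the `x86_64` architecture, so our specification will look similar to the one above. Let's start by creating a file named `x86_64-blog_os.json` (you can choose any name you like) with the following content:

```json
{
    "llvm-target": "x86_64-unknown-none",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "executables": true
}
```

Note that we changed the `llvm-target` field and set the `os` field to `none`, because our kernel will run on **bare metal**.

We also add the following build-related fields:

```json
"linker-flavor": "ld.lld",
"linker": "rust-lld",
```

Here, we don't use the platform's default linker, because it might not support Linux targets. Instead, we link our kernel with the cross-platform [LLD linker](https://lld.llvm.org/), which is shipped with Rust.

```json
"panic-strategy": "abort",
```

This field specifies that our target doesn't support [stack unwinding](https://www.bogotobogo.com/cplusplus/stackunwinding.php) on panic, so the program should **abort on panic** instead. This has the same effect as the `panic = "abort"` option in `Cargo.toml`, so we can remove that option from `Cargo.toml`.

```json
"disable-redzone": true,
```

We are writing a kernel, so sooner or later we will need to handle interrupts. To do that safely, we have to disable a stack pointer optimization related to the **red zone**, because it would otherwise cause stack corruption. For more information, see our short post about [disabling the red zone].

[disabling the red zone]: @/edition-2/posts/02-minimal-rust-kernel/disable-red-zone/index.zh-CN.md

```json
"features": "-mmx,-sse,+soft-float",
```

The `features` field enables or disables target **CPU features**. We disable the `mmx` and `sse` features by prefixing them with `-` and enable the `soft-float` feature by prefixing it with `+`.

The `mmx` and `sse` features determine support for [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/SIMD) instructions, which can often speed up programs significantly. However, using the large SIMD registers in a kernel can hurt performance: whenever a program is interrupted, the kernel has to save the complete SIMD state so that it can be restored later. This means that the full SIMD state must be stored to main memory on each hardware interrupt or system call. Since the SIMD state can be quite large (512-1600 bytes) and interrupts can occur very often, these additional save and restore operations can reduce performance considerably. To avoid this, we disable SIMD for our kernel (this does not disable SIMD support for applications running on top of it).

A problem with disabling SIMD is that floating point operations on `x86_64` rely on SIMD registers by default. Our solution is to enable the `soft-float` feature, which emulates floating point operations through software functions based on integers.

For more information, see our short post about [disabling SIMD].

[disabling SIMD]: @/edition-2/posts/02-minimal-rust-kernel/disable-simd/index.zh-CN.md

```json
"rustc-abi": "x86-softfloat"
```

As we want to use the `soft-float` feature, we also need to tell the Rust compiler `rustc` that we want to use the corresponding ABI. We can do that by setting the `rustc-abi` field to `x86-softfloat`.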
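To give a rough idea of what it means to handle floating point values with integer operations only, the following sketch takes an `f32` apart into its IEEE 754 fields and negates a value by flipping its sign bit. The `decompose` and `negate` functions are hypothetical helpers for illustration. The actual `soft-float` routines are considerably more involved (they also implement addition, multiplication, rounding, and so on), so this only shows the principle.

```rust
/// Splits an `f32` into its IEEE 754 fields using only integer operations:
/// (sign bit, biased 8-bit exponent, 23-bit mantissa).
///
/// Hypothetical helper for illustration only.
pub fn decompose(value: f32) -> (u32, u32, u32) {
    let bits = value.to_bits(); // reinterpret the float as a plain integer
    let sign = bits >> 31;
    let exponent = (bits >> 23) & 0xff;
    let mantissa = bits & 0x007f_ffff;
    (sign, exponent, mantissa)
}

/// Negates an `f32` without any floating point instruction by flipping
/// the sign bit of its integer representation.
pub fn negate(value: f32) -> f32 {
    f32::from_bits(value.to_bits() ^ 0x8000_0000)
}
```

For example, `1.0` is stored with sign `0`, biased exponent `127`, and an all-zero mantissa, and `negate(1.5)` returns `-1.5`.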
Now we put all the fields together. Our target specification should look like this:

```json
{
    "llvm-target": "x86_64-unknown-none",
    "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
    "arch": "x86_64",
    "target-endian": "little",
    "target-pointer-width": 64,
    "target-c-int-width": 32,
    "os": "none",
    "executables": true,
    "linker-flavor": "ld.lld",
    "linker": "rust-lld",
    "panic-strategy": "abort",
    "disable-redzone": true,
    "features": "-mmx,-sse,+soft-float",
    "rustc-abi": "x86-softfloat"
}
```

### Building our Kernel

Compiling for our new target will use Linux conventions (this is probably LLVM's default). This means that we need to rename the entry point written in the [previous post] to `_start`:

[previous post]: @/edition-2/posts/01-freestanding-rust-binary/index.md

```rust
// src/main.rs

#![no_std] // don't link the Rust standard library
#![no_main] // disable all Rust-level entry points

use core::panic::PanicInfo;

/// This function is called on panic.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    // this function is the entry point, since the linker looks for a function
    // named `_start` by default
    loop {}
}
```

Note that the entry point needs to be called `_start` regardless of the operating system you develop on. The Windows and macOS entry points from the previous post should not be kept.

We can now build our kernel by passing the name of the JSON file as `--target`. Let's try it:

```
> cargo build --target x86_64-blog_os.json

error: `.json` target specs require -Zjson-target-spec
```

Unsurprisingly, the build fails. The error tells us that custom JSON target specifications are an unstable feature that requires explicit opt-in. This is because the format of JSON target files is not considered stable yet, so it might change in future Rust versions. For more information, see the [tracking issue for custom JSON target specs][json-target-spec-issue].

[json-target-spec-issue]: https://github.com/rust-lang/rust/issues/151528

#### The `json-target-spec` Option

To enable support for custom JSON target specifications, we need to create a [cargo configuration] file at `.cargo/config.toml` (the `.cargo` folder should be next to the `src` folder) with the following content:

[cargo configuration]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
```

This enables the unstable `json-target-spec` feature, which allows us to use custom JSON target files.

With this configuration in place, let's try to build again:

```
> cargo build --target x86_64-blog_os.json

error[E0463]: can't find crate for `core`
```

Now we see a different error! It tells us that the compiler cannot find the [`core`][`core` library] crate. This crate contains basic Rust types such as `Result`, `Option`, and iterators, and is implicitly linked to all `no_std` crates.

[`core` library]: https://doc.rust-lang.org/nightly/core/index.html

Normally, the `core` crate is distributed together with the Rust compiler as a **precompiled library**. In that form, it is only valid for supported host systems, but not for our custom target. If we want to compile code for other targets, we need to recompile the whole `core` crate for these targets first.

#### The `build-std` Option

This is where cargo's [`build-std` feature] comes in. It allows us to recompile `core` and other standard library crates on demand, instead of using the precompiled versions shipped with the Rust installation. This feature is very new and not finished yet, so it is marked as "unstable" and is only available on [nightly Rust compilers][Nightly Rust compilers].

[`build-std` feature]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std
[Nightly Rust compilers]: https://os.phil-opp.com/zh-CN/minimal-rust-kernel/#an-zhuang-nightly-rust

To use the feature, add the following to the [cargo configuration] file at `.cargo/config.toml`:

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std = ["core", "compiler_builtins"]
```

This tells cargo that it should recompile the `core` and `compiler_builtins` crates. The latter is required because `core` depends on it. Recompiling these crates requires their source code, which we can download with `rustup component add rust-src`.
**Note:** The `unstable.build-std` configuration key requires a Rust nightly from 2020-07-15 or later.
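After setting the `unstable.build-std` configuration key and installing the `rust-src` component, we can build again:

```
> cargo build --target x86_64-blog_os.json
   Compiling core v0.0.0 (/…/rust/src/libcore)
   Compiling rustc-std-workspace-core v1.99.0 (/…/rust/src/tools/rustc-std-workspace-core)
   Compiling compiler_builtins v0.1.32
   Compiling blog_os v0.1.0 (/…/blog_os)
    Finished dev [unoptimized + debuginfo] target(s) in 0.29 secs
```

As you can see, `cargo build` now recompiles the `core`, `rustc-std-workspace-core` (a dependency of `compiler_builtins`), and `compiler_builtins` crates.

#### Memory-Related Intrinsics

The Rust compiler assumes that a certain set of built-in functions is available on all systems. This assumption is only half true: most of these functions are provided by `compiler_builtins`, which we just recompiled, but some memory-related functions are normally provided by the C library of the operating system. Examples are `memset` (sets all bytes of a memory block to a given value), `memcpy` (copies one memory block to another), and `memcmp` (compares two memory blocks). Our kernel does not need these functions yet, but it will as soon as we add more code, for example when copying structs around.

Since we cannot link against the C library of an operating system, we need another way to provide these functions. One obvious approach is to implement `memset` and friends ourselves and mark them with `#[unsafe(no_mangle)]` so that they are not renamed during compilation. However, this is dangerous, because the slightest mistake in these low-level functions can lead to undefined behavior. For example, if you implement `memcpy` with a `for` loop, you can end up with infinite recursion: `for` loops implicitly call the [`IntoIterator::into_iter`] trait method, which may in turn call `memcpy`. It is therefore more reliable to reuse existing, well-tested implementations.

[`IntoIterator::into_iter`]: https://doc.rust-lang.org/stable/core/iter/trait.IntoIterator.html#tymethod.into_iter

Fortunately, `compiler_builtins` already contains implementations of all the needed functions. They are only disabled by default to avoid clashing with the implementations of the C library. We can enable them by setting the [`build-std-features`] option to `["compiler-builtins-mem"]`. Like `build-std`, this option can be passed with a `-Z` flag on the command line or set in the `unstable` table of `.cargo/config.toml`. The relevant part of our configuration file now looks like this:

[`build-std-features`]: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std-features

```toml
# in .cargo/config.toml

[unstable]
json-target-spec = true
build-std-features = ["compiler-builtins-mem"]
build-std = ["core", "compiler_builtins"]
```

(Support for the `compiler-builtins-mem` feature was added in [this PR](https://github.com/rust-lang/rust/pull/77284), so you need a Rust nightly from 2020-09-30 or later.)

This option enables the [`mem` feature] of `compiler_builtins`. The effect is that the crate internally uses `#[unsafe(no_mangle)]` to provide the linker with the [`memcpy`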
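etc. implementations][`memcpy` etc. implementations].

[`mem` feature]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/Cargo.toml#L54-L55
[`memcpy` etc. implementations]: https://github.com/rust-lang/compiler-builtins/blob/eff506cd49b637f1ab5931625a33cef7e91fbbf6/src/mem.rs#L12-L69

With these changes, our kernel has all the functions it needs in order to compile, so let's continue improving the code.

#### Set a Default Target

Passing the `--target` parameter on every invocation of `cargo build` is tedious. We can override the default target instead, so that the parameter can be omitted. To do this, we add the following to the [cargo configuration] file at `.cargo/config.toml`:

[cargo configuration]: https://doc.rust-lang.org/cargo/reference/config.html

```toml
# in .cargo/config.toml

[build]
target = "x86_64-blog_os.json"
```

This tells `cargo` to use our `x86_64-blog_os.json` target when no explicit `--target` argument is passed, so a plain `cargo build` now compiles for our target. If you are interested in the other configuration options, see the [official documentation][cargo configuration].

We can now build our kernel with `cargo build`. However, the body of our `_start` function, which is called at boot, is still an empty loop. It's time to output something to the screen.

### Printing to Screen

The easiest way to print text to the screen at this stage is the **VGA text buffer** ([VGA text buffer](https://en.wikipedia.org/wiki/VGA-compatible_text_mode)). It is a special memory area mapped to the VGA hardware that contains the contents displayed on screen. It normally consists of 25 lines of 80 **character cells** each, 2000 cells in total. Each character cell displays one ASCII character with a **foreground color** and a **background color**. The screen output looks like this:

![](https://upload.wikimedia.org/wikipedia/commons/6/6d/Codepage-737.png)

We will discuss the exact layout of the VGA buffer in the next post. For now, we only need to know that the buffer is located at address `0xb8000` and that each character cell consists of an ASCII byte and a color byte.

The implementation looks like this:

```rust
static HELLO: &[u8] = b"Hello World!";

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !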
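{
    let vga_buffer = 0xb8000 as *mut u8;

    for (i, &byte) in HELLO.iter().enumerate() {
        unsafe {
            *vga_buffer.offset(i as isize * 2) = byte;
            *vga_buffer.offset(i as isize * 2 + 1) = 0xb;
        }
    }

    loop {}
}
```

In this code, we define a **static variable** named `HELLO` that holds a **byte string**. We first **cast** the integer `0xb8000` into a **raw pointer** ([raw pointer]). Then we iterate over the bytes of `HELLO`, using [enumerate](https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate) to get an additional running index `i`. In the body of the `for` loop, we use [offset](https://doc.rust-lang.org/std/primitive.pointer.html#method.offset) to move the raw pointer and dereference it in order to write each string byte and the corresponding color byte (`0xb` is light cyan) to memory.

[raw pointer]: https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#dereferencing-a-raw-pointer

Note that all memory operations on the raw pointer are wrapped in an **unsafe block** ([unsafe block](https://doc.rust-lang.org/stable/book/second-edition/ch19-01-unsafe-rust.html)). The reason is that the compiler cannot verify that the raw pointers we create are valid. A raw pointer could point to any memory location, so dereferencing and writing to it might corrupt unrelated data. By using an `unsafe` block, the programmer tells the compiler that they guarantee the operations inside the block are valid. An `unsafe` block does not turn off Rust's safety checks; it only allows you to do [four additional things][unsafe superpowers].

[unsafe superpowers]: https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#unsafe-superpowers

Using `unsafe` blocks requires the programmer to be sure of what they are doing, so it must be stressed that **using unsafe blocks freely is not how we want to program in Rust**. Without enough care, working with raw pointers inside `unsafe` blocks goes wrong very easily. For example, we could accidentally write to memory beyond the end of the buffer.

We therefore want to minimize the use of `unsafe`. Rust lets us wrap unsafe operations in a safe abstraction. For example, we could create a VGA buffer type that encapsulates all unsafety and ensures that code using the type from the outside cannot do anything unsafe. This way, we need only a minimal number of `unsafe` blocks and can be confident that we do not violate **memory safety** ([memory safety](https://en.wikipedia.org/wiki/Memory_safety)). We will create such a safe VGA buffer abstraction in the next post.

## Running our Kernel

Now that we have an executable that prints something, it is time to run it. First, we combine the compiled kernel with a bootloader to create a bootable disk image. Then we can run the image in the QEMU virtual machine or boot it on real hardware from a USB stick.

### Creating a Bootimage

To turn our executable into a **bootable disk image**, we need to link it with a bootloader. The bootloader is responsible for initializing the CPU and loading our kernel.

Writing a bootloader is not easy, so instead of writing our own, we use the existing [bootloader](https://crates.io/crates/bootloader) crate. It implements a basic BIOS bootloader in Rust and inline assembly, without any C dependencies. To use it for booting our kernel, we add it as a dependency in our `Cargo.toml`:

```toml
# in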
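# Cargo.toml

[dependencies]
bootloader = "0.9"
```

**Note:** This post is only compatible with `bootloader v0.9`. Newer versions need a different build setup and lead to build errors when used with the steps described here.

Adding the bootloader as a dependency is not enough to create a bootable disk image. We also need to combine the kernel with the bootloader after the kernel has been compiled. However, cargo currently has no support for running additional steps after a build (see [this issue](https://github.com/rust-lang/cargo/issues/545) for details).

To solve this problem, we recommend the `bootimage` tool. It first compiles the kernel and the bootloader and then combines them into a bootable disk image. The tool can be installed with the following command:

```bash
cargo install bootimage
```

To run `bootimage` and build the bootloader, the `llvm-tools-preview` rustup component must be installed. You can install it with `rustup component add llvm-tools-preview`.

After installing `bootimage`, creating a bootable disk image is easy. We run the following command:

```bash
> cargo bootimage
```

The `bootimage` tool recompiles your kernel with `cargo build`, so it picks up any changes to the source incrementally. Afterwards, it compiles the bootloader, which might take a while. Like all crate dependencies, the bootloader is only built once and then cached, which speeds up subsequent builds considerably. Finally, `bootimage` combines the kernel and the bootloader into a bootable disk image.

After running the command, there should be a disk image named `bootimage-blog_os.bin` in the `target/x86_64-blog_os/debug` directory. You can boot it in a virtual machine or write it to a USB stick to boot it on real hardware. (Note that this `.bin` file is not a CD image, which has a different format, so burning it to a CD does not work.)

Behind the scenes, the `bootimage` tool performs three steps:

1. It compiles our kernel to an **ELF** ([Executable and Linkable Format](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format)) file.
2. It compiles the bootloader as a standalone executable.
3. It appends the bytes of the kernel ELF file to the end of the bootloader.

When the machine boots, the bootloader reads and parses the appended ELF file. It then maps the program segments to **virtual addresses** in the **page tables**, zeroes the **BSS segment**, and sets up a stack. Finally, it reads the **entry point address** (the location of our `_start` function) and jumps to it.

### Booting it in QEMU

We can now boot the kernel in a virtual machine. To boot it in [QEMU](https://www.qemu.org/), we run the following command:

```bash
> qemu-system-x86_64 -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
```

This opens a separate window:

![QEMU showing "Hello World!"](qemu.png)

We can see that the "Hello World!" string is displayed on the screen. Congratulations!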
### Running on a Real Machine

We can also write the kernel to a USB stick with the dd tool and boot it on a real machine:

```bash
> dd if=target/x86_64-blog_os/debug/bootimage-blog_os.bin of=/dev/sdX && sync
```

Here, `sdX` is the **device name** ([device name](https://en.wikipedia.org/wiki/Device_file)) of your USB stick. **Be extremely careful to choose the correct device name, because all existing data on the target device is overwritten.**

After writing the image to the USB stick, you can run it on real hardware by booting from the stick. Depending on the machine, you may need to open a special boot menu or change the boot order in the BIOS settings. Note that the `bootloader` crate does not support UEFI yet, so the image cannot be booted on UEFI machines.

### Using `cargo run`

To make it easier to run our kernel in QEMU, we can set the `runner` configuration key in the cargo configuration file:

```toml
# in .cargo/config.toml

[target.'cfg(target_os = "none")']
runner = "bootimage runner"
```

The `target.'cfg(target_os = "none")'` table applies to all targets whose `"os"` field is set to `"none"`, which includes our `x86_64-blog_os.json` target. The `runner` key specifies the command that `cargo run` invokes. The command runs after a successful build, with the path of the executable passed as the first argument. See the [official cargo documentation](https://doc.rust-lang.org/cargo/reference/config.html) for more details.

The `bootimage runner` command is provided by the `bootimage` crate and is specifically designed to be usable as a `runner` executable. It links the given executable with the bootloader dependency of the project and then launches it in QEMU. See the [README](https://github.com/rust-osdev/bootimage) of `bootimage` for more details and the available configuration options.

Now we can use `cargo run` to compile our kernel and boot it in QEMU.

## What's next?

In the next post, we will explore the VGA text buffer in more detail and wrap it in a safe interface. We will also implement the `println!` macro on top of it.

================================================
FILE: blog/content/edition-2/posts/03-vga-text-buffer/index.es.md
================================================
+++
title = "Modo de Texto VGA"
weight = 3
path = "es/modo-texto-vga"
date = 2018-02-26

[extra]
chapter = "Fundamentos"
# GitHub usernames of the people that translated this post
translators = ["dobleuber"]
+++

The [VGA text mode] is a simple way to print text to the screen. In this post, we create an interface that makes its usage safe and simple by encapsulating all unsafety in a separate module. We also implement support for Rust's [formatting macros].

[VGA text mode]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode
[formatting macros]: https://doc.rust-lang.org/std/fmt/#related-macros

This blog is openly developed on [GitHub].
If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-03`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-03

## The VGA Text Buffer

To print a character to the screen in VGA text mode, one has to write it to the text buffer of the VGA hardware. The VGA text buffer is a two-dimensional array with typically 25 rows and 80 columns, which is rendered directly to the screen. Each array entry describes a single screen character through the following format:

| Bit(s) | Value            |
| ------ | ---------------- |
| 0-7    | ASCII code point |
| 8-11   | Foreground color |
| 12-14  | Background color |
| 15     | Blink            |

The first byte represents the character that should be printed in the [ASCII encoding]. To be more specific, it is not exactly ASCII, but a character set named [_code page 437_] with some additional characters and slight modifications. For simplicity, we will keep calling it an ASCII character in this post.

[ASCII encoding]: https://en.wikipedia.org/wiki/ASCII
[_code page 437_]: https://en.wikipedia.org/wiki/Code_page_437

The second byte defines how the character is displayed. The first four bits define the foreground color, the next three bits the background color, and the last bit whether the character should blink.
The following colors are available:

| Number | Color      | Number + Bright Bit | Bright Color |
| ------ | ---------- | ------------------- | ------------ |
| 0x0    | Black      | 0x8                 | Dark Gray    |
| 0x1    | Blue       | 0x9                 | Light Blue   |
| 0x2    | Green      | 0xa                 | Light Green  |
| 0x3    | Cyan       | 0xb                 | Light Cyan   |
| 0x4    | Red        | 0xc                 | Light Red    |
| 0x5    | Magenta    | 0xd                 | Pink         |
| 0x6    | Brown      | 0xe                 | Yellow       |
| 0x7    | Light Gray | 0xf                 | White        |

Bit 4 is the _bright bit_, which turns, for example, blue into light blue. For the background color, this bit is repurposed as the blink bit.

The VGA text buffer is accessible via [memory-mapped I/O] at the address `0xb8000`. This means that reads and writes to that address do not access the RAM but directly access the text buffer on the VGA hardware. We can therefore read and write it through normal memory operations to that address.

[memory-mapped I/O]: https://en.wikipedia.org/wiki/Memory-mapped_I/O

Note that memory-mapped hardware might not support all normal RAM operations. For example, a device could support only byte-wise reads and return garbage when a `u64` is read. Fortunately, the text buffer [supports normal reads and writes], so we do not have to treat it in a special way.

[supports normal reads and writes]: https://web.stanford.edu/class/cs140/projects/pintos/specs/freevga/vga/vgamem.htm#manip

## A Rust Module

Now that we know how the VGA buffer works, we can create a Rust module to handle printing:

```rust
// in src/main.rs
mod vga_buffer;
```

For the content of this module, we create a new `src/vga_buffer.rs` file. All of the code below goes into our new module (unless specified otherwise).
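The bit layout from the format table above can be sketched as a small packing function. This is only an illustration that runs on a normal host: the name `vga_cell` and its parameters are assumptions made for this example and are not part of the post's `vga_buffer` module.

```rust
/// Packs a character byte and its attributes into one 16-bit VGA text cell.
///
/// Hypothetical helper for illustration only. Layout, as in the table:
/// bits 0-7 code point, bits 8-11 foreground, bits 12-14 background, bit 15 blink.
pub fn vga_cell(character: u8, foreground: u8, background: u8, blink: bool) -> u16 {
    let fg = (foreground & 0x0f) as u16; // four bits, including the bright bit
    let bg = (background & 0x07) as u16; // three bits, the fourth is the blink bit
    (character as u16) | (fg << 8) | (bg << 12) | ((blink as u16) << 15)
}
```

On a little-endian machine such as x86_64, the low byte of this value (the character) is stored first in memory, which matches the description of the character byte followed by the color byte.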
### Colors

First, we represent the different colors using an enum:

```rust
// in src/vga_buffer.rs

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Color {
    Black = 0,
    Blue = 1,
    Green = 2,
    Cyan = 3,
    Red = 4,
    Magenta = 5,
    Brown = 6,
    LightGray = 7,
    DarkGray = 8,
    LightBlue = 9,
    LightGreen = 10,
    LightCyan = 11,
    LightRed = 12,
    Pink = 13,
    Yellow = 14,
    White = 15,
}
```

We use a [C-like enum] here to explicitly specify the number for each color. Because of the `repr(u8)` attribute, each enum variant is stored as a `u8`. Actually, 4 bits would be sufficient, but Rust does not have a `u4` type.

[C-like enum]: https://doc.rust-lang.org/rust-by-example/custom_types/enum/c_like.html

Normally, the compiler would issue a warning for each unused variant. By using the `#[allow(dead_code)]` attribute, we disable these warnings for the `Color` enum.

By [deriving] the [`Copy`], [`Clone`], [`Debug`], [`PartialEq`], and [`Eq`] traits, we enable [copy semantics] for the type and make it printable and comparable.

[deriving]: https://doc.rust-lang.org/rust-by-example/trait/derive.html
[`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
[`Clone`]: https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html
[`Debug`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html
[`PartialEq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.PartialEq.html
[`Eq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.Eq.html
[copy semantics]: https://doc.rust-lang.org/1.30.0/book/first-edition/ownership.html#copy-types

To represent a full color code that specifies foreground and background color, we create a [newtype] on top of `u8`:

[newtype]: https://doc.rust-lang.org/rust-by-example/generics/new_types.html

```rust
// in src/vga_buffer.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
struct ColorCode(u8);

impl ColorCode {
    fn new(foreground: Color, background: Color) -> ColorCode {
        ColorCode((background as u8) << 4 | (foreground as u8))
    }
}
```

The `ColorCode` struct contains the full color byte, which includes the foreground and background color. Like before, we derive the `Copy` and `Debug` traits for it.
To ensure that `ColorCode` has the exact same data layout as a `u8`, we use the [`repr(transparent)`] attribute.

[`repr(transparent)`]: https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent

### Text Buffer

Now we can add structures to represent a screen character and the text buffer:

```rust
// in src/vga_buffer.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct ScreenChar {
    ascii_character: u8,
    color_code: ColorCode,
}

const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

#[repr(transparent)]
struct Buffer {
    chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

Since the field ordering in default structs is undefined in Rust, we need the [`repr(C)`] attribute. It guarantees that the fields of the struct are laid out exactly like in a C struct and thus guarantees the correct field ordering. For the `Buffer` struct, we use [`repr(transparent)`] again to ensure that it has the same memory layout as its single field.

[`repr(C)`]: https://doc.rust-lang.org/nightly/nomicon/other-reprs.html#reprc

To actually write to the screen, we now create a writer type:

```rust
// in src/vga_buffer.rs

pub struct Writer {
    column_position: usize,
    color_code: ColorCode,
    buffer: &'static mut Buffer,
}
```

The writer always writes to the last line and shifts lines up when a line is full (or on `\n`). The `column_position` field keeps track of the current position in the last row. The current foreground and background colors are specified by `color_code`, and a reference to the VGA buffer is stored in `buffer`. Note that we need an [explicit lifetime][vida útil explícita] here to tell the compiler how long the reference is valid. The [`'static`] lifetime specifies that the reference is valid for the whole run time of the program (which is true for the VGA text buffer).
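The layout guarantees of `repr(C)` and `repr(transparent)` described above can be checked with compile-time assertions. The following is a minimal sketch that assumes a host build with the types copied from this post; the `const _` assertions are an addition for illustration and not part of the post's code.

```rust
use core::mem::size_of;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
struct ColorCode(u8);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct ScreenChar {
    ascii_character: u8,
    color_code: ColorCode,
}

const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

#[repr(transparent)]
struct Buffer {
    chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT],
}

// Illustrative checks (not in the post): compilation fails if a layout
// assumption about the VGA cell or the whole buffer does not hold.
const _: () = assert!(size_of::<ColorCode>() == 1);
const _: () = assert!(size_of::<ScreenChar>() == 2);
const _: () = assert!(size_of::<Buffer>() == 2 * BUFFER_WIDTH * BUFFER_HEIGHT);
```

Each screen character occupies two bytes, so the 25 by 80 buffer covers 4000 bytes starting at `0xb8000`.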
[vida útil explícita]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#lifetime-annotation-syntax
[`'static`]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime

### Printing

Now we can use the `Writer` to modify the characters of the buffer. First we create a method to write a single ASCII byte:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                if self.column_position >= BUFFER_WIDTH {
                    self.new_line();
                }

                let row = BUFFER_HEIGHT - 1;
                let col = self.column_position;

                let color_code = self.color_code;
                self.buffer.chars[row][col] = ScreenChar {
                    ascii_character: byte,
                    color_code,
                };
                self.column_position += 1;
            }
        }
    }

    fn new_line(&mut self) {/* TODO */}
}
```

If the byte is the [newline] byte `\n`, the writer does not print anything. Instead, it calls a `new_line` method, which we will implement later. Other bytes are printed to the screen in the second `match` case.

[newline]: https://en.wikipedia.org/wiki/Newline

When printing a byte, the writer checks whether the current line is full. In that case, a `new_line` call is used to wrap the line. Then it writes a new `ScreenChar` to the buffer at the current position. Finally, the current column position is advanced.

To print whole strings, we can convert them to bytes and print them one by one:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_string(&mut self, s: &str) {
        for byte in s.bytes() {
            match byte {
                // printable ASCII byte or newline
                0x20..=0x7e | b'\n' => self.write_byte(byte),
                // not part of printable ASCII range
                _ => self.write_byte(0xfe),
            }
        }
    }
}
```

The VGA text buffer only supports ASCII and the additional bytes of [code page 437][página de códigos 437]. Rust strings are [UTF-8] by default, so they might contain bytes that are not supported by the VGA text buffer.
We use a `match` to differentiate printable ASCII bytes (a newline or anything between a space character and a `~` character) from unprintable bytes. For unprintable bytes, we print a `■` character, which has the hex code `0xfe` on the VGA hardware.

[página de códigos 437]: https://en.wikipedia.org/wiki/Code_page_437
[UTF-8]: https://www.fileformat.info/info/unicode/utf8.htm

#### Try it out!

To write some characters to the screen, you can create a temporary function:

```rust
// in src/vga_buffer.rs

pub fn print_something() {
    let mut writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };

    writer.write_byte(b'H');
    writer.write_string("ello ");
    writer.write_string("Wörld!");
}
```

It first creates a new `Writer` that points to the VGA buffer at `0xb8000`. The syntax for this might seem a bit strange: first, we cast the integer `0xb8000` to a mutable [raw pointer]. Then we convert it to a mutable reference by dereferencing it (through `*`) and immediately borrowing it again (through `&mut`). This conversion requires an [`unsafe` block], since the compiler cannot guarantee that the raw pointer is valid.

[raw pointer]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[`unsafe` block]: https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html

Then it writes the byte `b'H'` to it. The `b` prefix creates a [byte literal][literal de byte], which represents an ASCII character. By writing the strings `"ello "` and `"Wörld!"`, we test our `write_string` method and the handling of unprintable characters. To see the output, we need to call the `print_something` function from our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
{
    vga_buffer::print_something();

    loop {}
}
```

When we run our project now, a `Hello W■■rld!` should be printed in the lower left corner of the screen in yellow:

[literal de byte]: https://doc.rust-lang.org/reference/tokens.html#byte-literals

![QEMU output with a yellow `Hello W■■rld!` in the lower left corner](vga-hello.png)

Notice that the `ö` is printed as two `■` characters. That is because `ö` is represented by two bytes in [UTF-8], neither of which falls into the printable ASCII range. In fact, this is a fundamental property of UTF-8: the individual bytes of multi-byte values are never valid ASCII.

### Volatile

We just saw that our message was printed correctly. However, it might not work with future Rust compilers that optimize more aggressively.

The problem is that we only write to the `Buffer` and never read from it again. The compiler does not know that we really access VGA buffer memory (instead of normal RAM) and knows nothing about the side effect that some characters appear on the screen. So it might decide that these writes are unnecessary and can be omitted. To avoid this erroneous optimization, we need to specify these writes as _[volatile]_. This tells the compiler that the write has side effects and should not be optimized away.

[volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming)

To use volatile writes for the VGA buffer, we use the [volatile][crate volatile] library. This _crate_ (this is how packages are called in the Rust world) provides a `Volatile` wrapper type with `read` and `write` methods. These methods internally use the [read_volatile] and [write_volatile] functions of the core library and thus guarantee that the reads and writes are not optimized away.
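The core-library functions that the `Volatile` wrapper builds on can also be called directly. The sketch below assumes an ordinary in-memory slice instead of the real VGA buffer so that it runs on a host; `fill_volatile` and `read_cell_volatile` are hypothetical names chosen for this example, not functions of the `volatile` crate.

```rust
use core::ptr;

/// Writes `value` to every cell of `cells` using volatile stores.
/// Hypothetical helper for illustration only.
pub fn fill_volatile(cells: &mut [u16], value: u16) {
    for cell in cells.iter_mut() {
        // SAFETY: the pointer is derived from a valid mutable reference,
        // so it is non-null, aligned, and valid for writes.
        unsafe { ptr::write_volatile(cell as *mut u16, value) };
    }
}

/// Reads one cell with a volatile load. Panics if `index` is out of bounds.
/// Hypothetical helper for illustration only.
pub fn read_cell_volatile(cells: &[u16], index: usize) -> u16 {
    // SAFETY: indexing yields a valid shared reference, so the pointer
    // is non-null, aligned, and valid for reads.
    unsafe { ptr::read_volatile(&cells[index] as *const u16) }
}
```

Because each access is volatile, the compiler has to emit every store and load, even if the values are never used again by the program.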
[crate volatile]: https://docs.rs/volatile
[read_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.read_volatile.html
[write_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.write_volatile.html

We can add a dependency on the `volatile` crate by adding it to the `dependencies` section of our `Cargo.toml`:

```toml
# in Cargo.toml

[dependencies]
volatile = "0.2.6"
```

Make sure to specify `volatile` version `0.2.6`. Newer versions of the crate are not compatible with this post. `0.2.6` is the [semantic] version number. For more information, see the [Specifying Dependencies] guide of the cargo documentation.

[semantic]: https://semver.org/
[Specifying Dependencies]: https://doc.crates.io/specifying-dependencies.html

Let's use it to make writes to the VGA buffer volatile. We update our `Buffer` type as follows:

```rust
// in src/vga_buffer.rs

use volatile::Volatile;

struct Buffer {
    chars: [[Volatile<ScreenChar>; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

Instead of a `ScreenChar`, we are now using a `Volatile<ScreenChar>`. (The `Volatile` type is [generic] and can wrap (almost) any type.) This ensures that we cannot accidentally write to it "normally". Instead, we have to use the `write` method now.

[generic]: https://doc.rust-lang.org/book/ch10-01-syntax.html

This means that we have to update our `Writer::write_byte` method:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                ...

                self.buffer.chars[row][col].write(ScreenChar {
                    ascii_character: byte,
                    color_code,
                });
                ...
            }
        }
    }
    ...
}
```

================================================
FILE: blog/content/edition-2/posts/03-vga-text-buffer/index.fa.md
================================================
+++
title = "حالت متن VGA"
weight = 3
path = "fa/vga-text-mode"
date = 2018-02-26

[extra]
# Please update this when updating the translation
translation_based_on_commit = "fb8b03e82d9805473fed16e8795a78a020a6b537"
# GitHub usernames of the people that translated this post
translators = ["hamidrezakp", "MHBahrampour"]
rtl = true
+++

The [VGA text mode] is a simple way to print text to the screen. In this post, we create an interface that makes its usage safe and simple by encapsulating all unsafety in a separate module. We also implement support for Rust's [formatting macros].

[VGA text mode]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode
[formatting macros]: https://doc.rust-lang.org/std/fmt/#related-macros

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom] of this post. The complete source code for this post can be found in the [`post-03`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-03

## The VGA Text Buffer

To print a character to the screen in VGA text mode, one has to write it to the text buffer of the VGA hardware. The VGA text buffer is a two-dimensional array with typically 25 rows and 80 columns, which is rendered directly to the screen. Each array entry describes a single screen character through the following format:

| Bit(s) | Value            |
| ------ | ---------------- |
| 0-7    | ASCII code point |
| 8-11   | Foreground color |
| 12-14  | Background color |
| 15     | Blink            |

The first byte represents the character that should be printed in the [ASCII encoding][کدگذاری ASCII]. To be more specific, it is not exactly ASCII, but a character set named [_code page 437_][_کد صفحه 437_] with some additional characters and slight modifications. For simplicity, we will keep calling it an ASCII character in this post.
[کدگذاری ASCII]: https://en.wikipedia.org/wiki/ASCII
[_کد صفحه 437_]: https://en.wikipedia.org/wiki/Code_page_437

The second byte defines how the character is displayed. The first four bits define the foreground color, the next three bits the background color, and the last bit whether the character should blink. The following colors are available:

| Number | Color      | Number + Bright Bit | Bright Color |
| ------ | ---------- | ------------------- | ------------ |
| 0x0    | Black      | 0x8                 | Dark Gray    |
| 0x1    | Blue       | 0x9                 | Light Blue   |
| 0x2    | Green      | 0xa                 | Light Green  |
| 0x3    | Cyan       | 0xb                 | Light Cyan   |
| 0x4    | Red        | 0xc                 | Light Red    |
| 0x5    | Magenta    | 0xd                 | Pink         |
| 0x6    | Brown      | 0xe                 | Yellow       |
| 0x7    | Light Gray | 0xf                 | White        |

Bit 4 is the bright bit, which turns, for example, blue into light blue. For the background color, this bit is repurposed as the blink bit.

The VGA text buffer is accessible via [memory-mapped I/O] at the address `0xb8000`. This means that reads and writes to that address do not access the RAM but directly access the text buffer on the VGA hardware. We can therefore read and write it through normal memory operations to that address.

[memory-mapped I/O]: https://en.wikipedia.org/wiki/Memory-mapped_I/O

Note that memory-mapped hardware might not support all normal RAM operations. For example, a device could support only byte-wise reads and return garbage when a `u64` is read. Fortunately, the text buffer [supports normal reads and writes], so we do not have to treat it in a special way.

[supports normal reads and writes]: https://web.stanford.edu/class/cs140/projects/pintos/specs/freevga/vga/vgamem.htm#manip

## A Rust Module

Now that we know how the VGA buffer works, we can create a Rust module to handle printing:

```rust
// in src/main.rs
mod vga_buffer;
```

For the content of this module, we create a new `src/vga_buffer.rs` file. All of the code below goes into our new module (unless specified otherwise).
### Colors

First, we represent the different colors using an enum:

```rust
// in src/vga_buffer.rs

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Color {
    Black = 0,
    Blue = 1,
    Green = 2,
    Cyan = 3,
    Red = 4,
    Magenta = 5,
    Brown = 6,
    LightGray = 7,
    DarkGray = 8,
    LightBlue = 9,
    LightGreen = 10,
    LightCyan = 11,
    LightRed = 12,
    Pink = 13,
    Yellow = 14,
    White = 15,
}
```

We use a [C-like enum] here to explicitly specify the number for each color. Because of the `repr(u8)` attribute, each enum variant is stored as a `u8`. Actually, 4 bits would be sufficient, but Rust does not have a `u4` type.

[C-like enum]: https://doc.rust-lang.org/rust-by-example/custom_types/enum/c_like.html

Normally, the compiler would issue a warning for each unused variant. By using the `#[allow(dead_code)]` attribute, we disable these warnings for the `Color` enum.

By [deriving] the [`Copy`], [`Clone`], [`Debug`], [`PartialEq`], and [`Eq`] traits, we enable [copy semantics][مفهوم کپی] for the type and make it printable and comparable.
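What the `repr(u8)` attribute and the derived traits provide can be seen in a short host-side sketch. The enum below is an abbreviated copy with only a few variants, and the `describe` helper is a hypothetical addition that relies on `String` from the standard library, so it is not meant for the `no_std` kernel itself.

```rust
// Abbreviated copy of the `Color` enum: only a few variants are kept.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Color {
    Black = 0,
    Blue = 1,
    LightBlue = 9,
    LightCyan = 11,
}

/// Hypothetical helper: returns the numeric value and the `Debug` name of a color.
pub fn describe(color: Color) -> (u8, String) {
    let copy = color; // `Copy`: `color` stays usable after this assignment
    (color as u8, format!("{:?}", copy))
}
```

Casting with `as u8` yields the explicitly assigned number, so setting the bright bit on `Color::Blue` gives the value of `Color::LightBlue`.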
[deriving]: https://doc.rust-lang.org/rust-by-example/trait/derive.html
[`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
[`Clone`]: https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html
[`Debug`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html
[`PartialEq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.PartialEq.html
[`Eq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.Eq.html
[مفهوم کپی]: https://doc.rust-lang.org/1.30.0/book/first-edition/ownership.html#copy-types

To represent a full color code that specifies foreground and background color, we create a [newtype] on top of `u8`:

[newtype]: https://doc.rust-lang.org/rust-by-example/generics/new_types.html

```rust
// in src/vga_buffer.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
struct ColorCode(u8);

impl ColorCode {
    fn new(foreground: Color, background: Color) -> ColorCode {
        ColorCode((background as u8) << 4 | (foreground as u8))
    }
}
```

The `ColorCode` struct contains the full color byte, which includes the foreground and background color. Like before, we derive the `Copy` and `Debug` traits for it. To ensure that `ColorCode` has the exact same data layout as a `u8`, we use the [`repr(transparent)`] attribute.

[`repr(transparent)`]: https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent

### Text Buffer

Now we can add structures to represent a screen character and the text buffer:

```rust
// in src/vga_buffer.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct ScreenChar {
    ascii_character: u8,
    color_code: ColorCode,
}

const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

#[repr(transparent)]
struct Buffer {
    chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

Since the field ordering in default structs is undefined in Rust, we need the [`repr(C)`] attribute. It guarantees that the fields of the struct are laid out exactly like in a C struct and thus guarantees the correct field ordering.
For the `Buffer` struct, we use [`repr(transparent)`] again to ensure that it has the same memory layout as its single field.

[`repr(C)`]: https://doc.rust-lang.org/nightly/nomicon/other-reprs.html#reprc

To actually write to the screen, we now create a writer type:

```rust
// in src/vga_buffer.rs

pub struct Writer {
    column_position: usize,
    color_code: ColorCode,
    buffer: &'static mut Buffer,
}
```

The writer always writes to the last line and shifts lines up when a line is full (or on `\n`). The `column_position` field keeps track of the current position in the last row. The current foreground and background colors are specified by `color_code`, and a reference to the VGA buffer is stored in `buffer`. Note that we need an [explicit lifetime] here to tell the compiler how long the reference is valid. The [`'static`] lifetime specifies that the reference is valid for the whole run time of the program (which is true for the VGA text buffer).

[explicit lifetime]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#lifetime-annotation-syntax
[`'static`]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime

### Printing

Now we can use the `Writer` to modify the characters of the buffer. First we create a method to write a single ASCII byte:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                if self.column_position >= BUFFER_WIDTH {
                    self.new_line();
                }

                let row = BUFFER_HEIGHT - 1;
                let col = self.column_position;

                let color_code = self.color_code;
                self.buffer.chars[row][col] = ScreenChar {
                    ascii_character: byte,
                    color_code,
                };
                self.column_position += 1;
            }
        }
    }

    fn new_line(&mut self) {/* TODO */}
}
```

If the byte is the [newline][خط جدید] byte `\n`, the writer does not print anything. Instead, it calls a `new_line` method, which we will implement later. Other bytes are printed to the screen in the second `match` case.
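The wrapping behaviour of `write_byte` can be tried on a host with an owned array in place of the real VGA memory. This is a simplified sketch under several assumptions: the color is a plain `u8` instead of `ColorCode`, the buffer is owned by the writer instead of being a `&'static mut Buffer`, and `new_line` is only a placeholder that resets the column, since the post implements the actual scrolling later.

```rust
const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ScreenChar {
    pub ascii_character: u8,
    pub color_code: u8, // plain byte instead of the post's `ColorCode`
}

pub struct Writer {
    pub column_position: usize,
    pub color_code: u8,
    // Owned array instead of `&'static mut Buffer`, so the sketch runs on a host.
    pub chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT],
}

impl Writer {
    /// Creates a writer whose in-memory buffer is filled with spaces.
    pub fn new(color_code: u8) -> Self {
        let blank = ScreenChar { ascii_character: b' ', color_code };
        Writer {
            column_position: 0,
            color_code,
            chars: [[blank; BUFFER_WIDTH]; BUFFER_HEIGHT],
        }
    }

    /// Same control flow as the post's `write_byte`: wrap first, then write.
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                if self.column_position >= BUFFER_WIDTH {
                    self.new_line();
                }
                let row = BUFFER_HEIGHT - 1;
                let col = self.column_position;
                self.chars[row][col] = ScreenChar {
                    ascii_character: byte,
                    color_code: self.color_code,
                };
                self.column_position += 1;
            }
        }
    }

    // Placeholder only: it resets the column and does NOT scroll the lines up.
    fn new_line(&mut self) {
        self.column_position = 0;
    }
}
```

Writing 81 bytes in a row shows the wrap: the first 80 fill the last row, and the 81st triggers `new_line` before it is stored in column 0.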

[خط جدید]: https://en.wikipedia.org/wiki/Newline

When printing a byte, the writer checks if the current line is full. In that case, a `new_line` call is used to wrap the line. Then it writes a new `ScreenChar` to the buffer at the current position. Finally, the current column position is advanced.

To print whole strings, we can convert them to bytes and print them one by one:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_string(&mut self, s: &str) {
        for byte in s.bytes() {
            match byte {
                // printable ASCII byte or newline
                0x20..=0x7e | b'\n' => self.write_byte(byte),
                // not part of printable ASCII range
                _ => self.write_byte(0xfe),
            }
        }
    }
}
```

The VGA text buffer only supports ASCII and the additional bytes of [code page 437]. Rust strings are [UTF-8] by default, so they might contain bytes that are not supported by the VGA text buffer. We use a `match` to differentiate printable ASCII bytes (a newline or anything in between a space character and a `~` character) and unprintable bytes. For unprintable bytes, we print a `■` character, which has the hex code `0xfe` on the VGA hardware.

[code page 437]: https://en.wikipedia.org/wiki/Code_page_437
[UTF-8]: https://www.fileformat.info/info/unicode/utf8.htm

#### Try it out!

To write some characters to the screen, you can create a temporary function:

```rust
// in src/vga_buffer.rs

pub fn print_something() {
    let mut writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };

    writer.write_byte(b'H');
    writer.write_string("ello ");
    writer.write_string("Wörld!");
}
```

It first creates a new `Writer` that points to the VGA buffer at `0xb8000`. The syntax for this might seem a bit strange: First, we cast the integer `0xb8000` as a mutable [raw pointer][اشاره گر خام]. Then we convert it to a mutable reference by dereferencing it (through `*`) and immediately borrowing it again (through `&mut`).
This conversion requires an [`unsafe` block], since the compiler can't guarantee that the raw pointer is valid.

[اشاره گر خام]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[`unsafe` block]: https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html

Then it writes the byte `b'H'` to it. The `b` prefix creates a [byte literal], which represents an ASCII character. By writing the strings `"ello "` and `"Wörld!"`, we test our `write_string` method and the handling of unprintable characters. To see the output, we need to call the `print_something` function from our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    vga_buffer::print_something();

    loop {}
}
```

When we run our project now, a `Hello W■■rld!` should be printed in the _lower_ left corner of the screen in yellow:

[byte literal]: https://doc.rust-lang.org/reference/tokens.html#byte-literals

![QEMU output with a yellow `Hello W■■rld!` in the lower left corner](vga-hello.png)

Note that the `ö` is printed as two `■` characters. That's because `ö` is represented by two bytes in [UTF-8], neither of which falls into the printable ASCII range. In fact, this is a fundamental property of UTF-8: the individual bytes of multi-byte values are never valid ASCII.

### Volatile

We just saw that our message was printed correctly. However, it might not work with future Rust compilers that optimize more aggressively.

The problem is that we only write to the `Buffer` and never read from it again. The compiler doesn't know that we really access VGA buffer memory (instead of normal RAM) and knows nothing about the side effect that some characters appear on the screen. So it might decide that these writes are unnecessary and can be omitted. To avoid this erroneous optimization, we need to specify these writes as _[volatile][فرّار]_. This tells the compiler that the write has side effects and should not be optimized away.
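
To make the idea concrete, here is a minimal sketch of a volatile store and load using the pointer functions `write_volatile` and `read_volatile`. It is an illustration under assumptions, not the code used in this post: the `write_cell`, `read_cell`, and `volatile_demo` names are hypothetical, and an ordinary in-memory array stands in for the VGA buffer so that the example can run in a normal hosted program.

```rust
// `std::ptr` is used so the sketch runs on a hosted target;
// the same functions exist under `core::ptr` for `no_std` code.
use std::ptr;

/// Writes one 16-bit cell (character byte in the low half, color byte in
/// the high half) with a volatile store, so the compiler may not drop it.
fn write_cell(cell: &mut u16, ascii: u8, color: u8) {
    let value = (color as u16) << 8 | ascii as u16;
    // SAFETY: `cell` is a valid, aligned, exclusive reference.
    unsafe { ptr::write_volatile(cell as *mut u16, value) };
}

/// Reads a cell back with a volatile load.
fn read_cell(cell: &u16) -> u16 {
    // SAFETY: `cell` is a valid, aligned reference.
    unsafe { ptr::read_volatile(cell as *const u16) }
}

/// Hypothetical demo: an in-memory array plays the role of the VGA buffer.
fn volatile_demo() -> u16 {
    let mut fake_vga = [0u16; 4];
    write_cell(&mut fake_vga[0], b'H', 0x0e);
    read_cell(&fake_vga[0])
}
```

Calling `volatile_demo()` stores a yellow-on-black `H` in the first cell and reads it back. With plain assignments, a sufficiently aggressive optimizer would be free to remove a store whose result is never read; the volatile variants rule that out.
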

[فرّار]: https://en.wikipedia.org/wiki/Volatile_(computer_programming)

In order to use volatile writes for the VGA buffer, we use the [volatile][volatile crate] library. This _crate_ (this is how packages are called in the Rust world) provides a `Volatile` wrapper type with `read` and `write` methods. These methods internally use the [read_volatile] and [write_volatile] functions of the core library and thus guarantee that the reads/writes are not optimized away.

[volatile crate]: https://docs.rs/volatile
[read_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.read_volatile.html
[write_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.write_volatile.html

We can add a dependency on the `volatile` crate by adding it to the `dependencies` section of our `Cargo.toml`:

```toml
# in Cargo.toml

[dependencies]
volatile = "0.2.6"
```

`0.2.6` is the [semantic] version number. For more information, see the [Specifying Dependencies] guide of the cargo documentation.

[semantic]: https://semver.org/
[Specifying Dependencies]: https://doc.crates.io/specifying-dependencies.html

Let's use it to make writes to the VGA buffer volatile. We update our `Buffer` type as follows:

```rust
// in src/vga_buffer.rs

use volatile::Volatile;

struct Buffer {
    chars: [[Volatile<ScreenChar>; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

Instead of a `ScreenChar`, we're now using a `Volatile<ScreenChar>`. (The `Volatile` type is [generic] and can wrap (almost) any type). This ensures that we can't accidentally write to it “normally”. Instead, we have to use the `write` method now.

[generic]: https://doc.rust-lang.org/book/ch10-01-syntax.html

This means that we have to update our `Writer::write_byte` method:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                ...

                self.buffer.chars[row][col].write(ScreenChar {
                    ascii_character: byte,
                    color_code,
                });
                ...
            }
        }
    }
    ...
}
```

Instead of a typical assignment using `=`, we're now using the `write` method. Now we can guarantee that the compiler will never optimize away this write.

### Formatting Macros

It would be nice to support Rust's formatting macros, too. That way, we can easily print different types, like integers or floats. To support them, we need to implement the [`core::fmt::Write`] trait. The only required method of this trait is `write_str`, which looks quite similar to our `write_string` method, just with a `fmt::Result` return type:

[`core::fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

```rust
// in src/vga_buffer.rs

use core::fmt;

impl fmt::Write for Writer {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.write_string(s);
        Ok(())
    }
}
```

The `Ok(())` is just an `Ok` result containing the `()` type.

Now we can use Rust's built-in `write!`/`writeln!` formatting macros:

```rust
// in src/vga_buffer.rs

pub fn print_something() {
    use core::fmt::Write;
    let mut writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };

    writer.write_byte(b'H');
    writer.write_string("ello! ");
    write!(writer, "The numbers are {} and {}", 42, 1.0/3.0).unwrap();
}
```

Now you should see a `Hello! The numbers are 42 and 0.3333333333333333` at the bottom of the screen. The `write!` call returns a `Result` which causes a warning if not used, so we are calling the [`unwrap`] function on it, which panics if an error occurs. This isn't a problem in our case, since writes to the VGA buffer never fail.

[`unwrap`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.unwrap

### Newlines

Right now, we just ignore newlines and characters that don't fit into the line anymore. Instead, we want to move every character one line up (the top line gets deleted) and start at the beginning of the last line again.
To do this, we add an implementation for the `new_line` method of `Writer`:

```rust
// in src/vga_buffer.rs

impl Writer {
    fn new_line(&mut self) {
        for row in 1..BUFFER_HEIGHT {
            for col in 0..BUFFER_WIDTH {
                let character = self.buffer.chars[row][col].read();
                self.buffer.chars[row - 1][col].write(character);
            }
        }
        self.clear_row(BUFFER_HEIGHT - 1);
        self.column_position = 0;
    }

    fn clear_row(&mut self, row: usize) {/* TODO */}
}
```

We iterate over all the screen characters and move each character one row up. Note that the upper bound of the range notation (`..`) is exclusive. We also omit the 0th row (the first range starts at `1`) because it's the row that is shifted off screen.

To finish the newline code, we add the `clear_row` method:

```rust
// in src/vga_buffer.rs

impl Writer {
    fn clear_row(&mut self, row: usize) {
        let blank = ScreenChar {
            ascii_character: b' ',
            color_code: self.color_code,
        };
        for col in 0..BUFFER_WIDTH {
            self.buffer.chars[row][col].write(blank);
        }
    }
}
```

This method clears a row by overwriting all of its characters with a space character.
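
The scrolling behavior can be tried out without VGA hardware. The following sketch models the same logic on a tiny 3×4 in-memory grid of plain bytes. `MiniWriter`, `scroll_demo`, and the reduced `HEIGHT`/`WIDTH` constants are hypothetical names chosen for this illustration; the real `Writer` works on volatile `ScreenChar` cells instead.

```rust
const HEIGHT: usize = 3;
const WIDTH: usize = 4;

/// Hypothetical stand-in for the post's `Writer`, using plain bytes instead
/// of volatile `ScreenChar` cells so it runs in a normal program.
struct MiniWriter {
    column_position: usize,
    rows: [[u8; WIDTH]; HEIGHT],
}

impl MiniWriter {
    fn new() -> Self {
        MiniWriter { column_position: 0, rows: [[b' '; WIDTH]; HEIGHT] }
    }

    fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                if self.column_position >= WIDTH {
                    self.new_line();
                }
                // Always write into the last row, as in the post.
                self.rows[HEIGHT - 1][self.column_position] = byte;
                self.column_position += 1;
            }
        }
    }

    fn new_line(&mut self) {
        // Rows 1..HEIGHT move up by one; the old row 0 is overwritten and lost.
        for row in 1..HEIGHT {
            self.rows[row - 1] = self.rows[row];
        }
        self.clear_row(HEIGHT - 1);
        self.column_position = 0;
    }

    fn clear_row(&mut self, row: usize) {
        self.rows[row] = [b' '; WIDTH];
    }
}

/// Writes "ab\ncd\nef" and returns the grid rows from top to bottom.
fn scroll_demo() -> [[u8; WIDTH]; HEIGHT] {
    let mut writer = MiniWriter::new();
    for byte in "ab\ncd\nef".bytes() {
        writer.write_byte(byte);
    }
    writer.rows
}
```

After the two newlines, the first line has moved from the bottom row to the top row, and the most recent text sits in the last row.
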
## A Global Interface

To provide a global writer that can be used as an interface from other modules without carrying a `Writer` instance around, we try to create a static `WRITER`:

```rust
// in src/vga_buffer.rs

pub static WRITER: Writer = Writer {
    column_position: 0,
    color_code: ColorCode::new(Color::Yellow, Color::Black),
    buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
};
```

However, if we try to compile it now, the following errors occur:

```
error[E0015]: calls in statics are limited to constant functions, tuple structs and tuple variants
 --> src/vga_buffer.rs:7:17
  |
7 |     color_code: ColorCode::new(Color::Yellow, Color::Black),
  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0396]: raw pointers cannot be dereferenced in statics
 --> src/vga_buffer.rs:8:22
  |
8 |     buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
  |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereference of raw pointer in constant

error[E0017]: references in statics may only refer to immutable values
 --> src/vga_buffer.rs:8:22
  |
8 |     buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
  |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values

error[E0017]: references in statics may only refer to immutable values
 --> src/vga_buffer.rs:8:13
  |
8 |     buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values
```

To understand what's happening here, we need to know that statics are initialized at compile time, in contrast to normal variables that are initialized at run time. The component of the Rust compiler that evaluates such initialization expressions is called the “[const evaluator]”. Its functionality is still limited, but there is ongoing work to expand it, for example in the “[Allow panicking in constants]” RFC.
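
The first error (E0015) is about calling a non-constant function in a static initializer. As a small illustration of what the const evaluator can handle, the sketch below marks the constructor as `const fn`, which lets it run at compile time. This is an assumption made for the example: the `ColorCode::new` in this post is a plain `fn`, only four of the sixteen colors are repeated here, and `DEFAULT_COLOR` and `default_color_byte` are hypothetical names.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Color {
    Black = 0,
    Blue = 1,
    Yellow = 14,
    White = 15,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct ColorCode(u8);

impl ColorCode {
    // Assumption for this sketch: `const` is added so that the function
    // can be called inside a static initializer.
    pub const fn new(foreground: Color, background: Color) -> ColorCode {
        ColorCode((background as u8) << 4 | (foreground as u8))
    }
}

// Evaluated at compile time by the const evaluator.
static DEFAULT_COLOR: ColorCode = ColorCode::new(Color::Yellow, Color::Black);

/// Returns the raw color byte stored in the static.
pub fn default_color_byte() -> u8 {
    DEFAULT_COLOR.0
}
```

This only addresses the function call. The raw pointer dereference in the `buffer` field is a separate problem that a `const fn` does not solve.
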

[const evaluator]: https://rustc-dev-guide.rust-lang.org/const-eval.html
[Allow panicking in constants]: https://github.com/rust-lang/rfcs/pull/2345

The issue with `ColorCode::new` would be solvable by using [`const` functions], but the fundamental problem here is that Rust's const evaluator is not able to convert raw pointers to references at compile time. Maybe it will work someday, but until then, we have to find another solution.

[`const` functions]: https://doc.rust-lang.org/reference/const_eval.html#const-functions

### Lazy Statics

The one-time initialization of statics with non-const functions is a common problem in Rust. Fortunately, there already exists a good solution in a crate named [lazy_static]. This crate provides a `lazy_static!` macro that defines a lazily initialized `static`. Instead of computing its value at compile time, the `static` lazily initializes itself when accessed for the first time. Thus, the initialization happens at runtime, so arbitrarily complex initialization code is possible.

[lazy_static]: https://docs.rs/lazy_static/1.0.1/lazy_static/

Let's add the `lazy_static` crate to our project:

```toml
# in Cargo.toml

[dependencies.lazy_static]
version = "1.0"
features = ["spin_no_std"]
```

We need the `spin_no_std` feature, since we don't link the standard library.

With `lazy_static`, we can define our static `WRITER` without problems:

```rust
// in src/vga_buffer.rs

use lazy_static::lazy_static;

lazy_static! {
    pub static ref WRITER: Writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };
}
```

However, this `WRITER` is pretty useless since it is immutable. This means that we can't write anything to it (since all the write methods take `&mut self`). One possible solution would be to use a [mutable static][استاتیک قابل تغییر]. But then every read and write to it would be unsafe since it could easily introduce data races and other bad things.
Using `static mut` is highly discouraged. There were even proposals to [remove it][remove static mut]. But what are the alternatives? We could try to use an immutable static with a cell type like [RefCell] or even [UnsafeCell] that provides [interior mutability][تغییر پذیری داخلی]. But these types aren't [Sync] (with good reason), so we can't use them in statics.

[استاتیک قابل تغییر]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable
[remove static mut]: https://internals.rust-lang.org/t/pre-rfc-remove-static-mut/1437
[RefCell]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html#keeping-track-of-borrows-at-runtime-with-refcellt
[UnsafeCell]: https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html
[تغییر پذیری داخلی]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html
[Sync]: https://doc.rust-lang.org/nightly/core/marker/trait.Sync.html

### Spinlocks

To get synchronized interior mutability, users of the standard library can use [Mutex]. It provides mutual exclusion by blocking threads when the resource is already locked. But our basic kernel does not have any blocking support or even a concept of threads, so we can't use it either. However, there is a really basic kind of mutex in computer science that requires no operating system features: the [spinlock]. Instead of blocking, the threads simply try to lock it again and again in a tight loop, thus burning CPU time until the mutex is free again.
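
To show how little a spinlock needs, here is a minimal sketch built from an `AtomicBool` and an `UnsafeCell`. It is an illustration only: `SpinLock`, `with_lock`, `COUNTER`, and `spin_demo` are hypothetical names, the closure-based API is a simplification chosen for this example, and the demo uses `std` threads, which are only available on a hosted target and not in our kernel.

```rust
// `std` paths are used so the sketch runs on a hosted target; the cell,
// atomic, and hint items also exist under `core::` for `no_std` code.
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

/// Minimal spinlock sketch (hypothetical type for this example).
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: all access to `value` is serialized through `locked`.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    pub const fn new(value: T) -> Self {
        SpinLock { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    /// Spins until the lock is free, runs `f` with exclusive access, then unlocks.
    /// Note: if `f` panics, the lock stays locked in this simplified version.
    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Busy-wait: burn CPU time until the lock becomes free.
            std::hint::spin_loop();
        }
        // SAFETY: we hold the lock, so no other reference to `value` exists.
        let result = f(unsafe { &mut *self.value.get() });
        self.locked.store(false, Ordering::Release);
        result
    }
}

// Because `SpinLock<u32>` is `Sync`, it can live in an immutable static.
static COUNTER: SpinLock<u32> = SpinLock::new(0);

/// Four threads increment the shared counter 1000 times each.
fn spin_demo() -> u32 {
    std::thread::scope(|scope| {
        for _ in 0..4 {
            scope.spawn(|| {
                for _ in 0..1000 {
                    COUNTER.with_lock(|n| *n += 1);
                }
            });
        }
    });
    COUNTER.with_lock(|n| *n)
}
```

The important property is that the static itself is immutable, while the value inside can still be mutated safely through the lock.
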

[Mutex]: https://doc.rust-lang.org/nightly/std/sync/struct.Mutex.html
[spinlock]: https://en.wikipedia.org/wiki/Spinlock

To use a spinning mutex, we can add the [spin crate] as a dependency:

[spin crate]: https://crates.io/crates/spin

```toml
# in Cargo.toml

[dependencies]
spin = "0.5.2"
```

Then we can use the spinning mutex to add safe [interior mutability][تغییر پذیری داخلی] to our static `WRITER`:

```rust
// in src/vga_buffer.rs

use spin::Mutex;
...
lazy_static! {
    pub static ref WRITER: Mutex<Writer> = Mutex::new(Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    });
}
```

Now we can delete the `print_something` function and print directly from our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    use core::fmt::Write;
    vga_buffer::WRITER.lock().write_str("Hello again").unwrap();
    write!(vga_buffer::WRITER.lock(), ", some numbers: {} {}", 42, 1.337).unwrap();

    loop {}
}
```

We need to import the `fmt::Write` trait in order to be able to use its functions.

### Safety

Note that we only have a single unsafe block in our code, which is needed to create a `Buffer` reference pointing to `0xb8000`. Afterwards, all operations are safe. Rust uses bounds checking for array accesses by default, so we can't accidentally write outside the buffer. Thus, we encoded the required conditions in the type system and are able to provide a safe interface to the outside.

### A println Macro

Now that we have a global writer, we can add a `println` macro that can be used from anywhere in the codebase. Rust's [macro syntax][سینتکس ماکروی] is a bit strange, so we won't try to write a macro from scratch.
Instead, we look at the source of the [`println!` macro] in the standard library:

[سینتکس ماکروی]: https://doc.rust-lang.org/nightly/book/ch20-05-macros.html#declarative-macros-for-general-metaprogramming
[`println!` macro]: https://doc.rust-lang.org/nightly/std/macro.println!.html

```rust
#[macro_export]
macro_rules! println {
    () => (print!("\n"));
    ($($arg:tt)*) => (print!("{}\n", format_args!($($arg)*)));
}
```

Macros are defined through one or more rules, similar to `match` arms. The `println` macro has two rules: The first rule is for invocations without arguments, e.g., `println!()`, which is expanded to `print!("\n")` and thus just prints a newline. The second rule is for invocations with parameters such as `println!("Hello")` or `println!("Number: {}", 4)`. It is also expanded to an invocation of the `print!` macro, passing all arguments and an additional newline `\n` at the end.

The `#[macro_export]` attribute makes the macro available to the whole crate (not just the module it is defined in) and external crates. It also places the macro at the crate root, which means we have to import the macro through `use std::println` instead of `std::macros::println`.

The [`print!` macro] is defined as:

[`print!` macro]: https://doc.rust-lang.org/nightly/std/macro.print!.html

```rust
#[macro_export]
macro_rules! print {
    ($($arg:tt)*) => ($crate::io::_print(format_args!($($arg)*)));
}
```

The macro expands to a call of the [`_print` function][تابع `_print`] in the `io` module. The [`$crate` variable][متغیر `$crate`] ensures that the macro also works from outside the `std` crate by expanding to `std` when it's used in other crates.

The [`format_args` macro][ماکرو `format_args`] builds a [fmt::Arguments] type from the passed arguments, which is passed to `_print`. The [`_print` function][تابع `_print`] of the standard library calls `print_to`, which is rather complicated because it supports different `Stdout` devices. We don't need that complexity since we just want to print to the VGA buffer.
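
The same two-rule structure can be tried out on a hosted target with a different sink. The sketch below is only an illustration: the `append!` and `appendln!` macros and the `_append` function are hypothetical names that mirror `print!`, `println!`, and `_print`, and they write into a `String` instead of standard output so that the result can be inspected.

```rust
use std::fmt::{self, Write};

/// Hypothetical sink: appends the formatted output to a `String`.
pub fn _append(out: &mut String, args: fmt::Arguments) {
    out.write_fmt(args).unwrap();
}

// Mirrors `print!`: forwards everything to the sink via `format_args!`.
macro_rules! append {
    ($out:expr, $($arg:tt)*) => (_append($out, format_args!($($arg)*)));
}

// Mirrors `println!`: one rule without arguments, one rule with arguments.
macro_rules! appendln {
    ($out:expr) => (append!($out, "\n"));
    ($out:expr, $($arg:tt)*) => (append!($out, "{}\n", format_args!($($arg)*)));
}

/// Exercises both rules and returns the collected text.
fn macro_demo() -> String {
    let mut out = String::new();
    appendln!(&mut out, "Number: {}", 4); // second rule: arguments plus newline
    appendln!(&mut out);                  // first rule: just a newline
    append!(&mut out, "{}-{}", "a", "b"); // no trailing newline
    out
}
```

Since `String` implements `fmt::Write`, `write_fmt` accepts the `fmt::Arguments` value produced by `format_args!`, which is the same mechanism a custom `_print` function can rely on.
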

[تابع `_print`]: https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/libstd/io/stdio.rs#L698
[متغیر `$crate`]: https://doc.rust-lang.org/1.30.0/book/first-edition/macros.html#the-variable-crate
[ماکرو `format_args`]: https://doc.rust-lang.org/nightly/std/macro.format_args.html
[fmt::Arguments]: https://doc.rust-lang.org/nightly/core/fmt/struct.Arguments.html

To print to the VGA buffer, we just copy the `println!` and `print!` macros, but modify them to use our own `_print` function:

```rust
// in src/vga_buffer.rs

#[macro_export]
macro_rules! print {
    ($($arg:tt)*) => ($crate::vga_buffer::_print(format_args!($($arg)*)));
}

#[macro_export]
macro_rules! println {
    () => ($crate::print!("\n"));
    ($($arg:tt)*) => ($crate::print!("{}\n", format_args!($($arg)*)));
}

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    WRITER.lock().write_fmt(args).unwrap();
}
```

One thing that we changed from the original `println` definition is that we prefixed the invocations of the `print!` macro with `$crate` too. This ensures that we don't need to import the `print!` macro too if we only want to use `println`.

Like in the standard library, we add the `#[macro_export]` attribute to both macros to make them available everywhere in our crate. Note that this places the macros in the root namespace of the crate, so importing them via `use crate::vga_buffer::println` does not work. Instead, we have to do `use crate::println`.

The `_print` function locks our static `WRITER` and calls the `write_fmt` method on it. This method is from the `Write` trait, which we need to import. The additional `unwrap()` at the end panics if printing isn't successful. But since we always return `Ok` in `write_str`, that should not happen.

Since the macros need to be able to call `_print` from outside of the module, the function needs to be public.
However, since we consider this a private implementation detail, we add the [`doc(hidden)` attribute] to hide it from the generated documentation.

[`doc(hidden)` attribute]: https://doc.rust-lang.org/nightly/rustdoc/write-documentation/the-doc-attribute.html#hidden

### Hello World using `println`

Now we can use `println` in our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    loop {}
}
```

Note that we don't have to import the macro in the main function, because it already lives in the root namespace.

As expected, we now see a _“Hello World!”_ on the screen:

![QEMU printing “Hello World!”](vga-hello-world.png)

### Printing Panic Messages

Now that we have a `println` macro, we can use it in our panic function to print the panic message and the location of the panic:

```rust
// in main.rs

/// This function is called on panic.
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}
```

When we now insert `panic!("Some panic message");` in our `_start` function, we get the following output:

![QEMU printing “panicked at 'Some panic message', src/main.rs:28:5](vga-panic.png)

So we know not only that a panic has occurred, but also the panic message and where in the code it happened.

## Summary

In this post, we learned about the structure of the VGA text buffer and how it can be written through the memory mapping at address `0xb8000`. We created a Rust module that encapsulates the unsafety of writing to this memory-mapped buffer and presents a safe and convenient interface to the outside.

Thanks to cargo, we also saw how easy it is to add dependencies on third-party libraries. The two dependencies that we added, `lazy_static` and `spin`, are very useful in OS development and we will use them in more places in future posts.

## What's next?

The next post explains how to set up Rust's built-in unit test framework. We will then create some basic unit tests for the VGA buffer module from this post.
================================================
FILE: blog/content/edition-2/posts/03-vga-text-buffer/index.fr.md
================================================
+++
title = "VGA Text Mode"
weight = 3
path = "fr/vga-text-mode"
date = 2018-02-26

[extra]
chapter = "Bare Bones"
# Please update this when updating the translation
translation_based_on_commit = "211f460251cd332905225c93eb66b1aff9f4aefd"
# GitHub usernames of the people that translated this post
translators = ["YaogoGerard"]
+++

The [VGA text mode] is a simple way to print text to the screen. In this post, we create an interface that makes its usage safe and simple by encapsulating all unsafety in a separate module. We also implement support for Rust's [formatting macros].

[VGA text mode]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode
[formatting macros]: https://doc.rust-lang.org/std/fmt/#related-macros

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-03`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-03

## The VGA Text Buffer

To print a character to the screen in VGA text mode, one has to write it to the text buffer of the VGA hardware. The VGA text buffer is a two-dimensional array with typically 25 rows and 80 columns, which is directly rendered to the screen. Each array entry describes a single screen character through the following format:

| Bit(s) | Value            |
| ------ | ---------------- |
| 0-7    | ASCII code point |
| 8-11   | Foreground color |
| 12-14  | Background color |
| 15     | Blink            |

The first byte represents the character that should be printed in the [ASCII encoding][encodage ASCII].
To be more specific, it isn't exactly ASCII, but a character set named [_code page 437_] with some additional characters and slight modifications. For simplicity, we will proceed to call it an ASCII character in this post.

[encodage ASCII]: https://en.wikipedia.org/wiki/ASCII
[_code page 437_]: https://en.wikipedia.org/wiki/Code_page_437

The second byte defines how the character is displayed. The first four bits define the foreground color, the next three bits the background color, and the last bit whether the character should blink. The following colors are available:

| Number | Color      | Number + Bright Bit | Bright Color |
| ------ | ---------- | ------------------- | ------------ |
| 0x0    | Black      | 0x8                 | Dark Gray    |
| 0x1    | Blue       | 0x9                 | Light Blue   |
| 0x2    | Green      | 0xa                 | Light Green  |
| 0x3    | Cyan       | 0xb                 | Light Cyan   |
| 0x4    | Red        | 0xc                 | Light Red    |
| 0x5    | Magenta    | 0xd                 | Pink         |
| 0x6    | Brown      | 0xe                 | Yellow       |
| 0x7    | Light Gray | 0xf                 | White        |

Bit 4 is the _bright bit_, which turns, for example, blue into light blue. For the background color, this bit is repurposed as the blink bit.

The VGA text buffer is accessible via [memory-mapped I/O] to the address `0xb8000`. This means that reads and writes to that address don't access the RAM but directly access the text buffer on the VGA hardware. As a result, we can read and write it through normal memory operations to that address.

[memory-mapped I/O]: https://en.wikipedia.org/wiki/Memory-mapped_I/O

Note that memory-mapped hardware might not support all normal RAM operations. For example, a device could only support byte-wise reads and return junk when a `u64` is read.
Fortunately, the text buffer [supports normal reads and writes], so we don't have to treat it in a special way.

[supports normal reads and writes]: https://web.stanford.edu/class/cs140/projects/pintos/specs/freevga/vga/vgamem.htm#manip

## A Rust Module

Now that we know how the VGA buffer works, we can create a Rust module to handle printing:

```rust
// in src/main.rs
mod vga_buffer;
```

For the content of this module, we create a new `src/vga_buffer.rs` file. All of the code below goes into our new module (unless specified otherwise).

### Colors

First, we represent the different colors using an enum:

```rust
// in src/vga_buffer.rs

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Color {
    Black = 0,
    Blue = 1,
    Green = 2,
    Cyan = 3,
    Red = 4,
    Magenta = 5,
    Brown = 6,
    LightGray = 7,
    DarkGray = 8,
    LightBlue = 9,
    LightGreen = 10,
    LightCyan = 11,
    LightRed = 12,
    Pink = 13,
    Yellow = 14,
    White = 15,
}
```

We use a [C-like enum] here to explicitly specify the number for each color. Because of the `repr(u8)` attribute, each enum variant is stored as a `u8`. Actually 4 bits would be sufficient, but Rust doesn't have a `u4` type.

[C-like enum]: https://doc.rust-lang.org/rust-by-example/custom_types/enum/c_like.html

Normally, the compiler would issue a warning for each unused variant. By using the `#[allow(dead_code)]` attribute, we disable these warnings for the `Color` enum.

By [deriving][dérivant] the [`Copy`], [`Clone`], [`Debug`], [`PartialEq`], and [`Eq`] traits, we enable [copy semantics][sémantique de copie] for the type and make it printable and comparable.

[dérivant]: https://doc.rust-lang.org/rust-by-example/trait/derive.html
[`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
[`Clone`]: https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html
[`Debug`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html
[`PartialEq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.PartialEq.html
[`Eq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.Eq.html
[sémantique de copie]: https://doc.rust-lang.org/1.30.0/book/first-edition/ownership.html#copy-types

To represent a full color code that specifies foreground and background color, we create a [newtype] on top of `u8`:

[newtype]: https://doc.rust-lang.org/rust-by-example/generics/new_types.html

```rust
// in src/vga_buffer.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
struct ColorCode(u8);

impl ColorCode {
    fn new(foreground: Color, background: Color) -> ColorCode {
        ColorCode((background as u8) << 4 | (foreground as u8))
    }
}
```

The `ColorCode` struct contains the full color byte, containing foreground and background color. Like before, we derive the `Copy` and `Debug` traits for it. To ensure that the `ColorCode` has the exact same data layout as a `u8`, we use the [`repr(transparent)`] attribute.
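
To see how the two colors end up in a single byte, the bit packing can be checked in isolation. The sketch below repeats a reduced `Color` enum (only four of the sixteen variants) and adds two decoding helpers, `foreground_bits` and `background_bits`, which are hypothetical and exist only for this illustration.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Color {
    Black = 0,
    Blue = 1,
    Yellow = 14,
    White = 15,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct ColorCode(u8);

impl ColorCode {
    pub fn new(foreground: Color, background: Color) -> ColorCode {
        // Background goes into the high nibble, foreground into the low nibble.
        ColorCode((background as u8) << 4 | (foreground as u8))
    }

    /// Hypothetical helper: the low nibble holds the foreground color.
    pub fn foreground_bits(self) -> u8 {
        self.0 & 0x0f
    }

    /// Hypothetical helper: the high nibble holds the background color
    /// (its top bit doubles as the blink bit).
    pub fn background_bits(self) -> u8 {
        self.0 >> 4
    }
}
```

For example, yellow (`0xe`) on black (`0x0`) packs to `0x0e`, and white (`0xf`) on blue (`0x1`) packs to `0x1f`.
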

[`repr(transparent)`]: https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent

### Text Buffer

Now we can add structures to represent a screen character and the text buffer:

```rust
// in src/vga_buffer.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct ScreenChar {
    ascii_character: u8,
    color_code: ColorCode,
}

const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

#[repr(transparent)]
struct Buffer {
    chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

Since the field ordering in default structs is undefined in Rust, we need the [`repr(C)`] attribute. It guarantees that the struct's fields are laid out exactly like in a C struct and thus guarantees the correct field ordering. For the `Buffer` struct, we use [`repr(transparent)`] again to ensure that it has the same memory layout as its single field.

[`repr(C)`]: https://doc.rust-lang.org/nightly/nomicon/other-reprs.html#reprc

To actually write to the screen, we now create a writer type:

```rust
// in src/vga_buffer.rs

pub struct Writer {
    column_position: usize,
    color_code: ColorCode,
    buffer: &'static mut Buffer,
}
```

The writer will always write to the last line and shift lines up when a line is full (or on `\n`). The `column_position` field keeps track of the current position in the last row. The current foreground and background colors are specified by `color_code`, and a reference to the VGA buffer is stored in `buffer`. Note that we need an [explicit lifetime][durée de vie explicite] here to tell the compiler how long the reference is valid. The [`'static`] lifetime specifies that the reference is valid for the whole program run time (which is true for the VGA text buffer).
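
The layout guarantees described for `ScreenChar` and `Buffer` can be checked on a hosted target. The sketch below repeats the type definitions and adds two helpers, `layout_sizes` and `raw_bytes`, which are hypothetical names used only for this illustration. It uses `std::mem`, whose functions are also available under `core::mem`.

```rust
use std::mem::size_of;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
struct ColorCode(u8);

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct ScreenChar {
    ascii_character: u8,
    color_code: ColorCode,
}

const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

#[allow(dead_code)]
#[repr(transparent)]
struct Buffer {
    chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT],
}

/// Returns (size of one screen cell, size of the whole buffer) in bytes.
fn layout_sizes() -> (usize, usize) {
    (size_of::<ScreenChar>(), size_of::<Buffer>())
}

/// Reinterprets a `ScreenChar` as its two raw bytes, in memory order.
fn raw_bytes(ascii: u8, color: u8) -> [u8; 2] {
    let cell = ScreenChar { ascii_character: ascii, color_code: ColorCode(color) };
    // SAFETY: `ScreenChar` is `repr(C)` with two one-byte fields and no padding,
    // so it has the same size and layout as `[u8; 2]`.
    unsafe { std::mem::transmute::<ScreenChar, [u8; 2]>(cell) }
}
```

Each cell occupies two bytes, with the character byte first and the color byte second, and the full 80×25 buffer is 4000 bytes, which matches the hardware format from the table at the beginning of this post.
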

[durée de vie explicite]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#lifetime-annotation-syntax
[`'static`]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime

### Printing

Now we can use the `Writer` to modify the buffer's characters. First we create a method to write a single ASCII byte:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                if self.column_position >= BUFFER_WIDTH {
                    self.new_line();
                }

                let row = BUFFER_HEIGHT - 1;
                let col = self.column_position;

                let color_code = self.color_code;
                self.buffer.chars[row][col] = ScreenChar {
                    ascii_character: byte,
                    color_code,
                };
                self.column_position += 1;
            }
        }
    }

    fn new_line(&mut self) {/* TODO */}
}
```

If the byte is the [newline] byte `\n`, the writer does not print anything. Instead, it calls a `new_line` method, which we'll implement later. Other bytes get printed to the screen in the second `match` case.

[newline]: https://en.wikipedia.org/wiki/Newline

When printing a byte, the writer checks if the current line is full. In that case, a `new_line` call is used to wrap the line. Then it writes a new `ScreenChar` to the buffer at the current position. Finally, the current column position is advanced.

To print whole strings, we can convert them to bytes and print them one by one:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_string(&mut self, s: &str) {
        for byte in s.bytes() {
            match byte {
                // printable ASCII byte or newline
                0x20..=0x7e | b'\n' => self.write_byte(byte),
                // not part of printable ASCII range
                _ => self.write_byte(0xfe),
            }
        }
    }
}
```

The VGA text buffer only supports ASCII and the additional bytes of [code page 437][page de codes 437].
Rust strings are [UTF-8] by default, so they might contain bytes that are not supported by the VGA text buffer. We use a `match` to differentiate printable ASCII bytes (a newline or anything between a space character and a `~` character) from unprintable bytes. For unprintable bytes, we print a `■` character, which has the hex code `0xfe` on the VGA hardware.

[page de codes 437]: https://en.wikipedia.org/wiki/Code_page_437
[UTF-8]: https://www.fileformat.info/info/unicode/utf8.htm

#### Try it out!

To write some characters to the screen, you can create a temporary function:

```rust
// in src/vga_buffer.rs

pub fn print_something() {
    let mut writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };

    writer.write_byte(b'H');
    writer.write_string("ello ");
    writer.write_string("Wörld!");
}
```

It first creates a new `Writer` that points to the VGA buffer at `0xb8000`. The syntax for this might seem a bit strange: First, we cast the integer `0xb8000` to a mutable [raw pointer][pointeur brut]. Then we convert it to a mutable reference by dereferencing it (through `*`) and immediately borrowing it again (through `&mut`). This conversion requires an [`unsafe` block][bloc `unsafe`], since the compiler can't guarantee that the raw pointer is valid.

[pointeur brut]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[bloc `unsafe`]: https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html

Then it writes the byte `b'H'`. The `b` prefix creates a [byte literal][littéral d'octet], which represents an ASCII character. By writing the strings `"ello "` and `"Wörld!"`, we test our `write_string` method and the handling of unprintable characters.
To see the output, we need to call the `print_something` function from our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    vga_buffer::print_something();

    loop {}
}
```

When we run our project now, a `Hello W■■rld!` should be printed in the _lower_ left corner of the screen in yellow:

[littéral d'octet]: https://doc.rust-lang.org/reference/tokens.html#byte-literals

![QEMU output with a yellow `Hello W■■rld!` in the lower left corner](vga-hello.png)

Notice that the `ö` is printed as two `■` characters. That's because `ö` is represented by two bytes in [UTF-8], neither of which falls into the printable ASCII range. In fact, this is a fundamental property of UTF-8: the individual bytes of multi-byte values are never valid ASCII.

### Volatile

We just saw that our message was printed correctly. However, it might not work with future Rust compilers that optimize more aggressively.

The problem is that we only write to the `Buffer` and never read from it again. The compiler doesn't know that we really access VGA buffer memory (instead of normal RAM) and knows nothing about the side effect that some characters appear on the screen. So it might decide that these writes are unnecessary and can be omitted. To avoid this erroneous optimization, we need to specify these writes as _[volatile]_. This tells the compiler that the write has side effects and must not be optimized away.

[volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming)

In order to use volatile writes for the VGA buffer, we use the [volatile][volatile crate] library. This _crate_ (this is what packages are called in the Rust world) provides a `Volatile` wrapper type with `read` and `write` methods.
These methods internally use the [read_volatile] and [write_volatile] functions of the core library and thus guarantee that the reads/writes are not optimized away.

[volatile crate]: https://docs.rs/volatile
[read_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.read_volatile.html
[write_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.write_volatile.html

We can add a dependency on the `volatile` crate by adding it to the `dependencies` section of our `Cargo.toml`:

```toml
# in Cargo.toml

[dependencies]
volatile = "0.2.6"
```

Make sure to specify version `0.2.6` of `volatile`. Newer versions of the crate are not compatible with this post. `0.2.6` is the [semantic][sémantique] version number. For more information, see the [Specifying Dependencies] guide of the cargo documentation.

[sémantique]: https://semver.org/
[Specifying Dependencies]: https://doc.crates.io/specifying-dependencies.html

Let's use it to make writes to the VGA buffer volatile. We update our `Buffer` type as follows:

```rust
// in src/vga_buffer.rs

use volatile::Volatile;

struct Buffer {
    chars: [[Volatile<ScreenChar>; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

Instead of a `ScreenChar`, we are now using a `Volatile<ScreenChar>`. (The `Volatile` type is [generic][générique] and can wrap (almost) any type.) This ensures that we can't accidentally write to it in the "normal" way. Instead, we now have to use the `write` method.

[générique]: https://doc.rust-lang.org/book/ch10-01-syntax.html

This means that we have to update our `Writer::write_byte` method:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                ...

                self.buffer.chars[row][col].write(ScreenChar {
                    ascii_character: byte,
                    color_code,
                });
                ...
            }
        }
    }
    ...
}
```

Instead of a typical assignment using `=`, we now use the `write` method. Now we can guarantee that the compiler will never optimize away this write.

### Formatting Macros

It would be nice to support Rust's formatting macros, too. That way, we can easily print different types, like integers or floats. To support them, we need to implement the [`core::fmt::Write`] trait. The only required method of this trait is `write_str`, which looks quite similar to our `write_string` method, just with a `fmt::Result` return type:

[`core::fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

```rust
// in src/vga_buffer.rs

use core::fmt;

impl fmt::Write for Writer {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.write_string(s);
        Ok(())
    }
}
```

The `Ok(())` is just an `Ok` Result containing the `()` type.

Now we can use Rust's built-in `write!`/`writeln!` formatting macros:

```rust
// in src/vga_buffer.rs

pub fn print_something() {
    use core::fmt::Write;
    let mut writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };

    writer.write_byte(b'H');
    writer.write_string("ello! ");
    write!(writer, "The numbers are {} and {}", 42, 1.0/3.0).unwrap();
}
```

Now you should see a `Hello! The numbers are 42 and 0.3333333333333333` at the bottom of the screen. The `write!` call returns a `Result` which causes a warning if not used, so we call the [`unwrap`] function on it, which panics if an error occurs. This isn't a problem in our case, since writes to the VGA buffer never fail.
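The same `fmt::Write` pattern can be tried without VGA hardware. The sketch below assumes a hypothetical `ArrayWriter` type (not part of the post) that collects bytes in a fixed-size array, so it needs no heap allocation. Unlike the VGA writer, it reports an error when the text does not fit.

```rust
use core::fmt::{self, Write};

/// Hypothetical stand-in for the VGA `Writer`: collects bytes in a fixed array.
struct ArrayWriter {
    bytes: [u8; 64],
    len: usize,
}

impl ArrayWriter {
    fn new() -> Self {
        ArrayWriter { bytes: [0; 64], len: 0 }
    }

    fn as_str(&self) -> &str {
        // Only complete `&str` fragments are ever copied in, so this is valid UTF-8.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap()
    }
}

impl fmt::Write for ArrayWriter {
    // The only required method; `write!` and `writeln!` are built on top of it.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > self.bytes.len() {
            return Err(fmt::Error); // fragment does not fit, nothing is copied
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}

/// Formats two numbers into a fresh writer, mirroring `print_something`.
fn format_numbers() -> ArrayWriter {
    let mut w = ArrayWriter::new();
    write!(w, "The numbers are {} and {}", 42, 1.5).unwrap();
    w
}
```

Calling `format_numbers().as_str()` yields the formatted text, which shows that implementing `write_str` alone is enough to get the formatting macros.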
[`unwrap`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.unwrap

### Newlines

Right now, we just ignore newlines and characters that don't fit into the line anymore. Instead, we want to move every character one line up (the top line gets deleted) and start at the beginning of the last line again. To do this, we add an implementation for the `new_line` method of `Writer`:

```rust
// in src/vga_buffer.rs

impl Writer {
    fn new_line(&mut self) {
        for row in 1..BUFFER_HEIGHT {
            for col in 0..BUFFER_WIDTH {
                let character = self.buffer.chars[row][col].read();
                self.buffer.chars[row - 1][col].write(character);
            }
        }
        self.clear_row(BUFFER_HEIGHT - 1);
        self.column_position = 0;
    }

    fn clear_row(&mut self, row: usize) {/* TODO */}
}
```

We iterate over all the screen characters and move each character one row up. Note that the upper bound of the range notation (`..`) is exclusive. We also omit the 0th row (the first range starts at `1`) because it's the row that is shifted off screen.

To finish the newline code, we add the `clear_row` method:

```rust
// in src/vga_buffer.rs

impl Writer {
    fn clear_row(&mut self, row: usize) {
        let blank = ScreenChar {
            ascii_character: b' ',
            color_code: self.color_code,
        };
        for col in 0..BUFFER_WIDTH {
            self.buffer.chars[row][col].write(blank);
        }
    }
}
```

This method clears a row by overwriting all of its characters with a space character.
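The scrolling logic above can be illustrated on a small in-memory grid. This is a simplified sketch, assuming a hypothetical 3x4 array of plain bytes in place of the volatile VGA buffer; the names `scroll_up` and `scrolled_example` are illustrative only.

```rust
const HEIGHT: usize = 3;
const WIDTH: usize = 4;

/// Move every row up by one and blank the last row, as `new_line` does.
fn scroll_up(screen: &mut [[u8; WIDTH]; HEIGHT]) {
    // Start at row 1: row 0 is the one that scrolls off the screen.
    for row in 1..HEIGHT {
        for col in 0..WIDTH {
            screen[row - 1][col] = screen[row][col];
        }
    }
    // Equivalent of `clear_row` for the last row.
    screen[HEIGHT - 1] = [b' '; WIDTH];
}

/// Scrolls a three-row example grid once and returns the result.
fn scrolled_example() -> [[u8; WIDTH]; HEIGHT] {
    let mut screen = [*b"aaaa", *b"bbbb", *b"cccc"];
    scroll_up(&mut screen);
    screen
}
```

After one call, the former second and third rows occupy the first and second rows, and the last row contains only spaces.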
## A Global Interface

To provide a global writer that can be used as an interface from other modules without carrying a `Writer` instance around, we try to create a static `WRITER`:

```rust
// in src/vga_buffer.rs

pub static WRITER: Writer = Writer {
    column_position: 0,
    color_code: ColorCode::new(Color::Yellow, Color::Black),
    buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
};
```

However, if we try to compile it now, the following errors occur:

```
error[E0015]: calls in statics are limited to constant functions, tuple structs and tuple variants
 --> src/vga_buffer.rs:7:17
  |
7 |     color_code: ColorCode::new(Color::Yellow, Color::Black),
  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0396]: raw pointers cannot be dereferenced in statics
 --> src/vga_buffer.rs:8:22
  |
8 |     buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
  |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereference of raw pointer in constant

error[E0017]: references in statics may only refer to immutable values
 --> src/vga_buffer.rs:8:22
  |
8 |     buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
  |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values

error[E0017]: references in statics may only refer to immutable values
 --> src/vga_buffer.rs:8:13
  |
8 |     buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values
```

To understand what's happening here, we need to know that statics are initialized at compile time, in contrast to normal variables that are initialized at run time. The component of the Rust compiler that evaluates such initialization expressions is called the "[const evaluator]". Its functionality is still limited, but there is ongoing work to expand it, for example in the "[Allow panicking in constants]" RFC.
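The first error (E0015) concerns only the call to `ColorCode::new`. As a minimal sketch, assuming we were free to turn `new` into a `const fn` (a change the post itself does not make), that one call becomes acceptable in a static initializer. The `DEFAULT_COLOR` static and the reduced two-variant `Color` enum are hypothetical and exist only for this illustration. The raw pointer errors are not addressed by this.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Color {
    Black = 0,
    Yellow = 14,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
struct ColorCode(u8);

impl ColorCode {
    // `const fn` allows the const evaluator to run this at compile time.
    const fn new(foreground: Color, background: Color) -> ColorCode {
        ColorCode((background as u8) << 4 | (foreground as u8))
    }
}

// Accepted in a static initializer because `new` is a `const fn`.
static DEFAULT_COLOR: ColorCode = ColorCode::new(Color::Yellow, Color::Black);
```

Yellow on black encodes as `(0 << 4) | 14`, so `DEFAULT_COLOR` holds the byte `0x0e`.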
[const evaluator]: https://rustc-dev-guide.rust-lang.org/const-eval.html
[Allow panicking in constants]: https://github.com/rust-lang/rfcs/pull/2345

The issue with `ColorCode::new` would be solvable by using [`const` functions][fonctions `const`], but the fundamental problem here is that Rust's const evaluator is not able to convert raw pointers to references at compile time. Maybe it will work someday, but until then, we have to find another solution.

[fonctions `const`]: https://doc.rust-lang.org/reference/const_eval.html#const-functions

### Lazy Statics

The one-time initialization of statics with non-const functions is a common problem in Rust. Fortunately, there already exists a good solution in a crate named [lazy_static]. This crate provides a `lazy_static!` macro that defines a lazily initialized `static`. Instead of computing its value at compile time, the `static` lazily initializes itself when accessed for the first time. Thus, the initialization happens at run time, so arbitrarily complex initialization code is possible.

[lazy_static]: https://docs.rs/lazy_static/1.0.1/lazy_static/

Let's add the `lazy_static` crate to our project:

```toml
# in Cargo.toml

[dependencies.lazy_static]
version = "1.0"
features = ["spin_no_std"]
```

We need the `spin_no_std` feature, since we don't link the standard library.

With `lazy_static`, we can define our static `WRITER` without problems:

```rust
// in src/vga_buffer.rs

use lazy_static::lazy_static;

lazy_static! {
    pub static ref WRITER: Writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };
}
```

However, this `WRITER` is pretty useless since it is immutable. This means that we can't write anything to it (since all the write methods take `&mut self`).
One possible solution would be to use a [mutable static][static mutable]. But then every read and write to it would be unsafe, since it could easily introduce data races and other bad things. Using `static mut` is highly discouraged. There were even proposals to [remove it][remove static mut]. But what are the alternatives? We could try to use an immutable static with a cell type like [RefCell] or even [UnsafeCell] that provides [interior mutability][mutabilité intérieure]. But these types aren't [Sync] (with good reason), so we can't use them in statics.

[static mutable]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable
[remove static mut]: https://internals.rust-lang.org/t/pre-rfc-remove-static-mut/1437
[RefCell]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html#keeping-track-of-borrows-at-runtime-with-refcellt
[UnsafeCell]: https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html
[mutabilité intérieure]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html
[Sync]: https://doc.rust-lang.org/nightly/core/marker/trait.Sync.html

### Spinlocks

To get synchronized interior mutability, users of the standard library can use [Mutex]. It provides mutual exclusion by blocking threads when the resource is already locked. But our basic kernel does not have any blocking support or even a concept of threads, so we can't use it either. However, there is a really basic kind of mutex in computer science that requires no operating system features: the [spinlock]. Instead of blocking, the threads simply try to lock it again and again in a tight loop, thus burning CPU time until the mutex is free again.
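To make the idea concrete, here is a deliberately simplified spinlock built only from `core` primitives. It is a sketch for illustration, not how the `spin` crate is implemented: the names `SpinLock`, `with_lock`, `COUNTER`, and `bump` are hypothetical, and the lock stays held forever if the closure panics.

```rust
use core::cell::UnsafeCell;
use core::sync::atomic::{AtomicBool, Ordering};

/// A deliberately simplified spinlock; real implementations are more complete.
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// Safety: access to `value` is serialized by the `locked` flag.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    pub const fn new(value: T) -> Self {
        SpinLock {
            locked: AtomicBool::new(false),
            value: UnsafeCell::new(value),
        }
    }

    /// Spin until the lock is free, run `f` with exclusive access, then unlock.
    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Try to flip `locked` from false to true; retry in a tight loop on failure.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            core::hint::spin_loop();
        }
        // Safety: the flag guarantees that no other reference to `value` exists.
        let result = f(unsafe { &mut *self.value.get() });
        self.locked.store(false, Ordering::Release);
        result
    }
}

// An immutable static that can still be mutated through the lock.
static COUNTER: SpinLock<u32> = SpinLock::new(0);

/// Increments the shared counter and returns the new value.
fn bump() -> u32 {
    COUNTER.with_lock(|n| {
        *n += 1;
        *n
    })
}
```

The closure-based `with_lock` API is a design shortcut that avoids writing a guard type; a guard that unlocks on drop, like the one returned by `lock()` in the code used by this post, is the more common design.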
[Mutex]: https://doc.rust-lang.org/nightly/std/sync/struct.Mutex.html
[spinlock]: https://en.wikipedia.org/wiki/Spinlock

To use a spinning mutex, we can add the [spin crate][crate spin] as a dependency:

[crate spin]: https://crates.io/crates/spin

```toml
# in Cargo.toml

[dependencies]
spin = "0.5.2"
```

Then we can use the spinning mutex to add safe [interior mutability][mutabilité intérieure] to our static `WRITER`:

```rust
// in src/vga_buffer.rs

use spin::Mutex;
...
lazy_static! {
    pub static ref WRITER: Mutex<Writer> = Mutex::new(Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    });
}
```

Now we can delete the `print_something` function and print directly from our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    use core::fmt::Write;
    vga_buffer::WRITER.lock().write_str("Hello again").unwrap();
    write!(vga_buffer::WRITER.lock(), ", some numbers: {} {}", 42, 1.337).unwrap();

    loop {}
}
```

We need to import the `fmt::Write` trait in order to be able to use its functions.

### Safety

Note that we only have a single unsafe block in our code, which is needed to create a `Buffer` reference pointing to `0xb8000`. Afterwards, all operations are safe. Rust uses bounds checking for array accesses by default, so we can't accidentally write outside the buffer. Thus, we encoded the required conditions in the type system and are able to provide a safe interface to the outside.

### A println Macro

Now that we have a global writer, we can add a `println` macro that can be used from anywhere in the codebase. Rust's [macro syntax][syntaxe de macro] is a bit strange, so we won't try to write a macro from scratch.
Instead, we look at the source of the [`println!` macro][macro `println!`] in the standard library:

[syntaxe de macro]: https://doc.rust-lang.org/nightly/book/ch20-05-macros.html#declarative-macros-for-general-metaprogramming
[macro `println!`]: https://doc.rust-lang.org/nightly/std/macro.println!.html

```rust
#[macro_export]
macro_rules! println {
    () => (print!("\n"));
    ($($arg:tt)*) => (print!("{}\n", format_args!($($arg)*)));
}
```

Macros are defined through one or more rules, similar to `match` arms. The `println` macro has two rules: The first rule is for invocations without arguments, e.g., `println!()`, which is expanded to `print!("\n")` and thus just prints a newline. The second rule is for invocations with parameters such as `println!("Hello")` or `println!("Number: {}", 4)`. It is also expanded to an invocation of the `print!` macro, passing all arguments and an additional newline `\n` at the end.

The `#[macro_export]` attribute makes the macro available to the whole crate (not just the module it is defined in) and to external crates. It also places the macro at the crate root, which means that we have to import the macro through `use std::println` instead of `std::macros::println`.

The [`print!` macro][macro `print!`] is defined as:

[macro `print!`]: https://doc.rust-lang.org/nightly/std/macro.print!.html

```rust
#[macro_export]
macro_rules! print {
    ($($arg:tt)*) => ($crate::io::_print(format_args!($($arg)*)));
}
```

The macro expands to a call of the [`_print` function][fonction `_print`] in the `io` module. The [`$crate` variable][variable `$crate`] ensures that the macro also works from outside the `std` crate by expanding to `std` when it's used in other crates. The [`format_args` macro][macro `format_args`] builds a [fmt::Arguments] type from the passed arguments, which is passed to `_print`.
The [`_print` function][fonction `_print`] of libstd calls `print_to`, which is rather complicated because it supports different `Stdout` devices. We don't need that complexity since we just want to print to the VGA buffer.

[fonction `_print`]: https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/libstd/io/stdio.rs#L698
[variable `$crate`]: https://doc.rust-lang.org/1.30.0/book/first-edition/macros.html#the-variable-crate
[macro `format_args`]: https://doc.rust-lang.org/nightly/std/macro.format_args.html
[fmt::Arguments]: https://doc.rust-lang.org/nightly/core/fmt/struct.Arguments.html

To print to the VGA buffer, we just copy the `println!` and `print!` macros, but modify them to use our own `_print` function:

```rust
// in src/vga_buffer.rs

#[macro_export]
macro_rules! print {
    ($($arg:tt)*) => ($crate::vga_buffer::_print(format_args!($($arg)*)));
}

#[macro_export]
macro_rules! println {
    () => ($crate::print!("\n"));
    ($($arg:tt)*) => ($crate::print!("{}\n", format_args!($($arg)*)));
}

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    WRITER.lock().write_fmt(args).unwrap();
}
```

One thing that we changed from the original `println` definition is that we prefixed the invocations of the `print!` macro with `$crate` too. This ensures that we don't need to import the `print!` macro as well if we only want to use `println`.

Like in the standard library, we add the `#[macro_export]` attribute to both macros to make them available everywhere in our crate. Note that this places the macros in the root namespace of the crate, so importing them via `use crate::vga_buffer::println` does not work. Instead, we have to do `use crate::println`.

The `_print` function locks our static `WRITER` and calls the `write_fmt` method on it.
This method is from the `Write` trait, which we need to import. The additional `unwrap()` at the end panics if printing isn't successful. But since we always return `Ok` in `write_str`, that should not happen.

Since the macros need to be able to call `_print` from outside of the module, the function needs to be public. However, since we consider this a private implementation detail, we add the [`doc(hidden)` attribute][attribut `doc(hidden)`] to hide it from the generated documentation.

[attribut `doc(hidden)`]: https://doc.rust-lang.org/nightly/rustdoc/write-documentation/the-doc-attribute.html#hidden

### Hello World using `println`

Now we can use `println` in our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    loop {}
}
```

Note that we don't have to import the macro in the main function, because it already lives in the root namespace.

As expected, we now see a _"Hello World!"_ on the screen:

![QEMU printing "Hello World!"](vga-hello-world.png)

### Printing Panic Messages

Now that we have a `println` macro, we can use it in our panic function to print the panic message and the location of the panic:

```rust
// in main.rs

/// This function is called on panic.
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}
```

When we now insert `panic!("Some panic message");` in our `_start` function, we get the following output:

![QEMU printing "panicked at 'Some panic message', src/main.rs:28:5"](vga-panic.png)

So we know not only that a panic has occurred, but also the panic message and where in the code it happened.
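The `print!`/`println!` forwarding pattern from this post can also be tried on a host machine. The sketch below is a simplification under stated assumptions: it uses the standard library, a hypothetical `OUTPUT` string in place of the VGA `WRITER`, and hypothetical macro names `my_print!`/`my_println!`. Because everything lives in one file, it drops `#[macro_export]` and the `$crate` prefix and calls `_print` by its plain name.

```rust
use std::fmt::{self, Write};
use std::sync::Mutex;

// Hypothetical stand-in for the global VGA `WRITER`: output is collected in a `String`.
pub static OUTPUT: Mutex<String> = Mutex::new(String::new());

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    // `write_fmt` comes from the `fmt::Write` trait, which `String` implements.
    OUTPUT.lock().unwrap().write_fmt(args).unwrap();
}

// Forwards all arguments to `_print` as a single `fmt::Arguments` value.
macro_rules! my_print {
    ($($arg:tt)*) => (_print(format_args!($($arg)*)));
}

// Two rules, as in the standard library: bare newline, or arguments plus newline.
macro_rules! my_println {
    () => (my_print!("\n"));
    ($($arg:tt)*) => (my_print!("{}\n", format_args!($($arg)*)));
}

/// Prints two messages and returns everything collected so far.
fn demo() -> String {
    my_println!("Hello World{}", "!");
    my_print!("{} + {} = {}", 1, 2, 1 + 2);
    OUTPUT.lock().unwrap().clone()
}
```

Each macro invocation takes and releases the lock within a single statement, so the final `lock()` in `demo` does not deadlock.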
## Summary

In this post, we learned about the structure of the VGA text buffer and how it can be written through the memory mapping at address `0xb8000`. We created a Rust module that encapsulates the unsafety of writing to this memory-mapped buffer and presents a safe and convenient interface to the outside.

Thanks to cargo, we also saw how easy it is to add dependencies on third-party libraries. The two dependencies that we added, `lazy_static` and `spin`, are very useful in OS development, and we will use them in more places in future posts.

## What's next?

The next post explains how to set up Rust's built-in unit test framework. We will then create some basic unit tests for the VGA buffer module from this post.

================================================
FILE: blog/content/edition-2/posts/03-vga-text-buffer/index.ja.md
================================================
+++
title = "VGAテキストモード"
weight = 3
path = "ja/vga-text-mode"
date = 2018-02-26

[extra]
# Please update this when updating the translation
translation_based_on_commit = "bd6fbcb1c36705b2c474d7fcee387bfea1210851"
# GitHub usernames of the people that translated this post
translators = ["swnakamura", "JohnTitor"]
+++

[VGA text mode] is a simple way to print text to the screen. In this post, we create an interface that makes its usage safe and simple by encapsulating all unsafe parts in a separate module. We also implement support for Rust's [formatting macros].

[VGA text mode]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode
[formatting macros]: https://doc.rust-lang.org/std/fmt/#related-macros

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there (translator's note: the links point to the original English version). You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-03` branch][post branch].

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-03

## The VGA Text Buffer
To print a character to the screen in VGA text mode, one has to write it to the text buffer of the VGA hardware. The VGA text buffer is a two-dimensional array with typically 25 rows and 80 columns, which is directly rendered to the screen. Each array entry describes a single screen character through the following format:

| Bit(s) | Value            |
| ------ | ---------------- |
| 0-7    | ASCII code point |
| 8-11   | Foreground color |
| 12-14  | Background color |
| 15     | Blink            |

The first byte represents the character that should be printed in the [ASCII encoding]. To be more specific, it isn't exactly ASCII, but a character set named [_code page 437_] with some additional characters and slight modifications. For simplicity, we will call it an ASCII character in this post.

[ASCII encoding]: https://ja.wikipedia.org/wiki/ASCII
[_code page 437_]: https://ja.wikipedia.org/wiki/コードページ437

The second byte defines how the character is displayed. The first four bits define the foreground color (translator's note: the color of the character itself), the next three bits the background color, and the last bit whether the character should blink. The following colors are available:

| Number | Color      | Number + Bright Bit | Bright Color |
| ------ | ---------- | ------------------- | ------------ |
| 0x0    | Black      | 0x8                 | Dark Gray    |
| 0x1    | Blue       | 0x9                 | Light Blue   |
| 0x2    | Green      | 0xa                 | Light Green  |
| 0x3    | Cyan       | 0xb                 | Light Cyan   |
| 0x4    | Red        | 0xc                 | Light Red    |
| 0x5    | Magenta    | 0xd                 | Pink         |
| 0x6    | Brown      | 0xe                 | Yellow       |
| 0x7    | Light Gray | 0xf                 | White        |

The fourth bit is the **bright bit**, which (when set to 1) turns, for example, blue into light blue. For the background color, this bit is repurposed as the blink bit.

The VGA text buffer is accessible via [memory-mapped I/O] at the address `0xb8000`. This means that reads and writes to that address don't access the RAM but directly access the text buffer on the VGA hardware. In other words, we can read and write the text buffer through normal memory operations on that address.

[memory-mapped I/O]: https://ja.wikipedia.org/wiki/メモリマップドI/O

Note that memory-mapped hardware might not support all normal RAM operations. For example, a device could support only byte-wise reads and return junk when a `u64` is read. Fortunately, the text buffer [supports normal reads and writes], so we don't have to treat it in a special way.

[supports normal reads and writes]: https://web.stanford.edu/class/cs140/projects/pintos/specs/freevga/vga/vgamem.htm#manip

## A Rust Module

Now that we know how the VGA buffer works, we can create a Rust module to handle printing:

```rust
// in src/main.rs
mod vga_buffer;
```

For the content of this module, we create a new `src/vga_buffer.rs` file. All of the code below goes into this new module (unless specified otherwise).

### Colors

First, we represent the different colors using an enum:

```rust
// in src/vga_buffer.rs

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Color {
    Black = 0,
    Blue = 1,
    Green = 2,
    Cyan = 3,
    Red = 4,
    Magenta = 5,
    Brown = 6,
    LightGray = 7,
    DarkGray = 8,
    LightBlue = 9,
    LightGreen = 10,
    LightCyan = 11,
    LightRed = 12,
    Pink = 13,
    Yellow = 14,
    White = 15,
}
```

We use a [C-like enum] here to explicitly specify the number for each color. Because of the `repr(u8)` attribute, each enum variant is stored as a `u8`. Actually 4 bits would be sufficient, but Rust doesn't have a `u4` type.

[C-like enum]: https://doc.rust-jp.rs/rust-by-example-ja/custom_types/enum/c_like.html

Normally the compiler would issue a warning for each unused variant. By using the `#[allow(dead_code)]` attribute, we disable these warnings for the `Color` enum.

By [deriving] the [`Copy`], [`Clone`], [`Debug`], [`PartialEq`], and [`Eq`] traits, we enable [copy semantics] for the type and make it printable and comparable.

[deriving]: https://doc.rust-jp.rs/rust-by-example-ja/trait/derive.html
[`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
[`Clone`]: https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html
[`Debug`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html
[`PartialEq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.PartialEq.html
[`Eq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.Eq.html
[copy semantics]: https://doc.rust-jp.rs/book-ja/appendix-03-derivable-traits.html#値を複製するcloneとcopy

To represent a full color code that specifies foreground and background color, we create a [newtype] on top of `u8`:

[newtype]: https://doc.rust-jp.rs/rust-by-example-ja/generics/new_types.html

```rust
// in src/vga_buffer.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
struct ColorCode(u8);

impl ColorCode {
    fn new(foreground: Color, background: Color) -> ColorCode {
        ColorCode((background as u8) << 4 | (foreground as u8))
    }
}
```

The `ColorCode` struct contains the full color byte, holding both the foreground and the background color. Like before, we derive the `Copy` and `Debug` traits for it. To ensure that `ColorCode` has the exact same data layout as a `u8`, we use the [`repr(transparent)`] attribute (translator's note: the linked page was not yet translated at the time of translation).

[`repr(transparent)`]: https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent

### Text Buffer
Next, we add structures that represent a screen character and the text buffer:

```rust
// in src/vga_buffer.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct ScreenChar {
    ascii_character: u8,
    color_code: ColorCode,
}

const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

#[repr(transparent)]
struct Buffer {
    chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

Since the field ordering of default structs is undefined in Rust, we need the [`repr(C)`] attribute. It guarantees that the struct's fields are laid out exactly as in a C struct, and therefore that the field ordering is correct. For the `Buffer` struct, we use [`repr(transparent)`] again so that it has the same memory layout as its single field.

[`repr(C)`]: https://doc.rust-jp.rs/rust-nomicon-ja/other-reprs.html#reprc

To actually write to the screen, we create a writer type:

```rust
// in src/vga_buffer.rs

pub struct Writer {
    column_position: usize,
    color_code: ColorCode,
    buffer: &'static mut Buffer,
}
```

The writer always writes to the last line and shifts lines up when a line is full (or when it receives `\n`). The `column_position` field keeps track of the current position in the last row. The current foreground and background colors are specified by `color_code`, and a reference to the VGA buffer is stored in `buffer`. Note that we need an [explicit lifetime] here to tell the compiler how long the reference is valid. The [`'static`] lifetime specifies that the reference is valid for the whole run time of the program (which is true for the VGA buffer).

[explicit lifetime]: https://doc.rust-jp.rs/book-ja/ch10-03-lifetime-syntax.html#ライフタイム注釈記法
[`'static`]: https://doc.rust-jp.rs/book-ja/ch10-03-lifetime-syntax.html#静的ライフタイム

### Printing

Now we can use the `Writer` to modify the buffer's characters. First, we create a method that writes a single ASCII byte:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                if self.column_position >= BUFFER_WIDTH {
                    self.new_line();
                }

                let row = BUFFER_HEIGHT - 1;
                let col = self.column_position;

                let color_code = self.color_code;
                self.buffer.chars[row][col] = ScreenChar {
                    ascii_character: byte,
                    color_code,
                };
                self.column_position += 1;
            }
        }
    }

    fn new_line(&mut self) {/* TODO */}
}
```

If the byte is the [newline] byte `\n`, the writer does not print anything. Instead, it calls a `new_line` method, which we will implement later. All other bytes are printed to the screen in the second match arm.

[newline]: https://ja.wikipedia.org/wiki/%E6%94%B9%E8%A1%8C%E3%82%B3%E3%83%BC%E3%83%89

When printing a byte, the writer checks whether the current line is full. If it is, `new_line` is called first to wrap the line. The writer then writes a new `ScreenChar` to the buffer at the current position. Finally, the current column position is advanced.

To print whole strings, we convert them to bytes and print them one by one:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_string(&mut self, s: &str) {
        for byte in s.bytes() {
            match byte {
                // printable ASCII byte or newline
                0x20..=0x7e | b'\n' => self.write_byte(byte),
                // not part of the printable ASCII range
                _ => self.write_byte(0xfe),
            }
        }
    }
}
```

The VGA text buffer only supports ASCII and the additional bytes of [code page 437]. Rust strings are [UTF-8] by default, so they might contain bytes that the VGA text buffer does not support. We use a `match` to distinguish printable ASCII bytes (a newline, or anything between the space character and the `~` character) from unprintable bytes. For unprintable bytes, we print a `■` character, which has the hex code `0xfe` on the VGA hardware.

[code page 437]: https://ja.wikipedia.org/wiki/コードページ437
[UTF-8]: https://www.fileformat.info/info/unicode/utf8.htm

#### Try it out!

To write some characters to the screen, we create a temporary function:

```rust
// in src/vga_buffer.rs

pub fn print_something() {
    let mut writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };

    writer.write_byte(b'H');
    writer.write_string("ello ");
    writer.write_string("Wörld!");
}
```

The function first creates a new writer that points to the VGA buffer at `0xb8000`. The syntax for this might seem a bit strange: first, we cast the integer `0xb8000` to a mutable [raw pointer]. Then we convert it to a mutable reference by dereferencing it (through `*`) and immediately borrowing it again (through `&mut`). This conversion requires an [`unsafe` block], since the compiler cannot guarantee that the raw pointer is valid.

[raw pointer]: https://doc.rust-jp.rs/book-ja/ch19-01-unsafe-rust.html#生ポインタを参照外しする
[`unsafe` block]: https://doc.rust-jp.rs/book-ja/ch19-01-unsafe-rust.html

Next, the function writes the byte `b'H'` to it. The `b` prefix creates a [byte literal], which represents an ASCII character. By writing the strings `"ello "` and `"Wörld!"`, we test our `write_string` method and the handling of unprintable characters. To see the output, we need to call the `print_something` function from our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    vga_buffer::print_something();

    loop {}
}
```

When we run our project now, `Hello W■■rld!` should be printed in yellow in the _lower_ left corner of the screen:

[byte literal]: https://doc.rust-lang.org/reference/tokens.html#byte-literals

![QEMU output with a yellow `Hello W■■rld!` in the lower left corner](vga-hello.png)

Notice that the `ö` is printed as two `■` characters. That is because `ö` is represented by two bytes in [UTF-8], neither of which falls into the printable ASCII range. In fact, this is a fundamental property of UTF-8: the individual bytes of a multi-byte value are never valid ASCII.

### Volatile

We just saw that our message was printed correctly. However, it might not work with future Rust compilers that optimize more aggressively.

The problem is that we only write to the `Buffer` and never read from it. The compiler does not know that we are really accessing VGA buffer memory (instead of normal RAM), so it knows nothing about the side effect that characters appear on the screen. It might therefore decide that these writes are unnecessary and can be omitted. To avoid this erroneous optimization, we need to specify these writes as _[volatile]_. This tells the compiler that the write has side effects and must not be optimized away.

[volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming)

To use volatile writes for the VGA buffer, we use the [volatile][volatile crate] library. This _crate_ (this is what packages are called in the Rust world) provides a `Volatile` wrapper type with `read` and `write` methods. These methods internally use the [read_volatile] and [write_volatile] functions of the core library and thus guarantee that the reads and writes are not optimized away.

[volatile crate]: https://docs.rs/volatile
[read_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.read_volatile.html
[write_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.write_volatile.html

We can add a dependency on the `volatile` crate by adding it to the `dependencies` section of our `Cargo.toml`:

```toml
# in Cargo.toml

[dependencies]
volatile = "0.2.6"
```

`0.2.6` is the [semantic] version number. For more information, see the [Specifying Dependencies] guide of the cargo documentation.

[semantic]: https://semver.org/lang/ja/
[Specifying Dependencies]: https://doc.crates.io/specifying-dependencies.html

Let's use it to make writes to the VGA buffer volatile. We update our `Buffer` type as follows:

```rust
// in src/vga_buffer.rs

use volatile::Volatile;

struct Buffer {
    chars: [[Volatile<ScreenChar>; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```
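To make the idea behind such a wrapper concrete, here is a minimal sketch of how a volatile cell could be built on top of `read_volatile` and `write_volatile`. The name `VolatileCell` is hypothetical, and the sketch uses `std::ptr` so that it runs as an ordinary program; it is an illustration under those assumptions, not the actual implementation of the `volatile` crate.

```rust
use std::ptr;

/// Hypothetical stand-in for a volatile wrapper type (illustration only).
#[repr(transparent)]
pub struct VolatileCell<T: Copy>(T);

impl<T: Copy> VolatileCell<T> {
    pub fn new(value: T) -> Self {
        VolatileCell(value)
    }

    /// Reads the value with a volatile load, which the compiler may not elide.
    pub fn read(&self) -> T {
        // SAFETY: `&self.0` points to a valid, aligned, initialized `T`.
        unsafe { ptr::read_volatile(&self.0) }
    }

    /// Stores the value with a volatile write, which the compiler may not elide.
    pub fn write(&mut self, value: T) {
        // SAFETY: `&mut self.0` is valid and aligned for writes of `T`.
        unsafe { ptr::write_volatile(&mut self.0, value) }
    }
}
```

Because the inner field is private, code outside the defining module can only go through `read` and `write`, which is what rules out accidental plain assignments.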
Instead of a `ScreenChar`, we are now using a `Volatile<ScreenChar>`. (The `Volatile` type is [generic] and can wrap (almost) any type.) This ensures that we cannot accidentally write to it "normally". Instead, we now have to use the `write` method.

[generic]: https://doc.rust-lang.org/book/ch10-01-syntax.html

This means that we have to update our `Writer::write_byte` method:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                ...

                self.buffer.chars[row][col].write(ScreenChar {
                    ascii_character: byte,
                    color_code,
                });
                ...
            }
        }
    }
    ...
}
```

Instead of a normal assignment using `=`, we are now using the `write` method. This guarantees that the compiler will never optimize away this write.

### Formatting Macros

It would be nice to support Rust's formatting macros, too. That way, we can easily print different types, such as integers or floats. To support them, we need to implement the [`core::fmt::Write`] trait. The only required method of this trait is `write_str`, which looks quite similar to our `write_string` method, just with a `fmt::Result` return type:

[`core::fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

```rust
// in src/vga_buffer.rs

use core::fmt;

impl fmt::Write for Writer {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.write_string(s);
        Ok(())
    }
}
```

The `Ok(())` is just an `Ok` result containing the `()` type.

Now we can use Rust's built-in `write!`/`writeln!` formatting macros:

```rust
// in src/vga_buffer.rs

pub fn print_something() {
    use core::fmt::Write;
    let mut writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };

    writer.write_byte(b'H');
    writer.write_string("ello! ");
    write!(writer, "The numbers are {} and {}", 42, 1.0/3.0).unwrap();
}
```

Now you should see `Hello! The numbers are 42 and 0.3333333333333333` at the bottom of the screen. The `write!` call returns a `Result`, which causes a warning if it is not used, so we call the [`unwrap`] function on it, which panics if an error occurs. This is not a problem in our case, since writes to the VGA buffer never fail.

[`unwrap`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.unwrap

### Newlines

Right now, we just ignore newlines and characters that no longer fit into the line. Instead, we want to move every character one line up (the top line gets deleted) and start again at the beginning of the last line. To do this, we add an implementation for the `new_line` method of `Writer`:

```rust
// in src/vga_buffer.rs

impl Writer {
    fn new_line(&mut self) {
        for row in 1..BUFFER_HEIGHT {
            for col in 0..BUFFER_WIDTH {
                let character = self.buffer.chars[row][col].read();
                self.buffer.chars[row - 1][col].write(character);
            }
        }
        self.clear_row(BUFFER_HEIGHT - 1);
        self.column_position = 0;
    }

    fn clear_row(&mut self, row: usize) {/* TODO */}
}
```

We iterate over all the screen characters and move each character one row up. Note that the upper bound of the range notation (`..`) is exclusive. We also omit the 0th row (the first range starts at `1`) because it is the row that is shifted off screen.

To finish the newline code, we add the `clear_row` method:

```rust
// in src/vga_buffer.rs

impl Writer {
    fn clear_row(&mut self, row: usize) {
        let blank = ScreenChar {
            ascii_character: b' ',
            color_code: self.color_code,
        };
        for col in 0..BUFFER_WIDTH {
            self.buffer.chars[row][col].write(blank);
        }
    }
}
```

This method clears a row by overwriting all of its characters with a space character.

## A Global Interface

To provide a global writer that can be used as an interface from other modules without carrying a `Writer` instance around, we try to create a static `WRITER`:

```rust
// in src/vga_buffer.rs

pub static WRITER: Writer = Writer {
    column_position: 0,
    color_code: ColorCode::new(Color::Yellow, Color::Black),
    buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
};
```

However, if we try to compile it now, the following errors occur:

```
error[E0015]: calls in statics are limited to constant functions, tuple structs and tuple variants
 --> src/vga_buffer.rs:7:17
  |
7 |     color_code: ColorCode::new(Color::Yellow, Color::Black),
  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0396]: raw pointers cannot be dereferenced in statics
 --> src/vga_buffer.rs:8:22
  |
8 |     buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
  |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereference of raw pointer in constant

error[E0017]: references in statics may only refer to immutable values
 --> src/vga_buffer.rs:8:22
  |
8 |     buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
  |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values

error[E0017]: references in statics may only refer to immutable values
 --> src/vga_buffer.rs:8:13
  |
8 |     buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values
```

To understand what is happening here, we need to know that statics are initialized at compile time, in contrast to normal variables, which are initialized at run time. The component of the Rust compiler that evaluates such initialization expressions is called the "[const evaluator]". Its functionality is still limited, but there is ongoing work to expand it, for example in the "[Allow panicking in constants]" RFC.

[const evaluator]: https://rustc-dev-guide.rust-lang.org/const-eval.html
[Allow panicking in constants]: https://github.com/rust-lang/rfcs/pull/2345

The issue with `ColorCode::new` could be solved by using [`const` functions], but the fundamental problem here is that Rust's const evaluator is not able to convert raw pointers to references at compile time. Maybe it will work someday, but until then we have to find another solution.

[`const` functions]: https://doc.rust-lang.org/reference/const_eval.html#const-functions

### Lazy Statics

The one-time initialization of statics with non-const functions is a common problem in Rust. Fortunately, a good solution already exists in a crate named [lazy_static]. This crate provides a `lazy_static!` macro that defines a lazily initialized `static`. Instead of computing its value at compile time, the `static` initializes itself when it is accessed for the first time. The initialization therefore happens at run time, so arbitrarily complex initialization code is possible.
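**Translator's note:** "lazy" is normally rendered in Japanese as 遅延, as in "deferred (evaluation)". The Japanese text uses a word meaning "idle" instead, to convey the English image of a value so lazy that it is not evaluated until the last moment before it is accessed.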
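The "initialize on first access" behaviour can be observed with a small sketch. It assumes the standard library is available and uses `std::sync::OnceLock` as a stand-in for `lazy_static!`, so it does not apply to the `no_std` kernel as written. The names `SQUARES`, `squares`, and `INIT_CALLS` are made up for this illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how often the initializer runs, so the laziness can be observed.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Starts out empty; the value is computed at run time on first access.
static SQUARES: OnceLock<Vec<u32>> = OnceLock::new();

/// Returns the lazily initialized table (hypothetical helper).
fn squares() -> &'static Vec<u32> {
    SQUARES.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        // Non-const code such as heap allocation is fine here.
        (0..4u32).map(|n| n * n).collect()
    })
}
```

The initializer closure runs at most once, no matter how often `squares()` is called afterwards.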
[lazy_static]: https://docs.rs/lazy_static/1.0.1/lazy_static/

Let's add the `lazy_static` crate to our project:

```toml
# in Cargo.toml

[dependencies.lazy_static]
version = "1.0"
features = ["spin_no_std"]
```

We need the `spin_no_std` feature, since we don't link the standard library.

With `lazy_static`, we can define our static `WRITER` without problems:

```rust
// in src/vga_buffer.rs

use lazy_static::lazy_static;

lazy_static! {
    pub static ref WRITER: Writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };
}
```

However, this `WRITER` is pretty useless, since it is immutable. This means that we cannot write anything to it (all of our write methods take `&mut self`). One possible solution would be to use a [mutable static]. But then every read and write to it would be unsafe, since it could easily introduce data races and other bad things. Given that there have even been proposals to [remove it][remove static mut], we want to avoid `static mut` as far as possible. But what are the alternatives? We could try to use an immutable static together with a cell type such as [RefCell] or even [UnsafeCell] that provides [interior mutability]. But these types are not [Sync] (with good reason), so we cannot use them in statics.

[mutable static]: https://doc.rust-jp.rs/book-ja/ch19-01-unsafe-rust.html#可変で静的な変数にアクセスしたり変更する
[remove static mut]: https://internals.rust-lang.org/t/pre-rfc-remove-static-mut/1437
[RefCell]: https://doc.rust-jp.rs/book-ja/ch15-05-interior-mutability.html#refcelltで実行時に借用を追いかける
[UnsafeCell]: https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html
[interior mutability]: https://doc.rust-jp.rs/book-ja/ch15-05-interior-mutability.html
[Sync]: https://doc.rust-lang.org/nightly/core/marker/trait.Sync.html

### Spinlocks

To get synchronized interior mutability, users of the standard library can use [Mutex]. It provides mutual exclusion by blocking threads when the resource is already locked. But our basic kernel has no blocking support, or even a concept of threads, so we cannot use it either. However, there is a really basic kind of mutex in computer science that requires no operating system features: the [spinlock]. Instead of blocking, the thread simply tries to lock it again and again in a tight loop, burning CPU time until the mutex is free again.

[Mutex]: https://doc.rust-lang.org/nightly/std/sync/struct.Mutex.html
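The spinning idea fits in a few lines. The following is a minimal sketch, assuming a hypothetical `SpinLock` type with a closure-based `with_lock` method; it uses `std` so that it can run as a normal program, and it is not how the `spin` crate is implemented or meant to be used.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical minimal spinlock (illustration only).
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: all access to `value` is serialized through the `locked` flag.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    pub fn new(value: T) -> Self {
        SpinLock {
            locked: AtomicBool::new(false),
            value: UnsafeCell::new(value),
        }
    }

    /// Runs `f` with exclusive access to the protected value.
    /// Note: if `f` panics, this sketch leaves the lock held forever.
    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Busy-wait until we manage to flip `locked` from false to true.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        // SAFETY: we hold the lock, so no other reference to `value` exists.
        let result = f(unsafe { &mut *self.value.get() });
        self.locked.store(false, Ordering::Release);
        result
    }
}
```

Nothing here asks an operating system to suspend or wake a thread, which is why this kind of lock is usable in a kernel that has no scheduler yet.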
[spinlock]: https://ja.wikipedia.org/wiki/スピンロック

To use a spinning mutex, we can add the [spin crate] as a dependency:

[spin crate]: https://crates.io/crates/spin

```toml
# in Cargo.toml

[dependencies]
spin = "0.5.2"
```

Then we can use the spinning mutex to add safe [interior mutability] to our static `WRITER`:

```rust
// in src/vga_buffer.rs

use spin::Mutex;
...
lazy_static! {
    pub static ref WRITER: Mutex<Writer> = Mutex::new(Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    });
}
```

Now we can delete the `print_something` function and print directly from our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    use core::fmt::Write;
    vga_buffer::WRITER.lock().write_str("Hello again").unwrap();
    write!(vga_buffer::WRITER.lock(), ", some numbers: {} {}", 42, 1.337).unwrap();

    loop {}
}
```

We need to import the `fmt::Write` trait in order to be able to use its functions.

### Safety

Note that there is only a single unsafe block in our code, which is needed to create a `Buffer` reference pointing to `0xb8000`. Afterwards, all operations are safe. Rust uses bounds checking for array accesses by default, so we cannot accidentally write outside the buffer. We have thus encoded the required conditions in the type system and are able to provide a safe interface to the outside.

### A println Macro

Now that we have a global writer, we can add a `println` macro that can be used from anywhere in the codebase. Rust's [macro syntax] is a bit strange, so we won't try to write a macro from scratch. Instead, we look at the source of the [`println!` macro] in the standard library:

[macro syntax]: https://doc.rust-lang.org/nightly/book/ch20-05-macros.html#declarative-macros-for-general-metaprogramming
[`println!` macro]: https://doc.rust-lang.org/nightly/std/macro.println!.html

```rust
#[macro_export]
macro_rules! println {
    () => (print!("\n"));
    ($($arg:tt)*) => (print!("{}\n", format_args!($($arg)*)));
}
```

Macros are defined through one or more rules, similar to `match` arms. The `println` macro has two rules: the first is for invocations without arguments (e.g. `println!()`), which expands to `print!("\n")` and thus just prints a newline. The second rule is for invocations with parameters (e.g. `println!("Hello")` or `println!("Number: {}", 4)`). It also expands to an invocation of the `print!` macro, passing all arguments plus an additional newline `\n` at the end.

The `#[macro_export]` attribute makes the macro available to the whole crate (not just the module it is defined in) and to external crates. It also places the macro at the crate root, which means that we have to import the macro through `use std::println` instead of `std::macros::println`.

The [`print!` macro] is defined as:

[`print!` macro]: https://doc.rust-lang.org/nightly/std/macro.print!.html

```rust
#[macro_export]
macro_rules! print {
    ($($arg:tt)*) => ($crate::io::_print(format_args!($($arg)*)));
}
```

The macro expands to a call of the [`_print` function] in the `io` module. The [`$crate` variable] ensures that the macro also works from outside the `std` crate by expanding to `std` when it is used in other crates.

The [`format_args` macro] builds a [fmt::Arguments] type from the passed arguments, which is passed to `_print`. The [`_print` function] of libstd calls `print_to`, which is rather complicated because it supports different `Stdout` devices. We don't need that complexity, since we just want to print to the VGA buffer.

[`_print` function]: https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/libstd/io/stdio.rs#L698
[`$crate` variable]: https://doc.rust-lang.org/1.30.0/book/first-edition/macros.html#the-variable-crate
[`format_args` macro]: https://doc.rust-lang.org/nightly/std/macro.format_args.html
[fmt::Arguments]: https://doc.rust-lang.org/nightly/core/fmt/struct.Arguments.html

To print to the VGA buffer, we just copy the `println!` and `print!` macros, but modify them to use our own `_print` function:

```rust
// in src/vga_buffer.rs

#[macro_export]
macro_rules! print {
    ($($arg:tt)*) => ($crate::vga_buffer::_print(format_args!($($arg)*)));
}

#[macro_export]
macro_rules! println {
    () => ($crate::print!("\n"));
    ($($arg:tt)*) => ($crate::print!("{}\n", format_args!($($arg)*)));
}

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    WRITER.lock().write_fmt(args).unwrap();
}
```

One thing that we changed from the original `println` definition is that we prefixed the invocations of the `print!` macro with `$crate` too. This ensures that we don't need to import the `print!` macro as well if we only want to use `println`.

As in the standard library, we add the `#[macro_export]` attribute to both macros to make them available everywhere in our crate. Note that this places the macros in the root namespace of the crate, so importing them via `use crate::vga_buffer::println` does not work. Instead, we have to write `use crate::println`.

The `_print` function locks our static `WRITER` and calls the `write_fmt` method on it. This method comes from the `Write` trait, so we need to import that trait as well. The additional `unwrap()` at the end panics if printing is not successful. But since we always return `Ok` in `write_str`, that should not happen.

Since the macros need to be able to call `_print` from outside the module, the function needs to be public. However, since we consider it a private implementation detail, we add the [`doc(hidden)` attribute] to hide it from the generated documentation.

[`doc(hidden)` attribute]: https://doc.rust-lang.org/nightly/rustdoc/write-documentation/the-doc-attribute.html#hidden

### Hello World using `println`

Now we can use `println` in our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    loop {}
}
```

Note that we don't have to import the macro in the main function, because it already lives in the root namespace.

As expected, we now see _"Hello World!"_ on the screen:

![QEMU printing “Hello World!”](vga-hello-world.png)

### Printing Panic Messages

Now that we have a `println` macro, we can use it in our panic function to print the panic message and the location of the panic:

```rust
// in main.rs

/// This function is called on panic.
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}
```

When we now insert `panic!("Some panic message");` in our `_start` function, we get the following output:

![QEMU printing “panicked at 'Some panic message', src/main.rs:28:5](vga-panic.png)

So we know not only that a panic has occurred, but also the panic message and where in the code it happened.

## Summary

In this post, we learned about the structure of the VGA text buffer and how it can be written through the memory mapping at address `0xb8000`. We created a Rust module that encapsulates the unsafety of writing to this memory-mapped buffer and presents a safe and convenient interface to the outside.

Thanks to cargo, we also saw how easy it is to add dependencies on third-party libraries. The two dependencies that we added, `lazy_static` and `spin`, are very useful in OS development, and we will use them again in future posts.

## What's next?

The next post explains how to set up Rust's built-in unit test framework. We will then create some basic unit tests for the VGA buffer module from this post.

================================================
FILE: blog/content/edition-2/posts/03-vga-text-buffer/index.ko.md
================================================
+++
title = "VGA Text Mode"
weight = 3
path = "ko/vga-text-mode"
date = 2018-02-26

[extra]
# Please update this when updating the translation
translation_based_on_commit = "1c9b5edd6a5a667e282ca56d6103d3ff1fd7cfcb"
# GitHub usernames of the people that translated this post
translators = ["JOE1994", "Quqqu"]
+++

The [VGA text mode] is a simple way to print text to the screen. In this post, we build an interface that makes VGA text mode easy and safe to use by isolating all unsafe operations in a separate module. We also add support for Rust's [formatting macros].

[VGA text mode]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode
[formatting macros]: https://doc.rust-lang.org/std/fmt/#related-macros

This blog is developed in the open in its [GitHub repository][GitHub]. If you have any problems or questions, please open an issue there. You can also leave a comment [at the bottom of the page][at the bottom]. The complete source code for this post can be found in the [`post-03`][post branch] branch of the repository.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-03

## The VGA Text Buffer

To print a character to the screen in VGA text mode, one has to write it to the text buffer of the VGA hardware. The VGA text buffer is a two-dimensional array, typically with 25 rows and 80 columns, whose contents are rendered to the screen immediately.
Each array entry describes a single screen character in the following format:

| Bit(s) | Value            |
| ------ | ---------------- |
| 0-7    | ASCII code point |
| 8-11   | Foreground color |
| 12-14  | Background color |
| 15     | Blink            |

The first byte represents the character to print in the [ASCII encoding]. Strictly speaking, it is not exactly ASCII but an encoding named [_code page 437_], which adds some characters and makes slight modifications. For simplicity, we will keep calling it an ASCII character in this post.

[ASCII encoding]: https://en.wikipedia.org/wiki/ASCII
[_code page 437_]: https://en.wikipedia.org/wiki/Code_page_437

The second byte defines how the character is displayed. Its first four bits define the foreground color, the next three bits the background color, and the last bit whether the character should blink. The following colors are available:

| Number | Color      | Number + Bright Bit | Bright Color |
| ------ | ---------- | ------------------- | ------------ |
| 0x0    | Black      | 0x8                 | Dark Gray    |
| 0x1    | Blue       | 0x9                 | Light Blue   |
| 0x2    | Green      | 0xa                 | Light Green  |
| 0x3    | Cyan       | 0xb                 | Light Cyan   |
| 0x4    | Red        | 0xc                 | Light Red    |
| 0x5    | Magenta    | 0xd                 | Pink         |
| 0x6    | Brown      | 0xe                 | Yellow       |
| 0x7    | Light Gray | 0xf                 | White        |

The fourth bit of the second byte is the _bright bit_, which turns, for example, blue into light blue. The last bit, which follows the three background-color bits, controls blinking.

The VGA text buffer is accessible via [memory-mapped I/O] at the address `0xb8000`. Reads and writes to that address do not access RAM but go directly to the VGA text buffer.

[memory-mapped I/O]: https://en.wikipedia.org/wiki/Memory-mapped_I/O

Keep in mind that memory-mapped hardware might not support all normal RAM operations. For example, a device that only supports byte-wise reads could return junk when a `u64` is read from it. Fortunately, the text buffer [supports normal reads and writes], so we don't have to treat it in a special way.

[supports normal reads and writes]: https://web.stanford.edu/class/cs140/projects/pintos/specs/freevga/vga/vgamem.htm#manip

## A Rust Module

Now that we know how the VGA buffer works, we can create a Rust module to handle printing:

```rust
// in src/main.rs
mod vga_buffer;
```

For the content of this module, we create a new `src/vga_buffer.rs` file. All of the code below goes into our new module (unless specified otherwise).
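The bit layout from the table above can be checked with a few lines of arithmetic. This is only a sketch: `pack_cell` is a hypothetical helper that takes raw numbers, whereas the module built in this post uses dedicated types instead.

```rust
/// Packs one VGA text cell into its 16-bit representation
/// (hypothetical helper for illustration; not part of the post's module).
///
/// Layout: bits 0-7 character, 8-11 foreground, 12-14 background, 15 blink.
fn pack_cell(ascii: u8, foreground: u8, background: u8, blink: bool) -> u16 {
    // Attribute byte: blink bit, 3 background bits, 4 foreground bits.
    let attribute = ((blink as u8) << 7) | ((background & 0x7) << 4) | (foreground & 0xf);
    // The attribute byte is the high byte, the character the low byte.
    ((attribute as u16) << 8) | ascii as u16
}
```

For example, a yellow (`0xe`) `H` (`0x48`) on a black background without blinking packs to `0x0e48`.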
### Colors

First, we represent the available colors using an enum:

```rust
// in src/vga_buffer.rs

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Color {
    Black = 0,
    Blue = 1,
    Green = 2,
    Cyan = 3,
    Red = 4,
    Magenta = 5,
    Brown = 6,
    LightGray = 7,
    DarkGray = 8,
    LightBlue = 9,
    LightGreen = 10,
    LightCyan = 11,
    LightRed = 12,
    Pink = 13,
    Yellow = 14,
    White = 15,
}
```

We use a [C-like enum] here so that we can assign an explicit number to each color. Because of the `repr(u8)` attribute, each enum variant is stored as a `u8`. Four bits would actually be sufficient, but Rust does not have a `u4` type.

[C-like enum]: https://doc.rust-lang.org/rust-by-example/custom_types/enum/c_like.html

Normally, the compiler issues a warning for each unused enum variant. With the `#[allow(dead_code)]` attribute, we disable these warnings for the `Color` enum.

By [deriving] the [`Copy`], [`Clone`], [`Debug`], [`PartialEq`], and [`Eq`] traits, we enable [copy semantics] for the `Color` type and make it printable and comparable.

[deriving]: https://doc.rust-lang.org/rust-by-example/trait/derive.html
[`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html
[`Clone`]: https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html
[`Debug`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html
[`PartialEq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.PartialEq.html
[`Eq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.Eq.html
[copy semantics]: https://doc.rust-lang.org/1.30.0/book/first-edition/ownership.html#copy-types

To represent a full color code that specifies both foreground and background color, we declare a [newtype] on top of `u8`:

[newtype]: https://doc.rust-lang.org/rust-by-example/generics/new_types.html

```rust
// in src/vga_buffer.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
struct ColorCode(u8);

impl ColorCode {
    fn new(foreground: Color, background: Color) -> ColorCode {
        ColorCode((background as u8) << 4 | (foreground as u8))
    }
}
```

The `ColorCode` struct contains the full color byte, which holds both the foreground and the background color.
As before, we derive the `Copy` and `Debug` traits for it. To ensure that `ColorCode` has exactly the same data layout in memory as a `u8`, we use the [`repr(transparent)`] attribute.

[`repr(transparent)`]: https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent

### Text Buffer

Now we can add structures that represent a screen character and the text buffer:

```rust
// in src/vga_buffer.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct ScreenChar {
    ascii_character: u8,
    color_code: ColorCode,
}

const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

#[repr(transparent)]
struct Buffer {
    chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

In Rust, the order in which a struct's fields are stored in memory is not guaranteed to match the order in which they are declared. To keep the compiler from reordering the fields, we need the [`repr(C)`] attribute. With it, the compiler lays out the fields exactly as in a C struct and cannot rearrange them, so we can rely on the order in which the fields are stored. For the `Buffer` struct, we also apply the [`repr(transparent)`] attribute so that it has the same memory layout as its `chars` field.

[`repr(C)`]: https://doc.rust-lang.org/nightly/nomicon/other-reprs.html#reprc

To actually write to the screen, we now create a writer type:

```rust
// in src/vga_buffer.rs

pub struct Writer {
    column_position: usize,
    color_code: ColorCode,
    buffer: &'static mut Buffer,
}
```

The writer always writes to the last line. When the current line is full or a newline character is received, it finishes that line and moves on to a new one. The foreground and background colors are specified by `color_code`, and a reference to the VGA buffer is stored in `buffer`. We need an [explicit lifetime] here to tell the compiler how long the reference to the buffer is valid. The [`'static`] lifetime specifies that the reference to the VGA buffer is valid for the whole run time of the program.

[explicit lifetime]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#lifetime-annotation-syntax
[`'static`]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime

### Printing

Now we can use the `Writer` to modify the characters stored in the VGA buffer.
First, we create a method that writes a single ASCII byte:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                if self.column_position >= BUFFER_WIDTH {
                    self.new_line();
                }

                let row = BUFFER_HEIGHT - 1;
                let col = self.column_position;

                let color_code = self.color_code;
                self.buffer.chars[row][col] = ScreenChar {
                    ascii_character: byte,
                    color_code,
                };
                self.column_position += 1;
            }
        }
    }

    fn new_line(&mut self) {/* TODO */}
}
```

If the given byte is the [newline] byte `\n`, the writer does not print anything and instead calls the `new_line` method, which we will implement below. All other bytes match the second pattern of the `match` and are printed to the screen.

[newline]: https://en.wikipedia.org/wiki/Newline

When printing a byte, the writer checks whether the current line is full. If it is, `new_line` has to be called first to wrap the line. The writer then stores a new `ScreenChar` at the current position in the buffer. Finally, the current column position is advanced by one.

To print a whole string, we can print its bytes one by one with the method above:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_string(&mut self, s: &str) {
        for byte in s.bytes() {
            match byte {
                // printable ASCII byte or newline
                0x20..=0x7e | b'\n' => self.write_byte(byte),
                // not part of the printable ASCII range
                _ => self.write_byte(0xfe),
            }
        }
    }
}
```

The VGA text buffer only supports ASCII and the additional characters of [code page 437]. Rust strings are [UTF-8] by default, so they might contain bytes that the VGA text buffer does not support. We therefore use a `match` to distinguish printable bytes (a newline, or anything between the space character and the `~` character) from unprintable ones. For unprintable bytes, we print the `■` character, which has the hex code `0xfe` on the VGA hardware.

[code page 437]: https://en.wikipedia.org/wiki/Code_page_437
[UTF-8]: https://www.fileformat.info/info/unicode/utf8.htm

#### Try it out!

Let's create a simple function that writes some characters to the screen:

```rust
// in src/vga_buffer.rs

pub fn print_something() {
    let mut writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };

    writer.write_byte(b'H');
    writer.write_string("ello ");
    writer.write_string("Wörld!");
}
```

The function first creates a new `Writer` instance that points to the memory address `0xb8000`.
The code for this might look a bit strange, so let's go through it step by step: first, we cast the integer `0xb8000` to a mutable [raw pointer]. Then we dereference this pointer with the `*` operator and immediately borrow it again through `&mut`, which gives us a mutable reference to the value stored at that address. Since the Rust compiler cannot guarantee that the pointer is valid and safe to use, the conversion from pointer to reference requires an [`unsafe` block].

[raw pointer]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[`unsafe` block]: https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html

Next, we write the byte `b'H'` to the `Writer` instance. The `b` prefix creates a [byte literal], which represents an ASCII character. By writing the strings `"ello "` and `"Wörld!"`, we test our `write_string` method and the special handling of unprintable characters. To see the output on the screen, we call the `print_something` function from our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    vga_buffer::print_something();

    loop {}
}
```

When we run the project, the message `Hello W■■rld!` is printed as yellow text in the _lower_ left corner of the screen:

[byte literal]: https://doc.rust-lang.org/reference/tokens.html#byte-literals

![QEMU output with a yellow `Hello W■■rld!` in the lower left corner](vga-hello.png)

Two `■` characters are printed in place of the `ö`. That is because `ö` is represented by two bytes in [UTF-8], neither of which falls into the printable ASCII range. This is in fact a fundamental property of UTF-8: the individual bytes of a multi-byte character are never valid ASCII values.

### Volatile

We just saw that the message was printed to the screen. However, it might not be printed anymore if a future Rust compiler optimizes more aggressively.

The point to note is that we only write to the `Buffer` and never read from it. The compiler does not know that we are accessing VGA buffer memory rather than normal RAM, and it knows nothing about the externally observable effect that values written to the buffer appear on the screen. It might therefore conclude that the writes to the VGA buffer are unnecessary and remove them during optimization. To prevent this, we need to mark the writes as _[volatile]_, which tells the compiler that they have an observable side effect.

[volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming)

To write to the VGA buffer in a volatile way, we use the [volatile][volatile crate] crate.
This _crate_ (a Rust library in package form) provides a `Volatile` wrapper type with `read` and `write` methods. These methods internally use the [read_volatile] and [write_volatile] functions of the core library and thus ensure that the reads and writes are not removed during optimization.

[volatile crate]: https://docs.rs/volatile
[read_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.read_volatile.html
[write_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.write_volatile.html

We add the `volatile` crate to the `dependencies` section of our `Cargo.toml`:

```toml
# in Cargo.toml

[dependencies]
volatile = "0.2.6"
```

Make sure to use version `0.2.6` of the `volatile` crate. Newer versions of the crate are not compatible with the code in this post. `0.2.6` is a [semantic] version number. For more information, see the [Specifying Dependencies] chapter of the cargo documentation.

[semantic]: https://semver.org/
[Specifying Dependencies]: https://doc.crates.io/specifying-dependencies.html

Now we use this crate to make writes to the VGA buffer volatile. Update the definition of the `Buffer` type as follows:

```rust
// in src/vga_buffer.rs

use volatile::Volatile;

struct Buffer {
    chars: [[Volatile<ScreenChar>; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

Instead of a `ScreenChar`, we now use a `Volatile<ScreenChar>`. (The `Volatile` type is [generic] and can wrap almost any type.) This prevents us from accidentally performing a "normal" write on it. From now on, writes have to go through the `write` method.

[generic]: https://doc.rust-lang.org/book/ch10-01-syntax.html

We change the `Writer::write_byte` method to use `write` as follows:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                ...

                self.buffer.chars[row][col].write(ScreenChar {
                    ascii_character: byte,
                    color_code,
                });
                ...
            }
        }
    }
    ...
}
```

Because we use the `write` method instead of a normal assignment with `=`, the compiler will never remove this write during optimization.

### Formatting Macros

If the `Writer` type supported Rust's formatting macros, we could easily print values of different types, such as integers or floats. To support them, `Writer` needs to implement the [`core::fmt::Write`] trait.
The only method we need to implement for this trait is `write_str`, which is almost identical to the `write_string` method we wrote above, except that its return type is `fmt::Result`:

[`core::fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

```rust
// in src/vga_buffer.rs

use core::fmt;

impl fmt::Write for Writer {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.write_string(s);
        Ok(())
    }
}
```

The return value `Ok(())` is the `Ok` variant of `Result`, wrapping the `()` type.

Now we can use Rust's built-in `write!`/`writeln!` formatting macros:

```rust
// in src/vga_buffer.rs

pub fn print_something() {
    use core::fmt::Write;
    let mut writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };

    writer.write_byte(b'H');
    writer.write_string("ello! ");
    write!(writer, "The numbers are {} and {}", 42, 1.0/3.0).unwrap();
}
```

The message `Hello! The numbers are 42 and 0.3333333333333333` should now appear at the bottom of the screen. The `write!` macro returns a `Result`, so we call the [`unwrap`] function on it to avoid the warning about an unused `Result`. If the returned `Result` were an `Err`, the program would panic, but our code always returns `Ok` after writing to the VGA buffer, so no panic occurs.

[`unwrap`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.unwrap

### Newlines

Right now, we do not handle newline characters or characters that arrive when the current line is already full. In these cases, we want to move all characters one line up (deleting the top line) and start again at the beginning of the now empty last line. We implement this in the `new_line` method below:

```rust
// in src/vga_buffer.rs

impl Writer {
    fn new_line(&mut self) {
        for row in 1..BUFFER_HEIGHT {
            for col in 0..BUFFER_WIDTH {
                let character = self.buffer.chars[row][col].read();
                self.buffer.chars[row - 1][col].write(character);
            }
        }
        self.clear_row(BUFFER_HEIGHT - 1);
        self.column_position = 0;
    }

    fn clear_row(&mut self, row: usize) {/* TODO */}
}
```

We iterate over all the screen characters and move each of them one row up. Note that the range notation `..` excludes its upper bound. We do not iterate over the 0th row, because it is the row that is shifted off screen.
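The shifting logic can be tried out on a plain array, independent of the VGA hardware. The sketch below assumes a tiny 3×4 grid of bytes and a hypothetical `scroll_up` function; it mirrors the loop bounds of `new_line` but uses ordinary assignments instead of volatile reads and writes.

```rust
const HEIGHT: usize = 3;
const WIDTH: usize = 4;

/// Moves every row one row up and blanks the last row
/// (hypothetical model of the scrolling step, illustration only).
fn scroll_up(grid: &mut [[u8; WIDTH]; HEIGHT]) {
    // Start at row 1: row 0 is overwritten and thereby shifted off screen.
    for row in 1..HEIGHT {
        for col in 0..WIDTH {
            grid[row - 1][col] = grid[row][col];
        }
    }
    // Clear the last row by filling it with spaces.
    for col in 0..WIDTH {
        grid[HEIGHT - 1][col] = b' ';
    }
}
```

Since `1..HEIGHT` excludes `HEIGHT`, the largest index that is read is `HEIGHT - 1`, so the loop never goes out of bounds.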
To finish the newline handling, we add the `clear_row` method:

```rust
// in src/vga_buffer.rs

impl Writer {
    fn clear_row(&mut self, row: usize) {
        let blank = ScreenChar {
            ascii_character: b' ',
            color_code: self.color_code,
        };
        for col in 0..BUFFER_WIDTH {
            self.buffer.chars[row][col].write(blank);
        }
    }
}
```

This method clears a row by overwriting all of its characters with a space character.

## A Global Interface

To provide a globally accessible writer, so that we don't have to carry a `Writer` instance around, let's try to create a static `WRITER`:

```rust
// in src/vga_buffer.rs

pub static WRITER: Writer = Writer {
    column_position: 0,
    color_code: ColorCode::new(Color::Yellow, Color::Black),
    buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
};
```

When we compile this, the following errors are reported:

```
error[E0015]: calls in statics are limited to constant functions, tuple structs and tuple variants
 --> src/vga_buffer.rs:7:17
  |
7 |     color_code: ColorCode::new(Color::Yellow, Color::Black),
  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0396]: raw pointers cannot be dereferenced in statics
 --> src/vga_buffer.rs:8:22
  |
8 |     buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
  |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereference of raw pointer in constant

error[E0017]: references in statics may only refer to immutable values
 --> src/vga_buffer.rs:8:22
  |
8 |     buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
  |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values

error[E0017]: references in statics may only refer to immutable values
 --> src/vga_buffer.rs:8:13
  |
8 |     buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values
```

To understand why these errors occur, we need to know that statics are initialized at compile time, whereas normal local variables are initialized at run time. The component of the Rust compiler that initializes statics at compile time is called the "[const evaluator]". Its functionality is still limited, but there is ongoing work to expand it (for example, the "[Allow panicking in constants]" RFC).
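To see what compile-time initialization does permit, here is a small sketch. It assumes a simplified `ColorCode` whose constructor takes raw `u8` values instead of the `Color` enum, and the static name `DEFAULT_COLOR` is made up for this illustration. Marking the constructor as `const fn` is what allows the call inside the static initializer.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ColorCode(u8);

impl ColorCode {
    // A `const fn` may be called while a static is being initialized,
    // because the compiler can evaluate it at compile time.
    const fn new(foreground: u8, background: u8) -> ColorCode {
        ColorCode((background << 4) | foreground)
    }
}

// Evaluated at compile time: yellow (0xe) on black (0x0).
static DEFAULT_COLOR: ColorCode = ColorCode::new(0xe, 0x0);
```

Without the `const` keyword on `new`, the static would be rejected with an error about non-const function calls in statics.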
[const evaluator]: https://rustc-dev-guide.rust-lang.org/const-eval.html
[Allow panicking in constants]: https://github.com/rust-lang/rfcs/pull/2345

The error for `ColorCode::new` can easily be solved by using [`const` functions]. The bigger problem is that Rust's const evaluator cannot convert raw pointers to references at compile time. This might become possible in the future, but for now we have to find another solution.

[`const` functions]: https://doc.rust-lang.org/reference/const_eval.html#const-functions

### Lazy Statics

In Rust development, it is common to need a one-time initialization of a static with a non-const function. With the `lazy_static!` macro of the [lazy_static] crate, the value of a static is not determined at compile time; instead, its initialization is deferred until the static is accessed for the first time while the program runs. Because the initialization happens at run time, we can use arbitrarily complex non-const functions to compute the initial value.

[lazy_static]: https://docs.rs/lazy_static/1.0.1/lazy_static/

We add the `lazy_static` crate as a dependency of our project:

```toml
# in Cargo.toml

[dependencies.lazy_static]
version = "1.0"
features = ["spin_no_std"]
```

We need the `spin_no_std` feature, since we don't link the Rust standard library.

Thanks to the `lazy_static` crate, we can now define `WRITER` without errors:

```rust
// in src/vga_buffer.rs

use lazy_static::lazy_static;

lazy_static! {
    pub static ref WRITER: Writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };
}
```

However, this `WRITER` is immutable (readable but not writable), which makes it practically useless. All of our write methods take `&mut self` as their first argument, so we cannot write anything through `WRITER`. What about a [mutable static] as a solution? With that choice, every read and write would be exposed to data races and other hazards, so safety could no longer be guaranteed. Using `static mut` is generally discouraged in Rust, and there has even been a proposal to [remove `static mut` from the language entirely][remove static mut]. Are there other alternatives? What about an immutable static with a cell type such as [RefCell] or [UnsafeCell] that provides [interior mutability]? These types do not implement the [Sync] trait (for good reason), so they cannot be used to declare statics.
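As a reminder of what interior mutability means, here is a minimal single-threaded sketch with `RefCell`. It assumes the standard library, and the `HitCounter` type is hypothetical; the point is only that a method taking `&self` can still change the wrapped value.

```rust
use std::cell::RefCell;

/// Hypothetical counter that can be updated through a shared reference.
struct HitCounter {
    hits: RefCell<u32>,
}

impl HitCounter {
    fn new() -> Self {
        HitCounter { hits: RefCell::new(0) }
    }

    /// Takes `&self`, not `&mut self`, yet still mutates the counter.
    /// `RefCell` checks the borrow rules at run time instead.
    fn record(&self) {
        *self.hits.borrow_mut() += 1;
    }

    fn total(&self) -> u32 {
        *self.hits.borrow()
    }
}
```

A `static` of type `HitCounter` would not compile, because `RefCell` performs its borrow checks without any synchronization and is therefore not `Sync`.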
[mutable static]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable
[remove static mut]: https://internals.rust-lang.org/t/pre-rfc-remove-static-mut/1437
[RefCell]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html#keeping-track-of-borrows-at-runtime-with-refcellt
[UnsafeCell]: https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html
[interior mutability]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html
[Sync]: https://doc.rust-lang.org/nightly/core/marker/trait.Sync.html

### Spinlocks

The [Mutex] of the standard library provides synchronized interior mutability. It implements mutual exclusion by blocking the current thread when the resource it wants to access is already locked. Our kernel has no concept of threads, let alone thread blocking, so we can't use [Mutex]. Instead, there is a primitive kind of mutex that requires no operating system features: the [spinlock]. Unlike a blocking mutex, a spinlock does not block the thread. It keeps trying to acquire the resource in a loop, burning CPU time until the lock is released.

[Mutex]: https://doc.rust-lang.org/nightly/std/sync/struct.Mutex.html
[spinlock]: https://en.wikipedia.org/wiki/Spinlock

To use a spinlock, we add the [spin crate] to our dependencies:

[spin crate]: https://crates.io/crates/spin

```toml
# in Cargo.toml
[dependencies]
spin = "0.5.2"
```

Now we can use the spinlock to add safe [interior mutability] to our static `WRITER`:

```rust
// in src/vga_buffer.rs

use spin::Mutex;
...
lazy_static! {
    pub static ref WRITER: Mutex<Writer> = Mutex::new(Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    });
}
```

Now we can delete the `print_something` function and print directly from our `_start` function:

```rust
// in src/main.rs
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    use core::fmt::Write;
    vga_buffer::WRITER.lock().write_str("Hello again").unwrap();
    write!(vga_buffer::WRITER.lock(), ", some numbers: {} {}", 42, 1.337).unwrap();

    loop {}
}
```

We need to import the `fmt::Write` trait in order to use its methods.
### Safety

Our code contains only a single `unsafe` block, which is needed to create a `Buffer` reference pointing to the address `0xb8000`. Apart from this initialization, all operations are memory safe. Rust automatically inserts bounds checks for array accesses, so we can't accidentally write outside the buffer. By encoding the required conditions in the type system, we are able to provide a safe interface to the outside.

### A println Macro

Now that we have a global writer, we can add a `println` macro that can be used from anywhere in the project. Rust's [macro syntax] is a bit strange, so we won't write the macro from scratch. Instead, we look at the implementation of the [`println!` macro] in the standard library:

[macro syntax]: https://doc.rust-lang.org/nightly/book/ch20-05-macros.html#declarative-macros-for-general-metaprogramming
[`println!` macro]: https://doc.rust-lang.org/nightly/std/macro.println!.html

```rust
#[macro_export]
macro_rules! println {
    () => (print!("\n"));
    ($($arg:tt)*) => (print!("{}\n", format_args!($($arg)*)));
}
```

Macros are defined through one or more rules, similar to the arms of a `match` expression. The `println` macro has two rules: The first rule applies to invocations without arguments (e.g., `println!()`) and expands to `print!("\n")`, which prints a newline. The second rule applies to invocations with arguments (e.g., `println!("Hello")` or `println!("Number: {}", 4)`). It passes the arguments on to the `print!` macro and appends a newline at the end.

The `#[macro_export]` attribute makes the macro available to external crates and everywhere in the current crate (by default, a macro can only be used in the module it is defined in). It also places the macro at the crate root, so we have to import it through `use std::println` instead of `use std::macros::println`.

The [`print!` macro] is defined as:

[`print!` macro]: https://doc.rust-lang.org/nightly/std/macro.print!.html

```rust
#[macro_export]
macro_rules! print {
    ($($arg:tt)*) => ($crate::io::_print(format_args!($($arg)*)));
}
```

The macro expands to a call of the [`_print` function] in the `io` module. The [`$crate` variable] expands to `std` when the macro is used from other crates, so the macro also works outside the `std` crate.

The [`format_args` macro] builds a [fmt::Arguments] value from the passed arguments, which is then passed to `_print`.
The [`_print` function] of the standard library calls `print_to`, which is rather complicated because it supports different `Stdout` devices. We only want to print to the VGA buffer, so we don't need that complexity.

[`_print` function]: https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/libstd/io/stdio.rs#L698
[`$crate` variable]: https://doc.rust-lang.org/1.30.0/book/first-edition/macros.html#the-variable-crate
[`format_args` macro]: https://doc.rust-lang.org/nightly/std/macro.format_args.html
[fmt::Arguments]: https://doc.rust-lang.org/nightly/core/fmt/struct.Arguments.html

To print to the VGA buffer, we copy the `println!` and `print!` macros and modify them to use our own `_print` function:

```rust
// in src/vga_buffer.rs

#[macro_export]
macro_rules! print {
    ($($arg:tt)*) => ($crate::vga_buffer::_print(format_args!($($arg)*)));
}

#[macro_export]
macro_rules! println {
    () => ($crate::print!("\n"));
    ($($arg:tt)*) => ($crate::print!("{}\n", format_args!($($arg)*)));
}

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    WRITER.lock().write_fmt(args).unwrap();
}
```

Compared to the original `println` definition, we prefixed the invocations of the `print!` macro with `$crate`. This way, we don't need to import the `print!` macro separately if we only want to use `println`.

Like in the standard library, we add the `#[macro_export]` attribute to both macros to make them available everywhere in our crate. This places the macros in the root namespace of the crate, so we import them through `use crate::println` instead of `use crate::vga_buffer::println`.

The `_print` function locks our static `WRITER` and calls the `write_fmt` method on it. This method comes from the `Write` trait, which we need to import. The `unwrap()` after the `write_fmt` call panics if printing fails. But since `write_str` always returns `Ok`, that should never happen.

Since the macros need to be able to call `_print` from outside of the module, the function needs to be public. To keep this implementation detail out of the public documentation, we add the [`doc(hidden)` attribute], which hides the function from the generated documentation.
[`doc(hidden)` attribute]: https://doc.rust-lang.org/nightly/rustdoc/write-documentation/the-doc-attribute.html#hidden

### Hello World using `println`

Now we can use `println` in our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    loop {}
}
```

Note that we don't have to import the macro in the main function, because it already lives in the root namespace.

As expected, we now see a _“Hello World!”_ on the screen:

![QEMU printing “Hello World!”](vga-hello-world.png)

### Printing Panic Messages

Now that we have a `println` macro, we can use it in our panic function to print the panic message and the location of the panic:

```rust
// in main.rs

/// This function is called on panic.
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}
```

When we now insert `panic!("Some panic message");` in our `_start` function, we get the following output:

![QEMU printing “panicked at 'Some panic message', src/main.rs:28:5](vga-panic.png)

So we know not only that a panic has occurred, but also the panic message and where in the code it happened.

## Summary

In this post, we learned about the structure of the VGA text buffer and how it can be written through the memory mapping at address `0xb8000`. We created a Rust module that encapsulates the unsafety of writing to this memory-mapped buffer and presents a safe and convenient interface to the outside.

We also saw how easy it is to add dependencies on third-party crates with cargo. The two dependencies that we added, `lazy_static` and `spin`, are very useful in OS development and we will use them in more places in future posts.

## What's next?

The next post explains how to set up Rust's built-in unit test framework. We will then create some basic unit tests for the VGA buffer module from this post.

================================================
FILE: blog/content/edition-2/posts/03-vga-text-buffer/index.md
================================================
+++
title = "VGA Text Mode"
weight = 3
path = "vga-text-mode"
date = 2018-02-26

[extra]
chapter = "Bare Bones"
+++

The [VGA text mode] is a simple way to print text to the screen. In this post, we create an interface that makes its usage safe and simple by encapsulating all unsafety in a separate module. We also implement support for Rust's [formatting macros].
[VGA text mode]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode
[formatting macros]: https://doc.rust-lang.org/std/fmt/#related-macros

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-03`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-03

## The VGA Text Buffer

To print a character to the screen in VGA text mode, one has to write it to the text buffer of the VGA hardware. The VGA text buffer is a two-dimensional array with typically 25 rows and 80 columns, which is directly rendered to the screen. Each array entry describes a single screen character through the following format:

| Bit(s) | Value            |
| ------ | ---------------- |
| 0-7    | ASCII code point |
| 8-11   | Foreground color |
| 12-14  | Background color |
| 15     | Blink            |

The first byte represents the character that should be printed in the [ASCII encoding]. To be more specific, it isn't exactly ASCII, but a character set named [_code page 437_] with some additional characters and slight modifications. For simplicity, we will proceed to call it an ASCII character in this post.

[ASCII encoding]: https://en.wikipedia.org/wiki/ASCII
[_code page 437_]: https://en.wikipedia.org/wiki/Code_page_437

The second byte defines how the character is displayed. The first four bits define the foreground color, the next three bits the background color, and the last bit whether the character should blink.
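To make the bit layout concrete, here is a small standalone sketch that packs one screen character into a 16-bit value. It is meant to run on a normal development machine and is not part of the kernel module built in this post. The helper name `vga_cell` is made up for this illustration, and the color numbers come from the color table in this post.

```rust
/// Packs a character byte and two 4-bit color numbers into one 16-bit
/// VGA text cell. `vga_cell` is a hypothetical helper for illustration only.
fn vga_cell(ascii: u8, foreground: u8, background: u8) -> u16 {
    // Low nibble of the color byte: foreground (bits 8-11 of the cell).
    // High nibble: background (bits 12-14) plus the blink bit (bit 15).
    let color = ((background & 0x0f) << 4) | (foreground & 0x0f);
    // The character occupies bits 0-7, the color byte bits 8-15.
    ((color as u16) << 8) | ascii as u16
}

fn main() {
    // 'H' (0x48) in yellow (0xe) on black (0x0).
    let cell = vga_cell(b'H', 0xe, 0x0);
    println!("{:#06x}", cell);
}
```

On a little-endian machine such as x86, the low byte of this value is stored first in memory, which matches the order described above: character byte first, color byte second.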
The following colors are available:

| Number | Color      | Number + Bright Bit | Bright Color |
| ------ | ---------- | ------------------- | ------------ |
| 0x0    | Black      | 0x8                 | Dark Gray    |
| 0x1    | Blue       | 0x9                 | Light Blue   |
| 0x2    | Green      | 0xa                 | Light Green  |
| 0x3    | Cyan       | 0xb                 | Light Cyan   |
| 0x4    | Red        | 0xc                 | Light Red    |
| 0x5    | Magenta    | 0xd                 | Pink         |
| 0x6    | Brown      | 0xe                 | Yellow       |
| 0x7    | Light Gray | 0xf                 | White        |

The fourth bit of a color number (value `0x8`) is the _bright bit_, which turns, for example, blue into light blue. For the background color, this bit is repurposed as the blink bit.

The VGA text buffer is accessible via [memory-mapped I/O] at the address `0xb8000`. Reads and writes to that address don't access the RAM but go directly to the text buffer on the VGA hardware, so we can read and write it through normal memory operations.

[memory-mapped I/O]: https://en.wikipedia.org/wiki/Memory-mapped_I/O

Note that memory-mapped hardware might not support all normal RAM operations. For example, a device might support only byte-wise reads and return junk when a `u64` is read. Fortunately, the text buffer [supports normal reads and writes], so we don't have to treat it in a special way.

[supports normal reads and writes]: https://web.stanford.edu/class/cs140/projects/pintos/specs/freevga/vga/vgamem.htm#manip

## A Rust Module

Now that we know how the VGA buffer works, we can create a Rust module to handle printing:

```rust
// in src/main.rs
mod vga_buffer;
```

For the content of this module, we create a new `src/vga_buffer.rs` file. All of the code below goes into our new module (unless specified otherwise).
### Colors First, we represent the different colors using an enum: ```rust // in src/vga_buffer.rs #[allow(dead_code)] #[derive(Debug, Clone, Copy, PartialEq, Eq)] #[repr(u8)] pub enum Color { Black = 0, Blue = 1, Green = 2, Cyan = 3, Red = 4, Magenta = 5, Brown = 6, LightGray = 7, DarkGray = 8, LightBlue = 9, LightGreen = 10, LightCyan = 11, LightRed = 12, Pink = 13, Yellow = 14, White = 15, } ``` We use a [C-like enum] here to explicitly specify the number for each color. Because of the `repr(u8)` attribute, each enum variant is stored as a `u8`. Actually 4 bits would be sufficient, but Rust doesn't have a `u4` type. [C-like enum]: https://doc.rust-lang.org/rust-by-example/custom_types/enum/c_like.html Normally the compiler would issue a warning for each unused variant. By using the `#[allow(dead_code)]` attribute, we disable these warnings for the `Color` enum. By [deriving] the [`Copy`], [`Clone`], [`Debug`], [`PartialEq`], and [`Eq`] traits, we enable [copy semantics] for the type and make it printable and comparable. 
[deriving]: https://doc.rust-lang.org/rust-by-example/trait/derive.html [`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html [`Clone`]: https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html [`Debug`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html [`PartialEq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.PartialEq.html [`Eq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.Eq.html [copy semantics]: https://doc.rust-lang.org/1.30.0/book/first-edition/ownership.html#copy-types To represent a full color code that specifies foreground and background color, we create a [newtype] on top of `u8`: [newtype]: https://doc.rust-lang.org/rust-by-example/generics/new_types.html ```rust // in src/vga_buffer.rs #[derive(Debug, Clone, Copy, PartialEq, Eq)] #[repr(transparent)] struct ColorCode(u8); impl ColorCode { fn new(foreground: Color, background: Color) -> ColorCode { ColorCode((background as u8) << 4 | (foreground as u8)) } } ``` The `ColorCode` struct contains the full color byte, containing foreground and background color. Like before, we derive the `Copy` and `Debug` traits for it. To ensure that the `ColorCode` has the exact same data layout as a `u8`, we use the [`repr(transparent)`] attribute. [`repr(transparent)`]: https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent ### Text Buffer Now we can add structures to represent a screen character and the text buffer: ```rust // in src/vga_buffer.rs #[derive(Debug, Clone, Copy, PartialEq, Eq)] #[repr(C)] struct ScreenChar { ascii_character: u8, color_code: ColorCode, } const BUFFER_HEIGHT: usize = 25; const BUFFER_WIDTH: usize = 80; #[repr(transparent)] struct Buffer { chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT], } ``` Since the field ordering in default structs is undefined in Rust, we need the [`repr(C)`] attribute. It guarantees that the struct's fields are laid out exactly like in a C struct and thus guarantees the correct field ordering. 
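The layout guarantees can be checked on a development machine with a short standalone program. The sketch below repeats the type definitions from above (with `#[allow(dead_code)]` added only to silence warnings in this isolated form) and prints their sizes. It uses `std` and is not meant to be added to the kernel.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
struct ColorCode(u8);

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct ScreenChar {
    ascii_character: u8,
    color_code: ColorCode,
}

const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

#[allow(dead_code)]
#[repr(transparent)]
struct Buffer {
    chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT],
}

fn main() {
    // Two `u8`-sized fields in declaration order: 2 bytes, byte-aligned.
    println!("ScreenChar: {} bytes", size_of::<ScreenChar>());
    println!("ScreenChar alignment: {}", align_of::<ScreenChar>());
    // 25 rows * 80 columns * 2 bytes per cell.
    println!("Buffer: {} bytes", size_of::<Buffer>());
}
```

The `Buffer` size of 25 × 80 × 2 bytes is exactly the amount of memory that the text buffer occupies starting at `0xb8000`.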
For the `Buffer` struct, we use [`repr(transparent)`] again to ensure that it has the same memory layout as its single field. [`repr(C)`]: https://doc.rust-lang.org/nightly/nomicon/other-reprs.html#reprc To actually write to screen, we now create a writer type: ```rust // in src/vga_buffer.rs pub struct Writer { column_position: usize, color_code: ColorCode, buffer: &'static mut Buffer, } ``` The writer will always write to the last line and shift lines up when a line is full (or on `\n`). The `column_position` field keeps track of the current position in the last row. The current foreground and background colors are specified by `color_code` and a reference to the VGA buffer is stored in `buffer`. Note that we need an [explicit lifetime] here to tell the compiler how long the reference is valid. The [`'static`] lifetime specifies that the reference is valid for the whole program run time (which is true for the VGA text buffer). [explicit lifetime]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#lifetime-annotation-syntax [`'static`]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime ### Printing Now we can use the `Writer` to modify the buffer's characters. First we create a method to write a single ASCII byte: ```rust // in src/vga_buffer.rs impl Writer { pub fn write_byte(&mut self, byte: u8) { match byte { b'\n' => self.new_line(), byte => { if self.column_position >= BUFFER_WIDTH { self.new_line(); } let row = BUFFER_HEIGHT - 1; let col = self.column_position; let color_code = self.color_code; self.buffer.chars[row][col] = ScreenChar { ascii_character: byte, color_code, }; self.column_position += 1; } } } fn new_line(&mut self) {/* TODO */} } ``` If the byte is the [newline] byte `\n`, the writer does not print anything. Instead, it calls a `new_line` method, which we'll implement later. Other bytes get printed to the screen in the second `match` case. 
[newline]: https://en.wikipedia.org/wiki/Newline When printing a byte, the writer checks if the current line is full. In that case, a `new_line` call is used to wrap the line. Then it writes a new `ScreenChar` to the buffer at the current position. Finally, the current column position is advanced. To print whole strings, we can convert them to bytes and print them one-by-one: ```rust // in src/vga_buffer.rs impl Writer { pub fn write_string(&mut self, s: &str) { for byte in s.bytes() { match byte { // printable ASCII byte or newline 0x20..=0x7e | b'\n' => self.write_byte(byte), // not part of printable ASCII range _ => self.write_byte(0xfe), } } } } ``` The VGA text buffer only supports ASCII and the additional bytes of [code page 437]. Rust strings are [UTF-8] by default, so they might contain bytes that are not supported by the VGA text buffer. We use a `match` to differentiate printable ASCII bytes (a newline or anything in between a space character and a `~` character) and unprintable bytes. For unprintable bytes, we print a `■` character, which has the hex code `0xfe` on the VGA hardware. [code page 437]: https://en.wikipedia.org/wiki/Code_page_437 [UTF-8]: https://www.fileformat.info/info/unicode/utf8.htm #### Try it out! To write some characters to the screen, you can create a temporary function: ```rust // in src/vga_buffer.rs pub fn print_something() { let mut writer = Writer { column_position: 0, color_code: ColorCode::new(Color::Yellow, Color::Black), buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, }; writer.write_byte(b'H'); writer.write_string("ello "); writer.write_string("Wörld!"); } ``` It first creates a new Writer that points to the VGA buffer at `0xb8000`. The syntax for this might seem a bit strange: First, we cast the integer `0xb8000` as a mutable [raw pointer]. Then we convert it to a mutable reference by dereferencing it (through `*`) and immediately borrowing it again (through `&mut`). 
This conversion requires an [`unsafe` block], since the compiler can't guarantee that the raw pointer is valid. [raw pointer]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer [`unsafe` block]: https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html Then it writes the byte `b'H'` to it. The `b` prefix creates a [byte literal], which represents an ASCII character. By writing the strings `"ello "` and `"Wörld!"`, we test our `write_string` method and the handling of unprintable characters. To see the output, we need to call the `print_something` function from our `_start` function: ```rust // in src/main.rs #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { vga_buffer::print_something(); loop {} } ``` When we run our project now, a `Hello W■■rld!` should be printed in the _lower_ left corner of the screen in yellow: [byte literal]: https://doc.rust-lang.org/reference/tokens.html#byte-literals ![QEMU output with a yellow `Hello W■■rld!` in the lower left corner](vga-hello.png) Notice that the `ö` is printed as two `■` characters. That's because `ö` is represented by two bytes in [UTF-8], which both don't fall into the printable ASCII range. In fact, this is a fundamental property of UTF-8: the individual bytes of multi-byte values are never valid ASCII. ### Volatile We just saw that our message was printed correctly. However, it might not work with future Rust compilers that optimize more aggressively. The problem is that we only write to the `Buffer` and never read from it again. The compiler doesn't know that we really access VGA buffer memory (instead of normal RAM) and knows nothing about the side effect that some characters appear on the screen. So it might decide that these writes are unnecessary and can be omitted. To avoid this erroneous optimization, we need to specify these writes as _[volatile]_. This tells the compiler that the write has side effects and should not be optimized away. 
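As a rough illustration of what a volatile access looks like at the lowest level, the following standalone sketch uses the `write_volatile` and `read_volatile` functions from the standard library's `ptr` module on an ordinary variable. The variable only stands in for a VGA cell, and the helper name `volatile_roundtrip` is made up for this example.

```rust
use std::ptr;

/// Stores `value` in `cell` with a volatile write and reads it back with a
/// volatile read. Hypothetical helper for illustration only.
fn volatile_roundtrip(cell: &mut u16, value: u16) -> u16 {
    let p: *mut u16 = cell;
    // SAFETY: `p` is derived from a valid, aligned, exclusive reference.
    unsafe {
        // The compiler must emit this store even if the value is never
        // read again by the program.
        ptr::write_volatile(p, value);
        ptr::read_volatile(p)
    }
}

fn main() {
    // Ordinary RAM here; in the kernel this would be a cell of the VGA buffer.
    let mut fake_cell: u16 = 0;
    let seen = volatile_roundtrip(&mut fake_cell, 0x0e48);
    println!("{:#06x}", seen);
}
```

Calling these functions directly requires `unsafe` at every access, which is why a safe wrapper type is more convenient for the buffer.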
[volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming)

In order to use volatile writes for the VGA buffer, we use the [volatile][volatile crate] library. This _crate_ (this is what packages are called in the Rust world) provides a `Volatile` wrapper type with `read` and `write` methods. These methods internally use the [read_volatile] and [write_volatile] functions of the core library and thus guarantee that the reads/writes are not optimized away.

[volatile crate]: https://docs.rs/volatile
[read_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.read_volatile.html
[write_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.write_volatile.html

We can add a dependency on the `volatile` crate by adding it to the `dependencies` section of our `Cargo.toml`:

```toml
# in Cargo.toml

[dependencies]
volatile = "0.2.6"
```

Make sure to specify `volatile` version `0.2.6`. Newer versions of the crate are not compatible with this post. `0.2.6` is the [semantic] version number. For more information, see the [Specifying Dependencies] guide of the cargo documentation.

[semantic]: https://semver.org/
[Specifying Dependencies]: https://doc.crates.io/specifying-dependencies.html

Let's use it to make writes to the VGA buffer volatile. We update our `Buffer` type as follows:

```rust
// in src/vga_buffer.rs

use volatile::Volatile;

struct Buffer {
    chars: [[Volatile<ScreenChar>; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

Instead of a `ScreenChar`, we're now using a `Volatile<ScreenChar>`. (The `Volatile` type is [generic] and can wrap (almost) any type). This ensures that we can't accidentally write to it “normally”. Instead, we have to use the `write` method now.

[generic]: https://doc.rust-lang.org/book/ch10-01-syntax.html

This means that we have to update our `Writer::write_byte` method:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                ...

                self.buffer.chars[row][col].write(ScreenChar {
                    ascii_character: byte,
                    color_code,
                });
                ...
            }
        }
    }
    ...
}
```

Instead of a typical assignment using `=`, we're now using the `write` method. Now we can guarantee that the compiler will never optimize away this write.

### Formatting Macros

It would be nice to support Rust's formatting macros, too. That way, we can easily print different types, like integers or floats. To support them, we need to implement the [`core::fmt::Write`] trait. The only required method of this trait is `write_str`, which looks quite similar to our `write_string` method, just with a `fmt::Result` return type:

[`core::fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

```rust
// in src/vga_buffer.rs

use core::fmt;

impl fmt::Write for Writer {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.write_string(s);
        Ok(())
    }
}
```

The `Ok(())` is just an `Ok` Result containing the `()` type.

Now we can use Rust's built-in `write!`/`writeln!` formatting macros:

```rust
// in src/vga_buffer.rs

pub fn print_something() {
    use core::fmt::Write;
    let mut writer = Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    };

    writer.write_byte(b'H');
    writer.write_string("ello! ");
    write!(writer, "The numbers are {} and {}", 42, 1.0/3.0).unwrap();
}
```

Now you should see a `Hello! The numbers are 42 and 0.3333333333333333` at the bottom of the screen. The `write!` call returns a `Result` which causes a warning if not used, so we call the [`unwrap`] function on it, which panics if an error occurs. This isn't a problem in our case, since writes to the VGA buffer never fail.

[`unwrap`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.unwrap

### Newlines

Right now, we just ignore newlines and characters that don't fit into the line anymore.
Instead, we want to move every character one line up (the top line gets deleted) and start at the beginning of the last line again. To do this, we add an implementation for the `new_line` method of `Writer`: ```rust // in src/vga_buffer.rs impl Writer { fn new_line(&mut self) { for row in 1..BUFFER_HEIGHT { for col in 0..BUFFER_WIDTH { let character = self.buffer.chars[row][col].read(); self.buffer.chars[row - 1][col].write(character); } } self.clear_row(BUFFER_HEIGHT - 1); self.column_position = 0; } fn clear_row(&mut self, row: usize) {/* TODO */} } ``` We iterate over all the screen characters and move each character one row up. Note that the upper bound of the range notation (`..`) is exclusive. We also omit the 0th row (the first range starts at `1`) because it's the row that is shifted off screen. To finish the newline code, we add the `clear_row` method: ```rust // in src/vga_buffer.rs impl Writer { fn clear_row(&mut self, row: usize) { let blank = ScreenChar { ascii_character: b' ', color_code: self.color_code, }; for col in 0..BUFFER_WIDTH { self.buffer.chars[row][col].write(blank); } } } ``` This method clears a row by overwriting all of its characters with a space character. 
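The scrolling logic can be tried out in isolation on a small grid of plain bytes. The sketch below is a simplified stand-in under stated assumptions: it uses a 3×4 array instead of the 25×80 buffer, copies whole rows instead of going through `Volatile` cell by cell, and the function name `scroll_up` is made up for this example.

```rust
const HEIGHT: usize = 3; // tiny grid for illustration; the VGA buffer has 25 rows
const WIDTH: usize = 4; // and 80 columns

/// Moves every row one position up and blanks the last row,
/// mirroring what `new_line` followed by `clear_row` does.
fn scroll_up(grid: &mut [[u8; WIDTH]; HEIGHT]) {
    // Start at row 1: row 0 is the one that gets shifted off screen.
    for row in 1..HEIGHT {
        grid[row - 1] = grid[row];
    }
    // Overwrite the last row with spaces.
    grid[HEIGHT - 1] = [b' '; WIDTH];
}

fn main() {
    let mut grid = [*b"aaaa", *b"bbbb", *b"cccc"];
    scroll_up(&mut grid);
    for row in &grid {
        println!("{:?}", std::str::from_utf8(row).unwrap());
    }
}
```

After one call, the former second and third rows have moved to the top and the last row contains only spaces.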
## A Global Interface To provide a global writer that can be used as an interface from other modules without carrying a `Writer` instance around, we try to create a static `WRITER`: ```rust // in src/vga_buffer.rs pub static WRITER: Writer = Writer { column_position: 0, color_code: ColorCode::new(Color::Yellow, Color::Black), buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, }; ``` However, if we try to compile it now, the following errors occur: ``` error[E0015]: calls in statics are limited to constant functions, tuple structs and tuple variants --> src/vga_buffer.rs:7:17 | 7 | color_code: ColorCode::new(Color::Yellow, Color::Black), | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ error[E0396]: raw pointers cannot be dereferenced in statics --> src/vga_buffer.rs:8:22 | 8 | buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereference of raw pointer in constant error[E0017]: references in statics may only refer to immutable values --> src/vga_buffer.rs:8:22 | 8 | buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values error[E0017]: references in statics may only refer to immutable values --> src/vga_buffer.rs:8:13 | 8 | buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values ``` To understand what's happening here, we need to know that statics are initialized at compile time, in contrast to normal variables that are initialized at run time. The component of the Rust compiler that evaluates such initialization expressions is called the “[const evaluator]”. Its functionality is still limited, but there is ongoing work to expand it, for example in the “[Allow panicking in constants]” RFC. 
[const evaluator]: https://rustc-dev-guide.rust-lang.org/const-eval.html [Allow panicking in constants]: https://github.com/rust-lang/rfcs/pull/2345 The issue with `ColorCode::new` would be solvable by using [`const` functions], but the fundamental problem here is that Rust's const evaluator is not able to convert raw pointers to references at compile time. Maybe it will work someday, but until then, we have to find another solution. [`const` functions]: https://doc.rust-lang.org/reference/const_eval.html#const-functions ### Lazy Statics The one-time initialization of statics with non-const functions is a common problem in Rust. Fortunately, there already exists a good solution in a crate named [lazy_static]. This crate provides a `lazy_static!` macro that defines a lazily initialized `static`. Instead of computing its value at compile time, the `static` lazily initializes itself when accessed for the first time. Thus, the initialization happens at runtime, so arbitrarily complex initialization code is possible. [lazy_static]: https://docs.rs/lazy_static/1.0.1/lazy_static/ Let's add the `lazy_static` crate to our project: ```toml # in Cargo.toml [dependencies.lazy_static] version = "1.0" features = ["spin_no_std"] ``` We need the `spin_no_std` feature, since we don't link the standard library. With `lazy_static`, we can define our static `WRITER` without problems: ```rust // in src/vga_buffer.rs use lazy_static::lazy_static; lazy_static! { pub static ref WRITER: Writer = Writer { column_position: 0, color_code: ColorCode::new(Color::Yellow, Color::Black), buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, }; } ``` However, this `WRITER` is pretty useless since it is immutable. This means that we can't write anything to it (since all the write methods take `&mut self`). One possible solution would be to use a [mutable static]. But then every read and write to it would be unsafe since it could easily introduce data races and other bad things. 
Using `static mut` is highly discouraged. There were even proposals to [remove it][remove static mut]. But what are the alternatives? We could try to use an immutable static with a cell type like [RefCell] or even [UnsafeCell] that provides [interior mutability]. But these types aren't [Sync] \(with good reason), so we can't use them in statics.

[mutable static]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable
[remove static mut]: https://internals.rust-lang.org/t/pre-rfc-remove-static-mut/1437
[RefCell]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html#keeping-track-of-borrows-at-runtime-with-refcellt
[UnsafeCell]: https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html
[interior mutability]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html
[Sync]: https://doc.rust-lang.org/nightly/core/marker/trait.Sync.html

### Spinlocks

To get synchronized interior mutability, users of the standard library can use [Mutex]. It provides mutual exclusion by blocking threads when the resource is already locked. But our basic kernel does not have any blocking support or even a concept of threads, so we can't use it either. However, there is a really basic kind of mutex in computer science that requires no operating system features: the [spinlock]. Instead of blocking, the threads simply try to lock it again and again in a tight loop, thus burning CPU time until the mutex is free again.

[Mutex]: https://doc.rust-lang.org/nightly/std/sync/struct.Mutex.html
[spinlock]: https://en.wikipedia.org/wiki/Spinlock

To use a spinning mutex, we can add the [spin crate] as a dependency:

[spin crate]: https://crates.io/crates/spin

```toml
# in Cargo.toml
[dependencies]
spin = "0.5.2"
```

Then we can use the spinning mutex to add safe [interior mutability] to our static `WRITER`:

```rust
// in src/vga_buffer.rs

use spin::Mutex;
...
lazy_static! {
    pub static ref WRITER: Mutex<Writer> = Mutex::new(Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    });
}
```

Now we can delete the `print_something` function and print directly from our `_start` function:

```rust
// in src/main.rs
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    use core::fmt::Write;
    vga_buffer::WRITER.lock().write_str("Hello again").unwrap();
    write!(vga_buffer::WRITER.lock(), ", some numbers: {} {}", 42, 1.337).unwrap();

    loop {}
}
```

We need to import the `fmt::Write` trait in order to be able to use its functions.

### Safety

Note that we only have a single unsafe block in our code, which is needed to create a `Buffer` reference pointing to `0xb8000`. Afterwards, all operations are safe. Rust uses bounds checking for array accesses by default, so we can't accidentally write outside the buffer. Thus, we encoded the required conditions in the type system and are able to provide a safe interface to the outside.

### A println Macro

Now that we have a global writer, we can add a `println` macro that can be used from anywhere in the codebase. Rust's [macro syntax] is a bit strange, so we won't try to write a macro from scratch. Instead, we look at the source of the [`println!` macro] in the standard library:
The second rule is for invocations with parameters such as `println!("Hello")` or `println!("Number: {}", 4)`. It is also expanded to an invocation of the `print!` macro, passing all arguments and an additional newline `\n` at the end.

The `#[macro_export]` attribute makes the macro available to the whole crate (not just the module it is defined in) and external crates. It also places the macro at the crate root, which means we have to import the macro through `use std::println` instead of `std::macros::println`.

The [`print!` macro] is defined as:

[`print!` macro]: https://doc.rust-lang.org/nightly/std/macro.print!.html

```rust
#[macro_export]
macro_rules! print {
    ($($arg:tt)*) => ($crate::io::_print(format_args!($($arg)*)));
}
```

The macro expands to a call of the [`_print` function] in the `io` module. The [`$crate` variable] ensures that the macro also works from outside the `std` crate by expanding to `std` when it's used in other crates.

The [`format_args` macro] builds a [fmt::Arguments] type from the passed arguments, which is passed to `_print`. The [`_print` function] of libstd calls `print_to`, which is rather complicated because it supports different `Stdout` devices. We don't need that complexity since we just want to print to the VGA buffer.

[`_print` function]: https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/libstd/io/stdio.rs#L698
[`$crate` variable]: https://doc.rust-lang.org/1.30.0/book/first-edition/macros.html#the-variable-crate
[`format_args` macro]: https://doc.rust-lang.org/nightly/std/macro.format_args.html
[fmt::Arguments]: https://doc.rust-lang.org/nightly/core/fmt/struct.Arguments.html

To print to the VGA buffer, we just copy the `println!` and `print!` macros, but modify them to use our own `_print` function:

```rust
// in src/vga_buffer.rs

#[macro_export]
macro_rules! print {
    ($($arg:tt)*) => ($crate::vga_buffer::_print(format_args!($($arg)*)));
}

#[macro_export]
macro_rules! println {
    () => ($crate::print!("\n"));
    ($($arg:tt)*) => ($crate::print!("{}\n", format_args!($($arg)*)));
}

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    WRITER.lock().write_fmt(args).unwrap();
}
```

One thing that we changed from the original `println` definition is that we prefixed the invocations of the `print!` macro with `$crate` too. This ensures that we don't need to import the `print!` macro too if we only want to use `println`.

Like in the standard library, we add the `#[macro_export]` attribute to both macros to make them available everywhere in our crate. Note that this places the macros in the root namespace of the crate, so importing them via `use crate::vga_buffer::println` does not work. Instead, we have to do `use crate::println`.

The `_print` function locks our static `WRITER` and calls the `write_fmt` method on it. This method is from the `Write` trait, which we need to import. The additional `unwrap()` at the end panics if printing isn't successful. But since we always return `Ok` in `write_str`, that should not happen.

Since the macros need to be able to call `_print` from outside of the module, the function needs to be public. However, since we consider this a private implementation detail, we add the [`doc(hidden)` attribute] to hide it from the generated documentation.

[`doc(hidden)` attribute]: https://doc.rust-lang.org/nightly/rustdoc/write-documentation/the-doc-attribute.html#hidden

### Hello World using `println`

Now we can use `println` in our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    loop {}
}
```

Note that we don't have to import the macro in the main function, because it already lives in the root namespace.


As expected, we now see a _“Hello World!”_ on the screen:

![QEMU printing “Hello World!”](vga-hello-world.png)

### Printing Panic Messages

Now that we have a `println` macro, we can use it in our panic function to print the panic message and the location of the panic:

```rust
// in src/main.rs

/// This function is called on panic.
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}
```

When we now insert `panic!("Some panic message");` in our `_start` function, we get the following output:

![QEMU printing “panicked at 'Some panic message', src/main.rs:28:5”](vga-panic.png)

So we know not only that a panic has occurred, but also the panic message and where in the code it happened.

## Summary

In this post, we learned about the structure of the VGA text buffer and how it can be written through the memory mapping at address `0xb8000`. We created a Rust module that encapsulates the unsafety of writing to this memory-mapped buffer and presents a safe and convenient interface to the outside.

Thanks to cargo, we also saw how easy it is to add dependencies on third-party libraries. The two dependencies that we added, `lazy_static` and `spin`, are very useful in OS development and we will use them in more places in future posts.

## What's next?

The next post explains how to set up Rust's built-in unit test framework. We will then create some basic unit tests for the VGA buffer module from this post.
================================================ FILE: blog/content/edition-2/posts/03-vga-text-buffer/index.pt-BR.md ================================================ +++ title = "Modo de Texto VGA" weight = 3 path = "pt-BR/vga-text-mode" date = 2018-02-26 [extra] chapter = "O Básico" # Please update this when updating the translation translation_based_on_commit = "9753695744854686a6b80012c89b0d850a44b4b0" # GitHub usernames of the people that translated this post translators = ["richarddalves"] +++ O [modo de texto VGA] é uma maneira simples de imprimir texto na tela. Neste post, criamos uma interface que torna seu uso seguro e simples ao encapsular toda a unsafety em um módulo separado. Também implementamos suporte para as [macros de formatação] do Rust. [modo de texto VGA]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode [macros de formatação]: https://doc.rust-lang.org/std/fmt/#related-macros Este blog é desenvolvido abertamente no [GitHub]. Se você tiver algum problema ou dúvida, abra um issue lá. Você também pode deixar comentários [na parte inferior]. O código-fonte completo desta publicação pode ser encontrado na branch [`post-03`][post branch]. [GitHub]: https://github.com/phil-opp/blog_os [na parte inferior]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-03 ## O Buffer de Texto VGA Para imprimir um caractere na tela em modo de texto VGA, é preciso escrevê-lo no buffer de texto do hardware VGA. O buffer de texto VGA é um array bidimensional com tipicamente 25 linhas e 80 colunas, que é renderizado diretamente na tela. Cada entrada do array descreve um único caractere da tela através do seguinte formato: | Bit(s) | Valor | | ------ | --------------------- | | 0-7 | Ponto de código ASCII | | 8-11 | Cor do primeiro plano | | 12-14 | Cor do fundo | | 15 | Piscar | O primeiro byte representa o caractere que deve ser impresso na [codificação ASCII]. 
Para ser mais específico, não é exatamente ASCII, mas um conjunto de caracteres chamado [_página de código 437_] com alguns caracteres adicionais e pequenas modificações. Para simplificar, continuaremos chamando-o de caractere ASCII neste post. [codificação ASCII]: https://en.wikipedia.org/wiki/ASCII [_página de código 437_]: https://en.wikipedia.org/wiki/Code_page_437 O segundo byte define como o caractere é exibido. Os primeiros quatro bits definem a cor do primeiro plano, os próximos três bits a cor do fundo, e o último bit se o caractere deve piscar. As seguintes cores estão disponíveis: | Número | Cor | Número + Bit Brilhante | Cor Brilhante | | ------ | ----------- | ---------------------- | -------------- | | 0x0 | Preto | 0x8 | Cinza Escuro | | 0x1 | Azul | 0x9 | Azul Claro | | 0x2 | Verde | 0xa | Verde Claro | | 0x3 | Ciano | 0xb | Ciano Claro | | 0x4 | Vermelho | 0xc | Vermelho Claro | | 0x5 | Magenta | 0xd | Rosa | | 0x6 | Marrom | 0xe | Amarelo | | 0x7 | Cinza Claro | 0xf | Branco | O bit 4 é o _bit brilhante_, que transforma, por exemplo, azul em azul claro. Para a cor de fundo, este bit é reaproveitado como o bit de piscar. O buffer de texto VGA é acessível via [I/O mapeado em memória] no endereço `0xb8000`. Isso significa que leituras e escritas naquele endereço não acessam a RAM, mas acessam diretamente o buffer de texto no hardware VGA. Isso significa que podemos lê-lo e escrevê-lo através de operações normais de memória naquele endereço. [I/O mapeado em memória]: https://en.wikipedia.org/wiki/Memory-mapped_I/O Note que hardware mapeado em memória pode não suportar todas as operações normais de RAM. Por exemplo, um dispositivo poderia suportar apenas leituras byte a byte e retornar lixo quando um `u64` é lido. Felizmente, o buffer de texto [suporta leituras e escritas normais], então não precisamos tratá-lo de maneira especial. 
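
Como ilustração do formato descrito acima, o esboço a seguir monta a entrada de 16 bits de um caractere e calcula o endereço de uma célula dentro do buffer. É apenas uma suposição de trabalho: os nomes `vga_entry` e `cell_address` são hipotéticos (não aparecem neste post), e o cálculo do endereço assume 80 colunas, 2 bytes por célula e células armazenadas linha a linha a partir de `0xb8000`.

```rust
// Esboço ilustrativo: os nomes abaixo são hipotéticos e não fazem parte do post.

const VGA_BASE: usize = 0xb8000; // endereço do buffer de texto VGA
const BUFFER_WIDTH: usize = 80; // número típico de colunas

/// Monta a entrada de 16 bits: byte baixo = caractere, byte alto = cores.
/// `foreground` usa 4 bits (0x0..=0xf) e `background` usa 3 bits (0x0..=0x7);
/// o bit 15 indica se o caractere deve piscar.
fn vga_entry(character: u8, foreground: u8, background: u8, blink: bool) -> u16 {
    let attribute = ((blink as u8) << 7) | ((background & 0x7) << 4) | (foreground & 0xf);
    ((attribute as u16) << 8) | character as u16
}

/// Endereço da célula (linha, coluna), assumindo 2 bytes por célula.
fn cell_address(row: usize, col: usize) -> usize {
    VGA_BASE + 2 * (row * BUFFER_WIDTH + col)
}
```

Por exemplo, `vga_entry(b'H', 0xe, 0x0, false)` resulta em `0x0e48`: um `H` amarelo sobre fundo preto. Já `cell_address(24, 0)` aponta para o início da última linha da tela.
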
[suporta leituras e escritas normais]: https://web.stanford.edu/class/cs140/projects/pintos/specs/freevga/vga/vgamem.htm#manip ## Um Módulo Rust Agora que sabemos como o buffer VGA funciona, podemos criar um módulo Rust para lidar com a impressão: ```rust // em src/main.rs mod vga_buffer; ``` Para o conteúdo deste módulo, criamos um novo arquivo `src/vga_buffer.rs`. Todo o código abaixo vai para nosso novo módulo (a menos que especificado o contrário). ### Cores Primeiro, representamos as diferentes cores usando um enum: ```rust // em src/vga_buffer.rs #[allow(dead_code)] #[derive(Debug, Clone, Copy, PartialEq, Eq)] #[repr(u8)] pub enum Color { Black = 0, Blue = 1, Green = 2, Cyan = 3, Red = 4, Magenta = 5, Brown = 6, LightGray = 7, DarkGray = 8, LightBlue = 9, LightGreen = 10, LightCyan = 11, LightRed = 12, Pink = 13, Yellow = 14, White = 15, } ``` Usamos um [enum estilo C] aqui para especificar explicitamente o número para cada cor. Por causa do atributo `repr(u8)`, cada variante do enum é armazenada como um `u8`. Na verdade, 4 bits seriam suficientes, mas Rust não tem um tipo `u4`. [enum estilo C]: https://doc.rust-lang.org/rust-by-example/custom_types/enum/c_like.html Normalmente o compilador emitiria um aviso para cada variante não utilizada. Ao usar o atributo `#[allow(dead_code)]`, desabilitamos esses avisos para o enum `Color`. Ao [derivar] as traits [`Copy`], [`Clone`], [`Debug`], [`PartialEq`] e [`Eq`], habilitamos [semântica de cópia] para o tipo e o tornamos imprimível e comparável. 
[derivar]: https://doc.rust-lang.org/rust-by-example/trait/derive.html [`Copy`]: https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html [`Clone`]: https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html [`Debug`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html [`PartialEq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.PartialEq.html [`Eq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.Eq.html [semântica de cópia]: https://doc.rust-lang.org/1.30.0/book/first-edition/ownership.html#copy-types Para representar um código de cor completo que especifica as cores de primeiro plano e de fundo, criamos um [newtype] em cima de `u8`: [newtype]: https://doc.rust-lang.org/rust-by-example/generics/new_types.html ```rust // em src/vga_buffer.rs #[derive(Debug, Clone, Copy, PartialEq, Eq)] #[repr(transparent)] struct ColorCode(u8); impl ColorCode { fn new(foreground: Color, background: Color) -> ColorCode { ColorCode((background as u8) << 4 | (foreground as u8)) } } ``` A struct `ColorCode` contém o byte de cor completo, contendo as cores de primeiro plano e de fundo. Como antes, derivamos as traits `Copy` e `Debug` para ela. Para garantir que o `ColorCode` tenha exatamente o mesmo layout de dados que um `u8`, usamos o atributo [`repr(transparent)`]. [`repr(transparent)`]: https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent ### Buffer de Texto Agora podemos adicionar estruturas para representar um caractere da tela e o buffer de texto: ```rust // em src/vga_buffer.rs #[derive(Debug, Clone, Copy, PartialEq, Eq)] #[repr(C)] struct ScreenChar { ascii_character: u8, color_code: ColorCode, } const BUFFER_HEIGHT: usize = 25; const BUFFER_WIDTH: usize = 80; #[repr(transparent)] struct Buffer { chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT], } ``` Como a ordenação dos campos em structs padrão é indefinida em Rust, precisamos do atributo [`repr(C)`]. 
Ele garante que os campos da struct sejam dispostos exatamente como em uma struct C e, portanto, garante a ordenação correta dos campos. Para a struct `Buffer`, usamos [`repr(transparent)`] novamente para garantir que ela tenha o mesmo layout de memória que seu único campo. [`repr(C)`]: https://doc.rust-lang.org/nightly/nomicon/other-reprs.html#reprc Para realmente escrever na tela, agora criamos um tipo writer: ```rust // em src/vga_buffer.rs pub struct Writer { column_position: usize, color_code: ColorCode, buffer: &'static mut Buffer, } ``` O writer sempre escreverá na última linha e deslocará as linhas para cima quando uma linha estiver cheia (ou no `\n`). O campo `column_position` acompanha a posição atual na última linha. As cores de primeiro plano e de fundo atuais são especificadas por `color_code` e uma referência ao buffer VGA é armazenada em `buffer`. Note que precisamos de um [lifetime explícito] aqui para dizer ao compilador por quanto tempo a referência é válida. O lifetime [`'static`] especifica que a referência é válida durante toda a execução do programa (o que é verdade para o buffer de texto VGA). [lifetime explícito]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#lifetime-annotation-syntax [`'static`]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime ### Impressão Agora podemos usar o `Writer` para modificar os caracteres do buffer. Primeiro criamos um método para escrever um único byte ASCII: ```rust // em src/vga_buffer.rs impl Writer { pub fn write_byte(&mut self, byte: u8) { match byte { b'\n' => self.new_line(), byte => { if self.column_position >= BUFFER_WIDTH { self.new_line(); } let row = BUFFER_HEIGHT - 1; let col = self.column_position; let color_code = self.color_code; self.buffer.chars[row][col] = ScreenChar { ascii_character: byte, color_code, }; self.column_position += 1; } } } fn new_line(&mut self) {/* TODO */} } ``` Se o byte é o byte de [newline] `\n`, o writer não imprime nada. 
Em vez disso, ele chama um método `new_line`, que implementaremos mais tarde. Outros bytes são impressos na tela no segundo caso `match`. [newline]: https://en.wikipedia.org/wiki/Newline Ao imprimir um byte, o writer verifica se a linha atual está cheia. Nesse caso, uma chamada `new_line` é usada para quebrar a linha. Então ele escreve um novo `ScreenChar` no buffer na posição atual. Finalmente, a posição da coluna atual é avançada. Para imprimir strings inteiras, podemos convertê-las em bytes e imprimi-los um por um: ```rust // em src/vga_buffer.rs impl Writer { pub fn write_string(&mut self, s: &str) { for byte in s.bytes() { match byte { // byte ASCII imprimível ou newline 0x20..=0x7e | b'\n' => self.write_byte(byte), // não faz parte da faixa ASCII imprimível _ => self.write_byte(0xfe), } } } } ``` O buffer de texto VGA suporta apenas ASCII e os bytes adicionais da [página de código 437]. Strings Rust são [UTF-8] por padrão, então podem conter bytes que não são suportados pelo buffer de texto VGA. Usamos um `match` para diferenciar bytes ASCII imprimíveis (um newline ou qualquer coisa entre um caractere de espaço e um caractere `~`) e bytes não imprimíveis. Para bytes não imprimíveis, imprimimos um caractere `■`, que tem o código hexadecimal `0xfe` no hardware VGA. [página de código 437]: https://en.wikipedia.org/wiki/Code_page_437 [UTF-8]: https://www.fileformat.info/info/unicode/utf8.htm #### Experimente! Para escrever alguns caracteres na tela, você pode criar uma função temporária: ```rust // em src/vga_buffer.rs pub fn print_something() { let mut writer = Writer { column_position: 0, color_code: ColorCode::new(Color::Yellow, Color::Black), buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, }; writer.write_byte(b'H'); writer.write_string("ello "); writer.write_string("Wörld!"); } ``` Primeiro ele cria um novo Writer que aponta para o buffer VGA em `0xb8000`. 
A sintaxe para isso pode parecer um pouco estranha: Primeiro, convertemos o inteiro `0xb8000` como um [ponteiro bruto] mutável. Então o convertemos em uma referência mutável ao desreferenciá-lo (através de `*`) e imediatamente emprestar novamente (através de `&mut`). Esta conversão requer um [bloco `unsafe`], pois o compilador não pode garantir que o ponteiro bruto é válido. [ponteiro bruto]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer [bloco `unsafe`]: https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html Então ele escreve o byte `b'H'` nele. O prefixo `b` cria um [byte literal], que representa um caractere ASCII. Ao escrever as strings `"ello "` e `"Wörld!"`, testamos nosso método `write_string` e o tratamento de caracteres não imprimíveis. Para ver a saída, precisamos chamar a função `print_something` da nossa função `_start`: ```rust // em src/main.rs #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { vga_buffer::print_something(); loop {} } ``` Quando executamos nosso projeto agora, um `Hello W■■rld!` deve ser impresso no canto inferior _esquerdo_ da tela em amarelo: [byte literal]: https://doc.rust-lang.org/reference/tokens.html#byte-literals ![QEMU exibindo um `Hello W■■rld!` amarelo no canto inferior esquerdo](vga-hello.png) Note que o `ö` é impresso como dois caracteres `■`. Isso ocorre porque `ö` é representado por dois bytes em [UTF-8], que ambos não se enquadram na faixa ASCII imprimível. Na verdade, esta é uma propriedade fundamental do UTF-8: os bytes individuais de valores multi-byte nunca são ASCII válido. ### Volatile Acabamos de ver que nossa mensagem foi impressa corretamente. No entanto, pode não funcionar com futuros compiladores Rust que otimizam de forma mais agressiva. O problema é que escrevemos apenas no `Buffer` e nunca lemos dele novamente. 
O compilador não sabe que realmente acessamos memória do buffer VGA (em vez de RAM normal) e não sabe nada sobre o efeito colateral de que alguns caracteres aparecem na tela. Então ele pode decidir que essas escritas são desnecessárias e podem ser omitidas. Para evitar esta otimização errônea, precisamos especificar essas escritas como _[volatile]_. Isso diz ao compilador que a escrita tem efeitos colaterais e não deve ser otimizada.

[volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming)

Para usar escritas volatile para o buffer VGA, usamos a biblioteca [volatile][volatile crate]. Esta _crate_ (é assim que os pacotes são chamados no mundo Rust) fornece um tipo wrapper `Volatile` com métodos `read` e `write`. Esses métodos usam internamente as funções [read_volatile] e [write_volatile] da biblioteca core e, portanto, garantem que as leituras/escritas não sejam otimizadas.

[volatile crate]: https://docs.rs/volatile
[read_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.read_volatile.html
[write_volatile]: https://doc.rust-lang.org/nightly/core/ptr/fn.write_volatile.html

Podemos adicionar uma dependência na crate `volatile` adicionando-a à seção `dependencies` do nosso `Cargo.toml`:

```toml
# em Cargo.toml

[dependencies]
volatile = "0.2.6"
```

Certifique-se de especificar a versão `0.2.6` do `volatile`. Versões mais novas da crate não são compatíveis com este post. `0.2.6` é o número de versão [semântico]. Para mais informações, veja o guia [Specifying Dependencies] da documentação do cargo.

[semântico]: https://semver.org/
[Specifying Dependencies]: https://doc.crates.io/specifying-dependencies.html

Vamos usá-lo para tornar as escritas no buffer VGA volatile. Atualizamos nosso tipo `Buffer` da seguinte forma:

```rust
// em src/vga_buffer.rs

use volatile::Volatile;

struct Buffer {
    chars: [[Volatile<ScreenChar>; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

Em vez de um `ScreenChar`, agora estamos usando um `Volatile<ScreenChar>`.
(O tipo `Volatile` é [genérico] e pode envolver (quase) qualquer tipo). Isso garante que não possamos escrever nele "normalmente" acidentalmente. Em vez disso, temos que usar o método `write` agora. [genérico]: https://doc.rust-lang.org/book/ch10-01-syntax.html Isso significa que temos que atualizar nosso método `Writer::write_byte`: ```rust // em src/vga_buffer.rs impl Writer { pub fn write_byte(&mut self, byte: u8) { match byte { b'\n' => self.new_line(), byte => { ... self.buffer.chars[row][col].write(ScreenChar { ascii_character: byte, color_code, }); ... } } } ... } ``` Em vez de uma atribuição típica usando `=`, agora estamos usando o método `write`. Agora podemos garantir que o compilador nunca otimizará esta escrita. ### Macros de Formatação Seria bom suportar as macros de formatação do Rust também. Dessa forma, podemos facilmente imprimir diferentes tipos, como inteiros ou floats. Para suportá-las, precisamos implementar a trait [`core::fmt::Write`]. O único método necessário desta trait é `write_str`, que se parece bastante com nosso método `write_string`, apenas com um tipo de retorno `fmt::Result`: [`core::fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html ```rust // em src/vga_buffer.rs use core::fmt; impl fmt::Write for Writer { fn write_str(&mut self, s: &str) -> fmt::Result { self.write_string(s); Ok(()) } } ``` O `Ok(())` é apenas um Result `Ok` contendo o tipo `()`. Agora podemos usar as macros de formatação `write!`/`writeln!` embutidas do Rust: ```rust // em src/vga_buffer.rs pub fn print_something() { use core::fmt::Write; let mut writer = Writer { column_position: 0, color_code: ColorCode::new(Color::Yellow, Color::Black), buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, }; writer.write_byte(b'H'); writer.write_string("ello! "); write!(writer, "The numbers are {} and {}", 42, 1.0/3.0).unwrap(); } ``` Agora você deve ver um `Hello! The numbers are 42 and 0.3333333333333333` na parte inferior da tela. 
A chamada `write!` retorna um `Result` que causa um aviso se não usado, então chamamos a função [`unwrap`] nele, que entra em panic se ocorrer um erro. Isso não é um problema no nosso caso, pois escritas no buffer VGA nunca falham. [`unwrap`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.unwrap ### Newlines Agora, simplesmente ignoramos newlines e caracteres que não cabem mais na linha. Em vez disso, queremos mover cada caractere uma linha para cima (a linha superior é excluída) e começar no início da última linha novamente. Para fazer isso, adicionamos uma implementação para o método `new_line` do `Writer`: ```rust // em src/vga_buffer.rs impl Writer { fn new_line(&mut self) { for row in 1..BUFFER_HEIGHT { for col in 0..BUFFER_WIDTH { let character = self.buffer.chars[row][col].read(); self.buffer.chars[row - 1][col].write(character); } } self.clear_row(BUFFER_HEIGHT - 1); self.column_position = 0; } fn clear_row(&mut self, row: usize) {/* TODO */} } ``` Iteramos sobre todos os caracteres da tela e movemos cada caractere uma linha para cima. Note que o limite superior da notação de intervalo (`..`) é exclusivo. Também omitimos a linha 0 (o primeiro intervalo começa em `1`) porque é a linha que é deslocada para fora da tela. Para finalizar o código de newline, adicionamos o método `clear_row`: ```rust // em src/vga_buffer.rs impl Writer { fn clear_row(&mut self, row: usize) { let blank = ScreenChar { ascii_character: b' ', color_code: self.color_code, }; for col in 0..BUFFER_WIDTH { self.buffer.chars[row][col].write(blank); } } } ``` Este método limpa uma linha sobrescrevendo todos os seus caracteres com um caractere de espaço. 
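
O comportamento de rolagem pode ser experimentado fora do kernel com um buffer comum em memória. O esboço abaixo é uma simplificação hipotética: `MiniScreen` não existe neste post, usa dimensões reduzidas e bytes simples em vez de `Volatile<ScreenChar>`, mas segue a mesma lógica de `write_byte`, `new_line` e `clear_row`.

```rust
// Esboço para rodar no host; `MiniScreen` e suas dimensões são hipotéticos.
const HEIGHT: usize = 3;
const WIDTH: usize = 4;

struct MiniScreen {
    column_position: usize,
    chars: [[u8; WIDTH]; HEIGHT],
}

impl MiniScreen {
    fn new() -> Self {
        MiniScreen { column_position: 0, chars: [[b' '; WIDTH]; HEIGHT] }
    }

    fn write_string(&mut self, s: &str) {
        for byte in s.bytes() {
            self.write_byte(byte);
        }
    }

    fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                // Linha cheia: quebra a linha antes de escrever.
                if self.column_position >= WIDTH {
                    self.new_line();
                }
                // Sempre escreve na última linha.
                self.chars[HEIGHT - 1][self.column_position] = byte;
                self.column_position += 1;
            }
        }
    }

    fn new_line(&mut self) {
        // Move cada linha uma posição para cima; a linha 0 é descartada.
        for row in 1..HEIGHT {
            self.chars[row - 1] = self.chars[row];
        }
        self.clear_row(HEIGHT - 1);
        self.column_position = 0;
    }

    fn clear_row(&mut self, row: usize) {
        self.chars[row] = [b' '; WIDTH];
    }

    /// Devolve o conteúdo de uma linha como texto, para inspeção.
    fn row(&self, row: usize) -> String {
        String::from_utf8_lossy(&self.chars[row]).into_owned()
    }
}
```

Ao escrever `"ab\ncdefg"` com `write_string`, as três linhas terminam como `ab  `, `cdef` e `g   `: tanto o `\n` quanto a linha cheia provocam uma rolagem.
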
## Uma Interface Global Para fornecer um writer global que possa ser usado como uma interface de outros módulos sem carregar uma instância `Writer` por aí, tentamos criar um `WRITER` static: ```rust // em src/vga_buffer.rs pub static WRITER: Writer = Writer { column_position: 0, color_code: ColorCode::new(Color::Yellow, Color::Black), buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, }; ``` No entanto, se tentarmos compilá-lo agora, os seguintes erros ocorrem: ``` error[E0015]: calls in statics are limited to constant functions, tuple structs and tuple variants --> src/vga_buffer.rs:7:17 | 7 | color_code: ColorCode::new(Color::Yellow, Color::Black), | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ error[E0396]: raw pointers cannot be dereferenced in statics --> src/vga_buffer.rs:8:22 | 8 | buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereference of raw pointer in constant error[E0017]: references in statics may only refer to immutable values --> src/vga_buffer.rs:8:22 | 8 | buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values error[E0017]: references in statics may only refer to immutable values --> src/vga_buffer.rs:8:13 | 8 | buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values ``` Para entender o que está acontecendo aqui, precisamos saber que statics são inicializados em tempo de compilação, ao contrário de variáveis normais que são inicializadas em tempo de execução. O componente do compilador Rust que avalia tais expressões de inicialização é chamado de "[const evaluator]". Sua funcionalidade ainda é limitada, mas há trabalho contínuo para expandi-la, por exemplo no RFC "[Allow panicking in constants]". 
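
Para ver essa restrição na prática: uma função comum não pode ser chamada no inicializador de um `static`, mas uma `const fn` pode, porque o const evaluator consegue executá-la em tempo de compilação. O esboço abaixo é apenas ilustrativo: o static `DEFAULT_COLOR` é hipotético e `Color` foi reduzido a duas variantes para manter o exemplo curto.

```rust
// Esboço ilustrativo; `DEFAULT_COLOR` não faz parte do post.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Color {
    Black = 0,
    Yellow = 14,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
struct ColorCode(u8);

impl ColorCode {
    // `const fn`: pode ser avaliada em tempo de compilação.
    const fn new(foreground: Color, background: Color) -> ColorCode {
        ColorCode((background as u8) << 4 | (foreground as u8))
    }
}

// O valor é calculado pelo const evaluator, não em tempo de execução.
static DEFAULT_COLOR: ColorCode = ColorCode::new(Color::Yellow, Color::Black);
```

Sem o `const` na assinatura de `new`, o compilador rejeitaria esse inicializador com o erro E0015 mostrado acima. O ponteiro bruto para `0xb8000` continua sendo um problema separado, que este esboço não resolve.
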
[const evaluator]: https://rustc-dev-guide.rust-lang.org/const-eval.html [Allow panicking in constants]: https://github.com/rust-lang/rfcs/pull/2345 O problema com `ColorCode::new` seria solucionável usando [funções `const`], mas o problema fundamental aqui é que o const evaluator do Rust não é capaz de converter ponteiros brutos em referências em tempo de compilação. Talvez funcione algum dia, mas até lá, precisamos encontrar outra solução. [funções `const`]: https://doc.rust-lang.org/reference/const_eval.html#const-functions ### Lazy Statics A inicialização única de statics com funções não const é um problema comum em Rust. Felizmente, já existe uma boa solução em uma crate chamada [lazy_static]. Esta crate fornece uma macro `lazy_static!` que define um `static` inicializado lazily. Em vez de calcular seu valor em tempo de compilação, o `static` se inicializa lazily quando é acessado pela primeira vez. Assim, a inicialização acontece em tempo de execução, então código de inicialização arbitrariamente complexo é possível. [lazy_static]: https://docs.rs/lazy_static/1.0.1/lazy_static/ Vamos adicionar a crate `lazy_static` ao nosso projeto: ```toml # em Cargo.toml [dependencies.lazy_static] version = "1.0" features = ["spin_no_std"] ``` Precisamos do recurso `spin_no_std`, já que não vinculamos a biblioteca padrão. Com `lazy_static`, podemos definir nosso `WRITER` static sem problemas: ```rust // em src/vga_buffer.rs use lazy_static::lazy_static; lazy_static! { pub static ref WRITER: Writer = Writer { column_position: 0, color_code: ColorCode::new(Color::Yellow, Color::Black), buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, }; } ``` No entanto, este `WRITER` é praticamente inútil, pois é imutável. Isso significa que não podemos escrever nada nele (já que todos os métodos de escrita recebem `&mut self`). Uma solução possível seria usar um [static mutável]. 
Mas então cada leitura e escrita nele seria unsafe, pois poderia facilmente introduzir data races e outras coisas ruins. Usar `static mut` é altamente desencorajado. Até houve propostas para [removê-lo][remove static mut]. Mas quais são as alternativas? Poderíamos tentar usar um static imutável com um tipo de célula como [RefCell] ou até [UnsafeCell] que fornece [mutabilidade interior]. Mas esses tipos não são [Sync] \(com boa razão), então não podemos usá-los em statics. [static mutável]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable [remove static mut]: https://internals.rust-lang.org/t/pre-rfc-remove-static-mut/1437 [RefCell]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html#keeping-track-of-borrows-at-runtime-with-refcellt [UnsafeCell]: https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html [mutabilidade interior]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html [Sync]: https://doc.rust-lang.org/nightly/core/marker/trait.Sync.html ### Spinlocks Para obter mutabilidade interior sincronizada, usuários da biblioteca padrão podem usar [Mutex]. Ele fornece exclusão mútua bloqueando threads quando o recurso já está bloqueado. Mas nosso kernel básico não tem nenhum suporte de bloqueio ou mesmo um conceito de threads, então também não podemos usá-lo. No entanto, há um tipo realmente básico de mutex na ciência da computação que não requer nenhum recurso de sistema operacional: o [spinlock]. Em vez de bloquear, as threads simplesmente tentam bloqueá-lo novamente e novamente em um loop apertado, queimando assim tempo de CPU até que o mutex esteja livre novamente. 
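
Para tornar a ideia concreta, segue um esboço mínimo de um spinlock construído sobre `AtomicBool`. Ele não é a implementação da crate `spin`: `SpinLock` e `SpinGuard` são nomes hipotéticos, e o código usa `std` para poder rodar no host, servindo apenas para mostrar o loop de espera ocupada.

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};

/// Esboço hipotético de um spinlock; não é o `Mutex` da crate `spin`.
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// Compartilhável entre threads porque `value` só é acessado com o lock adquirido.
unsafe impl<T: Send> Sync for SpinLock<T> {}

/// Guard que libera o lock ao sair de escopo.
pub struct SpinGuard<'a, T> {
    lock: &'a SpinLock<T>,
}

impl<T> SpinLock<T> {
    pub const fn new(value: T) -> Self {
        SpinLock { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    pub fn lock(&self) -> SpinGuard<'_, T> {
        // Tenta trocar `false` por `true` repetidamente até conseguir.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Dica para a CPU de que estamos em espera ocupada.
            std::hint::spin_loop();
        }
        SpinGuard { lock: self }
    }
}

impl<'a, T> Deref for SpinGuard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // Seguro: este guard só existe enquanto o lock está adquirido.
        unsafe { &*self.lock.value.get() }
    }
}

impl<'a, T> DerefMut for SpinGuard<'a, T> {
    fn deref_mut(&mut self) -> &mut T {
        unsafe { &mut *self.lock.value.get() }
    }
}

impl<'a, T> Drop for SpinGuard<'a, T> {
    fn drop(&mut self) {
        // Libera o lock para quem estiver girando no loop.
        self.lock.locked.store(false, Ordering::Release);
    }
}
```

Um uso típico seria `*lock.lock() += 1;`: o guard devolvido por `lock()` dá acesso mutável ao valor e libera o lock ao sair de escopo. Note que nenhum recurso do sistema operacional é necessário, apenas uma operação atômica em um loop.
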

[Mutex]: https://doc.rust-lang.org/nightly/std/sync/struct.Mutex.html
[spinlock]: https://en.wikipedia.org/wiki/Spinlock

Para usar um spinning mutex, podemos adicionar a [crate spin] como uma dependência:

[crate spin]: https://crates.io/crates/spin

```toml
# em Cargo.toml
[dependencies]
spin = "0.5.2"
```

Então podemos usar o spinning mutex para adicionar [mutabilidade interior] segura ao nosso `WRITER` static:

```rust
// em src/vga_buffer.rs

use spin::Mutex;
...
lazy_static! {
    pub static ref WRITER: Mutex<Writer> = Mutex::new(Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    });
}
```

Agora podemos deletar a função `print_something` e imprimir diretamente da nossa função `_start`:

```rust
// em src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    use core::fmt::Write;
    vga_buffer::WRITER.lock().write_str("Hello again").unwrap();
    write!(vga_buffer::WRITER.lock(), ", some numbers: {} {}", 42, 1.337).unwrap();

    loop {}
}
```

Precisamos importar a trait `fmt::Write` para poder usar suas funções.

### Segurança

Note que temos apenas um bloco unsafe no nosso código, que é necessário para criar uma referência `Buffer` apontando para `0xb8000`. Depois disso, todas as operações são seguras. Rust usa verificação de limites para acessos a arrays por padrão, então não podemos escrever acidentalmente fora do buffer. Assim, codificamos as condições necessárias no sistema de tipos e somos capazes de fornecer uma interface segura para o exterior.

### Uma Macro println

Agora que temos um writer global, podemos adicionar uma macro `println` que pode ser usada de qualquer lugar na base de código. A [sintaxe de macro] do Rust é um pouco estranha, então não tentaremos escrever uma macro do zero.
Em vez disso, olhamos para o código-fonte da [macro `println!`] na biblioteca padrão: [sintaxe de macro]: https://doc.rust-lang.org/nightly/book/ch20-05-macros.html#declarative-macros-with-macro_rules-for-general-metaprogramming [macro `println!`]: https://doc.rust-lang.org/nightly/std/macro.println!.html ```rust #[macro_export] macro_rules! println { () => (print!("\n")); ($($arg:tt)*) => (print!("{}\n", format_args!($($arg)*))); } ``` Macros são definidas através de uma ou mais regras, semelhantes a braços `match`. A macro `println` tem duas regras: A primeira regra é para invocações sem argumentos, por exemplo, `println!()`, que é expandida para `print!("\n")` e, portanto, apenas imprime um newline. A segunda regra é para invocações com parâmetros como `println!("Hello")` ou `println!("Number: {}", 4)`. Ela também é expandida para uma invocação da macro `print!`, passando todos os argumentos e um newline `\n` adicional no final. O atributo `#[macro_export]` torna a macro disponível para toda a crate (não apenas o módulo em que é definida) e crates externas. Ele também coloca a macro na raiz da crate, o que significa que temos que importar a macro através de `use std::println` em vez de `std::macros::println`. A [macro `print!`] é definida como: [macro `print!`]: https://doc.rust-lang.org/nightly/std/macro.print!.html ```rust #[macro_export] macro_rules! print { ($($arg:tt)*) => ($crate::io::_print(format_args!($($arg)*))); } ``` A macro se expande para uma chamada da [função `_print`] no módulo `io`. A [variável `$crate`] garante que a macro também funcione de fora da crate `std` ao se expandir para `std` quando é usada em outras crates. A [macro `format_args`] constrói um tipo [fmt::Arguments] dos argumentos passados, que é passado para `_print`. A [função `_print`] da libstd chama `print_to`, que é bastante complicado porque suporta diferentes dispositivos `Stdout`. Não precisamos dessa complexidade, pois só queremos imprimir no buffer VGA. 
[função `_print`]: https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/libstd/io/stdio.rs#L698 [variável `$crate`]: https://doc.rust-lang.org/1.30.0/book/first-edition/macros.html#the-variable-crate [macro `format_args`]: https://doc.rust-lang.org/nightly/std/macro.format_args.html [fmt::Arguments]: https://doc.rust-lang.org/nightly/core/fmt/struct.Arguments.html Para imprimir no buffer VGA, apenas copiamos as macros `println!` e `print!`, mas as modificamos para usar nossa própria função `_print`: ```rust // em src/vga_buffer.rs #[macro_export] macro_rules! print { ($($arg:tt)*) => ($crate::vga_buffer::_print(format_args!($($arg)*))); } #[macro_export] macro_rules! println { () => ($crate::print!("\n")); ($($arg:tt)*) => ($crate::print!("{}\n", format_args!($($arg)*))); } #[doc(hidden)] pub fn _print(args: fmt::Arguments) { use core::fmt::Write; WRITER.lock().write_fmt(args).unwrap(); } ``` Uma coisa que mudamos da definição original de `println` é que também prefixamos as invocações da macro `print!` com `$crate`. Isso garante que não precisamos importar a macro `print!` também se quisermos usar apenas `println`. Como na biblioteca padrão, adicionamos o atributo `#[macro_export]` a ambas as macros para torná-las disponíveis em todo lugar na nossa crate. Note que isso coloca as macros no namespace raiz da crate, então importá-las via `use crate::vga_buffer::println` não funciona. Em vez disso, temos que fazer `use crate::println`. A função `_print` bloqueia nosso `WRITER` static e chama o método `write_fmt` nele. Este método é da trait `Write`, que precisamos importar. O `unwrap()` adicional no final entra em panic se a impressão não for bem-sucedida. Mas como sempre retornamos `Ok` em `write_str`, isso não deve acontecer. Como as macros precisam ser capazes de chamar `_print` de fora do módulo, a função precisa ser pública. 
No entanto, como consideramos isso um detalhe de implementação privado, adicionamos o [atributo `doc(hidden)`] para ocultá-la da documentação gerada. [atributo `doc(hidden)`]: https://doc.rust-lang.org/nightly/rustdoc/write-documentation/the-doc-attribute.html#hidden ### Hello World usando `println` Agora podemos usar `println` na nossa função `_start`: ```rust // em src/main.rs #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { println!("Hello World{}", "!"); loop {} } ``` Note que não precisamos importar a macro na função main, porque ela já vive no namespace raiz. Como esperado, agora vemos um _"Hello World!"_ na tela: ![QEMU imprimindo "Hello World!"](vga-hello-world.png) ### Imprimindo Mensagens de Panic Agora que temos uma macro `println`, podemos usá-la na nossa função panic para imprimir a mensagem de panic e a localização do panic: ```rust // em main.rs /// Esta função é chamada em caso de pânico. #[panic_handler] fn panic(info: &PanicInfo) -> ! { println!("{}", info); loop {} } ``` Quando agora inserimos `panic!("Some panic message");` na nossa função `_start`, obtemos a seguinte saída: ![QEMU imprimindo "panicked at 'Some panic message', src/main.rs:28:5](vga-panic.png) Então sabemos não apenas que um panic ocorreu, mas também a mensagem de panic e onde no código aconteceu. ## Resumo Neste post, aprendemos sobre a estrutura do buffer de texto VGA e como ele pode ser escrito através do mapeamento de memória no endereço `0xb8000`. Criamos um módulo Rust que encapsula a unsafety de escrever neste buffer mapeado em memória e apresenta uma interface segura e conveniente para o exterior. Graças ao cargo, também vimos como é fácil adicionar dependências em bibliotecas de terceiros. As duas dependências que adicionamos, `lazy_static` e `spin`, são muito úteis no desenvolvimento de SO e as usaremos em mais lugares em posts futuros. ## O que vem a seguir? O próximo post explica como configurar o framework de testes unitários embutido do Rust. 
Criaremos então alguns testes unitários básicos para o módulo de buffer VGA deste post. ================================================ FILE: blog/content/edition-2/posts/03-vga-text-buffer/index.zh-CN.md ================================================ +++ title = "VGA 字符模式" weight = 3 path = "zh-CN/vga-text-mode" date = 2018-02-26 [extra] # Please update this when updating the translation translation_based_on_commit = "bd6fbcb1c36705b2c474d7fcee387bfea1210851" # GitHub usernames of the people that translated this post translators = ["luojia65", "Rustin-Liu"] # GitHub usernames of the people that contributed to this translation translation_contributors = ["liuyuran"] +++ **VGA 字符模式**([VGA text mode])是打印字符到屏幕的一种简单方式。在这篇文章中,为了包装这个模式为一个安全而简单的接口,我们将包装 unsafe 代码到独立的模块。我们还将实现对 Rust 语言**格式化宏**([formatting macros])的支持。 [VGA text mode]: https://en.wikipedia.org/wiki/VGA-compatible_text_mode [formatting macros]: https://doc.rust-lang.org/std/fmt/#related-macros 此博客在 [GitHub] 上公开开发. 如果您有任何问题或疑问,请在此处打开一个 issue。 您也可以在[底部][at the bottom]发表评论. 
这篇文章的完整源代码可以在 [`post-03`] [post branch] 分支中找到。 [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-03 ## VGA 字符缓冲区 为了在 VGA 字符模式中向屏幕打印字符,我们必须将它写入硬件提供的 **VGA 字符缓冲区**(VGA text buffer)。通常状况下,VGA 字符缓冲区是一个 25 行、80 列的二维数组,它的内容将被实时渲染到屏幕。这个数组的元素被称作**字符单元**(character cell),它使用下面的格式描述一个屏幕上的字符: | Bit(s) | Value | | ------ | ---------------- | | 0-7 | ASCII code point | | 8-11 | Foreground color | | 12-14 | Background color | | 15 | Blink | 第一个字节表示了应当输出的 [ASCII 编码][ASCII encoding],更加准确的说,类似于 [437 字符编码表][_code page 437_] 中字符对应的编码,但又有细微的不同。 这里为了简化表达,我们在文章里将其简称为ASCII字符。 [ASCII encoding]: https://en.wikipedia.org/wiki/ASCII [_code page 437_]: https://en.wikipedia.org/wiki/Code_page_437 第二个字节则定义了字符的显示方式,前四个比特定义了前景色,中间三个比特定义了背景色,最后一个比特则定义了该字符是否应该闪烁,以下是可用的颜色列表: | Number | Color | Number + Bright Bit | Bright Color | | ------ | ---------- | ------------------- | ------------ | | 0x0 | Black | 0x8 | Dark Gray | | 0x1 | Blue | 0x9 | Light Blue | | 0x2 | Green | 0xa | Light Green | | 0x3 | Cyan | 0xb | Light Cyan | | 0x4 | Red | 0xc | Light Red | | 0x5 | Magenta | 0xd | Pink | | 0x6 | Brown | 0xe | Yellow | | 0x7 | Light Gray | 0xf | White | 每个颜色的第四位称为**加亮位**(bright bit),比如blue加亮后就变成了light blue,但对于背景色,这个比特会被用于标记是否闪烁。 要修改 VGA 字符缓冲区,我们可以通过**存储器映射输入输出**([memory-mapped I/O](https://en.wikipedia.org/wiki/Memory-mapped_I/O))的方式,读取或写入地址 `0xb8000`;这意味着,我们可以像操作普通的内存区域一样操作这个地址。 需要注意的是,一些硬件虽然映射到存储器,但可能不会完全支持所有的内存操作:可能会有一些设备支持按 `u8` 字节读取,但在读取 `u64` 时返回无效的数据。幸运的是,字符缓冲区都[支持标准的读写操作](https://web.stanford.edu/class/cs140/projects/pintos/specs/freevga/vga/vgamem.htm#manip),所以我们不需要用特殊的标准对待它。 ## 包装到 Rust 模块 既然我们已经知道 VGA 文字缓冲区如何工作,也是时候创建一个 Rust 模块来处理文字打印了。我们输入这样的代码: ```rust // in src/main.rs mod vga_buffer; ``` 我们的模块暂时不需要添加子模块,所以我们将它创建为 `src/vga_buffer.rs` 文件。除非另有说明,本文中的代码都保存到这个文件中。 ### 颜色 首先,我们使用 Rust 的**枚举**(enum)表示特定的颜色: ```rust // in src/vga_buffer.rs #[allow(dead_code)] #[derive(Debug, Clone, Copy, PartialEq, Eq)] 
#[repr(u8)] pub enum Color { Black = 0, Blue = 1, Green = 2, Cyan = 3, Red = 4, Magenta = 5, Brown = 6, LightGray = 7, DarkGray = 8, LightBlue = 9, LightGreen = 10, LightCyan = 11, LightRed = 12, Pink = 13, Yellow = 14, White = 15, } ``` 我们使用**类似于 C 语言的枚举**(C-like enum),为每个颜色明确指定一个数字。在这里,每个用 `repr(u8)` 注记标注的枚举类型,都会以一个 `u8` 的形式存储——事实上 4 个二进制位就足够了,但 Rust 语言并不提供 `u4` 类型。 通常来说,编译器会对每个未使用的变量发出**警告**(warning);使用 `#[allow(dead_code)]`,我们可以对 `Color` 枚举类型禁用这个警告。 我们还**生成**([derive])了 [`Copy`](https://doc.rust-lang.org/nightly/core/marker/trait.Copy.html)、[`Clone`](https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html)、[`Debug`](https://doc.rust-lang.org/nightly/core/fmt/trait.Debug.html)、[`PartialEq`](https://doc.rust-lang.org/nightly/core/cmp/trait.PartialEq.html) 和 [`Eq`](https://doc.rust-lang.org/nightly/core/cmp/trait.Eq.html) 这几个 trait:这让我们的类型遵循**复制语义**([copy semantics]),也让它可以被比较、被调试和打印。 [derive]: https://doc.rust-lang.org/rust-by-example/trait/derive.html [copy semantics]: https://doc.rust-lang.org/1.30.0/book/first-edition/ownership.html#copy-types 为了描述包含前景色和背景色的、完整的**颜色代码**(color code),我们基于 `u8` 创建一个新类型: ```rust // in src/vga_buffer.rs #[derive(Debug, Clone, Copy, PartialEq, Eq)] #[repr(transparent)] struct ColorCode(u8); impl ColorCode { fn new(foreground: Color, background: Color) -> ColorCode { ColorCode((background as u8) << 4 | (foreground as u8)) } } ``` 这里,`ColorCode` 类型包装了一个完整的颜色代码字节,它包含前景色和背景色信息。和 `Color` 类型类似,我们为它生成 `Copy` 和 `Debug` 等一系列 trait。为了确保 `ColorCode` 和 `u8` 有完全相同的内存布局,我们添加 [repr(transparent) 标记](https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent)。 ### 字符缓冲区 现在,我们可以添加更多的结构体,来描述屏幕上的字符和整个字符缓冲区: ```rust // in src/vga_buffer.rs #[derive(Debug, Clone, Copy, PartialEq, Eq)] #[repr(C)] struct ScreenChar { ascii_character: u8, color_code: ColorCode, } const BUFFER_HEIGHT: usize = 25; const BUFFER_WIDTH: usize = 80; #[repr(transparent)] struct Buffer { chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT], } ``` 在内存布局层面,Rust 
并不保证按顺序布局成员变量。因此,我们需要使用 `#[repr(C)]` 标记结构体;这将按 C 语言约定的顺序布局它的成员变量,让我们能正确地映射内存片段。对 `Buffer` 类型,我们再次使用 `repr(transparent)`,来确保类型和它的单个成员有相同的内存布局。 为了输出字符到屏幕,我们来创建一个 `Writer` 类型: ```rust // in src/vga_buffer.rs pub struct Writer { column_position: usize, color_code: ColorCode, buffer: &'static mut Buffer, } ``` 我们将让这个 `Writer` 类型将字符写入屏幕的最后一行,并在一行写满或接收到换行符 `\n` 的时候,将所有的字符向上位移一行。`column_position` 变量将跟踪光标在最后一行的位置。当前字符的前景和背景色将由 `color_code` 变量指定;另外,我们存入一个 VGA 字符缓冲区的可变借用到`buffer`变量中。需要注意的是,这里我们对借用使用**显式生命周期**([explicit lifetime](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#lifetime-annotation-syntax)),告诉编译器这个借用在何时有效:我们使用 `'static` 生命周期(['static lifetime](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime)),意味着这个借用应该在整个程序的运行期间有效;这对一个全局有效的 VGA 字符缓冲区来说,是非常合理的。 ### 打印字符 现在我们可以使用 `Writer` 类型来更改缓冲区内的字符了。首先,为了写入一个 ASCII 码字节,我们创建这样的函数: ```rust // in src/vga_buffer.rs impl Writer { pub fn write_byte(&mut self, byte: u8) { match byte { b'\n' => self.new_line(), byte => { if self.column_position >= BUFFER_WIDTH { self.new_line(); } let row = BUFFER_HEIGHT - 1; let col = self.column_position; let color_code = self.color_code; self.buffer.chars[row][col] = ScreenChar { ascii_character: byte, color_code, }; self.column_position += 1; } } } fn new_line(&mut self) {/* TODO */} } ``` 如果这个字节是一个**换行符**([line feed](https://en.wikipedia.org/wiki/Newline))字节 `\n`,我们的 `Writer` 不应该打印新字符,相反,它将调用我们稍后会实现的 `new_line` 方法;其它的字节应该将在 `match` 语句的第二个分支中被打印到屏幕上。 当打印字节时,`Writer` 将检查当前行是否已满。如果已满,它将首先调用 `new_line` 方法来将这一行字向上提升,再将一个新的 `ScreenChar` 写入到缓冲区,最终将当前的光标位置前进一位。 要打印整个字符串,我们把它转换为字节并依次输出: ```rust // in src/vga_buffer.rs impl Writer { pub fn write_string(&mut self, s: &str) { for byte in s.bytes() { match byte { // 可以是能打印的 ASCII 码字节,也可以是换行符 0x20..=0x7e | b'\n' => self.write_byte(byte), // 不包含在上述范围之内的字节 _ => self.write_byte(0xfe), } } } } ``` VGA 字符缓冲区只支持 ASCII 码字节和**代码页 437**([Code page 437](https://en.wikipedia.org/wiki/Code_page_437))定义的字节。Rust 语言的字符串默认编码为 
[UTF-8](https://www.fileformat.info/info/unicode/utf8.htm),也因此可能包含一些 VGA 字符缓冲区不支持的字节:我们使用 `match` 语句,来区别可打印的 ASCII 码或换行字节,和其它不可打印的字节。对每个不可打印的字节,我们打印一个 `■` 符号;这个符号在 VGA 硬件中被编码为十六进制的 `0xfe`。 我们可以亲自试一试已经编写的代码。为了这样做,我们可以临时编写一个函数: ```rust // in src/vga_buffer.rs pub fn print_something() { let mut writer = Writer { column_position: 0, color_code: ColorCode::new(Color::Yellow, Color::Black), buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, }; writer.write_byte(b'H'); writer.write_string("ello "); writer.write_string("Wörld!"); } ``` 这个函数首先创建一个指向 `0xb8000` 地址VGA缓冲区的 `Writer`。实现这一点,我们需要编写的代码可能看起来有点奇怪:首先,我们把整数 `0xb8000` 强制转换为一个可变的**裸指针**([raw pointer](https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer));之后,通过运算符`*`,我们将这个裸指针解引用;最后,我们再通过 `&mut`,再次获得它的可变借用。这些转换需要 **`unsafe` 语句块**([unsafe block](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html)),因为编译器并不能保证这个裸指针是有效的。 然后它将字节 `b'H'` 写入缓冲区内. 前缀 `b` 创建了一个字节常量([byte literal](https://doc.rust-lang.org/reference/tokens.html#byte-literals)),表示单个 ASCII 码字符;通过尝试写入 `"ello "` 和 `"Wörld!"`,我们可以测试 `write_string` 方法和其后对无法打印字符的处理逻辑。为了观察输出,我们需要在 `_start` 函数中调用 `print_something` 方法: ```rust // in src/main.rs #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { vga_buffer::print_something(); loop {} } ``` 编译运行后,黄色的 `Hello W■■rld!` 字符串将会被打印在屏幕的左下角: ![QEMU output with a yellow Hello W■■rld! 
in the lower left corner](https://os.phil-opp.com/vga-text-mode/vga-hello.png)

需要注意的是,`ö` 字符被打印为两个 `■` 字符。这是因为在 [UTF-8](https://www.fileformat.info/info/unicode/utf8.htm) 编码下,字符 `ö` 是由两个字节表述的——而这两个字节并不处在可打印的 ASCII 码字节范围之内。事实上,这是 UTF-8 编码的基本特点之一:**如果一个字符占用多个字节,那么每个组成它的独立字节都不是有效的 ASCII 码字节**(the individual bytes of multi-byte values are never valid ASCII)。

### 易失操作

我们刚才看到,自己想要输出的信息被正确地打印到屏幕上。然而,未来 Rust 编译器更暴力的优化可能让这段代码不按预期工作。

产生问题的原因在于,我们只向 `Buffer` 写入,却不再从它读出数据。此时,编译器不知道我们事实上已经在操作 VGA 缓冲区内存,而不是在操作普通的 RAM——因此也不知道产生的**副效应**(side effect),即会有几个字符显示在屏幕上。这时,编译器也许会认为这些写入操作都没有必要,甚至会选择忽略这些操作!所以,为了避免这些并不正确的优化,这些写入操作应当被指定为[易失操作](https://en.wikipedia.org/wiki/Volatile_(computer_programming))。这将告诉编译器,这些写入可能会产生副效应,不应该被优化掉。

为了在我们的 VGA 缓冲区中使用易失的写入操作,我们使用 [volatile](https://docs.rs/volatile) 库。这个**包**(crate)提供一个名为 `Volatile` 的**包装类型**(wrapping type)和它的 `read`、`write` 方法;这些方法包装了 `core::ptr` 内的 [read_volatile](https://doc.rust-lang.org/nightly/core/ptr/fn.read_volatile.html) 和 [write_volatile](https://doc.rust-lang.org/nightly/core/ptr/fn.write_volatile.html) 函数,从而保证读操作或写操作不会被编译器优化。

要添加 `volatile` 包为项目的**依赖项**(dependency),我们可以在 `Cargo.toml` 文件的 `dependencies` 中添加下面的代码:

```toml
# in Cargo.toml

[dependencies]
volatile = "0.2.6"
```

`0.2.6` 表示一个**语义版本号**([semantic version number](https://semver.org/)),在 cargo 文档的[《指定依赖项》章节](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html)可以找到与它相关的使用指南。

现在,我们使用它来完成 VGA 缓冲区的 volatile 写入操作。我们将 `Buffer` 类型的定义修改为下列代码:

```rust
// in src/vga_buffer.rs

use volatile::Volatile;

struct Buffer {
    chars: [[Volatile<ScreenChar>; BUFFER_WIDTH]; BUFFER_HEIGHT],
}
```

在这里,我们不使用 `ScreenChar` ,而选择使用 `Volatile<ScreenChar>` ——在这里,`Volatile` 类型是一个**泛型**([generic](https://doc.rust-lang.org/book/ch10-01-syntax.html)),可以包装几乎所有的类型——这确保了我们不会通过普通的写入操作,意外地向它写入数据;我们转而使用提供的 `write` 方法。

这意味着,我们必须要修改我们的 `Writer::write_byte` 方法:

```rust
// in src/vga_buffer.rs

impl Writer {
    pub fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                ...
self.buffer.chars[row][col].write(ScreenChar { ascii_character: byte, color_code: color_code, }); ... } } } ... } ``` 正如代码所示,我们不再使用普通的 `=` 赋值,而使用了 `write` 方法:这能确保编译器不再优化这个写入操作。 ### 格式化宏 支持 Rust 提供的**格式化宏**(formatting macros)也是一个很好的思路。通过这种途径,我们可以轻松地打印不同类型的变量,如整数或浮点数。为了支持它们,我们需要实现 [`core::fmt::Write`](https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html) trait;要实现它,唯一需要提供的方法是 `write_str`,它和我们先前编写的 `write_string` 方法差别不大,只是返回值类型变成了 `fmt::Result`: ```rust // in src/vga_buffer.rs use core::fmt; impl fmt::Write for Writer { fn write_str(&mut self, s: &str) -> fmt::Result { self.write_string(s); Ok(()) } } ``` 这里,`Ok(())` 属于 `Result` 枚举类型中的 `Ok`,包含一个值为 `()` 的变量。 现在我们就可以使用 Rust 内置的格式化宏 `write!` 和 `writeln!` 了: ```rust // in src/vga_buffer.rs pub fn print_something() { use core::fmt::Write; let mut writer = Writer { column_position: 0, color_code: ColorCode::new(Color::Yellow, Color::Black), buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, }; writer.write_byte(b'H'); writer.write_string("ello! "); write!(writer, "The numbers are {} and {}", 42, 1.0/3.0).unwrap(); } ``` 现在,你应该在屏幕下端看到一串 `Hello! 
The numbers are 42 and 0.3333333333333333`。`write!` 宏返回的 `Result` 类型必须被使用,所以我们调用它的 [`unwrap`](https://doc.rust-lang.org/core/result/enum.Result.html#method.unwrap) 方法,它将在错误发生时 panic。这里的情况下应该不会发生这样的问题,因为写入 VGA 字符缓冲区并没有可能失败。 ### 换行 在之前的代码中,我们忽略了换行符,因此没有处理超出一行字符的情况。当换行时,我们想要把每个字符向上移动一行——此时最顶上的一行将被删除——然后在最后一行的起始位置继续打印。要做到这一点,我们要为 `Writer` 实现一个新的 `new_line` 方法: ```rust // in src/vga_buffer.rs impl Writer { fn new_line(&mut self) { for row in 1..BUFFER_HEIGHT { for col in 0..BUFFER_WIDTH { let character = self.buffer.chars[row][col].read(); self.buffer.chars[row - 1][col].write(character); } } self.clear_row(BUFFER_HEIGHT - 1); self.column_position = 0; } fn clear_row(&mut self, row: usize) {/* TODO */} } ``` 我们遍历每个屏幕上的字符,把每个字符移动到它上方一行的相应位置。这里,`..` 符号是**区间标号**(range notation)的一种;它表示左闭右开的区间,因此不包含它的上界。在外层的枚举中,我们从第 1 行开始,省略了对第 0 行的枚举过程——因为这一行应该被移出屏幕,即它将被下一行的字符覆写。 所以我们实现的 `clear_row` 方法代码如下: ```rust // in src/vga_buffer.rs impl Writer { fn clear_row(&mut self, row: usize) { let blank = ScreenChar { ascii_character: b' ', color_code: self.color_code, }; for col in 0..BUFFER_WIDTH { self.buffer.chars[row][col].write(blank); } } } ``` 通过向对应的缓冲区写入空格字符,这个方法能清空一整行的字符位置。 ## 全局接口 编写其它模块时,我们希望无需随时拥有 `Writer` 实例,便能使用它的方法。我们尝试创建一个静态的 `WRITER` 变量: ```rust // in src/vga_buffer.rs pub static WRITER: Writer = Writer { column_position: 0, color_code: ColorCode::new(Color::Yellow, Color::Black), buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, }; ``` 我们尝试编译这些代码,却发生了下面的编译错误: ``` error[E0015]: calls in statics are limited to constant functions, tuple structs and tuple variants --> src/vga_buffer.rs:7:17 | 7 | color_code: ColorCode::new(Color::Yellow, Color::Black), | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ error[E0396]: raw pointers cannot be dereferenced in statics --> src/vga_buffer.rs:8:22 | 8 | buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereference of raw pointer in constant error[E0017]: references in statics may only refer to immutable 
values --> src/vga_buffer.rs:8:22 | 8 | buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values error[E0017]: references in statics may only refer to immutable values --> src/vga_buffer.rs:8:13 | 8 | buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics require immutable values ``` 为了明白现在发生了什么,我们需要知道一点:一般的变量在运行时初始化,而静态变量在编译时初始化。Rust编译器规定了一个称为**常量求值器**([const evaluator](https://rustc-dev-guide.rust-lang.org/const-eval.html))的组件,它应该在编译时处理这样的初始化工作。虽然它目前的功能较为有限,但对它的扩展工作进展活跃,比如允许在常量中 panic 的[一篇 RFC 文档](https://github.com/rust-lang/rfcs/pull/2345)。 关于 `ColorCode::new` 的问题应该能使用**常函数**([`const` functions](https://doc.rust-lang.org/reference/const_eval.html#const-functions))解决,但常量求值器还存在不完善之处,它还不能在编译时直接转换裸指针到变量的引用——也许未来这段代码能够工作,但在那之前,我们需要寻找另外的解决方案。 ### 延迟初始化 使用非常函数初始化静态变量是 Rust 程序员普遍遇到的问题。幸运的是,有一个叫做 [lazy_static](https://docs.rs/lazy_static/1.0.1/lazy_static/) 的包提供了一个很棒的解决方案:它提供了名为 `lazy_static!` 的宏,定义了一个**延迟初始化**(lazily initialized)的静态变量;这个变量的值将在第一次使用时计算,而非在编译时计算。这时,变量的初始化过程将在运行时执行,任意的初始化代码——无论简单或复杂——都是能够使用的。 现在,我们将 `lazy_static` 包导入到我们的项目: ```toml # in Cargo.toml [dependencies.lazy_static] version = "1.0" features = ["spin_no_std"] ``` 在这里,由于程序不连接标准库,我们需要启用 `spin_no_std` 特性。 使用 `lazy_static` 我们就可以定义一个不出问题的 `WRITER` 变量: ```rust // in src/vga_buffer.rs use lazy_static::lazy_static; lazy_static! 
{ pub static ref WRITER: Writer = Writer { column_position: 0, color_code: ColorCode::new(Color::Yellow, Color::Black), buffer: unsafe { &mut *(0xb8000 as *mut Buffer) }, }; } ``` 然而,这个 `WRITER` 可能没有什么用途,因为它目前还是**不可变变量**(immutable variable):这意味着我们无法向它写入数据,因为所有与写入数据相关的方法都需要实例的可变引用 `&mut self`。一种解决方案是使用**可变静态**([mutable static](https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable))的变量,但所有对它的读写操作都被规定为不安全的(unsafe)操作,因为这很容易导致数据竞争或发生其它不好的事情——使用 `static mut` 极其不被赞成,甚至有一些提案认为[应该将它删除](https://internals.rust-lang.org/t/pre-rfc-remove-static-mut/1437)。也有其它的替代方案,比如可以尝试使用比如 [RefCell](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html#keeping-track-of-borrows-at-runtime-with-refcellt) 或甚至 [UnsafeCell](https://doc.rust-lang.org/nightly/core/cell/struct.UnsafeCell.html) 等类型提供的**内部可变性**([interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html));但这些类型都被设计为非同步类型,即不满足 [Sync](https://doc.rust-lang.org/nightly/core/marker/trait.Sync.html) 约束,所以我们不能在静态变量中使用它们。 ### spinlock 要定义同步的内部可变性,我们往往使用标准库提供的互斥锁类 [Mutex](https://doc.rust-lang.org/nightly/std/sync/struct.Mutex.html),它通过提供当资源被占用时将线程**阻塞**(block)的**互斥条件**(mutual exclusion)实现这一点;但我们初步的内核代码还没有线程和阻塞的概念,我们将不能使用这个类。不过,我们还有一种较为基础的互斥锁实现方式——**自旋锁**([spinlock](https://en.wikipedia.org/wiki/Spinlock))。自旋锁并不会调用阻塞逻辑,而是在一个小的无限循环中反复尝试获得这个锁,也因此会一直占用 CPU 时间,直到互斥锁被它的占用者释放。 为了使用自旋互斥锁,我们添加 [spin包](https://crates.io/crates/spin) 到项目的依赖项列表: ```toml # in Cargo.toml [dependencies] spin = "0.5.2" ``` 现在,我们能够使用自旋的互斥锁,为我们的 `WRITER` 类实现安全的[内部可变性](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html): ```rust // in src/vga_buffer.rs use spin::Mutex; ... lazy_static! 
{
    pub static ref WRITER: Mutex<Writer> = Mutex::new(Writer {
        column_position: 0,
        color_code: ColorCode::new(Color::Yellow, Color::Black),
        buffer: unsafe { &mut *(0xb8000 as *mut Buffer) },
    });
}
```

现在我们可以删除 `print_something` 函数,尝试直接在 `_start` 函数中打印字符:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    use core::fmt::Write;
    vga_buffer::WRITER.lock().write_str("Hello again").unwrap();
    write!(vga_buffer::WRITER.lock(), ", some numbers: {} {}", 42, 1.337).unwrap();

    loop {}
}
```

在这里,我们需要导入名为 `fmt::Write` 的 trait,来使用实现它的类的相应方法。

### 安全性

经过上面的努力后,我们现在的代码只剩一个 unsafe 语句块,它用于创建一个指向 `0xb8000` 地址的 `Buffer` 类型引用;在这步之后,所有的操作都是安全的。Rust 将为每个数组访问检查边界,所以我们不会在不经意间越界到缓冲区之外。因此,我们把需要的条件编码到 Rust 的类型系统,这之后,我们为外界提供的接口就符合内存安全原则了。

### `println!` 宏

现在我们有了一个全局的 `Writer` 实例,我们就可以基于它实现 `println!` 宏,这样它就能被任意地方的代码使用了。Rust 提供的[宏定义语法](https://doc.rust-lang.org/nightly/book/ch20-05-macros.html#declarative-macros-for-general-metaprogramming)需要时间理解,所以我们将不从零开始编写这个宏。我们先看看标准库中 [`println!` 宏的实现源码](https://doc.rust-lang.org/nightly/std/macro.println!.html):

```rust
#[macro_export]
macro_rules! println {
    () => (print!("\n"));
    ($($arg:tt)*) => (print!("{}\n", format_args!($($arg)*)));
}
```

宏是通过一个或多个**规则**(rule)定义的,这就像 `match` 语句的多个分支。`println!` 宏有两个规则:第一个规则不要求传入参数——就比如 `println!()` ——它将被扩展为 `print!("\n")`,因此只会打印一个新行;第二个要求传入参数——好比 `println!("Rust 能够编写操作系统")` 或 `println!("我学习 Rust 已经{}年了", 3)`——它将使用 `print!` 宏扩展,传入它需求的所有参数,并在输出的字符串最后加入一个换行符 `\n`。

这里,`#[macro_export]` 属性让整个包(crate)和基于它的包都能访问这个宏,而不仅限于定义它的模块(module)。它还将把宏置于包的根模块(crate root)下,这意味着比如我们需要通过 `use std::println` 来导入这个宏,而不是通过 `std::macros::println`。

[`print!` 宏](https://doc.rust-lang.org/nightly/std/macro.print!.html)是这样定义的:

```
#[macro_export]
macro_rules!
print { ($($arg:tt)*) => ($crate::io::_print(format_args!($($arg)*))); } ``` 这个宏将扩展为一个对 `io` 模块中 [`_print` 函数](https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/libstd/io/stdio.rs#L698)的调用。[`$crate` 变量](https://doc.rust-lang.org/1.30.0/book/first-edition/macros.html#the-variable-crate)将在 `std` 包之外被解析为 `std` 包,保证整个宏在 `std` 包之外也可以使用。 [`format_args!` 宏](https://doc.rust-lang.org/nightly/std/macro.format_args.html)将传入的参数搭建为一个 [fmt::Arguments](https://doc.rust-lang.org/nightly/core/fmt/struct.Arguments.html) 类型,这个类型将被传入 `_print` 函数。`std` 包中的 [`_print` 函数](https://github.com/rust-lang/rust/blob/29f5c699b11a6a148f097f82eaa05202f8799bbc/src/libstd/io/stdio.rs#L698)将调用复杂的私有函数 `print_to`,来处理对不同 `Stdout` 设备的支持。我们不需要编写这样的复杂函数,因为我们只需要打印到 VGA 字符缓冲区。 要打印到字符缓冲区,我们把 `println!` 和 `print!` 两个宏复制过来,但修改部分代码,让这些宏使用我们定义的 `_print` 函数: ```rust // in src/vga_buffer.rs #[macro_export] macro_rules! print { ($($arg:tt)*) => ($crate::vga_buffer::_print(format_args!($($arg)*))); } #[macro_export] macro_rules! 
println {
    () => ($crate::print!("\n"));
    ($($arg:tt)*) => ($crate::print!("{}\n", format_args!($($arg)*)));
}

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    WRITER.lock().write_fmt(args).unwrap();
}
```

我们首先修改了 `println!` 宏,在每个使用的 `print!` 宏前面添加了 `$crate` 变量。这样我们在只需要使用 `println!` 时,不必也编写代码导入 `print!` 宏。

就像标准库做的那样,我们为两个宏都添加了 `#[macro_export]` 属性,这样在包的其它地方也可以使用它们。需要注意的是,这将占用包的**根命名空间**(root namespace),所以我们不能通过 `use crate::vga_buffer::println` 来导入它们;我们应该使用 `use crate::println`。

另外,`_print` 函数将占有静态变量 `WRITER` 的锁,并调用它的 `write_fmt` 方法。这个方法是从名为 `Write` 的 trait 中获得的,所以我们需要导入这个 trait。额外的 `unwrap()` 函数将在打印不成功的时候 panic;但既然我们的 `write_str` 总是返回 `Ok`,这种情况不应该发生。

如果这个宏将能在模块外访问,它们也应当能访问 `_print` 函数,因此这个函数必须是公有的(public)。然而,考虑到这是一个私有的实现细节,我们添加一个 [`doc(hidden)` 属性](https://doc.rust-lang.org/nightly/rustdoc/write-documentation/the-doc-attribute.html#hidden),防止它在生成的文档中出现。

### 使用 `println!` 的 Hello World

现在,我们可以在 `_start` 里使用 `println!` 了:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    loop {}
}
```

要注意的是,我们在入口函数中不需要导入这个宏——因为它已经被置于包的根命名空间了。

运行这段代码,和我们预料的一样,一个 *“Hello World!”* 字符串被打印到了屏幕上:

![QEMU printing “Hello World!”](https://os.phil-opp.com/vga-text-mode/vga-hello-world.png)

### 打印 panic 信息

既然我们已经有了 `println!` 宏,我们可以在 panic 处理函数中,使用它打印 panic 信息和 panic 产生的位置:

```rust
// in main.rs

/// 这个函数将在 panic 发生时被调用
#[panic_handler]
fn panic(info: &PanicInfo) -> !
{
    println!("{}", info);
    loop {}
}
```

当我们在 `_start` 函数中插入一行 `panic!("Some panic message");` 后,我们得到了这样的输出:

![QEMU printing “panicked at 'Some panic message', src/main.rs:28:5](https://os.phil-opp.com/vga-text-mode/vga-panic.png)

所以,现在我们不仅能知道 panic 已经发生,还能够知道 panic 信息和产生 panic 的代码。

## 小结

这篇文章中,我们学习了 VGA 字符缓冲区的结构,以及如何在 `0xb8000` 的内存映射地址访问它。我们将所有的不安全操作包装为一个 Rust 模块,以便在外界安全地访问它。

我们也发现了——感谢便于使用的 cargo——在 Rust 中使用第三方提供的包是极其容易的。我们添加的两个依赖项,`lazy_static` 和 `spin`,都在操作系统开发中极其有用;我们将在未来的文章中多次使用它们。

## 下篇预告

下一篇文章中,我们将会讲述如何配置 Rust 内置的单元测试框架。我们还将为本文编写的 VGA 缓冲区模块添加基础的单元测试项目。

================================================
FILE: blog/content/edition-2/posts/04-testing/index.es.md
================================================
+++
title = "Pruebas"
weight = 4
path = "es/testing"
date = 2019-04-27

[extra]
chapter = "Fundamentos"
comments_search_term = 1009
# GitHub usernames of the people that translated this post
translators = ["dobleuber"]
+++

Esta publicación explora las pruebas unitarias y de integración en ejecutables `no_std`. Utilizaremos el soporte de Rust para marcos de prueba personalizados para ejecutar funciones de prueba dentro de nuestro núcleo. Para reportar los resultados fuera de QEMU, utilizaremos diferentes características de QEMU y la herramienta `bootimage`.

Este blog se desarrolla de manera abierta en [GitHub]. Si tienes algún problema o pregunta, por favor abre un issue allí. También puedes dejar comentarios [en la parte inferior]. El código fuente completo de esta publicación se puede encontrar en la rama [`post-04`][post branch].

[GitHub]: https://github.com/phil-opp/blog_os
[en la parte inferior]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-04

## Requisitos

Esta publicación reemplaza las publicaciones [_Pruebas Unitarias_] y [_Pruebas de Integración_] (ahora obsoletas). Se asume que has seguido la publicación [_Un Núcleo Rust Mínimo_] después del 2019-04-27.
Principalmente, requiere que tengas un archivo `.cargo/config.toml` que [establezca un objetivo predeterminado] y [defina un ejecutable de runner]. [_Pruebas Unitarias_]: @/edition-2/posts/deprecated/04-unit-testing/index.md [_Pruebas de Integración_]: @/edition-2/posts/deprecated/05-integration-tests/index.md [_Un Núcleo Rust Mínimo_]: @/edition-2/posts/02-minimal-rust-kernel/index.md [establezca un objetivo predeterminado]: @/edition-2/posts/02-minimal-rust-kernel/index.md#set-a-default-target [defina un ejecutable de runner]: @/edition-2/posts/02-minimal-rust-kernel/index.md#using-cargo-run ## Pruebas en Rust Rust tiene un [marco de prueba incorporado] que es capaz de ejecutar pruebas unitarias sin la necesidad de configurar nada. Solo crea una función que verifique algunos resultados mediante afirmaciones y añade el atributo `#[test]` al encabezado de la función. Luego, `cargo test` encontrará y ejecutará automáticamente todas las funciones de prueba de tu crate. [marco de prueba incorporado]: https://doc.rust-lang.org/book/ch11-00-testing.html Desafortunadamente, es un poco más complicado para aplicaciones `no_std` como nuestro núcleo. El problema es que el marco de prueba de Rust utiliza implícitamente la biblioteca incorporada [`test`], que depende de la biblioteca estándar. Esto significa que no podemos usar el marco de prueba predeterminado para nuestro núcleo `#[no_std]`. [`test`]: https://doc.rust-lang.org/test/index.html Podemos ver esto cuando intentamos ejecutar `cargo test` en nuestro proyecto: ``` > cargo test Compiling blog_os v0.1.0 (/…/blog_os) error[E0463]: can't find crate for `test` ``` Dado que el crate `test` depende de la biblioteca estándar, no está disponible para nuestro objetivo de metal desnudo. Si bien portar el crate `test` a un contexto `#[no_std]` [es posible][utest], es altamente inestable y requiere algunos hacks, como redefinir el macro `panic`. 
[utest]: https://github.com/japaric/utest

### Marcos de Prueba Personalizados

Afortunadamente, Rust soporta reemplazar el marco de prueba predeterminado a través de la característica inestable [`custom_test_frameworks`]. Esta característica no requiere bibliotecas externas y, por lo tanto, también funciona en entornos `#[no_std]`. Funciona recopilando todas las funciones anotadas con un atributo `#[test_case]` y luego invocando una función runner especificada por el usuario con la lista de pruebas como argumento. Así, proporciona a la implementación un control máximo sobre el proceso de prueba.

[`custom_test_frameworks`]: https://doc.rust-lang.org/unstable-book/language-features/custom-test-frameworks.html

La desventaja en comparación con el marco de prueba predeterminado es que muchas características avanzadas, como las pruebas [`should_panic`], no están disponibles. En su lugar, depende de la implementación proporcionar tales características si es necesario. Esto es ideal para nosotros, ya que tenemos un entorno de ejecución muy especial en el que las implementaciones predeterminadas de tales características avanzadas probablemente no funcionarían de todos modos. Por ejemplo, el atributo `#[should_panic]` depende de desenrollar la pila para capturar los pánicos, lo cual hemos deshabilitado para nuestro núcleo.

[`should_panic`]: https://doc.rust-lang.org/book/ch11-01-writing-tests.html#checking-for-panics-with-should_panic

Para implementar un marco de prueba personalizado para nuestro núcleo, añadimos lo siguiente a nuestro `main.rs`:

```rust
// en src/main.rs

#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Fn()]) {
    println!("Ejecutando {} pruebas", tests.len());
    for test in tests {
        test();
    }
}
```

Nuestro runner solo imprime un breve mensaje de depuración y luego llama a cada función de prueba en la lista.
El tipo de argumento `&[&dyn Fn()]` es un [_slice_] de referencias de [_trait object_] del trait [_Fn()_]. Es básicamente una lista de referencias a tipos que pueden ser llamados como una función. Dado que la función es inútil para ejecuciones que no son de prueba, usamos el atributo `#[cfg(test)]` para incluirlo solo para pruebas. [_slice_]: https://doc.rust-lang.org/std/primitive.slice.html [_trait object_]: https://doc.rust-lang.org/1.30.0/book/first-edition/trait-objects.html [_Fn()_]: https://doc.rust-lang.org/std/ops/trait.Fn.html Cuando ejecutamos `cargo test` ahora, vemos que ahora tiene éxito (si no lo tiene, consulta la nota a continuación). Sin embargo, todavía vemos nuestro "¡Hola Mundo!" en lugar del mensaje de nuestro `test_runner`. La razón es que nuestra función `_start` todavía se utiliza como punto de entrada. La característica de marcos de prueba personalizados genera una función `main` que llama a `test_runner`, pero esta función se ignora porque usamos el atributo `#[no_main]` y proporcionamos nuestra propia entrada.
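
El tipo `&[&dyn Fn()]` mencionado arriba puede probarse fuera del núcleo con un programa normal de `std`. Lo siguiente es solo un boceto ilustrativo: los nombres `ejecutar_todas`, `prueba_a` y `prueba_b` son hipotéticos y no forman parte del proyecto.

```rust
// Ejemplo independiente con `std`; no forma parte del núcleo.

// Recibe la misma forma de argumento que `test_runner`: un slice de
// referencias a trait objects invocables. Devuelve cuántas se ejecutaron.
fn ejecutar_todas(pruebas: &[&dyn Fn()]) -> usize {
    for prueba in pruebas {
        prueba();
    }
    pruebas.len()
}

// Dos funciones hipotéticas que hacen de "pruebas".
fn prueba_a() {
    assert_eq!(1 + 1, 2);
}

fn prueba_b() {
    assert!("abc".starts_with('a'));
}

fn main() {
    // Cada `&prueba_x` se convierte en `&dyn Fn()` gracias al tipo anotado.
    let pruebas: [&dyn Fn(); 2] = [&prueba_a, &prueba_b];
    let total = ejecutar_todas(&pruebas);
    println!("Se ejecutaron {} pruebas", total);
}
```

Cada función tiene su propio tipo, distinto del de las demás; la conversión a `&dyn Fn()` es lo que permite guardarlas juntas en un mismo slice.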
**Nota:** Actualmente hay un error en cargo que, en algunos casos, provoca errores "duplicate lang item" (elemento lang duplicado) en `cargo test`. Ocurre cuando has establecido `panic = "abort"` para un perfil en tu `Cargo.toml`. Intenta eliminarlo y `cargo test` debería funcionar. Si eso no funciona, añade `panic-abort-tests = true` a la sección `[unstable]` de tu archivo `.cargo/config.toml`. Consulta el [issue de cargo](https://github.com/rust-lang/cargo/issues/7359) para más información.
Para solucionarlo, primero necesitamos cambiar el nombre de la función generada a algo diferente de `main` mediante el atributo `reexport_test_harness_main`. Luego podemos llamar a la función renombrada desde nuestra función `_start`:

```rust
// en src/main.rs

#![reexport_test_harness_main = "test_main"]

#[no_mangle]
pub extern "C" fn _start() -> ! {
    println!("¡Hola Mundo{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}
```

Establecemos el nombre de la función de entrada del marco de prueba en `test_main` y la llamamos desde nuestro punto de entrada `_start`. Usamos [compilación condicional] para añadir la llamada a `test_main` solo en contextos de prueba porque la función no se genera en una ejecución normal.

Cuando ejecutamos `cargo test` ahora, vemos el mensaje "Ejecutando 0 pruebas" en la pantalla. Ahora estamos listos para crear nuestra primera función de prueba:

```rust
// en src/main.rs

#[test_case]
fn trivial_assertion() {
    print!("aserción trivial... ");
    assert_eq!(1, 1);
    println!("[ok]");
}
```

Cuando ejecutamos `cargo test` ahora, vemos la siguiente salida:

![QEMU imprimiendo "¡Hola Mundo!", "Ejecutando 1 pruebas" y "aserción trivial... [ok]"](qemu-test-runner-output.png)

El slice `tests` pasado a nuestra función `test_runner` ahora contiene una referencia a la función `trivial_assertion`. A partir de la salida `aserción trivial... [ok]` en la pantalla, vemos que la prueba fue llamada y que tuvo éxito.

Después de ejecutar las pruebas, nuestro `test_runner` regresa a la función `test_main`, que a su vez regresa a nuestra función de entrada `_start`. Al final de `_start`, entramos en un bucle infinito porque la función de entrada no puede retornar. Este es un problema, porque queremos que `cargo test` salga después de ejecutar todas las pruebas.

## Salida de QEMU

En este momento, tenemos un bucle infinito al final de nuestra función `_start` y necesitamos cerrar QEMU manualmente en cada ejecución de `cargo test`.
This is unfortunate because we also want to run `cargo test` in scripts without user interaction. The clean solution to this would be to implement a proper way to shut down our OS. Unfortunately, this is relatively complex because it requires implementing support for either the [APM] or [ACPI] power management standard.

[APM]: https://wiki.osdev.org/APM
[ACPI]: https://wiki.osdev.org/ACPI

Luckily, there is an escape hatch: QEMU supports a special `isa-debug-exit` device, which provides an easy way to exit QEMU from the guest system. To enable it, we need to pass a `-device` argument to QEMU. We can do so by adding a `package.metadata.bootimage.test-args` configuration key in our `Cargo.toml`:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = ["-device", "isa-debug-exit,iobase=0xf4,iosize=0x04"]
```

The `bootimage runner` appends the `test-args` to the default QEMU command for all test executables. For a normal `cargo run`, the arguments are ignored.

Together with the device name (`isa-debug-exit`), we pass the two parameters `iobase` and `iosize` that specify the _I/O port_ through which the device can be reached from our kernel.

### I/O Ports

There are two different approaches for communicating between the CPU and peripheral hardware on x86, **memory-mapped I/O** and **port-mapped I/O**. We already used memory-mapped I/O for accessing the [VGA text buffer] through the memory address `0xb8000`. This address is not mapped to RAM but to some memory on the VGA device.

[VGA text buffer]: @/edition-2/posts/03-vga-text-buffer/index.md

In contrast, port-mapped I/O uses a separate I/O bus for communication. Each connected peripheral has one or more port numbers.
To communicate with such an I/O port, there are special CPU instructions called `in` and `out`, which take a port number and a data byte (there are also variations of these commands that allow sending a `u16` or `u32`).

The `isa-debug-exit` device uses port-mapped I/O. The `iobase` parameter specifies on which port address the device should live (`0xf4` is a [generally unused][list of x86 I/O ports] port on the x86 I/O bus) and the `iosize` specifies the port size (`0x04` means four bytes).

[list of x86 I/O ports]: https://wiki.osdev.org/I/O_Ports#The_list

### Using the Exit Device

The functionality of the `isa-debug-exit` device is very simple. When a `value` is written to the I/O port specified by `iobase`, it causes QEMU to exit with [exit status] `(value << 1) | 1`. So when we write `0` to the port, QEMU will exit with exit status `(0 << 1) | 1 = 1`, and when we write `1` to the port, it will exit with exit status `(1 << 1) | 1 = 3`.

[exit status]: https://en.wikipedia.org/wiki/Exit_status

Instead of manually invoking the `in` and `out` assembly instructions, we use the abstractions provided by the [`x86_64`] crate.
To add a dependency on that crate, we add it to the `dependencies` section in our `Cargo.toml`:

[`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/

```toml
# in Cargo.toml

[dependencies]
x86_64 = "0.14.2"
```

Now we can use the [`Port`] type provided by the crate to create an `exit_qemu` function:

[`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html

```rust
// in src/main.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

pub fn exit_qemu(exit_code: QemuExitCode) {
    use x86_64::instructions::port::Port;

    unsafe {
        let mut port = Port::new(0xf4);
        port.write(exit_code as u32);
    }
}
```

The function creates a new [`Port`] at `0xf4`, which is the `iobase` of the `isa-debug-exit` device. Then it writes the passed exit code to the port. We use `u32` because we specified the `iosize` of the `isa-debug-exit` device as 4 bytes. Both operations are unsafe because writing to an I/O port can result in arbitrary behavior.

To specify the exit status, we create a `QemuExitCode` enum. The idea is to exit with the success exit code if all tests succeeded and with the failure exit code otherwise. The enum is marked as `#[repr(u32)]` to represent each variant as a `u32` integer. We use the exit code `0x10` for success and `0x11` for failure. The actual exit codes don't matter much, as long as they don't clash with the default exit codes of QEMU. For example, using exit code `0` for success is not a good idea because it becomes `(0 << 1) | 1 = 1` after the transformation, which is the default exit code when QEMU fails to run. So we could not differentiate a QEMU error from a successful test run.
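The transformation that QEMU applies to the written value can be checked with ordinary arithmetic. The following is a minimal sketch that runs on a normal hosted target rather than in the kernel; the function name `qemu_exit_status` is a hypothetical helper introduced only for this illustration and is not part of `bootimage`, QEMU, or the `x86_64` crate.

```rust
/// Same enum as in the post, repeated here so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

/// Hypothetical helper: the exit status QEMU reports after `value` has been
/// written to the `isa-debug-exit` port, following `(value << 1) | 1`.
pub fn qemu_exit_status(value: u32) -> u32 {
    (value << 1) | 1
}
```

Passing `QemuExitCode::Success as u32` gives `(0x10 << 1) | 1 = 33`, and `QemuExitCode::Failed as u32` gives `(0x11 << 1) | 1 = 35`. Because the lowest bit is always set, the reported status is always odd and therefore never `0`.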
We can now update our `test_runner` to exit QEMU after all tests have run:

```rust
// in src/main.rs

fn test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
    // new
    exit_qemu(QemuExitCode::Success);
}
```

When we run `cargo test` now, we see that QEMU closes immediately after executing the tests. The problem is that `cargo test` interprets the test as failed even though we passed our success exit code:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running target/x86_64-blog_os/debug/deps/blog_os-5804fc7d2dd4c9be
Building bootloader
   Compiling bootloader v0.5.3 (/home/philipp/Documents/bootloader)
    Finished release [optimized + debuginfo] target(s) in 1.07s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-5804fc7d2dd4c9be.bin -device isa-debug-exit,iobase=0xf4,
    iosize=0x04`
error: test failed, to rerun pass '--bin blog_os'
```

The problem is that `cargo test` considers every exit code other than `0` a failure.

### Success Exit Code

To work around this, `bootimage` provides a `test-success-exit-code` configuration key that maps a specified exit code to the exit code `0`:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = […]
test-success-exit-code = 33         # (0x10 << 1) | 1
```

With this configuration, `bootimage` maps our success exit code to exit code 0, so that `cargo test` correctly recognizes the success case and does not count the test as failed.

Our test runner now automatically closes QEMU and correctly reports the test results. We still see the QEMU window open for a very short time, but it is not long enough to read the results.
It would be nice if we could print the test results to the console instead, so that we can still see them after QEMU exits.

## Printing to the Console

To see the test output on the console, we need to send the data from our kernel to the host system somehow. There are various ways to achieve this, for example, by sending the data over a TCP network interface. However, setting up a networking stack is quite a complex task, so we will choose a simpler solution instead.

### Serial Port

A simple way to send the data is to use the [serial port], an old interface standard which is no longer found in modern computers. It is easy to program and QEMU can redirect the bytes sent over serial to the host's standard output or a file.

[serial port]: https://en.wikipedia.org/wiki/Serial_port

The chips implementing a serial interface are called [UARTs]. There are [lots of UART models] on x86, but fortunately the only differences between them are some advanced features we don't need. The common UARTs today are all compatible with the [16550 UART], so we will use that model for our testing framework.

[UARTs]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter
[lots of UART models]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter#Models
[16550 UART]: https://en.wikipedia.org/wiki/16550_UART

We will use the [`uart_16550`] crate to initialize the UART and send data over the serial port. To add it as a dependency, we update our `Cargo.toml` and `main.rs`:

[`uart_16550`]: https://docs.rs/uart_16550

```toml
# in Cargo.toml

[dependencies]
uart_16550 = "0.2.0"
```

The `uart_16550` crate contains a `SerialPort` struct that represents the UART registers, but we still need to construct an instance of it ourselves.
For that, we create a new `serial` module with the following content:

```rust
// in src/main.rs

mod serial;
```

```rust
// in src/serial.rs

use uart_16550::SerialPort;
use spin::Mutex;
use lazy_static::lazy_static;

lazy_static! {
    pub static ref SERIAL1: Mutex<SerialPort> = {
        let mut serial_port = unsafe { SerialPort::new(0x3F8) };
        serial_port.init();
        Mutex::new(serial_port)
    };
}
```

Like with the [VGA text buffer][vga lazy-static], we use `lazy_static` and a spinlock to create a `static` writer instance. By using `lazy_static` we can ensure that the `init` method is called exactly once on its first use.

Like the `isa-debug-exit` device, the UART is programmed using port I/O. Since the UART is more complex, it uses multiple I/O ports for programming different device registers. The unsafe `SerialPort::new` function expects the address of the first I/O port of the UART as an argument, from which it can calculate the addresses of all needed ports. We are passing the port address `0x3F8`, which is the standard port number for the first serial interface.

[vga lazy-static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics

To make the serial port easily usable, we add `serial_print!` and `serial_println!` macros:

```rust
// in src/serial.rs

#[doc(hidden)]
pub fn _print(args: ::core::fmt::Arguments) {
    use core::fmt::Write;
    SERIAL1.lock().write_fmt(args).expect("Printing to serial failed");
}

/// Prints to the host through the serial interface.
#[macro_export]
macro_rules! serial_print {
    ($($arg:tt)*) => {
        $crate::serial::_print(format_args!($($arg)*));
    };
}

/// Prints to the host through the serial interface, appending a newline.
#[macro_export]
macro_rules! serial_println {
    () => ($crate::serial_print!("\n"));
    ($fmt:expr) => ($crate::serial_print!(concat!($fmt, "\n")));
    ($fmt:expr, $($arg:tt)*) => ($crate::serial_print!(
        concat!($fmt, "\n"), $($arg)*));
}
```

The implementation is very similar to the implementation of our `print` and `println` macros. Since the `SerialPort` type already implements the [`fmt::Write`] trait, we don't need to provide our own implementation.

[`fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

Now we can print to the serial interface instead of the VGA text buffer in our test code:

```rust
// in src/main.rs

#[cfg(test)]
fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    […]
}

#[test_case]
fn trivial_assertion() {
    serial_print!("trivial assertion... ");
    assert_eq!(1, 1);
    serial_println!("[ok]");
}
```

Note that the `serial_println` macro lives directly under the root namespace because we used the `#[macro_export]` attribute, so importing it through `use crate::serial::serial_println` will not work.

### QEMU Arguments

To see the serial output from QEMU, we need to use the `-serial` argument to redirect the output to stdout:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = [
    "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio"
]
```

When we run `cargo test` now, we see the test output directly in the console:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a
Building bootloader
    Finished release [optimized + debuginfo] target(s) in 0.02s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device
    isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio`
Running 1 tests
trivial assertion... [ok]
```

However, when a test fails, we still see the output inside QEMU because our panic handler still uses `println`. To simulate this, we can change the assertion in our `trivial_assertion` test to `assert_eq!(0, 1)`:

![QEMU printing "Hello World!" and "panicked at 'assertion failed: `(left == right)` left: `0`, right: `1`', src/main.rs:55:5](qemu-failed-test.png)

We see that the panic message is still printed to the VGA buffer, while the other test output is printed to the serial port. The panic message is quite useful, so it would be good to see it in the console too.

### Print an Error Message on Panic

To exit QEMU with an error message on a panic, we can use [conditional compilation][compilación condicional] to use a different panic handler in testing mode:

[compilación condicional]: https://doc.rust-lang.org/1.30.0/book/first-edition/conditional-compilation.html

```rust
// in src/main.rs

// our existing panic handler
#[cfg(not(test))] // new attribute
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}

// our panic handler in test mode
#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}
```

For our test panic handler, we use `serial_println` instead of `println` and then exit QEMU with a failure exit code. Note that we still need an endless `loop` after the `exit_qemu` call because the compiler does not know that the `isa-debug-exit` device causes a program exit.
Now QEMU also exits for failed tests and prints a useful error message on the console:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a
Building bootloader
    Finished release [optimized + debuginfo] target(s) in 0.02s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device
    isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio`
Running 1 tests
trivial assertion... [failed]

Error: panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/main.rs:65:5
```

Since we now see all test output on the console, we no longer need the QEMU window that pops up for a short time. So we can hide it completely.

### Hiding QEMU

Since we report the complete test results using the `isa-debug-exit` device and the serial port, we don't need the QEMU window anymore. We can hide it by passing the `-display none` argument to QEMU:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = [
    "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio",
    "-display", "none"
]
```

Now QEMU runs completely in the background and no window gets opened anymore. This is not only less annoying, but also allows our test framework to run in environments without a graphical user interface, such as CI services or [SSH] connections.

[SSH]: https://en.wikipedia.org/wiki/Secure_Shell

### Timeouts

Since `cargo test` waits until the test runner exits, a test that never returns can block the test runner forever. That's unfortunate, but not a big problem in practice since it's usually easy to avoid endless loops.
In our case, however, endless loops can occur in various situations:

- The bootloader fails to load our kernel, which causes the system to reboot endlessly.
- The BIOS/UEFI firmware fails to load the bootloader, which causes the same endless rebooting.
- The CPU enters a `loop {}` statement at the end of some of our functions, for example because the QEMU exit device doesn't work properly.
- The hardware causes a system reset, for example when a CPU exception is not caught (explained in a future post).

Since endless loops can occur in so many situations, the `bootimage` tool sets a timeout of 5 minutes for each test executable by default. If the test does not finish within this time, it is marked as failed and a "Timed Out" error is printed to the console. This feature ensures that tests that are stuck in an endless loop don't block `cargo test` forever.

You can try it yourself by adding a `loop {}` statement in the `trivial_assertion` test. When you run `cargo test`, you see that the test is marked as timed out after 5 minutes. The timeout duration is [configurable][bootimage config] through a `test-timeout` key in the Cargo.toml:

[bootimage config]: https://github.com/rust-osdev/bootimage#configuration

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-timeout = 300          # (in seconds)
```

If you don't want to wait 5 minutes for the `trivial_assertion` test to time out, you can temporarily decrease the above value.

### Insert Printing Automatically

Our `trivial_assertion` test currently needs to print its own status information using `serial_print!`/`serial_println!`:

```rust
#[test_case]
fn trivial_assertion() {
    serial_print!("trivial assertion... ");
    assert_eq!(1, 1);
    serial_println!("[ok]");
}
```

Manually adding these print statements for every test we write is cumbersome, so let's update our `test_runner` to print these messages automatically. To do that, we need to create a new `Testable` trait:

```rust
// in src/main.rs

pub trait Testable {
    fn run(&self) -> ();
}
```

The trick now is to implement this trait for all types `T` that implement the [`Fn()` trait]:

[`Fn()` trait]: https://doc.rust-lang.org/stable/core/ops/trait.Fn.html

```rust
// in src/main.rs

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}
```

We implement the `run` function by first printing the function name using the [`any::type_name`] function. This function is implemented directly in the compiler and returns a string description of every type. For functions, the type is their name, so this is exactly what we want in this case. The `\t` character is the [tab character], which adds some alignment to the `[ok]` messages.

[`any::type_name`]: https://doc.rust-lang.org/stable/core/any/fn.type_name.html
[tab character]: https://en.wikipedia.org/wiki/Tab_character

After printing the function name, we invoke the test function through `self()`. This only works because we require that `self` implements the `Fn()` trait. After the test function returns, we print `[ok]` to indicate that the function did not panic.
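The blanket-implementation trick can be tried outside the kernel as well. The following is a minimal sketch for a normal hosted target, under the assumption that returning the status line as a `String` is an acceptable stand-in for printing to the serial port; the `run_all` helper is a hypothetical name introduced only for this illustration.

```rust
use std::any::type_name;

/// Hosted stand-in for the kernel's `Testable` trait: `run` returns the
/// status line instead of writing it to the serial port.
pub trait Testable {
    fn run(&self) -> String;
}

// Blanket implementation: every type that can be called like `fn()` is testable.
impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) -> String {
        // For a plain function, the type name is the path of the function.
        let name = type_name::<T>();
        // A panic inside the test would propagate from here, so the
        // "[ok]" suffix is only produced when the test returns normally.
        self();
        format!("{}...\t[ok]", name)
    }
}

fn trivial_assertion() {
    assert_eq!(1, 1);
}

/// Hypothetical helper: runs every test and collects the status lines.
pub fn run_all(tests: &[&dyn Testable]) -> Vec<String> {
    tests.iter().map(|test| test.run()).collect()
}
```

Calling `run_all(&[&trivial_assertion])` yields one line that contains the function's path followed by `[ok]`. The exact string returned by `type_name` is not guaranteed to be stable across compiler versions, so it is best treated as diagnostic output only.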
The last step is to update our `test_runner` to use the new `Testable` trait:

```rust
// in src/main.rs

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Testable]) { // new
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run(); // new
    }
    exit_qemu(QemuExitCode::Success);
}
```

The only two changes are the type of the `tests` argument from `&[&dyn Fn()]` to `&[&dyn Testable]` and the fact that we now call `test.run()` instead of `test()`.

We can now remove the print statements from our `trivial_assertion` test since they are now printed automatically:

```rust
// in src/main.rs

#[test_case]
fn trivial_assertion() {
    assert_eq!(1, 1);
}
```

The `cargo test` output now looks like this:

```
Running 1 tests
blog_os::trivial_assertion...	[ok]
```

The function name now includes the full path to the function, which is useful when test functions in different modules have the same name. Otherwise, the output looks the same as before, but we no longer need to add print statements to our tests manually.

## Testing the VGA Buffer

Now that we have a working test framework, we can create a few tests for our VGA buffer implementation. First, we create a very simple test to verify that `println` works without panicking:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_simple() {
    println!("test_println_simple output");
}
```

The test just prints something to the VGA buffer. If it finishes without panicking, it means that the `println` invocation did not panic either.
To ensure that no panic occurs even if many lines are printed and lines are shifted off the screen, we can create another test:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_many() {
    for _ in 0..200 {
        println!("test_println_many output");
    }
}
```

We can also create a test function to verify that the printed lines really appear on the screen:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    let s = "Some test string that fits on a single line";
    println!("{}", s);
    for (i, c) in s.chars().enumerate() {
        let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read();
        assert_eq!(char::from(screen_char.ascii_character), c);
    }
}
```

The function defines a test string, prints it using `println`, and then iterates over the screen characters of the static `WRITER`, which represents the VGA text buffer. Since `println` prints to the last screen line and then immediately appends a newline, the string should appear on line `BUFFER_HEIGHT - 2`.

By using [`enumerate`], we count the number of iterations in the variable `i`, which we then use for loading the screen character corresponding to `c`. By comparing the `ascii_character` of the screen character with `c`, we ensure that each character of the string really appears in the VGA text buffer.

[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate

As you can imagine, we could create many more test functions. For example, a function that tests that no panic occurs when printing very long lines and that they are wrapped correctly, or a function that tests that newlines, non-printable characters, and non-unicode characters are handled correctly.

For the rest of this post, however, we will explain how to create _integration tests_ to test the interaction of different components together.
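Returning briefly to `test_println_output`: the reason the string ends up on line `BUFFER_HEIGHT - 2` can be checked with a small hosted model of the screen. This is only a sketch under stated assumptions. The `Screen` type and its methods are hypothetical names for this illustration, not the `Writer` from the VGA post, and the model assumes a 25x80 buffer in which text is always written to the last row and a newline shifts every row up by one.

```rust
const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

/// Hypothetical, simplified model of a text screen that always writes to the last row.
pub struct Screen {
    rows: [[u8; BUFFER_WIDTH]; BUFFER_HEIGHT],
    column: usize,
}

impl Screen {
    pub fn new() -> Self {
        Screen { rows: [[b' '; BUFFER_WIDTH]; BUFFER_HEIGHT], column: 0 }
    }

    /// Shifts every row up by one and clears the last row.
    fn new_line(&mut self) {
        for row in 1..BUFFER_HEIGHT {
            self.rows[row - 1] = self.rows[row];
        }
        self.rows[BUFFER_HEIGHT - 1] = [b' '; BUFFER_WIDTH];
        self.column = 0;
    }

    fn write_byte(&mut self, byte: u8) {
        if byte == b'\n' {
            self.new_line();
            return;
        }
        if self.column >= BUFFER_WIDTH {
            self.new_line(); // wrap lines that are too long
        }
        self.rows[BUFFER_HEIGHT - 1][self.column] = byte;
        self.column += 1;
    }

    /// Writes the string to the last row, then appends a newline.
    pub fn println(&mut self, s: &str) {
        for byte in s.bytes() {
            self.write_byte(byte);
        }
        self.write_byte(b'\n');
    }

    /// Returns the text of a row without trailing spaces.
    pub fn row_text(&self, row: usize) -> String {
        String::from_utf8_lossy(&self.rows[row]).trim_end().to_string()
    }
}
```

After a single `println` call, the text sits on row `BUFFER_HEIGHT - 2` and the last row is empty again, which is the situation the test inspects.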
## Integration Tests

The convention for [integration tests][pruebas de integración] in Rust is to put them into a `tests` directory in the project root (i.e., next to the `src` directory). Both the default test framework and custom test frameworks will automatically pick up and execute all tests in that directory.

[pruebas de integración]: https://doc.rust-lang.org/book/ch11-03-test-organization.html#integration-tests

All integration tests are their own executables and completely separate from our `main.rs`. This means that each test needs to define its own entry point function. Let's create an example integration test named `basic_boot` to see how it works in detail:

```rust
// in tests/basic_boot.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;

#[no_mangle] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    test_main();

    loop {}
}

fn test_runner(tests: &[&dyn Fn()]) {
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    loop {}
}
```

Since integration tests are separate executables, we need to provide all the crate attributes (`no_std`, `no_main`, `test_runner`, etc.) again. We also need to create a new entry point function `_start`, which calls the test entry point function `test_main`. We don't need any `cfg(test)` attributes because integration test executables are never built in non-test mode.

We use the [`unimplemented`] macro, which always panics, as a placeholder for the `test_runner` function and just `loop` in the panic handler for now. Ideally, we want to implement these functions exactly as we did in our `main.rs` using the `serial_println` macro and the `exit_qemu` function.
The problem is that we don't have access to these functions since tests are built completely separately from our `main.rs` executable.

[`unimplemented`]: https://doc.rust-lang.org/core/macro.unimplemented.html

If you run `cargo test` at this stage, you will get stuck in an endless loop because the panic handler loops endlessly. You need to use the `ctrl+c` keyboard shortcut to exit QEMU.

### Create a Library

To make the required functions available to our integration test, we need to split off a library from our `main.rs`, which can be included by other crates and integration test executables. To do this, we create a new `src/lib.rs` file:

```rust
// src/lib.rs

#![no_std]
```

Like `main.rs`, `lib.rs` is a special file that is automatically recognized by cargo. The library is a separate compilation unit, so we need to specify the `#![no_std]` attribute again.

To make our library work with `cargo test`, we also need to move the test functions and attributes from `main.rs` to `lib.rs`:

```rust
// in src/lib.rs

#![cfg_attr(test, no_main)]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;

pub trait Testable {
    fn run(&self) -> ();
}

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}

pub fn test_runner(tests: &[&dyn Testable]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run();
    }
    exit_qemu(QemuExitCode::Success);
}

pub fn test_panic_handler(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

/// Entry point for `cargo test`
#[cfg(test)]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    test_main();
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    test_panic_handler(info)
}
```

To make our `test_runner` available to executables and integration tests, we make it public and don't apply the `cfg(test)` attribute to it. We also factor out the implementation of our panic handler into a public `test_panic_handler` function, so that it is available to executables too.

Since our `lib.rs` is tested independently of our `main.rs`, we need to add a `_start` entry point and a panic handler when the library is compiled in test mode. By using the [`cfg_attr`] crate attribute, we conditionally enable the `no_main` attribute in this case.

[`cfg_attr`]: https://doc.rust-lang.org/reference/conditional-compilation.html#the-cfg_attr-attribute

We also move over the `QemuExitCode` enum and the `exit_qemu` function and make them public:

```rust
// in src/lib.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

pub fn exit_qemu(exit_code: QemuExitCode) {
    use x86_64::instructions::port::Port;

    unsafe {
        let mut port = Port::new(0xf4);
        port.write(exit_code as u32);
    }
}
```

Now executables and integration tests can import these functions from the library and don't need to define their own implementations. To also make `println` and `serial_println` available, we move the module declarations too:

```rust
// in src/lib.rs

pub mod serial;
pub mod vga_buffer;
```

We make the modules public to make them usable outside of our library. This is also required for making our `println` and `serial_println` macros usable since they use the `_print` functions of the modules.
Now we can update our `main.rs` to use the library:

```rust
// in src/main.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(blog_os::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;
use blog_os::println;

#[no_mangle]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}

/// This function is called on panic.
#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

The library is usable like a normal external crate. It is called `blog_os`, like our crate. The above code uses the `blog_os::test_runner` function in the `test_runner` attribute and the `blog_os::test_panic_handler` function in our `cfg(test)` panic handler. It also imports the `println` macro to make it available to our `_start` and `panic` functions.

At this point, `cargo run` and `cargo test` should work again. Of course, `cargo test` still gets stuck in an endless loop (you can exit with `ctrl+c`). Let's fix this by using the required library functions in our integration test.

### Completing the Integration Test

Like our `src/main.rs`, our `tests/basic_boot.rs` executable can import types from our new library. This allows us to import the missing components to complete our test:

```rust
// in tests/basic_boot.rs

#![test_runner(blog_os::test_runner)]

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

Instead of reimplementing the test runner, we use the `test_runner` function from our library by changing the `#![test_runner(crate::test_runner)]` attribute to `#![test_runner(blog_os::test_runner)]`.
We then don't need the `test_runner` stub function in `basic_boot.rs` anymore, so we can remove it. For our panic handler, we call the `blog_os::test_panic_handler` function like we did in our `main.rs` file.

Now `cargo test` exits normally again. When you run it, you will see that it builds and runs the tests for `lib.rs`, `main.rs`, and `basic_boot.rs` separately, one after another. For `main.rs` and the `basic_boot` integration test, it reports "Running 0 tests" since these files don't have any functions annotated with `#[test_case]`.

We can now add tests to our `basic_boot.rs`. For example, we can test that `println` works without panicking, like we did in the VGA buffer tests:

```rust
// in tests/basic_boot.rs

use blog_os::println;

#[test_case]
fn test_println() {
    println!("test_println output");
}
```

When we run `cargo test` now, we see that it finds and executes the test function.

The test might seem a bit useless right now since it's almost identical to one of the VGA buffer tests. However, in the future, the `_start` functions of our `main.rs` and `lib.rs` might grow and call various initialization routines before running the `test_main` function, so that the two tests are executed in very different environments.

By testing `println` in a `basic_boot` environment without calling any initialization routines in `_start`, we can ensure that `println` works right after booting. This is important because we rely on it, for example, for printing panic messages.

### Future Tests

The power of integration tests is that they are treated as completely separate executables. This gives them complete control over the environment, which makes it possible to test that the code interacts correctly with the CPU or hardware devices.

Our `basic_boot` test is a very simple example of an integration test.
En el futuro, nuestro núcleo se volverá mucho más funcional e interactuará con el hardware de varias maneras. Al añadir pruebas de integración, podemos asegurarnos de que estas interacciones funcionen (y sigan funcionando) como se espera. Algunas ideas para posibles pruebas futuras son: - **Excepciones de CPU**: Cuando el código realiza operaciones inválidas (por ejemplo, division por cero), la CPU lanza una excepción. El núcleo puede registrar funciones de manejo para tales excepciones. Una prueba de integración podría verificar que se llame al controlador de excepciones correcto cuando ocurre una excepción de CPU o que la ejecución continúe correctamente después de una excepción recuperable. - **Tablas de Páginas**: Las tablas de páginas definen qué regiones de memoria son válidas y accesibles. Al modificar las tablas de páginas, es posible asignar nuevas regiones de memoria, por ejemplo, al lanzar programas. Una prueba de integración podría modificar las tablas de páginas en la función `_start` y verificar que las modificaciones tengan los efectos deseados en las funciones `#[test_case]`. - **Programas en Espacio de Usuario**: Los programas en espacio de usuario son programas con acceso limitado a los recursos del sistema. Por ejemplo, no tienen acceso a las estructuras de datos del núcleo ni a la memoria de otros programas. Una prueba de integración podría lanzar programas en espacio de usuario que realicen operaciones prohibidas y verificar que el núcleo las prevenga todas. Como puedes imaginar, son posibles muchas más pruebas. Al añadir tales pruebas, podemos asegurarnos de no romperlas accidentalmente al añadir nuevas características a nuestro núcleo o refactorizar nuestro código. Esto es especialmente importante cuando nuestro núcleo se vuelve más grande y complejo. 
### Tests that Should Panic

The test framework of the standard library supports a [`#[should_panic]`](https://doc.rust-lang.org/rust-by-example/testing/unit_testing.html#testing-panics) attribute that allows constructing test functions that should fail. This is useful, for example, to verify that a function fails when an invalid argument is passed. Unfortunately, this attribute isn't supported in `#[no_std]` crates since it requires support from the standard library.

While we can't use the `#[should_panic]` attribute in our kernel, we can get similar behavior by creating an integration test that exits with a success exit code from the panic handler. Let's start creating such a test with the name `should_panic`:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{QemuExitCode, exit_qemu, serial_println};

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

This test is still incomplete as it doesn't define a `_start` function or any of the custom test framework attributes yet. Let's add the missing parts:

```rust
// in tests/should_panic.rs

#![feature(custom_test_frameworks)]
#![test_runner(test_runner)]
#![reexport_test_harness_main = "test_main"]

#[no_mangle]
pub extern "C" fn _start() -> ! {
    test_main();

    loop {}
}

pub fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test();
        serial_println!("[test did not panic]");
        exit_qemu(QemuExitCode::Failed);
    }
    exit_qemu(QemuExitCode::Success);
}
```

Instead of reusing the `test_runner` from our `lib.rs`, the test defines its own `test_runner` function that exits with a failure exit code when a test returns without panicking (we want our tests to panic). If no test function is defined, the runner exits with a success exit code. Since the runner always exits after running a single test, it does not make sense to define more than one `#[test_case]` function.

Now we can create a test that should fail:

```rust
// in tests/should_panic.rs

use blog_os::serial_print;

#[test_case]
fn should_fail() {
    serial_print!("should_panic::should_fail...\t");
    assert_eq!(0, 1);
}
```

The test uses `assert_eq` to assert that `0` and `1` are equal. Of course, this fails, so our test panics as desired. Note that we need to manually print the function name using `serial_print!` here because we don't use the `Testable` trait.

When we run the test through `cargo test --test should_panic`, we see that it is successful because the test panicked as expected. When we comment out the assertion and run the test again, we see that it indeed fails with the _"test did not panic"_ message.

A significant drawback of this approach is that it only works for a single test function. With multiple `#[test_case]` functions, only the first function is executed because execution cannot continue after the panic handler has been called. I currently don't know of a good way to solve this problem, so let me know if you have an idea!

### No Harness Tests

For integration tests that only have a single test function (like our `should_panic` test), the test runner isn't really needed. For cases like this, we can disable the test runner completely and run our test directly in the `_start` function.

The key to this is to disable the `harness` flag for the test in the `Cargo.toml`, which defines whether a test runner is used for an integration test. When it's set to `false`, both the default test runner and the custom test frameworks feature are disabled, so that the test is treated like a normal executable.
Let's disable the `harness` flag for our `should_panic` test:

```toml
# in Cargo.toml

[[test]]
name = "should_panic"
harness = false
```

Now we can vastly simplify our `should_panic` test by removing the `test_runner`-related code. The result looks like this:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{exit_qemu, serial_print, serial_println, QemuExitCode};

#[no_mangle]
pub extern "C" fn _start() -> ! {
    should_fail();
    serial_println!("[test did not panic]");
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

fn should_fail() {
    serial_print!("should_panic::should_fail...\t");
    assert_eq!(0, 1);
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

We now call the `should_fail` function directly from our `_start` function and exit with a failure exit code if it returns. When we run `cargo test --test should_panic` now, we see that the test behaves exactly as before.

Apart from creating `should_panic` tests, disabling the `harness` attribute can also be useful for complex integration tests, for example, when the individual test functions have side effects and need to be run in a specified order.

## Summary

Testing is a very useful technique to ensure that certain components have the desired behavior. Even if tests cannot show the absence of bugs, they're still a useful tool for finding them and especially for avoiding regressions.

This post explained how to set up a test framework for our Rust kernel. We used Rust's custom test frameworks feature to implement support for a simple `#[test_case]` attribute in our bare-metal environment. Using the `isa-debug-exit` device of QEMU, our test runner can exit QEMU after running the tests and report the test status. To print error messages to the console instead of the VGA buffer, we created a basic driver for the serial port.

After creating some tests for our `println` macro, we explored integration tests in the second half of the post. We learned that they live in the `tests` directory and are treated as completely separate executables. To give them access to the `exit_qemu` function and the `serial_println` macro, we moved most of our code into a library that can be imported by all executables and integration tests. Since integration tests run in their own separate environment, they make it possible to test interactions with the hardware or to create tests that should panic.

We now have a test framework that runs in a realistic environment inside QEMU. By creating more tests in future posts, we can keep our kernel maintainable as it becomes more complex.

## What's next?

In the next post, we will explore _CPU exceptions_. These exceptions are thrown by the CPU when something illegal happens, such as a division by zero or an access to an unmapped memory page (a so-called "page fault"). Being able to catch and examine these exceptions is very important for debugging future errors. Exception handling is also very similar to the handling of hardware interrupts, which is required for keyboard support.
================================================
FILE: blog/content/edition-2/posts/04-testing/index.fa.md
================================================
+++
title = "تست کردن"
weight = 4
path = "fa/testing"
date = 2019-04-27

[extra]
# Please update this when updating the translation
translation_based_on_commit = "d007af4811469b974f7abb988dd9c9d1373b55f0"
# GitHub usernames of the people that translated this post
translators = ["hamidrezakp", "MHBahrampour"]
rtl = true
+++

This post explores unit and integration testing in `no_std` executables. We will use Rust's support for custom test frameworks to execute test functions inside our kernel. To report the results out of QEMU, we will use different features of QEMU and the `bootimage` tool.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom] of this post. The complete source code for this post can be found in the [`post-04`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-04

## Requirements

This post replaces the (now deprecated) [_Unit Testing_] and [_Integration Tests_] posts. It assumes that you have followed the [_A Minimal Rust Kernel_] post after 2019-09-27. Mainly, it requires that you have a `.cargo/config.toml` file that [sets a default target] and [defines a runner executable].

[_Unit Testing_]: @/edition-2/posts/deprecated/04-unit-testing/index.md
[_Integration Tests_]: @/edition-2/posts/deprecated/05-integration-tests/index.md
[_A Minimal Rust Kernel_]: @/edition-2/posts/02-minimal-rust-kernel/index.md
[sets a default target]: @/edition-2/posts/02-minimal-rust-kernel/index.md#set-a-default-target
[defines a runner executable]: @/edition-2/posts/02-minimal-rust-kernel/index.md#using-cargo-run

## Testing in Rust

Rust has a [built-in test framework] that is capable of running unit tests without the need to set anything up. Just create a function that checks some results through assertions and add the `#[test]` attribute to the function header. Then `cargo test` will automatically find and execute all test functions of your crate.

[built-in test framework]: https://doc.rust-lang.org/book/second-edition/ch11-00-testing.html

Unfortunately, it's a bit more complicated for `no_std` applications such as our kernel. The problem is that Rust's test framework implicitly uses the built-in [`test`] library, which depends on the standard library. This means that we can't use the default test framework for our `#[no_std]` kernel.

[`test`]: https://doc.rust-lang.org/test/index.html

We can see this when we try to run `cargo test` in our project:

```
> cargo test
   Compiling blog_os v0.1.0 (/…/blog_os)
error[E0463]: can't find crate for `test`
```

Since the `test` crate depends on the standard library, it is not available for our bare metal target. While using the `test` crate in a `#[no_std]` context [is possible][utest], it is highly unstable and requires some hacks, such as redefining the `panic` macro.

[utest]: https://github.com/japaric/utest

### Custom Test Frameworks

Fortunately, Rust supports replacing the default test framework through the unstable [`custom_test_frameworks`] feature. This feature requires no external libraries and thus also works in `#[no_std]` environments. It works by collecting all functions annotated with a `#[test_case]` attribute and then invoking a user-specified runner function with the list of tests as an argument. Thus, it gives us maximal control over the test process.

[`custom_test_frameworks`]: https://doc.rust-lang.org/unstable-book/language-features/custom-test-frameworks.html

The disadvantage compared to the default test framework is that many advanced features, such as [`should_panic` tests], are not available. Instead, it is up to our implementation to provide such features if needed. This is ideal for us since we have a very special execution environment where the default implementations of such advanced features probably wouldn't work anyway. For example, the `#[should_panic]` attribute relies on stack unwinding to catch the panics, which we disabled for our kernel.

[`should_panic` tests]: https://doc.rust-lang.org/book/ch11-01-writing-tests.html#checking-for-panics-with-should_panic

To implement a custom test framework for our kernel, we add the following to our `main.rs`:

```rust
// in src/main.rs

#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
}
```

Our runner just prints a short debug message and then calls each test function in the list. The argument type `&[&dyn Fn()]` is a [_slice_] of [_trait object_] references of the [_Fn()_] trait. It is basically a list of references to types that can be called like a function. Since the function is useless for non-test runs, we use the `#[cfg(test)]` attribute to include it only for tests.
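To get a feel for the `&[&dyn Fn()]` argument type outside the kernel, the following is a minimal host-side sketch. It assumes a normal `std` build rather than our bare-metal target, and `run_all` is a hypothetical stand-in for `test_runner` that additionally returns the number of tests it executed so the result can be checked.

```rust
// Host-side sketch (assumption: compiled with `std`, not for the kernel target).
// `run_all` is a hypothetical stand-in for the kernel's `test_runner`.
fn run_all(tests: &[&dyn Fn()]) -> usize {
    println!("Running {} tests", tests.len());
    let mut executed = 0;
    for test in tests {
        test(); // call through the trait object reference
        executed += 1;
    }
    executed
}

fn trivial_assertion() {
    assert_eq!(1, 1);
}

fn main() {
    // A plain function item and a closure both coerce to `&dyn Fn()`.
    let closure = || assert!(2 > 1);
    let tests: [&dyn Fn(); 2] = [&trivial_assertion, &closure];
    let executed = run_all(&tests);
    println!("executed {} tests", executed);
}
```

In the kernel, the slice is not built by hand: the custom test frameworks feature collects every `#[test_case]` function and passes the resulting list to the runner.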

[_slice_]: https://doc.rust-lang.org/std/primitive.slice.html
[_trait object_]: https://doc.rust-lang.org/1.30.0/book/first-edition/trait-objects.html
[_Fn()_]: https://doc.rust-lang.org/std/ops/trait.Fn.html

When we run `cargo test` now, we see that it succeeds (if it doesn't, see the note below). However, we still see our "Hello World" instead of the message from our `test_runner`. The reason is that our `_start` function is still used as the entry point. The custom test frameworks feature generates a `main` function that calls `test_runner`, but this function is ignored because we use the `#[no_main]` attribute and provide our own entry point.

**Note:** There is currently a bug in cargo that leads to "duplicate lang item" errors on `cargo test` in some cases. It occurs when you have set `panic = "abort"` for a profile in your `Cargo.toml`. Try removing it, then `cargo test` should work. See the [cargo issue](https://github.com/rust-lang/cargo/issues/7359) for more information.

To fix this, we first need to change the name of the generated function to something other than `main` through the `reexport_test_harness_main` attribute. Then we can call the renamed function from our `_start` function:

```rust
// in src/main.rs

#![reexport_test_harness_main = "test_main"]

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}
```

We set the name of the test framework entry function to `test_main` and call it from our `_start` entry point. We use [conditional compilation] to add the call to `test_main` only in test contexts because the function is not generated on a normal run.

When we now execute `cargo test`, we see the "Running 0 tests" message from our `test_runner` on the screen. We are now ready to create our first test function:

```rust
// in src/main.rs

#[test_case]
fn trivial_assertion() {
    print!("trivial assertion... ");
    assert_eq!(1, 1);
    println!("[ok]");
}
```

When we run `cargo test` now, we see the following output:

![QEMU printing "Hello World!", "Running 1 tests", and "trivial assertion... [ok]"](qemu-test-runner-output.png)

The `tests` slice passed to our `test_runner` function now contains a reference to the `trivial_assertion` function. From the `trivial assertion... [ok]` output on the screen, we see that the test was called and that it succeeded.

After executing the tests, our `test_runner` returns to the `test_main` function, which in turn returns to our `_start` entry point function. At the end of `_start`, we enter an endless loop because the entry point function is not allowed to return. This is a problem, because we want `cargo test` to exit after running all tests.

## Exiting QEMU

Right now, we have an endless loop at the end of our `_start` function and need to close QEMU manually on each execution of `cargo test`. This is unfortunate because we also want to run `cargo test` in scripts without user interaction. A clean solution would be to implement a proper way to shut down our OS. Unfortunately, this is relatively complex because it requires implementing support for either the [APM] or [ACPI] power management standard.

[APM]: https://wiki.osdev.org/APM
[ACPI]: https://wiki.osdev.org/ACPI

Luckily, there is an escape hatch: QEMU supports a special `isa-debug-exit` device, which provides an easy way to exit QEMU from the guest system. To enable it, we need to pass a `-device` argument to QEMU. We can do so by adding a `package.metadata.bootimage.test-args` configuration key in our `Cargo.toml`:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = ["-device", "isa-debug-exit,iobase=0xf4,iosize=0x04"]
```

The `bootimage runner` appends the `test-args` to the default QEMU command for all test executables. For a normal `cargo run`, the arguments are ignored.

Together with the device name (`isa-debug-exit`), we pass the two parameters `iobase` and `iosize` that specify the _I/O port_ through which the device can be reached from our kernel.

### I/O Ports

There are two different approaches for communicating between the CPU and peripheral hardware on x86: **memory-mapped I/O** and **port-mapped I/O**. We already used memory-mapped I/O for accessing the [VGA text buffer] through the memory address `0xb8000`. This address is not mapped to RAM but to some memory on the VGA device.

[VGA text buffer]: @/edition-2/posts/03-vga-text-buffer/index.md

In contrast, port-mapped I/O uses a separate I/O bus for communication. Each connected peripheral has one or more port numbers. To communicate with such an I/O port, there are special CPU instructions called `in` and `out`, which take a port number and a data byte (there are also variations of these instructions that allow sending a `u16` or `u32`).

The `isa-debug-exit` device uses port-mapped I/O. The `iobase` parameter specifies on which port address the device should live (`0xf4` is a [generally unused][list of x86 I/O ports] port on the x86 I/O bus) and `iosize` specifies the port size (`0x04` means four bytes).

[list of x86 I/O ports]: https://wiki.osdev.org/I/O_Ports#The_list

### Using the Exit Device

The functionality of the `isa-debug-exit` device is very simple. When a `value` is written to the I/O port specified by `iobase`, it causes QEMU to exit with [exit status] `(value << 1) | 1`. So when we write `0` to the port, QEMU will exit with exit status `(0 << 1) | 1 = 1`, and when we write `1` to the port, it will exit with exit status `(1 << 1) | 1 = 3`.

[exit status]: https://en.wikipedia.org/wiki/Exit_status

Instead of manually invoking the `in` and `out` assembly instructions, we use the abstractions provided by the [`x86_64`] crate. To add a dependency on that crate, we add it to the `dependencies` section in our `Cargo.toml`:

[`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/

```toml
# in Cargo.toml

[dependencies]
x86_64 = "0.14.2"
```

Now we can use the [`Port`] type provided by the crate to create an `exit_qemu` function:

[`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html

```rust
// in src/main.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

pub fn exit_qemu(exit_code: QemuExitCode) {
    use x86_64::instructions::port::Port;

    unsafe {
        let mut port = Port::new(0xf4);
        port.write(exit_code as u32);
    }
}
```

The function creates a new [`Port`] at `0xf4`, which is the `iobase` of the `isa-debug-exit` device. Then it writes the passed exit code to the port. We use `u32` because we specified the `iosize` of the `isa-debug-exit` device as 4 bytes. Both operations are unsafe because writing to an I/O port can generally result in arbitrary behavior.

To specify the exit status, we create a `QemuExitCode` enum. The idea is to exit with the success exit code if all tests succeeded and with the failure exit code otherwise. The enum is marked as `#[repr(u32)]` to represent each variant by a `u32` integer. We use the exit code `0x10` for success and `0x11` for failure. The actual exit codes don't matter much, as long as they don't clash with the default exit codes of QEMU. For example, using exit code `0` for success is not a good idea because it becomes `(0 << 1) | 1 = 1` after the transformation, which is the default exit code when QEMU fails to run. So we could not differentiate a QEMU error from a successful test run.

We can now update our `test_runner` to exit QEMU after all tests have run:

```rust
fn test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
    // new
    exit_qemu(QemuExitCode::Success);
}
```

When we run `cargo test` now, we see that QEMU immediately closes after executing the tests. The problem is that `cargo test` interprets the test as failed even though we passed our `Success` exit code:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running target/x86_64-blog_os/debug/deps/blog_os-5804fc7d2dd4c9be
Building bootloader
   Compiling bootloader v0.5.3 (/home/philipp/Documents/bootloader)
    Finished release [optimized + debuginfo] target(s) in 1.07s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-5804fc7d2dd4c9be.bin -device isa-debug-exit,iobase=0xf4,
    iosize=0x04`
error: test failed, to rerun pass '--bin blog_os'
```

The problem is that `cargo test` considers all error codes other than `0` as failure.
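The arithmetic behind these exit statuses can be checked on the host with a small sketch. The `QemuExitCode` enum is the one from the post; `qemu_exit_status` is a hypothetical helper introduced only for illustration, applying the `(value << 1) | 1` rule to a value written to the `isa-debug-exit` port.

```rust
/// Exit codes written to the `isa-debug-exit` port, as defined in the post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

/// Hypothetical helper (not part of the kernel): the status QEMU exits with
/// after `value` is written to the device, following `(value << 1) | 1`.
pub fn qemu_exit_status(value: u32) -> u32 {
    (value << 1) | 1
}

fn main() {
    for code in [QemuExitCode::Success, QemuExitCode::Failed] {
        println!("{:?} -> {}", code, qemu_exit_status(code as u32));
    }
}
```

Because the lowest bit is always set, the device can never produce exit status `0`, which is why even our `Success` code is reported as a failure by `cargo test`.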
### Success Exit Code

To work around this, `bootimage` provides a `test-success-exit-code` configuration key that maps a specified exit code to the exit code `0`:

```toml
[package.metadata.bootimage]
test-args = […]
test-success-exit-code = 33         # (0x10 << 1) | 1
```

With this configuration, `bootimage` maps our success exit code to exit code 0, so that `cargo test` correctly recognizes the success case and does not count the test as failed.

Our test runner now automatically closes QEMU and correctly reports the test results. We still see the QEMU window open for a very short time, but it does not suffice to read the results. It would be nice if we could print the test results to the console instead, so that we can still see them after QEMU exits.

## Printing to the Console

To see the test output on the console, we need to send the data from our kernel to the host system somehow. There are various ways to achieve this, for example, by sending the data over a TCP network interface. However, setting up a networking stack is quite a complex task, so we will choose a simpler solution instead.

### Serial Port

A simple way to send the data is to use the [serial port], an old interface standard which is no longer found in modern computers. It is easy to program, and QEMU can redirect the bytes sent over serial to the host's standard output or a file.

[serial port]: https://en.wikipedia.org/wiki/Serial_port

The chips implementing a serial interface are called [UARTs]. There are [lots of UART models] on x86, but fortunately the only differences between them are some advanced features we don't need. The common UARTs today are all compatible with the [16550 UART], so we will use that model for our testing framework.

[UARTs]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter
[lots of UART models]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter#Models
[16550 UART]: https://en.wikipedia.org/wiki/16550_UART

We will use the [`uart_16550`] crate to initialize the UART and send data over the serial port. To add it as a dependency, we update our `Cargo.toml` and `main.rs`:

[`uart_16550`]: https://docs.rs/uart_16550

```toml
# in Cargo.toml

[dependencies]
uart_16550 = "0.2.0"
```

The `uart_16550` crate contains a `SerialPort` struct that represents the UART registers, but we still need to construct an instance of it ourselves. For that, we create a new `serial` module with the following content:

```rust
// in src/main.rs

mod serial;
```

```rust
// in src/serial.rs

use uart_16550::SerialPort;
use spin::Mutex;
use lazy_static::lazy_static;

lazy_static! {
    pub static ref SERIAL1: Mutex<SerialPort> = {
        let mut serial_port = unsafe { SerialPort::new(0x3F8) };
        serial_port.init();
        Mutex::new(serial_port)
    };
}
```

Like with the [VGA text buffer][vga lazy-static], we use `lazy_static` and a spinlock to create a `static` writer instance. By using `lazy_static`, we can ensure that the `init` method is called exactly once on its first use.

Like the `isa-debug-exit` device, the UART is programmed using port I/O. Since the UART is more complex, it uses multiple I/O ports for programming different device registers. The unsafe `SerialPort::new` function expects the address of the first I/O port of the UART as an argument, from which it can calculate the addresses of all needed ports. We're passing the port address `0x3F8`, which is the standard port number for the first serial interface.

[vga lazy-static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics

To make the serial port easily usable, we add `serial_print!` and `serial_println!` macros:

```rust
#[doc(hidden)]
pub fn _print(args: ::core::fmt::Arguments) {
    use core::fmt::Write;
    SERIAL1.lock().write_fmt(args).expect("Printing to serial failed");
}

/// Prints to the host through the serial interface.
#[macro_export]
macro_rules! serial_print {
    ($($arg:tt)*) => {
        $crate::serial::_print(format_args!($($arg)*));
    };
}

/// Prints to the host through the serial interface, appending a newline.
#[macro_export]
macro_rules! serial_println {
    () => ($crate::serial_print!("\n"));
    ($fmt:expr) => ($crate::serial_print!(concat!($fmt, "\n")));
    ($fmt:expr, $($arg:tt)*) => ($crate::serial_print!(
        concat!($fmt, "\n"), $($arg)*));
}
```

The implementation is very similar to the implementation of our `print` and `println` macros. Since the `SerialPort` type already implements the [`fmt::Write`] trait, we don't need to provide our own implementation.

[`fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

Now we can print to the serial interface instead of the VGA text buffer in our test code:

```rust
// in src/main.rs

#[cfg(test)]
fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    […]
}

#[test_case]
fn trivial_assertion() {
    serial_print!("trivial assertion... ");
    assert_eq!(1, 1);
    serial_println!("[ok]");
}
```

Note that the `serial_println` macro lives directly under the root namespace because we used the `#[macro_export]` attribute, so importing it through `use crate::serial::serial_println` will not work.
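The path from macro call to `write_fmt` can be tried on a host machine with the sketch below. It is not the kernel code: it assumes a `std` build, replaces the serial port with a hypothetical in-memory `SINK` (a `Mutex<String>`, which works because `String` implements `fmt::Write`), and uses the hypothetical macro names `sink_print!` and `sink_println!`.

```rust
use std::fmt::{self, Write};
use std::sync::Mutex;

// Hypothetical in-memory sink standing in for the serial port (assumption:
// host build with `std`; the kernel uses `SERIAL1` and a spinlock instead).
static SINK: Mutex<String> = Mutex::new(String::new());

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    // `String` implements `fmt::Write`, so `write_fmt` is available here.
    SINK.lock().unwrap().write_fmt(args).expect("Printing to sink failed");
}

// The kernel macros use `$crate::serial::_print` because they are exported;
// this sketch calls `_print` directly since everything lives in one file.
macro_rules! sink_print {
    ($($arg:tt)*) => {
        _print(format_args!($($arg)*))
    };
}

macro_rules! sink_println {
    () => (sink_print!("\n"));
    ($fmt:expr) => (sink_print!(concat!($fmt, "\n")));
    ($fmt:expr, $($arg:tt)*) => (sink_print!(concat!($fmt, "\n"), $($arg)*));
}

fn main() {
    sink_println!("Running {} tests", 1);
    sink_print!("trivial assertion... ");
    sink_println!("[ok]");
    let captured = SINK.lock().unwrap().clone();
    print!("{}", captured);
}
```

The three `sink_println!` arms mirror those of `serial_println!`: the newline is appended to the format string at compile time through `concat!`, so no extra write call is needed.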
### QEMU Arguments

To see the serial output from QEMU, we need to use the `-serial` argument to redirect the output to stdout:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = [
    "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio"
]
```

When we run `cargo test` now, we see the test output directly in the console:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a
Building bootloader
    Finished release [optimized + debuginfo] target(s) in 0.02s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device
    isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio`
Running 1 tests
trivial assertion... [ok]
```

However, when a test fails, we still see the output inside QEMU because our panic handler still uses `println`. To simulate this, we can change the assertion in our `trivial_assertion` test to `assert_eq!(0, 1)`:

![QEMU printing "Hello World!" and "panicked at 'assertion failed: `(left == right)` left: `0`, right: `1`', src/main.rs:55:5](qemu-failed-test.png)

We see that the panic message is still printed to the VGA buffer, while the other test output is printed to the serial port. The panic message is quite useful, so it would be good to see it in the console too.

### Print an Error Message on Panic

To exit QEMU with an error message on a panic, we can use [conditional compilation] to use a different panic handler in testing mode:

[conditional compilation]: https://doc.rust-lang.org/1.30.0/book/first-edition/conditional-compilation.html

```rust
// our existing panic handler
#[cfg(not(test))] // new attribute
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}

// our panic handler in test mode
#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}
```

For our test panic handler, we use `serial_println` instead of `println` and then exit QEMU with a failure exit code. Note that we still need an endless `loop` after the `exit_qemu` call because the compiler does not know that the `isa-debug-exit` device causes a program exit.

Now QEMU also exits for failed tests and prints a useful error message on the console:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a
Building bootloader
    Finished release [optimized + debuginfo] target(s) in 0.02s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device
    isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio`
Running 1 tests
trivial assertion... [failed]

Error: panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/main.rs:65:5
```

Since we now see all test output on the console, we no longer need the QEMU window that pops up for a short time. So we can hide it completely.

### Hiding QEMU

Since we report the complete test results using the `isa-debug-exit` device and the serial port, we don't need the QEMU window anymore. We can hide it by passing the `-display none` argument to QEMU:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = [
    "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio",
    "-display", "none"
]
```

Now QEMU runs completely in the background and no window gets opened anymore. This is not only less annoying, but also allows our test framework to run in environments without a graphical user interface, such as CI services or [SSH] connections.

[SSH]: https://en.wikipedia.org/wiki/Secure_Shell

### Timeouts

Since `cargo test` waits until the test runner exits, a test that never returns (whether it would pass or fail) can block the test runner forever. That's unfortunate, but not a big problem in practice since it's usually easy to avoid endless loops. In our case, however, endless loops can occur in various situations:

- The bootloader fails to load our kernel, which causes the system to reboot endlessly.
- The BIOS/UEFI firmware fails to load the bootloader, which causes the same endless rebooting.
- The CPU enters a `loop {}` statement at the end of some of our functions, for example because the QEMU exit device doesn't work properly.
- The hardware causes a system reset, for example when a CPU exception is not caught (explained in the next post).

Since endless loops can occur in so many situations, the `bootimage` tool sets a timeout of 5 minutes for each test executable by default. If the test does not finish within this time, it is marked as failed and a "Timed Out" error is printed to the console. This feature ensures that tests that are stuck in an endless loop don't block `cargo test` forever.

You can try it yourself by adding a `loop {}` statement in the `trivial_assertion` test. When you run `cargo test`, you see that the test is marked as timed out after 5 minutes. The timeout duration is [configurable][bootimage config] through a `test-timeout` key in the Cargo.toml:

[bootimage config]: https://github.com/rust-osdev/bootimage#configuration

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-timeout = 300          # (in seconds)
```

If you don't want to wait 5 minutes for the `trivial_assertion` test to time out, you can temporarily decrease the above value.
### Insert Printing Automatically

Our `trivial_assertion` test currently needs to print its own status information using `serial_print!`/`serial_println!`:

```rust
#[test_case]
fn trivial_assertion() {
    serial_print!("trivial assertion... ");
    assert_eq!(1, 1);
    serial_println!("[ok]");
}
```

Manually adding these print statements to every test we write is cumbersome, so let's update our `test_runner` to print these messages automatically. To do that, we need to create a new `Testable` trait:

```rust
// in src/main.rs

pub trait Testable {
    fn run(&self) -> ();
}
```

The trick now is to implement this trait for all types `T` that implement the [`Fn()` trait]:

[`Fn()` trait]: https://doc.rust-lang.org/stable/core/ops/trait.Fn.html

```rust
// in src/main.rs

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}
```

We implement the `run` function by first printing the function name using the [`any::type_name`] function. This function is implemented directly in the compiler and returns a string description of every type. For functions, the type is their name, so this is exactly what we want in this case. The `\t` character is the [tab character], which adds some alignment to the `[ok]` messages.

[`any::type_name`]: https://doc.rust-lang.org/stable/core/any/fn.type_name.html
[tab character]: https://en.wikipedia.org/wiki/Tab_character

After printing the function name, we invoke the test function through `self()`. This only works because we require that `self` implements the `Fn()` trait. After the test function returns, we print `[ok]` to indicate that the function did not panic.
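The blanket implementation can also be tried out on a hosted target. The following sketch assumes plain `std` Rust, with `print!`/`println!` standing in for the serial macros. `name_of` and `sample_test` are hypothetical helpers added only to make the behaviour observable:

```rust
use std::any::type_name;

pub trait Testable {
    fn run(&self);
}

// Blanket implementation: every `Fn()` value (function items, closures)
// automatically implements `Testable`.
impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        print!("{}...\t", type_name::<T>());
        self();
        println!("[ok]");
    }
}

/// Returns the name that `run` prints for the given test (hypothetical helper).
pub fn name_of<T: Fn()>(_test: &T) -> &'static str {
    type_name::<T>()
}

fn sample_test() {
    assert_eq!(1 + 1, 2);
}

fn main() {
    // A reference to a function item coerces to `&dyn Testable`
    // thanks to the blanket implementation.
    let tests: [&dyn Testable; 1] = [&sample_test];
    for test in tests.iter() {
        test.run();
    }
}
```

The exact string returned by `type_name` is a best-effort description and not guaranteed to be stable, so the sketch only relies on it ending with the function name.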
The last step is to update our `test_runner` to use the new `Testable` trait:

```rust
// in src/main.rs

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Testable]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run(); // new
    }
    exit_qemu(QemuExitCode::Success);
}
```

There are only two changes: the type of the `tests` argument is now `&[&dyn Testable]` instead of `&[&dyn Fn()]`, and we now call `test.run()` instead of `test()`.

We can now remove the print statements from our `trivial_assertion` test since they're printed automatically:

```rust
// in src/main.rs

#[test_case]
fn trivial_assertion() {
    assert_eq!(1, 1);
}
```

The `cargo test` output now looks like this:

```
Running 1 tests
blog_os::trivial_assertion...	[ok]
```

The function name now includes the full path to the function, which is useful when test functions in different modules have the same name. Otherwise, the output looks the same as before, but we no longer need to add print statements to our tests manually.

## Testing the VGA Buffer

Now that we have a working test framework, we can create a few tests for our VGA buffer implementation. First, we create a very simple test to verify that `println` works without panicking:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_simple() {
    println!("test_println_simple output");
}
```

The test just prints something to the VGA buffer. If it finishes without panicking, it means that the `println` invocation did not panic either.
To ensure that no panic occurs even if many lines are printed and lines are shifted off the screen, we can create another test:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_many() {
    for _ in 0..200 {
        println!("test_println_many output");
    }
}
```

We can also create a test function to verify that the printed lines really appear on the screen:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    let s = "Some test string that fits on a single line";
    println!("{}", s);
    for (i, c) in s.chars().enumerate() {
        let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read();
        assert_eq!(char::from(screen_char.ascii_character), c);
    }
}
```

The function defines a test string, prints it using `println`, and then iterates over the screen characters of the static `WRITER`, which represents the VGA text buffer. Since `println` prints to the last screen line and then immediately appends a newline, the string should appear on line `BUFFER_HEIGHT - 2`.

By using [`enumerate`], we count the number of iterations in the variable `i`, which we then use for loading the screen character corresponding to `c`. By comparing the `ascii_character` of the screen character with `c`, we ensure that each character of the string really appears in the VGA text buffer.

[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate

As you can imagine, we could create many more test functions. For example, a function that tests that no panic occurs when printing very long lines and that they're wrapped correctly, or a function that tests that newlines, non-printable characters, and non-unicode characters are handled correctly.

For the rest of this post, however, we will explain how to create _integration tests_ to test the interaction of different components together.

## Integration Tests

The convention for [integration tests] in Rust is to put them into a `tests` directory in the project root (i.e., next to the `src` directory). Both the default test framework and custom test frameworks will automatically pick up and execute all tests in that directory.

[integration tests]: https://doc.rust-lang.org/book/ch11-03-test-organization.html#integration-tests

All integration tests are their own executables and completely separate from our `main.rs`. This means that each test needs to define its own entry point function. Let's create an example integration test named `basic_boot` to see how it works in detail:

```rust
// in tests/basic_boot.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    test_main();

    loop {}
}

fn test_runner(tests: &[&dyn Fn()]) {
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    loop {}
}
```

Since integration tests are separate executables, we need to provide all the crate attributes (`no_std`, `no_main`, `test_runner`, etc.) again. We also need to create a new entry point function `_start`, which calls the test entry point function `test_main`. We don't need any `cfg(test)` attributes because integration test executables are never built in non-test mode.

We use the [`unimplemented`] macro, which always panics, as a placeholder for the `test_runner` function and just `loop` in the `panic` handler for now. Ideally, we want to implement these functions exactly as we did in our `main.rs`, using the `serial_println` macro and the `exit_qemu` function. The problem is that we don't have access to these functions since tests are built completely separately from our `main.rs` executable.

[`unimplemented`]: https://doc.rust-lang.org/core/macro.unimplemented.html

If you run `cargo test` at this stage, you will get an endless loop because the panic handler loops endlessly. You need to use the `Ctrl + c` keyboard shortcut to exit QEMU.
### Create a Library

To make the required functions available to our integration test, we need to split off a library from our `main.rs`, which can be included by other crates and integration test executables. To do this, we create a new `src/lib.rs` file:

```rust
// src/lib.rs

#![no_std]
```

Like the `main.rs`, the `lib.rs` is a special file that is automatically recognized by cargo. The library is a separate compilation unit, so we need to specify the `#![no_std]` attribute again.

To make our library work with `cargo test`, we also need to add the test functions and attributes:

```rust
// in src/lib.rs

#![cfg_attr(test, no_main)]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;

pub trait Testable {
    fn run(&self) -> ();
}

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}

pub fn test_runner(tests: &[&dyn Testable]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run();
    }
    exit_qemu(QemuExitCode::Success);
}

pub fn test_panic_handler(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    test_main();
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    test_panic_handler(info)
}
```

To make our `test_runner` available to executables and integration tests, we don't apply the `cfg(test)` attribute to it and we make it public. We also factor out the implementation of our panic handler into a public `test_panic_handler` function, so that it is available to executables too.
Since our `lib.rs` is tested independently of our `main.rs`, we need to add a `_start` entry point and a panic handler when the library is compiled in test mode. By using the [`cfg_attr`] crate attribute, we conditionally enable the `no_main` attribute in this case.

[`cfg_attr`]: https://doc.rust-lang.org/reference/conditional-compilation.html#the-cfg_attr-attribute

We also make the `QemuExitCode` enum and the `exit_qemu` function public:

```rust
// in src/lib.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

pub fn exit_qemu(exit_code: QemuExitCode) {
    use x86_64::instructions::port::Port;

    unsafe {
        let mut port = Port::new(0xf4);
        port.write(exit_code as u32);
    }
}
```

Now executables and integration tests can import these functions from the library and don't need to define their own implementations. To also make `println` and `serial_println` available, we move the module declarations too:

```rust
// in src/lib.rs

pub mod serial;
pub mod vga_buffer;
```

We make the modules public to make them usable from outside our library. This is also required for making our `println` and `serial_println` macros usable, since they use the `_print` functions of the modules.

Now we can update our `main.rs` to use the library:

```rust
// src/main.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(blog_os::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;
use blog_os::println;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}

/// This function is called on panic.
#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

The library is usable like a normal external crate. It has the same name as our crate, which is `blog_os` in our case.
The above code uses the `blog_os::test_runner` function in the `test_runner` attribute and the `blog_os::test_panic_handler` function in our `cfg(test)` panic handler. It also imports the `println` macro to make it available to our `_start` and `panic` functions.

At this point, `cargo run` and `cargo test` should work again. Of course, `cargo test` still gets stuck in an endless loop (you can exit with `ctrl+c`). Let's fix this by using the required library functions in our integration test.

### Completing the Integration Test

Like our `src/main.rs`, our `tests/basic_boot.rs` executable can import types from our new library. This allows us to import the missing components to complete our test:

```rust
// in tests/basic_boot.rs

#![test_runner(blog_os::test_runner)]

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

Instead of reimplementing the test runner, we use the `test_runner` function from our library. For our `panic` handler, we call the `blog_os::test_panic_handler` function like we did in our `main.rs`.

Now `cargo test` exits normally again. When you run it, you will see that it builds and runs the tests for our `lib.rs`, `main.rs`, and `basic_boot.rs` separately, one after another. For the `main.rs` and the `basic_boot` integration test, it reports "Running 0 tests" since these files don't have any functions annotated with `#[test_case]`.

We can now add tests to our `basic_boot.rs`. For example, we can test that `println` works without panicking, like we did in the VGA buffer tests:

```rust
// in tests/basic_boot.rs

use blog_os::println;

#[test_case]
fn test_println() {
    println!("test_println output");
}
```

When we run `cargo test` now, we see that it finds and executes the test function.

The test might seem a bit useless right now since it's almost identical to one of the VGA buffer tests.
However, in the future, the `_start` functions of our `main.rs` and `lib.rs` might grow and call various initialization routines before running the `test_main` function, so that the two tests are executed in very different environments.

### Future Tests

The power of integration tests is that they're treated as completely separate executables. This gives them complete control over the environment, which makes it possible to test that the code interacts correctly with the CPU or hardware devices.

Our `basic_boot` test is a very simple example of an integration test. In the future, our kernel will gain many more features and interact with the hardware in various ways. By adding integration tests, we can ensure that these interactions work (and keep working) as expected. Some ideas for possible future tests are:

- **CPU Exceptions**: When the code performs invalid operations (e.g., divides by zero), the CPU throws an exception. The kernel can register handler functions for such exceptions. An integration test could verify that the correct exception handler is called when a CPU exception occurs or that the execution continues correctly after a resolvable exception.
- **Page Tables**: Page tables define which memory regions are valid and accessible. By modifying the page tables, it is possible to allocate new memory regions, for example when launching programs. An integration test could modify the page tables in the `_start` function and verify that the modifications have the desired effects in `#[test_case]` functions.
- **Userspace Programs**: Userspace programs are programs with limited access to the system's resources. For example, they don't have access to kernel data structures or to the memory of other programs. An integration test could launch userspace programs that perform forbidden operations and verify that the kernel prevents them all.

As you can imagine, many more tests are possible.
By adding such tests, we can ensure that we don't break them accidentally when we add new features to our kernel or refactor our code. This is especially important when our kernel becomes larger and more complex.

### Tests that Should Panic

The test framework of the standard library supports a [`#[should_panic]` attribute][should_panic] that allows constructing tests that should fail (that is, panic). This is useful, for example, to verify that a function panics when an invalid argument is passed to it. Unfortunately, this attribute isn't supported in `#[no_std]` crates since it requires support from the standard library.

[should_panic]: https://doc.rust-lang.org/rust-by-example/testing/unit_testing.html#testing-panics

While we can't use the `#[should_panic]` attribute in our kernel, we can get similar behavior by creating an integration test that exits with a success exit code from the panic handler. Let's start creating such a test with the name `should_panic`:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{QemuExitCode, exit_qemu, serial_println};

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

This test is still incomplete as it doesn't define a `_start` function or any of the custom test runner attributes yet. Let's add the missing parts:

```rust
// in tests/should_panic.rs

#![feature(custom_test_frameworks)]
#![test_runner(test_runner)]
#![reexport_test_harness_main = "test_main"]

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    test_main();

    loop {}
}

pub fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test();
        serial_println!("[test did not panic]");
        exit_qemu(QemuExitCode::Failed);
    }
    exit_qemu(QemuExitCode::Success);
}
```

Instead of reusing the `test_runner` from our `lib.rs`, the test defines its own `test_runner` function that exits with a failure exit code when a test returns without panicking (we want our tests to panic). If no test function is defined, the runner exits with a success exit code. Since the runner always exits after running a single test, it does not make sense to define more than one `#[test_case]` function.

Now we can create a test that should fail:

```rust
// in tests/should_panic.rs

use blog_os::serial_print;

#[test_case]
fn should_fail() {
    serial_print!("should_panic::should_fail...\t");
    assert_eq!(0, 1);
}
```

The test uses `assert_eq` to assert that `0` and `1` are equal. Of course, this fails, so that our test panics as desired. Note that we need to print the function name manually using `serial_print!` here because we don't use the `Testable` trait.

When we run the test through `cargo test --test should_panic`, we see that it is successful because the test panicked as expected. When we comment out the assertion and run the test again, we see that it fails with the _"test did not panic"_ message.

A significant drawback of this approach is that it only works for a single test function. With multiple `#[test_case]` functions, only the first function is executed because the execution cannot continue after the panic handler has been called. I currently don't know of a good way to solve this problem, so let me know if you have an idea!

### No Harness Tests

For integration tests that only have a single test function (like our `should_panic` test), the test runner isn't really needed. For cases like this, we can disable the test runner completely and run our test directly in the `_start` function.
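On a hosted target with unwinding enabled, the same "succeed only if the function panics" check can be approximated with `std::panic::catch_unwind`. Note that this is a different mechanism from the kernel test, which cannot unwind and instead reports success from its panic handler. The helper name `panics` is made up for this sketch:

```rust
use std::panic;

/// Returns `true` if calling `test` panics (hypothetical helper).
/// The default panic hook still prints the panic message to stderr.
pub fn panics(test: fn()) -> bool {
    panic::catch_unwind(test).is_err()
}

fn should_fail() {
    assert_eq!(0, 1);
}

fn does_not_panic() {
    assert_eq!(1, 1);
}

fn main() {
    // Call the single test function directly from the entry point,
    // without any test runner, and report the result.
    if panics(should_fail) {
        println!("should_fail...\t[ok]");
    } else {
        println!("should_fail...\t[test did not panic]");
    }
    println!("does_not_panic panics: {}", panics(does_not_panic));
}
```

Unlike the kernel variant, this hosted version could check several functions in a row, because `catch_unwind` lets execution continue after a panic.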
The key to running a test without a test runner is to disable the `harness` flag for the test in the `Cargo.toml`, which defines whether a test runner is used for an integration test. When it's set to `false`, both the default test runner and the custom test runner feature are disabled, so that the test is treated like a normal executable.

Let's disable the `harness` flag for our `should_panic` test:

```toml
# in Cargo.toml

[[test]]
name = "should_panic"
harness = false
```

Now we can vastly simplify our `should_panic` test by removing the code related to the test runner. The result looks like this:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{exit_qemu, serial_print, serial_println, QemuExitCode};

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    should_fail();
    serial_println!("[test did not panic]");
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

fn should_fail() {
    serial_print!("should_panic::should_fail...\t");
    assert_eq!(0, 1);
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

We now call the `should_fail` function directly from our `_start` function and exit with a failure exit code if it returns. When we run `cargo test --test should_panic` now, we see that the test behaves exactly as before.

Apart from creating `should_panic` tests, disabling the `harness` attribute can also be useful for complex integration tests, for example, when the individual test functions have side effects and need to be run in a specified order.

## Summary

Testing is a very useful technique to ensure that certain components have the desired behavior. Even if tests cannot show the absence of bugs, they're still a useful tool for finding them and especially for avoiding regressions.

This post explained how to set up a test framework for our Rust kernel. We used Rust's custom test frameworks feature to implement support for a simple `#[test_case]` attribute in our bare-metal environment.
Using the `isa-debug-exit` device of QEMU, our test runner can exit QEMU after running the tests and report the test status. To print error messages to the console instead of the VGA buffer, we created a basic driver for the serial port.

After creating some tests for our `println` macro, we explored integration tests in the second half of the post. We learned that they live in the `tests` directory and are treated as completely separate executables. To give them access to the `exit_qemu` function and the `serial_println` macro, we moved most of our code into a library that can be imported by all executables and integration tests. Since integration tests run in their own separate environment, they make it possible to test interactions with the hardware or to create tests that should panic.

We now have a test framework that runs in a realistic environment inside QEMU. By creating more tests in future posts, we can keep our kernel maintainable when it becomes more complex.

## What's next?

In the next post, we will explore _CPU exceptions_. These exceptions are thrown by the CPU when something illegal happens, such as a division by zero or an access to an unmapped memory page (a so-called "page fault"). Being able to catch and examine these exceptions is very important for debugging future errors. Exception handling is also very similar to the handling of hardware interrupts, which is required for keyboard support.
================================================
FILE: blog/content/edition-2/posts/04-testing/index.ja.md
================================================
+++
title = "テスト"
weight = 4
path = "ja/testing"
date = 2019-04-27

[extra]
# Please update this when updating the translation
translation_based_on_commit = "e6c148d6f47bcf8a34916393deaeb7e8da2d5e2a"
# GitHub usernames of the people that translated this post
translators = ["swnakamura", "JohnTitor","ic3w1ne"]
+++

This post explores unit testing and integration testing in `no_std` executables. Rust supports custom test frameworks, which we will use to execute test functions inside our kernel. To report the results out of QEMU, we will use various features of QEMU and the `bootimage` tool.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there (translator's note: the link points to the original English version). You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-04` branch][post branch].

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-04

## Before Reading This Post

This post replaces the (older) [_Unit Testing_] and [_Integration Tests_] posts. It assumes that you have followed the [_A Minimal Rust Kernel_] post after 2019-04-27. Mainly, it requires that your `.cargo/config.toml` file [sets a default target] and [defines a runner executable].
**Translator's note:** The [_A Minimal Rust Kernel_] post was translated into Japanese after this date, so if you are reading this site in Japanese, this is not a concern.
[_Unit Testing_]: @/edition-2/posts/deprecated/04-unit-testing/index.md
[_Integration Tests_]: @/edition-2/posts/deprecated/05-integration-tests/index.md
[_A Minimal Rust Kernel_]: @/edition-2/posts/02-minimal-rust-kernel/index.ja.md
[sets a default target]: @/edition-2/posts/02-minimal-rust-kernel/index.ja.md#biao-zhun-notagetutowosetutosuru
[defines a runner executable]: @/edition-2/posts/02-minimal-rust-kernel/index.ja.md#cargo-runwoshi-u

## Testing in Rust

Rust has a [built-in test framework] that can run unit tests without any special setup. Just create a function that checks some results through assertions and add the `#[test]` attribute to the function header. Then `cargo test` will automatically find and execute all test functions of your crate.

[built-in test framework]: https://doc.rust-jp.rs/book-ja/ch11-00-testing.html

To enable testing for our kernel binary, we set the `test` flag in the Cargo.toml to `true`:

```toml
# in Cargo.toml

[[bin]]
name = "blog_os"
test = true
bench = false
```

This [`[[bin]]` section](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#configuring-a-target) specifies how `cargo` should compile our `blog_os` executable. The `test` field specifies whether testing is supported for this executable. We set `test = false` in the first post to [make `rust-analyzer` happy](@/edition-2/posts/01-freestanding-rust-binary/index.md#making-rust-analyzer-happy), but now we want to enable testing, so we set it back to `true`.

Unfortunately, testing is a bit more complicated for `no_std` applications such as our kernel. The problem is that Rust's test framework implicitly uses the built-in [`test`] library, which depends on the standard library. This means that we can't use the default test framework for our `#[no_std]` kernel.

[`test`]: https://doc.rust-lang.org/test/index.html

We can see this when we try to run `cargo test` in our project:

```
> cargo test
   Compiling blog_os v0.1.0 (/…/blog_os)
error[E0463]: can't find crate for `test`
```

Since the `test` crate depends on the standard library, it is not available for our bare-metal target. While porting the `test` crate to a `#[no_std]` context [is possible][utest], it is highly unstable and requires some hacks, such as redefining the `panic` macro.

[utest]: https://github.com/japaric/utest

### Custom Test Frameworks

Fortunately, Rust supports replacing the default test framework through the unstable [`custom_test_frameworks`] feature. This feature requires no external libraries and thus also works in `#[no_std]` environments. It works by collecting all functions annotated with a `#[test_case]` attribute and then invoking a user-specified runner function with the list of tests as an argument. Thus, it gives the implementation of the runner function maximal control over the test process.

[`custom_test_frameworks`]: https://doc.rust-lang.org/unstable-book/language-features/custom-test-frameworks.html

The disadvantage compared to the default test framework is that many advanced features, such as [`should_panic` tests], are not available. Instead, it is up to the implementation to provide such features itself if needed. This is ideal for us since we have a very special execution environment where the default implementations of such advanced features probably wouldn't work anyway. For example, the `#[should_panic]` attribute relies on stack unwinding to catch the panics, which we disabled for our kernel.

[`should_panic` tests]: https://doc.rust-jp.rs/book-ja/ch11-01-writing-tests.html#should_panicでパニックを確認する

To implement a custom test framework for our kernel, we add the following to our `main.rs`:

```rust
// in src/main.rs

#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
}
```

Our runner just prints a short debug message and then calls each test function in the list. The argument type `&[&dyn Fn()]` is a [_slice_] of [_trait object_] references of the [_Fn()_] trait. It is basically a list of references to types that can be called like a function. Since the `test_runner` function is useless for non-test runs, we use the `#[cfg(test)]` attribute to include it only for tests.

[_slice_]: https://doc.rust-lang.org/std/primitive.slice.html
[_trait object_]: https://doc.rust-jp.rs/book-ja/ch17-02-trait-objects.html
[_Fn()_]: https://doc.rust-lang.org/std/ops/trait.Fn.html

When we run `cargo test` now, it should succeed (if it doesn't, see the note below). However, we still see our "Hello World" instead of the message from our `test_runner`. The reason is that our `_start` function is still used as the entry point. The custom test frameworks feature generates a `main` function that calls `test_runner`, but this function is ignored because we use the `#[no_main]` attribute and provide our own entry point.
**Note:** There is currently a bug in cargo that leads to "duplicate lang item" errors on `cargo test` in some cases. It occurs when you have set `panic = "abort"` for a profile in your `Cargo.toml`. Try removing it, then `cargo test` should work. See the [cargo issue](https://github.com/rust-lang/cargo/issues/7359) for more information on this.
To fix this, we first need to change the name of the generated function to something different than `main` through the `reexport_test_harness_main` attribute. Then we can call the renamed function from our `_start` function:

```rust
// in src/main.rs

#![reexport_test_harness_main = "test_main"]

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}
```

We set the name of the test framework entry function to `test_main` and call it from our `_start` entry point. We use [conditional compilation] to add the call to `test_main` only in test contexts because the function is not generated on a normal run.

When we now execute `cargo test`, we see the "Running 0 tests" message from our `test_runner` on the screen. We are now ready to create our first test function:

```rust
// in src/main.rs

#[test_case]
fn trivial_assertion() {
    print!("trivial assertion... ");
    assert_eq!(1, 1);
    println!("[ok]");
}
```

When we run `cargo test` now, we see the following output:

![QEMU printing "Hello World!", "Running 1 tests", and "trivial assertion... [ok]"](qemu-test-runner-output.png)

The `tests` slice passed to our `test_runner` function now contains a reference to the `trivial_assertion` function. From the `trivial assertion... [ok]` output on the screen, we see that the test was called and that it succeeded.

After executing the tests, our `test_runner` returns to the `test_main` function, which in turn returns to our `_start` entry point function. At the end of `_start`, we enter an endless loop because the entry point function is not allowed to return. This is a problem because we want `cargo test` to exit after running all tests.

## Exiting QEMU

Right now, we have an endless loop at the end of our `_start` function and need to close QEMU manually on each execution of `cargo test`. This is unfortunate because we also want to run `cargo test` in scripts without user interaction. The clean solution to this would be to implement a proper way to shut down our OS. Unfortunately, this is relatively complex because it requires implementing support for either the [APM] or [ACPI] power management standard.

[APM]: https://wiki.osdev.org/APM
[ACPI]: https://wiki.osdev.org/ACPI

Luckily, there is an escape hatch: QEMU supports a special `isa-debug-exit` device, which provides an easy way to exit QEMU from the guest system. To enable it, we need to pass a `-device` argument to QEMU. We can do so by adding a `package.metadata.bootimage.test-args` configuration key in our `Cargo.toml`:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = ["-device", "isa-debug-exit,iobase=0xf4,iosize=0x04"]
```

The `bootimage runner` appends the `test-args` to the default QEMU command for all test executables. For a normal `cargo run`, the arguments are ignored.

Together with the device name (`isa-debug-exit`), we pass the two parameters `iobase` and `iosize`, which specify the **I/O port** through which the device can be reached from our kernel.

### I/O Ports

There are two different approaches for communicating between the CPU and peripheral hardware, **memory-mapped I/O** and **port-mapped I/O**. We already used memory-mapped I/O for accessing the [VGA text buffer] through the memory address `0xb8000`. This address is not mapped to RAM but to some memory on the VGA device.

[VGA text buffer]: @/edition-2/posts/03-vga-text-buffer/index.ja.md

In contrast, port-mapped I/O uses a separate I/O bus for communication. Each connected peripheral has one or more port numbers. To communicate with such an I/O port, there are special CPU instructions called `in` and `out`, which take a port number and a data byte (there are also variations of these instructions that allow sending a `u16` or `u32`).

The `isa-debug-exit` device uses port-mapped I/O. The `iobase` parameter specifies on which port address the device should live (`0xf4` is a [generally unused][list of x86 I/O ports] port on the x86 I/O bus) and the `iosize` parameter specifies the port size (`0x04` means four bytes).

[list of x86 I/O ports]: https://wiki.osdev.org/I/O_Ports#The_list

### Using the Exit Device

The functionality of the `isa-debug-exit` device is very simple. When a `value` is written to the I/O port specified by `iobase`, it causes QEMU to exit with [exit status] `(value << 1) | 1`. So when we write `0` to the port, QEMU will exit with exit status `(0 << 1) | 1 = 1`, and when we write `1` to the port, it will exit with exit status `(1 << 1) | 1 = 3`.

[exit status]: https://ja.wikipedia.org/wiki/終了ステータス

Instead of manually invoking the `in` and `out` assembly instructions, we use the abstractions provided by the [`x86_64`] crate. To add a dependency on that crate, we add it to the `dependencies` section in our `Cargo.toml`:

[`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/

```toml
# in Cargo.toml

[dependencies]
x86_64 = "0.14.2"
```

Now we can use the [`Port`] type provided by the crate to create an `exit_qemu` function:

[`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html

```rust
// in src/main.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

pub fn exit_qemu(exit_code: QemuExitCode) {
    use x86_64::instructions::port::Port;

    unsafe {
        let mut port = Port::new(0xf4);
        port.write(exit_code as u32);
    }
}
```
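The exit status arithmetic can be double-checked on the host. This is a small sketch in plain `std` Rust; `qemu_exit_status` is a hypothetical helper that merely mirrors the `(value << 1) | 1` formula described above and is not part of QEMU, `bootimage`, or the kernel:

```rust
/// Exit status that QEMU reports after `value` is written to the
/// `isa-debug-exit` port, following the `(value << 1) | 1` rule from the text.
pub fn qemu_exit_status(value: u32) -> u32 {
    (value << 1) | 1
}

fn main() {
    // 0 and 1 are the examples from the text; 0x10 and 0x11 are the
    // `QemuExitCode::Success` and `QemuExitCode::Failed` values.
    for value in [0u32, 1, 0x10, 0x11] {
        println!("{:#x} -> {}", value, qemu_exit_status(value));
    }
}
```

With the values used in this post, `0x10` maps to exit status 33 and `0x11` maps to exit status 35, so neither collides with the status 1 that results from writing `0`.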
The `exit_qemu` function creates a new [`Port`] at `0xf4`, which is the `iobase` of the `isa-debug-exit` device. Then it writes the passed exit code to the port. We use `u32` because we specified the `iosize` of the `isa-debug-exit` device as 4 bytes. Both operations are unsafe because writing to an I/O port can generally result in arbitrary behavior.

To specify the exit status, we create a `QemuExitCode` enum. The idea is to exit with the `Success` exit code if all tests succeeded and with the `Failed` exit code otherwise. The enum is marked as `#[repr(u32)]` to represent each variant as a `u32` integer. We use the exit code `0x10` for success and `0x11` for failure. The actual exit codes don't matter much, as long as they don't clash with the default exit codes of QEMU. For example, using exit code `0` for success is not a good idea because it becomes `(0 << 1) | 1 = 1` after the transformation, which is the default exit code when QEMU fails to run. So we could not differentiate a QEMU error from a successful test run.

We can now update our `test_runner` to exit QEMU after all tests have run:

```rust
fn test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
    // new
    exit_qemu(QemuExitCode::Success);
}
```

When we run `cargo test` now, we see that QEMU closes immediately after executing the tests. The problem is that `cargo test` interprets the test as failed even though we passed our `Success` exit code:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running target/x86_64-blog_os/debug/deps/blog_os-5804fc7d2dd4c9be
Building bootloader
   Compiling bootloader v0.5.3 (/home/philipp/Documents/bootloader)
    Finished release [optimized + debuginfo] target(s) in 1.07s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-5804fc7d2dd4c9be.bin -device isa-debug-exit,iobase=0xf4,
    iosize=0x04`
error: test failed, to rerun pass '--bin blog_os'
```

The problem is that `cargo test` considers all exit codes other than `0` as failure.

### Success Exit Code

To work around this, `bootimage` provides a `test-success-exit-code` configuration key that maps a specified exit code to the exit code `0`:

```toml
[package.metadata.bootimage]
test-args = […]
test-success-exit-code = 33         # (0x10 << 1) | 1
```

With this configuration, `bootimage` maps our success exit code to exit code 0, so that `cargo test` correctly recognizes the success case and does not count the test as failed.

Our test runner now automatically closes QEMU and reports the test results. We still see the QEMU window open for a very short time, but it is too short to read the results. It would be nice if we could print the test results to the console instead, so that we can still see them after QEMU exits.

## Printing to the Console
To see the test output on the console, we need to send the data from our kernel to the host system somehow. There are various ways to achieve this, for example by sending the data over a TCP network interface. However, setting up a networking stack is quite a complex task, so we will choose a simpler solution instead.

### Serial Port

A simple way to send the data is to use the [serial port], an old interface standard that is no longer found in modern computers. It is easy to program, and QEMU can redirect the data sent over serial to the host's standard output or a file.

[serial port]: https://ja.wikipedia.org/wiki/シリアルポート

The chips implementing a serial interface are called [UARTs]. There are [lots of UART models] on x86, but fortunately the only differences between them are some advanced features we don't need. The common UARTs today are all compatible with the [16550 UART], so we will use that model for our testing framework.

[UARTs]: https://ja.wikipedia.org/wiki/UART
[lots of UART models]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter#Models
[16550 UART]: https://ja.wikipedia.org/wiki/16550_UART

We will use the [`uart_16550`] crate to initialize the UART and send data over the serial port. To add it as a dependency, we update our `Cargo.toml` and `main.rs`:

[`uart_16550`]: https://docs.rs/uart_16550

```toml
# in Cargo.toml

[dependencies]
uart_16550 = "0.2.0"
```

The `uart_16550` crate contains a `SerialPort` struct that represents the UART registers, but we still need to construct an instance of it ourselves. For that, we create a new `serial` module with the following content:

```rust
// in src/main.rs

mod serial;
```

```rust
// in src/serial.rs

use uart_16550::SerialPort;
use spin::Mutex;
use lazy_static::lazy_static;

lazy_static!
{
    pub static ref SERIAL1: Mutex<SerialPort> = {
        let mut serial_port = unsafe { SerialPort::new(0x3F8) };
        serial_port.init();
        Mutex::new(serial_port)
    };
}
```

Like with the [VGA text buffer][vga lazy-static], we use `lazy_static` and a spinlock to create a `static` writer instance. By using `lazy_static`, we can ensure that the `init` method is called exactly once, on first use.

Like the `isa-debug-exit` device, the UART is programmed using port I/O. Since the UART is more complex, it uses multiple I/O ports for programming different device registers. The unsafe `SerialPort::new` function expects the address of the first I/O port of the UART as an argument, from which it can calculate the addresses of all needed ports. We pass the port address `0x3F8`, which is the standard port number for the first serial interface.

[vga lazy-static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics

To make the serial port easily usable, we add `serial_print!` and `serial_println!` macros:

```rust
#[doc(hidden)]
pub fn _print(args: ::core::fmt::Arguments) {
    use core::fmt::Write;
    SERIAL1.lock().write_fmt(args).expect("Printing to serial failed");
}

/// Prints to the host through the serial interface.
#[macro_export]
macro_rules! serial_print {
    ($($arg:tt)*) => {
        $crate::serial::_print(format_args!($($arg)*));
    };
}

/// Prints to the host through the serial interface, appending a newline.
#[macro_export]
macro_rules! serial_println {
    () => ($crate::serial_print!("\n"));
    ($fmt:expr) => ($crate::serial_print!(concat!($fmt, "\n")));
    ($fmt:expr, $($arg:tt)*) => ($crate::serial_print!(
        concat!($fmt, "\n"), $($arg)*));
}
```

The implementation is very similar to the implementation of our `print` and `println` macros. Since the `SerialPort` type already implements the [`fmt::Write`] trait, we don't need to provide our own implementation.

[`fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

Now we can print to the serial interface instead of the VGA text buffer in our test code:

```rust
// in src/main.rs

#[cfg(test)]
fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    […]
}

#[test_case]
fn trivial_assertion() {
    serial_print!("trivial assertion... 
");
    assert_eq!(1, 1);
    serial_println!("[ok]");
}
```

Note that the `serial_println` macro lives directly under the root namespace because we used the `#[macro_export]` attribute, so importing it through `use crate::serial::serial_println` will not work.

### QEMU Arguments

To see the serial output from QEMU, we need to use the `-serial` argument to redirect the output to stdout:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = [
    "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio"
]
```

When we run `cargo test` now, we see the test output directly in the console:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a
Building bootloader
    Finished release [optimized + debuginfo] target(s) in 0.02s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device
    isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio`
Running 1 tests
trivial assertion... [ok]
```

However, when a test fails, we still see the output inside QEMU because our panic handler still uses `println`. To simulate this, we can change the assertion in our `trivial_assertion` test to `assert_eq!(0, 1)`:

![QEMU printing "Hello World!" and "panicked at 'assertion failed: `(left == right)` left: `0`, right: `1`', src/main.rs:55:5](qemu-failed-test.png)

We see that the panic message is still printed to the VGA buffer, while the other test output is printed to the serial port. The panic message is quite useful, so it would be convenient to see it in the console too.

### Print an Error Message on Panic

To exit QEMU with an error message on a panic, we can use [conditional compilation] to use a different panic handler in testing mode:

[conditional compilation]: https://doc.rust-lang.org/1.30.0/book/first-edition/conditional-compilation.html

```rust
// our existing panic handler
#[cfg(not(test))] // new attribute
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}

// our panic handler in test mode
#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> !
{
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}
```

For our test panic handler, we use `serial_println` instead of `println` and then exit QEMU with a failure exit code. Note that we still need an endless `loop` after the `exit_qemu` call because the compiler does not know that the `isa-debug-exit` device causes a program exit.

Now QEMU also exits for failed tests and prints a useful error message on the console:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a
Building bootloader
    Finished release [optimized + debuginfo] target(s) in 0.02s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device
    isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio`
Running 1 tests
trivial assertion... [failed]

Error: panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/main.rs:65:5
```

Since we see all test output on the console now, we no longer need the QEMU window that pops up for a short time. So let's hide it completely.

### Hiding QEMU

Since we report the complete test results using the `isa-debug-exit` device and the serial port, we don't need the QEMU window anymore. We can hide it by passing the `-display none` argument to QEMU:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = [
    "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio",
    "-display", "none"
]
```

Now QEMU runs completely in the background and no window is opened anymore. This is not only less annoying, but also allows our test framework to run in environments without a graphical user interface, such as CI services or [SSH] connections.

[SSH]: https://ja.wikipedia.org/wiki/Secure_Shell

### Timeouts

Since `cargo test` waits until the test runner exits, a test that never returns can block the test runner forever. That's unfortunate, but not a big problem in practice since it's usually easy to avoid endless loops. In our case, however, endless loops can occur in various situations:

- The bootloader fails to load our kernel, which causes the system to reboot endlessly.
- The BIOS/UEFI firmware fails to load the bootloader, which causes the same endless rebooting.
- The CPU enters a `loop {}` statement at the end of one of our functions, for example because the QEMU exit device doesn't work properly.
- The hardware causes a system reset, for example when a CPU exception (explained in a future post) is not caught.

Since endless loops can occur in so many situations, the `bootimage` tool sets a timeout of 5 minutes for each test executable by default. If the test does not finish within this time, it is marked as failed and a "Timed Out"
error is printed to the console. This feature ensures that tests stuck in an endless loop don't block `cargo test` forever.

You can try it yourself by adding a `loop {}` statement to the `trivial_assertion` test. When you run `cargo test`, you see that the test is marked as timed out after 5 minutes. The timeout duration is [configurable][bootimage config] through a `test-timeout` key in the `Cargo.toml`:

[bootimage config]: https://github.com/rust-osdev/bootimage#configuration

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-timeout = 300          # (in seconds)
```

If you don't want to wait for the `trivial_assertion` test to time out, you can temporarily decrease the above value.

### Insert Printing Automatically

Our `trivial_assertion` test currently needs to print its own status information using `serial_print!`/`serial_println!`:

```rust
#[test_case]
fn trivial_assertion() {
    serial_print!("trivial assertion... ");
    assert_eq!(1, 1);
    serial_println!("[ok]");
}
```

Manually adding these print statements to every test we write is cumbersome, so let's update our `test_runner` to print these messages automatically. To do that, we need to create a new `Testable` trait:

```rust
// in src/main.rs

pub trait Testable {
    fn run(&self) -> ();
}
```

The trick now is to implement this trait for all types `T` that implement the [`Fn()` trait]:

[`Fn()` trait]: https://doc.rust-lang.org/stable/core/ops/trait.Fn.html

```rust
// in src/main.rs

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}
```

We implement the `run` function by first printing the function name using the [`any::type_name`] function. This function is implemented directly in the compiler and returns a string description of every type. For functions, the type is their name, so this is exactly what we want in this case. The `\t` character is the [tab character], which adds some alignment before the `[ok]` messages.

[`any::type_name`]: https://doc.rust-lang.org/stable/core/any/fn.type_name.html
[tab character]: https://ja.wikipedia.org/wiki/タブキー#タブ文字

After printing the function name, we invoke the test function through `self()`. This only works because we require that `self` implements the `Fn()` trait. After the test function returns, we print `[ok]` to indicate that the function did not panic.

The last step is to update our `test_runner` to use the new `Testable` trait:

```rust
// in src/main.rs

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Testable]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run(); // new
    }
    exit_qemu(QemuExitCode::Success);
}
```
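
To see the blanket implementation in isolation, the following is a minimal host-side sketch that uses `print!`/`println!` from `std` in place of our serial macros. The extra `name` method and the `another_test` function are assumptions added only to make the result easy to inspect; they are not part of the kernel code.

```rust
use core::any::type_name;

pub trait Testable {
    /// Hypothetical helper: the name derived from the test function's type.
    fn name(&self) -> &'static str;
    fn run(&self);
}

// Blanket impl: every type that can be called like `fn()` is `Testable`.
impl<T> Testable for T
where
    T: Fn(),
{
    fn name(&self) -> &'static str {
        type_name::<T>()
    }

    fn run(&self) {
        print!("{}...\t", type_name::<T>());
        self(); // call the test function itself
        println!("[ok]"); // only reached if the test did not panic
    }
}

fn trivial_assertion() {
    assert_eq!(1, 1);
}

fn another_test() {
    assert!(2 > 1);
}

fn main() {
    // Each function item has its own type; both coerce to `&dyn Testable`.
    let tests: &[&dyn Testable] = &[&trivial_assertion, &another_test];
    println!("Running {} tests", tests.len());
    for test in tests {
        test.run();
    }
}
```

Because every function item has its own unique type, `type_name::<T>()` yields a different string for each test, and that is what makes the automatic labelling possible.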
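Only two things changed in `test_runner`: the type of the `tests` argument went from `&[&dyn Fn()]` to `&[&dyn Testable]`, and we now call `test.run()` instead of `test()`.

We can now remove the print statements from our `trivial_assertion` test since they're printed automatically:

```rust
// in src/main.rs

#[test_case]
fn trivial_assertion() {
    assert_eq!(1, 1);
}
```

The `cargo test` output now looks like this:

```
Running 1 tests
blog_os::trivial_assertion... [ok]
```

The function name now includes the full path to the function, which is useful when test functions in different modules have the same name. Otherwise, the output looks the same as before, but we no longer need to add print statements to our tests manually.

## Testing the VGA Buffer

Now that we have a working test framework, we can create a few tests for our VGA buffer implementation. First, we create a very simple test to verify that `println` works without panicking:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_simple() {
    println!("test_println_simple output");
}
```

The test just prints something to the VGA buffer. If it finishes without panicking, it means that the `println` invocation did not panic either.

To ensure that no panic occurs even if many lines are printed and lines are shifted off the screen, we can create another test:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_many() {
    for _ in 0..200 {
        println!("test_println_many output");
    }
}
```

We can also create a test function to verify that the printed lines really appear on the screen:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    let s = "Some test string that fits on a single line";
    println!("{}", s);
    for (i, c) in s.chars().enumerate() {
        let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read();
        assert_eq!(char::from(screen_char.ascii_character), c);
    }
}
```

The function defines a test string, prints it using `println`, and then iterates over the screen characters of the static `WRITER`, which represents the VGA text buffer. Since `println` prints to the last screen line and then immediately appends a newline, the string should appear on line `BUFFER_HEIGHT - 2`.

By using [`enumerate`], we count the number of iterations in the variable `i`, which we then use for loading the screen character corresponding to `c`. By comparing the `ascii_character` of the screen character with `c`, we ensure that each character of the string really appears in the VGA text buffer.

[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate

As you can imagine, we could create many more test functions. For example, a function that tests that very long lines are wrapped correctly without a panic, or a function that tests that newlines, non-printable characters, and non-unicode characters are handled correctly.

For the rest of this post, however, we will explain how to create **integration tests** to test the interaction of different components.

## Integration Tests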
The convention for [integration tests] in Rust is to put them into a `tests` directory in the project root (i.e., next to the `src` directory). Both the default test framework and custom test frameworks automatically pick up and execute all tests in that directory.

[integration tests]: https://doc.rust-jp.rs/book-ja/ch11-03-test-organization.html#結合テスト

All integration tests are their own executables and completely separate from our `main.rs`. This means that each test needs to define its own entry point function. Let's create an example integration test named `basic_boot` to see how it works in detail:

```rust
// in tests/basic_boot.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    test_main();

    loop {}
}

fn test_runner(tests: &[&dyn Fn()]) {
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    loop {}
}
```

Since integration tests are separate executables, we need to provide all the crate attributes (`no_std`, `no_main`, `test_runner`, etc.) again. We also need to create a new entry point function `_start`, which calls the test entry point function `test_main`. We don't need any `cfg(test)` attributes because integration test executables are never built in non-test mode.

For now, we use the [`unimplemented`] macro, which always panics, as a placeholder for the body of the `test_runner` function, and just `loop` in the `panic` handler. Ideally, we want to implement these functions exactly as we did in our `main.rs`, using the `serial_println` macro and the `exit_qemu` function. The problem is that we don't have access to these functions since tests are built completely separately from our `main.rs` executable.

[`unimplemented`]: https://doc.rust-lang.org/core/macro.unimplemented.html

If you run `cargo test` at this stage, you will get an endless loop because of the panic handler. You need to use the `Ctrl+c` keyboard shortcut to exit QEMU.

### Create a Library

To make the required functions available to our integration test, we need to split off a library from our `main.rs`, which can then be included by other crates and integration test executables. To do this, we create a new `src/lib.rs` file:

```rust
// src/lib.rs

#![no_std]
```

Like the `main.rs`, the `lib.rs` is a special file that is automatically recognized by cargo. The library is a separate compilation unit, so we need to specify the `#![no_std]` attribute again.

To make our library work with `cargo test`, we also need to move the test functions and attributes from `main.rs` to `lib.rs`:

```rust
// in src/lib.rs

#![cfg_attr(test, no_main)]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main
= "test_main"]

use core::panic::PanicInfo;

pub trait Testable {
    fn run(&self) -> ();
}

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}

pub fn test_runner(tests: &[&dyn Testable]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run();
    }
    exit_qemu(QemuExitCode::Success);
}

pub fn test_panic_handler(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    test_main();
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    test_panic_handler(info)
}
```

To make our `test_runner` available to both the executable (`main.rs`) and integration tests, we make it public and don't apply the `cfg(test)` attribute to it. We also factor out the implementation of our panic handler into a public `test_panic_handler` function, so that it is available to executables too.

Since our `lib.rs` is compiled independently of our `main.rs`, we need to add a `_start` entry point and a panic handler when the library is compiled in test mode. By using the [`cfg_attr`] crate attribute, we conditionally enable the `no_main` attribute in this case.

[`cfg_attr`]: https://doc.rust-lang.org/reference/conditional-compilation.html#the-cfg_attr-attribute

We also move over the `QemuExitCode` enum and the `exit_qemu` function and make them public:

```rust
// in src/lib.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

pub fn exit_qemu(exit_code: QemuExitCode) {
    use x86_64::instructions::port::Port;

    unsafe {
        let mut port = Port::new(0xf4);
        port.write(exit_code as u32);
    }
}
```

Now executables and integration tests can import these functions from the library and don't need to define their own implementations. To also make `println` and `serial_println` available, we move the module declarations too:

```rust
// in src/lib.rs

pub mod serial;
pub mod vga_buffer;
```

We make the modules public so that they are usable outside of our library. This is also required for making our `println` and `serial_println` macros usable since they use the `_print` functions of these modules.

Now we can update our `main.rs` to use the library:

```rust
// src/main.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(blog_os::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;
use blog_os::println;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}

/// This function is called on panic.
#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

The library is usable like a normal external crate. It has the same name as our crate, which is `blog_os` in our case. The above code uses the `blog_os::test_runner` function in the `test_runner` attribute and the `blog_os::test_panic_handler` function in our `cfg(test)` panic handler. It also imports the `println` macro to make it available to our `_start` and `panic` functions.

At this point, `cargo run` and `cargo test` should work again. Of course, `cargo test` still loops endlessly (you can exit with `ctrl+c`). Let's fix this by using the required library functions in our integration test.

### Completing the Integration Test

Like our `src/main.rs`, our `tests/basic_boot.rs` executable can import types from our new library. This allows us to import the missing components to complete our test:

```rust
// in tests/basic_boot.rs

#![test_runner(blog_os::test_runner)]

#[panic_handler]
fn panic(info: &PanicInfo) -> !
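{
    blog_os::test_panic_handler(info)
}
```

Instead of reimplementing the test runner, we use the `test_runner` function from our library. For our `panic` handler, we call the `blog_os::test_panic_handler` function like we did in our `main.rs`.

Now `cargo test` exits normally again. When you run it, you will see that it builds and runs the tests for our `lib.rs`, `main.rs`, and `basic_boot.rs` one after another. For the `main.rs` and the `basic_boot` integration test, it reports "Running 0 tests" since these files don't have any functions annotated with `#[test_case]`.

We can now add tests to our `basic_boot.rs`. For example, we can test that `println` works without panicking, like we did in the VGA buffer tests:

```rust
// in tests/basic_boot.rs

use blog_os::println;

#[test_case]
fn test_println() {
    println!("test_println output");
}
```

When we run `cargo test` now, we see that it finds and executes the test function.

The test might seem a bit useless right now since it's almost identical to one of the VGA buffer tests. However, in the future, the `_start` functions of our `main.rs` and `lib.rs` might grow and call various initialization routines before running the `test_main` function, so that the two tests are executed in very different environments.

By testing `println` in the `basic_boot` environment without calling any initialization routines in `_start`, we can ensure that `println` works right after booting. This is important because we rely on it, for example, for printing panic messages.

### Future Tests

The power of integration tests is that they're treated as completely separate executables. This gives them complete control over the environment, which makes it possible to test that the code interacts correctly with the CPU or hardware devices.

Our `basic_boot` test is a very simple example of an integration test. In the future, our kernel will gain many more features and interact with the hardware in various ways. By adding integration tests, we can ensure that these interactions work (and keep working) as expected. Some ideas for possible future tests are:

- **CPU Exceptions**: When the code performs invalid operations (e.g., divides by zero), the CPU throws an exception. The kernel can register handler functions for such exceptions. An integration test could verify that the exception handler is called when a CPU exception occurs, or that the execution continues after a resolvable exception.
- **Page Tables**: Page tables define which memory regions are valid and accessible. By modifying the page tables, it is possible to allocate new memory regions, for example when launching programs. An integration test could modify the page tables in the `_start` function and verify in `#[test_case]` functions that the modifications have the desired effects.
- **Userspace Programs**: Userspace programs are programs with limited access to the system's resources. For example, they don't have access to kernel data structures or to the memory of other programs. An integration test could launch userspace programs that perform forbidden operations and verify that the kernel prevents them all.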
As you can imagine, many more tests are possible. By adding such tests, we can ensure that we don't break things accidentally when we add new features to our kernel or refactor our code. This is especially important when our kernel becomes larger and more complex.

### Tests that Should Panic

The test framework of the standard library supports a [`#[should_panic]` attribute][should_panic] that allows constructing tests that should fail. This is useful, for example, to verify that a function fails when an invalid argument is passed. Unfortunately, this attribute isn't supported in `#[no_std]` crates since it requires support from the standard library.

[should_panic]: https://doc.rust-jp.rs/rust-by-example-ja/testing/unit_testing.html#パニックをテストする

While we can't use the `#[should_panic]` attribute, we can get similar behavior by creating an integration test that exits with a success exit code from the panic handler. Let's start creating such a test with the name `should_panic`:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{QemuExitCode, exit_qemu, serial_println};

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

This test is still incomplete as it doesn't define a `_start` function or any of the custom test runner attributes yet. Let's add the missing parts:

```rust
// in tests/should_panic.rs

#![feature(custom_test_frameworks)]
#![test_runner(test_runner)]
#![reexport_test_harness_main = "test_main"]

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
{
    test_main();

    loop {}
}

pub fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test();
        serial_println!("[test did not panic]");
        exit_qemu(QemuExitCode::Failed);
    }
    exit_qemu(QemuExitCode::Success);
}
```

Instead of reusing the `test_runner` from our `lib.rs`, the test defines its own `test_runner` function that exits with a failure exit code when a test returns without panicking (we want our tests to panic). If no test function is defined, the runner exits with a success exit code. Since the runner always exits after running a single test, it does not make sense to define more than one `#[test_case]` function.

Now we can create a test that should fail:

```rust
// in tests/should_panic.rs

use blog_os::serial_print;

#[test_case]
fn should_fail() {
    serial_print!("should_panic::should_fail...\t");
    assert_eq!(0, 1);
}
```

The test uses `assert_eq` to assert that `0` and `1` are equal. This of course fails, so our test panics as desired. Note that we need to print the function name manually using `serial_print!` here because we don't use the `Testable` trait.

When we run the test through `cargo test --test should_panic`, we see that it is successful because the test panicked as expected. When we comment out the assertion and run the test again, we see that it indeed fails with the "test did not panic" message.

A significant drawback of this approach is that it only works for a single test function. With multiple `#[test_case]` functions, only the first function is executed because execution cannot continue after the panic handler has been called. I currently don't know of a good way to solve this problem, so let me know if you have an idea!

### No Harness Tests
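**Translator's note:** A harness is originally a piece of horse tack. By extension, the word refers to a "controlling tool" in general, and a [test harness](https://en.wikipedia.org/wiki/Test_harness) is a program that (like our `test_runner`) processes multiple test cases and takes care of handling and formatting their behavior and output.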
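For integration tests that only have a single test function (like our `should_panic` test), the test runner isn't really needed. For cases like this, we can disable the test runner completely and run our test directly in the `_start` function.

The key to this is disabling the `harness` flag for the test in the `Cargo.toml`, which defines whether a test runner is used for an integration test. When it's set to `false`, both the default test runner and the custom test runner are disabled, so that the test is treated like a normal executable.

Let's disable the `harness` flag for our `should_panic` test:

```toml
# in Cargo.toml

[[test]]
name = "should_panic"
harness = false
```

Now we can vastly simplify our `should_panic` test by removing the code related to the test runner. The result looks like this:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{exit_qemu, serial_print, serial_println, QemuExitCode};

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    should_fail();
    serial_println!("[test did not panic]");
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

fn should_fail() {
    serial_print!("should_panic::should_fail...\t");
    assert_eq!(0, 1);
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

We now call the `should_fail` function directly from our `_start` function and exit with a failure exit code if it returns. When we run `cargo test --test should_panic` now, we see that the test behaves exactly as before.

Apart from creating `should_panic` tests, disabling the `harness` attribute can also be useful in other cases, for example when the individual test functions have side effects and need to be run in a specified order.

## Summary

Testing is a very useful technique to ensure that certain components have the desired behavior. Even though tests cannot prove the absence of bugs, they're still a useful tool for finding them and especially for avoiding regressions.

This post explained how to set up a test framework for our Rust kernel. We used Rust's custom test frameworks feature to implement support for a simple `#[test_case]` attribute in our bare-metal environment. Using the `isa-debug-exit` device of QEMU, our test runner can exit QEMU after running the tests and report the test status. To print error messages to the console instead of the VGA buffer, we created a basic driver for the serial port.

After creating some tests for our `println` macro, we explored integration tests in the second half of the post. We learned that they live in the `tests` directory and are treated as completely separate executables. To give them access to the `exit_qemu` function and the `serial_println` macro, we moved most of our code into a library that can be imported by all executables and integration tests. Since integration tests run in their own separate environment, they make it possible to test interactions with the hardware or to create tests that should panic.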
We now have a test framework that runs in a realistic environment inside QEMU. By creating more tests in future posts, we can keep our kernel maintainable as it becomes more complex.

## What's next?

In the next post, we will explore **CPU exceptions**. These exceptions are thrown by the CPU when something illegal happens, such as a division by zero or an access to an unmapped memory page (a so-called "page fault"). Being able to catch and examine these exceptions is very important for debugging future errors. Exception handling is also very similar to the handling of hardware interrupts, which is required for keyboard support.

================================================
FILE: blog/content/edition-2/posts/04-testing/index.ko.md
================================================
+++
title = "Writing and Running Tests for Our Kernel"
weight = 4
path = "ko/testing"
date = 2019-04-27

[extra]
# Please update this when updating the translation
translation_based_on_commit = "1c9b5edd6a5a667e282ca56d6103d3ff1fd7cfcb"
# GitHub usernames of the people that translated this post
translators = ["JOE1994"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = ["SNOOPYOF", "dalinaum"]
+++

This post explores unit and integration testing in `no_std` executables. We will use Rust's support for custom test frameworks to execute test functions inside our kernel. To report the results out of QEMU, we will use different features of QEMU and the `bootimage` tool.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-04`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-04

## Requirements

This post replaces the earlier [_Unit Testing_] and [_Integration Tests_] posts (both are outdated and no longer valid). It assumes that you have followed the [_A Minimal Rust Kernel_] post after 2019-04-27. You need to have the `.cargo/config.toml` file from that post, which [sets a default target] and [defines a runner executable].
[_Unit Testing_]: @/edition-2/posts/deprecated/04-unit-testing/index.md
[_Integration Tests_]: @/edition-2/posts/deprecated/05-integration-tests/index.md
[_A Minimal Rust Kernel_]: @/edition-2/posts/02-minimal-rust-kernel/index.md
[sets a default target]: @/edition-2/posts/02-minimal-rust-kernel/index.md#set-a-default-target
[defines a runner executable]: @/edition-2/posts/02-minimal-rust-kernel/index.md#using-cargo-run

## Testing in Rust

Rust has a [built-in test framework] that can run unit tests without any complicated setup. Just insert some assertion checks into a function and add the `#[test]` attribute right before the function declaration. Then `cargo test` will automatically find and execute all test functions of your crate.

[built-in test framework]: https://doc.rust-lang.org/book/ch11-00-testing.html

Unfortunately, this is harder for programs that run in a `no_std` environment, such as our kernel. Rust's test framework implicitly uses the built-in [`test`] library, which depends on the standard library. This means that we can't use the default test framework for our `#[no_std]` kernel.

[`test`]: https://doc.rust-lang.org/test/index.html

We can see this when we run `cargo test` in our project directory:

```
> cargo test
   Compiling blog_os v0.1.0 (/…/blog_os)
error[E0463]: can't find crate for `test`
```

Since the `test` crate depends on the standard library, it is not available for our bare metal target. While porting the `test` crate to a `#[no_std]` context [is not impossible][utest], its implementation changes often, which makes a port unstable, and porting requires a number of workarounds such as redefining the `panic` macro.

[utest]: https://github.com/japaric/utest

### Custom Test Frameworks

Fortunately, Rust supports replacing the default test framework through the [`custom_test_frameworks`] feature. This feature requires no external libraries and thus also works in `#[no_std]` environments. It works by collecting all functions annotated with a `#[test_case]` attribute into a list and then passing that list to a user-written runner function. Thus, the runner function has full control over the test process.

[`custom_test_frameworks`]: https://doc.rust-lang.org/unstable-book/language-features/custom-test-frameworks.html

The disadvantage compared to the default test framework is that advanced features, such as [`should_panic` tests], are not available.
The advanced features of Rust's default test framework are not supported in a bare metal environment, so we have to implement the ones we need ourselves. For example, the `#[should_panic]` attribute relies on stack unwinding to catch panics, which we disabled for our kernel.

[`should_panic` tests]: https://doc.rust-lang.org/book/ch11-01-writing-tests.html#checking-for-panics-with-should_panic

As a first step toward a test framework for our kernel, we add the following to our `main.rs`:

```rust
// in src/main.rs

#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]

#[cfg(test)]
fn test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
}
```

Our `test_runner` prints a short debug message and then calls each test function in the given list. The argument type `&[&dyn Fn()]` is a [_slice_] of references to types that implement the [_Fn()_] trait. Put more simply, it is a list of references to types that can be called like a function. Since the `test_runner` function is useless outside of tests, we apply the `#[cfg(test)]` attribute so that it is only built for tests.

[_slice_]: https://doc.rust-lang.org/std/primitive.slice.html
[_trait object_]: https://doc.rust-lang.org/1.30.0/book/first-edition/trait-objects.html
[_Fn()_]: https://doc.rust-lang.org/std/ops/trait.Fn.html

When we run `cargo test` again, it should now succeed (if it doesn't, see the note below). However, we only see our "Hello World" message and not the message from our `test_runner`. The reason is that our `_start` function is still used as the entry point. Because we use the `#[no_main]` attribute to provide our own entry point, the `main` function that the custom test frameworks feature generates to call `test_runner` is never used.
**Note:** There is a bug that causes "duplicate lang item" errors on `cargo test` in some cases. It can occur when `panic = "abort"` is set in the `Cargo.toml`. Removing that setting should make `cargo test` work. See the [GitHub issue for this bug](https://github.com/rust-lang/cargo/issues/7359) for more information.
To fix this, we first need to change the name of the function generated by the test framework to something other than `main` through the `reexport_test_harness_main` attribute. Then we can call the renamed function from our `_start` function:

```rust
// in src/main.rs

#![reexport_test_harness_main = "test_main"]

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}
```

We set the name of the test framework entry function to `test_main` and call it from our `_start` entry point. Because the `test_main` function is not generated outside of test runs, we use [conditional compilation] to call it only in test contexts.

When we now execute `cargo test`, we see the "Running 0 tests" message. We are now ready to create our first test function:

```rust
// in src/main.rs

#[test_case]
fn trivial_assertion() {
    print!("trivial assertion... ");
    assert_eq!(1, 1);
    println!("[ok]");
}
```

When we run `cargo test` again after adding this test function, we see the following output:

![QEMU printing "Hello World!", "Running 1 tests", and "trivial assertion... [ok]"](qemu-test-runner-output.png)

The `tests` slice passed to our `test_runner` function now contains a reference to the `trivial_assertion` function. From the `trivial assertion... [ok]` output, we see that the test ran successfully.

After executing the tests, our `test_runner` returns to the `test_main` function, which in turn returns to our `_start` function. Since the entry point function is not allowed to return, we enter an endless loop at the end of `_start`. This is a problem because we want `cargo test` to exit after running all tests.

## Exiting QEMU

Because of the endless loop at the end of our `_start` function, we need to close QEMU manually to end each execution of `cargo test`. This prevents us from using `cargo test` in scripts without user interaction. The most straightforward solution would be to implement a proper way to shut down our OS. However, this is fairly complex because it requires the kernel to support either the [APM] or [ACPI] power management standard.

[APM]: https://wiki.osdev.org/APM
[ACPI]: https://wiki.osdev.org/ACPI

Luckily, there is an escape hatch: QEMU supports an `isa-debug-exit` device, which provides an easy way to exit QEMU from the guest system. To enable it, we need to pass a `-device` argument to QEMU.
We do so by adding a `package.metadata.bootimage.test-args` configuration key to our `Cargo.toml`, which passes the `-device` argument to QEMU:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = ["-device", "isa-debug-exit,iobase=0xf4,iosize=0x04"]
```

The `bootimage runner` appends the `test-args` to the end of the QEMU command when it runs a test executable. For a normal `cargo run`, the `test-args` are not appended.

Together with the device name (`isa-debug-exit`), we pass the two parameters `iobase` and `iosize`, which specify the _I/O port_ through which our kernel can reach the `isa-debug-exit` device.

### I/O Ports

There are two different approaches for communicating between the CPU and peripheral hardware on x86: **memory-mapped I/O** and **port-mapped I/O**. We already used memory-mapped I/O when we accessed the [VGA text buffer] through the memory address `0xb8000`. This address is not mapped to RAM but to memory on the VGA device.

[VGA text buffer]: @/edition-2/posts/03-vga-text-buffer/index.md

In contrast, port-mapped I/O uses a separate I/O bus for communication. Each connected peripheral has one or more port numbers. The CPU instructions `in` and `out` take a port number and a data byte, and the CPU uses them to exchange data with such an I/O port (there are also variations of these instructions that transfer a `u16` or `u32`).

The `isa-debug-exit` device uses port-mapped I/O. The `iobase` parameter specifies on which port the device should live (`0xf4` is a [generally unused][list of x86 I/O ports] port on the x86 I/O bus). The `iosize` parameter specifies the port size (`0x04` means four bytes).

[list of x86 I/O ports]: https://wiki.osdev.org/I/O_Ports#The_list

### Using the Exit Device

The functionality of the `isa-debug-exit` device is very simple. When a `value` is written to the I/O port specified by `iobase`, it causes QEMU to exit with [exit status] `(value << 1) | 1`. So when we write `0` to the port, QEMU exits with exit status `(0 << 1) | 1 = 1`, and when we write `1` to the port, it exits with exit status `(1 << 1) | 1 = 3`.

[exit status]: https://en.wikipedia.org/wiki/Exit_status

Instead of writing assembly code that invokes the `in` and `out` instructions ourselves, we use the abstractions provided by the [`x86_64`] crate.
We add the `x86_64` crate to the `dependencies` section in our `Cargo.toml`:

[`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/

```toml
# in Cargo.toml

[dependencies]
x86_64 = "0.14.2"
```

Now we can use the [`Port`] type provided by the `x86_64` crate to create an `exit_qemu` function:

[`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html

```rust
// in src/main.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

pub fn exit_qemu(exit_code: QemuExitCode) {
    use x86_64::instructions::port::Port;

    unsafe {
        let mut port = Port::new(0xf4);
        port.write(exit_code as u32);
    }
}
```

The `exit_qemu` function creates a new [`Port`] at `0xf4`, which is the `iobase` of the `isa-debug-exit` device. Then it writes the passed exit code to the port. We use `u32` because we specified the `iosize` of the `isa-debug-exit` device as 4 bytes. Writing to an I/O port is considered dangerous because it can make the program behave in unexpected ways, so both operations are placed inside an `unsafe` block.

We use the `QemuExitCode` enum to represent the exit status. The idea is to exit with the success exit code if all tests succeeded and with the failure exit code otherwise. The enum is marked as `#[repr(u32)]` so that each variant is represented by a `u32` value. We use `0x10` as the success exit code and `0x11` as the failure exit code. The actual values don't matter much, as long as they don't clash with the exit codes QEMU already uses. Using `0` as the success exit code is not a good idea because the transformed value `(0 << 1) | 1 = 1` is the same code that QEMU returns when it fails to run. In that case, the exit code alone could not tell us whether QEMU failed to run or all tests succeeded.

We can now update our `test_runner` function to exit QEMU after all tests have run:

```rust
// in src/main.rs

fn test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
    // new
    exit_qemu(QemuExitCode::Success);
}
```

When we run `cargo test` again, we see that QEMU closes immediately after executing the tests. The problem is that `cargo test` considers the tests failed even though we passed our `Success` exit code:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running target/x86_64-blog_os/debug/deps/blog_os-5804fc7d2dd4c9be
Building bootloader
   Compiling bootloader v0.5.3 (/home/philipp/Documents/bootloader)
    Finished release [optimized + debuginfo] target(s) in 1.07s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-5804fc7d2dd4c9be.bin -device isa-debug-exit,iobase=0xf4,
    iosize=0x04`
error: test failed, to rerun pass '--bin blog_os'
```

The reason is that `cargo test` considers every exit code other than `0` as a failure.

### Success Exit Code

To work around this, `bootimage` provides a `test-success-exit-code` configuration key that makes a specified exit code count as exit code `0`:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = […]
test-success-exit-code = 33         # (0x10 << 1) | 1
```

With this configuration, `bootimage` maps our success exit code to exit code 0, so that `cargo test` recognizes the test run as successful.

Our test runner now automatically closes QEMU after printing the test results. Because the QEMU window is only open for a very short time, the results shown in it are hard to read. It would be better if the test results were printed to the console instead, so that we can still read them after QEMU exits.

## Printing to the Console

To see the test output on the console, we need to send the data from our kernel to the host system. There are various ways to achieve this, for example, by sending the data over a TCP network interface. However, implementing a networking stack is quite complex, so we will choose a simpler solution instead.

### Serial Port

A simple way to send the data is to use the [serial port]. Serial port hardware is hard to find in modern computers, but its functionality is easy to implement in software, and QEMU can redirect the data that our kernel sends over serial to the host's standard output or a file.

[serial port]: https://en.wikipedia.org/wiki/Serial_port

The chips implementing a serial interface are called [UARTs]. There are [lots of UART models] on x86, but the differences between them only matter for advanced features that we won't use. For our test framework, we will use the [16550 UART] model, which is compatible with most UART models.
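
As a rough illustration of what a driver for such a chip does, the following host-side sketch polls a status bit before writing each byte. It is not the code of any particular crate: the `PortIo` trait and the `FakeUart` type are hypothetical stand-ins for real port I/O so that the example runs on a normal machine, and the register layout (data register at offset 0, line status register at offset 5, bit 5 signalling an empty transmit buffer, base port `0x3F8` for the first serial interface) is assumed from the standard 16550 layout.

```rust
/// Hypothetical stand-in for the `in`/`out` instructions, so the sketch
/// can run on a normal host instead of real hardware.
trait PortIo {
    fn read(&mut self, port: u16) -> u8;
    fn write(&mut self, port: u16, value: u8);
}

// Assumed 16550 register layout, relative to the base port.
const COM1: u16 = 0x3F8; // standard base port of the first serial interface
const DATA: u16 = 0; // transmit/receive data register
const LINE_STATUS: u16 = 5; // line status register
const TX_EMPTY: u8 = 1 << 5; // set when the transmit buffer is empty

/// Waits until the transmit buffer is empty, then sends one byte.
fn send_byte(io: &mut impl PortIo, byte: u8) {
    while io.read(COM1 + LINE_STATUS) & TX_EMPTY == 0 {
        std::hint::spin_loop();
    }
    io.write(COM1 + DATA, byte);
}

/// Sends a whole string byte by byte.
fn send_str(io: &mut impl PortIo, s: &str) {
    for byte in s.bytes() {
        send_byte(io, byte);
    }
}

/// Fake device: reports "busy" for the first `busy_polls` status reads
/// and records every byte written to the data register.
struct FakeUart {
    busy_polls: u32,
    sent: Vec<u8>,
}

impl PortIo for FakeUart {
    fn read(&mut self, port: u16) -> u8 {
        if port != COM1 + LINE_STATUS {
            return 0;
        }
        if self.busy_polls > 0 {
            self.busy_polls -= 1;
            0
        } else {
            TX_EMPTY
        }
    }

    fn write(&mut self, port: u16, value: u8) {
        if port == COM1 + DATA {
            self.sent.push(value);
        }
    }
}
```

Calling `send_str` on a `FakeUart` that starts out busy first spins on the status register and then records the bytes in order. A real driver would perform the same reads and writes through actual I/O ports.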

[UARTs]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter
[lots of UART models]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter#Models
[16550 UART]: https://en.wikipedia.org/wiki/16550_UART

We will use the [`uart_16550`] crate to initialize the UART and send data over the serial port. To add it as a dependency, we update our `Cargo.toml` and `main.rs`:

[`uart_16550`]: https://docs.rs/uart_16550

```toml
# in Cargo.toml

[dependencies]
uart_16550 = "0.2.0"
```

The `uart_16550` crate contains a `SerialPort` struct that represents the UART registers. To construct an instance of it, we create a new `serial` module with the following content:

```rust
// in src/main.rs

mod serial;
```

```rust
// in src/serial.rs

use uart_16550::SerialPort;
use spin::Mutex;
use lazy_static::lazy_static;

lazy_static! {
    pub static ref SERIAL1: Mutex<SerialPort> = {
        let mut serial_port = unsafe { SerialPort::new(0x3F8) };
        serial_port.init();
        Mutex::new(serial_port)
    };
}
```

Like with the [VGA text buffer][vga lazy-static], we use `lazy_static` and a spinlock to create the static `SERIAL1`. By using `lazy_static`, we ensure that the `init` method is called exactly once, on the first use of `SERIAL1`.

Like the `isa-debug-exit` device, the UART is programmed using port I/O. Since the UART is more complex, it uses multiple I/O ports for programming its different device registers. The unsafe `SerialPort::new` function expects the address of the first I/O port of the UART as an argument, from which it calculates the addresses of all needed ports. We pass the port address `0x3F8`, which is the standard port number for the first serial interface.

[vga lazy-static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics

To make the serial port easily usable, we add `serial_print!` and `serial_println!` macros:

```rust
// in src/serial.rs

#[doc(hidden)]
pub fn _print(args: ::core::fmt::Arguments) {
    use core::fmt::Write;
    SERIAL1.lock().write_fmt(args).expect("Printing to serial failed");
}

/// Prints to the host through the serial interface.
#[macro_export]
macro_rules! serial_print {
    ($($arg:tt)*) => {
        $crate::serial::_print(format_args!($($arg)*));
    };
}

/// Prints to the host through the serial interface, appending a newline.
#[macro_export]
macro_rules! serial_println {
    () => ($crate::serial_print!("\n"));
    ($fmt:expr) => ($crate::serial_print!(concat!($fmt, "\n")));
    ($fmt:expr, $($arg:tt)*) => ($crate::serial_print!(
        concat!($fmt, "\n"), $($arg)*));
}
```

The implementation is very similar to the implementation of our `print` and `println` macros from the previous post. Since the `SerialPort` type already implements the [`fmt::Write`] trait, we don't need to provide our own implementation.

[`fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

Now we can print to the serial interface instead of the VGA text buffer:

```rust
// in src/main.rs

#[cfg(test)]
fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    […]
}

#[test_case]
fn trivial_assertion() {
    serial_print!("trivial assertion... ");
    assert_eq!(1, 1);
    serial_println!("[ok]");
}
```

Note that the `serial_println` macro lives directly under the root namespace of the project because we used the `#[macro_export]` attribute, so importing the macro through `use crate::serial::serial_println` will not work.

### QEMU Arguments

To see the serial output from QEMU, we need to pass the `-serial` argument to redirect the output to stdout:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = [
    "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio"
]
```

When we run `cargo test` now, we see the test output directly in the console:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a
Building bootloader
    Finished release [optimized + debuginfo] target(s) in 0.02s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device
    isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio`
Running 1 tests
trivial assertion... [ok]
```

However, when a test fails, we still see the output inside QEMU because our panic handler still uses `println`. To see this, change the assertion in the `trivial_assertion` test to `assert_eq!(0, 1)` and run it again:

![QEMU printing "Hello World!" and "panicked at 'assertion failed: `(left == right)`
    left: `0`, right: `1`', src/main.rs:55:5](qemu-failed-test.png)

We see that the panic message is still printed to the VGA buffer, while the other test output is printed to the serial port.
The panic message contains important information, so it would be more convenient to see it in the console together with the other messages.

### Print an Error Message on Panic

To exit QEMU with an error message on a panic, we can use [conditional compilation] to use a different panic handler in testing mode:

[conditional compilation]: https://doc.rust-lang.org/1.30.0/book/first-edition/conditional-compilation.html

```rust
// in src/main.rs

// our existing panic handler
#[cfg(not(test))] // new attribute
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}

// our panic handler in test mode
#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}
```

For our test panic handler, we use `serial_println` instead of `println` and then exit QEMU with a failure exit code. Note that we still need an endless `loop` after the `exit_qemu` call because the compiler does not know that the `isa-debug-exit` device causes the program to exit.

Now QEMU also exits for failed tests and prints an error message on the console:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a
Building bootloader
    Finished release [optimized + debuginfo] target(s) in 0.02s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device
    isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio`
Running 1 tests
trivial assertion... [failed]

Error: panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/main.rs:65:5
```

Since we see all test output on the console now, we no longer need the QEMU window that pops up for a short time. So let's look at how to hide it completely.

### Hiding QEMU

Since we report the complete test results using the `isa-debug-exit` device and the serial port, we don't need the QEMU window anymore. We can hide it by passing the `-display none` argument to QEMU:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = [
    "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio",
    "-display", "none"
]
```

Now QEMU runs completely in the background and no window is opened anymore.
Our test framework can now also run in environments without a graphical user interface, such as CI services or [SSH] connections.

[SSH]: https://en.wikipedia.org/wiki/Secure_Shell

### Timeouts

Since `cargo test` waits until the test runner exits, a test that never returns can block the test runner and `cargo test` forever. In ordinary software development, endless loops are usually not hard to avoid. When writing a kernel, however, endless loops can occur in various situations:

- The bootloader fails to load our kernel, which causes the system to reboot endlessly.
- The BIOS/UEFI firmware fails to load the bootloader, which causes the same endless rebooting.
- The CPU enters a `loop {}` statement at the end of one of our functions, for example because the QEMU `isa-debug-exit` device doesn't work properly.
- The hardware causes a system reset, for example when a CPU exception is not handled properly.

Since endless loops can occur in so many situations, the `bootimage` tool applies a timeout of 5 minutes to each test executable. If the test does not finish within this time, it is marked as failed and a "Timed Out" error is printed to the console. This ensures that tests stuck in an endless loop don't block `cargo test` forever.

You can try it yourself by adding a `loop {}` statement to the `trivial_assertion` test. When you run `cargo test`, you will see that the test is marked as timed out after 5 minutes. The timeout duration is [configurable][bootimage config] through a `test-timeout` key in the Cargo.toml:

[bootimage config]: https://github.com/rust-osdev/bootimage#configuration

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-timeout = 300          # (in seconds)
```

If you don't want to wait 5 minutes for the `trivial_assertion` test to time out, you can decrease the above value.

### Insert Printing Automatically

Our `trivial_assertion` test currently prints its own status information using `serial_print!`/`serial_println!`:

```rust
#[test_case]
fn trivial_assertion() {
    serial_print!("trivial assertion... ");
    assert_eq!(1, 1);
    serial_println!("[ok]");
}
```

Let's update our `test_runner` so that we don't have to add these print statements by hand for every new test. To do that, we create a new `Testable` trait:

```rust
// in src/main.rs

pub trait Testable {
    fn run(&self) -> ();
}
```

The trick now is to implement this trait for all types `T` that implement the [`Fn()` trait]:

[`Fn()` trait]: https://doc.rust-lang.org/stable/core/ops/trait.Fn.html

```rust
// in src/main.rs

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}
```

We implement the `run` function by first printing the name of the test function using the [`any::type_name`] function. This function is implemented directly in the compiler and returns the name of a given type as a string. For functions, the type is their name, which is exactly what we want here. The `\t` character is the [tab character], which adds some spacing before the `[ok]` message.

[`any::type_name`]: https://doc.rust-lang.org/stable/core/any/fn.type_name.html
[tab character]: https://en.wikipedia.org/wiki/Tab_character

After printing the function name, we invoke the test function through `self()`. This only works because we require that `self` implements the `Fn()` trait. After the test function returns, we print `[ok]` to indicate that the function did not panic.

The last step is to update our `test_runner` to use the new `Testable` trait:

```rust
// in src/main.rs

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Testable]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run(); // new
    }
    exit_qemu(QemuExitCode::Success);
}
```

The only two changes are the type of the `tests` argument, which changed from `&[&dyn Fn()]` to `&[&dyn Testable]`, and the fact that we now call `test.run()` instead of `test()`.

We can now remove the print statements from our `trivial_assertion` test since they're printed automatically:

```rust
// in src/main.rs

#[test_case]
fn trivial_assertion() {
    assert_eq!(1, 1);
}
```

The `cargo test` output now looks like this:

```
Running 1 tests
blog_os::trivial_assertion...	[ok]
```

The function name now includes the full path to the function within the crate, which makes it possible to tell tests apart even when test functions in different modules have the same name. Otherwise, the output looks the same as before, but we no longer need to add print statements to our tests manually.

## Testing the VGA Buffer

Now that we have a working test framework, we can create a few tests for our VGA buffer implementation. First, we create a very simple test to verify that `println` works without panicking:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_simple() {
    println!("test_println_simple output");
}
```

The test just prints a short message to the VGA buffer. If it finishes without panicking, it means that the `println` invocation did not panic either.

To ensure that no panic occurs even if many lines are printed and lines are shifted off the screen, we create another test:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_many() {
    for _ in 0..200 {
        println!("test_println_many output");
    }
}
```

We can also create a test function to verify that the printed lines really appear on the screen:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    let s = "Some test string that fits on a single line";
    println!("{}", s);
    for (i, c) in s.chars().enumerate() {
        let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read();
        assert_eq!(char::from(screen_char.ascii_character), c);
    }
}
```

The function defines a test string, prints it using `println`, and then iterates over the screen characters of the static `WRITER`, which represents the VGA text buffer. Since `println` prints to the last screen line and then appends a newline, the string should appear on line `BUFFER_HEIGHT - 2`.

By using [`enumerate`], we count the number of iterations in the variable `i`, which we then use for loading the screen character that corresponds to `c`. By comparing the `ascii_character` of the screen character with `c`, we ensure that each character of the string really appears in the VGA text buffer.

[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate

There are many more tests that we could write. For example, a test that verifies that no panic occurs when printing very long lines and that they are wrapped correctly, or a test that verifies that newlines, non-printable characters, and non-unicode characters are handled without errors.

For the rest of this post, however, we will explain how to create _integration tests_ to test the interaction of different components.

## Integration Tests

The convention for [integration tests] in Rust is to put them into a `tests` directory in the project root. Both the default test framework and custom test frameworks automatically pick up and execute all tests in that directory.

[integration tests]: https://doc.rust-lang.org/book/ch11-03-test-organization.html#integration-tests

All integration tests are their own executables and completely separate from our `main.rs`. This means that each test needs to define its own entry point function. Let's create an example integration test named `basic_boot` to see how it works in detail:

```rust
// in tests/basic_boot.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    test_main();

    loop {}
}

fn test_runner(tests: &[&dyn Fn()]) {
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    loop {}
}
```

Since integration tests are separate executables, we need to provide all the crate attributes (`no_std`, `no_main`, `test_runner`, etc.) again. We also need to create a new entry point function `_start`, which calls the test entry point function `test_main`. We don't need any `cfg(test)` attributes because integration test executables are never built in non-test mode.

We use the [`unimplemented`] macro, which always panics, as a placeholder for the `test_runner` function and just `loop` in the panic handler for now. Ideally, we would implement these functions exactly as we did in our `main.rs`, using the `serial_println` macro and the `exit_qemu` function. The problem is that we don't have access to these functions since they are defined in `main.rs`, which is a separate compilation unit.

[`unimplemented`]: https://doc.rust-lang.org/core/macro.unimplemented.html

If you run `cargo test` at this stage, it never finishes because of the endless loop in the panic handler. You need to press `Ctrl+c` to exit QEMU.

### Create a Library

To make the required functions available to our integration test, we split off a library from our `main.rs` and load that library from the integration test. To do this, we create a new `src/lib.rs` file:

```rust
// src/lib.rs

#![no_std]
```

Like `main.rs`, `lib.rs` is a special file that is automatically recognized by cargo. The library is a separate compilation unit, so we need to specify the `#![no_std]` attribute again.

To make our library work with `cargo test`, we also need to move the test functions and attributes from `main.rs` to `lib.rs`:

```rust
// in src/lib.rs

#![cfg_attr(test, no_main)]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;

pub trait Testable {
    fn run(&self) -> ();
}

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}

pub fn test_runner(tests: &[&dyn Testable]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run();
    }
    exit_qemu(QemuExitCode::Success);
}

pub fn test_panic_handler(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    test_main();
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    test_panic_handler(info)
}
```

To make our `test_runner` available to executables and integration tests, we make it public and don't apply the `cfg(test)` attribute to it. We also factor out the implementation of our panic handler into a public `test_panic_handler` function, so that it is available to other executables too.

Since our `lib.rs` is tested independently of our `main.rs`, we need to provide a `_start` entry point and a panic handler when the library is compiled in test mode. By using the [`cfg_attr`] crate attribute, we enable the `no_main` attribute only in test mode.

[`cfg_attr`]: https://doc.rust-lang.org/reference/conditional-compilation.html#the-cfg_attr-attribute

We also move the `QemuExitCode` enum and the `exit_qemu` function to `src/lib.rs` and make them public:

```rust
// in src/lib.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

pub fn exit_qemu(exit_code: QemuExitCode) {
    use x86_64::instructions::port::Port;

    unsafe {
        let mut port = Port::new(0xf4);
        port.write(exit_code as u32);
    }
}
```

Now executables and integration tests can import these functions from the library. To also make `println` and `serial_println` available, we move the module declarations to `lib.rs` too:

```rust
// in src/lib.rs

pub mod serial;
pub mod vga_buffer;
```

We make the modules public so that they are usable outside of our library. This is also required for the `println` and `serial_println` macros because they use the `_print` functions of the `vga_buffer` and `serial` modules.

Now we can update our `main.rs` to use the library:

```rust
// in src/main.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(blog_os::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;
use blog_os::println;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}

/// This function is called on panic.
#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

The library is usable like a normal external crate.
It is called `blog_os`, like our crate. The above code uses the `blog_os::test_runner` function in the `test_runner` attribute and the `blog_os::test_panic_handler` function in our `cfg(test)` panic handler. It also imports the `println` macro from the library to make it available to our `_start` and `panic` functions.

At this point, `cargo run` and `cargo test` should work again. Of course, `cargo test` still loops endlessly, so you need to exit with `ctrl+c`. Let's fix this by using the library functions in our integration test.

### Completing the Integration Test

Like our `src/main.rs`, our `tests/basic_boot.rs` executable can import types from our new library. This allows us to import the missing components to complete our test:

```rust
// in tests/basic_boot.rs

#![test_runner(blog_os::test_runner)]

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

Instead of reimplementing the test runner, we use the `test_runner` function from our library by changing the `#![test_runner(crate::test_runner)]` attribute to `#![test_runner(blog_os::test_runner)]`. The `test_runner` stub function in `basic_boot.rs` is no longer needed, so we can remove it. For our panic handler, we call the `blog_os::test_panic_handler` function like we did in our `main.rs`.

Now `cargo test` exits normally again. When you run it, you will see that it builds and runs the tests for `lib.rs`, `main.rs`, and `basic_boot.rs` separately. For `main.rs` and the `basic_boot` integration test, it reports "Running 0 tests" since these files don't have any functions annotated with `#[test_case]`.

We can now add tests to our `basic_boot.rs`. For example, we can test that `println` works without panicking, like we did in the VGA buffer tests:

```rust
// in tests/basic_boot.rs

use blog_os::println;

#[test_case]
fn test_println() {
    println!("test_println output");
}
```

When we run `cargo test` now, we see that it finds and executes the test function.

The test might seem a bit useless right now since it's almost identical to one of the VGA buffer tests. However, as we develop our operating system, the `_start` functions of our `main.rs` and `lib.rs` might grow different initialization code, so that the two tests end up being executed in very different environments.

By testing `println` directly in `_start` without calling any initialization routines, we can ensure that `println` works right after booting. This is important because we rely on it for printing panic messages.

### Future Tests

Integration tests are treated as executables that are completely separate from the crate's executable. This gives them an environment of their own and makes it possible to test that the code interacts correctly with the CPU or hardware devices.

Our `basic_boot` test is a very simple example of an integration test.
As we continue writing our kernel, it will gain more features and interact with the hardware in more ways. By adding integration tests, we can ensure that these interactions work as expected. Some ideas for possible future tests are:

- **CPU Exceptions**: When the code performs an invalid operation (e.g., a division by zero), the CPU throws an exception. The kernel can register handler functions for such exceptions. An integration test could verify that the correct exception handler is called when a CPU exception occurs, or that the interrupted code continues correctly after the exception has been handled.
- **Page Tables**: Page tables define which memory regions are valid and accessible. By modifying the page tables, it is possible to allocate the memory regions needed for launching new programs. An integration test could modify the page tables in the `_start` function and verify in `#[test_case]` functions that nothing unexpected happened.
- **Userspace Programs**: Userspace programs are programs with limited access to the system's resources. For example, they don't have access to kernel data structures or to the memory of other running programs. An integration test could launch userspace programs that attempt forbidden operations and verify that the kernel prevents them.

There are many more ideas for integration tests. By adding such tests, we can ensure that we don't break anything by accident when we add new features to our kernel or refactor our code. This becomes more important as our kernel grows larger and more complex.

### Tests that Should Panic

The test framework of the standard library supports a [`#[should_panic]` attribute][should_panic] for writing tests that are expected to panic. This is useful, for example, to verify that a function fails when an invalid argument is passed. This attribute can't be used in `#[no_std]` crates since it requires support from the standard library.

[should_panic]: https://doc.rust-lang.org/rust-by-example/testing/unit_testing.html#testing-panics

While we can't use the `#[should_panic]` attribute in our kernel directly, we can get similar behavior by creating an integration test that exits with a success exit code from the panic handler. Let's start creating such a test with the name `should_panic`:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{QemuExitCode, exit_qemu, serial_println};

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

This test is still incomplete as it doesn't define a `_start` function or any of the test runner attributes yet. Let's add the missing parts:

```rust
// in tests/should_panic.rs

#![feature(custom_test_frameworks)]
#![test_runner(test_runner)]
#![reexport_test_harness_main = "test_main"]

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    test_main();

    loop {}
}

pub fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test();
        serial_println!("[test did not panic]");
        exit_qemu(QemuExitCode::Failed);
    }
    exit_qemu(QemuExitCode::Success);
}
```

Instead of reusing the `test_runner` from our `lib.rs`, the test defines its own `test_runner` function that exits with a failure exit code when a test returns without panicking. If no test function is defined, the runner exits with a success exit code. Since the runner always exits after running a single test, it does not make sense to define more than one `#[test_case]` function.

Now we can create a test that should panic:

```rust
// in tests/should_panic.rs

use blog_os::serial_print;

#[test_case]
fn should_fail() {
    serial_print!("should_panic::should_fail...\t");
    assert_eq!(0, 1);
}
```

The test uses `assert_eq` to assert that `0` and `1` are equal. This assertion is always false, so the test panics as desired. Note that we need to print the function name manually using `serial_print!` here because we don't use the `Testable` trait.

When we run the test through `cargo test --test should_panic`, we see that it is successful because the test panicked as expected. When we remove the assertion and run the test again, we see that it fails with the _"test did not panic"_ message.

A significant drawback of this approach is that it only works for a single test function. With multiple `#[test_case]` functions, only the first function is executed because execution cannot continue after the panic handler has been called. If you know a way to solve this problem, please let me know!

### No Harness Tests {#no-harness-tests}

For integration tests that only have a single test function (like our `should_panic` test), the test runner isn't really needed. For cases like this, we can disable the test runner completely and run our test directly in the `_start` function.

The key to this is disabling the `harness` flag for the test in the `Cargo.toml`, which defines whether a test runner is used for an integration test. When it's set to `false`, both the default test runner and the custom test runner feature are disabled, so that the test is treated like a normal executable.

Let's disable the `harness` flag for our `should_panic` test:

```toml
# in Cargo.toml

[[test]]
name = "should_panic"
harness = false
```

Now we can simplify our `should_panic` test considerably by removing the code related to the test runner. The result looks like this:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{exit_qemu, serial_print, serial_println, QemuExitCode};

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    should_fail();
    serial_println!("[test did not panic]");
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

fn should_fail() {
    serial_print!("should_panic::should_fail...\t");
    assert_eq!(0, 1);
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

We now call the `should_fail` function directly from our `_start` function and exit with a failure exit code if it returns. When we run `cargo test --test should_panic` now, we see that the test behaves exactly as before.

Disabling the `harness` flag can also be useful for complex integration tests, for example, when the individual test functions have side effects on the environment and need to be run in a specific order.

## Summary

Testing is a very useful technique to ensure that individual components behave as expected. Even though tests cannot show the absence of bugs, they are still a useful tool for finding both newly introduced and existing bugs during development.

This post explained how to set up a test framework for our Rust kernel. We used the custom test frameworks feature of Rust to implement support for the `#[test_case]` attribute in our bare-metal environment. Using the `isa-debug-exit` device of QEMU, our test runner can exit QEMU after running the tests and report the test status. To print error messages to the console instead of the VGA buffer, we created a basic driver for the serial port.

After creating some tests for our `println` macro, we explored integration tests in the second half of the post. We learned that they live in the `tests` directory and are treated as completely separate executables. To give them access to the `exit_qemu` function and the `serial_println` macro, we moved the required code into a new library within the crate. Since integration tests run in their own separate environment, they make it possible to test code that interacts with the hardware and to create tests that should panic.

We now have a test framework that runs in a realistic environment inside QEMU. By writing more tests as the kernel grows, we can keep it maintainable when it becomes more complex.

## What's next?

In the next post, we will explore _CPU exceptions_. These exceptions are thrown by the CPU when something illegal happens, such as a division by zero or an access to an unmapped memory page (a page fault). Being able to catch and examine these exceptions is very important for debugging future errors. Exception handling is also very similar to the handling of hardware interrupts, which is required, for example, for keyboard support.


================================================
FILE: blog/content/edition-2/posts/04-testing/index.md
================================================
+++
title = "Testing"
weight = 4
path = "testing"
date = 2019-04-27

[extra]
chapter = "Bare Bones"
comments_search_term = 1009
+++

This post explores unit and integration testing in `no_std` executables. We will use Rust's support for custom test frameworks to execute test functions inside our kernel. To report the results out of QEMU, we will use different features of QEMU and the `bootimage` tool.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-04`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-04

## Requirements

This post replaces the (now deprecated) [_Unit Testing_] and [_Integration Tests_] posts. It assumes that you have followed the [_A Minimal Rust Kernel_] post after 2019-04-27. Mainly, it requires that you have a `.cargo/config.toml` file that [sets a default target] and [defines a runner executable].

[_Unit Testing_]: @/edition-2/posts/deprecated/04-unit-testing/index.md
[_Integration Tests_]: @/edition-2/posts/deprecated/05-integration-tests/index.md
[_A Minimal Rust Kernel_]: @/edition-2/posts/02-minimal-rust-kernel/index.md
[sets a default target]: @/edition-2/posts/02-minimal-rust-kernel/index.md#set-a-default-target
[defines a runner executable]: @/edition-2/posts/02-minimal-rust-kernel/index.md#using-cargo-run

## Testing in Rust

Rust has a [built-in test framework] that is capable of running unit tests without the need to set anything up. Just create a function that checks some results through assertions and add the `#[test]` attribute to the function header.
Then `cargo test` will automatically find and execute all test functions of your crate.

[built-in test framework]: https://doc.rust-lang.org/book/ch11-00-testing.html

To enable testing for our kernel binary, we can set the `test` flag in the Cargo.toml to `true`:

```toml
# in Cargo.toml

[[bin]]
name = "blog_os"
test = true
bench = false
```

This [`[[bin]]` section](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#configuring-a-target) specifies how `cargo` should compile our `blog_os` executable. The `test` field specifies whether testing is supported for this executable. We set `test = false` in the first post to [make `rust-analyzer` happy](@/edition-2/posts/01-freestanding-rust-binary/index.md#making-rust-analyzer-happy), but now we want to enable testing, so we set it back to `true`.

Unfortunately, testing is a bit more complicated for `no_std` applications such as our kernel. The problem is that Rust's test framework implicitly uses the built-in [`test`] library, which depends on the standard library. This means that we can't use the default test framework for our `#[no_std]` kernel.

[`test`]: https://doc.rust-lang.org/test/index.html

We can see this when we try to run `cargo test` in our project:

```
> cargo test
   Compiling blog_os v0.1.0 (/…/blog_os)
error[E0463]: can't find crate for `test`
```

Since the `test` crate depends on the standard library, it is not available for our bare metal target. While porting the `test` crate to a `#[no_std]` context [is possible][utest], it is highly unstable and requires some hacks, such as redefining the `panic` macro.

[utest]: https://github.com/japaric/utest

### Custom Test Frameworks

Fortunately, Rust supports replacing the default test framework through the unstable [`custom_test_frameworks`] feature. This feature requires no external libraries and thus also works in `#[no_std]` environments.
It works by collecting all functions annotated with a `#[test_case]` attribute and then invoking a user-specified runner function with the list of tests as an argument. Thus, it gives the implementation maximal control over the test process.

[`custom_test_frameworks`]: https://doc.rust-lang.org/unstable-book/language-features/custom-test-frameworks.html

The disadvantage compared to the default test framework is that many advanced features, such as [`should_panic` tests], are not available. Instead, it is up to the implementation to provide such features itself if needed. This is ideal for us since we have a very special execution environment where the default implementations of such advanced features probably wouldn't work anyway. For example, the `#[should_panic]` attribute relies on stack unwinding to catch the panics, which we disabled for our kernel.

[`should_panic` tests]: https://doc.rust-lang.org/book/ch11-01-writing-tests.html#checking-for-panics-with-should_panic

To implement a custom test framework for our kernel, we add the following to our `main.rs`:

```rust
// in src/main.rs

#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
}
```

Our runner just prints a short debug message and then calls each test function in the list. The argument type `&[&dyn Fn()]` is a [_slice_] of [_trait object_] references of the [_Fn()_] trait. It is basically a list of references to types that can be called like a function. Since the function is useless for non-test runs, we use the `#[cfg(test)]` attribute to include it only for tests.
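
To make the `&[&dyn Fn()]` type more concrete, here is a small sketch that runs on a normal host with the standard library. The names `run_all`, `plain_function`, and `demo` are made up for this illustration and are not part of the kernel. The point is only that ordinary functions and closures can both be passed as `&dyn Fn()` references in the same slice.

```rust
/// Calls every entry of the slice, like the runner above does,
/// and returns how many entries were called.
fn run_all(tests: &[&dyn Fn()]) -> usize {
    for test in tests {
        test();
    }
    tests.len()
}

/// An ordinary function; its function item type implements `Fn()`.
fn plain_function() {
    assert_eq!(1 + 1, 2);
}

/// Builds a slice from a function and a closure and runs both.
fn demo() -> usize {
    let closure = || assert_eq!(2 * 2, 4);
    // Each `&` reference is coerced to a `&dyn Fn()` trait object here.
    run_all(&[&plain_function, &closure])
}
```

In the kernel, no such manual list is needed because the custom test frameworks feature collects all `#[test_case]` functions and passes them to the runner.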

[_slice_]: https://doc.rust-lang.org/std/primitive.slice.html
[_trait object_]: https://doc.rust-lang.org/1.30.0/book/first-edition/trait-objects.html
[_Fn()_]: https://doc.rust-lang.org/std/ops/trait.Fn.html

When we run `cargo test` now, we see that it succeeds (if it doesn't, see the note below). However, we still see our "Hello World" instead of the message from our `test_runner`. The reason is that our `_start` function is still used as the entry point. The custom test frameworks feature generates a `main` function that calls `test_runner`, but this function is ignored because we use the `#[no_main]` attribute and provide our own entry point.

**Note:** There is currently a bug in cargo that leads to "duplicate lang item" errors on `cargo test` in some cases. It occurs when you have set `panic = "abort"` for a profile in your `Cargo.toml`. Try removing it, then `cargo test` should work. Alternatively, if that doesn't work, then add `panic-abort-tests = true` to the `[unstable]` section of your `.cargo/config.toml` file. See the [cargo issue](https://github.com/rust-lang/cargo/issues/7359) for more information on this.

To fix this, we first need to change the name of the generated function to something different than `main` through the `reexport_test_harness_main` attribute. Then we can call the renamed function from our `_start` function:

```rust
// in src/main.rs

#![reexport_test_harness_main = "test_main"]

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}
```

We set the name of the test framework entry function to `test_main` and call it from our `_start` entry point. We use [conditional compilation] to add the call to `test_main` only in test contexts because the function is not generated on a normal run.

When we now execute `cargo test`, we see the "Running 0 tests" message from our `test_runner` on the screen. We are now ready to create our first test function:

```rust
// in src/main.rs

#[test_case]
fn trivial_assertion() {
    print!("trivial assertion... ");
    assert_eq!(1, 1);
    println!("[ok]");
}
```

When we run `cargo test` now, we see the following output:

![QEMU printing "Hello World!", "Running 1 tests", and "trivial assertion... [ok]"](qemu-test-runner-output.png)

The `tests` slice passed to our `test_runner` function now contains a reference to the `trivial_assertion` function. From the `trivial assertion... [ok]` output on the screen, we see that the test was called and that it succeeded.

After executing the tests, our `test_runner` returns to the `test_main` function, which in turn returns to our `_start` entry point function. At the end of `_start`, we enter an endless loop because the entry point function is not allowed to return. This is a problem, because we want `cargo test` to exit after running all tests.

## Exiting QEMU

Right now, we have an endless loop at the end of our `_start` function and need to close QEMU manually on each execution of `cargo test`. This is unfortunate because we also want to run `cargo test` in scripts without user interaction.
The clean solution to this would be to implement a proper way to shutdown our OS. Unfortunately, this is relatively complex because it requires implementing support for either the [APM] or [ACPI] power management standard. [APM]: https://wiki.osdev.org/APM [ACPI]: https://wiki.osdev.org/ACPI Luckily, there is an escape hatch: QEMU supports a special `isa-debug-exit` device, which provides an easy way to exit QEMU from the guest system. To enable it, we need to pass a `-device` argument to QEMU. We can do so by adding a `package.metadata.bootimage.test-args` configuration key in our `Cargo.toml`: ```toml # in Cargo.toml [package.metadata.bootimage] test-args = ["-device", "isa-debug-exit,iobase=0xf4,iosize=0x04"] ``` The `bootimage runner` appends the `test-args` to the default QEMU command for all test executables. For a normal `cargo run`, the arguments are ignored. Together with the device name (`isa-debug-exit`), we pass the two parameters `iobase` and `iosize` that specify the _I/O port_ through which the device can be reached from our kernel. ### I/O Ports There are two different approaches for communicating between the CPU and peripheral hardware on x86, **memory-mapped I/O** and **port-mapped I/O**. We already used memory-mapped I/O for accessing the [VGA text buffer] through the memory address `0xb8000`. This address is not mapped to RAM but to some memory on the VGA device. [VGA text buffer]: @/edition-2/posts/03-vga-text-buffer/index.md In contrast, port-mapped I/O uses a separate I/O bus for communication. Each connected peripheral has one or more port numbers. To communicate with such an I/O port, there are special CPU instructions called `in` and `out`, which take a port number and a data byte (there are also variations of these commands that allow sending a `u16` or `u32`). The `isa-debug-exit` device uses port-mapped I/O. 
The `iobase` parameter specifies on which port address the device should live (`0xf4` is a [generally unused][list of x86 I/O ports] port on the x86's IO bus) and the `iosize` specifies the port size (`0x04` means four bytes). [list of x86 I/O ports]: https://wiki.osdev.org/I/O_Ports#The_list ### Using the Exit Device The functionality of the `isa-debug-exit` device is very simple. When a `value` is written to the I/O port specified by `iobase`, it causes QEMU to exit with [exit status] `(value << 1) | 1`. So when we write `0` to the port, QEMU will exit with exit status `(0 << 1) | 1 = 1`, and when we write `1` to the port, it will exit with exit status `(1 << 1) | 1 = 3`. [exit status]: https://en.wikipedia.org/wiki/Exit_status Instead of manually invoking the `in` and `out` assembly instructions, we use the abstractions provided by the [`x86_64`] crate. To add a dependency on that crate, we add it to the `dependencies` section in our `Cargo.toml`: [`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/ ```toml # in Cargo.toml [dependencies] x86_64 = "0.14.2" ``` Now we can use the [`Port`] type provided by the crate to create an `exit_qemu` function: [`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html ```rust // in src/main.rs #[derive(Debug, Clone, Copy, PartialEq, Eq)] #[repr(u32)] pub enum QemuExitCode { Success = 0x10, Failed = 0x11, } pub fn exit_qemu(exit_code: QemuExitCode) { use x86_64::instructions::port::Port; unsafe { let mut port = Port::new(0xf4); port.write(exit_code as u32); } } ``` The function creates a new [`Port`] at `0xf4`, which is the `iobase` of the `isa-debug-exit` device. Then it writes the passed exit code to the port. We use `u32` because we specified the `iosize` of the `isa-debug-exit` device as 4 bytes. Both operations are unsafe because writing to an I/O port can generally result in arbitrary behavior. To specify the exit status, we create a `QemuExitCode` enum. 
The idea is to exit with the success exit code if all tests succeeded and with the failure exit code otherwise. The enum is marked as `#[repr(u32)]` to represent each variant by a `u32` integer. We use the exit code `0x10` for success and `0x11` for failure. The actual exit codes don't matter much, as long as they don't clash with the default exit codes of QEMU. For example, using exit code `0` for success is not a good idea because it becomes `(0 << 1) | 1 = 1` after the transformation, which is the default exit code when QEMU fails to run. So we could not differentiate a QEMU error from a successful test run.

We can now update our `test_runner` to exit QEMU after all tests have run:

```rust
// in src/main.rs

fn test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
    // new
    exit_qemu(QemuExitCode::Success);
}
```

When we run `cargo test` now, we see that QEMU immediately closes after executing the tests. The problem is that `cargo test` interprets the test as failed even though we passed our `Success` exit code:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running target/x86_64-blog_os/debug/deps/blog_os-5804fc7d2dd4c9be
Building bootloader
   Compiling bootloader v0.5.3 (/home/philipp/Documents/bootloader)
    Finished release [optimized + debuginfo] target(s) in 1.07s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-5804fc7d2dd4c9be.bin -device isa-debug-exit,iobase=0xf4,
    iosize=0x04`
error: test failed, to rerun pass '--bin blog_os'
```

The reason is that `cargo test` treats every exit code other than `0` as a failure.
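
As a quick check of the `(value << 1) | 1` rule described above, the transformation can be reproduced in a few lines of ordinary host-side Rust. This is only an illustrative sketch: the helper name `qemu_exit_status` and the two constants are made up for this example and are not part of QEMU, `bootimage`, or the `x86_64` crate.

```rust
/// Computes the exit status that QEMU reports after `value` is written to
/// the `isa-debug-exit` port, following the `(value << 1) | 1` rule.
/// (Hypothetical helper, for illustration only.)
pub fn qemu_exit_status(value: u32) -> u32 {
    (value << 1) | 1
}

/// The raw values of our `QemuExitCode` variants.
pub const SUCCESS: u32 = 0x10;
pub const FAILED: u32 = 0x11;
```

With these definitions, `qemu_exit_status(SUCCESS)` evaluates to `33` and `qemu_exit_status(FAILED)` to `35`. Because the lowest bit is always set, the resulting status can never be `0`, no matter which value the kernel writes to the port.
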
### Success Exit Code To work around this, `bootimage` provides a `test-success-exit-code` configuration key that maps a specified exit code to the exit code `0`: ```toml # in Cargo.toml [package.metadata.bootimage] test-args = […] test-success-exit-code = 33 # (0x10 << 1) | 1 ``` With this configuration, `bootimage` maps our success exit code to exit code 0, so that `cargo test` correctly recognizes the success case and does not count the test as failed. Our test runner now automatically closes QEMU and correctly reports the test results. We still see the QEMU window open for a very short time, but it does not suffice to read the results. It would be nice if we could print the test results to the console instead, so we can still see them after QEMU exits. ## Printing to the Console To see the test output on the console, we need to send the data from our kernel to the host system somehow. There are various ways to achieve this, for example, by sending the data over a TCP network interface. However, setting up a networking stack is quite a complex task, so we will choose a simpler solution instead. ### Serial Port A simple way to send the data is to use the [serial port], an old interface standard which is no longer found in modern computers. It is easy to program and QEMU can redirect the bytes sent over serial to the host's standard output or a file. [serial port]: https://en.wikipedia.org/wiki/Serial_port The chips implementing a serial interface are called [UARTs]. There are [lots of UART models] on x86, but fortunately the only differences between them are some advanced features we don't need. The common UARTs today are all compatible with the [16550 UART], so we will use that model for our testing framework. 

[UARTs]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter
[lots of UART models]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter#Models
[16550 UART]: https://en.wikipedia.org/wiki/16550_UART

We will use the [`uart_16550`] crate to initialize the UART and send data over the serial port. To add it as a dependency, we update our `Cargo.toml` and `main.rs`:

[`uart_16550`]: https://docs.rs/uart_16550

```toml
# in Cargo.toml

[dependencies]
uart_16550 = "0.2.0"
```

The `uart_16550` crate contains a `SerialPort` struct that represents the UART registers, but we still need to construct an instance of it ourselves. For that, we create a new `serial` module with the following content:

```rust
// in src/main.rs

mod serial;
```

```rust
// in src/serial.rs

use uart_16550::SerialPort;
use spin::Mutex;
use lazy_static::lazy_static;

lazy_static! {
    pub static ref SERIAL1: Mutex<SerialPort> = {
        let mut serial_port = unsafe { SerialPort::new(0x3F8) };
        serial_port.init();
        Mutex::new(serial_port)
    };
}
```

As with the [VGA text buffer][vga lazy-static], we use `lazy_static` and a spinlock to create a `static` writer instance. By using `lazy_static`, we can ensure that the `init` method is called exactly once, on first use.

Like the `isa-debug-exit` device, the UART is programmed using port I/O. Since the UART is more complex, it uses multiple I/O ports for programming different device registers. The unsafe `SerialPort::new` function expects the address of the first I/O port of the UART as an argument, from which it can calculate the addresses of all needed ports. We're passing the port address `0x3F8`, which is the standard port number for the first serial interface.
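
To make "calculate the addresses of all needed ports" a bit more concrete, here is a small host-side sketch that derives the register ports from a base port using the conventional 16550 register offsets. The `UartPorts` type and its field names are assumptions chosen for this example; they are not meant to mirror the internals of the `uart_16550` crate.

```rust
/// I/O port numbers of the 16550 registers needed for basic output,
/// derived from the base port by fixed offsets.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UartPorts {
    pub data: u16,             // base + 0: transmit/receive data
    pub interrupt_enable: u16, // base + 1
    pub fifo_control: u16,     // base + 2
    pub line_control: u16,     // base + 3
    pub modem_control: u16,    // base + 4
    pub line_status: u16,      // base + 5
}

impl UartPorts {
    /// Computes all register ports from the first (base) port of the UART.
    pub const fn from_base(base: u16) -> Self {
        UartPorts {
            data: base,
            interrupt_enable: base + 1,
            fifo_control: base + 2,
            line_control: base + 3,
            modem_control: base + 4,
            line_status: base + 5,
        }
    }
}
```

For the base port `0x3F8`, `UartPorts::from_base(0x3F8)` places the data register at `0x3F8` and the line status register at `0x3FD`.
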
[vga lazy-static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics To make the serial port easily usable, we add `serial_print!` and `serial_println!` macros: ```rust // in src/serial.rs #[doc(hidden)] pub fn _print(args: ::core::fmt::Arguments) { use core::fmt::Write; SERIAL1.lock().write_fmt(args).expect("Printing to serial failed"); } /// Prints to the host through the serial interface. #[macro_export] macro_rules! serial_print { ($($arg:tt)*) => { $crate::serial::_print(format_args!($($arg)*)); }; } /// Prints to the host through the serial interface, appending a newline. #[macro_export] macro_rules! serial_println { () => ($crate::serial_print!("\n")); ($fmt:expr) => ($crate::serial_print!(concat!($fmt, "\n"))); ($fmt:expr, $($arg:tt)*) => ($crate::serial_print!( concat!($fmt, "\n"), $($arg)*)); } ``` The implementation is very similar to the implementation of our `print` and `println` macros. Since the `SerialPort` type already implements the [`fmt::Write`] trait, we don't need to provide our own implementation. [`fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html Now we can print to the serial interface instead of the VGA text buffer in our test code: ```rust // in src/main.rs #[cfg(test)] fn test_runner(tests: &[&dyn Fn()]) { serial_println!("Running {} tests", tests.len()); […] } #[test_case] fn trivial_assertion() { serial_print!("trivial assertion... "); assert_eq!(1, 1); serial_println!("[ok]"); } ``` Note that the `serial_println` macro lives directly under the root namespace because we used the `#[macro_export]` attribute, so importing it through `use crate::serial::serial_println` will not work. 
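
The reason `_print` can call `write_fmt` without further work is that [`fmt::Write`] only requires a `write_str` method and provides `write_fmt` on top of it. The following host-side sketch shows this with a stand-in type that collects bytes in memory instead of sending them to an I/O port. `MockSerial`, `print_to`, and `demo` are hypothetical names for this illustration, and the example uses `Vec` and `String` from the standard library, so it is meant to run on the host and not inside the kernel.

```rust
use core::fmt::{self, Write};

/// Stand-in for a serial port that stores the "sent" bytes in memory.
/// (Hypothetical type, for illustration only.)
pub struct MockSerial {
    pub sent: Vec<u8>,
}

impl Write for MockSerial {
    // Only `write_str` is required; `write_fmt` is provided by the trait.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.sent.extend_from_slice(s.as_bytes());
        Ok(())
    }
}

/// Same shape as `_print`, but writing to an explicitly passed port.
pub fn print_to(port: &mut MockSerial, args: fmt::Arguments) {
    port.write_fmt(args).expect("Printing to serial failed");
}

/// Formats one message and returns everything that was "sent".
pub fn demo() -> String {
    let mut port = MockSerial { sent: Vec::new() };
    print_to(&mut port, format_args!("Running {} tests\n", 1));
    String::from_utf8(port.sent).expect("output is valid UTF-8")
}
```

The `serial_print!` macro does the same thing in spirit: it packs its arguments with `format_args!` and hands them to a function that forwards them to `write_fmt`.
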
### QEMU Arguments To see the serial output from QEMU, we need to use the `-serial` argument to redirect the output to stdout: ```toml # in Cargo.toml [package.metadata.bootimage] test-args = [ "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio" ] ``` When we run `cargo test` now, we see the test output directly in the console: ``` > cargo test Finished dev [unoptimized + debuginfo] target(s) in 0.02s Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a Building bootloader Finished release [optimized + debuginfo] target(s) in 0.02s Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/ deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio` Running 1 tests trivial assertion... [ok] ``` However, when a test fails, we still see the output inside QEMU because our panic handler still uses `println`. To simulate this, we can change the assertion in our `trivial_assertion` test to `assert_eq!(0, 1)`: ![QEMU printing "Hello World!" and "panicked at 'assertion failed: `(left == right)` left: `0`, right: `1`', src/main.rs:55:5](qemu-failed-test.png) We see that the panic message is still printed to the VGA buffer, while the other test output is printed to the serial port. The panic message is quite useful, so it would be useful to see it in the console too. ### Print an Error Message on Panic To exit QEMU with an error message on a panic, we can use [conditional compilation] to use a different panic handler in testing mode: [conditional compilation]: https://doc.rust-lang.org/1.30.0/book/first-edition/conditional-compilation.html ```rust // in src/main.rs // our existing panic handler #[cfg(not(test))] // new attribute #[panic_handler] fn panic(info: &PanicInfo) -> ! { println!("{}", info); loop {} } // our panic handler in test mode #[cfg(test)] #[panic_handler] fn panic(info: &PanicInfo) -> ! 
{ serial_println!("[failed]\n"); serial_println!("Error: {}\n", info); exit_qemu(QemuExitCode::Failed); loop {} } ``` For our test panic handler, we use `serial_println` instead of `println` and then exit QEMU with a failure exit code. Note that we still need an endless `loop` after the `exit_qemu` call because the compiler does not know that the `isa-debug-exit` device causes a program exit. Now QEMU also exits for failed tests and prints a useful error message on the console: ``` > cargo test Finished dev [unoptimized + debuginfo] target(s) in 0.02s Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a Building bootloader Finished release [optimized + debuginfo] target(s) in 0.02s Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/ deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio` Running 1 tests trivial assertion... [failed] Error: panicked at 'assertion failed: `(left == right)` left: `0`, right: `1`', src/main.rs:65:5 ``` Since we see all test output on the console now, we no longer need the QEMU window that pops up for a short time. So we can hide it completely. ### Hiding QEMU Since we report out the complete test results using the `isa-debug-exit` device and the serial port, we don't need the QEMU window anymore. We can hide it by passing the `-display none` argument to QEMU: ```toml # in Cargo.toml [package.metadata.bootimage] test-args = [ "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio", "-display", "none" ] ``` Now QEMU runs completely in the background and no window gets opened anymore. This is not only less annoying, but also allows our test framework to run in environments without a graphical user interface, such as CI services or [SSH] connections. [SSH]: https://en.wikipedia.org/wiki/Secure_Shell ### Timeouts Since `cargo test` waits until the test runner exits, a test that never returns can block the test runner forever. 
That's unfortunate, but not a big problem in practice since it's usually easy to avoid endless loops. In our case, however, endless loops can occur in various situations: - The bootloader fails to load our kernel, which causes the system to reboot endlessly. - The BIOS/UEFI firmware fails to load the bootloader, which causes the same endless rebooting. - The CPU enters a `loop {}` statement at the end of some of our functions, for example because the QEMU exit device doesn't work properly. - The hardware causes a system reset, for example when a CPU exception is not caught (explained in a future post). Since endless loops can occur in so many situations, the `bootimage` tool sets a timeout of 5 minutes for each test executable by default. If the test does not finish within this time, it is marked as failed and a "Timed Out" error is printed to the console. This feature ensures that tests that are stuck in an endless loop don't block `cargo test` forever. You can try it yourself by adding a `loop {}` statement in the `trivial_assertion` test. When you run `cargo test`, you see that the test is marked as timed out after 5 minutes. The timeout duration is [configurable][bootimage config] through a `test-timeout` key in the Cargo.toml: [bootimage config]: https://github.com/rust-osdev/bootimage#configuration ```toml # in Cargo.toml [package.metadata.bootimage] test-timeout = 300 # (in seconds) ``` If you don't want to wait 5 minutes for the `trivial_assertion` test to time out, you can temporarily decrease the above value. ### Insert Printing Automatically Our `trivial_assertion` test currently needs to print its own status information using `serial_print!`/`serial_println!`: ```rust #[test_case] fn trivial_assertion() { serial_print!("trivial assertion... "); assert_eq!(1, 1); serial_println!("[ok]"); } ``` Manually adding these print statements for every test we write is cumbersome, so let's update our `test_runner` to print these messages automatically. 
To do that, we need to create a new `Testable` trait:

```rust
// in src/main.rs

pub trait Testable {
    fn run(&self) -> ();
}
```

The trick now is to implement this trait for all types `T` that implement the [`Fn()` trait]:

[`Fn()` trait]: https://doc.rust-lang.org/stable/core/ops/trait.Fn.html

```rust
// in src/main.rs

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}
```

We implement the `run` function by first printing the function name using the [`any::type_name`] function. This function is implemented directly in the compiler and returns a string description of every type. For functions, the type is their name, so this is exactly what we want in this case. The `\t` character is the [tab character], which adds some alignment to the `[ok]` messages.

[`any::type_name`]: https://doc.rust-lang.org/stable/core/any/fn.type_name.html
[tab character]: https://en.wikipedia.org/wiki/Tab_character

After printing the function name, we invoke the test function through `self()`. This only works because we require that `self` implements the `Fn()` trait. After the test function returns, we print `[ok]` to indicate that the function did not panic.

The last step is to update our `test_runner` to use the new `Testable` trait:

```rust
// in src/main.rs

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Testable]) { // new
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run(); // new
    }
    exit_qemu(QemuExitCode::Success);
}
```

The only two changes are the type of the `tests` argument, which goes from `&[&dyn Fn()]` to `&[&dyn Testable]`, and the fact that we now call `test.run()` instead of `test()`.
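
The blanket implementation can be tried out on the host without QEMU. The sketch below is a variation made for illustration only: it uses the standard `print!`/`println!` macros in place of the serial macros, and its `run` method returns the reported name (unlike the kernel version, which returns `()`) so that the result can be inspected. `sample_test` and `run_all` are hypothetical names.

```rust
use core::any::type_name;

pub trait Testable {
    /// Runs the test and returns the name that was printed for it.
    fn run(&self) -> &'static str;
}

// Blanket implementation for everything that can be called like `fn()`.
impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) -> &'static str {
        let name = type_name::<T>();
        print!("{}...\t", name);
        self();
        println!("[ok]");
        name
    }
}

/// A trivial test function used as input for the runner.
fn sample_test() {
    assert_eq!(1 + 1, 2);
}

/// Runs every test in the slice and collects the reported names.
pub fn run_all(tests: &[&dyn Testable]) -> Vec<&'static str> {
    tests.iter().map(|test| test.run()).collect()
}
```

Calling `run_all(&[&sample_test])` runs the function through the trait object and reports a name that ends in `sample_test`, prefixed by the path of the enclosing crate or module. The exact prefix depends on where the code is compiled, and the documentation of `type_name` does not guarantee a stable format.
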
We can now remove the print statements from our `trivial_assertion` test since they're now printed automatically: ```rust // in src/main.rs #[test_case] fn trivial_assertion() { assert_eq!(1, 1); } ``` The `cargo test` output now looks like this: ``` Running 1 tests blog_os::trivial_assertion... [ok] ``` The function name now includes the full path to the function, which is useful when test functions in different modules have the same name. Otherwise, the output looks the same as before, but we no longer need to add print statements to our tests manually. ## Testing the VGA Buffer Now that we have a working test framework, we can create a few tests for our VGA buffer implementation. First, we create a very simple test to verify that `println` works without panicking: ```rust // in src/vga_buffer.rs #[test_case] fn test_println_simple() { println!("test_println_simple output"); } ``` The test just prints something to the VGA buffer. If it finishes without panicking, it means that the `println` invocation did not panic either. To ensure that no panic occurs even if many lines are printed and lines are shifted off the screen, we can create another test: ```rust // in src/vga_buffer.rs #[test_case] fn test_println_many() { for _ in 0..200 { println!("test_println_many output"); } } ``` We can also create a test function to verify that the printed lines really appear on the screen: ```rust // in src/vga_buffer.rs #[test_case] fn test_println_output() { let s = "Some test string that fits on a single line"; println!("{}", s); for (i, c) in s.chars().enumerate() { let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read(); assert_eq!(char::from(screen_char.ascii_character), c); } } ``` The function defines a test string, prints it using `println`, and then iterates over the screen characters of the static `WRITER`, which represents the VGA text buffer. 
Since `println` prints to the last screen line and then immediately appends a newline, the string should appear on line `BUFFER_HEIGHT - 2`. By using [`enumerate`], we count the number of iterations in the variable `i`, which we then use for loading the screen character corresponding to `c`. By comparing the `ascii_character` of the screen character with `c`, we ensure that each character of the string really appears in the VGA text buffer. [`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate As you can imagine, we could create many more test functions. For example, a function that tests that no panic occurs when printing very long lines and that they're wrapped correctly, or a function for testing that newlines, non-printable characters, and non-unicode characters are handled correctly. For the rest of this post, however, we will explain how to create _integration tests_ to test the interaction of different components together. ## Integration Tests The convention for [integration tests] in Rust is to put them into a `tests` directory in the project root (i.e., next to the `src` directory). Both the default test framework and custom test frameworks will automatically pick up and execute all tests in that directory. [integration tests]: https://doc.rust-lang.org/book/ch11-03-test-organization.html#integration-tests All integration tests are their own executables and completely separate from our `main.rs`. This means that each test needs to define its own entry point function. Let's create an example integration test named `basic_boot` to see how it works in detail: ```rust // in tests/basic_boot.rs #![no_std] #![no_main] #![feature(custom_test_frameworks)] #![test_runner(crate::test_runner)] #![reexport_test_harness_main = "test_main"] use core::panic::PanicInfo; #[unsafe(no_mangle)] // don't mangle the name of this function pub extern "C" fn _start() -> ! 
{ test_main(); loop {} } fn test_runner(tests: &[&dyn Fn()]) { unimplemented!(); } #[panic_handler] fn panic(info: &PanicInfo) -> ! { loop {} } ``` Since integration tests are separate executables, we need to provide all the crate attributes (`no_std`, `no_main`, `test_runner`, etc.) again. We also need to create a new entry point function `_start`, which calls the test entry point function `test_main`. We don't need any `cfg(test)` attributes because integration test executables are never built in non-test mode. We use the [`unimplemented`] macro that always panics as a placeholder for the `test_runner` function and just `loop` in the `panic` handler for now. Ideally, we want to implement these functions exactly as we did in our `main.rs` using the `serial_println` macro and the `exit_qemu` function. The problem is that we don't have access to these functions since tests are built completely separately from our `main.rs` executable. [`unimplemented`]: https://doc.rust-lang.org/core/macro.unimplemented.html If you run `cargo test` at this stage, you will get an endless loop because the panic handler loops endlessly. You need to use the `ctrl+c` keyboard shortcut for exiting QEMU. ### Create a Library To make the required functions available to our integration test, we need to split off a library from our `main.rs`, which can be included by other crates and integration test executables. To do this, we create a new `src/lib.rs` file: ```rust // src/lib.rs #![no_std] ``` Like the `main.rs`, the `lib.rs` is a special file that is automatically recognized by cargo. The library is a separate compilation unit, so we need to specify the `#![no_std]` attribute again. 

To make our library work with `cargo test`, we also need to move the test functions and attributes from `main.rs` to `lib.rs`:

```rust
// in src/lib.rs

#![cfg_attr(test, no_main)]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;

pub trait Testable {
    fn run(&self) -> ();
}

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}

pub fn test_runner(tests: &[&dyn Testable]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run();
    }
    exit_qemu(QemuExitCode::Success);
}

pub fn test_panic_handler(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    test_main();
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    test_panic_handler(info)
}
```

To make our `test_runner` available to executables and integration tests, we make it public and don't apply the `cfg(test)` attribute to it. We also factor out the implementation of our panic handler into a public `test_panic_handler` function, so that it is available for executables too.

Since our `lib.rs` is tested independently of our `main.rs`, we need to add a `_start` entry point and a panic handler when the library is compiled in test mode. By using the [`cfg_attr`] crate attribute, we conditionally enable the `no_main` attribute in this case.
[`cfg_attr`]: https://doc.rust-lang.org/reference/conditional-compilation.html#the-cfg_attr-attribute We also move over the `QemuExitCode` enum and the `exit_qemu` function and make them public: ```rust // in src/lib.rs #[derive(Debug, Clone, Copy, PartialEq, Eq)] #[repr(u32)] pub enum QemuExitCode { Success = 0x10, Failed = 0x11, } pub fn exit_qemu(exit_code: QemuExitCode) { use x86_64::instructions::port::Port; unsafe { let mut port = Port::new(0xf4); port.write(exit_code as u32); } } ``` Now executables and integration tests can import these functions from the library and don't need to define their own implementations. To also make `println` and `serial_println` available, we move the module declarations too: ```rust // in src/lib.rs pub mod serial; pub mod vga_buffer; ``` We make the modules public to make them usable outside of our library. This is also required for making our `println` and `serial_println` macros usable since they use the `_print` functions of the modules. Now we can update our `main.rs` to use the library: ```rust // in src/main.rs #![no_std] #![no_main] #![feature(custom_test_frameworks)] #![test_runner(blog_os::test_runner)] #![reexport_test_harness_main = "test_main"] use core::panic::PanicInfo; use blog_os::println; #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { println!("Hello World{}", "!"); #[cfg(test)] test_main(); loop {} } /// This function is called on panic. #[cfg(not(test))] #[panic_handler] fn panic(info: &PanicInfo) -> ! { println!("{}", info); loop {} } #[cfg(test)] #[panic_handler] fn panic(info: &PanicInfo) -> ! { blog_os::test_panic_handler(info) } ``` The library is usable like a normal external crate. It is called `blog_os`, like our crate. The above code uses the `blog_os::test_runner` function in the `test_runner` attribute and the `blog_os::test_panic_handler` function in our `cfg(test)` panic handler. It also imports the `println` macro to make it available to our `_start` and `panic` functions. 
At this point, `cargo run` and `cargo test` should work again. Of course, `cargo test` still loops endlessly (you can exit with `ctrl+c`). Let's fix this by using the required library functions in our integration test. ### Completing the Integration Test Like our `src/main.rs`, our `tests/basic_boot.rs` executable can import types from our new library. This allows us to import the missing components to complete our test: ```rust // in tests/basic_boot.rs #![test_runner(blog_os::test_runner)] #[panic_handler] fn panic(info: &PanicInfo) -> ! { blog_os::test_panic_handler(info) } ``` Instead of reimplementing the test runner, we use the `test_runner` function from our library by changing the `#![test_runner(crate::test_runner)]` attribute to `#![test_runner(blog_os::test_runner)]`. We then don't need the `test_runner` stub function in `basic_boot.rs` anymore, so we can remove it. For our `panic` handler, we call the `blog_os::test_panic_handler` function like we did in our `main.rs`. Now `cargo test` exits normally again. When you run it, you will see that it builds and runs the tests for our `lib.rs`, `main.rs`, and `basic_boot.rs` separately after each other. For the `main.rs` and the `basic_boot` integration tests, it reports "Running 0 tests" since these files don't have any functions annotated with `#[test_case]`. We can now add tests to our `basic_boot.rs`. For example, we can test that `println` works without panicking, like we did in the VGA buffer tests: ```rust // in tests/basic_boot.rs use blog_os::println; #[test_case] fn test_println() { println!("test_println output"); } ``` When we run `cargo test` now, we see that it finds and executes the test function. The test might seem a bit useless right now since it's almost identical to one of the VGA buffer tests. 
However, in the future, the `_start` functions of our `main.rs` and `lib.rs` might grow and call various initialization routines before running the `test_main` function, so that the two tests are executed in very different environments.

By testing `println` in a `basic_boot` environment without calling any initialization routines in `_start`, we can ensure that `println` works right after booting. This is important because we rely on it, e.g., for printing panic messages.

### Future Tests

The power of integration tests is that they're treated as completely separate executables. This gives them complete control over the environment, which makes it possible to test that the code interacts correctly with the CPU or hardware devices.

Our `basic_boot` test is a very simple example of an integration test. In the future, our kernel will become much more featureful and interact with the hardware in various ways. By adding integration tests, we can ensure that these interactions work (and keep working) as expected. Some ideas for possible future tests are:

- **CPU Exceptions**: When the code performs invalid operations (e.g., divides by zero), the CPU throws an exception. The kernel can register handler functions for such exceptions. An integration test could verify that the correct exception handler is called when a CPU exception occurs or that the execution continues correctly after a resolvable exception.
- **Page Tables**: Page tables define which memory regions are valid and accessible. By modifying the page tables, it is possible to allocate new memory regions, for example when launching programs. An integration test could modify the page tables in the `_start` function and verify that the modifications have the desired effects in `#[test_case]` functions.
- **Userspace Programs**: Userspace programs are programs with limited access to the system's resources. For example, they don't have access to kernel data structures or to the memory of other programs. An integration test could launch userspace programs that perform forbidden operations and verify that the kernel prevents them all.

As you can imagine, many more tests are possible. By adding such tests, we can ensure that we don't break them accidentally when we add new features to our kernel or refactor our code. This is especially important when our kernel becomes larger and more complex.

### Tests that Should Panic

The test framework of the standard library supports a [`#[should_panic]` attribute][should_panic] that allows constructing tests that should fail. This is useful, for example, to verify that a function fails when an invalid argument is passed. Unfortunately, this attribute isn't supported in `#[no_std]` crates since it requires support from the standard library.

[should_panic]: https://doc.rust-lang.org/rust-by-example/testing/unit_testing.html#testing-panics

While we can't use the `#[should_panic]` attribute in our kernel, we can get similar behavior by creating an integration test that exits with a success exit code from the panic handler. Let's start creating such a test with the name `should_panic`:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{QemuExitCode, exit_qemu, serial_println};

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

This test is still incomplete as it doesn't define a `_start` function or any of the custom test runner attributes yet. Let's add the missing parts:

```rust
// in tests/should_panic.rs

#![feature(custom_test_frameworks)]
#![test_runner(test_runner)]
#![reexport_test_harness_main = "test_main"]

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    test_main();

    loop {}
}

pub fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test();
        // reaching this point means the test returned instead of panicking
        serial_println!("[test did not panic]");
        exit_qemu(QemuExitCode::Failed);
    }
    exit_qemu(QemuExitCode::Success);
}
```

Instead of reusing the `test_runner` from our `lib.rs`, the test defines its own `test_runner` function that exits with a failure exit code when a test returns without panicking (we want our tests to panic). If no test function is defined, the runner exits with a success exit code. Since the runner always exits after running a single test, it does not make sense to define more than one `#[test_case]` function.

Now we can create a test that should fail:

```rust
// in tests/should_panic.rs

use blog_os::serial_print;

#[test_case]
fn should_fail() {
    serial_print!("should_panic::should_fail...\t");
    assert_eq!(0, 1);
}
```

The test uses `assert_eq` to assert that `0` and `1` are equal. Of course, this fails, so our test panics as desired. Note that we need to manually print the function name using `serial_print!` here because we don't use the `Testable` trait.

When we run the test through `cargo test --test should_panic` we see that it is successful because the test panicked as expected. When we comment out the assertion and run the test again, we see that it indeed fails with the _"test did not panic"_ message.

A significant drawback of this approach is that it only works for a single test function. With multiple `#[test_case]` functions, only the first function is executed because the execution cannot continue after the panic handler has been called. I currently don't know of a good way to solve this problem, so let me know if you have an idea!

### No Harness Tests

For integration tests that only have a single test function (like our `should_panic` test), the test runner isn't really needed. For cases like this, we can disable the test runner completely and run our test directly in the `_start` function.

The key to this is to disable the `harness` flag for the test in the `Cargo.toml`, which defines whether a test runner is used for an integration test. When it's set to `false`, both the default test runner and the custom test runner feature are disabled, so that the test is treated like a normal executable.

Let's disable the `harness` flag for our `should_panic` test:

```toml
# in Cargo.toml

[[test]]
name = "should_panic"
harness = false
```

Now we vastly simplify our `should_panic` test by removing the `test_runner`-related code. The result looks like this:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{exit_qemu, serial_print, serial_println, QemuExitCode};

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    should_fail();
    serial_println!("[test did not panic]");
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

fn should_fail() {
    serial_print!("should_panic::should_fail...\t");
    assert_eq!(0, 1);
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

We now call the `should_fail` function directly from our `_start` function and exit with a failure exit code if it returns. When we run `cargo test --test should_panic` now, we see that the test behaves exactly as before.

Apart from creating `should_panic` tests, disabling the `harness` flag can also be useful for complex integration tests, for example, when the individual test functions have side effects and need to be run in a specified order.

## Summary

Testing is a very useful technique to ensure that certain components have the desired behavior. Even if tests cannot show the absence of bugs, they're still a useful tool for finding them and especially for avoiding regressions.

This post explained how to set up a test framework for our Rust kernel. We used Rust's custom test frameworks feature to implement support for a simple `#[test_case]` attribute in our bare-metal environment. Using the `isa-debug-exit` device of QEMU, our test runner can exit QEMU after running the tests and report the test status. To print error messages to the console instead of the VGA buffer, we created a basic driver for the serial port.

After creating some tests for our `println` macro, we explored integration tests in the second half of the post. We learned that they live in the `tests` directory and are treated as completely separate executables. To give them access to the `exit_qemu` function and the `serial_println` macro, we moved most of our code into a library that can be imported by all executables and integration tests. Since integration tests run in their own separate environment, they make it possible to test interactions with the hardware or to create tests that should panic.

We now have a test framework that runs in a realistic environment inside QEMU. By creating more tests in future posts, we can keep our kernel maintainable when it becomes more complex.

## What's next?

In the next post, we will explore _CPU exceptions_. These exceptions are thrown by the CPU when something illegal happens, such as a division by zero or an access to an unmapped memory page (a so-called “page fault”). Being able to catch and examine these exceptions is very important for debugging future errors. Exception handling is also very similar to the handling of hardware interrupts, which is required for keyboard support.
================================================
FILE: blog/content/edition-2/posts/04-testing/index.pt-BR.md
================================================
+++
title = "Testes"
weight = 4
path = "pt-BR/testing"
date = 2019-04-27

[extra]
chapter = "O Básico"
comments_search_term = 1009
# Please update this when updating the translation
translation_based_on_commit = "33b7979468235b8637584e91e4c599cef37d9687"
# GitHub usernames of the people that translated this post
translators = ["richarddalves"]
+++

This post explores unit and integration testing in `no_std` executables. We will use Rust's support for custom test frameworks to execute test functions inside our kernel. To report the results out of QEMU, we will use different features of QEMU and the `bootimage` tool.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-04`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-04

## Requirements

This post replaces the (now deprecated) [_Unit Testing_] and [_Integration Tests_] posts. It assumes that you have followed the [_A Minimal Rust Kernel_] post after 2019-04-27. Mainly, it requires that you have a `.cargo/config.toml` file that [sets a default target][define um alvo padrão] and [defines a runner executable][define um executável runner].
[_Unit Testing_]: @/edition-2/posts/deprecated/04-unit-testing/index.md
[_Integration Tests_]: @/edition-2/posts/deprecated/05-integration-tests/index.md
[_A Minimal Rust Kernel_]: @/edition-2/posts/02-minimal-rust-kernel/index.pt-BR.md
[define um alvo padrão]: @/edition-2/posts/02-minimal-rust-kernel/index.pt-BR.md#definir-um-alvo-padrao
[define um executável runner]: @/edition-2/posts/02-minimal-rust-kernel/index.pt-BR.md#usando-cargo-run

## Testing in Rust

Rust has a [built-in test framework] that is capable of running unit tests without the need to set anything up. Just create a function that checks some results through assertions and add the `#[test]` attribute to the function header. Then `cargo test` will automatically find and execute all test functions of your crate.

[built-in test framework]: https://doc.rust-lang.org/book/ch11-00-testing.html

To enable testing for our kernel binary, we can set the `test` flag in the Cargo.toml to `true`:

```toml
# in Cargo.toml

[[bin]]
name = "blog_os"
test = true
bench = false
```

This [`[[bin]]` section](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#configuring-a-target) specifies how `cargo` should compile our `blog_os` executable. The `test` field specifies whether testing is supported for this executable. We set `test = false` in the first post to [make `rust-analyzer` happy](@/edition-2/posts/01-freestanding-rust-binary/index.pt-BR.md#deixando-rust-analyzer-feliz), but now we want to enable testing, so we set it back to `true`.

Unfortunately, testing is a bit more complicated for `no_std` applications such as our kernel. The problem is that Rust's test framework implicitly uses the built-in [`test`] library, which depends on the standard library. This means that we can't use the default test framework for our `#[no_std]` kernel.
[`test`]: https://doc.rust-lang.org/test/index.html

We can see this when we try to run `cargo test` in our project:

```
> cargo test
   Compiling blog_os v0.1.0 (/…/blog_os)
error[E0463]: can't find crate for `test`
```

Since the `test` crate depends on the standard library, it is not available for our bare metal target. While porting the `test` crate to a `#[no_std]` context [is possible][utest], it is highly unstable and requires some hacks, such as redefining the `panic` macro.

[utest]: https://github.com/japaric/utest

### Custom Test Frameworks

Fortunately, Rust supports replacing the default test framework through the unstable [`custom_test_frameworks`] feature. This feature requires no external libraries and thus also works in `#[no_std]` environments. It works by collecting all functions annotated with a `#[test_case]` attribute and then invoking a user-specified runner function with the list of tests as an argument. Thus, it gives the implementation maximal control over the test process.

[`custom_test_frameworks`]: https://doc.rust-lang.org/unstable-book/language-features/custom-test-frameworks.html

The disadvantage compared to the default test framework is that many advanced features, such as [`should_panic` tests][testes `should_panic`], are not available. Instead, it is up to the implementation to provide such features itself if needed. This is ideal for us since we have a very special execution environment where the default implementations of such advanced features probably wouldn't work anyway. For example, the `#[should_panic]` attribute relies on stack unwinding to catch the panics, which we disabled for our kernel.
[testes `should_panic`]: https://doc.rust-lang.org/book/ch11-01-writing-tests.html#checking-for-panics-with-should_panic

To implement a custom test framework for our kernel, we add the following to our `main.rs`:

```rust
// in src/main.rs

#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
}
```

Our runner just prints a short debug message and then calls each test function in the list. The argument type `&[&dyn Fn()]` is a [_slice_] of [_trait object_] references of the [_Fn()_] trait. It is basically a list of references to types that can be called like a function. Since the function is useless for non-test runs, we use the `#[cfg(test)]` attribute to include it only for tests.

[_slice_]: https://doc.rust-lang.org/std/primitive.slice.html
[_trait object_]: https://doc.rust-lang.org/1.30.0/book/first-edition/trait-objects.html
[_Fn()_]: https://doc.rust-lang.org/std/ops/trait.Fn.html

When we run `cargo test` now, we see that it succeeds (if it doesn't, see the note below). However, we still see our "Hello World" instead of the message from our `test_runner`. The reason is that our `_start` function is still used as the entry point. The custom test frameworks feature generates a `main` function that calls `test_runner`, but this function is ignored because we use the `#[no_main]` attribute and provide our own entry point.
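To get a feel for the `&[&dyn Fn()]` parameter type outside the kernel, here is a minimal host-side sketch. It assumes an ordinary `std` build rather than our bare-metal target, and the names `run_all`, `first`, and `second` are made up for illustration. Instead of printing to the VGA buffer, the hypothetical runner simply returns how many tests it executed.

```rust
/// Hypothetical host-side runner: calls every test in the slice and
/// returns how many tests were executed.
pub fn run_all(tests: &[&dyn Fn()]) -> usize {
    for test in tests {
        // A panic inside `test` would abort the whole run at this point.
        test();
    }
    tests.len()
}

// Two illustrative test functions (assumed names, not part of the kernel).
fn first() {
    assert_eq!(1, 1);
}

fn second() {
    assert_ne!(0, 1);
}
```

Plain function items coerce to `&dyn Fn()` when the expected type is known, so a call such as `run_all(&[&first, &second])` passes both functions as a slice, which is roughly the shape of the list that the custom test frameworks feature hands to the runner.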
**Note:** There is currently a bug in cargo that leads to "duplicate lang item" errors on `cargo test` in some cases. It occurs when you have set `panic = "abort"` for a profile in your `Cargo.toml`. Try removing it, then `cargo test` should work. Alternatively, if that doesn't work, add `panic-abort-tests = true` to the `[unstable]` section of your `.cargo/config.toml` file. See the [cargo issue](https://github.com/rust-lang/cargo/issues/7359) for more information on this.
To fix this, we first need to change the name of the generated function to something different than `main` through the `reexport_test_harness_main` attribute. Then we can call the renamed function from our `_start` function:

```rust
// in src/main.rs

#![reexport_test_harness_main = "test_main"]

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}
```

We set the name of the test framework entry function to `test_main` and call it from our `_start` entry point. We use [conditional compilation][compilação condicional] to add the call to `test_main` only in test contexts because the function is not generated on a normal run.

When we now execute `cargo test`, we see the "Running 0 tests" message from our `test_runner` on the screen. We are now ready to create our first test function:

```rust
// in src/main.rs

#[test_case]
fn trivial_assertion() {
    print!("trivial assertion... ");
    assert_eq!(1, 1);
    println!("[ok]");
}
```

When we run `cargo test` now, we see the following output:

![QEMU printing "Hello World!", "Running 1 tests", and "trivial assertion... [ok]"](qemu-test-runner-output.png)

The `tests` slice passed to our `test_runner` function now contains a reference to the `trivial_assertion` function. From the `trivial assertion... [ok]` output on the screen, we see that the test was called and that it succeeded.

After executing the tests, our `test_runner` returns to the `test_main` function, which in turn returns to our `_start` entry point function. At the end of `_start`, we enter an endless loop because the entry point function is not allowed to return. This is a problem, because we want `cargo test` to exit after running all tests.

## Exiting QEMU

Right now, we have an endless loop at the end of our `_start` function and need to close QEMU manually on each execution of `cargo test`. This is unfortunate because we also want to run `cargo test` in scripts without user interaction.
The clean solution to this would be to implement a proper way to shut down our OS. Unfortunately, this is relatively complex because it requires implementing support for either the [APM] or [ACPI] power management standard.

[APM]: https://wiki.osdev.org/APM
[ACPI]: https://wiki.osdev.org/ACPI

Luckily, there is an escape hatch: QEMU supports a special `isa-debug-exit` device, which provides an easy way to exit QEMU from the guest system. To enable it, we need to pass a `-device` argument to QEMU. We can do so by adding a `package.metadata.bootimage.test-args` configuration key in our `Cargo.toml`:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = ["-device", "isa-debug-exit,iobase=0xf4,iosize=0x04"]
```

The `bootimage runner` appends the `test-args` to the default QEMU command for all test executables. For a normal `cargo run`, the arguments are ignored.

Together with the device name (`isa-debug-exit`), we pass the two parameters `iobase` and `iosize` that specify the _I/O port_ through which the device can be reached from our kernel.

### I/O Ports

There are two different approaches for communicating between the CPU and peripheral hardware on x86, **memory-mapped I/O** and **port-mapped I/O**. We already used memory-mapped I/O for accessing the [VGA text buffer] through the memory address `0xb8000`. This address is not mapped to RAM but to some memory on the VGA device.

[VGA text buffer]: @/edition-2/posts/03-vga-text-buffer/index.pt-BR.md

In contrast, port-mapped I/O uses a separate I/O bus for communication. Each connected peripheral has one or more port numbers. To communicate with such an I/O port, there are special CPU instructions called `in` and `out`, which take a port number and a data byte (there are also variations of these commands that allow sending a `u16` or `u32`).

The `isa-debug-exit` device uses port-mapped I/O. The `iobase` parameter specifies on which port address the device should live (`0xf4` is a [generally unused][list of x86 I/O ports] port on the x86's IO bus) and the `iosize` specifies the port size (`0x04` means four bytes).

[list of x86 I/O ports]: https://wiki.osdev.org/I/O_Ports#The_list

### Using the Exit Device

The functionality of the `isa-debug-exit` device is very simple. When a `value` is written to the I/O port specified by `iobase`, it causes QEMU to exit with [exit status] `(value << 1) | 1`. So when we write `0` to the port, QEMU will exit with exit status `(0 << 1) | 1 = 1`, and when we write `1` to the port, it will exit with exit status `(1 << 1) | 1 = 3`.

[exit status]: https://en.wikipedia.org/wiki/Exit_status

Instead of manually invoking the `in` and `out` assembly instructions, we use the abstractions provided by the [`x86_64`] crate. To add a dependency on that crate, we add it to the `dependencies` section in our `Cargo.toml`:

[`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/

```toml
# in Cargo.toml

[dependencies]
x86_64 = "0.14.2"
```

Now we can use the [`Port`] type provided by the crate to create an `exit_qemu` function:

[`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html

```rust
// in src/main.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

pub fn exit_qemu(exit_code: QemuExitCode) {
    use x86_64::instructions::port::Port;

    unsafe {
        // 0xf4 is the `iobase` of the `isa-debug-exit` device
        let mut port = Port::new(0xf4);
        port.write(exit_code as u32);
    }
}
```

The function creates a new [`Port`] at `0xf4`, which is the `iobase` of the `isa-debug-exit` device. Then it writes the passed exit code to the port. We use `u32` because we specified the `iosize` of the `isa-debug-exit` device as 4 bytes. Both operations are unsafe because writing to an I/O port can generally result in arbitrary behavior.
To specify the exit status, we create a `QemuExitCode` enum. The idea is to exit with the success exit code if all tests succeeded and with the failure exit code otherwise. The enum is marked as `#[repr(u32)]` to represent each variant by a `u32` integer. We use the exit code `0x10` for success and `0x11` for failure. The actual exit codes don't matter much, as long as they don't clash with the default exit codes of QEMU. For example, using exit code `0` for success is not a good idea because it becomes `(0 << 1) | 1 = 1` after the transformation, which is the default exit code when QEMU fails to run. So we could not differentiate a QEMU error from a successful test run.

We can now update our `test_runner` to exit QEMU after all tests have run:

```rust
// in src/main.rs

fn test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
    // new
    exit_qemu(QemuExitCode::Success);
}
```

When we run `cargo test` now, we see that QEMU immediately closes after executing the tests. The problem is that `cargo test` interprets the test as failed even though we passed our `Success` exit code:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running target/x86_64-blog_os/debug/deps/blog_os-5804fc7d2dd4c9be
Building bootloader
   Compiling bootloader v0.5.3 (/home/philipp/Documents/bootloader)
    Finished release [optimized + debuginfo] target(s) in 1.07s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-5804fc7d2dd4c9be.bin -device isa-debug-exit,iobase=0xf4,
    iosize=0x04`
error: test failed, to rerun pass '--bin blog_os'
```

The problem is that `cargo test` considers all exit codes other than `0` as failure.
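The exit statuses that the host actually observes for our two `QemuExitCode` values follow from the `(value << 1) | 1` rule. The following is only a host-side sketch of that arithmetic; `qemu_exit_status` is a made-up helper name and is not part of QEMU, `bootimage`, or the `x86_64` crate.

```rust
/// Exit status that QEMU reports after `value` is written to the
/// `isa-debug-exit` port, following the `(value << 1) | 1` rule.
pub fn qemu_exit_status(value: u32) -> u32 {
    (value << 1) | 1
}

/// The two values our kernel writes, mirroring the `QemuExitCode` enum.
pub const SUCCESS: u32 = 0x10;
pub const FAILED: u32 = 0x11;
```

With these values, `qemu_exit_status(SUCCESS)` is `33` and `qemu_exit_status(FAILED)` is `35`. Both are nonzero, and both differ from the status `1` that QEMU uses for its own failures.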
### Success Exit Code

To work around this, `bootimage` provides a `test-success-exit-code` configuration key that maps a specified exit code to the exit code `0`:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = […]
test-success-exit-code = 33         # (0x10 << 1) | 1
```

With this configuration, `bootimage` maps our success exit code to exit code 0, so that `cargo test` correctly recognizes the success case and does not count the test as failed.

Our test runner now automatically closes QEMU and correctly reports the test results. We still see the QEMU window open for a very short time, but it is not long enough to read the results. It would be nice if we could print the test results to the console instead, so that we can still see them after QEMU exits.

## Printing to the Console

To see the test output on the console, we need to send the data from our kernel to the host system somehow. There are various ways to achieve this, for example, by sending the data over a TCP network interface. However, setting up a networking stack is quite a complex task, so we will choose a simpler solution instead.

### Serial Port

A simple way to send the data is to use the [serial port], an old interface standard which is no longer found in modern computers. It is easy to program and QEMU can redirect the bytes sent over serial to the host's standard output or a file.

[serial port]: https://en.wikipedia.org/wiki/Serial_port

The chips implementing a serial interface are called [UARTs]. There are [lots of UART models][muitos modelos de UART] on x86, but fortunately the only differences between them are some advanced features we don't need. The common UARTs today are all compatible with the [16550 UART][UART 16550], so we will use that model for our testing framework.
[UARTs]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter
[muitos modelos de UART]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter#Models
[UART 16550]: https://en.wikipedia.org/wiki/16550_UART

We will use the [`uart_16550`] crate to initialize the UART and send data over the serial port. To add it as a dependency, we update our `Cargo.toml` and `main.rs`:

[`uart_16550`]: https://docs.rs/uart_16550

```toml
# in Cargo.toml

[dependencies]
uart_16550 = "0.2.0"
```

The `uart_16550` crate contains a `SerialPort` struct that represents the UART registers, but we still need to construct an instance of it ourselves. For that, we create a new `serial` module with the following content:

```rust
// in src/main.rs

mod serial;
```

```rust
// in src/serial.rs

use uart_16550::SerialPort;
use spin::Mutex;
use lazy_static::lazy_static;

lazy_static! {
    pub static ref SERIAL1: Mutex<SerialPort> = {
        // 0x3F8 is the standard port number of the first serial interface
        let mut serial_port = unsafe { SerialPort::new(0x3F8) };
        serial_port.init();
        Mutex::new(serial_port)
    };
}
```

Like with the [VGA text buffer][vga lazy-static], we use `lazy_static` and a spinlock to create a `static` writer instance. By using `lazy_static` we can ensure that the `init` method is called exactly once on its first use.

Like the `isa-debug-exit` device, the UART is programmed using port I/O. Since the UART is more complex, it uses multiple I/O ports for programming different device registers. The unsafe `SerialPort::new` function expects the address of the first I/O port of the UART as an argument, from which it can calculate the addresses of all needed ports. We're passing the port address `0x3F8`, which is the standard port number for the first serial interface.
[vga lazy-static]: @/edition-2/posts/03-vga-text-buffer/index.pt-BR.md#lazy-statics

To make the serial port easily usable, we add `serial_print!` and `serial_println!` macros:

```rust
// in src/serial.rs

#[doc(hidden)]
pub fn _print(args: ::core::fmt::Arguments) {
    use core::fmt::Write;
    SERIAL1.lock().write_fmt(args).expect("Printing to serial failed");
}

/// Prints to the host through the serial interface.
#[macro_export]
macro_rules! serial_print {
    ($($arg:tt)*) => {
        $crate::serial::_print(format_args!($($arg)*));
    };
}

/// Prints to the host through the serial interface, appending a newline.
#[macro_export]
macro_rules! serial_println {
    () => ($crate::serial_print!("\n"));
    ($fmt:expr) => ($crate::serial_print!(concat!($fmt, "\n")));
    ($fmt:expr, $($arg:tt)*) => ($crate::serial_print!(
        concat!($fmt, "\n"), $($arg)*));
}
```

The implementation is very similar to the implementation of our `print` and `println` macros. Since the `SerialPort` type already implements the [`fmt::Write`] trait, we don't need to provide our own implementation.

[`fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

Now we can print to the serial interface instead of the VGA text buffer in our test code:

```rust
// in src/main.rs

#[cfg(test)]
fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    […]
}

#[test_case]
fn trivial_assertion() {
    serial_print!("trivial assertion... ");
    assert_eq!(1, 1);
    serial_println!("[ok]");
}
```

Note that the `serial_println` macro lives directly under the root namespace because we used the `#[macro_export]` attribute, so importing it through `use crate::serial::serial_println` will not work.
### QEMU Arguments

To see the serial output from QEMU, we need to use the `-serial` argument to redirect the output to stdout:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = [
    "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio"
]
```

When we run `cargo test` now, we see the test output directly in the console:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a
Building bootloader
    Finished release [optimized + debuginfo] target(s) in 0.02s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device
    isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio`
Running 1 tests
trivial assertion... [ok]
```

However, when a test fails, we still see the output inside QEMU because our panic handler still uses `println`. To simulate this, we can change the assertion in our `trivial_assertion` test to `assert_eq!(0, 1)`:

![QEMU printing "Hello World!" and "panicked at 'assertion failed: `(left == right)` left: `0`, right: `1`', src/main.rs:55:5](qemu-failed-test.png)

We see that the panic message is still printed to the VGA buffer, while the other test output is printed to the serial port. The panic message is quite useful, so it would be helpful to see it in the console too.

### Print an Error Message on Panic

To exit QEMU with an error message on a panic, we can use [conditional compilation][compilação condicional] to use a different panic handler in testing mode:

[compilação condicional]: https://doc.rust-lang.org/1.30.0/book/first-edition/conditional-compilation.html

```rust
// in src/main.rs

// our existing panic handler
#[cfg(not(test))] // new attribute
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}

// our panic handler in test mode
#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}
```

For our test panic handler, we use `serial_println` instead of `println` and then exit QEMU with a failure exit code. Note that we still need an endless `loop` after the `exit_qemu` call because the compiler does not know that the `isa-debug-exit` device causes a program exit.

Now QEMU also exits for failed tests and prints a useful error message on the console:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a
Building bootloader
    Finished release [optimized + debuginfo] target(s) in 0.02s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device
    isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio`
Running 1 tests
trivial assertion... [failed]

Error: panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/main.rs:65:5
```

Since we now see all test output on the console, we no longer need the QEMU window that pops up for a short time. So we can hide it completely.

### Hiding QEMU

Since we report the complete test results using the `isa-debug-exit` device and the serial port, we don't need the QEMU window anymore. We can hide it by passing the `-display none` argument to QEMU:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = [
    "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio",
    "-display", "none"
]
```

Now QEMU runs completely in the background and no window gets opened anymore. This is not only less annoying, but also allows our test framework to run in environments without a graphical user interface, such as CI services or [SSH] connections.
[SSH]: https://en.wikipedia.org/wiki/Secure_Shell

### Timeouts

Since `cargo test` waits until the test runner exits, a test that never returns can block the test runner forever. That's unfortunate, but not a big problem in practice since it's usually easy to avoid endless loops. In our case, however, endless loops can occur in various situations:

- The bootloader fails to load our kernel, which causes the system to reboot endlessly.
- The BIOS/UEFI firmware fails to load the bootloader, which causes the same endless rebooting.
- The CPU enters a `loop {}` statement at the end of some of our functions, for example because the QEMU exit device doesn't work properly.
- The hardware causes a system reset, for example when a CPU exception is not caught (explained in a future post).

Since endless loops can occur in so many situations, the `bootimage` tool sets a timeout of 5 minutes for each test executable by default. If the test does not finish within this time, it is marked as failed and a "Timed Out" error is printed to the console. This feature ensures that tests that are stuck in an endless loop don't block `cargo test` forever.

You can try it yourself by adding a `loop {}` statement in the `trivial_assertion` test. When you run `cargo test`, you see that the test is marked as timed out after 5 minutes. The timeout duration is [configurable][bootimage config] through a `test-timeout` key in the Cargo.toml:

[bootimage config]: https://github.com/rust-osdev/bootimage#configuration

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-timeout = 300          # (in seconds)
```

If you don't want to wait 5 minutes for the `trivial_assertion` test to time out, you can temporarily decrease the above value.
### Insert Printing Automatically

Our `trivial_assertion` test currently needs to print its own status information using `serial_print!`/`serial_println!`:

```rust
#[test_case]
fn trivial_assertion() {
    serial_print!("trivial assertion... ");
    assert_eq!(1, 1);
    serial_println!("[ok]");
}
```

Manually adding these print statements for every test we write is cumbersome, so let's update our `test_runner` to print these messages automatically. To do that, we need to create a new `Testable` trait:

```rust
// in src/main.rs

pub trait Testable {
    fn run(&self) -> ();
}
```

The trick now is to implement this trait for all types `T` that implement the [`Fn()` trait]:

[`Fn()` trait]: https://doc.rust-lang.org/stable/core/ops/trait.Fn.html

```rust
// in src/main.rs

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        // print the function name before running the test
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}
```

We implement the `run` function by first printing the function name using the [`any::type_name`] function. This function is implemented directly in the compiler and returns a string description of every type. For functions, the type is their name, so this is exactly what we want in this case. The `\t` character is the [tab character], which adds some alignment to the `[ok]` messages.

[`any::type_name`]: https://doc.rust-lang.org/stable/core/any/fn.type_name.html
[tab character]: https://en.wikipedia.org/wiki/Tab_character

After printing the function name, we invoke the test function through `self()`. This only works because we require that `self` implements the `Fn()` trait. After the test function returns, we print `[ok]` to indicate that the function did not panic.
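The blanket implementation can also be tried on the host. The sketch below is an assumption-laden variant, not the kernel code: it uses `std::any::type_name` instead of the `core` path, and `run` returns the name instead of printing it over the serial port so that the result can be inspected. The exact string produced by `type_name` is not guaranteed by the standard library, so the sketch only relies on it ending with the function name.

```rust
use std::any::type_name;

pub trait Testable {
    /// Runs the test and returns its name (the kernel version prints it).
    fn run(&self) -> &'static str;
}

// Blanket implementation for everything that can be called like `fn()`.
impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) -> &'static str {
        let name = type_name::<T>();
        self(); // a panic in the test propagates to the caller
        name
    }
}

// Illustrative test function, mirroring the one from the post.
fn trivial_assertion() {
    assert_eq!(1, 1);
}
```

Because a function item implements `Fn()`, a reference to it can be used as a `&dyn Testable`, which is what allows a runner to accept a `&[&dyn Testable]` slice.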
The last step is to update our `test_runner` to use the new `Testable` trait:

```rust
// in src/main.rs

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Testable]) { // new
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run(); // new
    }
    exit_qemu(QemuExitCode::Success);
}
```

The only two changes are the type of the `tests` argument from `&[&dyn Fn()]` to `&[&dyn Testable]` and the fact that we now call `test.run()` instead of `test()`.

We can now remove the print statements from our `trivial_assertion` test since they're now printed automatically:

```rust
// in src/main.rs

#[test_case]
fn trivial_assertion() {
    assert_eq!(1, 1);
}
```

The `cargo test` output now looks like this:

```
Running 1 tests
blog_os::trivial_assertion... [ok]
```

The function name now includes the full path to the function, which is useful when test functions in different modules have the same name. Otherwise, the output looks the same as before, but we no longer need to add print statements to our tests manually.

## Testing the VGA Buffer

Now that we have a working test framework, we can create a few tests for our VGA buffer implementation. First, we create a very simple test to verify that `println` works without panicking:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_simple() {
    println!("test_println_simple output");
}
```

The test just prints something to the VGA buffer. If it finishes without panicking, it means that the `println` invocation did not panic either.
To ensure that no panic occurs even if many lines are printed and lines are shifted off the screen, we can create another test:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_many() {
    for _ in 0..200 {
        println!("test_println_many output");
    }
}
```

We can also create a test function to verify that the printed lines really appear on the screen:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    let s = "Some test string that fits on a single line";
    println!("{}", s);
    for (i, c) in s.chars().enumerate() {
        let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read();
        assert_eq!(char::from(screen_char.ascii_character), c);
    }
}
```

The function defines a test string, prints it using `println`, and then iterates over the screen characters of the static `WRITER`, which represents the VGA text buffer. Since `println` prints to the last screen line and then immediately appends a newline, the string should appear on line `BUFFER_HEIGHT - 2`.

By using [`enumerate`], we count the number of iterations in the variable `i`, which we then use for loading the screen character corresponding to `c`. By comparing the `ascii_character` of the screen character with `c`, we ensure that each character of the string really appears in the VGA text buffer.

[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate

As you can imagine, we could create many more test functions. For example, a function that tests that no panic occurs when printing very long lines and that they're wrapped correctly, or a function for testing that newlines, non-printable characters, and non-unicode characters are handled correctly.

For the rest of this post, however, we will explain how to create _integration tests_ to test the interaction of different components together.
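Returning briefly to `test_println_output`: the reason the string ends up in row `BUFFER_HEIGHT - 2` can be reproduced with an in-memory model. The `MockWriter` below is a hypothetical, simplified stand-in for the VGA writer (plain bytes, no colors, no volatile access) and assumes the usual 25x80 text mode. It only mirrors the behavior described above: characters are written to the last row, and a newline shifts every row up by one:

```rust
const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

/// Hypothetical in-memory stand-in for the VGA text buffer writer.
struct MockWriter {
    column_position: usize,
    chars: [[u8; BUFFER_WIDTH]; BUFFER_HEIGHT],
}

impl MockWriter {
    fn new() -> Self {
        MockWriter {
            column_position: 0,
            chars: [[b' '; BUFFER_WIDTH]; BUFFER_HEIGHT],
        }
    }

    /// Writes one byte to the last row, wrapping at the end of the line.
    fn write_byte(&mut self, byte: u8) {
        match byte {
            b'\n' => self.new_line(),
            byte => {
                if self.column_position >= BUFFER_WIDTH {
                    self.new_line();
                }
                self.chars[BUFFER_HEIGHT - 1][self.column_position] = byte;
                self.column_position += 1;
            }
        }
    }

    /// Shifts all rows up by one and clears the last row.
    fn new_line(&mut self) {
        for row in 1..BUFFER_HEIGHT {
            self.chars[row - 1] = self.chars[row];
        }
        self.chars[BUFFER_HEIGHT - 1] = [b' '; BUFFER_WIDTH];
        self.column_position = 0;
    }

    /// Prints the string followed by a newline, like `println!`.
    fn println(&mut self, s: &str) {
        for byte in s.bytes() {
            self.write_byte(byte);
        }
        self.write_byte(b'\n');
    }
}

fn main() {
    let s = "Some test string that fits on a single line";
    let mut writer = MockWriter::new();
    writer.println(s);
    // After the trailing newline, the text sits one row above the last row.
    for (i, c) in s.chars().enumerate() {
        assert_eq!(char::from(writer.chars[BUFFER_HEIGHT - 2][i]), c);
    }
    println!("string found in row {}", BUFFER_HEIGHT - 2);
}
```

The same check against row `BUFFER_HEIGHT - 1` would fail, because the trailing newline has already cleared that row.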
## Integration Tests

The convention for [integration tests] in Rust is to put them into a `tests` directory in the project root (i.e., next to the `src` directory). Both the default test framework and custom test frameworks will automatically pick up and execute all tests in that directory.

[integration tests]: https://doc.rust-lang.org/book/ch11-03-test-organization.html#integration-tests

All integration tests are their own executables and completely separate from our `main.rs`. This means that each test needs to define its own entry point function. Let's create an example integration test named `basic_boot` to see how it works in detail:

```rust
// in tests/basic_boot.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    test_main();

    loop {}
}

fn test_runner(tests: &[&dyn Fn()]) {
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    loop {}
}
```

Since integration tests are separate executables, we need to provide all the crate attributes (`no_std`, `no_main`, `test_runner`, etc.) again. We also need to create a new entry point function `_start`, which calls the test entry point function `test_main`. We don't need any `cfg(test)` attributes because integration test executables are never built in non-test mode.

We use the [`unimplemented`] macro that always panics as a placeholder for the `test_runner` function and just `loop` in the `panic` handler for now. Ideally, we want to implement these functions exactly as we did in our `main.rs` using the `serial_println` macro and the `exit_qemu` function.
The problem is that we don't have access to these functions since the tests are built completely separately from our `main.rs` executable.

[`unimplemented`]: https://doc.rust-lang.org/core/macro.unimplemented.html

If you run `cargo test` at this stage, you will get an endless loop because the panic handler loops endlessly. You need to use the `ctrl+c` keyboard shortcut to exit QEMU.

### Create a Library

To make the required functions available to our integration test, we need to split off a library from our `main.rs`, which can be included by other crates and integration test executables. To do this, we create a new `src/lib.rs` file:

```rust
// src/lib.rs

#![no_std]
```

Like the `main.rs`, the `lib.rs` is a special file that is automatically recognized by cargo. The library is a separate compilation unit, so we need to specify the `#![no_std]` attribute again.

To make our library work with `cargo test`, we need to also move the test functions and attributes from `main.rs` to `lib.rs`:

```rust
// in src/lib.rs

#![cfg_attr(test, no_main)]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;

pub trait Testable {
    fn run(&self) -> ();
}

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}

pub fn test_runner(tests: &[&dyn Testable]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run();
    }
    exit_qemu(QemuExitCode::Success);
}

pub fn test_panic_handler(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    test_main();
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> !
{
    test_panic_handler(info)
}
```

To make our `test_runner` available to executables and integration tests, we make it public and don't apply the `cfg(test)` attribute to it. We also factor out the implementation of our panic handler into a public `test_panic_handler` function, so that it is available for executables too.

Since our `lib.rs` is tested independently of our `main.rs`, we need to add a `_start` entry point and a panic handler when the library is compiled in test mode. By using the [`cfg_attr`] crate attribute, we conditionally enable the `no_main` attribute in this case.

[`cfg_attr`]: https://doc.rust-lang.org/reference/conditional-compilation.html#the-cfg_attr-attribute

We also move over the `QemuExitCode` enum and the `exit_qemu` function and make them public:

```rust
// in src/lib.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

pub fn exit_qemu(exit_code: QemuExitCode) {
    use x86_64::instructions::port::Port;

    unsafe {
        let mut port = Port::new(0xf4);
        port.write(exit_code as u32);
    }
}
```

Now executables and integration tests can import these functions from the library and don't need to define their own implementations. To also make `println` and `serial_println` available, we move the module declarations too:

```rust
// in src/lib.rs

pub mod serial;
pub mod vga_buffer;
```

We make the modules public to make them usable outside of our library. This is also required for making our `println` and `serial_println` macros usable since they use the `_print` functions of the modules.

Now we can update our `main.rs` to use the library:

```rust
// in src/main.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(blog_os::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;
use blog_os::println;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
{
    println!("Hello World{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}

/// This function is called on panic.
#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

The library is usable like a normal external crate. It is called `blog_os`, like our crate. The above code uses the `blog_os::test_runner` function in the `test_runner` attribute and the `blog_os::test_panic_handler` function in our `cfg(test)` panic handler. It also imports the `println` macro to make it available to our `_start` and `panic` functions.

At this point, `cargo run` and `cargo test` should work again. Of course, `cargo test` still loops endlessly (you can exit with `ctrl+c`). Let's fix this by using the required library functions in our integration test.

### Completing the Integration Test

Like our `src/main.rs`, our `tests/basic_boot.rs` executable can import types from our new library. This allows us to import the missing components to complete our test:

```rust
// in tests/basic_boot.rs

#![test_runner(blog_os::test_runner)]

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

Instead of reimplementing the test runner, we use the `test_runner` function from our library by changing the `#![test_runner(crate::test_runner)]` attribute to `#![test_runner(blog_os::test_runner)]`. We then no longer need the `test_runner` stub function in `basic_boot.rs`, so we can remove it. For our `panic` handler, we call the `blog_os::test_panic_handler` function like we did in our `main.rs`.

Now `cargo test` exits normally again. When you run it, you will see that it builds and runs the tests for our `lib.rs`, `main.rs`, and `basic_boot.rs` separately, one after the other.
For the `main.rs` and the `basic_boot` integration tests, it reports "Running 0 tests" since these files don't have any functions annotated with `#[test_case]`.

We can now add tests to our `basic_boot.rs`. For example, we can test that `println` works without panicking, like we did in the VGA buffer tests:

```rust
// in tests/basic_boot.rs

use blog_os::println;

#[test_case]
fn test_println() {
    println!("test_println output");
}
```

When we run `cargo test` now, we see that it finds and executes the test function.

The test might seem a bit useless right now since it's almost identical to one of the VGA buffer tests. However, in the future, the `_start` functions of our `main.rs` and `lib.rs` might grow and call various initialization routines before running the `test_main` function, so that the two tests are executed in very different environments.

By testing `println` in a `basic_boot` environment without calling any initialization routines in `_start`, we can ensure that `println` works right after booting. This is important because we rely on it, for example, for printing panic messages.

### Future Tests

The power of integration tests is that they're treated as completely separate executables. This gives them complete control over the environment, which makes it possible to test that the code interacts correctly with the CPU or hardware devices.

Our `basic_boot` test is a very simple example of an integration test. In the future, our kernel will become much more featureful and interact with the hardware in various ways. By adding integration tests, we can ensure that these interactions work (and keep working) as expected. Some ideas for possible future tests are:

- **CPU Exceptions**: When the code performs invalid operations (e.g., divides by zero), the CPU throws an exception. The kernel can register handler functions for such exceptions.
  An integration test could verify that the correct exception handler is called when a CPU exception occurs or that the execution continues correctly after a resolvable exception.
- **Page Tables**: Page tables define which memory regions are valid and accessible. By modifying the page tables, it is possible to allocate new memory regions, for example when launching programs. An integration test could modify the page tables in the `_start` function and verify that the modifications have the desired effects in `#[test_case]` functions.
- **Userspace Programs**: Userspace programs are programs with limited access to the system's resources. For example, they don't have access to kernel data structures or to the memory of other programs. An integration test could launch userspace programs that perform forbidden operations and verify that the kernel prevents them all.

As you can imagine, many more tests are possible. By adding such tests, we can ensure that we don't break them accidentally when we add new features to our kernel or refactor our code. This is especially important when our kernel becomes larger and more complex.

### Tests that Should Panic

The test framework of the standard library supports a [`#[should_panic]` attribute][should_panic] that allows constructing tests that should fail. This is useful, for example, to verify that a function fails when an invalid argument is passed. Unfortunately, this attribute isn't supported in `#[no_std]` crates since it requires support from the standard library.

[should_panic]: https://doc.rust-lang.org/rust-by-example/testing/unit_testing.html#testing-panics

While we can't use the `#[should_panic]` attribute in our kernel, we can get similar behavior by creating an integration test that exits with a success exit code from the panic handler.
Let's start creating such a test with the name `should_panic`:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{QemuExitCode, exit_qemu, serial_println};

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

This test is still incomplete as it doesn't define a `_start` function or any of the custom test runner attributes yet. Let's add the missing parts:

```rust
// in tests/should_panic.rs

#![feature(custom_test_frameworks)]
#![test_runner(test_runner)]
#![reexport_test_harness_main = "test_main"]

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    test_main();

    loop {}
}

pub fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test();
        serial_println!("[test did not panic]");
        exit_qemu(QemuExitCode::Failed);
    }
    exit_qemu(QemuExitCode::Success);
}
```

Instead of reusing the `test_runner` from our `lib.rs`, the test defines its own `test_runner` function that exits with a failure exit code when a test returns without panicking (we want our tests to panic). If no test function is defined, the runner exits with a success exit code. Since the runner always exits after running a single test, it does not make sense to define more than one `#[test_case]` function.

Now we can create a test that should fail:

```rust
// in tests/should_panic.rs

use blog_os::serial_print;

#[test_case]
fn should_fail() {
    serial_print!("should_panic::should_fail...\t");
    assert_eq!(0, 1);
}
```

The test uses `assert_eq` to assert that `0` and `1` are equal. Of course, this fails, so our test panics as desired. Note that we need to manually print the function name using `serial_print!` here because we don't use the `Testable` trait.
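The inverted pass/fail logic of this runner can be sketched on a hosted target as well. The kernel cannot unwind, which is why it reports success from the panic handler. The sketch below swaps in `std::panic::catch_unwind` for that mechanism, so it is only an analogue under the assumption that `std` and unwinding are available. `run_should_panic` and `does_not_panic` are hypothetical names:

```rust
use std::panic;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

/// Hosted analogue of the `should_panic` runner: a panic counts as
/// success, a normal return counts as failure.
fn run_should_panic(test: fn()) -> QemuExitCode {
    match panic::catch_unwind(test) {
        // The test panicked, which is what we want.
        Err(_) => QemuExitCode::Success,
        // The test returned normally: "[test did not panic]".
        Ok(()) => QemuExitCode::Failed,
    }
}

fn should_fail() {
    assert_eq!(0, 1);
}

fn does_not_panic() {
    assert_eq!(1, 1);
}

fn main() {
    println!("{:?}", run_should_panic(should_fail));
    println!("{:?}", run_should_panic(does_not_panic));
}
```

Unlike the kernel version, this hosted variant could keep running further tests after a panic, because `catch_unwind` returns control to the caller.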
When we run the test through `cargo test --test should_panic` we see that it is successful because the test panicked as expected. When we comment out the assertion and run the test again, we see that it indeed fails with the _"test did not panic"_ message.

A significant drawback of this approach is that it only works for a single test function. With multiple `#[test_case]` functions, only the first function is executed because the execution cannot continue after the panic handler has been called. I currently don't know of a good way to solve this problem, so let me know if you have an idea!

### No Harness Tests

For integration tests that only have a single test function (like our `should_panic` test), the test runner isn't really needed. For cases like this, we can disable the test runner completely and run our test directly in the `_start` function.

The key to this is to disable the `harness` flag for the test in the `Cargo.toml`, which defines whether a test runner is used for an integration test. When it's set to `false`, both the default test runner and the custom test runner feature are disabled, so that the test is treated like a normal executable.

Let's disable the `harness` flag for our `should_panic` test:

```toml
# in Cargo.toml

[[test]]
name = "should_panic"
harness = false
```

Now we vastly simplify our `should_panic` test by removing the `test_runner`-related code. The result looks like this:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{exit_qemu, serial_print, serial_println, QemuExitCode};

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    should_fail();
    serial_println!("[test did not panic]");
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

fn should_fail() {
    serial_print!("should_panic::should_fail...\t");
    assert_eq!(0, 1);
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> !
{
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

We now call the `should_fail` function directly from our `_start` function and exit with a failure exit code if it returns. When we run `cargo test --test should_panic` now, we see that the test behaves exactly as before.

Apart from creating `should_panic` tests, disabling the `harness` attribute can also be useful for complex integration tests, for example, when the individual test functions have side effects and need to be run in a specified order.

## Summary

Testing is a very useful technique to ensure that certain components have the desired behavior. Even if tests cannot show the absence of bugs, they're still a useful tool for finding them and especially for avoiding regressions.

This post explained how to set up a test framework for our Rust kernel. We used the custom test frameworks feature of Rust to implement support for a simple `#[test_case]` attribute in our bare-metal environment. Using the `isa-debug-exit` device of QEMU, our test runner can exit QEMU after running the tests and report the test status. To print error messages to the console instead of the VGA buffer, we created a basic driver for the serial port.

After creating some tests for our `println` macro, we explored integration tests in the second half of the post. We learned that they live in the `tests` directory and are treated as completely separate executables. To give them access to the `exit_qemu` function and the `serial_println` macro, we moved most of our code into a library that can be imported by all executables and integration tests. Since integration tests run in their own separate environment, they make it possible to test interactions with the hardware or to create tests that should panic.

We now have a test framework that runs in a realistic environment inside QEMU.
By creating more tests in future posts, we can keep our kernel maintainable when it becomes more complex.

## What's next?

In the next post, we will explore _CPU exceptions_. These exceptions are thrown by the CPU when something illegal happens, such as a division by zero or an access to an unmapped memory page (a so-called "page fault"). Being able to catch and examine these exceptions is very important for debugging future errors. Exception handling is also very similar to the handling of hardware interrupts, which is required for keyboard support.

================================================
FILE: blog/content/edition-2/posts/04-testing/index.zh-CN.md
================================================
+++
title = "内核测试"
weight = 4
path = "zh-CN/testing"
date = 2019-04-27

[extra]
# Please update this when updating the translation
translation_based_on_commit = "e6c148d6f47bcf8a34916393deaeb7e8da2d5e2a"
# GitHub usernames of the people that translated this post
translators = ["luojia65", "Rustin-Liu", "liuyuran","ic3w1ne"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = ["JiangengDong"]
+++

This post explains how to do unit and integration testing in a `no_std` environment. We will use Rust's support for custom test frameworks to execute test functions inside our kernel. To report the results out of QEMU, we will use some additional features of QEMU and the `bootimage` tool.

This blog series is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave a comment [at the bottom]. The complete source code for this post can be found in the [`post-04`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-04

## Requirements

This post replaces the earlier (now deprecated) [_Unit Testing_] and [_Integration Tests_] posts. It assumes that you have followed the [_A Minimal Rust Kernel_] post after 2019-04-27. In short, it requires that you have a `.cargo/config` file that [sets a default target] and [defines a runner executable].

[_Unit Testing_]: @/edition-2/posts/deprecated/04-unit-testing/index.md
[_Integration Tests_]:
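@/edition-2/posts/deprecated/05-integration-tests/index.md
[_A Minimal Rust Kernel_]: @/edition-2/posts/02-minimal-rust-kernel/index.md
[sets a default target]: @/edition-2/posts/02-minimal-rust-kernel/index.md#set-a-default-target
[defines a runner executable]: @/edition-2/posts/02-minimal-rust-kernel/index.md#using-cargo-run

## Testing in Rust

Rust has a [built-in test framework] that can run unit tests without any setup. Just create a function that checks some results through assertions and add the `#[test]` attribute to the function header. Then `cargo test` will automatically find and execute all test functions of your crate.

[built-in test framework]: https://doc.rust-lang.org/book/second-edition/ch11-00-testing.html

To enable testing for our kernel binary, we can set the `test` flag in the `Cargo.toml` to `true`:

```toml
# in Cargo.toml

[[bin]]
name = "blog_os"
test = true
bench = false
```

This [`[[bin]]` section](https://doc.rust-lang.org/cargo/reference/cargo-targets.html#configuring-a-target) specifies how `cargo` should compile our `blog_os` executable. The `test` field specifies whether testing is supported for this executable. We set it to `false` in the first post to [make `rust-analyzer` happy](@/edition-2/posts/01-freestanding-rust-binary/index.md#making-rust-analyzer-happy), but now we want to enable testing, so we set it back to `true`.

Unfortunately, testing is a bit more complicated for `no_std` applications such as our kernel. The problem is that Rust's test framework implicitly uses the built-in [`test`] library, which depends on the standard library. This means that we can't use the default test framework for our `#[no_std]` kernel.

[`test`]: https://doc.rust-lang.org/test/index.html

We can see this when we try to run `cargo test` in our project:

```
> cargo test
   Compiling blog_os v0.1.0 (/…/blog_os)
error[E0463]: can't find crate for `test`
```

Since the `test` library depends on the standard library, it is not available for our bare metal target. While porting the `test` library to a `#[no_std]` context [is possible][utest], it is highly unstable and requires some hacks, such as redefining the `panic` macro.

[utest]: https://github.com/japaric/utest

### Custom Test Frameworks

Fortunately, Rust supports replacing the default test framework through the unstable [`custom_test_frameworks`] feature. This feature requires no external libraries and thus also works in `#[no_std]` environments. It works by collecting all functions annotated with a `#[test_case]` attribute and then invoking a user-specified runner function with the list of tests as an argument. Thus, it gives the implementation maximal control over the test process.

[`custom_test_frameworks`]: https://doc.rust-lang.org/unstable-book/language-features/custom-test-frameworks.html

The disadvantage compared to the default test framework is that some advanced features, such as [`should_panic` tests],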
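are not available. Instead, it is up to us to implement such features ourselves if we need them. This is actually fine for us since we have a very special execution environment in which the default implementations of such advanced features would not work anyway. For example, the `#[should_panic]` attribute relies on stack unwinding to catch panics, which we disabled for our kernel.

[`should_panic` tests]: https://doc.rust-lang.org/book/ch11-01-writing-tests.html#checking-for-panics-with-should_panic

To implement a custom test framework for our kernel, we add the following to our `main.rs`:

```rust
// in src/main.rs

#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
}
```

Our runner just prints a short debug message and then calls each test function in the list. The argument type `&[&dyn Fn()]` is a [_slice_] of [_trait object_] references of the [_Fn()_] trait. It is basically a list of references to types that can be called like a function. Since the function is useless for non-test runs, we use the `#[cfg(test)]` attribute to include it only for tests.

[_slice_]: https://doc.rust-lang.org/std/primitive.slice.html
[_trait object_]: https://doc.rust-lang.org/1.30.0/book/first-edition/trait-objects.html
[_Fn()_]: https://doc.rust-lang.org/std/ops/trait.Fn.html

When we run `cargo test` now, we see that it succeeds. However, we still see our "Hello World" instead of the message from our `test_runner`. The reason is that our `_start` function is still used as the entry point. The custom test frameworks feature generates a `main` function that calls `test_runner`, but this `main` function is ignored because we use the `#[no_main]` attribute and provide our own entry point.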
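The collect-and-call mechanism described above can be imitated by hand on a hosted target. In the following sketch, `test_main` is written manually as a stand-in for the entry point that the compiler generates, and the test functions `first_test` and `second_test` are hypothetical. The real feature builds the slice from all `#[test_case]` functions automatically. The runner also returns the number of executed tests, which the kernel version does not do, so that the result can be checked:

```rust
/// Same shape as the runner above, but it returns the number of executed
/// tests so that the result can be checked on a hosted target.
fn test_runner(tests: &[&dyn Fn()]) -> usize {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
    tests.len()
}

// Hypothetical test functions; in the kernel they would carry `#[test_case]`.
fn first_test() {
    assert_eq!(1, 1);
}

fn second_test() {
    assert_ne!(1, 2);
}

/// Hand-written stand-in for the generated harness entry point: it passes
/// references to all collected test functions to the runner.
fn test_main() -> usize {
    test_runner(&[&first_test, &second_test])
}

fn main() {
    let executed = test_main();
    println!("executed {} tests", executed);
}
```

Each `&first_test` is a reference to a distinct function type that is coerced to `&dyn Fn()`, which is why a slice of trait object references is needed rather than a slice of one concrete type.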
**Note:** There is currently a bug in cargo that leads to `duplicate lang item` errors on `cargo test` in some cases. The known trigger is setting `panic = "abort"` in your `Cargo.toml`. Once you remove it, `cargo test` should work. If you are interested in this bug, you can follow the [cargo issue](https://github.com/rust-lang/cargo/issues/7359).
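To fix this, we need to change the name of the generated function to something different than `main` through the `reexport_test_harness_main` attribute. Then we can call the renamed function from our `_start` function:

```rust
// in src/main.rs

#![reexport_test_harness_main = "test_main"]

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}
```

We set the name of the test framework entry function to `test_main` and call it from our `_start` entry point. We use conditional compilation to add the call to `test_main` only in test contexts because the function is not generated on a normal run.

When we now execute `cargo test`, we see the "Running 0 tests" message from our `test_runner` on the screen. We are now ready to create our first test function:

```rust
// in src/main.rs

#[test_case]
fn trivial_assertion() {
    print!("trivial assertion... ");
    assert_eq!(1, 1);
    println!("[ok]");
}
```

When we run `cargo test` now, we see the following output:

![QEMU printing "Hello World!", "Running 1 tests", and "trivial assertion... [ok]"](https://os.phil-opp.com/testing/qemu-test-runner-output.png)

The `tests` slice passed to our `test_runner` function now contains a reference to the `trivial_assertion` function. From the `trivial assertion... [ok]` output on the screen, we see that the test was called and that it succeeded.

After executing the tests, our `test_runner` returns to the `test_main` function, which in turn returns to our `_start` entry point function. There we enter an endless loop because the entry point function is not allowed to return. This is a problem because we want `cargo test` to exit after running all tests.

## Exiting QEMU

Right now, we have an endless loop at the end of our `_start` function and need to close QEMU manually on each execution of `cargo test`. This is unfortunate because we also want to run `cargo test` in scripts without user interaction. The clean solution would be to implement a proper way to shut down our OS. Unfortunately, this is relatively complex because it requires implementing support for either the [APM] or [ACPI] power management standard.

[APM]: https://wiki.osdev.org/APM
[ACPI]: https://wiki.osdev.org/ACPI

Luckily, there is an escape hatch: QEMU supports a special `isa-debug-exit` device, which provides an easy way to exit QEMU from the guest system. To enable it, we need to pass a `-device` argument to QEMU. We can do so by adding a `package.metadata.bootimage.test-args` configuration key in our `Cargo.toml`:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = ["-device", "isa-debug-exit,iobase=0xf4,iosize=0x04"]
```

The `bootimage runner` appends the `test-args` to the default QEMU command for tests. (For a normal `cargo run`, the arguments are ignored.)

Together with the device name (`isa-debug-exit`), we pass the two parameters `iobase` and `iosize`, which specify the _I/O port_ through which the device can be reached from our kernel.

### I/O Ports

There are two different approaches for communicating between the CPU and peripheral hardware on x86, **memory-mapped I/O** and **port-mapped I/O**. Previously, we already used memory-mapped I/O, through the memory address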
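`0xb8000`, to access the [VGA text buffer]. This address is not mapped to RAM but to some memory of the VGA device.

[VGA text buffer]: @/edition-2/posts/03-vga-text-buffer/index.md

In contrast, port-mapped I/O uses a separate I/O bus for communication. Each connected peripheral has one or more port numbers. To communicate with such an I/O port, there are special CPU instructions called `in` and `out`, which take a port number and a data byte (there are also variations of these instructions that allow sending a `u16` or `u32`).

The `isa-debug-exit` device uses port-mapped I/O. The `iobase` parameter specifies on which port address the device should live (`0xf4` is a [generally unused][list of x86 I/O ports] port on x86) and the `iosize` specifies the port size (`0x04` means four bytes).

[list of x86 I/O ports]: https://wiki.osdev.org/I/O_Ports#The_list

### Using the Exit Device

The functionality of the `isa-debug-exit` device is very simple. When a `value` is written to the I/O port specified by `iobase`, it causes QEMU to exit with [exit status] `(value << 1) | 1`. So when we write `0` to the port, QEMU will exit with exit status `(0 << 1) | 1 = 1`, and when we write `1` to the port, it will exit with exit status `(1 << 1) | 1 = 3`.

[exit status]: https://en.wikipedia.org/wiki/Exit_status

Instead of manually invoking the `in` and `out` instructions, we use the abstractions provided by the [`x86_64`] crate. To add a dependency on that crate, we add it to the `dependencies` section in our `Cargo.toml`:

[`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/

```toml
# in Cargo.toml

[dependencies]
x86_64 = "0.14.2"
```

Now we can use the [`Port`] type provided by the crate to create an `exit_qemu` function:

[`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html

```rust
// in src/main.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

pub fn exit_qemu(exit_code: QemuExitCode) {
    use x86_64::instructions::port::Port;

    unsafe {
        let mut port = Port::new(0xf4);
        port.write(exit_code as u32);
    }
}
```

The function creates a new port at `0xf4`, which is the `iobase` of the `isa-debug-exit` device. Then it writes the passed exit code to the port. We use `u32` because we specified the `iosize` of the `isa-debug-exit` device as 4 bytes. Both operations are `unsafe` because writing to an I/O port can generally result in arbitrary behavior.

To specify the exit status, we create a `QemuExitCode` enum. The idea is to exit with the success exit code if all tests succeeded and with the failure exit code otherwise. The enum is marked as `#[repr(u32)]` to represent each variant as a `u32` integer. We use the exit code `0x10` for success and `0x11` for failure. The actual exit codes don't matter much, as long as they don't clash with the default exit codes of QEMU. For example, using exit code `0` for success is not a good idea because it becomes `(0 << 1) | 1 = 1` after the transformation, which is the default exit code when QEMU fails to run. So we could not differentiate a QEMU error from a successful test run.

We can now update our `test_runner` to exit QEMU after all tests have run:

```rust
// in src/main.rs

fn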
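test_runner(tests: &[&dyn Fn()]) {
    println!("Running {} tests", tests.len());
    for test in tests {
        test();
    }
    // new
    exit_qemu(QemuExitCode::Success);
}
```

When we run `cargo test` now, we see that QEMU immediately closes after executing the tests. The problem is that `cargo test` interprets the test as failed even though we passed our `Success` exit code:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running target/x86_64-blog_os/debug/deps/blog_os-5804fc7d2dd4c9be
Building bootloader
   Compiling bootloader v0.5.3 (/home/philipp/Documents/bootloader)
    Finished release [optimized + debuginfo] target(s) in 1.07s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-5804fc7d2dd4c9be.bin -device isa-debug-exit,iobase=0xf4,
    iosize=0x04`
error: test failed, to rerun pass '--bin blog_os'
```

The problem is that `cargo test` considers all exit codes other than `0` as failure.

### Success Exit Code

To work around this, `bootimage` provides a `test-success-exit-code` configuration key that maps a specified exit code to the exit code `0`:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = […]
test-success-exit-code = 33         # (0x10 << 1) | 1
```

With this configuration, `bootimage` maps our success exit code to exit code 0, so that `cargo test` correctly recognizes the success case and does not count the test as failed.

Our test runner now automatically closes QEMU and correctly reports the test results. We still see the QEMU window open for a very short time, but this is not long enough to read the results. It would be nice if the test results were printed to the console instead, so that we can still see them after QEMU exits.

## Printing to the Console

To see the test output on the console, we need to send the data from our kernel to the host system somehow. There are various ways to achieve this, for example, by sending the data over a TCP network interface. However, setting up a networking stack is quite a complex task, so we will choose a simpler solution instead.

### Serial Port

A simple way to send the data is to use the [serial port], an old interface standard which is no longer found in modern computers (translator's note: anyone who has worked with microcontrollers will know that when the translator was at university, some classmates' laptops still had a serial port, and those without one needed a USB-to-serial cable to flash microcontroller programs, typically for the 8051; the STM32 has ST-Link, which is a separate matter, although it can also be programmed over the serial port). The serial port is easy to program, and QEMU can redirect the bytes sent over serial to the host's standard output or a file.

[serial port]: https://en.wikipedia.org/wiki/Serial_port

The chips implementing a serial interface are called [UARTs]. There are [lots of UART models] on x86, but fortunately the only differences between them are some advanced features we don't need. The common UARTs today are all compatible with the [16550 UART], so we will use that model for our testing framework.

[UARTs]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter
[lots of UART models]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter#Models
[16550 UART]: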
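https://en.wikipedia.org/wiki/16550_UART

We will use the [`uart_16550`] crate to initialize the UART and send data over the serial port. To add it as a dependency, we update our `Cargo.toml` and `main.rs`:

[`uart_16550`]: https://docs.rs/uart_16550

```toml
# in Cargo.toml

[dependencies]
uart_16550 = "0.2.0"
```

The `uart_16550` crate contains a `SerialPort` struct that represents the UART registers, but we still need to construct an instance of it ourselves. For that, we create a new `serial` module with the following content:

```rust
// in src/main.rs

mod serial;
```

```rust
// in src/serial.rs

use uart_16550::SerialPort;
use spin::Mutex;
use lazy_static::lazy_static;

lazy_static! {
    pub static ref SERIAL1: Mutex<SerialPort> = {
        let mut serial_port = unsafe { SerialPort::new(0x3F8) };
        serial_port.init();
        Mutex::new(serial_port)
    };
}
```

As with the [VGA text buffer][vga lazy-static], we use `lazy_static` and a spinlock to create a `static` writer instance. By using `lazy_static`, we can ensure that the `init` method is called exactly once, when the instance is first used.

Like the `isa-debug-exit` device, the UART is programmed using port I/O. Since the UART is more complex, it uses multiple I/O ports for programming its different device registers. The unsafe `SerialPort::new` function expects the address of the first I/O port of the UART as an argument, from which it can calculate the addresses of all needed ports. We pass the port address `0x3F8`, which is the standard port number for the first serial interface.

[vga lazy-static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics

To make the serial port easier to use, we add `serial_print!` and `serial_println!` macros:

```rust
// in src/serial.rs

#[doc(hidden)]
pub fn _print(args: ::core::fmt::Arguments) {
    use core::fmt::Write;
    SERIAL1.lock().write_fmt(args).expect("Printing to serial failed");
}

/// Prints to the host through the serial interface.
#[macro_export]
macro_rules! serial_print {
    ($($arg:tt)*) => {
        $crate::serial::_print(format_args!($($arg)*));
    };
}

/// Prints to the host through the serial interface, appending a newline.
#[macro_export]
macro_rules! serial_println {
    () => ($crate::serial_print!("\n"));
    ($fmt:expr) => ($crate::serial_print!(concat!($fmt, "\n")));
    ($fmt:expr, $($arg:tt)*) => ($crate::serial_print!(
        concat!($fmt, "\n"), $($arg)*));
}
```

The implementation is very similar to that of our `print` and `println` macros. Since the `SerialPort` type already implements the [`fmt::Write`] trait, we don't need to provide our own implementation.

[`fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

Now we can print to the serial interface instead of the VGA text buffer in our test code:

```rust
// in src/main.rs

#[cfg(test)]
fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    […]
}

#[test_case]
fn trivial_assertion() {
    serial_print!("trivial assertion... ");
    assert_eq!(1, 1);
    serial_println!("[ok]");
}
```

Note that the `serial_println` macro lives directly under the root namespace because we used the `#[macro_export]` attribute, so importing it through `use crate::serial::serial_println` will not work.

### QEMU Arguments

To see the serial output from QEMU, we need to use the `-serial` argument to redirect the output to stdout:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = [
    "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio"
]
```

When we run `cargo test` now, we see the test output directly in the console:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a
Building bootloader
    Finished release [optimized + debuginfo] target(s) in 0.02s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device
    isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio`
Running 1 tests
trivial assertion... [ok]
```

However, when a test fails, we still see the output inside QEMU because our panic handler still uses `println`. To simulate this, we can change the assertion in our `trivial_assertion` test to `assert_eq!(0, 1)`:

![QEMU printing "Hello World!" and "panicked at 'assertion failed: `(left == right)` left: `0`, right: `1`', src/main.rs:55:5](https://os.phil-opp.com/testing/qemu-failed-test.png)

We see that the panic message is still printed to the VGA buffer, while the other test output is printed to the serial port. The panic message is quite useful, so we would like to see it in the console too.

### Print an Error Message on Panic

To exit QEMU with an error message on a panic, we can use [conditional compilation] to use a different panic handler in testing mode:

[conditional compilation]: https://doc.rust-lang.org/1.30.0/book/first-edition/conditional-compilation.html

```rust
// in src/main.rs

// our existing panic handler
#[cfg(not(test))] // new attribute
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}

// our panic handler in test mode
#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}
```

For our test panic handler, we use `serial_println` instead of `println` and then exit QEMU with a failure exit code. Note that we still need an endless `loop` after the `exit_qemu` call because the compiler does not know that the `isa-debug-exit` device causes a program exit.

Now QEMU also exits for failed tests and prints a useful error message on the console:

```
> cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running target/x86_64-blog_os/debug/deps/blog_os-7b7c37b4ad62551a
Building bootloader
    Finished release [optimized + debuginfo] target(s) in 0.02s
Running: `qemu-system-x86_64 -drive format=raw,file=/…/target/x86_64-blog_os/debug/
    deps/bootimage-blog_os-7b7c37b4ad62551a.bin -device
    isa-debug-exit,iobase=0xf4,iosize=0x04 -serial stdio`
Running 1 tests
trivial assertion... [failed]

Error: panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/main.rs:65:5
```

Since all test output now goes to the console, we no longer need the QEMU window that pops up for a short time. So we can hide it completely.

### Hiding QEMU

Since we report the complete test results using the `isa-debug-exit` device and the serial port, we don't need the QEMU window anymore. We can hide it by passing the `-display none` argument to QEMU:

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-args = [
    "-device", "isa-debug-exit,iobase=0xf4,iosize=0x04", "-serial", "stdio",
    "-display", "none"
]
```

Now QEMU runs completely in the background, and no window gets opened anymore. This is not only less annoying, but also allows our test framework to run in environments without a graphical user interface, such as CI servers or [SSH] connections.

[SSH]: https://en.wikipedia.org/wiki/Secure_Shell

### Timeouts

Since `cargo test` waits until the test runner exits, a test that never returns can block the test runner forever. Fortunately, this is not a big problem in practice, since endless loops are usually easy to avoid. In our case, however, endless loops can occur in several situations:

- The bootloader fails to load our kernel, which causes the system to reboot endlessly.
- The BIOS/UEFI firmware fails to load the bootloader, which causes the same endless rebooting.
- The CPU enters a `loop {}` statement at the end of some of our functions, for example because the QEMU exit device doesn't work properly.
- The hardware causes a system reset, for example when a CPU exception is not caught (explained in a future post).

Since endless loops can occur in so many situations, the `bootimage` tool sets a timeout of 5 minutes for each test executable by default. If the test does not finish within this time, it is marked as failed and a "Timed Out" error is printed to the console. This feature ensures that tests stuck in an endless loop don't block `cargo test` forever.

You can try it yourself by adding a `loop {}` statement to the `trivial_assertion` test. When you run `cargo test`, you will see that the test is marked as timed out after 5 minutes. The timeout duration is [configurable][bootimage config] through a `test-timeout` key in the Cargo.toml:

[bootimage config]: https://github.com/rust-osdev/bootimage#configuration

```toml
# in Cargo.toml

[package.metadata.bootimage]
test-timeout = 300          # (in seconds)
```

If you don't want to wait 5 minutes for the `trivial_assertion` test to time out, you can temporarily decrease this value.

### Insert Printing Automatically

Our `trivial_assertion` test currently has to print its own status information using `serial_print!`/`serial_println!`:

```rust
#[test_case]
fn trivial_assertion() {
    serial_print!("trivial assertion... ");
    assert_eq!(1, 1);
    serial_println!("[ok]");
}
```

Manually adding these print statements to every test is cumbersome, so let's update our `test_runner` to print these messages automatically. To do that, we first create a new `Testable` trait:

```rust
// in src/main.rs

pub trait Testable {
    fn run(&self) -> ();
}
```

The trick now is to implement this trait for all types `T` that implement the [`Fn()` trait]:

[`Fn()` trait]: https://doc.rust-lang.org/stable/core/ops/trait.Fn.html

```rust
// in src/main.rs

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}
```

We implement the `run` function by first printing the function name using the [`any::type_name`] function. This function is implemented directly in the compiler and returns a string description of any type. For functions, the type description is their name, which is exactly what we want as the test case name. The `\t` is the [tab character], which adds some left margin before the `[ok]` message.

[`any::type_name`]: https://doc.rust-lang.org/stable/core/any/fn.type_name.html
[tab character]: https://en.wikipedia.org/wiki/Tab_character

After printing the function name, we invoke the test function itself through `self()`. This only works because we require that `self` implements the `Fn()` trait. After the test function returns, we print `[ok]` to indicate that the function did not panic.

The last step is to update our `test_runner` to use the new `Testable` trait:

```rust
// in src/main.rs

#[cfg(test)]
pub fn test_runner(tests: &[&dyn Testable]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run(); // new
    }
    exit_qemu(QemuExitCode::Success);
}
```

The only two changes are the type of the `tests` argument, from `&[&dyn Fn()]` to `&[&dyn Testable]`, and the call, from `test()` to `test.run()`.

Since the status messages are now printed automatically, we can remove the two print statements from our `trivial_assertion` test:

```rust
// in src/main.rs

#[test_case]
fn trivial_assertion() {
    assert_eq!(1, 1);
}
```

The `cargo test` output now looks like this:

```
Running 1 tests
blog_os::trivial_assertion...    [ok]
```

The function name now includes the full path to the function, which is useful when test functions in different modules have the same name. Otherwise, the output looks the same as before, but we no longer need to add print statements to our tests manually.

## Testing the VGA Buffer

Now that we have a working test framework, we can create a few tests for our VGA buffer implementation. First, we create a very simple test to verify that `println` works without panicking:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_simple() {
    println!("test_println_simple output");
}
```

The test just prints something to the VGA buffer. If it finishes without panicking, it means that the `println` invocation did not panic either.

To ensure that no panic occurs even if many lines are printed and lines are shifted off the screen, we can create another test:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_many() {
    for _ in 0..200 {
        println!("test_println_many output");
    }
}
```

We can also create a test function to verify that the printed lines really appear on the screen:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    let s = "Some test string that fits on a single line";
    println!("{}", s);
    for (i, c) in s.chars().enumerate() {
        let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read();
        assert_eq!(char::from(screen_char.ascii_character), c);
    }
}
```

The function defines a test string, prints it using `println`, and then iterates over the screen characters of the static `WRITER`, which represents the VGA text buffer. Since `println` prints to the last screen line and then immediately appends a newline, the string should appear on line `BUFFER_HEIGHT - 2`.

By using [`enumerate`], we count the number of iterations in the variable `i`, which we then use to load the screen character corresponding to `c`. By comparing the `ascii_character` of the screen character with `c`, we ensure that each character of the string really appears in the VGA text buffer.

[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate

As you can imagine, we could create many more test functions: for example, a function that tests that no panic occurs when printing very long lines and that they are wrapped correctly, or a function that tests that newlines, non-printable characters, and non-unicode characters are handled correctly.

For the rest of this post, however, we will explain how to create _integration tests_ to test the interaction of different components.

## Integration Tests

The convention for [integration tests] in Rust is to put them into a `tests` directory in the project root (that is, next to the `src` directory). Both the default test framework and custom test frameworks automatically pick up and execute all tests in that directory.

[integration tests]: https://doc.rust-lang.org/book/ch11-03-test-organization.html#integration-tests

All integration tests are their own executables and completely separate from our `main.rs`. This means that each test needs to define its own entry point function. Let's create an example integration test named `basic_boot` to see how it works in detail:

```rust
// in tests/basic_boot.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    test_main();

    loop {}
}

fn test_runner(tests: &[&dyn Fn()]) {
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    loop {}
}
```

Since integration tests are separate executables, we need to provide all the crate attributes (`no_std`, `no_main`, `test_runner`, etc.) again. We also need to create a new entry point function `_start`, which calls the test entry point function `test_main`. We don't need any `cfg(test)` attributes because integration test executables are never built in non-test mode.

We use the [`unimplemented`] macro as a placeholder for the `test_runner` function and a plain `loop {}` as the body of the `panic` handler. Ideally, we would implement these functions exactly as we did in our `main.rs`, using the `serial_println` macro and the `exit_qemu` function. The problem is that we don't have access to these functions, since the tests are built completely separately from our `main.rs` executable.

[`unimplemented`]: https://doc.rust-lang.org/core/macro.unimplemented.html

If you run `cargo test` at this stage, you will get an endless loop because the panic handler loops endlessly. You need to use the `Ctrl+c` keyboard shortcut to exit QEMU.

### Create a Library

To make the required functions available to our integration test, we need to split off a library from our `main.rs`, which can be included by other crates and integration test executables. To do this, we create a new `src/lib.rs` file:

```rust
// src/lib.rs

#![no_std]
```

Like `main.rs`, `lib.rs` is a special file that is automatically recognized by cargo. The library is a separate compilation unit, so we need to specify the `#![no_std]` attribute again.

To make our library work with `cargo test`, we also need to move over the test functions and attributes:

```rust
// in src/lib.rs

#![cfg_attr(test, no_main)]
#![feature(custom_test_frameworks)]
#![test_runner(crate::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;

pub trait Testable {
    fn run(&self) -> ();
}

impl<T> Testable for T
where
    T: Fn(),
{
    fn run(&self) {
        serial_print!("{}...\t", core::any::type_name::<T>());
        self();
        serial_println!("[ok]");
    }
}

pub fn test_runner(tests: &[&dyn Testable]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test.run();
    }
    exit_qemu(QemuExitCode::Success);
}

pub fn test_panic_handler(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    loop {}
}

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    test_main();
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    test_panic_handler(info)
}
```

To make our `test_runner` available to executables and integration tests, we make it public and don't apply the `cfg(test)` attribute to it. We also factor out the implementation of our panic handler into a public `test_panic_handler` function, so that it is available to executables too.

Since our `lib.rs` is tested independently of our `main.rs`, we need to add a `_start` entry point and a panic handler for when the library is compiled in test mode. By using the [`cfg_attr`] crate attribute, we conditionally enable the `no_main` attribute in this case.

[`cfg_attr`]: https://doc.rust-lang.org/reference/conditional-compilation.html#the-cfg_attr-attribute

We also move over the `QemuExitCode` enum and the `exit_qemu` function from `main.rs` and make them public:

```rust
// in src/lib.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum QemuExitCode {
    Success = 0x10,
    Failed = 0x11,
}

pub fn exit_qemu(exit_code: QemuExitCode) {
    use x86_64::instructions::port::Port;

    unsafe {
        let mut port = Port::new(0xf4);
        port.write(exit_code as u32);
    }
}
```

Now executables and integration tests can import these functions from the library and don't need to define their own implementations. To also make `println` and `serial_println` available, we move the module declarations to `lib.rs` as well:

```rust
// in src/lib.rs

pub mod serial;
pub mod vga_buffer;
```

We make the modules public so that they are usable outside of our library. This is also required for making our `println` and `serial_println` macros usable, since they use the `_print` functions of these modules.

Now we can update our `main.rs` to use the library:

```rust
// src/main.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(blog_os::test_runner)]
#![reexport_test_harness_main = "test_main"]

use core::panic::PanicInfo;
use blog_os::println;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    #[cfg(test)]
    test_main();

    loop {}
}

/// This function is called on panic.
#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}

#[cfg(test)]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

The library is usable like a normal external crate. It is named like our crate, which is `blog_os` in our case. The code above uses the `blog_os::test_runner` function in the `test_runner` attribute and the `blog_os::test_panic_handler` function in our `cfg(test)` panic handler. It also imports the `println` macro to make it available to our `_start` and `panic` functions.

At this point, `cargo run` and `cargo test` should work again. Of course, `cargo test` still loops endlessly (you can exit with `ctrl+c`). Let's fix this by using the required library functions in our integration test.

### Completing the Integration Test

Like our `src/main.rs`, our `tests/basic_boot.rs` executable can import types from our new library. This allows us to import the missing components to complete our test:

```rust
// in tests/basic_boot.rs

#![test_runner(blog_os::test_runner)]

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

Instead of reimplementing the test runner, we use the `test_runner` function from our library. For our panic handler, we call the `blog_os::test_panic_handler` function as we did in our `main.rs`.

Now `cargo test` exits normally again. When you run it, you will see that it builds and runs the tests for our `lib.rs`, `main.rs`, and `basic_boot.rs` separately after each other. For the `main.rs` and the `basic_boot` integration tests, it reports "Running 0 tests" since these files don't have any functions annotated with `#[test_case]`.

We can now add tests to our `basic_boot.rs`. For example, we can test that `println` works without panicking, as we did in the VGA buffer tests:

```rust
// in tests/basic_boot.rs

use blog_os::println;

#[test_case]
fn test_println() {
    println!("test_println output");
}
```

When we run `cargo test` now, we see that it finds and executes the test function.

The test might seem a bit useless right now since it is almost identical to one of the VGA buffer tests. However, in the future, the `_start` functions of our `main.rs` and `lib.rs` might grow and call various initialization routines before running the `test_main` function, so that the two tests are executed in very different environments. (Translator's note: in other words, although the two look alike right now, this test and the VGA buffer test will diverge later, so keeping it separate is worthwhile and not a duplication.)

By testing `println` in a `basic_boot` environment whose `_start` calls no initialization routines, we can ensure that `println` works right after booting. This is important because we rely on it in many places, for example, for printing panic messages.

### Future Tests

The power of integration tests is that they are treated as completely separate executables. This gives them complete control over the environment, which makes it possible to test that the code interacts correctly with the CPU or hardware devices.

Our `basic_boot` test is a very simple example of an integration test. In the future, our kernel will become much more featureful and interact with the hardware in various ways. By adding integration tests, we can ensure that these interactions work (and keep working) as expected. Some ideas for possible future tests are:

- **CPU Exceptions**: When the code performs invalid operations (for example, divides by zero), the CPU throws an exception. The kernel can register handler functions for such exceptions. An integration test could verify that the correct exception handler is called when a CPU exception occurs, or that execution continues correctly after a resolvable exception.
- **Page Tables**: Page tables define which memory regions are valid and accessible. By modifying the page tables, it is possible to allocate new memory regions, for example when launching programs. An integration test could change a few page table entries in the `_start`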
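  function and verify that these changes have the intended effect on the `#[test_case]` functions.
- **Userspace Programs**: Userspace programs are programs with limited access to the system's resources. For example, they don't have access to kernel data structures or to the memory of other programs. An integration test could launch userspace programs that perform forbidden operations and verify that the kernel prevents all of them.

As you can imagine, many more tests are possible. By adding such tests, we can ensure that we don't break them accidentally when we add new features to our kernel or refactor our code. This is especially important when our kernel becomes larger and more complex.

### Tests that Should Panic

The test framework of the standard library supports a [`#[should_panic]` attribute][should_panic] that allows constructing tests that should fail. This is useful, for example, to verify that a function fails when an invalid argument is passed. Unfortunately, this attribute requires support from the standard library, so it is not available in `#[no_std]` environments.

[should_panic]: https://doc.rust-lang.org/rust-by-example/testing/unit_testing.html#testing-panics

While we can't use the `#[should_panic]` attribute in our kernel, we can get similar behavior by creating an integration test that exits with a success exit code from the panic handler. Let's start creating such a test with the name `should_panic`:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{QemuExitCode, exit_qemu, serial_println};

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

This test is still incomplete, as it doesn't define a `_start` function or any of the custom test runner attributes yet. Let's add the missing parts:

```rust
// in tests/should_panic.rs

#![feature(custom_test_frameworks)]
#![test_runner(test_runner)]
#![reexport_test_harness_main = "test_main"]

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    test_main();

    loop {}
}

pub fn test_runner(tests: &[&dyn Fn()]) {
    serial_println!("Running {} tests", tests.len());
    for test in tests {
        test();
        serial_println!("[test did not panic]");
        exit_qemu(QemuExitCode::Failed);
    }
    exit_qemu(QemuExitCode::Success);
}
```

Instead of reusing the `test_runner` from our `lib.rs`, the test defines its own `test_runner` function, which exits with a failure exit code when a test returns without panicking (we want our tests to panic). If no test function is defined, the runner exits with a success exit code. Since the runner always exits after running a single test, it does not make sense to define more than one `#[test_case]` function.

Now we can create a test that should fail:

```rust
// in tests/should_panic.rs

use blog_os::serial_print;

#[test_case]
fn should_fail() {
    serial_print!("should_fail... ");
    assert_eq!(0, 1);
}
```

The test uses `assert_eq` to assert that `0` and `1` are equal. This of course fails, so our test panics as desired.

When we run the test through `cargo test --test should_panic`, we see that it is successful because the test panicked as expected. When we comment out the assertion (`assert_eq!(0, 1);`) and run the test again, we see that it fails with the _"test did not panic"_ message.

A drawback of this approach is that it only applies to a single test function. With multiple `#[test_case]` functions, only the first function is executed because execution cannot continue after the panic handler has been called. I currently don't know of a good way to solve this problem, so let me know if you have an idea!

### No Harness Tests

For integration tests that have only a single test function (like our `should_panic` test), the test runner isn't really needed. In such cases, we can disable the test runner completely and run our test directly in the `_start` function.

The key to this is to disable the `harness` flag for the test in the `Cargo.toml`, which defines whether a test runner is used for an integration test. When it is set to `false`, both the default test runner and the custom test runner feature are disabled, so that the test is treated like a normal executable.

Let's disable the `harness` flag for our `should_panic` test:

```toml
# in Cargo.toml

[[test]]
name = "should_panic"
harness = false
```

Now we can vastly simplify our `should_panic` test by removing the test-runner-related code. The result looks like this:

```rust
// in tests/should_panic.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use blog_os::{QemuExitCode, exit_qemu, serial_println, serial_print};

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    should_fail();
    serial_println!("[test did not panic]");
    exit_qemu(QemuExitCode::Failed);
    loop{}
}

fn should_fail() {
    serial_print!("should_fail... ");
    assert_eq!(0, 1);
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

We now call the `should_fail` function directly from our `_start` function and exit with a failure exit code if it returns. When we run `cargo test --test should_panic` now, we see that the test behaves exactly as before.

Apart from creating `should_panic` tests, disabling the `harness` attribute can also be useful for complex integration tests, for example, when the individual test functions have side effects and need to be run in a specified order.

## Summary

Testing is a very useful technique to make sure that certain components have the desired behavior. Even if tests cannot show the absence of bugs, they are still a useful tool for finding them and especially for avoiding regressions.

This post explained how to set up a test framework for our Rust kernel. We used Rust's custom test frameworks feature to implement support for a simple `#[test_case]` attribute in our bare-metal environment. Using the `isa-debug-exit` device of QEMU, our test runner can exit QEMU after running the tests and report the test status. To print error messages to the console instead of the VGA buffer, we created a basic driver for the serial port.

After creating some tests for our `println` macro, we explored integration tests in the second half of the post. We learned that they live in the `tests` directory and are treated as completely separate executables. To give them access to the `exit_qemu` function and the `serial_println` macro, we moved most of our code into a library that can be imported by all executables and integration tests. Since integration tests run in their own separate environment, they make it possible to test interactions with the hardware or to create tests that should panic.

We now have a test framework that runs in a realistic environment inside QEMU. By creating more tests in future posts, we can keep our kernel maintainable as it becomes more complex.

## What's next?

In the next post, we will explore _CPU exceptions_. These exceptions are thrown by the CPU when something illegal happens, such as a division by zero or an access to an unmapped memory page (a so-called "page fault"). Being able to catch and examine these exceptions is very important for debugging future errors. Exception handling is also very similar to the handling of hardware interrupts, which is required for keyboard support.

================================================
FILE: blog/content/edition-2/posts/05-cpu-exceptions/index.es.md
================================================
+++
title = "CPU Exceptions"
weight = 5
path = "es/cpu-exceptions"
date = 2018-06-17

[extra]
chapter = "Interrupts"
# GitHub usernames of the people that translated this post
translators = ["dobleuber"]
+++

CPU exceptions occur in various erroneous situations, for example, when accessing an invalid memory address or when dividing by zero. To react to them, we have to set up an _interrupt descriptor table_ (IDT) that provides handler functions. At the end of this post, our kernel will be able to catch [breakpoint exceptions] and resume normal execution afterward.

[breakpoint exceptions]: https://wiki.osdev.org/Exceptions#Breakpoint

This blog is openly developed on [GitHub].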
If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-05`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-05

## Overview

An exception signals that something is wrong with the current instruction. For example, the CPU issues an exception if the current instruction tries to divide by 0. When an exception occurs, the CPU interrupts its current work and immediately calls a specific exception handler function, depending on the exception type.

On x86, there are about 20 different CPU exception types. The most important are:

- **Page Fault**: A page fault occurs on illegal memory accesses. For example, if the current instruction tries to read from an unmapped page or tries to write to a read-only page.
- **Invalid Opcode**: This exception occurs when the current instruction is invalid, for example, when we try to use new [SSE instructions][instrucciones SSE] on an old CPU that does not support them.
- **General Protection Fault**: This is the exception with the broadest range of causes. It occurs on various kinds of access violations, such as trying to execute a privileged instruction in user-level code or writing reserved fields in configuration registers.
- **Double Fault**: When an exception occurs, the CPU tries to call the corresponding handler function. If another exception occurs _while calling the exception handler_, the CPU raises a double fault exception. This exception also occurs when there is no handler function registered for an exception.
- **Triple Fault**: If an exception occurs while the CPU tries to call the double fault handler function, it issues a fatal _triple fault_.
  We can't catch or otherwise handle a triple fault. Most processors react by resetting themselves and rebooting the operating system.

[instrucciones SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

For the full list of exceptions, check out the [OSDev wiki][exceptions].

[exceptions]: https://wiki.osdev.org/Exceptions

### The Interrupt Descriptor Table

In order to catch and handle exceptions, we have to set up a so-called _interrupt descriptor table_ (IDT). In this table, we can specify a handler function for each CPU exception. The hardware uses this table directly, so we need to follow a predefined format. Each entry must have the following 16-byte structure:

| Type | Name                     | Description                                                  |
| ---- | ------------------------ | ------------------------------------------------------------ |
| u16  | Function Pointer [0:15]  | The lower bits of the pointer to the handler function.       |
| u16  | GDT selector             | Selector of a code segment in the [global descriptor table]. |
| u16  | Options                  | (see below)                                                  |
| u16  | Function Pointer [16:31] | The middle bits of the pointer to the handler function.      |
| u32  | Function Pointer [32:63] | The remaining bits of the pointer to the handler function.   |
| u32  | Reserved                 |                                                              |

[global descriptor table]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

The options field has the following format:

| Bits  | Name                             | Description                                                                                                     |
| ----- | -------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| 0-2   | Interrupt Stack Table Index      | 0: Don't switch stacks, 1-7: Switch to the n-th stack in the Interrupt Stack Table when this handler is called. |
| 3-7   | Reserved                         |                                                                                                                 |
| 8     | 0: Interrupt Gate, 1: Trap Gate  | If this bit is 0, interrupts are disabled when this handler is called.                                          |
| 9-11  | must be one                      |                                                                                                                 |
| 12    | must be zero                     |                                                                                                                 |
| 13‑14 | Descriptor Privilege Level (DPL) | The minimal privilege level required for calling this handler.                                                  |
| 15    | Present                          |                                                                                                                 |

Each exception has a predefined IDT index. For example, the invalid opcode exception has table index 6 and the page fault exception has table index 14. Thus, the hardware can automatically load the corresponding IDT entry for each exception. The [Exception Table][exceptions] in the OSDev wiki shows the IDT indexes of all exceptions in the “Vector nr.” column.

When an exception occurs, the CPU roughly does the following:

1. Push some registers on the stack, including the instruction pointer and the [RFLAGS] register. (We will use these values later in this post.)
2. Read the corresponding entry from the interrupt descriptor table (IDT). For example, the CPU reads the 14th entry when a page fault occurs.
3. Check whether the entry is present and, if not, raise a double fault.
4. Disable hardware interrupts if the entry is an interrupt gate (bit 40 not set).
5. Load the specified [GDT] selector into the CS (code segment).
6. Jump to the specified handler function.

[RFLAGS]: https://en.wikipedia.org/wiki/FLAGS_register
[GDT]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

Don't worry about steps 4 and 5 for now; we will learn about the global descriptor table and hardware interrupts in future posts.

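
To make the entry format above more concrete, here is a minimal sketch of how the 16-byte layout and the options bits could be modeled in plain Rust. The names `IdtEntry` and `entry_options` are hypothetical and chosen only for this illustration; they are not the API of the `x86_64` crate, and the sketch uses the standard library so that it can be compiled and checked on a host machine rather than in the kernel.

```rust
/// A hypothetical model of one 16-byte IDT entry, with the fields in the
/// same order as the table above. This is an illustration, not the
/// `x86_64` crate's `Entry` type.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IdtEntry {
    pub pointer_low: u16,    // handler pointer bits 0..16
    pub gdt_selector: u16,   // code segment selector in the GDT
    pub options: u16,        // see `entry_options`
    pub pointer_middle: u16, // handler pointer bits 16..32
    pub pointer_high: u32,   // handler pointer bits 32..64
    pub reserved: u32,
}

/// Builds the 16-bit options field from the bit layout described above.
pub fn entry_options(present: bool, trap_gate: bool, dpl: u16, ist_index: u16) -> u16 {
    let mut options: u16 = 0b111 << 9; // bits 9-11 must be one, bit 12 stays zero
    options |= ist_index & 0b111; // bits 0-2: Interrupt Stack Table index
    if trap_gate {
        options |= 1 << 8; // bit 8: 0 = interrupt gate, 1 = trap gate
    }
    options |= (dpl & 0b11) << 13; // bits 13-14: descriptor privilege level
    if present {
        options |= 1 << 15; // bit 15: present
    }
    options
}

impl IdtEntry {
    /// Splits a 64-bit handler address into the three pointer fields.
    pub fn new(handler_addr: u64, gdt_selector: u16, options: u16) -> Self {
        IdtEntry {
            pointer_low: handler_addr as u16,
            gdt_selector,
            options,
            pointer_middle: (handler_addr >> 16) as u16,
            pointer_high: (handler_addr >> 32) as u32,
            reserved: 0,
        }
    }

    /// Reassembles the handler address from the three pointer fields.
    pub fn handler_addr(&self) -> u64 {
        (self.pointer_low as u64)
            | ((self.pointer_middle as u64) << 16)
            | ((self.pointer_high as u64) << 32)
    }
}
```

With these assumptions, a present interrupt gate with DPL 0 and no stack switch has the options value `0x8E00`, and setting the trap gate bit gives `0x8F00`. In the rest of this post we rely on the `x86_64` crate instead of building entries by hand.
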
## An IDT Type

Instead of creating our own IDT type, we will use the [`InterruptDescriptorTable`] struct of the `x86_64` crate, which looks like this:

[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html

```rust
#[repr(C)]
pub struct InterruptDescriptorTable {
    pub divide_by_zero: Entry<HandlerFunc>,
    pub debug: Entry<HandlerFunc>,
    pub non_maskable_interrupt: Entry<HandlerFunc>,
    pub breakpoint: Entry<HandlerFunc>,
    pub overflow: Entry<HandlerFunc>,
    pub bound_range_exceeded: Entry<HandlerFunc>,
    pub invalid_opcode: Entry<HandlerFunc>,
    pub device_not_available: Entry<HandlerFunc>,
    pub double_fault: Entry<HandlerFuncWithErrCode>,
    pub invalid_tss: Entry<HandlerFuncWithErrCode>,
    pub segment_not_present: Entry<HandlerFuncWithErrCode>,
    pub stack_segment_fault: Entry<HandlerFuncWithErrCode>,
    pub general_protection_fault: Entry<HandlerFuncWithErrCode>,
    pub page_fault: Entry<PageFaultHandlerFunc>,
    pub x87_floating_point: Entry<HandlerFunc>,
    pub alignment_check: Entry<HandlerFuncWithErrCode>,
    pub machine_check: Entry<HandlerFunc>,
    pub simd_floating_point: Entry<HandlerFunc>,
    pub virtualization: Entry<HandlerFunc>,
    pub security_exception: Entry<HandlerFuncWithErrCode>,
    // some fields omitted
}
```

The fields have the type [`idt::Entry<F>`][`idt::Entry`], which is a struct that represents the fields of an IDT entry (see the table above). The type parameter `F` defines the expected handler function type. We see that some entries require a [`HandlerFunc`] and some entries require a [`HandlerFuncWithErrCode`]. The page fault even has its own special type: [`PageFaultHandlerFunc`].

[`idt::Entry`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.Entry.html
[`HandlerFunc`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFunc.html
[`HandlerFuncWithErrCode`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFuncWithErrCode.html
[`PageFaultHandlerFunc`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.PageFaultHandlerFunc.html

Let's look at the `HandlerFunc` type first:

```rust
type HandlerFunc = extern "x86-interrupt" fn(_: InterruptStackFrame);
```

It's a [type alias][alias de tipo] for an `extern "x86-interrupt" fn` type.
The `extern` keyword defines a function with a [foreign calling convention] and is often used to communicate with C code (`extern "C" fn`). But what is the `x86-interrupt` calling convention?

[alias de tipo]: https://doc.rust-lang.org/book/ch20-03-advanced-types.html#creating-type-synonyms-with-type-aliases
[foreign calling convention]: https://doc.rust-lang.org/nomicon/ffi.html#foreign-calling-conventions

## The Interrupt Calling Convention

Exceptions are quite similar to function calls: the CPU jumps to the first instruction of the called function and executes it. Afterwards, the CPU jumps to the return address and continues the execution of the parent function.

However, there is a major difference between exceptions and function calls: a function call is invoked voluntarily by a compiler-inserted `call` instruction, while an exception might occur at _any_ instruction. To understand the consequences of this difference, we need to examine function calls in more detail.

[Calling conventions] specify the details of a function call. For example, they specify where function parameters are placed (for example, in registers or on the stack) and how results are returned. On x86_64 Linux, the following rules apply for C functions (specified in the [System V ABI]):

[Calling conventions]: https://en.wikipedia.org/wiki/Calling_convention
[System V ABI]: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

- the first six integer arguments are passed in the registers `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`
- additional arguments are passed on the stack
- results are returned in `rax` and `rdx`

Note that Rust does not follow the C ABI (in fact, [there isn't even a Rust ABI yet][rust abi]), so these rules apply only to functions declared as `extern "C" fn`.

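
As a rough illustration of the rules above, the following sketch declares a function with the C calling convention and models where its integer arguments would be placed. The helper `integer_arg_location` and the `ArgLocation` type are hypothetical names invented for this example; they only encode the simplified rule from the list above and ignore floating-point arguments, structs, and everything else the real ABI covers.

```rust
/// An `extern "C"` function follows the C calling convention of the target.
/// On x86_64 System V targets such as Linux, `a`, `b`, and `c` arrive in
/// `rdi`, `rsi`, and `rdx`, and the result is returned in `rax`.
pub extern "C" fn add_three(a: u64, b: u64, c: u64) -> u64 {
    a + b + c
}

/// Where an integer argument is passed (simplified model for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArgLocation {
    /// Passed in the named register.
    Register(&'static str),
    /// Passed on the stack; `index` counts the stack-passed arguments from 0.
    Stack { index: usize },
}

/// Returns the location of the n-th (0-based) integer argument according to
/// the simplified System V rule listed above.
pub fn integer_arg_location(n: usize) -> ArgLocation {
    const REGISTERS: [&str; 6] = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"];
    match REGISTERS.get(n) {
        Some(&register) => ArgLocation::Register(register),
        None => ArgLocation::Stack { index: n - REGISTERS.len() },
    }
}
```

Under this model, the seventh integer argument (index 6) is the first one that goes on the stack. The point of the sketch is only that the argument locations are fixed by convention, which is exactly what an exception handler cannot rely on.
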
[rust abi]: https://github.com/rust-lang/rfcs/issues/600

### Preserved and Scratch Registers

The calling convention divides the registers into two parts: _preserved_ and _scratch_ registers.

The values of _preserved_ registers must remain unchanged across function calls. So a called function (the _"callee"_) is only allowed to overwrite these registers if it restores their original values before returning. Therefore, these registers are called _"callee-saved"_. A common pattern is to save these registers to the stack at the function's beginning and restore them just before returning.

In contrast, a called function is allowed to overwrite _scratch_ registers without restrictions. If the caller wants to preserve the value of a scratch register across a function call, it needs to back it up before the call (for example, by pushing it to the stack) and restore it afterwards. So the scratch registers are _caller-saved_.

On x86_64, the C calling convention specifies the following preserved and scratch registers:

| preserved registers                             | scratch registers                                           |
| ----------------------------------------------- | ----------------------------------------------------------- |
| `rbp`, `rbx`, `rsp`, `r12`, `r13`, `r14`, `r15` | `rax`, `rcx`, `rdx`, `rsi`, `rdi`, `r8`, `r9`, `r10`, `r11` |
| _callee-saved_                                  | _caller-saved_                                              |

The compiler knows these rules, so it generates the code accordingly. For example, most functions begin with a `push rbp`, which backs up `rbp` on the stack (because it's a callee-saved register).

### Preserving all Registers

In contrast to function calls, exceptions can occur at _any_ instruction. In most cases, we don't even know at compile time whether the generated code will cause an exception.
Por ejemplo, el compilador no puede saber si una instrucción provoca un desbordamiento de pila o un fallo de página.

Dado que no sabemos cuándo ocurrirá una excepción, no podemos respaldar ningún registro antes. Esto significa que no podemos usar una convención de llamada que dependa de registros guardados por el llamador para los manejadores de excepciones. En su lugar, necesitamos una convención de llamada que preserve _todos los registros_. La convención de llamada `x86-interrupt` es una de esas convenciones, por lo que garantiza que todos los valores de los registros se restauren a sus valores originales al retornar de la función.

Tenga en cuenta que esto no significa que todos los registros se guarden en la pila al entrar en la función. En su lugar, el compilador solo respalda los registros que la función sobrescribe. De esta manera, se puede generar un código muy eficiente para funciones cortas que solo utilizan unos pocos registros.

### El marco de pila de interrupción

En una llamada a función normal (usando la instrucción `call`), la CPU empuja la dirección de retorno antes de saltar a la función objetivo. Al retornar de la función (usando la instrucción `ret`), la CPU extrae esta dirección de retorno y salta a ella. Por lo tanto, el marco de pila de una llamada a función normal se ve así:

![marco de pila de función](function-stack-frame.svg)

Sin embargo, para los manejadores de excepciones e interrupciones, empujar una dirección de retorno no sería suficiente, ya que los manejadores de interrupción a menudo se ejecutan en un contexto diferente (puntero de pila, flags de CPU, etc.). En cambio, la CPU realiza los siguientes pasos cuando ocurre una interrupción:

0. **Guardando el antiguo puntero de pila**: La CPU lee los valores del puntero de pila (`rsp`) y del registro del segmento de pila (`ss`) y los recuerda en un búfer interno.
1. **Alineando el puntero de pila**: Una interrupción puede ocurrir en cualquier instrucción, por lo que el puntero de pila también puede tener cualquier valor. Sin embargo, algunas instrucciones de CPU (por ejemplo, algunas instrucciones SSE) requieren que el puntero de pila esté alineado en un límite de 16 bytes, por lo que la CPU realiza tal alineación inmediatamente después de la interrupción.
2. **Cambiando de pilas** (en algunos casos): Se produce un cambio de pila cuando cambia el nivel de privilegio de la CPU, por ejemplo, cuando ocurre una excepción de CPU en un programa en modo usuario. También es posible configurar los cambios de pila para interrupciones específicas utilizando la llamada _Tabla de Pila de Interrupción_ (descrita en la próxima publicación).
3. **Empujando el antiguo puntero de pila**: La CPU empuja los valores `rsp` y `ss` del paso 0 a la pila. Esto hace posible restaurar el puntero de pila original al retornar de un manejador de interrupción.
4. **Empujando y actualizando el registro `RFLAGS`**: El registro [`RFLAGS`] contiene varios bits de control y estado. Al entrar en la interrupción, la CPU cambia algunos bits y empuja el antiguo valor.
5. **Empujando el puntero de instrucción**: Antes de saltar a la función manejadora de la interrupción, la CPU empuja el puntero de instrucción (`rip`) y el segmento de código (`cs`). Esto es comparable al empuje de la dirección de retorno de una llamada a función normal.
6. **Empujando un código de error** (para algunas excepciones): Para algunas excepciones específicas, como los fallos de página, la CPU empuja un código de error, que describe la causa de la excepción.
7. **Invocando el manejador de interrupción**: La CPU lee la dirección y el descriptor de segmento de la función manejadora de interrupción del campo correspondiente en la IDT. Luego, invoca este manejador cargando los valores en los registros `rip` y `cs`.

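
Como referencia, los pasos anteriores se pueden modelar con un pequeño esbozo en Rust que se ejecuta en un programa normal (no en el núcleo). Muestra la alineación del paso 1 y el orden de los empujes de los pasos 3 a 5, sin el código de error opcional. Los nombres `MarcoDeInterrupcion`, `alinear_a_16`, `empujar_marco` y `leer_marco` son hipotéticos y no pertenecen al crate `x86_64`; la pila se simula con un `Vec<u64>` en lugar de memoria real.

```rust
/// Modelo ilustrativo del marco de pila de interrupción (nombre hipotético).
/// Los campos van de la dirección más baja a la más alta, es decir,
/// en orden inverso al de los empujes de la CPU.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MarcoDeInterrupcion {
    pub rip: u64,    // paso 5: puntero de instrucción
    pub cs: u64,     // paso 5: segmento de código
    pub rflags: u64, // paso 4: valor antiguo de RFLAGS
    pub rsp: u64,    // paso 3: puntero de pila antiguo
    pub ss: u64,     // paso 3: segmento de pila antiguo
}

/// Paso 1: redondea el puntero de pila hacia abajo a un múltiplo de 16.
pub fn alinear_a_16(rsp: u64) -> u64 {
    rsp & !0xF
}

/// Pasos 3 a 5: devuelve los valores en el orden en que se empujan.
pub fn empujar_marco(ss: u64, rsp: u64, rflags: u64, cs: u64, rip: u64) -> Vec<u64> {
    vec![ss, rsp, rflags, cs, rip]
}

/// Reconstruye el marco a partir de los cinco últimos valores empujados.
/// Devuelve `None` si la pila simulada contiene menos de cinco valores.
pub fn leer_marco(pila: &[u64]) -> Option<MarcoDeInterrupcion> {
    match pila {
        [.., ss, rsp, rflags, cs, rip] => Some(MarcoDeInterrupcion {
            rip: *rip,
            cs: *cs,
            rflags: *rflags,
            rsp: *rsp,
            ss: *ss,
        }),
        _ => None,
    }
}
```

Con cinco campos de 64 bits, el marco ocupa 40 bytes. Cuando la CPU empuja además un código de error, este queda por debajo de `rip`, en la dirección más baja.
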

[`RFLAGS`]: https://en.wikipedia.org/wiki/FLAGS_register

Por lo tanto, el _marco de pila de interrupción_ se ve así:

![marco de pila de interrupción](exception-stack-frame.svg)

En el crate `x86_64`, el marco de pila de interrupción está representado por la estructura [`InterruptStackFrame`]. Se pasa como argumento a los manejadores de interrupción y se puede utilizar para recuperar información adicional sobre la causa de la excepción. La estructura no contiene un campo de código de error, ya que solo unas pocas excepciones empujan un código de error. Estas excepciones utilizan el tipo de función separado [`HandlerFuncWithErrCode`], que tiene un argumento adicional `error_code`.

[`InterruptStackFrame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptStackFrame.html

### Detrás de las escenas {#demasiada-magia}

La convención de llamada `x86-interrupt` es una potente abstracción que oculta casi todos los detalles desordenados del proceso de manejo de excepciones. Sin embargo, a veces es útil saber lo que sucede tras el telón. Aquí hay un breve resumen de las cosas que la convención de llamada `x86-interrupt` maneja:

- **Recuperando los argumentos**: La mayoría de las convenciones de llamada esperan que los argumentos se pasen en registros. Esto no es posible para los manejadores de excepciones, ya que no debemos sobrescribir los valores de ningún registro antes de respaldarlos en la pila. En cambio, la convención de llamada `x86-interrupt` es consciente de que los argumentos ya están en la pila en un desplazamiento específico.
- **Retornando usando `iretq`**: Dado que el marco de pila de interrupción difiere completamente de los marcos de pila de llamadas a funciones normales, no podemos retornar de las funciones manejadoras a través de la instrucción `ret` normal. En su lugar, se debe usar la instrucción `iretq`.
- **Manejando el código de error**: El código de error, que se empuja para algunas excepciones, hace que las cosas sean mucho más complejas. Cambia la alineación de la pila (vea el siguiente punto) y debe ser extraído de la pila antes de retornar. La convención de llamada `x86-interrupt` maneja toda esa complejidad. Sin embargo, no sabe qué función manejadora se utiliza para qué excepción, por lo que necesita deducir esa información del número de argumentos de función. Esto significa que el programador sigue siendo responsable de utilizar el tipo de función correcto para cada excepción. Afortunadamente, el tipo `InterruptDescriptorTable` definido por el crate `x86_64` asegura que se utilicen los tipos de función correctos.
- **Alineando la pila**: Algunas instrucciones (especialmente las instrucciones SSE) requieren que la pila esté alineada a 16 bytes. La CPU asegura esta alineación cada vez que ocurre una excepción, pero para algunas excepciones puede destruirla de nuevo más tarde cuando empuja un código de error. La convención de llamada `x86-interrupt` se encarga de esto al realinear la pila en este caso.

Si está interesado en más detalles, también tenemos una serie de publicaciones que explican el manejo de excepciones utilizando [funciones desnudas], vinculadas [al final de esta publicación][too-much-magic].

[funciones desnudas]: https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md
[too-much-magic]: #demasiada-magia

## Implementación

Ahora que hemos entendido la teoría, es hora de manejar las excepciones de CPU en nuestro núcleo. Comenzaremos creando un nuevo módulo de interrupciones en `src/interrupts.rs`, con una función `init_idt` que crea una nueva `InterruptDescriptorTable`:

``` rust
// en src/lib.rs

pub mod interrupts;

// en src/interrupts.rs

use x86_64::structures::idt::InterruptDescriptorTable;

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
}
```

Ahora podemos agregar funciones manejadoras.
Comenzamos agregando un manejador para la [excepción de punto de interrupción]. La excepción de punto de interrupción es la excepción perfecta para probar el manejo de excepciones. Su único propósito es pausar temporalmente un programa cuando se ejecuta la instrucción de punto de interrupción `int3`.

[excepción de punto de interrupción]: https://wiki.osdev.org/Exceptions#Breakpoint

La excepción de punto de interrupción se utiliza comúnmente en depuradores: cuando el usuario establece un punto de interrupción, el depurador sobrescribe la instrucción correspondiente con la instrucción `int3` para que la CPU lance la excepción de punto de interrupción al llegar a esa línea. Cuando el usuario quiere continuar el programa, el depurador vuelve a colocar la instrucción original en lugar de `int3` y continúa el programa. Para más detalles, vea la serie ["_Cómo funcionan los depuradores_"].

["_Cómo funcionan los depuradores_"]: https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints

Para nuestro caso de uso, no necesitamos sobrescribir instrucciones. En su lugar, solo queremos imprimir un mensaje cuando la instrucción de punto de interrupción se ejecute y luego continuar el programa. Así que creemos una simple función `breakpoint_handler` y agreguémosla a nuestra IDT:

```rust
// en src/interrupts.rs

use x86_64::structures::idt::{InterruptDescriptorTable, InterruptStackFrame};
use crate::println;

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
}

extern "x86-interrupt" fn breakpoint_handler(
    stack_frame: InterruptStackFrame)
{
    println!("EXCEPCIÓN: PUNTO DE INTERRUPCIÓN\n{:#?}", stack_frame);
}
```

Nuestro manejador simplemente muestra un mensaje e imprime en formato legible el marco de pila de interrupción.

Cuando intentamos compilarlo, ocurre el siguiente error:

```
error[E0658]: la ABI de x86-interrupt es experimental y está sujeta a cambios (ver issue #40180)
  --> src/main.rs:53:1
   |
53 | / extern "x86-interrupt" fn breakpoint_handler(stack_frame: InterruptStackFrame) {
54 | |     println!("EXCEPCIÓN: PUNTO DE INTERRUPCIÓN\n{:#?}", stack_frame);
55 | | }
   | |_^
   |
   = ayuda: añade #![feature(abi_x86_interrupt)] a los atributos del crate para habilitarlo
```

Este error ocurre porque la convención de llamada `x86-interrupt` sigue siendo inestable. Para utilizarla de todos modos, tenemos que habilitarla explícitamente añadiendo `#![feature(abi_x86_interrupt)]` en la parte superior de nuestro `lib.rs`.

### Cargando la IDT

Para que la CPU utilice nuestra nueva tabla de descriptores de interrupción, necesitamos cargarla usando la instrucción [`lidt`]. La estructura `InterruptDescriptorTable` del crate `x86_64` proporciona un método [`load`][InterruptDescriptorTable::load] para eso. Intentemos usarlo:

[`lidt`]: https://www.felixcloutier.com/x86/lgdt:lidt
[InterruptDescriptorTable::load]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html#method.load

```rust
// en src/interrupts.rs

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
    idt.load();
}
```

Cuando intentamos compilarlo ahora, ocurre el siguiente error:

```
error: `idt` no vive lo suficiente
  --> src/interrupts/mod.rs:43:5
   |
43 |     idt.load();
   |     ^^^ no vive lo suficiente
44 | }
   | - el valor prestado solo es válido hasta aquí
   |
   = nota: el valor prestado debe ser válido durante la vida estática...
```

Así que el método `load` espera un `&'static self`, es decir, una referencia válida durante toda la ejecución del programa. La razón es que la CPU accederá a esta tabla en cada interrupción hasta que se cargue una IDT diferente. Por lo tanto, usar un tiempo de vida más corto que `'static` podría llevar a errores de uso después de liberar (use-after-free).

De hecho, esto es exactamente lo que sucede aquí. Nuestra `idt` se crea en la pila, por lo que solo es válida dentro de la función `init_idt`. Después, la memoria de la pila se reutiliza para otras funciones, por lo que la CPU podría interpretar una memoria aleatoria de la pila como IDT. Afortunadamente, el método `load` de `InterruptDescriptorTable` codifica este requisito de tiempo de vida en su definición de función, para que el compilador de Rust pueda prevenir este posible error en tiempo de compilación.

Para solucionar este problema, necesitamos almacenar nuestra `idt` en un lugar donde tenga un tiempo de vida `'static`. Para lograr esto, podríamos asignar nuestra IDT en el montón usando [`Box`] y luego convertirla en una referencia `'static`, pero estamos escribiendo un núcleo de sistema operativo y, por lo tanto, no tenemos un montón (todavía).

[`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html

Como alternativa, podríamos intentar almacenar la IDT como una `static`:

```rust
static IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    IDT.breakpoint.set_handler_fn(breakpoint_handler);
    IDT.load();
}
```

Sin embargo, hay un problema: las estáticas son inmutables, por lo que no podemos modificar la entrada de punto de interrupción desde nuestra función `init_idt`. Podríamos resolver este problema utilizando un [`static mut`]:

[`static mut`]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable

```rust
static mut IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    unsafe {
        IDT.breakpoint.set_handler_fn(breakpoint_handler);
        IDT.load();
    }
}
```

Esta variante se compila sin errores, pero está lejos de ser idiomática. Las variables `static mut` son muy propensas a condiciones de carrera, por lo que necesitamos un bloque [`unsafe`] en cada acceso.

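
El requisito `'static` y la alternativa con `Box` mencionada arriba se pueden probar en un programa normal con `std`, donde sí hay un montón. El siguiente esbozo parte de supuestos: `Tabla`, `cargar` y las demás funciones son nombres hipotéticos que solo imitan la firma `&'static self` de `load`, y no cargan nada en la CPU.

```rust
/// Tabla de ejemplo (hipotética); no es la `InterruptDescriptorTable` real.
pub struct Tabla {
    pub entradas: [u64; 4],
}

impl Tabla {
    pub const fn nueva() -> Self {
        Tabla { entradas: [0; 4] }
    }
}

/// Solo acepta referencias `'static`, igual que exige `load`.
/// Devuelve la referencia para mostrar que sigue siendo válida después.
pub fn cargar(tabla: &'static Tabla) -> &'static Tabla {
    tabla
}

/// Variante 1: una `static` inmutable ya tiene tiempo de vida `'static`.
static TABLA_GLOBAL: Tabla = Tabla::nueva();

pub fn cargar_global() -> &'static Tabla {
    cargar(&TABLA_GLOBAL)
}

/// Variante 2 (requiere un montón): `Box::leak` renuncia a liberar la
/// memoria y entrega una referencia `'static` a un valor modificable.
pub fn cargar_desde_monton() -> &'static Tabla {
    let tabla: &'static mut Tabla = Box::leak(Box::new(Tabla::nueva()));
    tabla.entradas[3] = 0xdead_beef; // configurar antes de cargar
    cargar(tabla)
}

// Lo siguiente no compila, porque `local` se destruye al final de la función:
//
// pub fn cargar_local() {
//     let local = Tabla::nueva();
//     cargar(&local); // el préstamo no vive lo suficiente
// }
```

`Box::leak` resuelve a la vez la mutabilidad y el tiempo de vida, pero depende de un asignador de memoria, que nuestro núcleo todavía no tiene.
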

[`unsafe`]: https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#unsafe-superpowers

#### Las estáticas perezosas al rescate

Afortunadamente, existe el macro `lazy_static`. En lugar de evaluar una `static` en tiempo de compilación, el macro realiza la inicialización cuando la `static` se referencia por primera vez. Por lo tanto, podemos hacer casi todo en el bloque de inicialización e incluso leer valores en tiempo de ejecución.

Ya importamos el crate `lazy_static` cuando [creamos una abstracción para el búfer de texto VGA][vga text buffer lazy static]. Así que podemos utilizar directamente el macro `lazy_static!` para crear nuestra IDT estática:

[vga text buffer lazy static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics

```rust
// en src/interrupts.rs

use lazy_static::lazy_static;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt
    };
}

pub fn init_idt() {
    IDT.load();
}
```

Tenga en cuenta que esta solución no requiere bloques `unsafe`. El macro `lazy_static!` utiliza `unsafe` detrás de escena, pero lo abstrae en una interfaz segura.

### Ejecutándolo

El último paso para hacer que las excepciones funcionen en nuestro núcleo es llamar a la función `init_idt` desde nuestro `main.rs`. En lugar de llamarla directamente, introducimos una función de inicialización general en nuestro `lib.rs`:

```rust
// en src/lib.rs

pub fn init() {
    interrupts::init_idt();
}
```

Con esta función, ahora tenemos un lugar central para las rutinas de inicialización que se pueden compartir entre las diferentes funciones `_start` en nuestro `main.rs`, `lib.rs` y pruebas de integración.

Ahora podemos actualizar la función `_start` de nuestro `main.rs` para llamar a `init` y luego activar una excepción de punto de interrupción:

```rust
// en src/main.rs

#[no_mangle]
pub extern "C" fn _start() -> ! {
    println!("¡Hola Mundo{}", "!");

    blog_os::init(); // nueva

    // invocar una excepción de punto de interrupción
    x86_64::instructions::interrupts::int3(); // nueva

    // como antes
    #[cfg(test)]
    test_main();

    println!("¡No se bloqueó!");
    loop {}
}
```

Cuando lo ejecutamos en QEMU ahora (usando `cargo run`), vemos lo siguiente:

![QEMU imprimiendo `EXCEPCIÓN: PUNTO DE INTERRUPCIÓN` y el marco de pila de interrupción](qemu-breakpoint-exception.png)

¡Funciona! La CPU invoca exitosamente nuestro manejador de punto de interrupción, que imprime el mensaje, y luego regresa a la función `_start`, donde se imprime el mensaje `¡No se bloqueó!`.

Vemos que el marco de pila de interrupción nos indica los punteros de instrucción y de pila en el momento en que ocurrió la excepción. Esta información es muy útil al depurar excepciones inesperadas.

### Agregando una prueba

Creemos una prueba que asegure que lo anterior sigue funcionando. Primero, actualizamos la función `_start` para que también llame a `init`:

```rust
// en src/lib.rs

/// Punto de entrada para `cargo test`
#[cfg(test)]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    init();      // nueva
    test_main();
    loop {}
}
```

Recuerde, esta función `_start` se utiliza al ejecutar `cargo test --lib`, ya que Rust prueba el `lib.rs` de forma completamente independiente de `main.rs`. Necesitamos llamar a `init` aquí para configurar una IDT antes de ejecutar las pruebas.

Ahora podemos crear una prueba `test_breakpoint_exception`:

```rust
// en src/interrupts.rs

#[test_case]
fn test_breakpoint_exception() {
    // invocar una excepción de punto de interrupción
    x86_64::instructions::interrupts::int3();
}
```

La prueba invoca la función `int3` para activar una excepción de punto de interrupción. Al comprobar que la ejecución continúa después, verificamos que nuestro manejador de punto de interrupción está funcionando correctamente.

Puedes ejecutar esta nueva prueba con `cargo test` (todas las pruebas) o `cargo test --lib` (solo las pruebas de `lib.rs` y sus módulos). Deberías ver lo siguiente en la salida:

```
blog_os::interrupts::test_breakpoint_exception...	[ok]
```

## ¿Demasiada magia?

La convención de llamada `x86-interrupt` y el tipo [`InterruptDescriptorTable`] hicieron que el proceso de manejo de excepciones fuera relativamente sencillo y sin dolor. Si esto fue demasiada magia para ti y te gusta aprender todos los detalles sucios del manejo de excepciones, lo tenemos cubierto: nuestra serie ["Manejo de Excepciones con Funciones Desnudas"] muestra cómo manejar excepciones sin la convención de llamada `x86-interrupt` y también crea su propio tipo de IDT. Históricamente, estas publicaciones eran las principales publicaciones sobre manejo de excepciones antes de que existieran la convención de llamada `x86-interrupt` y el crate `x86_64`. Ten en cuenta que estas publicaciones se basan en la [primera edición] de este blog y pueden estar desactualizadas.

["Manejo de Excepciones con Funciones Desnudas"]: @/edition-1/extra/naked-exceptions/_index.md
[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html
[primera edición]: @/edition-1/_index.md

## ¿Qué sigue?

¡Hemos capturado con éxito nuestra primera excepción y hemos regresado de ella! El siguiente paso es asegurarnos de que capturamos todas las excepciones, porque una excepción no capturada causa un [triple fallo] fatal, lo que lleva a un reinicio del sistema. La próxima publicación explica cómo podemos evitar esto al capturar correctamente [dobles fallos].

[triple fallo]: https://wiki.osdev.org/Triple_Fault
[dobles fallos]: https://wiki.osdev.org/Double_Fault#Double_Fault

================================================
FILE: blog/content/edition-2/posts/05-cpu-exceptions/index.fa.md
================================================
+++
title = "استثناهای پردازنده"
weight = 5
path = "fa/cpu-exceptions"
date = 2018-06-17

[extra]
# Please update this when updating the translation
translation_based_on_commit = "a081faf3cced9aeb0521052ba91b74a1c408dcff"
# GitHub usernames of the people that translated this post
translators = ["hamidrezakp", "MHBahrampour"]
rtl = true
+++

استثناهای پردازنده در موقعیت های مختلف دارای خطا رخ می دهد ، به عنوان مثال هنگام دسترسی به آدرس حافظه نامعتبر یا تقسیم بر صفر. برای واکنش به آنها ، باید یک _جدول توصیف کننده وقفه_ تنظیم کنیم که توابع کنترل کننده را فراهم کند. در انتهای این پست ، هسته ما قادر به گرفتن [استثناهای breakpoint] و ادامه اجرای طبیعی پس از آن خواهد بود.

[استثناهای breakpoint]: https://wiki.osdev.org/Exceptions#Breakpoint

این بلاگ بصورت آزاد روی [گیت‌هاب] توسعه داده شده است. اگر مشکل یا سوالی دارید، لطفاً آن‌جا یک ایشو باز کنید. همچنین می‌توانید [در زیر] این پست کامنت بگذارید. سورس کد کامل این پست را می‌توانید در بِرَنچ [`post-05`][post branch] پیدا کنید.

[گیت‌هاب]: https://github.com/phil-opp/blog_os
[در زیر]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-05

## بررسی اجمالی

یک استثنا نشان می دهد که مشکلی در دستورالعمل فعلی وجود دارد. به عنوان مثال ، اگر دستورالعمل فعلی بخواهد تقسیم بر 0 کند ، پردازنده یک استثنا صادر می کند. وقتی یک استثنا اتفاق می افتد ، پردازنده کار فعلی خود را رها کرده و بسته به نوع استثنا ، بلافاصله یک تابع خاص کنترل کننده استثنا را فراخوانی می کند.

در x86 حدود 20 نوع مختلف استثنا پردازنده وجود دارد. مهمترین آنها در زیر آمده اند:

- **خطای صفحه**: خطای صفحه در دسترسی غیرقانونی به حافظه رخ می دهد. به عنوان مثال ، اگر دستورالعمل فعلی بخواهد از یک صفحه نگاشت نشده بخواند یا بخواهد در یک صفحه فقط خواندنی بنویسد.
- **کد نامعتبر**: این استثنا وقتی رخ می دهد که دستورالعمل فعلی نامعتبر است ، به عنوان مثال وقتی می خواهیم از [دستورالعمل های SSE] جدیدتر بر روی یک پردازنده قدیمی استفاده کنیم که آنها را پشتیبانی نمی کند.
- **خطای محافظت عمومی**: این استثنا دارای بیشترین دامنه علل است. این مورد در انواع مختلف نقض دسترسی مانند تلاش برای اجرای یک دستورالعمل ممتاز در کد سطح کاربر یا نوشتن فیلدهای رزرو شده در ثبات های پیکربندی رخ می دهد.
- **خطای دوگانه**: هنگامی که یک استثنا رخ می دهد ، پردازنده سعی می کند تابع کنترل کننده مربوطه را اجرا کند. اگر یک استثنا دیگر رخ دهد _هنگام فراخوانی تابع کنترل کننده استثنا_ ، پردازنده یک استثنای خطای دوگانه ایجاد می کند. این استثنا همچنین زمانی اتفاق می افتد که هیچ تابع کنترل کننده ای برای یک استثنا ثبت نشده باشد.
- **خطای سه‌گانه**: اگر در حالی که پردازنده سعی می کند تابع کنترل کننده خطای دوگانه را فراخوانی کند استثنایی رخ دهد ، این یک خطای سه‌گانه است. ما نمی توانیم یک خطای سه گانه را بگیریم یا آن را کنترل کنیم. بیشتر پردازنده ها با ریست کردن خود و راه اندازی مجدد سیستم عامل واکنش نشان می دهند.

[دستورالعمل های SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

برای مشاهده لیست کامل استثنا‌ها ، [ویکی OSDev][exceptions] را بررسی کنید.

[exceptions]: https://wiki.osdev.org/Exceptions

### جدول توصیف کننده وقفه

برای گرفتن و رسیدگی به استثنا‌ها ، باید اصطلاحاً _جدول توصیفگر وقفه_ (IDT) را تنظیم کنیم. در این جدول می توانیم برای هر استثنا پردازنده یک تابع کنترل کننده مشخص کنیم. سخت افزار به طور مستقیم از این جدول استفاده می کند ، بنابراین باید از یک قالب از پیش تعریف شده پیروی کنیم. هر ورودی جدول باید ساختار 16 بایتی زیر را داشته باشد:

| Type | Name                     | Description                                                  |
| ---- | ------------------------ | ------------------------------------------------------------ |
| u16  | Function Pointer [0:15]  | The lower bits of the pointer to the handler function.       |
| u16  | GDT selector             | Selector of a code segment in the [global descriptor table]. |
| u16  | Options                  | (see below)                                                  |
| u16  | Function Pointer [16:31] | The middle bits of the pointer to the handler function.      |
| u32  | Function Pointer [32:63] | The remaining bits of the pointer to the handler function.   |
| u32  | Reserved                 |                                                              |

[global descriptor table]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

قسمت گزینه ها (Options) دارای قالب زیر است:

| Bits  | Name                             | Description                                                                                                     |
| ----- | -------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| 0-2   | Interrupt Stack Table Index      | 0: Don't switch stacks, 1-7: Switch to the n-th stack in the Interrupt Stack Table when this handler is called. |
| 3-7   | Reserved                         |                                                                                                                 |
| 8     | 0: Interrupt Gate, 1: Trap Gate  | If this bit is 0, interrupts are disabled when this handler is called.                                          |
| 9-11  | must be one                      |                                                                                                                 |
| 12    | must be zero                     |                                                                                                                 |
| 13‑14 | Descriptor Privilege Level (DPL) | The minimal privilege level required for calling this handler.                                                  |
| 15    | Present                          |                                                                                                                 |

هر استثنا دارای یک اندیس از پیش تعریف شده در IDT است. به عنوان مثال استثنا کد نامعتبر دارای اندیس 6 و استثنا خطای صفحه دارای اندیس 14 است. بنابراین ، سخت افزار می تواند به طور خودکار عنصر مربوطه را برای هر استثنا بارگذاری کند. [جدول استثناها][exceptions] در ویکی OSDev ، اندیس های IDT کلیه استثناها را در ستون “Vector nr.” نشان داده است.

هنگامی که یک استثنا رخ می دهد ، پردازنده تقریباً موارد زیر را انجام می دهد:

1. برخی از ثبات‌ها را به پشته وارد می‌کند، از جمله اشاره گر دستورالعمل و ثبات [RFLAGS]. (بعداً در این پست از این مقادیر استفاده خواهیم کرد.)
2. عنصر مربوط به آن (استثنا) را از جدول توصیف کننده وقفه (IDT) می‌خواند. به عنوان مثال ، پردازنده هنگام رخ دادن خطای صفحه ، عنصر چهاردهم را می خواند.
3. وجود عنصر را بررسی می‌کند. اگر اینگونه نباشد یک خطای دوگانه ایجاد می‌کند.
4. اگر عنصر یک گیت وقفه است (بیت 40 تنظیم نشده است) وقفه های سخت افزاری را غیرفعال می‌کند.
5. انتخابگر مشخص شده [GDT] را در سگمنت CS بارگذاری می‌کند.
6. به تابع کنترل کننده مشخص شده می‌رود.

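
قالب ورودی و بیت های فیلد Options را که در جدول های بالا آمده اند می توان با یک طرح کوچک در راست مدل کرد. این فقط یک نمونه آموزشی با فرض های مشخص است: نام های `IdtEntry` و `make_options` فرضی هستند ، این کد در یک برنامه معمولی اجرا می شود و همان نوع واقعی کرت `x86_64` نیست.

```rust
/// Illustrative model of one 16-byte IDT entry.
/// Hypothetical type for this sketch, not the `x86_64` crate's `Entry`.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IdtEntry {
    pub pointer_low: u16,
    pub gdt_selector: u16,
    pub options: u16,
    pub pointer_middle: u16,
    pub pointer_high: u32,
    pub reserved: u32,
}

/// Builds the options field from the bit layout in the table above.
pub fn make_options(present: bool, trap_gate: bool, dpl: u16, ist_index: u16) -> u16 {
    let mut bits: u16 = 0b111 << 9; // bits 9-11 must be one
    bits |= ist_index & 0b111; // bits 0-2: Interrupt Stack Table index
    if trap_gate {
        bits |= 1 << 8; // bit 8: 1 = trap gate, 0 = interrupt gate
    }
    bits |= (dpl & 0b11) << 13; // bits 13-14: descriptor privilege level
    if present {
        bits |= 1 << 15; // bit 15: present
    }
    bits
}

impl IdtEntry {
    /// Splits a 64-bit handler address into the three pointer fields.
    pub fn new(handler_addr: u64, gdt_selector: u16, options: u16) -> Self {
        IdtEntry {
            pointer_low: handler_addr as u16,
            gdt_selector,
            options,
            pointer_middle: (handler_addr >> 16) as u16,
            pointer_high: (handler_addr >> 32) as u32,
            reserved: 0,
        }
    }

    /// Reassembles the handler address from the three pointer fields.
    pub fn handler_addr(&self) -> u64 {
        u64::from(self.pointer_low)
            | (u64::from(self.pointer_middle) << 16)
            | (u64::from(self.pointer_high) << 32)
    }
}
```

برای نمونه ، `make_options(true, false, 0, 0)` مقدار `0x8E00` را می سازد ، یعنی یک گیت وقفه حاضر با سطح دسترسی 0 و بدون تعویض پشته.
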

[RFLAGS]: https://en.wikipedia.org/wiki/FLAGS_register
[GDT]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

در حال حاضر نگران مراحل 4 و 5 نباشید ، ما در مورد جدول توصیف کننده گلوبال و وقفه های سخت افزاری در پست های بعدی خواهیم آموخت.

## یک نوع IDT

به جای ایجاد نوع IDT خود ، از [ساختمان `InterruptDescriptorTable`] کرت `x86_64` استفاده خواهیم کرد که به این شکل است:

[ساختمان `InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html

``` rust
#[repr(C)]
pub struct InterruptDescriptorTable {
    pub divide_by_zero: Entry,
    pub debug: Entry,
    pub non_maskable_interrupt: Entry,
    pub breakpoint: Entry,
    pub overflow: Entry,
    pub bound_range_exceeded: Entry,
    pub invalid_opcode: Entry,
    pub device_not_available: Entry,
    pub double_fault: Entry,
    pub invalid_tss: Entry,
    pub segment_not_present: Entry,
    pub stack_segment_fault: Entry,
    pub general_protection_fault: Entry,
    pub page_fault: Entry,
    pub x87_floating_point: Entry,
    pub alignment_check: Entry,
    pub machine_check: Entry,
    pub simd_floating_point: Entry,
    pub virtualization: Entry,
    pub security_exception: Entry,
    // some fields omitted
}
```

فیلدها از نوع …

```
  --> src/main.rs:53:1
   |
53 | / extern "x86-interrupt" fn breakpoint_handler(stack_frame: InterruptStackFrame) {
54 | |     println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame);
55 | | }
   | |_^
   |
   = help: add #![feature(abi_x86_interrupt)] to the crate attributes to enable
```

این خطا به این دلیل رخ می دهد که قرارداد فراخوانی `x86-interrupt` هنوز ناپایدار است. به هر حال برای استفاده از آن ، باید صریحاً آن را با اضافه کردن `#![feature(abi_x86_interrupt)]` در بالای `lib.rs` فعال کنیم.

### بارگیری IDT

برای اینکه پردازنده از جدول توصیف کننده وقفه جدید ما استفاده کند ، باید آن را با استفاده از دستورالعمل [`lidt`] بارگیری کنیم. ساختمان `InterruptDescriptorTable` از کرت `x86_64` متد [`load`][InterruptDescriptorTable::load] را برای این کار فراهم می کند.
بیایید سعی کنیم از آن استفاده کنیم:

[`lidt`]: https://www.felixcloutier.com/x86/lgdt:lidt
[InterruptDescriptorTable::load]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html#method.load

```rust
// in src/interrupts.rs

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
    idt.load();
}
```

اکنون هنگامی که می خواهیم آن را کامپایل کنیم ، خطای زیر رخ می دهد:

```
error: `idt` does not live long enough
  --> src/interrupts/mod.rs:43:5
   |
43 |     idt.load();
   |     ^^^ does not live long enough
44 | }
   | - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...
```

پس متد `load` انتظار دریافت یک `&'static self` را دارد، این مرجعی است که برای تمام مدت زمان اجرای برنامه معتبر است. دلیل این امر این است که پردازنده در هر وقفه به این جدول دسترسی پیدا می کند تا زمانی که IDT دیگری بارگیری کنیم. بنابراین استفاده از طول عمر کوتاه تر از `'static` می تواند منجر به باگ های استفاده-بعد-از-آزادسازی شود.

در واقع ، این دقیقاً همان چیزی است که در اینجا اتفاق می افتد. `idt` ما روی پشته ایجاد می شود ، بنابراین فقط در داخل تابع `init_idt` معتبر است. پس از آن حافظه پشته برای توابع دیگر مورد استفاده مجدد قرار می گیرد ، بنابراین پردازنده حافظه پشته تصادفی را به عنوان IDT تفسیر می کند. خوشبختانه ، متد `InterruptDescriptorTable::load` این نیاز به طول عمر را در تعریف تابع خود اجباری می کند، بنابراین کامپایلر راست قادر است از این مشکل احتمالی در زمان کامپایل جلوگیری کند.

برای رفع این مشکل، باید `idt` را در مکانی ذخیره کنیم که طول عمر `'static` داشته باشد. برای رسیدن به این هدف می توانیم IDT را با استفاده از [`Box`] بر روی حافظه Heap ایجاد کنیم و سپس آن را به یک مرجع `'static` تبدیل کنیم، اما ما در حال نوشتن هسته سیستم عامل هستیم و بنابراین هنوز Heap نداریم.

[`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html

به عنوان یک گزینه دیگر، می توانیم IDT را به صورت `static` ذخیره کنیم:

```rust
static IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    IDT.breakpoint.set_handler_fn(breakpoint_handler);
    IDT.load();
}
```

با این وجود، یک مشکل وجود دارد: استاتیک‌ها تغییرناپذیر هستند، پس نمی توانیم ورودی بریک‌پوینت را از تابع `init_idt` تغییر دهیم. می توانیم این مشکل را با استفاده از [`static mut`] حل کنیم:

[`static mut`]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable

```rust
static mut IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    unsafe {
        IDT.breakpoint.set_handler_fn(breakpoint_handler);
        IDT.load();
    }
}
```

در این روش بدون خطا کامپایل می شود اما مشکلات دیگری به همراه دارد. متغیرهای `static mut` بسیار مستعد Data Race هستند، بنابراین در هر دسترسی به یک [بلوک `unsafe`] نیاز داریم.

[بلوک `unsafe`]: https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#unsafe-superpowers

#### Lazy Statics به نجات ما می‌آیند

خوشبختانه ماکرو `lazy_static` وجود دارد. ماکرو به جای ارزیابی یک `static` در زمان کامپایل ، مقداردهی اولیه آن را هنگام اولین ارجاع به آن انجام می دهد. بنابراین، می توانیم تقریباً هر کاری را در بلوک مقداردهی اولیه انجام دهیم و حتی قادر به خواندن مقادیر زمان اجرا هستیم.

ما قبلاً کرت `lazy_static` را وارد کردیم وقتی [یک انتزاع برای بافر متن VGA ایجاد کردیم][vga text buffer lazy static]. بنابراین می توانیم مستقیماً از ماکرو `lazy_static!` برای ایجاد IDT استاتیک استفاده کنیم:

[vga text buffer lazy static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics

```rust
// in src/interrupts.rs

use lazy_static::lazy_static;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt
    };
}

pub fn init_idt() {
    IDT.load();
}
```

توجه داشته باشید که چگونه این راه حل به هیچ بلوک `unsafe` نیاز ندارد. ماکرو `lazy_static!` از `unsafe` در پشت صحنه استفاده می کند ، اما در یک رابط امن به ما داده می شود.

### اجرای آن

آخرین مرحله برای کارکرد استثناها در هسته ما فراخوانی تابع `init_idt` از `main.rs` است. به جای فراخوانی مستقیم آن، یک تابع عمومی `init` را در `lib.rs` معرفی می کنیم:

```rust
// in src/lib.rs

pub fn init() {
    interrupts::init_idt();
}
```

با استفاده از این تابع اکنون یک مکان اصلی برای روالهای اولیه داریم که می تواند بین توابع مختلف `_start` در `main.rs` ، `lib.rs` و تست‌های یک‌پارچه به اشتراک گذاشته شود.

اکنون می توانیم تابع `_start` در `main.rs` را به روز کنیم تا `init` را فراخوانی کرده و سپس یک استثنا بریک‌پوینت ایجاد کند:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init(); // new

    // invoke a breakpoint exception
    x86_64::instructions::interrupts::int3(); // new

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    loop {}
}
```

اکنون هنگامی که آن را در QEMU اجرا می کنیم (با استفاده از `cargo run`) ، موارد زیر را مشاهده می کنیم:

![QEMU printing `EXCEPTION: BREAKPOINT` and the interrupt stack frame](qemu-breakpoint-exception.png)

کار می کند! پردازنده با موفقیت تابع کنترل کننده بریک‌پوینت ما را فراخوانی می کند ، که پیام را چاپ می کند و سپس به تابع `_start` برمی گردد ، جایی که پیام `It did not crash!` چاپ شده است.

می بینیم که قاب پشته وقفه، اشاره گر دستورالعمل و اشاره گر پشته را در زمان وقوع استثنا به ما می گوید. این اطلاعات هنگام رفع اشکال استثناهای غیر منتظره بسیار مفید است.

### افزودن یک تست

بیایید یک تست ایجاد کنیم که از ادامه کار کد بالا اطمینان حاصل کند.
ابتدا تابع `start_` را به روز می کنیم تا `init` را نیز فراخوانی کند: ```rust // in src/lib.rs /// Entry point for `cargo test` #[cfg(test)] #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { init(); // new test_main(); loop {} } ``` بخاطر داشته باشید، این تابع `start_` هنگام اجرای`cargo test --lib` استفاده می شود، زیرا راست `lib.rs` را کاملاً مستقل از`main.rs` تست می‌کند. قبل از اجرای تست‌ها باید برای راه اندازی IDT در اینجا `init` فراخوانی شود. اکنون می توانیم یک تست `test_breakpoint_exception` ایجاد کنیم: ```rust // in src/interrupts.rs #[test_case] fn test_breakpoint_exception() { // invoke a breakpoint exception x86_64::instructions::interrupts::int3(); } ``` این تست تابع `int3` را فراخوانی می کند تا یک استثنا بریک‌پوینت ایجاد کند. با بررسی اینکه اجرا پس از آن ادامه دارد ، تأیید می کنیم که کنترل کننده بریک‌پوینت ما به درستی کار می کند. شما می توانید این تست جدید را با اجرای `cargo test` (همه تست‌ها) یا` cargo test --lib` (فقط تست های `lib.rs` و ماژول های آن) امتحان کنید. باید موارد زیر را در خروجی مشاهده کنید: ``` blog_os::interrupts::test_breakpoint_exception... [ok] ``` ## خیلی جادویی بود؟ قرارداد فراخوانی `x86-interrupt` و نوع [`InterruptDescriptorTable`] روند مدیریت استثناها را نسبتاً سر راست و بدون درد ساخته‌اند. اگر این برای شما بسیار جادویی بود و دوست دارید تمام جزئیات مهم مدیریت استثنا را بیاموزید، برای شما هم مطالبی داریم: مجموعه ["مدیریت استثناها با توابع برهنه"] ما، نحوه مدیریت استثنا‌ها بدون قرارداد فراخوانی`x86-interrupt` را نشان می دهد و همچنین نوع IDT خاص خود را ایجاد می کند. از نظر تاریخی، این پست‌ها مهمترین پست‌های مدیریت استثناها قبل از وجود قرارداد فراخوانی `x86-interrupt` و کرت `x86_64` بودند. توجه داشته باشید که این پست‌ها بر اساس [نسخه اول] این وبلاگ هستند و ممکن است قدیمی باشند. 
["مدیریت استثناها با توابع برهنه"]: @/edition-1/extra/naked-exceptions/_index.md
[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html
[نسخه اول]: @/edition-1/_index.md

## What's next?

We've successfully caught our first exception and returned from it! The next step is to ensure that we catch all exceptions, because an uncaught exception causes a [triple fault], which leads to a system reset. The next post explains how we can avoid this by correctly catching [double faults].

[triple fault]: https://wiki.osdev.org/Triple_Fault
[double faults]: https://wiki.osdev.org/Double_Fault#Double_Fault

================================================
FILE: blog/content/edition-2/posts/05-cpu-exceptions/index.ja.md
================================================
+++
title = "CPU Exceptions"
weight = 5
path = "ja/cpu-exceptions"
date = 2018-06-17

[extra]
# Please update this when updating the translation
translation_based_on_commit = "a8a6b725cff2e485bed76ff52ac1f18cec08cc7b"
# GitHub usernames of the people that translated this post
translators = ["swnakamura"]
+++

CPU exceptions occur in various erroneous situations, for example, when accessing an invalid memory address or when dividing by zero. To react to them, we have to set up an **interrupt descriptor table** that provides handler functions. At the end of this post, our kernel will be able to catch [breakpoint exceptions] and resume normal execution afterward.

[breakpoint exceptions]: https://wiki.osdev.org/Exceptions#Breakpoint

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there (translator's note: the link points to the original English repository). You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-05`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-05

## Overview

An exception signals that something is wrong with the current instruction. For example, the CPU issues an exception if the current instruction tries to divide by 0. When an exception occurs, the CPU interrupts its current work and immediately calls a specific exception handler function, depending on the exception type.

On x86, there are about 20 different CPU exception types. The most important are:

- **Page Fault**: A page fault occurs on illegal memory accesses. For example, if the current instruction tries to read from an unmapped page or tries to write to a read-only page.
- **Invalid Opcode**: This exception occurs when the current instruction is invalid, for example, when we try to use new [SSE instructions] on an old CPU that does not support them.
- **General Protection Fault**: This is the exception with the broadest range of causes. It occurs on various kinds of access violations, such as trying to execute a privileged instruction in user-level code or writing reserved fields in configuration registers.
- **Double Fault**: When an exception occurs, the CPU tries to call the corresponding handler function. If another exception occurs **while calling** the exception handler, the CPU raises a double fault exception. This exception also occurs when there is no handler function registered for an exception.
- **Triple Fault**: If an exception occurs while the CPU tries to call the double fault handler function, it issues a fatal **triple fault**. We can't catch or handle a triple fault. Most processors react by resetting themselves and rebooting the operating system.

[SSE instructions]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

For the full list of exceptions, check out the [OSDev wiki][exceptions].

[exceptions]: https://wiki.osdev.org/Exceptions

### The Interrupt Descriptor Table

In order to catch and handle exceptions, we have to set up a so-called Interrupt Descriptor Table (IDT). In this table, we can specify a handler function for each CPU exception. The hardware uses this table directly, so we need to follow a predefined format. Each entry must have the following 16-byte structure:

| Type | Name                     | Description                                                   |
| ---- | ------------------------ | ------------------------------------------------------------- |
| u16  | Function Pointer [0:15]  | The lower bits of the pointer to the handler function.        |
| u16  | GDT selector             | Selector of a code segment in the [global descriptor table].  |
| u16  | Options                  | (see below)                                                   |
| u16  | Function Pointer [16:31] | The middle bits of the pointer to the handler function.       |
| u32  | Function Pointer [32:63] | The upper bits of the pointer to the handler function.        |
| u32  | Reserved                 |                                                               |

[global descriptor table]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

The options field has the following format:

| Bits  | Name                             | Description                                                                                                      |
| ----- | -------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| 0-2   | Interrupt Stack Table Index      | 0: Don't switch stacks, 1-7: Switch to the n-th stack in the Interrupt Stack Table when this handler is called.   |
| 3-7   | Reserved                         |                                                                                                                  |
| 8     | 0: Interrupt Gate, 1: Trap Gate  | If this bit is 0, interrupts are disabled when this handler is called.                                            |
| 9-11  | must be one                      |                                                                                                                  |
| 12    | must be zero                     |                                                                                                                  |
| 13‑14 | Descriptor Privilege Level (DPL) | The minimal privilege level required for calling this handler.                                                    |
| 15    | Present                          |                                                                                                                  |

Each exception has a predefined IDT index. For example, the invalid opcode exception has table index 6 and the page fault exception has table index 14. Thus, the hardware can automatically load the corresponding IDT entry for each exception without any further configuration. The [Exception Table][exceptions] in the OSDev wiki shows the IDT indexes of all exceptions in the "Vector nr." column.

When an exception occurs, the CPU roughly does the following:

1. Push some registers on the stack, including the instruction pointer and the [RFLAGS] register. (We will use these values later in this post.)
2. Read the corresponding entry from the Interrupt Descriptor Table (IDT). For example, the CPU reads the 14th entry when a page fault occurs.
3. Check if the entry is present and, if not, raise a double fault.
4. Disable hardware interrupts if the entry is an interrupt gate (bit 40 not set).
5. Load the specified [GDT] selector into the CS (code segment) register.
6. Jump to the specified handler function.

[RFLAGS]: https://en.wikipedia.org/wiki/FLAGS_register
[GDT]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

Don't worry about steps 4 and 5 for now; we will learn about the global descriptor table (GDT) and hardware interrupts in future posts.

## An IDT Type

Instead of creating our own IDT type, we will use the [`InterruptDescriptorTable` struct] of the `x86_64` crate, which looks like this:

[`InterruptDescriptorTable` struct]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html

``` rust
#[repr(C)]
pub struct InterruptDescriptorTable {
    pub divide_by_zero: Entry<HandlerFunc>,
    pub debug: Entry<HandlerFunc>,
    pub non_maskable_interrupt: Entry<HandlerFunc>,
    pub breakpoint: Entry<HandlerFunc>,
    pub overflow: Entry<HandlerFunc>,
    pub bound_range_exceeded: Entry<HandlerFunc>,
    pub invalid_opcode: Entry<HandlerFunc>,
    pub device_not_available: Entry<HandlerFunc>,
    pub double_fault: Entry<HandlerFuncWithErrCode>,
    pub invalid_tss: Entry<HandlerFuncWithErrCode>,
    pub segment_not_present: Entry<HandlerFuncWithErrCode>,
    pub stack_segment_fault: Entry<HandlerFuncWithErrCode>,
    pub general_protection_fault: Entry<HandlerFuncWithErrCode>,
    pub page_fault: Entry<PageFaultHandlerFunc>,
    pub x87_floating_point: Entry<HandlerFunc>,
    pub alignment_check: Entry<HandlerFuncWithErrCode>,
    pub machine_check: Entry<HandlerFunc>,
    pub simd_floating_point: Entry<HandlerFunc>,
    pub virtualization: Entry<HandlerFunc>,
    pub security_exception: Entry<HandlerFuncWithErrCode>,
    // some fields omitted
}
```

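As a rough illustration of how the fields above line up with the predefined vector numbers, here is a small host-side sketch. The `ExceptionVector` enum and its methods are hypothetical names made up for this example; they are not part of the `x86_64` crate. The vector numbers follow the OSDev exception table, and `pushes_error_code` simply mirrors which fields of the struct above take an error-code handler.

```rust
/// Vector numbers of the CPU exceptions that appear in the struct above.
/// `ExceptionVector` is a hypothetical type for this sketch only.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum ExceptionVector {
    DivideError = 0,
    Debug = 1,
    NonMaskableInterrupt = 2,
    Breakpoint = 3,
    Overflow = 4,
    BoundRangeExceeded = 5,
    InvalidOpcode = 6,
    DeviceNotAvailable = 7,
    DoubleFault = 8,
    InvalidTss = 10,
    SegmentNotPresent = 11,
    StackSegmentFault = 12,
    GeneralProtectionFault = 13,
    PageFault = 14,
    X87FloatingPoint = 16,
    AlignmentCheck = 17,
    MachineCheck = 18,
    SimdFloatingPoint = 19,
    Virtualization = 20,
    SecurityException = 30,
}

impl ExceptionVector {
    /// Index of the IDT entry that the CPU reads for this exception.
    fn idt_index(self) -> usize {
        self as u8 as usize
    }

    /// Whether the CPU pushes an error code for this exception, which
    /// decides between a plain handler and one with an error code argument.
    fn pushes_error_code(self) -> bool {
        matches!(
            self,
            Self::DoubleFault
                | Self::InvalidTss
                | Self::SegmentNotPresent
                | Self::StackSegmentFault
                | Self::GeneralProtectionFault
                | Self::PageFault
                | Self::AlignmentCheck
                | Self::SecurityException
        )
    }
}

fn main() {
    for vector in [
        ExceptionVector::Breakpoint,
        ExceptionVector::InvalidOpcode,
        ExceptionVector::PageFault,
    ] {
        println!(
            "{:?}: IDT index {}, error code pushed: {}",
            vector,
            vector.idt_index(),
            vector.pushes_error_code()
        );
    }
}
```

In the real crate this bookkeeping is encoded in the field types instead, so a handler with the wrong signature is rejected at compile time rather than looked up at runtime.
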
The fields have the type [`idt::Entry<F>`], which is a struct that represents the fields of an IDT entry (see the table above). The type parameter `F` defines the expected handler function type. We see that some entries require a [`HandlerFunc`] and some entries require a [`HandlerFuncWithErrCode`]. The page fault even has its own special type: [`PageFaultHandlerFunc`].

[`idt::Entry<F>`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.Entry.html
[`HandlerFunc`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFunc.html
[`HandlerFuncWithErrCode`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFuncWithErrCode.html
[`PageFaultHandlerFunc`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.PageFaultHandlerFunc.html

Let's look at the `HandlerFunc` type first:

```rust
type HandlerFunc = extern "x86-interrupt" fn(_: InterruptStackFrame);
```

It's a [type alias] for an `extern "x86-interrupt" fn` type. The `extern` keyword defines a function with a [foreign calling convention] and is often used to communicate with C code (`extern "C" fn`). But what is the `x86-interrupt` calling convention?

[type alias]: https://doc.rust-lang.org/book/ch20-03-advanced-types.html#creating-type-synonyms-with-type-aliases
[foreign calling convention]: https://doc.rust-lang.org/nomicon/ffi.html#foreign-calling-conventions

## The Interrupt Calling Convention

Exceptions are quite similar to function calls: the CPU jumps to the first instruction of the called function and executes it. Afterwards, the CPU jumps to the return address and continues the execution of the parent function.

However, there is a major difference between exceptions and function calls: a function call is invoked voluntarily by a compiler-inserted `call` instruction, while an exception might occur at **any** instruction. In order to understand the consequences of this difference, we need to examine function calls in more detail.

[Calling conventions] specify the details of a function call. For example, they specify where function parameters are placed (e.g. in registers or on the stack) and how results are returned. On x86_64 Linux, the following rules apply for C functions (specified in the [System V ABI]):

[Calling conventions]: https://en.wikipedia.org/wiki/Calling_convention
[System V ABI]: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

- the first six integer arguments are passed in registers `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`
- additional arguments are passed on the stack
- results are returned in `rax` and `rdx`

Note that Rust does not follow the C ABI (in fact, [there isn't even a Rust ABI yet][rust abi]), so these rules apply only to functions declared as `extern "C" fn`.

[rust abi]: https://github.com/rust-lang/rfcs/issues/600

### Preserved and Scratch Registers

The calling convention divides the registers into two parts: preserved and scratch registers.

The values of preserved registers must remain unchanged across function calls. So a called function (the "callee") is only allowed to overwrite these registers if it restores their original values before returning. Therefore, these registers are called callee-saved. A common pattern is to save these registers to the stack at the function's beginning and restore them just before returning.

In contrast, a called function is allowed to overwrite scratch registers without restrictions. If the caller wants to preserve the value of a scratch register across a function call, it needs to back it up and restore it itself before the function call (e.g., by pushing it to the stack). So the scratch registers are caller-saved.

On x86_64, the C calling convention specifies the following preserved and scratch registers:

| preserved registers                             | scratch registers                                           |
| ----------------------------------------------- | ----------------------------------------------------------- |
| `rbp`, `rbx`, `rsp`, `r12`, `r13`, `r14`, `r15` | `rax`, `rcx`, `rdx`, `rsi`, `rdi`, `r8`, `r9`, `r10`, `r11` |
| _callee-saved_                                  | _caller-saved_                                              |

The compiler knows these rules, so it generates the code accordingly. For example, most functions begin with a `push rbp`, which backs up `rbp` on the stack (because it's a callee-saved register).

### Preserving all Registers

In contrast to function calls, exceptions can occur on **any** instruction. In most cases, we can't tell at compile time whether the generated code will cause an exception. For example, the compiler can't know if an instruction causes a stack overflow or a page fault.

Since we don't know when an exception occurs, we can't back up any registers beforehand. This means we can't use a calling convention that relies on caller-saved registers for exception handlers. Instead, we need a calling convention that preserves **all registers**. The `x86-interrupt` calling convention is such a calling convention, so it guarantees that all register values are restored to their original values on function return.

Note that this does not mean all registers are saved to the stack at function entry. Instead, the compiler only backs up the registers that are overwritten by the function. This way, very efficient code can be generated for short functions that only use a few registers.

### The Interrupt Stack Frame

On a normal function call (using the `call` instruction), the CPU pushes the return address before jumping to the target function. On function return (using the `ret` instruction), the CPU pops this return address and jumps to it. So the stack frame of a normal function call looks like this:

![function stack frame](function-stack-frame.svg)

For exception and interrupt handlers, however, pushing a return address would not suffice, since interrupt handlers often run in a different context (stack pointer, CPU flags, etc.). Instead, the CPU performs the following steps when an interrupt occurs:

1. **Aligning the stack pointer**: An interrupt can occur at any instruction, so the stack pointer can have any value, too. However, some CPU instructions (e.g., some SSE instructions) require that the stack pointer be aligned on a 16-byte boundary, so the CPU performs such an alignment right after the interrupt.
2. **Switching stacks** (in some cases): A stack switch occurs when the CPU privilege level changes, for example, when a CPU exception occurs in a user-mode program. It is also possible to configure stack switches for specific interrupts using the so-called Interrupt Stack Table (described in the next post).
3. **Pushing the old stack pointer**: The CPU pushes the values of the stack pointer (`rsp`) and the stack segment (`ss`) registers at the time the interrupt occurred (before the alignment). This makes it possible to restore the original stack pointer when returning from an interrupt handler.
4. **Pushing and updating the `RFLAGS` register**: The [`RFLAGS`] register contains various control and status bits. On interrupt entry, the CPU changes some bits and pushes the old value.
5. **Pushing the instruction pointer**: Before jumping to the interrupt handler function, the CPU pushes the instruction pointer (`rip`) and the code segment (`cs`). This is comparable to the return address push of a normal function call.
6. **Pushing an error code** (for some exceptions): For some specific exceptions, such as page faults, the CPU pushes an error code, which describes the cause of the exception.
7. **Invoking the interrupt handler**: The CPU reads the address and the segment descriptor of the interrupt handler function from the corresponding field in the IDT. It then invokes this handler by loading the values into the `rip` and `cs` registers.

[`RFLAGS`]: https://en.wikipedia.org/wiki/FLAGS_register

So the interrupt stack frame looks like this:

![interrupt stack frame](exception-stack-frame.svg)

In the `x86_64` crate, the interrupt stack frame is represented by the [`InterruptStackFrame`] struct. It is passed to interrupt handlers as an argument and can be used to retrieve additional information about the exception's cause. The struct contains no error code field, since only a few exceptions push an error code. These exceptions use the separate [`HandlerFuncWithErrCode`] function type, which has an additional `error_code` argument.

[`InterruptStackFrame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptStackFrame.html

### Behind the Scenes

The `x86-interrupt` calling convention is a powerful abstraction that hides almost all of the messy details of the exception handling process. However, sometimes it's useful to know what's happening behind the curtain. Here is a short overview of the things that the `x86-interrupt` calling convention takes care of:

- **Retrieving the arguments**: Most calling conventions expect that the arguments are passed in registers. This is not possible for exception handlers since we must not overwrite any register values before backing them up on the stack. Instead, the `x86-interrupt` calling convention is aware that the arguments already lie on the stack at a specific offset.
- **Returning using `iretq`**: Since the interrupt stack frame completely differs from the stack frames of normal function calls, we can't return from handler functions through the normal `ret` instruction. Instead, the `iretq` instruction must be used.
- **Handling the error code**: The error code, which is pushed for some exceptions, makes things much more complex. It changes the stack alignment (see the next point) and needs to be popped off the stack before returning. The `x86-interrupt` calling convention handles all that complexity. However, it doesn't know which handler function is used for which exception, so it needs to deduce that information from the number of function arguments. That means the programmer is still responsible for using the correct function type for each exception. Luckily, the `InterruptDescriptorTable` type defined by the `x86_64` crate ensures that the correct function types are used.
- **Aligning the stack**: Some instructions (especially SSE instructions) require a 16-byte stack alignment. The CPU ensures this alignment whenever an exception occurs, but for some exceptions it destroys it again later when it pushes an error code. The `x86-interrupt` calling convention takes care of this by realigning the stack in this case.

If you are interested in more details, we also have a series of posts that explain exception handling using [naked functions], linked [at the end of this post][too-much-magic].

[naked functions]: https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md
[too-much-magic]: #too-much-magic

## Implementation

Now that we've understood the theory, it's time to handle CPU exceptions in our kernel. We'll start by creating a new interrupts module in `src/interrupts.rs` that first defines an `init_idt` function, which creates a new `InterruptDescriptorTable`:

``` rust
// in src/lib.rs

pub mod interrupts;

// in src/interrupts.rs

use x86_64::structures::idt::InterruptDescriptorTable;

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
}
```

Now we can add handler functions. We start by adding a handler for the [breakpoint exception]. The breakpoint exception is the perfect exception to test exception handling. Its only purpose is to temporarily pause a program when the breakpoint instruction `int3` is executed.

[breakpoint exception]: https://wiki.osdev.org/Exceptions#Breakpoint

The breakpoint exception is commonly used in debuggers: when the user sets a breakpoint, the debugger overwrites the corresponding instruction with the `int3` instruction so that the CPU throws the breakpoint exception when it reaches that line. When the user wants to continue the program, the debugger replaces the `int3` instruction with the original instruction again and continues the program. For more details, see the ["_How debuggers work_"] series.

["_How debuggers work_"]: https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints

For our use case, we don't need to overwrite any instructions. Instead, we just want to print a message when the breakpoint instruction is executed and then continue the program. So let's create a simple `breakpoint_handler` function and add it to our IDT:

```rust
// in src/interrupts.rs

use x86_64::structures::idt::{InterruptDescriptorTable, InterruptStackFrame};
use crate::println;

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
}

extern "x86-interrupt" fn breakpoint_handler(
    stack_frame: InterruptStackFrame)
{
    println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame);
}
```

Our handler just outputs a message and pretty-prints the interrupt stack frame.

When we try to compile it, the following error occurs:

```
error[E0658]: x86-interrupt ABI is experimental and subject to change (see issue #40180)
  --> src/main.rs:53:1
   |
53 | / extern "x86-interrupt" fn breakpoint_handler(stack_frame: InterruptStackFrame) {
54 | |     println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame);
55 | | }
   | |_^
   |
   = help: add #![feature(abi_x86_interrupt)] to the crate attributes to enable
```

This error occurs because the `x86-interrupt` calling convention is still unstable. To use it anyway, we have to explicitly enable it by adding `#![feature(abi_x86_interrupt)]` at the top of our `lib.rs`.

### Loading the IDT

In order for the CPU to use our new interrupt descriptor table (IDT), we need to load it using the [`lidt`] instruction. The `InterruptDescriptorTable` struct of the `x86_64` crate provides a [`load`][InterruptDescriptorTable::load] method for that. Let's try to use it:

[`lidt`]: https://www.felixcloutier.com/x86/lgdt:lidt
[InterruptDescriptorTable::load]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html#method.load

```rust
// in src/interrupts.rs

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
    idt.load();
}
```

When we try to compile it now, the following error occurs:

```
error: `idt` does not live long enough
  --> src/interrupts/mod.rs:43:5
   |
43 |     idt.load();
   |     ^^^ does not live long enough
44 | }
   | - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...
```

The `load` method expects a `&'static self`, that is, a reference valid for the complete runtime of the program. The reason is that the CPU will access this table on every interrupt until we load a different IDT. So using a shorter lifetime than `'static` could lead to use-after-free bugs.

In fact, this is exactly what happens here. Our `idt` is created on the stack, so it is only valid inside the `init_idt` function. Afterwards, the stack memory is reused for other functions, so the CPU would interpret random stack memory as the IDT. Luckily, the `InterruptDescriptorTable::load` method encodes this lifetime requirement in its function definition, so that the Rust compiler is able to prevent this possible bug at compile time.

In order to fix this problem, we need to store our `idt` at a place where it has a `'static` lifetime. To achieve this, we could allocate our IDT on the heap using [`Box`] and then convert it to a `'static` reference, but we are writing an OS kernel and thus don't have a heap (yet).

[`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html

As an alternative, we could try to store the IDT as a `static`:

```rust
static IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    IDT.breakpoint.set_handler_fn(breakpoint_handler);
    IDT.load();
}
```

However, there is a problem: statics are immutable, so we can't modify the breakpoint entry from our `init_idt` function. We could solve this problem by using a [`static mut`]:

[`static mut`]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable

```rust
static mut IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    unsafe {
        IDT.breakpoint.set_handler_fn(breakpoint_handler);
        IDT.load();
    }
}
```

This variant compiles without errors, but it is far from idiomatic. `static mut`s are very prone to data races, so we need an [`unsafe` block] on each access.

[`unsafe` block]: https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#unsafe-superpowers

#### Lazy Statics to the Rescue

Fortunately, the `lazy_static` macro exists. Instead of evaluating a `static` at compile time, the macro performs the initialization when the `static` is referenced for the first time. Thus, we can do almost everything in the initialization block and are even able to read runtime values.

We already imported the `lazy_static` crate when we [created an abstraction for the VGA text buffer][vga text buffer lazy static]. So we can directly use the `lazy_static!` macro to create our static IDT:

[vga text buffer lazy static]: @/edition-2/posts/03-vga-text-buffer/index.ja.md#dai-keta-lazy-jing-de-bian-shu

```rust
// in src/interrupts.rs

use lazy_static::lazy_static;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt
    };
}

pub fn init_idt() {
    IDT.load();
}
```

Note how this solution requires no `unsafe` blocks. The `lazy_static!` macro does use `unsafe` behind the scenes, but it is abstracted away in a safe interface.

### Running it

The last step for making exceptions work in our kernel is to call the `init_idt` function from our `main.rs`. Instead of calling it directly, we introduce a general `init` function in our `lib.rs`:

```rust
// in src/lib.rs

pub fn init() {
    interrupts::init_idt();
}
```

With this function, we now have a central place for initialization routines that can be shared between the different `_start` functions in our `main.rs`, our `lib.rs`, and our integration tests.

Now we can update the `_start` function of our `main.rs` to call `init` and then trigger a breakpoint exception:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init(); // new

    // invoke a breakpoint exception
    x86_64::instructions::interrupts::int3(); // new

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    loop {}
}
```

When we run it in QEMU now (using `cargo run`), we see the following:

![QEMU printing `EXCEPTION: BREAKPOINT` and the interrupt stack frame](qemu-breakpoint-exception.png)

It works! The CPU successfully invokes our breakpoint handler, which prints the message, and then returns to the `_start` function, where the `It did not crash!` message is printed.

We see that the interrupt stack frame tells us the instruction and stack pointers at the time when the exception occurred. This information is very useful when debugging unexpected exceptions.

### Adding a Test

Let's create a test that ensures that the above continues to work. First, we update the `_start` function to also call `init`:

```rust
// in src/lib.rs

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    init();      // new
    test_main();
    loop {}
}
```

Remember, this `_start` function is used when running `cargo test --lib`, since Rust tests `lib.rs` completely independently of `main.rs`. We need to call `init` here to set up an IDT before running the tests.

Now we can create a `test_breakpoint_exception` test:

```rust
// in src/interrupts.rs

#[test_case]
fn test_breakpoint_exception() {
    // invoke a breakpoint exception
    x86_64::instructions::interrupts::int3();
}
```

The test invokes the `int3` function to trigger a breakpoint exception. By checking that the execution continues afterward, we verify that our breakpoint handler is working correctly.

You can try this new test by running `cargo test` (all tests) or `cargo test --lib` (only the tests of `lib.rs` and its modules). You should see the following in the output:

```
blog_os::interrupts::test_breakpoint_exception... [ok]
```

## Too much Magic?

The `x86-interrupt` calling convention and the [`InterruptDescriptorTable`] type made the exception handling process relatively straightforward and painless. If this was too much magic for you and you would like to learn all the gory details of exception handling, we've got you covered: our [“Handling Exceptions with Naked Functions”] series (available in English only) shows how to handle exceptions without the `x86-interrupt` calling convention and also creates its own IDT type. Historically, these posts were the main exception handling posts before the `x86-interrupt` calling convention and the `x86_64` crate existed. Note that these posts are based on the [first edition] of this blog and might be out of date.

[“Handling Exceptions with Naked Functions”]: @/edition-1/extra/naked-exceptions/_index.md
[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html
[first edition]: @/edition-1/_index.md

## What's next?

We've successfully caught our first exception and returned from it! The next step is to ensure that we catch all exceptions, because an uncaught exception causes a fatal [triple fault], which leads to a system reset. The next post explains how we can avoid this by correctly catching [double faults].

[triple fault]: https://wiki.osdev.org/Triple_Fault
[double faults]: https://wiki.osdev.org/Double_Fault#Double_Fault

================================================
FILE: blog/content/edition-2/posts/05-cpu-exceptions/index.ko.md
================================================
+++
title = "CPU Exceptions"
weight = 5
path = "ko/cpu-exceptions"
date = 2018-06-17

[extra]
# Please update this when updating the translation
translation_based_on_commit = "1c9b5edd6a5a667e282ca56d6103d3ff1fd7cfcb"
# GitHub usernames of the people that translated this post
translators = ["JOE1994"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = ["KimWang906"]
+++

CPU exceptions occur when a disallowed operation is executed, for example, when accessing an invalid memory address or when dividing by zero. To be able to handle CPU exceptions, we have to set up an _interrupt descriptor table (IDT)_ that provides the handler functions. In this post, we will make our kernel handle [breakpoint exceptions] and then resume normal execution.

[breakpoint exceptions]: https://wiki.osdev.org/Exceptions#Breakpoint

This blog is openly developed in its [GitHub repository][GitHub]. If you have any problems or questions, please report them through the repository's issue tracker. You can also leave comments [at the bottom of the page][at the bottom]. The complete source code for this post can be found in the [`post-05`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-05

## Overview

An exception signals that something is wrong with the CPU instruction that is currently being executed. For example, the CPU raises an exception if an instruction tries to divide by zero. When an exception occurs, the CPU suspends its current work and immediately calls an exception handler function (which handler is called depends on the type of the exception).

On the x86 architecture, there are about 20 different CPU exception types. The most important are:

- **Page Fault**: A page fault occurs on an attempt to access memory that must not be accessed. For example, a page fault occurs if the current instruction (1) tries to read from an unmapped page or (2) tries to write to a read-only page.
- **Invalid Opcode**: This exception occurs when the CPU does not support the opcode of the instruction it is given. For example, trying to run new [SSE instructions] on an old CPU raises this exception.
- **General Protection Fault**: This is the exception with the broadest range of causes. It occurs on various kinds of access violations, such as executing a privileged instruction in user-level code or overwriting configuration registers.
- **Double Fault**: When an exception occurs, the CPU tries to call the matching handler function. If another exception occurs _while calling the exception handler_, the CPU raises a double fault exception. This exception also occurs when no handler function is registered for an exception.
- **Triple Fault**: If an exception occurs while the CPU tries to call the double fault handler function, it raises a fatal _triple fault_. A triple fault cannot be handled, so most processors react by resetting themselves and rebooting the operating system.

[SSE instructions]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

For the full list of CPU exceptions, check out the [OSDev wiki][exceptions].

[exceptions]: https://wiki.osdev.org/Exceptions

### The Interrupt Descriptor Table {#the-interrupt-descriptor-table}

In order to catch and react to exceptions, we need an _Interrupt Descriptor Table (IDT)_. In this table, we specify which handler function handles each CPU exception. The hardware uses this table directly, so its format must follow a predefined standard. Each entry of the table has the following 16-byte structure:

| Type | Name                     | Description                                                            |
| ---- | ------------------------ | ---------------------------------------------------------------------- |
| u16  | Function Pointer [0:15]  | The lower 16 bits of the 64-bit pointer to the handler function.       |
| u16  | GDT selector             | Selects a code segment in the [global descriptor table].               |
| u16  | Options                  | (see the description below the table)                                  |
| u16  | Function Pointer [16:31] | The next 16 bits of the 64-bit pointer to the handler function.        |
| u32  | Function Pointer [32:63] | The upper 32 bits of the 64-bit pointer to the handler function.       |
| u32  | Reserved                 | Reserved area.                                                         |

[global descriptor table]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

The options field has the following format:

| Bits  | Name                             | Description                                                                                                     |
| ----- | -------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| 0-2   | Interrupt Stack Table Index      | 0: Don't switch stacks, 1-7: Switch to the n-th stack in the Interrupt Stack Table when this handler is called.  |
| 3-7   | Reserved                         | Reserved area.                                                                                                  |
| 8     | 0: Interrupt Gate, 1: Trap Gate  | If this bit is 0, interrupts are disabled once this handler is called.                                           |
| 9-11  | must be one                      | Each bit is always 1.                                                                                           |
| 12    | must be zero                     | Always 0.                                                                                                       |
| 13‑14 | Descriptor Privilege Level (DPL) | The minimal privilege level required for calling this handler.                                                   |
| 15    | Present                          |                                                                                                                 |

Each exception has an assigned index in the IDT. The invalid opcode exception has table index 6, and the page fault exception has table index 14. Using these predefined indexes, the hardware can automatically load the IDT entry that corresponds to each exception. The “Vector nr.” column of the [Exception Table][exceptions] in the OSDev wiki lists all exceptions and their assigned indexes.

When an exception occurs, the CPU roughly performs the following steps in order:

1. Push some registers on the stack, including the instruction pointer and the [RFLAGS] register. (We will use these values later.)
2. Read the entry of the exception from the Interrupt Descriptor Table (IDT). For example, the CPU reads the 14th entry of the IDT when a page fault occurs.
3. Check if the entry is present and, if not, raise a double fault exception.
4. Disable hardware interrupts if the entry is an interrupt gate (bit 40 = 0).
5. Load the specified [GDT] selector into the CS (code segment) register.
6. Jump to the specified handler function.

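To make the 16-byte entry layout and the options bits from the tables above more concrete, here is a minimal host-side sketch that packs a handler address into such an entry. `RawIdtEntry`, `entry_options`, `make_entry`, and `handler_addr` are hypothetical names for this illustration; in the kernel we use the `x86_64` crate's `Entry` type instead, and the handler address and selector below are arbitrary example values.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for one raw IDT entry (not the `x86_64` crate type).
/// The fields follow the layout table above.
#[derive(Debug, Clone, Copy)]
#[repr(C)]
struct RawIdtEntry {
    pointer_low: u16,
    gdt_selector: u16,
    options: u16,
    pointer_middle: u16,
    pointer_high: u32,
    reserved: u32,
}

/// Builds the 16-bit options field from its parts.
fn entry_options(present: bool, trap_gate: bool, dpl: u16, ist_index: u16) -> u16 {
    assert!(dpl <= 3 && ist_index <= 7);
    let mut options: u16 = 0b111 << 9; // bits 9-11 must be one, bit 12 stays zero
    options |= ist_index; // bits 0-2: Interrupt Stack Table index
    if trap_gate {
        options |= 1 << 8; // bit 8: 0 = interrupt gate, 1 = trap gate
    }
    options |= dpl << 13; // bits 13-14: descriptor privilege level
    if present {
        options |= 1 << 15; // bit 15: present
    }
    options
}

/// Splits a 64-bit handler address into the three pointer fields.
fn make_entry(handler: u64, gdt_selector: u16, options: u16) -> RawIdtEntry {
    RawIdtEntry {
        pointer_low: handler as u16,
        gdt_selector,
        options,
        pointer_middle: (handler >> 16) as u16,
        pointer_high: (handler >> 32) as u32,
        reserved: 0,
    }
}

/// Reassembles the handler address, as the CPU does before jumping to it.
fn handler_addr(entry: &RawIdtEntry) -> u64 {
    (entry.pointer_low as u64)
        | ((entry.pointer_middle as u64) << 16)
        | ((entry.pointer_high as u64) << 32)
}

fn main() {
    // A present interrupt gate with DPL 0 and no stack switch.
    let options = entry_options(true, false, 0, 0);
    let entry = make_entry(0x1122_3344_5566_7788, 8, options);
    println!("entry size: {} bytes", size_of::<RawIdtEntry>());
    println!("options: {:#06x}", entry.options);
    println!("handler: {:#x}", handler_addr(&entry));
}
```

Bit 8 of the options field sits at bit 40 of the whole entry (the field starts at byte offset 4), which is the "bit 40" that step 4 of the list above refers to.
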
[RFLAGS]: https://en.wikipedia.org/wiki/FLAGS_register
[GDT]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

Don't worry if steps 4 and 5 above are not clear yet. We will explain the global descriptor table (GDT) and hardware interrupts in later posts.

## An IDT Type

Instead of implementing our own type to represent the IDT, we use the [`InterruptDescriptorTable` struct] of the `x86_64` crate:

[`InterruptDescriptorTable` struct]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html

``` rust
#[repr(C)]
pub struct InterruptDescriptorTable {
    pub divide_by_zero: Entry<HandlerFunc>,
    pub debug: Entry<HandlerFunc>,
    pub non_maskable_interrupt: Entry<HandlerFunc>,
    pub breakpoint: Entry<HandlerFunc>,
    pub overflow: Entry<HandlerFunc>,
    pub bound_range_exceeded: Entry<HandlerFunc>,
    pub invalid_opcode: Entry<HandlerFunc>,
    pub device_not_available: Entry<HandlerFunc>,
    pub double_fault: Entry<HandlerFuncWithErrCode>,
    pub invalid_tss: Entry<HandlerFuncWithErrCode>,
    pub segment_not_present: Entry<HandlerFuncWithErrCode>,
    pub stack_segment_fault: Entry<HandlerFuncWithErrCode>,
    pub general_protection_fault: Entry<HandlerFuncWithErrCode>,
    pub page_fault: Entry<PageFaultHandlerFunc>,
    pub x87_floating_point: Entry<HandlerFunc>,
    pub alignment_check: Entry<HandlerFuncWithErrCode>,
    pub machine_check: Entry<HandlerFunc>,
    pub simd_floating_point: Entry<HandlerFunc>,
    pub virtualization: Entry<HandlerFunc>,
    pub security_exception: Entry<HandlerFuncWithErrCode>,
    // some fields omitted
}
```

Each field of the struct has the type [`idt::Entry<F>`], which represents an entry of the IDT. The type parameter `F` defines the type of the handler function that is expected. Some entries require a [`HandlerFunc`] for `F`, some require a [`HandlerFuncWithErrCode`], and the page fault requires a [`PageFaultHandlerFunc`].

[`idt::Entry<F>`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.Entry.html
[`HandlerFunc`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFunc.html
[`HandlerFuncWithErrCode`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFuncWithErrCode.html
[`PageFaultHandlerFunc`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.PageFaultHandlerFunc.html

Let's look at the `HandlerFunc` type first:

```rust
type HandlerFunc = extern "x86-interrupt" fn(_: InterruptStackFrame);
```

`HandlerFunc` is a [type alias] for the function type `extern "x86-interrupt" fn`. The `extern` keyword is used to define a function with a [foreign calling convention] and is mostly used when interacting with C functions (`extern "C" fn`). But what is the `x86-interrupt` calling convention?

[type alias]: https://doc.rust-lang.org/book/ch20-03-advanced-types.html#creating-type-synonyms-with-type-aliases
[foreign calling convention]: https://doc.rust-lang.org/nomicon/ffi.html#foreign-calling-conventions

## The Interrupt Calling Convention

Exceptions have a lot in common with function calls: the CPU jumps to the first instruction of the called function and executes its instructions one after another. Afterwards, the CPU jumps to the return address and continues the execution of the function that was running before.

However, there is an important difference between exceptions and function calls: a normal function is called through a `call` instruction inserted by the compiler, while an exception can occur _during any instruction_. To understand the consequences of this difference, we need to examine the function call process more closely.

[Calling conventions] specify the details of the function call process. For example, they define where function arguments are placed (in registers or on the stack) and how the return value is passed back. On x86_64 Linux, the following rules, specified by the [System V ABI], apply to calls of C functions:

[Calling conventions]: https://en.wikipedia.org/wiki/Calling_convention
[System V ABI]: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

- the first six integer arguments are passed in the registers `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`
- all arguments from the seventh on are passed on the stack
- the return value is passed in the `rax` and `rdx` registers

Note that Rust does not follow the C ABI (in fact, [there is no specified Rust ABI yet][rust abi]), so these rules only apply to functions defined as `extern "C" fn`.

[rust abi]: https://github.com/rust-lang/rfcs/issues/600

### Preserved and Scratch Registers

The calling convention divides the registers into two groups: _preserved_ and _scratch_ registers.

The values of _preserved_ registers must stay the same across a function call. If the called function (the callee) wants to store other values in these registers, it must restore their original values before returning. Preserved registers are therefore also called _“callee-saved”_ registers. A common pattern is to save these registers to the stack at the beginning of the function and restore them right before returning.

In contrast, the called function is free to overwrite the values of _scratch_ registers. If the caller wants to preserve the value of a scratch register across a function call, it has to save the value to the stack before the call and restore it after the function has finished. Scratch registers are therefore also called _“caller-saved”_ registers.

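The split of responsibilities can be mimicked with a toy model. The sketch below is not real register handling; `Registers`, `well_behaved_callee`, and `careful_caller` are hypothetical names, and the struct only models one scratch register (`rax`) and one preserved register (`rbx`).

```rust
/// Toy register file with one scratch and one preserved register.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Registers {
    rax: u64, // scratch (caller-saved)
    rbx: u64, // preserved (callee-saved)
}

/// A callee that follows the convention: it may clobber `rax`,
/// but it restores `rbx` before returning.
fn well_behaved_callee(regs: &mut Registers) {
    let saved_rbx = regs.rbx; // back up the preserved register
    regs.rbx = 0xdead; // use it as working space
    regs.rax = regs.rbx + 1; // the scratch register is overwritten freely
    regs.rbx = saved_rbx; // restore before returning
}

/// A caller that needs `rax` after the call has to save it itself.
fn careful_caller(regs: &mut Registers) {
    let saved_rax = regs.rax; // caller-side backup of the scratch register
    well_behaved_callee(regs);
    regs.rax = saved_rax; // caller-side restore
}

fn main() {
    let mut regs = Registers { rax: 1, rbx: 2 };
    well_behaved_callee(&mut regs);
    println!("after plain call:   {:?}", regs); // rbx kept, rax clobbered

    let mut regs = Registers { rax: 1, rbx: 2 };
    careful_caller(&mut regs);
    println!("after careful call: {:?}", regs); // both kept
}
```

The point of the model is that the backup of `rax` happens in the caller, before the call. Code that is interrupted at an arbitrary instruction never gets the chance to run that step.
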
On x86_64, the C calling convention specifies the following preserved and scratch registers:

| preserved registers                             | scratch registers                                           |
| ----------------------------------------------- | ----------------------------------------------------------- |
| `rbp`, `rbx`, `rsp`, `r12`, `r13`, `r14`, `r15` | `rax`, `rcx`, `rdx`, `rsi`, `rdi`, `r8`, `r9`, `r10`, `r11` |
| _callee-saved_                                  | _caller-saved_                                              |

The compiler generates code according to these rules. For example, most functions begin with a `push rbp`, which saves the callee-saved register `rbp` on the stack.

### Preserving all Registers

Unlike function calls, exceptions can occur no matter _which_ instruction is being executed. In most cases, it is hard to tell at compile time whether the generated code will cause an exception. For example, the compiler cannot determine whether an arbitrary instruction will cause a stack overflow or a page fault.

Since we can't know when an exception will occur, we can't back up the register values beforehand. This means that we can't use a calling convention that relies on caller-saved registers for exception handlers. Instead, exception handlers need a calling convention that preserves _all registers_. The `x86-interrupt` calling convention is one example: it guarantees that all register values are restored to their values from before the call when the function returns.

This does not mean that all registers are saved to the stack at function entry. The compiler only backs up the registers that the called function overwrites. This way, short and efficient code can be generated for functions that use only a few registers.

### The Interrupt Stack Frame {#the-interrupt-stack-frame}

On a normal function call (using the `call` instruction), the CPU pushes the return address on the stack before passing control to the called function. On function return (using the `ret` instruction), the CPU reads the saved return address from the stack and jumps to it. The stack frame of a normal function call looks like this:

![function stack frame](function-stack-frame.svg)

Exception and interrupt handlers run in a CPU context (stack pointer, CPU flags, etc.) that is different from the one of normal functions. So more preparation is needed than just pushing a return address on the stack. When an interrupt occurs, the CPU performs the following steps:

1. **Aligning the stack pointer**: An interrupt can occur during any instruction, so the stack pointer can have any value, too. However, some CPU instructions (e.g., some SSE instructions) require that the stack pointer be aligned on a 16-byte boundary. Therefore, the CPU aligns the stack pointer right after the interrupt occurs.
2. **Switching stacks** (in some cases): A stack switch occurs when the CPU privilege level changes (e.g., when a CPU exception occurs in a user-mode program). It is also possible to configure stack switches for specific interrupts using the _Interrupt Stack Table_ (described in a later post).
3. **Pushing the old stack pointer**: When an interrupt occurs, the CPU pushes the values that the stack pointer (`rsp`) and stack segment (`ss`) registers had before the alignment. This makes it possible to restore the old stack pointer when returning from the interrupt handler.
4. **Pushing and updating the `RFLAGS` register**: The [`RFLAGS`] register contains various control and status bits of the CPU. When an interrupt occurs, the CPU pushes the old value and then changes some of the bits.
5. **Pushing the instruction pointer**: Before jumping to the interrupt handler function, the CPU pushes the instruction pointer (`rip`) and the code segment (`cs`) registers. This is comparable to pushing the return address on a normal function call.
6. **Pushing an error code** (for some exceptions only): For some exceptions, such as page faults, the CPU pushes an error code that describes the cause of the exception.
7. **Invoking the interrupt handler**: The CPU reads the address and the segment descriptor of the interrupt handler function from the IDT. It then invokes the handler by loading these values into the `rip` and `cs` registers.

[`RFLAGS`]: https://en.wikipedia.org/wiki/FLAGS_register

The _interrupt stack frame_ looks like this:

![interrupt stack frame](exception-stack-frame.svg)

In the `x86_64` crate, the interrupt stack frame is represented by the [`InterruptStackFrame`] struct. Exception handlers receive it as an argument and can use it to retrieve additional information about the cause of the exception. The struct has no field for the error code, since only some exceptions push an error code. The exceptions that push an error code use the [`HandlerFuncWithErrCode`] function type, which takes an additional `error_code` argument.

[`InterruptStackFrame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptStackFrame.html

### Behind the Scenes

The `x86-interrupt` calling convention is a powerful abstraction layer that hides most of the details of the exception handling process. However, sometimes it helps to know what happens inside the abstraction. Here is an overview of the things that the `x86-interrupt` calling convention takes care of:

- **Retrieving the arguments**: Most calling conventions expect that function arguments are passed in registers. Exception handlers can't do that, because no register may be overwritten before its value has been backed up on the stack. The `x86-interrupt` calling convention instead assumes that the arguments lie at a specific position on the stack rather than in registers.
- **Returning using `iretq`**: The interrupt stack frame is completely different from the stack frame of a normal function call, so we can't properly return from an interrupt handler with the `ret` instruction. The `iretq` instruction is used instead.
- **Handling the error code**: The error code, which is pushed only for some exceptions, complicates things. It breaks the stack alignment (see "Aligning the stack" below) and must be popped off the stack before returning from the exception handler. The `x86-interrupt` calling convention takes care of this complexity. It doesn't know which handler function belongs to which exception, so it deduces that information from the number of function arguments. This means that the programmer still has to use the correct function type for exceptions that push an error code and for those that don't. Luckily, the `InterruptDescriptorTable` type provided by the `x86_64` crate ensures that the correct function type is used in each case.
- **Aligning the stack**: Some instructions (especially SSE instructions) require that the stack be aligned on a 16-byte boundary. The CPU ensures this alignment when an exception occurs, but for some exceptions it breaks the alignment again when it pushes an error code. The `x86-interrupt` calling convention realigns the stack in this case.

If you are interested in more details, see our other series of posts that explains exception handling using [naked functions] (linked [at the very end of this post][too-much-magic]).

[naked functions]: https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md
[too-much-magic]: #too-much-magic

## Implementation

That's it for the theory; now let's implement CPU exception handling in our kernel. We create a new `interrupts` module in `src/interrupts.rs` and then write an `init_idt` function that creates a new `InterruptDescriptorTable`:

``` rust
// in src/lib.rs

pub mod interrupts;

// in src/interrupts.rs

use x86_64::structures::idt::InterruptDescriptorTable;

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
}
```

Now we can add handler functions. We start with a handler for the [breakpoint exception]. The breakpoint exception is the perfect exception to test exception handling. Its only purpose is to temporarily pause the running program when the breakpoint instruction `int3` is executed.

[breakpoint exception]: https://wiki.osdev.org/Exceptions#Breakpoint

The breakpoint exception is commonly used in debuggers: when the user sets a breakpoint, the debugger replaces the corresponding instruction with the `int3` instruction, so that the CPU raises the breakpoint exception when it reaches that instruction. When the user resumes the program, the debugger swaps the `int3` instruction back to the original instruction and continues the program. For more details, see the ["_How debuggers work_"] series.

["_How debuggers work_"]: https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints

For our use of the breakpoint exception, we don't need to overwrite any instructions. We just want to print a message when the breakpoint exception occurs and then continue the program. So we create a simple `breakpoint_handler` function and add it to our IDT:

```rust
// in src/interrupts.rs

use x86_64::structures::idt::{InterruptDescriptorTable, InterruptStackFrame};
use crate::println;

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
}

extern "x86-interrupt" fn breakpoint_handler(
    stack_frame: InterruptStackFrame)
{
    println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame);
}
```

This handler prints a simple message together with the contents of the interrupt stack frame.

When we try to compile it, the following error message is printed:

```
error[E0658]: x86-interrupt ABI is experimental and subject to change (see issue #40180)
  --> src/main.rs:53:1
   |
53 | / extern "x86-interrupt" fn breakpoint_handler(stack_frame: InterruptStackFrame) {
54 | |     println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame);
55 | | }
   | |_^
   |
   = help: add #![feature(abi_x86_interrupt)] to the crate attributes to enable
```

This error occurs because the `x86-interrupt` calling convention is still unstable. To use it anyway, we explicitly enable it by adding the `#![feature(abi_x86_interrupt)]` attribute at the top of our `lib.rs`.

### Loading the IDT

In order for the CPU to use our interrupt descriptor table, we first have to load it using the [`lidt`] instruction. We load the table through the [`load`][InterruptDescriptorTable::load] method of the `InterruptDescriptorTable` struct provided by the `x86_64` crate:

[`lidt`]: https://www.felixcloutier.com/x86/lgdt:lidt
[InterruptDescriptorTable::load]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html#method.load

```rust
// in src/interrupts.rs

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
    idt.load();
}
```

When we compile it, the following error occurs:

```
error: `idt` does not live long enough
  --> src/interrupts/mod.rs:43:5
   |
43 |     idt.load();
   |     ^^^ does not live long enough
44 | }
   | - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...
```

The `load` method takes a `&'static self`, that is, a reference that is valid for the complete runtime of the program. The CPU will access this table on every interrupt until we load a different IDT, so using a lifetime shorter than `'static` could lead to use-after-free bugs.

Our `idt` is created on the stack, so it is only valid inside the `init_idt` function. After the function returns, the stack memory is reused by other functions, so treating that memory as the IDT would mean reading data from the stack memory of arbitrary functions. Luckily, the `InterruptDescriptorTable::load` method encodes this lifetime requirement in its definition, so the Rust compiler prevents this potential use-after-free bug at compile time.

To fix this problem, we need to store our `idt` at a place where it has a `'static` lifetime. Allocating the IDT on the heap using [`Box`] and then obtaining a `'static` reference to the boxed IDT is not an option, because we haven't implemented a heap in our kernel yet.

[`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html

As an alternative, let's try to store the IDT in a `static` variable:

```rust
static IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    IDT.breakpoint.set_handler_fn(breakpoint_handler);
    IDT.load();
}
```

The problem is that the value of a static variable cannot be changed, so we can't modify the IDT entry for the breakpoint exception in our `init_idt` function. Let's try storing the IDT in a [`static mut`] variable instead:

[`static mut`]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable

```rust
static mut IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    unsafe {
        IDT.breakpoint.set_handler_fn(breakpoint_handler);
        IDT.load();
    }
}
```

This compiles without errors, but using `static mut` is discouraged in Rust. A `static mut` is very prone to data races, so every access to a `static mut` variable requires an [`unsafe` block].

[`unsafe` block]: https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#unsafe-superpowers

#### Lazy Statics

Fortunately, the `lazy_static` macro lets us initialize a `static` variable not at compile time but at runtime, at the moment the variable is first accessed. Thus, we can do almost everything in the initialization block without restrictions, such as reading the values of other variables at runtime.

We already added `lazy_static` to our dependencies when we [created an abstraction for the VGA text buffer][vga text buffer lazy static].
`lazy_static!` 매크로를 바로 사용하여 static 타입의 IDT를 생성합니다: [vga text buffer lazy static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics ```rust // in src/interrupts.rs use lazy_static::lazy_static; lazy_static! { static ref IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); idt.breakpoint.set_handler_fn(breakpoint_handler); idt }; } pub fn init_idt() { IDT.load(); } ``` 이 코드에서는 `unsafe` 블록이 필요하지 않습니다. `lazy_static!` 매크로의 내부 구현에서는 `unsafe`가 사용되지만, 안전한 추상 인터페이스 덕분에 `unsafe`가 외부로 드러나지 않습니다. ### 실행하기 마지막으로 `main.rs`에서 `init_idt` 함수를 호출하면 커널에서 예외 발생 및 처리가 제대로 작동합니다. 직접 `init_idt` 함수를 호출하는 대신 범용 초기화 함수 `init`을 `lib.rs`에 추가합니다: ```rust // in src/lib.rs pub fn init() { interrupts::init_idt(); } ``` `main.rs`와 `lib.rs` 및 통합 테스트들의 `_start` 함수들에서 공용으로 사용하는 초기화 루틴들의 호출은 앞으로 이 `init` 함수에 한데 모아 관리할 것입니다. `main.rs`의 `_start_` 함수가 `init` 함수를 호출한 후 breakpoint exception을 발생시키도록 코드를 추가합니다: ```rust // in src/main.rs #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { println!("Hello World{}", "!"); blog_os::init(); // 새로 추가한 코드 // invoke a breakpoint exception x86_64::instructions::interrupts::int3(); // 새로 추가한 코드 // as before #[cfg(test)] test_main(); println!("It did not crash!"); loop {} } ``` `cargo run`을 통해 QEMU에서 커널을 실행하면 아래의 출력 내용을 얻습니다: ![QEMU printing `EXCEPTION: BREAKPOINT` and the interrupt stack frame](qemu-breakpoint-exception.png) 성공입니다! CPU가 성공적으로 예외 처리 함수 `breakpoint_handler`를 호출했고, 예외 처리 함수가 메시지를 출력했으며, 그 후 `_start` 함수로 제어 흐름이 돌아와 `It did not crash!` 메시지도 출력됐습니다. 예외가 발생한 시점의 명령어 및 스택 포인터들을 인터럽트 스택 프레임이 알려줍니다. 이 정보는 예상치 못한 예외를 디버깅할 때 매우 유용합니다. ### 테스트 추가하기 위에서 확인한 동작을 위한 테스트를 작성해봅시다. 우선 `_start` 함수가 `init` 함수를 호출하도록 수정합니다: ```rust // in src/lib.rs /// Entry point for `cargo test` #[cfg(test)] #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { init(); // 새로 추가한 코드 test_main(); loop {} } ``` Rust는 `lib.rs`를 `main.rs`와는 독립적으로 테스트하기 때문에 이 `_start` 함수는 `cargo test --lib` 실행 시에만 사용된다는 것을 기억하세요. 
테스트 실행 전에 `init` 함수를 먼저 호출하여 IDT를 만들고 테스트 실행 시 사용되도록 설정합니다. 이제 `test_breakpoint_exception` 테스트를 생성할 수 있습니다: ```rust // in src/interrupts.rs #[test_case] fn test_breakpoint_exception() { // invoke a breakpoint exception x86_64::instructions::interrupts::int3(); } ``` 테스트는 `int3` 함수를 호출하여 breakpoint 예외를 발생시킵니다. 예외 처리 후, 이전에 실행 중이었던 프로그램의 실행이 재개함을 확인함으로써 breakpoint handler가 제대로 작동하는지 점검합니다. `cargo test` (모든 테스트 실행) 혹은 `cargo test --lib` (`lib.rs` 및 그 하위 모듈의 테스트만 실행) 커맨드를 통해 이 새로운 테스트를 실행해보세요. 테스트 실행 결과가 아래처럼 출력될 것입니다: ``` blog_os::interrupts::test_breakpoint_exception... [ok] ``` ## 더 자세히 파헤치고 싶은 분들께 {#too-much-magic} `x86-interrupt` 함수 호출 규약과 [`InterruptDescriptorTable`] 타입 덕분에 비교적 쉽게 예외 처리를 구현할 수 있었습니다. 예외 처리 시 우리가 이용한 추상화 단계 아래에서 일어나는 일들을 자세히 알고 싶으신가요? 그런 분들을 위해 준비했습니다: 저희 블로그의 또다른 글 시리즈 [“Handling Exceptions with Naked Functions”]는 `x86-interrupt` 함수 호출 규약 없이 예외 처리를 구현하는 과정을 다루며, IDT 타입을 직접 구현하여 사용합니다. 해당 글 시리즈는 `x86-interrupt` 함수 호출 규약 및 `x86_64` 크레이트가 생기기 이전에 작성되었습니다. 해당 시리즈는 이 블로그의 [첫 번째 버전][first edition]에 기반하여 작성되었기에 오래되어 더 이상 유효하지 않은 정보가 포함되어 있을 수 있으니 참고 부탁드립니다. [“Handling Exceptions with Naked Functions”]: @/edition-1/extra/naked-exceptions/_index.md [`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html [first edition]: @/edition-1/_index.md ## 다음 단계는 무엇일까요? 이번 포스트에서 예외 (exception)를 발생시키고 처리한 후 예외로부터 반환하는 것까지 성공했습니다. 다음 단계는 우리의 커널이 모든 예외를 처리할 수 있게 하는 것입니다. 제대로 처리되지 않은 예외는 치명적인 [트리플 폴트 (triple fault)][triple fault]를 발생시켜 시스템이 리셋하도록 만듭니다. 다음 포스트에서는 트리플 폴트가 발생하지 않도록 [더블 폴트 (double fault)][double faults]를 처리하는 방법을 다뤄보겠습니다. 
[triple fault]: https://wiki.osdev.org/Triple_Fault [double faults]: https://wiki.osdev.org/Double_Fault#Double_Fault ================================================ FILE: blog/content/edition-2/posts/05-cpu-exceptions/index.md ================================================ +++ title = "CPU Exceptions" weight = 5 path = "cpu-exceptions" date = 2018-06-17 [extra] chapter = "Interrupts" +++ CPU exceptions occur in various erroneous situations, for example, when accessing an invalid memory address or when dividing by zero. To react to them, we have to set up an _interrupt descriptor table_ that provides handler functions. At the end of this post, our kernel will be able to catch [breakpoint exceptions] and resume normal execution afterward. [breakpoint exceptions]: https://wiki.osdev.org/Exceptions#Breakpoint This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-05`][post branch] branch. [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-05 ## Overview An exception signals that something is wrong with the current instruction. For example, the CPU issues an exception if the current instruction tries to divide by 0. When an exception occurs, the CPU interrupts its current work and immediately calls a specific exception handler function, depending on the exception type. On x86, there are about 20 different CPU exception types. The most important are: - **Page Fault**: A page fault occurs on illegal memory accesses. For example, if the current instruction tries to read from an unmapped page or tries to write to a read-only page. - **Invalid Opcode**: This exception occurs when the current instruction is invalid, for example, when we try to use new [SSE instructions] on an old CPU that does not support them. 
- **General Protection Fault**: This is the exception with the broadest range of causes. It occurs on various kinds of access violations, such as trying to execute a privileged instruction in user-level code or writing reserved fields in configuration registers. - **Double Fault**: When an exception occurs, the CPU tries to call the corresponding handler function. If another exception occurs _while calling the exception handler_, the CPU raises a double fault exception. This exception also occurs when there is no handler function registered for an exception. - **Triple Fault**: If an exception occurs while the CPU tries to call the double fault handler function, it issues a fatal _triple fault_. We can't catch or handle a triple fault. Most processors react by resetting themselves and rebooting the operating system. [SSE instructions]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions For the full list of exceptions, check out the [OSDev wiki][exceptions]. [exceptions]: https://wiki.osdev.org/Exceptions ### The Interrupt Descriptor Table In order to catch and handle exceptions, we have to set up a so-called _Interrupt Descriptor Table_ (IDT). In this table, we can specify a handler function for each CPU exception. The hardware uses this table directly, so we need to follow a predefined format. Each entry must have the following 16-byte structure: | Type | Name | Description | | ---- | ------------------------ | ------------------------------------------------------------ | | u16 | Function Pointer [0:15] | The lower bits of the pointer to the handler function. | | u16 | GDT selector | Selector of a code segment in the [global descriptor table]. | | u16 | Options | (see below) | | u16 | Function Pointer [16:31] | The middle bits of the pointer to the handler function. | | u32 | Function Pointer [32:63] | The remaining bits of the pointer to the handler function. 
| | u32 | Reserved | [global descriptor table]: https://en.wikipedia.org/wiki/Global_Descriptor_Table The options field has the following format: | Bits | Name | Description | | ----- | -------------------------------- | --------------------------------------------------------------------------------------------------------------- | | 0-2 | Interrupt Stack Table Index | 0: Don't switch stacks, 1-7: Switch to the n-th stack in the Interrupt Stack Table when this handler is called. | | 3-7 | Reserved | | 8 | 0: Interrupt Gate, 1: Trap Gate | If this bit is 0, interrupts are disabled when this handler is called. | | 9-11 | must be one | | 12 | must be zero | | 13‑14 | Descriptor Privilege Level (DPL) | The minimal privilege level required for calling this handler. | | 15 | Present | Each exception has a predefined IDT index. For example, the invalid opcode exception has table index 6 and the page fault exception has table index 14. Thus, the hardware can automatically load the corresponding IDT entry for each exception. The [Exception Table][exceptions] in the OSDev wiki shows the IDT indexes of all exceptions in the “Vector nr.” column. When an exception occurs, the CPU roughly does the following: 1. Push some registers on the stack, including the instruction pointer and the [RFLAGS] register. (We will use these values later in this post.) 2. Read the corresponding entry from the Interrupt Descriptor Table (IDT). For example, the CPU reads the 14th entry when a page fault occurs. 3. Check if the entry is present and, if not, raise a double fault. 4. Disable hardware interrupts if the entry is an interrupt gate (bit 40 not set). 5. Load the specified [GDT] selector into the CS (code segment). 6. Jump to the specified handler function. 
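
To make the options layout and the gate check in step 4 concrete, here is a minimal sketch in plain Rust. The names `encode_options`, `is_interrupt_gate`, and `GATE_TYPE_BIT_IN_ENTRY` are made up for illustration and are not part of the `x86_64` crate; the bit positions are taken from the options table above.

```rust
/// Hypothetical helper (not from the `x86_64` crate): packs the 16-bit
/// options field of an IDT entry according to the options table.
pub fn encode_options(ist_index: u8, trap_gate: bool, dpl: u8, present: bool) -> u16 {
    assert!(ist_index < 8, "the IST index occupies bits 0-2");
    assert!(dpl < 4, "the DPL occupies bits 13-14");

    let mut options: u16 = 0b111 << 9; // bits 9-11 must be one
    options |= ist_index as u16; // bits 0-2: Interrupt Stack Table index
    options |= (trap_gate as u16) << 8; // bit 8: 0 = interrupt gate, 1 = trap gate
    options |= (dpl as u16) << 13; // bits 13-14: descriptor privilege level
    options |= (present as u16) << 15; // bit 15: present
    options
}

/// Returns true if bit 8 of the options field is clear (interrupt gate).
pub fn is_interrupt_gate(options: u16) -> bool {
    options & (1 << 8) == 0
}

/// Position of the gate-type bit within the whole 16-byte entry.
/// The options field follows two `u16` fields, so it starts at bit 32.
pub const GATE_TYPE_BIT_IN_ENTRY: u32 = 32 + 8;
```

Under these assumptions, a present interrupt gate without stack switching encodes to `0x8E00`, and the gate-type bit of the options field ends up at bit 40 of the full entry, which is the bit that step 4 refers to.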

[RFLAGS]: https://en.wikipedia.org/wiki/FLAGS_register
[GDT]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

Don't worry about steps 4 and 5 for now; we will learn about the global descriptor table and hardware interrupts in future posts.

## An IDT Type

Instead of creating our own IDT type, we will use the [`InterruptDescriptorTable` struct] of the `x86_64` crate, which looks like this:

[`InterruptDescriptorTable` struct]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html

``` rust
#[repr(C)]
pub struct InterruptDescriptorTable {
    pub divide_by_zero: Entry<HandlerFunc>,
    pub debug: Entry<HandlerFunc>,
    pub non_maskable_interrupt: Entry<HandlerFunc>,
    pub breakpoint: Entry<HandlerFunc>,
    pub overflow: Entry<HandlerFunc>,
    pub bound_range_exceeded: Entry<HandlerFunc>,
    pub invalid_opcode: Entry<HandlerFunc>,
    pub device_not_available: Entry<HandlerFunc>,
    pub double_fault: Entry<HandlerFuncWithErrCode>,
    pub invalid_tss: Entry<HandlerFuncWithErrCode>,
    pub segment_not_present: Entry<HandlerFuncWithErrCode>,
    pub stack_segment_fault: Entry<HandlerFuncWithErrCode>,
    pub general_protection_fault: Entry<HandlerFuncWithErrCode>,
    pub page_fault: Entry<PageFaultHandlerFunc>,
    pub x87_floating_point: Entry<HandlerFunc>,
    pub alignment_check: Entry<HandlerFuncWithErrCode>,
    pub machine_check: Entry<HandlerFunc>,
    pub simd_floating_point: Entry<HandlerFunc>,
    pub virtualization: Entry<HandlerFunc>,
    pub security_exception: Entry<HandlerFuncWithErrCode>,
    // some fields omitted
}
```

The fields have the type [`idt::Entry`], a struct that is generic over a type parameter `F` and represents the fields of an IDT entry (see the table above). The type parameter `F` defines the expected handler function type. We see that some entries require a [`HandlerFunc`] and some entries require a [`HandlerFuncWithErrCode`]. The page fault even has its own special type: [`PageFaultHandlerFunc`].
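
As a rough illustration of how an entry type can mirror the 16-byte layout from the table while still remembering the expected handler type, consider the following sketch. `EntrySketch` and its methods are hypothetical names chosen for this example; this is not the crate's actual definition, only one plausible shape under the layout described above.

```rust
use core::marker::PhantomData;

/// Illustrative stand-in for an IDT entry (hypothetical, not the crate's type).
/// The field order follows the entry table; `F` only exists at the type level.
#[allow(dead_code)]
#[repr(C)]
pub struct EntrySketch<F> {
    pointer_low: u16,    // function pointer bits 0-15
    gdt_selector: u16,   // code segment selector
    options: u16,        // options field
    pointer_middle: u16, // function pointer bits 16-31
    pointer_high: u32,   // function pointer bits 32-63
    reserved: u32,
    handler_type: PhantomData<F>, // zero-sized marker for the handler type
}

impl<F> EntrySketch<F> {
    /// Splits a 64-bit handler address across the three pointer fields.
    pub fn with_handler_addr(addr: u64, gdt_selector: u16, options: u16) -> Self {
        EntrySketch {
            pointer_low: addr as u16,
            gdt_selector,
            options,
            pointer_middle: (addr >> 16) as u16,
            pointer_high: (addr >> 32) as u32,
            reserved: 0,
            handler_type: PhantomData,
        }
    }

    /// Reassembles the handler address from the three pointer fields.
    pub fn handler_addr(&self) -> u64 {
        (self.pointer_low as u64)
            | ((self.pointer_middle as u64) << 16)
            | ((self.pointer_high as u64) << 32)
    }
}
```

Because the marker field is zero-sized, the struct still occupies exactly 16 bytes, so the type parameter costs nothing at runtime while letting the compiler reject a handler with the wrong signature.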
[`idt::Entry`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.Entry.html [`HandlerFunc`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFunc.html [`HandlerFuncWithErrCode`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFuncWithErrCode.html [`PageFaultHandlerFunc`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.PageFaultHandlerFunc.html Let's look at the `HandlerFunc` type first: ```rust type HandlerFunc = extern "x86-interrupt" fn(_: InterruptStackFrame); ``` It's a [type alias] for an `extern "x86-interrupt" fn` type. The `extern` keyword defines a function with a [foreign calling convention] and is often used to communicate with C code (`extern "C" fn`). But what is the `x86-interrupt` calling convention? [type alias]: https://doc.rust-lang.org/book/ch20-03-advanced-types.html#creating-type-synonyms-with-type-aliases [foreign calling convention]: https://doc.rust-lang.org/nomicon/ffi.html#foreign-calling-conventions ## The Interrupt Calling Convention Exceptions are quite similar to function calls: The CPU jumps to the first instruction of the called function and executes it. Afterwards, the CPU jumps to the return address and continues the execution of the parent function. However, there is a major difference between exceptions and function calls: A function call is invoked voluntarily by a compiler-inserted `call` instruction, while an exception might occur at _any_ instruction. In order to understand the consequences of this difference, we need to examine function calls in more detail. [Calling conventions] specify the details of a function call. For example, they specify where function parameters are placed (e.g. in registers or on the stack) and how results are returned. 
On x86_64 Linux, the following rules apply for C functions (specified in the [System V ABI]):

[Calling conventions]: https://en.wikipedia.org/wiki/Calling_convention
[System V ABI]: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

- the first six integer arguments are passed in registers `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`
- additional arguments are passed on the stack
- results are returned in `rax` and `rdx`

Note that Rust does not follow the C ABI (in fact, [there isn't even a Rust ABI yet][rust abi]), so these rules apply only to functions declared as `extern "C" fn`.

[rust abi]: https://github.com/rust-lang/rfcs/issues/600

### Preserved and Scratch Registers

The calling convention divides the registers into two parts: _preserved_ and _scratch_ registers.

The values of _preserved_ registers must remain unchanged across function calls. So a called function (the _“callee”_) is only allowed to overwrite these registers if it restores their original values before returning. Therefore, these registers are called _“callee-saved”_. A common pattern is to save these registers to the stack at the function's beginning and restore them just before returning.

In contrast, a called function is allowed to overwrite _scratch_ registers without restrictions. If the caller wants to preserve the value of a scratch register across a function call, it needs to back it up before the call (e.g., by pushing it to the stack) and restore it afterward. So the scratch registers are _caller-saved_.

On x86_64, the C calling convention specifies the following preserved and scratch registers:

| preserved registers                             | scratch registers                                           |
| ----------------------------------------------- | ----------------------------------------------------------- |
| `rbp`, `rbx`, `rsp`, `r12`, `r13`, `r14`, `r15` | `rax`, `rcx`, `rdx`, `rsi`, `rdi`, `r8`, `r9`, `r10`, `r11` |
| _callee-saved_                                  | _caller-saved_                                              |

The compiler knows these rules, so it generates the code accordingly.
For example, most functions begin with a `push rbp`, which backs up `rbp` on the stack (because it's a callee-saved register).

### Preserving all Registers

In contrast to function calls, exceptions can occur on _any_ instruction. In most cases, we don't even know at compile time if the generated code will cause an exception. For example, the compiler can't know if an instruction causes a stack overflow or a page fault.

Since we don't know when an exception occurs, we can't back up any registers beforehand. This means we can't use a calling convention that relies on caller-saved registers for exception handlers. Instead, we need a calling convention that preserves _all registers_. The `x86-interrupt` calling convention is such a calling convention, so it guarantees that all register values are restored to their original values on function return.

Note that this does not mean all registers are saved to the stack at function entry. Instead, the compiler only backs up the registers that are overwritten by the function. This way, very efficient code can be generated for short functions that only use a few registers.

### The Interrupt Stack Frame

On a normal function call (using the `call` instruction), the CPU pushes the return address before jumping to the target function. On function return (using the `ret` instruction), the CPU pops this return address and jumps to it. So the stack frame of a normal function call looks like this:

![function stack frame](function-stack-frame.svg)

For exception and interrupt handlers, however, pushing a return address would not suffice, since interrupt handlers often run in a different context (stack pointer, CPU flags, etc.). Instead, the CPU performs the following steps when an interrupt occurs:

0. **Saving the old stack pointer**: The CPU reads the stack pointer (`rsp`) and stack segment (`ss`) register values and remembers them in an internal buffer.
1. **Aligning the stack pointer**: An interrupt can occur at any instruction, so the stack pointer can have any value, too. However, some CPU instructions (e.g., some SSE instructions) require that the stack pointer be aligned on a 16-byte boundary, so the CPU performs such an alignment right after the interrupt.
2. **Switching stacks** (in some cases): A stack switch occurs when the CPU privilege level changes, for example, when a CPU exception occurs in a user-mode program. It is also possible to configure stack switches for specific interrupts using the so-called _Interrupt Stack Table_ (described in the next post).
3. **Pushing the old stack pointer**: The CPU pushes the `rsp` and `ss` values from step 0 to the stack. This makes it possible to restore the original stack pointer when returning from an interrupt handler.
4. **Pushing and updating the `RFLAGS` register**: The [`RFLAGS`] register contains various control and status bits. On interrupt entry, the CPU pushes the old value and changes some bits.
5. **Pushing the instruction pointer**: Before jumping to the interrupt handler function, the CPU pushes the instruction pointer (`rip`) and the code segment (`cs`). This is comparable to the return address push of a normal function call.
6. **Pushing an error code** (for some exceptions): For some specific exceptions, such as page faults, the CPU pushes an error code, which describes the cause of the exception.
7. **Invoking the interrupt handler**: The CPU reads the address and the segment descriptor of the interrupt handler function from the corresponding field in the IDT. It then invokes this handler by loading the values into the `rip` and `cs` registers.

[`RFLAGS`]: https://en.wikipedia.org/wiki/FLAGS_register

So the _interrupt stack frame_ looks like this:

![interrupt stack frame](exception-stack-frame.svg)

In the `x86_64` crate, the interrupt stack frame is represented by the [`InterruptStackFrame`] struct.
It is passed to interrupt handlers as an argument (by value in the handler function types used in this post, see the `HandlerFunc` alias above) and can be used to retrieve additional information about the exception's cause. The struct contains no error code field, since only a few exceptions push an error code. These exceptions use the separate [`HandlerFuncWithErrCode`] function type, which has an additional `error_code` argument.

[`InterruptStackFrame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptStackFrame.html

### Behind the Scenes

The `x86-interrupt` calling convention is a powerful abstraction that hides almost all of the messy details of the exception handling process. However, sometimes it's useful to know what's happening behind the curtain. Here is a short overview of the things that the `x86-interrupt` calling convention takes care of:

- **Retrieving the arguments**: Most calling conventions expect that the arguments are passed in registers. This is not possible for exception handlers since we must not overwrite any register values before backing them up on the stack. Instead, the `x86-interrupt` calling convention is aware that the arguments already lie on the stack at a specific offset.
- **Returning using `iretq`**: Since the interrupt stack frame completely differs from stack frames of normal function calls, we can't return from handler functions through the normal `ret` instruction. So instead, the `iretq` instruction must be used.
- **Handling the error code**: The error code, which is pushed for some exceptions, makes things much more complex. It changes the stack alignment (see the next point) and needs to be popped off the stack before returning. The `x86-interrupt` calling convention handles all that complexity. However, it doesn't know which handler function is used for which exception, so it needs to deduce that information from the number of function arguments. That means the programmer is still responsible for using the correct function type for each exception.
Luckily, the `InterruptDescriptorTable` type defined by the `x86_64` crate ensures that the correct function types are used.
- **Aligning the stack**: Some instructions (especially SSE instructions) require a 16-byte stack alignment. The CPU ensures this alignment whenever an exception occurs, but for some exceptions it destroys it again later when it pushes an error code. The `x86-interrupt` calling convention takes care of this by realigning the stack in this case.

If you are interested in more details, we also have a series of posts that explain exception handling using [naked functions] linked [at the end of this post][too-much-magic].

[naked functions]: https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md
[too-much-magic]: #too-much-magic

## Implementation

Now that we've understood the theory, it's time to handle CPU exceptions in our kernel. We'll start by creating a new `interrupts` module in `src/interrupts.rs` with an `init_idt` function that creates a new `InterruptDescriptorTable`:

``` rust
// in src/lib.rs

pub mod interrupts;

// in src/interrupts.rs

use x86_64::structures::idt::InterruptDescriptorTable;

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
}
```

Now we can add handler functions. We start by adding a handler for the [breakpoint exception]. The breakpoint exception is the perfect exception to test exception handling. Its only purpose is to temporarily pause a program when the breakpoint instruction `int3` is executed.

[breakpoint exception]: https://wiki.osdev.org/Exceptions#Breakpoint

The breakpoint exception is commonly used in debuggers: When the user sets a breakpoint, the debugger overwrites the corresponding instruction with the `int3` instruction so that the CPU throws the breakpoint exception when it reaches that instruction. When the user wants to continue the program, the debugger replaces the `int3` instruction with the original instruction again and continues the program.
For more details, see the ["_How debuggers work_"] series. ["_How debuggers work_"]: https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints For our use case, we don't need to overwrite any instructions. Instead, we just want to print a message when the breakpoint instruction is executed and then continue the program. So let's create a simple `breakpoint_handler` function and add it to our IDT: ```rust // in src/interrupts.rs use x86_64::structures::idt::{InterruptDescriptorTable, InterruptStackFrame}; use crate::println; pub fn init_idt() { let mut idt = InterruptDescriptorTable::new(); idt.breakpoint.set_handler_fn(breakpoint_handler); } extern "x86-interrupt" fn breakpoint_handler( stack_frame: InterruptStackFrame) { println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame); } ``` Our handler just outputs a message and pretty-prints the interrupt stack frame. When we try to compile it, the following error occurs: ``` error[E0658]: x86-interrupt ABI is experimental and subject to change (see issue #40180) --> src/main.rs:53:1 | 53 | / extern "x86-interrupt" fn breakpoint_handler(stack_frame: InterruptStackFrame) { 54 | | println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame); 55 | | } | |_^ | = help: add #![feature(abi_x86_interrupt)] to the crate attributes to enable ``` This error occurs because the `x86-interrupt` calling convention is still unstable. To use it anyway, we have to explicitly enable it by adding `#![feature(abi_x86_interrupt)]` at the top of our `lib.rs`. ### Loading the IDT In order for the CPU to use our new interrupt descriptor table, we need to load it using the [`lidt`] instruction. The `InterruptDescriptorTable` struct of the `x86_64` crate provides a [`load`][InterruptDescriptorTable::load] method for that. 
Let's try to use it:

[`lidt`]: https://www.felixcloutier.com/x86/lgdt:lidt
[InterruptDescriptorTable::load]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html#method.load

```rust
// in src/interrupts.rs

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
    idt.load();
}
```

When we try to compile it now, the following error occurs:

```
error: `idt` does not live long enough
  --> src/interrupts/mod.rs:43:5
   |
43 |     idt.load();
   |     ^^^ does not live long enough
44 | }
   | - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...
```

So the `load` method expects a `&'static self`, that is, a reference valid for the complete runtime of the program. The reason is that the CPU will access this table on every interrupt until we load a different IDT. So using a shorter lifetime than `'static` could lead to use-after-free bugs.

In fact, this is exactly what happens here. Our `idt` is created on the stack, so it is only valid inside the `init_idt` function. Afterwards, the stack memory is reused for other functions, so the CPU would interpret random stack memory as the IDT. Luckily, the `InterruptDescriptorTable::load` method encodes this lifetime requirement in its function definition, so that the Rust compiler is able to prevent this possible bug at compile time.

In order to fix this problem, we need to store our `idt` at a place where it has a `'static` lifetime. To achieve this, we could allocate our IDT on the heap using [`Box`] and then convert it to a `'static` reference, but we are writing an OS kernel and thus don't have a heap (yet).
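
For readers curious what the heap-based variant would look like on a hosted target that already has an allocator, here is a minimal sketch. It does not apply to our kernel at this point, and `Table` and `leak_table` are hypothetical stand-ins invented for this example rather than anything from the `x86_64` crate; the only real API used is `Box::leak` from the standard library.

```rust
/// Hypothetical stand-in for a table type; the real IDT type is not used here.
pub struct Table {
    pub entries: [u64; 4],
}

/// Moves the table to the heap and leaks the allocation on purpose.
/// The memory is never freed, so the returned reference stays valid
/// for the rest of the program and can carry the `'static` lifetime.
pub fn leak_table(table: Table) -> &'static mut Table {
    Box::leak(Box::new(table))
}
```

The trade-off of this approach is that the allocation is intentionally never reclaimed, which is acceptable for a structure that must live until shutdown anyway.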

[`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html

As an alternative, we could try to store the IDT as a `static`:

```rust
static IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    IDT.breakpoint.set_handler_fn(breakpoint_handler);
    IDT.load();
}
```

However, there is a problem: Statics are immutable, so we can't modify the breakpoint entry from our `init_idt` function. We could solve this problem by using a [`static mut`]:

[`static mut`]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable

```rust
static mut IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    unsafe {
        IDT.breakpoint.set_handler_fn(breakpoint_handler);
        IDT.load();
    }
}
```

This variant compiles without errors but it's far from idiomatic. `static mut`s are very prone to data races, so we need an [`unsafe` block] on each access.

[`unsafe` block]: https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#unsafe-superpowers

#### Lazy Statics to the Rescue

Fortunately, the `lazy_static` macro exists. Instead of evaluating a `static` at compile time, the macro performs the initialization when the `static` is referenced the first time. Thus, we can do almost everything in the initialization block and are even able to read runtime values.

We already imported the `lazy_static` crate when we [created an abstraction for the VGA text buffer][vga text buffer lazy static]. So we can directly use the `lazy_static!` macro to create our static IDT:

[vga text buffer lazy static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics

```rust
// in src/interrupts.rs

use lazy_static::lazy_static;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt
    };
}

pub fn init_idt() {
    IDT.load();
}
```

Note how this solution requires no `unsafe` blocks.
The `lazy_static!` macro does use `unsafe` behind the scenes, but it is abstracted away in a safe interface.

### Running it

The last step for making exceptions work in our kernel is to call the `init_idt` function from our `main.rs`. Instead of calling it directly, we introduce a general `init` function in our `lib.rs`:

```rust
// in src/lib.rs

pub fn init() {
    interrupts::init_idt();
}
```

With this function, we now have a central place for initialization routines that can be shared between the different `_start` functions in our `main.rs`, `lib.rs`, and integration tests.

Now we can update the `_start` function of our `main.rs` to call `init` and then trigger a breakpoint exception:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init(); // new

    // invoke a breakpoint exception
    x86_64::instructions::interrupts::int3(); // new

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    loop {}
}
```

When we run it in QEMU now (using `cargo run`), we see the following:

![QEMU printing `EXCEPTION: BREAKPOINT` and the interrupt stack frame](qemu-breakpoint-exception.png)

It works! The CPU successfully invokes our breakpoint handler, which prints the message, and then returns back to the `_start` function, where the `It did not crash!` message is printed.

We see that the interrupt stack frame tells us the instruction and stack pointers at the time when the exception occurred. This information is very useful when debugging unexpected exceptions.

### Adding a Test

Let's create a test that ensures that the above continues to work. First, we update the `_start` function to also call `init`:

```rust
// in src/lib.rs

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    init(); // new
    test_main();
    loop {}
}
```

Remember, this `_start` function is used when running `cargo test --lib`, since Rust tests the `lib.rs` completely independently of the `main.rs`. We need to call `init` here to set up an IDT before running the tests.

Now we can create a `test_breakpoint_exception` test:

```rust
// in src/interrupts.rs

#[test_case]
fn test_breakpoint_exception() {
    // invoke a breakpoint exception
    x86_64::instructions::interrupts::int3();
}
```

The test invokes the `int3` function to trigger a breakpoint exception. By checking that the execution continues afterward, we verify that our breakpoint handler is working correctly.

You can try this new test by running `cargo test` (all tests) or `cargo test --lib` (only tests of `lib.rs` and its modules). You should see the following in the output:

```
blog_os::interrupts::test_breakpoint_exception... [ok]
```

## Too much Magic?

The `x86-interrupt` calling convention and the [`InterruptDescriptorTable`] type made the exception handling process relatively straightforward and painless. If this was too much magic for you and you would like to learn all the gory details of exception handling, we've got you covered: Our [“Handling Exceptions with Naked Functions”] series shows how to handle exceptions without the `x86-interrupt` calling convention and also creates its own IDT type. Historically, these posts were the main exception handling posts before the `x86-interrupt` calling convention and the `x86_64` crate existed. Note that these posts are based on the [first edition] of this blog and might be out of date.

[“Handling Exceptions with Naked Functions”]: @/edition-1/extra/naked-exceptions/_index.md
[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html
[first edition]: @/edition-1/_index.md

## What's next?

We've successfully caught our first exception and returned from it!
The next step is to ensure that we catch all exceptions because an uncaught exception causes a fatal [triple fault], which leads to a system reset. The next post explains how we can avoid this by correctly catching [double faults].

[triple fault]: https://wiki.osdev.org/Triple_Fault
[double faults]: https://wiki.osdev.org/Double_Fault#Double_Fault

================================================
FILE: blog/content/edition-2/posts/05-cpu-exceptions/index.pt-BR.md
================================================
+++
title = "CPU Exceptions"
weight = 5
path = "pt-BR/cpu-exceptions"
date = 2018-06-17

[extra]
chapter = "Interrupts"
# Please update this when updating the translation
translation_based_on_commit = "9753695744854686a6b80012c89b0d850a44b4b0"
# GitHub usernames of the people that translated this post
translators = ["richarddalves"]
+++

CPU exceptions occur in various erroneous situations, for example, when accessing an invalid memory address or when dividing by zero. To react to them, we have to set up an _interrupt descriptor table_ that provides handler functions. At the end of this post, our kernel will be able to catch [breakpoint exceptions] and resume normal execution afterward.

[breakpoint exceptions]: https://wiki.osdev.org/Exceptions#Breakpoint

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-05`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-05

## Overview

An exception signals that something is wrong with the current instruction. For example, the CPU issues an exception if the current instruction tries to divide by 0.
When an exception occurs, the CPU interrupts its current work and immediately calls a specific exception handler function, depending on the exception type.

On x86, there are about 20 different CPU exception types. The most important are:

- **Page Fault**: A page fault occurs on illegal memory accesses. For example, if the current instruction tries to read from an unmapped page or tries to write to a read-only page.
- **Invalid Opcode**: This exception occurs when the current instruction is invalid, for example, when we try to use new [SSE instructions] on an old CPU that does not support them.
- **General Protection Fault**: This is the exception with the broadest range of causes. It occurs on various kinds of access violations, such as trying to execute a privileged instruction in user-level code or writing reserved fields in configuration registers.
- **Double Fault**: When an exception occurs, the CPU tries to call the corresponding handler function. If another exception occurs _while calling the exception handler_, the CPU raises a double fault exception. This exception also occurs when there is no handler function registered for an exception.
- **Triple Fault**: If an exception occurs while the CPU tries to call the double fault handler function, it issues a fatal _triple fault_. We can't catch or handle a triple fault. Most processors react by resetting themselves and rebooting the operating system.

[SSE instructions]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

For the full list of exceptions, check out the [OSDev wiki][exceptions].

[exceptions]: https://wiki.osdev.org/Exceptions

### The Interrupt Descriptor Table

To catch and handle exceptions, we have to set up a so-called _Interrupt Descriptor Table_ (IDT). In this table, we can specify a handler function for each CPU exception.
The hardware uses this table directly, so we need to follow a predefined format. Each entry must have the following 16-byte structure:

| Type | Name                     | Description                                                  |
| ---- | ------------------------ | ------------------------------------------------------------ |
| u16  | Function Pointer [0:15]  | The lower bits of the pointer to the handler function.       |
| u16  | GDT selector             | Selector of a code segment in the [global descriptor table]. |
| u16  | Options                  | (see below)                                                  |
| u16  | Function Pointer [16:31] | The middle bits of the pointer to the handler function.      |
| u32  | Function Pointer [32:63] | The remaining bits of the pointer to the handler function.   |
| u32  | Reserved                 |                                                              |

[global descriptor table]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

The options field has the following format:

| Bits  | Name                             | Description                                                                                                      |
| ----- | -------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| 0-2   | Interrupt Stack Table Index      | 0: Don't switch stacks, 1-7: Switch to the n-th stack in the Interrupt Stack Table when this handler is called. |
| 3-7   | Reserved                         |                                                                                                                  |
| 8     | 0: Interrupt Gate, 1: Trap Gate  | If this bit is 0, interrupts are disabled when this handler is called.                                           |
| 9-11  | must be one                      |                                                                                                                  |
| 12    | must be zero                     |                                                                                                                  |
| 13‑14 | Descriptor Privilege Level (DPL) | The minimal privilege level required for calling this handler.                                                   |
| 15    | Present                          |                                                                                                                  |

Each exception has a predefined IDT index. For example, the invalid opcode exception has table index 6 and the page fault exception has table index 14. Thus, the hardware can automatically load the corresponding IDT entry for each exception. The [Exception Table][exceptions] in the OSDev wiki shows the IDT indexes of all exceptions in the "Vector nr." column.

When an exception occurs, the CPU roughly does the following:

1. Pushes some registers on the stack, including the instruction pointer and the [RFLAGS] register. (We will use these values later in this post.)
2. Reads the corresponding entry from the Interrupt Descriptor Table (IDT). For example, the CPU reads the 14th entry when a page fault occurs.
3. Checks whether the entry is present and, if not, raises a double fault.
4. Disables hardware interrupts if the entry is an interrupt gate (bit 40 not set).
5. Loads the specified [GDT] selector into the CS (code segment) register.
6. Jumps to the specified handler function.

[RFLAGS]: https://en.wikipedia.org/wiki/FLAGS_register
[GDT]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

Don't worry about steps 4 and 5 for now; we will learn about the global descriptor table and hardware interrupts in future posts.

## An IDT Type

Instead of creating our own IDT type, we will use the [`InterruptDescriptorTable` struct] of the `x86_64` crate, which looks like this:

[`InterruptDescriptorTable` struct]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html

```rust
#[repr(C)]
pub struct InterruptDescriptorTable {
    pub divide_by_zero: Entry<HandlerFunc>,
    pub debug: Entry<HandlerFunc>,
    pub non_maskable_interrupt: Entry<HandlerFunc>,
    pub breakpoint: Entry<HandlerFunc>,
    pub overflow: Entry<HandlerFunc>,
    pub bound_range_exceeded: Entry<HandlerFunc>,
    pub invalid_opcode: Entry<HandlerFunc>,
    pub device_not_available: Entry<HandlerFunc>,
    pub double_fault: Entry<HandlerFuncWithErrCode>,
    pub invalid_tss: Entry<HandlerFuncWithErrCode>,
    pub segment_not_present: Entry<HandlerFuncWithErrCode>,
    pub stack_segment_fault: Entry<HandlerFuncWithErrCode>,
    pub general_protection_fault: Entry<HandlerFuncWithErrCode>,
    pub page_fault: Entry<PageFaultHandlerFunc>,
    pub x87_floating_point: Entry<HandlerFunc>,
    pub alignment_check: Entry<HandlerFuncWithErrCode>,
    pub machine_check: Entry<HandlerFunc>,
    pub simd_floating_point: Entry<HandlerFunc>,
    pub virtualization: Entry<HandlerFunc>,
    pub security_exception: Entry<HandlerFuncWithErrCode>,
    // some fields omitted
}
```

The fields have the type [`idt::Entry`], which is a struct that represents the fields of an IDT entry (see the table above). The type parameter `F` defines the expected handler function type.
We see that some entries require a [`HandlerFunc`] and some entries require a [`HandlerFuncWithErrCode`]. The page fault even has its own special type: [`PageFaultHandlerFunc`].

[`idt::Entry`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.Entry.html
[`HandlerFunc`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFunc.html
[`HandlerFuncWithErrCode`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFuncWithErrCode.html
[`PageFaultHandlerFunc`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.PageFaultHandlerFunc.html

Let's look at the `HandlerFunc` type first:

```rust
type HandlerFunc = extern "x86-interrupt" fn(_: InterruptStackFrame);
```

It's a [type alias] for an `extern "x86-interrupt" fn` type. The `extern` keyword defines a function with a [foreign calling convention] and is often used to communicate with C code (`extern "C" fn`). But what is the `x86-interrupt` calling convention?

[type alias]: https://doc.rust-lang.org/book/ch20-03-advanced-types.html#creating-type-synonyms-with-type-aliases
[foreign calling convention]: https://doc.rust-lang.org/nomicon/ffi.html#foreign-calling-conventions

## The Interrupt Calling Convention

Exceptions are quite similar to function calls: The CPU jumps to the first instruction of the called function and executes it. Afterwards, the CPU jumps to the return address and continues the execution of the parent function.

However, there is a major difference between exceptions and function calls: A function call is invoked voluntarily by a compiler-inserted `call` instruction, while an exception might occur at _any_ instruction. In order to understand the consequences of this difference, we need to examine function calls in more detail.

[Calling conventions][Convenções de chamada] specify the details of a function call.
For example, they specify where function parameters are placed (e.g., in registers or on the stack) and how results are returned. On x86_64 Linux, the following rules apply for C functions (specified in the [System V ABI]):

[Convenções de chamada]: https://en.wikipedia.org/wiki/Calling_convention
[System V ABI]: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

- the first six integer arguments are passed in registers `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`
- additional arguments are passed on the stack
- results are returned in `rax` and `rdx`

Note that Rust does not follow the C ABI (in fact, [there isn't even a Rust ABI yet][rust abi]), so these rules apply only to functions declared as `extern "C" fn`.

[rust abi]: https://github.com/rust-lang/rfcs/issues/600

### Preserved and Scratch Registers

The calling convention divides the registers into two parts: _preserved_ and _scratch_ registers.

The values of _preserved_ registers must remain unchanged across function calls. So a called function (the _"callee"_) is only allowed to overwrite these registers if it restores their original values before returning. This is why these registers are called _"callee-saved"_. A common pattern is to save these registers to the stack at the function's beginning and restore them just before returning.

In contrast, a called function is allowed to overwrite _scratch_ registers without restrictions. If the caller wants to preserve the value of a scratch register across a function call, it needs to back it up before the call and restore it afterward (e.g., by pushing it to the stack). So the scratch registers are _caller-saved_.
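The split of responsibilities described above can be illustrated with a toy model. The following sketch is only a simulation in ordinary Rust, not real register access: `Registers`, `callee`, and `caller` are hypothetical names introduced for illustration, with one field standing in for a preserved register (`rbx`) and one for a scratch register (`rax`).

```rust
/// Toy register file with one preserved and one scratch register (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Registers {
    rbx: u64, // preserved (callee-saved)
    rax: u64, // scratch (caller-saved)
}

/// A well-behaved callee: it may clobber `rax` freely, but must restore `rbx`.
fn callee(regs: &mut Registers) {
    let saved_rbx = regs.rbx; // like `push rbx` at function entry
    regs.rbx = 0xdead; // use the preserved register for its own work
    regs.rax = 0xbeef; // overwrite the scratch register without restoring it
    regs.rbx = saved_rbx; // like `pop rbx` right before returning
}

/// A caller that still needs its `rax` value after the call must back it up itself.
/// Returns the value of `rax` right after the call and after the caller's restore.
fn caller(regs: &mut Registers) -> (u64, u64) {
    let saved_rax = regs.rax; // caller-side backup of the scratch register
    callee(regs);
    let rax_after_call = regs.rax; // clobbered by the callee
    regs.rax = saved_rax; // the caller restores its own value
    (rax_after_call, regs.rax)
}

fn main() {
    let mut regs = Registers { rbx: 1, rax: 2 };
    let (clobbered, restored) = caller(&mut regs);
    println!("rax right after the call: {clobbered:#x}, after restoring: {restored}");
    println!("rbx after the call: {}", regs.rbx);
}
```

In this model, the preserved register survives the call without any work by the caller, while the scratch register only keeps its value because the caller saved it beforehand.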
On x86_64, the C calling convention specifies the following preserved and scratch registers:

| preserved registers                             | scratch registers                                           |
| ----------------------------------------------- | ----------------------------------------------------------- |
| `rbp`, `rbx`, `rsp`, `r12`, `r13`, `r14`, `r15` | `rax`, `rcx`, `rdx`, `rsi`, `rdi`, `r8`, `r9`, `r10`, `r11` |
| _callee-saved_                                  | _caller-saved_                                              |

The compiler knows these rules, so it generates the code accordingly. For example, most functions begin with a `push rbp`, which backs up `rbp` on the stack (because it's a callee-saved register).

### Preserving all Registers

In contrast to function calls, exceptions can occur on _any_ instruction. In most cases, we don't even know at compile time if the generated code will cause an exception. For example, the compiler can't know if an instruction causes a stack overflow or a page fault.

Since we don't know when an exception occurs, we can't back up any registers beforehand. This means we can't use a calling convention that relies on caller-saved registers for exception handlers. Instead, we need a calling convention that preserves _all registers_. The `x86-interrupt` calling convention is such a calling convention, so it guarantees that all register values are restored to their original values on function return.

Note that this does not mean all registers are saved to the stack at function entry. Instead, the compiler only backs up the registers that are overwritten by the function. This way, very efficient code can be generated for short functions that only use a few registers.

### The Interrupt Stack Frame

On a normal function call (using the `call` instruction), the CPU pushes the return address before jumping to the target function.
On function return (using the `ret` instruction), the CPU pops this return address and jumps to it. So the stack frame of a normal function call looks like this:

![function stack frame](function-stack-frame.svg)

For exception and interrupt handlers, however, pushing a return address would not suffice, since interrupt handlers often run in a different context (stack pointer, CPU flags, etc.). Instead, the CPU performs the following steps when an interrupt occurs:

0. **Saving the old stack pointer**: The CPU reads the stack pointer (`rsp`) and stack segment (`ss`) register values and remembers them in an internal buffer.
1. **Aligning the stack pointer**: An interrupt can occur at any instruction, so the stack pointer can have any value, too. However, some CPU instructions (e.g., some SSE instructions) require that the stack pointer be aligned on a 16-byte boundary, so the CPU performs such an alignment right after the interrupt.
2. **Switching stacks** (in some cases): A stack switch occurs when the CPU privilege level changes, for example, when a CPU exception occurs in a user-mode program. It is also possible to configure stack switches for specific interrupts using the so-called _Interrupt Stack Table_ (described in the next post).
3. **Pushing the old stack pointer**: The CPU pushes the `rsp` and `ss` values from step 0 to the stack. This makes it possible to restore the original stack pointer when returning from an interrupt handler.
4. **Pushing and updating the `RFLAGS` register**: The [`RFLAGS`] register contains various control and status bits. On interrupt entry, the CPU changes some bits and pushes the old value.
5. **Pushing the instruction pointer**: Before jumping to the interrupt handler function, the CPU pushes the instruction pointer (`rip`) and the code segment (`cs`).
   This is comparable to the return address push of a normal function call.
6. **Pushing an error code** (for some exceptions): For some specific exceptions, such as page faults, the CPU pushes an error code, which describes the cause of the exception.
7. **Invoking the interrupt handler**: The CPU reads the address and the segment descriptor of the interrupt handler function from the corresponding field in the IDT. It then invokes this handler by loading the values into the `rip` and `cs` registers.

[`RFLAGS`]: https://en.wikipedia.org/wiki/FLAGS_register

So the _interrupt stack frame_ looks like this:

![interrupt stack frame](exception-stack-frame.svg)

In the `x86_64` crate, the interrupt stack frame is represented by the [`InterruptStackFrame`] struct. It is passed to interrupt handlers as an argument and can be used to retrieve additional information about the exception's cause. The struct contains no error code field, since only a few exceptions push an error code. These exceptions use the separate [`HandlerFuncWithErrCode`] function type, which has an additional `error_code` argument.

[`InterruptStackFrame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptStackFrame.html

### Behind the Scenes

The `x86-interrupt` calling convention is a powerful abstraction that hides almost all of the messy details of the exception handling process. However, sometimes it's useful to know what's happening behind the curtain. Here is a short overview of the things that the `x86-interrupt` calling convention takes care of:

- **Retrieving the arguments**: Most calling conventions expect that the arguments are passed in registers. This is not possible for exception handlers since we must not overwrite any register values before backing them up on the stack.
  Instead, the `x86-interrupt` calling convention is aware that the arguments already lie on the stack at a specific offset.
- **Returning using `iretq`**: Since the interrupt stack frame completely differs from stack frames of normal function calls, we can't return from handler functions through the normal `ret` instruction. So instead, the `iretq` instruction must be used.
- **Handling the error code**: The error code, which is pushed for some exceptions, makes things much more complex. It changes the stack alignment (see the next point) and needs to be popped off the stack before returning. The `x86-interrupt` calling convention handles all that complexity. However, it doesn't know which handler function is used for which exception, so it needs to deduce that information from the number of function arguments. That means the programmer is still responsible for using the correct function type for each exception. Luckily, the `InterruptDescriptorTable` type defined by the `x86_64` crate ensures that the correct function types are used.
- **Aligning the stack**: Some instructions (especially SSE instructions) require a 16-byte stack alignment. The CPU ensures this alignment whenever an exception occurs, but for some exceptions it destroys it again later when it pushes an error code. The `x86-interrupt` calling convention takes care of this by realigning the stack in this case.

If you are interested in more details, we also have a series of posts that explain exception handling using [naked functions] linked [at the end of this post][too-much-magic].

[naked functions]: https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md
[too-much-magic]: #muita-magica

## Implementation

Now that we've understood the theory, it's time to handle CPU exceptions in our kernel.
We'll start by creating a new interrupts module in `src/interrupts.rs`, which first creates a function `init_idt` that creates a new `InterruptDescriptorTable`:

```rust
// in src/lib.rs

pub mod interrupts;

// in src/interrupts.rs

use x86_64::structures::idt::InterruptDescriptorTable;

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
}
```

Now we can add handler functions. We start by adding a handler for the [breakpoint exception]. The breakpoint exception is the perfect exception to test exception handling. Its only purpose is to temporarily pause a program when the breakpoint instruction `int3` is executed.

[breakpoint exception]: https://wiki.osdev.org/Exceptions#Breakpoint

The breakpoint exception is commonly used in debuggers: When the user sets a breakpoint, the debugger overwrites the corresponding instruction with the `int3` instruction so that the CPU throws the breakpoint exception when it reaches that line. When the user wants to continue the program, the debugger replaces the `int3` instruction with the original instruction again and continues the program. For more details, see the ["_How debuggers work_"] series.

["_How debuggers work_"]: https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints

For our use case, we don't need to overwrite any instructions. Instead, we only want to print a message when the breakpoint instruction is executed and then continue the program.
So let's create a simple `breakpoint_handler` function and add it to our IDT:

```rust
// in src/interrupts.rs

use x86_64::structures::idt::{InterruptDescriptorTable, InterruptStackFrame};
use crate::println;

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
}

extern "x86-interrupt" fn breakpoint_handler(
    stack_frame: InterruptStackFrame)
{
    println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame);
}
```

Our handler just outputs a message and pretty-prints the interrupt stack frame.

When we try to compile it, the following error occurs:

```
error[E0658]: x86-interrupt ABI is experimental and subject to change (see issue #40180)
  --> src/main.rs:53:1
   |
53 | / extern "x86-interrupt" fn breakpoint_handler(stack_frame: InterruptStackFrame) {
54 | |     println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame);
55 | | }
   | |_^
   |
   = help: add #![feature(abi_x86_interrupt)] to the crate attributes to enable
```

This error occurs because the `x86-interrupt` calling convention is still unstable. To use it anyway, we have to explicitly enable it by adding `#![feature(abi_x86_interrupt)]` at the top of our `lib.rs`.

### Loading the IDT

In order for the CPU to use our new interrupt descriptor table, we need to load it using the [`lidt`] instruction. The `InterruptDescriptorTable` struct of the `x86_64` crate provides a [`load`][InterruptDescriptorTable::load] method for that.
Let's try to use it:

[`lidt`]: https://www.felixcloutier.com/x86/lgdt:lidt
[InterruptDescriptorTable::load]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html#method.load

```rust
// in src/interrupts.rs

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
    idt.load();
}
```

When we try to compile it now, the following error occurs:

```
error: `idt` does not live long enough
  --> src/interrupts/mod.rs:43:5
   |
43 |     idt.load();
   |     ^^^ does not live long enough
44 | }
   | - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...
```

So the `load` method expects a `&'static self`, that is, a reference valid for the complete runtime of the program. The reason is that the CPU will access this table on every interrupt until we load a different IDT. So using a shorter lifetime than `'static` could lead to use-after-free bugs.

In fact, this is exactly what happens here. Our `idt` is created on the stack, so it is only valid inside the `init_idt` function. Afterwards, the stack memory is reused for other functions, so the CPU would interpret random stack memory as IDT. Luckily, the `InterruptDescriptorTable::load` method encodes this lifetime requirement in its function definition, so that the Rust compiler is able to prevent this possible bug at compile time.

In order to fix this problem, we need to store our `idt` at a place where it has a `'static` lifetime. To achieve this, we could allocate our IDT on the heap using [`Box`] and then convert it to a `'static` reference, but we are writing an OS kernel and thus don't have a heap (yet).
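For comparison, on a target where a heap is available, the `Box` approach could look like the following sketch. It uses the standard library and a hypothetical `Table` type with a pretend `load` function as stand-ins for the IDT (all of these are assumptions made for illustration; our kernel has neither `std` nor an allocator at this point). `Box::leak` gives up ownership of the allocation and returns a reference that stays valid for the rest of the program.

```rust
/// Hypothetical stand-in for an interrupt descriptor table.
struct Table {
    entries: [u64; 4],
}

/// Pretend "load" function that, like `InterruptDescriptorTable::load`,
/// only accepts a reference with `'static` lifetime.
fn load(table: &'static Table) -> usize {
    table.entries.len()
}

/// Builds the table on the heap and turns it into a `'static` reference.
fn init_table() -> &'static Table {
    let mut table = Box::new(Table { entries: [0; 4] });
    table.entries[3] = 0x1234; // fill in an entry before "loading"
    // Leak the box: the allocation is never freed, so the reference is `'static`.
    Box::leak(table)
}

fn main() {
    let table = init_table();
    println!("loaded a table with {} entries", load(table));
}
```

The price of this approach is that the leaked memory is never reclaimed, which is acceptable for a table that must live until the system shuts down anyway.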

[`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html

As an alternative, we could try to store the IDT as a `static`:

```rust
static IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    IDT.breakpoint.set_handler_fn(breakpoint_handler);
    IDT.load();
}
```

However, there is a problem: Statics are immutable, so we can't modify the breakpoint entry from our `init_idt` function. We could solve this problem by using a [`static mut`]:

[`static mut`]: https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable

```rust
static mut IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    unsafe {
        IDT.breakpoint.set_handler_fn(breakpoint_handler);
        IDT.load();
    }
}
```

This variant compiles without errors, but it's far from idiomatic. `static mut`s are very prone to data races, so we need an [`unsafe` block] on each access.

[`unsafe` block]: https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#unsafe-superpowers

#### Lazy Statics to the Rescue

Fortunately, the `lazy_static` macro exists. Instead of evaluating a `static` at compile time, the macro performs the initialization when the `static` is referenced the first time. Thus, we can do almost everything in the initialization block and are even able to read runtime values.

We already imported the `lazy_static` crate when we [created an abstraction for the VGA text buffer][vga text buffer lazy static]. So we can directly use the `lazy_static!` macro to create our static IDT:

[vga text buffer lazy static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics

```rust
// in src/interrupts.rs

use lazy_static::lazy_static;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt
    };
}

pub fn init_idt() {
    IDT.load();
}
```

Note how this solution requires no `unsafe` blocks. The `lazy_static!` macro does use `unsafe` behind the scenes, but it is abstracted away in a safe interface.

### Running it

The last step for making exceptions work in our kernel is to call the `init_idt` function from our `main.rs`. Instead of calling it directly, we introduce a general `init` function in our `lib.rs`:

```rust
// in src/lib.rs

pub fn init() {
    interrupts::init_idt();
}
```

With this function, we now have a central place for initialization routines that can be shared between the different `_start` functions in our `main.rs`, `lib.rs`, and integration tests.

Now we can update the `_start` function of our `main.rs` to call `init` and then trigger a breakpoint exception:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init(); // new

    // invoke a breakpoint exception
    x86_64::instructions::interrupts::int3(); // new

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    loop {}
}
```

When we run it in QEMU now (using `cargo run`), we see the following:

![QEMU printing `EXCEPTION: BREAKPOINT` and the interrupt stack frame](qemu-breakpoint-exception.png)

It works! The CPU successfully invokes our breakpoint handler, which prints the message, and then returns back to the `_start` function, where the `It did not crash!` message is printed.

We see that the interrupt stack frame tells us the instruction and stack pointers at the time when the exception occurred. This information is very useful when debugging unexpected exceptions.

### Adding a Test

Let's create a test that ensures that the above continues to work.
First, we update the `_start` function to also call `init`:

```rust
// in src/lib.rs

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    init(); // new
    test_main();
    loop {}
}
```

Remember, this `_start` function is used when running `cargo test --lib`, since Rust tests the `lib.rs` completely independently of the `main.rs`. We need to call `init` here to set up an IDT before running the tests.

Now we can create a `test_breakpoint_exception` test:

```rust
// in src/interrupts.rs

#[test_case]
fn test_breakpoint_exception() {
    // invoke a breakpoint exception
    x86_64::instructions::interrupts::int3();
}
```

The test invokes the `int3` function to trigger a breakpoint exception. By checking that the execution continues afterward, we verify that our breakpoint handler is working correctly.

You can try this new test by running `cargo test` (all tests) or `cargo test --lib` (only tests of `lib.rs` and its modules). You should see the following in the output:

```
blog_os::interrupts::test_breakpoint_exception... [ok]
```

## Too much Magic?

The `x86-interrupt` calling convention and the [`InterruptDescriptorTable`] type made the exception handling process relatively straightforward and painless. If this was too much magic for you and you would like to learn all the gory details of exception handling, we've got you covered: Our [“Handling Exceptions with Naked Functions”]["Manipulando Exceções com Funções Nuas"] series shows how to handle exceptions without the `x86-interrupt` calling convention and also creates its own IDT type. Historically, these posts were the main exception handling posts before the `x86-interrupt` calling convention and the `x86_64` crate existed. Note that these posts are based on the [first edition][primeira edição] of this blog and might be out of date.

["Manipulando Exceções com Funções Nuas"]: @/edition-1/extra/naked-exceptions/_index.md
[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html
[primeira edição]: @/edition-1/_index.md

## What's next?

We've successfully caught our first exception and returned from it! The next step is to ensure that we catch all exceptions because an uncaught exception causes a fatal [triple fault], which leads to a system reset. The next post explains how we can avoid this by correctly catching [double faults].

[triple fault]: https://wiki.osdev.org/Triple_Fault
[double faults]: https://wiki.osdev.org/Double_Fault#Double_Fault

================================================
FILE: blog/content/edition-2/posts/05-cpu-exceptions/index.zh-CN.md
================================================
+++
title = "CPU Exceptions"
weight = 5
path = "zh-CN/cpu-exceptions"
date = 2018-06-17

[extra]
# Please update this when updating the translation
translation_based_on_commit = "096c044b4f3697e91d8e30a2e817e567d0ef21a2"
# GitHub usernames of the people that translated this post
translators = ["liuyuran"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = ["JiangengDong", "Byacrya"]
+++

CPU exceptions can occur in many situations, for example, when accessing an invalid memory address or when dividing by zero. To handle these errors, we need to set up an _interrupt descriptor table_ that provides exception handler functions. At the end of this post, our kernel will be able to catch [breakpoint exceptions] and resume normal execution after handling them.

[breakpoint exceptions]: https://wiki.osdev.org/Exceptions#Breakpoint

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-05`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-05

## Overview

An exception signals that the current instruction caused an error, for example, a division by 0. When an exception occurs, the CPU interrupts its current work and immediately calls the matching exception handler function, depending on the exception type.

On x86, there are about 20 different CPU exception types. The most important are:

- **Page Fault**: A page fault is triggered by an illegal memory access, for example, when the current instruction tries to access an unmapped page or tries to write to a read-only page.
- **Invalid Opcode**:
该错误是说当前指令操作符无效,比如在不支持SSE的旧式CPU上执行了 [SSE 指令][SSE instructions]。 - **General Protection Fault**: 该错误的原因有很多,主要原因就是权限异常,即试图使用用户态代码执行核心指令,或是修改配置寄存器的保留字段。 - **Double Fault**: 当错误发生时,CPU会尝试调用错误处理函数,但如果 _在调用错误处理函数过程中_ 再次发生错误,CPU就会触发该错误。另外,如果没有注册错误处理函数也会触发该错误。 - **Triple Fault**: 如果CPU调用了对应 `Double Fault` 异常的处理函数依然没有成功,该错误会被抛出。这是一个致命级别的 _三重异常_,这意味着我们已经无法捕捉它,对于大多数操作系统而言,此时就应该重置数据并重启操作系统。 [SSE instructions]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions 在 [OSDev wiki][exceptions] 可以看到完整的异常类型列表。 [exceptions]: https://wiki.osdev.org/Exceptions ### 中断描述符表 要捕捉CPU异常,我们需要设置一个 _中断描述符表_ (_Interrupt Descriptor Table_, IDT),用来捕获每一个异常。由于硬件层面会不加验证的直接使用,所以我们需要根据预定义格式直接写入数据。符表的每一行都遵循如下的16字节结构。 | Type | Name | Description | | ---- | ------------------------ | ------------------------------------------------------- | | u16 | Function Pointer [0:15] | 处理函数地址的低位(最后16位) | | u16 | GDT selector | [全局描述符表][global descriptor table]中的代码段标记。 | | u16 | Options | (如下所述) | | u16 | Function Pointer [16:31] | 处理函数地址的中位(中间16位) | | u32 | Function Pointer [32:63] | 处理函数地址的高位(剩下的所有位) | | u32 | Reserved | [global descriptor table]: https://en.wikipedia.org/wiki/Global_Descriptor_Table Options字段的格式如下: | Bits | Name | Description | | ----- | -------------------------------- | --------------------------------------------------------------- | | 0-2 | Interrupt Stack Table Index | 0: 不要切换栈, 1-7: 当处理函数被调用时,切换到中断栈表的第n层。 | | 3-7 | Reserved | | 8 | 0: Interrupt Gate, 1: Trap Gate | 如果该比特被置为0,当处理函数被调用时,中断会被禁用。 | | 9-11 | must be one | | 12 | must be zero | | 13‑14 | Descriptor Privilege Level (DPL) | 执行处理函数所需的最小特权等级。 | | 15 | Present | 每个异常都具有一个预定义的IDT序号,比如 invalid opcode 异常对应6号,而 page fault 异常对应14号,因此硬件可以直接寻找到对应的IDT条目。 OSDev wiki中的 [异常对照表][exceptions] 可以查到所有异常的IDT序号(在Vector nr.列)。 通常而言,当异常发生时,CPU会执行如下步骤: 1. 将一些寄存器数据入栈,包括指令指针以及 [RFLAGS] 寄存器。(我们会在文章稍后些的地方用到这些数据。) 2. 读取中断描述符表(IDT)的对应条目,比如当发生 page fault 异常时,调用14号条目。 3. 判断该条目确实存在,如果不存在,则触发 double fault 异常。 4. 如果该条目属于中断门(interrupt gate,bit 40 被设置为0),则禁用硬件中断。 5. 
将 [GDT] 选择器载入代码段寄存器(CS segment)。 6. 跳转执行处理函数。 [RFLAGS]: https://en.wikipedia.org/wiki/FLAGS_register [GDT]: https://en.wikipedia.org/wiki/Global_Descriptor_Table 不过现在我们不必为4和5多加纠结,未来我们会单独讲解全局描述符表和硬件中断的。 ## IDT类型 与其创建我们自己的IDT类型映射,不如直接使用 `x86_64` crate 内置的 [`InterruptDescriptorTable` 结构][`InterruptDescriptorTable` struct],其实现是这样的: [`InterruptDescriptorTable` struct]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html ``` rust #[repr(C)] pub struct InterruptDescriptorTable { pub divide_by_zero: Entry, pub debug: Entry, pub non_maskable_interrupt: Entry, pub breakpoint: Entry, pub overflow: Entry, pub bound_range_exceeded: Entry, pub invalid_opcode: Entry, pub device_not_available: Entry, pub double_fault: Entry, pub invalid_tss: Entry, pub segment_not_present: Entry, pub stack_segment_fault: Entry, pub general_protection_fault: Entry, pub page_fault: Entry, pub x87_floating_point: Entry, pub alignment_check: Entry, pub machine_check: Entry, pub simd_floating_point: Entry, pub virtualization: Entry, pub security_exception: Entry, // some fields omitted } ``` 每一个字段都是 [`idt::Entry`] 类型,这个类型包含了一条完整的IDT条目(定义参见上文)。 其泛型参数 `F` 定义了中断处理函数的类型,在有些字段中该参数为 [`HandlerFunc`],而有些则是 [`HandlerFuncWithErrCode`],而对于 page fault 这种特殊异常,则为 [`PageFaultHandlerFunc`]。 [`idt::Entry`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.Entry.html [`HandlerFunc`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFunc.html [`HandlerFuncWithErrCode`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.HandlerFuncWithErrCode.html [`PageFaultHandlerFunc`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/type.PageFaultHandlerFunc.html 首先让我们看一看 `HandlerFunc` 类型的定义: ```rust type HandlerFunc = extern "x86-interrupt" fn(_: InterruptStackFrame); ``` 这是一个针对 `extern "x86-interrupt" fn` 类型的 [类型别名][type alias]。`extern` 关键字使用 [外部调用约定][foreign calling convention] 定义了一个函数,这种定义方式多用于和C语言代码通信(`extern "C" fn`),那么这里的外部调用约定又究竟调用了哪些东西? 
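Before moving on, the bit layout of the options field in the IDT entry format can be made concrete with a short sketch. The helper `idt_entry_options` below is hypothetical and written only for illustration; it is not an API of the `x86_64` crate, which sets these bits for us.

```rust
/// Hypothetical helper (not part of the `x86_64` crate): packs the 16-bit
/// options field of an IDT entry according to the bit table in this post.
fn idt_entry_options(ist_index: u16, trap_gate: bool, dpl: u16, present: bool) -> u16 {
    assert!(ist_index <= 7, "the IST index occupies bits 0-2");
    assert!(dpl <= 3, "the DPL occupies bits 13-14");

    let mut options: u16 = 0b111 << 9; // bits 9-11 must be one, bit 12 stays zero
    options |= ist_index; // bits 0-2: Interrupt Stack Table index
    options |= (trap_gate as u16) << 8; // bit 8: 0 = interrupt gate, 1 = trap gate
    options |= dpl << 13; // bits 13-14: descriptor privilege level
    options |= (present as u16) << 15; // bit 15: present
    options
}

fn main() {
    // A present interrupt gate for ring 0 that does not switch stacks.
    println!("{:#06x}", idt_entry_options(0, false, 0, true)); // prints 0x8e00
}
```

The value `0x8e00` is the familiar "present, ring 0, 64-bit interrupt gate" setting; flipping bit 8 would turn it into a trap gate that leaves interrupts enabled.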
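[type alias]: https://doc.rust-lang.org/book/ch20-03-advanced-types.html#creating-type-synonyms-with-type-aliases
[foreign calling convention]: https://doc.rust-lang.org/nomicon/ffi.html#foreign-calling-conventions

## The Interrupt Calling Convention

Exceptions are quite similar to function calls: The CPU jumps to the first instruction of the handler function and executes it. Afterward, the CPU jumps to the return address and continues the execution of the interrupted function.

However, there is a major difference between the two: A function call is invoked voluntarily by a compiler-inserted `call` instruction, while an exception might occur at _any_ instruction. To understand the consequences of this difference, we need to examine function calls in more detail.

[Calling conventions] specify the details of a function call. For example, they specify where function parameters are placed (in registers, on the stack, or elsewhere) and how results are returned. On x86_64 Linux, the following rules apply for C functions (specified in the [System V ABI]):

[Calling conventions]: https://en.wikipedia.org/wiki/Calling_convention
[System V ABI]: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

- the first six integer arguments are passed in the registers `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`
- additional arguments are passed on the stack
- results are returned in `rax` and `rdx`

Note that Rust does not follow the C ABI (in fact, [there isn't even a stable Rust ABI yet][rust abi]), so these rules apply only to functions declared as `extern "C" fn`.

[rust abi]: https://github.com/rust-lang/rfcs/issues/600

### Preserved and Scratch Registers

The calling convention divides the registers into two parts: _preserved_ and _scratch_ registers.

The values of _preserved_ registers must remain unchanged across function calls. So a called function (the _"callee"_) is only allowed to overwrite these registers if it restores their original values before returning. A common pattern is to save these registers to the stack at the function's beginning and restore them just before returning. These registers are therefore called _callee-saved_.

In contrast, a called function is allowed to overwrite _scratch_ registers without restrictions. If the caller wants to preserve the value of a scratch register across a function call, it needs to back it up and restore it itself (e.g., by pushing it to the stack). So the scratch registers are _caller-saved_.

On x86_64, the C calling convention specifies the following preserved and scratch registers:

| preserved registers                             | scratch registers                                           |
| ----------------------------------------------- | ----------------------------------------------------------- |
| `rbp`, `rbx`, `rsp`, `r12`, `r13`, `r14`, `r15` | `rax`, `rcx`, `rdx`, `rsi`, `rdi`, `r8`, `r9`, `r10`, `r11` |
| _callee-saved_                                  | _caller-saved_                                              |

The compiler knows these rules, so it generates the code accordingly. For example, most functions begin with a `push rbp`, which backs up `rbp` on the stack (because it's a callee-saved register).

### Preserving all Registers

In contrast to function calls, exceptions can occur on _any_ instruction. In most cases, we don't even know at compile time whether the generated code will cause an exception. For example, the compiler can't know whether an instruction causes a page fault or a stack overflow.

Since we don't know when an exception occurs, we can't back up any registers before it happens. This means we can't use a calling convention that relies on caller-saved registers for exception handlers. Instead, we need a calling convention that preserves _all registers_. The `x86-interrupt` calling convention is such a calling convention, so it guarantees that all register values are restored to their original values on function return.

Note that this does not mean all registers are saved to the stack at function entry. Instead, the compiler only backs up the registers that are overwritten by the function. This way, very efficient code can be generated for short functions that only use a few registers.

### The Interrupt Stack Frame

On a normal function call (using the `call` instruction), the CPU pushes the return address before jumping to the target function. On function return (using the `ret` instruction), the CPU pops this return address and jumps to it. So the stack frame of a normal function call looks like this:

![function stack frame](function-stack-frame.svg)

For exception and interrupt handlers, however, pushing a return address would not suffice, since interrupt handlers often run in a different context (stack pointer, CPU flags, etc.). Instead, the CPU performs the following steps when an interrupt occurs:

1. **Aligning the stack pointer**: An interrupt can occur at any instruction, so the stack pointer can have any value, too. However, some CPU instructions (e.g., some SSE instructions) require that the stack pointer be aligned on a 16-byte boundary, so the CPU performs such an alignment right after the interrupt.
2. **Switching stacks** (in some cases): A stack switch occurs when the CPU privilege level changes, for example, when a CPU exception occurs in a user-mode program. It is also possible to configure stack switches for specific interrupts using the so-called _Interrupt Stack Table_ (described in the next post).
3. **Pushing the old stack pointer**: The CPU pushes the values of the stack pointer (`rsp`) and the stack segment (`ss`) registers as they were when the interrupt occurred (before the alignment). This makes it possible to restore the original stack pointer when returning from an interrupt handler.
4. **Pushing and updating the `RFLAGS` register**: The [`RFLAGS`] register contains various control and status bits. On interrupt entry, the CPU changes some bits and pushes the old value.
5. **Pushing the instruction pointer**: Before jumping to the interrupt handler function, the CPU pushes the instruction pointer (`rip`) and the code segment (`cs`). This is comparable to the return address push of a normal function call.
6. **Pushing an error code** (for some exceptions): For some specific exceptions, such as page faults, the CPU pushes an error code that describes the cause of the exception.
7. **Invoking the interrupt handler**: The CPU reads the address and the segment descriptor of the interrupt handler function from the corresponding entry in the IDT. It then invokes this handler by loading the values into the `rip` and `cs` registers.

[`RFLAGS`]: https://en.wikipedia.org/wiki/FLAGS_register

So the _interrupt stack frame_ looks like this:

![interrupt stack frame](exception-stack-frame.svg)

In the `x86_64` crate, the interrupt stack frame is represented by the [`InterruptStackFrame`] struct. It is passed to interrupt handlers as their first argument (by value in the `HandlerFunc` signature shown earlier) and can be used to retrieve additional information about the exception's cause. The struct contains no error code field, since only a few exceptions push an error code. These exceptions use the separate [`HandlerFuncWithErrCode`] function type, which has an additional `error_code` argument.

[`InterruptStackFrame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptStackFrame.html

### Behind the Scenes

The `x86-interrupt` calling convention is a powerful abstraction that hides almost all of the messy details of the exception handling process. However, sometimes it's useful to know what's happening behind the curtain. Here is a short overview of the things that the `x86-interrupt` calling convention takes care of:

- **Retrieving the arguments**: Most calling conventions expect that the arguments are passed in registers. This is not possible for exception handlers, since we must not overwrite any register values before backing them up on the stack. Instead, the `x86-interrupt` calling convention is aware that the arguments already lie on the stack at a specific offset.
- **Returning using `iretq`**: Since the interrupt stack frame completely differs from the stack frame of a normal function call, we can't return from handler functions through the normal `ret` instruction. Instead, the `iretq` instruction must be used.
- **Handling the error code**: The error code, which is pushed for some exceptions, makes things much more complex. It changes the stack alignment (see the next point) and needs to be popped off the stack before returning. The `x86-interrupt` calling convention handles all that complexity. However, it doesn't know which handler function is used for which exception, so it needs to deduce that information from the number of function arguments. That means the programmer is still responsible for using the correct function type for each exception. Luckily, the `InterruptDescriptorTable` type defined by the `x86_64` crate ensures that the correct function types are used.
- **Aligning the stack**: Some instructions (especially SSE instructions) require a 16-byte stack alignment. The CPU ensures this alignment whenever an exception occurs, but for some exceptions it destroys it again later when it pushes an error code. The `x86-interrupt` calling convention takes care of this by realigning the stack in this case.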
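The alignment point in the list above can be checked with a little arithmetic. The following is a minimal sketch that runs as an ordinary hosted program; the struct `RawInterruptFrame` and the helper `rsp_on_entry` are names invented for this illustration and are not the `x86_64` crate's definitions. It assumes the CPU aligned the stack pointer to 16 bytes before pushing the five 8-byte values described above.

```rust
use std::mem::size_of;

/// Illustrative mirror of the values the CPU pushes, lowest address first.
/// This is an assumption-labelled sketch, not the `x86_64` crate's struct.
#[allow(dead_code)]
#[repr(C)]
struct RawInterruptFrame {
    instruction_pointer: u64,
    code_segment: u64,
    cpu_flags: u64,
    stack_pointer: u64,
    stack_segment: u64,
}

/// Stack pointer value at handler entry, given the 16-byte aligned address
/// from which the CPU starts pushing.
fn rsp_on_entry(aligned_rsp: u64, has_error_code: bool) -> u64 {
    let error_code_bytes = if has_error_code { 8 } else { 0 };
    aligned_rsp - size_of::<RawInterruptFrame>() as u64 - error_code_bytes
}

fn main() {
    let aligned_rsp = 0x1000;
    println!("{}", size_of::<RawInterruptFrame>()); // prints 40
    println!("{}", rsp_on_entry(aligned_rsp, false) % 16); // prints 8
    println!("{}", rsp_on_entry(aligned_rsp, true) % 16); // prints 0
}
```

Without an error code, the stack pointer ends up 8 bytes past a 16-byte boundary, which is the same state a normal `call` leaves behind. The extra 8-byte error code shifts it onto the boundary, which is why the calling convention has to realign the stack for those exceptions.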
If you are interested in more details, we also have a series of posts that explains exception handling using [naked functions], linked [at the end of this post][too-much-magic].

[naked functions]: https://github.com/rust-lang/rfcs/blob/master/text/1201-naked-fns.md
[too-much-magic]: #too-much-magic

## Implementation

Now that we've covered the theory, it's time to handle CPU exceptions in our kernel. We start by creating a new interrupts module in `src/interrupts.rs` with an `init_idt` function that creates a new `InterruptDescriptorTable`:

``` rust
// in src/lib.rs

pub mod interrupts;

// in src/interrupts.rs

use x86_64::structures::idt::InterruptDescriptorTable;

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
}
```

Now we can add handler functions. We start by adding a handler for the [breakpoint exception]. The breakpoint exception is the perfect exception to test exception handling, because its only purpose is to temporarily pause a program when the breakpoint instruction `int3` is executed.

[breakpoint exception]: https://wiki.osdev.org/Exceptions#Breakpoint

The breakpoint exception is commonly used in debuggers: When the user sets a breakpoint, the debugger overwrites the corresponding instruction with the `int3` instruction, so that the CPU throws the breakpoint exception when it reaches that line. When the user wants to continue the program, the debugger replaces the `int3` instruction with the original instruction again and continues the program. For more details, see the ["_How debuggers work_"] series.

["_How debuggers work_"]: https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints

For our use case, we don't need to overwrite any instructions. We just want to print a message when the breakpoint instruction is executed and then continue the program. So let's create a simple `breakpoint_handler` function and add it to our IDT:

```rust
// in src/interrupts.rs

use x86_64::structures::idt::{InterruptDescriptorTable, InterruptStackFrame};
use crate::println;

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
}

extern "x86-interrupt" fn breakpoint_handler(
    stack_frame: InterruptStackFrame)
{
    println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame);
}
```

Our handler just outputs a message and pretty-prints the interrupt stack frame.

When we try to compile it, the following error occurs:

```
error[E0658]: x86-interrupt ABI is experimental and subject to change (see issue #40180)
  --> src/main.rs:53:1
   |
53 | / extern "x86-interrupt" fn breakpoint_handler(stack_frame: InterruptStackFrame) {
54 | |     println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame);
55 | | }
   | |_^
   |
   = help: add #![feature(abi_x86_interrupt)] to the crate attributes to enable
```

This error occurs because the `x86-interrupt` calling convention is still unstable. To use it anyway, we have to enable it explicitly by adding `#![feature(abi_x86_interrupt)]` at the top of our `lib.rs`.

### Loading the IDT

In order for the CPU to use our new interrupt descriptor table, we need to load it using the [`lidt`] instruction. The `InterruptDescriptorTable` struct of the `x86_64` crate provides a [`load`][InterruptDescriptorTable::load] method for that. Let's try to use it:

[`lidt`]: https://www.felixcloutier.com/x86/lgdt:lidt
[InterruptDescriptorTable::load]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html#method.load

```rust
// in src/interrupts.rs

pub fn init_idt() {
    let mut idt = InterruptDescriptorTable::new();
    idt.breakpoint.set_handler_fn(breakpoint_handler);
    idt.load();
}
```

When we try to compile it now, the following error occurs:

```
error: `idt` does not live long enough
  --> src/interrupts/mod.rs:43:5
   |
43 |     idt.load();
   |     ^^^ does not live long enough
44 | }
   | - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...
```

So the `load` method expects a `&'static self`, that is, a reference valid for the complete runtime of the program. The reason is that the CPU will access this table on every interrupt until we load a different IDT. Using a lifetime shorter than `'static` could therefore lead to use-after-free bugs.

In fact, this is exactly what happens here. Our `idt` is created on the stack, so it is only valid inside the `init_idt` function. Afterward, the stack memory is reused for other functions, so the CPU would interpret random stack memory as the IDT. Luckily, the `InterruptDescriptorTable::load` method encodes this lifetime requirement in its function definition, so that the Rust compiler is able to prevent this possible bug at compile time.

In order to fix this problem, we need to store our `idt` at a place where it has a `'static` lifetime. To achieve this, we could allocate our IDT on the heap using [`Box`] and then convert it to a `'static` reference, but we are writing an OS kernel and thus don't have a heap (yet).

[`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html

As an alternative, we could try to store the IDT as a `static`:

```rust
static IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    IDT.breakpoint.set_handler_fn(breakpoint_handler);
    IDT.load();
}
```

However, there is a problem: Statics are immutable, so we can't modify the breakpoint entry from our `init_idt` function. We could solve this problem by using a [`static mut`]:

[`static mut`]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#accessing-or-modifying-a-mutable-static-variable

```rust
static mut IDT: InterruptDescriptorTable = InterruptDescriptorTable::new();

pub fn init_idt() {
    unsafe {
        IDT.breakpoint.set_handler_fn(breakpoint_handler);
        IDT.load();
    }
}
```

This variant compiles without errors, but it is far from idiomatic. `static mut`s are very prone to data races, so we need an [`unsafe` block] on each access.

[`unsafe` block]: https://doc.rust-lang.org/1.30.0/book/second-edition/ch19-01-unsafe-rust.html#unsafe-superpowers

#### Lazy Statics to the Rescue

Fortunately, the `lazy_static` macro exists. Instead of evaluating a `static` at compile time, the macro performs the initialization when the `static` is referenced for the first time. Thus, we can do almost everything in the initialization block and are even able to read runtime values.

We already imported the `lazy_static` crate when we [created an abstraction for the VGA text buffer][vga text buffer lazy static]. So we can directly use the `lazy_static!` macro to create our static IDT:

[vga text buffer lazy static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics

```rust
// in src/interrupts.rs

use lazy_static::lazy_static;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt
    };
}

pub fn init_idt() {
    IDT.load();
}
```

Note how this solution requires no `unsafe` blocks. The `lazy_static!` macro does use `unsafe` behind the scenes, but it is abstracted away in a safe interface.

### Running it

The last step for making exceptions work in our kernel is to call the `init_idt` function from our `main.rs`. Instead of calling it directly, we introduce a general `init` function in our `lib.rs`:

```rust
// in src/lib.rs

pub fn init() {
    interrupts::init_idt();
}
```

With this function, we now have a central place for initialization routines that can be shared between the different `_start` functions in our `main.rs`, `lib.rs`, and integration tests.

Now we can update the `_start` function of our `main.rs` to call `init` and then trigger a breakpoint exception:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init(); // new

    // invoke a breakpoint exception
    x86_64::instructions::interrupts::int3(); // new

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    loop {}
}
```

When we run it in QEMU now (using `cargo run`), we see the following:

![QEMU printing `EXCEPTION: BREAKPOINT` and the interrupt stack frame](qemu-breakpoint-exception.png)

It works! The CPU successfully invokes our breakpoint handler, which prints the message, and then returns to the `_start` function, where the `It did not crash!` message is printed.

We see that the interrupt stack frame tells us the instruction and stack pointers at the time when the exception occurred. This information is very useful when debugging unexpected exceptions.

### Adding a Test

Let's create a test that ensures that the above continues to work. First, we update the `_start` function to also call `init`:

```rust
// in src/lib.rs

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    init();      // new
    test_main();
    loop {}
}
```

Remember, this `_start` function is used when running `cargo test --lib`, since Rust tests the `lib.rs` completely independently of the `main.rs`. We need to call `init` here to set up an IDT before running the tests.

Now we can create a `test_breakpoint_exception` test:

```rust
// in src/interrupts.rs

#[test_case]
fn test_breakpoint_exception() {
    // invoke a breakpoint exception
    x86_64::instructions::interrupts::int3();
}
```

The test invokes the `int3` function to trigger a breakpoint exception. By checking that the execution continues afterward, we verify that our breakpoint handler is working correctly.

You can try this new test by running `cargo test` (all tests) or `cargo test --lib` (only the tests of `lib.rs` and its modules). You should see the following in the output:

```
blog_os::interrupts::test_breakpoint_exception...	[ok]
```

## Too Much Magic?

The `x86-interrupt` calling convention and the [`InterruptDescriptorTable`] type made the exception handling process relatively straightforward. If this was too much magic for you and you would like to learn all the hidden details of exception handling, we recommend the [“Handling Exceptions with Naked Functions”] series, which shows how to handle exceptions without the `x86-interrupt` calling convention and also creates its own IDT type. Note that these posts predate the `x86-interrupt` calling convention and the `x86_64` crate. They belong to the [first edition] of this blog and might be out of date.

[“Handling Exceptions with Naked Functions”]: @/edition-1/extra/naked-exceptions/_index.md
[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html
[first edition]: @/edition-1/_index.md

## What's next?

We've successfully caught our first exception and returned from it! The next step is to ensure that we catch all exceptions, because an uncaught exception causes a fatal [triple fault], which leads to a system reset. The next post explains how we can avoid this by correctly catching [double faults].

[triple fault]: https://wiki.osdev.org/Triple_Fault
[double faults]: https://wiki.osdev.org/Double_Fault#Double_Fault

================================================
FILE: blog/content/edition-2/posts/06-double-faults/index.es.md
================================================
+++
title = "Double Faults"
weight = 6
path = "es/double-fault-exceptions"
date = 2018-06-18

[extra]
chapter = "Interrupts"
# GitHub usernames of the people that translated this post
translators = ["dobleuber"]
+++

This post explores the double fault exception in detail, which occurs when the CPU fails to invoke an exception handler.
By handling this exception, we avoid fatal _triple faults_ that cause a system reset. To prevent triple faults in all cases, we also set up an _Interrupt Stack Table_ (IST) to catch double faults on a separate kernel stack.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-06`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-06

## What is a Double Fault?

In simplified terms, a double fault is a special exception that occurs when the CPU fails to invoke an exception handler. For example, it occurs when a page fault is triggered but there is no page fault handler registered in the [Interrupt Descriptor Table][IDT] (IDT). So it's somewhat similar to catch-all blocks in programming languages with exceptions, e.g., `catch(...)` in C++ or `catch(Exception e)` in Java or C#.

[IDT]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-descriptor-table

A double fault behaves like a normal exception. It has the vector number `8` and we can define a normal handler function for it in the IDT. It is really important to provide a double fault handler, because if a double fault is unhandled, a fatal _triple fault_ occurs. Triple faults can't be caught, and most hardware reacts with a system reset.

### Triggering a Double Fault

Let's provoke a double fault by triggering an exception for which we didn't define a handler function:

```rust
// in src/main.rs

#[no_mangle]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    // trigger a page fault
    unsafe {
        *(0xdeadbeef as *mut u8) = 42;
    };

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    loop {}
}
```

We use `unsafe` to write to the invalid address `0xdeadbeef`. The virtual address is not mapped to a physical address in the page tables, so a page fault occurs. We haven't registered a page fault handler in our [IDT], so a double fault occurs.

When we start our kernel now, we see that it enters an endless boot loop. The reason for the boot loop is the following:

1. The CPU tries to write to `0xdeadbeef`, which causes a page fault.
2. The CPU looks at the corresponding entry in the IDT and sees that no handler function is specified. Thus, it can't call the page fault handler and a double fault occurs.
3. The CPU looks at the IDT entry of the double fault handler, but this entry does not specify a handler function either. Thus, a _triple fault_ occurs.
4. A triple fault is fatal. QEMU reacts to it like most real hardware and issues a system reset.

So in order to prevent this triple fault, we need to either provide a handler function for page faults or a double fault handler. We want to avoid triple faults in all cases, so let's start with a double fault handler that is invoked for all unhandled exception types.

## A Double Fault Handler

A double fault is a normal exception with an error code, so we can specify a handler function similar to our breakpoint handler:

```rust
// in src/interrupts.rs

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt.double_fault.set_handler_fn(double_fault_handler); // new
        idt
    };
}

// new
extern "x86-interrupt" fn double_fault_handler(
    stack_frame: InterruptStackFrame, _error_code: u64) -> !
{
    panic!("EXCEPTION: DOUBLE FAULT\n{:#?}", stack_frame);
}
```

Our handler prints a short error message and dumps the exception stack frame. The error code of the double fault handler is always zero, so there's no reason to print it. One difference to the breakpoint handler is that the double fault handler is [_diverging_]. The reason is that the `x86_64` architecture does not permit returning from a double fault exception.

[_diverging_]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html

When we start our kernel now, we should see that the double fault handler is invoked:

![QEMU printing `EXCEPTION: DOUBLE FAULT` and the exception stack frame](qemu-catch-double-fault.png)

It worked! Here is what happened this time:

1. The CPU tries to write to `0xdeadbeef`, which causes a page fault.
2. Like before, the CPU looks at the corresponding entry in the IDT and sees that no handler function is defined. Thus, a double fault occurs.
3. The CPU jumps to the (now present) double fault handler.

The triple fault (and the boot loop) no longer occurs, since the CPU can now call the double fault handler.

That was quite straightforward! So why do we need a whole post for this topic? Well, we're now able to catch _most_ double faults, but there are some cases where our current approach doesn't suffice.

## Causes of Double Faults

Before we look at the special cases, we need to know the exact causes of double faults.
Above, we used a pretty vague definition:

> A double fault is a special exception that occurs when the CPU fails to invoke an exception handler.

What does _“fails to invoke”_ mean exactly? The handler is not present? The handler is [swapped out]? And what happens if a handler causes exceptions itself?

[swapped out]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-beyondphys.pdf

For example, what happens if:

1. a breakpoint exception occurs, but the corresponding handler function is swapped out?
2. a page fault occurs, but the page fault handler is swapped out?
3. a divide-by-zero handler causes a breakpoint exception, but the breakpoint handler is swapped out?
4. our kernel overflows its stack and the _guard page_ is hit?

Fortunately, the AMD64 manual ([PDF][AMD64 manual]) has an exact definition (in Section 8.2.9). According to it, a “double fault exception _can_ occur when a second exception occurs during the handling of a prior (first) exception handler”. The _“can”_ is important: Only very specific combinations of exceptions lead to a double fault. These combinations are:

| First Exception | Second Exception |
| --------------- | ---------------- |
| [Divide-by-zero], [Invalid TSS], [Segment Not Present], [Stack-Segment Fault], [General Protection Fault] | [Invalid TSS], [Segment Not Present], [Stack-Segment Fault], [General Protection Fault] |
| [Page Fault] | [Page Fault], [Invalid TSS], [Segment Not Present], [Stack-Segment Fault], [General Protection Fault] |

[Divide-by-zero]: https://wiki.osdev.org/Exceptions#Division_Error
[Invalid TSS]: https://wiki.osdev.org/Exceptions#Invalid_TSS
[Segment Not Present]: https://wiki.osdev.org/Exceptions#Segment_Not_Present
[Stack-Segment Fault]: https://wiki.osdev.org/Exceptions#Stack-Segment_Fault
[General Protection Fault]: https://wiki.osdev.org/Exceptions#General_Protection_Fault
[Page Fault]: https://wiki.osdev.org/Exceptions#Page_Fault

[AMD64 manual]: https://www.amd.com/system/files/TechDocs/24593.pdf

So, for example, a divide-by-zero fault followed by a page fault is fine (the page fault handler is invoked), but a divide-by-zero fault followed by a general protection fault leads to a double fault.

With the help of this table, we can answer the first three of the above questions:

1. If a breakpoint exception occurs and the corresponding handler function is swapped out, a _page fault_ occurs and the _page fault handler_ is invoked.
2. If a page fault occurs and the page fault handler is swapped out, a _double fault_ occurs and the _double fault handler_ is invoked.
3. If a divide-by-zero handler causes a breakpoint exception, the CPU tries to invoke the breakpoint handler. If the breakpoint handler is swapped out, a _page fault_ occurs and the _page fault handler_ is invoked.

In fact, even the case of an exception without a handler function in the IDT follows this scheme: When the exception occurs, the CPU tries to read the corresponding IDT entry. Since the entry is 0, which is not a valid IDT entry, a _general protection fault_ occurs. We did not define a handler function for the general protection fault either, so another general protection fault occurs. According to the table, this leads to a double fault.
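The combination table above can also be read as a small lookup function. The following sketch is only an illustration of the table: the `Exception` enum and the function names are made up for this example, it models just the exceptions named in the table plus a breakpoint, and it does not describe how the CPU implements the check.

```rust
/// Illustrative subset of CPU exceptions (names are this sketch's own).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Exception {
    DivideByZero,
    InvalidTss,
    SegmentNotPresent,
    StackSegmentFault,
    GeneralProtectionFault,
    PageFault,
    Breakpoint,
}

/// Exceptions that appear in both rows of the "second exception" column.
fn in_second_column(e: Exception) -> bool {
    use Exception::*;
    matches!(
        e,
        InvalidTss | SegmentNotPresent | StackSegmentFault | GeneralProtectionFault
    )
}

/// Returns true if `second`, raised while invoking the handler for `first`,
/// escalates to a double fault according to the table.
fn leads_to_double_fault(first: Exception, second: Exception) -> bool {
    use Exception::*;
    match first {
        DivideByZero | InvalidTss | SegmentNotPresent | StackSegmentFault
        | GeneralProtectionFault => in_second_column(second),
        PageFault => second == PageFault || in_second_column(second),
        Breakpoint => false,
    }
}

fn main() {
    use Exception::*;
    // Divide-by-zero followed by a page fault: the page fault handler runs.
    println!("{}", leads_to_double_fault(DivideByZero, PageFault)); // prints false
    // Page fault whose handler is swapped out: a second page fault, so a double fault.
    println!("{}", leads_to_double_fault(PageFault, PageFault)); // prints true
    // Missing IDT entry: general protection fault twice in a row.
    println!("{}", leads_to_double_fault(GeneralProtectionFault, GeneralProtectionFault)); // prints true
}
```

The three calls in `main` correspond to the cases discussed in the text: the first is harmless, while the other two match a row of the table and therefore end in a double fault.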
### Kernel Stack Overflow

Let's look at the fourth question:

> What happens if our kernel overflows its stack and the guard page is hit?

A guard page is a special memory page at the bottom of a stack that makes it possible to detect stack overflows. The page is not mapped to any physical frame, so accessing it causes a page fault instead of silently corrupting other memory. The bootloader sets up a guard page for our kernel stack, so a stack overflow causes a _page fault_.

When a page fault occurs, the CPU looks up the page fault handler in the IDT and tries to push the [interrupt stack frame] onto the stack. However, the current stack pointer still points to the non-present guard page. Thus, a second page fault occurs, which causes a double fault (according to the above table).

[interrupt stack frame]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-stack-frame

So the CPU tries to call the _double fault handler_ now. However, on a double fault, the CPU tries to push the exception stack frame, too. The stack pointer still points to the guard page, so a _third_ page fault occurs, which causes a _triple fault_ and a system reboot. So our current double fault handler can't avoid a triple fault in this case.

Let's try it ourselves! We can easily provoke a kernel stack overflow by calling a function that recurses endlessly:

```rust
// in src/main.rs

#[no_mangle] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    fn stack_overflow() {
        stack_overflow(); // for each recursion, the return address is pushed
    }

    // trigger a stack overflow
    stack_overflow();

    […] // test_main(), println(…), and loop {}
}
```

When we try this code in QEMU, we see that the system enters a boot loop again.

So how can we avoid this problem? We can't omit the pushing of the exception stack frame, since the CPU itself does it. So we need to ensure somehow that the stack is always valid when a double fault exception occurs. Fortunately, the `x86_64` architecture has a solution to this problem.

## Switching Stacks

The `x86_64` architecture is able to switch to a predefined, known-good stack when an exception occurs. This switch happens at the hardware level, so it can be performed before the CPU pushes the exception stack frame.

The switching mechanism is implemented as an _Interrupt Stack Table_ (IST). The IST is a table of 7 pointers to known-good stacks. In Rust-like pseudocode:

```rust
struct InterruptStackTable {
    stack_pointers: [Option<StackPointer>; 7],
}
```

For each exception handler, we can choose a stack from the IST through the `stack_pointers` field in the corresponding [IDT entry]. For example, our double fault handler could use the first stack in the IST. Then the CPU automatically switches to this stack whenever a double fault occurs. This switch would happen before anything is pushed, preventing the triple fault.

[IDT entry]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-descriptor-table

### The IST and TSS

The Interrupt Stack Table (IST) is part of an old legacy structure called the _[Task State Segment]_ (TSS). The TSS used to hold various pieces of information (e.g., processor register state) about a task in 32-bit mode and was, for example, used for [hardware context switching]. However, hardware context switching is no longer supported in 64-bit mode and the format of the TSS has changed completely.

[Task State Segment]: https://en.wikipedia.org/wiki/Task_state_segment
[hardware context switching]: https://wiki.osdev.org/Context_Switching#Hardware_Context_Switching

On `x86_64`, the TSS no longer holds any task-specific information at all. Instead, it holds two stack tables (the IST is one of them). The only common field between the 32-bit and 64-bit TSS is the pointer to the [I/O port permissions bitmap].

[I/O port permissions bitmap]: https://en.wikipedia.org/wiki/Task_state_segment#I.2FO_port_permissions

The 64-bit TSS has the following format:

| Field                 | Type       |
| --------------------- | ---------- |
| (reserved)            | `u32`      |
| Privilege Stack Table | `[u64; 3]` |
| (reserved)            | `u64`      |
| Interrupt Stack Table | `[u64; 7]` |
| (reserved)            | `u64`      |
| (reserved)            | `u16`      |
| I/O Map Base Address  | `u16`      |

The _Privilege Stack Table_ is used by the CPU when the privilege level changes. For example, if an exception occurs while the CPU is in user mode (privilege level 3), the CPU normally switches to kernel mode (privilege level 0) before invoking the exception handler. In that case, the CPU would switch to the 0th stack in the Privilege Stack Table (since 0 is the target privilege level). We don't have any user-mode programs yet, so we will ignore this table for now.

### Creating a TSS

Let's create a new TSS that contains a separate double fault stack in its interrupt stack table. For that, we need a TSS struct.
Fortunately, the `x86_64` crate already contains a [`TaskStateSegment` struct][`struct TaskStateSegment`] that we can use.

[`struct TaskStateSegment`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/tss/struct.TaskStateSegment.html

We create the TSS in a new `gdt` module (the name will make sense later):

```rust
// in src/lib.rs

pub mod gdt;

// in src/gdt.rs

use x86_64::VirtAddr;
use x86_64::structures::tss::TaskStateSegment;
use lazy_static::lazy_static;

pub const DOUBLE_FAULT_IST_INDEX: u16 = 0;

lazy_static! {
    static ref TSS: TaskStateSegment = {
        let mut tss = TaskStateSegment::new();
        tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize] = {
            const STACK_SIZE: usize = 4096 * 5;
            static mut STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];

            let stack_start = VirtAddr::from_ptr(unsafe { &STACK });
            let stack_end = stack_start + STACK_SIZE;
            stack_end
        };
        tss
    };
}
```

We use `lazy_static` because Rust's const evaluator is not yet powerful enough to do this initialization at compile time. We define that the 0th IST entry is the double fault stack (any other IST index would work too). Then we write the top address of a double fault stack to the 0th entry. We write the top address because stacks on `x86` grow downwards, i.e., from high addresses to low addresses.

We haven't implemented memory management yet, so we don't have a proper way to allocate a new stack. Instead, we use a `static mut` array as stack storage for now. The `unsafe` is required because the compiler can't guarantee race freedom when mutable statics are accessed. It is important that it is a `static mut` and not an immutable `static`, because otherwise the bootloader will map it to a read-only page. We will replace this with a proper stack allocation in a later post, at which point the `unsafe` will no longer be needed here.
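To see why the _end_ of the array is stored in the IST, here is a minimal hosted sketch of the same address arithmetic using plain integers instead of `VirtAddr`. The function `stack_bounds` is a name invented for this example. Note that it uses an immutable `static` only because it runs as an ordinary user-space program; as explained above, the kernel needs a `static mut` so that the bootloader maps the stack to a writable page.

```rust
// Same size as the double fault stack in the post: five 4 KiB pages.
const STACK_SIZE: usize = 4096 * 5;

// Stand-in storage for the stack (immutable here; see the note above).
static STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];

/// Returns (lowest address, one-past-highest address) of the stack storage.
/// The second value is what would be written into the IST entry, because
/// the stack pointer starts at the top and moves towards lower addresses.
fn stack_bounds() -> (usize, usize) {
    let stack_start = STACK.as_ptr() as usize;
    let stack_end = stack_start + STACK_SIZE;
    (stack_start, stack_end)
}

fn main() {
    let (start, end) = stack_bounds();
    println!("stack storage: {:#x}..{:#x}", start, end);
    println!("usable bytes below the initial stack pointer: {}", end - start); // prints 20480
}
```

Every push decrements the stack pointer, so starting at `stack_end` leaves the full 20480 bytes available before the pointer would run past `stack_start`.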
Note that this double fault stack has no guard page that protects against stack overflow. This means that we should not do anything stack-intensive in our double fault handler because a stack overflow might corrupt the memory below the stack.

#### Loading the TSS

Now that we've created a new TSS, we need a way to tell the CPU that it should use it. Unfortunately, this is a bit cumbersome since the TSS uses the segmentation system (for historical reasons). Instead of loading the table directly, we need to add a new segment descriptor to the [Global Descriptor Table] (GDT). Then we can load our TSS by invoking the [`ltr`] instruction with the respective GDT index. (This is the reason why we named our module `gdt`.)

[Global Descriptor Table]: https://web.archive.org/web/20190217233448/https://www.flingos.co.uk/docs/reference/Global-Descriptor-Table/
[`ltr`]: https://www.felixcloutier.com/x86/ltr

### The Global Descriptor Table

The Global Descriptor Table (GDT) is a relic that was used for [memory segmentation] before paging became the de facto standard. However, it is still needed in 64-bit mode for various things, such as kernel/user mode configuration or TSS loading.

[memory segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation

The GDT is a structure that contains the _segments_ of the program. It was used on older architectures to isolate programs from each other before paging became the standard. For more information about segmentation, check out the equally named chapter of the free [“Three Easy Pieces”] book. While segmentation is no longer supported in 64-bit mode, the GDT still exists. It is mostly used for two things: switching between kernel space and user space, and loading a TSS structure.
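The "GDT index" that `ltr` expects is packed into a 16-bit segment selector. As a rough host-side sketch (the helper `gdt_selector` is a hypothetical name and not part of the `x86_64` crate), a selector keeps the table index in its upper 13 bits, a table indicator in bit 2 (0 for the GDT), and the requested privilege level in the two lowest bits:

```rust
/// Builds a raw segment selector for entry `index` of the GDT.
/// Hypothetical helper, for illustration only.
fn gdt_selector(index: u16, rpl: u16) -> u16 {
    // bits 3..=15: table index, bit 2: table indicator (0 = GDT),
    // bits 0..=1: requested privilege level
    (index << 3) | (rpl & 0b11)
}
```

With the null descriptor in slot 0, a kernel code segment in slot 1 would be addressed by `gdt_selector(1, 0)`.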
[“Three Easy Pieces”]: http://pages.cs.wisc.edu/~remzi/OSTEP/

#### Creating a GDT

Let's create a static `GDT` that includes a segment for our `TSS` static:

```rust
// in src/gdt.rs

use x86_64::structures::gdt::{GlobalDescriptorTable, Descriptor};

lazy_static! {
    static ref GDT: GlobalDescriptorTable = {
        let mut gdt = GlobalDescriptorTable::new();
        gdt.add_entry(Descriptor::kernel_code_segment());
        gdt.add_entry(Descriptor::tss_segment(&TSS));
        gdt
    };
}
```

As before, we use `lazy_static` again. We create a new GDT with a code segment and a TSS segment.

#### Loading the GDT

To load our GDT, we create a new `gdt::init` function that we call from our `init` function:

```rust
// in src/gdt.rs

pub fn init() {
    GDT.load();
}

// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
}
```

Now our GDT is loaded (since the `_start` function calls `init`), but we still see the boot loop on stack overflow.

### The Final Steps

The problem is that the GDT segments are not yet active because the segment and TSS registers still contain the values from the old GDT. We also need to modify the double fault IDT entry so that it uses the new stack.

In summary, we need to do the following:

1. **Reload the code segment register**: We changed our GDT, so we should reload `cs`, the code segment register. This is required since the old segment selector could now point to a different GDT descriptor (e.g., a TSS descriptor).
2. **Load the TSS**: We loaded a GDT that contains a TSS selector, but we still need to tell the CPU that it should use that TSS.
3. **Update the IDT entry**: As soon as our TSS is loaded, the CPU has access to a valid interrupt stack table (IST). Then we can tell the CPU that it should use our new double fault stack by modifying our double fault IDT entry.
For the first two steps, we need access to the `code_selector` and `tss_selector` variables in our `gdt::init` function. We can achieve this by making them part of the static through a new `Selectors` struct:

```rust
// in src/gdt.rs

use x86_64::structures::gdt::SegmentSelector;

lazy_static! {
    static ref GDT: (GlobalDescriptorTable, Selectors) = {
        let mut gdt = GlobalDescriptorTable::new();
        let code_selector = gdt.add_entry(Descriptor::kernel_code_segment());
        let tss_selector = gdt.add_entry(Descriptor::tss_segment(&TSS));
        (gdt, Selectors { code_selector, tss_selector })
    };
}

struct Selectors {
    code_selector: SegmentSelector,
    tss_selector: SegmentSelector,
}
```

Now we can use the selectors to reload the `cs` register and load our `TSS`:

```rust
// in src/gdt.rs

pub fn init() {
    use x86_64::instructions::tables::load_tss;
    use x86_64::instructions::segmentation::{CS, Segment};

    GDT.0.load();
    unsafe {
        CS::set_reg(GDT.1.code_selector);
        load_tss(GDT.1.tss_selector);
    }
}
```

We reload the code segment register using [`CS::set_reg`] and load the TSS using [`load_tss`]. The functions are marked as `unsafe`, so we need an `unsafe` block to invoke them. The reason is that it might be possible to break memory safety by loading invalid selectors.

[`CS::set_reg`]: https://docs.rs/x86_64/0.14.5/x86_64/instructions/segmentation/struct.CS.html#method.set_reg
[`load_tss`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/tables/fn.load_tss.html

Now that we have loaded a valid TSS and interrupt stack table, we can set the stack index for our double fault handler in the IDT:

```rust
// in src/interrupts.rs

use crate::gdt;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        unsafe {
            idt.double_fault.set_handler_fn(double_fault_handler)
                .set_stack_index(gdt::DOUBLE_FAULT_IST_INDEX); // new
        }

        idt
    };
}
```

The `set_stack_index` method is unsafe because the caller must ensure that the used index is valid and not already used for another exception.

That's it! Now the CPU should switch to the double fault stack whenever a double fault occurs. Thus, we are able to catch _all_ double faults, including kernel stack overflows:

![QEMU printing `EXCEPTION: DOUBLE FAULT` and a dump of the exception stack frame](qemu-double-fault-on-stack-overflow.png)

From now on, we should never see a triple fault again! To ensure that we don't accidentally break the above, we should add a test for this.

## A Stack Overflow Test

To test our new `gdt` module and ensure that the double fault handler is correctly called on a stack overflow, we can add an integration test. The idea is to provoke a double fault in the test function and verify that the double fault handler is called.

Let's start with a minimal skeleton:

```rust
// in tests/stack_overflow.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;

#[no_mangle]
pub extern "C" fn _start() -> ! {
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

Like our `panic_handler` test, the test will run [without a test harness][sin un arnés de prueba]. The reason is that we can't continue execution after a double fault, so more than one test doesn't make sense.
To disable the test harness for the test, we add the following to our `Cargo.toml`:

```toml
# in Cargo.toml

[[test]]
name = "stack_overflow"
harness = false
```

[sin un arnés de prueba]: @/edition-2/posts/04-testing/index.md#no-harness-tests

Now `cargo test --test stack_overflow` should compile successfully. The test fails, of course, since the `unimplemented` macro panics.

### Implementing `_start`

The implementation of the `_start` function looks like this:

```rust
// in tests/stack_overflow.rs

use blog_os::serial_print;

#[no_mangle]
pub extern "C" fn _start() -> ! {
    serial_print!("stack_overflow::stack_overflow...\t");

    blog_os::gdt::init();
    init_test_idt();

    // trigger a stack overflow
    stack_overflow();

    panic!("Execution continued after stack overflow");
}

#[allow(unconditional_recursion)]
fn stack_overflow() {
    stack_overflow(); // for each recursion, the return address is pushed
    volatile::Volatile::new(0).read(); // prevent tail recursion optimizations
}
```

We call our `gdt::init` function to initialize a new GDT. Instead of calling our `interrupts::init_idt` function, we call an `init_test_idt` function that will be explained in a moment.

The `stack_overflow` function is almost identical to the function in our `main.rs`. The only difference is that at the end of the function, we perform an additional [volatile][volátil] read using the [`Volatile`] type to prevent a compiler optimization called [_tail call elimination_][_eliminación de llamadas de cola_]. Among other things, this optimization allows the compiler to transform a function whose last statement is a recursive function call into a normal loop. Thus, no additional stack frame is created for the function call, so the stack usage remains constant.
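To illustrate what this optimization does, here is a small host-side sketch with two hypothetical functions (neither is part of the kernel). The first is tail recursive; the second is the loop that a compiler applying tail call elimination might effectively turn it into. Rust does not guarantee this optimization, so treat this only as a model of the transformation:

```rust
/// Tail-recursive sum of 1..=n: the recursive call is the last thing
/// the function does, so its stack frame is no longer needed afterwards.
fn sum_tail(n: u64, acc: u64) -> u64 {
    if n == 0 {
        acc
    } else {
        sum_tail(n - 1, acc + n)
    }
}

/// The same computation as a loop, using a constant amount of stack.
fn sum_loop(mut n: u64) -> u64 {
    let mut acc = 0;
    while n > 0 {
        acc += n;
        n -= 1;
    }
    acc
}
```

Both functions return the same result, but only the recursive form can consume one stack frame per step.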
[volátil]: https://en.wikipedia.org/wiki/Volatile_(computer_programming)
[`Volatile`]: https://docs.rs/volatile/0.2.6/volatile/struct.Volatile.html
[_eliminación de llamadas de cola_]: https://en.wikipedia.org/wiki/Tail_call

In our case, however, we want the stack overflow to happen, so we add a dummy volatile read statement at the end of the function, which the compiler is not allowed to remove. Thus, the function is no longer _tail recursive_, and the transformation into a loop is prevented. We also add the `allow(unconditional_recursion)` attribute to silence the compiler warning that the function recurses endlessly.

### The Test IDT

As noted above, the test needs its own IDT with a custom double fault handler. The implementation looks like this:

```rust
// in tests/stack_overflow.rs

use lazy_static::lazy_static;
use x86_64::structures::idt::InterruptDescriptorTable;

lazy_static! {
    static ref TEST_IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        unsafe {
            idt.double_fault
                .set_handler_fn(test_double_fault_handler)
                .set_stack_index(blog_os::gdt::DOUBLE_FAULT_IST_INDEX);
        }

        idt
    };
}

pub fn init_test_idt() {
    TEST_IDT.load();
}
```

The implementation is very similar to our normal IDT in `interrupts.rs`. Like in the normal IDT, we set a stack index in the IST for the double fault handler in order to switch to a separate stack. The `init_test_idt` function loads the IDT on the CPU through the `load` method.

### The Double Fault Handler

The only missing piece is our double fault handler. It looks like this:

```rust
// in tests/stack_overflow.rs

use blog_os::{exit_qemu, QemuExitCode, serial_println};
use x86_64::structures::idt::InterruptStackFrame;

extern "x86-interrupt" fn test_double_fault_handler(
    _stack_frame: InterruptStackFrame,
    _error_code: u64,
) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

When the double fault handler is called, we exit QEMU with a success exit code, which marks the test as passed. Since integration tests are completely separate executables, we need to set the `#![feature(abi_x86_interrupt)]` attribute again at the top of our test file.

Now we can run our test through `cargo test --test stack_overflow` (or `cargo test` to run all tests). As expected, we see the `stack_overflow... [ok]` output in the console. Try to comment out the `set_stack_index` line; it should cause the test to fail.

## Summary

In this post, we learned what a double fault is and under which conditions it occurs. We added a basic double fault handler that prints an error message and added an integration test for it.

We also enabled hardware-supported stack switching on double fault exceptions so that it also works on stack overflow. While implementing it, we learned about the task state segment (TSS), the interrupt stack table (IST) it contains, and the global descriptor table (GDT), which was used for segmentation on older architectures.

## What's next?

The next post explains how to handle interrupts from external devices such as timers, keyboards, or network controllers. These hardware interrupts are very similar to exceptions, e.g., they are also dispatched through the IDT. However, unlike exceptions, they don't arise directly on the CPU. Instead, an _interrupt controller_ aggregates these interrupts and forwards them to the CPU depending on their priority. In the next post, we will explore the [Intel 8259] (“PIC”) interrupt controller and learn how to implement keyboard support.
[Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259

================================================
FILE: blog/content/edition-2/posts/06-double-faults/index.fa.md
================================================
+++
title = "Double Faults"
weight = 6
path = "fa/double-fault-exceptions"
date = 2018-06-18

[extra]
# Please update this when updating the translation
translation_based_on_commit = "3ac829171218156c07ce9a27186fee58e3a5521e"
# GitHub usernames of the people that translated this post
translators = ["hamidrezakp", "MHBahrampour"]
rtl = true
+++

This post explores the double fault exception in detail, which occurs when the CPU fails to invoke an exception handler. By handling this exception, we avoid fatal _triple faults_ that cause a system reset. To prevent triple faults in all cases, we also set up an _Interrupt Stack Table_ to catch double faults on a separate kernel stack.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-06`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-06

## What is a Double Fault?

In simplified terms, a double fault is a special exception that occurs when the CPU fails to invoke an exception handler. For example, it occurs when a page fault is triggered but there is no page fault handler registered in the [Interrupt Descriptor Table][IDT] (IDT). So it's kind of similar to catch-all blocks in programming languages with exceptions, e.g., `catch(...)` in C++ or `catch(Exception e)` in Java or C#.
[IDT]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-descriptor-table

A double fault behaves like a normal exception. It has the vector number `8` and we can define a normal handler function for it in the IDT. It is really important to provide a double fault handler, because if a double fault is unhandled, a fatal _triple fault_ occurs. Triple faults can't be caught, and most hardware reacts with a system reset.

### Triggering a Double Fault

Let's provoke a double fault by triggering an exception for which we didn't define a handler function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    // trigger a page fault
    unsafe {
        *(0xdeadbeef as *mut u8) = 42;
    };

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    loop {}
}
```

We use `unsafe` to write to the invalid address `0xdeadbeef`. The virtual address is not mapped to a physical address in the page tables, so a page fault occurs. We haven't registered a page fault handler in our [IDT], so a double fault occurs.

When we start our kernel now, we see that it enters an endless boot loop. The reason for the boot loop is the following:

1. The CPU tries to write to `0xdeadbeef`, which causes a page fault.
2. The CPU looks at the corresponding entry in the IDT and sees that no handler function is specified. Thus, it can't call the page fault handler and a double fault occurs.
3. The CPU looks at the IDT entry of the double fault handler, but this entry does not specify a handler function either. Thus, a _triple_ fault occurs.
4. A triple fault is fatal. QEMU reacts to it like most real hardware and issues a system reset.

So in order to prevent this triple fault, we need to either provide a handler function for page faults or a double fault handler.
We want to avoid triple faults in all cases, so let's start with a double fault handler that is invoked for all unhandled exception types.

## A Double Fault Handler

A double fault is a normal exception with an error code, so we can specify a handler function similar to our breakpoint handler:

```rust
// in src/interrupts.rs

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt.double_fault.set_handler_fn(double_fault_handler); // new
        idt
    };
}

// new
extern "x86-interrupt" fn double_fault_handler(
    stack_frame: InterruptStackFrame, _error_code: u64) -> !
{
    panic!("EXCEPTION: DOUBLE FAULT\n{:#?}", stack_frame);
}
```

Our handler prints a short error message and dumps the exception stack frame. The error code of the double fault handler is always zero, so there's no reason to print it. One difference to the breakpoint handler is that the double fault handler is [_diverging_]. The reason is that the `x86_64` architecture does not permit returning from a double fault exception.

[_diverging_]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html

When we start our kernel now, we should see that the double fault handler is invoked:

![QEMU printing `EXCEPTION: DOUBLE FAULT` and the exception stack frame](qemu-catch-double-fault.png)

It worked! Here is what happened this time:

1. The CPU tries to write to `0xdeadbeef`, which causes a page fault.
2. Like before, the CPU looks at the corresponding entry in the IDT and sees that no handler function is defined. Thus, a double fault occurs.
3. The CPU jumps to the double fault handler, which is now defined.

The triple fault (and the boot loop) no longer occurs, since the CPU can now call the double fault handler.

That was quite straightforward!
So why do we need a whole post for this topic? Well, we're now able to catch _most_ double faults, but there are some cases where our current approach doesn't suffice.

## Causes of Double Faults

Before we look at the special cases, we need to know the exact causes of double faults. Above, we used a pretty vague definition:

> A double fault is a special exception that occurs when the CPU fails to invoke an exception handler.

What does _“fails to invoke”_ mean exactly? The handler is not present? The handler is [swapped out] (that is, the page containing the handler was moved out of memory)? And what happens if a handler causes exceptions itself?

[swapped out]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-beyondphys.pdf

For example, what happens if:

1. a breakpoint exception occurs, but the corresponding handler function is swapped out?
2. a page fault occurs, but the page fault handler is swapped out?
3. a divide-by-zero handler causes a breakpoint exception, but the breakpoint handler is swapped out?
4. our kernel overflows its stack and the _guard page_ is hit?

Fortunately, the AMD64 manual ([PDF][AMD64 manual]) has an exact definition (in Section 8.2.9). According to it, a “double fault exception _can_ occur when a second exception occurs during the handling of a prior (first) exception handler”. The _“can”_ is important: Only very specific combinations of exceptions lead to a double fault. These combinations are:

First Exception | Second Exception
----------------|-----------------
[Divide-by-zero],<br>[Invalid TSS],<br>[Segment Not Present],<br>[Stack-Segment Fault],<br>[General Protection Fault] | [Invalid TSS],<br>[Segment Not Present],<br>[Stack-Segment Fault],<br>[General Protection Fault]
[Page Fault] | [Page Fault],<br>[Invalid TSS],<br>[Segment Not Present],<br>[Stack-Segment Fault],<br>[General Protection Fault]

[Divide-by-zero]: https://wiki.osdev.org/Exceptions#Division_Error
[Invalid TSS]: https://wiki.osdev.org/Exceptions#Invalid_TSS
[Segment Not Present]: https://wiki.osdev.org/Exceptions#Segment_Not_Present
[Stack-Segment Fault]: https://wiki.osdev.org/Exceptions#Stack-Segment_Fault
[General Protection Fault]: https://wiki.osdev.org/Exceptions#General_Protection_Fault
[Page Fault]: https://wiki.osdev.org/Exceptions#Page_Fault

[AMD64 manual]: https://www.amd.com/system/files/TechDocs/24593.pdf

So, for example, a divide-by-zero fault followed by a page fault is fine (the page fault handler is invoked), but a divide-by-zero fault followed by a general protection fault leads to a double fault.

With the help of this table, we can answer the first three of the above questions:

1. If a breakpoint exception occurs and the corresponding handler function is swapped out, a _page fault_ occurs and the _page fault handler_ is invoked.
2. If a page fault occurs and the page fault handler is swapped out, a _double fault_ occurs and the _double fault handler_ is invoked.
3. If a divide-by-zero handler causes a breakpoint exception, the CPU tries to invoke the breakpoint handler. If the breakpoint handler is swapped out, a _page fault_ occurs and the _page fault handler_ is invoked.

In fact, even the case of an exception without a handler function in the IDT follows this scheme: When the exception occurs, the CPU tries to read the corresponding IDT entry. Since the entry is 0, which is not a valid IDT entry, a _general protection fault_ occurs. We did not define a handler function for the general protection fault either, so another general protection fault occurs. According to the table, this leads to a double fault.
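The combination table above can also be read as a small decision function. The following host-side sketch is only a model of the two table rows: the `Exception` enum and `leads_to_double_fault` are hypothetical names, and only the exceptions mentioned in this post are included.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Exception {
    DivideByZero,
    Breakpoint,
    InvalidTss,
    SegmentNotPresent,
    StackSegmentFault,
    GeneralProtectionFault,
    PageFault,
}

/// Returns true if `second`, raised while the CPU tries to invoke the
/// handler for `first`, escalates to a double fault according to the table.
fn leads_to_double_fault(first: Exception, second: Exception) -> bool {
    use Exception::*;

    // Exceptions that appear in the "second exception" column of both rows.
    let second_is_listed = matches!(
        second,
        InvalidTss | SegmentNotPresent | StackSegmentFault | GeneralProtectionFault
    );

    match first {
        // Row 1 of the table.
        DivideByZero | InvalidTss | SegmentNotPresent | StackSegmentFault
        | GeneralProtectionFault => second_is_listed,
        // Row 2: a page fault additionally escalates on a second page fault.
        PageFault => second_is_listed || second == PageFault,
        // Not listed as a first exception: the second handler is simply invoked.
        Breakpoint => false,
    }
}
```

For instance, `leads_to_double_fault(Exception::DivideByZero, Exception::PageFault)` is `false`, matching the example in the text, while two consecutive general protection faults yield `true`.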
### Kernel Stack Overflow

Let's look at the fourth question:

> What happens if our kernel overflows its stack and the guard page is hit?

A guard page is a special memory page at the bottom of a stack that makes it possible to detect stack overflows. The page is not mapped to any physical frame, so accessing it causes a page fault instead of silently corrupting other memory. The bootloader sets up a guard page for our kernel stack, so a stack overflow causes a _page fault_.

When a page fault occurs, the CPU looks up the page fault handler in the IDT and tries to push the [interrupt stack frame] onto the stack. However, the current stack pointer still points to the non-present guard page. Thus, a second page fault occurs, which causes a double fault (according to the above table).

[interrupt stack frame]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-stack-frame

So the CPU tries to call the _double fault handler_ now. However, on a double fault, the CPU tries to push the exception stack frame, too. The stack pointer still points to the guard page, so a _third_ page fault occurs, which causes a _triple fault_ and a system reboot. So our current double fault handler can't avoid a triple fault in this case.

Let's try it ourselves! We can easily provoke a kernel stack overflow by calling a function that recurses endlessly:

```rust
// in src/main.rs

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    fn stack_overflow() {
        stack_overflow(); // for each recursion, the return address is pushed
    }

    // trigger a stack overflow
    stack_overflow();

    […] // test_main(), println(…), and loop {}
}
```

When we try this code in QEMU, we see that the system enters a boot loop again.
So how can we avoid this problem? We can't omit the pushing of the exception stack frame, since the CPU itself does it. So we need to ensure somehow that the stack is always valid when a double fault exception occurs. Fortunately, the x86_64 architecture has a solution to this problem.

## Switching Stacks

The x86_64 architecture is able to switch to a predefined, known-good stack when an exception occurs. This switch happens at hardware level, so it can be performed before the CPU pushes the exception stack frame.

The switching mechanism is implemented as an _Interrupt Stack Table_ (IST). The IST is a table of 7 pointers to known-good stacks. In Rust-like pseudocode:

```rust
struct InterruptStackTable {
    stack_pointers: [Option<StackPointer>; 7],
}
```

For each exception handler, we can choose a stack from the IST through the `stack_pointers` field in the corresponding [IDT entry]. For example, our double fault handler could use the first stack in the IST. Then the CPU automatically switches to this stack whenever a double fault occurs. This switch would happen before anything is pushed, preventing the triple fault.

[IDT entry]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-descriptor-table

### The IST and TSS

The Interrupt Stack Table (IST) is part of an old legacy structure called _[Task State Segment]_ \(TSS). The TSS used to hold various pieces of information (e.g., processor register state) about a task in 32-bit mode and was, for example, used for [hardware context switching]. However, hardware context switching is no longer supported in 64-bit mode and the format of the TSS has changed completely.

[Task State Segment]: https://en.wikipedia.org/wiki/Task_state_segment
[hardware context switching]: https://wiki.osdev.org/Context_Switching#Hardware_Context_Switching

On x86_64, the TSS no longer holds any task-specific information.
Instead, it holds two stack tables (the IST is one of them). The only common field between the 32-bit and 64-bit TSS is the pointer to the [I/O port permissions bitmap].

[I/O port permissions bitmap]: https://en.wikipedia.org/wiki/Task_state_segment#I.2FO_port_permissions

The 64-bit TSS has the following format:

Field  | Type
------ | ----------------
(reserved) | `u32`
Privilege Stack Table | `[u64; 3]`
(reserved) | `u64`
Interrupt Stack Table | `[u64; 7]`
(reserved) | `u64`
(reserved) | `u16`
I/O Map Base Address | `u16`

The _Privilege Stack Table_ is used by the CPU when the privilege level changes. For example, if an exception occurs while the CPU is in user mode (privilege level 3), the CPU normally switches to kernel mode (privilege level 0) before invoking the exception handler. In that case, the CPU would switch to the 0th stack in the Privilege Stack Table (since 0 is the target privilege level). We don't have any user-mode programs yet, so we will ignore this table for now.

### Creating a TSS

Let's create a new TSS that contains a separate double fault stack in its interrupt stack table. For that, we need a TSS struct. Fortunately, the `x86_64` crate already contains a [`TaskStateSegment` struct] that we can use.

[`TaskStateSegment` struct]: https://docs.rs/x86_64/0.14.2/x86_64/structures/tss/struct.TaskStateSegment.html

We create the TSS in a new `gdt` module (the name will make sense later):

```rust
// in src/lib.rs

pub mod gdt;

// in src/gdt.rs

use x86_64::VirtAddr;
use x86_64::structures::tss::TaskStateSegment;
use lazy_static::lazy_static;

pub const DOUBLE_FAULT_IST_INDEX: u16 = 0;

lazy_static! {
    static ref TSS: TaskStateSegment = {
        let mut tss = TaskStateSegment::new();
        tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize] = {
            const STACK_SIZE: usize = 4096 * 5;
            static mut STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];

            let stack_start = VirtAddr::from_ptr(&raw const STACK);
            let stack_end = stack_start + STACK_SIZE;
            stack_end
        };
        tss
    };
}
```

We use `lazy_static` because Rust's const evaluator is not yet powerful enough to do this initialization at compile time. We define that the 0th IST entry is the double fault stack (any other IST index would work too). Then we write the top address of a double fault stack to the 0th entry. We write the top address because stacks on x86 grow downwards, i.e., from high addresses to low addresses.

We haven't implemented memory management yet, so we don't have a proper way to allocate a new stack. Instead, we use a `static mut` array as stack storage for now. It is important that it is a `static mut` and not an immutable `static`, because otherwise the bootloader will map it to a read-only page.

Note that this double fault stack has no guard page that protects against stack overflow. This means that we should not do anything stack-intensive in our double fault handler because a stack overflow might corrupt the memory below the stack.

#### Loading the TSS

Now that we've created a new TSS, we need a way to tell the CPU that it should use it. Unfortunately, this is a bit cumbersome since the TSS uses the segmentation system (for historical reasons). Instead of loading the table directly, we need to add a new segment descriptor to the [Global Descriptor Table] (GDT). Then we can load our TSS by invoking the [`ltr` instruction] with the respective GDT index. (This is the reason why we named our module `gdt`.)

[Global Descriptor Table]: https://web.archive.org/web/20190217233448/https://www.flingos.co.uk/docs/reference/Global-Descriptor-Table/
[`ltr` instruction]: https://www.felixcloutier.com/x86/ltr

### The Global Descriptor Table

The Global Descriptor Table (GDT) is a relic that was used for [memory segmentation] before paging became the de facto standard. It is still needed in 64-bit mode for various things, such as kernel/user mode configuration or TSS loading.

[memory segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation

The GDT is a structure that contains the _segments_ of the program. It was used on older architectures to isolate programs from each other before paging became the standard. For more information about segmentation, check out the equally named chapter of the free [“Three Easy Pieces” book]. While segmentation is no longer supported in 64-bit mode, the GDT still exists. It is mostly used for two things: switching between kernel space and user space, and loading a TSS structure.

[“Three Easy Pieces” book]: http://pages.cs.wisc.edu/~remzi/OSTEP/

#### Creating a GDT

Let's create a static `GDT` that includes a segment for our `TSS` static:

```rust
// in src/gdt.rs

use x86_64::structures::gdt::{GlobalDescriptorTable, Descriptor};

lazy_static! {
    static ref GDT: GlobalDescriptorTable = {
        let mut gdt = GlobalDescriptorTable::new();
        gdt.add_entry(Descriptor::kernel_code_segment());
        gdt.add_entry(Descriptor::tss_segment(&TSS));
        gdt
    };
}
```

We use `lazy_static` again, because Rust's const evaluator is not powerful enough yet. We create a new GDT with a code segment and a TSS segment.
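As a rough mental model of what adding entries to a GDT means, consider the following host-side sketch. `ToyGdt` is a hypothetical type and not the crate's implementation; it only assumes that slot 0 holds the mandatory null descriptor, that a code segment descriptor occupies one 8-byte slot, and that a TSS descriptor, being a system descriptor in 64-bit mode, occupies two.

```rust
/// Hypothetical model of how GDT slots are handed out (illustration only).
struct ToyGdt {
    next_free: u16,
}

impl ToyGdt {
    fn new() -> Self {
        // Slot 0 is reserved for the null descriptor.
        ToyGdt { next_free: 1 }
    }

    /// Reserves `slots` consecutive 8-byte slots and returns the raw
    /// selector (index << 3, privilege level 0) of the first one.
    fn add_entry(&mut self, slots: u16) -> u16 {
        let index = self.next_free;
        self.next_free += slots;
        index << 3
    }
}
```

In this model, adding a code segment (`add_entry(1)`) followed by a TSS (`add_entry(2)`) yields the selectors 8 and 16 and leaves slot 4 as the next free one.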
#### Loading the GDT

To load our GDT, we create a new `gdt::init` function that we call from our `init` function:

```rust
// in src/gdt.rs

pub fn init() {
    GDT.load();
}

// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
}
```

Now our GDT is loaded (since the `_start` function calls `init`), but we still see the boot loop on stack overflow.

### The Final Steps

The problem is that the GDT segments are not yet active because the segment and TSS registers still contain the values from the old GDT. We also need to modify the double fault IDT entry so that it uses the new stack.

In summary, we need to do the following:

1. **Reload the code segment register**: We changed our GDT, so we should reload `cs`, the code segment register. This is required since the old segment selector could now point to a different GDT descriptor (e.g., a TSS descriptor).
2. **Load the TSS**: We loaded a GDT that contains a TSS selector, but we still need to tell the CPU that it should use that TSS.
3. **Update the IDT entry**: As soon as our TSS is loaded, the CPU has access to a valid interrupt stack table (IST). Then we can tell the CPU that it should use our new double fault stack by modifying our double fault IDT entry.

For the first two steps, we need access to the `code_selector` and `tss_selector` variables in our `gdt::init` function. We can achieve this by making them part of the static through a new `Selectors` struct:

```rust
// in src/gdt.rs

use x86_64::structures::gdt::SegmentSelector;

lazy_static! {
    static ref GDT: (GlobalDescriptorTable, Selectors) = {
        let mut gdt = GlobalDescriptorTable::new();
        let code_selector = gdt.add_entry(Descriptor::kernel_code_segment());
        let tss_selector = gdt.add_entry(Descriptor::tss_segment(&TSS));
        (gdt, Selectors { code_selector, tss_selector })
    };
}

struct Selectors {
    code_selector: SegmentSelector,
    tss_selector: SegmentSelector,
}
```

Now we can use the selectors to reload the `cs` segment register and load our `TSS`:

```rust
// in src/gdt.rs

pub fn init() {
    use x86_64::instructions::segmentation::set_cs;
    use x86_64::instructions::tables::load_tss;

    GDT.0.load();
    unsafe {
        set_cs(GDT.1.code_selector);
        load_tss(GDT.1.tss_selector);
    }
}
```

We reload the code segment register using [`set_cs`] and load the TSS using [`load_tss`]. The functions are marked as `unsafe`, so we need an `unsafe` block to invoke them. The reason is that it might be possible to break memory safety by loading invalid selectors.

[`set_cs`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/segmentation/fn.set_cs.html
[`load_tss`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/tables/fn.load_tss.html

Now that we have loaded a valid TSS and interrupt stack table, we can set the stack index for our double fault handler in the IDT:

```rust
// in src/interrupts.rs

use crate::gdt;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        unsafe {
            idt.double_fault.set_handler_fn(double_fault_handler)
                .set_stack_index(gdt::DOUBLE_FAULT_IST_INDEX); // new
        }

        idt
    };
}
```

The `set_stack_index` method is unsafe because the caller must ensure that the used index is valid and not already used for another exception.

That's it! Now the CPU should switch to the double fault stack whenever a double fault occurs.
بنابراین، ما می‌توانیم _همه_ خطاهای دوگانه، از جمله سرریزهای پشته هسته را بگیریم:

![QEMU printing `EXCEPTION: DOUBLE FAULT` and a dump of the exception stack frame](qemu-double-fault-on-stack-overflow.png)

از این به بعد هرگز نباید شاهد خطای سه‌گانه باشیم! برای اطمینان از اینکه موارد بالا را به طور تصادفی نقض نمی‌کنیم، باید یک تست برای این کار اضافه کنیم.

## تست سرریز پشته

برای آزمایش ماژول `gdt` جدید و اطمینان از اینکه مدیر خطای دوگانه به درستی هنگام سرریز پشته فراخوانی شده است، می‌توانیم یک تست یکپارچه اضافه کنیم. ایده این است که یک خطای دوگانه در تابع تست ایجاد کنید و تأیید کنید که مدیر خطای دوگانه فراخوانی می‌شود.

بیایید با یک طرح مینیمال شروع کنیم:

```rust
// in tests/stack_overflow.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

مانند تست `panic_handler`، این تست [بدون یک test harness] اجرا خواهد شد، زیرا پس از یک خطای دوگانه نمی‌توانیم اجرا را ادامه دهیم و بنابراین بیش از یک تست منطقی نیست. برای غیرفعال کردن test harness برای این تست، موارد زیر را به `Cargo.toml` اضافه می‌کنیم:

```toml
# in Cargo.toml

[[test]]
name = "stack_overflow"
harness = false
```

[بدون یک test harness]: @/edition-2/posts/04-testing/index.md#no-harness-tests

حال باید `cargo test --test stack_overflow` بصورت موفقیت‌آمیز کامپایل شود. البته این تست با شکست مواجه می‌شود، زیرا ماکروی `unimplemented` پنیک می‌کند.

### پیاده‌سازی `_start`

پیاده‌سازی تابع `_start` مانند این است:

```rust
// in tests/stack_overflow.rs

use blog_os::serial_print;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
{ serial_print!("stack_overflow::stack_overflow...\t"); blog_os::gdt::init(); init_test_idt(); // trigger a stack overflow stack_overflow(); panic!("Execution continued after stack overflow"); } #[allow(unconditional_recursion)] fn stack_overflow() { stack_overflow(); // for each recursion, the return address is pushed volatile::Volatile::new(0).read(); // prevent tail recursion optimizations } ``` برای راه‌اندازی یک GDT جدید، تابع `gdt::init` را فراخوانی می‌کنیم. به جای فراخوانی تابع `interrupts::init_idt`، تابع `init_test_idt` را فراخوانی می‌کنیم که بزودی توضیح داده می‌شود. زیرا ما می‌خواهیم یک مدیر خطای دوگانه سفارشی ثبت کنیم که به جای پنیک کردن، دستور `exit_qemu(QemuExitCode::Success)` را انجام می‌دهد. تابع `stack_overflow` تقریباً مشابه تابع موجود در `main.rs` است. تنها تفاوت این است که برای جلوگیری از بهینه‌سازی کامپایلر موسوم به [_tail call elimination_]، در پایان تابع، یک خواندنِ [فرارِ] \(ترجمه: volatile) اضافه به وسیله نوع [`Volatile`] انجام می‌دهیم. از جمله، این بهینه‌سازی به کامپایلر اجازه می‌دهد تابعی را که آخرین عبارت آن فراخوانی تابع بازگشتی است، به یک حلقه طبیعی تبدیل کند. بنابراین، هیچ قاب پشته اضافی برای فراخوانی تابع ایجاد نمی‌شود، پس استفاده از پشته ثابت می‌ماند. [volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming) [`Volatile`]: https://docs.rs/volatile/0.2.6/volatile/struct.Volatile.html [_tail call elimination_]: https://en.wikipedia.org/wiki/Tail_call با این حال، در مورد ما، ما می‌خواهیم که سرریز پشته اتفاق بیفتد، بنابراین در انتهای تابع یک دستور خواندن فرار ساختگی اضافه می‌کنیم، که کامپایلر مجاز به حذف آن نیست. بنابراین، تابع دیگر _tail recursive_ نیست و از تبدیل به یک حلقه جلوگیری می‌شود. ما همچنین صفت `allow(unconditional_recursion)` را اضافه می‌کنیم تا اخطار کامپایلر را در مورد تکرار بی‌وقفه تابع خاموش نگه دارد. ### تست IDT همانطور که در بالا ذکر شد، این تست به IDT مخصوص خود با یک مدیر خطای دوگانه سفارشی نیاز دارد. 
پیاده‌سازی به این شکل است:

```rust
// in tests/stack_overflow.rs

use lazy_static::lazy_static;
use x86_64::structures::idt::InterruptDescriptorTable;

lazy_static! {
    static ref TEST_IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        unsafe {
            idt.double_fault
                .set_handler_fn(test_double_fault_handler)
                .set_stack_index(blog_os::gdt::DOUBLE_FAULT_IST_INDEX);
        }

        idt
    };
}

pub fn init_test_idt() {
    TEST_IDT.load();
}
```

پیاده‌سازی بسیار شبیه IDT طبیعی ما در `interrupts.rs` است. مانند IDT عادی، برای مدیر خطای دوگانه به منظور جابجایی به پشته‌ای جداگانه، یک اندیس پشته را در IST تنظیم می‌کنیم. تابع `init_test_idt` با استفاده از روش `load`، آی‌دی‌تی را بر روی پردازنده بارگذاری می‌کند.

### مدیر خطای دوگانه

تنها قسمت جامانده، مدیر خطای دوگانه است که به این شکل پیاده‌سازی می‌شود:

```rust
// in tests/stack_overflow.rs

use blog_os::{exit_qemu, QemuExitCode, serial_println};
use x86_64::structures::idt::InterruptStackFrame;

extern "x86-interrupt" fn test_double_fault_handler(
    _stack_frame: InterruptStackFrame,
    _error_code: u64,
) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

هنگامی که مدیر خطای دوگانه فراخوانی می‌شود، از QEMU با یک کد خروج موفقیت‌آمیز خارج می‌شویم، که تست را بعنوان «قبول شده» علامت‌گذاری می‌کند. از آن‌جا که تست‌های یکپارچه اجرایی‌های کاملاً مجزایی هستند، باید صفت `#![feature(abi_x86_interrupt)]` را در بالای فایل تست تنظیم کنیم.

اکنون می‌توانیم تست را از طریق `cargo test --test stack_overflow` (یا `cargo test` برای اجرای همه تست‌ها) انجام دهیم. همانطور که انتظار می‌رفت، خروجی `stack_overflow... [ok]` را در کنسول مشاهده می‌کنیم. خط `set_stack_index` را کامنت کنید: این امر باعث می‌شود تست از کار بیفتد.

## خلاصه

در این پست یاد گرفتیم که خطای دوگانه چیست و در چه شرایطی رخ می‌دهد. ما یک مدیر خطای دوگانه پایه اضافه کردیم که پیام خطا را چاپ می‌کند و یک تست یکپارچه برای آن اضافه کردیم.

ما همچنین تعویض پشته پشتیبانی شده سخت‌افزاری را در استثناهای خطای دوگانه فعال کردیم تا در سرریز پشته نیز کار کند.
در حین پیاده‌سازی آن، ما با سگمنت وضعیت پروسه (TSS)، جدول پشته وقفه (IST) و جدول توصیف کننده سراسری (GDT) آشنا شدیم، که برای سگمنت‌بندی در معماری‌های قدیمی استفاده می‌شد. ## بعدی چیست؟ پست بعدی نحوه مدیریت وقفه‌های دستگاه‌های خارجی مانند تایمر، صفحه کلید یا کنترل کننده‌های شبکه را توضیح می‌دهد. این وقفه‌های سخت‌افزاری بسیار شبیه به استثناها هستند، به عنوان مثال آن‌ها هم از طریق IDT ارسال می‌شوند. با این حال، برخلاف استثناها، مستقیماً روی پردازنده رخ نمی‌دهند. در عوض، یک _interrupt controller_ این وقفه‌ها را جمع کرده و بسته به اولویت، آن‌ها را به CPU می‌فرستد. در بخش بعدی، مدیر وقفه [Intel 8259] \("PIC") را بررسی خواهیم کرد و نحوه پیاده‌سازی پشتیبانی صفحه کلید را یاد خواهیم گرفت. [Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259 ================================================ FILE: blog/content/edition-2/posts/06-double-faults/index.ja.md ================================================ +++ title = "Double Faults" weight = 6 path = "ja/double-fault-exceptions" date = 2018-06-18 [extra] # Please update this when updating the translation translation_based_on_commit = "27ac0e1acc36f640d7045b427da2ed65b945756b" # GitHub usernames of the people that translated this post translators = ["garasubo"] +++ この記事ではCPUが例外ハンドラの呼び出しに失敗したときに起きる、ダブルフォルト例外について詳細に見ていきます。この例外を処理することによって、システムリセットを起こす重大な**トリプルフォルト**を避けることができます。あらゆる場合においてトリプルフォルトを防ぐために、ダブルフォルトを異なるカーネルスタック上でキャッチするための**割り込みスタックテーブル**をセットアップしていきます。 このブログの内容は [GitHub] 上で公開・開発されています。何か問題や質問などがあれば issue をたててください(訳注: リンクは原文(英語)のものになります)。また[こちら][at the bottom]にコメントを残すこともできます。この記事の完全なソースコードは[`post-06` ブランチ][post branch]にあります。 [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-06 ## ダブルフォルトとは 簡単に言うとダブルフォルトとはCPUが例外ハンドラを呼び出すことに失敗したときに起きる特別な例外です。例えば、ページフォルトが起きたが、ページフォルトハンドラが[割り込みディスクリプタテーブル][IDT](IDT: Interrupt Descriptor Table)に登録されていないときに発生します。つまり、C++での`catch(...)`や、JavaやC#の`catch(Exception e)`ような、例外のあるプログラミング言語のcatch-allブロックのようなものです。 [IDT]: 
@/edition-2/posts/05-cpu-exceptions/index.ja.md#ge-riip-miji-shu-zi-biao ダブルフォルトは通常の例外のように振る舞います。ベクター番号`8`を持ち、IDTに通常のハンドラ関数として定義できます。ダブルフォルトがうまく処理されないと、より重大な例外である**トリプルフォルト**が起きてしまうため、ダブルフォルトハンドラを設定することはとても重要です。トリプルフォルトはキャッチできず、ほとんどのハードウェアはシステムリセットを起こします。 ### ダブルフォルトを起こす ハンドラ関数を定義していない例外を発生させることでダブルフォルトを起こしてみましょう。 ```rust // in src/main.rs #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { println!("Hello World{}", "!"); blog_os::init(); // ページフォルトを起こす unsafe { *(0xdeadbeef as *mut u8) = 42; }; // 前回同様 #[cfg(test)] test_main(); println!("It did not crash!"); loop {} } ``` 不正なアドレスである`0xdeadbeef`に書き込みを行うため`unsafe`を使います。この仮想アドレスはページテーブル上で物理アドレスにマップされていないため、ページフォルトが発生します。私達の[IDT]にはページフォルトが登録されていないため、ダブルフォルトが発生します。 今、私達のカーネルを起動すると、ブートループが発生します。この理由は以下の通りです: 1. CPUが`0xdeadbeef`に書き込みを試みページフォルトを起こします。 2. CPUはIDTに対応するエントリを探しに行き、ハンドラ関数が指定されていないことを発見します。結果、ページフォルトハンドラが呼び出せず、ダブルフォルトが発生します。 3. CPUはダブルフォルトハンドラのIDTエントリを見にいきますが、このエントリもハンドラ関数を指定していません。結果、**トリプルフォルト**が発生します。 4. トリプルフォルトは重大なエラーなので、QEMUはほとんどの実際のハードウェアと同様にシステムリセットを行います。 このトリプルフォルトを防ぐためには、ページフォルトかダブルフォルトのハンドラ関数を定義しないといけません。私達はすべての場合におけるトリプルフォルトを防ぎたいので、適切に処理できなかったすべての例外において呼び出されることになるダブルフォルトハンドラを定義するところからはじめましょう。 ## ダブルフォルトハンドラ ダブルフォルトは通常のエラーコードのある例外なので、ブレークポイントハンドラと同じようにハンドラ関数を指定することができます。 ```rust // in src/interrupts.rs lazy_static! { static ref IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); idt.breakpoint.set_handler_fn(breakpoint_handler); idt.double_fault.set_handler_fn(double_fault_handler); // new idt }; } // new extern "x86-interrupt" fn double_fault_handler( stack_frame: InterruptStackFrame, _error_code: u64) -> ! 
{
    panic!("EXCEPTION: DOUBLE FAULT\n{:#?}", stack_frame);
}
```

私達のハンドラは短いエラーメッセージを出力して、例外スタックフレームをダンプします。ダブルフォルトハンドラのエラーコードは常に`0`なので、出力する必要はないでしょう。ブレークポイントハンドラとの違いの一つは、ダブルフォルトハンドラは[発散する]（diverging）(訳注: 翻訳当時、リンク先未訳)ということです。`x86_64`アーキテクチャではダブルフォルト例外から復帰することができないためです。

[発散する]: https://doc.rust-jp.rs/rust-by-example-ja/fn/diverging.html

ここで私達のカーネルを起動すると、ダブルフォルトハンドラが呼び出されていることがわかることでしょう。

![QEMU printing `EXCEPTION: DOUBLE FAULT` and the exception stack frame](qemu-catch-double-fault.png)

動きました!ここで何が起きているかというと、

1. CPUが`0xdeadbeef`に書き込みを試みページフォルトを起こします。
2. 以前と同様に、CPUはIDT中の対応するエントリを見にいきますが、ハンドラ関数が定義されていないことを発見し、結果、ダブルフォルトが起きます。
3. 今回はダブルフォルトハンドラが指定されているので、CPUはそれを適切に呼び出せます。

CPUはダブルフォルトハンドラを呼べるようになったので、トリプルフォルト(とブートループ)はもう起こりません。

ここまでは簡単です。ではなぜこの例外のために丸々一つの記事を用意したのでしょうか?実は、私達は**ほとんどの**ダブルフォルトをキャッチすることはできますが、このアプローチでは十分でないケースがいくつか存在するのです。

## ダブルフォルトの原因

特別なケースを見ていく前に、ダブルフォルトの正確な原因を知る必要があります。ここまで、私達はとてもあいまいな定義を使ってきました。

> ダブルフォルトとはCPUが例外ハンドラを呼び出すことに失敗したときに起きる特別な例外です。

**「呼び出すことに失敗する」** とは正確には何を意味するのでしょうか?ハンドラが存在しない?ハンドラが[スワップアウト]された?また、ハンドラ自身が例外を発生させたらどうなるのでしょうか?

[スワップアウト]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-beyondphys.pdf

例えば以下のようなことが起こるとどうなるでしょう?

1. ブレークポイント例外が発生したが、対応するハンドラがスワップアウトされていたら?
2. ページフォルトが発生したが、ページフォルトハンドラがスワップアウトされていたら?
3. ゼロ除算ハンドラがブレークポイント例外を起こしたが、ブレークポイントハンドラがスワップアウトされていたら?
4. カーネルがスタックをオーバーフローさせて**ガードページ**にヒットしたら?

幸いにもAMD64のマニュアル([PDF][AMD64 manual])には正確な定義が書かれています(8.2.9章)。それによると「ダブルフォルト例外は直前の(一度目の)例外ハンドラの処理中に二度目の例外が発生したとき**起きうる** (can occur)」と書かれています。**起きうる**というのが重要で、とても特別な例外の組み合わせでのみダブルフォルトとなります。この組み合わせは以下のようになっています。

| 最初の例外 | 二度目の例外 |
| ---------- | ------------ |
| [ゼロ除算], [無効TSS], [セグメント不在], [スタックセグメントフォルト], [一般保護違反] | [無効TSS], [セグメント不在], [スタックセグメントフォルト], [一般保護違反] |
| [ページフォルト] | [ページフォルト], [無効TSS], [セグメント不在], [スタックセグメントフォルト], [一般保護違反] |

[ゼロ除算]: https://wiki.osdev.org/Exceptions#Division_Error
[無効TSS]: https://wiki.osdev.org/Exceptions#Invalid_TSS
[セグメント不在]: https://wiki.osdev.org/Exceptions#Segment_Not_Present
[スタックセグメントフォルト]: https://wiki.osdev.org/Exceptions#Stack-Segment_Fault
[一般保護違反]: https://wiki.osdev.org/Exceptions#General_Protection_Fault
[ページフォルト]: https://wiki.osdev.org/Exceptions#Page_Fault
[AMD64 manual]: https://www.amd.com/system/files/TechDocs/24593.pdf

例えばゼロ除算例外に続いてページフォルトが起きた場合は問題ありません(ページフォルトハンドラが呼び出される)が、ゼロ除算例外に続いて一般保護違反が起きた場合はダブルフォルトが発生します。

この表を見れば、先程の質問のうち最初の3つに答えることができます:

1. ブレークポイント例外が発生して、対応するハンドラ関数がスワップアウトされている場合、**ページフォルト**が発生して**ページフォルトハンドラ**が呼び出されます。
2. ページフォルトが発生してページフォルトハンドラがスワップアウトされている場合、**ダブルフォルト**が発生して**ダブルフォルトハンドラ**が呼び出されます。
3. ゼロ除算ハンドラがブレークポイント例外を発生させた場合、CPUはブレークポイントハンドラを呼び出そうとします。もしブレークポイントハンドラがスワップアウトされている場合、**ページフォルト**が発生して**ページフォルトハンドラ**が呼び出されます。

実際、IDTにハンドラ関数が指定されていない例外のケースでもこの体系に従っています。つまり、例外が発生したとき、CPUは対応するIDTエントリを読み込みにいきます。このエントリは0であり正しいIDTエントリではないので、**一般保護違反**が発生します。私達は一般保護違反のハンドラも定義していないので、新たな一般保護違反が発生します。表によるとこれはダブルフォルトを起こします。

### カーネルスタックオーバーフロー

4つ目の質問を見てみましょう。

> カーネルがスタックをオーバーフローさせてガードページにヒットしたら?

ガードページはスタックの底にある特別なメモリページで、これによってスタックオーバーフローを検出することができます。このページはどの物理メモリにもマップされていないので、アクセスすることで警告なく他のメモリを破壊する代わりにページフォルトが発生します。ブートローダーはカーネルスタックのためにガードページをセットアップするので、スタックオーバーフローは**ページフォルト**を発生させることになります。

ページフォルトが起きるとCPUはIDT内のページフォルトハンドラを探しにいき、[割り込みスタックフレーム](訳注: 翻訳当時、リンク先未訳)をスタック上にプッシュしようと試みます。しかし、このスタックポインタは存在しないガードページを指しています。結果、二度目のページフォルトが発生して、ダブルフォルトが起きます(上の表によれば)。

[割り込みスタックフレーム]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-stack-frame

そして、ここでCPUは**ダブルフォルトハンドラ**を呼びにいきます。しかし、ダブルフォルト例外においてもCPUは例外スタックフレームをプッシュしようと試みます。スタックポインタはまだガードページを指しているので、**三度目の**ページフォルトが起きて、**トリプルフォルト**を発生させシステムは再起動します。そのため、私達の今のダブルフォルトハンドラではこの場合でのトリプルフォルトを避けることができません。

実際にやってみましょう。カーネルスタックオーバーフローは無限に再帰する関数を呼び出すことによって簡単に引き起こせます:

```rust
// in src/main.rs

#[unsafe(no_mangle)] // この関数の名前修飾をしない
pub extern "C" fn _start() -> !
{
    println!("Hello World{}", "!");

    blog_os::init();

    fn stack_overflow() {
        stack_overflow(); // 再帰呼び出しのために、リターンアドレスがプッシュされる
    }

    // スタックオーバーフローを起こす
    stack_overflow();

    […] // test_main(), println(…), and loop {}
}
```

これをQEMUで試すと、再びブートループに入るのがわかります。

では、私達はどうすればこの問題を避けられるでしょうか?例外スタックフレームをプッシュすることは、CPU自身が行ってしまうので、省略できません。つまりどうにかしてダブルフォルト例外が発生したときスタックが常に正常であることを保証する必要があります。幸いにもx86_64アーキテクチャにはこの問題の解決策があります。

## スタックを切り替える

x86_64アーキテクチャは例外発生時に予め定義されている既知の正常なスタックに切り替えることができます。この切り替えはハードウェアレベルで発生するので、CPUが例外スタックフレームをプッシュする前に行うことができます。

切り替えの仕組みは**割り込みスタックテーブル**(IST: Interrupt Stack Table)として実装されています。ISTは既知の正常なスタックへの7つのポインタのテーブルです。Rust風の疑似コードで表すとこのようになります:

```rust
struct InterruptStackTable {
    stack_pointers: [Option<StackPointer>; 7],
}
```

各例外ハンドラに対して、私達は対応する[IDTエントリ](訳注: 翻訳当時、リンク先未訳)の`stack_pointers`フィールドを通してISTからスタックを選ぶことができます。例えば、IST中の最初のスタックをダブルフォルトハンドラのために使うことができます。そうすると、ダブルフォルトが発生したときは必ず、CPUがこのスタックに自動的に切り替えを行います。この切り替えは何かがプッシュされる前に起きるので、トリプルフォルトを防ぐことになります。

[IDTエントリ]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-descriptor-table

### ISTとTSS

割り込みスタックテーブル(IST)は **[タスクステートセグメント]**(TSS)というレガシーな構造体の一部です。TSSはかつては様々な32ビットモードでのタスクに関する情報(例:プロセッサのレジスタの状態)を保持していて、例えば[ハードウェアコンテキストスイッチング]に使われていました。しかし、ハードウェアコンテキストスイッチングは64ビットモードではサポートされなくなり、TSSのフォーマットは完全に変わりました。

[タスクステートセグメント]: https://ja.wikipedia.org/wiki/Task_state_segment
[ハードウェアコンテキストスイッチング]: https://wiki.osdev.org/Context_Switching#Hardware_Context_Switching

x86_64ではTSSはタスク固有の情報は全く持たなくなりました。代わりに、2つのスタックテーブル(ISTがその1つ)を持つようになりました。唯一32ビットと64ビットのTSSで共通のフィールドは[I/Oポート許可ビットマップ]へのポインタのみです。

[I/Oポート許可ビットマップ]: https://ja.wikipedia.org/wiki/Task_state_segment#I/O許可ビットマップ

64ビットのTSSは下記のようなフォーマットです:

| フィールド               | 型         |
| ------------------------ | ---------- |
| (予約済み)               | `u32`      |
| 特権スタックテーブル     | `[u64; 3]` |
| (予約済み)               | `u64`      |
| 割り込みスタックテーブル | `[u64; 7]` |
| (予約済み)               | `u64`      |
| (予約済み)               | `u16`      |
| I/Oマップベースアドレス  | `u16`      |
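
上の表のレイアウトを確かめるための最小限のスケッチです。`x86_64`クレートの`TaskStateSegment`そのものではなく、`ToyTss`や`stack_top`という名前は説明用の仮のものです。フィールド間にパディングが入らないよう`#[repr(C, packed(4))]`を指定する、という前提を置いています:

```rust
use std::mem::size_of;

// 説明用の仮の型です(x86_64クレートのAPIではありません)。
// 上の表のフィールドを同じ順番で並べています。
#[repr(C, packed(4))]
struct ToyTss {
    reserved_1: u32,
    privilege_stack_table: [u64; 3],
    reserved_2: u64,
    interrupt_stack_table: [u64; 7],
    reserved_3: u64,
    reserved_4: u16,
    iomap_base: u16,
}

impl ToyTss {
    /// すべてのフィールドを0で初期化する(このスケッチでの仮の初期値)。
    fn new() -> Self {
        ToyTss {
            reserved_1: 0,
            privilege_stack_table: [0; 3],
            reserved_2: 0,
            interrupt_stack_table: [0; 7],
            reserved_3: 0,
            reserved_4: 0,
            iomap_base: 0,
        }
    }
}

/// 構造体全体のバイト数を返す。
fn toy_tss_size() -> usize {
    size_of::<ToyTss>()
}

/// スタックは高いアドレスから低いアドレスへ伸びるので、
/// ISTに書き込むのはスタック領域の終端(最も高い)アドレスになる。
fn stack_top(stack_start: u64, stack_size: u64) -> u64 {
    stack_start + stack_size
}
```

この前提のもとでは、構造体の大きさは各フィールドの合計である104バイトになります。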
**特権スタックテーブル**は特権レベルが変わった際にCPUが使用します。例えば、CPUがユーザーモード(特権レベル3)の時に例外が発生した場合、CPUは通常例外ハンドラを呼び出す前にカーネルモード(特権レベル0)に切り替わります。この場合、CPUは特権レベルスタックテーブルの0番目のスタックに切り替わります。ユーザーモードについてはまだ実装してないため、このテープルはとりあえず無視しておきましょう。 ### TSSをつくる 割り込みスタックテーブルにダブルフォルト用のスタックを含めた新しいTSSをつくってみましょう。そのためにはTSS構造体が必要です。幸いにも、すでに`x86_64`クレートに[`TaskStateSegment`構造体]が含まれているので、これを使っていきます。 [`TaskStateSegment`構造体]: https://docs.rs/x86_64/0.14.2/x86_64/structures/tss/struct.TaskStateSegment.html 新しい`gdt`モジュール内でTSSをつくります(名前の意味は後でわかるでしょう): ```rust // in src/lib.rs pub mod gdt; // in src/gdt.rs use x86_64::VirtAddr; use x86_64::structures::tss::TaskStateSegment; use lazy_static::lazy_static; pub const DOUBLE_FAULT_IST_INDEX: u16 = 0; lazy_static! { static ref TSS: TaskStateSegment = { let mut tss = TaskStateSegment::new(); tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize] = { const STACK_SIZE: usize = 4096 * 5; static mut STACK: [u8; STACK_SIZE] = [0; STACK_SIZE]; let stack_start = VirtAddr::from_ptr(&raw const STACK); let stack_end = stack_start + STACK_SIZE; stack_end }; tss }; } ``` Rustの定数評価機はこの初期化をコンパイル時に行うことがまだできないので`lazy_static`を使います。ここでは0番目のISTエントリをダブルフォルト用のスタックとして定義します(他のISTのインデックスでも動くでしょう)。そして、ダブルフォルト用スタックの先頭アドレスを0番目のエントリに書き込みます。先頭アドレスを書き込むのはx86のスタックは下、つまり高いアドレスから低いアドレスに向かって伸びていくからです。 私達はまだメモリ管理を実装していません。そのため、新しいスタックを確保する適切な方法がありません。 その代わり今回は、スタックのストレージとして`static mut`な配列を使います。 これが不変の`static`ではなく`static mut`であることは重要です。 そうでなければブートローダーはこれをリードオンリーのページにマップしてしまうからです。 私達は後の記事でこの部分を適切なスタック確保処理に置き換えます。 ちなみに、このダブルフォルトスタックはスタックオーバーフローに対する保護をするガードページを持ちません。つまり、スタックオーバーフローがスタックより下のメモリを破壊するかもしれないので、私達はダブルフォルトハンドラ内でスタックを多用すべきではないということです。 #### TSSを読み込む 新しいTSSをつくったので、CPUにそれを使うように教える方法が必要です。残念ながら、これはちょっと面倒くさいです。なぜならTSSは(歴史的な理由で)セグメンテーションシステムを使うためです。テーブルを直接読み込むのではなく、新しいセグメントディスクリプタを[グローバルディスクリプタテーブル](GDT: Global Descriptor Table)に追加する必要があります。そうすると各自のGDTインデックスで[`ltr`命令]を呼び出すことで私達のTSSを読み込むことができます。 [グローバルディスクリプタテーブル]: 
https://web.archive.org/web/20190217233448/https://www.flingos.co.uk/docs/reference/Global-Descriptor-Table/ [`ltr`命令]: https://www.felixcloutier.com/x86/ltr ### グローバルディスクリプタテーブル グローバルディスクリプタテーブル(GDT)はページングがデファクトスタンダードになる以前は、[メモリセグメンテーション]のため使われていた古い仕組みです。カーネル・ユーザーモードの設定やTSSの読み込みなど、様々なことを行うために64ビットモードでも未だに必要です。 [メモリセグメンテーション]: https://ja.wikipedia.org/wiki/セグメント方式 GDTはプログラムの**セグメント**を含む構造です。ページングが標準になる以前に、プログラム同士を独立させるためにより古いアーキテクチャで使われていました。セグメンテーションに関するより詳しい情報は無料の[「Three Easy Pieces」]という本の同じ名前の章を見てください。セグメンテーションは64ビットモードではもうサポートされていませんが、GDTはまだ存在しています。GDTはカーネル空間とユーザー空間の切り替えと、TSS構造体の読み込みという主に2つのことに使われています。 [「Three Easy Pieces」]: http://pages.cs.wisc.edu/~remzi/OSTEP/ #### GDTをつくる `TSS`の静的変数のセグメントを含む静的`GDT`をつくりましょう: ```rust // in src/gdt.rs use x86_64::structures::gdt::{GlobalDescriptorTable, Descriptor}; lazy_static! { static ref GDT: GlobalDescriptorTable = { let mut gdt = GlobalDescriptorTable::new(); gdt.add_entry(Descriptor::kernel_code_segment()); gdt.add_entry(Descriptor::tss_segment(&TSS)); gdt }; } ``` 先に紹介したコードと同様に、再び`lazy_static`を使います。コードセグメントとTSSセグメントを持つ新しいGDTを作成します。 #### GDTを読み込む GDTを読み込むために新しく`gdt::init`関数をつくり、これを`init`関数から呼び出します: ```rust // in src/gdt.rs pub fn init() { GDT.load(); } // in src/lib.rs pub fn init() { gdt::init(); interrupts::init_idt(); } ``` これでGDTが読み込まれます(`_start`関数は`init`を呼び出すため)が、これではまだスタックオーバーフローでブートループが起きてしまいます。 ### 最後のステップ 問題はGDTセグメントとTSSレジスタが古いGDTからの値を含んでいるため、GDTセグメントがまだ有効になっていないことです。ダブルフォルト用のIDTエントリが新しいスタックを使うように変更する必要もあります。 まとめると、私達は次のようなことをする必要があります: 1. **コードセグメントレジスタを再読み込みする**:GDTを変更したので、コードセグメントレジスタ`cs`を再読み込みする必要があります。これは、古いセグメントセレクタが異なるGDTディスクリプタ(例:TSSディスクリプタ)を指す可能性があるためです。 2. **TSSをロードする**:TSSセレクタを含むGDTをロードしましたが、CPUにこのTSSを使うよう教えてあげる必要があります。 3. 
**IDTエントリを更新する**:TSSがロードされると同時に、CPUは正常な割り込みスタックテーブル(IST)へアクセスできるようになります。そうしたら、ダブルフォルトIDTエントリを変更することで、CPUに新しいダブルフォルトスタックを使うよう教えてあげることができます。 最初の2つのステップのために、私達は`gdt::init`関数の中で`code_selector`と`tss_selector`変数にアクセスする必要があります。これは、その変数たちを新しい`Selectors`構造体を使い静的変数にすることで実装できます: ```rust // in src/gdt.rs use x86_64::structures::gdt::SegmentSelector; lazy_static! { static ref GDT: (GlobalDescriptorTable, Selectors) = { let mut gdt = GlobalDescriptorTable::new(); let code_selector = gdt.add_entry(Descriptor::kernel_code_segment()); let tss_selector = gdt.add_entry(Descriptor::tss_segment(&TSS)); (gdt, Selectors { code_selector, tss_selector }) }; } struct Selectors { code_selector: SegmentSelector, tss_selector: SegmentSelector, } ``` これで私達は`cs`セグメントレジスタを再読み込みし`TSS`を読み込むためにセレクタを使うことができます: ```rust // in src/gdt.rs pub fn init() { use x86_64::instructions::segmentation::set_cs; use x86_64::instructions::tables::load_tss; GDT.0.load(); unsafe { set_cs(GDT.1.code_selector); load_tss(GDT.1.tss_selector); } } ``` [`set_cs`]を使ってコードセグメントレジスタを再読み込みして、[`load_tss`]を使ってTSSを読み込んでいます。これらの関数は`unsafe`とマークされているので、呼び出すには`unsafe`ブロックが必要です。`unsafe`なのは、不正なセレクタを読み込むことでメモリ安全性を壊す可能性があるからです。 [`set_cs`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/segmentation/fn.set_cs.html [`load_tss`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/tables/fn.load_tss.html これで正常なTSSと割り込みスタックテーブルを読み込んだので、私達はIDT内のダブルフォルトハンドラにスタックインデックスをセットすることができます: ```rust // in src/interrupts.rs use crate::gdt; lazy_static! 
{ static ref IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); idt.breakpoint.set_handler_fn(breakpoint_handler); unsafe { idt.double_fault.set_handler_fn(double_fault_handler) .set_stack_index(gdt::DOUBLE_FAULT_IST_INDEX); // new } idt }; } ``` `set_stack_index`メソッドは呼び出し側が、使われているインデックスが正しいものであり、かつ他の例外で使われていないかを確かめる必要があるため、`unsafe`です。 これで全部です。CPUはダブルフォルトが発生したら常にダブルフォルトスタックに切り替えるでしょう。よって、私達はカーネルスタックオーバーフローを含む**すべての**ダブルフォルトをキャッチすることができます。 ![QEMU printing `EXCEPTION: DOUBLE FAULT` and a dump of the exception stack frame](qemu-double-fault-on-stack-overflow.png) これからはトリプルフォルトを見ることは二度とないでしょう。これらダブルフォルトのための実装を誤って壊していないことを保証するために、テストを追加しましょう。 ## スタックオーバーフローテスト 新しい`gdt`モジュールをテストしダブルフォルトハンドラがスタックオーバーフローで正しく呼ばれることを保証するために、結合テストを追加します。ここでの考えは、テスト関数内でダブルフォルトを引き起こしダブルフォルトハンドラが呼び出されていることを確かめる、というものです。 最小の骨組みから始めましょう: ```rust // in tests/stack_overflow.rs #![no_std] #![no_main] use core::panic::PanicInfo; #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { unimplemented!(); } #[panic_handler] fn panic(info: &PanicInfo) -> ! { blog_os::test_panic_handler(info) } ``` `panic_handler`のテストと同様、テストは[テストハーネスなし]で実行されます。理由は私達はダブルフォルト後に実行を続けることができず、2つ以上のテストは意味をなさないためです。テストハーネスを無効にするために、以下を`Cargo.toml`に追加します: ```toml # in Cargo.toml [[test]] name = "stack_overflow" harness = false ``` [テストハーネスなし]: @/edition-2/posts/04-testing/index.ja.md#hanesu-harness-nonaitesuto これで`cargo test --test stack_overflow`でのコンパイルは成功するでしょう。`unimplemented`マクロがパニックを起こすため、テストはもちろん失敗します。 ### `_start`を実装する `_start`関数の実装はこのようになります: ```rust // in tests/stack_overflow.rs use blog_os::serial_print; #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! 
{ serial_print!("stack_overflow::stack_overflow...\t"); blog_os::gdt::init(); init_test_idt(); // スタックオーバーフローを起こす stack_overflow(); panic!("Execution continued after stack overflow"); } #[allow(unconditional_recursion)] fn stack_overflow() { stack_overflow(); // 再帰のたびにリターンアドレスがプッシュされる volatile::Volatile::new(0).read(); // 末尾最適化を防ぐ } ``` 新しいGDTを初期化するために`gdt::init`関数を呼びます。そして`interrupts::init_idt`関数を呼び出す代わりに、すぐ後で説明する`init_test_idt`関数を呼びます。なぜなら、私達はパニックの代わりに`exit_qemu(QemuExitCode::Success)`を実行するカスタムしたダブルフォルトハンドラを登録したいためです。 `stack_overflow`関数は`main.rs`の中にある関数とほとんど同じです。唯一の違いは[**末尾呼び出し最適化**]と呼ばれるコンパイラの最適化を防ぐために[`Volatile`]タイプを使って関数の末尾で追加の[volatile]読み込みを行っていることです。この最適化の特徴として、コンパイラが、最後の文が再帰関数呼び出しである関数を通常のループに変換できるようになる、というものがあります。その結果として、追加のスタックフレームが関数呼び出しではつくられず、スタックの使用量が変わらないままとなります。 [volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming) [`Volatile`]: https://docs.rs/volatile/0.2.6/volatile/struct.Volatile.html [**末尾呼び出し最適化**]: https://ja.wikipedia.org/wiki/末尾再帰#末尾呼出し最適化 しかし、ここではスタックオーバーフローを起こしたいので、コンパイラに削除されない、ダミーのvolatile読み込み文を関数の末尾に追加します。その結果、関数は**末尾再帰**ではなくなり、ループへの変換は防がれます。更に関数が無限に再帰することに対するコンパイラの警告をなくすために`allow(unconditional_recursion)`属性を追加します。 ### IDTのテスト 上で述べたように、テストはカスタムしたダブルフォルトハンドラを含む専用のIDTが必要です。実装はこのようになります: ```rust // in tests/stack_overflow.rs use lazy_static::lazy_static; use x86_64::structures::idt::InterruptDescriptorTable; lazy_static! 
{ static ref TEST_IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); unsafe { idt.double_fault .set_handler_fn(test_double_fault_handler) .set_stack_index(blog_os::gdt::DOUBLE_FAULT_IST_INDEX); } idt }; } pub fn init_test_idt() { TEST_IDT.load(); } ``` 実装は`interrupts.rs`内の通常のIDTと非常に似ています。通常のIDT同様、分離されたスタックに切り替えるようダブルフォルトハンドラ用のISTにスタックインデックスをセットします。`init_test_idt`関数は`load`メソッドによりCPU上にIDTを読み込みます。 ### ダブルフォルトハンドラ 唯一欠けているのはダブルフォルトハンドラです。このようになります: ```rust // in tests/stack_overflow.rs use blog_os::{exit_qemu, QemuExitCode, serial_println}; use x86_64::structures::idt::InterruptStackFrame; extern "x86-interrupt" fn test_double_fault_handler( _stack_frame: InterruptStackFrame, _error_code: u64, ) -> ! { serial_println!("[ok]"); exit_qemu(QemuExitCode::Success); loop {} } ``` ダブルフォルトハンドラが呼ばれるとき、私達はQEMUを正常な終了コードで終了し、テストを成功とマークします。結合テストは完全に分けられた実行ファイルなので、私達はテストファイルの先頭で`#![feature(abi_x86_interrupt)]`属性を再びセットする必要があります。 これで私達は`cargo test --test stack_overflow`(もしくは全部のテストを走らせるよう`cargo test`)でテストを走らせることができます。期待していたとおり、`stack_overflow... [ok]`とコンソールに出力されるのがわかります。`set_stack_index`の行をコメントアウトすると、テストは失敗するでしょう。 ## まとめ この記事では私達はダブルフォルトが何であるかとどういう条件下で発生するかを学びました。エラーメッセージを出力する基本的なダブルフォルトハンドラと、そのための結合テストを追加しました。 また、私達はスタックオーバーフロー下でも動くよう、ダブルフォルト発生時にハードウェアがサポートするスタック切り替えを行うようにしました。実装していく中で、古いアーキテクチャでのセグメンテーションで使われていたタスクステートセグメント(TSS)、割り込みスタックテーブル(IST)、グローバルディスクリプタテーブル(GDT)についても学びました。 ## 次は? 
次の記事ではタイマーやキーボード、ネットワークコントローラのような、外部デバイスからの割り込みをどのように処理するかを説明します。これらのハードウェア割り込みは例外によく似ています。例えば、これらもIDTからディスパッチされます。しかしながら、例外とは違い、それらはCPU上で直接発生するものではありません。代わりに、**割り込みコントローラ**がこれらの割り込みを集めて、優先度によってそれらをCPUに送ります。次回、私達は[Intel 8259](PIC)割り込みコントローラを調べ、どのようにキーボードのサポートを実装するかを学びます。 [Intel 8259]: https://ja.wikipedia.org/wiki/Intel_8259 ================================================ FILE: blog/content/edition-2/posts/06-double-faults/index.ko.md ================================================ +++ title = "더블 폴트 (Double Fault)" weight = 6 path = "ko/double-fault-exceptions" date = 2018-06-18 [extra] # Please update this when updating the translation translation_based_on_commit = "a108367d712ef97c28e8e4c1a22da4697ba6e6cd" # GitHub usernames of the people that translated this post translators = ["JOE1994"] # GitHub usernames of the people that contributed to this translation translation_contributors = ["dalinaum"] +++ 이번 글에서는 CPU가 예외 처리 함수를 호출하는 데에 실패할 때 발생하는 더블 폴트 (double fault) 예외에 대해 자세히 다룹니다. 더블 폴트 예외를 처리함으로써 시스템 재부팅을 발생시키는 치명적인 _트리플 폴트 (triple fault)_ 예외를 피할 수 있습니다. 트리플 폴트가 발생할 수 있는 모든 경우에 대비하기 위해 _Interrupt Stack Table_ 을 만들고 별도의 커널 스택에서 더블 폴트를 처리할 것입니다. 이 블로그는 [GitHub 저장소][GitHub]에서 오픈 소스로 개발되고 있으니, 문제나 문의사항이 있다면 저장소의 'Issue' 기능을 이용해 제보해주세요. [페이지 맨 아래][at the bottom]에 댓글을 남기실 수도 있습니다. 이 포스트와 관련된 모든 소스 코드는 저장소의 [`post-06 브랜치`][post branch]에서 확인하실 수 있습니다. [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-06 ## 더블 폴트 (Double Fault)란 무엇일까요? 간단히 말하면, 더블 폴트는 CPU가 예외 처리 함수를 호출하는 것에 실패했을 때 발생하는 예외입니다. 예를 들면 페이지 폴트가 발생했는데 [인터럽트 서술자 테이블 (Interrupt Descriptor Table; IDT)][IDT] 에 등록된 페이지 폴트 처리 함수가 없을 때 더블 폴트가 예외가 발생합니다. 비유한다면 C++의 `catch(..)`문이나 Java 및 C#의 `catch(Exception e)`문처럼 모든 종류의 예외를 처리할 수 있다는 점에서 유사합니다. [IDT]: @/edition-2/posts/05-cpu-exceptions/index.ko.md#the-interrupt-descriptor-table 더블 폴트는 다른 예외들과 다를 게 없습니다. IDT 내에서 배정된 벡터 인덱스(`8`)가 있고, IDT에 해당 예외를 처리할 일반 함수를 정의할 수 있습니다. 
더블 폴트 처리 함수를 제공하는 것은 매우 중요한데, 더블 폴트가 처리되지 않으면 치명적인 _트리플 폴트_ 가 발생하기 때문입니다. 트리플 폴트를 처리하는 것은 불가능해서 대부분의 하드웨어는 시스템을 리셋하는 방식으로 대응합니다. ### 더블 폴트 일으키기 예외 처리 함수가 등록되지 않은 예외를 발생시켜 더블 폴트를 일으켜 보겠습니다. ```rust // in src/main.rs #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { println!("Hello World{}", "!"); blog_os::init(); // 페이지 폴트 일으키기 unsafe { *(0xdeadbeef as *mut u8) = 42; }; // 이전과 동일 #[cfg(test)] test_main(); println!("It did not crash!"); loop {} } ``` `unsafe` 키워드를 사용해 유효하지 않은 메모리 주소 `0xdeadbeef`에 값을 씁니다. 페이지 테이블에서 해당 가상 주소는 실제 물리 주소에 매핑되지 않았기에 페이지 폴트가 발생합니다. 아직 우리가 [IDT]에 페이지 폴트 처리 함수를 등록하지 않았기 때문에 이어서 더블 폴트가 발생합니다. 이제 커널을 실행시키면 커널이 무한히 부팅하는 루프에 갇히는 것을 확인하실 수 있습니다. 커널이 루프에 갇히는 이유는 아래와 같습니다. 1. CPU가 메모리 주소 `0xdeadbeef`에 값을 쓰려고 시도한 것 때문에 페이지 폴트가 발생합니다. 2. CPU는 IDT에서 페이지 폴트에 대응하는 엔트리를 확인하지만, 페이지 폴트 처리 함수가 등록되어 있지 않습니다. 호출할 수 있는 페이지 폴트 처리 함수가 없어 더블 폴트가 발생합니다. 3. CPU는 IDT에서 더블 폴트에 대응하는 엔트리를 확인하지만, 더블 폴트 처리 함수가 등록되어 있지 않습니다. 이후 _트리플 폴트_ 가 발생합니다. 4. 트리플 폴트는 치명적입니다. 다른 실제 하드웨어들처럼 QEMU 또한 시스템을 리셋합니다. 이런 상황에서 트리플 폴트 발생을 막으려면 페이지 폴트 또는 더블 폴트의 처리 함수를 등록해야 합니다. 어떤 경우에서든 트리플 폴트만은 막아야 하므로, 처리되지 않은 예외가 있을 때 호출되는 더블 폴트의 처리 함수부터 먼저 작성하겠습니다. ## 더블 폴트 처리 함수 더블 폴트도 일반적인 예외로서 오류 코드를 가집니다. 따라서 더블 폴트 처리 함수를 작성할 때 이전에 작성한 breakpoint 예외 처리 함수와 비슷하게 작성할 수 있습니다. ```rust // in src/interrupts.rs lazy_static! { static ref IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); idt.breakpoint.set_handler_fn(breakpoint_handler); idt.double_fault.set_handler_fn(double_fault_handler); // 새롭게 추가함 idt }; } // 새롭게 추가함 extern "x86-interrupt" fn double_fault_handler( stack_frame: InterruptStackFrame, _error_code: u64) -> ! { panic!("EXCEPTION: DOUBLE FAULT\n{:#?}", stack_frame); } ``` 우리가 작성한 더블 폴트 처리 함수는 짧은 오류 메시지와 함께 예외 스택 프레임의 정보를 출력합니다. 더블 폴트 처리 함수의 오류 코드가 0인 것은 이미 아는 사실이니 굳이 출력할 필요가 없습니다. breakpoint 예외 처리 함수와 비교해 하나 다른 점은 더블 폴트 처리 함수가 [발산하는][_diverging_] 함수라는 것인데, 그 이유는 더블 폴트로부터 반환하는 것을 `x86_64` 아키텍처에서 허용하지 않기 때문입니다. 
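
위에서 설명한 흐름을 아주 단순화한 모델로 표현하면 다음과 같습니다. 실제 CPU 동작을 그대로 옮긴 것이 아니라 설명을 위한 스케치이며, `Outcome`과 `dispatch_page_fault`라는 이름은 가상의 것입니다. 처리 함수가 등록되지 않은 예외는 곧바로 다음 단계로 넘어간다고 가정합니다:

```rust
// 설명을 위한 가상의 모델입니다 (실제 하드웨어나 x86_64 크레이트의 API가 아닙니다).
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// 페이지 폴트 처리 함수가 호출됨
    PageFaultHandler,
    /// 더블 폴트 처리 함수가 호출됨
    DoubleFaultHandler,
    /// 처리할 함수가 없어 시스템이 리셋됨
    TripleFault,
}

/// 페이지 폴트가 발생했을 때, 등록된 처리 함수에 따라 어떤 결과가 되는지 반환한다.
fn dispatch_page_fault(
    has_page_fault_handler: bool,
    has_double_fault_handler: bool,
) -> Outcome {
    if has_page_fault_handler {
        Outcome::PageFaultHandler
    } else if has_double_fault_handler {
        // 페이지 폴트 처리 함수가 없으므로 더블 폴트로 이어진다.
        Outcome::DoubleFaultHandler
    } else {
        // 더블 폴트 처리 함수도 없으므로 트리플 폴트로 이어진다.
        Outcome::TripleFault
    }
}
```

두 처리 함수가 모두 없는 경우가 앞에서 본 무한 재부팅 루프에 해당하고, 더블 폴트 처리 함수만 등록한 경우가 이번에 작성한 커널의 상태에 해당합니다.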
[_diverging_]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html

이제 커널을 실행시키면 더블 폴트 처리 함수가 호출되는 것을 확인하실 수 있습니다.

![QEMU printing `EXCEPTION: DOUBLE FAULT` and the exception stack frame](qemu-catch-double-fault.png)

성공입니다! 어떤 일들이 일어났는지 단계별로 살펴보겠습니다.

1. CPU가 메모리 주소 `0xdeadbeef`에 값을 적으려 하고, 그 결과 페이지 폴트가 발생합니다.
2. 이전처럼 CPU는 IDT에서 페이지 폴트에 대응하는 엔트리를 확인하지만, 등록된 처리 함수가 없음을 확인합니다. 그 결과 더블 폴트가 발생합니다.
3. CPU의 제어 흐름이 등록된 더블 폴트 처리 함수로 점프합니다.

CPU가 더블 폴트 처리 함수를 호출할 수 있기에, 트리플 폴트와 무한 재부팅 루프는 더 이상 발생하지 않습니다.

별로 어렵지 않군요! 그럼에도 이 주제 하나에 이 글 전체를 할애한 이유가 궁금하신가요? 사실, 현재 우리는 _대부분의_ 더블 폴트를 처리할 수는 있지만, 현재의 커널 구현으로는 더블 폴트를 처리하지 못하는 특수한 경우들이 아직 남아 있습니다.

## 더블 폴트의 원인들

특수한 경우들을 살펴보기 전에, 우선 더블 폴트가 일어나는 엄밀한 원인에 대해 파악해야 합니다. 본문의 윗부분에서는 더블 폴트를 설명할 때 다소 애매하고 느슨한 정의를 사용했습니다.

> 더블 폴트는 CPU가 예외 처리 함수를 호출하는 것에 실패했을 때 발생하는 예외입니다.

_“예외 처리 함수를 호출하는 것에 실패했을 때”_ 라는 게 정확히 무슨 뜻일까요? 예외 처리 함수가 등록되어 있지 않아 호출에 실패했다? 예외 처리 함수가 [스왑-아웃][swapped out] 되어 있어 호출에 실패했다? 그리고 예외 처리 함수 자체가 다시 예외를 발생시키면 어떻게 될까요?

[swapped out]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-beyondphys.pdf

예를 들어, 아래의 각각의 상황들을 가정했을 때 무슨 일이 일어날지 고민해 봅시다.

1. breakpoint 예외가 발생한 시점에 breakpoint 예외 처리 함수가 스왑-아웃 되어 있는 경우?
2. 페이지 폴트가 발생한 시점에 페이지 폴트 처리 함수가 스왑-아웃 되어 있는 경우?
3. divide-by-zero 예외 처리 함수가 breakpoint 예외를 발생시킨 시점에 breakpoint 예외 처리 함수가 스왑-아웃 되어 있는 경우?
4. 커널이 스택 오버플로우를 일으켜 _보호 페이지 (guard page)_ 에 접근하는 경우?

다행히 AMD64 매뉴얼 ([PDF][AMD64 manual])에서 더블 폴트의 명확한 정의를 제시합니다 (매뉴얼 섹션 8.2.9 참조). 매뉴얼의 정의에 따르면, “더블 폴트 예외는 1번째 발생한 예외를 처리하는 도중 2번째 예외가 발생한 경우에 _발생할 수 있다_” 합니다. 여기서 _“발생할 수 있다”_ 라는 표현이 중요한데, 더블 폴트는 아래의 표에서 보이는 것처럼 특수한 조합의 예외들이 순서대로 일어났을 때에만 발생합니다.

1번째 발생한 예외 | 2번째 발생한 예외
----------------|-----------------
[Divide-by-zero], [Invalid TSS], [Segment Not Present], [Stack-Segment Fault], [General Protection Fault] | [Invalid TSS], [Segment Not Present], [Stack-Segment Fault], [General Protection Fault]
[Page Fault] | [Page Fault], [Invalid TSS], [Segment Not Present], [Stack-Segment Fault], [General Protection Fault]

[Divide-by-zero]: https://wiki.osdev.org/Exceptions#Division_Error
[Invalid TSS]: https://wiki.osdev.org/Exceptions#Invalid_TSS
[Segment Not Present]: https://wiki.osdev.org/Exceptions#Segment_Not_Present
[Stack-Segment Fault]: https://wiki.osdev.org/Exceptions#Stack-Segment_Fault
[General Protection Fault]: https://wiki.osdev.org/Exceptions#General_Protection_Fault
[Page Fault]: https://wiki.osdev.org/Exceptions#Page_Fault
[AMD64 manual]: https://www.amd.com/system/files/TechDocs/24593.pdf

예를 들면 divide-by-zero 예외 뒤에 페이지 폴트가 발생하는 것은 괜찮지만 (페이지 폴트 처리 함수가 호출됨), divide-by-zero 예외 뒤에 general-protection fault 예외가 발생하면 더블 폴트가 발생합니다.

위 테이블을 이용하면 위에서 했던 질문 중 첫 3개에 대해 대답할 수 있습니다.

1. breakpoint 예외가 발생한 시점에 해당 예외 처리 함수가 스왑-아웃 되어 있는 경우, _페이지 폴트_ 가 발생하고 _페이지 폴트 처리 함수_ 가 호출됩니다.
2. 페이지 폴트가 발생한 시점에 페이지 폴트 처리 함수가 스왑-아웃 되어 있는 경우, _더블 폴트_ 가 발생하고 _더블 폴트 처리 함수_ 가 호출됩니다.
3. divide-by-zero 예외 처리 함수가 breakpoint 예외를 일으키는 경우, CPU가 breakpoint 예외 처리 함수의 호출을 시도합니다. breakpoint 예외 처리 함수가 스왑-아웃 되어 있는 경우, _페이지 폴트_ 가 발생하고 _페이지 폴트 처리 함수_ 가 호출됩니다.

사실 임의의 예외에 대한 처리 함수가 IDT에 없다는 것만으로 더블 폴트가 발생하는 것이 아닙니다. 예외가 발생하면 CPU는 그 예외에 대응하는 IDT 엔트리를 참조합니다. 해당 엔트리 값이 0인 경우 (= 예외 처리 함수가 등록되어 있지 않음), _general protection fault_ 예외가 발생합니다. 우리는 해당 예외를 처리할 함수를 등록하지 않았기 때문에, 새로운 general protection fault 예외가 또 발생합니다. general protection fault가 이어서 2번 일어났으니, 위 테이블에 따라 더블 폴트가 발생합니다.

### 커널 스택 오버플로우

이제 위의 질문들 중 마지막 4번째 질문을 살펴보겠습니다.

> 커널이 스택 오버플로우를 일으켜 _보호 페이지 (guard page)_ 에 접근하는 경우, 무슨 일이 일어날까요?

보호 페이지는 스택의 맨 아래에 위치하면서 스택 오버플로우를 감지하는 특별한 메모리 페이지입니다. 해당 페이지는 어떤 물리 프레임에도 매핑되지 않으며, CPU가 해당 페이지에 접근하면 물리 메모리에 접근하는 대신 페이지 폴트가 발생합니다. 부트로더가 커널 스택의 보호 페이지를 초기화하며, 이후 커널 스택 오버플로우가 발생하면 _페이지 폴트_ 가 발생합니다.

페이지 폴트가 발생하면 CPU는 IDT에서 페이지 폴트 처리 함수를 찾고 스택에 [인터럽트 스택 프레임 (interrupt stack frame)][interrupt stack frame]을 push 하려고 합니다. 하지만 현재의 스택 포인터는 물리 프레임이 매핑되지 않은 보호 페이지를 가리키고 있습니다. 따라서 2번째 페이지 폴트가 발생하고, 그 결과 더블 폴트가 발생합니다 (위 테이블 참조).
[interrupt stack frame]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-stack-frame

CPU는 이제 _더블 폴트 처리 함수_ 를 호출하려고 시도합니다. 하지만, 더블 폴트 발생 시 CPU는 또 예외 스택 프레임 (= 인터럽트 스택 프레임)을 스택에 push하려고 합니다. 스택 포인터는 여전히 보호 페이지를 가리키고, 따라서 _3번째_ 페이지 폴트 발생 후 _트리플 폴트_ 가 발생하고 시스템이 재부팅 됩니다. 우리가 지금 가진 더블 폴트 처리 함수로는 이 상황에서 트리플 폴트를 막을 수 없습니다.

역시 백문이 불여일견이죠! 무한 재귀 함수를 호출해 손쉽게 커널 스택오버플로우를 일으켜 봅시다.

```rust
// in src/main.rs

#[unsafe(no_mangle)] // 이 함수의 이름을 mangle하지 않습니다
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    fn stack_overflow() {
        stack_overflow(); // 재귀 호출할 때마다 스택에 반환 주소를 push 합니다
    }

    // 스택 오버플로우 일으키기
    stack_overflow();

    […] // test_main(), println(…), and loop {}
}
```

이 코드를 QEMU에서 실행하면 시스템이 또 무한 재부팅 루프에 갇히는 것을 확인할 수 있습니다.

이 문제를 어떻게 피할 수 있을까요? CPU 하드웨어가 예외 스택 프레임을 push 하는 것이라서, 커널 코드를 통해 스택 프레임의 push 과정을 생략할 수는 없습니다. 그래서 더블 폴트가 발생한 시점에는 늘 커널 스택이 유효하도록 보장할 수 있는 방법을 찾아야 합니다. 다행히도, x86_64 아키텍처는 이 문제에 대한 해답을 가지고 있습니다.

## 스택 교체하기

x86_64 아키텍처는 예외 발생 시 스택을 미리 지정한 다른 안전한 스택으로 교체하는 것이 가능합니다. 이러한 스택 교체는 하드웨어 단에서 일어나고, 따라서 CPU가 예외 스택 프레임을 스택에 push 하기 전에 스택을 교체하는 것이 가능합니다.

이러한 스택 교체는 _인터럽트 스택 테이블 (Interrupt Stack Table; IST)_ 을 사용해 진행됩니다. IST는 안전한 것으로 알려진 7개의 다른 스택들의 주소를 저장하는 테이블입니다. IST의 구조를 Rust 코드 형식으로 표현하자면 아래와 같습니다.

```rust
struct InterruptStackTable {
    stack_pointers: [Option<StackPointer>; 7],
}
```

각 예외 처리 함수는 [IDT 엔트리][IDT entry]의 `stack_pointers` 필드를 통해 IST의 스택 중 하나를 사용하도록 선택할 수 있습니다. 예를 들어, 우리의 더블 폴트 처리 함수가 IST의 1번째 스택을 사용하도록 설정할 수 있습니다. 그 후에는 더블 폴트가 발생할 때마다 CPU가 스택을 IST의 1번째 스택으로 교체합니다. 스택에 새로운 데이터가 push 되기 전에 스택 교체가 이뤄지기 때문에 트리플 폴트를 피할 수 있습니다.

[IDT entry]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-descriptor-table

### IST와 TSS

인터럽트 스택 테이블 (IST)은 오래되어 이젠 구식이 된 _[Task State Segment]_ (TSS)라는 구조체의 일부입니다. 예전에 TSS는 다양한 정보 (예: 프로세서 레지스터들의 상태 값)를 저장하거나 [하드웨어를 이용한 컨텍스트 스위치][hardware context switching]을 지원하는 용도로 사용됐습니다. 하지만 하드웨어를 이용한 컨텍스트 스위치를 64비트 모드에서부터는 지원하지 않게 되었고, 그 이후 TSS의 구조는 완전히 바뀌었습니다.

[Task State Segment]: https://en.wikipedia.org/wiki/Task_state_segment [hardware context switching]: https://wiki.osdev.org/Context_Switching#Hardware_Context_Switching x86_64 아키텍처에서 TSS는 특정 태스크 (task) 관련 정보를 보관하지 않습니다. 대신 TSS는 두 개의 스택 테이블을 보관합니다 (IST가 그중 하나입니다). 32비트 시스템의 TSS와 64비트 시스템의 TSS의 유일한 공통 필드는 [I/O port permissions bitmap]에 대한 포인터 하나 뿐입니다. [I/O port permissions bitmap]: https://en.wikipedia.org/wiki/Task_state_segment#I.2FO_port_permissions 64비트 TSS는 아래의 구조를 가집니다. Field | Type ------ | ---------------- (reserved) | `u32` Privilege Stack Table | `[u64; 3]` (reserved) | `u64` Interrupt Stack Table | `[u64; 7]` (reserved) | `u64` (reserved) | `u16` I/O Map Base Address | `u16` CPU가 특권 레벨을 교체할 때 _Privilege Stack Table_ 을 사용합니다. CPU가 사용자 모드일 때 (특권 레벨 = 3) 예외가 발생하면, CPU는 예외 처리 함수를 호출에 앞서 커널 모드로 전환합니다 (특권 레벨 = 0). 이 경우 CPU는 스택을 Privilege Stack Table의 0번째 스택으로 교체합니다 (특권 레벨이 0이라서). 아직 우리의 커널에서 동작하는 사용자 모드 프로그램이 없으므로, 일단은 이 테이블에 대해 걱정하지 않아도 됩니다. ### TSS 생성하기 새로운 TSS를 생성하고 TSS의 인터럽트 스택 테이블에 별도의 더블 폴트 스택을 갖추도록 코드를 작성하겠습니다. 우선 TSS를 나타낼 구조체가 필요하기에, `x86_64` 크레이트가 제공하는 [`TaskStateSegment` 구조체][`TaskStateSegment` struct]를 사용하겠습니다. [`TaskStateSegment` struct]: https://docs.rs/x86_64/0.14.2/x86_64/structures/tss/struct.TaskStateSegment.html 새로운 모듈 `gdt`에 TSS를 생성합니다 (모듈 이름이 왜 gdt인지는 이후에 납득이 가실 겁니다). ```rust // in src/lib.rs pub mod gdt; // in src/gdt.rs use x86_64::VirtAddr; use x86_64::structures::tss::TaskStateSegment; use lazy_static::lazy_static; pub const DOUBLE_FAULT_IST_INDEX: u16 = 0; lazy_static! { static ref TSS: TaskStateSegment = { let mut tss = TaskStateSegment::new(); tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize] = { const STACK_SIZE: usize = 4096 * 5; static mut STACK: [u8; STACK_SIZE] = [0; STACK_SIZE]; let stack_start = VirtAddr::from_ptr(&raw const STACK); let stack_end = stack_start + STACK_SIZE; stack_end }; tss }; } ``` Rust의 const evaluator가 위와 같은 TSS의 초기화를 컴파일 중에 진행하지 못해서 `lazy_static`을 사용합니다. 
IST의 0번째 엔트리가 더블 폴트 스택이 되도록 정합니다 (꼭 0번째일 필요는 없음). 그다음 더블 폴트 스택의 최상단 주소를 IST의 0번째 엔트리에 저장합니다. 스택의 최상단 주소를 저장하는 이유는 x86 시스템에서 스택은 높은 주소에서 출발해 낮은 주소 영역 쪽으로 성장하기 때문입니다. 우리가 아직 커널에 메모리 관리 (memory management) 기능을 구현하지 않아서 스택을 할당할 정규적인 방법이 없습니다. 임시방편으로 `static mut` 배열을 스택 메모리인 것처럼 사용할 것입니다. 배열은 꼭 `static`이 아닌 `static mut`로 설정해야 하는데, 그 이유는 부트로더가 `static` 변수를 읽기 전용 메모리 페이지에 배치하기 때문입니다. 이 더블 폴트 스택에 스택 오버플로우를 감지하기 위한 보호 페이지가 없다는 것에 유의해야 합니다. 더블 폴트 스택에서 스택 오버플로우가 발생하면 스택 아래의 메모리 영역을 일부 덮어쓸 수 있기 때문에, 더블 폴트 처리 함수 안에서 스택 메모리를 과도하게 소모해서는 안됩니다. #### TSS 불러오기 새로운 TSS도 만들었으니, 이제 CPU에게 이 TSS를 쓰도록 지시할 방법이 필요합니다. TSS가 역사적 이유로 인해 세그멘테이션 (segmentation) 시스템을 사용하는 탓에, CPU에 TSS를 쓰도록 지시하는 과정이 꽤 번거롭습니다. TSS를 직접 불러오는 대신, [전역 서술자 테이블 (Global Descriptor Table; GDT)][Global Descriptor Table]을 가리키는 새로운 세그먼트 서술자 (segment descriptor)를 추가해야 합니다. 그 후 [`ltr` 명령어][`ltr` instruction]에 GDT 안에서의 TSS의 인덱스를 주고 호출하여 TSS를 불러올 수 있습니다. (이것이 모듈 이름을 `gdt`로 설정한 이유입니다.) [Global Descriptor Table]: https://web.archive.org/web/20190217233448/https://www.flingos.co.uk/docs/reference/Global-Descriptor-Table/ [`ltr` instruction]: https://www.felixcloutier.com/x86/ltr ### 전역 서술자 테이블 (Global Descriptor Table) 전역 서술자 테이블 (Global Descriptor Table; GDT)는 메모리 페이징이 표준이 되기 이전, [메모리 세그멘테이션 (memory segmentation)][memory segmentation]을 지원하는 데 쓰인 오래된 물건입니다. 64비트 모드에서도 여전히 여러 쓰임새가 있는데, 커널/사용자 모드 설정 및 TSS 불러오기 등의 용도에 쓰입니다. [memory segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation GDT는 프로그램의 _세그먼트_ 들을 저장하는 구조체입니다. 메모리 페이징이 표준화되어 쓰이기 이전의 오래된 아키텍처들에서 프로그램들을 서로 격리할 때 GDT를 사용했습니다. 세그멘테이션에 대한 자세한 정보는 무료 공개된 [책 “Three Easy Pieces”][“Three Easy Pieces” book]의 Segmentation 챕터를 참고해 주세요. 세그멘테이션은 64비트 모드에서는 더 이상 지원되지 않지만, 그래도 GDT는 남아 있습니다. GDT는 대체로 2가지 용도에 쓰입니다: 1) 커널 공간과 사용자 공간 사이 교체를 진행할 때. 2) TSS 구조체를 불러올 때. [“Three Easy Pieces” book]: http://pages.cs.wisc.edu/~remzi/OSTEP/ #### GDT 만들기 static 변수 `TSS`의 세그먼트를 포함하는 static `GDT`를 만듭니다. 
```rust // in src/gdt.rs use x86_64::structures::gdt::{GlobalDescriptorTable, Descriptor}; lazy_static! { static ref GDT: GlobalDescriptorTable = { let mut gdt = GlobalDescriptorTable::new(); gdt.add_entry(Descriptor::kernel_code_segment()); gdt.add_entry(Descriptor::tss_segment(&TSS)); gdt }; } ``` 이전처럼 `lazy_static`을 사용했습니다. 코드 세그먼트와 TSS 세그먼트를 포함하는 GDT를 만듭니다. #### GDT 불러오기 GDT를 불러오는 용도의 함수 `gdt::init` 함수를 만들고, `init` 함수로부터 해당 함수를 호출합니다. ```rust // in src/gdt.rs pub fn init() { GDT.load(); } // in src/lib.rs pub fn init() { gdt::init(); interrupts::init_idt(); } ``` 이제 GDT를 불러온 상태입니다만 (`_start` 함수가 `init` 함수를 호출했기 때문에), 여전히 커널 스택 오버플로우 발생 시 커널이 무한 재부팅 루프에 갇힙니다. ### 최종 단계 세그먼트 레지스터 및 TSS 레지스터가 기존의 GDT로부터 읽어온 값들을 저장하고 있는 탓에, 우리가 만든 GDT의 세그먼트들이 활성화되지 않은 상황입니다. 또한 더블 폴트 처리 함수가 새로운 스택을 쓰도록 IDT에서 더블 폴트 처리 함수의 엔트리를 알맞게 수정해야 합니다. 정리하자면 우리는 아래의 작업을 순차적으로 진행해야 합니다. 1. **code segment 레지스터의 값 갱신하기**: GDT를 변경하였으니 코드 세그먼트 레지스터 `cs`의 값도 갱신해야 합니다. 기존의 세그먼트 선택자는 새 GDT 안에서 코드 세그먼트가 아닌 다른 세그먼트의 선택자와 동일할 수도 있습니다 (예: TSS 선택자). 2. **TSS 불러오기**: GDT와 TSS 선택자를 불러오고, 그 후 CPU가 해당 TSS를 사용하도록 지시해야 합니다. 3. **IDT 엔트리 수정하기**: TSS를 불러온 시점부터 CPU는 유효한 인터럽트 스택 테이블 (IST)에 접근할 수 있습니다. 앞으로 더블 폴트 발생 시 CPU가 새로운 더블 폴트 스택으로 교체하도록, IDT에서 더블 폴트에 대응하는 엔트리를 알맞게 수정합니다. 첫 두 단계를 수행하려면 `gdt::init` 함수에서 두 변수 `code_selector`와 `tss_selector`에 대한 접근할 수 있어야 합니다. `Selectors` 라는 새로운 구조체를 통해 해당 변수들을 `gdt::init` 함수에서 접근할 수 있게 만듭니다. ```rust // in src/gdt.rs use x86_64::structures::gdt::SegmentSelector; lazy_static! { static ref GDT: (GlobalDescriptorTable, Selectors) = { let mut gdt = GlobalDescriptorTable::new(); let code_selector = gdt.add_entry(Descriptor::kernel_code_segment()); let tss_selector = gdt.add_entry(Descriptor::tss_segment(&TSS)); (gdt, Selectors { code_selector, tss_selector }) }; } struct Selectors { code_selector: SegmentSelector, tss_selector: SegmentSelector, } ``` 이제 접근 가능해진 선택자들을 사용해 `cs` 레지스터의 값을 갱신하고 우리가 만든 `TSS`를 불러옵니다. 
```rust // in src/gdt.rs pub fn init() { use x86_64::instructions::tables::load_tss; use x86_64::instructions::segmentation::{CS, Segment}; GDT.0.load(); unsafe { CS::set_reg(GDT.1.code_selector); load_tss(GDT.1.tss_selector); } } ``` [`set_cs`] 함수로 코드 세그먼트 레지스터의 값을 갱신하고, [`load_tss`] 함수로 우리가 만든 TSS를 불러옵니다. 이 함수들은 `unsafe` 함수로 정의되어 있어 `unsafe` 블록 안에서만 호출할 수 있습니다. 이 함수들이 `unsafe`로 정의된 이유는 해당 함수들에 유효하지 않은 선택자를 전달할 경우 메모리 안전성을 해칠 수 있기 때문입니다. [`set_cs`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/segmentation/fn.set_cs.html [`load_tss`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/tables/fn.load_tss.html 유효한 TSS와 인터럽트 스택 테이블을 불러왔으니, 이제 더블 폴트 처리 함수가 사용할 스택의 인덱스를 IDT에서 지정해 봅시다. ```rust // in src/interrupts.rs use crate::gdt; lazy_static! { static ref IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); idt.breakpoint.set_handler_fn(breakpoint_handler); unsafe { idt.double_fault.set_handler_fn(double_fault_handler) .set_stack_index(gdt::DOUBLE_FAULT_IST_INDEX); // 새롭게 추가함 } idt }; } ``` `set_stack_index`가 unsafe 함수인 이유는, 이 함수를 호출하는 측에서 인덱스가 유효하고 다른 예외 처리 시 사용 중이지 않다는 것을 보장해야 하기 때문입니다. 수고하셨습니다! 이제부터 더블 폴트가 일어난다면 CPU는 스택을 더블 폴트 스택으로 교체할 것입니다. 드디어 커널 스택 오버플로우가 발생하는 상황을 포함하여 더블 폴트가 일어나는 _어떤 경우라도_ 더블 폴트를 처리할 수 있게 됐습니다. ![QEMU printing `EXCEPTION: DOUBLE FAULT` and a dump of the exception stack frame](qemu-double-fault-on-stack-overflow.png) 앞으로 트리플 폴트를 볼 일은 없을 겁니다! 위에서 구현한 내용을 우리가 미래에 실수로라도 훼손하지 않도록, 위 구현의 작동을 점검하는 테스트를 추가해 보겠습니다. ## 커널 스택 오버플로우 테스트 `gdt` 모듈을 테스트하고 커널 스택 오버플로우 발생 시 더블 폴트 처리 함수가 호출되는지 확인하는 용도의 통합 테스트를 추가할 것입니다. 테스트 함수에서 더블 폴트를 일으킨 후에 더블 폴트 처리 함수가 호출되었는지 확인하는 테스트를 작성하겠습니다. 최소한의 뼈대 코드에서부터 테스트 작성을 시작해 봅시다. ```rust // in tests/stack_overflow.rs #![no_std] #![no_main] use core::panic::PanicInfo; #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { unimplemented!(); } #[panic_handler] fn panic(info: &PanicInfo) -> ! 
{ blog_os::test_panic_handler(info) } ``` 이전에 작성한 `panic_handler` 테스트처럼 이 테스트 또한 [테스트 하네스 (test harness) 없이][without a test harness] 실행될 것입니다. 그 이유는 더블 폴트가 발생한 후에는 프로그램의 정상 실행을 재개할 수가 없기 때문에 어차피 1개 이상의 테스트를 두는 것이 의미가 없기 때문입니다. 테스트 하네스를 사용하지 않도록 `Cargo.toml`에 아래의 코드를 추가합니다. ```toml # in Cargo.toml [[test]] name = "stack_overflow" harness = false ``` [without a test harness]: @/edition-2/posts/04-testing/index.ko.md#no-harness-tests `cargo test --test stack_overflow` 실행 시 컴파일은 성공할 것이고, 테스트 내의 `unimplemented` 매크로 때문에 테스트 실행은 실패할 것입니다. ### `_start` 함수 구현 `_start` 함수의 코드 구현은 아래와 같습니다. ```rust // in tests/stack_overflow.rs use blog_os::serial_print; #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { serial_print!("stack_overflow::stack_overflow...\t"); blog_os::gdt::init(); init_test_idt(); // 스택 오버플로우 일으키기 stack_overflow(); panic!("Execution continued after stack overflow"); } #[allow(unconditional_recursion)] fn stack_overflow() { stack_overflow(); // 재귀 호출할 때마다 반환 주소가 스택에 push 됩니다 volatile::Volatile::new(0).read(); // "tail call elimination" 방지하기 } ``` `gdt::init` 함수를 호출해 새 GDT를 초기화합니다. `interrupts::init_idt` 함수 대신 `init_test_idt` 함수를 호출하는데, 그 이유는 패닉하지 않고 `exit_qemu(QemuExitCode::Success)`를 호출하는 새로운 더블 폴트 처리 함수를 등록해 사용할 것이기 때문입니다. `stack_overflow` 함수는 `main.rs`에서 작성했던 것과 거의 동일합니다. 유일한 차이점은 함수 마지막에 추가로 [`Volatile`] 타입을 이용한 [volatile] 읽기를 통해 [_tail call elimination_]을 방지한다는 것입니다. 주어진 함수의 맨 마지막 구문이 재귀 함수에 대한 호출인 경우, 컴파일러는 tail call elimination 기법을 통해 재귀 함수 호출을 평범한 반복문으로 변환할 수 있습니다. 그렇게 하면 재귀 함수 호출 시 새로운 스택 프레임이 생성되지 않고, 스택 메모리 사용량은 일정하게 유지됩니다. [volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming) [`Volatile`]: https://docs.rs/volatile/0.2.6/volatile/struct.Volatile.html [_tail call elimination_]: https://en.wikipedia.org/wiki/Tail_call 이 테스트에서 우리는 스택 오버플로우가 발생하기를 원하기 때문에, 함수의 맨 마지막에 컴파일러가 제거할 수 없는 volatile 읽기 작업을 삽입합니다. 따라서 `stack_overflow` 함수는 더 이상 _꼬리 재귀 (tail recursive)_ 함수가 아니게 되고, tail call elimination 기법을 통한 최적화 역시 할 수 없게 됩니다. 
또 `allow(unconditional_recursion)` 속성을 함수에 추가해 "함수가 무한히 재귀한다"는 경고 메시지가 출력되지 않게 합니다. ### 테스트용 IDT 위에서 언급했듯이, 살짝 변경된 새로운 더블 폴트 처리 함수가 등록된 테스트용 IDT가 필요합니다. 테스트 용 IDT의 구현은 아래와 같습니다. ```rust // in tests/stack_overflow.rs use lazy_static::lazy_static; use x86_64::structures::idt::InterruptDescriptorTable; lazy_static! { static ref TEST_IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); unsafe { idt.double_fault .set_handler_fn(test_double_fault_handler) .set_stack_index(blog_os::gdt::DOUBLE_FAULT_IST_INDEX); } idt }; } pub fn init_test_idt() { TEST_IDT.load(); } ``` 코드 구현은 우리가 `interrupts.rs`에서 작성한 IDT와 매우 흡사합니다. 기존과 마찬가지로 더블 폴트 처리 함수가 사용할 스택의 인덱스를 정해줍니다. `init_test_idt` 함수는 `load` 함수를 통해 테스트 용 IDT를 CPU로 불러옵니다. ### 더블 폴트 처리 함수 마지막 남은 단계는 더블 폴트 처리 함수를 작성하는 것입니다. 코드 구현은 아래와 같습니다. ```rust // in tests/stack_overflow.rs use blog_os::{exit_qemu, QemuExitCode, serial_println}; use x86_64::structures::idt::InterruptStackFrame; extern "x86-interrupt" fn test_double_fault_handler( _stack_frame: InterruptStackFrame, _error_code: u64, ) -> ! { serial_println!("[ok]"); exit_qemu(QemuExitCode::Success); loop {} } ``` 더블 폴트 처리 함수가 호출되면 우리는 성공 종료 코드와 함께 QEMU를 종료시키고, 테스트는 성공한 것으로 처리됩니다. 통합 테스트는 완전히 독립적인 실행 파일로 간주하기 때문에, 다시 한번 테스트 파일의 맨 위에 `#![feature(abi_x86_interrupt)]` 속성을 추가해야 합니다. `cargo test --test stack_overflow`를 통해 새로 작성한 테스트를 실행할 수 있습니다 (또는 `cargo test`로 모든 테스트 실행). 예상대로 콘솔에 `stack_overflow... [ok]` 라는 메시지가 출력될 것입니다. 테스트 코드에서 `set_stack_index`를 호출하지 않게 주석 처리한 후 테스트를 실행하면 테스트가 실패하는 것 또한 확인할 수 있을 것입니다. ## 정리 이 글에서는 더블 폴트와 더블 폴트의 발생 조건에 대해 배웠습니다. 오류 메시지를 출력하는 간단한 더블 폴트 처리 함수를 커널에 추가했고, 해당 함수의 올바른 동작을 점검하는 통합 테스트도 추가했습니다. 또한 우리는 더블 폴트 발생 시 하드웨어의 스택 교체 기능을 통해 커널 스택 오버 플로우 발생 시에도 더블 폴트가 제대로 처리되도록 구현했습니다. 구현 과정에서 Task State Segment (TSS)와 그 안에 포함된 인터럽트 스택 테이블 (Interrupt Stack Table; IST), 그리고 오래된 아키텍처들에서 세그멘테이션 (segmentation)에 사용됐던 전역 서술자 테이블 (Global Descriptor Table; GDT)에 대해 배웠습니다. ## 다음 단계는 무엇일까요? 
다음 글에서는 타이머, 키보드, 네트워크 컨트롤러 등의 외부 장치로부터 전송되어 오는 인터럽트들을 처리하는 방법에 대해 설명하겠습니다. 이러한 하드웨어 인터럽트들은 예외와 마찬가지로 IDT에 등록된 처리 함수를 통해 처리된다는 점에서 유사합니다. 인터럽트가 예외와 다른 점은 예외와 달리 CPU로부터 발생하지 않는다는 것입니다. 대신에 _인터럽트 컨트롤러 (interrupt controller)_ 가 외부 장치로부터 전송되어 오는 인터럽트들을 수합한 후 인터럽트 우선 순위에 맞춰 CPU로 인터럽트들을 전달합니다. 다음 글에서 [Intel 8259] (“PIC”) 인터럽트 컨트롤러에 대해 알아보고, 키보드 입력을 지원하는 법을 배울 것입니다. [Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259 ================================================ FILE: blog/content/edition-2/posts/06-double-faults/index.md ================================================ +++ title = "Double Faults" weight = 6 path = "double-fault-exceptions" date = 2018-06-18 [extra] chapter = "Interrupts" +++ This post explores the double fault exception in detail, which occurs when the CPU fails to invoke an exception handler. By handling this exception, we avoid fatal _triple faults_ that cause a system reset. To prevent triple faults in all cases, we also set up an _Interrupt Stack Table_ to catch double faults on a separate kernel stack. This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-06`][post branch] branch. [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-06 ## What is a Double Fault? In simplified terms, a double fault is a special exception that occurs when the CPU fails to invoke an exception handler. For example, it occurs when a page fault is triggered but there is no page fault handler registered in the [Interrupt Descriptor Table][IDT] (IDT). So it's kind of similar to catch-all blocks in programming languages with exceptions, e.g., `catch(...)` in C++ or `catch(Exception e)` in Java or C#. 
[IDT]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-descriptor-table A double fault behaves like a normal exception. It has the vector number `8` and we can define a normal handler function for it in the IDT. It is really important to provide a double fault handler, because if a double fault is unhandled, a fatal _triple fault_ occurs. Triple faults can't be caught, and most hardware reacts with a system reset. ### Triggering a Double Fault Let's provoke a double fault by triggering an exception for which we didn't define a handler function: ```rust // in src/main.rs #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { println!("Hello World{}", "!"); blog_os::init(); // trigger a page fault unsafe { *(0xdeadbeef as *mut u8) = 42; }; // as before #[cfg(test)] test_main(); println!("It did not crash!"); loop {} } ``` We use `unsafe` to write to the invalid address `0xdeadbeef`. The virtual address is not mapped to a physical address in the page tables, so a page fault occurs. We haven't registered a page fault handler in our [IDT], so a double fault occurs. When we start our kernel now, we see that it enters an endless boot loop. The reason for the boot loop is the following: 1. The CPU tries to write to `0xdeadbeef`, which causes a page fault. 2. The CPU looks at the corresponding entry in the IDT and sees that no handler function is specified. Thus, it can't call the page fault handler and a double fault occurs. 3. The CPU looks at the IDT entry of the double fault handler, but this entry does not specify a handler function either. Thus, a _triple_ fault occurs. 4. A triple fault is fatal. QEMU reacts to it like most real hardware and issues a system reset. So in order to prevent this triple fault, we need to either provide a handler function for page faults or a double fault handler. We want to avoid triple faults in all cases, so let's start with a double fault handler that is invoked for all unhandled exception types. 
## A Double Fault Handler A double fault is a normal exception with an error code, so we can specify a handler function similar to our breakpoint handler: ```rust // in src/interrupts.rs lazy_static! { static ref IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); idt.breakpoint.set_handler_fn(breakpoint_handler); idt.double_fault.set_handler_fn(double_fault_handler); // new idt }; } // new extern "x86-interrupt" fn double_fault_handler( stack_frame: InterruptStackFrame, _error_code: u64) -> ! { panic!("EXCEPTION: DOUBLE FAULT\n{:#?}", stack_frame); } ``` Our handler prints a short error message and dumps the exception stack frame. The error code of the double fault handler is always zero, so there's no reason to print it. One difference to the breakpoint handler is that the double fault handler is [_diverging_]. The reason is that the `x86_64` architecture does not permit returning from a double fault exception. [_diverging_]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html When we start our kernel now, we should see that the double fault handler is invoked: ![QEMU printing `EXCEPTION: DOUBLE FAULT` and the exception stack frame](qemu-catch-double-fault.png) It worked! Here is what happened this time: 1. The CPU tries to write to `0xdeadbeef`, which causes a page fault. 2. Like before, the CPU looks at the corresponding entry in the IDT and sees that no handler function is defined. Thus, a double fault occurs. 3. The CPU jumps to the – now present – double fault handler. The triple fault (and the boot-loop) no longer occurs, since the CPU can now call the double fault handler. That was quite straightforward! So why do we need a whole post for this topic? Well, we're now able to catch _most_ double faults, but there are some cases where our current approach doesn't suffice. ## Causes of Double Faults Before we look at the special cases, we need to know the exact causes of double faults. 
Above, we used a pretty vague definition:

> A double fault is a special exception that occurs when the CPU fails to invoke an exception handler.

What does _“fails to invoke”_ mean exactly? The handler is not present? The handler is [swapped out]? And what happens if a handler causes exceptions itself?

[swapped out]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-beyondphys.pdf

For example, what happens if:

1. a breakpoint exception occurs, but the corresponding handler function is swapped out?
2. a page fault occurs, but the page fault handler is swapped out?
3. a divide-by-zero handler causes a breakpoint exception, but the breakpoint handler is swapped out?
4. our kernel overflows its stack and the _guard page_ is hit?

Fortunately, the AMD64 manual ([PDF][AMD64 manual]) has an exact definition (in Section 8.2.9). According to it, a “double fault exception _can_ occur when a second exception occurs during the handling of a prior (first) exception handler”. The _“can”_ is important: Only very specific combinations of exceptions lead to a double fault. These combinations are:

| First Exception | Second Exception |
| --------------- | ---------------- |
| [Divide-by-zero],<br>[Invalid TSS],<br>[Segment Not Present],<br>[Stack-Segment Fault],<br>[General Protection Fault] | [Invalid TSS],<br>[Segment Not Present],<br>[Stack-Segment Fault],<br>[General Protection Fault] |
| [Page Fault] | [Page Fault],<br>[Invalid TSS],<br>[Segment Not Present],<br>[Stack-Segment Fault],<br>[General Protection Fault] |

[Divide-by-zero]: https://wiki.osdev.org/Exceptions#Division_Error
[Invalid TSS]: https://wiki.osdev.org/Exceptions#Invalid_TSS
[Segment Not Present]: https://wiki.osdev.org/Exceptions#Segment_Not_Present
[Stack-Segment Fault]: https://wiki.osdev.org/Exceptions#Stack-Segment_Fault
[General Protection Fault]: https://wiki.osdev.org/Exceptions#General_Protection_Fault
[Page Fault]: https://wiki.osdev.org/Exceptions#Page_Fault
[AMD64 manual]: https://www.amd.com/system/files/TechDocs/24593.pdf

So, for example, a divide-by-zero fault followed by a page fault is fine (the page fault handler is invoked), but a divide-by-zero fault followed by a general-protection fault leads to a double fault.

With the help of this table, we can answer the first three of the above questions:

1. If a breakpoint exception occurs and the corresponding handler function is swapped out, a _page fault_ occurs and the _page fault handler_ is invoked.
2. If a page fault occurs and the page fault handler is swapped out, a _double fault_ occurs and the _double fault handler_ is invoked.
3. If a divide-by-zero handler causes a breakpoint exception, the CPU tries to invoke the breakpoint handler. If the breakpoint handler is swapped out, a _page fault_ occurs and the _page fault handler_ is invoked.

In fact, even the case of an exception without a handler function in the IDT follows this scheme: When the exception occurs, the CPU tries to read the corresponding IDT entry. Since the entry is 0, which is not a valid IDT entry, a _general protection fault_ occurs. We did not define a handler function for the general protection fault either, so another general protection fault occurs. According to the table, this leads to a double fault.

### Kernel Stack Overflow

Let's look at the fourth question:

> What happens if our kernel overflows its stack and the guard page is hit?

A guard page is a special memory page at the bottom of a stack that makes it possible to detect stack overflows. The page is not mapped to any physical frame, so accessing it causes a page fault instead of silently corrupting other memory. The bootloader sets up a guard page for our kernel stack, so a stack overflow causes a _page fault_. When a page fault occurs, the CPU looks up the page fault handler in the IDT and tries to push the [interrupt stack frame] onto the stack. However, the current stack pointer still points to the non-present guard page. Thus, a second page fault occurs, which causes a double fault (according to the above table). [interrupt stack frame]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-stack-frame So the CPU tries to call the _double fault handler_ now. However, on a double fault, the CPU tries to push the exception stack frame, too. The stack pointer still points to the guard page, so a _third_ page fault occurs, which causes a _triple fault_ and a system reboot. So our current double fault handler can't avoid a triple fault in this case. Let's try it ourselves! We can easily provoke a kernel stack overflow by calling a function that recurses endlessly: ```rust // in src/main.rs #[unsafe(no_mangle)] // don't mangle the name of this function pub extern "C" fn _start() -> ! { println!("Hello World{}", "!"); blog_os::init(); fn stack_overflow() { stack_overflow(); // for each recursion, the return address is pushed } // trigger a stack overflow stack_overflow(); […] // test_main(), println(…), and loop {} } ``` When we try this code in QEMU, we see that the system enters a bootloop again. So how can we avoid this problem? We can't omit the pushing of the exception stack frame, since the CPU itself does it. So we need to ensure somehow that the stack is always valid when a double fault exception occurs. Fortunately, the x86_64 architecture has a solution to this problem. 
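
The cascade described in this section can be traced with a small model of the combination table from the previous section. This is only a sketch: it runs as an ordinary hosted Rust program rather than as kernel code, the `Exception` enum and the `causes_double_fault` function are hypothetical names introduced for illustration, and the model encodes nothing beyond the two table rows.

```rust
/// The exception types named in this post (an illustrative subset,
/// not the full x86_64 exception list).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Exception {
    DivideByZero,
    Breakpoint,
    InvalidTss,
    SegmentNotPresent,
    StackSegmentFault,
    GeneralProtectionFault,
    PageFault,
}

/// Second-column entries that appear in both rows of the table.
fn in_shared_second_column(e: Exception) -> bool {
    matches!(
        e,
        Exception::InvalidTss
            | Exception::SegmentNotPresent
            | Exception::StackSegmentFault
            | Exception::GeneralProtectionFault
    )
}

/// Returns true if `second`, raised while `first` is being handled,
/// leads to a double fault according to the table.
fn causes_double_fault(first: Exception, second: Exception) -> bool {
    match first {
        // First table row.
        Exception::DivideByZero
        | Exception::InvalidTss
        | Exception::SegmentNotPresent
        | Exception::StackSegmentFault
        | Exception::GeneralProtectionFault => in_shared_second_column(second),
        // Second table row: a page fault additionally combines with a page fault.
        Exception::PageFault => {
            second == Exception::PageFault || in_shared_second_column(second)
        }
        // Not listed as a first exception in the table.
        Exception::Breakpoint => false,
    }
}

fn main() {
    // Stack overflow: the guard page is hit (first page fault), then pushing
    // the interrupt stack frame faults again (second page fault).
    let first = Exception::PageFault;
    let second = Exception::PageFault;
    println!(
        "{:?} during {:?} -> double fault: {}",
        second,
        first,
        causes_double_fault(first, second)
    );
}
```

In this model the stack overflow case is the `PageFault` followed by `PageFault` combination. The table cannot express the step after that: the double fault handler itself cannot be invoked, because the stack frame push fails a third time.
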
## Switching Stacks

The x86_64 architecture is able to switch to a predefined, known-good stack when an exception occurs. This switch happens at hardware level, so it can be performed before the CPU pushes the exception stack frame.

The switching mechanism is implemented as an _Interrupt Stack Table_ (IST). The IST is a table of 7 pointers to known-good stacks. In Rust-like pseudocode:

```rust
struct InterruptStackTable {
    stack_pointers: [Option<StackPointer>; 7],
}
```

For each exception handler, we can choose a stack from the IST through the `stack_pointers` field in the corresponding [IDT entry]. For example, our double fault handler could use the first stack in the IST. Then the CPU automatically switches to this stack whenever a double fault occurs. This switch would happen before anything is pushed, preventing the triple fault.

[IDT entry]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-descriptor-table

### The IST and TSS

The Interrupt Stack Table (IST) is part of an old legacy structure called _[Task State Segment]_ \(TSS). The TSS used to hold various pieces of information (e.g., processor register state) about a task in 32-bit mode and was, for example, used for [hardware context switching]. However, hardware context switching is no longer supported in 64-bit mode and the format of the TSS has changed completely.

[Task State Segment]: https://en.wikipedia.org/wiki/Task_state_segment
[hardware context switching]: https://wiki.osdev.org/Context_Switching#Hardware_Context_Switching

On x86_64, the TSS no longer holds any task-specific information at all. Instead, it holds two stack tables (the IST is one of them). The only common field between the 32-bit and 64-bit TSS is the pointer to the [I/O port permissions bitmap].

[I/O port permissions bitmap]: https://en.wikipedia.org/wiki/Task_state_segment#I.2FO_port_permissions

The 64-bit TSS has the following format:

| Field                 | Type       |
| --------------------- | ---------- |
| (reserved)            | `u32`      |
| Privilege Stack Table | `[u64; 3]` |
| (reserved)            | `u64`      |
| Interrupt Stack Table | `[u64; 7]` |
| (reserved)            | `u64`      |
| (reserved)            | `u16`      |
| I/O Map Base Address  | `u16`      |

The _Privilege Stack Table_ is used by the CPU when the privilege level changes. For example, if an exception occurs while the CPU is in user mode (privilege level 3), the CPU normally switches to kernel mode (privilege level 0) before invoking the exception handler. In that case, the CPU would switch to the 0th stack in the Privilege Stack Table (since 0 is the target privilege level). We don't have any user-mode programs yet, so we will ignore this table for now.

### Creating a TSS

Let's create a new TSS that contains a separate double fault stack in its interrupt stack table. For that, we need a TSS struct. Fortunately, the `x86_64` crate already contains a [`TaskStateSegment` struct] that we can use.

[`TaskStateSegment` struct]: https://docs.rs/x86_64/0.14.2/x86_64/structures/tss/struct.TaskStateSegment.html

We create the TSS in a new `gdt` module (the name will make sense later):

```rust
// in src/lib.rs

pub mod gdt;

// in src/gdt.rs

use x86_64::VirtAddr;
use x86_64::structures::tss::TaskStateSegment;
use lazy_static::lazy_static;

pub const DOUBLE_FAULT_IST_INDEX: u16 = 0;

lazy_static! {
    static ref TSS: TaskStateSegment = {
        let mut tss = TaskStateSegment::new();
        tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize] = {
            const STACK_SIZE: usize = 4096 * 5;
            static mut STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];

            let stack_start = VirtAddr::from_ptr(&raw const STACK);
            let stack_end = stack_start + STACK_SIZE;
            stack_end
        };
        tss
    };
}
```

We use `lazy_static` because Rust's const evaluator is not yet powerful enough to do this initialization at compile time. We define that the 0th IST entry is the double fault stack (any other IST index would work too). Then we write the top address of a double fault stack to the 0th entry. We write the top address because stacks on x86 grow downwards, i.e., from high addresses to low addresses.

We haven't implemented memory management yet, so we don't have a proper way to allocate a new stack. Instead, we use a `static mut` array as stack storage for now. It is important that it is a `static mut` and not an immutable `static`, because otherwise the bootloader will map it to a read-only page. We will replace this with a proper stack allocation in a later post.

Note that this double fault stack has no guard page that protects against stack overflow. This means that we should not do anything stack-intensive in our double fault handler because a stack overflow might corrupt the memory below the stack.

#### Loading the TSS

Now that we've created a new TSS, we need a way to tell the CPU that it should use it. Unfortunately, this is a bit cumbersome since the TSS uses the segmentation system (for historical reasons). Instead of loading the table directly, we need to add a new segment descriptor to the [Global Descriptor Table] \(GDT). Then we can load our TSS by invoking the [`ltr` instruction] with the respective GDT index. (This is the reason why we named our module `gdt`.)

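
As a cross-check of the 64-bit TSS format table above, the layout can be mirrored field by field in a plain struct and measured. The following is only a sketch for a hosted Rust program: `RawTss`, `tss_size`, and `ist_entry_offset` are hypothetical names used for illustration, and the kernel itself keeps using the `TaskStateSegment` type from the `x86_64` crate.

```rust
use std::mem::size_of;

/// Field-by-field mirror of the 64-bit TSS table (illustration only).
/// `packed` is needed because a `u64` field directly follows a `u32` field;
/// plain `repr(C)` would insert four bytes of padding there.
#[allow(dead_code)]
#[repr(C, packed)]
struct RawTss {
    reserved_1: u32,
    privilege_stack_table: [u64; 3],
    reserved_2: u64,
    interrupt_stack_table: [u64; 7],
    reserved_3: u64,
    reserved_4: u16,
    iomap_base: u16,
}

/// Total size of the mirrored structure in bytes.
fn tss_size() -> usize {
    size_of::<RawTss>()
}

/// Byte offset of IST entry `index` (0-based), summed from the fields
/// that precede the Interrupt Stack Table in the table.
fn ist_entry_offset(index: usize) -> usize {
    assert!(index < 7, "the IST has only 7 entries");
    size_of::<u32>() + size_of::<[u64; 3]>() + size_of::<u64>() + index * size_of::<u64>()
}

fn main() {
    println!("TSS size: {} bytes", tss_size());
    println!("IST entry 0 starts at byte offset {}", ist_entry_offset(0));
}
```

Adding up the column types gives 4 + 24 + 8 + 56 + 8 + 2 + 2 = 104 bytes, with the entry used for the double fault stack (index 0) starting at byte 36.
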
[Global Descriptor Table]: https://web.archive.org/web/20190217233448/https://www.flingos.co.uk/docs/reference/Global-Descriptor-Table/
[`ltr` instruction]: https://www.felixcloutier.com/x86/ltr

### The Global Descriptor Table
The Global Descriptor Table (GDT) is a relic that was used for [memory segmentation] before paging became the de facto standard. However, it is still needed in 64-bit mode for various things, such as kernel/user mode configuration or TSS loading.

[memory segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation

The GDT is a structure that contains the _segments_ of the program. It was used on older architectures to isolate programs from each other before paging became the standard. For more information about segmentation, check out the equally named chapter of the free [“Three Easy Pieces” book]. While segmentation is no longer supported in 64-bit mode, the GDT still exists. It is mostly used for two things: Switching between kernel space and user space, and loading a TSS structure.

[“Three Easy Pieces” book]: http://pages.cs.wisc.edu/~remzi/OSTEP/

#### Creating a GDT
Let's create a static `GDT` that includes a segment for our `TSS` static:

```rust
// in src/gdt.rs

use x86_64::structures::gdt::{GlobalDescriptorTable, Descriptor};

lazy_static! {
    static ref GDT: GlobalDescriptorTable = {
        let mut gdt = GlobalDescriptorTable::new();
        gdt.add_entry(Descriptor::kernel_code_segment());
        gdt.add_entry(Descriptor::tss_segment(&TSS));
        gdt
    };
}
```

As before, we use `lazy_static` again. We create a new GDT with a code segment and a TSS segment.

#### Loading the GDT

To load our GDT, we create a new `gdt::init` function that we call from our `init` function:

```rust
// in src/gdt.rs

pub fn init() {
    GDT.load();
}

// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
}
```

Now our GDT is loaded (since the `_start` function calls `init`), but we still see the boot loop on stack overflow.
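
GDT entries are referred to by index through a _segment selector_. As a rough sketch, assuming the standard x86 selector layout (table index in bits 3 to 15, table indicator in bit 2, requested privilege level in bits 0 and 1), a selector can be built and taken apart as shown below. The function names are hypothetical and not the `x86_64` crate's API.

```rust
/// Builds a selector for a GDT entry (table indicator bit cleared).
fn gdt_selector(index: u16, rpl: u16) -> u16 {
    (index << 3) | (rpl & 0b11)
}

/// Splits a selector into (table index, refers to LDT, requested privilege level).
fn decode_selector(selector: u16) -> (u16, bool, u16) {
    (selector >> 3, selector & 0b100 != 0, selector & 0b11)
}

fn main() {
    // Assumption: slot 0 holds the mandatory null descriptor, so a code
    // segment added first lands in slot 1 and a TSS descriptor in slot 2.
    let code = gdt_selector(1, 0);
    let tss = gdt_selector(2, 0);
    println!("code selector = {:#04x}, tss selector = {:#04x}", code, tss);
    println!("decoded tss selector = {:?}", decode_selector(tss));
}
```

Because the low three bits carry flags, consecutive GDT slots map to selector values that are 8 apart.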
### The Final Steps

The problem is that the GDT segments are not yet active because the segment and TSS registers still contain the values from the old GDT. We also need to modify the double fault IDT entry so that it uses the new stack.

In summary, we need to do the following:

1. **Reload code segment register**: We changed our GDT, so we should reload `cs`, the code segment register. This is required since the old segment selector could now point to a different GDT descriptor (e.g., a TSS descriptor).
2. **Load the TSS**: We loaded a GDT that contains a TSS selector, but we still need to tell the CPU that it should use that TSS.
3. **Update the IDT entry**: As soon as our TSS is loaded, the CPU has access to a valid interrupt stack table (IST). Then we can tell the CPU that it should use our new double fault stack by modifying our double fault IDT entry.

For the first two steps, we need access to the `code_selector` and `tss_selector` variables in our `gdt::init` function. We can achieve this by making them part of the static through a new `Selectors` struct:

```rust
// in src/gdt.rs

use x86_64::structures::gdt::SegmentSelector;

lazy_static! {
    static ref GDT: (GlobalDescriptorTable, Selectors) = {
        let mut gdt = GlobalDescriptorTable::new();
        let code_selector = gdt.add_entry(Descriptor::kernel_code_segment());
        let tss_selector = gdt.add_entry(Descriptor::tss_segment(&TSS));
        (gdt, Selectors { code_selector, tss_selector })
    };
}

struct Selectors {
    code_selector: SegmentSelector,
    tss_selector: SegmentSelector,
}
```

Now we can use the selectors to reload the `cs` register and load our `TSS`:

```rust
// in src/gdt.rs

pub fn init() {
    use x86_64::instructions::tables::load_tss;
    use x86_64::instructions::segmentation::{CS, Segment};

    GDT.0.load();
    unsafe {
        CS::set_reg(GDT.1.code_selector);
        load_tss(GDT.1.tss_selector);
    }
}
```

We reload the code segment register using [`CS::set_reg`] and load the TSS using [`load_tss`]. The functions are marked as `unsafe`, so we need an `unsafe` block to invoke them. The reason is that it might be possible to break memory safety by loading invalid selectors.

[`CS::set_reg`]: https://docs.rs/x86_64/0.14.5/x86_64/instructions/segmentation/struct.CS.html#method.set_reg
[`load_tss`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/tables/fn.load_tss.html

Now that we have loaded a valid TSS and interrupt stack table, we can set the stack index for our double fault handler in the IDT:

```rust
// in src/interrupts.rs

use crate::gdt;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        unsafe {
            idt.double_fault.set_handler_fn(double_fault_handler)
                .set_stack_index(gdt::DOUBLE_FAULT_IST_INDEX); // new
        }

        idt
    };
}
```

The `set_stack_index` method is unsafe because the caller must ensure that the used index is valid and not already used for another exception.

That's it! Now the CPU should switch to the double fault stack whenever a double fault occurs. Thus, we are able to catch _all_ double faults, including kernel stack overflows:

![QEMU printing `EXCEPTION: DOUBLE FAULT` and a dump of the exception stack frame](qemu-double-fault-on-stack-overflow.png)

From now on, we should never see a triple fault again! To ensure that we don't accidentally break the above, we should add a test for this.

## A Stack Overflow Test

To test our new `gdt` module and ensure that the double fault handler is correctly called on a stack overflow, we can add an integration test. The idea is to provoke a double fault in the test function and verify that the double fault handler is called.

Let's start with a minimal skeleton:

```rust
// in tests/stack_overflow.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

Like our `panic_handler` test, the test will run [without a test harness]. The reason is that we can't continue execution after a double fault, so more than one test doesn't make sense. To disable the test harness for the test, we add the following to our `Cargo.toml`:

```toml
# in Cargo.toml

[[test]]
name = "stack_overflow"
harness = false
```

[without a test harness]: @/edition-2/posts/04-testing/index.md#no-harness-tests

Now `cargo test --test stack_overflow` should compile successfully. The test fails, of course, since the `unimplemented` macro panics.

### Implementing `_start`

The implementation of the `_start` function looks like this:

```rust
// in tests/stack_overflow.rs

use blog_os::serial_print;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    serial_print!("stack_overflow::stack_overflow...\t");

    blog_os::gdt::init();
    init_test_idt();

    // trigger a stack overflow
    stack_overflow();

    panic!("Execution continued after stack overflow");
}

#[allow(unconditional_recursion)]
fn stack_overflow() {
    stack_overflow(); // for each recursion, the return address is pushed
    volatile::Volatile::new(0).read(); // prevent tail recursion optimizations
}
```

We call our `gdt::init` function to initialize a new GDT. Instead of calling our `interrupts::init_idt` function, we call an `init_test_idt` function that will be explained in a moment. The reason is that we want to register a custom double fault handler that does an `exit_qemu(QemuExitCode::Success)` instead of panicking.

The `stack_overflow` function is almost identical to the function in our `main.rs`. The only difference is that at the end of the function, we perform an additional [volatile] read using the [`Volatile`] type to prevent a compiler optimization called [_tail call elimination_]. Among other things, this optimization allows the compiler to transform a function whose last statement is a recursive function call into a normal loop. Thus, no additional stack frame is created for the function call, so the stack usage remains constant.

[volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming)
[`Volatile`]: https://docs.rs/volatile/0.2.6/volatile/struct.Volatile.html
[_tail call elimination_]: https://en.wikipedia.org/wiki/Tail_call

In our case, however, we want the stack overflow to happen, so we add a dummy volatile read statement at the end of the function, which the compiler is not allowed to remove. Thus, the function is no longer _tail recursive_, and the transformation into a loop is prevented. We also add the `allow(unconditional_recursion)` attribute to silence the compiler warning that the function recurses endlessly.

### The Test IDT

As noted above, the test needs its own IDT with a custom double fault handler. The implementation looks like this:

```rust
// in tests/stack_overflow.rs

use lazy_static::lazy_static;
use x86_64::structures::idt::InterruptDescriptorTable;

lazy_static! {
    static ref TEST_IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        unsafe {
            idt.double_fault
                .set_handler_fn(test_double_fault_handler)
                .set_stack_index(blog_os::gdt::DOUBLE_FAULT_IST_INDEX);
        }

        idt
    };
}

pub fn init_test_idt() {
    TEST_IDT.load();
}
```

The implementation is very similar to our normal IDT in `interrupts.rs`. Like in the normal IDT, we set a stack index in the IST for the double fault handler in order to switch to a separate stack. The `init_test_idt` function loads the IDT on the CPU through the `load` method.

### The Double Fault Handler

The only missing piece is our double fault handler. It looks like this:

```rust
// in tests/stack_overflow.rs

use blog_os::{exit_qemu, QemuExitCode, serial_println};
use x86_64::structures::idt::InterruptStackFrame;

extern "x86-interrupt" fn test_double_fault_handler(
    _stack_frame: InterruptStackFrame,
    _error_code: u64,
) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

When the double fault handler is called, we exit QEMU with a success exit code, which marks the test as passed. Since integration tests are completely separate executables, we need to set the `#![feature(abi_x86_interrupt)]` attribute again at the top of our test file.

Now we can run our test through `cargo test --test stack_overflow` (or `cargo test` to run all tests). As expected, we see the `stack_overflow... [ok]` output in the console. Try to comment out the `set_stack_index` line; it should cause the test to fail.

## Summary
In this post, we learned what a double fault is and under which conditions it occurs. We added a basic double fault handler that prints an error message and added an integration test for it.

We also enabled the hardware-supported stack switching on double fault exceptions so that it also works on stack overflow. While implementing it, we learned about the task state segment (TSS), the contained interrupt stack table (IST), and the global descriptor table (GDT), which was used for segmentation on older architectures.

## What's next?
The next post explains how to handle interrupts from external devices such as timers, keyboards, or network controllers. These hardware interrupts are very similar to exceptions, e.g., they are also dispatched through the IDT. However, unlike exceptions, they don't arise directly on the CPU. Instead, an _interrupt controller_ aggregates these interrupts and forwards them to the CPU depending on their priority. In the next post, we will explore the [Intel 8259] \(“PIC”) interrupt controller and learn how to implement keyboard support.
[Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259

================================================
FILE: blog/content/edition-2/posts/06-double-faults/index.pt-BR.md
================================================
+++
title = "Double Faults"
weight = 6
path = "pt-BR/double-fault-exceptions"
date = 2018-06-18

[extra]
chapter = "Interrupções"
# Please update this when updating the translation
translation_based_on_commit = "9753695744854686a6b80012c89b0d850a44b4b0"

# GitHub usernames of the people that translated this post
translators = ["richarddalves"]
+++

This post explores the double fault exception in detail, which occurs when the CPU fails to invoke an exception handler. By handling this exception, we avoid fatal _triple faults_ that cause a system reset. To prevent triple faults in all cases, we also set up an _Interrupt Stack Table_ to catch double faults on a separate kernel stack.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-06`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-06

## What is a Double Fault?
In simplified terms, a double fault is a special exception that occurs when the CPU fails to invoke an exception handler. For example, it occurs when a page fault is triggered but there is no page fault handler registered in the [Interrupt Descriptor Table][IDT] (IDT). So it is somewhat similar to catch-all blocks in programming languages with exceptions, e.g., `catch(...)` in C++ or `catch(Exception e)` in Java or C#.

[IDT]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-descriptor-table

A double fault behaves like a normal exception. It has the vector number `8` and we can define a normal handler function for it in the IDT. It is really important to provide a double fault handler, because if a double fault is unhandled, a fatal _triple fault_ occurs. Triple faults can't be caught, and most hardware reacts with a system reset.

### Triggering a Double Fault
Let's provoke a double fault by triggering an exception for which we didn't define a handler function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    // trigger a page fault
    unsafe {
        *(0xdeadbeef as *mut u8) = 42;
    };

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    loop {}
}
```

We use `unsafe` to write to the invalid address `0xdeadbeef`. The virtual address is not mapped to a physical address in the page tables, so a page fault occurs. We haven't registered a page fault handler in our [IDT], so a double fault occurs.

When we start our kernel now, we see that it enters an endless boot loop. The reason for the boot loop is the following:

1. The CPU tries to write to `0xdeadbeef`, which causes a page fault.
2. The CPU looks at the corresponding entry in the IDT and sees that no handler function is specified. Thus, it can't call the page fault handler and a double fault occurs.
3. The CPU looks at the IDT entry of the double fault handler, but this entry does not specify a handler function either. Thus, a _triple_ fault occurs.
4. A triple fault is fatal. QEMU reacts to it like most real hardware and issues a system reset.

So in order to prevent this triple fault, we need to either provide a handler function for page faults or a double fault handler. We want to avoid triple faults in all cases, so let's start with a double fault handler that is invoked for all unhandled exception types.
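
The escalation listed above can be modeled in a few lines. This is a toy simulation under simplifying assumptions, not a description of how the CPU is implemented: the `ToyIdt` type and the `deliver` function are hypothetical, only missing handlers are modeled, and the sketch includes the intermediate general protection fault that a missing IDT entry raises before the double fault.

```rust
const DOUBLE_FAULT: usize = 8;
const GENERAL_PROTECTION_FAULT: usize = 13;
const PAGE_FAULT: usize = 14;

#[derive(Debug, PartialEq)]
enum Outcome {
    /// The handler registered for this vector was invoked.
    Handled(usize),
    /// Nothing could be invoked; real hardware resets the system.
    TripleFault,
}

/// Toy IDT that only remembers which vectors have a handler.
struct ToyIdt {
    present: [bool; 32],
}

impl ToyIdt {
    fn new() -> Self {
        ToyIdt { present: [false; 32] }
    }

    fn set_handler(&mut self, vector: usize) {
        self.present[vector] = true;
    }
}

/// Simplified delivery of an exception when handlers may be missing.
fn deliver(idt: &ToyIdt, vector: usize) -> Outcome {
    if idt.present[vector] {
        return Outcome::Handled(vector);
    }
    // A failure while delivering the double fault itself is fatal.
    if vector == DOUBLE_FAULT {
        return Outcome::TripleFault;
    }
    // A missing entry raises a general protection fault first.
    if idt.present[GENERAL_PROTECTION_FAULT] {
        return Outcome::Handled(GENERAL_PROTECTION_FAULT);
    }
    // The second failure escalates to a double fault.
    if idt.present[DOUBLE_FAULT] {
        return Outcome::Handled(DOUBLE_FAULT);
    }
    Outcome::TripleFault
}

fn main() {
    let mut idt = ToyIdt::new();
    println!("empty IDT: {:?}", deliver(&idt, PAGE_FAULT));
    idt.set_handler(DOUBLE_FAULT);
    println!("with double fault handler: {:?}", deliver(&idt, PAGE_FAULT));
}
```

In this model, registering a single handler for vector 8 is enough to turn every missing-handler case into a handled double fault, which is the motivation for starting with that handler.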
## A Double Fault Handler
A double fault is a normal exception with an error code, so we can specify a handler function similar to our breakpoint handler:

```rust
// in src/interrupts.rs

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt.double_fault.set_handler_fn(double_fault_handler); // new
        idt
    };
}

// new
extern "x86-interrupt" fn double_fault_handler(
    stack_frame: InterruptStackFrame, _error_code: u64) -> !
{
    panic!("EXCEPTION: DOUBLE FAULT\n{:#?}", stack_frame);
}
```

Our handler prints a short error message and dumps the exception stack frame. The error code of the double fault handler is always zero, so there's no reason to print it. One difference from the breakpoint handler is that the double fault handler is [_diverging_]. The reason is that the `x86_64` architecture does not permit returning from a double fault exception.

[_diverging_]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html

When we start our kernel now, we should see that the double fault handler is invoked:

![QEMU printing `EXCEPTION: DOUBLE FAULT` and the exception stack frame](qemu-catch-double-fault.png)

It worked! Here is what happened this time:

1. The CPU tries to write to `0xdeadbeef`, which causes a page fault.
2. Like before, the CPU looks at the corresponding entry in the IDT and sees that no handler function is defined. Thus, a double fault occurs.
3. The CPU jumps to the double fault handler, which is now present.

The triple fault (and the boot loop) no longer occurs, since the CPU can now call the double fault handler.

That was quite straightforward! So why do we need a whole post for this topic? Well, we're now able to catch _most_ double faults, but there are some cases where our current approach doesn't suffice.
## Causes of Double Faults
Before we look at the special cases, we need to know the exact causes of double faults. Above, we used a pretty vague definition:

> A double fault is a special exception that occurs when the CPU fails to invoke an exception handler.

What does _“fails to invoke”_ mean exactly? The handler is not present? The handler is [swapped out]? And what happens if a handler causes exceptions itself?

[swapped out]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-beyondphys.pdf

For example, what happens if:

1. a breakpoint exception occurs, but the corresponding handler function is swapped out?
2. a page fault occurs, but the page fault handler is swapped out?
3. a divide-by-zero handler causes a breakpoint exception, but the breakpoint handler is swapped out?
4. our kernel overflows its stack and the _guard page_ is hit?

Fortunately, the AMD64 manual ([PDF][AMD64 manual]) has an exact definition (in Section 8.2.9). According to it, a “double fault exception _can_ occur when a second exception occurs during the handling of a prior (first) exception handler”. The _“can”_ is important: Only very specific combinations of exceptions lead to a double fault. These combinations are:

| First Exception | Second Exception |
| --------------- | ---------------- |
| [Divide-by-zero],<br>[Invalid TSS],<br>[Segment Not Present],<br>[Stack-Segment Fault],<br>[General Protection Fault] | [Invalid TSS],<br>[Segment Not Present],<br>[Stack-Segment Fault],<br>[General Protection Fault] |
| [Page Fault] | [Page Fault],<br>[Invalid TSS],<br>[Segment Not Present],<br>[Stack-Segment Fault],<br>[General Protection Fault] |

[Divide-by-zero]: https://wiki.osdev.org/Exceptions#Division_Error
[Invalid TSS]: https://wiki.osdev.org/Exceptions#Invalid_TSS
[Segment Not Present]: https://wiki.osdev.org/Exceptions#Segment_Not_Present
[Stack-Segment Fault]: https://wiki.osdev.org/Exceptions#Stack-Segment_Fault
[General Protection Fault]: https://wiki.osdev.org/Exceptions#General_Protection_Fault
[Page Fault]: https://wiki.osdev.org/Exceptions#Page_Fault

[AMD64 manual]: https://www.amd.com/system/files/TechDocs/24593.pdf

So, for example, a divide-by-zero fault followed by a page fault is fine (the page fault handler is invoked), but a divide-by-zero fault followed by a general-protection fault leads to a double fault.

With the help of this table, we can answer the first three of the above questions:

1. If a breakpoint exception occurs and the corresponding handler function is swapped out, a _page fault_ occurs and the _page fault handler_ is invoked.
2. If a page fault occurs and the page fault handler is swapped out, a _double fault_ occurs and the _double fault handler_ is invoked.
3. If a divide-by-zero handler causes a breakpoint exception, the CPU tries to invoke the breakpoint handler. If the breakpoint handler is swapped out, a _page fault_ occurs and the _page fault handler_ is invoked.

In fact, even the case of an exception without a handler function in the IDT follows this scheme: When the exception occurs, the CPU tries to read the corresponding IDT entry. Since the entry is 0, which is not a valid IDT entry, a _general protection fault_ occurs. We did not define a handler function for the general protection fault either, so another general protection fault occurs. According to the table, this leads to a double fault.

### Kernel Stack Overflow
Let's look at the fourth question:

> What happens if our kernel overflows its stack and the guard page is hit?
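
As a brief aside before the answer, the combination table from the previous section can be condensed into a small predicate. This is a sketch that encodes only the rows of that table; the `Exception` enum and the `causes_double_fault` function are hypothetical names introduced for illustration, and `Breakpoint` stands in for any exception that the table does not list.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Exception {
    DivideByZero,
    Breakpoint,
    InvalidTss,
    SegmentNotPresent,
    StackSegmentFault,
    GeneralProtectionFault,
    PageFault,
}

/// Returns true if `second`, raised while `first` is being handled, leads
/// to a double fault according to the table above.
fn causes_double_fault(first: Exception, second: Exception) -> bool {
    use Exception::*;

    // The four exceptions that appear in the right column of both rows.
    let second_is_listed = matches!(
        second,
        InvalidTss | SegmentNotPresent | StackSegmentFault | GeneralProtectionFault
    );

    match first {
        // First table row.
        DivideByZero | InvalidTss | SegmentNotPresent | StackSegmentFault
        | GeneralProtectionFault => second_is_listed,
        // Second table row: a page fault additionally pairs with a page fault.
        PageFault => second == PageFault || second_is_listed,
        // Anything else is handled as an ordinary second exception.
        _ => false,
    }
}

fn main() {
    use Exception::*;
    for (first, second) in [
        (DivideByZero, PageFault),
        (DivideByZero, GeneralProtectionFault),
        (PageFault, PageFault),
    ] {
        println!(
            "{:?} then {:?}: double fault = {}",
            first,
            second,
            causes_double_fault(first, second)
        );
    }
}
```

The third pair, a page fault during the handling of a page fault, is the combination that matters for the stack overflow case discussed next.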
A guard page is a special memory page at the bottom of a stack that makes it possible to detect stack overflows. The page is not mapped to any physical frame, so accessing it causes a page fault instead of silently corrupting other memory. The bootloader sets up a guard page for our kernel stack, so a stack overflow causes a _page fault_.

When a page fault occurs, the CPU looks up the page fault handler in the IDT and tries to push the [interrupt stack frame] onto the stack. However, the current stack pointer still points to the non-present guard page. Thus, a second page fault occurs, which causes a double fault (according to the above table).

[interrupt stack frame]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-stack-frame

So the CPU tries to call the _double fault handler_ now. However, on a double fault, the CPU tries to push the exception stack frame, too. The stack pointer still points to the guard page, so a _third_ page fault occurs, which causes a _triple fault_ and a system reboot. So our current double fault handler can't avoid a triple fault in this case.

Let's try it ourselves! We can easily provoke a kernel stack overflow by calling a function that recurses endlessly:

```rust
// in src/main.rs

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    fn stack_overflow() {
        stack_overflow(); // for each recursion, the return address is pushed
    }

    // trigger a stack overflow
    stack_overflow();

    […] // test_main(), println(…), and loop {}
}
```

When we try this code in QEMU, we see that the system enters a bootloop again.

So how can we avoid this problem? We can't omit the pushing of the exception stack frame, since the CPU itself does it. So we need to ensure somehow that the stack is always valid when a double fault exception occurs. Fortunately, the x86_64 architecture has a solution to this problem.

## Switching Stacks
The x86_64 architecture is able to switch to a predefined, known-good stack when an exception occurs. This switch happens at hardware level, so it can be performed before the CPU pushes the exception stack frame.

The switching mechanism is implemented as an _Interrupt Stack Table_ (IST). The IST is a table of 7 pointers to known-good stacks. In Rust-like pseudocode:

```rust
struct InterruptStackTable {
    stack_pointers: [Option<StackPointer>; 7],
}
```

For each exception handler, we can choose a stack from the IST through the `stack_pointers` field in the corresponding [IDT entry]. For example, our double fault handler could use the first stack in the IST. Then the CPU automatically switches to this stack whenever a double fault occurs. This switch would happen before anything is pushed, preventing the triple fault.

[IDT entry]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-descriptor-table

### The IST and TSS
The Interrupt Stack Table (IST) is part of an old legacy structure called _[Task State Segment]_ \(TSS). The TSS used to hold various pieces of information (e.g., processor register state) about a task in 32-bit mode and was, for example, used for [hardware context switching]. However, hardware context switching is no longer supported in 64-bit mode and the format of the TSS has changed completely.

[Task State Segment]: https://en.wikipedia.org/wiki/Task_state_segment
[hardware context switching]: https://wiki.osdev.org/Context_Switching#Hardware_Context_Switching

On x86_64, the TSS no longer holds any task-specific information at all. Instead, it holds two stack tables (the IST is one of them). The only common field between the 32-bit and 64-bit TSS is the pointer to the [I/O port permissions bitmap].

[I/O port permissions bitmap]: https://en.wikipedia.org/wiki/Task_state_segment#I.2FO_port_permissions

The 64-bit TSS has the following format:

| Field                 | Type       |
| --------------------- | ---------- |
| (reserved)            | `u32`      |
| Privilege Stack Table | `[u64; 3]` |
| (reserved)            | `u64`      |
| Interrupt Stack Table | `[u64; 7]` |
| (reserved)            | `u64`      |
| (reserved)            | `u16`      |
| I/O Map Base Address  | `u16`      |

The _Privilege Stack Table_ is used by the CPU when the privilege level changes. For example, if an exception occurs while the CPU is in user mode (privilege level 3), the CPU normally switches to kernel mode (privilege level 0) before invoking the exception handler. In that case, the CPU would switch to the 0th stack in the Privilege Stack Table (since 0 is the target privilege level). We don't have any user-mode programs yet, so we will ignore this table for now.

### Creating a TSS
Let's create a new TSS that contains a separate double fault stack in its interrupt stack table. For that, we need a TSS struct. Fortunately, the `x86_64` crate already contains a [`TaskStateSegment` struct] that we can use.

[`TaskStateSegment` struct]: https://docs.rs/x86_64/0.14.2/x86_64/structures/tss/struct.TaskStateSegment.html

We create the TSS in a new `gdt` module (the name will make sense later):

```rust
// in src/lib.rs

pub mod gdt;

// in src/gdt.rs

use x86_64::VirtAddr;
use x86_64::structures::tss::TaskStateSegment;
use lazy_static::lazy_static;

pub const DOUBLE_FAULT_IST_INDEX: u16 = 0;

lazy_static! {
    static ref TSS: TaskStateSegment = {
        let mut tss = TaskStateSegment::new();
        tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize] = {
            const STACK_SIZE: usize = 4096 * 5;
            static mut STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];

            let stack_start = VirtAddr::from_ptr(&raw const STACK);
            let stack_end = stack_start + STACK_SIZE;
            stack_end
        };
        tss
    };
}
```

We use `lazy_static` because Rust's const evaluator is not yet powerful enough to do this initialization at compile time. We define that the 0th IST entry is the double fault stack (any other IST index would work too). Then we write the top address of a double fault stack to the 0th entry. We write the top address because stacks on x86 grow downwards, i.e., from high addresses to low addresses.

We haven't implemented memory management yet, so we don't have a proper way to allocate a new stack. Instead, we use a `static mut` array as stack storage for now. It is important that it is a `static mut` and not an immutable `static`, because otherwise the bootloader would map it to a read-only page. We will replace this with a proper stack allocation in a later post.

Note that this double fault stack has no guard page that protects against stack overflow. This means that we should not do anything stack-intensive in our double fault handler because a stack overflow might corrupt the memory below the stack.

#### Loading the TSS
Now that we've created a new TSS, we need a way to tell the CPU that it should use it. Unfortunately, this is a bit cumbersome since the TSS uses the segmentation system (for historical reasons). Instead of loading the table directly, we need to add a new segment descriptor to the [Global Descriptor Table] \(GDT). Then we can load our TSS by invoking the [`ltr` instruction] with the respective GDT index. (This is the reason why we named our module `gdt`.)
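
The 64-bit TSS format table can be cross-checked with a plain struct definition. The following is only a sketch that mirrors the table: the `Tss64` name and its field names are made up for this example, and the `packed(4)` representation is an assumption chosen so that the `u64` fields can sit at 4-byte-aligned offsets as the table implies.

```rust
/// Hypothetical mirror of the 64-bit TSS format table.
#[allow(dead_code)]
#[repr(C, packed(4))]
struct Tss64 {
    reserved_1: u32,
    privilege_stack_table: [u64; 3],
    reserved_2: u64,
    interrupt_stack_table: [u64; 7],
    reserved_3: u64,
    reserved_4: u16,
    iomap_base: u16,
}

fn main() {
    // 4 + 24 + 8 + 56 + 8 + 2 + 2 bytes, with no padding between fields.
    println!("TSS size: {} bytes", std::mem::size_of::<Tss64>());
}
```

Without the packed representation, the compiler would insert padding after the leading `u32` to align the following `u64` array, and the struct would no longer match the table.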
[Global Descriptor Table]: https://web.archive.org/web/20190217233448/https://www.flingos.co.uk/docs/reference/Global-Descriptor-Table/
[`ltr` instruction]: https://www.felixcloutier.com/x86/ltr

### The Global Descriptor Table
The Global Descriptor Table (GDT) is a relic that was used for [memory segmentation] before paging became the de facto standard. However, it is still needed in 64-bit mode for various things, such as kernel/user mode configuration or TSS loading.

[memory segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation

The GDT is a structure that contains the _segments_ of the program. It was used on older architectures to isolate programs from each other before paging became the standard. For more information about segmentation, check out the equally named chapter of the free [“Three Easy Pieces” book]. While segmentation is no longer supported in 64-bit mode, the GDT still exists. It is mostly used for two things: Switching between kernel space and user space, and loading a TSS structure.

[“Three Easy Pieces” book]: http://pages.cs.wisc.edu/~remzi/OSTEP/

#### Creating a GDT
Let's create a static `GDT` that includes a segment for our `TSS` static:

```rust
// in src/gdt.rs

use x86_64::structures::gdt::{GlobalDescriptorTable, Descriptor};

lazy_static! {
    static ref GDT: GlobalDescriptorTable = {
        let mut gdt = GlobalDescriptorTable::new();
        gdt.add_entry(Descriptor::kernel_code_segment());
        gdt.add_entry(Descriptor::tss_segment(&TSS));
        gdt
    };
}
```

As before, we use `lazy_static` again. We create a new GDT with a code segment and a TSS segment.
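
One detail that the two `add_entry` calls hide is that descriptors differ in size. As a rough model, assuming the usual long-mode layout in which a code segment descriptor takes one 8-byte slot and a TSS descriptor takes two, slot allocation could look like this. `ToyGdt` and its methods are hypothetical, and the descriptor values are placeholders rather than real descriptor bits.

```rust
/// Toy GDT with room for eight 8-byte slots.
struct ToyGdt {
    table: [u64; 8],
    next_free: usize,
}

impl ToyGdt {
    /// Slot 0 is left zeroed as the mandatory null descriptor.
    fn new() -> Self {
        ToyGdt { table: [0; 8], next_free: 1 }
    }

    /// Stores one raw 8-byte value and returns the slot it landed in.
    fn push(&mut self, value: u64) -> usize {
        assert!(self.next_free < self.table.len(), "toy GDT is full");
        let index = self.next_free;
        self.table[index] = value;
        self.next_free += 1;
        index
    }

    /// A code or data segment descriptor occupies a single slot.
    fn add_user_segment(&mut self, descriptor: u64) -> usize {
        self.push(descriptor)
    }

    /// A system descriptor such as a TSS occupies two consecutive slots;
    /// the index of the first slot identifies it.
    fn add_system_segment(&mut self, low: u64, high: u64) -> usize {
        let index = self.push(low);
        self.push(high);
        index
    }
}

fn main() {
    let mut gdt = ToyGdt::new();
    let code = gdt.add_user_segment(0x1111); // placeholder value
    let tss = gdt.add_system_segment(0x2222, 0x3333); // placeholder values
    println!("code slot = {}, tss slot = {}, next free = {}", code, tss, gdt.next_free);
}
```

In this model, a table with a null entry, one code segment, and one TSS uses four of the eight slots.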
#### Carregando a GDT Para carregar nossa GDT, criamos uma nova função `gdt::init` que chamamos de nossa função `init`: ```rust // em src/gdt.rs pub fn init() { GDT.load(); } // em src/lib.rs pub fn init() { gdt::init(); interrupts::init_idt(); } ``` Agora nossa GDT está carregada (já que a função `_start` chama `init`), mas ainda vemos o loop de boot no stack overflow. ### Os Passos Finais O problema é que os segmentos GDT ainda não estão ativos porque os registradores de segmento e TSS ainda contêm os valores da GDT antiga. Também precisamos modificar a entrada IDT de double fault para que ela use a nova pilha. Em resumo, precisamos fazer o seguinte: 1. **Recarregar registrador de segmento de código**: Mudamos nossa GDT, então devemos recarregar `cs`, o registrador de segmento de código. Isso é necessário já que o antigo seletor de segmento poderia agora apontar para um descritor GDT diferente (por exemplo, um descritor TSS). 2. **Carregar a TSS**: Carregamos uma GDT que contém um seletor TSS, mas ainda precisamos dizer à CPU que ela deve usar essa TSS. 3. **Atualizar a entrada IDT**: Assim que nossa TSS é carregada, a CPU tem acesso a uma interrupt stack table (IST) válida. Então podemos dizer à CPU que ela deve usar nossa nova pilha de double fault modificando nossa entrada IDT de double fault. Para os dois primeiros passos, precisamos de acesso às variáveis `code_selector` e `tss_selector` em nossa função `gdt::init`. Podemos conseguir isso tornando-as parte da static através de uma nova struct `Selectors`: ```rust // em src/gdt.rs use x86_64::structures::gdt::SegmentSelector; lazy_static! 
{
    static ref GDT: (GlobalDescriptorTable, Selectors) = {
        let mut gdt = GlobalDescriptorTable::new();
        let code_selector = gdt.add_entry(Descriptor::kernel_code_segment());
        let tss_selector = gdt.add_entry(Descriptor::tss_segment(&TSS));
        (gdt, Selectors { code_selector, tss_selector })
    };
}

struct Selectors {
    code_selector: SegmentSelector,
    tss_selector: SegmentSelector,
}
```

Now we can use the selectors to reload the `cs` register and load our `TSS`:

```rust
// in src/gdt.rs

pub fn init() {
    use x86_64::instructions::tables::load_tss;
    use x86_64::instructions::segmentation::{CS, Segment};

    GDT.0.load();
    unsafe {
        CS::set_reg(GDT.1.code_selector);
        load_tss(GDT.1.tss_selector);
    }
}
```

We reload the code segment register using [`CS::set_reg`] and load the TSS using [`load_tss`]. The functions are marked as `unsafe`, so we need an `unsafe` block to invoke them. The reason is that it might be possible to break memory safety by loading invalid selectors.

[`CS::set_reg`]: https://docs.rs/x86_64/0.14.5/x86_64/instructions/segmentation/struct.CS.html#method.set_reg
[`load_tss`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/tables/fn.load_tss.html

Now that we have loaded a valid TSS and interrupt stack table, we can set the stack index for our double fault handler in the IDT:

```rust
// in src/interrupts.rs

use crate::gdt;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        unsafe {
            idt.double_fault.set_handler_fn(double_fault_handler)
                .set_stack_index(gdt::DOUBLE_FAULT_IST_INDEX); // new
        }

        idt
    };
}
```

The `set_stack_index` method is unsafe because the caller must ensure that the used index is valid and not already used for another exception.

That's it! Now the CPU should switch to the double fault stack whenever a double fault occurs.
Thus, we are able to catch _all_ double faults, including kernel stack overflows:

![QEMU printing `EXCEPTION: DOUBLE FAULT` and a dump of the exception stack frame](qemu-double-fault-on-stack-overflow.png)

From now on, we should never see a triple fault again! To ensure that we don't accidentally break the above, we should add a test for this.

## A Stack Overflow Test

To test our new `gdt` module and ensure that the double fault handler is correctly called on a stack overflow, we can add an integration test. The idea is to provoke a double fault in the test function and verify that the double fault handler is called.

Let's start with a minimal skeleton:

```rust
// in tests/stack_overflow.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

Like our `panic_handler` test, the test will run [without a test harness]. The reason is that we can't continue execution after a double fault, so more than one test doesn't make sense. To disable the test harness for the test, we add the following to our `Cargo.toml`:

```toml
# in Cargo.toml

[[test]]
name = "stack_overflow"
harness = false
```

[without a test harness]: @/edition-2/posts/04-testing/index.md#no-harness-tests

Now `cargo test --test stack_overflow` should compile successfully. The test fails, of course, since the `unimplemented` macro panics.

### Implementing `_start`

The implementation of the `_start` function looks like this:

```rust
// in tests/stack_overflow.rs

use blog_os::serial_print;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
{
    serial_print!("stack_overflow::stack_overflow...\t");

    blog_os::gdt::init();
    init_test_idt();

    // trigger a stack overflow
    stack_overflow();

    panic!("Execution continued after stack overflow");
}

#[allow(unconditional_recursion)]
fn stack_overflow() {
    stack_overflow(); // for each recursion, the return address is pushed
    volatile::Volatile::new(0).read(); // prevent tail recursion optimizations
}
```

We call our `gdt::init` function to initialize a new GDT. Instead of calling our `interrupts::init_idt` function, we call an `init_test_idt` function that will be explained in a moment. The reason is that we want to register a custom double fault handler that does an `exit_qemu(QemuExitCode::Success)` instead of panicking.

The `stack_overflow` function is almost identical to the function in our `main.rs`. The only difference is that at the end of the function, we perform an additional [volatile] read using the [`Volatile`] type to prevent a compiler optimization called [_tail call elimination_]. Among other things, this optimization allows the compiler to transform a function whose last statement is a recursive function call into a normal loop. Thus, no additional stack frame is created for the function call, so the stack usage remains constant.

[volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming)
[`Volatile`]: https://docs.rs/volatile/0.2.6/volatile/struct.Volatile.html
[_tail call elimination_]: https://en.wikipedia.org/wiki/Tail_call

In our case, however, we want the stack overflow to happen, so we add a dummy volatile read statement at the end of the function, which the compiler is not allowed to remove. Thus, the function is no longer _tail recursive_, and the transformation into a loop is prevented. We also add the `allow(unconditional_recursion)` attribute to silence the compiler warning that the function recurses endlessly.
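As a minimal host-side illustration of the same pattern, the sketch below performs a volatile read after the recursive call. It is only a sketch under stated assumptions: it uses `core::ptr::read_volatile` instead of the `volatile` crate, the name `recurse_with_volatile_read` is hypothetical, and the recursion depth is bounded so that it can run as an ordinary program instead of overflowing the stack.

```rust
use core::ptr;

/// Recurses `depth` times and returns how many recursive calls were made.
/// The volatile read happens after the recursive call returns, so the call
/// is not the last operation of the function.
pub fn recurse_with_volatile_read(depth: u32) -> u32 {
    if depth == 0 {
        return 0;
    }
    let calls_below = recurse_with_volatile_read(depth - 1);

    let dummy: u8 = 0;
    // SAFETY: `dummy` is a valid, initialized, properly aligned local variable.
    let _ = unsafe { ptr::read_volatile(&dummy) };

    calls_below + 1
}
```

In this bounded variant, the addition after the call would already keep the call out of tail position. The volatile read matters in the test function above, which returns nothing and would otherwise end with the recursive call.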
### The Test IDT

As noted above, the test needs its own IDT with a custom double fault handler. The implementation looks like this:

```rust
// in tests/stack_overflow.rs

use lazy_static::lazy_static;
use x86_64::structures::idt::InterruptDescriptorTable;

lazy_static! {
    static ref TEST_IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        unsafe {
            idt.double_fault
                .set_handler_fn(test_double_fault_handler)
                .set_stack_index(blog_os::gdt::DOUBLE_FAULT_IST_INDEX);
        }

        idt
    };
}

pub fn init_test_idt() {
    TEST_IDT.load();
}
```

The implementation is very similar to our normal IDT in `interrupts.rs`. Like in the normal IDT, we set a stack index in the IST for the double fault handler in order to switch to a separate stack. The `init_test_idt` function loads the IDT on the CPU through the `load` method.

### The Double Fault Handler

The only missing piece is our double fault handler. It looks like this:

```rust
// in tests/stack_overflow.rs

use blog_os::{exit_qemu, QemuExitCode, serial_println};
use x86_64::structures::idt::InterruptStackFrame;

extern "x86-interrupt" fn test_double_fault_handler(
    _stack_frame: InterruptStackFrame,
    _error_code: u64,
) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

When the double fault handler is called, we exit QEMU with a success exit code, which marks the test as passed. Since integration tests are completely separate executables, we need to set the `#![feature(abi_x86_interrupt)]` attribute again at the top of our test file.

Now we can run our test through `cargo test --test stack_overflow` (or `cargo test` to run all tests). As expected, we see the `stack_overflow... [ok]` output in the console. Try to comment out the `set_stack_index` line; it should cause the test to fail.

## Summary

In this post, we learned what a double fault is and under which conditions it occurs.
We added a basic double fault handler that prints an error message and added an integration test for it.

We also enabled the hardware-supported stack switching on double fault exceptions so that it also works on stack overflow. While implementing it, we learned about the task state segment (TSS), the interrupt stack table (IST) contained in it, and the global descriptor table (GDT), which was used for segmentation on older architectures.

## What's Next?

The next post explains how to handle interrupts from external devices such as timers, keyboards, or network controllers. These hardware interrupts are very similar to exceptions, e.g., they are also dispatched through the IDT. However, unlike exceptions, they don't arise directly on the CPU. Instead, an _interrupt controller_ aggregates these interrupts and forwards them to the CPU depending on their priority. In the next post, we will explore the [Intel 8259] \("PIC") interrupt controller and learn how to implement keyboard support.
[Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259

================================================
FILE: blog/content/edition-2/posts/06-double-faults/index.zh-CN.md
================================================
+++
title = "Double Faults"
weight = 6
path = "zh-CN/double-fault-exceptions"
date = 2018-06-18

[extra]
# Please update this when updating the translation
translation_based_on_commit = "096c044b4f3697e91d8e30a2e817e567d0ef21a2"
# GitHub usernames of the people that translated this post
translators = ["liuyuran"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = ["JiangengDong"]
+++

In this post, we explore the double fault exception in detail, which occurs when the CPU fails to invoke an exception handler. By handling this exception, we avoid fatal _triple faults_ that cause a system reset. To prevent triple faults in all cases, we also set up an _Interrupt Stack Table_ to catch double faults on a separate kernel stack.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-06`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-06

## What is a Double Fault?

In simplified terms, a double fault is a special exception that occurs when the CPU fails to invoke an exception handler. For example, it occurs when a page fault is triggered but no page fault handler is registered in the [Interrupt Descriptor Table][IDT] (IDT). So it is similar to catch-all blocks in programming languages with exceptions, e.g., `catch(...)` in C++ or `catch(Exception e)` in Java or C#.

[IDT]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-descriptor-table

A double fault behaves like a normal exception. It has the vector number `8`, and we can define a normal handler function for it in the IDT. It is really important to provide a double fault handler, because if a double fault is unhandled, a fatal _triple fault_ occurs. Triple faults can't be caught, and most hardware reacts with a system reset.

### Triggering a Double Fault

Let's see what happens when we provoke a double fault without catching it:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
{
    println!("Hello World{}", "!");

    blog_os::init();

    // trigger a page fault
    unsafe {
        *(0xdeadbeef as *mut u8) = 42;
    };

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    loop {}
}
```

We use `unsafe` to write to the invalid address `0xdeadbeef`. The virtual address is not mapped to a physical address in the page tables, so a page fault occurs. We haven't registered a page fault handler in our [IDT], so a double fault occurs.

When we start our kernel now, we see that it enters an endless boot loop. The reason for the boot loop is the following:

1. The CPU tries to write to `0xdeadbeef`, which causes a page fault.
2. The CPU finds no handler for the page fault in the IDT, so a double fault occurs.
3. The CPU again finds no handler in the IDT, this time for the double fault, so a _triple fault_ occurs.
4. A triple fault is fatal. QEMU reacts to it like most real hardware and issues a system reset.

So in order to prevent this triple fault, we need to either provide a handler function for page faults or a double fault handler. We want to avoid triple faults in all cases, so we need a double fault handler that is invoked for all unhandled exception types.

## A Double Fault Handler

A double fault is a normal exception with an error code, so we can specify a handler function similar to our breakpoint handler:

```rust
// in src/interrupts.rs

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        idt.double_fault.set_handler_fn(double_fault_handler); // new
        idt
    };
}

// new
extern "x86-interrupt" fn double_fault_handler(
    stack_frame: InterruptStackFrame, _error_code: u64) -> !
{
    panic!("EXCEPTION: DOUBLE FAULT\n{:#?}", stack_frame);
}
```

Our handler prints a short error message and dumps the exception stack frame. The error code of the double fault handler is always zero, so there is no reason to print it. One difference to the breakpoint handler is that the double fault handler is [_diverging_]. The reason is that the `x86_64` architecture does not permit returning from a double fault exception.

[_diverging_]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html

When we start our kernel now, we should see that the double fault handler is invoked:

![QEMU printing `EXCEPTION: DOUBLE FAULT` and the exception stack frame](qemu-catch-double-fault.png)

Here is what happened this time:

1. The CPU tries to write to `0xdeadbeef`, which causes a page fault.
2. Like before, the CPU finds no handler for the page fault in the IDT, so a double fault occurs.
3.
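   The CPU jumps to the double fault handler that we just defined.

The triple fault (and the boot loop) no longer occurs, since the CPU can now call the double fault handler.

That was quite straightforward! So why do we need a whole post for this topic? Well, we are now able to catch _most_ double faults, but there are some cases where our current approach doesn't suffice.

## Causes of Double Faults

Before we look at the special cases, we need to know the exact causes of double faults. Above, we used a pretty vague definition:

> A double fault is a special exception that occurs when the CPU fails to invoke an exception handler.

What does _"fails to invoke"_ mean exactly? The handler is not present? The handler is [swapped out]? And what happens if a handler causes exceptions itself?

[swapped out]: http://pages.cs.wisc.edu/~remzi/OSTEP/vm-beyondphys.pdf

For example, what happens if:

1. a breakpoint exception occurs, but the corresponding handler function is swapped out?
2. a page fault occurs, but the page fault handler is swapped out?
3. a divide-by-zero handler causes a breakpoint exception, but the breakpoint handler is swapped out?
4. our kernel overflows its stack and the _guard page_ is hit?

Fortunately, the AMD64 manual ([PDF][AMD64 manual]) has an exact definition (in Section 8.2.9). According to it, a "double fault exception _can_ occur when a second exception occurs during the handling of a prior (first) exception handler". The _"can"_ is important: Only very specific combinations of exceptions lead to a double fault. These combinations are:

| First Exception | Second Exception |
| --- | --- |
| [Divide-by-zero],<br>[Invalid TSS],<br>[Segment Not Present],<br>[Stack-Segment Fault],<br>[General Protection Fault] | [Invalid TSS],<br>[Segment Not Present],<br>[Stack-Segment Fault],<br>[General Protection Fault] |
| [Page Fault] | [Page Fault],<br>[Invalid TSS],<br>[Segment Not Present],<br>[Stack-Segment Fault],<br>[General Protection Fault] |

[Divide-by-zero]: https://wiki.osdev.org/Exceptions#Division_Error
[Invalid TSS]: https://wiki.osdev.org/Exceptions#Invalid_TSS
[Segment Not Present]: https://wiki.osdev.org/Exceptions#Segment_Not_Present
[Stack-Segment Fault]: https://wiki.osdev.org/Exceptions#Stack-Segment_Fault
[General Protection Fault]: https://wiki.osdev.org/Exceptions#General_Protection_Fault
[Page Fault]: https://wiki.osdev.org/Exceptions#Page_Fault

[AMD64 manual]: https://www.amd.com/system/files/TechDocs/24593.pdf

So, for example, a divide-by-zero fault followed by a page fault is fine (the page fault handler is invoked), but a divide-by-zero fault followed by a general-protection fault leads to a double fault.

With the help of this table, we can answer the first three of the above questions:

1. If a breakpoint exception occurs and the corresponding handler function is swapped out, a _page fault_ occurs and the _page fault handler_ is invoked.
2. If a page fault occurs and the page fault handler is swapped out, a _double fault_ occurs and the _double fault handler_ is invoked.
3. If a divide-by-zero handler causes a breakpoint exception, the CPU tries to invoke the breakpoint handler. If the breakpoint handler is swapped out, a _page fault_ occurs and the _page fault handler_ is invoked.

In fact, even the case of an exception without a handler function in the IDT follows this scheme: When the exception occurs, the CPU tries to read the corresponding IDT entry. Since the entry is 0, which is not a valid IDT entry, a _general protection fault_ occurs. We did not define a handler function for the general protection fault either, so another general protection fault occurs. According to the table, this leads to a double fault.

### Kernel Stack Overflow

Let's look at the fourth question:

> What happens if our kernel overflows its stack and the _guard page_ is hit?

A guard page is a special memory page at the bottom of a stack that makes it possible to detect stack overflows. The page is not mapped to any physical frame, so accessing it causes a page fault instead of silently corrupting other memory. The bootloader sets up a guard page for our kernel stack, so a stack overflow causes a _page fault_.

When a page fault occurs, the CPU looks up the page fault handler in the IDT and tries to push the [interrupt stack frame] onto the stack. However, the current stack pointer still points to the non-present guard page. Thus, a second page fault occurs, which causes a double fault (according to the above table).

[interrupt stack frame]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-stack-frame

So the CPU tries to call the _double fault handler_ now. However, on a double fault, the CPU tries to push the exception stack frame, too. The stack pointer still points to the guard page, so a _third_ page fault occurs, which causes a _triple fault_ and a system reboot. So merely registering a double fault handler can't avoid a triple fault in this case.

Let's try it ourselves! We can easily provoke a kernel stack overflow by calling a function that recurses endlessly:

```rust
// in src/main.rs

#[unsafe(no_mangle)] // don't mangle the name of this function
pub extern "C" fn _start() -> !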
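{
    println!("Hello World{}", "!");

    blog_os::init();

    fn stack_overflow() {
        stack_overflow(); // for each recursion, the return address is pushed
    }

    // trigger a stack overflow
    stack_overflow();

    […] // test_main(), println(…), and loop {}
}
```

When we try this code in QEMU, we see that the system enters a boot loop again.

So how can we avoid this problem? We can't omit the pushing of the exception stack frame, since the CPU itself does it. So we need to ensure somehow that the stack is always valid when a double fault exception occurs. Fortunately, the x86_64 architecture has a solution to this problem.

## Switching Stacks

The x86_64 architecture is able to switch to a predefined, known-good stack when an exception occurs. This switch happens at the hardware level, so it can be performed before the CPU pushes the exception stack frame.

The switching mechanism is implemented as an _Interrupt Stack Table_ (IST). The IST is a table of 7 pointers to known-good stacks. In Rust-like pseudocode:

```rust
struct InterruptStackTable {
    stack_pointers: [Option<StackPointer>; 7],
}
```

For each exception handler, we can choose a stack from the IST through the `stack_pointers` field in the corresponding [IDT entry]. For example, our double fault handler could use the first stack in the IST. Then the CPU automatically switches to this stack whenever a double fault occurs. This switch happens before anything is pushed, which prevents the triple fault.

[IDT entry]: @/edition-2/posts/05-cpu-exceptions/index.md#the-interrupt-descriptor-table

### The IST and TSS

The Interrupt Stack Table (IST) is part of an old legacy structure called _[Task State Segment]_ (TSS). The TSS used to hold various pieces of information (e.g., processor register state) about a task in 32-bit mode and was, for example, used for [hardware context switching]. However, hardware context switching is no longer supported in 64-bit mode and the format of the TSS has changed completely.

[Task State Segment]: https://en.wikipedia.org/wiki/Task_state_segment
[hardware context switching]: https://wiki.osdev.org/Context_Switching#Hardware_Context_Switching

On x86_64, the TSS no longer holds any task-specific information at all. Instead, it holds two stack tables (the IST is one of them). The only common field between the 32-bit and 64-bit TSS is the pointer to the [I/O port permissions bitmap].

[I/O port permissions bitmap]: https://en.wikipedia.org/wiki/Task_state_segment#I.2FO_port_permissions

The 64-bit TSS has the following format:

| Field | Type |
| --- | --- |
| <span style="opacity: 0.5">(reserved)</span> | `u32` |
| Privilege Stack Table | `[u64; 3]` |
| <span style="opacity: 0.5">(reserved)</span> | `u64` |
| Interrupt Stack Table | `[u64; 7]` |
| <span style="opacity: 0.5">(reserved)</span> | `u64` |
| <span style="opacity: 0.5">(reserved)</span> | `u16` |
| I/O Map Base Address | `u16` |

The _Privilege Stack Table_ is used by the CPU when the privilege level changes. For example, if an exception occurs while the CPU is in user mode (privilege level 3), the CPU normally switches to kernel mode (privilege level 0) before invoking the exception handler. In that case, the CPU would switch to the 0th stack in the Privilege Stack Table (since 0 is the target privilege level). We don't have any user-mode programs yet, so we will ignore this table for now.

### Creating a TSS

Let's create a new TSS that contains a separate double fault stack in its interrupt stack table. For that, we need a TSS struct. Fortunately, the `x86_64` crate already contains a [`TaskStateSegment` struct] that we can use.

[`TaskStateSegment` struct]: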
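https://docs.rs/x86_64/0.14.2/x86_64/structures/tss/struct.TaskStateSegment.html

We create the TSS in a new `gdt` module (the name will make sense later):

```rust
// in src/lib.rs

pub mod gdt;

// in src/gdt.rs

use x86_64::VirtAddr;
use x86_64::structures::tss::TaskStateSegment;
use lazy_static::lazy_static;

pub const DOUBLE_FAULT_IST_INDEX: u16 = 0;

lazy_static! {
    static ref TSS: TaskStateSegment = {
        let mut tss = TaskStateSegment::new();
        tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize] = {
            const STACK_SIZE: usize = 4096 * 5;
            static mut STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];

            let stack_start = VirtAddr::from_ptr(&raw const STACK);
            let stack_end = stack_start + STACK_SIZE;
            stack_end
        };
        tss
    };
}
```

We use `lazy_static` again because Rust's const evaluator is not yet powerful enough to do this initialization at compile time. We define that the 0th IST entry is the double fault stack (any other IST index would work too). Then we write the top address of the double fault stack to the 0th entry. We write the top address because stacks on x86 grow downwards, i.e., from high addresses to low addresses.

We haven't implemented memory management yet, so we don't have a proper way to allocate a new stack. Instead, we use a `static mut` array as stack storage for now. It is important that it is a `static mut` and not an immutable `static`, because otherwise the bootloader would map it to a read-only page. We will replace this with a proper stack allocation in a later post.

Note that this double fault stack has no guard page that protects against stack overflow. This means that we should not do anything stack-intensive in our double fault handler, because a stack overflow might corrupt the memory below the stack.

#### Loading the TSS

Now that we have created a new TSS, we need a way to tell the CPU that it should use it. Unfortunately, this is a bit cumbersome since the TSS uses the segmentation system (for historical reasons). Instead of loading the table directly, we need to add a new segment descriptor to the [Global Descriptor Table] (GDT). Then we can load our TSS by invoking the [`ltr` instruction] with the respective GDT index. (This is the reason why we named our module `gdt`.)

[Global Descriptor Table]: https://web.archive.org/web/20190217233448/https://www.flingos.co.uk/docs/reference/Global-Descriptor-Table/
[`ltr` instruction]: https://www.felixcloutier.com/x86/ltr

### The Global Descriptor Table

The Global Descriptor Table (GDT) is a relic that was used for [memory segmentation] before paging became the de facto standard. However, it is still needed in 64-bit mode for various things, such as kernel/user mode configuration or TSS loading.

[memory segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation

The GDT is a structure that contains the _segments_ of the program. It was used on older architectures to isolate programs from each other before paging became the standard. For more information about segmentation, check out the equally named chapter of the free [“Three Easy Pieces” book]. While segmentation is no longer supported in 64-bit mode, the GDT still exists. It is mostly used for two things: switching between kernel space and user space, and loading a TSS structure.

[“Three Easy Pieces” book]: http://pages.cs.wisc.edu/~remzi/OSTEP/

#### Creating a GDT

Let's create a static `GDT` that includes a segment for our `TSS` static:

```rust
// in src/gdt.rs

use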
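x86_64::structures::gdt::{GlobalDescriptorTable, Descriptor};

lazy_static! {
    static ref GDT: GlobalDescriptorTable = {
        let mut gdt = GlobalDescriptorTable::new();
        gdt.add_entry(Descriptor::kernel_code_segment());
        gdt.add_entry(Descriptor::tss_segment(&TSS));
        gdt
    };
}
```

As before, we use `lazy_static` again. We create a new GDT with a code segment and a TSS segment.

#### Loading the GDT

To load our GDT, we create a new `gdt::init` function that we call from our `init` function:

```rust
// in src/gdt.rs

pub fn init() {
    GDT.load();
}

// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
}
```

Now our GDT is loaded (since the `_start` function calls `init`), but we still see the boot loop on stack overflow.

### The Final Steps

The problem is that the GDT segments are not yet active because the segment and TSS registers still contain the values from the old GDT. We also need to modify the double fault IDT entry so that it uses the new stack.

In summary, we need to do the following:

1. **Reload the code segment register**: We changed our GDT, so we should reload `cs`, the code segment register. This is required since the old segment selector could now point to a different GDT descriptor (e.g., a TSS descriptor).
2. **Load the TSS**: We loaded a GDT that contains a TSS selector, but we still need to tell the CPU that it should use that TSS.
3. **Update the IDT entry**: As soon as our TSS is loaded, the CPU has access to a valid interrupt stack table (IST). Then we can tell the CPU that it should use our new double fault stack by modifying our double fault IDT entry.

For the first two steps, we need access to the `code_selector` and `tss_selector` variables in our `gdt::init` function. We can achieve this by making them part of the static through a new `Selectors` struct:

```rust
// in src/gdt.rs

use x86_64::structures::gdt::SegmentSelector;

lazy_static!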
{
    static ref GDT: (GlobalDescriptorTable, Selectors) = {
        let mut gdt = GlobalDescriptorTable::new();
        let code_selector = gdt.add_entry(Descriptor::kernel_code_segment());
        let tss_selector = gdt.add_entry(Descriptor::tss_segment(&TSS));
        (gdt, Selectors { code_selector, tss_selector })
    };
}

struct Selectors {
    code_selector: SegmentSelector,
    tss_selector: SegmentSelector,
}
```

Now we can use the selectors to reload the `cs` register and load our `TSS`:

```rust
// in src/gdt.rs

pub fn init() {
    use x86_64::instructions::tables::load_tss;
    use x86_64::instructions::segmentation::{CS, Segment};

    GDT.0.load();
    unsafe {
        CS::set_reg(GDT.1.code_selector);
        load_tss(GDT.1.tss_selector);
    }
}
```

We reload the code segment register using [`CS::set_reg`] and load the TSS using [`load_tss`]. The functions are marked as `unsafe`, so we need an `unsafe` block to invoke them. The reason is that it might be possible to break memory safety by loading invalid selectors.

[`CS::set_reg`]: https://docs.rs/x86_64/0.14.5/x86_64/instructions/segmentation/struct.CS.html#method.set_reg
[`load_tss`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/tables/fn.load_tss.html

Now that we have loaded a valid TSS and interrupt stack table, we can set the stack index for our double fault handler in the IDT:

```rust
// in src/interrupts.rs

use crate::gdt;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        unsafe {
            idt.double_fault.set_handler_fn(double_fault_handler)
                .set_stack_index(gdt::DOUBLE_FAULT_IST_INDEX); // new
        }

        idt
    };
}
```

The `set_stack_index` method is unsafe because the caller must ensure that the used index is valid and not already used for another exception.

That's it! Now the CPU should switch to the double fault stack whenever a double fault occurs. Thus, we are able to catch _all_ double faults, including kernel stack overflows:

![QEMU printing `EXCEPTION: DOUBLE FAULT` and a dump of the exception stack frame](qemu-double-fault-on-stack-overflow.png)

From now on, we should never see a triple fault again! To ensure that we don't accidentally break the above, we should add a test for this.

## A Stack Overflow Test

To test our new `gdt` module and ensure that the double fault handler is correctly called on a stack overflow, we can add an integration test. The idea is to provoke a double fault in the test function and verify that the double fault handler is called.

Let's start with a minimal skeleton:

```rust
// in tests/stack_overflow.rs

#![no_std]
#![no_main]

use core::panic::PanicInfo;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
{
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    blog_os::test_panic_handler(info)
}
```

Like our `panic_handler` test, the test will run [without a test harness]. The reason is that we can't continue execution after a double fault, so more than one test doesn't make sense. To disable the test harness for the test, we add the following to our `Cargo.toml`:

```toml
# in Cargo.toml

[[test]]
name = "stack_overflow"
harness = false
```

[without a test harness]: @/edition-2/posts/04-testing/index.md#no-harness-tests

Now `cargo test --test stack_overflow` should compile successfully. The test fails, of course, since the `unimplemented` macro panics.

### Implementing `_start`

The implementation of the `_start` function looks like this:

```rust
// in tests/stack_overflow.rs

use blog_os::serial_print;

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    serial_print!("stack_overflow::stack_overflow...\t");

    blog_os::gdt::init();
    init_test_idt();

    // trigger a stack overflow
    stack_overflow();

    panic!("Execution continued after stack overflow");
}

#[allow(unconditional_recursion)]
fn stack_overflow() {
    stack_overflow(); // for each recursion, the return address is pushed
    volatile::Volatile::new(0).read(); // prevent tail recursion optimizations
}
```

We call our `gdt::init` function to initialize a new GDT. Instead of calling our `interrupts::init_idt` function, we call an `init_test_idt` function that will be explained in a moment. The reason is that we want to register a custom double fault handler that does an `exit_qemu(QemuExitCode::Success)` instead of panicking.

The `stack_overflow` function is almost identical to the function in our `main.rs`. The only difference is that at the end of the function, we perform an additional [volatile] read using the [`Volatile`] type to prevent a compiler optimization called [_tail call elimination_]. Among other things, this optimization allows the compiler to transform a function whose last statement is a recursive function call into a normal loop. Thus, no additional stack frame is created for the function call, so the stack usage remains constant.

[volatile]: https://en.wikipedia.org/wiki/Volatile_(computer_programming)
[`Volatile`]: https://docs.rs/volatile/0.2.6/volatile/struct.Volatile.html
[_tail call elimination_]: https://en.wikipedia.org/wiki/Tail_call

In our case, however, we want the stack overflow to happen, so we add a dummy volatile read statement at the end of the function, which the compiler is not allowed to remove. Thus, the function is no longer _tail recursive_, and the transformation into a loop is prevented. We also add the `allow(unconditional_recursion)` attribute to silence the compiler warning that the function recurses endlessly.

### The Test IDT

As noted above, the test needs its own IDT with a custom double fault handler. The implementation looks like this:

```rust
// in tests/stack_overflow.rs
use lazy_static::lazy_static;
use x86_64::structures::idt::InterruptDescriptorTable;

lazy_static! {
    static ref TEST_IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        unsafe {
            idt.double_fault
                .set_handler_fn(test_double_fault_handler)
                .set_stack_index(blog_os::gdt::DOUBLE_FAULT_IST_INDEX);
        }

        idt
    };
}

pub fn init_test_idt() {
    TEST_IDT.load();
}
```

The implementation is very similar to our normal IDT in `interrupts.rs`. Like in the normal IDT, we set a stack index in the IST for the double fault handler in order to switch to a separate stack. The `init_test_idt` function mentioned above loads the IDT on the CPU through the `load` method.

### The Double Fault Handler

The only missing piece is our double fault handler. It looks like this:

```rust
// in tests/stack_overflow.rs

use blog_os::{exit_qemu, QemuExitCode, serial_println};
use x86_64::structures::idt::InterruptStackFrame;

extern "x86-interrupt" fn test_double_fault_handler(
    _stack_frame: InterruptStackFrame,
    _error_code: u64,
) -> ! {
    serial_println!("[ok]");
    exit_qemu(QemuExitCode::Success);
    loop {}
}
```

When the double fault handler is called, we exit QEMU with a success exit code, which marks the test as passed. Since integration tests are completely separate executables, we need to set the `#![feature(abi_x86_interrupt)]` attribute again at the top of our test file.

Now we can run our test through `cargo test --test stack_overflow` (or `cargo test` to run all tests). As expected, we see the `stack_overflow...
[ok]` output in the console. Try to comment out the `set_stack_index` line; it should cause the test to fail.

## Summary

In this post, we learned what a double fault is and under which conditions it occurs. We added a basic double fault handler that prints an error message and added an integration test for it.

We also enabled the hardware-supported stack switching on double fault exceptions so that it also works on stack overflow. While implementing it, we learned about the task state segment (TSS), the interrupt stack table (IST) contained in it, and the global descriptor table (GDT), which was used for segmentation on older architectures.

## What's Next?

The next post explains how to handle interrupts from external devices such as timers, keyboards, or network controllers. These hardware interrupts are very similar to exceptions, e.g., they are also dispatched through the IDT. However, unlike exceptions, they don't arise directly on the CPU. Instead, an _interrupt controller_ aggregates these interrupts and forwards them to the CPU depending on their priority. In the next post, we will explore the [Intel 8259] \("PIC") interrupt controller and learn how to implement keyboard support.

[Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259

================================================
FILE: blog/content/edition-2/posts/07-hardware-interrupts/index.es.md
================================================
+++
title = "Hardware Interrupts"
weight = 7
path = "es/hardware-interrupts"
date = 2018-10-22

[extra]
chapter = "Interrupts"
# GitHub usernames of the people that translated this post
translators = ["dobleuber"]
+++

In this post, we set up the programmable interrupt controller to correctly forward hardware interrupts to the CPU. To handle these interrupts, we add new entries to our interrupt descriptor table, just like we did for our exception handlers. We will learn how to get periodic timer interrupts and how to get input from the keyboard.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-07`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-07

## Overview

Interrupts provide a way to notify the CPU from attached hardware devices.
So instead of letting the kernel periodically check the keyboard for new characters (a process called [_polling_]), the keyboard can notify the kernel of each keypress. This is much more efficient because the kernel only needs to act when something happened. It also allows faster reaction times since the kernel can react immediately and not only at the next poll.

[_polling_]: https://en.wikipedia.org/wiki/Polling_(computer_science)

Connecting all hardware devices directly to the CPU is not possible. Instead, a separate _interrupt controller_ aggregates the interrupts from all devices and then notifies the CPU:

```
                                    ____________             _____
               Timer ------------> |            |           |     |
               Keyboard ---------> | Interrupt  |---------> | CPU |
               Other Hardware ---> | Controller |           |_____|
               Etc. -------------> |____________|

```

Most interrupt controllers are programmable, which means that they support different priority levels for interrupts. For example, this allows giving timer interrupts a higher priority than keyboard interrupts to ensure accurate timekeeping.

Unlike exceptions, hardware interrupts occur _asynchronously_. This means they are completely independent from the executed code and can occur at any time. Thus, we suddenly have a form of concurrency in our kernel with all the potential concurrency-related bugs. Rust's strict ownership model helps us here because it forbids mutable global state. However, deadlocks are still possible, as we will see later in this post.

## The 8259 PIC {#el-8259-pic}

The [Intel 8259] is a programmable interrupt controller (PIC) introduced in 1976.
It has long been replaced by the newer [APIC], but its interface is still supported on current systems for backwards compatibility reasons. The 8259 PIC is significantly easier to set up than the APIC, so we will use it to introduce ourselves to interrupts before we switch to the APIC in a later post.

[APIC]: https://en.wikipedia.org/wiki/Intel_APIC_Architecture

The 8259 has eight interrupt lines and several lines for communicating with the CPU. The typical systems back then were equipped with two instances of the 8259 PIC, one primary and one secondary PIC, with the secondary connected to one of the interrupt lines of the primary:

[Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259

```
                     ____________                          ____________
Real Time Clock --> |            |   Timer -------------> |            |
ACPI -------------> |            |   Keyboard-----------> |            |      _____
Available --------> | Secondary  |----------------------> | Primary    |     |     |
Available --------> | Interrupt  |   Serial Port 2 -----> | Interrupt  |---> | CPU |
Mouse ------------> | Controller |   Serial Port 1 -----> | Controller |     |_____|
Co-Processor -----> |            |   Parallel Port 2/3 -> |            |
Primary ATA ------> |            |   Floppy disk -------> |            |
Secondary ATA ----> |____________|   Parallel Port 1----> |____________|

```

This graphic shows the typical assignment of interrupt lines. We see that most of the 15 lines have a fixed mapping, e.g., line 4 of the secondary PIC is assigned to the mouse.

Each controller can be configured through two [I/O ports][puertos de I/O], one “command” port and one “data” port. For the primary controller, these ports are `0x20` (command) and `0x21` (data). For the secondary controller, they are `0xa0` (command) and `0xa1` (data). For more information on how the PICs can be configured, see the [article on osdev.org][artículo en osdev.org].
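As a small, self-contained illustration of the layout just described, the following host-side sketch maps one of the 16 PIC inputs to its controller, its line on that controller, and that controller's port pair. The names `PicPorts`, `Controller`, and `locate_irq` are hypothetical and not part of any crate; only the port numbers are taken from the text above. The sketch assumes the usual numbering in which inputs 0–7 belong to the primary and 8–15 to the secondary controller.

```rust
/// Command and data port of one 8259 PIC.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PicPorts {
    pub command: u16,
    pub data: u16,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Controller {
    Primary,
    Secondary,
}

impl Controller {
    /// I/O ports used to configure this controller.
    pub fn ports(self) -> PicPorts {
        match self {
            Controller::Primary => PicPorts { command: 0x20, data: 0x21 },
            Controller::Secondary => PicPorts { command: 0xa0, data: 0xa1 },
        }
    }
}

/// Splits an input number (0..=15) into its controller and line (0..=7).
/// Returns `None` for numbers outside that range.
pub fn locate_irq(irq: u8) -> Option<(Controller, u8)> {
    match irq {
        0..=7 => Some((Controller::Primary, irq)),
        8..=15 => Some((Controller::Secondary, irq - 8)),
        _ => None,
    }
}
```

For example, `locate_irq(12)` yields line 4 of the secondary controller, which the graphic assigns to the mouse, and `Controller::Secondary.ports()` gives the `0xa0`/`0xa1` pair.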
[puertos de I/O]: @/edition-2/posts/04-testing/index.md#i-o-ports
[artículo en osdev.org]: https://wiki.osdev.org/8259_PIC

### Implementation

The default configuration of the PICs is not usable because it sends interrupt vector numbers in the range of 0–15 to the CPU. These numbers are already occupied by CPU exceptions. For example, number 8 corresponds to a double fault. To fix this overlapping issue, we need to remap the PIC interrupts to different numbers. The actual range doesn't matter as long as it does not overlap with the exceptions, but typically the range of 32–47 is chosen, because these are the first free numbers after the 32 exception slots.

The configuration happens by writing special values to the command and data ports of the PICs. Fortunately, there is already a crate called [`pic8259`], so we don't need to write the initialization sequence ourselves. However, if you are interested in how it works, check out [its source code][pic crate source]. It is fairly small and well documented.

[pic crate source]: https://docs.rs/crate/pic8259/0.10.1/source/src/lib.rs

To add the crate as a dependency, we add the following to our project:

[`pic8259`]: https://docs.rs/pic8259/0.10.1/pic8259/

```toml
# in Cargo.toml

[dependencies]
pic8259 = "0.10.1"
```

The main abstraction provided by the crate is the [`ChainedPics`] struct that represents the primary/secondary PIC layout we saw above.
It is designed to be used in the following way:

[`ChainedPics`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html

```rust
// in src/interrupts.rs

use pic8259::ChainedPics;
use spin;

pub const PIC_1_OFFSET: u8 = 32;
pub const PIC_2_OFFSET: u8 = PIC_1_OFFSET + 8;

pub static PICS: spin::Mutex<ChainedPics> =
    spin::Mutex::new(unsafe { ChainedPics::new(PIC_1_OFFSET, PIC_2_OFFSET) });
```

As noted above, we're setting the offsets for the PICs to the range 32–47. By wrapping the `ChainedPics` struct in a `Mutex`, we can get safe mutable access (through the [`lock` method][spin mutex lock]), which we need in the next step. The `ChainedPics::new` function is unsafe because wrong offsets could cause undefined behavior.

[spin mutex lock]: https://docs.rs/spin/0.5.2/spin/struct.Mutex.html#method.lock

We can now initialize the 8259 PIC in our `init` function:

```rust
// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
    unsafe { interrupts::PICS.lock().initialize() }; // new
}
```

We use the [`initialize`] function to perform the PIC initialization. Like the `ChainedPics::new` function, this function is also unsafe because it can cause undefined behavior if the PIC is misconfigured.

[`initialize`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html#method.initialize

If all goes well, we should continue to see the "It did not crash!" message when executing `cargo run`.

## Enabling Interrupts

Until now, nothing happened because interrupts are still disabled in the CPU configuration. This means that the CPU does not listen to the interrupt controller at all, so no interrupts can reach the CPU.
Let's change that:

```rust
// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
    unsafe { interrupts::PICS.lock().initialize() };
    x86_64::instructions::interrupts::enable(); // new
}
```

The `interrupts::enable` function of the `x86_64` crate executes the special `sti` instruction (“set interrupts”) to enable external interrupts. When we try `cargo run` now, we see that a double fault occurs:

![QEMU printing `EXCEPTION: DOUBLE FAULT` because of hardware timer](qemu-hardware-timer-double-fault.png)

The reason for this double fault is that the hardware timer (the [Intel 8253], to be exact) is enabled by default, so we start receiving timer interrupts as soon as we enable interrupts. Since we haven't defined a handler function for it yet, our double fault handler is invoked.

[Intel 8253]: https://en.wikipedia.org/wiki/Intel_8253

## Handling Timer Interrupts

As we see in the graphic [above](#el-8259-pic), the timer uses line 0 of the primary PIC. This means that it arrives at the CPU as interrupt 32 (0 + offset 32). Instead of hardcoding index 32, we store it in an `InterruptIndex` enum:

```rust
// in src/interrupts.rs

#[derive(Debug, Clone, Copy)]
#[repr(u8)]
pub enum InterruptIndex {
    Temporizador = PIC_1_OFFSET,
}

impl InterruptIndex {
    fn as_u8(self) -> u8 {
        self as u8
    }

    fn as_usize(self) -> usize {
        usize::from(self.as_u8())
    }
}
```

The enum is a [C-like enum][enum tipo C] so that we can directly specify the index for each variant. The `repr(u8)` attribute specifies that each variant is represented as a `u8`. We will add more variants for other interrupts in the future.
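To make the discriminant behavior concrete, here is a minimal host-side sketch of the same pattern. `DEMO_OFFSET` and `DemoIndex` are hypothetical stand-ins for `PIC_1_OFFSET` and `InterruptIndex`; the second variant only exists to show how an omitted discriminant is filled in.

```rust
/// Assumed offset of the primary PIC, mirroring `PIC_1_OFFSET` from the text.
const DEMO_OFFSET: u8 = 32;

/// Hypothetical stand-in for `InterruptIndex`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum DemoIndex {
    First = DEMO_OFFSET, // explicit discriminant: 32
    Second,              // implicit discriminant: previous value plus one
}

impl DemoIndex {
    /// Casting a fieldless enum with `as` yields its discriminant.
    fn as_u8(self) -> u8 {
        self as u8
    }

    /// Widening to `usize` is lossless, so `From` can be used.
    fn as_usize(self) -> usize {
        usize::from(self.as_u8())
    }
}
```

With these definitions, `DemoIndex::First.as_u8()` is 32 and `DemoIndex::Second.as_usize()` is 33, so adding a variant for the next interrupt line needs no explicit number.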
[enum tipo C]: https://doc.rust-lang.org/reference/items/enumerations.html#custom-discriminant-values-for-fieldless-enumerations

Now we can add a handler function for the timer interrupt:

```rust
// in src/interrupts.rs

use crate::print;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        […]
        idt[InterruptIndex::Temporizador.as_usize()]
            .set_handler_fn(timer_interrupt_handler); // new

        idt
    };
}

extern "x86-interrupt" fn timer_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!(".");
}
```

Our `timer_interrupt_handler` has the same signature as our exception handlers, because the CPU reacts identically to exceptions and external interrupts (the only difference is that some exceptions push an error code). The [`InterruptDescriptorTable`] struct implements the [`IndexMut`] trait, so we can access individual entries through array indexing syntax.

[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html
[`IndexMut`]: https://doc.rust-lang.org/core/ops/trait.IndexMut.html

In our timer interrupt handler, we print a dot to the screen. As the timer interrupt happens periodically, we would expect to see a dot appearing on each timer tick. However, when we run it, we see that only a single dot is printed:

![QEMU printing only a single dot for hardware timer](qemu-single-dot-printed.png)

### End of Interrupt

The reason is that the PIC expects an explicit “end of interrupt” (EOI) signal from our interrupt handler. This signal tells the controller that the interrupt was processed and that the system is ready to receive the next interrupt.
So the PIC thinks we're still busy processing the first timer interrupt and waits patiently for the EOI signal before sending the next one.

To send the EOI, we use our static `PICS` struct again:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn timer_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!(".");

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Temporizador.as_u8());
    }
}
```

The `notify_end_of_interrupt` method figures out whether the primary or secondary PIC sent the interrupt and then uses the `command` and `data` ports to send an EOI signal to the respective controllers. If the secondary PIC sent the interrupt, both PICs need to be notified because the secondary PIC is connected to an input line of the primary PIC.

We need to be careful to use the correct interrupt vector number; otherwise, we could accidentally delete an important unsent interrupt or cause our system to hang. This is the reason that the function is unsafe.

When we now execute `cargo run`, we see dots periodically appearing on the screen:

![QEMU printing consecutive dots showing the hardware timer](qemu-hardware-timer-dots.gif)

### Configuring the Timer

The hardware timer that we use is called the _Programmable Interval Timer_, or PIT for short. As the name says, it is possible to configure the interval between two interrupts. We won't go into details here because we will switch to the [APIC timer][temporizador APIC] soon, but the OSDev wiki has an extensive article about [configuring the PIT][configuración del PIT].
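As a rough illustration of what "configuring the interval" means, the PIT counts down a 16-bit reload value at a fixed input clock of roughly 1.193182 MHz and fires an interrupt each time the count expires. The sketch below only does the arithmetic on a normal host; the helper names are hypothetical, it writes to no I/O ports, and it ignores hardware special cases, so treat it as a starting point rather than a driver.

```rust
/// Input clock of the PIT in Hz (approximately 1.193182 MHz).
const PIT_BASE_HZ: u32 = 1_193_182;

/// Hypothetical helper: computes the 16-bit reload value that gives
/// approximately `target_hz` interrupts per second, or `None` if the
/// requested rate cannot be expressed in this simplified model.
fn pit_divisor(target_hz: u32) -> Option<u16> {
    if target_hz == 0 {
        return None;
    }
    // Integer division rounds down, so the real rate is slightly higher.
    let divisor = PIT_BASE_HZ / target_hz;
    if divisor == 0 || divisor > u32::from(u16::MAX) {
        None
    } else {
        Some(divisor as u16)
    }
}

/// The interrupt rate that a given reload value actually produces.
fn actual_hz(divisor: u16) -> f64 {
    f64::from(PIT_BASE_HZ) / f64::from(divisor)
}
```

For example, a target of 100 Hz gives a reload value of 11931, and very low rates such as 18 Hz are rejected here because the required value no longer fits into 16 bits.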
[temporizador APIC]: https://wiki.osdev.org/APIC_timer
[configuración del PIT]: https://wiki.osdev.org/Programmable_Interval_Timer

## Deadlocks

We now have a form of concurrency in our kernel: The timer interrupts occur asynchronously, so they can interrupt our `_start` function at any time. Fortunately, Rust's ownership system prevents many types of concurrency-related bugs at compile time. One notable exception is deadlocks. Deadlocks occur if a thread tries to acquire a lock that will never become free. Thus, the thread hangs indefinitely.

We can already provoke a deadlock in our kernel. Remember, our `println` macro calls the `vga_buffer::_print` function, which [locks a global `WRITER`][vga spinlock] using a spinlock:

[vga spinlock]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks

```rust
// in src/vga_buffer.rs

[…]

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    WRITER.lock().write_fmt(args).unwrap();
}
```

It locks the `WRITER`, calls `write_fmt` on it, and implicitly unlocks it at the end of the function. Now imagine that an interrupt occurs while the `WRITER` is locked and the interrupt handler tries to print something too:

| Timestep | _start                 | interrupt_handler                               |
| -------- | ---------------------- | ----------------------------------------------- |
| 0        | calls `println!`       |                                                 |
| 1        | `print` locks `WRITER` |                                                 |
| 2        |                        | **interrupt occurs**, handler begins to run     |
| 3        |                        | calls `println!`                                |
| 4        |                        | `print` tries to lock `WRITER` (already locked) |
| 5        |                        | `print` tries to lock `WRITER` (already locked) |
| …        |                        | …                                               |
| _never_  | _unlock `WRITER`_      |                                                 |

The `WRITER` is locked, so the interrupt handler waits until it becomes free.
But this never happens, because the `_start` function only continues to run after the interrupt handler returns. Thus, the entire system hangs.

### Provoking a Deadlock

We can easily provoke such a deadlock in our kernel by printing something in the loop at the end of our `_start` function:

```rust
// in src/main.rs

#[no_mangle]
pub extern "C" fn _start() -> ! {
    […]
    loop {
        use blog_os::print;
        print!("-"); // new
    }
}
```

When we run it in QEMU, we get an output of the form:

![QEMU output with many rows of hyphens and no dots](./qemu-deadlock.png)

We see that only a limited number of hyphens are printed until the first timer interrupt occurs. Then the system hangs because the timer interrupt handler deadlocks when it tries to print a dot. This is the reason that we see no dots in the above output.

The actual number of hyphens varies between runs because the timer interrupt occurs asynchronously. This non-determinism is what makes concurrency-related bugs so difficult to debug.

### Fixing the Deadlock

To avoid this deadlock, we can disable interrupts as long as the `Mutex` is locked:

```rust
// in src/vga_buffer.rs

/// Prints the given formatted string to the VGA text buffer
/// through the global `WRITER` instance.
#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    use x86_64::instructions::interrupts; // new

    interrupts::without_interrupts(|| { // new
        WRITER.lock().write_fmt(args).unwrap();
    });
}
```

The [`without_interrupts`] function takes a [closure] and executes it in an interrupt-free environment. We use it to ensure that no interrupt can occur as long as the `Mutex` is locked. When we run our kernel now, we see that it keeps running without hanging.
(We still don't notice any dots, but this is because they're scrolling by too fast. Try to slow down the printing, e.g., by putting a `for _ in 0..10000 {}` inside the loop.)

[`without_interrupts`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.without_interrupts.html
[closure]: https://doc.rust-lang.org/book/ch13-01-closures.html

We can apply the same change to our serial printing function to ensure that no deadlocks occur with it either:

```rust
// in src/serial.rs

#[doc(hidden)]
pub fn _print(args: ::core::fmt::Arguments) {
    use core::fmt::Write;
    use x86_64::instructions::interrupts; // new

    interrupts::without_interrupts(|| { // new
        SERIAL1
            .lock()
            .write_fmt(args)
            .expect("Printing to serial failed");
    });
}
```

Note that disabling interrupts shouldn't be a general solution. The problem is that it increases the worst-case interrupt latency, i.e., the time until the system reacts to an interrupt. Therefore, interrupts should only be disabled for a very short time.

## Fixing a Race Condition

If you run `cargo test`, you might see the `test_println_output` test failing:

```
> cargo test --lib
[…]
Running 4 tests
test_breakpoint_exception...[ok]
test_println... [ok]
test_println_many... [ok]
test_println_output... [failed]

Error: panicked at 'assertion failed: `(left == right)`
  left: `'.'`,
 right: `'S'`', src/vga_buffer.rs:205:9
```

The reason is a _race condition_ between the test and our timer handler.
Remember, the test looks like this:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    let s = "Some test string that fits on a single line";
    println!("{}", s);
    for (i, c) in s.chars().enumerate() {
        let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read();
        assert_eq!(char::from(screen_char.ascii_character), c);
    }
}
```

The race condition occurs because the timer interrupt handler might run between the `println` and the reading of the screen characters. Note that this isn't a dangerous _data race_, which Rust completely prevents at compile time. See the [_Rustonomicon_][nomicon-races] for details.

[nomicon-races]: https://doc.rust-lang.org/nomicon/races.html

To fix this, we need to keep the `WRITER` locked for the complete duration of the test, so that the timer handler can't write a character to the screen in between. The fixed test looks like this:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;

    let s = "Some test string that fits on a single line";
    interrupts::without_interrupts(|| {
        let mut writer = WRITER.lock();
        writeln!(writer, "\n{}", s).expect("writeln failed");
        for (i, c) in s.chars().enumerate() {
            let screen_char = writer.buffer.chars[BUFFER_HEIGHT - 2][i].read();
            assert_eq!(char::from(screen_char.ascii_character), c);
        }
    });
}
```

We performed the following changes:

- We keep the writer locked for the complete test by using the `lock()` method explicitly. Instead of `println`, we use the [`writeln`] macro that allows printing to an already locked writer.
- To avoid another deadlock, we disable interrupts for the test's duration. Otherwise, the test might get interrupted while the writer is still locked.
- Since the timer interrupt handler can still run before the test, we print an additional newline `\n` before printing the string `s`. This way, we avoid test failure when the timer handler has already printed some dots to the current line.

[`writeln`]: https://doc.rust-lang.org/core/macro.writeln.html

With the above changes, `cargo test` now deterministically succeeds.

This was a very harmless race condition that only caused a test failure. As you can imagine, other race conditions can be much more difficult to debug due to their non-deterministic nature. Luckily, Rust prevents us from data races, which are the most serious class of race conditions since they can cause all kinds of undefined behavior, including system crashes and silent memory corruptions.

## The `hlt` Instruction

Until now, we used a simple empty loop statement at the end of our `_start` and `panic` functions. This causes the CPU to spin endlessly, and thus works as expected. But it is also very inefficient, because the CPU continues to run at full speed even though there's no work to do. You can see this problem in your task manager when you run your kernel: The QEMU process needs close to 100% CPU the whole time.

What we really want to do is to halt the CPU until the next interrupt arrives. This allows the CPU to enter a sleep state in which it consumes much less energy. The [`hlt` instruction][`hlt`] does exactly that. Let's use this instruction to create an energy-efficient endless loop:

[`hlt`]: https://en.wikipedia.org/wiki/HLT_(x86_instruction)

```rust
// in src/lib.rs

pub fn hlt_loop() -> ! {
    loop {
        x86_64::instructions::hlt();
    }
}
```

The `instructions::hlt` function is just a [thin wrapper][delgado envoltorio] around the assembly instruction.
It is safe because there's no way it can compromise memory safety.

[delgado envoltorio]: https://github.com/rust-osdev/x86_64/blob/5e8e218381c5205f5777cb50da3ecac5d7e3b1ab/src/instructions/mod.rs#L16-L22

We can now use this `hlt_loop` instead of the endless loops in our `_start` and `panic` functions:

```rust
// in src/main.rs

#[no_mangle]
pub extern "C" fn _start() -> ! {
    […]

    println!("It did not crash!");
    blog_os::hlt_loop(); // new
}

#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    blog_os::hlt_loop(); // new
}
```

Let's update our `lib.rs` as well:

```rust
// in src/lib.rs

/// Entry point for `cargo test`
#[cfg(test)]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    init();
    test_main();
    hlt_loop(); // new
}

pub fn test_panic_handler(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    hlt_loop(); // new
}
```

When we run our kernel now in QEMU, we see a much lower CPU usage.

## Keyboard Input

Now that we are able to handle interrupts from external devices, we are finally able to add support for keyboard input. This will allow us to interact with our kernel for the first time.

[PS/2]: https://en.wikipedia.org/wiki/PS/2_port

Like the hardware timer, the keyboard controller is already enabled by default. So when you press a key, the keyboard controller sends an interrupt to the PIC, which forwards it to the CPU. The CPU looks for a handler function in the IDT, but the corresponding entry is empty. Therefore, a double fault occurs.

So let's add a handler function for the keyboard interrupt.
It's quite similar to how we defined the handler for the timer interrupt; it just uses a different interrupt number:

```rust
// in src/interrupts.rs

#[derive(Debug, Clone, Copy)]
#[repr(u8)]
pub enum InterruptIndex {
    Temporizador = PIC_1_OFFSET,
    Teclado, // new
}

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        […]
        // new
        idt[InterruptIndex::Teclado.as_usize()]
            .set_handler_fn(keyboard_interrupt_handler);

        idt
    };
}

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!("k");

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Teclado.as_u8());
    }
}
```

As we see in the graphic [above](#el-8259-pic), the keyboard uses line 1 of the primary PIC. This means that it arrives at the CPU as interrupt 33 (1 + offset 32). We add this index as a new `Teclado` variant to the `InterruptIndex` enum. We don't need to specify the value explicitly, since it defaults to the previous value plus one, which is also 33. In the interrupt handler, we print a `k` and send the end of interrupt signal to the interrupt controller.

We now see that a `k` appears on the screen when we press a key. However, this only works for the first key we press. Even if we continue to press keys, no more `k`s appear on the screen. This is because the keyboard controller won't send another interrupt until we have read the so-called _scancode_ of the pressed key.

### Reading the Scancodes

To find out _which_ key was pressed, we need to query the keyboard controller.
We do this by reading from the data port of the PS/2 controller, which is the [I/O port] with the number `0x60`:

[I/O port]: @/edition-2/posts/04-testing/index.md#i-o-ports

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };
    print!("{}", scancode);

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Teclado.as_u8());
    }
}
```

We use the [`Port`] type of the `x86_64` crate to read a byte from the keyboard's data port. This byte is called the [_scancode_] and it represents the key press/release. We don't do anything with the scancode yet, other than print it to the screen:

[`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html
[_scancode_]: https://en.wikipedia.org/wiki/Scancode

![QEMU printing scancodes to the screen when keys are pressed](qemu-printing-scancodes.gif)

The above image shows me slowly typing "123". We see that adjacent keys have adjacent scancodes and that pressing a key causes a different scancode than releasing it. But how do we translate the scancodes to the actual key actions exactly?

### Interpreting the Scancodes

There are three different standards for the mapping between scancodes and keys, the so-called _scancode sets_. All three go back to the keyboards of early IBM computers: the [IBM XT], the [IBM 3270 PC], and the [IBM AT]. Later computers fortunately did not continue the trend of defining new scancode sets, but rather emulated the existing sets and extended them. Today, most keyboards can be configured to emulate any of the three sets.
[IBM XT]: https://en.wikipedia.org/wiki/IBM_Personal_Computer_XT
[IBM 3270 PC]: https://en.wikipedia.org/wiki/IBM_3270_PC
[IBM AT]: https://en.wikipedia.org/wiki/IBM_Personal_Computer/AT

By default, PS/2 keyboards emulate scancode set 1 ("XT"). In this set, the lower 7 bits of a scancode byte define the key, and the most significant bit defines whether it's a press ("0") or a release ("1"). Keys that were not present on the original [IBM XT] keyboard, such as the enter key on the keypad, generate two scancodes in succession: a `0xe0` escape byte followed by a byte representing the key. For a list of all set 1 scancodes and their corresponding keys, check out the [OSDev Wiki][scancode set 1].

[scancode set 1]: https://wiki.osdev.org/Keyboard#Scan_Code_Set_1

To translate the scancodes to keys, we can use a `match` statement:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };

    // new
    let key = match scancode {
        0x02 => Some('1'),
        0x03 => Some('2'),
        0x04 => Some('3'),
        0x05 => Some('4'),
        0x06 => Some('5'),
        0x07 => Some('6'),
        0x08 => Some('7'),
        0x09 => Some('8'),
        0x0a => Some('9'),
        0x0b => Some('0'),
        _ => None,
    };
    if let Some(key) = key {
        print!("{}", key);
    }

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Teclado.as_u8());
    }
}
```

The above code translates keypresses of the number keys 0-9 and ignores all other keys. It uses a [match] statement to assign a character or `None` to each scancode. It then uses [`if let`] to destructure the optional `key`. By using the same variable name `key` in the pattern, we [shadow][sombra] the previous declaration, which is a common pattern for destructuring `Option` types in Rust.
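The set 1 encoding described above (lower 7 bits for the key, top bit for press or release) can be sketched as a pair of pure functions that run on a normal host. The names `KeyState`, `split_scancode`, and `digit_for_code` are hypothetical, and the sketch deliberately ignores the `0xe0` escape byte, so it only illustrates the bit layout.

```rust
/// Whether a scancode byte reports a key going down or coming back up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KeyState {
    Pressed,
    Released,
}

/// Splits one scancode set 1 byte into the key code (lower 7 bits)
/// and the press/release state (most significant bit).
/// Note: the `0xe0` escape byte is not handled in this sketch.
fn split_scancode(byte: u8) -> (u8, KeyState) {
    let code = byte & 0x7f;
    let state = if byte & 0x80 == 0 {
        KeyState::Pressed
    } else {
        KeyState::Released
    };
    (code, state)
}

/// Maps the key codes of the number row to their digits,
/// equivalent to the `match` table in the handler above.
fn digit_for_code(code: u8) -> Option<char> {
    match code {
        // Codes 0x02..=0x0a are the keys '1' through '9' in order.
        0x02..=0x0a => Some((b'1' + (code - 0x02)) as char),
        // The '0' key sits to the right of '9' on the keyboard.
        0x0b => Some('0'),
        _ => None,
    }
}
```

For instance, pressing the `1` key produces `0x02` and releasing it produces `0x82`; both split into key code `0x02`, which is why the handler above only reacts to presses.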
[match]: https://doc.rust-lang.org/book/ch06-02-match.html
[`if let`]: https://doc.rust-lang.org/book/ch19-01-all-the-places-for-patterns.html#conditional-if-let-expressions
[sombra]: https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#shadowing

Now we can write numbers:

![QEMU printing numbers to the screen](qemu-printing-numbers.gif)

Translating the other keys works in the same way. Fortunately, there is a crate named [`pc-keyboard`] for translating scancodes of scancode sets 1 and 2, so we don't have to implement this ourselves. To use the crate, we add it to our `Cargo.toml`:

[`pc-keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/

```toml
# in Cargo.toml

[dependencies]
pc-keyboard = "0.7.0"
```

Now we can use this crate to rewrite our `keyboard_interrupt_handler`:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1};
    use spin::Mutex;
    use x86_64::instructions::port::Port;

    lazy_static! {
        static ref KEYBOARD: Mutex<Keyboard<layouts::Us104Key, ScancodeSet1>> =
            Mutex::new(Keyboard::new(ScancodeSet1::new(),
                layouts::Us104Key, HandleControl::Ignore)
            );
    }

    let mut keyboard = KEYBOARD.lock();
    let mut port = Port::new(0x60);

    let scancode: u8 = unsafe { port.read() };
    if let Ok(Some(key_event)) = keyboard.add_byte(scancode) {
        if let Some(key) = keyboard.process_keyevent(key_event) {
            match key {
                DecodedKey::Unicode(character) => print!("{}", character),
                DecodedKey::RawKey(key) => print!("{:?}", key),
            }
        }
    }

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Teclado.as_u8());
    }
}
```

We use the `lazy_static` macro to create a static [`Keyboard`] object protected by a Mutex. We initialize the `Keyboard` with a US keyboard layout and scancode set 1.
The [`HandleControl`] parameter allows mapping `ctrl+[a-z]` to the Unicode characters `U+0001` through `U+001A`. We don't want to do that, so we use the `Ignore` option to handle `ctrl` like a normal key.

[`HandleControl`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/enum.HandleControl.html

On each interrupt, we lock the Mutex, read the scancode from the keyboard controller, and pass it to the [`add_byte`] method, which translates the scancode into an `Option<KeyEvent>`. The [`KeyEvent`] contains the key which caused the event and whether it was a press or release event.

[`Keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html
[`add_byte`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.add_byte
[`KeyEvent`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.KeyEvent.html

To interpret this key event, we pass it to the [`process_keyevent`] method, which translates the key event to a character, if possible. For example, it translates a press event of the `A` key to either a lowercase `a` character or an uppercase `A` character, depending on whether the shift key was pressed.

[`process_keyevent`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.process_keyevent

With this modified interrupt handler, we can now write text:

![Typing "Hello World" in QEMU](qemu-typing.gif)

### Configuring the Keyboard

It's possible to configure some aspects of a PS/2 keyboard, for example, which scancode set it should use. We won't cover it here because this post is already long enough, but the OSDev Wiki has an overview of possible [configuration commands].

[configuration commands]: https://wiki.osdev.org/PS/2_Keyboard#Commands

## Summary

This post explained how to enable and handle external interrupts.
We learned about the 8259 PIC and its primary/secondary layout, the remapping of the interrupt numbers, and the "end of interrupt" signal. We implemented handlers for the hardware timer and the keyboard and learned about the `hlt` instruction, which halts the CPU until the next interrupt.

Now we are able to interact with our kernel and have some fundamental building blocks for creating a small shell or simple games.

## What's next?

Timer interrupts are essential for an operating system because they provide a way to periodically interrupt the running process and let the kernel regain control. The kernel can then switch to a different process and create the illusion of multiple processes running in parallel.

But before we can create processes or threads, we need a way to allocate memory for them. The next posts will explore memory management to provide this fundamental building block.

================================================
FILE: blog/content/edition-2/posts/07-hardware-interrupts/index.fa.md
================================================
+++
title = "Hardware Interrupts"
weight = 7
path = "fa/hardware-interrupts"
date = 2018-10-22

[extra]
# Please update this when updating the translation
translation_based_on_commit = "b6ff79ac3290ea92c86763d49cc6c0ff4fb0ea30"
# GitHub usernames of the people that translated this post
translators = ["hamidrezakp", "MHBahrampour"]
rtl = true
+++

In this post, we set up the programmable interrupt controller to correctly forward hardware interrupts to the CPU. To handle these interrupts, we add new entries to our interrupt descriptor table, just like we did for our exception handlers. We will learn how to get periodic timer interrupts and how to get input from the keyboard.

This blog is openly developed on [GitHub][گیت‌هاب]. If you have any problems or questions, please open an issue there.
You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-07`][post branch] branch.

[گیت‌هاب]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-07

## Introduction

Interrupts provide a way to notify the CPU from attached hardware devices. So instead of letting the kernel periodically check the keyboard for new characters (a process called [_polling_]), the keyboard can notify the kernel of each keypress. This is much more efficient because the kernel only needs to act when something happened. It also allows faster reaction times since the kernel can react immediately and not only at the next poll.

[_polling_]: https://en.wikipedia.org/wiki/Polling_(computer_science)

Connecting all hardware devices directly to the CPU is not possible. Instead, a separate _interrupt controller_ aggregates the interrupts from all devices and then notifies the CPU:

```
                                    ____________             _____
               Timer ------------> |            |           |     |
               Keyboard ---------> | Interrupt  |---------> | CPU |
               Other Hardware ---> | Controller |           |_____|
               Etc. -------------> |____________|

```

Most interrupt controllers are programmable, which means they support different priority levels for interrupts. For example, this allows giving timer interrupts a higher priority than keyboard interrupts to ensure accurate timekeeping.

Unlike exceptions, hardware interrupts occur _asynchronously_. This means they are completely independent from the executed code and can occur at any time. Thus, we suddenly have a form of concurrency in our kernel with all the potential concurrency-related bugs. Rust's strict ownership model helps us here because it forbids mutable global state. However, deadlocks are still possible, as we will see later in this post.

## The 8259 PIC

The [Intel 8259] is a programmable interrupt controller (PIC) introduced in 1976.
It has long since been replaced by the newer [APIC], but its interface is still supported on current systems for backwards compatibility reasons. The 8259 PIC is significantly easier to set up than the APIC, so we will use it to introduce ourselves to interrupts before we switch to the APIC later.

[APIC]: https://en.wikipedia.org/wiki/Intel_APIC_Architecture

The 8259 has eight interrupt lines and several lines for communicating with the CPU. Typical systems of that era were equipped with two instances of the 8259 PIC, one primary and one secondary PIC, with the secondary connected to one of the interrupt lines of the primary:

[Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259

```
                     ____________                          ____________
Real Time Clock --> |            |   Timer -------------> |            |
ACPI -------------> |            |   Keyboard-----------> |            |      _____
Available --------> | Secondary  |----------------------> | Primary    |     |     |
Available --------> | Interrupt  |   Serial Port 2 -----> | Interrupt  |---> | CPU |
Mouse ------------> | Controller |   Serial Port 1 -----> | Controller |     |_____|
Co-Processor -----> |            |   Parallel Port 2/3 -> |            |
Primary ATA ------> |            |   Floppy disk -------> |            |
Secondary ATA ----> |____________|   Parallel Port 1----> |____________|

```

This graphic shows the typical assignment of interrupt lines. We see that most of the 15 lines have a fixed mapping, e.g., line 4 of the secondary PIC is assigned to the mouse.

Each controller can be configured through two [I/O ports], one “command” port and one “data” port. For the primary controller, these ports are `0x20` (command) and `0x21` (data). For the secondary controller, they are `0xa0` (command) and `0xa1` (data). For more information on how the PICs can be configured, see the [article on osdev.org].

[I/O ports]: @/edition-2/posts/04-testing/index.md#i-o-ports
[article on osdev.org]: https://wiki.osdev.org/8259_PIC

### Implementation

The default configuration of the PICs is not usable because it sends interrupt vector numbers in the range of 0–15 to the CPU. These numbers are already occupied by CPU exceptions. For example, number 8 corresponds to a double fault.
To fix this overlapping issue, we need to remap the PIC interrupts to different numbers. The actual range doesn't matter as long as it does not overlap with the exceptions, but typically the range of 32 to 47 is chosen, because these are the first free numbers after the 32 exception slots.

The configuration happens by writing special values to the command and data ports of the PICs. Fortunately, there is already a crate called [`pic8259`], so we don't need to write the initialization sequence ourselves. If you are interested in how it works, check out [its source code][pic crate source]. It is fairly small and well documented.

[pic crate source]: https://docs.rs/crate/pic8259/0.10.1/source/src/lib.rs

To add the crate as a dependency, we add the following to our project:

[`pic8259`]: https://docs.rs/pic8259/0.10.1/pic8259/

```toml
# in Cargo.toml

[dependencies]
pic8259 = "0.10.1"
```

The main abstraction provided by the crate is the [`ChainedPics`] struct, which represents the primary/secondary PIC layout we saw above. It is designed to be used in the following way:

[`ChainedPics`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html

```rust
// in src/interrupts.rs

use pic8259::ChainedPics;
use spin;

pub const PIC_1_OFFSET: u8 = 32;
pub const PIC_2_OFFSET: u8 = PIC_1_OFFSET + 8;

pub static PICS: spin::Mutex<ChainedPics> =
    spin::Mutex::new(unsafe { ChainedPics::new(PIC_1_OFFSET, PIC_2_OFFSET) });
```

As noted above, we set the offsets for the PICs to the range 32 to 47. By wrapping the `ChainedPics` struct in a `Mutex`, we can get safe mutable access (through the [`lock` method][spin mutex lock]), which we need in the next step. The `ChainedPics::new` function is unsafe because wrong offsets could cause undefined behavior.
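To make the remapping concrete, the following minimal sketch computes which interrupt vector a given PIC line arrives on once the offsets above are applied. It is an illustration only: the `vector_for_irq` helper is hypothetical and not part of the `pic8259` crate, and counting the secondary controller's lines as 8 to 15 is an assumption based on the diagram above.

```rust
// Hypothetical helper for illustration; not part of the `pic8259` crate.
const PIC_1_OFFSET: u8 = 32;
const PIC_2_OFFSET: u8 = PIC_1_OFFSET + 8;

/// Maps a PIC line (0..=7 on the primary, 8..=15 on the secondary) to the
/// interrupt vector the CPU sees after remapping.
/// Returns `None` for line numbers that do not exist.
fn vector_for_irq(irq: u8) -> Option<u8> {
    match irq {
        0..=7 => Some(PIC_1_OFFSET + irq),
        8..=15 => Some(PIC_2_OFFSET + (irq - 8)),
        _ => None,
    }
}
```

With these offsets, the timer (line 0) maps to vector 32, the keyboard (line 1) to vector 33, and the mouse (line 4 of the secondary PIC, line 12 overall) to vector 44, so none of them collides with the 32 exception slots.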
[spin mutex lock]: https://docs.rs/spin/0.5.2/spin/struct.Mutex.html#method.lock

We can now initialize the 8259 PIC in our `init` function:

```rust
// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
    unsafe { interrupts::PICS.lock().initialize() }; // new
}
```

We use the [`initialize`] function to perform the PIC initialization. Like the `ChainedPics::new` function, this function is also unsafe because it can cause undefined behavior if the PIC is misconfigured.

[`initialize`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html#method.initialize

If all goes well, we should continue to see the "It did not crash" message when executing `cargo run`.

## Enabling Interrupts

Until now, nothing happened because interrupts are still disabled in the CPU configuration. This means that the CPU does not listen to the interrupt controller at all, so no interrupts can reach the CPU. Let's change that:

```rust
// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
    unsafe { interrupts::PICS.lock().initialize() };
    x86_64::instructions::interrupts::enable();     // new
}
```

The `interrupts::enable` function of the `x86_64` crate executes the special `sti` instruction ("set interrupts") to enable external interrupts. When we try `cargo run` now, we see that a double fault occurs:

![QEMU printing `EXCEPTION: DOUBLE FAULT` because of hardware timer](qemu-hardware-timer-double-fault.png)

The reason for this double fault is that the hardware timer (the [Intel 8253], to be exact) is enabled by default, so we start receiving timer interrupts as soon as we enable interrupts. Since we haven't defined a handler function for it yet, our double fault handler is invoked.

[Intel 8253]: https://en.wikipedia.org/wiki/Intel_8253

## Handling Timer Interrupts

As we see from the graphic [above](#the-8259-pic), the timer uses line 0 of the primary PIC. This means that it arrives at the CPU as interrupt 32 (0 + offset 32).
Instead of hardcoding the index 32, we store it in an `InterruptIndex` enum:

```rust
// in src/interrupts.rs

#[derive(Debug, Clone, Copy)]
#[repr(u8)]
pub enum InterruptIndex {
    Timer = PIC_1_OFFSET,
}

impl InterruptIndex {
    fn as_u8(self) -> u8 {
        self as u8
    }

    fn as_usize(self) -> usize {
        usize::from(self.as_u8())
    }
}
```

The enum is a [C-like enum], so we can directly specify the index for each variant. The `repr(u8)` attribute specifies that each variant is represented as a `u8`. We will add more variants for other interrupts in the future.

[C-like enum]: https://doc.rust-lang.org/reference/items/enumerations.html#custom-discriminant-values-for-fieldless-enumerations

Now we can add a handler function for the timer interrupt:

```rust
// in src/interrupts.rs

use crate::print;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        […]
        idt[InterruptIndex::Timer.as_usize()]
            .set_handler_fn(timer_interrupt_handler); // new

        idt
    };
}

extern "x86-interrupt" fn timer_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!(".");
}
```

Our `timer_interrupt_handler` has the same signature as our exception handlers, because the CPU reacts identically to exceptions and external interrupts (the only difference is that some exceptions push an error code onto the stack). The [`InterruptDescriptorTable`] struct implements the [`IndexMut`] trait, so we can access individual entries through array indexing syntax.

[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html
[`IndexMut`]: https://doc.rust-lang.org/core/ops/trait.IndexMut.html

In our timer interrupt handler, we print a dot to the screen. As the timer interrupt happens periodically, we would expect to see a dot appearing on each timer tick.
However, when we run it, we see that only a single dot is printed:

![QEMU printing only a single dot for hardware timer](qemu-single-dot-printed.png)

### End of Interrupt

The reason is that the PIC expects an explicit "end of interrupt" (EOI) signal from our interrupt handler. This signal tells the PIC that the interrupt was processed and that the system is ready to receive the next interrupt. So the PIC thinks we are still busy processing the first timer interrupt and waits patiently for the EOI signal before sending the next one.

To send the EOI, we use our static `PICS` struct again:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn timer_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!(".");

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Timer.as_u8());
    }
}
```

The `notify_end_of_interrupt` method figures out whether the primary or secondary PIC sent the interrupt and then uses the `command` and `data` ports to send an EOI signal to the respective controllers. If the secondary PIC sent the interrupt, both PICs need to be notified because the secondary PIC is connected to an input line of the primary PIC.

We need to be careful to use the correct interrupt vector number, otherwise we could accidentally delete an important unsent interrupt or cause our system to hang. This is the reason the function is unsafe.

When we now execute `cargo run`, we see dots periodically appearing on the screen:

![QEMU printing consecutive dots showing the hardware timer](qemu-hardware-timer-dots.gif)

### Configuring the Timer

The hardware timer that we use is called the _Programmable Interval Timer_, or PIT for short. As the name says, it is possible to configure the interval between two interrupts. We won't go into details here because we will switch to the [APIC timer][تایمر APIC] soon, but the OSDev wiki has an extensive article about [configuring the PIT][پیکربندی PIT].
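As a rough illustration of what configuring the interval involves, the sketch below computes the 16-bit divisor for a desired interrupt frequency. It assumes the PIT input clock of roughly 1.193182 MHz described on the OSDev wiki; `PIT_BASE_HZ` and `pit_divisor` are hypothetical names, and the code only does the arithmetic without touching any I/O port.

```rust
// Assumption: PIT input clock in Hz, as given on the OSDev wiki.
const PIT_BASE_HZ: u32 = 1_193_182;

/// Returns the divisor that makes the PIT fire at roughly `target_hz`,
/// rounded to the nearest integer. Returns `None` if the frequency is zero
/// or the divisor would not fit into 1..=65535.
/// (Assumption: the hardware's special encoding for 65536 is left out here.)
fn pit_divisor(target_hz: u32) -> Option<u16> {
    if target_hz == 0 {
        return None;
    }
    // Widen to u64 so the rounding term cannot overflow for large inputs.
    let divisor = (u64::from(PIT_BASE_HZ) + u64::from(target_hz) / 2) / u64::from(target_hz);
    if divisor == 0 || divisor > u64::from(u16::MAX) {
        None
    } else {
        Some(divisor as u16)
    }
}
```

For example, `pit_divisor(100)` yields a divisor of 11932, which corresponds to roughly 100 interrupts per second, while very low frequencies such as 18 Hz fall outside the range this sketch accepts.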
[تایمر APIC]: https://wiki.osdev.org/APIC_timer
[پیکربندی PIT]: https://wiki.osdev.org/Programmable_Interval_Timer

## Deadlocks

We now have a form of concurrency in our kernel: the timer interrupts occur asynchronously, so they can interrupt our `_start` function at any time. Fortunately, Rust's ownership system prevents many types of concurrency-related bugs at compile time. One notable exception is deadlocks. A deadlock occurs if a thread tries to acquire a lock that will never become free. The thread then hangs indefinitely.

We can already provoke a deadlock in our kernel. Remember, our `println` macro calls the `vga_buffer::_print` function, which [locks a global `WRITER`][vga spinlock] using a spinlock:

[vga spinlock]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks

```rust
// in src/vga_buffer.rs

[…]

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    WRITER.lock().write_fmt(args).unwrap();
}
```

It locks the `WRITER`, calls `write_fmt` on it, and implicitly unlocks it at the end of the function. Now imagine that an interrupt occurs while the `WRITER` is locked and the interrupt handler tries to print something too:

| Timestep | _start                 | interrupt_handler                               |
| -------- | ---------------------- | ----------------------------------------------- |
| 0        | calls `println!`       |                                                 |
| 1        | `print` locks `WRITER` |                                                 |
| 2        |                        | **interrupt occurs**, handler begins to run     |
| 3        |                        | calls `println!`                                |
| 4        |                        | `print` tries to lock `WRITER` (already locked) |
| 5        |                        | `print` tries to lock `WRITER` (already locked) |
| …        |                        | …                                               |
| _never_  | _unlock `WRITER`_      |                                                 |

The `WRITER` is locked, so the interrupt handler waits until it becomes free. But this never happens, because the `_start` function only continues to run after the interrupt handler returns. Thus, the entire system hangs.
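The same situation can be reproduced in an ordinary user-space program. The following sketch is only an analogy: it uses `std::sync::Mutex` in place of the kernel's spinlock and `try_lock` in place of the blocking `lock`, so that the second acquisition reports failure instead of hanging. The names `handler_can_print` and `simulate_interrupt_during_print` are made up for this example.

```rust
use std::sync::Mutex;

// Stand-in for the global VGA writer; a byte buffer replaces the screen.
static WRITER: Mutex<Vec<u8>> = Mutex::new(Vec::new());

/// What the interrupt handler would experience: can it take the lock now?
fn handler_can_print() -> bool {
    WRITER.try_lock().is_ok()
}

/// Models an interrupt arriving while `_print` still holds the lock.
fn simulate_interrupt_during_print() -> bool {
    let mut guard = WRITER.lock().unwrap(); // `_start` locks the writer
    guard.push(b'-');
    // The "interrupt" fires here, before `guard` is dropped.
    handler_can_print()
}
```

Here `simulate_interrupt_during_print()` reports that the handler cannot take the lock. A spinlock's `lock` has no such escape hatch: at the point where `try_lock` fails in this sketch, the kernel's handler would keep spinning forever.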
### Provoking a Deadlock

We can easily provoke such a deadlock in our kernel by printing something in the loop at the end of our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    […]
    loop {
        use blog_os::print;
        print!("-");        // new
    }
}
```

When we run it in QEMU, we get output of the following form:

![QEMU output with many rows of hyphens and no dots](./qemu-deadlock.png)

We see that only a limited number of hyphens are printed until the first timer interrupt occurs. Then the system hangs because the timer interrupt handler deadlocks when it tries to print a dot. This is the reason that we see no dots in the above output.

The actual number of hyphens varies between runs because the timer interrupt occurs asynchronously. This non-determinism is what makes concurrency-related bugs so difficult to debug.

### Fixing the Deadlock

To avoid this deadlock, we can disable interrupts as long as the `Mutex` is locked:

```rust
// in src/vga_buffer.rs

/// Prints the given formatted string to the VGA text buffer
/// through the global `WRITER` instance.
#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;   // new

    interrupts::without_interrupts(|| {     // new
        WRITER.lock().write_fmt(args).unwrap();
    });
}
```

The [`without_interrupts`] function takes a [closure][کلوژر] and executes it in an interrupt-free environment. We use it to ensure that no interrupt can occur as long as the `Mutex` is locked. When we run our kernel now, we see that it keeps running without hanging. (We still don't notice any dots, but this is because they scroll by too fast. Try to slow down the printing, e.g., by putting a `for _ in 0..10000 {}` inside the loop.)
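To see what "executes it in an interrupt-free environment" means in practice, here is a small user-space model of the pattern. It is a sketch under assumptions, not the crate's implementation: an `AtomicBool` stands in for the CPU's interrupt flag, `without_interrupts_sim` is a hypothetical name, and the save, disable, run, restore order is assumed to mirror what the real function does.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-in for the CPU's interrupt flag (assumption: `true` means enabled).
static INTERRUPTS_ENABLED: AtomicBool = AtomicBool::new(true);

/// Runs `f` with the simulated interrupt flag cleared and restores the
/// previous state afterwards, so nested calls do not re-enable too early.
fn without_interrupts_sim<F, R>(f: F) -> R
where
    F: FnOnce() -> R,
{
    // Remember whether interrupts were enabled, then disable them.
    let was_enabled = INTERRUPTS_ENABLED.swap(false, Ordering::SeqCst);
    let result = f();
    // Only re-enable if they were enabled when we entered.
    if was_enabled {
        INTERRUPTS_ENABLED.store(true, Ordering::SeqCst);
    }
    result
}
```

Restoring the previous state instead of unconditionally enabling interrupts is the important design choice: an inner call made while interrupts are already disabled must not turn them back on behind the caller's back.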
[`without_interrupts`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.without_interrupts.html
[کلوژر]: https://doc.rust-lang.org/book/second-edition/ch13-01-closures.html

We can apply the same change to our serial printing function to ensure that no deadlocks occur with it either:

```rust
// in src/serial.rs

#[doc(hidden)]
pub fn _print(args: ::core::fmt::Arguments) {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;       // new

    interrupts::without_interrupts(|| {         // new
        SERIAL1
            .lock()
            .write_fmt(args)
            .expect("Printing to serial failed");
    });
}
```

Note that disabling interrupts shouldn't be a general solution. The problem is that it increases the worst-case interrupt latency, i.e., the time until the system reacts to an interrupt. Therefore, interrupts should only be disabled for a very short time.

## Fixing a Race Condition

If you run `cargo test`, you might see the `test_println_output` test failing:

```
> cargo test --lib
[…]
Running 4 tests
test_breakpoint_exception...[ok]
test_println... [ok]
test_println_many... [ok]
test_println_output... [failed]

Error: panicked at 'assertion failed: `(left == right)`
  left: `'.'`,
 right: `'S'`', src/vga_buffer.rs:205:9
```

The reason is a _race condition_ between the test and our timer handler. Remember, the test looks like this:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    let s = "Some test string that fits on a single line";
    println!("{}", s);
    for (i, c) in s.chars().enumerate() {
        let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read();
        assert_eq!(char::from(screen_char.ascii_character), c);
    }
}
```

The test prints a string to the VGA buffer and then checks the output by manually iterating over the `buffer_chars` array. The race condition occurs because the timer interrupt handler might run between the `println` and the reading of the screen characters. Note that this isn't a dangerous _data race_, which Rust completely prevents at compile time.
See the [_Rustonomicon_][nomicon-races] for details.

[nomicon-races]: https://doc.rust-lang.org/nomicon/races.html

To fix this, we need to keep the `WRITER` locked for the complete duration of the test, so that the timer handler can't write a `.` to the screen in between. The fixed test looks like this:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;

    let s = "Some test string that fits on a single line";
    interrupts::without_interrupts(|| {
        let mut writer = WRITER.lock();
        writeln!(writer, "\n{}", s).expect("writeln failed");
        for (i, c) in s.chars().enumerate() {
            let screen_char = writer.buffer.chars[BUFFER_HEIGHT - 2][i].read();
            assert_eq!(char::from(screen_char.ascii_character), c);
        }
    });
}
```

We performed the following changes:

- We keep the writer locked for the complete test by using the `lock()` method explicitly. Instead of `println`, we use the [`writeln`] macro, which allows printing to an already locked writer.
- To avoid another deadlock, we disable interrupts for the duration of the test. Otherwise, the test might get interrupted while the writer is still locked.
- Since the timer interrupt handler can still run before the test, we print an additional newline `\n` before printing the string `s`. This way, we avoid a test failure when the timer handler has already printed some `.` characters to the current line.

[`writeln`]: https://doc.rust-lang.org/core/macro.writeln.html

With the above changes, `cargo test` now deterministically succeeds again.

This was a very harmless race condition that only caused a test failure. As you can imagine, other race conditions can be much more difficult to debug due to their non-deterministic nature. Luckily, Rust prevents data races, which are the most serious class of race conditions since they can cause all kinds of undefined behavior, including system crashes and silent memory corruption.
## The `hlt` Instruction

Until now, we used a simple empty loop statement at the end of our `_start` and `panic` functions. This causes the CPU to spin endlessly, and thus works as expected. But it is also very inefficient, because the CPU continues to run at full speed even though there is no work to do. You can see this problem in your task manager when you run your kernel: the QEMU process needs close to 100% CPU the whole time.

What we really want to do is to halt the CPU until the next interrupt arrives. This allows the CPU to enter a sleep state in which it consumes much less energy. The [`hlt` instruction] does exactly that. Let's use this instruction to create an energy-efficient endless loop:

[`hlt` instruction]: https://en.wikipedia.org/wiki/HLT_(x86_instruction)

```rust
// in src/lib.rs

pub fn hlt_loop() -> ! {
    loop {
        x86_64::instructions::hlt();
    }
}
```

The `instructions::hlt` function is just a [thin wrapper] around the assembly instruction. It is safe because there is no way it can compromise memory safety.

[thin wrapper]: https://github.com/rust-osdev/x86_64/blob/5e8e218381c5205f5777cb50da3ecac5d7e3b1ab/src/instructions/mod.rs#L16-L22

We can now use this `hlt_loop` instead of the endless loops in our `_start` and `panic` functions:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    […]

    println!("It did not crash!");
    blog_os::hlt_loop();            // new
}

#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    blog_os::hlt_loop();            // new
}
```

Let's update our `lib.rs` as well:

```rust
// in src/lib.rs

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    init();
    test_main();
    hlt_loop();         // new
}

pub fn test_panic_handler(info: &PanicInfo) -> !
{
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    hlt_loop();         // new
}
```

When we run our kernel now in QEMU, we see a much lower CPU usage.

## Keyboard Input

Now that we are able to handle interrupts from external devices, we are finally able to add support for keyboard input. This will allow us to interact with our kernel for the first time.

[PS/2]: https://en.wikipedia.org/wiki/PS/2_port

Like the hardware timer, the keyboard controller is already enabled by default. So when you press a key, the keyboard controller sends an interrupt to the PIC, which forwards it to the CPU. The CPU looks for a handler function in the IDT, but the corresponding entry is empty. Therefore, a double fault occurs.

So let's add a handler function for the keyboard interrupt. It is quite similar to how we defined the handler for the timer interrupt; it just uses a different interrupt number:

```rust
// in src/interrupts.rs

#[derive(Debug, Clone, Copy)]
#[repr(u8)]
pub enum InterruptIndex {
    Timer = PIC_1_OFFSET,
    Keyboard, // new
}

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        […]
        // new
        idt[InterruptIndex::Keyboard.as_usize()]
            .set_handler_fn(keyboard_interrupt_handler);

        idt
    };
}

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!("k");

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

As we see from the graphic [above](#the-8259-pic), the keyboard uses line 1 of the primary PIC. This means that it arrives at the CPU as interrupt 33 (1 + offset 32). We add this index as a new `Keyboard` variant to the `InterruptIndex` enum. We don't need to specify the value explicitly, since it defaults to the previous value plus one, which is also 33.
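The implicit numbering can be checked with a small standalone snippet. It repeats the enum from above outside the kernel (with a local copy of `PIC_1_OFFSET`), so it introduces no new API; it is only meant as a quick sanity check of the discriminant rule.

```rust
// Local copy of the offset used in the post.
const PIC_1_OFFSET: u8 = 32;

#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(u8)]
enum InterruptIndex {
    Timer = PIC_1_OFFSET,
    Keyboard, // no explicit value: previous discriminant plus one
}

impl InterruptIndex {
    fn as_u8(self) -> u8 {
        self as u8
    }
}
```

Casting `InterruptIndex::Keyboard` with `as_u8()` gives 33 here, matching the vector on which the keyboard interrupt arrives.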
In the interrupt handler, we print a `k` and send the end of interrupt signal to the interrupt controller.

We now see that a `k` appears on the screen when we press a key. However, this only works for the first key we press. Even if we continue to press keys, no more `k`s appear on the screen. This is because the keyboard controller won't send another interrupt until we have read the so-called _scancode_ of the pressed key.

### Reading the Scancodes

To find out _which_ key was pressed, we need to query the keyboard controller. We do this by reading from the data port of the PS/2 controller, which is the [I/O port] with the number `0x60`:

[I/O port]: @/edition-2/posts/04-testing/index.md#i-o-ports

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };
    print!("{}", scancode);

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

We use the [`Port`] type of the `x86_64` crate to read a byte from the keyboard's data port. This byte is called the [_scancode_], and it is a number that represents the key press or release. We don't do anything with the scancode yet, other than print it to the screen:

[`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html
[_scancode_]: https://en.wikipedia.org/wiki/Scancode

![QEMU printing scancodes to the screen when keys are pressed](qemu-printing-scancodes.gif)

The above image shows me slowly typing "123". We see that adjacent keys have adjacent scancodes and that pressing a key causes a different scancode than releasing it. But how exactly do we translate the scancodes to the actual key actions?

### Interpreting the Scancodes

There are three different standards for the mapping between scancodes and keys, the so-called _scancode sets_.
All three go back to the keyboards of early IBM computers: the [IBM XT], the [IBM 3270 PC], and the [IBM AT]. Later computers fortunately did not continue the trend of defining new scancode sets, but rather emulated the existing sets and extended them. Today, most keyboards can be configured to emulate any of the three sets.

[IBM XT]: https://en.wikipedia.org/wiki/IBM_Personal_Computer_XT
[IBM 3270 PC]: https://en.wikipedia.org/wiki/IBM_3270_PC
[IBM AT]: https://en.wikipedia.org/wiki/IBM_Personal_Computer/AT

By default, PS/2 keyboards emulate scancode set 1 ("XT"). In this set, the lower 7 bits of a scancode byte define the key, and the most significant bit defines whether it is a press ("0") or a release ("1"). Keys that were not present on the original [IBM XT] keyboard, such as the enter key on the keypad, generate two scancodes in succession: a `0xe0` escape byte and then a byte representing the key. For a list of all set 1 scancodes and their corresponding keys, check out the [OSDev Wiki][scancode set 1].

[scancode set 1]: https://wiki.osdev.org/Keyboard#Scan_Code_Set_1

To translate the scancodes to keys, we can use a `match` statement:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };

    // new
    let key = match scancode {
        0x02 => Some('1'),
        0x03 => Some('2'),
        0x04 => Some('3'),
        0x05 => Some('4'),
        0x06 => Some('5'),
        0x07 => Some('6'),
        0x08 => Some('7'),
        0x09 => Some('8'),
        0x0a => Some('9'),
        0x0b => Some('0'),
        _ => None,
    };
    if let Some(key) = key {
        print!("{}", key);
    }

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

The above code translates keypresses of the number keys 0 to 9 and ignores all other keys. It uses a [match] statement to assign a character or `None` to each scancode. It then uses [`if let`] to destructure the optional `key`.
By using the same variable name `key` in the pattern, we [shadow] the previous declaration, which is a common pattern for destructuring `Option` types in Rust.

[match]: https://doc.rust-lang.org/book/ch06-02-match.html
[`if let`]: https://doc.rust-lang.org/book/ch19-01-all-the-places-for-patterns.html#conditional-if-let-expressions
[shadow]: https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#shadowing

Now we can write numbers:

![QEMU printing numbers to the screen](qemu-printing-numbers.gif)

Translating the other keys works in the same way. Fortunately, there is a crate named [`pc-keyboard`] for translating scancodes of scancode sets 1 and 2, so we don't have to implement this ourselves. To use the crate, we add it to our `Cargo.toml` and import it in our `lib.rs`:

[`pc-keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/

```toml
# in Cargo.toml

[dependencies]
pc-keyboard = "0.7.0"
```

Now we can use this crate to rewrite our `keyboard_interrupt_handler`:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1};
    use spin::Mutex;
    use x86_64::instructions::port::Port;

    lazy_static! {
        static ref KEYBOARD: Mutex<Keyboard<layouts::Us104Key, ScancodeSet1>> =
            Mutex::new(Keyboard::new(ScancodeSet1::new(),
                layouts::Us104Key, HandleControl::Ignore)
            );
    }

    let mut keyboard = KEYBOARD.lock();
    let mut port = Port::new(0x60);

    let scancode: u8 = unsafe { port.read() };
    if let Ok(Some(key_event)) = keyboard.add_byte(scancode) {
        if let Some(key) = keyboard.process_keyevent(key_event) {
            match key {
                DecodedKey::Unicode(character) => print!("{}", character),
                DecodedKey::RawKey(key) => print!("{:?}", key),
            }
        }
    }

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

We use the `lazy_static` macro to create a static [`Keyboard`] object protected by a Mutex.
We initialize the `Keyboard` with a US keyboard layout and scancode set 1. The [`HandleControl`] parameter allows mapping `ctrl+[a-z]` to the Unicode characters `U+0001` through `U+001A`. We don't want to do that, so we use the `Ignore` option to handle `ctrl` like normal keys.

[`HandleControl`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/enum.HandleControl.html

On each interrupt, we lock the Mutex, read the scancode from the keyboard controller, and pass it to the [`add_byte`] method, which translates the scancode into an `Option<KeyEvent>`.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there (translator's note: the links point to the English original). You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-07`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-07

## Overview

Interrupts provide a way to notify the CPU from attached hardware devices. So instead of letting the kernel periodically check the keyboard for new characters (a process called [_polling_]), the keyboard can notify the kernel of each keypress. This is much more efficient because the kernel only needs to act when something happened. It also shortens the response time, since the kernel can react immediately without waiting for the next poll.

[_polling_]: https://ja.wikipedia.org/wiki/%E3%83%9D%E3%83%BC%E3%83%AA%E3%83%B3%E3%82%B0_(%E6%83%85%E5%A0%B1)

Connecting all hardware devices directly to the CPU is not possible. Instead, a separate _interrupt controller_ aggregates the interrupts from all devices and then notifies the CPU:

```
                                    ____________             _____
               Timer ------------> |            |           |     |
               Keyboard ---------> | Interrupt  |---------> | CPU |
               Other Hardware ---> | Controller |           |_____|
               Etc. -------------> |____________|

```

Most interrupt controllers are programmable, which means that a different priority can be set for each interrupt. For example, timer interrupts can be given a higher priority than keyboard interrupts to ensure accurate timekeeping.

Unlike exceptions, hardware interrupts occur _asynchronously_. This means they are completely independent of the executed code and can occur at any time. Thus, our kernel suddenly has a form of concurrency, with all the potential concurrency-related bugs. Rust's strict ownership model helps here because it forbids mutable global state. However, deadlocks are still possible, as we will see later in this post.

## 8259 PIC

The [Intel 8259] is a programmable interrupt controller (PIC) introduced in 1976. It has long been replaced by the newer [APIC], but its interface is still supported on current systems for backwards compatibility. The 8259 PIC is considerably easier to set up than the APIC, so we will use it to get started with interrupt handling before switching to the APIC in a later post.

[APIC]: https://ja.wikipedia.org/wiki/APIC

The 8259 PIC has eight interrupt lines and several lines for communicating with the CPU. Typical systems back then were equipped with two instances of the 8259 PIC, one primary and one secondary, with the secondary PIC connected to one of the interrupt lines of the primary:

[Intel 8259]: https://ja.wikipedia.org/wiki/Intel_8259

```
                     ____________                          ____________
Real Time Clock --> |            |   Timer -------------> |            |
ACPI -------------> |            |   Keyboard-----------> |            |      _____
Available --------> | Secondary  |----------------------> | Primary    |     |     |
Available --------> | Interrupt  |   Serial Port 2 -----> | Interrupt  |---> | CPU |
Mouse ------------> | Controller |   Serial Port 1 -----> | Controller |     |_____|
Co-Processor -----> |            |   Parallel Port 2/3 -> |            |
Primary ATA ------> |            |   Floppy disk -------> |            |
Secondary ATA ----> |____________|   Parallel Port 1----> |____________|

```

This graphic shows the typical assignment of interrupt lines. Most of the 15 lines have a fixed mapping. For example, line 4 of the secondary PIC is assigned to the mouse.

Each controller can be configured through two [I/O ports], a "command" port and a "data" port. For the primary controller, these ports are `0x20` (command) and `0x21` (data). For the secondary controller, they are `0xa0` (command) and `0xa1` (data). For details on how to configure the PICs, see the [article on osdev.org].

[I/O ports]: @/edition-2/posts/04-testing/index.md#i-o-ports
[article on osdev.org]: https://wiki.osdev.org/8259_PIC

### Implementation

The default configuration of the PICs is not usable because it sends interrupt vector numbers in the range of 0 to 15 to the CPU. These numbers are already occupied by CPU exceptions; for example, number 8 corresponds to a double fault. To fix this overlap, we need to remap the PIC interrupts to different numbers. The actual range doesn't matter as long as it does not overlap with the exceptions, but the range of 32 to 47 is commonly used, because these are the first free numbers after the 32 exception slots.

The PICs are configured by writing special values to their command and data ports. Fortunately, there is already a crate called [`pic8259`], so we don't need to write the initialization sequence ourselves. If you are interested in how the crate works, check out [its source code][pic crate source]. It is very small and well documented.

[pic crate source]: https://docs.rs/crate/pic8259/0.10.1/source/src/lib.rs

To add the crate as a dependency, we add the following to our project:

[`pic8259`]: https://docs.rs/pic8259/0.10.1/pic8259/

```toml
# in Cargo.toml

[dependencies]
pic8259 = "0.10.1"
```

The main abstraction provided by the crate is the [`ChainedPics`] struct, which represents the primary/secondary PIC layout we saw above. It is designed to be used in the following way:

[`ChainedPics`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html

```rust
// in src/interrupts.rs

use pic8259::ChainedPics;
use spin;

pub const PIC_1_OFFSET: u8 = 32;
pub const PIC_2_OFFSET: u8 = PIC_1_OFFSET + 8;

pub static PICS: spin::Mutex<ChainedPics> =
    spin::Mutex::new(unsafe { ChainedPics::new(PIC_1_OFFSET, PIC_2_OFFSET) });
```

As noted above, we set the offsets for the PICs to the range 32 to 47. By wrapping the `ChainedPics` struct in a `Mutex`, we get safe mutable access (through the [`lock` method][spin mutex lock]), which we need in the next step. The `ChainedPics::new` function is unsafe because wrong offsets could cause undefined behavior.

[spin mutex lock]: https://docs.rs/spin/0.5.2/spin/struct.Mutex.html#method.lock

We can now initialize the 8259 PIC in our `init` function:

```rust
// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
    unsafe { interrupts::PICS.lock().initialize() }; // new
}
```

We use the [`initialize`] function to perform the PIC initialization. Like the `ChainedPics::new` function, this function is also unsafe because a misconfigured PIC can cause undefined behavior.

[`initialize`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html#method.initialize

If all goes well, we should continue to see the "It did not crash" message when executing `cargo run`.

## Enabling Interrupts

Until now, nothing happened because interrupts were disabled in the CPU configuration. This means that the CPU ignored all signals from the interrupt controller, so no interrupt could reach the CPU. Let's change that:

```rust
// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
    unsafe { interrupts::PICS.lock().initialize() };
    x86_64::instructions::interrupts::enable();     // new
}
```

The `interrupts::enable` function of the `x86_64` crate executes the special `sti` ("set interrupts") instruction to enable external interrupts. When we try `cargo run` now, a double fault occurs:

![QEMU printing `EXCEPTION: DOUBLE FAULT` because of hardware timer](qemu-hardware-timer-double-fault.png)

The reason for this double fault is that the hardware timer (the [Intel 8253], to be exact) is enabled by default, so we start receiving timer interrupts as soon as we enable interrupts. Since we haven't defined a handler function for this interrupt, the double fault handler is invoked.

[Intel 8253]: https://en.wikipedia.org/wiki/Intel_8253

## Handling Timer Interrupts

As the graphic [above](#8259-pic) shows, the timer uses line 0 of the primary PIC. This means that the timer interrupt arrives at the CPU as interrupt 32 (0 + offset 32). Instead of hardcoding 32, we store it in an `InterruptIndex` enum:

```rust
// in src/interrupts.rs

#[derive(Debug, Clone, Copy)]
#[repr(u8)]
pub enum InterruptIndex {
    Timer = PIC_1_OFFSET,
}

impl InterruptIndex {
    fn as_u8(self) -> u8 {
        self as u8
    }

    fn as_usize(self) -> usize {
        usize::from(self.as_u8())
    }
}
```

The enum is a [C-like enum], so we can directly specify the index for each variant. The `repr(u8)` attribute specifies that each variant is represented as a `u8`. We will add more variants for other interrupts in the future.

[C-like enum]: https://doc.rust-lang.org/reference/items/enumerations.html#custom-discriminant-values-for-fieldless-enumerations

Now we can add a handler function for the timer interrupt:

```rust
// in src/interrupts.rs

use crate::print;

lazy_static!
{
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        […]
        idt[InterruptIndex::Timer.as_usize()]
            .set_handler_fn(timer_interrupt_handler); // new

        idt
    };
}

extern "x86-interrupt" fn timer_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!(".");
}
```

Because the CPU reacts identically to exceptions and external interrupts, our `timer_interrupt_handler` has the same signature as our exception handlers (the only difference is that some exceptions push an error code). The [`InterruptDescriptorTable`] struct implements the [`IndexMut`] trait, so we can access individual entries through array indexing syntax.

[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html
[`IndexMut`]: https://doc.rust-lang.org/core/ops/trait.IndexMut.html

In our timer interrupt handler, we print a dot to the screen. As the timer interrupt happens periodically, we would expect a new dot to appear on each timer tick. However, when we run it, only a single dot is printed:

![QEMU printing only a single dot for hardware timer](qemu-single-dot-printed.png)

### End of Interrupt

The reason is that the PIC expects an explicit "end of interrupt" (EOI) signal from our interrupt handler. This signal tells the controller that the interrupt was processed and that the system is ready to receive the next interrupt. So the PIC thinks our system is still busy processing the first timer interrupt and waits patiently for the EOI signal instead of sending the next one.

To send the EOI, we use our static `PICS` struct again:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn timer_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!(".");

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Timer.as_u8());
    }
}
```

The `notify_end_of_interrupt` method figures out whether the primary or secondary PIC sent the interrupt and then uses the command and data ports to send an EOI signal to the respective controllers. Because the secondary PIC is connected to an input line of the primary PIC, both PICs need to be notified if the secondary PIC sent the interrupt.

We need to be careful to use the correct interrupt vector number, otherwise we could accidentally delete an important unsent interrupt or cause our system to hang. This is the reason the function is unsafe.

When we now execute `cargo run`, dots appear periodically on the screen:

![QEMU printing consecutive dots showing the hardware timer](qemu-hardware-timer-dots.gif)

### Configuring the Timer

The hardware timer that we use is called the _Programmable Interval Timer_, or PIT for short. As the name says, the interval between two interrupts can be configured. We won't go into details here because we will switch to the [APIC timer] soon, but the OSDev wiki has an extensive article about [configuring the PIT].

[APIC timer]: https://wiki.osdev.org/APIC_timer
[configuring the PIT]: https://wiki.osdev.org/Programmable_Interval_Timer

## Deadlocks

We now have a form of concurrency in our kernel: the timer interrupts occur asynchronously, so they can interrupt our `_start` function at any time. Fortunately, Rust's ownership system prevents many types of concurrency-related bugs at compile time. One notable exception is deadlocks. A deadlock occurs if a thread tries to acquire a lock that will never become free, and the thread then hangs indefinitely.

We can already provoke a deadlock in our kernel. Remember, our `println` macro calls the `vga_buffer::_print` function, which [locks a global `WRITER`][vga spinlock] using a spinlock:

[vga spinlock]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks

```rust
// in src/vga_buffer.rs

[…]

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    WRITER.lock().write_fmt(args).unwrap();
}
```

The `_print` function locks the `WRITER`, calls `write_fmt` on it, and implicitly unlocks it at the end of the function. Now imagine that an interrupt occurs while the `WRITER` is locked and the interrupt handler tries to print something too:

| Timestep | _start                 | interrupt_handler                               |
| -------- | ---------------------- | ----------------------------------------------- |
| 0        | calls `println!`       |                                                 |
| 1        | `print` locks `WRITER` |                                                 |
| 2        |                        | **interrupt occurs**, handler begins to run     |
| 3        |                        | calls `println!`                                |
| 4        |                        | `print` tries to lock `WRITER` (already locked) |
| 5        |                        | `print` tries to lock `WRITER` (already locked) |
| …        |                        | …                                               |
| _never_  | _unlock `WRITER`_      |                                                 |

The `WRITER` is locked, so the interrupt handler waits until it becomes free. But the `_start` function only continues to run after the interrupt handler returns, so the lock is never released. Thus, the entire system hangs.

### Provoking a Deadlock

We can easily provoke such a deadlock in our kernel by printing something in the loop at the end of our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
{
    […]
    loop {
        use blog_os::print;
        print!("-");        // new
    }
}
```

When we run it in QEMU, we get output of the following form:

![QEMU output with many rows of hyphens and no dots](./qemu-deadlock.png)

We see that only a limited number of hyphens are printed before the first timer interrupt occurs. Then the system hangs, because the timer interrupt handler deadlocks when it tries to print a dot. This is why there are no dots in the output above.

The actual number of hyphens varies between runs because the timer interrupt occurs asynchronously. This non-determinism is what makes concurrency-related bugs so difficult to debug.

### Fixing the Deadlock

To avoid this deadlock, we can disable interrupts for as long as the `Mutex` is locked:

```rust
// in src/vga_buffer.rs

/// Prints the given formatted string to the VGA text buffer
/// through the global `WRITER` instance.
#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;   // new

    interrupts::without_interrupts(|| {     // new
        WRITER.lock().write_fmt(args).unwrap();
    });
}
```

The [`without_interrupts`] function takes a [closure] and executes it in an interrupt-free environment. We use it to ensure that no interrupt can occur while the `Mutex` is locked. When we run our kernel now, it keeps running without hanging. (We still don't notice any dots, but only because they scroll by too fast. Try to slow down the printing, e.g., by putting a `for _ in 0..10000 {}` inside the loop.)

[`without_interrupts`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.without_interrupts.html
[closure]: https://doc.rust-lang.org/book/ch13-01-closures.html

We apply the same change to our serial printing function to ensure that no deadlocks occur there either:

```rust
// in src/serial.rs

#[doc(hidden)]
pub fn _print(args: ::core::fmt::Arguments) {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;       // new

    interrupts::without_interrupts(|| {         // new
        SERIAL1
            .lock()
            .write_fmt(args)
            .expect("Printing to serial failed");
    });
}
```

Note that disabling interrupts shouldn't be a general solution. The problem is that it increases the worst-case interrupt latency, i.e., the time until the system reacts to an interrupt. Therefore, interrupts should only be disabled for a very short time.

## Fixing a Race Condition

If you run `cargo test`, you might see the `test_println_output` test failing:

```
> cargo test --lib
[…]
Running 4 tests
test_breakpoint_exception...[ok]
test_println... [ok]
test_println_many... [ok]
test_println_output...
[failed]

Error: panicked at 'assertion failed: `(left == right)`
  left: `'.'`,
 right: `'S'`', src/vga_buffer.rs:205:9
```

The reason is a _race condition_ between the test and our timer interrupt handler. Remember, the test looks like this:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    let s = "Some test string that fits on a single line";
    println!("{}", s);
    for (i, c) in s.chars().enumerate() {
        let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read();
        assert_eq!(char::from(screen_char.ascii_character), c);
    }
}
```

The test prints a string to the VGA buffer and then checks the output by manually iterating over the `buffer.chars` array. The race condition occurs because the timer interrupt handler might run between the `println` call and the reading of the screen characters. Note that this isn't a dangerous _data race_, which Rust completely prevents at compile time. See the [_Rustonomicon_][nomicon-races] for details.

[nomicon-races]: https://doc.rust-lang.org/nomicon/races.html

To fix this, we need to keep the `WRITER` locked for the complete duration of the test, so that the timer handler can't write a `.` to the screen in between. The fixed test looks like this:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;

    let s = "Some test string that fits on a single line";
    interrupts::without_interrupts(|| {
        let mut writer = WRITER.lock();
        writeln!(writer, "\n{}", s).expect("writeln failed");
        for (i, c) in s.chars().enumerate() {
            let screen_char = writer.buffer.chars[BUFFER_HEIGHT - 2][i].read();
            assert_eq!(char::from(screen_char.ascii_character), c);
        }
    });
}
```

We made the following changes:

- We keep the writer locked for the complete test by using the `lock()` method explicitly. Instead of `println`, we use the [`writeln`] macro, which allows printing to an already locked writer.
- To avoid another deadlock, we disable interrupts for the duration of the test. Otherwise, the test might get interrupted while the writer is still locked.
- Since the timer interrupt handler can still run before the test, we print an additional newline `\n` before printing the string `s`. This way, the test does not fail when the timer handler has already printed some `.` characters to the current line.

[`writeln`]: https://doc.rust-lang.org/core/macro.writeln.html

With the above changes, `cargo test` now deterministically succeeds again.

This was a harmless race condition that only caused a test failure. As you can imagine, other race conditions can be much more difficult to debug due to their non-deterministic nature. Luckily, Rust protects us from data races, the most serious class of race conditions, which can cause all kinds of undefined behavior, including system crashes and silent memory corruption.

## The `hlt` Instruction

Until now, we used a simple empty loop at the end of our `_start` and `panic` functions. This causes the CPU to spin endlessly, so it works as expected. But it is also very inefficient, because the CPU continues to run at full speed even though there is no work to do. You can see this problem in your task manager when you run your kernel: the QEMU process needs close to 100% CPU the whole time.

What we really want is to halt the CPU until the next interrupt arrives. This allows the CPU to enter a sleep state in which it consumes much less energy. The [`hlt` instruction] does exactly that. Let's use it to create an energy-efficient endless loop:

[`hlt` instruction]: https://en.wikipedia.org/wiki/HLT_(x86_instruction)

```rust
// in src/lib.rs

pub fn hlt_loop() -> ! {
    loop {
        x86_64::instructions::hlt();
    }
}
```

The `instructions::hlt` function is just a [thin wrapper] around the assembly instruction. It is not unsafe because there is no way it can compromise memory safety.

[thin wrapper]: https://github.com/rust-osdev/x86_64/blob/5e8e218381c5205f5777cb50da3ecac5d7e3b1ab/src/instructions/mod.rs#L16-L22

We can now use `hlt_loop` instead of the endless loops in our `_start` and `panic` functions:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    […]

    println!("It did not crash!");
    blog_os::hlt_loop();            // new
}


#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    blog_os::hlt_loop();            // new
}
```

Let's update our `lib.rs` as well:

```rust
// in src/lib.rs

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    init();
    test_main();
    hlt_loop();         // new
}

pub fn test_panic_handler(info: &PanicInfo) -> !
{
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    hlt_loop();         // new
}
```

When we run our kernel in QEMU now, we see a much lower CPU usage.

## Keyboard Input

Now that we are able to handle interrupts from external devices, we can finally add support for keyboard input. This will allow us to interact with our kernel for the first time.

[PS/2]: https://ja.wikipedia.org/wiki/PS/2%E3%82%B3%E3%83%8D%E3%82%AF%E3%82%BF

Like the hardware timer, the keyboard controller is already enabled by default. So when you press a key, the keyboard controller sends an interrupt to the PIC, which forwards it to the CPU. The CPU looks for a handler function in the IDT, but the corresponding entry is empty. Therefore, a double fault occurs.

So let's add a handler function for the keyboard interrupt. It is almost identical to how we defined the handler for the timer interrupt; it just uses a different interrupt number:

```rust
// in src/interrupts.rs

#[derive(Debug, Clone, Copy)]
#[repr(u8)]
pub enum InterruptIndex {
    Timer = PIC_1_OFFSET,
    Keyboard, // new
}

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        […]
        // new
        idt[InterruptIndex::Keyboard.as_usize()]
            .set_handler_fn(keyboard_interrupt_handler);

        idt
    };
}

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!("k");

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

As the figure [above](#8259-pic) shows, the keyboard uses line 1 of the primary PIC. This means that it arrives at the CPU as interrupt 33 (1 + offset 32). We add this index as a new `Keyboard` variant to the `InterruptIndex` enum. We don't need to specify the value explicitly, since it defaults to the previous value plus one, which is 33. In the interrupt handler, we print a `k` and send the end of interrupt signal to the interrupt controller.

We now see that a `k` appears on the screen when we press a key. However, this only works for the first key we press. Even if we continue to press keys, no more `k`s appear on the screen. This is because the keyboard controller won't send another interrupt until we have read the so-called _scancode_ of the pressed key.

### Reading the Scancodes

To find out _which_ key was pressed, we need to query the keyboard controller. We do this by reading from the data port of the PS/2 controller, which is the [I/O port] with the number `0x60`:

[I/O port]: @/edition-2/posts/04-testing/index.md#i-o-ports

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };
    print!("{}", scancode);

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

We use the [`Port`] type of the `x86_64` crate to read a byte from the keyboard's data port. This byte is called the [_scancode_] and represents the key press or release. We don't do anything with the scancode yet, other than print it to the screen:

[`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html
[_scancode_]: https://ja.wikipedia.org/wiki/%E3%82%B9%E3%82%AD%E3%83%A3%E3%83%B3%E3%82%B3%E3%83%BC%E3%83%89

![QEMU printing scancodes to the screen when keys are pressed](qemu-printing-scancodes.gif)

The above image shows me slowly typing "123". We see that adjacent keys have adjacent scancodes and that pressing a key produces a different scancode than releasing it. But how exactly do we translate the scancodes to the actual key actions?

### Interpreting the Scancodes

There are three different standards for the mapping between scancodes and keys, the so-called _scancode sets_. All three go back to the keyboards of early IBM computers: the [IBM XT], the [IBM 3270 PC], and the [IBM AT]. Fortunately, later computers did not continue the trend of defining new scancode sets, but instead emulated and extended the existing ones. Today, most keyboards can be configured to emulate any of the three sets.

[IBM XT]: https://ja.wikipedia.org/wiki/IBM_PC_XT
[IBM 3270 PC]: https://en.wikipedia.org/wiki/IBM_3270_PC
[IBM AT]: https://ja.wikipedia.org/wiki/PC/AT

By default, PS/2 keyboards emulate scancode set 1 ("XT"). In this set, the lower 7 bits of a scancode byte define the key, and the most significant bit defines whether it is a press ("0") or a release ("1"). Keys that were not present on the original IBM XT keyboard, such as the enter key on the keypad, generate two scancodes in succession: a `0xe0` escape byte followed by a byte representing the key. For a list of all set 1 scancodes and their corresponding keys, check out the [OSDev Wiki][scancode set 1].

[scancode set 1]: https://wiki.osdev.org/Keyboard#Scan_Code_Set_1

To translate the scancodes to keys, we can use a `match` statement:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };

    // new
    let key = match scancode {
        0x02 => Some('1'),
        0x03 => Some('2'),
        0x04 => Some('3'),
        0x05 => Some('4'),
        0x06 => Some('5'),
        0x07 => Some('6'),
        0x08 => Some('7'),
        0x09 => Some('8'),
        0x0a => Some('9'),
        0x0b => Some('0'),
        _ => None,
    };
    if let Some(key) = key {
        print!("{}", key);
    }

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

The above code translates keypresses of the number keys 0 through 9 and ignores all other keys. It uses a [match] statement to assign a character or `None` to each scancode. It then uses [`if let`] to destructure the optional `key`. By using the same variable name `key` in the pattern, we [shadow] the previous declaration, which is a common pattern for destructuring `Option` types in Rust.

[match]: https://doc.rust-lang.org/book/ch06-02-match.html
[`if let`]: https://doc.rust-lang.org/book/ch19-01-all-the-places-for-patterns.html#conditional-if-let-expressions
[shadow]: https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#shadowing

Now we can write numbers:

![QEMU printing numbers to the screen](qemu-printing-numbers.gif)

Translating the other keys works in the same way. Fortunately, there is a crate named [`pc-keyboard`] for translating scancodes of scancode sets 1 and 2, so we don't have to implement this ourselves. To use the crate, we add it to our `Cargo.toml` and import it in our `lib.rs`:

[`pc-keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/

```toml
# in Cargo.toml

[dependencies]
pc-keyboard = "0.7.0"
```

Now we can use this crate to rewrite our `keyboard_interrupt_handler`:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1};
    use spin::Mutex;
    use x86_64::instructions::port::Port;

    lazy_static!
    {
        static ref KEYBOARD: Mutex<Keyboard<layouts::Us104Key, ScancodeSet1>> =
            Mutex::new(Keyboard::new(ScancodeSet1::new(),
                layouts::Us104Key, HandleControl::Ignore)
            );
    }

    let mut keyboard = KEYBOARD.lock();
    let mut port = Port::new(0x60);

    let scancode: u8 = unsafe { port.read() };
    if let Ok(Some(key_event)) = keyboard.add_byte(scancode) {
        if let Some(key) = keyboard.process_keyevent(key_event) {
            match key {
                DecodedKey::Unicode(character) => print!("{}", character),
                DecodedKey::RawKey(key) => print!("{:?}", key),
            }
        }
    }

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

We use the `lazy_static` macro to create a static [`Keyboard`] object protected by a mutex. We initialize the `Keyboard` with a US keyboard layout and scancode set 1. The [`HandleControl`] parameter makes it possible to map `ctrl+[a-z]` to the Unicode characters `U+0001` through `U+001A`. We don't want that, so we use the `Ignore` option to treat `ctrl` like a normal key.

[`HandleControl`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/enum.HandleControl.html

On each interrupt, we lock the mutex, read the scancode from the keyboard controller, and pass it to the [`add_byte`] method, which translates the scancode into an `Option<KeyEvent>`. The [`KeyEvent`] contains the key that caused the event and whether it was a press or a release.

[`Keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html
[`add_byte`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.add_byte
[`KeyEvent`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.KeyEvent.html

To interpret this key event, we pass it to the [`process_keyevent`] method, which translates the key event to a character, if possible. For example, it translates a press event of the `A` key to either a lowercase `a` or an uppercase `A`, depending on whether the shift key was pressed.

[`process_keyevent`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.process_keyevent

With this modified interrupt handler, we can now write text:

![Typing "Hello World" in QEMU](qemu-typing.gif)

### Configuring the Keyboard

It is possible to configure some aspects of a PS/2 keyboard, for example, which scancode set it should use. We won't cover it here because this post is long enough already, but the OSDev Wiki has an overview of the possible [configuration commands].

[configuration commands]: https://wiki.osdev.org/PS/2_Keyboard#Commands

## Summary

This post explained how to enable and handle external interrupts. We learned about the 8259 PIC and its primary/secondary layout, the remapping of the interrupt numbers, and the "end of interrupt" signal. We implemented handlers for the hardware timer and the keyboard and learned about the `hlt` instruction, which halts the CPU until the next interrupt.

Now we are able to interact with our kernel and have some fundamental building blocks for creating a small shell or simple games.

## What's next?

Timer interrupts are essential for an operating system because they provide a way to periodically interrupt the running process and let the kernel regain control. The kernel can then switch to a different process and create the illusion of multiple processes running in parallel.

But before we can create processes or threads, we need a way to allocate memory for them. The next posts will explore memory management to provide this fundamental building block.

================================================
FILE: blog/content/edition-2/posts/07-hardware-interrupts/index.ko.md
================================================
+++
title = "Hardware Interrupts"
weight = 7
path = "ko/hardware-interrupts"
date = 2018-10-22

[extra]
# Please update this when updating the translation
translation_based_on_commit = "a108367d712ef97c28e8e4c1a22da4697ba6e6cd"
# GitHub usernames of the people that translated this post
translators = ["JOE1994"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = ["dalinaum"]
+++

In this post, we set up the programmable interrupt controller to correctly forward hardware interrupts to the CPU. To handle these interrupts, we add new entries to our interrupt descriptor table, just like we did for our exception handlers. We will also learn how to get periodic timer interrupts and how to receive keyboard input.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-07`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-07

## Overview

Interrupts provide a way for attached hardware devices to notify the CPU. So instead of letting the kernel periodically check the keyboard for new input (a process called [_polling_]), the keyboard can notify the kernel of each keypress. This is more energy-efficient because the kernel only needs to act when something has happened. It also allows faster reaction times, since the kernel can react immediately and not only at the next poll.

[_polling_]: https://en.wikipedia.org/wiki/Polling_(computer_science)

Connecting all hardware devices directly to the CPU is not possible.
Instead, a separate _interrupt controller_ aggregates the interrupts from all devices and then notifies the CPU:

```
                                    ____________             _____
               Timer ------------> |            |           |     |
               Keyboard ---------> | Interrupt  |---------> | CPU |
               Other Hardware ---> | Controller |           |_____|
               Etc. -------------> |____________|

```

Most interrupt controllers are programmable, which means they support different priority levels for interrupts. For example, timer interrupts can be given a higher priority than keyboard interrupts to ensure accurate timekeeping.

Unlike exceptions, hardware interrupts occur _asynchronously_. This means they are completely independent of the executed code and can occur at any time. Thus, we suddenly have a form of concurrency in our kernel, with all the potential concurrency-related bugs. Rust's strict ownership model helps us here because it forbids mutable global state. However, it cannot prevent deadlocks, as we will see later in this post.

## The 8259 PIC

The [Intel 8259] is a programmable interrupt controller (PIC) introduced in 1976. It has long been replaced by the newer [APIC], but its interface is still supported on current systems for backwards compatibility. The 8259 PIC is significantly easier to set up than the APIC, so we will use it to introduce ourselves to interrupts before we switch to the APIC in a later post.

[APIC]: https://en.wikipedia.org/wiki/Intel_APIC_Architecture

The 8259 has eight interrupt lines and several lines for communicating with the CPU. Typical systems back then were equipped with two instances of the 8259 PIC, one primary and one secondary, with the secondary PIC connected to one of the interrupt lines of the primary:

[Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259

```
                     ____________                          ____________
Real Time Clock --> |            |   Timer -------------> |            |
ACPI -------------> |            |   Keyboard-----------> |            |      _____
Available --------> | Secondary  |----------------------> | Primary    |     |     |
Available --------> | Interrupt  |   Serial Port 2 -----> | Interrupt  |---> | CPU |
Mouse ------------> | Controller |   Serial Port 1 -----> | Controller |     |_____|
Co-Processor -----> |            |   Parallel Port 2/3 -> |            |
Primary ATA ------> |            |   Floppy disk -------> |            |
Secondary ATA ----> |____________|   Parallel Port 1----> |____________|

```

This graphic shows the typical assignment of interrupt lines. We see that most of the 15 lines have a fixed mapping, e.g., line 4 of the secondary PIC is assigned to the mouse.
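To make the line assignment concrete, here is a small sketch that encodes the diagram as a lookup table. The names `Pic`, `device_on_line`, and `irq_number` are made up for this illustration and are not part of any crate. The device names are copied from the diagram; the only added assumption is the conventional numbering in which the lines of the secondary PIC continue at IRQ 8.

```rust
/// Which of the two chained controllers a line belongs to (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pic {
    Primary,
    Secondary,
}

/// Devices per input line of the primary PIC, copied from the diagram.
/// Line 2 is the input that the secondary PIC is connected to.
static PRIMARY_LINES: [&str; 8] = [
    "Timer",
    "Keyboard",
    "Secondary PIC (cascade)",
    "Serial Port 2",
    "Serial Port 1",
    "Parallel Port 2/3",
    "Floppy disk",
    "Parallel Port 1",
];

/// Devices per input line of the secondary PIC, copied from the diagram.
static SECONDARY_LINES: [&str; 8] = [
    "Real Time Clock",
    "ACPI",
    "Available",
    "Available",
    "Mouse",
    "Co-Processor",
    "Primary ATA",
    "Secondary ATA",
];

/// Returns the device wired to `line` (0-7) of the given controller, if any.
pub fn device_on_line(pic: Pic, line: u8) -> Option<&'static str> {
    let table = match pic {
        Pic::Primary => &PRIMARY_LINES,
        Pic::Secondary => &SECONDARY_LINES,
    };
    table.get(usize::from(line)).copied()
}

/// Returns the system-wide IRQ number (0-15) for a controller line.
/// Assumes the usual convention that the secondary PIC serves IRQ 8-15.
pub fn irq_number(pic: Pic, line: u8) -> Option<u8> {
    if line >= 8 {
        return None;
    }
    Some(match pic {
        Pic::Primary => line,
        Pic::Secondary => 8 + line,
    })
}
```

With this table, `device_on_line(Pic::Secondary, 4)` yields the mouse from the example above, and `irq_number` maps the same line to IRQ 12.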
Each controller can be configured through two [I/O ports], one "command" port and one "data" port. For the primary controller, these ports are `0x20` (command) and `0x21` (data). For the secondary controller, they are `0xa0` (command) and `0xa1` (data). For more information on how the PICs can be configured, see the [article on osdev.org].

[I/O ports]: @/edition-2/posts/04-testing/index.md#i-o-ports
[article on osdev.org]: https://wiki.osdev.org/8259_PIC

### Implementation

The default configuration of the PICs is not usable because it sends interrupt vector numbers in the range 0-15 to the CPU. These numbers are already occupied by CPU exceptions. For example, number 8 corresponds to a double fault. To fix this overlap, we need to remap the PIC interrupts to different numbers. The actual range doesn't matter as long as it does not overlap with the exceptions, but typically the range 32-47 is chosen, because these are the first free numbers after the 32 exception slots.

The configuration happens by writing special values to the command and data ports of the PICs. Fortunately, there is already a crate called [`pic8259`], so we don't need to write the initialization sequence ourselves. If you are interested in how it works, check out [its source code][pic crate source]. It is fairly small and well documented.

[pic crate source]: https://docs.rs/crate/pic8259/0.10.1/source/src/lib.rs

To add the crate as a dependency, we add the following to our project:

[`pic8259`]: https://docs.rs/pic8259/0.10.1/pic8259/

```toml
# in Cargo.toml

[dependencies]
pic8259 = "0.10.1"
```

The main abstraction provided by the crate is the [`ChainedPics`] struct, which represents the primary/secondary PIC layout we saw above. It is designed to be used in the following way:

[`ChainedPics`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html

```rust
// in src/interrupts.rs

use pic8259::ChainedPics;
use spin;

pub const PIC_1_OFFSET: u8 = 32;
pub const PIC_2_OFFSET: u8 = PIC_1_OFFSET + 8;

pub static PICS: spin::Mutex<ChainedPics> =
    spin::Mutex::new(unsafe { ChainedPics::new(PIC_1_OFFSET, PIC_2_OFFSET) });
```

As noted above, we set the offsets for the PICs to the range 32-47. By wrapping the `ChainedPics` struct in a `Mutex`, we get safe mutable access through the [`lock` method][spin mutex lock], which we need in the next step. The `ChainedPics::new` function is unsafe because wrong offsets could cause undefined behavior.

[spin mutex lock]: https://docs.rs/spin/0.5.2/spin/struct.Mutex.html#method.lock

We can now initialize the 8259 PIC in our `init` function:
```rust
// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
    unsafe { interrupts::PICS.lock().initialize() }; // new
}
```

We use the [`initialize`] function to perform the PIC initialization. Like the `ChainedPics::new` function, this function is also unsafe because it can cause undefined behavior if the PIC is misconfigured.

[`initialize`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html#method.initialize

If all goes well, we should continue to see the "It did not crash" message when executing `cargo run`.

## Enabling Interrupts

Until now, nothing happened because interrupts are still disabled in the CPU configuration. This means that the CPU does not listen to the interrupt controller at all, so no interrupts can reach the CPU. Let's change that:

```rust
// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
    unsafe { interrupts::PICS.lock().initialize() };
    x86_64::instructions::interrupts::enable();     // new
}
```

The `interrupts::enable` function of the `x86_64` crate executes the special `sti` instruction ("set interrupts") to enable external interrupts. When we try `cargo run` now, we see that a double fault occurs:

![QEMU printing `EXCEPTION: DOUBLE FAULT` because of hardware timer](qemu-hardware-timer-double-fault.png)

The reason for this double fault is that the hardware timer (the [Intel 8253], to be exact) is enabled by default, so we start receiving timer interrupts as soon as we enable interrupts on the CPU. Since we have not defined a handler function for the timer interrupt yet, our double fault handler is invoked.

[Intel 8253]: https://en.wikipedia.org/wiki/Intel_8253

## Handling Timer Interrupts

As the figure [above](#8259-pic) shows, the timer uses line 0 of the primary PIC. This means that it arrives at the CPU as interrupt 32 (0 + offset 32). Instead of hardcoding the index 32, we store it in an `InterruptIndex` enum:

```rust
// in src/interrupts.rs

#[derive(Debug, Clone, Copy)]
#[repr(u8)]
pub enum InterruptIndex {
    Timer = PIC_1_OFFSET,
}

impl InterruptIndex {
    fn as_u8(self) -> u8 {
        self as u8
    }

    fn as_usize(self) -> usize {
        usize::from(self.as_u8())
    }
}
```

The enum is a [C-like enum], so we can directly specify the index for each variant. The `repr(u8)` attribute specifies that each variant is represented as a `u8`. We will add more variants to this enum when we support other interrupts in the future.
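As a quick standalone check of how the discriminants behave, the following sketch uses a throwaway enum. `ExampleIndex` and its variants are hypothetical and only mirror the shape of `InterruptIndex`; `PIC_1_OFFSET` is redefined locally so that the snippet compiles on its own.

```rust
// Local copy of the offset so the sketch is self-contained.
const PIC_1_OFFSET: u8 = 32;

/// Hypothetical enum with the same layout attributes as `InterruptIndex`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum ExampleIndex {
    First = PIC_1_OFFSET,     // explicit discriminant: 32
    Second,                   // implicit: previous value + 1 = 33
    Last = PIC_1_OFFSET + 15, // explicit constant expression: 47
}

impl ExampleIndex {
    /// The discriminant as the raw `u8` chosen by `repr(u8)`.
    fn as_u8(self) -> u8 {
        self as u8
    }

    /// The discriminant widened losslessly to `usize`, e.g. for indexing.
    fn as_usize(self) -> usize {
        usize::from(self.as_u8())
    }
}
```

Because of `repr(u8)`, a value of this enum occupies a single byte, and a variant without an explicit discriminant takes the previous value plus one.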
[C-like enum]: https://doc.rust-lang.org/reference/items/enumerations.html#custom-discriminant-values-for-fieldless-enumerations

Now we can add a handler function for the timer interrupt:

```rust
// in src/interrupts.rs

use crate::print;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        […]
        idt[InterruptIndex::Timer.as_usize()]
            .set_handler_fn(timer_interrupt_handler); // new

        idt
    };
}

extern "x86-interrupt" fn timer_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!(".");
}
```

Our `timer_interrupt_handler` has the same signature as our exception handlers, because the CPU reacts identically to exceptions and external interrupts (the only difference is that some exceptions push an error code). The [`InterruptDescriptorTable`] struct implements the [`IndexMut`] trait, so we can access individual entries through array indexing syntax.

[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html
[`IndexMut`]: https://doc.rust-lang.org/core/ops/trait.IndexMut.html

In our timer interrupt handler, we print a dot to the screen. Since the timer interrupt happens periodically, we would expect a new dot to appear on each timer tick. However, when we run the kernel, only a single dot is printed:

![QEMU printing only a single dot for hardware timer](qemu-single-dot-printed.png)

### End of Interrupt

The reason only one dot is printed is that the PIC expects an explicit "end of interrupt" (EOI) signal from our interrupt handler. This signal tells the controller that the interrupt has been processed and that the system is ready to receive the next one. Without it, the PIC assumes that we are still busy processing the first timer interrupt and waits for the EOI signal before sending the next interrupt.

To send the EOI, we use our static `PICS` struct again:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn timer_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!(".");

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Timer.as_u8());
    }
}
```

The `notify_end_of_interrupt` function figures out whether the primary or the secondary PIC sent the interrupt and then uses the `command` and `data` ports to send an EOI signal to the respective controllers. If the secondary PIC sent the interrupt, both PICs need to be notified, because the secondary PIC is connected to an input line of the primary PIC.
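The routing decision described above can be sketched as a pure function that touches no hardware. This is an illustration only, not the `pic8259` crate's actual code: `EoiTargets`, `handles`, and `eoi_targets` are hypothetical names, and the sketch assumes that each PIC serves eight consecutive vectors starting at its offset.

```rust
/// Hypothetical result type: which controllers need an EOI signal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EoiTargets {
    pub primary: bool,
    pub secondary: bool,
}

/// Returns true if a PIC with the given vector offset owns `vector`.
/// Each 8259 has eight lines, so it owns `offset..offset + 8`.
fn handles(offset: u8, vector: u8) -> bool {
    // `checked_sub` avoids underflow for vectors below the offset.
    vector.checked_sub(offset).map_or(false, |line| line < 8)
}

/// Decides which PICs must be notified for the given interrupt vector.
pub fn eoi_targets(vector: u8, primary_offset: u8, secondary_offset: u8) -> EoiTargets {
    let secondary = handles(secondary_offset, vector);
    // An interrupt from the secondary PIC also passed through the primary
    // PIC, so the primary needs an EOI in both cases.
    let primary = secondary || handles(primary_offset, vector);
    EoiTargets { primary, secondary }
}
```

With the offsets 32 and 40 used in this post, the timer vector 32 only involves the primary PIC, while a vector such as 44 on the secondary PIC requires an EOI for both controllers.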
We need to be careful to use the correct interrupt vector number here. With a wrong number, we could lose an important interrupt that has not been delivered to the CPU yet, or cause our system to hang. This is the reason the `notify_end_of_interrupt` function is `unsafe`.

When we now execute `cargo run`, we see dots appearing periodically on the screen:

![QEMU printing consecutive dots showing the hardware timer](qemu-hardware-timer-dots.gif)

### Configuring the Timer

The hardware timer that we use is called the _Programmable Interval Timer_, or PIT for short. As the name says, the interval between two interrupts is configurable. We won't go into details here because we will switch to the [APIC timer] soon, but the OSDev wiki has an extensive article about [configuring the PIT].

[APIC timer]: https://wiki.osdev.org/APIC_timer
[configuring the PIT]: https://wiki.osdev.org/Programmable_Interval_Timer

## Deadlocks

We now have a form of concurrency in our kernel: the timer interrupts occur asynchronously, so they can interrupt our `_start` function at any time. Rust's ownership system prevents many types of concurrency-related bugs at compile time, but it cannot prevent deadlocks. A deadlock occurs when a thread tries to acquire a lock that will never become free, so the thread hangs indefinitely.

We can already provoke a deadlock in our kernel. The `vga_buffer::_print` function called by our `println` macro [locks a global `WRITER`][vga spinlock] using a spinlock:

[vga spinlock]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks

```rust
// in src/vga_buffer.rs

[…]

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    WRITER.lock().write_fmt(args).unwrap();
}
```

The function locks the `WRITER`, calls `write_fmt` on it, and implicitly unlocks it just before returning. Now imagine that an interrupt occurs while the `WRITER` is locked and the interrupt handler tries to print something too:

| Timestep | _start                 | interrupt handler                               |
| -------- | ---------------------- | ----------------------------------------------- |
| 0        | calls `println!`       |                                                 |
| 1        | `print` locks `WRITER` |                                                 |
| 2        |                        | **interrupt occurs**, handler begins to run     |
| 3        |                        | calls `println!`                                |
| 4        |                        | `print` tries to lock `WRITER` (already locked) |
| 5        |                        | `print` tries to lock `WRITER` (already locked) |
| …        |                        | …                                               |
| _never_  | _unlock `WRITER`_      |                                                 |

The `WRITER` is locked, so the interrupt handler waits until it becomes free.
But this never happens, because the `_start` function only continues to run after the interrupt handler returns. Thus, the entire system hangs.

### Provoking a Deadlock

We can easily provoke such a deadlock in our kernel by printing something in the loop at the end of our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    […]
    loop {
        use blog_os::print;
        print!("-");        // new
    }
}
```

When we run it in QEMU, we get output of the following form:

![QEMU output with many rows of hyphens and no dots](./qemu-deadlock.png)

Only a limited number of hyphens (`-`) are printed before the first timer interrupt occurs. After that, the timer interrupt handler deadlocks when it tries to print a dot (`.`), and the system hangs. This is why there are no dots in the output.

The number of hyphens can differ on each run because the timer interrupt occurs asynchronously. Concurrency-related bugs often have such non-deterministic results, which makes them difficult to debug.

### Fixing the Deadlock

To avoid this deadlock, we can disable interrupts for as long as the `Mutex` is locked:

```rust
// in src/vga_buffer.rs

/// Prints the given formatted string to the VGA text buffer
/// through the global `WRITER` instance.
#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;   // new

    interrupts::without_interrupts(|| {     // new
        WRITER.lock().write_fmt(args).unwrap();
    });
}
```

The [`without_interrupts`] function takes a [closure] and executes it in an interrupt-free environment. We use it to ensure that no interrupt can occur while the `Mutex` is locked. When we run our kernel again, it keeps running without hanging. (The dots may be hard to spot because the screen scrolls too fast. Try to slow down the printing, e.g., by putting a `for _ in 0..10000 {}` inside the loop of the `_start` function.)

[`without_interrupts`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.without_interrupts.html
[closure]: https://doc.rust-lang.org/book/ch13-01-closures.html

We apply the same change to our serial printing function to prevent deadlocks there as well:

```rust
// in src/serial.rs

#[doc(hidden)]
pub fn _print(args: ::core::fmt::Arguments) {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;       // new

    interrupts::without_interrupts(|| {         // new
        SERIAL1
            .lock()
            .write_fmt(args)
            .expect("Printing to serial failed");
    });
}
```

Note that disabling interrupts shouldn't be a general solution.
The problem is that it increases the worst-case interrupt latency, i.e., the longest time the system can take to react to an interrupt. Therefore, interrupts should only be disabled for a very short time.

## Fixing a Race Condition

If you run `cargo test`, you might see the `test_println_output` test failing occasionally:

```
> cargo test --lib
[…]
Running 4 tests
test_breakpoint_exception...[ok]
test_println... [ok]
test_println_many... [ok]
test_println_output... [failed]

Error: panicked at 'assertion failed: `(left == right)`
  left: `'.'`,
 right: `'S'`', src/vga_buffer.rs:205:9
```

The test fails occasionally because of a _race condition_ between the test and our timer handler. Let's look at the test code we wrote earlier:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    let s = "Some test string that fits on a single line";
    println!("{}", s);
    for (i, c) in s.chars().enumerate() {
        let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read();
        assert_eq!(char::from(screen_char.ascii_character), c);
    }
}
```

The test prints a string to the VGA buffer and then checks the output by manually iterating over the `buffer.chars` array. The race condition occurs because the timer interrupt handler might run between the `println` call and the code that reads the `screen_char` values. Note that this is different from a dangerous _data race_, which Rust prevents at compile time. See the [_Rustonomicon_][nomicon-races] for details.

[nomicon-races]: https://doc.rust-lang.org/nomicon/races.html

To fix this, the test needs to keep the `WRITER` locked for its complete duration, so that the timer handler can't print a `.` in between. The fixed test looks like this:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;

    let s = "Some test string that fits on a single line";
    interrupts::without_interrupts(|| {
        let mut writer = WRITER.lock();
        writeln!(writer, "\n{}", s).expect("writeln failed");
        for (i, c) in s.chars().enumerate() {
            let screen_char = writer.buffer.chars[BUFFER_HEIGHT - 2][i].read();
            assert_eq!(char::from(screen_char.ascii_character), c);
        }
    });
}
```

We made the following changes:

- We keep the writer locked for the complete test by using the `lock()` method explicitly. Instead of `println`, we use the [`writeln`] macro, which allows printing to an already locked writer.
- To avoid another deadlock, we disable interrupts for the duration of the test. Otherwise, the test might get interrupted while the writer is still locked.
- Since the timer interrupt handler can still run before the test starts, we print an additional newline `\n` before printing the string `s`. This way, the test does not fail even if the timer handler has already printed some `.` characters to the current line.

[`writeln`]: https://doc.rust-lang.org/core/macro.writeln.html

With these changes, `cargo test` now deterministically succeeds again.

The race condition above was harmless apart from causing a test failure. But because race conditions are non-deterministic by nature, others can be much more difficult to debug. Data races, the most dangerous class of race conditions, can cause all kinds of undefined behavior, including system crashes and memory corruption. Luckily, Rust protects us from data races.

## The `hlt` Instruction

Until now, we used a simple empty loop at the end of our `_start` and `panic` functions. This loop keeps the CPU from ever stopping, but running the CPU at full speed when there is no work to do is very inefficient in terms of energy. If you look at your task manager while the kernel runs, the QEMU process needs close to 100% CPU the whole time.

What we really want is to halt the CPU until the next interrupt arrives. This allows the CPU to enter a low-power sleep state while it waits. The [`hlt` instruction] does exactly that. Let's use it to create an energy-efficient endless loop:

[`hlt` instruction]: https://en.wikipedia.org/wiki/HLT_(x86_instruction)

```rust
// in src/lib.rs

pub fn hlt_loop() -> ! {
    loop {
        x86_64::instructions::hlt();
    }
}
```

The `instructions::hlt` function is just a [thin wrapper] around the `hlt` assembly instruction. It is safe because there is no way it can compromise memory safety.

[thin wrapper]: https://github.com/rust-osdev/x86_64/blob/5e8e218381c5205f5777cb50da3ecac5d7e3b1ab/src/instructions/mod.rs#L16-L22

We now replace the endless loops in our `_start` and `panic` functions with `hlt_loop`:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    […]

    println!("It did not crash!");
    blog_os::hlt_loop();            // new
}


#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    blog_os::hlt_loop();            // new
}
```

Let's update our `lib.rs` as well:

```rust
// in src/lib.rs

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
{ init(); test_main(); hlt_loop(); // 새로 추가함 } pub fn test_panic_handler(info: &PanicInfo) -> ! { serial_println!("[failed]\n"); serial_println!("Error: {}\n", info); exit_qemu(QemuExitCode::Failed); hlt_loop(); // 새로 추가함 } ``` 이제 QEMU에서 커널을 실행하면 CPU 사용량이 훨씬 감소한 것을 확인할 수 있습니다. ## 키보드 입력 외부 장치로부터 오는 인터럽트를 커널에서 처리할 수 있게 되었으니, 이제 드디어 커널이 키보드 입력을 지원하도록 만들 차례입니다. 키보드 입력을 지원함으로써 커널과 상호작용할 수 있게 될 것입니다. [PS/2]: https://en.wikipedia.org/wiki/PS/2_port 하드웨어 타이머와 마찬가지로, 키보드 컨트롤러의 인터럽트도 기본적으로 사용이 활성화되어 있습니다. 키보드 키를 누르면 키보드 컨트롤러는 PIC로 인터럽트를 보내고, PIC는 다시 그 인터럽트를 CPU로 전달합니다. CPU는 IDT에서 키보드 인터럽트의 엔트리를 조회하지만, 등록된 인터럽트 처리 함수가 없어 더블 폴트가 발생합니다. 키보드 인터럽트를 처리하는 함수를 추가합니다. 다른 인터럽트 번호를 사용한다는 점을 빼면, 이전에 타이머 인터럽트 처리 함수를 작성했던 것과 매우 유사합니다. ```rust // in src/interrupts.rs #[derive(Debug, Clone, Copy)] #[repr(u8)] pub enum InterruptIndex { Timer = PIC_1_OFFSET, Keyboard, // 새로 추가함 } lazy_static! { static ref IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); idt.breakpoint.set_handler_fn(breakpoint_handler); […] // 새로 추가함 idt[InterruptIndex::Keyboard.as_usize()] .set_handler_fn(keyboard_interrupt_handler); idt }; } extern "x86-interrupt" fn keyboard_interrupt_handler( _stack_frame: InterruptStackFrame) { print!("k"); unsafe { PICS.lock() .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); } } ``` [위 도표](#8259-pic)를 보면 키보드는 주 PIC의 1번 통신선을 사용합니다. 즉 CPU에 전달된 키보드 인터럽트의 벡터 번호는 33 (1 + 오프셋 32)이 됩니다. 해당 번호를 `InterruptIndex` enum의 새 분류 `Keyboard`에 배정합니다. `Keyboard`의 값을 명시적으로 정해주지 않아도 바로 이전 분류의 값에 1을 더한 값(=33)이 배정됩니다. 인터럽트 처리 함수는 `k`를 출력한 후 인터럽트 컨트롤러에 EOI 신호를 전송합니다. 이제 아무 키를 하나 입력하면 화면에 `k`가 출력됩니다. 하지만 아무 키를 하나 새로 입력하면 화면에 `k`가 새로 출력되지 않습니다. 그 이유는 기존에 입력된 키의 _스캔 코드 (scancode)_ 를 우리가 읽어 가지 않는 한 키보드 컨트롤러가 새 인터럽트를 보내지 않기 때문입니다. ### 스캔 코드 읽기 _어떤_ 키가 입력됐는지 확인하려면 키보드 컨트롤러에 저장된 정보를 확인해야 합니다. PS/2 컨트롤러의 데이터 포트, 즉 `0x60`번 [입출력 포트 (I/O port)][I/O port]를 읽어 들여 어떤 키가 입력됐는지 확인할 수 있습니다. 
[I/O port]: @/edition-2/posts/04-testing/index.md#i-o-ports

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };
    print!("{}", scancode);

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

We use the [`Port`] type of the `x86_64` crate to read a byte from the keyboard's data port. This byte is called the [_scancode_] and it represents the key that was pressed or released. For now, we only print the scancode and don't do anything else with it:

[`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html
[_scancode_]: https://en.wikipedia.org/wiki/Scancode

![QEMU printing scancodes to the screen when keys are pressed](qemu-printing-scancodes.gif)

The above image shows me slowly typing "123". We see that adjacent keys have adjacent scancodes and that pressing a key causes a different scancode than releasing it. But how exactly are scancodes assigned to key presses and releases?

### Interpreting the Scancodes

The standards that map scancodes to keys are called _scancode sets_, and there are three of them. All three go back to the keyboards of early IBM computers: the [IBM XT], the [IBM 3270 PC], and the [IBM AT]. Later computers did not define new scancode sets, but supported and extended the existing ones. Today, most keyboards can be configured to emulate any of the three sets.

[IBM XT]: https://en.wikipedia.org/wiki/IBM_Personal_Computer_XT
[IBM 3270 PC]: https://en.wikipedia.org/wiki/IBM_3270_PC
[IBM AT]: https://en.wikipedia.org/wiki/IBM_Personal_Computer/AT

By default, PS/2 keyboards emulate scancode set 1 ("XT"). In this set, the lower 7 bits of a scancode byte define the key, and the most significant bit defines whether it's a press ("0") or a release ("1"). Keys that were not present on the original [IBM XT] keyboard, such as the enter key on the keypad, generate two scancodes in succession: a `0xe0` escape byte and then a byte representing the key. For a list of all set 1 scancodes and their corresponding keys, check out the [OSDev Wiki][scancode set 1].

[scancode set 1]: https://wiki.osdev.org/Keyboard#Scan_Code_Set_1

To translate the scancodes to keys, we can use a `match` statement:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };

    // new
    let key = match scancode {
        0x02 => Some('1'),
        0x03 => Some('2'),
        0x04 => Some('3'),
        0x05 => Some('4'),
        0x06 => Some('5'),
        0x07 => Some('6'),
        0x08 => Some('7'),
        0x09 => Some('8'),
        0x0a => Some('9'),
        0x0b => Some('0'),
        _ => None,
    };
    if let Some(key) = key {
        print!("{}", key);
    }

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

The above code translates keypresses of the number keys 0-9 and ignores all other keys. It uses a [match] statement to assign a character or `None` to each scancode. It then uses [`if let`] to extract the character `key` assigned to the scancode. By using the same variable name `key` in the pattern, we [shadow] the previous declaration, which is a common pattern for extracting the value of an `Option` in Rust.

[match]: https://doc.rust-lang.org/book/ch06-02-match.html
[`if let`]: https://doc.rust-lang.org/book/ch19-01-all-the-places-for-patterns.html#conditional-if-let-expressions
[shadow]: https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#shadowing

Now we can write numbers:

![QEMU printing numbers to the screen](qemu-printing-numbers.gif)

Translating the other keys works in the same way. Fortunately, the [`pc-keyboard`] crate can translate scancodes of scancode sets 1 and 2. We add the crate to our `Cargo.toml` and import it in our `lib.rs`:

[`pc-keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/

```toml
# in Cargo.toml

[dependencies]
pc-keyboard = "0.7.0"
```

Now we can use this crate to rewrite our `keyboard_interrupt_handler`:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1};
    use spin::Mutex;
    use x86_64::instructions::port::Port;

    lazy_static! {
        static ref KEYBOARD: Mutex<Keyboard<layouts::Us104Key, ScancodeSet1>> =
            Mutex::new(Keyboard::new(ScancodeSet1::new(),
                layouts::Us104Key, HandleControl::Ignore)
            );
    }

    let mut keyboard = KEYBOARD.lock();
    let mut port = Port::new(0x60);

    let scancode: u8 = unsafe { port.read() };
    if let Ok(Some(key_event)) = keyboard.add_byte(scancode) {
        if let Some(key) = keyboard.process_keyevent(key_event) {
            match key {
                DecodedKey::Unicode(character) => print!("{}", character),
                DecodedKey::RawKey(key) => print!("{:?}", key),
            }
        }
    }

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

We use the `lazy_static` macro to create a static [`Keyboard`] object protected by a Mutex. We initialize the `Keyboard` with a US keyboard layout and scancode set 1. The [`HandleControl`] parameter allows mapping `ctrl+[a-z]` to the Unicode characters `U+0001` through `U+001A`. We don't want to do that, so we use the `Ignore` option to handle the `ctrl` key like a normal key.

[`HandleControl`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/enum.HandleControl.html

On each interrupt, we lock the Mutex, read the scancode from the keyboard controller, and pass it to the [`add_byte`] method, which translates the scancode into an `Option<KeyEvent>`. The [`KeyEvent`] contains the key that caused the event and whether it was a press or a release.

[`Keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html
[`add_byte`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.add_byte
[`KeyEvent`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.KeyEvent.html

The [`process_keyevent`] method translates the given `KeyEvent` into the character that was typed, or returns `None` if no translation is possible. For example, it translates a press of the `A` key to either a lowercase `a` or an uppercase `A`, depending on whether the shift key was pressed.

[`process_keyevent`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.process_keyevent

With this modified interrupt handler, we can now write text:

![Typing "Hello World" in QEMU](qemu-typing.gif)

### Configuring the Keyboard

It's possible to configure some aspects of a PS/2 keyboard, for example, which scancode set it should use. We won't cover it here because this post is already long enough, but the OSDev Wiki has an overview of the possible [configuration commands].

[configuration commands]: https://wiki.osdev.org/PS/2_Keyboard#Commands

## Summary

This post explained how to enable and handle external interrupts. We learned about the 8259 PIC, its primary/secondary layout, the remapping of interrupt numbers, and the "end of interrupt" signal. We implemented handlers for the hardware timer and the keyboard and learned about the `hlt` instruction, which halts the CPU until the next interrupt.

Now that we are able to interact with our kernel, we have the basic building blocks for a small shell or simple games.

## What's next?

Timer interrupts are essential for an operating system because they provide a way to periodically interrupt the running process and return control to the kernel. The kernel can then switch to a different process and create the illusion of multiple processes running at the same time.

But before we can create processes or threads, we need a way to allocate memory for them. The next posts will explore memory management to provide this fundamental building block.

================================================
FILE: blog/content/edition-2/posts/07-hardware-interrupts/index.md
================================================
+++
title = "Hardware Interrupts"
weight = 7
path = "hardware-interrupts"
date = 2018-10-22

[extra]
chapter = "Interrupts"
+++

In this post, we set up the programmable interrupt controller to correctly forward hardware interrupts to the CPU. To handle these interrupts, we add new entries to our interrupt descriptor table, just like we did for our exception handlers. We will learn how to get periodic timer interrupts and how to get input from the keyboard.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-07`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-07

## Overview

Interrupts provide a way to notify the CPU from attached hardware devices. So instead of letting the kernel periodically check the keyboard for new characters (a process called [_polling_]), the keyboard can notify the kernel of each keypress.
This is much more efficient because the kernel only needs to act when something has happened. It also allows faster reaction times since the kernel can react immediately and not only at the next poll.

[_polling_]: https://en.wikipedia.org/wiki/Polling_(computer_science)

Connecting all hardware devices directly to the CPU is not possible. Instead, a separate _interrupt controller_ aggregates the interrupts from all devices and then notifies the CPU:

```
                                    ____________             _____
               Timer ------------> |            |           |     |
               Keyboard ---------> | Interrupt  |---------> | CPU |
               Other Hardware ---> | Controller |           |_____|
               Etc. -------------> |____________|

```

Most interrupt controllers are programmable, which means they support different priority levels for interrupts. For example, this allows giving timer interrupts a higher priority than keyboard interrupts to ensure accurate timekeeping.

Unlike exceptions, hardware interrupts occur _asynchronously_. This means they are completely independent from the executed code and can occur at any time. Thus, we suddenly have a form of concurrency in our kernel with all the potential concurrency-related bugs. Rust's strict ownership model helps us here because it forbids mutable global state. However, deadlocks are still possible, as we will see later in this post.

## The 8259 PIC

The [Intel 8259] is a programmable interrupt controller (PIC) introduced in 1976. It has long been replaced by the newer [APIC], but its interface is still supported on current systems for backwards compatibility reasons. The 8259 PIC is significantly easier to set up than the APIC, so we will use it to introduce ourselves to interrupts before we switch to the APIC in a later post.

[APIC]: https://en.wikipedia.org/wiki/Intel_APIC_Architecture

The 8259 has eight interrupt lines and several lines for communicating with the CPU.
Typical systems back then were equipped with two instances of the 8259 PIC, one primary and one secondary, with the secondary connected to one of the interrupt lines of the primary:

[Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259

```
                     ____________                          ____________
Real Time Clock --> |            |   Timer -------------> |            |
ACPI -------------> |            |   Keyboard-----------> |            |      _____
Available --------> | Secondary  |----------------------> | Primary    |     |     |
Available --------> | Interrupt  |   Serial Port 2 -----> | Interrupt  |---> | CPU |
Mouse ------------> | Controller |   Serial Port 1 -----> | Controller |     |_____|
Co-Processor -----> |            |   Parallel Port 2/3 -> |            |
Primary ATA ------> |            |   Floppy disk -------> |            |
Secondary ATA ----> |____________|   Parallel Port 1----> |____________|

```

This graphic shows the typical assignment of interrupt lines. We see that most of the 15 lines have a fixed mapping, e.g., line 4 of the secondary PIC is assigned to the mouse.

Each controller can be configured through two [I/O ports], one “command” port and one “data” port. For the primary controller, these ports are `0x20` (command) and `0x21` (data). For the secondary controller, they are `0xa0` (command) and `0xa1` (data). For more information on how the PICs can be configured, see the [article on osdev.org].

[I/O ports]: @/edition-2/posts/04-testing/index.md#i-o-ports
[article on osdev.org]: https://wiki.osdev.org/8259_PIC

### Implementation

The default configuration of the PICs is not usable because it sends interrupt vector numbers in the range of 0–15 to the CPU. These numbers are already occupied by CPU exceptions. For example, number 8 corresponds to a double fault. To fix this overlapping issue, we need to remap the PIC interrupts to different numbers. The actual range doesn't matter as long as it does not overlap with the exceptions, but typically the range of 32–47 is chosen, because these are the first free numbers after the 32 exception slots.

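
The remapping arithmetic can be sketched in a few lines. The snippet below is a standalone illustration, not part of the kernel: the `Pic` enum and the helpers `vector_for` and `overlaps_exceptions` are made up for this example, and the offsets 32 and 40 used in the note below are the typical values mentioned above.

```rust
/// Number of vectors reserved for CPU exceptions (0..32).
const EXCEPTION_SLOTS: u8 = 32;
/// Number of interrupt lines per 8259 controller.
const LINES_PER_PIC: u8 = 8;

/// Which of the two chained controllers a line belongs to
/// (hypothetical helper type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Pic {
    Primary,
    Secondary,
}

/// Vector number that reaches the CPU: the controller's offset plus
/// the line number (0..8) on that controller.
fn vector_for(pic: Pic, line: u8, primary_offset: u8, secondary_offset: u8) -> u8 {
    assert!(line < LINES_PER_PIC, "an 8259 has only eight lines");
    match pic {
        Pic::Primary => primary_offset + line,
        Pic::Secondary => secondary_offset + line,
    }
}

/// True if a controller with this offset would collide with the
/// vectors used by CPU exceptions.
fn overlaps_exceptions(offset: u8) -> bool {
    offset < EXCEPTION_SLOTS
}
```

With offsets 32 and 40, `vector_for(Pic::Primary, 0, 32, 40)` gives 32 for the timer and `vector_for(Pic::Secondary, 4, 32, 40)` gives 44 for the mouse, while an offset such as 8 would be flagged by `overlaps_exceptions`.
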
The configuration happens by writing special values to the command and data ports of the PICs. Fortunately, there is already a crate called [`pic8259`], so we don't need to write the initialization sequence ourselves. However, if you are interested in how it works, check out [its source code][pic crate source]. It's fairly small and well documented.

[pic crate source]: https://docs.rs/crate/pic8259/0.10.1/source/src/lib.rs

To add the crate as a dependency, we add the following to our project:

[`pic8259`]: https://docs.rs/pic8259/0.10.1/pic8259/

```toml
# in Cargo.toml

[dependencies]
pic8259 = "0.10.1"
```

The main abstraction provided by the crate is the [`ChainedPics`] struct that represents the primary/secondary PIC layout we saw above. It is designed to be used in the following way:

[`ChainedPics`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html

```rust
// in src/interrupts.rs

use pic8259::ChainedPics;
use spin;

pub const PIC_1_OFFSET: u8 = 32;
pub const PIC_2_OFFSET: u8 = PIC_1_OFFSET + 8;

pub static PICS: spin::Mutex<ChainedPics> =
    spin::Mutex::new(unsafe { ChainedPics::new(PIC_1_OFFSET, PIC_2_OFFSET) });
```

As noted above, we're setting the offsets for the PICs to the range 32–47. By wrapping the `ChainedPics` struct in a `Mutex`, we can get safe mutable access (through the [`lock` method][spin mutex lock]), which we need in the next step. The `ChainedPics::new` function is unsafe because wrong offsets could cause undefined behavior.

[spin mutex lock]: https://docs.rs/spin/0.5.2/spin/struct.Mutex.html#method.lock

We can now initialize the 8259 PIC in our `init` function:

```rust
// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
    unsafe { interrupts::PICS.lock().initialize() }; // new
}
```

We use the [`initialize`] function to perform the PIC initialization. Like the `ChainedPics::new` function, this function is also unsafe because it can cause undefined behavior if the PIC is misconfigured.

[`initialize`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html#method.initialize

If all goes well, we should continue to see the "It did not crash" message when executing `cargo run`.

## Enabling Interrupts

Until now, nothing happened because interrupts are still disabled in the CPU configuration. This means that the CPU does not listen to the interrupt controller at all, so no interrupts can reach the CPU. Let's change that:

```rust
// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
    unsafe { interrupts::PICS.lock().initialize() };
    x86_64::instructions::interrupts::enable();     // new
}
```

The `interrupts::enable` function of the `x86_64` crate executes the special `sti` instruction (“set interrupts”) to enable external interrupts. When we try `cargo run` now, we see that a double fault occurs:

![QEMU printing `EXCEPTION: DOUBLE FAULT` because of hardware timer](qemu-hardware-timer-double-fault.png)

The reason for this double fault is that the hardware timer (the [Intel 8253], to be exact) is enabled by default, so we start receiving timer interrupts as soon as we enable interrupts. Since we didn't define a handler function for it yet, our double fault handler is invoked.

[Intel 8253]: https://en.wikipedia.org/wiki/Intel_8253

## Handling Timer Interrupts

As we see from the graphic [above](#the-8259-pic), the timer uses line 0 of the primary PIC. This means that it arrives at the CPU as interrupt 32 (0 + offset 32). Instead of hardcoding index 32, we store it in an `InterruptIndex` enum:

```rust
// in src/interrupts.rs

#[derive(Debug, Clone, Copy)]
#[repr(u8)]
pub enum InterruptIndex {
    Timer = PIC_1_OFFSET,
}

impl InterruptIndex {
    fn as_u8(self) -> u8 {
        self as u8
    }

    fn as_usize(self) -> usize {
        usize::from(self.as_u8())
    }
}
```

The enum is a [C-like enum] so that we can directly specify the index for each variant. The `repr(u8)` attribute specifies that each variant is represented as a `u8`.
We will add more variants for other interrupts in the future.

[C-like enum]: https://doc.rust-lang.org/reference/items/enumerations.html#custom-discriminant-values-for-fieldless-enumerations

Now we can add a handler function for the timer interrupt:

```rust
// in src/interrupts.rs

use crate::print;

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        […]
        idt[InterruptIndex::Timer.as_usize()]
            .set_handler_fn(timer_interrupt_handler); // new

        idt
    };
}

extern "x86-interrupt" fn timer_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!(".");
}
```

Our `timer_interrupt_handler` has the same signature as our exception handlers, because the CPU reacts identically to exceptions and external interrupts (the only difference is that some exceptions push an error code). The [`InterruptDescriptorTable`] struct implements the [`IndexMut`] trait, so we can access individual entries through array indexing syntax.

[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html
[`IndexMut`]: https://doc.rust-lang.org/core/ops/trait.IndexMut.html

In our timer interrupt handler, we print a dot to the screen. As the timer interrupt happens periodically, we would expect to see a dot appearing on each timer tick. However, when we run it, we see that only a single dot is printed:

![QEMU printing only a single dot for hardware timer](qemu-single-dot-printed.png)

### End of Interrupt

The reason is that the PIC expects an explicit “end of interrupt” (EOI) signal from our interrupt handler. This signal tells the controller that the interrupt was processed and that the system is ready to receive the next interrupt. So the PIC thinks we're still busy processing the first timer interrupt and waits patiently for the EOI signal before sending the next one.

To send the EOI, we use our static `PICS` struct again:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn timer_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!(".");

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Timer.as_u8());
    }
}
```

The `notify_end_of_interrupt` method figures out whether the primary or secondary PIC sent the interrupt and then uses the `command` and `data` ports to send an EOI signal to the respective controllers. If the secondary PIC sent the interrupt, both PICs need to be notified because the secondary PIC is connected to an input line of the primary PIC.

We need to be careful to use the correct interrupt vector number, otherwise we could accidentally delete an important unsent interrupt or cause our system to hang. This is the reason that the function is unsafe.

When we now execute `cargo run` we see dots periodically appearing on the screen:

![QEMU printing consecutive dots showing the hardware timer](qemu-hardware-timer-dots.gif)

### Configuring the Timer

The hardware timer that we use is called the _Programmable Interval Timer_, or PIT, for short. Like the name says, it is possible to configure the interval between two interrupts. We won't go into details here because we will switch to the [APIC timer] soon, but the OSDev wiki has an extensive article about [configuring the PIT].

[APIC timer]: https://wiki.osdev.org/APIC_timer
[configuring the PIT]: https://wiki.osdev.org/Programmable_Interval_Timer

## Deadlocks

We now have a form of concurrency in our kernel: The timer interrupts occur asynchronously, so they can interrupt our `_start` function at any time. Fortunately, Rust's ownership system prevents many types of concurrency-related bugs at compile time. One notable exception is deadlocks. Deadlocks occur if a thread tries to acquire a lock that will never become free. Thus, the thread hangs indefinitely.

We can already provoke a deadlock in our kernel.
Remember, our `println` macro calls the `vga_buffer::_print` function, which [locks a global `WRITER`][vga spinlock] using a spinlock:

[vga spinlock]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks

```rust
// in src/vga_buffer.rs

[…]

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    WRITER.lock().write_fmt(args).unwrap();
}
```

It locks the `WRITER`, calls `write_fmt` on it, and implicitly unlocks it at the end of the function. Now imagine that an interrupt occurs while the `WRITER` is locked and the interrupt handler tries to print something too:

| Timestep | _start                 | interrupt_handler                               |
| -------- | ---------------------- | ----------------------------------------------- |
| 0        | calls `println!`       |                                                 |
| 1        | `print` locks `WRITER` |                                                 |
| 2        |                        | **interrupt occurs**, handler begins to run     |
| 3        |                        | calls `println!`                                |
| 4        |                        | `print` tries to lock `WRITER` (already locked) |
| 5        |                        | `print` tries to lock `WRITER` (already locked) |
| …        |                        | …                                               |
| _never_  | _unlock `WRITER`_      |                                                 |

The `WRITER` is locked, so the interrupt handler waits until it becomes free. But this never happens, because the `_start` function only continues to run after the interrupt handler returns. Thus, the entire system hangs.

### Provoking a Deadlock

We can easily provoke such a deadlock in our kernel by printing something in the loop at the end of our `_start` function:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    […]
    loop {
        use blog_os::print;
        print!("-");        // new
    }
}
```

When we run it in QEMU, we get an output of the form:

![QEMU output with many rows of hyphens and no dots](./qemu-deadlock.png)

We see that only a limited number of hyphens are printed until the first timer interrupt occurs. Then the system hangs because the timer interrupt handler deadlocks when it tries to print a dot. This is the reason that we see no dots in the above output.

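
The same situation can be modeled on a normal host without hanging the program. The sketch below is not kernel code: it assumes `std::sync::Mutex` as a stand-in for `spin::Mutex` and uses `try_lock` where a spinlock would keep looping forever, and the function names are made up for illustration.

```rust
use std::sync::Mutex;

/// Stand-in for the global VGA `WRITER` (hypothetical; the kernel
/// uses a `spin::Mutex` around its writer type).
static WRITER: Mutex<String> = Mutex::new(String::new());

/// Models an interrupt handler that wants to print a dot. Instead of
/// spinning like a real spinlock, it reports whether the lock was free.
fn handler_can_print() -> bool {
    match WRITER.try_lock() {
        Ok(mut writer) => {
            writer.push('.');
            true
        }
        // The interrupted code still holds the lock. A spinlock would
        // wait here forever, since that code cannot continue.
        Err(_) => false,
    }
}

/// Models `_start` being interrupted while it holds the lock.
fn interrupted_while_printing() -> bool {
    let mut writer = WRITER.lock().unwrap();
    writer.push('-');
    // The "interrupt" fires here, before the guard is dropped.
    handler_can_print()
}
```

Calling `handler_can_print()` on its own succeeds, while `interrupted_while_printing()` returns `false`: the handler finds the lock taken by code that cannot make progress until the handler returns.
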
The actual number of hyphens varies between runs because the timer interrupt occurs asynchronously. This non-determinism is what makes concurrency-related bugs so difficult to debug.

### Fixing the Deadlock

To avoid this deadlock, we can disable interrupts as long as the `Mutex` is locked:

```rust
// in src/vga_buffer.rs

/// Prints the given formatted string to the VGA text buffer
/// through the global `WRITER` instance.
#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;   // new

    interrupts::without_interrupts(|| {     // new
        WRITER.lock().write_fmt(args).unwrap();
    });
}
```

The [`without_interrupts`] function takes a [closure] and executes it in an interrupt-free environment. We use it to ensure that no interrupt can occur as long as the `Mutex` is locked. When we run our kernel now, we see that it keeps running without hanging. (We still don't notice any dots, but this is because they're scrolling by too fast. Try to slow down the printing, e.g., by putting a `for _ in 0..10000 {}` inside the loop.)

[`without_interrupts`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.without_interrupts.html
[closure]: https://doc.rust-lang.org/book/ch13-01-closures.html

We can apply the same change to our serial printing function to ensure that no deadlocks occur with it either:

```rust
// in src/serial.rs

#[doc(hidden)]
pub fn _print(args: ::core::fmt::Arguments) {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;       // new

    interrupts::without_interrupts(|| {         // new
        SERIAL1
            .lock()
            .write_fmt(args)
            .expect("Printing to serial failed");
    });
}
```

Note that disabling interrupts shouldn't be used as a general solution. The problem is that it increases the worst-case interrupt latency, i.e., the time until the system reacts to an interrupt. Therefore, interrupts should only be disabled for a very short time.

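
To make the idea of an "interrupt-free environment" more concrete, here is a small host-side simulation. It is only a sketch under assumptions: the `INTERRUPTS_ENABLED` flag is a made-up stand-in for the CPU's interrupt flag, and this `without_interrupts` is a local model of the save, disable, run, restore pattern, not the `x86_64` crate's actual code.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Simulated interrupt flag (hypothetical stand-in for the CPU state).
static INTERRUPTS_ENABLED: AtomicBool = AtomicBool::new(true);

/// Reports whether the simulated interrupts are currently enabled.
fn are_enabled() -> bool {
    INTERRUPTS_ENABLED.load(Ordering::SeqCst)
}

/// Runs `f` with the simulated flag cleared and restores the previous
/// state afterwards.
fn without_interrupts<F, R>(f: F) -> R
where
    F: FnOnce() -> R,
{
    // Remember the previous state so that a nested call does not
    // re-enable interrupts while an outer call still needs them off.
    let were_enabled = are_enabled();
    if were_enabled {
        INTERRUPTS_ENABLED.store(false, Ordering::SeqCst);
    }
    let result = f();
    if were_enabled {
        INTERRUPTS_ENABLED.store(true, Ordering::SeqCst);
    }
    result
}
```

Restoring the saved state instead of unconditionally enabling interrupts is what keeps nested uses correct, for example when a function that prints is itself called from inside another `without_interrupts` closure.
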
## Fixing a Race Condition

If you run `cargo test`, you might see the `test_println_output` test failing:

```
> cargo test --lib
[…]
Running 4 tests
test_breakpoint_exception...[ok]
test_println... [ok]
test_println_many... [ok]
test_println_output... [failed]

Error: panicked at 'assertion failed: `(left == right)`
  left: `'.'`,
 right: `'S'`', src/vga_buffer.rs:205:9
```

The reason is a _race condition_ between the test and our timer handler. Remember, the test looks like this:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    let s = "Some test string that fits on a single line";
    println!("{}", s);
    for (i, c) in s.chars().enumerate() {
        let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read();
        assert_eq!(char::from(screen_char.ascii_character), c);
    }
}
```

The test prints a string to the VGA buffer and then checks the output by manually iterating over the `buffer.chars` array. The race condition occurs because the timer interrupt handler might run between the `println` and the reading of the screen characters. Note that this isn't a dangerous _data race_, which Rust completely prevents at compile time. See the [_Rustonomicon_][nomicon-races] for details.

[nomicon-races]: https://doc.rust-lang.org/nomicon/races.html

To fix this, we need to keep the `WRITER` locked for the complete duration of the test, so that the timer handler can't write a `.` to the screen in between.
The fixed test looks like this:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;

    let s = "Some test string that fits on a single line";
    interrupts::without_interrupts(|| {
        let mut writer = WRITER.lock();
        writeln!(writer, "\n{}", s).expect("writeln failed");
        for (i, c) in s.chars().enumerate() {
            let screen_char = writer.buffer.chars[BUFFER_HEIGHT - 2][i].read();
            assert_eq!(char::from(screen_char.ascii_character), c);
        }
    });
}
```

We performed the following changes:

- We keep the writer locked for the complete test by using the `lock()` method explicitly. Instead of `println`, we use the [`writeln`] macro that allows printing to an already locked writer.
- To avoid another deadlock, we disable interrupts for the test's duration. Otherwise, the test might get interrupted while the writer is still locked.
- Since the timer interrupt handler can still run before the test, we print an additional newline `\n` before printing the string `s`. This way, we avoid test failure when the timer handler has already printed some `.` characters to the current line.

[`writeln`]: https://doc.rust-lang.org/core/macro.writeln.html

With the above changes, `cargo test` now deterministically succeeds again.

This was a very harmless race condition that only caused a test failure. As you can imagine, other race conditions can be much more difficult to debug due to their non-deterministic nature. Luckily, Rust protects us from data races, which are the most serious class of race conditions since they can cause all kinds of undefined behavior, including system crashes and silent memory corruption.

## The `hlt` Instruction

Until now, we used a simple empty loop statement at the end of our `_start` and `panic` functions. This causes the CPU to spin endlessly, and thus works as expected. But it is also very inefficient, because the CPU continues to run at full speed even though there's no work to do.
You can see this problem in your task manager when you run your kernel: The QEMU process needs close to 100% CPU the whole time.

What we really want to do is to halt the CPU until the next interrupt arrives. This allows the CPU to enter a sleep state in which it consumes much less energy. The [`hlt` instruction] does exactly that. Let's use this instruction to create an energy-efficient endless loop:

[`hlt` instruction]: https://en.wikipedia.org/wiki/HLT_(x86_instruction)

```rust
// in src/lib.rs

pub fn hlt_loop() -> ! {
    loop {
        x86_64::instructions::hlt();
    }
}
```

The `instructions::hlt` function is just a [thin wrapper] around the assembly instruction. It is safe because there's no way it can compromise memory safety.

[thin wrapper]: https://github.com/rust-osdev/x86_64/blob/5e8e218381c5205f5777cb50da3ecac5d7e3b1ab/src/instructions/mod.rs#L16-L22

We can now use this `hlt_loop` instead of the endless loops in our `_start` and `panic` functions:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    […]

    println!("It did not crash!");
    blog_os::hlt_loop();            // new
}


#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    blog_os::hlt_loop();            // new
}
```

Let's update our `lib.rs` as well:

```rust
// in src/lib.rs

/// Entry point for `cargo test`
#[cfg(test)]
#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    init();
    test_main();
    hlt_loop();         // new
}

pub fn test_panic_handler(info: &PanicInfo) -> ! {
    serial_println!("[failed]\n");
    serial_println!("Error: {}\n", info);
    exit_qemu(QemuExitCode::Failed);
    hlt_loop();         // new
}
```

When we run our kernel now in QEMU, we see a much lower CPU usage.

## Keyboard Input

Now that we are able to handle interrupts from external devices, we are finally able to add support for keyboard input. This will allow us to interact with our kernel for the first time.

[PS/2]: https://en.wikipedia.org/wiki/PS/2_port

Like the hardware timer, the keyboard controller is already enabled by default. So when you press a key, the keyboard controller sends an interrupt to the PIC, which forwards it to the CPU. The CPU looks for a handler function in the IDT, but the corresponding entry is empty. Therefore, a double fault occurs.

So let's add a handler function for the keyboard interrupt. It's quite similar to how we defined the handler for the timer interrupt; it just uses a different interrupt number:

```rust
// in src/interrupts.rs

#[derive(Debug, Clone, Copy)]
#[repr(u8)]
pub enum InterruptIndex {
    Timer = PIC_1_OFFSET,
    Keyboard, // new
}

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        […]
        // new
        idt[InterruptIndex::Keyboard.as_usize()]
            .set_handler_fn(keyboard_interrupt_handler);

        idt
    };
}

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    print!("k");

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

As we see from the graphic [above](#the-8259-pic), the keyboard uses line 1 of the primary PIC. This means that it arrives at the CPU as interrupt 33 (1 + offset 32). We add this index as a new `Keyboard` variant to the `InterruptIndex` enum. We don't need to specify the value explicitly, since it defaults to the previous value plus one, which is also 33. In the interrupt handler, we print a `k` and send the end of interrupt signal to the interrupt controller.

We now see that a `k` appears on the screen when we press a key. However, this only works for the first key we press. Even if we continue to press keys, no more `k`s appear on the screen. This is because the keyboard controller won't send another interrupt until we have read the so-called _scancode_ of the pressed key.

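
The claim that the `Keyboard` variant automatically gets the value 33 can be checked in isolation. The following standalone snippet copies the enum and its helper methods from this post into an ordinary Rust program; the added `PartialEq` and `Eq` derives and the non-`pub` visibility are only there to keep the example small.

```rust
const PIC_1_OFFSET: u8 = 32;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum InterruptIndex {
    Timer = PIC_1_OFFSET,
    // No explicit value: the discriminant is the previous one plus one.
    Keyboard,
}

impl InterruptIndex {
    /// Converts the variant to its vector number.
    fn as_u8(self) -> u8 {
        self as u8
    }

    /// Converts the variant to an index usable with the IDT.
    fn as_usize(self) -> usize {
        usize::from(self.as_u8())
    }
}
```

Here `InterruptIndex::Keyboard.as_u8()` evaluates to 33, so the variant lines up with line 1 of the primary PIC as long as `Keyboard` directly follows `Timer` in the enum.
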
### Reading the Scancodes

To find out _which_ key was pressed, we need to query the keyboard controller. We do this by reading from the data port of the PS/2 controller, which is the [I/O port] with the number `0x60`:

[I/O port]: @/edition-2/posts/04-testing/index.md#i-o-ports

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };
    print!("{}", scancode);

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

We use the [`Port`] type of the `x86_64` crate to read a byte from the keyboard's data port. This byte is called the [_scancode_] and it represents the key press/release. We don't do anything with the scancode yet, other than print it to the screen:

[`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html
[_scancode_]: https://en.wikipedia.org/wiki/Scancode

![QEMU printing scancodes to the screen when keys are pressed](qemu-printing-scancodes.gif)

The above image shows me slowly typing "123". We see that adjacent keys have adjacent scancodes and that pressing a key causes a different scancode than releasing it. But how do we translate the scancodes to the actual key actions exactly?

### Interpreting the Scancodes

There are three different standards for the mapping between scancodes and keys, the so-called _scancode sets_. All three go back to the keyboards of early IBM computers: the [IBM XT], the [IBM 3270 PC], and the [IBM AT]. Later computers fortunately did not continue the trend of defining new scancode sets, but rather emulated the existing sets and extended them. Today, most keyboards can be configured to emulate any of the three sets.
[IBM XT]: https://en.wikipedia.org/wiki/IBM_Personal_Computer_XT
[IBM 3270 PC]: https://en.wikipedia.org/wiki/IBM_3270_PC
[IBM AT]: https://en.wikipedia.org/wiki/IBM_Personal_Computer/AT

By default, PS/2 keyboards emulate scancode set 1 ("XT"). In this set, the lower 7 bits of a scancode byte define the key, and the most significant bit defines whether it's a press ("0") or a release ("1"). Keys that were not present on the original [IBM XT] keyboard, such as the enter key on the keypad, generate two scancodes in succession: a `0xe0` escape byte and then a byte representing the key. For a list of all set 1 scancodes and their corresponding keys, check out the [OSDev Wiki][scancode set 1].

[scancode set 1]: https://wiki.osdev.org/Keyboard#Scan_Code_Set_1

To translate the scancodes to keys, we can use a `match` statement:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };

    // new
    let key = match scancode {
        0x02 => Some('1'),
        0x03 => Some('2'),
        0x04 => Some('3'),
        0x05 => Some('4'),
        0x06 => Some('5'),
        0x07 => Some('6'),
        0x08 => Some('7'),
        0x09 => Some('8'),
        0x0a => Some('9'),
        0x0b => Some('0'),
        _ => None,
    };
    if let Some(key) = key {
        print!("{}", key);
    }

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

The above code translates keypresses of the number keys 0-9 and ignores all other keys. It uses a [match] statement to assign a character or `None` to each scancode. It then uses [`if let`] to destructure the optional `key`. By using the same variable name `key` in the pattern, we [shadow] the previous declaration, which is a common pattern for destructuring `Option` types in Rust.
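
The bit layout of scancode set 1 described above (lower 7 bits for the key, most significant bit for press or release) can be sketched as a small helper. This is only an illustration under stated assumptions: `KeyState` and `split_scancode` are hypothetical names that do not appear in the post or in any crate, and the sketch deliberately ignores the `0xe0` escape byte.

```rust
/// Whether a scancode byte reports a key press or a key release.
/// (Hypothetical type, introduced for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyState {
    Pressed,
    Released,
}

/// Splits a single scancode set 1 byte into its 7-bit key code and key state.
/// Hypothetical helper; it does not handle two-byte sequences that start
/// with the `0xe0` escape byte.
pub fn split_scancode(scancode: u8) -> (u8, KeyState) {
    // The lower 7 bits identify the key.
    let key_code = scancode & 0x7f;
    // The most significant bit is 0 for a press and 1 for a release.
    let state = if scancode & 0x80 == 0 {
        KeyState::Pressed
    } else {
        KeyState::Released
    };
    (key_code, state)
}
```

For the `1` key, `split_scancode(0x02)` yields `(0x02, KeyState::Pressed)` and `split_scancode(0x82)` yields `(0x02, KeyState::Released)`, which is why the `match` above only ever prints a digit on the press event.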
[match]: https://doc.rust-lang.org/book/ch06-02-match.html
[`if let`]: https://doc.rust-lang.org/book/ch19-01-all-the-places-for-patterns.html#conditional-if-let-expressions
[shadow]: https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#shadowing

Now we can write numbers:

![QEMU printing numbers to the screen](qemu-printing-numbers.gif)

Translating the other keys works in the same way. Fortunately, there is a crate named [`pc-keyboard`] for translating scancodes of scancode sets 1 and 2, so we don't have to implement this ourselves. To use the crate, we add it to our `Cargo.toml` and import it in our `lib.rs`:

[`pc-keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/

```toml
# in Cargo.toml

[dependencies]
pc-keyboard = "0.7.0"
```

Now we can use this crate to rewrite our `keyboard_interrupt_handler`:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1};
    use spin::Mutex;
    use x86_64::instructions::port::Port;

    lazy_static! {
        static ref KEYBOARD: Mutex<Keyboard<layouts::Us104Key, ScancodeSet1>> =
            Mutex::new(Keyboard::new(ScancodeSet1::new(),
                layouts::Us104Key, HandleControl::Ignore)
            );
    }

    let mut keyboard = KEYBOARD.lock();
    let mut port = Port::new(0x60);

    let scancode: u8 = unsafe { port.read() };
    if let Ok(Some(key_event)) = keyboard.add_byte(scancode) {
        if let Some(key) = keyboard.process_keyevent(key_event) {
            match key {
                DecodedKey::Unicode(character) => print!("{}", character),
                DecodedKey::RawKey(key) => print!("{:?}", key),
            }
        }
    }

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

We use the `lazy_static` macro to create a static [`Keyboard`] object protected by a Mutex. We initialize the `Keyboard` with a US keyboard layout and the scancode set 1. The [`HandleControl`] parameter allows us to map `ctrl+[a-z]` to the Unicode characters `U+0001` through `U+001A`.
We don't want to do that, so we use the `Ignore` option to handle `ctrl` like a normal key.

[`HandleControl`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/enum.HandleControl.html

On each interrupt, we lock the Mutex, read the scancode from the keyboard controller, and pass it to the [`add_byte`] method, which translates the scancode into an `Option<KeyEvent>`. The [`KeyEvent`] contains the key which caused the event and whether it was a press or release event.

[`Keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html
[`add_byte`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.add_byte
[`KeyEvent`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.KeyEvent.html

To interpret this key event, we pass it to the [`process_keyevent`] method, which translates the key event to a character, if possible. For example, it translates a press event of the `A` key to either a lowercase `a` character or an uppercase `A` character, depending on whether the shift key was pressed.

[`process_keyevent`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.process_keyevent

With this modified interrupt handler, we can now write text:

![Typing "Hello World" in QEMU](qemu-typing.gif)

### Configuring the Keyboard

It's possible to configure some aspects of a PS/2 keyboard, for example, which scancode set it should use. We won't cover it here because this post is already long enough, but the OSDev Wiki has an overview of possible [configuration commands].

[configuration commands]: https://wiki.osdev.org/PS/2_Keyboard#Commands

## Summary

This post explained how to enable and handle external interrupts. We learned about the 8259 PIC and its primary/secondary layout, the remapping of the interrupt numbers, and the "end of interrupt" signal. We implemented handlers for the hardware timer and the keyboard and learned about the `hlt` instruction, which halts the CPU until the next interrupt.
Now we are able to interact with our kernel and have some fundamental building blocks for creating a small shell or simple games.

## What's next?

Timer interrupts are essential for an operating system because they provide a way to periodically interrupt the running process and let the kernel regain control. The kernel can then switch to a different process and create the illusion of multiple processes running in parallel.

But before we can create processes or threads, we need a way to allocate memory for them. The next posts will explore memory management to provide this fundamental building block.

================================================
FILE: blog/content/edition-2/posts/07-hardware-interrupts/index.pt-BR.md
================================================
+++
title = "Interrupções de Hardware"
weight = 7
path = "pt-BR/hardware-interrupts"
date = 2018-10-22

[extra]
chapter = "Interrupções"
# Please update this when updating the translation
translation_based_on_commit = "9753695744854686a6b80012c89b0d850a44b4b0"

# GitHub usernames of the people that translated this post
translators = ["richarddalves"]
+++

Nesta postagem, configuramos o controlador de interrupção programável para encaminhar corretamente interrupções de hardware para a CPU. Para manipular essas interrupções, adicionamos novas entradas à nossa tabela de descritores de interrupção, assim como fizemos para nossos manipuladores de exceção. Aprenderemos como obter interrupções periódicas de timer e como obter entrada do teclado.

Este blog é desenvolvido abertamente no [GitHub]. Se você tiver algum problema ou dúvida, abra um issue lá. Você também pode deixar comentários [na parte inferior]. O código-fonte completo desta publicação pode ser encontrado na branch [`post-07`][post branch].
[GitHub]: https://github.com/phil-opp/blog_os [na parte inferior]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-07 ## Visão Geral Interrupções fornecem uma forma de notificar a CPU de dispositivos de hardware conectados. Então, em vez de deixar o kernel verificar periodicamente o teclado por novos caracteres (um processo chamado [_polling_]), o teclado pode notificar o kernel de cada pressionamento de tecla. Isso é muito mais eficiente porque o kernel só precisa agir quando algo aconteceu. Também permite tempos de reação mais rápidos, já que o kernel pode reagir imediatamente e não apenas na próxima verificação. [_polling_]: https://en.wikipedia.org/wiki/Polling_(computer_science) Conectar todos os dispositivos de hardware diretamente à CPU não é possível. Em vez disso, um _controlador de interrupção_ separado agrega as interrupções de todos os dispositivos e então notifica a CPU: ``` ____________ _____ Timer ------------> | | | | Teclado ----------> | Controlador|---------> | CPU | Outro Hardware ---> | de | |_____| Etc. -------------> | Interrupção| |____________| ``` A maioria dos controladores de interrupção são programáveis, o que significa que suportam diferentes níveis de prioridade para interrupções. Por exemplo, isso permite dar às interrupções de timer uma prioridade mais alta que as interrupções de teclado para garantir cronometragem precisa. Ao contrário de exceções, interrupções de hardware ocorrem _assincronamente_. Isso significa que são completamente independentes do código executado e podem ocorrer a qualquer momento. Assim, temos repentinamente uma forma de concorrência em nosso kernel com todos os potenciais bugs relacionados à concorrência. O modelo estrito de ownership de Rust nos ajuda aqui porque proíbe estado global mutável. No entanto, deadlocks ainda são possíveis, como veremos mais tarde nesta postagem. ## O 8259 PIC O [Intel 8259] é um controlador de interrupção programável (PIC) introduzido em 1976. 
Ele foi há muito tempo substituído pelo mais novo [APIC], mas sua interface ainda é suportada em sistemas atuais por razões de compatibilidade retroativa. O 8259 PIC é significativamente mais fácil de configurar que o APIC, então o usaremos para nos introduzir a interrupções antes de mudarmos para o APIC em uma postagem posterior. [APIC]: https://en.wikipedia.org/wiki/Intel_APIC_Architecture O 8259 tem oito linhas de interrupção e várias linhas para se comunicar com a CPU. Os sistemas típicos daquela época eram equipados com duas instâncias do 8259 PIC, um PIC primário e um secundário, conectado a uma das linhas de interrupção do primário: [Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259 ``` ____________ ____________ Real Time Clock --> | | Timer -------------> | | ACPI -------------> | | Teclado-----------> | | _____ Disponível -------> | Controlador|----------------------> | Controlador| | | Disponível -------> | de | Porta Serial 2 ----> | de |---> | CPU | Mouse ------------> | Interrupção| Porta Serial 1 ----> | Interrupção| |_____| Co-Processador ---> | Secundário | Porta Paralela 2/3 > | Primário | ATA Primário -----> | | Disquete ---------> | | ATA Secundário ---> |____________| Porta Paralela 1---> |____________| ``` Este gráfico mostra a atribuição típica de linhas de interrupção. Vemos que a maioria das 15 linhas têm um mapeamento fixo, por exemplo, a linha 4 do PIC secundário é atribuída ao mouse. Cada controlador pode ser configurado através de duas [portas I/O], uma porta "comando" e uma porta "dados". Para o controlador primário, essas portas são `0x20` (comando) e `0x21` (dados). Para o controlador secundário, elas são `0xa0` (comando) e `0xa1` (dados). Para mais informações sobre como os PICs podem ser configurados, veja o [artigo em osdev.org]. 
[portas I/O]: @/edition-2/posts/04-testing/index.md#i-o-ports
[artigo em osdev.org]: https://wiki.osdev.org/8259_PIC

### Implementação

A configuração padrão dos PICs não é utilizável porque envia números de vetor de interrupção no intervalo de 0–15 para a CPU. Esses números já estão ocupados por exceções de CPU. Por exemplo, o número 8 corresponde a um double fault. Para corrigir esse problema de sobreposição, precisamos remapear as interrupções PIC para números diferentes. O intervalo real não importa desde que não se sobreponha às exceções, mas tipicamente o intervalo de 32–47 é escolhido, porque esses são os primeiros números livres após os 32 slots de exceção.

A configuração acontece escrevendo valores especiais nas portas de comando e dados dos PICs. Felizmente, já existe uma crate chamada [`pic8259`], então não precisamos escrever a sequência de inicialização nós mesmos. No entanto, se você estiver interessado em como funciona, confira [seu código-fonte][pic crate source]. Ele é bastante pequeno e bem documentado.

[pic crate source]: https://docs.rs/crate/pic8259/0.10.1/source/src/lib.rs

Para adicionar a crate como dependência, adicionamos o seguinte ao nosso projeto:

[`pic8259`]: https://docs.rs/pic8259/0.10.1/pic8259/

```toml
# em Cargo.toml

[dependencies]
pic8259 = "0.10.1"
```

A principal abstração fornecida pela crate é a struct [`ChainedPics`] que representa o layout primário/secundário de PIC que vimos acima. Ela é projetada para ser usada da seguinte forma:

[`ChainedPics`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html

```rust
// em src/interrupts.rs

use pic8259::ChainedPics;
use spin;

pub const PIC_1_OFFSET: u8 = 32;
pub const PIC_2_OFFSET: u8 = PIC_1_OFFSET + 8;

pub static PICS: spin::Mutex<ChainedPics> =
    spin::Mutex::new(unsafe { ChainedPics::new(PIC_1_OFFSET, PIC_2_OFFSET) });
```

Como notado acima, estamos definindo os offsets para os PICs no intervalo 32–47.
Ao envolver a struct `ChainedPics` em um `Mutex`, obtemos acesso mutável seguro (através do [método `lock`][spin mutex lock]), que precisamos no próximo passo. A função `ChainedPics::new` é unsafe porque offsets errados poderiam causar comportamento indefinido. [spin mutex lock]: https://docs.rs/spin/0.5.2/spin/struct.Mutex.html#method.lock Agora podemos inicializar o 8259 PIC em nossa função `init`: ```rust // em src/lib.rs pub fn init() { gdt::init(); interrupts::init_idt(); unsafe { interrupts::PICS.lock().initialize() }; // novo } ``` Usamos a função [`initialize`] para realizar a inicialização do PIC. Como a função `ChainedPics::new`, esta função também é unsafe porque pode causar comportamento indefinido se o PIC estiver mal configurado. [`initialize`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html#method.initialize Se tudo correr bem, devemos continuar a ver a mensagem "Não crashou!" ao executar `cargo run`. ## Habilitando Interrupções Até agora, nada aconteceu porque as interrupções ainda estão desativadas na configuração da CPU. Isso significa que a CPU não escuta o controlador de interrupção de forma alguma, então nenhuma interrupção pode chegar à CPU. Vamos mudar isso: ```rust // em src/lib.rs pub fn init() { gdt::init(); interrupts::init_idt(); unsafe { interrupts::PICS.lock().initialize() }; x86_64::instructions::interrupts::enable(); // novo } ``` A função `interrupts::enable` da crate `x86_64` executa a instrução especial `sti` ("set interrupts") para habilitar interrupções externas. Quando tentamos `cargo run` agora, vemos que ocorre um double fault: ![QEMU printing `EXCEÇÃO: DOUBLE FAULT` because of hardware timer](qemu-hardware-timer-double-fault.png) A razão para este double fault é que o timer de hardware (o [Intel 8253], para ser exato) é habilitado por padrão, então começamos a receber interrupções de timer assim que habilitamos interrupções. 
Como ainda não definimos uma função manipuladora para ele, nosso manipulador de double fault é invocado. [Intel 8253]: https://en.wikipedia.org/wiki/Intel_8253 ## Manipulando Interrupções de Timer Como vemos do gráfico [acima](#o-8259-pic), o timer usa a linha 0 do PIC primário. Isso significa que ele chega à CPU como interrupção 32 (0 + offset 32). Em vez de codificar rigidamente o índice 32, o armazenamos em um enum `InterruptIndex`: ```rust // em src/interrupts.rs #[derive(Debug, Clone, Copy)] #[repr(u8)] pub enum InterruptIndex { Timer = PIC_1_OFFSET, } impl InterruptIndex { fn as_u8(self) -> u8 { self as u8 } fn as_usize(self) -> usize { usize::from(self.as_u8()) } } ``` O enum é um [enum similar a C] para que possamos especificar diretamente o índice para cada variante. O atributo `repr(u8)` especifica que cada variante é representada como um `u8`. Adicionaremos mais variantes para outras interrupções no futuro. [enum similar a C]: https://doc.rust-lang.org/reference/items/enumerations.html#custom-discriminant-values-for-fieldless-enumerations Agora podemos adicionar uma função manipuladora para a interrupção de timer: ```rust // em src/interrupts.rs use crate::print; lazy_static! { static ref IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); idt.breakpoint.set_handler_fn(breakpoint_handler); […] idt[InterruptIndex::Timer.as_usize()] .set_handler_fn(timer_interrupt_handler); // novo idt }; } extern "x86-interrupt" fn timer_interrupt_handler( _stack_frame: InterruptStackFrame) { print!("."); } ``` Nosso `timer_interrupt_handler` tem a mesma assinatura que nossos manipuladores de exceção, porque a CPU reage identicamente a exceções e interrupções externas (a única diferença é que algumas exceções empurram um código de erro). A struct [`InterruptDescriptorTable`] implementa a trait [`IndexMut`], então podemos acessar entradas individuais através da sintaxe de indexação de array. 
[`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html [`IndexMut`]: https://doc.rust-lang.org/core/ops/trait.IndexMut.html Em nosso manipulador de interrupção de timer, imprimimos um ponto na tela. Como a interrupção de timer acontece periodicamente, esperaríamos ver um ponto aparecendo a cada tick do timer. No entanto, quando o executamos, vemos que apenas um único ponto é impresso: ![QEMU printing only a single dot for hardware timer](qemu-single-dot-printed.png) ### End of Interrupt A razão é que o PIC espera um sinal explícito de "end of interrupt" (EOI) do nosso manipulador de interrupção. Este sinal diz ao controlador que a interrupção foi processada e que o sistema está pronto para receber a próxima interrupção. Então o PIC pensa que ainda estamos ocupados processando a primeira interrupção de timer e espera pacientemente pelo sinal EOI antes de enviar a próxima. Para enviar o EOI, usamos nossa struct `PICS` estática novamente: ```rust // em src/interrupts.rs extern "x86-interrupt" fn timer_interrupt_handler( _stack_frame: InterruptStackFrame) { print!("."); unsafe { PICS.lock() .notify_end_of_interrupt(InterruptIndex::Timer.as_u8()); } } ``` O `notify_end_of_interrupt` descobre se o PIC primário ou secundário enviou a interrupção e então usa as portas `command` e `data` para enviar um sinal EOI aos respectivos controladores. Se o PIC secundário enviou a interrupção, ambos os PICs precisam ser notificados porque o PIC secundário está conectado a uma linha de entrada do PIC primário. Precisamos ter cuidado para usar o número de vetor de interrupção correto, caso contrário poderíamos acidentalmente deletar uma importante interrupção não enviada ou fazer nosso sistema travar. Esta é a razão pela qual a função é unsafe. 
Quando agora executamos `cargo run` vemos pontos aparecendo periodicamente na tela: ![QEMU printing consecutive dots showing the hardware timer](qemu-hardware-timer-dots.gif) ### Configurando o Timer O timer de hardware que usamos é chamado de _Programmable Interval Timer_, ou PIT, resumidamente. Como o nome diz, é possível configurar o intervalo entre duas interrupções. Não entraremos em detalhes aqui porque mudaremos em breve para o [APIC timer], mas a wiki do OSDev tem um artigo extenso sobre [configurando o PIT]. [APIC timer]: https://wiki.osdev.org/APIC_timer [configurando o PIT]: https://wiki.osdev.org/Programmable_Interval_Timer ## Deadlocks Agora temos uma forma de concorrência em nosso kernel: As interrupções de timer ocorrem assincronamente, então podem interromper nossa função `_start` a qualquer momento. Felizmente, o sistema de ownership de Rust previne muitos tipos de bugs relacionados à concorrência em tempo de compilação. Uma exceção notável são deadlocks. Deadlocks ocorrem se uma thread tenta adquirir um lock que nunca se tornará livre. Assim, a thread trava indefinidamente. Já podemos provocar um deadlock em nosso kernel. Lembre-se, nossa macro `println` chama a função `vga_buffer::_print`, que [trava um `WRITER` global][vga spinlock] usando um spinlock: [vga spinlock]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks ```rust // em src/vga_buffer.rs […] #[doc(hidden)] pub fn _print(args: fmt::Arguments) { use core::fmt::Write; WRITER.lock().write_fmt(args).unwrap(); } ``` Ela trava o `WRITER`, chama `write_fmt` nele, e implicitamente o destrava no final da função. 
Agora imagine que uma interrupção ocorre enquanto o `WRITER` está travado e o manipulador de interrupção tenta imprimir algo também:

| Passo de Tempo | _start                 | interrupt_handler                                     |
| -------------- | ---------------------- | ----------------------------------------------------- |
| 0              | chama `println!`       |                                                       |
| 1              | `print` trava `WRITER` |                                                       |
| 2              |                        | **interrupção ocorre**, manipulador começa a executar |
| 3              |                        | chama `println!`                                      |
| 4              |                        | `print` tenta travar `WRITER` (já travado)            |
| 5              |                        | `print` tenta travar `WRITER` (já travado)            |
| …              |                        | …                                                     |
| _nunca_        | _destravar `WRITER`_   |                                                       |

O `WRITER` está travado, então o manipulador de interrupção espera até que se torne livre. Mas isso nunca acontece, porque a função `_start` só continua a executar após o manipulador de interrupção retornar. Assim, o sistema inteiro trava.

### Provocando um Deadlock

Podemos facilmente provocar tal deadlock em nosso kernel imprimindo algo no loop no final de nossa função `_start`:

```rust
// em src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    […]
    loop {
        use blog_os::print;
        print!("-");        // novo
    }
}
```

Quando o executamos no QEMU, obtemos uma saída da forma:

![QEMU output with many rows of hyphens and no dots](./qemu-deadlock.png)

Vemos que apenas um número limitado de hífens são impressos até que a primeira interrupção de timer ocorre. Então o sistema trava porque o manipulador de interrupção de timer entra em deadlock quando tenta imprimir um ponto. Esta é a razão pela qual não vemos pontos na saída acima.

O número real de hífens varia entre execuções porque a interrupção de timer ocorre assincronamente. Este não-determinismo é o que torna bugs relacionados à concorrência tão difíceis de depurar.

### Corrigindo o Deadlock

Para evitar este deadlock, podemos desativar interrupções enquanto o `Mutex` está travado:

```rust
// em src/vga_buffer.rs

/// Imprime a string formatada dada no buffer de texto VGA
/// através da instância global `WRITER`.
#[doc(hidden)] pub fn _print(args: fmt::Arguments) { use core::fmt::Write; use x86_64::instructions::interrupts; // novo interrupts::without_interrupts(|| { // novo WRITER.lock().write_fmt(args).unwrap(); }); } ``` A função [`without_interrupts`] recebe um [closure] e o executa em um ambiente livre de interrupções. Usamos isso para garantir que nenhuma interrupção pode ocorrer enquanto o `Mutex` está travado. Quando executamos nosso kernel agora, vemos que ele continua executando sem travar. (Ainda não notamos nenhum ponto, mas isso é porque eles estão rolando rápido demais. Tente diminuir a velocidade da impressão, por exemplo, colocando um `for _ in 0..10000 {}` dentro do loop.) [`without_interrupts`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.without_interrupts.html [closure]: https://doc.rust-lang.org/book/ch13-01-closures.html Podemos aplicar a mesma mudança à nossa função de impressão serial para garantir que nenhum deadlock ocorra com ela também: ```rust // em src/serial.rs #[doc(hidden)] pub fn _print(args: ::core::fmt::Arguments) { use core::fmt::Write; use x86_64::instructions::interrupts; // novo interrupts::without_interrupts(|| { // novo SERIAL1 .lock() .write_fmt(args) .expect("Impressão para serial falhou"); }); } ``` Note que desativar interrupções não deve ser uma solução geral. O problema é que isso aumenta a latência de interrupção no pior caso, isto é, o tempo até o sistema reagir a uma interrupção. Portanto, interrupções devem ser desativadas apenas por um tempo muito curto. ## Corrigindo uma Race Condition Se você executar `cargo test`, pode ver o teste `test_println_output` falhar: ``` > cargo test --lib […] Running 4 tests test_breakpoint_exception...[ok] test_println... [ok] test_println_many... [ok] test_println_output... [failed] Error: panicked at 'assertion failed: `(left == right)` left: `'.'`, right: `'S'`', src/vga_buffer.rs:205:9 ``` A razão é uma _race condition_ entre o teste e nosso manipulador de timer. 
Lembre-se, o teste se parece com isto: ```rust // em src/vga_buffer.rs #[test_case] fn test_println_output() { let s = "Uma string de teste que cabe em uma única linha"; println!("{}", s); for (i, c) in s.chars().enumerate() { let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read(); assert_eq!(char::from(screen_char.ascii_character), c); } } ``` O teste imprime uma string no buffer VGA e então verifica a saída iterando manualmente pelo array `buffer_chars`. A race condition ocorre porque o manipulador de interrupção de timer pode executar entre o `println` e a leitura dos caracteres de tela. Note que isso não é uma _data race_ perigosa, que Rust previne completamente em tempo de compilação. Veja o [_Rustonomicon_][nomicon-races] para detalhes. [nomicon-races]: https://doc.rust-lang.org/nomicon/races.html Para corrigir isso, precisamos manter o `WRITER` travado pela duração completa do teste, para que o manipulador de timer não possa escrever um `.` na tela no meio. O teste corrigido se parece com isto: ```rust // em src/vga_buffer.rs #[test_case] fn test_println_output() { use core::fmt::Write; use x86_64::instructions::interrupts; let s = "Uma string de teste que cabe em uma única linha"; interrupts::without_interrupts(|| { let mut writer = WRITER.lock(); writeln!(writer, "\n{}", s).expect("writeln falhou"); for (i, c) in s.chars().enumerate() { let screen_char = writer.buffer.chars[BUFFER_HEIGHT - 2][i].read(); assert_eq!(char::from(screen_char.ascii_character), c); } }); } ``` Realizamos as seguintes mudanças: - Mantemos o writer travado pelo teste completo usando o método `lock()` explicitamente. Em vez de `println`, usamos a macro [`writeln`] que permite imprimir em um writer já travado. - Para evitar outro deadlock, desativamos interrupções pela duração do teste. Caso contrário, o teste poderia ser interrompido enquanto o writer ainda está travado. 
- Como o manipulador de interrupção de timer ainda pode executar antes do teste, imprimimos uma nova linha adicional `\n` antes de imprimir a string `s`. Desta forma, evitamos falha do teste quando o manipulador de timer já imprimiu alguns caracteres `.` na linha atual. [`writeln`]: https://doc.rust-lang.org/core/macro.writeln.html Com as mudanças acima, `cargo test` agora tem sucesso deterministicamente novamente. Esta foi uma race condition muito inofensiva que causou apenas uma falha de teste. Como você pode imaginar, outras race conditions podem ser muito mais difíceis de depurar devido à sua natureza não-determinística. Felizmente, Rust nos previne de data races, que são a classe mais séria de race conditions, já que podem causar todo tipo de comportamento indefinido, incluindo crashes de sistema e corrupções silenciosas de memória. ## A Instrução `hlt` Até agora, usamos uma simples instrução de loop vazio no final de nossas funções `_start` e `panic`. Isso faz a CPU girar infinitamente, e assim funciona como esperado. Mas também é muito ineficiente, porque a CPU continua executando a velocidade máxima mesmo que não haja trabalho a fazer. Você pode ver este problema em seu gerenciador de tarefas quando executa seu kernel: O processo QEMU precisa de perto de 100% de CPU o tempo todo. O que realmente queremos fazer é parar a CPU até a próxima interrupção chegar. Isso permite que a CPU entre em um estado de sono no qual consome muito menos energia. A [instrução `hlt`] faz exatamente isso. Vamos usar esta instrução para criar um loop infinito eficiente em energia: [instrução `hlt`]: https://en.wikipedia.org/wiki/HLT_(x86_instruction) ```rust // em src/lib.rs pub fn hlt_loop() -> ! { loop { x86_64::instructions::hlt(); } } ``` A função `instructions::hlt` é apenas um [wrapper fino] em torno da instrução assembly. Ela é segura porque não há forma de comprometer a segurança de memória. 
[wrapper fino]: https://github.com/rust-osdev/x86_64/blob/5e8e218381c5205f5777cb50da3ecac5d7e3b1ab/src/instructions/mod.rs#L16-L22 Agora podemos usar este `hlt_loop` em vez dos loops infinitos em nossas funções `_start` e `panic`: ```rust // em src/main.rs #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { […] println!("Não crashou!"); blog_os::hlt_loop(); // novo } #[cfg(not(test))] #[panic_handler] fn panic(info: &PanicInfo) -> ! { println!("{}", info); blog_os::hlt_loop(); // novo } ``` Vamos atualizar nosso `lib.rs` também: ```rust // em src/lib.rs /// Ponto de entrada para `cargo test` #[cfg(test)] #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { init(); test_main(); hlt_loop(); // novo } pub fn test_panic_handler(info: &PanicInfo) -> ! { serial_println!("[failed]\n"); serial_println!("Error: {}\n", info); exit_qemu(QemuExitCode::Failed); hlt_loop(); // novo } ``` Quando executamos nosso kernel agora no QEMU, vemos um uso de CPU muito menor. ## Entrada de Teclado Agora que somos capazes de manipular interrupções de dispositivos externos, finalmente podemos adicionar suporte para entrada de teclado. Isso nos permitirá interagir com nosso kernel pela primeira vez. [PS/2]: https://en.wikipedia.org/wiki/PS/2_port Como o timer de hardware, o controlador de teclado já está habilitado por padrão. Então quando você pressiona uma tecla, o controlador de teclado envia uma interrupção para o PIC, que a encaminha para a CPU. A CPU procura por uma função manipuladora na IDT, mas a entrada correspondente está vazia. Portanto, ocorre um double fault. Então vamos adicionar uma função manipuladora para a interrupção de teclado. É bem similar a como definimos o manipulador para a interrupção de timer; apenas usa um número de interrupção diferente: ```rust // em src/interrupts.rs #[derive(Debug, Clone, Copy)] #[repr(u8)] pub enum InterruptIndex { Timer = PIC_1_OFFSET, Keyboard, // novo } lazy_static! 
{ static ref IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); idt.breakpoint.set_handler_fn(breakpoint_handler); […] // novo idt[InterruptIndex::Keyboard.as_usize()] .set_handler_fn(keyboard_interrupt_handler); idt }; } extern "x86-interrupt" fn keyboard_interrupt_handler( _stack_frame: InterruptStackFrame) { print!("k"); unsafe { PICS.lock() .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); } } ``` Como vemos do gráfico [acima](#o-8259-pic), o teclado usa a linha 1 do PIC primário. Isso significa que ele chega à CPU como interrupção 33 (1 + offset 32). Adicionamos este índice como uma nova variante `Keyboard` ao enum `InterruptIndex`. Não precisamos especificar o valor explicitamente, já que ele assume o valor anterior mais um por padrão, que também é 33. No manipulador de interrupção, imprimimos um `k` e enviamos o sinal end of interrupt para o controlador de interrupção. Agora vemos que um `k` aparece na tela quando pressionamos uma tecla. No entanto, isso só funciona para a primeira tecla que pressionamos. Mesmo se continuarmos a pressionar teclas, nenhum `k` adicional aparece na tela. Isso ocorre porque o controlador de teclado não enviará outra interrupção até lermos o chamado _scancode_ da tecla pressionada. ### Lendo os Scancodes Para descobrir _qual_ tecla foi pressionada, precisamos consultar o controlador de teclado. Fazemos isso lendo da porta de dados do controlador PS/2, que é a [porta I/O] com o número `0x60`: [porta I/O]: @/edition-2/posts/04-testing/index.md#i-o-ports ```rust // em src/interrupts.rs extern "x86-interrupt" fn keyboard_interrupt_handler( _stack_frame: InterruptStackFrame) { use x86_64::instructions::port::Port; let mut port = Port::new(0x60); let scancode: u8 = unsafe { port.read() }; print!("{}", scancode); unsafe { PICS.lock() .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); } } ``` Usamos o tipo [`Port`] da crate `x86_64` para ler um byte da porta de dados do teclado. 
Este byte é chamado de [_scancode_] e representa o pressionamento/liberação de tecla. Ainda não fazemos nada com o scancode, apenas o imprimimos na tela: [`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html [_scancode_]: https://en.wikipedia.org/wiki/Scancode ![QEMU printing scancodes to the screen when keys are pressed](qemu-printing-scancodes.gif) A imagem acima me mostra digitando lentamente "123". Vemos que teclas adjacentes têm scancodes adjacentes e que pressionar uma tecla causa um scancode diferente de liberá-la. Mas como traduzimos exatamente os scancodes para as ações reais de tecla? ### Interpretando os Scancodes Existem três padrões diferentes para o mapeamento entre scancodes e teclas, os chamados _conjuntos de scancode_. Todos os três remontam aos teclados de computadores IBM antigos: o [IBM XT], o [IBM 3270 PC], e o [IBM AT]. Felizmente, computadores posteriores não continuaram a tendência de definir novos conjuntos de scancode, mas em vez disso emularam os conjuntos existentes e os estenderam. Hoje, a maioria dos teclados pode ser configurada para emular qualquer um dos três conjuntos. [IBM XT]: https://en.wikipedia.org/wiki/IBM_Personal_Computer_XT [IBM 3270 PC]: https://en.wikipedia.org/wiki/IBM_3270_PC [IBM AT]: https://en.wikipedia.org/wiki/IBM_Personal_Computer/AT Por padrão, teclados PS/2 emulam o conjunto de scancode 1 ("XT"). Neste conjunto, os 7 bits inferiores de um byte de scancode definem a tecla, e o bit mais significativo define se é um pressionamento ("0") ou uma liberação ("1"). Teclas que não estavam presentes no [IBM XT] original, como a tecla enter no teclado numérico, geram dois scancodes em sucessão: um byte de escape `0xe0` e então um byte representando a tecla. Para uma lista de todos os scancodes do conjunto 1 e suas teclas correspondentes, confira a [Wiki OSDev][scancode set 1]. 
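
Para tornar concreta a estrutura do conjunto 1 descrita acima, segue um esboço mínimo que separa um byte de scancode em código da tecla (7 bits inferiores) e estado (bit mais significativo). Os nomes `KeyState` e `split_scancode` são hipotéticos, criados apenas para esta ilustração; eles não fazem parte do kernel nem da crate `x86_64`, e o esboço não trata o byte de escape `0xe0`.

```rust
/// Estado de uma tecla segundo o bit mais significativo do scancode.
/// (Nome hipotético, usado apenas neste exemplo.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KeyState {
    Pressed,
    Released,
}

/// Separa um scancode do conjunto 1 em (código da tecla, estado).
/// Suposição: o byte não é o prefixo de escape `0xe0`.
fn split_scancode(scancode: u8) -> (u8, KeyState) {
    // Os 7 bits inferiores identificam a tecla.
    let key_code = scancode & 0x7f;
    // O bit mais significativo é 0 para pressionamento e 1 para liberação.
    let state = if scancode & 0x80 == 0 {
        KeyState::Pressed
    } else {
        KeyState::Released
    };
    (key_code, state)
}
```

Sob essas suposições, `split_scancode(0x82)` resultaria em `(0x02, KeyState::Released)`, ou seja, a liberação da mesma tecla cujo pressionamento gera o scancode `0x02`.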
[scancode set 1]: https://wiki.osdev.org/Keyboard#Scan_Code_Set_1 Para traduzir os scancodes para teclas, podemos usar uma instrução `match`: ```rust // em src/interrupts.rs extern "x86-interrupt" fn keyboard_interrupt_handler( _stack_frame: InterruptStackFrame) { use x86_64::instructions::port::Port; let mut port = Port::new(0x60); let scancode: u8 = unsafe { port.read() }; // novo let key = match scancode { 0x02 => Some('1'), 0x03 => Some('2'), 0x04 => Some('3'), 0x05 => Some('4'), 0x06 => Some('5'), 0x07 => Some('6'), 0x08 => Some('7'), 0x09 => Some('8'), 0x0a => Some('9'), 0x0b => Some('0'), _ => None, }; if let Some(key) = key { print!("{}", key); } unsafe { PICS.lock() .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); } } ``` O código acima traduz pressionamentos das teclas numéricas 0-9 e ignora todas as outras teclas. Ele usa uma instrução [match] para atribuir um caractere ou `None` a cada scancode. Então usa [`if let`] para desestruturar o `key` opcional. Ao usar o mesmo nome de variável `key` no padrão, [sombreamos] a declaração anterior, que é um padrão comum para desestruturar tipos `Option` em Rust. [match]: https://doc.rust-lang.org/book/ch06-02-match.html [`if let`]: https://doc.rust-lang.org/book/ch19-01-all-the-places-for-patterns.html#conditional-if-let-expressions [sombreamos]: https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#shadowing Agora podemos escrever números: ![QEMU printing numbers to the screen](qemu-printing-numbers.gif) Traduzir as outras teclas funciona da mesma forma. Felizmente, existe uma crate chamada [`pc-keyboard`] para traduzir scancodes dos conjuntos de scancode 1 e 2, então não precisamos implementar isso nós mesmos. 
Para usar a crate, a adicionamos ao nosso `Cargo.toml` e a importamos em nosso `lib.rs`:

[`pc-keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/

```toml
# em Cargo.toml

[dependencies]
pc-keyboard = "0.7.0"
```

Agora podemos usar esta crate para reescrever nosso `keyboard_interrupt_handler`:

```rust
// em src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1};
    use spin::Mutex;
    use x86_64::instructions::port::Port;

    lazy_static! {
        static ref KEYBOARD: Mutex<Keyboard<layouts::Us104Key, ScancodeSet1>> =
            Mutex::new(Keyboard::new(ScancodeSet1::new(),
                layouts::Us104Key, HandleControl::Ignore)
            );
    }

    let mut keyboard = KEYBOARD.lock();
    let mut port = Port::new(0x60);

    let scancode: u8 = unsafe { port.read() };
    if let Ok(Some(key_event)) = keyboard.add_byte(scancode) {
        if let Some(key) = keyboard.process_keyevent(key_event) {
            match key {
                DecodedKey::Unicode(character) => print!("{}", character),
                DecodedKey::RawKey(key) => print!("{:?}", key),
            }
        }
    }

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

Usamos a macro `lazy_static` para criar um objeto [`Keyboard`] estático protegido por um Mutex. Inicializamos o `Keyboard` com um layout de teclado americano e o conjunto de scancode 1. O parâmetro [`HandleControl`] permite mapear `ctrl+[a-z]` aos caracteres Unicode `U+0001` a `U+001A`. Não queremos fazer isso, então usamos a opção `Ignore` para tratar o `ctrl` como uma tecla normal.

[`HandleControl`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/enum.HandleControl.html

Em cada interrupção, travamos o Mutex, lemos o scancode do controlador de teclado e o passamos para o método [`add_byte`], que traduz o scancode em um `Option<KeyEvent>`. O [`KeyEvent`] contém a tecla que causou o evento e se foi um evento de pressionamento ou liberação.
[`Keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html [`add_byte`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.add_byte [`KeyEvent`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.KeyEvent.html Para interpretar este evento de tecla, o passamos para o método [`process_keyevent`], que traduz o evento de tecla em um caractere, se possível. Por exemplo, ele traduz um evento de pressionamento da tecla `A` em um caractere `a` minúsculo ou um caractere `A` maiúsculo, dependendo se a tecla shift foi pressionada. [`process_keyevent`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.process_keyevent Com este manipulador de interrupção modificado, agora podemos escrever texto: ![Typing "Hello World" in QEMU](qemu-typing.gif) ### Configurando o Teclado É possível configurar alguns aspectos de um teclado PS/2, por exemplo, qual conjunto de scancode ele deve usar. Não cobriremos isso aqui porque esta postagem já está longa o suficiente, mas a Wiki do OSDev tem uma visão geral dos possíveis [comandos de configuração]. [comandos de configuração]: https://wiki.osdev.org/PS/2_Keyboard#Commands ## Resumo Esta postagem explicou como habilitar e manipular interrupções externas. Aprendemos sobre o 8259 PIC e seu layout primário/secundário, o remapeamento dos números de interrupção, e o sinal "end of interrupt". Implementamos manipuladores para o timer de hardware e o teclado e aprendemos sobre a instrução `hlt`, que para a CPU até a próxima interrupção. Agora somos capazes de interagir com nosso kernel e temos alguns blocos fundamentais para criar um pequeno shell ou jogos simples. ## O Que Vem a Seguir? Interrupções de timer são essenciais para um sistema operacional porque fornecem uma forma de interromper periodicamente o processo em execução e deixar o kernel retomar o controle. O kernel pode então mudar para um processo diferente e criar a ilusão de múltiplos processos executando em paralelo. 
Mas antes de podermos criar processos ou threads, precisamos de uma forma de alocar memória para eles. As próximas postagens explorarão gerenciamento de memória para fornecer este bloco fundamental. ================================================ FILE: blog/content/edition-2/posts/07-hardware-interrupts/index.zh-CN.md ================================================ +++ title = "硬件中断" weight = 7 path = "zh-CN/hardware-interrupts" date = 2018-10-22 [extra] # Please update this when updating the translation translation_based_on_commit = "096c044b4f3697e91d8e30a2e817e567d0ef21a2" # GitHub usernames of the people that translated this post translators = ["liuyuran"] # GitHub usernames of the people that contributed to this translation translation_contributors = ["JiangengDong"] +++ 在本文中,我们会对可编程的中断控制器进行设置,以将硬件中断转发给CPU,而要处理这些中断,只需要像处理异常一样在中断描述符表中加入一个新条目即可,在这里我们会以获取周期计时器的中断和获取键盘输入为例进行讲解。 这个系列的 blog 在[GitHub]上开放开发,如果你有任何问题,请在这里开一个 issue 来讨论。当然你也可以在[底部][at the bottom]留言。你可以在[`post-07`][post branch]找到这篇文章的完整源码。 [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-07 ## 前言 中断是其他硬件对CPU发送通知的一种方式,所以除了使用 [_轮询_][_polling_] 进程在内核层面定时检查键盘输入以外,由键盘主动通知内核按键输入的结果也是个可行的方案。相比之下,后者可能还更加有用,此时内核只需要处理接收到的事件即可。这也可以极大降低系统的反应延时,因为内核无需等待下一次轮询周期。 [_polling_]: https://en.wikipedia.org/wiki/Polling_(computer_science) 根据常识,将所有硬件直连CPU是不可能的,所以需要一个统一的 _中断控制器_ 对所有设备中断进行代理,并由它间接通知CPU: ``` ____________ _____ Timer ------------> | | | | Keyboard ---------> | Interrupt |---------> | CPU | Other Hardware ---> | Controller | |_____| Etc. 
-------------> |____________|
```

绝大多数中断控制器都是可编程的,也就是说可以自行设定中断的优先级,比如我们可以为计时器中断设定比键盘中断更高的优先级,以保证系统时间的精确性。

和异常不同,硬件中断完全是 _异步的_ ,也就是说它们可以在任何时候发生,且时序完全独立于正在运行的代码。所以我们的内核里就突然添加了一种异步的逻辑形式,并且也引入了所有潜在的与异步逻辑相关的Bug可能性。此时Rust严格的所有权模型就开始具备优势,因为它从根本上禁止了可变的全局状态。但尽管如此,死锁很难完全避免,这个问题我们会在文章稍后的部分进行说明。

## The 8259 PIC

[Intel 8259] 是一款于1976年发布的可编程中断控制器(PIC),事实上,它已经被更先进的 [APIC] 替代很久了,但其接口依然出于兼容问题被现有系统所支持。但是 8259 PIC 的设置方式比起APIC实在简单太多了,所以我们先以前者为例解说一下基本原理,在下一篇文章中再切换为APIC。

[APIC]: https://en.wikipedia.org/wiki/Intel_APIC_Architecture

8259具有8个中断管脚和一个和CPU通信的独立管脚,而当年的典型系统一般会安装两片 8259 PIC ,一个作为主芯片,另一个则作为副芯片,就像下面这样:

[Intel 8259]: https://en.wikipedia.org/wiki/Intel_8259

```
                     ____________                          ____________
Real Time Clock --> |            |   Timer -------------> |            |
ACPI -------------> |            |   Keyboard-----------> |            |      _____
Available --------> | Secondary  |----------------------> | Primary    |     |     |
Available --------> | Interrupt  |   Serial Port 2 -----> | Interrupt  |---> | CPU |
Mouse ------------> | Controller |   Serial Port 1 -----> | Controller |     |_____|
Co-Processor -----> |            |   Parallel Port 2/3 -> |            |
Primary ATA ------> |            |   Floppy disk -------> |            |
Secondary ATA ----> |____________|   Parallel Port 1----> |____________|

```

上图展示了中断管脚的典型逻辑定义,我们可以看到,实际上可定义的管脚共有15个,例如副PIC的4号管脚被定义为了鼠标。

每个控制器都可以通过两个 [I/O 端口][I/O ports] 进行配置,一个是“指令”端口,另一个是“数据”端口。对于主控制器,端口地址是 `0x20`(指令)和 `0x21`(数据),而对于副控制器,端口地址是 `0xa0`(指令)和 `0xa1`(数据)。要查看更多关于PIC配置的细节,请参见 [article on osdev.org]。

[I/O ports]: @/edition-2/posts/04-testing/index.md#i-o-ports
[article on osdev.org]: https://wiki.osdev.org/8259_PIC

### 实现

PIC默认的配置其实是无法使用的,因为它仅仅是将0-15之间的中断向量编号发送给了CPU,然而这些编号已经用在了CPU的异常编号中了,比如8号代指 double fault 异常。要修复这个错误,我们需要对PIC中断序号进行重映射,新的序号只需要避开已被定义的CPU异常即可,CPU定义的异常数量有32个,所以通常会使用32-47这个区段。

我们需要通过往指令和数据端口写入特定数据才能对配置进行编程,幸运的是已经有了一个名叫 [`pic8259`] 的crate封装了这些东西,我们无需自己去处理这些初始化方面的细节。
如果你十分好奇其中的细节,这里是 [它的源码][pic crate source],它的内部逻辑其实十分简洁,而且具备完善的文档。

[pic crate source]: https://docs.rs/crate/pic8259/0.10.1/source/src/lib.rs

我们可以这样将 crate 作为依赖加入工程中:

[`pic8259`]:
https://docs.rs/pic8259/0.10.1/pic8259/

```toml
# in Cargo.toml

[dependencies]
pic8259 = "0.10.1"
```

这个 crate 提供的主要抽象结构就是 [`ChainedPics`],用于映射上文所说的主副PIC的映射布局,它可以这样使用:

[`ChainedPics`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html

```rust
// in src/interrupts.rs

use pic8259::ChainedPics;
use spin;

pub const PIC_1_OFFSET: u8 = 32;
pub const PIC_2_OFFSET: u8 = PIC_1_OFFSET + 8;

pub static PICS: spin::Mutex<ChainedPics> =
    spin::Mutex::new(unsafe { ChainedPics::new(PIC_1_OFFSET, PIC_2_OFFSET) });
```

我们成功将PIC的中断编号范围设定为了32–47。我们使用 `Mutex` 容器包裹了 `ChainedPics`,这样就可以通过([`lock` 函数][spin mutex lock])拿到被定义为安全的变量修改权限,我们在下文会用到这个权限。`ChainedPics::new` 处于unsafe块,因为错误的偏移量可能会导致一些未定义行为。

[spin mutex lock]: https://docs.rs/spin/0.5.2/spin/struct.Mutex.html#method.lock

那么现在,我们就可以在 `init` 函数中初始化 8259 PIC 配置了:

```rust
// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
    unsafe { interrupts::PICS.lock().initialize() }; // new
}
```

我们使用 [`initialize`] 函数进行PIC的初始化。正如 `ChainedPics::new` ,这个函数也是 unsafe 的,因为如果PIC配置有误,可能会引发未定义行为。

[`initialize`]: https://docs.rs/pic8259/0.10.1/pic8259/struct.ChainedPics.html#method.initialize

如果一切顺利,我们在运行 `cargo run` 后应当能看到诸如 "It did not crash" 此类的输出信息。

## 启用中断

不过现在什么都不会发生,因为CPU配置里面中断还是禁用状态呢,也就是说CPU现在根本不会监听来自中断控制器的信息,即任何中断都无法到达CPU。我们来启用它:

```rust
// in src/lib.rs

pub fn init() {
    gdt::init();
    interrupts::init_idt();
    unsafe { interrupts::PICS.lock().initialize() };
    x86_64::instructions::interrupts::enable();     // new
}
```

`x86_64` crate 中的 `interrupts::enable` 会执行特殊的 `sti` (“set interrupts”) 指令来启用外部中断。当我们试着执行 `cargo run` 后,double fault 异常几乎是立刻就被抛出了:

![QEMU printing `EXCEPTION: DOUBLE FAULT` because of hardware timer](qemu-hardware-timer-double-fault.png)

其原因就是硬件计时器(准确的说,是[Intel 8253])默认是被启用的,所以在启用中断之后,CPU开始接收到计时器中断信号,而我们又并未设定相对应的处理函数,所以就抛出了 double fault 异常。

[Intel 8253]: https://en.wikipedia.org/wiki/Intel_8253

## 处理计时器中断

我们已经知道 [计时器组件](#the-8259-pic) 使用了主PIC的0号管脚,根据上文中我们定义的序号偏移量32,所以计时器对应的中断序号也是32。但是不要将32硬编码进去,我们将其存储到枚举类型
`InterruptIndex` 中: ```rust // in src/interrupts.rs #[derive(Debug, Clone, Copy)] #[repr(u8)] pub enum InterruptIndex { Timer = PIC_1_OFFSET, } impl InterruptIndex { fn as_u8(self) -> u8 { self as u8 } fn as_usize(self) -> usize { usize::from(self.as_u8()) } } ``` 这是一个 [C语言风格的枚举][C-like enum],我们可以为每个枚举值指定其对应的数值,`repr(u8)` 开关使枚举值对应的数值以 `u8` 格式进行存储,这样未来我们可以在这里加入更多的中断枚举。 [C-like enum]: https://doc.rust-lang.org/reference/items/enumerations.html#custom-discriminant-values-for-fieldless-enumerations 那么开始为计时器中断添加一个处理函数: ```rust // in src/interrupts.rs use crate::print; lazy_static! { static ref IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); idt.breakpoint.set_handler_fn(breakpoint_handler); […] idt[InterruptIndex::Timer.as_usize()] .set_handler_fn(timer_interrupt_handler); // new idt }; } extern "x86-interrupt" fn timer_interrupt_handler( _stack_frame: InterruptStackFrame) { print!("."); } ``` `timer_interrupt_handler` 和错误处理函数具有相同的函数签名,这是因为CPU对异常和外部中断的处理方式是相同的(除了个别异常会传入错误码以外)。[`InterruptDescriptorTable`] 结构实现了 [`IndexMut`] trait,所以我们可以通过序号来单独修改某一个条目。 [`InterruptDescriptorTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.InterruptDescriptorTable.html [`IndexMut`]: https://doc.rust-lang.org/core/ops/trait.IndexMut.html 在我们刚刚写好的处理函数中,我们会往屏幕上输出一个点,随着计时器中断周期性触发,我们应该能看到每一个计时周期过后屏幕上都会多出一个点。然而事实却并不是如此,我们只能在屏幕上看到一个点: ![QEMU printing only a single dot for hardware timer](qemu-single-dot-printed.png) ### 结束中断 这是因为PIC还在等着我们的处理函数返回 “中断结束” (EOI) 信号。该信号会通知控制器终端已处理,系统已准备好接收下一个中断。所以如果始终不发送EOI信号,那么PIC就会认为我们还在一直处理第一个计时器中断,然后暂停了后续的中断信号发送,直到接收到EOI信号。 要发送EOI信号,我们可以再使用一下 `PICS`: ```rust // in src/interrupts.rs extern "x86-interrupt" fn timer_interrupt_handler( _stack_frame: InterruptStackFrame) { print!("."); unsafe { PICS.lock() .notify_end_of_interrupt(InterruptIndex::Timer.as_u8()); } } ``` `notify_end_of_interrupt` 会自行判断中断信号发送的源头(主PIC或者副PIC),并使用指令和数据端口将信号发送到目标控制器。当然,如果是要发送到副PIC,那么结果上必然等同于同时发送到两个PIC,因为副PIC的输入管脚连在主PIC上面。 
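
为了更直观地理解上述判断逻辑,下面给出一个简化的示意模型,它根据中断向量号判断需要向哪些控制器发送 EOI。注意:`EoiTarget` 和 `eoi_target` 都是为了演示而假设的名称,并不是 `pic8259` crate 的实际 API;这里只假设每个 PIC 负责从其偏移量开始的 8 个连续向量号。

```rust
const PIC_1_OFFSET: u8 = 32;
const PIC_2_OFFSET: u8 = PIC_1_OFFSET + 8;

/// Which controllers must receive an EOI (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EoiTarget {
    /// The vector does not belong to either PIC.
    NotFromPic,
    /// The interrupt came from the primary PIC.
    PrimaryOnly,
    /// The interrupt came from the secondary PIC, which is chained
    /// to an input line of the primary, so both must be notified.
    Both,
}

/// Decides who needs an EOI for the given interrupt vector.
/// Assumption: each PIC handles 8 consecutive vectors starting at its offset.
fn eoi_target(vector: u8) -> EoiTarget {
    let primary = PIC_1_OFFSET..PIC_1_OFFSET + 8;
    let secondary = PIC_2_OFFSET..PIC_2_OFFSET + 8;

    if secondary.contains(&vector) {
        EoiTarget::Both
    } else if primary.contains(&vector) {
        EoiTarget::PrimaryOnly
    } else {
        EoiTarget::NotFromPic
    }
}
```

在这个模型里,计时器中断(向量号 32)只需要通知主PIC,而来自副PIC的中断(向量号 40–47)则需要同时通知两个控制器。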
请注意,这里的中断编码一定不可以写错,不然可能会导致某个中断信号迟迟得不到回应导致系统整体挂起。这也是该函数被标记为不安全的原因。

现在我们再次运行 `cargo run`,就可以看到屏幕上开始正常输出点号了:

![QEMU printing consecutive dots showing the hardware timer](qemu-hardware-timer-dots.gif)

### 配置计时器

我们所使用的硬件计时器叫做 _可编程周期计时器_ (PIT),就如同字面上的意思一样,其两次中断之间的间隔是可配置的。当然,我们不会在此展开说,因为我们很快就会使用 [APIC计时器][APIC timer] 来代替它,但是你可以在OSDev wiki中找到一些关于[配置PIT计时器][configuring the PIT]的拓展文章。

[APIC timer]: https://wiki.osdev.org/APIC_timer
[configuring the PIT]: https://wiki.osdev.org/Programmable_Interval_Timer

## 死锁

现在,我们的内核里就出现了一种全新的异步逻辑:计时器中断是异步的,所以它可能会在任何时候中断 `_start` 函数的运行。幸运的是,Rust的所有权体系在编译期就为我们避免了许多与并发相关的bug,但死锁是一个值得注意的例外:当一个线程试图获取一个永远不会被释放的锁时,这个线程就会被永久性挂起。

我们可以在内核里主动引发一次死锁看看,请回忆一下,我们的 `println` 宏调用了 `vga_buffer::_print` 函数,而这个函数又使用了 [`WRITER`][vga spinlock] 变量,该变量被定义为带同步锁的变量:

[vga spinlock]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks

```rust
// in src/vga_buffer.rs

[…]

#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    use core::fmt::Write;
    WRITER.lock().write_fmt(args).unwrap();
}
```

获取到 `WRITER` 变量的锁后,调用其内部的 `write_fmt` 函数,然后在结尾隐式解锁该变量。但是假如在函数执行一半的时候,中断处理函数触发,同样试图打印日志的话:

| Timestep | _start                 | interrupt_handler                               |
| -------- | ---------------------- | ----------------------------------------------- |
| 0        | calls `println!`       | &nbsp;                                          |
| 1        | `print` locks `WRITER` | &nbsp;                                          |
| 2        |                        | **interrupt occurs**, handler begins to run     |
| 3        |                        | calls `println!`                                |
| 4        |                        | `print` tries to lock `WRITER` (already locked) |
| 5        |                        | `print` tries to lock `WRITER` (already locked) |
| …        |                        | …                                               |
| _never_  | _unlock `WRITER`_      |                                                 |

`WRITER` 被锁定,所以中断处理函数就会一直等待到它被解锁为止,然而后续永远不会发生了,因为只有当中断处理函数返回,`_start` 函数才会继续运行,`WRITER` 才可能被解锁,所以整个系统就这么挂起了。

### 引发死锁

基于这个原理,我们可以通过在 `_start` 函数中构建一个输出循环来很轻易地触发死锁:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
{ […] loop { use blog_os::print; print!("-"); // new } } ``` 在QEMU中运行后,输出是这样的: ![QEMU output with many rows of hyphens and no dots](./qemu-deadlock.png) 我们可以看到,这段程序只输出了有限的中划线,在第一次计时器中断触发后就不再动弹了,这是因为计时器中断对应的处理函数触发了输出宏中潜在的死锁,这也是为什么我们没有在上面的输出中看到点号的原因。 由于计时器中断是完全异步的,所以每次运行能够输出的中划线数量都是不确定的,这种特性也导致和并发相关的bug非常难以调试。 ### 修复死锁 要避免死锁,我们可以在 `Mutex` 被锁定时禁用中断: ```rust // in src/vga_buffer.rs /// Prints the given formatted string to the VGA text buffer /// through the global `WRITER` instance. #[doc(hidden)] pub fn _print(args: fmt::Arguments) { use core::fmt::Write; use x86_64::instructions::interrupts; // new interrupts::without_interrupts(|| { // new WRITER.lock().write_fmt(args).unwrap(); }); } ``` [`without_interrupts`] 函数可以使一个 [闭包][closure] 代码块在无中断环境下执行,由此我们可以让 `Mutex` 变量在锁定期间的执行逻辑不会被中断信号打断。再次运行我们的内核,此时程序就不会被挂起了。(然而我们依然不会看到任何点号,因为输出速度实在是太快了,试着降低一下输出速度就可以了,比如在循环里插入一句 `for _ in 0..10000 {}`。) [`without_interrupts`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.without_interrupts.html [closure]: https://doc.rust-lang.org/book/ch13-01-closures.html 我们也可以在串行输出函数里也加入同样的逻辑来避免死锁: ```rust // in src/serial.rs #[doc(hidden)] pub fn _print(args: ::core::fmt::Arguments) { use core::fmt::Write; use x86_64::instructions::interrupts; // new interrupts::without_interrupts(|| { // new SERIAL1 .lock() .write_fmt(args) .expect("Printing to serial failed"); }); } ``` 但请注意,禁用中断不应是被广泛使用的手段,它可能会造成中断的处理延迟增加,比如操作系统是依靠中断信号进行计时的。因此,中断仅应在极短的时间内被禁用。 ## 修复竞态条件 如果你运行 `cargo test` 命令,则会发现`test_println_output` 测试执行失败: ``` > cargo test --lib […] Running 4 tests test_breakpoint_exception...[ok] test_println... [ok] test_println_many... [ok] test_println_output... 
[failed]

Error: panicked at 'assertion failed: `(left == right)`
  left: `'.'`,
 right: `'S'`', src/vga_buffer.rs:205:9
```

其原因就是测试函数和计时器中断处理函数出现了 _竞态条件_,测试函数是这样的:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    let s = "Some test string that fits on a single line";
    println!("{}", s);
    for (i, c) in s.chars().enumerate() {
        let screen_char = WRITER.lock().buffer.chars[BUFFER_HEIGHT - 2][i].read();
        assert_eq!(char::from(screen_char.ascii_character), c);
    }
}
```

该测试将一串字符打印到VGA缓冲区,并通过一个循环检测 `buffer_chars` 数组的内容。竞态条件出现的原因就是在 `println` 和检测逻辑之间触发了计时器中断,其处理函数同样调用了输出语句。不过这并非危险的 _数据竞争_,该种竞争可以被Rust语言在编译期完全避免。如果你对此感兴趣,可以查阅一下 [_Rustonomicon_][nomicon-races]。

[nomicon-races]: https://doc.rust-lang.org/nomicon/races.html

要修复这个问题,我们需要让 `WRITER` 加锁的范围扩大到整个测试函数,使计时器中断处理函数无法输出 `.`,就像这样:

```rust
// in src/vga_buffer.rs

#[test_case]
fn test_println_output() {
    use core::fmt::Write;
    use x86_64::instructions::interrupts;

    let s = "Some test string that fits on a single line";
    interrupts::without_interrupts(|| {
        let mut writer = WRITER.lock();
        writeln!(writer, "\n{}", s).expect("writeln failed");
        for (i, c) in s.chars().enumerate() {
            let screen_char = writer.buffer.chars[BUFFER_HEIGHT - 2][i].read();
            assert_eq!(char::from(screen_char.ascii_character), c);
        }
    });
}
```

我们进行了如下修改:

- 我们使用 `lock()` 函数显式加锁,然后将 `println` 改为 [`writeln`] 宏,以此绕开输出必须加锁的限制。
- 为了避免死锁,我们同时在测试函数执行期间禁用中断,否则中断处理函数可能会意外被触发。
- 为了防止在测试执行前计时器中断被触发所造成的干扰,我们先输出一句 `\n`,即可避免行首出现多余的 `.` 造成的干扰。

[`writeln`]: https://doc.rust-lang.org/core/macro.writeln.html

经过以上修改,`cargo test` 就可以正确运行了。

好在这是一种十分无害的竞态条件,仅仅会导致测试失败,但如你所想,其它形式的竞态条件可能会更加难以调试。幸运的是,更加恶性的数据竞争已经被Rust从根本上避免了,大部分数据竞争都会造成无法预知的行为,比如系统崩溃,或者悄无声息的内存破坏。

## `hlt` 指令

目前我们在 `_start` 和 `panic` 函数的末尾都使用了一个空白的循环,这的确能让整体逻辑正常运行,但也会让CPU全速运转 —— 尽管此时并没有什么需要计算的工作。如果你在执行内核时打开任务管理器,便会发现QEMU的CPU占用率全程高达100%。

但是,我们可以让CPU在下一个中断触发之前休息一下,也就是进入休眠状态,从而大幅降低能耗。[`hlt` 指令][`hlt` instruction] 可以让我们做到这一点,那就来用它写一个节能的无限循环:

[`hlt` instruction]: https://en.wikipedia.org/wiki/HLT_(x86_instruction)
```rust // in src/lib.rs pub fn hlt_loop() -> ! { loop { x86_64::instructions::hlt(); } } ``` `instructions::hlt` 只是对应汇编指令的 [薄包装][thin wrapper],并且它是内存安全的,没有破坏内存的风险。 [thin wrapper]: https://github.com/rust-osdev/x86_64/blob/5e8e218381c5205f5777cb50da3ecac5d7e3b1ab/src/instructions/mod.rs#L16-L22 现在我们来试着在 `_start` 和 `panic` 中使用 `hlt_loop`: ```rust // in src/main.rs #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { […] println!("It did not crash!"); blog_os::hlt_loop(); // new } #[cfg(not(test))] #[panic_handler] fn panic(info: &PanicInfo) -> ! { println!("{}", info); blog_os::hlt_loop(); // new } ``` 接下来再更新一下 `lib.rs` : ```rust // in src/lib.rs /// Entry point for `cargo test` #[cfg(test)] #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { init(); test_main(); hlt_loop(); // new } pub fn test_panic_handler(info: &PanicInfo) -> ! { serial_println!("[failed]\n"); serial_println!("Error: {}\n", info); exit_qemu(QemuExitCode::Failed); hlt_loop(); // new } ``` 再次在QEMU中执行我们的内核,CPU使用率已经降低到了比较低的水平了。 ## 键盘输入 现在,我们已经知道了如何接收外部设备的中断信号,我们可以进一步对键盘添加支持,由此我们可以与内核进行交互。 [PS/2]: https://en.wikipedia.org/wiki/PS/2_port 就如同硬件计时器一样,键盘控制器也是默认启用的,所以当你敲击键盘上某个按键时,键盘控制器就会经由PIC向CPU发送中断信号。然而CPU此时是无法在IDT找到相关的中断处理函数的,所以 double fault 异常会被抛出。 所以我们需要为键盘中断添加一个处理函数,它十分类似于计时器中断处理的实现,只不过需要对中断编号做出一点小小的修改: ```rust // in src/interrupts.rs #[derive(Debug, Clone, Copy)] #[repr(u8)] pub enum InterruptIndex { Timer = PIC_1_OFFSET, Keyboard, // new } lazy_static! 
{ static ref IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); idt.breakpoint.set_handler_fn(breakpoint_handler); […] // new idt[InterruptIndex::Keyboard.as_usize()] .set_handler_fn(keyboard_interrupt_handler); idt }; } extern "x86-interrupt" fn keyboard_interrupt_handler( _stack_frame: InterruptStackFrame) { print!("k"); unsafe { PICS.lock() .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); } } ``` [上文](#the-8259-pic) 提到,键盘使用的是主PIC的1号管脚,在CPU的中断编号为33(1 + 偏移量32)。我们需要在 `InterruptIndex` 枚举类型里添加一个 `Keyboard`,但是无需显式指定对应值,因为在默认情况下,它的对应值是上一个枚举对应值加一也就是33。在处理函数中,我们先输出一个 `k`,并发送结束信号来结束中断。 现在当我们按下任意一个按键,就会在屏幕上输出一个 `k`,然而这只会生效一次,因为键盘控制器在我们 _获取扫描码_ 之前,是不会发送下一个中断的。 ### 读取扫描码 要找到哪个按键被按下,我们还需要询问一下键盘控制器,我们可以从 PS/2 控制器(即地址为 `0x60` 的 [I/O端口][I/O port])的数据端口获取到该信息: [I/O port]: @/edition-2/posts/04-testing/index.md#i-o-ports ```rust // in src/interrupts.rs extern "x86-interrupt" fn keyboard_interrupt_handler( _stack_frame: InterruptStackFrame) { use x86_64::instructions::port::Port; let mut port = Port::new(0x60); let scancode: u8 = unsafe { port.read() }; print!("{}", scancode); unsafe { PICS.lock() .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); } } ``` 我们使用了 `x86_64` crate 中的 [`Port`] 来从键盘数据端口中读取名为 [_扫描码_] 的随着按键按下/释放而不断变化的数字。我们暂且不处理它,只是在屏幕上打印出来: [`Port`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/port/struct.Port.html [_scancode_]: https://en.wikipedia.org/wiki/Scancode ![QEMU printing scancodes to the screen when keys are pressed](qemu-printing-scancodes.gif) 在上图中,演示的正是缓慢输入 `123` 的结果。我们可以看到,相邻的按键具备相邻的扫描码,而按下按键和松开按键也会出现不同的扫描码,那么问题来了,我们该如何对这些扫描码进行译码? 
### 扫描码转义 关于按键与键位码之间的映射关系,目前存在三种不同的标准(所谓的 _扫描码映射集_)。三种标准都可以追溯到早期的IBM电脑键盘:[IBM XT]、 [IBM 3270 PC]和[IBM AT]。好在之后的电脑并未另起炉灶定义新的扫描码映射集,但也对现有类型进行模拟并加以扩展,如今的绝大多数键盘都可以模拟成这三种类型之一。 [IBM XT]: https://en.wikipedia.org/wiki/IBM_Personal_Computer_XT [IBM 3270 PC]: https://en.wikipedia.org/wiki/IBM_3270_PC [IBM AT]: https://en.wikipedia.org/wiki/IBM_Personal_Computer/AT 默认情况下,PS/2 键盘会模拟Set-1(XT),在该布局下,扫描码的低7位表示按键,而其他的比特位则定义了是按下(0)还是释放(1)。不过这些按键并非都存在于原本的 [IBM XT] 键盘上,比如小键盘的回车键,此时就会连续生成两个扫描码:`0xe0` 以及一个自定义的代表该键位的数字。[OSDev Wiki][scancode set 1] 可以查阅到Set-1下的扫描码对照表。 [scancode set 1]: https://wiki.osdev.org/Keyboard#Scan_Code_Set_1 要将扫描码译码成按键,我们可以用一个match匹配语句: ```rust // in src/interrupts.rs extern "x86-interrupt" fn keyboard_interrupt_handler( _stack_frame: InterruptStackFrame) { use x86_64::instructions::port::Port; let mut port = Port::new(0x60); let scancode: u8 = unsafe { port.read() }; // new let key = match scancode { 0x02 => Some('1'), 0x03 => Some('2'), 0x04 => Some('3'), 0x05 => Some('4'), 0x06 => Some('5'), 0x07 => Some('6'), 0x08 => Some('7'), 0x09 => Some('8'), 0x0a => Some('9'), 0x0b => Some('0'), _ => None, }; if let Some(key) = key { print!("{}", key); } unsafe { PICS.lock() .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); } } ``` 以上代码可以对数字按键0-9进行转义,并忽略其他键位。具体到程序逻辑中,就是使用 [match] 匹配映射数字0-9,对于其他扫描码则返回 `None`,然后使用 [`if let`] 语句对 `key` 进行解构取值,在这个语法中,代码块中的 `key` 会 [遮蔽][shadow] 掉代码块外的同名 `Option` 型变量。 [match]: https://doc.rust-lang.org/book/ch06-02-match.html [`if let`]: https://doc.rust-lang.org/book/ch19-01-all-the-places-for-patterns.html#conditional-if-let-expressions [shadow]: https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#shadowing 现在我们就可以向控制台写入数字了: ![QEMU printing numbers to the screen](qemu-printing-numbers.gif) 其他扫描码也可以通过同样的手段进行译码,不过真的很麻烦,好在 [`pc-keyboard`] crate 已经帮助我们实现了Set-1和Set-2的译码工作,所以无需自己去实现。所以我们只需要将下述内容添加到 `Cargo.toml`,并在 `lib.rs` 里进行引用: [`pc-keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/ ```toml # in Cargo.toml 

[dependencies]
pc-keyboard = "0.7.0"
```

现在我们可以使用新的crate对 `keyboard_interrupt_handler` 进行改写:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame)
{
    use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1};
    use spin::Mutex;
    use x86_64::instructions::port::Port;

    lazy_static! {
        static ref KEYBOARD: Mutex<Keyboard<layouts::Us104Key, ScancodeSet1>> =
            Mutex::new(Keyboard::new(ScancodeSet1::new(),
                layouts::Us104Key, HandleControl::Ignore)
            );
    }

    let mut keyboard = KEYBOARD.lock();
    let mut port = Port::new(0x60);

    let scancode: u8 = unsafe { port.read() };
    if let Ok(Some(key_event)) = keyboard.add_byte(scancode) {
        if let Some(key) = keyboard.process_keyevent(key_event) {
            match key {
                DecodedKey::Unicode(character) => print!("{}", character),
                DecodedKey::RawKey(key) => print!("{:?}", key),
            }
        }
    }

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

首先我们使用 `lazy_static` 宏创建一个受到Mutex同步锁保护的 [`Keyboard`] 对象,初始化参数为美式键盘布局以及Set-1。至于 [`HandleControl`],它可以设定为将 `ctrl+[a-z]` 映射为Unicode字符 `U+0001` 至 `U+001A`,但我们不想这样,所以使用了 `Ignore` 选项让 `ctrl` 仅仅表现为一个正常键位。

[`HandleControl`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/enum.HandleControl.html

对于每一个中断,我们都会为 KEYBOARD 加锁,从键盘控制器获取扫描码并将其传入 [`add_byte`] 函数,该函数会将扫描码转化为 `Option<KeyEvent>`。[`KeyEvent`] 包括了触发本次中断的按键信息,以及该动作是按下还是释放。

[`Keyboard`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html
[`add_byte`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.add_byte
[`KeyEvent`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.KeyEvent.html

要处理 `KeyEvent`,我们还需要将其传入 [`process_keyevent`] 函数,该函数会在可能的情况下将按键事件转换为字符。典型例子就是,要判断 `A` 键按下后输入的是小写 `a` 还是大写 `A`,这要取决于shift键是否同时被按下。

[`process_keyevent`]: https://docs.rs/pc-keyboard/0.7.0/pc_keyboard/struct.Keyboard.html#method.process_keyevent

进行这些修改之后,我们就可以正常输入英文了:

![Typing "Hello World" in QEMU](qemu-typing.gif)

### 配置键盘

PS/2 键盘可以配置的地方其实还有很多,比如设定它使用何种扫描码映射集,然而这篇文章已经够长了,就不在此展开说明,如果有兴趣,可以在OSDev
wiki查看[更详细的资料][configuration commands]。

[configuration commands]: https://wiki.osdev.org/PS/2_Keyboard#Commands

## 小结

本文描述了如何启用并处理外部中断。我们学习了关于8259 PIC的主副布局、重映射中断编号以及结束中断信号的基础知识,实现了简单的硬件计时器和键盘的中断处理器,以及如何使用 `hlt` 指令让CPU休眠至下次接收到中断信号。

现在我们已经可以和内核进行交互,满足了创建简易控制台或简易游戏的基础条件。

## 下文预告

计时器中断对操作系统而言至关重要,它可以使内核定期重新获得控制权,由此内核可以对线程进行调度,创造出多个线程并行执行的错觉。

然而在我们创建进程或线程之前,我们还需要解决内存分配问题。下一篇文章中,我们就会对内存管理进行阐述,以提供后续功能会使用到的基础设施。

================================================
FILE: blog/content/edition-2/posts/08-paging-introduction/index.es.md
================================================
+++
title = "Introducción a la Paginación"
weight = 8
path = "es/paging-introduction"
date = 2019-01-14

[extra]
chapter = "Gestión de Memoria"
# GitHub usernames of the people that translated this post
translators = ["dobleuber"]
+++

Esta publicación introduce la _paginación_ (paging), un esquema de gestión de memoria muy común que también utilizaremos para nuestro sistema operativo. Explica por qué se necesita el aislamiento de memoria, cómo funciona la _segmentación_ (segmentation), qué es la _memoria virtual_ (virtual memory) y cómo la paginación soluciona los problemas de fragmentación de memoria. También explora el diseño de las tablas de páginas multinivel en la arquitectura x86_64.

Este blog se desarrolla abiertamente en [GitHub]. Si tienes algún problema o pregunta, por favor abre un issue allí. También puedes dejar comentarios [al final]. El código fuente completo de esta publicación se puede encontrar en la rama [`post-08`][post branch].

[GitHub]: https://github.com/phil-opp/blog_os
[al final]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-08

## Protección de Memoria

Una de las principales tareas de un sistema operativo es aislar programas entre sí. Tu navegador web no debería poder interferir con tu editor de texto, por ejemplo.
Para lograr este objetivo, los sistemas operativos utilizan funcionalidades de hardware para asegurarse de que las áreas de memoria de un proceso no sean accesibles por otros procesos. Hay diferentes enfoques dependiendo del hardware y la implementación del sistema operativo. Como ejemplo, algunos procesadores ARM Cortex-M (usados en sistemas embebidos) tienen una _Unidad de Protección de Memoria_ (Memory Protection Unit, MPU), que permite definir un pequeño número (por ejemplo, 8) de regiones de memoria con diferentes permisos de acceso (por ejemplo, sin acceso, solo lectura, lectura-escritura). En cada acceso a la memoria, la MPU asegura que la dirección esté en una región con permisos de acceso correctos y lanza una excepción en caso contrario. Al cambiar las regiones y los permisos de acceso en cada cambio de proceso, el sistema operativo puede asegurarse de que cada proceso solo acceda a su propia memoria y, por lo tanto, aísla los procesos entre sí. [_Unidad de Protección de Memoria_]: https://developer.arm.com/docs/ddi0337/e/memory-protection-unit/about-the-mpu En x86, el hardware admite dos enfoques diferentes para la protección de memoria: [segmentación] y [paginación]. [segmentación]: https://en.wikipedia.org/wiki/X86_memory_segmentation [paginación]: https://en.wikipedia.org/wiki/Virtual_memory#Paged_virtual_memory ## Segmentación La segmentación fue introducida en 1978, originalmente para aumentar la cantidad de memoria direccionable. La situación en ese entonces era que las CPU solo usaban direcciones de 16 bits, lo que limitaba la cantidad de memoria direccionable a 64 KiB. Para hacer accesibles más de estos 64 KiB, se introdujeron registros de segmento adicionales, cada uno conteniendo una dirección de desplazamiento. La CPU sumaba automáticamente este desplazamiento en cada acceso a la memoria, de modo que hasta 1 MiB de memoria era accesible. 
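
Como ilustración del cálculo anterior, el siguiente esbozo muestra cómo el 8086 combinaba un registro de segmento con un desplazamiento de 16 bits en modo real: el valor del segmento se multiplica por 16 antes de sumarse, de ahí el límite de aproximadamente 1 MiB. El nombre `direccion_fisica` es hipotético y solo sirve para este ejemplo; el enmascarado a 20 bits es una suposición que modela las 20 líneas de dirección del 8086 original.

```rust
/// Calcula la dirección física de 20 bits en modo real.
/// (Función hipotética, solo con fines ilustrativos.)
fn direccion_fisica(segmento: u16, desplazamiento: u16) -> u32 {
    // El registro de segmento se desplaza 4 bits a la izquierda (x16).
    let base = u32::from(segmento) << 4;
    // Se suma el desplazamiento y se conservan solo 20 bits,
    // como ocurría con las 20 líneas de dirección del 8086.
    (base + u32::from(desplazamiento)) & 0xF_FFFF
}
```

Bajo estas suposiciones, el par `0xB800:0x0000` corresponde a la dirección física `0xB8000`, y la dirección más alta alcanzable sin desbordamiento es `0xFFFFF`, es decir, el último byte del primer MiB.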
El registro del segmento es elegido automáticamente por la CPU dependiendo del tipo de acceso a la memoria: para obtener instrucciones, se utiliza el segmento de código `CS`, y para operaciones de pila (push/pop), se utiliza el segmento de pila `SS`. Otras instrucciones utilizan el segmento de datos `DS` o el segmento adicional `ES`. Más tarde, se añadieron dos registros de segmento adicionales, `FS` y `GS`, que pueden ser utilizados libremente. En la primera versión de la segmentación, los registros de segmento contenían directamente el desplazamiento y no se realizaba control de acceso. Esto se cambió más tarde con la introducción del _modo protegido_ (protected mode). Cuando la CPU funciona en este modo, los descriptores de segmento contienen un índice a una _tabla de descriptores_ local o global, que contiene – además de una dirección de desplazamiento – el tamaño del segmento y los permisos de acceso. Al cargar tablas de descriptores globales/locales separadas para cada proceso, que confinan los accesos de memoria a las áreas de memoria del propio proceso, el sistema operativo puede aislar los procesos entre sí. [_modo protegido_]: https://en.wikipedia.org/wiki/X86_memory_segmentation#Protected_mode [_tabla de descriptores_]: https://en.wikipedia.org/wiki/Global_Descriptor_Table Al modificar las direcciones de memoria antes del acceso real, la segmentación ya utilizaba una técnica que ahora se usa casi en todas partes: _memoria virtual_ (virtual memory). ### Memoria Virtual La idea detrás de la memoria virtual es abstraer las direcciones de memoria del dispositivo de almacenamiento físico subyacente. En lugar de acceder directamente al dispositivo de almacenamiento, se realiza primero un paso de traducción. Para la segmentación, el paso de traducción consiste en agregar la dirección de desplazamiento del segmento activo. 
Imagina un programa que accede a la dirección de memoria `0x1234000` en un segmento con un desplazamiento de `0x1111000`: La dirección que realmente se accede es `0x2345000`. Para diferenciar los dos tipos de direcciones, se llaman _virtuales_ a las direcciones antes de la traducción, y _físicas_ a las direcciones después de la traducción. Una diferencia importante entre estos dos tipos de direcciones es que las direcciones físicas son únicas y siempre se refieren a la misma ubicación de memoria distinta. Las direcciones virtuales, en cambio, dependen de la función de traducción. Es completamente posible que dos direcciones virtuales diferentes se refieran a la misma dirección física. Además, direcciones virtuales idénticas pueden referirse a diferentes direcciones físicas cuando utilizan diferentes funciones de traducción. Un ejemplo donde esta propiedad es útil es ejecutar el mismo programa en paralelo dos veces: ![Dos espacios de direcciones virtuales con direcciones 0–150, uno traducido a 100–250, el otro a 300–450](segmentation-same-program-twice.svg) Aquí el mismo programa se ejecuta dos veces, pero con diferentes funciones de traducción. La primera instancia tiene un desplazamiento de segmento de 100, de manera que sus direcciones virtuales 0–150 se traducen a las direcciones físicas 100–250. La segunda instancia tiene un desplazamiento de 300, que traduce sus direcciones virtuales 0–150 a direcciones físicas 300–450. Esto permite que ambos programas ejecuten el mismo código y utilicen las mismas direcciones virtuales sin interferir entre sí. Otra ventaja es que los programas ahora se pueden colocar en ubicaciones de memoria física arbitrarias, incluso si utilizan direcciones virtuales completamente diferentes. Por lo tanto, el sistema operativo puede utilizar la cantidad total de memoria disponible sin necesidad de recompilar programas. ### Fragmentación La diferenciación entre direcciones virtuales y físicas hace que la segmentación sea realmente poderosa. 
Sin embargo, tiene el problema de la fragmentación. Como ejemplo, imagina que queremos ejecutar una tercera copia del programa que vimos anteriormente:

![Tres espacios de direcciones virtuales, pero no hay suficiente espacio continuo para el tercero](segmentation-fragmentation.svg)

No hay forma de mapear la tercera instancia del programa a la memoria virtual sin superposición, a pesar de que hay más que suficiente memoria libre disponible. El problema es que necesitamos memoria _continua_ y no podemos utilizar los pequeños fragmentos libres.

Una forma de combatir esta fragmentación es pausar la ejecución, mover las partes utilizadas de la memoria más cerca entre sí, actualizar la traducción y luego reanudar la ejecución:

![Tres espacios de direcciones virtuales después de la desfragmentación](segmentation-fragmentation-compacted.svg)

Ahora hay suficiente espacio continuo para iniciar la tercera instancia de nuestro programa.

La desventaja de este proceso de desfragmentación es que necesita copiar grandes cantidades de memoria, lo que disminuye el rendimiento. También necesita hacerse regularmente, antes de que la memoria se fragmente demasiado. Esto hace que el rendimiento sea impredecible, ya que los programas se pausan en momentos aleatorios y podrían dejar de responder.

El problema de la fragmentación es una de las razones por las que la segmentación ya no se utiliza en la mayoría de los sistemas. De hecho, la segmentación ni siquiera está soportada en el modo de 64 bits en x86. En su lugar, se utiliza _paginación_ (paging), que evita por completo el problema de la fragmentación.

## Paginación

La idea es dividir tanto el espacio de memoria virtual como el físico en bloques pequeños de tamaño fijo. Los bloques del espacio de memoria virtual se llaman _páginas_ (pages), y los bloques del espacio de direcciones físicas se llaman _marcos_ (frames).
Cada página puede ser mapeada individualmente a un marco, lo que hace posible dividir regiones de memoria más grandes a través de marcos físicos no consecutivos. La ventaja de esto se ve claramente si recapitulamos el ejemplo del espacio de memoria fragmentado, pero usamos paginación en lugar de segmentación esta vez: ![Con paginación, la tercera instancia del programa puede dividirse entre muchas áreas físicas más pequeñas.](paging-fragmentation.svg) En este ejemplo, tenemos un tamaño de página de 50 bytes, lo que significa que cada una de nuestras regiones de memoria se divide en tres páginas. Cada página se mapea a un marco individualmente, por lo que una región de memoria virtual continua puede ser mapeada a marcos físicos no continuos. Esto nos permite iniciar la tercera instancia del programa sin realizar ninguna desfragmentación antes. ### Fragmentación Oculta En comparación con la segmentación, la paginación utiliza muchas pequeñas regiones de memoria de tamaño fijo en lugar de unas pocas grandes regiones de tamaño variable. Dado que cada marco tiene el mismo tamaño, no hay marcos que sean demasiado pequeños para ser utilizados, por lo que no ocurre fragmentación. O _parece_ que no ocurre fragmentación. Aún existe algún tipo oculto de fragmentación, la llamada _fragmentación interna_ (internal fragmentation). La fragmentación interna ocurre porque no cada región de memoria es un múltiplo exacto del tamaño de la página. Imagina un programa de tamaño 101 en el ejemplo anterior: aún necesitaría tres páginas de tamaño 50, por lo que ocuparía 49 bytes más de lo necesario. Para diferenciar los dos tipos de fragmentación, el tipo de fragmentación que ocurre al usar segmentación se llama _fragmentación externa_ (external fragmentation). La fragmentación interna es desafortunada pero a menudo es mejor que la fragmentación externa que ocurre con la segmentación. 
Aún desperdicia memoria, pero no requiere desfragmentación y hace que la cantidad de fragmentación sea predecible (en promedio, media página por región de memoria). ### Tablas de Páginas Vimos que cada una de las potencialmente millones de páginas se mapea individualmente a un marco. Esta información de mapeo necesita ser almacenada en algún lugar. La segmentación utiliza un registro de selector de segmento individual para cada región de memoria activa, lo cual no es posible para la paginación, ya que hay muchas más páginas que registros. En su lugar, la paginación utiliza una estructura tabular llamada _tabla de páginas_ (page table) para almacenar la información de mapeo. Para nuestro ejemplo anterior, las tablas de páginas se verían así: ![Tres tablas de páginas, una para cada instancia del programa. Para la instancia 1, el mapeo es 0->100, 50->150, 100->200. Para la instancia 2, es 0->300, 50->350, 100->400. Para la instancia 3, es 0->250, 50->450, 100->500.](paging-page-tables.svg) Vemos que cada instancia del programa tiene su propia tabla de páginas. Un puntero a la tabla actualmente activa se almacena en un registro especial de la CPU. En `x86`, este registro se llama `CR3`. Es trabajo del sistema operativo cargar este registro con el puntero a la tabla de páginas correcta antes de ejecutar cada instancia del programa. En cada acceso a la memoria, la CPU lee el puntero de la tabla del registro y busca el marco mapeado para la página accedida en la tabla. Esto se realiza completamente en hardware y es completamente invisible para el programa en ejecución. Para agilizar el proceso de traducción, muchas arquitecturas de CPU tienen una caché especial que recuerda los resultados de las últimas traducciones. Dependiendo de la arquitectura, las entradas de las tablas de páginas también pueden almacenar atributos como permisos de acceso en un campo de banderas. En el ejemplo anterior, la bandera "r/w" hace que la página sea tanto legible como escribible. 
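
La búsqueda en una tabla de un solo nivel puede esbozarse como sigue. Es solo un modelo del ejemplo anterior (páginas de 50 bytes y la tabla de la instancia 3), no la forma en que lo hace el hardware; los nombres `SimplePageTable`, `translate` e `instance_3_table` son hipotéticos.

```rust
/// Tamaño de página del ejemplo (50 bytes), no el de un sistema real.
pub const PAGE_SIZE: u64 = 50;

/// Tabla de páginas de un solo nivel: la posición `i` guarda la dirección
/// inicial del marco de la página `i`, o `None` si la página no está mapeada.
pub struct SimplePageTable {
    frames: Vec<Option<u64>>,
}

impl SimplePageTable {
    pub fn new(frames: Vec<Option<u64>>) -> Self {
        Self { frames }
    }

    /// Traduce una dirección virtual a una física, si la página está mapeada.
    pub fn translate(&self, virt: u64) -> Option<u64> {
        let page_index = (virt / PAGE_SIZE) as usize;
        let offset = virt % PAGE_SIZE;
        // `get` devuelve `None` fuera de la tabla; el segundo `?` cubre
        // las entradas vacías.
        let frame_start = (*self.frames.get(page_index)?)?;
        Some(frame_start + offset)
    }
}

/// Tabla de la instancia 3 del ejemplo: 0->250, 50->450, 100->500.
pub fn instance_3_table() -> SimplePageTable {
    SimplePageTable::new(vec![Some(250), Some(450), Some(500)])
}
```

Con esta tabla, la dirección virtual 75 cae en la segunda página (desplazamiento 25) y se traduce a la dirección física 475, mientras que la dirección 150 queda fuera de la tabla y no tiene traducción.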
### Tablas de Páginas multinivel Las simples tablas de páginas que acabamos de ver tienen un problema en espacios de direcciones más grandes: desperdician memoria. Por ejemplo, imagina un programa que utiliza las cuatro páginas virtuales `0`, `1_000_000`, `1_000_050` y `1_000_100` (usamos `_` como separador de miles): ![Página 0 mapeada al marco 0 y páginas `1_000_000`–`1_000_150` mapeadas a marcos 100–250](single-level-page-table.svg) Solo necesita 4 marcos físicos, pero la tabla de páginas tiene más de un millón de entradas. No podemos omitir las entradas vacías porque entonces la CPU ya no podría saltar directamente a la entrada correcta en el proceso de traducción (por ejemplo, ya no se garantiza que la cuarta página use la cuarta entrada). Para reducir la memoria desperdiciada, podemos usar una **tabla de páginas de dos niveles**. La idea es que utilizamos diferentes tablas de páginas para diferentes regiones de direcciones. Una tabla adicional llamada tabla de páginas _nivel 2_ (level 2) contiene el mapeo entre las regiones de direcciones y las tablas de páginas (nivel 1). Esto se explica mejor con un ejemplo. Supongamos que cada tabla de páginas de nivel 1 es responsable de una región de tamaño `10_000`. Entonces, las siguientes tablas existirían para el mapeo anterior: ![Página 0 apunta a la entrada 0 de la tabla de páginas de nivel 2, que apunta a la tabla de páginas de nivel 1 T1. La primera entrada de T1 apunta al marco 0; las otras entradas están vacías. Las páginas `1_000_000`–`1_000_150` apuntan a la entrada 100 de la tabla de páginas de nivel 2, que apunta a una tabla de páginas de nivel 1 diferente T2. Las tres primeras entradas de T2 apuntan a marcos 100–250; las otras entradas están vacías.](multilevel-page-table.svg) La página 0 cae en la primera región de `10_000` bytes, por lo que utiliza la primera entrada de la tabla de páginas de nivel 2. 
Esta entrada apunta a la tabla de páginas de nivel 1 T1, que especifica que la página `0` apunta al marco `0`.

Las páginas `1_000_000`, `1_000_050` y `1_000_100` caen todas en la región número 100 de `10_000` bytes, por lo que utilizan la entrada 100 de la tabla de páginas de nivel 2. Esta entrada apunta a una tabla de páginas de nivel 1 diferente, T2, que mapea las tres páginas a los marcos `100`, `150` y `200`. Ten en cuenta que la dirección de página en las tablas de nivel 1 no incluye el desplazamiento de región. Por ejemplo, la entrada para la página `1_000_050` es solo `50`.

Aún tenemos 100 entradas vacías en la tabla de nivel 2, pero muchas menos que el millón de entradas vacías de antes. La razón de este ahorro es que no necesitamos crear tablas de páginas de nivel 1 para las regiones de memoria no mapeadas entre `10_000` y `1_000_000`.

El principio de las tablas de páginas de dos niveles se puede extender a tres, cuatro o más niveles. En ese caso, el registro de la tabla de páginas apunta a la tabla de nivel más alto, que apunta a la tabla del siguiente nivel inferior, que a su vez apunta a la del siguiente nivel inferior, y así sucesivamente. La tabla de páginas de nivel 1 apunta finalmente al marco mapeado. El principio en general se llama _tabla de páginas multinivel_ (multilevel page table) o _jerárquica_.

Ahora que sabemos cómo funcionan la paginación y las tablas de páginas multinivel, podemos ver cómo se implementa la paginación en la arquitectura x86_64 (suponemos en lo siguiente que la CPU funciona en modo de 64 bits).

## Paginación en x86_64

La arquitectura x86_64 utiliza una tabla de páginas de 4 niveles y un tamaño de página de 4 KiB. Cada tabla de páginas, independientemente del nivel, tiene un tamaño fijo de 512 entradas. Cada entrada tiene un tamaño de 8 bytes, por lo que cada tabla tiene un tamaño de 512 * 8 B = 4 KiB y, por lo tanto, encaja exactamente en una página.
El índice de la tabla de páginas para cada nivel se deriva directamente de la dirección virtual: ![Los bits 0–12 son el desplazamiento de la página, los bits 12–21 el índice de nivel 1, los bits 21–30 el índice de nivel 2, los bits 30–39 el índice de nivel 3, y los bits 39–48 el índice de nivel 4](x86_64-table-indices-from-address.svg) Vemos que cada índice de tabla consta de 9 bits, lo que tiene sentido porque cada tabla tiene 2^9 = 512 entradas. Los 12 bits más bajos son el desplazamiento en la página de 4 KiB (2^12 bytes = 4 KiB). Los bits 48 a 64 se descartan, lo que significa que x86_64 no es realmente de 64 bits, ya que solo admite direcciones de 48 bits. A pesar de que se descartan los bits 48 a 64, no pueden establecerse en valores arbitrarios. En cambio, todos los bits en este rango deben ser copias del bit 47 para mantener las direcciones únicas y permitir extensiones futuras como la tabla de páginas de 5 niveles. Esto se llama _extensión de signo_ (sign-extension) porque es muy similar a la [extensión de signo en complemento a dos]. Cuando una dirección no está correctamente extendida de signo, la CPU lanza una excepción. [extensión de signo en complemento a dos]: https://en.wikipedia.org/wiki/Two's_complement#Sign_extension Cabe destacar que los recientes procesadores Intel "Ice Lake" admiten opcionalmente [tablas de páginas de 5 niveles] para extender las direcciones virtuales de 48 bits a 57 bits. Dado que optimizar nuestro núcleo para una CPU específica no tiene sentido en esta etapa, solo trabajaremos con tablas de páginas de 4 niveles estándar en esta publicación. 
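
La división de una dirección virtual en índices y la regla de extensión de signo pueden esbozarse con unas pocas operaciones de bits. Es un boceto ilustrativo que asume paginación de 4 niveles; los nombres `page_offset`, `table_index` e `is_canonical` son hipotéticos y no pertenecen a ningún crate.

```rust
/// Bits de la dirección virtual que se traducen con paginación de 4 niveles.
pub const VIRT_ADDR_BITS: u32 = 48;

/// Los 12 bits más bajos son el desplazamiento dentro de la página de 4 KiB.
pub fn page_offset(addr: u64) -> u64 {
    addr & 0xfff
}

/// Índice de 9 bits en la tabla del nivel indicado (1 a 4).
pub fn table_index(addr: u64, level: u32) -> usize {
    assert!((1..=4).contains(&level), "el nivel debe estar entre 1 y 4");
    let shift = 12 + 9 * (level - 1);
    ((addr >> shift) & 0x1ff) as usize
}

/// Comprueba que los bits 48–63 sean copias del bit 47.
pub fn is_canonical(addr: u64) -> bool {
    let shift = 64 - VIRT_ADDR_BITS;
    // El desplazamiento aritmético a la derecha replica el bit 47
    // en todos los bits superiores.
    let sign_extended = (((addr << shift) as i64) >> shift) as u64;
    sign_extended == addr
}
```

Por ejemplo, `0x0000_7fff_ffff_ffff` y `0xffff_8000_0000_0000` están correctamente extendidas de signo, pero `0x0000_8000_0000_0000` no lo está, porque su bit 47 vale 1 y los bits 48–63 valen 0.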
[tablas de páginas de 5 niveles]: https://en.wikipedia.org/wiki/Intel_5-level_paging

### Ejemplo de Traducción

Pasemos por un ejemplo para entender cómo funciona el proceso de traducción en detalle:

![Un ejemplo de una jerarquía de 4 niveles de páginas con cada tabla de páginas mostrada en memoria física](x86_64-page-table-translation.svg)

La dirección física de la tabla de páginas de nivel 4 actualmente activa, que es la raíz de la tabla de páginas de 4 niveles, se almacena en el registro `CR3`. Cada entrada de una tabla de páginas apunta entonces al marco físico de la tabla del siguiente nivel. La entrada de la tabla de nivel 1 apunta finalmente al marco mapeado. Ten en cuenta que todas las direcciones en las tablas de páginas son físicas en lugar de virtuales, porque de lo contrario la CPU también necesitaría traducir esas direcciones (lo que podría provocar una recursión interminable).

La jerarquía de tablas de páginas anterior mapea dos páginas (en azul). A partir de los índices de la tabla de páginas, podemos deducir que las direcciones virtuales de estas dos páginas son `0x803FE7F000` y `0x803FE00000`. Veamos qué sucede cuando el programa intenta leer desde la dirección `0x803FE7F5CE`. Primero, convertimos la dirección a binario y determinamos los índices de la tabla de páginas y el desplazamiento de la página para la dirección:

![Los bits de extensión de signo son todos 0, el índice de nivel 4 es 1, el índice de nivel 3 es 0, el índice de nivel 2 es 511, el índice de nivel 1 es 127, y el desplazamiento de la página es 0x5ce](x86_64-page-table-translation-addresses.png)

Con estos índices, ahora podemos recorrer la jerarquía de la tabla de páginas para determinar el marco mapeado para la dirección:

- Comenzamos leyendo la dirección de la tabla de nivel 4 del registro `CR3`.
- El índice de nivel 4 es 1, así que miramos la entrada en el índice 1 de esa tabla, que nos dice que la tabla de nivel 3 se almacena en la dirección 16 KiB.
- Cargamos la tabla de nivel 3 desde esa dirección y miramos la entrada en el índice 0, que nos apunta a la tabla de nivel 2 en 24 KiB. - El índice de nivel 2 es 511, así que miramos la última entrada de esa página para averiguar la dirección de la tabla de nivel 1. - A través de la entrada en el índice 127 de la tabla de nivel 1, finalmente descubrimos que la página está mapeada al marco de 12 KiB, o 0x3000 en hexadecimal. - El paso final es agregar el desplazamiento de la página a la dirección del marco para obtener la dirección física 0x3000 + 0x5ce = 0x35ce. ![El mismo ejemplo de jerarquía de 4 niveles de páginas con 5 flechas adicionales: "Paso 0" del registro CR3 a la tabla de nivel 4, "Paso 1" de la entrada de nivel 4 a la tabla de nivel 3, "Paso 2" de la entrada de nivel 3 a la tabla de nivel 2, "Paso 3" de la entrada de nivel 2 a la tabla de nivel 1, y "Paso 4" de la tabla de nivel 1 a los marcos mapeados.](x86_64-page-table-translation-steps.svg) Los permisos para la página en la tabla de nivel 1 son `r`, lo que significa que es solo de lectura. El hardware hace cumplir estos permisos y lanzaría una excepción si intentáramos escribir en esa página. Los permisos en las páginas de niveles superiores restringen los posibles permisos en niveles inferiores, por lo que si establecemos la entrada de nivel 3 como solo lectura, ninguna página que use esta entrada puede ser escribible, incluso si los niveles inferiores especifican permisos de lectura/escritura. Es importante tener en cuenta que, aunque este ejemplo utilizó solo una instancia de cada tabla, normalmente hay múltiples instancias de cada nivel en cada espacio de direcciones. En el máximo, hay: - una tabla de nivel 4, - 512 tablas de nivel 3 (porque la tabla de nivel 4 tiene 512 entradas), - 512 * 512 tablas de nivel 2 (porque cada una de las 512 tablas de nivel 3 tiene 512 entradas), y - 512 * 512 * 512 tablas de nivel 1 (512 entradas para cada tabla de nivel 2). 
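
El recorrido del ejemplo de traducción puede reproducirse con una memoria física simulada. Es un boceto bajo supuestos: las direcciones de la tabla de nivel 4 (4 KiB) y de la tabla de nivel 1 (32 KiB) no aparecen en el texto y se eligen aquí de forma hipotética; también son hipotéticos los nombres `SimulatedMemory`, `translate`, `example_memory` y `EXAMPLE_CR3`. Las banderas de permisos se omiten.

```rust
use std::collections::HashMap;

/// Memoria física simulada: (dirección de la tabla, índice) -> dirección
/// almacenada en esa entrada. Las entradas ausentes se consideran vacías.
pub struct SimulatedMemory {
    entries: HashMap<(u64, usize), u64>,
}

/// Recorre los cuatro niveles, del 4 al 1, y suma el desplazamiento de página.
pub fn translate(memory: &SimulatedMemory, cr3: u64, virt: u64) -> Option<u64> {
    let mut current = cr3;
    for level in (1..=4u32).rev() {
        let index = ((virt >> (12 + 9 * (level - 1))) & 0x1ff) as usize;
        // Una entrada vacía significa que la página no está mapeada.
        current = *memory.entries.get(&(current, index))?;
    }
    Some(current + (virt & 0xfff))
}

/// Dirección supuesta de la tabla de nivel 4 (hipotética).
pub const EXAMPLE_CR3: u64 = 4 * 1024;

/// Construye las entradas usadas en el ejemplo de traducción.
pub fn example_memory() -> SimulatedMemory {
    let kib = 1024;
    let mut entries = HashMap::new();
    entries.insert((EXAMPLE_CR3, 1), 16 * kib); // nivel 4 -> tabla de nivel 3
    entries.insert((16 * kib, 0), 24 * kib); // nivel 3 -> tabla de nivel 2
    entries.insert((24 * kib, 511), 32 * kib); // nivel 2 -> nivel 1 (supuesta)
    entries.insert((32 * kib, 127), 12 * kib); // nivel 1 -> marco de 12 KiB
    SimulatedMemory { entries }
}
```

Con estos datos, `translate(&example_memory(), EXAMPLE_CR3, 0x803FE7F5CE)` devuelve `Some(0x35ce)`, la misma dirección física que obtuvimos a mano, y cualquier dirección cuyas entradas no existan devuelve `None`.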
### Formato de la Tabla de Páginas

Las tablas de páginas en la arquitectura x86_64 son básicamente un array de 512 entradas. En sintaxis de Rust:

```rust
#[repr(align(4096))]
pub struct PageTable {
    entries: [PageTableEntry; 512],
}
```

Como se indica por el atributo `repr`, las tablas de páginas necesitan estar alineadas a la página, es decir, alineadas en un límite de 4 KiB. Este requisito garantiza que una tabla de páginas siempre llene una página completa y permite una optimización que hace que las entradas sean muy compactas.

Cada entrada tiene un tamaño de 8 bytes (64 bits) y tiene el siguiente formato:

| Bit(s) | Nombre | Significado |
| ------ | ------ | ----------- |
| 0 | presente | la página está actualmente en memoria |
| 1 | escribible | se permite escribir en esta página |
| 2 | accesible por el usuario | si no se establece, solo el código en modo núcleo puede acceder a esta página |
| 3 | caché de escritura a través | las escrituras van directamente a la memoria |
| 4 | desactivar caché | no se utiliza caché para esta página |
| 5 | accedido | la CPU establece este bit cuando se utiliza esta página |
| 6 | sucio | la CPU establece este bit cuando se realiza una escritura en esta página |
| 7 | página enorme/null | debe ser 0 en P1 y P4, crea una página de 1 GiB en P3, crea una página de 2 MiB en P2 |
| 8 | global | la página no se borra de las cachés al cambiar el espacio de direcciones (el bit PGE del registro CR4 debe estar establecido) |
| 9-11 | disponible | puede ser utilizado libremente por el sistema operativo |
| 12-51 | dirección física | la dirección física alineada de 52 bits del marco o de la siguiente tabla de páginas |
| 52-62 | disponible | puede ser utilizado libremente por el sistema operativo |
| 63 | no ejecutar | prohibir la ejecución de código en esta página (el bit NXE en el registro EFER debe estar establecido) |

Vemos que solo los bits 12–51 se utilizan para almacenar la dirección física del marco. Los bits restantes se utilizan como banderas o pueden ser utilizados libremente por el sistema operativo. Esto es posible porque siempre apuntamos a una dirección alineada a 4096 bytes, ya sea a una tabla de páginas alineada a la página o al inicio de un marco mapeado. Esto significa que los bits 0–11 son siempre cero, por lo que no hay razón para almacenar estos bits porque el hardware puede simplemente configurarlos en cero antes de usar la dirección. Lo mismo es cierto para los bits 52–63, ya que la arquitectura x86_64 solo admite direcciones físicas de 52 bits (similar a como solo admite direcciones virtuales de 48 bits).

Veamos más de cerca las banderas disponibles:

- La bandera `presente` diferencia las páginas mapeadas de las no mapeadas. Puede usarse para intercambiar temporalmente páginas en disco cuando la memoria principal se llena. Cuando la página se accede posteriormente, ocurre una excepción especial llamada _fallo de página_ (page fault), a la cual el sistema operativo puede reaccionar volviendo a cargar la página faltante desde el disco y luego continuar el programa.
- Las banderas `escribible` y `no ejecutar` controlan si el contenido de la página es escribible o contiene instrucciones ejecutables, respectivamente.
- Las banderas `accedido` y `sucio` son automáticamente configuradas por la CPU cuando se produce una lectura o escritura en la página. Esta información puede ser utilizada por el sistema operativo, por ejemplo, para decidir qué páginas intercambiar o si el contenido de la página ha sido modificado desde el último guardado en disco.
- Las banderas `caché de escritura a través` y `desactivar caché` permiten el control de cachés para cada página individualmente.
- La bandera `accesible por el usuario` hace que una página esté disponible para el código de espacio de usuario; de lo contrario, solo es accesible cuando la CPU está en modo núcleo. Esta característica puede utilizarse para hacer [llamadas al sistema] más rápidas manteniendo el núcleo mapeado mientras un programa de espacio de usuario se está ejecutando. Sin embargo, la vulnerabilidad [Spectre] puede permitir que los programas de espacio de usuario lean estas páginas.
- La bandera `global` le indica al hardware que una página está disponible en todos los espacios de direcciones y, por lo tanto, no necesita ser eliminada de la caché de traducción (ver la sección sobre el TLB a continuación) al cambiar de espacio de direcciones. Esta bandera se utiliza comúnmente junto con una bandera `accesible por el usuario` desactivada para mapear el código del núcleo a todos los espacios de direcciones.
- La bandera `página enorme` permite la creación de páginas de tamaños más grandes al permitir que las entradas de las tablas de nivel 2 o nivel 3 apunten directamente a un marco mapeado. Con este bit establecido, el tamaño de la página aumenta por un factor de 512 a 2 MiB = 512 * 4 KiB para las entradas de nivel 2 o incluso 1 GiB = 512 * 2 MiB para las entradas de nivel 3. La ventaja de usar páginas más grandes es que se necesitan menos líneas de la caché de traducción y menos tablas de páginas.

[llamadas al sistema]: https://en.wikipedia.org/wiki/System_call
[Spectre]: https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)

El crate `x86_64` proporciona tipos para [tablas de páginas] y sus [entradas], por lo que no necesitamos crear estas estructuras nosotros mismos.
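
Solo para ilustrar el formato de una entrada (en el núcleo usaremos los tipos del crate `x86_64`), el siguiente boceto decodifica una entrada cruda de 64 bits con máscaras. El tipo `RawEntry` y las constantes son nombres hipotéticos; las posiciones de los bits son las de la tabla de formato de esta sección.

```rust
pub const PRESENT: u64 = 1 << 0;
pub const WRITABLE: u64 = 1 << 1;
pub const USER_ACCESSIBLE: u64 = 1 << 2;
pub const HUGE_PAGE: u64 = 1 << 7;
pub const NO_EXECUTE: u64 = 1 << 63;

/// Máscara de los bits 12–51, que contienen la dirección física.
pub const ADDR_MASK: u64 = 0x000f_ffff_ffff_f000;

/// Entrada cruda de una tabla de páginas (tipo hipotético).
#[derive(Clone, Copy)]
pub struct RawEntry(pub u64);

impl RawEntry {
    pub fn is_present(self) -> bool {
        self.0 & PRESENT != 0
    }

    pub fn is_writable(self) -> bool {
        self.0 & WRITABLE != 0
    }

    pub fn is_user_accessible(self) -> bool {
        self.0 & USER_ACCESSIBLE != 0
    }

    pub fn is_huge(self) -> bool {
        self.0 & HUGE_PAGE != 0
    }

    pub fn is_no_execute(self) -> bool {
        self.0 & NO_EXECUTE != 0
    }

    /// Dirección física del marco o de la siguiente tabla: los bits 0–11
    /// y 52–63 se ponen a cero.
    pub fn phys_addr(self) -> u64 {
        self.0 & ADDR_MASK
    }
}
```

Por ejemplo, `RawEntry(0x3000 | PRESENT | NO_EXECUTE)` describe una página presente, de solo lectura y no ejecutable, mapeada al marco en `0x3000`.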
[tablas de páginas]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTable.html
[entradas]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html

### El Buffer de Traducción (TLB)

Una tabla de páginas de 4 niveles hace que la traducción de direcciones virtuales sea costosa porque cada traducción requiere cuatro accesos a la memoria. Para mejorar el rendimiento, la arquitectura x86_64 almacena en caché las últimas traducciones en el denominado _buffer de traducción_ (translation lookaside buffer, TLB). Esto permite omitir la traducción cuando todavía está en caché.

A diferencia de las demás cachés de la CPU, el TLB no es completamente transparente y no actualiza ni elimina traducciones cuando cambian los contenidos de las tablas de páginas. Esto significa que el núcleo debe actualizar manualmente el TLB cada vez que modifica una tabla de páginas. Para hacer esto, hay una instrucción especial de la CPU llamada [`invlpg`] ("invalidar página") que elimina la traducción para la página especificada del TLB, de modo que se vuelva a cargar desde la tabla de páginas en el siguiente acceso. El TLB también puede vaciarse por completo recargando el registro `CR3`, lo que simula un cambio de espacio de direcciones. El crate `x86_64` proporciona funciones en Rust para ambas variantes en el [`módulo tlb`].

[`invlpg`]: https://www.felixcloutier.com/x86/INVLPG.html
[`módulo tlb`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/tlb/index.html

Es importante recordar limpiar el TLB en cada modificación de tabla de páginas porque, de lo contrario, la CPU podría seguir utilizando la vieja traducción, lo que puede llevar a errores no deterministas que son muy difíciles de depurar.

## Implementación

Una cosa que aún no hemos mencionado: **Nuestro núcleo ya se ejecuta sobre paginación**. El bootloader (cargador de arranque) que añadimos en la publicación ["Un núcleo mínimo de Rust"] ya ha configurado una jerarquía de paginación de 4 niveles que mapea cada página de nuestro núcleo a un marco físico.
El bootloader hace esto porque la paginación es obligatoria en el modo de 64 bits en x86_64.

["Un núcleo mínimo de Rust"]: @/edition-2/posts/02-minimal-rust-kernel/index.md#creating-a-bootimage

Esto significa que cada dirección de memoria que utilizamos en nuestro núcleo era una dirección virtual. Acceder al búfer VGA en la dirección `0xb8000` solo funcionó porque el bootloader _mapeó por identidad_ esa página de memoria, lo que significa que mapeó la página virtual `0xb8000` al marco físico `0xb8000`.

La paginación hace que nuestro núcleo ya sea relativamente seguro, ya que cada acceso a memoria que está fuera de límites causa una excepción de fallo de página en lugar de escribir en la memoria física aleatoria. El bootloader incluso establece los permisos de acceso correctos para cada página, lo que significa que solo las páginas que contienen código son ejecutables y solo las páginas de datos son escribibles.

### Fallos de Página

Intentemos causar un fallo de página accediendo a alguna memoria fuera de nuestro núcleo. Primero, creamos un controlador de fallos de página y lo registramos en nuestra IDT, para que veamos una excepción de fallo de página en lugar de un [fallo doble] genérico:

[fallo doble]: @/edition-2/posts/06-double-faults/index.md

```rust
// en src/interrupts.rs

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();

        […]

        idt.page_fault.set_handler_fn(page_fault_handler); // nuevo

        idt
    };
}

use x86_64::structures::idt::PageFaultErrorCode;
use crate::hlt_loop;

extern "x86-interrupt" fn page_fault_handler(
    stack_frame: InterruptStackFrame,
    error_code: PageFaultErrorCode,
) {
    use x86_64::registers::control::Cr2;

    println!("EXCEPCIÓN: FALLO DE PÁGINA");
    println!("Dirección Accedida: {:?}", Cr2::read());
    println!("Código de Error: {:?}", error_code);
    println!("{:#?}", stack_frame);
    hlt_loop();
}
```

El registro [`CR2`] se configura automáticamente por la CPU en un fallo de página y contiene la dirección virtual accedida que provocó el fallo de página. Usamos la función [`Cr2::read`] del crate `x86_64` para leerla e imprimirla. El tipo [`PageFaultErrorCode`] proporciona más información sobre el tipo de acceso a la memoria que causó el fallo de página, por ejemplo, si fue causado por una operación de lectura o escritura. Por esta razón, también lo imprimimos. No podemos continuar la ejecución sin resolver el fallo de página, por lo que entramos en un [`hlt_loop`] al final.

[`CR2`]: https://en.wikipedia.org/wiki/Control_register#CR2
[`Cr2::read`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr2.html#method.read
[`PageFaultErrorCode`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html
[bug de LLVM]: https://github.com/rust-lang/rust/issues/57270
[`hlt_loop`]: @/edition-2/posts/07-hardware-interrupts/index.md#the-hlt-instruction

Ahora podemos intentar acceder a alguna memoria fuera de nuestro núcleo:

```rust
// en src/main.rs

#[no_mangle]
pub extern "C" fn _start() -> ! {
    println!("¡Hola Mundo{}", "!");

    blog_os::init();

    // nuevo
    let ptr = 0xdeadbeaf as *mut u8;
    unsafe { *ptr = 42; }

    // como antes
    #[cfg(test)]
    test_main();

    println!("¡No se estrelló!");
    blog_os::hlt_loop();
}
```

Cuando lo ejecutamos, vemos que se llama a nuestro controlador de fallos de página:

![EXCEPCIÓN: Fallo de Página, Dirección Accedida: VirtAddr(0xdeadbeaf), Código de Error: CAUSED_BY_WRITE, InterruptStackFrame: {…}](qemu-page-fault.png)

El registro `CR2` efectivamente contiene `0xdeadbeaf`, la dirección que intentamos acceder. El código de error nos dice a través del [`CAUSED_BY_WRITE`] que el fallo ocurrió mientras intentábamos realizar una operación de escritura. También nos dice más a través de los [bits que _no_ están establecidos][`PageFaultErrorCode`]. Por ejemplo, el hecho de que la bandera `PROTECTION_VIOLATION` no esté establecida significa que el fallo de página ocurrió porque la página objetivo no estaba presente.

[`CAUSED_BY_WRITE`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.CAUSED_BY_WRITE

Vemos que el puntero de instrucciones actual es `0x2031b2`, así que sabemos que esta dirección apunta a una página de código. Las páginas de código están mapeadas como solo lectura por el bootloader, así que leer desde esta dirección funciona, pero escribir causa un fallo de página. Puedes intentar esto cambiando el puntero `0xdeadbeaf` a `0x2031b2`:

```rust
// Nota: La dirección real podría ser diferente para ti. Usa la dirección que
// informa tu controlador de fallos de página.
let ptr = 0x2031b2 as *mut u8;

// leer desde una página de código
unsafe { let x = *ptr; }
println!("la lectura funcionó");

// escribir en una página de código
unsafe { *ptr = 42; }
println!("la escritura funcionó");
```

Al comentar la última línea, vemos que el acceso de lectura funciona, pero el acceso de escritura causa un fallo de página:

![QEMU con salida: "la lectura funcionó, EXCEPCIÓN: Fallo de Página, Dirección Accedida: VirtAddr(0x2031b2), Código de Error: PROTECTION_VIOLATION | CAUSED_BY_WRITE, InterruptStackFrame: {…}"](qemu-page-fault-protection.png)

Vemos que el mensaje _"la lectura funcionó"_ se imprime, lo que indica que la operación de lectura no causó errores. Sin embargo, en lugar del mensaje _"la escritura funcionó"_, ocurre un fallo de página. Esta vez la bandera [`PROTECTION_VIOLATION`] está establecida además de la bandera [`CAUSED_BY_WRITE`], lo que indica que la página estaba presente, pero la operación no estaba permitida en ella. En este caso, las escrituras a la página no están permitidas ya que las páginas de código están mapeadas como solo lectura.

[`PROTECTION_VIOLATION`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.PROTECTION_VIOLATION

### Accediendo a las Tablas de Páginas

Intentemos echar un vistazo a las tablas de páginas que definen cómo está mapeado nuestro núcleo:

```rust
// en src/main.rs

#[no_mangle]
pub extern "C" fn _start() -> ! {
    println!("¡Hola Mundo{}", "!");

    blog_os::init();

    use x86_64::registers::control::Cr3;

    let (level_4_page_table, _) = Cr3::read();
    println!("Tabla de páginas de nivel 4 en: {:?}", level_4_page_table.start_address());

    […] // test_main(), println(…), y hlt_loop()
}
```

La función [`Cr3::read`] del crate `x86_64` devuelve la tabla de páginas de nivel 4 actualmente activa desde el registro `CR3`. Devuelve una tupla de un tipo [`PhysFrame`] y un tipo [`Cr3Flags`]. Solo nos interesa el marco, así que ignoramos el segundo elemento de la tupla.

[`Cr3::read`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3.html#method.read
[`PhysFrame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/frame/struct.PhysFrame.html
[`Cr3Flags`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3Flags.html

When we run it, we see the following output:

```
Level 4 page table at: PhysAddr(0x1000)
```

So the currently active level 4 page table is stored at address `0x1000` in _physical_ memory, as indicated by the [`PhysAddr`] wrapper type. The question now is: how can we access this table from our kernel?

[`PhysAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.PhysAddr.html

Accessing physical memory directly is not possible when paging is active, since programs could otherwise easily circumvent memory protection and access the memory of other programs. So the only way to access the table is through some virtual page that is mapped to the physical frame at address `0x1000`. This problem of creating mappings for page table frames is a general one, since the kernel needs to access the page tables regularly, for example, when allocating a stack for a new thread.

Solutions to this problem are explained in detail in the next post.

## Summary

This post introduced two memory protection techniques: segmentation and paging. While the former uses variable-sized memory regions and suffers from external fragmentation, the latter uses fixed-sized pages and allows much more fine-grained control over access permissions.

Paging stores the mapping information for pages in page tables with one or more levels. The x86_64 architecture uses 4-level page tables and a page size of 4 KiB.
The hardware automatically walks the page tables and caches the resulting translations in the translation lookaside buffer (TLB). This buffer is not updated transparently and needs to be flushed manually on page table changes.

We learned that our kernel already runs on top of paging and that illegal memory accesses cause page fault exceptions. We tried to access the currently active page tables, but we weren't able to do it because the CR3 register stores a physical address that we can't access directly from our kernel.

## What's next?

The next post explains how to implement support for paging in our kernel. It presents different ways to access physical memory from our kernel, which makes it possible to access the page tables that our kernel runs on. At that point, we will be able to implement functions for translating virtual to physical addresses and for creating new mappings in the page tables.

================================================
FILE: blog/content/edition-2/posts/08-paging-introduction/index.fa.md
================================================
+++
title = "Introduction to Paging"
weight = 8
path = "fa/paging-introduction"
date = 2019-01-14

[extra]
# Please update this when updating the translation
translation_based_on_commit = "f692c5b377460e872bca2d3fcec787f4a0d1ec9b"
# GitHub usernames of the people that translated this post
translators = ["hamidrezakp", "MHBahrampour"]
rtl = true
+++

This post introduces _paging_, a very common memory management scheme that we will also use for our operating system. It explains why memory isolation is needed, how _segmentation_ works, what _virtual memory_ is, and how paging solves memory fragmentation issues. It also explores the layout of multilevel page tables on the x86_64 architecture.

This blog is openly developed on [GitHub].
If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-08`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-08

## Memory Protection

One main task of an operating system is to isolate programs from each other. Your web browser shouldn't be able to interfere with your text editor, for example. To achieve this goal, operating systems utilize hardware functionality to ensure that the memory of one process is not accessible by other processes. There are different approaches depending on the hardware and the OS implementation.

As an example, some ARM Cortex-M processors (used for embedded systems) have a [_Memory Protection Unit_] (MPU), which allows you to define a small number (e.g., 8) of memory regions with different access permissions (e.g., no access, read-only, read-write). On each memory access, the MPU ensures that the address is in a region with correct access permissions and throws an exception otherwise. By changing the regions and access permissions on each process switch, the operating system can ensure that each process only accesses its own memory and thus isolates processes from each other.

[_Memory Protection Unit_]: https://developer.arm.com/docs/ddi0337/e/memory-protection-unit/about-the-mpu

On x86, the hardware supports two different approaches to memory protection: [segmentation] and [paging].

[segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation
[paging]: https://en.wikipedia.org/wiki/Virtual_memory#Paged_virtual_memory

## Segmentation

Segmentation was already introduced in 1978, originally to increase the amount of addressable memory. The situation back then was that CPUs only used 16-bit addresses, which limited the amount of addressable memory to 64 KiB.
To make more than these 64 KiB accessible, additional segment registers were introduced, each containing an offset address. The CPU automatically added this offset on each memory access, so that up to 1 MiB of memory was accessible.

The segment register is chosen automatically by the CPU depending on the kind of memory access: for fetching instructions, the code segment `CS` is used, and for stack operations (push/pop), the stack segment `SS` is used. Other instructions use the data segment `DS` or the extra segment `ES`. Later, two additional segment registers, `FS` and `GS`, were added, which can be used freely.

In the first version of segmentation, the segment registers directly contained the offset and no access control was performed. This changed later with the introduction of the [_protected mode_]. When the CPU runs in this mode, the segment registers contain an index into a local or global [_descriptor table_], which contains (in addition to an offset address) the segment size and access permissions. By loading separate global/local descriptor tables for each process, which confine memory accesses to the process's own memory areas, the OS can isolate processes from each other.

[_protected mode_]: https://en.wikipedia.org/wiki/X86_memory_segmentation#Protected_mode
[_descriptor table_]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

By modifying the memory addresses before the actual access, segmentation already employed a technique that is now used almost everywhere: _virtual memory_.

### Virtual Memory

The idea behind virtual memory is to abstract away the memory addresses from the underlying physical storage device. Instead of directly accessing the storage device, a translation step is performed first. For segmentation, the translation step is to add the offset address of the active segment. Imagine a program accessing memory address `0x1234000` in a segment with an offset of `0x1111000`: the address that is really accessed is `0x2345000`.

To differentiate between the two address types, addresses before the translation are called _virtual_, and addresses after the translation are called _physical_.
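
The offset-based translation can be sketched in a few lines of plain Rust. This is only an illustration under stated assumptions: the `Segment` type, its `size` field (standing in for the limit check that protected mode adds), and the `translate` function are hypothetical names, not part of any real API. For a segment with offset `0x1111000`, the virtual address `0x1234000` translates to `0x2345000`, and addresses beyond the segment size are rejected.

```rust
/// A hypothetical segment: a base offset plus a size limit in bytes.
#[derive(Clone, Copy)]
struct Segment {
    offset: u64,
    size: u64,
}

/// Translates a virtual address by adding the segment offset.
/// Returns `None` if the address lies outside the segment.
fn translate(segment: Segment, virt: u64) -> Option<u64> {
    if virt < segment.size {
        Some(segment.offset + virt)
    } else {
        None
    }
}
```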

One important difference between these two kinds of addresses is that physical addresses are unique and always refer to the same distinct memory location. Virtual addresses, on the other hand, depend on the translation function. It is entirely possible that two different virtual addresses refer to the same physical address. Also, identical virtual addresses can refer to different physical addresses when they use different translation functions.

An example where this property is useful is running the same program twice in parallel:

![Two virtual address spaces with address 0–150, one translated to 100–250, the other to 300–450](segmentation-same-program-twice.svg)

Here the same program runs twice, but with different translation functions. The first instance has a segment offset of 100, so that its virtual addresses 0–150 are translated to the physical addresses 100–250. The second instance has an offset of 300, which translates its virtual addresses 0–150 to physical addresses 300–450. This allows both programs to run the same code and use the same virtual addresses without interfering with each other.

Another advantage is that programs can now be placed at arbitrary physical memory locations, even if they use completely different virtual addresses. Thus, the OS can utilize the full amount of available memory without needing to recompile programs.

### Fragmentation

The differentiation between virtual and physical addresses makes segmentation really powerful. However, it has the problem of fragmentation. As an example, imagine that we want to run a third copy of the program we saw above:

![Three virtual address spaces, but there is not enough continuous space for the third](segmentation-fragmentation.svg)

There is no way to map the third instance of the program to virtual memory without overlapping, even though there is more than enough free memory available. The problem is that we need _continuous_ memory and cannot use the small free chunks.

One way to combat this fragmentation is to pause execution, move the used parts of the memory closer together, update the translation, and then resume execution:

![Three virtual address spaces after defragmentation](segmentation-fragmentation-compacted.svg)

Now there is enough continuous space to start the third instance of our program.

The disadvantage of this defragmentation process is that it needs to copy large amounts of memory, which decreases performance. It also needs to be done regularly before the memory becomes too fragmented. This makes performance unpredictable since programs are paused at random times and might become unresponsive.

The fragmentation problem is one of the reasons that segmentation is no longer used by most systems. In fact, segmentation is not even supported in 64-bit mode on x86 anymore. Instead, _paging_ is used, which completely avoids the fragmentation problem.

## Paging

The idea is to divide both the virtual and physical memory space into small, fixed-size blocks. The blocks of the virtual memory space are called _pages_, and the blocks of the physical address space are called _frames_. Each page can be individually mapped to a frame, which makes it possible to split larger memory regions across non-continuous physical frames.

The advantage of this becomes visible if we recap the example of the fragmented memory space, but use paging instead of segmentation this time:

![With paging the third program instance can be split across many smaller physical areas](paging-fragmentation.svg)

In this example, we have a page size of 50 bytes, which means that each of our memory regions is split across three pages. Each page is mapped to a frame individually, so a continuous virtual memory region can be mapped to non-continuous physical frames. This allows us to start the third instance of the program without performing any defragmentation before.

### Hidden Fragmentation

Compared to segmentation, paging uses lots of small, fixed-sized memory regions instead of a few large, variable-sized regions. Since every frame has the same size, there are no frames that are too small to be used, so no fragmentation occurs.

Or it _seems_ like no fragmentation occurs. There is still a hidden kind of fragmentation, the so-called _internal fragmentation_. Internal fragmentation occurs because not every memory region is an exact multiple of the page size. Imagine a program of size 101 in the above example: it would still need three pages of size 50, so it would occupy 49 bytes more than needed. To differentiate the two types of fragmentation, the kind of fragmentation that happens when using segmentation is called _external fragmentation_.

Internal fragmentation is unfortunate but often better than the external fragmentation that occurs with segmentation. It still wastes memory, but it does not require defragmentation and makes the amount of fragmentation predictable (on average half a page per memory region).

### Page Tables

We saw that each of the potentially millions of pages is individually mapped to a frame. This mapping information needs to be stored somewhere. Segmentation uses an individual segment selector register for each active memory region, which is not possible for paging since there are way more pages than registers. Instead, paging uses a table structure called _page table_ to store the mapping information.

For our above example, the page tables would look like this:

![Three page tables, one for each program instance. For instance 1 the mapping is 0->100, 50->150, 100->200. For instance 2 it is 0->300, 50->350, 100->400. For instance 3 it is 0->250, 50->450, 100->500.](paging-page-tables.svg)

We see that each program instance has its own page table. A pointer to the currently active table is stored in a special CPU register. On `x86`, this register is called `CR3`. It is the job of the operating system to load this register with the pointer to the correct page table before running each program instance.
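
As a rough sketch of what such a single-level table does, the following plain Rust models a page table as a list of frame start addresses indexed by page number, using the 50-byte page size of the example. The `PageTable` struct and `translate` function are hypothetical names chosen for illustration. With the mapping of the third program instance from the figure (0->250, 50->450, 100->500), the virtual address 60 lies in the second page at offset 10 and therefore translates to 460.

```rust
/// Page size used in the example figures (real x86_64 hardware uses 4 KiB).
const PAGE_SIZE: u64 = 50;

/// A toy single-level page table: entry `i` holds the start address of the
/// frame that page `i` is mapped to, or `None` if the page is unmapped.
struct PageTable {
    entries: Vec<Option<u64>>,
}

/// Looks up the frame for the page containing `virt` and re-applies the
/// offset within the page. Returns `None` for unmapped addresses.
fn translate(table: &PageTable, virt: u64) -> Option<u64> {
    let page_index = (virt / PAGE_SIZE) as usize;
    let offset = virt % PAGE_SIZE;
    let frame_start = table.entries.get(page_index).copied().flatten()?;
    Some(frame_start + offset)
}
```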

On each memory access, the CPU reads the table pointer from the register and looks up the mapped frame for the accessed page in the table. This is done entirely in hardware and is completely invisible to the running program. To speed up the translation process, many CPU architectures have a special cache that remembers the results of the last translations.

Depending on the architecture, page table entries can also store attributes such as access permissions in a flags field. In the above example, the "r/w" flag makes the page both readable and writable.

### Multilevel Page Tables

The simple page tables we just saw have a problem in larger address spaces: they waste memory. For example, imagine a program that uses the four virtual pages `0`, `1_000_000`, `1_000_050`, and `1_000_100` (we use `_` as a thousands separator):

![Page 0 mapped to frame 0 and pages `1_000_000`–`1_000_150` mapped to frames 100–250](single-level-page-table.svg)

It only needs 4 physical frames, but the page table has over a million entries. We can't omit the empty entries because then the CPU would no longer be able to jump directly to the correct entry in the translation process (e.g., it is no longer guaranteed that the fourth page uses the fourth entry).

To reduce the wasted memory, we can use a **two-level page table**. The idea is that we use different page tables for different address regions. An additional table called _level 2_ page table contains the mapping between address regions and (level 1) page tables.

This is best explained by an example. Let's define that each level 1 page table is responsible for a region of size `10_000`. Then the following tables would exist for the above example mapping:

![Page 0 points to entry 0 of the level 2 page table, which points to the level 1 page table T1. The first entry of T1 points to frame 0, the other entries are empty. Pages `1_000_000`–`1_000_150` point to the 100th entry of the level 2 page table, which points to a different level 1 page table T2. The first three entries of T2 point to frames 100–250, the other entries are empty.](multilevel-page-table.svg)

Page 0 falls into the first `10_000` byte region, so it uses the first entry of the level 2 page table. This entry points to level 1 page table T1, which specifies that page `0` points to frame `0`.

The pages `1_000_000`, `1_000_050`, and `1_000_100` all fall into the 100th `10_000` byte region, so they use the 100th entry of the level 2 page table. This entry points to a different level 1 page table T2, which maps the three pages to frames `100`, `150`, and `200`. Note that the page address in level 1 tables does not include the region offset. For example, the entry for page `1_000_050` is just `50`.

We still have 100 empty entries in the level 2 table, but that is far fewer than the million empty entries before. The reason for these savings is that we don't need to create level 1 page tables for the unmapped memory regions between `10_000` and `1_000_000`.

The principle of two-level page tables can be extended to three, four, or more levels. Then the page table register points to the highest level table, which points to the next lower level table, which points to the next lower level, and so on. The level 1 page table then points to the mapped frame. The principle in general is called a _multilevel_ or _hierarchical_ page table.

Now that we know how paging and multilevel page tables work, we can look at how paging is implemented in the x86_64 architecture (we assume in the following that the CPU runs in 64-bit mode).

## Paging on x86_64

The x86_64 architecture uses a 4-level page table and a page size of 4 KiB. Each page table, independent of the level, has a fixed size of 512 entries. Each entry has a size of 8 bytes, so each table is 512 * 8 B = 4 KiB large and thus fits exactly into one page.

The page table index for each level is derived directly from the virtual address:

![Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index](x86_64-table-indices-from-address.svg)

We see that each table index consists of 9 bits, which makes sense because each table has 2^9 = 512 entries. The lowest 12 bits are the offset in the 4 KiB page (2^12 bytes = 4 KiB). Bits 48 to 64 are discarded, which means that x86_64 is not really 64-bit since it only supports 48-bit addresses.

[5-level page table]: https://en.wikipedia.org/wiki/Intel_5-level_paging

Even though bits 48 to 64 are discarded, they can't be set to arbitrary values. Instead, all bits in this range have to be copies of bit 47 in order to keep addresses unique and allow future extensions like the [5-level page table]. This is called _sign-extension_ because it's very similar to the [sign extension in two's complement]. When an address is not correctly sign-extended, the CPU throws an exception.

[sign extension in two's complement]: https://en.wikipedia.org/wiki/Two's_complement#Sign_extension

It's worth noting that the recent "Ice Lake" Intel CPUs optionally support [5-level page tables] to extend virtual addresses from 48-bit to 57-bit. Given that optimizing our kernel for a specific CPU does not make sense at this stage, we will only work with standard 4-level page tables in this post.

[5-level page tables]: https://en.wikipedia.org/wiki/Intel_5-level_paging

### Example Translation

Let's go through an example to understand how the translation process works in detail:

![An example 4-level page hierarchy with each page table shown in physical memory](x86_64-page-table-translation.svg)

The physical address of the currently active level 4 page table, which is the root of the 4-level page table, is stored in the `CR3` register. Each page table entry then points to the physical frame of the next level table. The entry of the level 1 table then points to the mapped frame.
Note that all addresses in the page tables are physical instead of virtual, because otherwise the CPU would need to translate those addresses too (which could cause a never-ending recursion).

The above page table hierarchy maps two pages (in blue). From the page table indices, we can deduce that the virtual addresses of these two pages are `0x803FE7F000` and `0x803FE00000`. Let's see what happens when the program tries to read from address `0x803FE7F5CE`. First, we convert the address to binary and determine the page table indices and the page offset for the address:

![The sign extension bits are all 0, the level 4 index is 1, the level 3 index is 0, the level 2 index is 511, the level 1 index is 127, and the page offset is 0x5ce](x86_64-page-table-translation-addresses.png)

With these indices, we can now walk the page table hierarchy to determine the mapped frame for the address:

- We start by reading the address of the level 4 table out of the `CR3` register.
- The level 4 index is 1, so we look at the entry with index 1 of that table, which tells us that the level 3 table is stored at address 16 KiB.
- We load the level 3 table from that address and look at the entry with index 0, which points us to the level 2 table at 24 KiB.
- The level 2 index is 511, so we look at the last entry of that table to find out the address of the level 1 table.
- Through the entry with index 127 of the level 1 table, we finally find out that the page is mapped to frame 12 KiB, or 0x3000 in hexadecimal.
- The final step is to add the page offset to the frame address to get the physical address 0x3000 + 0x5ce = 0x35ce.

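
The index extraction used in these steps can be sketched in plain Rust with shifts and masks. The function name `split_virtual_address` is made up for this illustration, and the sketch ignores the sign-extension check. For the address `0x803FE7F5CE`, it yields the indices 1, 0, 511, and 127 (level 4 to level 1) and the page offset `0x5ce`.

```rust
/// Splits a virtual address into its four 9-bit page table indices
/// (level 4 first) and the 12-bit page offset.
fn split_virtual_address(addr: u64) -> ([usize; 4], u64) {
    // The level 1 index starts at bit 12; each higher level is 9 bits further up.
    let index = |level: u32| ((addr >> (12 + 9 * (level - 1))) & 0x1ff) as usize;
    ([index(4), index(3), index(2), index(1)], addr & 0xfff)
}
```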

![The same example 4-level page hierarchy with 5 additional arrows: "Step 0" from the CR3 register to the level 4 table, "Step 1" from the level 4 entry to the level 3 table, "Step 2" from the level 3 entry to the level 2 table, "Step 3" from the level 2 entry to the level 1 table, and "Step 4" from the level 1 table to the mapped frames.](x86_64-page-table-translation-steps.svg)

The permissions for the page in the level 1 table are `r`, which means read-only. The hardware enforces these permissions and would throw an exception if we tried to write to that page. Permissions in higher level pages restrict the possible permissions in lower levels, so if we set the level 3 entry to read-only, no pages that use this entry can be writable, even if lower levels specify read/write permissions.

It's important to note that even though this example used only a single instance of each table, there are typically multiple instances of each level in each address space. At maximum, there are:

- one level 4 table,
- 512 level 3 tables (because the level 4 table has 512 entries),
- 512 * 512 level 2 tables (because each of the 512 level 3 tables has 512 entries), and
- 512 * 512 * 512 level 1 tables (512 entries for each level 2 table).

### Page Table Format

Page tables on the x86_64 architecture are basically an array of 512 entries. In Rust syntax:

```rust
#[repr(align(4096))]
pub struct PageTable {
    entries: [PageTableEntry; 512],
}
```

As indicated by the `repr` attribute, page tables need to be page-aligned, i.e., aligned on a 4 KiB boundary. This requirement guarantees that a page table always fills a complete page and allows an optimization that makes entries very compact.
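
To check the size and alignment claims on a host machine, we can use a small stand-alone sketch. It assumes that an entry is a plain 64-bit value; the `PageTableEntry` newtype and the `page_table_layout` helper below are stand-ins written for this illustration (using `std` rather than the kernel's `no_std` environment), not the types from the `x86_64` crate. The helper should report 4096 bytes for both the size and the alignment.

```rust
use std::mem::{align_of, size_of};

/// Stand-in for a page table entry: a single 64-bit value (assumption).
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct PageTableEntry(u64);

/// 512 entries of 8 bytes each, aligned on a 4 KiB boundary.
#[repr(align(4096))]
pub struct PageTable {
    entries: [PageTableEntry; 512],
}

/// Returns `(size, alignment)` of the table type in bytes.
pub fn page_table_layout() -> (usize, usize) {
    (size_of::<PageTable>(), align_of::<PageTable>())
}
```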

Each entry is 8 bytes (64 bits) large and has the following format:

Bit(s) | Name | Meaning
------ | ---- | -------
0 | present | the page is currently in memory
1 | writable | it's allowed to write to this page
2 | user accessible | if not set, only kernel mode code can access this page
3 | write through caching | writes go directly to memory
4 | disable cache | no cache is used for this page
5 | accessed | the CPU sets this bit when this page is used
6 | dirty | the CPU sets this bit when a write to this page occurs
7 | huge page/null | must be 0 in P1 and P4, creates a 1GiB page in P3, creates a 2MiB page in P2
8 | global | page isn't flushed from caches on address space switch (PGE bit of CR4 register must be set)
9-11 | available | can be used freely by the OS
12-51 | physical address | the page aligned 52bit physical address of the frame or the next page table
52-62 | available | can be used freely by the OS
63 | no execute | forbid executing code on this page (the NXE bit in the EFER register must be set)

We see that only bits 12–51 are used to store the physical frame address. The remaining bits are used as flags or can be freely used by the operating system. This is possible because we always point to a 4096-byte aligned address, either to a page-aligned page table or to the start of a mapped frame. This means that bits 0–11 are always zero, so there is no reason to store these bits because the hardware can just set them to zero before using the address. The same is true for bits 52–63, because the x86_64 architecture only supports 52-bit physical addresses (similar to how it only supports 48-bit virtual addresses).

Let's take a closer look at the available flags:

- The `present` flag differentiates mapped pages from unmapped ones. It can be used to temporarily swap out pages to disk when the main memory becomes full. When the page is accessed subsequently, a special exception called _page fault_ occurs, to which the operating system can react by reloading the missing page from disk and then continuing the program.
- The `writable` and `no execute` flags control whether the contents of the page are writable or contain executable instructions, respectively.
- The `accessed` and `dirty` flags are automatically set by the CPU when a read or write to the page occurs. This information can be leveraged by the operating system, e.g., to decide which pages to swap out or whether the page contents have been modified since the last save to disk.
- The `write through caching` and `disable cache` flags allow the control of caches for every page individually.
- The `user accessible` flag makes a page available to userspace code, otherwise it is only accessible when the CPU is in kernel mode. This feature can be used to make [system calls] faster by keeping the kernel mapped while a userspace program is running. However, the [Spectre] vulnerability can allow userspace programs to read these pages nonetheless.
- The `global` flag signals to the hardware that a page is available in all address spaces and thus does not need to be removed from the translation cache (see the section about the TLB below) on address space switches. This flag is commonly used together with a cleared `user accessible` flag to map the kernel code to all address spaces.
- The `huge page` flag allows the creation of pages of larger sizes by letting the entries of the level 2 or level 3 page tables directly point to a mapped frame. With this bit set, the page size increases by factor 512 to either 2 MiB = 512 * 4 KiB for level 2 entries or even 1 GiB = 512 * 2 MiB for level 3 entries. The advantage of using larger pages is that fewer lines of the translation cache and fewer page tables are needed.

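
To see how the flag bits and the address share one 64-bit value, here is a minimal sketch in plain Rust that decodes a raw entry according to the table above. The constant and function names are chosen for this illustration only; in the kernel, we would rely on the types of the `x86_64` crate instead. For example, the raw value `0x3000 | PRESENT | NO_EXECUTE` describes a present, non-writable, non-executable page mapped to the frame at `0x3000`.

```rust
// Flag bits of a page table entry, taken from the format table.
const PRESENT: u64 = 1 << 0;
const WRITABLE: u64 = 1 << 1;
const USER_ACCESSIBLE: u64 = 1 << 2;
const HUGE_PAGE: u64 = 1 << 7;
const NO_EXECUTE: u64 = 1 << 63;

/// Mask selecting bits 12–51: the address of the frame or the next table.
const ADDRESS_MASK: u64 = 0x000f_ffff_ffff_f000;

/// Returns the physical address stored in `entry`, or `None` if the
/// `present` bit is not set.
fn frame_address(entry: u64) -> Option<u64> {
    if (entry & PRESENT) != 0 {
        Some(entry & ADDRESS_MASK)
    } else {
        None
    }
}

/// Returns `true` if all bits of `flag` are set in `entry`.
fn has_flag(entry: u64, flag: u64) -> bool {
    (entry & flag) == flag
}
```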

[system calls]: https://en.wikipedia.org/wiki/System_call
[Spectre]: https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)

The `x86_64` crate provides types for [page tables] and their [entries], so we don't need to create these structures ourselves.

[page tables]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTable.html
[entries]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html

### The Translation Lookaside Buffer

A 4-level page table makes the translation of virtual addresses expensive because each translation requires four memory accesses. To improve performance, the x86_64 architecture caches the last few translations in the so-called _translation lookaside buffer_ (TLB). This allows skipping the translation when it is still cached.

Unlike the other CPU caches, the TLB is not fully transparent and does not update or remove translations when the contents of page tables change. This means that the kernel must manually update the TLB whenever it modifies a page table. To do this, there is a special CPU instruction called [`invlpg`] ("invalidate page") that removes the translation for the specified page from the TLB, so that it is loaded again from the page table on the next access. The TLB can also be flushed completely by reloading the `CR3` register, which simulates an address space switch. The `x86_64` crate provides Rust functions for both variants in the [`tlb` module].

[`invlpg`]: https://www.felixcloutier.com/x86/INVLPG.html
[`tlb` module]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/tlb/index.html

It is important to remember to flush the TLB on each page table modification because otherwise the CPU might keep using the old translation, which can lead to non-deterministic bugs that are very hard to debug.

## Implementation

One thing that we did not mention yet: **Our kernel already runs on paging**.
The bootloader that we added in the ["A minimal Rust Kernel"] post has already set up a 4-level paging hierarchy that maps every page of our kernel to a physical frame. The bootloader does this because paging is mandatory in 64-bit mode on x86_64.

["A minimal Rust Kernel"]: @/edition-2/posts/02-minimal-rust-kernel/index.fa.md#skht-dyskh-ymyj

This means that every memory address that we used in our kernel was a virtual address. Accessing the VGA buffer at address `0xb8000` only worked because the bootloader _identity mapped_ that memory page, which means that it mapped the virtual page `0xb8000` to the physical frame `0xb8000`.

Paging already makes our kernel relatively safe, since every memory access that is out of bounds causes a page fault exception instead of writing to random physical memory. The bootloader even sets the correct access permissions for each page, which means that only the pages containing code are executable and only data pages are writable.

### Page Faults

Let's try to cause a page fault by accessing some memory outside of our kernel. First, we create a page fault handler and register it in our IDT, so that we see a page fault exception instead of a generic [double fault]:

[double fault]: @/edition-2/posts/06-double-faults/index.fa.md

```rust
// in src/interrupts.rs

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();

        […]

        idt.page_fault.set_handler_fn(page_fault_handler); // new

        idt
    };
}

use x86_64::structures::idt::PageFaultErrorCode;
use crate::hlt_loop;

extern "x86-interrupt" fn page_fault_handler(
    stack_frame: InterruptStackFrame,
    error_code: PageFaultErrorCode,
) {
    use x86_64::registers::control::Cr2;

    println!("EXCEPTION: PAGE FAULT");
    println!("Accessed Address: {:?}", Cr2::read());
    println!("Error Code: {:?}", error_code);
    println!("{:#?}", stack_frame);
    hlt_loop();
}
```

The [`CR2`] register is automatically set by the CPU on a page fault and contains the accessed virtual address that caused the page fault. We use the [`Cr2::read`] function of the `x86_64` crate to read and print it. The [`PageFaultErrorCode`] type provides more information about the type of memory access that caused the page fault, for example, whether it was caused by a read or write operation. For this reason, we print it too. We can't continue execution without resolving the page fault, so we enter a [`hlt_loop`] at the end.

[`CR2`]: https://en.wikipedia.org/wiki/Control_register#CR2
[`Cr2::read`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr2.html#method.read
[`PageFaultErrorCode`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html
[LLVM bug]: https://github.com/rust-lang/rust/issues/57270
[`hlt_loop`]: @/edition-2/posts/07-hardware-interrupts/index.md#the-hlt-instruction

Now we can try to access some memory outside our kernel:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    // new
    let ptr = 0xdeadbeaf as *mut u8;
    unsafe { *ptr = 42; }

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

When we run it, we see that our page fault handler is called:

![EXCEPTION: Page Fault, Accessed Address: VirtAddr(0xdeadbeaf), Error Code: CAUSED_BY_WRITE, InterruptStackFrame: {…}](qemu-page-fault.png)

The `CR2` register indeed contains `0xdeadbeaf`, the address that we tried to access. The error code tells us through the [`CAUSED_BY_WRITE`] flag that the fault occurred while trying to perform a write operation. It tells us even more through the [bits that are _not_ set][`PageFaultErrorCode`]. For example, the fact that the `PROTECTION_VIOLATION` flag is not set means that the page fault occurred because the target page wasn't present.

[`CAUSED_BY_WRITE`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.CAUSED_BY_WRITE

We see that the current instruction pointer is `0x2031b2`, so we know that this address points to a code page. Code pages are mapped read-only by the bootloader, so reading from this address works, but writing causes a page fault. You can try this by changing the `0xdeadbeaf` pointer to `0x2031b2`:

```rust
// Note: The actual address might be different for you. Use the address that
// your page fault handler reports.
let ptr = 0x2031b2 as *mut u8;

// read from a code page
unsafe { let x = *ptr; }
println!("read worked");

// write to a code page
unsafe { *ptr = 42; }
println!("write worked");
```

By commenting out the last line, we see that the read access works, but the write access causes a page fault:

![QEMU with output: "read worked, EXCEPTION: Page Fault, Accessed Address: VirtAddr(0x2031b2), Error Code: PROTECTION_VIOLATION | CAUSED_BY_WRITE, InterruptStackFrame: {…}"](qemu-page-fault-protection.png)

We see that the _"read worked"_ message is printed, which indicates that the read operation did not cause any errors. However, instead of the _"write worked"_ message, a page fault occurs. This time the [`PROTECTION_VIOLATION`] flag is set in addition to the [`CAUSED_BY_WRITE`] flag, which indicates that the page was present, but the operation was not allowed on it. In this case, writes to the page are not allowed since code pages are mapped as read-only.

[`PROTECTION_VIOLATION`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.PROTECTION_VIOLATION

### Accessing the Page Tables

Let's try to take a look at the page tables that define how our kernel is mapped:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    use x86_64::registers::control::Cr3;

    let (level_4_page_table, _) = Cr3::read();
    println!("Level 4 page table at: {:?}", level_4_page_table.start_address());

    […] // test_main(), println(…), and hlt_loop()
}
```

The [`Cr3::read`] function of the `x86_64` crate returns the currently active level 4 page table from the `CR3` register. It returns a tuple of a [`PhysFrame`] and a [`Cr3Flags`] type. We are only interested in the frame, so we ignore the second element of the tuple.


[`Cr3::read`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3.html#method.read
[`PhysFrame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/frame/struct.PhysFrame.html
[`Cr3Flags`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3Flags.html

هنگامی که آن را اجرا می‌کنیم، خروجی زیر را مشاهده می‌کنیم:

```
Level 4 page table at: PhysAddr(0x1000)
```

بنابراین جدول صفحه سطح 4 که در حال حاضر فعال است در آدرس `0x1000` در حافظه _فیزیکی_ ذخیره می‌شود، همان‌طور که توسط نوع بسته‌بندی [`PhysAddr`] نشان داده شده است. حال سوال این است: چگونه می‌توانیم از هسته خود به این جدول دسترسی پیدا کنیم؟

[`PhysAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.PhysAddr.html

دسترسی مستقیم به حافظه فیزیکی در هنگام فعال بودن صفحه‌بندی امکان پذیر نیست، زیرا در غیر این‌صورت برنامه‌ها به راحتی می‌توانستند محافظت از حافظه (ترجمه: memory protection) را دور بزنند و به حافظه سایر برنامه‌ها دسترسی پیدا کنند. بنابراین تنها راه دسترسی به جدول از طریق برخی از صفحه‌های مجازی است که به قاب فیزیکی در آدرس `0x1000` نگاشت شده. این مشکل ایجاد نگاشت برای قاب‌های جدول صفحه یک مشکل کلی است، زیرا هسته به طور مرتب به جدول‌های صفحه دسترسی دارد، به عنوان مثال هنگام اختصاص پشته برای یک نخِ (ترجمه: thread) جدید.

راه حل‌های این مشکل در پست بعدی با جزئیات توضیح داده شده است.

## خلاصه

این پست دو روش حفاظت از حافظه را ارائه می‌دهد: تقسیم‌بندی و صفحه‌بندی. در حالی که اولی از ناحیه حافظه با اندازه متغیر استفاده می‌کند و از تکه‌تکه شدن خارجی رنج می‌برد، دومی از صفحات با اندازه ثابت استفاده می‌کند و امکان کنترل دقیق‌تر مجوزهای دسترسی را فراهم می‌کند.

صفحه‌بندی اطلاعات نگاشت صفحات موجود در جدول‌های صفحه با یک یا چند سطح را ذخیره می‌کند. معماری x86_64 از جدول‌های صفحه با 4 سطح و اندازه صفحه 4KiB استفاده می‌کند. سخت‌افزار به‌طور خودکار جدول‌های صفحه را مرور می‌کند و ترجمه‌های حاصل را در TLB ذخیره می‌کند. این بافر به طور شفاف به‌روز نمی‌شود و باید به صورت دستی با تغییر جدول صفحه، فلاش شود.
ما فهمیدیم که هسته ما در حال حاضر در بالای صفحه‌بندی اجرا می‌شود و دسترسی غیرقانونی حافظه باعث استثناهای خطای صفحه می‌شود. ما سعی کردیم به جدول‌های صفحه فعلی دسترسی پیدا کنیم، اما قادر به انجام این کار نبودیم زیرا ثبات CR3 یک آدرس فیزیکی را ذخیره می‌کند که ما نمی‌توانیم مستقیماً از هسته به آن دسترسی داشته باشیم. ## بعدی چیست؟ در پست بعدی نحوه پیاده‌سازی پشتیبانی برای صفحه‌بندی در هسته توضیح داده شده است. که روش‌های مختلفی برای دسترسی به حافظه فیزیکی از هسته ارائه می‌دهد، که دسترسی به جدول‌های صفحه‌ای که هسته در آن اجرا می‌شود را امکان‌پذیر می‌کند. در این مرحله ما می‌توانیم توابع را برای ترجمه آدرس‌های مجازی به فیزیکی و ایجاد نگاشت‌های جدید در جدول‌های صفحه پیاده‌سازی کنیم. ================================================ FILE: blog/content/edition-2/posts/08-paging-introduction/index.ja.md ================================================ +++ title = "ページング入門" weight = 8 path = "ja/paging-introduction" date = 2019-01-14 [extra] # Please update this when updating the translation translation_based_on_commit = "3315bfe2f63571f5e6e924d58ed32afd8f39f892" # GitHub usernames of the people that translated this post translators = ["swnakamura", "JohnTitor"] +++ この記事では**ページング**を紹介します。これは、私達のオペレーティングシステムにも使う、とても一般的なメモリ管理方式です。なぜメモリの分離が必要なのか、**セグメンテーション**がどういう仕組みなのか、**仮想メモリ**とは何なのか、ページングがいかにしてメモリ断片化 (フラグメンテーション) の問題を解決するのかを説明します。また、x86_64アーキテクチャにおける、マルチレベルページテーブルのレイアウトについても説明します。 このブログの内容は [GitHub] 上で公開・開発されています。何か問題や質問などがあれば issue をたててください(訳注: リンクは原文(英語)のものになります)。また[こちら][at the bottom]にコメントを残すこともできます。この記事の完全なソースコードは[`post-08` ブランチ][post branch]にあります。 [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-08 ## メモリの保護 オペレーティングシステムの主な役割の一つに、プログラムを互いに分離するということがあります。例えば、ウェブブラウザがテキストエディタに干渉してはいけません。この目的を達成するために、オペレーティングシステムはハードウェアの機能を利用して、あるプロセスのメモリ領域に他のプロセスがアクセスできないようにします。ハードウェアやOSの実装によって、さまざまなアプローチがあります。 例として、ARM Cortex-Mプロセッサ(組み込みシステムに使われています)のいくつかには、[メモリ保護ユニット][_Memory Protection Unit_] (Memory 
Protection Unit, MPU) が搭載されており、異なるアクセス権限(例えば、アクセス不可、読み取り専用、読み書きなど)を持つメモリ領域を少数(例えば8個)定義できます。MPUは、メモリアクセスのたびに、そのアドレスが正しいアクセス権限を持つ領域にあるかどうかを確認し、そうでなければ例外を投げます。プロセスを変更するごとにその領域とアクセス権限を変更すれば、オペレーティングシステムはそれぞれのプロセスが自身のメモリにのみアクセスすることを保証し、したがってプロセスを互いに分離できます。 [_Memory Protection Unit_]: https://developer.arm.com/docs/ddi0337/e/memory-protection-unit/about-the-mpu x86においては、ハードウェアは2つの異なるメモリ保護の方法をサポートしています:[セグメンテーション][segmentation]と[ページング][paging]です。 [segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation [paging]: https://en.wikipedia.org/wiki/Virtual_memory#Paged_virtual_memory ## セグメンテーション セグメンテーションは1978年にはすでに導入されており、当初の目的はアドレス可能なメモリの量を増やすことでした。当時、CPUは16bitのアドレスしか使えなかったので、アドレス可能なメモリは64KiBに限られていました。この64KiBを超えてアクセスするために、セグメントレジスタが追加され、このそれぞれにオフセットアドレスを格納するようになりました。CPUがメモリにアクセスするとき、毎回このオフセットを自動的に加算するようにすることで、最大1MiBのメモリにアクセスできるようになりました。 メモリアクセスの種類によって、セグメントレジスタは自動的にCPUによって選ばれます。命令の引き出し (フェッチ) にはコードセグメント`CS`が使用され、スタック操作(プッシュ・ポップ)にはスタックセグメント`SS`が使用されます。その他の命令では、データセグメント`DS`やエクストラセグメント`ES`が使用されます。その後、自由に使用できる`FS`と`GS`というセグメントレジスタも追加されました。 セグメンテーションの初期バージョンでは、セグメントレジスタは直接オフセットを格納しており、アクセス制御は行われていませんでした。これは後に[プロテクトモード (protected mode) ][_protected mode_]が導入されたことで変更されました。CPUがこのモードで動作している時、セグメント記述子 (ディスクリプタ) 局所 (ローカル) または大域 (グローバル) [**記述子表 (ディスクリプタテーブル) **][_descriptor table_]を格納します。これには(オフセットアドレスに加えて)セグメントのサイズとアクセス権限が格納されます。それぞれのプロセスに対し、メモリアクセスをプロセスのメモリ領域にのみ制限するような大域/局所記述子表をロードすることで、OSはプロセスを互いに隔離できます。 [_protected mode_]: https://en.wikipedia.org/wiki/X86_memory_segmentation#Protected_mode [_descriptor table_]: https://en.wikipedia.org/wiki/Global_Descriptor_Table メモリアドレスを実際にアクセスされる前に変更するという点において、セグメンテーションは今やほぼすべての場所で使われている**仮想メモリ**というテクニックをすでに採用していたと言えます。 ### 仮想メモリ 仮想メモリの背景にある考え方は、下層にある物理的なストレージデバイスからメモリアドレスを抽象化することです。ストレージデバイスに直接アクセスするのではなく、先に何らかの変換ステップが踏まれます。セグメンテーションの場合、この変換ステップとはアクティブなセグメントのオフセットアドレスを追加することです。例えば、オフセット`0x1111000`のセグメントにあるプログラムが`0x1234000`というメモリアドレスにアクセスすると、実際にアクセスされるアドレスは`0x2345000`になります。 この2種類のアドレスを区別するため、変換前のアドレスを **仮想(アドレス)** と、変換後のアドレスを 
**物理(アドレス)** と呼びます。この2種類のアドレスの重要な違いの一つは、物理アドレスは常に同じ一意なメモリ位置を指すということです。いっぽう仮想アドレス(の指す場所)は変換する関数に依存します。二つの異なる仮想アドレスが同じ物理アドレスを指すということは十分にありえます。また、変換関数が異なっていれば、同じ仮想アドレスが別の物理アドレスを示すということもありえます。 この特性が役立つ例として、同じプログラムを2つ並行して実行するという状況が挙げられます。 ![Two virtual address spaces with address 0–150, one translated to 100–250, the other to 300–450](segmentation-same-program-twice.svg) 同じプログラムを2つ実行していますが、別の変換関数が使われています。1つ目の実体 (インスタンス) ではセグメントのオフセットが100なので、0から150の仮想アドレスは100から250に変換されます。2つ目のインスタンスではオフセットが300なので、0から150の仮想アドレスが300から450に変換されます。これにより、プログラムが互いに干渉することなく同じコード、同じ仮想アドレスを使うことができます。 もう一つの利点は、プログラムが全く異なる仮想アドレスを使っていたとしても、物理メモリ上の任意の場所に置けるということです。したがって、OSはプログラムを再コンパイルすることなく利用可能なメモリをフルに活用できます。 ### 断片化 (fragmentation) 物理アドレスと仮想アドレスを分けることにより、セグメンテーションは非常に強力なものとなっています。しかし、これにより断片化という問題が発生します。例として、上で見たプログラムの3つ目を実行したいとしましょう: ![Three virtual address spaces, but there is not enough continuous space for the third](segmentation-fragmentation.svg) 開放されているメモリは十分にあるにも関わらず、プログラムのインスタンスを重ねることなく物理メモリに対応づけることはできません。ここで必要なのは **連続した** メモリであり、開放されているメモリが小さな塊であっては使えないためです。 この断片化に対処する方法の一つは、実行を一時停止し、メモリの使用されている部分を寄せ集めて、変換関数を更新し、実行を再開することでしょう: ![Three virtual address spaces after defragmentation](segmentation-fragmentation-compacted.svg) これで、プログラムの3つ目のインスタンスを開始するのに十分な連続したスペースができました。 このデフラグメンテーションという処理の欠点は、大量のメモリをコピーしなければならず、パフォーマンスを低下させてしまうことです。また、メモリが断片化しすぎる前に定期的に実行しないといけません。そうすると、プログラムが時々一時停止して反応がなくなるので、性能が予測不可能になってしまいます。 ほとんどのシステムでセグメンテーションが用いられなくなった理由の一つに、この断片化の問題があります。実際、x86の64ビットモードでは、セグメンテーションはもはやサポートされていません。代わりに **ページング** が使用されており、これにより断片化の問題は完全に回避されます。 ## ページング ページングの考え方は、仮想メモリ空間と物理メモリ空間の両方を、サイズの固定された小さなブロックに分割するというものです。仮想メモリ空間のブロックは **ページ** と呼ばれ、物理アドレス空間のブロックは **フレーム** と呼ばれます。各ページはフレームに独立してマッピングできるので、大きなメモリ領域を連続していない物理フレームに分割することが可能です。 この方法の利点は、上のメモリ空間断片化の状況をもう一度、セグメンテーションの代わりにページングを使って見てみれば明らかになります: ![With paging the third program instance can be split across many smaller physical areas](paging-fragmentation.svg) 
この例では、ページサイズは50バイトなので、それぞれのメモリ領域が3つのページに分割されます。それぞれのページは個別にフレームに対応付けられるので、連続した仮想メモリ領域を非連続な物理フレームへと対応付けられるのです。これにより、デフラグを事前に実行することなく、3つ目のプログラムのインスタンスを開始できるようになります。 ### 隠された断片化 少ない数の可変なサイズのメモリ領域を使っていたセグメンテーションと比べると、ページングでは大量の小さい固定サイズのメモリ領域を使います。すべてのフレームが同じ大きさなので、「小さすぎて使えないフレーム」などというものは存在せず、したがって断片化も起きません。 というより、**目に見える** 断片化は起きていない、という方が正しいでしょう。**内部 (internal) 断片化**と呼ばれる、目に見えない断片化は依然として起こっています。内部断片化は、すべてのメモリ領域がページサイズの整数倍ぴったりにはならないために生じます。例えば、上の例でサイズが101のプログラムを考えてみてください:この場合でもサイズ50のページが3つ必要で、必要な量より49バイト多く占有します。これらの2種類の断片化を区別するため、セグメンテーションを使うときに起きる断片化は **外部 (external) 断片化** と呼ばれます。 内部断片化が起こるのは残念なことですが、セグメンテーションで発生していた外部断片化よりも優れていることが多いです。確かにメモリ領域は無駄にしますが、デフラグメンテーションをする必要がなく、また断片化の量も予想できるからです(平均するとメモリ領域ごとにページの半分)。 ### ページテーブル 最大で何百万ものページがそれぞれ独立にフレームに対応付けられることを見てきました。この対応付けの情報はどこかに保存されなければなりません。セグメンテーションでは、有効なメモリ領域ごとに個別のセグメントセレクタを使っていましたが、ページングではレジスタよりも遥かに多くのページが使われるので、これは不可能です。代わりにページングでは **ページテーブル** と呼ばれる (テーブル) 構造を使って対応付の情報を保存します。 上の例では、ページテーブルは以下のようになります: ![Three page tables, one for each program instance. For instance 1 the mapping is 0->100, 50->150, 100->200. For instance 2 it is 0->300, 50->350, 100->400. 
For instance 3 it is 0->250, 50->450, 100->500.](paging-page-tables.svg) それぞれのプログラムのインスタンスが独自のページテーブルを持っているのが分かります。現在有効なテーブルへのポインタは、特殊なCPUのレジスタに格納されます。`x86`においては、このレジスタは`CR3`と呼ばれています。それぞれのプログラムのインスタンスを実行する前に、正しいページテーブルを指すポインタをこのレジスタにロードするのはOSの役割です。 それぞれのメモリアクセスにおいて、CPUはテーブルへのポインタをレジスタから読み出し、テーブル内のアクセスされたページから対応するフレームを見つけ出します。これは完全にハードウェア内で行われ、実行しているプログラムからはこの動作は見えません。変換プロセスを高速化するために、多くのCPUアーキテクチャは前回の変換の結果を覚えておく専用のキャッシュを持っています。 アーキテクチャによっては、ページテーブルのエントリは"Flags"フィールドにあるアクセス権限のような属性も保持できます。上の例では、"r/w"フラグがあることにより、このページは読み書きのどちらも可能だということを示しています。 ### 複数層 (Multilevel) ページテーブル 上で見たシンプルなページテーブルは、アドレス空間が大きくなってくると問題が発生します:メモリが無駄になるのです。たとえば、`0`, `1_000_000`, `1_000_050` および `1_000_100`(3ケタごとの区切りとして`_`を用いています)の4つの仮想ページを使うプログラムを考えてみましょう。 ![Page 0 mapped to frame 0 and pages `1_000_000`–`1_000_150` mapped to frames 100–250](single-level-page-table.svg) このプログラムはたった4つしか物理フレームを必要としていないのに、テーブルには100万以上ものエントリが存在してしまっています。空のエントリを省略した場合、変換プロセスにおいてCPUが正しいエントリに直接ジャンプできなくなってしまうので、それはできません(たとえば、4つめのページが4つめのエントリを使っていることが保証されなくなってしまいます)。 この無駄になるメモリを減らせる、 **2層ページテーブル** を使ってみましょう。発想としては、それぞれのアドレス領域に異なるページテーブルを使うというものです。**レベル2** ページテーブルと呼ばれる追加のページテーブルは、アドレス領域と(レベル1の)ページテーブルのあいだの対応を格納します。 これを理解するには、例を見るのが一番です。それぞれのレベル1テーブルは大きさ`10_000`の領域に対応するとします。すると、以下のテーブルが上のマッピングの例に対応するものとなります: ![Page 0 points to entry 0 of the level 2 page table, which points to the level 1 page table T1. The first entry of T1 points to frame 0, the other entries are empty. Pages `1_000_000`–`1_000_150` point to the 100th entry of the level 2 page table, which points to a different level 1 page table T2. 
The first three entries of T2 point to frames 100–250, the other entries are empty.](multilevel-page-table.svg) ページ0は最初の`10_000`バイト領域に入るので、レベル2ページテーブルの最初のエントリを使います。このエントリはT1というレベル1ページテーブルを指し、このページテーブルはページ`0`がフレーム`0`に対応すると指定します。 ページ`1_000_000`, `1_000_050`および`1_000_100`はすべて、`10_000`バイトの大きさの領域100個目に入るので、レベル2ページテーブルの100個目のエントリを使います。このエントリは、T2という別のレベル1テーブルを指しており、このレベル1テーブルはこれらの3つのページをフレーム`100`, `150`および`200`に対応させています。レベル1テーブルにおけるページアドレスには領域のオフセットは含まれていない、つまり例えば、ページ`1_000_050`のエントリは単に`50`である、ということに注意してください。 レベル2テーブルにはまだ100個の空のエントリがありますが、前の100万にくらべればこれはずっと少ないです。このように節約できる理由は、`10_000`から`10_000_000`の、対応付けのないメモリ領域のためのレベル1テーブルを作る必要がないためです。 2層ページテーブルの原理は、3、4、それ以上に多くの層に拡張できます。このとき、ページテーブルレジスタは最も高いレベルのテーブルを指し、そのテーブルは次に低いレベルのテーブルを指し、それはさらに低いレベルのものを、と続きます。そして、レベル1のテーブルは対応するフレームを指します。この原理は一般に **複数層 (multilevel) ** ページテーブルや、 **階層型 (hierarchical) ** ページテーブルと呼ばれます。 ページングと複数層ページテーブルの仕組みが理解できたので、x86_64アーキテクチャにおいてどのようにページングが実装されているのかについて見ていきましょう(以下では、CPUは64ビットモードで動いているとします)。 ## x86_64におけるページング x86_64アーキテクチャは4層ページテーブルを使っており、ページサイズは4KiBです。それぞれのページテーブルは、層によらず512のエントリを持っています。それぞれのエントリの大きさは8バイトなので、それぞれのテーブルは512 * 8B = 4KiBであり、よってぴったり1ページに収まります。 (各)レベルのページテーブルインデックスは、仮想アドレスから直接求められます: ![Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index](x86_64-table-indices-from-address.svg) それぞれのテーブルインデックスは9ビットからなることがわかります。それぞれのテーブルに2^9 = 512エントリあることを考えるとこれは妥当です。最下位の12ビットは4KiBページ内でのオフセット(2^12バイト = 4KiB)です。48ビットから64ビットは捨てられます。つまり、x86_64は48ビットのアドレスにしか対応しておらず、そのため(64ビットアーキテクチャなどとよく呼ばれるが)実際には64ビットではないということです。 [5-level page table]: https://en.wikipedia.org/wiki/Intel_5-level_paging 48ビットから64ビットが捨てられるからといって、任意の値にしてよいということではありません。アドレスを一意にし、5層ページテーブルのような将来の拡張に備えるため、この範囲のすべてのビットは47ビットの値と同じにしないといけません。これは、[2の補数における符号拡張][sign extension in two's complement]によく似ているので、 **符号 (sign) 拡張 (extension) ** とよばれています。アドレスが適切に符号拡張されていない場合、CPUは例外を投げます。 [sign extension in two's complement]: 
https://en.wikipedia.org/wiki/Two's_complement#Sign_extension 近年発売されたIntelのIce LakeというCPUは、[5層ページテーブル][5-level page tables]を使用することもでき、そうすると仮想アドレスが48ビットから57ビットまで延長されるということは書いておく価値があるでしょう。いまの段階で私たちのカーネルをこの特定のCPUに最適化する意味はないので、この記事では標準の4層ページテーブルのみを使うことにします。 [5-level page tables]: https://en.wikipedia.org/wiki/Intel_5-level_paging ### 変換の例 この変換の仕組みをより詳細に理解するために、例を挙げて見てみましょう。 ![An example 4-level page hierarchy with each page table shown in physical memory](x86_64-page-table-translation.svg) 現在有効なレベル4ページテーブルの物理アドレス、つまりレベル4ページテーブルの「 (root) 」は`CR3`レジスタに格納されています。それぞれのページテーブルエントリは、次のレベルのテーブルの物理フレームを指しています。そして、レベル1のテーブルは対応するフレームを指しています。なお、ページテーブル内のアドレスは全て仮想ではなく物理アドレスであることに注意してください。さもなければ、CPUは(変換プロセス中に)それらのアドレスも変換しなくてはならず、無限再帰に陥ってしまうかもしれないからです。 上のページテーブル階層構造は、最終的に(青色の)2つのページへの対応を行っています。ページテーブルのインデックスから、これらの2つのページの仮想アドレスは`0x803FE7F000`と`0x803FE00000`であると推論できます。プログラムがアドレス`0x803FE7F5CE`から読み込もうとしたときに何が起こるかを見てみましょう。まず、アドレスを2進数に変換し、アドレスのページテーブルインデックスとページオフセットが何であるかを決定します: ![The sign extension bits are all 0, the level 4 index is 1, the level 3 index is 0, the level 2 index is 511, the level 1 index is 127, and the page offset is 0x5ce](x86_64-page-table-translation-addresses.png) これらのインデックス情報をもとにページテーブル階層構造を移動して、このアドレスに対応するフレームを決定します: - まず、`CR3`レジスタからレベル4テーブルのアドレスを読み出します。 - レベル4のインデックスは1なので、このテーブルの1つ目のインデックスを見ます。すると、レベル3テーブルはアドレス16KiBに格納されていると分かります。 - レベル3テーブルをそのアドレスから読み出し、インデックス0のエントリを見ると、レベル2テーブルは24KiBにあると教えてくれます。 - レベル2のインデックスは511なので、このページの最後のエントリを見て、レベル1テーブルのアドレスを見つけます。 - レベル1テーブルの127番目のエントリを読むことで、ついに対象のページは12KiB(16進数では0x3000)のフレームに対応づけられていると分かります。 - 最後のステップは、ページオフセットをフレームアドレスに足して、物理アドレスを得ることです。0x3000 + 0x5ce = 0x35ce ![The same example 4-level page hierarchy with 5 additional arrows: "Step 0" from the CR3 register to the level 4 table, "Step 1" from the level 4 entry to the level 3 table, "Step 2" from the level 3 entry to the level 2 table, "Step 3" from the level 2 entry to the level 1 table, and "Step 4" from the level 1 table to the mapped 
frames.](x86_64-page-table-translation-steps.svg) レベル1テーブルにあるこのページの権限は`r`であり、これは読み込み専用という意味です。これらのような権限に対する侵害はハードウェアによって保護されており、このページに書き込もうとした場合は例外が投げられます。より高いレベルのページにおける権限は、下のレベルにおいて可能な権限を制限します。たとえばレベル3エントリを読み込み専用にした場合、下のレベルで読み書きを許可したとしても、このエントリを使うページはすべて書き込み不可になります。 この例ではそれぞれのテーブルの実体 (インスタンス) を1つずつしか使いませんでしたが、普通、それぞれのアドレス空間において、各レベルに対して複数のインスタンスが使われるということは知っておく価値があるでしょう。最大で - 1個のレベル4テーブル - 512個のレベル3テーブル(レベル4テーブルには512エントリあるので) - 512 * 512個のレベル2テーブル(512個のレベル3テーブルそれぞれに512エントリあるので) - 512 * 512 * 512個のレベル1テーブル(それぞれのレベル2テーブルに512エントリあるので) があります。 ### ページテーブルの形式 x86_64アーキテクチャにおけるページテーブルは詰まるところ512個のエントリの配列です。Rustの構文では以下のようになります: ```rust #[repr(align(4096))] pub struct PageTable { entries: [PageTableEntry; 512], } ``` `repr`属性で示されるように、ページテーブルはアラインされる必要があります。つまり4KiBごとの境界に揃えられる必要がある、ということです。この条件により、ページテーブルはつねにページひとつを完全に使うので、エントリをとてもコンパクトにできる最適化が可能になります。 それぞれのエントリは8バイト(64ビット)の大きさであり、以下の形式です: ビット | 名前 | 意味 ------ | ---- | ------- 0 | present | このページはメモリ内にある 1 | writable | このページへの書き込みは許可されている 2 | user accessible | 0の場合、カーネルモードのみこのページにアクセスできる 3 | write through caching | 書き込みはメモリに対して直接行われる 4 | disable cache | このページにキャッシュを使わない 5 | accessed | このページが使われているとき、CPUはこのビットを1にする 6 | dirty | このページへの書き込みが行われたとき、CPUはこのビットを1にする 7 | huge page/null | P1とP4においては0で、P3においては1GiBのページを、P2においては2MiBのページを作る 8 | global | キャッシュにあるこのページはアドレス空間変更の際に初期化されない(CR4レジスタのPGEビットが1である必要がある) 9-11 | available | OSが自由に使える 12-51 | physical address | ページ単位にアラインされた、フレームまたは次のページテーブルの52bit物理アドレス 52-62 | available | OSが自由に使える 63 | no execute | このページにおいてプログラムを実行することを禁じる(EFERレジスタのNXEビットが1である必要がある) 12-51ビットだけが物理フレームアドレスを格納するのに使われていて、残りのビットはフラグやオペレーティングシステムが自由に使うようになっていることがわかります。これが可能なのは、常に4096バイト単位のページに揃え (アライン) られたアドレス(ページテーブルか、対応づけられたフレームの先頭)を指しているからです。これは、0-11ビットは常にゼロであることを意味し、したがってこれらのビットを格納しておく必要はありません。アドレスを使用する前に、ハードウェアがそれらのビットをゼロとして(追加して)やれば良いからです。また、52-63ビットについても格納しておく必要はありません。なぜならx86_64アーキテクチャは52ビットの物理アドレスしかサポートしていないからです(仮想アドレスを48ビットしかサポートしていないのと似ています)。 上のフラグについてより詳しく見てみましょう: - 
`present`フラグは、対応付けられているページとそうでないページを区別します。このフラグは、メインメモリが一杯になったとき、ページを一時的にディスクにスワップしたいときに使うことができます。後でページがアクセスされたら、 **ページフォルト** という特別な例外が発生するので、オペレーティングシステムは不足しているページをディスクから読み出すことでこれに対応し、プログラムを再開します。
- `writable`と`no execute`フラグはそれぞれ、このページの中身が書き込み可能かと、実行可能な命令であるかを制御します。
- `accessed`と`dirty`フラグは、ページへの読み込みか書き込みが行われたときにCPUによって自動的に1にセットされます。この情報はオペレーティングシステムによって活用でき、例えば、どのページをスワップするかや、ページの中身が最後にディスクに保存されて以降に修正されたかを確認できます。
- `write through caching`と`disable cache`フラグで、キャッシュの制御をページごとに独立して行うことができます。
- `user accessible`フラグはページをユーザー空間のコードが利用できるようにします。このフラグが1になっていない場合、CPUがカーネルモードのときにのみアクセスできます。この機能は、ユーザ空間のプログラムが実行している間もカーネル(の使用しているメモリ)を対応付けたままにしておくことで、[システムコール][system calls]を高速化するために使うことができます。しかし、[Spectre]脆弱性を使うと、この機能があるにもかかわらず、ユーザ空間プログラムがこれらのページを読むことができてしまいます。
- `global`フラグは、このページはすべてのアドレス空間で利用可能であり、よってアドレス空間の変更時に変換キャッシュ(TLBに関する下のセクションを読んでください)から取り除く必要がないことをハードウェアに伝えます。このフラグはカーネルコードをすべてのアドレス空間に対応付けるため、一般的に、0にクリアした`user accessible`フラグと一緒に使われます。
- `huge page`フラグを使うと、レベル2か3のページテーブルのエントリが対応付けられたフレームを直接指すようにすることで、より大きいサイズのページを作ることができます。このビットが1のとき、ページの大きさは512倍になるので、レベル2のエントリの場合は2MiB = 512 * 4KiBに、レベル3のエントリの場合は1GiB = 512 * 2MiBにもなります。大きいページを使うことのメリットは、必要な変換キャッシュのラインの数やページテーブルの数が少なくなることです。

[system calls]: https://en.wikipedia.org/wiki/System_call
[Spectre]: https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)

`x86_64`クレートが[ページテーブル][page tables]とその[エントリ][entries]のための型を提供してくれているので、これらの構造体を私達自身で作る必要はありません。

[page tables]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTable.html
[entries]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html

### トランスレーション・ルックアサイド・バッファ

4層ページテーブルを使うと、仮想アドレスを変換するたびに4回メモリアクセスを行わないといけないので、変換のコストは大きくなります。性能改善のために、x86_64アーキテクチャは、直前数回の変換内容を **トランスレーション・ルックアサイド・バッファ (translation lookaside buffer, TLB)** と呼ばれるところにキャッシュします。これにより、前の変換がまだキャッシュされているなら、変換をスキップできます。

他のCPUキャッシュと異なり、TLBは完全に透明ではなく、ページテーブルの内容が変わったときに変換内容を更新したり取り除いたりしてくれません(訳注:キャッシュが透明 (transparent)
であるとは、利用者がキャッシュの存在を意識する必要がないという意味)。つまり、カーネルがページテーブルを変更したときは、カーネル自らTLBを更新しないといけないということです。これを行うために、[`invlpg`]("invalidate page"、ページを無効化の意)という特別なCPU命令があります。これは指定されたページの変換をTLBから取り除き、次のアクセスの際に再び読み込まれるようにします。また、TLBは`CR3`レジスタを再設定することでもflushできます。`CR3`レジスタの再設定は、アドレス空間が変更されたという状況を模擬するのです。`x86_64`クレートの[`tlb`モジュール][`tlb` module]が、両方のやり方のRust関数を提供しています。
    **訳注:** flushは「(溜まった水を)どっと流す」「(トイレなどを)水で洗い流す」という意味の言葉です。そのためコンピュータサイエンスにおいて「キャッシュなどに溜められていたデータを(場合によっては適切な出力先に書き込みながら)削除する」という意味を持つようになりました。ここではどこかに出力しているわけではないので、「初期化」と同じような意味と考えて差し支えないでしょう。
    [`invlpg`]: https://www.felixcloutier.com/x86/INVLPG.html [`tlb` module]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/tlb/index.html ページテーブルを修正したときは毎回TLBをflushしないといけないということはしっかりと覚えておいてください。これを行わないと、CPUは古い変換を使いつづけるかもしれず、これはデバッグの非常に難しい、予測不能なバグに繋がるかもしれないためです。 ## 実装 ひとつ言っていなかったことがあります:**わたしたちのカーネルはすでにページングを使っています**。[Rustでつくる最小のカーネル]["A minimal Rust Kernel"]の記事で追加したブートローダは、すでに私たちのカーネルのすべてのページを物理フレームに対応付けるような4層ページ階層構造を設定しているのです。ブートローダがこれを行う理由は、x86_64の64ビットモードにおいてページングは必須となっているからです。 ["A minimal Rust kernel"]: @/edition-2/posts/02-minimal-rust-kernel/index.ja.md#butoimeziwozuo-ru つまり、私達がカーネルにおいて使ってきたすべてのメモリアドレスは、仮想アドレスだったということです。アドレス`0xb8000`にあるVGAバッファへのアクセスが上手くいっていたのは、ひとえにブートローダがこのメモリページを **恒等対応** させていた、つまり、仮想ページ`0xb8000`を物理フレーム`0xb8000`に対応させていたからです。 ページングにより、境界外メモリアクセスをしてもおかしな物理メモリに書き込むのではなくページフォルト例外を起こすようになっているため、私達のカーネルはすでに比較的安全になっていました。ブートローダはそれぞれのページに正しい権限を設定することさえしてくれるので、コードを含むページだけが実行可能であり、データを含むページだけが書き込み可能になっています。 ### ページフォルト カーネルの外のメモリにアクセスすることによって、ページフォルトを引き起こしてみましょう。まず、通常の[ダブルフォルト][double fault]ではなくページフォルト例外が得られるように、ページフォルト処理関数 (ハンドラ) を作ってIDTに追加しましょう: [double fault]: @/edition-2/posts/06-double-faults/index.ja.md ```rust // in src/interrupts.rs lazy_static! 
{ static ref IDT: InterruptDescriptorTable = { let mut idt = InterruptDescriptorTable::new(); […] idt.page_fault.set_handler_fn(page_fault_handler); // ここを追加 idt }; } use x86_64::structures::idt::PageFaultErrorCode; use crate::hlt_loop; extern "x86-interrupt" fn page_fault_handler( stack_frame: InterruptStackFrame, error_code: PageFaultErrorCode, ) { use x86_64::registers::control::Cr2; println!("EXCEPTION: PAGE FAULT"); println!("Accessed Address: {:?}", Cr2::read()); println!("Error Code: {:?}", error_code); println!("{:#?}", stack_frame); hlt_loop(); } ``` [`CR2`]レジスタは、ページフォルト時にCPUによって自動的に設定されており、その値はアクセスされページフォルトを引き起こした仮想アドレスになっています。`x86_64`クレートの[`Cr2::read`]関数を使ってこれを読み込み出力します。[`PageFaultErrorCode`]型は、ページフォルトを引き起こしたメモリアクセスの種類についてより詳しい情報を提供します(例えば、読み込みと書き込みのどちらによるものなのか、など)。そのため、これも出力します。ページフォルトを解決しない限り実行を継続することはできないので、最後は[`hlt_loop`]に入ります。 [`CR2`]: https://en.wikipedia.org/wiki/Control_register#CR2 [`Cr2::read`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr2.html#method.read [`PageFaultErrorCode`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html [LLVM bug]: https://github.com/rust-lang/rust/issues/57270 [`hlt_loop`]: @/edition-2/posts/07-hardware-interrupts/index.md#the-hlt-instruction それではカーネル外のメモリにアクセスしてみましょう: ```rust // in src/main.rs #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! 
{ println!("Hello World{}", "!"); blog_os::init(); // ここを追加 let ptr = 0xdeadbeaf as *mut u8; unsafe { *ptr = 42; } // ここはこれまでと同じ #[cfg(test)] test_main(); println!("It did not crash!"); blog_os::hlt_loop(); } ``` これを実行すると、ページフォルトハンドラが呼びだされたのを見ることができます: ![EXCEPTION: Page Fault, Accessed Address: VirtAddr(0xdeadbeaf), Error Code: CAUSED_BY_WRITE, InterruptStackFrame: {…}](qemu-page-fault.png) `CR2`レジスタは確かに私達がアクセスしようとしていたアドレスである`0xdeadbeaf`を格納しています。エラーコードが[`CAUSED_BY_WRITE`]なので、この障害 (フォルト) write (書き込み) 操作の実行中に発生したのだと分かります。更に、[1にセットされていないビット][`PageFaultErrorCode`]からも情報を得ることができます。例えば、`PROTECTION_VIOLATION`フラグが1にセットされていないことから、ページフォルトは対象のページが存在しなかったために発生したのだと分かります。 [`CAUSED_BY_WRITE`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.CAUSED_BY_WRITE ページフォルトを起こした時点での命令ポインタは`0x2031b2`であるので、このアドレスはコードページを指しているとわかります。コードページはブートローダによって読み込み専用に指定されているので、このアドレスからの読み込みは大丈夫ですが、このページへの書き込みはページフォルトを起こします。`0xdeadbeaf`へのポインタを`0x2031b2`に変更して、これを試してみましょう。 ```rust // 注意:実際のアドレスは個々人で違うかもしれません。 // あなたのページフォルトハンドラが報告した値を使ってください。 let ptr = 0x2031b2 as *mut u8; // コードページから読み込む unsafe { let x = *ptr; } println!("read worked"); // コードページへと書き込む unsafe { *ptr = 42; } println!("write worked"); ``` 最後の2行をコメントアウトすると、読み込みアクセスだけになるので実行は成功しますが、そうしなかった場合ページフォルトが発生します: ![QEMU with output: "read worked, EXCEPTION: Page Fault, Accessed Address: VirtAddr(0x2031b2), Error Code: PROTECTION_VIOLATION | CAUSED_BY_WRITE, InterruptStackFrame: {…}"](qemu-page-fault-protection.png) "read worked"というメッセージが表示されますが、これは読み込み操作が何のエラーも発生させなかったことを示しています。しかし、"write worked"のメッセージではなく、ページフォルトが発生してしまいました。今回は[`CAUSED_BY_WRITE`]フラグに加えて[`PROTECTION_VIOLATION`]フラグがセットされています。これは、ページは存在していたものの、それに対する今回の操作が許可されていなかったということを示します。今回の場合、ページへの書き込みは、コードページが読み込み専用に指定されているため許可されていませんでした。 [`PROTECTION_VIOLATION`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.PROTECTION_VIOLATION ### ページテーブルへのアクセス 
私達のカーネルがどのように(物理メモリに)対応づけられているのかを定義しているページテーブルを見てみましょう。 ```rust // in src/main.rs #[unsafe(no_mangle)] pub extern "C" fn _start() -> ! { println!("Hello World{}", "!"); blog_os::init(); use x86_64::registers::control::Cr3; let (level_4_page_table, _) = Cr3::read(); println!("Level 4 page table at: {:?}", level_4_page_table.start_address()); […] // test_main(), println(…), hlt_loop() などが続く } ``` `x86_64`クレートの[`Cr3::read`]関数は、現在有効なレベル4ページテーブルを`CR3`レジスタから読みとって返します。この関数は[`PhysFrame`]型と[`Cr3Flags`]型のタプルを返します。私達はフレームにしか興味がないので、タプルの2つ目の要素は無視しました。 [`Cr3::read`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3.html#method.read [`PhysFrame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/frame/struct.PhysFrame.html [`Cr3Flags`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3Flags.html これを実行すると、以下の出力を得ます: ``` Level 4 page table at: PhysAddr(0x1000) ``` というわけで、現在有効なレベル4ページテーブルは、[`PhysAddr`]ラッパ型が示すように、 **物理** メモリのアドレス`0x1000`に格納されています。ここで疑問が生まれます:このテーブルに私達のカーネルからアクセスするにはどうすればいいのでしょう? [`PhysAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.PhysAddr.html ページングが有効なとき、物理メモリに直接アクセスすることはできません。もしそれができたとしたら、プログラムは容易くメモリ保護を回避して他のプログラムのメモリにアクセスできてしまうだろうからです。ですので、テーブルにアクセスする唯一の方法は、アドレス`0x1000`の物理フレームに対応づけられているような仮想ページにアクセスすることです。ページテーブルの存在するフレームへの対応づけは(実用上も必要になる)一般的な問題です。なぜなら、例えば新しいスレッドのためにスタックを割り当てるときなど、カーネルは日常的にページテーブルにアクセスする必要があるためです。 この問題への解決策は次の記事で詳細に論じます。 ## まとめ この記事では2つのメモリ保護技術を紹介しました:セグメンテーションとページングです。前者は可変サイズのメモリ領域を使用するため外部断片化の問題が存在するのに対し、後者は固定サイズのページを使用するためアクセス権限に関して遥かに細やかな制御が可能となっていました。 ページングは、(仮想メモリと物理メモリの)対応情報を1層以上のページテーブルに格納します。x86_64アーキテクチャにおいては4層ページテーブルが使用され、ページサイズは4KiBです。ハードウェアは自動的にページテーブルを辿り、変換の結果をトランスレーション・ルックアサイド・バッファ (TLB) にキャッシュします。このバッファは自動的に更新されない(「透明ではない」)ので、ページテーブルの変更時には明示的にflushする必要があります。 私達のカーネルは既にページングによって動いており、不正なメモリアクセスはページフォルト例外を発生させるということを学びました。現在有効なページテーブルへとアクセスしたかったのですが、CR3レジスタに格納されている物理アドレスはカーネルから直接アクセスできないものであるため、それはできませんでした。 ## 次は? 
次の記事では、私達のカーネルをページングに対応させる方法について説明します。私達のカーネルから物理メモリにアクセスする幾つかの方法を示すので、これらを用いれば私達のカーネルが動作しているページテーブルにアクセスできます。そうすると、仮想アドレスを物理アドレスに変換する関数を実装でき、ページテーブルに新しい対応づけを作れるようになります。 ================================================ FILE: blog/content/edition-2/posts/08-paging-introduction/index.md ================================================ +++ title = "Introduction to Paging" weight = 8 path = "paging-introduction" date = 2019-01-14 [extra] chapter = "Memory Management" +++ This post introduces _paging_, a very common memory management scheme that we will also use for our operating system. It explains why memory isolation is needed, how _segmentation_ works, what _virtual memory_ is, and how paging solves memory fragmentation issues. It also explores the layout of multilevel page tables on the x86_64 architecture. This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-08`][post branch] branch. [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-08 ## Memory Protection One main task of an operating system is to isolate programs from each other. Your web browser shouldn't be able to interfere with your text editor, for example. To achieve this goal, operating systems utilize hardware functionality to ensure that memory areas of one process are not accessible by other processes. There are different approaches depending on the hardware and the OS implementation. As an example, some ARM Cortex-M processors (used for embedded systems) have a [_Memory Protection Unit_] (MPU), which allows you to define a small number (e.g., 8) of memory regions with different access permissions (e.g., no access, read-only, read-write). 
On each memory access, the MPU ensures that the address is in a region with correct access permissions and throws an exception otherwise. By changing the regions and access permissions on each process switch, the operating system can ensure that each process only accesses its own memory and thus isolate processes from each other.

[_Memory Protection Unit_]: https://developer.arm.com/docs/ddi0337/e/memory-protection-unit/about-the-mpu

On x86, the hardware supports two different approaches to memory protection: [segmentation] and [paging].

[segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation
[paging]: https://en.wikipedia.org/wiki/Virtual_memory#Paged_virtual_memory

## Segmentation

Segmentation was introduced as early as 1978, originally to increase the amount of addressable memory. Back then, CPUs used only 16-bit addresses, which limited the amount of addressable memory to 64 KiB. To make more than these 64 KiB accessible, additional segment registers were introduced, each containing an offset address. The CPU automatically added this offset on each memory access, so that up to 1 MiB of memory was accessible.

The segment register is chosen automatically by the CPU depending on the kind of memory access: For fetching instructions, the code segment `CS` is used, and for stack operations (push/pop), the stack segment `SS` is used. Other instructions use the data segment `DS` or the extra segment `ES`. Later, two additional segment registers, `FS` and `GS`, were added, which can be used freely.

In the first version of segmentation, the segment registers directly contained the offset and no access control was performed. This was changed later with the introduction of [_protected mode_]. When the CPU runs in this mode, the segment registers contain an index into a local or global [_descriptor table_], which contains – in addition to an offset address – the segment size and access permissions.
By loading separate global/local descriptor tables for each process, which confine memory accesses to the process's own memory areas, the OS can isolate processes from each other. [_protected mode_]: https://en.wikipedia.org/wiki/X86_memory_segmentation#Protected_mode [_descriptor table_]: https://en.wikipedia.org/wiki/Global_Descriptor_Table By modifying the memory addresses before the actual access, segmentation already employed a technique that is now used almost everywhere: _virtual memory_. ### Virtual Memory The idea behind virtual memory is to abstract away the memory addresses from the underlying physical storage device. Instead of directly accessing the storage device, a translation step is performed first. For segmentation, the translation step is to add the offset address of the active segment. Imagine a program accessing memory address `0x1234000` in a segment with an offset of `0x1111000`: The address that is really accessed is `0x2345000`. To differentiate the two address types, addresses before the translation are called _virtual_, and addresses after the translation are called _physical_. One important difference between these two kinds of addresses is that physical addresses are unique and always refer to the same distinct memory location. Virtual addresses, on the other hand, depend on the translation function. It is entirely possible that two different virtual addresses refer to the same physical address. Also, identical virtual addresses can refer to different physical addresses when they use different translation functions. An example where this property is useful is running the same program twice in parallel: ![Two virtual address spaces with address 0–150, one translated to 100–250, the other to 300–450](segmentation-same-program-twice.svg) Here the same program runs twice, but with different translation functions. 
The first instance has a segment offset of 100, so that its virtual addresses 0–150 are translated to the physical addresses 100–250. The second instance has an offset of 300, which translates its virtual addresses 0–150 to physical addresses 300–450. This allows both programs to run the same code and use the same virtual addresses without interfering with each other.

Another advantage is that programs can now be placed at arbitrary physical memory locations, even if they use completely different virtual addresses. Thus, the OS can utilize the full amount of available memory without needing to recompile programs.

### Fragmentation

The differentiation between virtual and physical addresses makes segmentation really powerful. However, it has the problem of fragmentation. As an example, imagine that we want to run a third copy of the program we saw above:

![Three virtual address spaces, but there is not enough continuous space for the third](segmentation-fragmentation.svg)

There is no way to map the third instance of the program to physical memory without overlapping, even though there is more than enough free memory available. The problem is that we need _continuous_ memory and can't use the small free chunks.

One way to combat this fragmentation is to pause execution, move the used parts of the memory closer together, update the translation, and then resume execution:

![Three virtual address spaces after defragmentation](segmentation-fragmentation-compacted.svg)

Now there is enough continuous space to start the third instance of our program.

The disadvantage of this defragmentation process is that it needs to copy large amounts of memory, which decreases performance. It also needs to be done regularly before the memory becomes too fragmented. This makes performance unpredictable since programs are paused at random times and might become unresponsive.

The fragmentation problem is one of the reasons that segmentation is no longer used by most systems.
In fact, segmentation is not even supported in 64-bit mode on x86 anymore. Instead, _paging_ is used, which completely avoids the fragmentation problem. ## Paging The idea is to divide both the virtual and physical memory space into small, fixed-size blocks. The blocks of the virtual memory space are called _pages_, and the blocks of the physical address space are called _frames_. Each page can be individually mapped to a frame, which makes it possible to split larger memory regions across non-continuous physical frames. The advantage of this becomes visible if we recap the example of the fragmented memory space, but use paging instead of segmentation this time: ![With paging, the third program instance can be split across many smaller physical areas.](paging-fragmentation.svg) In this example, we have a page size of 50 bytes, which means that each of our memory regions is split across three pages. Each page is mapped to a frame individually, so a continuous virtual memory region can be mapped to non-continuous physical frames. This allows us to start the third instance of the program without performing any defragmentation before. ### Hidden Fragmentation Compared to segmentation, paging uses lots of small, fixed-sized memory regions instead of a few large, variable-sized regions. Since every frame has the same size, there are no frames that are too small to be used, so no fragmentation occurs. Or it _seems_ like no fragmentation occurs. There is still some hidden kind of fragmentation, the so-called _internal fragmentation_. Internal fragmentation occurs because not every memory region is an exact multiple of the page size. Imagine a program of size 101 in the above example: It would still need three pages of size 50, so it would occupy 49 bytes more than needed. To differentiate the two types of fragmentation, the kind of fragmentation that happens when using segmentation is called _external fragmentation_. 
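
The arithmetic behind the size-101 example can be written down as a minimal sketch. The function names `pages_needed` and `wasted_bytes` are made up for illustration and are not part of our kernel or of any crate; sizes are plain byte counts, as in the toy example above.

```rust
/// Number of fixed-size pages needed to hold `size` bytes (rounded up).
/// Hypothetical helper for illustration; assumes `page_size > 0`.
pub fn pages_needed(size: u64, page_size: u64) -> u64 {
    (size + page_size - 1) / page_size
}

/// Bytes lost to internal fragmentation: the unused tail of the last page.
pub fn wasted_bytes(size: u64, page_size: u64) -> u64 {
    pages_needed(size, page_size) * page_size - size
}
```

With a page size of 50, `pages_needed(101, 50)` gives 3 and `wasted_bytes(101, 50)` gives 49, which matches the numbers in the text. A region whose size is an exact multiple of the page size wastes nothing.
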
Internal fragmentation is unfortunate but often better than the external fragmentation that occurs with segmentation. It still wastes memory, but does not require defragmentation and makes the amount of fragmentation predictable (on average half a page per memory region). ### Page Tables We saw that each of the potentially millions of pages is individually mapped to a frame. This mapping information needs to be stored somewhere. Segmentation uses an individual segment selector register for each active memory region, which is not possible for paging since there are way more pages than registers. Instead, paging uses a table structure called _page table_ to store the mapping information. For our above example, the page tables would look like this: ![Three page tables, one for each program instance. For instance 1, the mapping is 0->100, 50->150, 100->200. For instance 2, it is 0->300, 50->350, 100->400. For instance 3, it is 0->250, 50->450, 100->500.](paging-page-tables.svg) We see that each program instance has its own page table. A pointer to the currently active table is stored in a special CPU register. On `x86`, this register is called `CR3`. It is the job of the operating system to load this register with the pointer to the correct page table before running each program instance. On each memory access, the CPU reads the table pointer from the register and looks up the mapped frame for the accessed page in the table. This is entirely done in hardware and completely invisible to the running program. To speed up the translation process, many CPU architectures have a special cache that remembers the results of the last translations. Depending on the architecture, page table entries can also store attributes such as access permissions in a flags field. In the above example, the "r/w" flag makes the page both readable and writable. ### Multilevel Page Tables The simple page tables we just saw have a problem in larger address spaces: they waste memory. 
For example, imagine a program that uses the four virtual pages `0`, `1_000_000`, `1_000_050`, and `1_000_100` (we use `_` as a thousands separator): ![Page 0 mapped to frame 0 and pages `1_000_000`–`1_000_150` mapped to frames 100–250](single-level-page-table.svg) It only needs 4 physical frames, but the page table has over a million entries. We can't omit the empty entries because then the CPU would no longer be able to jump directly to the correct entry in the translation process (e.g., it is no longer guaranteed that the fourth page uses the fourth entry). To reduce the wasted memory, we can use a **two-level page table**. The idea is that we use different page tables for different address regions. An additional table called _level 2_ page table contains the mapping between address regions and (level 1) page tables. This is best explained by an example. Let's define that each level 1 page table is responsible for a region of size `10_000`. Then the following tables would exist for the above example mapping: ![Page 0 points to entry 0 of the level 2 page table, which points to the level 1 page table T1. The first entry of T1 points to frame 0; the other entries are empty. Pages `1_000_000`–`1_000_150` point to the 100th entry of the level 2 page table, which points to a different level 1 page table T2. The first three entries of T2 point to frames 100–250; the other entries are empty.](multilevel-page-table.svg) Page 0 falls into the first `10_000` byte region, so it uses the first entry of the level 2 page table. This entry points to level 1 page table T1, which specifies that page `0` points to frame `0`. The pages `1_000_000`, `1_000_050`, and `1_000_100` all fall into the 100th `10_000` byte region, so they use the 100th entry of the level 2 page table. This entry points to a different level 1 page table T2, which maps the three pages to frames `100`, `150`, and `200`. Note that the page address in level 1 tables does not include the region offset. 
For example, the entry for page `1_000_050` is just `50`. We still have 100 empty entries in the level 2 table, but much fewer than the million empty entries before. The reason for these savings is that we don't need to create level 1 page tables for the unmapped memory regions between `10_000` and `1_000_000`. The principle of two-level page tables can be extended to three, four, or more levels. Then the page table register points to the highest level table, which points to the next lower level table, which points to the next lower level, and so on. The level 1 page table then points to the mapped frame. The principle in general is called a _multilevel_ or _hierarchical_ page table. Now that we know how paging and multilevel page tables work, we can look at how paging is implemented in the x86_64 architecture (we assume in the following that the CPU runs in 64-bit mode). ## Paging on x86_64 The x86_64 architecture uses a 4-level page table and a page size of 4 KiB. Each page table, independent of the level, has a fixed size of 512 entries. Each entry has a size of 8 bytes, so each table is 512 * 8 B = 4 KiB large and thus fits exactly into one page. The page table index for each level is derived directly from the virtual address: ![Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index](x86_64-table-indices-from-address.svg) We see that each table index consists of 9 bits, which makes sense because each table has 2^9 = 512 entries. The lowest 12 bits are the offset in the 4 KiB page (2^12 bytes = 4 KiB). Bits 48 to 64 are discarded, which means that x86_64 is not really 64-bit since it only supports 48-bit addresses. Even though bits 48 to 64 are discarded, they can't be set to arbitrary values. Instead, all bits in this range have to be copies of bit 47 in order to keep addresses unique and allow future extensions like the 5-level page table. 
This is called _sign-extension_ because it's very similar to the [sign extension in two's complement]. When an address is not correctly sign-extended, the CPU throws an exception. [sign extension in two's complement]: https://en.wikipedia.org/wiki/Two's_complement#Sign_extension It's worth noting that the recent "Ice Lake" Intel CPUs optionally support [5-level page tables] to extend virtual addresses from 48-bit to 57-bit. Given that optimizing our kernel for a specific CPU does not make sense at this stage, we will only work with standard 4-level page tables in this post. [5-level page tables]: https://en.wikipedia.org/wiki/Intel_5-level_paging ### Example Translation Let's go through an example to understand how the translation process works in detail: ![An example of a 4-level page hierarchy with each page table shown in physical memory](x86_64-page-table-translation.svg) The physical address of the currently active level 4 page table, which is the root of the 4-level page table, is stored in the `CR3` register. Each page table entry then points to the physical frame of the next level table. The entry of the level 1 table then points to the mapped frame. Note that all addresses in the page tables are physical instead of virtual, because otherwise the CPU would need to translate those addresses too (which could cause a never-ending recursion). The above page table hierarchy maps two pages (in blue). From the page table indices, we can deduce that the virtual addresses of these two pages are `0x803FE7F000` and `0x803FE00000`. Let's see what happens when the program tries to read from address `0x803FE7F5CE`. 
First, we convert the address to binary and determine the page table indices and the page offset for the address:

![The sign extension bits are all 0, the level 4 index is 1, the level 3 index is 0, the level 2 index is 511, the level 1 index is 127, and the page offset is 0x5ce](x86_64-page-table-translation-addresses.png)

With these indices, we can now walk the page table hierarchy to determine the mapped frame for the address:

- We start by reading the address of the level 4 table out of the `CR3` register.
- The level 4 index is 1, so we look at the entry with index 1 of that table, which tells us that the level 3 table is stored at address 16 KiB.
- We load the level 3 table from that address and look at the entry with index 0, which points us to the level 2 table at 24 KiB.
- The level 2 index is 511, so we look at the last entry of that table to find out the address of the level 1 table.
- Through the entry with index 127 of the level 1 table, we finally find out that the page is mapped to frame 12 KiB, or 0x3000 in hexadecimal.
- The final step is to add the page offset to the frame address to get the physical address 0x3000 + 0x5ce = 0x35ce.

![The same example 4-level page hierarchy with 5 additional arrows: "Step 0" from the CR3 register to the level 4 table, "Step 1" from the level 4 entry to the level 3 table, "Step 2" from the level 3 entry to the level 2 table, "Step 3" from the level 2 entry to the level 1 table, and "Step 4" from the level 1 table to the mapped frames.](x86_64-page-table-translation-steps.svg)

The permissions for the page in the level 1 table are `r`, which means read-only. The hardware enforces these permissions and would throw an exception if we tried to write to that page. Permissions in higher-level tables restrict the possible permissions in lower levels, so if we set the level 3 entry to read-only, no pages that use this entry can be writable, even if lower levels specify read/write permissions.
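
The index calculation from the example translation above can be sketched in a few lines of Rust. This is only an illustration, assuming 4-level paging with 4 KiB pages; the `Decomposed` type and the `decompose` and `is_canonical` functions are hypothetical names and not part of the `x86_64` crate.

```rust
/// Table indices and page offset of a virtual address
/// (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub struct Decomposed {
    pub p4_index: u16,
    pub p3_index: u16,
    pub p2_index: u16,
    pub p1_index: u16,
    pub page_offset: u16,
}

/// Splits a virtual address into four 9-bit table indices
/// and the 12-bit offset into the 4 KiB page.
pub fn decompose(addr: u64) -> Decomposed {
    Decomposed {
        p4_index: ((addr >> 39) & 0x1ff) as u16,
        p3_index: ((addr >> 30) & 0x1ff) as u16,
        p2_index: ((addr >> 21) & 0x1ff) as u16,
        p1_index: ((addr >> 12) & 0x1ff) as u16,
        page_offset: (addr & 0xfff) as u16,
    }
}

/// Checks that the upper 16 bits are copies of bit 47 (sign extension).
pub fn is_canonical(addr: u64) -> bool {
    // Shift the upper 16 bits out, then shift back arithmetically so that
    // bit 47 is replicated into them, and compare with the original value.
    (((addr << 16) as i64) >> 16) as u64 == addr
}
```

For the address `0x803FE7F5CE`, `decompose` yields the indices 1, 0, 511, and 127 and the offset `0x5ce`. Adding that offset to the frame address `0x3000` found by the walk gives the physical address `0x35ce`.
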
It's important to note that even though this example used only a single instance of each table, there are typically multiple instances of each level in each address space. At maximum, there are:

- one level 4 table,
- 512 level 3 tables (because the level 4 table has 512 entries),
- 512 * 512 level 2 tables (because each of the 512 level 3 tables has 512 entries), and
- 512 * 512 * 512 level 1 tables (512 entries for each level 2 table).

### Page Table Format

Page tables on the x86_64 architecture are basically an array of 512 entries. In Rust syntax:

```rust
#[repr(align(4096))]
pub struct PageTable {
    entries: [PageTableEntry; 512],
}
```

As indicated by the `repr` attribute, page tables need to be page-aligned, i.e., aligned on a 4 KiB boundary. This requirement guarantees that a page table always fills a complete page and allows an optimization that makes entries very compact.

Each entry is 8 bytes (64 bits) large and has the following format:

Bit(s) | Name | Meaning
------ | ---- | -------
0 | present | the page is currently in memory
1 | writable | it's allowed to write to this page
2 | user accessible | if not set, only kernel mode code can access this page
3 | write-through caching | writes go directly to memory
4 | disable cache | no cache is used for this page
5 | accessed | the CPU sets this bit when this page is used
6 | dirty | the CPU sets this bit when a write to this page occurs
7 | huge page/null | must be 0 in P1 and P4, creates a 1 GiB page in P3, creates a 2 MiB page in P2
8 | global | page isn't flushed from caches on address space switch (PGE bit of CR4 register must be set)
9-11 | available | can be used freely by the OS
12-51 | physical address | the page-aligned 52-bit physical address of the frame or the next page table
52-62 | available | can be used freely by the OS
63 | no execute | forbid executing code on this page (the NXE bit in the EFER register must be set)

We see that only bits 12–51 are used to store the physical frame address.
The remaining bits are used as flags or can be freely used by the operating system. This is possible because we always point to a 4096-byte aligned address, either to a page-aligned page table or to the start of a mapped frame. This means that bits 0–11 are always zero, so there is no reason to store these bits because the hardware can just set them to zero before using the address. The same is true for bits 52–63, because the x86_64 architecture only supports 52-bit physical addresses (similar to how it only supports 48-bit virtual addresses).

Let's take a closer look at the available flags:

- The `present` flag differentiates mapped pages from unmapped ones. It can be used to temporarily swap out pages to disk when the main memory becomes full. When the page is accessed subsequently, a special exception called _page fault_ occurs, to which the operating system can react by reloading the missing page from disk and then continuing the program.
- The `writable` and `no execute` flags control whether the contents of the page are writable or contain executable instructions, respectively.
- The `accessed` and `dirty` flags are automatically set by the CPU when a read or write to the page occurs. This information can be leveraged by the operating system, e.g., to decide which pages to swap out or whether the page contents have been modified since the last save to disk.
- The `write-through caching` and `disable cache` flags allow the control of caches for every page individually.
- The `user accessible` flag makes a page available to userspace code; otherwise, it is only accessible when the CPU is in kernel mode. This feature can be used to make [system calls] faster by keeping the kernel mapped while a userspace program is running. However, the [Spectre] vulnerability can allow userspace programs to read these pages nonetheless.
- The `global` flag signals to the hardware that a page is available in all address spaces and thus does not need to be removed from the translation cache (see the section about the TLB below) on address space switches. This flag is commonly used together with a cleared `user accessible` flag to map the kernel code to all address spaces.
- The `huge page` flag allows the creation of pages of larger sizes by letting the entries of the level 2 or level 3 page tables directly point to a mapped frame. With this bit set, the page size increases by a factor of 512 to either 2 MiB = 512 * 4 KiB for level 2 entries or even 1 GiB = 512 * 2 MiB for level 3 entries. The advantage of using larger pages is that fewer lines of the translation cache and fewer page tables are needed.

[system calls]: https://en.wikipedia.org/wiki/System_call
[Spectre]: https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)

The `x86_64` crate provides types for [page tables] and their [entries], so we don't need to create these structures ourselves.

[page tables]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTable.html
[entries]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html

### The Translation Lookaside Buffer

A 4-level page table makes the translation of virtual addresses expensive because each translation requires four memory accesses. To improve performance, the x86_64 architecture caches the last few translations in the so-called _translation lookaside buffer_ (TLB). This allows skipping the translation when it is still cached.

Unlike the other CPU caches, the TLB is not fully transparent and does not update or remove translations when the contents of page tables change. This means that the kernel must manually update the TLB whenever it modifies a page table.
To do this, there is a special CPU instruction called [`invlpg`] (“invalidate page”) that removes the translation for the specified page from the TLB, so that it is loaded again from the page table on the next access. The TLB can also be flushed completely by reloading the `CR3` register, which simulates an address space switch. The `x86_64` crate provides Rust functions for both variants in the [`tlb` module].

[`invlpg`]: https://www.felixcloutier.com/x86/INVLPG.html
[`tlb` module]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/tlb/index.html

It is important to remember to flush the TLB on each page table modification because otherwise, the CPU might keep using the old translation, which can lead to non-deterministic bugs that are very hard to debug.

## Implementation

One thing that we did not mention yet: **Our kernel already runs on paging**. The bootloader that we added in the ["A minimal Rust Kernel"] post has already set up a 4-level paging hierarchy that maps every page of our kernel to a physical frame. The bootloader does this because paging is mandatory in 64-bit mode on x86_64.

["A minimal Rust kernel"]: @/edition-2/posts/02-minimal-rust-kernel/index.md#creating-a-bootimage

This means that every memory address that we used in our kernel was a virtual address. Accessing the VGA buffer at address `0xb8000` only worked because the bootloader _identity mapped_ that memory page, which means that it mapped the virtual page `0xb8000` to the physical frame `0xb8000`.

Paging makes our kernel already relatively safe, since every memory access that is out of bounds causes a page fault exception instead of writing to random physical memory. The bootloader even sets the correct access permissions for each page, which means that only the pages containing code are executable and only data pages are writable.

### Page Faults

Let's try to cause a page fault by accessing some memory outside of our kernel.
First, we create a page fault handler and register it in our IDT, so that we see a page fault exception instead of a generic [double fault]:

[double fault]: @/edition-2/posts/06-double-faults/index.md

```rust
// in src/interrupts.rs

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();

        […]

        idt.page_fault.set_handler_fn(page_fault_handler); // new

        idt
    };
}

use x86_64::structures::idt::PageFaultErrorCode;
use crate::hlt_loop;

extern "x86-interrupt" fn page_fault_handler(
    stack_frame: InterruptStackFrame,
    error_code: PageFaultErrorCode,
) {
    use x86_64::registers::control::Cr2;

    println!("EXCEPTION: PAGE FAULT");
    println!("Accessed Address: {:?}", Cr2::read());
    println!("Error Code: {:?}", error_code);
    println!("{:#?}", stack_frame);
    hlt_loop();
}
```

The [`CR2`] register is automatically set by the CPU on a page fault and contains the accessed virtual address that caused the page fault. We use the [`Cr2::read`] function of the `x86_64` crate to read and print it. The [`PageFaultErrorCode`] type provides more information about the type of memory access that caused the page fault, for example, whether it was caused by a read or write operation. For this reason, we print it too. We can't continue execution without resolving the page fault, so we enter a [`hlt_loop`] at the end.

[`CR2`]: https://en.wikipedia.org/wiki/Control_register#CR2
[`Cr2::read`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr2.html#method.read
[`PageFaultErrorCode`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html
[LLVM bug]: https://github.com/rust-lang/rust/issues/57270
[`hlt_loop`]: @/edition-2/posts/07-hardware-interrupts/index.md#the-hlt-instruction

Now we can try to access some memory outside our kernel:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    // new
    let ptr = 0xdeadbeaf as *mut u8;
    unsafe { *ptr = 42; }

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

When we run it, we see that our page fault handler is called:

![EXCEPTION: Page Fault, Accessed Address: VirtAddr(0xdeadbeaf), Error Code: CAUSED_BY_WRITE, InterruptStackFrame: {…}](qemu-page-fault.png)

The `CR2` register indeed contains `0xdeadbeaf`, the address that we tried to access. The error code tells us through the [`CAUSED_BY_WRITE`] flag that the fault occurred while trying to perform a write operation. It tells us even more through the [bits that are _not_ set][`PageFaultErrorCode`]. For example, the fact that the `PROTECTION_VIOLATION` flag is not set means that the page fault occurred because the target page wasn't present.

[`CAUSED_BY_WRITE`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.CAUSED_BY_WRITE

We see that the current instruction pointer is `0x2031b2`, so we know that this address points to a code page. Code pages are mapped read-only by the bootloader, so reading from this address works but writing causes a page fault. You can try this by changing the `0xdeadbeaf` pointer to `0x2031b2`:

```rust
// Note: The actual address might be different for you. Use the address that
// your page fault handler reports.
let ptr = 0x2031b2 as *mut u8;

// read from a code page
unsafe { let x = *ptr; }
println!("read worked");

// write to a code page
unsafe { *ptr = 42; }
println!("write worked");
```

When we run this code, we see that the read access works, but the write access causes a page fault:

![QEMU with output: "read worked, EXCEPTION: Page Fault, Accessed Address: VirtAddr(0x2031b2), Error Code: PROTECTION_VIOLATION | CAUSED_BY_WRITE, InterruptStackFrame: {…}"](qemu-page-fault-protection.png)

We see that the _"read worked"_ message is printed, which indicates that the read operation did not cause any errors. However, instead of the _"write worked"_ message, a page fault occurs. This time the [`PROTECTION_VIOLATION`] flag is set in addition to the [`CAUSED_BY_WRITE`] flag, which indicates that the page was present, but the operation was not allowed on it. In this case, writes to the page are not allowed since code pages are mapped as read-only.

[`PROTECTION_VIOLATION`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.PROTECTION_VIOLATION

### Accessing the Page Tables

Let's try to take a look at the page tables that define how our kernel is mapped:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    use x86_64::registers::control::Cr3;

    let (level_4_page_table, _) = Cr3::read();
    println!("Level 4 page table at: {:?}", level_4_page_table.start_address());

    […] // test_main(), println(…), and hlt_loop()
}
```

The [`Cr3::read`] function of the `x86_64` crate returns the currently active level 4 page table from the `CR3` register. It returns a tuple of a [`PhysFrame`] and a [`Cr3Flags`] type. We are only interested in the frame, so we ignore the second element of the tuple.
[`Cr3::read`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3.html#method.read
[`PhysFrame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/frame/struct.PhysFrame.html
[`Cr3Flags`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3Flags.html

When we run it, we see the following output:

```
Level 4 page table at: PhysAddr(0x1000)
```

So the currently active level 4 page table is stored at address `0x1000` in _physical_ memory, as indicated by the [`PhysAddr`] wrapper type. The question now is: how can we access this table from our kernel?

[`PhysAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.PhysAddr.html

Accessing physical memory directly is not possible when paging is active, since programs could easily circumvent memory protection and access the memory of other programs otherwise. So the only way to access the table is through some virtual page that is mapped to the physical frame at address `0x1000`. This problem of creating mappings for page table frames is a general problem since the kernel needs to access the page tables regularly, for example, when allocating a stack for a new thread.

Solutions to this problem are explained in detail in the next post.

## Summary

This post introduced two memory protection techniques: segmentation and paging. While the former uses variable-sized memory regions and suffers from external fragmentation, the latter uses fixed-sized pages and allows much more fine-grained control over access permissions.

Paging stores the mapping information for pages in page tables with one or more levels. The x86_64 architecture uses 4-level page tables and a page size of 4 KiB. The hardware automatically walks the page tables and caches the resulting translations in the translation lookaside buffer (TLB). This buffer is not updated transparently and needs to be flushed manually on page table changes.
We learned that our kernel already runs on top of paging and that illegal memory accesses cause page fault exceptions. We tried to access the currently active page tables, but we weren't able to do it because the CR3 register stores a physical address that we can't access directly from our kernel. ## What's next? The next post explains how to implement support for paging in our kernel. It presents different ways to access physical memory from our kernel, which makes it possible to access the page tables that our kernel runs on. At this point, we are able to implement functions for translating virtual to physical addresses and for creating new mappings in the page tables. ================================================ FILE: blog/content/edition-2/posts/08-paging-introduction/index.pt-BR.md ================================================ +++ title = "Introdução à Paginação" weight = 8 path = "pt-BR/paging-introduction" date = 2019-01-14 [extra] chapter = "Gerenciamento de Memória" # Please update this when updating the translation translation_based_on_commit = "9753695744854686a6b80012c89b0d850a44b4b0" # GitHub usernames of the people that translated this post translators = ["richarddalves"] +++ Esta postagem introduz _paginação_, um esquema de gerenciamento de memória muito comum que também usaremos para nosso sistema operacional. Ela explica por que o isolamento de memória é necessário, como _segmentação_ funciona, o que é _memória virtual_, e como paginação resolve problemas de fragmentação de memória. Também explora o layout de tabelas de página multinível na arquitetura x86_64. Este blog é desenvolvido abertamente no [GitHub]. Se você tiver algum problema ou dúvida, abra um issue lá. Você também pode deixar comentários [na parte inferior]. O código-fonte completo desta publicação pode ser encontrado na branch [`post-08`][post branch]. 
[GitHub]: https://github.com/phil-opp/blog_os [na parte inferior]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-08 ## Proteção de Memória Uma tarefa principal de um sistema operacional é isolar programas uns dos outros. Seu navegador web não deveria ser capaz de interferir com seu editor de texto, por exemplo. Para alcançar este objetivo, sistemas operacionais utilizam funcionalidade de hardware para garantir que áreas de memória de um processo não sejam acessíveis por outros processos. Existem diferentes abordagens dependendo do hardware e da implementação do SO. Como exemplo, alguns processadores ARM Cortex-M (usados para sistemas embarcados) têm uma [_Memory Protection Unit_] (MPU), que permite definir um pequeno número (por exemplo, 8) de regiões de memória com diferentes permissões de acesso (por exemplo, sem acesso, somente leitura, leitura-escrita). Em cada acesso à memória, a MPU garante que o endereço está em uma região com permissões de acesso corretas e lança uma exceção caso contrário. Ao mudar as regiões e permissões de acesso em cada troca de processo, o sistema operacional pode garantir que cada processo acesse apenas sua própria memória e assim isole processos uns dos outros. [_Memory Protection Unit_]: https://developer.arm.com/docs/ddi0337/e/memory-protection-unit/about-the-mpu No x86, o hardware suporta duas abordagens diferentes para proteção de memória: [segmentação] e [paginação]. [segmentação]: https://en.wikipedia.org/wiki/X86_memory_segmentation [paginação]: https://en.wikipedia.org/wiki/Virtual_memory#Paged_virtual_memory ## Segmentação Segmentação já foi introduzida em 1978, originalmente para aumentar a quantidade de memória endereçável. A situação naquela época era que CPUs usavam apenas endereços de 16 bits, o que limitava a quantidade de memória endereçável a 64 KiB. 
Para tornar mais que esses 64 KiB acessíveis, registradores de segmento adicionais foram introduzidos, cada um contendo um endereço de deslocamento. A CPU automaticamente adicionava este deslocamento em cada acesso à memória, então até 1 MiB de memória era acessível. O registrador de segmento é escolhido automaticamente pela CPU dependendo do tipo de acesso à memória: Para buscar instruções, o segmento de código `CS` é usado, e para operações de pilha (push/pop), o segmento de pilha `SS` é usado. Outras instruções usam o segmento de dados `DS` ou o segmento extra `ES`. Posteriormente, dois registradores de segmento adicionais, `FS` e `GS`, foram adicionados, que podem ser usados livremente. Na primeira versão de segmentação, os registradores de segmento continham diretamente o deslocamento e nenhum controle de acesso era realizado. Isso mudou posteriormente com a introdução do [_modo protegido_]. Quando a CPU executa neste modo, os descritores de segmento contêm um índice em uma [_tabela de descritores_] local ou global, que contém – além de um endereço de deslocamento – o tamanho do segmento e permissões de acesso. Ao carregar tabelas de descritores globais/locais separadas para cada processo, que confinam acessos à memória às próprias áreas de memória do processo, o SO pode isolar processos uns dos outros. [_modo protegido_]: https://en.wikipedia.org/wiki/X86_memory_segmentation#Protected_mode [_tabela de descritores_]: https://en.wikipedia.org/wiki/Global_Descriptor_Table Ao modificar os endereços de memória antes do acesso real, segmentação já empregava uma técnica que agora é usada em quase todo lugar: _memória virtual_. ### Memória Virtual A ideia por trás da memória virtual é abstrair os endereços de memória do dispositivo de armazenamento físico subjacente. Em vez de acessar diretamente o dispositivo de armazenamento, um passo de tradução é realizado primeiro. Para segmentação, o passo de tradução é adicionar o endereço de deslocamento do segmento ativo. 
Imagine a program accessing memory address `0x1234000` in a segment with an offset of `0x1111000`: The address that is really accessed is `0x2345000`.

To differentiate the two address types, addresses before the translation are called _virtual_, and addresses after the translation are called _physical_. One important difference between these two kinds of addresses is that physical addresses are unique and always refer to the same distinct memory location. Virtual addresses, on the other hand, depend on the translation function. It is entirely possible that two different virtual addresses refer to the same physical address. Also, identical virtual addresses can refer to different physical addresses when they use different translation functions.

An example where this property is useful is running the same program twice in parallel:

![Two virtual address spaces with address 0–150, one translated to 100–250, the other to 300–450](segmentation-same-program-twice.svg)

Here the same program runs twice, but with different translation functions. The first instance has a segment offset of 100, so its virtual addresses 0–150 are translated to the physical addresses 100–250. The second instance has an offset of 300, which translates its virtual addresses 0–150 to physical addresses 300–450. This allows both programs to run the same code and use the same virtual addresses without interfering with each other.

Another advantage is that programs can now be placed at arbitrary physical memory locations, even if they use completely different virtual addresses. Thus, the OS can utilize the full amount of available memory without needing to recompile programs.

### Fragmentation

The differentiation between virtual and physical addresses makes segmentation really powerful. However, it has the problem of fragmentation.
As an example, imagine that we want to run a third copy of the program we saw above:

![Three virtual address spaces, but there is not enough continuous space for the third](segmentation-fragmentation.svg)

There is no way to map the third instance of the program into physical memory without overlapping, even though there is more than enough free memory available. The problem is that we need _continuous_ memory and cannot use the small free chunks.

One way to combat this fragmentation is to pause execution, move the used parts of the memory closer together, update the translation, and then resume execution:

![Three virtual address spaces after defragmentation](segmentation-fragmentation-compacted.svg)

Now there is enough continuous space to start the third instance of our program.

The disadvantage of this defragmentation process is that it needs to copy large amounts of memory, which decreases performance. It also needs to be done regularly before the memory becomes too fragmented. This makes performance unpredictable since programs are paused at random times and might become unresponsive.

The fragmentation problem is one of the reasons that segmentation is no longer used by most systems. In fact, segmentation is not even supported in 64-bit mode on x86 anymore. Instead, _paging_ is used, which completely avoids the fragmentation problem.

## Paging

The idea is to divide both the virtual and the physical memory space into small, fixed-size blocks. The blocks of the virtual memory space are called _pages_, and the blocks of the physical address space are called _frames_. Each page can be individually mapped to a frame, which makes it possible to split larger memory regions across non-continuous physical frames.
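To make the page-to-frame idea concrete, here is a minimal sketch in plain Rust (ordinary user-space code, not kernel code). It assumes a toy page size of 50 bytes, and both the `translate` helper and the slice-based mapping are names introduced only for this illustration:

```rust
/// Toy page size for illustration only (real x86_64 pages are 4 KiB).
const PAGE_SIZE: u64 = 50;

/// Translates a virtual address to a physical address.
///
/// `frames[i]` is the physical start address of the frame that backs
/// page `i`. Returns `None` if the page has no mapping.
pub fn translate(frames: &[u64], virt: u64) -> Option<u64> {
    let page = (virt / PAGE_SIZE) as usize; // which page the address is in
    let offset = virt % PAGE_SIZE; // position inside that page
    frames.get(page).map(|frame_start| frame_start + offset)
}
```

With a hypothetical placement such as `[250, 450, 500]`, the call `translate(&[250, 450, 500], 75)` yields `Some(475)`: address 75 lies 25 bytes into page 1, which is backed by the frame starting at 450. The frames do not have to be adjacent, which is exactly what removes the need for one continuous physical region.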
The advantage of this becomes visible if we recapitulate the example of the fragmented memory space, but use paging instead of segmentation this time:

![With paging, the third program instance can be split across many smaller physical areas.](paging-fragmentation.svg)

In this example, we have a page size of 50 bytes, which means that each of our memory regions is split across three pages. Each page is mapped to a frame individually, so a continuous virtual memory region can be mapped to non-continuous physical frames. This allows us to start the third instance of the program without performing any defragmentation before.

### Hidden Fragmentation

Compared to segmentation, paging uses lots of small, fixed-size memory regions instead of a few large, variable-sized regions. Since every frame has the same size, there are no frames that are too small to be used, so no fragmentation occurs.

Or it _seems_ like no fragmentation occurs. There is still some hidden kind of fragmentation, the so-called _internal fragmentation_. Internal fragmentation occurs because not every memory region is an exact multiple of the page size. Imagine a program of size 101 in the above example: It would still need three pages of size 50, so it would occupy 49 bytes more than needed. To differentiate the two types of fragmentation, the kind of fragmentation that happens when using segmentation is called _external fragmentation_.

Internal fragmentation is unfortunate but often better than the external fragmentation that occurs with segmentation. It still wastes memory, but it does not require defragmentation and makes the amount of fragmentation predictable (on average half a page per memory region).

### Page Tables

We saw that each of the potentially millions of pages is individually mapped to a frame. This mapping information needs to be stored somewhere.
Segmentation uses an individual segment selector register for each active memory region, which is not possible for paging since there are far more pages than registers. Instead, paging uses a table structure called _page table_ to store the mapping information.

For our example above, the page tables would look like this:

![Three page tables, one for each program instance. For instance 1, the mapping is 0->100, 50->150, 100->200. For instance 2, it is 0->300, 50->350, 100->400. For instance 3, it is 0->250, 50->450, 100->500.](paging-page-tables.svg)

We see that each program instance has its own page table. A pointer to the currently active table is stored in a special CPU register. On `x86`, this register is called `CR3`. It is the job of the operating system to load this register with the pointer to the correct page table before running each program instance.

On each memory access, the CPU reads the table pointer from the register and looks up the mapped frame for the accessed page in the table. This is done entirely in hardware and is completely invisible to the running program. To speed up the translation process, many CPU architectures have a special cache that remembers the results of the last translations.

Depending on the architecture, page table entries can also store attributes such as access permissions in a flags field. In the above example, the "r/w" flag makes the page both readable and writable.

### Multilevel Page Tables

The simple page tables we just saw have a problem in larger address spaces: they waste memory.
For example, imagine a program that uses the four virtual pages `0`, `1_000_000`, `1_000_050`, and `1_000_100` (we use `_` as a thousands separator):

![Page 0 mapped to frame 0 and pages `1_000_000`–`1_000_150` mapped to frames 100–250](single-level-page-table.svg)

It only needs 4 physical frames, but the page table has over a million entries. We can't omit the empty entries because then the CPU would no longer be able to jump directly to the correct entry in the translation process (e.g., it is no longer guaranteed that the fourth page uses the fourth entry).

To reduce the wasted memory, we can use a **two-level page table**. The idea is that we use different page tables for different address regions. An additional table called the _level 2_ page table contains the mapping between address regions and (level 1) page tables.

This is best explained by an example. Let's define that each level 1 page table is responsible for a region of size `10_000`. Then the following tables would exist for the above example mapping:

![Page 0 points to entry 0 of the level 2 page table, which points to the level 1 page table T1. The first entry of T1 points to frame 0; the other entries are empty. Pages `1_000_000`–`1_000_150` point to the 100th entry of the level 2 page table, which points to a different level 1 page table T2. The first three entries of T2 point to frames 100–250; the other entries are empty.](multilevel-page-table.svg)

Page 0 falls into the first `10_000`-byte region, so it uses the first entry of the level 2 page table. This entry points to the level 1 page table T1, which specifies that page `0` points to frame `0`.

The pages `1_000_000`, `1_000_050`, and `1_000_100` all fall into the 100th `10_000`-byte region (counting from zero), so they use the 100th entry of the level 2 page table.
This entry points to a different level 1 page table T2, which maps the three pages to frames `100`, `150`, and `200`. Note that the page address in level 1 tables does not include the region offset. For example, the entry for page `1_000_050` is just `50`.

We still have 100 empty entries in the level 2 table, but far fewer than the million empty entries before. The reason for these savings is that we don't need to create level 1 page tables for the unmapped memory regions between `10_000` and `1_000_000`.

The principle of two-level page tables can be extended to three, four, or more levels. Then the page table register points to the highest-level table, which points to the next lower-level table, which points to the next lower level, and so on. The level 1 page table then points to the mapped frame. The principle in general is called a _multilevel_ or _hierarchical_ page table.

Now that we know how paging and multilevel page tables work, we can look at how paging is implemented in the x86_64 architecture (we assume in the following that the CPU runs in 64-bit mode).

## Paging on x86_64

The x86_64 architecture uses a 4-level page table and a page size of 4 KiB. Each page table, independent of the level, has a fixed size of 512 entries. Each entry has a size of 8 bytes, so each table is 512 * 8 B = 4 KiB large and thus fits exactly into one page.

The page table index for each level is derived directly from the virtual address:

![Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index](x86_64-table-indices-from-address.svg)

We see that each table index consists of 9 bits, which makes sense because each table has 2^9 = 512 entries.
The lowest 12 bits are the offset in the 4 KiB page (2^12 bytes = 4 KiB). Bits 48 through 63 are discarded, which means that x86_64 is not really 64-bit since it only supports 48-bit addresses.

Even though bits 48 through 63 are discarded, they can't be set to arbitrary values. Instead, all bits in this range must be copies of bit 47 in order to keep addresses unique and allow future extensions like the 5-level page table. This is called _sign-extension_ because it is very similar to the [sign extension in two's complement]. When an address is not correctly sign-extended, the CPU throws an exception.

[sign extension in two's complement]: https://en.wikipedia.org/wiki/Two's_complement#Sign_extension

It's worth noting that the recent "Ice Lake" Intel CPUs optionally support [5-level page tables] to extend virtual addresses from 48-bit to 57-bit. Given that optimizing our kernel for a specific CPU does not make sense at this stage, we will only work with standard 4-level page tables in this post.

[5-level page tables]: https://en.wikipedia.org/wiki/Intel_5-level_paging

### Example Translation

Let's go through an example to understand how the translation process works in detail:

![An example of a 4-level page hierarchy with each page table shown in physical memory](x86_64-page-table-translation.svg)

The physical address of the currently active level 4 page table, which is the root of the 4-level page table, is stored in the `CR3` register. Each page table entry then points to the physical frame of the next-level table. The entry of the level 1 table then points to the mapped frame. Note that all addresses in the page tables are physical instead of virtual, because otherwise the CPU would need to translate those addresses too (which could cause a never-ending recursion).

The above page table hierarchy maps two pages (in blue).
From the page table indices, we can deduce that the virtual addresses of these two pages are `0x803FE7F000` and `0x803FE00000`. Let's see what happens when the program tries to read from address `0x803FE7F5CE`. First, we convert the address to binary and determine the page table indices and the page offset for the address:

![The sign extension bits are all 0, the level 4 index is 1, the level 3 index is 0, the level 2 index is 511, the level 1 index is 127, and the page offset is 0x5ce](x86_64-page-table-translation-addresses.png)

With these indices, we can now walk the page table hierarchy to determine the mapped frame for the address:

- We start by reading the address of the level 4 table out of the `CR3` register.
- The level 4 index is 1, so we look at the entry with index 1 of that table, which tells us that the level 3 table is stored at address 16 KiB.
- We load the level 3 table from that address and look at the entry with index 0, which points us to the level 2 table at 24 KiB.
- The level 2 index is 511, so we look at the last entry of that table to find out the address of the level 1 table.
- Through the entry with index 127 of the level 1 table, we finally find out that the page is mapped to frame 12 KiB, or 0x3000 in hexadecimal.
- The final step is to add the page offset to the frame address to get the physical address 0x3000 + 0x5ce = 0x35ce.

![The same example 4-level page hierarchy with 5 additional arrows: "Step 0" from the CR3 register to the level 4 table, "Step 1" from the level 4 entry to the level 3 table, "Step 2" from the level 3 entry to the level 2 table, "Step 3" from the level 2 entry to the level 1 table, and "Step 4" from the level 1 table to the mapped frames.](x86_64-page-table-translation-steps.svg)

The permissions for the page in the level 1 table are `r`, which means read-only.
The hardware enforces these permissions and would throw an exception if we tried to write to that page. Permissions in higher-level tables restrict the possible permissions in lower levels, so if we set the level 3 entry to read-only, no pages that use this entry can be writable, even if lower levels specify read/write permissions.

It's important to note that even though this example used only a single instance of each table, there are typically multiple instances of each level in each address space. At maximum, there are:

- one level 4 table,
- 512 level 3 tables (because the level 4 table has 512 entries),
- 512 * 512 level 2 tables (because each of the 512 level 3 tables has 512 entries), and
- 512 * 512 * 512 level 1 tables (512 entries for each level 2 table).

### Page Table Format

Page tables on the x86_64 architecture are basically an array of 512 entries. In Rust syntax:

```rust
#[repr(align(4096))]
pub struct PageTable {
    entries: [PageTableEntry; 512],
}
```

As indicated by the `repr` attribute, page tables need to be page-aligned, i.e., aligned on a 4 KiB boundary. This requirement guarantees that a page table always fills a complete page and allows an optimization that makes entries very compact.
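The layout claim can be checked with a small sketch in plain Rust. Note the assumptions: `PageTableEntry` below is a simplified stand-in (a bare `u64` wrapper) rather than the real type from the `x86_64` crate, and `page_table_layout` is a helper introduced only for this illustration:

```rust
use std::mem::{align_of, size_of};

/// Simplified stand-in for a 64-bit page table entry (an assumption for
/// this sketch, not the `x86_64` crate's type).
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct PageTableEntry(u64);

/// A table of 512 entries, forced onto a 4 KiB boundary.
#[repr(align(4096))]
pub struct PageTable {
    entries: [PageTableEntry; 512],
}

/// Returns the size and alignment of `PageTable` in bytes.
pub fn page_table_layout() -> (usize, usize) {
    (size_of::<PageTable>(), align_of::<PageTable>())
}
```

Here `page_table_layout()` returns `(4096, 4096)`: 512 entries of 8 bytes each fill exactly one 4 KiB page, and the `align` attribute places every table on a 4 KiB boundary.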
Each entry is 8 bytes (64 bits) large and has the following format:

Bit(s) | Name | Meaning
------ | ---- | -------
0 | present | the page is currently in memory
1 | writable | it's allowed to write to this page
2 | user accessible | if not set, only kernel mode code can access this page
3 | write-through caching | writes go directly to memory
4 | disable cache | no cache is used for this page
5 | accessed | the CPU sets this bit when this page is used
6 | dirty | the CPU sets this bit when a write to this page occurs
7 | huge page/null | must be 0 in P1 and P4, creates a 1 GiB page in P3, creates a 2 MiB page in P2
8 | global | page isn't flushed from caches on address space switch (PGE bit of CR4 register must be set)
9-11 | available | can be used freely by the OS
12-51 | physical address | the page-aligned 52-bit physical address of the frame or the next page table
52-62 | available | can be used freely by the OS
63 | no execute | forbid executing code on this page (the NXE bit in the EFER register must be set)

We see that only bits 12–51 are used to store the physical frame address. The remaining bits are used as flags or can be freely used by the operating system. This is possible because we always point to a 4096-byte aligned address, either to a page-aligned page table or to the start of a mapped frame. This means that bits 0–11 are always zero, so there is no reason to store these bits because the hardware can just set them to zero before using the address. The same is true for bits 52–63, because the x86_64 architecture only supports 52-bit physical addresses (similar to how it only supports 48-bit virtual addresses).

Let's take a closer look at the available flags:

- The `present` flag differentiates mapped pages from unmapped ones.
  It can be used to temporarily swap out pages to disk when the main memory becomes full. When the page is accessed subsequently, a special exception called _page fault_ occurs, to which the operating system can react by reloading the missing page from disk and then continuing the program.
- The `writable` and `no execute` flags control whether the contents of the page are writable or contain executable instructions, respectively.
- The `accessed` and `dirty` flags are automatically set by the CPU when a read or write to the page occurs. This information can be leveraged by the operating system, e.g., to decide which pages to swap out or whether the page contents have been modified since the last save to disk.
- The `write-through caching` and `disable cache` flags allow the control of caches for every page individually.
- The `user accessible` flag makes a page available to userspace code, otherwise, it is only accessible when the CPU is in kernel mode. This feature can be used to make [system calls][chamadas de sistema] faster by keeping the kernel mapped while a userspace program is running. However, the [Spectre] vulnerability can allow userspace programs to read these pages nonetheless.
- The `global` flag signals to the hardware that a page is available in all address spaces and thus does not need to be removed from the translation cache (see the section about the TLB below) on address space switches. This flag is commonly used together with a cleared `user accessible` flag to map the kernel code to all address spaces.
- The `huge page` flag allows the creation of pages of larger sizes by letting the entries of the level 2 or level 3 page tables directly point to a mapped frame.
  With this bit set, the page size increases by factor 512 to either 2 MiB = 512 * 4 KiB for level 2 entries or even 1 GiB = 512 * 2 MiB for level 3 entries. The advantage of using larger pages is that fewer lines of the translation cache and fewer page tables are needed.

[chamadas de sistema]: https://en.wikipedia.org/wiki/System_call
[Spectre]: https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)

The `x86_64` crate provides types for [page tables] and their [entries], so we don't need to create these structures ourselves.

[page tables]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTable.html
[entries]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html

### The Translation Lookaside Buffer

A 4-level page table makes the translation of virtual addresses expensive because each translation requires four memory accesses. To improve performance, the x86_64 architecture caches the last few translations in the so-called _translation lookaside buffer_ (TLB). This allows skipping the translation when it is still cached.

Unlike the other CPU caches, the TLB is not fully transparent and does not update or remove translations when the contents of page tables change. This means that the kernel must manually update the TLB whenever it modifies a page table. To do this, there is a special CPU instruction called [`invlpg`] ("invalidate page") that removes the translation for the specified page from the TLB, so that it is loaded again from the page table on the next access. The TLB can also be flushed completely by reloading the `CR3` register, which simulates an address space switch. The `x86_64` crate provides Rust functions for both variants in the [`tlb` module][módulo `tlb`].
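The staleness problem can be illustrated with a toy model in plain Rust. This is only a sketch under loud assumptions: `ToyMmu` and its methods are hypothetical names, the "page table" is a single flat map rather than a 4-level hierarchy, and global pages are ignored. In a real kernel, the `flush` and `flush_all` functions from the `x86_64` crate's `tlb` module would play the role of the toy methods.

```rust
use std::collections::HashMap;

/// Toy model of address translation with a cache in front of the page
/// table. Both maps go from page number to frame number.
pub struct ToyMmu {
    page_table: HashMap<u64, u64>,
    tlb: HashMap<u64, u64>,
}

impl ToyMmu {
    pub fn new() -> Self {
        ToyMmu { page_table: HashMap::new(), tlb: HashMap::new() }
    }

    /// Changes the page table only. As on real hardware, the cached
    /// translation (if any) is NOT updated.
    pub fn map(&mut self, page: u64, frame: u64) {
        self.page_table.insert(page, frame);
    }

    /// Translates a page, preferring a cached result over the page table.
    pub fn translate(&mut self, page: u64) -> Option<u64> {
        if let Some(&frame) = self.tlb.get(&page) {
            return Some(frame); // cache hit: the page table is not consulted
        }
        let frame = *self.page_table.get(&page)?;
        self.tlb.insert(page, frame); // remember the result for next time
        Some(frame)
    }

    /// Models `invlpg`: forget the cached translation of a single page.
    pub fn flush(&mut self, page: u64) {
        self.tlb.remove(&page);
    }

    /// Models reloading `CR3`: forget all cached translations.
    pub fn flush_all(&mut self) {
        self.tlb.clear();
    }
}
```

If page 1 is mapped to frame 100, translated once, and then remapped to frame 200, `translate(1)` keeps returning `Some(100)` until `flush(1)` or `flush_all()` is called. Only after the flush does the new mapping become visible.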
[`invlpg`]: https://www.felixcloutier.com/x86/INVLPG.html
[módulo `tlb`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/tlb/index.html

It is important to remember to flush the TLB on each page table modification because otherwise the CPU might keep using the old translation, which can lead to non-deterministic bugs that are very hard to debug.

## Implementation

One thing that we did not mention yet: **Our kernel already runs on paging**. The bootloader that we added in the ["A Minimal Rust Kernel"] post has already set up a 4-level paging hierarchy that maps every page of our kernel to a physical frame. The bootloader does this because paging is mandatory in 64-bit mode on x86_64.

["A Minimal Rust Kernel"]: @/edition-2/posts/02-minimal-rust-kernel/index.md#creating-a-bootimage

This means that every memory address that we used in our kernel was a virtual address. Accessing the VGA buffer at address `0xb8000` only worked because the bootloader _identity mapped_ that memory page, which means that it mapped the virtual page `0xb8000` to the physical frame `0xb8000`.

Paging already makes our kernel relatively safe, since every memory access that is out of bounds causes a page fault exception instead of writing to random physical memory. The bootloader even sets the correct access permissions for each page, which means that only the pages containing code are executable and only data pages are writable.

### Page Faults

Let's try to cause a page fault by accessing some memory outside of our kernel. First, we create a page fault handler and register it in our IDT, so that we see a page fault exception instead of a generic [double fault]:

[double fault]: @/edition-2/posts/06-double-faults/index.md

```rust
// in src/interrupts.rs

lazy_static!
{
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();

        […]

        idt.page_fault.set_handler_fn(page_fault_handler); // new

        idt
    };
}

use x86_64::structures::idt::PageFaultErrorCode;
use crate::hlt_loop;

extern "x86-interrupt" fn page_fault_handler(
    stack_frame: InterruptStackFrame,
    error_code: PageFaultErrorCode,
) {
    use x86_64::registers::control::Cr2;

    println!("EXCEPTION: PAGE FAULT");
    println!("Accessed Address: {:?}", Cr2::read());
    println!("Error Code: {:?}", error_code);
    println!("{:#?}", stack_frame);
    hlt_loop();
}
```

The [`CR2`] register is automatically set by the CPU on a page fault and contains the accessed virtual address that caused the page fault. We use the [`Cr2::read`] function of the `x86_64` crate to read and print it. The [`PageFaultErrorCode`] type provides more information about the type of memory access that caused the page fault, for example, whether it was caused by a read or write operation. For this reason, we print it too. We can't continue execution without resolving the page fault, so we enter a [`hlt_loop`] at the end.

[`CR2`]: https://en.wikipedia.org/wiki/Control_register#CR2
[`Cr2::read`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr2.html#method.read
[`PageFaultErrorCode`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html
[LLVM bug]: https://github.com/rust-lang/rust/issues/57270
[`hlt_loop`]: @/edition-2/posts/07-hardware-interrupts/index.md#the-hlt-instruction

Now we can try to access some memory outside of our kernel:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
{
    println!("Hello World{}", "!");

    blog_os::init();

    // new
    let ptr = 0xdeadbeaf as *mut u8;
    unsafe { *ptr = 42; }

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

When we run it, we see that our page fault handler is called:

![EXCEPTION: Page Fault, Accessed Address: VirtAddr(0xdeadbeaf), Error Code: CAUSED_BY_WRITE, InterruptStackFrame: {…}](qemu-page-fault.png)

The `CR2` register indeed contains `0xdeadbeaf`, the address that we tried to access. The error code tells us through the [`CAUSED_BY_WRITE`] flag that the fault occurred while trying to perform a write operation. It tells us even more through the [bits that are _not_ set][`PageFaultErrorCode`]. For example, the fact that the `PROTECTION_VIOLATION` flag is not set means that the page fault occurred because the target page wasn't present.

[`CAUSED_BY_WRITE`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.CAUSED_BY_WRITE

We see that the current instruction pointer is `0x2031b2`, so we know that this address points to a code page. Code pages are mapped read-only by the bootloader, so reading from this address works but writing causes a page fault. You can try this by changing the `0xdeadbeaf` pointer to `0x2031b2`:

```rust
// Note: The actual address might be different for you. Use the address that
// your page fault handler reports.
let ptr = 0x2031b2 as *mut u8;

// read from a code page
unsafe { let x = *ptr; }
println!("read worked");

// write to a code page
unsafe { *ptr = 42; }
println!("write worked");
```

When we run this, we see that the read access works, but the write access causes a page fault:

![QEMU with output: "read worked, EXCEPTION: Page Fault, Accessed Address: VirtAddr(0x2031b2), Error Code: PROTECTION_VIOLATION | CAUSED_BY_WRITE, InterruptStackFrame: {…}"](qemu-page-fault-protection.png)

We see that the _"read worked"_ message is printed, which indicates that the read operation did not cause any errors. However, instead of the _"write worked"_ message, a page fault occurs. This time the [`PROTECTION_VIOLATION`] flag is set in addition to the [`CAUSED_BY_WRITE`] flag, which indicates that the page was present, but the operation was not allowed on it. In this case, writes to the page are not allowed since code pages are mapped as read-only.

[`PROTECTION_VIOLATION`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.PROTECTION_VIOLATION

### Accessing the Page Tables

Let's try to take a look at the page tables that define how our kernel is mapped:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    use x86_64::registers::control::Cr3;

    let (level_4_page_table, _) = Cr3::read();
    println!("Level 4 page table at: {:?}", level_4_page_table.start_address());

    […] // test_main(), println(…), and hlt_loop()
}
```

The [`Cr3::read`] function of the `x86_64` crate returns the currently active level 4 page table from the `CR3` register. It returns a tuple of a [`PhysFrame`] and a [`Cr3Flags`] type. We are only interested in the frame, so we ignore the second element of the tuple.
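To see why the read yields two parts, it can help to look at how a raw `CR3` value might be split. The following is a plain-Rust sketch, not the crate's implementation: `split_cr3` and `CR3_ADDR_MASK` are hypothetical names, and the sketch assumes that process-context identifiers (PCID) are disabled, so the low 12 bits carry only flag bits.

```rust
/// Bits 12–51 of the register hold the physical address of the level 4
/// table. The table is page-aligned, so the low 12 address bits are zero
/// and that space is free to carry flags.
const CR3_ADDR_MASK: u64 = 0x000f_ffff_ffff_f000;

/// Splits a raw CR3 value into the physical address of the level 4 table
/// and the low 12 bits (hypothetical helper for illustration).
pub fn split_cr3(raw: u64) -> (u64, u64) {
    (raw & CR3_ADDR_MASK, raw & 0xfff)
}
```

For a raw value of `0x1000`, `split_cr3` returns `(0x1000, 0)`: the level 4 table sits in the frame at physical address `0x1000` and no flag bits are set.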
[`Cr3::read`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3.html#method.read
[`PhysFrame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/frame/struct.PhysFrame.html
[`Cr3Flags`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3Flags.html

When we run it, we see the following output:

```
Level 4 page table at: PhysAddr(0x1000)
```

So the currently active level 4 page table is stored at address `0x1000` in _physical_ memory, as indicated by the [`PhysAddr`] wrapper type. The question now is: how can we access this table from our kernel?

[`PhysAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.PhysAddr.html

Accessing physical memory directly is not possible when paging is active, since programs could easily circumvent memory protection and access the memory of other programs otherwise. So the only way to access the table is through some virtual page that is mapped to the physical frame at address `0x1000`. This problem of creating mappings for page table frames is a general problem since the kernel needs to access the page tables regularly, for example, when allocating a stack for a new thread.

Solutions to this problem are explained in detail in the next post.

## Summary

This post introduced two memory protection techniques: segmentation and paging. While the former uses variable-sized memory regions and suffers from external fragmentation, the latter uses fixed-sized pages and allows much more fine-grained control over access permissions.

Paging stores the mapping information for pages in page tables with one or more levels. The x86_64 architecture uses 4-level page tables and a page size of 4 KiB. The hardware automatically walks the page tables and caches the resulting translations in the translation lookaside buffer (TLB).
This buffer is not updated transparently and needs to be flushed manually on page table changes.

We learned that our kernel already runs on top of paging and that illegal memory accesses cause page fault exceptions. We tried to access the currently active page tables, but we weren't able to do it because the CR3 register stores a physical address that we can't access directly from our kernel.

## What's next?

The next post explains how to implement support for paging in our kernel. It presents different ways to access physical memory from our kernel, which makes it possible to access the page tables that our kernel runs on. At this point, we will be able to implement functions for translating virtual to physical addresses and for creating new mappings in the page tables.

================================================
FILE: blog/content/edition-2/posts/08-paging-introduction/index.zh-CN.md
================================================
+++
title = "A First Look at Memory Paging"
weight = 8
path = "zh-CN/paging-introduction"
date = 2019-01-14

[extra]
# Please update this when updating the translation
translation_based_on_commit = "096c044b4f3697e91d8e30a2e817e567d0ef21a2"
# GitHub usernames of the people that translated this post
translators = ["liuyuran"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = ["JiangengDong"]
+++

This post introduces _paging_, a very common memory management scheme that we will also use for our operating system. It explains why memory isolation is needed, how _segmentation_ works, what _virtual memory_ is, and how paging solves memory fragmentation issues. It also explores the layout of multilevel page tables on the x86_64 architecture.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-08`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-08

## Memory Protection

One of the main tasks of an operating system is to isolate the execution environments of programs from each other. Your web browser shouldn't be able to interfere with your text editor, for example. To achieve this, operating systems use hardware functionality to ensure that one process cannot access the memory areas of another process. The concrete approach depends on the hardware and the OS implementation.

For example, some ARM Cortex-M processors (used for embedded systems) come with a [_Memory Protection Unit_][_Memory
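Protection Unit_] (MPU), which allows you to define a small number of memory regions with different access permissions. On each memory access, the MPU ensures that the address is in a region with the correct access permissions and throws an exception otherwise. By changing the regions and access permissions on each process switch, the operating system can ensure that each process only accesses its own memory and thus isolates processes from each other.

[_Memory Protection Unit_]: https://developer.arm.com/docs/ddi0337/e/memory-protection-unit/about-the-mpu

On x86, the hardware supports two different approaches to memory protection: [segmentation] and [paging].

[segmentation]: https://en.wikipedia.org/wiki/X86_memory_segmentation
[paging]: https://en.wikipedia.org/wiki/Virtual_memory#Paged_virtual_memory

## Segmentation

Segmentation was introduced in 1978, originally to increase the amount of addressable memory. Back then, CPUs only used 16-bit addresses, which limited the amount of addressable memory to 64 KiB. To make more than these 64 KiB accessible, additional segment registers were introduced, each containing an offset address. The CPU automatically added this offset on each memory access, so that up to 1 MiB of memory became accessible.

The segment register is chosen automatically by the CPU depending on the kind of memory access: for fetching instructions, the code segment `CS` is used, and for stack operations (push/pop), the stack segment `SS` is used. Other instructions use the data segment `DS` or the extra segment `ES`. Later, two additional segment registers, `FS` and `GS`, were added, which can be used freely.

In the first version of segmentation, the segment registers directly contained the offset, and no access control was performed. This changed with the introduction of the [_protected mode_]. When the CPU runs in this mode, the segment descriptors contain an index into a local or global [_descriptor table_], which contains, in addition to an offset address, the segment size and access permissions. By loading separate global/local descriptor tables for each process, which confine memory accesses to the process's own memory areas, the OS can isolate processes from each other.

[_protected mode_]: https://en.wikipedia.org/wiki/X86_memory_segmentation#Protected_mode
[_descriptor table_]: https://en.wikipedia.org/wiki/Global_Descriptor_Table

By modifying the memory addresses before the actual access, segmentation already employed a technique that is now used almost everywhere: _virtual memory_.

### Virtual Memory

The idea behind virtual memory is to abstract away the memory addresses from the underlying physical storage device. Instead of directly accessing the storage device, a translation step is performed first. For segmentation, the translation step is to add the offset address of the active segment. Imagine a program accessing memory address `0x1234000` in a segment with an offset of `0x1111000`: the address that is really accessed is `0x2345000`.

To differentiate the two address types, addresses before the translation are called _virtual_, and addresses after the translation are called _physical_. One important difference between them is that physical addresses are unique and always refer to the same distinct memory location. Virtual addresses, on the other hand, depend on the translation function. It is entirely possible that two different virtual addresses refer to the same physical address. Likewise, identical virtual addresses can refer to different physical addresses when they use different offsets.

An example where this property is useful is running the same program twice in parallel:

![Two virtual address spaces with address 0–150, one translated to 100–250, the other to 300–450](segmentation-same-program-twice.svg)

Here the same program runs twice, but with different address offsets (that is, different _segment base addresses_). The first instance has a segment base of 100, so its virtual addresses 0 to 150 are translated to the physical addresses 100 to 250. The second instance has a segment base of 300, which translates its virtual addresses 0 to 150 to the physical addresses 300 to 450. This allows both programs to run the same code and use the same virtual addresses without interfering with each other.

Another advantage is that programs are no longer tied to a specific region of physical memory, because they only depend on their virtual addresses. Thus, the OS can use the full amount of available memory without needing to recompile programs.

### Fragmentation

The differentiation between virtual and physical addresses makes segmentation really powerful. However, it has the problem of fragmentation. As an example, imagine that we want to run a third copy of the program we saw above:

![Three virtual address spaces, but there is not enough continuous space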
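for the third](segmentation-fragmentation.svg)

There is no way to place the third instance of the program in physical memory without overlapping, even though there is more than enough free memory available in total. The problem is that we need _continuous_ memory and cannot use the small free chunks in between.

One way to combat this fragmentation is to pause execution, move the used parts of the memory closer together, update the segment base addresses, and then resume execution:

![Three virtual address spaces after defragmentation](segmentation-fragmentation-compacted.svg)

Now there is enough continuous space to start the third instance of our program.

The disadvantage of this defragmentation process is that it needs to copy large amounts of memory, which decreases performance. It also needs to be done regularly, before the memory becomes too fragmented. This makes performance unpredictable, since programs are paused at random times and might appear unresponsive to the user.

The fragmentation problem is one of the reasons that segmentation is no longer used by most systems. In fact, segmentation is not even supported in 64-bit mode on x86 anymore. Instead, _paging_ is used, which completely avoids the fragmentation problem.

## Paging

The idea is to divide both the virtual and the physical memory space into small, fixed-size blocks. The blocks of the virtual memory space are called _pages_, and the blocks of the physical address space are called _frames_. Each page can be individually mapped to a frame, which makes it possible to split larger memory regions across non-continuous physical frames.

The advantage of this becomes visible if we revisit the example of the fragmented memory space, but use paging instead of segmentation this time:

![With paging the third program instance can be split across many smaller physical areas](paging-fragmentation.svg)

In this example, we have a page size of 50 bytes, which means that each of our memory regions is split across three pages. Each page is mapped to a frame individually, so a continuous virtual memory region can be mapped to non-continuous physical frames. This allows us to start the third instance of the program without performing any defragmentation first.

### Hidden Fragmentation

Compared to segmentation, paging uses lots of small, fixed-sized memory regions instead of a few large, variable-sized regions. Since every frame has the same size, there are no frames that are too small to be used, so no fragmentation occurs.

Or rather, it only _seems_ like no fragmentation occurs. There is still a hidden kind of fragmentation, the so-called _internal fragmentation_. Internal fragmentation occurs because not every memory region is an exact multiple of the page size. Imagine a program of size 101 in the above example: it would still need three pages of size 50, so it would occupy 49 bytes more than needed. To differentiate the two types, the kind of fragmentation that happens with segmentation is called _external fragmentation_.

Internal fragmentation is unfortunate but often better than the external fragmentation that occurs with segmentation. It still wastes memory, but it does not require defragmentation and makes the amount of fragmentation predictable (on average, half a page per memory region).

### Page Tables

We saw that each of the potentially millions of pages is individually mapped to a frame. This mapping information needs to be stored somewhere. Segmentation uses an individual segment register for each active memory region, which is not possible for paging since there are far more pages than registers. Instead, paging uses a table structure called a _page table_ to store the mapping information.

For our above example, the page tables would look like this:

![Three page tables, one for each program instance. For instance 1 the mapping is 0->100, 50->150, 100->200. For instance 2 it is 0->300, 50->350, 100->400.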
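For instance 3 it is 0->250, 50->450, 100->500.](paging-page-tables.svg)

We see that each program instance has its own page table. A pointer to the currently active table is stored in a special CPU register. On `x86`, this register is called `CR3`. It is the job of the operating system to load this register with the pointer to the correct page table before running each program instance.

On each memory access, the CPU reads the table pointer from the register and looks up the mapped frame for the accessed page in the table. This is done entirely in hardware and is completely invisible to the running program. To speed up the translation process, many CPU architectures have a special cache that remembers the results of the last translations.

Depending on the architecture, page table entries can also store attributes such as access permissions in a flags field. In the above example, the "r/w" flag makes the page both readable and writable.

### Multilevel Page Tables

The simple page tables we just saw have a problem in larger address spaces: they waste memory. For example, imagine a program that uses the four virtual pages `0`, `1_000_000`, `1_000_050`, and `1_000_100` (we use `_` as a thousands separator):

![Page 0 mapped to frame 0 and pages `1_000_000`–`1_000_150` mapped to frames 100–250](single-level-page-table.svg)

It only needs 4 physical frames, but the page table has over a million entries. We can't omit the empty entries, because then the CPU would no longer be able to jump directly to the correct entry during translation (for example, it would no longer be guaranteed that the fourth page uses the fourth entry).

To reduce the wasted memory, we can use a **two-level page table**. The idea is that we use different page tables for different address regions. An additional table, called the _level 2_ page table, contains the mapping between address regions and (level 1) page tables.

This is best explained by an example. Let's define that each level 1 page table is responsible for a region of size `10_000`. Then the following tables would exist for the above example mapping:

![Page 0 points to entry 0 of the level 2 page table, which points to the level 1 page table T1. The first entry of T1 points to frame 0, the other entries are empty. Pages `1_000_000`–`1_000_150` point to the 100th entry of the level 2 page table, which points to a different level 1 page table T2. The first three entries of T2 point to frames 100–250, the other entries are empty.](multilevel-page-table.svg)

Page `0` falls into the first `10_000` byte region, so it uses the first entry of the level 2 page table. This entry points to the level 1 page table `T1`, which specifies that page `0` points to frame `0`.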
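The pages `1_000_000`, `1_000_050`, and `1_000_100` all fall into the 100th `10_000` byte region, so they use the 100th entry of the level 2 page table. This entry points to a different level 1 page table, T2, which maps the three pages to frames `100`, `150`, and `200`. Note that the page addresses in level 1 tables do not include the region offset.

We still have 100 empty entries in the level 2 table, but that is far fewer than the million empty entries before. The reason for these savings is that we don't need to create level 1 page tables for the unmapped memory regions between `10_000` and `1_000_000`.

The principle of two-level page tables can be extended to three, four, or more levels. The page table register then points to the highest-level table, which points to the next lower level table, and so on, until the level 1 page table points to the mapped frame. The general principle is called a _multilevel_ or _hierarchical_ page table.

Now that we know how paging and multilevel page tables work, we can look at how paging is implemented in the x86_64 architecture (we assume in the following that the CPU runs in 64-bit mode).

## Paging on x86_64

The x86_64 architecture uses a 4-level page table and a page size of 4 KiB. Each page table, independent of its level, has a fixed size of 512 entries. Each entry has a size of 8 bytes, so each table is 512 * 8 B = 4 KiB large and thus fits exactly into one page.

The page table index for each level is derived directly from the virtual address:

![Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index](x86_64-table-indices-from-address.svg)

We see that each table index consists of 9 bits, which makes sense because each table has 2^9 = 512 entries. The lowest 12 bits are the offset within the 4 KiB page (2^12 bytes = 4 KiB). Bits 48 through 63 are discarded, which means that x86_64 is not really 64-bit, since it only supports 48-bit addresses.

Even though bits 48 through 63 are discarded, they can't be set to arbitrary values. Instead, all bits in this range have to be copies of bit 47 in order to keep addresses unique and to allow future extensions such as 5-level page tables. This is called _sign extension_ because it is very similar to the [sign extension in two's complement]. When an address is not correctly sign-extended, the CPU throws an exception.

[sign extension in two's complement]: https://en.wikipedia.org/wiki/Two's_complement#Sign_extension

It's worth noting that Intel's "Ice Lake" CPUs optionally support [5-level page tables] to extend virtual addresses from 48 bits to 57 bits. Given that optimizing our kernel for a specific CPU does not make sense at this stage, we will only work with standard 4-level page tables in this post.

[5-level page tables]: https://en.wikipedia.org/wiki/Intel_5-level_paging

### Example Translation

Let's go through an example to understand how the translation process works in detail:

![An example 4-level page hierarchy with each page table shown in physical memory](x86_64-page-table-translation.svg)

The physical address of the currently active level 4 page table is stored in the `CR3` register. Each page table entry (except in level 1 tables) then points to the physical frame of the next level table, and the entries of the level 1 table point to the mapped frame. Note that all addresses in the page tables are physical instead of virtual, because otherwise the CPU would need to translate those addresses too, which could cause a never-ending recursion.

The above page table hierarchy maps two pages (in blue). From the page table indices, we can deduce that the virtual addresses of these two pages are `0x803FE7F000` and `0x803FE00000`. Let's see what happens when the program tries to access address `0x803FE7F5CE`. First, we convert the address to binary and determine the page table indices and the page offset for the address:

![The sign extension bits are all 0, the level 4 index is 1, the level 3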
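index is 0, the level 2 index is 511, the level 1 index is 127, and the page offset is 0x5ce](x86_64-page-table-translation-addresses.png)

With these indices, we can now walk the page table hierarchy to determine the mapped frame for the address:

- We start by reading the physical address of the level 4 table out of the `CR3` register.
- The level 4 index is 1, so we look at the entry with index 1 of that table, which tells us that the level 3 table is stored at address 16 KiB.
- We load the level 3 table from that address and look at the entry with index 0, which points us to the level 2 table at 24 KiB.
- We load the level 2 table and look at the entry with index 511, which tells us that the level 1 table is stored at 32 KiB.
- Through the entry with index 127 of the level 1 table, we finally find out that the page is mapped to the frame at 12 KiB, or 0x3000 in hexadecimal.
- The final step is to add the page offset to the frame address to get the physical address 0x3000 + 0x5ce = 0x35ce.

![The same example 4-level page hierarchy with 5 additional arrows: "Step 0" from the CR3 register to the level 4 table, "Step 1" from the level 4 entry to the level 3 table, "Step 2" from the level 3 entry to the level 2 table, "Step 3" from the level 2 entry to the level 1 table, and "Step 4" from the level 1 table to the mapped frames.](x86_64-page-table-translation-steps.svg)

The permissions for the page in the level 1 table are `r`, which means read-only. The hardware enforces these permissions and would throw an exception if we tried to write to that page. Permissions in higher-level tables restrict the possible permissions in lower levels, so if we set the level 3 entry to read-only, no page that uses this entry can be writable, even if lower levels specify read/write permissions.

Note that even though this example used only a single instance of each table, there are typically multiple instances of each level in an address space. At maximum, there are:

- one level 4 table,
- 512 level 3 tables (because the level 4 table has 512 entries),
- 512 * 512 level 2 tables (because each of the 512 level 3 tables has 512 entries), and
- 512 * 512 * 512 level 1 tables (512 entries for each level 2 table).

### Page Table Format

Page tables on the x86_64 architecture are basically an array of 512 entries. In Rust syntax:

```rust
#[repr(align(4096))]
pub struct PageTable {
    entries: [PageTableEntry; 512],
}
```

As indicated by the `repr` attribute, page tables need to be page-aligned, that is, aligned on a 4 KiB boundary. This requirement guarantees that a page table always fills a complete page and allows an optimization that makes entries very compact.

Each entry is 8 bytes (64 bits) large and has the following format:

| Bit(s) | Name                  | Meaning                                                                                      |
| ------ | --------------------- | -------------------------------------------------------------------------------------------- |
| 0      | present               | the page is currently in memory                                                              |
| 1      | writable              | it's allowed to write to this page                                                           |
| 2      | user accessible       | if not set, only kernel mode code can access this page                                       |
| 3      | write through caching | writes go directly to memory                                                                 |
| 4      | disable cache         | no cache is used for this page                                                               |
| 5      | accessed              | the CPU sets this bit when this page is used                                                 |
| 6      | dirty                 | the CPU sets this bit when a write to this page occurs                                       |
| 7      | huge page/null        | must be 0 in P1 and P4, creates a 1 GiB page in P3, creates a 2 MiB page in P2               |
| 8      | global                | page isn't flushed from caches on address space switch (PGE bit of CR4 register must be set) |
| 9-11   | available             | can be used freely by the OS                                                                 |
| 12-51  | physical address      | the page aligned 52bit physical address of the frame or the next page table                  |
| 52-62  | available             | can be used freely by the OS                                                                 |
| 63     | no execute            | forbid executing code on this page (the NXE bit in the EFER register must be set)            |

We see that only bits 12–51 are used to store the physical frame address. The remaining bits are used as flags or can be freely used by the operating system. This is possible because the entry always points to a 4096-byte aligned address, either to a page-aligned page table or to the start of a mapped frame. This means that bits 0–11 are always zero, so there is no reason to store them: the hardware can simply set them to zero before using the address. The same is true for bits 52–63, because the x86_64 architecture only supports 52-bit physical addresses (similar to how it only supports 48-bit virtual addresses).

Let's take a closer look at the available flags:

- The `present` flag differentiates mapped pages from unmapped ones. It can be used to temporarily swap out pages to disk when the main memory becomes full. When such a page is accessed afterward, a special exception called a _page fault_ occurs, to which the operating system can react by reloading the missing page from disk and then continuing the program.
- The `writable` and `no execute` flags control whether the contents of the page are writable or contain executable instructions, respectively.
- The `accessed` and `dirty` flags are automatically set by the CPU when a read or write to the page occurs. This information can be leveraged by the operating system, for example, to decide which pages to swap out or whether the page contents have been modified since the last save to disk.
- The `write through caching` and `disable cache` flags allow the control of caches for every page individually.
- The `user accessible` flag makes a page available to userspace code; otherwise, it is only accessible when the CPU is in kernel mode. This feature can be used to make [system calls] faster by keeping the kernel mapped while a userspace program is running. However, the [Spectre] vulnerability can allow userspace programs to read these pages nonetheless.
- The `global` flag signals to the hardware that a page is available in all address spaces and thus does not need to be removed from the translation cache (see the section about the TLB below) on address space switches. This flag is commonly used together with a cleared `user accessible` flag to map the kernel code into all address spaces.
- The `huge page` flag allows the creation of pages of larger sizes by letting the entries of the level 2 or level 3 page tables directly point to a mapped frame. With this bit set, the page size increases by a factor of 512, to either 2 MiB = 512 * 4 KiB for level 2 entries or even 1 GiB = 512 * 2 MiB for level 3 entries. The advantage of using larger pages is that fewer entries of the translation cache and fewer page tables are needed.

[system calls]: https://en.wikipedia.org/wiki/System_call
[Spectre]: https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)

The `x86_64` crate provides types for [page tables] and their [entries], so we don't need to create these structures ourselves.

[page tables]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTable.html
[entries]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html

### The Translation Lookaside Buffer

A 4-level page table makes the translation of virtual addresses expensive because each translation requires four memory accesses. To improve performance, the x86_64 architecture caches the last few translations in the so-called _translation lookaside buffer_ (TLB). This allows skipping the translation when it is still cached.

Unlike the other CPU caches, the TLB is not fully transparent and does not update or remove translations when the contents of page tables change. This means that the kernel must manually update the TLB whenever it modifies a page table. To do this, there is a special CPU instruction called [`invlpg`] ("invalidate page") that removes the translation for the specified page from the TLB, so that it is loaded again from the page table on the next access. The TLB can also be flushed completely by reloading the `CR3` register, which simulates an address space switch. The `x86_64` crate provides Rust functions for both variants in the [`tlb` module].

[`invlpg`]: https://www.felixcloutier.com/x86/INVLPG.html
[`tlb` module]: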
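https://docs.rs/x86_64/0.14.2/x86_64/instructions/tlb/index.html

It is important to remember to flush the TLB on each page table modification, because otherwise the CPU might keep using the old translation, which can lead to non-deterministic bugs that are very hard to track down and debug.

## Implementation

One thing that we did not mention yet: **our kernel already runs on paging**. The bootloader that we added in the ["A minimal Rust kernel"] post has already set up a 4-level paging hierarchy that maps every page of our kernel to a physical frame. The bootloader does this because paging is mandatory in 64-bit mode on x86_64.

["A minimal Rust kernel"]: @/edition-2/posts/02-minimal-rust-kernel/index.md#creating-a-bootimage

This means that every memory address that we used in our kernel was a virtual address. Accessing the VGA buffer at address `0xb8000` only worked because the bootloader _identity mapped_ that memory page, which means that it mapped the virtual page `0xb8000` to the physical frame `0xb8000`.

Paging already makes our kernel relatively safe, since every out-of-bounds memory access causes a page fault exception instead of touching random physical memory. The bootloader even sets the correct access permissions for each page, which means that only the pages containing code are executable and only data pages are writable.

### Page Faults

Let's try to cause a page fault by accessing some memory outside of our kernel. First, we create a page fault handler and register it in our IDT, so that we see a page fault exception instead of a generic [double fault]:

[double fault]: @/edition-2/posts/06-double-faults/index.md

```rust
// in src/interrupts.rs

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();

        […]

        idt.page_fault.set_handler_fn(page_fault_handler); // new

        idt
    };
}

use x86_64::structures::idt::PageFaultErrorCode;
use crate::hlt_loop;

extern "x86-interrupt" fn page_fault_handler(
    stack_frame: InterruptStackFrame,
    error_code: PageFaultErrorCode,
) {
    use x86_64::registers::control::Cr2;

    println!("EXCEPTION: PAGE FAULT");
    println!("Accessed Address: {:?}", Cr2::read());
    println!("Error Code: {:?}", error_code);
    println!("{:#?}", stack_frame);
    hlt_loop();
}
```

The [`CR2`] register is automatically set by the CPU on a page fault and contains the accessed virtual address that caused the page fault. We use the [`Cr2::read`] function of the `x86_64` crate to read and print it. The [`PageFaultErrorCode`] type provides more information about the type of memory access that caused the page fault, for example, whether it was caused by a read or a write operation. For this reason, we print it too. We can't continue execution without resolving the page fault, so we enter a [`hlt_loop`] at the end.

[`CR2`]: https://en.wikipedia.org/wiki/Control_register#CR2
[`Cr2::read`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr2.html#method.read
[`PageFaultErrorCode`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html
[`hlt_loop`]: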
@/edition-2/posts/07-hardware-interrupts/index.md#the-hlt-instruction

Now we can try to access some memory outside our kernel:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    blog_os::init();

    // new
    let ptr = 0xdeadbeaf as *mut u8;
    unsafe { *ptr = 42; }

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

When we run it, we see that our page fault handler is called:

![EXCEPTION: Page Fault, Accessed Address: VirtAddr(0xdeadbeaf), Error Code: CAUSED_BY_WRITE, InterruptStackFrame: {…}](qemu-page-fault.png)

The `CR2` register indeed contains `0xdeadbeaf`, the address that we tried to access. The error code tells us through the [`CAUSED_BY_WRITE`] flag that the fault occurred while trying to perform a write operation. It tells us even more through the [bits that are _not_ set][`PageFaultErrorCode`]. For example, the fact that the `PROTECTION_VIOLATION` flag is not set means that the page fault occurred because the target page wasn't present.

[`CAUSED_BY_WRITE`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.CAUSED_BY_WRITE

We also see that the current instruction pointer is `0x2031b2`, so we know that this address points to a code page. Code pages are mapped read-only by the bootloader, so reading from this address works, but writing causes a page fault. You can try this by changing the `0xdeadbeaf` pointer to `0x2031b2`:

```rust
// Note: The actual address might be different for you. Use the address that
// your page fault handler reports.
let ptr = 0x2031b2 as *mut u8;

// read from a code page
unsafe { let x = *ptr; }
println!("read worked");

// write to a code page
unsafe { *ptr = 42; }
println!("write worked");
```

When we run it, we see that the read access works, but the write access causes a page fault:

![QEMU with output: "read worked, EXCEPTION: Page Fault, Accessed Address: VirtAddr(0x2031b2), Error Code: PROTECTION_VIOLATION | CAUSED_BY_WRITE, InterruptStackFrame: {…}"](qemu-page-fault-protection.png)

We see that the _"read worked"_ message is printed, which indicates that the read operation did not cause any errors. However, instead of the _"write worked"_ message, a page fault occurs. This time the [`PROTECTION_VIOLATION`] flag is set in addition to the [`CAUSED_BY_WRITE`] flag, which indicates that the page was present, but the operation was not allowed on it. In this case, writes to the page are not allowed since code pages are mapped as read-only.

[`PROTECTION_VIOLATION`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/idt/struct.PageFaultErrorCode.html#associatedconstant.PROTECTION_VIOLATION

### Accessing the Page Tables

Let's try to take a look at the page tables that define how our kernel is mapped:

```rust
// in src/main.rs

#[unsafe(no_mangle)]
pub extern "C" fn _start() -> !
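{
    println!("Hello World{}", "!");

    blog_os::init();

    use x86_64::registers::control::Cr3;

    let (level_4_page_table, _) = Cr3::read();
    println!("Level 4 page table at: {:?}", level_4_page_table.start_address());

    […] // test_main(), println(…), and hlt_loop()
}
```

The [`Cr3::read`] function of the `x86_64` crate returns the currently active level 4 page table from the `CR3` register. It returns a tuple of a [`PhysFrame`] and a [`Cr3Flags`] type. We are only interested in the frame, so we ignore the second element of the tuple.

[`Cr3::read`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3.html#method.read
[`PhysFrame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/frame/struct.PhysFrame.html
[`Cr3Flags`]: https://docs.rs/x86_64/0.14.2/x86_64/registers/control/struct.Cr3Flags.html

When we run it, we see the following output:

```
Level 4 page table at: PhysAddr(0x1000)
```

So the currently active level 4 page table is stored at address `0x1000` in _physical_ memory, as indicated by the [`PhysAddr`] wrapper type. The question now is: how can we access this table from our kernel?

[`PhysAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.PhysAddr.html

Accessing physical memory directly is not possible when paging is active, since programs could otherwise easily circumvent memory protection and access the memory of other programs. So the only way to access the table is through some virtual page that is mapped to the physical frame at address `0x1000`. This problem of creating mappings for page table frames is a general one, since the kernel needs to access the page tables regularly, for example, when allocating a stack for a new thread.

Solutions to this problem are explained in detail in the next post.

## Summary

This post introduced two memory protection techniques: segmentation and paging. While the former uses variable-sized memory regions and suffers from external fragmentation, the latter uses fixed-sized pages and allows much more fine-grained control over access permissions.

Paging stores the mapping information for pages in page tables with one or more levels. The x86_64 architecture uses 4-level page tables and a page size of 4 KiB. The hardware automatically walks the page tables and caches the resulting translations in the translation lookaside buffer (TLB). This buffer is not updated transparently and needs to be flushed manually on page table changes.

We learned that our kernel already runs on top of paging and that illegal memory accesses cause page fault exceptions. We tried to access the currently active page tables, but we weren't able to do it because the CR3 register stores a physical address that we can't access directly from our kernel.

## What's Next?

The next post explains how to implement support for paging in our kernel. It presents different ways to access physical memory from our kernel, which makes it possible to access the page tables that our kernel runs on. At that point, we will be able to implement functions for translating virtual to physical addresses and for creating new mappings in the page tables.

================================================
FILE: blog/content/edition-2/posts/09-paging-implementation/index.es.md
================================================
+++
title = "Paging Implementation"
weight = 9
path = "es/implementacion-de-paginacion"
date = 2019-03-14

[extra]
chapter = "Memory Management"
# GitHub usernames of the people that translated this post
translators = ["dobleuber"]
+++

This post shows how to implement paging support in our kernel.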
It first explores different techniques to make the physical page table frames accessible to the kernel and discusses their respective advantages and drawbacks. It then implements an address translation function and a function to create a new mapping.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-09`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-09

## Introduction

The [previous post] gave an introduction to the concept of paging. It motivated paging by comparing it with segmentation, explained how paging and page tables work, and then introduced the 4-level page table design of `x86_64`. We found out that the bootloader has already set up a page table hierarchy for our kernel, which means that our kernel already runs on virtual addresses. This improves safety, since illegal memory accesses cause page fault exceptions instead of modifying arbitrary physical memory.

[previous post]: @/edition-2/posts/08-paging-introduction/index.md

The post ended with the problem that we [can't access the page tables from our kernel][end of previous post] because they are stored in physical memory and our kernel already runs on virtual addresses. This post explores different approaches to making the page table frames accessible to our kernel. We will discuss the advantages and drawbacks of each approach and then decide on one for our kernel.
[end of previous post]: @/edition-2/posts/08-paging-introduction/index.md#accessing-the-page-tables

To implement the approach, we will need support from the bootloader, so we'll configure it first. Afterward, we will implement a function that traverses the page table hierarchy in order to translate virtual to physical addresses. Finally, we will learn how to create new mappings in the page tables and how to find unused memory frames for creating new page tables.

## Accessing Page Tables {#accediendo-a-las-tablas-de-paginas}

Accessing the page tables from our kernel is not as easy as it may seem. To understand the problem, let's take another look at the example 4-level page table hierarchy from the previous post:

![An example 4-level page hierarchy with each page table shown in physical memory](../paging-introduction/x86_64-page-table-translation.svg)

The important thing here is that each page table entry stores the _physical_ address of the next table. This avoids the need to run a translation for these addresses too, which would be bad for performance and could easily cause endless translation loops.

The problem for us is that we can't directly access physical addresses from our kernel, since our kernel also runs on top of virtual addresses. For example, when we access address `4 KiB`, we access the _virtual_ address `4 KiB`, not the _physical_ address `4 KiB` where the level 4 page table is stored. When we want to access the physical address `4 KiB`, we can only do so through some virtual address that maps to it.

So in order to access page table frames, we need to map some virtual pages to them. There are different ways to create these mappings that all allow us to access arbitrary page table frames.
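
To make the problem more tangible, here is a minimal toy model in plain Rust. It is meant to run as an ordinary user-space program, not inside the kernel. The `ToyAddressSpace` type, its methods, and the concrete addresses are hypothetical names and values made up for this sketch; they are not part of the `x86_64` crate or the bootloader. The model only tracks which virtual page is backed by which physical frame and shows that a physical address is reachable only if some virtual page happens to map to its frame:

```rust
use std::collections::BTreeMap;

const PAGE_SIZE: u64 = 4096;

/// Toy address space (hypothetical): start of a virtual page -> start of
/// the physical frame that backs it.
struct ToyAddressSpace {
    mappings: BTreeMap<u64, u64>,
}

impl ToyAddressSpace {
    fn new() -> Self {
        Self { mappings: BTreeMap::new() }
    }

    /// Maps the virtual page starting at `page` to the frame starting at `frame`.
    fn map(&mut self, page: u64, frame: u64) {
        self.mappings.insert(page, frame);
    }

    /// Translates a virtual address; `None` models a page fault.
    fn translate(&self, virt: u64) -> Option<u64> {
        let page = virt & !(PAGE_SIZE - 1);
        let offset = virt & (PAGE_SIZE - 1);
        self.mappings.get(&page).map(|frame| frame + offset)
    }

    /// Finds some virtual address through which the physical address `phys`
    /// can be reached, if any page maps to its frame.
    fn virt_for_phys(&self, phys: u64) -> Option<u64> {
        let frame = phys & !(PAGE_SIZE - 1);
        let offset = phys & (PAGE_SIZE - 1);
        self.mappings
            .iter()
            .find(|(_, mapped)| **mapped == frame)
            .map(|(page, _)| page + offset)
    }
}
```

In this model, mapping the virtual page `0x1000` to some unrelated frame means that accessing virtual address `4 KiB` never touches the physical frame at `4 KiB`, and `virt_for_phys(0x1000)` returns `None` until a page is explicitly mapped to that frame. The approaches below differ only in how and where they create such a mapping.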
### Identity Mapping

A simple solution is to **identity map all page tables**:

![A virtual and a physical address space with various virtual pages mapped to the physical frame with the same address](identity-mapped-page-tables.svg)

In this example, we see various identity-mapped page table frames. This way, the physical addresses of page tables are also valid virtual addresses, so we can easily access the page tables of all levels starting from the CR3 register.

However, it clutters the virtual address space and makes it more difficult to find continuous memory regions of larger sizes. For example, imagine that we want to create a virtual memory region of size 1000 KiB in the above graphic, e.g., for [memory-mapping a file]. We can't start the region at `28 KiB` because it would collide with the already mapped page at `1004 KiB`. So we have to look further until we find a large enough unmapped area, for example at `1008 KiB`. This is a fragmentation problem similar to the one with [segmentation].

[memory-mapping a file]: https://en.wikipedia.org/wiki/Memory-mapped_file
[segmentation]: @/edition-2/posts/08-paging-introduction/index.md#fragmentation

Equally, it makes it much more difficult to create new page tables, because we need to find physical frames whose corresponding pages aren't already in use. For example, let's assume that we reserved the _virtual_ 1000 KiB memory region starting at `1008 KiB` for our memory-mapped file. Now we can't use any frame with a _physical_ address between `1000 KiB` and `2008 KiB` anymore, because we can't identity map it.

### Map at a Fixed Offset

To avoid the problem of cluttering the virtual address space, we can **use a separate memory region for page table mappings**.
So instead of identity mapping page table frames, we map them at a fixed offset in the virtual address space. For example, the offset could be 10 TiB:

![The same figure as for the identity mapping, but each mapped virtual page is offset by 10 TiB.](page-tables-mapped-at-offset.svg)

By using the virtual memory in the range `10 TiB..(10 TiB + physical memory size)` exclusively for page table mappings, we avoid the collision problems of the identity mapping. Reserving such a large region of the virtual address space is only possible if the virtual address space is much larger than the physical memory size. This isn't a problem on `x86_64`, since the 48-bit address space is 256 TiB large.

This approach still has the disadvantage that we need to create a new mapping whenever we create a new page table. Also, it does not allow accessing page tables of other address spaces, which would be useful when creating a new process.

### Map the Complete Physical Memory

We can solve these problems by **mapping the complete physical memory** instead of only page table frames:

![The same figure as for the offset mapping, but every physical frame has a mapping (at 10 TiB + X) instead of only page table frames.](map-complete-physical-memory.svg)

This approach allows our kernel to access arbitrary physical memory, including page table frames of other address spaces. The reserved virtual memory range has the same size as before, with the difference that it no longer contains unmapped pages.

The disadvantage of this approach is that additional page tables are needed for storing the mapping of the physical memory.
These page tables need to be stored somewhere, so they use up a part of physical memory, which can be a problem on devices with a small amount of memory.

On `x86_64`, however, we can use [huge pages] with a size of 2 MiB for the mapping, instead of the default 4 KiB pages. This way, mapping 32 GiB of physical memory only requires 132 KiB for page tables, since only one level 3 table and 32 level 2 tables are needed (each level 2 table maps 512 * 2 MiB = 1 GiB, which gives 33 tables of 4 KiB each). Huge pages are also more cache efficient, since they use fewer entries in the translation lookaside buffer (TLB).

[huge pages]: https://en.wikipedia.org/wiki/Page_%28computer_memory%29#Multiple_page_sizes

### Temporary Mapping

For devices with very small amounts of physical memory, we could **map the page table frames only temporarily** when we need to access them. To be able to create the temporary mappings, we only need a single identity-mapped level 1 table:

![A virtual and a physical address space with an identity mapped level 1 table, which maps its 0th entry to the level 2 table frame, thereby mapping that frame to the page with address 0](temporarily-mapped-page-tables.svg)

The level 1 table in this graphic controls the first 2 MiB of the virtual address space. This is because it is reachable by starting at the CR3 register and following the 0th entry in the level 4, level 3, and level 2 page tables. The entry with index `8` maps the virtual page at address `32 KiB` to the physical frame at address `32 KiB`, thereby identity mapping the level 1 table itself. The graphic shows this identity mapping by the horizontal arrow at `32 KiB`.

By writing to the identity-mapped level 1 table, our kernel can create up to 511 temporary mappings (512 minus the entry required for the identity mapping).
In the above example, the kernel created two such temporary mappings:

- By mapping the entry with index 0 of the level 1 table to the frame with address `24 KiB`, it created a temporary mapping of the virtual page at `0 KiB` to the physical frame of the level 2 page table, indicated by the dashed arrow.
- By mapping the entry with index 9 of the level 1 table to the frame with address `4 KiB`, it created a temporary mapping of the virtual page at `36 KiB` to the physical frame of the level 4 page table, indicated by the dashed arrow.

Now the kernel can access the level 2 page table by writing to page `0 KiB` and the level 4 page table by writing to page `36 KiB`.

The process for accessing an arbitrary page table frame with temporary mappings would be:

- Search for a free entry in the identity-mapped level 1 table.
- Map that entry to the physical frame of the page table that we want to access.
- Access the target frame through the virtual page that maps to the entry.
- Set the entry back to unused, thereby removing the temporary mapping again.

This approach reuses the same 512 virtual pages for creating the mappings and thus requires only 4 KiB of physical memory. The drawback is that it is a bit cumbersome, especially since a new mapping might require modifications to multiple table levels, which means that we would need to repeat the above process multiple times.

### Recursive Page Tables

Another interesting approach, which requires no additional page tables at all, is to **map the page table recursively**. The idea behind this approach is to map an entry of the level 4 page table to the level 4 table itself. By doing this, we effectively reserve a part of the virtual address space and map all current and future page table frames to that space.
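
Before going through the figures, the effect of such a self-referencing entry can be sketched with a toy page table walk in plain Rust (a user-space model, not kernel code). Everything here is an assumption made for illustration: the `ToyMemory` type, the `walk` function, and the simplification that an entry is just the address of the next frame without any flags. The frame addresses and indices mirror the example hierarchy from the previous post (level 4 table at 4 KiB, level 3 at 16 KiB, level 2 at 24 KiB, level 1 at 32 KiB, mapped frame at 12 KiB), with the extra recursive entry at index 511:

```rust
use std::collections::HashMap;

/// Sparse toy page table (hypothetical): entry index -> physical address
/// of the frame the entry points to. Real entries also carry flags.
type Table = HashMap<usize, u64>;

/// Toy physical memory that only holds page table frames.
struct ToyMemory {
    tables: HashMap<u64, Table>,
}

/// Physical address of the level 4 table in the example.
const CR3: u64 = 0x1000;
/// Index of the level 4 entry that points back to the level 4 table.
const RECURSIVE_INDEX: usize = 511;

impl ToyMemory {
    /// Follows four table indices starting at `cr3`, like the hardware walk,
    /// and returns the frame that the last entry points to. The walk does
    /// not know which level it is on; it just follows entries four times.
    fn walk(&self, cr3: u64, indices: [usize; 4]) -> Option<u64> {
        let mut frame = cr3;
        for index in indices {
            frame = *self.tables.get(&frame)?.get(&index)?;
        }
        Some(frame)
    }
}

/// Builds the example hierarchy plus the recursive level 4 entry.
fn example_memory() -> ToyMemory {
    let mut tables: HashMap<u64, Table> = HashMap::new();
    // Level 4 table at 4 KiB: entry 1 -> level 3, entry 511 -> itself.
    tables.insert(0x1000, HashMap::from([(1, 0x4000), (RECURSIVE_INDEX, 0x1000)]));
    // Level 3 table at 16 KiB: entry 0 -> level 2.
    tables.insert(0x4000, HashMap::from([(0, 0x6000)]));
    // Level 2 table at 24 KiB: entry 511 -> level 1.
    tables.insert(0x6000, HashMap::from([(511, 0x8000)]));
    // Level 1 table at 32 KiB: entry 127 -> mapped frame at 12 KiB.
    tables.insert(0x8000, HashMap::from([(127, 0x3000)]));
    ToyMemory { tables }
}
```

In this model, the ordinary index sequence `[1, 0, 511, 127]` ends at the mapped frame `0x3000`, while `[511, 1, 0, 511]` ends at `0x8000`, the frame of the level 1 table. Prepending the recursive index more often ends the walk at the level 2, level 3, or level 4 table frame instead, which is exactly the shortening of the walk described in the rest of this section.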
Let's go through an example to understand how this all works:

![An example 4-level page hierarchy with each page table shown in physical memory. Entry 511 of the level 4 table is mapped to frame 4KiB, the frame of the level 4 table itself.](recursive-page-table.png)

The only difference to the [example at the beginning of this post] is the additional entry at index `511` in the level 4 table, which is mapped to the physical frame `4 KiB`, the frame of the level 4 table itself.

[example at the beginning of this post]: #accediendo-a-las-tablas-de-paginas

By letting the CPU follow this entry on a translation, it doesn't reach a level 3 table, but the same level 4 table again. This is similar to a recursive function that calls itself; therefore, this table is called a _recursive page table_. The important thing is that the CPU assumes that every entry in the level 4 table points to a level 3 table, so it now treats the level 4 table as a level 3 table. This works because tables of all levels have the exact same layout on `x86_64`.

By following the recursive entry one or multiple times before we start the actual translation, we can effectively shorten the number of levels that the CPU traverses. For example, if we follow the recursive entry once and then proceed to the level 3 table, the CPU thinks that the level 3 table is a level 2 table. Going further, it treats the level 2 table as a level 1 table and the level 1 table as the mapped frame. This means that we can now read and write the level 1 page table because the CPU thinks that it is the mapped frame.
The graphic below illustrates the five translation steps:

![The above example 4-level page hierarchy with 5 arrows: "Step 0" from CR3 to the level 4 table, "Step 1" from the level 4 table to the level 4 table, "Step 2" from the level 4 table to the level 3 table, "Step 3" from the level 3 table to the level 2 table, and "Step 4" from the level 2 table to the level 1 table.](recursive-page-table-access-level-1.png)

Similarly, we can follow the recursive entry twice before starting the translation to reduce the number of traversed levels to two:

![The same 4-level page hierarchy with the following 4 arrows: "Step 0" from CR3 to the level 4 table, "Steps 1&2" from the level 4 table to the level 4 table, "Step 3" from the level 4 table to the level 3 table, and "Step 4" from the level 3 table to the level 2 table.](recursive-page-table-access-level-2.png)

Let's go through it step by step: First, the CPU follows the recursive entry on the level 4 table and thinks that it reaches a level 3 table. Then it follows the recursive entry again and thinks that it reaches a level 2 table. But in reality, it is still on the level 4 table. When the CPU now follows a different entry, it lands on a level 3 table but thinks it is already on a level 1 table. So while the next entry points to a level 2 table, the CPU thinks that it points to the mapped frame, which allows us to read and write the level 2 table.

Accessing the tables of levels 3 and 4 works in the same way. To access the level 3 table, we follow the recursive entry three times, tricking the CPU into thinking it is already on a level 1 table. Then we follow another entry and reach a level 3 table, which the CPU treats as a mapped frame.
To access the level 4 table itself, we just follow the recursive entry four times until the CPU treats the level 4 table itself as the mapped frame (in blue in the graphic below).

![The same 4-level page hierarchy with the following 3 arrows: "Step 0" from CR3 to the level 4 table, "Steps 1,2,3" from the level 4 table to the level 4 table, and "Step 4" from the level 4 table to the level 3 table. In blue, the alternative "Steps 1,2,3,4" arrow from the level 4 table to the level 4 table.](recursive-page-table-access-level-3.png)

It might take some time to wrap your head around the concept, but it works quite well in practice.

In the following section, we explain how to construct virtual addresses for following the recursive entry one or multiple times. We will not use recursive paging for our implementation, so you don't need to read it to continue with the post. If it interests you, just click on _"Address Calculation"_ to expand it.

---

#### Address Calculation

We saw that we can access tables of all levels by following the recursive entry once or multiple times before the actual translation. Since the indexes into the tables of the four levels are derived directly from the virtual address, we need to construct special virtual addresses for this technique. Remember, the page table indexes are derived from the address in the following way:

![Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index](../paging-introduction/x86_64-table-indices-from-address.svg)

Let's assume that we want to access the level 1 page table that maps a specific page. As we learned above, this means that we have to follow the recursive entry once before continuing with the level 4, level 3, and level 2 indexes. To do that, we move each block of the address one block to the right and set the original level 4 index to the index of the recursive entry:

![Bits 0–12 are the offset into the level 1 table frame, bits 12–21 the level 2 index, bits 21–30 the level 3 index, bits 30–39 the level 4 index, and bits 39–48 the index of the recursive entry](table-indices-from-address-recursive-level-1.svg)

To access the level 2 table of that page, we move each index block two blocks to the right and set both the blocks of the original level 4 index and the original level 3 index to the index of the recursive entry:

![Bits 0–12 are the offset into the level 2 table frame, bits 12–21 the level 3 index, bits 21–30 the level 4 index, and bits 30–39 and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-2.svg)

Accessing the level 3 table works by moving each block three blocks to the right and using the recursive index for the original level 4, level 3, and level 2 address blocks:

![Bits 0–12 are the offset into the level 3 table frame, bits 12–21 the level 4 index, and bits 21–30, bits 30–39 and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-3.svg)

Finally, we can access the level 4 table by moving each block four blocks to the right and using the recursive index for all address blocks except for the offset:

![Bits 0–12 are the offset into the level 4 table frame and bits 12–21, bits 21–30, bits 30–39 and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-4.svg)

We can now calculate virtual addresses for the page tables of all four levels. We can even calculate an address that points exactly to a specific page table entry by multiplying its index by 8, the size of a page table entry.

The table below summarizes the address structure for accessing the different kinds of frames:

| Virtual Address for | Address Structure ([octal])      |
| ------------------- | -------------------------------- |
| Page                | `0o_SSSSSS_AAA_BBB_CCC_DDD_EEEE` |
| Level 1 Table Entry | `0o_SSSSSS_RRR_AAA_BBB_CCC_DDDD` |
| Level 2 Table Entry | `0o_SSSSSS_RRR_RRR_AAA_BBB_CCCC` |
| Level 3 Table Entry | `0o_SSSSSS_RRR_RRR_RRR_AAA_BBBB` |
| Level 4 Table Entry | `0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA` |

[octal]: https://en.wikipedia.org/wiki/Octal

Here, `AAA` is the level 4 index, `BBB` the level 3 index, `CCC` the level 2 index, and `DDD` the level 1 index of the mapped frame, and `EEEE` the offset into it. `RRR` is the index of the recursive entry. When an index (three digits) is transformed into an offset (four digits), this is done by multiplying it by 8 (the size of a page table entry). With this offset, the resulting address points directly to the respective page table entry.
`SSSSSS` are sign extension bits, which means that they are all copies of bit 47. This is a special requirement for valid addresses on the `x86_64` architecture. We explained it in the [previous post][sign extension].

[sign extension]: @/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64

We use [octal] numbers to represent the addresses since each octal character represents three bits, which allows us to clearly separate the 9-bit indexes of the different page table levels. This isn't possible with the hexadecimal system, where each character represents four bits.

##### In Rust Code

To construct such addresses in Rust code, you can use bitwise operations:

```rust
// the virtual address whose corresponding page tables you want to access
let addr: usize = […];

let r = 0o777; // recursive index
let sign = 0o177777 << 48; // sign extension

// retrieve the page table indices of the address that we want to translate
let l4_idx = (addr >> 39) & 0o777; // level 4 index
let l3_idx = (addr >> 30) & 0o777; // level 3 index
let l2_idx = (addr >> 21) & 0o777; // level 2 index
let l1_idx = (addr >> 12) & 0o777; // level 1 index
let page_offset = addr & 0o7777;

// calculate the table addresses
let level_4_table_addr =
    sign | (r << 39) | (r << 30) | (r << 21) | (r << 12);
let level_3_table_addr =
    sign | (r << 39) | (r << 30) | (r << 21) | (l4_idx << 12);
let level_2_table_addr =
    sign | (r << 39) | (r << 30) | (l4_idx << 21) | (l3_idx << 12);
let level_1_table_addr =
    sign | (r << 39) | (l4_idx << 30) | (l3_idx << 21) | (l2_idx << 12);
```

The above code assumes that the last level 4 entry with index `0o777` (511) is recursively mapped. This isn't the case currently, so the code won't work yet. See below for how to tell the bootloader to set up the recursive mapping.
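The same calculation can be packaged into small helper functions that work for any recursive index. The following is only a sketch under stated assumptions, not code from the kernel in this post: the names `sign_extend`, `table_addr`, `entry_addr`, and `to_octal_groups` are hypothetical, the helpers use `u64` instead of `usize`, and the sign extension is derived from bit 47 instead of being hard-coded.

```rust
/// Copies bit 47 into bits 48..64 so that the address is canonical on x86_64.
fn sign_extend(addr: u64) -> u64 {
    (((addr << 16) as i64) >> 16) as u64
}

/// Virtual address of the level-`level` page table (1..=4) that is used when
/// translating `addr`, assuming entry `r` of the level 4 table is recursive.
fn table_addr(addr: u64, r: u64, level: usize) -> u64 {
    // level 4, 3, 2, and 1 indexes of the address we are interested in
    let idx = [
        (addr >> 39) & 0o777,
        (addr >> 30) & 0o777,
        (addr >> 21) & 0o777,
        (addr >> 12) & 0o777,
    ];
    let mut result: u64 = 0;
    for slot in 0..4 {
        // the first `level` index blocks hold the recursive index, the
        // remaining blocks hold the original indexes moved to the right
        let index = if slot < level { r } else { idx[slot - level] };
        result |= index << (39 - 9 * slot);
    }
    sign_extend(result)
}

/// Virtual address of the entry inside the level-`level` table that is used
/// when translating `addr` (table address plus index times 8).
fn entry_addr(addr: u64, r: u64, level: usize) -> u64 {
    let index = (addr >> (12 + 9 * (level as u64 - 1))) & 0o777;
    table_addr(addr, r, level) + index * 8
}

/// Formats an address in the grouped octal notation used in the table above.
fn to_octal_groups(addr: u64) -> String {
    format!(
        "0o_{:06o}_{:03o}_{:03o}_{:03o}_{:03o}_{:04o}",
        addr >> 48,
        (addr >> 39) & 0o777,
        (addr >> 30) & 0o777,
        (addr >> 21) & 0o777,
        (addr >> 12) & 0o777,
        addr & 0o7777
    )
}
```

For example, `table_addr(addr, 0o777, 4)` yields the level 4 table address for a recursive entry at index 511, independent of `addr`, and `to_octal_groups` makes the `RRR` blocks of the result directly visible.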
As an alternative to performing the bitwise operations by hand, you can use the [`RecursivePageTable`] type of the `x86_64` crate, which provides safe abstractions for various page table operations. For example, the code below shows how to translate a virtual address to its mapped physical address:

[`RecursivePageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.RecursivePageTable.html

```rust
// in src/memory.rs

use x86_64::structures::paging::{Mapper, Page, PageTable, RecursivePageTable};
use x86_64::{VirtAddr, PhysAddr};

// Create a RecursivePageTable instance from the level 4 address.
let level_4_table_addr = […];
let level_4_table_ptr = level_4_table_addr as *mut PageTable;
let recursive_page_table = unsafe {
    let level_4_table = &mut *level_4_table_ptr;
    // no trailing semicolon: the block evaluates to the new instance
    RecursivePageTable::new(level_4_table).unwrap()
};

// Retrieve the physical address for the given virtual address
let addr: u64 = […];
let addr = VirtAddr::new(addr);
let page: Page = Page::containing_address(addr);

// perform the translation
let frame = recursive_page_table.translate_page(page);
frame.map(|frame| frame.start_address() + u64::from(addr.page_offset()))
```

Again, a valid recursive mapping is required for this code. With such a mapping, the missing `level_4_table_addr` can be calculated as in the first code example.
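To see why these specially constructed addresses end up at page table frames, it can help to simulate the four-step walk on a toy model. The sketch below is purely illustrative and makes several assumptions: the `PageTables` type, its methods, and the frame addresses in `example_tables` are made up for this example, only 4 KiB pages are modeled, and only the `PRESENT` bit of an entry is interpreted.

```rust
use std::collections::HashMap;

const PRESENT: u64 = 1; // bit 0 of a page table entry
const ADDR_MASK: u64 = 0x000f_ffff_ffff_f000; // bits 12..52: frame address

/// Toy model of physical memory that only stores page table frames.
struct PageTables {
    /// Maps the physical start address of a table frame to its 512 entries.
    frames: HashMap<u64, [u64; 512]>,
}

impl PageTables {
    fn new() -> Self {
        PageTables { frames: HashMap::new() }
    }

    /// Lets entry `index` of the table frame at `table` point to `target`.
    fn set_entry(&mut self, table: u64, index: usize, target: u64) {
        let entries = self.frames.entry(table).or_insert([0; 512]);
        entries[index] = target | PRESENT;
    }

    /// Follows one entry per level, like the CPU does, and returns the
    /// resulting physical address or `None` if an entry is not present.
    fn translate(&self, level_4_frame: u64, virt: u64) -> Option<u64> {
        let mut frame = level_4_frame;
        for shift in [39, 30, 21, 12] {
            let index = ((virt >> shift) & 0o777) as usize;
            let entry = self.frames.get(&frame)?[index];
            if entry & PRESENT == 0 {
                return None;
            }
            frame = entry & ADDR_MASK;
        }
        Some(frame + (virt & 0o7777))
    }
}

/// Level 4 table at 0x1000, level 3 at 0x2000, level 2 at 0x3000, and
/// level 1 at 0x4000. The virtual page 0x5000 is mapped to frame 0x5000.
fn example_tables() -> PageTables {
    let mut tables = PageTables::new();
    tables.set_entry(0x1000, 0, 0x2000);
    tables.set_entry(0x2000, 0, 0x3000);
    tables.set_entry(0x3000, 0, 0x4000);
    tables.set_entry(0x4000, 5, 0x5000);
    tables.set_entry(0x1000, 511, 0x1000); // the recursive entry
    tables
}
```

In this model, translating an ordinary address such as `0x5123` ends at the mapped frame, whereas translating `0xffff_ff80_0000_0028` (recursive index 511, followed by the level 4, 3, and 2 indexes of page `0x5000`, with offset `5 * 8`) ends at the physical location of the level 1 entry that maps the page.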
---

Recursive paging is an interesting technique that shows how powerful a single mapping in a page table can be. It is relatively easy to implement and only requires a minimal amount of setup (just a single recursive entry), so it's a good choice for first experiments with paging.

However, it also has some disadvantages:

- It occupies a large amount of virtual memory (512 GiB). This isn't a big problem in the large 48-bit address space, but it might lead to suboptimal cache behavior.
- It only allows accessing the currently active address space easily. Accessing other address spaces is still possible by changing the recursive entry, but a temporary mapping is required for switching back. We described how to do this in the (outdated) [_Remap The Kernel_] post.
- It heavily relies on the page table format of `x86` and might not work on other architectures.

[_Remap The Kernel_]: https://os.phil-opp.com/remap-the-kernel/#overview

## Bootloader Support

All of these approaches require page table modifications for their setup. For example, mappings for the physical memory need to be created, or an entry of the level 4 table needs to be mapped recursively. The problem is that we can't create these required mappings without an existing way to access the page tables.

This means that we need the help of the bootloader, which creates the page tables that our kernel runs on. The bootloader has access to the page tables, so it can create any mappings that we need.
In its current implementation, the `bootloader` crate has support for two of the above approaches, controlled through [cargo features]:

[cargo features]: https://doc.rust-lang.org/cargo/reference/features.html#the-features-section

- The `map_physical_memory` feature maps the complete physical memory somewhere into the virtual address space. Thus, the kernel has access to all physical memory and can follow the [_Map the Complete Physical Memory_](#mapear-la-memoria-fisica-completa) approach.
- With the `recursive_page_table` feature, the bootloader maps an entry of the level 4 page table recursively. This allows the kernel to access the page tables as described in the [_Recursive Page Tables_](#tablas-de-paginas-recursivas) section.

We choose the first approach for our kernel since it is simple, platform-independent, and more powerful (it also allows access to non-page-table frames). To enable the required bootloader support, we add the `map_physical_memory` feature to our `bootloader` dependency:

```toml
[dependencies]
bootloader = { version = "0.9", features = ["map_physical_memory"]}
```

With this feature enabled, the bootloader maps the complete physical memory to some unused virtual address range. To communicate the virtual address range to our kernel, the bootloader passes a _boot information_ structure.

### Boot Information

The `bootloader` crate defines a [`BootInfo`] struct that contains all the information it passes to our kernel. The struct is still in an early stage, so expect some breakage when updating to future [semver-incompatible] bootloader versions.
With the `map_physical_memory` feature enabled, it currently has the two fields `memory_map` and `physical_memory_offset`:

[`BootInfo`]: https://docs.rs/bootloader/0.9/bootloader/bootinfo/struct.BootInfo.html
[semver-incompatible]: https://doc.rust-lang.org/stable/cargo/reference/specifying-dependencies.html#caret-requirements

- The `memory_map` field contains an overview of the available physical memory. This tells our kernel how much physical memory is available in the system and which memory regions are reserved for devices such as the VGA hardware. The memory map can be queried from the BIOS or UEFI firmware, but only very early in the boot process. For this reason, it must be provided by the bootloader because there is no way for the kernel to retrieve it later. We will need the memory map later in this post.
- The `physical_memory_offset` tells us the virtual start address of the physical memory mapping. By adding this offset to a physical address, we get the corresponding virtual address. This allows us to access arbitrary physical memory from our kernel.
  - This physical memory offset can be customized by adding a `[package.metadata.bootloader]` table in Cargo.toml and setting the field `physical-memory-offset = "0x0000f00000000000"` (or any other value). However, note that the bootloader can panic if it runs into physical address values that start to overlap with the space beyond the offset, i.e., areas it would have previously mapped to some other early physical addresses. So in general, the higher the value (> 1 TiB), the better.

The bootloader passes the `BootInfo` struct to our kernel in the form of a `&'static BootInfo` argument to our `_start` function.
We don't have this argument declared in our function yet, so let's add it:

```rust
// in src/main.rs

use bootloader::BootInfo;

#[no_mangle]
pub extern "C" fn _start(boot_info: &'static BootInfo) -> ! { // new argument
    […]
}
```

It wasn't a problem to leave off this argument before because the `x86_64` calling convention passes the first argument in a CPU register. Thus, the argument is simply ignored when it isn't declared. However, it would be a problem if we accidentally used a wrong argument type, since the compiler doesn't know the correct type signature of our entry point function.

### The `entry_point` Macro

Since our `_start` function is called externally from the bootloader, no checking of our function signature occurs. This means that we could let it take arbitrary arguments without any compilation errors, but it would fail or cause undefined behavior at runtime.

To make sure that the entry point function always has the correct signature that the bootloader expects, the `bootloader` crate provides an [`entry_point`] macro that offers a type-checked way to define a Rust function as the entry point. Let's rewrite our entry point function to use this macro:

[`entry_point`]: https://docs.rs/bootloader/0.6.4/bootloader/macro.entry_point.html

```rust
// in src/main.rs

use bootloader::{BootInfo, entry_point};

entry_point!(kernel_main);

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […]
}
```

We no longer need to use `extern "C"` or `no_mangle` for our entry point, as the macro defines the real lower-level `_start` entry point for us. The `kernel_main` function is now a completely normal Rust function, so we can choose an arbitrary name for it.
The important thing is that it is type-checked, so a compilation error occurs when we use a wrong function signature, for example, by adding an argument or changing the argument type.

Let's perform the same change in our `lib.rs`:

```rust
// in src/lib.rs

#[cfg(test)]
use bootloader::{entry_point, BootInfo};

#[cfg(test)]
entry_point!(test_kernel_main);

/// Entry point for `cargo test`
#[cfg(test)]
fn test_kernel_main(_boot_info: &'static BootInfo) -> ! {
    // like before
    init();
    test_main();
    hlt_loop();
}
```

Since the entry point is only used in test mode, we add the `#[cfg(test)]` attribute to all items. We give our test entry point the distinct name `test_kernel_main` to avoid confusion with the `kernel_main` of our `main.rs`. We don't use the `BootInfo` parameter for now, so we prefix the parameter name with a `_` to silence the unused variable warning.

## Implementation

Now that we have access to physical memory, we can finally start to implement our page table code. First, we will take a look at the currently active page tables that our kernel runs on. In the second step, we will create a translation function that returns the physical address that a given virtual address is mapped to. As the last step, we will try to modify the page tables in order to create a new mapping.

Before we begin, we create a new `memory` module for our code:

```rust
// in src/lib.rs

pub mod memory;
```

For the module, we create an empty `src/memory.rs` file.

### Accessing the Page Tables

At the [end of the previous post], we tried to take a look at the page tables our kernel runs on, but failed since we couldn't access the physical frame that the `CR3` register points to.
We're now able to continue from there by creating an `active_level_4_table` function that returns a reference to the active level 4 page table:

[end of the previous post]: @/edition-2/posts/08-paging-introduction/index.md#accessing-the-page-tables

```rust
// in src/memory.rs

use x86_64::{
    structures::paging::PageTable,
    VirtAddr,
};

/// Returns a mutable reference to the active level 4 table.
///
/// This function is unsafe because the caller must guarantee that the
/// complete physical memory is mapped to virtual memory at the passed
/// `physical_memory_offset`. Also, this function must be only called once
/// to avoid aliasing `&mut` references (which is undefined behavior).
pub unsafe fn active_level_4_table(physical_memory_offset: VirtAddr)
    -> &'static mut PageTable
{
    use x86_64::registers::control::Cr3;

    let (level_4_table_frame, _) = Cr3::read();

    let phys = level_4_table_frame.start_address();
    let virt = physical_memory_offset + phys.as_u64();
    let page_table_ptr: *mut PageTable = virt.as_mut_ptr();

    &mut *page_table_ptr // unsafe
}
```

First, we read the physical frame of the active level 4 table from the `CR3` register. We then take its physical start address, convert it to a `u64`, and add it to `physical_memory_offset` to get the virtual address where the page table frame is mapped. Finally, we convert the virtual address to a `*mut PageTable` raw pointer through the `as_mut_ptr` method and then unsafely create a `&mut PageTable` reference from it. We create a `&mut` reference instead of a `&` reference because we will mutate the page tables later in this post.

We don't need to use an unsafe block here because Rust treats the complete body of an `unsafe fn` like a large `unsafe` block. This makes our code more dangerous since we could accidentally introduce an unsafe operation in previous lines without noticing.
It also makes it much more difficult to spot unsafe operations in between safe operations. There is an [RFC](https://github.com/rust-lang/rfcs/pull/2585) to change this behavior.

We can now use this function to print the entries of the level 4 table:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    use blog_os::memory::active_level_4_table;
    use x86_64::VirtAddr;

    println!("Hello World{}", "!");
    blog_os::init();

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    let l4_table = unsafe { active_level_4_table(phys_mem_offset) };

    for (i, entry) in l4_table.iter().enumerate() {
        if !entry.is_unused() {
            println!("L4 Entry {}: {:?}", i, entry);
        }
    }

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

First, we convert the `physical_memory_offset` of the `BootInfo` struct to a [`VirtAddr`] and pass it to the `active_level_4_table` function. We then use the `iter` function to iterate over the page table entries and the [`enumerate`] combinator to additionally add an index `i` to each element. We only print non-empty entries because all 512 entries wouldn't fit on the screen.

[`VirtAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.VirtAddr.html
[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate

When we run it, we see the following output:

![QEMU printing entry 0 (0x2000, PRESENT, WRITABLE, ACCESSED), entry 1 (0x894000, PRESENT, WRITABLE, ACCESSED, DIRTY), entry 31 (0x88e000, PRESENT, WRITABLE, ACCESSED, DIRTY), entry 175 (0x891000, PRESENT, WRITABLE, ACCESSED, DIRTY), and entry 504 (0x897000, PRESENT, WRITABLE, ACCESSED, DIRTY)](qemu-print-level-4-table.png)

We see that there are various non-empty entries, which all map to different level 3 tables.
There are so many regions because kernel code, kernel stack, the physical memory mapping, and the boot information all use separate memory areas.

To traverse the page tables further and take a look at a level 3 table, we can take the mapped frame of an entry and convert it to a virtual address again:

```rust
// in the `for` loop in src/main.rs

use x86_64::structures::paging::PageTable;

if !entry.is_unused() {
    println!("L4 Entry {}: {:?}", i, entry);

    // get the physical address from the entry and convert it
    let phys = entry.frame().unwrap().start_address();
    let virt = phys.as_u64() + boot_info.physical_memory_offset;
    let ptr = VirtAddr::new(virt).as_mut_ptr();
    let l3_table: &PageTable = unsafe { &*ptr };

    // print non-empty entries of the level 3 table
    for (i, entry) in l3_table.iter().enumerate() {
        if !entry.is_unused() {
            println!("  L3 Entry {}: {:?}", i, entry);
        }
    }
}
```

To look at the level 2 and level 1 tables, we repeat that process for the level 3 and level 2 entries. As you can imagine, this gets very verbose very quickly, so we don't show the full code here.

Traversing the page tables manually is interesting because it helps to understand how the CPU performs the translation. However, most of the time, we are only interested in the mapped physical address for a given virtual address, so let's create a function for that.

### Translating Addresses

To translate a virtual address to a physical address, we have to traverse the four-level page table until we reach the mapped frame. Let's create a function that performs this translation:

```rust
// in src/memory.rs

use x86_64::PhysAddr;

/// Translates the given virtual address to the mapped physical address, or
/// `None` if the address is not mapped.
///
/// This function is unsafe because the caller must guarantee that the
/// complete physical memory is mapped to virtual memory at the passed
/// `physical_memory_offset`.
pub unsafe fn translate_addr(addr: VirtAddr, physical_memory_offset: VirtAddr)
    -> Option<PhysAddr>
{
    translate_addr_inner(addr, physical_memory_offset)
}
```

We forward the function to a safe `translate_addr_inner` function to limit the scope of `unsafe`. As we noted above, Rust treats the complete body of an `unsafe fn` like a large unsafe block. By calling into a private safe function, we make each `unsafe` operation explicit again.

The private inner function contains the real implementation:

```rust
// in src/memory.rs

/// Private function that is called by `translate_addr`.
///
/// This function is safe to limit the scope of `unsafe` because Rust treats
/// the whole body of unsafe functions as an unsafe block. This function must
/// only be reachable through `unsafe fn` from outside of this module.
fn translate_addr_inner(addr: VirtAddr, physical_memory_offset: VirtAddr)
    -> Option<PhysAddr>
{
    use x86_64::structures::paging::page_table::FrameError;
    use x86_64::registers::control::Cr3;

    // read the active level 4 frame from the CR3 register
    let (level_4_table_frame, _) = Cr3::read();

    let table_indexes = [
        addr.p4_index(), addr.p3_index(), addr.p2_index(), addr.p1_index()
    ];
    let mut frame = level_4_table_frame;

    // traverse the multi-level page table
    for &index in &table_indexes {
        // convert the frame into a page table reference
        let virt = physical_memory_offset + frame.start_address().as_u64();
        let table_ptr: *const PageTable = virt.as_ptr();
        let table = unsafe {&*table_ptr};

        // read the page table entry and update `frame`
        let entry = &table[index];
        frame = match entry.frame() {
            Ok(frame) => frame,
            Err(FrameError::FrameNotPresent) => return None,
            Err(FrameError::HugeFrame) => panic!("huge pages not supported"),
        };
    }

    // calculate the physical address by adding the page offset
    Some(frame.start_address() + u64::from(addr.page_offset()))
}
```

Instead of reusing our `active_level_4_table` function, we read the level 4 frame from the `CR3` register again. We do this because it simplifies this prototype implementation. Don't worry, we will create a better solution in a moment.

The `VirtAddr` struct already provides methods to compute the indexes into the page tables of the four levels. We store these indexes in a small array because it allows us to traverse the page tables using a `for` loop. Outside of the loop, we remember the last visited `frame` to calculate the physical address later. The `frame` points to page table frames while iterating and to the mapped frame after the last iteration, i.e., after following the level 1 entry.

Inside the loop, we again use the `physical_memory_offset` to convert the frame into a page table reference.
We then read the entry of the current page table and use the [`PageTableEntry::frame`] function to retrieve the mapped frame. If the entry is not mapped to a frame, we return `None`. If the entry maps a huge 2 MiB or 1 GiB page, we panic for now.

[`PageTableEntry::frame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html#method.frame

Let's test our translation function by translating some addresses:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // new import
    use blog_os::memory::translate_addr;

    […] // hello world and blog_os::init

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);

    let addresses = [
        // the identity-mapped vga buffer page
        0xb8000,
        // some code page
        0x201008,
        // some stack page
        0x0100_0020_1a10,
        // virtual address mapped to physical address 0
        boot_info.physical_memory_offset,
    ];

    for &address in &addresses {
        let virt = VirtAddr::new(address);
        let phys = unsafe { translate_addr(virt, phys_mem_offset) };
        println!("{:?} -> {:?}", virt, phys);
    }

    […] // test_main(), "it did not crash" printing, and hlt_loop()
}
```

When we run it, we see the following output:

![0xb8000 -> 0xb8000, 0x201008 -> 0x401008, 0x10000201a10 -> 0x279a10, "panicked at 'huge pages not supported'](qemu-translate-addr.png)

As expected, the identity-mapped address `0xb8000` translates to the same physical address. The code page and the stack page translate to some arbitrary physical addresses, which depend on how the bootloader created the initial mapping for our kernel. It's worth noting that the last 12 bits always stay the same after translation, which makes sense because these bits are the [_page offset_] and not part of the translation.

[_page offset_]: @/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64

Since each physical address can be accessed by adding the `physical_memory_offset`, the translation of the `physical_memory_offset` address itself should point to physical address `0`. However, the translation fails because the mapping uses huge pages for efficiency, which our implementation does not support yet.

### Using `OffsetPageTable`

Translating virtual to physical addresses is a common task in an OS kernel, so the `x86_64` crate provides an abstraction for it. The implementation already supports huge pages and several other page table functions apart from `translate_addr`, so we will use it in the following instead of adding huge page support to our own implementation.

At the basis of the abstraction are two traits that define various page table mapping functions:

- The [`Mapper`] trait is generic over the page size and provides functions that operate on pages. Examples are [`translate_page`], which translates a given page to a frame of the same size, and [`map_to`], which creates a new mapping in the page table.
- The [`Translate`] trait provides functions that work with multiple page sizes, such as [`translate_addr`] or the general [`translate`].
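To give an idea of what huge page support involves, here is a sketch of a walk over a toy model of page tables that stops early when it meets a huge page entry. It is not the implementation used by the `x86_64` crate and not code from this post. The `translate` and `example_tables` functions and all frame addresses are assumptions made for illustration; only the `PRESENT` bit (bit 0) and the `HUGE_PAGE` bit (bit 7) follow the `x86_64` entry format.

```rust
use std::collections::HashMap;

const PRESENT: u64 = 1 << 0;
const HUGE_PAGE: u64 = 1 << 7;
const ADDR_MASK: u64 = 0x000f_ffff_ffff_f000;

/// Translates `virt` using `tables`, which maps the physical start address
/// of each page table frame to its 512 entries (a toy stand-in for memory).
fn translate(
    tables: &HashMap<u64, [u64; 512]>,
    level_4_frame: u64,
    virt: u64,
) -> Option<u64> {
    let mut frame = level_4_frame;
    for level in (1..=4u32).rev() {
        let shift = 12 + 9 * (level - 1);
        let index = ((virt >> shift) & 0o777) as usize;
        let entry = tables.get(&frame)?[index];
        if entry & PRESENT == 0 {
            return None;
        }
        // A huge entry on level 3 maps 1 GiB, one on level 2 maps 2 MiB.
        // All remaining address bits then form the offset into the page.
        if entry & HUGE_PAGE != 0 && (level == 3 || level == 2) {
            let offset_mask = (1u64 << shift) - 1;
            let base = entry & ADDR_MASK & !offset_mask;
            return Some(base + (virt & offset_mask));
        }
        frame = entry & ADDR_MASK;
    }
    Some(frame + (virt & 0xfff))
}

/// Example hierarchy with a level 4 table at 0x1000: a 1 GiB page that maps
/// physical address 0 and a 2 MiB page that maps physical address 0x40_0000.
fn example_tables() -> HashMap<u64, [u64; 512]> {
    let mut level_4 = [0u64; 512];
    let mut level_3 = [0u64; 512];
    let mut level_2 = [0u64; 512];
    level_4[3] = 0x2000 | PRESENT; // level 3 table at 0x2000
    level_3[0] = PRESENT | HUGE_PAGE; // 1 GiB page -> physical 0
    level_3[1] = 0x3000 | PRESENT; // level 2 table at 0x3000
    level_2[2] = 0x40_0000 | PRESENT | HUGE_PAGE; // 2 MiB page
    HashMap::from([(0x1000, level_4), (0x2000, level_3), (0x3000, level_2)])
}
```

With these example tables, the virtual address `0x180_0000_0000` lies in the 1 GiB page and translates to physical address `0`, which is the kind of result our own `translate_addr` function could not produce.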
[`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html
[`translate_page`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#tymethod.translate_page
[`map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to
[`Translate`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html
[`translate_addr`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html#method.translate_addr
[`translate`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html#tymethod.translate

The traits only define the interface; they don't provide any implementation. The `x86_64` crate currently provides three types that implement the traits with different requirements. The [`OffsetPageTable`] type assumes that the complete physical memory is mapped to the virtual address space at some offset. The [`MappedPageTable`] is a bit more flexible: it only requires that each page table frame is mapped to the virtual address space at a calculable address. Finally, the [`RecursivePageTable`] type can be used to access page table frames through [recursive page tables](#tablas-de-paginas-recursivas).

[`OffsetPageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.OffsetPageTable.html
[`MappedPageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MappedPageTable.html
[`RecursivePageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.RecursivePageTable.html

In our case, the bootloader maps the complete physical memory at a virtual address specified by the `physical_memory_offset` variable, so we can use the `OffsetPageTable` type.
To initialize it, we create a new `init` function in our `memory` module:

```rust
use x86_64::structures::paging::OffsetPageTable;

/// Initialize a new OffsetPageTable.
///
/// This function is unsafe because the caller must guarantee that the
/// complete physical memory is mapped to virtual memory at the passed
/// `physical_memory_offset`. Also, this function must be only called once
/// to avoid aliasing `&mut` references (which is undefined behavior).
pub unsafe fn init(physical_memory_offset: VirtAddr) -> OffsetPageTable<'static> {
    let level_4_table = active_level_4_table(physical_memory_offset);
    OffsetPageTable::new(level_4_table, physical_memory_offset)
}

// make private
unsafe fn active_level_4_table(physical_memory_offset: VirtAddr)
    -> &'static mut PageTable
{…}
```

The function takes the `physical_memory_offset` as an argument and returns a new `OffsetPageTable` instance with a `'static` lifetime. This means that the instance stays valid for the complete runtime of our kernel. In the function body, we first call the `active_level_4_table` function to retrieve a mutable reference to the level 4 page table. We then invoke the [`OffsetPageTable::new`] function with this reference. As the second parameter, the `new` function expects the virtual address at which the mapping of the physical memory starts, which is given in the `physical_memory_offset` variable.

[`OffsetPageTable::new`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.OffsetPageTable.html#method.new

The `active_level_4_table` function should only be called from the `init` function from now on because it can easily lead to aliased mutable references when called multiple times, which can cause undefined behavior. For this reason, we make the function private by removing the `pub` specifier.
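The "must only be called once" requirement is purely a contract with the caller; nothing in the code enforces it. As a hedged aside, one way such a contract could additionally be checked at runtime is an atomic flag. This is a sketch of a general technique and not part of the code in this post: the `INITIALIZED` static and the `claim_first_call` function are hypothetical names.

```rust
use core::sync::atomic::{AtomicBool, Ordering};

/// Records whether the (hypothetical) one-time initialization already ran.
static INITIALIZED: AtomicBool = AtomicBool::new(false);

/// Returns `true` for the first caller and `false` for every later caller.
fn claim_first_call() -> bool {
    // `swap` stores `true` and returns the previous value in one atomic step
    !INITIALIZED.swap(true, Ordering::SeqCst)
}
```

A function like `init` could then start with `assert!(claim_first_call())`, which would turn a second call into a panic instead of silently creating an aliased `&mut` reference. The function would still need to be `unsafe` because the physical memory mapping requirement cannot be checked this way.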
We can now use the `Translate::translate_addr` method instead of our own `memory::translate_addr` function. We only need to change a few lines in our `kernel_main`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // new: different imports
    use blog_os::memory;
    use x86_64::{structures::paging::Translate, VirtAddr};

    […] // hello world and blog_os::init

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    // new: initialize a mapper
    let mapper = unsafe { memory::init(phys_mem_offset) };

    let addresses = […]; // same as before

    for &address in &addresses {
        let virt = VirtAddr::new(address);
        // new: use the `mapper.translate_addr` method
        let phys = mapper.translate_addr(virt);
        println!("{:?} -> {:?}", virt, phys);
    }

    […] // test_main(), "it did not crash" printing, and hlt_loop()
}
```

We need to import the `Translate` trait in order to use the [`translate_addr`] method it provides.

When we run it now, we see the same translation results as before, with the difference that the huge page translation now also works:

![0xb8000 -> 0xb8000, 0x201008 -> 0x401008, 0x10000201a10 -> 0x279a10, 0x18000000000 -> 0x0](qemu-mapper-translate-addr.png)

As expected, the translations of `0xb8000` and the code and stack addresses stay the same as with our own translation function. Additionally, we now see that the virtual address `physical_memory_offset` is mapped to the physical address `0x0`.

By using the translation function of the `OffsetPageTable` type, we can spare ourselves the work of implementing huge page support. We also have access to other page table functions, such as `map_to`, which we will use in the next section.

At this point, we no longer need our `memory::translate_addr` and `memory::translate_addr_inner` functions, so we can delete them.

### Creating a New Mapping

Until now, we only looked at the page tables without modifying anything.
Let's change that by creating a new mapping for a previously unmapped page.

We will use the [`map_to`] function of the [`Mapper`] trait for our implementation, so let's take a look at that function first. The documentation tells us that it takes four arguments: the page that we want to map, the frame that the page should be mapped to, a set of flags for the page table entry, and a `frame_allocator`. The frame allocator is needed because mapping the given page might require creating additional page tables, which need unused frames as backing storage.

[`map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.Mapper.html#tymethod.map_to
[`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.Mapper.html

#### A `create_example_mapping` Function

The first step of our implementation is to create a new `create_example_mapping` function that maps a given virtual page to `0xb8000`, the physical frame of the VGA text buffer. We choose that frame because it allows us to easily test whether the mapping was created correctly: we just need to write to the newly mapped page and see whether the write appears on the screen.

The `create_example_mapping` function looks like this:

```rust
// in src/memory.rs

use x86_64::{
    PhysAddr,
    structures::paging::{Page, PhysFrame, Mapper, Size4KiB, FrameAllocator}
};

/// Creates an example mapping for the given page to frame `0xb8000`.
pub fn create_example_mapping(
    page: Page,
    mapper: &mut OffsetPageTable,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) {
    use x86_64::structures::paging::PageTableFlags as Flags;

    let frame = PhysFrame::containing_address(PhysAddr::new(0xb8000));
    let flags = Flags::PRESENT | Flags::WRITABLE;

    let map_to_result = unsafe {
        // FIXME: this is not safe, we do it only for testing
        mapper.map_to(page, frame, flags, frame_allocator)
    };
    map_to_result.expect("map_to falló").flush();
}
```

In addition to the `page` that should be mapped, the function expects a mutable reference to an `OffsetPageTable` instance and a `frame_allocator`. The `frame_allocator` parameter uses the [`impl Trait`][impl-trait-arg] syntax to be [generic] over all types that implement the [`FrameAllocator`] trait. The trait is generic over the [`PageSize`] trait to work with both standard 4 KiB pages and huge 2 MiB/1 GiB pages. We only want to create a 4 KiB mapping, so we set the generic parameter to `Size4KiB`.

[impl-trait-arg]: https://doc.rust-lang.org/book/ch10-02-traits.html#traits-as-parameters
[generic]: https://doc.rust-lang.org/book/ch10-00-generics.html
[`FrameAllocator`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html
[`PageSize`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/trait.PageSize.html

The [`map_to`] method is unsafe because the caller must ensure that the frame is not already in use. The reason is that mapping the same frame twice could result in undefined behavior, for example when two different `&mut` references point to the same physical memory location. In our case, we reuse the VGA text buffer frame, which is already mapped, so we break the required condition. However, the `create_example_mapping` function is only a temporary testing function and will be removed after this post, so it is ok.
To remind us of the unsafety, we put a `FIXME` comment on the line.

In addition to the `page` and the `frame`, the `map_to` method takes a set of flags for the mapping and a reference to the `frame_allocator`, which will be explained in a moment. For the flags, we set the `PRESENT` flag because it is required for all valid entries and the `WRITABLE` flag to make the mapped page writable. For a list of all possible flags, see the [_Page Table Format_] section of the previous post.

[_Page Table Format_]: @/edition-2/posts/08-paging-introduction/index.md#page-table-format

The [`map_to`] function can fail, so it returns a [`Result`]. Since this is just example code that does not need to be robust, we simply use [`expect`] to panic when an error occurs. On success, the function returns a [`MapperFlush`] type that provides an easy way to flush the newly mapped page from the translation lookaside buffer (TLB) with its [`flush`] method. Like `Result`, the type uses the [`#[must_use]`][must_use] attribute to emit a warning when we accidentally forget to use it.

[`Result`]: https://doc.rust-lang.org/core/result/enum.Result.html
[`expect`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.expect
[`MapperFlush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html
[`flush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush
[must_use]: https://doc.rust-lang.org/std/result/#results-must-be-used

#### A Dummy `FrameAllocator`

To be able to call `create_example_mapping`, we first need to create a type that implements the `FrameAllocator` trait. As noted above, the trait is responsible for allocating frames for new page tables if they are needed by `map_to`.

Let's start with the simple case and assume that we don't need to create new page tables.
For this case, a frame allocator that always returns `None` suffices. We create such an `EmptyFrameAllocator` for testing our mapping function:

```rust
// in src/memory.rs

/// A FrameAllocator that always returns `None`.
pub struct EmptyFrameAllocator;

unsafe impl FrameAllocator<Size4KiB> for EmptyFrameAllocator {
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        None
    }
}
```

Implementing the `FrameAllocator` is unsafe because the implementer must guarantee that the allocator yields only unused frames. Otherwise, undefined behavior might occur, for example when two virtual pages are mapped to the same physical frame. Our `EmptyFrameAllocator` only returns `None`, so this isn't a problem in this case.

#### Choosing a Virtual Page

We now have a simple frame allocator that we can pass to our `create_example_mapping` function. However, the allocator always returns `None`, so this will only work if no additional page table frames are needed for creating the mapping. To understand when additional page table frames are needed and when not, let's consider an example:

![A virtual and a physical address space with a single mapped page and the page tables of all four levels](required-page-frames-example.svg)

The graphic shows the virtual address space on the left, the physical address space on the right, and the page tables in between. The page tables are stored in physical memory frames, indicated by the dashed lines. The virtual address space contains a single mapped page at address `0x803fe00000`, marked in blue. To translate this page to its frame, the CPU walks the 4-level page table until it reaches the frame at address 36 KiB.

Additionally, the graphic shows the physical frame of the VGA text buffer in red. Our goal is to map a previously unmapped virtual page to this frame using our `create_example_mapping` function.
Since our `EmptyFrameAllocator` always returns `None`, we want to create the mapping so that no additional frames are needed from the allocator. This depends on the virtual page that we select for the mapping.

The graphic shows two candidate pages in the virtual address space, both marked in yellow. One page is at address `0x803fdfd000`, which is 3 pages before the mapped page (in blue). While the level 4 and level 3 page table indices are the same as for the blue page, the level 2 and level 1 indices are different (see the [previous post][page-table-indices]). The different index into the level 2 table means that a different level 1 table is used for this page. Since this level 1 table does not exist yet, we would need to create it if we chose that page for our example mapping, which would require an additional unused physical frame. In contrast, the second candidate page at address `0x803fe02000` does not have this problem because it uses the same level 1 page table as the blue page. Thus, all the required page tables already exist.

[page-table-indices]: @/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64

In summary, the difficulty of creating a new mapping depends on the virtual page that we want to map. In the easiest case, the level 1 page table for the page already exists and we just need to write a single entry. In the most difficult case, the page is in a memory region for which no level 3 table exists yet, so we need to create new level 3, level 2, and level 1 page tables first.

For calling our `create_example_mapping` function with the `EmptyFrameAllocator`, we need to choose a page for which all page tables already exist. To find such a page, we can utilize the fact that the bootloader loads itself in the first megabyte of the virtual address space.
This means that a valid level 1 table exists for all pages in this region. Thus, we can choose any unused page in this memory region for our example mapping, such as the page at address `0`. Normally, this page should stay unused to guarantee that dereferencing a null pointer causes a page fault, so we know that the bootloader leaves it unmapped.

#### Creating the Mapping

We now have all the required parameters for calling our `create_example_mapping` function, so let's modify our `kernel_main` function to map the page at virtual address `0`. Since we map the page to the frame of the VGA text buffer, we should be able to write to the screen through it afterward. The implementation looks like this:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    use blog_os::memory;
    use x86_64::{structures::paging::Page, VirtAddr}; // new import

    […] // hello world and blog_os::init

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    let mut mapper = unsafe { memory::init(phys_mem_offset) };
    let mut frame_allocator = memory::EmptyFrameAllocator;

    // map an unused page
    let page = Page::containing_address(VirtAddr::new(0));
    memory::create_example_mapping(page, &mut mapper, &mut frame_allocator);

    // write the string `New!` to the screen through the new mapping
    let page_ptr: *mut u64 = page.start_address().as_mut_ptr();
    unsafe { page_ptr.offset(400).write_volatile(0x_f021_f077_f065_f04e)};

    […] // test_main(), "it did not crash" printing, and hlt_loop()
}
```

We first create the mapping for the page at address `0` by calling our `create_example_mapping` function with a mutable reference to the `mapper` and the `frame_allocator` instances. This maps the page to the VGA text buffer frame, so we should see any write to it on the screen.
Then we convert the page to a raw pointer and write a value to offset `400`. We don't write to the start of the page because the top line of the VGA buffer is directly shifted off the screen by the next `println`. We write the value `0x_f021_f077_f065_f04e`, which represents the string _"New!"_ on a white background. As we learned [in the _"VGA Text Mode"_ post], writes to the VGA buffer should be volatile, so we use the [`write_volatile`] method.

[in the _"VGA Text Mode"_ post]: @/edition-2/posts/03-vga-text-buffer/index.md#volatile
[`write_volatile`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write_volatile

When we run it in QEMU, we see the following output:

![QEMU printing "It did not crash!" with four completely white cells in the middle of the screen](qemu-new-mapping.png)

The _"New!"_ on the screen is caused by our write to page `0`, which means that we successfully created a new mapping in the page tables.

Creating that mapping only worked because the level 1 table responsible for the page at address `0` already exists. When we try to map a page for which no level 1 table exists yet, the `map_to` function fails because it tries to create new page tables by allocating frames with the `EmptyFrameAllocator`. We can see that happen when we try to map page `0xdeadbeaf000` instead of `0`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […]
    let page = Page::containing_address(VirtAddr::new(0xdeadbeaf000));
    […]
}
```

When we run it, a panic with the following error message occurs:

```
panic at 'map_to falló: FrameAllocationFailed', /…/result.rs:999:5
```

To map pages that don't have a level 1 page table yet, we need to create a proper `FrameAllocator`. But how do we know which frames are unused and how much physical memory is available?
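
Whether a page can be mapped without allocating new page tables comes down to index arithmetic on its virtual address. The sketch below is standalone and does not use the `x86_64` crate; `page_table_indices` and `share_level_1_table` are hypothetical helper names introduced only for illustration. It splits an address into its four 9-bit table indices and checks whether two pages are served by the same level 1 table, which is the case exactly when their level 4, level 3, and level 2 indices agree.

```rust
/// Splits a virtual address into its level 4, 3, 2, and 1 page table indices.
/// Each index is 9 bits wide; the lowest 12 bits are the page offset.
fn page_table_indices(addr: u64) -> [u16; 4] {
    [
        ((addr >> 39) & 0o777) as u16, // level 4 index
        ((addr >> 30) & 0o777) as u16, // level 3 index
        ((addr >> 21) & 0o777) as u16, // level 2 index
        ((addr >> 12) & 0o777) as u16, // level 1 index
    ]
}

/// Two pages use the same level 1 table exactly when their level 4,
/// level 3, and level 2 indices are all equal.
fn share_level_1_table(a: u64, b: u64) -> bool {
    page_table_indices(a)[..3] == page_table_indices(b)[..3]
}
```

For the addresses from the graphic, the candidate `0x803fe02000` shares its first three indices with the blue page `0x803fe00000`, while `0x803fdfd000` differs in the level 2 index and would therefore need a new level 1 table. Note that the check only tells us whether two pages use the same table, not whether that table already exists.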
### Allocating Frames

In order to create new page tables, we need to create a proper frame allocator. To do that, we use the `memory_map` that is passed by the bootloader as part of the `BootInfo` struct:

```rust
// in src/memory.rs

use bootloader::bootinfo::MemoryMap;

/// A FrameAllocator that returns usable frames from the bootloader's memory map.
pub struct BootInfoFrameAllocator {
    memory_map: &'static MemoryMap,
    next: usize,
}

impl BootInfoFrameAllocator {
    /// Create a FrameAllocator from the passed memory map.
    ///
    /// This function is unsafe because the caller must guarantee that the passed
    /// memory map is valid. The main requirement is that all frames that are marked
    /// as `USABLE` in it are really unused.
    pub unsafe fn init(memory_map: &'static MemoryMap) -> Self {
        BootInfoFrameAllocator {
            memory_map,
            next: 0,
        }
    }
}
```

The struct has two fields: a `'static` reference to the memory map passed by the bootloader and a `next` field that keeps track of the number of the next frame that the allocator should return.

As we explained in the [_Boot Information_](#informacion-de-boot) section, the memory map is provided by the BIOS/UEFI firmware. It can only be queried very early in the boot process, so the bootloader already calls the respective functions for us. The memory map consists of a list of [`MemoryRegion`] structs, which contain the start address, the length, and the type (e.g. unused, reserved, etc.) of each memory region.

The `init` function initializes a `BootInfoFrameAllocator` with a given memory map. The `next` field is initialized with `0` and will be increased for every frame allocation to avoid returning the same frame twice. Since we don't know whether the usable frames of the memory map were already used somewhere else, our `init` function must be `unsafe` to require additional guarantees from the caller.
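
To make the role of the `next` counter concrete, here is a standalone toy model that runs on a normal host with `std`. It is only a sketch under stated assumptions: `Region` and `ToyFrameAllocator` are hypothetical stand-ins, the real `MemoryRegion` type of the `bootloader` crate has a different layout, and frames are represented by their plain start addresses instead of `PhysFrame` values.

```rust
/// Simplified stand-in for one entry of the bootloader's memory map
/// (hypothetical type, not the real `MemoryRegion`).
struct Region {
    start: u64,   // inclusive start address
    end: u64,     // exclusive end address
    usable: bool, // whether the region was reported as free
}

/// Toy allocator that hands out the start addresses of 4 KiB frames.
struct ToyFrameAllocator {
    regions: Vec<Region>,
    next: usize, // number of frames handed out so far
}

impl ToyFrameAllocator {
    fn new(regions: Vec<Region>) -> Self {
        ToyFrameAllocator { regions, next: 0 }
    }

    /// Iterates over the start addresses of all frames in usable regions.
    fn usable_frames(&self) -> impl Iterator<Item = u64> + '_ {
        self.regions
            .iter()
            .filter(|r| r.usable)
            .flat_map(|r| (r.start..r.end).step_by(4096))
    }

    /// Returns the `next`-th usable frame and advances the counter,
    /// so that no frame is returned twice.
    fn allocate_frame(&mut self) -> Option<u64> {
        let frame = self.usable_frames().nth(self.next);
        self.next += 1;
        frame
    }
}
```

Recreating the iterator on every allocation keeps the sketch short but makes each call linear in the number of frames already handed out; a real allocator would likely want to store more state than a single counter.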
[`MemoryRegion`]: https://docs.rs/bootloader/0.6.4/bootloader/bootinfo/struct.MemoryRegion.html

#### A `usable_frames` Method

Before we implement the `FrameAllocator` trait, we add an auxiliary method that converts the memory map into an iterator of usable frames:

```rust
// in src/memory.rs

use bootloader::bootinfo::MemoryRegionType;

impl BootInfoFrameAllocator {
    […]
}
```
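================================================
FILE: blog/content/edition-2/posts/09-paging-implementation/index.ja.md
================================================
+++
title = "ページングの実装"
weight = 9
path = "ja/paging-implementation"
date = 2019-03-14

[extra]
translation_based_on_commit = "27ab4518acbb132e327ed4f4f0508393e9d4d684"
translators = ["swnakamura", "garasubo"]
+++

This post shows how to implement paging support in our kernel. It first explores different techniques to make the physical page table frames accessible to the kernel and discusses their respective advantages and drawbacks. It then implements an address translation function and a function to create a new mapping.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there (translator's note: the link points to the English original). You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-09`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-09

## Introduction

The [previous post] gave an introduction to the concept of paging. It motivated paging by comparing it with segmentation, explained how paging and page tables work, and then introduced the 4-level page table design of `x86_64`. We found out that the bootloader already set up a page table hierarchy for our kernel, which means that our kernel already runs on virtual addresses. This improves safety since illegal memory accesses cause page fault exceptions instead of modifying arbitrary physical memory.

[previous post]: @/edition-2/posts/08-paging-introduction/index.ja.md

The post ended with the problem that we [can't access the page tables from our kernel][end of previous post] because they are stored in physical memory and our kernel already runs on virtual addresses. This post continues from there and explores different approaches to making the page table frames accessible to our kernel. We will discuss the advantages and drawbacks of each approach and then decide on an approach for our kernel.

[end of previous post]: @/edition-2/posts/08-paging-introduction/index.ja.md#peziteburuhenoakusesu

To implement the approach, we will need support from the bootloader, so we'll configure it first. Afterward, we will implement a function that traverses the page table hierarchy in order to translate virtual to physical addresses. Finally, we learn how to create new mappings in the page tables and how to find unused memory frames for creating new page tables.

## Accessing Page Tables

Accessing the page tables from our kernel is not as easy as it may seem. To understand the problem, let's take another look at the example 4-level page table hierarchy from the previous post:

![An example 4-level page hierarchy with each page table shown in physical memory](../paging-introduction/x86_64-page-table-translation.svg)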
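The important thing here is that each page table entry stores the **physical** address of the next table. This avoids the need to run a translation for these addresses too, which would be bad for performance and could easily cause endless translation loops.

The problem for us is that we can't directly access physical addresses from our kernel since our kernel also runs on top of virtual addresses. For example, when we access address `4KiB`, we access the **virtual** address `4KiB`, not the **physical** address `4KiB` where the level 4 page table is stored. When we want to access the physical address `4KiB`, we can only do so through some virtual address that maps to it.

So in order to access page table frames, we need to map some virtual pages to them. There are different ways to create these mappings that all allow us to access arbitrary page table frames.

### Identity Mapping

A simple solution is to **identity map all page tables**:

![A virtual and a physical address space with various virtual pages mapped to the physical frame with the same address](identity-mapped-page-tables.svg)

In this example, we see various identity-mapped page table frames. This way, the physical addresses of page tables are also valid virtual addresses, so we can easily access the page tables of all levels starting from the CR3 register.

However, this approach clutters the virtual address space and makes it more difficult to find contiguous memory regions of larger sizes. For example, imagine that we want to create a virtual memory region of size 1000KiB in the above graphic, e.g., for [memory-mapping a file]. We can't start the region at `28KiB` because it would collide with the already mapped page at `1004KiB`. So we have to look further until we find a large enough unmapped area, for example at `1008KiB`. This is a similar fragmentation problem as with [segmentation].

[memory-mapping a file]: https://en.wikipedia.org/wiki/Memory-mapped_file
[segmentation]: @/edition-2/posts/08-paging-introduction/index.ja.md#duan-pian-hua-fragmentation

Equally, it makes it much more difficult to create new page tables because we need to find physical frames whose corresponding pages aren't already in use. For example, let's assume that we reserved the 1000KiB of virtual memory starting at `1008KiB` for our memory-mapped file. Now we can't use any frame with a physical address between `1000KiB` and `2008KiB` anymore, because we can't identity map it.

### Map at a Fixed Offset

To avoid the problem of cluttering the virtual address space, we can **use a separate memory region for page table mappings**. So instead of identity mapping page table frames, we map them at a fixed offset in the virtual address space. For example, the offset could be 10TiB:

![The same figure as for the identity mapping, but each mapped virtual page is offset by 10 TiB.](page-tables-mapped-at-offset.svg)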
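By using the virtual memory in the range from `10TiB` to `10TiB + physical memory size` exclusively for page table mappings, we avoid the collision problems of the identity mapping. Reserving such a large region of the virtual address space is only possible if the virtual address space is much larger than the physical memory size. This isn't a problem on x86_64 since the 48-bit (virtual) address space is 256TiB large.

This approach still has the disadvantage that we need to create a new mapping whenever we create a new page table. Also, it does not allow accessing the page tables of other address spaces, which would be useful when creating a new process.

### Map the Complete Physical Memory {#map-the-complete-physical-memory}

We can solve these problems by **mapping the complete physical memory** instead of only page table frames:

![The same figure as for the offset mapping, but every physical frame has a mapping (at 10TiB + X) instead of only page table frames.](map-complete-physical-memory.svg)

This approach allows our kernel to access arbitrary physical memory, including the page table frames of other address spaces. The reserved virtual memory range has the same size as before, with the difference that it no longer contains unmapped pages.

The disadvantage of this approach is that additional page tables are needed for storing the mapping of the physical memory. These page tables need to be stored somewhere, so they use up a part of the physical memory, which can be a problem on devices with a small amount of memory.

On x86_64, however, we can use [huge pages] with a size of 2MiB for the mapping, instead of the default 4KiB pages. This way, mapping 32GiB of physical memory only requires 132KiB for page tables since only one level 3 table and 32 level 2 tables are needed (33 tables of 4KiB each). Huge pages are also more cache-efficient since they use fewer entries in the translation lookaside buffer (TLB).

[huge pages]: https://en.wikipedia.org/wiki/Page_%28computer_memory%29#Multiple_page_sizes

### Temporary Mapping

For devices with very small amounts of physical memory, we could **map the page table frames only temporarily** when we need to access them. To be able to create the temporary mappings, we only need a single identity-mapped level 1 table:

![A virtual and a physical address space with an identity mapped level 1 table, which maps its 0th entry to the level 2 table frame, thereby mapping that frame to page with address 0](temporarily-mapped-page-tables.svg)

The level 1 table in this graphic controls the first 2MiB of the virtual address space. This is because it is reachable by starting at the CR3 register and following the 0th entry in the level 4, level 3, and level 2 page tables. The entry with index `8` maps the virtual page at address `32 KiB` to the physical frame at address `32 KiB`, thereby identity mapping the level 1 table itself. The graphic shows this identity mapping by the horizontal (brown) arrow at `32 KiB`.

By writing to the identity-mapped level 1 table, our kernel can create up to 511 temporary mappings (512 minus the entry required for the identity mapping). In the above example, the kernel created two temporary mappings:

- By mapping the 0th entry of the level 1 table to the frame with address `24 KiB`, it created a temporary mapping of the virtual page at `0 KiB` to the physical frame of the level 2 page table, indicated by the dashed arrow.
- By mapping the entry with index `9` of the level 1 table to the frame with address `4 KiB`, it created a temporary mapping of the virtual page at `36 KiB` to the physical frame of the level 4 page table, indicated by the dashed arrow.

Now the kernel can access the level 2 page table by writing to page `0 KiB` and the level 4 page table by writing to page `36 KiB`.

The process for accessing an arbitrary page table frame with temporary mappings would be:

- Search for a free entry in the identity-mapped level 1 table.
- Map that entry to the physical frame of the page table that we want to access.
- Access the target frame through the virtual page that maps to the entry.
- Set the entry back to unused, thereby removing the temporary mapping again.

This approach reuses the same 512 virtual pages for creating the mappings and thus requires only 4KiB of physical memory. The drawback is that it is a bit cumbersome, especially since a new mapping might require modifications to multiple page table levels, which means that we would need to repeat the above process multiple times.

### Recursive Page Tables

Another interesting approach, which requires no additional page tables at all, is to **map the page table recursively**. The idea behind this approach is to map an entry of the level 4 page table to the level 4 table itself. By doing this, we effectively reserve a part of the virtual address space and map all current and future page table frames to that space.

Let's go through an example to understand how this all works:

![An example 4-level page hierarchy with each page table shown in physical memory. Entry 511 of the level 4 page is mapped to frame 4KiB, the frame of the level 4 table itself.](recursive-page-table.png)

The only difference to the [example at the beginning of this post] is the additional entry at index `511` in the level 4 table, which is mapped to physical frame `4 KiB`, the frame of the level 4 table itself.

[example at the beginning of this post]: #peziteburuniakusesusuru

By letting the CPU follow this entry on a translation, it doesn't reach a level 3 table but the same level 4 table again. This is similar to a recursive function that calls itself; therefore, this table is called a **recursive page table**. The important thing is that the CPU assumes that every entry in the level 4 table points to a level 3 table, so it now treats the level 4 table as a level 3 table. This works because tables of all levels have the exact same layout on x86_64.

By following the recursive entry one or multiple times before we start the actual translation, we can effectively shorten the number of levels that the CPU traverses. For example, if we follow the recursive entry once and then proceed to the level 3 table, the CPU thinks that the level 3 table is a level 2 table. Going further, it treats the level 2 table as a level 1 table and the level 1 table as the mapped (physical) frame. This means that we can now read and write the level 1 page table because the CPU thinks that it is the mapped frame. The graphic below illustrates the five translation steps:

![The above example 4-level page hierarchy with 5 arrows: "Step 0" from CR3 to level 4 table, "Step 1" from level 4 table to level 4 table, "Step 2" from level 4 table to level 3 table, "Step 3" from level 3 table to level 2 table, and "Step 4" from level 2 table to level 1 table.](recursive-page-table-access-level-1.png)

Similarly, we can follow the recursive entry twice before starting the translation to reduce the number of traversed levels to two:

![The same 4-level page hierarchy with the following 4 arrows: "Step 0" from CR3 to level 4 table, "Steps 1&2" from level 4 table to level 4 table, "Step 3" from level 4 table to level 3 table, and "Step 4" from level 3 table to level 2 table.](recursive-page-table-access-level-2.png)

Let's go through it step by step. First, the CPU follows the recursive entry of the level 4 table and thinks that it has reached a level 3 table. Then it follows the recursive entry again and thinks that it has reached a level 2 table. But in reality, it is still on the level 4 table. When the CPU now follows a different entry, it lands on a level 3 table but thinks it is already on a level 1 table. So while the next entry points to a level 2 table, the CPU thinks that it points to the mapped physical frame, which allows us to read and write the level 2 table.

Accessing the tables of levels 3 and 4 works in the same way. To access the level 3 table, we follow the recursive entry three times, tricking the CPU into thinking it is already on a level 1 table. Then we follow another entry and reach a level 3 table, which the CPU treats as a mapped frame. To access the level 4 table itself, we just follow the recursive entry four times until the CPU treats the level 4 table itself as the mapped frame (the blue-purple arrow in the graphic below).

![The same 4-level page hierarchy with the following 3 arrows: "Step 0" from CR3 to level 4 table, "Steps 1,2,3" from level 4 table to level 4 table, and "Step 4" from level 4 table to level 3 table. In blue the alternative "Steps 1,2,3,4" arrow from level 4 table to level 4 table.](recursive-page-table-access-level-3.png)

It might take some time to wrap your head around the concept, but it works quite well in practice.

In the section below, we explain how to construct virtual addresses for following the recursive entry one or multiple times. We will not use recursive paging for our implementation, so you don't need to read it to continue with the post. If it interests you, just click on "Address Calculation" below to expand it.

---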

    Address Calculation

We saw that we can access tables of all levels by following the recursive entry once or multiple times before the actual translation. Since the indexes into the tables of the four levels are derived directly from the virtual address, we need to construct special virtual addresses for this technique. Remember, the page table indexes are derived from the address in the following way:

![Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index](../paging-introduction/x86_64-table-indices-from-address.svg)

Let's assume that we want to access the level 1 page table that maps a specific page. As we learned above, this means that we have to follow the recursive entry once before continuing with the level 4, level 3, and level 2 indexes. To do that, we move each block of the address one block to the right and set the original level 4 index to the index of the recursive entry:

![Bits 0–12 are the offset into the level 1 table frame, bits 12–21 the level 2 index, bits 21–30 the level 3 index, bits 30–39 the level 4 index, and bits 39–48 the index of the recursive entry](table-indices-from-address-recursive-level-1.svg)

For accessing the level 2 table of that page, we move each index block two blocks to the right and set both the original level 4 index and the original level 3 index to the index of the recursive entry:

![Bits 0–12 are the offset into the level 2 table frame, bits 12–21 the level 3 index, bits 21–30 the level 4 index, and bits 30–39 and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-2.svg)

For accessing the level 3 table, we move each block three blocks to the right and use the recursive index for the original level 4, level 3, and level 2 address blocks:

![Bits 0–12 are the offset into the level 3 table frame, bits 12–21 the level 4 index, and bits 21–30, bits 30–39 and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-3.svg)

Finally, we can access the level 4 table by moving each block four blocks to the right and using the recursive index for all address blocks except for the offset:

![Bits 0–12 are the offset into the level 4 table frame and bits 12–21, bits 21–30, bits 30–39 and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-4.svg)

We can now calculate virtual addresses for the page tables of all four levels. We can even calculate an address that points exactly to a specific page table entry by multiplying its index by 8, the size of a page table entry.

The table below summarizes the address structure for accessing the different kinds of frames:

Virtual Address for | Address Structure ([octal])
------------------- | -------------------------------
Page                | `0o_SSSSSS_AAA_BBB_CCC_DDD_EEEE`
Level 1 Table Entry | `0o_SSSSSS_RRR_AAA_BBB_CCC_DDDD`
Level 2 Table Entry | `0o_SSSSSS_RRR_RRR_AAA_BBB_CCCC`
Level 3 Table Entry | `0o_SSSSSS_RRR_RRR_RRR_AAA_BBBB`
Level 4 Table Entry | `0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA`

[octal]: https://en.wikipedia.org/wiki/Octal

Here, `AAA` is the level 4 index, `BBB` the level 3 index, `CCC` the level 2 index, and `DDD` the level 1 index of the mapped frame, and `EEEE` is the offset into it. `RRR` is the index of the recursive entry. When an index (three digits) is transformed into an offset (four digits), it is multiplied by 8 (the size of a page table entry). With this offset, the resulting address points directly to the respective page table entry.

`SSSSSS` are sign extension bits, which means that they are all copies of bit 47. This is a special requirement for valid addresses on the x86_64 architecture. We explained it in the [previous post][sign extension].

[sign extension]: @/edition-2/posts/08-paging-introduction/index.ja.md#x86-64niokerupezingu

We use [octal] numbers for representing the addresses since each octal character represents three bits, which allows us to clearly separate the 9-bit indexes of the different page table levels. This isn't possible in the hexadecimal system, where each character represents four bits.

##### In Rust Code

To construct such addresses in Rust code, you can use bitwise operations:

```rust
// the virtual address whose corresponding page tables you want to access
let addr: usize = […];

let r = 0o777; // recursive index
let sign = 0o177777 << 48; // sign extension

// retrieve the page table indices of the address that we want to translate
let l4_idx = (addr >> 39) & 0o777; // level 4 index
let l3_idx = (addr >> 30) & 0o777; // level 3 index
let l2_idx = (addr >> 21) & 0o777; // level 2 index
let l1_idx = (addr >> 12) & 0o777; // level 1 index
let page_offset = addr & 0o7777;

// calculate the table addresses
let level_4_table_addr =
    sign | (r << 39) | (r << 30) | (r << 21) | (r << 12);
let level_3_table_addr =
    sign | (r << 39) | (r << 30) | (r << 21) | (l4_idx << 12);
let level_2_table_addr =
    sign | (r << 39) | (r << 30) | (l4_idx << 21) | (l3_idx << 12);
let level_1_table_addr =
    sign | (r << 39) | (l4_idx << 30) | (l3_idx << 21) | (l2_idx << 12);
```

The above code assumes that the last level 4 entry with index `0o777` (511) is recursively mapped. This isn't the case currently, so the code won't work yet. See below on how to tell the bootloader to set up the recursive mapping.

Alternatively to performing the bitwise operations by hand, you can use the [`RecursivePageTable`] type of the `x86_64` crate, which provides safe abstractions for various page table operations. For example, the code below shows how to translate a virtual address to its mapped physical address:

[`RecursivePageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.RecursivePageTable.html

```rust
// in src/memory.rs

use x86_64::structures::paging::{Mapper, Page, PageTable, RecursivePageTable};
use x86_64::{VirtAddr, PhysAddr};

// create a RecursivePageTable instance from the level 4 address
let level_4_table_addr = […];
let level_4_table_ptr = level_4_table_addr as *mut PageTable;
let recursive_page_table = unsafe {
    let level_4_table = &mut *level_4_table_ptr;
    // no trailing semicolon: the block must evaluate to the new instance
    RecursivePageTable::new(level_4_table).unwrap()
};

// retrieve the physical address for the given virtual address
let addr: u64 = […];
let addr = VirtAddr::new(addr);
let page: Page = Page::containing_address(addr);

// perform the translation
let frame = recursive_page_table.translate_page(page);
frame.map(|frame| frame.start_address() + u64::from(addr.page_offset()))
```

Again, a valid recursive mapping is required for this code. With such a mapping, the missing `level_4_table_addr` can be calculated as in the first code example.
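The bit operations above can also be wrapped in small helper functions and checked on a host machine. The following is a minimal sketch and not part of the kernel: it works on plain `u64` values, and the names `RECURSIVE_INDEX`, `table_indices`, `level_4_table_addr`, `level_1_table_addr`, and `level_1_entry_addr` are hypothetical. Like the example above, it assumes that the last level 4 entry (index `0o777`) is the recursive one.

```rust
/// Index of the recursive level 4 entry (assumption: the last entry, as above).
const RECURSIVE_INDEX: u64 = 0o777;
/// Sign extension bits 48..64; all ones because bit 47 is set for index 0o777.
const SIGN: u64 = 0o177777 << 48;

/// Splits a virtual address into its level 4, 3, 2, and 1 table indexes.
fn table_indices(addr: u64) -> [u64; 4] {
    [
        (addr >> 39) & 0o777,
        (addr >> 30) & 0o777,
        (addr >> 21) & 0o777,
        (addr >> 12) & 0o777,
    ]
}

/// Virtual address of the level 4 table: the recursive entry is followed four times.
fn level_4_table_addr() -> u64 {
    let r = RECURSIVE_INDEX;
    SIGN | (r << 39) | (r << 30) | (r << 21) | (r << 12)
}

/// Virtual address of the level 1 table that maps `addr`.
fn level_1_table_addr(addr: u64) -> u64 {
    let [l4, l3, l2, _l1] = table_indices(addr);
    SIGN | (RECURSIVE_INDEX << 39) | (l4 << 30) | (l3 << 21) | (l2 << 12)
}

/// Virtual address of the level 1 entry for `addr` (each entry is 8 bytes).
fn level_1_entry_addr(addr: u64) -> u64 {
    let [_, _, _, l1] = table_indices(addr);
    level_1_table_addr(addr) + l1 * 8
}
```

With these assumptions, `level_4_table_addr()` evaluates to `0xffff_ffff_ffff_f000`, the last page of the virtual address space, which matches the `0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA` row of the table with `AAAA` set to zero.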
    --- 再帰的ページングは、ページテーブルのたった一つのマッピングがいかに強力に使えるかを示す興味深いテクニックです。比較的実装するのが簡単であり、ほとんど設定も必要でない(一つ再帰エントリを作るだけ)ので、ページングを使って最初に実装するのに格好の対象でしょう。 しかし、いくつか欠点もあります: - 大量の仮想メモリ領域(512GiB)を占有してしまう。私達の使っている48bitアドレス空間は巨大なのでこのことはさしたる問題にはなりませんが、キャッシュの挙動が最適でなくなってしまうかもしれません。 - 現在有効なアドレス空間にしか簡単にはアクセスできない。他のアドレス空間にアクセスするのは再帰エントリを変更することで可能ではあるものの、もとに戻すためには一時的なマッピングが必要。これを行う方法については[カーネルをリマップする][_Remap The Kernel_](未訳、また旧版のため情報が古い)という記事を読んでください。 - x86のページテーブルの方式に強く依存しており、他のアーキテクチャでは動作しないかもしれない。 [_Remap The Kernel_]: https://os.phil-opp.com/remap-the-kernel/#overview ## ブートローダによる補助 これらのアプローチはすべて、準備のためにページテーブルに対する修正が必要になります。例えば、物理メモリへのマッピングを作ったり、レベル4テーブルのエントリを再帰的にマッピングしたりなどです。問題は、これらの必要なマッピングを作るためには、すでにページテーブルにアクセスできるようになっていなければいけないということです。 つまり、私達のカーネルが使うページテーブルを作っている、ブートローダの手助けが必要になるということです。ブートローダはページテーブルにアクセスできますから、私達の必要とするどんなマッピングも作れます。`bootloader`クレートは上の2つのアプローチをどちらもサポートしており、現在の実装においては[cargoのfeatures][cargo features]を使ってこれらをコントロールします。 [cargo features]: https://doc.rust-lang.org/cargo/reference/features.html#the-features-section - `map_physical_memory` featureを使うと、全物理メモリを仮想アドレス空間のどこかにマッピングします。そのため、カーネルはすべての物理メモリにアクセスでき、[上で述べた方法に従って物理メモリ全体をマップする](#map-the-complete-physical-memory)ことができます。 - `recursive_page_table` featureでは、ブートローダはレベル4ページテーブルのエントリを再帰的にマッピングします。これによりカーネルは[再帰的ページテーブル](#zai-gui-de-peziteburu)で述べた方法に従ってページテーブルにアクセスすることができます。 私達のカーネルには、シンプルでプラットフォーム非依存かつ(ページテーブルのフレームでないメモリにもアクセスできるので)より強力である1つ目の方法を採ることにします。必要なブートローダの機能 (feature) を有効化するために、`map_physical_memory` featureを`bootloader`のdependencyに追加します。 ```toml [dependencies] bootloader = { version = "0.9", features = ["map_physical_memory"]} ``` この機能を有効化すると、ブートローダは物理メモリの全体を、ある未使用の仮想アドレス空間にマッピングします。この仮想アドレスの範囲をカーネルに伝えるために、ブートローダは**boot information**構造体を渡します。 ### Boot Information `bootloader`クレートは、カーネルに渡されるすべての情報を格納する[`BootInfo`]構造体を定義しています。この構造体はまだ開発の初期段階にあり、将来の[対応していないsemverの][semver-incompatible]ブートローダのバージョンに更新した際には、うまく動かなくなることが予想されます。`map_physical_memory` 
featureが有効化されているので、いまこれは`memory_map`と`physical_memory_offset`という2つのフィールドを持っています: [`BootInfo`]: https://docs.rs/bootloader/0.9/bootloader/bootinfo/struct.BootInfo.html [semver-incompatible]: https://doc.rust-lang.org/stable/cargo/reference/specifying-dependencies.html#caret-requirements - `memory_map`フィールドは、利用可能な物理メモリの情報の概要を保持しています。システムの利用可能な物理メモリがどのくらいかや、どのメモリ領域がVGAハードウェアのようなデバイスのために予約されているかをカーネルに伝えます。これらのメモリマッピングはBIOSやUEFIファームウェアから取得できますが、それが可能なのはブートのごく初期に限られます。そのため、これらをカーネルが後で取得することはできないので、ブートローダによって提供する必要があるわけです。このメモリマッピングは後で必要となります。 - `physical_memory_offset`は、物理メモリのマッピングの始まっている仮想アドレスです。このオフセットを物理アドレスに追加することによって、対応する仮想アドレスを得られます。これによって、カーネルから任意の物理アドレスにアクセスできます。 ブートローダは`BootInfo`構造体を`_start`関数の`&'static BootInfo`引数という形でカーネルに渡します。この引数は私達の関数ではまだ宣言していなかったので追加します: ```rust // in src/main.rs use bootloader::BootInfo; #[unsafe(no_mangle)] pub extern "C" fn _start(boot_info: &'static BootInfo) -> ! { // 新しい引数 […] } ``` 今までこの引数を無視していましたが、x86_64の呼出し規約は最初の引数をCPUレジスタに渡していたため、これは問題ではありませんでした。つまり、引数が宣言されていなかったとき、それが単に無視されていたわけです。しかし、もし引数の型を間違えてしまうと、コンパイラが私達のエントリポイント関数の正しい型シグネチャがわからなくなってしまうので問題です。 ### `entry_point`マクロ 私達の`_start`関数はブートローダから外部呼び出しされるので、私達の関数のシグネチャに対する検査は行われません。これにより、この関数はコンパイルエラーなしにあらゆる引数を取ることができるので、いざ実行時にエラーになったり未定義動作を起こしたりしてしまいます。 私達のエントリポイント関数が常にブートローダの期待する正しいシグネチャを持っていることを保証するために、`bootloader`クレートは[`entry_point`]マクロによって、Rustの関数を型チェックしたうえでエントリポイントとして定義する方法を提供します。私達のエントリポイント関数をこのマクロを使って書き直してみましょう: [`entry_point`]: https://docs.rs/bootloader/0.6.4/bootloader/macro.entry_point.html ```rust // in src/main.rs use bootloader::{BootInfo, entry_point}; entry_point!(kernel_main); fn kernel_main(boot_info: &'static BootInfo) -> ! 
{ […] } ``` このマクロがより低レベルな本物の`_start`エントリポイントを定義してくれるので、`extern "C"`や`no_mangle`をエントリポイントに使う必要はもうありません。`kernel_main`関数は今や完全に普通のRustの関数なので、自由に名前をつけることができます。そして重要なのは、この関数は型チェックされているので、間違った関数シグネチャ(例えば引数を増やしたり引数の型を変えたり)にするとコンパイルエラーが発生するということです。 `lib.rs`に同じ変更を施しましょう: ```rust // in src/lib.rs #[cfg(test)] use bootloader::{entry_point, BootInfo}; #[cfg(test)] entry_point!(test_kernel_main); /// `cargo test`のエントリポイント #[cfg(test)] fn test_kernel_main(_boot_info: &'static BootInfo) -> ! { // 前と同じ init(); test_main(); hlt_loop(); } ``` こちらのエントリポイントはテストモードのときにのみ使用するので、`#[cfg(test)]`属性をすべての要素に付しています。`main.rs`の`kernel_main`関数と混同しないよう、`test_kernel_main`という別の名前をつけました。いまのところ`BootInfo`引数は使わないので、引数名の先頭に`_`をつけることでunused variable (未使用変数) 警告が出てくるのを防いでいます。 ## 実装 物理メモリへのアクセスができるようになったので、いよいよページテーブルのコードを実装できます。そのためにまず、現在有効な、私達のカーネルが使用しているページテーブルを見てみます。次に、与えられた仮想アドレスがマップされている物理アドレスを返す変換関数を作ります。最後に、新しいマッピングを作るためにページテーブルを修正してみます。 始める前に、`memory`モジュールを作ります: ```rust // in src/lib.rs pub mod memory; ``` また、このモジュールに対応するファイル`src/memory.rs`を作ります。 ### ページテーブルにアクセスする [前の記事の最後][end of the previous post]で、私達のカーネルの実行しているページテーブルを見てみようとしましたが、`CR3`レジスタの指す物理フレームにアクセスすることができなかったためそれはできませんでした。この続きとして、`active_level_4_table`という、現在有効 (アクティブ) なレベル4ページテーブルへの参照を返す関数を定義するところから始めましょう: [end of the previous post]: @/edition-2/posts/08-paging-introduction/index.ja.md#peziteburuhenoakusesu ```rust // in src/memory.rs use x86_64::{ structures::paging::PageTable, VirtAddr, }; /// 有効なレベル4テーブルへの可変参照を返す。 /// /// この関数はunsafeである:全物理メモリが、渡された /// `physical_memory_offset`(だけずらしたうえ)で /// 仮想メモリへとマップされていることを呼び出し元が /// 保証しなければならない。また、`&mut`参照が複数の /// 名称を持つこと (mutable aliasingといい、動作が未定義) /// につながるため、この関数は一度しか呼び出してはならない。 pub unsafe fn active_level_4_table(physical_memory_offset: VirtAddr) -> &'static mut PageTable { use x86_64::registers::control::Cr3; let (level_4_table_frame, _) = Cr3::read(); let phys = level_4_table_frame.start_address(); let virt = physical_memory_offset + phys.as_u64(); let page_table_ptr: *mut PageTable = 
virt.as_mut_ptr(); unsafe { &mut *page_table_ptr } } ``` まず、有効なレベル4テーブルの物理フレームを`CR3`レジスタから読みます。その開始物理アドレスを取り出し、`u64`に変換し、`physical_memory_offset`に足すことでそのページテーブルフレームに対応する仮想アドレスを得ます。最後に、`as_mut_ptr`メソッドを使ってこの仮想アドレスを`*mut PageTable`生ポインタに変換し、これから`&mut PageTable`参照を作ります(ここがunsafe)。`&`参照ではなく`&mut`参照にしているのは、後でこのページテーブルを変更するためです。 この関数を使って、レベル4テーブルのエントリを出力してみましょう: ```rust // in src/main.rs fn kernel_main(boot_info: &'static BootInfo) -> ! { use blog_os::memory::active_level_4_table; use x86_64::VirtAddr; println!("Hello World{}", "!"); blog_os::init(); let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); let l4_table = unsafe { active_level_4_table(phys_mem_offset) }; for (i, entry) in l4_table.iter().enumerate() { if !entry.is_unused() { println!("L4 Entry {}: {:?}", i, entry); } } // as before #[cfg(test)] test_main(); println!("It did not crash!"); blog_os::hlt_loop(); } ``` まず、`BootInfo`構造体の`physical_memory_offset`を[`VirtAddr`]に変換し、`active_level_4_table`関数に渡します。つぎに`iter`関数を使ってページテーブルのエントリをイテレートし、[`enumerate`]コンビネータをつかってそれぞれの要素にインデックス`i`を追加します。全512エントリを出力すると画面に収まらないので、 (から) でないエントリのみ出力します。 [`VirtAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.VirtAddr.html [`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate 実行すると、以下の出力を得ます: ![QEMU printing entry 0 (0x2000, PRESENT, WRITABLE, ACCESSED), entry 1 (0x894000, PRESENT, WRITABLE, ACCESSED, DIRTY), entry 31 (0x88e000, PRESENT, WRITABLE, ACCESSED, DIRTY), entry 175 (0x891000, PRESENT, WRITABLE, ACCESSED, DIRTY), and entry 504 (0x897000, PRESENT, WRITABLE, ACCESSED, DIRTY)](qemu-print-level-4-table.png) いくつかの空でないエントリがあり、いずれも異なるレベル3テーブルにマップさせられていることがわかります。このようにたくさんの領域があるのは、カーネルコード、カーネルスタック、物理メモリマッピング、ブート情報が互いに離れたメモリ領域を使っているためです。 ページテーブルを更に辿りレベル3テーブルを見るには、エントリに対応するフレームを取り出し再び仮想アドレスに変換すればよいです: ```rust // src/main.rsのforループ内にて…… use x86_64::structures::paging::PageTable; if !entry.is_unused() { println!("L4 Entry {}: {:?}", i, entry); // このエントリから物理アドレスを得て、それを変換する let phys 
= entry.frame().unwrap().start_address();
    let virt = phys.as_u64() + boot_info.physical_memory_offset;
    let ptr = VirtAddr::new(virt).as_mut_ptr();
    let l3_table: &PageTable = unsafe { &*ptr };

    // print non-empty entries of the level 3 table
    for (i, entry) in l3_table.iter().enumerate() {
        if !entry.is_unused() {
            println!("  L3 Entry {}: {:?}", i, entry);
        }
    }
}
```

The level 2 and level 1 tables can be inspected by repeating the same process for the level 3 and level 2 entries. As you can imagine, this gets very verbose very quickly, so we don't show the full code here.

Traversing the page tables manually is interesting because it helps to understand how the CPU performs the translation. However, most of the time, we are only interested in the mapped physical address for a given virtual address, so let's create a function for that.

### Translating Addresses

To translate a virtual address to a physical address, we have to traverse the four-level page table until we reach the mapped frame. Let's create a function that performs this translation:

```rust
// in src/memory.rs

use x86_64::PhysAddr;

/// Translates the given virtual address to the mapped physical address, or
/// `None` if the address is not mapped.
///
/// This function is unsafe because the caller must guarantee that the
/// complete physical memory is mapped to virtual memory at the passed
/// `physical_memory_offset`.
pub unsafe fn translate_addr(addr: VirtAddr, physical_memory_offset: VirtAddr)
    -> Option<PhysAddr>
{
    translate_addr_inner(addr, physical_memory_offset)
}
```

We forward the function to a safe `translate_addr_inner` function to limit the scope of `unsafe`. As we noted above, Rust treats the complete body of an unsafe function like a large unsafe block. By calling into a private safe function, we make each `unsafe` operation explicit again.

The private inner function contains the real implementation:

```rust
// in src/memory.rs

/// Private function that is called by `translate_addr`.
///
/// This function is safe to limit the scope of `unsafe` because Rust treats
/// the whole body of unsafe functions as an unsafe block. This function must
/// only be reachable through `unsafe fn` from outside of this module.
fn translate_addr_inner(addr: VirtAddr, physical_memory_offset: VirtAddr)
    -> Option<PhysAddr>
{
    use x86_64::structures::paging::page_table::FrameError;
    use x86_64::registers::control::Cr3;

    // read the active level 4 frame from the CR3 register
    let (level_4_table_frame, _) = Cr3::read();

    let table_indexes = [
        addr.p4_index(), addr.p3_index(), addr.p2_index(), addr.p1_index()
    ];
    let mut frame = level_4_table_frame;

    // traverse the multi-level page table
    for &index in &table_indexes {
        // convert the frame into a page table reference
        let virt = physical_memory_offset + frame.start_address().as_u64();
        let table_ptr: *const PageTable = virt.as_ptr();
        let table = unsafe { &*table_ptr };

        // read the page table entry and update `frame`
        let entry = &table[index];
        frame = match entry.frame() {
            Ok(frame) => frame,
            Err(FrameError::FrameNotPresent) => return None,
            Err(FrameError::HugeFrame) => panic!("huge pages not supported"),
        };
    }

    // calculate the physical address by adding the page offset
    Some(frame.start_address() + u64::from(addr.page_offset()))
}
```

Instead of reusing our `active_level_4_table` function, we read the level 4 frame from the `CR3` register again. We do this because it simplifies this prototype implementation. Don't worry, we will create a better solution in a moment.

The `VirtAddr` struct already provides methods to compute the indexes into the page tables of the four levels. We store these indexes in a small array because it allows us to traverse the page tables using a `for` loop. Outside of the loop, we remember the last visited `frame` to calculate the physical address later. The `frame` points to page table frames while iterating and to the mapped frame after the last iteration, i.e., after following the level 1 entry.

Inside the loop, we again use the `physical_memory_offset` to convert the frame into a page table reference. We then read the entry of the current page table and use the [`PageTableEntry::frame`] function to retrieve the mapped frame. If the entry is not mapped to a frame, we return `None`. If the entry maps a huge 2 MiB or 1 GiB page, we panic for now.

[`PageTableEntry::frame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html#method.frame

Let's test our translation function by translating some addresses:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> !
{ // 新しいインポート use blog_os::memory::translate_addr; […] // hello world と blog_os::init let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); let addresses = [ // 恒等対応しているVGAバッファのページ 0xb8000, // コードページのどこか 0x201008, // スタックページのどこか 0x0100_0020_1a10, // 物理アドレス "0" にマップされている仮想アドレス boot_info.physical_memory_offset, ]; for &address in &addresses { let virt = VirtAddr::new(address); let phys = unsafe { translate_addr(virt, phys_mem_offset) }; println!("{:?} -> {:?}", virt, phys); } […] // test_main(), "it did not crash" の出力, および hlt_loop() } ``` 実行すると、以下の出力を得ます: ![0xb8000 -> 0xb8000, 0x201008 -> 0x401008, 0x10000201a10 -> 0x279a10, "panicked at 'huge pages not supported'](qemu-translate-addr.png) 期待したとおり、恒等マップしているアドレス`0xb8000`は同じ物理アドレスに変換されました。コードページとスタックページは物理アドレスのどこかしかに変換されていますが、その場所はブートローダがカーネルの初期マッピングをどのようにつくったかによります。また、下から12ビットは変換のあとも常に同じであるということも注目に値します:この部分は[ページオフセット][_page offset_]であり、変換には関わらないためです。 [_page offset_]: @/edition-2/posts/08-paging-introduction/index.ja.md#x86-64niokerupezingu それぞれの物理アドレスは`physical_memory_offset`を足すことでアクセスできるわけですから、`physical_memory_offset`自体を変換すると物理アドレス`0`を指すはずです。しかし、効率よくマッピングを行うためにここではhuge pageが使われており、これはまだサポートしていないので変換には失敗しています。 ### `OffsetPageTable`を使う 仮想アドレスから物理アドレスへの変換はOSのカーネルがよく行うことですから、`x86_64`クレートはそのための抽象化を提供しています。この実装はすでにhuge pageや`translate_addr`以外の様々な関数もサポートしているので、以下ではhuge pageのサポートを自前で実装する代わりにこれを使うことにします。 この抽象化の基礎となっているのは、様々なページテーブルマッピング関数を定義している2つのトレイトです。 - [`Mapper`]トレイトはページサイズを型引数とする汎用型 (ジェネリクス) で、ページに対して操作を行う関数を提供します。例えば、[`translate_page`]は与えられたページを同じサイズのフレームに変換し、[`map_to`]はページテーブルに新しいマッピングを作成します。 - [`Translate`] トレイトは[`translate_addr`]や一般の[`translate`]のような、さまざまなページサイズに対して動くような関数を提供します。 [`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html [`translate_page`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#tymethod.translate_page [`map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to 
[`Translate`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html [`translate_addr`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html#method.translate_addr [`translate`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html#tymethod.translate これらのトレイトはインターフェイスを定義しているだけであり、その実装は何一つ提供していません。`x86_64`クレートは現在、このトレイトを実装する型を異なる要件に合わせて3つ用意しています。[`OffsetPageTable`]型は、全物理メモリがあるオフセットで仮想アドレスにマップしていることを前提とします。[`MappedPageTable`]はもう少し融通が効き、それぞれのページテーブルフレームが(そのフレームから)計算可能な仮想アドレスにマップしていることだけを前提とします。最後に[`RecursivePageTable`]型は、ページテーブルのフレームに[再帰的ページテーブル](#zai-gui-de-peziteburu)を使ってアクセスするときに使えます。 [`OffsetPageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.OffsetPageTable.html [`MappedPageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MappedPageTable.html [`RecursivePageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.RecursivePageTable.html 私達の場合、ブートローダは全物理メモリを`physical_memory_offset`変数で指定された仮想アドレスで物理メモリにマップしているので、`OffsetPageTable`型が使えます。これを初期化するために、`memory`モジュールに新しく`init`関数を作りましょう: ```rust use x86_64::structures::paging::OffsetPageTable; /// 新しいOffsetPageTableを初期化する。 /// /// この関数はunsafeである:全物理メモリが、渡された /// `physical_memory_offset`(だけずらしたうえ)で /// 仮想メモリへとマップされていることを呼び出し元が /// 保証しなければならない。また、`&mut`参照が複数の /// 名称を持つこと (mutable aliasingといい、動作が未定義) /// につながるため、この関数は一度しか呼び出してはならない。 pub unsafe fn init(physical_memory_offset: VirtAddr) -> OffsetPageTable<'static> { unsafe { let level_4_table = active_level_4_table(physical_memory_offset); OffsetPageTable::new(level_4_table, physical_memory_offset) } } // これは非公開にする unsafe fn active_level_4_table(physical_memory_offset: VirtAddr) -> &'static mut PageTable {…} ``` この関数は`physical_memory_offset`を引数としてとり、`'static`ライフタイムを持つ`OffsetPageTable`を作って返します。このライフタイムは、私達のカーネルが実行している間この実体 (インスタンス) 
はずっと有効であるという意味です。関数の中ではまず`active_level_4_table`関数を呼び出し、レベル4ページテーブルへの可変参照を取得します。次に[`OffsetPageTable::new`]関数をこの参照を使って呼び出します。この`new`関数の第二引数には、物理メモリのマッピングの始まる仮想アドレスが入ることになっています。つまり`physical_memory_offset`です。 [`OffsetPageTable::new`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.OffsetPageTable.html#method.new 可変参照が複数の名称を持つと未定義動作を起こす可能性があるので、今後`active_level_4_table`関数は`init`関数から一度呼び出されることを除いては呼び出されてはなりません。そのため、`pub`指定子を外してこの関数を非公開にしています。 これで、自前の`memory::translate_addr`関数の代わりに`Translate::translate_addr`メソッドを使うことができます。これには`kernel_main`を数行だけ書き換えればよいです: ```rust // in src/main.rs fn kernel_main(boot_info: &'static BootInfo) -> ! { // インポートが追加・変更されている use blog_os::memory; use x86_64::{structures::paging::Translate, VirtAddr}; […] // hello worldとblog_os::init let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); // 追加:mapperを初期化 let mapper = unsafe { memory::init(phys_mem_offset) }; let addresses = […]; // 前と同じ for &address in &addresses { let virt = VirtAddr::new(address); // 追加:`mapper.translate_addr`メソッドを使う let phys = mapper.translate_addr(virt); println!("{:?} -> {:?}", virt, phys); } […] // test_main(), "it did not crash" の出力, および hlt_loop() } ``` [`translate_addr`]メソッドを使うために、それを提供している`Translate`トレイトをインポートする必要があります。 これを実行すると、同じ変換結果が得られますが、今度はhuge pageの変換もうまく行っています: ![0xb8000 -> 0xb8000, 0x201008 -> 0x401008, 0x10000201a10 -> 0x279a10, 0x18000000000 -> 0x0](qemu-mapper-translate-addr.png) 想定通り、`0xb8000`やコード・スタックアドレスの変換結果は自前の変換関数と同じになっています。また、`physical_memory_offset`は物理アドレス`0x0`にマップされているのもわかります。 `MappedPageTable`型の変換関数を使うことで、huge pageをサポートする手間が省けます。また`map_to`のような他のページング関数も利用でき、これは次のセクションで使います。 この時点で、自作した`memory::translate_addr`関数や`memory::translate_addr_inner`関数はもう必要ではないので削除して構いません。 ### 新しいマッピングを作る これまでページテーブルを見てきましたが、それに対する変更は行っていませんでした。ページテーブルに対する変更として、マッピングのなかったページにマッピングを作ってみましょう。 
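At the lowest level, creating a mapping means writing one 64-bit entry into a level 1 page table. As a rough illustration of what such an entry contains, the following host-side sketch combines a frame address with flag bits. It assumes the entry layout described in the previous post (bit 0 is `PRESENT`, bit 1 is `WRITABLE`, bits 12–51 hold the frame address). The constant and function names are hypothetical and not part of the `x86_64` crate.

```rust
/// Flag bits of an x86_64 page table entry (layout from the previous post).
const PRESENT: u64 = 1 << 0;
const WRITABLE: u64 = 1 << 1;
/// Mask selecting bits 12..=51, which hold the physical frame address.
const ADDR_MASK: u64 = 0x000f_ffff_ffff_f000;

/// Builds the raw entry value that maps a page to the frame at `frame_start`.
/// Returns `None` if the frame address is not 4 KiB aligned or too large.
fn make_entry(frame_start: u64, flags: u64) -> Option<u64> {
    if frame_start & !ADDR_MASK != 0 {
        return None;
    }
    Some(frame_start | flags)
}

/// Extracts the frame start address from a raw entry value.
fn entry_frame(entry: u64) -> u64 {
    entry & ADDR_MASK
}
```

For the VGA text buffer frame at `0xb8000` with both flags set, this model yields the entry value `0xb8003`. In the kernel itself, the `x86_64` crate takes care of this encoding.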
We will use the [`map_to`] function of the [`Mapper`] trait for our implementation, so let's take a look at that function first. The documentation tells us that it takes four arguments: the page that we want to map, the frame that the page should be mapped to, a set of flags for the page table entry, and a `frame_allocator`. The frame allocator is needed because mapping the given page might require creating additional page tables, which need unused frames as backing storage.

[`map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.Mapper.html#tymethod.map_to
[`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.Mapper.html

#### A `create_example_mapping` Function

The first step of our implementation is to create a new `create_example_mapping` function that maps a given virtual page to `0xb8000`, the physical frame of the VGA text buffer. We choose that frame because it allows us to easily test if the mapping was created correctly: we just need to write to the newly mapped page and see whether the write appears on the screen.

The `create_example_mapping` function looks like this:

```rust
// in src/memory.rs

use x86_64::{
    PhysAddr,
    structures::paging::{Page, PhysFrame, Mapper, Size4KiB, FrameAllocator}
};

/// Creates an example mapping for the given page to frame `0xb8000`.
pub fn create_example_mapping(
    page: Page,
    mapper: &mut OffsetPageTable,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) {
    use x86_64::structures::paging::PageTableFlags as Flags;

    let frame = PhysFrame::containing_address(PhysAddr::new(0xb8000));
    let flags = Flags::PRESENT | Flags::WRITABLE;

    let map_to_result = unsafe {
        // FIXME: this is not safe, we do it only for testing
        mapper.map_to(page, frame, flags, frame_allocator)
    };
    map_to_result.expect("map_to failed").flush();
}
```

In addition to the `page` that should be mapped, the function expects a mutable reference to an `OffsetPageTable` instance and a `frame_allocator`. The `frame_allocator` parameter uses the [`impl Trait`][impl-trait-arg] syntax to be [generic] over all types that implement the [`FrameAllocator`] trait. The trait is generic over the [`PageSize`] trait to work with both standard 4 KiB pages and huge 2 MiB/1 GiB pages. We only want to create a 4 KiB mapping, so we set the generic parameter to `Size4KiB`.

[impl-trait-arg]: https://doc.rust-lang.org/book/ch10-02-traits.html#traits-as-parameters
[generic]: https://doc.rust-lang.org/book/ch10-00-generics.html
[`FrameAllocator`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html
[`PageSize`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/trait.PageSize.html

The [`map_to`] method is unsafe because the caller must ensure that the frame is not already in use. The reason is that mapping the same frame twice could result in undefined behavior, for example when two different `&mut` references point to the same physical memory location. In our case, we reuse the VGA text buffer frame, which is already mapped, so we break the required condition. However, the `create_example_mapping` function is only a temporary testing function and will be removed after this post, so it is ok. To remind us of the unsafety, we put a `FIXME` comment on the line.

In addition to the `page` and the `frame`, the `map_to` method takes a set of flags for the mapping and a reference to the `frame_allocator`, which will be explained in a moment. For the flags, we set the `PRESENT` flag because it is required for all valid entries and the `WRITABLE` flag to make the mapped page writable. For a list of all possible flags, see the [_Page Table Format_] section of the previous post.

[_Page Table Format_]: @/edition-2/posts/08-paging-introduction/index.ja.md#peziteburunoxing-shi

The [`map_to`] function can fail, so it returns a [`Result`]. Since this is just some example code that does not need to be robust, we just use [`expect`] to panic when an error occurs. On success, the function returns a [`MapperFlush`] type that provides an easy way to flush the newly mapped page from the translation lookaside buffer (TLB) with its [`flush`] method. Like `Result`, the type uses the [`#[must_use]`][must_use] attribute to emit a warning when we accidentally forget to use it.

[`Result`]: https://doc.rust-lang.org/core/result/enum.Result.html
[`expect`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.expect
[`MapperFlush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html
[`flush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush
[must_use]: https://doc.rust-lang.org/std/result/#results-must-be-used

#### A dummy `FrameAllocator`

To be able to call `create_example_mapping`, we need to create a type that implements the `FrameAllocator` trait first. As noted above, the trait is responsible for allocating frames for new page tables if they are needed by `map_to`.

Let's start with the simple case and assume that we don't need to create new page tables. For this case, a frame allocator that always returns `None` suffices. We create such an `EmptyFrameAllocator` for testing our mapping function:

```rust
// in src/memory.rs

/// A FrameAllocator that always returns `None`.
pub struct EmptyFrameAllocator;

unsafe impl FrameAllocator<Size4KiB> for EmptyFrameAllocator {
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        None
    }
}
```
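To see why an allocator that always returns `None` can still be enough, it helps to model the single decision that a mapping function has to make for each page table level: reuse the existing table or request a fresh frame. The sketch below is a simplified host-side model, not the implementation in the `x86_64` crate. The `FrameSource` trait, `EmptyFrames`, and `next_table` are hypothetical names, and frames are represented by plain start addresses.

```rust
/// Hypothetical stand-in for the `FrameAllocator` trait; a frame is just its start address.
trait FrameSource {
    fn allocate_frame(&mut self) -> Option<u64>;
}

/// Counterpart of `EmptyFrameAllocator`: never hands out a frame.
struct EmptyFrames;

impl FrameSource for EmptyFrames {
    fn allocate_frame(&mut self) -> Option<u64> {
        None
    }
}

/// Models one level of the walk: reuse the existing next-level table if
/// there is one, otherwise ask the allocator for a frame to hold a new table.
fn next_table(
    existing: Option<u64>,
    frames: &mut impl FrameSource,
) -> Result<u64, &'static str> {
    match existing {
        Some(table) => Ok(table),
        None => frames.allocate_frame().ok_or("FrameAllocationFailed"),
    }
}
```

In this model, the allocator is only consulted when a table is missing, so `EmptyFrames` works as long as every required table already exists.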
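Implementing the `FrameAllocator` is unsafe because the implementer must guarantee that the allocator yields only unused frames. Otherwise, undefined behavior might occur, for example when two virtual pages are mapped to the same physical frame. Our `EmptyFrameAllocator` only returns `None`, so this isn't a problem in this case.

#### Choosing a Virtual Page

We now have a simple frame allocator that we can pass to our `create_example_mapping` function. However, the allocator always returns `None`, so this will only work if no additional page table frames are needed for creating the mapping. To understand when additional page table frames are needed and when not, let's consider an example:

![A virtual and a physical address space with a single mapped page and the page tables of all four levels](required-page-frames-example.svg)

The graphic shows the virtual address space on the left, the physical address space on the right, and the page tables in between. The page tables are stored in physical memory frames, indicated by the dashed lines. The virtual address space contains a single mapped page at address `0x803fe00000`, marked in blue. To translate this page to its frame, the CPU walks the 4-level page table until it reaches the frame at address 36 KiB.

Additionally, the graphic shows the physical frame of the VGA text buffer in red. Our goal is to map a previously unmapped virtual page to this frame using our `create_example_mapping` function. Since our `EmptyFrameAllocator` always returns `None`, we want to create the mapping so that no additional frames are needed from the allocator. Whether this is possible depends on the virtual page that we select for the mapping.

The graphic shows two candidate pages in the virtual address space, both marked in yellow. One page is at address `0x803fdfd000`, which is 3 pages before the mapped page (in blue). While the level 4 and level 3 page table indices are the same as for the blue page, the level 2 and level 1 indices are different (see the [previous post][page-table-indices]). The different index into the level 2 table means that a different level 1 table is used for this page. Since this level 1 table does not exist yet, we would need to create it if we chose that page for our example mapping, which would require an additional unused physical frame. In contrast, the second candidate page at address `0x803fe02000` does not have this problem because it uses the same level 1 page table as the blue page. Thus, all the required page tables already exist.

[page-table-indices]: @/edition-2/posts/08-paging-introduction/index.ja.md#x86-64niokerupezingu

In summary, the difficulty of creating a new mapping depends on the virtual page that we want to map. In the easiest case, the level 1 page table for the page already exists and we just need to write a single entry. In the most difficult case, the page is in a memory region for which no level 3 table exists yet, so we need to create new level 3, level 2, and level 1 page tables first.

For calling our `create_example_mapping` function with the `EmptyFrameAllocator`, we need to choose a page for which all page tables already exist. To find such a page, we can utilize the fact that the bootloader loads itself in the first megabyte of the virtual address space. This means that a valid level 1 table exists for all pages in this region. Thus, we can choose any unused page in this memory region for our example mapping, such as the page at address `0`. Normally, this page should stay unused to guarantee that dereferencing a null pointer causes a page fault, so we know that the bootloader leaves it unmapped.

#### Creating the Mapping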
というわけで、`create_example_mapping`関数を呼び出すために必要なすべての引数を手に入れたので、仮想アドレス`0`をマップするよう`kernel_main`関数を変更していきましょう。このページをVGAテキストバッファのフレームにマップすると、以後、画面に書き込むことができるようになるはずです。実装は以下のようになります: ```rust // in src/main.rs fn kernel_main(boot_info: &'static BootInfo) -> ! { use blog_os::memory; use x86_64::{structures::paging::Page, VirtAddr}; // 新しいインポート […] // hello worldとblog_os::init let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); let mut mapper = unsafe { memory::init(phys_mem_offset) }; let mut frame_allocator = memory::EmptyFrameAllocator; // 未使用のページをマップする let page = Page::containing_address(VirtAddr::new(0)); memory::create_example_mapping(page, &mut mapper, &mut frame_allocator); // 新しいマッピングを使って、文字列`New!`を画面に書き出す let page_ptr: *mut u64 = page.start_address().as_mut_ptr(); unsafe { page_ptr.offset(400).write_volatile(0x_f021_f077_f065_f04e)}; […] // test_main(), "it did not crash" printing, および hlt_loop() } ``` まず、`mapper`と`frame_allocator`インスタンスの可変参照を渡して`create_example_mapping`を呼ぶことで、アドレス`0`のページにマッピングを作っています。これはVGAテキストバッファのフレームにマップしているので、これに書き込んだものは何であれ画面に出てくるはずです。 次にページを生ポインタに変更して、オフセット`400`に値を書き込みます。このページの最初に書き込むとVGAバッファの一番上の行になり、次のprintlnで即座に画面外に流れていってしまうので、それを避けています。値`0x_f021_f077_f065_f04e`は、白背景の"New!"という文字列を表します。[VGAテキストモードの記事][in the _“VGA Text Mode”_ post]で学んだように、VGAバッファへの書き込みはvolatileでなければならないので、[`write_volatile`]メソッドを使っています。 [in the _“VGA Text Mode”_ post]: @/edition-2/posts/03-vga-text-buffer/index.ja.md#volatile [`write_volatile`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write_volatile QEMUで実行すると、以下の出力を得ます: ![QEMU printing "It did not crash!" with four completely white cells in the middle of the screen](qemu-new-mapping.png) 画面の "New!" 
はページ`0`への書き込みによるものなので、ページテーブルへの新しいマッピングの作成が成功したということを意味します。 このマッピングが成功したのは、アドレス`0`を管轄するレベル1テーブルがすでに存在していたからに過ぎません。レベル1テーブルがまだ存在しないページをマッピングしようとすると、`map_to`関数は新しいページテーブルを作るために`EmptyFrameAllocator`からフレームを割り当てようとしてエラーになります。`0`の代わりに`0xdeadbeaf000`をマッピングしようとするとそれが発生するのが見られます。 ```rust // in src/main.rs fn kernel_main(boot_info: &'static BootInfo) -> ! { […] let page = Page::containing_address(VirtAddr::new(0xdeadbeaf000)); […] } ``` これを実行すると、以下のエラーメッセージとともにパニックします: ``` panicked at 'map_to failed: FrameAllocationFailed', /…/result.rs:999:5 ``` レベル1テーブルがまだ存在していないページをマップするためには、ちゃんとした`FrameAllocator`を作らないといけません。しかし、どのフレームが未使用で、どのフレームが利用可能かはどうすればわかるのでしょう? ### フレームを割り当てる 新しいページテーブルを作成するためには、ちゃんとしたフレームアロケータを作る必要があります。このためには、ブートローダによって渡される`BootInfo`構造体の一部である`memory_map`を使います: ```rust // in src/memory.rs use bootloader::bootinfo::MemoryMap; /// ブートローダのメモリマップから、使用可能な /// フレームを返すFrameAllocator pub struct BootInfoFrameAllocator { memory_map: &'static MemoryMap, next: usize, } impl BootInfoFrameAllocator { /// 渡されたメモリマップからFrameAllocatorを作る。 /// /// この関数はunsafeである:呼び出し元は渡された /// メモリマップが有効であることを保証しなければ /// ならない。特に、`USABLE`なフレームは実際に /// 未使用でなくてはならない。 pub unsafe fn init(memory_map: &'static MemoryMap) -> Self { BootInfoFrameAllocator { memory_map, next: 0, } } } ``` この構造体は2つのフィールドを持ちます。ブートローダによって渡されたメモリマップへの`'static`な参照と、アロケータが次に返すべきフレームの番号を覚えておくための`next`フィールドです。 [_Boot Information_](#boot-information)節で説明したように、このメモリマップはBIOS/UEFIファームウェアから提供されます。これはブートプロセスのごく初期にのみ取得できますが、ブートローダがそのための関数を既に呼んでくれています。メモリマップは`MemoryRegion`構造体のリストからなり、この構造体はそれぞれのメモリ領域の開始アドレス、長さ、型(未使用か、予約済みかなど)を格納しています。 `init`関数は`BootInfoFrameAllocator`を与えられたメモリマップで初期化します。`next`フィールドは`0`で初期化し、フレームを割当てるたびに値を増やすことで同じフレームを二度返すことを防ぎます。メモリマップのusable (使用可能) とされているフレームが他のどこかで使われたりしていないかは知ることができないので、この`init`関数はそれを呼び出し元に追加で保証させるために`unsafe`でないといけません。 #### `usable_frames`メソッド `FrameAllocator`トレイトを実装していく前に、渡されたメモリマップをusableなフレームのイテレータに変換する補助メソッドを追加します: ```rust // in src/memory.rs use bootloader::bootinfo::MemoryRegionType; impl 
BootInfoFrameAllocator {
    /// Returns an iterator over the usable frames specified in the memory map.
    fn usable_frames(&self) -> impl Iterator<Item = PhysFrame> {
        // get usable regions from memory map
        let regions = self.memory_map.iter();
        let usable_regions = regions
            .filter(|r| r.region_type == MemoryRegionType::Usable);
        // map each region to its address range
        let addr_ranges = usable_regions
            .map(|r| r.range.start_addr()..r.range.end_addr());
        // transform to an iterator of frame start addresses
        let frame_addresses = addr_ranges.flat_map(|r| r.step_by(4096));
        // create `PhysFrame` types from the start addresses
        frame_addresses.map(|addr| PhysFrame::containing_address(PhysAddr::new(addr)))
    }
}
```

This function uses iterator combinator methods to transform the initial `MemoryMap` into an iterator of usable physical frames:

- First, we call the `iter` method to convert the memory map to an iterator of [`MemoryRegion`]s.
- Then we use the [`filter`] method to skip any reserved or otherwise unavailable regions. The bootloader updates the memory map for all the mappings it creates, so frames that are used by our kernel (code, data, or stack) or to store the boot information are already marked as `InUse` or similar. Thus, we can be sure that `Usable` frames are not used somewhere else.
- Afterwards, we use the [`map`] combinator and Rust's [range syntax] to transform our iterator of memory regions to an iterator of address ranges.
- Next, we use [`flat_map`] to transform the address ranges into an iterator of frame start addresses, choosing every 4096th address using [`step_by`]. Since 4096 bytes (= 4 KiB) is the page size, we get the start address of each frame. The bootloader page-aligns all usable memory areas, so we don't need any alignment or rounding code here. By using [`flat_map`] instead of `map`, we get an `Iterator<Item = u64>` instead of an `Iterator<Item = Iterator<Item = u64>>`.
- Finally, we convert the start addresses to `PhysFrame` types to construct an `Iterator<Item = PhysFrame>`.

[`MemoryRegion`]: https://docs.rs/bootloader/0.6.4/bootloader/bootinfo/struct.MemoryRegion.html
[`filter`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.filter
[`map`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.map
[range syntax]: https://doc.rust-lang.org/core/ops/struct.Range.html
[`step_by`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.step_by
[`flat_map`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.flat_map

The return type of the function uses the [`impl Trait`] feature. This way, we can specify that we return some type that implements the [`Iterator`] trait with item type `PhysFrame`, without naming the concrete return type. This is important here because we _can't_ name the concrete type: it depends on unnamable closure types.

[`impl Trait`]: https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits
[`Iterator`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html

#### Implementing the `FrameAllocator` Trait

Now we can implement the `FrameAllocator` trait:

```rust
// in src/memory.rs

unsafe impl FrameAllocator<Size4KiB> for BootInfoFrameAllocator {
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        let frame = self.usable_frames().nth(self.next);
        self.next += 1;
        frame
    }
}
```

We first use the `usable_frames` method to get an iterator of usable frames from the memory map. Then, we use the [`Iterator::nth`] function to get the frame with index `self.next`, thereby skipping the first `self.next` frames. Before returning that frame, we increase `self.next` by one so that we return the following frame on the next call.

[`Iterator::nth`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.nth

This implementation is not quite optimal since it recreates the `usable_frames` iterator on every allocation. It would be better to directly store the iterator as a struct field instead. Then we wouldn't need the `nth` method and could just call [`next`] on every allocation. The problem with this approach is that it's not possible to store an `impl Trait` type in a struct field currently. It might work someday when [_named existential types_] are fully implemented.

[`next`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#tymethod.next
[_named existential types_]: https://github.com/rust-lang/rfcs/pull/2071

#### Using the `BootInfoFrameAllocator`

We can now modify our `kernel_main` function to pass a `BootInfoFrameAllocator` instance instead of an `EmptyFrameAllocator`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> !
{ use blog_os::memory::BootInfoFrameAllocator; […] let mut frame_allocator = unsafe { BootInfoFrameAllocator::init(&boot_info.memory_map) }; […] } ``` ブート情報を使うフレームアロケータのおかげでマッピングは成功し、白背景に黒文字の"New!"が再び画面に現れました。舞台裏では、`map_to`メソッドが不足しているページテーブルを以下のやり方で作っています: - 渡された`frame_allocator`を使って未使用のフレームを割り当ててもらう。 - フレームをゼロで埋めることで、新しい空のページテーブルを作る。 - 上位のテーブルのエントリをそのフレームにマップする。 - 次の層で同じことを続ける。 `create_example_mapping`関数はただのお試しコードにすぎませんが、今や私達は任意のページにマッピングを作れるようになりました。これは、今後の記事で行うメモリ割り当てやマルチスレッディングにおいて不可欠です。 [上](#create-example-mappingguan-shu)で説明したような未定義動作を誤って引き起こしてしまうことのないよう、この時点で`create_example_mapping`関数を再び取り除いておきましょう。 ## まとめ この記事ではページテーブルのある物理フレームにアクセスするための様々なテクニックを学びました。恒等マップ、物理メモリ全体のマッピング、一時的なマッピング、再帰的ページテーブルなどです。このうち、シンプルでポータブル (アーキテクチャ非依存) で強力な、物理メモリ全体のマッピングを選びました。 ページテーブルにアクセスできなければ物理メモリをマップされないので、ブートローダの補助が必要でした。`bootloader`クレートはcargoのfeaturesというオプションを通じて、必要となるマッピングの作成をサポートしています。さらに、必要となる情報をエントリポイント関数の`&BootInfo`引数という形で私達のカーネルに渡してくれます。 実装についてですが、最初はページテーブルを辿る変換関数を自分の手で実装し、そのあとで`x86_64`クレートの`MappedPageTable`型を使いました。また、ページテーブルに新しいマッピングを作る方法や、そのために必要な`FrameAllocator`をブートローダに渡されたメモリマップをラップすることで作る方法を学びました。 ## 次は? 次の記事では、私達のカーネルのためのヒープメモリ領域を作り、それによって[メモリの割り当て][allocate memory]を行ったり各種の[コレクション型][collection types]を使うことが可能になります。 [allocate memory]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html [collection types]: https://doc.rust-lang.org/alloc/collections/index.html ================================================ FILE: blog/content/edition-2/posts/09-paging-implementation/index.md ================================================ +++ title = "Paging Implementation" weight = 9 path = "paging-implementation" date = 2019-03-14 [extra] chapter = "Memory Management" +++ This post shows how to implement paging support in our kernel. It first explores different techniques to make the physical page table frames accessible to the kernel and discusses their respective advantages and drawbacks. It then implements an address translation function and a function to create a new mapping. 
This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-09`][post branch] branch. [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-09 ## Introduction The [previous post] gave an introduction to the concept of paging. It motivated paging by comparing it with segmentation, explained how paging and page tables work, and then introduced the 4-level page table design of `x86_64`. We found out that the bootloader already set up a page table hierarchy for our kernel, which means that our kernel already runs on virtual addresses. This improves safety since illegal memory accesses cause page fault exceptions instead of modifying arbitrary physical memory. [previous post]: @/edition-2/posts/08-paging-introduction/index.md The post ended with the problem that we [can't access the page tables from our kernel][end of previous post] because they are stored in physical memory and our kernel already runs on virtual addresses. This post explores different approaches to making the page table frames accessible to our kernel. We will discuss the advantages and drawbacks of each approach and then decide on an approach for our kernel. [end of previous post]: @/edition-2/posts/08-paging-introduction/index.md#accessing-the-page-tables To implement the approach, we will need support from the bootloader, so we'll configure it first. Afterward, we will implement a function that traverses the page table hierarchy in order to translate virtual to physical addresses. Finally, we learn how to create new mappings in the page tables and how to find unused memory frames for creating new page tables. ## Accessing Page Tables Accessing the page tables from our kernel is not as easy as it may seem. 
To understand the problem, let's take a look at the example 4-level page table hierarchy from the previous post again: ![An example 4-level page hierarchy with each page table shown in physical memory](../paging-introduction/x86_64-page-table-translation.svg) The important thing here is that each page entry stores the _physical_ address of the next table. This avoids the need to run a translation for these addresses too, which would be bad for performance and could easily cause endless translation loops. The problem for us is that we can't directly access physical addresses from our kernel since our kernel also runs on top of virtual addresses. For example, when we access address `4 KiB` we access the _virtual_ address `4 KiB`, not the _physical_ address `4 KiB` where the level 4 page table is stored. When we want to access the physical address `4 KiB`, we can only do so through some virtual address that maps to it. So in order to access page table frames, we need to map some virtual pages to them. There are different ways to create these mappings that all allow us to access arbitrary page table frames. ### Identity Mapping A simple solution is to **identity map all page tables**: ![A virtual and a physical address space with various virtual pages mapped to the physical frame with the same address](identity-mapped-page-tables.svg) In this example, we see various identity-mapped page table frames. This way, the physical addresses of page tables are also valid virtual addresses so that we can easily access the page tables of all levels starting from the CR3 register. However, it clutters the virtual address space and makes it more difficult to find continuous memory regions of larger sizes. For example, imagine that we want to create a virtual memory region of size 1000 KiB in the above graphic, e.g., for [memory-mapping a file]. We can't start the region at `28 KiB` because it would collide with the already mapped page at `1004 KiB`. 
So we have to look further until we find a large enough unmapped area, for example at `1008 KiB`. This is a similar fragmentation problem as with [segmentation]. [memory-mapping a file]: https://en.wikipedia.org/wiki/Memory-mapped_file [segmentation]: @/edition-2/posts/08-paging-introduction/index.md#fragmentation Equally, it makes it much more difficult to create new page tables because we need to find physical frames whose corresponding pages aren't already in use. For example, let's assume that we reserved the _virtual_ 1000 KiB memory region starting at `1008 KiB` for our memory-mapped file. Now we can't use any frame with a _physical_ address between `1000 KiB` and `2008 KiB` anymore, because we can't identity map it. ### Map at a Fixed Offset To avoid the problem of cluttering the virtual address space, we can **use a separate memory region for page table mappings**. So instead of identity mapping page table frames, we map them at a fixed offset in the virtual address space. For example, the offset could be 10 TiB: ![The same figure as for the identity mapping, but each mapped virtual page is offset by 10 TiB.](page-tables-mapped-at-offset.svg) By using the virtual memory in the range `10 TiB..(10 TiB + physical memory size)` exclusively for page table mappings, we avoid the collision problems of the identity mapping. Reserving such a large region of the virtual address space is only possible if the virtual address space is much larger than the physical memory size. This isn't a problem on x86_64 since the 48-bit address space is 256 TiB large. This approach still has the disadvantage that we need to create a new mapping whenever we create a new page table. Also, it does not allow accessing page tables of other address spaces, which would be useful when creating a new process. 
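
The address arithmetic behind this scheme is small enough to sketch. The snippet below is a hosted illustration rather than kernel code: the constant `PAGE_TABLE_OFFSET` and the helpers `phys_to_virt` and `window_fits` are names assumed for this example, and the 10 TiB value is simply the one used in the figure above.

```rust
/// Size units, for readability.
const KIB: u64 = 1 << 10;
const TIB: u64 = 1 << 40;

/// Hypothetical fixed offset at which page table frames are mapped
/// (10 TiB as in the figure; any otherwise unused region would do).
const PAGE_TABLE_OFFSET: u64 = 10 * TIB;

/// Size of the 48-bit virtual address space on x86_64.
const VIRT_ADDR_SPACE: u64 = 1 << 48;

/// Virtual address through which the frame at physical address `phys`
/// is reachable under the fixed-offset scheme.
fn phys_to_virt(phys: u64) -> u64 {
    PAGE_TABLE_OFFSET + phys
}

/// Returns true if a window of `phys_mem_size` bytes starting at the
/// offset stays inside the lower half of the 48-bit address space.
fn window_fits(phys_mem_size: u64) -> bool {
    PAGE_TABLE_OFFSET + phys_mem_size <= VIRT_ADDR_SPACE / 2
}
```

With these definitions, the level 4 table frame at physical address `4 KiB` would be reachable at virtual address `10 TiB + 4 KiB`, and `VIRT_ADDR_SPACE` works out to the 256 TiB mentioned above.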
### Map the Complete Physical Memory We can solve these problems by **mapping the complete physical memory** instead of only page table frames: ![The same figure as for the offset mapping, but every physical frame has a mapping (at 10 TiB + X) instead of only page table frames.](map-complete-physical-memory.svg) This approach allows our kernel to access arbitrary physical memory, including page table frames of other address spaces. The reserved virtual memory range has the same size as before, with the difference that it no longer contains unmapped pages. The disadvantage of this approach is that additional page tables are needed for storing the mapping of the physical memory. These page tables need to be stored somewhere, so they use up a part of physical memory, which can be a problem on devices with a small amount of memory. On x86_64, however, we can use [huge pages] with a size of 2 MiB for the mapping, instead of the default 4 KiB pages. This way, mapping 32 GiB of physical memory only requires 132 KiB for page tables since only one level 3 table and 32 level 2 tables are needed. Huge pages are also more cache efficient since they use fewer entries in the translation lookaside buffer (TLB). [huge pages]: https://en.wikipedia.org/wiki/Page_%28computer_memory%29#Multiple_page_sizes ### Temporary Mapping For devices with very small amounts of physical memory, we could **map the page table frames only temporarily** when we need to access them. To be able to create the temporary mappings, we only need a single identity-mapped level 1 table: ![A virtual and a physical address space with an identity mapped level 1 table, which maps its 0th entry to the level 2 table frame, thereby mapping that frame to the page with address 0](temporarily-mapped-page-tables.svg) The level 1 table in this graphic controls the first 2 MiB of the virtual address space. 
This is because it is reachable by starting at the CR3 register and following the 0th entry in the level 4, level 3, and level 2 page tables. The entry with index `8` maps the virtual page at address `32 KiB` to the physical frame at address `32 KiB`, thereby identity mapping the level 1 table itself. The graphic shows this identity-mapping by the horizontal arrow at `32 KiB`. By writing to the identity-mapped level 1 table, our kernel can create up to 511 temporary mappings (512 minus the entry required for the identity mapping). In the above example, the kernel created two temporary mappings: - By mapping the 0th entry of the level 1 table to the frame with address `24 KiB`, it created a temporary mapping of the virtual page at `0 KiB` to the physical frame of the level 2 page table, indicated by the dashed arrow. - By mapping the 9th entry of the level 1 table to the frame with address `4 KiB`, it created a temporary mapping of the virtual page at `36 KiB` to the physical frame of the level 4 page table, indicated by the dashed arrow. Now the kernel can access the level 2 page table by writing to page `0 KiB` and the level 4 page table by writing to page `36 KiB`. The process for accessing an arbitrary page table frame with temporary mappings would be: - Search for a free entry in the identity-mapped level 1 table. - Map that entry to the physical frame of the page table that we want to access. - Access the target frame through the virtual page that maps to the entry. - Set the entry back to unused, thereby removing the temporary mapping again. This approach reuses the same 512 virtual pages for creating the mappings and thus requires only 4 KiB of physical memory. The drawback is that it is a bit cumbersome, especially since a new mapping might require modifications to multiple table levels, which means that we would need to repeat the above process multiple times. 
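
The four steps above can be sketched with a toy model of the identity-mapped level 1 table. This is a simulation in ordinary Rust, not code that touches real page tables: the `Level1Table` type and its methods are hypothetical names for this example, and entries are modelled as plain `Option<u64>` values instead of hardware page table entries.

```rust
const PAGE_SIZE: u64 = 4096;
const ENTRY_COUNT: usize = 512;

/// Toy model of the identity-mapped level 1 table: each entry is either
/// unused (`None`) or holds the physical start address of a frame.
struct Level1Table {
    entries: [Option<u64>; ENTRY_COUNT],
}

impl Level1Table {
    /// Creates a table whose only used entry identity-maps the table
    /// itself. Assumes `own_frame` is page-aligned and below 2 MiB.
    fn new(own_frame: u64) -> Self {
        let mut entries = [None; ENTRY_COUNT];
        entries[(own_frame / PAGE_SIZE) as usize] = Some(own_frame);
        Level1Table { entries }
    }

    /// Steps 1 and 2: find a free entry and point it at `frame`.
    /// Returns the virtual address of the page controlled by that entry,
    /// or `None` if all entries are in use.
    fn map_temporarily(&mut self, frame: u64) -> Option<u64> {
        let index = self.entries.iter().position(|e| e.is_none())?;
        self.entries[index] = Some(frame);
        Some(index as u64 * PAGE_SIZE)
    }

    /// Step 4: mark the entry for `page` as unused again.
    /// (A real kernel would also have to invalidate the stale TLB entry.)
    fn unmap(&mut self, page: u64) {
        self.entries[(page / PAGE_SIZE) as usize] = None;
    }
}
```

Step 3, the actual access, has no counterpart in the model: in a kernel it would be a read or write through the returned virtual address. Note that this sketch always picks the first free entry, so it would place the two mappings from the figure at pages `0 KiB` and `4 KiB` rather than `0 KiB` and `36 KiB`.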
### Recursive Page Tables Another interesting approach, which requires no additional page tables at all, is to **map the page table recursively**. The idea behind this approach is to map an entry from the level 4 page table to the level 4 table itself. By doing this, we effectively reserve a part of the virtual address space and map all current and future page table frames to that space. Let's go through an example to understand how this all works: ![An example 4-level page hierarchy with each page table shown in physical memory. Entry 511 of the level 4 page is mapped to frame 4KiB, the frame of the level 4 table itself.](recursive-page-table.png) The only difference to the [example at the beginning of this post] is the additional entry at index `511` in the level 4 table, which is mapped to physical frame `4 KiB`, the frame of the level 4 table itself. [example at the beginning of this post]: #accessing-page-tables By letting the CPU follow this entry on a translation, it doesn't reach a level 3 table but the same level 4 table again. This is similar to a recursive function that calls itself, therefore this table is called a _recursive page table_. The important thing is that the CPU assumes that every entry in the level 4 table points to a level 3 table, so it now treats the level 4 table as a level 3 table. This works because tables of all levels have the exact same layout on x86_64. By following the recursive entry one or multiple times before we start the actual translation, we can effectively shorten the number of levels that the CPU traverses. For example, if we follow the recursive entry once and then proceed to the level 3 table, the CPU thinks that the level 3 table is a level 2 table. Going further, it treats the level 2 table as a level 1 table and the level 1 table as the mapped frame. This means that we can now read and write the level 1 page table because the CPU thinks that it is the mapped frame. 
The graphic below illustrates the five translation steps:

![The above example 4-level page hierarchy with 5 arrows: "Step 0" from CR3 to level 4 table, "Step 1" from level 4 table to level 4 table, "Step 2" from level 4 table to level 3 table, "Step 3" from level 3 table to level 2 table, and "Step 4" from level 2 table to level 1 table.](recursive-page-table-access-level-1.png)

Similarly, we can follow the recursive entry twice before starting the translation to reduce the number of traversed levels to two:

![The same 4-level page hierarchy with the following 4 arrows: "Step 0" from CR3 to level 4 table, "Steps 1&2" from level 4 table to level 4 table, "Step 3" from level 4 table to level 3 table, and "Step 4" from level 3 table to level 2 table.](recursive-page-table-access-level-2.png)

Let's go through it step by step: First, the CPU follows the recursive entry on the level 4 table and thinks that it reaches a level 3 table. Then it follows the recursive entry again and thinks that it reaches a level 2 table. But in reality, it is still on the level 4 table. When the CPU now follows a different entry, it lands on a level 3 table but thinks it is already on a level 1 table. So while the next entry points to a level 2 table, the CPU thinks that it points to the mapped frame, which allows us to read and write the level 2 table.

Accessing the tables of levels 3 and 4 works in the same way. To access the level 3 table, we follow the recursive entry three times, tricking the CPU into thinking it is already on a level 1 table. Then we follow another entry and reach a level 3 table, which the CPU treats as a mapped frame. For accessing the level 4 table itself, we just follow the recursive entry four times until the CPU treats the level 4 table itself as the mapped frame (in blue in the graphic below).
![The same 4-level page hierarchy with the following 3 arrows: "Step 0" from CR3 to level 4 table, "Steps 1,2,3" from level 4 table to level 4 table, and "Step 4" from level 4 table to level 3 table. In blue, the alternative "Steps 1,2,3,4" arrow from level 4 table to level 4 table.](recursive-page-table-access-level-3.png)

It might take some time to wrap your head around the concept, but it works quite well in practice.

In the section below, we explain how to construct virtual addresses for following the recursive entry one or multiple times. We will not use recursive paging for our implementation, so you don't need to read it to continue with the post. If it interests you, just click on _"Address Calculation"_ to expand it.

---

#### Address Calculation

    We saw that we can access tables of all levels by following the recursive entry once or multiple times before the actual translation. Since the indexes into the tables of the four levels are derived directly from the virtual address, we need to construct special virtual addresses for this technique. Remember, the page table indexes are derived from the address in the following way: ![Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index](../paging-introduction/x86_64-table-indices-from-address.svg) Let's assume that we want to access the level 1 page table that maps a specific page. As we learned above, this means that we have to follow the recursive entry once before continuing with the level 4, level 3, and level 2 indexes. To do that, we move each block of the address one block to the right and set the original level 4 index to the index of the recursive entry: ![Bits 0–12 are the offset into the level 1 table frame, bits 12–21 the level 2 index, bits 21–30 the level 3 index, bits 30–39 the level 4 index, and bits 39–48 the index of the recursive entry](table-indices-from-address-recursive-level-1.svg) For accessing the level 2 table of that page, we move each index block two blocks to the right and set both the blocks of the original level 4 index and the original level 3 index to the index of the recursive entry: ![Bits 0–12 are the offset into the level 2 table frame, bits 12–21 the level 3 index, bits 21–30 the level 4 index, and bits 30–39 and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-2.svg) Accessing the level 3 table works by moving each block three blocks to the right and using the recursive index for the original level 4, level 3, and level 2 address blocks: ![Bits 0–12 are the offset into the level 3 table frame, bits 12–21 the level 4 index, and bits 21–30, bits 30–39 and bits 39–48 are the index of 
the recursive entry](table-indices-from-address-recursive-level-3.svg)

Finally, we can access the level 4 table by moving each block four blocks to the right and using the recursive index for all address blocks except for the offset:

![Bits 0–12 are the offset into the level 4 table frame and bits 12–21, bits 21–30, bits 30–39, and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-4.svg)

We can now calculate virtual addresses for the page tables of all four levels. We can even calculate an address that points exactly to a specific page table entry by multiplying its index by 8, the size of a page table entry.

The table below summarizes the address structure for accessing the different kinds of frames:

Virtual Address for | Address Structure ([octal])
------------------- | -------------------------------
Page                | `0o_SSSSSS_AAA_BBB_CCC_DDD_EEEE`
Level 1 Table Entry | `0o_SSSSSS_RRR_AAA_BBB_CCC_DDDD`
Level 2 Table Entry | `0o_SSSSSS_RRR_RRR_AAA_BBB_CCCC`
Level 3 Table Entry | `0o_SSSSSS_RRR_RRR_RRR_AAA_BBBB`
Level 4 Table Entry | `0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA`

[octal]: https://en.wikipedia.org/wiki/Octal

Here, `AAA` is the level 4 index, `BBB` the level 3 index, `CCC` the level 2 index, and `DDD` the level 1 index of the mapped frame, and `EEEE` is the offset into it. `RRR` is the index of the recursive entry. When an index (three digits) is transformed to an offset (four digits), it is multiplied by 8 (the size of a page table entry). With this offset, the resulting address directly points to the respective page table entry.

`SSSSSS` are sign extension bits, which means that they are all copies of bit 47. This is a special requirement for valid addresses on the x86_64 architecture. We explained it in the [previous post][sign extension].
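
As a worked instance of the "Level 1 Table Entry" row, the sketch below computes the address of the level 1 entry that maps a given virtual address, including the multiplication by 8 and the sign extension. The function names `sign_extend` and `l1_entry_addr` are made up for this example, and the code assumes that the level 4 entry with index `r` is mapped recursively.

```rust
/// Sign-extends a 48-bit value to a canonical 64-bit address by copying
/// bit 47 into bits 48..64.
fn sign_extend(addr: u64) -> u64 {
    ((addr << 16) as i64 >> 16) as u64
}

/// Virtual address of the level 1 table entry that maps `addr`,
/// assuming the level 4 entry with index `r` is mapped recursively.
fn l1_entry_addr(addr: u64, r: u64) -> u64 {
    let l4 = (addr >> 39) & 0o777; // AAA
    let l3 = (addr >> 30) & 0o777; // BBB
    let l2 = (addr >> 21) & 0o777; // CCC
    let l1 = (addr >> 12) & 0o777; // DDD
    // Shift every index one block to the right, put `r` in the level 4
    // slot, and turn the level 1 index into a byte offset (8 bytes per entry).
    sign_extend((r << 39) | (l4 << 30) | (l3 << 21) | (l2 << 12) | (l1 * 8))
}
```

For the address `0o001_002_003_004_0567` and `r = 0o777`, this yields `0o177777_777_001_002_003_0040`. Computing the sign extension from bit 47, instead of hard-coding it, keeps the result canonical even if a recursive index below 256 is chosen.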
[sign extension]: @/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64 We use [octal] numbers for representing the addresses since each octal character represents three bits, which allows us to clearly separate the 9-bit indexes of the different page table levels. This isn't possible with the hexadecimal system, where each character represents four bits. ##### In Rust Code To construct such addresses in Rust code, you can use bitwise operations: ```rust // the virtual address whose corresponding page tables you want to access let addr: usize = […]; let r = 0o777; // recursive index let sign = 0o177777 << 48; // sign extension // retrieve the page table indices of the address that we want to translate let l4_idx = (addr >> 39) & 0o777; // level 4 index let l3_idx = (addr >> 30) & 0o777; // level 3 index let l2_idx = (addr >> 21) & 0o777; // level 2 index let l1_idx = (addr >> 12) & 0o777; // level 1 index let page_offset = addr & 0o7777; // calculate the table addresses let level_4_table_addr = sign | (r << 39) | (r << 30) | (r << 21) | (r << 12); let level_3_table_addr = sign | (r << 39) | (r << 30) | (r << 21) | (l4_idx << 12); let level_2_table_addr = sign | (r << 39) | (r << 30) | (l4_idx << 21) | (l3_idx << 12); let level_1_table_addr = sign | (r << 39) | (l4_idx << 30) | (l3_idx << 21) | (l2_idx << 12); ``` The above code assumes that the last level 4 entry with index `0o777` (511) is recursively mapped. This isn't the case currently, so the code won't work yet. See below on how to tell the bootloader to set up the recursive mapping. Alternatively to performing the bitwise operations by hand, you can use the [`RecursivePageTable`] type of the `x86_64` crate, which provides safe abstractions for various page table operations. 
For example, the code below shows how to translate a virtual address to its mapped physical address:

[`RecursivePageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.RecursivePageTable.html

```rust
// in src/memory.rs

use x86_64::structures::paging::{Mapper, Page, PageTable, RecursivePageTable};
use x86_64::{VirtAddr, PhysAddr};

// Create a RecursivePageTable instance from the level 4 address.
let level_4_table_addr = […];
let level_4_table_ptr = level_4_table_addr as *mut PageTable;
let recursive_page_table = unsafe {
    let level_4_table = &mut *level_4_table_ptr;
    // no trailing semicolon, so the block evaluates to the table
    RecursivePageTable::new(level_4_table).unwrap()
};

// Retrieve the physical address for the given virtual address
let addr: u64 = […];
let addr = VirtAddr::new(addr);
let page: Page = Page::containing_address(addr);

// perform the translation
let frame = recursive_page_table.translate_page(page);
frame.map(|frame| frame.start_address() + u64::from(addr.page_offset()))
```

Again, a valid recursive mapping is required for this code. With such a mapping, the missing `level_4_table_addr` can be calculated as in the first code example.
    --- Recursive Paging is an interesting technique that shows how powerful a single mapping in a page table can be. It is relatively easy to implement and only requires a minimal amount of setup (just a single recursive entry), so it's a good choice for first experiments with paging. However, it also has some disadvantages: - It occupies a large amount of virtual memory (512 GiB). This isn't a big problem in the large 48-bit address space, but it might lead to suboptimal cache behavior. - It only allows accessing the currently active address space easily. Accessing other address spaces is still possible by changing the recursive entry, but a temporary mapping is required for switching back. We described how to do this in the (outdated) [_Remap The Kernel_] post. - It heavily relies on the page table format of x86 and might not work on other architectures. [_Remap The Kernel_]: https://os.phil-opp.com/remap-the-kernel/#overview ## Bootloader Support All of these approaches require page table modifications for their setup. For example, mappings for the physical memory need to be created or an entry of the level 4 table needs to be mapped recursively. The problem is that we can't create these required mappings without an existing way to access the page tables. This means that we need the help of the bootloader, which creates the page tables that our kernel runs on. The bootloader has access to the page tables, so it can create any mappings that we need. In its current implementation, the `bootloader` crate has support for two of the above approaches, controlled through [cargo features]: [cargo features]: https://doc.rust-lang.org/cargo/reference/features.html#the-features-section - The `map_physical_memory` feature maps the complete physical memory somewhere into the virtual address space. Thus, the kernel has access to all physical memory and can follow the [_Map the Complete Physical Memory_](#map-the-complete-physical-memory) approach. 
- With the `recursive_page_table` feature, the bootloader maps an entry of the level 4 page table recursively. This allows the kernel to access the page tables as described in the [_Recursive Page Tables_](#recursive-page-tables) section. We choose the first approach for our kernel since it is simple, platform-independent, and more powerful (it also allows access to non-page-table-frames). To enable the required bootloader support, we add the `map_physical_memory` feature to our `bootloader` dependency: ```toml [dependencies] bootloader = { version = "0.9", features = ["map_physical_memory"]} ``` With this feature enabled, the bootloader maps the complete physical memory to some unused virtual address range. To communicate the virtual address range to our kernel, the bootloader passes a _boot information_ structure. ### Boot Information The `bootloader` crate defines a [`BootInfo`] struct that contains all the information it passes to our kernel. The struct is still in an early stage, so expect some breakage when updating to future [semver-incompatible] bootloader versions. With the `map_physical_memory` feature enabled, it currently has the two fields `memory_map` and `physical_memory_offset`: [`BootInfo`]: https://docs.rs/bootloader/0.9/bootloader/bootinfo/struct.BootInfo.html [semver-incompatible]: https://doc.rust-lang.org/stable/cargo/reference/specifying-dependencies.html#caret-requirements - The `memory_map` field contains an overview of the available physical memory. This tells our kernel how much physical memory is available in the system and which memory regions are reserved for devices such as the VGA hardware. The memory map can be queried from the BIOS or UEFI firmware, but only very early in the boot process. For this reason, it must be provided by the bootloader because there is no way for the kernel to retrieve it later. We will need the memory map later in this post. 
- The `physical_memory_offset` tells us the virtual start address of the physical memory mapping. By adding this offset to a physical address, we get the corresponding virtual address. This allows us to access arbitrary physical memory from our kernel. - This physical memory offset can be customized by adding a `[package.metadata.bootloader]` table in Cargo.toml and setting the field `physical-memory-offset = "0x0000f00000000000"` (or any other value). However, note that the bootloader can panic if it runs into physical address values that start to overlap with the space beyond the offset, i.e., areas it would have previously mapped to some other early physical addresses. So in general, the higher the value (> 1 TiB), the better. The bootloader passes the `BootInfo` struct to our kernel in the form of a `&'static BootInfo` argument to our `_start` function. We don't have this argument declared in our function yet, so let's add it: ```rust // in src/main.rs use bootloader::BootInfo; #[unsafe(no_mangle)] pub extern "C" fn _start(boot_info: &'static BootInfo) -> ! { // new argument […] } ``` It wasn't a problem to leave off this argument before because the x86_64 calling convention passes the first argument in a CPU register. Thus, the argument is simply ignored when it isn't declared. However, it would be a problem if we accidentally used a wrong argument type, since the compiler doesn't know the correct type signature of our entry point function. ### The `entry_point` Macro Since our `_start` function is called externally from the bootloader, no checking of our function signature occurs. This means that we could let it take arbitrary arguments without any compilation errors, but it would fail or cause undefined behavior at runtime. 
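
One common way to get such a check at compile time is to route the call through a function pointer with an explicitly written type. The sketch below illustrates the idea in ordinary hosted Rust. It is not the `bootloader` crate's implementation: the `checked_entry!` macro, the generated `checked_start` function, and the stand-in `BootInfo` struct are all hypothetical, and the functions return a `u64` instead of `!` so that the example can be run and tested.

```rust
/// Stand-in for the boot information struct (hypothetical, for this sketch).
pub struct BootInfo {
    pub physical_memory_offset: u64,
}

/// Hypothetical macro: wraps `$path` in a function with a fixed signature.
/// The typed `let` binding fails to compile if `$path` does not have
/// exactly the expected function type.
macro_rules! checked_entry {
    ($path:path) => {
        pub fn checked_start(boot_info: &'static BootInfo) -> u64 {
            // the signature check happens on this line
            let f: fn(&'static BootInfo) -> u64 = $path;
            f(boot_info)
        }
    };
}

/// An ordinary Rust function with the expected signature.
fn kernel_main(boot_info: &'static BootInfo) -> u64 {
    boot_info.physical_memory_offset
}

checked_entry!(kernel_main);
```

Changing the parameter type of `kernel_main` in this sketch, or adding a second parameter, turns the typed `let` binding into a compile error instead of a runtime failure.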
To make sure that the entry point function always has the correct signature that the bootloader expects, the `bootloader` crate provides an [`entry_point`] macro that provides a type-checked way to define a Rust function as the entry point. Let's rewrite our entry point function to use this macro: [`entry_point`]: https://docs.rs/bootloader/0.6.4/bootloader/macro.entry_point.html ```rust // in src/main.rs use bootloader::{BootInfo, entry_point}; entry_point!(kernel_main); fn kernel_main(boot_info: &'static BootInfo) -> ! { […] } ``` We no longer need to use `extern "C"` or `no_mangle` for our entry point, as the macro defines the real lower level `_start` entry point for us. The `kernel_main` function is now a completely normal Rust function, so we can choose an arbitrary name for it. The important thing is that it is type-checked so that a compilation error occurs when we use a wrong function signature, for example by adding an argument or changing the argument type. Let's perform the same change in our `lib.rs`: ```rust // in src/lib.rs #[cfg(test)] use bootloader::{entry_point, BootInfo}; #[cfg(test)] entry_point!(test_kernel_main); /// Entry point for `cargo test` #[cfg(test)] fn test_kernel_main(_boot_info: &'static BootInfo) -> ! { // like before init(); test_main(); hlt_loop(); } ``` Since the entry point is only used in test mode, we add the `#[cfg(test)]` attribute to all items. We give our test entry point the distinct name `test_kernel_main` to avoid confusion with the `kernel_main` of our `main.rs`. We don't use the `BootInfo` parameter for now, so we prefix the parameter name with a `_` to silence the unused variable warning. ## Implementation Now that we have access to physical memory, we can finally start to implement our page table code. First, we will take a look at the currently active page tables that our kernel runs on. 
In the second step, we will create a translation function that returns the physical address that a given virtual address is mapped to. As a last step, we will try to modify the page tables in order to create a new mapping. Before we begin, we create a new `memory` module for our code: ```rust // in src/lib.rs pub mod memory; ``` For the module, we create an empty `src/memory.rs` file. ### Accessing the Page Tables At the [end of the previous post], we tried to take a look at the page tables our kernel runs on, but failed since we couldn't access the physical frame that the `CR3` register points to. We're now able to continue from there by creating an `active_level_4_table` function that returns a reference to the active level 4 page table: [end of the previous post]: @/edition-2/posts/08-paging-introduction/index.md#accessing-the-page-tables ```rust // in src/memory.rs use x86_64::{ structures::paging::PageTable, VirtAddr, }; /// Returns a mutable reference to the active level 4 table. /// /// This function is unsafe because the caller must guarantee that the /// complete physical memory is mapped to virtual memory at the passed /// `physical_memory_offset`. Also, this function must be only called once /// to avoid aliasing `&mut` references (which is undefined behavior). pub unsafe fn active_level_4_table(physical_memory_offset: VirtAddr) -> &'static mut PageTable { use x86_64::registers::control::Cr3; let (level_4_table_frame, _) = Cr3::read(); let phys = level_4_table_frame.start_address(); let virt = physical_memory_offset + phys.as_u64(); let page_table_ptr: *mut PageTable = virt.as_mut_ptr(); unsafe { &mut *page_table_ptr } } ``` First, we read the physical frame of the active level 4 table from the `CR3` register. We then take its physical start address, convert it to a `u64`, and add it to `physical_memory_offset` to get the virtual address where the page table frame is mapped. 
Finally, we convert the virtual address to a `*mut PageTable` raw pointer through the `as_mut_ptr` method and then unsafely create a `&mut PageTable` reference from it. We create a `&mut` reference instead of a `&` reference because we will mutate the page tables later in this post. We can now use this function to print the entries of the level 4 table: ```rust // in src/main.rs fn kernel_main(boot_info: &'static BootInfo) -> ! { use blog_os::memory::active_level_4_table; use x86_64::VirtAddr; println!("Hello World{}", "!"); blog_os::init(); let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); let l4_table = unsafe { active_level_4_table(phys_mem_offset) }; for (i, entry) in l4_table.iter().enumerate() { if !entry.is_unused() { println!("L4 Entry {}: {:?}", i, entry); } } // as before #[cfg(test)] test_main(); println!("It did not crash!"); blog_os::hlt_loop(); } ``` First, we convert the `physical_memory_offset` of the `BootInfo` struct to a [`VirtAddr`] and pass it to the `active_level_4_table` function. We then use the `iter` function to iterate over the page table entries and the [`enumerate`] combinator to additionally add an index `i` to each element. We only print non-empty entries because all 512 entries wouldn't fit on the screen. [`VirtAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.VirtAddr.html [`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate When we run it, we see the following output: ![QEMU printing entry 0 (0x2000, PRESENT, WRITABLE, ACCESSED), entry 1 (0x894000, PRESENT, WRITABLE, ACCESSED, DIRTY), entry 31 (0x88e000, PRESENT, WRITABLE, ACCESSED, DIRTY), entry 175 (0x891000, PRESENT, WRITABLE, ACCESSED, DIRTY), and entry 504 (0x897000, PRESENT, WRITABLE, ACCESSED, DIRTY)](qemu-print-level-4-table.png) We see that there are various non-empty entries, which all map to different level 3 tables. 
There are so many regions because kernel code, kernel stack, physical memory mapping, and boot information all use separate memory areas.

To traverse the page tables further and take a look at a level 3 table, we can take the mapped frame of an entry and convert it to a virtual address again:

```rust
// in the `for` loop in src/main.rs

use x86_64::structures::paging::PageTable;

if !entry.is_unused() {
    println!("L4 Entry {}: {:?}", i, entry);

    // get the physical address from the entry and convert it
    let phys = entry.frame().unwrap().start_address();
    let virt = phys.as_u64() + boot_info.physical_memory_offset;
    let ptr = VirtAddr::new(virt).as_mut_ptr();
    let l3_table: &PageTable = unsafe { &*ptr };

    // print non-empty entries of the level 3 table
    for (i, entry) in l3_table.iter().enumerate() {
        if !entry.is_unused() {
            println!(" L3 Entry {}: {:?}", i, entry);
        }
    }
}
```

For looking at the level 2 and level 1 tables, we repeat that process for the level 3 and level 2 entries. As you can imagine, this gets very verbose very quickly, so we don't show the full code here.

Traversing the page tables manually is interesting because it helps to understand how the CPU performs the translation. However, most of the time, we are only interested in the mapped physical address for a given virtual address, so let's create a function for that.

### Translating Addresses

To translate a virtual to a physical address, we have to traverse the four-level page table until we reach the mapped frame. Let's create a function that performs this translation:

```rust
// in src/memory.rs

use x86_64::PhysAddr;

/// Translates the given virtual address to the mapped physical address, or
/// `None` if the address is not mapped.
///
/// This function is unsafe because the caller must guarantee that the
/// complete physical memory is mapped to virtual memory at the passed
/// `physical_memory_offset`.
pub unsafe fn translate_addr(addr: VirtAddr, physical_memory_offset: VirtAddr)
    -> Option<PhysAddr>
{
    translate_addr_inner(addr, physical_memory_offset)
}
```

We forward the function to a safe `translate_addr_inner` function to limit the scope of `unsafe`. As we noted above, Rust treats the complete body of an `unsafe fn` like a large unsafe block. By calling into a private safe function, we make each `unsafe` operation explicit again.

The private inner function contains the real implementation:

```rust
// in src/memory.rs

/// Private function that is called by `translate_addr`.
///
/// This function is safe to limit the scope of `unsafe` because Rust treats
/// the whole body of unsafe functions as an unsafe block. This function must
/// only be reachable through `unsafe fn` from outside of this module.
fn translate_addr_inner(addr: VirtAddr, physical_memory_offset: VirtAddr)
    -> Option<PhysAddr>
{
    use x86_64::structures::paging::page_table::FrameError;
    use x86_64::registers::control::Cr3;

    // read the active level 4 frame from the CR3 register
    let (level_4_table_frame, _) = Cr3::read();

    let table_indexes = [
        addr.p4_index(), addr.p3_index(), addr.p2_index(), addr.p1_index()
    ];
    let mut frame = level_4_table_frame;

    // traverse the multi-level page table
    for &index in &table_indexes {
        // convert the frame into a page table reference
        let virt = physical_memory_offset + frame.start_address().as_u64();
        let table_ptr: *const PageTable = virt.as_ptr();
        let table = unsafe {&*table_ptr};

        // read the page table entry and update `frame`
        let entry = &table[index];
        frame = match entry.frame() {
            Ok(frame) => frame,
            Err(FrameError::FrameNotPresent) => return None,
            Err(FrameError::HugeFrame) => panic!("huge pages not supported"),
        };
    }

    // calculate the physical address by adding the page offset
    Some(frame.start_address() + u64::from(addr.page_offset()))
}
```

Instead of reusing our `active_level_4_table` function, we read the level 4 frame from the `CR3` register again.
We do this because it simplifies this prototype implementation. Don't worry, we will create a better solution in a moment.

The `VirtAddr` struct already provides methods to compute the indexes into the page tables of the four levels. We store these indexes in a small array because it allows us to traverse the page tables using a `for` loop. Outside of the loop, we remember the last visited `frame` to calculate the physical address later. The `frame` points to page table frames while iterating and to the mapped frame after the last iteration, i.e., after following the level 1 entry.

Inside the loop, we again use the `physical_memory_offset` to convert the frame into a page table reference. We then read the entry of the current page table and use the [`PageTableEntry::frame`] function to retrieve the mapped frame. If the entry is not mapped to a frame, we return `None`. If the entry maps a huge 2 MiB or 1 GiB page, we panic for now.

[`PageTableEntry::frame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html#method.frame

Let's test our translation function by translating some addresses:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> !
{
    // new import
    use blog_os::memory::translate_addr;

    […] // hello world and blog_os::init

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);

    let addresses = [
        // the identity-mapped vga buffer page
        0xb8000,
        // some code page
        0x201008,
        // some stack page
        0x0100_0020_1a10,
        // virtual address mapped to physical address 0
        boot_info.physical_memory_offset,
    ];

    for &address in &addresses {
        let virt = VirtAddr::new(address);
        let phys = unsafe { translate_addr(virt, phys_mem_offset) };
        println!("{:?} -> {:?}", virt, phys);
    }

    […] // test_main(), "it did not crash" printing, and hlt_loop()
}
```

When we run it, we see the following output:

![0xb8000 -> 0xb8000, 0x201008 -> 0x401008, 0x10000201a10 -> 0x279a10, "panicked at 'huge pages not supported'](qemu-translate-addr.png)

As expected, the identity-mapped address `0xb8000` translates to the same physical address. The code page and the stack page translate to some arbitrary physical addresses, which depend on how the bootloader created the initial mapping for our kernel. It's worth noting that the last 12 bits always stay the same after translation, which makes sense because these bits are the [_page offset_] and not part of the translation.

[_page offset_]: @/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64

Since each physical address can be accessed by adding the `physical_memory_offset`, the translation of the `physical_memory_offset` address itself should point to physical address `0`. However, the translation fails because the mapping uses huge pages for efficiency, which is not supported in our implementation yet.

### Using `OffsetPageTable`

Translating virtual to physical addresses is a common task in an OS kernel, so the `x86_64` crate provides an abstraction for it. The implementation already supports huge pages and several other page table functions apart from `translate_addr`, so we will use it in the following instead of adding huge page support to our own implementation.

At the basis of the abstraction are two traits that define various page table mapping functions:

- The [`Mapper`] trait is generic over the page size and provides functions that operate on pages. Examples are [`translate_page`], which translates a given page to a frame of the same size, and [`map_to`], which creates a new mapping in the page table.
- The [`Translate`] trait provides functions that work with multiple page sizes, such as [`translate_addr`] or the general [`translate`].

[`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html
[`translate_page`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#tymethod.translate_page
[`map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to
[`Translate`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html
[`translate_addr`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html#method.translate_addr
[`translate`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html#tymethod.translate

The traits only define the interface; they don't provide any implementation. The `x86_64` crate currently provides three types that implement the traits with different requirements. The [`OffsetPageTable`] type assumes that the complete physical memory is mapped to the virtual address space at some offset. The [`MappedPageTable`] is a bit more flexible: It only requires that each page table frame is mapped to the virtual address space at a calculable address. Finally, the [`RecursivePageTable`] type can be used to access page table frames through [recursive page tables](#recursive-page-tables).
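To illustrate the difference between "everything mapped at an offset" and "mapped at a calculable address", here is a minimal, self-contained sketch. It does not use the `x86_64` crate; the `FrameToVirt` trait, the `OffsetMapping` and `CustomMapping` types, and the `table_address` function are hypothetical names invented for this illustration, and the addresses are example values only.

```rust
/// Hypothetical trait: how to find the virtual address of a page table frame.
trait FrameToVirt {
    fn frame_to_virt(&self, phys_frame_start: u64) -> u64;
}

/// All physical memory is mapped at a fixed offset (the idea behind `OffsetPageTable`).
struct OffsetMapping {
    offset: u64,
}

impl FrameToVirt for OffsetMapping {
    fn frame_to_virt(&self, phys_frame_start: u64) -> u64 {
        self.offset + phys_frame_start
    }
}

/// Any calculable rule is allowed (the more flexible idea behind `MappedPageTable`).
struct CustomMapping<F: Fn(u64) -> u64> {
    rule: F,
}

impl<F: Fn(u64) -> u64> FrameToVirt for CustomMapping<F> {
    fn frame_to_virt(&self, phys_frame_start: u64) -> u64 {
        (self.rule)(phys_frame_start)
    }
}

/// Code written against the trait works with either strategy.
fn table_address(mapping: &impl FrameToVirt, phys_frame_start: u64) -> u64 {
    mapping.frame_to_virt(phys_frame_start)
}

fn main() {
    let offset = OffsetMapping { offset: 0x180_0000_0000 };
    let identity = CustomMapping { rule: |phys: u64| phys };

    println!("{:#x}", table_address(&offset, 0x1000));
    println!("{:#x}", table_address(&identity, 0x1000));
}
```

The point of the sketch is only the shape of the design: the page table walking code needs some way to turn a physical frame address into a usable virtual address, and the concrete types differ in how much they assume about that conversion.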

[`OffsetPageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.OffsetPageTable.html
[`MappedPageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MappedPageTable.html
[`RecursivePageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.RecursivePageTable.html

In our case, the bootloader maps the complete physical memory at a virtual address specified by the `physical_memory_offset` variable, so we can use the `OffsetPageTable` type. To initialize it, we create a new `init` function in our `memory` module:

```rust
use x86_64::structures::paging::OffsetPageTable;

/// Initialize a new OffsetPageTable.
///
/// This function is unsafe because the caller must guarantee that the
/// complete physical memory is mapped to virtual memory at the passed
/// `physical_memory_offset`. Also, this function must be only called once
/// to avoid aliasing `&mut` references (which is undefined behavior).
pub unsafe fn init(physical_memory_offset: VirtAddr) -> OffsetPageTable<'static> {
    unsafe {
        let level_4_table = active_level_4_table(physical_memory_offset);
        OffsetPageTable::new(level_4_table, physical_memory_offset)
    }
}

// make private
unsafe fn active_level_4_table(physical_memory_offset: VirtAddr)
    -> &'static mut PageTable
{…}
```

The function takes the `physical_memory_offset` as an argument and returns a new `OffsetPageTable` instance with a `'static` lifetime. This means that the instance stays valid for the complete runtime of our kernel. In the function body, we first call the `active_level_4_table` function to retrieve a mutable reference to the level 4 page table. We then invoke the [`OffsetPageTable::new`] function with this reference. As the second parameter, the `new` function expects the virtual address at which the mapping of the physical memory starts, which is given in the `physical_memory_offset` variable.

[`OffsetPageTable::new`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.OffsetPageTable.html#method.new

The `active_level_4_table` function should only be called from the `init` function from now on because it can easily lead to aliased mutable references when called multiple times, which can cause undefined behavior. For this reason, we make the function private by removing the `pub` specifier.

We can now use the `Translate::translate_addr` method instead of our own `memory::translate_addr` function. We only need to change a few lines in our `kernel_main`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // new: different imports
    use blog_os::memory;
    use x86_64::{structures::paging::Translate, VirtAddr};

    […] // hello world and blog_os::init

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    // new: initialize a mapper
    let mapper = unsafe { memory::init(phys_mem_offset) };

    let addresses = […]; // same as before

    for &address in &addresses {
        let virt = VirtAddr::new(address);
        // new: use the `mapper.translate_addr` method
        let phys = mapper.translate_addr(virt);
        println!("{:?} -> {:?}", virt, phys);
    }

    […] // test_main(), "it did not crash" printing, and hlt_loop()
}
```

We need to import the `Translate` trait in order to use the [`translate_addr`] method it provides.

When we run it now, we see the same translation results as before, with the difference that the huge page translation now also works:

![0xb8000 -> 0xb8000, 0x201008 -> 0x401008, 0x10000201a10 -> 0x279a10, 0x18000000000 -> 0x0](qemu-mapper-translate-addr.png)

As expected, the translations of `0xb8000` and the code and stack addresses stay the same as with our own translation function. Additionally, we now see that the virtual address `physical_memory_offset` is mapped to the physical address `0x0`.

By using the translation function of the `OffsetPageTable` type, we can spare ourselves the work of implementing huge page support.
We also have access to other page functions, such as `map_to`, which we will use in the next section.

At this point, we no longer need our `memory::translate_addr` and `memory::translate_addr_inner` functions, so we can delete them.

### Creating a new Mapping

Until now, we only looked at the page tables without modifying anything. Let's change that by creating a new mapping for a previously unmapped page.

We will use the [`map_to`] function of the [`Mapper`] trait for our implementation, so let's take a look at that function first. The documentation tells us that it takes four arguments: the page that we want to map, the frame that the page should be mapped to, a set of flags for the page table entry, and a `frame_allocator`. The frame allocator is needed because mapping the given page might require creating additional page tables, which need unused frames as backing storage.

[`map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.Mapper.html#tymethod.map_to
[`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.Mapper.html

#### A `create_example_mapping` Function

The first step of our implementation is to create a new `create_example_mapping` function that maps a given virtual page to `0xb8000`, the physical frame of the VGA text buffer. We choose that frame because it allows us to easily test if the mapping was created correctly: We just need to write to the newly mapped page and see whether we see the write appear on the screen.

The `create_example_mapping` function looks like this:

```rust
// in src/memory.rs

use x86_64::{
    PhysAddr,
    structures::paging::{Page, PhysFrame, Mapper, Size4KiB, FrameAllocator}
};

/// Creates an example mapping for the given page to frame `0xb8000`.
pub fn create_example_mapping(
    page: Page,
    mapper: &mut OffsetPageTable,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) {
    use x86_64::structures::paging::PageTableFlags as Flags;

    let frame = PhysFrame::containing_address(PhysAddr::new(0xb8000));
    let flags = Flags::PRESENT | Flags::WRITABLE;

    let map_to_result = unsafe {
        // FIXME: this is not safe, we do it only for testing
        mapper.map_to(page, frame, flags, frame_allocator)
    };
    map_to_result.expect("map_to failed").flush();
}
```

In addition to the `page` that should be mapped, the function expects a mutable reference to an `OffsetPageTable` instance and a `frame_allocator`. The `frame_allocator` parameter uses the [`impl Trait`][impl-trait-arg] syntax to be [generic] over all types that implement the [`FrameAllocator`] trait. The trait is generic over the [`PageSize`] trait to work with both standard 4 KiB pages and huge 2 MiB/1 GiB pages. We only want to create a 4 KiB mapping, so we set the generic parameter to `Size4KiB`.

[impl-trait-arg]: https://doc.rust-lang.org/book/ch10-02-traits.html#traits-as-parameters
[generic]: https://doc.rust-lang.org/book/ch10-00-generics.html
[`FrameAllocator`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html
[`PageSize`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/trait.PageSize.html

The [`map_to`] method is unsafe because the caller must ensure that the frame is not already in use. The reason for this is that mapping the same frame twice could result in undefined behavior, for example when two different `&mut` references point to the same physical memory location. In our case, we reuse the VGA text buffer frame, which is already mapped, so we break the required condition. However, the `create_example_mapping` function is only a temporary testing function and will be removed after this post, so it is ok. To remind us of the unsafety, we put a `FIXME` comment on the line.
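To see why mapping the same frame twice is a problem, consider the following toy model. It is only a sketch with a flat, single-level page-to-frame map and a `Vec<u8>` standing in for physical memory; `ToyAddressSpace` and its methods are hypothetical and have nothing to do with the real `x86_64` crate API.

```rust
use std::collections::HashMap;

const PAGE_SIZE: usize = 4096;

/// Toy model: physical memory plus a single-level page -> frame map.
struct ToyAddressSpace {
    physical_memory: Vec<u8>,
    mappings: HashMap<usize, usize>, // page number -> frame number
}

impl ToyAddressSpace {
    fn new(frames: usize) -> Self {
        ToyAddressSpace {
            physical_memory: vec![0; frames * PAGE_SIZE],
            mappings: HashMap::new(),
        }
    }

    fn map(&mut self, page: usize, frame: usize) {
        self.mappings.insert(page, frame);
    }

    /// Translates a virtual address, keeping the page offset unchanged.
    fn translate(&self, virt: usize) -> Option<usize> {
        let frame = self.mappings.get(&(virt / PAGE_SIZE))?;
        Some(frame * PAGE_SIZE + virt % PAGE_SIZE)
    }

    fn write(&mut self, virt: usize, value: u8) {
        let phys = self.translate(virt).expect("page not mapped");
        self.physical_memory[phys] = value;
    }

    fn read(&self, virt: usize) -> u8 {
        let phys = self.translate(virt).expect("page not mapped");
        self.physical_memory[phys]
    }
}

fn main() {
    let mut space = ToyAddressSpace::new(4);
    space.map(0, 2); // page 0 -> frame 2
    space.map(7, 2); // page 7 -> the same frame 2

    // a write through page 0 ...
    space.write(16, 0xab);
    // ... is visible through page 7, because both alias frame 2
    println!("{:#x}", space.read(7 * PAGE_SIZE + 16));
}
```

In this model the aliasing is harmless, but in a real kernel the two pages could back two different `&mut` references, which is exactly the situation that Rust's aliasing rules forbid.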

In addition to the `page` and the `frame`, the `map_to` method takes a set of flags for the mapping and a reference to the `frame_allocator`, which will be explained in a moment. For the flags, we set the `PRESENT` flag because it is required for all valid entries and the `WRITABLE` flag to make the mapped page writable. For a list of all possible flags, see the [_Page Table Format_] section of the previous post.

[_Page Table Format_]: @/edition-2/posts/08-paging-introduction/index.md#page-table-format

The [`map_to`] function can fail, so it returns a [`Result`]. Since this is just some example code that does not need to be robust, we just use [`expect`] to panic when an error occurs. On success, the function returns a [`MapperFlush`] type that provides an easy way to flush the newly mapped page from the translation lookaside buffer (TLB) with its [`flush`] method. Like `Result`, the type uses the [`#[must_use]`][must_use] attribute to emit a warning when we accidentally forget to use it.

[`Result`]: https://doc.rust-lang.org/core/result/enum.Result.html
[`expect`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.expect
[`MapperFlush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html
[`flush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush
[must_use]: https://doc.rust-lang.org/std/result/#results-must-be-used

#### A dummy `FrameAllocator`

To be able to call `create_example_mapping`, we need to create a type that implements the `FrameAllocator` trait first. As noted above, the trait is responsible for allocating frames for new page tables if they are needed by `map_to`.

Let's start with the simple case and assume that we don't need to create new page tables. For this case, a frame allocator that always returns `None` suffices.
We create such an `EmptyFrameAllocator` for testing our mapping function:

```rust
// in src/memory.rs

/// A FrameAllocator that always returns `None`.
pub struct EmptyFrameAllocator;

unsafe impl FrameAllocator<Size4KiB> for EmptyFrameAllocator {
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        None
    }
}
```

Implementing the `FrameAllocator` is unsafe because the implementer must guarantee that the allocator yields only unused frames. Otherwise, undefined behavior might occur, for example when two virtual pages are mapped to the same physical frame. Our `EmptyFrameAllocator` only returns `None`, so this isn't a problem in this case.

#### Choosing a Virtual Page

We now have a simple frame allocator that we can pass to our `create_example_mapping` function. However, the allocator always returns `None`, so this will only work if no additional page table frames are needed for creating the mapping. To understand when additional page table frames are needed and when not, let's consider an example:

![A virtual and a physical address space with a single mapped page and the page tables of all four levels](required-page-frames-example.svg)

The graphic shows the virtual address space on the left, the physical address space on the right, and the page tables in between. The page tables are stored in physical memory frames, indicated by the dashed lines. The virtual address space contains a single mapped page at address `0x803fe00000`, marked in blue. To translate this page to its frame, the CPU walks the 4-level page table until it reaches the frame at address 36 KiB.

Additionally, the graphic shows the physical frame of the VGA text buffer in red. Our goal is to map a previously unmapped virtual page to this frame using our `create_example_mapping` function. Since our `EmptyFrameAllocator` always returns `None`, we want to create the mapping so that no additional frames are needed from the allocator. This depends on the virtual page that we select for the mapping.

The graphic shows two candidate pages in the virtual address space, both marked in yellow. One page is at address `0x803fdfd000`, which is 3 pages before the mapped page (in blue). While the level 4 and level 3 page table indices are the same as for the blue page, the level 2 and level 1 indices are different (see the [previous post][page-table-indices]). The different index into the level 2 table means that a different level 1 table is used for this page. Since this level 1 table does not exist yet, we would need to create it if we chose that page for our example mapping, which would require an additional unused physical frame. In contrast, the second candidate page at address `0x803fe02000` does not have this problem because it uses the same level 1 page table as the blue page. Thus, all the required page tables already exist.

[page-table-indices]: @/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64

In summary, the difficulty of creating a new mapping depends on the virtual page that we want to map. In the easiest case, the level 1 page table for the page already exists and we just need to write a single entry. In the most difficult case, the page is in a memory region for which no level 3 table exists yet, so we need to create new level 3, level 2 and level 1 page tables first.

For calling our `create_example_mapping` function with the `EmptyFrameAllocator`, we need to choose a page for which all page tables already exist. To find such a page, we can utilize the fact that the bootloader loads itself in the first megabyte of the virtual address space. This means that a valid level 1 table exists for all pages in this region. Thus, we can choose any unused page in this memory region for our example mapping, such as the page at address `0`. Normally, this page should stay unused to guarantee that dereferencing a null pointer causes a page fault, so we know that the bootloader leaves it unmapped.
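The index comparison for the blue page and the two yellow candidate pages can be checked with a few lines of plain Rust. This is a standalone sketch that assumes the usual x86_64 layout of four 9-bit indices above a 12-bit page offset; the helper names `table_indices` and `same_level_1_table` are made up for this example and are not part of the `x86_64` crate.

```rust
/// Splits a virtual address into its level 4, 3, 2, and 1 table indices (9 bits each).
fn table_indices(addr: u64) -> [u16; 4] {
    [
        ((addr >> 39) & 0x1ff) as u16, // level 4
        ((addr >> 30) & 0x1ff) as u16, // level 3
        ((addr >> 21) & 0x1ff) as u16, // level 2
        ((addr >> 12) & 0x1ff) as u16, // level 1
    ]
}

/// Two pages share a level 1 table when their level 4, 3, and 2 indices match.
fn same_level_1_table(a: u64, b: u64) -> bool {
    table_indices(a)[..3] == table_indices(b)[..3]
}

fn main() {
    let mapped = 0x803fe00000; // the blue page
    for candidate in [0x803fdfd000u64, 0x803fe02000] {
        println!(
            "{:#x}: indices {:?}, shares level 1 table: {}",
            candidate,
            table_indices(candidate),
            same_level_1_table(mapped, candidate)
        );
    }
}
```

The blue page has the indices (1, 0, 511, 0). The first candidate has (1, 0, 510, 509), so it needs a different level 1 table, while the second candidate has (1, 0, 511, 2) and only needs a new entry in the existing level 1 table.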
#### Creating the Mapping

We now have all the required parameters for calling our `create_example_mapping` function, so let's modify our `kernel_main` function to map the page at virtual address `0`. Since we map the page to the frame of the VGA text buffer, we should be able to write to the screen through it afterward. The implementation looks like this:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    use blog_os::memory;
    use x86_64::{structures::paging::Page, VirtAddr}; // new import

    […] // hello world and blog_os::init

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    let mut mapper = unsafe { memory::init(phys_mem_offset) };
    let mut frame_allocator = memory::EmptyFrameAllocator;

    // map an unused page
    let page = Page::containing_address(VirtAddr::new(0));
    memory::create_example_mapping(page, &mut mapper, &mut frame_allocator);

    // write the string `New!` to the screen through the new mapping
    let page_ptr: *mut u64 = page.start_address().as_mut_ptr();
    unsafe { page_ptr.offset(400).write_volatile(0x_f021_f077_f065_f04e)};

    […] // test_main(), "it did not crash" printing, and hlt_loop()
}
```

We first create the mapping for the page at address `0` by calling our `create_example_mapping` function with a mutable reference to the `mapper` and the `frame_allocator` instances. This maps the page to the VGA text buffer frame, so we should see any write to it on the screen.

Then we convert the page to a raw pointer and write a value to offset `400`. We don't write to the start of the page because the top line of the VGA buffer is directly shifted off the screen by the next `println`. We write the value `0x_f021_f077_f065_f04e`, which represents the string _"New!"_ on a white background. As we learned [in the _“VGA Text Mode”_ post], writes to the VGA buffer should be volatile, so we use the [`write_volatile`] method.
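To see how the magic number and the offset `400` turn into text on the screen, the value can be decoded on any host machine. The sketch below assumes the VGA text mode layout from the earlier post (two bytes per cell, character byte first, 80 cells per row) and the little-endian byte order of x86_64; `decode_cells` and `screen_position` are hypothetical helper names.

```rust
/// Splits the 64-bit value into four VGA cells of (character, attribute byte).
fn decode_cells(value: u64) -> [(char, u8); 4] {
    let bytes = value.to_le_bytes(); // x86_64 stores the lowest byte first
    let mut cells = [(' ', 0u8); 4];
    for (i, cell) in cells.iter_mut().enumerate() {
        *cell = (bytes[2 * i] as char, bytes[2 * i + 1]);
    }
    cells
}

/// Converts an offset of a `*mut u64` into a (row, column) text position.
fn screen_position(u64_offset: usize) -> (usize, usize) {
    let cell = u64_offset * 8 / 2; // 8 bytes per u64, 2 bytes per cell
    (cell / 80, cell % 80) // 80 cells per row
}

fn main() {
    let cells = decode_cells(0x_f021_f077_f065_f04e);
    let text: String = cells.iter().map(|(c, _)| *c).collect();
    println!("{:?} with attribute {:#x}", text, cells[0].1);
    println!("{:?}", screen_position(400));
}
```

Each cell carries the attribute byte `0xf0`, i.e., background color 15 (white) and foreground color 0 (black), and the four character bytes spell `N`, `e`, `w`, `!`. Offset `400` of a `u64` pointer corresponds to byte 3200, which is the first cell of row 20.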

[in the _“VGA Text Mode”_ post]: @/edition-2/posts/03-vga-text-buffer/index.md#volatile
[`write_volatile`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write_volatile

When we run it in QEMU, we see the following output:

![QEMU printing "It did not crash!" with four completely white cells in the middle of the screen](qemu-new-mapping.png)

The _"New!"_ on the screen is caused by our write to page `0`, which means that we successfully created a new mapping in the page tables.

Creating that mapping only worked because the level 1 table responsible for the page at address `0` already exists. When we try to map a page for which no level 1 table exists yet, the `map_to` function fails because it tries to create new page tables by allocating frames with the `EmptyFrameAllocator`. We can see that happen when we try to map page `0xdeadbeaf000` instead of `0`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […]
    let page = Page::containing_address(VirtAddr::new(0xdeadbeaf000));
    […]
}
```

When we run it, a panic with the following error message occurs:

```
panicked at 'map_to failed: FrameAllocationFailed', /…/result.rs:999:5
```

To map pages that don't have a level 1 page table yet, we need to create a proper `FrameAllocator`. But how do we know which frames are unused and how much physical memory is available?

### Allocating Frames

In order to create new page tables, we need to create a proper frame allocator. To do that, we use the `memory_map` that is passed by the bootloader as part of the `BootInfo` struct:

```rust
// in src/memory.rs

use bootloader::bootinfo::MemoryMap;

/// A FrameAllocator that returns usable frames from the bootloader's memory map.
pub struct BootInfoFrameAllocator {
    memory_map: &'static MemoryMap,
    next: usize,
}

impl BootInfoFrameAllocator {
    /// Create a FrameAllocator from the passed memory map.
    ///
    /// This function is unsafe because the caller must guarantee that the passed
    /// memory map is valid.
    /// The main requirement is that all frames that are marked
    /// as `USABLE` in it are really unused.
    pub unsafe fn init(memory_map: &'static MemoryMap) -> Self {
        BootInfoFrameAllocator {
            memory_map,
            next: 0,
        }
    }
}
```

The struct has two fields: A `'static` reference to the memory map passed by the bootloader and a `next` field that keeps track of the number of the next frame that the allocator should return.

As we explained in the [_Boot Information_](#boot-information) section, the memory map is provided by the BIOS/UEFI firmware. It can only be queried very early in the boot process, so the bootloader already calls the respective functions for us. The memory map consists of a list of [`MemoryRegion`] structs, which contain the start address, the length, and the type (e.g. unused, reserved, etc.) of each memory region.

The `init` function initializes a `BootInfoFrameAllocator` with a given memory map. The `next` field is initialized with `0` and will be increased for every frame allocation to avoid returning the same frame twice. Since we don't know if the usable frames of the memory map were already used somewhere else, our `init` function must be `unsafe` to require additional guarantees from the caller.

#### A `usable_frames` Method

Before we implement the `FrameAllocator` trait, we add an auxiliary method that converts the memory map into an iterator of usable frames:

```rust
// in src/memory.rs

use bootloader::bootinfo::MemoryRegionType;

impl BootInfoFrameAllocator {
    /// Returns an iterator over the usable frames specified in the memory map.
    fn usable_frames(&self) -> impl Iterator<Item = PhysFrame> {
        // get usable regions from memory map
        let regions = self.memory_map.iter();
        let usable_regions = regions
            .filter(|r| r.region_type == MemoryRegionType::Usable);
        // map each region to its address range
        let addr_ranges = usable_regions
            .map(|r| r.range.start_addr()..r.range.end_addr());
        // transform to an iterator of frame start addresses
        let frame_addresses = addr_ranges.flat_map(|r| r.step_by(4096));
        // create `PhysFrame` types from the start addresses
        frame_addresses.map(|addr| PhysFrame::containing_address(PhysAddr::new(addr)))
    }
}
```

This function uses iterator combinator methods to transform the initial `MemoryMap` into an iterator of usable physical frames:

- First, we call the `iter` method to convert the memory map to an iterator of [`MemoryRegion`]s.
- Then we use the [`filter`] method to skip any reserved or otherwise unavailable regions. The bootloader updates the memory map for all the mappings it creates, so frames that are used by our kernel (code, data, or stack) or to store the boot information are already marked as `InUse` or similar. Thus, we can be sure that `Usable` frames are not used somewhere else.
- Afterwards, we use the [`map`] combinator and Rust's [range syntax] to transform our iterator of memory regions to an iterator of address ranges.
- Next, we use [`flat_map`] to transform the address ranges into an iterator of frame start addresses, choosing every 4096th address using [`step_by`]. Since 4096 bytes (= 4 KiB) is the page size, we get the start address of each frame. The bootloader page-aligns all usable memory areas so that we don't need any alignment or rounding code here. By using [`flat_map`] instead of `map`, we get an `Iterator<Item = u64>` instead of an `Iterator<Item = Iterator<Item = u64>>`.
- Finally, we convert the start addresses to `PhysFrame` types to construct an `Iterator<Item = PhysFrame>`.
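The combinator chain described above can be tried out on a host machine with a hand-written memory map. The following sketch replaces the bootloader's types with hypothetical stand-ins (`Region`, `RegionType`) and yields plain `u64` start addresses instead of `PhysFrame` values; the region boundaries are made-up example values.

```rust
/// Hypothetical stand-in for the bootloader's region type.
#[derive(Clone, Copy, PartialEq)]
enum RegionType {
    Usable,
    Reserved,
}

/// Hypothetical stand-in for a memory map entry.
struct Region {
    start: u64,
    end: u64, // exclusive
    region_type: RegionType,
}

/// Yields the start address of every 4 KiB frame inside a usable region.
fn usable_frame_addresses(map: &[Region]) -> impl Iterator<Item = u64> + '_ {
    map.iter()
        // skip reserved regions
        .filter(|r| r.region_type == RegionType::Usable)
        // turn each region into an address range
        .map(|r| r.start..r.end)
        // flatten the ranges into one stream of frame start addresses
        .flat_map(|range| range.step_by(4096))
}

fn main() {
    let map = [
        Region { start: 0x0000, end: 0x2000, region_type: RegionType::Usable },
        Region { start: 0x2000, end: 0x5000, region_type: RegionType::Reserved },
        Region { start: 0x5000, end: 0x6000, region_type: RegionType::Usable },
    ];
    let frames: Vec<u64> = usable_frame_addresses(&map).collect();
    println!("{:x?}", frames);
}
```

For this example map, the iterator yields the frame start addresses `0x0`, `0x1000`, and `0x5000`: two frames from the first usable region, none from the reserved region, and one from the last region.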

[`MemoryRegion`]: https://docs.rs/bootloader/0.6.4/bootloader/bootinfo/struct.MemoryRegion.html
[`filter`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.filter
[`map`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.map
[range syntax]: https://doc.rust-lang.org/core/ops/struct.Range.html
[`step_by`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.step_by
[`flat_map`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.flat_map

The return type of the function uses the [`impl Trait`] feature. This way, we can specify that we return some type that implements the [`Iterator`] trait with item type `PhysFrame` but don't need to name the concrete return type. This is important here because we _can't_ name the concrete type since it depends on unnamable closure types.

[`impl Trait`]: https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits
[`Iterator`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html

#### Implementing the `FrameAllocator` Trait

Now we can implement the `FrameAllocator` trait:

```rust
// in src/memory.rs

unsafe impl FrameAllocator<Size4KiB> for BootInfoFrameAllocator {
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        let frame = self.usable_frames().nth(self.next);
        self.next += 1;
        frame
    }
}
```

We first use the `usable_frames` method to get an iterator of usable frames from the memory map. Then, we use the [`Iterator::nth`] function to get the frame with index `self.next` (thereby skipping the first `self.next` frames). Before returning that frame, we increase `self.next` by one so that we return the following frame on the next call.

[`Iterator::nth`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.nth

This implementation is not quite optimal since it recreates the `usable_frames` iterator on every allocation. It would be better to directly store the iterator as a struct field instead.
Then we wouldn't need the `nth` method and could just call [`next`] on every allocation. The problem with this approach is that it's not possible to store an `impl Trait` type in a struct field currently. It might work someday when [_named existential types_] are fully implemented.

[`next`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#tymethod.next
[_named existential types_]: https://github.com/rust-lang/rfcs/pull/2071

#### Using the `BootInfoFrameAllocator`

We can now modify our `kernel_main` function to pass a `BootInfoFrameAllocator` instance instead of an `EmptyFrameAllocator`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    use blog_os::memory::BootInfoFrameAllocator;
    […]
    let mut frame_allocator = unsafe {
        BootInfoFrameAllocator::init(&boot_info.memory_map)
    };
    […]
}
```

With the boot info frame allocator, the mapping succeeds and we see the black-on-white _"New!"_ on the screen again. Behind the scenes, the `map_to` method creates the missing page tables in the following way:

- Use the passed `frame_allocator` to allocate an unused frame.
- Zero the frame to create a new, empty page table.
- Map the entry of the higher level table to that frame.
- Continue with the next table level.

While our `create_example_mapping` function is just some example code, we are now able to create new mappings for arbitrary pages. This will be essential for allocating memory or implementing multithreading in future posts.

At this point, we should delete the `create_example_mapping` function again to avoid accidentally invoking undefined behavior, as explained [above](#a-create-example-mapping-function).

## Summary

In this post we learned about different techniques to access the physical frames of page tables, including identity mapping, mapping of the complete physical memory, temporary mapping, and recursive page tables. We chose to map the complete physical memory since it's simple, portable, and powerful.
We can't map the physical memory from our kernel without page table access, so we need support from the bootloader. The `bootloader` crate supports creating the required mapping through optional cargo crate features. It passes the required information to our kernel in the form of a `&BootInfo` argument to our entry point function.

For our implementation, we first manually traversed the page tables to implement a translation function, and then used the `MappedPageTable` type of the `x86_64` crate. We also learned how to create new mappings in the page table and how to create the necessary `FrameAllocator` on top of the memory map passed by the bootloader.

## What's next?

The next post will create a heap memory region for our kernel, which will allow us to [allocate memory] and use various [collection types].

[allocate memory]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html
[collection types]: https://doc.rust-lang.org/alloc/collections/index.html

================================================
FILE: blog/content/edition-2/posts/09-paging-implementation/index.pt-BR.md
================================================
+++
title = "Paging Implementation"
weight = 9
path = "pt-BR/paging-implementation"
date = 2019-03-14

[extra]
chapter = "Memory Management"
# Please update this when updating the translation
translation_based_on_commit = "32f629fb2dc193db0dc0657338bd0ddec5914f05"
# GitHub usernames of the people that translated this post
translators = ["richarddalves"]
+++

This post shows how to implement paging support in our kernel. It first explores different techniques to make the physical page table frames accessible to the kernel and discusses their respective advantages and drawbacks. It then implements an address translation function and a function to create a new mapping.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there.
You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-09`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-09

## Introduction

The [previous post] gave an introduction to the concept of paging. It motivated paging by comparing it with segmentation, explained how paging and page tables work, and then introduced the 4-level page table design of `x86_64`. We found out that the bootloader already set up a page table hierarchy for our kernel, which means that our kernel already runs on virtual addresses. This improves safety since illegal memory accesses cause page fault exceptions instead of modifying arbitrary physical memory.

[previous post]: @/edition-2/posts/08-paging-introduction/index.md

The post ended with the problem that we [can't access the page tables from our kernel][end of previous post] because they are stored in physical memory and our kernel already runs on virtual addresses. This post explores different approaches to making the page table frames accessible to our kernel. We will discuss the advantages and drawbacks of each approach and then decide on an approach for our kernel.

[end of previous post]: @/edition-2/posts/08-paging-introduction/index.md#accessing-the-page-tables

To implement the approach, we will need support from the bootloader, so we'll configure it first. Afterward, we will implement a function that traverses the page table hierarchy in order to translate virtual to physical addresses. Finally, we learn how to create new mappings in the page tables and how to find unused memory frames for creating new page tables.

## Accessing Page Tables

Accessing the page tables from our kernel is not as easy as it may seem.
To understand the problem, let's take a look at the example 4-level page table hierarchy from the previous post again:

![An example 4-level page hierarchy with each page table shown in physical memory](../paging-introduction/x86_64-page-table-translation.svg)

The important thing here is that each page table entry stores the _physical_ address of the next table. This avoids the need to run a translation for these addresses too, which would be bad for performance and could easily cause endless translation loops.

The problem for us is that we can't directly access physical addresses from our kernel since our kernel also runs on top of virtual addresses. For example, when we access address `4 KiB`, we access the _virtual_ address `4 KiB`, not the _physical_ address `4 KiB` where the level 4 page table is stored. When we want to access the physical address `4 KiB`, we can only do so through some virtual address that maps to it.

So in order to access page table frames, we need to map some virtual pages to them. There are different ways to create these mappings that all allow us to access arbitrary page table frames.

### Identity Mapping

A simple solution is to **identity map all page tables**:

![A virtual and a physical address space with various virtual pages mapped to the physical frame with the same address](identity-mapped-page-tables.svg)

In this example, we see various identity-mapped page table frames. This way, the physical addresses of page tables are also valid virtual addresses so that we can easily access the page tables of all levels starting from the CR3 register.

However, it clutters the virtual address space and makes it more difficult to find continuous memory regions of larger sizes.
For example, imagine that we want to create a virtual memory region of size 1000 KiB in the above graphic, e.g., for [memory-mapping a file]. We can't start the region at `28 KiB` because it would collide with the already mapped page at `1004 KiB`. So we have to look further until we find a large enough unmapped area, for example at `1008 KiB`. This is a similar fragmentation problem as with [segmentation].

[memory-mapping a file]: https://en.wikipedia.org/wiki/Memory-mapped_file
[segmentation]: @/edition-2/posts/08-paging-introduction/index.md#fragmentation

Equally, it makes it much more difficult to create new page tables because we need to find physical frames whose corresponding pages aren't already in use. For example, let's assume that we reserved the _virtual_ 1000 KiB memory region starting at `1008 KiB` for our memory-mapped file. Now we can't use any frame with a _physical_ address between `1008 KiB` and `2008 KiB` anymore because we can't identity map it.

### Map at a Fixed Offset

To avoid the problem of cluttering the virtual address space, we can **use a separate memory region for page table mappings**. So instead of identity mapping page table frames, we map them at a fixed offset in the virtual address space. For example, the offset could be 10 TiB:

![The same figure as for the identity mapping, but each mapped virtual page is offset by 10 TiB.](page-tables-mapped-at-offset.svg)

By using the virtual memory in the range `10 TiB..(10 TiB + physical memory size)` exclusively for page table mappings, we avoid the collision problems of the identity mapping. Reserving such a large region of the virtual address space is only possible if the virtual address space is much larger than the physical memory size. This isn't a problem on x86_64 since the 48-bit address space is 256 TiB large.
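As a rough illustration of the fixed-offset idea, the sketch below converts between a physical frame address and the virtual address at which it would be mapped. The constant and function names are hypothetical and not part of the kernel code in this post; the 10 TiB offset is just the example value from the figure above.

```rust
/// Size of one tebibyte in bytes.
const TIB: u64 = 1 << 40;

/// Assumed virtual offset of the page table mappings (example value: 10 TiB).
const PAGE_TABLE_MAPPING_OFFSET: u64 = 10 * TIB;

/// Size of the 48-bit virtual address space in bytes.
const VIRTUAL_ADDRESS_SPACE_SIZE: u64 = 1 << 48;

/// Returns the virtual address at which the given physical address is mapped.
fn frame_to_virt(phys_addr: u64) -> u64 {
    PAGE_TABLE_MAPPING_OFFSET + phys_addr
}

/// Returns the physical address behind a virtual address in the mapping
/// region, or `None` if the address lies below the offset.
fn virt_to_frame(virt_addr: u64) -> Option<u64> {
    virt_addr.checked_sub(PAGE_TABLE_MAPPING_OFFSET)
}
```

Because the offset is a constant, translating in either direction is a single addition or subtraction. This sketch does not check the upper bound of the region, which would require knowing the physical memory size.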
This approach still has the disadvantage that we need to create a new mapping whenever we create a new page table. Also, it does not allow accessing page tables of other address spaces, which would be useful when creating a new process.

### Map the Complete Physical Memory

We can solve these problems by **mapping the complete physical memory** instead of only page table frames:

![The same figure as for the offset mapping, but every physical frame has a mapping (at 10 TiB + X) instead of only page table frames.](map-complete-physical-memory.svg)

This approach allows our kernel to access arbitrary physical memory, including page table frames of other address spaces. The reserved virtual memory range has the same size as before, with the difference that it no longer contains unmapped pages.

The disadvantage of this approach is that additional page tables are needed for storing the mapping of the physical memory. These page tables need to be stored somewhere, so they use up a part of physical memory, which can be a problem on devices with a small amount of memory.

On x86_64, however, we can use [huge pages] with a size of 2 MiB for the mapping, instead of the default 4 KiB pages. This way, mapping 32 GiB of physical memory only requires 132 KiB for page tables since only one level 3 table and 32 level 2 tables are needed (33 tables of 4 KiB each). Huge pages are also more cache efficient since they use fewer entries in the translation lookaside buffer (TLB).

[huge pages]: https://en.wikipedia.org/wiki/Page_%28computer_memory%29#Multiple_page_sizes

### Temporary Mapping

For devices with very small amounts of physical memory, we could **map the page table frames only temporarily** when we need to access them.
To be able to create the temporary mappings, we only need a single identity-mapped level 1 table:

![A virtual and a physical address space with an identity mapped level 1 table, which maps its 0th entry to the level 2 table frame, thereby mapping that frame to the page with address 0](temporarily-mapped-page-tables.svg)

The level 1 table in this graphic controls the first 2 MiB of the virtual address space. This is because it is reachable by starting at the CR3 register and following the 0th entry in the level 4, level 3, and level 2 page tables. The entry with index `8` maps the virtual page at address `32 KiB` to the physical frame at address `32 KiB`, thereby identity mapping the level 1 table itself. The graphic shows this identity mapping by the horizontal arrow at `32 KiB`.

By writing to the identity-mapped level 1 table, our kernel can create up to 511 temporary mappings (512 minus the entry required for the identity mapping). In the above example, the kernel created two temporary mappings:

- By mapping the 0th entry of the level 1 table to the frame with address `24 KiB`, it created a temporary mapping of the virtual page at `0 KiB` to the physical frame of the level 2 page table, indicated by the dashed arrow.
- By mapping the 9th entry of the level 1 table to the frame with address `4 KiB`, it created a temporary mapping of the virtual page at `36 KiB` to the physical frame of the level 4 page table, indicated by the dashed arrow.

Now the kernel can access the level 2 page table by writing to page `0 KiB` and the level 4 page table by writing to page `36 KiB`.

The process for accessing an arbitrary page table frame with temporary mappings would be:

- Search for a free entry in the identity-mapped level 1 table.
- Map that entry to the physical frame of the page table that we want to access.
- Access the target frame through the virtual page that maps to the entry.
- Set the entry back to unused, thereby removing the temporary mapping again.

This approach reuses the same 512 virtual pages for creating the mappings and thus requires only 4 KiB of physical memory. The drawback is that it is a bit cumbersome, especially since a new mapping might require modifications to multiple table levels, which means that we would need to repeat the above process multiple times.

### Recursive Page Tables

Another interesting approach, which requires no additional page tables at all, is to **map the page table recursively**. The idea behind this approach is to map an entry from the level 4 page table to the level 4 table itself. By doing this, we effectively reserve a part of the virtual address space and map all current and future page table frames to that space.

Let's go through an example to understand how this all works:

![An example 4-level page hierarchy with each page table shown in physical memory. Entry 511 of the level 4 page is mapped to frame 4KiB, the frame of the level 4 table itself.](recursive-page-table.png)

The only difference to the [example at the beginning of this post] is the additional entry at index `511` in the level 4 table, which is mapped to physical frame `4 KiB`, the frame of the level 4 table itself.

[example at the beginning of this post]: #acessando-tabelas-de-pagina

By letting the CPU follow this entry on a translation, it doesn't reach a level 3 table but the same level 4 table again. This is similar to a recursive function that calls itself, therefore this table is called a _recursive page table_. The important thing is that the CPU assumes that every entry in the level 4 table points to a level 3 table, so it now treats the level 4 table as a level 3 table. This works because tables of all levels have the exact same layout on x86_64.
By following the recursive entry one or multiple times before we start the actual translation, we can effectively shorten the number of levels that the CPU traverses. For example, if we follow the recursive entry once and then proceed to the level 3 table, the CPU thinks that the level 3 table is a level 2 table. Going further, it treats the level 2 table as a level 1 table and the level 1 table as the mapped frame. This means that we can now read and write the level 1 page table because the CPU thinks that it is the mapped frame. The graphic below illustrates the five translation steps:

![The above example 4-level page hierarchy with 5 arrows: "Step 0" from CR3 to level 4 table, "Step 1" from level 4 table to level 4 table, "Step 2" from level 4 table to level 3 table, "Step 3" from level 3 table to level 2 table, and "Step 4" from level 2 table to level 1 table.](recursive-page-table-access-level-1.png)

Similarly, we can follow the recursive entry twice before starting the translation to reduce the number of traversed levels to two:

![The same 4-level page hierarchy with the following 4 arrows: "Step 0" from CR3 to level 4 table, "Steps 1&2" from level 4 table to level 4 table, "Step 3" from level 4 table to level 3 table, and "Step 4" from level 3 table to level 2 table.](recursive-page-table-access-level-2.png)

Let's go through it step by step: First, the CPU follows the recursive entry on the level 4 table and thinks that it reaches a level 3 table. Then it follows the recursive entry again and thinks that it reaches a level 2 table. But in reality, it is still on the level 4 table. When the CPU now follows a different entry, it lands on a level 3 table but thinks it is already on a level 1 table. So while the next entry points to a level 2 table, the CPU thinks that it points to the mapped frame, which allows us to read and write the level 2 table.
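The effect of the recursive entry can be reproduced in a small toy model. The sketch below is not kernel code: it represents page tables as a hypothetical map from `(table frame, entry index)` to the frame the entry points to, uses made-up frame numbers, and simply follows four entries the way the CPU would, without knowing which level it is really on.

```rust
use std::collections::HashMap;

/// Toy model: maps (table frame number, entry index) to the frame number
/// that the entry points to. Frame numbers here are made up.
type Entries = HashMap<(u64, usize), u64>;

/// Follows four entries starting at the level 4 table frame and returns the
/// frame reached at the end, like a hardware page table walk.
fn walk(entries: &Entries, level_4_frame: u64, indices: [usize; 4]) -> Option<u64> {
    let mut frame = level_4_frame;
    for index in indices {
        frame = *entries.get(&(frame, index))?;
    }
    Some(frame)
}

/// Builds a tiny hierarchy with a recursive entry at index 511.
fn example_entries() -> Entries {
    let mut entries = Entries::new();
    entries.insert((1, 0), 2); // level 4 table (frame 1) -> level 3 table (frame 2)
    entries.insert((2, 0), 3); // level 3 table -> level 2 table (frame 3)
    entries.insert((3, 0), 4); // level 2 table -> level 1 table (frame 4)
    entries.insert((4, 5), 100); // level 1 table -> mapped frame 100
    entries.insert((1, 511), 1); // recursive entry: level 4 table -> itself
    entries
}
```

With this setup, `walk(&entries, 1, [0, 0, 0, 5])` reaches the mapped frame, while `walk(&entries, 1, [511, 0, 0, 0])` ends on the level 1 table's frame and `walk(&entries, 1, [511, 511, 0, 0])` on the level 2 table's frame. Every additional `511` at the front moves the result one table level up.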
Accessing the tables of levels 3 and 4 works in the same way. To access the level 3 table, we follow the recursive entry three times, tricking the CPU into thinking it is already on a level 1 table. Then we follow another entry and reach a level 3 table, which the CPU treats as a mapped frame. For accessing the level 4 table itself, we just follow the recursive entry four times until the CPU treats the level 4 table itself as the mapped frame (in blue in the graphic below).

![The same 4-level page hierarchy with the following 3 arrows: "Step 0" from CR3 to level 4 table, "Steps 1,2,3" from level 4 table to level 4 table, and "Step 4" from level 4 table to level 3 table. In blue, the alternative "Steps 1,2,3,4" arrow from level 4 table to level 4 table.](recursive-page-table-access-level-3.png)

It might take some time to wrap your head around the concept, but it works quite well in practice.

In the section below, we explain how to construct virtual addresses for following the recursive entry one or multiple times. We will not use recursive paging for our implementation, so you don't need to read it to continue with the post. If it interests you, see the _"Address Calculation"_ section below.

---

#### Address Calculation

We saw that we can access tables of all levels by following the recursive entry once or multiple times before the actual translation. Since the indexes into the tables of the four levels are derived directly from the virtual address, we need to construct special virtual addresses for this technique. Remember, the page table indexes are derived from the address in the following way:

![Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index](../paging-introduction/x86_64-table-indices-from-address.svg)

Let's assume that we want to access the level 1 page table that maps a specific page. As we learned above, this means that we have to follow the recursive entry once before continuing with the level 4, level 3, and level 2 indexes. To do that, we move each block of the address one block to the right and set the original level 4 index to the index of the recursive entry:

![Bits 0–12 are the offset into the level 1 table frame, bits 12–21 the level 2 index, bits 21–30 the level 3 index, bits 30–39 the level 4 index, and bits 39–48 the index of the recursive entry](table-indices-from-address-recursive-level-1.svg)

For accessing the level 2 table of that page, we move each index block two blocks to the right and set both the blocks of the original level 4 index and the original level 3 index to the index of the recursive entry:

![Bits 0–12 are the offset into the level 2 table frame, bits 12–21 the level 3 index, bits 21–30 the level 4 index, and bits 30–39 and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-2.svg)

Accessing the level 3 table works by moving each block three blocks to the right and using the recursive index for the original level 4, level 3, and level 2 address blocks:

![Bits 0–12 are the offset into the level 3 table frame, bits 12–21 the level 4 index, and bits 21–30,
bits 30–39 and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-3.svg)

Finally, we can access the level 4 table by moving each block four blocks to the right and using the recursive index for all address blocks except for the offset:

![Bits 0–12 are the offset into the level 4 table frame and bits 12–21, bits 21–30, bits 30–39, and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-4.svg)

We can now calculate virtual addresses for the page tables of all four levels. We can even calculate an address that points exactly to a specific page table entry by multiplying its index by 8, the size of a page table entry.

The table below summarizes the address structure for accessing the different kinds of frames:

Virtual Address for | Address Structure ([octal])
------------------- | -------------------------------
Page                | `0o_SSSSSS_AAA_BBB_CCC_DDD_EEEE`
Level 1 Table Entry | `0o_SSSSSS_RRR_AAA_BBB_CCC_DDDD`
Level 2 Table Entry | `0o_SSSSSS_RRR_RRR_AAA_BBB_CCCC`
Level 3 Table Entry | `0o_SSSSSS_RRR_RRR_RRR_AAA_BBBB`
Level 4 Table Entry | `0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA`

[octal]: https://en.wikipedia.org/wiki/Octal

Here, `AAA` is the level 4 index, `BBB` the level 3 index, `CCC` the level 2 index, and `DDD` the level 1 index of the mapped frame, and `EEEE` the offset into it. `RRR` is the index of the recursive entry. When an index (three digits) is transformed to an offset (four digits), it is done by multiplying it by 8 (the size of a page table entry). With this offset, the resulting address directly points to the respective page table entry.

`SSSSSS` are sign extension bits, which means that they are all copies of bit 47. This is a special requirement for valid addresses on the x86_64 architecture.
We explained it in the [previous post][sign extension].

[sign extension]: @/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64

We use [octal] numbers for representing the addresses since each octal character represents three bits, which allows us to clearly separate the 9-bit indexes of the different page table levels. This isn't possible in the hexadecimal system, where each character represents four bits.

##### In Rust Code

To construct such addresses in Rust code, you can use bitwise operations:

```rust
// the virtual address whose corresponding page tables you want to access
let addr: usize = […];

let r = 0o777; // recursive index
let sign = 0o177777 << 48; // sign extension

// retrieve the page table indices of the address that we want to translate
let l4_idx = (addr >> 39) & 0o777; // level 4 index
let l3_idx = (addr >> 30) & 0o777; // level 3 index
let l2_idx = (addr >> 21) & 0o777; // level 2 index
let l1_idx = (addr >> 12) & 0o777; // level 1 index
let page_offset = addr & 0o7777;

// calculate the table addresses
let level_4_table_addr =
    sign | (r << 39) | (r << 30) | (r << 21) | (r << 12);
let level_3_table_addr =
    sign | (r << 39) | (r << 30) | (r << 21) | (l4_idx << 12);
let level_2_table_addr =
    sign | (r << 39) | (r << 30) | (l4_idx << 21) | (l3_idx << 12);
let level_1_table_addr =
    sign | (r << 39) | (l4_idx << 30) | (l3_idx << 21) | (l2_idx << 12);
```

The above code assumes that the last level 4 entry with index `0o777` (511) gets recursively mapped. This isn't the case currently, so the code won't work yet. See below on how to tell the bootloader to set up the recursive mapping.

Alternatively to performing the bitwise operations by hand, you can use the [`RecursivePageTable`] type of the `x86_64` crate, which provides safe abstractions for various page table operations.
For example, the code below shows how to translate a virtual address to its mapped physical address:

[`RecursivePageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.RecursivePageTable.html

```rust
// in src/memory.rs

use x86_64::structures::paging::{Mapper, Page, PageTable, RecursivePageTable};
use x86_64::{VirtAddr, PhysAddr};

// Create a RecursivePageTable instance from the level 4 address.
let level_4_table_addr = […];
let level_4_table_ptr = level_4_table_addr as *mut PageTable;
let recursive_page_table = unsafe {
    let level_4_table = &mut *level_4_table_ptr;
    RecursivePageTable::new(level_4_table).unwrap()
};

// Retrieve the physical address for the given virtual address
let addr: u64 = […];
let addr = VirtAddr::new(addr);
let page: Page = Page::containing_address(addr);

// perform the translation
let frame = recursive_page_table.translate_page(page);
frame.map(|frame| frame.start_address() + u64::from(addr.page_offset()))
```

Again, a valid recursive mapping is required for this code. With such a mapping, the missing `level_4_table_addr` can be calculated as in the first code example.
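To make the address arithmetic easier to try out with concrete values, here is a standalone sketch that can be compiled outside the kernel. It assumes, like the first code example, that the entry with index `0o777` is the recursive one. The helper names (`sign_extend`, `table_addrs`, `level_1_entry_addr`) are hypothetical and chosen only for this illustration.

```rust
/// Index of the recursive level 4 entry (assumed to be the last entry).
const R: u64 = 0o777;

/// Copies bit 47 into bits 48 to 63, as required for valid x86_64 addresses.
fn sign_extend(addr: u64) -> u64 {
    (((addr << 16) as i64) >> 16) as u64
}

/// Virtual addresses of the level 4, 3, 2, and 1 tables (in that order)
/// that are used to translate `addr`.
fn table_addrs(addr: u64) -> [u64; 4] {
    let l4_idx = (addr >> 39) & 0o777;
    let l3_idx = (addr >> 30) & 0o777;
    let l2_idx = (addr >> 21) & 0o777;

    [
        sign_extend((R << 39) | (R << 30) | (R << 21) | (R << 12)),
        sign_extend((R << 39) | (R << 30) | (R << 21) | (l4_idx << 12)),
        sign_extend((R << 39) | (R << 30) | (l4_idx << 21) | (l3_idx << 12)),
        sign_extend((R << 39) | (l4_idx << 30) | (l3_idx << 21) | (l2_idx << 12)),
    ]
}

/// Virtual address of the level 1 table entry that maps `addr`.
fn level_1_entry_addr(addr: u64) -> u64 {
    let l1_idx = (addr >> 12) & 0o777;
    // each page table entry is 8 bytes large
    table_addrs(addr)[3] + l1_idx * 8
}
```

For the virtual address `0`, the level 4 table address computed this way is `0xffff_ffff_ffff_f000`, which corresponds to the `0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA` row of the table above with all index digits set to `777`.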
---

Recursive Paging is an interesting technique that shows how powerful a single mapping in a page table can be. It is relatively easy to implement and only requires a minimal amount of setup (just a single recursive entry), so it's a good choice for first experiments with paging.

However, it also has some disadvantages:

- It occupies a large amount of virtual memory (512 GiB). This isn't a big problem in the large 48-bit address space, but it might lead to suboptimal cache behavior.
- It only allows accessing the currently active address space easily. Accessing other address spaces is still possible by changing the recursive entry, but a temporary mapping is required for switching back. We described how to do this in the (outdated) [_Remap The Kernel_] post.
- It heavily relies on the page table format of x86 and might not work on other architectures.

[_Remap The Kernel_]: https://os.phil-opp.com/remap-the-kernel/#overview

## Bootloader Support

All of these approaches require page table modifications for their setup. For example, mappings for the physical memory need to be created or an entry of the level 4 table needs to be mapped recursively. The problem is that we can't create these required mappings without an existing way to access the page tables.

This means that we need the help of the bootloader, which creates the page tables that our kernel runs on. The bootloader has access to the page tables, so it can create any mappings that we need. In its current implementation, the `bootloader` crate has support for two of the above approaches, controlled through [cargo features]:

[cargo features]: https://doc.rust-lang.org/cargo/reference/features.html#the-features-section

- The `map_physical_memory` feature maps the complete physical memory somewhere into the virtual address space.
  Thus, the kernel has access to all physical memory and can follow the [_Map the Complete Physical Memory_](#mapear-a-memoria-fisica-completa) approach.
- With the `recursive_page_table` feature, the bootloader maps an entry of the level 4 page table recursively. This allows the kernel to access the page tables as described in the [_Recursive Page Tables_](#tabelas-de-pagina-recursivas) section.

We choose the first approach for our kernel since it is simple, platform-independent, and more powerful (it also allows access to non-page-table-frames). To enable the required bootloader support, we add the `map_physical_memory` feature to our `bootloader` dependency:

```toml
[dependencies]
bootloader = { version = "0.9", features = ["map_physical_memory"]}
```

With this feature enabled, the bootloader maps the complete physical memory to some unused virtual address range. To communicate the virtual address range to our kernel, the bootloader passes a _boot information_ structure.

### Boot Information

The `bootloader` crate defines a [`BootInfo`] struct that contains all the information it passes to our kernel. The struct is still in an early stage, so expect some breakage when updating to future [semver-incompatible] bootloader versions. With the `map_physical_memory` feature enabled, it currently has the two fields `memory_map` and `physical_memory_offset`:

[`BootInfo`]: https://docs.rs/bootloader/0.9/bootloader/bootinfo/struct.BootInfo.html
[semver-incompatible]: https://doc.rust-lang.org/stable/cargo/reference/specifying-dependencies.html#caret-requirements

- The `memory_map` field contains an overview of the available physical memory. This tells our kernel how much physical memory is available in the system and which memory regions are reserved for devices such as the VGA hardware.
  The memory map can be queried from the BIOS or UEFI firmware, but only very early in the boot process. For this reason, it must be provided by the bootloader because there is no way for the kernel to retrieve it later. We will need the memory map later in this post.
- The `physical_memory_offset` tells us the virtual start address of the physical memory mapping. By adding this offset to a physical address, we get the corresponding virtual address. This allows us to access arbitrary physical memory from our kernel.
- This physical memory offset can be customized by adding a `[package.metadata.bootloader]` table in Cargo.toml and setting the field `physical-memory-offset = "0x0000f00000000000"` (or any other value). However, note that the bootloader can panic if it runs into physical address values that start to overlap with the space beyond the offset, i.e., areas it would have previously mapped to some other early physical addresses. So in general, the higher the value (> 1 TiB), the better.

The bootloader passes the `BootInfo` struct to our kernel in the form of a `&'static BootInfo` argument to our `_start` function. We don't have this argument declared in our function yet, so let's add it:

```rust
// in src/main.rs

use bootloader::BootInfo;

#[unsafe(no_mangle)]
pub extern "C" fn _start(boot_info: &'static BootInfo) -> ! { // new argument
    […]
}
```

It wasn't a problem to leave off this argument before because the x86_64 calling convention passes the first argument in a CPU register. Thus, the argument is simply ignored when it isn't declared. However, it would be a problem if we accidentally used a wrong argument type, since the compiler doesn't know the correct type signature of our entry point function.

### The `entry_point` Macro

Since our `_start` function is called externally from the bootloader, no checking of our function signature occurs.
This means that we could let it take arbitrary arguments without any compilation errors, but it would fail or cause undefined behavior at runtime.

To make sure that the entry point function always has the correct signature that the bootloader expects, the `bootloader` crate provides an [`entry_point`] macro that provides a type-checked way to define a Rust function as the entry point. Let's rewrite our entry point function to use this macro:

[`entry_point`]: https://docs.rs/bootloader/0.6.4/bootloader/macro.entry_point.html

```rust
// in src/main.rs

use bootloader::{BootInfo, entry_point};

entry_point!(kernel_main);

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […]
}
```

We no longer need to use `extern "C"` or `no_mangle` for our entry point, as the macro defines the real lower level `_start` entry point for us. The `kernel_main` function is now a completely normal Rust function, so we can choose an arbitrary name for it. The important thing is that it is type-checked, so a compilation error occurs when we use a wrong function signature, for example by adding an argument or changing the argument type.

Let's perform the same change in our `lib.rs`:

```rust
// in src/lib.rs

#[cfg(test)]
use bootloader::{entry_point, BootInfo};

#[cfg(test)]
entry_point!(test_kernel_main);

/// Entry point for `cargo test`
#[cfg(test)]
fn test_kernel_main(_boot_info: &'static BootInfo) -> ! {
    // like before
    init();
    test_main();
    hlt_loop();
}
```

Since the entry point is only used in test mode, we add the `#[cfg(test)]` attribute to all items. We give our test entry point the distinct name `test_kernel_main` to avoid confusion with the `kernel_main` of our `main.rs`. We don't use the `BootInfo` parameter for now, so we prefix the parameter name with a `_` to silence the unused variable warning.
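To give an intuition for how a macro can enforce a function signature, here is a simplified sketch that can be compiled as a normal program. It is not the `bootloader` crate's implementation: the `BootInfo` struct is a hypothetical stand-in with a single field, the macro is called `checked_entry_point` to avoid confusion with the real one, and the functions return a `u64` instead of `!` so that the result can be inspected.

```rust
/// Hypothetical stand-in for the bootloader's `BootInfo` struct.
pub struct BootInfo {
    pub physical_memory_offset: u64,
}

/// Simplified sketch of a type-checked entry point macro.
macro_rules! checked_entry_point {
    ($path:path) => {
        /// Wrapper that plays the role of the low-level entry point.
        pub fn start_wrapper(boot_info: &'static BootInfo) -> u64 {
            // Assigning to a typed function pointer makes the compiler
            // reject any function with a different signature.
            let f: fn(&'static BootInfo) -> u64 = $path;
            f(boot_info)
        }
    };
}

/// An ordinary Rust function with the expected signature.
fn kernel_main(boot_info: &'static BootInfo) -> u64 {
    boot_info.physical_memory_offset
}

checked_entry_point!(kernel_main);
```

If `kernel_main` took a second argument or a different argument type, the assignment to `f` inside the generated wrapper would fail to compile, which is the kind of check the text above describes.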
## Implementação Agora que temos acesso à memória física, podemos finalmente começar a implementar nosso código de tabela de página. Primeiro, daremos uma olhada nas tabelas de página atualmente ativas nas quais nosso kernel executa. No segundo passo, criaremos uma função de tradução que retorna o endereço físico para o qual um dado endereço virtual está mapeado. Como último passo, tentaremos modificar as tabelas de página para criar um novo mapeamento. Antes de começarmos, criamos um novo módulo `memory` para nosso código: ```rust // em src/lib.rs pub mod memory; ``` Para o módulo, criamos um arquivo vazio `src/memory.rs`. ### Acessando as Tabelas de Página No [final da postagem anterior], tentamos dar uma olhada nas tabelas de página nas quais nosso kernel executa, mas falhamos, já que não conseguimos acessar o frame físico para o qual o registrador `CR3` aponta. Agora podemos continuar de lá criando uma função `active_level_4_table` que retorna uma referência à tabela de página de nível 4 ativa: [final da postagem anterior]: @/edition-2/posts/08-paging-introduction/index.md#accessing-the-page-tables ```rust // em src/memory.rs use x86_64::{ structures::paging::PageTable, VirtAddr, }; /// Retorna uma referência mutável à tabela de nível 4 ativa. /// /// Esta função é unsafe porque o chamador deve garantir que a /// memória física completa está mapeada para memória virtual no /// `physical_memory_offset` passado. Além disso, esta função deve ser chamada apenas uma vez /// para evitar referenciar `&mut` com aliasing (que é comportamento indefinido). 
pub unsafe fn active_level_4_table(physical_memory_offset: VirtAddr)
    -> &'static mut PageTable
{
    use x86_64::registers::control::Cr3;

    let (level_4_table_frame, _) = Cr3::read();

    let phys = level_4_table_frame.start_address();
    let virt = physical_memory_offset + phys.as_u64();
    let page_table_ptr: *mut PageTable = virt.as_mut_ptr();

    unsafe { &mut *page_table_ptr }
}
```

First, we read the physical frame of the active level 4 table from the `CR3` register. We then take its physical start address, convert it to a `u64`, and add it to `physical_memory_offset` to get the virtual address where the page table frame is mapped. Finally, we convert the virtual address to a `*mut PageTable` raw pointer through the `as_mut_ptr` method and then unsafely create a `&mut PageTable` reference from it. We create a `&mut` reference instead of a `&` reference because we will mutate the page tables later in this post.

We can now use this function to print the entries of the level 4 table:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    use blog_os::memory::active_level_4_table;
    use x86_64::VirtAddr;

    println!("Hello World{}", "!");
    blog_os::init();

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    let l4_table = unsafe { active_level_4_table(phys_mem_offset) };

    for (i, entry) in l4_table.iter().enumerate() {
        if !entry.is_unused() {
            println!("L4 Entry {}: {:?}", i, entry);
        }
    }

    // as before
    #[cfg(test)]
    test_main();

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

First, we convert the `physical_memory_offset` of the `BootInfo` struct to a [`VirtAddr`] and pass it to the `active_level_4_table` function.
We then use the `iter` function to iterate over the page table entries and the [`enumerate`] combinator to add an index `i` to each element. We only print non-empty entries because all 512 entries wouldn't fit on the screen.

[`VirtAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.VirtAddr.html
[`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate

When we run it, we see the following output:

![QEMU printing entry 0 (0x2000, PRESENT, WRITABLE, ACCESSED), entry 1 (0x894000, PRESENT, WRITABLE, ACCESSED, DIRTY), entry 31 (0x88e000, PRESENT, WRITABLE, ACCESSED, DIRTY), entry 175 (0x891000, PRESENT, WRITABLE, ACCESSED, DIRTY), and entry 504 (0x897000, PRESENT, WRITABLE, ACCESSED, DIRTY)](qemu-print-level-4-table.png)

We see that there are various non-empty entries, which all map to different level 3 tables. There are so many regions because kernel code, kernel stack, the physical memory mapping, and the boot information all use separate memory areas.

To traverse the page tables further and take a look at a level 3 table, we can take the mapped frame of an entry and convert it to a virtual address again:

```rust
// in the `for` loop in src/main.rs

use x86_64::structures::paging::PageTable;

if !entry.is_unused() {
    println!("L4 Entry {}: {:?}", i, entry);

    // get the physical address from the entry and convert it
    let phys = entry.frame().unwrap().start_address();
    let virt = phys.as_u64() + boot_info.physical_memory_offset;
    let ptr = VirtAddr::new(virt).as_mut_ptr();
    let l3_table: &PageTable = unsafe { &*ptr };

    // print non-empty entries of the level 3 table
    for (i, entry) in l3_table.iter().enumerate() {
        if !entry.is_unused() {
            println!("  L3 Entry {}: {:?}", i, entry);
        }
    }
}
```

To look at the level 2 and level 1 tables, we repeat that process for the level 3 and level 2 entries.
As you can imagine, this gets very verbose very quickly, so we don't show the full code here.

Traversing the page tables manually is interesting because it helps to understand how the CPU performs the translation. However, most of the time, we are only interested in the mapped physical address for a given virtual address, so let's create a function for that.

### Translating Addresses

To translate a virtual address to a physical address, we have to traverse the four-level page table until we reach the mapped frame. Let's create a function that performs this translation:

```rust
// in src/memory.rs

use x86_64::PhysAddr;

/// Translates the given virtual address to the mapped physical address, or
/// `None` if the address is not mapped.
///
/// This function is unsafe because the caller must guarantee that the
/// complete physical memory is mapped to virtual memory at the passed
/// `physical_memory_offset`.
pub unsafe fn translate_addr(addr: VirtAddr, physical_memory_offset: VirtAddr)
    -> Option<PhysAddr>
{
    translate_addr_inner(addr, physical_memory_offset)
}
```

We forward the function to a safe `translate_addr_inner` function to limit the scope of `unsafe`. As we noted above, Rust treats the complete body of an `unsafe fn` like a large unsafe block. By calling into a private safe function, we make each `unsafe` operation explicit again.

The private inner function contains the real implementation:

```rust
// in src/memory.rs

/// Private function that is called by `translate_addr`.
///
/// This function is safe to limit the scope of `unsafe` because Rust treats
/// the whole body of unsafe functions as an unsafe block. This function must
/// only be reachable through `unsafe fn` from outside of this module.
fn translate_addr_inner(addr: VirtAddr, physical_memory_offset: VirtAddr)
    -> Option<PhysAddr>
{
    use x86_64::structures::paging::page_table::FrameError;
    use x86_64::registers::control::Cr3;

    // read the active level 4 frame from the CR3 register
    let (level_4_table_frame, _) = Cr3::read();

    let table_indexes = [
        addr.p4_index(), addr.p3_index(), addr.p2_index(), addr.p1_index()
    ];
    let mut frame = level_4_table_frame;

    // traverse the multi-level page table
    for &index in &table_indexes {
        // convert the frame into a page table reference
        let virt = physical_memory_offset + frame.start_address().as_u64();
        let table_ptr: *const PageTable = virt.as_ptr();
        let table = unsafe { &*table_ptr };

        // read the page table entry and update `frame`
        let entry = &table[index];
        frame = match entry.frame() {
            Ok(frame) => frame,
            Err(FrameError::FrameNotPresent) => return None,
            Err(FrameError::HugeFrame) => panic!("huge pages not supported"),
        };
    }

    // calculate the physical address by adding the page offset
    Some(frame.start_address() + u64::from(addr.page_offset()))
}
```

Instead of reusing our `active_level_4_table` function, we read the level 4 frame from the `CR3` register again. We do this because it simplifies this prototype implementation. Don't worry, we will create a better solution in a moment.

The `VirtAddr` struct already provides methods to compute the indexes into the page tables of the four levels. We store these indexes in a small array because it allows us to traverse the page tables using a `for` loop. Outside of the loop, we remember the last visited `frame` to calculate the physical address later. The `frame` points to page table frames while iterating and to the mapped frame after the last iteration, i.e., after following the level 1 entry.

Inside the loop, we again use the `physical_memory_offset` to convert the frame into a page table reference.
We then read the entry of the current page table and use the [`PageTableEntry::frame`] function to retrieve the mapped frame. If the entry is not mapped to a frame, we return `None`. If the entry maps a huge 2 MiB or 1 GiB page, we panic for now.

[`PageTableEntry::frame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html#method.frame

Let's test our translation function by translating some addresses:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // new import
    use blog_os::memory::translate_addr;

    […] // hello world and blog_os::init

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);

    let addresses = [
        // the identity-mapped vga buffer page
        0xb8000,
        // some code page
        0x201008,
        // some stack page
        0x0100_0020_1a10,
        // virtual address mapped to physical address 0
        boot_info.physical_memory_offset,
    ];

    for &address in &addresses {
        let virt = VirtAddr::new(address);
        let phys = unsafe { translate_addr(virt, phys_mem_offset) };
        println!("{:?} -> {:?}", virt, phys);
    }

    […] // test_main(), "it did not crash" printing, and hlt_loop()
}
```

When we run it, we see the following output:

![0xb8000 -> 0xb8000, 0x201008 -> 0x401008, 0x10000201a10 -> 0x279a10, "panicked at 'huge pages not supported'"](qemu-translate-addr.png)

As expected, the identity-mapped address `0xb8000` translates to the same physical address. The code page and the stack page translate to some arbitrary physical addresses, which depend on how the bootloader created the initial mapping for our kernel. It's worth noting that the last 12 bits always stay the same after translation, which makes sense because these bits are the [_page offset_][_deslocamento de página_] and not part of the translation.
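The split between table indexes and page offset can be checked with plain bit arithmetic. The following is a minimal sketch that assumes standard x86_64 4-level paging (four 9-bit indexes above a 12-bit offset); the function name `split_virt_addr` is made up for this example and is not part of the `x86_64` crate, whose `p4_index` to `p1_index` and `page_offset` methods provide the same information as typed values.

```rust
/// Splits a virtual address into its four 9-bit page table indexes
/// (level 4 first) and its 12-bit page offset.
///
/// Hypothetical helper for illustration only.
pub fn split_virt_addr(addr: u64) -> ([u16; 4], u16) {
    // Level 1 starts at bit 12, and each higher level sits 9 bits further up.
    let index = |level: u32| ((addr >> (12 + 9 * (level - 1))) & 0x1ff) as u16;
    let indexes = [index(4), index(3), index(2), index(1)];
    // The low 12 bits are the offset inside the 4 KiB page.
    let offset = (addr & 0xfff) as u16;
    (indexes, offset)
}
```

For the code page address `0x201008` from the test above, this yields the indexes `[0, 0, 1, 1]` and the offset `0x008`, which matches the low 12 bits of the translated address `0x401008`.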
[_deslocamento de página_]: @/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64

Since each physical address can be accessed by adding the `physical_memory_offset`, the translation of the `physical_memory_offset` address itself should point to physical address `0`. However, the translation fails because the mapping uses huge pages for efficiency, which our implementation does not support yet.

### Using `OffsetPageTable`

Translating virtual to physical addresses is a common task in an OS kernel, therefore the `x86_64` crate provides an abstraction for it. The implementation already supports huge pages and several other page table functions apart from `translate_addr`, so we will use it in the following instead of adding huge page support to our own implementation.

At the basis of the abstraction are two traits that define various page table mapping functions:

- The [`Mapper`] trait is generic over the page size and provides functions that operate on pages. Examples are [`translate_page`], which translates a given page to a frame of the same size, and [`map_to`], which creates a new mapping in the page table.
- The [`Translate`] trait provides functions that work with multiple page sizes, such as [`translate_addr`] or the general [`translate`].
[`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html
[`translate_page`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#tymethod.translate_page
[`map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to
[`Translate`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html
[`translate_addr`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html#method.translate_addr
[`translate`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html#tymethod.translate

The traits only define the interface; they don't provide any implementation. The `x86_64` crate currently provides three types that implement the traits with different requirements. The [`OffsetPageTable`] type assumes that the complete physical memory is mapped to the virtual address space at some offset. The [`MappedPageTable`] is a bit more flexible: It only requires that each page table frame is mapped to the virtual address space at a calculable address. Finally, the [`RecursivePageTable`] type can be used to access page table frames through [recursive page tables](#tabelas-de-pagina-recursivas).

[`OffsetPageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.OffsetPageTable.html
[`MappedPageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MappedPageTable.html
[`RecursivePageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.RecursivePageTable.html

In our case, the bootloader maps the complete physical memory at a virtual address specified by the `physical_memory_offset` variable, so we can use the `OffsetPageTable` type.
To initialize it, we create a new `init` function in our `memory` module:

```rust
// in src/memory.rs

use x86_64::structures::paging::OffsetPageTable;

/// Initialize a new OffsetPageTable.
///
/// This function is unsafe because the caller must guarantee that the
/// complete physical memory is mapped to virtual memory at the passed
/// `physical_memory_offset`. Also, this function must be called only once
/// to avoid aliasing `&mut` references (which is undefined behavior).
pub unsafe fn init(physical_memory_offset: VirtAddr) -> OffsetPageTable<'static> {
    unsafe {
        let level_4_table = active_level_4_table(physical_memory_offset);
        OffsetPageTable::new(level_4_table, physical_memory_offset)
    }
}

// make private
unsafe fn active_level_4_table(physical_memory_offset: VirtAddr)
    -> &'static mut PageTable
{…}
```

The function takes the `physical_memory_offset` as an argument and returns a new `OffsetPageTable` instance with a `'static` lifetime. This means that the instance stays valid for the complete runtime of our kernel. In the function body, we first call the `active_level_4_table` function to retrieve a mutable reference to the level 4 page table. We then invoke the [`OffsetPageTable::new`] function with this reference. As the second parameter, the `new` function expects the virtual address at which the mapping of the physical memory starts, which is given in the `physical_memory_offset` variable.

[`OffsetPageTable::new`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.OffsetPageTable.html#method.new

From now on, the `active_level_4_table` function should only be called from the `init` function because it can easily lead to aliased mutable references when called multiple times, which can cause undefined behavior. For this reason, we make the function private by removing the `pub` specifier.

We can now use the `Translate::translate_addr` method instead of our own `memory::translate_addr` function.
We only need to change a few lines in our `kernel_main`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // new: different imports
    use blog_os::memory;
    use x86_64::{structures::paging::Translate, VirtAddr};

    […] // hello world and blog_os::init

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    // new: initialize a mapper
    let mapper = unsafe { memory::init(phys_mem_offset) };

    let addresses = […]; // same as before

    for &address in &addresses {
        let virt = VirtAddr::new(address);
        // new: use the `mapper.translate_addr` method
        let phys = mapper.translate_addr(virt);
        println!("{:?} -> {:?}", virt, phys);
    }

    […] // test_main(), "it did not crash" printing, and hlt_loop()
}
```

We need to import the `Translate` trait in order to use the [`translate_addr`] method it provides.

When we run it now, we see the same translation results as before, with the difference that the huge page translation now also works:

![0xb8000 -> 0xb8000, 0x201008 -> 0x401008, 0x10000201a10 -> 0x279a10, 0x18000000000 -> 0x0](qemu-mapper-translate-addr.png)

As expected, the translations of `0xb8000` and the code and stack addresses stay the same as with our own translation function. Additionally, we now see that the virtual address `physical_memory_offset` is mapped to the physical address `0x0`.

By using the translation function of the `OffsetPageTable` type, we can spare ourselves the work of implementing huge page support. We also have access to other page functions, such as `map_to`, which we will use in the next section.

At this point, we no longer need our `memory::translate_addr` and `memory::translate_addr_inner` functions, so we can delete them.

### Creating a New Mapping

Until now, we only looked at the page tables without modifying anything. Let's change that by creating a new mapping for a previously unmapped page.
We will use the [`map_to`] function of the [`Mapper`] trait for our implementation, so let's take a look at that function first. The documentation tells us that it takes four arguments: the page that we want to map, the frame that the page should be mapped to, a set of flags for the page table entry, and a `frame_allocator`. The frame allocator is needed because mapping the given page might require creating additional page tables, which need unused frames as backing storage.

[`map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.Mapper.html#tymethod.map_to
[`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.Mapper.html

#### A `create_example_mapping` Function

The first step of our implementation is to create a new `create_example_mapping` function that maps a given virtual page to `0xb8000`, the physical frame of the VGA text buffer. We choose that frame because it allows us to easily test whether the mapping was created correctly: We just need to write to the newly mapped page and see whether the write appears on the screen.

The `create_example_mapping` function looks like this:

```rust
// in src/memory.rs

use x86_64::{
    PhysAddr,
    structures::paging::{Page, PhysFrame, Mapper, Size4KiB, FrameAllocator}
};

/// Creates an example mapping for the given page to frame `0xb8000`.
pub fn create_example_mapping(
    page: Page,
    mapper: &mut OffsetPageTable,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) {
    use x86_64::structures::paging::PageTableFlags as Flags;

    let frame = PhysFrame::containing_address(PhysAddr::new(0xb8000));
    let flags = Flags::PRESENT | Flags::WRITABLE;

    let map_to_result = unsafe {
        // FIXME: this is not safe, we do it only for testing
        mapper.map_to(page, frame, flags, frame_allocator)
    };
    map_to_result.expect("map_to failed").flush();
}
```

In addition to the `page` that should be mapped, the function expects a mutable reference to an `OffsetPageTable` instance and a `frame_allocator`.
The `frame_allocator` parameter uses the [`impl Trait`][impl-trait-arg] syntax to be [generic] over all types that implement the [`FrameAllocator`] trait. The trait is generic over the [`PageSize`] trait to work with both standard 4 KiB pages and huge 2 MiB/1 GiB pages. We only want to create a 4 KiB mapping, so we set the generic parameter to `Size4KiB`.

[impl-trait-arg]: https://doc.rust-lang.org/book/ch10-02-traits.html#traits-as-parameters
[generic]: https://doc.rust-lang.org/book/ch10-00-generics.html
[`FrameAllocator`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html
[`PageSize`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/trait.PageSize.html

The [`map_to`] method is unsafe because the caller must ensure that the frame is not already in use. The reason is that mapping the same frame twice could result in undefined behavior, for example, when two different `&mut` references point to the same physical memory location. In our case, we reuse the VGA text buffer frame, which is already mapped, so we break the required condition. However, the `create_example_mapping` function is only a temporary testing function and will be removed after this post, so it is ok. To remind us of the unsafety, we put a `FIXME` comment on the line.

In addition to the `page` and the `frame`, the `map_to` method takes a set of flags for the mapping and a reference to the `frame_allocator`, which will be explained in a moment. For the flags, we set the `PRESENT` flag because it is required for all valid entries and the `WRITABLE` flag to make the mapped page writable. For a list of all possible flags, see the [_Page Table Format_] section of the previous post.

[_Page Table Format_]: @/edition-2/posts/08-paging-introduction/index.md#page-table-format

The [`map_to`] method can fail, so it returns a [`Result`].
Since this is just some example code that does not need to be robust, we just use [`expect`] to panic when an error occurs. On success, the function returns a [`MapperFlush`] type that provides an easy way to flush the newly mapped page from the translation lookaside buffer (TLB) with its [`flush`] method. Like `Result`, the type uses the [`#[must_use]`][must_use] attribute to emit a warning if we accidentally forget to use it.

[`Result`]: https://doc.rust-lang.org/core/result/enum.Result.html
[`expect`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.expect
[`MapperFlush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html
[`flush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush
[must_use]: https://doc.rust-lang.org/std/result/#results-must-be-used

#### A Dummy `FrameAllocator`

To be able to call `create_example_mapping`, we need to create a type that implements the `FrameAllocator` trait first. As noted above, the trait is responsible for allocating frames for new page tables if they are needed by `map_to`.

Let's start with the simple case and assume that we don't need to create new page tables. For this case, a frame allocator that always returns `None` suffices. We create such an `EmptyFrameAllocator` for testing our mapping function:

```rust
// in src/memory.rs

/// A FrameAllocator that always returns `None`.
pub struct EmptyFrameAllocator;

unsafe impl FrameAllocator<Size4KiB> for EmptyFrameAllocator {
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        None
    }
}
```

Implementing the `FrameAllocator` is unsafe because the implementer must guarantee that the allocator yields only unused frames. Otherwise, undefined behavior might occur, for example, when two virtual pages are mapped to the same physical frame. Our `EmptyFrameAllocator` only returns `None`, so this isn't a problem in this case.
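The role of the allocator can be modeled with a small hosted sketch. Everything in it is a simplified stand-in invented for this example (the `Frame` type, a safe `FrameAllocator` trait without a page-size parameter, the `MapError` enum, and the `next_table` helper); it does not reproduce how the `x86_64` crate implements `map_to`, it only shows why an allocator that returns `None` is enough exactly when no new table is needed.

```rust
/// Hypothetical stand-in for a physical frame, identified by its start address.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Frame(pub u64);

/// Simplified stand-in for the allocator trait (safe, no page-size parameter).
pub trait FrameAllocator {
    fn allocate_frame(&mut self) -> Option<Frame>;
}

/// An allocator that never hands out a frame.
pub struct EmptyFrameAllocator;

impl FrameAllocator for EmptyFrameAllocator {
    fn allocate_frame(&mut self) -> Option<Frame> {
        None
    }
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum MapError {
    FrameAllocationFailed,
}

/// Returns the frame that backs the next-level table. The allocator is only
/// consulted when the entry is still empty, i.e. when a table must be created.
pub fn next_table(
    entry: &mut Option<Frame>,
    allocator: &mut impl FrameAllocator,
) -> Result<Frame, MapError> {
    match *entry {
        // The table already exists: no allocation required.
        Some(frame) => Ok(frame),
        // The table is missing: a fresh frame is needed to hold it.
        None => {
            let frame = allocator
                .allocate_frame()
                .ok_or(MapError::FrameAllocationFailed)?;
            *entry = Some(frame);
            Ok(frame)
        }
    }
}
```

With an existing entry, `next_table` succeeds even with the `EmptyFrameAllocator`. With an empty entry, it returns `FrameAllocationFailed`.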
#### Choosing a Virtual Page

We now have a simple frame allocator that we can pass to our `create_example_mapping` function. However, the allocator always returns `None`, so this will only work if no additional page table frames are needed for creating the mapping. To understand when additional page table frames are needed and when not, let's consider an example:

![A virtual and a physical address space with a single mapped page and the page tables of all four levels](required-page-frames-example.svg)

The graphic shows the virtual address space on the left, the physical address space on the right, and the page tables in between. The page tables are stored in physical memory frames, indicated by the dashed lines. The virtual address space contains a single mapped page at address `0x803fe00000`, marked in blue. To translate this page to its frame, the CPU walks the 4-level page table until it reaches the frame at address 36 KiB.

Additionally, the graphic shows the physical frame of the VGA text buffer in red. Our goal is to map a previously unmapped virtual page to this frame using our `create_example_mapping` function. Since our `EmptyFrameAllocator` always returns `None`, we want to create the mapping so that no additional frames are needed from the allocator. This depends on the virtual page that we select for the mapping.

The graphic shows two candidate pages in the virtual address space, both marked in yellow. One page is at address `0x803fdfd000`, which is 3 pages before the mapped page (in blue). While the level 4 and level 3 page table indices are the same as for the blue page, the level 2 and level 1 indices are different (see the [previous post][page-table-indices]). The different index into the level 2 table means that a different level 1 table is used for this page.
Since this level 1 table does not exist yet, we would need to create it if we chose that page for our example mapping, which would require an additional unused physical frame. In contrast, the second candidate page at address `0x803fe02000` does not have this problem because it uses the same level 1 page table as the blue page. Thus, all the required page tables already exist.

[page-table-indices]: @/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64

In summary, the difficulty of creating a new mapping depends on the virtual page that we want to map. In the easiest case, the level 1 page table for the page already exists and we just need to write a single entry. In the most difficult case, the page is in a memory region for which no level 3 table exists yet, so we need to create new level 3, level 2, and level 1 page tables first.

To call our `create_example_mapping` function with the `EmptyFrameAllocator`, we need to choose a page for which all page tables already exist. To find such a page, we can utilize the fact that the bootloader loads itself in the first megabyte of the virtual address space. This means that a valid level 1 table exists for all pages in this region. Thus, we can choose any unused page in this memory region for our example mapping, such as the page at address `0`. Normally, this page should stay unused to guarantee that dereferencing a null pointer causes a page fault, so we know that the bootloader leaves it unmapped.

#### Creating the Mapping

We now have all the required parameters for calling our `create_example_mapping` function, so let's modify our `kernel_main` function to map the page at virtual address `0`. Since we map the page to the frame of the VGA text buffer, we should be able to write to the screen through it afterward.
The implementation looks like this:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    use blog_os::memory;
    use x86_64::{structures::paging::Page, VirtAddr}; // new import

    […] // hello world and blog_os::init

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    let mut mapper = unsafe { memory::init(phys_mem_offset) };
    let mut frame_allocator = memory::EmptyFrameAllocator;

    // map an unused page
    let page = Page::containing_address(VirtAddr::new(0));
    memory::create_example_mapping(page, &mut mapper, &mut frame_allocator);

    // write the string `New!` to the screen through the new mapping
    let page_ptr: *mut u64 = page.start_address().as_mut_ptr();
    unsafe { page_ptr.offset(400).write_volatile(0x_f021_f077_f065_f04e) };

    […] // test_main(), "it did not crash" printing, and hlt_loop()
}
```

First, we create the mapping for the page at address `0` by calling our `create_example_mapping` function with mutable references to the `mapper` and the `frame_allocator` instances. This maps the page to the VGA text buffer frame, so we should see any write to it on the screen.

Then we convert the page to a raw pointer and write a value to offset `400`. Since the pointer type is `*mut u64`, the offset counts 8-byte units, so the write lands 3200 bytes into the buffer, which is the start of text row 20 (each row takes 80 × 2 = 160 bytes). We don't write to the start of the page because the top line of the VGA buffer is directly shifted off the screen by the next `println`. We write the value `0x_f021_f077_f065_f04e`, which represents the string _"New!"_ on a white background. As we learned [in the _"VGA Text Mode"_ post], writes to the VGA buffer should be volatile, so we use the [`write_volatile`] method.

[in the _"VGA Text Mode"_ post]: @/edition-2/posts/03-vga-text-buffer/index.md#volatile
[`write_volatile`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write_volatile

When we run it in QEMU, we see the following output:

![QEMU printing "It did not crash!" with four completely white cells in the middle of the screen](qemu-new-mapping.png)

The _"New!"_ on the screen is caused by our write to page `0`, which means that we successfully created a new mapping in the page tables.

Creating that mapping only worked because the level 1 table responsible for the page at address `0` already exists. When we try to map a page for which no level 1 table exists yet, the `map_to` function fails because it tries to create new page tables by allocating frames with the `EmptyFrameAllocator`. We can see that happen when we try to map page `0xdeadbeaf000` instead of `0`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […]
    let page = Page::containing_address(VirtAddr::new(0xdeadbeaf000));
    […]
}
```

When we run it, a panic with the following error message occurs:

```
panicked at 'map_to failed: FrameAllocationFailed', /…/result.rs:999:5
```

To map pages that don't have a level 1 page table yet, we need to create a proper `FrameAllocator`. But how do we know which frames are unused and how much physical memory is available?

### Allocating Frames

In order to create new page tables, we need to create a proper frame allocator. To do that, we use the `memory_map` that is passed by the bootloader as part of the `BootInfo` struct:

```rust
// in src/memory.rs

use bootloader::bootinfo::MemoryMap;

/// A FrameAllocator that returns usable frames from the bootloader's memory map.
pub struct BootInfoFrameAllocator {
    memory_map: &'static MemoryMap,
    next: usize,
}

impl BootInfoFrameAllocator {
    /// Create a FrameAllocator from the passed memory map.
    ///
    /// This function is unsafe because the caller must guarantee that the passed
    /// memory map is valid. The main requirement is that all frames that are marked
    /// as `USABLE` in it are really unused.
    pub unsafe fn init(memory_map: &'static MemoryMap) -> Self {
        BootInfoFrameAllocator {
            memory_map,
            next: 0,
        }
    }
}
```

The struct has two fields: A `'static` reference to the memory map passed by the bootloader and a `next` field that keeps track of the number of the next frame that the allocator should return.

As we explained in the [_Boot Information_](#boot-information) section, the memory map is provided by the BIOS/UEFI firmware. It can only be queried very early in the boot process, so the bootloader already calls the respective functions for us. The memory map consists of a list of [`MemoryRegion`] structs, which contain the start address, the length, and the type (e.g. unused, reserved, etc.) of each memory region.

The `init` function initializes a `BootInfoFrameAllocator` with a given memory map. The `next` field is initialized with `0` and will be increased for every frame allocation to avoid returning the same frame twice. Since we don't know whether the usable frames of the memory map were already used somewhere else, our `init` function must be `unsafe` to require additional guarantees from the caller.

#### A `usable_frames` Method

Before we implement the `FrameAllocator` trait, we add an auxiliary method that converts the memory map into an iterator of usable frames:

```rust
// in src/memory.rs

use bootloader::bootinfo::MemoryRegionType;

impl BootInfoFrameAllocator {
    /// Returns an iterator over the usable frames specified in the memory map.
    fn usable_frames(&self) -> impl Iterator<Item = PhysFrame> {
        // get usable regions from the memory map
        let regions = self.memory_map.iter();
        let usable_regions = regions
            .filter(|r| r.region_type == MemoryRegionType::Usable);
        // map each region to its address range
        let addr_ranges = usable_regions
            .map(|r| r.range.start_addr()..r.range.end_addr());
        // transform to an iterator of frame start addresses
        let frame_addresses = addr_ranges.flat_map(|r| r.step_by(4096));
        // create `PhysFrame` types from the start addresses
        frame_addresses.map(|addr| PhysFrame::containing_address(PhysAddr::new(addr)))
    }
}
```

This function uses iterator combinator methods to transform the initial `MemoryMap` into an iterator of usable physical frames:

- First, we call the `iter` method to convert the memory map to an iterator of [`MemoryRegion`]s.
- Then we use the [`filter`] method to skip any reserved or otherwise unavailable regions. The bootloader updates the memory map for all the mappings it creates, so frames that are used by our kernel (code, data, or stack) or to store the boot information are already marked as `InUse` or similar. Thus, we can be sure that `Usable` frames are not used somewhere else.
- Afterwards, we use the [`map`] combinator and Rust's [range syntax][sintaxe de range] to transform our iterator of memory regions to an iterator of address ranges.
- Next, we use [`flat_map`] to transform the address ranges into an iterator of frame start addresses, choosing every 4096th address using [`step_by`]. Since 4096 bytes (= 4 KiB) is the page size, we get the start address of each frame. The bootloader page-aligns all usable memory areas, so we don't need any alignment or rounding code here. By using [`flat_map`] instead of `map`, we get an `Iterator<Item = u64>` instead of an `Iterator<Item = Iterator<Item = u64>>`.
- Finally, we convert the start addresses to `PhysFrame` types to construct an `Iterator<Item = PhysFrame>`.

[`MemoryRegion`]: https://docs.rs/bootloader/0.6.4/bootloader/bootinfo/struct.MemoryRegion.html
[`filter`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.filter
[`map`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.map
[sintaxe de range]: https://doc.rust-lang.org/core/ops/struct.Range.html
[`step_by`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.step_by
[`flat_map`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.flat_map

The return type of the function uses the [`impl Trait`] feature. This way, we can specify that we return some type that implements the [`Iterator`] trait with item type `PhysFrame`, but we don't need to name the concrete return type. This is important here because we _can't_ name the concrete type, since it depends on unnameable closure types.

[`impl Trait`]: https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits
[`Iterator`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html

#### Implementing the `FrameAllocator` Trait

Now we can implement the `FrameAllocator` trait:

```rust
// in src/memory.rs

unsafe impl FrameAllocator<Size4KiB> for BootInfoFrameAllocator {
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        let frame = self.usable_frames().nth(self.next);
        self.next += 1;
        frame
    }
}
```

We first use the `usable_frames` method to get an iterator of usable frames from the memory map. Then, we use the [`Iterator::nth`] function to get the frame with index `self.next` (thereby skipping the first `self.next` frames). Before returning that frame, we increase `self.next` by one so that we return the following frame on the next call.

[`Iterator::nth`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.nth

This implementation is not quite optimal, since it recreates the `usable_frames` iterator on every allocation.
It would be better to store the iterator directly as a struct field instead. Then we wouldn't need the `nth` method and could just call [`next`] on every allocation. The problem with this approach is that it's currently not possible to store an `impl Trait` type in a struct field. It might work someday when [_named existential types_] are fully implemented.

[`next`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#tymethod.next
[_named existential types_]: https://github.com/rust-lang/rfcs/pull/2071

#### Using the `BootInfoFrameAllocator`

We can now modify our `kernel_main` function to pass a `BootInfoFrameAllocator` instance instead of an `EmptyFrameAllocator`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    use blog_os::memory::BootInfoFrameAllocator;
    […]
    let mut frame_allocator = unsafe {
        BootInfoFrameAllocator::init(&boot_info.memory_map)
    };
    […]
}
```

With the boot info frame allocator, the mapping succeeds and we see the black-on-white _"New!"_ on the screen again. Behind the scenes, the `map_to` method creates the missing page tables in the following way:

- Use the passed `frame_allocator` to allocate an unused frame.
- Zero the frame to create a new, empty page table.
- Map the entry of the higher level table to that frame.
- Continue with the next table level.

While our `create_example_mapping` function is just some example code, we are now able to create new mappings for arbitrary pages. This will be essential for allocating memory or implementing multithreading in future posts.

At this point, we should delete the `create_example_mapping` function again to avoid accidentally invoking undefined behavior, as explained [above](#uma-funcao-create-example-mapping).
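To see the `usable_frames` and `nth` combination in isolation, the following sketch replays it on a hand-written memory map in ordinary user-space Rust. The `Region` and `MockFrameAllocator` types, the `sample_allocator` helper, and the addresses are hypothetical stand-ins for the bootloader's memory map; they are not part of the `bootloader` or `x86_64` crates.

```rust
/// Hypothetical stand-in for one entry of the bootloader's memory map.
struct Region {
    start: u64,
    end: u64, // exclusive end address
    usable: bool,
}

/// Hypothetical allocator that mirrors the `next` counter approach.
struct MockFrameAllocator {
    regions: Vec<Region>,
    next: usize,
}

impl MockFrameAllocator {
    /// Start addresses of all 4 KiB frames that lie in usable regions.
    fn usable_frames(&self) -> impl Iterator<Item = u64> + '_ {
        self.regions
            .iter()
            .filter(|r| r.usable)
            .map(|r| r.start..r.end)
            .flat_map(|range| range.step_by(4096))
    }

    /// Returns the frame with index `next`, then advances the counter.
    fn allocate_frame(&mut self) -> Option<u64> {
        let frame = self.usable_frames().nth(self.next);
        self.next += 1;
        frame
    }
}

/// Made-up memory map: two usable regions separated by a reserved one.
fn sample_allocator() -> MockFrameAllocator {
    MockFrameAllocator {
        regions: vec![
            Region { start: 0x1000, end: 0x3000, usable: true },
            Region { start: 0x3000, end: 0x10000, usable: false },
            Region { start: 0x10000, end: 0x11000, usable: true },
        ],
        next: 0,
    }
}
```

With this map, four successive `allocate_frame` calls yield `0x1000`, `0x2000`, `0x10000`, and then `None`. Each call walks the iterator again from the start, which is the inefficiency discussed above.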
## Summary

In this post we learned about different techniques to access the physical frames of page tables, including identity mapping, mapping of the complete physical memory, temporary mapping, and recursive page tables. We chose to map the complete physical memory since it's simple, portable, and powerful.

We can't map the physical memory from our kernel without page table access, so we need support from the bootloader. The `bootloader` crate supports creating the required mapping through optional cargo crate features. It passes the required information to our kernel in the form of a `&BootInfo` argument to our entry point function.

For our implementation, we first manually traversed the page tables to implement a translation function, and then used the `OffsetPageTable` type of the `x86_64` crate. We also learned how to create new mappings in the page table and how to create the necessary `FrameAllocator` on top of the memory map passed by the bootloader.

## What's Next?

The next post will create a heap memory region for our kernel, which will allow us to [allocate memory][alocar memória] and use various [collection types][tipos de coleção].
[alocar memória]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html [tipos de coleção]: https://doc.rust-lang.org/alloc/collections/index.html ================================================ FILE: blog/content/edition-2/posts/09-paging-implementation/index.zh-CN.md ================================================ +++ title = "分页实现" weight = 9 path = "zh-CN/paging-implementation" date = 2019-03-14 [extra] # Please update this when updating the translation translation_based_on_commit = "e56c635c13b61f052089ea6365be8422b5b28d15" # GitHub usernames of the people that translated this post translators = ["weijiew"] # GitHub usernames of the people that contributed to this translation translation_contributors = ["liuyuran"] +++ 这篇文章展示了如何在我们的内核中实现分页支持。它首先探讨了使物理页表帧能够被内核访问的不同技术,并讨论了它们各自的优点和缺点。然后,它实现了一个地址转换功能和一个创建新映射的功能。 这个系列的 blog 在[GitHub]上开放开发,如果你有任何问题,请在这里开一个 issue 来讨论。当然你也可以在[底部][at the bottom]留言。你可以在[`post-09`][post branch]找到这篇文章的完整源码。 [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-09 ## 介绍 [前文]已经对分页的概念做了介绍。通过比较分页和分段来证明分页的优势,然后解释了分页和页表如何工作,最后介绍了`x86_64`的4级页表设计。此时 bootloader 已经为内核建立了一个页表层次结构,这意味着内核已经在虚拟地址上运行。这样做提高了安全性,因为非法的内存访问会导致页面故障异常,而不是修改任意的物理内存。 [前文]: @/edition-2/posts/08-paging-introduction/index.md 这篇文章最后说,我们[不能从内核中访问页表][end of previous post],因为它们存储在物理内存中,而内核已经在虚拟地址上运行。 这篇文章探讨了使页表框能够被内核访问的不同方法。接下来将讨论每种方法的优点和缺点,最后决定内核采用哪种方法。 [end of previous post]: @/edition-2/posts/08-paging-introduction/index.md#accessing-the-page-tables 为了实现这个方法,我们需要 bootloader 的支持,所以首先要配置它。之后将实现一个遍历页表层次结构的函数,以便将虚拟地址转换为物理地址。最后,我们学习如何在页表中创建新的映射,以及如何为创建新的页表找到未使用的内存框。 ## 访问页表 从内核中访问页表并不像它看起来那么容易。为了理解这个问题,让我们再看一下上一篇文章中的4级页表层次结构的例子。 ![一个4级页层次结构的例子,每个页表都显示在物理内存中](../paging-introduction/x86_64-page-table-translation.svg) 这里重要的是,每个页面条目都存储了下一个表的 _物理_ 地址。这就避免了对这些地址也要进行翻译,这对性能不利,而且容易造成无休止的翻译循环。 此时的问题是,内核无法直接访问物理地址,因为内核也是在虚拟地址之上运行的。例如,当访问地址`4 KiB`时,访问的是 _虚拟_ 地址`4 KiB`,而不是存储4级页表的 _物理_ 地址`4 KiB`。 
因此,为了访问页表框架,我们需要将一些虚拟页面映射到它们。有不同的方法来创建这些映射,这些映射都允许我们访问任意的页表框架。 ### 直接映射 一个简单的解决方案是**所有页表的身份映射**。 ![一个虚拟和一个物理地址空间,各种虚拟页以相同的地址映射到物理帧上](identity-mapped-page-tables.svg) 在这个例子中,我们看到各种直接映射的页表框架。页表的物理地址也是有效的虚拟地址,这样我们就可以很容易地访问从CR3寄存器开始的各级页表。 然而,它使虚拟地址空间变得杂乱无章,并使寻找较大尺寸的连续内存区域更加困难。例如,想象一下,我们想在上述图形中创建一个大小为1000 KiB的虚拟内存区域,例如: [memory-mapping a file]。我们不能在`28 KiB`处开始区域,因为它将与`1004 KiB`处已经映射的页面相撞。所以我们必须进一步寻找,直到找到一个足够大的未映射区域,例如在`1008 KiB`。这是一个类似于[segmentation]的碎片化问题。 [memory-mapping a file]: https://en.wikipedia.org/wiki/Memory-mapped_file [segmentation]: @/edition-2/posts/08-paging-introduction/index.md#fragmentation 同样,这也使得创建新的页表更加困难,因为我们需要找到对应的页还没有被使用的物理框。例如,让我们假设我们为我们的内存映射文件保留了 _虚拟_ 1000 KiB内存区域,从`1008 KiB`开始。现在我们不能再使用任何物理地址在`1000 KiB`和`2008 KiB`之间的帧,因为我们不能对它进行 identity map 。 ### 映射一个固定的偏移 为了避免虚拟地址空间的杂乱问题,我们可以**使用一个单独的内存区域来进行页表映射**。因此,我们不是以直接映射页表帧,而是以虚拟地址空间中的固定偏移量来映射它们。例如,偏移量可以是10 TiB。 ![与直接映射的数字相同,但每个映射的虚拟页偏移了10TiB。](page-tables-mapped-at-offset.svg) 通过使用范围为`10 TiB...(10 TiB + 物理内存大小)`的虚拟内存专门用于页表映射,避免了直接映射的碰撞问题。只有当虚拟地址空间比物理内存大小大得多时,保留如此大的虚拟地址空间区域才有可能。这在x86_64上不是一个问题,因为48位的地址空间有256 TiB大。 这种方法仍然有一个缺点,即每当我们创建一个新的页表时,我们都需要创建一个新的映射。另外,它不允许访问其他地址空间的页表,这在创建新进程时是很有用的。 ### 映射完整的物理内存 我们可以通过**映射完整的物理内存**来解决这些问题,而不是只映射页表框架。 ![与偏移量映射的数字相同,但每个物理帧都有一个映射(在10 TiB + X),而不是只有页表帧。](map-complete-physical-memory.svg) 这种方法允许我们的内核访问任意的物理内存,包括其他地址空间的页表框架。保留的虚拟内存范围的大小与以前一样,不同的是它不再包含未映射的页面。 这种方法的缺点是,需要额外的页表来存储物理内存的映射。这些页表需要存储在某个地方,所以它们会占用一部分物理内存,这在内存较小的设备上可能是个问题。 然而,在x86_64上,我们可以使用大小为2 MiB的[巨大页面]进行映射,而不是默认的4 KiB页面。这样,映射32 GiB的物理内存只需要132 KiB的页表,因为只需要一个3级表和32个2级表。巨大页面也是更有效的缓存,因为它们在转换查找缓冲器(TLB)中使用的条目更少。 [巨大页面]: https://en.wikipedia.org/wiki/Page_%28computer_memory%29#Multiple_page_sizes ### 临时映射 对于物理内存数量非常少的设备,我们可以在需要访问页表帧时,只对其进行**临时映射页表**。为了能够创建临时映射,我们只需要一个 identity-mapped 的1级表。 ![一个虚拟和一个物理地址空间,有一个 identity-mapped 的1级表,该表将其第0个条目映射到2级表帧,从而将该帧映射到地址为0的页面上](temporarily-mapped-page-tables.svg) 该图中的第1级表控制着虚拟地址空间的前2 MiB。这是因为它可以通过从CR3寄存器开始,按照第4级、第3级和第2级页面表中的第0个条目到达。索引为`8`的条目将地址为`32 KiB`的虚拟页映射到地址为`32 
KiB`的物理帧,从而对1级表本身进行身份映射。图形显示了这种 identity-mapping ,在 "32 KiB "处有一个水平箭头。 通过写到 identity-mapped 的1级表,我们的内核可以创建多达511个临时映射(512减去直接映射需要的条目)。在上面的例子中,内核创建了两个临时映射。 - 通过将第1级表的第0条映射到地址为`24 KiB`的帧,它创建了一个`0 KiB`的虚拟页到第2级页表的物理帧的临时映射,虚线箭头所示。 - 通过将第1级表的第9条映射到地址为`4 KiB`的帧,它创建了一个`36 KiB`的虚拟页与第4级页表的物理帧的临时映射,虚线箭头所示。 现在内核可以通过写到`0KiB`页来访问2级页表,通过写到`36KiB`页来访问4级页表。 访问具有临时映射的任意页表框架的过程是: - 在身份映射的第1级表中搜索一个自由条目。 - 将该条目映射到我们想要访问的页表的物理帧。 - 通过映射到该条目的虚拟页面访问目标框中。 - 将该条目设置为未使用,从而再次删除临时映射。 这种方法重复使用相同的512个虚拟页来创建映射,因此只需要4 KiB的物理内存。缺点是有点麻烦,尤其是一个新的映射可能需要对多个表层进行修改,这意味着我们需要多次重复上述过程。 ### 递归页表 另一种有趣的方法是根本不需要额外的页表,即**映射页表的递归**。这种方法背后思想是将一个条目从第4级页面表映射到第4级表本身。通过这样做,我们有效地保留了虚拟地址空间的一部分,并将所有当前和未来的页表框架映射到该空间。 让我们通过一个例子来了解这一切是如何进行的。 ![一个4级页层次结构的例子,每个页表都显示在物理内存中。第4级页的条目511被映射到帧4KiB,即第4级表本身的帧。](recursive-page-table.png) 与[本文开头的例子]的唯一区别是在4级表中的索引`511`处增加了一个条目,它被映射到物理帧`4 KiB`,即4级表本身的帧。 [本文开头的例子]: #fang-wen-ye-biao 通过让CPU跟随这个条目进行翻译,它不会到达3级表,而是再次到达同一个4级表。这类似于一个调用自身的递归函数,因此这个表被称为 _递归页表_ 。重要的是,CPU假定4级表的每个条目都指向3级表,所以它现在把4级表当作3级表。这是因为所有级别的表在x86_64上都有完全相同的布局。 在我们开始实际翻译之前,通过跟随递归条目一次或多次,我们可以有效地缩短CPU所穿越的层数。例如,如果我们跟随递归条目一次,然后进入第3级表,CPU会认为第3级表是第2级表。再往前走,它把第2级表当作第1级表,把第1级表当作映射的框架。这意味着我们现在可以读写第1级页表了,因为CPU认为它是映射的帧。下面的图形说明了这五个转换步骤。 ![上述例子中的4级页面层次结构有5个箭头。从CR4到4级表的 "第0步",从4级表到4级表的 "第1步",从4级表到3级表的 "第2步",从3级表到2级表的 "第3步",以及从2级表到1级表的 "第4步"。](recursive-page-table-access-level-1.png) 同样地,我们可以在开始翻译之前,先跟随递归条目两次,将遍历的层数减少到两个。 ![同样的4级页面层次结构,有以下4个箭头。从CR4到4级表的 "第0步",从4级表到4级表的 "第1&2步",从4级表到3级表的 "第3步",以及从3级表到2级表的 "第4步"。](recursive-page-table-access-level-2.png) 让我们一步一步地看下去。首先,CPU跟踪4级表的递归条目,认为它到达了3级表。然后,它再次跟踪递归条目,认为它到达了2级表。但实际上,它仍然是在第4级表中。当CPU现在跟随一个不同的条目时,它到达了一个3级表,但认为它已经在1级表上。因此,当下一个条目指向第2级表时,CPU认为它指向了映射的框架,这使得我们能够读写第2级表。 访问第3级和第4级表的方法是一样的。为了访问第3级表,我们沿着递归条目走了三次,诱使CPU认为它已经在第1级表上了。然后我们跟随另一个条目,到达第3级表,CPU将其视为一个映射的框架。对于访问第4级表本身,我们只需跟随递归条目四次,直到CPU将第4级表本身视为映射的框架(在下面的图形中为蓝色)。 ![同样的4级页面层次结构,有以下3个箭头。从CR4到4级表的 "步骤0",从4级表到4级表的 "步骤1,2,3",以及从4级表到3级表的 "步骤4"。蓝色的是替代的 "步骤1,2,3,4 "箭头,从4级表到4级表。](recursive-page-table-access-level-3.png) 可能需要一些时间来理解这个概念,但在实践中效果相当好。 
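In the following section, we explain how to construct virtual addresses for following the recursive entry one or multiple times. We will not use recursive paging for our implementation, so you don't need to read it to continue with the post. If it interests you, just click on _"Address Calculation"_ to expand it.

---

    Address Calculation
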
    我们看到,在实际翻译之前,我们可以通过跟随递归条目一次或多次访问所有级别的表。由于进入四级表的索引直接来自于虚拟地址,我们需要为这种技术构建特殊的虚拟地址。请记住,页表的索引是以如下方式从地址派生的。 ![第0-12位是页面偏移,第12-21位是1级索引,第21-30位是2级索引,第30-39位是3级索引,第39-48位是4级索引。](../paging-introduction/x86_64-table-indices-from-address.svg) 让我们假设我们想访问映射一个特定页面的第1级页面表。正如我们上面所学到的,这意味着我们必须在继续使用第4级、第3级和第2级索引之前,跟随递归条目一次。为了做到这一点,我们将地址的每个块向右移动一个块,并将原来的4级索引设置为递归条目的索引。 ![第0-12位是1级表框的偏移量,第12-21位是2级索引,第21-30位是3级索引,第30-39位是4级索引,第39-48位是递归条目的索引](table-indices-from-address-recursive-level-1.svg) 为了访问该页的第2级表,我们将每个索引块向右移动两个块,并将原第4级索引的块和原第3级索引都设置为递归条目的索引。 ![第0-12位是2级表框的偏移量,第12-21位是3级索引,第21-30位是4级索引,第30-39位和第39-48位是递归条目的索引](table-indices-from-address-recursive-level-2.svg) 访问第3级表的工作方式是将每个块向右移动三个块,并使用原第4级、第3级和第2级地址块的递归索引。 ![第0-12位是第三级表框的偏移量,第12-21位是第四级索引,第21-30位、第30-39位和第39-48位是递归条目的索引。](table-indices-from-address-recursive-level-3.svg) 最后,我们可以通过将每个区块向右移动四个区块,并对除偏移外的所有地址区块使用递归索引来访问第四级表。 ![位0-12是l级表框的偏移量,位12-21、位21-30、位30-39和位39-48是递归条目的索引。](table-indices-from-address-recursive-level-4.svg) 现在我们可以计算出所有四级页表的虚拟地址。我们甚至可以通过将索引乘以8(一个页表项的大小)来计算出一个精确指向特定页表项的地址。 下表总结了访问不同种类框架的地址结构。 | Virtual Address for | Address Structure ([octal]) | | ------------------- | -------------------------------- | | Page | `0o_SSSSSS_AAA_BBB_CCC_DDD_EEEE` | | Level 1 Table Entry | `0o_SSSSSS_RRR_AAA_BBB_CCC_DDDD` | | Level 2 Table Entry | `0o_SSSSSS_RRR_RRR_AAA_BBB_CCCC` | | Level 3 Table Entry | `0o_SSSSSS_RRR_RRR_RRR_AAA_BBBB` | | Level 4 Table Entry | `0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA` | [八进制]: https://en.wikipedia.org/wiki/Octal 而`AAA`是第4级索引,`BBB`是第3级索引,`CCC`是第2级索引,`DDD`是映射框架的第1级索引,`EEEE`是其中的偏移。`RRR`是递归条目的索引。当一个索引(三位数)被转换为一个偏移量(四位数)时,它是通过乘以8(页表项的大小)来完成。有了这个偏移量,产生的地址直接指向相应的页表项。 `SSSSSS`是符号扩展位,这意味着它们都是第47位的副本。这是对x86_64架构上有效地址的特殊要求。[上篇文章][sign extension]解释过。 [sign extension]: @/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64 我们使用[八进制]数字来表示地址,因为每个八进制字符代表三个比特,这使我们能够清楚地分开不同页表层的9比特索引。这在十六进制系统中是不可能的,每个字符代表四个比特。 ##### 在Rust代码中 为了在Rust代码中构建这样的地址,可以使用位操作。 ```rust // 你想访问其对应的页表的虚拟地址 let addr: usize = […]; let r = 
0o777; // 递归索引 let sign = 0o177777 << 48; // 符号扩展 // 检索我们要翻译的地址的页表索引 let l4_idx = (addr >> 39) & 0o777; // level 4 索引 let l3_idx = (addr >> 30) & 0o777; // level 3 索引 let l2_idx = (addr >> 21) & 0o777; // level 2 索引 let l1_idx = (addr >> 12) & 0o777; // level 1 索引 let page_offset = addr & 0o7777; // 计算页表的地址 let level_4_table_addr = sign | (r << 39) | (r << 30) | (r << 21) | (r << 12); let level_3_table_addr = sign | (r << 39) | (r << 30) | (r << 21) | (l4_idx << 12); let level_2_table_addr = sign | (r << 39) | (r << 30) | (l4_idx << 21) | (l3_idx << 12); let level_1_table_addr = sign | (r << 39) | (l4_idx << 30) | (l3_idx << 21) | (l2_idx << 12); ``` 上面的代码假设索引为`0o777`(511)的最后一个4级条目是递归映射的。目前不是这样的,所以这段代码还不能工作。请看下面如何告诉bootloader来设置递归映射。 除了手工进行位操作外,你可以使用`x86_64`板块的[`递归页表`]类型,它为各种页表操作提供安全的抽象。例如,下面的代码显示了如何将一个虚拟地址转换为其映射的物理地址。 [`递归页表`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.RecursivePageTable.html ```rust // in src/memory.rs use x86_64::structures::paging::{Mapper, Page, PageTable, RecursivePageTable}; use x86_64::{VirtAddr, PhysAddr}; /// 从第4级地址创建一个RecursivePageTable实例。 let level_4_table_addr = […]; let level_4_table_ptr = level_4_table_addr as *mut PageTable; let recursive_page_table = unsafe { let level_4_table = &mut *level_4_table_ptr; RecursivePageTable::new(level_4_table).unwrap(); } /// 检索给定虚拟地址的物理地址 let addr: u64 = […] let addr = VirtAddr::new(addr); let page: Page = Page::containing_address(addr); // 进行翻译 let frame = recursive_page_table.translate_page(page); frame.map(|frame| frame.start_address() + u64::from(addr.page_offset())) ``` 同样,这个代码需要一个有效的递归映射。有了这样的映射,缺失的 `level_4_table_addr` 可以像第一个代码例子那样被计算出来。
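The bit manipulation above can also be wrapped in a small function and checked on the host. This is only a sketch under the same assumption as before, namely that the last level 4 entry (index `0o777`) is the recursive one; the function name `recursive_table_addrs` and the constants `R` and `SIGN` are made up for illustration.

```rust
/// Index of the recursively mapped level 4 entry (assumed to be the last one).
const R: u64 = 0o777;
/// Sign extension bits 48..64; all ones because bit 47 is set for R = 0o777.
const SIGN: u64 = 0o177777 << 48;

/// Virtual addresses of the level 4, 3, 2, and 1 page tables (in that order)
/// that are involved in translating `addr`.
fn recursive_table_addrs(addr: u64) -> [u64; 4] {
    // page table indices of the address we want to inspect
    let l4_idx = (addr >> 39) & 0o777;
    let l3_idx = (addr >> 30) & 0o777;
    let l2_idx = (addr >> 21) & 0o777;

    [
        // follow the recursive entry four times
        SIGN | (R << 39) | (R << 30) | (R << 21) | (R << 12),
        // three times, then the level 4 index
        SIGN | (R << 39) | (R << 30) | (R << 21) | (l4_idx << 12),
        // twice, then the level 4 and level 3 indices
        SIGN | (R << 39) | (R << 30) | (l4_idx << 21) | (l3_idx << 12),
        // once, then the level 4, 3, and 2 indices
        SIGN | (R << 39) | (l4_idx << 30) | (l3_idx << 21) | (l2_idx << 12),
    ]
}
```

For address `0`, all indices are zero, so the four results differ only in how many recursive index blocks they contain.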
    --- 递归分页是一种有趣的技术,它显示了页表中的单个映射可以有多么强大。它比较容易实现,而且只需要少量的设置(只是一个单一的递归条目),所以它是第一次实验分页的一个好选择。 然而,它也有一些弊端: - 它占据了大量的虚拟内存(512 GiB)。在大的48位地址空间中,这不是一个大问题,但它可能会导致次优的缓存行为。 - 它只允许轻松访问当前活动的地址空间。通过改变递归条目,访问其他地址空间仍然是可能的,但切换回来时需要一个临时映射。我们在(已过期的)[_Remap The Kernel_] 文章"地址空间 "中描述了如何做到这一点。 - 它在很大程度上依赖于x86的页表格式,在其他架构上可能无法工作。 [_Remap The Kernel_]: https://os.phil-opp.com/remap-the-kernel/#overview ## 支持引导器 所有这些方法的设置都需要对页表进行修改。例如,需要创建物理内存的映射,或者需要对4级表的一个条目进行递归映射。问题是,如果没有访问页表的现有方法,我们就无法创建这些所需的映射。 这意味着我们需要 bootloader 的帮助,bootloader 创建了内核运行的页表。Bootloader 可以访问页表,所以它可以创建内核需要的任何映射。在目前的实现中,“bootloader” 工具箱支持上述两种方法,通过 [cargo features] 进行控制。 [cargo features]: https://doc.rust-lang.org/cargo/reference/features.html#the-features-section - `map_physical_memory` 功能将某处完整的物理内存映射到虚拟地址空间。因此,内核可以访问所有的物理内存,并且可以遵循[_映射完整物理内存_](#ying-she-wan-zheng-de-wu-li-nei-cun)的方法。 - 有了 “recursive_page_table” 功能,bootloader会递归地映射4级page table的一个条目。这允许内核访问页表,如[_递归页表_](#di-gui-ye-biao)部分所述。 我们为我们的内核选择了第一种方法,因为它很简单,与平台无关,而且更强大(它还允许访问非页表框架)。为了启用所需的引导程序支持,我们在 “引导程序” 的依赖中加入了 "map_physical_memory"功能。 ```toml [dependencies] bootloader = { version = "0.9", features = ["map_physical_memory"]} ``` 启用这个功能后,bootloader 将整个物理内存映射到一些未使用的虚拟地址范围。为了将虚拟地址范围传达给我们的内核,bootloader 传递了一个 _启动信息_ 结构。 ### 启动信息 `Bootloader` 板块定义了一个[`BootInfo`]结构,包含了它传递给我们内核的所有信息。这个结构还处于早期阶段,所以在更新到未来的 [semver-incompatible] bootloader 版本时,可能会出现一些故障。在启用 "map_physical_memory" 功能后,它目前有两个字段 "memory_map" 和 "physical_memory_offset"。 [`BootInfo`]: https://docs.rs/bootloader/0.9/bootloader/bootinfo/struct.BootInfo.html [semver-incompatible]: https://doc.rust-lang.org/stable/cargo/reference/specifying-dependencies.html#caret-requirements - `memory_map`字段包含了可用物理内存的概览。它告诉我们的内核,系统中有多少物理内存可用,哪些内存区域被保留给设备,如VGA硬件。内存图可以从BIOS或UEFI固件中查询,但只能在启动过程的早期查询。由于这个原因,它必须由引导程序提供,因为内核没有办法在以后检索到它。在这篇文章的后面我们将需要内存图。 - `physical_memory_offset`告诉我们物理内存映射的虚拟起始地址。通过把这个偏移量加到物理地址上,我们得到相应的虚拟地址。这使得我们可以从我们的内核中访问任意的物理内存。 - 这个物理内存偏移可以通过在Cargo.toml中添加一个`[package.metadata.bootloader]`表并设置`physical-memory-offset = 
"0x0000f00000000000"`(或任何其他值)来定制。然而,请注意,如果bootloader遇到物理地址值开始与偏移量以外的空间重叠,也就是说,它以前会映射到其他早期的物理地址的区域,就会出现恐慌。所以一般来说,这个值越高(>1 TiB)越好。 Bootloader将 `BootInfo` 结构以 `&'static BootInfo`参数的形式传递给我们的内核,并传递给我们的`_start`函数。我们的函数中还没有声明这个参数,所以让我们添加它。 ```rust // in src/main.rs use bootloader::BootInfo; #[unsafe(no_mangle)] pub extern "C" fn _start(boot_info: &'static BootInfo) -> ! { // new argument […] } ``` 以前省去这个参数并不是什么问题,因为x86_64的调用惯例在CPU寄存器中传递第一个参数。因此,当这个参数没有被声明时,它被简单地忽略了。然而,如果我们不小心使用了一个错误的参数类型,那将是一个问题,因为编译器不知道我们入口点函数的正确类型签名。 ### `entry_point` 宏 由于我们的`_start`函数是在外部从引导程序中调用的,所以没有对我们的函数签名进行检查。这意味着我们可以让它接受任意参数而不出现任何编译错误,但在运行时它会失败或导致未定义行为。 为了确保入口点函数总是具有引导程序所期望的正确签名,`bootloader`板块提供了一个[`entry_point`]宏,它提供了一种类型检查的方法来定义一个Rust函数作为入口点。让我们重写我们的入口点函数来使用这个宏。 [`entry_point`]: https://docs.rs/bootloader/0.6.4/bootloader/macro.entry_point.html ```rust // in src/main.rs use bootloader::{BootInfo, entry_point}; entry_point!(kernel_main); fn kernel_main(boot_info: &'static BootInfo) -> ! { […] } ``` 我们不再需要使用`extern "C"`或`no_mangle`作为我们的入口点,因为宏为我们定义了真正的低级`_start`入口点。`kernel_main`函数现在是一个完全正常的Rust函数,所以我们可以为它选择一个任意的名字。重要的是,它是经过类型检查的,所以当我们使用一个错误的函数签名时,例如增加一个参数或改变参数类型,就会发生编译错误。 让我们在我们的`lib.rs`中进行同样的修改。 ```rust // in src/lib.rs #[cfg(test)] use bootloader::{entry_point, BootInfo}; #[cfg(test)] entry_point!(test_kernel_main); /// Entry point for `cargo test` #[cfg(test)] fn test_kernel_main(_boot_info: &'static BootInfo) -> ! 
{ // like before init(); test_main(); hlt_loop(); } ``` 由于这个入口点只在测试模式下使用,我们在所有项目中添加了`#[cfg(test)]`属性。我们给我们的测试入口点一个独特的名字`test_kernel_main`,以避免与我们的`main.rs`的`kernel_main`混淆。我们现在不使用`BootInfo`参数,所以我们在参数名前加上`_`,以消除未使用变量的警告。 ## 实现 现在我们可以访问物理内存了,我们终于可以开始实现我们的页表代码了。首先,我们将看一下我们的内核目前运行的活动页表。第二步,我们将创建一个转换函数,返回一个给定的虚拟地址所映射到的物理地址。作为最后一步,我们将尝试修改页表,以便创建一个新的映射。 在我们开始之前,我们为我们的代码创建一个新的`memory`模块。 ```rust // in src/lib.rs pub mod memory; ``` 对于该模块,我们创建一个空的`src/memory.rs`文件。 ### 访问页表 在[上一篇文章的结尾],我们试图查看我们的内核运行的页表,但是由于我们无法访问`CR3`寄存器所指向的物理帧而失败了。我们现在可以通过创建一个`active_level_4_table`函数来继续,该函数返回对活动的4级页面表的引用。 [上一篇文章的结尾]: @/edition-2/posts/08-paging-introduction/index.md#accessing-the-page-tables ```rust // in src/memory.rs use x86_64::{ structures::paging::PageTable, VirtAddr, }; /// 返回一个对活动的4级表的可变引用。 /// /// 这个函数是不安全的,因为调用者必须保证完整的物理内存在传递的 /// `physical_memory_offset`处被映射到虚拟内存。另外,这个函数 /// 必须只被调用一次,以避免别名"&mut "引用(这是未定义的行为)。 pub unsafe fn active_level_4_table(physical_memory_offset: VirtAddr) -> &'static mut PageTable { use x86_64::registers::control::Cr3; let (level_4_table_frame, _) = Cr3::read(); let phys = level_4_table_frame.start_address(); let virt = physical_memory_offset + phys.as_u64(); let page_table_ptr: *mut PageTable = virt.as_mut_ptr(); unsafe { &mut *page_table_ptr } } ``` 首先,我们从`CR3`寄存器中读取活动的4级表的物理帧。然后我们取其物理起始地址,将其转换为`u64`,并将其添加到`physical_memory_offset`中,得到页表框架映射的虚拟地址。最后,我们通过`as_mut_ptr`方法将虚拟地址转换为`*mut PageTable`原始指针,然后不安全地从它创建一个`&mut PageTable`引用。我们创建一个`&mut`引用,而不是`&`引用,因为我们将在本篇文章的后面对页表进行突变。 现在我们可以用这个函数来打印第4级表格的条目。 ```rust // in src/main.rs fn kernel_main(boot_info: &'static BootInfo) -> ! 
{ use blog_os::memory::active_level_4_table; use x86_64::VirtAddr; println!("Hello World{}", "!"); blog_os::init(); let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); let l4_table = unsafe { active_level_4_table(phys_mem_offset) }; for (i, entry) in l4_table.iter().enumerate() { if !entry.is_unused() { println!("L4 Entry {}: {:?}", i, entry); } } // as before #[cfg(test)] test_main(); println!("It did not crash!"); blog_os::hlt_loop(); } ``` 首先,我们将 "BootInfo" 结构的 "physical_memory_offset "转换为 [`VirtAddr`],并将其传递给 `active_level_4_table` 函数。然后我们使用`iter`函数来迭代页表条目,并使用[`enumerate`]组合器为每个元素增加一个索引`i`。我们只打印非空的条目,因为所有512个条目在屏幕上是放不下的。 [`VirtAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.VirtAddr.html [`enumerate`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.enumerate 当我们运行它时,我们看到以下输出。 ![QEMU打印条目0 (0x2000, PRESENT, WRITABLE, ACCESSED),条目1 (0x894000, PRESENT, WRITABLE, ACCESSED, DIRTY),条目31 (0x88e000, PRESENT, WRITABLE, ACCESSED, DIRTY),条目175 (0x891000, PRESENT, WRITABLE, ACCESSED, DIRTY),以及条目504 (0x897000, PRESENT, WRITABLE, ACCESSED, DIRTY)](qemu-print-level-4-table.png) 我们看到有各种非空条目,它们都映射到不同的3级表。有这么多区域是因为内核代码、内核堆栈、物理内存映射和启动信息都使用独立的内存区域。 为了进一步遍历页表,看一下三级表,我们可以把一个条目的映射帧再转换为一个虚拟地址。 ```rust // in the `for` loop in src/main.rs use x86_64::structures::paging::PageTable; if !entry.is_unused() { println!("L4 Entry {}: {:?}", i, entry); // get the physical address from the entry and convert it let phys = entry.frame().unwrap().start_address(); let virt = phys.as_u64() + boot_info.physical_memory_offset; let ptr = VirtAddr::new(virt).as_mut_ptr(); let l3_table: &PageTable = unsafe { &*ptr }; // print non-empty entries of the level 3 table for (i, entry) in l3_table.iter().enumerate() { if !entry.is_unused() { println!(" L3 Entry {}: {:?}", i, entry); } } } ``` 对于查看2级和1级表,我们对3级和2级条目重复这一过程。你可以想象,这很快就会变得非常冗长,所以我们不在这里展示完整的代码。 手动遍历页表是很有趣的,因为它有助于了解CPU是如何进行转换的。然而,大多数时候,我们只对给定的虚拟地址的映射物理地址感兴趣,所以让我们为它创建一个函数。 ### 翻译地址 
为了将虚拟地址转换为物理地址,我们必须遍历四级页表,直到到达映射的帧。让我们创建一个函数来执行这种转换。 ```rust // in src/memory.rs use x86_64::PhysAddr; /// 将给定的虚拟地址转换为映射的物理地址,如果地址没有被映射,则为`None'。 /// /// 这个函数是不安全的,因为调用者必须保证完整的物理内存在传递的`physical_memory_offset`处被映射到虚拟内存。 pub unsafe fn translate_addr(addr: VirtAddr, physical_memory_offset: VirtAddr) -> Option { translate_addr_inner(addr, physical_memory_offset) } ``` 我们将该函数转发到一个安全的`translate_addr_inner`函数,以限制`unsafe`的范围。正如我们在上面指出的,Rust把一个`unsafe fn`的完整主体当作一个大的不安全块。通过调用一个私有的安全函数,我们使每个`unsafe`操作再次明确。 私有内部函数包含真正的实现: ```rust // in src/memory.rs /// 由 `translate_addr`调用的私有函数。 /// /// 这个函数是安全的,可以限制`unsafe`的范围, /// 因为Rust将不安全函数的整个主体视为不安全块。 /// 这个函数只能通过`unsafe fn`从这个模块的外部到达。 fn translate_addr_inner(addr: VirtAddr, physical_memory_offset: VirtAddr) -> Option { use x86_64::structures::paging::page_table::FrameError; use x86_64::registers::control::Cr3; // 从CR3寄存器中读取活动的4级 frame let (level_4_table_frame, _) = Cr3::read(); let table_indexes = [ addr.p4_index(), addr.p3_index(), addr.p2_index(), addr.p1_index() ]; let mut frame = level_4_table_frame; // 遍历多级页表 for &index in &table_indexes { // 将该框架转换为页表参考 let virt = physical_memory_offset + frame.start_address().as_u64(); let table_ptr: *const PageTable = virt.as_ptr(); let table = unsafe {&*table_ptr}; // 读取页表条目并更新`frame`。 let entry = &table[index]; frame = match entry.frame() { Ok(frame) => frame, Err(FrameError::FrameNotPresent) => return None, Err(FrameError::HugeFrame) => panic!("huge pages not supported"), }; } // 通过添加页面偏移量来计算物理地址 Some(frame.start_address() + u64::from(addr.page_offset())) } ``` 我们没有重复使用`active_level_4_table`函数,而是再次从`CR3`寄存器读取4级帧。我们这样做是因为它简化了这个原型的实现。别担心,我们一会儿就会创建一个更好的解决方案。 `VirtAddr`结构已经提供了计算四级页面表索引的方法。我们将这些索引存储在一个小数组中,因为它允许我们使用`for`循环遍历页表。在循环之外,我们记住了最后访问的`frame`,以便以后计算物理地址。`frame`在迭代时指向页表框架,在最后一次迭代后指向映射的框架,也就是在跟随第1级条目之后。 在这个循环中,我们再次使用`physical_memory_offset`将帧转换为页表引用。然后我们读取当前页表的条目,并使用[`PageTableEntry::frame`]函数来检索映射的框架。如果该条目没有映射到一个框架,我们返回`None`。如果该条目映射了一个巨大的2 MiB或1 GiB页面,我们就暂时慌了。 [`PageTableEntry::frame`]: 
https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page_table/struct.PageTableEntry.html#method.frame 让我们通过翻译一些地址来测试我们的翻译功能。 ```rust // in src/main.rs fn kernel_main(boot_info: &'static BootInfo) -> ! { // new import use blog_os::memory::translate_addr; […] // hello world and blog_os::init let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); let addresses = [ // the identity-mapped vga buffer page 0xb8000, // some code page 0x201008, // some stack page 0x0100_0020_1a10, // virtual address mapped to physical address 0 boot_info.physical_memory_offset, ]; for &address in &addresses { let virt = VirtAddr::new(address); let phys = unsafe { translate_addr(virt, phys_mem_offset) }; println!("{:?} -> {:?}", virt, phys); } […] // test_main(), "it did not crash" printing, and hlt_loop() } ``` 当我们运行它时,我们看到以下输出。 ![0xb8000 -> 0xb8000, 0x201008 -> 0x401008, 0x10000201a10 -> 0x279a10, "panicked at 'huge pages not supported'](qemu-translate-addr.png) 正如预期的那样,身份映射的地址`0xb8000`翻译成了相同的物理地址。代码页和堆栈页翻译成了一些任意的物理地址,这取决于引导程序如何为我们的内核创建初始映射。值得注意的是,最后12位在翻译后总是保持不变,这是有道理的,因为这些位是[_page offset_],不是翻译的一部分。 [_page offset_]: @/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64 由于每个物理地址都可以通过添加`physical_memory_offset`来访问,`physical_memory_offset`地址的翻译本身应该指向物理地址`0`。然而,翻译失败了,因为映射使用了巨大的页面来提高效率,而我们的实现还不支持。 ### 使用 `OffsetPageTable` 将虚拟地址转换为物理地址是操作系统内核中的一项常见任务,因此`x86_64`内核为它提供了一个抽象。这个实现已经支持巨大的页面和除了 "translate_addr "之外的其他几个页表函数,所以我们将在下面使用它,而不是在我们自己的实现中添加巨大的页面支持。 抽象的基础是两个特征,它们定义了各种页表映射功能。 - [`Mapper`] 特质在页面大小上是通用的,并提供对页面进行操作的函数。例如[`translate_page`],它将一个给定的页面翻译成相同大小的框架,以及[`map_to`],它在页面表中创建一个新的映射。 - [`Translate`]特性提供了与多个页面大小有关的函数,如[`translate_addr`]或一般[`translate`]。 [`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html [`translate_page`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#tymethod.translate_page [`map_to`]: 
https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to [`Translate`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html [`translate_addr`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html#method.translate_addr [`translate`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Translate.html#tymethod.translate 特质只定义接口,不提供任何实现。`x86_64`板块目前提供了三种类型来实现不同要求的特征。[`OffsetPageTable`] 类型假设完整的物理内存被映射到虚拟地址空间的某个偏移处。[`MappedPageTable`]更灵活一些。它只要求每个页表帧在一个可计算的地址处被映射到虚拟地址空间。最后,[`递归页表`]类型可以用来通过[递归页表](#di-gui-ye-biao)访问页表框架。 [`OffsetPageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.OffsetPageTable.html [`MappedPageTable`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MappedPageTable.html [`递归页表`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.RecursivePageTable.html 在我们的例子中,bootloader在`physical_memory_offset`变量指定的虚拟地址上映射完整的物理内存,所以我们可以使用`OffsetPageTable`类型。为了初始化它,我们在`memory`模块中创建一个新的`init`函数。 ```rust use x86_64::structures::paging::OffsetPageTable; /// 初始化一个新的OffsetPageTable。 /// /// 这个函数是不安全的,因为调用者必须保证完整的物理内存在 /// 传递的`physical_memory_offset`处被映射到虚拟内存。另 /// 外,这个函数必须只被调用一次,以避免别名"&mut "引用(这是未定义的行为)。 pub unsafe fn init(physical_memory_offset: VirtAddr) -> OffsetPageTable<'static> { unsafe { let level_4_table = active_level_4_table(physical_memory_offset); OffsetPageTable::new(level_4_table, physical_memory_offset) } } // 私下进行 unsafe fn active_level_4_table(physical_memory_offset: VirtAddr) -> &'static mut PageTable {…} ``` 该函数接受 "physical_memory_offset "作为参数,并返回一个新的 "OffsetPageTable "实例,该实例具有 "静态 "寿命。这意味着该实例在我们内核的整个运行时间内保持有效。在函数体中,我们首先调用 "active_level_4_table "函数来获取4级页表的可变引用。然后我们用这个引用调用[`OffsetPageTable::new`] 函数。作为第二个参数,`new`函数希望得到物理内存映射开始的虚拟地址,该地址在`physical_memory_offset`变量中给出。 [`OffsetPageTable::new`]: 
https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.OffsetPageTable.html#method.new 从现在开始,`active_level_4_table`函数只能从`init`函数中调用,因为它在多次调用时很容易导致别名的可变引用,这可能导致未定义的行为。出于这个原因,我们通过删除`pub`指定符使该函数成为私有的。 我们现在可以使用`Translate::translate_addr`方法而不是我们自己的`memory::translate_addr`函数。我们只需要在`kernel_main`中修改几行。 ```rust // in src/main.rs fn kernel_main(boot_info: &'static BootInfo) -> ! { // new: different imports use blog_os::memory; use x86_64::{structures::paging::Translate, VirtAddr}; […] // hello world and blog_os::init let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); // new: initialize a mapper let mapper = unsafe { memory::init(phys_mem_offset) }; let addresses = […]; // same as before for &address in &addresses { let virt = VirtAddr::new(address); // new: use the `mapper.translate_addr` method let phys = mapper.translate_addr(virt); println!("{:?} -> {:?}", virt, phys); } […] // test_main(), "it did not crash" printing, and hlt_loop() } ``` 我们需要导入`Translate`特性,以便使用它提供的[`translate_addr`]方法。 当我们现在运行它时,我们看到和以前一样的翻译结果,不同的是,巨大的页面翻译现在也在工作。 ![0xb8000 -> 0xb8000, 0x201008 -> 0x401008, 0x10000201a10 -> 0x279a10, 0x18000000000 -> 0x0](qemu-mapper-translate-addr.png) 正如预期的那样,`0xb8000`的翻译以及代码和堆栈地址与我们自己的翻译函数保持一致。此外,我们现在看到,虚拟地址`physical_memory_offset`被映射到物理地址`0x0`。 通过使用`MappedPageTable`类型的翻译函数,我们可以免除实现巨大页面支持的工作。我们还可以访问其他的页面函数,如`map_to`,我们将在下一节使用。 在这一点上,我们不再需要`memory::translate_addr`和`memory::translate_addr_inner`函数,所以我们可以删除它们。 ### 创建一个新的映射 到目前为止,我们只看了页面表而没有修改任何东西。让我们改变这种情况,为一个以前没有映射的页面创建一个新的映射。 我们将使用[`Mapper`]特性的[`map_to`]函数来实现,所以让我们先看一下这个函数。文档告诉我们,它需要四个参数:我们想要映射的页面,该页面应该被映射到的框架,一组页面表项的标志,以及一个`frame_allocator`。之所以需要框架分配器,是因为映射给定的页面可能需要创建额外的页表,而页表需要未使用的框架作为后备存储。 [`map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.Mapper.html#tymethod.map_to [`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.Mapper.html #### create_example_mapping 函数 
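The first step of our implementation is to create a new `create_example_mapping` function that maps a given virtual page to `0xb8000`, the physical frame of the VGA text buffer. We choose that frame because it allows us to easily test whether the mapping was created correctly: we just need to write to the newly mapped page and see whether the write appears on the screen.

The `create_example_mapping` function looks like this:

```rust
// in src/memory.rs

use x86_64::{
    PhysAddr,
    structures::paging::{Page, PhysFrame, Mapper, Size4KiB, FrameAllocator}
};

/// Creates an example mapping for the given page to frame `0xb8000`.
pub fn create_example_mapping(
    page: Page,
    mapper: &mut OffsetPageTable,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) {
    use x86_64::structures::paging::PageTableFlags as Flags;

    let frame = PhysFrame::containing_address(PhysAddr::new(0xb8000));
    let flags = Flags::PRESENT | Flags::WRITABLE;

    let map_to_result = unsafe {
        // FIXME: this is not safe, we do it only for testing
        mapper.map_to(page, frame, flags, frame_allocator)
    };
    map_to_result.expect("map_to failed").flush();
}
```

In addition to the `page` that should be mapped, the function expects a mutable reference to an `OffsetPageTable` instance and a `frame_allocator`. The `frame_allocator` parameter uses the [`impl Trait`][impl-trait-arg] syntax to be [generic] over all types that implement the [`FrameAllocator`] trait. The trait is generic over the [`PageSize`] trait to work with both standard 4 KiB pages and huge 2 MiB/1 GiB pages. We only want to create a 4 KiB mapping, so we set the generic parameter to `Size4KiB`.

[impl-trait-arg]: https://doc.rust-lang.org/book/ch10-02-traits.html#traits-as-parameters
[generic]: https://doc.rust-lang.org/book/ch10-00-generics.html
[`FrameAllocator`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html
[`PageSize`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/trait.PageSize.html

The [`map_to`] method is unsafe because the caller must ensure that the frame is not already in use. The reason is that mapping the same frame twice could result in undefined behavior, for example when two different `&mut` references point to the same physical memory location. In our case, we reuse the VGA text buffer frame, which is already mapped, so we break the required condition. However, the `create_example_mapping` function is only a temporary testing function and will be removed after this post, so it is ok. To remind us of the unsafety, we put a `FIXME` comment on the line.

In addition to the `page` and the `frame`, the `map_to` method takes a set of flags for the mapping and a reference to the `frame_allocator`, which will be explained in a moment. For the flags, we set the `PRESENT` flag because it is required for all valid entries and the `WRITABLE` flag to make the mapped page writable. For a list of all possible flags, see the [_Page Table Format_] section of the previous post.

[_Page Table Format_]: @/edition-2/posts/08-paging-introduction/index.md#page-table-format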
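The [`map_to`] function can fail, so it returns a [`Result`]. Since this is just example code that does not need to be robust, we use [`expect`] to panic when an error occurs. On success, the function returns a [`MapperFlush`] type that provides an easy way to flush the newly mapped page from the translation lookaside buffer (TLB) with its [`flush`] method. Like `Result`, the type uses the [`#[must_use]`][must_use] attribute to emit a warning when we accidentally forget to use it.

[`Result`]: https://doc.rust-lang.org/core/result/enum.Result.html
[`expect`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.expect
[`MapperFlush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html
[`flush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush
[must_use]: https://doc.rust-lang.org/std/result/#results-must-be-used

#### A Dummy `FrameAllocator`

To be able to call `create_example_mapping`, we first need to create a type that implements the `FrameAllocator` trait. As noted above, the trait is responsible for allocating frames for new page tables if they are needed by `map_to`.

Let's start with the simple case and assume that we don't need to create new page tables. For this case, a frame allocator that always returns `None` suffices. We create such an `EmptyFrameAllocator` for testing our mapping function:

```rust
// in src/memory.rs

/// A FrameAllocator that always returns `None`.
pub struct EmptyFrameAllocator;

unsafe impl FrameAllocator<Size4KiB> for EmptyFrameAllocator {
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        None
    }
}
```

Implementing `FrameAllocator` is unsafe because the implementer must guarantee that the allocator yields only unused frames. Otherwise, undefined behavior might occur, for example when two virtual pages are mapped to the same physical frame. Our `EmptyFrameAllocator` only returns `None`, so this isn't a problem in this case.

#### Choosing a Virtual Page

We now have a simple frame allocator that we can pass to our `create_example_mapping` function. However, the allocator always returns `None`, so this will only work if no additional page table frames are needed for creating the mapping. To understand when additional page table frames are needed and when not, let's consider an example:

![A virtual and a physical address space with a single mapped page and the page tables of all four levels](required-page-frames-example.svg)

The graphic shows the virtual address space on the left, the physical address space on the right, and the page tables in between. The page tables are stored in physical memory frames, indicated by the dashed lines. The virtual address space contains a single mapped page at address `0x803fe00000`, marked in blue. To translate this page to its frame, the CPU walks the 4-level page table until it reaches the frame at address 36 KiB.

Additionally, the graphic shows the physical frame of the VGA text buffer in red. Our goal is to map a previously unmapped virtual page to this frame using our `create_example_mapping` function. Since our `EmptyFrameAllocator` always returns `None`, we want to create the mapping so that no additional frames are needed from the allocator. This depends on the virtual page that we select for the mapping.

The graphic shows two candidate pages in the virtual address space, both marked in yellow. One page is at address `0x803fdfd000`, which is 3 pages before the mapped page (in blue). While the level 4 and level 3 page table indices are the same as for the blue page, the level 2 and level 1 indices are different (see the [previous post][page-table-indices]). The different index into the level 2 table means that a different level 1 table is used for this page. Since this level 1 table does not exist yet, we would need to create it if we chose that page for our example mapping, which would require an additional unused physical frame. In contrast, the second candidate page at address `0x803fe02000` does not have this problem because it uses the same level 1 page table as the blue page. Thus, all the required page tables already exist.

[page-table-indices]: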
@/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64 总之,创建一个新的映射的难度取决于我们想要映射的虚拟页。在最简单的情况下,该页的1级页表已经存在,我们只需要写一个条目。在最困难的情况下,该页是在一个还不存在第三级的内存区域,所以我们需要先创建新的第三级、第二级和第一级页表。 为了用 "EmptyFrameAllocator "调用我们的 "create_example_mapping "函数,我们需要选择一个所有页表都已存在的页面。为了找到这样的页面,我们可以利用bootloader在虚拟地址空间的第一兆字节内加载自己的事实。这意味着这个区域的所有页面都存在一个有效的1级表。因此,我们可以选择这个内存区域中任何未使用的页面作为我们的例子映射,比如地址为`0`的页面。通常情况下,这个页面应该保持未使用状态,以保证解读空指针会导致页面故障,所以我们知道bootloader没有将其映射。 #### 创建映射 现在我们有了调用`create_example_mapping`函数所需的所有参数,所以让我们修改`kernel_main`函数来映射虚拟地址`0`的页面。由于我们将页面映射到VGA文本缓冲区的帧上,我们应该能够在之后通过它写到屏幕上。实现起来是这样的。 ```rust // in src/main.rs fn kernel_main(boot_info: &'static BootInfo) -> ! { use blog_os::memory; use x86_64::{structures::paging::Page, VirtAddr}; // 新的导入 […] // hello world and blog_os::init let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); let mut mapper = unsafe { memory::init(phys_mem_offset) }; let mut frame_allocator = memory::EmptyFrameAllocator; // 映射未使用的页 let page = Page::containing_address(VirtAddr::new(0)); memory::create_example_mapping(page, &mut mapper, &mut frame_allocator); // 通过新的映射将字符串 `New!` 写到屏幕上。 let page_ptr: *mut u64 = page.start_address().as_mut_ptr(); unsafe { page_ptr.offset(400).write_volatile(0x_f021_f077_f065_f04e)}; […] // test_main(), "it did not crash" printing, and hlt_loop() } ``` 我们首先通过调用 "create_example_mapping "函数为地址为0的页面创建映射,并为 "mapper "和 "frame_allocator "实例提供一个可变的引用。这将页面映射到VGA文本缓冲区框架,所以我们应该在屏幕上看到对它的任何写入。 然后我们将页面转换为原始指针,并写一个值到偏移量`400`。我们不写到页面的开始,因为VGA缓冲区的顶行被下一个`println`直接移出了屏幕。我们写值`0x_f021_f077_f065_f04e`,表示白色背景上的字符串 _"New!"_ 。正如我们[在 _"VGA文本模式"_ 帖子中]所学到的,对VGA缓冲区的写入应该是不稳定的,所以我们使用[`write_volatile`]方法。 [在 _"VGA文本模式"_ 帖子中]: @/edition-2/posts/03-vga-text-buffer/index.md#volatile [`write_volatile`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write_volatile 当我们在QEMU中运行它时,我们看到以下输出。 ![QEMU打印 "It did not crash!",屏幕中间有四个完全白色的单元格。](qemu-new-mapping.png) 屏幕上的 _"New!"_ 是由我们写到页`0`引起的,这意味着我们成功地在页表中创建了一个新的映射。 
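The reasoning about which candidate page needs a new level 1 table can be checked with a few lines of bit arithmetic. The following is a minimal sketch that assumes the standard x86_64 4-level layout (9 index bits per level, 12 offset bits); the helper names `page_table_indices` and `shares_level_1_table` are made up for this illustration and are not part of the `x86_64` crate.

```rust
/// Returns the page table indices `[level 4, level 3, level 2, level 1]`
/// of a virtual address, assuming 4-level paging with 4 KiB pages.
fn page_table_indices(addr: u64) -> [u16; 4] {
    // Each level uses 9 bits of the address, starting above the 12-bit offset.
    let index = |shift: u32| ((addr >> shift) & 0x1ff) as u16;
    [index(39), index(30), index(21), index(12)]
}

/// Two pages use the same level 1 table exactly when their
/// level 4, level 3, and level 2 indices are all equal.
fn shares_level_1_table(a: u64, b: u64) -> bool {
    page_table_indices(a)[..3] == page_table_indices(b)[..3]
}
```

For the addresses from the graphic, the blue page `0x803fe00000` has the indices `[1, 0, 511, 0]`, the first candidate `0x803fdfd000` has `[1, 0, 510, 509]`, and the second candidate `0x803fe02000` has `[1, 0, 511, 2]`. Only the second candidate shares its level 2 index with the blue page, so only it can reuse the existing level 1 table.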
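That mapping only worked because the level 1 table responsible for the page at address `0` already exists. When we try to map a page for which no level 1 table exists yet, the `map_to` function fails because it tries to create new page tables by allocating frames with the `EmptyFrameAllocator`. We can see that happen when we try to map page `0xdeadbeaf000` instead of `0`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […]
    let page = Page::containing_address(VirtAddr::new(0xdeadbeaf000));
    […]
}
```

When we run it, a panic with the following error message occurs:

```
panicked at 'map_to failed: FrameAllocationFailed', /…/result.rs:999:5
```

To map pages that don't have a level 1 page table yet, we need to create a proper `FrameAllocator`. But how do we know which frames are unused and how much physical memory is available?

### Allocating Frames

In order to create new page tables, we need to create a proper frame allocator. To do that, we use the `memory_map` that is passed by the bootloader as part of the `BootInfo` struct:

```rust
// in src/memory.rs

use bootloader::bootinfo::MemoryMap;

/// A FrameAllocator that returns usable frames from the bootloader's memory map.
pub struct BootInfoFrameAllocator {
    memory_map: &'static MemoryMap,
    next: usize,
}

impl BootInfoFrameAllocator {
    /// Create a FrameAllocator from the passed memory map.
    ///
    /// This function is unsafe because the caller must guarantee that the passed
    /// memory map is valid. The main requirement is that all frames that are marked
    /// as `USABLE` in it are really unused.
    pub unsafe fn init(memory_map: &'static MemoryMap) -> Self {
        BootInfoFrameAllocator {
            memory_map,
            next: 0,
        }
    }
}
```

The struct has two fields: a `'static` reference to the memory map passed by the bootloader and a `next` field that keeps track of the number of the next frame that the allocator should return.

As we explained in the [_Boot Information_](#qi-dong-xin-xi) section, the memory map is provided by the BIOS/UEFI firmware. It can only be queried very early in the boot process, so the bootloader already calls the respective functions for us. The memory map consists of a list of [`MemoryRegion`] structs, which contain the start address, the length, and the type (e.g. unused, reserved, etc.) of each memory region.

The `init` function initializes a `BootInfoFrameAllocator` with a given memory map. The `next` field is initialized with `0` and will be increased for every frame allocation to avoid returning the same frame twice. Since we don't know whether the usable frames of the memory map were already used somewhere else, our `init` function must be `unsafe` to require additional guarantees from the caller.

#### A `usable_frames` Method

Before we implement the `FrameAllocator` trait, we add an auxiliary method that converts the memory map into an iterator of usable frames:

```rust
// in src/memory.rs

use bootloader::bootinfo::MemoryRegionType;

impl BootInfoFrameAllocator {
    /// Returns an iterator over the usable frames specified in the memory map.
    fn usable_frames(&self) -> impl Iterator<Item = PhysFrame> {
        // get usable regions from memory map
        let regions = self.memory_map.iter();
        let usable_regions = regions
            .filter(|r| r.region_type == MemoryRegionType::Usable);
        // map each region to its address range
        let addr_ranges = usable_regions
            .map(|r| r.range.start_addr()..r.range.end_addr());
        // transform to an iterator of frame start addresses
        let frame_addresses = addr_ranges.flat_map(|r| r.step_by(4096));
        // create `PhysFrame` types from the start addresses
        frame_addresses.map(|addr| PhysFrame::containing_address(PhysAddr::new(addr)))
    }
}
```

This function uses iterator combinator methods to transform the initial `MemoryMap` into an iterator of usable physical frames:

- First, we call the `iter` method to convert the memory map to an iterator of [`MemoryRegion`]s.
- Then we use the [`filter`] method to skip any reserved or otherwise unavailable regions. The bootloader updates the memory map for all the mappings it creates, so frames that are used by our kernel (code, data, or stack) or to store the boot information are already marked as `InUse` or similar. Thus, we can be sure that `Usable` frames are not used somewhere else.
- Afterwards, we use the [`map`] combinator and Rust's [range syntax] to transform our iterator of memory regions to an iterator of address ranges.
- Next, we use [`flat_map`] to transform the address ranges into an iterator of frame start addresses, choosing every 4096th address using [`step_by`]. Since 4096 bytes (= 4 KiB) is the page size, we get the start address of each frame. The bootloader page-aligns all usable memory areas so that we don't need any alignment or rounding code here. By using [`flat_map`] instead of `map`, we get an `Iterator<Item = u64>` instead of an `Iterator<Item = Iterator<Item = u64>>`.
- Finally, we convert the start addresses to `PhysFrame` types to construct an `Iterator<Item = PhysFrame>`.

[`MemoryRegion`]: https://docs.rs/bootloader/0.6.4/bootloader/bootinfo/struct.MemoryRegion.html
[`filter`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.filter
[`map`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.map
[range syntax]: https://doc.rust-lang.org/core/ops/struct.Range.html
[`step_by`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.step_by
[`flat_map`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.flat_map

The return type of the function uses the [`impl Trait`] feature. This way, we can specify that we return some type that implements the [`Iterator`] trait with item type `PhysFrame`, without having to name the concrete return type. This is important here because we _can't_ name the concrete type since it depends on unnamable closure types.

[`impl Trait`]: https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits
[`Iterator`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html

#### Implementing the `FrameAllocator` Trait

Now we can implement the `FrameAllocator` trait:

```rust
// in src/memory.rs

unsafe impl FrameAllocator<Size4KiB> for BootInfoFrameAllocator {
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        let frame = self.usable_frames().nth(self.next);
        self.next += 1;
        frame
    }
}
```

We first use the `usable_frames` method to get an iterator of usable frames from the memory map. Then, we use the [`Iterator::nth`] function to get the frame with index `self.next` (thereby skipping the `self.next` frames that were already handed out). Before returning that frame, we increase `self.next` by one so that we return the following frame on the next call.

[`Iterator::nth`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.nth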
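The iterator pipeline and the `nth`-based allocation described above can be tried outside the kernel. The following is a minimal sketch that replaces the bootloader's `MemoryMap` with a plain slice; `Region` and `SketchFrameAllocator` are hypothetical stand-ins invented for this illustration, and frames are represented by their start address as a `u64` instead of a `PhysFrame`.

```rust
/// Hypothetical stand-in for the bootloader's `MemoryRegion`.
struct Region {
    start: u64,   // inclusive start address
    end: u64,     // exclusive end address
    usable: bool, // stands in for `MemoryRegionType::Usable`
}

/// Hypothetical stand-in for `BootInfoFrameAllocator`.
struct SketchFrameAllocator<'a> {
    regions: &'a [Region],
    next: usize,
}

impl<'a> SketchFrameAllocator<'a> {
    /// Same combinator chain as `usable_frames`: filter, map, flat_map.
    fn usable_frames(&self) -> impl Iterator<Item = u64> + '_ {
        self.regions
            .iter()
            .filter(|r| r.usable)
            .map(|r| r.start..r.end)
            .flat_map(|range| range.step_by(4096))
    }

    /// Returns the start address of the next unused frame, if any.
    fn allocate_frame(&mut self) -> Option<u64> {
        // Rebuilds the iterator on every call, just like the original.
        let frame = self.usable_frames().nth(self.next);
        self.next += 1;
        frame
    }
}
```

With a usable region `0x0..0x3000`, a reserved region `0x3000..0x5000`, and a usable region `0x10000..0x12000`, repeated calls to `allocate_frame` return `0x0`, `0x1000`, `0x2000`, `0x10000`, `0x11000`, and then `None`. The reserved region is skipped entirely.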
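This implementation is not quite optimal since it recreates the `usable_frames` iterator on every allocation. It would be better to directly store the iterator as a struct field instead. Then we wouldn't need the `nth` method and could just call [`next`] on every allocation. The problem with this approach is that it's not possible to store an `impl Trait` type in a struct field currently. It might work someday when [_named existential types_] are fully implemented.

[`next`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#tymethod.next
[_named existential types_]: https://github.com/rust-lang/rfcs/pull/2071

#### Using the `BootInfoFrameAllocator`

We can now modify our `kernel_main` function to pass a `BootInfoFrameAllocator` instance instead of an `EmptyFrameAllocator`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    use blog_os::memory::BootInfoFrameAllocator;
    […]
    let mut frame_allocator = unsafe {
        BootInfoFrameAllocator::init(&boot_info.memory_map)
    };
    […]
}
```

With the boot info frame allocator, the mapping succeeds and we see the black-on-white _"New!"_ on the screen again. Behind the scenes, the `map_to` method creates the missing page tables in the following way:

- Use the passed `frame_allocator` to allocate an unused frame.
- Zero the frame to create a new, empty page table.
- Map the entry of the higher level table to that frame.
- Continue with the next table level.

While our `create_example_mapping` function is just some example code, we are now able to create new mappings for arbitrary pages. This will be essential for allocating memory or implementing multithreading in future posts.

At this point, we should delete the `create_example_mapping` function again to avoid accidentally invoking undefined behavior, as explained [above](#create-example-mapping-han-shu).

## Summary

In this post we learned about different techniques to access the physical frames of page tables, including identity mapping, mapping of the complete physical memory, temporary mapping, and recursive page tables. We chose to map the complete physical memory since it's simple, portable, and powerful.

We can't map the physical memory from our kernel without page table access, so we need support from the bootloader. The `bootloader` crate supports creating the required mapping through optional cargo crate features. It passes the required information to our kernel in the form of a `&BootInfo` argument.

For our implementation, we first manually traversed the page tables to implement a translation function, and then used the `MappedPageTable` type of the `x86_64` crate. We also learned how to create new mappings in the page table and how to create the necessary `FrameAllocator` on top of the memory map passed by the bootloader.

## What's next?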
The next post will create a heap memory region for our kernel, which will allow us to [allocate memory] and use various [collection types].

[allocate memory]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html
[collection types]: https://doc.rust-lang.org/alloc/collections/index.html

================================================
FILE: blog/content/edition-2/posts/10-heap-allocation/index.es.md
================================================
+++
title = "Heap Allocation"
weight = 10
path = "es/heap-allocation"
date = 2019-06-26

[extra]
chapter = "Memory Management"
# GitHub usernames of the people that translated this post
translators = ["dobleuber"]
+++

This post adds support for heap allocation to our kernel. First, it gives an introduction to dynamic memory and shows how the borrow checker prevents common allocation errors. It then implements the basic allocation interface of Rust, creates a heap memory region, and sets up an allocator crate. At the end of this post, all the allocation and collection types of the built-in `alloc` crate will be available to our kernel.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-10`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-10

## Local and Static Variables

We currently use two types of variables in our kernel: local variables and `static` variables. Local variables are stored on the [call stack] and are only valid until the surrounding function returns. Static variables are stored at a fixed memory location and always live for the complete lifetime of the program.

### Local Variables

Local variables are stored on the [call stack], which is a [stack data structure][estructura de datos tipo pila] that supports `push` and `pop` operations.
On each function entry, the parameters, the return address, and the local variables of the called function are pushed by the compiler:

[call stack]: https://en.wikipedia.org/wiki/Call_stack
[estructura de datos tipo pila]: https://en.wikipedia.org/wiki/Stack_(abstract_data_type)

![An `outer()` and an `inner(i: usize)` function, where `outer` calls `inner(1)`. Both have some local variables. The call stack contains the following slots: the local variables of outer, then the argument `i = 1`, then the return address, then the local variables of inner.](call-stack.svg)

The above example shows the call stack after the `outer` function called the `inner` function. We see that the call stack contains the local variables of `outer` first. On the `inner` call, the parameter `1` and the return address for the function were pushed. Then control was transferred to `inner`, which pushed its local variables.

After the `inner` function returns, its part of the call stack is popped again and only the local variables of `outer` remain:

![The call stack containing only the local variables of `outer`](call-stack-return.svg)

We see that the local variables of `inner` only live until the function returns. The Rust compiler enforces these lifetimes and throws an error when we use a value for too long, for example when we try to return a reference to a local variable:

```rust
fn inner(i: usize) -> &'static u32 {
    let z = [1, 2, 3];
    &z[i]
}
```

([run the example on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=6186a0f3a54f468e1de8894996d12819))

While returning a reference makes no sense in this example, there are cases where we want a variable to live longer than the function.
We already saw such a case in our kernel when we tried to [load an interrupt descriptor table] and had to use a `static` variable to extend the lifetime.

[load an interrupt descriptor table]: @/edition-2/posts/05-cpu-exceptions/index.md#loading-the-idt

### Static Variables

Static variables are stored at a fixed memory location separate from the stack. This memory location is assigned at compile time by the linker and encoded in the executable. Statics live for the complete runtime of the program, so they have the `'static` lifetime and can always be referenced from local variables:

![The same outer/inner example, except that inner has a `static Z: [u32; 3] = [1,2,3];` and returns a `&Z[i]` reference](call-stack-static.svg)

When the `inner` function returns in the above example, its part of the call stack is destroyed. The static variables live in a separate memory range that is never destroyed, so the `&Z[1]` reference is still valid after the return.

Apart from the `'static` lifetime, static variables also have the useful property that their location is known at compile time, so that no reference is needed for accessing them. We utilized that property for our `println` macro: by using a [static `Writer`] internally, no `&mut Writer` reference is needed to invoke the macro, which is very useful in [exception handlers], where we don't have access to any additional variables.

[static `Writer`]: @/edition-2/posts/03-vga-text-buffer/index.md#a-global-interface
[exception handlers]: @/edition-2/posts/05-cpu-exceptions/index.md#implementation

However, this property of static variables brings a crucial drawback: they are read-only by default.
Rust enforces this because a [data race] would occur if, for example, two threads modified a static variable at the same time. The only way to modify a static variable is to encapsulate it in a [`Mutex`] type, which ensures that only a single `&mut` reference exists at any point in time. We already used a `Mutex` for our [static VGA buffer `Writer`][vga mutex].

[data race]: https://doc.rust-lang.org/nomicon/races.html
[`Mutex`]: https://docs.rs/spin/0.5.2/spin/struct.Mutex.html
[vga mutex]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks

## Dynamic Memory

Local and static variables are already very powerful together and enable most use cases. However, we saw that they both have their limitations:

- Local variables only live until the end of the surrounding function or block. This is because they live on the call stack and are destroyed after the surrounding function returns.
- Static variables always live for the complete runtime of the program, so there is no way to reclaim and reuse their memory when they're no longer needed. Also, they have unclear ownership semantics and are accessible from all functions, so they need to be protected by a [`Mutex`] when we want to modify them.

Another limitation of local and static variables is that they have a fixed size. So they can't store a collection that dynamically grows when more elements are added. (There are proposals for [unsized rvalues] in Rust that would allow dynamically sized local variables, but they only work in some specific cases.)

[unsized rvalues]: https://github.com/rust-lang/rust/issues/48055

To circumvent these drawbacks, programming languages often support a third memory region for storing variables called the **heap**.
The heap supports _dynamic memory allocation_ at runtime through two functions called `allocate` and `deallocate`. It works in the following way: the `allocate` function returns a free chunk of memory of the specified size that can be used to store a variable. This variable then lives until it is freed by calling the `deallocate` function with a reference to the variable.

Let's go through an example:

![The inner function calls `allocate(size_of([u32; 3]))`, writes `z.write([1,2,3]);`, and returns `(z as *mut u32).offset(i)`. On the returned value `y`, the outer function performs `deallocate(y, size_of(u32))`.](call-stack-heap.svg)

Here the `inner` function uses heap memory instead of static variables for storing `z`. It first allocates a memory block of the required size, which returns a `*mut u32` [raw pointer][puntero bruto]. It then uses the [`ptr::write`] method to write the array `[1,2,3]` to it. In the last step, it uses the [`offset`] function to calculate a pointer to the `i`-th element and then returns it. (Note that we omitted some required casts and unsafe blocks in this example function for brevity.)

[puntero bruto]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[`ptr::write`]: https://doc.rust-lang.org/core/ptr/fn.write.html
[`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset

The allocated memory lives until it is explicitly freed through a call to `deallocate`. Thus, the returned pointer is still valid even after `inner` has returned and its part of the call stack was destroyed. The advantage of using heap memory compared to static memory is that the memory can be reused after it is freed, which we do through the `deallocate` call in `outer`.
After that call, the situation looks like this:

![The call stack contains the local variables of `outer`, the heap contains `z[0]` and `z[2]`, but no longer `z[1]`.](call-stack-heap-freed.svg)

We see that the `z[1]` slot is free again and can be reused for the next `allocate` call. However, we also see that `z[0]` and `z[2]` are never freed because we never deallocate them. Such a bug is called a _memory leak_ and is often the cause of excessive memory consumption of programs (just imagine what happens when we call `inner` repeatedly in a loop). This might seem bad, but there are much more dangerous types of bugs that can happen with dynamic allocation.

### Common Errors

Apart from memory leaks, which are unfortunate but don't make the program vulnerable to attackers, there are two common types of bugs with more severe consequences:

- When we accidentally continue to use a variable after calling `deallocate` on it, we have a so-called **use-after-free** vulnerability. Such a bug causes undefined behavior and can often be exploited by attackers to execute arbitrary code.
- When we accidentally free a variable twice, we have a **double-free** vulnerability. This is problematic because it might free a different allocation that was allocated in the same spot after the first `deallocate` call. Thus, it can lead to a use-after-free vulnerability again.

These types of vulnerabilities are commonly known, so one might expect that people have learned how to avoid them by now. But no, such vulnerabilities are still regularly found, for example this [use-after-free vulnerability in Linux][vulnerabilidad de linux] (2019), which allowed arbitrary code execution. A web search like `use-after-free linux {current year}` will probably always yield results.
This shows that even the best programmers are not always able to correctly handle dynamic memory in complex projects.

[vulnerabilidad de linux]: https://securityboulevard.com/2019/02/linux-use-after-free-vulnerability-found-in-linux-2-6-through-4-20-11/

To avoid these issues, many languages, such as Java or Python, manage dynamic memory automatically using a technique called [_garbage collection_]. The idea is that the programmer never invokes `deallocate` manually. Instead, the program is regularly paused and scanned for unused heap variables, which are then automatically deallocated. Thus, the above vulnerabilities can never occur. The drawbacks are the performance overhead of the regular scan and the probably long pause times.

[_garbage collection_]: https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)

Rust takes a different approach to the problem: it uses a concept called [_ownership_] that is able to check the correctness of dynamic memory operations at compile time. Thus, no garbage collection is needed to avoid the mentioned vulnerabilities, which means that there is no performance overhead. Another advantage of this approach is that the programmer still has fine-grained control over the use of dynamic memory, just like with C or C++.

[_ownership_]: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html

### Allocations in Rust

Instead of letting the programmer manually call `allocate` and `deallocate`, the Rust standard library provides abstraction types that call these functions implicitly. The most important type is [**`Box`**], which is an abstraction for a heap-allocated value. It provides a [`Box::new`] constructor function that takes a value, calls `allocate` with the size of the value, and then moves the value to the newly allocated slot on the heap.
To free the heap memory again, the `Box` type implements the [`Drop` trait] to call `deallocate` when it goes out of scope:

[**`Box`**]: https://doc.rust-lang.org/std/boxed/index.html
[`Box::new`]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html#method.new
[`Drop` trait]: https://doc.rust-lang.org/book/ch15-03-drop.html

```rust
{
    let z = Box::new([1,2,3]);
    […]
} // z goes out of scope and `deallocate` is called
```

This pattern has the strange name [_resource acquisition is initialization_] (or _RAII_ for short). It originated in C++, where it is used to implement a similar abstraction type called [`std::unique_ptr`].

[_resource acquisition is initialization_]: https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization
[`std::unique_ptr`]: https://en.cppreference.com/w/cpp/memory/unique_ptr

Such a type alone does not suffice to prevent all use-after-free bugs since programmers can still hold on to references after the `Box` goes out of scope and the corresponding heap memory slot is deallocated:

```rust
let x = {
    let z = Box::new([1,2,3]);
    &z[1]
}; // z goes out of scope and `deallocate` is called
println!("{}", x);
```

This is where Rust's ownership comes in. It assigns an abstract [lifetime][duración] to each reference, which is the scope in which the reference is valid. In the above example, the `x` reference is taken from the `z` array, so it becomes invalid after `z` goes out of scope.
When you [run the above example on the playground][playground-2], you see that the Rust compiler indeed throws an error:

[duración]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html
[playground-2]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=28180d8de7b62c6b4a681a7b1f745a48

```
error[E0597]: `z[_]` does not live long enough
 --> src/main.rs:4:9
  |
2 |     let x = {
  |         - borrow later stored here
3 |         let z = Box::new([1,2,3]);
4 |         &z[1]
  |         ^^^^^ borrowed value does not live long enough
5 |     }; // z goes out of scope and `deallocate` is called
  |     - `z[_]` dropped here while still borrowed
```

The terminology can be a bit confusing at first. Taking a reference to a value is called _borrowing_ the value since it's similar to a borrow in real life: you have temporary access to an object but need to return it sometime, and you must not destroy it. By checking that all borrows end before an object is destroyed, the Rust compiler can guarantee that no use-after-free situation can occur.

Rust's ownership system goes even further, preventing not only use-after-free bugs but also providing complete [_memory safety_], as garbage-collected languages like Java or Python do. Additionally, it guarantees [_thread safety_] and is thus even safer than those languages in multi-threaded code. And most importantly, all these checks happen at compile time, so there is no runtime overhead compared to hand-written memory management in C.

[_memory safety_]: https://en.wikipedia.org/wiki/Memory_safety
[_thread safety_]: https://en.wikipedia.org/wiki/Thread_safety

### Use Cases

We now know the basics of dynamic memory allocation in Rust, but when should we use it?
We've come really far with our kernel without dynamic memory allocation, so why do we need it now?

First, dynamic memory allocation always comes with a bit of performance overhead since we need to find a free slot on the heap for every allocation. For this reason, local variables are generally preferable, especially in performance-sensitive kernel code. However, there are cases where dynamic memory allocation is the best choice.

As a basic rule, dynamic memory is required for variables that have a dynamic lifetime or a variable size. The most important type with a dynamic lifetime is [**`Rc`**], which counts the references to its wrapped value and deallocates it after all references have gone out of scope. Examples for types with a variable size are [**`Vec`**], [**`String`**], and other [collection types] that dynamically grow when more elements are added. These types work by allocating a larger amount of memory when they become full, copying all elements over, and then deallocating the old allocation.

[**`Rc`**]: https://doc.rust-lang.org/alloc/rc/index.html
[**`Vec`**]: https://doc.rust-lang.org/alloc/vec/index.html
[**`String`**]: https://doc.rust-lang.org/alloc/string/index.html
[collection types]: https://doc.rust-lang.org/alloc/collections/index.html

For our kernel, we will mostly need the collection types, for example, to store a list of active tasks when implementing multitasking in future posts.

## The Allocator Interface

The first step in implementing a heap allocator is to add a dependency on the built-in [`alloc`] crate. Like the [`core`] crate, it is a subset of the standard library that additionally contains the allocation and collection types.
To add the dependency on `alloc`, we add the following to our `lib.rs`:

[`alloc`]: https://doc.rust-lang.org/alloc/
[`core`]: https://doc.rust-lang.org/core/

```rust
// in src/lib.rs

extern crate alloc;
```

Contrary to normal dependencies, we don't need to modify the `Cargo.toml`. The reason is that the `alloc` crate ships with the Rust compiler as part of the standard library, so the compiler already knows about the crate. By adding this `extern crate` statement, we specify that the compiler should try to include it. (Historically, all dependencies needed an `extern crate` statement, which is now optional.)

Since we are compiling for a custom target, we can't use the precompiled version of `alloc` that is shipped with the Rust installation. Instead, we have to tell cargo to recompile the crate from source. We can do that by adding it to the `unstable.build-std` array in our `.cargo/config.toml` file:

```toml
# in .cargo/config.toml

[unstable]
build-std = ["core", "compiler_builtins", "alloc"]
```

Now the compiler will recompile and include the `alloc` crate in our kernel.

The reason that the `alloc` crate is disabled by default in `#[no_std]` crates is that it has additional requirements. When we try to compile our project now, we will see these requirements as errors:

```
error: no global memory allocator found but one is required; link to std or add
       #[global_allocator] to a static item that implements the GlobalAlloc trait.
```

The error occurs because the `alloc` crate requires a heap allocator, which is an object that provides the `allocate` and `deallocate` functions. In Rust, heap allocators are described by the [`GlobalAlloc`] trait, which is mentioned in the error message.
To set the heap allocator for the crate, the `#[global_allocator]` attribute must be applied to a `static` variable that implements the `GlobalAlloc` trait.

[`GlobalAlloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html

### The `GlobalAlloc` Trait

The [`GlobalAlloc`] trait defines the functions that a heap allocator must provide. The trait is special because it is almost never used directly by the programmer. Instead, the compiler will automatically insert the appropriate calls to the trait methods when using the allocation and collection types of `alloc`.

Since we will need to implement the trait for all our allocator types, it is worth taking a closer look at its declaration:

```rust
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... }
    unsafe fn realloc(
        &self,
        ptr: *mut u8,
        layout: Layout,
        new_size: usize
    ) -> *mut u8 { ... }
}
```

It defines the two required methods [`alloc`] and [`dealloc`], which correspond to the `allocate` and `deallocate` functions we used in our examples:

- The [`alloc`] method takes a [`Layout`] instance as an argument, which describes the desired size and alignment that the allocated memory block should have. It returns a [raw pointer][puntero bruto] to the first byte of the allocated memory block. Instead of an explicit error value, the `alloc` method returns a null pointer to signal an allocation error. This is a bit non-idiomatic, but it has the advantage that wrapping existing system allocators is easy since they use the same convention.
- The [`dealloc`] method is the counterpart and is responsible for freeing a memory block again. It receives two arguments: the pointer returned by `alloc` and the `Layout` that was used for the allocation.
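To get a feeling for what a `Layout` carries and how the `alloc`/`dealloc` pair is meant to be used, here is a minimal sketch that runs on a hosted system with the standard library instead of in our kernel. It calls the free functions `std::alloc::alloc` and `std::alloc::dealloc`, which forward to the global allocator; the `roundtrip` function name is made up for this illustration.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Allocates room for a `[u32; 3]`, writes to it, reads it back, and frees it.
/// Returns the layout's size and alignment together with the sum of the array.
fn roundtrip() -> (usize, usize, u32) {
    // The layout describes the size and alignment of the requested block.
    let layout = Layout::new::<[u32; 3]>();
    unsafe {
        let ptr = alloc(layout) as *mut [u32; 3];
        if ptr.is_null() {
            // A null pointer signals an allocation error.
            handle_alloc_error(layout);
        }
        ptr.write([1, 2, 3]);
        let values: [u32; 3] = ptr.read();
        // `dealloc` needs the same layout that was used for `alloc`.
        dealloc(ptr as *mut u8, layout);
        let sum: u32 = values.iter().sum();
        (layout.size(), layout.align(), sum)
    }
}
```

On common platforms, `roundtrip()` returns `(12, 4, 6)`: three `u32` values take 12 bytes with 4-byte alignment. In normal code, types like `Box` perform these calls for us, so this is only meant to show the contract between the two methods.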
[`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc
[`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc
[`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html

The trait additionally defines the two methods [`alloc_zeroed`] and [`realloc`] with default implementations:

- The [`alloc_zeroed`] method is equivalent to calling `alloc` and then setting the allocated memory block to zero, which is exactly what the provided default implementation does. An allocator implementation can override the default implementation with a more efficient custom implementation if possible.
- The [`realloc`] method allows growing or shrinking an allocation. The default implementation allocates a new memory block with the desired size, copies over the content of the previous allocation, and then frees the old block. Again, an allocator implementation can probably provide a more efficient implementation of this method, for example by growing/shrinking the allocation in place if possible.

[`alloc_zeroed`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.alloc_zeroed
[`realloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.realloc

#### Unsafety

One thing to notice is that both the trait itself and all trait methods are declared as `unsafe`:

- The reason for declaring the trait as `unsafe` is that the programmer must guarantee that the trait implementation for an allocator type is correct. For example, the `alloc` method must never return a memory block that is already used somewhere else because this would cause undefined behavior.
- Similarly, the reason that the methods are `unsafe` is that the caller must ensure various invariants when calling the methods, for example, that the `Layout` passed to `alloc` specifies a non-zero size.
This is not really relevant in practice since the methods are normally called directly by the compiler, which ensures that the requirements are met.

### A `DummyAllocator`

Now that we know what an allocator type should provide, we can create a simple dummy allocator. For that, we create a new `allocator` module:

```rust
// in src/lib.rs

pub mod allocator;
```

Our dummy allocator does the absolute minimum to implement the trait and always returns an error when `alloc` is called. It looks like this:

```rust
// in src/allocator.rs

use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr::null_mut;

pub struct Dummy;

unsafe impl GlobalAlloc for Dummy {
    unsafe fn alloc(&self, _layout: Layout) -> *mut u8 {
        null_mut()
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        panic!("dealloc should never be called")
    }
}
```

The struct does not need any fields, so we create it as a [zero-sized type][tipo de tamaño cero]. As mentioned above, we always return the null pointer from `alloc`, which corresponds to an allocation error. Since the allocator never returns any memory, a call to `dealloc` should never occur. For this reason, we simply panic in the `dealloc` method. The `alloc_zeroed` and `realloc` methods have default implementations, so we don't need to provide implementations for them.

[tipo de tamaño cero]: https://doc.rust-lang.org/nomicon/exotic-sizes.html#zero-sized-types-zsts

We now have a simple allocator, but we still have to tell the Rust compiler that it should use this allocator. This is where the `#[global_allocator]` attribute comes in.

### The `#[global_allocator]` Attribute

The `#[global_allocator]` attribute tells the Rust compiler which allocator instance it should use as the global heap allocator. The attribute is only applicable to a `static` that implements the `GlobalAlloc` trait.
Registremos una instancia de nuestro asignador `Dummy` como el asignador global: ```rust // en src/allocator.rs #[global_allocator] static ALLOCATOR: Dummy = Dummy; ``` Dado que el asignador `Dummy` es un [tipo de tamaño cero], no necesitamos especificar ningún campo en la expresión de inicialización. Con este static, los errores de compilación deberían estar arreglados. Ahora podemos usar los tipos de asignación y colección de `alloc`. Por ejemplo, podemos usar un [`Box`] para asignar un valor en el heap: [`Box`]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html ```rust // en src/main.rs extern crate alloc; use alloc::boxed::Box; fn kernel_main(boot_info: &'static BootInfo) -> ! { // […] imprimir "¡Hola Mundo!", llamar a `init`, crear `mapper` y `frame_allocator` let x = Box::new(41); // […] llamar a `test_main` en modo de prueba println!("¡No se cayó!"); blog_os::hlt_loop(); } ``` Nota que necesitamos especificar la declaración `extern crate alloc` en nuestro `main.rs` también. Esto es requerido porque las partes de `lib.rs` y `main.rs` se tratan como crates separadas. Sin embargo, no necesitamos crear otro static `#[global_allocator]` ya que el asignador global se aplica a todas las crates en el proyecto. De hecho, especificar un asignador adicional en otra crate sería un error. Cuando ejecutamos el código anterior, vemos que ocurre un panic: ![QEMU imprimiendo "panicado en `alloc error: Layout { size_: 4, align_: 4 }, src/lib.rs:89:5"](qemu-dummy-output.png) El panic ocurre porque la función `Box::new` llama implícitamente a la función `alloc` del asignador global. Nuestro asignador nulo siempre devuelve un puntero nulo, así que cada asignación falla. Para arreglar esto, necesitamos crear un asignador que realmente devuelva memoria utilizable. ## Creando un Heap para el Núcleo Antes de que podamos crear un asignador adecuado, primero necesitamos crear una región de memoria heap de la que el asignador pueda asignar memoria. 
Para hacer esto, necesitamos definir un rango de memoria virtual para la región del heap y luego mapear esta región a un marco físico. Ve la publicación [_"Introducción a la Paginación"_] para una visión general de la memoria virtual y las tablas de páginas. [_"Introducción a la Paginación"_]: @/edition-2/posts/08-paging-introduction/index.md El primer paso es definir una región de memoria virtual para el heap. Podemos elegir cualquier rango de dirección virtual que nos guste, siempre que no esté ya utilizado para otra región de memoria. Definámoslo como la memoria que comienza en la dirección `0x_4444_4444_0000` para que podamos reconocer fácilmente un puntero de heap más tarde: ```rust // en src/allocator.rs pub const HEAP_START: usize = 0x_4444_4444_0000; pub const HEAP_SIZE: usize = 100 * 1024; // 100 KiB ``` Establecemos el tamaño del heap en 100 KiB por ahora. Si necesitamos más espacio en el futuro, simplemente podemos aumentarlo. Si tratamos de usar esta región del heap ahora, ocurrirá un fallo de página ya que la región de memoria virtual no está mapeada a la memoria física todavía. 
Para resolver esto, creamos una función `init_heap` que mapea las páginas del heap usando la [API de Mapper] que introdujimos en la publicación [_"Implementación de Paginación"_]:

[API de Mapper]: @/edition-2/posts/09-paging-implementation/index.md#using-offsetpagetable
[_"Implementación de Paginación"_]: @/edition-2/posts/09-paging-implementation/index.md

```rust
// en src/allocator.rs

use x86_64::{
    structures::paging::{
        mapper::MapToError, FrameAllocator, Mapper, Page, PageTableFlags, Size4KiB,
    },
    VirtAddr,
};

pub fn init_heap(
    mapper: &mut impl Mapper<Size4KiB>,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) -> Result<(), MapToError<Size4KiB>> {
    let page_range = {
        let heap_start = VirtAddr::new(HEAP_START as u64);
        let heap_end = heap_start + HEAP_SIZE - 1u64;
        let heap_start_page = Page::containing_address(heap_start);
        let heap_end_page = Page::containing_address(heap_end);
        Page::range_inclusive(heap_start_page, heap_end_page)
    };

    for page in page_range {
        let frame = frame_allocator
            .allocate_frame()
            .ok_or(MapToError::FrameAllocationFailed)?;
        let flags = PageTableFlags::PRESENT | PageTableFlags::WRITABLE;
        unsafe {
            mapper.map_to(page, frame, flags, frame_allocator)?.flush()
        };
    }

    Ok(())
}
```

La función toma referencias mutables a una instancia [`Mapper`] y a una instancia [`FrameAllocator`], ambas limitadas a páginas de 4 KiB usando [`Size4KiB`] como parámetro genérico. El valor de retorno de la función es un [`Result`] con el tipo unidad `()` como variante de éxito y un [`MapToError`] como variante de error, que es el tipo de error devuelto por el método [`Mapper::map_to`]. Reutilizar el tipo de error tiene sentido aquí porque el método `map_to` es la principal fuente de errores en esta función.
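Como comprobación rápida del cálculo del rango de páginas de `init_heap`, el siguiente esbozo independiente reproduce la misma aritmética con enteros simples en lugar de los tipos de la crate `x86_64`. Las funciones `page_index` y `heap_page_count` son nombres hipotéticos introducidos solo para esta ilustración, y se asume un tamaño de página de 4 KiB.

```rust
// Valores copiados de `src/allocator.rs`.
const HEAP_START: u64 = 0x_4444_4444_0000;
const HEAP_SIZE: u64 = 100 * 1024; // 100 KiB

// Suposición: páginas de 4 KiB, como con `Size4KiB`.
const PAGE_SIZE: u64 = 4096;

/// Función auxiliar hipotética: índice de la página que contiene `addr`.
fn page_index(addr: u64) -> u64 {
    addr / PAGE_SIZE
}

/// Número de páginas que cubre el rango inclusivo
/// `[start, start + size - 1]`.
fn heap_page_count(start: u64, size: u64) -> u64 {
    // Dirección del último byte del heap (límite inclusivo).
    let end = start + size - 1;
    page_index(end) - page_index(start) + 1
}

fn main() {
    let pages = heap_page_count(HEAP_START, HEAP_SIZE);
    println!("páginas del heap: {}", pages);
}
```

Con las constantes de arriba, el resultado es 25 páginas, es decir, 100 KiB / 4 KiB. Si se omitiera la resta de 1, la dirección final caería en la página siguiente y el rango inclusivo abarcaría 26 páginas.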
[`Mapper`]:https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html [`FrameAllocator`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html [`Size4KiB`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/enum.Size4KiB.html [`Result`]: https://doc.rust-lang.org/core/result/enum.Result.html [`MapToError`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/enum.MapToError.html [`Mapper::map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to La implementación se puede dividir en dos partes: - **Creando el rango de páginas:** Para crear un rango de las páginas que queremos mapear, convertimos el puntero `HEAP_START` a un tipo [`VirtAddr`]. Luego calculamos la dirección del final del heap a partir de ella sumando el `HEAP_SIZE`. Queremos un límite inclusivo (la dirección del último byte del heap), por lo que restamos 1. A continuación, convertimos las direcciones en tipos [`Page`] usando la función [`containing_address`]. Finalmente, creamos un rango de páginas a partir de las páginas inicial y final utilizando la función [`Page::range_inclusive`]. - **Mapeo de las páginas:** El segundo paso es mapear todas las páginas del rango de páginas que acabamos de crear. Para eso, iteramos sobre estas páginas usando un bucle `for`. Para cada página, hacemos lo siguiente: - Asignamos un marco físico al que la página debería ser mapeada usando el método [`FrameAllocator::allocate_frame`]. Este método devuelve [`None`] cuando no quedan más marcos. Nos ocupamos de ese caso al mapearlo a un error [`MapToError::FrameAllocationFailed`] a través del método [`Option::ok_or`] y luego aplicando el [operador de signo de interrogación] para retornar temprano en caso de error. - Establecemos el flag `PRESENT` requerido y el flag `WRITABLE` para la página. 
Con estos flags, tanto los accesos de lectura como de escritura están permitidos, lo que tiene sentido para la memoria del heap. - Usamos el método [`Mapper::map_to`] para crear el mapeo en la tabla de páginas activa. El método puede fallar, así que usamos el [operador de signo de interrogación] otra vez para avanzar el error al llamador. En caso de éxito, el método devuelve una instancia de [`MapperFlush`] que podemos usar para actualizar el [_buffer de traducción de direcciones_] utilizando el método [`flush`]. [`VirtAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.VirtAddr.html [`Page`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html [`containing_address`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html#method.containing_address [`Page::range_inclusive`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html#method.range_inclusive [`FrameAllocator::allocate_frame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html#tymethod.allocate_frame [`None`]: https://doc.rust-lang.org/core/option/enum.Option.html#variant.None [`MapToError::FrameAllocationFailed`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/enum.MapToError.html#variant.FrameAllocationFailed [`Option::ok_or`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.ok_or [operador de signo de interrogación]: https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html [`MapperFlush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html [_buffer de traducción de direcciones_]: @/edition-2/posts/08-paging-introduction/index.md#the-translation-lookaside-buffer [`flush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush El último paso es llamar a esta función desde nuestro `kernel_main`: ```rust // en src/main.rs fn kernel_main(boot_info: &'static BootInfo) -> ! 
{ use blog_os::allocator; // nueva importación use blog_os::memory::{self, BootInfoFrameAllocator}; println!("¡Hola Mundo{}!", ""); blog_os::init(); let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); let mut mapper = unsafe { memory::init(phys_mem_offset) }; let mut frame_allocator = unsafe { BootInfoFrameAllocator::init(&boot_info.memory_map) }; // nueva allocator::init_heap(&mut mapper, &mut frame_allocator) .expect("falló la inicialización del heap"); let x = Box::new(41); // […] llamar a `test_main` en contexto de prueba println!("¡No se cayó!"); blog_os::hlt_loop(); } ``` Mostramos la función completa aquí para contexto. Las únicas nuevas líneas son la importación de `blog_os::allocator` y la llamada a la función `allocator::init_heap`. En caso de que la función `init_heap` devuelva un error, hacemos panic usando el método [`Result::expect`] ya que actualmente no hay una forma sensata para nosotros de manejar este error. [`Result::expect`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.expect Ahora tenemos una región de memoria heap mapeada que está lista para ser utilizada. La llamada a `Box::new` aún utiliza nuestro antiguo asignador `Dummy`, así que todavía verás el error "sin memoria" cuando lo ejecutes. Arreglemos esto utilizando un asignador apropiado. ## Usando una Crate de Asignador Dado que implementar un asignador es algo complicado, empezamos usando una crate de asignador externa. Aprenderemos cómo implementar nuestro propio asignador en el próximo post. Una crate de asignador simple para aplicaciones `no_std` es la crate [`linked_list_allocator`]. Su nombre proviene del hecho de que utiliza una estructura de datos de lista enlazada para hacer un seguimiento de las regiones de memoria desasignadas. Ve la próxima publicación para una explicación más detallada de este enfoque. 
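Para hacerse una idea de lo que significa hacer un seguimiento de las regiones libres con una lista enlazada, aquí hay un esbozo mínimo y muy simplificado. No es el código de `linked_list_allocator`: los tipos `FreeRegion` y `FreeList` son hipotéticos, los nodos se guardan en `Box` (algo de lo que un asignador real para `no_std` no puede depender) y se ignoran la alineación y la fusión de regiones adyacentes.

```rust
/// Región de memoria libre que cubre `[start, start + size)`.
struct FreeRegion {
    start: usize,
    size: usize,
    next: Option<Box<FreeRegion>>,
}

/// Lista enlazada de regiones libres (esbozo, no la API de la crate).
struct FreeList {
    head: Option<Box<FreeRegion>>,
}

impl FreeList {
    /// Crea una lista con una única región libre.
    fn new(start: usize, size: usize) -> Self {
        let region = FreeRegion { start, size, next: None };
        FreeList { head: Some(Box::new(region)) }
    }

    /// Estrategia "first fit": toma `size` bytes del inicio de la
    /// primera región suficientemente grande.
    fn allocate(&mut self, size: usize) -> Option<usize> {
        let mut cursor = self.head.as_deref_mut();
        while let Some(region) = cursor {
            if region.size >= size {
                let addr = region.start;
                region.start += size;
                region.size -= size;
                return Some(addr);
            }
            cursor = region.next.as_deref_mut();
        }
        None // ninguna región es lo bastante grande
    }

    /// Devuelve una región a la lista insertándola al principio.
    fn deallocate(&mut self, start: usize, size: usize) {
        let next = self.head.take();
        self.head = Some(Box::new(FreeRegion { start, size, next }));
    }
}

fn main() {
    let mut list = FreeList::new(0x1000, 64);
    let a = list.allocate(16);
    let b = list.allocate(16);
    println!("a = {:?}, b = {:?}", a, b);

    // Tras liberar la primera región, la siguiente asignación la reutiliza.
    list.deallocate(0x1000, 16);
    println!("reutilizada: {:?}", list.allocate(16));
}
```

La idea clave es que `deallocate` devuelve la región a la lista, de modo que una llamada posterior a `allocate` puede reutilizarla.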
[`linked_list_allocator`]: https://github.com/phil-opp/linked-list-allocator/ Para usar la crate, primero necesitamos agregar una dependencia en ella en nuestro `Cargo.toml`: ```toml # en Cargo.toml [dependencies] linked_list_allocator = "0.9.0" ``` Luego podemos reemplazar nuestro asignador nulo con el asignador proporcionado por la crate: ```rust // en src/allocator.rs use linked_list_allocator::LockedHeap; #[global_allocator] static ALLOCATOR: LockedHeap = LockedHeap::empty(); ``` La estructura se llama `LockedHeap` porque usa el tipo [`spinning_top::Spinlock`] para la sincronización. Esto es requerido porque múltiples hilos podrían acceder al static `ALLOCATOR` al mismo tiempo. Como siempre, al utilizar un spinlock o un mutex, debemos tener cuidado de no causar accidentalmente un deadlock. Esto significa que no debemos realizar ninguna asignación en manejadores de interrupciones, ya que pueden ejecutarse en cualquier momento y podrían interrumpir una asignación en progreso. [`spinning_top::Spinlock`]: https://docs.rs/spinning_top/0.1.0/spinning_top/type.Spinlock.html Configurar el `LockedHeap` como asignador global no es suficiente. La razón es que usamos la función constructora [`empty`], que crea un asignador sin ninguna memoria de respaldo. Al igual que nuestro asignador nulo, siempre devuelve un error en `alloc`. 
Para arreglar esto, necesitamos inicializar el asignador después de crear el heap: [`empty`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.LockedHeap.html#method.empty ```rust // en src/allocator.rs pub fn init_heap( mapper: &mut impl Mapper, frame_allocator: &mut impl FrameAllocator, ) -> Result<(), MapToError> { // […] mapear todas las páginas del heap a marcos físicos // nueva unsafe { ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE); } Ok(()) } ``` Usamos el método [`lock`] sobre el spinlock interno del tipo `LockedHeap` para obtener una referencia exclusiva a la instancia [`Heap`] envuelta, sobre la cual luego llamamos al método [`init`] con los límites del heap como argumentos. Como la función [`init`] ya intenta escribir en la memoria del heap, debemos inicializar el heap solo _después_ de mapear las páginas del heap. [`lock`]: https://docs.rs/lock_api/0.3.3/lock_api/struct.Mutex.html#method.lock [`Heap`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html [`init`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init Después de inicializar el heap, ahora podemos usar todos los tipos de asignación y colección de la crate integrada [`alloc`] sin error: ```rust // en src/main.rs use alloc::{boxed::Box, vec, vec::Vec, rc::Rc}; fn kernel_main(boot_info: &'static BootInfo) -> ! 
{
    // […] inicializar interrupciones, mapper, frame_allocator, heap

    // asignar un número en el heap
    let heap_value = Box::new(41);
    println!("valor del heap en {:p}", heap_value);

    // crear un vector de tamaño dinámico
    let mut vec = Vec::new();
    for i in 0..500 {
        vec.push(i);
    }
    println!("vector en {:p}", vec.as_slice());

    // crear un vector contado por referencias -> será liberado cuando el conteo llegue a 0
    let reference_counted = Rc::new(vec![1, 2, 3]);
    let cloned_reference = reference_counted.clone();
    println!("el conteo de referencia actual es {}", Rc::strong_count(&cloned_reference));
    core::mem::drop(reference_counted);
    println!("el conteo de referencia ahora es {}", Rc::strong_count(&cloned_reference));

    // […] llamar a `test_main` en contexto de prueba
    println!("¡No se cayó!");
    blog_os::hlt_loop();
}
```

Este ejemplo de código muestra algunos usos de los tipos [`Box`], [`Vec`] y [`Rc`]. Para los tipos `Box` y `Vec`, imprimimos los punteros de heap subyacentes usando el especificador de formato [`{:p}`]. Para mostrar `Rc`, creamos un valor del heap contado por referencias y usamos la función [`Rc::strong_count`] para imprimir el conteo de referencias actual antes y después de soltar una de las instancias (usando [`core::mem::drop`]).

[`Vec`]: https://doc.rust-lang.org/alloc/vec/
[`Rc`]: https://doc.rust-lang.org/alloc/rc/
[`{:p}`]: https://doc.rust-lang.org/core/fmt/trait.Pointer.html
[`Rc::strong_count`]: https://doc.rust-lang.org/alloc/rc/struct.Rc.html#method.strong_count
[`core::mem::drop`]: https://doc.rust-lang.org/core/mem/fn.drop.html

Cuando lo ejecutamos, vemos lo siguiente:

![QEMU imprimiendo `
valor del heap en 0x444444440000
vector en 0x444444440800
el conteo de referencia actual es 2
el conteo de referencia ahora es 1
](qemu-alloc-showcase.png)

Como se esperaba, vemos que los valores `Box` y `Vec` viven en el heap, como lo indica el puntero que comienza con el prefijo `0x_4444_4444_*`.
El valor contado por referencias también se comporta como se esperaba: el conteo de referencias es 2 después de la llamada a `clone` y vuelve a ser 1 después de que se soltó una de las instancias.

La razón por la que el vector comienza en el desplazamiento `0x800` no es que el valor `Box` ocupe `0x800` bytes, sino las [reasignaciones] que ocurren cuando el vector necesita aumentar su capacidad. Por ejemplo, cuando la capacidad del vector es 32 y tratamos de añadir el siguiente elemento, el vector asigna internamente un nuevo arreglo de respaldo con una capacidad de 64 y copia todos los elementos. Luego libera la antigua asignación.

[reasignaciones]: https://doc.rust-lang.org/alloc/vec/struct.Vec.html#capacity-and-reallocation

Por supuesto, hay muchos más tipos de asignación y colección en la crate `alloc` que ahora podemos usar en nuestro núcleo, incluyendo:

- el puntero contado por referencias seguro para hilos [`Arc`]
- el tipo de cadena propia [`String`] y la macro [`format!`]
- [`LinkedList`]
- el búfer de anillo creciente [`VecDeque`]
- la cola de prioridad [`BinaryHeap`]
- [`BTreeMap`] y [`BTreeSet`]

[`Arc`]: https://doc.rust-lang.org/alloc/sync/struct.Arc.html
[`String`]: https://doc.rust-lang.org/alloc/string/struct.String.html
[`format!`]: https://doc.rust-lang.org/alloc/macro.format.html
[`LinkedList`]: https://doc.rust-lang.org/alloc/collections/linked_list/struct.LinkedList.html
[`VecDeque`]: https://doc.rust-lang.org/alloc/collections/vec_deque/struct.VecDeque.html
[`BinaryHeap`]: https://doc.rust-lang.org/alloc/collections/binary_heap/struct.BinaryHeap.html
[`BTreeMap`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html
[`BTreeSet`]: https://doc.rust-lang.org/alloc/collections/btree_set/struct.BTreeSet.html

Estos tipos serán muy útiles cuando queramos implementar listas de hilos, colas de planificación o soporte para async/await.
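El comportamiento de reasignación del vector descrito antes se puede observar con un pequeño programa. El siguiente esbozo cuenta cuántas veces cambia la capacidad de un `Vec` mientras se le añaden 500 elementos. Se asume que se ejecuta en un entorno normal con `std` (no en nuestro núcleo), y la función `count_capacity_changes` es un nombre hipotético para esta ilustración. La estrategia exacta de crecimiento es un detalle de implementación de `Vec`, así que los números concretos pueden variar.

```rust
/// Añade `n` elementos a un vector vacío y devuelve cuántas veces
/// cambió su capacidad, junto con la capacidad final.
fn count_capacity_changes(n: usize) -> (usize, usize) {
    let mut vec: Vec<usize> = Vec::new();
    let mut last_capacity = vec.capacity();
    let mut changes = 0;

    for i in 0..n {
        vec.push(i);
        // Un cambio de capacidad implica una (re)asignación del
        // arreglo de respaldo.
        if vec.capacity() != last_capacity {
            changes += 1;
            last_capacity = vec.capacity();
        }
    }

    (changes, last_capacity)
}

fn main() {
    let (changes, capacity) = count_capacity_changes(500);
    println!(
        "cambios de capacidad: {}, capacidad final: {}",
        changes, capacity
    );
}
```

Cada cambio de capacidad corresponde a una nueva asignación en el heap, lo que explica por qué el arreglo de respaldo final del vector no queda justo después del valor `Box`.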
## Añadiendo una Prueba Para asegurarnos de que no rompemos accidentalmente nuestro nuevo código de asignación, deberíamos agregar una prueba de integración para ello. Comenzamos creando un nuevo archivo `tests/heap_allocation.rs` con el siguiente contenido: ```rust // en tests/heap_allocation.rs #![no_std] #![no_main] #![feature(custom_test_frameworks)] #![test_runner(blog_os::test_runner)] #![reexport_test_harness_main = "test_main"] extern crate alloc; use bootloader::{entry_point, BootInfo}; use core::panic::PanicInfo; entry_point!(main); fn main(boot_info: &'static BootInfo) -> ! { unimplemented!(); } #[panic_handler] fn panic(info: &PanicInfo) -> ! { blog_os::test_panic_handler(info) } ``` Reutilizamos las funciones `test_runner` y `test_panic_handler` de nuestro `lib.rs`. Dado que queremos probar asignaciones, habilitamos la crate `alloc` a través de la declaración `extern crate alloc`. Para más información sobre el boilerplate de la prueba, consulta la publicación [_Pruebas_] . [_Pruebas_]: @/edition-2/posts/04-testing/index.md La implementación de la función `main` se ve así: ```rust // en tests/heap_allocation.rs fn main(boot_info: &'static BootInfo) -> ! { use blog_os::allocator; use blog_os::memory::{self, BootInfoFrameAllocator}; use x86_64::VirtAddr; blog_os::init(); let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); let mut mapper = unsafe { memory::init(phys_mem_offset) }; let mut frame_allocator = unsafe { BootInfoFrameAllocator::init(&boot_info.memory_map) }; allocator::init_heap(&mut mapper, &mut frame_allocator) .expect("falló la inicialización del heap"); test_main(); loop {} } ``` Es muy similar a la función `kernel_main` en nuestro `main.rs`, con las diferencias de que no invocamos `println`, no incluimos ninguna asignación de ejemplo y llamamos a `test_main` incondicionalmente. Ahora estamos listos para agregar algunos casos de prueba. 
Primero, agregamos una prueba que realiza algunas asignaciones simples usando [`Box`] y verifica los valores asignados para asegurar que las asignaciones básicas funcionan:

```rust
// en tests/heap_allocation.rs

use alloc::boxed::Box;

#[test_case]
fn simple_allocation() {
    let heap_value_1 = Box::new(41);
    let heap_value_2 = Box::new(13);
    assert_eq!(*heap_value_1, 41);
    assert_eq!(*heap_value_2, 13);
}
```

Lo más importante es que esta prueba verifica que no ocurre ningún error de asignación.

A continuación, construimos iterativamente un gran vector, para probar tanto grandes asignaciones como múltiples asignaciones (debido a reasignaciones):

```rust
// en tests/heap_allocation.rs

use alloc::vec::Vec;

#[test_case]
fn large_vec() {
    let n = 1000;
    let mut vec = Vec::new();
    for i in 0..n {
        vec.push(i);
    }
    assert_eq!(vec.iter().sum::<u64>(), (n - 1) * n / 2);
}
```

Verificamos la suma comparándola con la fórmula para la [n-ésima suma parcial]. Esto nos da algo de confianza en que los valores asignados son todos correctos.

[n-ésima suma parcial]: https://en.wikipedia.org/wiki/1_%2B_2_%2B_3_%2B_4_%2B_%E2%8B%AF#Partial_sums

Como tercera prueba, creamos `HEAP_SIZE` asignaciones (102 400 con el tamaño de heap actual) una tras otra:

```rust
// en tests/heap_allocation.rs

use blog_os::allocator::HEAP_SIZE;

#[test_case]
fn many_boxes() {
    for i in 0..HEAP_SIZE {
        let x = Box::new(i);
        assert_eq!(*x, i);
    }
}
```

Esta prueba asegura que el asignador reutiliza la memoria liberada para asignaciones subsecuentes, ya que de lo contrario se quedaría sin memoria. Esto puede parecer un requisito obvio para un asignador, pero hay diseños de asignador que no hacen esto. Un ejemplo es el diseño del asignador bump que se explicará en el próximo post.

¡Vamos a ejecutar nuestra nueva prueba de integración!

```
> cargo test --test heap_allocation
[…]
Ejecutando 3 pruebas
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
```

¡Las tres pruebas tuvieron éxito!
También puedes invocar `cargo test` (sin el argumento `--test`) para ejecutar todas las pruebas unitarias e integradas. ## Resumen Este post dio una introducción a la memoria dinámica y explicó por qué y dónde se necesita. Vimos cómo el borrow checker previene vulnerabilidades comunes y aprendimos cómo funciona la API de asignación de Rust. Después de crear una implementación mínima de la interfaz de asignador de Rust usando un asignador nulo, creamos una región de memoria heap adecuada para nuestro núcleo. Para eso, definimos un rango de direcciones virtuales para el heap y luego mapeamos todas las páginas de ese rango a marcos físicos usando el `Mapper` y `FrameAllocator` de la publicación anterior. Finalmente, agregamos una dependencia en la crate `linked_list_allocator` para añadir un asignador adecuado a nuestro núcleo. Con este asignador, pudimos utilizar `Box`, `Vec` y otros tipos de asignación y colección de la crate `alloc`. ## ¿Qué sigue? Si bien ya hemos añadido soporte para la asignación en el heap en este post, dejamos la mayor parte del trabajo a la crate `linked_list_allocator`. El próximo post mostrará en detalle cómo se puede implementar un asignador desde cero. Presentará múltiples posibles diseños de asignadores, mostrará cómo implementar versiones simples de ellos y explicará sus ventajas y desventajas. 
================================================ FILE: blog/content/edition-2/posts/10-heap-allocation/index.ja.md ================================================ +++ title = "ヒープ割り当て" weight = 10 path = "ja/heap-allocation" date = 2019-06-26 [extra] # Please update this when updating the translation translation_based_on_commit = "afeed7477bb19a29d94a96b8b0620fd241b0d55f" # GitHub usernames of the people that translated this post translators = ["swnakamura", "garasubo"] +++ この記事では、私たちのカーネルにヒープ割り当て (アロケーション) の機能を追加します。まず動的メモリの基礎を説明し、どのようにして借用チェッカがありがちなアロケーションエラーを防いでくれるのかを示します。その後Rustの基本的なアロケーションインターフェースを実装し、ヒープメモリ領域を作成し、アロケータクレートを設定します。この記事を終える頃には、Rustに組み込みの`alloc`クレートのすべてのアロケーション・コレクション型が私たちのカーネルで利用可能になっているでしょう。 このブログの内容は [GitHub] 上で公開・開発されています。何か問題や質問などがあれば issue をたててください (訳注: リンクは原文(英語)のものになります)。また[こちら][at the bottom]にコメントを残すこともできます。この記事の完全なソースコードは[`post-10` ブランチ][post branch]にあります。 [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-10 ## 局所 (ローカル) 変数と静的 (スタティック) 変数 私たちのカーネルでは現在二種類の変数が使用されています:局所変数と`static`変数です。局所変数は[コールスタック][call stack]に格納されており、変数の定義された関数がリターンするまでの間のみ有効です。静的変数はメモリ上の固定された場所に格納されており、プログラムのライフタイム全体で常に生存しています。 ### 局所変数 局所変数は[コールスタック][call stack]に格納されています。これはプッシュ (`push`) とポップ (`pop`) という命令をサポートする[スタックというデータ構造][stack data structure]です。関数に入るたびに、パラメータ、リターンアドレス、呼び出された関数の局所変数がコンパイラによってプッシュされます: [call stack]: https://ja.wikipedia.org/wiki/%E3%82%B3%E3%83%BC%E3%83%AB%E3%82%B9%E3%82%BF%E3%83%83%E3%82%AF [stack data structure]: https://ja.wikipedia.org/wiki/%E3%82%B9%E3%82%BF%E3%83%83%E3%82%AF ![outer()とinner(i: usize)関数。両方が局所変数を持っています。outerはinner(1)を呼びます。コールスタックには順に以下の領域があります:outerの局所変数、引数i=1、リターンアドレス、そしてinnerの局所変数。](call-stack.svg) 上の例は、`outer`関数が`inner`関数を呼び出した後のコールスタックを示しています。コールスタックは`outer`の局所変数を先に持っていることが分かります。`inner`を呼び出すと、パラメータ`1`とこの関数のリターンアドレスがプッシュされます。そこで制御は`inner`へと移り、`inner`は自身の局所変数をプッシュします。 `inner`関数がリターンすると、コールスタックのこの関数に対応する部分がポップされ、`outer`の局所変数のみが残ります: 
![outerの局所変数しか持っていないコールスタック](call-stack-return.svg) `inner`関数の局所変数はリターンまでしか生存していないことが分かります。Rustコンパイラはこの生存期間 (ライフタイム) を強制し、私たちが値を長く使いすぎてしまうとエラーを投げます。例えば、局所変数への参照を返そうとしたときがそうです: ```rust fn inner(i: usize) -> &'static u32 { let z = [1, 2, 3]; &z[i] } ``` ([この例をplaygroundで実行する](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=6186a0f3a54f468e1de8894996d12819)) 上の例の場合、参照を返すことには意味がありませんが、変数に関数よりも長く生存して欲しいというケースは存在します。すでに私たちのカーネルでそのようなケースに遭遇しています。それは[割り込み記述子表 (IDT) を読み込][load an interrupt descriptor table]もうとしたときで、ライフタイムを延ばすために`static`変数を使う必要がありました。 [load an interrupt descriptor table]: @/edition-2/posts/05-cpu-exceptions/index.ja.md#idtwodu-miip-mu ### 静的変数 静的変数は、スタックとは別の固定されたメモリ位置に格納されます。このメモリ位置はコンパイル時にリンカによって指定され、実行可能ファイルにエンコードされています。静的変数はプログラムの実行中ずっと生存するため、`'static`ライフタイムを持っており、局所変数によっていつでも参照することができます。 ![同じouter/innerの例ですが、innerが`static Z: [u32; 3] = [1,2,3];`を持っており、参照`&Z[i]`を返します](call-stack-static.svg) 上の例で`inner`関数がリターンするとき、それに対応するコールスタックは破棄されます。(しかし)静的変数は絶対に破棄されない別のメモリ領域にあるため、参照`&Z[1]`はリターン後も有効です。 `'static`ライフタイムの他にも静的変数には利点があります。それらは位置がコンパイル時に分かるため、アクセスするために参照が必要ないのです。この特性を私たちの`println`マクロを作る際に利用しました:[静的な`Writer`][static `Writer`]をその内部で使うことで、マクロを呼び出す際に`&mut Writer`参照が必要でなくなります。これは他の変数にアクセスできない[例外処理関数][exception handlers]においてとても有用です。 [static `Writer`]: @/edition-2/posts/03-vga-text-buffer/index.ja.md#da-yu-de-global-naintahuesu [exception handlers]: @/edition-2/posts/05-cpu-exceptions/index.ja.md#shi-zhuang しかし、静的変数のこの特性には重大な欠点がついてきます:デフォルトでは読み込み専用なのです。Rustがこのルールを強制するのは、例えば二つのスレッドがある静的変数を同時に変更した場合[データ競合][data race]が発生するためです。静的変数を変更する唯一の方法は、それを[`Mutex`]型にカプセル化し、あらゆる時刻において`&mut`参照が一つしか存在しないことを保証することです。`Mutex`は[VGAバッファへの静的な`Writer`][vga mutex]を作ったときにすでに使いました。 [data race]: https://doc.rust-jp.rs/rust-nomicon-ja/races.html [`Mutex`]: https://docs.rs/spin/0.5.2/spin/struct.Mutex.html [vga mutex]: @/edition-2/posts/03-vga-text-buffer/index.ja.md#supinrotuku ## 動的 (ダイナミック) メモリ 
局所変数と静的変数を組み合わせれば、それら自体とても強力であり、ほとんどのユースケースを満足します。しかし、どちらにも制限が存在することも見てきました: - 局所変数はそれを定義する関数やブロックが終わるまでしか生存しません。なぜなら、これらはコールスタックに存在し、関数がリターンした段階で破棄されるからです。 - 静的変数はプログラムの実行中常に生存するため、必要なくなったときでもメモリを取り戻したり再利用したりする方法がありません。また、所有権のセマンティクスが不明瞭であり、すべての関数からアクセスできてしまうため、変更しようと思ったときには[`Mutex`]で保護してやらないといけません。 局所変数・静的変数の制約としてもう一つ、固定サイズであることが挙げられます。従ってこれらは要素が追加されたときに動的に大きくなるコレクションを格納することができません(Rustにおいて動的サイズの局所変数を可能にする[unsized rvalues]の提案が行われていますが、これはいくつかの特定のケースでしかうまく動きません)。 [unsized rvalues]: https://github.com/rust-lang/rust/issues/48055 これらの欠点を回避するために、プログラミング言語はしばしば、変数を格納するための第三の領域である**ヒープ**をサポートします。ヒープは、`allocate`と`deallocate`という二つの関数を通じて、実行時の**動的メモリ割り当て**をサポートします。仕組みとしては以下のようになります:`allocate`関数は、変数を格納するのに使える、指定されたサイズの解放されたメモリの塊を返します。変数への参照を引数に`deallocate`関数を呼び出すことによってその変数を解放するまで、この変数は生存します。 例を使って見てみましょう: ![inner関数は`allocate(size_of([u32; 3]))`を呼び、`z.write([1,2,3]);`で書き込みを行い、`(z as *mut u32).offset(i)`を返します。outer関数は返された値`y`に対して`deallocate(y, size_of(u32))`を行います。](call-stack-heap.svg) ここで`inner`関数は`z`を格納するために静的変数ではなくヒープメモリを使っています。まず要求されたサイズのメモリブロックを割り当て、`*mut u32`の[生ポインタ][raw pointer]を受け取ります。その後で[`ptr::write`]メソッドを使ってこれに配列`[1,2,3]`を書き込みます。最後のステップとして、[`offset`]関数を使って`i`番目の要素へのポインタを計算しそれを返します(簡単のため、必要なキャストやunsafeブロックをいくつか省略しました)。 [raw pointer]: https://doc.rust-jp.rs/book-ja/ch19-01-unsafe-rust.html#%E7%94%9F%E3%83%9D%E3%82%A4%E3%83%B3%E3%82%BF%E3%82%92%E5%8F%82%E7%85%A7%E5%A4%96%E3%81%97%E3%81%99%E3%82%8B [`ptr::write`]: https://doc.rust-lang.org/core/ptr/fn.write.html [`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset 割り当てられたメモリは`deallocate`の呼び出しによって明示的に解放されるまで生存します。したがって、返されたポインタは、`inner`がリターンしコールスタックの対応する部分が破棄された後も有効です。スタティックメモリと比較したときのヒープメモリの長所は、解放(`outer`内の`deallocate`呼び出しでまさにこれを行っています)後に再利用できるということです。この呼び出しの後、状況は以下のようになります。 ![コールスタックはouterの局所変数を持っており、ヒープはz[0]とz[2]を持っているが、z[1]はもう持っていない。](call-stack-heap-freed.svg) 
`z[1]`スロットが解放され、次の`allocate`呼び出しで再利用できることが分かります。しかし、`z[0]`と`z[2]`は永久にdeallocateされず、したがって永久に解放されないことも分かります。このようなバグは**メモリリーク**と呼ばれており、しばしばプログラムの過剰なメモリ消費を引き起こします(`inner`をループで何度も呼び出したらどんなことになるか、想像してみてください)。これ自体良くないことに思われるかもしれませんが、動的割り当てはもっと危険性の高いバグを発生させうるのです。 ### よくあるミス メモリリークは困りものですが、プログラムを攻撃者に対して脆弱にはしません。しかしこのほかに、より深刻な結果を招く二種類のバグが存在します: - もし変数に対して`deallocate`を呼んだ後にも間違ってそれを使い続けたら、いわゆるuse-after-free (メモリ解放後に使用) 脆弱性が発生します。このようなバグは未定義動作を引き起こし、しばしば攻撃者が任意コードを実行するのに利用されます。 - 間違ってある変数を二度解放したら、double-free (二重解放) 脆弱性が発生します。これが問題になるのは、最初の`deallocate`呼び出しの後に同じ場所にallocateされた別の割り当てを解放してしまうかもしれないからです。従って、これもまたuse-after-free脆弱性につながりかねません。 これらの脆弱性は広く知られているため、回避する方法も解明されているはずだとお思いになるかもしれません。しかし答えはいいえで、このような脆弱性は未だ散見され、例えば最近でも任意コード実行を許す[Linuxのuse-after-free脆弱性][linux vulnerability]が存在しました。このことは、最高のプログラマーであっても、複雑なプロジェクトにおいて常に正しく動的メモリを扱えはしないということを示しています。 [linux vulnerability]: https://securityboulevard.com/2019/02/linux-use-after-free-vulnerability-found-in-linux-2-6-through-4-20-11/ これらの問題を回避するため、JavaやPythonといった多くの言語では[**ガベージコレクション**][_garbage collection_]という技術を使って自動的に動的メモリを管理しています。発想としては、プログラマが絶対に自分の手で`deallocate`を呼び出すことがないようにするというものです。代わりに、プログラムが定期的に一時停止されてスキャンされ、未使用のヒープ変数が見つかったら自動的にdeallocateされるのです。従って、上のような脆弱性は絶対に発生し得ません。欠点としては,定期的にスキャンすることによる性能のオーバーヘッドが発生することと、一時停止の時間が長くなりがちであることが挙げられます。 [_garbage collection_]: https://ja.wikipedia.org/wiki/%E3%82%AC%E3%83%99%E3%83%BC%E3%82%B8%E3%82%B3%E3%83%AC%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3 Rustはこの問題に対して別のアプローチを取ります:[**所有権**][_ownership_]と呼ばれる概念を使って、動的メモリの操作の正確性をコンパイル時にチェックするのです。従って前述の脆弱性を回避するためのガベージコレクションの必要がなく、性能のオーバーヘッドが存在しません。このアプローチのもう一つの利点として、CやC++と同様、プログラマが動的メモリの使用に関して精緻な制御を行うことができるということが挙げられます。 [_ownership_]: https://doc.rust-jp.rs/book-ja/ch04-01-what-is-ownership.html ### Rustにおける割り当て プログラマーに自分の手で`allocate`と`deallocate`を呼ばせる代わりに、Rustの標準ライブラリはこれらの関数を暗黙の内に呼ぶ抽象型を提供しています。最も重要な型は[**`Box`**]で、これはヒープに割り当てられた値の抽象化です。これは[`Box::new`]コンストラクタ関数を提供しており、これは値を引数として、その値のサイズを引数に`allocate`を呼び出し、ヒープ上に新しく割り当てられたスロットにその値を移動 (ムーブ) 
します。ヒープメモリを解放するために、スコープから出た際に`deallocate`を呼ぶような[`Drop`トレイト][`Drop` trait]を`Box`型は実装しています。 [**`Box`**]: https://doc.rust-lang.org/std/boxed/index.html [`Box::new`]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html#method.new [`Drop` trait]: https://doc.rust-jp.rs/book-ja/ch15-03-drop.html ```rust { let z = Box::new([1,2,3]); […] } // zがスコープから出たので`deallocate`が呼ばれる ``` このような記法のパターンは[リソース取得は初期化である][_resource acquisition is initialization_](resource acquisition is initialization、略してRAII)という奇妙な名前を持っています。C++で[`std::unique_ptr`]という同じような抽象型を実装するのに使われたのが始まりです。 [_resource acquisition is initialization_]: https://ja.wikipedia.org/wiki/RAII [`std::unique_ptr`]: https://en.cppreference.com/w/cpp/memory/unique_ptr このような型自体ではすべてのuse-after-freeバグを防ぐのに十分ではありません。なぜなら、プログラマは、`Box`がスコープ外に出て対応するヒープメモリスロットがdeallocateされた後でも参照を利用し続けることができてしまうからです: ```rust let x = { let z = Box::new([1,2,3]); &z[1] }; // zがスコープから出たので`deallocate`が呼ばれる println!("{}", x); ``` ここでRustの所有権の出番です。所有権システムは、参照が有効なスコープを表す抽象[ライフタイム][lifetime]をそれぞれの参照に指定します。上の例では、参照`x`は配列`z`から取られているので、`z`がスコープ外に出ると無効になります。[上の例をplaygroundで実行する][playground-2]と、確かにRustコンパイラがエラーを投げるのが分かります: [lifetime]: https://doc.rust-jp.rs/book-ja/ch10-03-lifetime-syntax.html [playground-2]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=28180d8de7b62c6b4a681a7b1f745a48 ``` error[E0597]: `z[_]` does not live long enough --> src/main.rs:4:9 | 2 | let x = { | - borrow later stored here 3 | let z = Box::new([1,2,3]); | - binding `z` declared here 4 | &z[1] | ^^^^^ borrowed value does not live long enough 5 | }; // z goes out of scope and `deallocate` is called | - `z[_]` dropped here while still borrowed ``` ここで使われている用語は初見では少しわかりにくいかもしれません。値の参照を取ることは値を借用する (borrow) と呼ばれています。これは現実での借用と似ているためです:オブジェクトに一時的にアクセスできるようになりますが、それをいつか返さなければならず、また破壊することも許されません。オブジェクトが破壊される前にすべての借用が終了することを確かめることにより、Rustコンパイラはuse-after-freeが起こりえないことを保証できるのです。 Rustの所有権システムはさらに突き詰められており、use-after-freeバグを防ぐだけでなく、JavaやPythonのようなガベージコレクション型言語と同じ完全な[メモリ安全性 
(セーフティ) ][_memory safety_]を提供しています。さらに[スレッド安全性 (セーフティ) ][_thread safety_]も保証されており、マルチスレッドのプログラムにおいてはこれらの言語よりもさらに安全です。さらに最も重要なことに、これらのチェックは全てコンパイル時に行われるため、C言語で手書きされたメモリ管理と比べても実行時のオーバーヘッドはありません。 [_memory safety_]: https://ja.wikipedia.org/wiki/%E3%83%A1%E3%83%A2%E3%83%AA%E5%AE%89%E5%85%A8%E6%80%A7 [_thread safety_]: https://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%AC%E3%83%83%E3%83%89%E3%82%BB%E3%83%BC%E3%83%95 ### 使用例 Rustにおける動的メモリ割り当ての基礎を学んだわけですが、これをいつ使えば良いのでしょうか?私たちのカーネルは動的メモリ割り当てなしにこれだけやってこられたのに、どうして今になってこれが必要なのでしょうか? まず覚えておいて欲しいのは、割り当てを行うたびにヒープから空いているスロットを探してこないといけないので、動的メモリ割り当てには少しだけ性能オーバーヘッドがあるということです。このため、特に性能が重要となるカーネルのプログラムにおいては、一般に局所変数の方が好ましいです。しかし、動的メモリ割り当てが最良の選択肢であるようなケースも存在するのです。 基本的なルールとして、動的メモリは動的なライフタイムや可変サイズを持つような変数に必要とされます。動的なライフタイムを持つ最も重要な型は[**`Rc`**]で、これはラップされた値に対する参照を数えておき、すべての参照がスコープから外れたらそれをdeallocateするというものです。可変サイズを持つ型の例には、[**`Vec`**]、[**`String`**]、その他の[コレクション型][collection types]といった、要素が追加されたときに動的に大きくなるような型が挙げられます。これらの型は、容量が一杯になると、より大きい量のメモリを割り当て、すべての要素をコピーし、古い割り当てをdeallocateすることにより対処します。 [**`Rc`**]: https://doc.rust-lang.org/alloc/rc/index.html [**`Vec`**]: https://doc.rust-lang.org/alloc/vec/index.html [**`String`**]: https://doc.rust-lang.org/alloc/string/index.html [collection types]: https://doc.rust-lang.org/alloc/collections/index.html 私たちのカーネルでは主にコレクション型を必要とし、例えば、将来の記事でマルチタスキングを実行するときにアクティブなタスクのリストを格納するために使います。 ## アロケータインターフェース ヒープアロケータを実装するための最初のステップは、組み込みの[`alloc`]クレートへの依存関係を追加することです。[`core`]クレートと同様、これは標準ライブラリのサブセットであり、アロケーション型やコレクション型を含んでいます。`alloc`への依存関係を追加するために、以下を`lib.rs`に追加します: [`alloc`]: https://doc.rust-lang.org/alloc/ [`core`]: https://doc.rust-lang.org/core/ ```rust // in src/lib.rs extern crate alloc; ``` 通常の依存関係と異なり`Cargo.toml`を修正する必要はありません。その理由は、`alloc`クレートは標準ライブラリの一部としてRustコンパイラに同梱されているため、コンパイラはすでにこのクレートのことを知っているからです。この`extern crate`宣言を追加することで、コンパイラにこれをインクルードしようと試みるよう指定しています(昔はすべての依存関係が`extern crate`宣言を必要としていたのですが、いまは任意です)。
**訳者注:** 詳しくは[edition guideの対応するページ](https://doc.rust-jp.rs/edition-guide/rust-2018/path-changes.html#%E3%81%95%E3%82%88%E3%81%86%E3%81%AA%E3%82%89extern-crate)をご覧ください。
カスタムターゲット向けにコンパイルしようとしているので、Rustインストール時に同梱されていたコンパイル済みの`alloc`を使用することはできません。代わりにcargoにこのクレートをソースから再コンパイルするよう命令する必要があります。これは、配列`unstable.build-std`を`.cargo/config.toml`ファイルに追加することで行えます。 ```toml # in .cargo/config.toml [unstable] build-std = ["core", "compiler_builtins", "alloc"] ``` これでコンパイラは`alloc`クレートを再コンパイルして私たちのカーネルにインクルードしてくれます。 `alloc`クレートが`#[no_std]`なクレートで標準では無効化されている理由は、これが追加の要件を持っているからです。今私たちのプロジェクトをコンパイルしようとすると、その要件をエラーとして目にすることになります: ``` error: no global memory allocator found but one is required; link to std or add #[global_allocator] to a static item that implements the GlobalAlloc trait. (エラー:グローバルメモリアロケータが見つかりませんが、一つ必要です。  stdをリンクするか、GlobalAllocトレイトを実装する静的な要素に#[global_allocator]を付けてください。) error: `#[alloc_error_handler]` function required, but not found (エラー:`#[alloc_error_handler]`関数が必要ですが、見つかりません) ``` 最初のエラーは、`alloc`クレートが、ヒープアロケータという`allocate`と`deallocate`関数を提供するオブジェクトを必要とするために発生します。Rustにおいては、ヒープアロケータ(の満たすべき性質)は[`GlobalAlloc`]トレイトによって記述されており、エラーメッセージでもそのことについて触れられています。クレートのヒープアロケータを設定するためには、`#[global_allocator]`属性を`GlobalAlloc`トレイトを実装する何らかの`static`変数に適用する必要があります。 二つ目のエラーは、(主にメモリが不足している場合)`allocate`の呼び出しが失敗しうるために発生します。私たちのプログラムはこのケースに対処できるようになっている必要があり、そのために使われる関数が`#[alloc_error_handler]`なのです。 [`GlobalAlloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html 次のセクションでこのトレイトと属性について説明します。 ### `GlobalAlloc`トレイト [`GlobalAlloc`]トレイトはヒープアロケータの提供しなければならない関数を定義します。このトレイトは、プログラマが絶対に直接使わないという点において特別です。代わりに、`alloc`のアロケーション・コレクション型を使うときに、コンパイラがトレイトメソッドへの適切な呼び出しを自動的に挿入します。 このトレイトを私たちのアロケータ型全てに実装しなければならないので、その宣言は詳しく見ておく価値があるでしょう: ```rust pub unsafe trait GlobalAlloc { unsafe fn alloc(&self, layout: Layout) -> *mut u8; unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout); unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... } unsafe fn realloc( &self, ptr: *mut u8, layout: Layout, new_size: usize ) -> *mut u8 { ... 
} } ``` このトレイトは[`alloc`]と[`dealloc`]という必須メソッドを定義しており、これは上の例で使った`allocate`と`deallocate`関数に相当します: - [`alloc`]メソッドは[`Layout`]インスタンス(割り当てられたメモリの持つべきサイズとアラインメントを記述する)を引数として取ります。メソッドは割り当てられたメモリブロックの最初のバイトへの[生ポインタ][raw pointer]を返します。割り当てエラーが起きたことを示す際は、明示的なエラー値を返す代わりにヌルポインタを返します。このやり方は(Rustの)慣習とはやや外れていますが、同じ慣習に従っている既存のシステムのアロケータをラップするのが簡単になるという利点があります。 - [`dealloc`]はその対で、メモリブロックを開放する役割を持ちます。このメソッドは、`alloc`によって返されたポインタと割り当ての際に使われた`Layout`という二つの引数を取ります。 [`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc [`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc [`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html このトレイトは[`alloc_zeroed`]と[`realloc`]という二つのデフォルト実装付きメソッドも定義しています。 - [`alloc_zeroed`]メソッドは`alloc`を呼んでから割り当てられたメモリブロックの値を0にするのに等しく、デフォルト実装でもまさに同じことをしています。もし、より効率的なカスタム実装があるならば、デフォルト実装を上書きすることもできます。 - [`realloc`]メソッドは割り当てたメモリを拡大したり縮小したりすることができます。デフォルト実装では、要求されたサイズの新しいメモリブロックを割り当て、以前のアロケーションから中身を全てコピーします。同じく、アロケータの実装によってはこのメソッドをより効率的に実装することができるかもしれません。例えば、可能な場合はその場でアロケーションを拡大・縮小するなど。 [`alloc_zeroed`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.alloc_zeroed [`realloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.realloc #### Unsafe トレイト自体とすべてのトレイトメソッドが`unsafe`として宣言されていることに気をつけましょう: - トレイトを`unsafe`として宣言する理由は、プログラマがアロケータ型のトレイト実装が正しいことを保証しなければならないからです。例えば、`alloc`メソッドは他のどこかですでに使用されているメモリブロックを決して返してはならず、もしそうすると未定義動作が発生してしまいます。 - 同様に、メソッドが`unsafe`である理由は、メソッドを呼び出す際に呼び出し元がいくつかの不変条件を保証しなければならないからです。例えば、`alloc`に渡される`Layout`の指定するサイズが非ゼロであることなどです。実際にはこれは大して重要ではなく、というのもこれらのメソッドはコンパイラによって直接呼び出されるため、これらの要件が満たされていることは保証されているからです。 ### `DummyAllocator` アロケータ型が何を提供しないといけないかを理解したので、シンプルなダミー (ハリボテ) のアロケータを作ることができます。そのためまず新しく`allocator`モジュールを作りましょう: ```rust // in src/lib.rs pub mod allocator; ``` 私たちのダミーアロケータでは、トレイトを実装するための最小限のことしかせず、`alloc`が呼び出されたら常にエラーを返すようにします。以下のようになります: ```rust // in src/allocator.rs use alloc::alloc::{GlobalAlloc, Layout}; use 
core::ptr::null_mut; pub struct Dummy; unsafe impl GlobalAlloc for Dummy { unsafe fn alloc(&self, _layout: Layout) -> *mut u8 { null_mut() } unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) { panic!("dealloc should be never called") } } ``` この構造体はフィールドを必要としないので、[サイズがゼロの型][zero sized type]として作成します。上で述べたように、`alloc`は常に割り当てエラーに相当するヌルポインタを返すようにします。アロケータがメモリを返すことは絶対に起きないのだから、`dealloc`の呼び出しも絶対に起きないはずです。このため`dealloc`メソッドでは単にpanicすることにします。`alloc_zeroed`と`realloc`メソッドにはデフォルト実装があるので、これらを実装する必要はありません。 [zero sized type]: https://doc.rust-jp.rs/rust-nomicon-ja/exotic-sizes.html#%E3%82%B5%E3%82%A4%E3%82%BA%E3%81%8C-0-%E3%81%AE%E5%9E%8Bzst-zero-sized-type こうして単純なアロケータを手に入れたわけですが、さらにRustコンパイラにこのアロケータを使うよう指示しないといけません。ここで`#[global_allocator]`属性の出番です。 ### `#[global_allocator]`属性 `#[global_allocator]`属性は、どのアロケータインスタンスをグローバルヒープアロケータとして使うべきかをRustコンパイラに指示します。この属性は`GlobalAlloc`トレイトを実装する`static`にのみ適用できます。私たちの`Dummy`アロケータのインスタンスをグローバルアロケータとして登録してみましょう: ```rust // in src/allocator.rs #[global_allocator] static ALLOCATOR: Dummy = Dummy; ``` `Dummy`アロケータは[サイズがゼロの型][zero sized type]なので、初期化式でフィールドを指定する必要はありません。 これをコンパイルしようとすると、最初のエラーは消えているはずです。残っている二つ目のエラーを修正しましょう: ``` error: `#[alloc_error_handler]` function required, but not found ``` ### `#[alloc_error_handler]`属性 `GlobalAlloc`トレイトについて議論したときに学んだように、`alloc`関数はヌルポインタを返すことによって割り当てエラーを示します。ここで生じる疑問は、そのように割り当てが失敗したときRustランタイムはどう対処するべきなのかということです。ここで`#[alloc_error_handler]`属性の出番です。この属性は、パニックが起こったときにパニックハンドラが呼ばれるのと同じように、割り当てエラーが起こったときに呼ばれる関数を指定するのです。 コンパイルエラーを修正するためにそのような関数を追加してみましょう: ```rust // in src/lib.rs #![feature(alloc_error_handler)] // ファイルの先頭に書く #[alloc_error_handler] fn alloc_error_handler(layout: alloc::alloc::Layout) -> ! 
{ panic!("allocation error: {:?}", layout) } ``` `alloc_error_handler`関数はまだunstableなので、feature gateによってこれを有効化する必要があります。この関数は引数を一つ取ります:割り当てエラーが起こったとき`alloc`関数に渡されていた`Layout`のインスタンスです。割り当ての失敗を解決するためにできることはないので、`Layout`インスタンスを含めたメッセージを表示してただpanicすることにしましょう。 この関数を追加したことで、コンパイルエラーは修正されたはずです。これで`alloc`のアロケーション・コレクション型を使えるようになりました。例えば、[`Box`]を使ってヒープに値を割り当てることができます: [`Box`]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html ```rust // in src/main.rs extern crate alloc; use alloc::boxed::Box; fn kernel_main(boot_info: &'static BootInfo) -> ! { // […] "Hello World!"を表示, `init`の呼び出し, `mapper`と`frame_allocator`を作成 let x = Box::new(41); // […] テストモードでは`test_main`を呼ぶ println!("It did not crash!"); blog_os::hlt_loop(); } ``` `main.rs`においても`extern crate alloc`文を指定する必要があることに注意してください。`lib.rs`と`main.rs`は別のクレートとして取り扱われているためです。しかしながら、グローバルアロケータはプロジェクト内のすべてのクレートに適用されるため、`#[global_allocator]`静的変数をもう一つ作る必要はありません。実際、別のクレートで新しいアロケータを指定するとエラーになります。 上のコードを実行すると、`alloc_error_handler`関数が呼ばれるのが分かります: ![QEMUが"panicked at `allocation error: Layout { size_: 4, align_: 4 }, src/lib.rs:89:5"と出力している。](qemu-dummy-output.png) `Box::new`関数は暗黙のうちにグローバルアロケータの`alloc`関数を呼び出すため、エラーハンドラが呼ばれました。私たちのダミーアロケータは常にヌルポインタを返すので、あらゆる割り当てが失敗するのです。これを修正するためには、使用可能なメモリを実際に返すアロケータを作る必要があります。 ## Creating a Kernel Heap 適切なアロケータを作りたいですが、その前にまず、そのアロケータがメモリを割り当てるためのヒープメモリ領域を作らないといけません。このために、ヒープ領域のための仮想メモリ範囲を定義し、その領域を物理フレームに対応付ける必要があります。仮想メモリとページテーブルの概要については、[ページング入門][_"Introduction To Paging"_]の記事を読んでください。 [_"Introduction To Paging"_]: @/edition-2/posts/08-paging-introduction/index.ja.md 最初のステップはヒープのための仮想メモリ領域を定義することです。他のメモリ領域に使われていない限り、どんな仮想アドレス範囲でも構いません。ここでは、あとからそこがヒープポインタだと簡単に分かるよう、`0x_4444_4444_0000`から始まるメモリとしましょう。 ```rust // in src/allocator.rs pub const HEAP_START: usize = 0x_4444_4444_0000; pub const HEAP_SIZE: usize = 100 * 1024; // 100 KiB ``` 今のところヒープの大きさは100 KiBとします。将来より多くの領域が必要になったら大きくすれば良いです。 今このヒープ領域を使おうとすると、仮想メモリ領域が物理メモリにまだ対応付けられていないためページフォルトが発生します。これを解決するために、[ページング入門][_"Paging 
Implementation"_]の記事で導入した[`Mapper` API]を使ってヒープページを対応付ける関数`init_heap`を作ります: [`Mapper` API]: @/edition-2/posts/09-paging-implementation/index.ja.md#offsetpagetablewoshi-u [_"Paging Implementation"_]: @/edition-2/posts/09-paging-implementation/index.ja.md ```rust // in src/allocator.rs use x86_64::{ structures::paging::{ mapper::MapToError, FrameAllocator, Mapper, Page, PageTableFlags, Size4KiB, }, VirtAddr, }; pub fn init_heap( mapper: &mut impl Mapper<Size4KiB>, frame_allocator: &mut impl FrameAllocator<Size4KiB>, ) -> Result<(), MapToError<Size4KiB>> { let page_range = { let heap_start = VirtAddr::new(HEAP_START as u64); let heap_end = heap_start + HEAP_SIZE - 1u64; let heap_start_page = Page::containing_address(heap_start); let heap_end_page = Page::containing_address(heap_end); Page::range_inclusive(heap_start_page, heap_end_page) }; for page in page_range { let frame = frame_allocator .allocate_frame() .ok_or(MapToError::FrameAllocationFailed)?; let flags = PageTableFlags::PRESENT | PageTableFlags::WRITABLE; unsafe { mapper.map_to(page, frame, flags, frame_allocator)?.flush() }; } Ok(()) } ``` この関数は[`Mapper`]と[`FrameAllocator`]への可変参照を取ります。これらはどちらも[`Size4KiB`]をジェネリックパラメータとすることで4KiBページのみに制限しています。この関数の戻り値は[`Result`]で、成功ヴァリアントが`()`、失敗ヴァリアントが([`Mapper::map_to`]メソッドによって失敗時に返されるエラー型である)[`MapToError`]です。この関数における主なエラーの原因は`map_to`メソッドであるため、このエラー型を流用するのは理にかなっています。 [`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html [`FrameAllocator`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html [`Size4KiB`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/enum.Size4KiB.html [`Result`]: https://doc.rust-lang.org/core/result/enum.Result.html [`MapToError`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/enum.MapToError.html [`Mapper::map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to 実装内容は以下の二つに分けられます: - **ページ範囲の作成:** 
対応付けたいページ領域を作成するために、ポインタ`HEAP_START`を[`VirtAddr`]型に変換します。つぎに`HEAP_SIZE`を足すことによってヒープの終端アドレスを計算します。端が含まれる境界 (インクルーシブレンジ) にしたい(ヒープの最後のバイトのアドレスとしたい)ので1を引きます。次に、これらのアドレスを[`containing_address`]関数を使って[`Page`]型に変換します。最後に、[`Page::range_inclusive`]関数を使って最初と最後のページからページ範囲を作成します。 - **ページの対応付け (マッピング) :** 二つ目のステップは、今作ったページ範囲のすべてのページに対して対応付けを行うことです。これを行うため、`for`ループを使ってこのページ範囲に対して繰り返し処理を行います。それぞれのページに対して以下を行います: - [`FrameAllocator::allocate_frame`]メソッドを使って、ページのマップされるべき物理フレームを割り当てます。このメソッドはもうフレームが残っていないとき[`None`]を返します。このケースに対処するため、[`Option::ok_or`]メソッドを使ってこれを[`MapToError::FrameAllocationFailed`]に変換し、エラーの場合は[`?`演算子][question mark operator]を使って早期リターンしています。 - このページに対し、必要となる`PRESENT`フラグと`WRITABLE`フラグをセットします。これらのフラグにより読み書きのアクセスが許可されますが、これはヒープメモリとして理にかなっています。 - [`Mapper::map_to`]メソッドを使ってアクティブなページテーブルに対応付けを作成します。このメソッドは失敗しうるので、同様に[`?`演算子][question mark operator]を使ってエラーを呼び出し元に受け渡します。成功時には、このメソッドは[`MapperFlush`]インスタンスを返しますが、これを使って[`flush`]メソッドを呼ぶことで[**トランスレーション・ルックアサイド・バッファ**][_translation lookaside buffer_]を更新することができます。 [`VirtAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.VirtAddr.html [`Page`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html [`containing_address`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html#method.containing_address [`Page::range_inclusive`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html#method.range_inclusive [`FrameAllocator::allocate_frame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html#tymethod.allocate_frame [`None`]: https://doc.rust-lang.org/core/option/enum.Option.html#variant.None [`MapToError::FrameAllocationFailed`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/enum.MapToError.html#variant.FrameAllocationFailed [`Option::ok_or`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.ok_or [question mark operator]: 
https://doc.rust-jp.rs/book-ja/ch09-02-recoverable-errors-with-result.html#%E3%82%A8%E3%83%A9%E3%83%BC%E5%A7%94%E8%AD%B2%E3%81%AE%E3%82%B7%E3%83%A7%E3%83%BC%E3%83%88%E3%82%AB%E3%83%83%E3%83%88-%E6%BC%94%E7%AE%97%E5%AD%90 [`MapperFlush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html [_translation lookaside buffer_]: @/edition-2/posts/08-paging-introduction/index.ja.md#toransuresiyonrutukuasaidobatuhua [`flush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush 最後のステップは、この関数を`kernel_main`から呼び出すことです: ```rust // in src/main.rs fn kernel_main(boot_info: &'static BootInfo) -> ! { use blog_os::allocator; // 新しいインポート use blog_os::memory::{self, BootInfoFrameAllocator}; println!("Hello World{}", "!"); blog_os::init(); let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); let mut mapper = unsafe { memory::init(phys_mem_offset) }; let mut frame_allocator = unsafe { BootInfoFrameAllocator::init(&boot_info.memory_map) }; // ここを追加 allocator::init_heap(&mut mapper, &mut frame_allocator) .expect("heap initialization failed"); let x = Box::new(41); // […] テストモードでは`test_main`を呼ぶ println!("It did not crash!"); blog_os::hlt_loop(); } ``` ここで、文脈が分かるよう関数の全体を示しています。(しかし)新しい行は`blog_os::allocator`のインポートと`allocator::init_heap`の呼び出しだけです。`init_heap`関数がエラーを返した場合、これを処理する良い方法は今のところないため、[`Result::expect`]メソッドを使ってパニックします。 [`Result::expect`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.expect これで、使用する準備のできた、対応付けられたヒープメモリ領域を手に入れました。`Box::new`の呼び出しはまだ私たちの古い`Dummy`アロケータを使っているので、実行しても依然として「メモリ不足」のエラーを見ることになるでしょう。適切なアロケータを使うようにして、このエラーを修正してみましょう。 ## アロケータクレートを使う アロケータを実装するのは少々複雑なので、まずは既製のアロケータを使うことにしましょう。アロケータを自作する方法については次の記事で学びます。 `no_std`のアプリケーションのためのシンプルなアロケータのひとつに[`linked_list_allocator`]クレートがあります。この名前は、割り当てられていないメモリ領域を連結リストを使って管理しているところから来ています。この手法のより詳しい説明については次の記事を読んでください。 このクレートを使うためには、まず依存関係を`Cargo.toml`に追加する必要があります: [`linked_list_allocator`]: 
https://github.com/phil-opp/linked-list-allocator/ ```toml # in Cargo.toml [dependencies] linked_list_allocator = "0.9.0" ``` 次に私たちのダミーアロケータをこのクレートによって提供されるアロケータで置き換えます: ```rust // in src/allocator.rs use linked_list_allocator::LockedHeap; #[global_allocator] static ALLOCATOR: LockedHeap = LockedHeap::empty(); ``` この構造体は同期のために[`spinning_top::Spinlock`]型を使うため`LockedHeap`という名前が付いています。これが必要なのは、`ALLOCATOR`静的変数に複数のスレッドが同時にアクセスすることがありえるからです。スピンロックやmutexを使うときはいつもそうであるように、誤ってデッドロックを起こさないように注意する必要があります。これが意味するのは、我々は割り込みハンドラ内で一切アロケーションを行ってはいけないということです。なぜなら、割り込みハンドラはどんなタイミングでも走る可能性があるため、進行中のアロケーションに割り込んでいることがあるからです。 [`spinning_top::Spinlock`]: https://docs.rs/spinning_top/0.1.0/spinning_top/type.Spinlock.html `LockedHeap`をグローバルアロケータとして設定するだけでは十分ではありません。いま[`empty`]コンストラクタ関数を使っていますが、この関数はメモリを与えることなくアロケータを作るからです。私たちのダミーアロケータと同じく、これ(今の状態の`LockedHeap`)は`alloc`を行うと常にエラーを返します。この問題を修正するため、ヒープを作った後でアロケータを初期化する必要があります: [`empty`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.LockedHeap.html#method.empty ```rust // in src/allocator.rs pub fn init_heap( mapper: &mut impl Mapper<Size4KiB>, frame_allocator: &mut impl FrameAllocator<Size4KiB>, ) -> Result<(), MapToError<Size4KiB>> { // […] すべてのヒープページを物理フレームにマップする // ここを追加 unsafe { ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE); } Ok(()) } ``` `LockedHeap`型の内部のスピンロックの[`lock`]メソッドを呼ぶことで、ラップされた[`Heap`]インスタンスへの排他参照を得て、これの[`init`]メソッドをヒープの境界を引数として呼んでいます。`init`関数自体がヒープメモリに書き込もうとするので、ヒープページを対応付けた **後に** ヒープを初期化することが重要です。 [`lock`]: https://docs.rs/lock_api/0.3.3/lock_api/struct.Mutex.html#method.lock [`Heap`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html [`init`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init ヒープを初期化できたら、組み込みの[`alloc`]クレートのあらゆるアロケーション・コレクション型がエラーなく使用できます: ```rust // in src/main.rs use alloc::{boxed::Box, vec, vec::Vec, rc::Rc}; fn kernel_main(boot_info: &'static BootInfo) -> ! 
{ // […] initialize interrupts, mapper, frame_allocator, heap // ヒープに数字をアロケートする let heap_value = Box::new(41); println!("heap_value at {:p}", heap_value); // 動的サイズのベクタを作成する let mut vec = Vec::new(); for i in 0..500 { vec.push(i); } println!("vec at {:p}", vec.as_slice()); // 参照カウントされたベクタを作成する -> カウントが0になると解放される let reference_counted = Rc::new(vec![1, 2, 3]); let cloned_reference = reference_counted.clone(); println!("current reference count is {}", Rc::strong_count(&cloned_reference)); core::mem::drop(reference_counted); println!("reference count is {} now", Rc::strong_count(&cloned_reference)); // […] テストでは `test_main` を呼ぶ println!("It did not crash!"); blog_os::hlt_loop(); } ``` このコード例では[`Box`], [`Vec`], [`Rc`]型を使ってみました。`Box`型と`Vec`型については対応するヒープポインタを[`{:p}`フォーマット指定子][`{:p}` formatting specifier]を使って出力しています。`Rc`についての例を示すために、参照カウントされたヒープ値を作成し、インスタンスを([`core::mem::drop`]を使って)ドロップする前と後に[`Rc::strong_count`]関数を使って現在の参照カウントを出力しています。 [`Vec`]: https://doc.rust-lang.org/alloc/vec/ [`Rc`]: https://doc.rust-lang.org/alloc/rc/ [`{:p}` formatting specifier]: https://doc.rust-lang.org/core/fmt/trait.Pointer.html [`Rc::strong_count`]: https://doc.rust-lang.org/alloc/rc/struct.Rc.html#method.strong_count [`core::mem::drop`]: https://doc.rust-lang.org/core/mem/fn.drop.html 実行すると、以下のような結果を得ます: ![QEMUが` heap_value at 0x444444440000 vec at 0x4444444408000 current reference count is 2 reference count is 1 now `と出力している](qemu-alloc-showcase.png) ポインタが`0x_4444_4444_*`で始まることから、`Box`と`Vec`の値は想定通りヒープ上にあることが分かります。参照カウントされた値も期待したとおり振る舞っており、`clone`呼び出しの後では参照カウントは2になり、インスタンスの一方がドロップされた後では再び1になっています。 ベクタがヒープメモリの先頭から`0x800`だけずれた場所から始まるのは、Box内の値が`0x800`バイトの大きさがあるためではなく、ベクタが容量を増やさなければならないときに発生する[再割り当て (リアロケーション) ][reallocations]のためです。例えば、ベクタの容量が32の際に次の要素を追加しようとすると、ベクタは内部で容量64の配列を新たに割り当て、すべての要素をコピーします。その後古い割り当てを解放しています。 [reallocations]: https://doc.rust-lang.org/alloc/vec/struct.Vec.html#capacity-and-reallocation もちろん`alloc`クレートにはもっと多くのアロケーション・コレクション型があり、今やそれらのすべてを私たちのカーネルで使うことができます。それには以下が含まれます: - 
スレッドセーフな参照カウントポインタ[`Arc`] - 文字列を所有する型[`String`]と[`format!`]マクロ - [`LinkedList`] - 必要に応じてサイズを大きくできるリングバッファ[`VecDeque`] - プライオリティキューである[`BinaryHeap`] - [`BTreeMap`]と[`BTreeSet`] [`Arc`]: https://doc.rust-lang.org/alloc/sync/struct.Arc.html [`String`]: https://doc.rust-lang.org/alloc/string/struct.String.html [`format!`]: https://doc.rust-lang.org/alloc/macro.format.html [`LinkedList`]: https://doc.rust-lang.org/alloc/collections/linked_list/struct.LinkedList.html [`VecDeque`]: https://doc.rust-lang.org/alloc/collections/vec_deque/struct.VecDeque.html [`BinaryHeap`]: https://doc.rust-lang.org/alloc/collections/binary_heap/struct.BinaryHeap.html [`BTreeMap`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html [`BTreeSet`]: https://doc.rust-lang.org/alloc/collections/btree_set/struct.BTreeSet.html これらの型は、スレッドリスト、スケジュールキュー、async/awaitのサポートを実装しようとするときにとても有用になります。 ## テストを追加する いま新しく作ったアロケーションコードを間違って壊してしまうことがないことを保証するために、結合 (インテグレーション) テストを追加するべきでしょう。まず、次のような内容のファイル`tests/heap_allocation.rs`を作成します。 ```rust // in tests/heap_allocation.rs #![no_std] #![no_main] #![feature(custom_test_frameworks)] #![test_runner(blog_os::test_runner)] #![reexport_test_harness_main = "test_main"] extern crate alloc; use bootloader::{entry_point, BootInfo}; use core::panic::PanicInfo; entry_point!(main); fn main(boot_info: &'static BootInfo) -> ! { unimplemented!(); } #[panic_handler] fn panic(info: &PanicInfo) -> ! { blog_os::test_panic_handler(info) } ``` `lib.rs`の`test_runner`関数と`test_panic_handler`関数を再利用します。私たちはアロケーションをテストしたいので、`extern crate alloc`宣言を使って`alloc`クレートを有効化します。テストに共通する定型部については[テスト][_Testing_]の記事を読んでください。 [_Testing_]: @/edition-2/posts/04-testing/index.ja.md `main`関数の実装は以下のようになります: ```rust // in tests/heap_allocation.rs fn main(boot_info: &'static BootInfo) -> ! 
{ use blog_os::allocator; use blog_os::memory::{self, BootInfoFrameAllocator}; use x86_64::VirtAddr; blog_os::init(); let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); let mut mapper = unsafe { memory::init(phys_mem_offset) }; let mut frame_allocator = unsafe { BootInfoFrameAllocator::init(&boot_info.memory_map) }; allocator::init_heap(&mut mapper, &mut frame_allocator) .expect("heap initialization failed"); test_main(); loop {} } ``` 私たちの`main.rs`内の`kernel_main`関数によく似ていますが、`println`を呼び出さず、例示のため行ったアロケーションも行わず、また`test_main`を無条件で呼び出しているという違いがあります。 これでテストケースを追加する準備ができました。まず、[`Box`]を使って単純な割り当て (アロケーション) を行い、割り当てられた値を確かめることで基本的なアロケーションがうまくいっていることを確かめるテストを追加しましょう: ```rust // in tests/heap_allocation.rs use alloc::boxed::Box; #[test_case] fn simple_allocation() { let heap_value_1 = Box::new(41); let heap_value_2 = Box::new(13); assert_eq!(*heap_value_1, 41); assert_eq!(*heap_value_2, 13); } ``` 最も重要なのは、このテストはアロケーションエラーが起きないことを検証してくれるということです。 次に、反復によって少しずつ大きなベクタを作ることで、大きな割り当てと(再割り当てによる)複数回の割り当ての両方をテストしましょう: ```rust // in tests/heap_allocation.rs use alloc::vec::Vec; #[test_case] fn large_vec() { let n = 1000; let mut vec = Vec::new(); for i in 0..n { vec.push(i); } assert_eq!(vec.iter().sum::<u64>(), (n - 1) * n / 2); } ``` このベクタの和を[n次部分和][n-th partial sum]の公式と比較することで検証しています。これにより、割り当てられた値はすべて正しいことをある程度保証できます。 [n-th partial sum]: https://ja.wikipedia.org/wiki/1%2B2%2B3%2B4%2B%E2%80%A6 3つ目のテストとして、`HEAP_SIZE`回にわたって次々にアロケーションを行います: ```rust // in tests/heap_allocation.rs use blog_os::allocator::HEAP_SIZE; #[test_case] fn many_boxes() { for i in 0..HEAP_SIZE { let x = Box::new(i); assert_eq!(*x, i); } } ``` このテストではアロケータが解放されたメモリを次の割り当てで再利用していることを保証してくれます。もしそうなっていなければメモリ不足が起きるでしょう。こんなことアロケータにとって当たり前の要件だと思われるかもしれませんが、これを行わないようなアロケータの設計も存在するのです。その例として、次の記事で説明するbump allocatorがあります。 では、私たちの新しい結合テストを実行してみましょう: ``` > cargo test --test heap_allocation […] Running 3 tests simple_allocation... [ok] large_vec... [ok] many_boxes... 
[ok] ``` すべてのテストが成功しました!`cargo test`コマンドを(`--test`引数なしに)呼ぶことで、すべての結合テストを実行することもできます。 ## まとめ この記事では動的メモリに入門し、なぜ、そしていつそれが必要になるのかを説明しました。Rustの借用チェッカがどのようにしてよくある脆弱性を防ぐのか、そしてRustのアロケーションAPIがどのような仕組みなのかを理解しました。 ダミーアロケータでRustのアロケータインターフェースの最小限の実装を作成した後、私たちのカーネル用の適切なヒープメモリ領域を作成しました。これを行うために、ヒープ用の仮想アドレス範囲を定義し、前の記事で説明した`Mapper`と`FrameAllocator`を使ってその範囲のすべてのページを物理フレームに対応付けました。 最後に、`linked_list_allocator`クレートへの依存関係を追加し、適切なアロケータを私たちのカーネルに追加しました。このアロケータのおかげで、`alloc`クレートに含まれる`Box`、`Vec`、その他のアロケーション・コレクション型を使えるようになりました。 ## 次は? この記事ではヒープ割り当て機能のサポートを追加しましたが、ほとんどの仕事は`linked_list_allocator`クレートに任せてしまっています。次の記事では、アロケータをゼロから実装する方法を詳細にお伝えします。可能なアロケータの設計を複数提示し、それらを単純化したものを実装する方法を示し、それらの利点と欠点を説明します。 ================================================ FILE: blog/content/edition-2/posts/10-heap-allocation/index.md ================================================ +++ title = "Heap Allocation" weight = 10 path = "heap-allocation" date = 2019-06-26 [extra] chapter = "Memory Management" +++ This post adds support for heap allocation to our kernel. First, it gives an introduction to dynamic memory and shows how the borrow checker prevents common allocation errors. It then implements the basic allocation interface of Rust, creates a heap memory region, and sets up an allocator crate. At the end of this post, all the allocation and collection types of the built-in `alloc` crate will be available to our kernel. This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-10`][post branch] branch. [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-10 ## Local and Static Variables We currently use two types of variables in our kernel: local variables and `static` variables. 
Local variables are stored on the [call stack] and are only valid until the surrounding function returns. Static variables are stored at a fixed memory location and always live for the complete lifetime of the program. ### Local Variables Local variables are stored on the [call stack], which is a [stack data structure] that supports `push` and `pop` operations. On each function entry, the parameters, the return address, and the local variables of the called function are pushed by the compiler: [call stack]: https://en.wikipedia.org/wiki/Call_stack [stack data structure]: https://en.wikipedia.org/wiki/Stack_(abstract_data_type) ![An `outer()` and an `inner(i: usize)` function, where `outer` calls `inner(1)`. Both have some local variables. The call stack contains the following slots: the local variables of outer, then the argument `i = 1`, then the return address, then the local variables of inner.](call-stack.svg) The above example shows the call stack after the `outer` function called the `inner` function. We see that the call stack contains the local variables of `outer` first. On the `inner` call, the parameter `1` and the return address for the function were pushed. Then control was transferred to `inner`, which pushed its local variables. After the `inner` function returns, its part of the call stack is popped again and only the local variables of `outer` remain: ![The call stack containing only the local variables of `outer`](call-stack-return.svg) We see that the local variables of `inner` only live until the function returns. 
The Rust compiler enforces these lifetimes and throws an error when we use a value for too long, for example when we try to return a reference to a local variable: ```rust fn inner(i: usize) -> &'static u32 { let z = [1, 2, 3]; &z[i] } ``` ([run the example on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=6186a0f3a54f468e1de8894996d12819)) While returning a reference makes no sense in this example, there are cases where we want a variable to live longer than the function. We already saw such a case in our kernel when we tried to [load an interrupt descriptor table] and had to use a `static` variable to extend the lifetime. [load an interrupt descriptor table]: @/edition-2/posts/05-cpu-exceptions/index.md#loading-the-idt ### Static Variables Static variables are stored at a fixed memory location separate from the stack. This memory location is assigned at compile time by the linker and encoded in the executable. Statics live for the complete runtime of the program, so they have the `'static` lifetime and can always be referenced from local variables: ![The same outer/inner example, except that inner has a `static Z: [u32; 3] = [1,2,3];` and returns a `&Z[i]` reference](call-stack-static.svg) When the `inner` function returns in the above example, its part of the call stack is destroyed. The static variables live in a separate memory range that is never destroyed, so the `&Z[1]` reference is still valid after the return. Apart from the `'static` lifetime, static variables also have the useful property that their location is known at compile time, so that no reference is needed for accessing them. We utilized that property for our `println` macro: By using a [static `Writer`] internally, there is no `&mut Writer` reference needed to invoke the macro, which is very useful in [exception handlers], where we don't have access to any additional variables. 
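
As a small illustration of the `'static` lifetime discussed in this section, the static variant of the `inner`/`outer` example from the figure above might look roughly like the following. This is only a sketch: the figure does not show what `outer` does with the returned reference, so having it return the referenced value is our own assumption.

```rust
/// Returns a reference to the `i`-th element of a static array.
/// Because `Z` is a `static`, the reference has the `'static` lifetime
/// and stays valid after `inner` has returned.
fn inner(i: usize) -> &'static u32 {
    // `Z` is stored at a fixed memory location, not on the call stack.
    static Z: [u32; 3] = [1, 2, 3];
    &Z[i]
}

/// Calls `inner` and uses the reference after `inner`'s stack frame is gone.
/// (Returning the value is an assumption made for this illustration.)
fn outer() -> u32 {
    let y: &'static u32 = inner(1);
    *y
}
```

In contrast to the version with a local array `z`, this compiles, because the borrow checker can see that the referenced memory is never destroyed.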
[static `Writer`]: @/edition-2/posts/03-vga-text-buffer/index.md#a-global-interface [exception handlers]: @/edition-2/posts/05-cpu-exceptions/index.md#implementation However, this property of static variables brings a crucial drawback: they are read-only by default. Rust enforces this because a [data race] would occur if, e.g., two threads modified a static variable at the same time. The only way to modify a static variable is to encapsulate it in a [`Mutex`] type, which ensures that only a single `&mut` reference exists at any point in time. We already used a `Mutex` for our [static VGA buffer `Writer`][vga mutex]. [data race]: https://doc.rust-lang.org/nomicon/races.html [`Mutex`]: https://docs.rs/spin/0.5.2/spin/struct.Mutex.html [vga mutex]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks ## Dynamic Memory Local and static variables are already very powerful together and enable most use cases. However, we saw that they both have their limitations: - Local variables only live until the end of the surrounding function or block. This is because they live on the call stack and are destroyed after the surrounding function returns. - Static variables always live for the complete runtime of the program, so there is no way to reclaim and reuse their memory when they're no longer needed. Also, they have unclear ownership semantics and are accessible from all functions, so they need to be protected by a [`Mutex`] when we want to modify them. Another limitation of local and static variables is that they have a fixed size. So they can't store a collection that dynamically grows when more elements are added. (There are proposals for [unsized rvalues] in Rust that would allow dynamically sized local variables, but they only work in some specific cases.) [unsized rvalues]: https://github.com/rust-lang/rust/issues/48055 To circumvent these drawbacks, programming languages often support a third memory region for storing variables called the **heap**. 
The heap supports _dynamic memory allocation_ at runtime through two functions called `allocate` and `deallocate`. It works in the following way: The `allocate` function returns a free chunk of memory of the specified size that can be used to store a variable. This variable then lives until it is freed by calling the `deallocate` function with a reference to the variable. Let's go through an example: ![The inner function calls `allocate(size_of([u32; 3]))`, writes `z.write([1,2,3]);`, and returns `(z as *mut u32).offset(i)`. On the returned value `y`, the outer function performs `deallocate(y, size_of(u32))`.](call-stack-heap.svg) Here the `inner` function uses heap memory instead of static variables for storing `z`. It first allocates a memory block of the required size, which returns a `*mut u32` [raw pointer]. It then uses the [`ptr::write`] method to write the array `[1,2,3]` to it. In the last step, it uses the [`offset`] function to calculate a pointer to the `i`-th element and then returns it. (Note that we omitted some required casts and unsafe blocks in this example function for brevity.) [raw pointer]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer [`ptr::write`]: https://doc.rust-lang.org/core/ptr/fn.write.html [`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset The allocated memory lives until it is explicitly freed through a call to `deallocate`. Thus, the returned pointer is still valid even after `inner` returned and its part of the call stack was destroyed. The advantage of using heap memory compared to static memory is that the memory can be reused after it is freed, which we do through the `deallocate` call in `outer`. 
After that call, the situation looks like this: ![The call stack contains the local variables of `outer`, the heap contains `z[0]` and `z[2]`, but no longer `z[1]`.](call-stack-heap-freed.svg) We see that the `z[1]` slot is free again and can be reused for the next `allocate` call. However, we also see that `z[0]` and `z[2]` are never freed because we never deallocate them. Such a bug is called a _memory leak_ and is often the cause of excessive memory consumption of programs (just imagine what happens when we call `inner` repeatedly in a loop). This might seem bad, but there are much more dangerous types of bugs that can happen with dynamic allocation. ### Common Errors Apart from memory leaks, which are unfortunate but don't make the program vulnerable to attackers, there are two common types of bugs with more severe consequences: - When we accidentally continue to use a variable after calling `deallocate` on it, we have a so-called **use-after-free** vulnerability. Such a bug causes undefined behavior and can often be exploited by attackers to execute arbitrary code. - When we accidentally free a variable twice, we have a **double-free** vulnerability. This is problematic because it might free a different allocation that was allocated in the same spot after the first `deallocate` call. Thus, it can lead to a use-after-free vulnerability again. These types of vulnerabilities are commonly known, so one might expect that people have learned how to avoid them by now. But no, such vulnerabilities are still regularly found, for example this [use-after-free vulnerability in Linux][linux vulnerability] (2019), that allowed arbitrary code execution. A web search like `use-after-free linux {current year}` will probably always yield results. This shows that even the best programmers are not always able to correctly handle dynamic memory in complex projects. 
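
To make these bug classes more concrete, the following sketch mirrors the `inner` example using the raw allocation functions of the standard library. It assumes a hosted target where `std` is available (unlike our kernel), and the function name `read_from_heap_array` is made up for illustration. In contrast to the figure, it frees the whole array with the same `Layout` that was used for the allocation, since that is what `dealloc` requires. The comments mark the places where a use-after-free or a double-free would occur:

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Allocates `[1, 2, 3]` on the heap, reads element `i`, and frees the block.
/// (Hypothetical helper, only for illustration.)
fn read_from_heap_array(i: usize) -> u32 {
    assert!(i < 3, "index out of bounds");
    // The layout describes the size and alignment of the array.
    let layout = Layout::new::<[u32; 3]>();
    unsafe {
        // `alloc` returns a null pointer on failure instead of an error value.
        let z = alloc(layout) as *mut [u32; 3];
        if z.is_null() {
            handle_alloc_error(layout);
        }
        z.write([1, 2, 3]);

        // Copy the i-th element out of the heap block while it is still valid.
        let value = (z as *mut u32).add(i).read();

        // Free the block with the same layout that was used to allocate it.
        dealloc(z as *mut u8, layout);

        // Reading through `z` from here on would be a use-after-free.
        // Calling `dealloc(z as *mut u8, layout)` again would be a double-free.
        // Forgetting the `dealloc` call above would be a memory leak.
        value
    }
}

fn main() {
    println!("{}", read_from_heap_array(1));
}
```

None of the three mistakes named in the comments is caught by the compiler here, because raw pointers are not tracked by the borrow checker. This is the gap that the abstractions in the following sections aim to close.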
[linux vulnerability]: https://securityboulevard.com/2019/02/linux-use-after-free-vulnerability-found-in-linux-2-6-through-4-20-11/ To avoid these issues, many languages, such as Java or Python, manage dynamic memory automatically using a technique called [_garbage collection_]. The idea is that the programmer never invokes `deallocate` manually. Instead, the program is regularly paused and scanned for unused heap variables, which are then automatically deallocated. Thus, the above vulnerabilities can never occur. The drawbacks are the performance overhead of the regular scan and the probably long pause times. [_garbage collection_]: https://en.wikipedia.org/wiki/Garbage_collection_(computer_science) Rust takes a different approach to the problem: It uses a concept called [_ownership_] that is able to check the correctness of dynamic memory operations at compile time. Thus, no garbage collection is needed to avoid the mentioned vulnerabilities, which means that there is no performance overhead. Another advantage of this approach is that the programmer still has fine-grained control over the use of dynamic memory, just like with C or C++. [_ownership_]: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html ### Allocations in Rust Instead of letting the programmer manually call `allocate` and `deallocate`, the Rust standard library provides abstraction types that call these functions implicitly. The most important type is [**`Box`**], which is an abstraction for a heap-allocated value. It provides a [`Box::new`] constructor function that takes a value, calls `allocate` with the size of the value, and then moves the value to the newly allocated slot on the heap. 
To free the heap memory again, the `Box` type implements the [`Drop` trait] to call `deallocate` when it goes out of scope: [**`Box`**]: https://doc.rust-lang.org/std/boxed/index.html [`Box::new`]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html#method.new [`Drop` trait]: https://doc.rust-lang.org/book/ch15-03-drop.html ```rust { let z = Box::new([1,2,3]); […] } // z goes out of scope and `deallocate` is called ``` This pattern has the strange name [_resource acquisition is initialization_] (or _RAII_ for short). It originated in C++, where it is used to implement a similar abstraction type called [`std::unique_ptr`]. [_resource acquisition is initialization_]: https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization [`std::unique_ptr`]: https://en.cppreference.com/w/cpp/memory/unique_ptr Such a type alone does not suffice to prevent all use-after-free bugs since programmers can still hold on to references after the `Box` goes out of scope and the corresponding heap memory slot is deallocated: ```rust let x = { let z = Box::new([1,2,3]); &z[1] }; // z goes out of scope and `deallocate` is called println!("{}", x); ``` This is where Rust's ownership comes in. It assigns an abstract [lifetime] to each reference, which is the scope in which the reference is valid. In the above example, the `x` reference is taken from the `z` array, so it becomes invalid after `z` goes out of scope. 
When you [run the above example on the playground][playground-2] you see that the Rust compiler indeed throws an error: [lifetime]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html [playground-2]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=28180d8de7b62c6b4a681a7b1f745a48 ``` error[E0597]: `z[_]` does not live long enough --> src/main.rs:4:9 | 2 | let x = { | - borrow later stored here 3 | let z = Box::new([1,2,3]); | - binding `z` declared here 4 | &z[1] | ^^^^^ borrowed value does not live long enough 5 | }; // z goes out of scope and `deallocate` is called | - `z[_]` dropped here while still borrowed ``` The terminology can be a bit confusing at first. Taking a reference to a value is called _borrowing_ the value since it's similar to a borrow in real life: You have temporary access to an object but need to return it sometime, and you must not destroy it. By checking that all borrows end before an object is destroyed, the Rust compiler can guarantee that no use-after-free situation can occur. Rust's ownership system goes even further, preventing not only use-after-free bugs but also providing complete [_memory safety_], as garbage collected languages like Java or Python do. Additionally, it guarantees [_thread safety_] and is thus even safer than those languages in multi-threaded code. And most importantly, all these checks happen at compile time, so there is no runtime overhead compared to hand-written memory management in C. [_memory safety_]: https://en.wikipedia.org/wiki/Memory_safety [_thread safety_]: https://en.wikipedia.org/wiki/Thread_safety ### Use Cases We now know the basics of dynamic memory allocation in Rust, but when should we use it? We've come really far with our kernel without dynamic memory allocation, so why do we need it now? First, dynamic memory allocation always comes with a bit of performance overhead since we need to find a free slot on the heap for every allocation. 
For this reason, local variables are generally preferable, especially in performance-sensitive kernel code. However, there are cases where dynamic memory allocation is the best choice. As a basic rule, dynamic memory is required for variables that have a dynamic lifetime or a variable size. The most important type with a dynamic lifetime is [**`Rc`**], which counts the references to its wrapped value and deallocates it after all references have gone out of scope. Examples for types with a variable size are [**`Vec`**], [**`String`**], and other [collection types] that dynamically grow when more elements are added. These types work by allocating a larger amount of memory when they become full, copying all elements over, and then deallocating the old allocation. [**`Rc`**]: https://doc.rust-lang.org/alloc/rc/index.html [**`Vec`**]: https://doc.rust-lang.org/alloc/vec/index.html [**`String`**]: https://doc.rust-lang.org/alloc/string/index.html [collection types]: https://doc.rust-lang.org/alloc/collections/index.html For our kernel, we will mostly need the collection types, for example, to store a list of active tasks when implementing multitasking in future posts. ## The Allocator Interface The first step in implementing a heap allocator is to add a dependency on the built-in [`alloc`] crate. Like the [`core`] crate, it is a subset of the standard library that additionally contains the allocation and collection types. To add the dependency on `alloc`, we add the following to our `lib.rs`: [`alloc`]: https://doc.rust-lang.org/alloc/ [`core`]: https://doc.rust-lang.org/core/ ```rust // in src/lib.rs extern crate alloc; ``` Contrary to normal dependencies, we don't need to modify the `Cargo.toml`. The reason is that the `alloc` crate ships with the Rust compiler as part of the standard library, so the compiler already knows about the crate. By adding this `extern crate` statement, we specify that the compiler should try to include it. 
(Historically, all dependencies needed an `extern crate` statement, which is now optional). Since we are compiling for a custom target, we can't use the precompiled version of `alloc` that is shipped with the Rust installation. Instead, we have to tell cargo to recompile the crate from source. We can do that by adding it to the `unstable.build-std` array in our `.cargo/config.toml` file: ```toml # in .cargo/config.toml [unstable] build-std = ["core", "compiler_builtins", "alloc"] ``` Now the compiler will recompile and include the `alloc` crate in our kernel. The reason that the `alloc` crate is disabled by default in `#[no_std]` crates is that it has additional requirements. When we try to compile our project now, we will see these requirements as errors: ``` error: no global memory allocator found but one is required; link to std or add #[global_allocator] to a static item that implements the GlobalAlloc trait. ``` The error occurs because the `alloc` crate requires a heap allocator, which is an object that provides the `allocate` and `deallocate` functions. In Rust, heap allocators are described by the [`GlobalAlloc`] trait, which is mentioned in the error message. To set the heap allocator for the crate, the `#[global_allocator]` attribute must be applied to a `static` variable that implements the `GlobalAlloc` trait. [`GlobalAlloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html ### The `GlobalAlloc` Trait The [`GlobalAlloc`] trait defines the functions that a heap allocator must provide. The trait is special because it is almost never used directly by the programmer. Instead, the compiler will automatically insert the appropriate calls to the trait methods when using the allocation and collection types of `alloc`. 
Since we will need to implement the trait for all our allocator types, it is worth taking a closer look at its declaration:

```rust
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... }
    unsafe fn realloc(
        &self,
        ptr: *mut u8,
        layout: Layout,
        new_size: usize
    ) -> *mut u8 { ... }
}
```

It defines the two required methods [`alloc`] and [`dealloc`], which correspond to the `allocate` and `deallocate` functions we used in our examples:

- The [`alloc`] method takes a [`Layout`] instance as an argument, which describes the desired size and alignment that the allocated memory should have. It returns a [raw pointer] to the first byte of the allocated memory block. Instead of an explicit error value, the `alloc` method returns a null pointer to signal an allocation error. This is a bit non-idiomatic, but it has the advantage that wrapping existing system allocators is easy since they use the same convention.
- The [`dealloc`] method is the counterpart and is responsible for freeing a memory block again. It receives two arguments: the pointer returned by `alloc` and the `Layout` that was used for the allocation.

[`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc
[`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc
[`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html

The trait additionally defines the two methods [`alloc_zeroed`] and [`realloc`] with default implementations:

- The [`alloc_zeroed`] method is equivalent to calling `alloc` and then setting the allocated memory block to zero, which is exactly what the provided default implementation does. An allocator implementation can override the default implementations with a more efficient custom implementation if possible.
- The [`realloc`] method allows growing or shrinking an existing allocation.
The default implementation allocates a new memory block with the desired size and copies over all the content from the previous allocation. Again, an allocator implementation can probably provide a more efficient implementation of this method, for example by growing/shrinking the allocation in-place if possible. [`alloc_zeroed`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.alloc_zeroed [`realloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.realloc #### Unsafety One thing to notice is that both the trait itself and all trait methods are declared as `unsafe`: - The reason for declaring the trait as `unsafe` is that the programmer must guarantee that the trait implementation for an allocator type is correct. For example, the `alloc` method must never return a memory block that is already used somewhere else because this would cause undefined behavior. - Similarly, the reason that the methods are `unsafe` is that the caller must ensure various invariants when calling the methods, for example, that the `Layout` passed to `alloc` specifies a non-zero size. This is not really relevant in practice since the methods are normally called directly by the compiler, which ensures that the requirements are met. ### A `DummyAllocator` Now that we know what an allocator type should provide, we can create a simple dummy allocator. For that, we create a new `allocator` module: ```rust // in src/lib.rs pub mod allocator; ``` Our dummy allocator does the absolute minimum to implement the trait and always returns an error when `alloc` is called. 
It looks like this: ```rust // in src/allocator.rs use alloc::alloc::{GlobalAlloc, Layout}; use core::ptr::null_mut; pub struct Dummy; unsafe impl GlobalAlloc for Dummy { unsafe fn alloc(&self, _layout: Layout) -> *mut u8 { null_mut() } unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) { panic!("dealloc should be never called") } } ``` The struct does not need any fields, so we create it as a [zero-sized type]. As mentioned above, we always return the null pointer from `alloc`, which corresponds to an allocation error. Since the allocator never returns any memory, a call to `dealloc` should never occur. For this reason, we simply panic in the `dealloc` method. The `alloc_zeroed` and `realloc` methods have default implementations, so we don't need to provide implementations for them. [zero-sized type]: https://doc.rust-lang.org/nomicon/exotic-sizes.html#zero-sized-types-zsts We now have a simple allocator, but we still have to tell the Rust compiler that it should use this allocator. This is where the `#[global_allocator]` attribute comes in. ### The `#[global_allocator]` Attribute The `#[global_allocator]` attribute tells the Rust compiler which allocator instance it should use as the global heap allocator. The attribute is only applicable to a `static` that implements the `GlobalAlloc` trait. Let's register an instance of our `Dummy` allocator as the global allocator: ```rust // in src/allocator.rs #[global_allocator] static ALLOCATOR: Dummy = Dummy; ``` Since the `Dummy` allocator is a [zero-sized type], we don't need to specify any fields in the initialization expression. With this static, the compilation errors should be fixed. Now we can use the allocation and collection types of `alloc`. For example, we can use a [`Box`] to allocate a value on the heap: [`Box`]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html ```rust // in src/main.rs extern crate alloc; use alloc::boxed::Box; fn kernel_main(boot_info: &'static BootInfo) -> ! 
{ // […] print "Hello World!", call `init`, create `mapper` and `frame_allocator` let x = Box::new(41); // […] call `test_main` in test mode println!("It did not crash!"); blog_os::hlt_loop(); } ``` Note that we need to specify the `extern crate alloc` statement in our `main.rs` too. This is required because the `lib.rs` and `main.rs` parts are treated as separate crates. However, we don't need to create another `#[global_allocator]` static because the global allocator applies to all crates in the project. In fact, specifying an additional allocator in another crate would be an error. When we run the above code, we see that a panic occurs: ![QEMU printing "panicked at `allocation error: Layout { size_: 4, align_: 4 }, src/lib.rs:89:5"](qemu-dummy-output.png) The panic occurs because the `Box::new` function implicitly calls the `alloc` function of the global allocator. Our dummy allocator always returns a null pointer, so every allocation fails. To fix this, we need to create an allocator that actually returns usable memory. ## Creating a Kernel Heap Before we can create a proper allocator, we first need to create a heap memory region from which the allocator can allocate memory. To do this, we need to define a virtual memory range for the heap region and then map this region to physical frames. See the [_"Introduction To Paging"_] post for an overview of virtual memory and page tables. [_"Introduction To Paging"_]: @/edition-2/posts/08-paging-introduction/index.md The first step is to define a virtual memory region for the heap. We can choose any virtual address range that we like, as long as it is not already used for a different memory region. Let's define it as the memory starting at address `0x_4444_4444_0000` so that we can easily recognize a heap pointer later: ```rust // in src/allocator.rs pub const HEAP_START: usize = 0x_4444_4444_0000; pub const HEAP_SIZE: usize = 100 * 1024; // 100 KiB ``` We set the heap size to 100 KiB for now. 
If we need more space in the future, we can simply increase it.

If we tried to use this heap region now, a page fault would occur since the virtual memory region is not mapped to physical memory yet. To resolve this, we create an `init_heap` function that maps the heap pages using the [`Mapper` API] that we introduced in the [_"Paging Implementation"_] post:

[`Mapper` API]: @/edition-2/posts/09-paging-implementation/index.md#using-offsetpagetable
[_"Paging Implementation"_]: @/edition-2/posts/09-paging-implementation/index.md

```rust
// in src/allocator.rs

use x86_64::{
    structures::paging::{
        mapper::MapToError, FrameAllocator, Mapper, Page, PageTableFlags, Size4KiB,
    },
    VirtAddr,
};

pub fn init_heap(
    mapper: &mut impl Mapper<Size4KiB>,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) -> Result<(), MapToError<Size4KiB>> {
    let page_range = {
        let heap_start = VirtAddr::new(HEAP_START as u64);
        let heap_end = heap_start + HEAP_SIZE - 1u64;
        let heap_start_page = Page::containing_address(heap_start);
        let heap_end_page = Page::containing_address(heap_end);
        Page::range_inclusive(heap_start_page, heap_end_page)
    };

    for page in page_range {
        let frame = frame_allocator
            .allocate_frame()
            .ok_or(MapToError::FrameAllocationFailed)?;
        let flags = PageTableFlags::PRESENT | PageTableFlags::WRITABLE;
        unsafe {
            mapper.map_to(page, frame, flags, frame_allocator)?.flush()
        };
    }

    Ok(())
}
```

The function takes mutable references to a [`Mapper`] and a [`FrameAllocator`] instance, both limited to 4&nbsp;KiB pages by using [`Size4KiB`] as the generic parameter. The return value of the function is a [`Result`] with the unit type `()` as the success variant and a [`MapToError`] as the error variant, which is the error type returned by the [`Mapper::map_to`] method. Reusing the error type makes sense here because the `map_to` method is the main source of errors in this function.
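
To see which pages the `page_range` block in `init_heap` covers, we can redo its arithmetic with plain integers. The following is only a sketch that runs on a normal host system. It copies the `HEAP_START` and `HEAP_SIZE` values from above, and the `PAGE_SIZE` constant and the `heap_page_numbers` helper are names made up for this illustration:

```rust
// Values copied from `src/allocator.rs`.
const HEAP_START: u64 = 0x_4444_4444_0000;
const HEAP_SIZE: u64 = 100 * 1024; // 100 KiB
// Assumption for this sketch: 4 KiB pages, as selected by `Size4KiB`.
const PAGE_SIZE: u64 = 4096;

/// Returns the numbers of the first and last page of the heap (inclusive).
/// (Hypothetical helper, not part of the kernel.)
fn heap_page_numbers() -> (u64, u64) {
    // Address of the last byte that still belongs to the heap.
    let heap_end = HEAP_START + HEAP_SIZE - 1;
    // Integer division rounds down to the page that contains the address.
    (HEAP_START / PAGE_SIZE, heap_end / PAGE_SIZE)
}

fn main() {
    let (first, last) = heap_page_numbers();
    println!("heap spans {} pages", last - first + 1);

    // Without the `- 1`, the inclusive range would contain one page too many.
    let page_after_heap = (HEAP_START + HEAP_SIZE) / PAGE_SIZE;
    println!("page after the heap: {}", page_after_heap - first);
}
```

With a 100&nbsp;KiB heap that starts at a page-aligned address, this yields 25 pages. The second computation shows why the inclusive end address matters: the address one past the heap already lies in page 25 (counted from the heap start), which must not be mapped.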

[`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html
[`FrameAllocator`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html
[`Size4KiB`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/enum.Size4KiB.html
[`Result`]: https://doc.rust-lang.org/core/result/enum.Result.html
[`MapToError`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/enum.MapToError.html
[`Mapper::map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to

The implementation can be broken down into two parts:

- **Creating the page range:** To create a range of the pages that we want to map, we convert the `HEAP_START` pointer to a [`VirtAddr`] type. Then we calculate the heap end address from it by adding the `HEAP_SIZE`. We want an inclusive bound (the address of the last byte of the heap), so we subtract 1. Next, we convert the addresses into [`Page`] types using the [`containing_address`] function. Finally, we create a page range from the start and end pages using the [`Page::range_inclusive`] function.

- **Mapping the pages:** The second step is to map all pages of the page range we just created. For that, we iterate over these pages using a `for` loop. For each page, we do the following:

    - We allocate a physical frame that the page should be mapped to using the [`FrameAllocator::allocate_frame`] method. This method returns [`None`] when there are no more frames left. We deal with that case by mapping it to a [`MapToError::FrameAllocationFailed`] error through the [`Option::ok_or`] method and then applying the [question mark operator] to return early in the case of an error.

    - We set the required `PRESENT` flag and the `WRITABLE` flag for the page. With these flags, both read and write accesses are allowed, which makes sense for heap memory.

    - We use the [`Mapper::map_to`] method for creating the mapping in the active page table.
The method can fail, so we use the [question mark operator] again to forward the error to the caller. On success, the method returns a [`MapperFlush`] instance that we can use to update the [_translation lookaside buffer_] using the [`flush`] method. [`VirtAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.VirtAddr.html [`Page`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html [`containing_address`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html#method.containing_address [`Page::range_inclusive`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html#method.range_inclusive [`FrameAllocator::allocate_frame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html#tymethod.allocate_frame [`None`]: https://doc.rust-lang.org/core/option/enum.Option.html#variant.None [`MapToError::FrameAllocationFailed`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/enum.MapToError.html#variant.FrameAllocationFailed [`Option::ok_or`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.ok_or [question mark operator]: https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html [`MapperFlush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html [_translation lookaside buffer_]: @/edition-2/posts/08-paging-introduction/index.md#the-translation-lookaside-buffer [`flush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush The final step is to call this function from our `kernel_main`: ```rust // in src/main.rs fn kernel_main(boot_info: &'static BootInfo) -> ! 
{ use blog_os::allocator; // new import use blog_os::memory::{self, BootInfoFrameAllocator}; println!("Hello World{}", "!"); blog_os::init(); let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset); let mut mapper = unsafe { memory::init(phys_mem_offset) }; let mut frame_allocator = unsafe { BootInfoFrameAllocator::init(&boot_info.memory_map) }; // new allocator::init_heap(&mut mapper, &mut frame_allocator) .expect("heap initialization failed"); let x = Box::new(41); // […] call `test_main` in test mode println!("It did not crash!"); blog_os::hlt_loop(); } ``` We show the full function for context here. The only new lines are the `blog_os::allocator` import and the call to the `allocator::init_heap` function. In case the `init_heap` function returns an error, we panic using the [`Result::expect`] method since there is currently no sensible way for us to handle this error. [`Result::expect`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.expect We now have a mapped heap memory region that is ready to be used. The `Box::new` call still uses our old `Dummy` allocator, so you will still see the "out of memory" error when you run it. Let's fix this by using a proper allocator. ## Using an Allocator Crate Since implementing an allocator is somewhat complex, we start by using an external allocator crate. We will learn how to implement our own allocator in the next post. A simple allocator crate for `no_std` applications is the [`linked_list_allocator`] crate. Its name comes from the fact that it uses a linked list data structure to keep track of deallocated memory regions. See the next post for a more detailed explanation of this approach. 
To use the crate, we first need to add a dependency on it in our `Cargo.toml`:

[`linked_list_allocator`]: https://github.com/phil-opp/linked-list-allocator/

```toml
# in Cargo.toml

[dependencies]
linked_list_allocator = "0.9.0"
```

Then we can replace our dummy allocator with the allocator provided by the crate:

```rust
// in src/allocator.rs

use linked_list_allocator::LockedHeap;

#[global_allocator]
static ALLOCATOR: LockedHeap = LockedHeap::empty();
```

The struct is named `LockedHeap` because it uses the [`spinning_top::Spinlock`] type for synchronization. This is required because multiple threads could access the `ALLOCATOR` static at the same time. As always, when using a spinlock or a mutex, we need to be careful to not accidentally cause a deadlock. This means that we shouldn't perform any allocations in interrupt handlers, since they can run at an arbitrary time and might interrupt an in-progress allocation.

[`spinning_top::Spinlock`]: https://docs.rs/spinning_top/0.1.0/spinning_top/type.Spinlock.html

Setting the `LockedHeap` as global allocator is not enough. The reason is that we use the [`empty`] constructor function, which creates an allocator without any backing memory. Like our dummy allocator, it always returns an error on `alloc`. To fix this, we need to initialize the allocator after creating the heap:

[`empty`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.LockedHeap.html#method.empty

```rust
// in src/allocator.rs

pub fn init_heap(
    mapper: &mut impl Mapper<Size4KiB>,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) -> Result<(), MapToError<Size4KiB>> {
    // […] map all heap pages to physical frames

    // new
    unsafe {
        ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE);
    }

    Ok(())
}
```

We use the [`lock`] method on the inner spinlock of the `LockedHeap` type to get an exclusive reference to the wrapped [`Heap`] instance, on which we then call the [`init`] method with the heap bounds as arguments.
Because the [`init`] function already tries to write to the heap memory, we must initialize the heap only _after_ mapping the heap pages. [`lock`]: https://docs.rs/lock_api/0.3.3/lock_api/struct.Mutex.html#method.lock [`Heap`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html [`init`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init After initializing the heap, we can now use all allocation and collection types of the built-in [`alloc`] crate without error: ```rust // in src/main.rs use alloc::{boxed::Box, vec, vec::Vec, rc::Rc}; fn kernel_main(boot_info: &'static BootInfo) -> ! { // […] initialize interrupts, mapper, frame_allocator, heap // allocate a number on the heap let heap_value = Box::new(41); println!("heap_value at {:p}", heap_value); // create a dynamically sized vector let mut vec = Vec::new(); for i in 0..500 { vec.push(i); } println!("vec at {:p}", vec.as_slice()); // create a reference counted vector -> will be freed when count reaches 0 let reference_counted = Rc::new(vec![1, 2, 3]); let cloned_reference = reference_counted.clone(); println!("current reference count is {}", Rc::strong_count(&cloned_reference)); core::mem::drop(reference_counted); println!("reference count is {} now", Rc::strong_count(&cloned_reference)); // […] call `test_main` in test context println!("It did not crash!"); blog_os::hlt_loop(); } ``` This code example shows some uses of the [`Box`], [`Vec`], and [`Rc`] types. For the `Box` and `Vec` types, we print the underlying heap pointers using the [`{:p}` formatting specifier]. To showcase `Rc`, we create a reference-counted heap value and use the [`Rc::strong_count`] function to print the current reference count before and after dropping an instance (using [`core::mem::drop`]). 

[`Vec`]: https://doc.rust-lang.org/alloc/vec/
[`Rc`]: https://doc.rust-lang.org/alloc/rc/
[`{:p}` formatting specifier]: https://doc.rust-lang.org/core/fmt/trait.Pointer.html
[`Rc::strong_count`]: https://doc.rust-lang.org/alloc/rc/struct.Rc.html#method.strong_count
[`core::mem::drop`]: https://doc.rust-lang.org/core/mem/fn.drop.html

When we run it, we see the following:

![QEMU printing "heap_value at 0x444444440000", "vec at 0x444444440800", "current reference count is 2", and "reference count is 1 now"](qemu-alloc-showcase.png)

As expected, we see that the `Box` and `Vec` values live on the heap, as indicated by the pointer starting with the `0x_4444_4444_*` prefix. The reference counted value also behaves as expected, with the reference count being 2 after the `clone` call, and 1 again after one of the instances was dropped.

The reason that the vector starts at offset `0x800` is not that the boxed value is `0x800` bytes large, but the [reallocations] that occur when the vector needs to increase its capacity. For example, when the vector's capacity is 32 and we try to add the next element, the vector allocates a new backing array with a capacity of 64 behind the scenes and copies all elements over. Then it frees the old allocation.
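
The capacity growth described above can be observed through the `Vec::capacity` method. The following sketch assumes a hosted target with `std`, and the `count_capacity_changes` function is a name made up for this illustration. It pushes elements one by one and counts how often the capacity changes, which corresponds to the initial allocation plus all later reallocations:

```rust
/// Pushes `n` elements and returns how often the capacity of the vector changed.
/// (Hypothetical helper, only for illustration.)
fn count_capacity_changes(n: usize) -> usize {
    let mut vec: Vec<usize> = Vec::new();
    let mut changes = 0;
    let mut last_capacity = vec.capacity();

    for i in 0..n {
        vec.push(i);
        // A changed capacity means that a new backing array was allocated.
        if vec.capacity() != last_capacity {
            println!("len {:>3}: capacity {} -> {}", vec.len(), last_capacity, vec.capacity());
            changes += 1;
            last_capacity = vec.capacity();
        }
    }
    changes
}

fn main() {
    let changes = count_capacity_changes(500);
    println!("capacity changed {} times for 500 pushes", changes);
}
```

The exact growth strategy is an implementation detail of `Vec` and not guaranteed, so the printed capacities may differ between Rust versions. What matters is that the number of capacity changes is much smaller than the number of `push` calls, while each change moves the backing array to a new place on the heap.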
[reallocations]: https://doc.rust-lang.org/alloc/vec/struct.Vec.html#capacity-and-reallocation Of course, there are many more allocation and collection types in the `alloc` crate that we can now all use in our kernel, including: - the thread-safe reference counted pointer [`Arc`] - the owned string type [`String`] and the [`format!`] macro - [`LinkedList`] - the growable ring buffer [`VecDeque`] - the [`BinaryHeap`] priority queue - [`BTreeMap`] and [`BTreeSet`] [`Arc`]: https://doc.rust-lang.org/alloc/sync/struct.Arc.html [`String`]: https://doc.rust-lang.org/alloc/string/struct.String.html [`format!`]: https://doc.rust-lang.org/alloc/macro.format.html [`LinkedList`]: https://doc.rust-lang.org/alloc/collections/linked_list/struct.LinkedList.html [`VecDeque`]: https://doc.rust-lang.org/alloc/collections/vec_deque/struct.VecDeque.html [`BinaryHeap`]: https://doc.rust-lang.org/alloc/collections/binary_heap/struct.BinaryHeap.html [`BTreeMap`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html [`BTreeSet`]: https://doc.rust-lang.org/alloc/collections/btree_set/struct.BTreeSet.html These types will become very useful when we want to implement thread lists, scheduling queues, or support for async/await. ## Adding a Test To ensure that we don't accidentally break our new allocation code, we should add an integration test for it. We start by creating a new `tests/heap_allocation.rs` file with the following content: ```rust // in tests/heap_allocation.rs #![no_std] #![no_main] #![feature(custom_test_frameworks)] #![test_runner(blog_os::test_runner)] #![reexport_test_harness_main = "test_main"] extern crate alloc; use bootloader::{entry_point, BootInfo}; use core::panic::PanicInfo; entry_point!(main); fn main(boot_info: &'static BootInfo) -> ! { unimplemented!(); } #[panic_handler] fn panic(info: &PanicInfo) -> ! { blog_os::test_panic_handler(info) } ``` We reuse the `test_runner` and `test_panic_handler` functions from our `lib.rs`. 
Since we want to test allocations, we enable the `alloc` crate through the `extern crate alloc` statement. For more information about the test boilerplate, check out the [_Testing_] post.

[_Testing_]: @/edition-2/posts/04-testing/index.md

The implementation of the `main` function looks like this:

```rust
// in tests/heap_allocation.rs

fn main(boot_info: &'static BootInfo) -> ! {
    use blog_os::allocator;
    use blog_os::memory::{self, BootInfoFrameAllocator};
    use x86_64::VirtAddr;

    blog_os::init();
    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    let mut mapper = unsafe { memory::init(phys_mem_offset) };
    let mut frame_allocator = unsafe {
        BootInfoFrameAllocator::init(&boot_info.memory_map)
    };
    allocator::init_heap(&mut mapper, &mut frame_allocator)
        .expect("heap initialization failed");

    test_main();
    loop {}
}
```

It is very similar to the `kernel_main` function in our `main.rs`, with the differences that we don't invoke `println`, don't include any example allocations, and call `test_main` unconditionally.

Now we're ready to add a few test cases. First, we add a test that performs some simple allocations using [`Box`] and checks the allocated values to ensure that basic allocations work:

```rust
// in tests/heap_allocation.rs

use alloc::boxed::Box;

#[test_case]
fn simple_allocation() {
    let heap_value_1 = Box::new(41);
    let heap_value_2 = Box::new(13);
    assert_eq!(*heap_value_1, 41);
    assert_eq!(*heap_value_2, 13);
}
```

Most importantly, this test verifies that no allocation error occurs.

Next, we iteratively build a large vector, to test both large allocations and multiple allocations (due to reallocations):

```rust
// in tests/heap_allocation.rs

use alloc::vec::Vec;

#[test_case]
fn large_vec() {
    let n = 1000;
    let mut vec = Vec::new();
    for i in 0..n {
        vec.push(i);
    }
    assert_eq!(vec.iter().sum::<u64>(), (n - 1) * n / 2);
}
```

We verify the sum by comparing it with the formula for the [n-th partial sum].
This gives us some confidence that the allocated values are all correct.

[n-th partial sum]: https://en.wikipedia.org/wiki/1_%2B_2_%2B_3_%2B_4_%2B_%E2%8B%AF#Partial_sums

As a third test, we create a large number of allocations one after another, with `HEAP_SIZE` loop iterations in total:

```rust
// in tests/heap_allocation.rs

use blog_os::allocator::HEAP_SIZE;

#[test_case]
fn many_boxes() {
    for i in 0..HEAP_SIZE {
        let x = Box::new(i);
        assert_eq!(*x, i);
    }
}
```

This test ensures that the allocator reuses freed memory for subsequent allocations since it would run out of memory otherwise: each boxed value occupies more than one byte, so `HEAP_SIZE` live allocations could never fit into the heap at once. This might seem like an obvious requirement for an allocator, but there are allocator designs that don't do this. An example is the bump allocator design that will be explained in the next post.

Let's run our new integration test:

```
> cargo test --test heap_allocation
[…]
Running 3 tests
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
```

All three tests succeeded! You can also invoke `cargo test` (without the `--test` argument) to run all unit and integration tests.

## Summary

This post gave an introduction to dynamic memory and explained why and where it is needed. We saw how Rust's borrow checker prevents common vulnerabilities and learned how Rust's allocation API works.

After creating a minimal implementation of Rust's allocator interface using a dummy allocator, we created a proper heap memory region for our kernel. For that, we defined a virtual address range for the heap and then mapped all pages of that range to physical frames using the `Mapper` and `FrameAllocator` from the previous post.

Finally, we added a dependency on the `linked_list_allocator` crate to add a proper allocator to our kernel. With this allocator, we were able to use `Box`, `Vec`, and other allocation and collection types from the `alloc` crate.

## What's next?

While we already added heap allocation support in this post, we left most of the work to the `linked_list_allocator` crate.
The next post will show in detail how an allocator can be implemented from scratch. It will present multiple possible allocator designs, show how to implement simple versions of them, and explain their advantages and drawbacks.

================================================
FILE: blog/content/edition-2/posts/10-heap-allocation/index.pt-BR.md
================================================
+++
title = "Alocação no Heap"
weight = 10
path = "pt-BR/heap-allocation"
date = 2019-06-26

[extra]
chapter = "Gerenciamento de Memória"
# Please update this when updating the translation
translation_based_on_commit = "1ba06fe61c39c1379bd768060c21040b62ff3f0b"
# GitHub usernames of the people that translated this post
translators = ["richarddalves"]
+++

This post adds support for heap allocation to our kernel. First, it gives an introduction to dynamic memory and shows how the borrow checker prevents common allocation errors. It then implements the basic allocation interface of Rust, creates a heap memory region, and sets up an allocator crate. At the end of this post, all the allocation and collection types of the built-in `alloc` crate will be available to our kernel.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-10`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-10

## Local and Static Variables

We currently use two types of variables in our kernel: local variables and `static` variables. Local variables are stored on the [call stack][pilha de chamadas] and are only valid until the surrounding function returns. Static variables are stored at a fixed memory location and always live for the complete lifetime of the program.
### Local Variables

Local variables are stored on the [call stack][pilha de chamadas], which is a [stack data structure] that supports `push` and `pop` operations. On each function entry, the parameters, the return address, and the local variables of the called function are pushed by the compiler:

[pilha de chamadas]: https://en.wikipedia.org/wiki/Call_stack
[stack data structure]: https://en.wikipedia.org/wiki/Stack_(abstract_data_type)

![An `outer()` and an `inner(i: usize)` function, where `outer` calls `inner(1)`. Both have some local variables. The call stack contains the following slots: the local variables of outer, then the argument `i = 1`, then the return address, then the local variables of inner.](call-stack.svg)

The above example shows the call stack after the `outer` function called the `inner` function. We see that the call stack contains the local variables of `outer` first. On the `inner` call, the parameter `1` and the return address for the function were pushed. Then control was transferred to `inner`, which pushed its local variables.

After the `inner` function returns, its part of the call stack is popped again and only the local variables of `outer` remain:

![The call stack containing only the local variables of `outer`](call-stack-return.svg)

We see that the local variables of `inner` only live until the function returns. The Rust compiler enforces these lifetimes and throws an error when we use a value for too long, for example when we try to return a reference to a local variable:

```rust
fn inner(i: usize) -> &'static u32 {
    let z = [1, 2, 3];
    &z[i]
}
```

([run the example on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=6186a0f3a54f468e1de8894996d12819))

While returning a reference makes no sense in this example, there are cases where we want a variable to live longer than the function.
We already saw such a case in our kernel when we tried to [load an interrupt descriptor table] and had to use a `static` variable to extend the lifetime.

[load an interrupt descriptor table]: @/edition-2/posts/05-cpu-exceptions/index.md#loading-the-idt

### Static Variables

Static variables are stored at a fixed memory location separate from the stack. This memory location is assigned at compile time by the linker and encoded in the executable. Static variables live for the complete runtime of the program, so they have the `'static` lifetime and can always be referenced from local variables:

![The same outer/inner example, except that inner has a `static Z: [u32; 3] = [1,2,3];` and returns a `&Z[i]` reference](call-stack-static.svg)

When the `inner` function returns in the above example, its part of the call stack is destroyed. The static variables live in a separate memory range that is never destroyed, so the `&Z[1]` reference is still valid after the return.

Apart from the `'static` lifetime, static variables also have the useful property that their location is known at compile time, so that no reference is needed for accessing them. We utilized that property for our `println` macro: By using a [static `Writer`] internally, no `&mut Writer` reference is needed to invoke the macro, which is very useful in [exception handlers], where we don't have access to any additional variables.

[static `Writer`]: @/edition-2/posts/03-vga-text-buffer/index.md#a-global-interface
[exception handlers]: @/edition-2/posts/05-cpu-exceptions/index.md#implementation

However, this property of static variables brings a crucial drawback: they are read-only by default. Rust enforces this because a [data race][corrida de dados] would occur if, for example, two threads modified a static variable at the same time.
The only way to modify a static variable is to encapsulate it in a [`Mutex`] type, which ensures that only a single `&mut` reference exists at any point in time. We already used a `Mutex` for our [static VGA buffer `Writer`][vga mutex].

[corrida de dados]: https://doc.rust-lang.org/nomicon/races.html
[`Mutex`]: https://docs.rs/spin/0.5.2/spin/struct.Mutex.html
[vga mutex]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks

## Dynamic Memory

Local and static variables are already very powerful together and enable most use cases. However, we saw that they both have their limitations:

- Local variables only live until the end of the surrounding function or block. This is because they live on the call stack and are destroyed after the surrounding function returns.
- Static variables always live for the complete runtime of the program, so there is no way to reclaim and reuse their memory when they're no longer needed. Also, they have unclear ownership semantics and are accessible from all functions, so they need to be protected by a [`Mutex`] when we want to modify them.

Another limitation of local and static variables is that they have a fixed size. So they can't store a collection that dynamically grows when more elements are added. (There are proposals for [unsized rvalues] in Rust that would allow dynamically sized local variables, but they only work in some specific cases.)

[unsized rvalues]: https://github.com/rust-lang/rust/issues/48055

To circumvent these drawbacks, programming languages often support a third memory region for storing variables called the **heap**. The heap supports _dynamic memory allocation_ at runtime through two functions called `allocate` and `deallocate`.
It works in the following way: The `allocate` function returns a free chunk of memory of the specified size that can be used to store a variable. This variable then lives until it is freed by calling the `deallocate` function with a reference to the variable.

Let's go through an example:

![The inner function calls `allocate(size_of([u32; 3]))`, writes `z.write([1,2,3]);`, and returns `(z as *mut u32).offset(i)`. On the returned value `y`, the outer function performs `deallocate(y, size_of(u32))`.](call-stack-heap.svg)

Here the `inner` function uses heap memory instead of static variables for storing `z`. It first allocates a memory block of the required size, which returns a `*mut u32` [raw pointer][ponteiro bruto]. It then uses the [`ptr::write`] method to write the array `[1,2,3]` to it. In the last step, it uses the [`offset`] function to calculate a pointer to the `i`-th element and then returns it. (Note that we omitted some required casts and unsafe blocks in this example function for brevity.)

[ponteiro bruto]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[`ptr::write`]: https://doc.rust-lang.org/core/ptr/fn.write.html
[`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset

The allocated memory lives until it is explicitly freed through a call to `deallocate`. Thus, the returned pointer is still valid even after `inner` returned and its part of the call stack was destroyed. The advantage of using heap memory compared to static memory is that the memory can be reused after it is freed, which we do through the `deallocate` call in `outer`. After that call, the situation looks like this:

![The call stack contains the local variables of `outer`, the heap contains `z[0]` and `z[2]`, but no longer `z[1]`.](call-stack-heap-freed.svg)

We see that the `z[1]` slot is free again and can be reused for the next `allocate` call.
However, we also see that `z[0]` and `z[2]` are never freed because we never deallocate them. Such a bug is called a _memory leak_ and is often the cause of excessive memory consumption of programs (just imagine what happens when we call `inner` repeatedly in a loop). This might seem bad, but there are much more dangerous types of bugs that can happen with dynamic allocation.

### Common Errors

Apart from memory leaks, which are unfortunate but don't make the program vulnerable to attackers, there are two common types of bugs with more severe consequences:

- When we accidentally continue to use a variable after calling `deallocate` on it, we have a so-called **use-after-free** vulnerability. Such a bug causes undefined behavior and can often be exploited by attackers to execute arbitrary code.
- When we accidentally free a variable twice, we have a **double-free** vulnerability. This is problematic because it might free a different allocation that was allocated in the same spot after the first `deallocate` call. Thus, it can lead to a use-after-free vulnerability again.

These types of vulnerabilities are commonly known, so one might expect that people have learned how to avoid them by now. But no, such vulnerabilities are still regularly found, for example this [use-after-free vulnerability in Linux][linux vulnerability] (2019), which allowed arbitrary code execution. A web search like `use-after-free linux {current year}` will probably always yield results. This shows that even the best programmers are not always able to correctly handle dynamic memory in complex projects.
[linux vulnerability]: https://securityboulevard.com/2019/02/linux-use-after-free-vulnerability-found-in-linux-2-6-through-4-20-11/

To avoid these issues, many languages, such as Java or Python, manage dynamic memory automatically using a technique called [_garbage collection_]. The idea is that the programmer never invokes `deallocate` manually. Instead, the program is regularly paused and scanned for unused heap variables, which are then automatically deallocated. Thus, the above vulnerabilities can never occur. The drawbacks are the performance overhead of the regular scan and the probably long pause times.

[_garbage collection_]: https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)

Rust takes a different approach to the problem: It uses a concept called [_ownership_] that is able to check the correctness of dynamic memory operations at compile time. Thus, no garbage collection is needed to avoid the mentioned vulnerabilities, which means that there is no performance overhead. Another advantage of this approach is that the programmer still has fine-grained control over the use of dynamic memory, just like with C or C++.

[_ownership_]: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html

### Allocations in Rust

Instead of letting the programmer manually call `allocate` and `deallocate`, the Rust standard library provides abstraction types that call these functions implicitly. The most important type is [**`Box`**], which is an abstraction for a heap-allocated value. It provides a [`Box::new`] constructor function that takes a value, calls `allocate` with the size of the value, and then moves the value to the newly allocated slot on the heap.
To free the heap memory again, the `Box` type implements the [`Drop` trait] to call `deallocate` when it goes out of scope:

[**`Box`**]: https://doc.rust-lang.org/std/boxed/index.html
[`Box::new`]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html#method.new
[`Drop` trait]: https://doc.rust-lang.org/book/ch15-03-drop.html

```rust
{
    let z = Box::new([1,2,3]);
    […]
} // z goes out of scope and `deallocate` is called
```

This pattern has the strange name [_resource acquisition is initialization_] (or _RAII_ for short). It originated in C++, where it is used to implement a similar abstraction type called [`std::unique_ptr`].

[_resource acquisition is initialization_]: https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization
[`std::unique_ptr`]: https://en.cppreference.com/w/cpp/memory/unique_ptr

Such a type alone does not suffice to prevent all use-after-free bugs since programmers can still hold on to references after the `Box` goes out of scope and the corresponding heap memory slot is deallocated:

```rust
let x = {
    let z = Box::new([1,2,3]);
    &z[1]
}; // z goes out of scope and `deallocate` is called
println!("{}", x);
```

This is where Rust's ownership comes in. It assigns an abstract [lifetime][tempo de vida] to each reference, which is the scope in which the reference is valid. In the above example, the `x` reference is taken from the `z` array, so it becomes invalid after `z` goes out of scope.
When you [run the above example on the playground][playground-2], you see that the Rust compiler indeed throws an error:

[tempo de vida]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html
[playground-2]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=28180d8de7b62c6b4a681a7b1f745a48

```
error[E0597]: `z[_]` does not live long enough
 --> src/main.rs:4:9
  |
2 |     let x = {
  |         - borrow later stored here
3 |         let z = Box::new([1,2,3]);
  |             - binding `z` declared here
4 |         &z[1]
  |         ^^^^^ borrowed value does not live long enough
5 |     }; // z goes out of scope and `deallocate` is called
  |     - `z[_]` dropped here while still borrowed
```

The terminology can be a bit confusing at first. Taking a reference to a value is called _borrowing_ the value since it's similar to a borrow in real life: You have temporary access to an object but need to return it sometime, and you must not destroy it. By checking that all borrows end before an object is destroyed, the Rust compiler can guarantee that no use-after-free situation can occur.

Rust's ownership system goes even further, preventing not only use-after-free bugs but also providing complete [_memory safety_], as garbage collected languages like Java or Python do. Additionally, it guarantees [_thread safety_] and is thus even safer than those languages in multi-threaded code. And most importantly, all these checks happen at compile time, so there is no runtime overhead compared to hand-written memory management in C.

[_memory safety_]: https://en.wikipedia.org/wiki/Memory_safety
[_thread safety_]: https://en.wikipedia.org/wiki/Thread_safety

### Use Cases

We now know the basics of dynamic memory allocation in Rust, but when should we use it? We've come really far with our kernel without dynamic memory allocation, so why do we need it now?
First, dynamic memory allocation always comes with a bit of performance overhead since we need to find a free slot on the heap for every allocation. For this reason, local variables are generally preferable, especially in performance-sensitive kernel code. However, there are cases where dynamic memory allocation is the best choice.

As a basic rule, dynamic memory is required for variables that have a dynamic lifetime or a variable size. The most important type with a dynamic lifetime is [**`Rc`**], which counts the references to its wrapped value and deallocates it after all references have gone out of scope. Examples of types with a variable size are [**`Vec`**], [**`String`**], and other [collection types] that dynamically grow when more elements are added. These types work by allocating a larger amount of memory when they become full, copying all elements over, and then deallocating the old allocation.

[**`Rc`**]: https://doc.rust-lang.org/alloc/rc/index.html
[**`Vec`**]: https://doc.rust-lang.org/alloc/vec/index.html
[**`String`**]: https://doc.rust-lang.org/alloc/string/index.html
[collection types]: https://doc.rust-lang.org/alloc/collections/index.html

For our kernel, we will mostly need the collection types, for example, to store a list of active tasks when implementing multitasking in future posts.

## The Allocator Interface

The first step in implementing a heap allocator is to add a dependency on the built-in [`alloc`] crate. Like the [`core`] crate, it is a subset of the standard library that additionally contains the allocation and collection types. To add the dependency on `alloc`, we add the following to our `lib.rs`:

[`alloc`]: https://doc.rust-lang.org/alloc/
[`core`]: https://doc.rust-lang.org/core/

```rust
// in src/lib.rs

extern crate alloc;
```

Contrary to normal dependencies, we don't need to modify the `Cargo.toml`.
The reason is that the `alloc` crate ships with the Rust compiler as part of the standard library, so the compiler already knows about the crate. By adding this `extern crate` statement, we specify that the compiler should try to include it. (Historically, all dependencies needed an `extern crate` statement, which is now optional).

Since we are compiling for a custom target, we can't use the precompiled version of `alloc` that is shipped with the Rust installation. Instead, we have to tell cargo to recompile the crate from source. We can do that by adding it to the `unstable.build-std` array in our `.cargo/config.toml` file:

```toml
# in .cargo/config.toml

[unstable]
build-std = ["core", "compiler_builtins", "alloc"]
```

Now the compiler will recompile and include the `alloc` crate in our kernel.

The reason that the `alloc` crate is disabled by default in `#[no_std]` crates is that it has additional requirements. When we try to compile our project now, we will see these requirements as errors:

```
error: no global memory allocator found but one is required; link to std or add
       #[global_allocator] to a static item that implements the GlobalAlloc trait.
```

The error occurs because the `alloc` crate requires a heap allocator, which is an object that provides the `allocate` and `deallocate` functions. In Rust, heap allocators are described by the [`GlobalAlloc`] trait, which is mentioned in the error message. To set the heap allocator for the crate, the `#[global_allocator]` attribute must be applied to a `static` variable that implements the `GlobalAlloc` trait.

[`GlobalAlloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html

### The `GlobalAlloc` Trait

The [`GlobalAlloc`] trait defines the functions that a heap allocator must provide. The trait is special because it is almost never used directly by the programmer.
Instead, the compiler will automatically insert the appropriate calls to the trait methods when using the allocation and collection types of `alloc`.

Since we will need to implement the trait for all our allocator types, it is worth taking a closer look at its declaration:

```rust
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... }
    unsafe fn realloc(
        &self,
        ptr: *mut u8,
        layout: Layout,
        new_size: usize
    ) -> *mut u8 { ... }
}
```

It defines the two required methods [`alloc`] and [`dealloc`], which correspond to the `allocate` and `deallocate` functions we used in our examples:

- The [`alloc`] method takes a [`Layout`] instance as an argument, which describes the desired size and alignment that the allocated memory should have. It returns a [raw pointer][ponteiro bruto] to the first byte of the allocated memory block. Instead of an explicit error value, the `alloc` method returns a null pointer to signal an allocation error. This is a bit non-idiomatic, but it has the advantage that wrapping existing system allocators is easy since they use the same convention.
- The [`dealloc`] method is the counterpart and is responsible for freeing a memory block again. It receives two arguments: the pointer returned by `alloc` and the `Layout` that was used for the allocation.

[`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc
[`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc
[`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html

The trait additionally defines the two methods [`alloc_zeroed`] and [`realloc`] with default implementations:

- The [`alloc_zeroed`] method is equivalent to calling `alloc` and then setting the allocated memory block to zero, which is exactly what the provided default implementation does.
  An allocator implementation can override the default implementations with a more efficient custom implementation if possible.
- The [`realloc`] method allows growing or shrinking an allocation. The default implementation allocates a new memory block with the desired size and copies over all the content from the previous allocation. Again, an allocator implementation can probably provide a more efficient implementation of this method, for example by growing or shrinking the allocation in place if possible.

[`alloc_zeroed`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.alloc_zeroed
[`realloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.realloc

#### Unsafety

One thing to notice is that both the trait itself and all trait methods are declared as `unsafe`:

- The reason for declaring the trait as `unsafe` is that the programmer must guarantee that the trait implementation for an allocator type is correct. For example, the `alloc` method must never return a memory block that is already used somewhere else because this would cause undefined behavior.
- Similarly, the reason that the methods are `unsafe` is that the caller must ensure various invariants when calling the methods, for example, that the `Layout` passed to `alloc` specifies a non-zero size. This is not really relevant in practice since the methods are normally called directly by the compiler, which ensures that the requirements are met.

### A `DummyAllocator`

Now that we know what an allocator type should provide, we can create a simple dummy allocator. For that, we create a new `allocator` module:

```rust
// in src/lib.rs

pub mod allocator;
```

Our dummy allocator does the absolute minimum to implement the trait and always returns an error when `alloc` is called.
It looks like this:

```rust
// in src/allocator.rs

use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr::null_mut;

pub struct Dummy;

unsafe impl GlobalAlloc for Dummy {
    unsafe fn alloc(&self, _layout: Layout) -> *mut u8 {
        null_mut()
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        panic!("dealloc should be never called")
    }
}
```

The struct does not need any fields, so we create it as a [zero-sized type]. As mentioned above, we always return the null pointer from `alloc`, which corresponds to an allocation error. Since the allocator never returns any memory, a call to `dealloc` should never occur. For this reason, we simply panic in the `dealloc` method. The `alloc_zeroed` and `realloc` methods have default implementations, so we don't need to provide implementations for them.

[zero-sized type]: https://doc.rust-lang.org/nomicon/exotic-sizes.html#zero-sized-types-zsts

We now have a simple allocator, but we still have to tell the Rust compiler that it should use this allocator. This is where the `#[global_allocator]` attribute comes in.

### The `#[global_allocator]` Attribute

The `#[global_allocator]` attribute tells the Rust compiler which allocator instance it should use as the global heap allocator. The attribute is only applicable to a `static` that implements the `GlobalAlloc` trait. Let's register an instance of our `Dummy` allocator as the global allocator:

```rust
// in src/allocator.rs

#[global_allocator]
static ALLOCATOR: Dummy = Dummy;
```

Since the `Dummy` allocator is a [zero-sized type], we don't need to specify any fields in the initialization expression.

With this static, the compilation errors should be fixed. Now we can use the allocation and collection types of `alloc`. For example, we can use a [`Box`] to allocate a value on the heap:

[`Box`]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html

```rust
// in src/main.rs

extern crate alloc;

use alloc::boxed::Box;

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // […] print "Hello World!", call `init`, create `mapper` and `frame_allocator`

    let x = Box::new(41);

    // […] call `test_main` in test mode

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

Note that we need to specify the `extern crate alloc` statement in our `main.rs` too. This is required because the `lib.rs` and `main.rs` parts are treated as separate crates. However, we don't need to create another `#[global_allocator]` static because the global allocator applies to all crates in the project. In fact, specifying an additional allocator in another crate would be an error.

When we run the above code, we see that a panic occurs:

![QEMU printing "panicked at `allocation error: Layout { size_: 4, align_: 4 }, src/lib.rs:89:5"](qemu-dummy-output.png)

The panic occurs because the `Box::new` function implicitly calls the `alloc` function of the global allocator. Our dummy allocator always returns a null pointer, so every allocation fails. To fix this, we need to create an allocator that actually returns usable memory.

## Creating a Kernel Heap

Before we can create a proper allocator, we first need to create a heap memory region from which the allocator can allocate memory. To do this, we need to define a virtual memory range for the heap region and then map this region to physical frames. See the [_"Introduction To Paging"_] post for an overview of virtual memory and page tables.

[_"Introduction To Paging"_]: @/edition-2/posts/08-paging-introduction/index.md

The first step is to define a virtual memory region for the heap. We can choose any virtual address range that we like, as long as it is not already used for a different memory region.
Let's define it as the memory starting at address `0x_4444_4444_0000` so that we can easily recognize a heap pointer later:

```rust
// in src/allocator.rs

pub const HEAP_START: usize = 0x_4444_4444_0000;
pub const HEAP_SIZE: usize = 100 * 1024; // 100 KiB
```

We set the heap size to 100 KiB for now. If we need more space in the future, we can simply increase it.

If we tried to use this heap region now, a page fault would occur since the virtual memory region is not mapped to physical memory yet. To resolve this, we create an `init_heap` function that maps the heap pages using the [`Mapper` API] that we introduced in the [_"Paging Implementation"_] post:

[`Mapper` API]: @/edition-2/posts/09-paging-implementation/index.md#using-offsetpagetable
[_"Paging Implementation"_]: @/edition-2/posts/09-paging-implementation/index.md

```rust
// in src/allocator.rs

use x86_64::{
    structures::paging::{
        mapper::MapToError, FrameAllocator, Mapper, Page, PageTableFlags, Size4KiB,
    },
    VirtAddr,
};

pub fn init_heap(
    mapper: &mut impl Mapper<Size4KiB>,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) -> Result<(), MapToError<Size4KiB>> {
    let page_range = {
        let heap_start = VirtAddr::new(HEAP_START as u64);
        let heap_end = heap_start + HEAP_SIZE - 1u64;
        let heap_start_page = Page::containing_address(heap_start);
        let heap_end_page = Page::containing_address(heap_end);
        Page::range_inclusive(heap_start_page, heap_end_page)
    };

    for page in page_range {
        let frame = frame_allocator
            .allocate_frame()
            .ok_or(MapToError::FrameAllocationFailed)?;
        let flags = PageTableFlags::PRESENT | PageTableFlags::WRITABLE;
        unsafe {
            mapper.map_to(page, frame, flags, frame_allocator)?.flush()
        };
    }

    Ok(())
}
```

The function takes mutable references to both a [`Mapper`] and a [`FrameAllocator`] instance, both limited to 4 KiB pages by using [`Size4KiB`] as the generic parameter.
The return value of the function is a [`Result`] with the unit type `()` as the success variant and a [`MapToError`] as the error variant, which is the error type returned by the [`Mapper::map_to`] method. Reusing the error type makes sense here because the `map_to` method is the main source of errors in this function.

[`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html
[`FrameAllocator`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html
[`Size4KiB`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/enum.Size4KiB.html
[`Result`]: https://doc.rust-lang.org/core/result/enum.Result.html
[`MapToError`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/enum.MapToError.html
[`Mapper::map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to

The implementation can be broken down into two parts:

- **Creating the page range:** To create a range of the pages that we want to map, we convert the `HEAP_START` pointer to a [`VirtAddr`] type. Then we calculate the heap end address from it by adding the `HEAP_SIZE`. We want an inclusive bound (the address of the last byte of the heap), so we subtract 1. Next, we convert the addresses into [`Page`] types using the [`containing_address`] function. Finally, we create a page range from the start and end pages using the [`Page::range_inclusive`] function.

- **Mapping the pages:** The second step is to map all pages of the page range we just created. For that, we iterate over these pages using a `for` loop. For each page, we do the following:

    - We allocate a physical frame that the page should be mapped to using the [`FrameAllocator::allocate_frame`] method. This method returns [`None`] when there are no more frames left.
      We deal with that case by mapping it to a [`MapToError::FrameAllocationFailed`] error through the [`Option::ok_or`] method and then applying the [question mark operator] to return early in the case of an error.

    - We set the required `PRESENT` flag and the `WRITABLE` flag for the page. With these flags, both read and write accesses are allowed, which makes sense for heap memory.

    - We use the [`Mapper::map_to`] method for creating the mapping in the active page table. The method can fail, so we use the [question mark operator] again to forward the error to the caller. On success, the method returns a [`MapperFlush`] instance that we can use to update the [_translation lookaside buffer_] using the [`flush`] method.

[`VirtAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.VirtAddr.html
[`Page`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html
[`containing_address`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html#method.containing_address
[`Page::range_inclusive`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html#method.range_inclusive
[`FrameAllocator::allocate_frame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html#tymethod.allocate_frame
[`None`]: https://doc.rust-lang.org/core/option/enum.Option.html#variant.None
[`MapToError::FrameAllocationFailed`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/enum.MapToError.html#variant.FrameAllocationFailed
[`Option::ok_or`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.ok_or
[question mark operator]: https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html
[`MapperFlush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html
[_translation lookaside buffer_]: @/edition-2/posts/08-paging-introduction/index.md#the-translation-lookaside-buffer
[`flush`]:
https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush

The final step is to call this function from our `kernel_main`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    use blog_os::allocator; // new import
    use blog_os::memory::{self, BootInfoFrameAllocator};

    println!("Hello World{}", "!");
    blog_os::init();

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    let mut mapper = unsafe { memory::init(phys_mem_offset) };
    let mut frame_allocator = unsafe {
        BootInfoFrameAllocator::init(&boot_info.memory_map)
    };

    // new
    allocator::init_heap(&mut mapper, &mut frame_allocator)
        .expect("heap initialization failed");

    let x = Box::new(41);

    // […] call `test_main` in test mode

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

We show the full function for context here. The only new lines are the `blog_os::allocator` import and the call to the `allocator::init_heap` function. In case the `init_heap` function returns an error, we panic using the [`Result::expect`] method since there is currently no sensible way for us to handle this error.

[`Result::expect`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.expect

We now have a mapped heap memory region that is ready to be used. The `Box::new` call still uses our old `Dummy` allocator, so you will still see the allocation error panic when you run it. Let's fix this by using a proper allocator.

## Using an Allocator Crate

Since implementing an allocator is somewhat complex, we start by using an external allocator crate. We will learn how to implement our own allocator in the next post.

A simple allocator crate for `no_std` applications is the [`linked_list_allocator`] crate. Its name comes from the fact that it uses a linked list data structure to keep track of deallocated memory regions. See the next post for a more detailed explanation of this approach.
To use the crate, we first need to add a dependency on it in our `Cargo.toml`:

[`linked_list_allocator`]: https://github.com/phil-opp/linked-list-allocator/

```toml
# in Cargo.toml

[dependencies]
linked_list_allocator = "0.9.0"
```

Then we can replace our dummy allocator with the allocator provided by the crate:

```rust
// in src/allocator.rs

use linked_list_allocator::LockedHeap;

#[global_allocator]
static ALLOCATOR: LockedHeap = LockedHeap::empty();
```

The struct is named `LockedHeap` because it uses the [`spinning_top::Spinlock`] type for synchronization. This is required because multiple threads could access the `ALLOCATOR` static at the same time. As always, when using a spinlock or a mutex, we need to be careful not to accidentally cause a deadlock. This means that we shouldn't perform any allocations in interrupt handlers, since they can run at an arbitrary time and might interrupt an in-progress allocation.

[`spinning_top::Spinlock`]: https://docs.rs/spinning_top/0.1.0/spinning_top/type.Spinlock.html

Setting the `LockedHeap` as global allocator is not enough. The reason is that we use the [`empty`] constructor function, which creates an allocator without any backing memory. Like our dummy allocator, it always returns an error on `alloc`. To fix this, we need to initialize the allocator after creating the heap:

[`empty`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.LockedHeap.html#method.empty

```rust
// in src/allocator.rs

pub fn init_heap(
    mapper: &mut impl Mapper<Size4KiB>,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) -> Result<(), MapToError<Size4KiB>> {
    // […] map all heap pages to physical frames

    // new
    unsafe {
        ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE);
    }

    Ok(())
}
```

We use the [`lock`] method on the inner spinlock of the `LockedHeap` type to get an exclusive reference to the wrapped [`Heap`] instance, on which we then call the [`init`] method with the heap bounds as arguments.
Because the [`init`] function already tries to write to the heap memory, we must initialize the heap only _after_ mapping the heap pages.

[`lock`]: https://docs.rs/lock_api/0.3.3/lock_api/struct.Mutex.html#method.lock
[`Heap`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html
[`init`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init

After initializing the heap, we can now use all allocation and collection types of the built-in [`alloc`] crate without error:

```rust
// in src/main.rs

use alloc::{boxed::Box, vec, vec::Vec, rc::Rc};

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // […] initialize interrupts, mapper, frame_allocator, heap

    // allocate a number on the heap
    let heap_value = Box::new(41);
    println!("heap_value at {:p}", heap_value);

    // create a dynamically sized vector
    let mut vec = Vec::new();
    for i in 0..500 {
        vec.push(i);
    }
    println!("vec at {:p}", vec.as_slice());

    // create a reference counted vector -> will be freed when count reaches 0
    let reference_counted = Rc::new(vec![1, 2, 3]);
    let cloned_reference = reference_counted.clone();
    println!("current reference count is {}", Rc::strong_count(&cloned_reference));
    core::mem::drop(reference_counted);
    println!("reference count is {} now", Rc::strong_count(&cloned_reference));

    // […] call `test_main` in test context
    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

This code example shows some uses of the [`Box`], [`Vec`], and [`Rc`] types. For the `Box` and `Vec` types, we print the underlying heap pointers using the [`{:p}` formatting specifier][especificador de formatação `{:p}`]. To showcase `Rc`, we create a reference-counted heap value and use the [`Rc::strong_count`] function to print the current reference count before and after dropping one instance (using [`core::mem::drop`]).
[`Vec`]: https://doc.rust-lang.org/alloc/vec/
[`Rc`]: https://doc.rust-lang.org/alloc/rc/
[especificador de formatação `{:p}`]: https://doc.rust-lang.org/core/fmt/trait.Pointer.html
[`Rc::strong_count`]: https://doc.rust-lang.org/alloc/rc/struct.Rc.html#method.strong_count
[`core::mem::drop`]: https://doc.rust-lang.org/core/mem/fn.drop.html

When we run it, we see the following:

![QEMU printing "heap_value at 0x444444440000", "vec at 0x444444440800", "current reference count is 2", "reference count is 1 now"](qemu-alloc-showcase.png)

As expected, we see that the `Box` and `Vec` values live on the heap, as indicated by the pointer starting with the `0x_4444_4444_*` prefix. The reference counted value also behaves as expected, with the reference count being 2 after the `clone` call, and 1 again after one of the instances was dropped.

The reason that the vector starts at offset `0x800` is not that the boxed value is `0x800` bytes large, but the [reallocations][realocações] that occur when the vector needs to increase its capacity. For example, when the vector's capacity is 32 and we try to add the next element, the vector allocates a new backing array with a capacity of 64 behind the scenes and copies all elements over. Then it frees the old allocation.
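
The capacity growth described above can be observed on a hosted target with the standard library. The following is only a minimal sketch and not part of the kernel: the helper name `capacity_changes` is made up for this illustration, and the exact capacities it reports depend on the growth strategy of the `Vec` implementation in use, which the standard library does not guarantee.

```rust
/// Pushes `n` elements into a fresh vector and records every capacity change.
/// Returns `(length after the push, new capacity)` pairs.
/// (Hypothetical helper, used only for this illustration.)
fn capacity_changes(n: usize) -> Vec<(usize, usize)> {
    let mut values: Vec<usize> = Vec::new();
    let mut changes = Vec::new();
    let mut last_capacity = values.capacity(); // an empty `Vec<usize>` has capacity 0

    for i in 0..n {
        values.push(i);
        // A changed capacity means the vector requested a larger backing array.
        if values.capacity() != last_capacity {
            last_capacity = values.capacity();
            changes.push((values.len(), last_capacity));
        }
    }
    changes
}

fn main() {
    for (len, capacity) in capacity_changes(500) {
        println!("after {:>3} elements: capacity {}", len, capacity);
    }
}
```

Each reported change corresponds to a request for a larger backing array. In our kernel, such a request is served by allocating a new block, copying the elements, and freeing the old block, as described above.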
[realocações]: https://doc.rust-lang.org/alloc/vec/struct.Vec.html#capacity-and-reallocation

Of course, there are many more allocation and collection types in the `alloc` crate that we can now all use in our kernel, including:

- the thread-safe reference counted pointer [`Arc`]
- the owned string type [`String`] and the [`format!`] macro
- [`LinkedList`]
- the growable ring buffer [`VecDeque`]
- the [`BinaryHeap`] priority queue
- [`BTreeMap`] and [`BTreeSet`]

[`Arc`]: https://doc.rust-lang.org/alloc/sync/struct.Arc.html
[`String`]: https://doc.rust-lang.org/alloc/string/struct.String.html
[`format!`]: https://doc.rust-lang.org/alloc/macro.format.html
[`LinkedList`]: https://doc.rust-lang.org/alloc/collections/linked_list/struct.LinkedList.html
[`VecDeque`]: https://doc.rust-lang.org/alloc/collections/vec_deque/struct.VecDeque.html
[`BinaryHeap`]: https://doc.rust-lang.org/alloc/collections/binary_heap/struct.BinaryHeap.html
[`BTreeMap`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html
[`BTreeSet`]: https://doc.rust-lang.org/alloc/collections/btree_set/struct.BTreeSet.html

These types will become very useful when we want to implement thread lists, scheduling queues, or support for async/await.

## Adding a Test

To ensure that we don't accidentally break our new allocation code, we should add an integration test for it. We start by creating a new `tests/heap_allocation.rs` file with the following content:

```rust
// in tests/heap_allocation.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(blog_os::test_runner)]
#![reexport_test_harness_main = "test_main"]

extern crate alloc;

use bootloader::{entry_point, BootInfo};
use core::panic::PanicInfo;

entry_point!(main);

fn main(boot_info: &'static BootInfo) -> ! {
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> !
{
    blog_os::test_panic_handler(info)
}
```

We reuse the `test_runner` and `test_panic_handler` functions from our `lib.rs`. Since we want to test allocations, we enable the `alloc` crate through the `extern crate alloc` statement. For more information about the test boilerplate, check out the [_Testing_] post.

[_Testing_]: @/edition-2/posts/04-testing/index.md

The implementation of the `main` function looks like this:

```rust
// in tests/heap_allocation.rs

fn main(boot_info: &'static BootInfo) -> ! {
    use blog_os::allocator;
    use blog_os::memory::{self, BootInfoFrameAllocator};
    use x86_64::VirtAddr;

    blog_os::init();
    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    let mut mapper = unsafe { memory::init(phys_mem_offset) };
    let mut frame_allocator = unsafe {
        BootInfoFrameAllocator::init(&boot_info.memory_map)
    };
    allocator::init_heap(&mut mapper, &mut frame_allocator)
        .expect("heap initialization failed");

    test_main();
    loop {}
}
```

It is very similar to the `kernel_main` function in our `main.rs`, with the differences that we don't invoke `println`, don't include any example allocations, and call `test_main` unconditionally.

Now we're ready to add a few test cases. First, we add a test that performs some simple allocations using [`Box`] and checks the allocated values to ensure that basic allocations work:

```rust
// in tests/heap_allocation.rs

use alloc::boxed::Box;

#[test_case]
fn simple_allocation() {
    let heap_value_1 = Box::new(41);
    let heap_value_2 = Box::new(13);
    assert_eq!(*heap_value_1, 41);
    assert_eq!(*heap_value_2, 13);
}
```

Most importantly, this test verifies that no allocation error occurs.
Next, we iteratively build a large vector, to test both large allocations and multiple allocations (due to reallocations):

```rust
// in tests/heap_allocation.rs

use alloc::vec::Vec;

#[test_case]
fn large_vec() {
    let n = 1000;
    let mut vec = Vec::new();
    for i in 0..n {
        vec.push(i);
    }
    assert_eq!(vec.iter().sum::<u64>(), (n - 1) * n / 2);
}
```

We verify the sum by comparing it with the formula for the [n-th partial sum]. This gives us some confidence that the allocated values are all correct.

[n-th partial sum]: https://en.wikipedia.org/wiki/1_%2B_2_%2B_3_%2B_4_%2B_%E2%8B%AF#Partial_sums

As a third test, we create `HEAP_SIZE` (102,400) small allocations one after another:

```rust
// in tests/heap_allocation.rs

use blog_os::allocator::HEAP_SIZE;

#[test_case]
fn many_boxes() {
    for i in 0..HEAP_SIZE {
        let x = Box::new(i);
        assert_eq!(*x, i);
    }
}
```

This test ensures that the allocator reuses freed memory for subsequent allocations since it would run out of memory otherwise. This might seem like an obvious requirement for an allocator, but there are allocator designs that don't do this. An example is the bump allocator design that will be explained in the next post.

Let's run our new integration test:

```
> cargo test --test heap_allocation
[…]
Running 3 tests
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
```

All three tests succeeded! You can also invoke `cargo test` (without the `--test` argument) to run all unit and integration tests.

## Summary

This post gave an introduction to dynamic memory and explained why and where it is needed. We saw how Rust's borrow checker prevents common vulnerabilities and learned how Rust's allocation API works.

After creating a minimal implementation of Rust's allocator interface using a dummy allocator, we created a proper heap memory region for our kernel.
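For that, we defined a virtual address range for the heap and then mapped all pages of that range to physical frames using the `Mapper` and `FrameAllocator` from the previous post.

Finally, we added a dependency on the `linked_list_allocator` crate to add a proper allocator to our kernel. With this allocator, we were able to use `Box`, `Vec`, and other allocation and collection types from the `alloc` crate.

## What's next?

While we already added heap allocation support in this post, we left most of the work to the `linked_list_allocator` crate. The next post will show in detail how an allocator can be implemented from scratch. It will present multiple possible allocator designs, show how to implement simple versions of them, and explain their advantages and drawbacks.

================================================
FILE: blog/content/edition-2/posts/10-heap-allocation/index.zh-CN.md
================================================
+++
title = "Heap Allocation"
weight = 10
path = "zh-CN/heap-allocation"
date = 2019-06-26

[extra]
chapter = "Memory Management"
# Please update this when updating the translation
translation_based_on_commit = "2edf0221a34e3dbfd45cf5d45309689accb14e50"
# GitHub usernames of the people that translated this post
translators = ["Liuliuliu7"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = []
+++

This post adds support for heap allocation to our kernel. First, it gives an introduction to dynamic memory and shows how Rust's borrow checker prevents common allocation errors. It then implements the basic allocation interface of Rust, creates a heap memory region, and sets up an allocator crate. At the end of this post, all the allocation and collection types of the built-in `alloc` crate will be available to our kernel.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-10`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-10

## Local and Static Variables

We currently use two types of variables in our kernel: local variables and `static` variables. Local variables are stored on the [call stack] and are only valid until the surrounding function returns. Static variables are stored at a fixed memory location and live for the complete lifetime of the program.

### Local Variables

Local variables are stored on the [call stack], which is a [stack data structure] that supports `push` and `pop` operations. On each function call, the compiler pushes the parameters, the return address, and the local variables of the called function:

[call stack]: https://en.wikipedia.org/wiki/Call_stack
[stack data structure]: https://en.wikipedia.org/wiki/Stack_(abstract_data_type)

![An `outer()` and an `inner(i: usize)` function, where `outer` calls `inner(1)`. Both have some local variables. The call stack contains the following slots: the local variables of outer, the argument `i = 1`, the return address, and the local variables of inner.](call-stack.svg)

The above example shows the call stack after the `outer` function called the `inner` function. The call stack contains the local variables of `outer` first. On the `inner` call, the parameter `1` and the return address were pushed. Then control was transferred to `inner`, which pushed its local variables.

After the `inner` function returns, its part of the call stack is popped again and only the local variables of `outer` remain:

![The call stack containing only the local variables of `outer`](call-stack-return.svg)

We see that the local variables of `inner` only live until the function returns. The Rust compiler enforces these lifetimes and throws an error when we use a value outside of its lifetime, for example when we try to return a reference to a local variable:

```rust
fn inner(i: usize) -> &'static u32 {
    let z = [1, 2, 3];
    &z[i]
}
```

[Run the example on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=6186a0f3a54f468e1de8894996d12819)

While returning a reference to a local variable makes no sense in this example, there are cases where we want a variable to live longer than the function. For example, when loading an interrupt descriptor table in our kernel, we had to use a `static` variable to extend the lifetime.

### Static Variables

Static variables are stored at a fixed memory location separate from the stack. This memory location is assigned at compile time by the linker and encoded in the executable. Statics live for the complete runtime of the program, so they have the `'static` lifetime and can always be referenced from local variables:

![The same outer/inner example, except that inner has a `static Z: [u32; 3] = [1,2,3];` and returns a `&Z[i]` reference](call-stack-static.svg)

When the `inner` function returns in the above example, its part of the call stack is destroyed. The static variables live in a separate memory range that is never destroyed, so the `&Z[1]` reference is still valid after the return.

Apart from the `'static` lifetime, static variables also have the useful property that their location is known at compile time, so that no reference is needed for accessing them. We utilized that property for our `println` macro: by using a static [`Writer`] internally, no `&mut Writer` reference is needed to invoke the macro. This is especially useful in [exception handlers], where we don't have access to any additional variables.

[`Writer`]: @/edition-2/posts/03-vga-text-buffer/index.md#a-global-interface
[exception handlers]: @/edition-2/posts/05-cpu-exceptions/index.md#implementation

However, this property of static variables brings a crucial drawback: they are read-only by default. Rust enforces this because a [data race] would occur if multiple threads modified a static variable at the same time. The only way to modify a static variable is to encapsulate it in a [`Mutex`] type, which ensures that only a single `&mut` reference exists at any point in time. We already used a `Mutex` for our [static VGA buffer `Writer`][vga mutex].

[data race]: https://doc.rust-lang.org/nomicon/races.html
[`Mutex`]: https://docs.rs/spin/0.5.2/spin/struct.Mutex.html
[vga mutex]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks

## Dynamic Memory

Local and static variables are already very powerful together and enable most use cases. However, they both have their limitations:

- **Local variables** only live until the end of the surrounding function or block. This is because they live on the call stack and are destroyed after the surrounding function returns.
- **Static variables** always live for the complete runtime of the program, so there is no way to reclaim and reuse their memory when they're no longer needed. Also, they have unclear ownership semantics and are accessible from all functions, so they need to be protected by a [`Mutex`] when we want to modify them.

Another limitation of local and static variables is that they have a fixed size, so they can't store a collection that grows dynamically. (There are proposals for [unsized rvalues] in Rust that would allow dynamically sized local variables, but they only work in some specific cases.)

[unsized rvalues]: https://github.com/rust-lang/rust/issues/48055

To circumvent these drawbacks, programming languages often support a third memory region for storing variables, called the **heap**. The heap supports _dynamic memory allocation_ at runtime through two functions called `allocate` and `deallocate`. The `allocate` function returns a free chunk of memory of the specified size that can be used to store a variable. This variable then lives until it is freed by a call to the `deallocate` function.

Let's go through an example:

![The inner function calls `allocate(size_of([u32; 3]))`, writes `z.write([1,2,3]);`, and returns `(z as *mut u32).offset(i)`. On the returned value `y`, the outer function performs `deallocate(y, size_of(u32))`.](call-stack-heap.svg)

Here the `inner` function uses heap memory instead of static variables for storing `z`. It first allocates a memory block of the required size, which returns a `*mut u32` [raw pointer]. It then uses the [`ptr::write`] method to write the array `[1,2,3]` to it. In the last step, it uses the [`offset`] function to calculate a pointer to the `i`-th element and returns it. (For brevity, this example function omits some required casts and unsafe blocks.)

[raw pointer]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#dereferencing-a-raw-pointer
[`ptr::write`]: https://doc.rust-lang.org/core/ptr/fn.write.html
[`offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.offset

The allocated memory lives until it is explicitly freed through a call to `deallocate`. Thus, the returned pointer is still valid even after `inner` returned and its part of the call stack was destroyed. The advantage of heap memory compared to static memory is that the memory can be reused after it is freed, which we do through the `deallocate` call in `outer`. After that call, the situation looks like this:

![The call stack contains the local variables of `outer`, the heap contains `z[0]` and `z[2]`, but no longer `z[1]`.](call-stack-heap-freed.svg)

We see that the `z[1]` slot is free again and can be reused for the next `allocate` call. However, `z[0]` and `z[2]` are never freed. Such a bug is called a **memory leak** and is often the cause of excessive memory consumption of programs (just imagine what happens when we call `inner` repeatedly in a loop). This may seem bad, but there are much more dangerous types of bugs that can happen with dynamic allocation.

### Common Errors

Apart from memory leaks, which are unfortunate but don't make the program vulnerable to attackers, there are two common types of bugs with more severe consequences:

- **Use-after-free**: continuing to use a variable after calling `deallocate` on it. Such a bug causes undefined behavior and can often be exploited by attackers to execute arbitrary code.
- **Double-free**: accidentally freeing a variable twice. This is problematic because it might free a different allocation that was allocated in the same spot after the first `deallocate` call. Thus, it can lead to a use-after-free vulnerability again.

These types of vulnerabilities are commonly known, yet they still turn up regularly. For example, a [use-after-free vulnerability][linux vulnerability] found in Linux in 2019 allowed arbitrary code execution. A web search like `use-after-free linux {year}` will usually yield results. This shows that even the best programmers are not always able to correctly handle dynamic memory in complex projects.

[linux vulnerability]: https://securityboulevard.com/2019/02/linux-use-after-free-vulnerability-found-in-linux-2-6-through-4-20-11/

To avoid these issues, many languages, such as Java or Python, manage dynamic memory automatically using a technique called _garbage collection_. The programmer never calls `deallocate` manually. Instead, the program is regularly paused and scanned for unused heap variables, which are then automatically deallocated, so the above vulnerabilities cannot occur. The drawbacks are the performance overhead of the regular scan and the potentially long pause times.

Rust takes a different approach to the problem: it uses a concept called [_ownership_] that is able to check the correctness of dynamic memory operations at compile time. Thus, no garbage collection is needed to avoid the mentioned vulnerabilities, which means that there is no performance overhead. Another advantage of this approach is that the programmer still has fine-grained control over the use of dynamic memory, just like with C or C++.

[_ownership_]: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html

### Allocations in Rust

Instead of letting the programmer call `allocate` and `deallocate` directly, the Rust standard library provides abstraction types that call these functions implicitly. The most important type is [**`Box`**], which is an abstraction for a heap-allocated value. It provides a [`Box::new`] constructor function that takes a value, calls `allocate` with the size of the value, and then moves the value to the newly allocated slot on the heap. To free the heap memory again, the `Box` type implements the [`Drop` trait] to call `deallocate` when it goes out of scope:

[**`Box`**]: https://doc.rust-lang.org/std/boxed/index.html
[`Box::new`]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html#method.new
[`Drop` trait]: https://doc.rust-lang.org/book/ch15-03-drop.html

```rust
{
    let z = Box::new([1,2,3]);
    […]
} // z goes out of scope and `deallocate` is called
```

This pattern has the strange name [_resource acquisition is initialization_] (or _RAII_ for short). It originated in C++, where it is used to implement a similar abstraction type called [`std::unique_ptr`].

[_resource acquisition is initialization_]: https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization
[`std::unique_ptr`]: https://en.cppreference.com/w/cpp/memory/unique_ptr

Such a type alone does not suffice to prevent all use-after-free bugs, since programmers can still hold on to references after the `Box` goes out of scope and the corresponding heap memory is deallocated:

```rust
let x = {
    let z = Box::new([1,2,3]);
    &z[1]
}; // z goes out of scope and `deallocate` is called
println!("{}", x);
```

This is where Rust's ownership comes in. It assigns an abstract [lifetime] to each reference, which is the scope in which the reference is valid. In the above example, the `x` reference is taken from the `z` array, so it becomes invalid after `z` goes out of scope. When you run the above example [on the playground][playground-2], you see that the Rust compiler indeed throws an error:

[lifetime]: https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html
[playground-2]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=28180d8de7b62c6b4a681a7b1f745a48

```
error[E0597]: `z[_]` does not live long enough
 --> src/main.rs:4:9
  |
2 |     let x = {
  |         - borrow later stored here
3 |         let z = Box::new([1,2,3]);
  |             - binding `z` declared here
4 |         &z[1]
  |         ^^^^^ borrowed value does not live long enough
5 |     }; // z goes out of scope and `deallocate` is called
  |     - `z[_]` dropped here while still borrowed
```

The terminology can be a bit confusing at first. Taking a reference to a value is called _borrowing_ the value since it's similar to a borrow in real life: you have temporary access to an object but need to return it at some point, and you must not destroy it. By checking that all borrows end before an object is destroyed, the Rust compiler can guarantee that no use-after-free situation can occur.

Rust's ownership system goes even further, preventing not only use-after-free bugs but also providing complete [_memory safety_], as garbage collected languages like Java or Python do. Additionally, it guarantees [_thread safety_] and is thus even safer than those languages in multi-threaded code. And most importantly, all these checks happen at compile time, so there is no runtime overhead compared to hand-written memory management in C.

[_memory safety_]: https://en.wikipedia.org/wiki/Memory_safety
[_thread safety_]: https://en.wikipedia.org/wiki/Thread_safety

### Use Cases

We now know the basics of dynamic memory allocation in Rust, but when should we use it? We've come really far with our kernel without dynamic memory allocation, so why do we need it now?

First, dynamic memory allocation always comes with a bit of performance overhead since we need to find a free slot on the heap for every allocation. For this reason, local variables are generally preferable, especially in performance-sensitive kernel code. However, there are cases where dynamic memory allocation is the best choice.

As a basic rule, dynamic memory is required for variables that have a dynamic lifetime or a variable size. The most important type with a dynamic lifetime is [**`Rc`**], which counts the references to its wrapped value and deallocates it after all references have gone out of scope. Examples for types with a variable size are [**`Vec`**], [**`String`**], and other [collection types]. These types work by allocating a larger amount of memory when they become full, copying over all elements, and then deallocating the old allocation.

[**`Rc`**]: https://doc.rust-lang.org/alloc/rc/index.html
[**`Vec`**]: https://doc.rust-lang.org/alloc/vec/index.html
[**`String`**]: https://doc.rust-lang.org/alloc/string/index.html
[collection types]: https://doc.rust-lang.org/alloc/collections/index.html

For our kernel, we will mostly need the collection types, for example, to store a list of active tasks when implementing multitasking in the future.

## The Allocator Interface

The first step in implementing a heap allocator is to add a dependency on the built-in [`alloc`] crate. Like the [`core`] crate, it is a subset of the standard library that additionally contains the allocation and collection types. To add the dependency, we add the following to our `lib.rs`:

[`alloc`]: https://doc.rust-lang.org/alloc/
[`core`]: https://doc.rust-lang.org/core/

```rust
// in src/lib.rs

extern crate alloc;
```

Contrary to normal dependencies, we don't need to modify the `Cargo.toml`. The reason is that the `alloc` crate ships with the Rust compiler as part of the standard library, so the compiler already knows about the crate. By adding this `extern crate` statement, we specify that the compiler should try to include it. (Historically, all dependencies needed an `extern crate` statement, which is now optional.)

Since we are compiling for a custom target, we can't use the precompiled version of `alloc` that is shipped with the Rust installation. Instead, we have to tell cargo to recompile the crate from source by adding it to the `unstable.build-std` array in our `.cargo/config.toml` file:

```toml
# in .cargo/config.toml

[unstable]
build-std = ["core", "compiler_builtins", "alloc"]
```

Now the compiler will recompile and include the `alloc` crate.

The reason that the `alloc` crate is disabled by default in `#[no_std]` crates is that it has additional requirements. When we try to compile our project now, we see an error:

```
error: no global memory allocator found but one is required; link to std or add #[global_allocator] to a static item that implements the GlobalAlloc trait.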
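```

The error occurs because the `alloc` crate requires a heap allocator, which is an object that provides the `allocate` and `deallocate` functions. In Rust, heap allocators are described by the [`GlobalAlloc`] trait, which is mentioned in the error message. To set the heap allocator for the crate, the `#[global_allocator]` attribute must be applied to a `static` variable that implements the `GlobalAlloc` trait.

[`GlobalAlloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html

### The `GlobalAlloc` Trait

The [`GlobalAlloc`] trait defines the functions that a heap allocator must provide. The trait is special because it is almost never used directly by the programmer. Instead, the compiler automatically inserts the appropriate calls to the trait methods when the allocation and collection types of `alloc` are used.

Since we will need to implement the trait for all our allocator types, it is worth taking a closer look at its declaration:

```rust
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... }
    unsafe fn realloc(
        &self,
        ptr: *mut u8,
        layout: Layout,
        new_size: usize
    ) -> *mut u8 { ... }
}
```

It defines the two required methods [`alloc`] and [`dealloc`], which correspond to the `allocate` and `deallocate` functions we used in our examples:

- The [`alloc`] method takes a [`Layout`] instance as an argument, which describes the desired size and alignment that the allocated memory should have. It returns a [raw pointer] to the first byte of the allocated memory block. Instead of an explicit error value, the `alloc` method returns a null pointer to signal an allocation error. This is a bit non-idiomatic, but it has the advantage that wrapping existing system allocators is easy since they use the same convention.
- The [`dealloc`] method is the counterpart and is responsible for freeing a memory block again. It receives two arguments: the pointer returned by `alloc` and the [`Layout`] that was used for the allocation.

[`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc
[`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc
[`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html

The trait additionally defines the two methods `alloc_zeroed` and `realloc` with default implementations:

- **`alloc_zeroed`** is equivalent to calling `alloc` and then setting the allocated memory block to zero, which is exactly what the provided default implementation does. An allocator implementation can override the default with a more efficient custom implementation.
- **`realloc`** allows growing or shrinking an allocation. The default implementation allocates a new memory block and copies over the content of the previous allocation. Again, an allocator implementation can provide a more efficient implementation, for example by growing or shrinking the allocation in place.

#### Unsafety

One thing to notice is that both the trait itself and all trait methods are declared as `unsafe`:

- The reason for declaring the trait as `unsafe` is that the programmer must guarantee that the trait implementation for an allocator type is correct. For example, the `alloc` method must never return a memory block that is already used somewhere else because this would cause undefined behavior.
- Similarly, the reason that the methods are `unsafe` is that the caller must ensure various invariants when calling the methods, for example, that the `Layout` passed to `alloc` specifies a non-zero size. This is not really relevant in practice since the methods are normally called directly by the compiler, which ensures that the requirements are met.

### A Dummy Allocator

Now that we know what an allocator type should provide, we can create a simple dummy allocator. For that, we create a new `allocator` module:

```rust
// in src/lib.rs

pub mod allocator;
```

Our dummy allocator does the absolute minimum to implement the trait and always returns an error when `alloc` is called. It looks like this:

```rust
// in src/allocator.rs

use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr::null_mut;

pub struct Dummy;

unsafe impl GlobalAlloc for Dummy {
    unsafe fn alloc(&self, _layout: Layout) -> *mut u8 {
        null_mut()
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        panic!("dealloc should be never called")
    }
}
```

The struct does not need any fields, so we create it as a [zero-sized type]. As mentioned above, we always return the null pointer from `alloc`, which corresponds to an allocation error. Since the allocator never returns any memory, a call to `dealloc` should never occur. For this reason, we simply panic in the `dealloc` method. The `alloc_zeroed` and `realloc` methods have default implementations, so we don't need to provide implementations for them.

[zero-sized type]: https://doc.rust-lang.org/nomicon/exotic-sizes.html#zero-sized-types-zsts

We now have a dummy allocator, but we still have to tell the Rust compiler that it should use this allocator. This is where the `#[global_allocator]` attribute comes in.

### The `#[global_allocator]` Attribute

The `#[global_allocator]` attribute tells the Rust compiler which allocator instance it should use as the global heap allocator. The attribute is only applicable to a `static` that implements the `GlobalAlloc` trait. Let's register an instance of our `Dummy` allocator as the global allocator:

```rust
// in src/allocator.rs

#[global_allocator]
static ALLOCATOR: Dummy = Dummy;
```

Since the `Dummy` allocator is a zero-sized type, we don't need to specify any fields in the initialization expression.

With this static, the compilation errors should be fixed. Now we can use the allocation and collection types of `alloc`. For example, we can use a [`Box`] to allocate a value on the heap:

[`Box`]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html

```rust
// in src/main.rs

extern crate alloc;

use alloc::boxed::Box;

fn kernel_main(boot_info: &'static BootInfo) -> !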
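{
    // […] print "Hello World!", call `init`, create `mapper` and `frame_allocator`

    let x = Box::new(41);

    // […] call `test_main` in test mode

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

Note that we need to specify the `extern crate alloc` statement in our `main.rs` too. This is required because the `lib.rs` and `main.rs` parts are treated as separate crates. However, we don't need to create another `#[global_allocator]` static because the global allocator applies to all crates in the project. In fact, specifying an additional allocator in another crate would be an error.

When we run the above code, we see that a panic occurs:

![QEMU printing "panicked at 'allocation error: Layout { size_: 4, align_: 4 }', src/lib.rs:89:5"](qemu-dummy-output.png)

The panic occurs because the `Box::new` function implicitly calls the `alloc` function of the global allocator. Our dummy allocator always returns a null pointer, so every allocation fails. To fix this, we need to create an allocator that actually returns usable memory.

## Creating a Kernel Heap

Before we can create a proper allocator, we first need to create a heap memory region from which the allocator can allocate memory. To do this, we need to define a virtual memory range for the heap region and then map this region to physical memory. See the [_"Introduction To Paging"_] post for an overview of virtual memory and page tables.

[_"Introduction To Paging"_]: @/edition-2/posts/08-paging-introduction/index.md

The first step is to define a virtual memory region for the heap. We can choose any virtual address range that we like, as long as it is not already used for a different memory region. Let's define it as the memory starting at address `0x_4444_4444_0000` so that we can easily recognize a heap pointer later:

```rust
// in src/allocator.rs

pub const HEAP_START: usize = 0x_4444_4444_0000;
pub const HEAP_SIZE: usize = 100 * 1024; // 100 KiB
```

We set the heap size to 100 KiB for now. If we need more space in the future, we can simply increase it.

If we tried to use this heap region now, a page fault would occur since the virtual memory region is not mapped to physical memory yet. To resolve this, we create an `init_heap` function that maps the heap pages using the [`Mapper` API] that we introduced in the [_"Paging Implementation"_] post:

[`Mapper` API]: @/edition-2/posts/09-paging-implementation/index.md#using-offsetpagetable
[_"Paging Implementation"_]: @/edition-2/posts/09-paging-implementation/index.md

```rust
// in src/allocator.rs

use x86_64::{
    structures::paging::{
        mapper::MapToError, FrameAllocator, Mapper, Page, PageTableFlags, Size4KiB,
    },
    VirtAddr,
};

pub fn init_heap(
    mapper: &mut impl Mapper<Size4KiB>,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) -> Result<(), MapToError<Size4KiB>> {
    let page_range = {
        let heap_start = VirtAddr::new(HEAP_START as u64);
        let heap_end = heap_start + HEAP_SIZE - 1u64;
        let heap_start_page = Page::containing_address(heap_start);
        let heap_end_page = Page::containing_address(heap_end);
        Page::range_inclusive(heap_start_page, heap_end_page)
    };

    for page in page_range {
        let frame = frame_allocator
            .allocate_frame()
            .ok_or(MapToError::FrameAllocationFailed)?;
        let flags = PageTableFlags::PRESENT | PageTableFlags::WRITABLE;
        unsafe {
            mapper.map_to(page, frame, flags, frame_allocator)?.flush()
        };
    }

    Ok(())
}
```

The function takes mutable references to a [`Mapper`] and a [`FrameAllocator`] instance, both limited to 4 KiB pages by using [`Size4KiB`] as the generic parameter. The return value of the function is a [`Result`] with the unit type `()` as the success variant and a [`MapToError`] as the error variant, which is the error type returned by the [`Mapper::map_to`] method. Reusing the error type makes sense here because the `map_to` method is the main source of errors in this function.

[`Mapper`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html
[`FrameAllocator`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html
[`Size4KiB`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/enum.Size4KiB.html
[`Result`]: https://doc.rust-lang.org/core/result/enum.Result.html
[`MapToError`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/enum.MapToError.html
[`Mapper::map_to`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/trait.Mapper.html#method.map_to

The implementation can be broken down into two parts:

- **Creating the page range:** To create a range of the pages that we want to map, we convert the `HEAP_START` pointer to a [`VirtAddr`] type. Then we calculate the heap end address from it by adding the `HEAP_SIZE`. We want an inclusive bound (the address of the last byte of the heap), so we subtract 1. Next, we convert the addresses into [`Page`] types using the [`containing_address`] function. Finally, we create a page range from the start and end pages using the [`Page::range_inclusive`] function.

- **Mapping the pages:** The second step is to map all pages of the page range we just created. For that, we iterate over these pages using a `for` loop. For each page, we do the following:

    - We allocate a physical frame that the page should be mapped to using the [`FrameAllocator::allocate_frame`] method. This method returns [`None`] when there are no more frames left. We deal with that case by mapping it to a [`MapToError::FrameAllocationFailed`] error through the [`Option::ok_or`] method and then applying the [question mark operator] to return early in the case of an error.

    - We set the required `PRESENT` flag and the `WRITABLE` flag for the page. With these flags, both read and write accesses are allowed, which makes sense for heap memory.

    - We use the [`Mapper::map_to`] method for creating the mapping in the active page table. The method can fail, so we use the [question mark operator] again to forward the error to the caller. On success, the method returns a [`MapperFlush`] instance that we can use to update the [_translation lookaside buffer_] (TLB for short) using the [`flush`] method.

[`VirtAddr`]: https://docs.rs/x86_64/0.14.2/x86_64/addr/struct.VirtAddr.html
[`Page`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html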
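
The page range arithmetic from the first step can be checked with plain integers. This is a sketch under assumptions, not the kernel code: `PAGE_SIZE` and the helper `heap_page_span` are hypothetical names used only here, and page numbers are computed by integer division instead of the `Page` type of the `x86_64` crate.

```rust
const PAGE_SIZE: u64 = 4096; // 4 KiB pages (assumed name, for illustration only)
const HEAP_START: u64 = 0x_4444_4444_0000;
const HEAP_SIZE: u64 = 100 * 1024; // 100 KiB

/// Returns the numbers of the first and the last page covered by the region.
/// Both bounds are inclusive. (Hypothetical helper, not part of the kernel.)
fn heap_page_span(start: u64, size: u64) -> (u64, u64) {
    // Address of the last byte of the region, hence the `- 1`.
    let end_inclusive = start + size - 1;
    (start / PAGE_SIZE, end_inclusive / PAGE_SIZE)
}

fn main() {
    let (first, last) = heap_page_span(HEAP_START, HEAP_SIZE);
    println!("heap spans {} pages", last - first + 1);

    // Without the `- 1`, the end address would belong to the page after the heap.
    let page_of_exclusive_end = (HEAP_START + HEAP_SIZE) / PAGE_SIZE;
    println!("exclusive end lies {} pages after the first", page_of_exclusive_end - first);
}
```

With a 100 KiB heap that starts on a page boundary, the inclusive span is 25 pages. Using the exclusive end address as the upper bound of an inclusive range would map a 26th page that the heap never uses.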
[`containing_address`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html#method.containing_address
[`Page::range_inclusive`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/page/struct.Page.html#method.range_inclusive
[`FrameAllocator::allocate_frame`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/trait.FrameAllocator.html#tymethod.allocate_frame
[`None`]: https://doc.rust-lang.org/core/option/enum.Option.html#variant.None
[`MapToError::FrameAllocationFailed`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/enum.MapToError.html#variant.FrameAllocationFailed
[`Option::ok_or`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.ok_or
[question mark operator]: https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html
[`MapperFlush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html
[_translation lookaside buffer_]: @/edition-2/posts/08-paging-introduction/index.md#the-translation-lookaside-buffer
[`flush`]: https://docs.rs/x86_64/0.14.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush

The final step is to call this function from our `kernel_main`:

```rust
// in src/main.rs

fn kernel_main(boot_info: &'static BootInfo) -> !
{
    use blog_os::allocator; // new import
    use blog_os::memory::{self, BootInfoFrameAllocator};

    println!("Hello World{}", "!");
    blog_os::init();

    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    let mut mapper = unsafe { memory::init(phys_mem_offset) };
    let mut frame_allocator = unsafe {
        BootInfoFrameAllocator::init(&boot_info.memory_map)
    };

    // new
    allocator::init_heap(&mut mapper, &mut frame_allocator)
        .expect("heap initialization failed");

    let x = Box::new(41);

    // […] call `test_main` in test mode

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

We show the full function for context here. The only new lines are the `blog_os::allocator` import and the call to the `allocator::init_heap` function. In case the `init_heap` function returns an error, we panic using the [`Result::expect`] method since there is currently no sensible way for us to handle this error.

[`Result::expect`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.expect

We now have a mapped heap memory region that is ready to be used. The `Box::new` call still uses our old `Dummy` allocator, so you will still see the "out of memory" error when you run it. Let's fix this by using a proper allocator.

## Using an Allocator Crate

Since implementing an allocator is somewhat complex, we start by using an external allocator crate. We will learn how to implement our own allocator in the next post.

A simple allocator crate for `no_std` applications is the [`linked_list_allocator`] crate. Its name comes from the fact that it uses a linked list data structure to keep track of deallocated memory regions. See the next post for a more detailed explanation of this approach.

To use the crate, we first need to add a dependency on it in our `Cargo.toml`:

```toml
# in Cargo.toml

[dependencies]
linked_list_allocator = "0.9.0"
```

Then we can replace our dummy allocator with the allocator provided by the crate:

```rust
// in src/allocator.rs

use linked_list_allocator::LockedHeap;

#[global_allocator]
static ALLOCATOR: LockedHeap = LockedHeap::empty();
```

The struct is named `LockedHeap` because it uses the [`spinning_top::Spinlock`] type for synchronization. This is required because multiple threads could access the `ALLOCATOR` static at the same time. As always, when using a spinlock or a mutex, we need to be careful not to cause a deadlock. This means that we shouldn't perform any allocations in interrupt handlers, since they can run at an arbitrary time and might interrupt an in-progress allocation.

[`spinning_top::Spinlock`]: https://docs.rs/spinning_top/0.1.0/spinning_top/type.Spinlock.html

Setting the `LockedHeap` as the global allocator is not enough. The reason is that we use the [`empty`] constructor function, which creates an allocator without any backing memory. Like our dummy allocator, it always returns an error on `alloc`. To fix this, we need to initialize the allocator after creating the heap:

[`empty`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.LockedHeap.html#method.empty

```rust
// in src/allocator.rs

pub fn init_heap(
    mapper: &mut impl Mapper<Size4KiB>,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) -> Result<(), MapToError<Size4KiB>> {
    // […] map all heap pages to physical frames

    // new
    unsafe {
        ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE);
    }

    Ok(())
}
```

We use the [`lock`] method on the inner spinlock of the `LockedHeap` type to get an exclusive reference to the wrapped [`Heap`] instance, on which we then call the [`init`] method with the heap bounds as arguments. Because the [`init`] function already tries to write to the heap memory, we must initialize the heap only _after_ mapping the heap pages.

[`lock`]: https://docs.rs/lock_api/0.3.3/lock_api/struct.Mutex.html#method.lock
[`Heap`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html
[`init`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init

After initializing the heap, we can now use all allocation and collection types of the built-in [alloc] crate without error:

```rust
// in src/main.rs

use alloc::{boxed::Box, vec, vec::Vec, rc::Rc};

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // […] initialize interrupts, mapper, frame_allocator, heap

    // allocate a number on the heap
    let heap_value = Box::new(41);
    println!("heap_value at {:p}", heap_value);

    // create a dynamically sized vector
    let mut vec = Vec::new();
    for i in 0..500 {
        vec.push(i);
    }
    println!("vec at {:p}", vec.as_slice());

    // create a reference counted vector, which is freed when the count reaches 0
    let reference_counted = Rc::new(vec![1, 2, 3]);
    let cloned_reference = reference_counted.clone();
    println!("current reference count is {}", Rc::strong_count(&cloned_reference));
    core::mem::drop(reference_counted);
    println!("reference count is {} now", Rc::strong_count(&cloned_reference));

    // […] call `test_main` in test context
    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

This code example shows some uses of the [`Box`], [`Vec`], and [`Rc`] types. For the `Box` and `Vec` types, we print the underlying heap pointers using the [`{:p}` formatting specifier]. To showcase `Rc`, we create a reference counted heap value and use the [`Rc::strong_count`] function to print the current reference count before and after dropping one of the instances (using [`core::mem::drop`]).

[`Vec`]: https://doc.rust-lang.org/alloc/vec/
[`Rc`]: https://doc.rust-lang.org/alloc/rc/
[`{:p}` formatting specifier]: https://doc.rust-lang.org/core/fmt/trait.Pointer.html
[`Rc::strong_count`]: https://doc.rust-lang.org/alloc/rc/struct.Rc.html#method.strong_count
[`core::mem::drop`]:
https://doc.rust-lang.org/core/mem/fn.drop.html

When we run it now, we see the following:

![QEMU printing: heap_value at 0x444444440000, vec at 0x444444440800, current reference count is 2, reference count is 1 now](qemu-alloc-showcase.png)

As expected, we see that the `Box` and `Vec` values live on the heap, as indicated by the pointers starting with the `0x_4444_4444_*` prefix. The reference counted value also behaves as expected, with the reference count being 2 after the `clone` call and 1 again after one of the instances was dropped.

The reason that the vector starts at offset `0x800` is not that the boxed value is `0x800` bytes large, but the reallocations that occur when the vector needs to increase its capacity. For example, when the vector's capacity is 32 and we try to add the next element, the vector allocates a new backing array with a capacity of 64 behind the scenes and copies all elements over. Then it frees the old allocation.

Of course, there are many more allocation and collection types in the `alloc` crate that we can now all use in our kernel, including:

- the thread-safe reference counted pointer [`Arc`]
- the owned string type [`String`] and the [`format!`] macro
- [`LinkedList`]
- the growable ring buffer [`VecDeque`]
- the [`BinaryHeap`] priority queue
- [`BTreeMap`] and [`BTreeSet`]

[`arc`]: https://doc.rust-lang.org/alloc/sync/struct.Arc.html
[`string`]: https://doc.rust-lang.org/alloc/string/struct.String.html
[`format!`]: https://doc.rust-lang.org/alloc/macro.format.html
[`linkedlist`]: https://doc.rust-lang.org/alloc/collections/linked_list/struct.LinkedList.html
[`vecdeque`]: https://doc.rust-lang.org/alloc/collections/vec_deque/struct.VecDeque.html
[`binaryheap`]: https://doc.rust-lang.org/alloc/collections/binary_heap/struct.BinaryHeap.html
[`btreemap`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html
[`btreeset`]: https://doc.rust-lang.org/alloc/collections/btree_set/struct.BTreeSet.html

These types will become very useful when we want to implement thread lists, scheduling queues, or support for async/await.

## Adding a Test

To ensure that we don't accidentally break our new allocation code, we should add an integration test for it. We start by creating a new `tests/heap_allocation.rs` file with the following content:

```rust
// in tests/heap_allocation.rs

#![no_std]
#![no_main]
#![feature(custom_test_frameworks)]
#![test_runner(blog_os::test_runner)]
#![reexport_test_harness_main = "test_main"]

extern crate alloc;

use bootloader::{entry_point, BootInfo};
use core::panic::PanicInfo;

entry_point!(main);

fn main(boot_info: &'static BootInfo) -> ! {
    unimplemented!();
}

#[panic_handler]
fn panic(info: &PanicInfo) -> !
{
    blog_os::test_panic_handler(info)
}
```

We reuse the `test_runner` and `test_panic_handler` functions from our `lib.rs`. Since we want to test allocations, we enable the `alloc` crate through the `extern crate alloc` statement. For more information about the test boilerplate, check out the [_Testing_] post.

[_Testing_]: @/edition-2/posts/04-testing/index.md

The implementation of the `main` function looks like this:

```rust
// in tests/heap_allocation.rs

fn main(boot_info: &'static BootInfo) -> ! {
    use blog_os::allocator;
    use blog_os::memory::{self, BootInfoFrameAllocator};
    use x86_64::VirtAddr;

    blog_os::init();
    let phys_mem_offset = VirtAddr::new(boot_info.physical_memory_offset);
    let mut mapper = unsafe { memory::init(phys_mem_offset) };
    let mut frame_allocator = unsafe {
        BootInfoFrameAllocator::init(&boot_info.memory_map)
    };
    allocator::init_heap(&mut mapper, &mut frame_allocator)
        .expect("heap initialization failed");

    test_main();
    loop {}
}
```

It is very similar to the `kernel_main` function in our `main.rs`, with the differences that we don't invoke `println`, don't include any example allocations, and call `test_main` unconditionally.

Now we're ready to add a few test cases. First, we add a test that performs some simple allocations using [`Box`] and checks the allocated values to ensure that basic allocations work:

```rust
// in tests/heap_allocation.rs

use alloc::boxed::Box;

#[test_case]
fn simple_allocation() {
    let heap_value_1 = Box::new(41);
    let heap_value_2 = Box::new(13);
    assert_eq!(*heap_value_1, 41);
    assert_eq!(*heap_value_2, 13);
}
```

Most importantly, this test verifies that no allocation error occurs.

Next, we iteratively build a large vector to test both large allocations and multiple allocations (the latter are caused by reallocations):

```rust
// in tests/heap_allocation.rs

use alloc::vec::Vec;

#[test_case]
fn large_vec() {
    let n = 1000;
    let mut vec = Vec::new();
    for i in 0..n {
        vec.push(i);
    }
    assert_eq!(vec.iter().sum::<u64>(), (n - 1) * n / 2);
}
```

We verify the sum by comparing it with the formula for the [n-th partial sum]. This gives us some confidence that the allocated values are all correct.

[n-th partial sum]: https://en.wikipedia.org/wiki/1_%2B_2_%2B_3_%2B_4_%2B_%E2%8B%AF#Partial_sums

As a third test, we create a large number of allocations one after another, one for each byte of `HEAP_SIZE`:

```rust
// in tests/heap_allocation.rs

use blog_os::allocator::HEAP_SIZE;

#[test_case]
fn many_boxes() {
    for i in 0..HEAP_SIZE {
        let x = Box::new(i);
        assert_eq!(*x, i);
    }
}
```

This test ensures that the allocator reuses freed memory for subsequent allocations, since it would run out of memory otherwise. This might seem like an obvious requirement for an allocator, but there are allocator designs that don't do this. An example is the bump allocator design that will be explained in the next post.

Let's run our new integration test:

```
> cargo test --test heap_allocation
Running 3 tests
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
```

All three tests succeeded! You can also invoke `cargo test` (without the `--test` argument) to run all unit and integration tests.

## Summary

This post gave an introduction to dynamic memory and explained why and where it is needed. We saw how Rust's borrow checker prevents common vulnerabilities and learned how Rust's allocation API works.

After creating a minimal implementation of Rust's allocator interface using a dummy allocator, we created a proper heap memory region for our kernel. For that, we defined a virtual address range for the heap and then mapped all pages of that range to physical frames using the `Mapper` and `FrameAllocator` from the previous post.

Finally, we added a dependency on the `linked_list_allocator` crate to add a proper allocator to our kernel. With this allocator, we were able to use `Box`, `Vec`, and other allocation and collection types from the `alloc` crate.

## What's next?

While we already added heap allocation support in this post, we left most of the work to the `linked_list_allocator` crate. The next post will show in detail how an allocator can be implemented from scratch. It will present multiple possible allocator designs, show how to implement simple versions of them, and explain their advantages and drawbacks.

================================================
FILE: blog/content/edition-2/posts/11-allocator-designs/index.es.md
================================================
+++
title = "Allocator Designs"
weight = 11
path = "es/allocator-designs"
date = 2020-01-20

[extra]
chapter = "Memory Management"
# GitHub usernames of the people that translated this post
translators = ["dobleuber"]
+++

This post explains how to implement heap allocators from scratch. It presents and discusses different allocator designs, including bump allocation, linked list allocation, and fixed-size block allocation. For each of the three designs, we will create a basic implementation that can be used for our kernel.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-11`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-11

## Introduction

In the [previous post][post anterior], we added basic support for heap allocations to our kernel. For that, we [created a new memory region][map-heap] in the page tables and [used the `linked_list_allocator` crate][use-alloc-crate] to manage that memory.
While we have a working heap now, we left most of the work to the allocator crate without trying to understand how it works.

[post anterior]: @/edition-2/posts/10-heap-allocation/index.md
[map-heap]: @/edition-2/posts/10-heap-allocation/index.md#creating-a-kernel-heap
[use-alloc-crate]: @/edition-2/posts/10-heap-allocation/index.md#using-an-allocator-crate

In this post, we will show how to create our own heap allocator from scratch instead of relying on an existing allocator crate. We will discuss different allocator designs, including a simplistic _bump allocator_ and a basic _fixed-size block allocator_, and use this knowledge to implement an allocator with improved performance (compared to the `linked_list_allocator` crate).

### Design Goals

The responsibility of an allocator is to manage the available heap memory. It needs to return unused memory on `alloc` calls and keep track of memory freed by `dealloc` so that it can be reused. Most importantly, it must never hand out memory that is already in use somewhere else because this would cause undefined behavior.

Apart from correctness, there are many secondary design goals. For example, the allocator should effectively utilize the available memory and keep [_fragmentation_][_fragmentación_] low. Furthermore, it should work well for concurrent applications and scale to any number of processors. For maximal performance, it could even optimize the memory layout with respect to the CPU caches to improve [cache locality][localidad de caché] and avoid [false sharing][compartición falsa].

[localidad de caché]: https://www.geeksforgeeks.org/locality-of-reference-and-cache-operation-in-cache-memory/
[_fragmentación_]: https://en.wikipedia.org/wiki/Fragmentation_(computing)
[compartición falsa]: https://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html

These requirements can make good allocators very complex. For example, [jemalloc] has more than 30,000 lines of code. This complexity is often undesired in kernel code, where a single bug can lead to severe security vulnerabilities. Fortunately, the allocation patterns of kernel code are often much simpler compared to userspace code, so that relatively simple allocator designs often suffice.

[jemalloc]: http://jemalloc.net/

In the following, we present three possible kernel allocator designs and explain their advantages and drawbacks.

## Bump Allocator

The most simple allocator design is a _bump allocator_ (also known as _stack allocator_). It allocates memory linearly and only keeps track of the number of allocated bytes and the number of allocations. It is only useful in very specific use cases because it has a severe limitation: it can only free all memory at once.

### Idea

The idea behind a bump allocator is to linearly allocate memory by increasing (_"bumping"_) a `next` variable, which points to the start of the unused memory. At the beginning, `next` is equal to the start address of the heap. On each allocation, `next` is increased by the allocation size so that it always points to the boundary between used and unused memory:

![The heap memory area at three points in time:
 1: A single allocation exists at the start of the heap; the `next` pointer points to its end.
 2: A second allocation was added right after the first; the `next` pointer points to the end of the second allocation.
 3: A third allocation was added right after the second one; the `next` pointer points to the end of the third allocation.](bump-allocation.svg)

The `next` pointer only moves in a single direction and thus never hands out the same memory region twice. When it reaches the end of the heap, no more memory can be allocated, resulting in an out-of-memory error on the next allocation.

A bump allocator is often implemented with an allocation counter, which is increased by 1 on each `alloc` call and decreased by 1 on each `dealloc` call. When the allocation counter reaches zero, it means that all allocations on the heap have been deallocated. In this case, the `next` pointer can be reset to the start address of the heap, so that the complete heap memory is available for allocations again.

### Implementation

We start our implementation by declaring a new `allocator::bump` submodule:

```rust
// in src/allocator.rs

pub mod bump;
```

The content of the submodule lives in a new `src/allocator/bump.rs` file, which we create with the following content:

```rust
// in src/allocator/bump.rs

pub struct BumpAllocator {
    heap_start: usize,
    heap_end: usize,
    next: usize,
    allocations: usize,
}

impl BumpAllocator {
    /// Creates a new empty bump allocator.
    pub const fn new() -> Self {
        BumpAllocator {
            heap_start: 0,
            heap_end: 0,
            next: 0,
            allocations: 0,
        }
    }

    /// Initializes the bump allocator with the given heap bounds.
    ///
    /// This method is unsafe because the caller must ensure that the given
    /// memory range is unused. Also, this method must be called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        self.heap_start = heap_start;
        self.heap_end = heap_start + heap_size;
        self.next = heap_start;
    }
}
```

The `heap_start` and `heap_end` fields keep track of the lower and upper bounds of the heap memory region.
The caller needs to ensure that these addresses are valid, otherwise the allocator would return invalid memory. For this reason, the `init` function needs to be `unsafe` to call.

The purpose of the `next` field is to always point to the first unused byte of the heap, i.e., the start address of the next allocation. It is set to `heap_start` in the `init` function because at the beginning, the entire heap is unused. On each allocation, this field will be increased by the allocation size (_"bumped"_) to ensure that we don't return the same memory region twice.

The `allocations` field is a simple counter for the active allocations with the goal of resetting the allocator after the last allocation has been freed. It is initialized with 0.

We chose to create a separate `init` function instead of performing the initialization directly in `new` in order to keep the interface identical to the allocator provided by the `linked_list_allocator` crate. This way, the allocators can be switched without additional code changes.

### Implementing `GlobalAlloc`

As [explained in the previous post][global-alloc], all heap allocators need to implement the [`GlobalAlloc`] trait, which is defined like this:

[global-alloc]: @/edition-2/posts/10-heap-allocation/index.md#the-allocator-interface
[`GlobalAlloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html

```rust
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... }
    unsafe fn realloc(
        &self,
        ptr: *mut u8,
        layout: Layout,
        new_size: usize
    ) -> *mut u8 { ... }
}
```

Only the `alloc` and `dealloc` methods are required; the other two methods have default implementations and can be omitted.
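
As a small illustration of the default methods, the following sketch implements only `alloc` and `dealloc` for a hypothetical `CountingAlloc` wrapper around the standard library's `System` allocator and then relies on the provided `alloc_zeroed`. It assumes a hosted environment with `std`, so it is meant for experimenting on a normal target rather than inside our kernel. The `CountingAlloc` and `zeroed_roundtrip` names are made up for this example:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts `alloc` calls.
pub struct CountingAlloc {
    pub alloc_calls: AtomicUsize,
}

unsafe impl GlobalAlloc for CountingAlloc {
    // Only the two required methods are implemented here.
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.alloc_calls.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

/// Allocates 16 zeroed bytes through the default `alloc_zeroed`,
/// checks that every byte is zero, and frees the memory again.
pub fn zeroed_roundtrip(allocator: &CountingAlloc) -> bool {
    let layout = Layout::from_size_align(16, 8).unwrap();
    unsafe {
        let ptr = allocator.alloc_zeroed(layout);
        if ptr.is_null() {
            return false;
        }
        let all_zero = (0..layout.size()).all(|i| *ptr.add(i) == 0);
        allocator.dealloc(ptr, layout);
        all_zero
    }
}
```

Since the default `alloc_zeroed` is implemented in terms of `alloc`, calling `zeroed_roundtrip` on a fresh `CountingAlloc` should increase its counter by one.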

#### First Implementation Attempt

Let's try to implement the `alloc` method for our `BumpAllocator`:

```rust
// in src/allocator/bump.rs

use alloc::alloc::{GlobalAlloc, Layout};

unsafe impl GlobalAlloc for BumpAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // TODO alignment and bounds check
        let alloc_start = self.next;
        self.next = alloc_start + layout.size();
        self.allocations += 1;
        alloc_start as *mut u8
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        todo!();
    }
}
```

First, we use the `next` field as the start address for our allocation. Then we update the `next` field to point to the end address of the allocation, which is the next unused address on the heap. Before returning the start address of the allocation as a `*mut u8` pointer, we increase the `allocations` counter by 1.

Note that we don't perform any bounds checks or alignment adjustments, so this implementation is not safe yet. This does not matter much because it fails to compile anyway with the following error:

```
error[E0594]: cannot assign to `self.next` which is behind a `&` reference
  --> src/allocator/bump.rs:29:9
   |
29 |         self.next = alloc_start + layout.size();
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `self` is a `&` reference, so the data it refers to cannot be written
```

(The same error also occurs for the `self.allocations += 1` line. We omitted it here for brevity.)

The error occurs because the [`alloc`] and [`dealloc`] methods of the [`GlobalAlloc`] trait only operate on an immutable `&self` reference, so updating the `next` and `allocations` fields is not possible. This is problematic because updating `next` on every allocation is the essential principle of a bump allocator.
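
This restriction does not mean that state behind a `&self` reference can never change. As an aside, the following standalone sketch shows one form of interior mutability: a hypothetical `AtomicBump` type (not part of our kernel, and not the approach this post takes) that advances its `next` value through a shared reference with an atomic compare-and-exchange loop. Alignment handling and deallocation are deliberately left out:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical bump counter that can be advanced through `&self`.
pub struct AtomicBump {
    next: AtomicUsize,
    end: usize,
}

impl AtomicBump {
    /// Creates a bump counter for the address range `start..end`.
    pub const fn new(start: usize, end: usize) -> Self {
        AtomicBump {
            next: AtomicUsize::new(start),
            end,
        }
    }

    /// Reserves `size` bytes and returns their start address,
    /// or `None` if the range is exhausted or the addition overflows.
    pub fn reserve(&self, size: usize) -> Option<usize> {
        let mut current = self.next.load(Ordering::Relaxed);
        loop {
            let new_next = current.checked_add(size)?;
            if new_next > self.end {
                return None;
            }
            // Try to publish the new `next`; retry with the observed value on failure.
            match self.next.compare_exchange_weak(
                current,
                new_next,
                Ordering::Relaxed,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(current),
                Err(observed) => current = observed,
            }
        }
    }
}
```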

[`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc
[`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc

#### `GlobalAlloc` and Mutability

Before we look at a possible solution to this mutability problem, let's try to understand why the `GlobalAlloc` trait methods are defined with `&self` arguments: As we saw [in the previous post][global-allocator], the global heap allocator is defined by adding the `#[global_allocator]` attribute to a `static` that implements the `GlobalAlloc` trait. Static variables are immutable in Rust, so there is no way to call a method that takes `&mut self` on the static allocator. For this reason, all the methods of `GlobalAlloc` only take an immutable `&self` reference.

[global-allocator]: @/edition-2/posts/10-heap-allocation/index.md#the-global-allocator-attribute

Fortunately, there is a way to get a `&mut self` reference from a `&self` reference: We can use synchronized [interior mutability] by wrapping the allocator in a [`spin::Mutex`] spinlock. This type provides a `lock` method that performs [mutual exclusion] and thus safely turns a `&self` reference into a `&mut self` reference. We've already used the wrapper type multiple times in our kernel, for example for the [VGA text buffer][vga-mutex].

[interior mutability]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html
[vga-mutex]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks
[`spin::Mutex`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html
[mutual exclusion]: https://en.wikipedia.org/wiki/Mutual_exclusion

#### A `Locked` Wrapper Type

With the help of the `spin::Mutex` wrapper type, we can implement the `GlobalAlloc` trait for our bump allocator.
The trick is to implement the trait not for the `BumpAllocator` directly, but for the wrapped `spin::Mutex<BumpAllocator>` type:

```rust
unsafe impl GlobalAlloc for spin::Mutex<BumpAllocator> {…}
```

Unfortunately, this still doesn't work because the Rust compiler does not permit trait implementations for types defined in other crates:

```
error[E0117]: only traits defined in the current crate can be implemented for arbitrary types
  --> src/allocator/bump.rs:28:1
   |
28 | unsafe impl GlobalAlloc for spin::Mutex<BumpAllocator> {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^--------------------------
   | |                           |
   | |                           `spin::mutex::Mutex` is not defined in the current crate
   | impl doesn't use only types from inside the current crate
   |
   = note: define and implement a trait or new type instead
```

To fix this, we need to create our own wrapper type around `spin::Mutex`:

```rust
// in src/allocator.rs

/// A wrapper around spin::Mutex to permit trait implementations.
pub struct Locked<A> {
    inner: spin::Mutex<A>,
}

impl<A> Locked<A> {
    pub const fn new(inner: A) -> Self {
        Locked {
            inner: spin::Mutex::new(inner),
        }
    }

    pub fn lock(&self) -> spin::MutexGuard<A> {
        self.inner.lock()
    }
}
```

The type is a generic wrapper around a `spin::Mutex<A>`. It imposes no restrictions on the wrapped type `A`, so it can be used to wrap all kinds of types, not just allocators. It provides a simple `new` constructor function that wraps a given value. For convenience, it also provides a `lock` function that calls `lock` on the wrapped `Mutex`. Since the `Locked` type is general enough to be useful for other allocator implementations too, we put it in the parent `allocator` module.

#### Implementation for `Locked<BumpAllocator>`

The `Locked` type is defined in our own crate (in contrast to `spin::Mutex`), so we can use it to implement `GlobalAlloc` for our bump allocator. The full implementation looks like this:

```rust
// in src/allocator/bump.rs

use super::{align_up, Locked};
use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr;

unsafe impl GlobalAlloc for Locked<BumpAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let mut bump = self.lock(); // get a mutable reference

        let alloc_start = align_up(bump.next, layout.align());
        let alloc_end = match alloc_start.checked_add(layout.size()) {
            Some(end) => end,
            None => return ptr::null_mut(),
        };

        if alloc_end > bump.heap_end {
            ptr::null_mut() // out of memory
        } else {
            bump.next = alloc_end;
            bump.allocations += 1;
            alloc_start as *mut u8
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        let mut bump = self.lock(); // get a mutable reference

        bump.allocations -= 1;
        if bump.allocations == 0 {
            bump.next = bump.heap_start;
        }
    }
}
```

The first step for both `alloc` and `dealloc` is to call the `Mutex::lock` method through the `inner` field to get a mutable reference to the wrapped allocator type.
The instance remains locked until the end of the method, so that no data race can occur in multithreaded contexts (we will add threading support soon).

Compared to the previous prototype, the `alloc` implementation now respects alignment requirements and performs a bounds check to ensure that the allocations stay inside the heap memory region. The first step is to round up the `next` address to the alignment specified by the [`Layout`] argument. The code of the `align_up` function is shown in a moment. We then add the requested allocation size to `alloc_start` to get the end address of the allocation. To prevent integer overflow on large allocations, we use the [`checked_add`] method. If an overflow occurs or if the resulting end address of the allocation is larger than the end address of the heap, we return a null pointer to signal an out-of-memory situation. Otherwise, we update the `next` address and increase the `allocations` counter by 1 like before. Finally, we return the `alloc_start` address converted to a `*mut u8` pointer.

[`checked_add`]: https://doc.rust-lang.org/std/primitive.usize.html#method.checked_add
[`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html

The `dealloc` function ignores the given pointer and `Layout` arguments. Instead, it just decreases the `allocations` counter. If the counter reaches 0 again, it means that all allocations were freed again. In this case, it resets the `next` address to the `heap_start` address to make the complete heap memory available again.

#### Address Alignment

The `align_up` function is general enough that we can put it into the parent `allocator` module.
A basic implementation looks like this:

```rust
// in src/allocator.rs

/// Align the given address `addr` upwards to alignment `align`.
fn align_up(addr: usize, align: usize) -> usize {
    let remainder = addr % align;
    if remainder == 0 {
        addr // addr already aligned
    } else {
        addr - remainder + align
    }
}
```

The function first computes the [remainder] of the division of `addr` by `align`. If the remainder is `0`, the address is already aligned with the given alignment. Otherwise, we align the address by subtracting the remainder (so that the new remainder is 0) and then adding the alignment (so that the address does not become smaller than the original address).

[remainder]: https://en.wikipedia.org/wiki/Euclidean_division

Note that this isn't the most efficient way to implement this function. A much faster implementation looks like this:

```rust
/// Align the given address `addr` upwards to alignment `align`.
///
/// Requires that `align` is a power of two.
fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}
```

This method requires `align` to be a power of two, which is guaranteed by the `GlobalAlloc` trait (and its [`Layout`] parameter). This makes it possible to create a [bitmask] to align the address in a very efficient way. To understand how it works, let's go through it step by step, starting on the right side:

[`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html
[bitmask]: https://en.wikipedia.org/wiki/Mask_(computing)

- Since `align` is a power of two, its [binary representation][representación binaria] has only a single bit set (e.g. `0b000100000`). This means that `align - 1` has all the lower bits set (e.g. `0b00011111`).
- By creating the [bitwise `NOT`][NO bit a bit] through the `!` operator, we get a number that has all the bits set except for the bits lower than `align` (e.g. `0b…111111111100000`).
- By performing a [bitwise `AND`][Y bit a bit] on an address and `!(align - 1)`, we align the address _downwards_. This works by clearing all the bits that are lower than `align`.
- Since we want to align upwards instead of downwards, we increase `addr` by `align - 1` before performing the bitwise `AND`. This way, already aligned addresses remain the same while non-aligned addresses are rounded up to the next alignment boundary.

[representación binaria]: https://en.wikipedia.org/wiki/Binary_number#Representation
[NO bit a bit]: https://en.wikipedia.org/wiki/Bitwise_operation#NOT
[Y bit a bit]: https://en.wikipedia.org/wiki/Bitwise_operation#AND

Which variant you choose is up to you. Both compute the same result, only using different methods.

### Using It

To use the bump allocator instead of the `linked_list_allocator` crate, we need to update the `ALLOCATOR` static in `allocator.rs`:

```rust
// in src/allocator.rs

use bump::BumpAllocator;

#[global_allocator]
static ALLOCATOR: Locked<BumpAllocator> = Locked::new(BumpAllocator::new());
```

Here it becomes important that we declared `BumpAllocator::new` and `Locked::new` as [`const` functions][`const`]. If they were normal functions, a compilation error would occur because the initialization expression of a `static` must be evaluable at compile time.

[`const`]: https://doc.rust-lang.org/reference/items/functions.html#const-functions

We don't need to change the `ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE)` call in our `init_heap` function because the bump allocator provides the same interface as the allocator provided by the `linked_list_allocator` crate.

Now our kernel uses our bump allocator! Everything should still work, including the [`heap_allocation`] tests that we created in the previous post:

[`heap_allocation`]: @/edition-2/posts/10-heap-allocation/index.md#adding-a-test

```
> cargo test --test heap_allocation
[…]
Running 3 tests
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
```

### Discussion

The big advantage of bump allocation is that it's very fast. Compared to other allocator designs (see below) that need to actively look for a fitting memory block and perform various bookkeeping tasks on `alloc` and `dealloc`, a bump allocator [can be optimized][bump downwards] to just a few assembly instructions. This makes bump allocators useful for optimizing the allocation performance, for example when creating a [virtual DOM library].

[bump downwards]: https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html
[virtual DOM library]: https://hacks.mozilla.org/2019/03/fast-bump-allocated-virtual-doms-with-rust-and-wasm/

While a bump allocator is seldom used as the global allocator, the principle of bump allocation is often applied in the form of [arena allocation], which basically batches individual allocations together to improve performance. An example of an arena allocator for Rust is contained in the [`toolshed`] crate.

[arena allocation]: https://mgravell.github.io/Pipelines.Sockets.Unofficial/docs/arenas.html
[`toolshed`]: https://docs.rs/toolshed/0.8.1/toolshed/index.html

#### The Drawback of a Bump Allocator

The main limitation of a bump allocator is that it can only reuse deallocated memory after all allocations have been freed. This means that a single long-lived allocation suffices to prevent memory reuse.
We can see this when we add a variation of the `many_boxes` test:

```rust
// in tests/heap_allocation.rs

#[test_case]
fn many_boxes_long_lived() {
    let long_lived = Box::new(1); // new
    for i in 0..HEAP_SIZE {
        let x = Box::new(i);
        assert_eq!(*x, i);
    }
    assert_eq!(*long_lived, 1); // new
}
```

Like the `many_boxes` test, this test creates a large number of allocations to provoke an out-of-memory failure if the allocator does not reuse freed memory. Additionally, the test creates a `long_lived` allocation, which lives for the whole loop execution.

When we try to run our new test, we see that it indeed fails:

```
> cargo test --test heap_allocation
Running 4 tests
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [failed]

Error: panicked at 'allocation error: Layout { size_: 8, align_: 8 }', src/lib.rs:86:5
```

Let's try to understand why this failure occurs in detail: First, the `long_lived` allocation is created at the start of the heap, thereby increasing the `allocations` counter by 1. For each iteration of the loop, a short-lived allocation is created and directly freed again before the next iteration starts. This means that the `allocations` counter is temporarily increased to 2 at the beginning of an iteration and decreased to 1 at the end of it. The problem now is that the bump allocator can only reuse memory after _all_ allocations have been freed, i.e., when the `allocations` counter falls to 0. Since this doesn't happen before the end of the loop, each loop iteration allocates a new region of memory, leading to an out-of-memory error after a number of iterations.

#### Fixing the Test?
There are two potential tricks that we could utilize to fix the test for our bump allocator:

- We could update `dealloc` to check whether the freed allocation was the last allocation returned by `alloc` by comparing its end address with the `next` pointer. In case they're equal, we can safely reset `next` back to the start address of the freed allocation. This way, each loop iteration reuses the same memory block.
- We could add an `alloc_back` method that allocates memory from the _end_ of the heap using an additional `next_back` field. Then we could manually use this allocation method for all long-lived allocations, thereby separating short-lived and long-lived allocations on the heap. Note that this separation only works if it's clear beforehand how long each allocation will live. Another drawback of this approach is that manually performing allocations is cumbersome and potentially unsafe.

While both of these approaches work to fix the test, they are not a general solution since they are only able to reuse memory in very specific cases. The question is: Is there a general solution that reuses _all_ freed memory?

#### Reusing All Freed Memory?

As we learned [in the previous post][heap-intro], allocations can live arbitrarily long and can be freed in an arbitrary order. This means that we need to keep track of a potentially unbounded number of non-continuous, unused memory regions, as illustrated by the following example:

[heap-intro]: @/edition-2/posts/10-heap-allocation/index.md#dynamic-memory

![](allocation-fragmentation.svg)

The graphic shows the heap over the course of time. At the beginning, the complete heap is unused, and the `next` address is equal to `heap_start` (line 1). Then the first allocation occurs (line 2).
In line 3, a second memory block is allocated and the first allocation is freed. Many more allocations are added in line 4. Half of them are very short-lived and already get freed in line 5, where another new allocation is also added.

Line 5 shows the fundamental problem: We have five unused memory regions with different sizes, but the `next` pointer can only point to the beginning of the last region. While we could store the start addresses and sizes of the other unused memory regions in an array of size 4 for this example, this isn't a general solution since we could easily create an example with 8, 16, or 1000 unused memory regions.

Normally, when we have a potentially unbounded number of items, we can just use a heap-allocated collection. This isn't really possible in our case, since the heap allocator can't depend on itself (it would cause endless recursion or deadlocks). So we need to find a different solution.

## Linked List Allocator

A common trick to keep track of an arbitrary number of free memory areas when implementing allocators is to use these areas themselves as backing storage. This utilizes the fact that the regions are still mapped to a virtual address and backed by a physical frame, but the stored information is not needed anymore. By storing the information about the freed region in the region itself, we can keep track of an unbounded number of freed regions without needing additional memory.

The most common implementation approach is to construct a single linked list in the freed memory, with each node being a freed memory region:

![](linked-list-allocation.svg)

Each list node contains two fields: the size of the memory region and a pointer to the next unused memory region.
With this approach, we only need a pointer to the first unused region (called `head`) to keep track of all unused regions, regardless of their number. The resulting data structure is often called a [_free list_][_lista libre_].

[_lista libre_]: https://en.wikipedia.org/wiki/Free_list

As you might guess from the name, this is the technique that the `linked_list_allocator` crate uses. Allocators that use this technique are also often called _pool allocators_.

### Implementation

In the following, we will create our own simple `LinkedListAllocator` type that uses the above approach for keeping track of freed memory regions. This part of the post isn't required for future posts, so you can skip the implementation details if you like.

#### The Allocator Type

We start by creating a private `ListNode` struct in a new `allocator::linked_list` submodule:

```rust
// in src/allocator.rs

pub mod linked_list;
```

```rust
// in src/allocator/linked_list.rs

struct ListNode {
    size: usize,
    next: Option<&'static mut ListNode>,
}
```

Like in the graphic, a list node has a `size` field and an optional pointer to the next node, represented by the `Option<&'static mut ListNode>` type. The `&'static mut` type semantically describes an [owned][propietario] object behind a pointer. Basically, it's a [`Box`] without a destructor that frees the object at the end of the scope.
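The "`Box` without a destructor" comparison can be made concrete on a hosted target with `Box::leak`, which turns a `Box<T>` into a `&'static mut T` that is never freed automatically. The following is only an illustrative sketch: it relies on the standard library's heap, which is exactly what our kernel cannot do before its own allocator works, and the names `Node`, `leak_node`, `build_list`, and `total_size` are hypothetical helpers that do not appear in the kernel code.

```rust
/// Stand-in for the kernel's `ListNode` (hypothetical name for this sketch).
struct Node {
    size: usize,
    next: Option<&'static mut Node>,
}

/// Creates a node on the std heap and leaks it, yielding an owning
/// `&'static mut` reference that no destructor will ever free.
fn leak_node(size: usize) -> &'static mut Node {
    Box::leak(Box::new(Node { size, next: None }))
}

/// Builds a singly linked list whose nodes have the given sizes, in order.
fn build_list(sizes: &[usize]) -> Option<&'static mut Node> {
    let mut head = None;
    // Push in reverse so that the first size ends up at the head.
    for &size in sizes.iter().rev() {
        let node = leak_node(size);
        node.next = head;
        head = Some(node);
    }
    head
}

/// Walks the list from `head` and sums the sizes of all nodes.
fn total_size(head: &Option<&'static mut Node>) -> usize {
    let mut sum = 0;
    let mut current = head.as_deref();
    while let Some(node) = current {
        sum += node.size;
        current = node.next.as_deref();
    }
    sum
}
```

Because each `&'static mut Node` is the only reference to its node, moving it into the `next` field transfers ownership of the whole rest of the list, just as a `Box` would.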
[propietario]: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html
[`Box`]: https://doc.rust-lang.org/alloc/boxed/index.html

We implement the following set of methods for `ListNode`:

```rust
// in src/allocator/linked_list.rs

impl ListNode {
    const fn new(size: usize) -> Self {
        ListNode { size, next: None }
    }

    fn start_addr(&self) -> usize {
        self as *const Self as usize
    }

    fn end_addr(&self) -> usize {
        self.start_addr() + self.size
    }
}
```

The type has a simple constructor function named `new` and methods to calculate the start and end addresses of the represented region. We make the `new` function a [const function][función const], which will be required later when constructing a static linked list allocator. Note that any use of mutable references in const functions (including setting the `next` field to `None`) is still unstable. In order to get it to compile, we need to add **`#![feature(const_mut_refs)]`** at the top of our `lib.rs`.

[función const]: https://doc.rust-lang.org/reference/items/functions.html#const-functions

With the `ListNode` struct as a building block, we can now create the `LinkedListAllocator` struct:

```rust
// in src/allocator/linked_list.rs

pub struct LinkedListAllocator {
    head: ListNode,
}

impl LinkedListAllocator {
    /// Creates an empty LinkedListAllocator.
    pub const fn new() -> Self {
        Self {
            head: ListNode::new(0),
        }
    }

    /// Initialize the allocator with the given heap bounds.
    ///
    /// This function is unsafe because the caller must guarantee that the given
    /// heap bounds are valid and that the heap is unused. This method must be
    /// called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        self.add_free_region(heap_start, heap_size);
    }

    /// Adds the given memory region to the front of the list.
    unsafe fn add_free_region(&mut self, addr: usize, size: usize) {
        todo!();
    }
}
```

The struct contains a `head` node that points to the first heap region. We are only interested in the value of the `next` pointer, so we set the `size` to 0 in the `ListNode::new` function. Making `head` a `ListNode` instead of just a `&'static mut ListNode` has the advantage that the implementation of the `alloc` method will be simpler.

Like for the bump allocator, the `new` function doesn't initialize the allocator with the heap bounds. In addition to maintaining API compatibility, the reason is that the initialization routine requires writing a node to the heap memory, which can only happen at runtime. The `new` function, however, needs to be a [`const` function][`función const`] that can be evaluated at compile time because it will be used for initializing the `ALLOCATOR` static. For this reason, we again provide a separate, non-constant `init` method.

[`función const`]: https://doc.rust-lang.org/reference/items/functions.html#const-functions

The `init` method uses an `add_free_region` method, whose implementation will be shown in a moment. For now, we use the [`todo!`] macro to provide a placeholder implementation that always panics.

[`todo!`]: https://doc.rust-lang.org/core/macro.todo.html

#### The `add_free_region` Method

The `add_free_region` method provides the fundamental _push_ operation on the linked list. We currently only call this method from `init`, but it will also be the central method in our `dealloc` implementation. Remember that the `dealloc` method is called when an allocated memory region is freed again. To keep track of this freed memory region, we want to push it to the linked list.
The implementation of the `add_free_region` method looks like this:

```rust
// in src/allocator/linked_list.rs

use super::align_up;
use core::mem;

impl LinkedListAllocator {
    /// Adds the given memory region to the front of the list.
    unsafe fn add_free_region(&mut self, addr: usize, size: usize) {
        // ensure that the freed region is capable of holding ListNode
        assert_eq!(align_up(addr, mem::align_of::<ListNode>()), addr);
        assert!(size >= mem::size_of::<ListNode>());

        // create a new list node and append it at the start of the list
        let mut node = ListNode::new(size);
        node.next = self.head.next.take();
        let node_ptr = addr as *mut ListNode;
        node_ptr.write(node);
        self.head.next = Some(&mut *node_ptr)
    }
}
```

The method takes the address and size of a memory region as arguments and adds it to the front of the list. First, it ensures that the given region has the necessary size and alignment for storing a `ListNode`. Then it creates the node and inserts it into the list through the following steps:

![](linked-list-allocator-push.svg)

Step 0 shows the state of the heap before `add_free_region` is called. In step 1, the method is called with the memory region marked as `freed` in the graphic. After the initial checks, the method creates a new `node` on its stack with the size of the freed region. It then uses the [`Option::take`] method to set the `next` pointer of the node to the current `head` pointer, thereby resetting the `head` pointer to `None`.

[`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take

In step 2, the method writes the newly created `node` to the beginning of the freed memory region through the pointer [`write`] method. It then points the `head` pointer to the new node. The resulting pointer structure might look somewhat chaotic because the freed region is always inserted at the beginning of the list, but if we follow the pointers, we see that each free region is still reachable from the `head` pointer.
[`write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write

#### The `find_region` Method

The second fundamental operation on a linked list is finding an entry and removing it from the list. This is the central operation needed for implementing the `alloc` method. We implement the operation as a `find_region` method in the following way:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Looks for a free region with the given size and alignment and removes
    /// it from the list.
    ///
    /// Returns a tuple of the list node and the start address of the allocation.
    fn find_region(&mut self, size: usize, align: usize)
        -> Option<(&'static mut ListNode, usize)>
    {
        // reference to current list node, updated for each iteration
        let mut current = &mut self.head;
        // look for a large enough memory region in linked list
        while let Some(ref mut region) = current.next {
            if let Ok(alloc_start) = Self::alloc_from_region(&region, size, align) {
                // region suitable for allocation -> remove node from list
                let next = region.next.take();
                let ret = Some((current.next.take().unwrap(), alloc_start));
                current.next = next;
                return ret;
            } else {
                // region not suitable -> continue with next region
                current = current.next.as_mut().unwrap();
            }
        }

        // no suitable region found
        None
    }
}
```

The method uses a `current` variable and a [`while let`] loop to iterate over the list elements. At the beginning, `current` is set to the (dummy) `head` node. On each iteration, it is then updated to the `next` field of the current node (in the `else` block). If the region is suitable for an allocation with the given size and alignment, the region is removed from the list and returned together with the `alloc_start` address.

[`while let`]: https://doc.rust-lang.org/reference/expressions/loop-expr.html#while-let-patterns

When the `current.next` pointer becomes `None`, the loop exits.
This means we iterated over the whole list but found no region suitable for an allocation. In that case, we return `None`. Whether a region is suitable is checked by the `alloc_from_region` function, whose implementation will be shown in a moment.

Let's take a more detailed look at how a suitable region is removed from the list:

![](linked-list-allocator-remove-region.svg)

Step 0 shows the situation before any pointer adjustments. The `region` and `current` regions and the `region.next` and `current.next` pointers are marked in the graphic. In step 1, both the `region.next` and `current.next` pointers are reset to `None` by using the [`Option::take`] method again. The original pointers are stored in local variables called `next` and `ret`.

In step 2, the `current.next` pointer is set to the local `next` pointer, which is the original `region.next` pointer. The effect is that `current` now directly points to the region after `region`, so that `region` is no longer an element of the linked list. The function then returns the pointer to `region` stored in the local `ret` variable.

##### The `alloc_from_region` Function

The `alloc_from_region` function returns whether a region is suitable for an allocation with a given size and alignment. It is defined like this:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Try to use the given region for an allocation with the given size and
    /// alignment.
    ///
    /// Returns the allocation start address on success.
    fn alloc_from_region(region: &ListNode, size: usize, align: usize)
        -> Result<usize, ()>
    {
        let alloc_start = align_up(region.start_addr(), align);
        let alloc_end = alloc_start.checked_add(size).ok_or(())?;

        if alloc_end > region.end_addr() {
            // region too small
            return Err(());
        }

        let excess_size = region.end_addr() - alloc_end;
        if excess_size > 0 && excess_size < mem::size_of::<ListNode>() {
            // rest of region too small to hold a ListNode (required because the
            // allocation splits the region in a used and a free part)
            return Err(());
        }

        // region suitable for allocation
        Ok(alloc_start)
    }
}
```

First, the function calculates the start and end address of a potential allocation, using the `align_up` function we defined earlier and the [`checked_add`] method. If an overflow occurs or if the end address is behind the end address of the region, the allocation doesn't fit in the region and we return an error.

The function performs a less obvious check after that. This check is necessary because most of the time an allocation does not fit a suitable region perfectly, so that a part of the region remains usable after the allocation. This part of the region must store its own `ListNode` after the allocation, so it must be large enough to do so. The check verifies exactly that: either the allocation fits perfectly (`excess_size == 0`) or the excess size is large enough to store a `ListNode`.

#### Implementing `GlobalAlloc`

With the fundamental operations provided by the `add_free_region` and `find_region` methods, we can now finally implement the `GlobalAlloc` trait. As with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator` but only for a wrapped `Locked<LinkedListAllocator>`.
The [`Locked`] wrapper adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references.

[`Locked`]: @/edition-2/posts/11-allocator-designs/index.md#a-locked-wrapper-type

The implementation looks like this:

```rust
// in src/allocator/linked_list.rs

use super::Locked;
use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr;

unsafe impl GlobalAlloc for Locked<LinkedListAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // perform layout adjustments
        let (size, align) = LinkedListAllocator::size_align(layout);
        let mut allocator = self.lock();

        if let Some((region, alloc_start)) = allocator.find_region(size, align) {
            let alloc_end = alloc_start.checked_add(size).expect("overflow");
            let excess_size = region.end_addr() - alloc_end;
            if excess_size > 0 {
                allocator.add_free_region(alloc_end, excess_size);
            }
            alloc_start as *mut u8
        } else {
            ptr::null_mut()
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // perform layout adjustments
        let (size, _) = LinkedListAllocator::size_align(layout);

        self.lock().add_free_region(ptr as usize, size)
    }
}
```

Let's start with the `dealloc` method because it is simpler: First, it performs some layout adjustments, which we will explain in a moment. Then, it retrieves a `&mut LinkedListAllocator` reference by calling the [`Mutex::lock`] function on the [`Locked`] wrapper. Lastly, it calls the `add_free_region` function to add the deallocated region to the free list.

The `alloc` method is a bit more complex. It starts with the same layout adjustments and also calls the [`Mutex::lock`] function to receive a mutable allocator reference. Then it uses the `find_region` method to find a suitable memory region for the allocation and remove it from the list. If this doesn't succeed and `None` is returned, it returns `null_mut` to signal an error, as there is no suitable memory region.
In the success case, the `find_region` method returns a tuple of the suitable region (no longer in the list) and the start address of the allocation. Using `alloc_start`, the allocation size, and the end address of the region, it calculates the end address of the allocation and the excess size again. If the excess size is not zero, it calls `add_free_region` to add the excess part of the memory region back to the free list. Finally, it returns the `alloc_start` address cast to a `*mut u8` pointer.

#### Layout Adjustments

So what are these layout adjustments that we make at the beginning of both `alloc` and `dealloc`? They ensure that each allocated block is capable of storing a `ListNode`. This is important because the memory block is going to be deallocated at some point, and we want to write a `ListNode` to it. If the block is smaller than a `ListNode` or does not have the correct alignment, undefined behavior can occur.

The layout adjustments are performed by the `size_align` function, which is defined like this:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Adjust the given layout so that the resulting allocated memory
    /// region is also capable of storing a `ListNode`.
    ///
    /// Returns the adjusted size and alignment as a (size, align) tuple.
    fn size_align(layout: Layout) -> (usize, usize) {
        let layout = layout
            .align_to(mem::align_of::<ListNode>())
            .expect("adjusting alignment failed")
            .pad_to_align();
        let size = layout.size().max(mem::size_of::<ListNode>());
        (size, layout.align())
    }
}
```

First, the function uses the [`align_to`] method on the passed `Layout` to increase the alignment to the alignment of a `ListNode` if necessary. It then uses the [`pad_to_align`] method to round up the size to a multiple of the alignment to ensure that the start address of the next memory block will have the correct alignment for storing a `ListNode` too.
In the second step, it uses the [`max`] method to enforce a minimum allocation size of `mem::size_of::<ListNode>`. This way, the `dealloc` function can safely write a `ListNode` to the freed memory block.

[`align_to`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align_to
[`pad_to_align`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.pad_to_align
[`max`]: https://doc.rust-lang.org/std/cmp/trait.Ord.html#method.max

### Using It

We can now update the `ALLOCATOR` static in the `allocator` module to use our new `LinkedListAllocator`:

```rust
// in src/allocator.rs

use linked_list::LinkedListAllocator;

#[global_allocator]
static ALLOCATOR: Locked<LinkedListAllocator> =
    Locked::new(LinkedListAllocator::new());
```

Since the `init` function behaves the same for the bump and linked list allocators, we don't need to modify the `init` call in `init_heap`.

When we now run our `heap_allocation` tests again, we see that all tests pass now, including the `many_boxes_long_lived` test that failed with the bump allocator:

```
> cargo test --test heap_allocation
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [ok]
```

This shows that our linked list allocator is able to reuse freed memory for subsequent allocations.

### Discussion

In contrast to the bump allocator, the linked list allocator is much more suitable as a general-purpose allocator, mainly because it is able to directly reuse freed memory. However, it also has some drawbacks. Some of them are only caused by our basic implementation, but there are also fundamental drawbacks of the allocator design itself.

#### Merging Freed Blocks

The main problem with our implementation is that it only splits the heap into smaller blocks but never merges them back together.
Consider this example:

![](linked-list-allocator-fragmentation-on-dealloc.svg)

In the first line, three allocations are created on the heap. Two of them are freed again in line 2 and the third is freed in line 3. Now the complete heap is unused again, but it is still split into four individual blocks. At this point, a large allocation might not be possible anymore because none of the four blocks is large enough. Over time, the process continues, and the heap is split into smaller and smaller blocks. At some point, the heap is so fragmented that even normal-sized allocations will fail.

To fix this problem, we need to merge adjacent freed blocks back together. For the above example, this would mean the following:

![](linked-list-allocator-merge-on-dealloc.svg)

Like before, two of the three allocations are freed in line `2`. Instead of keeping the fragmented heap, we now perform an additional step in line `2a` to merge the two rightmost blocks back together. In line `3`, the third allocation is freed (like before), resulting in a completely unused heap represented by three distinct blocks. In an additional merging step in line `3a`, we then merge the three adjacent blocks back together.

The `linked_list_allocator` crate implements this merging strategy in the following way: Instead of inserting freed memory blocks at the beginning of the linked list on `deallocate`, it always keeps the list sorted by start address. This way, merging can be performed directly on the `deallocate` call by examining the addresses and sizes of the two neighboring blocks in the list. Of course, the deallocation operation is slower this way, but it prevents the heap fragmentation we saw above.
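To make the merging idea more tangible, here is a small hosted model of a free list that is kept sorted by start address. It is only a sketch under simplifying assumptions: the free regions live in a `Vec` instead of inside the freed memory itself, and the `Region` type and the `insert_and_merge` function are hypothetical names for this illustration, not the API or the actual code of the `linked_list_allocator` crate.

```rust
/// A free region described by its start address and size in bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Region {
    start: usize,
    size: usize,
}

impl Region {
    /// Address of the first byte after the region.
    fn end(&self) -> usize {
        self.start + self.size
    }
}

/// Inserts `freed` into `free_list`, which must be sorted by start address
/// and free of overlaps, and merges it with directly adjacent neighbors.
fn insert_and_merge(free_list: &mut Vec<Region>, freed: Region) {
    // Position of the first region that starts at or after the freed one.
    let idx = free_list.partition_point(|r| r.start < freed.start);
    free_list.insert(idx, freed);

    // Merge with the following region if the two touch.
    if idx + 1 < free_list.len() && free_list[idx].end() == free_list[idx + 1].start {
        let next_size = free_list[idx + 1].size;
        free_list[idx].size += next_size;
        free_list.remove(idx + 1);
    }

    // Merge with the preceding region if the two touch.
    if idx > 0 && free_list[idx - 1].end() == free_list[idx].start {
        let merged_size = free_list[idx].size;
        free_list[idx - 1].size += merged_size;
        free_list.remove(idx);
    }
}
```

Only the two direct neighbors of the inserted region need to be examined, which is the benefit of keeping the list sorted. The price is that finding the insert position is no longer a constant-time push to the front of the list.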
#### Performance

As we learned above, the bump allocator is extremely fast and can be optimized to just a few assembly operations. The linked list allocator performs much worse in this category. The problem is that an allocation request might need to traverse the complete linked list until it finds a suitable block.

Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience relatively fast allocation performance. For a program that fragments the heap with many allocations, however, the allocation performance will be very bad because the linked list will be very long and mostly contain very small blocks.

It's worth noting that this performance issue isn't a problem caused by our basic implementation but a fundamental problem of the linked list approach. Since allocation performance can be very important for kernel-level code, we explore a third allocator design in the following that trades improved performance for reduced memory utilization.

## Fixed-Size Block Allocator

In the following, we present an allocator design that uses fixed-size memory blocks for fulfilling allocation requests. This way, the allocator often returns blocks that are larger than needed for allocations, which results in wasted memory due to [internal fragmentation][fragmentación interna]. On the other hand, it drastically reduces the time required to find a suitable block (compared to the linked list allocator), resulting in much better allocation performance.

[fragmentación interna]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#Internal_fragmentation
### Introduction

The idea behind a _fixed-size block allocator_ is the following: Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size. For example, with block sizes of 16, 64, and 512 bytes, an allocation of 4 bytes would return a 16-byte block, an allocation of 48 bytes a 64-byte block, and an allocation of 128 bytes a 512-byte block.

Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. However, instead of using a single list with different block sizes, we create a separate list for each size class. Each list then only stores blocks of a single size. For example, with block sizes of 16, 64, and 512, there would be three separate linked lists in memory:

![](fixed-size-block-example.svg).

Instead of a single `head` pointer, we have the three head pointers `head_16`, `head_64`, and `head_512` that each point to the first unused block of the corresponding size. All nodes in a single list have the same size. For example, the list started by the `head_16` pointer only contains 16-byte blocks. This means that we no longer need to store the size in each list node since it is already specified by the name of the head pointer.

Since each element in a list has the same size, each list element is equally suitable for an allocation request. This means that we can very efficiently perform an allocation using the following steps:

- Round up the requested allocation size to the next block size. For example, when an allocation of 12 bytes is requested, we would choose the block size of 16 in the above example.
- Retrieve the head pointer for the list, e.g., for block size 16, we need to use `head_16`.
- Remove the first block from the list and return it.

Most notably, we can always return the first element of the list and no longer need to traverse the full list. Thus, allocations are much faster than with the linked list allocator.

#### Block Sizes and Wasted Memory

Depending on the block sizes, we lose a lot of memory by rounding up. For example, when a 512-byte block is returned for a 128-byte allocation, three-quarters of the allocated memory is unused. By defining reasonable block sizes, it is possible to limit the amount of wasted memory to some degree. For example, when using the powers of 2 (4, 8, 16, 32, 64, 128, …) as block sizes, we can limit the memory waste to half of the allocation size in the worst case and a quarter of the allocation size in the average case.

It is also common to optimize block sizes based on common allocation sizes in a program. For example, we could additionally add a block size of 24 to improve memory usage for programs that often perform allocations of 24 bytes. This way, the amount of wasted memory can often be reduced without losing the performance benefits.

#### Deallocation

Much like allocation, deallocation is also very performant. It involves the following steps:

- Round up the freed allocation size to the next block size. This is required since the compiler only passes the requested allocation size to `dealloc`, not the size of the block that was returned by `alloc`. By using the same size-adjustment function in both `alloc` and `dealloc`, we can make sure that we always free the correct amount of memory.
- Retrieve the head pointer for the list.
- Add the freed block to the front of the list by updating the head pointer.
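The allocation and deallocation steps can be modeled on a hosted target as follows. This is a deliberately simplified sketch, not the implementation developed later in this post: each free list is a `Vec` of block addresses instead of a linked list stored in the free blocks themselves, the block sizes are the 16/64/512 example values from above, and the names `EXAMPLE_BLOCK_SIZES` and `SizeClassModel` are hypothetical.

```rust
/// Block sizes from the example above (an assumption of this sketch).
const EXAMPLE_BLOCK_SIZES: [usize; 3] = [16, 64, 512];

/// Hosted model of a fixed-size block allocator: one stack of free block
/// addresses per size class stands in for the in-memory linked lists.
struct SizeClassModel {
    free_lists: [Vec<usize>; 3],
}

impl SizeClassModel {
    /// Rounds up to the next block size and returns the index of its list.
    /// Returns `None` if the size exceeds the largest block size.
    fn class_index(size: usize) -> Option<usize> {
        EXAMPLE_BLOCK_SIZES.iter().position(|&s| s >= size)
    }

    /// Removes the first free block of the matching size class, if any.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        let idx = Self::class_index(size)?;
        self.free_lists[idx].pop()
    }

    /// Puts the block back at the front of the list of its size class.
    fn dealloc(&mut self, addr: usize, size: usize) {
        // The same rounding as in `alloc` selects the same list.
        let idx = Self::class_index(size).expect("size exceeds largest block size");
        self.free_lists[idx].push(addr);
    }
}
```

Both operations only touch the end of a single `Vec`, which corresponds to the head of the respective linked list. Neither of them needs to look at any other free block.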
Most notably, no traversal of the list is required for deallocation either. This means that the time required for a `dealloc` call stays the same regardless of the list length.

#### Fallback Allocator

Given that large allocations (>2 KB) are often rare, especially in operating system kernels, it might make sense to fall back to a different allocator for these allocations. For example, we could fall back to a linked list allocator for allocations greater than 2048 bytes in order to reduce memory waste. Since only very few allocations of that size are expected, the linked list would stay small and the (de)allocations would still be reasonably fast.

#### Creating New Blocks

Above, we always assumed that there are enough blocks of a specific size in the list to fulfill all allocation requests. However, at some point, the linked list for a given block size becomes empty. At this point, there are two ways we can create new unused blocks of a specific size to fulfill an allocation request:

- Allocate a new block from the fallback allocator (if there is one).
- Split a larger block from a different list. This works best if block sizes are powers of two. For example, a 32-byte block can be split into two 16-byte blocks.

For our implementation, we will allocate new blocks from the fallback allocator since the implementation is much simpler.

### Implementation

Now that we know how a fixed-size block allocator works, we can start our implementation. We won't depend on the implementation of the linked list allocator created in the previous section, so you can follow this part even if you skipped the linked list allocator implementation.
#### List Node

We start our implementation by creating a `ListNode` type in a new `allocator::fixed_size_block` module:

```rust
// in src/allocator.rs

pub mod fixed_size_block;
```

```rust
// in src/allocator/fixed_size_block.rs

struct ListNode {
    next: Option<&'static mut ListNode>,
}
```

This type is similar to the `ListNode` type of our [linked list allocator implementation], with the difference that we don't have a `size` field. It isn't needed because every block in a list has the same size with the fixed-size block allocator design.

[linked list allocator implementation]: #el-tipo-de-allocador

#### Block Sizes

Next, we define a constant `BLOCK_SIZES` slice with the block sizes used for our implementation:

```rust
// in src/allocator/fixed_size_block.rs

/// The block sizes to use.
///
/// The sizes must each be a power of 2 because they are also used as
/// the block alignment (alignments must always be powers of 2).
const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];
```

As block sizes, we use powers of 2, starting from 8 up to 2048. We don't define any block sizes smaller than 8 because each block must be capable of storing a 64-bit pointer to the next block when freed. For allocations greater than 2048 bytes, we will fall back to a linked list allocator.

To simplify the implementation, we define the size of a block as its required alignment in memory. So a 16-byte block is always aligned on a 16-byte boundary and a 512-byte block is aligned on a 512-byte boundary. Since alignments always need to be powers of 2, this rules out any other block sizes.
If we need block sizes that are not powers of 2 in the future, we can still adjust our implementation for this (e.g., by defining a second `BLOCK_ALIGNMENTS` array).

#### The Allocator Type

Using the `ListNode` type and the `BLOCK_SIZES` slice, we can now define our allocator type:

```rust
// in src/allocator/fixed_size_block.rs

pub struct FixedSizeBlockAllocator {
    list_heads: [Option<&'static mut ListNode>; BLOCK_SIZES.len()],
    fallback_allocator: linked_list_allocator::Heap,
}
```

The `list_heads` field is an array of `head` pointers, one for each block size. This is implemented by using the `len()` of the `BLOCK_SIZES` slice as the array length. As a fallback allocator for allocations larger than the largest block size, we use the allocator provided by the `linked_list_allocator` crate. We could also use the `LinkedListAllocator` we implemented ourselves, but it has the disadvantage that it does not [merge freed blocks].

[merge freed blocks]: #fusionando-bloques-liberados

To construct a `FixedSizeBlockAllocator`, we provide the same `new` and `init` functions that we implemented for the other allocator types too:

```rust
// in src/allocator/fixed_size_block.rs

impl FixedSizeBlockAllocator {
    /// Creates an empty FixedSizeBlockAllocator.
    pub const fn new() -> Self {
        const EMPTY: Option<&'static mut ListNode> = None;
        FixedSizeBlockAllocator {
            list_heads: [EMPTY; BLOCK_SIZES.len()],
            fallback_allocator: linked_list_allocator::Heap::empty(),
        }
    }

    /// Initializes the allocator with the given heap bounds.
    ///
    /// This function is unsafe because the caller must guarantee that the given
    /// heap bounds are valid and that the heap is unused. This method must be
    /// called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        self.fallback_allocator.init(heap_start, heap_size);
    }
}
```

The `new` function just initializes the `list_heads` array with empty nodes and creates an [`empty`] linked list allocator as `fallback_allocator`. The `EMPTY` constant is needed to tell the Rust compiler that we want to initialize the array with a constant value. Initializing the array directly as `[None; BLOCK_SIZES.len()]` does not work, because then the compiler requires `Option<&'static mut ListNode>` to implement the `Copy` trait, which it does not. This is a current limitation of the Rust compiler, which might go away in the future.

[`empty`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.empty

If you haven't done so already for the `LinkedListAllocator` implementation, you also need to add **`#![feature(const_mut_refs)]`** to the top of your `lib.rs`. The reason is that any use of mutable reference types in const functions is still unstable, including the `Option<&'static mut ListNode>` array element type of the `list_heads` field (even if we set it to `None`).

The unsafe `init` function only calls the [`init`] function of the `fallback_allocator` without doing any additional initialization of the `list_heads` array. Instead, we will initialize the lists lazily on `alloc` and `dealloc` calls.

[`init`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init

For convenience, we also create a private `fallback_alloc` method that allocates using the `fallback_allocator`:

```rust
// in src/allocator/fixed_size_block.rs

use alloc::alloc::Layout;
use core::ptr;

impl FixedSizeBlockAllocator {
    /// Allocates using the fallback allocator.
    fn fallback_alloc(&mut self, layout: Layout) -> *mut u8 {
        match self.fallback_allocator.allocate_first_fit(layout) {
            Ok(ptr) => ptr.as_ptr(),
            Err(_) => ptr::null_mut(),
        }
    }
}
```

The [`Heap`] type of the `linked_list_allocator` crate does not implement [`GlobalAlloc`] (as it's [not possible without locking]). Instead, it provides an [`allocate_first_fit`] method that has a slightly different interface. Instead of returning a `*mut u8` and using a null pointer to signal an error, it returns a `Result<NonNull<u8>, ()>`. The [`NonNull`] type is an abstraction for a raw pointer that is guaranteed to not be a null pointer. By mapping the `Ok` case to the [`NonNull::as_ptr`] method and the `Err` case to a null pointer, we can easily translate this back to a `*mut u8` type.

[`Heap`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html
[not possible without locking]: #globalalloc-y-mutabilidad
[`allocate_first_fit`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.allocate_first_fit
[`NonNull`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html
[`NonNull::as_ptr`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html#method.as_ptr

#### Calculating the List Index

Before we implement the `GlobalAlloc` trait, we define a `list_index` helper function that returns the lowest possible block size for a given [`Layout`]:

```rust
// in src/allocator/fixed_size_block.rs

/// Choose an appropriate block size for the given layout.
///
/// Returns an index into the `BLOCK_SIZES` array.
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}
```

The block must have at least the size and alignment required by the given `Layout`.
Since we defined that the block size is also its alignment, this means that the `required_block_size` is the [maximum] of the layout's [`size()`] and [`align()`] attributes. To find the next-larger block in the `BLOCK_SIZES` slice, we first use the [`iter()`] method to get an iterator and then the [`position()`] method to find the index of the first block that is at least as large as the `required_block_size`.

[maximum]: https://doc.rust-lang.org/core/cmp/trait.Ord.html#method.max
[`size()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.size
[`align()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align
[`iter()`]: https://doc.rust-lang.org/std/primitive.slice.html#method.iter
[`position()`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.position

Note that we don't return the block size itself, but the index into the `BLOCK_SIZES` slice. The reason is that we want to use the returned index as an index into the `list_heads` array.

#### Implementing `GlobalAlloc`

The last step is to implement the `GlobalAlloc` trait:

```rust
// in src/allocator/fixed_size_block.rs

use super::Locked;
use alloc::alloc::GlobalAlloc;

unsafe impl GlobalAlloc for Locked<FixedSizeBlockAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        todo!();
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        todo!();
    }
}
```

Like for the other allocators, we don't implement the `GlobalAlloc` trait directly for our allocator type, but use the [`Locked`] wrapper to add synchronized interior mutability. Since the `alloc` and `dealloc` implementations are relatively large, we introduce them one by one in the following.
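Before turning to the two methods, it may help to see which list a few concrete layouts end up in. The snippet below is a small sketch that repeats `BLOCK_SIZES` and `list_index` so that it compiles on its own. It uses `std::alloc::Layout` on the assumption that it runs as a normal user-space program rather than in the kernel, and the `chosen_block_size` helper is hypothetical.

```rust
use std::alloc::Layout;

const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];

/// Same logic as the `list_index` helper: the block must satisfy both
/// the size and the alignment of the layout.
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}

/// Hypothetical helper: the block size that would serve a request with the
/// given size and alignment, or `None` if the request would go to the
/// fallback allocator (or if the size/alignment pair is not a valid layout).
fn chosen_block_size(size: usize, align: usize) -> Option<usize> {
    let layout = Layout::from_size_align(size, align).ok()?;
    list_index(&layout).map(|index| BLOCK_SIZES[index])
}
```

With these definitions, a 12-byte request with 4-byte alignment is served from the 16-byte list, a 4-byte request with 64-byte alignment needs a 64-byte block because of its alignment, and anything above 2048 bytes yields `None` and is handed to the fallback allocator.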
##### `alloc`

The implementation of the `alloc` method looks like this:

```rust
// in `impl` block in src/allocator/fixed_size_block.rs

unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
    let mut allocator = self.lock();
    match list_index(&layout) {
        Some(index) => {
            match allocator.list_heads[index].take() {
                Some(node) => {
                    allocator.list_heads[index] = node.next.take();
                    node as *mut ListNode as *mut u8
                }
                None => {
                    // no block exists in list => allocate new block
                    let block_size = BLOCK_SIZES[index];
                    // only works if all block sizes are a power of 2
                    let block_align = block_size;
                    let layout = Layout::from_size_align(block_size, block_align)
                        .unwrap();
                    allocator.fallback_alloc(layout)
                }
            }
        }
        None => allocator.fallback_alloc(layout),
    }
}
```

Let's go through it step by step:

First, we use the `Locked::lock` method to get a mutable reference to the wrapped allocator instance. Next, we call the `list_index` function we just defined to calculate the appropriate block size for the given layout and get the corresponding index into the `list_heads` array. If this index is `None`, no block size fits the allocation, so we use the `fallback_allocator` through the `fallback_alloc` function.

If the list index is `Some`, we try to remove the first node in the corresponding list started by `list_heads[index]` using the [`Option::take`] method. If the list is not empty, we enter the `Some(node)` branch of the `match` statement, where we point the head pointer of the list to the successor of the removed `node` (by using [`take`][`Option::take`] again). Finally, we return the removed `node` pointer as a `*mut u8`.

[`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take

If the list head is `None`, it indicates that the list of blocks is empty. This means that we need to construct a new block as [described above](#creando-nuevos-bloques).
For that, we first get the current block size from the `BLOCK_SIZES` slice and use it as both the size and the alignment for the new block. Then we create a new `Layout` from it and call the `fallback_alloc` method to perform the allocation. The reason for adjusting the layout and alignment is that the block will be added to the block list on deallocation.

##### `dealloc`

The implementation of the `dealloc` method looks like this:

```rust
// in src/allocator/fixed_size_block.rs

use core::{mem, ptr::

================================================
FILE: blog/content/edition-2/posts/11-allocator-designs/index.ja.md
================================================
+++
title = "Allocator Designs"
weight = 11
path = "allocator-designs/ja"
date = 2020-01-20

[extra]
# Please update this when updating the translation
translation_based_on_commit = "2e3230eca2275226ec33c2dfe7f98f2f4b9a48b4"
# GitHub usernames of the people that translated this post
translators = ["swnakamura"]
+++

This post explains how to implement heap allocators from scratch. It presents and discusses different allocator designs, including bump allocation, linked list allocation, and fixed-size block allocation. For each of the three designs, we will create a basic implementation that can be used for our kernel.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there (translator's note: the link points to the original English repository). You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-11`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-11

## Introduction

In the [previous post], we added basic support for heap allocations to our kernel. For that, we [created a new memory region][map-heap] in the page tables and [used the `linked_list_allocator` crate][use-alloc-crate] to manage that memory. While we have a working heap now, we left most of the work to the allocator crate without trying to understand how it works.

[previous post]: @/edition-2/posts/10-heap-allocation/index.ja.md
[map-heap]: @/edition-2/posts/10-heap-allocation/index.ja.md#creating-a-kernel-heap
[use-alloc-crate]: @/edition-2/posts/10-heap-allocation/index.ja.md#aroketakuretowoshi-u
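This post shows how to create our own heap allocator from scratch instead of relying on an existing allocator crate. We will discuss different allocator designs, including a simplistic **bump allocator** and a basic **fixed-size block allocator**, and use this knowledge to implement an allocator with better performance (compared to the `linked_list_allocator` crate).

### Design Goals

The responsibility of an allocator is to manage the available heap memory. It needs to return unused memory on `alloc` calls and keep track of memory freed by `dealloc` so that it can be reused. Most importantly, it must never hand out memory that is already in use somewhere else because this would cause undefined behavior.

Apart from correct memory management, there are many secondary design goals. For example, the allocator should use the available memory effectively and keep [**fragmentation**][_fragmentation_] low. Furthermore, it should work well for concurrent applications and scale to any number of processors. For maximal performance, it could even optimize the memory layout with respect to the CPU caches to improve [cache locality] and avoid [false sharing].

[cache locality]: https://www.geeksforgeeks.org/locality-of-reference-and-cache-operation-in-cache-memory/
[_fragmentation_]: https://en.wikipedia.org/wiki/Fragmentation_(computing)
[false sharing]: https://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html

These requirements can make good allocators very complex. For example, [jemalloc] has over 30,000 lines of code. This complexity is often undesired in kernel code, where a single bug can lead to severe security vulnerabilities. Fortunately, the allocation patterns of kernel code are often much simpler than those of userspace code, so relatively simple allocator designs often suffice.

[jemalloc]: http://jemalloc.net/

In the following, we present three possible kernel allocator designs and explain their advantages and drawbacks.

## Bump Allocator

The simplest allocator design is a **bump allocator** (also known as a **stack allocator**). It allocates memory linearly and only keeps track of the number of allocated bytes and the number of allocations. It is only useful in very specific use cases because it has a severe limitation: it can only free all memory at once.

### Idea

The idea behind a bump allocator is to allocate memory linearly by increasing ("bumping") a `next` variable, which points to the start of the unused memory. At the beginning, `next` is equal to the start address of the heap. On each allocation, `next` is increased by the allocation size so that it always points to the boundary between used and unused memory:

![The heap memory area at three points in time: 1: A single allocation exists at the start of the heap; the `next` pointer points to its end. 2: A second allocation was added right after the first one; the `next` pointer points to the end of the second allocation. 3: A third allocation was added right after the second one; the `next` pointer points to the end of the third allocation.](bump-allocation.svg)

The `next` pointer only moves in a single direction and thus never hands out the same memory region twice. When it reaches the end of the heap, no more memory can be allocated, resulting in an out-of-memory error on the next allocation.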
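The idea can be sketched with plain integer addresses. The following is not the kernel implementation developed later in this post, only an illustration that assumes an ordinary `std` environment and ignores alignment. The `BumpSketch` type and its methods are names invented for this example.

```rust
/// Minimal model of the bump idea: addresses are plain integers.
struct BumpSketch {
    heap_end: usize,
    next: usize,
}

impl BumpSketch {
    fn new(heap_start: usize, heap_size: usize) -> Self {
        BumpSketch {
            heap_end: heap_start + heap_size,
            next: heap_start,
        }
    }

    /// Hands out `size` bytes starting at `next`, then bumps `next`.
    /// Returns `None` when the request does not fit into the remaining heap.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        let start = self.next;
        let end = start.checked_add(size)?;
        if end > self.heap_end {
            return None; // out of memory
        }
        self.next = end;
        Some(start)
    }
}
```

Every successful call returns the old value of `next`, so two allocations never overlap. Nothing in this model gives memory back; the allocation counter described next is what allows the heap to be reused.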
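A bump allocator is often implemented with an allocation counter, which is increased by 1 on each `alloc` call and decreased by 1 on each `dealloc` call. When the allocation counter reaches zero, it means that all allocations on the heap have been deallocated. In this case, the `next` pointer can be reset to the start address of the heap, so that the complete heap memory is available for allocations again.

### Implementation

We start our implementation by declaring a new `allocator::bump` submodule:

```rust
// in src/allocator.rs

pub mod bump;
```

The content of the submodule lives in a new `src/allocator/bump.rs` file, which we create with the following content:

```rust
// in src/allocator/bump.rs

pub struct BumpAllocator {
    heap_start: usize,
    heap_end: usize,
    next: usize,
    allocations: usize,
}

impl BumpAllocator {
    /// Creates a new empty bump allocator.
    pub const fn new() -> Self {
        BumpAllocator {
            heap_start: 0,
            heap_end: 0,
            next: 0,
            allocations: 0,
        }
    }

    /// Initializes the bump allocator with the given heap bounds.
    ///
    /// This method is unsafe because the caller must ensure that the given
    /// memory range is unused. Also, this method must be called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        self.heap_start = heap_start;
        self.heap_end = heap_start + heap_size;
        self.next = heap_start;
    }
}
```

The `heap_start` and `heap_end` fields keep track of the lower and upper bounds of the heap memory region. The caller needs to ensure that these addresses are valid, otherwise the allocator would return invalid memory. For this reason, the `init` function needs to be `unsafe` to call.

The purpose of the `next` field is to always point to the first unused byte of the heap, i.e., the start address of the next allocation. It is set to `heap_start` in the `init` function because at the beginning, the entire heap is unused. On each allocation, this field is increased by the allocation size ("bumped") to ensure that we don't return the same memory region twice.

The `allocations` field is a simple counter for the active allocations, with the goal of resetting the allocator after the last allocation has been freed. It is initialized with 0.

We chose to create a separate `init` function instead of performing the initialization directly in `new` in order to keep the interface identical to the allocator provided by the `linked_list_allocator` crate. This way, the allocators can be switched without additional code changes.

### Implementing `GlobalAlloc`

As [explained in the previous post][global-alloc], all heap allocators need to implement the [`GlobalAlloc`] trait, which is defined like this:

[global-alloc]: @/edition-2/posts/10-heap-allocation/index.ja.md#aroketaintahuesu
[`GlobalAlloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html

```rust
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ...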
    }
    unsafe fn realloc(
        &self,
        ptr: *mut u8,
        layout: Layout,
        new_size: usize
    ) -> *mut u8 { ... }
}
```

Only the `alloc` and `dealloc` methods are required; the other two methods have default implementations and can be omitted.

#### A First Implementation Attempt

Let's try to implement the `alloc` method for our `BumpAllocator`:

```rust
// in src/allocator/bump.rs

use alloc::alloc::{GlobalAlloc, Layout};

unsafe impl GlobalAlloc for BumpAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // TODO alignment and bounds check
        let alloc_start = self.next;
        self.next = alloc_start + layout.size();
        self.allocations += 1;
        alloc_start as *mut u8
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        todo!();
    }
}
```

First, we use the `next` field as the start address for our allocation. Then we update the `next` field to point to the end address of the allocation, which is the next unused address on the heap. Before returning the start address of the allocation as a `*mut u8` pointer, we increase the `allocations` counter by 1.

Note that we don't perform any bounds checks or alignment adjustments, so this implementation is not safe yet. This does not matter much because it fails to compile anyway with the following error:

```
error[E0594]: cannot assign to `self.next` which is behind a `&` reference
  --> src/allocator/bump.rs:29:9
   |
29 |         self.next = alloc_start + layout.size();
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `self` is a `&` reference, so the data it refers to cannot be written
```

(The same error also occurs for the `self.allocations += 1` line. We omitted it here for brevity.)

The error occurs because the [`alloc`] and [`dealloc`] methods of the `GlobalAlloc` trait only operate on an immutable `&self` reference, so updating the `next` and `allocations` fields is not possible. This is problematic because updating `next` on every allocation is the essential principle of a bump allocator.

[`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc
[`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc

#### `GlobalAlloc` and Mutability

Before we look at a possible solution to this mutability problem, let's try to understand why the `GlobalAlloc` trait methods are defined with `&self` arguments. As we saw [in the previous post][global-allocator], the global heap allocator is defined by adding the `#[global_allocator]` attribute to a `static` that implements the `GlobalAlloc` trait. Static variables are immutable in Rust, so there is no way to call a method that takes `&mut self` on the static allocator. For this reason, all the methods of `GlobalAlloc` only take an immutable `&self` reference.

[global-allocator]: @/edition-2/posts/10-heap-allocation/index.ja.md#global-allocator-shu-xing

Fortunately, there is a way to get a `&mut self` reference from a `&self` reference: we can use synchronized [interior mutability] by wrapping the allocator in a [`spin::Mutex`] spinlock. This type provides a `lock` method that performs [mutual exclusion] and thus safely turns a `&self` reference into a `&mut self` reference. We have already used this wrapper type multiple times in our kernel, for example for the [VGA text buffer][vga-mutex].

[interior mutability]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html
[vga-mutex]: @/edition-2/posts/03-vga-text-buffer/index.ja.md#supinrotuku
[`spin::Mutex`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html
[mutual exclusion]: https://en.wikipedia.org/wiki/Mutual_exclusion

#### A `Locked` Wrapper Type

With the help of the `spin::Mutex` wrapper type, we can implement the `GlobalAlloc` trait for our bump allocator. The trick is to implement the trait not for the `BumpAllocator` directly, but for the wrapped `spin::Mutex<BumpAllocator>` type:

```rust
unsafe impl GlobalAlloc for spin::Mutex<BumpAllocator> {…}
```

Unfortunately, this still doesn't work because the Rust compiler does not permit trait implementations for types defined in other crates:

```
error[E0117]: only traits defined in the current crate can be implemented for arbitrary types
  --> src/allocator/bump.rs:28:1
   |
28 | unsafe impl GlobalAlloc for spin::Mutex<BumpAllocator> {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^--------------------------
   | |                           |
   | |                           `spin::mutex::Mutex` is not defined in the current crate
   | impl doesn't use only types from inside the current crate
   |
   = note: define and implement a trait or new type instead
```

To fix this, we need to create our own wrapper type around `spin::Mutex`:

```rust
// in src/allocator.rs

/// A wrapper around spin::Mutex to permit trait implementations.
pub struct Locked<A> {
    inner: spin::Mutex<A>,
}

impl<A> Locked<A> {
    pub const fn new(inner: A) -> Self {
        Locked {
            inner: spin::Mutex::new(inner),
        }
    }

    pub fn lock(&self) -> spin::MutexGuard<A> {
        self.inner.lock()
    }
}
```

The type is a generic wrapper around a `spin::Mutex<A>`. It imposes no restrictions on the wrapped type `A`, so it can be used to wrap all kinds of types, not just allocators. It provides a simple `new` constructor function that wraps a given value. For convenience, it also provides a `lock` function that calls `lock` on the wrapped `Mutex`. Since the `Locked` type is general enough to be useful for other allocator implementations too, we put it in the parent `allocator` module.

#### Implementation for `Locked<BumpAllocator>`

The `Locked` type is defined in our own crate (in contrast to `spin::Mutex`), so we can use it to implement the `GlobalAlloc` trait for our bump allocator. The full implementation looks like this:

```rust
// in src/allocator/bump.rs

use super::{align_up, Locked};
use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr;

unsafe impl GlobalAlloc for Locked<BumpAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let mut bump = self.lock(); // get a mutable reference

        let alloc_start = align_up(bump.next, layout.align());
        let alloc_end = match alloc_start.checked_add(layout.size()) {
            Some(end) => end,
            None => return ptr::null_mut(),
        };

        if alloc_end > bump.heap_end {
            ptr::null_mut() // out of memory
        } else {
            bump.next = alloc_end;
            bump.allocations += 1;
            alloc_start as *mut u8
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        let mut bump = self.lock(); // get a mutable reference

        bump.allocations -= 1;
        if bump.allocations == 0 {
            bump.next = bump.heap_start;
        }
    }
}
```

Both `alloc` and `dealloc` first call the [`Mutex::lock`] method through the `inner` field to get a mutable reference to the wrapped allocator type. The instance remains locked until the end of the method, so no data race can occur in multithreaded contexts (we will add threading support soon).

[`Mutex::lock`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html#method.lock

Compared to the previous prototype, the `alloc` implementation now respects alignment requirements and performs a bounds check to ensure that the allocations stay inside the heap memory region. The first step is to round up the `next` address to the alignment specified by the `Layout` argument. The code for the `align_up` function is shown in a moment. We then add the requested allocation size to `alloc_start` to get the end address of the allocation. To prevent integer overflow on large allocations, we use the [`checked_add`] method. If an overflow occurs or if the resulting end address of the allocation is larger than the end address of the heap, we return a null pointer to signal an out-of-memory situation. Otherwise, we update the `next` address and increase the `allocations` counter by 1 like before. Finally, we return the `alloc_start` address converted to a `*mut u8` pointer.

[`checked_add`]: https://doc.rust-lang.org/std/primitive.usize.html#method.checked_add
[`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html

The `dealloc` function ignores the given pointer and `Layout` arguments. Instead, it just decreases the `allocations` counter. If the counter reaches `0` again, it means that all allocations have been freed again. In this case, it resets the `next` address to the `heap_start` address to make the complete heap memory available again.

#### Address Alignment

The `align_up` function is general enough that we can put it into the parent `allocator` module. A basic implementation looks like this:

```rust
// in src/allocator.rs

/// Align the given address `addr` upwards to alignment `align`.
fn align_up(addr: usize, align: usize) -> usize {
    let remainder = addr % align;
    if remainder == 0 {
        addr // addr already aligned
    } else {
        addr - remainder + align
    }
}
```

The function first computes the [remainder] of the division of `addr` by `align`. If the remainder is `0`, the address is already aligned to the given alignment. Otherwise, we align the address by subtracting the remainder (so that the new remainder is 0) and then adding the alignment (so that the address does not become smaller than the original address).

[remainder]: https://en.wikipedia.org/wiki/Euclidean_division

Note that this isn't the most efficient way to implement this function. A much faster implementation looks like this:

```rust
/// Align the given address `addr` upwards to alignment `align`.
///
/// Requires that `align` is a power of two.
fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}
```

This method requires `align` to be a power of two, which is guaranteed when using the `GlobalAlloc` trait (and its [`Layout`] parameter). This makes it possible to create a [bitmask] to align the address in a very efficient way. To understand how it works, let's go through it step by step, starting on the right side:

[`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html
[bitmask]: https://en.wikipedia.org/wiki/Mask_(computing)

- Since `align` is a power of two, its [binary representation] has only a single bit set (e.g. `0b000100000`). This means that `align - 1` has all the lower bits set (e.g. `0b000011111`).
- By creating the [bitwise `NOT`] through the `!` operator, we get a number that has all the bits set except for the bits lower than `align` (e.g. `0b…111111111100000`).
- By performing a [bitwise `AND`] on an address and `!(align - 1)`, we align the address **downwards**. This works by clearing all the bits that are lower than `align`.
- Since we want to align upwards instead of downwards, we increase `addr` by `align - 1` before performing the bitwise `AND`. This way, already aligned addresses remain the same while non-aligned addresses are rounded up to the next alignment boundary.

[binary representation]: https://en.wikipedia.org/wiki/Binary_number#Representation
[bitwise `NOT`]: https://en.wikipedia.org/wiki/Bitwise_operation#NOT
[bitwise `AND`]: https://en.wikipedia.org/wiki/Bitwise_operation#AND

Which variant you choose is up to you. Both compute the same result, only using different methods.

### Using It

To use the bump allocator instead of the `linked_list_allocator` crate, we need to update the `ALLOCATOR` static in `allocator.rs`:

```rust
// in src/allocator.rs

use bump::BumpAllocator;

#[global_allocator]
static ALLOCATOR: Locked<BumpAllocator> = Locked::new(BumpAllocator::new());
```

Here it becomes important that we declared `BumpAllocator::new` and `Locked::new` as [`const` functions]. If they were normal functions, a compilation error would occur because the initialization expression of a `static` must be evaluable at compile time.

[`const` functions]:
https://doc.rust-lang.org/reference/items/functions.html#const-functions

We don't need to change the `ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE)` call in our `init_heap` function because the bump allocator provides the same interface as the allocator provided by the `linked_list_allocator` crate.

Now our kernel uses our bump allocator! Everything should still work, including the [`heap_allocation` tests] that we created in the previous post:

[`heap_allocation` tests]: @/edition-2/posts/10-heap-allocation/index.ja.md#tesutowozhui-jia-suru

```
> cargo test --test heap_allocation
[…]
Running 3 tests
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
```

### Discussion

The big advantage of bump allocation is that it's very fast. Compared to other allocator designs (see below) that need to actively look for a fitting memory block and perform various bookkeeping tasks on `alloc` and `dealloc`, a bump allocator [can be optimized][bump downwards] to just a few assembly instructions. This makes bump allocators useful for maximizing allocation performance, for example when creating a [virtual DOM library].

[bump downwards]: https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html
[virtual DOM library]: https://hacks.mozilla.org/2019/03/fast-bump-allocated-virtual-doms-with-rust-and-wasm/

While a bump allocator is seldom used as the global allocator, the principle of bump allocation is often applied in the form of [arena allocation], which basically batches individual allocations together to improve performance. An example of an arena allocator for Rust is contained in the [`toolshed`] crate.

[arena allocation]: https://mgravell.github.io/Pipelines.Sockets.Unofficial/docs/arenas.html
[`toolshed`]: https://docs.rs/toolshed/0.8.1/toolshed/index.html

#### The Drawback of a Bump Allocator

The main limitation of a bump allocator is that it can only reuse deallocated memory after all allocations have been freed. This means that a single long-lived allocation suffices to prevent memory reuse. We can see this when we add a variation of the `many_boxes` test:

```rust
// in tests/heap_allocation.rs

#[test_case]
fn many_boxes_long_lived() {
    let long_lived = Box::new(1); // new
    for i in 0..HEAP_SIZE {
        let x = Box::new(i);
        assert_eq!(*x, i);
    }
    assert_eq!(*long_lived, 1); // new
}
```

Like the `many_boxes` test, this test creates a large number of allocations to provoke an out-of-memory failure if the allocator does not reuse freed memory. Additionally, the test creates a `long_lived` allocation, which lives for the whole loop execution.

When we try to run our new test, we see that it indeed fails:

```
> cargo test --test heap_allocation
Running 4 tests
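simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [failed]

Error: panicked at 'allocation error: Layout { size_: 8, align_: 8 }', src/lib.rs:86:5
```

Let's try to understand in detail why this failure occurs. First, the `long_lived` allocation is created at the start of the heap, thereby increasing the `allocations` counter by 1. For each iteration of the loop, a short-lived allocation is created and directly freed again before the next iteration starts. This means that the `allocations` counter is temporarily increased to 2 at the beginning of an iteration and decreased to 1 at the end of it. The problem is that the bump allocator can only reuse memory after **all** allocations have been freed, i.e., when the `allocations` counter falls to 0. Since this doesn't happen during the loop, each loop iteration allocates a new region of memory, leading to an out-of-memory error after a large number of iterations.

#### Fixing the Test?

There are two potential tricks that we could use to fix the test for our bump allocator:

- We could update `dealloc` to check whether the freed allocation was the last allocation returned by `alloc` by comparing its end address with the `next` pointer. If they are equal, we can safely reset `next` back to the start address of the freed allocation. This way, each loop iteration reuses the same memory block.
- We could add an `alloc_back` method that allocates memory from the **end** of the heap using an additional `next_back` field. Then we could manually use this allocation method for all long-lived allocations, thereby separating short-lived and long-lived allocations on the heap. Note that this separation only works if it's clear beforehand how long each allocation will live. Another drawback of this approach is that manually performing allocations is cumbersome and potentially unsafe.

While both of these approaches make the test pass, they are not a general solution since they are only able to reuse memory in very specific cases. The question is: is there a general solution that reuses **all** freed memory?

#### Reusing All Freed Memory?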
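As we learned [in the previous post][heap-intro], allocations can live arbitrarily long and can be freed in an arbitrary order. This means that we need to keep track of a potentially unbounded number of non-contiguous, unused memory regions, as illustrated by the following example:

[heap-intro]: @/edition-2/posts/10-heap-allocation/index.ja.md#dong-de-dainamituku-memori

![](allocation-fragmentation.svg)

The graphic shows the heap over the course of time. At the beginning, the complete heap is unused, and the `next` address is equal to `heap_start` (line 1). Then the first allocation occurs (line 2). In line 3, a second memory block is allocated and the first allocation is freed. Many more allocations are added in line 4. Half of them are very short-lived and already get freed in line 5, where another new allocation is also added.

Line 5 shows the fundamental problem: we have five unused memory regions with different sizes, but the `next` pointer can only point to the beginning of the last region. While we could store the start addresses and sizes of the other unused memory regions in an array of size 4 for this example, this isn't a general solution since we could easily create an example with 8, 16, or 1000 unused memory regions.

Normally, when we have a potentially unbounded number of items, we can just use a heap-allocated collection. This isn't really possible in our case, since the heap allocator can't depend on itself (it would cause endless recursion or deadlocks). So we need to find a different solution.

## Linked List Allocator

A common trick to keep track of an arbitrary number of free memory areas when implementing allocators is to use these areas themselves as backing storage. This utilizes the fact that the regions are still mapped to a virtual address and backed by a physical frame, but the stored information is not needed anymore. By storing the information about the freed region in the region itself, we can keep track of an unbounded number of freed regions without needing additional memory.

The most common implementation approach is to construct a single linked list in the freed memory, with each node being a freed memory region:

![](linked-list-allocation.svg)

Each list node contains two fields: the size of the memory region and a pointer to the next unused memory region. With this approach, we only need a pointer to the first unused region (called `head`) to keep track of all unused regions, regardless of their number. The resulting data structure is often called a [_free list_].

[_free list_]: https://en.wikipedia.org/wiki/Free_list

As you might guess from the name, this is the technique that the `linked_list_allocator` crate uses. Allocators that use this technique are also often called **pool allocators**.

### Implementation

In the following, we will create our own simple `LinkedListAllocator` type that uses the above approach for keeping track of freed memory regions. This part of the post isn't required for future posts, so you can skip the implementation details if you like.

#### The Allocator Type

We start by creating a private `ListNode` struct in a new `allocator::linked_list` submodule:

```rust
// in src/allocator.rs

pub mod linked_list;
```

```rust
// in src/allocator/linked_list.rs

struct ListNode {
    size: usize,
    next: Option<&'static mut ListNode>,
}
```

Like in the graphic, a list node has a `size` field and an optional pointer to the next node, represented by the `Option<&'static mut ListNode>` type. The `&'static mut` type semantically describes an [owned] object behind a pointer. Basically, it's a [`Box`] without a destructor that frees the object at the end of the scope.

[owned]: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html
[`Box`]: https://doc.rust-lang.org/alloc/boxed/index.html

We implement the following set of methods for `ListNode`:

```rust
// in src/allocator/linked_list.rs

impl ListNode {
    const fn new(size: usize) -> Self {
        ListNode { size, next: None }
    }

    fn start_addr(&self) -> usize {
        self as *const Self as usize
    }

    fn end_addr(&self) -> usize {
        self.start_addr() + self.size
    }
}
```

The type has a simple constructor function named `new` and methods to calculate the start and end addresses of the represented region. We make the `new` function a [const function], which will be required later when constructing a static linked list allocator.

[const function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions

With the `ListNode` struct as a building block, we can now create the `LinkedListAllocator` struct:

```rust
// in src/allocator/linked_list.rs

pub struct LinkedListAllocator {
    head: ListNode,
}

impl LinkedListAllocator {
    /// Creates an empty LinkedListAllocator.
    pub const fn new() -> Self {
        Self {
            head: ListNode::new(0),
        }
    }

    /// Initializes the allocator with the given heap bounds.
    ///
    /// This function is unsafe because the caller must guarantee that the given
    /// heap bounds are valid and that the heap is unused. This method must be
    /// called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        unsafe {
            self.add_free_region(heap_start, heap_size);
        }
    }

    /// Adds the given memory region to the front of the list.
    unsafe fn add_free_region(&mut self, addr: usize, size: usize) {
        todo!();
    }
}
```

The struct contains a `head` node that points to the first heap region. We are only interested in the value of the `next` pointer, so we set the `size` to 0 in the `ListNode::new` function. Making `head` a `ListNode` instead of just a `&'static mut ListNode` has the advantage that the implementation of the `alloc` method will be simpler.

Like for the bump allocator, the `new` function doesn't initialize the allocator with the heap bounds. In addition to maintaining API compatibility, the reason is that the initialization routine requires writing a node to the heap memory, which can only happen at runtime. The `new` function, however, needs to be a [`const` function] that can be evaluated at compile time because it will be used for initializing the `ALLOCATOR` static. For this reason, we again provide a separate, non-constant `init` method.

[`const` function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions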
`init`メソッドは`add_free_region`メソッドを使っていますが、この実装はすぐ後で示します。今のところは、[`todo!`]マクロを実装の代わりに置いておいて、常にパニックするようにしておきましょう。 [`todo!`]: https://doc.rust-lang.org/core/macro.todo.html #### `add_free_region`メソッド `add_free_region`メソッドは連結リストの最も基本的な操作である**プッシュ**操作を提供します。今はこのメソッドは`init`からしか呼んでいませんが、このメソッドは私たちが`dealloc`を実装する際にも中心的な役割を果たします。`dealloc`メソッドは割り当てられたメモリ領域が解放されたときに呼ばれるのだということを思い出してください。その解放されたメモリ領域を管理するために、それを連結リストにプッシュする必要があるのです。 `add_free_region`メソッドの実装は以下のようになります: ```rust // in src/allocator/linked_list.rs use super::align_up; use core::mem; impl LinkedListAllocator { /// 与えられたメモリ領域をリストの先頭に追加する。 unsafe fn add_free_region(&mut self, addr: usize, size: usize) { // 解放された領域がListNodeを格納できることを確かめる assert_eq!(align_up(addr, mem::align_of::()), addr); assert!(size >= mem::size_of::()); // 新しいリストノードを作り、それをリストの先頭に追加する let mut node = ListNode::new(size); node.next = self.head.next.take(); let node_ptr = addr as *mut ListNode; unsafe { node_ptr.write(node); self.head.next = Some(&mut *node_ptr) } } } ``` このメソッドはメモリ領域のアドレスと大きさを引数として取り、リストの先頭にそれを追加します。まず、与えられた領域が`ListNode`を格納するのに必要なサイズとアラインメントを満たしていることを確認します。次に、ノードを作成し、それを以下のようなステップでリストに追加します: ![](linked-list-allocator-push.svg) Step 0は`add_free_region`が呼ばれる前のヒープの状態を示しています。Step 1では、`add_free_region`メソッドが図において`freed`と書かれているメモリ領域で呼ばれました。初期チェックを終えると、このメソッドは[`Option::take`]メソッドを使ってノードの`next`ポインタを現在の`head`ポインタに設定し、これによって`head`ポインタは`None`に戻ります。 [`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take Step 2では、このメソッドは新しく作られた`node`を`write`メソッドを使って解放されたメモリ領域の先頭に書き込みます。次に`head`ポインタがこの新しいノードを指すようにします。解放された領域は常にリストの先頭に挿入されていくので、結果として生じるポインタ構造はいささか混沌としているように思われますが、`head`ポインタからポインタをたどっていけば、解放されたそれぞれの領域に到達できるというのには変わりありません。 [`write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write #### `find_region`メソッド 連結リストの二つ目の基本操作は要素を探してリストからそれを取り除くことです。これは`alloc`メソッドの実装の中核となる操作です。この操作を`find_region`メソッドとして以下のように実装しましょう: ```rust // in src/allocator/linked_list.rs impl LinkedListAllocator { /// 
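    /// Looks for a free region with the given size and removes it from
    /// the list.
    ///
    /// Returns a tuple of the list node and the start address of the allocation.
    fn find_region(&mut self, size: usize, align: usize)
        -> Option<(&'static mut ListNode, usize)>
    {
        // reference to current list node, updated for each iteration
        let mut current = &mut self.head;
        // look for a large enough memory region in linked list
        while let Some(ref mut region) = current.next {
            if let Ok(alloc_start) = Self::alloc_from_region(&region, size, align) {
                // region suitable for allocation -> remove node from list
                let next = region.next.take();
                let ret = Some((current.next.take().unwrap(), alloc_start));
                current.next = next;
                return ret;
            } else {
                // region not suitable -> continue with next region
                current = current.next.as_mut().unwrap();
            }
        }

        // no suitable region found
        None
    }
}
```

The method uses a `current` variable and a [`while let` loop] to iterate over the list elements. At the beginning, `current` is set to the (dummy) `head` node. On each iteration, it is then updated to the `next` field of the current node (in the `else` block). If the region is suitable for an allocation with the given size and alignment, the region is removed from the list and returned together with the `alloc_start` address.

[`while let` loop]: https://doc.rust-lang.org/reference/expressions/loop-expr.html#while-let-patterns

When the `current.next` pointer becomes `None`, the loop exits. This means we iterated over the whole list but found no region suitable for an allocation. In that case, we return `None`. Whether a region is suitable is checked by the `alloc_from_region` function, whose implementation will be shown in a moment.

Let's take a more detailed look at how a suitable region is removed from the list:

![](linked-list-allocator-remove-region.svg)

Step 0 shows the situation before any pointer adjustments. The `region` and `current` regions and the `region.next` and `current.next` pointers are marked in the graphic. In step 1, both the `region.next` and `current.next` pointers are reset to `None` by using the [`Option::take`] method. The original pointers are stored in local variables called `next` and `ret`.

In step 2, the `current.next` pointer is set to the local `next` pointer, which is the original `region.next` pointer. The effect is that `current` now directly points to the region after `region`, so that `region` is no longer an element of the linked list. The function then returns the pointer to `region` stored in the local `ret` variable.

##### The `alloc_from_region` Function

The `alloc_from_region` function returns whether a region is suitable for an allocation with a given size and alignment. It is defined like this:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Try to use the given region for an allocation with the given size and
    /// alignment.
    ///
    /// Returns the allocation start address on success.
    fn alloc_from_region(region: &ListNode, size: usize, align: usize)
        -> Result<usize, ()>
    {
        let alloc_start =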
            align_up(region.start_addr(), align);
        let alloc_end = alloc_start.checked_add(size).ok_or(())?;

        if alloc_end > region.end_addr() {
            // region too small
            return Err(());
        }

        let excess_size = region.end_addr() - alloc_end;
        if excess_size > 0 && excess_size < mem::size_of::<ListNode>() {
            // rest of region too small to hold a ListNode (required because the
            // allocation splits the region in a used and a free part)
            return Err(());
        }

        // region suitable for allocation
        Ok(alloc_start)
    }
}
```

First, the function calculates the start and end address of a potential allocation, using the `align_up` function we defined earlier and the [`checked_add`] method. If an overflow occurs or if the end address is behind the end address of the region, the allocation doesn't fit in the region and we return an error.

The function performs a less obvious check after that. This check is necessary because most of the time an allocation does not fit a suitable region perfectly, so that a part of the region remains usable after the allocation. This part of the region must store its own `ListNode` after the allocation, so it must be large enough to do so. The check verifies exactly that: either the allocation fits perfectly (`excess_size == 0`) or the excess size is large enough to store a `ListNode`.

#### Implementing `GlobalAlloc`

With the fundamental operations provided by the `add_free_region` and `find_region` methods, we can now finally implement the `GlobalAlloc` trait. As with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator` but only for a wrapped `Locked<LinkedListAllocator>`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references.

[`Locked` wrapper]: @/edition-2/posts/11-allocator-designs/index.ja.md#lockedratupaxing

The implementation looks like this:

```rust
// in src/allocator/linked_list.rs

use super::Locked;
use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr;

unsafe impl GlobalAlloc for Locked<LinkedListAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // perform layout adjustments
        let (size, align) = LinkedListAllocator::size_align(layout);
        let mut allocator = self.lock();

        if let Some((region, alloc_start)) = allocator.find_region(size, align) {
            let alloc_end = alloc_start.checked_add(size).expect("overflow");
            let excess_size = region.end_addr() - alloc_end;
            if excess_size > 0 {
                unsafe {
                    allocator.add_free_region(alloc_end, excess_size);
                }
            }
            alloc_start as *mut u8
        } else {
            ptr::null_mut()
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // perform layout adjustments
        let (size, _) =
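            LinkedListAllocator::size_align(layout);

        unsafe { self.lock().add_free_region(ptr as usize, size) }
    }
}
```

Let's start with the `dealloc` method because it is simpler: First, it performs some layout adjustments, which we will explain in a moment. Then, it retrieves a `&mut LinkedListAllocator` reference by calling the [`Mutex::lock`] function on the [`Locked` wrapper]. Lastly, it calls the `add_free_region` function to add the deallocated region to the free list.

The `alloc` method is a bit more complex. It starts with the same layout adjustments and also calls the [`Mutex::lock`] function to receive a mutable allocator reference. Then it uses the `find_region` method to find a suitable memory region for the allocation and remove it from the list. If this doesn't succeed and `None` is returned, there is no suitable memory region, so it returns `null_mut` to signal an error.

In the success case, the `find_region` method returns a tuple of the suitable region (no longer in the list) and the start address of the allocation. Using `alloc_start`, the allocation size, and the end address of the region, `alloc` calculates the end address of the allocation and the excess size again. If the excess size is not zero, it calls `add_free_region` to add the excess part of the memory region back to the free list. Finally, it returns the `alloc_start` address cast to a `*mut u8` pointer.

#### Layout Adjustments

So what are these layout adjustments that we make at the beginning of both `alloc` and `dealloc`? They ensure that each allocated block is capable of storing a `ListNode`. This is important because the memory block is going to be deallocated at some point, and we then want to write a `ListNode` to it. If the block is smaller than a `ListNode` or does not have the correct alignment, undefined behavior can occur.

The layout adjustments are performed by the `size_align` function, which is defined like this:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Adjust the given layout so that the resulting allocated memory
    /// region is also capable of storing a `ListNode`.
    ///
    /// Returns the adjusted size and alignment as a (size, align) tuple.
    fn size_align(layout: Layout) -> (usize, usize) {
        let layout = layout
            .align_to(mem::align_of::<ListNode>())
            .expect("adjusting alignment failed")
            .pad_to_align();
        let size = layout.size().max(mem::size_of::<ListNode>());
        (size, layout.align())
    }
}
```

First, the function uses the [`align_to`] method on the passed [`Layout`] to increase the alignment to the alignment of a `ListNode` if necessary. It then uses the [`pad_to_align`] method to round up the size to a multiple of the alignment, which ensures that the start address of the next memory block will have the correct alignment for storing a `ListNode` too. In the second step, it uses the [`max`] method to enforce a minimum allocation size of `mem::size_of::<ListNode>`. This way, the `dealloc` function can safely write a `ListNode` to the freed memory block.

[`align_to`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align_to
[`pad_to_align`]: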
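https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.pad_to_align
[`max`]: https://doc.rust-lang.org/std/cmp/trait.Ord.html#method.max

### Using it

We can now update the `ALLOCATOR` static in the `allocator` module to use our new `LinkedListAllocator`:

```rust
// in src/allocator.rs

use linked_list::LinkedListAllocator;

#[global_allocator]
static ALLOCATOR: Locked<LinkedListAllocator> =
    Locked::new(LinkedListAllocator::new());
```

Since the `init` function behaves the same for the bump and linked list allocators, we don't need to modify the `init` call in `init_heap`.

When we now run our `heap_allocation` tests again, we see that all tests pass, including the `many_boxes_long_lived` test that failed with the bump allocator:

```
> cargo test --test heap_allocation
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [ok]
```

This shows that our linked list allocator is able to reuse freed memory for subsequent allocations.

### Discussion

In contrast to the bump allocator, the linked list allocator is much more suitable as a general-purpose allocator, mainly because it is able to directly reuse freed memory. However, it also has some drawbacks. Some of them are only caused by our basic implementation, but there are also fundamental drawbacks of the allocator design itself.

#### Merging Freed Blocks

The main problem with our implementation is that it only splits the heap into smaller blocks but never merges them back together. Consider this example:

![](linked-list-allocator-fragmentation-on-dealloc.svg)

In the first line, three allocations are created on the heap. Two of them are freed again in line 2 and the third is freed in line 3. Now the complete heap is unused again, but it is still split into four individual blocks. At this point, a large allocation might not be possible anymore because none of the four blocks is large enough. Over time, the process continues, and the heap is split into smaller and smaller blocks. At some point, the heap is so fragmented that even normal-sized allocations will fail.

To fix this problem, we need to merge adjacent freed blocks back together. For the above example, this would mean the following:

![](linked-list-allocator-merge-on-dealloc.svg)

Like before, two of the three allocations are freed in line `2`. Instead of keeping the fragmented heap, we now perform an additional step in line `2a` to merge the two rightmost blocks back together. In line `3`, the third allocation is freed (like before), resulting in a completely unused heap represented by three distinct blocks. In an additional merging step in line `3a`, we then merge the three adjacent blocks back together.

The `linked_list_allocator` crate implements this merging strategy in the following way: Instead of inserting freed memory blocks at the beginning of the linked list on `deallocate`, it always keeps the list sorted by start address. This way, merging can be performed directly on the `deallocate` call by examining the addresses and sizes of the neighboring blocks in the list. Of course, the deallocation operation is slower this way, but it prevents the heap fragmentation we saw above.

#### Performance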
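As we learned above, the bump allocator is extremely fast and can be optimized to just a few assembly operations. The linked list allocator performs much worse in this category. The problem is that an allocation request might need to traverse the complete linked list until it finds a suitable block.

Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience relatively fast allocation performance. For a program that fragments the heap with many allocations, however, the allocation performance will be very bad because the linked list will be very long and mostly contain very small blocks.

It's worth noting that this performance issue isn't caused by our basic implementation but is an intrinsic problem of the linked list approach. Since allocation performance can be very important for kernel-level code, we explore a third allocator design in the following that trades reduced memory utilization for improved performance.

## Fixed-Size Block Allocator

In the following, we present an allocator design that uses fixed-size memory blocks for fulfilling allocation requests. This way, the allocator often returns blocks that are larger than needed for allocations, which results in wasted memory due to [internal fragmentation]. On the other hand, it drastically reduces the time required to find a suitable block (compared to the linked list allocator), resulting in much better allocation performance.

### Introduction

The idea behind a **fixed-size block allocator** is the following: Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size. For example, with block sizes of 16, 64, and 512 bytes, an allocation of 4 bytes would return a 16-byte block, an allocation of 48 bytes a 64-byte block, and an allocation of 128 bytes a 512-byte block.

Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. However, instead of using a single list with different block sizes, we create a separate list for each size class. Each list then only stores blocks of a single size. For example, with block sizes of 16, 64, and 512, there would be three separate linked lists in memory:

![](fixed-size-block-example.svg).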
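Instead of a single `head` pointer, we have the three head pointers `head_16`, `head_64`, and `head_512` that each point to the first unused block of the corresponding size. All nodes in a single list have the same size. For example, the list started by the `head_16` pointer only contains 16-byte blocks. This means that we no longer need to store the size in each list node since it is already specified by the name of the head pointer.

Since each element in a list has the same size, each list element is equally suitable for an allocation request. This means that we can very efficiently perform an allocation using the following steps:

- Round up the requested allocation size to the next block size. For example, when an allocation of 12 bytes is requested, we would choose the block size of 16 in the above example.
- Retrieve the head pointer for the list, e.g., for block size 16, we need to use `head_16`.
- Remove the first block from the list and return it.

Most notably, we can always return the first element of the list and no longer need to traverse the full list. Thus, allocations are much faster than with the linked list allocator.

#### Block Sizes and Wasted Memory

Depending on the block sizes, we lose a lot of memory by rounding up. For example, when a 512-byte block is returned for a 128-byte allocation, three-quarters of the allocated memory is unused. By defining reasonable block sizes, it is possible to limit the amount of wasted memory to some degree. For example, when using the powers of 2 (4, 8, 16, 32, 64, 128, …) as block sizes, we can limit the memory waste to half of the allocation size in the worst case and a quarter of the allocation size in the average case.

It is also common to optimize block sizes based on common allocation sizes in a program. For example, we could additionally add block size 24 to improve memory usage for programs that often perform allocations of 24 bytes. This way, the amount of wasted memory can often be reduced without losing the performance benefits.

#### Deallocation

Much like allocation, deallocation is also very efficient. It involves the following steps:

- Round up the freed allocation size to the next block size. This is required since the compiler only passes the requested allocation size to `dealloc`, not the size of the block that was returned by `alloc`. By using the same size-adjustment function in both `alloc` and `dealloc`, we can make sure that we always free the correct amount of memory.
- Retrieve the head pointer for the list.
- Add the freed block to the front of the list by updating the head pointer.

Most notably, no traversal of the list is required for deallocation either. This means that the time required for a `dealloc` call stays the same regardless of the list length.

#### Fallback Allocator

Given that large allocations (>2 KB) are often rare, especially in operating system kernels, it might make sense to fall back to a different allocator for these allocations. For example, we could fall back to a linked list allocator for allocations greater than 2048 bytes in order to reduce memory waste. Since only very few allocations of that size are expected, the linked list would stay small and the (de)allocations would still be reasonably fast.

#### Creating new Blocks

Above, we assumed that there are always enough blocks of a specific size in the list to fulfill all allocation requests. However, at some point, the linked list for a given block size becomes empty. At this point, there are two ways we can create new unused blocks of a specific size to fulfill an allocation request:

- Allocate a new block from the fallback allocator (if there is one).
- Split a larger block from a different list. This works best if block sizes are powers of two. For example, a 32-byte block can be split into two 16-byte blocks.

For our implementation, we will allocate new blocks from the fallback allocator since the implementation is much simpler.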
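The rounding step and the resulting memory waste can be sketched in a few lines. This is only an illustration of the idea, not the implementation developed in this post: `EXAMPLE_BLOCK_SIZES`, `round_up_to_block`, and `wasted_bytes` are hypothetical names, and the block sizes are the 16/64/512 example from the text.

```rust
/// Hypothetical block sizes, taken from the example in the text.
const EXAMPLE_BLOCK_SIZES: [usize; 3] = [16, 64, 512];

/// Returns the smallest block size that can hold `size` bytes, or `None`
/// if the request is larger than every block size. The `None` case is
/// where a fallback allocator would take over.
pub fn round_up_to_block(size: usize) -> Option<usize> {
    EXAMPLE_BLOCK_SIZES
        .iter()
        .copied()
        .find(|&block| block >= size)
}

/// Number of bytes lost to internal fragmentation when a request of
/// `size` bytes is served from a fixed-size block.
pub fn wasted_bytes(size: usize) -> Option<usize> {
    round_up_to_block(size).map(|block| block - size)
}
```

With these sizes, a 128-byte request lands in a 512-byte block and wastes 384 bytes, which is the three-quarters figure mentioned above.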
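### Implementation

Now that we know how a fixed-size block allocator works, we can start our implementation. We won't depend on the implementation of the linked list allocator created in the previous section, so you can follow this part even if you skipped the linked list allocator implementation.

#### List Node

We start our implementation by creating a `ListNode` type in a new `allocator::fixed_size_block` module:

```rust
// in src/allocator.rs

pub mod fixed_size_block;
```

```rust
// in src/allocator/fixed_size_block.rs

struct ListNode {
    next: Option<&'static mut ListNode>,
}
```

This type is similar to the `ListNode` type of our [linked list allocator implementation], with the difference that we don't have a `size` field. It isn't needed because every block in a list has the same size with the fixed-size block allocator design.

[linked list allocator implementation]: #aroketaxing

#### Block Sizes

Next, we define a constant `BLOCK_SIZES` slice with the block sizes used for our implementation:

```rust
// in src/allocator/fixed_size_block.rs

/// The block sizes to use.
///
/// The sizes must each be a power of 2 because they are also used as
/// the block alignment (alignments must always be powers of 2).
const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];
```

As block sizes, we use powers of 2, starting from 8 up to 2048. We don't define any block sizes smaller than 8 because each block must be capable of storing a 64-bit pointer to the next block when freed. For allocations greater than 2048 bytes, we will fall back to a linked list allocator.

To simplify the implementation, we define the size of a block as its required alignment in memory. So a 16-byte block is always aligned on a 16-byte boundary and a 512-byte block is aligned on a 512-byte boundary. Since alignments always need to be powers of 2, this rules out any other block sizes. If we need block sizes that are not powers of 2 in the future, we can still adjust our implementation for this (e.g., by defining a second `BLOCK_ALIGNMENTS` array).

#### The Allocator Type

Using the `ListNode` type and the `BLOCK_SIZES` slice, we can now define our allocator type:

```rust
// in src/allocator/fixed_size_block.rs

pub struct FixedSizeBlockAllocator {
    list_heads: [Option<&'static mut ListNode>; BLOCK_SIZES.len()],
    fallback_allocator: linked_list_allocator::Heap,
}
```

The `list_heads` field is an array of `head` pointers, one for each block size. This is implemented by using the `len()` of the `BLOCK_SIZES` slice as the array length. As a fallback allocator for allocations larger than the largest block size, we use the allocator provided by the `linked_list_allocator` crate. We could also use the `LinkedListAllocator` we implemented ourselves instead, but it has the disadvantage that it does not [merge freed blocks].

[merge freed blocks]: #jie-fang-saretaburotukuwojie-he-suru

For constructing a `FixedSizeBlockAllocator`, we provide the same `new` and `init` functions that we implemented for the other allocator types too:

```rust
// in src/allocator/fixed_size_block.rs

impl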
FixedSizeBlockAllocator {
    /// Creates an empty FixedSizeBlockAllocator.
    pub const fn new() -> Self {
        const EMPTY: Option<&'static mut ListNode> = None;
        FixedSizeBlockAllocator {
            list_heads: [EMPTY; BLOCK_SIZES.len()],
            fallback_allocator: linked_list_allocator::Heap::empty(),
        }
    }

    /// Initialize the allocator with the given heap bounds.
    ///
    /// This function is unsafe because the caller must guarantee that the given
    /// heap bounds are valid and that the heap is unused. This method must be
    /// called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        unsafe {
            self.fallback_allocator.init(heap_start, heap_size);
        }
    }
}
```

The `new` function just initializes the `list_heads` array with empty nodes and creates an [`empty`] linked list allocator as `fallback_allocator`. The `EMPTY` constant is needed to tell the Rust compiler that we want to initialize the array with a constant value. Initializing the array directly as `[None; BLOCK_SIZES.len()]` does not work, because then the compiler requires `Option<&'static mut ListNode>` to implement the `Copy` trait, which it does not. This is a current limitation of the Rust compiler, which might go away in the future.

[`empty`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.empty

The unsafe `init` function only calls the [`init`] function of the `fallback_allocator` without doing any additional initialization of the `list_heads` array. Instead, we will initialize the lists lazily on `alloc` and `dealloc` calls.

[`init`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init

For convenience, we also create a private `fallback_alloc` method that allocates using the `fallback_allocator`:

```rust
// in src/allocator/fixed_size_block.rs

use alloc::alloc::Layout;
use core::ptr;

impl FixedSizeBlockAllocator {
    /// Allocates using the fallback allocator.
    fn fallback_alloc(&mut self, layout: Layout) -> *mut u8 {
        match self.fallback_allocator.allocate_first_fit(layout) {
            Ok(ptr) => ptr.as_ptr(),
            Err(_) => ptr::null_mut(),
        }
    }
}
```

The [`Heap`] type of the `linked_list_allocator` crate does not implement [`GlobalAlloc`] (as it's [not possible without locking]). Instead, it provides an [`allocate_first_fit`] method that has a slightly different interface. Instead of returning a `*mut u8` and using a null pointer to signal an error, it returns a `Result<NonNull<u8>, ()>`. The [`NonNull`] type is an abstraction for a raw pointer that is guaranteed to not be a null pointer. By mapping the `Ok` case to the [`NonNull::as_ptr`] method and the `Err` case to a null pointer, we can easily translate this back to a `*mut u8` type.

[`Heap`]:
https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html
[not possible without locking]: #globalalloctoke-bian-xing
[`allocate_first_fit`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.allocate_first_fit
[`NonNull`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html
[`NonNull::as_ptr`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html#method.as_ptr

#### Calculating the List Index

Before we implement the `GlobalAlloc` trait, we define a `list_index` helper function that returns the lowest possible block size for a given [`Layout`]:

```rust
// in src/allocator/fixed_size_block.rs

/// Choose an appropriate block size for the given layout.
///
/// Returns an index into the `BLOCK_SIZES` array.
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}
```

The block must have at least the size and alignment required by the given `Layout`. Since we defined that the block size is also its alignment, this means that the `required_block_size` is the [maximum] of the layout's [`size()`] and [`align()`] attributes. To find the next-larger block in the `BLOCK_SIZES` slice, we first use the [`iter()`] method to get an iterator and then the [`position()`] method to find the index of the first block that is at least as large as the `required_block_size`.

[maximum]: https://doc.rust-lang.org/core/cmp/trait.Ord.html#method.max
[`size()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.size
[`align()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align
[`iter()`]: https://doc.rust-lang.org/std/primitive.slice.html#method.iter
[`position()`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.position

Note that we don't return the block size itself, but the index into the `BLOCK_SIZES` slice. The reason is that we want to use the returned index as an index into the `list_heads` array.

#### Implementing `GlobalAlloc`

The last step is to implement the `GlobalAlloc` trait:

```rust
// in src/allocator/fixed_size_block.rs

use super::Locked;
use alloc::alloc::GlobalAlloc;

unsafe impl GlobalAlloc for Locked<FixedSizeBlockAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        todo!();
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        todo!();
    }
}
```
Like for the other allocators, we don't implement the `GlobalAlloc` trait directly for our allocator type, but use the [`Locked` wrapper] to add synchronized interior mutability. Since the `alloc` and `dealloc` implementations are relatively large, we introduce them one by one in the following.

##### `alloc`

The implementation of the `alloc` method looks like this:

```rust
// in `impl` block in src/allocator/fixed_size_block.rs

unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
    let mut allocator = self.lock();
    match list_index(&layout) {
        Some(index) => {
            match allocator.list_heads[index].take() {
                Some(node) => {
                    allocator.list_heads[index] = node.next.take();
                    node as *mut ListNode as *mut u8
                }
                None => {
                    // no block exists in list => allocate new block
                    let block_size = BLOCK_SIZES[index];
                    // only works if all block sizes are a power of 2
                    let block_align = block_size;
                    let layout = Layout::from_size_align(block_size, block_align)
                        .unwrap();
                    allocator.fallback_alloc(layout)
                }
            }
        }
        None => allocator.fallback_alloc(layout),
    }
}
```

Let's go through it step by step:

First, we use the `Locked::lock` method to get a mutable reference to the wrapped allocator instance. Next, we call the `list_index` function we just defined to calculate the appropriate block size for the given layout and get the corresponding index into the `list_heads` array. If this index is `None`, no block size fits the allocation, so we allocate from the `fallback_allocator` through the `fallback_alloc` function.

If the list index is `Some`, we try to remove the first node in the corresponding list started by `list_heads[index]` using the [`Option::take`] method. If the list is not empty, we enter the `Some(node)` branch of the `match` statement, where we point the head pointer of the list to the successor of the popped `node` (by using [`take`][`Option::take`] again). Finally, we return the popped `node` pointer as a `*mut u8`.

[`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take

If the list head is `None`, it indicates that the list of blocks is empty. This means that we need to construct a new block as [described above](#xin-siiburotukuwozuo-ru). For that, we first get the current block size from the `BLOCK_SIZES` slice and use it as both the size and the alignment for the new block. Then we create a new `Layout` from it and call the `fallback_alloc` method to perform the allocation. The reason for adjusting the layout and alignment is that the block will be added to the block list on deallocation.

##### `dealloc`

The implementation of the `dealloc` method looks like this:

```rust
// in src/allocator/fixed_size_block.rs

use core::{mem, ptr::NonNull};

// inside the `unsafe impl GlobalAlloc` block

unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
    let mut allocator = self.lock();
    match list_index(&layout) {
        Some(index) => {
            let new_node =
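                ListNode {
                    next: allocator.list_heads[index].take(),
                };
            // verify that block has size and alignment required for storing node
            assert!(mem::size_of::<ListNode>() <= BLOCK_SIZES[index]);
            assert!(mem::align_of::<ListNode>() <= BLOCK_SIZES[index]);
            let new_node_ptr = ptr as *mut ListNode;
            unsafe {
                new_node_ptr.write(new_node);
                allocator.list_heads[index] = Some(&mut *new_node_ptr);
            }
        }
        None => {
            let ptr = NonNull::new(ptr).unwrap();
            unsafe {
                allocator.fallback_allocator.deallocate(ptr, layout);
            }
        }
    }
}
```

Like in `alloc`, we first use the `lock` method to get a mutable allocator reference and then the `list_index` function to get the block list corresponding to the given `Layout`. If the index is `None`, no fitting block size exists in `BLOCK_SIZES`, which indicates that the allocation was created by the fallback allocator. Therefore, we use its [`deallocate`][`Heap::deallocate`] to free the memory again. The method expects a [`NonNull`] instead of a `*mut u8`, so we need to convert the pointer first. (The `unwrap` call only fails when the pointer is null, which should never happen when the compiler calls `dealloc`.)

[`Heap::deallocate`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.deallocate

If `list_index` returns a block index, we need to add the freed memory block to the list. For that, we first create a new `ListNode` that points to the current list head (by using [`Option::take`] again). Before we write the new node into the freed memory block, we first assert that the current block size specified by `index` has the required size and alignment for storing a `ListNode`. Then we perform the write by converting the given `*mut u8` pointer to a `*mut ListNode` pointer and then calling the unsafe [`write`][`pointer::write`] method on it. The last step is to set the head pointer of the list, which is currently `None` since we called `take` on it, to our newly written `ListNode`. For that, we convert the raw `new_node_ptr` to a mutable reference.

[`pointer::write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write

There are a few things worth noting:

- We don't differentiate between blocks allocated from a block list and blocks allocated from the fallback allocator. This means that new blocks created in `alloc` are added to the block list on `dealloc`, thereby increasing the number of blocks of that size.
- The `alloc` method is the only place where new blocks are created in our implementation. This means that we initially start with empty block lists and only fill these lists lazily when allocations of their block size are performed.
- The `alloc` and `dealloc` methods wrap their unsafe operations in explicit `unsafe` blocks, even though both are declared as `unsafe fn`. Historically, Rust treated the complete body of an unsafe function as one large `unsafe` block, so these blocks were not strictly required. Explicit blocks have the advantage that it's obvious which operations are unsafe and which are not, which is why an [RFC](https://github.com/rust-lang/rfcs/pull/2585) proposed changing this behavior.

### Using it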
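To use our new `FixedSizeBlockAllocator`, we need to update the `ALLOCATOR` static in the `allocator` module:

```rust
// in src/allocator.rs

use fixed_size_block::FixedSizeBlockAllocator;

#[global_allocator]
static ALLOCATOR: Locked<FixedSizeBlockAllocator> = Locked::new(
    FixedSizeBlockAllocator::new());
```

Since the `init` function behaves the same for all allocators we implemented, we don't need to modify the `init` call in `init_heap`.

When we now run our `heap_allocation` tests again, all tests should still pass:

```
> cargo test --test heap_allocation
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [ok]
```

Our new allocator seems to work!

### Discussion

While the fixed-size block approach has much better performance than the linked list approach, it wastes up to half of the memory when using powers of 2 as block sizes. Whether this tradeoff is worth it heavily depends on the kind of allocations that are performed. For an operating system kernel, where performance is critical, the fixed-size block approach seems to be the better choice.

On the implementation side, there are various things that we could improve in our current implementation:

- Instead of only allocating blocks lazily using the fallback allocator, it might be better to pre-fill the lists to improve the performance of initial allocations.
- To simplify the implementation, we only allowed block sizes that are powers of 2 so that we could also use them as the block alignment. By storing (or calculating) the alignment in a different way, we could also allow arbitrary other block sizes. This way, we could add more block sizes, e.g., for common allocation sizes, in order to minimize the wasted memory.
- We currently only create new blocks, but never free them again. This results in fragmentation and might eventually result in allocation failure for large allocations. It might make sense to enforce a maximum list length for each block size. When the maximum length is reached, subsequent deallocations are freed using the fallback allocator instead of being added to the list.
- Instead of falling back to a linked list allocator, we could have a special allocator for allocations greater than 4 KiB. The idea is to utilize [paging], which operates on 4 KiB pages, to map a continuous block of virtual memory to non-continuous physical frames. This way, fragmentation of unused memory is no longer a problem for large allocations.
- With such a page allocator, it might make sense to add block sizes up to 4 KiB and drop the linked list allocator completely. The main advantages of this would be reduced fragmentation and improved performance predictability, i.e., better worst-case performance.

[paging]: @/edition-2/posts/08-paging-introduction/index.ja.md

It's important to note that the implementation improvements outlined above are only suggestions. Allocators used in operating system kernels are typically highly optimized for the specific workload of the kernel, which is only possible through extensive profiling.

### Variations

There are also many variations of the fixed-size block allocator design. Two popular examples are the **slab allocator** and the **buddy allocator**, which are also used in popular kernels such as Linux. In the following, we give a short introduction to these two designs.

#### Slab Allocator

The idea behind a [slab allocator] is to use block sizes that directly correspond to selected types in the kernel. This way, allocations of those types fit a block size exactly and no memory is wasted. Sometimes, it might even be possible to preinitialize type instances in unused blocks to further improve performance.

[slab allocator]: https://en.wikipedia.org/wiki/Slab_allocation

Slab allocation is often combined with other allocators. For example, it can be used together with a fixed-size block allocator to further split an allocated block in order to reduce memory waste. It is also often used to implement an [object pool pattern] on top of a single large allocation.

[object pool pattern]: https://en.wikipedia.org/wiki/Object_pool_pattern

#### Buddy Allocator

Instead of using a linked list to manage freed blocks, the [buddy allocator] design uses a [binary tree] data structure together with power-of-2 block sizes. When a new block of a certain size is required, it splits a larger block into two halves, thereby creating two child nodes in the tree. Whenever a block is freed again, its neighbor block in the tree is analyzed. If the neighbor is free too, the two blocks are joined back together to form a block of twice the size.

The advantage of this merge process is that [external fragmentation] is reduced so that small freed blocks can be reused for a large allocation. It also does not use a fallback allocator, so the performance is more predictable. The biggest drawback is that only power-of-2 block sizes are possible, which might result in a large amount of wasted memory due to [internal fragmentation]. For this reason, buddy allocators are often combined with a slab allocator to further split an allocated block into multiple smaller blocks.

[buddy allocator]: https://en.wikipedia.org/wiki/Buddy_memory_allocation
[binary tree]: https://en.wikipedia.org/wiki/Binary_tree
[external fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#External_fragmentation
[internal fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#Internal_fragmentation

## Summary

This post gave an overview of different allocator designs. We learned how to implement a basic [bump allocator], which hands out memory linearly by increasing a single `next` pointer. While bump allocation is very fast, it can only reuse memory after all allocations have been freed. For this reason, it is rarely used as a global allocator.

[bump allocator]: @/edition-2/posts/11-allocator-designs/index.ja.md#banpuaroketa

Next, we created a [linked list allocator] that uses the freed memory blocks themselves to create a linked list, the so-called [free list]. This list makes it possible to store an arbitrary number of freed blocks of different sizes. While no memory waste occurs, the approach suffers from poor performance because an allocation request might require a complete traversal of the list. Our implementation also suffers from [external fragmentation] because it does not merge adjacent freed blocks back together.

[linked list allocator]: @/edition-2/posts/11-allocator-designs/index.ja.md#lian-jie-rinkuto-risutoaroketa
[free list]: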
https://en.wikipedia.org/wiki/Free_list

To fix the performance problems of the linked list approach, we created a [fixed-size block allocator] that predefines a fixed set of block sizes. For each block size, a separate [free list] exists so that allocations and deallocations only need to insert/pop at the front of the list and are thus very fast. Since each allocation is rounded up to the next larger block size, some memory is wasted due to [internal fragmentation].

[fixed-size block allocator]: @/edition-2/posts/11-allocator-designs/index.ja.md#gu-ding-saizuburotukuaroketa

There are many more allocator designs with different tradeoffs. [Slab allocation] works well to optimize the allocation of common fixed-size structures, but is not applicable in all situations. [Buddy allocation] uses a binary tree to merge freed blocks back together, but wastes a large amount of memory because it only supports power-of-2 block sizes. It's also important to remember that each kernel implementation has a unique workload, so there is no "best" allocator design that fits all cases.

[Slab allocation]: @/edition-2/posts/11-allocator-designs/index.ja.md#surabuaroketa
[Buddy allocation]: @/edition-2/posts/11-allocator-designs/index.ja.md#badeiaroketa

## What's next?

With this post, we conclude our memory management implementation for now. Next, we will start exploring [_multitasking_], starting with cooperative multitasking in the form of [_async/await_]. In subsequent posts, we will then explore [_threads_], [_multiprocessing_], and [_processes_].

[_multitasking_]: https://en.wikipedia.org/wiki/Computer_multitasking
[_threads_]: https://en.wikipedia.org/wiki/Thread_(computing)
[_processes_]: https://en.wikipedia.org/wiki/Process_(computing)
[_multiprocessing_]: https://en.wikipedia.org/wiki/Multiprocessing
[_async/await_]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html

================================================
FILE: blog/content/edition-2/posts/11-allocator-designs/index.md
================================================
+++
title = "Allocator Designs"
weight = 11
path = "allocator-designs"
date = 2020-01-20

[extra]
chapter = "Memory Management"
+++

This post explains how to implement heap allocators from scratch. It presents and discusses different allocator designs, including bump allocation, linked list allocation, and fixed-size block allocation.
For each of the three designs, we will create a basic implementation that can be used for our kernel. This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-11`][post branch] branch. [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-11 ## Introduction In the [previous post], we added basic support for heap allocations to our kernel. For that, we [created a new memory region][map-heap] in the page tables and [used the `linked_list_allocator` crate][use-alloc-crate] to manage that memory. While we have a working heap now, we left most of the work to the allocator crate without trying to understand how it works. [previous post]: @/edition-2/posts/10-heap-allocation/index.md [map-heap]: @/edition-2/posts/10-heap-allocation/index.md#creating-a-kernel-heap [use-alloc-crate]: @/edition-2/posts/10-heap-allocation/index.md#using-an-allocator-crate In this post, we will show how to create our own heap allocator from scratch instead of relying on an existing allocator crate. We will discuss different allocator designs, including a simplistic _bump allocator_ and a basic _fixed-size block allocator_, and use this knowledge to implement an allocator with improved performance (compared to the `linked_list_allocator` crate). ### Design Goals The responsibility of an allocator is to manage the available heap memory. It needs to return unused memory on `alloc` calls and keep track of memory freed by `dealloc` so that it can be reused again. Most importantly, it must never hand out memory that is already in use somewhere else because this would cause undefined behavior. Apart from correctness, there are many secondary design goals. For example, the allocator should effectively utilize the available memory and keep [_fragmentation_] low. 
Furthermore, it should work well for concurrent applications and scale to any number of processors. For maximal performance, it could even optimize the memory layout with respect to the CPU caches to improve [cache locality] and avoid [false sharing].

[cache locality]: https://www.geeksforgeeks.org/locality-of-reference-and-cache-operation-in-cache-memory/
[_fragmentation_]: https://en.wikipedia.org/wiki/Fragmentation_(computing)
[false sharing]: https://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html

These requirements can make good allocators very complex. For example, [jemalloc] has over 30,000 lines of code. This complexity is often undesired in kernel code, where a single bug can lead to severe security vulnerabilities. Fortunately, the allocation patterns of kernel code are often much simpler compared to userspace code, so that relatively simple allocator designs often suffice.

[jemalloc]: http://jemalloc.net/

In the following, we present three possible kernel allocator designs and explain their advantages and drawbacks.

## Bump Allocator

The simplest allocator design is a _bump allocator_ (also known as _stack allocator_). It allocates memory linearly and only keeps track of the number of allocated bytes and the number of allocations. It is only useful in very specific use cases because it has a severe limitation: it can only free all memory at once.

### Idea

The idea behind a bump allocator is to linearly allocate memory by increasing (_"bumping"_) a `next` variable, which points to the start of the unused memory. At the beginning, `next` is equal to the start address of the heap. On each allocation, `next` is increased by the allocation size so that it always points to the boundary between used and unused memory:

![The heap memory area at three points in time: 1: A single allocation exists at the start of the heap; the `next` pointer points to its end.
2: A second allocation was added right after the first; the `next` pointer points to the end of the second allocation. 3: A third allocation was added right after the second one; the `next` pointer points to the end of the third allocation.](bump-allocation.svg) The `next` pointer only moves in a single direction and thus never hands out the same memory region twice. When it reaches the end of the heap, no more memory can be allocated, resulting in an out-of-memory error on the next allocation. A bump allocator is often implemented with an allocation counter, which is increased by 1 on each `alloc` call and decreased by 1 on each `dealloc` call. When the allocation counter reaches zero, it means that all allocations on the heap have been deallocated. In this case, the `next` pointer can be reset to the start address of the heap, so that the complete heap memory is available for allocations again. ### Implementation We start our implementation by declaring a new `allocator::bump` submodule: ```rust // in src/allocator.rs pub mod bump; ``` The content of the submodule lives in a new `src/allocator/bump.rs` file, which we create with the following content: ```rust // in src/allocator/bump.rs pub struct BumpAllocator { heap_start: usize, heap_end: usize, next: usize, allocations: usize, } impl BumpAllocator { /// Creates a new empty bump allocator. pub const fn new() -> Self { BumpAllocator { heap_start: 0, heap_end: 0, next: 0, allocations: 0, } } /// Initializes the bump allocator with the given heap bounds. /// /// This method is unsafe because the caller must ensure that the given /// memory range is unused. Also, this method must be called only once. pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) { self.heap_start = heap_start; self.heap_end = heap_start + heap_size; self.next = heap_start; } } ``` The `heap_start` and `heap_end` fields keep track of the lower and upper bounds of the heap memory region. 
The caller needs to ensure that these addresses are valid, otherwise the allocator would return invalid memory. For this reason, the `init` function needs to be `unsafe` to call. The purpose of the `next` field is to always point to the first unused byte of the heap, i.e., the start address of the next allocation. It is set to `heap_start` in the `init` function because at the beginning, the entire heap is unused. On each allocation, this field will be increased by the allocation size (_"bumped"_) to ensure that we don't return the same memory region twice. The `allocations` field is a simple counter for the active allocations with the goal of resetting the allocator after the last allocation has been freed. It is initialized with 0. We chose to create a separate `init` function instead of performing the initialization directly in `new` in order to keep the interface identical to the allocator provided by the `linked_list_allocator` crate. This way, the allocators can be switched without additional code changes. ### Implementing `GlobalAlloc` As [explained in the previous post][global-alloc], all heap allocators need to implement the [`GlobalAlloc`] trait, which is defined like this: [global-alloc]: @/edition-2/posts/10-heap-allocation/index.md#the-allocator-interface [`GlobalAlloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html ```rust pub unsafe trait GlobalAlloc { unsafe fn alloc(&self, layout: Layout) -> *mut u8; unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout); unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... } unsafe fn realloc( &self, ptr: *mut u8, layout: Layout, new_size: usize ) -> *mut u8 { ... } } ``` Only the `alloc` and `dealloc` methods are required; the other two methods have default implementations and can be omitted. 
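
To see in isolation that the two required methods are enough, here is a minimal hosted sketch. It assumes `std` is available (unlike in our kernel) and forwards to `std::alloc::System`; the `Forwarding` type and the `zeroed_bytes_via_default` helper are hypothetical names used only for illustration. The provided `alloc_zeroed` method is never implemented here, yet it can still be called because the trait supplies a default for it:

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Hypothetical allocator that only forwards to the system allocator.
struct Forwarding;

unsafe impl GlobalAlloc for Forwarding {
    // Only the two required methods are implemented.
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

/// Allocates `len` bytes through the default `alloc_zeroed` method and
/// returns a copy of the bytes. `len` must be non-zero.
fn zeroed_bytes_via_default(len: usize) -> Vec<u8> {
    let layout = Layout::from_size_align(len, 8).expect("invalid layout");
    let allocator = Forwarding;
    unsafe {
        // `alloc_zeroed` is the trait's default implementation, built on `alloc`.
        let ptr = allocator.alloc_zeroed(layout);
        assert!(!ptr.is_null());
        let copy = std::slice::from_raw_parts(ptr, len).to_vec();
        allocator.dealloc(ptr, layout);
        copy
    }
}
```

Calling `zeroed_bytes_via_default(16)` yields sixteen zero bytes, even though `Forwarding` itself contains no zeroing code.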
#### First Implementation Attempt Let's try to implement the `alloc` method for our `BumpAllocator`: ```rust // in src/allocator/bump.rs use alloc::alloc::{GlobalAlloc, Layout}; unsafe impl GlobalAlloc for BumpAllocator { unsafe fn alloc(&self, layout: Layout) -> *mut u8 { // TODO alignment and bounds check let alloc_start = self.next; self.next = alloc_start + layout.size(); self.allocations += 1; alloc_start as *mut u8 } unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) { todo!(); } } ``` First, we use the `next` field as the start address for our allocation. Then we update the `next` field to point to the end address of the allocation, which is the next unused address on the heap. Before returning the start address of the allocation as a `*mut u8` pointer, we increase the `allocations` counter by 1. Note that we don't perform any bounds checks or alignment adjustments, so this implementation is not safe yet. This does not matter much because it fails to compile anyway with the following error: ``` error[E0594]: cannot assign to `self.next` which is behind a `&` reference --> src/allocator/bump.rs:29:9 | 29 | self.next = alloc_start + layout.size(); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `self` is a `&` reference, so the data it refers to cannot be written ``` (The same error also occurs for the `self.allocations += 1` line. We omitted it here for brevity.) The error occurs because the [`alloc`] and [`dealloc`] methods of the `GlobalAlloc` trait only operate on an immutable `&self` reference, so updating the `next` and `allocations` fields is not possible. This is problematic because updating `next` on every allocation is the essential principle of a bump allocator. 
[`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc [`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc #### `GlobalAlloc` and Mutability Before we look at a possible solution to this mutability problem, let's try to understand why the `GlobalAlloc` trait methods are defined with `&self` arguments: As we saw [in the previous post][global-allocator], the global heap allocator is defined by adding the `#[global_allocator]` attribute to a `static` that implements the `GlobalAlloc` trait. Static variables are immutable in Rust, so there is no way to call a method that takes `&mut self` on the static allocator. For this reason, all the methods of `GlobalAlloc` only take an immutable `&self` reference. [global-allocator]: @/edition-2/posts/10-heap-allocation/index.md#the-global-allocator-attribute Fortunately, there is a way to get a `&mut self` reference from a `&self` reference: We can use synchronized [interior mutability] by wrapping the allocator in a [`spin::Mutex`] spinlock. This type provides a `lock` method that performs [mutual exclusion] and thus safely turns a `&self` reference to a `&mut self` reference. We've already used the wrapper type multiple times in our kernel, for example for the [VGA text buffer][vga-mutex]. [interior mutability]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html [vga-mutex]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks [`spin::Mutex`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html [mutual exclusion]: https://en.wikipedia.org/wiki/Mutual_exclusion #### A `Locked` Wrapper Type With the help of the `spin::Mutex` wrapper type, we can implement the `GlobalAlloc` trait for our bump allocator. 
The trick is to implement the trait not for the `BumpAllocator` directly, but for the wrapped `spin::Mutex<BumpAllocator>` type:

```rust
unsafe impl GlobalAlloc for spin::Mutex<BumpAllocator> {…}
```

Unfortunately, this still doesn't work because the Rust compiler does not permit trait implementations for types defined in other crates:

```
error[E0117]: only traits defined in the current crate can be implemented for arbitrary types
  --> src/allocator/bump.rs:28:1
   |
28 | unsafe impl GlobalAlloc for spin::Mutex<BumpAllocator> {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^--------------------------
   | |                           |
   | |                           `spin::mutex::Mutex` is not defined in the current crate
   | impl doesn't use only types from inside the current crate
   |
   = note: define and implement a trait or new type instead
```

To fix this, we need to create our own wrapper type around `spin::Mutex`:

```rust
// in src/allocator.rs

/// A wrapper around spin::Mutex to permit trait implementations.
pub struct Locked<A> {
    inner: spin::Mutex<A>,
}

impl<A> Locked<A> {
    pub const fn new(inner: A) -> Self {
        Locked {
            inner: spin::Mutex::new(inner),
        }
    }

    pub fn lock(&self) -> spin::MutexGuard<A> {
        self.inner.lock()
    }
}
```

The type is a generic wrapper around a `spin::Mutex<A>`. It imposes no restrictions on the wrapped type `A`, so it can be used to wrap all kinds of types, not just allocators. It provides a simple `new` constructor function that wraps a given value. For convenience, it also provides a `lock` function that calls `lock` on the wrapped `Mutex`. Since the `Locked` type is general enough to be useful for other allocator implementations too, we put it in the parent `allocator` module.

#### Implementation for `Locked<BumpAllocator>`

The `Locked` type is defined in our own crate (in contrast to `spin::Mutex`), so we can use it to implement `GlobalAlloc` for our bump allocator.
The full implementation looks like this:

```rust
// in src/allocator/bump.rs

use super::{align_up, Locked};
use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr;

unsafe impl GlobalAlloc for Locked<BumpAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let mut bump = self.lock(); // get a mutable reference

        let alloc_start = align_up(bump.next, layout.align());
        let alloc_end = match alloc_start.checked_add(layout.size()) {
            Some(end) => end,
            None => return ptr::null_mut(),
        };

        if alloc_end > bump.heap_end {
            ptr::null_mut() // out of memory
        } else {
            bump.next = alloc_end;
            bump.allocations += 1;
            alloc_start as *mut u8
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        let mut bump = self.lock(); // get a mutable reference

        bump.allocations -= 1;
        if bump.allocations == 0 {
            bump.next = bump.heap_start;
        }
    }
}
```

The first step for both `alloc` and `dealloc` is to call the [`Mutex::lock`] method through the `inner` field to get a mutable reference to the wrapped allocator type. The instance remains locked until the end of the method, so that no data race can occur in multithreaded contexts (we will add threading support soon).

[`Mutex::lock`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html#method.lock

Compared to the previous prototype, the `alloc` implementation now respects alignment requirements and performs a bounds check to ensure that the allocations stay inside the heap memory region. The first step is to round up the `next` address to the alignment specified by the `Layout` argument. The code for the `align_up` function is shown in a moment. We then add the requested allocation size to `alloc_start` to get the end address of the allocation. To prevent integer overflow on large allocations, we use the [`checked_add`] method. If an overflow occurs or if the resulting end address of the allocation is larger than the end address of the heap, we return a null pointer to signal an out-of-memory situation.
Otherwise, we update the `next` address and increase the `allocations` counter by 1 like before. Finally, we return the `alloc_start` address converted to a `*mut u8` pointer. [`checked_add`]: https://doc.rust-lang.org/std/primitive.usize.html#method.checked_add [`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html The `dealloc` function ignores the given pointer and `Layout` arguments. Instead, it just decreases the `allocations` counter. If the counter reaches `0` again, it means that all allocations were freed again. In this case, it resets the `next` address to the `heap_start` address to make the complete heap memory available again. #### Address Alignment The `align_up` function is general enough that we can put it into the parent `allocator` module. A basic implementation looks like this: ```rust // in src/allocator.rs /// Align the given address `addr` upwards to alignment `align`. fn align_up(addr: usize, align: usize) -> usize { let remainder = addr % align; if remainder == 0 { addr // addr already aligned } else { addr - remainder + align } } ``` The function first computes the [remainder] of the division of `addr` by `align`. If the remainder is `0`, the address is already aligned with the given alignment. Otherwise, we align the address by subtracting the remainder (so that the new remainder is 0) and then adding the alignment (so that the address does not become smaller than the original address). [remainder]: https://en.wikipedia.org/wiki/Euclidean_division Note that this isn't the most efficient way to implement this function. A much faster implementation looks like this: ```rust /// Align the given address `addr` upwards to alignment `align`. /// /// Requires that `align` is a power of two. fn align_up(addr: usize, align: usize) -> usize { (addr + align - 1) & !(align - 1) } ``` This method requires `align` to be a power of two, which can be guaranteed by utilizing the `GlobalAlloc` trait (and its [`Layout`] parameter). 
This makes it possible to create a [bitmask] to align the address in a very efficient way. To understand how it works, let's go through it step by step, starting on the right side:

[`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html
[bitmask]: https://en.wikipedia.org/wiki/Mask_(computing)

- Since `align` is a power of two, its [binary representation] has only a single bit set (e.g. `0b00100000`). This means that `align - 1` has all the lower bits set (e.g. `0b00011111`).
- By creating the [bitwise `NOT`] through the `!` operator, we get a number that has all the bits set except for the bits lower than `align` (e.g. `0b…111111111100000`).
- By performing a [bitwise `AND`] on an address and `!(align - 1)`, we align the address _downwards_. This works by clearing all the bits that are lower than `align`.
- Since we want to align upwards instead of downwards, we increase the `addr` by `align - 1` before performing the bitwise `AND`. This way, already aligned addresses remain the same while non-aligned addresses are rounded to the next alignment boundary.

[binary representation]: https://en.wikipedia.org/wiki/Binary_number#Representation
[bitwise `NOT`]: https://en.wikipedia.org/wiki/Bitwise_operation#NOT
[bitwise `AND`]: https://en.wikipedia.org/wiki/Bitwise_operation#AND

Which variant you choose is up to you. For power-of-two alignments, both compute the same result, only using different methods.

### Using It

To use the bump allocator instead of the `linked_list_allocator` crate, we need to update the `ALLOCATOR` static in `allocator.rs`:

```rust
// in src/allocator.rs

use bump::BumpAllocator;

#[global_allocator]
static ALLOCATOR: Locked<BumpAllocator> = Locked::new(BumpAllocator::new());
```

Here it becomes important that we declared `BumpAllocator::new` and `Locked::new` as [`const` functions]. If they were normal functions, a compilation error would occur because the initialization expression of a `static` must be evaluable at compile time.
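
The interplay between the `const` constructors and the later runtime initialization can be tried out on a hosted system. The following is only a sketch under stated assumptions: it uses `std::sync::Mutex` as a stand-in for `spin::Mutex`, and `BumpModel`, `MODEL`, and `init_model` are hypothetical names for a model that tracks addresses without touching real memory:

```rust
use std::sync::{Mutex, MutexGuard};

/// Hosted stand-in for the `Locked` wrapper (assumption: `std::sync::Mutex`
/// replaces `spin::Mutex`).
pub struct Locked<A> {
    inner: Mutex<A>,
}

impl<A> Locked<A> {
    /// `const fn`, so it can be used in the initializer of a `static`.
    pub const fn new(inner: A) -> Self {
        Locked { inner: Mutex::new(inner) }
    }

    pub fn lock(&self) -> MutexGuard<'_, A> {
        self.inner.lock().unwrap()
    }
}

/// Hypothetical model of the bump allocator state; it only stores addresses.
pub struct BumpModel {
    heap_start: usize,
    heap_end: usize,
    next: usize,
}

impl BumpModel {
    /// Evaluated at compile time: all fields start at zero.
    pub const fn new() -> Self {
        BumpModel { heap_start: 0, heap_end: 0, next: 0 }
    }

    /// Runs at runtime, once the heap bounds are known.
    pub fn init(&mut self, heap_start: usize, heap_size: usize) {
        self.heap_start = heap_start;
        self.heap_end = heap_start + heap_size;
        self.next = heap_start;
    }
}

// Compiles only because both `new` functions are `const fn`.
static MODEL: Locked<BumpModel> = Locked::new(BumpModel::new());

/// Initializes the static model and returns `(heap_start, heap_end, next)`.
pub fn init_model(heap_start: usize, heap_size: usize) -> (usize, usize, usize) {
    let mut model = MODEL.lock();
    model.init(heap_start, heap_size);
    (model.heap_start, model.heap_end, model.next)
}
```

Removing the `const` keyword from either `new` function makes the `static MODEL` line fail to compile, which is the situation described above.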
[`const` functions]: https://doc.rust-lang.org/reference/items/functions.html#const-functions We don't need to change the `ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE)` call in our `init_heap` function because the bump allocator provides the same interface as the allocator provided by the `linked_list_allocator`. Now our kernel uses our bump allocator! Everything should still work, including the [`heap_allocation` tests] that we created in the previous post: [`heap_allocation` tests]: @/edition-2/posts/10-heap-allocation/index.md#adding-a-test ``` > cargo test --test heap_allocation […] Running 3 tests simple_allocation... [ok] large_vec... [ok] many_boxes... [ok] ``` ### Discussion The big advantage of bump allocation is that it's very fast. Compared to other allocator designs (see below) that need to actively look for a fitting memory block and perform various bookkeeping tasks on `alloc` and `dealloc`, a bump allocator [can be optimized][bump downwards] to just a few assembly instructions. This makes bump allocators useful for optimizing the allocation performance, for example when creating a [virtual DOM library]. [bump downwards]: https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html [virtual DOM library]: https://hacks.mozilla.org/2019/03/fast-bump-allocated-virtual-doms-with-rust-and-wasm/ While a bump allocator is seldom used as the global allocator, the principle of bump allocation is often applied in the form of [arena allocation], which basically batches individual allocations together to improve performance. An example of an arena allocator for Rust is contained in the [`toolshed`] crate. [arena allocation]: https://mgravell.github.io/Pipelines.Sockets.Unofficial/docs/arenas.html [`toolshed`]: https://docs.rs/toolshed/0.8.1/toolshed/index.html #### The Drawback of a Bump Allocator The main limitation of a bump allocator is that it can only reuse deallocated memory after all allocations have been freed. 
This means that a single long-lived allocation suffices to prevent memory reuse. We can see this when we add a variation of the `many_boxes` test: ```rust // in tests/heap_allocation.rs #[test_case] fn many_boxes_long_lived() { let long_lived = Box::new(1); // new for i in 0..HEAP_SIZE { let x = Box::new(i); assert_eq!(*x, i); } assert_eq!(*long_lived, 1); // new } ``` Like the `many_boxes` test, this test creates a large number of allocations to provoke an out-of-memory failure if the allocator does not reuse freed memory. Additionally, the test creates a `long_lived` allocation, which lives for the whole loop execution. When we try to run our new test, we see that it indeed fails: ``` > cargo test --test heap_allocation Running 4 tests simple_allocation... [ok] large_vec... [ok] many_boxes... [ok] many_boxes_long_lived... [failed] Error: panicked at 'allocation error: Layout { size_: 8, align_: 8 }', src/lib.rs:86:5 ``` Let's try to understand why this failure occurs in detail: First, the `long_lived` allocation is created at the start of the heap, thereby increasing the `allocations` counter by 1. For each iteration of the loop, a short-lived allocation is created and directly freed again before the next iteration starts. This means that the `allocations` counter is temporarily increased to 2 at the beginning of an iteration and decreased to 1 at the end of it. The problem now is that the bump allocator can only reuse memory after _all_ allocations have been freed, i.e., when the `allocations` counter falls to 0. Since this doesn't happen before the end of the loop, each loop iteration allocates a new region of memory, leading to an out-of-memory error after a number of iterations. #### Fixing the Test? There are two potential tricks that we could utilize to fix the test for our bump allocator: - We could update `dealloc` to check whether the freed allocation was the last allocation returned by `alloc` by comparing its end address with the `next` pointer. 
In case they're equal, we can safely reset `next` back to the start address of the freed allocation. This way, each loop iteration reuses the same memory block. - We could add an `alloc_back` method that allocates memory from the _end_ of the heap using an additional `next_back` field. Then we could manually use this allocation method for all long-lived allocations, thereby separating short-lived and long-lived allocations on the heap. Note that this separation only works if it's clear beforehand how long each allocation will live. Another drawback of this approach is that manually performing allocations is cumbersome and potentially unsafe. While both of these approaches work to fix the test, they are not a general solution since they are only able to reuse memory in very specific cases. The question is: Is there a general solution that reuses _all_ freed memory? #### Reusing All Freed Memory? As we learned [in the previous post][heap-intro], allocations can live arbitrarily long and can be freed in an arbitrary order. This means that we need to keep track of a potentially unbounded number of non-continuous, unused memory regions, as illustrated by the following example: [heap-intro]: @/edition-2/posts/10-heap-allocation/index.md#dynamic-memory ![](allocation-fragmentation.svg) The graphic shows the heap over the course of time. At the beginning, the complete heap is unused, and the `next` address is equal to `heap_start` (line 1). Then the first allocation occurs (line 2). In line 3, a second memory block is allocated and the first allocation is freed. Many more allocations are added in line 4. Half of them are very short-lived and already get freed in line 5, where another new allocation is also added. Line 5 shows the fundamental problem: We have five unused memory regions with different sizes, but the `next` pointer can only point to the beginning of the last region. 
While we could store the start addresses and sizes of the other unused memory regions in an array of size 4 for this example, this isn't a general solution since we could easily create an example with 8, 16, or 1000 unused memory regions. Normally, when we have a potentially unbounded number of items, we can just use a heap-allocated collection. This isn't really possible in our case, since the heap allocator can't depend on itself (it would cause endless recursion or deadlocks). So we need to find a different solution. ## Linked List Allocator A common trick to keep track of an arbitrary number of free memory areas when implementing allocators is to use these areas themselves as backing storage. This utilizes the fact that the regions are still mapped to a virtual address and backed by a physical frame, but the stored information is not needed anymore. By storing the information about the freed region in the region itself, we can keep track of an unbounded number of freed regions without needing additional memory. The most common implementation approach is to construct a single linked list in the freed memory, with each node being a freed memory region: ![](linked-list-allocation.svg) Each list node contains two fields: the size of the memory region and a pointer to the next unused memory region. With this approach, we only need a pointer to the first unused region (called `head`) to keep track of all unused regions, regardless of their number. The resulting data structure is often called a [_free list_]. [_free list_]: https://en.wikipedia.org/wiki/Free_list As you might guess from the name, this is the technique that the `linked_list_allocator` crate uses. Allocators that use this technique are also often called _pool allocators_. ### Implementation In the following, we will create our own simple `LinkedListAllocator` type that uses the above approach for keeping track of freed memory regions. 
This part of the post isn't required for future posts, so you can skip the implementation details if you like. #### The Allocator Type We start by creating a private `ListNode` struct in a new `allocator::linked_list` submodule: ```rust // in src/allocator.rs pub mod linked_list; ``` ```rust // in src/allocator/linked_list.rs struct ListNode { size: usize, next: Option<&'static mut ListNode>, } ``` Like in the graphic, a list node has a `size` field and an optional pointer to the next node, represented by the `Option<&'static mut ListNode>` type. The `&'static mut` type semantically describes an [owned] object behind a pointer. Basically, it's a [`Box`] without a destructor that frees the object at the end of the scope. [owned]: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html [`Box`]: https://doc.rust-lang.org/alloc/boxed/index.html We implement the following set of methods for `ListNode`: ```rust // in src/allocator/linked_list.rs impl ListNode { const fn new(size: usize) -> Self { ListNode { size, next: None } } fn start_addr(&self) -> usize { self as *const Self as usize } fn end_addr(&self) -> usize { self.start_addr() + self.size } } ``` The type has a simple constructor function named `new` and methods to calculate the start and end addresses of the represented region. We make the `new` function a [const function], which will be required later when constructing a static linked list allocator. [const function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions With the `ListNode` struct as a building block, we can now create the `LinkedListAllocator` struct: ```rust // in src/allocator/linked_list.rs pub struct LinkedListAllocator { head: ListNode, } impl LinkedListAllocator { /// Creates an empty LinkedListAllocator. pub const fn new() -> Self { Self { head: ListNode::new(0), } } /// Initialize the allocator with the given heap bounds. 
/// /// This function is unsafe because the caller must guarantee that the given /// heap bounds are valid and that the heap is unused. This method must be /// called only once. pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) { unsafe { self.add_free_region(heap_start, heap_size); } } /// Adds the given memory region to the front of the list. unsafe fn add_free_region(&mut self, addr: usize, size: usize) { todo!(); } } ``` The struct contains a `head` node that points to the first heap region. We are only interested in the value of the `next` pointer, so we set the `size` to 0 in the `ListNode::new` function. Making `head` a `ListNode` instead of just a `&'static mut ListNode` has the advantage that the implementation of the `alloc` method will be simpler. Like for the bump allocator, the `new` function doesn't initialize the allocator with the heap bounds. In addition to maintaining API compatibility, the reason is that the initialization routine requires writing a node to the heap memory, which can only happen at runtime. The `new` function, however, needs to be a [`const` function] that can be evaluated at compile time because it will be used for initializing the `ALLOCATOR` static. For this reason, we again provide a separate, non-constant `init` method. [`const` function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions The `init` method uses an `add_free_region` method, whose implementation will be shown in a moment. For now, we use the [`todo!`] macro to provide a placeholder implementation that always panics. [`todo!`]: https://doc.rust-lang.org/core/macro.todo.html #### The `add_free_region` Method The `add_free_region` method provides the fundamental _push_ operation on the linked list. We currently only call this method from `init`, but it will also be the central method in our `dealloc` implementation. Remember, the `dealloc` method is called when an allocated memory region is freed again. 
To keep track of this freed memory region, we want to push it to the linked list.

The implementation of the `add_free_region` method looks like this:

```rust
// in src/allocator/linked_list.rs

use super::align_up;
use core::mem;

impl LinkedListAllocator {
    /// Adds the given memory region to the front of the list.
    unsafe fn add_free_region(&mut self, addr: usize, size: usize) {
        // ensure that the freed region is capable of holding ListNode
        assert_eq!(align_up(addr, mem::align_of::<ListNode>()), addr);
        assert!(size >= mem::size_of::<ListNode>());

        // create a new list node and insert it at the start of the list
        let mut node = ListNode::new(size);
        node.next = self.head.next.take();
        let node_ptr = addr as *mut ListNode;
        unsafe {
            node_ptr.write(node);
            self.head.next = Some(&mut *node_ptr)
        }
    }
}
```

The method takes the address and size of a memory region as an argument and adds it to the front of the list. First, it ensures that the given region has the necessary size and alignment for storing a `ListNode`. Then it creates the node and inserts it into the list through the following steps:

![](linked-list-allocator-push.svg)

Step 0 shows the state of the heap before `add_free_region` is called. In step 1, the method is called with the memory region marked as `freed` in the graphic. After the initial checks, the method creates a new `node` on its stack with the size of the freed region. It then uses the [`Option::take`] method to set the `next` pointer of the node to the current `head` pointer, thereby resetting the `head` pointer to `None`.

[`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take

In step 2, the method writes the newly created `node` to the beginning of the freed memory region through the [`write`] method. It then points the `head` pointer to the new node.
The resulting pointer structure looks a bit chaotic because the freed region is always inserted at the beginning of the list, but if we follow the pointers, we see that each free region is still reachable from the `head` pointer.

[`write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write

#### The `find_region` Method

The second fundamental operation on a linked list is finding an entry and removing it from the list. This is the central operation needed for implementing the `alloc` method. We implement the operation as a `find_region` method in the following way:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Looks for a free region with the given size and alignment and removes
    /// it from the list.
    ///
    /// Returns a tuple of the list node and the start address of the allocation.
    fn find_region(&mut self, size: usize, align: usize)
        -> Option<(&'static mut ListNode, usize)>
    {
        // reference to current list node, updated for each iteration
        let mut current = &mut self.head;
        // look for a large enough memory region in linked list
        while let Some(ref mut region) = current.next {
            if let Ok(alloc_start) = Self::alloc_from_region(&region, size, align) {
                // region suitable for allocation -> remove node from list
                let next = region.next.take();
                let ret = Some((current.next.take().unwrap(), alloc_start));
                current.next = next;
                return ret;
            } else {
                // region not suitable -> continue with next region
                current = current.next.as_mut().unwrap();
            }
        }

        // no suitable region found
        None
    }
}
```

The method uses a `current` variable and a [`while let` loop] to iterate over the list elements. At the beginning, `current` is set to the (dummy) `head` node. On each iteration, it is then updated to the `next` field of the current node (in the `else` block). If the region is suitable for an allocation with the given size and alignment, the region is removed from the list and returned together with the `alloc_start` address.
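
The same find-and-remove pattern can be tried out in safe, hosted Rust. The sketch below is an illustrative model, not the kernel code: it assumes `std`, replaces the `&'static mut ListNode` pointers with `Option<Box<Node>>`, and reduces the suitability check to a plain size comparison. The names `Node`, `FreeList`, `remove_first_fit`, and `sizes` are hypothetical:

```rust
/// Hypothetical list node; `Box` stands in for `&'static mut ListNode`.
struct Node {
    size: usize,
    next: Option<Box<Node>>,
}

/// Hypothetical free list with a dummy `head` node, like in the post.
struct FreeList {
    head: Node,
}

impl FreeList {
    fn new() -> Self {
        FreeList { head: Node { size: 0, next: None } }
    }

    /// Pushes a region of the given size to the front of the list.
    fn push(&mut self, size: usize) {
        let node = Box::new(Node { size, next: self.head.next.take() });
        self.head.next = Some(node);
    }

    /// Removes the first region with at least `min_size` bytes and returns its size.
    fn remove_first_fit(&mut self, min_size: usize) -> Option<usize> {
        let mut current = &mut self.head;
        while let Some(ref mut region) = current.next {
            if region.size >= min_size {
                // unlink: take the tail, take the region, reattach the tail
                let next = region.next.take();
                let removed = current.next.take().unwrap();
                current.next = next;
                return Some(removed.size);
            } else {
                // region not suitable -> continue with the next region
                current = current.next.as_deref_mut().unwrap();
            }
        }
        None
    }

    /// Returns the sizes of all regions in list order.
    fn sizes(&self) -> Vec<usize> {
        let mut out = Vec::new();
        let mut current = &self.head;
        while let Some(node) = current.next.as_deref() {
            out.push(node.size);
            current = node;
        }
        out
    }
}
```

After pushing regions of 16, 64, and 32 bytes, the list order is 32, 64, 16 because each push inserts at the front. A request for 48 bytes then skips the 32-byte region and unlinks the 64-byte one, leaving 32 and 16 in the list.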
[`while let` loop]: https://doc.rust-lang.org/reference/expressions/loop-expr.html#while-let-patterns

When the `current.next` pointer becomes `None`, the loop exits. This means we iterated over the whole list but found no region suitable for an allocation. In that case, we return `None`. Whether a region is suitable is checked by the `alloc_from_region` function, whose implementation will be shown in a moment.

Let's take a more detailed look at how a suitable region is removed from the list:

![](linked-list-allocator-remove-region.svg)

Step 0 shows the situation before any pointer adjustments. The `region` and `current` regions and the `region.next` and `current.next` pointers are marked in the graphic. In step 1, both the `region.next` and `current.next` pointers are reset to `None` by using the [`Option::take`] method. The original pointers are stored in local variables called `next` and `ret`.

In step 2, the `current.next` pointer is set to the local `next` pointer, which is the original `region.next` pointer. The effect is that `current` now directly points to the region after `region`, so that `region` is no longer an element of the linked list. The function then returns the pointer to `region` stored in the local `ret` variable.

##### The `alloc_from_region` Function

The `alloc_from_region` function returns whether a region is suitable for an allocation with a given size and alignment. It is defined like this:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Try to use the given region for an allocation with given size and
    /// alignment.
    ///
    /// Returns the allocation start address on success.
    fn alloc_from_region(region: &ListNode, size: usize, align: usize)
        -> Result<usize, ()>
    {
        let alloc_start = align_up(region.start_addr(), align);
        let alloc_end = alloc_start.checked_add(size).ok_or(())?;

        if alloc_end > region.end_addr() {
            // region too small
            return Err(());
        }

        let excess_size = region.end_addr() - alloc_end;
        if excess_size > 0 && excess_size < mem::size_of::<ListNode>() {
            // rest of region too small to hold a ListNode (required because the
            // allocation splits the region in a used and a free part)
            return Err(());
        }

        // region suitable for allocation
        Ok(alloc_start)
    }
}
```

First, the function calculates the start and end address of a potential allocation, using the `align_up` function we defined earlier and the [`checked_add`] method. If an overflow occurs or if the end address lies beyond the end address of the region, the allocation doesn't fit in the region and we return an error.

The function performs a less obvious check after that. This check is necessary because most of the time an allocation does not fit a suitable region perfectly, so that a part of the region remains usable after the allocation. This part of the region must store its own `ListNode` after the allocation, so it must be large enough to do so. The check verifies exactly that: either the allocation fits perfectly (`excess_size == 0`) or the excess size is large enough to store a `ListNode`.

#### Implementing `GlobalAlloc`

With the fundamental operations provided by the `add_free_region` and `find_region` methods, we can now finally implement the `GlobalAlloc` trait. As with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator` but only for a wrapped `Locked<LinkedListAllocator>`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references.
[`Locked` wrapper]: @/edition-2/posts/11-allocator-designs/index.md#a-locked-wrapper-type

The implementation looks like this:

```rust
// in src/allocator/linked_list.rs

use super::Locked;
use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr;

unsafe impl GlobalAlloc for Locked<LinkedListAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // perform layout adjustments
        let (size, align) = LinkedListAllocator::size_align(layout);
        let mut allocator = self.lock();

        if let Some((region, alloc_start)) = allocator.find_region(size, align) {
            let alloc_end = alloc_start.checked_add(size).expect("overflow");
            let excess_size = region.end_addr() - alloc_end;
            if excess_size > 0 {
                unsafe {
                    allocator.add_free_region(alloc_end, excess_size);
                }
            }
            alloc_start as *mut u8
        } else {
            ptr::null_mut()
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // perform layout adjustments
        let (size, _) = LinkedListAllocator::size_align(layout);

        unsafe { self.lock().add_free_region(ptr as usize, size) }
    }
}
```

Let's start with the `dealloc` method because it is simpler: First, it performs some layout adjustments, which we will explain in a moment. Then, it retrieves a `&mut LinkedListAllocator` reference by calling the [`Mutex::lock`] function on the [`Locked` wrapper]. Lastly, it calls the `add_free_region` function to add the deallocated region to the free list.

The `alloc` method is a bit more complex. It starts with the same layout adjustments and also calls the [`Mutex::lock`] function to receive a mutable allocator reference. Then it uses the `find_region` method to find a suitable memory region for the allocation and remove it from the list. If this doesn't succeed and `None` is returned, it returns `null_mut` to signal an error as there is no suitable memory region.

In the success case, the `find_region` method returns a tuple of the suitable region (no longer in the list) and the start address of the allocation.
Using `alloc_start`, the allocation size, and the end address of the region, it calculates the end address of the allocation and the excess size again. If the excess size is not zero, it calls `add_free_region` to add the excess part of the memory region back to the free list. Finally, it returns the `alloc_start` address cast to a `*mut u8` pointer.

#### Layout Adjustments

So what are these layout adjustments that we make at the beginning of both `alloc` and `dealloc`? They ensure that each allocated block is capable of storing a `ListNode`. This is important because the memory block is going to be deallocated at some point, where we want to write a `ListNode` to it. If the block is smaller than a `ListNode` or does not have the correct alignment, undefined behavior can occur.

The layout adjustments are performed by the `size_align` function, which is defined like this:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Adjust the given layout so that the resulting allocated memory
    /// region is also capable of storing a `ListNode`.
    ///
    /// Returns the adjusted size and alignment as a (size, align) tuple.
    fn size_align(layout: Layout) -> (usize, usize) {
        let layout = layout
            .align_to(mem::align_of::<ListNode>())
            .expect("adjusting alignment failed")
            .pad_to_align();
        let size = layout.size().max(mem::size_of::<ListNode>());
        (size, layout.align())
    }
}
```

First, the function uses the [`align_to`] method on the passed [`Layout`] to increase the alignment to the alignment of a `ListNode` if necessary. It then uses the [`pad_to_align`] method to round up the size to a multiple of the alignment to ensure that the start address of the next memory block will have the correct alignment for storing a `ListNode` too.
In the second step, it uses the [`max`] method to enforce a minimum allocation size of `mem::size_of::<ListNode>()`. This way, the `dealloc` function can safely write a `ListNode` to the freed memory block.
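
To make the adjustment concrete, the following stand-alone sketch reproduces `size_align` as a free function on top of `std::alloc::Layout`, so that it can be run on a host system instead of inside the kernel. The `ListNode` here is a simplified stand-in with the same two fields as the allocator's node type, and the sample layouts are arbitrary choices for illustration. The printed numbers depend on the pointer width of the target.

```rust
use std::alloc::Layout;
use std::mem;

/// Simplified stand-in for the allocator's `ListNode` (same fields, no methods).
#[allow(dead_code)]
struct ListNode {
    size: usize,
    next: Option<&'static mut ListNode>,
}

/// Same adjustment as `LinkedListAllocator::size_align`, written as a free
/// function so that the example is self-contained.
fn size_align(layout: Layout) -> (usize, usize) {
    let layout = layout
        .align_to(mem::align_of::<ListNode>())
        .expect("adjusting alignment failed")
        .pad_to_align();
    let size = layout.size().max(mem::size_of::<ListNode>());
    (size, layout.align())
}

fn main() {
    let node_size = mem::size_of::<ListNode>();
    let node_align = mem::align_of::<ListNode>();
    println!("ListNode: size {node_size}, align {node_align}");

    // A tiny allocation is grown to the size and alignment of a `ListNode`.
    let tiny = Layout::from_size_align(1, 1).unwrap();
    println!("{:?} -> {:?}", tiny, size_align(tiny));

    // A larger allocation keeps its size, rounded up to a multiple of the alignment.
    let odd = Layout::from_size_align(4 * node_size + 1, 1).unwrap();
    println!("{:?} -> {:?}", odd, size_align(odd));
}
```

Because `alloc` and `dealloc` both run the requested layout through the same function, the size handed to `add_free_region` on deallocation matches the size that was reserved on allocation.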

[`align_to`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align_to
[`pad_to_align`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.pad_to_align
[`max`]: https://doc.rust-lang.org/std/cmp/trait.Ord.html#method.max

### Using it

We can now update the `ALLOCATOR` static in the `allocator` module to use our new `LinkedListAllocator`:

```rust
// in src/allocator.rs

use linked_list::LinkedListAllocator;

#[global_allocator]
static ALLOCATOR: Locked<LinkedListAllocator> =
    Locked::new(LinkedListAllocator::new());
```

Since the `init` function behaves the same for the bump and linked list allocators, we don't need to modify the `init` call in `init_heap`.

When we now run our `heap_allocation` tests again, we see that all tests pass now, including the `many_boxes_long_lived` test that failed with the bump allocator:

```
> cargo test --test heap_allocation
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [ok]
```

This shows that our linked list allocator is able to reuse freed memory for subsequent allocations.

### Discussion

In contrast to the bump allocator, the linked list allocator is much more suitable as a general-purpose allocator, mainly because it is able to directly reuse freed memory. However, it also has some drawbacks. Some of them are only caused by our basic implementation, but there are also fundamental drawbacks of the allocator design itself.

#### Merging Freed Blocks

The main problem with our implementation is that it only splits the heap into smaller blocks but never merges them back together. Consider this example:

![](linked-list-allocator-fragmentation-on-dealloc.svg)

In the first line, three allocations are created on the heap. Two of them are freed again in line 2 and the third is freed in line 3. Now the complete heap is unused again, but it is still split into four individual blocks. At this point, a large allocation might not be possible anymore because none of the four blocks is large enough.
Over time, the process continues, and the heap is split into smaller and smaller blocks. At some point, the heap is so fragmented that even normal sized allocations will fail. To fix this problem, we need to merge adjacent freed blocks back together. For the above example, this would mean the following: ![](linked-list-allocator-merge-on-dealloc.svg) Like before, two of the three allocations are freed in line `2`. Instead of keeping the fragmented heap, we now perform an additional step in line `2a` to merge the two rightmost blocks back together. In line `3`, the third allocation is freed (like before), resulting in a completely unused heap represented by three distinct blocks. In an additional merging step in line `3a`, we then merge the three adjacent blocks back together. The `linked_list_allocator` crate implements this merging strategy in the following way: Instead of inserting freed memory blocks at the beginning of the linked list on `deallocate`, it always keeps the list sorted by start address. This way, merging can be performed directly on the `deallocate` call by examining the addresses and sizes of the two neighboring blocks in the list. Of course, the deallocation operation is slower this way, but it prevents the heap fragmentation we saw above. #### Performance As we learned above, the bump allocator is extremely fast and can be optimized to just a few assembly operations. The linked list allocator performs much worse in this category. The problem is that an allocation request might need to traverse the complete linked list until it finds a suitable block. Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience relatively fast allocation performance. 
For a program that fragments the heap with many allocations, however, the allocation performance will be very bad because the linked list will be very long and mostly contain very small blocks. It's worth noting that this performance issue isn't a problem caused by our basic implementation but a fundamental problem of the linked list approach. Since allocation performance can be very important for kernel-level code, we explore a third allocator design in the following that trades improved performance for reduced memory utilization. ## Fixed-Size Block Allocator In the following, we present an allocator design that uses fixed-size memory blocks for fulfilling allocation requests. This way, the allocator often returns blocks that are larger than needed for allocations, which results in wasted memory due to [internal fragmentation]. On the other hand, it drastically reduces the time required to find a suitable block (compared to the linked list allocator), resulting in much better allocation performance. ### Introduction The idea behind a _fixed-size block allocator_ is the following: Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size. For example, with block sizes of 16, 64, and 512 bytes, an allocation of 4 bytes would return a 16-byte block, an allocation of 48 bytes a 64-byte block, and an allocation of 128 bytes a 512-byte block. Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. However, instead of using a single list with different block sizes, we create a separate list for each size class. Each list then only stores blocks of a single size. For example, with block sizes of 16, 64, and 512, there would be three separate linked lists in memory: ![](fixed-size-block-example.svg). 
Instead of a single `head` pointer, we have the three head pointers `head_16`, `head_64`, and `head_512` that each point to the first unused block of the corresponding size. All nodes in a single list have the same size. For example, the list started by the `head_16` pointer only contains 16-byte blocks. This means that we no longer need to store the size in each list node since it is already specified by the name of the head pointer. Since each element in a list has the same size, each list element is equally suitable for an allocation request. This means that we can very efficiently perform an allocation using the following steps: - Round up the requested allocation size to the next block size. For example, when an allocation of 12 bytes is requested, we would choose the block size of 16 in the above example. - Retrieve the head pointer for the list, e.g., for block size 16, we need to use `head_16`. - Remove the first block from the list and return it. Most notably, we can always return the first element of the list and no longer need to traverse the full list. Thus, allocations are much faster than with the linked list allocator. #### Block Sizes and Wasted Memory Depending on the block sizes, we lose a lot of memory by rounding up. For example, when a 512-byte block is returned for a 128-byte allocation, three-quarters of the allocated memory is unused. By defining reasonable block sizes, it is possible to limit the amount of wasted memory to some degree. For example, when using the powers of 2 (4, 8, 16, 32, 64, 128, …) as block sizes, we can limit the memory waste to half of the allocation size in the worst case and a quarter of the allocation size in the average case. It is also common to optimize block sizes based on common allocation sizes in a program. For example, we could additionally add block size 24 to improve memory usage for programs that often perform allocations of 24 bytes. 
This way, the amount of wasted memory can often be reduced without losing the performance benefits. #### Deallocation Much like allocation, deallocation is also very performant. It involves the following steps: - Round up the freed allocation size to the next block size. This is required since the compiler only passes the requested allocation size to `dealloc`, not the size of the block that was returned by `alloc`. By using the same size-adjustment function in both `alloc` and `dealloc`, we can make sure that we always free the correct amount of memory. - Retrieve the head pointer for the list. - Add the freed block to the front of the list by updating the head pointer. Most notably, no traversal of the list is required for deallocation either. This means that the time required for a `dealloc` call stays the same regardless of the list length. #### Fallback Allocator Given that large allocations (>2 KB) are often rare, especially in operating system kernels, it might make sense to fall back to a different allocator for these allocations. For example, we could fall back to a linked list allocator for allocations greater than 2048 bytes in order to reduce memory waste. Since only very few allocations of that size are expected, the linked list would stay small and the (de)allocations would still be reasonably fast. #### Creating new Blocks Above, we always assumed that there are always enough blocks of a specific size in the list to fulfill all allocation requests. However, at some point, the linked list for a given block size becomes empty. At this point, there are two ways we can create new unused blocks of a specific size to fulfill an allocation request: - Allocate a new block from the fallback allocator (if there is one). - Split a larger block from a different list. This best works if block sizes are powers of two. For example, a 32-byte block can be split into two 16-byte blocks. 
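
The allocation, deallocation, and refill steps from this section can be modeled in ordinary user-space Rust without touching real memory. The sketch below is only a toy model under stated assumptions: it tracks block start addresses in one `Vec` per size class instead of an intrusive linked list, it ignores alignment, and the `Model` type and its bump-style `fallback_next` counter are hypothetical stand-ins for a real fallback allocator. It uses the block sizes 16, 64, and 512 from the running example.

```rust
/// Block sizes from the running example in the text.
const BLOCK_SIZES: [usize; 3] = [16, 64, 512];

/// Toy model of a fixed-size block allocator that tracks addresses only.
struct Model {
    /// One stack of free block addresses per size class.
    free_lists: [Vec<usize>; 3],
    /// Next unused address of a pretend fallback region (hypothetical).
    fallback_next: usize,
}

impl Model {
    fn new(heap_start: usize) -> Self {
        Model {
            free_lists: [Vec::new(), Vec::new(), Vec::new()],
            fallback_next: heap_start,
        }
    }

    /// Round up to the next block size and return the index of its list.
    fn size_class(size: usize) -> Option<usize> {
        BLOCK_SIZES.iter().position(|&block| block >= size)
    }

    /// Pop a block from the matching list, or carve a new one when it is empty.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        let class = Self::size_class(size)?;
        if let Some(addr) = self.free_lists[class].pop() {
            return Some(addr);
        }
        // List is empty: create a new block from the pretend fallback region.
        let addr = self.fallback_next;
        self.fallback_next += BLOCK_SIZES[class];
        Some(addr)
    }

    /// Push the block onto the list chosen by the same rounding as in `alloc`.
    fn dealloc(&mut self, addr: usize, size: usize) {
        let class = Self::size_class(size).expect("size exceeds the largest block size");
        self.free_lists[class].push(addr);
    }
}

fn main() {
    let mut heap = Model::new(0x1000);
    let a = heap.alloc(12).unwrap(); // served as a new 16-byte block
    let b = heap.alloc(4).unwrap(); // another new 16-byte block
    heap.dealloc(a, 12); // goes to the 16-byte list
    let c = heap.alloc(16).unwrap(); // reuses the block freed above
    println!("a = {a:#x}, b = {b:#x}, c = {c:#x}");
    println!("alloc(1000) = {:?}", heap.alloc(1000)); // no size class fits
}
```

Neither `alloc` nor `dealloc` iterates over a list, which is the property that makes this design fast. A request that is larger than every block size yields `None` in this model, whereas a real implementation would forward it to the fallback allocator.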
For our implementation, we will allocate new blocks from the fallback allocator since the implementation is much simpler. ### Implementation Now that we know how a fixed-size block allocator works, we can start our implementation. We won't depend on the implementation of the linked list allocator created in the previous section, so you can follow this part even if you skipped the linked list allocator implementation. #### List Node We start our implementation by creating a `ListNode` type in a new `allocator::fixed_size_block` module: ```rust // in src/allocator.rs pub mod fixed_size_block; ``` ```rust // in src/allocator/fixed_size_block.rs struct ListNode { next: Option<&'static mut ListNode>, } ``` This type is similar to the `ListNode` type of our [linked list allocator implementation], with the difference that we don't have a `size` field. It isn't needed because every block in a list has the same size with the fixed-size block allocator design. [linked list allocator implementation]: #the-allocator-type #### Block Sizes Next, we define a constant `BLOCK_SIZES` slice with the block sizes used for our implementation: ```rust // in src/allocator/fixed_size_block.rs /// The block sizes to use. /// /// The sizes must each be power of 2 because they are also used as /// the block alignment (alignments must be always powers of 2). const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048]; ``` As block sizes, we use powers of 2, starting from 8 up to 2048. We don't define any block sizes smaller than 8 because each block must be capable of storing a 64-bit pointer to the next block when freed. For allocations greater than 2048 bytes, we will fall back to a linked list allocator. To simplify the implementation, we define the size of a block as its required alignment in memory. So a 16-byte block is always aligned on a 16-byte boundary and a 512-byte block is aligned on a 512-byte boundary. 
Since alignments always need to be powers of 2, this rules out any other block sizes. If we need block sizes that are not powers of 2 in the future, we can still adjust our implementation for this (e.g., by defining a second `BLOCK_ALIGNMENTS` array). #### The Allocator Type Using the `ListNode` type and the `BLOCK_SIZES` slice, we can now define our allocator type: ```rust // in src/allocator/fixed_size_block.rs pub struct FixedSizeBlockAllocator { list_heads: [Option<&'static mut ListNode>; BLOCK_SIZES.len()], fallback_allocator: linked_list_allocator::Heap, } ``` The `list_heads` field is an array of `head` pointers, one for each block size. This is implemented by using the `len()` of the `BLOCK_SIZES` slice as the array length. As a fallback allocator for allocations larger than the largest block size, we use the allocator provided by the `linked_list_allocator`. We could also use the `LinkedListAllocator` we implemented ourselves instead, but it has the disadvantage that it does not [merge freed blocks]. [merge freed blocks]: #merging-freed-blocks For constructing a `FixedSizeBlockAllocator`, we provide the same `new` and `init` functions that we implemented for the other allocator types too: ```rust // in src/allocator/fixed_size_block.rs impl FixedSizeBlockAllocator { /// Creates an empty FixedSizeBlockAllocator. pub const fn new() -> Self { const EMPTY: Option<&'static mut ListNode> = None; FixedSizeBlockAllocator { list_heads: [EMPTY; BLOCK_SIZES.len()], fallback_allocator: linked_list_allocator::Heap::empty(), } } /// Initialize the allocator with the given heap bounds. /// /// This function is unsafe because the caller must guarantee that the given /// heap bounds are valid and that the heap is unused. This method must be /// called only once. 
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        unsafe {
            self.fallback_allocator.init(heap_start, heap_size);
        }
    }
}
```

The `new` function just initializes the `list_heads` array with empty nodes and creates an [`empty`] linked list allocator as `fallback_allocator`. The `EMPTY` constant is needed to tell the Rust compiler that we want to initialize the array with a constant value. Initializing the array directly as `[None; BLOCK_SIZES.len()]` does not work, because then the compiler requires `Option<&'static mut ListNode>` to implement the `Copy` trait, which it does not. This is a current limitation of the Rust compiler, which might go away in the future.

[`empty`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.empty

The unsafe `init` function only calls the [`init`] function of the `fallback_allocator` without doing any additional initialization of the `list_heads` array. Instead, we will initialize the lists lazily on `alloc` and `dealloc` calls.

[`init`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init

For convenience, we also create a private `fallback_alloc` method that allocates using the `fallback_allocator`:

```rust
// in src/allocator/fixed_size_block.rs

use alloc::alloc::Layout;
use core::ptr;

impl FixedSizeBlockAllocator {
    /// Allocates using the fallback allocator.
    fn fallback_alloc(&mut self, layout: Layout) -> *mut u8 {
        match self.fallback_allocator.allocate_first_fit(layout) {
            Ok(ptr) => ptr.as_ptr(),
            Err(_) => ptr::null_mut(),
        }
    }
}
```

The [`Heap`] type of the `linked_list_allocator` crate does not implement [`GlobalAlloc`] (as it's [not possible without locking]). Instead, it provides an [`allocate_first_fit`] method that has a slightly different interface. Instead of returning a `*mut u8` and using a null pointer to signal an error, it returns a `Result<NonNull<u8>, ()>`.
The [`NonNull`] type is an abstraction for a raw pointer that is guaranteed to not be a null pointer. By mapping the `Ok` case to the [`NonNull::as_ptr`] method and the `Err` case to a null pointer, we can easily translate this back to a `*mut u8` type.

[`Heap`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html
[not possible without locking]: #globalalloc-and-mutability
[`allocate_first_fit`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.allocate_first_fit
[`NonNull`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html
[`NonNull::as_ptr`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html#method.as_ptr

#### Calculating the List Index

Before we implement the `GlobalAlloc` trait, we define a `list_index` helper function that returns the lowest possible block size for a given [`Layout`]:

```rust
// in src/allocator/fixed_size_block.rs

/// Choose an appropriate block size for the given layout.
///
/// Returns an index into the `BLOCK_SIZES` array.
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}
```

The block must have at least the size and alignment required by the given `Layout`. Since we defined that the block size is also its alignment, this means that the `required_block_size` is the [maximum] of the layout's [`size()`] and [`align()`] attributes. To find the next-larger block in the `BLOCK_SIZES` slice, we first use the [`iter()`] method to get an iterator and then the [`position()`] method to find the index of the first block that is at least as large as the `required_block_size`.
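
As a quick sanity check, the lookup can be run on a host system. The following sketch copies `BLOCK_SIZES` and `list_index` from above and swaps the kernel's `alloc::alloc::Layout` import for `std::alloc::Layout`; the sample size and alignment pairs are arbitrary and only meant to cover the interesting cases.

```rust
use std::alloc::Layout;

const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];

/// Copy of the `list_index` helper so that the example is self-contained.
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}

fn main() {
    // (size, align) pairs chosen for illustration.
    let samples = [(12, 4), (8, 64), (2048, 8), (4096, 8)];
    for (size, align) in samples {
        let layout = Layout::from_size_align(size, align).unwrap();
        match list_index(&layout) {
            Some(index) => println!(
                "size {size}, align {align} -> {}-byte block (index {index})",
                BLOCK_SIZES[index]
            ),
            None => println!("size {size}, align {align} -> fallback allocator"),
        }
    }
}
```

The second sample shows why the alignment takes part in the calculation: an 8-byte value with a 64-byte alignment requirement must be served from the 64-byte list, because only those blocks are guaranteed to start on a 64-byte boundary.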

[maximum]: https://doc.rust-lang.org/core/cmp/trait.Ord.html#method.max
[`size()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.size
[`align()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align
[`iter()`]: https://doc.rust-lang.org/std/primitive.slice.html#method.iter
[`position()`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.position

Note that we don't return the block size itself, but the index into the `BLOCK_SIZES` slice. The reason is that we want to use the returned index as an index into the `list_heads` array.

#### Implementing `GlobalAlloc`

The last step is to implement the `GlobalAlloc` trait:

```rust
// in src/allocator/fixed_size_block.rs

use super::Locked;
use alloc::alloc::GlobalAlloc;

unsafe impl GlobalAlloc for Locked<FixedSizeBlockAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        todo!();
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        todo!();
    }
}
```

Like for the other allocators, we don't implement the `GlobalAlloc` trait directly for our allocator type, but use the [`Locked` wrapper] to add synchronized interior mutability. Since the `alloc` and `dealloc` implementations are relatively large, we introduce them one by one in the following.
##### `alloc` The implementation of the `alloc` method looks like this: ```rust // in `impl` block in src/allocator/fixed_size_block.rs unsafe fn alloc(&self, layout: Layout) -> *mut u8 { let mut allocator = self.lock(); match list_index(&layout) { Some(index) => { match allocator.list_heads[index].take() { Some(node) => { allocator.list_heads[index] = node.next.take(); node as *mut ListNode as *mut u8 } None => { // no block exists in list => allocate new block let block_size = BLOCK_SIZES[index]; // only works if all block sizes are a power of 2 let block_align = block_size; let layout = Layout::from_size_align(block_size, block_align) .unwrap(); allocator.fallback_alloc(layout) } } } None => allocator.fallback_alloc(layout), } } ``` Let's go through it step by step: First, we use the `Locked::lock` method to get a mutable reference to the wrapped allocator instance. Next, we call the `list_index` function we just defined to calculate the appropriate block size for the given layout and get the corresponding index into the `list_heads` array. If this index is `None`, no block size fits for the allocation, therefore we use the `fallback_allocator` using the `fallback_alloc` function. If the list index is `Some`, we try to remove the first node in the corresponding list started by `list_heads[index]` using the [`Option::take`] method. If the list is not empty, we enter the `Some(node)` branch of the `match` statement, where we point the head pointer of the list to the successor of the popped `node` (by using [`take`][`Option::take`] again). Finally, we return the popped `node` pointer as a `*mut u8`. [`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take If the list head is `None`, it indicates that the list of blocks is empty. This means that we need to construct a new block as [described above](#creating-new-blocks). 
For that, we first get the current block size from the `BLOCK_SIZES` slice and use it as both the size and the alignment for the new block. Then we create a new `Layout` from it and call the `fallback_alloc` method to perform the allocation. The reason for adjusting the size and alignment is that the block will be added to the block list on deallocation.

##### `dealloc`

The implementation of the `dealloc` method looks like this:

```rust
// in src/allocator/fixed_size_block.rs

use core::{mem, ptr::NonNull};

// inside the `unsafe impl GlobalAlloc` block

unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
    let mut allocator = self.lock();
    match list_index(&layout) {
        Some(index) => {
            let new_node = ListNode {
                next: allocator.list_heads[index].take(),
            };
            // verify that block has size and alignment required for storing node
            assert!(mem::size_of::<ListNode>() <= BLOCK_SIZES[index]);
            assert!(mem::align_of::<ListNode>() <= BLOCK_SIZES[index]);
            let new_node_ptr = ptr as *mut ListNode;
            unsafe {
                new_node_ptr.write(new_node);
                allocator.list_heads[index] = Some(&mut *new_node_ptr);
            }
        }
        None => {
            let ptr = NonNull::new(ptr).unwrap();
            unsafe {
                allocator.fallback_allocator.deallocate(ptr, layout);
            }
        }
    }
}
```

Like in `alloc`, we first use the `lock` method to get a mutable allocator reference and then the `list_index` function to get the block list corresponding to the given `Layout`. If the index is `None`, no fitting block size exists in `BLOCK_SIZES`, which indicates that the allocation was created by the fallback allocator. Therefore, we use its [`deallocate`][`Heap::deallocate`] method to free the memory again. The method expects a [`NonNull`] instead of a `*mut u8`, so we need to convert the pointer first. (The `unwrap` call only fails when the pointer is null, which should never happen when the compiler calls `dealloc`.)

[`Heap::deallocate`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.deallocate

If `list_index` returns a block index, we need to add the freed memory block to the list. For that, we first create a new `ListNode` that points to the current list head (by using [`Option::take`] again). Before we write the new node into the freed memory block, we first assert that the current block size specified by `index` has the required size and alignment for storing a `ListNode`. Then we perform the write by converting the given `*mut u8` pointer to a `*mut ListNode` pointer and then calling the unsafe [`write`][`pointer::write`] method on it. The last step is to set the head pointer of the list, which is currently `None` since we called `take` on it, to our newly written `ListNode`. For that, we convert the raw `new_node_ptr` to a mutable reference.

[`pointer::write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write

There are a few things worth noting:

- We don't differentiate between blocks allocated from a block list and blocks allocated from the fallback allocator. This means that new blocks created in `alloc` are added to the block list on `dealloc`, thereby increasing the number of blocks of that size.
- The `alloc` method is the only place where new blocks are created in our implementation. This means that we initially start with empty block lists and only fill these lists lazily when allocations of their block size are performed.
- The `dealloc` method wraps its unsafe operations in explicit `unsafe` blocks, even though it is already an `unsafe` function. Rust editions before 2024 treat the complete body of an unsafe function as one large `unsafe` block, so the explicit blocks are optional there. Since explicit `unsafe` blocks make it obvious which operations are unsafe and which are not, [RFC 2585](https://github.com/rust-lang/rfcs/pull/2585) introduced the `unsafe_op_in_unsafe_fn` lint, which warns by default since the 2024 edition when an unsafe operation is used outside such a block.
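
The `Option::take`-based list manipulation used in `alloc` and `dealloc` can be tried out in isolation. The sketch below is not the allocator itself: it replaces the intrusive `&'static mut ListNode` nodes that live inside freed blocks with heap-allocated `Box<Node>` values, and the `Node` type, its `id` field, and the helper names are made up for illustration. What it does share with the real code is the order of operations on the head pointer.

```rust
/// Heap-allocated stand-in for the allocator's intrusive `ListNode`.
struct Node {
    /// Hypothetical payload used to tell the nodes apart.
    id: u32,
    next: Option<Box<Node>>,
}

/// Mirrors `dealloc`: the new node takes over the old head, then becomes the head.
fn push_front(head: &mut Option<Box<Node>>, id: u32) {
    let node = Box::new(Node { id, next: head.take() });
    *head = Some(node);
}

/// Mirrors `alloc`: detach the head, then point the list at its successor.
fn pop_front(head: &mut Option<Box<Node>>) -> Option<u32> {
    let mut node = head.take()?;
    *head = node.next.take();
    Some(node.id)
}

fn main() {
    let mut head = None;
    push_front(&mut head, 1);
    push_front(&mut head, 2);

    // The most recently pushed node is handed out first.
    println!("{:?}", pop_front(&mut head));
    println!("{:?}", pop_front(&mut head));
    // An empty list corresponds to the case where `alloc` must create a new block.
    println!("{:?}", pop_front(&mut head));
}
```

Both helpers touch only the head of the list, so their cost does not depend on how many nodes the list contains.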

### Using it

To use our new `FixedSizeBlockAllocator`, we need to update the `ALLOCATOR` static in the `allocator` module:

```rust
// in src/allocator.rs

use fixed_size_block::FixedSizeBlockAllocator;

#[global_allocator]
static ALLOCATOR: Locked<FixedSizeBlockAllocator> = Locked::new(
    FixedSizeBlockAllocator::new());
```

Since the `init` function behaves the same for all allocators we implemented, we don't need to modify the `init` call in `init_heap`.

When we now run our `heap_allocation` tests again, all tests should still pass:

```
> cargo test --test heap_allocation
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [ok]
```

Our new allocator seems to work!

### Discussion

While the fixed-size block approach has much better performance than the linked list approach, it wastes up to half of the memory when using powers of 2 as block sizes. Whether this tradeoff is worth it heavily depends on the application type. For an operating system kernel, where performance is critical, the fixed-size block approach seems to be the better choice.

On the implementation side, there are various things that we could improve in our current implementation:

- Instead of only allocating blocks lazily using the fallback allocator, it might be better to pre-fill the lists to improve the performance of initial allocations.
- To simplify the implementation, we only allowed block sizes that are powers of 2 so that we could also use them as the block alignment. By storing (or calculating) the alignment in a different way, we could also allow arbitrary other block sizes. This way, we could add more block sizes, e.g., for common allocation sizes, in order to minimize the wasted memory.
- We currently only create new blocks, but never free them again. This results in fragmentation and might eventually result in allocation failure for large allocations. It might make sense to enforce a maximum list length for each block size.
When the maximum length is reached, subsequent deallocations are freed using the fallback allocator instead of being added to the list. - Instead of falling back to a linked list allocator, we could have a special allocator for allocations greater than 4 KiB. The idea is to utilize [paging], which operates on 4 KiB pages, to map a continuous block of virtual memory to non-continuous physical frames. This way, fragmentation of unused memory is no longer a problem for large allocations. - With such a page allocator, it might make sense to add block sizes up to 4 KiB and drop the linked list allocator completely. The main advantages of this would be reduced fragmentation and improved performance predictability, i.e., better worst-case performance. [paging]: @/edition-2/posts/08-paging-introduction/index.md It's important to note that the implementation improvements outlined above are only suggestions. Allocators used in operating system kernels are typically highly optimized for the specific workload of the kernel, which is only possible through extensive profiling. ### Variations There are also many variations of the fixed-size block allocator design. Two popular examples are the _slab allocator_ and the _buddy allocator_, which are also used in popular kernels such as Linux. In the following, we give a short introduction to these two designs. #### Slab Allocator The idea behind a [slab allocator] is to use block sizes that directly correspond to selected types in the kernel. This way, allocations of those types fit a block size exactly and no memory is wasted. Sometimes, it might be even possible to preinitialize type instances in unused blocks to further improve performance. [slab allocator]: https://en.wikipedia.org/wiki/Slab_allocation Slab allocation is often combined with other allocators. For example, it can be used together with a fixed-size block allocator to further split an allocated block in order to reduce memory waste. 
It is also often used to implement an [object pool pattern] on top of a single large allocation. [object pool pattern]: https://en.wikipedia.org/wiki/Object_pool_pattern #### Buddy Allocator Instead of using a linked list to manage freed blocks, the [buddy allocator] design uses a [binary tree] data structure together with power-of-2 block sizes. When a new block of a certain size is required, it splits a larger sized block into two halves, thereby creating two child nodes in the tree. Whenever a block is freed again, its neighbor block in the tree is analyzed. If the neighbor is also free, the two blocks are joined back together to form a block of twice the size. The advantage of this merge process is that [external fragmentation] is reduced so that small freed blocks can be reused for a large allocation. It also does not use a fallback allocator, so the performance is more predictable. The biggest drawback is that only power-of-2 block sizes are possible, which might result in a large amount of wasted memory due to [internal fragmentation]. For this reason, buddy allocators are often combined with a slab allocator to further split an allocated block into multiple smaller blocks. [buddy allocator]: https://en.wikipedia.org/wiki/Buddy_memory_allocation [binary tree]: https://en.wikipedia.org/wiki/Binary_tree [external fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#External_fragmentation [internal fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#Internal_fragmentation ## Summary This post gave an overview of different allocator designs. We learned how to implement a basic [bump allocator], which hands out memory linearly by increasing a single `next` pointer. While bump allocation is very fast, it can only reuse memory after all allocations have been freed. For this reason, it is rarely used as a global allocator. 
[bump allocator]: @/edition-2/posts/11-allocator-designs/index.md#bump-allocator Next, we created a [linked list allocator] that uses the freed memory blocks itself to create a linked list, the so-called [free list]. This list makes it possible to store an arbitrary number of freed blocks of different sizes. While no memory waste occurs, the approach suffers from poor performance because an allocation request might require a complete traversal of the list. Our implementation also suffers from [external fragmentation] because it does not merge adjacent freed blocks back together. [linked list allocator]: @/edition-2/posts/11-allocator-designs/index.md#linked-list-allocator [free list]: https://en.wikipedia.org/wiki/Free_list To fix the performance problems of the linked list approach, we created a [fixed-size block allocator] that predefines a fixed set of block sizes. For each block size, a separate [free list] exists so that allocations and deallocations only need to insert/pop at the front of the list and are thus very fast. Since each allocation is rounded up to the next larger block size, some memory is wasted due to [internal fragmentation]. [fixed-size block allocator]: @/edition-2/posts/11-allocator-designs/index.md#fixed-size-block-allocator There are many more allocator designs with different tradeoffs. [Slab allocation] works well to optimize the allocation of common fixed-size structures, but is not applicable in all situations. [Buddy allocation] uses a binary tree to merge freed blocks back together, but wastes a large amount of memory because it only supports power-of-2 block sizes. It's also important to remember that each kernel implementation has a unique workload, so there is no "best" allocator design that fits all cases. [Slab allocation]: @/edition-2/posts/11-allocator-designs/index.md#slab-allocator [Buddy allocation]: @/edition-2/posts/11-allocator-designs/index.md#buddy-allocator ## What's next? 
With this post, we conclude our memory management implementation for now. Next, we will start exploring [_multitasking_], starting with cooperative multitasking in the form of [_async/await_]. In subsequent posts, we will then explore [_threads_], [_multiprocessing_], and [_processes_].

[_multitasking_]: https://en.wikipedia.org/wiki/Computer_multitasking
[_threads_]: https://en.wikipedia.org/wiki/Thread_(computing)
[_processes_]: https://en.wikipedia.org/wiki/Process_(computing)
[_multiprocessing_]: https://en.wikipedia.org/wiki/Multiprocessing
[_async/await_]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html

================================================
FILE: blog/content/edition-2/posts/11-allocator-designs/index.pt-BR.md
================================================
+++
title = "Designs de Alocadores"
weight = 11
path = "pt-BR/allocator-designs"
date = 2020-01-20

[extra]
chapter = "Gerenciamento de Memória"
# Please update this when updating the translation
translation_based_on_commit = "c0fc0bed9e8b8459dde80a71f4f89f578cb5ddfb"
# GitHub usernames of the people that translated this post
translators = ["richarddalves"]
+++

Este post explica como implementar alocadores heap do zero. Ele apresenta e discute diferentes designs de alocadores, incluindo alocação bump, alocação de lista encadeada e alocação de bloco de tamanho fixo. Para cada um dos três designs, criaremos uma implementação básica que pode ser usada para o nosso kernel.

Este blog é desenvolvido abertamente no [GitHub]. Se você tiver algum problema ou pergunta, por favor abra uma issue lá. Você também pode deixar comentários [no final]. O código-fonte completo para este post pode ser encontrado no branch [`post-11`][post branch].

[GitHub]: https://github.com/phil-opp/blog_os
[no final]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-11

## Introdução

No [post anterior], adicionamos suporte básico para alocações heap ao nosso kernel.
Para isso, [criamos uma nova região de memória][map-heap] nas tabelas de página e [usamos a crate `linked_list_allocator`][use-alloc-crate] para gerenciar essa memória. Embora agora tenhamos um heap funcional, deixamos a maior parte do trabalho para a crate do alocador sem tentar entender como ela funciona. [post anterior]: @/edition-2/posts/10-heap-allocation/index.md [map-heap]: @/edition-2/posts/10-heap-allocation/index.md#creating-a-kernel-heap [use-alloc-crate]: @/edition-2/posts/10-heap-allocation/index.md#using-an-allocator-crate Neste post, mostraremos como criar nosso próprio alocador heap do zero em vez de depender de uma crate de alocador existente. Discutiremos diferentes designs de alocadores, incluindo um _alocador bump_ simplista e um _alocador de bloco de tamanho fixo_ básico, e usaremos esse conhecimento para implementar um alocador com desempenho aprimorado (comparado à crate `linked_list_allocator`). ### Objetivos de Design A responsabilidade de um alocador é gerenciar a memória heap disponível. Ele precisa retornar memória não utilizada em chamadas `alloc` e acompanhar a memória liberada por `dealloc` para que possa ser reutilizada novamente. Mais importante, ele nunca deve entregar memória que já está em uso em outro lugar porque isso causaria comportamento indefinido. Além da correção, existem muitos objetivos de design secundários. Por exemplo, o alocador deve utilizar efetivamente a memória disponível e manter a [_fragmentação_] baixa. Além disso, ele deve funcionar bem para aplicações concorrentes e escalar para qualquer número de processadores. Para desempenho máximo, ele poderia até otimizar o layout da memória em relação aos caches da CPU para melhorar a [localidade de cache] e evitar [compartilhamento falso]. 
[localidade de cache]: https://www.geeksforgeeks.org/locality-of-reference-and-cache-operation-in-cache-memory/ [_fragmentação_]: https://en.wikipedia.org/wiki/Fragmentation_(computing) [compartilhamento falso]: https://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html Esses requisitos podem tornar bons alocadores muito complexos. Por exemplo, [jemalloc] tem mais de 30.000 linhas de código. Essa complexidade é frequentemente indesejada no código do kernel, onde um único bug pode levar a vulnerabilidades de segurança graves. Felizmente, os padrões de alocação do código do kernel são frequentemente muito mais simples comparados ao código do espaço do usuário, de modo que designs de alocadores relativamente simples frequentemente são suficientes. [jemalloc]: http://jemalloc.net/ A seguir, apresentamos três possíveis designs de alocadores de kernel e explicamos suas vantagens e desvantagens. ## Alocador Bump O design de alocador mais simples é um _alocador bump_ (também conhecido como _alocador de pilha_). Ele aloca memória linearmente e só mantém o controle do número de bytes alocados e do número de alocações. Ele só é útil em casos de uso muito específicos porque tem uma limitação severa: ele só pode liberar toda a memória de uma vez. ### Ideia A ideia por trás de um alocador bump é alocar memória linearmente aumentando (_"bumping"_) uma variável `next`, que aponta para o início da memória não utilizada. No início, `next` é igual ao endereço inicial do heap. Em cada alocação, `next` é aumentado pelo tamanho da alocação para que sempre aponte para a fronteira entre memória usada e não utilizada: ![A área de memória heap em três pontos no tempo: 1: Uma única alocação existe no início do heap; o ponteiro `next` aponta para seu final. 2: Uma segunda alocação foi adicionada logo após a primeira; o ponteiro `next` aponta para o final da segunda alocação. 
3: Uma terceira alocação foi adicionada logo após a segunda; o ponteiro `next` aponta para o final da terceira alocação.](bump-allocation.svg) O ponteiro `next` só se move em uma única direção e, portanto, nunca entrega a mesma região de memória duas vezes. Quando ele alcança o final do heap, nenhuma memória adicional pode ser alocada, resultando em um erro de falta de memória na próxima alocação. Um alocador bump é frequentemente implementado com um contador de alocações, que é aumentado em 1 em cada chamada `alloc` e diminuído em 1 em cada chamada `dealloc`. Quando o contador de alocações atinge zero, significa que todas as alocações no heap foram desalocadas. Nesse caso, o ponteiro `next` pode ser redefinido para o endereço inicial do heap, de modo que a memória heap completa esteja disponível para alocações novamente. ### Implementação Começamos nossa implementação declarando um novo submódulo `allocator::bump`: ```rust // em src/allocator.rs pub mod bump; ``` O conteúdo do submódulo vive em um novo arquivo `src/allocator/bump.rs`, que criamos com o seguinte conteúdo: ```rust // em src/allocator/bump.rs pub struct BumpAllocator { heap_start: usize, heap_end: usize, next: usize, allocations: usize, } impl BumpAllocator { /// Cria um novo alocador bump vazio. pub const fn new() -> Self { BumpAllocator { heap_start: 0, heap_end: 0, next: 0, allocations: 0, } } /// Inicializa o alocador bump com os limites de heap fornecidos. /// /// Este método é unsafe porque o chamador deve garantir que o intervalo /// de memória fornecido esteja não utilizado. Além disso, este método deve ser chamado apenas uma vez. pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) { self.heap_start = heap_start; self.heap_end = heap_start + heap_size; self.next = heap_start; } } ``` Os campos `heap_start` e `heap_end` mantêm o controle dos limites inferior e superior da região de memória heap. 
O chamador precisa garantir que esses endereços sejam válidos, caso contrário o alocador retornaria memória inválida. Por essa razão, a função `init` precisa ser `unsafe` para chamar. O propósito do campo `next` é sempre apontar para o primeiro byte não utilizado do heap, ou seja, o endereço inicial da próxima alocação. Ele é definido como `heap_start` na função `init` porque no início, o heap inteiro está não utilizado. Em cada alocação, este campo será aumentado pelo tamanho da alocação (_"bumped"_) para garantir que não retornemos a mesma região de memória duas vezes. O campo `allocations` é um simples contador para as alocações ativas com o objetivo de redefinir o alocador após a última alocação ter sido liberada. Ele é inicializado com 0. Escolhemos criar uma função `init` separada em vez de realizar a inicialização diretamente em `new` para manter a interface idêntica ao alocador fornecido pela crate `linked_list_allocator`. Dessa forma, os alocadores podem ser trocados sem mudanças adicionais no código. ### Implementando `GlobalAlloc` Como [explicado no post anterior][global-alloc], todos os alocadores heap precisam implementar a trait [`GlobalAlloc`], que é definida assim: [global-alloc]: @/edition-2/posts/10-heap-allocation/index.pt-BR.md#a-interface-do-alocador [`GlobalAlloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html ```rust pub unsafe trait GlobalAlloc { unsafe fn alloc(&self, layout: Layout) -> *mut u8; unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout); unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... } unsafe fn realloc( &self, ptr: *mut u8, layout: Layout, new_size: usize ) -> *mut u8 { ... } } ``` Apenas os métodos `alloc` e `dealloc` são obrigatórios; os outros dois métodos têm implementações padrão e podem ser omitidos. 
#### Primeira Tentativa de Implementação Vamos tentar implementar o método `alloc` para nosso `BumpAllocator`: ```rust // em src/allocator/bump.rs use alloc::alloc::{GlobalAlloc, Layout}; unsafe impl GlobalAlloc for BumpAllocator { unsafe fn alloc(&self, layout: Layout) -> *mut u8 { // TODO verificação de alinhamento e limites let alloc_start = self.next; self.next = alloc_start + layout.size(); self.allocations += 1; alloc_start as *mut u8 } unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) { todo!(); } } ``` Primeiro, usamos o campo `next` como o endereço inicial para nossa alocação. Então atualizamos o campo `next` para apontar para o endereço final da alocação, que é o próximo endereço não utilizado no heap. Antes de retornar o endereço inicial da alocação como um ponteiro `*mut u8`, aumentamos o contador `allocations` em 1. Note que não realizamos nenhuma verificação de limites ou ajustes de alinhamento, então esta implementação ainda não é segura. Isso não importa muito porque ela falha ao compilar de qualquer forma com o seguinte erro: ``` error[E0594]: cannot assign to `self.next` which is behind a `&` reference --> src/allocator/bump.rs:29:9 | 29 | self.next = alloc_start + layout.size(); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `self` is a `&` reference, so the data it refers to cannot be written ``` (O mesmo erro também ocorre para a linha `self.allocations += 1`. Omitimos aqui por brevidade.) O erro ocorre porque os métodos [`alloc`] e [`dealloc`] da trait `GlobalAlloc` operam apenas em uma referência imutável `&self`, então atualizar os campos `next` e `allocations` não é possível. Isso é problemático porque atualizar `next` em cada alocação é o princípio essencial de um alocador bump. 
[`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc [`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc #### `GlobalAlloc` e Mutabilidade Antes de olharmos para uma possível solução para este problema de mutabilidade, vamos tentar entender por que os métodos da trait `GlobalAlloc` são definidos com argumentos `&self`: Como vimos [no post anterior][global-allocator], o alocador heap global é definido adicionando o atributo `#[global_allocator]` a um `static` que implementa a trait `GlobalAlloc`. Variáveis estáticas são imutáveis em Rust, então não há maneira de chamar um método que recebe `&mut self` no alocador estático. Por essa razão, todos os métodos de `GlobalAlloc` recebem apenas uma referência imutável `&self`. [global-allocator]: @/edition-2/posts/10-heap-allocation/index.md#the-global-allocator-attribute Felizmente, há uma maneira de obter uma referência `&mut self` de uma referência `&self`: Podemos usar [mutabilidade interior] sincronizada envolvendo o alocador em um spinlock [`spin::Mutex`]. Este tipo fornece um método `lock` que realiza [exclusão mútua] e, portanto, transforma com segurança uma referência `&self` em uma referência `&mut self`. Já usamos o tipo wrapper várias vezes em nosso kernel, por exemplo, para o [buffer de texto VGA][vga-mutex]. [mutabilidade interior]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html [vga-mutex]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks [`spin::Mutex`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html [exclusão mútua]: https://en.wikipedia.org/wiki/Mutual_exclusion #### Um Tipo Wrapper `Locked` Com a ajuda do tipo wrapper `spin::Mutex`, podemos implementar a trait `GlobalAlloc` para nosso alocador bump. 
O truque é implementar a trait não para o `BumpAllocator` diretamente, mas para o tipo envolvido `spin::Mutex<BumpAllocator>`:

```rust
unsafe impl GlobalAlloc for spin::Mutex<BumpAllocator> {…}
```

Infelizmente, isso ainda não funciona porque o compilador Rust não permite implementações de traits para tipos definidos em outras crates:

```
error[E0117]: only traits defined in the current crate can be implemented for arbitrary types
  --> src/allocator/bump.rs:28:1
   |
28 | unsafe impl GlobalAlloc for spin::Mutex<BumpAllocator> {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^--------------------------
   | |                           |
   | |                           `spin::mutex::Mutex<bump::BumpAllocator>` is not defined in the current crate
   | impl doesn't use only types from inside the current crate
   |
   = note: define and implement a trait or new type instead
```

Para corrigir isso, precisamos criar nosso próprio tipo wrapper em torno de `spin::Mutex`:

```rust
// em src/allocator.rs

/// Um wrapper em torno de spin::Mutex para permitir implementações de traits.
pub struct Locked<A> {
    inner: spin::Mutex<A>,
}

impl<A> Locked<A> {
    pub const fn new(inner: A) -> Self {
        Locked {
            inner: spin::Mutex::new(inner),
        }
    }

    pub fn lock(&self) -> spin::MutexGuard<A> {
        self.inner.lock()
    }
}
```

O tipo é um wrapper genérico em torno de um `spin::Mutex<A>`. Ele não impõe restrições ao tipo envolvido `A`, então pode ser usado para envolver todos os tipos, não apenas alocadores. Ele fornece uma função construtora simples `new` que envolve um valor dado. Por conveniência, ele também fornece uma função `lock` que chama `lock` no `Mutex` envolvido. Como o tipo `Locked` é geral o suficiente para ser útil também para outras implementações de alocadores, nós o colocamos no módulo `allocator` pai.

#### Implementação para `Locked<BumpAllocator>`

O tipo `Locked` é definido em nossa própria crate (em contraste com `spin::Mutex`), então podemos usá-lo para implementar `GlobalAlloc` para nosso alocador bump.
A implementação completa se parece com isso:

```rust
// em src/allocator/bump.rs

use super::{align_up, Locked};
use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr;

unsafe impl GlobalAlloc for Locked<BumpAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let mut bump = self.lock(); // obter uma referência mutável

        let alloc_start = align_up(bump.next, layout.align());
        let alloc_end = match alloc_start.checked_add(layout.size()) {
            Some(end) => end,
            None => return ptr::null_mut(),
        };

        if alloc_end > bump.heap_end {
            ptr::null_mut() // fora de memória
        } else {
            bump.next = alloc_end;
            bump.allocations += 1;
            alloc_start as *mut u8
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        let mut bump = self.lock(); // obter uma referência mutável

        bump.allocations -= 1;
        if bump.allocations == 0 {
            bump.next = bump.heap_start;
        }
    }
}
```

O primeiro passo tanto para `alloc` quanto para `dealloc` é chamar o método [`Mutex::lock`] através do campo `inner` para obter uma referência mutável ao tipo alocador envolvido. A instância permanece bloqueada até o final do método, para que nenhuma corrida de dados possa ocorrer em contextos multi-thread (adicionaremos suporte a threading em breve).

[`Mutex::lock`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html#method.lock

Comparada ao protótipo anterior, a implementação de `alloc` agora respeita requisitos de alinhamento e realiza uma verificação de limites para garantir que as alocações permaneçam dentro da região de memória heap. O primeiro passo é arredondar o endereço `next` para cima até o alinhamento especificado pelo argumento `Layout`. O código da função `align_up` será mostrado em instantes. Então adicionamos o tamanho de alocação solicitado a `alloc_start` para obter o endereço final da alocação. Para prevenir overflow de inteiro em alocações grandes, usamos o método [`checked_add`].
Se ocorrer um overflow ou se o endereço final resultante da alocação for maior que o endereço final do heap, retornamos um ponteiro nulo para sinalizar uma situação de falta de memória. Caso contrário, atualizamos o endereço `next` e aumentamos o contador `allocations` em 1 como antes. Finalmente, retornamos o endereço `alloc_start` convertido para um ponteiro `*mut u8`. [`checked_add`]: https://doc.rust-lang.org/std/primitive.usize.html#method.checked_add [`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html A função `dealloc` ignora o ponteiro e os argumentos `Layout` fornecidos. Em vez disso, ela apenas diminui o contador `allocations`. Se o contador atingir `0` novamente, significa que todas as alocações foram liberadas novamente. Nesse caso, ela redefine o endereço `next` para o endereço `heap_start` para tornar a memória heap completa disponível novamente. #### Alinhamento de Endereço A função `align_up` é geral o suficiente para que possamos colocá-la no módulo `allocator` pai. Uma implementação básica se parece com isso: ```rust // em src/allocator.rs /// Alinha o endereço fornecido `addr` para cima até o alinhamento `align`. fn align_up(addr: usize, align: usize) -> usize { let remainder = addr % align; if remainder == 0 { addr // addr já está alinhado } else { addr - remainder + align } } ``` A função primeiro calcula o [resto] da divisão de `addr` por `align`. Se o resto for `0`, o endereço já está alinhado com o alinhamento fornecido. Caso contrário, alinhamos o endereço subtraindo o resto (para que o novo resto seja 0) e então adicionando o alinhamento (para que o endereço não se torne menor que o endereço original). [resto]: https://en.wikipedia.org/wiki/Euclidean_division Note que esta não é a maneira mais eficiente de implementar esta função. Uma implementação muito mais rápida se parece com isso: ```rust /// Alinha o endereço fornecido `addr` para cima até o alinhamento `align`. /// /// Requer que `align` seja uma potência de dois. 
fn align_up(addr: usize, align: usize) -> usize { (addr + align - 1) & !(align - 1) } ``` Este método requer que `align` seja uma potência de dois, o que pode ser garantido utilizando a trait `GlobalAlloc` (e seu parâmetro [`Layout`]). Isso torna possível criar uma [máscara de bits] para alinhar o endereço de uma maneira muito eficiente. Para entender como funciona, vamos passar por isso passo a passo, começando no lado direito: [`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html [máscara de bits]: https://en.wikipedia.org/wiki/Mask_(computing) - Como `align` é uma potência de dois, sua [representação binária] tem apenas um único bit definido (por exemplo, `0b000100000`). Isso significa que `align - 1` tem todos os bits inferiores definidos (por exemplo, `0b00011111`). - Ao criar o [`NOT` bit a bit] através do operador `!`, obtemos um número que tem todos os bits definidos exceto os bits inferiores a `align` (por exemplo, `0b…111111111100000`). - Ao realizar um [`AND` bit a bit] em um endereço e `!(align - 1)`, alinhamos o endereço _para baixo_. Isso funciona limpando todos os bits que são inferiores a `align`. - Como queremos alinhar para cima em vez de para baixo, aumentamos o `addr` por `align - 1` antes de realizar o `AND` bit a bit. Dessa forma, endereços já alinhados permanecem os mesmos enquanto endereços não alinhados são arredondados para o próximo limite de alinhamento. [representação binária]: https://en.wikipedia.org/wiki/Binary_number#Representation [`NOT` bit a bit]: https://en.wikipedia.org/wiki/Bitwise_operation#NOT [`AND` bit a bit]: https://en.wikipedia.org/wiki/Bitwise_operation#AND Qual variante você escolher fica a seu critério. Ambas calculam o mesmo resultado, apenas usando métodos diferentes. 
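Para conferir que as duas variantes de fato calculam o mesmo resultado, segue um esboço mínimo que as compara para alinhamentos que são potências de dois. Os nomes `align_up_resto`, `align_up_mascara` e `variantes_concordam` são hipotéticos e existem apenas para esta ilustração; eles não fazem parte do código do kernel. O exemplo assume que `addr + align - 1` não estoura `usize`.

```rust
/// Variante baseada em resto (nome hipotético, apenas para ilustração).
fn align_up_resto(addr: usize, align: usize) -> usize {
    let remainder = addr % align;
    if remainder == 0 {
        addr // addr já está alinhado
    } else {
        addr - remainder + align
    }
}

/// Variante com máscara de bits (nome hipotético).
/// Requer que `align` seja uma potência de dois.
fn align_up_mascara(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}

/// Retorna `true` se as duas variantes concordam para todo `addr` em
/// `0..=max_addr` e para os alinhamentos 1, 2, 4, ..., 128.
fn variantes_concordam(max_addr: usize) -> bool {
    (0..8).map(|shift| 1usize << shift).all(|align| {
        (0..=max_addr)
            .all(|addr| align_up_resto(addr, align) == align_up_mascara(addr, align))
    })
}
```

Por exemplo, `align_up_resto(13, 8)` e `align_up_mascara(13, 8)` resultam ambos em `16`. Com um alinhamento que não é potência de dois, como `12`, a variante com máscara deixa de ser válida: para `addr = 13` ela produz `16`, que não é múltiplo de 12, enquanto a variante baseada em resto produz `24`.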
### Usando-o

Para usar o alocador bump em vez da crate `linked_list_allocator`, precisamos atualizar o static `ALLOCATOR` em `allocator.rs`:

```rust
// em src/allocator.rs

use bump::BumpAllocator;

#[global_allocator]
static ALLOCATOR: Locked<BumpAllocator> = Locked::new(BumpAllocator::new());
```

Aqui se torna importante termos declarado `BumpAllocator::new` e `Locked::new` como [funções `const`]. Se fossem funções normais, ocorreria um erro de compilação porque a expressão de inicialização de um `static` deve ser avaliável em tempo de compilação.

[funções `const`]: https://doc.rust-lang.org/reference/items/functions.html#const-functions

Não precisamos modificar a chamada `ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE)` em nossa função `init_heap` porque o alocador bump fornece a mesma interface que o alocador fornecido pela `linked_list_allocator`.

Agora nosso kernel usa nosso alocador bump! Tudo ainda deve funcionar, incluindo os [testes `heap_allocation`] que criamos no post anterior:

[testes `heap_allocation`]: @/edition-2/posts/10-heap-allocation/index.md#adding-a-test

```
> cargo test --test heap_allocation
[…]
Running 3 tests
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
```

Nosso novo alocador parece funcionar!

### Discussão

A grande vantagem da alocação bump é que ela é muito rápida. Comparado a outros designs de alocadores (veja abaixo) que precisam procurar ativamente por um bloco de memória adequado e realizar várias tarefas de contabilidade em `alloc` e `dealloc`, um alocador bump [pode ser otimizado][bump downwards] para apenas algumas instruções assembly. Isso torna os alocadores bump úteis para otimizar o desempenho de alocação, por exemplo, ao criar uma [biblioteca DOM virtual].

[bump downwards]: https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html [biblioteca DOM virtual]: https://hacks.mozilla.org/2019/03/fast-bump-allocated-virtual-doms-with-rust-and-wasm/ Embora um alocador bump raramente seja usado como o alocador global, o princípio de alocação bump é frequentemente aplicado na forma de [alocação arena], que basicamente agrupa alocações individuais juntas para melhorar o desempenho. Um exemplo de um alocador arena para Rust está contido na crate [`toolshed`]. [alocação arena]: https://mgravell.github.io/Pipelines.Sockets.Unofficial/docs/arenas.html [`toolshed`]: https://docs.rs/toolshed/0.8.1/toolshed/index.html #### A Desvantagem de um Alocador Bump A principal limitação de um alocador bump é que ele só pode reutilizar memória desalocada depois que todas as alocações foram liberadas. Isso significa que uma única alocação de longa duração é suficiente para prevenir a reutilização de memória. Podemos ver isso quando adicionamos uma variação do teste `many_boxes`: ```rust // em tests/heap_allocation.rs #[test_case] fn many_boxes_long_lived() { let long_lived = Box::new(1); // novo for i in 0..HEAP_SIZE { let x = Box::new(i); assert_eq!(*x, i); } assert_eq!(*long_lived, 1); // novo } ``` Como o teste `many_boxes`, este teste cria um grande número de alocações para provocar uma falha de falta de memória se o alocador não reutilizar memória liberada. Adicionalmente, o teste cria uma alocação `long_lived`, que vive pela execução completa do loop. Quando tentamos executar nosso novo teste, vemos que ele de fato falha: ``` > cargo test --test heap_allocation Running 4 tests simple_allocation... [ok] large_vec... [ok] many_boxes... [ok] many_boxes_long_lived... [failed] Error: panicked at 'allocation error: Layout { size_: 8, align_: 8 }', src/lib.rs:86:5 ``` Vamos tentar entender por que essa falha ocorre em detalhe: Primeiro, a alocação `long_lived` é criada no início do heap, aumentando assim o contador `allocations` em 1. 
Para cada iteração do loop, uma alocação de curta duração é criada e diretamente liberada novamente antes da próxima iteração começar. Isso significa que o contador `allocations` é temporariamente aumentado para 2 no início de uma iteração e diminuído para 1 no final dela. O problema agora é que o alocador bump só pode reutilizar memória depois que _todas_ as alocações foram liberadas, ou seja, quando o contador `allocations` cai para 0. Como isso não acontece antes do final do loop, cada iteração do loop aloca uma nova região de memória, levando a um erro de falta de memória após um número de iterações. #### Corrigindo o Teste? Existem dois truques potenciais que poderíamos utilizar para corrigir o teste para nosso alocador bump: - Poderíamos atualizar `dealloc` para verificar se a alocação liberada foi a última alocação retornada por `alloc` comparando seu endereço final com o ponteiro `next`. No caso de serem iguais, podemos com segurança redefinir `next` de volta ao endereço inicial da alocação liberada. Dessa forma, cada iteração do loop reutiliza o mesmo bloco de memória. - Poderíamos adicionar um método `alloc_back` que aloca memória do _final_ do heap usando um campo `next_back` adicional. Então poderíamos usar manualmente este método de alocação para todas as alocações de longa duração, separando assim alocações de curta e longa duração no heap. Note que esta separação só funciona se estiver claro de antemão quanto tempo cada alocação viverá. Outra desvantagem desta abordagem é que realizar alocações manualmente é trabalhoso e potencialmente inseguro. Embora ambas essas abordagens funcionem para corrigir o teste, elas não são uma solução geral, já que são capazes apenas de reutilizar memória em casos muito específicos. A questão é: Existe uma solução geral que reutiliza _toda_ memória liberada? #### Reutilizando Toda Memória Liberada? 
Como aprendemos [no post anterior][heap-intro], alocações podem viver arbitrariamente por muito tempo e podem ser liberadas em uma ordem arbitrária. Isso significa que precisamos acompanhar um número potencialmente ilimitado de regiões de memória não contínuas e não utilizadas, conforme ilustrado pelo seguinte exemplo: [heap-intro]: @/edition-2/posts/10-heap-allocation/index.md#dynamic-memory ![](allocation-fragmentation.svg) O gráfico mostra o heap ao longo do tempo. No início, o heap completo está não utilizado, e o endereço `next` é igual a `heap_start` (linha 1). Então a primeira alocação ocorre (linha 2). Na linha 3, um segundo bloco de memória é alocado e a primeira alocação é liberada. Muitas mais alocações são adicionadas na linha 4. Metade delas tem vida muito curta e já são liberadas na linha 5, onde outra nova alocação também é adicionada. A linha 5 mostra o problema fundamental: Temos cinco regiões de memória não utilizadas com tamanhos diferentes, mas o ponteiro `next` só pode apontar para o início da última região. Embora pudéssemos armazenar os endereços iniciais e tamanhos das outras regiões de memória não utilizadas em um array de tamanho 4 para este exemplo, isso não é uma solução geral, já que poderíamos facilmente criar um exemplo com 8, 16 ou 1000 regiões de memória não utilizadas. Normalmente, quando temos um número potencialmente ilimitado de itens, podemos simplesmente usar uma coleção alocada no heap. Isso não é realmente possível no nosso caso, já que o alocador heap não pode depender de si mesmo (isso causaria recursão infinita ou deadlocks). Então precisamos encontrar uma solução diferente. ## Alocador de Lista Encadeada Um truque comum para acompanhar um número arbitrário de áreas de memória livres ao implementar alocadores é usar essas áreas em si como armazenamento de suporte. 
Isso utiliza o fato de que as regiões ainda estão mapeadas para um endereço virtual e apoiadas por um frame físico, mas a informação armazenada não é mais necessária. Ao armazenar a informação sobre a região liberada na própria região, podemos acompanhar um número ilimitado de regiões liberadas sem precisar de memória adicional. A abordagem de implementação mais comum é construir uma lista encadeada única na memória liberada, com cada nó sendo uma região de memória liberada: ![](linked-list-allocation.svg) Cada nó da lista contém dois campos: o tamanho da região de memória e um ponteiro para a próxima região de memória não utilizada. Com esta abordagem, só precisamos de um ponteiro para a primeira região não utilizada (chamada `head`) para acompanhar todas as regiões não utilizadas, independentemente de seu número. A estrutura de dados resultante é frequentemente chamada de [_lista livre_]. [_lista livre_]: https://en.wikipedia.org/wiki/Free_list Como você pode adivinhar pelo nome, esta é a técnica que a crate `linked_list_allocator` usa. Alocadores que usam esta técnica também são frequentemente chamados de _alocadores de pool_. ### Implementação A seguir, criaremos nosso próprio tipo simples `LinkedListAllocator` que usa a abordagem acima para acompanhar regiões de memória liberadas. Esta parte do post não é necessária para posts futuros, então você pode pular os detalhes de implementação se quiser. #### O Tipo Alocador Começamos criando uma struct privada `ListNode` em um novo submódulo `allocator::linked_list`: ```rust // em src/allocator.rs pub mod linked_list; ``` ```rust // em src/allocator/linked_list.rs struct ListNode { size: usize, next: Option<&'static mut ListNode>, } ``` Como no gráfico, um nó da lista tem um campo `size` e um ponteiro opcional para o próximo nó, representado pelo tipo `Option<&'static mut ListNode>`. O tipo `&'static mut` descreve semanticamente um objeto [possuído] por trás de um ponteiro. 
Basicamente, é um [`Box`] sem um destruidor que libera o objeto no final do escopo. [possuído]: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html [`Box`]: https://doc.rust-lang.org/alloc/boxed/index.html Implementamos o seguinte conjunto de métodos para `ListNode`: ```rust // em src/allocator/linked_list.rs impl ListNode { const fn new(size: usize) -> Self { ListNode { size, next: None } } fn start_addr(&self) -> usize { self as *const Self as usize } fn end_addr(&self) -> usize { self.start_addr() + self.size } } ``` O tipo tem uma simples função construtora chamada `new` e métodos para calcular os endereços inicial e final da região representada. Tornamos a função `new` uma [função const], que será necessária mais tarde ao construir um alocador de lista encadeada estático. [função const]: https://doc.rust-lang.org/reference/items/functions.html#const-functions Com a struct `ListNode` como um bloco de construção, agora podemos criar a struct `LinkedListAllocator`: ```rust // em src/allocator/linked_list.rs pub struct LinkedListAllocator { head: ListNode, } impl LinkedListAllocator { /// Cria um LinkedListAllocator vazio. pub const fn new() -> Self { Self { head: ListNode::new(0), } } /// Inicializa o alocador com os limites de heap fornecidos. /// /// Esta função é unsafe porque o chamador deve garantir que os /// limites de heap fornecidos sejam válidos e que o heap esteja não utilizado. Este método deve ser /// chamado apenas uma vez. pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) { unsafe { self.add_free_region(heap_start, heap_size); } } /// Adiciona a região de memória fornecida à frente da lista. unsafe fn add_free_region(&mut self, addr: usize, size: usize) { todo!(); } } ``` A struct contém um nó `head` que aponta para a primeira região heap. Estamos interessados apenas no valor do ponteiro `next`, então definimos o `size` como 0 na função `ListNode::new`. 
Tornar `head` um `ListNode` em vez de apenas um `&'static mut ListNode` tem a vantagem de que a implementação do método `alloc` será mais simples. Como para o alocador bump, a função `new` não inicializa o alocador com os limites do heap. Além de manter compatibilidade com a API, a razão é que a rotina de inicialização requer escrever um nó na memória heap, o que só pode acontecer em tempo de execução. A função `new`, no entanto, precisa ser uma [função `const`] que pode ser avaliada em tempo de compilação porque será usada para inicializar o static `ALLOCATOR`. Por essa razão, fornecemos novamente um método `init` separado e não constante. [função `const`]: https://doc.rust-lang.org/reference/items/functions.html#const-functions O método `init` usa um método `add_free_region`, cuja implementação será mostrada em um momento. Por enquanto, usamos a macro [`todo!`] para fornecer uma implementação placeholder que sempre entra em pânico. [`todo!`]: https://doc.rust-lang.org/core/macro.todo.html #### O Método `add_free_region` O método `add_free_region` fornece a operação fundamental de _push_ na lista encadeada. Atualmente só chamamos este método de `init`, mas ele também será o método central em nossa implementação de `dealloc`. Lembre-se, o método `dealloc` é chamado quando uma região de memória alocada é liberada novamente. Para acompanhar esta região de memória liberada, queremos empurrá-la para a lista encadeada. A implementação do método `add_free_region` se parece com isso: ```rust // em src/allocator/linked_list.rs use super::align_up; use core::mem; impl LinkedListAllocator { /// Adiciona a região de memória fornecida à frente da lista. 
    unsafe fn add_free_region(&mut self, addr: usize, size: usize) {
        // ensure that the freed region is capable of holding a ListNode
        assert_eq!(align_up(addr, mem::align_of::<ListNode>()), addr);
        assert!(size >= mem::size_of::<ListNode>());

        // create a new list node and insert it at the start of the list
        let mut node = ListNode::new(size);
        node.next = self.head.next.take();
        let node_ptr = addr as *mut ListNode;
        unsafe {
            node_ptr.write(node);
            self.head.next = Some(&mut *node_ptr);
        }
    }
}
```

The method takes the address and size of a memory region as arguments and adds it to the front of the list. First, it ensures that the given region has the necessary size and alignment for storing a `ListNode`. Then it creates the node and inserts it into the list through the following steps:

![](linked-list-allocator-push.svg)

Step 0 shows the state of the heap before `add_free_region` is called. In step 1, the method is called with the memory region marked as `freed` in the image. After the initial checks, the method creates a new `node` on its stack with the size of the freed region. It then uses the [`Option::take`] method to set the `next` pointer of the node to the current `head` pointer, thereby resetting the `head` pointer to `None`.

[`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take

In step 2, the method writes the newly created `node` to the beginning of the freed memory region through the [`write`] method. It then points the `head` pointer to the new node. The resulting pointer structure looks a bit chaotic because the freed region is always inserted at the beginning of the list, but if we follow the pointers, we see that each free region is still reachable from the `head` pointer.

[`write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write

#### The `find_region` Method

The second fundamental operation on a linked list is finding an entry and removing it from the list. This is the central operation needed for implementing the `alloc` method.
We implement the operation as a `find_region` method in the following way:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Looks for a free region with the given size and alignment and removes
    /// it from the list.
    ///
    /// Returns a tuple of the list node and the start address of the allocation.
    fn find_region(&mut self, size: usize, align: usize)
        -> Option<(&'static mut ListNode, usize)>
    {
        // reference to the current list node, updated for each iteration
        let mut current = &mut self.head;
        // look for a large enough memory region in the linked list
        while let Some(ref mut region) = current.next {
            if let Ok(alloc_start) = Self::alloc_from_region(&region, size, align) {
                // region suitable for allocation -> remove node from list
                let next = region.next.take();
                let ret = Some((current.next.take().unwrap(), alloc_start));
                current.next = next;
                return ret;
            } else {
                // region not suitable -> continue with next region
                current = current.next.as_mut().unwrap();
            }
        }

        // no suitable region found
        None
    }
}
```

The method uses a `current` variable and a [`while let` loop][loop `while let`] to iterate over the list elements. At the beginning, `current` is set to the (dummy) `head` node. On each iteration, it is then updated to the `next` field of the current node (in the `else` block). If the region is suitable for an allocation with the given size and alignment, the region is removed from the list and returned together with the `alloc_start` address.

[loop `while let`]: https://doc.rust-lang.org/reference/expressions/loop-expr.html#while-let-patterns

When the `current.next` pointer becomes `None`, the loop exits. This means we iterated over the whole list but found no region suitable for an allocation. In that case, we return `None`. Whether a region is suitable is checked by the `alloc_from_region` function, whose implementation will be shown in a moment.
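The traverse-and-unlink pattern of `find_region` can also be tried out in safe Rust on a hosted target. The following is only a sketch under stated assumptions: `Node`, `list_from_sizes`, `remove_first_fit`, and `sizes` are hypothetical names that are not part of the kernel code, the nodes live in `Box`es instead of in the free memory itself, and suitability is reduced to a plain size comparison instead of `alloc_from_region`.

```rust
/// Hypothetical heap-allocated stand-in for the allocator's `ListNode`.
struct Node {
    size: usize,
    next: Option<Box<Node>>,
}

/// Builds a list with a dummy head (size 0) followed by nodes of the given sizes.
fn list_from_sizes(sizes: &[usize]) -> Node {
    let mut head = Node { size: 0, next: None };
    for &size in sizes.iter().rev() {
        let next = head.next.take();
        head.next = Some(Box::new(Node { size, next }));
    }
    head
}

/// Unlinks and returns the first node after `head` whose size is at least `min_size`.
fn remove_first_fit(head: &mut Node, min_size: usize) -> Option<Box<Node>> {
    let mut current = head;
    loop {
        // Peek at the next node; returns `None` once the end of the list is reached.
        let fits = current.next.as_deref()?.size >= min_size;
        if fits {
            // Take the node out of the list and let `current` point to its successor.
            let mut region = current.next.take()?;
            current.next = region.next.take();
            return Some(region);
        }
        // Not suitable: continue with the next node.
        current = current.next.as_deref_mut()?;
    }
}

/// Collects the sizes of all nodes after the dummy head, in list order.
fn sizes(head: &Node) -> Vec<usize> {
    let mut out = Vec::new();
    let mut cursor = head.next.as_deref();
    while let Some(node) = cursor {
        out.push(node.size);
        cursor = node.next.as_deref();
    }
    out
}
```

As in `find_region`, the two `take` calls first detach the matching node from its predecessor and then detach its successor, so the returned node no longer points into the list.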
Let's take a more detailed look at how a suitable region is removed from the list:

![](linked-list-allocator-remove-region.svg)

Step 0 shows the situation before any pointer adjustments. The `region` and `current` regions and the `region.next` and `current.next` pointers are marked in the image. In step 1, both the `region.next` and `current.next` pointers are reset to `None` by using the [`Option::take`] method. The original pointers are stored in local variables named `next` and `ret`.

In step 2, the `current.next` pointer is set to the local `next` pointer, which is the original `region.next` pointer. The effect is that `current` now points directly to the region after `region`, so that `region` is no longer an element of the linked list. The function then returns the pointer to `region` stored in the local `ret` variable.

##### The `alloc_from_region` Function

The `alloc_from_region` function returns whether a region is suitable for an allocation with a given size and alignment. It is defined like this:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Tries to use the given region for an allocation with the given size
    /// and alignment.
    ///
    /// Returns the allocation start address on success.
    fn alloc_from_region(region: &ListNode, size: usize, align: usize)
        -> Result<usize, ()>
    {
        let alloc_start = align_up(region.start_addr(), align);
        let alloc_end = alloc_start.checked_add(size).ok_or(())?;

        if alloc_end > region.end_addr() {
            // region too small
            return Err(());
        }

        let excess_size = region.end_addr() - alloc_end;
        if excess_size > 0 && excess_size < mem::size_of::<ListNode>() {
            // rest of region too small to hold a ListNode (required because the
            // allocation splits the region into a used and a free part)
            return Err(());
        }

        // region suitable for allocation
        Ok(alloc_start)
    }
}
```

First, the function calculates the start and end addresses of a potential allocation, using the `align_up` function we defined earlier and the [`checked_add`] method. If an overflow occurs or if the end address is behind the end address of the region, the allocation doesn't fit in the region and we return an error.

The function performs a less obvious check after that. This check is necessary because most of the time an allocation does not fit a suitable region perfectly, so that a part of the region remains usable after the allocation. This part of the region must store its own `ListNode` after the allocation, so it must be large enough to do so. The check verifies exactly that: either the allocation fits perfectly (`excess_size == 0`) or the excess size is large enough to store a `ListNode`.

#### Implementing `GlobalAlloc`

With the fundamental operations provided by the `add_free_region` and `find_region` methods, we can now finally implement the `GlobalAlloc` trait. As with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator` but only for a wrapped `Locked<LinkedListAllocator>`. The [`Locked` wrapper][wrapper `Locked`] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only receive `&self` references.
[wrapper `Locked`]: @/edition-2/posts/11-allocator-designs/index.md#a-locked-wrapper-type

The implementation looks like this:

```rust
// in src/allocator/linked_list.rs

use super::Locked;
use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr;

unsafe impl GlobalAlloc for Locked<LinkedListAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // perform layout adjustments
        let (size, align) = LinkedListAllocator::size_align(layout);
        let mut allocator = self.lock();

        if let Some((region, alloc_start)) = allocator.find_region(size, align) {
            let alloc_end = alloc_start.checked_add(size).expect("overflow");
            let excess_size = region.end_addr() - alloc_end;
            if excess_size > 0 {
                unsafe {
                    allocator.add_free_region(alloc_end, excess_size);
                }
            }
            alloc_start as *mut u8
        } else {
            ptr::null_mut()
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // perform layout adjustments
        let (size, _) = LinkedListAllocator::size_align(layout);

        unsafe { self.lock().add_free_region(ptr as usize, size) }
    }
}
```

Let's start with the `dealloc` method because it is simpler: First, it performs some layout adjustments, which we will explain in a moment. Then, it retrieves a `&mut LinkedListAllocator` reference by calling the [`Mutex::lock`] function on the [`Locked` wrapper][wrapper `Locked`]. Lastly, it calls the `add_free_region` function to add the deallocated region to the free list.

The `alloc` method is a bit more complex. It starts with the same layout adjustments and also calls the [`Mutex::lock`] function to receive a mutable allocator reference. Then it uses the `find_region` method to find a suitable memory region for the allocation and remove it from the list. If this doesn't succeed and `None` is returned, it returns `null_mut` to signal an error, as there is no suitable memory region.

In the success case, the `find_region` method returns a tuple of the suitable region (no longer in the list) and the start address of the allocation.
Using `alloc_start`, the allocation size, and the end address of the region, it calculates the end address of the allocation and the excess size again. If the excess size is not zero, it calls `add_free_region` to add the excess part of the memory region back to the free list. Finally, it returns the `alloc_start` address cast to a `*mut u8` pointer.

#### Layout Adjustments

So what are these layout adjustments that we make at the beginning of both `alloc` and `dealloc`? They ensure that each allocated block is capable of storing a `ListNode`. This is important because the memory block is going to be deallocated at some point, at which time we want to write a `ListNode` to it. If the block is smaller than a `ListNode` or does not have the correct alignment, undefined behavior can occur.

The layout adjustments are performed by the `size_align` function, which is defined like this:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Adjusts the given layout so that the resulting allocated memory
    /// region is also capable of storing a `ListNode`.
    ///
    /// Returns the adjusted size and alignment as a (size, align) tuple.
    fn size_align(layout: Layout) -> (usize, usize) {
        let layout = layout
            .align_to(mem::align_of::<ListNode>())
            .expect("adjusting alignment failed")
            .pad_to_align();
        let size = layout.size().max(mem::size_of::<ListNode>());
        (size, layout.align())
    }
}
```

First, the function uses the [`align_to`] method on the passed [`Layout`] to increase the alignment to the alignment of a `ListNode` if necessary. It then uses the [`pad_to_align`] method to round up the size to a multiple of the alignment to ensure that the start address of the next memory block will have the correct alignment for storing a `ListNode` too. In the second step, it uses the [`max`] method to enforce a minimum allocation size of `mem::size_of::<ListNode>`. This way, the `dealloc` function can safely write a `ListNode` to the freed memory block.
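To see what these adjustments do to concrete layouts, the same logic can be run on a hosted target. The sketch below assumes `std` in place of `core`/`alloc` and redeclares a `ListNode` with the same two fields only to obtain its size and alignment; the free function `size_align` mirrors the method above but is not the kernel code itself.

```rust
use std::alloc::Layout;
use std::mem;

/// Same field layout as the allocator's node; only its size and alignment matter here.
#[allow(dead_code)]
struct ListNode {
    size: usize,
    next: Option<&'static mut ListNode>,
}

/// Raises alignment and size so that the block can later hold a `ListNode`.
fn size_align(layout: Layout) -> (usize, usize) {
    let layout = layout
        .align_to(mem::align_of::<ListNode>())
        .expect("adjusting alignment failed")
        .pad_to_align();
    let size = layout.size().max(mem::size_of::<ListNode>());
    (size, layout.align())
}
```

A one-byte request is raised to the size and alignment of a `ListNode` (16 and 8 bytes on a typical 64-bit target), while a request that already has a larger alignment, such as 4 bytes aligned to 64, keeps that alignment and is padded to 64 bytes.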
[`align_to`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align_to
[`pad_to_align`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.pad_to_align
[`max`]: https://doc.rust-lang.org/std/cmp/trait.Ord.html#method.max

### Using it

We can now update the `ALLOCATOR` static in the `allocator` module to use our new `LinkedListAllocator`:

```rust
// in src/allocator.rs

use linked_list::LinkedListAllocator;

#[global_allocator]
static ALLOCATOR: Locked<LinkedListAllocator> =
    Locked::new(LinkedListAllocator::new());
```

Since the `init` function behaves the same for the bump and linked list allocators, we don't need to modify the `init` call in `init_heap`.

When we now run our `heap_allocation` tests again, we see that all tests pass, including the `many_boxes_long_lived` test that failed with the bump allocator:

```
> cargo test --test heap_allocation
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [ok]
```

This shows that our linked list allocator is able to reuse freed memory for subsequent allocations.

### Discussion

In contrast to the bump allocator, the linked list allocator is much more suitable as a general-purpose allocator, mainly because it is able to directly reuse freed memory. However, it also has some drawbacks. Some of them are only caused by our basic implementation, but there are also fundamental drawbacks of the allocator design itself.

#### Merging Freed Blocks {#mesclando-blocos-liberados}

The main problem with our implementation is that it only splits the heap into smaller blocks but never merges them back together. Consider this example:

![](linked-list-allocator-fragmentation-on-dealloc.svg)

In the first line, three allocations are created on the heap. Two of them are freed again in line 2 and the third is freed in line 3.
Now the complete heap is unused again, but it is still split into four individual blocks. At this point, a large allocation might not be possible anymore because none of the four blocks is large enough. Over time, the process continues, and the heap is split into smaller and smaller blocks. At some point, the heap is so fragmented that even normal-sized allocations will fail.

To fix this problem, we need to merge adjacent freed blocks back together. For the above example, this would mean the following:

![](linked-list-allocator-merge-on-dealloc.svg)

Like before, two of the three allocations are freed in line `2`. Instead of keeping the fragmented heap, we now perform an additional step in line `2a` to merge the two rightmost blocks back together. In line `3`, the third allocation is freed (like before), resulting in a completely unused heap represented by three distinct blocks. In an additional merging step in line `3a`, we then merge the three adjacent blocks back together.

The `linked_list_allocator` crate implements this merging strategy in the following way: Instead of inserting freed memory blocks at the beginning of the linked list on `deallocate`, it always keeps the list sorted by start address. This way, merging can be performed directly on the `deallocate` call by examining the addresses and sizes of the two neighboring blocks in the list. Of course, the deallocation operation is slower this way, but it prevents the heap fragmentation we saw above.

#### Performance

As we learned above, the bump allocator is extremely fast and can be optimized to just a few assembly operations. The linked list allocator performs much worse in this category. The problem is that an allocation request might need to traverse the complete linked list until it finds a suitable block.

Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience relatively fast allocation performance. For a program that fragments the heap with many allocations, however, the allocation performance will be very bad because the linked list will be very long and mostly contain very small blocks.

It's worth noting that this performance issue isn't caused by our basic implementation but is a fundamental problem of the linked list approach. Since allocation performance can be very important for kernel-level code, we explore a third allocator design in the following that trades reduced memory utilization for improved performance.

## Fixed-Size Block Allocator

In the following, we present an allocator design that uses fixed-size memory blocks for fulfilling allocation requests. This way, the allocator often returns blocks that are larger than needed for allocations, which results in wasted memory due to [internal fragmentation][fragmentação interna]. On the other hand, it drastically reduces the time required to find a suitable block (compared to the linked list allocator), resulting in much better allocation performance.

### Introduction

The idea behind a _fixed-size block allocator_ is the following: Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size. For example, with block sizes of 16, 64, and 512 bytes, an allocation of 4 bytes would return a 16-byte block, an allocation of 48 bytes a 64-byte block, and an allocation of 128 bytes a 512-byte block.

Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory.
However, instead of using a single list with different block sizes, we create a separate list for each size class. Each list then only stores blocks of a single size. For example, with block sizes of 16, 64, and 512, there would be three separate linked lists in memory:

![](fixed-size-block-example.svg).

Instead of a single `head` pointer, we have the three head pointers `head_16`, `head_64`, and `head_512` that each point to the first unused block of the corresponding size. All nodes in a single list have the same size. For example, the list started by the `head_16` pointer only contains 16-byte blocks. This means that we no longer need to store the size in each list node since it is already specified by the name of the head pointer.

Since each element in a list has the same size, each list element is equally suitable for an allocation request. This means that we can very efficiently perform an allocation using the following steps:

- Round up the requested allocation size to the next block size. For example, when an allocation of 12 bytes is requested, we would choose the block size of 16 in the above example.
- Retrieve the head pointer for the list, e.g., for block size 16, we need to use `head_16`.
- Remove the first block from the list and return it.

Most notably, we can always return the first element of the list and no longer need to traverse the full list. Thus, allocations are much faster than with the linked list allocator.

#### Block Sizes and Wasted Memory

Depending on the block sizes, we lose a lot of memory by rounding up. For example, when a 512-byte block is returned for a 128-byte allocation, three-quarters of the allocated memory is unused. By defining reasonable block sizes, it is possible to limit the amount of wasted memory to some degree.
For example, when using the powers of 2 (4, 8, 16, 32, 64, 128, …) as block sizes, we can limit the memory waste to half of the returned block in the worst case and a quarter of it in the average case.

It is also common to optimize block sizes based on common allocation sizes in a program. For example, we could additionally add block size 24 to improve memory usage for programs that often perform allocations of 24 bytes. This way, the amount of wasted memory can often be reduced without losing the performance benefits.

#### Deallocation

Much like allocation, deallocation is also very performant. It involves the following steps:

- Round up the freed allocation size to the next block size. This is required since the compiler only passes the requested allocation size to `dealloc`, not the size of the block that was returned by `alloc`. By using the same size-adjustment function in both `alloc` and `dealloc`, we can make sure that we always free the correct amount of memory.
- Retrieve the head pointer for the list.
- Add the freed block to the front of the list by updating the head pointer.

Most notably, no traversal of the list is required for deallocation either. This means that the time required for a `dealloc` call stays the same regardless of the list length.

#### Fallback Allocator

Given that large allocations (>2 KB) are often rare, especially in operating system kernels, it might make sense to fall back to a different allocator for these allocations. For example, we could fall back to a linked list allocator for allocations greater than 2048 bytes in order to reduce memory waste. Since only very few allocations of that size are expected, the linked list would stay small and the (de)allocations would still be reasonably fast.
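The rounding and fallback decisions described in the preceding sections can be sketched in a few lines. This is an illustration under assumptions, not the implementation that follows: `Placement`, `place`, `MIN_BLOCK_SIZE`, and `MAX_BLOCK_SIZE` are hypothetical names, and the size classes are assumed to be the powers of two from 8 to 2048 bytes.

```rust
/// Assumed smallest and largest size class; larger requests go to the fallback allocator.
const MIN_BLOCK_SIZE: usize = 8;
const MAX_BLOCK_SIZE: usize = 2048;

/// Where a request of a given size would be served from.
#[derive(Debug, PartialEq)]
enum Placement {
    /// Served from the list of `block_size`-byte blocks; `wasted` bytes stay unused.
    Block { block_size: usize, wasted: usize },
    /// Too large for any size class.
    Fallback,
}

/// Rounds the requested size up to the next power-of-two size class.
fn place(requested: usize) -> Placement {
    if requested > MAX_BLOCK_SIZE {
        return Placement::Fallback;
    }
    let block_size = requested.max(MIN_BLOCK_SIZE).next_power_of_two();
    Placement::Block {
        block_size,
        wasted: block_size - requested,
    }
}
```

A 12-byte request lands in the 16-byte class with 4 bytes wasted, whereas a 129-byte request needs a 256-byte block and wastes 127 bytes, just under half of the block. Because `dealloc` only receives the requested size, it would have to apply the same rounding to find the list the block belongs to.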

#### Creating New Blocks {#criando-novos-blocos}

Above, we always assumed that there are enough blocks of a specific size in the list to fulfill all allocation requests. However, at some point, the linked list for a given block size becomes empty. At this point, there are two ways we can create new unused blocks of a specific size to fulfill an allocation request:

- Allocate a new block from the fallback allocator (if there is one).
- Split a larger block from a different list. This works best if block sizes are powers of two. For example, a 32-byte block can be split into two 16-byte blocks.

For our implementation, we will allocate new blocks from the fallback allocator since the implementation is much simpler.

### Implementation

Now that we know how a fixed-size block allocator works, we can start our implementation. We won't depend on the implementation of the linked list allocator created in the previous section, so you can follow this part even if you skipped the linked list allocator implementation.

#### List Node

We start our implementation by creating a `ListNode` type in a new `allocator::fixed_size_block` module:

```rust
// in src/allocator.rs

pub mod fixed_size_block;
```

```rust
// in src/allocator/fixed_size_block.rs

struct ListNode {
    next: Option<&'static mut ListNode>,
}
```

This type is similar to the `ListNode` type of our [linked list allocator implementation][implementação de alocador de lista encadeada], with the difference that we don't have a `size` field. It isn't needed because every block in a list has the same size with the fixed-size block allocator design.

[implementação de alocador de lista encadeada]: #o-tipo-alocador

#### Block Sizes

Next, we define a constant `BLOCK_SIZES` slice with the block sizes used for our implementation:

```rust
// in src/allocator/fixed_size_block.rs

/// The block sizes to use.
///
/// The sizes must each be a power of 2 because they are also used as
/// the block alignment (alignments must always be powers of 2).
const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];
```

As block sizes, we use powers of 2, starting from 8 up to 2048. We don't define any block sizes smaller than 8 because each block must be capable of storing a 64-bit pointer to the next block when freed. For allocations greater than 2048 bytes, we will fall back to a linked list allocator.

To simplify the implementation, we define the size of a block as its required alignment in memory. So a 16-byte block is always aligned on a 16-byte boundary and a 512-byte block is aligned on a 512-byte boundary. Since alignments always need to be powers of 2, this rules out any other block sizes. If we need block sizes that are not powers of 2 in the future, we can still adjust our implementation for this (e.g., by defining a second `BLOCK_ALIGNMENTS` array).

#### The Allocator Type

Using the `ListNode` type and the `BLOCK_SIZES` slice, we can now define our allocator type:

```rust
// in src/allocator/fixed_size_block.rs

pub struct FixedSizeBlockAllocator {
    list_heads: [Option<&'static mut ListNode>; BLOCK_SIZES.len()],
    fallback_allocator: linked_list_allocator::Heap,
}
```

The `list_heads` field is an array of `head` pointers, one for each block size. This is implemented by using the `len()` of the `BLOCK_SIZES` slice as the array length. As a fallback allocator for allocations larger than the largest block size, we use the allocator provided by the `linked_list_allocator` crate. We could also use the `LinkedListAllocator` we implemented ourselves instead, but it has the disadvantage that it does not [merge freed blocks][mescla blocos liberados].
[mescla blocos liberados]: #mesclando-blocos-liberados

For constructing a `FixedSizeBlockAllocator`, we provide the same `new` and `init` functions that we implemented for the other allocator types too:

```rust
// in src/allocator/fixed_size_block.rs

impl FixedSizeBlockAllocator {
    /// Creates an empty FixedSizeBlockAllocator.
    pub const fn new() -> Self {
        const EMPTY: Option<&'static mut ListNode> = None;
        FixedSizeBlockAllocator {
            list_heads: [EMPTY; BLOCK_SIZES.len()],
            fallback_allocator: linked_list_allocator::Heap::empty(),
        }
    }

    /// Initializes the allocator with the given heap bounds.
    ///
    /// This function is unsafe because the caller must guarantee that the
    /// given heap bounds are valid and that the heap is unused. This method
    /// must be called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        unsafe {
            self.fallback_allocator.init(heap_start, heap_size);
        }
    }
}
```

The `new` function just initializes the `list_heads` array with empty nodes and creates an [`empty`] linked list allocator as `fallback_allocator`. The `EMPTY` constant is needed to tell the Rust compiler that we want to initialize the array with a constant value. Initializing the array directly as `[None; BLOCK_SIZES.len()]` does not work, because then the compiler requires `Option<&'static mut ListNode>` to implement the `Copy` trait, which it does not. This is a current limitation of the Rust compiler, which might go away in the future.

[`empty`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.empty

The unsafe `init` function only calls the [`init`] function of the `fallback_allocator` without doing any additional initialization of the `list_heads` array. Instead, we will initialize the lists lazily on `alloc` and `dealloc` calls.
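The `EMPTY` trick is independent of the allocator and can be reproduced in isolation. The following sketch assumes a hypothetical `Heads` type with only the `list_heads` field and a shortened `BLOCK_SIZES` slice, and leaves out the fallback allocator so that it compiles without external crates:

```rust
/// Minimal stand-in for the allocator's list node.
#[allow(dead_code)]
struct ListNode {
    next: Option<&'static mut ListNode>,
}

/// Shortened list of block sizes, used only to derive the array length.
const BLOCK_SIZES: &[usize] = &[8, 16, 32];

/// Hypothetical type holding just the head pointers.
struct Heads {
    list_heads: [Option<&'static mut ListNode>; BLOCK_SIZES.len()],
}

impl Heads {
    const fn new() -> Self {
        // A named constant may be repeated in an array expression even though
        // its type is not `Copy`; a plain `None` value may not.
        const EMPTY: Option<&'static mut ListNode> = None;
        Heads {
            list_heads: [EMPTY; BLOCK_SIZES.len()],
        }
    }
}
```

Because both `EMPTY` and `BLOCK_SIZES.len()` are evaluated at compile time, `Heads::new` can remain a `const fn`, which is the property the allocator needs for its static.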
[`init`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init

For convenience, we also create a private `fallback_alloc` method that allocates using the `fallback_allocator`:

```rust
// in src/allocator/fixed_size_block.rs

use alloc::alloc::Layout;
use core::ptr;

impl FixedSizeBlockAllocator {
    /// Allocates using the fallback allocator.
    fn fallback_alloc(&mut self, layout: Layout) -> *mut u8 {
        match self.fallback_allocator.allocate_first_fit(layout) {
            Ok(ptr) => ptr.as_ptr(),
            Err(_) => ptr::null_mut(),
        }
    }
}
```

The [`Heap`] type of the `linked_list_allocator` crate does not implement [`GlobalAlloc`] (as it's [not possible without locking][não é possível sem bloqueio]). Instead, it provides an [`allocate_first_fit`] method that has a slightly different interface. Rather than returning a `*mut u8` and using a null pointer to signal an error, it returns a `Result<NonNull<u8>, ()>`. The [`NonNull`] type is an abstraction for a raw pointer that is guaranteed to not be a null pointer. By mapping the `Ok` case to the [`NonNull::as_ptr`] method and the `Err` case to a null pointer, we can easily translate this back to a `*mut u8` type.

[`Heap`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html
[não é possível sem bloqueio]: #globalalloc-e-mutabilidade
[`allocate_first_fit`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.allocate_first_fit
[`NonNull`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html
[`NonNull::as_ptr`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html#method.as_ptr

#### Calculating the List Index

Before we implement the `GlobalAlloc` trait, we define a `list_index` helper function that returns the index of the smallest block size that fits a given [`Layout`]:

```rust
// in src/allocator/fixed_size_block.rs

/// Choose an appropriate block size for the given layout.
///
/// Returns an index into the `BLOCK_SIZES` array.
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}
```

The block must have at least the size and alignment required by the given `Layout`. Since we defined that the block size is also its alignment, this means that the `required_block_size` is the [maximum][máximo] of the layout's [`size()`] and [`align()`] attributes. To find the next-larger block in the `BLOCK_SIZES` slice, we first use the [`iter()`] method to get an iterator and then the [`position()`] method to find the index of the first block that is at least as large as the `required_block_size`.

[máximo]: https://doc.rust-lang.org/core/cmp/trait.Ord.html#method.max
[`size()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.size
[`align()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align
[`iter()`]: https://doc.rust-lang.org/std/primitive.slice.html#method.iter
[`position()`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.position

Note that we don't return the block size itself, but the index into the `BLOCK_SIZES` slice. The reason is that we want to use the returned index as an index into the `list_heads` array.

#### Implementing `GlobalAlloc`

The last step is to implement the `GlobalAlloc` trait:

```rust
// in src/allocator/fixed_size_block.rs

use super::Locked;
use alloc::alloc::GlobalAlloc;

unsafe impl GlobalAlloc for Locked<FixedSizeBlockAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        todo!();
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        todo!();
    }
}
```

Like for the other allocators, we don't implement the `GlobalAlloc` trait directly for our allocator type, but use the [`Locked` wrapper][wrapper `Locked`] to add synchronized interior mutability. Since the `alloc` and `dealloc` implementations are relatively large, we introduce them one by one in the following.
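Both methods start by calling `list_index`, so it can help to check which index it yields for a few layouts before reading on. The sketch below repeats the helper and the `BLOCK_SIZES` slice from this post and only assumes `std::alloc::Layout` in place of `alloc::alloc::Layout`, so that it runs on a hosted target:

```rust
use std::alloc::Layout;

const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];

/// Returns the index of the first block size that satisfies both the size
/// and the alignment of the layout, or `None` if no block size is large enough.
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}
```

A 12-byte layout with 4-byte alignment maps to index 1 (the 16-byte list). A 4-byte layout with 64-byte alignment maps to index 3, because the alignment, not the size, determines the block. Anything above 2048 bytes yields `None` and is therefore routed to the fallback allocator.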

##### `alloc`

The implementation of the `alloc` method looks like this:

```rust
// in `impl` block in src/allocator/fixed_size_block.rs

unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
    let mut allocator = self.lock();
    match list_index(&layout) {
        Some(index) => {
            match allocator.list_heads[index].take() {
                Some(node) => {
                    allocator.list_heads[index] = node.next.take();
                    node as *mut ListNode as *mut u8
                }
                None => {
                    // no block exists in list => allocate new block
                    let block_size = BLOCK_SIZES[index];
                    // only works if all block sizes are a power of 2
                    let block_align = block_size;
                    let layout = Layout::from_size_align(block_size, block_align)
                        .unwrap();
                    allocator.fallback_alloc(layout)
                }
            }
        }
        None => allocator.fallback_alloc(layout),
    }
}
```

Let's go through it step by step:

First, we use the `Locked::lock` method to get a mutable reference to the wrapped allocator instance. Next, we call the `list_index` function we just defined to calculate the appropriate block size for the given layout and get the corresponding index into the `list_heads` array. If this index is `None`, no block size fits the allocation, therefore we use the `fallback_allocator` through the `fallback_alloc` function.

If the list index is `Some`, we try to remove the first node in the corresponding list started by `list_heads[index]` using the [`Option::take`] method. If the list is not empty, we enter the `Some(node)` branch of the `match` statement, where we point the head pointer of the list to the successor of the removed `node` (by using [`take`][`Option::take`] again). Finally, we return the removed `node` pointer as a `*mut u8`.

[`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take

If the list head is `None`, it indicates that the list of blocks is empty. This means that we need to construct a new block as [described above](#criando-novos-blocos).
For that, we first get the current block size from the `BLOCK_SIZES` slice and use it as both the size and the alignment for the new block. Then we create a new `Layout` from it and call the `fallback_alloc` method to perform the allocation. The reason for adjusting the layout and alignment is that the block will be added to the block list on deallocation.

##### `dealloc`

The implementation of the `dealloc` method looks like this:

```rust
// in src/allocator/fixed_size_block.rs

use core::{mem, ptr::NonNull};

// inside the `unsafe impl GlobalAlloc` block

unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
    let mut allocator = self.lock();
    match list_index(&layout) {
        Some(index) => {
            let new_node = ListNode {
                next: allocator.list_heads[index].take(),
            };
            // verify that block has size and alignment required for storing node
            assert!(mem::size_of::<ListNode>() <= BLOCK_SIZES[index]);
            assert!(mem::align_of::<ListNode>() <= BLOCK_SIZES[index]);
            let new_node_ptr = ptr as *mut ListNode;
            unsafe {
                new_node_ptr.write(new_node);
                allocator.list_heads[index] = Some(&mut *new_node_ptr);
            }
        }
        None => {
            let ptr = NonNull::new(ptr).unwrap();
            unsafe {
                allocator.fallback_allocator.deallocate(ptr, layout);
            }
        }
    }
}
```

Like in `alloc`, we first use the `lock` method to get a mutable allocator reference and then the `list_index` function to get the block list corresponding to the given `Layout`. If the index is `None`, no fitting block size exists in `BLOCK_SIZES`, which indicates that the allocation was created by the fallback allocator. Therefore, we use its [`deallocate`][`Heap::deallocate`] method to free the memory again. The method expects a [`NonNull`] instead of a `*mut u8`, so we need to convert the pointer first. (The `unwrap` call only fails when the pointer is null, which should never happen when the compiler calls `dealloc`.)
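The `Some(index)` branches of `alloc` and `dealloc` above boil down to popping from and pushing onto an intrusive singly linked list that lives inside the freed blocks themselves. The following is a minimal host-side sketch of just that mechanism. The `FreeList` type and the `lifo_round_trip` function are hypothetical names introduced for this example, and the blocks come from the host's allocator via `std::alloc` rather than from a kernel heap.

```rust
use std::alloc::{alloc, dealloc, Layout};

struct ListNode {
    next: Option<&'static mut ListNode>,
}

/// Hypothetical wrapper around a single list head, for illustration only.
struct FreeList {
    head: Option<&'static mut ListNode>,
}

impl FreeList {
    const fn new() -> Self {
        FreeList { head: None }
    }

    /// Pushes a freed block onto the front of the list.
    ///
    /// Safety: `ptr` must point to an unused block that is large enough and
    /// suitably aligned for a `ListNode` and stays valid while it is listed.
    unsafe fn push(&mut self, ptr: *mut u8) {
        let new_node = ListNode { next: self.head.take() };
        let new_node_ptr = ptr as *mut ListNode;
        unsafe {
            new_node_ptr.write(new_node);
            self.head = Some(&mut *new_node_ptr);
        }
    }

    /// Pops the first block of the list, if any.
    fn pop(&mut self) -> Option<*mut u8> {
        let node = self.head.take()?;
        self.head = node.next.take();
        Some(node as *mut ListNode as *mut u8)
    }
}

/// Pushes two host-allocated blocks and checks that they come back in
/// last-in, first-out order.
fn lifo_round_trip() -> bool {
    let layout = Layout::from_size_align(16, 16).unwrap();
    let (a, b) = unsafe { (alloc(layout), alloc(layout)) };
    if a.is_null() || b.is_null() {
        return false;
    }
    let mut list = FreeList::new();
    unsafe {
        list.push(a);
        list.push(b);
    }
    let ok = list.pop() == Some(b) && list.pop() == Some(a) && list.pop().is_none();
    unsafe {
        dealloc(a, layout);
        dealloc(b, layout);
    }
    ok
}
```

Both operations only touch the head of the list, which is why allocation and deallocation for a fitting block size take constant time.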
[`Heap::deallocate`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.deallocate

If `list_index` returns a block index, we need to add the freed memory block to the list. For that, we first create a new `ListNode` that points to the current list head (by using [`Option::take`] again). Before we write the new node into the freed memory block, we first assert that the current block size specified by `index` has the required size and alignment for storing a `ListNode`. Then we perform the write by converting the given `*mut u8` pointer to a `*mut ListNode` pointer and then calling the unsafe [`write`][`pointer::write`] method on it. The last step is to set the head pointer of the list, which is currently `None` since we called `take` on it, to our newly written `ListNode`. For that, we convert the raw `new_node_ptr` to a mutable reference.

[`pointer::write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write

There are a few things worth noting:

- We don't differentiate between blocks allocated from a block list and blocks allocated from the fallback allocator. This means that new blocks created in `alloc` are added to the block list on `dealloc`, thereby increasing the number of blocks of that size.
- The `alloc` method is the only place where new blocks are created in our implementation. This means that we initially start with empty block lists and only fill these lists lazily when allocations of their block size are performed.
- We don't need `unsafe` blocks in `alloc` and `dealloc`, even though we perform some `unsafe` operations. The reason is that Rust currently treats the complete body of unsafe functions as one large `unsafe` block.
Since using explicit `unsafe` blocks has the advantage that it's obvious which operations are unsafe and which are not, there is a [proposed RFC](https://github.com/rust-lang/rfcs/pull/2585) to change this behavior.

### Using it

To use our new `FixedSizeBlockAllocator`, we need to update the `ALLOCATOR` static in the `allocator` module:

```rust
// in src/allocator.rs

use fixed_size_block::FixedSizeBlockAllocator;

#[global_allocator]
static ALLOCATOR: Locked<FixedSizeBlockAllocator> = Locked::new(
    FixedSizeBlockAllocator::new());
```

Since the `init` function behaves the same for all allocators we implemented, we don't need to modify the `init` call in `init_heap`.

When we now run our `heap_allocation` tests again, all tests should still pass:

```
> cargo test --test heap_allocation
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [ok]
```

Our new allocator seems to work!

### Discussion

While the fixed-size block approach has much better performance than the linked list approach, it wastes up to half of the memory when using powers of 2 as block sizes. Whether this trade-off is worth it heavily depends on the application type. For an operating system kernel, where performance is critical, the fixed-size block approach seems to be the better choice.

On the implementation side, there are various things that we could improve in our current implementation:

- Instead of only allocating blocks lazily using the fallback allocator, it might be better to pre-fill the lists to improve the performance of initial allocations.
- To simplify the implementation, we only allowed block sizes that are powers of 2 so that we could also use them as the block alignment. By storing (or calculating) the alignment in a different way, we could also allow arbitrary other block sizes.
This way, we could add more block sizes, e.g., for common allocation sizes, in order to minimize the wasted memory.
- We currently only create new blocks, but never free them again. This results in fragmentation and might eventually result in allocation failure for large allocations. It might make sense to enforce a maximum list length for each block size. When the maximum length is reached, subsequent deallocations are freed using the fallback allocator instead of being added to the list.
- Instead of falling back to a linked list allocator, we could have a special allocator for allocations greater than 4 KiB. The idea is to utilize [paging][paginação], which operates on 4 KiB pages, to map a continuous block of virtual memory to non-continuous physical frames. This way, fragmentation of unused memory is no longer a problem for large allocations.
- With such a page allocator, it might make sense to add block sizes up to 4 KiB and drop the linked list allocator completely. The main advantages of this would be reduced fragmentation and improved performance predictability, i.e., better worst-case performance.

[paginação]: @/edition-2/posts/08-paging-introduction/index.md

It's important to note that the implementation improvements outlined above are only suggestions. Allocators used in operating system kernels are typically highly optimized for the specific workload of the kernel, which is only possible through extensive profiling.

### Variations

There are also many variations of the fixed-size block allocator design. Two popular examples are the _slab allocator_ and the _buddy allocator_, which are also used in popular kernels such as Linux. In the following, we give a short introduction to these two designs.

#### Slab Allocator

The idea behind a [slab allocator][alocador slab] is to use block sizes that directly correspond to selected types in the kernel.
This way, allocations of those types fit a block size exactly and no memory is wasted. Sometimes, it might even be possible to preinitialize type instances in unused blocks to further improve performance.

[alocador slab]: https://en.wikipedia.org/wiki/Slab_allocation

Slab allocation is often combined with other allocators. For example, it can be used together with a fixed-size block allocator to further split an allocated block in order to reduce memory waste. It is also often used to implement an [object pool pattern][padrão de pool de objetos] on top of a single large allocation.

[padrão de pool de objetos]: https://en.wikipedia.org/wiki/Object_pool_pattern

#### Buddy Allocator

Instead of using a linked list to manage freed blocks, the [buddy allocator][alocador buddy] design uses a [binary tree][árvore binária] data structure together with power-of-2 block sizes. When a new block of a certain size is required, it splits a larger-sized block into two halves, thereby creating two child nodes in the tree. Whenever a block is freed again, its neighbor block in the tree is analyzed. If the neighbor is also free, the two blocks are joined back together to form a block of twice the size.

The advantage of this merge process is that [external fragmentation][fragmentação externa] is reduced so that small freed blocks can be reused for a large allocation. It also does not use a fallback allocator, so the performance is more predictable. The biggest drawback is that only power-of-2 block sizes are possible, which might result in a large amount of wasted memory due to [internal fragmentation][fragmentação interna]. For this reason, buddy allocators are often combined with a slab allocator to further split an allocated block into multiple smaller blocks.
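The splitting and merging described above relies on simple address arithmetic. The following sketch is not taken from any particular kernel; it only illustrates the commonly described scheme in which block offsets are measured from the start of the heap, so that the neighbor ("buddy") of a block can be found by flipping a single bit. All function names and the `min_block` parameter are assumptions made for this example.

```rust
/// Rounds a requested size up to the block size a buddy allocator would
/// hand out. `min_block` is an assumed minimum block size and must be a
/// power of two.
fn buddy_block_size(size: usize, min_block: usize) -> usize {
    size.max(min_block).next_power_of_two()
}

/// Offset of the buddy of the block at `offset`, relative to the heap start.
/// Requires that `block_size` is a power of two and that `offset` is a
/// multiple of `block_size`.
fn buddy_of(offset: usize, block_size: usize) -> usize {
    offset ^ block_size
}

/// Bytes lost to internal fragmentation for a request of `size` bytes.
fn wasted_bytes(size: usize, min_block: usize) -> usize {
    buddy_block_size(size, min_block) - size
}
```

For example, under these assumptions the two 64-byte halves of the block at offset 0 sit at offsets 0 and 64 and are each other's buddy, and a 65-byte request is served from a 128-byte block, which leaves 63 bytes unused.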
[alocador buddy]: https://en.wikipedia.org/wiki/Buddy_memory_allocation
[árvore binária]: https://en.wikipedia.org/wiki/Binary_tree
[fragmentação externa]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#External_fragmentation
[fragmentação interna]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#Internal_fragmentation

## Summary

This post gave an overview of different allocator designs. We learned how to implement a basic [bump allocator][alocador bump], which hands out memory linearly by increasing a single `next` pointer. While bump allocation is very fast, it can only reuse memory after all allocations have been freed. For this reason, it is rarely used as a global allocator.

[alocador bump]: @/edition-2/posts/11-allocator-designs/index.md#bump-allocator

Next, we created a [linked list allocator][alocador de lista encadeada] that uses the freed memory blocks themselves to create a linked list, the so-called [free list][lista livre]. This list makes it possible to store an arbitrary number of freed blocks of different sizes. While no memory waste occurs, the approach suffers from poor performance because an allocation request might require a complete traversal of the list. Our implementation also suffers from [external fragmentation][fragmentação externa] because it does not merge adjacent freed blocks back together.

[alocador de lista encadeada]: @/edition-2/posts/11-allocator-designs/index.md#linked-list-allocator
[lista livre]: https://en.wikipedia.org/wiki/Free_list

To fix the performance problems of the linked list approach, we created a [fixed-size block allocator][alocador de bloco de tamanho fixo] that predefines a fixed set of block sizes. For each block size, a separate [free list][lista livre] exists, so that allocations and deallocations only need to insert or remove at the front of the list and are thus very fast. Since each allocation is rounded up to the next larger block size, some memory is wasted due to [internal fragmentation][fragmentação interna].
[alocador de bloco de tamanho fixo]: @/edition-2/posts/11-allocator-designs/index.md#fixed-size-block-allocator

There are many more allocator designs with different trade-offs. [Slab allocation][Alocação slab] works well to optimize the allocation of common fixed-size structures, but is not applicable in all situations. [Buddy allocation][Alocação buddy] uses a binary tree to merge freed blocks back together, but wastes a large amount of memory because it only supports power-of-2 block sizes. It's also important to remember that each kernel implementation has a unique workload, so there is no "best" allocator design that fits all cases.

[Alocação slab]: @/edition-2/posts/11-allocator-designs/index.md#slab-allocator
[Alocação buddy]: @/edition-2/posts/11-allocator-designs/index.md#buddy-allocator

## What's next?

With this post, we conclude our memory management implementation for now. Next, we will start exploring [_multitasking_][_multitarefa_], starting with cooperative multitasking in the form of [_async/await_]. In subsequent posts, we will then explore [_threads_], [_multiprocessing_][_multiprocessamento_], and [_processes_][_processos_].
[_multitarefa_]: https://en.wikipedia.org/wiki/Computer_multitasking
[_threads_]: https://en.wikipedia.org/wiki/Thread_(computing)
[_processos_]: https://en.wikipedia.org/wiki/Process_(computing)
[_multiprocessamento_]: https://en.wikipedia.org/wiki/Multiprocessing
[_async/await_]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html

================================================
FILE: blog/content/edition-2/posts/11-allocator-designs/index.ru.md
================================================
+++
title = "Allocator Designs"
weight = 11
path = "ru/allocator-designs"
date = 2020-01-20

[extra]
chapter = "Memory Management"
# Please update this when updating the translation
translate_based_on_commit = "eb079d740fb3635e524667f656307097e05ac20d"
# GitHub usernames of the people that translated this post
translators = ["TakiMoysha"]
+++

This post explains how to implement heap allocators from scratch. It presents and discusses different allocator designs, namely the bump allocator, the linked list allocator, and the fixed-size block allocator. For each of the three designs, we will create a basic implementation that can be used for our kernel.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom][внизу страницы]. The complete source code for this post can be found in the [`post-11`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[внизу страницы]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-11

## Introduction

In the [previous post][предыдущем посте], we added basic support for heap allocations to our kernel. For that, we [created a new memory region][map-heap] in the page tables and [used the `linked_list_allocator` crate][use-alloc-crate] to manage that memory. While we have a working heap now, we left most of the work to the allocator crate without trying to understand how it works.
[предыдущем посте]: @/edition-2/posts/10-heap-allocation/index.md
[map-heap]: @/edition-2/posts/10-heap-allocation/index.md#creating-a-kernel-heap
[use-alloc-crate]: @/edition-2/posts/10-heap-allocation/index.md#using-an-allocator-crate

In this post, we will show how to create our own heap allocator from scratch instead of relying on an existing allocator crate. We will discuss different allocator designs, including a simplistic _bump allocator_ and a basic _fixed-size block allocator_, and use this knowledge to implement an allocator with improved performance (compared to the `linked_list_allocator` crate).

### Design Goals

The responsibility of an allocator is to manage the available heap memory. It needs to return unused memory on `alloc` calls and keep track of memory freed by `dealloc` so that it can be reused. Most importantly, it must never hand out memory that is already in use somewhere else, because this would cause undefined behavior.

Apart from correctness, there are many secondary design goals. For example, the allocator should utilize the available memory effectively and keep [_fragmentation_][_фрагментации_] low. Furthermore, it should work well for concurrent applications and scale to any number of processors. For maximal performance, it could even optimize the memory layout with respect to the CPU caches to improve [cache locality][локальность кэша] and avoid [false sharing].

> Translator's note: in Russian, "memory layout" is sometimes rendered as "структура памяти", but there is no established term, so the English "memory layout" is usually kept when it matters for the context. It refers to the size, alignment, start, and end of a memory region and describes how the allocator hands out memory.
[локальность кэша]: https://www.geeksforgeeks.org/locality-of-reference-and-cache-operation-in-cache-memory/
[_фрагментации_]: https://en.wikipedia.org/wiki/Fragmentation_(computing)
[false sharing]: https://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html

These requirements can make good allocators very complex. For example, [jemalloc] has over 30,000 lines of code. This complexity is often undesired in kernel code, where a single bug can lead to severe security vulnerabilities. Fortunately, the allocation patterns of kernel code are often much simpler compared to userspace code, so that relatively simple allocator designs often suffice.

[jemalloc]: http://jemalloc.net/

In the following, we present three possible kernel allocator designs and explain their advantages and drawbacks.

## Bump Allocator

The simplest allocator design is a _bump allocator_ (also known as a _stack allocator_). It allocates memory linearly and only keeps track of the number of allocated bytes and the number of allocations. It is only useful in very specific use cases because it has a severe limitation: it can only free all memory at once.

### Idea

The idea behind a bump allocator is to linearly allocate memory by increasing (_"bumping"_) a `next` variable, which points to the start of the unused memory. At the beginning, `next` is equal to the start address of the heap. On each allocation, `next` is increased by the allocation size so that it always points to the boundary between used and unused memory:

![The heap memory area at three points in time: 1: A single allocation exists at the start of the heap; the `next` pointer points to its end. 2: A second allocation was added right after the first; the `next` pointer points to the end of the second allocation.
3: A third allocation was added right after the second one; the `next` pointer points to the end of the third allocation.](bump-allocation.svg)

The `next` pointer only moves in a single direction and thus never hands out the same memory region twice. When it reaches the end of the heap, no more memory can be allocated, resulting in an out-of-memory error on the next allocation.

A bump allocator is often implemented with an allocation counter, which is increased by 1 on each `alloc` call and decreased by 1 on each `dealloc` call. When the allocation counter reaches zero, it means that all allocations on the heap have been deallocated. In this case, the `next` pointer can be reset to the start address of the heap, so that the complete heap memory is available for allocations again.

### Implementation

We start our implementation by declaring a new `allocator::bump` submodule:

```rust
// src/allocator.rs

pub mod bump;
```

The content of the submodule lives in a new `src/allocator/bump.rs` file, which we create with the following content:

```rust
// src/allocator/bump.rs

pub struct BumpAllocator {
    heap_start: usize,
    heap_end: usize,
    next: usize,
    allocations: usize,
}

impl BumpAllocator {
    /// Creates a new empty bump allocator.
    pub const fn new() -> Self {
        BumpAllocator {
            heap_start: 0,
            heap_end: 0,
            next: 0,
            allocations: 0,
        }
    }

    /// Initializes the bump allocator with the given heap bounds.
    ///
    /// This method is unsafe because the caller must ensure that the given
    /// memory range is unused. Also, this method must be called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        self.heap_start = heap_start;
        self.heap_end = heap_start + heap_size;
        self.next = heap_start;
    }
}
```

The `heap_start` and `heap_end` fields keep track of the lower and upper bounds of the heap memory region. The caller needs to ensure that these addresses are valid, otherwise the allocator would return invalid memory.
For this reason, the `init` function needs to be marked `unsafe`.

The purpose of the `next` field is to always point to the first unused byte of the heap, i.e., the start address of the next allocation. It is set to `heap_start` in the `init` function because at the beginning, the entire heap is unused. On each allocation, this field will be increased by the allocation size (_"bumped"_) to ensure that we don't return the same memory region twice.

The `allocations` field is a simple counter for the active allocations with the goal of resetting the allocator after the last allocation has been freed. It is initialized with 0.

We chose to create a separate `init` function instead of performing the initialization directly in `new` in order to keep the interface identical to the allocator provided by the `linked_list_allocator` crate. This way, the allocators can be switched without additional code changes.

### Implementing `GlobalAlloc`

As [explained in the previous post][global-alloc], all heap allocators need to implement the [`GlobalAlloc`] trait, which is defined like this:

[global-alloc]: @/edition-2/posts/10-heap-allocation/index.md#the-allocator-interface
[`GlobalAlloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html

```rust
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... }
    unsafe fn realloc(
        &self,
        ptr: *mut u8,
        layout: Layout,
        new_size: usize
    ) -> *mut u8 { ... }
}
```

Only the `alloc` and `dealloc` methods are required; the other two methods have default implementations and can be omitted.
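As a quick host-side illustration of that last point, the sketch below implements only `alloc` and `dealloc` and then calls the two provided methods. It is not part of the kernel code: it assumes a normal `std` environment, and `Passthrough` and `default_methods_work` are hypothetical names that simply forward to the standard library's `System` allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Hypothetical allocator that forwards every request to the host's
/// `System` allocator and implements only the two required methods.
struct Passthrough;

unsafe impl GlobalAlloc for Passthrough {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

/// Exercises `alloc_zeroed` and `realloc`, which `Passthrough` inherits
/// from the trait's default implementations.
fn default_methods_work() -> bool {
    let allocator = Passthrough;
    let layout = Layout::from_size_align(8, 8).unwrap();
    unsafe {
        let ptr = allocator.alloc_zeroed(layout);
        if ptr.is_null() {
            return false;
        }
        let zeroed = (0..8).all(|i| *ptr.add(i) == 0);
        *ptr = 42;

        // Grow the allocation from 8 to 16 bytes; the first byte must survive.
        let grown = allocator.realloc(ptr, layout, 16);
        if grown.is_null() {
            allocator.dealloc(ptr, layout);
            return false;
        }
        let kept = *grown == 42;
        allocator.dealloc(grown, Layout::from_size_align(16, 8).unwrap());
        zeroed && kept
    }
}
```

The default `alloc_zeroed` and `realloc` are built on top of `alloc` and `dealloc`, so an allocator only needs to override them when it can do better, for example by growing an allocation in place.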
#### First Implementation Attempt

Let's try to implement the `alloc` method for our `BumpAllocator`:

```rust
// src/allocator/bump.rs

use alloc::alloc::{GlobalAlloc, Layout};

unsafe impl GlobalAlloc for BumpAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // TODO: alignment and bounds check
        let alloc_start = self.next;
        self.next = alloc_start + layout.size();
        self.allocations += 1;
        alloc_start as *mut u8
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        todo!();
    }
}
```

First, we use the `next` field as the start address for our allocation. Then we update the `next` field to point to the end address of the allocation, which is the next unused address on the heap. Before returning the start address of the allocation as a `*mut u8` pointer, we increase the `allocations` counter by 1.

Note that we don't perform any bounds checks or alignment adjustments, so this implementation is not safe yet. This does not matter much because it fails to compile anyway with the following error:

```
error[E0594]: cannot assign to `self.next` which is behind a `&` reference
  --> src/allocator/bump.rs:29:9
   |
29 |         self.next = alloc_start + layout.size();
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `self` is a `&` reference, so the data it refers to cannot be written
```

(The same error also occurs for the `self.allocations += 1` line. We omitted it here for brevity.)

The error occurs because the [`alloc`] and [`dealloc`] methods of the `GlobalAlloc` trait only operate on an immutable `&self` reference, so updating the `next` and `allocations` fields is not possible. This is problematic because updating `next` on every allocation is the essential principle of a bump allocator.
[`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc
[`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc

#### `GlobalAlloc` and Mutability {#globalalloc-and-mutability}

Before we look at a possible solution to this mutability problem, let's try to understand why the `GlobalAlloc` trait methods are defined with `&self` arguments: as we saw [in the previous post][global-allocator], the global heap allocator is defined by adding the `#[global_allocator]` attribute to a `static` that implements the `GlobalAlloc` trait. Static variables are immutable in Rust, so there is no way to call a method that takes `&mut self` on the static allocator. For this reason, all the methods of `GlobalAlloc` only take an immutable `&self` reference.

[global-allocator]: @/edition-2/posts/10-heap-allocation/index.md#the-global-allocator-attribute

Fortunately, there is a way to get a `&mut self` reference from a `&self` reference: we can use synchronized [interior mutability] by wrapping the allocator in a [`spin::Mutex`] spinlock. This type provides a `lock` method that performs [mutual exclusion] and thus safely turns a `&self` reference into a `&mut self` reference. We've already used this wrapper type multiple times in our kernel, for example for the [VGA text buffer][vga-mutex].

[interior mutability]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html
[vga-mutex]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks
[`spin::Mutex`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html
[mutual exclusion]: https://en.wikipedia.org/wiki/Mutual_exclusion

#### A `Locked` Wrapper Type

With the help of the `spin::Mutex` wrapper type, we can implement the `GlobalAlloc` trait for our bump allocator.
The trick is to implement the trait not for the `BumpAllocator` directly, but for the wrapped `spin::Mutex<BumpAllocator>` type:

```rust
unsafe impl GlobalAlloc for spin::Mutex<BumpAllocator> {…}
```

Unfortunately, this still doesn't work because the Rust compiler does not permit trait implementations for types defined in other crates:

```
error[E0117]: only traits defined in the current crate can be implemented for arbitrary types
  --> src/allocator/bump.rs:28:1
   |
28 | unsafe impl GlobalAlloc for spin::Mutex<BumpAllocator> {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^--------------------------
   | |                           |
   | |                           `spin::mutex::Mutex` is not defined in the current crate
   | impl doesn't use only types from inside the current crate
   |
   = note: define and implement a trait or new type instead
```

To fix this, we need to create our own wrapper type around `spin::Mutex`:

```rust
// src/allocator.rs

/// A wrapper around spin::Mutex to permit trait implementations.
pub struct Locked<A> {
    inner: spin::Mutex<A>,
}

impl<A> Locked<A> {
    pub const fn new(inner: A) -> Self {
        Locked {
            inner: spin::Mutex::new(inner),
        }
    }

    pub fn lock(&self) -> spin::MutexGuard<A> {
        self.inner.lock()
    }
}
```

The type is a generic wrapper around a `spin::Mutex<A>`. It imposes no restrictions on the wrapped type `A`, so it can be used to wrap all kinds of types, not just allocators. It provides a simple `new` constructor function that wraps a given value. For convenience, it also provides a `lock` function that calls `lock` on the wrapped `Mutex`. Since the `Locked` type is general enough to be useful for other allocator implementations too, we put it in the parent `allocator` module.

#### Implementation for `Locked<BumpAllocator>`

The `Locked` type is defined in our own crate (in contrast to `spin::Mutex`), so we can use it to implement `GlobalAlloc` for our bump allocator.
The full implementation looks like this:

```rust
// src/allocator/bump.rs

use super::{align_up, Locked};
use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr;

unsafe impl GlobalAlloc for Locked<BumpAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let mut bump = self.lock(); // get a mutable reference

        let alloc_start = align_up(bump.next, layout.align());
        let alloc_end = match alloc_start.checked_add(layout.size()) {
            Some(end) => end,
            None => return ptr::null_mut(),
        };

        if alloc_end > bump.heap_end {
            ptr::null_mut() // out of memory
        } else {
            bump.next = alloc_end;
            bump.allocations += 1;
            alloc_start as *mut u8
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        let mut bump = self.lock(); // get a mutable reference

        bump.allocations -= 1;
        if bump.allocations == 0 {
            bump.next = bump.heap_start;
        }
    }
}
```

The first step for both `alloc` and `dealloc` is to call the [`Mutex::lock`] method through the `inner` field to get a mutable reference to the wrapped allocator type. The instance remains locked until the end of the method, so that no data race can occur in multithreaded contexts (we will add threading support soon).

[`Mutex::lock`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html#method.lock

Compared to the previous prototype, the `alloc` implementation now respects alignment requirements and performs a bounds check to ensure that the allocations stay inside the heap memory region. The first step is to round up the `next` address to the alignment specified by the `Layout` argument. The code for the `align_up` function is shown in a moment. We then add the requested allocation size to `alloc_start` to get the end address of the allocation. To prevent integer overflow on large allocations, we use the [`checked_add`] method. If an overflow occurs or if the resulting end address of the allocation is larger than the end address of the heap, we return a null pointer to signal an out-of-memory situation.
Otherwise, we update the `next` address and increase the `allocations` counter by 1 like before. Finally, we return the `alloc_start` address converted to a `*mut u8` pointer.

[`checked_add`]: https://doc.rust-lang.org/std/primitive.usize.html#method.checked_add
[`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html

The `dealloc` function ignores the given pointer and `Layout` arguments. Instead, it just decreases the `allocations` counter. If the counter reaches `0` again, it means that all allocations were freed again. In this case, it resets the `next` address to the `heap_start` address to make the complete heap memory available again.

#### Address Alignment

The `align_up` function is general enough that we can put it into the parent `allocator` module. A basic implementation looks like this:

```rust
// src/allocator.rs

/// Align the given address `addr` upwards to alignment `align`.
fn align_up(addr: usize, align: usize) -> usize {
    let remainder = addr % align;
    if remainder == 0 {
        addr // addr already aligned
    } else {
        addr - remainder + align
    }
}
```

The function first computes the [remainder] of the division of `addr` by `align`. If the remainder is `0`, the address is already aligned with the given alignment. Otherwise, we align the address by subtracting the remainder (so that the new remainder is 0) and then adding the alignment (so that the address does not become smaller than the original address).

[remainder]: https://en.wikipedia.org/wiki/Euclidean_division

Note that this isn't the most efficient way to implement this function. A much faster implementation looks like this:

```rust
/// Align the given address `addr` upwards to alignment `align`.
///
/// Requires that `align` is a power of two.
fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}
```

This method requires `align` to be a power of two, which can be guaranteed by utilizing the `GlobalAlloc` trait (and its [`Layout`] parameter).
This makes it possible to create a [bitmask] to align the address in a very efficient way. To understand how it works, let's go through it step by step, starting on the right side:

[`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html
[bitmask]: https://en.wikipedia.org/wiki/Mask_(computing)

- Since `align` is a power of two, its [binary representation] has only a single bit set (e.g. `0b000100000`). This means that `align - 1` has all the lower bits set (e.g. `0b000011111`).
- By creating the [bitwise `NOT`] through the `!` operator, we get a number that has all the bits set except for the bits lower than `align` (e.g. `0b…111111111100000`).
- By performing a [bitwise `AND`] on an address and `!(align - 1)`, we align the address _downwards_. This works by clearing all the bits that are lower than `align`.
- Since we want to align upwards instead of downwards, we increase `addr` by `align - 1` before performing the bitwise `AND`. This way, already aligned addresses remain the same while non-aligned addresses are rounded up to the next alignment boundary.

[binary representation]: https://en.wikipedia.org/wiki/Binary_number#Representation
[bitwise `NOT`]: https://en.wikipedia.org/wiki/Bitwise_operation#NOT
[bitwise `AND`]: https://en.wikipedia.org/wiki/Bitwise_operation#AND

Which variant you choose is up to you. Both compute the same result, only using different methods.

### Using It

To use the bump allocator instead of the `linked_list_allocator` crate, we need to update the `ALLOCATOR` static in `allocator.rs`:

```rust
// src/allocator.rs

use bump::BumpAllocator;

#[global_allocator]
static ALLOCATOR: Locked<BumpAllocator> = Locked::new(BumpAllocator::new());
```

Here it becomes important that we declared `BumpAllocator::new` and `Locked::new` as [`const` functions].
If they were normal functions, a compilation error would occur because the initialization expression of a `static` must be evaluable at compile time.

[`const` functions]: https://doc.rust-lang.org/reference/items/functions.html#const-functions

We don't need to change the `ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE)` call in our `init_heap` function because the bump allocator provides the same interface as the allocator from `linked_list_allocator`.

Now our kernel uses our bump allocator! Everything should still work, including the [`heap_allocation` tests] that we created in the previous post:

[`heap_allocation` tests]: @/edition-2/posts/10-heap-allocation/index.md#adding-a-test

```
> cargo test --test heap_allocation
[…]
Running 3 tests
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
```

### Discussion

The big advantage of bump allocation is that it's very fast. Compared to other allocator designs (see below) that need to actively look for a fitting memory block and perform various bookkeeping tasks on `alloc` and `dealloc`, a bump allocator [can be optimized][bump downwards] to just a few assembly instructions. This makes bump allocators useful for optimizing allocation performance, for example when creating a [virtual DOM library].

[bump downwards]: https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html
[virtual DOM library]: https://hacks.mozilla.org/2019/03/fast-bump-allocated-virtual-doms-with-rust-and-wasm/

While a bump allocator is seldom used as the global allocator, the principle of bump allocation is often applied in the form of [arena allocation], which basically batches individual allocations together to improve performance. An example of an arena allocator for Rust is contained in the [`toolshed`] crate.
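To make the arena idea a bit more concrete, here is a minimal sketch of a bump arena that runs on a normal host target. The `Arena` type and its methods are hypothetical names chosen for this illustration (they are not taken from `toolshed` or from our kernel), and the arena hands out offsets into a buffer rather than real pointers, so alignment is relative to the start of the buffer:

```rust
/// Rounds `addr` up to the next multiple of `align` (which must be a power of two).
fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}

/// A toy arena that hands out offsets into a fixed-size buffer.
/// Hypothetical type for illustration only.
pub struct Arena {
    buf: Vec<u8>,
    next: usize,
}

impl Arena {
    pub fn with_capacity(capacity: usize) -> Self {
        Arena { buf: vec![0; capacity], next: 0 }
    }

    /// Reserves `size` bytes aligned to `align` and returns their offset,
    /// or `None` if the arena does not have enough room left.
    pub fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = align_up(self.next, align);
        let end = start.checked_add(size)?;
        if end > self.buf.len() {
            return None;
        }
        self.next = end; // "bump" the pointer past the new allocation
        Some(start)
    }

    /// Frees every allocation at once by rewinding the bump pointer.
    pub fn reset(&mut self) {
        self.next = 0;
    }
}
```

Each `alloc` call is only an alignment step, a bounds check, and a pointer update. Individual allocations cannot be freed; the whole batch is released together through `reset`.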

[arena allocation]: https://mgravell.github.io/Pipelines.Sockets.Unofficial/docs/arenas.html
[`toolshed`]: https://docs.rs/toolshed/0.8.1/toolshed/index.html

#### The Drawback of a Bump Allocator

The main limitation of a bump allocator is that it can only reuse deallocated memory after all allocations have been freed. This means that a single long-lived allocation suffices to prevent memory reuse. We can see this when we add a variation of the `many_boxes` test:

```rust
// tests/heap_allocation.rs

#[test_case]
fn many_boxes_long_lived() {
    let long_lived = Box::new(1); // new
    for i in 0..HEAP_SIZE {
        let x = Box::new(i);
        assert_eq!(*x, i);
    }
    assert_eq!(*long_lived, 1); // new
}
```

Like the `many_boxes` test, this test creates a large number of allocations to provoke an out-of-memory failure if the allocator does not reuse freed memory. Additionally, the test creates a `long_lived` allocation, which lives for the whole loop execution.

When we run our new test, we see that it indeed fails:

```
> cargo test --test heap_allocation
Running 4 tests
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [failed]

Error: panicked at 'allocation error: Layout { size_: 8, align_: 8 }', src/lib.rs:86:5
```

Let's try to understand why this failure occurs in detail. First, the `long_lived` allocation is created at the start of the heap, thereby increasing the `allocations` counter by 1. On each iteration of the loop, a short-lived allocation is created and directly freed again before the next iteration starts. This means that the `allocations` counter is temporarily increased to 2 at the beginning of an iteration and decreased to 1 at the end of it. The problem is that the bump allocator can only reuse memory after _all_ allocations have been freed, i.e., when the `allocations` counter falls to 0.
Since this doesn't happen before the end of the loop, each loop iteration allocates a new region of memory, leading to an out-of-memory error after a number of iterations.

#### Fixing the Test?

There are two potential tricks that we could use to fix the test for our bump allocator:

- We could update `dealloc` to check whether the freed allocation was the last allocation returned by `alloc` by comparing its end address with the `next` pointer. If they're equal, we can safely reset `next` back to the start address of the freed allocation. This way, each loop iteration reuses the same memory block.
- We could add an `alloc_back` method that allocates memory from the _end_ of the heap using an additional `next_back` field. Then we could manually use this allocation method for all long-lived allocations, thereby separating short-lived and long-lived allocations on the heap. Note that this separation only works if it's clear beforehand how long each allocation will live. Another drawback of this approach is that manually performing allocations is cumbersome and potentially unsafe.

While both of these approaches work to fix the test, they are not a general solution since they are only able to reuse memory in very specific cases. The question is: Is there a general solution that reuses _all_ freed memory?

#### Reusing All Freed Memory?

As we learned [in the previous post][heap-intro], allocations can live arbitrarily long and can be freed in an arbitrary order. This means that we need to keep track of a potentially unbounded number of non-contiguous, unused memory regions, as illustrated by the following example:

[heap-intro]: @/edition-2/posts/10-heap-allocation/index.md#dynamic-memory

![](allocation-fragmentation.svg)

The graphic shows the heap over the course of time.
At the beginning, the complete heap is unused, and the `next` address is equal to `heap_start` (line 1). Then the first allocation occurs (line 2). In line 3, the first block is freed and a second block is allocated. Many more allocations are added in line 4. Half of them are very short-lived and already get freed in line 5, where another new allocation is also added.

Line 5 shows the fundamental problem: We have five unused memory regions with different sizes, but the `next` pointer can only point to the beginning of the last region. While we could store the start addresses and sizes of the other unused memory regions in an array of size 4 for this example, this isn't a general solution since we could easily create an example with 8, 16, or 1000 unused memory regions.

Normally, when we have a potentially unbounded number of items, we can just use a heap-allocated collection. This isn't possible in our case, since the heap allocator can't depend on itself (it would cause endless recursion or deadlocks). So we need to find a different solution.

## Linked List Allocator

A common trick to keep track of an arbitrary number of free memory areas when implementing allocators is to use these areas themselves as backing storage. This utilizes the fact that the regions are still mapped to a virtual address and backed by a physical frame, but the stored information is not needed anymore. By storing the information about the freed region in the region itself, we can keep track of an unbounded number of freed regions without needing additional memory.

The most common implementation approach is to construct a single linked list in the freed memory, with each node being a freed memory region:

![](linked-list-allocation.svg)

Each list node contains two fields: the size of the memory region and a pointer to the next unused memory region.
With this approach, we only need a pointer to the first unused region (called `head`) to keep track of all unused regions, regardless of their number. The resulting data structure is often called a [_free list_].

[_free list_]: https://en.wikipedia.org/wiki/Free_list

As you might guess from the name, this is the technique that the `linked_list_allocator` crate uses. Allocators that use this technique are also often called _pool allocators_.

### Implementation

In the following, we will create our own simple `LinkedListAllocator` type that uses the above approach for keeping track of freed memory regions. This part of the post isn't required for future posts, so you can skip the implementation details if you like.

#### The Allocator Type {#the-allocator-type}

We start by creating a private `ListNode` struct in a new `allocator::linked_list` submodule:

```rust
// src/allocator.rs

pub mod linked_list;
```

```rust
// src/allocator/linked_list.rs

struct ListNode {
    size: usize,
    next: Option<&'static mut ListNode>,
}
```

Like in the graphic, a list node has a `size` field and an optional pointer to the next node, represented by the `Option<&'static mut ListNode>` type. The `&'static mut` type semantically describes an [owned] object behind a pointer. Basically, it's a [`Box`] without a destructor that frees the object at the end of the scope.

[owned]: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html
[`Box`]: https://doc.rust-lang.org/alloc/boxed/index.html

We implement the following set of methods for `ListNode`:

```rust
// src/allocator/linked_list.rs

impl ListNode {
    const fn new(size: usize) -> Self {
        ListNode { size, next: None }
    }

    fn start_addr(&self) -> usize {
        self as *const Self as usize
    }

    fn end_addr(&self) -> usize {
        self.start_addr() + self.size
    }
}
```

The type has a simple constructor function named `new` and methods to calculate the start and end addresses of the represented region.
We make the `new` function a [const function], which will be required later when constructing a static linked list allocator.

[const function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions

With the `ListNode` struct as a building block, we can now create the `LinkedListAllocator` struct:

```rust
// src/allocator/linked_list.rs

pub struct LinkedListAllocator {
    head: ListNode,
}

impl LinkedListAllocator {
    /// Creates an empty LinkedListAllocator.
    pub const fn new() -> Self {
        Self {
            head: ListNode::new(0),
        }
    }

    /// Initialize the allocator with the given heap bounds.
    ///
    /// This function is unsafe because the caller must guarantee that the given
    /// heap bounds are valid and that the heap is unused. This method must be
    /// called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        unsafe {
            self.add_free_region(heap_start, heap_size);
        }
    }

    /// Adds the given memory region to the front of the list.
    unsafe fn add_free_region(&mut self, addr: usize, size: usize) {
        todo!();
    }
}
```

The struct contains a `head` node that points to the first heap region. We are only interested in the value of the `next` pointer, so we set the `size` to 0 in the `ListNode::new` function. Making `head` a `ListNode` instead of just a `&'static mut ListNode` has the advantage that the implementation of the `alloc` method will be simpler.

Like for the bump allocator, the `new` function doesn't initialize the allocator with the heap bounds. In addition to maintaining API compatibility, the reason is that the initialization routine requires writing a node to the heap memory, which can only happen at runtime. The `new` function, however, needs to be a [`const` function] that can be evaluated at compile time because it will be used for initializing the `ALLOCATOR` static. For this reason, we again provide a separate, non-constant `init` method.
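The split between a `const fn new` and a runtime `init` can be tried out in isolation on a host target. The following is only a sketch: `DemoAllocator` and `DEMO` are hypothetical names, and `std::sync::Mutex` stands in for the spinlock-based `Locked` wrapper (its `new` function is `const` since Rust 1.63):

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for an allocator whose heap bounds are only known at run time.
pub struct DemoAllocator {
    heap_start: usize,
    heap_size: usize,
}

impl DemoAllocator {
    /// A `const fn`, so it may be called inside a `static` initializer.
    pub const fn new() -> Self {
        DemoAllocator { heap_start: 0, heap_size: 0 }
    }

    /// Run-time initialization with the actual heap bounds.
    pub fn init(&mut self, heap_start: usize, heap_size: usize) {
        self.heap_start = heap_start;
        self.heap_size = heap_size;
    }

    /// End address of the heap (exclusive).
    pub fn heap_end(&self) -> usize {
        self.heap_start + self.heap_size
    }
}

// The initializer is evaluated at compile time. Removing `const` from
// `DemoAllocator::new` turns this line into a compilation error.
pub static DEMO: Mutex<DemoAllocator> = Mutex::new(DemoAllocator::new());
```

At run time, a call such as `DEMO.lock().unwrap().init(0x1000, 0x1000)` then fills in the real bounds, mirroring what `init_heap` does for the kernel allocator.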

[`const` function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions

The `init` method uses an `add_free_region` method, whose implementation will be shown in a moment. For now, we use the [`todo!`] macro to provide a placeholder implementation that always panics.

[`todo!`]: https://doc.rust-lang.org/core/macro.todo.html

#### The `add_free_region` Method

The `add_free_region` method provides the fundamental _push_ operation on the linked list. We currently only call this method from `init`, but it will also be the central method in our `dealloc` implementation. Remember, the `dealloc` method is called when an allocated memory region is freed again. To keep track of this freed memory region, we want to push it to the linked list.

The implementation of the `add_free_region` method looks like this:

```rust
// src/allocator/linked_list.rs

use super::align_up;
use core::mem;

impl LinkedListAllocator {
    /// Adds the given memory region to the front of the list.
    unsafe fn add_free_region(&mut self, addr: usize, size: usize) {
        // ensure that the freed region is capable of holding ListNode
        assert_eq!(align_up(addr, mem::align_of::<ListNode>()), addr);
        assert!(size >= mem::size_of::<ListNode>());

        // create a new list node and prepend it to the list
        let mut node = ListNode::new(size);
        node.next = self.head.next.take();
        let node_ptr = addr as *mut ListNode;
        unsafe {
            node_ptr.write(node);
            self.head.next = Some(&mut *node_ptr)
        }
    }
}
```

The method takes the address and size of a memory region as an argument and adds it to the front of the list. First, it ensures that the given region has the necessary size and alignment for storing a `ListNode`. Then it creates the node and inserts it into the list through the following steps:

![](linked-list-allocator-push.svg)

Step 0 shows the state of the heap before `add_free_region` is called. In step 1, the method is called with the memory region marked as `freed` in the graphic. After the initial checks, the method creates a new `node` on its stack with the size of the freed region.
It then uses the [`Option::take`] method to set the `next` pointer of the node to the current `head` pointer, thereby resetting the `head` pointer to `None`.

[`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take

In step 2, the method writes the newly created `node` to the beginning of the freed memory region through the [`write`] method. It then points the `head` pointer to the new node. The resulting pointer structure looks a bit chaotic because the freed region is always inserted at the beginning of the list, but if we follow the pointers, we see that each free region is still reachable from the `head` pointer.

[`write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write

#### The `find_region` Method

The second fundamental operation on a linked list is finding an entry and removing it from the list. This is the central operation needed for implementing the `alloc` method. We implement the operation as a `find_region` method in the following way:

```rust
// src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Looks for a free region with the given size and alignment and removes
    /// it from the list.
    ///
    /// Returns a tuple of the list node and the start address of the allocation.
    fn find_region(&mut self, size: usize, align: usize)
        -> Option<(&'static mut ListNode, usize)>
    {
        // reference to current list node, updated for each iteration
        let mut current = &mut self.head;
        // look for a large enough memory region in linked list
        while let Some(ref mut region) = current.next {
            if let Ok(alloc_start) = Self::alloc_from_region(&region, size, align) {
                // region suitable for allocation -> remove node from list
                let next = region.next.take();
                let ret = Some((current.next.take().unwrap(), alloc_start));
                current.next = next;
                return ret;
            } else {
                // region not suitable -> continue with next region
                current = current.next.as_mut().unwrap();
            }
        }

        // no suitable region found
        None
    }
}
```

The method uses a `current` variable and a [`while let` loop] to iterate over the list elements.
At the beginning, `current` is set to the (dummy) `head` node. On each iteration, it is then updated to the `next` field of the current node (in the `else` block). If the region is suitable for an allocation with the given size and alignment, the region is removed from the list and returned together with the `alloc_start` address.

[`while let` loop]: https://doc.rust-lang.org/reference/expressions/loop-expr.html#while-let-patterns

When the `current.next` pointer becomes `None`, the loop exits. This means we iterated over the whole list but found no region suitable for an allocation. In that case, we return `None`. Whether a region is suitable is checked by the `alloc_from_region` function, whose implementation will be shown in a moment.

Let's take a more detailed look at how a suitable region is removed from the list:

![](linked-list-allocator-remove-region.svg)

Step 0 shows the situation before any pointer adjustments. The `region` and `current` regions and the `region.next` and `current.next` pointers are marked in the graphic. In step 1, both the `region.next` and `current.next` pointers are reset to `None` by using the [`Option::take`] method. The original pointers are stored in local variables named `next` and `ret`.

In step 2, the `current.next` pointer is set to the local `next` pointer, which is the original `region.next` pointer. The effect is that `current` now directly points to the region after `region`, so that `region` is no longer an element of the linked list. The function then returns the pointer to `region` stored in the local `ret` variable.

##### The `alloc_from_region` Function

The `alloc_from_region` function returns whether a region is suitable for an allocation with a given size and alignment. It is defined like this:

```rust
// src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Try to use the given region for an allocation with given size and
    /// alignment.
    ///
    /// Returns the allocation start address on success.
    fn alloc_from_region(region: &ListNode, size: usize, align: usize)
        -> Result<usize, ()>
    {
        let alloc_start = align_up(region.start_addr(), align);
        let alloc_end = alloc_start.checked_add(size).ok_or(())?;

        if alloc_end > region.end_addr() {
            // region too small
            return Err(());
        }

        let excess_size = region.end_addr() - alloc_end;
        if excess_size > 0 && excess_size < mem::size_of::<ListNode>() {
            // rest of region too small to hold a ListNode (required because the
            // allocation splits the region in a used and a free part)
            return Err(());
        }

        // region suitable for allocation
        Ok(alloc_start)
    }
}
```

First, the function calculates the start and end address of a potential allocation, using the `align_up` function we defined earlier and the [`checked_add`] method. If an overflow occurs or if the end address is behind the end address of the region, the allocation doesn't fit in the region and we return an error.

The function performs a less obvious check after that. This check is necessary because most of the time an allocation does not fit a suitable region perfectly, so that a part of the region remains usable after the allocation. This part of the region must store its own `ListNode` after the allocation, so it must be large enough to do so. The check verifies exactly that: either the allocation fits perfectly (`excess_size == 0`) or the excess size is large enough to store a `ListNode`.

#### Implementing `GlobalAlloc`

With the fundamental operations provided by the `add_free_region` and `find_region` methods, we can now finally implement the `GlobalAlloc` trait. As with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator` but only for a wrapped `Locked<LinkedListAllocator>`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references.
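The suitability check of `alloc_from_region` is easier to follow with concrete numbers. The sketch below repeats the same three checks on plain integers so that it runs on a host target. The `fits` function is a hypothetical helper, and `NODE_SIZE` assumes a 64-bit target where a `ListNode` consists of a `usize` and a pointer:

```rust
/// Assumed size of a `ListNode` on a 64-bit target: `usize` size + pointer.
const NODE_SIZE: usize = 16;

/// Rounds `addr` up to the next multiple of `align` (which must be a power of two).
fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}

/// Mirrors the checks of `alloc_from_region`, but takes the region as
/// a start address and a size instead of a `&ListNode`.
fn fits(region_start: usize, region_size: usize, size: usize, align: usize) -> Result<usize, ()> {
    let region_end = region_start + region_size;
    let alloc_start = align_up(region_start, align);
    let alloc_end = alloc_start.checked_add(size).ok_or(())?;

    if alloc_end > region_end {
        return Err(()); // region too small
    }

    let excess = region_end - alloc_end;
    if excess > 0 && excess < NODE_SIZE {
        return Err(()); // the leftover part could not hold a list node
    }

    Ok(alloc_start)
}
```

For a 64-byte region at `0x1000`, a 64-byte request fits exactly and a 48-byte request leaves a 16-byte remainder that can still hold a node, so both succeed. A 56-byte request is rejected even though the region is large enough, because the 8-byte remainder could not be tracked in the free list.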

> Translator's note: a spinlock is a synchronization mechanism for exclusive access to a resource. If the lock cannot be acquired, the code does not go to sleep but instead "spins" in a loop until the lock becomes available.

[`Locked` wrapper]: @/edition-2/posts/11-allocator-designs/index.md#a-locked-wrapper-type

The implementation looks like this:

```rust
// src/allocator/linked_list.rs

use super::Locked;
use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr;

unsafe impl GlobalAlloc for Locked<LinkedListAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // perform layout adjustments
        let (size, align) = LinkedListAllocator::size_align(layout);
        let mut allocator = self.lock();

        if let Some((region, alloc_start)) = allocator.find_region(size, align) {
            let alloc_end = alloc_start.checked_add(size).expect("overflow");
            let excess_size = region.end_addr() - alloc_end;
            if excess_size > 0 {
                unsafe {
                    allocator.add_free_region(alloc_end, excess_size);
                }
            }
            alloc_start as *mut u8
        } else {
            ptr::null_mut()
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // perform layout adjustments
        let (size, _) = LinkedListAllocator::size_align(layout);

        unsafe { self.lock().add_free_region(ptr as usize, size) }
    }
}
```

Let's start with the `dealloc` method because it is simpler: First, it performs some layout adjustments, which we will explain in a moment. Then, it retrieves a `&mut LinkedListAllocator` reference by calling the [`Mutex::lock`] function on the [`Locked` wrapper]. Lastly, it calls the `add_free_region` function to add the deallocated region to the free list.

The `alloc` method is a bit more complex. It starts with the same layout adjustments and also calls the [`Mutex::lock`] function to receive a mutable allocator reference. Then it uses the `find_region` method to find a suitable memory region for the allocation and remove it from the list. If this doesn't succeed and `None` is returned, it returns `null_mut` to signal an error as there is no suitable memory region.

In the success case, the `find_region` method returns a tuple of the suitable region (no longer in the list) and the start address of the allocation. Using `alloc_start`, the allocation size, and the end address of the region, it calculates the end address of the allocation and the excess size again. If the excess size is not zero, it calls `add_free_region` to add the remaining part of the region back to the free list. Finally, it returns the `alloc_start` address cast to a `*mut u8` pointer.

#### Layout Adjustments

So what are these layout adjustments that we make at the beginning of both `alloc` and `dealloc`? They ensure that each allocated block is capable of storing a `ListNode`. This is important because the memory block is going to be deallocated at some point, where we want to write a `ListNode` to it. If the block is smaller than a `ListNode` or does not have the correct alignment, undefined behavior can occur.

The layout adjustments are performed by the `size_align` function, which is defined like this:

```rust
// src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Adjust the given layout so that the resulting allocated memory
    /// region is also capable of storing a `ListNode`.
    ///
    /// Returns the adjusted size and alignment as a (size, align) tuple.
    fn size_align(layout: Layout) -> (usize, usize) {
        let layout = layout
            .align_to(mem::align_of::<ListNode>())
            .expect("adjusting alignment failed")
            .pad_to_align();
        let size = layout.size().max(mem::size_of::<ListNode>());
        (size, layout.align())
    }
}
```

First, the function uses the [`align_to`] method on the passed [`Layout`] to increase the alignment to the alignment of a `ListNode` if necessary. It then uses the [`pad_to_align`] method to round up the size to a multiple of the alignment to ensure that the start address of the next memory block will have the correct alignment for storing a `ListNode` too.
In the second step, it uses the [`max`] method to enforce a minimum allocation size of `mem::size_of::<ListNode>()`. This way, the `dealloc` function can safely write a `ListNode` to the freed memory block.

[`align_to`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align_to
[`pad_to_align`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.pad_to_align
[`max`]: https://doc.rust-lang.org/std/cmp/trait.Ord.html#method.max

### Using it

We can now update the `ALLOCATOR` static in the `allocator` module to use our new `LinkedListAllocator`:

```rust
// src/allocator.rs

use linked_list::LinkedListAllocator;

#[global_allocator]
static ALLOCATOR: Locked<LinkedListAllocator> =
    Locked::new(LinkedListAllocator::new());
```

Since the `init` function behaves the same for the bump and linked list allocators, we don't need to modify the `init` call in `init_heap`.

When we now run our `heap_allocation` tests again, we see that all tests pass, including the `many_boxes_long_lived` test that failed with the bump allocator:

```
> cargo test --test heap_allocation
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [ok]
```

This shows that our linked list allocator is able to reuse freed memory for subsequent allocations.

### Discussion

In contrast to the bump allocator, the linked list allocator is much more suitable as a general-purpose allocator, mainly because it is able to directly reuse freed memory. However, it also has some drawbacks. Some of them are only caused by our basic implementation, but there are also fundamental drawbacks of the allocator design itself.

#### Merging Freed Blocks {#merging-freed-blocks}

The main problem with our implementation is that it only splits the heap into smaller blocks but never merges them back together.
Consider this example:

![](linked-list-allocator-fragmentation-on-dealloc.svg)

In the first line, three allocations are created on the heap. Two of them are freed again in line 2 and the third is freed in line 3. Now the complete heap is unused again, but it is still split into four individual blocks. At this point, a large allocation might not be possible anymore because none of the four blocks is large enough. Over time, the process continues, and the heap is split into smaller and smaller blocks. At some point, the heap is so fragmented that even normal-sized allocations will fail.

To fix this problem, we need to merge adjacent freed blocks back together. For the above example, this would mean the following:

![](linked-list-allocator-merge-on-dealloc.svg)

Like before, two of the three allocations are freed in line `2`. Instead of keeping the fragmented heap, we now perform an additional step in line `2a` to merge the two rightmost blocks back together. In line `3`, the third allocation is freed (like before), resulting in a completely unused heap represented by three distinct blocks. In an additional merging step in line `3a`, we then merge the three adjacent blocks back together.

The `linked_list_allocator` crate implements this merging strategy in the following way: Instead of inserting freed memory blocks at the beginning of the linked list on `deallocate`, it always keeps the list sorted by start address. This way, merging can be performed directly on the `deallocate` call by examining the addresses and sizes of the two neighboring blocks in the list. Of course, the deallocation operation is slower this way, but it prevents the heap fragmentation we saw above.

#### Performance

As we learned above, the bump allocator is extremely fast and can be optimized to just a few assembly operations.
The linked list allocator performs much worse in this category. The problem is that an allocation request might need to traverse the complete linked list until it finds a suitable block.

Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience relatively fast allocation performance. For a program that fragments the heap with many allocations, however, the allocation performance will be very bad because the linked list will be very long and mostly contain very small blocks.

It's worth noting that this performance issue isn't a problem caused by our basic implementation but a fundamental limitation of the linked list approach. Since allocation performance can be very important for kernel-level code, we explore a third allocator design in the following that trades improved performance for reduced memory utilization.

## Fixed-Size Block Allocator

In the following, we present an allocator design that uses fixed-size memory blocks for fulfilling allocation requests. This way, the allocator often returns blocks that are larger than needed for allocations, which results in wasted memory due to [internal fragmentation]. On the other hand, it drastically reduces the time required to find a suitable block (compared to the linked list allocator), resulting in much better allocation performance.

[internal fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#Internal_fragmentation

### Introduction

The idea behind a _fixed-size block allocator_ is the following: Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size.
For example, with block sizes of 16, 64, and 512 bytes, an allocation of 4 bytes would return a 16-byte block, an allocation of 48 bytes a 64-byte block, and an allocation of 128 bytes a 512-byte block.

Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. However, instead of using a single list with different block sizes, we create a separate list for each size class. Each list then only stores blocks of a single size. For example, with block sizes of 16, 64, and 512, there would be three separate linked lists in memory:

![](fixed-size-block-example.svg).

Instead of a single `head` pointer, we have the three head pointers `head_16`, `head_64`, and `head_512` that each point to the first unused block of the corresponding size. All nodes in a single list have the same size. For example, the list started by the `head_16` pointer only contains 16-byte blocks. This means that we no longer need to store the size in each list node since it is already specified by the name of the head pointer.

Since each element in a list has the same size, each list element is equally suitable for an allocation request. This means that we can very efficiently perform an allocation using the following steps:

- Round up the requested allocation size to the next block size. For example, when an allocation of 12 bytes is requested, we would choose the block size of 16 in the above example.
- Retrieve the head pointer for the list, e.g., for block size 16, we need to use `head_16`.
- Remove the first block from the list and return it.

Most notably, we can always return the first element of the list and no longer need to traverse the full list. Thus, allocations are much faster than with the linked list allocator.

#### Block Sizes and Wasted Memory

Depending on the block sizes, we lose a lot of memory by rounding up.
For example, when a 512-byte block is returned for a 128-byte allocation, three-quarters of the allocated memory is unused. By defining reasonable block sizes, it is possible to limit the amount of wasted memory to some degree. For example, when using the powers of 2 (4, 8, 16, 32, 64, 128, …) as block sizes, we can limit the memory waste to half of the allocation size in the worst case and a quarter of the allocation size in the average case.

It is also common to optimize block sizes based on common allocation sizes in a program. For example, we could additionally add block size 24 to improve memory usage for programs that often perform allocations of 24 bytes. This way, the amount of wasted memory can often be reduced without losing the performance benefits.

#### Deallocation

Much like allocation, deallocation is also very performant. It involves the following steps:

- Round up the freed allocation size to the next block size. This is required since the compiler only passes the requested allocation size to `dealloc`, not the size of the block that was returned by `alloc`. By using the same size-adjustment function in both `alloc` and `dealloc`, we can make sure that we always free the correct amount of memory.
- Retrieve the head pointer for the list.
- Add the freed block to the front of the list by updating the head pointer.

Most notably, no traversal of the list is required for deallocation either. This means that the time required for a `dealloc` call stays the same regardless of the list length.

#### Fallback Allocator

Given that large allocations (>2 KB) are often rare, especially in operating system kernels, it might make sense to fall back to a different allocator for these allocations. For example, we could fall back to a linked list allocator for allocations greater than 2048 bytes in order to reduce memory waste.
Since only very few allocations of that size are expected, the linked list would stay small and the (de)allocations would still be reasonably fast.

#### Creating New Blocks {#creating-new-blocks}

Above, we assumed that there are always enough blocks of a specific size in the list to fulfill all allocation requests. However, at some point, the linked list for a given block size becomes empty. At this point, there are two ways to create new unused blocks of a specific size to fulfill an allocation request:

- Allocate a new block from the fallback allocator (if there is one).
- Split a larger block from a different list. This works best if block sizes are powers of two. For example, a 32-byte block can be split into two 16-byte blocks.

For our implementation, we will allocate new blocks from the fallback allocator since the implementation is much simpler.

### Implementation

Now that we know how a fixed-size block allocator works, we can start our implementation. We won't depend on the implementation of the linked list allocator created in the previous section, so you can follow this part even if you skipped the linked list allocator implementation.

#### List Node

We start our implementation by creating a `ListNode` type in a new `allocator::fixed_size_block` module:

```rust
// in src/allocator.rs

pub mod fixed_size_block;
```

```rust
// in src/allocator/fixed_size_block.rs

struct ListNode {
    next: Option<&'static mut ListNode>,
}
```

This type is similar to the `ListNode` type of our [linked list allocator implementation], with the difference that we don't have a `size` field. It isn't needed because every block in a list has the same size with the fixed-size block allocator design.
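
As a quick sanity check on the node type, the following sketch confirms that a `ListNode` is exactly one pointer wide. It is not part of the kernel code: it uses `std` so that it can run on a host machine, and the helper name `node_fits` is a hypothetical one introduced here for illustration. It relies on Rust's guarantee that `Option<&mut T>` has the same size as a plain pointer.

```rust
use std::mem;

// Same shape as the `ListNode` above: only the `next` pointer is stored.
#[allow(dead_code)]
struct ListNode {
    next: Option<&'static mut ListNode>,
}

/// Hypothetical helper: can a free block of `block_size` bytes hold a `ListNode`?
/// Assumes the block is aligned to its own size.
fn node_fits(block_size: usize) -> bool {
    mem::size_of::<ListNode>() <= block_size
        && mem::align_of::<ListNode>() <= block_size
}
```

On a 64-bit target the node occupies 8 bytes, so any block of at least 8 bytes can store it while the block is unused.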
[linked list allocator implementation]: #the-allocator-type

#### Block Sizes

Next, we define a constant `BLOCK_SIZES` slice with the block sizes used for our implementation:

```rust
// in src/allocator/fixed_size_block.rs

/// The block sizes to use.
///
/// The sizes must each be a power of 2 because they are also used as
/// the block alignment (alignments must always be powers of 2).
const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];
```

As block sizes, we use powers of 2, starting from 8 up to 2048. We don't define any block sizes smaller than 8 because each block must be capable of storing a 64-bit pointer to the next block when freed. For allocations greater than 2048 bytes, we will fall back to a linked list allocator.

To simplify the implementation, we define the size of a block as its required alignment in memory. So a 16-byte block is always aligned on a 16-byte boundary and a 512-byte block is aligned on a 512-byte boundary. Since alignments always need to be powers of 2, this rules out any other block sizes. If we need block sizes that are not powers of 2 in the future, we can still adjust our implementation for this (e.g., by defining a second `BLOCK_ALIGNMENTS` array).

#### The Allocator Type

Using the `ListNode` type and the `BLOCK_SIZES` slice, we can now define our allocator type:

```rust
// in src/allocator/fixed_size_block.rs

pub struct FixedSizeBlockAllocator {
    list_heads: [Option<&'static mut ListNode>; BLOCK_SIZES.len()],
    fallback_allocator: linked_list_allocator::Heap,
}
```

The `list_heads` field is an array of `head` pointers, one for each block size. This is implemented by using the `len()` of the `BLOCK_SIZES` slice as the array length. As a fallback allocator for allocations larger than the largest block size, we use the allocator provided by the `linked_list_allocator`.
We could also use the `LinkedListAllocator` we implemented ourselves instead, but it has the disadvantage that it does not [merge freed blocks].

[merge freed blocks]: #merging-freed-blocks

For constructing a `FixedSizeBlockAllocator`, we provide the same `new` and `init` functions that we implemented for the other allocator types too:

```rust
// in src/allocator/fixed_size_block.rs

impl FixedSizeBlockAllocator {
    /// Creates an empty FixedSizeBlockAllocator.
    pub const fn new() -> Self {
        const EMPTY: Option<&'static mut ListNode> = None;
        FixedSizeBlockAllocator {
            list_heads: [EMPTY; BLOCK_SIZES.len()],
            fallback_allocator: linked_list_allocator::Heap::empty(),
        }
    }

    /// Initialize the allocator with the given heap bounds.
    ///
    /// This function is unsafe because the caller must guarantee that the given
    /// heap bounds are valid and that the heap is unused. This method must be
    /// called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        unsafe {
            self.fallback_allocator.init(heap_start, heap_size);
        }
    }
}
```

The `new` function just initializes the `list_heads` array with empty nodes and creates an [`empty`] linked list allocator as `fallback_allocator`. The `EMPTY` constant is needed to tell the Rust compiler that we want to initialize the array with a constant value. Initializing the array directly as `[None; BLOCK_SIZES.len()]` does not work, because then the compiler requires `Option<&'static mut ListNode>` to implement the `Copy` trait, which it does not. This is a current limitation of the Rust compiler, which might go away in the future.

[`empty`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.empty

The unsafe `init` function only calls the [`init`] function of the `fallback_allocator` without doing any additional initialization of the `list_heads` array. Instead, we will initialize the lists lazily on `alloc` and `dealloc` calls.
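
To see the `EMPTY` trick in isolation, here is a minimal host-side sketch. It is an illustration under assumptions, not the kernel code: `Box` stands in for `&'static mut` so the example runs with `std`, and `LIST_COUNT` and `empty_heads` are hypothetical names. The point is the same as in `new`: a non-`Copy` value cannot be repeated directly in an array expression, but a `const` item can.

```rust
// Number of lists; chosen to match the nine block sizes used in this post.
const LIST_COUNT: usize = 9;

// Stand-in node type: `Box` replaces `&'static mut` so the sketch runs on a host.
#[allow(dead_code)]
struct ListNode {
    next: Option<Box<ListNode>>,
}

/// Builds an array of empty list heads, mirroring what `new` does.
fn empty_heads() -> [Option<Box<ListNode>>; LIST_COUNT] {
    // `[None; LIST_COUNT]` would be rejected here because
    // `Option<Box<ListNode>>` is not `Copy`. A `const` item may be
    // repeated in an array expression even when its type is not `Copy`.
    const EMPTY: Option<Box<ListNode>> = None;
    [EMPTY; LIST_COUNT]
}
```

Every element of the resulting array starts out as `None`, which corresponds to nine empty block lists.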
[`init`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init

For convenience, we also create a private `fallback_alloc` method that allocates using the `fallback_allocator`:

```rust
// in src/allocator/fixed_size_block.rs

use alloc::alloc::Layout;
use core::ptr;

impl FixedSizeBlockAllocator {
    /// Allocates using the fallback allocator.
    fn fallback_alloc(&mut self, layout: Layout) -> *mut u8 {
        match self.fallback_allocator.allocate_first_fit(layout) {
            Ok(ptr) => ptr.as_ptr(),
            Err(_) => ptr::null_mut(),
        }
    }
}
```

The [`Heap`] type of the `linked_list_allocator` crate does not implement [`GlobalAlloc`] (as it's [not possible without locking]). Instead, it provides an [`allocate_first_fit`] method that has a slightly different interface. Instead of returning a `*mut u8` and using a null pointer to signal an error, it returns a `Result<NonNull<u8>, ()>`. The [`NonNull`] type is an abstraction for a raw pointer that is guaranteed not to be a null pointer. By mapping the `Ok` case to the [`NonNull::as_ptr`] method and the `Err` case to a null pointer, we can easily translate this back to a `*mut u8` type.
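
The conversion can be tried on its own, without the allocator crate. The sketch below is a host-side illustration under assumptions: the free function `to_raw` is a hypothetical name, and it takes the `Result<NonNull<u8>, ()>` value directly instead of calling `allocate_first_fit`.

```rust
use std::ptr::{self, NonNull};

/// Hypothetical helper: turns the `Result` style used by the fallback
/// allocator into the raw-pointer convention expected by `GlobalAlloc`.
fn to_raw(result: Result<NonNull<u8>, ()>) -> *mut u8 {
    match result {
        // Success: unwrap the non-null pointer into a plain raw pointer.
        Ok(non_null) => non_null.as_ptr(),
        // Failure: `GlobalAlloc` signals errors with a null pointer.
        Err(_) => ptr::null_mut(),
    }
}
```

A successful result yields the original address unchanged, while an error becomes a null pointer that callers of `alloc` can check for.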
[`Heap`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html
[not possible without locking]: #globalalloc-and-mutability
[`allocate_first_fit`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.allocate_first_fit
[`NonNull`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html
[`NonNull::as_ptr`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html#method.as_ptr

#### Computing the List Index

Before we implement the `GlobalAlloc` trait, we define a `list_index` helper function that returns the index of the lowest possible block size for a given [`Layout`]:

```rust
// in src/allocator/fixed_size_block.rs

/// Choose an appropriate block size for the given layout.
///
/// Returns an index into the `BLOCK_SIZES` array.
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}
```

The block must have at least the size and alignment required by the given `Layout`. Since we defined that the block size is also its alignment, this means that the `required_block_size` is the [maximum] of the layout's [`size()`] and [`align()`] attributes. To find the next-larger block in the `BLOCK_SIZES` slice, we first use the [`iter()`] method to get an iterator and then the [`position()`] method to find the index of the first block that is at least as large as the `required_block_size`.
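
To make the rounding behavior concrete, the following host-side sketch repeats `BLOCK_SIZES` and `list_index` using `std::alloc::Layout` and adds a small helper. The helper `block_size_for` is a hypothetical name introduced only for this illustration; it maps a size and alignment to the block size that would be chosen.

```rust
use std::alloc::Layout;

const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];

/// Same logic as in the post: index of the smallest fitting block size.
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}

/// Hypothetical helper: the block size chosen for a request, or `None`
/// if the layout is invalid or the request must go to the fallback allocator.
fn block_size_for(size: usize, align: usize) -> Option<usize> {
    let layout = Layout::from_size_align(size, align).ok()?;
    list_index(&layout).map(|index| BLOCK_SIZES[index])
}
```

A 12-byte request with 4-byte alignment lands in the 16-byte list, a small request with 64-byte alignment is pushed up to the 64-byte list by its alignment, and anything above 2048 bytes yields `None`.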
[maximum]: https://doc.rust-lang.org/core/cmp/trait.Ord.html#method.max
[`size()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.size
[`align()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align
[`iter()`]: https://doc.rust-lang.org/std/primitive.slice.html#method.iter
[`position()`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.position

Note that we don't return the block size itself, but the index into the `BLOCK_SIZES` slice. The reason is that we want to use the returned index as an index into the `list_heads` array.

#### Implementing `GlobalAlloc`

The last step is to implement the `GlobalAlloc` trait:

```rust
// in src/allocator/fixed_size_block.rs

use super::Locked;
use alloc::alloc::GlobalAlloc;

unsafe impl GlobalAlloc for Locked<FixedSizeBlockAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        todo!();
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        todo!();
    }
}
```

Like for the other allocators, we don't implement the `GlobalAlloc` trait directly for our allocator type, but use the [`Locked` wrapper] to add synchronized interior mutability. Since the `alloc` and `dealloc` implementations are relatively large, we introduce them one by one in the following.
##### `alloc`

The implementation of the `alloc` method looks like this:

```rust
// in `impl` block in src/allocator/fixed_size_block.rs

unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
    let mut allocator = self.lock();
    match list_index(&layout) {
        Some(index) => {
            match allocator.list_heads[index].take() {
                Some(node) => {
                    allocator.list_heads[index] = node.next.take();
                    node as *mut ListNode as *mut u8
                }
                None => {
                    // no block exists in list => allocate new block
                    let block_size = BLOCK_SIZES[index];
                    // only works if all block sizes are a power of 2
                    let block_align = block_size;
                    let layout = Layout::from_size_align(block_size, block_align)
                        .unwrap();
                    allocator.fallback_alloc(layout)
                }
            }
        }
        None => allocator.fallback_alloc(layout),
    }
}
```

Let's go through it step by step:

First, we use the `Locked::lock` method to get a mutable reference to the wrapped allocator instance. Next, we call the `list_index` function we just defined to calculate the appropriate block size for the given layout and get the corresponding index into the `list_heads` array. If this index is `None`, no block size fits the allocation, so we use the `fallback_allocator` via the `fallback_alloc` function.

If the list index is `Some`, we try to remove the first node in the corresponding list started by `list_heads[index]` using the [`Option::take`] method. If the list is not empty, we enter the `Some(node)` branch of the `match` statement, where we point the head pointer of the list to the successor of the popped `node` (by using [`take`][`Option::take`] again). Finally, we return the popped `node` pointer as a `*mut u8`.

[`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take

If the list head is `None`, the list of blocks is empty. This means that we need to construct a new block as [described above](#creating-new-blocks).
For that, we first get the current block size from the `BLOCK_SIZES` slice and use it as both the size and the alignment for the new block. Then we create a new `Layout` from it and call the `fallback_alloc` method to perform the allocation. The reason for adjusting the size and alignment is that the block will be added to the block list on deallocation.

##### `dealloc`

The implementation of the `dealloc` method looks like this:

```rust
// in src/allocator/fixed_size_block.rs

use core::{mem, ptr::NonNull};

// inside the `unsafe impl GlobalAlloc` block

unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
    let mut allocator = self.lock();
    match list_index(&layout) {
        Some(index) => {
            let new_node = ListNode {
                next: allocator.list_heads[index].take(),
            };
            // verify that block has size and alignment required for storing node
            assert!(mem::size_of::<ListNode>() <= BLOCK_SIZES[index]);
            assert!(mem::align_of::<ListNode>() <= BLOCK_SIZES[index]);
            let new_node_ptr = ptr as *mut ListNode;
            unsafe {
                new_node_ptr.write(new_node);
                allocator.list_heads[index] = Some(&mut *new_node_ptr);
            }
        }
        None => {
            let ptr = NonNull::new(ptr).unwrap();
            unsafe {
                allocator.fallback_allocator.deallocate(ptr, layout);
            }
        }
    }
}
```

Like in `alloc`, we first use the `lock` method to get a mutable allocator reference and then the `list_index` function to get the block list corresponding to the given `Layout`. If the index is `None`, no fitting block size exists in `BLOCK_SIZES`, which indicates that the allocation was created by the fallback allocator. Therefore, we use its [`deallocate`][`Heap::deallocate`] method to free the memory again. The method expects a [`NonNull`] instead of a `*mut u8`, so we need to convert the pointer first. (The `unwrap` call only fails when the pointer is null, which should never happen when the compiler calls `dealloc`.)
[`Heap::deallocate`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.deallocate

If `list_index` returns a block index, we need to add the freed memory block to the list. For that, we first create a new `ListNode` that points to the current list head (by using [`Option::take`] again). Before we write the new node into the freed memory block, we first assert that the current block size specified by `index` has the required size and alignment for storing a `ListNode`. Then we perform the write by converting the given `*mut u8` pointer to a `*mut ListNode` pointer and then calling the unsafe [`write`][`pointer::write`] method on it. The last step is to set the head pointer of the list, which is currently `None` since we called `take` on it, to our newly written `ListNode`. For that, we convert the raw `new_node_ptr` to a mutable reference.

[`pointer::write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write

There are a few things worth noting:

- We don't differentiate between blocks allocated from a block list and blocks allocated from the fallback allocator. This means that new blocks created in `alloc` are added to the block list on `dealloc`, thereby increasing the number of blocks of that size.
- The `alloc` method is the only place where new blocks are created in our implementation. This means that we initially start with empty block lists and only fill these lists lazily when allocations of their block size are performed.
- The `dealloc` method wraps its unsafe operations in explicit `unsafe` blocks, even though it is itself an `unsafe fn`. Rust has historically treated the complete body of an unsafe function as one large `unsafe` block, so these blocks were not strictly required. Explicit `unsafe` blocks have the advantage that it's obvious which operations are unsafe and which are not, which is the motivation behind [RFC 2585](https://github.com/rust-lang/rfcs/pull/2585) and its `unsafe_op_in_unsafe_fn` lint.
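
The first two points can be observed in a small model. The sketch below is a toy simulation under loud assumptions: it only tracks block addresses in plain `Vec`s and never touches real memory, a simple bump counter stands in for the fallback allocator, and `ModelAllocator` with all its fields is hypothetical. It is not how the kernel allocator stores its lists, but it follows the same alloc/dealloc policy.

```rust
const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64];

/// Toy model: tracks block addresses only, no real memory is touched.
struct ModelAllocator {
    /// One LIFO free list of block addresses per size class.
    free_lists: Vec<Vec<usize>>,
    /// Next fresh address; stands in for the fallback allocator.
    next_fresh: usize,
    /// How often a new block had to be created.
    fallback_calls: usize,
}

impl ModelAllocator {
    fn new() -> Self {
        ModelAllocator {
            free_lists: vec![Vec::new(); BLOCK_SIZES.len()],
            next_fresh: 0x1000,
            fallback_calls: 0,
        }
    }

    /// Pops a block from the list, or lazily creates a new one if it is empty.
    fn alloc(&mut self, index: usize) -> usize {
        if let Some(addr) = self.free_lists[index].pop() {
            return addr;
        }
        let size = BLOCK_SIZES[index];
        // Block alignment equals block size (sizes are powers of 2).
        let addr = (self.next_fresh + size - 1) & !(size - 1);
        self.next_fresh = addr + size;
        self.fallback_calls += 1;
        addr
    }

    /// Pushes the block onto its list, wherever it originally came from.
    fn dealloc(&mut self, addr: usize, index: usize) {
        self.free_lists[index].push(addr);
    }
}
```

In this model, the first allocation of a size class always hits the fallback path, a freed block is handed out again by the next allocation of the same class, and the fallback is only consulted again once the list runs empty.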
### Using It

To use our new `FixedSizeBlockAllocator`, we need to update the `ALLOCATOR` static in the `allocator` module:

```rust
// in src/allocator.rs

use fixed_size_block::FixedSizeBlockAllocator;

#[global_allocator]
static ALLOCATOR: Locked<FixedSizeBlockAllocator> = Locked::new(
    FixedSizeBlockAllocator::new());
```

Since the `init` function behaves the same for all allocators we implemented, we don't need to modify the `init` call in `init_heap`.

When we now run our `heap_allocation` tests again, all tests should still pass:

```
> cargo test --test heap_allocation
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [ok]
```

Our new allocator seems to work!

### Discussion

While the fixed-size block approach has much better performance than the linked list approach, it wastes up to half of the memory when using powers of 2 as block sizes. Whether this tradeoff is worth it heavily depends on the application type. For an operating system kernel, where performance is critical, the fixed-size block approach seems to be the better choice.

On the implementation side, there are various things that we could improve in our current implementation:

- Instead of only allocating blocks lazily using the fallback allocator, it might be better to pre-fill the lists to improve the performance of initial allocations.
- To simplify the implementation, we only allowed block sizes that are powers of 2 so that we could also use them as the block alignment. By storing (or calculating) the alignment in a different way, we could also allow arbitrary other block sizes. This way, we could add more block sizes, e.g., for common allocation sizes, in order to minimize the wasted memory.
- We currently only create new blocks, but never free them again.
  This results in fragmentation and might eventually result in allocation failure for large allocations. It might make sense to enforce a maximum list length for each block size. When the maximum length is reached, subsequent deallocations are freed using the fallback allocator instead of being added to the list.
- Instead of falling back to a linked list allocator, we could have a special allocator for allocations greater than 4 KiB. The idea is to utilize [paging], which operates on 4 KiB pages, to map a continuous block of virtual memory to non-continuous physical frames. This way, fragmentation of unused memory is no longer a problem for large allocations.
- With such a page allocator, it might make sense to add block sizes up to 4 KiB and drop the linked list allocator completely. The main advantages of this would be reduced fragmentation and improved performance predictability, i.e., better worst-case performance.

[paging]: @/edition-2/posts/08-paging-introduction/index.md

It's important to note that the implementation improvements outlined above are only suggestions. Allocators used in operating system kernels are typically highly optimized for the specific workload of the kernel, which is only possible through extensive profiling.

### Variations

There are also many variations of the fixed-size block allocator design. Two popular examples are the _slab allocator_ and the _buddy allocator_, which are also used in popular kernels such as Linux. In the following, we give a short introduction to these two designs.

#### Slab Allocator

The idea behind a [slab allocator] is to use block sizes that directly correspond to selected types in the kernel. This way, allocations of those types fit a block size exactly and no memory is wasted.
Sometimes, it might even be possible to preinitialize type instances in unused blocks to further improve performance.

[slab allocator]: https://en.wikipedia.org/wiki/Slab_allocation

Slab allocation is often combined with other allocators. For example, it can be used together with a fixed-size block allocator to further split an allocated block in order to reduce memory waste. It is also often used to implement an [object pool pattern] on top of a single large allocation.

[object pool pattern]: https://en.wikipedia.org/wiki/Object_pool_pattern

#### Buddy Allocator

Instead of using a linked list to manage freed blocks, the [buddy allocator] design uses a [binary tree] data structure together with power-of-2 block sizes. When a new block of a certain size is required, it splits a larger block into two halves, thereby creating two child nodes in the tree. Whenever a block is freed again, its neighbor block in the tree is analyzed. If the neighbor is also free, the two blocks are joined back together to form a block of twice the size.

The advantage of this merge process is that [external fragmentation] is reduced so that small freed blocks can be reused for a large allocation. It also does not use a fallback allocator, so the performance is more predictable. The biggest drawback is that only power-of-2 block sizes are possible, which might result in a large amount of wasted memory due to [internal fragmentation]. For this reason, buddy allocators are often combined with a slab allocator to further split an allocated block into multiple smaller blocks.
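
One detail that makes the merge step cheap is how the neighbor block is located. The sketch below shows a common technique that is not described in this post and is stated here as an assumption: when every block is aligned to its own power-of-2 size relative to the heap start, the neighbor ("buddy") of a block is found by flipping a single bit of its offset. The function names `buddy_of` and `merged_start` are hypothetical.

```rust
/// Offset of the buddy of the block at `offset` (relative to the heap start).
/// Assumes `size` is a power of 2 and `offset` is a multiple of `size`.
fn buddy_of(offset: usize, size: usize) -> usize {
    assert!(size.is_power_of_two());
    assert!(offset % size == 0);
    // Flipping the `size` bit switches between the two halves of the parent.
    offset ^ size
}

/// Offset of the merged block of size `2 * size` that contains `offset`.
fn merged_start(offset: usize, size: usize) -> usize {
    assert!(size.is_power_of_two());
    // Clearing the `size` bit yields the start of the parent block.
    offset & !size
}
```

With 16-byte blocks, the blocks at offsets 32 and 48 are buddies of each other, and merging them gives a 32-byte block starting at offset 32.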
[buddy allocator]: https://en.wikipedia.org/wiki/Buddy_memory_allocation
[binary tree]: https://en.wikipedia.org/wiki/Binary_tree
[external fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#External_fragmentation
[internal fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#Internal_fragmentation

## Summary

This post gave an overview of different allocator designs. We learned how to implement a basic [bump allocator], which hands out memory linearly by increasing a single `next` pointer. While bump allocation is very fast, it can only reuse memory after all allocations have been freed. For this reason, it is rarely used as a global allocator.

[bump allocator]: @/edition-2/posts/11-allocator-designs/index.md#bump-allocator

Next, we created a [linked list allocator] that uses the freed memory blocks themselves to create a linked list, the so-called [free list]. This list makes it possible to store an arbitrary number of freed blocks of different sizes. While no memory waste occurs, the approach suffers from poor performance because an allocation request might require a complete traversal of the list. Our implementation also suffers from [external fragmentation] because it does not merge adjacent freed blocks back together.

[linked list allocator]: @/edition-2/posts/11-allocator-designs/index.md#linked-list-allocator
[free list]: https://en.wikipedia.org/wiki/Free_list

To fix the performance problems of the linked list approach, we created a [fixed-size block allocator] that predefines a fixed set of block sizes.
For each block size, a separate [free list] exists so that allocations and deallocations only need to insert or pop at the front of the list and are thus very fast. Since each allocation is rounded up to the next larger block size, some memory is wasted due to [internal fragmentation].

[fixed-size block allocator]: @/edition-2/posts/11-allocator-designs/index.md#fixed-size-block-allocator

There are many more allocator designs with different tradeoffs. [Slab allocation] works well to optimize the allocation of common fixed-size structures, but is not applicable in all situations. [Buddy allocation] uses a binary tree to merge freed blocks back together, but wastes a large amount of memory because it only supports power-of-2 block sizes. It's also important to remember that each kernel implementation has a unique workload, so there is no "best" allocator design that fits all cases.

[Slab allocation]: @/edition-2/posts/11-allocator-designs/index.md#slab-allocator
[Buddy allocation]: @/edition-2/posts/11-allocator-designs/index.md#buddy-allocator

## What's Next?

With this post, we conclude our memory management implementation for now. Next, we will start exploring [_multitasking_], starting with cooperative multitasking in the form of [_async/await_]. In subsequent posts, we will then explore [_threads_], [_multiprocessing_], and [_processes_].
[_multitasking_]: https://en.wikipedia.org/wiki/Computer_multitasking
[_threads_]: https://en.wikipedia.org/wiki/Thread_(computing)
[_processes_]: https://en.wikipedia.org/wiki/Process_(computing)
[_multiprocessing_]: https://en.wikipedia.org/wiki/Multiprocessing
[_async/await_]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html

================================================
FILE: blog/content/edition-2/posts/11-allocator-designs/index.zh-CN.md
================================================
+++
title = "Allocator Designs"
weight = 11
path = "zh-CN/allocator-designs"
date = 2020-01-20

[extra]
chapter = "Memory Management"
# Please update this when updating the translation
translation_based_on_commit = "4e512846617109334af6ae9b1ed03e223cf4b1d0"
# GitHub usernames of the people that translated this post
translators = ["ttttyy"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = []
+++

This post explains how to implement heap allocators from scratch. It presents and discusses three different allocator designs: bump allocation, linked list allocation, and fixed-size block allocation. For each of the three designs, we will create a basic implementation that can be used for our kernel.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-11`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-11

## Introduction

In the [previous post], we added basic support for heap allocations to our kernel. For that, we [created a new memory region][map-heap] in the page tables and [used the `linked_list_allocator` crate][use-alloc-crate] to manage that memory. While we have a working heap now, we left most of the work to the allocator crate without trying to understand how it works.

[previous post]: @/edition-2/posts/10-heap-allocation/index.md
[map-heap]: @/edition-2/posts/10-heap-allocation/index.md#creating-a-kernel-heap
[use-alloc-crate]: @/edition-2/posts/10-heap-allocation/index.md#using-an-allocator-crate

In this post, we will show how to create our own heap allocator from scratch instead of relying on an existing allocator crate. We will discuss different allocator designs, including a simplistic _bump allocator_ and a basic _fixed-size block allocator_, and use this knowledge to implement an allocator with improved performance (compared to the `linked_list_allocator` crate).

### Design Goals
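The responsibility of an allocator is to manage the available heap memory. It needs to return unused memory on `alloc` calls and keep track of memory freed by `dealloc` so that it can be reused again. Most importantly, it must never hand out memory that is already in use somewhere else because this would cause undefined behavior.

Apart from correctness, there are many secondary design goals. For example, the allocator should effectively utilize the available memory and keep [_fragmentation_] low. Furthermore, it should work well for concurrent applications and scale to any number of processors. For maximal performance, it could even optimize the memory layout with respect to the CPU caches to improve [cache locality] and avoid [false sharing].

[cache locality]: https://www.geeksforgeeks.org/locality-of-reference-and-cache-operation-in-cache-memory/
[_fragmentation_]: https://en.wikipedia.org/wiki/Fragmentation_(computing)
[false sharing]: https://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html

These requirements can make good allocators very complex. For example, [jemalloc] has over 30,000 lines of code. This complexity is often undesired in kernel code, where a single bug can lead to severe security vulnerabilities. Fortunately, the allocation patterns of kernel code are often much simpler compared to userspace code, so that relatively simple allocator designs often suffice.

[jemalloc]: http://jemalloc.net/

In the following, we present three possible kernel allocator designs and explain their advantages and drawbacks.

## Bump Allocator

The simplest allocator design is a _bump allocator_ (also known as _stack allocator_). It allocates memory linearly and only keeps track of the number of allocated bytes and the number of allocations. It is only useful in very specific use cases because it has a severe limitation: it can only free all memory at once.

### Idea

The idea behind a bump allocator is to linearly allocate memory by increasing (_"bumping"_) a `next` variable, which points to the start of the unused memory. At the beginning, `next` is equal to the start address of the heap. On each allocation, `next` is increased by the allocation size so that it always points to the boundary between used and unused memory:

![The heap memory area at three points in time:
 1: A single allocation exists at the start of the heap; the `next` pointer points to its end.
 2: A second allocation was added right after the first; the `next` pointer points to the end of the second allocation.
 3: A third allocation was added right after the second one; the `next` pointer points to the end of the third allocation.](bump-allocation.svg)

The `next` pointer only moves in a single direction and thus never hands out the same memory region twice. When it reaches the end of the heap, no more memory can be allocated, resulting in an out-of-memory error on the next allocation.

A bump allocator is often implemented with an allocation counter, which is increased by 1 on each `alloc` call and decreased by 1 on each `dealloc` call. When the allocation counter reaches zero, it means that all allocations on the heap have been deallocated. In this case, the `next` pointer can be reset to the start address of the heap, so that the complete heap memory is available for allocations again.

### Implementation

We start our implementation by declaring a new `allocator::bump` submodule:

```rust
// in src/allocator.rs

pub mod bump;
```

The content of the submodule lives in a new `src/allocator/bump.rs` file, which we create with the following content:

```rust
// in src/allocator/bump.rs

pub struct BumpAllocator {
    heap_start: usize,
    heap_end: usize,
    next: usize,
    allocations: usize,
}

impl BumpAllocator {
    /// Creates a new empty bump allocator.
    pub const fn new() -> Self {
        BumpAllocator {
            heap_start: 0,
            heap_end: 0,
            next: 0,
            allocations: 0,
        }
    }

    /// Initializes the bump allocator with the given heap bounds.
    ///
    /// This method is unsafe because the caller must ensure that the given
    /// memory range is unused. Also, this method must be called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        self.heap_start = heap_start;
        self.heap_end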
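            = heap_start + heap_size;
        self.next = heap_start;
    }
}
```

The `heap_start` and `heap_end` fields keep track of the lower and upper bounds of the heap memory region. The caller needs to ensure that these addresses are valid, otherwise the allocator would return invalid memory. For this reason, the `init` function needs to be `unsafe` to call.

The purpose of the `next` field is to always point to the first unused byte of the heap, i.e., the start address of the next allocation. It is set to `heap_start` in the `init` function because at the beginning, the entire heap is unused. On each allocation, this field will be increased by the allocation size (_"bumped"_) to ensure that we don't return the same memory region twice.

The `allocations` field is a simple counter for the active allocations with the goal of resetting the allocator after the last allocation has been freed. It is initialized with 0.

We chose to create a separate `init` function instead of performing the initialization directly in `new` in order to keep the interface identical to the allocator provided by the `linked_list_allocator` crate. This way, the allocators can be switched without additional code changes.

### Implementing `GlobalAlloc`

As [explained in the previous post][global-alloc], all heap allocators need to implement the [`GlobalAlloc`] trait, which is defined like this:

[global-alloc]: @/edition-2/posts/10-heap-allocation/index.md#the-allocator-interface
[`GlobalAlloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html

```rust
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... }
    unsafe fn realloc(
        &self,
        ptr: *mut u8,
        layout: Layout,
        new_size: usize
    ) -> *mut u8 { ...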
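    }
}
```

Only the `alloc` and `dealloc` methods are required; the other two methods have default implementations and can be omitted.

#### First Implementation Attempt

Let's try to implement the `alloc` method for our `BumpAllocator`:

```rust
// in src/allocator/bump.rs

use alloc::alloc::{GlobalAlloc, Layout};

unsafe impl GlobalAlloc for BumpAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // TODO alignment and bounds check
        let alloc_start = self.next;
        self.next = alloc_start + layout.size();
        self.allocations += 1;
        alloc_start as *mut u8
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        todo!();
    }
}
```

First, we use the `next` field as the start address for our allocation. Then we update the `next` field to point to the end address of the allocation, which is the next unused address on the heap. Before returning the start address of the allocation as a `*mut u8` pointer, we increase the `allocations` counter by 1.

Note that we don't perform any bounds checks or alignment adjustments, so this implementation is not safe yet. This does not matter much because it fails to compile anyway with the following error:

```
error[E0594]: cannot assign to `self.next` which is behind a `&` reference
  --> src/allocator/bump.rs:29:9
   |
29 |         self.next = alloc_start + layout.size();
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `self` is a `&` reference, so the data it refers to cannot be written
```

(The same error also occurs for the `self.allocations += 1` line. We omitted it here for brevity.)

The error occurs because the [`alloc`] and [`dealloc`] methods of the `GlobalAlloc` trait only operate on an immutable `&self` reference, so updating the `next` and `allocations` fields is not possible. This is problematic because updating `next` on every allocation is the essential principle of a bump allocator.

[`alloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.alloc
[`dealloc`]: https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#tymethod.dealloc

#### `GlobalAlloc` and Mutability

Before we look at a possible solution to this mutability problem, let's try to understand why the `GlobalAlloc` trait methods are defined with `&self` arguments: As we saw [in the previous post][global-allocator], the global heap allocator is defined by adding the `#[global_allocator]` attribute to a `static` that implements the `GlobalAlloc` trait. Static variables are immutable in Rust, so there is no way to call a method that takes `&mut self` on the static allocator. For this reason, all the methods of `GlobalAlloc` only take an immutable `&self` reference.

[global-allocator]: @/edition-2/posts/10-heap-allocation/index.md#the-global-allocator-attribute

Fortunately, there is a way to get a `&mut self` reference from a `&self` reference: We can use synchronized [interior mutability] by wrapping the allocator in a [`spin::Mutex`] spinlock. This type provides a `lock` method that performs [mutual exclusion] and thus safely turns a `&self` reference into a `&mut self` reference. We've already used the wrapper type multiple times in our kernel, for example for the [VGA text buffer][vga-mutex].

[interior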
mutability]: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html
[vga-mutex]: @/edition-2/posts/03-vga-text-buffer/index.md#spinlocks
[`spin::Mutex`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html
[mutual exclusion]: https://en.wikipedia.org/wiki/Mutual_exclusion

#### A `Locked` Wrapper Type

With the help of the `spin::Mutex` wrapper type, we can implement the `GlobalAlloc` trait for our bump allocator. The trick is to implement the trait not for the `BumpAllocator` directly, but for the wrapped `spin::Mutex<BumpAllocator>` type:

```rust
unsafe impl GlobalAlloc for spin::Mutex<BumpAllocator> {…}
```

Unfortunately, this still doesn't work because the Rust compiler does not permit trait implementations for types defined in other crates:

```
error[E0117]: only traits defined in the current crate can be implemented for arbitrary types
  --> src/allocator/bump.rs:28:1
   |
28 | unsafe impl GlobalAlloc for spin::Mutex<BumpAllocator> {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^--------------------------
   | |                           |
   | |                           `spin::mutex::Mutex` is not defined in the current crate
   | impl doesn't use only types from inside the current crate
   |
   = note: define and implement a trait or new type instead
```

To fix this, we need to create our own wrapper type around `spin::Mutex`:

```rust
// in src/allocator.rs

/// A wrapper around spin::Mutex to permit trait implementations.
pub struct Locked<A> {
    inner: spin::Mutex<A>,
}

impl<A> Locked<A> {
    pub const fn new(inner: A) -> Self {
        Locked {
            inner: spin::Mutex::new(inner),
        }
    }

    pub fn lock(&self) -> spin::MutexGuard<A> {
        self.inner.lock()
    }
}
```

The type is a generic wrapper around a `spin::Mutex<A>`. It imposes no restrictions on the wrapped type `A`, so it can be used to wrap all kinds of types, not just allocators. It provides a simple `new` constructor function that wraps a given value. For convenience, it also provides a `lock` function that calls `lock` on the wrapped `Mutex`. Since the `Locked` type is general enough to be useful for other allocator implementations too, we put it in the parent `allocator` module.

#### Implementation for `Locked<BumpAllocator>`

The `Locked` type is defined in our own crate (in contrast to `spin::Mutex`), so we can use it to implement `GlobalAlloc` for our bump allocator. The full implementation looks like this:

```rust
// in src/allocator/bump.rs

use super::{align_up, Locked};
use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr;

unsafe impl GlobalAlloc for Locked<BumpAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let mut bump = self.lock(); // get a mutable reference

        let alloc_start = align_up(bump.next, layout.align());
        let alloc_end = match alloc_start.checked_add(layout.size()) {
            Some(end) => end,
            None => return
ptr::null_mut(), }; if alloc_end > bump.heap_end { ptr::null_mut() // 内存不足 } else { bump.next = alloc_end; bump.allocations += 1; alloc_start as *mut u8 } } unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) { let mut bump = self.lock(); // 获取可变引用 bump.allocations -= 1; if bump.allocations == 0 { bump.next = bump.heap_start; } } } ``` `alloc` 和 `dealloc` 的第一步都是调用 [`Mutex::lock`] 方法来通过 `inner` 字段获取封装类型的可变引用。封装实例在方法结束前保持锁定,因此不会在多线程上下文中发生数据竞争(我们很快会添加线程支持)。 [`Mutex::lock`]: https://docs.rs/spin/0.5.0/spin/struct.Mutex.html#method.lock 与之前的原型相比,现在的 `alloc` 实现遵循了对齐要求并执行了边界检查,确保分配的内存区域在堆内存区域内。第一步是将 `next` 地址向上对齐到 `Layout` 参数指定的对齐值。稍后展示 `align_up` 函数的实现。接着,我们将所请求的分配大小加到 `alloc_start` 地址上,得到该次分配的结束地址。为了防止在大内存分配时发生整数溢出,我们使用了 [`checked_add`] 方法。如果发生溢出或分配结束地址大于堆结束地址,我们就返回一个空指针以表示内存不足情况。否则,我们更新 `next` 地址并像之前一样增加 `allocations` 计数器。最后,我们返回转换为 `*mut u8` 指针 `alloc_start` 地址。 [`checked_add`]: https://doc.rust-lang.org/std/primitive.usize.html#method.checked_add [`Layout`]: https://doc.rust-lang.org/alloc/alloc/struct.Layout.html `dealloc` 函数忽略了传入的指针和 `Layout` 参数。它仅仅是将 `allocations` 计数器减一。如果计数器再次变为 `0` ,则意味着所有分配都已再次释放。在这种情况下,它将 `next` 地址重置为 `heap_start` 地址,使整个堆内存重新可用。 #### 地址对齐 `align_up` 函数足够通用,因此我们可以将它放到父 `allocator` 模块中。其基本实现如下: ```rust // in src/allocator.rs /// 向上对齐给定地址 `addr` 到对齐值 `align`。 fn align_up(addr: usize, align: usize) -> usize { let remainder = addr % align; if remainder == 0 { addr // 地址已经对齐 } else { addr - remainder + align } } ``` 这个函数首先计算 `addr` 除以 `align` 的[余数][remainder]。如果余数为 `0` ,则地址已经与给定的对齐值对齐。否则,我们通过减去余数(以便余数为 `0`)并加上对齐值(以便地址不小于原始地址)来对齐地址。 [remainder]: https://en.wikipedia.org/wiki/Euclidean_division 注意这不是实现此函数最高效的方法,一个更快的实现如下所示: ```rust /// 向上对齐给定地址 `addr` 到对齐值 `align` 。 /// /// 要求对齐值是2的幂 fn align_up(addr: usize, align: usize) -> usize { (addr + align - 1) & !(align - 1) } ``` 此方法要求 `align` 必须是2的幂,通过 `GlobalAlloc` 特征(及其 [`Layout`] 参数)可以保证这一点。这使得我们可以创建[位掩码][bitmask]来高效地对齐地址。为了理解其工作原理,我们从表达式的右侧逐步解析: [`Layout`]: 
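https://doc.rust-lang.org/alloc/alloc/struct.Layout.html
[bitmask]: https://en.wikipedia.org/wiki/Mask_(computing)

- Since `align` is a power of two, its [binary representation] has only a single bit set (e.g. `0b00100000`). This means that `align - 1` has all the lower bits set (e.g. `0b00011111`).
- By creating the [bitwise `NOT`] through the `!` operator, we get a number that has all the bits set except for the bits lower than `align`.
- By performing a [bitwise `AND`] on an address and `!(align - 1)`, we align the address _downwards_. This works by clearing all the bits that are lower than `align`.
- Since we want to align upwards instead of downwards, we increase `addr` by `align - 1` before performing the bitwise `AND`. This way, already aligned addresses remain the same while non-aligned addresses are rounded up to the next alignment boundary.

[binary representation]: https://en.wikipedia.org/wiki/Binary_number#Representation
[bitwise `NOT`]: https://en.wikipedia.org/wiki/Bitwise_operation#NOT
[bitwise `AND`]: https://en.wikipedia.org/wiki/Bitwise_operation#AND

Which variant you choose is up to you. Both compute the same result, only using different methods.

### Using It

To use our bump allocator, we need to update the `ALLOCATOR` static in `allocator.rs`:

```rust
// in src/allocator.rs

use bump::BumpAllocator;

#[global_allocator]
static ALLOCATOR: Locked<BumpAllocator> = Locked::new(BumpAllocator::new());
```

Here it becomes important that we declared `BumpAllocator::new` and `Locked::new` as [`const` functions]. If they were normal functions, a compilation error would occur because the initialization expression of a `static` is evaluated at compile time.

[`const` functions]: https://doc.rust-lang.org/reference/items/functions.html#const-functions

We don't need to change the `ALLOCATOR.lock().init(HEAP_START, HEAP_SIZE)` call in our `init_heap` function because the bump allocator provides the same interface as the allocator provided by the `linked_list_allocator` crate.

Now our kernel uses our bump allocator! Everything still works, including the [`heap_allocation` tests] that we created in the previous post:

[`heap_allocation` tests]: @/edition-2/posts/10-heap-allocation/index.md#adding-a-test

```
> cargo test --test heap_allocation
[…]
Running 3 tests
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
```

### Discussion

The big advantage of bump allocation is that it's very fast. Compared to other allocator designs (see below) that need to actively look for a fitting memory block and perform various bookkeeping tasks on `alloc` and `dealloc`, a bump allocator [can be optimized][bump downwards] to just a few assembly instructions. This makes bump allocators useful for optimizing allocation performance, for example when creating a [virtual DOM library].

[bump downwards]: https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html
[virtual DOM library]: https://hacks.mozilla.org/2019/03/fast-bump-allocated-virtual-doms-with-rust-and-wasm/

While a bump allocator is seldom used as the global allocator, the principle of bump allocation is often applied in the form of [arena allocation], which basically batches individual allocations together to improve performance. An example of an arena allocator for Rust is contained in the [`toolshed`] crate.

[arena allocation]: https://mgravell.github.io/Pipelines.Sockets.Unofficial/docs/arenas.html
[`toolshed`]: https://docs.rs/toolshed/0.8.1/toolshed/index.html

#### The Drawback of a Bump Allocator

The main limitation of a bump allocator is that it can only reuse deallocated memory after all allocations have been freed. This means that a single long-lived allocation suffices to prevent memory reuse. We can see this when we add a variation of the `many_boxes` test:

```rust
// in tests/heap_allocation.rs

#[test_case]
fn many_boxes_long_lived() {
    let long_lived = Box::new(1); // new
    for i in 0..HEAP_SIZE {
        let x = Box::new(i);
        assert_eq!(*x, i);
    }
    assert_eq!(*long_lived, 1); // new
}
```

Like the `many_boxes` test, this test creates a large number of allocations to provoke an out-of-memory failure if the allocator does not reuse freed memory. Additionally, the test creates a `long_lived` allocation, which lives for the whole loop execution.

When we run our new test, we see that it indeed fails:

```
> cargo test --test heap_allocation
Running 4 tests
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [failed]

Error: panicked at 'allocation error: Layout { size_: 8, align_: 8 }', src/lib.rs:86:5
```

Let's try to understand why this failure occurs in detail: First, the `long_lived` allocation is created at the start of the heap, thereby increasing the `allocations` counter by 1. For each iteration of the loop, a short-lived allocation is created and directly freed again before the next iteration starts. This means that the `allocations` counter is temporarily increased to 2 at the beginning of an iteration and decreased to 1 at the end of it. The problem now is that the bump allocator can only reuse memory after _all_ allocations have been freed, i.e., when the `allocations` counter falls to 0. Since this doesn't happen before the end of the loop, each loop iteration allocates a new region of memory, leading to an out-of-memory error after a number of iterations.

#### Fixing the Test?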
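Before looking at possible fixes, it can help to replay the failure on a simplified model of the allocator's bookkeeping. The sketch below is not the kernel code: `BumpModel` and `run_loop` are hypothetical names, no real memory is touched, alignment is ignored, and the 64-byte heap and 8-byte requests are arbitrary values chosen only to keep the numbers small.

```rust
/// Simplified model of the bump allocator's bookkeeping (hypothetical type,
/// tracks addresses only and ignores alignment).
struct BumpModel {
    heap_start: usize,
    heap_end: usize,
    next: usize,
    allocations: usize,
}

impl BumpModel {
    fn new(heap_start: usize, heap_size: usize) -> Self {
        BumpModel {
            heap_start,
            heap_end: heap_start + heap_size,
            next: heap_start,
            allocations: 0,
        }
    }

    /// Returns the start address of the allocation, or `None` when out of memory.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        let start = self.next;
        let end = start.checked_add(size)?;
        if end > self.heap_end {
            return None;
        }
        self.next = end;
        self.allocations += 1;
        Some(start)
    }

    /// Mirrors `dealloc`: memory becomes reusable only when the counter hits 0.
    fn dealloc(&mut self) {
        self.allocations -= 1;
        if self.allocations == 0 {
            self.next = self.heap_start;
        }
    }
}

/// Performs `iterations` short-lived 8-byte allocations and returns the
/// iteration at which the model runs out of memory, if any.
fn run_loop(model: &mut BumpModel, iterations: usize) -> Option<usize> {
    for i in 0..iterations {
        if model.alloc(8).is_none() {
            return Some(i);
        }
        model.dealloc();
    }
    None
}
```

Without a long-lived allocation, the counter drops to 0 after every iteration and the loop never runs out of memory. As soon as one allocation is kept alive, the counter only oscillates between 1 and 2, `next` keeps growing, and the model fails after a handful of iterations.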
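There are two potential tricks that we could utilize to fix the test for our bump allocator:

- We could update `dealloc` to check whether the freed allocation was the last allocation returned by `alloc` by comparing its end address with the `next` pointer. In case they're equal, we can safely reset `next` back to the start address of the freed allocation. This way, each loop iteration reuses the same memory block.
- We could add an `alloc_back` method that allocates memory from the _end_ of the heap using an additional `next_back` field. Then we could manually use this allocation method for all long-lived allocations, thereby separating short-lived and long-lived allocations on the heap. Note that this separation only works if it's clear beforehand how long each allocation will live. Another drawback of this approach is that manually performing allocations is cumbersome and potentially unsafe.

While both of these approaches work to fix the test, they are not a general solution since they are only able to reuse memory in very specific cases. The question is: Is there a general solution that reuses _all_ freed memory?

#### Reusing All Freed Memory?

As we learned [in the previous post][heap-intro], allocations can live arbitrarily long and can be freed in an arbitrary order. This means that we need to keep track of a potentially unbounded number of non-continuous, unused memory regions, as illustrated by the following example:

[heap-intro]: @/edition-2/posts/10-heap-allocation/index.md#dynamic-memory

![](allocation-fragmentation.svg)

The graphic shows the heap over the course of time. At the beginning, the complete heap is unused, and the `next` address is equal to `heap_start` (line 1). Then the first allocation occurs (line 2). In line 3, a second memory block is allocated and the first allocation is freed. Many more allocations are added in line 4. Half of them are very short-lived and already get freed in line 5, where another new allocation is also added.

Line 5 shows the fundamental problem: We have five unused memory regions with different sizes, but the `next` pointer can only point to the beginning of the last region. While we could store the start addresses and sizes of the other unused memory regions in an array of size 4 for this example, this isn't a general solution since we could easily create an example with 8, 16, or 1000 unused memory regions.

Normally, when we have a potentially unbounded number of items, we can just use a heap-allocated collection. This isn't possible in our case, since the heap allocator can't depend on itself (it would cause endless recursion or deadlocks). So we need to find a different solution.

## Linked List Allocator

A common trick to keep track of an arbitrary number of free memory areas when implementing allocators is to use these areas themselves as backing storage. This utilizes the fact that the regions are still mapped to a virtual address and backed by a physical frame, but the stored information is not needed anymore. By storing the information about the freed region in the region itself, we can keep track of an unbounded number of freed regions without needing additional memory.

The most common implementation approach is to construct a singly linked list in the freed memory, with each node being a freed memory region:

![](linked-list-allocation.svg)

Each list node contains two fields: the size of the memory region and a pointer to the next unused memory region. With this approach, we only need a pointer to the first unused region (called `head`) to keep track of all unused regions, regardless of their number. The resulting data structure is often called a [_free list_].

[_free list_]: https://en.wikipedia.org/wiki/Free_list

As you might guess from the name, this is the technique that the `linked_list_allocator` crate uses. Allocators that use this technique are also often called _pool allocators_.

### Implementation

In the following, we will create our own simple `LinkedListAllocator` type that uses the above approach for keeping track of freed memory regions. This part of the post isn't required for future posts, so you can skip the implementation details if you like.

#### The Allocator Type {#allocator-type}

We start by creating a private `ListNode` struct in a new `allocator::linked_list` submodule:

```rust
// in src/allocator.rs

pub mod linked_list;
```

```rust
// in src/allocator/linked_list.rs

struct ListNode {
    size: usize,
    next: Option<&'static mut ListNode>,
}
```

Like in the graphic, a list node has a `size` field and an optional pointer to the next node, represented by the `Option<&'static mut ListNode>` type. The `&'static mut` type semantically describes an [owned] object behind a pointer. Basically, it's a [`Box`] without a destructor that frees the object at the end of the scope.

[owned]: https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html
[`Box`]: https://doc.rust-lang.org/alloc/boxed/index.html

We implement the following set of methods for `ListNode`:

```rust
// in src/allocator/linked_list.rs

impl ListNode {
    const fn new(size: usize) -> Self {
        ListNode { size, next: None }
    }

    fn start_addr(&self) -> usize {
        self as *const Self as usize
    }

    fn end_addr(&self) -> usize {
        self.start_addr() + self.size
    }
}
```

The type has a simple constructor function named `new` and methods to calculate the start and end addresses of the represented region. We make the `new` function a [const function], which will be required later when constructing a static linked list allocator.

[const function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions

With the `ListNode` struct as a building block, we can now create the `LinkedListAllocator` struct:

```rust
// in src/allocator/linked_list.rs

pub struct LinkedListAllocator {
    head: ListNode,
}

impl LinkedListAllocator {
    /// Creates an empty LinkedListAllocator.
    pub const fn new() -> Self {
        Self {
            head: ListNode::new(0),
        }
    }

    /// Initialize the allocator with the given heap bounds.
    ///
    /// This function is unsafe because the caller must guarantee that the given
    /// heap bounds are valid and that the heap is unused. This method must be
    /// called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        unsafe {
            self.add_free_region(heap_start, heap_size);
        }
    }

    /// Adds the given memory region to the front of the list.
    unsafe fn add_free_region(&mut self, addr: usize, size: usize) {
        todo!();
    }
}
```

The struct contains a `head` node that points to the first heap region. We are only interested in the value of the `next` pointer, so we set the `size` to 0 in the `ListNode::new` function. Making `head` a `ListNode` instead of just a `&'static mut ListNode` has the advantage that the implementation of the `alloc` method will be simpler.

Like for the bump allocator, the `new` function doesn't initialize the allocator with the heap bounds. In addition to maintaining API compatibility, the reason is that the initialization routine requires writing a node to the heap memory, which can only happen at runtime. The `new` function, however, needs to be a [`const` function] that can be evaluated at compile time because it will be used for initializing the `ALLOCATOR` static. For this reason, we again provide a separate, non-constant `init` method.

[`const` function]: https://doc.rust-lang.org/reference/items/functions.html#const-functions

The `init` method uses an `add_free_region` method, whose implementation will be shown in a moment. For now, we use the [`todo!`] macro to provide a placeholder implementation that always panics.

[`todo!`]: https://doc.rust-lang.org/core/macro.todo.html

#### The `add_free_region` Method

The `add_free_region` method provides the fundamental _push_ operation on the linked list. We currently only call this method from `init`, but it will also be the central method in our `dealloc` implementation. Remember, the `dealloc` method is called when an allocated memory region is freed again. To keep track of this freed memory region, we want to push it to the linked list.

The implementation of the `add_free_region` method looks like this:

```rust
// in src/allocator/linked_list.rs

use super::align_up;
use core::mem;

impl LinkedListAllocator {
    /// Adds the given memory region to the front of the list.
    unsafe fn add_free_region(&mut self, addr: usize, size: usize) {
        // ensure that the freed region is capable of holding ListNode
        assert_eq!(align_up(addr, mem::align_of::<ListNode>()), addr);
        assert!(size >= mem::size_of::<ListNode>());

        // create a new list node and append it at the start of the list
        let mut node = ListNode::new(size);
        node.next = self.head.next.take();
        let node_ptr = addr as *mut ListNode;
        unsafe {
            node_ptr.write(node);
            self.head.next = Some(&mut *node_ptr)
        }
    }
}
```

The method takes the address and size of a memory region as arguments and adds it to the front of the list. First, it ensures that the given region has the necessary size and alignment for storing a `ListNode`. Then it creates the node and inserts it into the list through the following steps:

![](linked-list-allocator-push.svg)

Step 0 shows the state of the heap before `add_free_region` is called. In step 1, the method is called with the memory region marked as `freed` in the graphic. After the initial checks, the method creates a new `node` on its stack with the size of the freed region. It then uses the [`Option::take`] method to set the `next` pointer of the node to the current `head` pointer, thereby resetting the `head` pointer to `None`.

[`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take

In step 2, the method writes the newly created `node` to the beginning of the freed memory region through the [`write`] method. It then points the `head` pointer to the new node. The resulting pointer structure looks a bit chaotic because the freed region is always inserted at the beginning of the list, but if we follow the pointers, we see that each free region is still reachable from the `head` pointer.

[`write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write

#### The `find_region` Method

The second fundamental operation on a linked list is finding an entry and removing it from the list. This is the central operation needed for implementing the `alloc` method. We implement the operation as a `find_region` method in the following way:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Looks for a free region with the given size and alignment and removes
    /// it from the list.
    ///
    /// Returns a tuple of the list node and the start address of the allocation.
    fn find_region(&mut self, size: usize, align: usize)
        -> Option<(&'static mut ListNode, usize)>
    {
        // reference to current list node, updated for each iteration
        let mut current = &mut self.head;
        // look for a large enough memory region in linked list
        while let Some(ref mut region) = current.next {
            if let Ok(alloc_start) = Self::alloc_from_region(&region, size, align) {
                // region suitable for allocation -> remove node from list
                let next = region.next.take();
                let ret = Some((current.next.take().unwrap(), alloc_start));
                current.next = next;
                return ret;
            } else {
                // region not suitable -> continue with next region
                current = current.next.as_mut().unwrap();
            }
        }

        // no suitable region found
        None
    }
}
```

The method uses a `current` variable and a [`while let` loop] to iterate over the list elements. At the beginning, `current` is set to the (dummy) `head` node. On each iteration, it is then updated to the `next` field of the current node (in the `else` block). If the region is suitable for an allocation with the given size and alignment, the region is removed from the list and returned together with the `alloc_start` address.

[`while let` loop]: https://doc.rust-lang.org/reference/expressions/loop-expr.html#while-let-patterns

When the `current.next` pointer becomes `None`, the loop exits. This means we iterated over the whole list but found no region suitable for an allocation. In that case, we return `None`. Whether a region is suitable is checked by the `alloc_from_region` function, whose implementation will be shown in a moment.

Let's take a more detailed look at how a suitable region is removed from the list:

![](linked-list-allocator-remove-region.svg)

Step 0 shows the situation before any pointer adjustments. The `region` and `current` regions and the `region.next` and `current.next` pointers are marked in the graphic. In step 1, both the `region.next` and `current.next` pointers are reset to `None` by using the [`Option::take`] method. The original pointers are stored in local variables called `next` and `ret`.

In step 2, the `current.next` pointer is set to the local `next` pointer, which is the original `region.next` pointer. The effect is that `current` now directly points to the region after `region`, so that `region` is no longer an element of the linked list. The function then returns the pointer to `region` stored in the local `ret` variable.

##### The `alloc_from_region` Function

The `alloc_from_region` function returns whether a region is suitable for an allocation with a given size and alignment. It is defined like this:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Try to use the given region for an allocation with given size and
    /// alignment.
    ///
    /// Returns the allocation start address on success.
    fn alloc_from_region(region: &ListNode, size: usize, align: usize)
        -> Result<usize, ()>
    {
        let alloc_start = align_up(region.start_addr(), align);
        let alloc_end = alloc_start.checked_add(size).ok_or(())?;

        if alloc_end > region.end_addr() {
            // region too small
            return Err(());
        }

        let excess_size = region.end_addr() - alloc_end;
        if excess_size > 0 && excess_size < mem::size_of::<ListNode>() {
            // rest of region too small to hold a ListNode (required because the
            // allocation splits the region in a used and a free part)
            return Err(());
        }

        // region suitable for allocation
        Ok(alloc_start)
    }
}
```

First, the function calculates the start and end address of a potential allocation, using the `align_up` function we defined earlier and the [`checked_add`] method. If an overflow occurs or if the end address is behind the end address of the region, the allocation doesn't fit in the region and we return an error.

The function performs a less obvious check after that. This check is necessary because most of the time an allocation does not fit a suitable region perfectly, so that a part of the region remains usable after the allocation. This part of the region must store its own `ListNode` after the allocation, so it must be large enough to do so. The check verifies exactly that: either the allocation fits perfectly (`excess_size == 0`) or the excess size is large enough to store a `ListNode`.

#### Implementing `GlobalAlloc`

With the fundamental operations provided by the `add_free_region` and `find_region` methods, we can now finally implement the `GlobalAlloc` trait. As with the bump allocator, we don't implement the trait directly for the `LinkedListAllocator` but only for a wrapped `Locked<LinkedListAllocator>`. The [`Locked` wrapper] adds interior mutability through a spinlock, which allows us to modify the allocator instance even though the `alloc` and `dealloc` methods only take `&self` references.

[`Locked` wrapper]: @/edition-2/posts/11-allocator-designs/index.md#a-locked-wrapper-type

The implementation looks like this:

```rust
// in src/allocator/linked_list.rs

use super::Locked;
use alloc::alloc::{GlobalAlloc, Layout};
use core::ptr;

unsafe impl GlobalAlloc for Locked<LinkedListAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // perform layout adjustments
        let (size, align) = LinkedListAllocator::size_align(layout);
        let mut allocator = self.lock();

        if let Some((region, alloc_start)) = allocator.find_region(size, align) {
            let alloc_end = alloc_start.checked_add(size).expect("overflow");
            let excess_size = region.end_addr() - alloc_end;
            if excess_size > 0 {
                unsafe {
                    allocator.add_free_region(alloc_end, excess_size);
                }
            }
            alloc_start as *mut u8
        } else {
            ptr::null_mut()
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // perform layout adjustments
        let (size, _) = LinkedListAllocator::size_align(layout);

        unsafe { self.lock().add_free_region(ptr as usize, size) }
    }
}
```

Let's start with the `dealloc` method since it is simpler: First, it performs some layout adjustments, which we will explain in a moment. Then, it retrieves a `&mut LinkedListAllocator` reference by calling the [`Mutex::lock`] function on the [`Locked` wrapper]. Lastly, it calls the `add_free_region` function to add the deallocated region to the free list.

The `alloc` method is a bit more complex. It starts with the same layout adjustments and also calls the [`Mutex::lock`] function to receive a mutable allocator reference. Then it uses the `find_region` method to find a suitable memory region for the allocation and remove it from the list. If this doesn't succeed and `None` is returned, it returns `null_mut` to signal an error as there is no suitable memory region.

In the success case, the `find_region` method returns a tuple of the suitable region (no longer in the list) and the start address of the allocation. Using `alloc_start`, the allocation size, and the end address of the region, it calculates the end address of the allocation and the excess size again. If the excess size is not zero, it calls `add_free_region` to add the excess part of the memory region back to the free list. Finally, it returns the `alloc_start` address cast to a `*mut u8` pointer.

#### Layout Adjustments

So what are these layout adjustments that we make at the beginning of both `alloc` and `dealloc`? They ensure that each allocated block is capable of storing a `ListNode`. This is important because the memory block is going to be deallocated at some point, where we want to write a `ListNode` to it. If the block is smaller than a `ListNode` or does not have the correct alignment, undefined behavior can occur.

The layout adjustments are performed by the `size_align` function, which is defined like this:

```rust
// in src/allocator/linked_list.rs

impl LinkedListAllocator {
    /// Adjust the given layout so that the resulting allocated memory
    /// region is also capable of storing a `ListNode`.
    ///
    /// Returns the adjusted size and alignment as a (size, align) tuple.
    fn size_align(layout: Layout) -> (usize, usize) {
        let layout = layout
            .align_to(mem::align_of::<ListNode>())
            .expect("adjusting alignment failed")
            .pad_to_align();
        let size = layout.size().max(mem::size_of::<ListNode>());
        (size, layout.align())
    }
}
```

First, the function uses the [`align_to`] method on the passed [`Layout`] to increase the alignment to the alignment of a `ListNode` if necessary. It then uses the [`pad_to_align`] method to round up the size to a multiple of the alignment to ensure that the start address of the next memory block will have the correct alignment for storing a `ListNode` too. Finally, it uses the [`max`] method to enforce a minimum allocation size of `mem::size_of::<ListNode>`. This way, the `dealloc` function can safely write a `ListNode` to the freed memory block.

[`align_to`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align_to
[`pad_to_align`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.pad_to_align
[`max`]: https://doc.rust-lang.org/std/cmp/trait.Ord.html#method.max

### Using It

We can now update the `ALLOCATOR` static in the `allocator` module to use our new `LinkedListAllocator`:

```rust
// in src/allocator.rs

use linked_list::LinkedListAllocator;

#[global_allocator]
static ALLOCATOR: Locked<LinkedListAllocator> =
    Locked::new(LinkedListAllocator::new());
```

Since the `init` function behaves the same for the bump and linked list allocators, we don't need to modify the `init` call in `init_heap`.

When we now run our `heap_allocation` tests again, we see that all tests pass, including the `many_boxes_long_lived` test that failed with the bump allocator:

```
> cargo test --test heap_allocation
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [ok]
```

This shows that our linked list allocator is able to reuse freed memory for subsequent allocations.

### Discussion

In contrast to the bump allocator, the linked list allocator is much more suitable as a general-purpose allocator, mainly because it is able to directly reuse freed memory. However, it also has some drawbacks. Some of them are only caused by our basic implementation, but there are also fundamental drawbacks of the allocator design itself.

#### Merging Freed Blocks {#merge-free-blocks}

The main problem with our implementation is that it only splits the heap into smaller blocks but never merges them back together. Consider this example:

![](linked-list-allocator-fragmentation-on-dealloc.svg)

In the first line, three allocations are created on the heap. Two of them are freed again in line 2 and the third is freed in line 3. Now the complete heap is unused again, but it is still split into four individual blocks. At this point, a large allocation might not be possible anymore because none of the four blocks is large enough. Over time, the process continues, and the heap is split into smaller and smaller blocks. At some point, the heap is so fragmented that even normal-sized allocations will fail.

To fix this problem, we need to merge adjacent freed blocks back together. For the above example, this would mean the following:

![](linked-list-allocator-merge-on-dealloc.svg)

Like before, two of the three allocations are freed in line `2`. Instead of keeping the fragmented heap, we now perform an additional step in line `2a` to merge the two rightmost blocks back together. In line `3`, the third allocation is freed (like before), resulting in a completely unused heap represented by three distinct blocks. In an additional merging step in line `3a`, we then merge the three adjacent blocks back together.

The `linked_list_allocator` crate implements this merging strategy in the following way: Instead of inserting freed memory blocks at the beginning of the linked list on `deallocate`, it always keeps the list sorted by start address. This way, merging can be performed directly on the `deallocate` call by examining the addresses and sizes of the neighboring blocks in the list. Of course, the deallocation operation is slower this way, but it prevents the heap fragmentation we saw above.

#### Performance

As we learned above, the bump allocator is extremely fast and can be optimized to just a few assembly instructions. The linked list allocator performs much worse in this category. The problem is that an allocation request might need to traverse the complete linked list until it finds a suitable block.

Since the list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience relatively fast allocation performance. For a program that fragments the heap with many allocations, however, the allocation performance will be very bad because the linked list will be very long and mostly contain very small blocks.

It's worth noting that this performance issue isn't caused by our basic implementation but is a fundamental problem of the linked list approach. Since allocation performance can be very important for kernel-level code, we explore a third allocator design in the following that trades reduced memory utilization for improved performance.

## Fixed-Size Block Allocator

In the following, we present an allocator design that uses fixed-size memory blocks for fulfilling allocation requests. This way, the allocator often returns blocks that are larger than needed for allocations, which results in wasted memory due to [internal fragmentation]. On the other hand, it drastically reduces the time required to find a suitable block (compared to the linked list allocator), resulting in much better allocation performance.

### Introduction

The idea behind a _fixed-size block allocator_ is the following: Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size. For example, with block sizes of 16, 64, and 512 bytes, an allocation of 4 bytes would return a 16-byte block, an allocation of 48 bytes a 64-byte block, and an allocation of 128 bytes a 512-byte block.

Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. However, instead of using a single list with different block sizes, we create a separate list for each size class. Each list then only stores blocks of a single size. For example, with block sizes of 16, 64, and 512, there would be three separate linked lists in memory:

![](fixed-size-block-example.svg).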
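The rounding rule from the introduction can be sketched in a few lines. This is only an illustration of the idea with the three example sizes 16, 64, and 512; the names `EXAMPLE_BLOCK_SIZES`, `round_up_to_block_size`, and `wasted_bytes` are hypothetical and are not part of the allocator implemented later in this post.

```rust
/// The three example block sizes used in the text (hypothetical constant).
const EXAMPLE_BLOCK_SIZES: &[usize] = &[16, 64, 512];

/// Returns the smallest block size that can hold `size` bytes, or `None`
/// if the request is larger than every block size.
fn round_up_to_block_size(size: usize) -> Option<usize> {
    EXAMPLE_BLOCK_SIZES
        .iter()
        .copied()
        .find(|&block| block >= size)
}

/// Number of bytes lost to rounding for a request of `size` bytes.
fn wasted_bytes(size: usize) -> Option<usize> {
    round_up_to_block_size(size).map(|block| block - size)
}
```

With these sizes, a 4-byte request maps to the 16-byte class, a 48-byte request to the 64-byte class, and a 128-byte request to the 512-byte class, where 384 of the 512 bytes remain unused. A request above 512 bytes has no matching class, which is the case a fallback allocator would have to handle.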
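Instead of a single `head` pointer, we have the three head pointers `head_16`, `head_64`, and `head_512` that each point to the first unused block of the corresponding size. All nodes in a single list have the same size. For example, the list started by the `head_16` pointer only contains 16-byte blocks. This means that we no longer need to store the size in each list node since it is already specified by the name of the head pointer.

Since each element in a list has the same size, each list element is equally suitable for an allocation request. This means that we can very efficiently perform an allocation using the following steps:

- Round up the requested allocation size to the next block size. For example, when an allocation of 12 bytes is requested, we would choose the block size of 16 in the above example.
- Retrieve the head pointer for the list, e.g., for block size 16, we need to use `head_16`.
- Remove the first block from the list and return it.

Most notably, we can always return the first element of the list and no longer need to traverse the full list. Thus, allocations are much faster than with the linked list allocator.

#### Block Sizes and Wasted Memory

Depending on the block sizes, we lose a lot of memory by rounding up. For example, when a 512-byte block is returned for a 128-byte allocation, three-quarters of the allocated memory is unused. By defining reasonable block sizes, it is possible to limit the amount of wasted memory to some degree. For example, when using the powers of 2 (4, 8, 16, 32, 64, 128, …) as block sizes, we can limit the memory waste to half of the allocated block size in the worst case and a quarter of it in the average case.

It is also common to optimize block sizes based on common allocation sizes in a program. For example, we could additionally add a block size of 24 for programs that often perform allocations of 24 bytes. This way, the amount of wasted memory can be reduced without losing the performance benefits.

#### Deallocation

Much like allocation, deallocation is also very performant. It involves the following steps:

- Round up the freed allocation size to the next block size. This is required since the compiler only passes the requested allocation size to `dealloc`, not the size of the block that was returned by `alloc`. By using the same size-adjustment function in both `alloc` and `dealloc`, we can make sure that we always free the correct amount of memory.
- Retrieve the head pointer for the list.
- Add the freed block to the front of the list by updating the head pointer.

Most notably, no traversal of the list is required for deallocation either. This means that the time required for a `dealloc` call stays the same regardless of the list length.

#### Fallback Allocator

Given that large allocations (>2 KB) are often rare, especially in operating system kernels, it might make sense to fall back to a different allocator for these allocations. For example, we could fall back to a linked list allocator for allocations greater than 2048 bytes in order to reduce memory waste. Since only very few allocations of that size are expected, the linked list would stay small and the (de)allocations would still be reasonably fast.

#### Creating new Blocks {#create-new-block}

Above, we always assumed that there are enough unused blocks of a specific size in the list to fulfill all allocation requests. However, at some point, the linked list for a given block size becomes empty. At this point, there are two ways we can create new unused blocks of a specific size to fulfill an allocation request:

- Allocate a new block from the fallback allocator (if there is one).
- Split a larger block from a different list. This works best if the block sizes are powers of two. For example, a 32-byte block can be split into two 16-byte blocks.

For our implementation, we will allocate new blocks from the fallback allocator since the implementation is much simpler.

### Implementation

Now that we know how a fixed-size block allocator works, we can start our implementation. We won't depend on the implementation of the linked list allocator created in the previous section, so you can follow this part even if you skipped the linked list allocator implementation.

#### List Node

We start our implementation by creating a `ListNode` type in a new `allocator::fixed_size_block` module:

```rust
// in src/allocator.rs

pub mod fixed_size_block;
```

```rust
// in src/allocator/fixed_size_block.rs

struct ListNode {
    next: Option<&'static mut ListNode>,
}
```

This type is similar to the `ListNode` type of our [linked list allocator implementation], with the difference that we don't have a `size` field. It isn't needed because every block in a list has the same size with the fixed-size block allocator design.

[linked list allocator implementation]: #allocator-type

#### Block Sizes

Next, we define a constant `BLOCK_SIZES` slice with the block sizes used for our implementation:

```rust
// in src/allocator/fixed_size_block.rs

/// The block sizes to use.
///
/// The sizes must each be power of 2 because they are also used as
/// the block alignment (alignments must be always powers of 2).
const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];
```

As block sizes, we use powers of 2, starting from 8 up to 2048. We don't define any block sizes smaller than 8 because each block must be capable of storing a 64-bit pointer to the next block when freed. For allocations greater than 2048 bytes, we will fall back to a linked list allocator.

To simplify the implementation, we define the size of a block as its required alignment in memory. So a 16-byte block is always aligned on a 16-byte boundary and a 512-byte block is aligned on a 512-byte boundary. Since alignments always need to be powers of 2, this rules out any other block sizes. If we need block sizes that are not powers of 2 in the future, we can still adjust our implementation for this (e.g., by defining a second `BLOCK_ALIGNMENTS` array).

#### The Allocator Type

Using the `ListNode` type and the `BLOCK_SIZES` slice, we can now define our allocator type:

```rust
// in src/allocator/fixed_size_block.rs

pub struct FixedSizeBlockAllocator {
    list_heads: [Option<&'static mut ListNode>; BLOCK_SIZES.len()],
    fallback_allocator: linked_list_allocator::Heap,
}
```

The `list_heads` field is an array of `head` pointers, one for each block size. This is implemented by using the `len()` of the `BLOCK_SIZES` slice as the array length. As a fallback allocator for allocations larger than the largest block size, we use the allocator provided by the `linked_list_allocator` crate. We could also use the `LinkedListAllocator` we implemented ourselves instead, but it has the disadvantage that it does not [merge freed blocks].

[merge freed blocks]: #merge-free-blocks

For constructing a `FixedSizeBlockAllocator`, we provide the same `new` and `init` functions that we implemented for the other allocator types too:

```rust
// in src/allocator/fixed_size_block.rs

impl FixedSizeBlockAllocator {
    /// Creates an empty FixedSizeBlockAllocator.
    pub const fn new() -> Self {
        const EMPTY: Option<&'static mut ListNode> = None;
        FixedSizeBlockAllocator {
            list_heads: [EMPTY; BLOCK_SIZES.len()],
            fallback_allocator: linked_list_allocator::Heap::empty(),
        }
    }

    /// Initialize the allocator with the given heap bounds.
    ///
    /// This function is unsafe because the caller must guarantee that the given
    /// heap bounds are valid and that the heap is unused. This method must be
    /// called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        unsafe {
            self.fallback_allocator.init(heap_start, heap_size);
        }
    }
}
```

The `new` function just initializes the `list_heads` array with empty nodes and creates an [`empty`] linked list allocator as `fallback_allocator`. The `EMPTY` constant is needed to tell the Rust compiler that we want to initialize the array with a constant value. Initializing the array directly as `[None; BLOCK_SIZES.len()]` does not work, because then the compiler requires `Option<&'static mut ListNode>` to implement the `Copy` trait, which it does not. This is a current limitation of the Rust compiler, which might go away in the future.

[`empty`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.empty

The unsafe `init` function only calls the [`init`] function of the `fallback_allocator` without doing any additional initialization of the `list_heads` array. Instead, we will initialize the lists lazily on `alloc` and `dealloc` calls.

[`init`]: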
https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.init

For convenience, we also create a private `fallback_alloc` method that allocates using the `fallback_allocator`:

```rust
// in src/allocator/fixed_size_block.rs

use alloc::alloc::Layout;
use core::ptr;

impl FixedSizeBlockAllocator {
    /// Allocates using the fallback allocator.
    fn fallback_alloc(&mut self, layout: Layout) -> *mut u8 {
        match self.fallback_allocator.allocate_first_fit(layout) {
            Ok(ptr) => ptr.as_ptr(),
            Err(_) => ptr::null_mut(),
        }
    }
}
```

The [`Heap`] type of the `linked_list_allocator` crate does not implement [`GlobalAlloc`] (as it's [not possible without locking]). Instead, it provides an [`allocate_first_fit`] method that has a slightly different interface. Instead of returning a `*mut u8` and using a null pointer to signal an error, it returns a `Result<NonNull<u8>, ()>`. The [`NonNull`] type is an abstraction for a raw pointer that is guaranteed to not be a null pointer. By mapping the `Ok` case to the [`NonNull::as_ptr`] method and the `Err` case to a null pointer, we can easily translate this back to a `*mut u8` type.

[`Heap`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html
[not possible without locking]: #globalalloc-and-mutability
[`allocate_first_fit`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.allocate_first_fit
[`NonNull`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html
[`NonNull::as_ptr`]: https://doc.rust-lang.org/nightly/core/ptr/struct.NonNull.html#method.as_ptr

#### Calculating the List Index

Before we implement the `GlobalAlloc` trait, we define a `list_index` helper function that returns the lowest possible block size for a given [`Layout`]:

```rust
// in src/allocator/fixed_size_block.rs

/// Choose an appropriate block size for the given layout.
///
/// Returns an index into the `BLOCK_SIZES` array.
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}
```

The block must have at least the size and alignment required by the given `Layout`. Since we defined that the block size is also its alignment, this means that the `required_block_size` is the [maximum] of the layout's [`size()`] and [`align()`] attributes. To find the next-larger block in the `BLOCK_SIZES` slice, we first use the [`iter()`] method to get an iterator and then the [`position()`] method to find the index of the first block that is at least as large as the `required_block_size`.

[maximum]: https://doc.rust-lang.org/core/cmp/trait.Ord.html#method.max
[`size()`]: https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.size
[`align()`]:
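https://doc.rust-lang.org/core/alloc/struct.Layout.html#method.align
[`iter()`]: https://doc.rust-lang.org/std/primitive.slice.html#method.iter
[`position()`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.position

Note that we don't return the block size itself, but the index into the `BLOCK_SIZES` slice. The reason is that we want to use the returned index as an index into the `list_heads` array.

#### Implementing `GlobalAlloc`

The last step is to implement the `GlobalAlloc` trait:

```rust
// in src/allocator/fixed_size_block.rs

use super::Locked;
use alloc::alloc::GlobalAlloc;

unsafe impl GlobalAlloc for Locked<FixedSizeBlockAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        todo!();
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        todo!();
    }
}
```

Like for the other allocators, we don't implement the `GlobalAlloc` trait directly for our allocator type, but use the [`Locked` wrapper] to add synchronized interior mutability. Since the `alloc` and `dealloc` implementations are relatively large, we introduce them one by one in the following.

[`Locked` wrapper]: @/edition-2/posts/11-allocator-designs/index.md#a-locked-wrapper-type

##### `alloc`

The implementation of the `alloc` method looks like this:

```rust
// in `impl` block in src/allocator/fixed_size_block.rs

unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
    let mut allocator = self.lock();
    match list_index(&layout) {
        Some(index) => {
            match allocator.list_heads[index].take() {
                Some(node) => {
                    allocator.list_heads[index] = node.next.take();
                    node as *mut ListNode as *mut u8
                }
                None => {
                    // no block exists in list => allocate new block
                    let block_size = BLOCK_SIZES[index];
                    // only works if all block sizes are a power of 2
                    let block_align = block_size;
                    let layout = Layout::from_size_align(block_size, block_align)
                        .unwrap();
                    allocator.fallback_alloc(layout)
                }
            }
        }
        None => allocator.fallback_alloc(layout),
    }
}
```

Let's go through it step by step:

First, we use the `Locked::lock` method to get a mutable reference to the wrapped allocator instance. Next, we call the `list_index` function we just defined to calculate the appropriate block size for the given layout and get the corresponding index into the `list_heads` array. If this index is `None`, no block size fits the allocation, so we use the `fallback_allocator` through the `fallback_alloc` function.

If the list index is `Some`, we try to remove the first node in the corresponding list started by `list_heads[index]` using the [`Option::take`] method. If the list is not empty, we enter the `Some(node)` branch of the `match` statement, where we point the head pointer of the list to the successor of the popped `node` (by using [`take`][`Option::take`] again). Finally, we return the popped `node` pointer as a `*mut u8`.

[`Option::take`]: https://doc.rust-lang.org/core/option/enum.Option.html#method.take

If the list head is `None`, it indicates that the list of blocks for this size is empty. This means that we need to construct a new block as [described above](#create-new-block). For that, we first get the current block size from the `BLOCK_SIZES` slice and use it as both the size and the alignment for the new block. Then we create a new `Layout` from it and call the `fallback_alloc` method to perform the allocation. The reason for adjusting the size and alignment is that the block will be added to the corresponding block list on deallocation.

##### `dealloc`

The implementation of the `dealloc` method looks like this:

```rust
// in src/allocator/fixed_size_block.rs

use core::{mem, ptr::NonNull};

// inside the `unsafe impl GlobalAlloc` block

unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
    let mut allocator = self.lock();
    match list_index(&layout) {
        Some(index) => {
            let new_node = ListNode {
                next: allocator.list_heads[index].take(),
            };
            // verify that block has size and alignment required for storing node
            assert!(mem::size_of::<ListNode>() <= BLOCK_SIZES[index]);
            assert!(mem::align_of::<ListNode>() <= BLOCK_SIZES[index]);
            let new_node_ptr = ptr as *mut ListNode;
            unsafe {
                new_node_ptr.write(new_node);
                allocator.list_heads[index] = Some(&mut *new_node_ptr);
            }
        }
        None => {
            let ptr = NonNull::new(ptr).unwrap();
            unsafe {
                allocator.fallback_allocator.deallocate(ptr, layout);
            }
        }
    }
}
```

Like in `alloc`, we first use the `lock` method to get a mutable allocator reference and then the `list_index` function to get the block list corresponding to the given `Layout`. If the index is `None`, no fitting block size exists in `BLOCK_SIZES`, which indicates that the allocation was created by the fallback allocator. Therefore, we use its [`deallocate`][`Heap::deallocate`] method to free the memory again. The method expects a [`NonNull`] instead of a `*mut u8`, so we need to convert the pointer first. (The `unwrap` call only fails when the pointer is null, which should never happen when the compiler calls `dealloc`.)

[`Heap::deallocate`]: https://docs.rs/linked_list_allocator/0.9.0/linked_list_allocator/struct.Heap.html#method.deallocate

If `list_index` returns a block index, we need to add the freed memory block to the list. For that, we first create a new `ListNode` that points to the current list head (by using [`Option::take`] again). Before we write the new node into the freed memory block, we first assert that the current block size specified by `index` has the required size and alignment for storing a `ListNode`. Then we perform the write by converting the given `*mut u8` pointer to a `*mut ListNode` pointer and then calling the unsafe [`write`][`pointer::write`] method on it. The last step is to set the head pointer of the list, which is currently `None` since we called `take` on it, to our newly written `ListNode`. For that, we convert the raw `new_node_ptr` to a mutable reference.

[`pointer::write`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write

There are a few things worth noting:

- We don't differentiate between blocks allocated from a block list and blocks allocated from the fallback allocator. This means that new blocks created in `alloc` are added to the block list on `dealloc`, thereby increasing the number of blocks of that size.
- The `alloc` method is the only place where new blocks are created in our implementation. This means that we initially start with empty block lists and only fill these lists lazily when allocations of their block size are performed.
- Rust traditionally treats the complete body of an unsafe function as one large `unsafe` block, so the `unsafe` operations in `alloc` and `dealloc` would also compile without explicit `unsafe` blocks. Since explicit `unsafe` blocks have the advantage that it's obvious which operations are unsafe and which are not, there is a [proposed RFC](https://github.com/rust-lang/rfcs/pull/2585) to change this behavior.

### Using It

To use our new `FixedSizeBlockAllocator`, we need to update the `ALLOCATOR` static in the `allocator` module:

```rust
// in src/allocator.rs

use fixed_size_block::FixedSizeBlockAllocator;

#[global_allocator]
static ALLOCATOR: Locked<FixedSizeBlockAllocator> = Locked::new(
    FixedSizeBlockAllocator::new());
```

Since the `init` function behaves the same for all allocators we implemented, we don't need to modify the `init` call in `init_heap`.

When we now run our `heap_allocation` tests again, all tests should still pass:

```
> cargo test --test heap_allocation
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [ok]
```

Our new allocator seems to work!

### Discussion

While the fixed-size block approach has much better performance than the linked list approach, it wastes up to half of the memory when using powers of 2 as block sizes. Whether this tradeoff is worth it heavily depends on the application type. For an operating system kernel, where performance is critical, the fixed-size block approach seems to be the better choice.

On the implementation side, there are various things that we could improve in our current implementation:

- Instead of only allocating blocks lazily using the fallback allocator, it might be better to pre-fill the lists to improve the performance of initial allocations.
- To simplify the implementation, we only allowed block sizes that are powers of 2 so that we could also use them as the block alignment. By storing (or calculating) the alignment in a different way, we could also allow other block sizes, e.g., for common allocation sizes, in order to reduce the wasted memory.
- We currently only create new blocks, but never free them again. This results in fragmentation and might eventually result in allocation failure for large allocations. It might make sense to enforce a maximum list length for each block size. When the maximum length is reached, subsequent deallocations are freed using the fallback allocator instead of being added to the list.
- Instead of falling back to a linked list allocator, we could have a special allocator for allocations greater than 4 KiB. The idea is to utilize [paging], which operates on 4 KiB pages, to map a continuous block of virtual memory to non-continuous physical frames. This way, fragmentation of unused memory is no longer a problem for large allocations.
- With such a page allocator, it might make sense to add block sizes up to 4 KiB and drop the linked list allocator completely. The main advantages of this would be reduced fragmentation and improved performance predictability, i.e., better worst-case performance.

[paging]: @/edition-2/posts/08-paging-introduction/index.md

It's important to note that the implementation improvements outlined above are only suggestions. Allocators used in operating system kernels are typically highly optimized for the specific workload of the kernel, which is only possible through extensive profiling.

### Variations

There are also many variations of the fixed-size block allocator design. Two popular examples are the _slab allocator_ and the _buddy allocator_, which are also used in popular kernels such as Linux. In the following, we give a short introduction to these two designs.

#### Slab Allocator

The idea behind a [slab allocator] is to use block sizes that directly correspond to selected types in the kernel. This way, allocations of those types fit a block size exactly and no memory is wasted. Sometimes, it might even be possible to preinitialize type instances in unused blocks to further improve performance.

[slab allocator]: https://en.wikipedia.org/wiki/Slab_allocation

Slab allocation is often combined with other allocators. For example, it can be used together with a fixed-size block allocator to further split an allocated block in order to reduce memory waste. It is also often used to implement an [object pool pattern] on top of a single large allocation.

[object pool pattern]: https://en.wikipedia.org/wiki/Object_pool_pattern

#### Buddy Allocator

Instead of using a linked list to manage freed blocks, the [buddy allocator] design uses a [binary tree] data structure together with power-of-2 block sizes. When a new block of a certain size is required, it splits a larger block into two halves, thereby creating two child nodes in the tree. Whenever a block is freed again, its neighbor block in the tree is analyzed. If the neighbor is also free, the two blocks are joined back together to form a block of twice the size.

The advantage of this merge process is that [external fragmentation] is reduced so that small freed blocks can be reused for a large allocation. It also does not use a fallback allocator, so the performance is more predictable. The biggest drawback is that only power-of-2 block sizes are possible, which might result in a large amount of wasted memory due to [internal fragmentation]. For this reason, buddy allocators are often combined with a slab allocator to further split an allocated block into multiple smaller blocks.

[buddy allocator]: https://en.wikipedia.org/wiki/Buddy_memory_allocation
[binary tree]: https://en.wikipedia.org/wiki/Binary_tree
[external fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#External_fragmentation
[internal fragmentation]: https://en.wikipedia.org/wiki/Fragmentation_(computing)#Internal_fragmentation

## Summary

This post gave an overview of different allocator designs. We learned how to implement a basic [bump allocator], which hands out memory linearly by increasing a single `next` pointer. While bump allocation is very fast, it can only reuse memory after all allocations have been freed. For this reason, it is rarely used as a global allocator.

[bump allocator]: @/edition-2/posts/11-allocator-designs/index.md#bump-allocator

Next, we created a [linked list allocator] that uses the freed memory blocks themselves to create a linked list, the so-called [free list]. This list makes it possible to store an arbitrary number of freed blocks of different sizes. While no memory waste occurs, the approach suffers from poor performance because an allocation request might require a complete traversal of the list. Our implementation also suffers from [external fragmentation] because it does not merge adjacent freed blocks back together.

[linked list allocator]: @/edition-2/posts/11-allocator-designs/index.md#linked-list-allocator
[free list]: https://en.wikipedia.org/wiki/Free_list

To fix the performance problems of the linked list approach, we created a [fixed-size block allocator] that predefines a fixed set of block sizes. For each block size, a separate [free list] exists so that allocations and deallocations only need to insert/pop at the front of the list and are thus very fast. Since each allocation is rounded up to the next larger block size, some memory is wasted due to [internal fragmentation]. However, this approach is fast for most allocations, and the memory waste is acceptable for most use cases.

[fixed-size block allocator]: @/edition-2/posts/11-allocator-designs/index.md#fixed-size-block-allocator

There are many more allocator designs with different tradeoffs. [Slab allocation] works well to optimize the allocation of common fixed-size structures, but is not applicable in all situations. [Buddy allocation] uses a binary tree to merge freed blocks back together, but wastes a large amount of memory because it only supports power-of-2 block sizes. It's also important to remember that each kernel implementation has a unique workload, so there is no "best" allocator design that fits all cases.

[Slab allocation]: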
@/edition-2/posts/11-allocator-designs/index.md#slab-allocator
[Buddy allocation]: @/edition-2/posts/11-allocator-designs/index.md#buddy-allocator

## 下篇预告

通过本文,我们暂时完成了内存管理的实现。在下一篇文章中,我们将开始探索 [_多任务处理_][_multitasking_] ,首先是 [_async/await_] 形式的协作式多任务处理。在随后的文章中,我们将探讨 [_线程_][_threads_] 、[_多处理_][_multiprocessing_] 和 [_进程_][_processes_] 。

[_multitasking_]: https://en.wikipedia.org/wiki/Computer_multitasking
[_threads_]: https://en.wikipedia.org/wiki/Thread_(computing)
[_processes_]: https://en.wikipedia.org/wiki/Process_(computing)
[_multiprocessing_]: https://en.wikipedia.org/wiki/Multiprocessing
[_async/await_]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html

================================================
FILE: blog/content/edition-2/posts/12-async-await/index.es.md
================================================
+++
title = "Async/Await"
weight = 12
path = "es/async-await"
date = 2020-03-27

[extra]
chapter = "Multitasking"
# GitHub usernames of the people that translated this post
translators = ["dobleuber"]
+++

En esta publicación, exploramos el _multitasking cooperativo_ y la característica _async/await_ de Rust. Observamos en detalle cómo funciona async/await en Rust, incluyendo el diseño del trait `Future`, la transformación de máquina de estado y el _pinning_. Luego añadimos soporte básico para async/await a nuestro núcleo creando una tarea de teclado asíncrona y un ejecutor básico.

<!-- more -->

Este blog se desarrolla abiertamente en [GitHub]. Si tienes problemas o preguntas, por favor abre un issue allí. También puedes dejar comentarios [al final]. El código fuente completo de esta publicación se puede encontrar en la rama [`post-12`][post branch].
[GitHub]: https://github.com/phil-opp/blog_os [al final]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-12 ## Multitasking Una de las características fundamentales de la mayoría de los sistemas operativos es el [_multitasking_], que es la capacidad de ejecutar múltiples tareas de manera concurrente. Por ejemplo, probablemente tienes otros programas abiertos mientras miras esta publicación, como un editor de texto o una ventana de terminal. Incluso si solo tienes una ventana del navegador abierta, probablemente hay diversas tareas en segundo plano para gestionar tus ventanas de escritorio, verificar actualizaciones o indexar archivos. [_multitasking_]: https://en.wikipedia.org/wiki/Computer_multitasking Aunque parece que todas las tareas corren en paralelo, solo se puede ejecutar una sola tarea en un núcleo de CPU a la vez. Para crear la ilusión de que las tareas corren en paralelo, el sistema operativo cambia rápidamente entre tareas activas para que cada una pueda avanzar un poco. Dado que las computadoras son rápidas, no notamos estos cambios la mayor parte del tiempo. Mientras que las CPU de un solo núcleo solo pueden ejecutar una sola tarea a la vez, las CPU de múltiples núcleos pueden ejecutar múltiples tareas de manera verdaderamente paralela. Por ejemplo, una CPU con 8 núcleos puede ejecutar 8 tareas al mismo tiempo. Explicaremos cómo configurar las CPU de múltiples núcleos en una publicación futura. Para esta publicación, nos enfocaremos en las CPU de un solo núcleo por simplicidad. (Vale la pena mencionar que todas las CPU de múltiples núcleos comienzan con solo un núcleo activo, así que podemos tratarlas como CPU de un solo núcleo por ahora.) Hay dos formas de multitasking: el multitasking _cooperativo_ requiere que las tareas cedan regularmente el control de la CPU para que otras tareas puedan avanzar. 
El multitasking _preemptivo_ usa funcionalidades del sistema operativo para cambiar de hilo en puntos arbitrarios en el tiempo forzosamente. A continuación exploraremos las dos formas de multitasking en más detalle y discutiremos sus respectivas ventajas y desventajas. ### Multitasking Preemptivo La idea detrás del multitasking preemptivo es que el sistema operativo controla cuándo cambiar de tareas. Para ello, utiliza el hecho de que recupera el control de la CPU en cada interrupción. Esto hace posible cambiar de tareas cuando hay nueva entrada disponible para el sistema. Por ejemplo, sería posible cambiar de tareas cuando se mueve el mouse o llega un paquete de red. El sistema operativo también puede determinar el momento exacto en que se permite que una tarea se ejecute configurando un temporizador de hardware para enviar una interrupción después de ese tiempo. La siguiente gráfica ilustra el proceso de cambio de tareas en una interrupción de hardware: ![](regain-control-on-interrupt.svg) En la primera fila, la CPU está ejecutando la tarea `A1` del programa `A`. Todas las demás tareas están en pausa. En la segunda fila, una interrupción de hardware llega a la CPU. Como se describió en la publicación sobre [_Interrupciones de Hardware_], la CPU detiene inmediatamente la ejecución de la tarea `A1` y salta al controlador de interrupciones definido en la tabla de descriptores de interrupciones (IDT). A través de este controlador de interrupciones, el sistema operativo vuelve a tener control de la CPU, lo que le permite cambiar a la tarea `B1` en lugar de continuar con la tarea `A1`. [_Interrupciones de Hardware_]: @/edition-2/posts/07-hardware-interrupts/index.md #### Guardando Estado Dado que las tareas se interrumpen en puntos arbitrarios en el tiempo, pueden estar en medio de ciertos cálculos. 
Para poder reanudarlas más tarde, el sistema operativo debe respaldar todo el estado de la tarea, incluyendo su [pila de llamadas](https://en.wikipedia.org/wiki/Call_stack) y los valores de todos los registros de CPU. Este proceso se llama [_cambio de contexto_]. [call stack]: https://en.wikipedia.org/wiki/Call_stack [_cambio de contexto_]: https://en.wikipedia.org/wiki/Context_switch Dado que la pila de llamadas puede ser muy grande, el sistema operativo normalmente establece una pila de llamadas separada para cada tarea en lugar de respaldar el contenido de la pila de llamadas en cada cambio de tarea. Tal tarea con su propia pila se llama [_hilo de ejecución_] o _hilo_ a secas. Al usar una pila separada para cada tarea, solo se necesitan guardar los contenidos de registro en un cambio de contexto (incluyendo el contador de programa y el puntero de pila). Este enfoque minimiza la sobrecarga de rendimiento de un cambio de contexto, lo que es muy importante, ya que los cambios de contexto a menudo ocurren hasta 100 veces por segundo. [_hilo de ejecución_]: https://en.wikipedia.org/wiki/Thread_(computing) #### Discusión La principal ventaja del multitasking preemptivo es que el sistema operativo puede controlar completamente el tiempo de ejecución permitido de una tarea. De esta manera, puede garantizar que cada tarea obtenga una parte justa del tiempo de CPU, sin necesidad de confiar en que las tareas cooperen. Esto es especialmente importante al ejecutar tareas de terceros o cuando varios usuarios comparten un sistema. La desventaja de la preempción es que cada tarea requiere su propia pila. En comparación con una pila compartida, esto resulta en un mayor uso de memoria por tarea y a menudo limita la cantidad de tareas en el sistema. Otra desventaja es que el sistema operativo siempre debe guardar el estado completo de los registros de CPU en cada cambio de tarea, incluso si la tarea solo utilizó un pequeño subconjunto de los registros. 
El multitasking preemptivo y los hilos son componentes fundamentales de un sistema operativo porque hacen posible ejecutar programas de espacio de usuario no confiables. Discutiremos estos conceptos en detalle en publicaciones futuras. Sin embargo, para esta publicación, nos enfocaremos en el multitasking cooperativo, que también proporciona capacidades útiles para nuestro núcleo. ### Multitasking Cooperativo En lugar de pausar forzosamente las tareas en ejecución en puntos arbitrarios en el tiempo, el multitasking cooperativo permite que cada tarea se ejecute hasta que ceda voluntariamente el control de la CPU. Esto permite a las tareas pausarse a sí mismas en puntos convenientes en el tiempo, por ejemplo, cuando necesitan esperar por una operación de E/S de todos modos. El multitasking cooperativo se utiliza a menudo a nivel de lenguaje, como en forma de [corutinas](https://en.wikipedia.org/wiki/Coroutine) o [async/await](https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html). La idea es que bien el programador o el compilador inserten operaciones [_yield_] en el programa, que ceden el control de la CPU y permiten que otras tareas se ejecuten. Por ejemplo, se podría insertar un yield después de cada iteración de un bucle complejo. [async/await]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html [_yield_]: https://en.wikipedia.org/wiki/Yield_(multithreading) Es común combinar el multitasking cooperativo con [operaciones asíncronas](https://en.wikipedia.org/wiki/Asynchronous_I/O). En lugar de esperar hasta que una operación se complete y prevenir que otras tareas se ejecuten durante este tiempo, las operaciones asíncronas devuelven un estado "no listo" si la operación aún no ha finalizado. En este caso, la tarea en espera puede ejecutar una operación yield para permitir que otras tareas se ejecuten. 
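
A modo de ilustración, el siguiente boceto mínimo muestra la idea de ceder el control de forma cooperativa. Los nombres `Step`, `Task`, `run_round_robin` y `counting_task` son hipotéticos y no forman parte del núcleo de esta serie: cada tarea es simplemente una closure que ejecuta un paso y luego indica si cede el control o si ya terminó.

```rust
/// Resultado de ejecutar un paso de una tarea cooperativa (tipo hipotético).
enum Step {
    /// La tarea cede el control y quiere volver a ejecutarse más tarde.
    Yield,
    /// La tarea ha terminado.
    Done,
}

/// Una tarea es una closure que ejecuta un paso y anota lo que hizo en un registro.
type Task = Box<dyn FnMut(&mut Vec<String>) -> Step>;

/// Ejecuta las tareas por turnos hasta que todas terminan y devuelve el registro.
fn run_round_robin(mut tasks: Vec<Task>) -> Vec<String> {
    let mut log = Vec::new();
    while !tasks.is_empty() {
        // Cada tarea ejecuta un paso; solo se conservan las que cedieron el control.
        tasks.retain_mut(|task| matches!((*task)(&mut log), Step::Yield));
    }
    log
}

/// Crea una tarea que necesita `steps` pasos y cede el control después de cada uno.
fn counting_task(name: &'static str, steps: u32) -> Task {
    let mut current = 0;
    Box::new(move |log: &mut Vec<String>| {
        current += 1;
        log.push(format!("{name}{current}"));
        if current < steps {
            Step::Yield
        } else {
            Step::Done
        }
    })
}
```

Con `run_round_robin(vec![counting_task("A", 2), counting_task("B", 3)])`, los pasos de ambas tareas se intercalan en el registro. Si una tarea nunca devolviera `Step::Yield` ni `Step::Done` (por ejemplo, por un bucle infinito dentro de un paso), las demás no volverían a ejecutarse, que es justamente el riesgo de este enfoque.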
[operaciones asíncronas]: https://en.wikipedia.org/wiki/Asynchronous_I/O #### Guardando Estado Debido a que las tareas definen sus propios puntos de pausa, no necesitan que el sistema operativo guarde su estado. En su lugar, pueden guardar exactamente el estado que necesitan para continuar antes de pausarse, lo que a menudo resulta en un mejor rendimiento. Por ejemplo, una tarea que acaba de finalizar un cálculo complejo podría necesitar respaldar solo el resultado final del cálculo ya que no necesita los resultados intermedios. Las implementaciones respaldadas por el lenguaje de tareas cooperativas son a menudo capaces de respaldar las partes necesarias de la pila de llamadas antes de pausarse. Como ejemplo, la implementación de async/await de Rust almacena todas las variables locales que aún se necesitan en una estructura generada automáticamente (ver más abajo). Al respaldar las partes relevantes de la pila de llamadas antes de pausarse, todas las tareas pueden compartir una única pila de llamadas, lo que resulta en un consumo de memoria mucho más bajo por tarea. Esto hace posible crear un número casi arbitrario de tareas cooperativas sin quedarse sin memoria. #### Discusión La desventaja del multitasking cooperativo es que una tarea no cooperativa puede potencialmente ejecutarse durante un tiempo ilimitado. Por lo tanto, una tarea maliciosa o con errores puede evitar que otras tareas se ejecuten y retardar o incluso bloquear todo el sistema. Por esta razón, el multitasking cooperativo debería usarse solo cuando todas las tareas se sabe que cooperan. Por ejemplo, no es una buena idea hacer que el sistema operativo dependa de la cooperación de programas de nivel de usuario arbitrarios. Sin embargo, los fuertes beneficios de rendimiento y memoria del multitasking cooperativo lo convierten en un buen enfoque para uso _dentro_ de un programa, especialmente en combinación con operaciones asíncronas. 
Dado que un núcleo del sistema operativo es un programa crítico en términos de rendimiento que interactúa con hardware asíncrono, el multitasking cooperativo parece ser un buen enfoque para implementar concurrencia. ## Async/Await en Rust El lenguaje Rust proporciona soporte de primera clase para el multitasking cooperativo en forma de async/await. Antes de que podamos explorar qué es async/await y cómo funciona, necesitamos entender cómo funcionan los _futuros_ y la programación asíncrona en Rust. ### Futuros Un _futuro_ representa un valor que puede no estar disponible aún. Esto podría ser, por ejemplo, un número entero que es calculado por otra tarea o un archivo que se está descargando de la red. En lugar de esperar hasta que el valor esté disponible, los futuros permiten continuar la ejecución hasta que el valor sea necesario. #### Ejemplo El concepto de futuros se ilustra mejor con un pequeño ejemplo: ![Diagrama de secuencia: main llama a `read_file` y está bloqueado hasta que regrese; luego llama a `foo()` y también está bloqueado hasta que regrese. El mismo proceso se repite, pero esta vez se llama a `async_read_file`, que devuelve directamente un futuro; luego se llama a `foo()` de nuevo, que ahora se ejecuta concurrentemente con la carga del archivo. El archivo está disponible antes de que `foo()` regrese.](async-example.svg) Este diagrama de secuencia muestra una función `main` que lee un archivo del sistema de archivos y luego llama a una función `foo`. Este proceso se repite dos veces: una vez con una llamada síncrona `read_file` y otra vez con una llamada asíncrona `async_read_file`. Con la llamada síncrona, la función `main` necesita esperar hasta que el archivo se cargue desde el sistema de archivos. Solo entonces puede llamar a la función `foo`, lo que requiere que espere nuevamente por el resultado. 
Con la llamada asíncrona `async_read_file`, el sistema de archivos devuelve directamente un futuro y carga el archivo de forma asíncrona en segundo plano. Esto permite que la función `main` llame a `foo` mucho antes, que luego se ejecuta en paralelo con la carga del archivo. En este ejemplo, la carga del archivo incluso termina antes de que `foo` regrese, por lo que `main` puede trabajar directamente con el archivo sin mayor espera después de que `foo` regrese.

#### Futuros en Rust

En Rust, los futuros están representados por el trait [`Future`], que se ve de la siguiente manera:

[`Future`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html

```rust
pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>;
}
```

El [tipo asociado](https://doc.rust-lang.org/book/ch20-02-advanced-traits.html#associated-types) `Output` especifica el tipo del valor asíncrono. Por ejemplo, la función `async_read_file` en el diagrama anterior devolvería una instancia de `Future` con `Output` configurado a `File`.

El método [`poll`] permite comprobar si el valor ya está disponible. Devuelve un enum [`Poll`], que se ve de la siguiente manera:

[`poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll
[`Poll`]: https://doc.rust-lang.org/nightly/core/task/enum.Poll.html

```rust
pub enum Poll<T> {
    Ready(T),
    Pending,
}
```

Cuando el valor ya está disponible (por ejemplo, el archivo se ha leído completamente desde el disco), se devuelve envuelto en la variante `Ready`. De lo contrario, se devuelve la variante `Pending`, que señala al llamador que el valor aún no está disponible.

El método `poll` toma dos argumentos: `self: Pin<&mut Self>` y `cx: &mut Context`. El primero se comporta de manera similar a una referencia normal `&mut self`, excepto que el valor `Self` está [_pinned_] a su ubicación de memoria. Entender `Pin` y por qué es necesario es difícil sin entender primero cómo funciona async/await.
Por lo tanto, lo explicaremos más adelante en esta publicación. [_pinned_]: https://doc.rust-lang.org/nightly/core/pin/index.html El propósito del parámetro `cx: &mut Context` es pasar una instancia de [`Waker`] a la tarea asíncrona, por ejemplo, la carga del sistema de archivos. Este `Waker` permite que la tarea asíncrona señale que ha terminado (o que una parte de ella ha terminado), por ejemplo, que el archivo se ha cargado desde el disco. Dado que la tarea principal sabe que será notificada cuando el `Future` esté listo, no necesita llamar a `poll` una y otra vez. Explicaremos este proceso con más detalle más adelante en esta publicación cuando implementemos nuestro propio tipo de waker. [`Waker`]: https://doc.rust-lang.org/nightly/core/task/struct.Waker.html ### Trabajando con Futuros Ahora sabemos cómo se definen los futuros y entendemos la idea básica detrás del método `poll`. Sin embargo, aún no sabemos cómo trabajar de manera efectiva con los futuros. El problema es que los futuros representan los resultados de tareas asíncronas, que pueden no estar disponibles aún. En la práctica, sin embargo, a menudo necesitamos estos valores directamente para cálculos posteriores. Así que la pregunta es: ¿Cómo podemos recuperar eficientemente el valor de un futuro cuando lo necesitamos? #### Esperando en Futuros Una posible respuesta es esperar hasta que un futuro esté listo. Esto podría verse algo así: ```rust let future = async_read_file("foo.txt"); let file_content = loop { match future.poll(…) { Poll::Ready(value) => break value, Poll::Pending => {}, // no hacer nada } } ``` Aquí estamos _esperando activamente_ por el futuro al llamar a `poll` una y otra vez en un bucle. Los argumentos de `poll` no importan aquí, así que los omitimos. Aunque esta solución funciona, es muy ineficiente porque mantenemos la CPU ocupada hasta que el valor esté disponible. Un enfoque más eficiente podría ser _bloquear_ el hilo actual hasta que el futuro esté disponible. 
Esto es, por supuesto, solo posible si tienes hilos, así que esta solución no funciona para nuestro núcleo, al menos no aún. Incluso en sistemas donde el bloqueo está soportado, a menudo no se desea porque convierte una tarea asíncrona en una tarea síncrona nuevamente, inhibiendo así los potenciales beneficios de rendimiento de las tareas paralelas.

#### Combinadores de Futuros

Una alternativa a esperar es utilizar combinadores de futuros. Los combinadores de futuros son métodos como `map` que permiten encadenar y combinar futuros, similar a los métodos del trait [`Iterator`]. En lugar de esperar en el futuro, estos combinadores devuelven un futuro por sí mismos, que aplica la operación de mapeo en `poll`.

[`Iterator`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html

Por ejemplo, un simple combinador `string_len` para convertir un `Future<Output = String>` en un `Future<Output = usize>` podría verse así:

```rust
struct StringLen<F> {
    inner_future: F,
}

impl<F> Future for StringLen<F> where F: Future<Output = String> {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.inner_future.poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(string: impl Future<Output = String>)
    -> impl Future<Output = usize>
{
    StringLen {
        inner_future: string,
    }
}

// Uso
fn file_len() -> impl Future<Output = usize> {
    let file_content_future = async_read_file("foo.txt");
    string_len(file_content_future)
}
```

Este código no funciona del todo porque no maneja el [_pinning_], pero es suficiente como ejemplo. La idea básica es que la función `string_len` envuelve una instancia de `Future` dada en una nueva estructura `StringLen`, que también implementa `Future`. Cuando se hace polling sobre el futuro envoltorio, este hace polling sobre el futuro interno. Si el valor no está listo aún, el futuro envoltorio también devuelve `Poll::Pending`. Si el valor está listo, la cadena se extrae de la variante `Poll::Ready` y se calcula su longitud. Después, se envuelve nuevamente en `Poll::Ready` y se devuelve.
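
Una posible forma de hacer que el ejemplo compile, asumiendo que el futuro interno implementa `Unpin`, es volver a fijarlo con `Pin::new` antes de llamar a `poll`. Este boceto no es la implementación del crate `futures`; `NoopWaker` y `poll_once` son ayudantes hipotéticos que solo sirven para hacer polling una vez sin necesidad de un ejecutor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct StringLen<F> {
    inner_future: F,
}

impl<F> Future for StringLen<F>
where
    F: Future<Output = String> + Unpin,
{
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Suposición: `F: Unpin`, por lo que podemos fijar el futuro interno sin `unsafe`.
        match Pin::new(&mut self.inner_future).poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(
    string: impl Future<Output = String> + Unpin,
) -> impl Future<Output = usize> + Unpin {
    StringLen { inner_future: string }
}

/// Waker hipotético que no hace nada al ser despertado.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hace polling una sola vez sobre un futuro `Unpin`.
fn poll_once<F: Future + Unpin>(future: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    Pin::new(future).poll(&mut cx)
}
```

Por ejemplo, `poll_once(&mut string_len(std::future::ready(String::from("hola"))))` devuelve `Poll::Ready(4)`. La restricción `Unpin` excluye a muchos futuros generados por `async`, así que esto es una simplificación y no una solución general.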

[_pinning_]: https://doc.rust-lang.org/stable/core/pin/index.html

Con esta función `string_len`, podemos calcular la longitud de una cadena asíncrona sin esperar por ella. Dado que la función devuelve otro `Future`, el llamador no puede trabajar directamente en el valor devuelto, sino que necesita usar funciones combinadoras nuevamente. De esta manera, todo el grafo de llamadas se vuelve asíncrono y podemos esperar eficientemente por múltiples futuros a la vez en algún momento, por ejemplo, en la función principal.

Debido a que escribir manualmente funciones combinadoras es difícil, a menudo son provistas por bibliotecas. Si bien la biblioteca estándar de Rust en sí no ofrece aún métodos combinadores, el crate semi-oficial (y compatible con `no_std`) [`futures`] lo hace. Su trait [`FutureExt`] proporciona métodos combinadores de alto nivel como [`map`] o [`then`], que se pueden utilizar para manipular el resultado con closures arbitrarias.

[`futures`]: https://docs.rs/futures/0.3.4/futures/
[`FutureExt`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html
[`map`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.map
[`then`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.then

##### Ventajas

La gran ventaja de los combinadores de futuros es que mantienen las operaciones asíncronas. En combinación con interfaces de E/S asíncronas, este enfoque puede llevar a un rendimiento muy alto. El hecho de que los combinadores de futuros se implementen como estructuras normales con implementaciones de traits permite que el compilador los optimice en gran medida. Para más detalles, consulta la publicación sobre [_Futuros de cero costo en Rust_], que anunció la adición de futuros al ecosistema de Rust.

[_Futuros de cero costo en Rust_]: https://aturon.github.io/blog/2016/08/11/futures/

##### Desventajas

Si bien los combinadores de futuros hacen posible escribir código muy eficiente, pueden ser difíciles de usar en algunas situaciones debido al sistema de tipos y la interfaz basada en closures. Por ejemplo, considera el siguiente código:

```rust
fn example(min_len: usize) -> impl Future<Output = String> {
    async_read_file("foo.txt").then(move |content| {
        if content.len() < min_len {
            Either::Left(async_read_file("bar.txt").map(|s| content + &s))
        } else {
            Either::Right(future::ready(content))
        }
    })
}
```

([Pruébalo en el playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=91fc09024eecb2448a85a7ef6a97b8d8))

Aquí leemos el archivo `foo.txt` y luego usamos el combinador [`then`] para encadenar un segundo futuro basado en el contenido del archivo. Si la longitud del contenido es menor que el `min_len` dado, leemos un archivo diferente `bar.txt` y se lo anexamos a `content` usando el combinador [`map`]. De lo contrario, solo devolvemos el contenido de `foo.txt`.

Necesitamos usar la [palabra clave `move`][`move` keyword] para la closure pasada a `then` porque de lo contrario habría un error de tiempo de vida para `min_len`. La razón por la cual usamos el envoltorio [`Either`] es que los bloques `if` y `else` deben tener siempre el mismo tipo. Dado que devolvemos diferentes tipos de futuros en los bloques, debemos usar el tipo envoltorio para unificarlos en un solo tipo. La función [`ready`] envuelve un valor en un futuro que está inmediatamente listo. La función se requiere aquí porque el envoltorio `Either` espera que el valor envuelto implemente `Future`.

[`move` keyword]: https://doc.rust-lang.org/std/keyword.move.html
[`Either`]: https://docs.rs/futures/0.3.4/futures/future/enum.Either.html
[`ready`]: https://docs.rs/futures/0.3.4/futures/future/fn.ready.html

Como puedes imaginar, esto puede llevar rápidamente a código muy complejo para proyectos más grandes.
Por esta razón, se invirtió mucho trabajo en agregar soporte para async/await a Rust, con el objetivo de hacer que el código asíncrono sea radicalmente más simple de escribir.

### El Patrón Async/Await

La idea detrás de async/await es permitir que el programador escriba código que _parece_ código síncrono normal, pero que es transformado en código asíncrono por el compilador. Funciona basado en las dos palabras clave `async` y `await`. La palabra clave `async` se puede usar en la firma de una función para transformar una función síncrona en una función asíncrona que devuelve un futuro:

```rust
async fn foo() -> u32 {
    0
}

// el compilador traduce lo anterior aproximadamente a:
fn foo() -> impl Future<Output = u32> {
    future::ready(0)
}
```

Esta palabra clave por sí sola no sería tan útil. Sin embargo, dentro de las funciones `async`, se puede utilizar la palabra clave `await` para recuperar el valor asíncrono de un futuro:

```rust
async fn example(min_len: usize) -> String {
    let content = async_read_file("foo.txt").await;
    if content.len() < min_len {
        content + &async_read_file("bar.txt").await
    } else {
        content
    }
}
```

([Pruébalo en el playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=d93c28509a1c67661f31ff820281d434))

Esta función es una traducción directa de la función `example` de [arriba](#desventajas) que usaba funciones combinadoras. Usando el operador `.await`, podemos recuperar el valor de un futuro sin necesitar closures o tipos `Either`. Como resultado, podemos escribir nuestro código como escribimos código síncrono normal, con la diferencia de que _esto sigue siendo código asíncrono_.

#### Transformación de Máquina de Estado

Detrás de escena, el compilador convierte el cuerpo de la función `async` en una [_máquina de estado_], donde cada llamada `.await` representa un estado diferente.
Para la función `example` anterior, el compilador crea una máquina de estado con los siguientes cuatro estados:

[_máquina de estado_]: https://en.wikipedia.org/wiki/Finite-state_machine

![Cuatro estados: inicio, esperando a foo.txt, esperando a bar.txt, final](async-state-machine-states.svg)

Cada estado representa un punto de pausa diferente en la función. Los estados _"Inicio"_ y _"Fin"_ representan la función al comienzo y al final de su ejecución. El estado _"Esperando a foo.txt"_ representa que la función está actualmente esperando el resultado de la primera llamada a `async_read_file`. De manera similar, el estado _"Esperando a bar.txt"_ representa el punto de pausa donde la función está esperando el resultado de la segunda llamada a `async_read_file`.

La máquina de estado implementa el trait `Future` haciendo que cada llamada a `poll` sea una posible transición de estado:

![Cuatro estados y sus transiciones: inicio, esperando a foo.txt, esperando a bar.txt, fin](async-state-machine-basic.svg)

El diagrama usa flechas para representar cambios de estado y formas de diamante para representar caminos alternativos. Por ejemplo, si el archivo `foo.txt` no está listo, se toma el camino marcado como _"no"_ y se alcanza el estado _"Esperando a foo.txt"_. De lo contrario, se toma el camino _"sí"_. El pequeño diamante rojo sin leyenda representa la rama `if content.len() < min_len` de la función `example`.

Observamos que la primera llamada `poll` inicia la función y la deja correr hasta que llega a un futuro que no está listo aún. Si todos los futuros en el camino están listos, la función puede ejecutarse hasta el estado _"Fin"_, donde devuelve su resultado envuelto en `Poll::Ready`. De lo contrario, la máquina de estados entra en un estado de espera y devuelve `Poll::Pending`. En la próxima llamada `poll`, la máquina de estados comienza de nuevo desde el último estado de espera y vuelve a intentar la última operación.
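
Este comportamiento se puede observar haciendo polling a mano sobre una función `async`. El siguiente boceto es solo ilustrativo: `YieldOnce`, `two_steps`, `NoopWaker` y `poll_until_ready` son nombres hipotéticos, y el bucle espera activamente (algo que en la práctica se evita) únicamente para contar cuántas veces se devuelve `Poll::Pending`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Futuro hipotético que devuelve `Pending` la primera vez y `Ready` la segunda.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            // Avisamos de que queremos ser polled de nuevo.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Función `async` con dos puntos de pausa, es decir, dos estados de espera.
async fn two_steps() -> u32 {
    YieldOnce { yielded: false }.await; // primer estado de espera
    YieldOnce { yielded: false }.await; // segundo estado de espera
    42
}

/// Waker hipotético que no hace nada.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hace polling hasta obtener el resultado; devuelve el valor y el número de `Pending`.
fn poll_until_ready() -> (u32, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(two_steps());
    let mut pending_count = 0;
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return (value, pending_count),
            Poll::Pending => pending_count += 1,
        }
    }
}
```

Cada `Pending` corresponde a uno de los dos estados de espera; en la tercera llamada a `poll` la función continúa desde el segundo punto de pausa y llega al final.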
#### Guardando Estado

Para poder continuar desde el último estado de espera, la máquina de estado debe llevar un seguimiento del estado actual internamente. Además, debe guardar todas las variables que necesita para continuar la ejecución en la siguiente llamada `poll`. Aquí es donde el compilador realmente puede brillar: dado que sabe qué variables se utilizan y cuándo, puede generar automáticamente estructuras con exactamente las variables que se necesitan.

Como ejemplo, el compilador genera estructuras como la siguiente para la función `example` anterior:

```rust
// La función `example` nuevamente para que no necesites desplazarte hacia arriba
async fn example(min_len: usize) -> String {
    let content = async_read_file("foo.txt").await;
    if content.len() < min_len {
        content + &async_read_file("bar.txt").await
    } else {
        content
    }
}

// Las estructuras de estado generadas por el compilador:

struct StartState {
    min_len: usize,
}

struct WaitingOnFooTxtState {
    min_len: usize,
    foo_txt_future: impl Future<Output = String>,
}

struct WaitingOnBarTxtState {
    content: String,
    bar_txt_future: impl Future<Output = String>,
}

struct EndState {}
```

En los estados _"inicio"_ y _"Esperando a foo.txt"_, se necesita almacenar el parámetro `min_len` para la comparación posterior con `content.len()`. El estado _"Esperando a foo.txt"_ además almacena un `foo_txt_future`, que representa el futuro devuelto por la llamada `async_read_file`. Este futuro necesita ser polled de nuevo cuando la máquina de estado continúa, así que necesita ser almacenado.

El estado _"Esperando a bar.txt"_ contiene la variable `content` para la concatenación de cadenas posterior cuando `bar.txt` esté listo. También almacena un `bar_txt_future` que representa la carga en progreso de `bar.txt`. La estructura no contiene la variable `min_len` porque ya no se necesita después de la comparación `content.len()`. En el estado _"fin"_, no se almacenan variables porque la función ya se ha completado.
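
Una forma sencilla de comprobar que las variables que siguen vivas tras un `.await` se guardan dentro del futuro es medir su tamaño. El siguiente boceto asume nombres hipotéticos (`pause`, `holds_buffer`, `future_size`); el tamaño exacto es un detalle de implementación del compilador, por lo que solo tiene sentido razonar sobre una cota inferior.

```rust
use std::future::Future;
use std::mem::size_of_val;

/// Punto de pausa trivial (hipotético) para tener un `.await` en el ejemplo.
async fn pause() {}

/// `buffer` se usa después del `.await`, así que debe sobrevivir al punto de pausa.
async fn holds_buffer(seed: u8) -> u8 {
    let buffer = [seed; 1024];
    pause().await;
    buffer[1023]
}

/// Devuelve el tamaño en bytes del valor que representa al futuro.
fn future_size<F: Future>(future: &F) -> usize {
    size_of_val(future)
}
```

Aquí `future_size(&holds_buffer(7))` es de al menos 1024 bytes, porque el arreglo forma parte del estado guardado. Las estructuras de estado mostradas arriba cumplen exactamente ese papel para la función `example`.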
Ten en cuenta que este es solo un ejemplo del código que el compilador podría generar. Los nombres de las estructuras y la disposición de los campos son detalles de implementación y pueden ser diferentes.

#### El Tipo Completo de Máquina de Estado

Si bien el código exacto generado por el compilador es un detalle de implementación, ayuda a la comprensión imaginar cómo _podría_ verse la máquina de estado generada para la función `example`. Ya definimos las estructuras que representan los diferentes estados y que contienen las variables requeridas. Para crear una máquina de estado sobre ellas, podemos combinarlas en un [`enum`]:

[`enum`]: https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html

```rust
enum ExampleStateMachine {
    Start(StartState),
    WaitingOnFooTxt(WaitingOnFooTxtState),
    WaitingOnBarTxt(WaitingOnBarTxtState),
    End(EndState),
}
```

Definimos una variante de enum separada para cada estado y añadimos la estructura de estado correspondiente a cada variante como un campo. Para implementar las transiciones de estado, el compilador genera una implementación del trait `Future` basada en la función `example`:

```rust
impl Future for ExampleStateMachine {
    type Output = String; // tipo de retorno de `example`

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        loop {
            match self { // TODO: manejar pinning
                ExampleStateMachine::Start(state) => {…}
                ExampleStateMachine::WaitingOnFooTxt(state) => {…}
                ExampleStateMachine::WaitingOnBarTxt(state) => {…}
                ExampleStateMachine::End(state) => {…}
            }
        }
    }
}
```

El tipo `Output` del futuro es `String` porque es el tipo de retorno de la función `example`. Para implementar la función `poll`, utilizamos una instrucción `match` sobre el estado actual dentro de un `loop`. La idea es que cambiamos al siguiente estado tantas veces como sea posible y usamos un `return Poll::Pending` explícito cuando no podemos continuar.
Para simplificar, solo mostramos un código simplificado y no manejamos [pinning][_pinned_], propiedad, tiempos de vida, etc. Así que este código y el siguiente deben ser tratados como pseudo-código y no ser usados directamente. Por supuesto, el código generado real por el compilador maneja todo correctamente, aunque de manera posiblemente diferente. Para mantener pequeños los fragmentos de código, presentamos el código de cada brazo de `match` por separado. Empecemos con el estado `Start`: ```rust ExampleStateMachine::Start(state) => { // del cuerpo de `example` let foo_txt_future = async_read_file("foo.txt"); // operación `.await` let state = WaitingOnFooTxtState { min_len: state.min_len, foo_txt_future, }; *self = ExampleStateMachine::WaitingOnFooTxt(state); } ``` La máquina de estado se encuentra en el estado `Start` cuando está justo al principio de la función. En este caso, ejecutamos todo el código del cuerpo de la función `example` hasta la primera `.await`. Para manejar la operación `.await`, cambiamos el estado de la máquina de estado `self` a `WaitingOnFooTxt`, lo que incluye la construcción de la estructura `WaitingOnFooTxtState`. Dado que la instrucción `match self {…}` se ejecuta en un bucle, la ejecución salta al brazo `WaitingOnFooTxt` a continuación: ```rust ExampleStateMachine::WaitingOnFooTxt(state) => { match state.foo_txt_future.poll(cx) { Poll::Pending => return Poll::Pending, Poll::Ready(content) => { // del cuerpo de `example` if content.len() < state.min_len { let bar_txt_future = async_read_file("bar.txt"); // operación `.await` let state = WaitingOnBarTxtState { content, bar_txt_future, }; *self = ExampleStateMachine::WaitingOnBarTxt(state); } else { *self = ExampleStateMachine::End(EndState); return Poll::Ready(content); } } } } ``` En este brazo de `match`, primero llamamos a la función `poll` de `foo_txt_future`. Si no está lista, salimos del bucle y devolvemos `Poll::Pending`. 
Since `self` stays in the `WaitingOnFooTxt` state in this case, the next `poll` call on the state machine will enter the same `match` arm and retry polling the `foo_txt_future`.

When the `foo_txt_future` is ready, we assign the result to the `content` variable and continue to execute the code of the `example` function: If `content.len()` is smaller than the `min_len` saved in the state struct, the `bar.txt` file is read asynchronously. We again translate the `.await` operation into a state change, this time into the `WaitingOnBarTxt` state. Since we're executing the `match` inside a loop, the execution jumps directly to the `match` arm for the new state afterward, where the `bar_txt_future` is polled.

In case we enter the `else` branch, no further `.await` operation occurs. We reach the end of the function and return `content` wrapped in `Poll::Ready`. We also change the current state to `End`.

The code for the `WaitingOnBarTxt` state looks like this:

```rust
ExampleStateMachine::WaitingOnBarTxt(state) => {
    match state.bar_txt_future.poll(cx) {
        Poll::Pending => return Poll::Pending,
        Poll::Ready(bar_txt) => {
            *self = ExampleStateMachine::End(EndState);
            // from the body of `example`
            return Poll::Ready(state.content + &bar_txt);
        }
    }
}
```

Similar to the `WaitingOnFooTxt` state, we start by polling the `bar_txt_future`. If it is still pending, we exit the loop and return `Poll::Pending`. Otherwise, we can perform the last operation of the `example` function: concatenating the `content` variable with the result from the future. We update the state machine to the `End` state and then return the result wrapped in `Poll::Ready`.

Finally, the code for the `End` state looks like this:

```rust
ExampleStateMachine::End(_) => {
    panic!("poll called after Poll::Ready was returned");
}
```

Futures should not be polled again after they returned `Poll::Ready`, so we panic if `poll` is called while we are already in the `End` state.

We now know what the compiler-generated state machine and its implementation of the `Future` trait _could_ look like. In practice, the compiler generates the code in a different way. (In case you're interested, the implementation is currently based on [_coroutines_], but this is only an implementation detail.)

[_coroutines_]: https://doc.rust-lang.org/stable/unstable-book/language-features/coroutines.html

The last piece of the puzzle is the generated code for the `example` function itself. Remember, the function header was defined like this:

```rust
async fn example(min_len: usize) -> String
```

Since the complete function body is now implemented by the state machine, the only thing that the function needs to do is to initialize the state machine and return it. The generated code for this could look like this:

```rust
fn example(min_len: usize) -> ExampleStateMachine {
    ExampleStateMachine::Start(StartState {
        min_len,
    })
}
```

The function no longer has an `async` modifier since it now explicitly returns an `ExampleStateMachine` type, which implements the `Future` trait. As expected, the state machine is constructed in the `Start` state and the corresponding state struct is initialized with the `min_len` parameter.

Note that this function does not start the execution of the state machine. This is a fundamental design decision of futures in Rust: they do nothing until they are polled for the first time.

### Pinning

We already stumbled across _pinning_ multiple times in this post. Now is the time to explore what pinning is and why it is needed.
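
Before looking at pinning in detail, it can help to see the pseudo-code state machine from the previous section run end to end. The following is only a sketch under several assumptions: it uses `std` instead of the kernel's `no_std` environment, `ReadFile` is a hypothetical stand-in for the future returned by `async_read_file` (pending once, then ready, with made-up file contents), and `NoopWake` and `run_example` are illustrative helpers. It sidesteps the `// TODO: handle pinning` comment by calling `get_mut`, which is only possible because none of its states is self-referential.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `async_read_file`'s future:
/// pending on the first poll, ready on the second.
struct ReadFile {
    content: &'static str,
    polled_once: bool,
}

fn async_read_file(name: &str) -> ReadFile {
    // Assumed file contents, chosen only for this demonstration.
    let content = if name == "foo.txt" { "foo" } else { "bar" };
    ReadFile { content, polled_once: false }
}

impl Future for ReadFile {
    type Output = String;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<String> {
        if self.polled_once {
            Poll::Ready(self.content.to_string())
        } else {
            self.polled_once = true;
            Poll::Pending
        }
    }
}

enum ExampleStateMachine {
    Start { min_len: usize },
    WaitingOnFooTxt { min_len: usize, foo_txt_future: ReadFile },
    WaitingOnBarTxt { content: String, bar_txt_future: ReadFile },
    End,
}

impl Future for ExampleStateMachine {
    type Output = String;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<String> {
        // Allowed only because every field is `Unpin` (no self-references).
        let this = self.get_mut();
        loop {
            match &mut *this {
                ExampleStateMachine::Start { min_len } => {
                    let min_len = *min_len;
                    let foo_txt_future = async_read_file("foo.txt");
                    *this = ExampleStateMachine::WaitingOnFooTxt { min_len, foo_txt_future };
                }
                ExampleStateMachine::WaitingOnFooTxt { min_len, foo_txt_future } => {
                    let min_len = *min_len;
                    match Pin::new(foo_txt_future).poll(cx) {
                        Poll::Pending => return Poll::Pending,
                        Poll::Ready(content) => {
                            if content.len() < min_len {
                                let bar_txt_future = async_read_file("bar.txt");
                                *this = ExampleStateMachine::WaitingOnBarTxt { content, bar_txt_future };
                            } else {
                                *this = ExampleStateMachine::End;
                                return Poll::Ready(content);
                            }
                        }
                    }
                }
                ExampleStateMachine::WaitingOnBarTxt { content, bar_txt_future } => {
                    match Pin::new(bar_txt_future).poll(cx) {
                        Poll::Pending => return Poll::Pending,
                        Poll::Ready(bar_txt) => {
                            let result = std::mem::take(content) + &bar_txt;
                            *this = ExampleStateMachine::End;
                            return Poll::Ready(result);
                        }
                    }
                }
                ExampleStateMachine::End => panic!("poll called after Poll::Ready was returned"),
            }
        }
    }
}

fn example(min_len: usize) -> ExampleStateMachine {
    ExampleStateMachine::Start { min_len }
}

/// Waker that does nothing; enough for a busy-polling loop.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polls `example(min_len)`; returns the number of polls and the result.
fn run_example(min_len: usize) -> (usize, String) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = example(min_len); // nothing has run yet: futures are lazy
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(result) = Pin::new(&mut future).poll(&mut cx) {
            return (polls, result);
        }
    }
}
```

With the assumed file contents, `run_example(5)` needs one poll per pending read plus a final one, while `run_example(2)` finishes without ever touching `bar.txt`. A state machine with internal references could not use `get_mut` like this, which is the limitation the following sections address.
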

#### Self-Referential Structs

As explained above, the state machine transformation stores the local variables of each pause point in a struct. For small examples like our `example` function, this was straightforward and did not lead to any problems. However, things become more difficult when variables reference each other. For example, consider this function:

```rust
async fn pin_example() -> i32 {
    let array = [1, 2, 3];
    let element = &array[2];
    async_write_file("foo.txt", element.to_string()).await;
    *element
}
```

This function creates a small `array` with the contents `1`, `2`, and `3`. It then creates a reference to the last array element and stores it in an `element` variable. Next, it asynchronously writes the number converted to a string to a `foo.txt` file. Finally, it returns the number referenced by `element`.

Since the function uses a single `.await` operation, the resulting state machine has three states: start, end, and "waiting on write". The function takes no arguments, so the struct for the start state is empty. Like before, the struct for the end state is empty because the function is finished at this point. The struct for the "waiting on write" state is more interesting:

```rust
struct WaitingOnWriteState {
    array: [1, 2, 3],
    element: 0x1001c, // address of the last array element
}
```

We need to store both the `array` and `element` variables because `element` is required for the return value and `array` is referenced by `element`. We use `0x1001c` as an example memory address here. In reality, it needs to be the address of the last element of the `array` field, so it depends on where the struct lives in memory. Structs with such internal pointers are called _self-referential_ structs because they reference themselves from one of their fields.

#### The Problem with Self-Referential Structs

The internal pointer of our self-referential struct leads to a fundamental problem, which becomes apparent when we look at its memory layout:

![array at 0x10014 with fields 1, 2, and 3; element at address 0x10020, pointing to the last array element at 0x1001c](self-referential-struct.svg)

The `array` field starts at address 0x10014 and the `element` field at address 0x10020. It points to address 0x1001c because the last array element lives at this address. At this point, everything is still fine. However, a problem occurs when we move this struct to a different memory address:

![array at 0x10024 with fields 1, 2, and 3; element at address 0x10030, still pointing to 0x1001c, even though the last array element now lives at 0x1002c](self-referential-struct-moved.svg)

We moved the struct a bit so that it now starts at address `0x10024`. This could happen, for example, when we pass the struct as a function argument or assign it to a different stack variable. The problem is that the `element` field still points to address `0x1001c` even though the last `array` element now lives at address `0x1002c`. Thus, the pointer is dangling, with the result that undefined behavior occurs on the next `poll` call.

#### Possible Solutions

There are three fundamental approaches to solving the dangling pointer problem:

- **Update the pointer on move**: The idea is to update the internal pointer whenever the struct is moved in memory so that it is still valid after the move. Unfortunately, this approach would require extensive changes to Rust that would result in potentially huge performance losses.
  The reason is that we would need some kind of runtime that keeps track of the type of all struct fields and checks on every move operation whether a pointer update is required.
- **Store an offset instead of self-references**: To avoid the need to update pointers, the compiler could try to store self-references as offsets from the struct's beginning instead. For example, the `element` field of the above `WaitingOnWriteState` struct could be stored in the form of an `element_offset` field with a value of 8 because the array element that the reference points to starts 8 bytes after the struct's beginning. Since the offset stays the same when the struct is moved, no field updates are required.

  The problem with this approach is that it requires the compiler to detect all self-references. This is not possible at compile time because the value of a reference might depend on user input, so we would again need a runtime system to analyze references and correctly create the state structs. This would not only result in runtime costs but also prevent certain compiler optimizations, so that it would cause large performance losses again.
- **Forbid moving the struct**: As we saw above, the dangling pointer only occurs when we move the struct in memory. By completely forbidding move operations on self-referential structs, the problem can also be avoided. The big advantage of this approach is that it can be implemented at the type system level without additional runtime costs. The drawback is that it puts the burden of dealing with move operations on possibly self-referential structs on the programmer.
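
The difference between an absolute pointer and an offset can be observed in a small experiment. The sketch below is only an illustration, not what the compiler generates: the struct is a hand-written variant of `WaitingOnWriteState`, and the `element_ptr` and `element_index` fields as well as the `move_to_heap` function are hypothetical names. It stores both an absolute address and an index, moves the struct to the heap, and checks which of the two still identifies the last array element. The raw pointer is only compared, never dereferenced.

```rust
use std::ptr;

struct WaitingOnWriteState {
    array: [i32; 3],
    /// Absolute address of `array[2]`, as in the memory layout figures.
    element_ptr: *const i32,
    /// Hypothetical offset-style alternative: position inside `array`.
    element_index: usize,
}

/// Moves the struct to the heap and reports
/// (is the stored pointer still correct?, value found through the index).
fn move_to_heap() -> (bool, i32) {
    let mut state = WaitingOnWriteState {
        array: [1, 2, 3],
        element_ptr: ptr::null(),
        element_index: 2,
    };
    // Make the struct self-referential while it lives on the stack.
    state.element_ptr = &state.array[2];

    // Moving into a `Box` copies the bytes to a new address on the heap.
    let moved = Box::new(state);

    let pointer_still_valid = ptr::eq(moved.element_ptr, &moved.array[2]);
    let via_index = moved.array[moved.element_index];
    (pointer_still_valid, via_index)
}
```

The pointer still holds the old stack address after the move, while the index resolves correctly wherever the struct lives. As the list above explains, the hard part is not the representation but detecting every self-reference automatically.
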

Rust chose the third solution because of its principle of providing _zero-cost abstractions_, which means that abstractions should not impose additional runtime costs. The [_pinning_] API was proposed for this purpose in [RFC 2349](https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md). In the following, we will give a short overview of this API and explain how it works with async/await and futures.

#### Heap Values

The first observation is that [heap-allocated] values already have a fixed memory address most of the time. They are created using a call to `allocate` and then referenced by a pointer type such as `Box<T>`. While moving the pointer type is possible, the heap value that the pointer points to stays at the same memory address until it is freed through a `deallocate` call.

[heap-allocated]: @/edition-2/posts/10-heap-allocation/index.md

Using heap allocation, we can try to create a self-referential struct:

```rust
fn main() {
    let mut heap_value = Box::new(SelfReferential {
        self_ptr: 0 as *const _,
    });
    let ptr = &*heap_value as *const SelfReferential;
    heap_value.self_ptr = ptr;
    println!("heap value at: {:p}", heap_value);
    println!("internal reference: {:p}", heap_value.self_ptr);
}

struct SelfReferential {
    self_ptr: *const Self,
}
```

([Try it on the playground][playground-self-ref])

We create a simple struct named `SelfReferential` that contains a single pointer field. First, we initialize this struct with a null pointer and then allocate it on the heap using `Box::new`. We then determine the memory address of the heap-allocated struct and store it in a `ptr` variable. Finally, we make the struct self-referential by assigning the `ptr` variable to the `self_ptr` field.
When we execute this code [on the playground][playground-self-ref], we see that the address of the heap value and its internal pointer are equal, which means that the `self_ptr` field is a valid self-reference. Since the `heap_value` variable is only a pointer, moving it (e.g., by passing it to a function) does not change the address of the struct itself, so the `self_ptr` stays valid even if the pointer is moved.

However, there is still a way to break this example: we can move out of a `Box<T>` or replace its content:

```rust
let stack_value = mem::replace(&mut *heap_value, SelfReferential {
    self_ptr: 0 as *const _,
});
println!("value at: {:p}", &stack_value);
println!("internal reference: {:p}", stack_value.self_ptr);
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=e160ee8a64cba4cebc1c0473dcecb7c8))

Here we use the [`mem::replace`] function to replace the heap-allocated value with a new struct instance. This allows us to move the original `heap_value` to the stack, while the `self_ptr` field of the struct is now a dangling pointer that still points to the old heap address. When you try to run the example on the playground, you will see that the printed _"value at:"_ and _"internal reference:"_ lines show different pointers. So heap-allocating a value is not enough to make self-references safe.

[`mem::replace`]: https://doc.rust-lang.org/nightly/core/mem/fn.replace.html

The fundamental problem that allowed the above breakage is that `Box<T>` allows us to get a `&mut T` reference to the heap-allocated value. This `&mut` reference makes it possible to use methods like [`mem::replace`] or [`mem::swap`] to invalidate the heap-allocated value. To resolve this problem, we must prevent `&mut` references to self-referential structs from being created.
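
The same breakage can be reproduced with `mem::swap`. The following sketch reuses the `SelfReferential` idea from above; the helper names `new_boxed` and `swap_breaks_self_reference` are made up for this illustration. It compares each stored pointer against the real heap address instead of printing them, and it never dereferences the raw pointer.

```rust
use std::mem;
use std::ptr;

struct SelfReferential {
    self_ptr: *const SelfReferential,
}

/// Allocates a value on the heap and points `self_ptr` at its own address.
fn new_boxed() -> Box<SelfReferential> {
    let mut value = Box::new(SelfReferential { self_ptr: ptr::null() });
    let address = &*value as *const SelfReferential;
    value.self_ptr = address;
    value
}

/// Returns whether the self-pointer of `a` is valid before and after a swap.
fn swap_breaks_self_reference() -> (bool, bool) {
    let mut a = new_boxed();
    let mut b = new_boxed();
    let valid_before = ptr::eq(a.self_ptr, &*a);

    // Both boxes keep their heap addresses, but their contents trade places.
    // This only compiles because `Box<T>` hands out `&mut T`.
    mem::swap(&mut *a, &mut *b);

    let valid_after = ptr::eq(a.self_ptr, &*a);
    (valid_before, valid_after)
}
```

After the swap, the value stored in `a` carries the pointer that was computed for the allocation of `b`, so neither box contains a valid self-reference anymore.
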

#### `Pin<Box<T>>` and `Unpin`

The pinning API provides a solution to the `&mut T` problem in the form of the [`Pin`] wrapper type and the [`Unpin`] marker trait. The idea behind these types is to gate all methods of `Pin` that can be used to get `&mut` references to the wrapped value (e.g., [`get_mut`][pin-get-mut] or [`deref_mut`][pin-deref-mut]) on the `Unpin` trait. The `Unpin` trait is an [_auto trait_], which is automatically implemented for all types except those that explicitly opt out. By making self-referential structs opt out of `Unpin`, there is no (safe) way to get a `&mut T` from a `Pin<Box<T>>` type for them. As a result, their internal self-references are guaranteed to stay valid.

[`mem::swap`]: https://doc.rust-lang.org/nightly/core/mem/fn.swap.html
[`Pin`]: https://doc.rust-lang.org/stable/core/pin/struct.Pin.html
[`Unpin`]: https://doc.rust-lang.org/nightly/std/marker/trait.Unpin.html
[pin-get-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_mut
[pin-deref-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.deref_mut
[_auto trait_]: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits

As an example, let's update the `SelfReferential` type from above to opt out of `Unpin`:

```rust
use core::marker::PhantomPinned;

struct SelfReferential {
    self_ptr: *const Self,
    _pin: PhantomPinned,
}
```

We opt out by adding a second `_pin` field of type [`PhantomPinned`]. This type is a zero-sized marker type whose only purpose is to _not_ implement the `Unpin` trait. Because of the way [auto traits][_auto trait_] work, a single field that is not `Unpin` suffices to make the complete struct opt out of `Unpin`.

The second step is to change the `Box<SelfReferential>` type in the example to a `Pin<Box<SelfReferential>>` type. The easiest way to do this is to use the [`Box::pin`] function instead of [`Box::new`] for creating the heap-allocated value:

[`PhantomPinned`]: https://doc.rust-lang.org/nightly/core/marker/struct.PhantomPinned.html
[`Box::pin`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.pin
[`Box::new`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.new

```rust
let mut heap_value = Box::pin(SelfReferential {
    self_ptr: 0 as *const _,
    _pin: PhantomPinned,
});
```

In addition to changing `Box::new` to `Box::pin`, we also need to add the new `_pin` field in the struct initializer. Since `PhantomPinned` is a zero-sized type, we only need its type name to initialize it.

When we [try to run our adjusted example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=961b0db194bbe851ff4d0ed08d3bd98a) now, we see that it no longer works:

```
error[E0594]: cannot assign to data in dereference of `Pin<Box<SelfReferential>>`
  --> src/main.rs:10:5
   |
10 |     heap_value.self_ptr = ptr;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot assign
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin<Box<SelfReferential>>`

error[E0596]: cannot borrow data in dereference of `Pin<Box<SelfReferential>>` as mutable
  --> src/main.rs:16:36
   |
16 |     let stack_value = mem::replace(&mut *heap_value, SelfReferential {
   |                                    ^^^^^^^^^^^^^^^^ cannot borrow as mutable
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin<Box<SelfReferential>>`
```

Both errors occur because the `Pin<Box<SelfReferential>>` type no longer implements the `DerefMut` trait. This is exactly what we wanted because the `DerefMut` trait would return a `&mut` reference, which we wanted to prevent. This only happens because we both opted out of `Unpin` and changed `Box::new` to `Box::pin`.
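
The way `Unpin` gates mutable access can be seen by putting an `Unpin` type and a `!Unpin` type side by side. This is a minimal sketch: the `Movable` and `NotMovable` structs and both functions are hypothetical names chosen for the comparison, and the line that would fail to compile is left as a comment.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Contains only `Unpin` fields, so it is `Unpin` automatically.
struct Movable {
    value: u32,
}

/// Opts out of `Unpin` through the zero-sized `PhantomPinned` marker.
struct NotMovable {
    value: u32,
    _pin: PhantomPinned,
}

fn bump(mut pinned: Pin<Box<Movable>>) -> u32 {
    // `Movable: Unpin`, so `Pin<Box<Movable>>` implements `DerefMut` ...
    pinned.value += 1;
    // ... and `get_mut` hands out a plain `&mut Movable`.
    pinned.as_mut().get_mut().value += 1;
    pinned.value
}

fn read_only(pinned: Pin<Box<NotMovable>>) -> u32 {
    // Shared access through `Deref` is always available.
    // pinned.value += 1; // would not compile: `NotMovable` is not `Unpin`
    pinned.value
}
```

For `Unpin` types, `Pin` is effectively transparent, which is why pinning costs nothing for ordinary values. Only types that opt out lose safe `&mut` access.
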

The remaining problem is that the compiler does not only prevent moving the type in line 16, but also forbids initializing the `self_ptr` field in line 10. This happens because the compiler can't differentiate between valid and invalid uses of `&mut` references. To get the initialization working again, we have to use the unsafe [`get_unchecked_mut`] method:

[`get_unchecked_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_unchecked_mut

```rust
// safe because modifying a field doesn't move the whole struct
unsafe {
    let mut_ref = Pin::as_mut(&mut heap_value);
    Pin::get_unchecked_mut(mut_ref).self_ptr = ptr;
}
```

The [`get_unchecked_mut`] function works on a `Pin<&mut T>` instead of a `Pin<Box<T>>`, so we have to use [`Pin::as_mut`] to convert the value first. Then we can set the `self_ptr` field using the `&mut` reference returned by `get_unchecked_mut`.

[`Pin::as_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.as_mut

Now the only error left is the desired error on `mem::replace`. Remember, this operation tries to move the heap-allocated value to the stack, which would invalidate the self-reference stored in the `self_ptr` field. By opting out of `Unpin` and using `Pin<Box<T>>`, we can prevent this operation at compile time and thus safely work with self-referential structs. As we saw, the compiler is not able to prove that the creation of the self-reference is safe (yet), so we need to use an unsafe block and verify the correctness ourselves.

#### Stack Pinning and `Pin<&mut T>`

In the previous section, we learned how to use `Pin<Box<T>>` to safely create a heap-allocated self-referential value. While this approach works fine and is relatively safe (apart from the unsafe construction), the required heap allocation comes with a performance cost.
Since Rust strives to provide _zero-cost abstractions_ whenever possible, the pinning API also allows creating `Pin<&mut T>` instances that point to stack-allocated values.

Unlike `Pin<Box<T>>` instances, which have _ownership_ of the wrapped value, `Pin<&mut T>` instances only temporarily borrow the wrapped value. This makes things more complicated, as it requires the programmer to ensure additional guarantees themselves. Most importantly, a `Pin<&mut T>` must stay pinned for the whole lifetime of the referenced `T`, which can be difficult to verify for stack-based variables. To help with this, crates like [`pin-utils`] exist, but I still wouldn't recommend pinning to the stack unless you really know what you're doing.

[`pin-utils`]: https://docs.rs/pin-utils/0.1.0-alpha.4/pin_utils/

For further reading, check out the documentation of the [`pin` module] and the [`Pin::new_unchecked`] method.

[`pin` module]: https://doc.rust-lang.org/nightly/core/pin/index.html
[`Pin::new_unchecked`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.new_unchecked

#### Pinning and Futures

As we already saw in this post, the [`Future::poll`] method uses pinning in the form of a `Pin<&mut Self>` parameter:

[`Future::poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll

```rust
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>
```

The reason that this method takes `self: Pin<&mut Self>` instead of the normal `&mut self` is that future instances created from async/await are often self-referential, as we saw [above][self-ref-async-await]. By wrapping `Self` into `Pin` and letting the compiler opt out of `Unpin` for self-referential futures generated from async/await, it is guaranteed that the futures are not moved in memory between `poll` calls.
This ensures that all internal references are still valid.

[self-ref-async-await]: @/edition-2/posts/12-async-await/index.md#self-referential-structs

It is worth noting that moving futures before the first `poll` call is fine. This is a result of the fact that futures are lazy and do nothing until they are polled for the first time. The initial state of the generated state machines therefore only contains the function arguments but no internal references. In order to call `poll`, the caller must wrap the future into `Pin` first, which ensures that the future cannot be moved in memory anymore. Since stack pinning is more difficult to get right, I recommend always using [`Box::pin`] combined with [`Pin::as_mut`] for this.

[`futures`]: https://docs.rs/futures/0.3.4/futures/

In case you're interested in understanding how to safely implement a future combinator function using stack pinning yourself, take a look at the relatively short [source of the `map` combinator method][map-src] of the `futures` crate and the section about [projections and structural pinning] of the pin documentation.

[map-src]: https://docs.rs/futures-util/0.3.4/src/futures_util/future/future/map.rs.html
[projections and structural pinning]: https://doc.rust-lang.org/stable/std/pin/index.html#projections-and-structural-pinning

### Executors and Wakers

Using async/await, it is possible to work with futures in an ergonomic and completely asynchronous way. However, as we learned above, futures do nothing until they are polled. This means we have to call `poll` on them at some point, otherwise the asynchronous code is never executed.

With a single future, we can always wait for it manually using a loop [as described above](#esperando-en-futuros).
However, this approach is very inefficient and not practical for programs that create a large number of futures. The most common solution to this problem is to define a global _executor_ that is responsible for polling all futures in the system until they are finished.

#### Executors

The purpose of an executor is to allow spawning futures as independent tasks, typically through some sort of `spawn` method. The executor is then responsible for polling all futures until they are completed. The big advantage of managing all futures in a central place is that the executor can switch to a different future whenever a future returns `Poll::Pending`. Thus, asynchronous operations are run in parallel and the CPU is kept busy.

Many executor implementations can also take advantage of systems with multiple CPU cores. They create a [thread pool] that is able to utilize all cores if there is enough work available and use techniques such as [work stealing] to balance the load between cores. There are also special executor implementations for embedded systems that optimize for low latency and low memory overhead.

[thread pool]: https://en.wikipedia.org/wiki/Thread_pool
[work stealing]: https://en.wikipedia.org/wiki/Work_stealing

To avoid the overhead of polling futures repeatedly, executors typically take advantage of the _waker_ API supported by Rust's futures.

#### Wakers

The idea behind the waker API is that a special [`Waker`] type is passed to each invocation of `poll`, wrapped in the [`Context`] type. This `Waker` type is created by the executor and can be used by the asynchronous task to signal its (partial) completion. As a result, the executor does not need to call `poll` on a future that previously returned `Poll::Pending` until it is notified by the corresponding waker.

This is best illustrated by a small example:

[`Context`]: https://doc.rust-lang.org/nightly/core/task/struct.Context.html

```rust
async fn write_file() {
    async_write_file("foo.txt", "Hello").await;
}
```

This function asynchronously writes the string "Hello" to a `foo.txt` file. Since hard disk writes take some time, the first `poll` call on this future will likely return `Poll::Pending`. However, the hard disk driver will internally store the `Waker` passed to the `poll` call and use it to notify the executor when the file has been written to disk. This way, the executor does not need to waste any time trying to `poll` the future again before it receives the waker notification.

We will see how the `Waker` type works in detail when we create our own executor with waker support in the implementation section of this post.

### Cooperative Multitasking?

At the beginning of this post, we talked about preemptive and cooperative multitasking. While preemptive multitasking relies on the operating system to forcibly switch between running tasks, cooperative multitasking requires that the tasks voluntarily give up control of the CPU through a _yield_ operation on a regular basis. The big advantage of the cooperative approach is that tasks can save their state themselves, which results in more efficient context switches and makes it possible to share the same call stack between tasks.

It might not be immediately apparent, but futures and async/await are an implementation of the cooperative multitasking pattern:

- Each future that is added to the executor is basically a cooperative task.
- Instead of using an explicit yield operation, futures give up control of the CPU core by returning `Poll::Pending` (or `Poll::Ready` at the end).
    - There is nothing that forces futures to give up the CPU.
      If they want, they can never return from `poll`, e.g., by spinning endlessly in a loop.
    - Since each future can block the execution of the other futures in the executor, we need to trust them not to be malicious.
- Futures internally store all the state they need to continue execution on the next `poll` call. With async/await, the compiler automatically detects all variables that are needed and stores them inside the generated state machine.
    - Only the minimum state required for continuation is saved.
    - Since the `poll` method gives up the call stack when it returns, the same stack can be used for polling other futures.

We see that futures and async/await fit the cooperative multitasking pattern perfectly; they just use some different terminology. In the following, we will therefore use the terms "task" and "future" interchangeably.

## Implementation

Now that we understand how cooperative multitasking based on futures and async/await works in Rust, it is time to add support for it to our kernel. Since the [`Future`] trait is part of the `core` library and async/await is a feature of the language itself, there is nothing special we need to do to use it in our `#![no_std]` kernel. The only requirement is that we use at least nightly `2020-03-25` of Rust because async/await was not `no_std` compatible before.

With a recent enough nightly, we can start using async/await in our `main.rs`:

```rust
// in src/main.rs

async fn async_number() -> u32 {
    42
}

async fn example_task() {
    let number = async_number().await;
    println!("async number: {}", number);
}
```

The `async_number` function is an `async fn`, so the compiler transforms it into a state machine that implements `Future`. Since the function only returns `42`, the resulting future will directly return `Poll::Ready(42)` on the first `poll` call.
Like `async_number`, the `example_task` function is also an `async fn`. It awaits the number returned by `async_number` and then prints it using the `println` macro.

To run the future returned by `example_task`, we need to call `poll` on it until it signals its completion by returning `Poll::Ready`. To do this, we need to create a simple executor type.

### Task

Before we start the executor implementation, we create a new `task` module with a `Task` type:

```rust
// in src/lib.rs

pub mod task;
```

```rust
// in src/task/mod.rs

use core::{future::Future, pin::Pin};
use alloc::boxed::Box;

pub struct Task {
    future: Pin<Box<dyn Future<Output = ()>>>,
}
```

The `Task` struct is a newtype wrapper around a pinned, heap-allocated, and dynamically dispatched future with the empty type `()` as output. Let's go through it in detail:

- We require that the future associated with a task returns `()`. This means that tasks don't return any result, they are just executed for their side effects. For example, the `example_task` function we defined above has no return value, but it prints something to the screen as a side effect.
- The `dyn` keyword indicates that we store a [_trait object_] in the `Box`. This means that the methods on the future are [_dynamically dispatched_][_despachados dinámicamente_], allowing different types of futures to be stored in the `Task` type. This is important because each `async fn` has its own type and we want to be able to create multiple different tasks.
- As we learned in the [section about pinning][sección sobre pinning], the `Pin<Box>` type ensures that a value cannot be moved in memory by placing it on the heap and preventing the creation of `&mut` references to it. This is important because futures generated by async/await might be self-referential, i.e., contain pointers to themselves that would be invalidated when the future is moved.
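
The role of `dyn` in this type can be tried out on a regular host system. The sketch below is an illustration under several assumptions: it uses `std` rather than the kernel's `no_std` setup, and the `BoxedFuture` alias, the two `async fn`s, the `NoopWake` helper, and `poll_all_once` are hypothetical names. It stores the futures of two different `async fn`s, which have two different anonymous types, in the same `Vec` and polls each of them once.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Same shape as the `future` field of `Task`.
type BoxedFuture = Pin<Box<dyn Future<Output = ()>>>;

async fn first_task() {}

async fn second_task() {
    first_task().await;
}

/// Waker that does nothing; sufficient because nothing here returns `Pending`.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls every stored future once and returns how many completed.
fn poll_all_once() -> usize {
    // Each `async fn` has its own type; both coerce to the same trait object.
    let first: BoxedFuture = Box::pin(first_task());
    let second: BoxedFuture = Box::pin(second_task());
    let mut tasks = vec![first, second];

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let mut ready = 0;
    for task in tasks.iter_mut() {
        // `as_mut` turns `&mut Pin<Box<..>>` into the `Pin<&mut ..>` that `poll` expects.
        if task.as_mut().poll(&mut cx).is_ready() {
            ready += 1;
        }
    }
    ready
}
```

Without `dyn`, the vector would need a single concrete element type, so it could hold the future of only one of the two functions.
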

To allow the creation of new `Task` structs from futures, we create a `new` function:

[_trait object_]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html
[_despachados dinámicamente_]: https://doc.rust-lang.org/book/ch18-02-trait-objects.html#trait-objects-perform-dynamic-dispatch
[sección sobre pinning]: #pinning

```rust
// in src/task/mod.rs

impl Task {
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task {
            future: Box::pin(future),
        }
    }
}
```

The function takes an arbitrary future with an output type of `()` and pins it in memory through the [`Box::pin`] function. Then it wraps the boxed future in the `Task` struct and returns it. The `'static` lifetime is required here because the returned `Task` can live for an arbitrary time, so the future needs to be valid for that time too.

#### Poll

We also add a `poll` method to allow the executor to poll the stored future:

```rust
// in src/task/mod.rs

use core::task::{Context, Poll};

impl Task {
    fn poll(&mut self, context: &mut Context) -> Poll<()> {
        self.future.as_mut().poll(context)
    }
}
```

Since the [`poll`] method of the `Future` trait expects to be called on a `Pin<&mut T>` type, we use the [`Pin::as_mut`] method to convert the `self.future` field of type `Pin<Box<T>>` first. Then we call `poll` on the converted `self.future` field and return the result. Since the `Task::poll` method should only be called by the executor that we'll create in a moment, we keep the function private to the `task` module.

### Simple Executor

Since executors can be quite complex, we deliberately start by creating a very basic executor before implementing a more featureful executor later.
For this, we first create a new `task::simple_executor` submodule:

```rust
// in src/task/mod.rs

pub mod simple_executor;
```

```rust
// in src/task/simple_executor.rs

use super::Task;
use alloc::collections::VecDeque;

pub struct SimpleExecutor {
    task_queue: VecDeque<Task>,
}

impl SimpleExecutor {
    pub fn new() -> SimpleExecutor {
        SimpleExecutor {
            task_queue: VecDeque::new(),
        }
    }

    pub fn spawn(&mut self, task: Task) {
        self.task_queue.push_back(task)
    }
}
```

The struct contains a single `task_queue` field of type [`VecDeque`], which is basically a vector that allows push and pop operations on both ends. The idea behind using this type is that we insert new tasks through the `spawn` method at the end and pop the next task for execution from the front. This way, we get a simple [FIFO queue][cola FIFO] (_"first in, first out"_).

[`VecDeque`]: https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html
[cola FIFO]: https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)

#### Dummy Waker

In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing. For this, we create a [`RawWaker`] instance, which defines the implementation of the different `Waker` methods, and then use the [`Waker::from_raw`] function to turn it into a `Waker`:

[`RawWaker`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html
[`Waker::from_raw`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.from_raw

```rust
// in src/task/simple_executor.rs

use core::task::{Waker, RawWaker};

fn dummy_raw_waker() -> RawWaker {
    todo!();
}

fn dummy_waker() -> Waker {
    unsafe { Waker::from_raw(dummy_raw_waker()) }
}
```

The `from_raw` function is unsafe because undefined behavior can occur if the programmer does not uphold the documented requirements of `RawWaker`.
Before we look at the implementation of the `dummy_raw_waker` function, we first try to understand how the `RawWaker` type works.

##### `RawWaker`

The [`RawWaker`] type requires the programmer to explicitly define a [_virtual method table_][_tabla de métodos virtuales_] (_vtable_) that specifies the functions that should be called when the `RawWaker` is cloned, woken, or dropped. The layout of this vtable is defined by the [`RawWakerVTable`] type. Each function receives a `*const ()` argument, which is a _type-erased_ pointer to some value. The reason for using a `*const ()` pointer instead of a proper reference is that the `RawWaker` type should be non-generic but still support arbitrary types. The pointer is provided by passing it as the `data` argument of the [`RawWaker::new`] call, which just initializes a `RawWaker`. The `Waker` then uses this `RawWaker` to call the vtable functions with `data`.

[_tabla de métodos virtuales_]: https://en.wikipedia.org/wiki/Virtual_method_table
[`RawWakerVTable`]: https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html
[`RawWaker::new`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html#method.new

Typically, the `RawWaker` is created for some heap-allocated struct that is wrapped in the [`Box`] or [`Arc`] type. For such types, methods like [`Box::into_raw`] can be used to convert the `Box<T>` to a `*const T` pointer. This pointer can then be cast to an anonymous `*const ()` pointer and passed to `RawWaker::new`. Since each vtable function receives the same `*const ()` as an argument, the functions can cast the pointer back to a `Box<T>` or a `&T` to operate on it, provided the pointer really originated from a value of that type. As you can imagine, this process is highly dangerous and can easily lead to undefined behavior on mistakes. For this reason, manually creating a `RawWaker` is not recommended unless necessary.
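To make the vtable mechanics more concrete, here is a minimal sketch of a `RawWaker` that is backed by an `Arc<AtomicUsize>` and simply counts how often it is woken. It is purely illustrative and assumes a hosted environment with `std`; the names `counting_waker`, `VTABLE`, and the `*_raw` functions are invented for this example and do not appear in the post's code. The reference-count handling in each function is exactly the part that is easy to get wrong.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{RawWaker, RawWakerVTable, Waker};

// The vtable lists the functions for clone, wake, wake_by_ref, and drop.
static VTABLE: RawWakerVTable =
    RawWakerVTable::new(clone_raw, wake_raw, wake_by_ref_raw, drop_raw);

// Cloning must produce a second owner, so the reference count is incremented.
unsafe fn clone_raw(data: *const ()) -> RawWaker {
    unsafe { Arc::increment_strong_count(data as *const AtomicUsize) };
    RawWaker::new(data, &VTABLE)
}

// `wake` consumes the waker: rebuild the `Arc` so it is dropped at the end.
unsafe fn wake_raw(data: *const ()) {
    let counter = unsafe { Arc::from_raw(data as *const AtomicUsize) };
    counter.fetch_add(1, Ordering::SeqCst);
}

// `wake_by_ref` must not take ownership, so it only borrows the counter.
unsafe fn wake_by_ref_raw(data: *const ()) {
    let counter = unsafe { &*(data as *const AtomicUsize) };
    counter.fetch_add(1, Ordering::SeqCst);
}

// Dropping releases the reference that this waker owned.
unsafe fn drop_raw(data: *const ()) {
    drop(unsafe { Arc::from_raw(data as *const AtomicUsize) });
}

// Turns the `Arc` into a type-erased pointer and wraps it in a `Waker`.
fn counting_waker(counter: Arc<AtomicUsize>) -> Waker {
    let data = Arc::into_raw(counter) as *const ();
    // Safety: all vtable functions treat `data` as an `Arc<AtomicUsize>`
    // and keep its reference count balanced.
    unsafe { Waker::from_raw(RawWaker::new(data, &VTABLE)) }
}
```

Every vtable function casts the same `*const ()` back to the original type, which only works because `counting_waker` is the sole place where the pointer is created.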
[`Box`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html
[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html
[`Box::into_raw`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html#method.into_raw

##### A Dummy `RawWaker`

While manually creating a `RawWaker` is not recommended, it is how we build our dummy `Waker` that does nothing here. Fortunately, the fact that we want to do nothing makes it relatively safe to implement the `dummy_raw_waker` function:

```rust
// in src/task/simple_executor.rs

use core::task::RawWakerVTable;

fn dummy_raw_waker() ->
================================================
FILE: blog/content/edition-2/posts/12-async-await/index.ja.md
================================================
+++
title = "Async/Await"
weight = 12
path = "ja/async-await"
date = 2020-03-27

[extra]
# Please update this when updating the translation
translation_based_on_commit = "bf4f88107966c7ab1327c3cdc0ebfbd76bad5c5f"
# GitHub usernames of the authors of this translation
translators = ["kahirokunn", "garasubo", "sozysozbot", "swnakamura"]
# GitHub usernames of the people that contributed to this translation
translation_contributors = ["asami-kawasaki", "Foo-x"]
+++

In this post, we explore **cooperative multitasking** and the **async/await** feature of Rust. We take a detailed look at how async/await works in Rust, including the design of the `Future` trait, the state machine transformation, and **pinning**. We then add basic support for async/await to our kernel by creating an asynchronous keyboard task and a basic executor.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there (translator's note: the link points to the English original). You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-12`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-12

## Multitasking

One of the fundamental features of most operating systems is [**multitasking**][**マルチタスク**], which is the ability to execute multiple tasks concurrently. For example, you probably have other programs open while looking at this post, such as a text editor or a terminal window. Even if you have only a single browser window open, there are probably various background tasks for managing your desktop windows, checking for updates, or indexing files.
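[**マルチタスク**]: https://en.wikipedia.org/wiki/Computer_multitasking

While it seems like all tasks run in parallel, only a single task can be executed on a CPU core at a time. To create the illusion that the tasks run in parallel, the operating system rapidly switches between active tasks so that each one can make a bit of progress. Since computers are fast, we don't notice these switches most of the time.

While single-core CPUs can only execute a single task at a time, multi-core CPUs can run multiple tasks in a truly parallel way. For example, a CPU with 8 cores can run 8 tasks at the same time. We will explain how to set up multi-core CPUs in a future post. For this post, we will focus on single-core CPUs for simplicity. (It's worth noting that all multi-core CPUs start with only a single active core, so we can treat them as single-core CPUs for now.)

There are two forms of multitasking: **Cooperative** multitasking requires tasks to regularly give up control of the CPU so that other tasks can make progress. **Preemptive** multitasking uses operating system functionality to switch threads at arbitrary points in time by forcibly pausing them. In the following, we will explore the two forms of multitasking and discuss their respective advantages and drawbacks.

### Preemptive Multitasking

The idea behind preemptive multitasking is that the operating system controls when to switch tasks. For that, it utilizes the fact that it regains control of the CPU on each interrupt. This makes it possible to switch tasks whenever new input is available to the system, for example, when the mouse is moved or a network packet arrives. The operating system can also determine the exact time that a task is allowed to run by configuring a hardware timer to send an interrupt after that time.

The following graphic illustrates the task switching process on a hardware interrupt:

![](regain-control-on-interrupt.svg)

In the first row, the CPU is executing task `A1` of program `A`. All other tasks are paused. In the second row, a hardware interrupt arrives at the CPU. As described in the [**hardware interrupts**][**ハードウェア割り込み**] post (translator's note: the linked post was not yet translated at the time of this translation), the CPU immediately stops the execution of task `A1` and jumps to the interrupt handler defined in the interrupt descriptor table (IDT). Through this interrupt handler, the operating system now has control of the CPU again, which allows it to switch to task `B1` instead of continuing task `A1`.

[**ハードウェア割り込み**]: @/edition-2/posts/07-hardware-interrupts/index.md

#### Saving State

Since tasks are interrupted at arbitrary points in time, they might be in the middle of some calculations. In order to be able to resume them later, the operating system must back up the whole state of the task, including its [call stack][コールスタック] and the values of all CPU registers. This process is called a [context switch][コンテキスト・スイッチ (context switch)].

[コールスタック]: https://ja.wikipedia.org/wiki/%E3%82%B3%E3%83%BC%E3%83%AB%E3%82%B9%E3%82%BF%E3%83%83%E3%82%AF
[コンテキスト・スイッチ (context switch)]: https://ja.wikipedia.org/wiki/%E3%82%B3%E3%83%B3%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88%E3%82%B9%E3%82%A4%E3%83%83%E3%83%81

As the call stack can be very large, the operating system typically sets up a separate call stack for each task instead of backing up the call stack content on each task switch. Such a task with its own stack is called a [**thread** for short][_thread of execution_]. By using a separate stack for each task, only the register contents need to be saved on a context switch (including the program counter and stack pointer). This approach minimizes the performance overhead of a context switch, which is very important since context switches often occur up to 100 times per second.

[_thread of execution_]: https://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%AC%E3%83%83%E3%83%89_(%E3%82%B3%E3%83%B3%E3%83%94%E3%83%A5%E3%83%BC%E3%82%BF)

#### Discussion

The main advantage of preemptive multitasking is that the operating system can fully control the allowed execution time of a task. This way, it can guarantee that each task gets a fair share of the CPU time, without the need to trust the tasks to cooperate. This is especially important when running third-party tasks or when multiple users share a system.

The disadvantage of preemption is that each task requires its own stack. Compared to a shared stack, this results in higher memory usage per task and often limits the number of tasks in the system. Another disadvantage is that the operating system always has to save the complete CPU register state on each task switch, even if the task only used a small subset of the registers.

Preemptive multitasking and threads are fundamental components of an operating system because they make it possible to run untrusted userspace programs. We will discuss these concepts in full detail in future posts. For this post, however, we will focus on cooperative multitasking, which also provides useful capabilities for our kernel.

### Cooperative Multitasking

Instead of forcibly pausing running tasks at arbitrary points in time, cooperative multitasking lets each task run until it voluntarily gives up control of the CPU. This allows tasks to pause themselves at convenient points in time, for example, when they need to wait for an I/O operation anyway.

Cooperative multitasking is often used at the language level, in the form of [coroutines][コルーチン] or [async/await]. The idea is that either the programmer or the compiler inserts [_yield_] operations into the program, which give up control of the CPU and allow other tasks to run. For example, a yield could be inserted after each iteration of a complex loop.

[コルーチン]: https://ja.wikipedia.org/wiki/%E3%82%B3%E3%83%AB%E3%83%BC%E3%83%81%E3%83%B3
[async/await]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html
[_yield_]: https://en.wikipedia.org/wiki/Yield_(multithreading)

It is common to combine cooperative multitasking with [asynchronous I/O][非同期I/O]. Instead of waiting until an operation is finished and preventing other tasks from running during this time, asynchronous operations return a "not ready" status if the operation is not finished yet. In this case, the waiting task can execute a yield operation to let other tasks run.

[非同期I/O]: https://ja.wikipedia.org/wiki/%E9%9D%9E%E5%90%8C%E6%9C%9FIO

#### Saving State

Since tasks define their pause points themselves, they don't need the operating system to save their state. Instead, they can save exactly the state they need for continuation before they pause themselves, which often results in better performance. For example, a task that just finished a complex computation might only need to back up the final result of the computation since it does not need the intermediate results anymore.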
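As a toy illustration of cooperative task switching, the sketch below runs tasks round-robin on a single call stack. Each task keeps only the state it needs to continue (here, a counter) and returns to the scheduler voluntarily after one unit of work. Everything in it, including the `Step` enum, the `Counter` task, and `run_round_robin`, is an assumption made for this example and is unrelated to the executor built later in the post.

```rust
use std::collections::VecDeque;

// Result of running a task for one step.
enum Step {
    Yielded, // the task gave up the CPU voluntarily but is not finished
    Done,    // the task ran to completion
}

// Hypothetical task: counts up to `limit` and yields after each increment.
// The fields are the only state that survives across yields.
struct Counter {
    name: &'static str,
    current: u32,
    limit: u32,
}

impl Counter {
    // Performs one unit of work and records it in `log`.
    fn step(&mut self, log: &mut Vec<String>) -> Step {
        if self.current == self.limit {
            return Step::Done;
        }
        self.current += 1;
        log.push(format!("{}{}", self.name, self.current));
        if self.current == self.limit {
            Step::Done
        } else {
            Step::Yielded
        }
    }
}

// Runs all tasks in FIFO order until each one is done.
// A task that never returned from `step` would block all others.
fn run_round_robin(mut tasks: VecDeque<Counter>) -> Vec<String> {
    let mut log = Vec::new();
    while let Some(mut task) = tasks.pop_front() {
        match task.step(&mut log) {
            Step::Yielded => tasks.push_back(task),
            Step::Done => {}
        }
    }
    log
}
```

With a task `A` that counts to 2 and a task `B` that counts to 1, the recorded order is `A1`, `B1`, `A2`: the tasks interleave only at the points where they return to the scheduler.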
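Language-supported implementations of cooperative tasks are often even able to back up the required parts of the call stack before pausing. As an example, Rust's async/await implementation stores all local variables that are still needed in an automatically generated struct (see below). By backing up the relevant parts of the call stack before pausing, all tasks can share a single call stack, which results in much lower memory consumption per task. This makes it possible to create an almost arbitrary number of cooperative tasks without running out of memory.

#### Discussion

The drawback of cooperative multitasking is that an uncooperative task can potentially run for an unlimited amount of time. Thus, a malicious or buggy task can prevent other tasks from running and slow down or even block the whole system. For this reason, cooperative multitasking should only be used when all tasks are known to cooperate. As a counterexample, it's not a good idea to make the operating system rely on the cooperation of arbitrary user-level programs.

However, the strong performance and memory benefits of cooperative multitasking make it a good approach for usage **within** a program, especially in combination with asynchronous operations. Since an operating system kernel is a performance-critical program that interacts with asynchronous hardware, cooperative multitasking seems like a good approach for implementing concurrency.

## Async/Await in Rust

The Rust language provides first-class support for cooperative multitasking in the form of async/await. Before we can explore what async/await is and how it works, we need to understand how **futures** and asynchronous programming work in Rust.

### Futures

A **future** represents a value that might not be available yet. This could be, for example, an integer that is computed by another task or a file that is downloaded from the network. Instead of waiting until the value is available, futures make it possible to continue execution until the value is needed.

#### Example

The concept of futures is best illustrated with a small example:

![Sequence diagram: main calls `read_file` and is blocked until it returns; then it calls `foo()` and is also blocked until it returns. The same process is repeated, but this time `async_read_file` is called, which directly returns a future; then `foo()` is called again, which now runs concurrently with the file load. The file is available before `foo()` returns.](async-example.svg)

This sequence diagram shows a `main` function that reads a file from the file system and then calls a function `foo`. This process is repeated two times: once with a synchronous `read_file` call and once with an asynchronous `async_read_file` call.

With the synchronous call, the `main` function needs to wait until the file is loaded from the file system. Only then can it call the `foo` function, which requires it to again wait for the result.

With the asynchronous `async_read_file` call, the file system directly returns a future and loads the file asynchronously in the background. This allows the `main` function to call `foo` much earlier, which then runs in parallel with the file load. In this example, the file load even finishes before `foo` returns, so `main` can directly work with the file without further waiting after `foo` returns.

#### Futures in Rust

In Rust, futures are represented by the [`Future`] trait, which looks like this:

[`Future`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html

```rust
pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>;
}
```

The [associated type][関連型] `Output` specifies the type of the asynchronous value. For example, the `async_read_file` function in the diagram above would return a `Future` instance with `Output` set to `File`.

[関連型]: https://doc.rust-lang.org/book/ch20-02-advanced-traits.html#associated-types

The [`poll`] method allows checking whether the value is already available. It returns a [`Poll`] enum, which looks like this:

[`poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll
[`Poll`]: https://doc.rust-lang.org/nightly/core/task/enum.Poll.html

```rust
pub enum Poll<T> {
    Ready(T),
    Pending,
}
```

When the value is already available (e.g., the file was fully read from disk), it is returned wrapped in the `Ready` variant. Otherwise, the `Pending` variant is returned, which signals to the caller that the value is not yet available.

The `poll` method takes two arguments: `self: Pin<&mut Self>` and `cx: &mut Context`. The former behaves similarly to a normal `&mut self` reference, except that the `Self` value is [_pinned_] to its memory location. Understanding `Pin` and why it is needed is difficult without understanding how async/await works first. We will therefore explain it later in this post.

[_pinned_]: https://doc.rust-lang.org/nightly/core/pin/index.html

The purpose of the `cx: &mut Context` parameter is to pass a [`Waker`] instance to the asynchronous task, e.g., the file system load. This `Waker` allows the asynchronous task to signal that it (or a part of it) is finished, e.g., that the file was loaded from disk. Since the main task knows that it will be notified when the `Future` is ready, it does not need to call `poll` over and over again. We will explain this process in more detail later in this post when we implement our own waker type.

[`Waker`]: https://doc.rust-lang.org/nightly/core/task/struct.Waker.html

### Working with Futures

We now know how futures are defined and understand the basic idea behind the `poll` method. However, we still don't know how to work with futures effectively. The problem is that futures represent the results of asynchronous tasks, which might not be available yet. In practice, however, we often need these values directly for further calculations. So the question is: How can we efficiently retrieve the value of a future when we need it?

#### Waiting on Futures

One possible answer is to wait until a future becomes ready. This could look something like this:

```rust
let future = async_read_file("foo.txt");
let file_content = loop {
    match future.poll(…) {
        Poll::Ready(value) => break value,
        Poll::Pending => {}, // do nothing
    }
}
```

Here we _actively_ wait for the future by calling `poll` over and over again in a loop. The arguments to `poll` don't matter here, so we omitted them. While this solution works, it is very inefficient because we keep the CPU busy until the value becomes available.

A more efficient approach could be to **block** the current thread until the future becomes available. This is, of course, only possible if you have threads, so this solution does not work for our kernel, at least not yet. Even on systems where blocking is supported, it is often not desired because it turns an asynchronous task into a synchronous task again, thereby inhibiting the potential performance benefits of parallel tasks.

#### Future Combinators

An alternative to waiting is to use future combinators. Future combinators are methods like `map` that allow chaining and combining futures together, similar to the methods of the [`Iterator`] trait. Instead of waiting on the future, these combinators return a future themselves, which applies the mapping operation on `poll`.

[`Iterator`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html

As an example, a simple `string_len` combinator for converting a `Future<Output = String>` to a `Future<Output = usize>` could look like this:

```rust
struct StringLen<F> {
    inner_future: F,
}

impl<F> Future for StringLen<F> where F: Future<Output = String> {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.inner_future.poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(string: impl Future<Output = String>)
    -> impl Future<Output = usize>
{
    StringLen {
        inner_future: string,
    }
}

// Usage
fn file_len() -> impl Future<Output = usize> {
    let file_content_future = async_read_file("foo.txt");
    string_len(file_content_future)
}
```

This code does not quite work because it does not handle [**pinning**][**ピン留め**], but it suffices as an example. The basic idea is that the `string_len` function wraps a given `Future` instance into a new `StringLen` struct, which also implements `Future`. When the wrapped future is polled, it polls the inner future. If the value is not ready yet, `Poll::Pending` is returned from the wrapped future too. If the value is ready, the string is extracted from the `Poll::Ready` variant and its length is calculated. Afterwards, it is wrapped in `Poll::Ready` again and returned.

[**ピン留め**]: https://doc.rust-lang.org/stable/core/pin/index.html

With this `string_len` function, we can calculate the length of an asynchronous string without waiting for it. Since the function returns a `Future` again, the caller can't work directly on the returned value, but needs to use combinator functions again. This way, the whole call graph becomes asynchronous and we can efficiently wait for multiple futures at once at some point, e.g., in the main function.

Because manually writing combinator functions is difficult, they are often provided by libraries. While the Rust standard library itself provides no combinator methods yet, the semi-official (and `no_std` compatible) [`futures`] crate does. Its [`FutureExt`] trait provides high-level combinator methods such as [`map`] or [`then`], which can be used to manipulate the result with arbitrary closures.

[`futures`]: https://docs.rs/futures/0.3.4/futures/
[`FutureExt`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html
[`map`]: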
https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.map
[`then`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.then

##### Advantages

The big advantage of future combinators is that they keep the operations asynchronous. In combination with asynchronous I/O interfaces, this approach can lead to very high performance. The fact that future combinators are implemented as normal structs with trait implementations allows the compiler to optimize them very well. For more details, see the [_Zero-cost futures in Rust_] post, which announced the addition of futures to the Rust ecosystem.

[_Zero-cost futures in Rust_]: https://aturon.github.io/blog/2016/08/11/futures/

##### Drawbacks {#drawbacks}

While future combinators make it possible to write very efficient code, they can be difficult to use in some situations because of the type system and the closure-based interface. For example, consider code like this:

```rust
fn example(min_len: usize) -> impl Future<Output = String> {
    async_read_file("foo.txt").then(move |content| {
        if content.len() < min_len {
            Either::Left(async_read_file("bar.txt").map(|s| content + &s))
        } else {
            Either::Right(future::ready(content))
        }
    })
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=91fc09024eecb2448a85a7ef6a97b8d8))

Here we read the file `foo.txt` and then use the [`then`] combinator to chain a second future based on the file content. If the content length is smaller than the given `min_len`, we read a different `bar.txt` file and append it to `content` using the [`map`] combinator. Otherwise, we return only the content of `foo.txt`.

We need to use the [`move` keyword][`move` キーワード] for the closure passed to `then` because otherwise there would be a lifetime error for `min_len`. The reason for the [`Either`] wrapper is that `if` and `else` blocks must always have the same type. Since we return different future types in the blocks, we must use the wrapper type to unify them into a single type. The [`ready`] function wraps a value that is already available into a future that is immediately ready. The function is required here because the `Either` wrapper expects that the wrapped value implements `Future`.

[`move` キーワード]: https://doc.rust-lang.org/std/keyword.move.html
[`Either`]: https://docs.rs/futures/0.3.4/futures/future/enum.Either.html
[`ready`]: https://docs.rs/futures/0.3.4/futures/future/fn.ready.html

As you can imagine, this can quickly lead to very complex code for larger projects. It gets especially complicated if borrowing and different lifetimes are involved. For this reason, a lot of work was invested in adding support for async/await to Rust, with the goal of making asynchronous code radically simpler to write.

### The Async/Await Pattern
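The idea behind async/await is to let the programmer write code that _looks_ like normal synchronous code, but is turned into asynchronous code by the compiler. It works based on the two keywords `async` and `await`. The `async` keyword can be used in a function signature to turn a synchronous function into an asynchronous function that returns a future:

```rust
async fn foo() -> u32 {
    0
}

// the above is roughly translated by the compiler to:
fn foo() -> impl Future<Output = u32> {
    future::ready(0)
}
```

This keyword alone wouldn't be that useful. However, inside `async` functions, the `await` keyword can be used to retrieve the asynchronous value of a future:

```rust
async fn example(min_len: usize) -> String {
    let content = async_read_file("foo.txt").await;
    if content.len() < min_len {
        content + &async_read_file("bar.txt").await
    } else {
        content
    }
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=d93c28509a1c67661f31ff820281d434))

This function is a direct translation of the `example` function from [above](#drawbacks) that used combinator functions. Using the `.await` operator, we can retrieve the value of a future without needing any closures or `Either` types. As a result, we can write our code like we write normal synchronous code, with the difference that _this is still asynchronous code_.

#### State Machine Transformation

Behind the scenes, the compiler converts the body of the `async` function into a [**state machine**][**ステートマシン (state machine)**], with each `.await` call representing a different state. For the above `example` function, the compiler creates a state machine with the following four states:

[**ステートマシン (state machine)**]: https://en.wikipedia.org/wiki/Finite-state_machine

![Four states: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-states.svg)

Each state represents a different pause point in the function. The **"Start"** and **"End"** states represent the function at the beginning and end of its execution. The **"Waiting on foo.txt"** state represents that the function is currently waiting for the first `async_read_file` result. Similarly, the **"Waiting on bar.txt"** state represents the pause point where the function is waiting on the second `async_read_file` result.

The state machine implements the `Future` trait by making each `poll` call a possible state transition:

![Four states: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-basic.svg)

The diagram uses arrows to represent state switches and diamond shapes to represent alternative ways. For example, if the `foo.txt` file is not ready, the path marked with **"no"** is taken and the **"Waiting on foo.txt"** state is reached. Otherwise, the **"yes"** path is taken. The small red diamond without a caption represents the `if content.len() < min_len` branch of the `example` function.

We see that the first `poll` call starts the function and lets it run until it reaches a future that is not ready yet. If all futures on the path are ready, the function can run till the **"End"** state, where it returns its result wrapped in `Poll::Ready`. Otherwise, the state machine enters a waiting state and returns `Poll::Pending`. On the next `poll` call, the state machine then starts from the last waiting state and retries the last operation.

#### Saving State

In order to be able to continue from the last waiting state, the state machine must keep track of the current state internally. In addition, it must save all the variables that it needs to continue execution on the next `poll` call. This is where the compiler can really shine: Since it knows which variables are used when, it can automatically generate structs with exactly the variables that are needed.

As an example, the compiler generates structs like the following for the above `example` function:

```rust
// The `example` function again so that you don't have to scroll up
async fn example(min_len: usize) -> String {
    let content = async_read_file("foo.txt").await;
    if content.len() < min_len {
        content + &async_read_file("bar.txt").await
    } else {
        content
    }
}

// The compiler-generated state structs:

struct StartState {
    min_len: usize,
}

struct WaitingOnFooTxtState {
    min_len: usize,
    foo_txt_future: impl Future<Output = String>,
}

struct WaitingOnBarTxtState {
    content: String,
    bar_txt_future: impl Future<Output = String>,
}

struct EndState {}
```

In the **"Start"** and **"Waiting on foo.txt"** states, the `min_len` parameter needs to be stored for the later comparison with `content.len()`. The **"Waiting on foo.txt"** state additionally stores a `foo_txt_future`, which represents the future returned by the `async_read_file` call. This future needs to be polled again when the state machine continues, so it needs to be saved.

The **"Waiting on bar.txt"** state contains the `content` variable for the later string concatenation when `bar.txt` is ready. It also stores a `bar_txt_future` that represents the in-progress load of `bar.txt`. The struct does not contain the `min_len` variable because it is no longer needed after the `content.len()` comparison. In the **"End"** state, no variables are stored because the function has already run to completion.

Keep in mind that this is only an example of the code that the compiler could generate. The struct names and the field layout are implementation details and might be different.

#### The Full State Machine Type

While the exact compiler-generated code is an implementation detail, it helps in understanding to imagine how the generated state machine _could_ look for the `example` function. We already defined the structs representing the different states and containing the required variables. To create a state machine on top of them, we can combine them into an [`enum`]:

[`enum`]: https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html

```rust
enum ExampleStateMachine {
    Start(StartState),
    WaitingOnFooTxt(WaitingOnFooTxtState),
    WaitingOnBarTxt(WaitingOnBarTxtState),
    End(EndState),
}
```

We define a separate enum variant for each state and add the corresponding state struct to each variant as a field. To implement the state transitions, the compiler generates an implementation of the `Future` trait based on the `example` function:

```rust
impl Future for ExampleStateMachine {
    type Output = String; // return type of `example`

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        loop {
            match self { // TODO: handle pinning
                ExampleStateMachine::Start(state) => {…}
                ExampleStateMachine::WaitingOnFooTxt(state) => {…}
                ExampleStateMachine::WaitingOnBarTxt(state) => {…}
                ExampleStateMachine::End(state) => {…}
            }
        }
    }
}
```

The `Output` type of the future is `String` because it's the return type of the `example` function. To implement the `poll` function, we use a `match` statement on the current state inside a `loop`. The idea is that we switch to the next state as long as possible and use an explicit `return Poll::Pending` when we can't continue.

For simplicity, we only show simplified code and don't handle [pinning][**ピン留め**], ownership, lifetimes, etc. So this and the following code should be treated as pseudo-code and not used directly. Of course, the real compiler-generated code handles everything correctly, albeit possibly in a different way.

To keep the code excerpts small, we present the code for each `match` arm separately. Let's begin with the `Start` state:

```rust
ExampleStateMachine::Start(state) => {
    // from body of `example`
    let foo_txt_future = async_read_file("foo.txt");
    // `.await` operation
    let state = WaitingOnFooTxtState {
        min_len: state.min_len,
        foo_txt_future,
    };
    *self = ExampleStateMachine::WaitingOnFooTxt(state);
}
```

The state machine is in the `Start` state when it is right at the beginning of the function. In this case, we execute all the code from the body of the `example` function until the first `.await`. To handle the `.await` operation, we change the state of the `self` state machine to `WaitingOnFooTxt`, which includes the construction of the `WaitingOnFooTxtState` struct.

Since the `match self {…}` statement is executed in a loop, the execution jumps to the `WaitingOnFooTxt` arm next:

```rust
ExampleStateMachine::WaitingOnFooTxt(state) => {
    match state.foo_txt_future.poll(cx) {
        Poll::Pending => return Poll::Pending,
        Poll::Ready(content) => {
            // from body of `example`
            if content.len() < state.min_len {
                let bar_txt_future = async_read_file("bar.txt");
                // `.await` operation
                let state = WaitingOnBarTxtState {
                    content,
                    bar_txt_future,
                };
                *self = ExampleStateMachine::WaitingOnBarTxt(state);
            } else {
                *self = ExampleStateMachine::End(EndState {});
                return Poll::Ready(content);
            }
        }
    }
}
```

In this `match` arm, we first call the `poll` function of the `foo_txt_future`. If it is not ready, we exit the loop and return `Poll::Pending`. Since `self` stays in the `WaitingOnFooTxt` state in this case, the next `poll` call on the state machine will enter the same `match` arm and retry polling the `foo_txt_future`.

When the `foo_txt_future` is ready, we assign the result to the `content` variable and continue to execute the code of the `example` function: If `content.len()` is smaller than the `min_len` saved in the state struct, the `bar.txt` file is read asynchronously. We again translate the `.await` operation into a state change, this time into the `WaitingOnBarTxt` state. Since we're executing the `match` inside a loop, the execution directly jumps to the `match` arm for the new state afterward, where the `bar_txt_future` is polled.

In case we enter the `else` branch, no further `.await` operation occurs. We reach the end of the function and return `content` wrapped in `Poll::Ready`. We also change the current state to the `End` state.

The code for the `WaitingOnBarTxt` state looks like this:

```rust
ExampleStateMachine::WaitingOnBarTxt(state) => {
    match state.bar_txt_future.poll(cx) {
        Poll::Pending => return Poll::Pending,
        Poll::Ready(bar_txt) => {
            *self = ExampleStateMachine::End(EndState {});
            // from body of `example`
            return Poll::Ready(state.content + &bar_txt);
        }
    }
}
```

Similar to the `WaitingOnFooTxt` state, we start by polling the `bar_txt_future`. If it is still pending, we exit the loop and return `Poll::Pending`. Otherwise, we can perform the last operation of the `example` function: concatenating the `content` variable with the result from the future. We update the state machine to the `End` state and then return the result wrapped in `Poll::Ready`.

Finally, the code for the `End` state looks like this:

```rust
ExampleStateMachine::End(_) => {
    panic!("poll called after Poll::Ready was returned");
}
```

Futures should not be polled again after they returned `Poll::Ready`, so we panic if `poll` is called while we are already in the `End` state.

We now know what the compiler-generated state machine and its implementation of the `Future` trait _could_ look like. In practice, the compiler generates code in a different way. (In case you're interested, the implementation is currently based on [_coroutines_], but this is only an implementation detail.)

[_coroutines_]: https://doc.rust-lang.org/stable/unstable-book/language-features/coroutines.html

The last piece of the puzzle is the generated code for the `example` function itself. Remember, the function header was defined like this:

```rust
async fn example(min_len: usize) -> String
```

Since the complete function body is now implemented by the state machine, the only thing that the function needs to do is to initialize the state machine and return it. The generated code for this could look like this:

```rust
fn example(min_len: usize) -> ExampleStateMachine {
    ExampleStateMachine::Start(StartState {
        min_len,
    })
}
```

The function no longer has an `async` modifier since it now explicitly returns an `ExampleStateMachine` type, which implements the `Future` trait. As expected, the state machine is constructed in the `Start` state and the corresponding state struct is initialized with the `min_len` parameter.

Note that this function does not start the execution of the state machine. This reflects a fundamental design decision of futures in Rust: they do nothing until they are polled for the first time.

### Pinning

We already stumbled across _pinning_ multiple times in this post. Now is finally the time to explore what pinning is and why it is needed.

#### Self-Referential Structs

As explained above, the state machine transformation stores the local variables of each pause point in a struct. For small examples like our `example` function, this was straightforward and did not lead to any problems. However, things become more difficult when variables reference each other. For example, consider this function:

```rust
async fn pin_example() -> i32 {
    let array = [1, 2, 3];
    let element = &array[2];
    async_write_file("foo.txt", element.to_string()).await;
    *element
}
```

This function creates a small `array` with the contents `1`, `2`, and `3`. It then creates a reference to the last array element and stores it in an `element` variable. Next, it asynchronously writes the number converted to a string to a `foo.txt` file. Finally, it returns the number referenced by `element`.

Since the function uses a single `await` operation, the resulting state machine has three states: start, end, and "waiting on write". The function takes no arguments, so the struct for the start state is empty. Like before, the struct for the end state is empty because the function is finished at this point. The struct for the "waiting on write" state is more interesting:

```rust
struct WaitingOnWriteState {
    array: [1, 2, 3],
    element: 0x1001c, // address of the last array element
}
```

We need to store both the `array` and `element` variables because `element` is required for the return value and `array` is referenced by `element`. Since `element` is a reference, it stores a **pointer** (i.e., a memory address) to the referenced element. We used `0x1001c` as an example memory address here. In reality, it needs to be the address of the last element of the `array` field, so it depends on where the struct lives in memory. Structs with such internal pointers are called **self-referential** structs because they reference themselves from one of their fields.

#### The Problem with Self-Referential Structs

The internal pointer of our self-referential struct leads to a fundamental problem, which becomes apparent when we look at its memory layout:

![array at 0x10014 with fields 1, 2, and 3; element at address 0x10020, pointing to the last array element at 0x1001c](self-referential-struct.svg)

The `array` field starts at address 0x10014 and the `element` field at address 0x10020. It points to address 0x1001c because the last array element lives at this address. At this point, everything is still fine. However, an issue occurs when we move this struct to a different memory address:

![array at 0x10024 with fields 1, 2, and 3; element at address 0x10030, still pointing to 0x1001c, even though the last array element now lives at 0x1002c](self-referential-struct-moved.svg)

We moved the struct a bit so that it starts at address `0x10024` now. This could, for example, happen when we pass the struct as a function argument or assign it to a different stack variable. The problem is that the `element` field still points to address `0x1001c` even though the last `array` element now lives at address `0x1002c`. Thus, the pointer is dangling, with the result that undefined behavior occurs on the next `poll` call.

#### Possible Solutions

There are three fundamental approaches to solving the dangling pointer problem:

- **Update the pointer on move:** The idea is to update the internal pointer whenever the struct is moved in memory so that it is still valid after the move. Unfortunately, this approach would require extensive changes to Rust that would result in potentially huge performance losses. The reason is that some kind of runtime would need to keep track of the type of all struct fields and check on every move operation whether a pointer update is required.
- **Store an offset instead of self-references:** To avoid the requirement for updating pointers, the compiler could try to store self-references as offsets from the struct's beginning instead. For example, the `element` field of the above `WaitingOnWriteState` struct could be stored in the form of an `element_offset` field with a value of 8 because the array element that the reference points to starts 8 bytes after the struct's beginning. Since the offset stays the same when the struct is moved, no field updates are required.

  The problem with this approach is that it requires the compiler to detect all self-references. This is not possible at compile time because the value of a reference might depend on user input, so we would need a runtime system again to analyze references and correctly create the state structs. This would not only result in runtime costs but also prevent certain compiler optimizations, so that it would cause large performance losses again.
- **Forbid moving the struct:** As we saw above, the dangling pointer only occurs when we move the struct in memory. By completely forbidding move operations on self-referential structs, the problem can also be avoided. The big advantage of this approach is that it can be implemented at the type system level without additional runtime costs. The drawback is that it puts the burden of dealing with move operations on possibly self-referential structs on the programmer.

Rust chose the third solution because of its principle of providing **zero-cost abstractions**, which means that abstractions should not impose additional runtime costs. The [**pinning**][**ピン留め**] API was proposed for this purpose in [RFC 2349](https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md). In the following, we will give a short overview of this API and explain how it works with async/await and futures.

#### Heap Values

The first observation is that [heap-allocated][ヒープ上に確保] values already have a fixed memory address most of the time. They are created using a call to `allocate` and then referenced by a pointer type such as `Box<T>`. While moving the pointer type is possible, the heap value that the pointer points to stays at the same memory address until it is freed through a `deallocate` call again.

[ヒープ上に確保]: @/edition-2/posts/10-heap-allocation/index.md

Using heap allocation, we can try to create a self-referential struct:

```rust
fn main() {
    let mut heap_value = Box::new(SelfReferential {
        self_ptr: 0 as *const _,
    });
    let ptr = &*heap_value as *const SelfReferential;
    heap_value.self_ptr = ptr;
    println!("heap value at: {:p}", heap_value);
    println!("internal reference: {:p}", heap_value.self_ptr);
}

struct SelfReferential {
    self_ptr: *const Self,
}
```

([Try it on the playground][playground-self-ref])

[playground-self-ref]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=ce1aff3a37fcc1c8188eeaf0f39c97e8

We create a simple struct named `SelfReferential` that contains a single pointer field. First, we initialize this struct with a null pointer and then allocate it on the heap using `Box::new`. We then determine the memory address of the heap-allocated struct and store it in a `ptr` variable. Finally, we make the struct self-referential by assigning the `ptr` variable to the `self_ptr` field.

When we execute this code [on the playground][playground-self-ref], we see that the address of the heap value and its internal pointer are equal, which means that the `self_ptr` field is a valid self-reference. Since the `heap_value` variable is only a pointer, moving it (e.g., by passing it to a function) does not change the address of the struct itself, so the `self_ptr` stays valid even if the pointer is moved.

However, there is still a way to break this example: We can move out of a `Box<T>` or replace its content:

```rust
let stack_value = mem::replace(&mut *heap_value, SelfReferential {
    self_ptr: 0 as *const _,
});
println!("value at: {:p}", &stack_value);
println!("internal reference: {:p}", stack_value.self_ptr);
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=e160ee8a64cba4cebc1c0473dcecb7c8))

Here we use the [`mem::replace`] function to replace the heap-allocated value with a new struct instance. This allows us to move the original `heap_value` to the stack, while the `self_ptr` field of the struct is now a dangling pointer that still points to the old heap address. When you try to run the example on the playground, you see that the printed **"value at:"** and **"internal reference:"** lines indeed show different pointers. So heap allocating a value is not enough to make self-references safe.

[`mem::replace`]: https://doc.rust-lang.org/nightly/core/mem/fn.replace.html

The fundamental problem that allowed the above breakage is that `Box<T>` allows us to get a `&mut T` reference to the heap-allocated value. This `&mut` reference makes it possible to use methods like [`mem::replace`] or [`mem::swap`] to invalidate the heap-allocated value. To resolve this problem, we must prevent `&mut` references to self-referential structs from being created.

[`mem::swap`]: https://doc.rust-lang.org/nightly/core/mem/fn.swap.html

#### `Pin<Box<T>>` and `Unpin`

The pinning API provides a solution to the `&mut T` problem in the form of the [`Pin`] wrapper type and the [`Unpin`] marker trait. The idea behind these types is to gate all methods of `Pin` that can be used to get `&mut` references to the wrapped value (e.g., [`get_mut`][pin-get-mut] or [`deref_mut`][pin-deref-mut]) on the `Unpin` trait. The `Unpin` trait is an [**auto trait**][_auto trait_], which is automatically implemented for all types except those that explicitly opt out. By making self-referential structs opt out of `Unpin`, there is no (safe) way to get a `&mut T` from a `Pin<Box<T>>` type for them. As a result, their internal self-references are guaranteed to stay valid.

[`Pin`]: https://doc.rust-lang.org/stable/core/pin/struct.Pin.html
[`Unpin`]: https://doc.rust-lang.org/nightly/std/marker/trait.Unpin.html
[pin-get-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_mut
[pin-deref-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.deref_mut
[_auto trait_]: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits

As an example, let's update the `SelfReferential` type from above to opt out of `Unpin`:

```rust
use core::marker::PhantomPinned;

struct SelfReferential {
    self_ptr: *const Self,
    _pin: PhantomPinned,
}
```

We opt out by adding a second `_pin` field of type [`PhantomPinned`]. This type is a zero-sized marker type whose only purpose is to _not_ implement the `Unpin` trait. Because of the way [auto traits][_auto trait_] work, a single field that is not `Unpin` suffices to make the complete struct opt out of `Unpin`.

[`PhantomPinned`]: https://doc.rust-lang.org/nightly/core/marker/struct.PhantomPinned.html

The second step is to change the `Box<SelfReferential>` type in the example to a `Pin<Box<SelfReferential>>` type. The easiest way to do this is to use the [`Box::pin`] function instead of [`Box::new`] for creating the heap-allocated value:

[`Box::pin`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.pin
[`Box::new`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.new

```rust
let mut heap_value = Box::pin(SelfReferential {
    self_ptr: 0 as *const _,
    _pin: PhantomPinned,
});
```

In addition to changing `Box::new` to `Box::pin`, we also need to add the new `_pin` field in the struct initializer. Since `PhantomPinned` is a zero-sized type, we only need its type name to initialize it.

When we [try to run our adjusted example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=961b0db194bbe851ff4d0ed08d3bd98a) now, we see that it no longer works:

```
error[E0594]: cannot assign to data in dereference of `Pin<Box<SelfReferential>>`
  --> src/main.rs:10:5
   |
10 |     heap_value.self_ptr = ptr;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot assign
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin<Box<SelfReferential>>`

error[E0596]: cannot borrow data in dereference of `Pin<Box<SelfReferential>>` as mutable
  --> src/main.rs:16:36
   |
16 |     let stack_value = mem::replace(&mut *heap_value, SelfReferential {
   |                                    ^^^^^^^^^^^^^^^^ cannot borrow as mutable
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin<Box<SelfReferential>>`
```

Both errors occur because the `Pin<Box<SelfReferential>>` type no longer implements the `DerefMut` trait. This is exactly what we wanted because the `DerefMut` trait would return a `&mut` reference, which we wanted to prevent. This only happens because we both opted out of `Unpin` and changed `Box::new` to `Box::pin`.

The problem now is that the compiler does not only prevent moving the type in line 16, but also forbids initializing the `self_ptr` field in line 10. This happens because the compiler can't differentiate between valid and invalid uses of `&mut` references. To get the initialization working again, we have to use the unsafe [`get_unchecked_mut`] method:

[`get_unchecked_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_unchecked_mut

```rust
// safe because modifying a field doesn't move the whole struct
unsafe {
    let mut_ref = Pin::as_mut(&mut heap_value);
    Pin::get_unchecked_mut(mut_ref).self_ptr = ptr;
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=b9ebbb11429d9d79b3f9fffe819e2018))

The [`get_unchecked_mut`] function works on a `Pin<&mut T>` instead of a `Pin<Box<T>>`, so we have to use [`Pin::as_mut`] to convert the value first. Then we can set the `self_ptr` field using the `&mut` reference returned by `get_unchecked_mut`.

[`Pin::as_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.as_mut

Now the only error left is the desired error on `mem::replace`. Remember, this operation tries to move the heap-allocated value to the stack, which would break the self-reference stored in the `self_ptr` field. By opting out of `Unpin` and using `Pin<Box<T>>`, we can prevent this operation at compile time and thus safely work with self-referential structs. As we saw, the compiler is not able to prove that the creation of the self-reference is safe (yet), so we need to use an unsafe block and verify the correctness ourselves.

#### Stack Pinning and `Pin<&mut T>`

In the previous section, we learned how to use `Pin<Box<T>>` to safely create a heap-allocated self-referential value. While this approach works fine and is relatively safe (apart from the unsafe construction), the required heap allocation comes with a performance cost. Since Rust strives to provide **zero-cost abstractions** whenever possible, the pinning API also allows creating `Pin<&mut T>` instances that point to stack-allocated values.

Unlike `Pin<Box<T>>` instances, which have ownership of the wrapped value, `Pin<&mut T>` instances only temporarily borrow the wrapped value. This makes things more complicated, as it requires the programmer to ensure additional guarantees themselves. Most importantly, a `Pin<&mut T>` must stay pinned for the whole lifetime of the referenced `T`, which can be difficult to verify for stack-based variables. To help with this, crates like [`pin-utils`] exist, but we still wouldn't recommend pinning to the stack unless you really know what you're doing.

[`pin-utils`]: https://docs.rs/pin-utils/0.1.0-alpha.4/pin_utils/

For further reading, check out the documentation of the [`pin` module] and the [`Pin::new_unchecked`] method.

[`pin` module]: https://doc.rust-lang.org/nightly/core/pin/index.html
[`Pin::new_unchecked`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.new_unchecked

#### Pinning and Futures

As we already saw in this post, the [`Future::poll`] method uses pinning in the form of a `Pin<&mut Self>` parameter:

[`Future::poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll

```rust
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>
```

The reason that this method takes `self: Pin<&mut Self>` instead of the normal `&mut self` is that future instances created from async/await are often self-referential, as we saw [above][self-ref-async-await]. By wrapping `Self` into `Pin` and letting the compiler opt out of `Unpin` for self-referential futures generated from async/await, it is guaranteed that the futures are not moved in memory between `poll` calls. This ensures that all internal references are still valid.

[self-ref-async-await]: @/edition-2/posts/12-async-await/index.md#self-referential-structs

It is worth noting that moving futures before the first `poll` call is fine. This is a result of the fact that futures are lazy and do nothing until they're polled for the first time. The `start` state of the generated state machines therefore only contains the function arguments but no internal references. In order to call `poll`, the caller must wrap the future into `Pin` first, which ensures that the future cannot be moved in memory anymore. Since stack pinning is more difficult to get right than heap pinning, we recommend using [`Box::pin`] combined with [`Pin::as_mut`] for this.

[`futures`]: https://docs.rs/futures/0.3.4/futures/

In case you're interested in understanding how to safely implement a future combinator function using stack pinning yourself, take a look at the relatively short [source of the `map` combinator method][map-src] of the `futures` crate and the section about [projections and structural pinning] of the pin documentation.

[map-src]: https://docs.rs/futures-util/0.3.4/src/futures_util/future/future/map.rs.html
[projections and structural pinning]: https://doc.rust-lang.org/stable/std/pin/index.html#projections-and-structural-pinning

### Executor と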
Waker async/awaitを使えば、完全に非同期的なfutureを簡単に扱うことができます。しかし、上で学んだように、futureはポーリングされるまで何もしません。つまり、どこかの時点で`poll`を呼ばないと、非同期コードは実行されないということです。 単一のfutureであれば、[上述のように](#futurewodai-tu)ループを使って常に手動で各futureを待つことができます。しかし、この方法は非常に効率が悪く、多数のfutureを作成するプログラムでは実用的ではありません。この問題を解決する最も一般的な方法は、システム内のすべてのfutureが終了するまでポーリングする責任を負う、グローバルな **executor** を定義することです。 #### Executor executorの目的は、 `spawn` のようなメソッドを使って、独立したタスクとしてfutureを生成 (spawn) できるようにすることです。そして、executor はすべてのfutureが完了するまでポーリングする責任を担うのです。すべてのfutureを中央集権的に管理することの大きな利点は、あるfutureが `Poll::Pending` を返すたびに、executorが別のfutureに切り替えることができることです。このようにして、非同期の処理が並行して実行され、CPUをずっと忙しくしておけます。 多くのexecutorの実装では、システムが複数のCPUコアを持っている場合にそれを生かすことができるようになっています。これらの実装では、十分な作業量があればすべてのコアを利用できる[スレッドプール][thread pool]を作成したり、[work stealing]などの手法を用いてコア間の負荷を分散させたりします。また、低レイテンシーとメモリオーバーヘッドに最適化した、組み込みシステム用の特別なexecutorの実装もあります。 [thread pool]: https://en.wikipedia.org/wiki/Thread_pool [work stealing]: https://en.wikipedia.org/wiki/Work_stealing futureを何度もポーリングすることによるオーバーヘッドを避けるために、executorsは通常、Rustのfutureがサポートする **waker** APIを利用します。 #### Waker waker APIの背景にある考え方は、特別な[`Waker`]型が[`Context`]型にラップされて、`poll`の各呼び出しに渡されるというものです。この `Waker` 型はexecutorによって作成され、非同期タスクがその(部分的な)完了を知らせるために使用することができます。その結果、executorは以前に `Poll::Pending` を返したfutureに対して、対応するwakerから通知されるまでの間 `poll` を呼び出す必要がありません。 [`Context`]: https://doc.rust-lang.org/nightly/core/task/struct.Context.html これは、小さな例で説明するのが一番です: ```rust async fn write_file() { async_write_file("foo.txt", "Hello").await; } ``` この関数は文字列 "Hello" を `foo.txt` ファイルに非同期的に書き込みます。ハードディスクへの書き込みには時間がかかるので、このfutureの最初の `poll` 呼び出しはおそらく `Poll::Pending` を返すでしょう。しかし、ハードディスクドライバは `poll` 呼び出しに渡された `Waker` を内部に保存し、ファイルがディスクに書き込まれたときにそれを使ってexecutorに通知します。これにより、executorはwakerの通知を受け取るまでの間、再びfutureを `poll` して時間を無駄にせずにすみます。 この記事の実装セクションで、wakerをサポートした独自のexecutorを作成する際に、`Waker`型がどのように機能するかを詳しく見ていきます。 ### 協調的マルチタスク? 
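At the beginning of this post, we talked about preemptive and cooperative multitasking. While preemptive multitasking relies on the operating system to forcibly switch between running tasks, cooperative multitasking requires that the tasks voluntarily give up control of the CPU through a **yield** operation on a regular basis. The big advantage of the cooperative approach is that tasks can save their state themselves, which results in more efficient context switches and makes it possible to share the same call stack between tasks.

It might not be immediately apparent, but futures and async/await are an implementation of the cooperative multitasking pattern:

- Each future that is added to the executor is basically a cooperative task.
- Instead of using an explicit yield operation, futures give up control of the CPU core by returning `Poll::Pending` (or `Poll::Ready` at the end).
    - There is nothing that forces futures to give up the CPU. If they want, they can never return from `poll`, e.g., by spinning endlessly in a loop.
    - Since each future can block the execution of the other futures in the executor, we need to trust them to not be malicious.
- Futures internally store all the state they need to continue execution on the next `poll` call. With async/await, the compiler automatically detects all variables that are needed and stores them inside the generated state machine.
    - Only the minimum state required for continuation is saved.
    - Since the `poll` method gives up the call stack when it returns, the same stack can be used for polling other futures.

We see that futures and async/await fit the cooperative multitasking pattern perfectly; they just use some different terminology. In the following, we will therefore use the terms "task" and "future" interchangeably.

## Implementation

Now that we understand how cooperative multitasking based on futures and async/await works in Rust, it's time to add support for it to our kernel. Since the [`Future`] trait is part of the `core` library and async/await is a feature of the language itself, there is nothing special we need to do to use it in our `#![no_std]` kernel. The only requirement is that we use at least nightly `2020-03-25` of Rust because async/await was not `no_std` compatible before.

With a recent-enough nightly, we can start using async/await in our `main.rs`:

```rust
// in src/main.rs

async fn async_number() -> u32 {
    42
}

async fn example_task() {
    let number = async_number().await;
    println!("async number: {}", number);
}
```

The `async_number` function is an `async fn`, so the compiler transforms it into a state machine that implements `Future`. Since the function only returns `42`, the resulting future will directly return `Poll::Ready(42)` on the first `poll` call. Like `async_number`, the `example_task` function is also an `async fn`. It awaits the number returned by `async_number` and then prints it using the `println` macro.

To run the future returned by `example_task`, we need to call `poll` on it until it signals its completion by returning `Poll::Ready`. To do this, we need to create a simple executor type.

### Task

Before we start the executor implementation, we create a new `task` module with a `Task` type:

```rust
// in src/lib.rs

pub mod task;
```

```rust
// in src/task/mod.rs

use core::{future::Future, pin::Pin};
use alloc::boxed::Box;

pub struct Task {
    future: Pin<Box<dyn Future<Output = ()>>>,
}
```

The `Task` struct is a newtype wrapper around a pinned, heap-allocated, and dynamically dispatched future with the empty type `()` as output. Let's go through it in detail:

- We require that the future associated with a task returns `()`. This means that tasks don't return any result, they are just executed for their side effects. For example, the `example_task` function we defined above has no return value, but it prints something to the screen as a side effect.
- The `dyn` keyword indicates that we store a [_trait object_] in the `Box`. This means that the methods on the future are [**dynamically dispatched**][_dynamically dispatched_], allowing different types of futures to be stored in the `Task` type. This is important because each `async fn` has its own type and we want to be able to create multiple different tasks.
- As we learned in the [section about pinning], the `Pin<Box>` type ensures that a value cannot be moved in memory by placing it on the heap and preventing the creation of `&mut` references to it. This is important because futures generated by async/await might be self-referential, i.e., contain pointers to themselves that would be invalidated when the future is moved.

[_trait object_]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html
[_dynamically dispatched_]: https://doc.rust-lang.org/book/ch18-02-trait-objects.html#trait-objects-perform-dynamic-dispatch
[section about pinning]: #pinliu-me

To allow the creation of new `Task` structs from futures, we create a `new` function:

```rust
// in src/task/mod.rs

impl Task {
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task {
            future: Box::pin(future),
        }
    }
}
```

The function takes an arbitrary future with an output type of `()` and pins it in memory through the [`Box::pin`] function. Then it wraps the boxed future in the `Task` struct and returns it. The `'static` lifetime is required here because the returned `Task` can live for an arbitrary time, so the future needs to be valid for that time too.

We also add a `poll` method to allow the executor to poll the stored future:

```rust
// in src/task/mod.rs

use core::task::{Context, Poll};

impl Task {
    fn poll(&mut self, context: &mut Context) -> Poll<()> {
        self.future.as_mut().poll(context)
    }
}
```

Since the [`poll`] method of the `Future` trait expects to be called on a `Pin<&mut T>` type, we use the [`Pin::as_mut`] method to convert the `self.future` field of type `Pin<Box<T>>` first. Then we call `poll` on the converted `self.future` field and return the result. Since the `Task::poll` method should only be called by the executor that we'll create in a moment, we keep the function private to the `task` module.

### Simple Executor

Since executors can be quite complex, we deliberately start by creating a very basic executor before implementing a more featureful executor later. For this, we first create a new `task::simple_executor` submodule:

```rust
// in src/task/mod.rs

pub mod simple_executor;
```

```rust
// in src/task/simple_executor.rs

use super::Task;
use alloc::collections::VecDeque;

pub struct SimpleExecutor {
    task_queue: VecDeque<Task>,
}

impl SimpleExecutor {
    pub fn new() -> SimpleExecutor {
        SimpleExecutor {
            task_queue: VecDeque::new(),
        }
    }

    pub fn spawn(&mut self, task: Task) {
        self.task_queue.push_back(task)
    }
}
```

The struct contains a single `task_queue` field of type [`VecDeque`], which is basically a vector that allows for push and pop operations on both ends. The idea behind using this type is that we insert new tasks through the `spawn` method at the end and pop the next task for execution from the front. This way, we get a simple [FIFO queue] (_"first in, first out"_).

[`VecDeque`]: https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html
[FIFO queue]: https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)

#### Dummy Waker

In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing. For this, we create a [`RawWaker`] instance, which defines the implementation of the different `Waker` methods, and then use the [`Waker::from_raw`] function to turn it into a `Waker`:

[`RawWaker`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html
[`Waker::from_raw`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.from_raw

```rust
// in src/task/simple_executor.rs

use core::task::{Waker, RawWaker};

fn dummy_raw_waker() -> RawWaker {
    todo!();
}

fn dummy_waker() -> Waker {
    unsafe { Waker::from_raw(dummy_raw_waker()) }
}
```

The `from_raw` function is unsafe because undefined behavior can occur if the programmer does not uphold the documented requirements of `RawWaker`. Before we look at the implementation of the `dummy_raw_waker` function, we first try to understand how the `RawWaker` type works.

##### `RawWaker`

The [`RawWaker`] type requires the programmer to explicitly define a [_virtual method table_] (_vtable_) that specifies the functions that should be called when the `RawWaker` is cloned, woken, or dropped. The layout of this vtable is defined by the [`RawWakerVTable`] type. Each function receives a `*const ()` argument, which is basically a **type-erased** `&self` pointer to some struct, e.g., allocated on the heap. The reason for using a `*const ()` pointer instead of a proper reference is that the `RawWaker` type should be non-generic but still support arbitrary types. The pointer value that is passed to the functions is the `data` pointer given to [`RawWaker::new`].

[_virtual method table_]: https://en.wikipedia.org/wiki/Virtual_method_table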
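To make the roles of the `data` pointer and the vtable functions more concrete, here is a minimal sketch that is not part of the kernel. It assumes a normal `std` environment (so that it can be tried on the playground), and the names `WAKE_COUNT`, `COUNTING_VTABLE`, and `counting_waker` are made up for this illustration. The `data` pointer refers to a static counter, and the wake functions cast it back to its real type:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::{RawWaker, RawWakerVTable, Waker};

// Hypothetical counter that records how often the waker was woken.
static WAKE_COUNT: AtomicUsize = AtomicUsize::new(0);

// Cloning just creates a second `RawWaker` with the same data pointer and vtable.
unsafe fn clone_waker(data: *const ()) -> RawWaker {
    RawWaker::new(data, &COUNTING_VTABLE)
}

// `wake` consumes the waker; there is nothing to free here, so it only counts.
unsafe fn wake(data: *const ()) {
    unsafe { wake_by_ref(data) }
}

// Cast the type-erased pointer back to the type it was created from.
unsafe fn wake_by_ref(data: *const ()) {
    let counter = unsafe { &*(data as *const AtomicUsize) };
    counter.fetch_add(1, Ordering::SeqCst);
}

// The data lives in a static, so dropping the waker has nothing to release.
unsafe fn drop_waker(_data: *const ()) {}

static COUNTING_VTABLE: RawWakerVTable =
    RawWakerVTable::new(clone_waker, wake, wake_by_ref, drop_waker);

fn counting_waker() -> Waker {
    // Erase the type of the pointer before handing it to `RawWaker::new`.
    let data = &WAKE_COUNT as *const AtomicUsize as *const ();
    // Sound under the assumption that the vtable functions above only ever
    // receive this pointer, which stays valid for the whole program.
    unsafe { Waker::from_raw(RawWaker::new(data, &COUNTING_VTABLE)) }
}
```

Calling `wake_by_ref` on the returned waker, or `wake` on a clone of it, each increments `WAKE_COUNT` by one. A real waker would typically notify an executor instead of incrementing a counter.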
[`RawWakerVTable`]: https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html
[`RawWaker::new`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html#method.new

Typically, the `RawWaker` is created for some heap-allocated struct that is wrapped into the [`Box`] or [`Arc`] type. For such types, methods like [`Box::into_raw`] can be used to convert the `Box<T>` to a `*const T` pointer. This pointer can then be cast to an anonymous `*const ()` pointer and passed to `RawWaker::new`. Since each vtable function receives the same `*const ()` as an argument, the functions can safely cast the pointer back to a `Box<T>` or a `&T` to operate on it. As you can imagine, this process is highly dangerous and can easily lead to undefined behavior on mistakes. For this reason, manually creating a `RawWaker` is not recommended unless necessary.

[`Box`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html
[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html
[`Box::into_raw`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html#method.into_raw

##### A Dummy `RawWaker`

While manually creating a `RawWaker` is not recommended, there is currently no other way to create a dummy `Waker` that does nothing. Fortunately, the fact that we want to do nothing makes it relatively safe to implement the `dummy_raw_waker` function:

```rust
// in src/task/simple_executor.rs

use core::task::RawWakerVTable;

fn dummy_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        dummy_raw_waker()
    }

    let vtable = &RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(0 as *const (), vtable)
}
```

First, we define two inner functions named `no_op` and `clone`. The `no_op` function takes a `*const ()` pointer and does nothing. The `clone` function also takes a `*const ()` pointer and returns a new `RawWaker` by calling `dummy_raw_waker` again. We use these two functions to create a minimal `RawWakerVTable`: The `clone` function is used for the cloning operation, and the `no_op` function is used for all other operations. Since the `RawWaker` does nothing, it does not matter that we return a new `RawWaker` from `clone` instead of cloning it.

After creating the `vtable`, we use the [`RawWaker::new`] function to create the `RawWaker`. The passed `*const ()` does not matter since none of the vtable functions use it. For this reason, we simply pass a null pointer.

#### A `run` Method

Now that we have a way to create a `Waker` instance, we can use it to implement a `run` method on our executor. The most simple `run` method is to repeatedly poll all queued tasks in a loop until all are done. This is not very efficient since it does not utilize the notifications of the `Waker` type, but it is an easy way to get things running:

```rust
// in src/task/simple_executor.rs

use core::task::{Context, Poll};

impl SimpleExecutor {
    pub fn run(&mut self) {
        while let Some(mut task) = self.task_queue.pop_front() {
            let waker = dummy_waker();
            let mut context = Context::from_waker(&waker);
            match task.poll(&mut context) {
                Poll::Ready(()) => {} // task done
                Poll::Pending => self.task_queue.push_back(task),
            }
        }
    }
}
```

The function uses a `while let` loop to handle all tasks in the `task_queue`. For each task, it first creates a `Context` type by wrapping a `Waker` instance returned by our `dummy_waker` function. Then it invokes the `Task::poll` method with this `context`. If the `poll` method returns `Poll::Ready`, the task is finished and we can continue with the next task. If the task is still `Poll::Pending`, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration.

#### Trying It

With our `SimpleExecutor` type, we can now try running the task returned by the `example_task` function in our `main.rs`:

```rust
// in src/main.rs

use blog_os::task::{Task, simple_executor::SimpleExecutor};

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // […] initialization routines, including `init_heap`

    let mut executor = SimpleExecutor::new();
    executor.spawn(Task::new(example_task()));
    executor.run();

    // […] test_main, "it did not crash" message, hlt_loop
}

// Below is the example_task function again so that you don't have to scroll up

async fn async_number() -> u32 {
    42
}

async fn example_task() {
    let number = async_number().await;
    println!("async number: {}", number);
}
```

When we run it, we see that the expected **"async number: 42"** message is printed to the screen:

![QEMU printing "Hello World", "async number: 42", and "It did not crash!"](qemu-simple-executor.png)

Let's summarize the various steps that happen in this example:

- First, a new instance of our `SimpleExecutor` type is created with an empty `task_queue`.
- Next, we call the asynchronous `example_task` function, which returns a future. We wrap this future in the `Task` type, which moves it to the heap and pins it, and then add the task to the `task_queue` of the executor through the `spawn` method.
- We then call the `run` method to start the execution of the single task in the queue. This involves:
    - Popping the task from the front of the `task_queue`.
    - Creating a `RawWaker` for the task, converting it to a [`Waker`] instance, and then creating a [`Context`] instance from it.
    - Calling the [`poll`] method on the future of the task, using the `Context` we just created.
    - Since the `example_task` does not wait for anything, it can directly run till its end on the first `poll` call. This is where the **"async number: 42"** line is printed.
    - Since the `example_task` directly returns `Poll::Ready`, it is not added back to the task queue.
- The `run` method returns after the `task_queue` becomes empty. The execution of our `kernel_main` function continues and the **"It did not crash!"** message is printed.

### Async Keyboard Input

Our simple executor does not utilize the `Waker` notifications and simply loops over all tasks until they are done. This wasn't a problem for our example since our `example_task` can directly run to finish on the first `poll` call. To see the performance advantages of a proper `Waker` implementation, we first need to create a task that is truly asynchronous, i.e., a task that will probably return `Poll::Pending` on the first `poll` call.

We already have some kind of asynchronicity in our system that we can use for this: **hardware interrupts**. As we learned in the [_Interrupts_] post, hardware interrupts can occur at arbitrary points in time, determined by some external device. For example, a hardware timer sends an interrupt to the CPU after some predefined time has elapsed. When the CPU receives an interrupt, it immediately transfers control to the corresponding handler function defined in the interrupt descriptor table (IDT).

[_Interrupts_]: @/edition-2/posts/07-hardware-interrupts/index.md

In the following, we will create an asynchronous task based on the keyboard interrupt. The keyboard interrupt is a good candidate for this because it is both non-deterministic and latency-critical. Non-deterministic means that there is no way to predict when the next key press will occur because it is entirely dependent on the user. Latency-critical means that we want to handle the keyboard input in a timely manner, otherwise the user will feel a lag. To support such a task in an efficient way, it is essential that the executor has proper support for `Waker` notifications.

#### Scancode Queue
Currently, we handle the keyboard input directly in the interrupt handler. This is not a good idea in the long term because interrupt handlers should stay as short as possible, as they might interrupt important work. Instead, interrupt handlers should only perform the minimal amount of work necessary (e.g., reading the keyboard scancode) and leave the rest of the work (e.g., interpreting the scancode) to a background task.

A common pattern for delegating work to a background task is to create some sort of queue. The interrupt handler pushes units of work to the queue, and the background task handles the work in the queue. Applied to our keyboard interrupt, this means that the interrupt handler only reads the scancode from the keyboard, pushes it to the queue, and then returns. The keyboard task sits on the other end of the queue and interprets and handles each scancode that is pushed to it:

![Scancode queue with 8 slots on the top. Keyboard interrupt handler on the bottom left with a "push scancode" arrow to the left of the queue. Keyboard task on the bottom right with a "pop scancode" queue coming from the right side of the queue.](scancode-queue.svg)

A simple implementation of that queue could be a mutex-protected [`VecDeque`]. However, using mutexes in interrupt handlers is not a good idea since it can easily lead to deadlocks. For example, when the user presses a key while the keyboard task has locked the queue, the interrupt handler tries to acquire the lock again and hangs indefinitely. Another problem with this approach is that `VecDeque` automatically increases its capacity by performing a new heap allocation when it becomes full. This can lead to deadlocks again because our allocator also uses a mutex internally. Further problems are that heap allocations can fail or take a considerable amount of time when the heap is fragmented.

To prevent these problems, we need a queue implementation that does not require mutexes or allocations for its `push` operation. Such queues can be implemented by using lock-free [atomic operations] for pushing and popping elements. This way, it is possible to create `push` and `pop` operations that only require a `&self` reference and are thus usable without a mutex. To avoid allocations on `push`, the queue can be backed by a preallocated fixed-size buffer. While this makes the queue **bounded** (i.e., it has a maximum length), it is often possible to define reasonable upper bounds for the queue length in practice, so this isn't a big problem.

[atomic operations]: https://doc.rust-lang.org/core/sync/atomic/index.html

##### The `crossbeam` Crate

Implementing such a queue in a correct and efficient way is very difficult, so I recommend sticking to existing, well-tested implementations. One popular Rust project that implements various mutex-free types for concurrent programming is [`crossbeam`]. It provides a type named [`ArrayQueue`] that is exactly what we need in this case. And we're lucky: The type is fully compatible with `no_std` crates with allocation support.

[`crossbeam`]: https://github.com/crossbeam-rs/crossbeam
[`ArrayQueue`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html

To use the type, we need to add a dependency on the `crossbeam-queue` crate:

```toml
# in Cargo.toml

[dependencies.crossbeam-queue]
version = "0.2.1"
default-features = false
features = ["alloc"]
```

By default, the crate depends on the standard library. To make it `no_std` compatible, we need to disable its default features and instead enable the `alloc` feature. (Note that depending on the main `crossbeam` crate does not work here because it is missing an export of the `queue` module for `no_std`. I filed a [pull request](https://github.com/crossbeam-rs/crossbeam/pull/480) to fix this, but it had not been released on crates.io yet.)

##### Queue Implementation

Using the `ArrayQueue` type, we can now create a global scancode queue in a new `task::keyboard` module:

```rust
// in src/task/mod.rs

pub mod keyboard;
```

```rust
// in src/task/keyboard.rs

use conquer_once::spin::OnceCell;
use crossbeam_queue::ArrayQueue;

static SCANCODE_QUEUE: OnceCell<ArrayQueue<u8>> = OnceCell::uninit();
```

Since [`ArrayQueue::new`] performs a heap allocation, which is not possible at compile time ([yet][const-heap-alloc]), we can't initialize the static variable directly. Instead, we use the [`OnceCell`] type of the [`conquer_once`] crate, which makes it possible to perform a safe one-time initialization of static values. To include the crate, we need to add it as a dependency in our `Cargo.toml`:

[`ArrayQueue::new`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.new
[const-heap-alloc]: https://github.com/rust-lang/const-eval/issues/20
[`OnceCell`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html
[`conquer_once`]: https://docs.rs/conquer-once/0.2.0/conquer_once/index.html

```toml
# in Cargo.toml

[dependencies.conquer-once]
version = "0.2.0"
default-features = false
```

Instead of the [`OnceCell`] primitive, we could also use the [`lazy_static`] macro here. However, the `OnceCell` type has the advantage that we can ensure that the initialization does not happen in the interrupt handler, thus preventing the interrupt handler from performing a heap allocation.

[`lazy_static`]: https://docs.rs/lazy_static/1.4.0/lazy_static/index.html

#### Filling the Queue

To fill the scancode queue, we create a new `add_scancode` function that we will call from the interrupt handler:

```rust
// in src/task/keyboard.rs

use crate::println;

/// Called by the keyboard interrupt handler
///
/// Must not block or allocate.
pub(crate) fn add_scancode(scancode: u8) {
    if let Ok(queue) = SCANCODE_QUEUE.try_get() {
        if let Err(_) = queue.push(scancode) {
            println!("WARNING: scancode queue full; dropping keyboard input");
        }
    } else {
        println!("WARNING: scancode queue uninitialized");
    }
}
```

We use [`OnceCell::try_get`] to get a reference to the initialized queue. If the queue is not initialized yet, we ignore the keyboard scancode and print a warning. It's important that we don't try to initialize the queue in this function because it will be called by the interrupt handler, which should not perform heap allocations. Since this function should not be callable from our `main.rs`, we use the `pub(crate)` visibility to make it only available to our `lib.rs`.

[`OnceCell::try_get`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get

The fact that the [`ArrayQueue::push`] method requires only a `&self` reference makes it very simple to call the method on the static queue. The `ArrayQueue` type performs all the necessary synchronization itself, so we don't need a mutex wrapper here. In case the queue is full, we print a warning too.

[`ArrayQueue::push`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.push

To call the `add_scancode` function on keyboard interrupts, we update our `keyboard_interrupt_handler` function in the `interrupts` module:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame
) {
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };
    crate::task::keyboard::add_scancode(scancode); // new

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

We removed all the keyboard handling code from this function and instead added a call to the `add_scancode` function. The rest of the function stays the same as before.

As expected, keypresses are no longer printed to the screen when we run our project using `cargo run` now. Instead, we see the warning that the scancode queue is uninitialized for every keystroke.

#### Scancode Stream

To initialize the `SCANCODE_QUEUE` and read the scancodes from the queue in an asynchronous way, we create a new `ScancodeStream` type:

```rust
// in src/task/keyboard.rs

pub struct ScancodeStream {
    _private: (),
}

impl ScancodeStream {
    pub fn new() -> Self {
        SCANCODE_QUEUE.try_init_once(|| ArrayQueue::new(100))
            .expect("ScancodeStream::new should only be called once");
        ScancodeStream { _private: () }
    }
}
```
The purpose of the `_private` field is to prevent construction of the struct from outside of the module. This makes the `new` function the only way to construct the type. In the function, we first try to initialize the `SCANCODE_QUEUE` static. We panic if it is already initialized to ensure that only a single `ScancodeStream` instance can be created.

To make the scancodes available to asynchronous tasks, the next step is to implement a `poll`-like method that tries to pop the next scancode off the queue. While this sounds like we should implement the [`Future`] trait for our type, this does not quite fit here. The problem is that the `Future` trait only abstracts over a single asynchronous value and expects that the `poll` method is not called again after it returns `Poll::Ready`. Our scancode queue, however, contains multiple asynchronous values, so it is okay to keep polling it.

##### The `Stream` Trait

Since types that yield multiple asynchronous values are common, the [`futures`] crate provides a useful abstraction for such types: the [`Stream`] trait. The trait is defined like this:

[`Stream`]: https://rust-lang.github.io/async-book/05_streams/01_chapter.html

```rust
pub trait Stream {
    type Item;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context)
        -> Poll<Option<Self::Item>>;
}
```

This definition is quite similar to the [`Future`] trait, with the following differences:

- The associated type is named `Item` instead of `Output`.
- Instead of a `poll` method that returns `Poll<Self::Item>`, the `Stream` trait defines a `poll_next` method that returns a `Poll<Option<Self::Item>>` (note the additional `Option`).

There is also a semantic difference: The `poll_next` method can be called repeatedly until it returns `Poll::Ready(None)` to signal that the stream is finished. In this regard, the method is similar to the [`Iterator::next`] method, which also returns `None` after the last value.

[`Iterator::next`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html#tymethod.next

##### Implementing `Stream`

Let's implement the `Stream` trait for our `ScancodeStream` to provide the values of the `SCANCODE_QUEUE` in an asynchronous way. For this, we first need to add a dependency on the `futures-util` crate, which contains the `Stream` type:

```toml
# in Cargo.toml

[dependencies.futures-util]
version = "0.3.4"
default-features = false
features = ["alloc"]
```

We disable the default features to make the crate `no_std` compatible and enable the `alloc` feature to make its allocation-based types available (we will need this later). (Note that we could also add a dependency on the main `futures` crate, which re-exports the `futures-util` crate, but this would result in a larger number of dependencies and longer compile times.)

Now we can import and implement the `Stream` trait:

```rust
// in src/task/keyboard.rs

use core::{pin::Pin, task::{Poll, Context}};
use futures_util::stream::Stream;

impl Stream for ScancodeStream {
    type Item = u8;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<u8>> {
        let queue = SCANCODE_QUEUE.try_get().expect("not initialized");
        match queue.pop() {
            Ok(scancode) => Poll::Ready(Some(scancode)),
            Err(crossbeam_queue::PopError) => Poll::Pending,
        }
    }
}
```

We first use the [`OnceCell::try_get`] method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the `new` function, so we can safely use the `expect` method to panic if it's not initialized. Next, we use the [`ArrayQueue::pop`] method to try to get the next element from the queue. If it succeeds, we return the scancode wrapped in `Poll::Ready(Some(…))`. If it fails, it means that the queue is empty. In that case, we return `Poll::Pending`.

[`ArrayQueue::pop`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.pop

#### Waker Support

Like the `Future::poll` method, the `Stream::poll_next` method requires the asynchronous task to notify the executor when it becomes ready after `Poll::Pending` is returned. This way, the executor does not need to poll the same task again until it is notified, which greatly reduces the performance overhead of waiting tasks.

To send this notification, the task should extract the [`Waker`] from the passed [`Context`] reference and store it somewhere. When the task becomes ready, it should invoke the [`wake`] method on the stored `Waker` to notify the executor that the task should be polled again.

##### AtomicWaker

To implement the `Waker` notification for our `ScancodeStream`, we need a place where we can store the `Waker` between poll calls. We can't store it as a field in the `ScancodeStream` itself because it needs to be accessible from the `add_scancode` function. The solution to this is to use a static variable of the [`AtomicWaker`] type provided by the `futures-util` crate. Like the `ArrayQueue` type, this type is based on atomic instructions and can be safely stored in a static and modified concurrently.

[`AtomicWaker`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html

Let's use the [`AtomicWaker`] type to define a static `WAKER`:

```rust
// in src/task/keyboard.rs

use futures_util::task::AtomicWaker;

static WAKER: AtomicWaker = AtomicWaker::new();
```

The idea is that the `poll_next` implementation stores the current waker in this static, and the `add_scancode` function calls the `wake` function on it when a new scancode is added to the queue.

##### Storing a Waker

The contract defined by `poll`/`poll_next` requires the task to register a wakeup for the passed `Waker` when it returns `Poll::Pending`. Let's modify our `poll_next` implementation to satisfy this requirement:

```rust
// in src/task/keyboard.rs

impl Stream for ScancodeStream {
    type Item = u8;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<u8>> {
        let queue = SCANCODE_QUEUE
            .try_get()
            .expect("scancode queue not initialized");

        // fast path
        if let Ok(scancode) = queue.pop() {
            return Poll::Ready(Some(scancode));
        }

        WAKER.register(&cx.waker());
        match queue.pop() {
            Ok(scancode) => {
                WAKER.take();
                Poll::Ready(Some(scancode))
            }
            Err(crossbeam_queue::PopError) => Poll::Pending,
        }
    }
}
```

Like before, we first use the [`OnceCell::try_get`] function to get a reference to the initialized scancode queue. We then optimistically try to `pop` from the queue and return `Poll::Ready` when it succeeds. This way, we can avoid the performance overhead of registering a waker when the queue is not empty.

If the first call to `queue.pop()` does not succeed, the queue is potentially empty. Only potentially, because the interrupt handler might have filled the queue asynchronously immediately after the check. Since this race condition can occur again for the next check, we need to register the `Waker` in the `WAKER` static before the second check. This way, a wakeup might happen before we return `Poll::Pending`, but it is guaranteed that we get a wakeup for any scancodes pushed after the check.

After registering the `Waker` contained in the passed [`Context`] through the [`AtomicWaker::register`] function, we try to pop from the queue a second time. If it now succeeds, we return `Poll::Ready`. We also remove the registered waker again using [`AtomicWaker::take`] because a waker notification is no longer needed. In case `queue.pop()` fails for a second time, we return `Poll::Pending` like before, but this time with a registered wakeup.

[`AtomicWaker::register`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html#method.register
[`AtomicWaker::take`]: https://docs.rs/futures/0.3.4/futures/task/struct.AtomicWaker.html#method.take

Note that there are two ways that a wakeup can happen for a task that did not return `Poll::Pending` (yet). One way is the mentioned race condition when the wakeup happens immediately before returning `Poll::Pending`. The other way is when the queue is no longer empty after registering the waker, so that `Poll::Ready` is returned. Since these spurious wakeups are not preventable, the executor needs to be able to handle them correctly.

##### Waking the Stored Waker

To wake the stored `Waker`, we add a call to `WAKER.wake()` in the `add_scancode` function:

```rust
// in src/task/keyboard.rs

pub(crate) fn add_scancode(scancode: u8) {
    if let Ok(queue) = SCANCODE_QUEUE.try_get() {
        if let Err(_) = queue.push(scancode) {
            println!("WARNING: scancode queue full; dropping keyboard input");
        } else {
            WAKER.wake(); // new
        }
    } else {
        println!("WARNING: scancode queue uninitialized");
    }
}
```
The only change that we made is to add a call to `WAKER.wake()` if the push to the scancode queue succeeds. If a waker is registered in the `WAKER` static, this method will call the equally named [`wake`] method on it, which notifies the executor. Otherwise, the operation is a no-op, i.e., nothing happens.

[`wake`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.wake

It is important that we call `wake` only after pushing to the queue because otherwise the task might be woken too early while the queue is still empty. This can, for example, happen when using a multi-threaded executor that starts the woken task concurrently on a different CPU core. While we don't have thread support yet, we plan to add it soon and don't want things to break then.

#### Keyboard Task

Now that we have implemented the `Stream` trait for our `ScancodeStream`, we can use it to create an asynchronous keyboard task:

```rust
// in src/task/keyboard.rs

use futures_util::stream::StreamExt;
use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1};
use crate::print;

pub async fn print_keypresses() {
    let mut scancodes = ScancodeStream::new();
    let mut keyboard = Keyboard::new(ScancodeSet1::new(),
        layouts::Us104Key, HandleControl::Ignore);

    while let Some(scancode) = scancodes.next().await {
        if let Ok(Some(key_event)) = keyboard.add_byte(scancode) {
            if let Some(key) = keyboard.process_keyevent(key_event) {
                match key {
                    DecodedKey::Unicode(character) => print!("{}", character),
                    DecodedKey::RawKey(key) => print!("{:?}", key),
                }
            }
        }
    }
}
```

The code is very similar to the code we had in our [keyboard interrupt handler] before we modified it in this post. The only difference is that, instead of reading the scancode from an I/O port, we take it from the `ScancodeStream`. For this, we first create a new `ScancodeStream` and then repeatedly use the [`next`] method provided by the [`StreamExt`] trait to get a `Future` that resolves to the next element in the stream. By using the `await` operator on it, we asynchronously wait for the result of the future.

[keyboard interrupt handler]: @/edition-2/posts/07-hardware-interrupts/index.md#interpreting-the-scancodes
[`next`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html#method.next
[`StreamExt`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html

We use `while let` to loop until the stream returns `None` to signal its end. Since our `poll_next` method never returns `None`, this is effectively an endless loop, so the `print_keypresses` task never finishes.
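For contrast with the endless `ScancodeStream`, the following sketch shows how the same `while let Some(…) = ….await` pattern terminates for a finite stream. It is only an illustration under several assumptions: it runs on `std` instead of in the kernel, it defines a local `Stream` trait with the same shape as the one from `futures-util` instead of depending on that crate, and the names `Countdown`, `Next`, `next`, `collect_all`, and `run_to_completion` are hypothetical stand-ins (the real `StreamExt::next` is more general).

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Local stand-in for `futures_util::stream::Stream` (same shape as shown above).
trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// A finite stream that yields `remaining`, `remaining - 1`, ..., 1 and then ends.
struct Countdown {
    remaining: u8,
}

impl Stream for Countdown {
    type Item = u8;

    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<u8>> {
        if self.remaining == 0 {
            return Poll::Ready(None); // signals the end of the stream
        }
        let value = self.remaining;
        self.remaining -= 1;
        Poll::Ready(Some(value))
    }
}

/// Simplified, hypothetical version of the future returned by `StreamExt::next`.
struct Next<'a, S> {
    stream: &'a mut S,
}

impl<S: Stream + Unpin> Future for Next<'_, S> {
    type Output = Option<S::Item>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `S: Unpin` allows us to re-pin the borrowed stream safely.
        Pin::new(&mut *self.stream).poll_next(cx)
    }
}

fn next<S: Stream + Unpin>(stream: &mut S) -> Next<'_, S> {
    Next { stream }
}

/// Same loop shape as `print_keypresses`, but it ends once the stream yields `None`.
async fn collect_all(mut stream: Countdown) -> Vec<u8> {
    let mut items = Vec::new();
    while let Some(item) = next(&mut stream).await {
        items.push(item);
    }
    items
}

/// No-op waker, built the same way as the dummy waker of the simple executor.
fn dummy_waker() -> Waker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Busy-polls a single future until it is ready.
fn run_to_completion<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = dummy_waker();
    let mut context = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut context) {
            return output;
        }
    }
}
```

Running `run_to_completion(collect_all(Countdown { remaining: 3 }))` collects the values 3, 2, and 1 and then returns because `poll_next` reports `Poll::Ready(None)`. Our `ScancodeStream` never does that, which is why `print_keypresses` keeps running.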
Let's add the `print_keypresses` task to our executor in our `main.rs` to get working keyboard input again:

```rust
// in src/main.rs

use blog_os::task::keyboard; // new

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // […] initialization routines, including init_heap, test_main

    let mut executor = SimpleExecutor::new();
    executor.spawn(Task::new(example_task()));
    executor.spawn(Task::new(keyboard::print_keypresses())); // new
    executor.run();

    // […] "it did not crash" message, hlt_loop
}
```

When we execute `cargo run` now, we see that keyboard input works again:

![QEMU printing ".....H...e...l...l..o..... ...W..o..r....l...d...!"](qemu-keyboard-output.gif)

If you keep an eye on the CPU utilization of your computer, you will see that the `QEMU` process now continuously keeps the CPU busy. This happens because our `SimpleExecutor` polls tasks over and over again in a loop. So even if we don't press any keys on the keyboard, the executor repeatedly calls `poll` on our `print_keypresses` task.

### Executor with Waker Support

To fix the performance problem, we need to create an executor that properly utilizes the `Waker` notifications. This way, the executor is notified when the next keyboard interrupt occurs, so it does not need to keep polling the `print_keypresses` task over and over again.

#### Task Id

The first step in creating an executor with proper support for waker notifications is to give each task a unique ID. This is required because we need a way to specify which task should be woken. We start by creating a new `TaskId` wrapper type:

```rust
// in src/task/mod.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TaskId(u64);
```

The `TaskId` struct is a simple wrapper type around `u64`. We derive a number of traits for it to make it printable, copyable, comparable, and sortable. The last one, `Ord`, is important because we want to use `TaskId` as the key type of a [`BTreeMap`] in a moment.

[`BTreeMap`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html

To create a new unique ID, we create a `TaskId::new` function:

```rust
use core::sync::atomic::{AtomicU64, Ordering};

impl TaskId {
    fn new() -> Self {
        static NEXT_ID: AtomicU64 = AtomicU64::new(0);
        TaskId(NEXT_ID.fetch_add(1, Ordering::Relaxed))
    }
}
```

The function uses a static `NEXT_ID` variable of type [`AtomicU64`] to ensure that each ID is assigned only once. The [`fetch_add`] method atomically increases the value and returns the previous value in one atomic operation. This means that even when the `TaskId::new` method is called in parallel, every ID is returned exactly once. The [`Ordering`] parameter defines whether the compiler is allowed to reorder the `fetch_add` operation in the instruction stream. Since we only require that the ID be unique, the `Relaxed` ordering with the weakest requirements is enough in this case.

[`AtomicU64`]: https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html
[`fetch_add`]: https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html#method.fetch_add
[`Ordering`]: https://doc.rust-lang.org/core/sync/atomic/enum.Ordering.html

We can now extend our `Task` type with an additional `id` field:

```rust
// in src/task/mod.rs

pub struct Task {
    id: TaskId, // new
    future: Pin<Box<dyn Future<Output = ()>>>,
}

impl Task {
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task {
            id: TaskId::new(), // new
            future: Box::pin(future),
        }
    }
}
```

The new `id` field makes it possible to uniquely name a task, which is required for waking a specific task.

#### The `Executor` Type

We create our new `Executor` type in a `task::executor` module:

```rust
// in src/task/mod.rs

pub mod executor;
```

```rust
// in src/task/executor.rs

use super::{Task, TaskId};
use alloc::{collections::BTreeMap, sync::Arc};
use core::task::Waker;
use crossbeam_queue::ArrayQueue;

pub struct Executor {
    tasks: BTreeMap<TaskId, Task>,
    task_queue: Arc<ArrayQueue<TaskId>>,
    waker_cache: BTreeMap<TaskId, Waker>,
}

impl Executor {
    pub fn new() -> Self {
        Executor {
            tasks: BTreeMap::new(),
            task_queue: Arc::new(ArrayQueue::new(100)),
            waker_cache: BTreeMap::new(),
        }
    }
}
```

Instead of storing tasks in a [`VecDeque`] like we did for our `SimpleExecutor`, we use a `task_queue` of task IDs and a [`BTreeMap`] named `tasks` that contains the actual `Task` instances. The map is indexed by the `TaskId` to allow efficient continuation of a specific task.

The `task_queue` field is an [`ArrayQueue`] of task IDs, wrapped into the [`Arc`] type that implements **reference counting**. Reference counting makes it possible to share ownership of the value among multiple owners. It works by allocating the value on the heap and counting the number of active references to it. When the number of active references reaches zero, the value is no longer needed and can be deallocated.

We use this `Arc<ArrayQueue>` type for the `task_queue` because it will be shared between the executor and wakers. The idea is that the wakers push the ID of the woken task to the queue. The executor sits on the receiving end of the queue, retrieves the woken tasks by their ID from the `tasks` map, and then runs them. The reason for using a fixed-size queue instead of an unbounded queue such as [`SegQueue`] is that interrupt handlers, which must not allocate, push to this queue.

In addition to the `task_queue` and the `tasks` map, the `Executor` type has a `waker_cache` field that is also a map. This map caches the [`Waker`] of a task after its creation. This has two reasons: First, it improves performance by reusing the same waker for multiple wakeups of the same task instead of creating a new waker each time. Second, it ensures that reference-counted wakers are not deallocated inside interrupt handlers because that could lead to deadlocks (there are more details on this below).

[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html
[`SegQueue`]: https://docs.rs/crossbeam-queue/0.2.1/crossbeam_queue/struct.SegQueue.html

To create an `Executor`, we provide a simple `new` function. We choose a capacity of 100 for the `task_queue`, which should be more than enough for the foreseeable future. In case our system has more than 100 concurrent tasks at some point, we can easily increase this size.

#### Spawning Tasks

As for the `SimpleExecutor`, we provide a `spawn` method on our `Executor` type that adds a given task to the `tasks` map and immediately wakes it by pushing its ID to the `task_queue`:

```rust
// in src/task/executor.rs

impl Executor {
    pub fn spawn(&mut self, task: Task) {
        let task_id = task.id;
        if self.tasks.insert(task.id, task).is_some() {
            panic!("task with same ID already in tasks");
        }
        self.task_queue.push(task_id).expect("queue full");
    }
}
```

If there is already a task with the same ID in the map, the [`BTreeMap::insert`] method returns it. This should never happen since each task has a unique ID, so we panic in this case because it indicates a bug in our code. Similarly, we panic when the `task_queue` is full.

#### Running Tasks

To execute all tasks in the `task_queue`, we create a private `run_ready_tasks` method:

```rust
// in src/task/executor.rs

use core::task::{Context, Poll};

impl Executor {
    fn run_ready_tasks(&mut self) {
        // destructure `self` to avoid borrow checker errors
        let Self {
            tasks,
            task_queue,
            waker_cache,
        } = self;

        while let Ok(task_id) = task_queue.pop() {
            let task = match tasks.get_mut(&task_id) {
                Some(task) => task,
                None => continue, // task no longer exists
            };
            let waker = waker_cache
                .entry(task_id)
                .or_insert_with(|| TaskWaker::new(task_id, task_queue.clone()));
            let mut context = Context::from_waker(waker);
            match task.poll(&mut context) {
                Poll::Ready(()) => {
                    // task done -> remove it and its cached waker
                    tasks.remove(&task_id);
                    waker_cache.remove(&task_id);
                }
                Poll::Pending => {}
            }
        }
    }
}
```

The basic idea of this function is similar to our `SimpleExecutor`. `task_queue`
にあるすべてのタスクをループし、各タスクのwakerを作成し、ポーリングします。しかし、自分で保留中のタスクを `task_queue` の最後に戻すのではなく、`TaskWaker` の実装に、待機中のタスクをqueueに戻すことを任せます。このwaker型の実装については、後ほどご紹介します。 この `run_ready_tasks` メソッドの実装の詳細を見てみましょう: - 借用チェッカのエラーを避けるために、[**分配 (destructuring)**][_destructuring_]を使って`self`を3つのフィールドに分割しています。というのも、私たちの実装ではクロージャの中から `self.task_queue` にアクセスする必要があるのですが、今のRustはそれを行うために `self` を完全に借用してしまうのです。これは借用チェッカの基本的な問題であり、[RFC 2229]が[実装][RFC 2229 impl]されたときに解決されるでしょう。 - popされた各タスクIDに対して、`tasks`マップから対応するタスクの可変参照を取得します。私達の `ScancodeStream` の実装では、タスクをスリープさせる必要があるかどうかをチェックする前にwakeupを登録するので、もはや存在しないタスクに対してwakeupが発生することがあります。この場合には、単純にwakeupを無視して、キューから次のIDを取得して続行します。 - poll毎にwakerを作成することによるパフォーマンスのオーバーヘッドを避けるために、`waker_cache`マップを使用して、作成された各タスクのwakerを保存します。これには、[`BTreeMap::entry`]メソッドと[`Entry::or_insert_with`]を組み合わせて使用し、新しいwakerがまだ存在しない場合には新しいwakerを作成して、そのwakerへのミュータブルな参照を取得します。新しいwakerを作成するには、`task_queue` をクローンして、タスク ID と共に `TaskWaker::new` 関数に渡します (実装は後述)。`task_queue` は `Arc` にラップされているので、`clone` は値の参照カウントを増やすだけで、同じヒープに割り当てられたキューを指しています。このようにwakerを再利用することは、すべてのwakerの実装で可能なわけではありませんが、私たちの `TaskWaker` 型ではそれが可能であることに注意してください。 [_destructuring_]: https://doc.rust-jp.rs/book-ja/ch18-03-pattern-syntax.html#%E6%A7%8B%E9%80%A0%E4%BD%93%E3%82%92%E5%88%86%E9%85%8D%E3%81%99%E3%82%8B [RFC 2229]: https://github.com/rust-lang/rfcs/pull/2229 [RFC 2229 impl]: https://github.com/rust-lang/rust/issues/53488 [`BTreeMap::entry`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.entry [`Entry::or_insert_with`]: https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html#method.or_insert_with タスクは `Poll::Ready` を返すと終了します。その場合、[`BTreeMap::remove`]メソッドを使って `tasks` マップからタスクを削除します。また、キャッシュされたwakerがあれば、それも削除します。 [`BTreeMap::remove`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.remove #### Wakerの設計 wakerの仕事は、起こされたタスクのIDをexecutorの`task_queue`にpushすることです。新しい `TaskWaker` 構造体を作成して、タスクの ID と `task_queue` への参照を格納することで、これを実装します: ```rust // in 
src/task/executor.rs struct TaskWaker { task_id: TaskId, task_queue: Arc>, } ``` `task_queue`の所有権はexecutorとwakerの間で共有されるので、[`Arc`]ラッパー型を使って、参照カウント式の共有所有権を実装します。 [`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html wakeオペレーションの実装は非常にシンプルです: ```rust // in src/task/executor.rs impl TaskWaker { fn wake_task(&self) { self.task_queue.push(self.task_id).expect("task_queue full"); } } ``` 参照されている `task_queue` に `task_id` をpushします。[`ArrayQueue`]型の変更には共有参照だけがあればよいので、このメソッドは `&mut self` ではなく `&self` に実装することができます。 ##### `Wake` Trait Futureのポーリングに`TaskWaker`型を使うには、まずこれを[`Waker`]インスタンスに変換する必要があります。これは [`Future::poll`] メソッドが引数として [`Context`] インスタンスを取り、このインスタンスは `Waker` 型からしか構築できないためです。これは [`RawWaker`] 型の実装を提供することによって可能ですが、代わりに `Arc` ベースの [`Wake`][wake-trait] trait を実装し、標準ライブラリが提供する [`From`] の実装を使用して `Waker` を構築する方が、よりシンプルで安全でしょう。 traitの実装は以下のようにします: [wake-trait]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html ```rust // in src/task/executor.rs use alloc::task::Wake; impl Wake for TaskWaker { fn wake(self: Arc) { self.wake_task(); } fn wake_by_ref(self: &Arc) { self.wake_task(); } } ``` Waker は通常、executorと非同期タスクの間で共有されるので、この trait メソッドでは、`Self` インスタンスを、参照カウントされた所有権を実装する [`Arc`] 型でラップする必要があります。つまり、これらのメソッドを呼び出すためには、`TaskWaker` を `Arc` に移動させる必要があります。 `wake`と`wake_by_ref`メソッドの違いは、後者は`Arc`への参照のみを必要とするのに対し、前者は`Arc`の所有権を取得するため、しばしば参照カウントの増加を必要とすることです。すべての型が参照によるwakeをサポートしているわけではないので、`wake_by_ref` メソッドを実装するかは自由ですが、不必要な参照カウントの変更を避けることができるので、パフォーマンスの向上につながります。今回の例では、両方の trait メソッドで単純に `wake_task` 関数を呼び出すようにします。この関数は、共有の `&self` 参照しか要求しません。 ##### Wakerを生成する `Waker` 型は、`Wake` traitを実装したすべての `Arc` でラップされた値からの [`From`] 変換をサポートしているので、`Executor::run_ready_tasks` メソッドで必要となる `TaskWaker::new` 関数を実装することができます: [`From`]: https://doc.rust-lang.org/nightly/core/convert/trait.From.html ```rust // in src/task/executor.rs impl TaskWaker { fn new(task_id: TaskId, task_queue: Arc>) -> Waker { Waker::from(Arc::new(TaskWaker { task_id, task_queue, })) } } ``` 渡された `task_id` と 
`task_queue` を使って `TaskWaker` を作成します。次に `TaskWaker` を `Arc` で囲み、`Waker::from` の実装を使用してそれを [`Waker`] に変換します。この `from` メソッドは、`TaskWaker` 型の [`RawWakerVTable`] と [`RawWaker`] インスタンスの構築を行います。このメソッドの詳細について興味のある場合は、[`alloc`クレート内での実装][waker-from-impl]をご覧ください。 [waker-from-impl]: https://github.com/rust-lang/rust/blob/cdb50c6f2507319f29104a25765bfb79ad53395c/src/liballoc/task.rs#L58-L87 #### `run`メソッド wakerの実装ができたので、いよいよexecutorの`run`メソッドを構築します: ```rust // in src/task/executor.rs impl Executor { pub fn run(&mut self) -> ! { loop { self.run_ready_tasks(); } } } ``` このメソッドは `run_ready_tasks` 関数の呼び出しをループするだけです。理論的には、`tasks`マップが空になったときにこの関数からリターンすることもできますが、`keyboard_task`が終了しないのでそれは起こらず、よって単純な`loop`で十分です。この関数は決してリターンしませんので、`!`という戻り値の型を使って、コンパイラにこの関数が[発散する (diverging) ][diverging]ことを示します。 [diverging]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html これで、`kernel_main`で、`SimpleExecutor`の代わりに新しい`Executor`を使うように変更することができます: ```rust // in src/main.rs use blog_os::task::executor::Executor; // new fn kernel_main(boot_info: &'static BootInfo) -> ! { // init_heap、test_mainを含む初期化ルーチンを省略 let mut executor = Executor::new(); // new executor.spawn(Task::new(example_task())); executor.spawn(Task::new(keyboard::print_keypresses())); executor.run(); } ``` 必要なのは、インポート部(`use`のところ)と型名を変更することだけです。関数 `run` は発散する関数となっているので、コンパイラはこの関数が決してリターンしないことを認識し、そのため`kernel_main` 関数の最後に `hlt_loop` を呼び出す必要はもうありません。 ここで、`cargo run`を使ってカーネルを実行すると、キーボード入力が変わらず正常に動作することがわかります: ![QEMU printing ".....H...e...l...l..o..... 
...a..g..a....i...n...!"](qemu-keyboard-output-again.gif) しかし、QEMUのCPU使用量は全く減っていません。その理由は、CPUをずっとbusy状態にしているからです。タスクが再び起こされるまでポーリングすることはなくなりましたが、`task_queue`をチェックし続けるbusy loop(忙しないループの意)に入っているのです。この問題を解決するには、やるべき仕事がなくなったらCPUをスリープさせる必要があります。 #### 何もすることがない (idle) ならスリープする 基本的な考え方は、`task_queue`が空になったときに[`hlt`命令][`hlt` instruction]を実行するというものです。この命令は、次の割り込みが来るまでCPUをスリープ状態にします。割り込みが入るとCPUがすぐに活動を再開するので、割り込みハンドラが`task_queue`にpushされたときにも直接反応できるようになっています。 [`hlt` instruction]: https://en.wikipedia.org/wiki/HLT_(x86_instruction) これを実現するために、executorに新しい`sleep_if_idle`メソッドを作成し、`run`メソッドから呼び出します: ```rust // in src/task/executor.rs impl Executor { pub fn run(&mut self) -> ! { loop { self.run_ready_tasks(); self.sleep_if_idle(); // new } } fn sleep_if_idle(&self) { if self.task_queue.is_empty() { x86_64::instructions::hlt(); } } } ``` `sleep_if_idle`は、`task_queue`が空になるまでループする`run_ready_tasks`の直後に呼び出されるので、キューを再度チェックする必要はないと思われるかもしれません。しかし、`run_ready_tasks` がリターンしてきた直後にハードウェア割り込みが発生する可能性があるため、`sleep_if_idle` 関数が呼ばれた時点ではキューに新しいタスクがあるかもしれません。キューがまだ空であった場合のみ、[`x86_64`]クレートが提供する[`instructions::hlt`]ラッパー関数を介して`hlt`命令を実行することで、CPUをスリープさせます。 [`instructions::hlt`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/fn.hlt.html [`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/index.html 残念ながら、この実装には微妙な競合状態が残っています。割り込みは非同期であり、いつでも発生する可能性があるため、`is_empty` のチェックと `hlt` の呼び出しの間に割り込みが発生する可能性があります: ```rust if self.task_queue.is_empty() { /// <--- 割り込みがここで起きる可能性があります x86_64::instructions::hlt(); } ``` この割り込みが`task_queue`にpushされた場合、タスクの準備ができているにもかかわらず、CPUをスリープ状態にしてしまいます。最悪の場合、キーボード割り込みの処理が次のkeypressや次のタイマー割り込みまで遅れることになります。では、これを防ぐにはどうしたらよいでしょうか? 
その答えは、チェックの前にCPUの割り込みを無効にし、`hlt`命令と一緒にアトミックに再度有効にすることです。この方法では、その間に発生するすべての割り込みが `hlt` 命令の後に遅延されるため、wakeupが失敗することはありません。この方法を実装するには、[`x86_64`]クレートが提供する[`interrupts::enable_and_hlt`][`enable_and_hlt`]関数を使用します。 [`enable_and_hlt`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.enable_and_hlt.html 更新された `sleep_if_idle` 関数の実装は次のようになります: ```rust // in src/task/executor.rs impl Executor { fn sleep_if_idle(&self) { use x86_64::instructions::interrupts::{self, enable_and_hlt}; interrupts::disable(); if self.task_queue.is_empty() { enable_and_hlt(); } else { interrupts::enable(); } } } ``` 競合状態を避けるために、`task_queue` が空であるかどうかを確認する前に、割り込みを無効にします。空いていれば、[`enable_and_hlt`]関数を使用して、単一のアトミック操作として割り込みを有効にしCPUをスリープさせます。キューが空でない場合は、`run_ready_tasks` がリターンしてきた後に、割り込みがタスクを起動したことを意味します。その場合は、再び割り込みを有効にして、`hlt`を実行せずにすぐに実行を継続します。 これで、実行することがないときには、executorが適切にCPUをスリープ状態にするようになりました。再び`cargo run`を使ってカーネルを実行すると、QEMUプロセスのCPU使用率が大幅に低下していることがわかります。 #### 考えられる機能拡張 executorは、効率的な方法でタスクを実行できるようになりました。待機中のタスクのポーリングを避けるためにwaker通知を利用し、現在やるべきことがないときはCPUをスリープさせます。しかし、このexecutorはまだ非常に基本的なものであり、機能を拡張する方法はたくさんあります: - **スケジューリング**: 現在、我々は[`VecDeque`]型を使用して、先入れ先出し(FIFO)戦略を`task_queue`に実装しています。これはしばしば **ラウンドロビン (round robin)** スケジューリングとも呼ばれます。この戦略は、すべてのワークロードにとって最も効率的であるとは限りません。例えば、レイテンシーが重要なタスクや、I/Oを大量に行うタスクを優先させることは意味があるかもしれません。詳しくは、[_Operating Systems: Three Easy Pieces_]の[スケジューリングの章][scheduling chapter]や、[スケジューリングに関するWikipediaの記事][scheduling-wiki]をご覧ください。 - **タスクの発生 (spawn)**: 現在、私たちの `Executor::spawn` メソッドは `&mut self` の参照を必要とするため、`run` メソッドを開始した後は利用できません。この問題を解決するには、追加で `Spawner` 型を作成します。この型は、ある種のキューをexecutorと共有し、タスク自身の中からタスクを作成することができます。このキューには、例えば `task_queue` を直接使用することもできますし、executorが実行ループの中でチェックする別のキューを使用することもできます。 - **スレッドを活用する**: まだスレッドのサポートはしていませんが、次の投稿で追加する予定です。これにより、複数のexecutorのインスタンスを異なるスレッドで起動することが可能になります。このアプローチの利点は、複数のタスクが同時に実行できるため、長時間実行するタスクによって課せられる遅延を減らすことができることです。また、この方法では、複数のCPUコアを利用することもできます。 - **負荷の分配**: 
スレッドをサポートするようにした場合、すべてのCPUコアが利用されるように、executor間でどのようにタスクを分配するかが重要になります。このための一般的なテクニックは、[_work stealing_]です。 [scheduling chapter]: http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-sched.pdf [_Operating Systems: Three Easy Pieces_]: http://pages.cs.wisc.edu/~remzi/OSTEP/ [scheduling-wiki]: https://en.wikipedia.org/wiki/Scheduling_(computing) [_work stealing_]: https://en.wikipedia.org/wiki/Work_stealing ## まとめ この記事ではまず、**マルチタスク**について紹介し、実行中のタスクを定期的に強制的に中断させる**非協調的**マルチタスクと、タスクが自発的にCPUの制御を放棄するまで実行させてやる**協調的**マルチタスクの違いを説明しました。 次に、Rustがサポートする**async/await**がどのようにして協調的マルチタスクの言語レベルの実装を提供しているかを調べました。Rustは、非同期タスクを抽象化するポーリングベースの`Future` traitをベースにして実装しています。async/awaitを使うと、通常の同期コードとほぼ同じようにfutureを扱うことができます。違いは、非同期関数が再び `Future` を返すことで、それを実行するためにはどこかの時点でこの`Future`をexecutorに追加する必要があります。 舞台裏では、コンパイラが async/await コードを **ステートマシン** に変換し、各 `.await` オペレーションが可能な待ち状態に対応するようにします。対象のプログラムに関する知識を活用することで、コンパイラは各待ち状態に必要な最小限の状態のみを保存することができ、その結果、タスクあたりのメモリ消費量は非常に小さくなります。一つの課題は、生成されたステートマシンに **自己参照**構造体が含まれている可能性があることです。例えば、非同期関数のローカル変数の一方が他方を参照している場合などです。ポインタの無効化を防ぐために、Rustは`Pin`型を用いて、futureが最初にポーリングされた後は、メモリ内で移動できないようにしています。 私たちの**実装**では、まず、`Waker`型を全く使わずに、busy loopですべてのspawnされたタスクをポーリングする非常に基本的なexecutorを作成しました。次に、非同期のキーボードタスクを実装することで、waker通知の利点を示しました。このタスクは、`crossbeam`クレートが提供する`ArrayQueue`というmutexを使用しない型を使って、静的な`SCANCODE_QUEUE`を定義します。キーボード割り込みハンドラは、キーの入力を直接処理する代わりに、受信したすべてのスキャンコードをキューに入れ、登録されている `Waker` を起こして、新しい入力が利用可能であることを通知します。受信側では、`ScancodeStream`型を作成して、キュー内の次のスキャンコードに変化する`Future`を提供しています。これにより、非同期の `print_keypresses` タスクを作成することができました。このタスクは、キュー内のスキャンコードを解釈して出力するために async/await を使用します。 キーボードタスクのwaker通知を利用するために、新しい `Executor` 型を作成しました。この型は、準備のできたタスクに `Arc` で共有された `task_queue` を使用します。 私たちは`TaskWaker`型を実装し、起こされたタスクのIDを直接この`task_queue`にpushし、それをexecutorが再びポーリングするようにしました。また、実行可能なタスクがないときに電力を節約するために、`hlt`命令を用いてCPUをスリープさせる機能を追加しました。最後に、マルチコアへの対応など、executorの拡張の可能性について述べました。 ## 次は? 
async/waitを使うことで、カーネルで基本的な協調的マルチタスクをサポートできるようになりました。協調的マルチタスクは非常に効率的ですが、個々のタスクが長く実行しすぎる場合、他のタスクの実行が妨げられ、遅延の問題が発生します。このため、カーネルに非協調的マルチタスクのサポートを追加することは理にかなっています。 次回は、非協調的マルチタスクの最も一般的な形態である **スレッド** を紹介します。スレッドは、長時間実行されるタスクの問題を解決するだけでなく、将来的に複数のCPUコアを利用したり、信頼できないユーザープログラムを実行したりするための準備にもなります。 ================================================ FILE: blog/content/edition-2/posts/12-async-await/index.md ================================================ +++ title = "Async/Await" weight = 12 path = "async-await" date = 2020-03-27 [extra] chapter = "Multitasking" +++ In this post, we explore _cooperative multitasking_ and the _async/await_ feature of Rust. We take a detailed look at how async/await works in Rust, including the design of the `Future` trait, the state machine transformation, and _pinning_. We then add basic support for async/await to our kernel by creating an asynchronous keyboard task and a basic executor. This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-12`][post branch] branch. [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-12 ## Multitasking One of the fundamental features of most operating systems is [_multitasking_], which is the ability to execute multiple tasks concurrently. For example, you probably have other programs open while looking at this post, such as a text editor or a terminal window. Even if you have only a single browser window open, there are probably various background tasks for managing your desktop windows, checking for updates, or indexing files. [_multitasking_]: https://en.wikipedia.org/wiki/Computer_multitasking While it seems like all tasks run in parallel, only a single task can be executed on a CPU core at a time. 
To create the illusion that the tasks run in parallel, the operating system rapidly switches between active tasks so that each one can make a bit of progress. Since computers are fast, we don't notice these switches most of the time.

While single-core CPUs can only execute a single task at a time, multi-core CPUs can run multiple tasks in a truly parallel way. For example, a CPU with 8 cores can run 8 tasks at the same time. We will explain how to set up multi-core CPUs in a future post. For this post, we will focus on single-core CPUs for simplicity. (It's worth noting that all multi-core CPUs start with only a single active core, so we can treat them as single-core CPUs for now.)

There are two forms of multitasking: _Cooperative_ multitasking requires tasks to regularly give up control of the CPU so that other tasks can make progress. _Preemptive_ multitasking uses operating system functionality to switch threads at arbitrary points in time by forcibly pausing them. In the following, we will explore the two forms of multitasking in more detail and discuss their respective advantages and drawbacks.

### Preemptive Multitasking

The idea behind preemptive multitasking is that the operating system controls when to switch tasks. For that, it utilizes the fact that it regains control of the CPU on each interrupt. This makes it possible to switch tasks whenever new input is available to the system. For example, it would be possible to switch tasks when the mouse is moved or a network packet arrives. The operating system can also determine the exact time that a task is allowed to run by configuring a hardware timer to send an interrupt after that time.

The following graphic illustrates the task switching process on a hardware interrupt:

![](regain-control-on-interrupt.svg)

In the first row, the CPU is executing task `A1` of program `A`. All other tasks are paused. In the second row, a hardware interrupt arrives at the CPU.
As described in the [_Hardware Interrupts_] post, the CPU immediately stops the execution of task `A1` and jumps to the interrupt handler defined in the interrupt descriptor table (IDT). Through this interrupt handler, the operating system now has control of the CPU again, which allows it to switch to task `B1` instead of continuing task `A1`.

[_Hardware Interrupts_]: @/edition-2/posts/07-hardware-interrupts/index.md

#### Saving State

Since tasks are interrupted at arbitrary points in time, they might be in the middle of some calculations. In order to be able to resume them later, the operating system must back up the whole state of the task, including its [call stack] and the values of all CPU registers. This process is called a [_context switch_].

[call stack]: https://en.wikipedia.org/wiki/Call_stack
[_context switch_]: https://en.wikipedia.org/wiki/Context_switch

As the call stack can be very large, the operating system typically sets up a separate call stack for each task instead of backing up the call stack content on each task switch. Such a task with its own stack is called a [_thread of execution_] or _thread_ for short. By using a separate stack for each task, only the register contents need to be saved on a context switch (including the program counter and stack pointer). This approach minimizes the performance overhead of a context switch, which is very important since context switches can occur up to 100 times per second.

[_thread of execution_]: https://en.wikipedia.org/wiki/Thread_(computing)

#### Discussion

The main advantage of preemptive multitasking is that the operating system can fully control the allowed execution time of a task. This way, it can guarantee that each task gets a fair share of the CPU time, without the need to trust the tasks to cooperate. This is especially important when running third-party tasks or when multiple users share a system.

The disadvantage of preemption is that each task requires its own stack.
Compared to a shared stack, this results in higher memory usage per task and often limits the number of tasks in the system. Another disadvantage is that the operating system always has to save the complete CPU register state on each task switch, even if the task only used a small subset of the registers. Preemptive multitasking and threads are fundamental components of an operating system because they make it possible to run untrusted userspace programs. We will discuss these concepts in full detail in future posts. For this post, however, we will focus on cooperative multitasking, which also provides useful capabilities for our kernel. ### Cooperative Multitasking Instead of forcibly pausing running tasks at arbitrary points in time, cooperative multitasking lets each task run until it voluntarily gives up control of the CPU. This allows tasks to pause themselves at convenient points in time, for example, when they need to wait for an I/O operation anyway. Cooperative multitasking is often used at the language level, like in the form of [coroutines] or [async/await]. The idea is that either the programmer or the compiler inserts [_yield_] operations into the program, which give up control of the CPU and allow other tasks to run. For example, a yield could be inserted after each iteration of a complex loop. [coroutines]: https://en.wikipedia.org/wiki/Coroutine [async/await]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html [_yield_]: https://en.wikipedia.org/wiki/Yield_(multithreading) It is common to combine cooperative multitasking with [asynchronous operations]. Instead of waiting until an operation is finished and preventing other tasks from running during this time, asynchronous operations return a "not ready" status if the operation is not finished yet. In this case, the waiting task can execute a yield operation to let other tasks run. 
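
To make the "not ready" idea concrete, here is a minimal sketch of a non-blocking operation and a task that yields while waiting. The names `Status`, `SlowDevice`, and `try_read` are hypothetical and exist only for this illustration; they are not part of the kernel or of any library. The sketch also uses plain `std` Rust rather than the `no_std` environment of our kernel.

```rust
/// Result of a non-blocking operation (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Status<T> {
    Ready(T),
    NotReady,
}

/// A made-up device that needs a few checks before its data is available.
struct SlowDevice {
    remaining_checks: u32,
}

impl SlowDevice {
    /// Returns immediately instead of blocking until the data arrives.
    fn try_read(&mut self) -> Status<u8> {
        if self.remaining_checks == 0 {
            Status::Ready(42)
        } else {
            self.remaining_checks -= 1;
            Status::NotReady
        }
    }
}

/// Reads from the device and counts how often the task yielded.
fn run() -> (u8, u32) {
    let mut device = SlowDevice { remaining_checks: 3 };
    let mut yields = 0;
    loop {
        match device.try_read() {
            Status::Ready(value) => return (value, yields),
            // Yield point: a real task would give up the CPU here so that
            // other tasks can run. We only count the yields.
            Status::NotReady => yields += 1,
        }
    }
}

fn main() {
    let (value, yields) = run();
    println!("read {} after yielding {} times", value, yields);
}
```

The important property is that `try_read` never blocks, so the decision about what to do while the data is unavailable stays with the caller.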
[asynchronous operations]: https://en.wikipedia.org/wiki/Asynchronous_I/O

#### Saving State

Since tasks define their pause points themselves, they don't need the operating system to save their state. Instead, they can save exactly the state they need for continuation before they pause themselves, which often results in better performance. For example, a task that just finished a complex computation might only need to back up the final result of the computation since it does not need the intermediate results anymore.

Language-supported implementations of cooperative tasks are often even able to back up the required parts of the call stack before pausing. As an example, Rust's async/await implementation stores all local variables that are still needed in an automatically generated struct (see below). By backing up the relevant parts of the call stack before pausing, all tasks can share a single call stack, which results in much lower memory consumption per task. This makes it possible to create an almost arbitrary number of cooperative tasks without running out of memory.

#### Discussion

The drawback of cooperative multitasking is that an uncooperative task can potentially run for an unlimited amount of time. Thus, a malicious or buggy task can prevent other tasks from running and slow down or even block the whole system. For this reason, cooperative multitasking should only be used when all tasks are known to cooperate. As a counterexample, it's not a good idea to make the operating system rely on the cooperation of arbitrary user-level programs.

However, the strong performance and memory benefits of cooperative multitasking make it a good approach for usage _within_ a program, especially in combination with asynchronous operations. Since an operating system kernel is a performance-critical program that interacts with asynchronous hardware, cooperative multitasking seems like a good approach for implementing concurrency.
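
The following sketch illustrates the two points above: each task keeps only the state it needs in a small struct, and all tasks share one call stack because control returns to the scheduler at every yield. The `CoopTask` trait, the `Step` enum, and the `Counter` task are hypothetical names invented for this illustration, and the code uses `std` for brevity. It is not how Rust's async/await is implemented, only a hand-written analogue.

```rust
use std::collections::VecDeque;

/// What a task reports when it hands control back to the scheduler.
enum Step {
    Yielded,
    Done,
}

/// A cooperative task: runs a little, then returns voluntarily.
trait CoopTask {
    fn step(&mut self, log: &mut Vec<String>) -> Step;
}

/// The saved state of this task is just two integers and a name.
struct Counter {
    name: &'static str,
    next: u32,
    limit: u32,
}

impl CoopTask for Counter {
    fn step(&mut self, log: &mut Vec<String>) -> Step {
        log.push(format!("{}{}", self.name, self.next));
        self.next += 1;
        if self.next == self.limit {
            Step::Done
        } else {
            Step::Yielded
        }
    }
}

/// Round-robin scheduler: runs each task until it yields or finishes.
fn run_all(mut queue: VecDeque<Box<dyn CoopTask>>) -> Vec<String> {
    let mut log = Vec::new();
    while let Some(mut task) = queue.pop_front() {
        match task.step(&mut log) {
            Step::Yielded => queue.push_back(task), // run again later
            Step::Done => {}                        // drop finished task
        }
    }
    log
}

fn demo() -> Vec<String> {
    let mut queue: VecDeque<Box<dyn CoopTask>> = VecDeque::new();
    queue.push_back(Box::new(Counter { name: "A", next: 0, limit: 2 }));
    queue.push_back(Box::new(Counter { name: "B", next: 0, limit: 3 }));
    run_all(queue)
}

fn main() {
    println!("{:?}", demo());
}
```

Note that a `step` implementation that never returns would stall every other task in the queue, which is exactly the drawback described in the discussion above.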
## Async/Await in Rust The Rust language provides first-class support for cooperative multitasking in the form of async/await. Before we can explore what async/await is and how it works, we need to understand how _futures_ and asynchronous programming work in Rust. ### Futures A _future_ represents a value that might not be available yet. This could be, for example, an integer that is computed by another task or a file that is downloaded from the network. Instead of waiting until the value is available, futures make it possible to continue execution until the value is needed. #### Example The concept of futures is best illustrated with a small example: ![Sequence diagram: main calls `read_file` and is blocked until it returns; then it calls `foo()` and is also blocked until it returns. The same process is repeated, but this time `async_read_file` is called, which directly returns a future; then `foo()` is called again, which now runs concurrently with the file load. The file is available before `foo()` returns.](async-example.svg) This sequence diagram shows a `main` function that reads a file from the file system and then calls a function `foo`. This process is repeated two times: once with a synchronous `read_file` call and once with an asynchronous `async_read_file` call. With the synchronous call, the `main` function needs to wait until the file is loaded from the file system. Only then can it call the `foo` function, which requires it to again wait for the result. With the asynchronous `async_read_file` call, the file system directly returns a future and loads the file asynchronously in the background. This allows the `main` function to call `foo` much earlier, which then runs in parallel with the file load. In this example, the file load even finishes before `foo` returns, so `main` can directly work with the file without further waiting after `foo` returns. 
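
The two halves of the diagram can be imitated with ordinary OS threads. This is only an analogy under stated assumptions: `read_file_slowly` and `foo` are made-up stand-ins that sleep instead of doing real work, and a `std::thread::JoinHandle` plays the role of the future. Rust's actual futures are poll-based and need no threads, and our kernel has no threads yet, so this sketch only runs on a hosted system with `std`.

```rust
use std::thread;
use std::time::Duration;

/// Stand-in for a slow file read (hypothetical helper).
fn read_file_slowly() -> String {
    thread::sleep(Duration::from_millis(50));
    String::from("file content")
}

/// Stand-in for the `foo` function from the diagram.
fn foo() -> u32 {
    thread::sleep(Duration::from_millis(50));
    7
}

/// Synchronous variant: `foo` starts only after the file is loaded.
fn sequential() -> (String, u32) {
    let content = read_file_slowly();
    let result = foo();
    (content, result)
}

/// Asynchronous variant: the load runs in the background while `foo` runs.
fn overlapped() -> (String, u32) {
    // The handle represents a value that is not available yet.
    let handle = thread::spawn(read_file_slowly);
    let result = foo();
    // Only now do we need the value, so only now do we wait for it.
    let content = handle.join().expect("reader thread panicked");
    (content, result)
}

fn main() {
    println!("{:?}", sequential());
    println!("{:?}", overlapped());
}
```

Both variants produce the same values; the difference is that the second one overlaps the waiting time of the file load with the execution of `foo`.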
#### Futures in Rust

In Rust, futures are represented by the [`Future`] trait, which looks like this:

[`Future`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html

```rust
pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>;
}
```

The [associated type] `Output` specifies the type of the asynchronous value. For example, the `async_read_file` function in the diagram above would return a `Future` instance with `Output` set to `File`.

[associated type]: https://doc.rust-lang.org/book/ch20-02-advanced-traits.html#associated-types

The [`poll`] method allows checking whether the value is already available. It returns a [`Poll`] enum, which looks like this:

[`poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll
[`Poll`]: https://doc.rust-lang.org/nightly/core/task/enum.Poll.html

```rust
pub enum Poll<T> {
    Ready(T),
    Pending,
}
```

When the value is already available (e.g. the file was fully read from disk), it is returned wrapped in the `Ready` variant. Otherwise, the `Pending` variant is returned, which signals to the caller that the value is not yet available.

The `poll` method takes two arguments: `self: Pin<&mut Self>` and `cx: &mut Context`. The former behaves similarly to a normal `&mut self` reference, except that the `Self` value is [_pinned_] to its memory location. Understanding `Pin` and why it is needed is difficult without understanding how async/await works first. We will therefore explain it later in this post.

[_pinned_]: https://doc.rust-lang.org/nightly/core/pin/index.html

The purpose of the `cx: &mut Context` parameter is to pass a [`Waker`] instance to the asynchronous task, e.g., the file system load. This `Waker` allows the asynchronous task to signal that it (or a part of it) is finished, e.g., that the file was loaded from disk. Since the main task knows that it will be notified when the `Future` is ready, it does not need to call `poll` over and over again.
We will explain this process in more detail later in this post when we implement our own waker type. [`Waker`]: https://doc.rust-lang.org/nightly/core/task/struct.Waker.html ### Working with Futures We now know how futures are defined and understand the basic idea behind the `poll` method. However, we still don't know how to effectively work with futures. The problem is that futures represent the results of asynchronous tasks, which might not be available yet. In practice, however, we often need these values directly for further calculations. So the question is: How can we efficiently retrieve the value of a future when we need it? #### Waiting on Futures One possible answer is to wait until a future becomes ready. This could look something like this: ```rust let future = async_read_file("foo.txt"); let file_content = loop { match future.poll(…) { Poll::Ready(value) => break value, Poll::Pending => {}, // do nothing } } ``` Here we _actively_ wait for the future by calling `poll` over and over again in a loop. The arguments to `poll` don't matter here, so we omitted them. While this solution works, it is very inefficient because we keep the CPU busy until the value becomes available. A more efficient approach could be to _block_ the current thread until the future becomes available. This is, of course, only possible if you have threads, so this solution does not work for our kernel, at least not yet. Even on systems where blocking is supported, it is often not desired because it turns an asynchronous task into a synchronous task again, thereby inhibiting the potential performance benefits of parallel tasks. #### Future Combinators An alternative to waiting is to use future combinators. Future combinators are methods like `map` that allow chaining and combining futures together, similar to the methods of the [`Iterator`] trait. Instead of waiting on the future, these combinators return a future themselves, which applies the mapping operation on `poll`. 
[`Iterator`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html

As an example, a simple `string_len` combinator for converting a `Future<Output = String>` to a `Future<Output = usize>` could look like this:

```rust
struct StringLen<F> {
    inner_future: F,
}

impl<F> Future for StringLen<F> where F: Future<Output = String> {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.inner_future.poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(string: impl Future<Output = String>)
    -> impl Future<Output = usize>
{
    StringLen {
        inner_future: string,
    }
}

// Usage
fn file_len() -> impl Future<Output = usize> {
    let file_content_future = async_read_file("foo.txt");
    string_len(file_content_future)
}
```

This code does not quite work because it does not handle [_pinning_], but it suffices as an example. The basic idea is that the `string_len` function wraps a given `Future` instance into a new `StringLen` struct, which also implements `Future`. When the wrapped future is polled, it polls the inner future. If the value is not ready yet, `Poll::Pending` is returned from the wrapped future too. If the value is ready, the string is extracted from the `Poll::Ready` variant and its length is calculated. Afterwards, it is wrapped in `Poll::Ready` again and returned.

[_pinning_]: https://doc.rust-lang.org/stable/core/pin/index.html

With this `string_len` function, we can calculate the length of an asynchronous string without waiting for it. Since the function returns a `Future` again, the caller can't work directly on the returned value, but needs to use combinator functions again. This way, the whole call graph becomes asynchronous and we can efficiently wait for multiple futures at once at some point, e.g., in the main function.

Because manually writing combinator functions is difficult, they are often provided by libraries. While the Rust standard library itself provides no combinator methods yet, the semi-official (and `no_std` compatible) [`futures`] crate does.
Its [`FutureExt`] trait provides high-level combinator methods such as [`map`] or [`then`], which can be used to manipulate the result with arbitrary closures.

[`futures`]: https://docs.rs/futures/0.3.4/futures/
[`FutureExt`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html
[`map`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.map
[`then`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.then

##### Advantages

The big advantage of future combinators is that they keep the operations asynchronous. In combination with asynchronous I/O interfaces, this approach can lead to very high performance. The fact that future combinators are implemented as normal structs with trait implementations allows the compiler to optimize them aggressively. For more details, see the [_Zero-cost futures in Rust_] post, which announced the addition of futures to the Rust ecosystem.

[_Zero-cost futures in Rust_]: https://aturon.github.io/blog/2016/08/11/futures/

##### Drawbacks

While future combinators make it possible to write very efficient code, they can be difficult to use in some situations because of the type system and the closure-based interface. For example, consider code like this:

```rust
fn example(min_len: usize) -> impl Future<Output = String> {
    async_read_file("foo.txt").then(move |content| {
        if content.len() < min_len {
            Either::Left(async_read_file("bar.txt").map(|s| content + &s))
        } else {
            Either::Right(future::ready(content))
        }
    })
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=91fc09024eecb2448a85a7ef6a97b8d8))

Here we read the file `foo.txt` and then use the [`then`] combinator to chain a second future based on the file content. If the content length is smaller than the given `min_len`, we read a different `bar.txt` file and append it to `content` using the [`map`] combinator. Otherwise, we return only the content of `foo.txt`.
We need to use the [`move` keyword] for the closure passed to `then` because otherwise there would be a lifetime error for `min_len`. The reason for the [`Either`] wrapper is that `if` and `else` blocks must always have the same type. Since we return different future types in the blocks, we must use the wrapper type to unify them into a single type. The [`ready`] function wraps a value into a future that is immediately ready. The function is required here because the `Either` wrapper expects that the wrapped value implements `Future`.

[`move` keyword]: https://doc.rust-lang.org/std/keyword.move.html
[`Either`]: https://docs.rs/futures/0.3.4/futures/future/enum.Either.html
[`ready`]: https://docs.rs/futures/0.3.4/futures/future/fn.ready.html

As you can imagine, this can quickly lead to very complex code for larger projects. It gets especially complicated if borrowing and different lifetimes are involved. For this reason, a lot of work was invested in adding support for async/await to Rust, with the goal of making asynchronous code radically simpler to write.

### The Async/Await Pattern

The idea behind async/await is to let the programmer write code that _looks_ like normal synchronous code, but is turned into asynchronous code by the compiler. It works based on the two keywords `async` and `await`. The `async` keyword can be used in a function signature to turn a synchronous function into an asynchronous function that returns a future:

```rust
async fn foo() -> u32 {
    0
}

// the above is roughly translated by the compiler to:
fn foo() -> impl Future<Output = u32> {
    future::ready(0)
}
```

This keyword alone wouldn't be that useful.
However, inside `async` functions, the `await` keyword can be used to retrieve the asynchronous value of a future: ```rust async fn example(min_len: usize) -> String { let content = async_read_file("foo.txt").await; if content.len() < min_len { content + &async_read_file("bar.txt").await } else { content } } ``` ([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=d93c28509a1c67661f31ff820281d434)) This function is a direct translation of the `example` function from [above](#drawbacks) that used combinator functions. Using the `.await` operator, we can retrieve the value of a future without needing any closures or `Either` types. As a result, we can write our code like we write normal synchronous code, with the difference that _this is still asynchronous code_. #### State Machine Transformation Behind the scenes, the compiler converts the body of the `async` function into a [_state machine_], with each `.await` call representing a different state. For the above `example` function, the compiler creates a state machine with the following four states: [_state machine_]: https://en.wikipedia.org/wiki/Finite-state_machine ![Four states: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-states.svg) Each state represents a different pause point in the function. The _"Start"_ and _"End"_ states represent the function at the beginning and end of its execution. The _"Waiting on foo.txt"_ state represents that the function is currently waiting for the first `async_read_file` result. Similarly, the _"Waiting on bar.txt"_ state represents the pause point where the function is waiting on the second `async_read_file` result. 
The state machine implements the `Future` trait by making each `poll` call a possible state transition:

![Four states and their transitions: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-basic.svg)

The diagram uses arrows to represent state switches and diamond shapes to represent alternative paths. For example, if the `foo.txt` file is not ready, the path marked with _"no"_ is taken and the _"Waiting on foo.txt"_ state is reached. Otherwise, the _"yes"_ path is taken. The small red diamond without a caption represents the `if content.len() < min_len` branch of the `example` function.

We see that the first `poll` call starts the function and lets it run until it reaches a future that is not ready yet. If all futures on the path are ready, the function can run till the _"End"_ state, where it returns its result wrapped in `Poll::Ready`. Otherwise, the state machine enters a waiting state and returns `Poll::Pending`. On the next `poll` call, the state machine then starts from the last waiting state and retries the last operation.

#### Saving State

In order to be able to continue from the last waiting state, the state machine must keep track of the current state internally. In addition, it must save all the variables that it needs to continue execution on the next `poll` call. This is where the compiler can really shine: Since it knows which variables are used when, it can automatically generate structs with exactly the variables that are needed.
As an example, the compiler generates structs like the following for the above `example` function:

```rust
// The `example` function again so that you don't have to scroll up
async fn example(min_len: usize) -> String {
    let content = async_read_file("foo.txt").await;
    if content.len() < min_len {
        content + &async_read_file("bar.txt").await
    } else {
        content
    }
}

// The compiler-generated state structs:

struct StartState {
    min_len: usize,
}

struct WaitingOnFooTxtState {
    min_len: usize,
    foo_txt_future: impl Future<Output = String>,
}

struct WaitingOnBarTxtState {
    content: String,
    bar_txt_future: impl Future<Output = String>,
}

struct EndState {}
```

In the _"start"_ and _"Waiting on foo.txt"_ states, the `min_len` parameter needs to be stored for the later comparison with `content.len()`. The _"Waiting on foo.txt"_ state additionally stores a `foo_txt_future`, which represents the future returned by the `async_read_file` call. This future needs to be polled again when the state machine continues, so it needs to be saved.

The _"Waiting on bar.txt"_ state contains the `content` variable for the later string concatenation when `bar.txt` is ready. It also stores a `bar_txt_future` that represents the in-progress load of `bar.txt`. The struct does not contain the `min_len` variable because it is no longer needed after the `content.len()` comparison. In the _"end"_ state, no variables are stored because the function has already run to completion.

Keep in mind that this is only an example of the code that the compiler could generate. The struct names and the field layout are implementation details and might be different.

#### The Full State Machine Type

While the exact compiler-generated code is an implementation detail, it helps in understanding to imagine how the generated state machine _could_ look for the `example` function. We already defined the structs representing the different states and containing the required variables.
To create a state machine on top of them, we can combine them into an [`enum`]:

[`enum`]: https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html

```rust
enum ExampleStateMachine {
    Start(StartState),
    WaitingOnFooTxt(WaitingOnFooTxtState),
    WaitingOnBarTxt(WaitingOnBarTxtState),
    End(EndState),
}
```

We define a separate enum variant for each state and add the corresponding state struct to each variant as a field. To implement the state transitions, the compiler generates an implementation of the `Future` trait based on the `example` function:

```rust
impl Future for ExampleStateMachine {
    type Output = String; // return type of `example`

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        loop {
            match self { // TODO: handle pinning
                ExampleStateMachine::Start(state) => {…}
                ExampleStateMachine::WaitingOnFooTxt(state) => {…}
                ExampleStateMachine::WaitingOnBarTxt(state) => {…}
                ExampleStateMachine::End(state) => {…}
            }
        }
    }
}
```

The `Output` type of the future is `String` because it's the return type of the `example` function. To implement the `poll` function, we use a `match` statement on the current state inside a `loop`. The idea is that we switch to the next state as long as possible and use an explicit `return Poll::Pending` when we can't continue.

For simplicity, we only show simplified code and don't handle [pinning][_pinning_], ownership, lifetimes, etc. So this and the following code should be treated as pseudo-code and not used directly. Of course, the real compiler-generated code handles everything correctly, albeit possibly in a different way.

To keep the code excerpts small, we present the code for each `match` arm separately.
Let's begin with the `Start` state: ```rust ExampleStateMachine::Start(state) => { // from body of `example` let foo_txt_future = async_read_file("foo.txt"); // `.await` operation let state = WaitingOnFooTxtState { min_len: state.min_len, foo_txt_future, }; *self = ExampleStateMachine::WaitingOnFooTxt(state); } ``` The state machine is in the `Start` state when it is right at the beginning of the function. In this case, we execute all the code from the body of the `example` function until the first `.await`. To handle the `.await` operation, we change the state of the `self` state machine to `WaitingOnFooTxt`, which includes the construction of the `WaitingOnFooTxtState` struct. Since the `match self {…}` statement is executed in a loop, the execution jumps to the `WaitingOnFooTxt` arm next: ```rust ExampleStateMachine::WaitingOnFooTxt(state) => { match state.foo_txt_future.poll(cx) { Poll::Pending => return Poll::Pending, Poll::Ready(content) => { // from body of `example` if content.len() < state.min_len { let bar_txt_future = async_read_file("bar.txt"); // `.await` operation let state = WaitingOnBarTxtState { content, bar_txt_future, }; *self = ExampleStateMachine::WaitingOnBarTxt(state); } else { *self = ExampleStateMachine::End(EndState); return Poll::Ready(content); } } } } ``` In this `match` arm, we first call the `poll` function of the `foo_txt_future`. If it is not ready, we exit the loop and return `Poll::Pending`. Since `self` stays in the `WaitingOnFooTxt` state in this case, the next `poll` call on the state machine will enter the same `match` arm and retry polling the `foo_txt_future`. When the `foo_txt_future` is ready, we assign the result to the `content` variable and continue to execute the code of the `example` function: If `content.len()` is smaller than the `min_len` saved in the state struct, the `bar.txt` file is read asynchronously. We again translate the `.await` operation into a state change, this time into the `WaitingOnBarTxt` state. 
Since we're executing the `match` inside a loop, the execution directly jumps to the `match` arm for the new state afterward, where the `bar_txt_future` is polled. In case we enter the `else` branch, no further `.await` operation occurs. We reach the end of the function and return `content` wrapped in `Poll::Ready`. We also change the current state to the `End` state. The code for the `WaitingOnBarTxt` state looks like this: ```rust ExampleStateMachine::WaitingOnBarTxt(state) => { match state.bar_txt_future.poll(cx) { Poll::Pending => return Poll::Pending, Poll::Ready(bar_txt) => { *self = ExampleStateMachine::End(EndState); // from body of `example` return Poll::Ready(state.content + &bar_txt); } } } ``` Similar to the `WaitingOnFooTxt` state, we start by polling the `bar_txt_future`. If it is still pending, we exit the loop and return `Poll::Pending`. Otherwise, we can perform the last operation of the `example` function: concatenating the `content` variable with the result from the future. We update the state machine to the `End` state and then return the result wrapped in `Poll::Ready`. Finally, the code for the `End` state looks like this: ```rust ExampleStateMachine::End(_) => { panic!("poll called after Poll::Ready was returned"); } ``` Futures should not be polled again after they returned `Poll::Ready`, so we panic if `poll` is called while we are already in the `End` state. We now know what the compiler-generated state machine and its implementation of the `Future` trait _could_ look like. In practice, the compiler generates code in a different way. (In case you're interested, the implementation is currently based on [_coroutines_], but this is only an implementation detail.) [_coroutines_]: https://doc.rust-lang.org/stable/unstable-book/language-features/coroutines.html The last piece of the puzzle is the generated code for the `example` function itself. 
Remember, the function header was defined like this: ```rust async fn example(min_len: usize) -> String ``` Since the complete function body is now implemented by the state machine, the only thing that the function needs to do is to initialize the state machine and return it. The generated code for this could look like this: ```rust fn example(min_len: usize) -> ExampleStateMachine { ExampleStateMachine::Start(StartState { min_len, }) } ``` The function no longer has an `async` modifier since it now explicitly returns an `ExampleStateMachine` type, which implements the `Future` trait. As expected, the state machine is constructed in the `Start` state and the corresponding state struct is initialized with the `min_len` parameter. Note that this function does not start the execution of the state machine. This is a fundamental design decision of futures in Rust: they do nothing until they are polled for the first time. ### Pinning We already stumbled across _pinning_ multiple times in this post. Now is finally the time to explore what pinning is and why it is needed. #### Self-Referential Structs As explained above, the state machine transformation stores the local variables of each pause point in a struct. For small examples like our `example` function, this was straightforward and did not lead to any problems. However, things become more difficult when variables reference each other. For example, consider this function: ```rust async fn pin_example() -> i32 { let array = [1, 2, 3]; let element = &array[2]; async_write_file("foo.txt", element.to_string()).await; *element } ``` This function creates a small `array` with the contents `1`, `2`, and `3`. It then creates a reference to the last array element and stores it in an `element` variable. Next, it asynchronously writes the number converted to a string to a `foo.txt` file. Finally, it returns the number referenced by `element`. 
Since the function uses a single `await` operation, the resulting state machine has three states: start, end, and "waiting on write". The function takes no arguments, so the struct for the start state is empty. Like before, the struct for the end state is empty because the function is finished at this point. The struct for the "waiting on write" state is more interesting: ```rust struct WaitingOnWriteState { array: [1, 2, 3], element: 0x1001c, // address of the last array element } ``` We need to store both the `array` and `element` variables because `element` is required for the return value and `array` is referenced by `element`. Since `element` is a reference, it stores a _pointer_ (i.e., a memory address) to the referenced element. We used `0x1001c` as an example memory address here. In reality, it needs to be the address of the last element of the `array` field, so it depends on where the struct lives in memory. Structs with such internal pointers are called _self-referential_ structs because they reference themselves from one of their fields. #### The Problem with Self-Referential Structs The internal pointer of our self-referential struct leads to a fundamental problem, which becomes apparent when we look at its memory layout: ![array at 0x10014 with fields 1, 2, and 3; element at address 0x10020, pointing to the last array element at 0x1001c](self-referential-struct.svg) The `array` field starts at address 0x10014 and the `element` field at address 0x10020. It points to address 0x1001c because the last array element lives at this address. At this point, everything is still fine. However, an issue occurs when we move this struct to a different memory address: ![array at 0x10024 with fields 1, 2, and 3; element at address 0x10030, still pointing to 0x1001c, even though the last array element now lives at 0x1002c](self-referential-struct-moved.svg) We moved the struct a bit so that it starts at address `0x10024` now. 
This could, for example, happen when we pass the struct as a function argument or assign it to a different stack variable. The problem is that the `element` field still points to address `0x1001c` even though the last `array` element now lives at address `0x1002c`. Thus, the pointer is dangling, with the result that undefined behavior occurs on the next `poll` call.

#### Possible Solutions

There are three fundamental approaches to solving the dangling pointer problem:

- **Update the pointer on move:** The idea is to update the internal pointer whenever the struct is moved in memory so that it is still valid after the move. Unfortunately, this approach would require extensive changes to Rust that would result in potentially huge performance losses. The reason is that some kind of runtime would need to keep track of the type of all struct fields and check on every move operation whether a pointer update is required.
- **Store an offset instead of self-references:** To avoid the requirement for updating pointers, the compiler could try to store self-references as offsets from the struct's beginning instead. For example, the `element` field of the above `WaitingOnWriteState` struct could be stored in the form of an `element_offset` field with a value of 8 because the array element that the reference points to starts 8 bytes after the struct's beginning. Since the offset stays the same when the struct is moved, no field updates are required.

  The problem with this approach is that it requires the compiler to detect all self-references. This is not possible at compile-time because the value of a reference might depend on user input, so we would need a runtime system again to analyze references and correctly create the state structs. This would not only result in runtime costs but also prevent certain compiler optimizations, so that it would cause large performance losses again.
- **Forbid moving the struct:** As we saw above, the dangling pointer only occurs when we move the struct in memory. By completely forbidding move operations on self-referential structs, the problem can also be avoided. The big advantage of this approach is that it can be implemented at the type system level without additional runtime costs. The drawback is that it puts the burden of dealing with move operations on possibly self-referential structs on the programmer.

Rust chose the third solution because of its principle of providing _zero cost abstractions_, which means that abstractions should not impose additional runtime costs. The [_pinning_] API was proposed for this purpose in [RFC 2349](https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md). In the following, we will give a short overview of this API and explain how it works with async/await and futures.

#### Heap Values

The first observation is that [heap-allocated] values already have a fixed memory address most of the time. They are created using a call to `allocate` and then referenced by a pointer type such as `Box<T>`. While moving the pointer type is possible, the heap value that the pointer points to stays at the same memory address until it is freed through a `deallocate` call again.
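The address stability of heap values can be checked with a small sketch. It assumes a `std` environment like the playground, and the helper name `heap_address_before_and_after_move` is hypothetical, introduced only for this illustration. It records the address of a boxed value, moves the `Box` into another variable, and returns both addresses for comparison:

```rust
/// Returns the address of the heap value before and after moving the `Box`.
/// (Hypothetical helper for illustration, not part of the post's kernel code.)
fn heap_address_before_and_after_move() -> (usize, usize) {
    let boxed = Box::new(42u32);
    let before = &*boxed as *const u32 as usize;

    // Moving the `Box` only copies the pointer; the heap value stays in place.
    let moved = boxed;
    let after = &*moved as *const u32 as usize;

    (before, after)
}
```

Both returned addresses should be identical, since only the pointer is moved and not the value it points to.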
[heap-allocated]: @/edition-2/posts/10-heap-allocation/index.md Using heap allocation, we can try to create a self-referential struct: ```rust fn main() { let mut heap_value = Box::new(SelfReferential { self_ptr: 0 as *const _, }); let ptr = &*heap_value as *const SelfReferential; heap_value.self_ptr = ptr; println!("heap value at: {:p}", heap_value); println!("internal reference: {:p}", heap_value.self_ptr); } struct SelfReferential { self_ptr: *const Self, } ``` ([Try it on the playground][playground-self-ref]) [playground-self-ref]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=ce1aff3a37fcc1c8188eeaf0f39c97e8 We create a simple struct named `SelfReferential` that contains a single pointer field. First, we initialize this struct with a null pointer and then allocate it on the heap using `Box::new`. We then determine the memory address of the heap-allocated struct and store it in a `ptr` variable. Finally, we make the struct self-referential by assigning the `ptr` variable to the `self_ptr` field. When we execute this code [on the playground][playground-self-ref], we see that the address of the heap value and its internal pointer are equal, which means that the `self_ptr` field is a valid self-reference. Since the `heap_value` variable is only a pointer, moving it (e.g., by passing it to a function) does not change the address of the struct itself, so the `self_ptr` stays valid even if the pointer is moved. 
However, there is still a way to break this example: We can move out of a `Box<T>` or replace its content:

```rust
let stack_value = mem::replace(&mut *heap_value, SelfReferential {
    self_ptr: 0 as *const _,
});
println!("value at: {:p}", &stack_value);
println!("internal reference: {:p}", stack_value.self_ptr);
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=e160ee8a64cba4cebc1c0473dcecb7c8))

Here we use the [`mem::replace`] function to replace the heap-allocated value with a new struct instance. This allows us to move the original `heap_value` to the stack, while the `self_ptr` field of the struct is now a dangling pointer that still points to the old heap address. When you try to run the example on the playground, you see that the printed _"value at:"_ and _"internal reference:"_ lines indeed show different pointers. So heap allocating a value is not enough to make self-references safe.

[`mem::replace`]: https://doc.rust-lang.org/nightly/core/mem/fn.replace.html

The fundamental problem that allowed the above breakage is that `Box<T>` allows us to get a `&mut T` reference to the heap-allocated value. This `&mut` reference makes it possible to use methods like [`mem::replace`] or [`mem::swap`] to invalidate the heap-allocated value. To resolve this problem, we must prevent `&mut` references to self-referential structs from being created.

[`mem::swap`]: https://doc.rust-lang.org/nightly/core/mem/fn.swap.html

#### `Pin<Box<T>>` and `Unpin`

The pinning API provides a solution to the `&mut T` problem in the form of the [`Pin`] wrapper type and the [`Unpin`] marker trait. The idea behind these types is to gate all methods of `Pin` that can be used to get `&mut` references to the wrapped value (e.g. [`get_mut`][pin-get-mut] or [`deref_mut`][pin-deref-mut]) on the `Unpin` trait. The `Unpin` trait is an [_auto trait_], which is automatically implemented for all types except those that explicitly opt-out.
By making self-referential structs opt-out of `Unpin`, there is no (safe) way to get a `&mut T` from a `Pin<Box<T>>` type for them. As a result, their internal self-references are guaranteed to stay valid.

[`Pin`]: https://doc.rust-lang.org/stable/core/pin/struct.Pin.html
[`Unpin`]: https://doc.rust-lang.org/nightly/std/marker/trait.Unpin.html
[pin-get-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_mut
[pin-deref-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.deref_mut
[_auto trait_]: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits

As an example, let's update the `SelfReferential` type from above to opt-out of `Unpin`:

```rust
use core::marker::PhantomPinned;

struct SelfReferential {
    self_ptr: *const Self,
    _pin: PhantomPinned,
}
```

We opt-out by adding a second `_pin` field of type [`PhantomPinned`]. This type is a zero-sized marker type whose only purpose is to _not_ implement the `Unpin` trait. Because of the way [auto traits][_auto trait_] work, a single field that is not `Unpin` suffices to make the complete struct opt-out of `Unpin`.

[`PhantomPinned`]: https://doc.rust-lang.org/nightly/core/marker/struct.PhantomPinned.html

The second step is to change the `Box<SelfReferential>` type in the example to a `Pin<Box<SelfReferential>>` type. The easiest way to do this is to use the [`Box::pin`] function instead of [`Box::new`] for creating the heap-allocated value:

[`Box::pin`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.pin
[`Box::new`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.new

```rust
let mut heap_value = Box::pin(SelfReferential {
    self_ptr: 0 as *const _,
    _pin: PhantomPinned,
});
```

In addition to changing `Box::new` to `Box::pin`, we also need to add the new `_pin` field in the struct initializer. Since `PhantomPinned` is a zero-sized type, we only need its type name to initialize it.
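As a quick sanity check of the zero-size claim, the following sketch compares the size of a struct with and without the marker field. The names `Plain`, `Marked`, `require_unpin`, and `marker_sizes` are made up for this illustration and do not appear in the post's code:

```rust
use core::marker::PhantomPinned;
use core::mem::size_of;

// Same pointer field as `SelfReferential`, but without the marker.
#[allow(dead_code)]
struct Plain {
    self_ptr: *const Plain,
}

// With the marker field; this type no longer implements `Unpin`.
#[allow(dead_code)]
struct Marked {
    self_ptr: *const Marked,
    _pin: PhantomPinned,
}

// Compiles only for `Unpin` types; `require_unpin::<Marked>()` would be rejected.
fn require_unpin<T: Unpin>() {}

/// Sizes of the marker type, the plain struct, and the marked struct.
fn marker_sizes() -> (usize, usize, usize) {
    require_unpin::<Plain>();
    (
        size_of::<PhantomPinned>(),
        size_of::<Plain>(),
        size_of::<Marked>(),
    )
}
```

The marker should report a size of zero, so that `Plain` and `Marked` have the same size: opting out of `Unpin` this way costs no memory.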
When we [try to run our adjusted example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=961b0db194bbe851ff4d0ed08d3bd98a) now, we see that it no longer works:

```
error[E0594]: cannot assign to data in dereference of `Pin<Box<SelfReferential>>`
  --> src/main.rs:10:5
   |
10 |     heap_value.self_ptr = ptr;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot assign
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin<Box<SelfReferential>>`

error[E0596]: cannot borrow data in dereference of `Pin<Box<SelfReferential>>` as mutable
  --> src/main.rs:16:36
   |
16 |     let stack_value = mem::replace(&mut *heap_value, SelfReferential {
   |                                    ^^^^^^^^^^^^^^^^ cannot borrow as mutable
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin<Box<SelfReferential>>`
```

Both errors occur because the `Pin<Box<SelfReferential>>` type no longer implements the `DerefMut` trait. This is exactly what we wanted because the `DerefMut` trait would return a `&mut` reference, which we wanted to prevent. This only happens because we both opted-out of `Unpin` and changed `Box::new` to `Box::pin`.

The problem now is that the compiler does not only prevent moving the type in line 16, but also forbids initializing the `self_ptr` field in line 10. This happens because the compiler can't differentiate between valid and invalid uses of `&mut` references. To get the initialization working again, we have to use the unsafe [`get_unchecked_mut`] method:

[`get_unchecked_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_unchecked_mut

```rust
// safe because modifying a field doesn't move the whole struct
unsafe {
    let mut_ref = Pin::as_mut(&mut heap_value);
    Pin::get_unchecked_mut(mut_ref).self_ptr = ptr;
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=b9ebbb11429d9d79b3f9fffe819e2018))

The [`get_unchecked_mut`] function works on a `Pin<&mut T>` instead of a `Pin<Box<T>>`, so we have to use [`Pin::as_mut`] for converting the value.
Then we can set the `self_ptr` field using the `&mut` reference returned by `get_unchecked_mut`.

[`Pin::as_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.as_mut

Now the only error left is the desired error on `mem::replace`. Remember, this operation tries to move the heap-allocated value to the stack, which would break the self-reference stored in the `self_ptr` field. By opting out of `Unpin` and using `Pin<Box<T>>`, we can prevent this operation at compile time and thus safely work with self-referential structs. As we saw, the compiler is not able to prove that the creation of the self-reference is safe (yet), so we need to use an unsafe block and verify the correctness ourselves.

#### Stack Pinning and `Pin<&mut T>`

In the previous section, we learned how to use `Pin<Box<T>>` to safely create a heap-allocated self-referential value. While this approach works fine and is relatively safe (apart from the unsafe construction), the required heap allocation comes with a performance cost. Since Rust strives to provide _zero-cost abstractions_ whenever possible, the pinning API also allows creating `Pin<&mut T>` instances that point to stack-allocated values.

Unlike `Pin<Box<T>>` instances, which have _ownership_ of the wrapped value, `Pin<&mut T>` instances only temporarily borrow the wrapped value. This makes things more complicated, as it requires the programmer to ensure additional guarantees themselves. Most importantly, a `Pin<&mut T>` must stay pinned for the whole lifetime of the referenced `T`, which can be difficult to verify for stack-based variables. To help with this, crates like [`pin-utils`] exist, but I still wouldn't recommend pinning to the stack unless you really know what you're doing.

[`pin-utils`]: https://docs.rs/pin-utils/0.1.0-alpha.4/pin_utils/

For further reading, check out the documentation of the [`pin` module] and the [`Pin::new_unchecked`] method.
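As a minimal sketch of stack pinning, the standard library's `core::pin::pin!` macro (stable since Rust 1.68) pins a value to the current stack frame and hands back a `Pin<&mut T>`. The `Counter` type and the `pinned_on_stack` function are hypothetical and exist only for this illustration:

```rust
use core::marker::PhantomPinned;
use core::pin::{pin, Pin};

struct Counter {
    value: u32,
    _pin: PhantomPinned, // opts out of `Unpin`
}

impl Counter {
    // Takes a pinned reference; mutating a field does not move the struct.
    fn increment(self: Pin<&mut Self>) {
        // SAFETY: we only modify a field and never move out of the reference.
        unsafe { self.get_unchecked_mut().value += 1 };
    }
}

fn pinned_on_stack() -> u32 {
    // `pin!` stores the value in a hidden local, so it cannot be moved afterwards.
    let mut counter: Pin<&mut Counter> = pin!(Counter {
        value: 0,
        _pin: PhantomPinned,
    });
    counter.as_mut().increment();
    counter.as_mut().increment();
    counter.value // shared access through `Deref` is always allowed
}
```

The pinned value only lives as long as the enclosing function, which is why this approach does not help when a future needs to outlive the current stack frame.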
[`pin` module]: https://doc.rust-lang.org/nightly/core/pin/index.html
[`Pin::new_unchecked`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.new_unchecked

#### Pinning and Futures

As we already saw in this post, the [`Future::poll`] method uses pinning in the form of a `Pin<&mut Self>` parameter:

[`Future::poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll

```rust
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>
```

The reason that this method takes `self: Pin<&mut Self>` instead of the normal `&mut self` is that future instances created from async/await are often self-referential, as we saw [above][self-ref-async-await]. By wrapping `Self` into `Pin` and letting the compiler opt-out of `Unpin` for self-referential futures generated from async/await, it is guaranteed that the futures are not moved in memory between `poll` calls. This ensures that all internal references are still valid.

[self-ref-async-await]: @/edition-2/posts/12-async-await/index.md#self-referential-structs

It is worth noting that moving futures before the first `poll` call is fine. This is a result of the fact that futures are lazy and do nothing until they're polled for the first time. The `start` state of the generated state machines therefore only contains the function arguments but no internal references. In order to call `poll`, the caller must wrap the future into `Pin` first, which ensures that the future cannot be moved in memory anymore. Since stack pinning is more difficult to get right, I recommend always using [`Box::pin`] combined with [`Pin::as_mut`] for this.
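The combination of `Box::pin` and `Pin::as_mut` can be sketched as follows. This assumes a `std` environment like the playground rather than our kernel, and the names `NoopWake`, `answer`, and `poll_boxed_once` are hypothetical. The do-nothing waker is only a placeholder, which is sufficient here because the example future completes on its first poll:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; enough for a future that is ready immediately.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

async fn answer() -> u32 {
    42
}

fn poll_boxed_once() -> Poll<u32> {
    // `Box::pin` moves the future to the heap and pins it there.
    let mut future: Pin<Box<dyn Future<Output = u32>>> = Box::pin(answer());

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    // `as_mut` turns `&mut Pin<Box<T>>` into the `Pin<&mut T>` that `poll` expects.
    let result = future.as_mut().poll(&mut cx);
    result
}
```

Because `as_mut` only reborrows the pinned box, the same `future` variable can be polled again later if the first call returns `Poll::Pending`.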
[`futures`]: https://docs.rs/futures/0.3.4/futures/ In case you're interested in understanding how to safely implement a future combinator function using stack pinning yourself, take a look at the relatively short [source of the `map` combinator method][map-src] of the `futures` crate and the section about [projections and structural pinning] of the pin documentation. [map-src]: https://docs.rs/futures-util/0.3.4/src/futures_util/future/future/map.rs.html [projections and structural pinning]: https://doc.rust-lang.org/stable/std/pin/index.html#projections-and-structural-pinning ### Executors and Wakers Using async/await, it is possible to ergonomically work with futures in a completely asynchronous way. However, as we learned above, futures do nothing until they are polled. This means we have to call `poll` on them at some point, otherwise the asynchronous code is never executed. With a single future, we can always wait for each future manually using a loop [as described above](#waiting-on-futures). However, this approach is very inefficient and not practical for programs that create a large number of futures. The most common solution to this problem is to define a global _executor_ that is responsible for polling all futures in the system until they are finished. #### Executors The purpose of an executor is to allow spawning futures as independent tasks, typically through some sort of `spawn` method. The executor is then responsible for polling all futures until they are completed. The big advantage of managing all futures in a central place is that the executor can switch to a different future whenever a future returns `Poll::Pending`. Thus, asynchronous operations are run in parallel and the CPU is kept busy. Many executor implementations can also take advantage of systems with multiple CPU cores. 
They create a [thread pool] that is able to utilize all cores if there is enough work available and use techniques such as [work stealing] to balance the load between cores. There are also special executor implementations for embedded systems that optimize for low latency and memory overhead. [thread pool]: https://en.wikipedia.org/wiki/Thread_pool [work stealing]: https://en.wikipedia.org/wiki/Work_stealing To avoid the overhead of polling futures repeatedly, executors typically take advantage of the _waker_ API supported by Rust's futures. #### Wakers The idea behind the waker API is that a special [`Waker`] type is passed to each invocation of `poll`, wrapped in the [`Context`] type. This `Waker` type is created by the executor and can be used by the asynchronous task to signal its (partial) completion. As a result, the executor does not need to call `poll` on a future that previously returned `Poll::Pending` until it is notified by the corresponding waker. [`Context`]: https://doc.rust-lang.org/nightly/core/task/struct.Context.html This is best illustrated by a small example: ```rust async fn write_file() { async_write_file("foo.txt", "Hello").await; } ``` This function asynchronously writes the string "Hello" to a `foo.txt` file. Since hard disk writes take some time, the first `poll` call on this future will likely return `Poll::Pending`. However, the hard disk driver will internally store the `Waker` passed to the `poll` call and use it to notify the executor when the file is written to disk. This way, the executor does not need to waste any time trying to `poll` the future again before it receives the waker notification. We will see how the `Waker` type works in detail when we create our own executor with waker support in the implementation section of this post. ### Cooperative Multitasking? At the beginning of this post, we talked about preemptive and cooperative multitasking. 
While preemptive multitasking relies on the operating system to forcibly switch between running tasks, cooperative multitasking requires that the tasks voluntarily give up control of the CPU through a _yield_ operation on a regular basis. The big advantage of the cooperative approach is that tasks can save their state themselves, which results in more efficient context switches and makes it possible to share the same call stack between tasks. It might not be immediately apparent, but futures and async/await are an implementation of the cooperative multitasking pattern: - Each future that is added to the executor is basically a cooperative task. - Instead of using an explicit yield operation, futures give up control of the CPU core by returning `Poll::Pending` (or `Poll::Ready` at the end). - There is nothing that forces futures to give up the CPU. If they want, they can never return from `poll`, e.g., by spinning endlessly in a loop. - Since each future can block the execution of the other futures in the executor, we need to trust them to not be malicious. - Futures internally store all the state they need to continue execution on the next `poll` call. With async/await, the compiler automatically detects all variables that are needed and stores them inside the generated state machine. - Only the minimum state required for continuation is saved. - Since the `poll` method gives up the call stack when it returns, the same stack can be used for polling other futures. We see that futures and async/await fit the cooperative multitasking pattern perfectly; they just use some different terminology. In the following, we will therefore use the terms "task" and "future" interchangeably. ## Implementation Now that we understand how cooperative multitasking based on futures and async/await works in Rust, it's time to add support for it to our kernel. 
Since the [`Future`] trait is part of the `core` library and async/await is a feature of the language itself, there is nothing special we need to do to use it in our `#![no_std]` kernel. The only requirement is that we use at least nightly `2020-03-25` of Rust because async/await was not `no_std` compatible before.

With a recent-enough nightly, we can start using async/await in our `main.rs`:

```rust
// in src/main.rs

async fn async_number() -> u32 {
    42
}

async fn example_task() {
    let number = async_number().await;
    println!("async number: {}", number);
}
```

The `async_number` function is an `async fn`, so the compiler transforms it into a state machine that implements `Future`. Since the function only returns `42`, the resulting future will directly return `Poll::Ready(42)` on the first `poll` call. Like `async_number`, the `example_task` function is also an `async fn`. It awaits the number returned by `async_number` and then prints it using the `println` macro.

To run the future returned by `example_task`, we need to call `poll` on it until it signals its completion by returning `Poll::Ready`. To do this, we need to create a simple executor type.

### Task

Before we start the executor implementation, we create a new `task` module with a `Task` type:

```rust
// in src/lib.rs

pub mod task;
```

```rust
// in src/task/mod.rs

use core::{future::Future, pin::Pin};
use alloc::boxed::Box;

pub struct Task {
    future: Pin<Box<dyn Future<Output = ()>>>,
}
```

The `Task` struct is a newtype wrapper around a pinned, heap-allocated, and dynamically dispatched future with the empty type `()` as output. Let's go through it in detail:

- We require that the future associated with a task returns `()`. This means that tasks don't return any result, they are just executed for their side effects. For example, the `example_task` function we defined above has no return value, but it prints something to the screen as a side effect.
- The `dyn` keyword indicates that we store a [_trait object_] in the `Box`. This means that the methods on the future are [_dynamically dispatched_], allowing different types of futures to be stored in the `Task` type. This is important because each `async fn` has its own type and we want to be able to create multiple different tasks.
- As we learned in the [section about pinning], the `Pin<Box>` type ensures that a value cannot be moved in memory by placing it on the heap and preventing the creation of `&mut` references to it. This is important because futures generated by async/await might be self-referential, i.e., contain pointers to themselves that would be invalidated when the future is moved.

[_trait object_]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html
[_dynamically dispatched_]: https://doc.rust-lang.org/book/ch18-02-trait-objects.html#trait-objects-perform-dynamic-dispatch
[section about pinning]: #pinning

To allow the creation of new `Task` structs from futures, we create a `new` function:

```rust
// in src/task/mod.rs

impl Task {
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task {
            future: Box::pin(future),
        }
    }
}
```

The function takes an arbitrary future with an output type of `()` and pins it in memory through the [`Box::pin`] function. Then it wraps the boxed future in the `Task` struct and returns it. The `'static` lifetime is required here because the returned `Task` can live for an arbitrary time, so the future needs to be valid for that time too.

We also add a `poll` method to allow the executor to poll the stored future:

```rust
// in src/task/mod.rs

use core::task::{Context, Poll};

impl Task {
    fn poll(&mut self, context: &mut Context) -> Poll<()> {
        self.future.as_mut().poll(context)
    }
}
```

Since the [`poll`] method of the `Future` trait expects to be called on a `Pin<&mut T>` type, we use the [`Pin::as_mut`] method to convert the `self.future` field of type `Pin<Box<T>>` first. Then we call `poll` on the converted `self.future` field and return the result.
Since the `Task::poll` method should only be called by the executor that we'll create in a moment, we keep the function private to the `task` module.

### Simple Executor

Since executors can be quite complex, we deliberately start by creating a very basic executor before implementing a more featureful executor later. For this, we first create a new `task::simple_executor` submodule:

```rust
// in src/task/mod.rs

pub mod simple_executor;
```

```rust
// in src/task/simple_executor.rs

use super::Task;
use alloc::collections::VecDeque;

pub struct SimpleExecutor {
    task_queue: VecDeque<Task>,
}

impl SimpleExecutor {
    pub fn new() -> SimpleExecutor {
        SimpleExecutor {
            task_queue: VecDeque::new(),
        }
    }

    pub fn spawn(&mut self, task: Task) {
        self.task_queue.push_back(task)
    }
}
```

The struct contains a single `task_queue` field of type [`VecDeque`], which is basically a vector that allows for push and pop operations on both ends. The idea behind using this type is that we insert new tasks through the `spawn` method at the end and pop the next task for execution from the front. This way, we get a simple [FIFO queue] (_"first in, first out"_).

[`VecDeque`]: https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html
[FIFO queue]: https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)

#### Dummy Waker

In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing.
For this, we create a [`RawWaker`] instance, which defines the implementation of the different `Waker` methods, and then use the [`Waker::from_raw`] function to turn it into a `Waker`: [`RawWaker`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html [`Waker::from_raw`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.from_raw ```rust // in src/task/simple_executor.rs use core::task::{Waker, RawWaker}; fn dummy_raw_waker() -> RawWaker { todo!(); } fn dummy_waker() -> Waker { unsafe { Waker::from_raw(dummy_raw_waker()) } } ``` The `from_raw` function is unsafe because undefined behavior can occur if the programmer does not uphold the documented requirements of `RawWaker`. Before we look at the implementation of the `dummy_raw_waker` function, we first try to understand how the `RawWaker` type works. ##### `RawWaker` The [`RawWaker`] type requires the programmer to explicitly define a [_virtual method table_] (_vtable_) that specifies the functions that should be called when the `RawWaker` is cloned, woken, or dropped. The layout of this vtable is defined by the [`RawWakerVTable`] type. Each function receives a `*const ()` argument, which is a _type-erased_ pointer to some value. The reason for using a `*const ()` pointer instead of a proper reference is that the `RawWaker` type should be non-generic but still support arbitrary types. The pointer is provided by putting it into the `data` argument of [`RawWaker::new`], which just initializes a `RawWaker`. The `Waker` then uses this `RawWaker` to call the vtable functions with `data`. [_virtual method table_]: https://en.wikipedia.org/wiki/Virtual_method_table [`RawWakerVTable`]: https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html [`RawWaker::new`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html#method.new Typically, the `RawWaker` is created for some heap-allocated struct that is wrapped into the [`Box`] or [`Arc`] type. 
For such types, methods like [`Box::into_raw`] can be used to convert the `Box` to a `*const T` pointer. This pointer can then be cast to an anonymous `*const ()` pointer and passed to `RawWaker::new`. Since each vtable function receives the same `*const ()` as an argument, the functions can safely cast the pointer back to a `Box` or a `&T` to operate on it. As you can imagine, this process is highly dangerous and can easily lead to undefined behavior on mistakes. For this reason, manually creating a `RawWaker` is not recommended unless necessary. [`Box`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html [`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html [`Box::into_raw`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html#method.into_raw ##### A Dummy `RawWaker` While manually creating a `RawWaker` is not recommended, there is currently no other way to create a dummy `Waker` that does nothing. Fortunately, the fact that we want to do nothing makes it relatively safe to implement the `dummy_raw_waker` function: ```rust // in src/task/simple_executor.rs use core::task::RawWakerVTable; fn dummy_raw_waker() -> RawWaker { fn no_op(_: *const ()) {} fn clone(_: *const ()) -> RawWaker { dummy_raw_waker() } let vtable = &RawWakerVTable::new(clone, no_op, no_op, no_op); RawWaker::new(0 as *const (), vtable) } ``` First, we define two inner functions named `no_op` and `clone`. The `no_op` function takes a `*const ()` pointer and does nothing. The `clone` function also takes a `*const ()` pointer and returns a new `RawWaker` by calling `dummy_raw_waker` again. We use these two functions to create a minimal `RawWakerVTable`: The `clone` function is used for the cloning operations, and the `no_op` function is used for all other operations. Since the `RawWaker` does nothing, it does not matter that we return a new `RawWaker` from `clone` instead of cloning it. 
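
As an aside, and for contrast with the no-op vtable of `dummy_raw_waker`, the `Arc`-based pattern described in the `RawWaker` section could look roughly like the following sketch. It is a host-side illustration using `std`, not code from our kernel: the shared state is just an `AtomicUsize` that counts wakeups, and all function names (`counting_waker`, `clone_waker`, `wake`, `wake_by_ref`, `drop_waker`) are hypothetical:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{RawWaker, RawWakerVTable, Waker};

static VTABLE: RawWakerVTable =
    RawWakerVTable::new(clone_waker, wake, wake_by_ref, drop_waker);

// Each function receives the type-erased `data` pointer that was passed
// to `RawWaker::new` and casts it back to the real type.

unsafe fn clone_waker(data: *const ()) -> RawWaker {
    // A clone shares the same allocation, so bump the reference count.
    unsafe { Arc::increment_strong_count(data as *const AtomicUsize) };
    RawWaker::new(data, &VTABLE)
}

unsafe fn wake(data: *const ()) {
    // `wake` consumes the waker: take back ownership, count, then drop.
    let counter = unsafe { Arc::from_raw(data as *const AtomicUsize) };
    counter.fetch_add(1, Ordering::SeqCst);
}

unsafe fn wake_by_ref(data: *const ()) {
    // `wake_by_ref` must not release the reference held by the waker.
    let counter = unsafe { &*(data as *const AtomicUsize) };
    counter.fetch_add(1, Ordering::SeqCst);
}

unsafe fn drop_waker(data: *const ()) {
    drop(unsafe { Arc::from_raw(data as *const AtomicUsize) });
}

// Hypothetical constructor: turns the `Arc` into a type-erased pointer.
fn counting_waker(counter: Arc<AtomicUsize>) -> Waker {
    let data = Arc::into_raw(counter) as *const ();
    // SAFETY: the vtable functions above uphold the `RawWaker` contract
    // for a pointer obtained from `Arc::<AtomicUsize>::into_raw`.
    unsafe { Waker::from_raw(RawWaker::new(data, &VTABLE)) }
}
```

Even in this tiny example, every vtable function has to agree on the pointee type and on who owns which reference count, which shows why a mistake here easily leads to undefined behavior. Our dummy waker sidesteps all of this because it never looks at the `data` pointer.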
After creating the `vtable` in `dummy_raw_waker`, we use the [`RawWaker::new`] function to create the `RawWaker`. The passed `*const ()` does not matter since none of the vtable functions use it. For this reason, we simply pass a null pointer.

#### A `run` Method

Now that we have a way to create a `Waker` instance, we can use it to implement a `run` method on our executor. The simplest `run` method is to repeatedly poll all queued tasks in a loop until all are done. This is not very efficient since it does not utilize the notifications of the `Waker` type, but it is an easy way to get things running:

```rust
// in src/task/simple_executor.rs

use core::task::{Context, Poll};

impl SimpleExecutor {
    pub fn run(&mut self) {
        while let Some(mut task) = self.task_queue.pop_front() {
            let waker = dummy_waker();
            let mut context = Context::from_waker(&waker);
            match task.poll(&mut context) {
                Poll::Ready(()) => {} // task done
                Poll::Pending => self.task_queue.push_back(task),
            }
        }
    }
}
```

The function uses a `while let` loop to handle all tasks in the `task_queue`. For each task, it first creates a `Context` type by wrapping a `Waker` instance returned by our `dummy_waker` function. Then it invokes the `Task::poll` method with this `context`. If the `poll` method returns `Poll::Ready`, the task is finished and we can continue with the next task. If the task is still `Poll::Pending`, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration.

#### Trying It

With our `SimpleExecutor` type, we can now try running the task returned by the `example_task` function in our `main.rs`:

```rust
// in src/main.rs

use blog_os::task::{Task, simple_executor::SimpleExecutor};

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // […] initialization routines, including `init_heap`

    let mut executor = SimpleExecutor::new();
    executor.spawn(Task::new(example_task()));
    executor.run();

    // […] test_main, "it did not crash" message, hlt_loop
}


// Below is the example_task function again so that you don't have to scroll up

async fn async_number() -> u32 {
    42
}

async fn example_task() {
    let number = async_number().await;
    println!("async number: {}", number);
}
```

When we run it, we see that the expected _"async number: 42"_ message is printed to the screen:

![QEMU printing "Hello World", "async number: 42", and "It did not crash!"](qemu-simple-executor.png)

Let's summarize the various steps that happen in this example:

- First, a new instance of our `SimpleExecutor` type is created with an empty `task_queue`.
- Next, we call the asynchronous `example_task` function, which returns a future. We wrap this future in the `Task` type, which moves it to the heap and pins it, and then add the task to the `task_queue` of the executor through the `spawn` method.
- We then call the `run` method to start the execution of the single task in the queue. This involves:
    - Popping the task from the front of the `task_queue`.
    - Creating a `RawWaker` for the task, converting it to a [`Waker`] instance, and then creating a [`Context`] instance from it.
    - Calling the [`poll`] method on the future of the task, using the `Context` we just created.
    - Since the `example_task` does not wait for anything, it can directly run till its end on the first `poll` call. This is where the _"async number: 42"_ line is printed.
    - Since the `example_task` directly returns `Poll::Ready`, it is not added back to the task queue.
- The `run` method returns after the `task_queue` becomes empty. The execution of our `kernel_main` function continues and the _"It did not crash!"_ message is printed.

### Async Keyboard Input

Our simple executor does not utilize the `Waker` notifications and simply loops over all tasks until they are done.
This wasn't a problem for our example since our `example_task` can directly run to finish on the first `poll` call. To see the performance advantages of a proper `Waker` implementation, we first need to create a task that is truly asynchronous, i.e., a task that will probably return `Poll::Pending` on the first `poll` call. We already have some kind of asynchronicity in our system that we can use for this: hardware interrupts. As we learned in the [_Interrupts_] post, hardware interrupts can occur at arbitrary points in time, determined by some external device. For example, a hardware timer sends an interrupt to the CPU after some predefined time has elapsed. When the CPU receives an interrupt, it immediately transfers control to the corresponding handler function defined in the interrupt descriptor table (IDT). [_Interrupts_]: @/edition-2/posts/07-hardware-interrupts/index.md In the following, we will create an asynchronous task based on the keyboard interrupt. The keyboard interrupt is a good candidate for this because it is both non-deterministic and latency-critical. Non-deterministic means that there is no way to predict when the next key press will occur because it is entirely dependent on the user. Latency-critical means that we want to handle the keyboard input in a timely manner, otherwise the user will feel a lag. To support such a task in an efficient way, it will be essential that the executor has proper support for `Waker` notifications. #### Scancode Queue Currently, we handle the keyboard input directly in the interrupt handler. This is not a good idea for the long term because interrupt handlers should stay as short as possible as they might interrupt important work. Instead, interrupt handlers should only perform the minimal amount of work necessary (e.g., reading the keyboard scancode) and leave the rest of the work (e.g., interpreting the scancode) to a background task. 
A common pattern for delegating work to a background task is to create some sort of queue. The interrupt handler pushes units of work to the queue, and the background task handles the work in the queue. Applied to our keyboard interrupt, this means that the interrupt handler only reads the scancode from the keyboard, pushes it to the queue, and then returns. The keyboard task sits on the other end of the queue and interprets and handles each scancode that is pushed to it: ![Scancode queue with 8 slots on the top. Keyboard interrupt handler on the bottom left with a "push scancode" arrow to the left of the queue. Keyboard task on the bottom right with a "pop scancode" arrow coming from the right side of the queue.](scancode-queue.svg) A simple implementation of that queue could be a mutex-protected [`VecDeque`]. However, using mutexes in interrupt handlers is not a good idea since it can easily lead to deadlocks. For example, when the user presses a key while the keyboard task has locked the queue, the interrupt handler tries to acquire the lock again and hangs indefinitely. Another problem with this approach is that `VecDeque` automatically increases its capacity by performing a new heap allocation when it becomes full. This can lead to deadlocks again because our allocator also uses a mutex internally. Further problems are that heap allocations can fail or take a considerable amount of time when the heap is fragmented. To prevent these problems, we need a queue implementation that does not require mutexes or allocations for its `push` operation. Such queues can be implemented by using lock-free [atomic operations] for pushing and popping elements. This way, it is possible to create `push` and `pop` operations that only require a `&self` reference and are thus usable without a mutex. To avoid allocations on `push`, the queue can be backed by a pre-allocated fixed-size buffer. 
While this makes the queue _bounded_ (i.e., it has a maximum length), it is often possible to define reasonable upper bounds for the queue length in practice, so that this isn't a big problem.

[atomic operations]: https://doc.rust-lang.org/core/sync/atomic/index.html

##### The `crossbeam` Crate

Implementing such a queue in a correct and efficient way is very difficult, so I recommend sticking to existing, well-tested implementations. One popular Rust project that implements various mutex-free types for concurrent programming is [`crossbeam`]. It provides a type named [`ArrayQueue`] that is exactly what we need in this case. And we're lucky: the type is fully compatible with `no_std` crates with allocation support.

[`crossbeam`]: https://github.com/crossbeam-rs/crossbeam
[`ArrayQueue`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html

To use the type, we need to add a dependency on the `crossbeam-queue` crate:

```toml
# in Cargo.toml

[dependencies.crossbeam-queue]
version = "0.3.11"
default-features = false
features = ["alloc"]
```

By default, the crate depends on the standard library. To make it `no_std` compatible, we need to disable its default features and instead enable the `alloc` feature. (Note that we could also add a dependency on the main `crossbeam` crate, which re-exports the `crossbeam-queue` crate, but this would result in a larger number of dependencies and longer compile times.)

##### Queue Implementation

Using the `ArrayQueue` type, we can now create a global scancode queue in a new `task::keyboard` module:

```rust
// in src/task/mod.rs

pub mod keyboard;
```

```rust
// in src/task/keyboard.rs

use conquer_once::spin::OnceCell;
use crossbeam_queue::ArrayQueue;

static SCANCODE_QUEUE: OnceCell<ArrayQueue<u8>> = OnceCell::uninit();
```

Since [`ArrayQueue::new`] performs a heap allocation, which is not possible at compile time ([yet][const-heap-alloc]), we can't initialize the static variable directly.
Instead, we use the [`OnceCell`] type of the [`conquer_once`] crate, which makes it possible to perform a safe one-time initialization of static values. To include the crate, we need to add it as a dependency in our `Cargo.toml`: [`ArrayQueue::new`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.new [const-heap-alloc]: https://github.com/rust-lang/const-eval/issues/20 [`OnceCell`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html [`conquer_once`]: https://docs.rs/conquer-once/0.2.0/conquer_once/index.html ```toml # in Cargo.toml [dependencies.conquer-once] version = "0.2.0" default-features = false ``` Instead of the [`OnceCell`] primitive, we could also use the [`lazy_static`] macro here. However, the `OnceCell` type has the advantage that we can ensure that the initialization does not happen in the interrupt handler, thus preventing the interrupt handler from performing a heap allocation. [`lazy_static`]: https://docs.rs/lazy_static/1.4.0/lazy_static/index.html #### Filling the Queue To fill the scancode queue, we create a new `add_scancode` function that we will call from the interrupt handler: ```rust // in src/task/keyboard.rs use crate::println; /// Called by the keyboard interrupt handler /// /// Must not block or allocate. pub(crate) fn add_scancode(scancode: u8) { if let Ok(queue) = SCANCODE_QUEUE.try_get() { if let Err(_) = queue.push(scancode) { println!("WARNING: scancode queue full; dropping keyboard input"); } } else { println!("WARNING: scancode queue uninitialized"); } } ``` We use [`OnceCell::try_get`] to get a reference to the initialized queue. If the queue is not initialized yet, we ignore the keyboard scancode and print a warning. It's important that we don't try to initialize the queue in this function because it will be called by the interrupt handler, which should not perform heap allocations. 
Since this function should not be callable from our `main.rs`, we use the `pub(crate)` visibility to make it only available to our `lib.rs`. [`OnceCell::try_get`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get The fact that the [`ArrayQueue::push`] method requires only a `&self` reference makes it very simple to call the method on the static queue. The `ArrayQueue` type performs all the necessary synchronization itself, so we don't need a mutex wrapper here. In case the queue is full, we print a warning too. [`ArrayQueue::push`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.push To call the `add_scancode` function on keyboard interrupts, we update our `keyboard_interrupt_handler` function in the `interrupts` module: ```rust // in src/interrupts.rs extern "x86-interrupt" fn keyboard_interrupt_handler( _stack_frame: InterruptStackFrame ) { use x86_64::instructions::port::Port; let mut port = Port::new(0x60); let scancode: u8 = unsafe { port.read() }; crate::task::keyboard::add_scancode(scancode); // new unsafe { PICS.lock() .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); } } ``` We removed all the keyboard handling code from this function and instead added a call to the `add_scancode` function. The rest of the function stays the same as before. As expected, keypresses are no longer printed to the screen when we run our project using `cargo run` now. Instead, we see the warning that the scancode queue is uninitialized for every keystroke. 
#### Scancode Stream

To initialize the `SCANCODE_QUEUE` and read the scancodes from the queue in an asynchronous way, we create a new `ScancodeStream` type:

```rust
// in src/task/keyboard.rs

pub struct ScancodeStream {
    _private: (),
}

impl ScancodeStream {
    pub fn new() -> Self {
        SCANCODE_QUEUE.try_init_once(|| ArrayQueue::new(100))
            .expect("ScancodeStream::new should only be called once");
        ScancodeStream { _private: () }
    }
}
```

The purpose of the `_private` field is to prevent construction of the struct from outside of the module. This makes the `new` function the only way to construct the type. In the function, we first try to initialize the `SCANCODE_QUEUE` static. We panic if it is already initialized to ensure that only a single `ScancodeStream` instance can be created.

To make the scancodes available to asynchronous tasks, the next step is to implement a `poll`-like method that tries to pop the next scancode off the queue. While this sounds like we should implement the [`Future`] trait for our type, this does not quite fit here. The problem is that the `Future` trait only abstracts over a single asynchronous value and expects that the `poll` method is not called again after it returns `Poll::Ready`. Our scancode queue, however, contains multiple asynchronous values, so it is okay to keep polling it.

##### The `Stream` Trait

Since types that yield multiple asynchronous values are common, the [`futures`] crate provides a useful abstraction for such types: the [`Stream`] trait. The trait is defined like this:

[`Stream`]: https://rust-lang.github.io/async-book/05_streams/01_chapter.html

```rust
pub trait Stream {
    type Item;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context)
        -> Poll<Option<Self::Item>>;
}
```

This definition is quite similar to the [`Future`] trait, with the following differences:

- The associated type is named `Item` instead of `Output`.
- Instead of a `poll` method that returns `Poll<Self::Output>`, the `Stream` trait defines a `poll_next` method that returns a `Poll<Option<Self::Item>>` (note the additional `Option`).

There is also a semantic difference: The `poll_next` method can be called repeatedly, until it returns `Poll::Ready(None)` to signal that the stream is finished. In this regard, the method is similar to the [`Iterator::next`] method, which also returns `None` after the last value.

[`Iterator::next`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html#tymethod.next

##### Implementing `Stream`

Let's implement the `Stream` trait for our `ScancodeStream` to provide the values of the `SCANCODE_QUEUE` in an asynchronous way. For this, we first need to add a dependency on the `futures-util` crate, which contains the `Stream` type:

```toml
# in Cargo.toml

[dependencies.futures-util]
version = "0.3.4"
default-features = false
features = ["alloc"]
```

We disable the default features to make the crate `no_std` compatible and enable the `alloc` feature to make its allocation-based types available (we will need this later). (Note that we could also add a dependency on the main `futures` crate, which re-exports the `futures-util` crate, but this would result in a larger number of dependencies and longer compile times.)

Now we can import and implement the `Stream` trait:

```rust
// in src/task/keyboard.rs

use core::{pin::Pin, task::{Poll, Context}};
use futures_util::stream::Stream;

impl Stream for ScancodeStream {
    type Item = u8;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<u8>> {
        let queue = SCANCODE_QUEUE.try_get().expect("not initialized");
        match queue.pop() {
            Some(scancode) => Poll::Ready(Some(scancode)),
            None => Poll::Pending,
        }
    }
}
```

We first use the [`OnceCell::try_get`] method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the `new` function, so we can safely use the `expect` method to panic if it's not initialized.
Next, we use the [`ArrayQueue::pop`] method to try to get the next element from the queue. If it succeeds, we return the scancode wrapped in `Poll::Ready(Some(…))`. If it fails, it means that the queue is empty. In that case, we return `Poll::Pending`.

[`ArrayQueue::pop`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.pop

#### Waker Support

Like the `Future::poll` method, the `Stream::poll_next` method requires the asynchronous task to notify the executor when it becomes ready after `Poll::Pending` is returned. This way, the executor does not need to poll the same task again until it is notified, which greatly reduces the performance overhead of waiting tasks.

To send this notification, the task should extract the [`Waker`] from the passed [`Context`] reference and store it somewhere. When the task becomes ready, it should invoke the [`wake`] method on the stored `Waker` to notify the executor that the task should be polled again.

##### AtomicWaker

To implement the `Waker` notification for our `ScancodeStream`, we need a place where we can store the `Waker` between poll calls. We can't store it as a field in the `ScancodeStream` itself because it needs to be accessible from the `add_scancode` function. The solution to this is to use a static variable of the [`AtomicWaker`] type provided by the `futures-util` crate. Like the `ArrayQueue` type, this type is based on atomic instructions and can be safely stored in a `static` and modified concurrently.

[`AtomicWaker`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html

Let's use the [`AtomicWaker`] type to define a static `WAKER`:

```rust
// in src/task/keyboard.rs

use futures_util::task::AtomicWaker;

static WAKER: AtomicWaker = AtomicWaker::new();
```

The idea is that the `poll_next` implementation stores the current waker in this static, and the `add_scancode` function calls the `wake` function on it when a new scancode is added to the queue.
##### Storing a Waker

The contract defined by `poll`/`poll_next` requires the task to register a wakeup for the passed `Waker` when it returns `Poll::Pending`. Let's modify our `poll_next` implementation to satisfy this requirement:

```rust
// in src/task/keyboard.rs

impl Stream for ScancodeStream {
    type Item = u8;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<u8>> {
        let queue = SCANCODE_QUEUE
            .try_get()
            .expect("scancode queue not initialized");

        // fast path
        if let Some(scancode) = queue.pop() {
            return Poll::Ready(Some(scancode));
        }

        WAKER.register(&cx.waker());
        match queue.pop() {
            Some(scancode) => {
                WAKER.take();
                Poll::Ready(Some(scancode))
            }
            None => Poll::Pending,
        }
    }
}
```

Like before, we first use the [`OnceCell::try_get`] function to get a reference to the initialized scancode queue. We then optimistically try to `pop` from the queue and return `Poll::Ready` when it succeeds. This way, we can avoid the performance overhead of registering a waker when the queue is not empty.

If the first call to `queue.pop()` does not succeed, the queue is potentially empty. Only potentially because the interrupt handler might have filled the queue asynchronously immediately after the check. Since this race condition can occur again for the next check, we need to register the `Waker` in the `WAKER` static before the second check. This way, a wakeup might happen before we return `Poll::Pending`, but it is guaranteed that we get a wakeup for any scancodes pushed after the check.

After registering the `Waker` contained in the passed [`Context`] through the [`AtomicWaker::register`] function, we try to pop from the queue a second time. If it now succeeds, we return `Poll::Ready`. We also remove the registered waker again using [`AtomicWaker::take`] because a waker notification is no longer needed. In case `queue.pop()` fails for a second time, we return `Poll::Pending` like before, but this time with a registered wakeup.


[`AtomicWaker::register`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html#method.register
[`AtomicWaker::take`]: https://docs.rs/futures/0.3.4/futures/task/struct.AtomicWaker.html#method.take

Note that there are two ways that a wakeup can happen for a task that did not return `Poll::Pending` (yet). One way is the mentioned race condition when the wakeup happens immediately before returning `Poll::Pending`. The other way is when the queue is no longer empty after registering the waker, so that `Poll::Ready` is returned. Since these spurious wakeups are not preventable, the executor needs to be able to handle them correctly.

##### Waking the Stored Waker

To wake the stored `Waker`, we add a call to `WAKER.wake()` in the `add_scancode` function:

```rust
// in src/task/keyboard.rs

pub(crate) fn add_scancode(scancode: u8) {
    if let Ok(queue) = SCANCODE_QUEUE.try_get() {
        if let Err(_) = queue.push(scancode) {
            println!("WARNING: scancode queue full; dropping keyboard input");
        } else {
            WAKER.wake(); // new
        }
    } else {
        println!("WARNING: scancode queue uninitialized");
    }
}
```

The only change that we made is to add a call to `WAKER.wake()` if the push to the scancode queue succeeds. If a waker is registered in the `WAKER` static, this method will call the equally-named [`wake`] method on it, which notifies the executor. Otherwise, the operation is a no-op, i.e., nothing happens.

[`wake`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.wake

It is important that we call `wake` only after pushing to the queue because otherwise the task might be woken too early while the queue is still empty. This can, for example, happen when using a multi-threaded executor that starts the woken task concurrently on a different CPU core. While we don't have thread support yet, we will add it soon and don't want things to break then.

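
The interplay between "check, register, check again" on the polling side and "push, then wake" on the producing side can be simulated on a host system. The sketch below makes several assumptions that differ from the kernel code: it uses `std`, a hypothetical `Channel` type with `Mutex`-protected fields instead of the lock-free `ArrayQueue` and `AtomicWaker`, and a hypothetical `WakeCounter` waker that only counts notifications.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical host-side stand-in for `SCANCODE_QUEUE` plus `WAKER`.
struct Channel {
    queue: Mutex<VecDeque<u8>>,
    waker: Mutex<Option<Waker>>,
}

impl Channel {
    fn new() -> Self {
        Channel { queue: Mutex::new(VecDeque::new()), waker: Mutex::new(None) }
    }

    fn pop(&self) -> Option<u8> {
        self.queue.lock().unwrap().pop_front()
    }

    /// Mirrors the structure of `poll_next`: fast path, register, second check.
    fn poll_next(&self, cx: &mut Context<'_>) -> Poll<Option<u8>> {
        if let Some(code) = self.pop() {
            return Poll::Ready(Some(code)); // fast path, no waker registered
        }
        *self.waker.lock().unwrap() = Some(cx.waker().clone());
        match self.pop() {
            Some(code) => {
                // a value arrived in between; the registration is not needed
                let _ = self.waker.lock().unwrap().take();
                Poll::Ready(Some(code))
            }
            None => Poll::Pending,
        }
    }

    /// Mirrors `add_scancode`: push first, wake afterwards.
    fn add_scancode(&self, code: u8) {
        self.queue.lock().unwrap().push_back(code);
        let waker = self.waker.lock().unwrap().take();
        if let Some(waker) = waker {
            waker.wake();
        }
    }
}

/// Hypothetical waker that counts how often it was woken.
struct WakeCounter(AtomicUsize);

impl Wake for WakeCounter {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls an empty channel, adds one scancode, and polls again.
fn simulate() -> (bool, usize, Poll<Option<u8>>) {
    let channel = Channel::new();
    let counter = Arc::new(WakeCounter(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    let first_pending = channel.poll_next(&mut cx).is_pending();
    channel.add_scancode(0x1e);
    let wakes = counter.0.load(Ordering::SeqCst);
    let second = channel.poll_next(&mut cx);
    (first_pending, wakes, second)
}
```

In this simulation, the first poll returns `Poll::Pending` and leaves a waker registered, `add_scancode` triggers exactly one notification, and the second poll takes the fast path and returns the pushed value.
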
#### Keyboard Task

Now that we implemented the `Stream` trait for our `ScancodeStream`, we can use it to create an asynchronous keyboard task:

```rust
// in src/task/keyboard.rs

use futures_util::stream::StreamExt;
use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1};
use crate::print;

pub async fn print_keypresses() {
    let mut scancodes = ScancodeStream::new();
    let mut keyboard = Keyboard::new(ScancodeSet1::new(),
        layouts::Us104Key, HandleControl::Ignore);

    while let Some(scancode) = scancodes.next().await {
        if let Ok(Some(key_event)) = keyboard.add_byte(scancode) {
            if let Some(key) = keyboard.process_keyevent(key_event) {
                match key {
                    DecodedKey::Unicode(character) => print!("{}", character),
                    DecodedKey::RawKey(key) => print!("{:?}", key),
                }
            }
        }
    }
}
```

The code is very similar to the code we had in our [keyboard interrupt handler] before we modified it in this post. The only difference is that, instead of reading the scancode from an I/O port, we take it from the `ScancodeStream`. For this, we first create a new `ScancodeStream` and then repeatedly use the [`next`] method provided by the [`StreamExt`] trait to get a `Future` that resolves to the next element in the stream. By using the `await` operator on it, we asynchronously wait for the result of the future.

[keyboard interrupt handler]: @/edition-2/posts/07-hardware-interrupts/index.md#interpreting-the-scancodes
[`next`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html#method.next
[`StreamExt`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html

We use `while let` to loop until the stream returns `None` to signal its end. Since our `poll_next` method never returns `None`, this is effectively an endless loop, so the `print_keypresses` task never finishes.

Let's add the `print_keypresses` task to our executor in our `main.rs` to get working keyboard input again:

```rust
// in src/main.rs

use blog_os::task::keyboard; // new

fn kernel_main(boot_info: &'static BootInfo) -> ! {

    // […] initialization routines, including init_heap, test_main

    let mut executor = SimpleExecutor::new();
    executor.spawn(Task::new(example_task()));
    executor.spawn(Task::new(keyboard::print_keypresses())); // new
    executor.run();

    // […] "it did not crash" message, hlt_loop
}
```

When we execute `cargo run` now, we see that keyboard input works again:

![QEMU printing ".....H...e...l...l..o..... ...W..o..r....l...d...!"](qemu-keyboard-output.gif)

If you keep an eye on the CPU utilization of your computer, you will see that the `QEMU` process now continuously keeps the CPU busy. This happens because our `SimpleExecutor` polls tasks over and over again in a loop. So even if we don't press any keys on the keyboard, the executor repeatedly calls `poll` on our `print_keypresses` task, even though the task cannot make any progress and will return `Poll::Pending` each time.

### Executor with Waker Support

To fix the performance problem, we need to create an executor that properly utilizes the `Waker` notifications. This way, the executor is notified when the next keyboard interrupt occurs, so it does not need to keep polling the `print_keypresses` task over and over again.

#### Task Id

The first step in creating an executor with proper support for waker notifications is to give each task a unique ID. This is required because we need a way to specify which task should be woken. We start by creating a new `TaskId` wrapper type:

```rust
// in src/task/mod.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TaskId(u64);
```

The `TaskId` struct is a simple wrapper type around `u64`. We derive a number of traits for it to make it printable, copyable, comparable, and sortable.
The latter is important because we want to use `TaskId` as the key type of a [`BTreeMap`] in a moment.

[`BTreeMap`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html

To create a new unique ID, we create a `TaskId::new` function:

```rust
use core::sync::atomic::{AtomicU64, Ordering};

impl TaskId {
    fn new() -> Self {
        static NEXT_ID: AtomicU64 = AtomicU64::new(0);
        TaskId(NEXT_ID.fetch_add(1, Ordering::Relaxed))
    }
}
```

The function uses a static `NEXT_ID` variable of type [`AtomicU64`] to ensure that each ID is assigned only once. The [`fetch_add`] method increases the value and returns the previous value in one atomic operation. This means that even when the `TaskId::new` method is called in parallel, every ID is returned exactly once. The [`Ordering`] parameter defines whether the compiler is allowed to reorder the `fetch_add` operation in the instructions stream. Since we only require that the ID be unique, the `Relaxed` ordering with the weakest requirements is enough in this case.

[`AtomicU64`]: https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html
[`fetch_add`]: https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html#method.fetch_add
[`Ordering`]: https://doc.rust-lang.org/core/sync/atomic/enum.Ordering.html

We can now extend our `Task` type with an additional `id` field:

```rust
// in src/task/mod.rs

pub struct Task {
    id: TaskId, // new
    future: Pin<Box<dyn Future<Output = ()>>>,
}

impl Task {
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task {
            id: TaskId::new(), // new
            future: Box::pin(future),
        }
    }
}
```

The new `id` field makes it possible to uniquely name a task, which is required for waking a specific task.

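
The uniqueness claim for `fetch_add` can be checked on a host system, where threads are available. The following sketch assumes `std` instead of `core`; the `TaskId` type is copied from above, while the `unique_ids_from_threads` helper is a hypothetical name introduced only for this illustration.

```rust
use std::collections::BTreeSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TaskId(u64);

impl TaskId {
    fn new() -> Self {
        static NEXT_ID: AtomicU64 = AtomicU64::new(0);
        // `fetch_add` returns the previous value, so no two callers see the same ID
        TaskId(NEXT_ID.fetch_add(1, Ordering::Relaxed))
    }
}

/// Hypothetical helper: creates IDs on several threads at once and
/// returns how many distinct IDs were observed.
fn unique_ids_from_threads(threads: usize, per_thread: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                (0..per_thread).map(|_| TaskId::new()).collect::<Vec<_>>()
            })
        })
        .collect();

    // `TaskId` derives `Ord`, so it can be stored in an ordered set
    let mut seen = BTreeSet::new();
    for handle in handles {
        for id in handle.join().unwrap() {
            seen.insert(id);
        }
    }
    seen.len()
}
```

If any ID were handed out twice, the set would contain fewer entries than the number of `TaskId::new` calls. With `Relaxed` ordering, the IDs are still unique; only the order in which different threads observe them is unspecified.
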
#### The `Executor` Type

We create our new `Executor` type in a `task::executor` module:

```rust
// in src/task/mod.rs

pub mod executor;
```

```rust
// in src/task/executor.rs

use super::{Task, TaskId};
use alloc::{collections::BTreeMap, sync::Arc};
use core::task::Waker;
use crossbeam_queue::ArrayQueue;

pub struct Executor {
    tasks: BTreeMap<TaskId, Task>,
    task_queue: Arc<ArrayQueue<TaskId>>,
    waker_cache: BTreeMap<TaskId, Waker>,
}

impl Executor {
    pub fn new() -> Self {
        Executor {
            tasks: BTreeMap::new(),
            task_queue: Arc::new(ArrayQueue::new(100)),
            waker_cache: BTreeMap::new(),
        }
    }
}
```

Instead of storing tasks in a [`VecDeque`] like we did for our `SimpleExecutor`, we use a `task_queue` of task IDs and a [`BTreeMap`] named `tasks` that contains the actual `Task` instances. The map is indexed by the `TaskId` to allow efficient continuation of a specific task.

The `task_queue` field is an [`ArrayQueue`] of task IDs, wrapped into the [`Arc`] type that implements _reference counting_. Reference counting makes it possible to share ownership of the value among multiple owners. It works by allocating the value on the heap and counting the number of active references to it. When the number of active references reaches zero, the value is no longer needed and can be deallocated.

We use this `Arc<ArrayQueue>` type for the `task_queue` because it will be shared between the executor and wakers. The idea is that the wakers push the ID of the woken task to the queue. The executor sits on the receiving end of the queue, retrieves the woken tasks by their ID from the `tasks` map, and then runs them. The reason for using a fixed-size queue instead of an unbounded queue such as [`SegQueue`] is that interrupt handlers should not allocate on push to this queue.

In addition to the `task_queue` and the `tasks` map, the `Executor` type has a `waker_cache` field that is also a map. This map caches the [`Waker`] of a task after its creation.
This has two reasons: First, it improves performance by reusing the same waker for multiple wake-ups of the same task instead of creating a new waker each time. Second, it ensures that reference-counted wakers are not deallocated inside interrupt handlers because it could lead to deadlocks (there are more details on this below).

[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html
[`SegQueue`]: https://docs.rs/crossbeam-queue/0.2.1/crossbeam_queue/struct.SegQueue.html

To create an `Executor`, we provide a simple `new` function. We choose a capacity of 100 for the `task_queue`, which should be more than enough for the foreseeable future. In case our system will have more than 100 concurrent tasks at some point, we can easily increase this size.

#### Spawning Tasks

As for the `SimpleExecutor`, we provide a `spawn` method on our `Executor` type that adds a given task to the `tasks` map and immediately wakes it by pushing its ID to the `task_queue`:

```rust
// in src/task/executor.rs

impl Executor {
    pub fn spawn(&mut self, task: Task) {
        let task_id = task.id;
        if self.tasks.insert(task.id, task).is_some() {
            panic!("task with same ID already in tasks");
        }
        self.task_queue.push(task_id).expect("queue full");
    }
}
```

If there is already a task with the same ID in the map, the [`BTreeMap::insert`] method returns it. This should never happen since each task has a unique ID, so we panic in this case since it indicates a bug in our code. Similarly, we panic when the `task_queue` is full since this should never happen if we choose a large-enough queue size.

#### Running Tasks

To execute all tasks in the `task_queue`, we create a private `run_ready_tasks` method:

```rust
// in src/task/executor.rs

use core::task::{Context, Poll};

impl Executor {
    fn run_ready_tasks(&mut self) {
        // destructure `self` to avoid borrow checker errors
        let Self {
            tasks,
            task_queue,
            waker_cache,
        } = self;

        while let Some(task_id) = task_queue.pop() {
            let task = match tasks.get_mut(&task_id) {
                Some(task) => task,
                None => continue, // task no longer exists
            };
            let waker = waker_cache
                .entry(task_id)
                .or_insert_with(|| TaskWaker::new(task_id, task_queue.clone()));
            let mut context = Context::from_waker(waker);
            match task.poll(&mut context) {
                Poll::Ready(()) => {
                    // task done -> remove it and its cached waker
                    tasks.remove(&task_id);
                    waker_cache.remove(&task_id);
                }
                Poll::Pending => {}
            }
        }
    }
}
```

The basic idea of this function is similar to our `SimpleExecutor`: Loop over all tasks in the `task_queue`, create a waker for each task, and then poll them. However, instead of adding pending tasks back to the end of the `task_queue`, we let our `TaskWaker` implementation take care of adding woken tasks back to the queue. The implementation of this waker type will be shown in a moment.

Let's look into some of the implementation details of this `run_ready_tasks` method:

- We use [_destructuring_] to split `self` into its three fields to avoid some borrow checker errors. Namely, our implementation needs to access the `self.task_queue` from within a closure, which currently tries to borrow `self` completely. This is a fundamental borrow checker issue that will be resolved when [RFC 2229] is [implemented][RFC 2229 impl].

- For each popped task ID, we retrieve a mutable reference to the corresponding task from the `tasks` map. Since our `ScancodeStream` implementation registers wakers before checking whether a task needs to be put to sleep, it might happen that a wake-up occurs for a task that no longer exists.
In this case, we simply ignore the wake-up and continue with the next ID from the queue.

- To avoid the performance overhead of creating a waker on each poll, we use the `waker_cache` map to store the waker for each task after it has been created. For this, we use the [`BTreeMap::entry`] method in combination with [`Entry::or_insert_with`] to create a new waker if it doesn't exist yet and then get a mutable reference to it. For creating a new waker, we clone the `task_queue` and pass it together with the task ID to the `TaskWaker::new` function (implementation shown below). Since the `task_queue` is wrapped into an `Arc`, the `clone` only increases the reference count of the value, but still points to the same heap-allocated queue. Note that reusing wakers like this is not possible for all waker implementations, but our `TaskWaker` type will allow it.

[_destructuring_]: https://doc.rust-lang.org/book/ch19-03-pattern-syntax.html#destructuring-to-break-apart-values
[RFC 2229]: https://github.com/rust-lang/rfcs/pull/2229
[RFC 2229 impl]: https://github.com/rust-lang/rust/issues/53488

[`BTreeMap::entry`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.entry
[`Entry::or_insert_with`]: https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html#method.or_insert_with

A task is finished when it returns `Poll::Ready`. In that case, we remove it from the `tasks` map using the [`BTreeMap::remove`] method. We also remove its cached waker, if it exists.

[`BTreeMap::remove`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.remove

#### Waker Design

The job of the waker is to push the ID of the woken task to the `task_queue` of the executor.
We implement this by creating a new `TaskWaker` struct that stores the task ID and a reference to the `task_queue`:

```rust
// in src/task/executor.rs

struct TaskWaker {
    task_id: TaskId,
    task_queue: Arc<ArrayQueue<TaskId>>,
}
```

Since the ownership of the `task_queue` is shared between the executor and wakers, we use the [`Arc`] wrapper type to implement shared reference-counted ownership.

[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html

The implementation of the wake operation is quite simple:

```rust
// in src/task/executor.rs

impl TaskWaker {
    fn wake_task(&self) {
        self.task_queue.push(self.task_id).expect("task_queue full");
    }
}
```

We push the `task_id` to the referenced `task_queue`. Since modifications to the [`ArrayQueue`] type only require a shared reference, we can implement this method on `&self` instead of `&mut self`.

##### The `Wake` Trait

In order to use our `TaskWaker` type for polling futures, we need to convert it to a [`Waker`] instance first. This is required because the [`Future::poll`] method takes a [`Context`] instance as an argument, which can only be constructed from the `Waker` type. While we could do this by providing an implementation of the [`RawWaker`] type, it's both simpler and safer to instead implement the `Arc`-based [`Wake`][wake-trait] trait and then use the [`From`] implementations provided by the standard library to construct the `Waker`.

The trait implementation looks like this:

[wake-trait]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html

```rust
// in src/task/executor.rs

use alloc::task::Wake;

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.wake_task();
    }

    fn wake_by_ref(self: &Arc<Self>) {
        self.wake_task();
    }
}
```

Since wakers are commonly shared between the executor and the asynchronous tasks, the trait methods require that the `Self` instance is wrapped in the [`Arc`] type, which implements reference-counted ownership. This means that we have to move our `TaskWaker` to an `Arc` in order to call them.
The difference between the `wake` and `wake_by_ref` methods is that the latter only requires a reference to the `Arc`, while the former takes ownership of the `Arc` and thus often requires an increase of the reference count. Not all types support waking by reference, so implementing the `wake_by_ref` method is optional. However, it can lead to better performance because it avoids unnecessary reference count modifications. In our case, we can simply forward both trait methods to our `wake_task` function, which requires only a shared `&self` reference.

##### Creating Wakers

Since the `Waker` type supports [`From`] conversions for all `Arc`-wrapped values that implement the `Wake` trait, we can now implement the `TaskWaker::new` function that is required by our `Executor::run_ready_tasks` method:

[`From`]: https://doc.rust-lang.org/nightly/core/convert/trait.From.html

```rust
// in src/task/executor.rs

impl TaskWaker {
    fn new(task_id: TaskId, task_queue: Arc<ArrayQueue<TaskId>>) -> Waker {
        Waker::from(Arc::new(TaskWaker {
            task_id,
            task_queue,
        }))
    }
}
```

We create the `TaskWaker` using the passed `task_id` and `task_queue`. We then wrap the `TaskWaker` in an `Arc` and use the `Waker::from` implementation to convert it to a [`Waker`]. This `from` method takes care of constructing a [`RawWakerVTable`] and a [`RawWaker`] instance for our `TaskWaker` type. In case you're interested in how it works in detail, check out the [implementation in the `alloc` crate][waker-from-impl].

[waker-from-impl]: https://github.com/rust-lang/rust/blob/cdb50c6f2507319f29104a25765bfb79ad53395c/src/liballoc/task.rs#L58-L87

#### A `run` Method

With our waker implementation in place, we can finally construct a `run` method for our executor:

```rust
// in src/task/executor.rs

impl Executor {
    pub fn run(&mut self) -> ! {
        loop {
            self.run_ready_tasks();
        }
    }
}
```

This method just calls the `run_ready_tasks` function in a loop.
While we could theoretically return from the function when the `tasks` map becomes empty, this would never happen since our `keyboard::print_keypresses` task never finishes, so a simple `loop` should suffice. Since the function never returns, we use the `!` return type to mark the function as [diverging] to the compiler.

[diverging]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html

We can now change our `kernel_main` to use our new `Executor` instead of the `SimpleExecutor`:

```rust
// in src/main.rs

use blog_os::task::executor::Executor; // new

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // […] initialization routines, including init_heap, test_main

    let mut executor = Executor::new(); // new
    executor.spawn(Task::new(example_task()));
    executor.spawn(Task::new(keyboard::print_keypresses()));
    executor.run();
}
```

We only need to change the import and the type name. Since our `run` function is marked as diverging, the compiler knows that it never returns, so we no longer need a call to `hlt_loop` at the end of our `kernel_main` function.

When we run our kernel using `cargo run` now, we see that keyboard input still works:

![QEMU printing ".....H...e...l...l..o..... ...a..g..a....i...n...!"](qemu-keyboard-output-again.gif)

However, the CPU utilization of QEMU did not get any better. The reason for this is that we still keep the CPU busy the whole time. We no longer poll tasks until they are woken again, but we still check the `task_queue` in a busy loop. To fix this, we need to put the CPU to sleep if there is no more work to do.

#### Sleep If Idle

The basic idea is to execute the [`hlt` instruction] when the `task_queue` is empty. This instruction puts the CPU to sleep until the next interrupt arrives. The fact that the CPU immediately becomes active again on interrupts ensures that we can still directly react when an interrupt handler pushes to the `task_queue`.

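
Before adding the sleep logic, the waker-driven design from the previous sections can be exercised on a host system. The following is a rough sketch under several assumptions, not the kernel implementation: it uses `std`, a plain `u64` as task ID, and a `Mutex<VecDeque>` in place of the lock-free `ArrayQueue`. The `YieldOnce` future and the `demo` function are hypothetical helpers introduced only for this illustration, and `run_ready_tasks` additionally returns the number of polls so that the behavior can be observed.

```rust
use std::collections::{BTreeMap, VecDeque};
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type TaskId = u64;
type Queue = Arc<Mutex<VecDeque<TaskId>>>;

/// Waker that pushes its task ID back to the shared queue.
struct TaskWaker { task_id: TaskId, queue: Queue }

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) { self.wake_by_ref(); }
    fn wake_by_ref(self: &Arc<Self>) {
        self.queue.lock().unwrap().push_back(self.task_id);
    }
}

struct Executor {
    tasks: BTreeMap<TaskId, Pin<Box<dyn Future<Output = ()>>>>,
    queue: Queue,
    waker_cache: BTreeMap<TaskId, Waker>,
    next_id: TaskId,
}

impl Executor {
    fn new() -> Self {
        Executor {
            tasks: BTreeMap::new(),
            queue: Arc::new(Mutex::new(VecDeque::new())),
            waker_cache: BTreeMap::new(),
            next_id: 0,
        }
    }

    fn spawn(&mut self, future: impl Future<Output = ()> + 'static) {
        let id = self.next_id;
        self.next_id += 1;
        self.tasks.insert(id, Box::pin(future));
        self.queue.lock().unwrap().push_back(id); // wake it immediately
    }

    /// Polls woken tasks until the queue is empty; returns the number of polls.
    fn run_ready_tasks(&mut self) -> usize {
        let mut polls = 0;
        loop {
            // the lock guard is dropped before polling, so wakers can push freely
            let next = self.queue.lock().unwrap().pop_front();
            let Some(id) = next else { break };
            let Some(task) = self.tasks.get_mut(&id) else { continue };
            let queue = self.queue.clone();
            let waker = self.waker_cache.entry(id).or_insert_with(|| {
                Waker::from(Arc::new(TaskWaker { task_id: id, queue }))
            });
            let mut cx = Context::from_waker(waker);
            polls += 1;
            if task.as_mut().poll(&mut cx).is_ready() {
                self.tasks.remove(&id);
                self.waker_cache.remove(&id);
            }
        }
        polls
    }
}

/// Hypothetical future that wakes itself and returns `Pending` exactly once.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 { return Poll::Ready(()); }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Spawns two tasks that each log their name, yield once, and log again.
fn demo() -> (usize, Vec<&'static str>) {
    let log = Arc::new(Mutex::new(Vec::new()));
    let mut executor = Executor::new();
    for name in ["a", "b"] {
        let log = log.clone();
        executor.spawn(async move {
            log.lock().unwrap().push(name);
            YieldOnce(false).await;
            log.lock().unwrap().push(name);
        });
    }
    let polls = executor.run_ready_tasks();
    let entries = log.lock().unwrap().clone();
    (polls, entries)
}
```

In this sketch, each task is polled exactly twice: once after `spawn` and once after its waker pushed the ID back to the queue. Tasks that are not woken are never polled, which is the property that makes putting the CPU to sleep on an empty queue worthwhile.
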

[`hlt` instruction]: https://en.wikipedia.org/wiki/HLT_(x86_instruction)

To implement this, we create a new `sleep_if_idle` method in our executor and call it from our `run` method:

```rust
// in src/task/executor.rs

impl Executor {
    pub fn run(&mut self) -> ! {
        loop {
            self.run_ready_tasks();
            self.sleep_if_idle(); // new
        }
    }

    fn sleep_if_idle(&self) {
        if self.task_queue.is_empty() {
            x86_64::instructions::hlt();
        }
    }
}
```

Since we call `sleep_if_idle` directly after `run_ready_tasks`, which loops until the `task_queue` becomes empty, checking the queue again might seem unnecessary. However, a hardware interrupt might occur directly after `run_ready_tasks` returns, so there might be a new task in the queue at the time the `sleep_if_idle` function is called. Only if the queue is still empty do we put the CPU to sleep by executing the `hlt` instruction through the [`instructions::hlt`] wrapper function provided by the [`x86_64`] crate.

[`instructions::hlt`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/fn.hlt.html
[`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/index.html

Unfortunately, there is still a subtle race condition in this implementation. Since interrupts are asynchronous and can happen at any time, it is possible that an interrupt happens right between the `is_empty` check and the call to `hlt`:

```rust
if self.task_queue.is_empty() {
    // <--- interrupt can happen here
    x86_64::instructions::hlt();
}
```

In case this interrupt pushes to the `task_queue`, we put the CPU to sleep even though there is now a ready task. In the worst case, this could delay the handling of a keyboard interrupt until the next keypress or the next timer interrupt. So how do we prevent it?

The answer is to disable interrupts on the CPU before the check and atomically enable them again together with the `hlt` instruction. This way, all interrupts that happen in between are delayed until after the `hlt` instruction so that no wake-ups are missed.
To implement this approach, we can use the [`interrupts::enable_and_hlt`][`enable_and_hlt`] function provided by the [`x86_64`] crate.

[`enable_and_hlt`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.enable_and_hlt.html

The updated implementation of our `sleep_if_idle` function looks like this:

```rust
// in src/task/executor.rs

impl Executor {
    fn sleep_if_idle(&self) {
        use x86_64::instructions::interrupts::{self, enable_and_hlt};

        interrupts::disable();
        if self.task_queue.is_empty() {
            enable_and_hlt();
        } else {
            interrupts::enable();
        }
    }
}
```

To avoid race conditions, we disable interrupts before checking whether the `task_queue` is empty. If it is, we use the [`enable_and_hlt`] function to enable interrupts and put the CPU to sleep as a single atomic operation. In case the queue is no longer empty, it means that an interrupt woke a task after `run_ready_tasks` returned. In that case, we enable interrupts again and directly continue execution without executing `hlt`.

Now our executor properly puts the CPU to sleep when there is nothing to do. We can see that the QEMU process has a much lower CPU utilization when we run our kernel using `cargo run` again.

#### Possible Extensions

Our executor is now able to run tasks in an efficient way. It utilizes waker notifications to avoid polling waiting tasks and puts the CPU to sleep when there is currently no work to do. However, our executor is still quite basic, and there are many possible ways to extend its functionality:

- **Scheduling**: For our `task_queue`, we currently use the [`ArrayQueue`] type to implement a _first in first out_ (FIFO) strategy, which is often also called _round robin_ scheduling. This strategy might not be the most efficient for all workloads. For example, it might make sense to prioritize latency-critical tasks or tasks that do a lot of I/O.
See the [scheduling chapter] of the [_Operating Systems: Three Easy Pieces_] book or the [Wikipedia article on scheduling][scheduling-wiki] for more information.
- **Task Spawning**: Our `Executor::spawn` method currently requires a `&mut self` reference and is thus no longer available after invoking the `run` method. To fix this, we could create an additional `Spawner` type that shares some kind of queue with the executor and allows task creation from within tasks themselves. The queue could be the `task_queue` directly or a separate queue that the executor checks in its run loop.
- **Utilizing Threads**: We don't have support for threads yet, but we will add it in the next post. This will make it possible to launch multiple instances of the executor in different threads. The advantage of this approach is that the delay imposed by long-running tasks can be reduced because other tasks can run concurrently. This approach also makes it possible to utilize multiple CPU cores.
- **Load Balancing**: When adding threading support, it becomes important to know how to distribute the tasks between the executors to ensure that all CPU cores are utilized. A common technique for this is [_work stealing_].

[scheduling chapter]: http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-sched.pdf
[_Operating Systems: Three Easy Pieces_]: http://pages.cs.wisc.edu/~remzi/OSTEP/
[scheduling-wiki]: https://en.wikipedia.org/wiki/Scheduling_(computing)
[_work stealing_]: https://en.wikipedia.org/wiki/Work_stealing

## Summary

We started this post by introducing **multitasking** and differentiating between _preemptive_ multitasking, which forcibly interrupts running tasks regularly, and _cooperative_ multitasking, which lets tasks run until they voluntarily give up control of the CPU.

We then explored how Rust's support of **async/await** provides a language-level implementation of cooperative multitasking.
Rust bases its implementation on top of the polling-based `Future` trait, which abstracts asynchronous tasks. Using async/await, it is possible to work with futures almost like with normal synchronous code. The difference is that asynchronous functions return a `Future` again, which needs to be added to an executor at some point in order to run it.

Behind the scenes, the compiler transforms async/await code to _state machines_, with each `.await` operation corresponding to a possible pause point. By utilizing its knowledge about the program, the compiler is able to save only the minimal state for each pause point, resulting in a very small memory consumption per task. One challenge is that the generated state machines might contain _self-referential_ structs, for example when local variables of the asynchronous function reference each other. To prevent pointer invalidation, Rust uses the `Pin` type to ensure that futures cannot be moved in memory anymore after they have been polled for the first time.

For our **implementation**, we first created a very basic executor that polls all spawned tasks in a busy loop without using the `Waker` type at all. We then showed the advantage of waker notifications by implementing an asynchronous keyboard task. The task defines a static `SCANCODE_QUEUE` using the mutex-free `ArrayQueue` type provided by the `crossbeam` crate. Instead of handling keypresses directly, the keyboard interrupt handler now puts all received scancodes in the queue and then wakes the registered `Waker` to signal that new input is available. On the receiving end, we created a `ScancodeStream` type to provide a `Future` resolving to the next scancode in the queue. This made it possible to create an asynchronous `print_keypresses` task that uses async/await to interpret and print the scancodes in the queue.

To utilize the waker notifications of the keyboard task, we created a new `Executor` type that uses an `Arc`-shared `task_queue` for ready tasks.
We implemented a `TaskWaker` type that pushes the ID of woken tasks directly to this `task_queue`, which are then polled again by the executor. To save power when no tasks are runnable, we added support for putting the CPU to sleep using the `hlt` instruction. Finally, we discussed some potential extensions to our executor, for example, providing multi-core support.

## What's Next?

Using async/await, we now have basic support for cooperative multitasking in our kernel. While cooperative multitasking is very efficient, it leads to latency problems when individual tasks keep running for too long, thus preventing other tasks from running. For this reason, it makes sense to also add support for preemptive multitasking to our kernel.

In the next post, we will introduce _threads_ as the most common form of preemptive multitasking. In addition to resolving the problem of long-running tasks, threads will also prepare us for utilizing multiple CPU cores and running untrusted user programs in the future.

================================================
FILE: blog/content/edition-2/posts/12-async-await/index.pt-BR.md
================================================
+++
title = "Async/Await"
weight = 12
path = "pt-BR/async-await"
date = 2020-03-27

[extra]
chapter = "Multitasking"
# Please update this when updating the translation
translation_based_on_commit = "1ba06fe61c39c1379bd768060c21040b62ff3f0b"
# GitHub usernames of the people that translated this post
translators = ["richarddalves"]
+++

Neste post, exploramos _multitarefa cooperativa_ e a funcionalidade _async/await_ do Rust. Fazemos uma análise detalhada de como async/await funciona em Rust, incluindo o design da trait `Future`, a transformação em máquina de estados e _pinning_. Então adicionamos suporte básico para async/await ao nosso kernel criando uma tarefa assíncrona de teclado e um executor básico.

<!-- more -->

Este blog é desenvolvido abertamente no [GitHub]. Se você tiver algum problema ou dúvida, abra um issue lá.
Você também pode deixar comentários [na parte inferior]. O código-fonte completo desta publicação pode ser encontrado na branch [`post-12`][post branch]. [GitHub]: https://github.com/phil-opp/blog_os [na parte inferior]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-12 ## Multitarefa Uma das funcionalidades fundamentais da maioria dos sistemas operacionais é [_multitarefa_], que é a capacidade de executar múltiplas tarefas concorrentemente. Por exemplo, você provavelmente tem outros programas abertos enquanto olha este post, como um editor de texto ou uma janela de terminal. Mesmo se você tiver apenas uma janela de navegador aberta, provavelmente existem várias tarefas em segundo plano gerenciando suas janelas da área de trabalho, verificando atualizações ou indexando arquivos. [_multitarefa_]: https://en.wikipedia.org/wiki/Computer_multitasking Embora pareça que todas as tarefas estão sendo executadas em paralelo, apenas uma única tarefa pode ser executada em um núcleo de CPU por vez. Para criar a ilusão de que as tarefas estão sendo executadas em paralelo, o sistema operacional alterna rapidamente entre as tarefas ativas para que cada uma possa fazer um pouco de progresso. Como os computadores são rápidos, não notamos essas alternâncias na maior parte do tempo. Enquanto CPUs de núcleo único podem executar apenas uma tarefa por vez, CPUs multi-core podem executar múltiplas tarefas de forma verdadeiramente paralela. Por exemplo, uma CPU com 8 núcleos pode executar 8 tarefas ao mesmo tempo. Explicaremos como configurar CPUs multi-core em um post futuro. Para este post, focaremos em CPUs de núcleo único por simplicidade. (Vale notar que todas as CPUs multi-core começam com apenas um único núcleo ativo, então podemos tratá-las como CPUs de núcleo único por enquanto.) Existem duas formas de multitarefa: Multitarefa _cooperativa_ requer que as tarefas regularmente cedam o controle da CPU para que outras tarefas possam progredir. 
_Preemptive_ multitasking uses operating system functionality to switch threads at arbitrary points in time by forcibly pausing them. In the following, we will explore the two forms of multitasking in more detail and discuss their respective advantages and drawbacks.

### Preemptive Multitasking

The idea behind preemptive multitasking is that the operating system controls when to switch tasks. For that, it utilizes the fact that it regains control of the CPU on each interrupt. This makes it possible to switch tasks whenever new input is available to the system. For example, it would be possible to switch tasks when the mouse is moved or a network packet arrives. The operating system can also determine the exact time that a task is allowed to run by configuring a hardware timer to send an interrupt after that time.

The following graphic illustrates the task switching process on a hardware interrupt:

![](regain-control-on-interrupt.svg)

In the first row, the CPU is executing task `A1` of program `A`. All other tasks are paused. In the second row, a hardware interrupt arrives at the CPU. As described in the [_Hardware Interrupts_] post, the CPU immediately stops the execution of task `A1` and jumps to the interrupt handler defined in the interrupt descriptor table (IDT). Through this interrupt handler, the operating system now has control of the CPU again, which allows it to switch to task `B1` instead of continuing task `A1`.

[_Hardware Interrupts_]: @/edition-2/posts/07-hardware-interrupts/index.md

#### Saving State

Since tasks are interrupted at arbitrary points in time, they might be in the middle of some calculations. In order to be able to resume them later, the operating system must back up the whole state of the task, including its [call stack][pilha de chamadas] and the values of all CPU registers.
This process is called a [_context switch_].

[pilha de chamadas]: https://en.wikipedia.org/wiki/Call_stack
[_context switch_]: https://en.wikipedia.org/wiki/Context_switch

As the call stack can be very large, the operating system typically sets up a separate call stack for each task instead of backing up the call stack content on each task switch. Such a task with its own stack is called a [_thread of execution_] or _thread_ for short. By using a separate stack for each task, only the register contents need to be saved on a context switch (including the program counter and stack pointer). This approach minimizes the performance overhead of a context switch, which is very important since context switches often occur up to 100 times per second.

[_thread of execution_]: https://en.wikipedia.org/wiki/Thread_(computing)

#### Discussion

The main advantage of preemptive multitasking is that the operating system can fully control the allowed execution time of a task. This way, it can guarantee that each task gets a fair share of the CPU time, without the need to trust the tasks to cooperate. This is especially important when running third-party tasks or when multiple users share a system.

The disadvantage of preemption is that each task requires its own stack. Compared to a shared stack, this results in higher memory usage per task and often limits the number of tasks in the system. Another disadvantage is that the operating system always has to save the complete CPU register state on each task switch, even if the task only used a small subset of the registers.

Preemptive multitasking and threads are fundamental components of an operating system because they make it possible to run untrusted userspace programs.
We will discuss these concepts in full detail in future posts. For this post, however, we will focus on cooperative multitasking, which also provides useful capabilities for our kernel.

### Cooperative Multitasking

Instead of forcibly pausing running tasks at arbitrary points in time, cooperative multitasking lets each task run until it voluntarily gives up control of the CPU. This allows tasks to pause themselves at convenient points in time, for example, when they need to wait for an I/O operation anyway.

Cooperative multitasking is often used at the language level, for example in the form of [coroutines] or [async/await]. The idea is that either the programmer or the compiler inserts [_yield_] operations into the program, which give up control of the CPU and allow other tasks to run. For example, a yield could be inserted after each iteration of a complex loop.

[coroutines]: https://en.wikipedia.org/wiki/Coroutine
[async/await]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html
[_yield_]: https://en.wikipedia.org/wiki/Yield_(multithreading)

It is common to combine cooperative multitasking with [asynchronous operations]. Instead of waiting until an operation is finished and preventing other tasks from running during this time, asynchronous operations return a "not ready" status if the operation is not finished yet. In this case, the waiting task can execute a yield operation to let other tasks run.

[asynchronous operations]: https://en.wikipedia.org/wiki/Asynchronous_I/O

#### Saving State

Since tasks define their pause points themselves, they don't need the operating system to save their state. Instead, they can save exactly the state they need for continuation before they pause themselves, which often results in better performance.
For example, a task that just finished a complex computation might only need to back up the final result of the computation since it does not need the intermediate results anymore.

Language-supported implementations of cooperative tasks are often even able to back up the required parts of the call stack before pausing. As an example, Rust's async/await implementation stores all local variables that are still needed in an automatically generated struct (see below). By backing up the relevant parts of the call stack before pausing, all tasks can share a single call stack, which results in much lower memory consumption per task. This makes it possible to create an almost arbitrary number of cooperative tasks without running out of memory.

#### Discussion

The drawback of cooperative multitasking is that an uncooperative task can potentially run for an unlimited amount of time. Thus, a malicious or buggy task can prevent other tasks from running and slow down or even block the whole system. For this reason, cooperative multitasking should only be used when all tasks are known to cooperate. As a counterexample, it's not a good idea to make the operating system rely on the cooperation of arbitrary user-level programs.

However, the strong performance and memory benefits of cooperative multitasking make it a good approach for usage _within_ a program, especially in combination with asynchronous operations. Since an operating system kernel is a performance-critical program that interacts with asynchronous hardware, cooperative multitasking seems like a good approach for implementing concurrency.

## Async/Await in Rust

The Rust language provides first-class support for cooperative multitasking in the form of async/await.
Before we can explore what async/await is and how it works, we need to understand how _futures_ and asynchronous programming work in Rust.

### Futures

A _future_ represents a value that might not be available yet. This could be, for example, an integer that is computed by another task or a file that is downloaded from the network. Instead of waiting until the value is available, futures make it possible to continue execution until the value is needed.

#### Example

The concept of futures is best illustrated with a small example:

![Sequence diagram: main calls `read_file` and is blocked until it returns; then it calls `foo()` and is also blocked until it returns. The same process is repeated, but this time `async_read_file` is called, which directly returns a future; then `foo()` is called again, which now runs concurrently with the file load. The file is available before `foo()` returns.](async-example.svg)

This sequence diagram shows a `main` function that reads a file from the file system and then calls a function `foo`. This process is repeated two times: once with a synchronous `read_file` call and once with an asynchronous `async_read_file` call.

With the synchronous call, the `main` function needs to wait until the file is loaded from the file system. Only then can it call the `foo` function, which requires it to wait again for the result.

With the asynchronous `async_read_file` call, the file system directly returns a future and loads the file asynchronously in the background. This allows the `main` function to call `foo` much earlier, which then runs in parallel with the file load. In this example, the file load even finishes before `foo` returns, so `main` can directly work with the file without further waiting after `foo` returns.
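
The two flows of the sequence diagram can be mimicked on a normal host system. The following is only a rough sketch: it uses OS threads from the standard library instead of Rust futures, so it would not work in our `no_std` kernel, and the bodies of `read_file`, `async_read_file`, and `foo` are made-up stand-ins that just sleep for a moment.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

// Hypothetical stand-in for a slow, blocking file read.
fn read_file(name: &str) -> String {
    thread::sleep(Duration::from_millis(50));
    format!("contents of {name}")
}

// Hypothetical "asynchronous" variant: returns a handle right away
// while the read continues on a background thread.
fn async_read_file(name: &str) -> JoinHandle<String> {
    let name = name.to_owned();
    thread::spawn(move || read_file(&name))
}

// Some unrelated work that the caller wants to do.
fn foo() -> u32 {
    thread::sleep(Duration::from_millis(50));
    42
}

// Synchronous flow: the read must finish before `foo` can even start.
fn run_sync() -> (String, u32) {
    let file = read_file("foo.txt");
    let answer = foo();
    (file, answer)
}

// Asynchronous flow: `foo` runs while the file is still loading.
fn run_async() -> (String, u32) {
    let pending_file = async_read_file("foo.txt");
    let answer = foo();
    // Only now do we need the value, so only now do we wait for it.
    let file = pending_file.join().expect("reader thread panicked");
    (file, answer)
}
```

Both functions produce the same result, but `run_async` overlaps the two waiting periods instead of adding them up.
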
#### Futures in Rust

In Rust, futures are represented by the [`Future`] trait, which looks like this:

[`Future`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html

```rust
pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>;
}
```

The [associated type] `Output` specifies the type of the asynchronous value. For example, the `async_read_file` function in the diagram above would return a `Future` instance with `Output` set to `File`.

[associated type]: https://doc.rust-lang.org/book/ch20-02-advanced-traits.html#associated-types

The [`poll`] method allows checking whether the value is already available. It returns a [`Poll`] enum, which looks like this:

[`poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll
[`Poll`]: https://doc.rust-lang.org/nightly/core/task/enum.Poll.html

```rust
pub enum Poll<T> {
    Ready(T),
    Pending,
}
```

When the value is already available (e.g. the file was fully read from disk), it is returned wrapped in the `Ready` variant. Otherwise, the `Pending` variant is returned, which signals to the caller that the value is not yet available.

The `poll` method takes two arguments: `self: Pin<&mut Self>` and `cx: &mut Context`. The former behaves similarly to a normal `&mut self` reference, except that the `Self` value is [_pinned_] to its memory location. Understanding `Pin` and why it is needed is difficult without understanding how async/await works first. We will therefore explain it later in this post.

[_pinned_]: https://doc.rust-lang.org/nightly/core/pin/index.html

The purpose of the `cx: &mut Context` parameter is to pass a [`Waker`] instance to the asynchronous task, e.g., the file system load. This `Waker` allows the asynchronous task to signal that it (or a part of it) is finished, e.g., that the file was loaded from disk.
Since the main task knows that it will be notified when the `Future` is ready, it does not need to call `poll` over and over again. We will explain this process in more detail later in this post when we implement our own waker type.

[`Waker`]: https://doc.rust-lang.org/nightly/core/task/struct.Waker.html

### Working with Futures

We now know how futures are defined and understand the basic idea behind the `poll` method. However, we still don't know how to effectively work with futures. The problem is that futures represent the results of asynchronous tasks, which might not be available yet. In practice, however, we often need these values directly for further calculations. So the question is: How can we efficiently retrieve the value of a future when we need it?

#### Waiting on Futures

One possible answer is to wait until a future becomes ready. This could look something like this:

```rust
let future = async_read_file("foo.txt");
let file_content = loop {
    match future.poll(…) {
        Poll::Ready(value) => break value,
        Poll::Pending => {}, // do nothing
    }
}
```

Here we _actively_ wait for the future by calling `poll` over and over again in a loop. The arguments to `poll` don't matter here, so we omitted them. While this solution works, it is very inefficient because we keep the CPU busy until the value becomes available.

A more efficient approach could be to _block_ the current thread until the future becomes available. This is, of course, only possible if you have threads, so this solution does not work for our kernel, at least not yet. Even on systems where blocking is supported, it is often not desired because it turns an asynchronous task into a synchronous task again, thereby inhibiting the potential performance benefits of parallel tasks.

#### Future Combinators

An alternative to waiting is to use future combinators.
Future combinators are methods like `map` that allow chaining and combining futures together, similar to the methods of the [`Iterator`] trait. Instead of waiting on the future, these combinators return a future themselves, which applies the mapping operation on `poll`.

[`Iterator`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html

As an example, a simple `string_len` combinator for converting a `Future<Output = String>` to a `Future<Output = usize>` could look like this:

```rust
struct StringLen<F> {
    inner_future: F,
}

impl<F> Future for StringLen<F> where F: Future<Output = String> {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.inner_future.poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(string: impl Future<Output = String>)
    -> impl Future<Output = usize>
{
    StringLen {
        inner_future: string,
    }
}

// Usage
fn file_len() -> impl Future<Output = usize> {
    let file_content_future = async_read_file("foo.txt");
    string_len(file_content_future)
}
```

This code does not quite work because it does not handle [_pinning_], but it suffices as an example. The basic idea is that the `string_len` function wraps a given `Future` instance into a new `StringLen` struct, which also implements `Future`. When the wrapped future is polled, it polls the inner future. If the value is not ready yet, `Poll::Pending` is returned from the wrapped future too. If the value is ready, the string is extracted from the `Poll::Ready` variant and its length is calculated. Afterwards, it is wrapped in `Poll::Ready` again and returned.

[_pinning_]: https://doc.rust-lang.org/stable/core/pin/index.html

With this `string_len` function, we can calculate the length of an asynchronous string without waiting for it. Since the function returns a `Future` again, the caller can't work directly on the returned value, but needs to use combinator functions again.
This way, the whole call graph becomes asynchronous and we can efficiently wait for multiple futures at once at some point, e.g., in the main function.

Because manually writing combinator functions is difficult, they are often provided by libraries. While the Rust standard library itself provides no combinator methods yet, the semi-official (and `no_std` compatible) [`futures`] crate does. Its [`FutureExt`] trait provides high-level combinator methods such as [`map`] or [`then`], which can be used to manipulate the result with arbitrary closures.

[`futures`]: https://docs.rs/futures/0.3.4/futures/
[`FutureExt`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html
[`map`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.map
[`then`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.then

##### Advantages

The big advantage of future combinators is that they keep the operations asynchronous. In combination with asynchronous I/O interfaces, this approach can lead to very high performance. The fact that future combinators are implemented as normal structs with trait implementations allows the compiler to heavily optimize them. For more details, see the [_Zero-cost futures in Rust_] post, which announced the addition of futures to the Rust ecosystem.

[_Zero-cost futures in Rust_]: https://aturon.github.io/blog/2016/08/11/futures/

##### Drawbacks

While future combinators make it possible to write very efficient code, they can be difficult to use in some situations because of the type system and the closure-based interface.
For example, consider code like this:

```rust
fn example(min_len: usize) -> impl Future<Output = String> {
    async_read_file("foo.txt").then(move |content| {
        if content.len() < min_len {
            Either::Left(async_read_file("bar.txt").map(|s| content + &s))
        } else {
            Either::Right(future::ready(content))
        }
    })
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=91fc09024eecb2448a85a7ef6a97b8d8))

Here we read the file `foo.txt` and then use the [`then`] combinator to chain a second future based on the file content. If the content length is smaller than the given `min_len`, we read a different `bar.txt` file and append it to `content` using the [`map`] combinator. Otherwise, we return only the content of `foo.txt`.

We need to use the [`move` keyword] for the closure passed to `then` because otherwise there would be a lifetime error for `min_len`. The reason for the [`Either`] wrapper is that `if` and `else` blocks must always have the same type. Since we return different future types in the blocks, we must use the wrapper type to unify them into a single type. The [`ready`] function wraps a value into a future that is immediately ready. The function is required here because the `Either` wrapper expects that the wrapped value implements `Future`.

[`move` keyword]: https://doc.rust-lang.org/std/keyword.move.html
[`Either`]: https://docs.rs/futures/0.3.4/futures/future/enum.Either.html
[`ready`]: https://docs.rs/futures/0.3.4/futures/future/fn.ready.html

As you can imagine, this can quickly lead to very complex code for larger projects. It gets especially complicated if borrowing and different lifetimes are involved. For this reason, a lot of work was invested in adding support for async/await to Rust, with the goal of making asynchronous code radically simpler to write.
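
To tie the busy-waiting loop and the `string_len` combinator from the sections above together, here is a minimal sketch that compiles with only the standard library. It rests on several assumptions that are not part of the post: `FakeFileRead` is a made-up stand-in for the future returned by `async_read_file`, the inner future is required to be `Unpin` so that we can sidestep pinning with `Pin::new`, and `busy_wait` uses a waker that does nothing, which is only acceptable because the loop polls again unconditionally.

```rust
use std::future::Future;
use std::pin::Pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Made-up stand-in for a file read: reports `Pending` for the first
/// `polls_left` polls and then yields its content.
struct FakeFileRead {
    polls_left: u32,
    content: Option<String>,
}

impl Future for FakeFileRead {
    type Output = String;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<String> {
        if self.polls_left > 0 {
            self.polls_left -= 1;
            return Poll::Pending;
        }
        Poll::Ready(self.content.take().expect("polled after completion"))
    }
}

fn async_read_file(_name: &str) -> FakeFileRead {
    FakeFileRead { polls_left: 3, content: Some(String::from("hello world")) }
}

struct StringLen<F> {
    inner_future: F,
}

impl<F> Future for StringLen<F>
where
    F: Future<Output = String> + Unpin,
{
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        // `F: Unpin` allows re-pinning the inner future through a plain `&mut`.
        match Pin::new(&mut self.inner_future).poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(
    string: impl Future<Output = String> + Unpin,
) -> impl Future<Output = usize> + Unpin {
    StringLen { inner_future: string }
}

/// Builds a `RawWaker` whose operations all do nothing.
fn noop_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(ptr::null(), &VTABLE)
}

/// Actively waits on a future by polling it in a loop (inefficient by design).
fn busy_wait<F: Future + Unpin>(mut future: F) -> F::Output {
    // Sound because none of the vtable functions touch the data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    loop {
        match Pin::new(&mut future).poll(&mut cx) {
            Poll::Ready(value) => break value,
            Poll::Pending => {} // do nothing, just poll again
        }
    }
}
```

Calling `busy_wait(string_len(async_read_file("foo.txt")))` polls the combined future until the fake read completes and then returns the length of its content. The `Unpin` bounds are a simplification for this sketch; a general-purpose combinator needs to handle pinning properly.
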
### The Async/Await Pattern

The idea behind async/await is to let the programmer write code that _looks_ like normal synchronous code, but is turned into asynchronous code by the compiler. It works based on the two keywords `async` and `await`. The `async` keyword can be used in a function signature to turn a synchronous function into an asynchronous function that returns a future:

```rust
async fn foo() -> u32 {
    0
}

// the above is roughly translated by the compiler to:
fn foo() -> impl Future<Output = u32> {
    future::ready(0)
}
```

This keyword alone wouldn't be that useful. However, inside `async` functions, the `await` keyword can be used to retrieve the asynchronous value of a future:

```rust
async fn example(min_len: usize) -> String {
    let content = async_read_file("foo.txt").await;
    if content.len() < min_len {
        content + &async_read_file("bar.txt").await
    } else {
        content
    }
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=d93c28509a1c67661f31ff820281d434))

This function is a direct translation of the `example` function from above that used combinator functions. Using the `.await` operator, we can retrieve the value of a future without needing any closures or `Either` types. As a result, we can write our code like we write normal synchronous code, with the difference that _this is still asynchronous code_.

#### State Machine Transformation

Behind the scenes, the compiler converts the body of the `async` function into a [_state machine_], with each `.await` call representing a different state. For the above `example` function, the compiler creates a state machine with the following four states:

[_state machine_]: https://en.wikipedia.org/wiki/Finite-state_machine

![Four states: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-states.svg)

Each state represents a different pause point in the function.
The _"Start"_ and _"End"_ states represent the function at the beginning and end of its execution. The _"Waiting on foo.txt"_ state represents that the function is currently waiting for the first `async_read_file` result. Similarly, the _"Waiting on bar.txt"_ state represents the pause point where the function is waiting on the second `async_read_file` result.

The state machine implements the `Future` trait by making each `poll` call a possible state transition:

![Four states and their transitions: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-basic.svg)

The diagram uses arrows to represent state switches and diamond shapes to represent alternative ways. For example, if the `foo.txt` file is not ready, the path marked with _"no"_ is taken and the _"Waiting on foo.txt"_ state is reached. Otherwise, the _"yes"_ path is taken. The small red diamond without a caption represents the `if content.len() < min_len` branch of the `example` function.

We see that the first `poll` call starts the function and lets it run until it reaches a future that is not ready yet. If all futures on the path are ready, the function can run until the _"End"_ state, where it returns its result wrapped in `Poll::Ready`. Otherwise, the state machine enters a waiting state and returns `Poll::Pending`. On the next `poll` call, the state machine then starts from the last waiting state and retries the last operation.

#### Saving State

In order to be able to continue from the last waiting state, the state machine must keep track of the current state internally. In addition, it must save all the variables that it needs to continue execution on the next `poll` call. This is where the compiler can really shine: Since it knows which variables are used when, it can automatically generate structs with exactly the variables that are needed.
As an example, the compiler generates structs like the following for the above `example` function:

```rust
// The `example` function again so that you don't have to scroll up
async fn example(min_len: usize) -> String {
    let content = async_read_file("foo.txt").await;
    if content.len() < min_len {
        content + &async_read_file("bar.txt").await
    } else {
        content
    }
}

// The compiler-generated state structs:

struct StartState {
    min_len: usize,
}

struct WaitingOnFooTxtState {
    min_len: usize,
    foo_txt_future: impl Future<Output = String>,
}

struct WaitingOnBarTxtState {
    content: String,
    bar_txt_future: impl Future<Output = String>,
}

struct EndState {}
```

In the _"Start"_ and _"Waiting on foo.txt"_ states, the `min_len` parameter needs to be stored for the later comparison with `content.len()`. The _"Waiting on foo.txt"_ state additionally stores a `foo_txt_future`, which represents the future returned by the `async_read_file` call. This future needs to be polled again when the state machine continues, so it needs to be saved.

The _"Waiting on bar.txt"_ state contains the `content` variable for the later string concatenation when `bar.txt` is ready. It also stores a `bar_txt_future` that represents the in-progress load of `bar.txt`. The struct does not contain the `min_len` variable because it is no longer needed after the `content.len()` comparison. In the _"End"_ state, no variables are stored because the function has already run to completion.

Keep in mind that this is only an example of the code that the compiler could generate. The struct names and the field layout are implementation details and might be different.

#### The Full State Machine Type

While the exact compiler-generated code is an implementation detail, it helps in understanding to imagine how the generated state machine _could_ look for the `example` function. We already defined the structs representing the different states and containing the required variables.
To create a state machine on top of them, we can combine them into an [`enum`]:

[`enum`]: https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html

```rust
enum ExampleStateMachine {
    Start(StartState),
    WaitingOnFooTxt(WaitingOnFooTxtState),
    WaitingOnBarTxt(WaitingOnBarTxtState),
    End(EndState),
}
```

We define a separate enum variant for each state and add the corresponding state struct to each variant as a field. To implement the state transitions, the compiler generates an implementation of the `Future` trait based on the `example` function:

```rust
impl Future for ExampleStateMachine {
    type Output = String; // return type of `example`

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        loop {
            match self { // TODO: handle pinning
                ExampleStateMachine::Start(state) => {…}
                ExampleStateMachine::WaitingOnFooTxt(state) => {…}
                ExampleStateMachine::WaitingOnBarTxt(state) => {…}
                ExampleStateMachine::End(state) => {…}
            }
        }
    }
}
```

The `Output` type of the future is `String` because it is the return type of the `example` function. To implement the `poll` function, we use a `match` expression on the current state inside a `loop`. The idea is that we switch to the next state as long as possible and use an explicit `return Poll::Pending` when we can't continue.

For simplicity, we only show simplified code and don't handle [pinning][_pinning_], ownership, lifetimes, etc. So this and the following code should be treated as pseudo-code and not used directly. Of course, the real compiler-generated code handles everything correctly, albeit possibly in a different way.

To keep the code excerpts small, we present the code for each `match` arm separately.
Let's begin with the `Start` state:

```rust
ExampleStateMachine::Start(state) => {
    // from body of `example`
    let foo_txt_future = async_read_file("foo.txt");
    // `.await` operation
    let state = WaitingOnFooTxtState {
        min_len: state.min_len,
        foo_txt_future,
    };
    *self = ExampleStateMachine::WaitingOnFooTxt(state);
}
```

The state machine is in the `Start` state when it is right at the beginning of the function. In this case, we execute all the code from the body of the `example` function until the first `.await`. To handle the `.await` operation, we change the state of the `self` state machine to `WaitingOnFooTxt`, which includes the construction of the `WaitingOnFooTxtState` struct.

Since the `match self {…}` expression is executed in a loop, the execution jumps to the `WaitingOnFooTxt` arm next:

```rust
ExampleStateMachine::WaitingOnFooTxt(state) => {
    match state.foo_txt_future.poll(cx) {
        Poll::Pending => return Poll::Pending,
        Poll::Ready(content) => {
            // from body of `example`
            if content.len() < state.min_len {
                let bar_txt_future = async_read_file("bar.txt");
                // `.await` operation
                let state = WaitingOnBarTxtState {
                    content,
                    bar_txt_future,
                };
                *self = ExampleStateMachine::WaitingOnBarTxt(state);
            } else {
                *self = ExampleStateMachine::End(EndState {});
                return Poll::Ready(content);
            }
        }
    }
}
```

In this `match` arm, we first call the `poll` function of the `foo_txt_future`. If it is not ready, we exit the loop and return `Poll::Pending`. Since `self` stays in the `WaitingOnFooTxt` state in this case, the next `poll` call on the state machine will enter the same `match` arm and retry polling the `foo_txt_future`.

When the `foo_txt_future` is ready, we assign the result to the `content` variable and continue to execute the code of the `example` function: If `content.len()` is smaller than the `min_len` saved in the state struct, the `bar.txt` file is read asynchronously. We again translate the `.await` operation into a state change, this time into the `WaitingOnBarTxt` state.
Since we're executing the `match` inside a loop, the execution directly jumps to the `match` arm for the new state afterward, where the `bar_txt_future` is polled.

In case we enter the `else` branch, no further `.await` operation occurs. We reach the end of the function and return `content` wrapped in `Poll::Ready`. We also change the current state to the `End` state.

The code for the `WaitingOnBarTxt` state looks like this:

```rust
ExampleStateMachine::WaitingOnBarTxt(state) => {
    match state.bar_txt_future.poll(cx) {
        Poll::Pending => return Poll::Pending,
        Poll::Ready(bar_txt) => {
            *self = ExampleStateMachine::End(EndState {});
            // from body of `example`
            return Poll::Ready(state.content + &bar_txt);
        }
    }
}
```

Similar to the `WaitingOnFooTxt` state, we start by polling the `bar_txt_future`. If it is still pending, we exit the loop and return `Poll::Pending`. Otherwise, we can perform the last operation of the `example` function: concatenating the `content` variable with the result from the future. We update the state machine to the `End` state and then return the result wrapped in `Poll::Ready`.

Finally, the code for the `End` state looks like this:

```rust
ExampleStateMachine::End(_) => {
    panic!("poll called after Poll::Ready was returned");
}
```

Futures should not be polled again after they returned `Poll::Ready`, so we panic if `poll` is called while we are already in the `End` state.

We now know what the compiler-generated state machine and its implementation of the `Future` trait _could_ look like. In practice, the compiler generates code in a different way. (In case you're interested, the implementation is currently based on [_coroutines_], but this is only an implementation detail.)

[_coroutines_]: https://doc.rust-lang.org/stable/unstable-book/language-features/coroutines.html

The last piece of the puzzle is the generated code for the `example` function itself.
Remember, the function header was defined like this:

```rust
async fn example(min_len: usize) -> String
```

Since the complete function body is now implemented by the state machine, the only thing that the function needs to do is to initialize the state machine and return it. The generated code for this could look like this:

```rust
fn example(min_len: usize) -> ExampleStateMachine {
    ExampleStateMachine::Start(StartState {
        min_len,
    })
}
```

The function no longer has an `async` modifier since it now explicitly returns an `ExampleStateMachine` type, which implements the `Future` trait. As expected, the state machine is constructed in the `Start` state and the corresponding state struct is initialized with the `min_len` parameter.

Note that this function does not start the execution of the state machine. This is a fundamental design decision of futures in Rust: they do nothing until they are polled for the first time.

### Pinning

We already stumbled across _pinning_ multiple times in this post. Now is finally the time to explore what pinning is and why it is needed.

#### Self-Referential Structs

As explained above, the state machine transformation stores the local variables of each pause point in a struct. For small examples like our `example` function, this was straightforward and did not lead to any problems. However, things become more difficult when variables reference each other. For example, consider this function:

```rust
async fn pin_example() -> i32 {
    let array = [1, 2, 3];
    let element = &array[2];
    async_write_file("foo.txt", element.to_string()).await;
    *element
}
```

This function creates a small `array` with the contents `1`, `2`, and `3`. It then creates a reference to the last array element and stores it in an `element` variable. Next, it asynchronously writes the number converted to a string to a `foo.txt` file. Finally, it returns the number referenced by `element`.
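
To see the `pin_example` function above in action on a normal host system, the following sketch polls it by hand. Several parts are assumptions made only for illustration: `async_write_file` is a stub that writes nothing and merely pauses once via the made-up `YieldOnce` future, and `run` is a tiny busy-polling helper with a do-nothing waker. The future is placed on the heap with `Box::pin` before the first poll, so it never moves while the reference in `element` is alive.

```rust
use std::future::Future;
use std::pin::Pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Made-up helper future that returns `Pending` exactly once.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            Poll::Pending
        }
    }
}

/// Stub for `async_write_file`: writes nothing, only pauses once.
async fn async_write_file(_name: &str, _content: String) {
    YieldOnce { yielded: false }.await
}

async fn pin_example() -> i32 {
    let array = [1, 2, 3];
    let element = &array[2];
    async_write_file("foo.txt", element.to_string()).await;
    *element
}

/// Builds a `RawWaker` whose operations all do nothing.
fn noop_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(ptr::null(), &VTABLE)
}

/// Polls `future` until it completes; returns the output and the poll count.
fn run<F: Future>(future: F) -> (F::Output, u32) {
    // Pin the future on the heap so that it cannot move between polls.
    let mut future = Box::pin(future);
    // Sound because none of the vtable functions touch the data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return (value, polls);
        }
    }
}
```

With this stub, `run(pin_example())` needs two polls: the first one stops at the `.await`, and the second one resumes after it and reads the value through `element`. Both `array` and `element` therefore have to survive inside the future between the two polls.
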
Since the function uses a single `await` operation, the resulting state machine has three states: start, end, and "waiting on write". The function takes no arguments, so the struct for the start state is empty. Like before, the struct for the end state is empty because the function is finished at this point. The struct for the "waiting on write" state is more interesting:

```rust
struct WaitingOnWriteState {
    array: [1, 2, 3],
    element: 0x1001c, // address of the last array element
}
```

We need to store both the `array` and `element` variables because `element` is required for the return value and `array` is referenced by `element`. Since `element` is a reference, it stores a _pointer_ (i.e., a memory address) to the referenced element. We used `0x1001c` as an example memory address here. In reality, it needs to be the address of the last element of the `array` field, so it depends on where the struct lives in memory. Structs with such internal pointers are called _self-referential structs_ because they reference themselves from one of their fields.

#### The Problem with Self-Referential Structs

The internal pointer of our self-referential struct leads to a fundamental problem, which becomes apparent when we look at its memory layout:

![array at 0x10014 with fields 1, 2, and 3; element at address 0x10020, pointing to the last array element at 0x1001c](self-referential-struct.svg)

The `array` field starts at address 0x10014 and the `element` field at address 0x10020. It points to address 0x1001c because the last array element lives at this address. At this point, everything is still fine.
However, an issue occurs when we move this struct to a different memory address:

![array at 0x10024 with fields 1, 2, and 3; element at address 0x10030, still pointing to 0x1001c, even though the last array element now lives at 0x1002c](self-referential-struct-moved.svg)

We moved the struct a bit so that it now starts at address `0x10024`. This could, for example, happen when we pass the struct as a function argument or assign it to a different stack variable. The problem is that the `element` field still points to address `0x1001c` even though the last `array` element now lives at address `0x1002c`. Thus, the pointer is dangling, with the result that undefined behavior occurs on the next `poll` call.

#### Possible Solutions

There are three fundamental approaches to solving the dangling pointer problem:

- **Update the pointer on move:** The idea is to update the internal pointer whenever the struct is moved in memory so that it is still valid after the move. Unfortunately, this approach would require extensive changes to Rust that would result in potentially huge performance losses. The reason is that some kind of runtime would need to keep track of the type of all struct fields and check on every move operation whether a pointer update is required.
- **Store an offset instead of self-references:** To avoid the requirement for updating pointers, the compiler could try to store self-references as offsets from the struct's beginning instead. For example, the `element` field of the above `WaitingOnWriteState` struct could be stored in the form of an `element_offset` field with a value of 8 because the array element that the reference points to starts 8 bytes after the struct's beginning. Since the offset stays the same when the struct is moved, no field updates are required. The problem with this approach is that it requires the compiler to detect all self-references.
  This is not possible at compile time because the value of a reference might depend on user input, so we would need a runtime system again to analyze references and correctly create the state structs. This would not only result in runtime costs but also prevent certain compiler optimizations, so that it would cause large performance losses again.
- **Forbid moving the struct:** As we saw above, the dangling pointer only occurs when we move the struct in memory. By completely forbidding move operations on self-referential structs, the problem can also be avoided. The big advantage of this approach is that it can be implemented at the type system level without additional runtime costs. The drawback is that it puts the burden of dealing with move operations on possibly self-referential structs on the programmer.

Rust chose the third solution because of its principle of providing _zero-cost abstractions_, which means that abstractions should not impose additional runtime costs. The [_pinning_] API was proposed for this purpose in [RFC 2349](https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md). In the following, we will give a short overview of this API and explain how it works with async/await and futures.

#### Heap Values

The first observation is that [heap-allocated][alocados em heap] values already have a fixed memory address most of the time. They are created using a call to `allocate` and then referenced by a pointer type such as `Box<T>`. While moving the pointer type is possible, the heap value that the pointer points to stays at the same memory address until it is freed through a `deallocate` call again.
[alocados em heap]: @/edition-2/posts/10-heap-allocation/index.md

Using heap allocation, we can try to create a self-referential struct:

```rust
fn main() {
    let mut heap_value = Box::new(SelfReferential {
        self_ptr: 0 as *const _,
    });
    let ptr = &*heap_value as *const SelfReferential;
    heap_value.self_ptr = ptr;
    println!("heap value at: {:p}", heap_value);
    println!("internal reference: {:p}", heap_value.self_ptr);
}

struct SelfReferential {
    self_ptr: *const Self,
}
```

([Try it on the playground][playground-self-ref])

[playground-self-ref]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=ce1aff3a37fcc1c8188eeaf0f39c97e8

We create a simple struct named `SelfReferential` that contains a single pointer field. First, we initialize this struct with a null pointer and then allocate it on the heap using `Box::new`. We then determine the memory address of the heap-allocated struct and store it in a `ptr` variable. Finally, we make the struct self-referential by assigning the `ptr` variable to the `self_ptr` field.

When we execute this code [on the playground][playground-self-ref], we see that the address of the heap value and its internal pointer are equal, which means that the `self_ptr` field is a valid self-reference. Since the `heap_value` variable is only a pointer, moving it (e.g., by passing it to a function) does not change the address of the struct itself, so the `self_ptr` stays valid even if the pointer is moved.
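The claim that moving the `Box` leaves the self-reference intact can be checked directly. The following is a small sketch for a hosted target (it uses `std`, unlike our kernel); the helper names `consume` and `self_ptr_survives_box_move` are made up for this illustration:

```rust
struct SelfReferential {
    self_ptr: *const Self,
}

/// Takes ownership of the box, i.e., the pointer is moved into this function.
fn consume(boxed: Box<SelfReferential>) -> bool {
    // address of the heap value as seen after the move
    let addr = &*boxed as *const SelfReferential;
    boxed.self_ptr == addr
}

fn self_ptr_survives_box_move() -> bool {
    let mut heap_value = Box::new(SelfReferential {
        self_ptr: std::ptr::null(),
    });
    let ptr = &*heap_value as *const SelfReferential;
    heap_value.self_ptr = ptr;
    // moving `heap_value` only moves the pointer, not the heap allocation
    consume(heap_value)
}

fn main() {
    assert!(self_ptr_survives_box_move());
    println!("self_ptr is still valid after moving the Box");
}
```

Only the `Box` itself (a single pointer) changes its location here; the struct it points to stays where `Box::new` placed it.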
However, there is still a way to break this example: We can move out of a `Box<T>` or replace its content:

```rust
let stack_value = mem::replace(&mut *heap_value, SelfReferential {
    self_ptr: 0 as *const _,
});
println!("value at: {:p}", &stack_value);
println!("internal reference: {:p}", stack_value.self_ptr);
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=e160ee8a64cba4cebc1c0473dcecb7c8))

Here we use the [`mem::replace`] function to replace the heap-allocated value with a new struct instance. This allows us to move the original `heap_value` to the stack, while the `self_ptr` field of the struct is now a dangling pointer that still points to the old heap address. When you try to run the example on the playground, you see that the printed _"value at:"_ and _"internal reference:"_ lines indeed show different pointers. So heap allocating a value is not enough to make self-references safe.

[`mem::replace`]: https://doc.rust-lang.org/nightly/core/mem/fn.replace.html

The fundamental problem that allowed the above breakage is that `Box<T>` allows us to get a `&mut T` reference to the heap-allocated value. This `&mut` reference makes it possible to use methods like [`mem::replace`] or [`mem::swap`] to invalidate the heap-allocated value. To resolve this problem, we must prevent `&mut` references to self-referential structs from being created.

[`mem::swap`]: https://doc.rust-lang.org/nightly/core/mem/fn.swap.html

#### `Pin<Box<T>>` and `Unpin`

The pinning API provides a solution to the `&mut T` problem in the form of the [`Pin`] wrapper type and the [`Unpin`] marker trait. The idea behind these types is to gate all methods of `Pin` that can be used to get `&mut` references to the wrapped value (e.g., [`get_mut`][pin-get-mut] or [`deref_mut`][pin-deref-mut]) on the `Unpin` trait.
The `Unpin` trait is an [_auto trait_], which is automatically implemented for all types except those that explicitly opt out. By making self-referential structs opt out of `Unpin`, there is no (safe) way to get a `&mut T` from a `Pin<Box<T>>` type for them. As a result, their internal self-references are guaranteed to stay valid.

[`Pin`]: https://doc.rust-lang.org/stable/core/pin/struct.Pin.html
[`Unpin`]: https://doc.rust-lang.org/nightly/std/marker/trait.Unpin.html
[pin-get-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_mut
[pin-deref-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.deref_mut
[_auto trait_]: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits

As an example, let's update the `SelfReferential` type from above to opt out of `Unpin`:

```rust
use core::marker::PhantomPinned;

struct SelfReferential {
    self_ptr: *const Self,
    _pin: PhantomPinned,
}
```

We opt out by adding a second `_pin` field of type [`PhantomPinned`]. This type is a zero-sized marker type whose only purpose is to _not_ implement the `Unpin` trait. Because of the way [auto traits][_auto trait_] work, a single field that is not `Unpin` suffices to make the complete struct opt out of `Unpin`.

[`PhantomPinned`]: https://doc.rust-lang.org/nightly/core/marker/struct.PhantomPinned.html

The second step is to change the `Box<SelfReferential>` type in the example to a `Pin<Box<SelfReferential>>` type. The easiest way to do this is to use the [`Box::pin`] function instead of [`Box::new`] for creating the heap-allocated value:

[`Box::pin`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.pin
[`Box::new`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.new

```rust
let mut heap_value = Box::pin(SelfReferential {
    self_ptr: 0 as *const _,
    _pin: PhantomPinned,
});
```

In addition to changing `Box::new` to `Box::pin`, we also need to add the new `_pin` field in the struct initializer.
Since `PhantomPinned` is a zero-sized type, we only need its type name to initialize it.

When we [try to run our adjusted example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=961b0db194bbe851ff4d0ed08d3bd98a) now, we see that it no longer works:

```
error[E0594]: cannot assign to data in dereference of `Pin<Box<SelfReferential>>`
  --> src/main.rs:10:5
   |
10 |     heap_value.self_ptr = ptr;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot assign
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin<Box<SelfReferential>>`

error[E0596]: cannot borrow data in dereference of `Pin<Box<SelfReferential>>` as mutable
  --> src/main.rs:16:36
   |
16 |     let stack_value = mem::replace(&mut *heap_value, SelfReferential {
   |                                    ^^^^^^^^^^^^^^^^ cannot borrow as mutable
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin<Box<SelfReferential>>`
```

Both errors occur because the `Pin<Box<SelfReferential>>` type no longer implements the `DerefMut` trait. This is exactly what we wanted because the `DerefMut` trait would return a `&mut` reference, which we wanted to prevent. This only happens because we both opted out of `Unpin` and changed `Box::new` to `Box::pin`.

The problem now is that the compiler does not only prevent moving the type in line 16, but also forbids initializing the `self_ptr` field in line 10. This happens because the compiler can't differentiate between valid and invalid uses of `&mut` references.
To get the initialization working again, we have to use the unsafe [`get_unchecked_mut`] method:

[`get_unchecked_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_unchecked_mut

```rust
// safe because modifying a field doesn't move the whole struct
unsafe {
    let mut_ref = Pin::as_mut(&mut heap_value);
    Pin::get_unchecked_mut(mut_ref).self_ptr = ptr;
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=b9ebbb11429d9d79b3f9fffe819e2018))

The [`get_unchecked_mut`] function works on a `Pin<&mut T>` instead of a `Pin<Box<T>>`, so we have to use [`Pin::as_mut`] for converting the value first. Then we can set the `self_ptr` field using the `&mut` reference returned by `get_unchecked_mut`.

[`Pin::as_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.as_mut

Now the only error left is the desired error on `mem::replace`. Remember, this operation tries to move the heap-allocated value to the stack, which would break the self-reference stored in the `self_ptr` field. By opting out of `Unpin` and using `Pin<Box<T>>`, we can prevent this operation at compile time and thus safely work with self-referential structs. As we saw, the compiler is not able to prove that the creation of the self-reference is safe (yet), so we need to use an unsafe block and verify the correctness ourselves.

#### Stack Pinning and `Pin<&mut T>`

In the previous section, we learned how to use `Pin<Box<T>>` to safely create a heap-allocated self-referential value. While this approach works fine and is relatively safe (apart from the unsafe construction), the required heap allocation comes with a performance cost. Since Rust strives to provide _zero-cost abstractions_ whenever possible, the pinning API also allows us to create `Pin<&mut T>` instances that point to stack-allocated values.
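As a rough illustration of what a stack-pinned value can look like, the sketch below pins the `SelfReferential` struct from the previous section with the standard library's `pin!` macro instead of `Box::pin`. It assumes a hosted target with a Rust version that provides `std::pin::pin!`; the function name `stack_pinned_is_self_referential` is made up for this example:

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

struct SelfReferential {
    self_ptr: *const Self,
    _pin: PhantomPinned,
}

fn stack_pinned_is_self_referential() -> bool {
    // `pin!` stores the value in the current stack frame and hands out
    // only a `Pin<&mut T>`, so safe code can no longer move it
    let mut value: Pin<&mut SelfReferential> = pin!(SelfReferential {
        self_ptr: std::ptr::null(),
        _pin: PhantomPinned,
    });
    let ptr = &*value as *const SelfReferential;
    // safe because modifying a field doesn't move the whole struct
    unsafe {
        value.as_mut().get_unchecked_mut().self_ptr = ptr;
    }
    value.self_ptr == &*value as *const SelfReferential
}

fn main() {
    assert!(stack_pinned_is_self_referential());
}
```

No heap allocation happens here, but the pinned value cannot outlive the function that created it, which is one reason why stack pinning is harder to use for long-lived futures.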
Unlike `Pin<Box<T>>` instances, which have _ownership_ of the wrapped value, `Pin<&mut T>` instances only temporarily borrow the wrapped value. This makes things more complicated, as it requires the programmer to ensure additional guarantees themselves. Most importantly, a `Pin<&mut T>` must stay pinned for the whole lifetime of the referenced `T`, which can be difficult to verify for stack-based variables. To help with this, crates like [`pin-utils`] exist, but I still wouldn't recommend pinning to the stack unless you really know what you're doing.

[`pin-utils`]: https://docs.rs/pin-utils/0.1.0-alpha.4/pin_utils/

For further reading, check out the documentation of the [`pin` module] and the [`Pin::new_unchecked`] method.

[`pin` module]: https://doc.rust-lang.org/nightly/core/pin/index.html
[`Pin::new_unchecked`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.new_unchecked

#### Pinning and Futures

As we already saw in this post, the [`Future::poll`] method uses pinning in the form of a `Pin<&mut Self>` parameter:

[`Future::poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll

```rust
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>
```

The reason that this method takes `self: Pin<&mut Self>` instead of the normal `&mut self` is that future instances created from async/await are often self-referential, as we saw [above][self-ref-async-await]. By wrapping `Self` into `Pin` and letting the compiler opt out of `Unpin` for self-referential futures generated from async/await, it is guaranteed that the futures are not moved in memory between `poll` calls. This ensures that all internal references are still valid.

[self-ref-async-await]: @/edition-2/posts/12-async-await/index.pt-BR.md#o-problema-com-structs-auto-referenciais

It is worth noting that moving futures before the first `poll` call is fine.
This is a result of the fact that futures are lazy and do nothing until they're polled for the first time. The `start` state of the generated state machines therefore only contains the function arguments but no internal references. In order to call `poll`, the caller must wrap the future into `Pin` first, which ensures that the future cannot be moved in memory anymore. Since stack pinning is more difficult to get right, I recommend always using [`Box::pin`] combined with [`Pin::as_mut`] for this.

[`futures`]: https://docs.rs/futures/0.3.4/futures/

In case you're interested in understanding how to safely implement a future combinator function using stack pinning yourself, take a look at the relatively short [source of the `map` combinator method][map-src] of the `futures` crate and the section about [projections and structural pinning] of the pin documentation.

[map-src]: https://docs.rs/futures-util/0.3.4/src/futures_util/future/future/map.rs.html
[projections and structural pinning]: https://doc.rust-lang.org/stable/std/pin/index.html#projections-and-structural-pinning

### Executors and Wakers

Using async/await, it is possible to ergonomically work with futures in a completely asynchronous way. However, as we learned above, futures do nothing until they are polled. This means we have to call `poll` on them at some point, otherwise the asynchronous code is never executed.

With a single future, we can always wait for it manually using a loop [as described above](#esperando-por-futures). However, this approach is very inefficient and not practical for programs that create a large number of futures. The most common solution to this problem is to define a global _executor_ that is responsible for polling all futures in the system until they are finished.

#### Executors

The purpose of an executor is to allow spawning futures as independent tasks, typically through some sort of `spawn` method.
The executor is then responsible for polling all futures until they are completed. The big advantage of managing all futures in a central place is that the executor can switch to a different future whenever a future returns `Poll::Pending`. Thus, asynchronous operations are run in parallel and the CPU is kept busy.

Many executor implementations can also take advantage of systems with multiple CPU cores. They create a [thread pool] that is able to utilize all cores if there is enough work available and use techniques such as [work stealing] to balance the load between cores. There are also special executor implementations for embedded systems that optimize for low latency and memory overhead.

[thread pool]: https://en.wikipedia.org/wiki/Thread_pool
[work stealing]: https://en.wikipedia.org/wiki/Work_stealing

To avoid the overhead of polling futures repeatedly, executors typically take advantage of the _waker_ API supported by Rust's futures.

#### Wakers

The idea behind the waker API is that a special [`Waker`] type is passed to each invocation of `poll`, wrapped in the [`Context`] type. This `Waker` type is created by the executor and can be used by the asynchronous task to signal its (partial) completion. As a result, the executor does not need to call `poll` on a future that previously returned `Poll::Pending` until it is notified by the corresponding waker.

[`Context`]: https://doc.rust-lang.org/nightly/core/task/struct.Context.html

This is best illustrated by a small example:

```rust
async fn write_file() {
    async_write_file("foo.txt", "Hello").await;
}
```

This function asynchronously writes the string "Hello" to a `foo.txt` file. Since hard disk writes take some time, the first `poll` call on this future will likely return `Poll::Pending`.
However, the hard disk driver will internally store the `Waker` passed to the `poll` call and use it to notify the executor when the file is written to disk. This way, the executor does not need to waste any time trying to `poll` the future again before it receives the waker notification.

We will see how the `Waker` type works in detail when we create our own executor with waker support in the implementation section of this post.

### Cooperative Multitasking?

At the beginning of this post, we talked about preemptive and cooperative multitasking. While preemptive multitasking relies on the operating system to forcibly pause running tasks, cooperative multitasking requires that the tasks voluntarily give up control of the CPU through a _yield_ operation on a regular basis. The big advantage of the cooperative approach is that tasks can save their state themselves, which results in more efficient context switches and makes it possible to share the same call stack between tasks.

It might not be immediately apparent, but futures and async/await are an implementation of the cooperative multitasking pattern:

- Each future that is added to the executor is basically a cooperative task.
- Instead of using an explicit yield operation, futures give up control of the CPU core by returning `Poll::Pending` (or `Poll::Ready` at the end).
    - There is nothing that forces futures to give up the CPU. If they want, they can never return from `poll`, e.g., by spinning endlessly in a loop.
    - Since each future can block the execution of the other futures in the executor, we need to trust them to not be malicious.
- Futures internally store all the state they need to continue execution on the next `poll` call. With async/await, the compiler automatically detects all variables that are needed and stores them inside the generated state machine.
    - Only the minimum state required for continuation is saved.
    - Since the `poll` method gives up the call stack when it returns, the same stack can be used for polling other futures.

We see that futures and async/await fit the cooperative multitasking pattern perfectly; they just use some different terminology. In the following, we will therefore use the terms "task" and "future" interchangeably.

## Implementation

Now that we understand how cooperative multitasking based on futures and async/await works in Rust, it's time to add support for it to our kernel. Since the [`Future`] trait is part of the `core` library and async/await is a feature of the language itself, there is nothing special we need to do to use it in our `#![no_std]` kernel. The only requirement is that we use at least nightly `2020-03-25` of Rust because async/await was not `no_std` compatible before.

With a recent-enough nightly, we can start using async/await in our `main.rs`:

```rust
// in src/main.rs

async fn async_number() -> u32 {
    42
}

async fn example_task() {
    let number = async_number().await;
    println!("async number: {}", number);
}
```

The `async_number` function is an `async fn`, so the compiler transforms it into a state machine that implements `Future`. Since the function only returns `42`, the resulting future will directly return `Poll::Ready(42)` on the first `poll` call. Like `async_number`, the `example_task` function is also an `async fn`. It awaits the number returned by `async_number` and then prints it using the `println` macro.

To run the future returned by `example_task`, we need to call `poll` on it until it signals its completion by returning `Poll::Ready`. To do this, we need to create a simple executor type.
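Before building the executor, it can help to see what "calling `poll` until it returns `Poll::Ready`" looks like in isolation. The following is only a sketch for experimenting on a hosted target with `std`, not kernel code. The `YieldOnce` future, the `NoopWake` waker, and the `poll_until_ready` function are hypothetical helpers introduced for this illustration, and `async_number` is modified to yield once so that the loop has something to wait for:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that ignores all notifications.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// A future that gives up the CPU once by returning `Poll::Pending`
/// on its first poll and `Poll::Ready` on every later poll.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            // tell the (here: no-op) waker that we want to be polled again
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

async fn async_number() -> u32 {
    YieldOnce { yielded: false }.await;
    42
}

/// Polls the future in a busy loop and returns its result together
/// with the number of `poll` calls that were needed.
fn poll_until_ready() -> (u32, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut context = Context::from_waker(&waker);
    let mut future = Box::pin(async_number());
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(number) = future.as_mut().poll(&mut context) {
            return (number, polls);
        }
    }
}

fn main() {
    let (number, polls) = poll_until_ready();
    println!("async number: {} after {} polls", number, polls);
}
```

The first `poll` call stops at the `.await` because `YieldOnce` returns `Poll::Pending`; the second call resumes the state machine and runs it to completion.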
### Task

Before we start the executor implementation, we create a new `task` module with a `Task` type:

```rust
// in src/lib.rs

pub mod task;
```

```rust
// in src/task/mod.rs

use core::{future::Future, pin::Pin};
use alloc::boxed::Box;

pub struct Task {
    future: Pin<Box<dyn Future<Output = ()>>>,
}
```

The `Task` struct is a newtype wrapper around a pinned, heap-allocated, and dynamically dispatched future with the empty type `()` as output. Let's go through it in detail:

- We require that the future associated with a task returns `()`. This means that tasks don't return any result, they are just executed for their side effects. For example, the `example_task` function we defined above has no return value, but it prints something to the screen as a side effect.
- The `dyn` keyword indicates that we store a [_trait object_] in the `Box`. This means that the methods on the future are [_dynamically dispatched_][_dinamicamente despachados_], allowing different types of futures to be stored in the `Task` type. This is important because each `async fn` has its own type and we want to be able to create multiple different tasks.
- As we learned in the [section about pinning][seção sobre pinning], the `Pin<Box<T>>` type ensures that a value cannot be moved in memory by placing it on the heap and preventing the creation of `&mut` references to it. This is important because futures generated by async/await might be self-referential, i.e., contain pointers to themselves that would be invalidated when the future is moved.
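To make the point about dynamic dispatch more concrete, here is a minimal sketch for a hosted target with `std`. It shows that the futures of two different `async fn`s, which have two distinct compiler-generated types, can share one collection once they are boxed as trait objects. The names `task_a`, `task_b`, `BoxedTask`, and `make_tasks` are made up for this example:

```rust
use std::future::Future;
use std::pin::Pin;

async fn task_a() {}

async fn task_b() {
    let _sum = 1 + 1;
}

/// The same shape as the `future` field of our `Task` struct.
type BoxedTask = Pin<Box<dyn Future<Output = ()>>>;

fn make_tasks() -> Vec<BoxedTask> {
    let mut tasks: Vec<BoxedTask> = Vec::new();
    // each `Box::pin` call creates a `Pin<Box<ConcreteFuture>>`, which is
    // then coerced to the `dyn Future` trait object type of the vector
    tasks.push(Box::pin(task_a()));
    tasks.push(Box::pin(task_b()));
    tasks
}

fn main() {
    println!("stored {} tasks of different types", make_tasks().len());
}
```

Without the `dyn` indirection, a `Vec` could only hold futures of a single concrete type.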
[_trait object_]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html
[_dinamicamente despachados_]: https://doc.rust-lang.org/book/ch18-02-trait-objects.html#trait-objects-perform-dynamic-dispatch
[seção sobre pinning]: #pinning

To allow the creation of new `Task` structs from futures, we create a `new` function:

```rust
// in src/task/mod.rs

impl Task {
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task {
            future: Box::pin(future),
        }
    }
}
```

The function takes an arbitrary future with an output type of `()` and pins it in memory through the [`Box::pin`] function. Then it wraps the boxed future in the `Task` struct and returns it. The `'static` lifetime is required here because the returned `Task` can live for an arbitrary time, so the future needs to be valid for that time too.

We also add a `poll` method to allow the executor to poll the stored future:

```rust
// in src/task/mod.rs

use core::task::{Context, Poll};

impl Task {
    fn poll(&mut self, context: &mut Context) -> Poll<()> {
        self.future.as_mut().poll(context)
    }
}
```

Since the [`poll`] method of the `Future` trait expects to be called on a `Pin<&mut T>` type, we use the [`Pin::as_mut`] method to convert the `self.future` field of type `Pin<Box<T>>` first. Then we call `poll` on the converted `self.future` field and return the result. Since the `Task::poll` method should only be called by the executor that we'll create in a moment, we keep the function private to the `task` module.

### Simple Executor

Since executors can be quite complex, we deliberately start by creating a very basic executor before implementing a more featureful executor later.
For this, we first create a new `task::simple_executor` submodule:

```rust
// in src/task/mod.rs

pub mod simple_executor;
```

```rust
// in src/task/simple_executor.rs

use super::Task;
use alloc::collections::VecDeque;

pub struct SimpleExecutor {
    task_queue: VecDeque<Task>,
}

impl SimpleExecutor {
    pub fn new() -> SimpleExecutor {
        SimpleExecutor {
            task_queue: VecDeque::new(),
        }
    }

    pub fn spawn(&mut self, task: Task) {
        self.task_queue.push_back(task)
    }
}
```

The struct contains a single `task_queue` field of type [`VecDeque`], which is basically a vector that allows for push and pop operations on both ends. The idea behind using this type is that we insert new tasks through the `spawn` method at the end and pop the next task for execution from the front. This way, we get a simple [FIFO queue] (_"first in, first out"_).

[`VecDeque`]: https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html
[FIFO queue]: https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)

#### Dummy Waker

In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing. For this, we create a [`RawWaker`] instance, which defines the implementation of the different `Waker` methods, and then use the [`Waker::from_raw`] function to turn it into a `Waker`:

[`RawWaker`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html
[`Waker::from_raw`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.from_raw

```rust
// in src/task/simple_executor.rs

use core::task::{Waker, RawWaker};

fn dummy_raw_waker() -> RawWaker {
    todo!();
}

fn dummy_waker() -> Waker {
    unsafe { Waker::from_raw(dummy_raw_waker()) }
}
```

The `from_raw` function is unsafe because undefined behavior can occur if the programmer does not uphold the documented requirements of `RawWaker`.
Before we look at the implementation of the `dummy_raw_waker` function, we first try to understand how the `RawWaker` type works.

##### `RawWaker`

The [`RawWaker`] type requires the programmer to explicitly define a [_virtual method table_] (_vtable_) that specifies the functions that should be called when the `RawWaker` is cloned, woken, or dropped. The layout of this vtable is defined by the [`RawWakerVTable`] type. Each function receives a `*const ()` argument, which is a _type-erased_ pointer to some value. The reason for using a `*const ()` pointer instead of a proper reference is that the `RawWaker` type should be non-generic but still support arbitrary types. The pointer is provided by putting it into the `data` argument of [`RawWaker::new`], which just initializes a `RawWaker`. The `Waker` then uses this `RawWaker` to call the vtable functions with `data`.

[_virtual method table_]: https://en.wikipedia.org/wiki/Virtual_method_table
[`RawWakerVTable`]: https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html
[`RawWaker::new`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html#method.new

Typically, the `RawWaker` is created for some heap-allocated struct that is wrapped into the [`Box`] or [`Arc`] type. For such types, methods like [`Box::into_raw`] can be used to convert the `Box<T>` to a `*const T` pointer. This pointer can then be cast to an anonymous `*const ()` pointer and passed to `RawWaker::new`. Since each vtable function receives the same `*const ()` as an argument, the functions can safely cast the pointer back to a `Box<T>` or a `&T` to operate on it. As you can imagine, this process is highly dangerous and can easily lead to undefined behavior on mistakes. For this reason, manually creating a `RawWaker` is not recommended unless necessary.
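To illustrate the pointer round trip described above, the following sketch builds a `Waker` on top of an `Arc<AtomicUsize>` that simply counts how often it was woken. It is written for a hosted target with `std` and is not part of our kernel; the names `VTABLE`, `counting_waker`, `drop_raw`, and `wake_count_after_two_wakes` are made up for this illustration. Each vtable function casts the type-erased `*const ()` back to the original pointer type and has to keep the reference count of the `Arc` balanced:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{RawWaker, RawWakerVTable, Waker};

static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake_by_ref, drop_raw);

unsafe fn clone(data: *const ()) -> RawWaker {
    // the cloned waker needs its own reference to the counter
    unsafe { Arc::increment_strong_count(data as *const AtomicUsize) };
    RawWaker::new(data, &VTABLE)
}

unsafe fn wake(data: *const ()) {
    // `wake` consumes the waker, so we take back ownership of the `Arc`
    // and let it drop at the end of this function
    let counter = unsafe { Arc::from_raw(data as *const AtomicUsize) };
    counter.fetch_add(1, Ordering::SeqCst);
}

unsafe fn wake_by_ref(data: *const ()) {
    // only borrow the counter; the waker keeps its reference
    let counter = unsafe { &*(data as *const AtomicUsize) };
    counter.fetch_add(1, Ordering::SeqCst);
}

unsafe fn drop_raw(data: *const ()) {
    // release the reference that was handed out by `Arc::into_raw`
    drop(unsafe { Arc::from_raw(data as *const AtomicUsize) });
}

fn counting_waker(counter: Arc<AtomicUsize>) -> Waker {
    // erase the type: `*const AtomicUsize` becomes `*const ()`
    let data = Arc::into_raw(counter) as *const ();
    unsafe { Waker::from_raw(RawWaker::new(data, &VTABLE)) }
}

fn wake_count_after_two_wakes() -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let waker = counting_waker(counter.clone());
    waker.wake_by_ref(); // calls `wake_by_ref` from the vtable
    waker.clone().wake(); // calls `clone`, then `wake` on the clone
    counter.load(Ordering::SeqCst)
}

fn main() {
    println!("woken {} times", wake_count_after_two_wakes());
}
```

Forgetting the `increment_strong_count` call in `clone`, or reconstructing the `Arc` in `wake_by_ref`, would lead to a double free, which shows how easy it is to get a hand-written `RawWaker` wrong.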
[`Box`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html
[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html
[`Box::into_raw`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html#method.into_raw

##### A Dummy `RawWaker`

While manually creating a `RawWaker` is not recommended, there is currently no other way to create a dummy `Waker` that does nothing. Fortunately, the fact that we want to do nothing makes it relatively safe to implement the `dummy_raw_waker` function:

```rust
// in src/task/simple_executor.rs

use core::task::RawWakerVTable;

fn dummy_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        dummy_raw_waker()
    }

    let vtable = &RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(0 as *const (), vtable)
}
```

First, we define two inner functions named `no_op` and `clone`. The `no_op` function takes a `*const ()` pointer and does nothing. The `clone` function also takes a `*const ()` pointer and returns a new `RawWaker` by calling `dummy_raw_waker` again. We use these two functions to create a minimal `RawWakerVTable`: The `clone` function is used for the cloning operations, and the `no_op` function is used for all other operations. Since the `RawWaker` does nothing, it does not matter that we return a new `RawWaker` from `clone` instead of cloning it.

After creating the `vtable`, we use the [`RawWaker::new`] function to create the `RawWaker`. The passed `*const ()` does not matter since none of the vtable functions use it. For this reason, we simply pass a null pointer.

#### A `run` Method

Now that we have a way to create a `Waker` instance, we can use it to implement a `run` method on our executor. The most simple `run` method is to repeatedly poll all queued tasks in a loop until all are done.
This is not very efficient since it does not utilize the notifications of the `Waker` type, but it is an easy way to get things running:

```rust
// in src/task/simple_executor.rs

use core::task::{Context, Poll};

impl SimpleExecutor {
    pub fn run(&mut self) {
        while let Some(mut task) = self.task_queue.pop_front() {
            let waker = dummy_waker();
            let mut context = Context::from_waker(&waker);
            match task.poll(&mut context) {
                Poll::Ready(()) => {} // task done
                Poll::Pending => self.task_queue.push_back(task),
            }
        }
    }
}
```

The function uses a `while let` loop to handle all tasks in the `task_queue`. For each task, it first creates a `Context` type by wrapping a `Waker` instance returned by our `dummy_waker` function. Then it invokes the `Task::poll` method with this `context`. If the `poll` method returns `Poll::Ready`, the task is finished and we can continue with the next task. If the task is still `Poll::Pending`, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration.

#### Trying It

With our `SimpleExecutor` type, we can now try running the task returned by the `example_task` function in our `main.rs`:

```rust
// in src/main.rs

use blog_os::task::{Task, simple_executor::SimpleExecutor};

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // […] initialization routines, including init_heap, test_main

    let mut executor = SimpleExecutor::new();
    executor.spawn(Task::new(example_task()));
    executor.run();

    // […] "it did not crash" message, hlt_loop
}

// Below is the example_task function again so that you don't have to scroll up

async fn async_number() -> u32 {
    42
}

async fn example_task() {
    let number = async_number().await;
    println!("async number: {}", number);
}
```

When we run it, we see that the expected _"async number: 42"_ message is printed to the screen:

![QEMU printing "Hello World", "async number: 42", and "It did not crash!"](qemu-simple-executor.png)

Let's summarize the various steps that happen in this example:

- First, a new instance of our `SimpleExecutor` type is created with an empty `task_queue`.
- Next, we call the asynchronous `example_task` function, which returns a future. We wrap this future in the `Task` type, which moves it to the heap and pins it, and then add the task to the `task_queue` of the executor through the `spawn` method.
- We then call the `run` method to start the execution of the single task in the queue. This involves:
    - Popping the task from the front of the `task_queue`.
    - Creating a `RawWaker` for the task, converting it to a [`Waker`] instance, and then creating a [`Context`] instance from it.
    - Calling the [`poll`] method on the future of the task, using the `Context` we just created.
    - Since the `example_task` does not wait for anything, it can directly run until its end on the first `poll` call. This is where the _"async number: 42"_ line is printed.
    - Since the `example_task` directly returns `Poll::Ready`, it is not added back to the task queue.
- The `run` method returns after the `task_queue` becomes empty. The execution of our `kernel_main` function continues and the _"It did not crash!"_ message is printed.

### Async Keyboard Input

Our simple executor does not utilize the `Waker` notifications and simply loops over all tasks until they are done.
This wasn't a problem for our example since our `example_task` can directly run to completion on the first `poll` call. To see the performance advantages of a proper `Waker` implementation, we first need to create a task that is truly asynchronous, i.e., a task that will probably return `Poll::Pending` on the first `poll` call.

We already have some kind of asynchronicity in our system that we can use for this: hardware interrupts. As we learned in the [_Interrupts_] post, hardware interrupts can occur at arbitrary points in time, determined by some external device. For example, a hardware timer sends an interrupt to the CPU after some predefined time has elapsed. When the CPU receives an interrupt, it immediately transfers control to the corresponding handler function defined in the interrupt descriptor table (IDT).

[_Interrupts_]: @/edition-2/posts/07-hardware-interrupts/index.md

In the following, we will create an asynchronous task based on the keyboard interrupt. The keyboard interrupt is a good candidate for this because it is both non-deterministic and latency-critical. Non-deterministic means that there is no way to predict when the next key will be pressed because it depends entirely on the user. Latency-critical means that we want to handle the keyboard input in a timely manner, otherwise the user will feel a lag. To support such a task in an efficient way, it will be essential that the executor has proper support for `Waker` notifications.

#### Scancode Queue

Currently, we handle the keyboard input directly in the interrupt handler. This is not a good idea for the long term because interrupt handlers should stay as short as possible as they might interrupt important work.
Instead, interrupt handlers should only perform the minimal amount of work necessary (e.g., reading the keyboard scancode) and leave the rest of the work (e.g., interpreting the scancode) to a background task.

A common pattern for delegating work to a background task is to create some sort of queue. The interrupt handler pushes units of work to the queue, and the background task handles the work in the queue. Applied to our keyboard interrupt, this means that the interrupt handler only reads the scancode from the keyboard, pushes it to the queue, and then returns. The keyboard task sits on the other end of the queue and interprets and handles each scancode that is pushed to it:

![Scancode queue with 8 slots on the top. Keyboard interrupt handler on the bottom left with a "push scancode" arrow to the left of the queue. Keyboard task on the bottom right with a "pop scancode" arrow coming from the right side of the queue.](scancode-queue.svg)

A simple implementation of that queue could be a mutex-protected [`VecDeque`]. However, using mutexes in interrupt handlers is not a good idea since it can easily lead to deadlocks. For example, when the user presses a key while the keyboard task has locked the queue, the interrupt handler tries to acquire the lock again and hangs indefinitely. Another problem with this approach is that `VecDeque` automatically increases its capacity by performing a new heap allocation when it becomes full. This can lead to deadlocks again because our allocator also uses a mutex internally. Further problems are that heap allocations can fail or take a considerable amount of time when the heap is fragmented.

To prevent these problems, we need a queue implementation that does not require mutexes or allocations for its `push` operation.
Such queues can be implemented by using lock-free [atomic operations] for pushing and popping elements. This way, it is possible to create `push` and `pop` operations that only require a `&self` reference and are thus usable without a mutex. To avoid allocations on `push`, the queue can be backed by a pre-allocated fixed-size buffer. While this makes the queue _bounded_ (i.e., it has a maximum length), it is often possible to define reasonable upper bounds for the queue length in practice, so this isn't a big problem.

[atomic operations]: https://doc.rust-lang.org/core/sync/atomic/index.html

##### The `crossbeam` Crate

Implementing such a queue in a correct and efficient way is very difficult, so I recommend sticking to existing, well-tested implementations. One popular Rust project that implements various mutex-free types for concurrent programming is [`crossbeam`]. It provides a type named [`ArrayQueue`] that is exactly what we need in this case. And we're lucky: the type is fully compatible with `no_std` crates with allocation support.

[`crossbeam`]: https://github.com/crossbeam-rs/crossbeam
[`ArrayQueue`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html

To use the type, we need to add a dependency on the `crossbeam-queue` crate:

```toml
# in Cargo.toml

[dependencies.crossbeam-queue]
version = "0.3.11"
default-features = false
features = ["alloc"]
```

By default, the crate depends on the standard library. To make it `no_std` compatible, we need to disable its default features and instead enable the `alloc` feature. (Note that we could also add a dependency on the main `crossbeam` crate, which re-exports the `crossbeam-queue` crate, but this would result in a larger number of dependencies and longer compile times.)
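To illustrate how `push` and `pop` can work with only a `&self` reference and a fixed-size buffer, here is a deliberately simplified sketch. It assumes exactly one producer and one consumer, which is a much weaker guarantee than what `ArrayQueue` provides, and it is not the algorithm that `crossbeam` uses. The `SpscQueue` name is hypothetical.

```rust
use std::array;
use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering};

/// Hypothetical bounded queue for exactly one producer and one consumer.
pub struct SpscQueue<const N: usize> {
    slots: [AtomicU8; N],
    /// Total number of pops so far; only written by the consumer.
    head: AtomicUsize,
    /// Total number of pushes so far; only written by the producer.
    tail: AtomicUsize,
}

impl<const N: usize> SpscQueue<N> {
    pub fn new() -> Self {
        SpscQueue {
            slots: array::from_fn(|_| AtomicU8::new(0)),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side: never blocks and never allocates.
    pub fn push(&self, value: u8) -> Result<(), u8> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == N {
            return Err(value); // queue is full, hand the value back
        }
        self.slots[tail % N].store(value, Ordering::Relaxed);
        // Publish the filled slot to the consumer.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        Ok(())
    }

    /// Consumer side: returns `None` if the queue is empty.
    pub fn pop(&self) -> Option<u8> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let value = self.slots[head % N].load(Ordering::Relaxed);
        // Release the slot so that the producer can reuse it.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}
```

Because every field is an atomic, the type is `Sync` without any `unsafe` code. Supporting multiple producers or consumers requires considerably more care, which is one reason to prefer a well-tested implementation.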
##### Queue Implementation

Using the `ArrayQueue` type, we can now create a global scancode queue in a new `task::keyboard` module:

```rust
// in src/task/mod.rs

pub mod keyboard;
```

```rust
// in src/task/keyboard.rs

use conquer_once::spin::OnceCell;
use crossbeam_queue::ArrayQueue;

static SCANCODE_QUEUE: OnceCell<ArrayQueue<u8>> = OnceCell::uninit();
```

Since [`ArrayQueue::new`] performs a heap allocation, which is not possible at compile time ([yet][const-heap-alloc]), we can't initialize the static variable directly. Instead, we use the [`OnceCell`] type of the [`conquer_once`] crate, which makes it possible to perform a safe one-time initialization of static values. To include the crate, we need to add it as a dependency in our `Cargo.toml`:

[`ArrayQueue::new`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.new
[const-heap-alloc]: https://github.com/rust-lang/const-eval/issues/20
[`OnceCell`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html
[`conquer_once`]: https://docs.rs/conquer-once/0.2.0/conquer_once/index.html

```toml
# in Cargo.toml

[dependencies.conquer-once]
version = "0.2.0"
default-features = false
```

Instead of the [`OnceCell`] primitive, we could also use the [`lazy_static`] macro here. However, the `OnceCell` type has the advantage that we can ensure that the initialization does not happen in the interrupt handler, thus preventing the interrupt handler from performing a heap allocation.

[`lazy_static`]: https://docs.rs/lazy_static/1.4.0/lazy_static/index.html

#### Filling the Queue

To fill the scancode queue, we create a new `add_scancode` function that we will call from the interrupt handler:

```rust
// in src/task/keyboard.rs

use crate::println;

/// Called by the keyboard interrupt handler
///
/// Must not block or allocate.
pub(crate) fn add_scancode(scancode: u8) {
    if let Ok(queue) = SCANCODE_QUEUE.try_get() {
        if let Err(_) = queue.push(scancode) {
            println!("WARNING: scancode queue full; dropping keyboard input");
        }
    } else {
        println!("WARNING: scancode queue uninitialized");
    }
}
```

We use [`OnceCell::try_get`] to get a reference to the initialized queue. If the queue is not initialized yet, we ignore the keyboard scancode and print a warning. It's important that we don't try to initialize the queue in this function because it will be called by the interrupt handler, which should not perform heap allocations. Since this function should not be callable from our `main.rs`, we use the `pub(crate)` visibility to make it only available to our `lib.rs`.

[`OnceCell::try_get`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get

The fact that the [`ArrayQueue::push`] method requires only a `&self` reference makes it very simple to call the method on the static queue. The `ArrayQueue` type performs all the necessary synchronization itself, so we don't need a mutex wrapper here. In case the queue is full, we print a warning too.

[`ArrayQueue::push`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.push

To call the `add_scancode` function on keyboard interrupts, we update our `keyboard_interrupt_handler` function in the `interrupts` module:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame
) {
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };
    crate::task::keyboard::add_scancode(scancode); // new

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

We removed all the keyboard handling code from this function and instead added a call to the `add_scancode` function. The rest of the function stays the same as before.
As expected, keypresses are no longer printed to the screen when we run our project using `cargo run` now. Instead, we see the warning that the scancode queue is uninitialized for every keystroke.

#### Scancode Stream

To initialize the `SCANCODE_QUEUE` and read the scancodes from the queue in an asynchronous way, we create a new `ScancodeStream` type:

```rust
// in src/task/keyboard.rs

pub struct ScancodeStream {
    _private: (),
}

impl ScancodeStream {
    pub fn new() -> Self {
        SCANCODE_QUEUE.try_init_once(|| ArrayQueue::new(100))
            .expect("ScancodeStream::new should only be called once");
        ScancodeStream { _private: () }
    }
}
```

The purpose of the `_private` field is to prevent construction of the struct from outside of the module. This makes the `new` function the only way to construct the type. In the function, we first try to initialize the `SCANCODE_QUEUE` static. We panic if it is already initialized to ensure that only a single `ScancodeStream` instance can be created.

To make the scancodes available to asynchronous tasks, the next step is to implement a `poll`-like method that tries to pop the next scancode off the queue. While this sounds like we should implement the [`Future`] trait for our type, this does not quite fit here. The problem is that the `Future` trait only abstracts over a single asynchronous value and expects that the `poll` method is not called again after it returns `Poll::Ready`. Our scancode queue, however, contains multiple asynchronous values, so it is okay to keep polling it.

##### The `Stream` Trait

Since types that yield multiple asynchronous values are common, the [`futures`] crate provides a useful abstraction for such types: the [`Stream`] trait.
The trait is defined like this:

[`Stream`]: https://rust-lang.github.io/async-book/05_streams/01_chapter.html

```rust
pub trait Stream {
    type Item;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context)
        -> Poll<Option<Self::Item>>;
}
```

This definition is quite similar to the [`Future`] trait, with the following differences:

- The associated type is named `Item` instead of `Output`.
- Instead of a `poll` method that returns `Poll<Self::Output>`, the `Stream` trait defines a `poll_next` method that returns a `Poll<Option<Self::Item>>` (note the additional `Option`).

There is also a semantic difference: The `poll_next` method can be called repeatedly, until it returns `Poll::Ready(None)` to signal that the stream is finished. In this regard, the method is similar to the [`Iterator::next`] method, which also returns `None` after the last value.

[`Iterator::next`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html#tymethod.next

##### Implementing `Stream`

Let's implement the `Stream` trait for our `ScancodeStream` to provide the values of the `SCANCODE_QUEUE` in an asynchronous way. For this, we first need to add a dependency on the `futures-util` crate, which contains the `Stream` type:

```toml
# in Cargo.toml

[dependencies.futures-util]
version = "0.3.4"
default-features = false
features = ["alloc"]
```

We disable the default features to make the crate `no_std` compatible and enable the `alloc` feature to make its allocation-based types available (we will need this later). (Note that we could also add a dependency on the main `futures` crate, which re-exports the `futures-util` crate, but this would result in a larger number of dependencies and longer compile times.)
Now we can import and implement the `Stream` trait:

```rust
// in src/task/keyboard.rs

use core::{pin::Pin, task::{Poll, Context}};
use futures_util::stream::Stream;

impl Stream for ScancodeStream {
    type Item = u8;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<u8>> {
        let queue = SCANCODE_QUEUE.try_get().expect("not initialized");
        match queue.pop() {
            Some(scancode) => Poll::Ready(Some(scancode)),
            None => Poll::Pending,
        }
    }
}
```

We first use the [`OnceCell::try_get`] method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the `new` function, so we can safely use the `expect` method to panic if it's not initialized. Next, we use the [`ArrayQueue::pop`] method to try to get the next element from the queue. If it succeeds, we return the scancode wrapped in `Poll::Ready(Some(…))`. If it fails, it means that the queue is empty. In that case, we return `Poll::Pending`.

[`ArrayQueue::pop`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.pop

#### Waker Support

Like the `Future::poll` method, the `Stream::poll_next` method requires the asynchronous task to notify the executor when it becomes ready after `Poll::Pending` is returned. This way, the executor does not need to poll the same task again until it is notified, which greatly reduces the performance overhead of waiting tasks.

To send this notification, the task should extract the [`Waker`] from the passed [`Context`] reference and store it somewhere. When the task becomes ready, it should invoke the [`wake`] method on the stored `Waker` to notify the executor that the task should be polled again.

##### AtomicWaker

To implement the `Waker` notification for our `ScancodeStream`, we need a place where we can store the `Waker` between poll calls.
We can't store it as a field in the `ScancodeStream` itself because it needs to be accessible from the `add_scancode` function. The solution to this is to use a static variable of the [`AtomicWaker`] type provided by the `futures-util` crate. Like the `ArrayQueue` type, this type is based on atomic instructions and can be safely stored in a `static` and modified concurrently.

[`AtomicWaker`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html

Let's use the [`AtomicWaker`] type to define a static `WAKER`:

```rust
// in src/task/keyboard.rs

use futures_util::task::AtomicWaker;

static WAKER: AtomicWaker = AtomicWaker::new();
```

The idea is that the `poll_next` implementation stores the current waker in this static, and the `add_scancode` function calls the `wake` function on it when a new scancode is added to the queue.

##### Storing a Waker

The contract defined by `poll`/`poll_next` requires the task to register a wakeup for the passed `Waker` when it returns `Poll::Pending`. Let's modify our `poll_next` implementation to satisfy this requirement:

```rust
// in src/task/keyboard.rs

impl Stream for ScancodeStream {
    type Item = u8;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<u8>> {
        let queue = SCANCODE_QUEUE
            .try_get()
            .expect("scancode queue not initialized");

        // fast path
        if let Some(scancode) = queue.pop() {
            return Poll::Ready(Some(scancode));
        }

        WAKER.register(&cx.waker());
        match queue.pop() {
            Some(scancode) => {
                WAKER.take();
                Poll::Ready(Some(scancode))
            }
            None => Poll::Pending,
        }
    }
}
```

Like before, we first use the [`OnceCell::try_get`] function to get a reference to the initialized scancode queue. We then optimistically try to `pop` from the queue and return `Poll::Ready` when it succeeds. This way, we can avoid the performance overhead of registering a waker when the queue is not empty.

If the first call to `queue.pop()` does not succeed, the queue is potentially empty.
Only potentially because the interrupt handler might have filled the queue asynchronously immediately after the check. Since this race condition can occur again for the next check, we need to register the `Waker` in the `WAKER` static before the second check. This way, a wakeup might happen before we return `Poll::Pending`, but it is guaranteed that we get a wakeup for any scancodes pushed after the check.

After registering the `Waker` contained in the passed [`Context`] through the [`AtomicWaker::register`] function, we try to pop from the queue a second time. If it now succeeds, we return `Poll::Ready`. We also remove the registered waker again using [`AtomicWaker::take`] because a waker notification is no longer needed. In case `queue.pop()` fails for a second time, we return `Poll::Pending` like before, but this time with a registered wakeup.

[`AtomicWaker::register`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html#method.register
[`AtomicWaker::take`]: https://docs.rs/futures/0.3.4/futures/task/struct.AtomicWaker.html#method.take

Note that there are two ways that a wakeup can happen for a task that did not return `Poll::Pending` (yet). One way is the mentioned race condition when the wakeup happens immediately before returning `Poll::Pending`. The other way is when the queue is no longer empty after registering the waker, so that `Poll::Ready` is returned. Since these spurious wakeups are not preventable, the executor needs to be able to handle them correctly.
##### Waking the Stored Waker

To wake the stored `Waker`, we add a call to `WAKER.wake()` in the `add_scancode` function:

```rust
// in src/task/keyboard.rs

pub(crate) fn add_scancode(scancode: u8) {
    if let Ok(queue) = SCANCODE_QUEUE.try_get() {
        if let Err(_) = queue.push(scancode) {
            println!("WARNING: scancode queue full; dropping keyboard input");
        } else {
            WAKER.wake(); // new
        }
    } else {
        println!("WARNING: scancode queue uninitialized");
    }
}
```

The only change that we made is to add a call to `WAKER.wake()` if the push to the scancode queue succeeds. If a waker is registered in the `WAKER` static, this method will call the equally-named [`wake`] method on it, which notifies the executor. Otherwise, the operation is a no-op, i.e., nothing happens.

[`wake`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.wake

It is important that we call `wake` only after pushing to the queue because otherwise the task might be woken too early while the queue is still empty. This can, for example, happen when using a multi-threaded executor that starts the woken task concurrently on a different CPU core. While we don't have thread support yet, we will add it soon and don't want things to break then.
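The interplay between registering a waker, re-checking the queue, and waking after a push can be tried out on a hosted system. The following sketch makes several assumptions: it uses `std`, it replaces the lock-free `ArrayQueue` and `AtomicWaker` with simple `Mutex`-based stand-ins (which would not be acceptable in a real interrupt handler), and the names `Shared` and `CountingWaker` are made up for this illustration.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical hosted stand-in for `SCANCODE_QUEUE` plus `WAKER`.
struct Shared {
    queue: Mutex<VecDeque<u8>>,
    waker: Mutex<Option<Waker>>,
}

impl Shared {
    fn pop(&self) -> Option<u8> {
        self.queue.lock().unwrap().pop_front()
    }

    fn poll_next(&self, cx: &mut Context<'_>) -> Poll<Option<u8>> {
        // Fast path: no waker registration needed if data is available.
        if let Some(scancode) = self.pop() {
            return Poll::Ready(Some(scancode));
        }
        // Register first, then check again to close the race window.
        *self.waker.lock().unwrap() = Some(cx.waker().clone());
        match self.pop() {
            Some(scancode) => {
                // A notification is no longer needed.
                drop(self.waker.lock().unwrap().take());
                Poll::Ready(Some(scancode))
            }
            None => Poll::Pending,
        }
    }

    fn add_scancode(&self, scancode: u8) {
        // Push first, wake afterwards, so a woken task always finds data.
        self.queue.lock().unwrap().push_back(scancode);
        let waker = self.waker.lock().unwrap().take();
        if let Some(waker) = waker {
            waker.wake();
        }
    }
}

/// Hypothetical waker that only counts how often it was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```

Polling an empty `Shared` returns `Poll::Pending` and leaves a waker registered, so the next `add_scancode` call triggers exactly one wakeup. A push that happens while no waker is registered does not notify anyone, which mirrors the no-op behavior described above.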
#### Keyboard Task

Now that we implemented the `Stream` trait for our `ScancodeStream`, we can use it to create an asynchronous keyboard task:

```rust
// in src/task/keyboard.rs

use futures_util::stream::StreamExt;
use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1};
use crate::print;

pub async fn print_keypresses() {
    let mut scancodes = ScancodeStream::new();
    let mut keyboard = Keyboard::new(ScancodeSet1::new(),
        layouts::Us104Key, HandleControl::Ignore);

    while let Some(scancode) = scancodes.next().await {
        if let Ok(Some(key_event)) = keyboard.add_byte(scancode) {
            if let Some(key) = keyboard.process_keyevent(key_event) {
                match key {
                    DecodedKey::Unicode(character) => print!("{}", character),
                    DecodedKey::RawKey(key) => print!("{:?}", key),
                }
            }
        }
    }
}
```

The code is very similar to the code we had in our [keyboard interrupt handler] before we modified it in this post. The only difference is that, instead of reading the scancode from an I/O port, we take it from the `ScancodeStream`. For this, we first create a new `ScancodeStream` and then repeatedly use the [`next`] method provided by the [`StreamExt`] trait to get a `Future` that resolves to the next element in the stream. By using the `await` operator on it, we asynchronously wait for the result of the future.

[keyboard interrupt handler]: @/edition-2/posts/07-hardware-interrupts/index.pt-BR.md#interpretando-os-scancodes
[`next`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html#method.next
[`StreamExt`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html

We use `while let` to loop until the stream returns `None` to signal its end. Since our `poll_next` method never returns `None`, this is effectively an endless loop, so the `print_keypresses` task never finishes.
Let's add the `print_keypresses` task to our executor in our `main.rs` to get working keyboard input again:

```rust
// in src/main.rs

use blog_os::task::keyboard; // new

fn kernel_main(boot_info: &'static BootInfo) -> ! {

    // […] initialization routines, including init_heap, test_main

    let mut executor = SimpleExecutor::new();
    executor.spawn(Task::new(example_task()));
    executor.spawn(Task::new(keyboard::print_keypresses())); // new
    executor.run();

    // […] "it did not crash" message, hlt_loop
}
```

When we execute `cargo run` now, we see that keyboard input works again:

![QEMU printing ".....H...e...l...l..o..... ...W..o..r....l...d...!"](qemu-keyboard-output.gif)

If you keep an eye on the CPU utilization of your computer, you will see that the `QEMU` process now continuously keeps the CPU busy. This happens because our `SimpleExecutor` polls tasks over and over again in a loop. So even if we don't press any keys on the keyboard, the executor repeatedly calls `poll` on our `print_keypresses` task, even though the task cannot make any progress and will return `Poll::Pending` each time.

### Executor with Waker Support

To fix the performance problem, we need to create an executor that properly utilizes the `Waker` notifications. This way, the executor is notified when the next keyboard interrupt occurs, so it does not need to keep polling the `print_keypresses` task over and over again.

#### Task Id

The first step in creating an executor with proper support for waker notifications is to give each task a unique ID. This is required because we need a way to specify which task should be woken. We start by creating a new `TaskId` wrapper type:

```rust
// in src/task/mod.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TaskId(u64);
```

The `TaskId` struct is a simple wrapper type around `u64`.
We derive a number of traits for it to make it printable, copyable, comparable, and sortable. The latter is important because we want to use `TaskId` as the key type of a [`BTreeMap`] in a moment.

[`BTreeMap`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html

To create a new unique ID, we create a `TaskId::new` function:

```rust
use core::sync::atomic::{AtomicU64, Ordering};

impl TaskId {
    fn new() -> Self {
        static NEXT_ID: AtomicU64 = AtomicU64::new(0);
        TaskId(NEXT_ID.fetch_add(1, Ordering::Relaxed))
    }
}
```

The function uses a static `NEXT_ID` variable of type [`AtomicU64`] to ensure that each ID is assigned only once. The [`fetch_add`] method atomically increases the value and returns the previous value in one atomic operation. This means that even when the `TaskId::new` method is called in parallel, every ID is returned exactly once. The [`Ordering`] parameter defines whether the compiler or the CPU is allowed to reorder the `fetch_add` operation relative to other memory accesses. Since we only require that the ID be unique, the `Relaxed` ordering with the weakest requirements is enough in this case.

[`AtomicU64`]: https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html
[`fetch_add`]: https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html#method.fetch_add
[`Ordering`]: https://doc.rust-lang.org/core/sync/atomic/enum.Ordering.html

We can now extend our `Task` type with an additional `id` field:

```rust
// in src/task/mod.rs

pub struct Task {
    id: TaskId, // new
    future: Pin<Box<dyn Future<Output = ()>>>,
}

impl Task {
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task {
            id: TaskId::new(), // new
            future: Box::pin(future),
        }
    }
}
```

The new `id` field makes it possible to uniquely name a task, which is required for waking a specific task.
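The uniqueness guarantee of `fetch_add` can be checked on a hosted system with a few threads. The sketch below repeats the `TaskId` definition from above using `std`; the helper functions `allocate_concurrently` and `all_unique` are hypothetical and only exist for this experiment.

```rust
use std::collections::BTreeSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TaskId(u64);

impl TaskId {
    fn new() -> Self {
        static NEXT_ID: AtomicU64 = AtomicU64::new(0);
        TaskId(NEXT_ID.fetch_add(1, Ordering::Relaxed))
    }
}

/// Hypothetical helper: creates `per_thread` IDs on each of `threads` threads.
fn allocate_concurrently(threads: usize, per_thread: usize) -> Vec<TaskId> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                (0..per_thread).map(|_| TaskId::new()).collect::<Vec<_>>()
            })
        })
        .collect();
    handles
        .into_iter()
        .flat_map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}

/// Hypothetical helper: checks that no ID occurs twice.
fn all_unique(ids: &[TaskId]) -> bool {
    let unique: BTreeSet<TaskId> = ids.iter().copied().collect();
    unique.len() == ids.len()
}
```

Calling `all_unique(&allocate_concurrently(4, 1000))` should return `true` regardless of how the threads are scheduled. Only the order in which the threads receive their IDs varies between runs.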
#### The `Executor` Type

We create our new `Executor` type in a `task::executor` module:

```rust
// in src/task/mod.rs

pub mod executor;
```

```rust
// in src/task/executor.rs

use super::{Task, TaskId};
use alloc::{collections::BTreeMap, sync::Arc};
use core::task::Waker;
use crossbeam_queue::ArrayQueue;

pub struct Executor {
    tasks: BTreeMap<TaskId, Task>,
    task_queue: Arc<ArrayQueue<TaskId>>,
    waker_cache: BTreeMap<TaskId, Waker>,
}

impl Executor {
    pub fn new() -> Self {
        Executor {
            tasks: BTreeMap::new(),
            task_queue: Arc::new(ArrayQueue::new(100)),
            waker_cache: BTreeMap::new(),
        }
    }
}
```

Instead of storing tasks in a [`VecDeque`] like we did for our `SimpleExecutor`, we use a `task_queue` of task IDs and a [`BTreeMap`] named `tasks` that contains the actual `Task` instances. The map is indexed by the `TaskId` to allow efficient continuation of a specific task.

The `task_queue` field is an [`ArrayQueue`] of task IDs, wrapped into the [`Arc`] type that implements _reference counting_. Reference counting makes it possible to share ownership of the value among multiple owners. It works by allocating the value on the heap and counting the number of active references to it. When the number of active references reaches zero, the value is no longer needed and can be deallocated.

We use this `Arc` type for the `task_queue` because it will be shared between the executor and wakers. The idea is that the wakers push the ID of the woken task to the queue. The executor sits on the receiving end of the queue, retrieves the woken tasks by their ID from the `tasks` map, and then runs them. The reason for using a fixed-size queue instead of an unbounded queue such as [`SegQueue`] is that interrupt handlers should not allocate on push to this queue.

[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html
[`SegQueue`]: https://docs.rs/crossbeam-queue/0.2.1/crossbeam_queue/struct.SegQueue.html

In addition to the `task_queue` and the `tasks` map, the `Executor` type has a `waker_cache` field that is also a map.
This map caches the [`Waker`] of a task after its creation. This has two reasons: First, it improves performance by reusing the same waker for multiple wakeups of the same task instead of creating a new waker each time. Second, it ensures that reference-counted wakers are not deallocated inside interrupt handlers because it could lead to deadlocks (there are more details on this below).

To create an `Executor`, we provide a simple `new` function. We choose a capacity of 100 for the `task_queue`, which should be more than enough for the foreseeable future. In case our system has more than 100 concurrent tasks at some point, we can easily increase this size.

#### Spawning Tasks

As for the `SimpleExecutor`, we provide a `spawn` method on our `Executor` type that adds a given task to the `tasks` map and immediately wakes it by pushing its ID to the `task_queue`:

```rust
// in src/task/executor.rs

impl Executor {
    pub fn spawn(&mut self, task: Task) {
        let task_id = task.id;
        if self.tasks.insert(task.id, task).is_some() {
            panic!("task with same ID already in tasks");
        }
        self.task_queue.push(task_id).expect("queue full");
    }
}
```

If there is already a task with the same ID in the map, the [`BTreeMap::insert`] method returns it. This should never happen since each task has a unique ID, so we panic in this case since it indicates a bug in our code. Similarly, we panic when the `task_queue` is full since this should never happen if we choose a large enough queue size.
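The two failure modes of `spawn` can be observed in isolation with a hosted sketch. It makes several simplifying assumptions: plain `u64` IDs, task names instead of futures, a capacity-checked `VecDeque` instead of `ArrayQueue`, and a `Result` instead of a panic so that both cases can be inspected. The `MiniExecutor` and `SpawnError` names are hypothetical.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical stand-in for the executor state.
struct MiniExecutor {
    tasks: BTreeMap<u64, &'static str>,
    task_queue: VecDeque<u64>,
    capacity: usize,
}

#[derive(Debug, PartialEq)]
enum SpawnError {
    DuplicateId,
    QueueFull,
}

impl MiniExecutor {
    fn new(capacity: usize) -> Self {
        MiniExecutor {
            tasks: BTreeMap::new(),
            task_queue: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    fn spawn(&mut self, id: u64, task: &'static str) -> Result<(), SpawnError> {
        // `BTreeMap::insert` returns the previous value if the key existed.
        if self.tasks.insert(id, task).is_some() {
            return Err(SpawnError::DuplicateId);
        }
        // The bounded queue rejects pushes once it is full.
        if self.task_queue.len() == self.capacity {
            return Err(SpawnError::QueueFull);
        }
        // "Wake" the new task so that it gets polled for the first time.
        self.task_queue.push_back(id);
        Ok(())
    }
}
```

With a capacity of 2, spawning IDs 0 and 1 succeeds, spawning ID 0 a second time reports `DuplicateId`, and spawning a third distinct ID reports `QueueFull`. In the kernel, both situations indicate a bug or an undersized queue, which is why the real `spawn` method panics instead.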
#### Running Tasks

To execute all tasks in the `task_queue`, we create a private `run_ready_tasks` method:

```rust
// in src/task/executor.rs

use core::task::{Context, Poll};

impl Executor {
    fn run_ready_tasks(&mut self) {
        // destructure `self` to avoid borrow checker errors
        let Self {
            tasks,
            task_queue,
            waker_cache,
        } = self;

        while let Some(task_id) = task_queue.pop() {
            let task = match tasks.get_mut(&task_id) {
                Some(task) => task,
                None => continue, // task no longer exists
            };
            let waker = waker_cache
                .entry(task_id)
                .or_insert_with(|| TaskWaker::new(task_id, task_queue.clone()));
            let mut context = Context::from_waker(waker);
            match task.poll(&mut context) {
                Poll::Ready(()) => {
                    // task done -> remove it and its cached waker
                    tasks.remove(&task_id);
                    waker_cache.remove(&task_id);
                }
                Poll::Pending => {}
            }
        }
    }
}
```

The basic idea of this function is similar to our `SimpleExecutor`: loop over all tasks in the `task_queue`, create a waker for each task, and then poll them. However, instead of adding pending tasks back to the end of the `task_queue`, we let our `TaskWaker` implementation take care of adding woken tasks back to the queue. The implementation of this waker type will be shown in a moment.

Let's look into some of the implementation details of this `run_ready_tasks` method:

- We use [_destructuring_][_desestruturação_] to split `self` into its three fields to avoid some borrow checker errors. Namely, our implementation needs to access `self.task_queue` from within a closure, which currently tries to borrow `self` completely. This is a fundamental borrow checker issue that will be resolved when [RFC 2229] is [implemented][RFC 2229 impl]. Translator's note ([Richard Alves](https://github.com/richarddalves)): As of the date of this translation (2025), I verified that [RFC 2229] has already been implemented.

- For each popped task ID, we retrieve a mutable reference to the corresponding task from the `tasks` map.
Since our `ScancodeStream` implementation registers wakers before checking whether a task needs to be put to sleep, it might happen that a wake-up occurs for a task that no longer exists. In this case, we simply ignore the wake-up and continue with the next ID from the queue.

- To avoid the performance overhead of creating a waker on each poll, we use the `waker_cache` map to store the waker for each task after it has been created. For this, we use the [`BTreeMap::entry`] method in combination with [`Entry::or_insert_with`] to create a new waker if it doesn't exist yet and then get a mutable reference to it. To create a new waker, we clone the `task_queue` and pass it together with the task ID to the `TaskWaker::new` function (implementation shown below). Since the `task_queue` is wrapped in an `Arc`, the `clone` only increments the reference count of the value but still points to the same heap-allocated queue. Note that reusing wakers like this is not possible for all waker implementations, but our `TaskWaker` type will allow it.

[_desestruturação_]: https://doc.rust-lang.org/book/ch19-03-pattern-syntax.html#destructuring-to-break-apart-values
[RFC 2229]: https://github.com/rust-lang/rfcs/pull/2229
[RFC 2229 impl]: https://github.com/rust-lang/rust/issues/53488

[`BTreeMap::entry`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.entry
[`Entry::or_insert_with`]: https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html#method.or_insert_with

A task is finished when it returns `Poll::Ready`. In that case, we remove it from the `tasks` map using the [`BTreeMap::remove`] method. We also remove its cached waker, if it exists.

[`BTreeMap::remove`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.remove

#### Waker Design

The job of the waker is to push the ID of the woken task to the `task_queue` of the executor.
We implement this by creating a new `TaskWaker` struct that stores the task ID and a reference to the `task_queue`:

```rust
// in src/task/executor.rs

struct TaskWaker {
    task_id: TaskId,
    task_queue: Arc<ArrayQueue<TaskId>>,
}
```

Since the ownership of the `task_queue` is shared between the executor and the wakers, we use the [`Arc`] wrapper type to implement shared reference-counted ownership.

[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html

The implementation of the wake operation is quite simple:

```rust
// in src/task/executor.rs

impl TaskWaker {
    fn wake_task(&self) {
        self.task_queue.push(self.task_id).expect("task_queue full");
    }
}
```

We push the `task_id` to the referenced `task_queue`. Since modifications to the [`ArrayQueue`] type only require a shared reference, we can implement this method on `&self` instead of `&mut self`.

##### The `Wake` Trait

In order to use our `TaskWaker` type for polling futures, we need to convert it to a [`Waker`] instance first. This is required because the [`Future::poll`] method takes a [`Context`] instance as an argument, which can only be constructed from the `Waker` type. While we could do this by providing an implementation of the [`RawWaker`] type, it's both simpler and safer to instead implement the `Arc`-based [`Wake`][wake-trait] trait and then use the [`From`] implementations provided by the standard library to construct the `Waker`.

The trait implementation looks like this:

[wake-trait]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html

```rust
// in src/task/executor.rs

use alloc::task::Wake;

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.wake_task();
    }

    fn wake_by_ref(self: &Arc<Self>) {
        self.wake_task();
    }
}
```

Since wakers are commonly shared between the executor and the asynchronous tasks, the trait methods require that the `Self` instance is wrapped in the [`Arc`] type, which implements reference-counted ownership.
This means that we have to move our `TaskWaker` into an `Arc` in order to call them.

The difference between the `wake` and `wake_by_ref` methods is that the latter only requires a reference to the `Arc`, while the former takes ownership of the `Arc` and thus often requires an increase of the reference count. Not all types support waking by reference, so implementing the `wake_by_ref` method is optional. However, it can lead to better performance because it avoids unnecessary reference count modifications. In our case, we can simply forward both trait methods to our `wake_task` function, which requires only a shared `&self` reference.

##### Creating Wakers

Since the `Waker` type supports [`From`] conversions for all `Arc`-wrapped values that implement the `Wake` trait, we can now implement the `TaskWaker::new` function that is required by our `Executor::run_ready_tasks` method:

[`From`]: https://doc.rust-lang.org/nightly/core/convert/trait.From.html

```rust
// in src/task/executor.rs

impl TaskWaker {
    fn new(task_id: TaskId, task_queue: Arc<ArrayQueue<TaskId>>) -> Waker {
        Waker::from(Arc::new(TaskWaker {
            task_id,
            task_queue,
        }))
    }
}
```

We create the `TaskWaker` using the passed `task_id` and `task_queue`. We then wrap the `TaskWaker` in an `Arc` and use the `Waker::from` implementation to convert it to a [`Waker`]. This `from` method takes care of constructing a [`RawWakerVTable`] and a [`RawWaker`] instance for our `TaskWaker` type. In case you're interested in how it works in detail, check out the [implementation in the `alloc` crate][waker-from-impl].

[waker-from-impl]: https://github.com/rust-lang/rust/blob/cdb50c6f2507319f29104a25765bfb79ad53395c/src/liballoc/task.rs#L58-L87

#### A `run` Method

With our waker implementation in place, we can finally construct a `run` method for our executor:

```rust
// in src/task/executor.rs

impl Executor {
    pub fn run(&mut self) -> ! {
        loop {
            self.run_ready_tasks();
        }
    }
}
```

This method just calls the `run_ready_tasks` function in a loop. While we could theoretically return from the function when the `tasks` map becomes empty, this would never happen since our `keyboard_task` never finishes, so a simple `loop` should suffice. Since the function never returns, we use the `!` return type to mark the function as [diverging][divergente] to the compiler.

[divergente]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html

We can now change our `kernel_main` to use our new `Executor` instead of the `SimpleExecutor`:

```rust
// in src/main.rs

use blog_os::task::executor::Executor; // new

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // […] initialization routines, including init_heap, test_main

    let mut executor = Executor::new(); // new
    executor.spawn(Task::new(example_task()));
    executor.spawn(Task::new(keyboard::print_keypresses()));
    executor.run();
}
```

We only need to change the import and the type name. Since our `run` function is marked as diverging, the compiler knows that it never returns, so we no longer need a call to `hlt_loop` at the end of our `kernel_main` function.

When we run our kernel using `cargo run` now, we see that keyboard input still works:

![QEMU printing ".....H...e...l...l..o..... ...a..g..a....i...n...!"](qemu-keyboard-output-again.gif)

However, the CPU utilization of QEMU did not get any better. The reason for this is that we still keep the CPU busy the whole time. We no longer poll tasks until they are woken again, but we still check the `task_queue` in a busy loop. To fix this, we need to put the CPU to sleep if there is no more work to do.

#### Sleep If Idle

The basic idea is to execute the [`hlt` instruction][instrução `hlt`] when the `task_queue` is empty. This instruction puts the CPU to sleep until the next interrupt arrives.
The fact that the CPU immediately becomes active again on interrupts ensures that we can still react directly when an interrupt handler pushes to the `task_queue`.

[instrução `hlt`]: https://en.wikipedia.org/wiki/HLT_(x86_instruction)

To implement this, we create a new `sleep_if_idle` method in our executor and call it from our `run` method:

```rust
// in src/task/executor.rs

impl Executor {
    pub fn run(&mut self) -> ! {
        loop {
            self.run_ready_tasks();
            self.sleep_if_idle(); // new
        }
    }

    fn sleep_if_idle(&self) {
        if self.task_queue.is_empty() {
            x86_64::instructions::hlt();
        }
    }
}
```

Since we call `sleep_if_idle` directly after `run_ready_tasks`, which loops until the `task_queue` becomes empty, checking the queue again might seem unnecessary. However, a hardware interrupt might occur directly after `run_ready_tasks` returns, so there might be a new task in the queue at the time the `sleep_if_idle` function is called. Only if the queue is still empty do we put the CPU to sleep by executing the `hlt` instruction through the [`instructions::hlt`] wrapper function provided by the [`x86_64`] crate.

[`instructions::hlt`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/fn.hlt.html
[`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/index.html

Unfortunately, there is still a subtle race condition in this implementation. Since interrupts are asynchronous and can happen at any time, it is possible that an interrupt happens right between the `is_empty` check and the call to `hlt`:

```rust
if self.task_queue.is_empty() {
    // <--- interrupt can happen here
    x86_64::instructions::hlt();
}
```

In case this interrupt pushes to the `task_queue`, we put the CPU to sleep even though there is now a ready task. In the worst case, this could delay the handling of a keyboard interrupt until the next keypress or the next timer interrupt. So how do we prevent this?
The answer is to disable interrupts on the CPU before the check and atomically enable them again together with the `hlt` instruction. This way, all interrupts that happen in between are delayed until after the `hlt` instruction so that no wake-ups are missed. To implement this approach, we can use the [`interrupts::enable_and_hlt`][`enable_and_hlt`] function provided by the [`x86_64`] crate.

[`enable_and_hlt`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.enable_and_hlt.html

The updated implementation of our `sleep_if_idle` function looks like this:

```rust
// in src/task/executor.rs

impl Executor {
    fn sleep_if_idle(&self) {
        use x86_64::instructions::interrupts::{self, enable_and_hlt};

        interrupts::disable();
        if self.task_queue.is_empty() {
            enable_and_hlt();
        } else {
            interrupts::enable();
        }
    }
}
```

To avoid race conditions, we disable interrupts before checking whether the `task_queue` is empty. If it is, we use the [`enable_and_hlt`] function to enable interrupts and put the CPU to sleep as a single atomic operation. In case the queue is no longer empty, it means that an interrupt woke a task after `run_ready_tasks` returned. In that case, we enable interrupts again and directly continue execution without executing `hlt`.

Now our executor properly puts the CPU to sleep when there is no work to do. We can see that the QEMU process has a much lower CPU utilization when we run our kernel using `cargo run` again.

#### Possible Extensions

Our executor is now able to run tasks in an efficient way. It utilizes waker notifications to avoid polling waiting tasks and puts the CPU to sleep when there is currently no work to do.
However, our executor is still quite basic, and there are many possible ways to extend its functionality:

- **Scheduling**: For our `task_queue`, we currently use the [`ArrayQueue`] type to implement a _first in first out_ (FIFO) strategy, which is often also called _round robin_ scheduling. This strategy might not be the most efficient for all workloads. For example, it might make sense to prioritize latency-critical tasks or tasks that do a lot of I/O. See the [scheduling chapter][capítulo de agendamento] of the [_Operating Systems: Three Easy Pieces_] book or the [Wikipedia article on scheduling][scheduling-wiki] for more information.
- **Task Spawning**: Our `Executor::spawn` method currently requires a `&mut self` reference and is thus no longer available after invoking the `run` method. To fix this, we could create an additional `Spawner` type that shares some kind of queue with the executor and allows task creation from within tasks themselves. The queue could be the `task_queue` directly or a separate queue that the executor checks in its run loop.
- **Utilizing Threads**: We don't have support for threads yet, but we will add it in the next post. This will make it possible to launch multiple instances of the executor in different threads. The advantage of this approach is that the delay imposed by long-running tasks can be reduced because other tasks can run concurrently. This approach also allows it to utilize multiple CPU cores.
- **Load Balancing**: When adding threading support, it becomes important to know how to distribute the tasks between the executors to ensure that all CPU cores are utilized. A common technique for this is [_work stealing_].
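To make the **Task Spawning** idea more concrete, here is a minimal sketch of a `Spawner` handle that shares a separate queue with the executor. It is not part of this post's kernel code: it runs on a hosted target with `std`, `MiniExecutor`, `Spawner`, `NoopWaker`, and `run_until_idle` are hypothetical names, and the single-threaded `Rc<RefCell<…>>` queue is an assumption that would not be safe to touch from interrupt handlers. For brevity, the executor polls with a no-op waker in a busy loop instead of using a `task_queue` and a waker cache.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

type BoxedTask = Pin<Box<dyn Future<Output = ()>>>;

/// Cloneable handle that tasks can use to submit new tasks (hypothetical type).
#[derive(Clone)]
struct Spawner {
    incoming: Rc<RefCell<VecDeque<BoxedTask>>>,
}

impl Spawner {
    fn spawn(&self, future: impl Future<Output = ()> + 'static) {
        self.incoming.borrow_mut().push_back(Box::pin(future));
    }
}

/// Waker that does nothing; sufficient because the sketch polls in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

struct MiniExecutor {
    tasks: VecDeque<BoxedTask>,
    incoming: Rc<RefCell<VecDeque<BoxedTask>>>,
}

impl MiniExecutor {
    fn new() -> Self {
        MiniExecutor {
            tasks: VecDeque::new(),
            incoming: Rc::new(RefCell::new(VecDeque::new())),
        }
    }

    fn spawner(&self) -> Spawner {
        Spawner { incoming: Rc::clone(&self.incoming) }
    }

    /// Polls tasks round-robin until no task is left, picking up tasks
    /// submitted through a `Spawner` at the start of every iteration.
    fn run_until_idle(&mut self) {
        let waker = Waker::from(Arc::new(NoopWaker));
        let mut context = Context::from_waker(&waker);
        loop {
            self.tasks.extend(self.incoming.borrow_mut().drain(..));
            let Some(mut task) = self.tasks.pop_front() else { break };
            if task.as_mut().poll(&mut context).is_pending() {
                self.tasks.push_back(task);
            }
        }
    }
}
```

A task that captures a clone of the `Spawner` can call `spawn` while it is being polled, because the executor does not hold a borrow of the shared queue during `poll`. In a kernel, the shared queue would more plausibly be a lock-free type such as the `ArrayQueue` already used for the `task_queue`.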
[capítulo de agendamento]: http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-sched.pdf
[_Operating Systems: Three Easy Pieces_]: http://pages.cs.wisc.edu/~remzi/OSTEP/
[scheduling-wiki]: https://en.wikipedia.org/wiki/Scheduling_(computing)
[_work stealing_]: https://en.wikipedia.org/wiki/Work_stealing

## Summary

We started this post by introducing **multitasking** and differentiating between _preemptive_ multitasking, which forcibly interrupts running tasks regularly, and _cooperative_ multitasking, which lets tasks run until they voluntarily give up control of the CPU.

We then explored how Rust's support for **async/await** provides a language-level implementation of cooperative multitasking. Rust bases its implementation on top of the polling-based `Future` trait, which abstracts asynchronous tasks. Using async/await, it is possible to work with futures almost like with normal synchronous code. The difference is that asynchronous functions return a `Future` again, which needs to be added to an executor at some point in order to run it.

Behind the scenes, the compiler transforms async/await code into _state machines_, with each `.await` operation corresponding to a possible pause point. By utilizing its knowledge about the program, the compiler is able to save only the minimal state for each pause point, resulting in a very small memory consumption per task. One challenge is that the generated state machines might contain _self-referential_ structs, for example when local variables of the asynchronous function reference each other. To prevent pointer invalidation, Rust uses the `Pin` type to ensure that futures cannot be moved in memory anymore after they have been polled for the first time.

For our **implementation**, we first created a very basic executor that polls all spawned tasks in a busy loop without using the `Waker` type at all.
We then showed the advantage of waker notifications by implementing an asynchronous keyboard task. The task defines a static `SCANCODE_QUEUE` using the mutex-free `ArrayQueue` type provided by the `crossbeam` crate. Instead of handling keypresses directly, the keyboard interrupt handler now puts all received scancodes in the queue and then wakes the registered `Waker` to signal that new input is available. On the receiving end, we created a `ScancodeStream` type to provide a `Future` resolving to the next scancode in the queue. This made it possible to create an asynchronous `print_keypresses` task that uses async/await to interpret and print the scancodes in the queue.

To utilize the waker notifications of the keyboard task, we created a new `Executor` type that uses an `Arc`-shared `task_queue` for ready tasks. We implemented a `TaskWaker` type that pushes the ID of woken tasks directly to this `task_queue`, which are then polled again by the executor. To save power when no tasks are runnable, we added support for putting the CPU to sleep using the `hlt` instruction. Finally, we discussed some potential extensions to our executor, for example, providing multi-core support.

## What's Next?

Using async/await, we now have basic support for cooperative multitasking in our kernel. While cooperative multitasking is very efficient, it leads to latency problems when individual tasks keep running for too long, thus preventing other tasks from running. For this reason, it makes sense to also add support for preemptive multitasking to our kernel.

In the next post, we will introduce _threads_ as the most common form of preemptive multitasking. In addition to resolving the problem of long-running tasks, threads will also prepare us for utilizing multiple CPU cores and running untrusted user programs in the future.
================================================
FILE: blog/content/edition-2/posts/12-async-await/index.ru.md
================================================
+++
title = "Async/Await"
weight = 12
path = "ru/async-await"
date = 2020-03-27

[extra]
chapter = "Multitasking"
# Please update this when updating the translation
translation_based_on_commit = "b2e12433b94a39d2437c2a472a3fd7cc4c22edab"
# GitHub usernames of the people that translated this post
translators = ["TakiMoysha"]
+++

In this post, we explore _cooperative multitasking_ and the _async/await_ feature of Rust. We take a detailed look at how async/await works in Rust, including the `Future` trait, the state machine transformation, and _pinning_. We then add basic support for async/await to our kernel by creating an asynchronous keyboard task and a basic executor.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-12`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-12

## Multitasking

One of the fundamental features of most operating systems is [_multitasking_], which is the ability to execute multiple tasks concurrently. For example, you probably have other programs open while reading this post, such as a text editor or a terminal window. Even if you have only a single browser window open, there are probably various background tasks for managing your desktop windows, checking for updates, or indexing files.

[_multitasking_]: https://en.wikipedia.org/wiki/Computer_multitasking

While it seems like all tasks run in parallel, only a single task can be executed on a CPU core at a time.
To create the illusion that the tasks run in parallel, the operating system rapidly switches between active tasks so that each one can make a bit of progress. Since computers are fast, we don't notice these switches most of the time.

While single-core CPUs can only execute a single task at a time, multi-core CPUs can run multiple tasks in a truly parallel way. For example, a CPU with 8 cores can run 8 tasks at the same time. We will explain how to use multi-core CPUs in a future post. For this post, we will focus on single-core CPUs for simplicity. (It's worth noting that all multi-core CPUs start with only a single active core, so we can treat them as single-core CPUs for now.)

There are two forms of multitasking: _Cooperative_ multitasking requires tasks to regularly give up control of the CPU so that other tasks can make progress. _Preemptive_ multitasking uses operating system functionality to switch threads at arbitrary points in time by forcibly pausing them. In the following, we will explore the two forms of multitasking in more detail and discuss their respective advantages and drawbacks.

### Preemptive Multitasking

The idea behind preemptive multitasking is that the operating system controls when to switch tasks. For that, it utilizes the fact that it regains control of the CPU on each interrupt. This makes it possible to switch tasks whenever new input is available to the system. For example, it would be possible to switch tasks when the mouse is moved or a network packet arrives. The operating system can also determine the exact time that a task is allowed to run by configuring a hardware timer to send an interrupt after that time.

The following graphic illustrates the task switching process on a hardware interrupt:

![](regain-control-on-interrupt.svg)

In the first row, the CPU is executing task `A1` of program `A`. All other tasks are paused.
In the second row, a hardware interrupt arrives at the CPU. As described in the [_Hardware Interrupts_] post, the CPU immediately stops the execution of task `A1` and jumps to the interrupt handler defined in the interrupt descriptor table (IDT). Through this interrupt handler, the operating system now has control of the CPU again, which allows it to switch to task `B1` instead of continuing task `A1`.

[_Hardware Interrupts_]: @/edition-2/posts/07-hardware-interrupts/index.md

#### Saving State

Since tasks are interrupted at arbitrary points in time, they might be in the middle of some calculations. In order to be able to resume them later, the operating system must back up the whole state of the task, including its [_call stack_] and the values of all CPU registers. This process is called a [_context switch_].

[_call stack_]: https://ru.wikipedia.org/wiki/Стек_вызовов
[_context switch_]: https://ru.wikipedia.org/wiki/Переключение_контекста

As the call stack can be very large, the operating system typically sets up a separate call stack for each task instead of backing up the call stack content on each task switch. Such a task with its own stack is called a [_thread of execution_] or _thread_ for short. By using a separate stack for each task, only the register contents need to be saved on a context switch (including the program counter and stack pointer). This approach minimizes the performance overhead of a context switch, which is very important since context switches often occur up to 100 times per second.

[_thread of execution_]: https://en.wikipedia.org/wiki/Thread_(computing)

#### Discussion

The main advantage of preemptive multitasking is that the operating system can fully control the allowed execution time of a task.
This way, it can guarantee that each task gets a fair share of the CPU time, without the need to trust the tasks to cooperate. This is especially important when running third-party tasks or when multiple users share a system.

The disadvantage of preemption is that each task requires its own stack. Compared to a shared stack, this results in higher memory usage per task and often limits the number of tasks in the system. Another disadvantage is that the operating system always has to save the complete CPU register state on each task switch, even if the task only used a small subset of the registers.

Preemptive multitasking and threads are fundamental components of an operating system because they make it possible to run untrusted userspace programs. We will discuss these concepts in full detail in future posts. For now, however, we will focus on cooperative multitasking, which also provides useful capabilities for our kernel.

### Cooperative Multitasking

Instead of forcibly pausing running tasks at arbitrary points in time, cooperative multitasking lets each task run until it voluntarily gives up control of the CPU. This allows tasks to pause themselves at convenient points in time, for example, when they need to wait for an I/O operation anyway.

Cooperative multitasking is often used at the language level, for example in the form of [coroutines][сопрограмм] or [async/await]. The idea is that either the programmer or the compiler inserts [_yield_] operations into the program, which give up control of the CPU and allow other tasks to run. For example, a yield could be inserted after each iteration of a complex loop.
[сопрограмм]: https://ru.wikipedia.org/wiki/Сопрограмма
[async/await]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html
[_yield_]: https://en.wikipedia.org/wiki/Yield_(multithreading)

It is common to combine cooperative multitasking with [asynchronous operations]. Instead of waiting until an operation is finished and preventing other tasks from running during this time, asynchronous operations return a "not ready" status if the operation is not finished yet. In this case, the waiting task can execute a yield operation to let other tasks run.

[asynchronous operations]: https://ru.wikipedia.org/wiki/Асинхронный_ввод-вывод

#### Saving State

Since tasks define their pause points themselves, they don't need the operating system to save their state. Instead, they can save exactly the state they need for continuation before they pause themselves, which often results in better performance. For example, a task that just finished a complex computation might only need to back up the final result of the computation since it does not need the intermediate results anymore.

Language-supported implementations of cooperative multitasking are often even able to back up only the required parts of the call stack before pausing. For example, Rust's async/await implementation stores all local variables that are still needed in an automatically generated struct (see below). By backing up the relevant parts of the call stack before pausing, all tasks can share a single call stack, which results in much lower memory consumption per task. This makes it possible to create an almost arbitrary number of cooperative tasks without running out of memory.

#### Discussion

The drawback of cooperative multitasking is that an uncooperative task can potentially run for an unlimited amount of time. Thus, a malicious or buggy task can prevent other tasks from running and slow down or even block the whole system.
For this reason, cooperative multitasking should only be used when all tasks are known to cooperate. As a counterexample, it's not a good idea to make the operating system rely on the cooperation of arbitrary user-level programs.

However, the strong performance and memory benefits of cooperative multitasking make it a good approach for usage _within_ a program, especially in combination with asynchronous operations. Since an operating system kernel is a performance-critical program that interacts with asynchronous hardware, cooperative multitasking seems like a good approach for implementing concurrency.

## Async/Await in Rust

The Rust language provides first-class support for cooperative multitasking in the form of async/await. Before we can explore what async/await is and how it works, we need to understand how _futures_ and asynchronous programming work in Rust.

### Futures

A _future_ represents a value that might not be available yet. This could be, for example, an integer that is computed by another task or a file that is downloaded from the network. Instead of waiting until the value is available, futures make it possible to continue execution until the value is needed.

#### Example

The concept of futures is best illustrated with a small example:

![Sequence diagram: main calls `read_file` and is blocked until it returns; then it calls `foo()` and is also blocked until it returns. The same process is repeated, but this time `async_read_file` is called, which directly returns a future; then `foo()` is called again, which now runs concurrently with the file load. The file is available before `foo()` returns.](async-example.svg)

This sequence diagram shows a `main` function that reads a file from the file system and then calls a function `foo`.
This process is repeated two times: once with a synchronous `read_file` call and once with an asynchronous `async_read_file` call.

With the synchronous call, the `main` function needs to wait until the file is loaded from the file system. Only then can it call the `foo` function, which requires it to wait for the result again.

With the asynchronous `async_read_file` call, the file system directly returns a future and loads the file asynchronously in the background. This allows the `main` function to call `foo` much earlier, which then runs in parallel with the file load. In this example, the file load even finishes before `foo` returns, so `main` can directly work with the file without further waiting after `foo` returns.

#### Futures in Rust

In Rust, futures are represented by the [`Future`] trait, which looks like this:

[`Future`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html

```rust
pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>;
}
```

The [associated type] `Output` specifies the type of the asynchronous value. For example, the `async_read_file` function in the diagram above would return a `Future` instance with `Output` set to `File`.

[associated type]: https://doc.rust-lang.org/book/ch20-02-advanced-traits.html#associated-types

The [`poll`] method allows checking whether the value is already available. It returns a [`Poll`] enum, which looks like this:

[`poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll
[`Poll`]: https://doc.rust-lang.org/nightly/core/task/enum.Poll.html

```rust
pub enum Poll<T> {
    Ready(T),
    Pending,
}
```

When the value is already available (e.g. the file was fully read from disk), it is returned wrapped in the `Ready` variant. Otherwise, the `Pending` variant is returned, which signals to the caller that the value is not yet available.

The `poll` method takes two arguments: `self: Pin<&mut Self>` and `cx: &mut Context`.
The first argument behaves similarly to a normal `&mut self` reference, except that the `Self` value is [_pinned_] to its memory location. Understanding `Pin` and why it is needed is difficult without first understanding how async/await works. We will therefore explain it later in this post.

[_pinned_]: https://doc.rust-lang.org/nightly/core/pin/index.html

The purpose of the `cx: &mut Context` parameter is to pass a [`Waker`] instance to the asynchronous task, e.g., the file system load. This `Waker` allows the asynchronous task to signal that it (or a part of it) is finished, e.g., that the file was loaded from disk. Since the main task knows that it will be notified when the `Future` is ready, it does not need to call `poll` over and over again. We will explain this process in more detail later in this post when we implement our own waker type.

[`Waker`]: https://doc.rust-lang.org/nightly/core/task/struct.Waker.html

### Working with Futures

We now know how futures are defined and understand the basic idea behind the `poll` method. However, we still don't know how to work with futures effectively. The problem is that futures represent the results of asynchronous tasks, which might not be available yet. In practice, however, we often need these values directly for further calculations. So the question is: how can we efficiently retrieve the value of a future when we need it?

#### Waiting on Futures {#waiting-on-futures}

One possible answer is to wait until a future becomes ready. This could look something like this:

```rust
let future = async_read_file("foo.txt");
let file_content = loop {
    match future.poll(…) {
        Poll::Ready(value) => break value,
        Poll::Pending => {}, // do nothing
    }
}
```

Here we _actively_ wait for the future by calling `poll` over and over again in a loop. The arguments to `poll` are omitted because they don't matter here. While this solution works, it is very inefficient because we keep the CPU busy until the value becomes available.
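To make the busy loop above concrete, here is a minimal runnable sketch. It is not code from this post's kernel: it assumes a hosted `std` environment, and `NoopWaker`, `CountDown`, and `busy_wait` are hypothetical names introduced only for illustration. The sketch polls a future in a tight loop and counts how many `poll` calls were needed, which shows how much CPU time is spent just asking "are you ready yet?".

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing when woken; enough for busy polling.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical future that reports `Pending` a fixed number of times.
struct CountDown(u32);

impl Future for CountDown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.0 == 0 {
            Poll::Ready("done")
        } else {
            self.0 -= 1;
            Poll::Pending
        }
    }
}

/// Polls `future` in a tight loop; returns its output and the number of `poll` calls.
fn busy_wait<F: Future>(future: F) -> (F::Output, u32) {
    // Pin the future to this stack frame so that `poll` can be called on it.
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return (value, polls),
            Poll::Pending => {} // do nothing, just poll again
        }
    }
}
```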

A more efficient approach could be to _block_ the current thread until the future becomes available. This is, of course, only possible if you have threads, so this solution does not work for our kernel, at least not yet. Even on systems where blocking is supported, it is often not desired because it turns an asynchronous task into a synchronous task again, thereby inhibiting the potential performance benefits of parallel tasks.

#### Future Combinators

An alternative to waiting is to use future combinators. _Future combinators_ are methods like `map` that allow chaining and combining futures together, similar to the methods of the [`Iterator`] trait. Instead of waiting on the future, these combinators return a future themselves, which applies the mapping operation on `poll`.

[`Iterator`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html

For example, a simple `string_len` combinator for converting a `Future<Output = String>` to a `Future<Output = usize>` could look like this:

```rust
struct StringLen<F> {
    inner_future: F,
}

impl<F> Future for StringLen<F> where F: Future<Output = String> {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.inner_future.poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(string: impl Future<Output = String>)
    -> impl Future<Output = usize>
{
    StringLen {
        inner_future: string,
    }
}

// Usage
fn file_len() -> impl Future<Output = usize> {
    let file_content_future = async_read_file("foo.txt");
    string_len(file_content_future)
}
```

This code does not quite work because it does not handle [_pinning_], but it suffices as an example. The basic idea is that the `string_len` function wraps a given `Future` instance into a new `StringLen` struct, which also implements `Future`. When the wrapper future is polled, it polls the inner future. If the value is not ready yet, `Poll::Pending` is returned from the wrapper future too.
If the value is ready, the string is extracted from the `Poll::Ready` variant and its length is calculated. Afterward, the result is wrapped in `Poll::Ready` again and returned.

[_pinning_]: https://doc.rust-lang.org/stable/core/pin/index.html

With this `string_len` function, we can calculate the length of an asynchronous string without waiting for it. Since the function returns a `Future` again, the caller can't work directly on the returned value, but needs to use combinator functions again. This way, the whole call graph becomes asynchronous, and at some point (e.g., in the main function) we can efficiently wait for multiple futures at once.

Because manually writing combinator functions is difficult, they are usually provided by libraries. While the Rust standard library itself provides no combinator methods yet, the semi-official (and `no_std` compatible) [`futures`] crate does. Its [`FutureExt`] trait provides high-level combinator methods such as [`map`] or [`then`], which can be used to manipulate the result with arbitrary closures.

[`futures`]: https://docs.rs/futures/0.3.4/futures/
[`FutureExt`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html
[`map`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.map
[`then`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.then

##### Advantages

The big advantage of future combinators is that they keep the operations asynchronous. In combination with asynchronous I/O interfaces, this approach can lead to very high performance. The fact that future combinators are implemented as normal structs with trait implementations allows the compiler to optimize them aggressively. For more details, see the [_Zero-cost futures in Rust_] post, which announced the addition of futures to the Rust ecosystem.
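The point that a combinator is just a struct with a trait implementation can be made concrete with a variant of `string_len` that actually compiles. This is only a sketch under a simplifying assumption, not how the `futures` crate implements its combinators: the inner future is required to be `Unpin`, so that it can be re-pinned with the safe `Pin::new`. The `NoopWaker` type and the `poll_once` helper are hypothetical names added for illustration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Combinator struct wrapping an inner future that yields a `String`.
struct StringLen<F> {
    inner_future: F,
}

// Assumption: `F: Unpin` lets us re-pin the inner future with the safe
// `Pin::new`, which sidesteps the pinning problem for this sketch.
impl<F> Future for StringLen<F>
where
    F: Future<Output = String> + Unpin,
{
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match Pin::new(&mut self.inner_future).poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(string: impl Future<Output = String> + Unpin) -> impl Future<Output = usize> {
    StringLen { inner_future: string }
}

/// Hypothetical waker that ignores wake-ups.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper: polls a future exactly once with a no-op waker.
fn poll_once<F: Future + Unpin>(mut future: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    Pin::new(&mut future).poll(&mut cx)
}
```

Polling `string_len(std::future::ready(String::from("hello")))` once through `poll_once` yields `Poll::Ready(5)`, because the inner future is ready immediately and the wrapper only maps its output.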
##### Drawbacks {#drawbacks}

[_Zero-cost futures in Rust_]: https://aturon.github.io/blog/2016/08/11/futures/

While future combinators make it possible to write very efficient code, they can be difficult to use in some situations because of the type system and the closure-based interface. For example, consider code like this:

```rust
fn example(min_len: usize) -> impl Future<Output = String> {
    async_read_file("foo.txt").then(move |content| {
        if content.len() < min_len {
            Either::Left(async_read_file("bar.txt").map(|s| content + &s))
        } else {
            Either::Right(future::ready(content))
        }
    })
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=91fc09024eecb2448a85a7ef6a97b8d8))

Here we read the file `foo.txt` and then use the [`then`] combinator to chain a second future based on the file content. If the content length is smaller than the given `min_len`, we read a different `bar.txt` file and append it to `content` using the [`map`] combinator. Otherwise, we return only the content of `foo.txt`.

We need to use the [`move` keyword] for the closure passed to `then` because otherwise there would be a lifetime error for `min_len`. The reason for the [`Either`] wrapper is that `if` and `else` blocks must always have the same type. Since we return different future types in the two blocks, we must use the wrapper type to unify them into a single type. The [`ready`] function wraps a value into a future that is immediately ready. The function is required here because the `Either` wrapper expects that the wrapped value implements `Future`.

[`move` keyword]: https://doc.rust-lang.org/std/keyword.move.html
[`Either`]: https://docs.rs/futures/0.3.4/futures/future/enum.Either.html
[`ready`]: https://docs.rs/futures/0.3.4/futures/future/fn.ready.html

As you can imagine, this can quickly lead to very complex code, especially in larger projects.
It gets even more complicated if borrowing and different lifetimes are involved. For this reason, a lot of work was invested in adding support for async/await to Rust, with the goal of making asynchronous code radically simpler to write.

### The Async/Await Pattern

The idea behind async/await is to let the programmer write code that _looks_ like normal synchronous code, but is turned into asynchronous code by the compiler. It works based on the two keywords `async` and `await`. The `async` keyword can be used in a function signature to turn a synchronous function into an asynchronous function that returns a future:

```rust
async fn foo() -> u32 {
    0
}

// the above is roughly translated by the compiler to:
fn foo() -> impl Future<Output = u32> {
    future::ready(0)
}
```

This keyword alone wouldn't be that useful. However, inside `async` functions, the `await` keyword can be used to retrieve the asynchronous value of a future:

```rust
async fn example(min_len: usize) -> String {
    let content = async_read_file("foo.txt").await;
    if content.len() < min_len {
        content + &async_read_file("bar.txt").await
    } else {
        content
    }
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=d93c28509a1c67661f31ff820281d434))

This function is a direct translation of the `example` function from [above](#drawbacks) that used combinator functions. Using the `.await` operator, we can retrieve the value of a future without needing any closures or `Either` types. As a result, we can write our code like normal synchronous code, with the difference that _this is still asynchronous code_.

#### State Machine Transformation

Behind the scenes, the compiler converts the body of the `async` function into a [_state machine_], with each `.await` call representing a different state. For the above `example` function, the compiler creates a state machine with the following four states:

![Four states: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-states.svg)

[_state machine_]: https://en.wikipedia.org/wiki/Finite-state_machine

Each state represents a different pause point in the function. The _"start"_ and _"end"_ states represent the function at the beginning and end of its execution. The _"waiting on foo.txt"_ state represents that the function is currently waiting for the first `async_read_file` result. Similarly, the _"waiting on bar.txt"_ state represents the pause point where the function is waiting for the second `async_read_file` result.

The state machine implements the `Future` trait by making each `poll` call a possible state transition:

![Four states and their transitions: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-basic.svg)

The diagram uses arrows to represent state switches and diamond shapes to represent alternative paths. For example, if the `foo.txt` file is not ready, the path marked with _"no"_ is taken and the _"waiting on foo.txt"_ state is reached. Otherwise, the _"yes"_ path is taken. The small red diamond without a caption represents the `if content.len() < 100` branch of the `example` function.

We see that the first `poll` call starts the function and lets it run until it reaches a future that is not ready yet. If all futures on the path are ready, the function can run until the _"end"_ state, where it returns its result wrapped in `Poll::Ready`. Otherwise, the state machine enters a waiting state and returns `Poll::Pending`. On the next `poll` call, the state machine starts from the last waiting state and retries the last operation.

#### Saving State

In order to continue from the last waiting state, the state machine must keep track of the current state internally. In addition, it must save all the variables that it needs to continue execution on the next `poll` call.
This is where the compiler can really shine: since it knows which variables are used when, it can automatically generate structs with exactly the variables that are needed.

As an example, the compiler generates structs like the following for the above `example` function:

```rust
// The `example` function again so that you don't have to scroll up
async fn example(min_len: usize) -> String {
    let content = async_read_file("foo.txt").await;
    if content.len() < min_len {
        content + &async_read_file("bar.txt").await
    } else {
        content
    }
}

// The compiler-generated state structs:

struct StartState {
    min_len: usize,
}

struct WaitingOnFooTxtState {
    min_len: usize,
    foo_txt_future: impl Future<Output = String>,
}

struct WaitingOnBarTxtState {
    content: String,
    bar_txt_future: impl Future<Output = String>,
}

struct EndState {}
```

In the _"start"_ and _"waiting on foo.txt"_ states, the `min_len` parameter needs to be stored for the later comparison with `content.len()`. The _"waiting on foo.txt"_ state additionally stores a `foo_txt_future`, which represents the future returned by the `async_read_file` call. This future needs to be polled again when the state machine continues, so it needs to be saved.

The _"waiting on bar.txt"_ state contains the `content` variable for the later string concatenation once `bar.txt` is loaded. It also stores a `bar_txt_future` that represents the in-progress load of `bar.txt`. The struct does not contain the `min_len` variable because it is no longer needed after the `content.len()` comparison. In the _"end"_ state, no variables are stored because the function has already run to completion.

Keep in mind that this is only an example of the code that the compiler could generate. The struct names and the field layout are implementation details and might be different.

#### The Full State Machine Code

While the exact compiler-generated code is an implementation detail, it helps in understanding to imagine how the generated state machine _could_ look for the `example` function.
We already defined the structs representing the different states and containing the required variables. To create a state machine on top of them, we can combine them into an [`enum`]:

[`enum`]: https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html

```rust
enum ExampleStateMachine {
    Start(StartState),
    WaitingOnFooTxt(WaitingOnFooTxtState),
    WaitingOnBarTxt(WaitingOnBarTxtState),
    End(EndState),
}
```

We define a separate enum variant for each state and add the corresponding state struct to each variant as a field. To implement the state transitions, the compiler generates an implementation of the `Future` trait based on the `example` function:

```rust
impl Future for ExampleStateMachine {
    type Output = String; // return type of `example`

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        loop {
            match self { // TODO: handle pinning
                ExampleStateMachine::Start(state) => {…}
                ExampleStateMachine::WaitingOnFooTxt(state) => {…}
                ExampleStateMachine::WaitingOnBarTxt(state) => {…}
                ExampleStateMachine::End(state) => {…}
            }
        }
    }
}
```

The `Output` type of the future is `String` because that is the return type of the `example` function. To implement the `poll` function, we use a `match` statement on the current state inside a `loop`. The idea is that we switch to the next state as long as possible and use an explicit `return Poll::Pending` when we can't continue.

For simplicity, we only show simplified code and don't handle [pinning][_pinning_], ownership, lifetimes, etc. So this and the following code should be treated as pseudo-code and not used directly. Of course, the real compiler-generated code handles everything correctly, albeit possibly in a different way.

To keep the code excerpts small, we present the code for each `match` arm separately.
Let's begin with the `Start` state:

```rust
ExampleStateMachine::Start(state) => {
    // from body of `example`
    let foo_txt_future = async_read_file("foo.txt");
    // `.await` operation
    let state = WaitingOnFooTxtState {
        min_len: state.min_len,
        foo_txt_future,
    };
    *self = ExampleStateMachine::WaitingOnFooTxt(state);
}
```

The state machine is in the `Start` state when it is right at the beginning of the function. In this case, we execute all the code from the body of the `example` function until the first `.await`. To handle the `.await` operation, we change the state of the `self` state machine to `WaitingOnFooTxt`, which includes the construction of the `WaitingOnFooTxtState` struct.

Since the `match self {…}` statement is executed in a loop, the execution jumps to the `WaitingOnFooTxt` arm next:

```rust
ExampleStateMachine::WaitingOnFooTxt(state) => {
    match state.foo_txt_future.poll(cx) {
        Poll::Pending => return Poll::Pending,
        Poll::Ready(content) => {
            // from body of `example`
            if content.len() < state.min_len {
                let bar_txt_future = async_read_file("bar.txt");
                // `.await` operation
                let state = WaitingOnBarTxtState {
                    content,
                    bar_txt_future,
                };
                *self = ExampleStateMachine::WaitingOnBarTxt(state);
            } else {
                *self = ExampleStateMachine::End(EndState);
                return Poll::Ready(content);
            }
        }
    }
}
```

In this `match` arm, we first call the `poll` function of `foo_txt_future`. If it is not ready, we exit the loop and return `Poll::Pending`. Since `self` stays in the `WaitingOnFooTxt` state in this case, the next `poll` call on the state machine will enter the same `match` arm and retry polling `foo_txt_future`.

When `foo_txt_future` is ready, we assign the result to the `content` variable and continue to execute the code of the `example` function: if `content.len()` is smaller than the `min_len` saved in the state struct, the `bar.txt` file is read asynchronously. We again translate the `.await` operation into a state change, this time into the `WaitingOnBarTxt` state. Since we're executing the `match` inside a loop, the execution jumps directly to the `match` arm for the new state afterward, where `bar_txt_future` is polled.

In case we enter the `else` branch, no further `.await` operation occurs. We reach the end of the function and return `content` wrapped in `Poll::Ready`. We also change the current state to the `End` state.

The code for the `WaitingOnBarTxt` state looks like this:

```rust
ExampleStateMachine::WaitingOnBarTxt(state) => {
    match state.bar_txt_future.poll(cx) {
        Poll::Pending => return Poll::Pending,
        Poll::Ready(bar_txt) => {
            *self = ExampleStateMachine::End(EndState);
            // from body of `example`
            return Poll::Ready(state.content + &bar_txt);
        }
    }
}
```

Similar to the `WaitingOnFooTxt` state, we start by polling `bar_txt_future`. If it is still pending, we exit the loop and return `Poll::Pending`. Otherwise, we can perform the last operation of the `example` function: concatenating the `content` variable with the result from the future. We update the state machine to the `End` state and then return the result wrapped in `Poll::Ready`.

Finally, the code for the `End` state looks like this:

```rust
ExampleStateMachine::End(_) => {
    panic!("poll called after Poll::Ready was returned");
}
```

Futures should not be polled again after they returned `Poll::Ready`, so we panic if `poll` is called while we are already in the `End` state.

We now know what the compiler-generated state machine and its implementation of the `Future` trait _could_ look like. In practice, the compiler generates code in a different way. (In case you're interested, the implementation is currently based on [_coroutines_], but this is only an implementation detail.)

[_coroutines_]: https://doc.rust-lang.org/stable/unstable-book/language-features/coroutines.html

The last piece of the puzzle is the generated code for the `example` function itself. Remember, the function header was defined like this:

```rust
async fn example(min_len: usize) -> String
```

Since the complete function body is now implemented by the state machine, the only thing that the function needs to do is to initialize the state machine and return it.
The generated code for this could look like this:

```rust
fn example(min_len: usize) -> ExampleStateMachine {
    ExampleStateMachine::Start(StartState {
        min_len,
    })
}
```

The function no longer has an `async` modifier since it now explicitly returns an `ExampleStateMachine` type, which implements the `Future` trait. As expected, the state machine is constructed in the `Start` state and the corresponding state struct is initialized with the `min_len` parameter.

Note that this function does not start the execution of the state machine. This is a fundamental design decision of futures in Rust: they do nothing until they are polled for the first time.

#### Pinning {#pinning}

We already stumbled across _pinning_ multiple times in this post. Now is finally the time to explore what pinning is and why it is needed.

#### Self-Referential Structs

As explained above, the state machine transformation stores the local variables of each pause point in a struct. For small examples like our `example` function, this was straightforward and did not lead to any problems. However, things become more difficult when variables reference each other. For example, consider this function:

```rust
async fn pin_example() -> i32 {
    let array = [1, 2, 3];
    let element = &array[2];
    async_write_file("foo.txt", element.to_string()).await;
    *element
}
```

This function creates a small `array` with the contents `1`, `2`, and `3`. It then creates a reference to the last array element and stores it in an `element` variable. Next, it asynchronously writes the number converted to a string to a `foo.txt` file. Finally, it returns the number referenced by `element`.

Since the function uses a single `await` operation, the resulting state machine has three states: start, end, and "waiting on write". The function takes no arguments, so the struct for the start state is empty. Like before, the struct for the end state is empty because the function is finished at this point.
The struct for the "waiting on write" state is more interesting:

```rust
struct WaitingOnWriteState {
    array: [1, 2, 3],
    element: 0x1001c, // address of the last array element
}
```

We need to store both the `array` and `element` variables because `element` is required for the return value and `array` is referenced by `element`. Since `element` is a reference, it stores a _pointer_ (i.e., a memory address) to the referenced element. We used `0x1001c` as an example memory address here. In reality, it needs to be the address of the last element of the `array` field, so it depends on where the struct lives in memory. Structs with such internal pointers are called _self-referential_ structs because they reference themselves from one of their fields.

#### The Problem with Self-Referential Structs

The internal pointer of our self-referential struct leads to a fundamental problem, which becomes apparent when we look at its memory layout:

![array at 0x10014 with fields 1, 2, and 3; element at address 0x10020, pointing to the last array element at 0x1001c](self-referential-struct.svg)

The `array` field starts at address `0x10014` and the `element` field at address `0x10020`. It points to address `0x1001c` because the last array element lives at this address. At this point, everything is still fine. However, an issue occurs when we move this struct to a different memory address:

![array at 0x10024 with fields 1, 2, and 3; element at address 0x10030, still pointing to 0x1001c, even though the last array element now lives at 0x1002c](self-referential-struct-moved.svg)

We moved the struct a bit so that it now starts at address `0x10024`. This could, for example, happen when we pass the struct as a function argument or assign it to a different stack variable. The problem is that the `element` field still points to address `0x1001c` even though the last `array` element now lives at address `0x1002c`.
Thus, the pointer is dangling, i.e., it points to an invalid object, with the result that undefined behavior occurs on the next `poll` call.

#### Possible Solutions

There are three fundamental approaches to solving the dangling pointer problem:

- **Update the pointer on move**: The idea is to update the internal pointer whenever the struct is moved in memory so that it is still valid after the move. Unfortunately, this approach would require extensive changes to Rust that would result in potentially huge performance losses. The reason is that some kind of runtime would need to keep track of the type of all struct fields and check on every move operation whether a pointer update is required.
- **Store an offset instead of self-references**: To avoid the requirement for updating pointers, the compiler could try to store self-references as offsets from the struct's beginning instead of direct references. For example, the `element` field of the above `WaitingOnWriteState` struct could be stored in the form of an `element_offset` field with a value of 8 because the array element that the reference points to starts 8 bytes after the struct's beginning. Since the offset stays the same when the struct is moved, no field updates are required. The problem with this approach is that it requires the compiler to detect all self-references. This is not possible at compile time because the value of a reference might depend on user input, so we would need a runtime system again to analyze references and correctly create the state structs. This would not only result in runtime costs but also prevent certain compiler optimizations, so that it would cause large performance losses again.
- **Forbid moving the struct**: As we saw above, the dangling pointer only occurs when we move the struct in memory. By completely forbidding move operations on self-referential structs, the problem can also be avoided.
  The big advantage of this approach is that it can be implemented at the type system level without additional runtime costs. The drawback is that it puts the burden of dealing with move operations on possibly self-referential structs on the programmer.

Rust chose the third solution because of its principle of providing _zero-cost abstractions_, which means that abstractions should not impose additional runtime costs. The [_pinning_] API was proposed for this purpose in [RFC 2349](). In the following, we will give a short overview of this API and explain how it works with async/await and futures.

#### Heap Values

The first observation is that [heap-allocated] values already have a fixed memory address most of the time. They are created using a call to `allocate` and then referenced by a pointer type such as `Box<T>`. While moving the pointer type is possible, the heap value that the pointer points to stays at the same memory address until it is freed through a `deallocate` call.

[heap-allocated]: @/edition-2/posts/10-heap-allocation/index.md

Using heap allocation, we can try to create a self-referential struct:

```rust
fn main() {
    let mut heap_value = Box::new(SelfReferential {
        self_ptr: 0 as *const _,
    });
    let ptr = &*heap_value as *const SelfReferential;
    heap_value.self_ptr = ptr;
    println!("heap value at: {:p}", heap_value);
    println!("internal reference: {:p}", heap_value.self_ptr);
}

struct SelfReferential {
    self_ptr: *const Self,
}
```

([Try it on the playground][playground-self-ref])

[playground-self-ref]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=ce1aff3a37fcc1c8188eeaf0f39c97e8

We create a simple struct named `SelfReferential` that contains a single pointer field. First, we initialize this struct with a null pointer and then allocate it on the heap using `Box::new`. We then determine the memory address of the heap-allocated struct and store it in a `ptr` variable.
Finally, we make the struct self-referential by assigning the `ptr` variable to the `self_ptr` field.

When we execute this code [on the playground][playground-self-ref], we see that the address of the heap value and its internal pointer are equal, which means that the `self_ptr` field is a valid self-reference. Since the `heap_value` variable is only a pointer, moving it (e.g., by passing it to a function) does not change the address of the struct itself, so `self_ptr` stays valid even if the pointer is moved.

However, there is still a way to break this example: we can move out of a `Box<T>` or replace its content:

```rust
let stack_value = mem::replace(&mut *heap_value, SelfReferential {
    self_ptr: 0 as *const _,
});
println!("value at: {:p}", &stack_value);
println!("internal reference: {:p}", stack_value.self_ptr);
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=e160ee8a64cba4cebc1c0473dcecb7c8))

Here we use the [`mem::replace`] function to replace the heap-allocated value with a new struct instance. This allows us to move the original `heap_value` to the stack, while the `self_ptr` field of the struct is now a dangling pointer that still points to the old heap address. When you run the example on the playground, you see that the printed _"value at:"_ and _"internal reference:"_ lines indeed show different pointers. So heap allocating a value is not enough to make self-references safe.

[`mem::replace`]: https://doc.rust-lang.org/nightly/core/mem/fn.replace.html

The fundamental problem that allowed the above breakage is that `Box<T>` allows us to get a `&mut T` reference to the heap-allocated value. This `&mut` reference makes it possible to use methods like [`mem::replace`] or [`mem::swap`] to invalidate the heap-allocated value. To resolve this problem, we must prevent `&mut` references to self-referential structs from being created.
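The same breakage can be reproduced with `mem::swap`. The following is a small sketch rather than code from this post's kernel: the helper names `new_boxed`, `is_self_referential`, and `swap_breaks_self_reference` are hypothetical and only serve the illustration. It creates two boxed self-referential values, swaps their contents through `&mut` references, and checks whether the internal pointers still point at their own struct.

```rust
use std::mem;

struct SelfReferential {
    self_ptr: *const SelfReferential,
}

/// Hypothetical helper: allocates the struct on the heap and makes it point to itself.
fn new_boxed() -> Box<SelfReferential> {
    let mut boxed = Box::new(SelfReferential {
        self_ptr: std::ptr::null(),
    });
    let ptr = &*boxed as *const SelfReferential;
    boxed.self_ptr = ptr;
    boxed
}

/// Returns true if the internal pointer still refers to the struct itself.
fn is_self_referential(value: &SelfReferential) -> bool {
    std::ptr::eq(value.self_ptr, value)
}

/// Returns (valid before the swap, valid after the swap).
fn swap_breaks_self_reference() -> (bool, bool) {
    let mut a = new_boxed();
    let mut b = new_boxed();
    let before = is_self_referential(&a) && is_self_referential(&b);
    // `&mut *a` and `&mut *b` are exactly the `&mut T` references that `Box<T>` hands out.
    mem::swap(&mut *a, &mut *b);
    // Each struct now holds a pointer to the *other* heap allocation.
    let after = is_self_referential(&a) || is_self_referential(&b);
    (before, after)
}
```

No `unsafe` is needed to break the invariant here, which is why access to `&mut T` itself has to be restricted.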
#### `Pin<Box<T>>` and `Unpin`

[`mem::swap`]: https://doc.rust-lang.org/nightly/core/mem/fn.swap.html

The pinning API provides a solution to the `&mut T` problem in the form of the [`Pin`] wrapper type and the [`Unpin`] marker trait. The idea behind these types is to gate all methods of `Pin` that can be used to get `&mut` references to the wrapped value (e.g., [`get_mut`][pin-get-mut] or [`deref_mut`][pin-deref-mut]) on the `Unpin` trait. The `Unpin` trait is an [_auto trait_], which is automatically implemented for all types except those that explicitly opt out. By making self-referential structs opt out of `Unpin`, there is no (safe) way to get a `&mut T` from a `Pin<Box<T>>` type for them. As a result, their internal self-references are guaranteed to stay valid.

[`Pin`]: https://doc.rust-lang.org/stable/core/pin/struct.Pin.html
[`Unpin`]: https://doc.rust-lang.org/nightly/std/marker/trait.Unpin.html
[pin-get-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_mut
[pin-deref-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.deref_mut
[_auto trait_]: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits

As an example, let's update the `SelfReferential` type from above to opt out of `Unpin`:

```rust
use core::marker::PhantomPinned;

struct SelfReferential {
    self_ptr: *const Self,
    _pin: PhantomPinned,
}
```

We opt out by adding a second `_pin` field of type [`PhantomPinned`]. This type is a zero-sized marker type whose only purpose is to _not_ implement the `Unpin` trait. Because of the way [auto traits][_auto trait_] work, a single field that is not `Unpin` suffices to make the complete struct opt out of `Unpin`.

[`PhantomPinned`]: https://doc.rust-lang.org/nightly/core/marker/struct.PhantomPinned.html

The second step is to change the `Box<SelfReferential>` type in the example to a `Pin<Box<SelfReferential>>` type.
The easiest way to do this is to use the [`Box::pin`] function instead of [`Box::new`] for creating the heap-allocated value:

[`Box::pin`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.pin
[`Box::new`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.new

```rust
let mut heap_value = Box::pin(SelfReferential {
    self_ptr: 0 as *const _,
    _pin: PhantomPinned,
});
```

In addition to changing `Box::new` to `Box::pin`, we also need to add the new `_pin` field in the struct initializer. Since `PhantomPinned` is a zero-sized type, we only need its type name to initialize it.

When we [try to run our adjusted example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=961b0db194bbe851ff4d0ed08d3bd98a) now, we see that it no longer works:

```
error[E0594]: cannot assign to data in dereference of `Pin<Box<SelfReferential>>`
  --> src/main.rs:10:5
   |
10 |     heap_value.self_ptr = ptr;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot assign
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `std::pin::Pin<Box<SelfReferential>>`

error[E0596]: cannot borrow data in dereference of `Pin<Box<SelfReferential>>` as mutable
  --> src/main.rs:16:36
   |
16 |     let stack_value = mem::replace(&mut *heap_value, SelfReferential {
   |                                    ^^^^^^^^^^^^^^^^ cannot borrow as mutable
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin<Box<SelfReferential>>`
```

Both errors occur because the `Pin<Box<SelfReferential>>` type no longer implements the `DerefMut` trait. This is exactly what we wanted because the `DerefMut` trait would return a `&mut` reference, which we wanted to prevent. This only happens because we both opted out of `Unpin` and changed `Box::new` to `Box::pin`.

The problem now is that the compiler does not only prevent moving the type in line 16, but also forbids initializing the `self_ptr` field in line 10. This happens because the compiler can't differentiate between valid and invalid uses of `&mut` references.
To get the initialization working again, we have to use the unsafe [`get_unchecked_mut`] method:

[`get_unchecked_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_unchecked_mut

```rust
// safe because modifying a field doesn't move the whole struct
unsafe {
    let mut_ref = Pin::as_mut(&mut heap_value);
    Pin::get_unchecked_mut(mut_ref).self_ptr = ptr;
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=b9ebbb11429d9d79b3f9fffe819e2018))

The [`get_unchecked_mut`] function works on a `Pin<&mut T>` instead of a `Pin<Box<T>>`, so we have to use [`Pin::as_mut`] for converting the value. Then we can set the `self_ptr` field using the `&mut` reference returned by `get_unchecked_mut`.

[`Pin::as_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.as_mut

Now the only error left is the desired error on `mem::replace`. Remember, this operation tries to move the heap-allocated value to the stack, which would break the self-reference stored in the `self_ptr` field. By opting out of `Unpin` and using `Pin<Box<T>>`, we can prevent this operation at compile time and thus safely work with self-referential structs. As we saw, the compiler is not able to prove that the creation of the self-reference is safe (yet), so we need to use an unsafe block and verify the correctness ourselves.

#### Stack Pinning and `Pin<&mut T>`

In the previous section, we learned how to use `Pin<Box<T>>` to safely create a heap-allocated self-referential value. While this approach works fine and is relatively safe (apart from the unsafe construction), the required heap allocation comes with a performance cost. Since Rust strives to provide _zero-cost abstractions_ whenever possible, the pinning API also allows creating `Pin<&mut T>` instances that point to stack-allocated values.
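To give a rough idea of what a stack-pinned value looks like, here is a minimal hosted (`std`) sketch. It assumes a toolchain that provides the `pin!` macro in `std::pin`; the `NotUnpin` type and both function names are made up for illustration and are not part of the kernel:

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

// Hypothetical type that opts out of `Unpin`, like `SelfReferential` above.
struct NotUnpin {
    value: u32,
    _pin: PhantomPinned,
}

// Shared access through a pinned reference is always allowed.
fn read_pinned(pinned: Pin<&NotUnpin>) -> u32 {
    pinned.value
}

fn pin_on_stack() -> u32 {
    // `pin!` keeps the value in the current stack frame and hands out
    // a `Pin<&mut NotUnpin>` that borrows it; no heap allocation happens.
    let pinned: Pin<&mut NotUnpin> = pin!(NotUnpin {
        value: 7,
        _pin: PhantomPinned,
    });
    // `Pin::get_mut` would not compile here because `NotUnpin` is not `Unpin`.
    read_pinned(pinned.as_ref())
}
```

The pinned reference cannot outlive the function, since the value it points to lives in that function's stack frame.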
Unlike `Pin<Box<T>>` instances, which have _ownership_ of the wrapped value, `Pin<&mut T>` instances only temporarily borrow the wrapped value. This makes things more complicated, as it requires the programmer to ensure additional guarantees themselves. Most importantly, a `Pin<&mut T>` must stay pinned for the whole lifetime of the referenced `T`, which can be difficult to verify for stack-based variables. To help with this, crates like [`pin-utils`] exist, but I still wouldn't recommend pinning to the stack unless you really know what you're doing.

[`pin-utils`]: https://docs.rs/pin-utils/0.1.0-alpha.4/pin_utils/

For further reading, check out the documentation of the [`pin` module] and the [`Pin::new_unchecked`] method.

[`pin` module]: https://doc.rust-lang.org/nightly/core/pin/index.html
[`Pin::new_unchecked`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.new_unchecked

#### Pinning and Futures

As we already saw in this post, the [`Future::poll`] method uses pinning in the form of a `Pin<&mut Self>` parameter:

[`Future::poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll

```rust
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>
```

The reason that this method takes `self: Pin<&mut Self>` instead of the normal `&mut self` is that future instances created from async/await are often self-referential, as we saw [above][self-ref-async-await]. By wrapping `Self` into `Pin` and letting the compiler opt out of `Unpin` for self-referential futures generated from async/await, it is guaranteed that the futures are not moved in memory between `poll` calls. This ensures that all internal references are still valid.

[self-ref-async-await]: @/edition-2/posts/12-async-await/index.md#self-referential-structs

It is worth noting that moving futures before the first `poll` call is fine. As mentioned above, futures are lazy and do nothing until they're polled for the first time.
The `start` state of the generated state machines therefore only contains the function arguments but no internal references. In order to call `poll`, the caller must wrap the future into `Pin` first, which ensures that the future cannot be moved in memory anymore. Since stack pinning is more difficult to get right, I recommend always using [`Box::pin`] combined with [`Pin::as_mut`] for this.

[`futures`]: https://docs.rs/futures/0.3.4/futures/

In case you're interested in how to safely implement a future combinator function using stack pinning yourself, take a look at the relatively short [source of the `map` combinator method][map-src] of the [`futures`] crate and the section about [projections and structural pinning] of the pin documentation.

[map-src]: https://docs.rs/futures-util/0.3.4/src/futures_util/future/future/map.rs.html
[projections and structural pinning]: https://doc.rust-lang.org/stable/std/pin/index.html#projections-and-structural-pinning

### Executors and Wakers

Using async/await, it is possible to ergonomically work with futures in a completely asynchronous way. However, as we learned above, futures do nothing until they are polled. This means we have to call `poll` on them at some point, otherwise the asynchronous code is never executed.

With a single future, we can always wait for it manually using a loop [as described above](#waiting-on-futures). However, this approach is very inefficient and not practical for programs that create a large number of futures. The most common solution to this problem is to define a global _executor_ that is responsible for polling all futures in the system until they are finished.

#### Executors

The purpose of an executor is to allow spawning futures as independent tasks, typically through some sort of `spawn` method. The executor is then responsible for polling all futures until they are completed.
The big advantage of managing all futures in a central place is that the executor can switch to a different future whenever a future returns `Poll::Pending`. Thus, asynchronous operations are run concurrently and the CPU is kept busy.

Many executor implementations can also take advantage of systems with multiple CPU cores. They create a [thread pool] that is able to utilize all cores if there is enough work available and use techniques such as [work stealing] to balance the load between cores. There are also special executor implementations for embedded systems that optimize for low latency and memory overhead.

[thread pool]: https://en.wikipedia.org/wiki/Thread_pool
[work stealing]: https://en.wikipedia.org/wiki/Work_stealing

To avoid the overhead of polling futures repeatedly, executors typically take advantage of the _waker_ API supported by Rust's futures.

#### Wakers

The idea behind the waker API is that a special [`Waker`] type is passed to each invocation of `poll`, wrapped in the [`Context`] type. This `Waker` type is created by the executor and can be used by the asynchronous task to signal its (partial) completion. As a result, the executor does not need to call `poll` on a future that previously returned `Poll::Pending` until it is notified by the corresponding waker.

[`Context`]: https://doc.rust-lang.org/nightly/core/task/struct.Context.html

This is best illustrated by a small example:

```rust
async fn write_file() {
    async_write_file("foo.txt", "Hello").await;
}
```

This function asynchronously writes the string "Hello" to a `foo.txt` file. Since hard disk writes take some time, the first `poll` call on this future will likely return `Poll::Pending`. However, the hard disk driver will internally store the `Waker` passed to the `poll` call and use it to notify the executor when the file is written to disk.
This way, the executor does not need to waste any time trying to `poll` the future again before it receives the waker notification.

We will see how the `Waker` type works in detail when we create our own executor with waker support in the implementation section of this post.

### Cooperative Multitasking?

At the beginning of this post, we talked about preemptive and cooperative multitasking. While preemptive multitasking relies on the operating system to forcibly switch between running tasks, cooperative multitasking requires that the tasks voluntarily give up control of the CPU through a _yield_ operation on a regular basis. The big advantage of the cooperative approach is that tasks can save their state themselves, which results in more efficient context switches and makes it possible to share the same call stack between tasks.

It might not be immediately apparent, but futures and async/await are an implementation of the cooperative multitasking pattern:

- Each future that is added to the executor is basically a cooperative task.
- Instead of using an explicit yield operation, futures give up control of the CPU core by returning `Poll::Pending` (or `Poll::Ready` at the end).
  - There is nothing that forces futures to give up the CPU. If they want, they can never return from `poll`, e.g., by spinning endlessly in a loop.
  - Since each future can block the execution of the other futures in the executor, we need to trust them to not be malicious.
- Futures internally store all the state they need to continue execution on the next `poll` call. With async/await, the compiler automatically detects all variables that are needed and stores them inside the generated state machine.
  - Only the minimum state required for continuation is saved.
  - Since the `poll` method gives up the call stack when it returns, the same stack can be used for polling other futures.

We see that futures and async/await fit the cooperative multitasking pattern perfectly; they just use some different terminology. In the following, we will therefore use the terms "task" and "future" interchangeably.

## Implementation

Now that we understand how cooperative multitasking based on futures and async/await works in Rust, it's time to add support for it to our kernel. Since the [`Future`] trait is part of the `core` library and async/await is a feature of the language itself, there is nothing special we need to do to use it in our `#![no_std]` kernel. The only requirement is that we use at least nightly `2020-03-25` of Rust because async/await was not `no_std` compatible before.

With a recent-enough nightly, we can start using async/await in our `main.rs`:

```rust
// src/main.rs

async fn async_number() -> u32 {
    42
}

async fn example_task() {
    let number = async_number().await;
    println!("async number: {}", number);
}
```

The `async_number` function is an `async fn`, so the compiler transforms it into a state machine that implements `Future`. Since the function only returns `42`, the resulting future will directly return `Poll::Ready(42)` on the first `poll` call. Like `async_number`, the `example_task` function is also an `async fn`. It awaits the number returned by `async_number` and then prints it using the `println` macro.

To run the future returned by `example_task`, we need to call `poll` on it until it signals its completion by returning `Poll::Ready`. To do this, we need to create a simple executor type.
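Before building a real executor type, it may help to see what "calling `poll` until it returns `Poll::Ready`" amounts to in its barest form. The following is only a hosted (`std`) sketch under a few assumptions: `poll_until_ready` and `NoopWake` are hypothetical names, and the do-nothing waker is built with the `std::task::Wake` trait, which relies on `Arc` and is not the approach this post takes for the kernel:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical waker that ignores all notifications.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

async fn async_number() -> u32 {
    42
}

// Busy-polls a single future until it completes (inefficient on purpose).
fn poll_until_ready<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut context = Context::from_waker(&waker);
    // Pin the future so that it can no longer move between `poll` calls.
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut context) {
            return value;
        }
    }
}
```

Calling `poll_until_ready(async_number())` drives the future to completion with a single `poll` call, since `async_number` never returns `Poll::Pending`.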
### Tasks

Before we start the executor implementation, we create a new `task` module with a `Task` type:

```rust
// src/lib.rs

pub mod task;
```

```rust
// src/task/mod.rs

use core::{future::Future, pin::Pin};
use alloc::boxed::Box;

pub struct Task {
    future: Pin<Box<dyn Future<Output = ()>>>,
}
```

The `Task` struct is a wrapper around a _pinned_, _heap-allocated_, and _dynamically dispatched_ future with the empty type `()` as output. Let's go through it in detail:

- We require that the future associated with a task returns `()`. This means that tasks don't return any result, they are just executed for their side effects. For example, the `example_task` function we defined above has no return value, but it prints something to the screen as a side effect.
- The `dyn` keyword indicates that we store a [_trait object_] in the `Box`. This means that the methods on the future are dynamically dispatched, allowing different types of futures to be stored in the `Task` type. This is important because each `async fn` has its own type and we want to be able to create multiple different tasks.
- As we learned in the [section about pinning](#pinning), the `Pin<Box>` type ensures that a value cannot be moved in memory by placing it on the heap and preventing the creation of `&mut` references to it. This is important because futures generated by async/await might be self-referential, i.e., contain pointers to themselves that would be invalidated when the future is moved.
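The point about each `async fn` having its own type can be illustrated with a small hosted (`std`) sketch. The `BoxedTask` alias and the functions `first`, `second`, and `boxed_tasks` are hypothetical names chosen for this example:

```rust
use std::future::Future;
use std::pin::Pin;

// Same shape as the `future` field of `Task`.
type BoxedTask = Pin<Box<dyn Future<Output = ()>>>;

async fn first() {}

async fn second() {
    first().await;
}

// `first()` and `second()` return two distinct anonymous future types.
// Boxing them as trait objects gives both the same type, so they can be
// stored in one collection.
fn boxed_tasks() -> Vec<BoxedTask> {
    let a: BoxedTask = Box::pin(first());
    let b: BoxedTask = Box::pin(second());
    vec![a, b]
}
```

Without the `dyn` indirection, a `Vec` could only hold futures of one concrete type.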
[_trait object_]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html
[_dynamically dispatched_]: https://doc.rust-lang.org/book/ch18-02-trait-objects.html#trait-objects-perform-dynamic-dispatch
[разделе о закреплении]: #pinning

To allow the creation of new `Task` structs from futures, we create a `new` function:

```rust
// src/task/mod.rs

impl Task {
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task {
            future: Box::pin(future),
        }
    }
}
```

The function takes an arbitrary future with an output type of `()` and pins it in memory through the [`Box::pin`] function. Then it wraps the boxed future in the `Task` struct and returns it. The `'static` lifetime is required here because the returned `Task` can live for an arbitrary time, so the future needs to be valid for that time too.

We also add a `poll` method to allow the executor to poll the stored future:

```rust
// src/task/mod.rs

use core::task::{Context, Poll};

impl Task {
    fn poll(&mut self, context: &mut Context) -> Poll<()> {
        self.future.as_mut().poll(context)
    }
}
```

Since the [`poll`] method of the `Future` trait expects to be called on a `Pin<&mut T>` type, we first use the [`Pin::as_mut`] method to convert the `self.future` field of type `Pin<Box<T>>`. Then we call `poll` on the converted `self.future` field and return the result. Since the `Task::poll` method should only be called by the executor that we'll create in a moment, we keep the function private to the `task` module.

### Simple Executor

Since executors can be quite complex, we deliberately start by creating a very basic executor before implementing a more featureful one later.
For this, we first create a new `task::simple_executor` submodule:

```rust
// src/task/mod.rs

pub mod simple_executor;
```

```rust
// src/task/simple_executor.rs

use super::Task;
use alloc::collections::VecDeque;

pub struct SimpleExecutor {
    task_queue: VecDeque<Task>,
}

impl SimpleExecutor {
    pub fn new() -> SimpleExecutor {
        SimpleExecutor {
            task_queue: VecDeque::new(),
        }
    }

    pub fn spawn(&mut self, task: Task) {
        self.task_queue.push_back(task)
    }
}
```

The struct contains a single `task_queue` field of type [`VecDeque`], which is basically a vector that allows for push and pop operations on both ends. The idea behind using this type is that we insert new tasks through the `spawn` method at the end and pop the next task for execution from the front. This way, we get a simple [FIFO queue] (_"first in, first out"_).

[`VecDeque`]: https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html
[FIFO queue]: https://ru.wikipedia.org/wiki/FIFO

#### Dummy Waker

In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing. For this, we create a [`RawWaker`] instance, which defines the implementation of the different `Waker` methods, and then use the [`Waker::from_raw`] function to turn it into a `Waker`:

[`RawWaker`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html
[`Waker::from_raw`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.from_raw

```rust
// src/task/simple_executor.rs

use core::task::{Waker, RawWaker};

fn dummy_raw_waker() -> RawWaker {
    todo!();
}

fn dummy_waker() -> Waker {
    unsafe { Waker::from_raw(dummy_raw_waker()) }
}
```

The `from_raw` function is unsafe because undefined behavior can occur if the programmer does not uphold the documented requirements of `RawWaker`.
Before we look at the implementation of the `dummy_raw_waker` function, we first try to understand how the `RawWaker` type works.

##### `RawWaker`

The [`RawWaker`] type requires the programmer to explicitly define a [_virtual method table_] (_vtable_) that specifies the functions that should be called when the `RawWaker` is cloned, woken, or dropped. The layout of this vtable is defined by the [`RawWakerVTable`] type. Each function receives a `*const ()` argument, which is a _type-erased_ pointer to some value. The reason for using a `*const ()` pointer instead of a proper reference is that the `RawWaker` type should be non-generic but still support arbitrary types. The pointer is provided by putting it into the `data` argument of [`RawWaker::new`], which just initializes a `RawWaker`. The `Waker` then uses this `RawWaker` to call the vtable functions with `data`.

[_virtual method table_]: https://en.wikipedia.org/wiki/Virtual_method_table
[`RawWakerVTable`]: https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html
[`RawWaker::new`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html#method.new

Typically, the `RawWaker` is created for some heap-allocated struct that is wrapped into the [`Box`] or [`Arc`] type. For such types, methods like [`Box::into_raw`] can be used to convert the `Box<T>` to a `*const T` pointer. This pointer can then be cast to an anonymous `*const ()` pointer and passed to `RawWaker::new`. Since each vtable function receives the same `*const ()` as an argument, the functions can safely cast the pointer back to a `Box<T>` or a `&T` to operate on it. As you can imagine, this process is highly dangerous and can easily lead to undefined behavior on mistakes. For this reason, manually creating a `RawWaker` is not recommended unless necessary.
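To illustrate the pointer round trip described above, here is a hosted (`std`) sketch of a `RawWaker` whose `data` pointer refers to an `Arc<AtomicUsize>` that counts wake-ups. The counter-based design and the names `counting_waker`, `VTABLE`, and `drop_data` are assumptions made for this example; they are not part of the executor built in this post:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{RawWaker, RawWakerVTable, Waker};

// Vtable for wakers whose `data` pointer came from `Arc::<AtomicUsize>::into_raw`.
static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake_by_ref, drop_data);

unsafe fn clone(data: *const ()) -> RawWaker {
    // A cloned waker needs its own reference, so bump the reference count.
    unsafe { Arc::increment_strong_count(data as *const AtomicUsize) };
    RawWaker::new(data, &VTABLE)
}

unsafe fn wake(data: *const ()) {
    // `wake` consumes the waker: take the reference back and drop it afterwards.
    let counter = unsafe { Arc::from_raw(data as *const AtomicUsize) };
    counter.fetch_add(1, Ordering::SeqCst);
}

unsafe fn wake_by_ref(data: *const ()) {
    // Only borrow the counter; the waker keeps its reference.
    let counter = unsafe { &*(data as *const AtomicUsize) };
    counter.fetch_add(1, Ordering::SeqCst);
}

unsafe fn drop_data(data: *const ()) {
    // Release the reference that this waker owned.
    drop(unsafe { Arc::from_raw(data as *const AtomicUsize) });
}

fn counting_waker(counter: Arc<AtomicUsize>) -> Waker {
    // Type-erase the `Arc` into the `*const ()` expected by `RawWaker::new`.
    let data = Arc::into_raw(counter) as *const ();
    // Sound only because every vtable function treats `data` as an `Arc<AtomicUsize>`.
    unsafe { Waker::from_raw(RawWaker::new(data, &VTABLE)) }
}
```

Every vtable function has to agree on what `data` points to and on who owns which reference; a single mismatch (for example, forgetting the increment in `clone`) would lead to a use-after-free.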
[`Box`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html
[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html
[`Box::into_raw`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html#method.into_raw

##### A Dummy `RawWaker`

While manually creating a `RawWaker` is not recommended, there is currently no other way to create a dummy `Waker` that does nothing. Fortunately, the fact that we want to do nothing makes it relatively safe to implement the `dummy_raw_waker` function:

```rust
// src/task/simple_executor.rs

use core::task::RawWakerVTable;

fn dummy_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        dummy_raw_waker()
    }

    let vtable = &RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(0 as *const (), vtable)
}
```

First, we define two inner functions named `no_op` and `clone`. The `no_op` function takes a `*const ()` pointer and does nothing. The `clone` function also takes a `*const ()` pointer and returns a new `RawWaker` by calling `dummy_raw_waker` again. We use these two functions to create a minimal `RawWakerVTable`: the `clone` function is used for the cloning operations, and the `no_op` function is used for all other operations. Since the `RawWaker` does nothing, it does not matter that we return a new `RawWaker` from `clone` instead of cloning it.

After creating the `vtable`, we use the [`RawWaker::new`] function to create the `RawWaker`. The passed `*const ()` does not matter since none of the vtable functions use it. For this reason, we simply pass a null pointer.

#### A `run` Method

Now we have a way to create a `Waker` instance, and we can use it to implement a `run` method on our executor. The simplest `run` method is to repeatedly poll all queued tasks in a loop until all are done.
This is not very efficient since it does not utilize the notifications of the `Waker` type, but it is an easy way to get things running:

```rust
// src/task/simple_executor.rs

use core::task::{Context, Poll};

impl SimpleExecutor {
    pub fn run(&mut self) {
        while let Some(mut task) = self.task_queue.pop_front() {
            let waker = dummy_waker();
            let mut context = Context::from_waker(&waker);
            match task.poll(&mut context) {
                Poll::Ready(()) => {} // task done
                Poll::Pending => self.task_queue.push_back(task),
            }
        }
    }
}
```

The function uses a `while let` loop to handle all tasks in the `task_queue`. For each task, it first creates a `Context` type by wrapping a `Waker` instance returned by our `dummy_waker` function. Then it invokes the `Task::poll` method with this `context`. If the `poll` method returns `Poll::Ready`, the task is finished and we can continue with the next task. If the task is still `Poll::Pending`, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration.

#### Trying It

With our `SimpleExecutor` type, we can now try running the task returned by the `example_task` function in our `main.rs`:

```rust
// src/main.rs

use blog_os::task::{Task, simple_executor::SimpleExecutor};

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // […] initialization routines, including `init_heap`

    let mut executor = SimpleExecutor::new();
    executor.spawn(Task::new(example_task()));
    executor.run();

    // […] test_main, "It did not crash!" message, hlt_loop
}

// example_task is repeated below so that you don't have to scroll up

async fn async_number() -> u32 {
    42
}

async fn example_task() {
    let number = async_number().await;
    println!("async number: {}", number);
}
```

When we run it, we see that the expected _"async number: 42"_ message is printed to the screen:

![QEMU printing "Hello World", "async number: 42", and "It did not crash!"](qemu-simple-executor.png)

Let's summarize the various steps that happen in this example:

- First, a new instance of our `SimpleExecutor` type is created with an empty `task_queue`.
- Next, we call the asynchronous `example_task` function, which returns a future. We wrap this future in the `Task` type, which moves it to the heap and pins it, and then add the task to the `task_queue` of the executor through the `spawn` method.
- We then call the `run` method to start the execution of the single task in the queue. This involves:
  - Popping the task from the front of the `task_queue`.
  - Creating a `RawWaker` for the task, converting it to a [`Waker`] instance, and then creating a [`Context`] instance from it.
  - Calling the [`poll`] method on the future of the task, using the `Context` we just created.
  - Since the `example_task` does not wait for anything, it can directly run to its end on the first `poll` call. This is where the _"async number: 42"_ line is printed.
  - Since the `example_task` directly returns `Poll::Ready`, it is not added back to the task queue.
- The `run` method returns after the `task_queue` becomes empty. The execution of our `kernel_main` function continues and the _"It did not crash!"_ message is printed.

### Async Keyboard Input

Our simple executor does not utilize the `Waker` notifications and simply loops over all tasks until they are done. This isn't a problem for our example since our `example_task` can directly run to completion on the first `poll` call.
To see the performance advantages of a proper `Waker` implementation, we first need to create a task that is truly asynchronous, i.e., a task that will typically return `Poll::Pending` on the first `poll` call.

We already have some kind of asynchronicity in our system that we can use for this: hardware interrupts. As we learned in the [_Interrupts_] post, hardware interrupts can occur at arbitrary points in time, determined by some external device. For example, a hardware timer sends an interrupt to the CPU after some predefined time has elapsed. When the CPU receives an interrupt, it immediately transfers control to the corresponding handler function defined in the interrupt descriptor table (IDT).

[_Interrupts_]: @/edition-2/posts/07-hardware-interrupts/index.md

In the following, we will create an asynchronous task based on the keyboard interrupt. The keyboard interrupt is a good candidate for this because it is both non-deterministic and latency-critical. Non-deterministic means that there is no way to predict when the next key press will occur because it is entirely dependent on the user. Latency-critical means that we want to handle the keyboard input in a timely manner, otherwise the user will feel a lag. To support such a task in an efficient way, it will be essential that the executor has proper support for `Waker` notifications.

#### Scancode Queue

Currently, we handle the keyboard input directly in the interrupt handler. This is not a good idea for the long term because interrupt handlers should stay as short as possible, as they might interrupt important work. Instead, interrupt handlers should only perform the minimal amount of work necessary (e.g., reading the keyboard scancode) and leave the rest of the work (e.g., interpreting the scancode) to a background task.

A common pattern for delegating work to a background task is to create some sort of queue.
The interrupt handler pushes units of work to the queue, and the background task handles the work in the queue. Applied to our keyboard interrupt, this means that the interrupt handler only reads the scancode from the keyboard, pushes it to the queue, and then returns. The keyboard task sits on the other end of the queue and interprets and handles each scancode that is pushed to it:

![Scancode queue with 8 slots on the top. Keyboard interrupt handler on the bottom left with a "push scancode" arrow to the left of the queue. Keyboard task on the bottom right with a "pop scancode" arrow coming from the right side of the queue.](scancode-queue.svg)

A simple implementation of that queue could be a mutex-protected `VecDeque`. However, using mutexes in interrupt handlers is not a good idea since it can easily lead to deadlocks. For example, when the user presses a key while the keyboard task has locked the queue, the interrupt handler tries to acquire the lock again and hangs indefinitely. Another problem with this approach is that `VecDeque` automatically increases its capacity by performing a new heap allocation when it becomes full. This can lead to deadlocks again because our allocator also uses a mutex internally. Furthermore, heap allocations can fail or take a considerable amount of time when the heap is fragmented.

To prevent these problems, we need a queue implementation that does not require mutexes or allocations for its `push` operation. Such queues can be implemented by using lock-free [atomic operations] for pushing and popping elements. This way, it is possible to create `push` and `pop` operations that only require a `&self` reference and are thus usable without a mutex. To avoid allocations on `push`, the queue can be backed by a pre-allocated fixed-size buffer.
While this makes the queue _bounded_ (i.e., it has a maximum length), it is often possible to define reasonable upper bounds for the queue length in practice, so that this isn't a big problem.

[atomic operations]: https://doc.rust-lang.org/core/sync/atomic/index.html

##### The `crossbeam` Crate

Implementing such a queue in a correct and efficient way is very difficult, so I recommend sticking to existing, well-tested implementations. One popular Rust project that implements various mutex-free types for concurrent programming is [`crossbeam`]. It provides a type named [`ArrayQueue`] that is exactly what we need in this case. And we're lucky: the type is fully compatible with `no_std` crates with allocation support.

[`crossbeam`]: https://github.com/crossbeam-rs/crossbeam
[`ArrayQueue`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html

To use the type, we need to add a dependency on the `crossbeam-queue` crate:

```toml
# Cargo.toml

[dependencies.crossbeam-queue]
version = "0.3.11"
default-features = false
features = ["alloc"]
```

By default, the crate depends on the standard library. To make it `no_std` compatible, we need to disable its default features and instead enable the `alloc` feature. (Note that we could also add a dependency on the main `crossbeam` crate, which re-exports the `crossbeam-queue` crate, but this would result in a larger number of dependencies and longer compile times.)
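To make the idea of a bounded queue whose `push` needs neither a mutex nor an allocation more concrete, here is a deliberately simplified sketch. It is not how `ArrayQueue` is implemented: it assumes exactly one producer and one consumer, stores only `u8` values, and uses the hypothetical names `BoundedQueue` and `CAPACITY`:

```rust
use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering};

// Fixed number of slots; a power of two keeps the index math consistent on overflow.
const CAPACITY: usize = 8;

pub struct BoundedQueue {
    slots: [AtomicU8; CAPACITY],
    head: AtomicUsize, // index of the next pop, written only by the consumer
    tail: AtomicUsize, // index of the next push, written only by the producer
}

impl BoundedQueue {
    pub const fn new() -> Self {
        const EMPTY: AtomicU8 = AtomicU8::new(0);
        BoundedQueue {
            slots: [EMPTY; CAPACITY],
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Stores `value`, or hands it back if the queue is full. Never blocks or allocates.
    pub fn push(&self, value: u8) -> Result<(), u8> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == CAPACITY {
            return Err(value);
        }
        self.slots[tail % CAPACITY].store(value, Ordering::Relaxed);
        // Publish the slot to the consumer.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        Ok(())
    }

    /// Removes the oldest value, or returns `None` if the queue is empty.
    pub fn pop(&self) -> Option<u8> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let value = self.slots[head % CAPACITY].load(Ordering::Relaxed);
        // Hand the slot back to the producer.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}
```

Both operations take `&self`, so such a queue could live in a plain `static`. Supporting multiple producers or consumers and arbitrary element types is where the real difficulty starts, which is why the post relies on `crossbeam` instead.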
##### Queue Implementation

Using the `ArrayQueue` type, we can now create a global scancode queue in a new `task::keyboard` module:

```rust
// src/task/mod.rs

pub mod keyboard;
```

```rust
// src/task/keyboard.rs

use conquer_once::spin::OnceCell;
use crossbeam_queue::ArrayQueue;

static SCANCODE_QUEUE: OnceCell<ArrayQueue<u8>> = OnceCell::uninit();
```

Since [`ArrayQueue::new`] performs a heap allocation, which is not possible at compile time ([yet][const-heap-alloc]), we can't initialize the static variable directly. Instead, we use the [`OnceCell`] type of the [`conquer_once`] crate, which makes it possible to perform a safe one-time initialization of static values. To include the crate, we need to add it as a dependency in our `Cargo.toml`:

[`ArrayQueue::new`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.new
[const-heap-alloc]: https://github.com/rust-lang/const-eval/issues/20
[`OnceCell`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html
[`conquer_once`]: https://docs.rs/conquer-once/0.2.0/conquer_once/index.html

```toml
# Cargo.toml

[dependencies.conquer-once]
version = "0.2.0"
default-features = false
```

Instead of the [`OnceCell`] primitive, we could also use the [`lazy_static`] macro here. However, the `OnceCell` type has the advantage that we can ensure that the initialization does not happen in the interrupt handler, thus preventing the interrupt handler from performing a heap allocation.
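The difference between explicit one-time initialization and lazy initialization on first access can be sketched on a hosted system with the standard library's `OnceLock`, used here only as a stand-in for `conquer_once::spin::OnceCell`. The names `BUFFER`, `init_buffer`, and `buffer_capacity` are hypothetical:

```rust
use std::sync::OnceLock;

// Stand-in for the scancode queue: a heap-allocated buffer that can only
// be created at runtime.
static BUFFER: OnceLock<Vec<u8>> = OnceLock::new();

/// Explicit one-time initialization. Returns `false` if the cell was already set.
fn init_buffer(capacity: usize) -> bool {
    BUFFER.set(Vec::with_capacity(capacity)).is_ok()
}

/// Read-only access that never initializes the cell on its own, so a caller
/// such as an interrupt handler cannot trigger the initial allocation.
fn buffer_capacity() -> Option<usize> {
    BUFFER.get().map(|buffer| buffer.capacity())
}
```

With a lazily initialized static, the first access, wherever it happens, would run the initializer. Here the reader only ever observes "not initialized yet" and leaves the allocation to whoever calls `init_buffer`.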
[`lazy_static`]: https://docs.rs/lazy_static/1.4.0/lazy_static/index.html

#### Filling the Queue

To fill the scancode queue, we create a new `add_scancode` function that we will call from the interrupt handler:

```rust
// src/task/keyboard.rs

use crate::println;

/// Called by the keyboard interrupt handler
///
/// Must not block or allocate.
pub(crate) fn add_scancode(scancode: u8) {
    if let Ok(queue) = SCANCODE_QUEUE.try_get() {
        if let Err(_) = queue.push(scancode) {
            println!("WARNING: scancode queue full; dropping keyboard input");
        }
    } else {
        println!("WARNING: scancode queue uninitialized");
    }
}
```

We use [`OnceCell::try_get`] to get a reference to the initialized queue. If the queue is not initialized yet, we ignore the keyboard scancode and print a warning. It's important that we don't try to initialize the queue in this function because it will be called by the interrupt handler, which should not perform heap allocations. Since this function should not be callable from our `main.rs`, we use the `pub(crate)` visibility to make it only available to our `lib.rs`.

[`OnceCell::try_get`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get

The fact that the [`ArrayQueue::push`] method requires only a `&self` reference makes it very simple to call the method on the static queue. The `ArrayQueue` type performs all the necessary synchronization itself, so we don't need a mutex wrapper here. In case the queue is full, we print a warning too.
[`ArrayQueue::push`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.push

To call the `add_scancode` function on keyboard interrupts, we update our `keyboard_interrupt_handler` function in the `interrupts` module:

```rust
// src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame
) {
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };
    crate::task::keyboard::add_scancode(scancode); // new

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

We removed all the keyboard handling code from this function and added a call to the `add_scancode` function instead. The rest of the function stays the same as before.

As expected, keypresses are no longer printed to the screen when we run our project using `cargo run`. Instead, a warning that the scancode queue is uninitialized is printed on every keystroke.

#### Scancode Stream

To initialize the `SCANCODE_QUEUE` and read the scancodes from the queue in an asynchronous way, we create a new `ScancodeStream` type:

```rust
// src/task/keyboard.rs

pub struct ScancodeStream {
    _private: (),
}

impl ScancodeStream {
    pub fn new() -> Self {
        SCANCODE_QUEUE.try_init_once(|| ArrayQueue::new(100))
            .expect("ScancodeStream::new should only be called once");
        ScancodeStream { _private: () }
    }
}
```

The purpose of the `_private` field is to prevent construction of the struct from outside of the module. This makes the `new` function the only way to construct the type. In the function, we first try to initialize the `SCANCODE_QUEUE` static. If it is already initialized, we panic to ensure that only a single `ScancodeStream` instance can be created.

To make the scancodes available to asynchronous tasks, the next step is to implement a `poll`-like method that tries to pop the next scancode off the queue.
While this sounds like we should implement the [`Future`] trait for our type, it does not quite fit here. The problem is that the `Future` trait only abstracts over a single asynchronous value and expects that the `poll` method is not called again after it returns `Poll::Ready`. Our scancode queue, however, contains multiple asynchronous values, so it is fine to keep polling it.

##### The `Stream` Trait

Since types that yield multiple asynchronous values are common, the [`futures`] crate provides a useful abstraction for such types: the [`Stream`] trait. The trait is defined like this:

[`Stream`]: https://rust-lang.github.io/async-book/05_streams/01_chapter.html

```rust
pub trait Stream {
    type Item;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context)
        -> Poll<Option<Self::Item>>;
}
```

This definition is quite similar to the [`Future`] trait, with the following differences:

- The associated type is named `Item` instead of `Output`.
- Instead of a `poll` method that returns `Poll<Self::Output>`, the `Stream` trait defines a `poll_next` method that returns `Poll<Option<Self::Item>>` (note the additional `Option`).

There is also a semantic difference: the `poll_next` method can be called repeatedly until it returns `Poll::Ready(None)` to signal that the stream is finished. In this regard, the method is similar to the [`Iterator::next`] method, which also returns `None` after the last value.

[`Iterator::next`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html#tymethod.next

##### Implementing `Stream`

Let's implement the `Stream` trait for our `ScancodeStream` to provide the values of the `SCANCODE_QUEUE` in an asynchronous way.
For this, we first need to add a dependency on the `futures-util` crate, which contains the `Stream` trait:

```toml
# Cargo.toml

[dependencies.futures-util]
version = "0.3.4"
default-features = false
features = ["alloc"]
```

We disable the default features to make the crate `no_std` compatible and enable the `alloc` feature to make its allocation-based types available (we will need this later). (Note that we could also add a dependency on the main `futures` crate, which re-exports the `futures-util` crate, but this would result in a larger number of dependencies and longer compile times.)

Now we can import and implement the `Stream` trait:

```rust
// src/task/keyboard.rs

use core::{pin::Pin, task::{Poll, Context}};
use futures_util::stream::Stream;

impl Stream for ScancodeStream {
    type Item = u8;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<u8>> {
        let queue = SCANCODE_QUEUE.try_get().expect("not initialized");
        match queue.pop() {
            Some(scancode) => Poll::Ready(Some(scancode)),
            None => Poll::Pending,
        }
    }
}
```

We first use the [`OnceCell::try_get`] method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the `new` function, so we can safely use the `expect` method to panic if it is not initialized. Next, we use the [`ArrayQueue::pop`] method to try to get the next element from the queue. If it succeeds, we return the scancode wrapped in `Poll::Ready(Some(…))`. If it fails, the queue is empty. In that case, we return `Poll::Pending`.

[`ArrayQueue::pop`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.pop

#### Waker Support

Like the `Future::poll` method, the `Stream::poll_next` method requires the asynchronous task to notify the executor when it becomes ready after `Poll::Pending` is returned.
This way, the executor does not need to poll the same task again until it is notified, which greatly reduces the performance overhead of waiting tasks.

To send this notification, the task should extract the [`Waker`] from the passed [`Context`] reference and store it somewhere. When the task becomes ready, it should invoke the [`wake`] method on the stored `Waker` to notify the executor that the task should be polled again.

##### AtomicWaker

To implement the `Waker` notification for our `ScancodeStream`, we need a place where we can store the `Waker` between poll calls. We can't store it as a field in the `ScancodeStream` itself because it needs to be accessible from the `add_scancode` function. The solution to this problem is to use a static variable of the [`AtomicWaker`] type provided by the `futures-util` crate. Like the `ArrayQueue` type, this type is based on atomic instructions and can be safely stored in a `static` and modified concurrently.

[`AtomicWaker`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html

Let's use the [`AtomicWaker`] type to define a static `WAKER`:

```rust
// src/task/keyboard.rs

use futures_util::task::AtomicWaker;

static WAKER: AtomicWaker = AtomicWaker::new();
```

The idea is that the `poll_next` implementation stores the current waker in this static, and the `add_scancode` function calls the `wake` function on it when a new scancode is added to the queue.

##### Storing a Waker

The contract defined by `poll`/`poll_next` requires the task to register a wakeup for the passed `Waker` when it returns `Poll::Pending`.
Let's modify our `poll_next` implementation to satisfy this requirement:

```rust
// src/task/keyboard.rs

impl Stream for ScancodeStream {
    type Item = u8;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<u8>> {
        let queue = SCANCODE_QUEUE
            .try_get()
            .expect("scancode queue not initialized");

        // fast path
        if let Some(scancode) = queue.pop() {
            return Poll::Ready(Some(scancode));
        }

        WAKER.register(&cx.waker());
        match queue.pop() {
            Some(scancode) => {
                WAKER.take();
                Poll::Ready(Some(scancode))
            }
            None => Poll::Pending,
        }
    }
}
```

Like before, we first use the [`OnceCell::try_get`] function to get a reference to the initialized scancode queue. We then optimistically try to `pop` from the queue and return `Poll::Ready` when it succeeds. This way, we avoid the performance overhead of registering a waker when the queue is not empty.

If the first call to `queue.pop()` does not succeed, the queue is potentially empty. Only potentially, because the interrupt handler might have filled the queue asynchronously immediately after the check. Since this race condition can occur again for the next check, we need to register the `Waker` in the `WAKER` static before the second check. This way, a wakeup might happen before we return `Poll::Pending`, but it is guaranteed that we get a wakeup for any scancodes pushed after the check.

After registering the `Waker` contained in the passed [`Context`] through the [`AtomicWaker::register`] function, we try to pop from the queue a second time. If it now succeeds, we return `Poll::Ready`. We also remove the registered waker again using [`AtomicWaker::take`] because a waker notification is no longer needed. If `queue.pop()` fails a second time, we return `Poll::Pending` like before, but this time with a registered wakeup.
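The register-then-recheck pattern described above can be exercised outside the kernel. The sketch below is a hosted approximation under stated assumptions: `Shared`, `poll_pop`, `push`, and `CountingWaker` are hypothetical names, and `Mutex`-protected fields stand in for the lock-free `ArrayQueue` and `AtomicWaker` (a real interrupt handler must not take locks).

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical shared state standing in for `SCANCODE_QUEUE` and `WAKER`.
struct Shared {
    queue: Mutex<VecDeque<u8>>,
    waker: Mutex<Option<Waker>>,
}

impl Shared {
    fn new() -> Self {
        Shared {
            queue: Mutex::new(VecDeque::new()),
            waker: Mutex::new(None),
        }
    }

    /// Consumer side, structured like the `poll_next` method in the post.
    fn poll_pop(&self, cx: &mut Context<'_>) -> Poll<u8> {
        // Fast path: skip waker registration if data is already available.
        if let Some(byte) = self.queue.lock().unwrap().pop_front() {
            return Poll::Ready(byte);
        }
        // Register first, then check again, so that a push that happens
        // after the second check is guaranteed to find the waker.
        *self.waker.lock().unwrap() = Some(cx.waker().clone());
        match self.queue.lock().unwrap().pop_front() {
            Some(byte) => {
                // Data arrived in between; the wakeup is no longer needed.
                self.waker.lock().unwrap().take();
                Poll::Ready(byte)
            }
            None => Poll::Pending,
        }
    }

    /// Producer side: push first, then wake the registered waker, if any.
    fn push(&self, byte: u8) {
        self.queue.lock().unwrap().push_back(byte);
        if let Some(waker) = self.waker.lock().unwrap().take() {
            waker.wake();
        }
    }
}

/// Waker that only counts how often it was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```

Polling an empty `Shared` registers the waker and returns `Poll::Pending`; a later `push` then wakes it exactly once, while a `push` with no registered waker does nothing beyond enqueueing the byte.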
[`AtomicWaker::register`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html#method.register
[`AtomicWaker::take`]: https://docs.rs/futures/0.3.4/futures/task/struct.AtomicWaker.html#method.take

Note that there are two ways that a wakeup can happen for a task that did not return `Poll::Pending` (yet). One way is the mentioned race condition, when the wakeup happens immediately before `Poll::Pending` is returned. The other way is when the queue is no longer empty after the waker was registered, so that `Poll::Ready` is returned. Since these spurious wakeups are not preventable, the executor needs to be able to handle them correctly.

##### Waking the Stored Waker

To wake the stored `Waker`, we add a call to `WAKER.wake()` in the `add_scancode` function:

```rust
// src/task/keyboard.rs

pub(crate) fn add_scancode(scancode: u8) {
    if let Ok(queue) = SCANCODE_QUEUE.try_get() {
        if let Err(_) = queue.push(scancode) {
            println!("WARNING: scancode queue full; dropping keyboard input");
        } else {
            WAKER.wake(); // new
        }
    } else {
        println!("WARNING: scancode queue uninitialized");
    }
}
```

The only change is the added call to `WAKER.wake()` when the push to the scancode queue succeeds. If a waker is registered in the `WAKER` static, this method calls the equally named [`wake`] method on it, which notifies the executor. Otherwise, the operation is a no-op, i.e., nothing happens.

[`wake`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.wake

It is important that we call `wake` only after pushing to the queue, because otherwise the task might be woken too early while the queue is still empty. This can, for example, happen with a multi-threaded executor that starts the woken task concurrently on a different CPU core. While we don't have thread support yet, we will add it soon and don't want things to break then.
#### Keyboard Task

Now that we have implemented the `Stream` trait for our `ScancodeStream`, we can use it to create an asynchronous keyboard task:

```rust
// src/task/keyboard.rs

use futures_util::stream::StreamExt;
use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1};
use crate::print;

pub async fn print_keypresses() {
    let mut scancodes = ScancodeStream::new();
    let mut keyboard = Keyboard::new(ScancodeSet1::new(),
        layouts::Us104Key, HandleControl::Ignore);

    while let Some(scancode) = scancodes.next().await {
        if let Ok(Some(key_event)) = keyboard.add_byte(scancode) {
            if let Some(key) = keyboard.process_keyevent(key_event) {
                match key {
                    DecodedKey::Unicode(character) => print!("{}", character),
                    DecodedKey::RawKey(key) => print!("{:?}", key),
                }
            }
        }
    }
}
```

The code is very similar to the code we had in our [keyboard interrupt handler] before we modified it in this post. The only difference is that, instead of reading the scancode from an I/O port, we take it from the `ScancodeStream`. For this, we first create a new `ScancodeStream` and then repeatedly use the [`next`] method provided by the [`StreamExt`] trait to get a `Future` that resolves to the next element in the stream. By using the `await` operator on it, we asynchronously wait for the result of that future.

[keyboard interrupt handler]: @/edition-2/posts/07-hardware-interrupts/index.md#interpreting-the-scancodes
[`next`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html#method.next
[`StreamExt`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html

We use `while let` to loop until the stream returns `None` to signal its end. Since our `poll_next` method never returns `None`, this is effectively an endless loop, so the `print_keypresses` task never finishes.
Let's add the `print_keypresses` task to our executor in `main.rs` to get working keyboard input again:

```rust
// src/main.rs

use blog_os::task::keyboard; // new

fn kernel_main(boot_info: &'static BootInfo) -> ! {

    // […] initialization routines, including init_heap, test_main

    let mut executor = SimpleExecutor::new();
    executor.spawn(Task::new(example_task()));
    executor.spawn(Task::new(keyboard::print_keypresses())); // new
    executor.run();

    // […] "it did not crash" message, hlt_loop
}
```

When we execute `cargo run` now, we see that keyboard input works again:

![QEMU printing ".....H...e...l...l..o..... ...W..o..r....l...d...!"](qemu-keyboard-output.gif)

If you keep an eye on the CPU utilization of your computer, you will see that the `QEMU` process now keeps the CPU busy all the time. This happens because our `SimpleExecutor` polls tasks over and over again in a loop. So even if we don't press any keys on the keyboard, the executor repeatedly calls `poll` on our `print_keypresses` task, even though the task cannot make any progress and returns `Poll::Pending` each time.

### Executor with Waker Support

To fix the performance problem, we need to create an executor that properly utilizes the `Waker` notifications. This way, the executor is notified when the next keyboard interrupt occurs, so it does not need to keep polling the `print_keypresses` task over and over again.

#### Task Id

The first step in creating an executor with proper support for waker notifications is to give each task a unique ID. This is required because we need a way to specify which task should be woken. We start by creating a new `TaskId` wrapper type:

```rust
// src/task/mod.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TaskId(u64);
```

The `TaskId` struct is a simple wrapper type around `u64`. We derive a number of traits for it to make it printable, copyable, comparable, and sortable.
The latter is important because we want to use `TaskId` as the key type of a [`BTreeMap`] in a moment.

[`BTreeMap`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html

To create a new unique ID, we create a `TaskId::new` function:

```rust
// src/task/mod.rs

use core::sync::atomic::{AtomicU64, Ordering};

impl TaskId {
    fn new() -> Self {
        static NEXT_ID: AtomicU64 = AtomicU64::new(0);
        TaskId(NEXT_ID.fetch_add(1, Ordering::Relaxed))
    }
}
```

The function uses a static `NEXT_ID` variable of type [`AtomicU64`] to ensure that each ID is assigned only once. The [`fetch_add`] method increments the value and returns the previous value in a single atomic operation. This means that even when the `TaskId::new` method is called in parallel, every ID is returned exactly once. The [`Ordering`] parameter defines whether the compiler is allowed to reorder the `fetch_add` operation in the instruction stream. Since we only require that the ID be unique, the `Relaxed` ordering with the weakest requirements is enough in this case.

[`AtomicU64`]: https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html
[`fetch_add`]: https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html#method.fetch_add
[`Ordering`]: https://doc.rust-lang.org/core/sync/atomic/enum.Ordering.html

We can now extend our `Task` type with an additional `id` field:

```rust
// src/task/mod.rs

pub struct Task {
    id: TaskId, // new
    future: Pin<Box<dyn Future<Output = ()>>>,
}

impl Task {
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task {
            id: TaskId::new(), // new
            future: Box::pin(future),
        }
    }
}
```

The new `id` field makes it possible to uniquely name a task, which is required for waking a specific task.
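The uniqueness guarantee of `fetch_add` can be checked on a hosted target with real threads. This is a small sketch, not part of the kernel: the helper `collect_ids` is a hypothetical name, and `std` is used in place of `core` so that threads are available.

```rust
use std::collections::BTreeSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TaskId(u64);

impl TaskId {
    fn new() -> Self {
        static NEXT_ID: AtomicU64 = AtomicU64::new(0);
        // Atomically increments the counter and returns the previous value.
        TaskId(NEXT_ID.fetch_add(1, Ordering::Relaxed))
    }
}

/// Creates `per_thread` IDs on each of `threads` threads and collects them
/// in a set. Duplicates would collapse, so the set size reveals collisions.
fn collect_ids(threads: usize, per_thread: usize) -> BTreeSet<TaskId> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                (0..per_thread).map(|_| TaskId::new()).collect::<Vec<_>>()
            })
        })
        .collect();
    handles
        .into_iter()
        .flat_map(|handle| handle.join().unwrap())
        .collect()
}
```

If two threads ever received the same ID, the returned set would contain fewer than `threads * per_thread` elements. The derived `Ord` implementation is what allows `TaskId` to be stored in a `BTreeSet` here, just as it allows the `BTreeMap` key usage in the executor.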
#### The `Executor` Type

We create our new `Executor` type in a `task::executor` module:

```rust
// src/task/mod.rs

pub mod executor;
```

```rust
// src/task/executor.rs

use super::{Task, TaskId};
use alloc::{collections::BTreeMap, sync::Arc};
use core::task::Waker;
use crossbeam_queue::ArrayQueue;

pub struct Executor {
    tasks: BTreeMap<TaskId, Task>,
    task_queue: Arc<ArrayQueue<TaskId>>,
    waker_cache: BTreeMap<TaskId, Waker>,
}

impl Executor {
    pub fn new() -> Self {
        Executor {
            tasks: BTreeMap::new(),
            task_queue: Arc::new(ArrayQueue::new(100)),
            waker_cache: BTreeMap::new(),
        }
    }
}
```

Instead of storing tasks in a [`VecDeque`] like we did for our `SimpleExecutor`, we use a `task_queue` of task IDs and a [`BTreeMap`] named `tasks` that contains the actual `Task` instances. The map is indexed by `TaskId`, which allows efficient continuation of a specific task.

The `task_queue` field is an [`ArrayQueue`] of task IDs, wrapped in the [`Arc`] type that implements _reference counting_. Reference counting makes it possible to share ownership of a value among multiple owners. It works by allocating the value on the heap and counting the number of active references to it. When the number of active references reaches zero, the value is no longer needed and can be deallocated.

We use the `Arc` type for the `task_queue` because it will be shared between the executor and wakers. The idea is that the wakers push the ID of the woken task to the queue. The executor sits on the receiving end of the queue, retrieves the woken tasks by their ID from the `tasks` map, and then runs them. The reason for using a fixed-size queue instead of an unbounded queue such as [`SegQueue`] is that interrupt handlers should not allocate when pushing to this queue.

In addition to the `task_queue` and the `tasks` map, the `Executor` type has a `waker_cache` field that is also a map. This map caches the [`Waker`] of a task after its creation.
This has two reasons: First, it improves performance by reusing the same waker for multiple wakeups of the same task instead of creating a new waker each time. Second, it ensures that reference-counted wakers are not deallocated inside interrupt handlers because that could lead to deadlocks (more on this below).

[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html
[`SegQueue`]: https://docs.rs/crossbeam-queue/0.2.1/crossbeam_queue/struct.SegQueue.html

To create an `Executor`, we provide a simple `new` function. We choose a capacity of 100 for the `task_queue`, which should be more than enough for the foreseeable future. If our system has more than 100 concurrent tasks at some point, we can easily increase this size.

#### Spawning Tasks

As for the `SimpleExecutor`, we provide a `spawn` method on our `Executor` type that adds a given task to the `tasks` map and immediately wakes it by pushing its ID to the `task_queue`:

```rust
// src/task/executor.rs

impl Executor {
    pub fn spawn(&mut self, task: Task) {
        let task_id = task.id;
        if self.tasks.insert(task.id, task).is_some() {
            panic!("task with same ID already in tasks");
        }
        self.task_queue.push(task_id).expect("queue full");
    }
}
```

If there is already a task with the same ID in the map, the [`BTreeMap::insert`] method returns it. This should never happen since each task has a unique ID, so we panic in this case because it indicates a bug in our code. Similarly, we panic when the `task_queue` is full, since this should never happen if we choose a large enough queue size.
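The two failure cases of `spawn` can be illustrated with a hosted miniature. This is a sketch under stated assumptions: `MiniExecutor` is a hypothetical type, task payloads are plain strings, a capacity-limited `VecDeque` stands in for the `ArrayQueue`, and errors are returned instead of panicking so that they can be inspected.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical miniature of the executor's bookkeeping.
struct MiniExecutor {
    tasks: BTreeMap<u64, &'static str>,
    ready: VecDeque<u64>,
    capacity: usize,
}

impl MiniExecutor {
    fn new(capacity: usize) -> Self {
        MiniExecutor {
            tasks: BTreeMap::new(),
            ready: VecDeque::new(),
            capacity,
        }
    }

    /// Mirrors the checks in `Executor::spawn`, but reports them as `Err`.
    fn spawn(&mut self, id: u64, task: &'static str) -> Result<(), &'static str> {
        if self.ready.len() >= self.capacity {
            return Err("queue full");
        }
        // `BTreeMap::insert` returns the previous value if the key existed
        // (and overwrites it), which is how a duplicate ID is detected.
        if self.tasks.insert(id, task).is_some() {
            return Err("task with same ID already in tasks");
        }
        // A freshly spawned task is immediately marked as ready.
        self.ready.push_back(id);
        Ok(())
    }
}
```

With a capacity of 2, spawning IDs 1 and 2 succeeds, spawning ID 1 a second time is rejected as a duplicate, and any further spawn is rejected because the ready queue is full.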
#### Running Tasks

To execute all tasks in the `task_queue`, we create a private `run_ready_tasks` method:

```rust
// src/task/executor.rs

use core::task::{Context, Poll};

impl Executor {
    fn run_ready_tasks(&mut self) {
        // destructure `self` to avoid borrow checker errors
        let Self {
            tasks,
            task_queue,
            waker_cache,
        } = self;

        while let Some(task_id) = task_queue.pop() {
            let task = match tasks.get_mut(&task_id) {
                Some(task) => task,
                None => continue, // task no longer exists
            };
            let waker = waker_cache
                .entry(task_id)
                .or_insert_with(|| TaskWaker::new(task_id, task_queue.clone()));
            let mut context = Context::from_waker(waker);
            match task.poll(&mut context) {
                Poll::Ready(()) => {
                    // task done -> remove it and its cached waker
                    tasks.remove(&task_id);
                    waker_cache.remove(&task_id);
                }
                Poll::Pending => {}
            }
        }
    }
}
```

The basic idea of this function is similar to our `SimpleExecutor`: loop over all tasks in the `task_queue`, create a waker for each task, and then poll them. However, instead of adding pending tasks back to the end of the `task_queue`, we let our `TaskWaker` implementation take care of adding woken tasks back to the queue. The implementation of this waker type will be shown in a moment.

Let's look into some of the implementation details of this `run_ready_tasks` method:

- We use [_destructuring_] to split `self` into its three fields to avoid some borrow checker errors. Namely, our implementation needs to access `self.task_queue` from within a closure, which currently tries to borrow `self` completely. This is a fundamental borrow checker issue that [RFC 2229] is meant to resolve ([tracking issue][RFC 2229 impl]).

- For each popped task ID, we retrieve a mutable reference to the corresponding task from the `tasks` map.
Since our `ScancodeStream` implementation registers wakers before checking whether a task needs to be put to sleep, it might happen that a wakeup occurs for a task that no longer exists. In this case, we simply ignore the wakeup and continue with the next ID from the queue.

- To avoid the performance overhead of creating a waker on each poll, we use the `waker_cache` map to store the waker for each task after it has been created. For this, we use the [`BTreeMap::entry`] method in combination with [`Entry::or_insert_with`] to create a new waker if it doesn't exist yet and then get a mutable reference to it. To create a new waker, we clone the `task_queue` and pass it together with the task ID to the `TaskWaker::new` function (implementation shown below). Since the `task_queue` is wrapped in an `Arc`, the `clone` only increases the reference count of the value but still points to the same heap-allocated queue. Note that reusing wakers like this is not possible for all waker implementations, but our `TaskWaker` type will allow it.

[_destructuring_]: https://doc.rust-lang.org/book/ch19-03-pattern-syntax.html#destructuring-to-break-apart-values
[RFC 2229]: https://github.com/rust-lang/rfcs/pull/2229
[RFC 2229 impl]: https://github.com/rust-lang/rust/issues/53488
[`BTreeMap::entry`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.entry
[`Entry::or_insert_with`]: https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html#method.or_insert_with

A task is finished when it returns `Poll::Ready`. In that case, we remove it from the `tasks` map using the [`BTreeMap::remove`] method. We also remove its cached waker, if it exists.

[`BTreeMap::remove`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.remove

#### Waker Design

The job of the waker is to push the ID of the woken task to the `task_queue` of the executor.
We implement this by creating a new `TaskWaker` struct that stores the task ID and a reference to the `task_queue`:

```rust
// src/task/executor.rs

struct TaskWaker {
    task_id: TaskId,
    task_queue: Arc<ArrayQueue<TaskId>>,
}
```

Since the ownership of the `task_queue` is shared between the executor and wakers, we use the [`Arc`] wrapper type to implement shared reference-counted ownership.

[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html

The implementation of the wake operation is quite simple:

```rust
// src/task/executor.rs

impl TaskWaker {
    fn wake_task(&self) {
        self.task_queue.push(self.task_id).expect("task_queue full");
    }
}
```

We push the `task_id` to the referenced `task_queue`. Since modifications to the [`ArrayQueue`] type only require a shared reference, we can implement this method on `&self` instead of `&mut self`.

##### The `Wake` Trait

In order to use our `TaskWaker` type for polling futures, we first need to convert it to a [`Waker`] instance. This is required because the [`Future::poll`] method takes a [`Context`] instance as an argument, which can only be constructed from the `Waker` type. While we could do this by providing an implementation of the [`RawWaker`] type, it is both simpler and safer to implement the `Arc`-based [`Wake`][wake-trait] trait instead and then use the [`From`] implementations provided by the standard library to construct the `Waker`.

The trait implementation looks like this:

[wake-trait]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html

```rust
// src/task/executor.rs

use alloc::task::Wake;

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.wake_task();
    }

    fn wake_by_ref(self: &Arc<Self>) {
        self.wake_task();
    }
}
```

Since wakers are commonly shared between the executor and the asynchronous tasks, the trait methods require that the `Self` instance is wrapped in the [`Arc`] type, which implements reference-counted ownership. This means that we have to move our `TaskWaker` into an `Arc` in order to call them.
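The `Wake`-based approach can be tried on a hosted target using only the standard library. In this sketch, `QueueWaker` and `make_waker` are hypothetical names, and an `Arc<Mutex<VecDeque<u64>>>` stands in for the shared `Arc<ArrayQueue<TaskId>>`; the conversion itself uses the standard `From<Arc<W>>` implementation for `Waker`.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

/// Hypothetical counterpart of `TaskWaker` for a hosted environment.
struct QueueWaker {
    task_id: u64,
    ready: Arc<Mutex<VecDeque<u64>>>,
}

impl QueueWaker {
    /// Marks the task as ready by pushing its ID to the shared queue.
    fn wake_task(&self) {
        self.ready.lock().unwrap().push_back(self.task_id);
    }
}

impl Wake for QueueWaker {
    fn wake(self: Arc<Self>) {
        self.wake_task();
    }

    fn wake_by_ref(self: &Arc<Self>) {
        self.wake_task();
    }
}

/// Builds a `Waker` for `task_id` that shares the `ready` queue with the caller.
fn make_waker(task_id: u64, ready: Arc<Mutex<VecDeque<u64>>>) -> Waker {
    Waker::from(Arc::new(QueueWaker { task_id, ready }))
}
```

Every call to `wake` or `wake_by_ref` on the resulting `Waker` (or on a clone of it) pushes the task ID to the shared queue. Once all clones of the waker are dropped, the reference count of the queue drops back to that of the caller's handle.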
The difference between the `wake` and `wake_by_ref` methods is that the latter only requires a reference to the `Arc`, while the former takes ownership of the `Arc` and thus often requires an increase of the reference count. Not all types support waking by reference, so implementing the `wake_by_ref` method is optional. However, it can lead to better performance because it avoids unnecessary reference count modifications. In our case, we can simply forward both trait methods to our `wake_task` function, which requires only a shared `&self` reference.

##### Creating Wakers

Since the `Waker` type supports [`From`] conversions for all `Arc`-wrapped values that implement the `Wake` trait, we can now implement the `TaskWaker::new` function that is required by our `Executor::run_ready_tasks` method:

[`From`]: https://doc.rust-lang.org/nightly/core/convert/trait.From.html

```rust
// src/task/executor.rs

impl TaskWaker {
    fn new(task_id: TaskId, task_queue: Arc<ArrayQueue<TaskId>>) -> Waker {
        Waker::from(Arc::new(TaskWaker {
            task_id,
            task_queue,
        }))
    }
}
```

We create the `TaskWaker` using the passed `task_id` and `task_queue`. We then wrap the `TaskWaker` in an `Arc` and use the `Waker::from` implementation to convert it to a [`Waker`]. This `from` method takes care of constructing a [`RawWakerVTable`] and a [`RawWaker`] instance for our `TaskWaker` type. In case you're interested in how it works in detail, check out the [implementation in the `alloc` crate][waker-from-impl].

[waker-from-impl]: https://github.com/rust-lang/rust/blob/cdb50c6f2507319f29104a25765bfb79ad53395c/src/liballoc/task.rs#L58-L87

#### A `run` Method

With our waker implementation in place, we can finally construct a `run` method for our executor:

```rust
// src/task/executor.rs

impl Executor {
    pub fn run(&mut self) -> ! {
        loop {
            self.run_ready_tasks();
        }
    }
}
```

This method just calls the `run_ready_tasks` function in a loop.
While we could theoretically return from the function when the `tasks` map becomes empty, this would never happen since our `keyboard::print_keypresses` task never finishes, so a simple `loop` should suffice. Since the function never returns, we use the `!` return type to mark the function as [diverging] to the compiler.

[diverging]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html

We can now change our `kernel_main` to use our new `Executor` instead of the `SimpleExecutor`:

```rust
// src/main.rs

use blog_os::task::executor::Executor; // new

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // […] initialization routines, including init_heap, test_main

    let mut executor = Executor::new(); // new
    executor.spawn(Task::new(example_task()));
    executor.spawn(Task::new(keyboard::print_keypresses()));
    executor.run();
}
```

We only need to change the import and the type name. Since our `run` function is marked as diverging, the compiler knows that it never returns, so we no longer need a call to `hlt_loop` at the end of our `kernel_main` function.

When we run our kernel using `cargo run` now, we see that keyboard input still works:

![QEMU printing ".....H...e...l...l..o..... ...a..g..a....i...n...!"](qemu-keyboard-output-again.gif)

However, the CPU utilization of QEMU did not get any better. The reason for this is that we still keep the CPU busy the whole time. We no longer poll tasks until they are woken again, but we still check the `task_queue` in a busy loop. To fix this, we need to put the CPU to sleep if there is no more work to do.

#### Sleep If Idle

The basic idea is to execute the [`hlt` instruction] when the `task_queue` is empty. This instruction puts the CPU to sleep until the next interrupt arrives.
The fact that the CPU immediately becomes active again on interrupts ensures that we can still react directly when an interrupt handler pushes to the `task_queue`.

[`hlt` instruction]: https://en.wikipedia.org/wiki/HLT_(x86_instruction)

To implement this, we create a new `sleep_if_idle` method in our executor and call it from our `run` method:

```rust
// src/task/executor.rs

impl Executor {
    pub fn run(&mut self) -> ! {
        loop {
            self.run_ready_tasks();
            self.sleep_if_idle(); // new
        }
    }

    fn sleep_if_idle(&self) {
        if self.task_queue.is_empty() {
            x86_64::instructions::hlt();
        }
    }
}
```

Since we call `sleep_if_idle` directly after `run_ready_tasks`, which loops until the `task_queue` becomes empty, checking the queue again might seem unnecessary. However, a hardware interrupt might occur directly after `run_ready_tasks` returns, so there might be a new task in the queue at the time the `sleep_if_idle` function is called. Only if the queue is still empty do we put the CPU to sleep by executing the `hlt` instruction through the [`instructions::hlt`] wrapper function provided by the [`x86_64`] crate.

[`instructions::hlt`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/fn.hlt.html
[`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/index.html

Unfortunately, there is still a subtle race condition in this implementation. Since interrupts are asynchronous and can happen at any time, it is possible that an interrupt happens right between the `is_empty` check and the call to `hlt`:

```rust
if self.task_queue.is_empty() {
    // <--- interrupt can happen here
    x86_64::instructions::hlt();
}
```

If this interrupt pushes to the `task_queue`, we put the CPU to sleep even though there is now a ready task. In the worst case, this could delay the handling of a keyboard interrupt until the next keypress or the next timer interrupt. So how do we prevent it?
The answer is to disable interrupts on the CPU before the check and atomically enable them again together with the `hlt` instruction. This way, all interrupts that happen in between are delayed until after the `hlt` instruction so that no wakeups are missed. To implement this approach, we can use the [`interrupts::enable_and_hlt`][`enable_and_hlt`] function provided by the [`x86_64`] crate.

[`enable_and_hlt`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.enable_and_hlt.html

The updated implementation of our `sleep_if_idle` function looks like this:

```rust
// src/task/executor.rs

impl Executor {
    fn sleep_if_idle(&self) {
        use x86_64::instructions::interrupts::{self, enable_and_hlt};

        interrupts::disable();
        if self.task_queue.is_empty() {
            enable_and_hlt();
        } else {
            interrupts::enable();
        }
    }
}
```

To avoid race conditions, we disable interrupts before checking whether the `task_queue` is empty. If it is, we use the [`enable_and_hlt`] function to enable interrupts and put the CPU to sleep as a single atomic operation. If the queue is no longer empty, an interrupt woke a task after `run_ready_tasks` returned. In that case, we enable interrupts again and directly continue execution without executing `hlt`.

Now our executor properly puts the CPU to sleep when there is nothing to do. We can see that the QEMU process has a much lower CPU utilization when we run our kernel using `cargo run` again.

#### Possible Extensions

Our executor is now able to run tasks in an efficient way. It utilizes waker notifications to avoid polling waiting tasks and puts the CPU to sleep when there is currently no work to do.
Однако наш исполнитель всё ещё довольно примитивный, и существует множество способов расширить его функциональность: - **Планирование**: Для нашей `task_queue` мы в настоящее время используем тип [`VecDeque`] для реализации стратегии _первый пришёл — первый вышел_ (FIFO), которая часто также называется _круговым_ планированием. Эта стратегия может быть не самой эффективной для произвольной нагрузки. Например, имеет смысл приоритизировать таски, где критична задержка или таски, выполняющие много ввода-вывода. Для получения дополнительной информации смотрите [главу о планировании][scheduling chapter] книги [_Operating Systems: Three Easy Pieces_] или [статью в Википедии о планировании][scheduling-wiki]. - **Создание задач**: Сейчас метод `Executor::spawn` требует ссылки `&mut self`, и поэтому он недоступен после вызова метода `run`. Чтобы это исправить, мы могли бы создать дополнительный тип `Spawner`, который делит какую-то очередь с исполнителем и позволяет создавать задачи изнутри самих задач. Очередь может быть `task_queue` напрямую или отдельной очередью, которую исполнитель проверяет в своём цикле выполнения. - **Использование потоков**: У нас пока нет поддержки потоков, но мы добавим её в следующем посте. Это сделает возможным запуск нескольких экземпляров исполнителя в разных потоках. Преимущество этого подхода заключается в том, что задержка, вызванная длительными задачами, может быть уменьшена, так как другие задачи могут выполняться параллельно. Этот подход также позволяет использовать несколько ядер процессора. - **Балансировка нагрузки**: При добавлении поддержки потоков становится важно быть в курсе, как распределяются задачи между исполнителями, чтобы обеспечить использование всех ядер процессора. Распространённой техникой для этого является [_work stealing_]. 
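Идею из пункта о создании задач можно набросать в обычном коде с `std`. Это лишь эскиз при ряде допущений, а не реализация из этого поста: имена `Spawner`, `MiniExecutor`, `run_until_idle` и `demo` придуманы для примера, очередь разделяется через `Rc<RefCell<VecDeque<_>>>` (одно ядро, без прерываний), а вместо `TaskWaker` используется пустой waker, поэтому ожидающие задачи просто опрашиваются повторно.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

type BoxedTask = Pin<Box<dyn Future<Output = ()>>>;
type SharedQueue = Rc<RefCell<VecDeque<BoxedTask>>>;

/// Пустой waker: в этом эскизе уведомления не используются.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Гипотетический `Spawner`: делит очередь с исполнителем.
#[derive(Clone)]
pub struct Spawner {
    queue: SharedQueue,
}

impl Spawner {
    /// Требует только `&self`, поэтому доступен и изнутри задач.
    pub fn spawn(&self, future: impl Future<Output = ()> + 'static) {
        self.queue.borrow_mut().push_back(Box::pin(future));
    }
}

/// Упрощённый исполнитель, который читает задачи из общей очереди.
pub struct MiniExecutor {
    queue: SharedQueue,
}

impl MiniExecutor {
    pub fn new() -> (Self, Spawner) {
        let queue: SharedQueue = Rc::new(RefCell::new(VecDeque::new()));
        (MiniExecutor { queue: queue.clone() }, Spawner { queue })
    }

    pub fn run_until_idle(&mut self) {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        loop {
            // Заимствование очереди заканчивается до опроса задачи,
            // иначе `spawn` внутри задачи вызвал бы панику `RefCell`.
            let next = self.queue.borrow_mut().pop_front();
            let Some(mut task) = next else { break };
            if task.as_mut().poll(&mut cx).is_pending() {
                self.queue.borrow_mut().push_back(task);
            }
        }
    }
}

/// Пример: задача создаёт другую задачу уже во время работы исполнителя.
pub fn demo() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let (mut executor, spawner) = MiniExecutor::new();

    let (parent_log, child_log) = (log.clone(), log.clone());
    let inner_spawner = spawner.clone();
    spawner.spawn(async move {
        parent_log.borrow_mut().push("parent");
        inner_spawner.spawn(async move {
            child_log.borrow_mut().push("child");
        });
    });

    executor.run_until_idle();
    let entries = log.borrow().clone();
    entries
}
```

В ядре вместо `Rc<RefCell<_>>` понадобилась бы очередь, безопасная для обработчиков прерываний; здесь выбран самый простой вариант, чтобы показать только разделение очереди между `Spawner` и исполнителем.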
[scheduling chapter]: http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-sched.pdf [_Operating Systems: Three Easy Pieces_]: http://pages.cs.wisc.edu/~remzi/OSTEP/ [scheduling-wiki]: https://en.wikipedia.org/wiki/Scheduling_(computing) [_work stealing_]: https://en.wikipedia.org/wiki/Work_stealing ## Итоги Мы начали этот пост с обсуждения **многозадачности** и различий между _вытесняемой_, которая регулярно прерывает выполняющиеся задачи, и _кооперативной_, позволяющей задачам работать до тех пор, пока они не добровольно отдадут управление процессором. Затем мы исследовали, как поддержка Rust **async/await** предоставляет реализацию кооперативной многозадачности на уровне языка. Rust основывает свою реализацию на опросном (polling-based) трейте `Future`, который абстрагирует асинхронные задачи. С использованием async/await возможно работать с futures почти так же, как с обычным синхронным кодом. Разница заключается в том, что асинхронные функции снова возвращают `Future`, который в какой-то момент должен быть добавлен в исполнителя для запуска. За кулисами компилятор преобразует код async/await в _конечный автомат_, при этом каждая операция `.await` соответствует возможной точке остановки. Используя свои знания о программе, компилятор может сохранять только минимальное состояние для каждой точки остановки, что приводит к очень низкому потреблению памяти на задачу. Одной из проблем является то, что сгенерированные автоматы могут содержать _самоссылающиеся_ структуры, например, когда локальные переменные асинхронной функции ссылаются друг на друга. Чтобы избежать недействительных указателей, Rust использует тип `Pin`, чтобы гарантировать, что futures не могут быть перемещены в памяти после их первого опроса. Для нашей **реализации** мы сначала создали очень простой исполнитель, который опрашивает все запущенные задачи в цикле с занятым ожиданием, не используя тип `Waker`. Затем мы продемонстрировали преимущество уведомлений waker, реализовав асинхронную задачу клавиатуры. 
Задача определяет статическую очередь `SCANCODE_QUEUE` на основе неблокирующего типа `ArrayQueue` из библиотеки `crossbeam`. Вместо непосредственной обработки нажатий клавиш обработчик прерываний клавиатуры теперь помещает все полученные скан-коды в очередь, а затем пробуждает зарегистрированный `Waker`, сигнализируя о появлении нового ввода. На принимающей стороне мы создали тип `ScancodeStream`, предоставляющий `Future`, который разрешается в следующий скан-код из очереди. Это позволило создать асинхронную задачу `print_keypresses`, которая с помощью async/await интерпретирует и выводит скан-коды из очереди.

Чтобы использовать уведомления waker и для задачи клавиатуры, мы создали новый тип `Executor`, в котором очередь готовых задач `task_queue` разделяется через `Arc`. Мы реализовали тип `TaskWaker`, который добавляет идентификаторы разбуженных задач непосредственно в эту `task_queue`, после чего исполнитель опрашивает их снова. Чтобы экономить энергию, когда готовых к выполнению задач нет, мы добавили перевод процессора в спящий режим с помощью инструкции `hlt`. Наконец, мы обсудили некоторые возможные расширения нашего исполнителя, например поддержку нескольких ядер.

## Что Далее?

Используя async/await, мы теперь имеем базовую поддержку кооперативной многозадачности в нашем ядре.
Хотя кооперативная многозадачность очень эффективна, она может привести к проблемам с задержкой, когда отдельные задачи выполняются слишком долго, тем самым препятствуя выполнению других задач. По этой причине имеет смысл также добавить поддержку вытесняющей многозадачности в наше ядро. В следующем посте мы введём _потоки_ как наиболее распространённую форму вытесняющей многозадачности. В дополнение к решению проблемы длительных задач, потоки также подготовят нас к использованию нескольких ядер процессора и запуску ненадежных пользовательских программ в будущем. ================================================ FILE: blog/content/edition-2/posts/12-async-await/index.zh-CN.md ================================================ +++ title = "Async/Await" weight = 12 path = "zh-CN/async-await" date = 2020-03-27 [extra] chapter = "Multitasking" # Please update this when updating the translation translation_based_on_commit = "67b3ac65dc735e0e109c2fb23ca18a536b84dc0d" # GitHub usernames of the people that translated this post translators = ["ic3w1ne"] # GitHub usernames of the people that contributed to this translation translation_contributors = [] +++ 在这篇文章中,我们将探索 Rust 的 _协作式多任务处理_ 及 _async/await_ 特性。我们将深入探讨 Rust 中 async/await 的工作原理,包括 `Future` trait 的设计、状态机转换与 _pinning_ 。随后,我们通过创建一个异步键盘任务和基础执行器,为我们的内核添加对 async/await 的基本支持。 这个系列的 blog 在 [GitHub] 上公开开发。如果您遇到任何问题或有疑问,请在这里开一个 issue 来讨论。你也可以在[底部][at the bottom]留下评论。你可以在 [`post-12`][post branch] 找到这篇文章的完整源码。 [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-12 ## 多任务 大多数操作系统的基本功能之一就是[多任务处理][_multitasking_],即同时执行多个任务的能力。例如,当你在查看这篇文章时,可能还打开了其他程序,比如文本编辑器或终端窗口。即便你只打开了一个浏览器窗口,也可能有各种后台任务在管理你的桌面窗口、检查更新或者索引文件。 [_multitasking_]: https://en.wikipedia.org/wiki/Computer_multitasking 虽然看起来所有任务都在同时运行,但实际上单个 CPU 核心一次只能执行单个任务。为了制造任务同时运行的假象,操作系统会在活动任务之间快速切换,使每个任务都能被执行到。由于计算机运行速度极快,我们大多数时候都不会注意到这些切换。 单核 CPU 一次只能执行一个任务,而多核 CPU 能够以真正并行的方式运行多个任务。例如,一个 8 核 CPU 可以同时运行 
8 个任务。我们将在后续文章中介绍如何设置多核 CPU。本文中,为了简单起见,我们将重点讨论单核。(值得注意的是,所有的多核 CPU 都是从单个激活的核心启动的,所以我们现在可以先处理单核 CPU 的情况)。 多任务处理有两种形式:_协作式多任务处理_ 要求任务定期主动让出对 CPU 的控制权,以便其他任务能够运行。_抢占式多任务处理_ 利用操作系统在任意时间点强制暂停线程的能力实现切换线程的功能。在下文中,我们将更详细地探讨这两种多任务处理形式,并讨论它们各自的优势和缺点。 ### 抢占式多任务处理 抢占式多任务处理的核心理念在于由操作系统决定何时进行任务切换。 为此,系统利用了每次中断时可重新获得 CPU 控制权这一机制。这使得系统能在有新输入时立即切换任务,例如当鼠标移动或网络数据包到达时。操作系统还可以通过配置硬件定时器,令其在指定时间后发送中断,从而精确控制每个任务允许运行的时间。 下图展示了硬件中断时的任务切换过程: ![](regain-control-on-interrupt.svg) 第一行中,CPU 正在执行程序 `A` 的任务 `A1` ,所有其他任务均处于暂停状态。在第二行,一个硬件中断到达 CPU 。如[硬件中断][_Hardware Interrupts_]文章所述,CPU 立即停止执行任务 `A1` 并跳转到中断描述符表(IDT)中定义的中断处理程序。通过这个中断处理程序,操作系统重新获得了 CPU 的控制权,这使得它能够切换到任务 `B1` 而非继续原任务 `A1` 。 [_Hardware Interrupts_]: @/edition-2/posts/07-hardware-interrupts/index.md #### 保存状态 任务可能在任意时间点被中断,即使它们可能正处于某些计算过程中。为了稍后能够恢复他们,操作系统必须备份任务的完整状态,包括其[调用栈][call stack]和所有 CPU 寄存器的值。这一过程被称为[上下文切换][_context switch_]。 [call stack]: https://en.wikipedia.org/wiki/Call_stack [_context switch_]: https://en.wikipedia.org/wiki/Context_switch 由于调用栈可能非常大,操作系统通常会为每个线程设置独立的调用栈,而非在每次任务切换时备份调用栈内容。这样一个拥有自己的栈的任务被称为一个 执行线程 [_thread of execution_] 或简称 _线程thread_。为每个任务使用独立的栈,在上下文切换时就只需保存寄存器内容(包括程序计数器和栈指针)。这种方法最大限度地减少了上下文切换的性能开销,这一点非常重要,因为上下文切换每秒可能发生多达100次。 [_thread of execution_]: https://en.wikipedia.org/wiki/Thread_(computing) #### 讨论 抢占式多任务处理的主要优势在于操作系统能够完全控制任务允许执行的时间。这种方式可以确保每个任务公平地获得 CPU 时间份额,而无需依赖任务间的协作。这一特性在运行第三方任务或多个用户共享系统时尤为重要。 抢占式多任务处理的缺点在于每个任务都需要独立的栈空间。相较于共享栈,使用独立栈会导致每个任务占用更多内存,并且通常会限制任务的数量。另一个缺点是操作系统总是需要在每次任务切换时保存完整的 CPU 寄存器状态,即使任务只使用了寄存器的一小部分。 抢占式多任务处理和线程是操作系统的基本组成部分,因为它们使得运行不受信任的用户空间程序成为可能。我们将在后续文章中详细讨论这些概念。不过本文的重点将放在协作式多任务处理上,因为它对于我们的内核来说也足够实用。 ### 协作式多任务处理 与在任意时间点强制暂停运行任务不同,协作式多任务处理让每个任务持续运行,直到它自愿放弃对 CPU 的控制权。这使得任务能够在合适的时机自行暂停,例如当它们需要等待 I/O 操作时。 协作式多任务处理常用于语言层面,比如以 [协程coroutines][coroutines] 或 [async/await] 的形式实现。其核心思想是由程序员或编译器在程序中插入 [yield][_yield_] 操作,这些操作会放弃 CPU 控制权并允许其他任务运行。例如,可以在一个复杂循环的每次迭代后插入yield。 [coroutines]: https://en.wikipedia.org/wiki/Coroutine [async/await]: 
https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html [_yield_]: https://en.wikipedia.org/wiki/Yield_(multithreading) 通常我们会将协作式多任务与 [异步操作asynchronous operations][asynchronous operations] 结合使用。不同于等待操作完成并在此期间阻止其他任务运行,异步操作会在操作未完成时返回 "未就绪"("not ready")状态。在这种情况下,等待中的任务可以执行 yield 操作,让其他任务运行。 [asynchronous operations]: https://en.wikipedia.org/wiki/Asynchronous_I/O #### 保存状态 由于任务自行决定暂停点,它们不需要操作系统来保存其所有的状态,而是自行在暂停前精确地保存恢复所需的状态,这通常会带来更好的性能表现。例如,一个刚刚完成复杂计算的任务可能只需要保存最终结果,而不再需要中间过程。 协作式多任务处理的编程语言级实现甚至能够在暂停前保存调用栈的必要部分。例如,Rust 的 async/await 实现会将所有仍被需要的局部变量存储在一个自动生成的结构体中(如后文所示)通过在暂停前保存调用栈的相关部分,所有任务可以共享单个调用栈,这使得每个任务的内存消耗大幅降低。从而实现创建任意数量的协作式任务并且不会耗尽内存。 #### 讨论 协作式多任务处理的缺点在于,一个不愿意主动暂停的任务可能会长时间占用处理器资源。比如,恶意或有缺陷的任务可能会阻止其他任务运行,并且会拖慢甚至阻塞整个系统。因此,协作式多任务处理应仅在确保所有任务都会协作的情况下使用。让操作系统依赖于任意用户级程序的协作并不是一个好主意。 然而,协作式多任务处理在性能和内存方面的显著优势,使其成为适合在程序内部使用的好方法,特别是与异步操作结合使用。操作系统内核作为与异步硬件交互的性能关键程序,采用协作式多任务处理似乎是一种实现并发的理想方式。 ## Rust 中的 Async/Await Rust 语言为协作式多任务处理提供了一流的支持,其实现形式是 async/await。在我们探讨 async/await 的概念及其工作原理之前,需要先理解 Rust 中 _futures_ 和异步编程的运作机制。 ### Futures 一个 _future_ 代表一个可能尚未就绪的值。例如,这个值可以是一个由其他任务计算得出的整数,或从网络下载的文件。futures 使得程序可以继续执行,直到需要该值时再处理,而非在原地等待直到它可用。 #### 示例 futures 的概念可以通过一个小例子说明: ![Sequence diagram: main calls `read_file` and is blocked until it returns; then it calls `foo()` and is also blocked until it returns. The same process is repeated, but this time `async_read_file` is called, which directly returns a future; then `foo()` is called again, which now runs concurrently with the file load. 
The file is available before `foo()` returns.](async-example.svg)

该序列图展示了一个 `main` 函数,它从文件系统中读取文件,然后调用 `foo` 函数。这个过程会重复两次:一次使用同步的 `read_file` 调用,另一次使用异步的 `async_read_file` 调用。

使用同步调用时,`main` 函数需要等待文件从文件系统中加载完成后才能调用 `foo` 函数。

通过异步的 `async_read_file` 调用,文件系统会直接返回一个 future 并在后台异步加载文件。这使得 `main` 函数能够更早地调用 `foo`,然后 `foo` 会与文件加载并行运行。在这个例子中,文件加载甚至在 `foo` 返回前就完成了,因此 `main` 在 `foo` 返回后无需等待就能直接处理文件。

#### Rust 中的 Future

在 Rust 中,future 由 [`Future`] trait 表示,其定义如下:

[`Future`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html

```rust
pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>;
}
```

[关联类型][associated type] `Output` 用于指定异步值的类型。例如,上图中的 `async_read_file` 函数将返回一个 `Future` 实例,其 `Output` 被设置为 `File`。

[associated type]: https://doc.rust-lang.org/book/ch20-02-advanced-traits.html#associated-types

[`poll`] 方法可用于检查值是否已就绪。它返回一个 [`Poll`] 枚举,其定义如下:

[`poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll
[`Poll`]: https://doc.rust-lang.org/nightly/core/task/enum.Poll.html

```rust
pub enum Poll<T> {
    Ready(T),
    Pending,
}
```

当值已可用时(例如文件已从磁盘完全读取),它会被包装在 `Ready` 变体中返回。否则返回 `Pending` 变体,向调用者表明该值尚不可用。

`poll` 方法接收两个参数:`self: Pin<&mut Self>` 和 `cx: &mut Context`。其中,前者的行为类似于普通的 `&mut self` 引用,不同之处在于 `Self` 值被 [_pinned_] 在其内存位置。如果不了解 async/await 的工作原理,那么理解 `Pin` 的原理和必要性会变得很困难。因此我们会在后文中详细解释。

[_pinned_]: https://doc.rust-lang.org/nightly/core/pin/index.html

参数 `cx: &mut Context` 的作用是传递一个 [`Waker`] 实例给异步任务,例如文件系统加载。这个 `Waker` 允许异步任务发出信号来表明它已全部或者部分完成,例如文件已从磁盘加载完成。由于主任务知道它会在 `Future` 就绪时收到通知,因此它不需要反复调用 `poll` 方法进行轮询。我们将在本文后面实现自己的 waker 类型时更详细地解释这个过程。

[`Waker`]: https://doc.rust-lang.org/nightly/core/task/struct.Waker.html

### 使用 Future 进行开发

我们现在已经了解了 Future 的定义,并理解了 `poll` 方法背后的基本理念。然而,我们仍然不知道如何高效地使用 Future。问题在于 Future 表示异步任务的结果,而这些结果可能还不可用。但在实际应用中,我们经常需要直接使用这些值进行后续计算。那么问题来了,当需要时,应该如何高效地获取 Future 的值?
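上文介绍的 `Future` trait、`Poll` 枚举和 `Waker` 可以用一个能在普通 `std` 环境下运行的小例子串起来。以下只是一个示意性的草图:`Countdown`、`CountingWaker` 和 `poll_countdown` 都是为说明而假设的名称,并非本文内核代码的一部分;其中的轮询循环也只是为了观察 `poll` 的返回值。

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// 假设性的示例 future:前 `remaining` 次轮询返回 `Pending`,之后返回 `Ready`。
pub struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            // 通过 waker 通知调用方:可以再次轮询了
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// 假设性的 waker:只统计被唤醒的次数。
struct CountingWaker {
    wakeups: AtomicUsize,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wakeups.fetch_add(1, Ordering::SeqCst);
    }
}

/// 手动轮询 `Countdown`,返回(轮询次数, 唤醒次数, 结果)。
pub fn poll_countdown(start: u32) -> (usize, usize, &'static str) {
    let counter = Arc::new(CountingWaker { wakeups: AtomicUsize::new(0) });
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(Countdown { remaining: start });

    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return (polls, counter.wakeups.load(Ordering::SeqCst), value);
        }
    }
}
```

例如,`poll_countdown(2)` 会先得到两次 `Poll::Pending`(每次都伴随一次唤醒通知),第三次轮询才得到 `Poll::Ready("done")`。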
#### 等待 Future 就绪

一种可能的解决方案是等待 Future 变为就绪状态。具体实现可能如下所示:

```rust
let future = async_read_file("foo.txt");
let file_content = loop {
    match future.poll(…) {
        Poll::Ready(value) => break value,
        Poll::Pending => {}, // 什么都不做
    }
};
```

在这里,我们通过在循环中反复调用 `poll` 来 _主动_ 等待 future。`poll` 的参数在此处并不重要,因此我们将其省略。虽然这种解决方案可行,但效率非常低下,因为它会让 CPU 持续忙碌直到值变得可用。

更高效的方法可能是 _阻塞_ 当前线程,直到 future 值变得可用。当然,这只有在拥有线程的情况下才可能实现,因此该解决方案不适用于我们的内核,至少目前还不适用。即使在支持阻塞的系统上,这种方式通常也不被推荐,因为它会将异步任务再次转变为同步任务,从而抑制了并行任务潜在的性能优势。

#### Future 组合器

等待的替代方案是使用 future 组合器(future combinators)。Future 组合器是像 `map` 这样的方法,可以将多个 future 链式组合在一起,类似于 [`Iterator`] trait 的方法。这些组合器不会等待 future 完成,而是返回一个新的 future,该 future 会在 `poll` 时应用映射操作。

[`Iterator`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html

例如,一个简单的 `string_len` 组合器,用于将 `Future<Output = String>` 转换为 `Future<Output = usize>`,可以这样实现:

```rust
struct StringLen<F> {
    inner_future: F,
}

impl<F> Future for StringLen<F> where F: Future<Output = String> {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.inner_future.poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(string: impl Future<Output = String>)
    -> impl Future<Output = usize>
{
    StringLen {
        inner_future: string,
    }
}

// 用法
fn file_len() -> impl Future<Output = usize> {
    let file_content_future = async_read_file("foo.txt");
    string_len(file_content_future)
}
```

这段代码不完全能工作,因为它没有处理 [_pinning_] 问题,但作为示例已经足够。其基本思路是,`string_len` 函数将给定的 `Future` 实例包装到一个新的 `StringLen` 结构体,这个结构体同样也实现了 `Future`。当这个包装后的 future 被轮询时,它会轮询内部 future。如果值还不可用,包装后的 future 也会返回 `Poll::Pending`。如果值可用,就从 `Poll::Ready` 变体中把字符串提取出来并计算其长度。随后,长度会被重新包装进 `Poll::Ready` 并返回。

[_pinning_]: https://doc.rust-lang.org/stable/core/pin/index.html

通过这个 `string_len` 函数,我们无需等待就能计算异步字符串的长度。由于该函数再次返回一个 `Future`,调用者无法直接使用返回值,而是需要再次使用组合器函数。这样一来,整个调用链就变成了异步的,我们可以在某些节点(例如 main 函数中)高效地同时等待多个 future。

由于手动编写组合器函数较为困难,它们通常由库提供。虽然 Rust 标准库本身尚未提供组合器函数,但半官方的(且兼容 `no_std` 的)[`futures`] crate 提供了这些功能。其 [`FutureExt`] trait 提供了诸如 [`map`] 或 [`then`] 等高级组合器方法,可使用任意闭包来操作结果。

[`futures`]:
https://docs.rs/futures/0.3.4/futures/ [`FutureExt`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html [`map`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.map [`then`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.then ##### 优势 future 组合器的最大优势在于它们能保持操作的异步性。在与异步 I/O 接口结合使用时,这种方法可以实现极高的性能。 future 组合器以普通结构体配合 trait 的方式实现,使得编译器能够对其进行深度优化。更多详情请参阅 [_Rust中的零成本 futures_][_Zero-cost futures in Rust_] 文章,它宣布了 futures 被加入 Rust 生态系统的消息。 [_Zero-cost futures in Rust_]: https://aturon.github.io/blog/2016/08/11/futures/ ##### 缺点 {#drawbacks} 虽然 future 组合器能够编写出非常高效的代码,但在某些情况下,由于类型系统和基于闭包的接口,它们可能变得难以使用。例如,考虑如下代码: ```rust fn example(min_len: usize) -> impl Future { async_read_file("foo.txt").then(move |content| { if content.len() < min_len { Either::Left(async_read_file("bar.txt").map(|s| content + &s)) } else { Either::Right(future::ready(content)) } }) } ``` ([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=91fc09024eecb2448a85a7ef6a97b8d8)) 这里我们读取 `foo.txt` 文件,然后使用 `then` 组合器根据文件内容链接第二个future。如果内容长度小于给定的 `min_len`,我们会读取另一个文件 `bar.txt` 并将其追加到 `content` ,否则仅返回 `foo.txt` 的内容。 我们需要对传递给 `then` 的闭包使用 [move 关键字][`move` keyword],否则 `min_len` 中会出现生命周期错误。使用 [`Either`] 包装器的原因是 `if` 和 `else` 代码块必须始终保持相同的类型。由于我们在代码块中返回了不同的 future 类型,必须使用包装器类型将它们统一为单一类型。[`ready`] 函数将一个值包装成立刻可用的 future。这里需要该函数是因为 `Either` 包装器要求被包装的值必须实现 Future。 [`move` keyword]: https://doc.rust-lang.org/std/keyword.move.html [`Either`]: https://docs.rs/futures/0.3.4/futures/future/enum.Either.html [`ready`]: https://docs.rs/futures/0.3.4/futures/future/fn.ready.html 可以想象,对于大型项目来说,这很快就会导致代码变得复杂。特别是涉及借用和不同的生命周期时,情况会变得更加复杂。正因如此,大量工作被投入到为 Rust 添加 async/await 支持中,来让异步代码编写起来更简单。 ### Async/Await 异步/等待模式 async/await 的设计理念是让程序员编写 _看似普通_ 的同步代码,但由编译器转换为异步代码。它基于 `async` 和 `await` 两个关键字运作。`async` 关键字可用于函数签名中来将一个同步函数转换为返回 future 的异步函数: ```rust async fn foo() -> u32 { 0 } // 上述代码大致被编译器转换成 fn foo() -> impl Future { 
future::ready(0) } ``` 只有这个关键字本身看起来不太有用。然而,在 `async` 函数内部,`await` 关键字可用于获取一个 future 的异步值: ```rust async fn example(min_len: usize) -> String { let content = async_read_file("foo.txt").await; if content.len() < min_len { content + &async_read_file("bar.txt").await } else { content } } ``` ([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=d93c28509a1c67661f31ff820281d434)) 此函数直接转换自[上文](#drawbacks)中使用组合函数的 `example` 函数。通过使用 `.await` 运算符,我们无需任何闭包或者 `Either` 类型就可以直接获取 future 的值。于是我们就可以像写普通的同步代码一样编写代码,只不过 _这实际上是异步代码_。 #### 状态机转换 在底层,编译器将 `async` 函数体转换为一个 [状态机][_state machine_] ,每次调用 `.await` 都代表一个不同的状态。对于上述 `example` 函数,编译器会创建一个包含以下四种状态的状态机: [_state machine_]: https://en.wikipedia.org/wiki/Finite-state_machine ![Four states: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-states.svg) 每个状态代表函数执行过程中的不同暂停点。_"Start"_ 和 _"End"_ 状态分别表示函数执行的开端和终止。_"Waiting on foo.txt"_ 状态表示该函数当前正在等待第一个 `async_read_file` 的结果。类似的,_"Waiting on bar.txt"_ 状态表示该函数正在等待第二个 `async_read_file` 的结果。 状态机使用 `poll` 调用来触发可能的状态转换,从而实现 `Future` trait: ![Four states and their transitions: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-basic.svg) 该图表使用箭头表示状态转换,菱形表示条件分支。例如,如果 `foo.txt` 文件尚未就绪,则会选择标记为 _"no"_ 的分支,到达 _"Waiting on foo.txt"_ 状态。否则,将执行 _"yes"_ 分支。那个小的无标注的红色菱形代表 `example` 函数中 `if content.len() < 100` 分支。 我们看到第一次 `poll` 调用启动了该函数并让其运行,直到遇到一个尚未可用的 future。如果路径上所有 future 都已就绪,函数可以一直运行到 _"End"_ 状态,此时它会返回包裹在 `Poll::Ready` 中的结果。否则,状态机将进入等待状态并且返回 `Poll::Pending`。在下一次 `poll` 调用时,状态机将从上次的等待状态中恢复并尝试之前的操作。 #### 保存状态 为了能够从上一个等待状态继续执行,状态机必须在内部跟踪当前状态。此外,它还必须保存所有在下一次 `poll` 调用时继续执行所需的变量。这正是编译器可以大显身手的地方:因为它知道哪些变量在何时被使用,当需要时,它能自动生成包含这些确切变量的结构体。 例如,编译器会为上述 `example` 函数生成类似以下的结构体: ```rust // 这里再写一遍 `example` 函数,方便阅读 async fn example(min_len: usize) -> String { let content = async_read_file("foo.txt").await; if content.len() < min_len { content + &async_read_file("bar.txt").await } else { content } } // 由编译器生成的状态结构体: struct 
StartState { min_len: usize, } struct WaitingOnFooTxtState { min_len: usize, foo_txt_future: impl Future, } struct WaitingOnBarTxtState { content: String, bar_txt_future: impl Future, } struct EndState {} ``` 在 "start" 和 _"Waiting on foo.txt"_ 状态下,需要存储 `min_len` 参数以供稍后与 `content.len()` 进行比较。 _"Waiting on foo.txt"_ 状态额外存储了一个 `foo_txt_future` ,它代表 `async_read_file` 调用返回的 future。这个 future 在状态机继续运行时需要再次被轮询,因此需要保存它。 _"Waiting on bar.txt"_ 状态包含用于后续 `bar.txt` 准备就绪时字符串拼接的 `content` 变量。它还存储了一个 `bar_txt_future` ,用于表示正在加载中的 `bar.txt` 。 该结构体不再包含 `min_len` 变量,因为在 `content.len()` 比较之后就不再需要它。在 _"end"_ 状态,不会存储任何变量,因为函数已经运行完毕。 请注意,这只是编译器可能生成的代码示例。结构体名称和字段布局的实现细节可能会有所不同。 #### 完整状态机类型 虽然编译器生成的具体代码属于实现细节,但想象一下为 `example` 函数生成的状态机 _可能_ 是什么样子有助于理解。我们已经定义了表示不同状态并包含所需变量的结构体。为了基于它们创建一个状态机,我们可以将它们组合成一个 [`enum`]: [`enum`]: https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html ```rust enum ExampleStateMachine { Start(StartState), WaitingOnFooTxt(WaitingOnFooTxtState), WaitingOnBarTxt(WaitingOnBarTxtState), End(EndState), } ``` 我们为每个状态定义独立的枚举变体,并为每个变体添加对应的状态结构体作为字段。为实现状态转换,编译器会生成一个基于 `example` 函数的 `Future` trait: ```rust impl Future for ExampleStateMachine { type Output = String; // `example` 的返回类型 fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll { loop { match self { // TODO: 处理 pinning ExampleStateMachine::Start(state) => {…} ExampleStateMachine::WaitingOnFooTxt(state) => {…} ExampleStateMachine::WaitingOnBarTxt(state) => {…} ExampleStateMachine::End(state) => {…} } } } } ``` 该 future 的 `Output` 类型是 `String` ,因为它是 `example` 函数的返回类型。为了实现 `poll` 函数,我们在 `loop` 内对当前状态使用 `match` 语句。其核心思想是只要可能就切换到下一个状态,并在无法继续时使用显式的 `return Poll::Pending` 。 为简单起见,我们仅展示简化代码,不处理 [_pinning_]、所有权、生命周期等问题。因此,当前及后续代码应视为伪代码,不可直接使用。当然,编译器实际生成的代码会正确处理所有情况,尽管实现方式可能有所不同。 为保持代码片段简洁,我们将分别展示每个 `match` 分支的代码。让我们从 `Start` 状态开始: ```rust ExampleStateMachine::Start(state) => { // 来自 `example` 函数体 let foo_txt_future = async_read_file("foo.txt"); // `.await` 运算符 let state = WaitingOnFooTxtState { min_len: state.min_len, 
foo_txt_future, }; *self = ExampleStateMachine::WaitingOnFooTxt(state); } ``` 状态机在函数刚开始时为 `Start` 状态。在这种情况下,我们会执行 `example` 函数体内的所有代码,直到遇到第一个 `.await` 为止。为了处理 `.await` 运算符,我们会将 `self` 状态机的状态设置为 `WaitingOnFooTxt`,其中包括了 `WaitingOnFooTxtState` 结构体的构建。 由于 `match self {…}` 语句是在循环中执行的,执行流程会跳转到之后的 `WaitingOnFooTxt` 分支: ```rust ExampleStateMachine::WaitingOnFooTxt(state) => { match state.foo_txt_future.poll(cx) { Poll::Pending => return Poll::Pending, Poll::Ready(content) => { // 来自 `example` 函数体 if content.len() < state.min_len { let bar_txt_future = async_read_file("bar.txt"); // `.await` 运算符 let state = WaitingOnBarTxtState { content, bar_txt_future, }; *self = ExampleStateMachine::WaitingOnBarTxt(state); } else { *self = ExampleStateMachine::End(EndState); return Poll::Ready(content); } } } } ``` 在这个 `match` 分支中,我们首先调用 `foo_txt_future` 的 `poll` 函数。如果它尚未就绪,我们直接退出循环并返回 `Poll::Pending` 。由于此时 `self` 仍处于 `WaitingOnFooTxt` 状态,状态机的下一次 `poll` 调用将进入相同的 `match` 分支,并重新尝试轮询 `foo_txt_future`。 当 `foo_txt_future` 就绪时,我们将结果赋值给 `content` 变量并继续执行 `example` 函数的代码:如果 `content.len()` 小于状态结构体中保存的 `min_len` 则异步读取 `bar.txt` 文件。我们再次将 `.await` 操作转换为状态变更,这次变更为 `WaitingOnBarTxt` 状态。由于我们是在循环中执行 `match` 操作,执行流程会直接跳转到新状态对应的 `match` 分支继续处理。其中会对 `bar_txt_future` 进行轮询。 若进入 `else` 分支,则不会发生进一步的 `.await` 操作。我们到达函数末尾并返回包裹在 `Poll::Ready` 中的 `content` 。同时将当前状态更改为 `End` 状态。 `WaitingOnBarTxt` 状态的代码如下所示: ```rust ExampleStateMachine::WaitingOnBarTxt(state) => { match state.bar_txt_future.poll(cx) { Poll::Pending => return Poll::Pending, Poll::Ready(bar_txt) => { *self = ExampleStateMachine::End(EndState); // 来自 `example` 函数体 return Poll::Ready(state.content + &bar_txt); } } } ``` 与 `WaitingOnFooTxt` 状态类似,我们首先轮询 `bar_txt_future` 。如果它仍然处于 pending 状态,我们退出循环并返回 `Poll::Pending` 。否则,我们可以执行 `example` 函数最后的操作:拼接 `content` 以及 future 的返回值。我们将状态机更新为 `End` 状态,然后返回包装在 `Poll::Ready` 中的结果。 最终,`End` 状态的代码如下所示: ```rust ExampleStateMachine::End(_) => { panic!("poll called after Poll::Ready was returned"); } ``` Futures 在返回 
`Poll::Ready` 后不应再次轮询,所以在处于 `End` 状态时发生 `poll` 调用,则直接 panic。 我们现在已经了解了编译器生成的状态机及其对 Future 的实现 _可能_ 的样子。实际上,编译器是以另一种方式生成代码的。(如果你感兴趣的话:这个实现当前基于 [协程][_coroutines_],但这仅仅是可行的细节实现之一。) [_coroutines_]: https://doc.rust-lang.org/stable/unstable-book/language-features/coroutines.html 拼图的最后一块是为 `example` 函数本身生成的代码。记住,函数签名是这样定义的: ```rust async fn example(min_len: usize) -> String ``` 由于完整函数体现在已由状态机实现,唯一需要该函数完成的是初始化状态机并返回它。生成的代码可能如下所示: ```rust fn example(min_len: usize) -> ExampleStateMachine { ExampleStateMachine::Start(StartState { min_len, }) } ``` 该函数不再具有 `async` 修饰符,因为它现在显式返回一个实现了 `Future` trait 的 `ExampleStateMachine` 类型。正如预期的那样,这个状态机构建出来处于 `Start` 状态,并使用 `min_len` 参数初始化对应的状态结构体。 请注意,此函数不会启动状态机的执行。这是 Rust 中 future 的一个基本设计决策:在首次被轮询之前,它们不会执行任何操作。 ### Pinning 在本文中我们已经多次提到了 _固定_ (pinning),现在终于可以深入探讨什么是固定以及为什么需要它。 #### 自引用结构体 如上所述,状态机转换会把每个暂停点的局部变量存储在结构体中。对于 `example` 函数这样的小例子,这很直接且不会导致任何问题。但当变量相互引用时,情况就变得复杂了。例如,考虑以下函数: ```rust async fn pin_example() -> i32 { let array = [1, 2, 3]; let element = &array[2]; async_write_file("foo.txt", element.to_string()).await; *element } ``` 该函数创建了一个包含元素 `1`, `2`, 和 `3` 的小型 `array`。然后它创建对最后一个数组元素的引用,并将其存储在 `element` 变量中。接着,它异步地将数字转换为字符串并写入 `foo.txt` 文件。最后,它返回由 `element` 引用的数字。 由于该函数使用了单个 `await` 操作,生成的状态机包含三个状态:开始、结束和"等待写入"。该函数不接受参数,因此开始状态的结构体为空。如前所述,结束状态的结构体为空,因为函数在此处已执行完毕。"等待写入"状态的结构体则更为有趣: ```rust struct WaitingOnWriteState { array: [1, 2, 3], element: 0x1001c, // 最后一个数组元素的地址 } ``` 我们需要同时保存 `array` 数组和 `element` 变量,因为 `element` 对于返回值是必需的,而 `array` 被 `element` 引用。由于 `element` 是一个引用,它存储了一个 _指针_ (即内存地址)指向被引用的元素。这里我们以 `0x1001c` 为例。实际上,它就是 `array` 字段最后一个元素的地址,因此其值取决于结构体在内存中的位置。具有这种内部指针的结构体被称为 _自引用结构体_ (_self-referential_ ),因为它们通过其中某个字段引用了自身。 #### 自引用结构体的问题 我们自引用结构体的内部指针引出了一个根本性问题,当我们查看其内存布局时,这一点变得显而易见: ![array at 0x10014 with fields 1, 2, and 3; element at address 0x10020, pointing to the last array element at 0x1001c](self-referential-struct.svg) `array` 字段起始于地址 0x10014,`element` 元素字段位于地址 0x10020。它指向地址 
0x1001c,因为最后一个数组元素位于此地址。此时一切正常。然而,当我们把这个结构体移动到不同的内存地址时就会出现问题: ![array at 0x10024 with fields 1, 2, and 3; element at address 0x10030, still pointing to 0x1001c, even though the last array element now lives at 0x1002c](self-referential-struct-moved.svg) 我们将结构体稍微移动了一下,现在它从地址 `0x10024` 开始。这种情况可能发生在当我们把结构体作为函数参数传递时,或将其赋值给不同的栈变量时。问题在于,即使最后一个 `array` 元素已经移动,`element` 字段仍然指向地址 `0x1001c` ,然而实际上该元素现在位于地址 `0x1002c`。因此这个指针变成悬垂指针,导致在下一次 `poll` 调用时出现未定义行为。 #### 可能的解决方案 解决悬垂指针问题有三种基本方法: * **移动时更新指针:** 其理念是每次结构体在内存中移动时都更新内部指针,从而使其保持有效。遗憾的是,这种方法需要对 Rust 进行大量修改。这可能导致巨大的性能损失,因为需要某种运行时机制来跟踪所有结构体的字段类型,并在每次移动操作时检查是否需要更新指针。 * **存储偏移量而非自引用:** 为避免更新指针,编译器可以尝试将自引用存储为相对于结构体起始位置的偏移量。例如,上述 `WaitingOnWriteState` 结构体中的 `element` 字段可以存储为一个值为 8 的 `element_offset` 字段,因为引用点指向的数组元素在结构体起始位置之后 8 字节处。由于偏移量结构体被移动时保持不变,没有字段需要更新。这种方法的问题在于需要编译器检测所有自引用。这在编译时无法实现,因为引用的值可能取决于用户输入,因此就又需要一个运行时系统来分析引用并正确创建状态结构体。这不仅会导致运行时开销,还会影响某些编译器优化,从而再次造成较大的性能损失。 * **禁止移动结构体:** 如上所述,只有在内存中移动结构体时才会出现悬垂指针。通过完全禁止对自引用结构体的移动操作就可以避免这个问题。这种方法的最大优势在于它能够在类型系统层面实现,无需额外的运行时开销。缺点是它将处理可能移动的自引用结构体的责任交给了程序员。 Rust 选择了第三种解决方案,这源于其提供 _零成本抽象_ 的原则,即抽象不应带来额外的运行时开销。_pinning_ API 正是为此目的而在 [RFC 2349](https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md) 中提出的。接下来,我们将简要概述这个API,并解释它如何与 async/await 和 futures 协同工作。 #### 堆上的值 首先注意到,[堆分配的][heap-allocated] 值在大多数情况下已经拥有固定的内存地址。它们通过调用 `allocate` 来创建,由一个指针类型比如 `Box` 来引用。虽然可以移动指针类型,但指针所指向的堆值在内存中的地址保持不变,除非调用 `deallocate` 将其释放。 [heap-allocated]: @/edition-2/posts/10-heap-allocation/index.md 使用堆分配,我们可以尝试创建一个自引用结构体: ```rust fn main() { let mut heap_value = Box::new(SelfReferential { self_ptr: 0 as *const _, }); let ptr = &*heap_value as *const SelfReferential; heap_value.self_ptr = ptr; println!("heap value at: {:p}", heap_value); println!("internal reference: {:p}", heap_value.self_ptr); } struct SelfReferential { self_ptr: *const Self, } ``` ([Try it on the playground][playground-self-ref]) [playground-self-ref]: 
https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=ce1aff3a37fcc1c8188eeaf0f39c97e8

我们创建了一个名为 `SelfReferential` 的简单结构体,它包含一个单独的指针字段。首先,我们使用空指针初始化此结构体,然后通过 `Box::new` 在堆上分配内存存储它。接下来确定堆分配结构体的内存地址并将其存储在 `ptr` 变量中。最后,通过将 `ptr` 变量赋值给 `self_ptr` 字段使结构体形成自引用。

当我们在 playground 上执行这段代码时,可以看到堆值的地址与其内部指针是相等的,这意味着 `self_ptr` 字段是一个有效的自引用。由于 `heap_value` 变量仅是一个指针,移动它(例如传递给函数)并不会改变结构体自身的地址,因此即使指针被移动,`self_ptr` 仍保持有效。

然而,仍有一种方式可以破坏这个示例:我们可以从 `Box<T>` 将结构体移出或替换其内容:

```rust
let stack_value = mem::replace(&mut *heap_value, SelfReferential {
    self_ptr: 0 as *const _,
});
println!("value at: {:p}", &stack_value);
println!("internal reference: {:p}", stack_value.self_ptr);
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=e160ee8a64cba4cebc1c0473dcecb7c8))

这里我们使用 [`mem::replace`] 函数将堆分配的值替换为一个新的结构体实例。这样我们就可以将原始的 `heap_value` 移动到栈上,而结构体的 `self_ptr` 字段此时变成了一个悬垂指针,仍然指向旧的堆地址。当你尝试在 playground 上运行示例时,会看到打印的 _"value at:"_ 和 _"internal reference:"_ 行确实显示了不同的指针。因此仅对值进行堆分配并不足以确保自引用安全。

[`mem::replace`]: https://doc.rust-lang.org/nightly/core/mem/fn.replace.html

导致上述破坏的根本问题是 `Box<T>` 允许我们获取堆分配值的 `&mut T` 引用。有了这个 `&mut T` 引用,就可以使用诸如 [`mem::replace`] 或者 [`mem::swap`] 这样的方法使堆分配的值失效。为解决此问题,我们必须防止创建指向自引用结构体的 `&mut` 引用。

[`mem::swap`]: https://doc.rust-lang.org/nightly/core/mem/fn.swap.html

#### `Pin<Box<T>>` 与 `Unpin`

固定(pinning)API 通过 [`Pin`] 包装类型以及 [`Unpin`] trait 提供了解决 `&mut T` 问题的方案。这些类型背后的理念是,将 `Pin` 中所有能获取被包装值的 `&mut` 引用的方法(例如 [`get_mut`][pin-get-mut] 或 [`deref_mut`][pin-deref-mut])都限制为仅在实现了 `Unpin` trait 的类型上可用。`Unpin` trait 是一个 [_auto trait_],会自动为所有类型实现,除了那些明确选择不实现的类型。通过让自引用结构体不实现 `Unpin`,我们就无法(安全地)从 `Pin<Box<T>>` 类型中获取它们的 `&mut T`,从而保证它们内部的自引用保持有效。

[`Pin`]: https://doc.rust-lang.org/stable/core/pin/struct.Pin.html
[`Unpin`]: https://doc.rust-lang.org/nightly/std/marker/trait.Unpin.html
[pin-get-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_mut
[pin-deref-mut]:
https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.deref_mut [_auto trait_]: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits 举个例子,让我们更新上面的 `SelfReferential` 类型来让其不实现 `Unpin`: ```rust use core::marker::PhantomPinned; struct SelfReferential { self_ptr: *const Self, _pin: PhantomPinned, } ``` 我们通过添加第二个类型为 [`PhantomPinned`] 的 `_pin` 字段来选择退出。该类型是零大小的标记类型,仅用于防止自动实现 `Unpin` trait。根据 [_auto trait_] 的工作原理,当某个字段不是 `Unpin` 时,就足以使整个结构体不实现 `Unpin` trait。 [`PhantomPinned`]: https://doc.rust-lang.org/nightly/core/marker/struct.PhantomPinned.html 第二步是将示例中的 `Box` 类型更改为 `Pin>` 类型。最简单的方法是使用 [`Box::pin`] 函数而非 [`Box::new`] 来创建堆分配的值: [`Box::pin`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.pin [`Box::new`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.new ```rust let mut heap_value = Box::pin(SelfReferential { self_ptr: 0 as *const _, _pin: PhantomPinned, }); ``` 除了将 `Box::new` 改为 `Box::pin` 外,我们还需要在结构体初始化器中添加新的 `_pin` 字段。由于 `PhantomPinned` 是零大小类型,我们只要有其类型名称即可完成初始化。 当我们现在[尝试运行调整后的示例](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=961b0db194bbe851ff4d0ed08d3bd98a)时,发现它会报错: ``` error[E0594]: cannot assign to data in dereference of `Pin>` --> src/main.rs:10:5 | 10 | heap_value.self_ptr = ptr; | ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot assign | = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin>` error[E0596]: cannot borrow data in dereference of `Pin>` as mutable --> src/main.rs:16:36 | 16 | let stack_value = mem::replace(&mut *heap_value, SelfReferential { | ^^^^^^^^^^^^^^^^ cannot borrow as mutable | = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin>` ``` 这两个错误的发生是因为 `Pin>` 类型不再实现 `DerefMut` trait。这正是我们想要的,因为 `DerefMut` trait 会返回一个 `&mut` 引用,而这正是我们想要避免的。这种情况之所以发生,仅仅是因为我们同时选择了不实现 `Unpin` 并将 `Box::new` 改为 `Box::pin`。 现在的问题是,编译器不仅阻止了第16行中的类型移动,还禁止在第10行初始化 `self_ptr` 
字段。这是因为编译器无法区分 `&mut` 引用的有效和无效使用。要使初始化正常工作,我们必须使用不安全的 [`get_unchecked_mut`] 方法: [`get_unchecked_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_unchecked_mut ```rust // 安全,因为修改一个字段不会移动整个结构体 unsafe { let mut_ref = Pin::as_mut(&mut heap_value); Pin::get_unchecked_mut(mut_ref).self_ptr = ptr; } ``` ([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=b9ebbb11429d9d79b3f9fffe819e2018)) `get_unchecked_mut` 函数工作于 `Pin<&mut T>` 之上,而非 `Pin>` ,因此我们必须使用 [`Pin::as_mut`] 转换值。然后我们可以通过 `get_unchecked_mut` 返回的 `&mut` 引用来设置 `self_ptr` 字段。 [`Pin::as_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.as_mut 现在剩下的唯一错误就是 `mem::replace` 上的预期错误了。这个操作试图将堆分配的值移动到栈上,这会破坏存储在 `self_ptr` 字段的自引用。通过阻止自动实现 `Unpin` 并采用 `Pin>` ,我们可以让编译器阻止此类操作并安全地处理自引用结构体。正如我们所看到的,编译器(目前)还无法证明创建自引用是安全的,因此我们需要使用 unsafe 代码块自行验证其正确性。 #### 栈上的Pinning与 `Pin<&mut T>` 在上一节中,我们学习了如何使用 `Pin>` 安全地创建堆分配的自引用值。虽然这种方法效果良好且相对安全(除了不安全的构造过程外),但所需的堆分配会带来性能开销。由于 Rust 致力于尽可能实现零成本抽象,pinning API 也允许创建指向栈上值的 `Pin<&mut T>` 实例。 与拥有被包装值的所有权的 `Pin>` 实例不同, `Pin<&mut T>` 实例仅临时借用所包装的值。这使得情况更加复杂,因为它要求程序员自行提供额外的保证。最重要的是,一个 `Pin<&mut T>` 必须在被引用的 `T` 的整个生命周期内保持固定,这一点对于基于栈的变量来说难以验证。为此,存在像 [`pin-utils`] 这样的 crate,但我仍然不建议固定到栈上,除非你非常清楚自己在做什么。 [`pin-utils`]: https://docs.rs/pin-utils/0.1.0-alpha.4/pin_utils/ 如需进一步阅读,请查阅 [`pin` 模块][`pin` module] 的文档以及 [`Pin::new_unchecked`] 方法。 [`pin` module]: https://doc.rust-lang.org/nightly/core/pin/index.html [`Pin::new_unchecked`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.new_unchecked #### Pinning 与 Futures 正如我们之前所见,[`Future::poll`] 方法通过 `Pin<&mut Self>` 参数的形式使用固定: [`Future::poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll ```rust fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll ``` 该方法采用 `self: Pin<&mut Self>` 而非普通的 `&mut self` 的原因是,通过 async/await 创建的 future 实例通常是自引用的,如我们[之前][self-ref-async-await]所见的那样。将 `Self` 包装进 `Pin` 并让编译器阻止 async/await 生成的自引用 
future 实现 `Unpin` ,可以确保在 `poll` 调用之间这些 future 在内存中不会被移动。这确保了所有内部引用仍然有效。 [self-ref-async-await]: @/edition-2/posts/12-async-await/index.md#self-referential-structs 值得注意的是,在首次调用 `poll` 前移动 future 是安全的。这是由于 future 有惰性,在首次被轮询前不会执行任何操作。刚生成的状态机处于 `start` 状态,因此仅包含函数参数而不包含内部引用。为了调用 `poll` ,调用者必须先将 future 包装到 `Pin` 中,这确保了 future 在内存中不再被移动。由于栈固定(stack pinning)更难实现,我建议在这种情况下始终结合使用 [`Box::pin`] 和 [`Pin::as_mut`]。 [`futures`]: https://docs.rs/futures/0.3.4/futures/ 如果你有兴趣了解如何安全地使用栈固定技术自行实现一个 future 组合器函数,可以参考 `futures` crate 中相对简短的 [map 组合器 方法的源代码][map-src] 以及 pin 文档中关于 [projections and structural pinning] 的章节。 [map-src]: https://docs.rs/futures-util/0.3.4/src/futures_util/future/future/map.rs.html [projections and structural pinning]: https://doc.rust-lang.org/stable/std/pin/index.html#projections-and-structural-pinning ### Executors 与 Wakers 使用 async/await 可以以完全异步的方式更符合人体工程学地处理 futures。然而,正如我们之前所学,futures 在被轮询前不会执行任何操作。这意味着我们必须在某时刻调用 `poll` ,否则异步代码永远不会执行。 对于单个 future,我们总是可以像[上面描述](#deng-dai-future-jiu-xu)的那样使用循环手动等待每个future。然而,这种方法效率非常低下,对于创建大量 futures 的程序来说不太实用。解决这个问题最常见的方法是定义一个全局的执行器 _executor_ ,它负责轮询系统中所有的 future,直到它们全部完成。 #### Executors 执行器 执行器通常通过某种 `spawn` 方法,将 future 作为独立任务生成。然后,执行器负责轮询所有 future 直到它们完成。集中管理所有 future 的最大优势在于,当某个 future 返回 `Poll::Pending` 时,执行器可以立即切换到另一个 future。这样,异步操作就能并行运行,使得 CPU 保持忙碌状态。 许多执行器实现还能充分利用多核 CPU 系统的优势。它们会创建一个 [线程池][thread pool] ,在有足够多任务时能够利用所有核心资源,并采用诸如 [工作窃取][work stealing] 等技术来平衡各核心之间的负载。还有一些特殊的、适用于嵌入式系统的执行器实现,针对低延迟和内存占用进行优化。 [thread pool]: https://en.wikipedia.org/wiki/Thread_pool [work stealing]: https://en.wikipedia.org/wiki/Work_stealing 为了避免重复轮询 future 带来的开销,执行器通常会利用 Rust 的 future 所支持的 _waker_ API。 #### Wakers 唤醒器 waker API 的核心思想是:每次调用 `poll` 时都会传入一个特殊的 `Waker` 类型, 封装在 [`Context`] 类型中。这个 `Waker` 类型由执行器创建,异步任务用它来通知其状态,比如已完成或者部分完成。因此,执行器无需对之前返回 `Poll::Pending` 的 future 重复调用 `poll`,直到它收到对应 waker 的通知。 [`Context`]: https://doc.rust-lang.org/nightly/core/task/struct.Context.html 通过一个小例子可以很好地说明这一点: ```rust async fn write_file() { 
async_write_file("foo.txt", "Hello").await; } ``` 此函数会异步地将字符串 "Hello" 写入 `foo.txt` 文件。由于硬盘写入需要一定时间,首次轮询这个 future 时很可能会返回 `Poll::Pending` 。硬盘驱动器会在内部存储传递给 `poll` 调用的 `Waker` ,并在文件写入磁盘时使用它来通知执行器。这样,执行器在收到唤醒通知之前就无需浪费时间尝试轮询该 future。 在这篇文章的实现部分,我们将通过创建一个支持 waker 的自定义执行器来了解 `Waker` 类型的具体工作原理。 ### 协作式多任务处理? 在这篇文章的开头,我们讨论了抢占式和协作式多任务处理。抢占式多任务依赖操作系统强制切换运行中的任务,而协作式多任务则要求任务定期执行 _yield_ 操作来主动放弃 CPU 控制权。协作式方法的最大优势在于任务能够自行保存状态,从而实现更高效的上下文切换,并且还允许任务间共享同一个调用栈。 虽然可能不太明显,但 futures 和 async/await 实际上是协作式多任务模式的一种实现: * 每个添加到执行器的 future 本质上都是协作式任务。 * 相对于使用显式的 yield 操作符,future 通过 `Poll::Pending`(或在最后 `Poll::Ready`)放弃 CPU 核心的控制权。 * 并没有谁要强制 future 放弃 CPU。如果它们想,它们可以永不从 `poll` 中返回。例如通过无限循环。 * 由于每个 future 都有能力阻断执行器中其他 future 的执行,我们得首先相信它们是无恶意的。 * Future 内部存储了所有在下一次 `poll` 调用时继续执行所需的状态。使用 async/await 时,编译器会自动检测所有需要的变量并将它们存储在生成的状态机内部。 * 仅保存继续执行所需的最小状态。 * 由于 `poll` 方法在返回时会释放调用栈,这同一个栈可以用于轮询其他 future。 我们发现 future 和 async/await 完美契合协作式多任务模式,无非是使用的术语不同。因此在下文中,术语 "任务/task" 和 "future" 可以互换使用。 ## 实现 既然我们已经理解了基于 future 和 async/await 的协作式多任务在 Rust 中是如何工作的,现在就该为我们的内核添加对它们的支持了。由于 `Future` trait 是 `core` 库的一部分,而 async/await 是语言本身的特性,我们无需特别处理就能在我们的 `#![no_std]` 内核中使用它。唯一的要求是我们至少需要使用 `2020-03-25` 之后的 Rust nightly 版本,因为在此之前 async/await 还不适用于 `no_std` 。 只要使用足够新的 nightly 版本,我们就可以在 `main.rs` 中开始使用 async/await: ```rust // in src/main.rs async fn async_number() -> u32 { 42 } async fn example_task() { let number = async_number().await; println!("async number: {}", number); } ``` `async_number` 函数是一个异步函数 `async fn`,因此编译器会将其转换为一个实现了 `Future` 的状态机。由于该函数仅返回 `42`,最终生成的 future 将直接在第一次 `poll` 调用时返回 `Poll::Ready(42)` 。与 `async_number` 类似,`example_task` 函数也是一个 `async fn`。它会等待(awaits)`async_number` 返回的数字,然后使用 `println` 宏打印该数字。 要运行 `example_task` 返回的 future,我们需要对其调用 `poll` 直到它通过返回 `Poll::Ready` 告知它已经完成。为此,我们需要创建一个简单的执行器类型。 ### Task 模块 在开始实现执行器之前,我们先创建一个包含 `Task` 类型的 `task` 模块: ```rust // in src/lib.rs pub mod task; ``` ```rust // in src/task/mod.rs use core::{future::Future, pin::Pin}; use alloc::boxed::Box; pub struct Task { 
future: Pin<Box<dyn Future<Output = ()>>>, } ``` `Task` 结构体是一个针对已固定、堆分配且动态分发并以空类型 `()` 作为输出的 future 而设计的包装器。让我们详细了解一下: * 我们要求与任务关联的 future 返回 `()`。这意味着任务不会返回任何结果,但它们的运行会产生一些效果,例如,我们上面定义的 `example_task` 函数没有返回值,但它会向屏幕打印一些东西。 * `dyn` 关键字表示我们在 `Box` 中存储了一个 [_trait object_] 。这意味着 future 上的方法是 [ _动态分发 dynamically dispatched_ ][_dynamically dispatched_] 的,这使得不同类型的 future 能够存储在 `Task` 类型中。这一点很重要,因为每个 `async fn` 都有自己的类型,而我们希望能够创建多种不同的任务。 * 正如我们在 [固定 相关章节][section about pinning] 中学到的,`Pin<Box>` 类型通过将值放在堆上并阻止创建 `&mut` 引用来确保它不会在内存中被移动。这一点很重要,因为由 async/await 生成的 future 可能是自引用的,也就是说会包含指向自己的指针,这些指针会在 future 移动过程中失效。 [_trait object_]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html [_dynamically dispatched_]: https://doc.rust-lang.org/book/ch18-02-trait-objects.html#trait-objects-perform-dynamic-dispatch [section about pinning]: #pinning 为了从 future 创建新的 `Task` 结构体,我们创建一个 `new` 函数: ```rust // in src/task/mod.rs impl Task { pub fn new(future: impl Future<Output = ()> + 'static) -> Task { Task { future: Box::pin(future), } } } ``` 该函数接收任意输出类型为 `()` 的 future,并通过 `Box::pin` 函数将其固定在内存中。然后将 box 后的 future 包装在 `Task` 结构体中并返回。此处需要 `'static` 生命周期,因为返回的 `Task` 可以存活任意时长,所以 future 在这个时间内也必须保持有效。 我们还添加了一个 `poll` 方法,允许执行器轮询存储的 future: ```rust // in src/task/mod.rs use core::task::{Context, Poll}; impl Task { fn poll(&mut self, context: &mut Context) -> Poll<()> { self.future.as_mut().poll(context) } } ``` 由于 `Future` trait 的 `poll` 方法期望被 `Pin<&mut T>` 类型调用,我们使用 `Pin::as_mut` 方法先转换 `Pin<Box<T>>` 类型的 `self.future` 字段。然后我们在转换后的 `self.future` 字段上调用 `poll` 并返回结果。由于 `Task::poll` 方法应仅由我们稍后将创建的执行器调用,因此我们将该函数保留为 `task` 模块的私有方法。 ### 简单的执行器 考虑到执行器可能变得相当复杂,在后续实现功能更全面的执行器之前,我们先从创建一个非常基础的执行器开始。为此,我们先创建一个新的 `task::simple_executor` 子模块: ```rust // in src/task/mod.rs pub mod simple_executor; ``` ```rust // in src/task/simple_executor.rs use super::Task; use alloc::collections::VecDeque; pub struct SimpleExecutor { task_queue: VecDeque<Task>, } impl SimpleExecutor { pub fn new() -> SimpleExecutor { SimpleExecutor { task_queue: VecDeque::new(), } } pub fn
spawn(&mut self, task: Task) { self.task_queue.push_back(task) } } ``` 该结构体包含一个类型为 [`VecDeque`] 的 `task_queue` 字段,其本质上是一个 Vec,允许在两端进行推入和弹出操作。采用这种类型的初衷是我们可以使用 `spawn` 方法在结尾插入新的任务,并从开头弹出下一个任务用于执行。这样子,我们就得到了一个简单的 [FIFO 队列][FIFO queue] (_"first in, first out"_)。 [`VecDeque`]: https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html [FIFO queue]: https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics) #### 假的唤醒器(Dummy Waker) 为了调用 `poll` 方法,我们需要创建一个 `Context` 类型,它封装了一个 `Waker` 类型。简单起见,我们首先创建一个什么都不做的假 waker。为此,我们需要创建一个 [`RawWaker`] 实例,它定义了不同 `Waker` 方法的实现,然后使用 [`Waker::from_raw`] 函数将其转换为 `Waker`: [`RawWaker`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html [`Waker::from_raw`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.from_raw ```rust // in src/task/simple_executor.rs use core::task::{Waker, RawWaker}; fn dummy_raw_waker() -> RawWaker { todo!(); } fn dummy_waker() -> Waker { unsafe { Waker::from_raw(dummy_raw_waker()) } } ``` `from_raw` 函数是不安全的,因为如果程序员未能遵守 `RawWaker` 文档的要求,就可能出现未定义行为。在我们关注 `dummy_raw_waker` 函数的实现之前,先来理解 `RawWaker` 类型的工作原理。 ##### `RawWaker` `RawWaker` 类型要求程序员显式定义一个 [虚方法表 virtual method table][_virtual method table_] (vtable)。该表指定了当 `RawWaker` 被克隆、唤醒或被释放时应当调用的函数。该 vtable 的布局由 [`RawWakerVTable`] 类型定义。每个函数接收一个 `*const ()` 参数,这是一个指向某个值的 _类型擦除type-erased_ 的指针。 使用 `*const ()` 指针而非一个合适的引用的原因是 `RawWaker` 类型应当不是泛型(non-generic)但是仍支持任意类型。为了提供该指针,我们将指针放入 [`RawWaker::new`] (这个函数用于初始化 `RawWaker`)的 `data` 参数中。随后 `Waker` 会使用这个 `RawWaker` 的 `data` 调用 vtable 函数。 [_virtual method table_]: https://en.wikipedia.org/wiki/Virtual_method_table [`RawWakerVTable`]: https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html [`RawWaker::new`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html#method.new 通常, `RawWaker` 是为某个堆分配的、被包装在 [`Box`] 或者 [`Arc`] 类型中的结构体创建的。对于这类类型,可以使用诸如 [`Box::into_raw`] 这样的方法来将 `Box` 转换为 `*const T` 指针。随后该指针可被转换为匿名的 `*const ()` 指针并传递给 
`RawWaker::new`。由于每个虚表函数都接收相同的 `*const ()` 作为参数,这些函数可以安全地将指针转换回 `Box` 或者 `&T` 来操作。可以想象,这个过程极其危险,很容易导致未定义行为。因此,除非必要,不建议手动创建 `RawWaker`。 [`Box`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html [`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html [`Box::into_raw`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html#method.into_raw ##### 一个假的 `RawWaker`(A Dummy `RawWaker`) 虽然不建议手动创建 `RawWaker`,但目前尚无其他方式来创建一个什么都不做的假 `Waker`。幸运的是,正因为什么都不做,实现 `dummy_raw_waker` 函数显得相对安全一点: ```rust // in src/task/simple_executor.rs use core::task::RawWakerVTable; fn dummy_raw_waker() -> RawWaker { fn no_op(_: *const ()) {} fn clone(_: *const ()) -> RawWaker { dummy_raw_waker() } let vtable = &RawWakerVTable::new(clone, no_op, no_op, no_op); RawWaker::new(0 as *const (), vtable) } ``` 首先,我们定义两个名为 `no_op` 和 `clone` 的内部函数。`no_op` 函数接收一个 `*const ()` 指针且不执行任何操作。`clone` 函数同样接收一个 `*const ()` 指针并通过再次调用 `dummy_raw_waker` 返回一个新的 `RawWaker`。我们使用这两个函数来创建一个最简的 `RawWakerVTable`:`clone` 函数用于克隆操作,而 `no_op` 函数则用于所有其他操作。由于这个 `RawWaker` 不做任何实际工作,因此从 `clone` 返回一个新的 `RawWaker` 而非克隆它本身也没关系。 创建完 `vtable` 后,我们使用 `RawWaker::new` 函数来创建 `RawWaker`。被传递的 `*const ()` 无关紧要,因为 vtable 中没有任何一个函数使用它。因此,我们只需传递一个空指针。 #### 一个 `run` 方法 既然我们已经掌握了创建 `Waker` 实例的方法,就可以用它为我们的执行器实现一个 `run` 方法。 最简单的 `run` 方法就是在循环中不断轮询所有排队中的任务,直到它们全部完成。这种方式效率不高,因为它没有利用 `Waker` 类型的通知机制,但这是一个快速上手的简单方法: ```rust // in src/task/simple_executor.rs use core::task::{Context, Poll}; impl SimpleExecutor { pub fn run(&mut self) { while let Some(mut task) = self.task_queue.pop_front() { let waker = dummy_waker(); let mut context = Context::from_waker(&waker); match task.poll(&mut context) { Poll::Ready(()) => {} // 任务完成 Poll::Pending => self.task_queue.push_back(task), } } } } ``` 该函数使用 `while let` 循环来处理 `task_queue` 中的所有任务。对于每个任务,它首先通过包装由我们的 `dummy_waker` 函数返回的 `Waker` 实例来创建一个 `Context` 类型。然后它使用这个 `context` 调用 `Task::poll` 方法。如果 `poll` 方法返回 `Poll::Ready`,就表示任务已完成,我们可以接着处理下一个任务。如果任务仍处于 `Poll::Pending` 
状态,我们会再次将其添加到队列末尾,以便后续的循环迭代再次轮询它。 #### 尝试 现在有了 `SimpleExecutor` 类型,我们可以在 `main.rs` 中尝试运行 `example_task` 函数返回的任务: ```rust // in src/main.rs use blog_os::task::{Task, simple_executor::SimpleExecutor}; fn kernel_main(boot_info: &'static BootInfo) -> ! { // […] 初始化过程,包括 `init_heap` let mut executor = SimpleExecutor::new(); executor.spawn(Task::new(example_task())); executor.run(); // […] test_main, "it did not crash" 信息, hlt_loop } // 下面再次展示 example_task 函数,方便阅读 async fn async_number() -> u32 { 42 } async fn example_task() { let number = async_number().await; println!("async number: {}", number); } ``` 当我们运行它时,会看到预期的 _"async number: 42"_ 消息打印在屏幕上: ![QEMU printing "Hello World", "async number: 42", and "It did not crash!"](qemu-simple-executor.png) 让我们总结一下这个示例中发生的各个步骤: * 首先,创建一个新的 `SimpleExecutor` 类型实例,其 `task_queue` 为空。 * 接着,我们调用异步函数 `example_task`,该函数返回一个 future。我们将这个 future 包装在 `Task` 类型中,这会将其移动到堆上并固定,然后通过 `spawn` 方法将任务添加到执行器的 `task_queue` 中。 * 接着我们调用 `run` 方法来启动队列中单个任务的执行。这包括: * 从 `task_queue` 前端弹出任务。 * 为任务创建一个 `RawWaker` ,将其转换为`Waker` 实例,之后从它创建一个 `Context` 实例。 * 使用 `Context` ,在任务的 future 上调用 `poll` 方法。 * 由于 `example_task` 并不需要等待什么,它可以在第一次轮询就直接跑完。于是就会打印出 _"async number: 42"_ 消息。 * 由于 `example_task` 直接返回 `Poll::Ready` ,它不会被重新添加到 `task_queue` 尾部。 * `run` 方法会在 `task_queue` 变空之后返回。`kernel_main` 函数会继续执行,并打印 _"It did not crash!"_ 。 ### 异步键盘输入 我们的简单执行器并未利用 `Waker` 通知机制,而只是循环遍历所有任务直到它们完成。这对我们的示例来说不是问题,因为我们的 `example_task` 可以在首次轮询时直接运行至结束。要了解正确使用 `Waker` 带来的性能优势,我们首先需要创建一个真正异步的任务,即一个能够在第一次轮询调用时返回 `Poll::Pending` 的任务。 我们的系统中已经具备某种异步机制,可以为此所用:硬件中断。正如我们在 [中断][_Interrupts_] 一文中了解到的,硬件中断可能在任何时间点发生,这由外部设备决定。例如,硬件定时器会在预定义的时间间隔过后发送一个中断信号给 CPU。当 CPU 接收到中断时,立即将控制权转移至中断描述符表(IDT)中定义的相应处理函数。 [_Interrupts_]: @/edition-2/posts/07-hardware-interrupts/index.md 接下来我们将基于键盘中断创建一个异步任务。键盘中断是一个很好的选择,因为它既具有非确定性又对延迟敏感。非确定性意味着无法预测下一次按键何时发生,因为这完全取决于用户。延迟敏感意味着我们需要及时处理键盘输入,否则用户会感受到延迟。为了高效支持此类任务,执行器必须对 `Waker` 通知提供适当支持。 #### 扫描码队列(Scancode Queue) 
目前,我们直接在中断处理程序中处理键盘输入。这种做法从长远来看并不理想,中断处理程序应尽可能保持简短,因为它们可能会打断重要工作。事实上,中断处理程序只应执行最必要的少量工作(例如读取键盘扫描码),而将其余工作(例如解释扫描码)留给后台任务处理。 将工作委托给后台任务的常见模式是创建某种队列。中断处理程序将工作单元推入队列,后台任务则处理队列中的工作。对于我们的键盘中断来说,这意味着中断处理程序仅从键盘读取扫描码,将其推入队列后直接返回。键盘任务位于队列的另一端,负责解释并处理每个被推送过来的扫描码: ![Scancode queue with 8 slots on the top. Keyboard interrupt handler on the bottom left with a "push scancode" arrow to the left of the queue. Keyboard task on the bottom right with a "pop scancode" arrow coming from the right side of the queue.](scancode-queue.svg) 该队列的一个简单实现可以是受互斥锁保护的 `VecDeque`。然而,在中断处理程序中使用互斥锁并不是个好主意,因为这很容易导致死锁。例如,在键盘任务将队列锁定时用户按下按键,中断处理程序会尝试再次获取锁并无限期挂起。此方法的另一个问题是 `VecDeque` 在快满时会通过执行新的堆分配来自动增加其容量。这可能导致再次出现死锁,因为我们的分配器内部也使用了互斥锁。进一步的问题在于,当堆内存已碎片化时,堆内存分配可能失败或耗费相当长的时间。 为了避免这些问题,我们需要一种在 `push` 时无需互斥锁或内存分配的队列实现。这类队列可通过使用无锁(lock-free)[原子操作][atomic operations] 压入或者弹出元素来实现。这样,就可以创建只需要 `&self` 引用,无需互斥锁就可以使用的 `push` 和 `pop` 操作。为避免在 `push` 时分配内存,我们可以使用一个预分配的固定大小的缓冲区。虽然这会导致队列变得有界(即有最大长度),但是实践中通常可以定义出一个合理的上界,所以这没有大问题。 [atomic operations]: https://doc.rust-lang.org/core/sync/atomic/index.html ##### `crossbeam` Crate 以正确且高效的方式实现这样的队列非常困难,因此我建议使用现有的、经过充分测试的实现方案。有一个名为 [`crossbeam`] 的流行的 Rust 项目实现了多种用于并发编程的无互斥锁类型。它提供了一种名为 [`ArrayQueue`] 的类型,这正是我们当前场景所需要的。而且很幸运的是,该类型完全兼容支持内存分配的 `no_std` crate。 [`crossbeam`]: https://github.com/crossbeam-rs/crossbeam [`ArrayQueue`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html 要使用该类型,我们需要添加对 `crossbeam-queue` 的依赖: ```toml # in Cargo.toml [dependencies.crossbeam-queue] version = "0.3.11" default-features = false features = ["alloc"] ``` 默认情况下,该 crate 依赖于标准库。要使其兼容 `no_std` ,需要禁用其默认特性并启用 `alloc` 特性。(注意,我们也可以添加对主 `crossbeam` crate 的依赖,它会重新导出 `crossbeam-queue` crate,但这样会导致依赖项增多,延长编译时间。) ##### 队列实现 使用 `ArrayQueue` 类型,我们现在可以在新的 `task::keyboard` 模块中创建一个全局扫描码队列: ```rust // in src/task/mod.rs pub mod keyboard; ``` ```rust // in src/task/keyboard.rs use conquer_once::spin::OnceCell; use crossbeam_queue::ArrayQueue; static SCANCODE_QUEUE: OnceCell<ArrayQueue<u8>> = OnceCell::uninit();
``` 由于 [`ArrayQueue::new`] 会执行堆分配操作,而这在编译时[还][const-heap-alloc]无法实现,我们无法直接初始化静态变量。为此,我们使用了 [`conquer_once`] crate 的 [`OnceCell`] 类型,它能安全地实现静态值的一次性初始化。要引入该 crate,我们需要在 `Cargo.toml` 中添加它作为依赖项: [`ArrayQueue::new`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.new [const-heap-alloc]: https://github.com/rust-lang/const-eval/issues/20 [`OnceCell`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html [`conquer_once`]: https://docs.rs/conquer-once/0.2.0/conquer_once/index.html ```toml # in Cargo.toml [dependencies.conquer-once] version = "0.2.0" default-features = false ``` 除了 [`OnceCell`] 原语,我们也可以在此处使用 [`lazy_static`] 宏。不过,`OnceCell` 类型的优势在于,我们可以确保初始化操作不会发生在中断处理程序中,从而防止中断处理程序执行堆分配操作。 [`lazy_static`]: https://docs.rs/lazy_static/1.4.0/lazy_static/index.html #### 填充队列 为了填充扫描码队列,我们创建了一个新的 `add_scancode` 函数,它将被中断处理程序调用。 ```rust // in src/task/keyboard.rs use crate::println; /// 被中断处理程序调用 /// /// 不能阻塞或者分配 pub(crate) fn add_scancode(scancode: u8) { if let Ok(queue) = SCANCODE_QUEUE.try_get() { if let Err(_) = queue.push(scancode) { println!("WARNING: scancode queue full; dropping keyboard input"); } } else { println!("WARNING: scancode queue uninitialized"); } } ``` 我们使用 [`OnceCell::try_get`] 获取已初始化的队列的引用。如果队列尚未初始化,则忽略键盘扫描码并打印警告信息。关键点在于我们不应在此函数中尝试初始化队列,因为它会被中断处理程序调用,而中断处理程序不应执行堆分配。由于此函数不应能从我们的 `main.rs` 中调用,我们使用 `pub(crate)` 可见性使其仅对我们的 `lib.rs` 可用。 [`OnceCell::try_get`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get [`ArrayQueue::push`] 方法仅需要 `&self` 引用,这使得在静态队列上调用该方法非常简单。 `ArrayQueue` 类型会自行处理所有必要的同步,所以此处不需要互斥锁包装。当队列满时,我们也会打印一个警告信息。 [`ArrayQueue::push`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.push 为了在键盘中断时调用 `add_scancode` 函数,我们更新了 `interrupts` 模块中的 `keyboard_interrupt_handler` 函数: ```rust // in src/interrupts.rs extern "x86-interrupt" fn keyboard_interrupt_handler( _stack_frame: InterruptStackFrame ) { use x86_64::instructions::port::Port; let 
mut port = Port::new(0x60); let scancode: u8 = unsafe { port.read() }; crate::task::keyboard::add_scancode(scancode); // new unsafe { PICS.lock() .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8()); } } ``` 我们移除了该函数中的所有键盘处理代码,转而添加了对 `add_scancode` 函数的调用。函数的其余部分保持与之前相同。 正如预期的那样,当我们使用 `cargo run` 运行项目时,按键不再被打印到屏幕上。相反,我们看到每次按键都会出现扫描码队列未初始化的警告。 #### 扫描码流 为了初始化 `SCANCODE_QUEUE` 并以异步方式从队列中读取扫描码,我们创建了一个 `ScancodeStream` 类型: ```rust // in src/task/keyboard.rs pub struct ScancodeStream { _private: (), } impl ScancodeStream { pub fn new() -> Self { SCANCODE_QUEUE.try_init_once(|| ArrayQueue::new(100)) .expect("ScancodeStream::new should only be called once"); ScancodeStream { _private: () } } } ``` `_private` 字段的目的是防止从模块外部构造该结构体。这使得 `new` 函数成为构造该类型的唯一方式。在函数中,我们首先尝试初始化 `SCANCODE_QUEUE` 静态变量。如果它已被初始化,我们会触发 panic 以确保只能创建一个 `ScancodeStream` 实例。 为了使扫描码可用于异步任务,下一步是实现一个类似 `poll`(`poll`-like)的方法。该方法尝试从队列中弹出下一个扫描码。虽然这听起来像是我们应该为我们的类型实现 `Future` trait,但实际上并非如此。问题在于 `Future` trait 仅抽象单个异步值,并期望在返回 `Poll::Ready` 后不再被调用。然而,我们的扫描码队列包含多个异步值,因此可以持续轮询它。 ##### `Stream` Trait 由于能产生多个异步值的类型很常见,`futures` crate 为此类类型提供了一个实用的抽象:[`Stream`] trait。该 trait 定义如下: [`Stream`]: https://rust-lang.github.io/async-book/05_streams/01_chapter.html ```rust pub trait Stream { type Item; fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<Self::Item>>; } ``` 这个定义与 `Future` trait 非常相似,主要区别如下: * 关联类型命名为 `Item` 而非 `Output`。 * `Stream` trait 没有定义返回 `Poll<Self::Item>` 的 `poll` 方法,而是定义了返回 `Poll<Option<Self::Item>>` 的 `poll_next` 方法(注意多出的 `Option` 包装)。 还存在语义上的差异:可以重复调用 `poll_next`,直到它返回 `Poll::Ready(None)` 表示 stream 已结束。在这方面,该方法类似于 [`Iterator::next`] 方法,后者同样在最后一个值之后返回 `None`。 [`Iterator::next`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html#tymethod.next ##### 实现 `Stream` 让我们为 `ScancodeStream` 实现 `Stream` trait,来以异步方式提供 `SCANCODE_QUEUE` 的值。为此,我们首先需要添加对 `futures-util` crate 的依赖,它包含 `Stream` trait: ```toml # in Cargo.toml [dependencies.futures-util] version = "0.3.4" default-features = false features = ["alloc"] ``` 我们禁用默认特性以使该 crate 兼容
`no_std` ,并启用 `alloc` 使其基于分配的类型可用(我们稍后会需要这个功能)。(注意,我们也可以添加对主 `futures` crate 的依赖,它会重新导出 `futures-util` crate,但这将导致更多的依赖项和更长的编译时间。) 现在我们可以导入并实现 `Stream` trait: ```rust // in src/task/keyboard.rs use core::{pin::Pin, task::{Poll, Context}}; use futures_util::stream::Stream; impl Stream for ScancodeStream { type Item = u8; fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<u8>> { let queue = SCANCODE_QUEUE.try_get().expect("not initialized"); match queue.pop() { Some(scancode) => Poll::Ready(Some(scancode)), None => Poll::Pending, } } } ``` 我们首先使用 [`OnceCell::try_get`] 方法来获取已初始化的扫描码队列的引用。由于我们在 `new` 函数中已经初始化了队列,这不应当会失败,因此可以安全地使用 `expect` 方法在未初始化时触发 panic。接着,我们使用 [`ArrayQueue::pop`] 方法尝试从队列中获取下一个元素。如果成功,我们返回封装在 `Poll::Ready(Some(…))` 的扫描码。若失败则表明队列为空,此时我们返回 `Poll::Pending`。 [`ArrayQueue::pop`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.pop #### Waker 支持 与 `Future::poll` 方法类似,`Stream::poll_next` 方法要求异步任务在返回 `Poll::Pending` 之后变为就绪状态时通知执行器。这样,执行器在收到通知之前无需重复轮询同一任务,这显著降低了等待任务的性能开销。 要发送此通知,任务应从传入的 `Context` 引用中提取 `Waker` 并将其存储在某处。当任务准备就绪时,应调用存储的 `Waker` 上的 `wake` 方法来通知执行器应当再次轮询该任务。 ##### AtomicWaker 要为我们的 `ScancodeStream` 实现 `Waker` 通知,我们需要一个可以在两次轮询调用之间存储 `Waker` 的位置。我们不能将其作为字段存储在 `ScancodeStream` 中,因为它需要能从 `add_scancode` 函数中访问。解决方案是使用 `futures-util` crate 提供的 [`AtomicWaker`] 类型的静态变量。就像 `ArrayQueue` 类型,该类型基于原子指令,可以安全地存储在 `static` 中并支持并发修改。 [`AtomicWaker`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html 让我们使用 `AtomicWaker` 类型来定义一个静态的 `WAKER`: ```rust // in src/task/keyboard.rs use futures_util::task::AtomicWaker; static WAKER: AtomicWaker = AtomicWaker::new(); ``` 这个想法是让 `poll_next` 将当前的 waker 存储在此静态变量中,而 `add_scancode` 函数在有新扫描码加入队列时对其执行 `wake` 函数。 ##### 存储 Waker `poll`/`poll_next` 的定义要求当任务返回 `Poll::Pending` 时,为传过来的 `Waker` 注册一个唤醒动作(wakeup)。让我们修改 `poll_next` 的实现以满足这一要求: ```rust // in src/task/keyboard.rs impl Stream for ScancodeStream { type Item = u8; fn poll_next(self: Pin<&mut Self>, cx: &mut Context)
-> Poll<Option<u8>> { let queue = SCANCODE_QUEUE .try_get() .expect("scancode queue not initialized"); // fast path if let Some(scancode) = queue.pop() { return Poll::Ready(Some(scancode)); } WAKER.register(&cx.waker()); match queue.pop() { Some(scancode) => { WAKER.take(); Poll::Ready(Some(scancode)) } None => Poll::Pending, } } } ``` 和之前一样,我们首先使用 `OnceCell::try_get` 函数获取已初始化的扫描码队列的引用。随后我们乐观地尝试从队列中 `pop` 扫描码,当成功时返回 `Poll::Ready` 。这样,我们就可以避免队列不为空时注册唤醒器产生的性能开销。 如果首次调用 `queue.pop()` 未成功,意味着队列可能为空。只是“可能”,是因为中断处理程序可能在检查之后立即异步地填充了队列。由于这种竞态条件可能在下次检查时再次发生,我们需要在第二次检查之前将 `Waker` 注册到 `WAKER` 静态变量中。这样,虽然在返回 `Poll::Pending` 之前有可能会收到唤醒动作,但可以确保在检查之后每一个压入的扫描码都能收到唤醒动作。 在通过 [`AtomicWaker::register`] 函数注册传入的 `Context` 中包含的 `Waker` 后,我们第二次尝试从队列中 pop。如果现在成功,我们返回 `Poll::Ready`。同时我们使用 [`AtomicWaker::take`] 再次移除已注册的 waker,因为不再需要唤醒通知。当 `queue.pop()` 第二次失败时,我们会像之前一样返回 `Poll::Pending`,但这次会附带一个已注册的唤醒动作。 [`AtomicWaker::register`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html#method.register [`AtomicWaker::take`]: https://docs.rs/futures/0.3.4/futures/task/struct.AtomicWaker.html#method.take 需要注意的是,对于尚未返回 `Poll::Pending` 的任务,有两种方式可能触发唤醒。一种是前面提到的竞态条件,即唤醒恰好在返回 `Poll::Pending` 之前发生。另一种情况是注册 waker 后队列不再为空,此时会返回 `Poll::Ready` 。由于这些虚假的唤醒无法避免,执行器需要能够正确处理它们。 ##### 唤醒存储的唤醒器 为了唤醒存储的 `Waker`,我们在 `add_scancode` 函数中添加了对 `WAKER.wake()` 的调用: ```rust // in src/task/keyboard.rs pub(crate) fn add_scancode(scancode: u8) { if let Ok(queue) = SCANCODE_QUEUE.try_get() { if let Err(_) = queue.push(scancode) { println!("WARNING: scancode queue full; dropping keyboard input"); } else { WAKER.wake(); // new } } else { println!("WARNING: scancode queue uninitialized"); } } ``` 我们唯一所做的更改是在将数据成功推送到扫描码队列时添加了对 `WAKER.wake()` 的调用。如果在 `WAKER` 静态变量中注册了唤醒器,此方法将调用其同名的 [`wake`] 方法,从而通知执行器。否则该操作将无实际效果,什么都不会发生。 [`wake`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.wake 关键点在于我们必须只在推入队列之后再调用 `wake` ,否则任务可能会在队列仍为空时过早被唤醒。这种情况可能发生在例如一个多线程执行器在不同的 CPU
核心上并发启动被唤醒的任务时。虽然我们现在还没添加线程支持,但之后我们会实现它,并且不希望出问题。 #### 键盘任务 既然我们已经为 `ScancodeStream` 实现了 `Stream` trait,现在我们可以用它来创建一个异步键盘任务: ```rust // in src/task/keyboard.rs use futures_util::stream::StreamExt; use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1}; use crate::print; pub async fn print_keypresses() { let mut scancodes = ScancodeStream::new(); let mut keyboard = Keyboard::new(ScancodeSet1::new(), layouts::Us104Key, HandleControl::Ignore); while let Some(scancode) = scancodes.next().await { if let Ok(Some(key_event)) = keyboard.add_byte(scancode) { if let Some(key) = keyboard.process_keyevent(key_event) { match key { DecodedKey::Unicode(character) => print!("{}", character), DecodedKey::RawKey(key) => print!("{:?}", key), } } } } } ``` 这段代码与我们之前在 [键盘中断处理程序][keyboard interrupt handler] 中的代码非常相似,只是在本篇文章中进行了修改。唯一的区别在于,我们不再从 I/O 端口读取扫描码,而是从 `ScancodeStream` 获取。为此,我们首先创建一个新的 `Scancode` 流,然后重复使用 [`StreamExt`] trait 中提供的 [`next`] 方法来获取一个 `Future` ,这个 `Future` 可以解析为流中下一个元素。通过在其上使用 await 来异步地等待其结果。 [keyboard interrupt handler]: @/edition-2/posts/07-hardware-interrupts/index.md#interpreting-the-scancodes [`next`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html#method.next [`StreamExt`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html 我们使用 `while let` 循环直到流返回 `None` 表示结束。由于我们的 `poll_next` 方法永远不会返回 `None` ,因此这实际上是一个无限循环,所以 `print_keypresses` 任务永远不会结束。 让我们将 `print_keypresses` 任务添加到 `main.rs` 的执行器中,来让键盘输入再次正常工作: ```rust // in src/main.rs use blog_os::task::keyboard; // new fn kernel_main(boot_info: &'static BootInfo) -> ! { // […] 初始化过程,包括 init_heap, test_main let mut executor = SimpleExecutor::new(); executor.spawn(Task::new(example_task())); executor.spawn(Task::new(keyboard::print_keypresses())); // new executor.run(); // […] "it did not crash" 信息, hlt_loop } ``` 现在执行 `cargo run` 时,我们可以看到键盘输入功能已恢复: ![QEMU printing ".....H...e...l...l..o..... 
...W..o..r....l...d...!"](qemu-keyboard-output.gif) 如果你留意观察电脑的 CPU 使用率,就会发现 `QEMU` 进程现在持续占用 CPU 资源。这是因为我们的 `SimpleExecutor` 在一个循环中反复轮询任务。因此就算我们没有在键盘上按下任何键,执行器也会持续调用 `print_keypresses` 任务的 `poll` 方法,即使该任务此时无法取得任何进展并且会返回 `Poll::Pending` 。 ### 支持 Waker 的 Executor 为解决性能问题,我们需要创建一个能正确利用 `Waker` 通知的执行器。这样,当下一个键盘中断发生时,执行器会收到通知,从而不需要反复轮询 `print_keypresses` 任务。 #### 任务 ID 创建支持唤醒通知的执行器的第一步是为每个任务分配唯一 ID。这是必需的,因为我们需要一种方式来指定应该唤醒哪个任务。我们首先创建一个新的 `TaskId` 包装类型: ```rust // in src/task/mod.rs #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)] struct TaskId(u64); ``` `TaskId` 结构体是 `u64` 的简单包装类型。我们为其派生多个 trait 以使其可打印、可复制、可比较并且可排序。可排序这一点很重要,因为我们希望使用 `TaskId` 作为 [`BTreeMap`] 的键类型。 [`BTreeMap`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html 为了创建新的唯一 ID,我们创建了一个 `TaskId::new` 函数: ```rust use core::sync::atomic::{AtomicU64, Ordering}; impl TaskId { fn new() -> Self { static NEXT_ID: AtomicU64 = AtomicU64::new(0); TaskId(NEXT_ID.fetch_add(1, Ordering::Relaxed)) } } ``` 该函数使用了一个静态的 `NEXT_ID` 变量,其类型为 [`AtomicU64`] 以确保每个 ID 都仅被分配一次。[`fetch_add`] 方法会在单个原子操作中增加该值并返回先前的值。这意味着即使 `TaskId::new` 方法被并行调用,每个 ID 也都只会被返回一次。[`Ordering`] 参数决定是否允许编译器在指令流中重新排列 `fetch_add` 操作。由于我们仅要求 ID 唯一,因此在此情况下,具有最弱要求的 `Relaxed` 排序就足够了。 [`AtomicU64`]: https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html [`fetch_add`]: https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html#method.fetch_add [`Ordering`]: https://doc.rust-lang.org/core/sync/atomic/enum.Ordering.html 我们现在可以为 `Task` 类型扩展一个额外的 `id` 字段: ```rust // in src/task/mod.rs pub struct Task { id: TaskId, // new future: Pin<Box<dyn Future<Output = ()>>>, } impl Task { pub fn new(future: impl Future<Output = ()> + 'static) -> Task { Task { id: TaskId::new(), // new future: Box::pin(future), } } } ``` 新的 `id` 字段能够为任务赋予唯一名称,这是唤醒特定任务所必需的。 #### `Executor` 类型 我们在 `task::executor` 模块中创建新的 `Executor` 类型: ```rust // in src/task/mod.rs pub mod executor; ``` ```rust // in src/task/executor.rs use super::{Task, TaskId}; use alloc::{collections::BTreeMap, sync::Arc};
use core::task::Waker; use crossbeam_queue::ArrayQueue; pub struct Executor { tasks: BTreeMap<TaskId, Task>, task_queue: Arc<ArrayQueue<TaskId>>, waker_cache: BTreeMap<TaskId, Waker>, } impl Executor { pub fn new() -> Self { Executor { tasks: BTreeMap::new(), task_queue: Arc::new(ArrayQueue::new(100)), waker_cache: BTreeMap::new(), } } } ``` 与在 `SimpleExecutor` 中使用 `VecDeque` 存储任务不同,我们使用一个存储任务 ID 的 `task_queue` 和一个名为 `tasks` 、包含实际 `Task` 实例的 `BTreeMap`。该 map 以 `TaskId` 为键,可以高效地查找特定任务。 `task_queue` 字段是一个存储任务 ID 的 `ArrayQueue` 类型,被封装在 [`Arc`] 类型中以实现引用计数(_reference counting_)。引用计数可以实现在多个所有者之间共享所有权。它通过在堆上分配值并统计其活跃的引用来实现。当活跃引用数量降至零时,该值不再被需要,可以被释放。 我们给 `task_queue` 使用 `Arc<ArrayQueue>` 类型,因为它将在执行器和唤醒器之间共享。其设计思路是唤醒器将被唤醒任务的 ID 推送到队列中。执行器位于队列的接收端,通过 ID 从 `tasks` map 中检索被唤醒的任务,然后运行它们。选择固定大小队列而非无界队列(例如 [`SegQueue`])的原因是,在往其中推入数据时,中断处理程序不应该进行内存分配。 除了 `task_queue` 和 `tasks` map 外,`Executor` 类型还有一个 `waker_cache` 字段,同样为 map。该 map 会在任务创建后缓存其 `Waker`,原因有二:首先,它通过为同一任务的多次唤醒复用同一个唤醒器来提高性能,而不是每次都创建新的唤醒器。其次,它确保引用计数的唤醒器不会在中断处理程序中被释放,因为这可能导致死锁(下文将对此进行更详细的说明)。 [`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html [`SegQueue`]: https://docs.rs/crossbeam-queue/0.2.1/crossbeam_queue/struct.SegQueue.html 为了创建执行器,我们提供了一个简单的 `new` 函数。我们设置 `task_queue` 的容量为100,这在可预见的未来应该绰绰有余。万一未来我们的系统并发任务数超过100,我们也可以轻松增加这个容量。 #### 生成任务 与 `SimpleExecutor` 一样,我们为 `Executor` 类型提供了 `spawn` 方法,用于将给定的任务添加到 `tasks` map 中,并通过将其 ID 推送到 `task_queue` 来立即唤醒它: ```rust // in src/task/executor.rs impl Executor { pub fn spawn(&mut self, task: Task) { let task_id = task.id; if self.tasks.insert(task.id, task).is_some() { panic!("task with same ID already in tasks"); } self.task_queue.push(task_id).expect("queue full"); } } ``` 如果 map 中已存在相同 ID 的任务,`BTreeMap::insert` 方法会将其返回。这种情况不应发生,因为每个任务都有唯一 ID,所以此时我们触发 panic,这表明我们的代码中存在错误。同样地,当 `task_queue` 已满时我们也会触发 panic,因为如果我们选择一个足够大的队列大小,这种情况也不应该发生。 #### 运行任务 要执行 `task_queue` 中的所有任务,我们创建私有的 `run_ready_tasks` 函数: ```rust // in src/task/executor.rs use core::task::{Context, Poll}; impl Executor { fn run_ready_tasks(&mut self) { // 解构 `self`
来避免借用检查器报错 let Self { tasks, task_queue, waker_cache, } = self; while let Some(task_id) = task_queue.pop() { let task = match tasks.get_mut(&task_id) { Some(task) => task, None => continue, // 任务不存在 }; let waker = waker_cache .entry(task_id) .or_insert_with(|| TaskWaker::new(task_id, task_queue.clone())); let mut context = Context::from_waker(waker); match task.poll(&mut context) { Poll::Ready(()) => { // 任务完成 -> 移除它和它缓存的唤醒器 tasks.remove(&task_id); waker_cache.remove(&task_id); } Poll::Pending => {} } } } } ``` 该函数的基本思路与我们的 `SimpleExecutor` 类似:循环遍历 `task_queue` 中的所有任务,为每个任务创建一个唤醒器,然后轮询它们。不过与将待处理任务重新放回 `task_queue` 的末尾不同,我们让 `TaskWaker` 负责将唤醒的任务重新加入队列。该唤醒器类型的实现将在稍后展示。 让我们深入看看这个 `run_ready_tasks` 方法的一些实现细节: * 我们使用 [解构][_destructuring_] 将 self 拆分为三个字段,以避免一些借用检查器报错。具体来说,我们的实现需要从一个闭包内访问 `self.task_queue`,这会导致尝试借用自身。这是一个基本的借用检查器问题,该问题将在 [RFC 2229] 被 [实现][RFC 2229 impl] 后得到解决。 * 对于每个弹出的任务 ID,我们从 `tasks` map 中获取对应任务的可变引用。由于我们的 `ScancodeStream` 实现在检查任务是否需要进入休眠状态前会先注册唤醒器,可能会出现一个已不存在的任务被唤醒的情况。这种情况下,我们只需忽略这次唤醒并继续处理队列里的下一个 ID。 * 为了避免每次轮询时创建唤醒器带来的性能开销,我们使用了 `waker_cache` map 用于存储每个任务创建后对应的唤醒器。为此,我们使用 [`BTreeMap::entry`] 方法结合 [`Entry::or_insert_with`] ,来在唤醒器不存在时创建新实例,然后获取其可变引用。为了创建新的唤醒器,我们克隆 `task_queue` 并将其与任务 ID 一同传递给 `TaskWaker::new` 函数(具体实现如下所示)。由于 `task_queue` 被封装在 `Arc` 中,克隆操作仅会增加该值的引用计数,但仍指向同一个堆分配的队列。请注意,并非所有唤醒器的实现都能像这样重复使用,不过我们的 `TaskWaker` 类型可以做到。 [_destructuring_]: https://doc.rust-lang.org/book/ch19-03-pattern-syntax.html#destructuring-to-break-apart-values [RFC 2229]: https://github.com/rust-lang/rfcs/pull/2229 [RFC 2229 impl]: https://github.com/rust-lang/rust/issues/53488 [`BTreeMap::entry`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.entry [`Entry::or_insert_with`]: https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html#method.or_insert_with 当任务返回 `Poll::Ready` 时即视为完成。此时我们会使用 [`BTreeMap::remove`] 方法将其从 `tasks` map 中移除。我们还会移除其缓存的唤醒器,如果存在的话。 [`BTreeMap::remove`]: 
https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.remove #### 唤醒器的设计 唤醒器的作用是将被唤醒任务的 ID 推送到执行器的 `task_queue` 中。我们通过创建一个新的 `TaskWaker` 结构体来实现这一点。该结构体存储任务 ID 和对 `task_queue` 的引用: ```rust // in src/task/executor.rs struct TaskWaker { task_id: TaskId, task_queue: Arc<ArrayQueue<TaskId>>, } ``` 由于 `task_queue` 的所有权在执行器和唤醒器之间共享,我们使用 [`Arc`] 包装类型来实现共享的引用计数所有权。 [`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html 唤醒操作的实现相当简单: ```rust // in src/task/executor.rs impl TaskWaker { fn wake_task(&self) { self.task_queue.push(self.task_id).expect("task_queue full"); } } ``` 我们将 `task_id` 推送到引用的 `task_queue`。由于对 [`ArrayQueue`] 类型的修改仅需要一个共享引用,我们可以在 `&self` 上实现此方法,而非 `&mut self` 。 ##### `Wake` Trait 为了使用我们的 `TaskWaker` 类型轮询 future,首先需要将其转换为 [`Waker`] 实例。这是必需的,因为 [`Future::poll`] 函数使用一个 [`Context`] 实例作为参数,而该实例只能从 `Waker` 类型构造。虽然我们可以通过提供对 [`RawWaker`] 类型的实现来做到这一点,但更简单且安全的做法是:实现基于 `Arc` 的 [`Wake`][wake-trait] trait 并使用标准库提供的 [`From`] 实现来构造 `Waker`。 该 trait 的实现如下所示: [wake-trait]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html ```rust // in src/task/executor.rs use alloc::task::Wake; impl Wake for TaskWaker { fn wake(self: Arc<Self>) { self.wake_task(); } fn wake_by_ref(self: &Arc<Self>) { self.wake_task(); } } ``` 由于唤醒器通常在执行器与异步任务之间共享,该 trait 方法要求将 `Self` 实例包装在实现了引用计数所有权的 [`Arc`] 类型中。这意味着为了调用它们,我们需要将 `TaskWaker` 移入 `Arc` 。 `wake` 和 `wake_by_ref` 方法之间的区别在于,后者只需要一个对 `Arc` 的引用,而前者则获取 `Arc` 的所有权,因此通常需要增加引用计数。并非所有类型都支持通过引用唤醒,因此对 `wake_by_ref` 方法的实现是可选的。不过,它能带来更好的性能,因为它避免了不必要的引用计数修改。在我们的案例中,可以简单地将这两个 trait 方法转发(forward)到我们的 `wake_task` 函数,该函数只需要一个共享的 `&self` 引用。 ##### 创建唤醒器 由于 `Waker` 类型对所有实现了 `Wake` trait 且用 `Arc` 包装的值都支持 [`From`] 转换,我们现在可以实现 `Executor::run_ready_tasks` 方法所需的 `TaskWaker::new` 函数了: [`From`]: https://doc.rust-lang.org/nightly/core/convert/trait.From.html ```rust // in src/task/executor.rs impl TaskWaker { fn new(task_id: TaskId, task_queue: Arc<ArrayQueue<TaskId>>) -> Waker { Waker::from(Arc::new(TaskWaker { task_id, task_queue, })) } } ``` 我们使用传入的 `task_id` 和
`task_queue` 创建 `TaskWaker` 。然后,将其包装在 `Arc` 中,并通过 `Waker::from` 实现将其转换为 [`Waker`]。这个 `from` 方法负责为我们的 `TaskWaker` 类型构建 [`RawWakerVTable`] 和 [`RawWaker`] 的实例。如果您对其工作原理细节感兴趣,请查看 [implementation in the `alloc` crate][waker-from-impl]。 [waker-from-impl]: https://github.com/rust-lang/rust/blob/cdb50c6f2507319f29104a25765bfb79ad53395c/src/liballoc/task.rs#L58-L87 #### `run` 方法 有了我们的唤醒器实现,现在终于可以为执行器构建一个 `run` 方法: ```rust // in src/task/executor.rs impl Executor { pub fn run(&mut self) -> ! { loop { self.run_ready_tasks(); } } } ``` 该方法只需循环调用 `run_ready_tasks` 函数。虽然理论上我们可以在 `tasks` map 为空时从函数返回,但由于我们的 `keyboard_task` 永远不会完成,这种情况永远不会发生,因此一个简单的 `loop` 循环就足够了。由于该函数永远不会返回,我们使用 `!` 返回类型将函数标记为发散。 [diverging]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html 现在我们可以修改 `kernel_main` 来使用新的 `Executor` 替代 `SimpleExecutor`: ```rust // in src/main.rs use blog_os::task::executor::Executor; // new fn kernel_main(boot_info: &'static BootInfo) -> ! { // […] initialization routines, including init_heap, test_main let mut executor = Executor::new(); // new executor.spawn(Task::new(example_task())); executor.spawn(Task::new(keyboard::print_keypresses())); executor.run(); } ``` 我们只需更改导入和类型名称。由于我们的 `run` 函数被标记为发散,编译器知道它永远不会返回,因此我们不再需要在 `kernel_main` 函数末尾调用 `hlt_loop`。 现在我们使用 `cargo run` 运行内核时,可以看到键盘输入仍然有效: ![QEMU printing ".....H...e...l...l..o..... ...a..g..a....i...n...!"](qemu-keyboard-output-again.gif) 然而,QEMU 的 CPU 利用率并未得到改善。原因在于我们仍然让 CPU 持续处于忙碌状态。我们不再一直轮询任务到它们被再次唤醒,但仍在循环中频繁地检查 `task_queue` 。为了解决这个问题,我们需要让 CPU 在没有任务时进入休眠状态。 #### 空闲时休眠 基本思路是在 `task_queue` 为空时执行 [hlt 指令][`hlt` instruction]。该指令会让 CPU 进入休眠状态,直到下一个中断到来。CPU 能在中断发生时立即重新激活,这确保了当中断处理程序向 `task_queue` 推送时,系统仍能立即作出响应。 [`hlt` instruction]: https://en.wikipedia.org/wiki/HLT_(x86_instruction) 为实现此功能,我们在执行器中创建了一个新的 `sleep_if_idle` 方法,并从我们的 `run` 方法中调用它: ```rust // in src/task/executor.rs impl Executor { pub fn run(&mut self) -> ! 
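    {
        loop {
            self.run_ready_tasks();
            self.sleep_if_idle();   // new
        }
    }

    fn sleep_if_idle(&self) {
        if self.task_queue.is_empty() {
            x86_64::instructions::hlt();
        }
    }
}
```

Since we call `sleep_if_idle` directly after `run_ready_tasks`, which loops until the `task_queue` becomes empty, checking the queue again might seem unnecessary. However, a hardware interrupt might occur directly after `run_ready_tasks` returns, so there might be a new task in the queue by the time `sleep_if_idle` is called. Only if the queue is still empty do we put the CPU to sleep by executing the `hlt` instruction through the [`instructions::hlt`] wrapper function provided by the [`x86_64`] crate.

[`instructions::hlt`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/fn.hlt.html
[`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/index.html

Unfortunately, there is still a subtle race condition in this implementation. Since interrupts are asynchronous and can happen at any time, an interrupt may occur right between the `is_empty` check and the call to `hlt`:

```rust
if self.task_queue.is_empty() {
    // <--- interrupt can happen here
    x86_64::instructions::hlt();
}
```

If this interrupt pushes to the `task_queue`, we put the CPU to sleep even though there is now a ready task. In the worst case, this could delay the handling of a keyboard interrupt until the next keypress or the next timer interrupt. So how do we prevent it?

The answer is to disable interrupts on the CPU before the check and to atomically enable them again together with the `hlt` instruction. This way, all interrupts that happen in between are delayed until after the `hlt` instruction, so that no wake-ups are missed. To implement this approach, we can use the [`interrupts::enable_and_hlt`][`enable_and_hlt`] function provided by the [`x86_64`] crate.

[`enable_and_hlt`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.enable_and_hlt.html

The updated implementation of our `sleep_if_idle` function looks like this:

```rust
// in src/task/executor.rs

impl Executor {
    fn sleep_if_idle(&self) {
        use x86_64::instructions::interrupts::{self, enable_and_hlt};

        interrupts::disable();
        if self.task_queue.is_empty() {
            enable_and_hlt();
        } else {
            interrupts::enable();
        }
    }
}
```

To avoid race conditions, we disable interrupts before checking whether the `task_queue` is empty. If it is, we use the `enable_and_hlt` function to enable interrupts and put the CPU to sleep as a single atomic operation. If the queue is no longer empty, an interrupt woke a task after `run_ready_tasks` returned. In that case, we enable interrupts again and directly continue execution without executing `hlt`.

Now our executor properly puts the CPU to sleep when there is nothing to do. When we run our kernel using `cargo run` again, we can see that the QEMU process has a much lower CPU utilization.

#### Possible Extensions

Our executor is now able to run tasks efficiently. It uses waker notifications to avoid polling waiting tasks and puts the CPU to sleep when there is currently no work to do. However, our executor is still quite basic, and there are many possible ways to extend its functionality:

* **Scheduling**: For our `task_queue`, we currently use the `VecDeque` type to implement a first in, first out (FIFO) strategy, which is often also called _round robin_ scheduling. This strategy might not be the most efficient for all workloads. For example, it might make sense to prioritize latency-critical tasks or tasks that do a lot of I/O. For more information, see the [scheduling chapter] of the [_Operating Systems: Three Easy Pieces_] book or the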
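[Wikipedia article on scheduling][scheduling-wiki].
* **Task Spawning**: Our `Executor::spawn` method currently requires a `&mut self` reference and is thus no longer available after invoking the `run` method. To fix this, we could create an additional `Spawner` type that shares some kind of queue with the executor and allows task creation from within tasks themselves. The queue could be the `task_queue` directly or a separate queue that the executor checks in its run loop.
* **Utilizing Threads**: We don't have support for threads yet, but we will add it in the next post. This will make it possible to launch multiple instances of the executor in different threads. The advantage of this approach is that the delay imposed by long-running tasks can be reduced because other tasks can run concurrently. This approach also allows it to utilize multiple CPU cores.
* **Load Balancing**: When adding threading support, it becomes important to know how to distribute the tasks between the executors to ensure that all CPU cores are utilized. A common technique for this is [_work stealing_].

[scheduling chapter]: http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-sched.pdf
[_Operating Systems: Three Easy Pieces_]: http://pages.cs.wisc.edu/~remzi/OSTEP/
[scheduling-wiki]: https://en.wikipedia.org/wiki/Scheduling_(computing)
[_work stealing_]: https://en.wikipedia.org/wiki/Work_stealing

## Summary

We started this post by introducing **multitasking** and distinguishing between _preemptive_ multitasking, which forcibly interrupts running tasks at regular intervals, and _cooperative_ multitasking, which lets tasks run until they voluntarily give up control of the CPU.

We then explored how Rust's support for async/await provides a language-level implementation of cooperative multitasking. Rust bases its implementation on the polling-based `Future` trait, which abstracts asynchronous tasks. Using async/await, it is possible to work with futures almost like with normal synchronous code. The difference is that asynchronous functions return a `Future` again, which needs to be added to an executor at some point in order to run it.

Behind the scenes, the compiler transforms async/await code into _state machines_, with each `.await` operation corresponding to a possible pause point. By utilizing its knowledge about the program, the compiler is able to save only the minimal state for each pause point, resulting in a very small memory consumption per task. One challenge is that the generated state machines might contain _self-referential_ structs, for example when local variables of the asynchronous function reference each other. To prevent pointer invalidation, Rust uses the `Pin` type to ensure that futures cannot be moved in memory anymore after they have been polled for the first time.

For our **implementation**, we first created a very basic executor that polls all spawned tasks in a busy loop without using the `Waker` type at all. We then showed the advantage of waker notifications by implementing an asynchronous keyboard task. The task defines a static `SCANCODE_QUEUE` using the mutex-free `ArrayQueue` type provided by the `crossbeam` crate. Instead of handling keypresses directly, the keyboard interrupt handler now puts all received scancodes in the queue and then wakes the registered `Waker` to signal that new input is available. On the receiving end, we created a `ScancodeStream` type that provides a `Future` resolving to the next scancode in the queue. This made it possible to create an asynchronous `print_keypresses` task that uses async/await to interpret and print the scancodes in the queue.

To utilize the waker notifications of the keyboard task, we created a new `Executor` type that uses an `Arc`-shared `task_queue` for ready tasks. We implemented a `TaskWaker` type that pushes the ID of a woken task directly to this `task_queue`, where the executor picks it up and polls the task again. To save power when no tasks are runnable, we put the CPU to sleep using the `hlt` instruction. Finally, we discussed some potential extensions to our executor, for example, providing multi-core support.

## What's Next?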
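Using async/await, we now have basic support for cooperative multitasking in our kernel. While cooperative multitasking is very efficient, it leads to latency problems when individual tasks keep running for too long and thus prevent other tasks from running. For this reason, it makes sense to also add support for preemptive multitasking to our kernel.

In the next post, we will introduce _threads_ as the most common form of preemptive multitasking. In addition to resolving the problem of long-running tasks, threads will also prepare us for utilizing multiple CPU cores and running untrusted user programs in the future.

================================================
FILE: blog/content/edition-2/posts/12-async-await/index.zh-TW.md
================================================
+++
title = "Async/Await"
weight = 12
path = "zh-TW/async-await"
date = 2020-03-27

[extra]
translators = ["ssrlive"]
+++

In this post, we explore _cooperative multitasking_ and the _async/await_ feature of Rust. We take a detailed look at how async/await works in Rust, including the design of the `Future` trait, the state machine transformation, and _pinning_. We then add basic support for async/await to our kernel by creating an asynchronous keyboard task and a basic executor.

> Translator's note: the Chinese text of this post renders `trait` as `特型` and rejects alternative renderings such as `特性` or `特質`.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-12`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-12

## Multitasking

One of the fundamental features of most operating systems is [_multitasking_], which is the ability to execute multiple tasks concurrently. For example, you probably have other programs open while reading this post, such as a text editor or a terminal window. Even if you have only a single browser window open, there are probably various background tasks for managing your desktop windows, checking for updates, or indexing files.

[_multitasking_]: https://en.wikipedia.org/wiki/Computer_multitasking

While it seems like all tasks run in parallel, only a single task can be executed on a CPU core at a time. To create the illusion that the tasks run in parallel, the operating system rapidly switches between active tasks so that each one can make a bit of progress. Since computers are fast, we don't notice these switches most of the time.

While single-core CPUs can only execute a single task at a time, multi-core CPUs can run multiple tasks in a truly parallel way. For example, a CPU with 8 cores can run 8 tasks at the same time. We will explain how to set up multi-core CPUs in a future post. For this post, we will focus on single-core CPUs for simplicity. (It's worth noting that all multi-core CPUs start with only a single active core, so we can treat them as single-core CPUs for now.)

There are two forms of multitasking: _Cooperative_ multitasking requires tasks to regularly give up control of the CPU so that other tasks can make progress. _Preemptive_ multitasking uses operating system functionality to switch threads at arbitrary points in time by forcibly pausing them. In the following, we will explore the two forms of multitasking in more detail and discuss their respective advantages and drawbacks.

### Preemptive Multitasking

The idea behind preemptive multitasking is that the operating system controls when to switch tasks. For that, it utilizes the fact that it regains control of the CPU on each interrupt. This makes it possible to switch tasks whenever new input is available to the system. For example, it would be possible to switch tasks when the mouse is moved or a network packet arrives. The operating system can also determine the exact time that a task is allowed to run by configuring a hardware timer to send an interrupt after that time.

The following graphic illustrates the task switching process on a hardware interrupt:

![](regain-control-on-interrupt.svg)

In the first row, the CPU is executing task `A1` of program `A`. All other tasks are paused. In the second row, a hardware interrupt arrives at the CPU. As described in the [_Hardware Interrupts_] post, the CPU immediately stops the execution of task `A1` and jumps to the interrupt handler defined in the interrupt descriptor table (IDT). Through this interrupt handler, the operating system now has control of the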
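CPU again, which allows it to switch to task `B1` instead of continuing task `A1`.

[_Hardware Interrupts_]: @/edition-2/posts/07-hardware-interrupts/index.md

#### Saving State

Since tasks are interrupted at arbitrary points in time, they might be in the middle of some calculations. In order to be able to resume them later, the operating system must back up the whole state of the task, including its [call stack] and the values of all CPU registers. This process is called a [_context switch_].

[call stack]: https://en.wikipedia.org/wiki/Call_stack
[_context switch_]: https://en.wikipedia.org/wiki/Context_switch

As the call stack can be very large, the operating system typically sets up a separate call stack for each task instead of backing up the call stack content on each task switch. Such a task with its own stack is called a [_thread of execution_] or _thread_ for short. By using a separate stack for each task, only the register contents need to be saved on a context switch (including the program counter and stack pointer).

[_thread of execution_]: https://en.wikipedia.org/wiki/Thread_(computing)

#### Discussion

The main advantage of preemptive multitasking is that the operating system can fully control the allowed execution time of a task. This way, it can guarantee that each task gets a fair share of the CPU time, without the need to trust the tasks to cooperate. This is especially important when running third-party tasks or when multiple users share a system.

The disadvantage of preemption is that each task requires its own stack. Compared to a shared stack, this results in higher memory usage per task and often limits the number of tasks in the system. Another disadvantage is that the operating system always has to save the complete CPU register state on each task switch, even if the task only used a small subset of the registers.

Preemptive multitasking and threads are fundamental components of an operating system because they make it possible to run untrusted userspace programs. We will discuss these concepts in full detail in future posts. For this post, however, we will focus on cooperative multitasking, which also provides useful capabilities for our kernel.

### Cooperative Multitasking

Instead of forcibly pausing running tasks at arbitrary points in time, cooperative multitasking lets each task run until it voluntarily gives up control of the CPU. This allows tasks to pause themselves at convenient points in time, for example, when they need to wait for an I/O operation anyway.

Cooperative multitasking is often used at the language level, for example in the form of [coroutines] or [async/await]. The idea is that either the programmer or the compiler inserts [_yield_] operations into the program, which give up control of the CPU and allow other tasks to run. For example, a yield could be inserted after each iteration of a complex loop.

[coroutines]: https://en.wikipedia.org/wiki/Coroutine
[async/await]: https://rust-lang.github.io/async-book/01_getting_started/04_async_await_primer.html
[_yield_]: https://en.wikipedia.org/wiki/Yield_(multithreading)

It is common to combine cooperative multitasking with [asynchronous operations]. Instead of waiting until an operation is finished and preventing other tasks from running during this time, asynchronous operations return a "not ready" status if the operation is not finished yet. In this case, the waiting task can execute a yield operation to let other tasks run.

[asynchronous operations]: https://en.wikipedia.org/wiki/Asynchronous_I/O

#### Saving State

Since tasks define their pause points themselves, they don't need the operating system to save their state. Instead, they can save exactly the state they need for continuation before they pause themselves, which often results in better performance. For example, a task that just finished a complex computation might only need to back up the final result of the computation since it does not need the intermediate results anymore.

Language-supported implementations of cooperative tasks are often even able to back up the required parts of the call stack before pausing. As an example, Rust's async/await implementation stores all local variables that are still needed in an automatically generated struct (see below). By backing up the relevant parts of the call stack before pausing, all tasks can share a single call stack, which results in much lower memory consumption per task. This makes it possible to create an almost arbitrary number of cooperative tasks without running out of memory.

#### Discussion

The drawback of cooperative multitasking is that an uncooperative task can potentially run for an unlimited amount of time. Thus, a malicious or buggy task can prevent other tasks from running and slow down or even block the whole system.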
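
To make the idea of voluntary yielding concrete, here is a minimal sketch of a cooperative round-robin loop. It is not part of the kernel built in this post: the `Counter`, `Step`, `run_round_robin`, and `demo` names are hypothetical, and the sketch uses the standard library rather than `no_std`. Each task performs one small unit of work per `step` call and then returns control to the loop.

```rust
use std::collections::VecDeque;

/// Result of running a task for one step (hypothetical type).
enum Step {
    /// The task did some work and voluntarily gave up control.
    Yielded,
    /// The task has nothing left to do.
    Done,
}

/// Hypothetical cooperative task that counts down and then finishes.
struct Counter {
    name: &'static str,
    remaining: u32,
}

impl Counter {
    /// Performs one unit of work, records it in `log`, then yields.
    fn step(&mut self, log: &mut Vec<String>) -> Step {
        if self.remaining == 0 {
            return Step::Done;
        }
        log.push(format!("{} {}", self.name, self.remaining));
        self.remaining -= 1;
        Step::Yielded
    }
}

/// Runs all tasks in FIFO order until every task reports `Done`.
fn run_round_robin(mut tasks: VecDeque<Counter>) -> Vec<String> {
    let mut log = Vec::new();
    while let Some(mut task) = tasks.pop_front() {
        match task.step(&mut log) {
            // A task that yielded goes to the back of the queue.
            Step::Yielded => tasks.push_back(task),
            // A finished task is simply dropped.
            Step::Done => {}
        }
    }
    log
}

/// Interleaves two tasks and returns the order in which they ran.
fn demo() -> Vec<String> {
    run_round_robin(VecDeque::from([
        Counter { name: "A", remaining: 2 },
        Counter { name: "B", remaining: 1 },
    ]))
}
```

Calling `demo()` interleaves the two tasks only because each `step` call returns promptly. If a `step` implementation looped forever, `run_round_robin` would never get the chance to run another task.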
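So cooperative multitasking should only be used when all tasks are known to cooperate. As a counterexample, it's not a good idea to make the operating system rely on the cooperation of arbitrary user-level programs.

However, the strong performance and memory benefits of cooperative multitasking make it a good approach for usage _within_ a program, especially in combination with asynchronous operations. Since an operating system kernel is a performance-critical program that interacts with asynchronous hardware, cooperative multitasking seems like a good approach for implementing concurrency.

## Async/Await in Rust

The Rust language provides first-class support for cooperative multitasking in the form of async/await. Before we can explore what async/await is and how it works, we need to understand how _futures_ and asynchronous programming work in Rust.

### Futures

A _future_ represents a value that might not be available yet. This could be, for example, an integer that is computed by another task or a file that is downloaded from the network. Instead of waiting until the value is available, futures make it possible to continue execution until the value is needed.

#### Example

The concept of futures is best illustrated with a small example:

![Sequence diagram: main calls `read_file` and is blocked until it returns; then it calls `foo()` and is also blocked until it returns. The same process is repeated, but this time `async_read_file` is called, which directly returns a future; then `foo()` is called again, which now runs concurrently with the file load. The file is available before `foo()` returns.](async-example.svg)

This sequence diagram shows a `main` function that reads a file from the file system and then calls a function `foo`. This process is repeated two times: once with a synchronous `read_file` call and once with an asynchronous `async_read_file` call.

With the synchronous call, the `main` function needs to wait until the file is loaded from the file system. Only then can it call the `foo` function, which requires it to again wait for the result.

With the asynchronous `async_read_file` call, the file system directly returns a future and loads the file asynchronously in the background. This allows the `main` function to call `foo` much earlier, which then runs in parallel with the file load. In this example, the file load even finishes before `foo` returns, so `main` can directly work with the file without further waiting after `foo` returns.

#### Futures in Rust

In Rust, futures are represented by the [`Future`] trait, which looks like this:

[`Future`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html

```rust
pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>;
}
```

The [associated type] `Output` specifies the type of the asynchronous value. For example, the `async_read_file` function in the diagram above would return a `Future` instance with `Output` set to `File`.

[associated type]: https://doc.rust-lang.org/book/ch20-02-advanced-traits.html#associated-types

The [`poll`] method allows to check if the value is already available. It returns a [`Poll`] enum, which looks like this:

[`poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll
[`Poll`]: https://doc.rust-lang.org/nightly/core/task/enum.Poll.html

```rust
pub enum Poll<T> {
    Ready(T),
    Pending,
}
```

When the value is already available (e.g., the file was fully read from disk), it is returned wrapped in the `Ready` variant. Otherwise, the `Pending` variant is returned, which signals to the caller that the value is not yet available.

The `poll` method takes two arguments: `self: Pin<&mut Self>` and `cx: &mut Context`. The former behaves similarly to a normal `&mut self` reference, except that the `Self` value is [_pinned_] to its memory location. Understanding `Pin` and why it is needed is difficult without understanding how async/await works first. We will therefore explain it later in this post.

[_pinned_]: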
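https://doc.rust-lang.org/nightly/core/pin/index.html

The purpose of the `cx: &mut Context` parameter is to pass a [`Waker`] instance to the asynchronous task, e.g., the file system load. This `Waker` allows the asynchronous task to signal that it (or a part of it) is finished, e.g., that the file was loaded from disk. Since the main task knows that it will be notified when the `Future` is ready, it does not need to call `poll` over and over again. We will explain this process in more detail later in this post when we implement our own waker type.

[`Waker`]: https://doc.rust-lang.org/nightly/core/task/struct.Waker.html

### Working with Futures

We now know how futures are defined and understand the basic idea behind the `poll` method. However, we still don't know how to effectively work with futures. The problem is that futures represent the results of asynchronous tasks, which might not be available yet. In practice, however, we often need these values directly for further calculations. So the question is: How can we efficiently retrieve the value of a future when we need it?

#### Waiting on Futures

One possible answer is to wait until a future becomes ready. This could look something like this:

```rust
let future = async_read_file("foo.txt");
let file_content = loop {
    match future.poll(…) {
        Poll::Ready(value) => break value,
        Poll::Pending => {}, // do nothing
    }
}
```

Here we _actively_ wait for the future by calling `poll` over and over again in a loop. The arguments to `poll` don't matter here, so we omitted them. While this solution works, it is very inefficient because we keep the CPU busy until the value becomes available.

A more efficient approach could be to _block_ the current thread until the future becomes available. This is, of course, only possible if you have threads, so this solution does not work for our kernel, at least not yet. Even on systems where blocking is supported, it is often not desired because it turns an asynchronous task into a synchronous task again, thereby inhibiting the potential performance benefits of parallel tasks.

#### Future Combinators

An alternative to waiting is to use future combinators. Future combinators are methods like `map` that allow chaining and combining futures together, similar to the methods of the [`Iterator`] trait. Instead of waiting on the future, these combinators return a future themselves, which applies the mapping operation on `poll`.

[`Iterator`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html

As an example, a simple `string_len` combinator for converting a `Future<Output = String>` to a `Future<Output = usize>` could look like this:

```rust
struct StringLen<F> {
    inner_future: F,
}

impl<F> Future for StringLen<F> where F: Future<Output = String> {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.inner_future.poll(cx) {
            Poll::Ready(s) => Poll::Ready(s.len()),
            Poll::Pending => Poll::Pending,
        }
    }
}

fn string_len(string: impl Future<Output = String>)
    -> impl Future<Output = usize>
{
    StringLen {
        inner_future: string,
    }
}

// Usage
fn file_len() -> impl Future<Output = usize> {
    let file_content_future = async_read_file("foo.txt");
    string_len(file_content_future)
}
```

This code does not quite work because it does not handle [_pinning_], but it suffices as an example. The basic idea is that the `string_len` function wraps a given `Future` instance into a new `StringLen` struct, which also implements `Future`. When the wrapper future is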
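polled, it polls the inner future. If the value is not ready yet, `Poll::Pending` is returned from the wrapper future too. If the value is ready, the string is extracted from the `Poll::Ready` variant and its length is calculated. Afterwards, it is wrapped in `Poll::Ready` again and returned.

[_pinning_]: https://doc.rust-lang.org/stable/core/pin/index.html

With this `string_len` function, we can calculate the length of an asynchronous string without waiting for it. Since the function returns a `Future` again, the caller can't work directly on the returned value, but needs to use combinator functions again. This way, the whole call graph becomes asynchronous and we can efficiently wait for multiple futures at once at some point, e.g., in the `main` function.

Because manually writing combinator functions is difficult, they are often provided by libraries. While the Rust standard library itself provides no combinator methods yet, the semi-official (and `no_std` compatible) [`futures`] crate does. Its [`FutureExt`] trait provides high-level combinator methods such as [`map`] or [`then`], which can be used to manipulate the result with arbitrary closures.

[`futures`]: https://docs.rs/futures/0.3.4/futures/
[`FutureExt`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html
[`map`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.map
[`then`]: https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.then

##### Advantages

The big advantage of future combinators is that they keep the operations asynchronous. In combination with asynchronous I/O interfaces, this approach can lead to very high performance. The fact that future combinators are implemented as normal structs with trait implementations allows the compiler to optimize them aggressively. For more details, see the [_Zero-cost futures in Rust_] post, which announced the addition of futures to the Rust ecosystem.

[_Zero-cost futures in Rust_]: https://aturon.github.io/blog/2016/08/11/futures/

##### Drawbacks

While future combinators make it possible to write very efficient code, they can be difficult to use in some situations because of the type system and the closure-based interface. For example, consider code like this:

```rust
fn example(min_len: usize) -> impl Future<Output = String> {
    async_read_file("foo.txt").then(move |content| {
        if content.len() < min_len {
            Either::Left(async_read_file("bar.txt").map(|s| content + &s))
        } else {
            Either::Right(future::ready(content))
        }
    })
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=91fc09024eecb2448a85a7ef6a97b8d8))

Here we read the file `foo.txt` and then use the [`then`] combinator to chain a second future based on the file content. If the content length is smaller than the given `min_len`, we read a different `bar.txt` file and append it to `content` using the [`map`] combinator. Otherwise, we return only the content of `foo.txt`.

We need to use the [`move` keyword] for the closure passed to `then` because otherwise there would be a lifetime error for `min_len`. The reason for the [`Either`] wrapper is that `if` and `else` blocks must always have the same type. Since we return different future types in the blocks, we must use the wrapper type to unify them into a single type. The [`ready`] function wraps a value into a future that is immediately ready. The function is required here because the `Either` wrapper expects that the wrapped values implement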
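`Future`.

[`move` keyword]: https://doc.rust-lang.org/std/keyword.move.html
[`Either`]: https://docs.rs/futures/0.3.4/futures/future/enum.Either.html
[`ready`]: https://docs.rs/futures/0.3.4/futures/future/fn.ready.html

As you can imagine, this can quickly lead to very complex code for larger projects. It gets especially complicated if borrowing and different lifetimes are involved. For this reason, a lot of work was invested in adding support for async/await to Rust, with the goal of making asynchronous code radically simpler to write.

### The Async/Await Pattern

The idea behind async/await is to let the programmer write code that _looks_ like normal synchronous code, but is turned into asynchronous code by the compiler. It works based on the two keywords `async` and `await`. The `async` keyword can be used in a function signature to turn a synchronous function into an asynchronous function that returns a future:

```rust
async fn foo() -> u32 {
    0
}

// the above is roughly translated by the compiler to:
fn foo() -> impl Future<Output = u32> {
    future::ready(0)
}
```

This keyword alone wouldn't be that useful. However, inside `async` functions, the `await` keyword can be used to retrieve the asynchronous value of a future:

```rust
async fn example(min_len: usize) -> String {
    let content = async_read_file("foo.txt").await;
    if content.len() < min_len {
        content + &async_read_file("bar.txt").await
    } else {
        content
    }
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=d93c28509a1c67661f31ff820281d434))

This function is a direct translation of the `example` function from [above](#drawbacks) that used combinator functions. Using the `.await` operator, we can retrieve the value of a future without needing any closures or `Either` types. As a result, we can write our code like we write normal synchronous code, with the difference that _this is still asynchronous code_.

#### State Machine Transformation

Behind the scenes, the compiler converts the body of the `async` function into a [_state machine_], with each `.await` call representing a different state. For the above `example` function, the compiler creates a state machine with the following four states:

[_state machine_]: https://en.wikipedia.org/wiki/Finite-state_machine

![Four states: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-states.svg)

Each state represents a different pause point in the function. The _"Start"_ and _"End"_ states represent the function at the beginning and end of its execution. The _"Waiting on foo.txt"_ state represents that the function is currently waiting for the first `async_read_file` result. Similarly, the _"Waiting on bar.txt"_ state represents the pause point where the function is waiting on the second `async_read_file` result.

The state machine implements the `Future` trait by making each `poll` call a possible state transition:

![Four states and their transitions: start, waiting on foo.txt, waiting on bar.txt, end](async-state-machine-basic.svg)

The diagram uses arrows to represent state switches and diamond shapes to represent alternative paths. For example, if the `foo.txt` file is not ready, the path marked with _"no"_ is taken and the _"Waiting on foo.txt"_ state is reached. Otherwise, the _"yes"_ path is taken. The small red diamond without a caption represents the `if content.len() <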
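100` branch of the `example` function.

We see that the first `poll` call starts the function and lets it run until it reaches a future that is not ready yet. If all futures on the path are ready, the function can run until the _"End"_ state, where it returns its result wrapped in `Poll::Ready`. Otherwise, the state machine enters a waiting state and returns `Poll::Pending`. On the next `poll` call, the state machine then starts from the last waiting state and retries the last operation.

#### Saving State

In order to be able to continue from the last waiting state, the state machine must keep track of the current state internally. In addition, it must save all the variables that it needs to continue execution on the next `poll` call. This is where the compiler can really shine: Since it knows which variables are used when, it can automatically generate structs with exactly the variables that are needed.

As an example, the compiler generates structs like the following for the above `example` function:

```rust
// The `example` function again so that you don't have to scroll up
async fn example(min_len: usize) -> String {
    let content = async_read_file("foo.txt").await;
    if content.len() < min_len {
        content + &async_read_file("bar.txt").await
    } else {
        content
    }
}

// The compiler-generated state structs:

struct StartState {
    min_len: usize,
}

struct WaitingOnFooTxtState {
    min_len: usize,
    foo_txt_future: impl Future<Output = String>,
}

struct WaitingOnBarTxtState {
    content: String,
    bar_txt_future: impl Future<Output = String>,
}

struct EndState {}
```

In the _"start"_ and _"Waiting on foo.txt"_ states, the `min_len` parameter needs to be stored for the later comparison with `content.len()`. The _"Waiting on foo.txt"_ state additionally stores a `foo_txt_future`, which represents the future returned by the `async_read_file` call. This future needs to be polled again when the state machine continues, so it needs to be saved.

The _"Waiting on bar.txt"_ state contains the `content` variable for the later string concatenation when `bar.txt` is ready. It also stores a `bar_txt_future` that represents the in-progress load of `bar.txt`. The struct does not contain the `min_len` variable because it is no longer needed after the `content.len()` comparison. In the _"end"_ state, no variables are stored because the function has already run to completion.

Keep in mind that this is only an example of the code that the compiler could generate. The struct names and the field layout are implementation details and might be different.

#### The Full State Machine Type

While the exact compiler-generated code is an implementation detail, it helps understanding to imagine how the generated state machine _could_ look for the `example` function. We already defined the structs representing the different states and containing the required variables. To create a state machine on top of them, we can combine them into an [`enum`]:

[`enum`]: https://doc.rust-lang.org/book/ch06-01-defining-an-enum.html

```rust
enum ExampleStateMachine {
    Start(StartState),
    WaitingOnFooTxt(WaitingOnFooTxtState),
    WaitingOnBarTxt(WaitingOnBarTxtState),
    End(EndState),
}
```

We define a separate enum variant for each state and add the corresponding state struct to each variant as a field. To implement the state transitions, the compiler generates an implementation of the `Future` trait based on the `example` function:

```rust
impl Future for ExampleStateMachine {
    type Output = String; // return type of `example`

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        loop {
            match self { // TODO: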
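                // handle pinning
                ExampleStateMachine::Start(state) => {…}
                ExampleStateMachine::WaitingOnFooTxt(state) => {…}
                ExampleStateMachine::WaitingOnBarTxt(state) => {…}
                ExampleStateMachine::End(state) => {…}
            }
        }
    }
}
```

The `Output` type of the future is `String` because it is the return type of the `example` function. To implement the `poll` function, we use a `match` statement on the current state inside a `loop`. The idea is that we switch to the next state as long as possible and use an explicit `return Poll::Pending` when we can't continue.

For simplicity, we only show simplified code and don't handle [pinning][_pinning_], ownership, lifetimes, etc. So this and the following code should be treated as pseudo-code and not used directly. Of course, the real compiler-generated code handles everything correctly, albeit possibly in a different way.

To keep the code excerpts small, we present the code for each `match` arm separately. Let's begin with the `Start` state:

```rust
ExampleStateMachine::Start(state) => {
    // from body of `example`
    let foo_txt_future = async_read_file("foo.txt");
    // `.await` operation
    let state = WaitingOnFooTxtState {
        min_len: state.min_len,
        foo_txt_future,
    };
    *self = ExampleStateMachine::WaitingOnFooTxt(state);
}
```

The state machine is in the `Start` state when it is right at the beginning of the function. In this case, we execute all the code from the body of the `example` function until the first `.await`. To handle the `.await` operation, we change the state of the `self` state machine to `WaitingOnFooTxt`, which includes the construction of the `WaitingOnFooTxtState` struct.

Since the `match self {…}` statement is executed in a loop, the execution jumps to the `WaitingOnFooTxt` arm next:

```rust
ExampleStateMachine::WaitingOnFooTxt(state) => {
    match state.foo_txt_future.poll(cx) {
        Poll::Pending => return Poll::Pending,
        Poll::Ready(content) => {
            // from body of `example`
            if content.len() < state.min_len {
                let bar_txt_future = async_read_file("bar.txt");
                // `.await` operation
                let state = WaitingOnBarTxtState {
                    content,
                    bar_txt_future,
                };
                *self = ExampleStateMachine::WaitingOnBarTxt(state);
            } else {
                *self = ExampleStateMachine::End(EndState);
                return Poll::Ready(content);
            }
        }
    }
}
```

In this `match` arm, we first call the `poll` function of the `foo_txt_future`. If it is not ready, we exit the loop and return `Poll::Pending`. Since `self` stays in the `WaitingOnFooTxt` state in this case, the next `poll` call on the state machine will enter the same `match` arm and retry polling the `foo_txt_future`.

When the `foo_txt_future` is ready, we assign the result to the `content` variable and continue to execute the code of the `example` function: If `content.len()` is smaller than the `min_len` saved in the state struct, the `bar.txt` file is read asynchronously. We again translate the `.await` operation into a state change, this time into the `WaitingOnBarTxt` state. Since we're executing the `match` inside a loop, the execution directly jumps to the `match` arm for the new state afterward, where the `bar_txt_future` is polled.

If we enter the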
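`else` branch, no further `.await` operation occurs. We reach the end of the function and return `content` wrapped in `Poll::Ready`. We also change the current state to the `End` state.

The code for the `WaitingOnBarTxt` state looks like this:

```rust
ExampleStateMachine::WaitingOnBarTxt(state) => {
    match state.bar_txt_future.poll(cx) {
        Poll::Pending => return Poll::Pending,
        Poll::Ready(bar_txt) => {
            *self = ExampleStateMachine::End(EndState);
            // from body of `example`
            return Poll::Ready(state.content + &bar_txt);
        }
    }
}
```

Similar to the `WaitingOnFooTxt` state, we start by polling the `bar_txt_future`. If it is still pending, we exit the loop and return `Poll::Pending`. Otherwise, we can perform the last operation of the `example` function: concatenating the `content` variable with the result from the future. We update the state machine to the `End` state and then return the result wrapped in `Poll::Ready`.

Finally, the code for the `End` state looks like this:

```rust
ExampleStateMachine::End(_) => {
    panic!("poll called after Poll::Ready was returned");
}
```

Futures should not be polled again after they returned `Poll::Ready`, so we panic if `poll` is called while we are already in the `End` state.

We now know what the compiler-generated state machine and its implementation of the `Future` trait _could_ look like. In practice, the compiler generates code in a different way. (In case you're interested, the implementation is currently based on [_coroutines_], but this is only an implementation detail.)

[_coroutines_]: https://doc.rust-lang.org/stable/unstable-book/language-features/coroutines.html

The last piece of the puzzle is the generated code for the `example` function itself. Remember, the function header was defined like this:

```rust
async fn example(min_len: usize) -> String
```

Since the complete function body is now implemented by the state machine, the only thing that the function needs to do is to initialize the state machine and return it. The generated code for this could look like this:

```rust
fn example(min_len: usize) -> ExampleStateMachine {
    ExampleStateMachine::Start(StartState {
        min_len,
    })
}
```

The function no longer has an `async` modifier since it now explicitly returns an `ExampleStateMachine` type, which implements the `Future` trait. As expected, the state machine is constructed in the `Start` state and the corresponding state struct is initialized with the `min_len` parameter.

Note that this function does not start the execution of the state machine. This is a fundamental design decision of futures in Rust: they do nothing until they are polled for the first time.

### Pinning

We already stumbled across _pinning_ multiple times in this post. Now is finally the time to explore what pinning is and why it is needed.

#### Self-Referential Structs

As explained above, the state machine transformation stores the local variables of each pause point in a struct. For small examples like our `example` function, this was straightforward and did not lead to any problems. However, things become more difficult when variables reference each other. For example, consider this function:

```rust
async fn pin_example() -> i32 {
    let array = [1, 2, 3];
    let element = &array[2];
    async_write_file("foo.txt", element.to_string()).await;
    *element
}
```

This function creates a small `array` with the contents `1`, `2`, and `3`. It then creates a reference to the last array element and stores it in an `element` variable.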
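Next, it asynchronously writes the number converted to a string to a `foo.txt` file. Finally, it returns the number referenced by `element`.

Since the function uses a single `await` operation, the resulting state machine has three states: start, end, and _"waiting on write"_. The function takes no arguments, so the struct for the start state is empty. Like before, the struct for the end state is empty because the function is finished at this point. The struct for the _"waiting on write"_ state is more interesting:

```rust
struct WaitingOnWriteState {
    array: [1, 2, 3],
    element: 0x1001c, // address of the last array element
}
```

We need to store both the `array` and `element` variables because `element` is required for the return value and `array` is referenced by `element`. Since `element` is a reference, it stores a _pointer_ (i.e., a memory address) to the referenced element. We used `0x1001c` as an example memory address here. In reality, it needs to be the address of the last element of the `array` field, so it depends on where the struct lives in memory. Structs with such internal pointers are called _self-referential_ structs because they reference themselves from one of their fields.

#### The Problem with Self-Referential Structs

The internal pointer of our self-referential struct leads to a fundamental problem, which becomes apparent when we look at its memory layout:

![array at 0x10014 with fields 1, 2, and 3; element at address 0x10020, pointing to the last array element at 0x1001c](self-referential-struct.svg)

The `array` field starts at address `0x10014` and the `element` field at address `0x10020`. It points to address `0x1001c` because the last array element lives at this address. At this point, everything is still fine. However, an issue occurs when we move this struct to a different memory address:

![array at 0x10024 with fields 1, 2, and 3; element at address 0x10030, still pointing to 0x1001c, even though the last array element now lives at 0x1002c](self-referential-struct-moved.svg)

We moved the struct a bit so that it starts at address `0x10024` now. This could, for example, happen when we pass the struct as a function argument or assign it to a different stack variable. The problem is that the `element` field still points to address `0x1001c` even though the last `array` element now lives at address `0x1002c`. Thus, the pointer is dangling, with the result that undefined behavior occurs on the next `poll` call.

#### Possible Solutions

There are three fundamental approaches to solving the dangling pointer problem:

- **Update the pointer on move:** The idea is to update the internal pointer whenever the struct is moved in memory so that it is still valid after the move. Unfortunately, this approach would require extensive changes to Rust that would result in potentially huge performance losses. The reason is that some kind of runtime would need to keep track of the type of all struct fields and check on every move operation whether a pointer update is required.
- **Store an offset instead of self-references:** To avoid the requirement for updating pointers, the compiler could try to store self-references as offsets from the struct's beginning instead. For example, the `element` field of the above `WaitingOnWriteState` struct could be stored in the form of an `element_offset` field with a value of 8 because the array element that the reference points to starts 8 bytes after the struct's beginning. Since the offset stays the same when the struct is moved, no field updates are required.

  The problem with this approach is that it requires the compiler to detect all self-references. This is not possible at compile-time because the value of a reference might depend on user input, so we would need a runtime system again to analyze references and correctly create the state structs. This would not only result in runtime costs but also prevent certain compiler optimizations, so that it would cause large performance losses again.
- **Forbid moving the struct:** As we saw above, the dangling pointer only occurs when we move the struct in memory. By completely forbidding move operations on self-referential structs, the problem can also be avoided. The big advantage of this approach is that it can be implemented at the type system level without additional runtime costs. The drawback is that it puts the burden of dealing with move operations on possibly self-referential structs on the programmer.

Rust chose the third solution because of its principle of providing _zero cost abstractions_, which means that abstractions should not impose additional runtime costs. The [_pinning_] API was proposed for this purpose in [RFC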
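2349](https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md). In the following, we will give a short overview of this API and explain how it works with async/await and futures.

#### Heap Values

The first observation is that [heap-allocated] values already have a fixed memory address most of the time. They are created using a call to `allocate` and then referenced by a pointer type such as `Box<T>`. While moving the pointer type is possible, the heap value that the pointer points to stays at the same memory address until it is freed through a `deallocate` call again.

[heap-allocated]: @/edition-2/posts/10-heap-allocation/index.md

Using heap allocation, we can try to create a self-referential struct:

```rust
fn main() {
    let mut heap_value = Box::new(SelfReferential {
        self_ptr: 0 as *const _,
    });
    let ptr = &*heap_value as *const SelfReferential;
    heap_value.self_ptr = ptr;
    println!("heap value at: {:p}", heap_value);
    println!("internal reference: {:p}", heap_value.self_ptr);
}

struct SelfReferential {
    self_ptr: *const Self,
}
```

([Try it on the playground][playground-self-ref])

[playground-self-ref]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=ce1aff3a37fcc1c8188eeaf0f39c97e8

We create a simple struct named `SelfReferential` that contains a single pointer field. First, we initialize this struct with a null pointer and then allocate it on the heap using `Box::new`. We then determine the memory address of the heap-allocated struct and store it in a `ptr` variable. Finally, we make the struct self-referential by assigning the `ptr` variable to the `self_ptr` field.

When we execute this code [on the playground][playground-self-ref], we see that the address of the heap value and its internal pointer are equal, which means that the `self_ptr` field is a valid self-reference. Since the `heap_value` variable is only a pointer, moving it (e.g., by passing it to a function) does not change the address of the struct itself, so the `self_ptr` stays valid even if the pointer is moved.

However, there is still a way to break this example: We can move out of a `Box<T>` or replace its content:

```rust
let stack_value = mem::replace(&mut *heap_value, SelfReferential {
    self_ptr: 0 as *const _,
});
println!("value at: {:p}", &stack_value);
println!("internal reference: {:p}", stack_value.self_ptr);
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=e160ee8a64cba4cebc1c0473dcecb7c8))

Here we use the [`mem::replace`] function to replace the heap-allocated value with a new struct instance. This allows us to move the original `heap_value` to the stack, while the `self_ptr` field of the struct is now a dangling pointer that still points to the old heap address. When you try to run the example on the playground, you see that the printed _"value at:"_ and _"internal reference:"_ lines indeed show different pointers. So heap allocating a value is not enough to make self-references safe.

[`mem::replace`]: https://doc.rust-lang.org/nightly/core/mem/fn.replace.html

The fundamental problem that allowed the above breakage is that `Box<T>` allows us to get a `&mut T` reference to the heap-allocated value. This `&mut`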
reference makes it possible to use methods like [`mem::replace`] or [`mem::swap`] to invalidate the heap-allocated value. To resolve this problem, we must prevent `&mut` references to self-referential structs from being created.

[`mem::swap`]: https://doc.rust-lang.org/nightly/core/mem/fn.swap.html

#### `Pin<Box<T>>` and `Unpin`

The pinning API provides a solution to the `&mut T` problem in the form of the [`Pin`] wrapper type and the [`Unpin`] marker trait. The idea behind these types is to gate all methods of `Pin` that can be used to get `&mut` references to the wrapped value (e.g., [`get_mut`][pin-get-mut] or [`deref_mut`][pin-deref-mut]) on the `Unpin` trait. The `Unpin` trait is an [_auto trait_], which is automatically implemented for all types except those that explicitly opt out. By making self-referential structs opt out of `Unpin`, there is no (safe) way to get a `&mut T` from a `Pin<Box<T>>` type for them.

[`Pin`]: https://doc.rust-lang.org/stable/core/pin/struct.Pin.html
[`Unpin`]: https://doc.rust-lang.org/nightly/std/marker/trait.Unpin.html
[pin-get-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_mut
[pin-deref-mut]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.deref_mut
[_auto trait_]: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits

As an example, let's update the `SelfReferential` type from above to opt out of `Unpin`:

```rust
use core::marker::PhantomPinned;

struct SelfReferential {
    self_ptr: *const Self,
    _pin: PhantomPinned,
}
```

We opt out by adding a second `_pin` field of type [`PhantomPinned`]. This type is a zero-sized marker type whose only purpose is to _not_ implement the `Unpin` trait. Because of the way [auto traits][_auto trait_] work, a single field that is not `Unpin` suffices to make the complete struct opt out of `Unpin`.

[`PhantomPinned`]: https://doc.rust-lang.org/nightly/core/marker/struct.PhantomPinned.html

The second step is to change the `Box<SelfReferential>` type in the example to a `Pin<Box<SelfReferential>>` type. The easiest way to do this is to use the [`Box::pin`] function instead of [`Box::new`] for creating the heap-allocated value:

[`Box::pin`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.pin
[`Box::new`]: https://doc.rust-lang.org/nightly/alloc/boxed/struct.Box.html#method.new

```rust
let mut heap_value = Box::pin(SelfReferential {
    self_ptr: 0 as *const _,
    _pin: PhantomPinned,
});
```

In addition to changing `Box::new` to `Box::pin`, we also need to add the new `_pin` field in the struct initializer. Since `PhantomPinned` is a zero-sized type, we only need its type name to initialize it.

When we [try to run our adjusted example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=961b0db194bbe851ff4d0ed08d3bd98a) now, we see that it no longer works:

```
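error[E0594]: cannot assign to data in dereference of `Pin<Box<SelfReferential>>`
  --> src/main.rs:10:5
   |
10 |     heap_value.self_ptr = ptr;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot assign
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin<Box<SelfReferential>>`

error[E0596]: cannot borrow data in dereference of `Pin<Box<SelfReferential>>` as mutable
  --> src/main.rs:16:36
   |
16 |     let stack_value = mem::replace(&mut *heap_value, SelfReferential {
   |                                    ^^^^^^^^^^^^^^^^ cannot borrow as mutable
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Pin<Box<SelfReferential>>`
```

Both errors occur because the `Pin<Box<SelfReferential>>` type no longer implements the `DerefMut` trait. This is exactly what we wanted because the `DerefMut` trait would return a `&mut` reference, which we wanted to prevent. This only happens because we both opted out of `Unpin` and changed `Box::new` to `Box::pin`.

The problem now is that the compiler does not only prevent moving the type in line 16, but also forbids initializing the `self_ptr` field in line 10. This happens because the compiler can't differentiate between valid and invalid uses of `&mut` references. To get the initialization working again, we have to use the unsafe [`get_unchecked_mut`] method:

[`get_unchecked_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.get_unchecked_mut

```rust
// safe because modifying a field doesn't move the whole struct
unsafe {
    let mut_ref = Pin::as_mut(&mut heap_value);
    Pin::get_unchecked_mut(mut_ref).self_ptr = ptr;
}
```

([Try it on the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=b9ebbb11429d9d79b3f9fffe819e2018))

The [`get_unchecked_mut`] function works on a `Pin<&mut T>` instead of a `Pin<Box<T>>`, so we have to use [`Pin::as_mut`] for converting the value. Then we can set the `self_ptr` field using the `&mut` reference returned by `get_unchecked_mut`.

[`Pin::as_mut`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.as_mut

Now the only error left is the desired error on `mem::replace`. Remember, this operation tries to move the heap-allocated value to the stack, which would break the self-reference stored in the `self_ptr` field. By opting out of `Unpin` and using `Pin<Box<T>>`, we can prevent this operation at compile time and thus safely work with self-referential structs. As we saw, the compiler is not able to prove that the creation of the self-reference is safe (yet), so we need to use an unsafe block and verify the correctness ourselves.

#### Stack Pinning and `Pin<&mut T>`

In the previous section, we learned how to use `Pin<Box<T>>` to safely create a heap-allocated self-referential value. While this approach works fine and is relatively safe (apart from the unsafe construction), the required heap allocation comes with a performance cost. Since Rust strives to provide _zero-cost abstractions_ whenever possible, the pinning API also allows to create `Pin<&mut T>` instances that point to stack-allocated values.

Unlike `Pin<Box<T>>` instances, which have _ownership_ of the wrapped value, `Pin<&mut T>` instances only temporarily borrow the wrapped value.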
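
As a minimal sketch of such a borrowed pin, the following example pins a future to the current stack frame and polls it once. It assumes a toolchain that ships the `pin!` macro (stable since Rust 1.68) and uses `std` for brevity instead of `no_std`. The `NoopWaker`, `answer`, and `poll_on_stack` names are hypothetical and only serve the illustration; they are not part of the kernel code in this post.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that ignores wake-ups; enough for one manual poll.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Stand-in async function whose future is ready on the first poll.
async fn answer() -> u32 {
    42
}

/// Pins the future to this stack frame and polls it once.
fn poll_on_stack() -> Poll<u32> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    // `pin!` moves the future into a hidden local of this frame and
    // returns a `Pin<&mut _>` that merely borrows it.
    let mut future = pin!(answer());

    // The pinned borrow ends with this function; the future itself can
    // never be moved out of the frame again.
    let result = future.as_mut().poll(&mut cx);
    result
}
```

The pinned reference is only valid inside `poll_on_stack`, so the future cannot be returned or stored elsewhere, which is the price for avoiding the heap allocation of `Box::pin`.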
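This makes things more complicated since it requires the programmer to ensure additional guarantees themselves. Most importantly, a `Pin<&mut T>` must stay pinned for the whole lifetime of the referenced `T`, which can be difficult to verify for stack-based variables. To help with this, crates like [`pin-utils`] exist, but I still wouldn't recommend pinning to the stack unless you really know what you're doing.

[`pin-utils`]: https://docs.rs/pin-utils/0.1.0-alpha.4/pin_utils/

For further reading, check out the documentation of the [`pin` module] and the [`Pin::new_unchecked`] method.

[`pin` module]: https://doc.rust-lang.org/nightly/core/pin/index.html
[`Pin::new_unchecked`]: https://doc.rust-lang.org/nightly/core/pin/struct.Pin.html#method.new_unchecked

#### Pinning and Futures

As we already saw in this post, the [`Future::poll`] method uses pinning in the form of a `Pin<&mut Self>` parameter:

[`Future::poll`]: https://doc.rust-lang.org/nightly/core/future/trait.Future.html#tymethod.poll

```rust
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>
```

The reason that this method takes `self: Pin<&mut Self>` instead of the normal `&mut self` is that future instances created from async/await are often self-referential, as we saw [above][self-ref-async-await]. By wrapping `Self` into `Pin` and letting the compiler opt out of `Unpin` for self-referential futures generated from async/await, it is guaranteed that the futures are not moved in memory between `poll` calls. This ensures that all internal references are still valid.

[self-ref-async-await]: @/edition-2/posts/12-async-await/index.zh-TW.md#self-referential-structs

It is worth noting that moving futures before the first `poll` call is fine. This is a result of the fact that futures are lazy and do nothing until they're polled for the first time. The `start` state of the generated state machines therefore only contains the function arguments but no internal references. In order to call `poll`, the caller must wrap the future into `Pin` first, which ensures that the future cannot be moved in memory anymore. Since stack pinning is more difficult to get right, I recommend to always use [`Box::pin`] combined with [`Pin::as_mut`] for this.

[`futures`]: https://docs.rs/futures/0.3.4/futures/

In case you're interested in understanding how to safely implement a future combinator function using stack pinning yourself, take a look at the relatively short [source of the `map` combinator method][map-src] of the `futures` crate and the section about [projections and structural pinning] of the pin documentation.

[map-src]: https://docs.rs/futures-util/0.3.4/src/futures_util/future/future/map.rs.html
[projections and structural pinning]: https://doc.rust-lang.org/stable/std/pin/index.html#projections-and-structural-pinning

### Executors and Wakers

Using async/await, it is possible to ergonomically work with futures in a completely asynchronous way. However, as we learned above, futures do nothing until they are polled. This means we have to call `poll` on them at some point, otherwise the asynchronous code is never executed.

With a single future, we can always wait for each future manually using a loop [as described above](#waiting-on-futures). However, this approach is very inefficient and not practical for programs that create a large number of futures. The most common solution to this problem is to define a global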
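_executor_ that is responsible for polling all futures in the system until they are finished.

#### Executors

The purpose of an executor is to allow spawning futures as independent tasks, typically through some sort of `spawn` method. The executor is then responsible for polling all futures until they are completed. The big advantage of managing all futures in a central place is that the executor can switch to a different future whenever a future returns `Poll::Pending`. Thus, asynchronous operations are run in parallel and the CPU is kept busy.

Many executor implementations can also take advantage of systems with multiple CPU cores. They create a [thread pool] that is able to utilize all cores if there is enough work available and use techniques such as [work stealing] to balance the load between cores. There are also special executor implementations for embedded systems that optimize for low latency and memory overhead.

[thread pool]: https://en.wikipedia.org/wiki/Thread_pool
[work stealing]: https://en.wikipedia.org/wiki/Work_stealing

To avoid the overhead of polling futures repeatedly, executors typically take advantage of the _waker_ API supported by Rust's futures.

#### Wakers

The idea behind the waker API is that a special [`Waker`] type is passed to each invocation of `poll`, wrapped in the [`Context`] type. This `Waker` type is created by the executor and can be used by the asynchronous task to signal its (partial) completion. As a result, the executor does not need to call `poll` on a future that previously returned `Poll::Pending` until it is notified by the corresponding waker.

[`Context`]: https://doc.rust-lang.org/nightly/core/task/struct.Context.html

This is best illustrated by a small example:

```rust
async fn write_file() {
    async_write_file("foo.txt", "Hello").await;
}
```

This function asynchronously writes the string "Hello" to a `foo.txt` file. Since hard disk writes take some time, the first `poll` call on this future will likely return `Poll::Pending`. However, the hard disk driver will internally store the `Waker` passed to the `poll` call and use it to notify the executor when the file is written to disk. This way, the executor does not need to waste any time trying to `poll` the future again before it receives the waker notification.

We will see how the `Waker` type works in detail when we create our own executor with waker support in the implementation section of this post.

### Cooperative Multitasking?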
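At the beginning of this post, we talked about preemptive and cooperative multitasking. While preemptive multitasking relies on the operating system to forcibly switch between running tasks, cooperative multitasking requires that the tasks voluntarily give up control of the CPU through a _yield_ operation on a regular basis. A big advantage of the cooperative approach is that tasks can save their state themselves, which results in more efficient context switches and makes it possible to share the same call stack between tasks.

It might not be immediately apparent, but futures and async/await are an implementation of the cooperative multitasking pattern:

- Each future that is added to the executor is basically a cooperative task.
- Instead of using an explicit yield operation, futures give up control of the CPU by returning `Poll::Pending` (or `Poll::Ready` at the end).
    - There is nothing that forces futures to give up the CPU. If they want, they can never return from `poll`, e.g., by spinning endlessly in a loop.
    - Since each future can block the execution of the other futures in the executor, we need to trust them to not be malicious.
- Futures internally store all the state they need to continue execution on the next `poll` call. With async/await, the compiler automatically detects all variables that are needed and stores them inside the generated state machine.
    - Only the minimum state required for continuation is saved.
    - Since the `poll` method gives up the call stack when it returns, the same stack can be used for polling other futures.

We see that futures and async/await fit the cooperative multitasking pattern perfectly; they just use some different terminology. In the following, we will therefore use the terms "task" and "future" interchangeably.

## Implementation

Now that we understand how cooperative multitasking based on futures and async/await works in Rust, it's time to add support for it to our kernel. Since the [`Future`] trait is part of the `core` library and async/await is a feature of the language itself, there is nothing special we need to do to use it in our `#![no_std]` kernel. The only requirement is that we use at least nightly `2020-03-25` of Rust because async/await was not `no_std` compatible before.

With a recent enough nightly, we can start using async/await in our `main.rs`:

```rust
// in src/main.rs

async fn async_number() -> u32 {
    42
}

async fn example_task() {
    let number = async_number().await;
    println!("async number: {}", number);
}
```

The `async_number` function is an `async fn`, so the compiler transforms it into a state machine that implements `Future`. Since the function only returns `42`, the resulting future will directly return `Poll::Ready(42)` on the first `poll` call. Like `async_number`, the `example_task` function is also an `async fn`. It awaits the number returned by `async_number` and then prints it using the `println` macro.

To run the future returned by `example_task`, we need to call `poll` on it until it signals its completion by returning `Poll::Ready`. To do this, we need to create a simple executor type.

### Task

Before we start the executor implementation, we create a new `task` module with a `Task` type:

```rust
// in src/lib.rs

pub mod task;
```

```rust
// in src/task/mod.rs

use core::{future::Future, pin::Pin};
use alloc::boxed::Box;

pub struct Task {
    future: Pin<Box<dyn Future<Output = ()>>>,
}
```

The `Task` struct is a newtype wrapper around a pinned, heap-allocated, and dynamically dispatched future with the empty type `()` as output. Let's go through it in detail:

- We require that the future associated with a task returns `()`. This means that tasks don't return any result, they are just executed for their side effects. For example, the `example_task` function we defined above has no return value, but it prints something to the screen as a side effect.
- The `dyn` keyword indicates that we store a [_trait object_] in the `Box`. This means that the methods on the future are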
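[_dynamically dispatched_], allowing different types of futures to be stored in the `Task` type. This is important because each `async fn` has its own type and we want to be able to create multiple different tasks.
- As we learned in the [section about pinning], the `Pin<Box>` type ensures that a value cannot be moved in memory by placing it on the heap and preventing the creation of `&mut` references to it. This is important because futures generated by async/await might be self-referential, i.e., contain pointers to themselves that would be invalidated when the future is moved.

[_trait object_]: https://doc.rust-lang.org/book/ch17-02-trait-objects.html
[_dynamically dispatched_]: https://doc.rust-lang.org/book/ch18-02-trait-objects.html#trait-objects-perform-dynamic-dispatch
[section about pinning]: #pinning

To allow the creation of new `Task` structs from futures, we create a `new` function:

```rust
// in src/task/mod.rs

impl Task {
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task {
            future: Box::pin(future),
        }
    }
}
```

The function takes an arbitrary future with an output type of `()` and pins it in memory through the [`Box::pin`] function. Then it wraps the boxed future in the `Task` struct and returns it. The `'static` lifetime is required here because the returned `Task` can live for an arbitrary time, so the future needs to be valid for that time too.

We also add a `poll` method to allow the executor to poll the stored future:

```rust
// in src/task/mod.rs

use core::task::{Context, Poll};

impl Task {
    fn poll(&mut self, context: &mut Context) -> Poll<()> {
        self.future.as_mut().poll(context)
    }
}
```

Since the [`poll`] method of the `Future` trait expects to be called on a `Pin<&mut T>` type, we first use the [`Pin::as_mut`] method to convert the `self.future` field from type `Pin<Box<T>>` to `Pin<&mut T>`. Then we call `poll` on the converted `self.future` field and return the result. Since the `Task::poll` method should only be called by the executor that we'll create in a moment, we keep the function private to the `task` module.

### Simple Executor

Since executors can be quite complex, we deliberately start by creating a very basic executor before implementing a more featureful executor later. For this, we first create a new `task::simple_executor` submodule:

```rust
// in src/task/mod.rs

pub mod simple_executor;
```

```rust
// in src/task/simple_executor.rs

use super::Task;
use alloc::collections::VecDeque;

pub struct SimpleExecutor {
    task_queue: VecDeque<Task>,
}

impl SimpleExecutor {
    pub fn new() -> SimpleExecutor {
        SimpleExecutor {
            task_queue: VecDeque::new(),
        }
    }

    pub fn spawn(&mut self, task: Task) {
        self.task_queue.push_back(task)
    }
}
```

The struct contains a single `task_queue` field of type [`VecDeque`], which is basically a vector that allows for push and pop operations on both ends. The idea behind using this type is that we insert new tasks through the `spawn` method at the end and pop the next task for execution from the front. This way, we get a simple [FIFO queue] (_"first in, first out"_).

[`VecDeque`]: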
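https://doc.rust-lang.org/stable/alloc/collections/vec_deque/struct.VecDeque.html
[FIFO queue]: https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)

#### Dummy Waker

In order to call the `poll` method, we need to create a [`Context`] type, which wraps a [`Waker`] type. To start simple, we will first create a dummy waker that does nothing. For this, we create a [`RawWaker`] instance, which defines the implementation of the different `Waker` methods, and then use the [`Waker::from_raw`] function to turn it into a `Waker`:

[`RawWaker`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html
[`Waker::from_raw`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.from_raw

```rust
// in src/task/simple_executor.rs

use core::task::{Waker, RawWaker};

fn dummy_raw_waker() -> RawWaker {
    todo!();
}

fn dummy_waker() -> Waker {
    unsafe { Waker::from_raw(dummy_raw_waker()) }
}
```

The `from_raw` function is unsafe because undefined behavior can occur if the programmer does not uphold the documented requirements of `RawWaker`. Before we look at the implementation of the `dummy_raw_waker` function, we first try to understand how the `RawWaker` type works.

##### The `RawWaker` Type

The [`RawWaker`] type requires the programmer to explicitly define a [_virtual method table_] (_vtable_) that specifies the functions that should be called when the `RawWaker` is cloned, woken, or dropped. The layout of this vtable is defined by the [`RawWakerVTable`] type. Each function receives a `*const ()` argument, which is a _type-erased_ pointer to some value. The reason for using a `*const ()` pointer instead of a proper reference is that the `RawWaker` type should be non-generic but still support arbitrary types. The pointer is provided by putting it into the `data` argument of [`RawWaker::new`], which just initializes a `RawWaker`. The `Waker` then uses this `RawWaker` to call the vtable functions with `data`.

[_virtual method table_]: https://en.wikipedia.org/wiki/Virtual_method_table
[`RawWakerVTable`]: https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html
[`RawWaker::new`]: https://doc.rust-lang.org/stable/core/task/struct.RawWaker.html#method.new

Typically, the `RawWaker` is created for some heap-allocated struct that is wrapped into the [`Box`] or [`Arc`] type. For such types, methods like [`Box::into_raw`] can be used to convert the `Box<T>` to a `*const T` pointer. This pointer can then be cast to an anonymous `*const ()` pointer and passed to `RawWaker::new`. Since each vtable function receives the same `*const ()` as an argument, the functions can safely cast the pointer back to a `Box<T>` or a `&T` to operate on it. As you can imagine, this process is highly dangerous and can easily lead to undefined behavior on mistakes. For this reason, manually creating a `RawWaker` is not recommended unless necessary.

[`Box`]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html
[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html
[`Box::into_raw`]: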
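https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html#method.into_raw

##### A Dummy `RawWaker`

While manually creating a `RawWaker` is not recommended, there is currently no other way to create a dummy `Waker` that does nothing. Fortunately, the fact that we want to do nothing makes it relatively safe to implement the `dummy_raw_waker` function:

```rust
// in src/task/simple_executor.rs

use core::task::RawWakerVTable;

fn dummy_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        dummy_raw_waker()
    }

    let vtable = &RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(0 as *const (), vtable)
}
```

First, we define two inner functions named `no_op` and `clone`. The `no_op` function takes a `*const ()` pointer and does nothing. The `clone` function also takes a `*const ()` pointer and returns a new `RawWaker` by calling `dummy_raw_waker` again. We use these two functions to create a minimal `RawWakerVTable`: the `clone` function is used for the cloning operation, and the `no_op` function is used for all other operations. Since the `RawWaker` does nothing, it does not matter that we return a new `RawWaker` from `clone` instead of cloning it.

After creating the `vtable`, we use the [`RawWaker::new`] function to create the `RawWaker`. The passed `*const ()` does not matter since none of the vtable functions use it. For this reason, we simply pass a null pointer.

#### A `run` Method

Now that we have a way to create a `Waker` instance, we can use it to implement a `run` method on our executor. The simplest `run` method repeatedly polls all queued tasks in a loop until all are done. This is not very efficient since it does not utilize the notifications of the `Waker` type, but it is an easy way to get things running:

```rust
// in src/task/simple_executor.rs

use core::task::{Context, Poll};

impl SimpleExecutor {
    pub fn run(&mut self) {
        while let Some(mut task) = self.task_queue.pop_front() {
            let waker = dummy_waker();
            let mut context = Context::from_waker(&waker);
            match task.poll(&mut context) {
                Poll::Ready(()) => {} // task done
                Poll::Pending => self.task_queue.push_back(task),
            }
        }
    }
}
```

The function uses a `while let` loop to handle all tasks in the `task_queue`. For each task, it first creates a `Context` type by wrapping a `Waker` instance returned by our `dummy_waker` function. Then it invokes the `Task::poll` method with this `context`. If the `poll` method returns `Poll::Ready`, the task is finished and we can continue with the next task. If the task is still `Poll::Pending`, we add it to the back of the queue again so that it will be polled again in a subsequent loop iteration.

#### Trying It

With our `SimpleExecutor` type, we can now try running the task returned by the `example_task` function in our `main.rs`:

```rust
// in src/main.rs

use blog_os::task::{Task, simple_executor::SimpleExecutor};

fn kernel_main(boot_info: &'static BootInfo) -> !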
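{
    // […] initialization routines, including `init_heap`

    let mut executor = SimpleExecutor::new();
    executor.spawn(Task::new(example_task()));
    executor.run();

    // […] test_main, "it did not crash" message, hlt_loop
}

// Below is the example_task function again so that you don't have to scroll up

async fn async_number() -> u32 {
    42
}

async fn example_task() {
    let number = async_number().await;
    println!("async number: {}", number);
}
```

When we run it, we see that the expected _"async number: 42"_ message is printed to the screen:

![QEMU printing "Hello World", "async number: 42", and "It did not crash!"](qemu-simple-executor.png)

Let's summarize the various steps that happen in this example:

- First, a new instance of our `SimpleExecutor` type is created with an empty `task_queue`.
- Next, we call the asynchronous `example_task` function, which returns a future. We wrap this future in the `Task` type, which moves it to the heap and pins it, and then add the task to the `task_queue` of the executor through the `spawn` method.
- We then call the `run` method to start the execution of the single task in the queue. This involves:
    - Popping the task from the front of the `task_queue`.
    - Creating a `RawWaker` for the task, converting it to a [`Waker`] instance, and then creating a [`Context`] instance from it.
    - Calling the [`poll`] method on the future of the task, using the `Context` we just created.
    - Since the `example_task` does not wait for anything, it can directly run to its end on the first `poll` call. This is where the _"async number: 42"_ line is printed.
    - Since the `example_task` directly returns `Poll::Ready`, it is not added back to the task queue.
- The `run` method returns after the `task_queue` becomes empty. The execution of our `kernel_main` function continues and the _"It did not crash!"_ message is printed.

### Async Keyboard Input

Our simple executor does not utilize the `Waker` notifications and simply loops over all tasks until they are done. This wasn't a problem for our example since our `example_task` can directly run to completion on the first `poll` call. To see the performance advantages of a proper `Waker` implementation, we first need to create a task that is truly asynchronous, i.e., a task that will probably return `Poll::Pending` on the first `poll` call.

We already have some kind of asynchronicity in our system that we can use for this: hardware interrupts. As we learned in the [_Interrupts_] post, hardware interrupts can occur at arbitrary points in time, determined by some external device. For example, a hardware timer sends an interrupt to the CPU after some predefined time has elapsed. When the CPU receives an interrupt, it immediately transfers control to the corresponding handler function defined in the interrupt descriptor table (IDT).

[_Interrupts_]: @/edition-2/posts/07-hardware-interrupts/index.md

In the following, we will create an asynchronous task based on the keyboard interrupt. The keyboard interrupt is a good candidate for this because it is both non-deterministic and latency-critical. Non-deterministic means that there is no way to predict when the next key press will occur because it is entirely dependent on the user. Latency-critical means that we want to handle the keyboard input in a timely manner, otherwise the user will feel a lag. To support such a task, it is essential that the executor has proper support for `Waker` notifications.

#### Scancode Queue

Currently, we handle the keyboard input directly in the interrupt handler. This is not a good idea for the long term because interrupt handlers should stay as short as possible as they might interrupt important work. Instead, interrupt handlers should only perform the minimal amount of work necessary (e.g., reading the keyboard scancode) and leave the rest of the work (e.g., interpreting the scancode) to a background task.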
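To make this division of labor concrete, here is a minimal hosted sketch in ordinary `std` Rust, not kernel code. It simulates both halves in a single thread: a handler stand-in that only records the raw byte, and a task stand-in that does the interpretation later. The names `ScancodeBuffer`, `on_keyboard_interrupt`, and `keyboard_task` are hypothetical and not part of the kernel built in this post, and the decoding assumes scancode set 1, where bit 7 marks a key release.

```rust
// Hosted illustration only: single-threaded, no real interrupts.
// All names in this block are hypothetical.

/// Fixed-capacity ring buffer, so the "handler" side never allocates.
struct ScancodeBuffer {
    slots: [u8; 8],
    head: usize, // index of the oldest element
    len: usize,  // number of stored elements
}

impl ScancodeBuffer {
    const fn new() -> Self {
        ScancodeBuffer { slots: [0; 8], head: 0, len: 0 }
    }

    /// Returns `Err(scancode)` instead of growing when the buffer is full.
    fn push(&mut self, scancode: u8) -> Result<(), u8> {
        if self.len == self.slots.len() {
            return Err(scancode);
        }
        let tail = (self.head + self.len) % self.slots.len();
        self.slots[tail] = scancode;
        self.len += 1;
        Ok(())
    }

    /// Removes and returns the oldest scancode, if any.
    fn pop(&mut self) -> Option<u8> {
        if self.len == 0 {
            return None;
        }
        let scancode = self.slots[self.head];
        self.head = (self.head + 1) % self.slots.len();
        self.len -= 1;
        Some(scancode)
    }
}

/// Stand-in for the interrupt handler: record the raw byte and return.
fn on_keyboard_interrupt(buffer: &mut ScancodeBuffer, scancode: u8) {
    // With a bounded buffer, dropping the input is all we can do when it is full.
    let _ = buffer.push(scancode);
}

/// Stand-in for the background task: do the slower interpretation work.
fn keyboard_task(buffer: &mut ScancodeBuffer) -> Vec<String> {
    let mut events = Vec::new();
    while let Some(scancode) = buffer.pop() {
        // Assumption: scancode set 1, where bit 7 distinguishes release from press.
        let kind = if scancode & 0x80 == 0 { "press" } else { "release" };
        events.push(format!("{} {:#04x}", kind, scancode & 0x7f));
    }
    events
}

fn main() {
    let mut buffer = ScancodeBuffer::new();
    on_keyboard_interrupt(&mut buffer, 0x1e);
    on_keyboard_interrupt(&mut buffer, 0x9e);
    for event in keyboard_task(&mut buffer) {
        println!("{}", event);
    }
}
```

Because everything runs in one thread here, the sketch sidesteps the synchronization questions that arise in a kernel, where a real interrupt can fire at any time.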
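A common pattern for delegating work to a background task is to create some sort of queue. The interrupt handler pushes units of work to the queue, and the background task handles the work in the queue. Applied to our keyboard interrupt, this means that the interrupt handler only reads the scancode from the keyboard, pushes it to the queue, and then returns. The keyboard task sits on the other end of the queue and interprets and handles each scancode that is pushed to it:

![Scancode queue with 8 slots on the top. Keyboard interrupt handler on the bottom left with a "push scancode" arrow to the left of the queue. Keyboard task on the bottom right with a "pop scancode" arrow coming from the right side of the queue.](scancode-queue.svg)

A simple implementation of that queue could be a mutex-protected [`VecDeque`]. However, using mutexes in interrupt handlers is not a good idea since it can easily lead to deadlocks. For example, when the user presses a key while the keyboard task has locked the queue, the interrupt handler tries to acquire the lock again and hangs indefinitely. Another problem with this approach is that `VecDeque` automatically increases its capacity by performing a new heap allocation when it becomes full. This can lead to deadlocks again because our allocator also uses a mutex internally. Further problems are that heap allocations can fail or take a considerable amount of time when the heap is fragmented.

To prevent these problems, we need a queue implementation that does not require mutexes or allocations for its `push` operation. Such queues can be implemented by using lock-free [atomic operations] for pushing and popping elements. This way, it is possible to create `push` and `pop` operations that only require a `&self` reference and are thus usable without a mutex. To avoid allocations on `push`, the queue can be backed by a pre-allocated fixed-size buffer. While this makes the queue _bounded_ (i.e., it has a maximum length), it is often possible to define reasonable upper bounds for the queue length in practice, so that this isn't a big problem.

[atomic operations]: https://doc.rust-lang.org/core/sync/atomic/index.html

##### The `crossbeam` Crate

Implementing such a queue in a correct and efficient way is very difficult, so I recommend sticking to existing, well-tested implementations. One popular Rust project that implements various lock-free types for concurrent programming is [`crossbeam`]. It provides a type named [`ArrayQueue`] that is exactly what we need in this case. And we're lucky: the type is fully compatible with `no_std` crates with allocation support.

[`crossbeam`]: https://github.com/crossbeam-rs/crossbeam
[`ArrayQueue`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html

To use the type, we need to add a dependency on the `crossbeam-queue` crate to our `Cargo.toml`:

```toml
# in Cargo.toml

[dependencies.crossbeam-queue]
version = "0.2.1"
default-features = false
features = ["alloc"]
```

By default, the crate depends on the standard library. To make it `no_std` compatible, we need to disable its default features and instead enable the `alloc` feature. (Note that we could also add a dependency on the main `crossbeam` crate, which re-exports the `crossbeam-queue` crate, but this would result in a larger number of dependencies and longer compile times.)

##### Queue Implementation

Using the `ArrayQueue` type, we can now create a global scancode queue in a new `task::keyboard` module:

```rust
// in src/task/mod.rs

pub mod keyboard;
```

```rust
// in src/task/keyboard.rs

use conquer_once::spin::OnceCell;
use crossbeam_queue::ArrayQueue;

static SCANCODE_QUEUE: OnceCell<ArrayQueue<u8>> = OnceCell::uninit();
```

Since [`ArrayQueue::new`]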
performs a heap allocation, which is not possible at compile time ([yet][const-heap-alloc]), we can't initialize the static variable directly. Instead, we use the [`OnceCell`] type of the [`conquer_once`] crate, which makes it possible to perform a safe one-time initialization of static values. To include the crate, we need to add it as a dependency in our `Cargo.toml`:

[`ArrayQueue::new`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.new
[const-heap-alloc]: https://github.com/rust-lang/const-eval/issues/20
[`OnceCell`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html
[`conquer_once`]: https://docs.rs/conquer-once/0.2.0/conquer_once/index.html

```toml
# in Cargo.toml

[dependencies.conquer-once]
version = "0.2.0"
default-features = false
```

Instead of the [`OnceCell`] primitive, we could also use the [`lazy_static`] macro here. However, the `OnceCell` type has the advantage that we can ensure that the initialization does not happen in the interrupt handler, thus preventing the interrupt handler from performing a heap allocation.

[`lazy_static`]: https://docs.rs/lazy_static/1.4.0/lazy_static/index.html

#### Filling the Queue

To fill the scancode queue, we create a new `add_scancode` function that we will call from the interrupt handler:

```rust
// in src/task/keyboard.rs

use crate::println;

/// Called by the keyboard interrupt handler
///
/// Must not block or allocate.
pub(crate) fn add_scancode(scancode: u8) {
    if let Ok(queue) = SCANCODE_QUEUE.try_get() {
        if let Err(_) = queue.push(scancode) {
            println!("WARNING: scancode queue full; dropping keyboard input");
        }
    } else {
        println!("WARNING: scancode queue uninitialized");
    }
}
```

We use [`OnceCell::try_get`] to get a reference to the initialized queue. If the queue is not initialized yet, we ignore the keyboard scancode and print a warning. It's important that we don't try to initialize the queue in this function because it will be called by the interrupt handler, which should not perform heap allocations. Since this function should not be callable from our `main.rs`, we use the `pub(crate)` visibility to make it only available to our `lib.rs`.

[`OnceCell::try_get`]: https://docs.rs/conquer-once/0.2.0/conquer_once/raw/struct.OnceCell.html#method.try_get

The fact that the [`ArrayQueue::push`] method requires only a `&self` reference makes it very simple to call the method on the static queue. The `ArrayQueue` type performs all the necessary synchronization itself, so we don't need a mutex wrapper here. In case the queue is full, we print a warning too.

[`ArrayQueue::push`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.push

To call the `add_scancode` function on keyboard interrupts, we update our `keyboard_interrupt_handler` function in the `interrupts` module:

```rust
// in src/interrupts.rs

extern "x86-interrupt" fn keyboard_interrupt_handler(
    _stack_frame: InterruptStackFrame
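) {
    use x86_64::instructions::port::Port;

    let mut port = Port::new(0x60);
    let scancode: u8 = unsafe { port.read() };
    crate::task::keyboard::add_scancode(scancode); // new

    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Keyboard.as_u8());
    }
}
```

We removed all the keyboard handling code from this function and instead added a call to the `add_scancode` function. The rest of the function stays the same as before.

As expected, keypresses are no longer printed to the screen when we run our project using `cargo run` now. Instead, we see the warning that the scancode queue is uninitialized for every keystroke.

#### Scancode Stream

To initialize the `SCANCODE_QUEUE` and read the scancodes from the queue in an asynchronous way, we create a new `ScancodeStream` type:

```rust
// in src/task/keyboard.rs

pub struct ScancodeStream {
    _private: (),
}

impl ScancodeStream {
    pub fn new() -> Self {
        SCANCODE_QUEUE.try_init_once(|| ArrayQueue::new(100))
            .expect("ScancodeStream::new should only be called once");
        ScancodeStream { _private: () }
    }
}
```

The purpose of the `_private` field is to prevent construction of the struct from outside of the module. This makes the `new` function the only way to construct the type. In the function, we first try to initialize the `SCANCODE_QUEUE` static. We panic if it is already initialized to ensure that only a single `ScancodeStream` instance can be created.

To make the scancodes available to asynchronous tasks, the next step is to implement a `poll`-like method that tries to pop the next scancode off the queue. While this sounds like we should implement the [`Future`] trait for our type, this does not quite fit here. The problem is that the `Future` trait only abstracts over a single asynchronous value and expects that the `poll` method is not called again after it returns `Poll::Ready`. Our scancode queue, however, contains multiple asynchronous values, so it is okay to keep polling it.

##### The `Stream` Trait

Since types that yield multiple asynchronous values are common, the [`futures`] crate provides a useful abstraction for such types: the [`Stream`] trait. The trait is defined like this:

[`Stream`]: https://rust-lang.github.io/async-book/05_streams/01_chapter.html

```rust
pub trait Stream {
    type Item;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context)
        -> Poll<Option<Self::Item>>;
}
```

This definition is quite similar to the [`Future`] trait, with the following differences:

- The associated type is named `Item` instead of `Output`.
- Instead of a `poll` method that returns `Poll<Self::Item>`, the `Stream` trait defines a `poll_next` method that returns a `Poll<Option<Self::Item>>` (note the additional `Option`).

There is also a semantic difference: the `poll_next` method can be called repeatedly until it returns `Poll::Ready(None)` to signal that the stream is finished. In this regard, the method is similar to the [`Iterator::next`] method, which also returns `None` after the last value.

[`Iterator::next`]: https://doc.rust-lang.org/stable/core/iter/trait.Iterator.html#tymethod.next

##### Implementing `Stream`

Let's implement the `Stream` trait for our `ScancodeStream` to provide the values of the `SCANCODE_QUEUE` in an asynchronous way. For this, we first need to add a dependency on the `futures-util` crate, which contains the `Stream` type:

```toml
# in Cargo.toml

[dependencies.futures-util]
version = "0.3.4"
default-features = false
features = ["alloc"]
```

We disable the default features to make the crate `no_std` compatible and enable the `alloc` feature to make its allocation-based types available (we will need this later). (Note that we could also add a dependency on the main `futures` crate, which re-exports the `futures-util` crate, but this would result in a larger number of dependencies and longer compile times.)

Now we can import and implement the `Stream` trait:

```rust
// in src/task/keyboard.rs

use core::{pin::Pin, task::{Poll, Context}};
use futures_util::stream::Stream;

impl Stream for ScancodeStream {
    type Item = u8;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<u8>> {
        let queue = SCANCODE_QUEUE.try_get().expect("not initialized");
        match queue.pop() {
            Ok(scancode) => Poll::Ready(Some(scancode)),
            Err(crossbeam_queue::PopError) => Poll::Pending,
        }
    }
}
```

We first use the [`OnceCell::try_get`] method to get a reference to the initialized scancode queue. This should never fail since we initialize the queue in the `new` function, so we can safely use the `expect` method to panic if it's not initialized. Next, we use the [`ArrayQueue::pop`] method to try to get the next element from the queue. If it succeeds, we return the scancode wrapped in `Poll::Ready(Some(…))`. If it fails, it means that the queue is empty. In that case, we return `Poll::Pending`.

[`ArrayQueue::pop`]: https://docs.rs/crossbeam/0.7.3/crossbeam/queue/struct.ArrayQueue.html#method.pop

#### Waker Support

Like the `Future::poll` method, the `Stream::poll_next` method requires the asynchronous task to notify the executor when it becomes ready after `Poll::Pending` is returned. This way, the executor does not need to poll the same task again until it is notified, which greatly reduces the performance overhead of waiting tasks.

To send this notification, the task should extract the [`Waker`] from the passed [`Context`] reference and store it somewhere. When the task becomes ready, it should invoke the [`wake`] method on the stored `Waker` to notify the executor that the task should be polled again.

##### AtomicWaker

To implement the `Waker` notification for our `ScancodeStream`, we need a place where we can store the `Waker` between poll calls. We can't store it as a field in the `ScancodeStream` itself because it needs to be accessible from the `add_scancode` function. The solution to this is to use a static variable of the [`AtomicWaker`] type provided by the `futures-util` crate. Like the `ArrayQueue` type, this type is based on atomic instructions and can be safely stored in a `static` and modified concurrently.

[`AtomicWaker`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html

Let's use the [`AtomicWaker`] type to define a static `WAKER`:

```rust
// in src/task/keyboard.rs

use futures_util::task::AtomicWaker;

static WAKER: AtomicWaker = AtomicWaker::new();
```

The idea is that the `poll_next` implementation stores the current waker in this static, and the `add_scancode` function calls the `wake` function on it when a new scancode is added to the queue.

##### Storing a Waker

The contract defined by `poll`/`poll_next` requires that a task returning `Poll::Pending` uses the passed `Waker`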
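to register a wakeup. Let's modify our `poll_next` implementation to satisfy this requirement:

```rust
// in src/task/keyboard.rs

impl Stream for ScancodeStream {
    type Item = u8;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<u8>> {
        let queue = SCANCODE_QUEUE
            .try_get()
            .expect("scancode queue not initialized");

        // fast path
        if let Ok(scancode) = queue.pop() {
            return Poll::Ready(Some(scancode));
        }

        WAKER.register(&cx.waker());
        match queue.pop() {
            Ok(scancode) => {
                WAKER.take();
                Poll::Ready(Some(scancode))
            }
            Err(crossbeam_queue::PopError) => Poll::Pending,
        }
    }
}
```

Like before, we first use the [`OnceCell::try_get`] function to get a reference to the initialized scancode queue. We then optimistically try to `pop` from the queue and return `Poll::Ready` when it succeeds. This way, we can avoid the performance overhead of registering a waker when the queue is not empty.

If the first call to `queue.pop()` does not succeed, the queue is potentially empty. Only potentially because the interrupt handler might have filled the queue asynchronously immediately after the check. Since this race condition can occur again for the next check, we need to register the `Waker` in the `WAKER` static before the second check. This way, a wakeup might happen before we return `Poll::Pending`, but it is guaranteed that we get a wakeup for any scancodes pushed after the check.

After registering the `Waker` contained in the passed [`Context`] through the [`AtomicWaker::register`] function, we try to pop from the queue a second time. If it now succeeds, we return `Poll::Ready`. We also remove the registered waker again using [`AtomicWaker::take`] because a waker notification is no longer needed. In case `queue.pop()` fails for a second time, we return `Poll::Pending` like before, but this time with a registered wakeup.

[`AtomicWaker::register`]: https://docs.rs/futures-util/0.3.4/futures_util/task/struct.AtomicWaker.html#method.register
[`AtomicWaker::take`]: https://docs.rs/futures/0.3.4/futures/task/struct.AtomicWaker.html#method.take

Note that there are two ways that a wakeup can happen for a task that did not return `Poll::Pending` (yet). One way is the mentioned race condition when the wakeup happens immediately before returning `Poll::Pending`. The other way is when the queue is no longer empty after registering the waker, so that `Poll::Ready` is returned. Since these spurious wakeups are not preventable, the executor needs to be able to handle them correctly.

##### Waking the Stored Waker

To wake the stored `Waker`, we add a call to `WAKER.wake()` in the `add_scancode` function:

```rust
// in src/task/keyboard.rs

pub(crate) fn add_scancode(scancode: u8) {
    if let Ok(queue) = SCANCODE_QUEUE.try_get() {
        if let Err(_) = queue.push(scancode) {
            println!("WARNING: scancode queue full; dropping keyboard input");
        } else {
            WAKER.wake(); // new
        }
    } else {
        println!("WARNING: scancode queue uninitialized");
    }
}
```

The only change that we made is to add a call to `WAKER.wake()` if the push to the scancode queue succeeds. If a waker is registered in the `WAKER` static, this method will call the equally named [`wake`]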
method on it, which notifies the executor. Otherwise, the operation is a no-op, i.e., nothing happens.

[`wake`]: https://doc.rust-lang.org/stable/core/task/struct.Waker.html#method.wake

It is important that we call `wake` only after pushing to the queue because otherwise the task might be woken too early while the queue is still empty. This can, for example, happen when using a multi-threaded executor that starts the woken task concurrently on a different CPU core. While we don't have thread support yet, we will add it soon and don't want things to break then.

#### Keyboard Task

Now that we implemented the `Stream` trait for our `ScancodeStream`, we can use it to create an asynchronous keyboard task:

```rust
// in src/task/keyboard.rs

use futures_util::stream::StreamExt;
use pc_keyboard::{layouts, DecodedKey, HandleControl, Keyboard, ScancodeSet1};
use crate::print;

pub async fn print_keypresses() {
    let mut scancodes = ScancodeStream::new();
    let mut keyboard = Keyboard::new(layouts::Us104Key, ScancodeSet1,
        HandleControl::Ignore);

    while let Some(scancode) = scancodes.next().await {
        if let Ok(Some(key_event)) = keyboard.add_byte(scancode) {
            if let Some(key) = keyboard.process_keyevent(key_event) {
                match key {
                    DecodedKey::Unicode(character) => print!("{}", character),
                    DecodedKey::RawKey(key) => print!("{:?}", key),
                }
            }
        }
    }
}
```

The code is very similar to the code we previously had in our [keyboard interrupt handler]. The only difference is that, instead of reading the scancode from an I/O port, we take it from the `ScancodeStream`. For this, we first create a new `Scancode` stream and then repeatedly use the [`next`] method provided by the [`StreamExt`] trait to get a `Future` that resolves to the next element in the stream. By using the `await` operator on it, we asynchronously wait for the result of the future.

[keyboard interrupt handler]: @/edition-2/posts/07-hardware-interrupts/index.md#interpreting-the-scancodes
[`next`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html#method.next
[`StreamExt`]: https://docs.rs/futures-util/0.3.4/futures_util/stream/trait.StreamExt.html

We use `while let` to loop until the stream returns `None` to signal its end. Since our `poll_next` method never returns `None`, this is effectively an endless loop, so the `print_keypresses` task never finishes.

Let's add the `print_keypresses` task to our executor in our `main.rs` to get working keyboard input again:

```rust
// in src/main.rs

use blog_os::task::keyboard; // new

fn kernel_main(boot_info: &'static BootInfo) -> !
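{
    // […] initialization routines, including init_heap, test_main

    let mut executor = SimpleExecutor::new();
    executor.spawn(Task::new(example_task()));
    executor.spawn(Task::new(keyboard::print_keypresses())); // new
    executor.run();

    // […] "it did not crash" message, hlt_loop
}
```

When we execute `cargo run` now, we see that keyboard input works again:

![QEMU printing ".....H...e...l...l..o..... ...W..o..r....l...d...!"](qemu-keyboard-output.gif)

If you keep an eye on the CPU utilization of your computer, you will see that the `QEMU` process now continuously keeps the CPU busy. This happens because our `SimpleExecutor` polls tasks over and over again in a loop. So even if we don't press any keys on the keyboard, the executor repeatedly calls `poll` on our `print_keypresses` task, even though the task cannot make any progress and will return `Poll::Pending` each time.

### Executor with Waker Support

To fix the performance problem, we need to create an executor that properly utilizes the `Waker` notifications. This way, the executor is notified when the next keyboard interrupt occurs, so it does not need to keep polling the `print_keypresses` task over and over again.

#### Task Id

The first step in creating an executor with proper support for waker notifications is to give each task a unique ID. This is required because we need a way to specify which task should be woken. We start by creating a new `TaskId` wrapper type:

```rust
// in src/task/mod.rs

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TaskId(u64);
```

The `TaskId` struct is a simple wrapper type around `u64`. We derive a number of traits for it to make it printable, copyable, comparable, and sortable. The latter is important because we want to use `TaskId` as the key type of a [`BTreeMap`] in a moment.

[`BTreeMap`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html

To create a new unique ID, we create a `TaskId::new` function:

```rust
use core::sync::atomic::{AtomicU64, Ordering};

impl TaskId {
    fn new() -> Self {
        static NEXT_ID: AtomicU64 = AtomicU64::new(0);
        TaskId(NEXT_ID.fetch_add(1, Ordering::Relaxed))
    }
}
```

The function uses a static `NEXT_ID` variable of type [`AtomicU64`] to ensure that each ID is assigned only once. The [`fetch_add`] method atomically increases the value and returns the previous value in one atomic operation. This means that even when the `TaskId::new` method is called in parallel, every ID is returned exactly once. The [`Ordering`] parameter defines whether the compiler is allowed to reorder the `fetch_add` operation in the instruction stream. Since we only require that the ID be unique, the `Relaxed` ordering with the weakest requirements is enough in this case.

[`AtomicU64`]: https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html
[`fetch_add`]: https://doc.rust-lang.org/core/sync/atomic/struct.AtomicU64.html#method.fetch_add
[`Ordering`]: https://doc.rust-lang.org/core/sync/atomic/enum.Ordering.html

We can now extend our `Task` type with an additional `id` field:

```rust
// in src/task/mod.rs

pub struct Task {
    id: TaskId, // new
    future: Pin<Box<dyn Future<Output = ()>>>,
}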
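impl Task {
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task {
            id: TaskId::new(), // new
            future: Box::pin(future),
        }
    }
}
```

The new `id` field makes it possible to uniquely name a task, which is required for waking a specific task.

#### The `Executor` Type

We create our new `Executor` type in a `task::executor` module:

```rust
// in src/task/mod.rs

pub mod executor;
```

```rust
// in src/task/executor.rs

use super::{Task, TaskId};
use alloc::{collections::BTreeMap, sync::Arc};
use core::task::Waker;
use crossbeam_queue::ArrayQueue;

pub struct Executor {
    tasks: BTreeMap<TaskId, Task>,
    task_queue: Arc<ArrayQueue<TaskId>>,
    waker_cache: BTreeMap<TaskId, Waker>,
}

impl Executor {
    pub fn new() -> Self {
        Executor {
            tasks: BTreeMap::new(),
            task_queue: Arc::new(ArrayQueue::new(100)),
            waker_cache: BTreeMap::new(),
        }
    }
}
```

Instead of storing tasks in a [`VecDeque`] like we did for our `SimpleExecutor`, we use a `task_queue` of task IDs and a [`BTreeMap`] named `tasks` that contains the actual `Task` instances. The map is indexed by the `TaskId` to allow efficient continuation of a specific task.

The `task_queue` field is an [`ArrayQueue`] of task IDs, wrapped into the [`Arc`] type that implements _reference counting_. Reference counting makes it possible to share ownership of the value among multiple owners. It works by allocating the value on the heap and counting the number of active references to it. When the number of active references reaches zero, the value is no longer needed and can be deallocated.

We use this `Arc<ArrayQueue>` type for the `task_queue` because it will be shared between the executor and wakers. The idea is that the wakers push the ID of the woken task to the queue. The executor sits on the receiving end of the queue, retrieves the woken tasks by their ID from the `tasks` map, and then runs them. The reason for using a fixed-size queue instead of an unbounded queue such as [`SegQueue`] is that interrupt handlers should not allocate on push to this queue.

In addition to the `task_queue` and the `tasks` map, the `Executor` type has a `waker_cache` field that is also a map. This map caches the [`Waker`] of a task after its creation. This has two reasons: First, it improves performance by reusing the same waker for multiple wake-ups of the same task instead of creating a new waker each time. Second, it ensures that reference-counted wakers are not deallocated inside interrupt handlers because it could lead to deadlocks (there are more details on this below).

[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html
[`SegQueue`]: https://docs.rs/crossbeam-queue/0.2.1/crossbeam_queue/struct.SegQueue.html

To create an `Executor`, we provide a simple `new` function. We choose a capacity of 100 for the `task_queue`, which should be more than enough for the foreseeable future. In case our system will have more than 100 concurrent tasks at some point, we can easily increase this size.

#### Spawning Tasks

As for the `SimpleExecutor`, we provide a `spawn` method on our `Executor` type that adds a given task to the `tasks` map and immediately wakes it by pushing its ID to the `task_queue`:

```rust
// in src/task/executor.rs

impl Executor {
    pub fn spawn(&mut self, task: Task) {
        let task_id = task.id;
        if self.tasks.insert(task.id,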
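            task).is_some() {
            panic!("task with same ID already in tasks");
        }
        self.task_queue.push(task_id).expect("queue full");
    }
}
```

If there is already a task with the same ID in the map, the [`BTreeMap::insert`] method returns it. This should never happen since each task has a unique ID, so we panic in this case since it indicates a bug in our code. Similarly, we panic when the `task_queue` is full since this should never happen if we choose a large-enough queue size.

#### Running Tasks

To execute all tasks in the `task_queue`, we create a private `run_ready_tasks` method:

```rust
// in src/task/executor.rs

use core::task::{Context, Poll};

impl Executor {
    fn run_ready_tasks(&mut self) {
        // destructure `self` to avoid borrow checker errors
        let Self {
            tasks,
            task_queue,
            waker_cache,
        } = self;

        while let Ok(task_id) = task_queue.pop() {
            let task = match tasks.get_mut(&task_id) {
                Some(task) => task,
                None => continue, // task no longer exists
            };
            let waker = waker_cache
                .entry(task_id)
                .or_insert_with(|| TaskWaker::new(task_id, task_queue.clone()));
            let mut context = Context::from_waker(waker);
            match task.poll(&mut context) {
                Poll::Ready(()) => {
                    // task done -> remove it and its cached waker
                    tasks.remove(&task_id);
                    waker_cache.remove(&task_id);
                }
                Poll::Pending => {}
            }
        }
    }
}
```

The basic idea of this function is similar to our `SimpleExecutor`: Loop over all tasks in the `task_queue`, create a waker for each task, and then poll them. However, instead of adding pending tasks back to the end of the `task_queue`, we let our `TaskWaker` implementation take care of adding woken tasks back to the queue. The implementation of this waker type will be shown in a moment.

Let's look into some of the implementation details of this `run_ready_tasks` method:

- We use [_destructuring_] to split `self` into its three fields to avoid some borrow checker errors. Namely, our implementation needs to access the `self.task_queue` from within a closure, which currently tries to borrow `self` completely. This is a fundamental borrow checker issue that will be resolved when [RFC 2229] is [implemented][RFC 2229 impl].
- For each popped task ID, we retrieve a mutable reference to the corresponding task from the `tasks` map. Since our `ScancodeStream` implementation registers wakers before checking whether a task needs to be put to sleep, it might happen that a wake-up occurs for a task that no longer exists. In this case, we simply ignore the wake-up and continue with the next ID from the queue.
- To avoid the performance overhead of creating a waker on each poll, we use the `waker_cache` map to store the waker for each task after it has been created. For this, we use the [`BTreeMap::entry`] method in combination with [`Entry::or_insert_with`] to create a new waker if it doesn't exist yet and then get a mutable reference to it. For creating a new waker, we clone the `task_queue` and pass it together with the task ID to the `TaskWaker::new` function (implementation shown below). Since the `task_queue` is wrapped into an `Arc`, the `clone` only increases the reference count of the value, but still points to the same heap-allocated queue. Note that reusing wakers like this is not possible for all waker implementations, but our `TaskWaker` type will allow it.

[_destructuring_]: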
https://doc.rust-lang.org/book/ch19-03-pattern-syntax.html#destructuring-to-break-apart-values
[RFC 2229]: https://github.com/rust-lang/rfcs/pull/2229
[RFC 2229 impl]: https://github.com/rust-lang/rust/issues/53488
[`BTreeMap::entry`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.entry
[`Entry::or_insert_with`]: https://doc.rust-lang.org/alloc/collections/btree_map/enum.Entry.html#method.or_insert_with

A task is finished when it returns `Poll::Ready`. In that case, we remove it from the `tasks` map using the [`BTreeMap::remove`] method. We also remove its cached waker, if it exists.

[`BTreeMap::remove`]: https://doc.rust-lang.org/alloc/collections/btree_map/struct.BTreeMap.html#method.remove

#### Waker Design

The job of the waker is to push the ID of the woken task to the `task_queue` of the executor. We implement this by creating a new `TaskWaker` struct that stores the task ID and a reference to the `task_queue`:

```rust
// in src/task/executor.rs

struct TaskWaker {
    task_id: TaskId,
    task_queue: Arc<ArrayQueue<TaskId>>,
}
```

Since the ownership of the `task_queue` is shared between the executor and wakers, we use the [`Arc`] wrapper type to implement shared reference-counted ownership.

[`Arc`]: https://doc.rust-lang.org/stable/alloc/sync/struct.Arc.html

The implementation of the wake operation is quite simple:

```rust
// in src/task/executor.rs

impl TaskWaker {
    fn wake_task(&self) {
        self.task_queue.push(self.task_id).expect("task_queue full");
    }
}
```

We push the `task_id` to the referenced `task_queue`. Since modifications to the [`ArrayQueue`] type only require a shared reference, we can implement this method on `&self` instead of `&mut self`.

##### The `Wake` Trait

In order to use our `TaskWaker` type for polling futures, we need to convert it to a [`Waker`] instance first. This is required because the [`Future::poll`] method takes a [`Context`] instance as an argument, which can only be constructed from the `Waker` type. While we could do this by providing an implementation of the [`RawWaker`] type, it's both simpler and safer to instead implement the `Arc`-based [`Wake`][wake-trait] trait and then use the [`From`] implementations provided by the standard library to construct the `Waker`.

The trait implementation looks like this:

[wake-trait]: https://doc.rust-lang.org/nightly/alloc/task/trait.Wake.html

```rust
// in src/task/executor.rs

use alloc::task::Wake;

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.wake_task();
    }

    fn wake_by_ref(self: &Arc<Self>) {
        self.wake_task();
    }
}
```

Since wakers are commonly shared between the executor and the asynchronous tasks, the trait methods require that the `Self` instance is wrapped in the [`Arc`] type, which implements reference-counted ownership. This means that we have to move our `TaskWaker` to an `Arc` in order to call them.

The difference between the `wake` and `wake_by_ref`
methods is that the latter only requires a reference to the `Arc`, while the former takes ownership of the `Arc` and thus often requires an increase of the reference count. Not all types support waking by reference, so implementing the `wake_by_ref` method is optional. However, it can lead to better performance because it avoids unnecessary reference count modifications. In our case, we can simply forward both trait methods to our `wake_task` function, which requires only a shared `&self` reference.

##### Creating Wakers

Since the `Waker` type supports [`From`] conversions for all `Arc`-wrapped values that implement the `Wake` trait, we can now implement the `TaskWaker::new` function that is required by our `Executor::run_ready_tasks` method:

[`From`]: https://doc.rust-lang.org/nightly/core/convert/trait.From.html

```rust
// in src/task/executor.rs

impl TaskWaker {
    fn new(task_id: TaskId, task_queue: Arc<ArrayQueue<TaskId>>) -> Waker {
        Waker::from(Arc::new(TaskWaker {
            task_id,
            task_queue,
        }))
    }
}
```

We create the `TaskWaker` using the passed `task_id` and `task_queue`. We then wrap the `TaskWaker` in an `Arc` and use the `Waker::from` implementation to convert it to a [`Waker`]. This `from` method takes care of constructing a [`RawWakerVTable`] and a [`RawWaker`] instance for our `TaskWaker` type. In case you're interested in how it works in detail, check out the [implementation in the `alloc` crate][waker-from-impl].

[waker-from-impl]: https://github.com/rust-lang/rust/blob/cdb50c6f2507319f29104a25765bfb79ad53395c/src/liballoc/task.rs#L58-L87

#### A `run` Method

With our waker implementation in place, we can finally construct a `run` method for our executor:

```rust
// in src/task/executor.rs

impl Executor {
    pub fn run(&mut self) -> !
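    {
        loop {
            self.run_ready_tasks();
        }
    }
}
```

This method just calls the `run_ready_tasks` function in a loop. While we could theoretically return from the function when the `tasks` map becomes empty, this would never happen since our `print_keypresses` task never finishes, so a simple `loop` should suffice. Since the function never returns, we use the `!` return type to mark the function as [diverging] to the compiler.

[diverging]: https://doc.rust-lang.org/stable/rust-by-example/fn/diverging.html

We can now change our `kernel_main` to use our new `Executor` instead of the `SimpleExecutor`:

```rust
// in src/main.rs

use blog_os::task::executor::Executor; // new

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    // […] initialization routines, including init_heap, test_main

    let mut executor = Executor::new(); // new
    executor.spawn(Task::new(example_task()));
    executor.spawn(Task::new(keyboard::print_keypresses()));
    executor.run();
}
```

We only need to change the import and the type name. Since our `run` function is marked as diverging, the compiler knows that it never returns, so we no longer need a call to `hlt_loop` at the end of our `kernel_main` function.

When we run our kernel using `cargo run` now, we see that keyboard input still works:

![QEMU printing ".....H...e...l...l..o..... ...a..g..a....i...n...!"](qemu-keyboard-output-again.gif)

However, the CPU utilization of QEMU did not get any better. The reason for this is that we still keep the CPU busy the whole time. We no longer poll tasks until they are woken again, but we still check the `task_queue` in a busy loop. To fix this, we need to put the CPU to sleep if there is no more work to do.

#### Sleep If Idle

The basic idea is to execute the [`hlt` instruction] when the `task_queue` is empty. This instruction puts the CPU to sleep until the next interrupt arrives. The fact that the CPU immediately becomes active again on interrupts ensures that we can still directly react when an interrupt handler pushes to the `task_queue`.

[`hlt` instruction]: https://en.wikipedia.org/wiki/HLT_(x86_instruction)

To implement this, we create a new `sleep_if_idle` method in our executor and call it from our `run` method:

```rust
// in src/task/executor.rs

impl Executor {
    pub fn run(&mut self) -> !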
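    {
        loop {
            self.run_ready_tasks();
            self.sleep_if_idle(); // new
        }
    }

    fn sleep_if_idle(&self) {
        if self.task_queue.is_empty() {
            x86_64::instructions::hlt();
        }
    }
}
```

Since we call `sleep_if_idle` directly after `run_ready_tasks`, which loops until the `task_queue` becomes empty, checking the queue again might seem unnecessary. However, a hardware interrupt might occur directly after `run_ready_tasks` returns, so there might be a new task in the queue at the time the `sleep_if_idle` function is called. Only if the queue is still empty do we put the CPU to sleep by executing the `hlt` instruction through the [`instructions::hlt`] wrapper function provided by the [`x86_64`] crate.

[`instructions::hlt`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/fn.hlt.html
[`x86_64`]: https://docs.rs/x86_64/0.14.2/x86_64/index.html

Unfortunately, there is still a subtle race condition in this implementation. Since interrupts are asynchronous and can happen at any time, it is possible that an interrupt happens right between the `is_empty` check and the call to `hlt`:

```rust
if self.task_queue.is_empty() {
    // <--- interrupt can happen here
    x86_64::instructions::hlt();
}
```

In case this interrupt pushes to the `task_queue`, we put the CPU to sleep even though there is now a ready task. In the worst case, this could delay the handling of a keyboard interrupt until the next keypress or the next timer interrupt. So how do we prevent it?

The answer is to disable interrupts on the CPU before the check and atomically enable them again together with the `hlt` instruction. This way, all interrupts that happen in between are delayed until after the `hlt` instruction, so that no wake-ups are missed. To implement this approach, we can use the [`interrupts::enable_and_hlt`][`enable_and_hlt`] function provided by the [`x86_64`] crate.

[`enable_and_hlt`]: https://docs.rs/x86_64/0.14.2/x86_64/instructions/interrupts/fn.enable_and_hlt.html

The updated implementation of our `sleep_if_idle` function looks like this:

```rust
// in src/task/executor.rs

impl Executor {
    fn sleep_if_idle(&self) {
        use x86_64::instructions::interrupts::{self, enable_and_hlt};

        interrupts::disable();
        if self.task_queue.is_empty() {
            enable_and_hlt();
        } else {
            interrupts::enable();
        }
    }
}
```

To avoid race conditions, we disable interrupts before checking whether the `task_queue` is empty. If it is, we use the [`enable_and_hlt`] function to enable interrupts and put the CPU to sleep as a single atomic operation. In case the queue is no longer empty, it means that an interrupt woke a task after `run_ready_tasks` returned. In that case, we enable interrupts again and directly continue execution without executing `hlt`.

Now our executor properly puts the CPU to sleep when there is nothing to do. We can see that the QEMU process has a much lower CPU utilization when we run our kernel using `cargo run` again.

#### Possible Extensions

Our executor is now able to run tasks in an efficient way. It utilizes waker notifications to avoid polling waiting tasks and puts the CPU to sleep when there is currently no work to do. However, our executor is still quite basic, and there are many possible ways to extend its functionality:

- **Scheduling**: For our `task_queue`, we currently use the [`VecDeque`] type to implement a _first in first out_ (FIFO) strategy, which is often also called _round robin_ scheduling. This strategy might not be the most efficient for all workloads. For example, it might make sense to prioritize latency-critical tasks or tasks that do a lot of I/O. For more information, see the [_Operating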
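Systems: Three Easy Pieces_] book's [scheduling chapter] or the [Wikipedia article on scheduling][scheduling-wiki].
- **Task Spawning**: Our `Executor::spawn` method currently requires a `&mut self` reference and is thus no longer available after invoking the `run` method. To fix this, we could create an additional `Spawner` type that shares some kind of queue with the executor and allows task creation from within tasks themselves. The queue could be the `task_queue` directly or a separate queue that the executor checks in its run loop.
- **Utilizing Threads**: We don't have support for threads yet, but we will add it in the next post. This will make it possible to launch multiple instances of the executor in different threads. The advantage of this approach is that the delay imposed by long-running tasks can be reduced because other tasks can run concurrently. This approach also allows the executor to utilize multiple CPU cores.
- **Load Balancing**: When adding threading support, it becomes important to know how to distribute the tasks between the executors to ensure that all CPU cores are utilized. A common technique for this is [_work stealing_].

[scheduling chapter]: http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-sched.pdf
[_Operating Systems: Three Easy Pieces_]: http://pages.cs.wisc.edu/~remzi/OSTEP/
[scheduling-wiki]: https://en.wikipedia.org/wiki/Scheduling_(computing)
[_work stealing_]: https://en.wikipedia.org/wiki/Work_stealing

## Summary

We started this post by introducing **multitasking** and differentiating between _preemptive_ multitasking, which forcibly interrupts running tasks regularly, and _cooperative_ multitasking, which lets tasks run until they voluntarily give up control of the CPU.

We then explored how Rust's support of **async/await** provides a language-level implementation of cooperative multitasking. Rust bases its implementation on the polling-based `Future` trait, which abstracts asynchronous tasks. Using async/await, it is possible to work with futures almost like with normal synchronous code. The difference is that asynchronous functions return a `Future` again, which needs to be added to an executor at some point in order to run it.

Behind the scenes, the compiler transforms async/await code to _state machines_, with each `.await` operation corresponding to a possible pause point. By utilizing its knowledge about the program, the compiler is able to save only the minimal state for each pause point, resulting in a very small memory consumption per task. One challenge is that the generated state machines might contain _self-referential_ structs, for example when local variables of the asynchronous function reference each other. To prevent pointer invalidation, Rust uses the `Pin` type to ensure that futures cannot be moved in memory anymore after they have been polled for the first time.

For our **implementation**, we first created a very basic executor that polls all spawned tasks in a busy loop without using the `Waker` type at all. We then showed the advantage of waker notifications by implementing an asynchronous keyboard task. The task defines a static `SCANCODE_QUEUE` using the mutex-free `ArrayQueue` type provided by the `crossbeam` crate. Instead of handling keypresses directly, the keyboard interrupt handler now puts all received scancodes in the queue and then wakes the registered `Waker` to signal that new input is available. On the receiving end, we created a `ScancodeStream` type to provide a `Future` resolving to the next scancode in the queue. This made it possible to create an asynchronous `print_keypresses` task that uses async/await to interpret and print the scancodes in the queue.

To utilize the waker notifications of the keyboard task, we created a new `Executor` type that uses an `Arc`-shared `task_queue` for ready tasks. We implemented a `TaskWaker` type that pushes the ID of woken tasks directly to this `task_queue`, which are then polled again by the executor. To save power when no tasks are runnable, we added support for putting the CPU to sleep using the `hlt` instruction. Finally, we discussed some potential extensions to our executor, for example, providing multi-core support.

## What's Next?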
Using async/await, we now have basic support for cooperative multitasking in our kernel. While cooperative multitasking is very efficient, it leads to latency problems when individual tasks keep running for too long, thus preventing other tasks from running. For this reason, it makes sense to also add support for preemptive multitasking to our kernel.

In the next post, we will introduce _threads_ as the most common form of preemptive multitasking. In addition to resolving the problem of long-running tasks, threads will also prepare us for utilizing multiple CPU cores and running untrusted user programs in the future.

================================================
FILE: blog/content/edition-2/posts/12-async-await/scancode-queue.drawio
================================================
7Vldb9sgFP01fmzk7ySPbdKt0zqpayate6Q2sVGJsTBukv76QQw22Plam6ZWNVeq4HC5wLk395jE8iaL1VcK8vQHiSG2XDteWd7Ucl1v7PP/AlhXgBu6FZBQFFeQ0wAz9AIlaEu0RDEsDENGCGYoN8GIZBmMmIEBSsnSNJsTbK6agwR2gFkEcBf9jWKWVugosBv8BqIkVSs7thxZAGUsgSIFMVlqkHdteRNKCKtai9UEYsGd4qWa92XHaL0xCjN2zITZC/aTcgZ+4zF5CFfpzfDevZBengEu5YHlZtlaMUBJmcVQOLEt72qZIgZnOYjE6JKHnGMpW2Dec3hTuoOUwdXOfTr16XnWQLKAjK65iZwwknzJhAlld9mwrxhNNeIVBmS8k9pvQwlvSFb+gSG3fwzVOdYTirweUuT2iyK/hxT5/aIo6CFFYb8oCk9M0RxhPCGY0M1cbz6HYRRxvGCUPEFtxN48JyK1ZwV++BlIdXsmCaNPQWrPRGTcIZW/qGYRf+Xm6M8ScrRNMj88M5k0GctIBlv0SghglGS8G3GyIMevBJWIvxhfyoEFimOxzNbQmcE9QSxaRcPvhqIu1meJhbqfaMH4DtePBNCYo98EZ7TMGW/fgIzzRPfkv3M4/981z4cmt/W7k0bucAu34btxe8StBGbxpbjeiQzFoChQZBIGV4g9yPQT7T+iPQhkb7rShqZr1cn47h/0jjZLdJtpm56aV20Oxp2bZCsA/ACkpBE8Iq0YoAlk+wy3R1QLWbAlZAqjEAOGns39boujXOGOoIzpamMkTO1WeaiOKSfpV9KWn1bi1SKm/FQsdPzwwIO1ZpYLg2L3doNWfgf23l21zIeGNW9UyzfpXdP/hozv3jLzskhFxtQV/mBlVxUbwznbV68pLNALeNw4Eukr6eNegysrmHIEg0eIr0D0lGyKlCG84lEmd6RADBHhnVZJVq962xqvVy94jUNZcrvZ5NQT4kMypi3hT8Xfvsomv+uRR7DqdNQ/cHsKy846eGEPvDAcmslykg+MY/i8cEeDkTfWntB0SObzArJW2p0m0bp3dU22foHiqcc61b45fLxQHXGtP1aonNcIlfNhQjU6Vqe8/0J1tFC11xnuF6rOi1twDqnqfk+Tk/x1StUWjfNIVUsfDynVvczKM0uVf0iq3MAUFVWM3ipVvplTzsAdv12deLf5paMyb34u8q7/Ag==

================================================
FILE: blog/content/edition-2/posts/_index.ar.md
================================================
+++
title = "Posts"
sort_by = "weight"
insert_anchor_links = "left"
render = false
page_template = "edition-2/page.html"
+++

================================================
FILE: blog/content/edition-2/posts/_index.es.md
================================================
+++
title = "Posts" sort_by = "weight" insert_anchor_links = "left" render = false page_template = "edition-2/page.html" +++ ================================================ FILE: blog/content/edition-2/posts/_index.fa.md ================================================ +++ title = "Posts" sort_by = "weight" insert_anchor_links = "left" render = false page_template = "edition-2/page.html" +++ ================================================ FILE: blog/content/edition-2/posts/_index.fr.md ================================================ +++ title = "Posts" sort_by = "weight" insert_anchor_links = "left" render = false page_template = "edition-2/page.html" +++ ================================================ FILE: blog/content/edition-2/posts/_index.ja.md ================================================ +++ title = "Posts" sort_by = "weight" insert_anchor_links = "left" render = false page_template = "edition-2/page.html" +++ ================================================ FILE: blog/content/edition-2/posts/_index.ko.md ================================================ +++ title = "Posts" sort_by = "weight" insert_anchor_links = "left" render = false page_template = "edition-2/page.html" +++ ================================================ FILE: blog/content/edition-2/posts/_index.md ================================================ +++ title = "Posts" sort_by = "weight" insert_anchor_links = "left" render = false page_template = "edition-2/page.html" +++ ================================================ FILE: blog/content/edition-2/posts/_index.pt-BR.md ================================================ +++ title = "Posts" sort_by = "weight" insert_anchor_links = "left" render = false page_template = "edition-2/page.html" +++ ================================================ FILE: blog/content/edition-2/posts/_index.ru.md ================================================ +++ title = "Posts" sort_by = "weight" insert_anchor_links = "left" render = false page_template = 
"edition-2/page.html" +++ ================================================ FILE: blog/content/edition-2/posts/_index.zh-CN.md ================================================ +++ title = "Posts" sort_by = "weight" insert_anchor_links = "left" render = false page_template = "edition-2/page.html" +++ ================================================ FILE: blog/content/edition-2/posts/_index.zh-TW.md ================================================ +++ title = "Posts" sort_by = "weight" insert_anchor_links = "left" render = false page_template = "edition-2/page.html" +++ ================================================ FILE: blog/content/edition-2/posts/deprecated/04-unit-testing/index.md ================================================ +++ title = "Unit Testing" weight = 4 path = "unit-testing" date = 2018-04-29 [extra] warning_short = "Deprecated: " warning = "This post is deprecated in favor of the [_Testing_](/testing) post and will no longer receive updates." +++ This post explores unit testing in `no_std` executables using Rust's built-in test framework. We will adjust our code so that `cargo test` works and add some basic unit tests to our VGA buffer module. This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-04`][post branch] branch. [GitHub]: https://github.com/phil-opp/blog_os [at the bottom]: #comments [post branch]: https://github.com/phil-opp/blog_os/tree/post-04 ## Requirements In this post we explore how to execute `cargo test` on the host system (as a normal Linux/Windows/macOS executable). This only works if you don't have a `.cargo/config` file that sets a default target. If you followed the [_Minimal Rust Kernel_] post before 2019-04-27, you should be fine. 
If you followed it after that date, you need to remove the `build.target` key from your `.cargo/config` file and explicitly pass a target argument to `cargo xbuild`.

[_Minimal Rust Kernel_]: @/edition-2/posts/02-minimal-rust-kernel/index.md

Alternatively, consider reading the new [_Testing_] post instead. It sets up functionality similar to this post, but instead of running the tests on your host system, they are run in a realistic environment inside QEMU.

[_Testing_]: @/edition-2/posts/04-testing/index.md

## Unit Tests for `no_std` Binaries

Rust has a [built-in test framework] that is capable of running unit tests without the need to set anything up. Just create a function that checks some results through assertions and add the `#[test]` attribute to the function header. Then `cargo test` will automatically find and execute all test functions of your crate.

[built-in test framework]: https://doc.rust-lang.org/book/ch11-00-testing.html

Unfortunately, it's a bit more complicated for `no_std` applications such as our kernel. If we run `cargo test` (without adding any test yet), we get the following error:

```
> cargo test
   Compiling blog_os v0.2.0 (file:///…/blog_os)
error[E0152]: duplicate lang item found: `panic_impl`.
  --> src/main.rs:35:1
   |
35 | / fn panic(info: &PanicInfo) -> ! {
36 | |     println!("{}", info);
37 | |     loop {}
38 | | }
   | |_^
   |
   = note: first defined in crate `std`.
```

The problem is that unit tests are built for the host machine, with the `std` library included. This makes sense because they should be able to run as a normal application on the host operating system. Since the standard library has its own `panic_handler` function, we get the above error.
To fix it, we use [conditional compilation] to include our implementation of the panic handler only in non-test environments:

[conditional compilation]: https://doc.rust-lang.org/reference/conditional-compilation.html

```rust
// in src/main.rs

use core::panic::PanicInfo;

#[cfg(not(test))] // only compile when the test flag is not set
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}
```

The only change is the added `#[cfg(not(test))]` attribute. The `#[cfg(…)]` attribute ensures that the annotated item is only included if the passed condition is met. The `test` configuration is set when the crate is compiled for unit tests. Through `not(…)` we negate the condition so that the language item is only compiled for non-test builds.

When we now try `cargo test` again, we get an ugly linker error:

```
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" "-Wl,--as-needed" "-Wl,-z,noexecstack" "-m64" "-L" "/…/lib/rustlib/x86_64-unknown-linux-gnu/lib" […]
  = note: /…/blog_os-969bdb90d27730ed.2q644ojj2xqxddld.rcgu.o: In function `_start':
          /…/blog_os/src/main.rs:17: multiple definition of `_start'
          /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/Scrt1.o:(.text+0x0): first defined here
          /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/Scrt1.o: In function `_start':
          (.text+0x20): undefined reference to `main'
          collect2: error: ld returned 1 exit status
```

I shortened the output here because it is extremely verbose. The relevant part is at the bottom, after the second “note:”. We got two distinct errors here, “_multiple definition of `_start`_” and “_undefined reference to `main`_”.

The reason for the first error is that the test framework injects its own `main` and `_start` functions, which will run the tests when invoked. So we get two functions named `_start` when compiling in test mode, one from the test framework and the one we defined ourselves.
To fix this, we need to exclude our `_start` function in that case, which we can do by marking it as `#[cfg(not(test))]`:

```rust
// in src/main.rs

#[cfg(not(test))]
#[no_mangle]
pub extern "C" fn _start() -> ! { … }
```

The second problem is that we use the `#![no_main]` attribute for our crate, which suppresses any `main` generation, including the test `main`. To solve this, we use the [`cfg_attr`] attribute to conditionally enable the `no_main` attribute only in non-test mode:

[`cfg_attr`]: https://chrismorgan.info/blog/rust-cfg_attr.html

```rust
// in src/main.rs

#![cfg_attr(not(test), no_main)] // instead of `#![no_main]`
```

Now `cargo test` works:

```
> cargo test
   Compiling blog_os v0.2.0 (file:///…/blog_os)
    [some warnings]
    Finished dev [unoptimized + debuginfo] target(s) in 0.98 secs
     Running target/debug/deps/blog_os-1f08396a9eff0aa7

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```

The test framework seems to work as intended. We don't have any tests yet, but we already get a test result summary.

### Silencing the Warnings

We get a few warnings about unused imports, because we no longer compile our `_start` function. To silence such unused code warnings, we can add the following to the top of our `main.rs`:

```
#![cfg_attr(test, allow(unused_imports))]
```

Like before, the `cfg_attr` attribute sets the passed attribute if the passed condition holds. Here, we set the `allow(…)` attribute when compiling in test mode. We use the `allow` attribute to disable warnings for the `unused_imports` _lint_.

Lints are classes of warnings, for example `dead_code` for unused code or `missing_docs` for missing documentation.
Lints can be set to four different states:

- `allow`: no errors, no warnings
- `warn`: causes a warning
- `deny`: causes a compilation error
- `forbid`: like `deny`, but can't be overridden

Some lints are `allow` by default (such as `missing_docs`), others are `warn` by default (such as `dead_code`), and a few are even `deny` by default. The default can be overridden by the `allow`, `warn`, `deny` and `forbid` attributes. For a list of all lints, see `rustc -W help`. There is also the [clippy] project, which provides many additional lints.

[clippy]: https://github.com/rust-lang-nursery/rust-clippy

### Including the Standard Library

Unit tests run on the host machine, so it's possible to use the complete standard library inside them. To link the standard library in test mode, we can make the `#![no_std]` attribute conditional through `cfg_attr` too:

```diff
-#![no_std]
+#![cfg_attr(not(test), no_std)]
```

## Testing the VGA Module

Now that we have set up the test framework, we can add a first unit test for our `vga_buffer` module:

```rust
// in src/vga_buffer.rs

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn foo() {}
}
```

We add the test in an inline `test` submodule. This isn't necessary, but it is a common way to separate test code from the rest of the module. By adding the `#[cfg(test)]` attribute, we ensure that the module is only compiled in test mode. Through `use super::*`, we import all items of the parent module (the `vga_buffer` module), so that we can test them easily.

The `#[test]` attribute on the `foo` function tells the test framework that the function is a unit test. The framework will find it automatically, even if it's private and inside a private module as in our case:

```
> cargo test
   Compiling blog_os v0.2.0 (file:///…/blog_os)
    Finished dev [unoptimized + debuginfo] target(s) in 2.99 secs
     Running target/debug/deps/blog_os-1f08396a9eff0aa7

running 1 test
test vga_buffer::test::foo ... ok

test result: ok.
1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```

We see that the test was found and executed. It didn't panic, so it counts as passed.

### Constructing a Writer

In order to test the VGA methods, we first need to construct a `Writer` instance. Since we will need such an instance for other tests too, we create a separate function for it:

```rust
// in src/vga_buffer.rs

#[cfg(test)]
mod test {
    use super::*;

    fn construct_writer() -> Writer {
        use std::boxed::Box;

        let buffer = construct_buffer();
        Writer {
            column_position: 0,
            color_code: ColorCode::new(Color::Blue, Color::Magenta),
            buffer: Box::leak(Box::new(buffer)),
        }
    }

    fn construct_buffer() -> Buffer { … }
}
```

We set the initial column position to 0 and choose some arbitrary colors for foreground and background color. The difficult part is the buffer construction, which is described in detail below. We then use [`Box::new`] and [`Box::leak`] to transform the created `Buffer` into a `&'static mut Buffer`, because the `buffer` field needs to be of that type.

[`Box::new`]: https://doc.rust-lang.org/nightly/std/boxed/struct.Box.html#method.new
[`Box::leak`]: https://doc.rust-lang.org/nightly/std/boxed/struct.Box.html#method.leak

#### Buffer Construction

So how do we create a `Buffer` instance?
Unfortunately, the naive approach does not work:

```rust
fn construct_buffer() -> Buffer {
    Buffer {
        chars: [[Volatile::new(empty_char()); BUFFER_WIDTH]; BUFFER_HEIGHT],
    }
}

fn empty_char() -> ScreenChar {
    ScreenChar {
        ascii_character: b' ',
        color_code: ColorCode::new(Color::Green, Color::Brown),
    }
}
```

When running `cargo test`, the following error occurs:

```
error[E0277]: the trait bound `volatile::Volatile: core::marker::Copy` is not satisfied
   --> src/vga_buffer.rs:186:21
    |
186 |             chars: [[Volatile::new(empty_char); BUFFER_WIDTH]; BUFFER_HEIGHT],
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `core::marker::Copy` is not implemented for `volatile::Volatile`
    |
    = note: the `Copy` trait is required because the repeated element will be copied
```

The problem is that array construction in Rust requires that the contained type is [`Copy`]. The `ScreenChar` is `Copy`, but the `Volatile` wrapper is not. There is currently no easy way to circumvent this without using [`unsafe`], but fortunately there is the [`array_init`] crate that provides a safe interface for such operations.

[`Copy`]: https://doc.rust-lang.org/core/marker/trait.Copy.html
[`unsafe`]: https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html
[`array_init`]: https://docs.rs/array-init

To use that crate, we add the following to our `Cargo.toml`:

```toml
[dev-dependencies]
array-init = "0.0.3"
```

Note that we're using the [`dev-dependencies`] table instead of the `dependencies` table, because we only need the crate for `cargo test` and not for a normal build.

[`dev-dependencies`]: https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#development-dependencies

Now we can fix our `construct_buffer` function:

```rust
fn construct_buffer() -> Buffer {
    use array_init::array_init;

    Buffer {
        chars: array_init(|_| array_init(|_| Volatile::new(empty_char()))),
    }
}
```

See the [documentation of `array_init`][`array_init`] for more information about using that crate.
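On toolchains newer than the one this post was written for, the standard library offers a comparable building block: `core::array::from_fn` (stable since Rust 1.63) builds an array by calling a closure once per index and places no `Copy` bound on the element type. The sketch below only illustrates that idea and is not part of the post's code. `Cell` is a hypothetical non-`Copy` stand-in for the `Volatile` wrapper, and `construct_cells` is a made-up helper name.

```rust
use core::array;

const BUFFER_HEIGHT: usize = 25;
const BUFFER_WIDTH: usize = 80;

// Hypothetical non-`Copy` wrapper, standing in for `Volatile<ScreenChar>`.
#[derive(Debug, PartialEq)]
struct Cell(u8);

// Builds the nested array element by element; the closure runs once per index,
// so the element type never needs to be copied.
fn construct_cells() -> [[Cell; BUFFER_WIDTH]; BUFFER_HEIGHT] {
    array::from_fn(|_| array::from_fn(|_| Cell(b' ')))
}
```

Whether this can replace `array_init` in the setup described here depends on the compiler version in use, so treat it as an alternative to evaluate rather than a drop-in change.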
### Testing `write_byte`

Now we're finally able to write a first unit test that tests the `write_byte` method:

```rust
// in src/vga_buffer.rs

mod test {
    […]

    #[test]
    fn write_byte() {
        let mut writer = construct_writer();
        writer.write_byte(b'X');
        writer.write_byte(b'Y');

        for (i, row) in writer.buffer.chars.iter().enumerate() {
            for (j, screen_char) in row.iter().enumerate() {
                let screen_char = screen_char.read();
                if i == BUFFER_HEIGHT - 1 && j == 0 {
                    assert_eq!(screen_char.ascii_character, b'X');
                    assert_eq!(screen_char.color_code, writer.color_code);
                } else if i == BUFFER_HEIGHT - 1 && j == 1 {
                    assert_eq!(screen_char.ascii_character, b'Y');
                    assert_eq!(screen_char.color_code, writer.color_code);
                } else {
                    assert_eq!(screen_char, empty_char());
                }
            }
        }
    }
}
```

We construct a `Writer`, write two bytes to it, and then check that the right screen characters were updated. When we run `cargo test`, we see that the test is executed and passes:

```
running 1 test
test vga_buffer::test::write_byte ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```

Try to play around a bit with this function and verify that the test fails if you change something, e.g. if you print a third byte without adjusting the `for` loop.

(If you're getting a “binary operation `==` cannot be applied to type `vga_buffer::ScreenChar`” error, you also need to derive [`PartialEq`] for `ScreenChar` and `ColorCode`.)
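As a rough sketch of what that derive could look like: the field names below are taken from the test code above, while the exact definitions (a `u8` newtype for `ColorCode`, the other derived traits) are assumptions for illustration and may differ from your `vga_buffer` module. Note that `assert_eq!` also needs `Debug` so it can print both values on failure.

```rust
// Assumed shape of the two types; only the derive list matters here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ColorCode(u8);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ScreenChar {
    ascii_character: u8,
    color_code: ColorCode,
}

// With `PartialEq` derived on both types, `==` compares field by field.
fn same_char(a: ScreenChar, b: ScreenChar) -> bool {
    a == b
}
```

Deriving on `ColorCode` is required as well, because the derived `PartialEq` for `ScreenChar` compares every field with `==`.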
[`PartialEq`]: https://doc.rust-lang.org/nightly/core/cmp/trait.PartialEq.html

### Testing Strings

Let's add a second unit test to test formatted output and newline behavior:

```rust
// in src/vga_buffer.rs

mod test {
    […]

    #[test]
    fn write_formatted() {
        use core::fmt::Write;

        let mut writer = construct_writer();
        writeln!(&mut writer, "a").unwrap();
        writeln!(&mut writer, "b{}", "c").unwrap();

        for (i, row) in writer.buffer.chars.iter().enumerate() {
            for (j, screen_char) in row.iter().enumerate() {
                let screen_char = screen_char.read();
                if i == BUFFER_HEIGHT - 3 && j == 0 {
                    assert_eq!(screen_char.ascii_character, b'a');
                    assert_eq!(screen_char.color_code, writer.color_code);
                } else if i == BUFFER_HEIGHT - 2 && j == 0 {
                    assert_eq!(screen_char.ascii_character, b'b');
                    assert_eq!(screen_char.color_code, writer.color_code);
                } else if i == BUFFER_HEIGHT - 2 && j == 1 {
                    assert_eq!(screen_char.ascii_character, b'c');
                    assert_eq!(screen_char.color_code, writer.color_code);
                } else if i >= BUFFER_HEIGHT - 2 {
                    assert_eq!(screen_char.ascii_character, b' ');
                    assert_eq!(screen_char.color_code, writer.color_code);
                } else {
                    assert_eq!(screen_char, empty_char());
                }
            }
        }
    }
}
```

In this test we're using the [`writeln!`] macro to print strings with newlines to the buffer. Most of the for loop is similar to the `write_byte` test and only verifies if the written characters are at the expected place. The new `if i >= BUFFER_HEIGHT - 2` case verifies that the empty lines that are shifted in on a newline have the `writer.color_code`, which is different from the initial color.

[`writeln!`]: https://doc.rust-lang.org/nightly/core/macro.writeln.html

### More Tests

We only present two basic tests here as an example, but of course many more tests are possible. For example a test that changes the writer color in between writes. Or a test that checks that the top line is correctly shifted off the screen on a newline. Or a test that checks that non-ASCII characters are handled correctly.
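To make two of these ideas concrete without depending on the real module, here is a self-contained sketch. The `Writer` below is a deliberately simplified, hypothetical model (a tiny 3x4 grid, a plain `u8` color, no `Volatile`), not the post's `vga_buffer::Writer`; only the shape of the two tests is meant to carry over.

```rust
const BUFFER_HEIGHT: usize = 3;
const BUFFER_WIDTH: usize = 4;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ScreenChar {
    ascii_character: u8,
    color_code: u8,
}

// Simplified stand-in: always writes to the bottom row and scrolls up on newline.
struct Writer {
    column_position: usize,
    color_code: u8,
    chars: [[ScreenChar; BUFFER_WIDTH]; BUFFER_HEIGHT],
}

impl Writer {
    fn new(color_code: u8) -> Self {
        let blank = ScreenChar { ascii_character: b' ', color_code };
        Writer { column_position: 0, color_code, chars: [[blank; BUFFER_WIDTH]; BUFFER_HEIGHT] }
    }

    fn write_byte(&mut self, byte: u8) {
        if byte == b'\n' {
            self.new_line();
            return;
        }
        if self.column_position >= BUFFER_WIDTH {
            self.new_line();
        }
        let row = BUFFER_HEIGHT - 1;
        self.chars[row][self.column_position] =
            ScreenChar { ascii_character: byte, color_code: self.color_code };
        self.column_position += 1;
    }

    fn new_line(&mut self) {
        // Shift every row up by one; the previous top row is dropped.
        for row in 1..BUFFER_HEIGHT {
            self.chars[row - 1] = self.chars[row];
        }
        let blank = ScreenChar { ascii_character: b' ', color_code: self.color_code };
        self.chars[BUFFER_HEIGHT - 1] = [blank; BUFFER_WIDTH];
        self.column_position = 0;
    }
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn color_change_between_writes() {
        let mut writer = Writer::new(1);
        writer.write_byte(b'a');
        writer.write_byte(b'\n');
        writer.color_code = 2; // change the color between writes
        writer.write_byte(b'b');
        assert_eq!(writer.chars[BUFFER_HEIGHT - 2][0].color_code, 1);
        assert_eq!(writer.chars[BUFFER_HEIGHT - 1][0].color_code, 2);
    }

    #[test]
    fn top_line_is_shifted_off() {
        let mut writer = Writer::new(1);
        writer.write_byte(b'a');
        for _ in 0..BUFFER_HEIGHT {
            writer.write_byte(b'\n');
        }
        // After BUFFER_HEIGHT newlines, the 'a' has scrolled off the top.
        assert!(writer.chars.iter().flatten().all(|c| c.ascii_character == b' '));
    }
}
```

For the real `Writer`, the same assertions would go through `writer.buffer.chars` and `screen_char.read()`, as in the `write_byte` test above.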
## Summary

Unit testing is a very useful technique to ensure that certain components have a desired behavior. Even if unit tests cannot show the absence of bugs, they're still a useful tool for finding them and especially for avoiding regressions.

This post explained how to set up unit testing in a Rust kernel. We now have a functioning test framework and can easily add tests by adding functions with a `#[test]` attribute. To run them, a short `cargo test` suffices. We also added a few basic tests for our VGA buffer as an example of what unit tests could look like.

We also learned a bit about conditional compilation, Rust's [lint system], how to [initialize arrays with non-Copy types], and the `dev-dependencies` section of the `Cargo.toml`.

[lint system]: #silencing-the-warnings
[initialize arrays with non-Copy types]: #buffer-construction

## What's next?

We now have a working unit testing framework, which gives us the ability to test individual components. However, unit tests have the disadvantage that they run on the host machine and are thus unable to test how components interact with platform-specific parts. For example, we can't test the `println!` macro with a unit test because it wants to write to the VGA text buffer at address `0xb8000`, which only exists in the bare metal environment.

The next post will close this gap by creating a basic _integration test_ framework, which runs the tests in QEMU and thus has access to platform-specific components. This will allow us to test the full system, for example that our kernel boots correctly or that no deadlock occurs on nested `println!` invocations.

================================================
FILE: blog/content/edition-2/posts/deprecated/05-integration-tests/index.md
================================================
+++
title = "Integration Tests"
weight = 5
path = "integration-tests"
date = 2018-06-15

[extra]
warning_short = "Deprecated: "
warning = "This post is deprecated in favor of the [_Testing_](/testing) post and will no longer receive updates."
+++

To complete the testing picture we implement a basic integration test framework, which allows us to run tests on the target system. The idea is to run tests inside QEMU and report the results back to the host through the serial port.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom]. The complete source code for this post can be found in the [`post-05`][post branch] branch.

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/post-05

## Requirements

This post builds upon the [_Unit Testing_] post, so you need to follow it first. Alternatively, consider reading the new [_Testing_] post instead, which replaces both _Unit Testing_ and this post. The new post implements similar functionality, but integrates it directly in `cargo xtest`, so that both unit and integration tests run in a realistic environment inside QEMU.

[_Unit Testing_]: @/edition-2/posts/deprecated/04-unit-testing/index.md
[_Testing_]: @/edition-2/posts/04-testing/index.md

## Overview

In the previous post we added support for unit tests. The goal of unit tests is to test small components in isolation to ensure that each of them works as intended. The tests are run on the host machine and thus shouldn't rely on architecture-specific functionality.

To test the interaction of the components, both with each other and the system environment, we can write _integration tests_.
Compared to unit tests, integration tests are more complex, because they need to run in a realistic environment. What this means depends on the application type. For example, for webserver applications it often means setting up a database instance. For an operating system kernel like ours, it means that we run the tests on the target hardware without an underlying operating system.

Running on the target architecture allows us to test all hardware-specific code such as the VGA buffer or the effects of [page table] modifications. It also allows us to verify that our kernel boots without problems and that no [CPU exception] occurs.

[page table]: https://en.wikipedia.org/wiki/Page_table
[CPU exception]: https://wiki.osdev.org/Exceptions

In this post we will implement a very basic test framework that runs integration tests inside instances of the [QEMU] virtual machine. It is not as realistic as running them on real hardware, but it is much simpler and should be sufficient as long as we only use standard hardware that is well supported in QEMU.

[QEMU]: https://www.qemu.org/

## The Serial Port

The naive way of doing an integration test would be to add some assertions in the code, launch QEMU, and manually check if a panic occurred or not. This is very cumbersome and not practical if we have hundreds of integration tests. So we want an automated solution that runs all tests and fails if not all of them pass.

Such an automated test framework needs to know whether a test succeeded or failed. It can't look at the screen output of QEMU, so we need a different way of retrieving the test results on the host system. A simple way to achieve this is by using the [serial port], an old interface standard which is no longer found in modern computers. It is easy to program and QEMU can redirect the bytes sent over serial to the host's standard output or a file.

[serial port]: https://en.wikipedia.org/wiki/Serial_port

The chips implementing a serial interface are called [UARTs].
There are [lots of UART models] on x86, but fortunately the only differences between them are some advanced features we don't need. The common UARTs today are all compatible with the [16550 UART], so we will use that model for our testing framework.

[UARTs]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter
[lots of UART models]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter#Models
[16550 UART]: https://en.wikipedia.org/wiki/16550_UART

### Port I/O

There are two different approaches for communicating between the CPU and peripheral hardware on x86, **memory-mapped I/O** and **port-mapped I/O**. We already used memory-mapped I/O for accessing the [VGA text buffer] through the memory address `0xb8000`. This address is not mapped to RAM, but to some memory on the GPU.

[VGA text buffer]: @/edition-2/posts/03-vga-text-buffer/index.md

In contrast, port-mapped I/O uses a separate I/O bus for communication. Each connected peripheral has one or more port numbers. To communicate with such an I/O port there are special CPU instructions called `in` and `out`, which take a port number and a data byte (there are also variations of these commands that allow sending a `u16` or `u32`).

The UART uses port-mapped I/O. Fortunately there are already several crates that provide abstractions for I/O ports and even UARTs, so we don't need to invoke the `in` and `out` assembly instructions manually.

### Implementation

We will use the [`uart_16550`] crate to initialize the UART and send data over the serial port. To add it as a dependency, we update our `Cargo.toml` and `main.rs`:

[`uart_16550`]: https://docs.rs/uart_16550

```toml
# in Cargo.toml

[dependencies]
uart_16550 = "0.1.0"
```

The `uart_16550` crate contains a `SerialPort` struct that represents the UART registers, but we still need to construct an instance of it ourselves.
For that we create a new `serial` module with the following content:

```rust
// in src/main.rs

mod serial;
```

```rust
// in src/serial.rs

use uart_16550::SerialPort;
use spin::Mutex;
use lazy_static::lazy_static;

lazy_static! {
    pub static ref SERIAL1: Mutex<SerialPort> = {
        let mut serial_port = SerialPort::new(0x3F8);
        serial_port.init();
        Mutex::new(serial_port)
    };
}
```

Like with the [VGA text buffer][vga lazy-static], we use `lazy_static` and a spinlock to create a `static`. However, this time we use `lazy_static` to ensure that the `init` method is called before first use. We're using the port address `0x3F8`, which is the standard port number for the first serial interface.

[vga lazy-static]: @/edition-2/posts/03-vga-text-buffer/index.md#lazy-statics

To make the serial port easily usable, we add `serial_print!` and `serial_println!` macros:

```rust
#[doc(hidden)]
pub fn _print(args: ::core::fmt::Arguments) {
    use core::fmt::Write;
    SERIAL1.lock().write_fmt(args).expect("Printing to serial failed");
}

/// Prints to the host through the serial interface.
#[macro_export]
macro_rules! serial_print {
    ($($arg:tt)*) => {
        $crate::serial::_print(format_args!($($arg)*));
    };
}

/// Prints to the host through the serial interface, appending a newline.
#[macro_export]
macro_rules! serial_println {
    () => ($crate::serial_print!("\n"));
    ($fmt:expr) => ($crate::serial_print!(concat!($fmt, "\n")));
    ($fmt:expr, $($arg:tt)*) => ($crate::serial_print!(
        concat!($fmt, "\n"), $($arg)*));
}
```

The `SerialPort` type already implements the [`fmt::Write`] trait, so we don't need to provide an implementation.

[`fmt::Write`]: https://doc.rust-lang.org/nightly/core/fmt/trait.Write.html

Now we can print to the serial interface in our `main.rs`:

```rust
// in src/main.rs

mod serial;

#[cfg(not(test))]
#[no_mangle]
pub extern "C" fn _start() -> !
{ println!("Hello World{}", "!"); // prints to vga buffer serial_println!("Hello Host{}", "!"); loop {} } ``` Note that the `serial_println` macro lives directly under the root namespace because we used the `#[macro_export]` attribute, so importing it through `use crate::serial::serial_println` will not work. ### QEMU Arguments To see the serial output in QEMU, we can use the `-serial` argument to redirect the output to stdout: ``` > qemu-system-x86_64 \ -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin \ -serial mon:stdio warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5] Hello Host! ``` If you chose a different name than `blog_os`, you need to update the paths of course. Note that you can no longer exit QEMU through `Ctrl+c`. As an alternative you can use `Ctrl+a` and then `x`. As an alternative to this long command, we can pass the argument to `bootimage run`, with an additional `--` to separate the build arguments (passed to cargo) from the run arguments (passed to QEMU). ``` bootimage run -- -serial mon:stdio ``` Instead of standard output, QEMU supports [many more target devices][QEMU -serial]. For redirecting the output to a file, the argument is: [QEMU -serial]: https://qemu.weilnetz.de/doc/5.2/system/invocation.html#hxtool-9 ``` -serial file:output-file.txt ``` ## Shutting Down QEMU Right now we have an endless loop at the end of our `_start` function and need to close QEMU manually. This does not work for automated tests. We could try to kill QEMU automatically from the host, for example after some special output was sent over serial, but this would be a bit hacky and difficult to get right. The cleaner solution would be to implement a way to shutdown our OS. Unfortunately this is relatively complex, because it requires implementing support for either the [APM] or [ACPI] power management standard. 

[APM]: https://wiki.osdev.org/APM
[ACPI]: https://wiki.osdev.org/ACPI

Luckily, there is an escape hatch: QEMU supports a special `isa-debug-exit` device, which provides an easy way to exit QEMU from the guest system. To enable it, we add the following argument to our QEMU command:

```
-device isa-debug-exit,iobase=0xf4,iosize=0x04
```

The `iobase` specifies on which port address the device should live (`0xf4` is a [generally unused][list of x86 I/O ports] port on the x86's IO bus) and the `iosize` specifies the port size (`0x04` means four bytes). Now the guest can write a value to the `0xf4` port and QEMU will exit with [exit status] `(passed_value << 1) | 1`.

[list of x86 I/O ports]: https://wiki.osdev.org/I/O_Ports#The_list
[exit status]: https://en.wikipedia.org/wiki/Exit_status

To write to the I/O port, we use the [`x86_64`] crate:

[`x86_64`]: https://docs.rs/x86_64/0.5.2/x86_64/

```toml
# in Cargo.toml

[dependencies]
x86_64 = "0.5.2"
```

```rust
// in src/main.rs

pub unsafe fn exit_qemu() {
    use x86_64::instructions::port::Port;

    let mut port = Port::<u32>::new(0xf4);
    port.write(0);
}
```

We mark the function as `unsafe` because it relies on the fact that a special QEMU device is attached to the I/O port with address `0xf4`. For the port type we choose `u32` because the `iosize` is 4 bytes. As the value we write a zero, which causes QEMU to exit with exit status `(0 << 1) | 1 = 1`.

Note that we could also use the exit status instead of the serial interface for sending the test results, for example `1` for success and `2` for failure. However, this wouldn't allow us to send panic messages like the serial interface does and would also prevent us from replacing `exit_qemu` with a proper shutdown someday. Therefore we continue to use the serial interface and just always write a `0` to the port.

We can now test the QEMU shutdown by calling `exit_qemu` from our `_start` function:

```rust
#[cfg(not(test))]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!"); // prints to vga buffer
    serial_println!("Hello Host{}", "!");

    unsafe { exit_qemu(); }

    loop {}
}
```

You should see that QEMU immediately closes after booting when executing:

```
bootimage run -- -serial mon:stdio -device isa-debug-exit,iobase=0xf4,iosize=0x04
```

## Hiding QEMU

We are now able to launch a QEMU instance that writes its output to the serial port and automatically exits when it's done. So we no longer need the VGA buffer output or the graphical representation that still pops up. We can disable it by passing the `-display none` parameter to QEMU. The full command looks like this:

```
qemu-system-x86_64 \
    -drive format=raw,file=target/x86_64-blog_os/debug/bootimage-blog_os.bin \
    -serial mon:stdio \
    -device isa-debug-exit,iobase=0xf4,iosize=0x04 \
    -display none
```

Or, with `bootimage run`:

```
bootimage run -- \
    -serial mon:stdio \
    -device isa-debug-exit,iobase=0xf4,iosize=0x04 \
    -display none
```

Now QEMU runs completely in the background and no window is opened anymore. This is not only less annoying, but also allows our test framework to run in environments without a graphical user interface, such as [Travis CI].

[Travis CI]: https://travis-ci.com/

## Test Organization

Right now we're doing the serial output and the QEMU exit from the `_start` function in our `main.rs` and can no longer run our kernel in a normal way. We could try to fix this by adding an `integration-test` [cargo feature] and using [conditional compilation]:

[cargo feature]: https://doc.rust-lang.org/cargo/reference/features.html#the-features-section
[conditional compilation]: https://doc.rust-lang.org/reference/conditional-compilation.html

```toml
# in Cargo.toml

[features]
integration-test = []
```

```rust
// in src/main.rs

#[cfg(not(feature = "integration-test"))] // new
#[cfg(not(test))]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!"); // prints to vga buffer

    // normal execution

    loop {}
}

#[cfg(feature = "integration-test")] // new
#[cfg(not(test))]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    serial_println!("Hello Host{}", "!");

    run_test_1();
    run_test_2();
    // run more tests

    unsafe { exit_qemu(); }

    loop {}
}
```

However, this approach has a big problem: All tests run in the same kernel instance, which means that they can influence each other. For example, if `run_test_1` misconfigures the system by loading an invalid [page table], it can cause `run_test_2` to fail. This isn't something that we want because it makes it very difficult to find the actual cause of an error.

[page table]: https://en.wikipedia.org/wiki/Page_table

Instead, we want our test instances to be as independent as possible. If a test wants to destroy most of the system configuration to ensure that some property still holds in catastrophic situations, it should be able to do so without needing to restore a correct system state afterwards. This means that we need to launch a separate QEMU instance for each test.

With the above conditional compilation we only have two modes: Run the kernel normally or execute _all_ integration tests. To run each test in isolation we would need a separate cargo feature for each test with that approach, which would result in very complex conditional compilation bounds and confusing code.

A better solution is to create an additional executable for each test.

### Additional Test Executables

Cargo allows adding [additional executables] to a project by putting them inside `src/bin`. We can use that feature to create a separate executable for each integration test.
For example, a `test-something` executable could be added like this:

[additional executables]: https://doc.rust-lang.org/cargo/guide/project-layout.html

```rust
// src/bin/test-something.rs

#![cfg_attr(not(test), no_std)]
#![cfg_attr(not(test), no_main)]
#![cfg_attr(test, allow(unused_imports))]

use core::panic::PanicInfo;

#[cfg(not(test))]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    // run tests
    loop {}
}

#[cfg(not(test))]
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

By providing a new implementation for `_start` we can create a minimal test case that only tests one specific thing and is independent of the rest. For example, if we don't print anything to the VGA buffer, the test still succeeds even if the `vga_buffer` module is broken.

We can now run this executable in QEMU by passing a `--bin` argument to `bootimage`:

```
bootimage run --bin test-something
```

It should build the `test-something.rs` executable instead of `main.rs` and launch an empty QEMU window (since we don't print anything). So this approach allows us to create completely independent executables without cargo features or conditional compilation, and without cluttering our `main.rs`.

However, there is a problem: This is a completely separate executable, which means that we can't access any functions from our `main.rs`, including `serial_println` and `exit_qemu`. Duplicating the code would work, but we would also need to copy everything we want to test. This would mean that we no longer test the original function but only a possibly outdated copy.

Fortunately there is a way to share most of the code between our `main.rs` and the testing binaries: We move most of the code from our `main.rs` to a library that we can include from all executables.

### Split Off A Library

Cargo supports hybrid projects that are both a library and a binary.
We only need to create a `src/lib.rs` file and split the contents of our `main.rs` in the following way:

```rust
// src/lib.rs

#![cfg_attr(not(test), no_std)] // don't link the Rust standard library

// NEW: We need to add `pub` here to make them accessible from the outside
pub mod vga_buffer;
pub mod serial;

pub unsafe fn exit_qemu() {
    use x86_64::instructions::port::Port;

    let mut port = Port::<u32>::new(0xf4);
    port.write(0);
}
```

```rust
// src/main.rs

#![cfg_attr(not(test), no_std)]
#![cfg_attr(not(test), no_main)]
#![cfg_attr(test, allow(unused_imports))]

use core::panic::PanicInfo;
use blog_os::println;

/// This function is the entry point, since the linker looks for a function
/// named `_start` by default.
#[cfg(not(test))]
#[no_mangle] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    println!("Hello World{}", "!");

    loop {}
}

/// This function is called on panic.
#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    println!("{}", info);
    loop {}
}
```

So we move everything except `_start` and `panic` to `lib.rs` and make the `vga_buffer` and `serial` modules public. Everything should work exactly as before, including `bootimage run` and `cargo test`. To run tests only for the library part of our crate and avoid the additional output we can execute `cargo test --lib`.

### Test Basic Boot

We are finally able to create our first integration test executable. We start simple and only test that the basic boot sequence works and the `_start` function is called:

```rust
// in src/bin/test-basic-boot.rs

#![cfg_attr(not(test), no_std)]
#![cfg_attr(not(test), no_main)] // disable all Rust-level entry points
#![cfg_attr(test, allow(unused_imports))]

use core::panic::PanicInfo;
use blog_os::{exit_qemu, serial_println};

/// This function is the entry point, since the linker looks for a function
/// named `_start` by default.
#[cfg(not(test))]
#[no_mangle] // don't mangle the name of this function
pub extern "C" fn _start() -> ! {
    serial_println!("ok");

    unsafe { exit_qemu(); }
    loop {}
}

/// This function is called on panic.
#[cfg(not(test))]
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    serial_println!("failed");

    serial_println!("{}", info);

    unsafe { exit_qemu(); }
    loop {}
}
```

We don't do anything special here; we just print `ok` if `_start` is called and `failed` with the panic message when a panic occurs. Let's try it:

```
> bootimage run --bin test-basic-boot -- \
    -serial mon:stdio -display none \
    -device isa-debug-exit,iobase=0xf4,iosize=0x04
Building kernel
   Compiling blog_os v0.2.0 (file:///…/blog_os)
    Finished dev [unoptimized + debuginfo] target(s) in 0.19s
    Updating registry `https://github.com/rust-lang/crates.io-index`
Creating disk image at target/x86_64-blog_os/debug/bootimage-test-basic-boot.bin
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
ok
```

We got our `ok`, so it worked! Try inserting a `panic!()` before the `ok` printing; you should see output like this:

```
failed
panicked at 'explicit panic', src/bin/test-basic-boot.rs:19:5
```

### Test Panic

To test that our panic handler is really invoked on a panic, we create a `test-panic` test:

```rust
// in src/bin/test-panic.rs

#![cfg_attr(not(test), no_std)]
#![cfg_attr(not(test), no_main)]
#![cfg_attr(test, allow(unused_imports))]

use core::panic::PanicInfo;
use blog_os::{exit_qemu, serial_println};

#[cfg(not(test))]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    panic!();
}

#[cfg(not(test))]
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    serial_println!("ok");

    unsafe { exit_qemu(); }
    loop {}
}
```

This executable is almost identical to `test-basic-boot`; the only difference is that we print `ok` from our panic handler and invoke an explicit `panic!()` in our `_start` function.

## A Test Runner

The final step is to create a test runner, a program that executes all integration tests and checks their results.
The basic steps that it should do are:

- Look for integration tests in the current project, maybe by some convention (e.g. executables starting with `test-`).
- Run all integration tests and interpret their results.
    - Use a timeout to ensure that an endless loop does not block the test runner forever.
- Report the test results to the user and set a successful or failing exit status.

Such a test runner is useful to many projects, so we decided to add one to the `bootimage` tool.

### Bootimage Test

The test runner of the `bootimage` tool can be invoked via `bootimage test`. It uses the following conventions:

- All executables starting with `test-` are treated as integration tests.
- Tests must print either `ok` or `failed` over the serial port. When printing `failed` they can print additional information such as a panic message (in the next lines).
- Tests are run with a timeout of 1 minute. If the test has not completed in time, it is reported as "timed out".

The `test-basic-boot` and `test-panic` tests we created above begin with `test-` and follow the `ok`/`failed` conventions, so they should work with `bootimage test`:

```
> bootimage test
test-panic
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Ok

test-basic-boot
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Ok

test-something
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Timed Out

The following tests failed:
    test-something: TimedOut
```

We see that our `test-panic` and `test-basic-boot` succeeded and that the `test-something` test timed out after one minute. We no longer need `test-something`, so we delete it (if you haven't done so already). Now `bootimage test` should execute successfully.

## Summary

In this post we learned about the serial port and port-mapped I/O and saw how to configure QEMU to print serial output to the command line. We also learned a trick for exiting QEMU without needing to implement a proper shutdown.
We then split our crate into a library and binary part in order to create additional executables for integration tests. We added two example tests for testing that the `_start` function is correctly called and that a `panic` invokes our panic handler. Finally, we presented `bootimage test` as a basic test runner for our integration tests.

We now have a working integration test framework and can finally start to implement functionality in our kernel. We will continue to use the test framework over the next posts to test new components we add.

## What's next?

In the next post, we will explore _CPU exceptions_. These exceptions are thrown by the CPU when something illegal happens, such as a division by zero or an access to an unmapped memory page (a so-called “page fault”). Being able to catch and examine these exceptions is very important for debugging future errors. Exception handling is also very similar to the handling of hardware interrupts, which is required for keyboard support.

================================================
FILE: blog/content/edition-2/posts/deprecated/10-advanced-paging/index.md
================================================
+++
title = "Advanced Paging"
weight = 10
path = "advanced-paging"
date = 2019-01-28

[extra]
warning_short = "Deprecated: "
warning = "This post is deprecated in favor of the [_Paging Implementation_](/paging-implementation) post and will no longer receive updates. See issue [#545](https://github.com/phil-opp/blog_os/issues/545) for reasons for this deprecation."
+++

This post explains techniques to make the physical page table frames accessible to our kernel. It then uses such a technique to implement a function that translates virtual to physical addresses. It also explains how to create new mappings in the page tables.

This blog is openly developed on [GitHub]. If you have any problems or questions, please open an issue there. You can also leave comments [at the bottom].
The complete source code for this post can be found [here][post branch].

[GitHub]: https://github.com/phil-opp/blog_os
[at the bottom]: #comments
[post branch]: https://github.com/phil-opp/blog_os/tree/5c0fb63f33380fc8596d7166c2ebde03ef3d6726

## Introduction

In the [previous post] we learned about the principles of paging and how the 4-level page tables on x86_64 work. We also found out that the bootloader already set up a page table hierarchy for our kernel, which means that our kernel already runs on virtual addresses. This improves safety since illegal memory accesses cause page fault exceptions instead of modifying arbitrary physical memory.

[previous post]: @/edition-2/posts/08-paging-introduction/index.md

However, it also causes a problem when we try to access the page tables from our kernel because we can't directly access the physical addresses that are stored in page table entries or the `CR3` register. We experienced that problem already [at the end of the previous post] when we tried to inspect the active page tables.

[at the end of the previous post]: @/edition-2/posts/08-paging-introduction/index.md#accessing-the-page-tables

The next section discusses the problem in detail and provides different approaches to a solution. Afterward, we implement a function that traverses the page table hierarchy in order to translate virtual to physical addresses. Finally, we learn how to create new mappings in the page tables and how to find unused memory frames for creating new page tables.

### Dependency Versions

This post requires version 0.3.12 of the `bootloader` dependency and version 0.5.0 of the `x86_64` dependency. You can set the dependency versions in your `Cargo.toml`:

```toml
[dependencies]
bootloader = "0.3.12"
x86_64 = "0.5.0"
```

## Accessing Page Tables

Accessing the page tables from our kernel is not as easy as it may seem.
To understand the problem let's take a look at the example 4-level page table hierarchy of the previous post again:

![An example 4-level page hierarchy with each page table shown in physical memory](../paging-introduction/x86_64-page-table-translation.svg)

The important thing here is that each page entry stores the _physical_ address of the next table. This avoids the need to run a translation for these addresses too, which would be bad for performance and could easily cause endless translation loops.

The problem for us is that we can't directly access physical addresses from our kernel since our kernel also runs on top of virtual addresses. For example when we access address `4 KiB`, we access the _virtual_ address `4 KiB`, not the _physical_ address `4 KiB` where the level 4 page table is stored. When we want to access the physical address `4 KiB`, we can only do so through some virtual address that maps to it.

So in order to access page table frames, we need to map some virtual pages to them. There are different ways to create these mappings that all allow us to access arbitrary page table frames:

- A simple solution is to **identity map all page tables**:

  ![A virtual and a physical address space with various virtual pages mapped to the physical frame with the same address](identity-mapped-page-tables.svg)

  In this example, we see various identity-mapped page table frames. This way the physical addresses of page tables are also valid virtual addresses so that we can easily access the page tables of all levels starting from the CR3 register.

  However, it clutters the virtual address space and makes it more difficult to find contiguous memory regions of larger sizes. For example, imagine that we want to create a virtual memory region of size 1000 KiB in the above graphic, e.g. for [memory-mapping a file]. We can't start the region at `28 KiB` because it would collide with the already mapped page at `1004 KiB`.
  So we have to look further until we find a large enough unmapped area, for example at `1008 KiB`. This is a similar fragmentation problem as with [segmentation].

  [memory-mapping a file]: https://en.wikipedia.org/wiki/Memory-mapped_file
  [segmentation]: @/edition-2/posts/08-paging-introduction/index.md#fragmentation

  Equally, it makes it much more difficult to create new page tables, because we need to find physical frames whose corresponding pages aren't already in use. For example, let's assume that we reserved the _virtual_ 1000 KiB memory region starting at `1008 KiB` for our memory-mapped file. Now we can't use any frame with a _physical_ address between `1008 KiB` and `2008 KiB` anymore, because we can't identity map it.

- Alternatively, we could **map the page table frames only temporarily** when we need to access them. To be able to create the temporary mappings we only need a single identity-mapped level 1 table:

  ![A virtual and a physical address space with an identity mapped level 1 table, which maps its 0th entry to the level 2 table frame, thereby mapping that frame to page with address 0](temporarily-mapped-page-tables.png)

  The level 1 table in this graphic controls the first 2 MiB of the virtual address space. This is because it is reachable by starting at the CR3 register and following the 0th entry in the level 4, level 3, and level 2 page tables. The entry with index `8` maps the virtual page at address `32 KiB` to the physical frame at address `32 KiB`, thereby identity mapping the level 1 table itself. The graphic shows this identity-mapping by the horizontal arrow at `32 KiB`.

  By writing to the identity-mapped level 1 table, our kernel can create up to 511 temporary mappings (512 minus the entry required for the identity mapping). In the above example, the kernel mapped the 0th entry of the level 1 table to the frame with address `24 KiB`.
  This created a temporary mapping of the virtual page at `0 KiB` to the physical frame of the level 2 page table, indicated by the dashed arrow. Now the kernel can access the level 2 page table by writing to the page starting at `0 KiB`.

  The process for accessing an arbitrary page table frame with temporary mappings would be:

  - Search for a free entry in the identity-mapped level 1 table.
  - Map that entry to the physical frame of the page table that we want to access.
  - Access the target frame through the virtual page that maps to the entry.
  - Set the entry back to unused thereby removing the temporary mapping again.

  This approach keeps the virtual address space clean since it reuses the same 512 virtual pages for creating the mappings. The drawback is that it is a bit cumbersome, especially since a new mapping might require modifications of multiple table levels, which means that we would need to repeat the above process multiple times.

- While both of the above approaches work, there is a third technique called **recursive page tables** that combines their advantages: It keeps all page table frames mapped at all times so that no temporary mappings are needed, and also keeps the mapped pages together to avoid fragmentation of the virtual address space. This is the technique that we will use for our implementation, therefore it is described in detail in the following section.

### Recursive Page Tables

The idea behind this approach is to map some entry of the level 4 page table to the level 4 table itself. By doing this, we effectively reserve a part of the virtual address space and map all current and future page table frames to that space.

Let's go through an example to understand how this all works:

![An example 4-level page hierarchy with each page table shown in physical memory. Entry 511 of the level 4 table is mapped to frame 4KiB, the frame of the level 4 table itself.](recursive-page-table.png)

The only difference to the [example at the beginning of this post] is the additional entry at index `511` in the level 4 table, which is mapped to physical frame `4 KiB`, the frame of the level 4 table itself.

[example at the beginning of this post]: #accessing-page-tables

By letting the CPU follow this entry on a translation, it doesn't reach a level 3 table, but the same level 4 table again. This is similar to a recursive function that calls itself, therefore this table is called a _recursive page table_. The important thing is that the CPU assumes that every entry in the level 4 table points to a level 3 table, so it now treats the level 4 table as a level 3 table. This works because tables of all levels have the exact same layout on x86_64.

By following the recursive entry one or multiple times before we start the actual translation, we can effectively reduce the number of levels that the CPU traverses. For example, if we follow the recursive entry once and then proceed to the level 3 table, the CPU thinks that the level 3 table is a level 2 table. Going further, it treats the level 2 table as a level 1 table and the level 1 table as the mapped frame. This means that we can now read and write the level 1 page table because the CPU thinks that it is the mapped frame.
The graphic below illustrates the 5 translation steps:

![The above example 4-level page hierarchy with 5 arrows: "Step 0" from CR3 to level 4 table, "Step 1" from level 4 table to level 4 table, "Step 2" from level 4 table to level 3 table, "Step 3" from level 3 table to level 2 table, and "Step 4" from level 2 table to level 1 table.](recursive-page-table-access-level-1.png)

Similarly, we can follow the recursive entry twice before starting the translation to reduce the number of traversed levels to two:

![The same 4-level page hierarchy with the following 4 arrows: "Step 0" from CR3 to level 4 table, "Steps 1&2" from level 4 table to level 4 table, "Step 3" from level 4 table to level 3 table, and "Step 4" from level 3 table to level 2 table.](recursive-page-table-access-level-2.png)

Let's go through it step by step: First, the CPU follows the recursive entry on the level 4 table and thinks that it reaches a level 3 table. Then it follows the recursive entry again and thinks that it reaches a level 2 table. But in reality, it is still on the level 4 table. When the CPU now follows a different entry, it lands on a level 3 table but thinks it is already on a level 1 table. So while the next entry points at a level 2 table, the CPU thinks that it points to the mapped frame, which allows us to read and write the level 2 table.

Accessing the tables of levels 3 and 4 works in the same way. For accessing the level 3 table, we follow the recursive entry three times, tricking the CPU into thinking it is already on a level 1 table. Then we follow another entry and reach a level 3 table, which the CPU treats as a mapped frame. For accessing the level 4 table itself, we just follow the recursive entry four times until the CPU treats the level 4 table itself as the mapped frame (in blue in the graphic below).
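
The walks described above can be replayed with a toy model that runs on the host. This is only an illustrative sketch, not kernel code: the names `Table`, `example_tables`, and `walk` are made up for this example, the frame numbers are hypothetical, and the recursive entry is assumed to sit at index 511.

```rust
use std::collections::HashMap;

/// A toy page table: 512 entries, each either unused (`None`) or holding the
/// number of the frame it points to.
type Table = [Option<u64>; 512];

/// Index of the recursive level 4 entry (assumed to be the last entry).
const RECURSIVE: usize = 511;

/// Builds a tiny hierarchy with hypothetical frame numbers: level 4 table in
/// frame 1, level 3 in frame 4, level 2 in frame 6, level 1 in frame 8, and
/// one mapped data frame 25. Returns the tables and the frame stored in CR3.
fn example_tables() -> (HashMap<u64, Table>, u64) {
    let mut l4: Table = [None; 512];
    let mut l3: Table = [None; 512];
    let mut l2: Table = [None; 512];
    let mut l1: Table = [None; 512];
    l4[0] = Some(4); // level 4 entry 0 points to the level 3 table
    l4[RECURSIVE] = Some(1); // recursive entry points to the level 4 table itself
    l3[0] = Some(6); // level 3 entry 0 points to the level 2 table
    l2[0] = Some(8); // level 2 entry 0 points to the level 1 table
    l1[0] = Some(25); // level 1 entry 0 points to the mapped frame
    let tables = HashMap::from([(1, l4), (4, l3), (6, l2), (8, l1)]);
    (tables, 1)
}

/// Follows four indexes starting at the frame in `cr3`, like the CPU does for
/// every translation, and returns the frame that is finally reached.
fn walk(tables: &HashMap<u64, Table>, cr3: u64, indexes: [usize; 4]) -> Option<u64> {
    let mut frame = cr3;
    for index in indexes {
        // The CPU interprets whatever frame it currently stands on as a table.
        let table = tables.get(&frame)?;
        frame = table[index]?;
    }
    Some(frame)
}
```

In this model, `walk(&tables, cr3, [0, 0, 0, 0])` ends at the mapped frame 25, while `[511, 0, 0, 0]` ends at frame 8 (the level 1 table), `[511, 511, 0, 0]` at frame 6 (level 2), `[511, 511, 511, 0]` at frame 4 (level 3), and `[511, 511, 511, 511]` at frame 1, the level 4 table itself.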

![The same 4-level page hierarchy with the following 3 arrows: "Step 0" from CR3 to level 4 table, "Steps 1,2,3" from level 4 table to level 4 table, and "Step 4" from level 4 table to level 3 table. In blue the alternative "Steps 1,2,3,4" arrow from level 4 table to level 4 table.](recursive-page-table-access-level-3.png)

It might take some time to wrap your head around the concept, but it works quite well in practice.

#### Address Calculation

We saw that we can access tables of all levels by following the recursive entry once or multiple times before the actual translation. Since the indexes into the tables of the four levels are derived directly from the virtual address, we need to construct special virtual addresses for this technique. Remember, the page table indexes are derived from the address in the following way:

![Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index](../paging-introduction/x86_64-table-indices-from-address.svg)

Let's assume that we want to access the level 1 page table that maps a specific page. As we learned above, this means that we have to follow the recursive entry one time before continuing with the level 4, level 3, and level 2 indexes.
To do that we move each block of the address one block to the right and set the original level 4 index to the index of the recursive entry:

![Bits 0–12 are the offset into the level 1 table frame, bits 12–21 the level 2 index, bits 21–30 the level 3 index, bits 30–39 the level 4 index, and bits 39–48 the index of the recursive entry](table-indices-from-address-recursive-level-1.svg)

For accessing the level 2 table of that page, we move each index block two blocks to the right and set both the blocks of the original level 4 index and the original level 3 index to the index of the recursive entry:

![Bits 0–12 are the offset into the level 2 table frame, bits 12–21 the level 3 index, bits 21–30 the level 4 index, and bits 30–39 and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-2.svg)

Accessing the level 3 table works by moving each block three blocks to the right and using the recursive index for the original level 4, level 3, and level 2 address blocks:

![Bits 0–12 are the offset into the level 3 table frame, bits 12–21 the level 4 index, and bits 21–30, bits 30–39 and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-3.svg)

Finally, we can access the level 4 table by moving each block four blocks to the right and using the recursive index for all address blocks except for the offset:

![Bits 0–12 are the offset into the level 4 table frame and bits 12–21, bits 21–30, bits 30–39 and bits 39–48 are the index of the recursive entry](table-indices-from-address-recursive-level-4.svg)

We can now calculate virtual addresses for the page tables of all four levels. We can even calculate an address that points exactly to a specific page table entry by multiplying its index by 8, the size of a page table entry.
The table below summarizes the address structure for accessing the different kinds of frames:

| Virtual Address for | Address Structure ([octal])      |
| ------------------- | -------------------------------- |
| Page                | `0o_SSSSSS_AAA_BBB_CCC_DDD_EEEE` |
| Level 1 Table Entry | `0o_SSSSSS_RRR_AAA_BBB_CCC_DDDD` |
| Level 2 Table Entry | `0o_SSSSSS_RRR_RRR_AAA_BBB_CCCC` |
| Level 3 Table Entry | `0o_SSSSSS_RRR_RRR_RRR_AAA_BBBB` |
| Level 4 Table Entry | `0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA` |

[octal]: https://en.wikipedia.org/wiki/Octal

Here, `AAA` is the level 4 index, `BBB` the level 3 index, `CCC` the level 2 index, and `DDD` the level 1 index of the mapped frame, and `EEEE` the offset into it. `RRR` is the index of the recursive entry. When an index (three digits) is transformed to an offset (four digits), it is done by multiplying it by 8 (the size of a page table entry). With this offset, the resulting address directly points to the respective page table entry.

`SSSSSS` are sign extension bits, which means that they are all copies of bit 47. This is a special requirement for valid addresses on the x86_64 architecture. We explained it in the [previous post][sign extension].

[sign extension]: @/edition-2/posts/08-paging-introduction/index.md#paging-on-x86-64

We use [octal] numbers for representing the addresses since each octal character represents three bits, which allows us to clearly separate the 9-bit indexes of the different page table levels. This isn't possible with the hexadecimal system where each character represents four bits.

## Implementation

After all this theory we can finally start our implementation. Conveniently, the bootloader not only created page tables for our kernel, but it also created a recursive mapping in the last entry of the level 4 table. The bootloader did this because otherwise there would be a [chicken or egg problem]: We need to access the level 4 table to create a recursive mapping, but we can't access it without some kind of mapping.
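
The address structure from the table above can be sketched as a small host-side function. This is an illustration under stated assumptions rather than part of the kernel: the names `table_addr` and `RECURSIVE_INDEX` are hypothetical, the recursive entry is assumed to be the last level 4 entry (`0o777`), and the function returns the address of the start of the table (offset 0), not of a specific entry.

```rust
/// Index of the recursive level 4 entry (assumption: the last entry, 0o777).
const RECURSIVE_INDEX: u64 = 0o777;

/// Returns the virtual address of the page table at `level` (1 to 4) that is
/// involved in translating `addr`, using the recursive-mapping scheme.
fn table_addr(addr: u64, level: u32) -> u64 {
    assert!((1..=4).contains(&level), "level must be between 1 and 4");
    let level = level as usize;

    // The four 9-bit table indexes of `addr`: [level 4, level 3, level 2, level 1].
    let indexes = [
        (addr >> 39) & 0o777,
        (addr >> 30) & 0o777,
        (addr >> 21) & 0o777,
        (addr >> 12) & 0o777,
    ];

    // Slot 0 covers bits 39..48 and slot 3 covers bits 12..21. The first
    // `level` slots hold the recursive index; the remaining slots hold the
    // original indexes, moved `level` blocks to the right.
    let mut result = 0u64;
    for slot in 0..4usize {
        let index = if slot < level {
            RECURSIVE_INDEX
        } else {
            indexes[slot - level]
        };
        result |= index << (39 - 9 * slot);
    }

    // Sign extension: bits 48..64 must be copies of bit 47.
    if result & (1u64 << 47) != 0 {
        result |= 0xffff_u64 << 48;
    }
    result
}
```

For example, `table_addr(0xb8000, 4)` evaluates to `0xffff_ffff_ffff_f000`, and `table_addr(0o001_002_003_004_0000, 1)` evaluates to `0o177777_777_001_002_003_0000`, which matches the "Level 1 Table Entry" row with `AAA` = 1, `BBB` = 2, and `CCC` = 3. Adding `index * 8` to the result would give the address of one specific entry in that table.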
[chicken or egg problem]: https://en.wikipedia.org/wiki/Chicken_or_the_egg We already used this recursive mapping [at the end of the previous post] to access the level 4 table. We did this through the hardcoded address `0xffff_ffff_ffff_f000`. When we convert this address to [octal] and compare it with the above table, we can see that it exactly follows the structure of a level 4 table entry with `RRR` = `0o777`, `AAAA` = 0, and the sign extension bits set to `1` each: ``` structure: 0o_SSSSSS_RRR_RRR_RRR_RRR_AAAA address: 0o_177777_777_777_777_777_0000 ``` With our knowledge about recursive page tables we can now create virtual addresses to access all active page tables. This allows us to create a translation function in software. ### Translating Addresses As a first step, let's create a function that translates a virtual address to a physical address by walking the page table hierarchy: ```rust // in src/lib.rs pub mod memory; ``` ```rust // in src/memory.rs use x86_64::PhysAddr; use x86_64::structures::paging::PageTable; /// Returns the physical address for the given virtual address, or `None` if the /// virtual address is not mapped. 
pub fn translate_addr(addr: usize) -> Option<PhysAddr> {
    // introduce variables for the recursive index and the sign extension bits
    // TODO: Don't hardcode these values
    let r = 0o777; // recursive index
    let sign = 0o177777 << 48; // sign extension

    // retrieve the page table indices of the address that we want to translate
    let l4_idx = (addr >> 39) & 0o777; // level 4 index
    let l3_idx = (addr >> 30) & 0o777; // level 3 index
    let l2_idx = (addr >> 21) & 0o777; // level 2 index
    let l1_idx = (addr >> 12) & 0o777; // level 1 index
    let page_offset = addr & 0o7777;

    // calculate the table addresses
    let level_4_table_addr =
        sign | (r << 39) | (r << 30) | (r << 21) | (r << 12);
    let level_3_table_addr =
        sign | (r << 39) | (r << 30) | (r << 21) | (l4_idx << 12);
    let level_2_table_addr =
        sign | (r << 39) | (r << 30) | (l4_idx << 21) | (l3_idx << 12);
    let level_1_table_addr =
        sign | (r << 39) | (l4_idx << 30) | (l3_idx << 21) | (l2_idx << 12);

    // check that level 4 entry is mapped
    let level_4_table = unsafe { &*(level_4_table_addr as *const PageTable) };
    if level_4_table[l4_idx].addr().is_null() {
        return None;
    }

    // check that level 3 entry is mapped
    let level_3_table = unsafe { &*(level_3_table_addr as *const PageTable) };
    if level_3_table[l3_idx].addr().is_null() {
        return None;
    }

    // check that level 2 entry is mapped
    let level_2_table = unsafe { &*(level_2_table_addr as *const PageTable) };
    if level_2_table[l2_idx].addr().is_null() {
        return None;
    }

    // check that level 1 entry is mapped and retrieve physical address from it
    let level_1_table = unsafe { &*(level_1_table_addr as *const PageTable) };
    let phys_addr = level_1_table[l1_idx].addr();
    if phys_addr.is_null() {
        return None;
    }

    Some(phys_addr + page_offset)
}
```

First, we introduce variables for the recursive index (511 = `0o777`) and the sign extension bits (which are 1 each).
Then we calculate the page table indices and the page offset from the address through bitwise operations as specified in the graphic:

![Bits 0–12 are the page offset, bits 12–21 the level 1 index, bits 21–30 the level 2 index, bits 30–39 the level 3 index, and bits 39–48 the level 4 index](../paging-introduction/x86_64-table-indices-from-address.svg)

In the next step we calculate the virtual addresses of the four page tables as described in the [address calculation] section. We transform each of these addresses to [`PageTable`] references later in the function. These transformations are `unsafe` operations since the compiler can't know that these addresses are valid.

[address calculation]: #address-calculation
[`PageTable`]: https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/page_table/struct.PageTable.html

After the address calculation, we use the indexing operator to look at the entry in the level 4 table. If that entry is null, there is no level 3 table for this level 4 entry, which means that the `addr` is not mapped to any physical memory, so we return `None`. If the entry is not null, we know that a level 3 table exists. We then do the same cast and entry-checking as with the level 4 table.

After we checked the three higher level page tables, we can finally read the entry of the level 1 table that tells us the physical frame that the address is mapped to. As the last step, we add the page offset to that address and return it.

If we knew that the address is mapped, we could directly access the level 1 table without looking at the higher level page tables first. But since we don't know this, we have to check whether the level 1 table exists first, otherwise our function would cause a page fault for unmapped addresses.

#### Try it out

We can use our new translation function to translate some virtual addresses in our `_start` function:

```rust
// in src/main.rs

#[cfg(not(test))]
#[no_mangle]
pub extern "C" fn _start() -> !
{
    […] // initialize GDT, IDT, PICS

    use blog_os::memory::translate_addr;

    let addresses = [
        // the identity-mapped vga buffer page
        0xb8000,
        // some code page
        0x20010a,
        // some stack page
        0x57ac_001f_fe48,
    ];

    for &address in &addresses {
        println!("{:?} -> {:?}", address, translate_addr(address));
    }

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

When we run it, we see the following output:

![0xb8000 -> 0xb8000, 0x20010a -> 0x40010a, 0x57ac001ffe48 -> 0x27be48](qemu-translate-addr.png)

As expected, the identity-mapped address `0xb8000` translates to the same physical address. The code page and the stack page translate to some arbitrary physical addresses, which depend on how the bootloader created the initial mapping for our kernel.

#### The `RecursivePageTable` Type

The `x86_64` crate provides a [`RecursivePageTable`] type that implements safe abstractions for various page table operations. The type implements the [`MapperAllSizes`] trait, which already contains a `translate_addr` function that we can use instead of hand-rolling our own. To create a new `RecursivePageTable`, we create a `memory::init` function:

[`RecursivePageTable`]: https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/struct.RecursivePageTable.html
[`MapperAllSizes`]: https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/mapper/trait.MapperAllSizes.html

```rust
// in src/memory.rs

use x86_64::structures::paging::{Mapper, Page, PageTable, RecursivePageTable};
use x86_64::{VirtAddr, PhysAddr};

/// Creates a RecursivePageTable instance from the level 4 address.
///
/// This function is unsafe because it can break memory safety if an invalid
/// address is passed.
pub unsafe fn init(level_4_table_addr: usize) -> RecursivePageTable<'static> { let level_4_table_ptr = level_4_table_addr as *mut PageTable; let level_4_table = &mut *level_4_table_ptr; RecursivePageTable::new(level_4_table).unwrap() } ``` The `RecursivePageTable` type encapsulates the unsafety of the page table walk completely so that we no longer need `unsafe` to implement our own `translate_addr` function. The `init` function needs to be unsafe because the caller has to guarantee that the passed `level_4_table_addr` is valid. We can now use the `MapperAllSizes::translate_addr` function in our `_start` function: ```rust // in src/main.rs #[cfg(not(test))] #[no_mangle] pub extern "C" fn _start() -> ! { […] // initialize GDT, IDT, PICS use blog_os::memory; use x86_64::{ structures::paging::MapperAllSizes, VirtAddr, }; const LEVEL_4_TABLE_ADDR: usize = 0o_177777_777_777_777_777_0000; let recursive_page_table = unsafe { memory::init(LEVEL_4_TABLE_ADDR) }; let addresses = […]; // as before for &address in &addresses { let virt_addr = VirtAddr::new(address); let phys_addr = recursive_page_table.translate_addr(virt_addr); println!("{:?} -> {:?}", virt_addr, phys_addr); } println!("It did not crash!"); blog_os::hlt_loop(); } ``` Instead of using `u64` for all addresses we now use the [`VirtAddr`] and [`PhysAddr`] wrapper types to differentiate the two kinds of addresses. In order to be able to call the `translate_addr` method, we need to import the `MapperAllSizes` trait. [`VirtAddr`]: https://docs.rs/x86_64/0.5.2/x86_64/struct.VirtAddr.html [`PhysAddr`]: https://docs.rs/x86_64/0.5.2/x86_64/struct.PhysAddr.html By using the `RecursivePageTable` type, we now have a safe abstraction and clear ownership semantics. This ensures that we can't accidentally modify the page table concurrently, because an exclusive borrow of the `RecursivePageTable` is needed in order to modify it. When we run it, we see the same result as with our handcrafted translation function. 
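The benefit of separate wrapper types for virtual and physical addresses can be illustrated with a minimal standalone sketch. The `Virt` and `Phys` types and the `translate_identity` function below are hypothetical stand-ins written for this example, not the API of the `x86_64` crate, whose real types offer considerably more functionality.

```rust
/// Hypothetical stand-in for a virtual address wrapper.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Virt(u64);

/// Hypothetical stand-in for a physical address wrapper.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Phys(u64);

impl Virt {
    /// Returns `None` if bits 48..64 are not copies of bit 47.
    pub fn try_new(addr: u64) -> Option<Virt> {
        // bit 47 together with the 16 sign extension bits above it
        let upper = addr >> 47;
        if upper == 0 || upper == 0x1_ffff {
            Some(Virt(addr))
        } else {
            None
        }
    }

    pub fn as_u64(self) -> u64 {
        self.0
    }
}

impl Phys {
    pub fn as_u64(self) -> u64 {
        self.0
    }
}

/// Toy translation for an identity-mapped region. Passing a `Phys` here
/// is a compile-time error, which is the point of the wrapper types.
pub fn translate_identity(addr: Virt) -> Phys {
    Phys(addr.as_u64())
}
```

Because the two kinds of addresses are distinct types, mixing them up is caught by the compiler instead of surfacing as a wrong memory access at runtime.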
#### Making Unsafe Functions Safer Our `memory::init` function is an [unsafe function], which means that an `unsafe` block is required for calling it because the caller has to guarantee that certain requirements are met. In our case, the requirement is that the passed address is mapped to the physical frame of the level 4 page table. [unsafe function]: https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html#calling-an-unsafe-function-or-method The second property of unsafe functions is that their complete body is treated as an `unsafe` block, which means that it can perform all kinds of unsafe operations without additional unsafe blocks. This is the reason that we didn't need an `unsafe` block for dereferencing the raw `level_4_table_ptr`: ```rust pub unsafe fn init(level_4_table_addr: usize) -> RecursivePageTable<'static> { let level_4_table_ptr = level_4_table_addr as *mut PageTable; let level_4_table = &mut *level_4_table_ptr; // <- this operation is unsafe RecursivePageTable::new(level_4_table).unwrap() } ``` The problem with this is that we don't immediately see which parts are unsafe. For example, we don't know whether the `RecursivePageTable::new` function is unsafe or not without looking at [its definition][RecursivePageTable::new]. This makes it very easy to accidentally do something unsafe without noticing. [RecursivePageTable::new]: https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/struct.RecursivePageTable.html#method.new To avoid this problem, we can add a safe inner function: ```rust // in src/memory.rs pub unsafe fn init(level_4_table_addr: usize) -> RecursivePageTable<'static> { /// Rust currently treats the whole body of unsafe functions as an unsafe /// block, which makes it difficult to see which operations are unsafe. To /// limit the scope of unsafe we use a safe inner function. 
    fn init_inner(level_4_table_addr: usize) -> RecursivePageTable<'static> {
        let level_4_table_ptr = level_4_table_addr as *mut PageTable;
        let level_4_table = unsafe { &mut *level_4_table_ptr };
        RecursivePageTable::new(level_4_table).unwrap()
    }

    init_inner(level_4_table_addr)
}
```

Now an `unsafe` block is required again for dereferencing the `level_4_table_ptr` and we immediately see that this is the only unsafe operation in the function. There is currently an open [RFC][unsafe-fn-rfc] to change this unfortunate property of unsafe functions that would allow us to avoid the above boilerplate.

[unsafe-fn-rfc]: https://github.com/rust-lang/rfcs/pull/2585

### Creating a new Mapping

After reading the page tables and creating a translation function, the next step is to create a new mapping in the page table hierarchy.

The difficulty of creating a new mapping depends on the virtual page that we want to map. In the easiest case, the level 1 page table for the page already exists and we just need to write a single entry. In the most difficult case, the page is in a memory region for which no level 3 table exists yet, so that we need to create new level 3, level 2 and level 1 page tables first.

Let's start with the simple case and assume that we don't need to create new page tables. The bootloader loads itself in the first megabyte of the virtual address space, so we know that a valid level 1 table exists for this region. We can choose any unused page in this memory region for our example mapping, for example, the page at address `0x1000`. As the target frame we use `0xb8000`, the frame of the VGA text buffer. This way we can easily test whether our mapping worked.
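Why a page in the first megabyte needs no new page tables can be made concrete with a little index arithmetic. The following standalone sketch assumes standard 4KiB pages, so that one level 1 table covers 512 × 4KiB = 2MiB of virtual address space. The helper names `table_path` and `share_level_1_table` are made up for this illustration.

```rust
/// Returns the (level 4, level 3, level 2) indices of a virtual address.
/// Together they identify the level 1 table responsible for the address.
fn table_path(addr: u64) -> (u64, u64, u64) {
    (
        (addr >> 39) & 0o777,
        (addr >> 30) & 0o777,
        (addr >> 21) & 0o777,
    )
}

/// Two addresses are mapped through the same level 1 table exactly when
/// their level 4, level 3, and level 2 indices agree.
fn share_level_1_table(a: u64, b: u64) -> bool {
    table_path(a) == table_path(b)
}
```

Under these assumptions, `0x1000` and any other address below 2MiB (for example `0xb8000`) share the level 1 table with path `(0, 0, 0)`, whereas an address such as `0xdeadbeaf000` lies in a different part of the hierarchy.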
We implement it in a new `create_example_mapping` function in our `memory` module:

```rust
// in src/memory.rs

use x86_64::structures::paging::{FrameAllocator, PhysFrame, Size4KiB};

pub fn create_example_mapping(
    recursive_page_table: &mut RecursivePageTable,
    frame_allocator: &mut impl FrameAllocator<Size4KiB>,
) {
    use x86_64::structures::paging::PageTableFlags as Flags;

    let page: Page = Page::containing_address(VirtAddr::new(0x1000));
    let frame = PhysFrame::containing_address(PhysAddr::new(0xb8000));
    let flags = Flags::PRESENT | Flags::WRITABLE;

    let map_to_result = unsafe {
        recursive_page_table.map_to(page, frame, flags, frame_allocator)
    };
    map_to_result.expect("map_to failed").flush();
}
```

The function takes a mutable reference to the `RecursivePageTable` because it needs to modify it and a `FrameAllocator` that is explained below. It then uses the [`map_to`] function of the [`Mapper`] trait to map the page at address `0x1000` to the physical frame at address `0xb8000`. The `map_to` function is unsafe because it's possible to break memory safety with invalid arguments.

[`map_to`]: https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/mapper/trait.Mapper.html#tymethod.map_to
[`Mapper`]: https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/mapper/trait.Mapper.html

Apart from the `page` and `frame` arguments, the [`map_to`] function takes two more arguments. The third argument is a set of flags for the page table entry. We set the `PRESENT` flag because it is required for all valid entries and the `WRITABLE` flag to make the mapped page writable.

The fourth argument needs to be some structure that implements the [`FrameAllocator`] trait. The `map_to` method needs this argument because it might need unused frames for creating new page tables. The `Size4KiB` argument in the trait implementation is needed because the [`Page`] and [`PhysFrame`] types are [generic] over the [`PageSize`] trait to work with both standard 4KiB pages and huge 2MiB/1GiB pages.
[`FrameAllocator`]: https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/trait.FrameAllocator.html
[`Page`]: https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/page/struct.Page.html
[`PhysFrame`]: https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/frame/struct.PhysFrame.html
[generic]: https://doc.rust-lang.org/book/ch10-00-generics.html
[`PageSize`]: https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/page/trait.PageSize.html

The `map_to` function can fail, so it returns a [`Result`]. Since this is just some example code that does not need to be robust, we just use [`expect`] to panic when an error occurs. On success, the function returns a [`MapperFlush`] type that provides an easy way to flush the newly mapped page from the translation lookaside buffer (TLB) with its [`flush`] method. Like `Result`, the type uses the [`#[must_use]`] attribute to emit a warning when we accidentally forget to use it.

[`Result`]: https://doc.rust-lang.org/core/result/enum.Result.html
[`expect`]: https://doc.rust-lang.org/core/result/enum.Result.html#method.expect
[`MapperFlush`]: https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/mapper/struct.MapperFlush.html
[`flush`]: https://docs.rs/x86_64/0.5.2/x86_64/structures/paging/mapper/struct.MapperFlush.html#method.flush
[`#[must_use]`]: https://doc.rust-lang.org/std/result/#results-must-be-used

Since we know that no new page tables are required for the address `0x1000`, a frame allocator that always returns `None` suffices. We create such an `EmptyFrameAllocator` for testing our mapping function:

```rust
// in src/memory.rs

/// A FrameAllocator that always returns `None`.
pub struct EmptyFrameAllocator;

impl FrameAllocator<Size4KiB> for EmptyFrameAllocator {
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        None
    }
}
```

(If you're getting a 'method `allocate_frame` is not a member of trait `FrameAllocator`' error, you need to update `x86_64` to version 0.4.0.)
We can now test the new mapping function in our `main.rs`:

```rust
// in src/main.rs

#[cfg(not(test))]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    […] // initialize GDT, IDT, PICS

    // `self` brings the `memory` module itself into scope for `memory::init`
    use blog_os::memory::{self, create_example_mapping, EmptyFrameAllocator};

    const LEVEL_4_TABLE_ADDR: usize = 0o_177777_777_777_777_777_0000;
    let mut recursive_page_table = unsafe { memory::init(LEVEL_4_TABLE_ADDR) };

    create_example_mapping(&mut recursive_page_table, &mut EmptyFrameAllocator);
    unsafe { (0x1900 as *mut u64).write_volatile(0xf021_f077_f065_f04e)};

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

We first create the mapping for the page at `0x1000` by calling our `create_example_mapping` function with a mutable reference to the `RecursivePageTable` instance. This maps the page `0x1000` to the VGA text buffer, so we should see any write to it on the screen.

Then we write the value `0xf021_f077_f065_f04e` to this page, which represents the string _"New!"_ on white background. We don't write directly to the beginning of the page at `0x1000` since the top line is directly shifted off the screen by the next `println`. Instead, we write to offset `0x900`, which is about in the middle of the screen. As we learned [in the _“VGA Text Mode”_ post], writes to the VGA buffer should be volatile, so we use the [`write_volatile`] method.

[in the _“VGA Text Mode”_ post]: @/edition-2/posts/03-vga-text-buffer/index.md#volatile
[`write_volatile`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.write_volatile

When we run it in QEMU, we see the following output:

![QEMU printing "It did not crash!" with four completely white cells in the middle of the screen](qemu-new-mapping.png)

The _"New!"_ on the screen is caused by our write to `0x1900`, which means that we successfully created a new mapping in the page tables.

This only worked because there was already a level 1 table for mapping page `0x1000`.
When we try to map a page for which no level 1 table exists yet, the `map_to` function fails because it tries to allocate frames from the `EmptyFrameAllocator` for creating new page tables. We can see that happen when we try to map page `0xdeadbeaf000` instead of `0x1000`:

```rust
// in src/memory.rs

pub fn create_example_mapping(…) {
    […]
    let page: Page = Page::containing_address(VirtAddr::new(0xdeadbeaf000));
    […]
}

// in src/main.rs

#[no_mangle]
pub extern "C" fn _start() -> ! {
    […]
    unsafe { (0xdeadbeaf900 as *mut u64).write_volatile(0xf021_f077_f065_f04e)};
    […]
}
```

When we run it, a panic with the following error message occurs:

```
panicked at 'map_to failed: FrameAllocationFailed', /…/result.rs:999:5
```

To map pages that don't have a level 1 page table yet we need to create a proper `FrameAllocator`. But how do we know which frames are unused and how much physical memory is available?

### Boot Information

The amount of physical memory and the memory regions reserved by devices like the VGA hardware vary between different machines. Only the BIOS or UEFI firmware knows exactly which memory regions can be used by the operating system and which regions are reserved. Both firmware standards provide functions to retrieve the memory map, but they can only be called very early in the boot process. For this reason, the bootloader already queries this and other information from the firmware.

To communicate this information to our kernel, the bootloader passes a reference to a boot information structure as an argument when calling our `_start` function. Right now we don't have this argument declared in our function, so it is ignored. Let's add it:

```rust
// in src/main.rs

use bootloader::bootinfo::BootInfo;

#[cfg(not(test))]
#[no_mangle]
pub extern "C" fn _start(boot_info: &'static BootInfo) -> ! { // new argument
    […]
}
```

The `BootInfo` struct is still in an early stage, so expect some breakage when updating to future [semver-incompatible] bootloader versions.
It currently has the three fields `p4_table_addr`, `memory_map`, and `package`:

[semver-incompatible]: https://doc.rust-lang.org/stable/cargo/reference/specifying-dependencies.html#caret-requirements

- The `p4_table_addr` field contains the recursive virtual address of the level 4 page table. By using this field we can avoid hardcoding the address `0o_177777_777_777_777_777_0000`.
- The `memory_map` field is most interesting to us since it contains a list of all memory regions and their type (i.e. unused, reserved, or other).
- The `package` field is an in-progress feature to bundle additional data with the bootloader. The implementation is not finished, so we can ignore this field for now.

Before we use the `memory_map` field to create a proper `FrameAllocator`, we want to ensure that we can't use a `boot_info` argument of the wrong type.

#### The `entry_point` Macro

Since our `_start` function is called externally from the bootloader, no checking of our function signature occurs. This means that we could let it take arbitrary arguments without any compilation errors, but it would fail or cause undefined behavior at runtime.

To make sure that the entry point function always has the correct signature that the bootloader expects, the `bootloader` crate provides an `entry_point` macro that offers a type-checked way to define a Rust function as the entry point. Let's rewrite our entry point function to use this macro:

```rust
// in src/main.rs

use bootloader::{bootinfo::BootInfo, entry_point};

entry_point!(kernel_main);

#[cfg(not(test))]
fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […] // initialize GDT, IDT, PICS

    let mut recursive_page_table = unsafe {
        memory::init(boot_info.p4_table_addr as usize)
    };

    […] // create and test example mapping

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

We no longer need to use `extern "C"` or `no_mangle` for our entry point, as the macro defines the real lower level `_start` entry point for us.
The `kernel_main` function is now a completely normal Rust function, so we can choose an arbitrary name for it. The important thing is that it is type-checked so that a compilation error occurs when we now try to modify the function signature in any way, for example adding an argument or changing the argument type.

Note that we now pass `boot_info.p4_table_addr` instead of a hardcoded address to our `memory::init`. Thus our code continues to work even if a future version of the bootloader chooses a different entry of the level 4 page table for the recursive mapping.

### Allocating Frames

Now that we have access to the memory map through the boot information we can create a proper frame allocator on top. We start with a generic skeleton:

```rust
// in src/memory.rs

pub struct BootInfoFrameAllocator<I> where I: Iterator<Item = PhysFrame> {
    frames: I,
}

impl<I> FrameAllocator<Size4KiB> for BootInfoFrameAllocator<I>
    where I: Iterator<Item = PhysFrame>
{
    fn allocate_frame(&mut self) -> Option<PhysFrame> {
        self.frames.next()
    }
}
```

The `frames` field can be initialized with an arbitrary [`Iterator`] of frames. This allows us to just delegate `allocate_frame` calls to the [`Iterator::next`] method.
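The delegation idea can be tried out on a host machine with a minimal sketch that mirrors the skeleton above without depending on the `x86_64` crate. The `Frame` type and the `IterFrameAllocator` name are hypothetical stand-ins chosen for this example, and `allocate_frame` is an inherent method here rather than a trait implementation.

```rust
/// Hypothetical stand-in for `PhysFrame`: the start address of a 4 KiB frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Frame(pub u64);

/// Hands out frames from an arbitrary iterator of frames.
pub struct IterFrameAllocator<I>
where
    I: Iterator<Item = Frame>,
{
    frames: I,
}

impl<I> IterFrameAllocator<I>
where
    I: Iterator<Item = Frame>,
{
    pub fn new(frames: I) -> Self {
        IterFrameAllocator { frames }
    }

    /// Returns the next unused frame, or `None` once the iterator is exhausted.
    pub fn allocate_frame(&mut self) -> Option<Frame> {
        self.frames.next()
    }
}
```

An allocator created with `IterFrameAllocator::new((0x1000u64..0x3000).step_by(4096).map(Frame))` hands out the frames at `0x1000` and `0x2000` and then returns `None`. Note that such an allocator never reuses frames, since an iterator has no way to take a frame back.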
[`Iterator`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html
[`Iterator::next`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#tymethod.next

The initialization of the `BootInfoFrameAllocator` happens in a new `init_frame_allocator` function:

```rust
// in src/memory.rs

use bootloader::bootinfo::{MemoryMap, MemoryRegionType};

/// Create a FrameAllocator from the passed memory map
pub fn init_frame_allocator(
    memory_map: &'static MemoryMap,
) -> BootInfoFrameAllocator<impl Iterator<Item = PhysFrame>> {
    // get usable regions from memory map
    let regions = memory_map
        .iter()
        .filter(|r| r.region_type == MemoryRegionType::Usable);
    // map each region to its address range
    let addr_ranges = regions.map(|r| r.range.start_addr()..r.range.end_addr());
    // transform to an iterator of frame start addresses
    let frame_addresses = addr_ranges.flat_map(|r| r.into_iter().step_by(4096));
    // create `PhysFrame` types from the start addresses
    let frames = frame_addresses.map(|addr| {
        PhysFrame::containing_address(PhysAddr::new(addr))
    });

    BootInfoFrameAllocator { frames }
}
```

This function uses iterator combinator methods to transform the initial `MemoryMap` into an iterator of usable physical frames:

- First, we call the `iter` method to convert the memory map to an iterator of `MemoryRegion`s. Then we use the [`filter`] method to skip any reserved or otherwise unavailable regions. The bootloader updates the memory map for all the mappings it creates, so frames that are used by our kernel (code, data or stack) or to store the boot information are already marked as `InUse` or similar. Thus we can be sure that `Usable` frames are not used somewhere else.
- In the second step, we use the [`map`] combinator and Rust's [range syntax] to transform our iterator of memory regions to an iterator of address ranges.
- The third step is the most complicated: We convert each range to an iterator through the `into_iter` method and then choose every 4096th address using [`step_by`].
  Since 4096 bytes (= 4 KiB) is the page size, we get the start address of each frame. The bootloader page-aligns all usable memory areas so that we don't need any alignment or rounding code here. By using [`flat_map`] instead of `map`, we get an `Iterator<Item = u64>` instead of an `Iterator<Item = Iterator<Item = u64>>`.
- In the final step, we convert the start addresses to `PhysFrame` types to construct the desired `Iterator<Item = PhysFrame>`. We then use this iterator to create and return a new `BootInfoFrameAllocator`.

[`filter`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.filter
[`map`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.map
[range syntax]: https://doc.rust-lang.org/core/ops/struct.Range.html
[`step_by`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.step_by
[`flat_map`]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.flat_map

We can now modify our `kernel_main` function to pass a `BootInfoFrameAllocator` instance instead of an `EmptyFrameAllocator`:

```rust
// in src/main.rs

#[cfg(not(test))]
fn kernel_main(boot_info: &'static BootInfo) -> ! {
    […] // initialize GDT, IDT, PICS

    use x86_64::structures::paging::{PageTable, RecursivePageTable};

    let mut recursive_page_table = unsafe {
        memory::init(boot_info.p4_table_addr as usize)
    };
    // new
    let mut frame_allocator = memory::init_frame_allocator(&boot_info.memory_map);

    blog_os::memory::create_example_mapping(&mut recursive_page_table, &mut frame_allocator);
    unsafe { (0xdeadbeaf900 as *mut u64).write_volatile(0xf021_f077_f065_f04e)};

    println!("It did not crash!");
    blog_os::hlt_loop();
}
```

Now the mapping succeeds and we see the black-on-white _"New!"_ on the screen again. Behind the scenes, the `map_to` method creates the missing page tables in the following way:

- Allocate an unused frame from the passed `frame_allocator`.
- Map the entry of the higher level table to that frame. Now the frame is accessible through the recursive page table.
- Zero the frame to create a new, empty page table.
- Continue with the next table level.

While our `create_example_mapping` function is just some example code, we are now able to create new mappings for arbitrary pages. This will be essential for allocating memory or implementing multithreading in future posts.

## Summary

In this post we learned how a recursive level 4 table entry can be used to map all page table frames to calculable virtual addresses. We used this technique to implement an address translation function and to create a new mapping in the page tables.

We saw that the creation of new mappings requires unused frames for creating new page tables. A frame allocator that provides such frames can be implemented on top of the boot information structure that the bootloader passes to our kernel.

## What's next?

The next post will create a heap memory region for our kernel, which will allow us to [allocate memory] and use various [collection types].

[allocate memory]: https://doc.rust-lang.org/alloc/boxed/struct.Box.html
[collection types]: https://doc.rust-lang.org/alloc/collections/index.html


================================================
FILE: blog/content/edition-2/posts/deprecated/_index.md
================================================
+++
title = "Deprecated Posts"
sort_by = "weight"
insert_anchor_links = "left"
render = false
+++


================================================
FILE: blog/content/news/2018-03-09-pure-rust.md
================================================
+++
title = "Writing an OS in pure Rust"
date = 2018-03-09
aliases = ["news/2018-03-09-pure-rust"]
+++

Over the past six months we've been working on a second edition of this blog. Our goals for this new version are [numerous] and we are still not done yet, but today we reached a major milestone: It is now possible to build the OS natively on Windows, macOS, and Linux **without any non-Rust dependencies**.
[numerous]: https://github.com/phil-opp/blog_os/issues/360 The [first edition] required several C-tools for building: [first edition]: @/edition-1/_index.md - We used the [`GRUB`] bootloader for booting our kernel. To create a bootable disk/CD image we used the [`grub-mkrescue`] tool, which is very difficult to get to run on Windows. - The [`xorriso`] program was also required, because it is used by `grub-mkrescue`. - GRUB only boots to protected mode, so we needed some assembly code for [entering long mode]. For building the assembly code, we used the [`nasm`] assembler. - We used the GNU linker [`ld`] for linking together the assembly files with the rust code, using a custom [linker script]. - Finally, we used [`make`] for automating the various build steps (assembling, compiling the Rust code, linking, invoking `grub-mkrescue`). [`GRUB`]: https://www.gnu.org/software/grub/ [`grub-mkrescue`]: https://www.gnu.org/software/grub/manual/grub/html_node/Invoking-grub_002dmkrescue.html [`xorriso`]: https://www.gnu.org/software/xorriso/ [entering long mode]: @/edition-1/posts/02-entering-longmode/index.md [`nasm`]: https://www.nasm.us/xdoc/2.13.03/html/nasmdoc1.html [`ld`]: https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_3.html [linker script]: https://sourceware.org/binutils/docs/ld/Scripts.html [`make`]: https://www.gnu.org/software/make/ We got lots of feedback that this setup was difficult to get running [under macOS] and Windows. As a workaround, we [added support for docker], but that still required users to install and understand an additional dependency. So when we decided to create a second edition of the blog - originally because the order of posts led to jumps in difficulty - we thought about how we could avoid these C-dependencies. 
[under macOS]: https://github.com/phil-opp/blog_os/issues/55
[added support for docker]: https://github.com/phil-opp/blog_os/pull/373

There are lots of alternatives to `make`, including some Rust tools such as [just] and [cargo-make]. Avoiding `nasm` is also possible by using Rust's [`global_asm`] feature instead. So there are only two problems left: the bootloader and the linker.

[just]: https://github.com/casey/just
[cargo-make]: https://sagiegurari.github.io/cargo-make/
[`global_asm`]: https://doc.rust-lang.org/stable/core/arch/macro.global_asm.html

## A custom Bootloader

To avoid the dependency on GRUB and to make things more ergonomic, we decided to write [our own bootloader] using Rust's [`global_asm`] feature. This way, the kernel can be significantly simplified, since the switch to long mode and the initial page table layout can already be done in the bootloader. Thus, we can avoid the initial assembly level blog posts in the second edition and directly start with high level Rust code.

[our own bootloader]: https://github.com/rust-osdev/bootloader

The bootloader is still an early prototype, but it is already capable of switching to long mode and loading the kernel in the form of a 64-bit ELF binary. It also performs the correct page table mapping (with the correct read/write/execute permissions) as specified in the ELF file and creates an initial physical memory map. The plan for the future is to make the bootloader more stable, add documentation, and ultimately add a “Writing a Bootloader” series to the blog, which explains in detail how the bootloader works.

## Linking with LLD

With our custom bootloader in place, the last remaining problem is platform independent linking. Fortunately there is [`LLD`], the cross-platform linker from the LLVM project, which is already very stable for the `x86` architecture. As a bonus, `LLD` is [now shipped with Rust], which means that it can be used without any extra installation.
[`LLD`]: https://lld.llvm.org/ [now shipped with Rust]: https://github.com/rust-lang/rust/pull/48125 ## The new Posts The second edition is already live at [https://os.phil-opp.com/second-edition]. Please tell us if you have any feedback on the new posts! We're planning to move over the content from the [first edition] iteratively, in a different order and with various other improvements. [https://os.phil-opp.com/second-edition]: @/edition-2/_index.md Many thanks to everyone who helped to make Rust an even better language for OS development! ================================================ FILE: blog/content/news/_index.md ================================================ +++ title = "News" template = "news-section.html" page_template = "news-page.html" sort_by = "date" +++ ================================================ FILE: blog/content/pages/_index.md ================================================ +++ title = "Pages" render = false +++ ================================================ FILE: blog/content/pages/contact.md ================================================ +++ title = "Contact" path = "contact" template = "plain.html" +++ Philipp Oppermann contact@phil-opp.com Erna-Hötzel-Str. 3, 76344 Eggenstein, Germany ================================================ FILE: blog/content/status-update/2019-05-01.md ================================================ +++ title = "Updates in April 2019" date = 2019-05-01 +++ Lots of things changed in the _Writing an OS in Rust_ series in the past month, both on the blog itself and in the tools behind the scenes. This post gives an overview of the most important updates. This post is an experiment inspired by [_This Week in Rust_] and similar series. The goal is to provide a resource that allows following the project more closely and staying up-to-date with the changes in the tools/libraries behind the scenes. If enough people find this useful, I will try to turn this into a semi-regular series.
[_This Week in Rust_]: https://this-week-in-rust.org/ ## Bootloader - The build system of the bootloader was rewritten to do a proper linking instead of appending the kernel executable manually like before. The relevant pull requests are [_Rewrite build system_](https://github.com/rust-osdev/bootloader/pull/51) and [_Updates for new build system_](https://github.com/rust-osdev/bootloader/pull/53). These (breaking) changes were released as version `0.5.0` ([changelog](https://github.com/rust-osdev/bootloader/blob/master/Changelog.md#050)). - To make the bootloader work with future versions of `bootimage`, a [`package.metadata.bootloader.target` key was added](https://github.com/rust-osdev/bootloader/commit/33b8ce6059e90485c56883b23d4834d06ddfd517) to the `Cargo.toml` of the bootloader. This key specifies the name of the target JSON file, so that `bootimage` knows which `--target` argument to pass. This change was released as version `0.5.1` ([changelog](https://github.com/rust-osdev/bootloader/blob/master/Changelog.md#051)) - In the [_Version 0.6.0_](https://github.com/rust-osdev/bootloader/pull/55) pull request, the `#[cfg(not(test))]` attribute was removed from the `entry_point` macro. This makes it possible to use the macro together with `cargo xtest` and a custom test framework. Since the change is breaking, it was released as version `0.6.0` ([changelog](https://github.com/rust-osdev/bootloader/blob/master/Changelog.md#060)). ## Bootimage - The [_Rewrite bootimage for new bootloader build system_](https://github.com/rust-osdev/bootimage/pull/34) pull request completely revamped the implementation of the crate. This was released as version `0.7.0`. See the [changelog](https://github.com/rust-osdev/bootimage/blob/master/Changelog.md#070) for a list of changes. - The rewrite had the unintended side-effect that `bootimage run` no longer ignored executables named `test-*`, so that an additional `--bin` argument was required for specifying which executable to run. 
To avoid breaking users of `bootimage test`, we yanked version `0.7.0`. After [fixing the issue](https://github.com/rust-osdev/bootimage/commit/8746c15bf326cf8438a4e64ffdda332fbe59e30d), version `0.7.1` was released ([changelog](https://github.com/rust-osdev/bootimage/blob/master/Changelog.md#071)). - The [_New features for `bootimage runner`_](https://github.com/rust-osdev/bootimage/pull/36) pull request added support for additional arguments and various functionality for supporting `cargo xtest`. The changes were released as version `0.7.2` ([changelog](https://github.com/rust-osdev/bootimage/blob/master/Changelog.md#072)). - An argument parsing bug that broke the new `cargo bootimage` subcommand on Windows was [fixed](https://github.com/rust-osdev/bootimage/commit/101eb43de403fd9f3cb3f044e2c263356d2c179a). The fix was released as version `0.7.3`. ## Blog OS - Performed an [_Update to new bootloader 0.5.1 and bootimage 0.7.2_](https://github.com/phil-opp/blog_os/pull/575). Apart from requiring the `llvm-tools-preview` rustup component, this only changes version numbers. - The [_Rewrite the linking section of "A Freestanding Rust Binary"_](https://github.com/phil-opp/blog_os/pull/577) pull request updated the first post to compile for the bare-metal `thumbv7em-none-eabihf` target instead of adding linker arguments for Linux/Windows/macOS. - Since the blog came close to the free bandwidth limit of Netlify, we needed to [_Migrate from Netlify to Github Pages_](https://github.com/phil-opp/blog_os/pull/579) to avoid additional fees. - With the [_Minimal Rust Kernel: Use a runner to make cargo xrun work_](https://github.com/phil-opp/blog_os/pull/582) pull request, we integrated the new `bootimage runner` into the blog. 
- The required updates to the `post-02` and `post-03` branches were performed in the [_Add `.cargo/config` file to post-02 branch_](https://github.com/phil-opp/blog_os/pull/585) and [_Merge the changes from #585 into the post-03 branch_](https://github.com/phil-opp/blog_os/pull/586) pull requests. - In the [_New testing post_](https://github.com/phil-opp/blog_os/pull/584) pull request, we replaced the previous [_Unit Testing_](https://os.phil-opp.com/unit-testing/) and [_Integration Tests_](https://os.phil-opp.com/integration-tests/) with the new [_Testing_](https://os.phil-opp.com/testing/) post, which uses `cargo xtest` and a custom test framework for running tests. - The required updates for the `post-04` branch were performed in the [_Implement code for new testing post in post-xx branches_](https://github.com/phil-opp/blog_os/pull/587) pull request. The updates for the other `post-*` branches were pushed manually to avoid spamming the repository with pull requests. You can find a list of the commits in the pull request description. - The [_Avoid generic impl trait parameters in BootInfoFrameAllocator_](https://github.com/phil-opp/blog_os/pull/595) pull request made the `BootInfoFrameAllocator` non-generic by reconstructing the frame iterator on every allocation. This way, we avoid using a `impl Trait` type parameter, which makes it [impossible to store the type in a `static`](https://github.com/phil-opp/blog_os/issues/593). See [rust-lang/rust#60367](https://github.com/rust-lang/rust/issues/60367) for the fundamental problem. ================================================ FILE: blog/content/status-update/2019-06-03.md ================================================ +++ title = "Updates in May 2019" date = 2019-06-03 +++ This post gives an overview of the recent updates to the _Writing an OS in Rust_ blog and to the used tools. I was quite busy with my master thesis this month, so I didn't have the time to create new content or major new features. 
However, there were quite a few minor updates. ## x86_64 - [Use cast crate instead of usize_conversions crate](https://github.com/rust-osdev/x86_64/pull/70) (released as version 0.5.5). - [Make FrameAllocator an unsafe trait](https://github.com/rust-osdev/x86_64/pull/71) (released as version 0.6.0). - [Change Port::read and PortReadOnly::read to take &mut self](https://github.com/rust-osdev/x86_64/pull/76) (released as version 0.7.0). - [@npmccallum](https://github.com/npmccallum) started working on [moving the type declarations to a separate crate](https://github.com/rust-osdev/x86_64/issues/72) to make them usable for more projects. We created the experimental [x86_64_types](https://github.com/rust-osdev/x86_64_types/) crate for this. ## Cargo-Xbuild - [Make backtraces optional](https://github.com/rust-osdev/cargo-xbuild/commit/bd73f5a1b975f1938abd5b4c17a048d2018741b7) to remove the transitive dependency on the `cc` crate, which has additional [compile-time requirements](https://github.com/alexcrichton/cc-rs#compile-time-requirements) (e.g. a working `gcc` installation). These requirements caused [problems for some people](https://github.com/phil-opp/blog_os/issues/612), so we decided to disable backtraces by default. Released as version 0.5.9. - [Error when the sysroot path contains spaces](https://github.com/rust-osdev/cargo-xbuild/pull/32): This pull request adds a special error message that points to [rust-lang/cargo#6139](https://github.com/rust-lang/cargo/issues/6139) when a sysroot path contains spaces. This should avoid the regular confusion, e.g. [here](https://github.com/phil-opp/blog_os/issues/464#issuecomment-427793367), [here](https://github.com/phil-opp/blog_os/issues/403#issuecomment-483046786), or [here](https://github.com/phil-opp/blog_os/issues/403#issuecomment-487313363). 
- [Add a `XBUILD_SYSROOT_PATH` environment variable to override sysroot path](https://github.com/rust-osdev/cargo-xbuild/pull/33): This feature is useful when the default sysroot path contains a space. Released as version 0.5.10. - [Fix the new `XBUILD_SYSROOT_PATH` environment variable](https://github.com/rust-osdev/cargo-xbuild/pull/34). Released as version 0.5.11. - [Update Azure Pipelines CI script](https://github.com/rust-osdev/bootimage/pull/40) - Build all branches instead of just `master` and the [bors](https://bors.tech/) `staging` branch. - Rustup is now included in the official Windows image of Azure Pipelines, so we don't need to install it again. ## Bootloader - [@rybot666](https://github.com/rybot666) started working on [porting the 16-bit assembly of the bootloader to Rust](https://github.com/rust-osdev/bootloader/issues/24). ## Bootimage - [@toothbrush7777777](https://github.com/toothbrush7777777) landed a pull request to [pad the boot image to a hard disk block size](https://github.com/rust-osdev/bootimage/pull/39). This is required for booting the image in VirtualBox. Released as version 0.7.4. - [Set `XBUILD_SYSROOT_PATH` when building bootloader](https://github.com/rust-osdev/bootimage/pull/41). Released as version 0.7.5. ## Blog OS - [Update to version 0.6.0 of x86_64](https://github.com/phil-opp/blog_os/pull/600), which made the `FrameAllocator` trait unsafe to implement. - [Use `-serial stdio` instead of `-serial mon:stdio`](https://github.com/phil-opp/blog_os/pull/604) as QEMU arguments when testing. - [Update x86_64 to version 0.7.0](https://github.com/phil-opp/blog_os/pull/606), which changed the `Port::read` method to take `&mut self` instead of `&self`. - [@josephlr](https://github.com/josephlr) [replaced some leftover tabs with spaces](https://github.com/phil-opp/blog_os/pull/609). - [Rewrite `CompareMessage` struct to check the whole string](https://github.com/phil-opp/blog_os/pull/611). 
================================================ FILE: blog/content/status-update/2019-07-06.md ================================================ +++ title = "Updates in June 2019" date = 2019-07-06 aliases = ["status-update/2019-06-04/index.html"] +++ This post gives an overview of the recent updates to the _Writing an OS in Rust_ blog and the used libraries and tools. My focus this month was to finish the [_Heap Allocation_](@/edition-2/posts/10-heap-allocation/index.md) post, on which I had been working since March. I originally wanted to include a section about different allocator designs (bump, linked list, slab, …) and how to implement them, but I decided to split it out into a separate post because it became much too long. I will try to release this half-done post soon. Apart from the new post, there were some minor updates to the `x86_64`, `bootloader` and `cargo-xbuild` crates. The following gives a short overview of notable changes to the different projects. ## blog_os - [Use misspell tool to look for common typos](https://github.com/phil-opp/blog_os/pull/617) - [New post about heap allocation](https://github.com/phil-opp/blog_os/pull/625) ## x86_64 - [Add ring-3 flag to GDT descriptor](https://github.com/rust-osdev/x86_64/pull/77) by [@mark-i-m](https://github.com/mark-i-m) (released as version 0.7.1) - [Add bochs magic breakpoint, read instruction pointer, inline instructions](https://github.com/rust-osdev/x86_64/pull/79) by [@64](https://github.com/64) ## bootloader - [Make the physical memory offset configurable through a `BOOTLOADER_PHYSICAL_MEMORY_OFFSET` environment variable](https://github.com/rust-osdev/bootloader/pull/58) - [Use a stripped copy of the kernel binary (debug info removed) to reduce load times](https://github.com/rust-osdev/bootloader/pull/59) (released as version 0.6.1) ## cargo-xbuild - [Document the XBUILD_SYSROOT_PATH environment variable](https://github.com/rust-osdev/cargo-xbuild/commit/994b5e75e1a4062cf506700e0ff38d5404338a37) -
[Fix incorrect joining of paths that caused some problems on Windows](https://github.com/rust-osdev/cargo-xbuild/commit/a1ff03311dd74447e8e845b4b96f2e137850027d) ================================================ FILE: blog/content/status-update/2019-08-02.md ================================================ +++ title = "Updates in July 2019" date = 2019-08-02 +++ This post gives an overview of the recent updates to the _Writing an OS in Rust_ blog and the used libraries and tools. Since I'm still very busy with my master's thesis, I haven't had the time to work on a new post. But there were quite a few maintenance updates this month and also a few new features such as the new `OffsetPageTable` type in the `x86_64` crate. We also had some great contributions this month. Thanks to the efforts of [@64](https://github.com/64), we were able to considerably lower the compile times of the `x86_64` and `bootloader` crates. Thanks to [@Aehmlo](https://github.com/Aehmlo), the `cargo-xbuild` crate now has a `cargo xdoc` subcommand and support for the `cargo {c, b, t, r}` aliases. The following list gives a short overview of notable changes to the different projects.
## blog_os - [Fix a lot of dead links in both the second and first edition](https://github.com/phil-opp/blog_os/pull/638) - [Update paging introduction post to use page fault error code](https://github.com/phil-opp/blog_os/pull/644) ## x86_64 - [Reexport MappedPageTable on non-x86_64 platforms too](https://github.com/rust-osdev/x86_64/pull/82) - [Update GDT docs, add user_data_segment function and WRITABLE flag](https://github.com/rust-osdev/x86_64/pull/78) by [@64](https://github.com/64) (published as version 0.7.2) - [Add a new `OffsetPageTable` mapper type](https://github.com/rust-osdev/x86_64/pull/83) (published as version 0.7.3) - [Update integration tests to use new testing framework](https://github.com/rust-osdev/x86_64/pull/86) - [Remove raw-cpuid dependency and use rdrand intrinsics](https://github.com/rust-osdev/x86_64/pull/85) by [@64](https://github.com/64) (published as version 0.7.4) ## bootloader - [Remove stabilized publish-lockfile feature](https://github.com/rust-osdev/bootloader/pull/62) (published as version 0.6.2) - [Update CI badge, use latest version of x86_64 crate and rustfmt](https://github.com/rust-osdev/bootloader/pull/63) by [@64](https://github.com/64) (published as version 0.6.3) - [Use volatile accesses in VGA code and make font dependency optional](https://github.com/rust-osdev/bootloader/pull/67) by [@64](https://github.com/64) - Making the dependency optional should improve compile times when the VGA text mode is used - Published as version 0.6.4 - **Breaking**: [Only include dependencies when `binary` feature is enabled](https://github.com/rust-osdev/bootloader/pull/68) (published as version 0.7.0) ## bootimage - [If the bootloader has a feature named `binary`, enable it](https://github.com/rust-osdev/bootimage/pull/43) (published as version 0.7.6) - This is required for building `bootloader 0.7.0` or later ## cargo-xbuild - [Add `cargo xdoc` command for invoking `cargo doc`](https://github.com/rust-osdev/cargo-xbuild/pull/39) by 
[@Aehmlo](https://github.com/Aehmlo) (published as version 0.5.13) - [Don't append a `--sysroot` argument to `RUSTFLAGS` if it already contains one](https://github.com/rust-osdev/cargo-xbuild/pull/40) (published as version 0.5.14) - [Add xb, xt, xc, and xr subcommands](https://github.com/rust-osdev/cargo-xbuild/pull/42) by [@Aehmlo](https://github.com/Aehmlo) (published as version 0.5.15) ================================================ FILE: blog/content/status-update/2019-09-09.md ================================================ +++ title = "Updates in August 2019" date = 2019-09-09 +++ This post gives an overview of the recent updates to the _Writing an OS in Rust_ blog and the used libraries and tools. I was very busy with finishing my master's thesis, so I didn't have any time to implement any notable changes myself. Thanks to contributions by [@vinaychandra](https://github.com/vinaychandra) and [@64](https://github.com/64), we were still able to publish new versions of the `x86_64`, `bootimage` and `bootloader` crates. ## `blog_os` Apart from [rewriting the section about no-harness tests](https://github.com/phil-opp/blog_os/pull/650) of the _Testing_ post, there were no notable changes to the blog in August. Now that I have some more free time again, I plan to upgrade the blog to the latest versions of `bootloader` and `bootimage`, evaluate the use of [GitHub Actions](https://github.com/features/actions) for the repository, and continue the work on the upcoming post about heap allocator implementations. ## `x86_64` Thanks to [@vinaychandra](https://github.com/vinaychandra), the `x86_64` crate now has [support for the `FsBase` and `GsBase` registers](https://github.com/rust-osdev/x86_64/pull/87). The change was published as version 0.7.5.
## `bootimage` To allow bootloaders to read configuration from the `Cargo.toml` file of the kernel, the `bootimage` crate now [passes the location of the kernel's Cargo.toml to bootloader crates](https://github.com/rust-osdev/bootimage/pull/45). This change was implemented by [@64] and published as version 0.7.7. ## `bootloader` Apart from initializing the CPU and loading the kernel, the `bootloader` crate is also responsible for creating several memory regions for the kernel, for example a program stack and the boot information struct. These regions must be mapped at some address in the virtual address space. As a stop-gap solution, the `bootloader` crate used fixed virtual addresses for these regions, which resulted in errors if the kernel tried to use the same address ranges itself. For example, the (optional) recursive mapping of page tables often conflicted with so-called _higher half kernels_, which live at the upper end of the address space. To avoid these conflicts, [@64] updated the `bootloader` crate to [dynamically map the kernel stack, boot info, physical memory, and recursive table regions](https://github.com/rust-osdev/bootloader/pull/71) at an unused virtual address range. To also support specifying explicit addresses for these regions, [@64] further added support for [parsing bootloader configuration from the kernel's Cargo.toml](https://github.com/rust-osdev/bootloader/pull/73). This way, the virtual addresses of the kernel stack and physical memory mapping can now be configured using a `package.metadata.bootloader` key in the `Cargo.toml` of the kernel. In a third pull request, [@64] also made the [kernel stack size configurable](https://github.com/rust-osdev/bootloader/pull/72). The changes were published together as version 0.8.0. This is a breaking update because the new configuration system requires at least version 0.7.7 of `bootimage`, which is the first version that passes the location of the kernel's `Cargo.toml` file. 
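
The idea of placing bootloader-created regions at an unused virtual address range can be illustrated with a small sketch. This is not the code from the pull request: it assumes that regions are placed at the granularity of level-4 page table entries (512 GiB each), and the names `UsedLevel4Entries`, `mark_range`, and `claim_free_region` are hypothetical and chosen only for this example. The sketch records which level-4 entries the kernel already occupies and then hands out the first free one, so a region placed there cannot collide with the kernel's own mappings.

```rust
/// Number of entries in an x86_64 level-4 page table.
const ENTRY_COUNT: usize = 512;

/// Hypothetical bookkeeping type: remembers which level-4 entries are taken.
struct UsedLevel4Entries {
    used: [bool; ENTRY_COUNT],
}

impl UsedLevel4Entries {
    fn new() -> Self {
        Self { used: [false; ENTRY_COUNT] }
    }

    /// Marks every level-4 entry touched by the virtual address range
    /// `start..=end_inclusive` as used, e.g. for the kernel's ELF segments.
    fn mark_range(&mut self, start: u64, end_inclusive: u64) {
        for index in p4_index(start)..=p4_index(end_inclusive) {
            self.used[index] = true;
        }
    }

    /// Claims the first unused level-4 entry and returns the virtual address
    /// at which it starts, or `None` if all 512 entries are taken.
    fn claim_free_region(&mut self) -> Option<u64> {
        let index = self.used.iter().position(|used| !used)?;
        self.used[index] = true;
        Some(p4_start_addr(index))
    }
}

/// Bits 39..=47 of a virtual address select the level-4 entry.
fn p4_index(addr: u64) -> usize {
    ((addr >> 39) & 0x1FF) as usize
}

/// First virtual address covered by the given level-4 entry, sign-extended
/// from bit 47 so that the result is a canonical address.
fn p4_start_addr(index: usize) -> u64 {
    let addr = (index as u64) << 39;
    if addr & (1 << 47) != 0 {
        addr | 0xFFFF_0000_0000_0000
    } else {
        addr
    }
}
```

With a kernel linked at a low address, `claim_free_region` would return the start of the second level-4 entry; a higher-half kernel would instead mark an entry in the upper half, and the claimed regions would stay out of its way. Working at this coarse granularity wastes address space but keeps the bookkeeping to a single 512-element array, which seems a reasonable trade-off for a bootloader.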
[@64]: https://github.com/64 ================================================ FILE: blog/content/status-update/2019-10-06.md ================================================ +++ title = "Updates in September 2019" date = 2019-10-06 +++ This post gives an overview of the recent updates to the _Writing an OS in Rust_ blog and the used libraries and tools. I finished my master's thesis and got my degree this month, so I only had limited time for my open source work. I still managed to perform a few minor updates, including code simplifications for the _Paging Implementation_ post and the evaluation of GitHub Actions as a CI service. ## `blog_os` - [Improve Paging Implementation Post](https://github.com/phil-opp/blog_os/pull/666): Improves and simplifies the code in multiple places - [Use GitHub Actions to build and deploy blog](https://github.com/phil-opp/blog_os/pull/660) - Set up GitHub Actions for `post-XX` branches: [`post-01`](https://github.com/phil-opp/blog_os/pull/661), [`post-02`](https://github.com/phil-opp/blog_os/pull/662), [`post-04`](https://github.com/phil-opp/blog_os/pull/663) - [Update to bootloader 0.8.0](https://github.com/phil-opp/blog_os/pull/664): Considerably reduces compile times - [Update to Zola 0.9.0](https://github.com/phil-opp/blog_os/pull/670): Updates the used static site generator to the latest version ## `cargo-xbuild` - [Print a warning when building for the host target](https://github.com/rust-osdev/cargo-xbuild/pull/44) ## `bootloader` - [Add a Cargo Feature for Enabling SSE](https://github.com/rust-osdev/bootloader/pull/77) ## `uart_16550` - [Update to x86_64 0.7.3 and bitflags](https://github.com/rust-osdev/uart_16550/pull/1) - [Document how serial port is configured by default](https://github.com/rust-osdev/uart_16550/pull/2) by [@edigaryev](https://github.com/edigaryev) ## `x86_64` No updates were merged in September.
However, I'm planning some breaking changes for the crate, namely: - [Replace `ux` dependency with custom wrapper structs](https://github.com/rust-osdev/x86_64/pull/91) to reduce compile times - [Add new UnsafePhysFrame type and use it in Mapper::map_to](https://github.com/rust-osdev/x86_64/pull/89) - [Make Mapper trait object safe by adding `Self: Sized` bounds on generic functions](https://github.com/rust-osdev/x86_64/pull/84) ================================================ FILE: blog/content/status-update/2019-12-02.md ================================================ +++ title = "Updates in October and November 2019" date = 2019-12-02 +++ This post gives an overview of the recent updates to the _Writing an OS in Rust_ blog and the used libraries and tools. I moved to a new apartment mid-October and had lots of work to do there, so I didn't have the time for creating the October status update post. Therefore, this post lists the changes from both October and November. I'm slowly picking up speed again, but I still have a lot of mails in my backlog. Sorry if you haven't received an answer yet! ## `blog_os` The blog itself received only a minor update: [Use panic! instead of println! + loop in double fault handler](https://github.com/phil-opp/blog_os/pull/687). This fixes an issue where a double fault during `cargo xtest` leads to an endless loop without any output on the serial port. We also have other news: We plan to add [Experimental Support for Community Translations](https://github.com/phil-opp/blog_os/pull/692) to the blog. While this imposes additional challenges, it makes the content accessible to people who don't speak English, so it's definitely worth trying in my opinion. The first additional language will be [Chinese](https://github.com/phil-opp/blog_os/pull/694), based on an [existing translation](https://github.com/rustcc/writing-an-os-in-rust) by [@luojia65](https://github.com/luojia65). 
Many thanks also to [@TheBegining](https://github.com/TheBegining) and [@Rustin-Liu](https://github.com/Rustin-Liu) for helping with the translation! ## `bootloader` - [Change the way the kernel entry point is called to honor alignment ABI](https://github.com/rust-osdev/bootloader/pull/81) by [@GuillaumeDIDIER](https://github.com/GuillaumeDIDIER) (published as version 0.8.2) - [Add support for Github Actions](https://github.com/rust-osdev/bootloader/pull/82) - [Remove unnecessary `extern C` on panic handler to fix not-ffi-safe warning](https://github.com/rust-osdev/bootloader/pull/85) by [@cmsd2](https://github.com/cmsd2) (published as version 0.8.3) ## `bootimage` - [Don't exit with expected exit code when failed to read QEMU exit code](https://github.com/rust-osdev/bootimage/pull/47) ## `x86_64` - [Switch to GitHub Actions for CI](https://github.com/rust-osdev/x86_64/pull/93) - [Use `repr C` to suppress not-ffi-safe when used with extern handler functions](https://github.com/rust-osdev/x86_64/pull/94) by [@cmsd2](https://github.com/cmsd2) (published as version 0.7.6) - [Add `slice` and `slice_mut` methods to IDT](https://github.com/rust-osdev/x86_64/pull/95) by [@foxcob](https://github.com/foxcob) (published as version 0.7.7) ## `cargo-xbuild` - [Add support for publishing and installing cross compiled crates](https://github.com/rust-osdev/cargo-xbuild/pull/47) by [@ALSchwalm](https://github.com/ALSchwalm) (published as version 0.5.18) ================================================ FILE: blog/content/status-update/2020-01-07.md ================================================ +++ title = "Updates in December 2019" date = 2020-01-07 +++ Happy New Year! This post gives an overview of the recent updates to the _Writing an OS in Rust_ blog and the corresponding libraries and tools. ## `blog_os` The repository of the _Writing an OS in Rust_ blog received the following updates: - Update `x86_64` dependency to version 0.8.1. 
This included the [dependency update](https://github.com/phil-opp/blog_os/pull/701) itself, an [update of the frame allocation code](https://github.com/phil-opp/blog_os/pull/703), and an [update of the blog](https://github.com/phil-opp/blog_os/pull/704). - [License the `blog/content` folder under CC BY-NC](https://github.com/phil-opp/blog_os/pull/705) - [Reword sentence in first post](https://github.com/phil-opp/blog_os/pull/709) by [@pamolloy](https://github.com/pamolloy) Further, we're still working on adding [Experimental Support for Community Translations](https://github.com/phil-opp/blog_os/pull/692) to the blog, starting with [Simplified Chinese](https://github.com/phil-opp/blog_os/pull/694) and [Traditional Chinese](https://github.com/phil-opp/blog_os/pull/699). Any help is appreciated! ## `bootloader` There were no updates to the bootloader this month. I'm currently working on rewriting the 16-bit/32-bit stages in Rust and making the bootloader more modular in the process. This should make it much easier to add support for UEFI and GRUB booting later. ## `bootimage` There were no updates to the `bootimage` tool this month. ## `x86_64` We landed a number of breaking changes this month: - [Replace `ux` dependency with custom wrapper structs](https://github.com/rust-osdev/x86_64/pull/91) - [Add new UnusedPhysFrame type and use it in Mapper::map_to](https://github.com/rust-osdev/x86_64/pull/89) - [Make Mapper trait object safe by adding `Self: Sized` bounds on generic functions](https://github.com/rust-osdev/x86_64/pull/84) - [Rename divide_by_zero field of IDT to divide_error](https://github.com/rust-osdev/x86_64/pull/108) - [Introduce new diverging handler functions for exceptions classified as "abort"](https://github.com/rust-osdev/x86_64/pull/109) These changes were released as version 0.8.0. Unfortunately, there was a missing re-export for the new `UnusedPhysFrame` type.
We fixed it in [#110](https://github.com/rust-osdev/x86_64/pull/110) and released the fix as version 0.8.1. There was one more addition to the `x86_64` crate afterwards: - [Add support for cr4 control register (with complete documentation)](https://github.com/rust-osdev/x86_64/pull/111) by [@KarimAllah](https://github.com/KarimAllah) (released as version 0.8.2). There were also a few changes related to continuous integration: - [Remove bors from this repo](https://github.com/rust-osdev/x86_64/pull/103) - [Run 'push' builds only for master branch](https://github.com/rust-osdev/x86_64/pull/104) - [Remove Travis CI and Azure Pipelines scripts](https://github.com/rust-osdev/x86_64/pull/105) - [Add caching of cargo crates to GitHub Actions CI](https://github.com/rust-osdev/x86_64/pull/100) ## `cargo-xbuild` The `cargo-xbuild` crate, which cross-compiles the sysroot, received the following updates this month: - [Add `--quiet` flag that suppresses "waiting for file lock" message](https://github.com/rust-osdev/cargo-xbuild/pull/43) by [@Nils-TUD](https://github.com/Nils-TUD) (published as version 0.5.19) - [Fix wrong feature name for memcpy=false](https://github.com/rust-osdev/cargo-xbuild/pull/50) (released as version 0.5.20) ================================================ FILE: blog/content/status-update/2020-02-01.md ================================================ +++ title = "Updates in January 2020" date = 2020-02-01 +++ This post gives an overview of the recent updates to the _Writing an OS in Rust_ blog and the corresponding libraries and tools. 
## `blog_os` The repository of the _Writing an OS in Rust_ blog received the following updates: - [Move #[global_allocator] into allocator module](https://github.com/phil-opp/blog_os/pull/714) - [Update many_boxes test to scale with heap size](https://github.com/phil-opp/blog_os/pull/716) - [New post about allocator designs](https://github.com/phil-opp/blog_os/pull/719) 🎉 - [Provide multiple implementations of align_up and mention performance](https://github.com/phil-opp/blog_os/pull/721) - [Refactor Simplified Chinese translation of post 3](https://github.com/phil-opp/blog_os/pull/725) by [@Rustin-Liu](https://github.com/Rustin-Liu) - [Use checked addition for allocator implementations](https://github.com/phil-opp/blog_os/pull/726) - [Fix dummy allocator code example](https://github.com/phil-opp/blog_os/pull/728) - [Some style updates to the front page](https://github.com/phil-opp/blog_os/pull/729) - [Mark active item in table of contents](https://github.com/phil-opp/blog_os/pull/733) - [Make active section link more discreet](https://github.com/phil-opp/blog_os/pull/734) by [@Menschenkindlein](https://github.com/Menschenkindlein) I also started working on the upcoming post about threads. ## `bootloader` The bootloader crate received two minor updates this month: - [Move architecture checks from build script into lib.rs](https://github.com/rust-osdev/bootloader/pull/91) - [Update x86_64 dependency to version 0.8.3](https://github.com/rust-osdev/bootloader/pull/92) by [@vinaychandra](https://github.com/vinaychandra) Since I focused my time on the new _Allocator Designs_ post, I did not have the time to make more progress on my plan to rewrite the 16-bit/32-bit stages of the bootloader in Rust. I hope to get back to it soon. ## `bootimage` There were no updates to the `bootimage` tool this month. 
## `x86_64` The following changes were merged this month: - [Allow immediate port version of in/out instructions](https://github.com/rust-osdev/x86_64/pull/115) by [@m-ou-se](https://github.com/m-ou-se) - [Make more functions const](https://github.com/rust-osdev/x86_64/pull/116) by [@m-ou-se](https://github.com/m-ou-se) - Released as version 0.8.3 - [Return the UnusedPhysFrame on MapToError::PageAlreadyMapped](https://github.com/rust-osdev/x86_64/pull/118) by [@haraldh](https://github.com/haraldh) - This is a **breaking change** since it changes the signature of a type. - No new release was published yet to give us the option to bundle it with other breaking changes. There are also some pull requests that have some open design questions and are still being discussed: - [Add p23_insert_flag_mask argument to mapper.map_to()](https://github.com/rust-osdev/x86_64/pull/114) by [@haraldh](https://github.com/haraldh) - Related proposal: [Page Table Visitors](https://github.com/rust-osdev/x86_64/issues/121) by [@mark-i-m](https://github.com/mark-i-m) - [Add User Mode registers](https://github.com/rust-osdev/x86_64/pull/119) by [@vinaychandra](https://github.com/vinaychandra) Please feel free to join these discussions if you have opinions on the matter. 
## `cargo-xbuild` The `cargo-xbuild` crate, which cross-compiles the sysroot, received the following updates this month: - [Override target path for building sysroot](https://github.com/rust-osdev/cargo-xbuild/pull/52) by [@upsuper](https://github.com/upsuper) - Published as version 0.5.21 ## `uart_16550` The `uart_16550` crate, which provides basic support for uart_16550 serial output, received a small dependency update: - [Update dependency for x86_64](https://github.com/rust-osdev/uart_16550/pull/4) by [@haraldh](https://github.com/haraldh) - Published as version 0.2.2 ================================================ FILE: blog/content/status-update/2020-03-02.md ================================================ +++ title = "Updates in February 2020" date = 2020-03-02 +++ This post gives an overview of the recent updates to the _Writing an OS in Rust_ blog and the corresponding libraries and tools. ## `blog_os` The repository of the _Writing an OS in Rust_ blog received the following updates: - [Mention potential bump allocator extensions](https://github.com/phil-opp/blog_os/pull/722) - [Don't panic on overflow in allocator; return null pointer instead](https://github.com/phil-opp/blog_os/pull/738) - [Update Allocator Designs post to signal OOM instead of panicking on overflow](https://github.com/phil-opp/blog_os/pull/739) - [Update to Zola 0.10](https://github.com/phil-opp/blog_os/pull/747) - [Experimental Support for Community Translations](https://github.com/phil-opp/blog_os/pull/692) - [Add translations from rustcc/writing-an-os-in-rust](https://github.com/phil-opp/blog_os/pull/694) - [Some fixes to generated translations](https://github.com/phil-opp/blog_os/pull/748) - [Add metadata to translations and list translators](https://github.com/phil-opp/blog_os/pull/749) - [Add a language selector for browser-supported languages](https://github.com/phil-opp/blog_os/pull/752) - [Use zola check to check for dead links; fix all dead links 
found](https://github.com/phil-opp/blog_os/pull/751) - [Convert all external links to https (if supported)](https://github.com/phil-opp/blog_os/commit/0619f3a9e766c575ba1a4f2c6825049c177f8c70) - [Mention in "Paging Introduction" that a CPU with 5-level paging is available now](https://github.com/phil-opp/blog_os/pull/732) - [Double Faults: A missing handler leads to a #GP exception (not a #NP)](https://github.com/phil-opp/blog_os/commit/b532c052add9d3eac18663f1836bc9eee11007af) - [Updated pc-keyboard to `0.5.0`](https://github.com/phil-opp/blog_os/pull/756) by [@RKennedy9064](https://github.com/RKennedy9064) ## `x86_64` The `x86_64` crate provides support for CPU-specific instructions, registers, and data structures of the `x86_64` architecture. There were lots of great contributions this month: - [Add User Mode registers](https://github.com/rust-osdev/x86_64/pull/119) by [@vinaychandra](https://github.com/vinaychandra) (released together with [#118](https://github.com/rust-osdev/x86_64/pull/118) as v0.9.0) - [Improve PageTableIndex and PageOffset](https://github.com/rust-osdev/x86_64/pull/122) by [@m-ou-se](https://github.com/m-ou-se) (released as v0.9.1) - [Remove the `cast` dependency](https://github.com/rust-osdev/x86_64/pull/124) by [@m-ou-se](https://github.com/m-ou-se) (released as v0.9.2) - [Fix GitHub actions to run latest available rustfmt](https://github.com/rust-osdev/x86_64/pull/126) by [@m-ou-se](https://github.com/m-ou-se) - [Enable usage with non-nightly rust](https://github.com/rust-osdev/x86_64/pull/127) by [@haraldh](https://github.com/haraldh) (released as v0.9.3) - [asm: add target_env = "musl" to pickup the underscore asm names](https://github.com/rust-osdev/x86_64/pull/128) by [@haraldh](https://github.com/haraldh) (released as v0.9.4) - [Add `#[inline]` attribute to small functions](https://github.com/rust-osdev/x86_64/pull/129) by [@AntoineSebert](https://github.com/AntoineSebert) (released as v0.9.5) - [Fix clippy 
warnings](https://github.com/rust-osdev/x86_64/pull/130) by [@AntoineSebert](https://github.com/AntoineSebert) - [Resolve remaining clippy warnings and add clippy job to CI](https://github.com/rust-osdev/x86_64/pull/132) ## `bootloader` The bootloader crate received two small bugfixes and one new feature this month: - [Objcopy replaces `.` chars with `_` chars](https://github.com/rust-osdev/bootloader/pull/94) (released as v0.8.6) - [Fix docs.rs build by specifying an explicit target](https://github.com/rust-osdev/bootloader/commit/af4f1016aa19fec3271226f8bfc2145521cf0c98) (released as v0.8.7) - [Add basic support for ELF thread local storage segments](https://github.com/rust-osdev/bootloader/pull/96) (released as v0.8.8) ## `bootimage` There were no updates to the `bootimage` tool this month. ## `cargo-xbuild` The `cargo-xbuild` crate provides support for cross-compiling `libcore` and `liballoc`. It received the following contributions this month: - [Added new option to the configuration table](https://github.com/rust-osdev/cargo-xbuild/pull/56) by [@parraman](https://github.com/rust-osdev/cargo-xbuild/pull/56) (released as v0.5.22) - [Pick up xbuild config from workspace manifest](https://github.com/rust-osdev/cargo-xbuild/pull/57) by [@ascjones](https://github.com/ascjones) (released as v0.5.23) - [Make `fn build` and `Args` public to enable use as lib](https://github.com/rust-osdev/cargo-xbuild/pull/59) by [@ascjones](https://github.com/ascjones) (released as v0.5.24) - [Fix: Not all projects have a root package](https://github.com/rust-osdev/cargo-xbuild/pull/61) (released as v0.5.25) - [Improvements to args and config for lib usage](https://github.com/rust-osdev/cargo-xbuild/pull/62) by [@ascjones](https://github.com/ascjones) (released as v0.5.26) - [Add `cargo xfix` command](https://github.com/rust-osdev/cargo-xbuild/pull/64) by [@tjhu](https://github.com/tjhu) (released as v0.5.27) - [Update dependencies](https://github.com/rust-osdev/cargo-xbuild/pull/65)
by [@parasyte](https://github.com/parasyte) (released as v0.5.28) ## `uart_16550` The `uart_16550` crate, which provides basic support for uart_16550 serial output, received the following updates: - [Switch CI to GitHub Actions](https://github.com/rust-osdev/uart_16550/pull/6) - [Cargo.toml: update x86_64 dependency](https://github.com/rust-osdev/uart_16550/pull/5) by [@haraldh](https://github.com/haraldh) (released as v0.2.3) - [Enable usage with non-nightly rust](https://github.com/rust-osdev/uart_16550/pull/7) by [@haraldh](https://github.com/haraldh) (released as v0.2.4) ## `multiboot2-elf64` The `multiboot2-elf64` crate provides abstractions for reading the boot information of the multiboot 2 standard, which is implemented by bootloaders like GRUB. There were two updates to the crate in February: - [Add MemoryAreaType, to allow users to access memory area types in a type-safe way](https://github.com/rust-osdev/multiboot2-elf64/pull/61) by [@CWood1](https://github.com/CWood1) - [Add some basic documentation](https://github.com/rust-osdev/multiboot2-elf64/pull/62) by [@mental32](https://github.com/rust-osdev/multiboot2-elf64/pull/62) (released as v0.8.2) ================================================ FILE: blog/content/status-update/2020-04-01/index.md ================================================ +++ title = "Updates in March 2020" date = 2020-04-01 +++ This post gives an overview of the recent updates to the _Writing an OS in Rust_ blog and the corresponding libraries and tools. I focused my time this month on finishing the long-planned post about [**Async/Await**]. In addition to that, there were a few updates to the crates behind the scenes, including some great contributions and a new `vga` crate. [**Async/Await**]: @/edition-2/posts/12-async-await/index.md As mentioned in the _Async/Await_ post, I'm currently looking for a job in Karlsruhe (Germany) or remote, so please let me know if you're interested.
## `blog_os` The repository of the _Writing an OS in Rust_ blog received the following updates: - [Update linked_list_allocator to v0.8.0](https://github.com/phil-opp/blog_os/pull/763) - [Update x86_64 dependency to version 0.9.6](https://github.com/phil-opp/blog_os/pull/764) - [New post about Async/Await](https://github.com/phil-opp/blog_os/pull/767) - [Discuss the approach of storing offsets for self-referential structs](https://github.com/phil-opp/blog_os/pull/774) - [Use a static counter for assigning task IDs](https://github.com/phil-opp/blog_os/pull/782) In addition to the changes above, there were a lot of [typo fixes] by external contributors. Thanks a lot! [typo fixes]: https://github.com/phil-opp/blog_os/pulls?q=is%3Apr+is%3Aclosed+created%3A2020-03-01..2020-04-02+-author%3Aphil-opp+ ## `x86_64` The `x86_64` crate provides support for CPU-specific instructions, registers, and data structures of the `x86_64` architecture. In March, there was only a single addition, which was required for the _Async/Await_ post: - [Add an enable_interrupts_and_hlt function that executes `sti; hlt`](https://github.com/rust-osdev/x86_64/pull/138) (released as v0.9.6) ## `bootloader` The bootloader crate received two contributions this month: - [Implement boot-info-address](https://github.com/rust-osdev/bootloader/pull/101) by [@Darksecond](https://github.com/Darksecond) (released as v0.8.9) - [Identity-map complete vga region (0xa0000 to 0xc0000)](https://github.com/rust-osdev/bootloader/pull/104) by [@RKennedy9064](https://github.com/RKennedy9064) (released as v0.9.0) ## `bootimage` The `bootimage` tool builds the `bootloader` and creates a bootable disk image from a kernel. 
It received a RUSTFLAGS-related bugfix: - [Set empty RUSTFLAGS to ensure that no .cargo/config applies](https://github.com/rust-osdev/bootimage/pull/51) ## `vga` There is a new crate under the `rust-osdev` organization: [`vga`](https://github.com/rust-osdev/vga) created by [@RKennedy9064](https://github.com/RKennedy9064). The purpose of the library is to provide abstractions for the VGA hardware. For example, the crate allows switching the VGA hardware to graphics mode, which makes it possible to draw on a pixel-based framebuffer: ![QEMU printing a box with "Hello World" in it](qemu-vga-crate.png) For more information about the crate, check out its [API documentation](https://docs.rs/vga/0.2.2/vga/) and the [GitHub repository](https://github.com/rust-osdev/vga). ================================================ FILE: blog/content/status-update/_index.md ================================================ +++ title = "Status Updates" template = "status-update-section.html" page_template = "status-update-page.html" sort_by = "date" description = "These posts give a regular overview of the most important changes to the blog and the tools and libraries behind the scenes." +++ ================================================ FILE: blog/requirements.txt ================================================ PyGithub ================================================ FILE: blog/sass/css/edition-2/main.scss ================================================ /* * CSS file for the second edition of os.phil-opp.com. * * Based on `poole`, which was designed, built, and released under MIT license by @mdo. See * https://github.com/poole/poole.
*/ /* * Contents * * Fonts * Body resets * Dark/Light Mode * Custom type * Messages * Container * Masthead * Posts and pages * Pagination * Reverse layout * Themes */ /* Fonts */ @font-face { font-family: "Iosevka"; src: url("/fonts/iosevka-regular.woff2") format("woff2"), url("/fonts/iosevka-regular.woff") format("woff"); font-weight: normal; font-style: normal; font-display: swap; } /* * Body resets * * Update the foundational and global aspects of the page. */ * { -webkit-box-sizing: border-box; -moz-box-sizing: border-box; box-sizing: border-box; } html, body { margin: 0; padding: 0; } html { font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; line-height: 1.5; } /* Dark/Light Mode */ @mixin set-colors-light { --background-color: #fff; --text-color: #515151; --heading-color: #313131; --heading-code-color: #a0565c; --link-color: #268bd2; --hr-color-top: #eee; --hr-color-bottom: #fff; --code-text-color: #bf616a; --code-background-color: #f9f9f9; --masthead-title-color: #505050; --strong-color: #303030; --masthead-subtitle: #c0c0c0; --post-title-color: #228; } @mixin set-colors-dark { --background-color: #252525; --text-color: #f5f5f5; --heading-color: #eee; --heading-code-color: #eee; --link-color: #c59ff3; --hr-color-top: #333; --hr-color-bottom: #000; --code-text-color: #eeeeee; --code-background-color: #222222; --masthead-title-color: #b6b6b6; --strong-color: #c0c0c0; --masthead-subtitle: #8f8f8f; --post-title-color: #c8c8ff; } body { @include set-colors-light(); } [data-theme="dark"] body { @include set-colors-dark(); } /* Styles for users who prefer dark mode at the OS level */ @media (prefers-color-scheme: dark) { /* defaults to dark theme */ body { @include set-colors-dark(); } /* Override dark mode with light mode styles if the user decides to swap */ [data-theme="light"] body { @include set-colors-light(); } } body { color: var(--text-color); background-color: var(--background-color); -webkit-text-size-adjust: 100%; -ms-text-size-adjust: 100%; } 
/* No `:visited` state is required by default (browsers will use `a`) */ a { color: var(--link-color); text-decoration: none; } /* `:focus` is linked to `:hover` for basic accessibility */ a:hover, a:focus { text-decoration: underline; } /* Headings */ h1, h2, h3, h4, h5, h6 { margin-bottom: 0.5rem; font-weight: bold; line-height: 1.25; color: var(--heading-color); text-rendering: optimizeLegibility; } h1 { font-size: 2rem; } h2 { margin-top: 1rem; font-size: 1.5rem; } h3 { margin-top: 1.5rem; font-size: 1.25rem; } h4, h5, h6 { margin-top: 1rem; font-size: 1rem; } /* Body text */ p { margin-top: 0; margin-bottom: 1rem; } strong { color: var(--strong-color); } /* Lists */ ul, ol, dl { margin-top: 0; margin-bottom: 1rem; } /* Nested lists */ li ul, li ol, li dl { margin-bottom: 0; } li ul + p, li ol + p, li dl + p { margin-top: 1rem; } dt { font-weight: bold; } dd { margin-bottom: 0.5rem; } /* Misc */ hr { position: relative; margin: 1.5rem 0; border: 0; border-top: 1px solid var(--hr-color-top); border-bottom: 1px solid var(--hr-color-bottom); } abbr { font-size: 90%; font-weight: bold; color: #555; text-transform: uppercase; } abbr[title] { cursor: help; border-bottom: 1px dotted #e5e5e5; } /* Code */ code, pre { font-family: "Iosevka", monospace; } code { padding: 0.25em 0.5em; font-size: 85%; color: var(--code-text-color); background-color: var(--code-background-color); border-radius: 3px; } pre { display: block; margin-top: 0; margin-bottom: 1rem; padding: 0.5rem; font-size: 0.95rem; line-height: 1.4; white-space: pre; overflow: auto; word-wrap: normal; background-color: var(--code-background-color); } pre code { padding: 0; font-size: 100%; color: inherit; background-color: transparent; } .highlight { margin-bottom: 1rem; border-radius: 4px; } .highlight pre { margin-bottom: 0; } /* Quotes */ blockquote { padding: 0.5rem 1rem; margin: 0.8rem 0; color: #7a7a7a; border-left: 0.25rem solid #e5e5e5; } blockquote p:last-child { margin-bottom: 0; } @media (min-width: 
30rem) { blockquote { padding-right: 5rem; padding-left: 1.25rem; } } img { display: block; margin: 0 0 1rem; border-radius: 5px; max-width: 100%; color: grey; font-style: italic; } /* Tables */ table { margin-bottom: 1rem; width: 100%; border: 1px solid #e5e5e5; border-collapse: collapse; } td, th { padding: 0.25rem 0.5rem; border: 1px solid #e5e5e5; } tbody tr:nth-child(odd) td, tbody tr:nth-child(odd) th { background-color: var(--code-background-color); } /* * Custom type * * Extend paragraphs with `.lead` for larger introductory text. */ .lead { font-size: 1.25rem; font-weight: 300; } /* * Messages * * Show alert messages to users. You may add it to single elements like a `<p>`, * or to a parent if there are multiple elements to show. */ .message { margin-bottom: 1rem; padding: 1rem; color: #717171; background-color: var(--code-background-color); } /* * Container * * Center the page content. */ .container { max-width: 45rem; padding-left: 1rem; padding-right: 1rem; margin-left: auto; margin-right: auto; } /* * Masthead * * Super small header above the content for site name and short description. */ .masthead { padding-top: 1rem; padding-bottom: 1rem; margin-bottom: 1rem; } .masthead-title { margin-top: 0; margin-bottom: 0; color: var(--masthead-title-color); } .masthead-title a { color: var(--masthead-title-color); } .masthead small { font-size: 75%; font-weight: 400; color: var(--masthead-subtitle); letter-spacing: 0; } /* * Posts and pages * * Each post is wrapped in `.post` and is used on default and post layouts. Each * page is wrapped in `.page` and is only used on the page layout. */ .page { margin-bottom: 4em; } /* Blog post or page title */ .page-title, .post-title a { color: var(--post-title-color); } .page-title, .post-title { margin-top: 0; } /* Meta data line below post title */ .post-date { display: block; margin-top: -0.5rem; margin-bottom: 1rem; color: #9a9a9a; } /* Related posts */ .related { padding-top: 2rem; padding-bottom: 2rem; border-top: 1px solid #eee; } .related-posts { padding-left: 0; list-style: none; } .related-posts h3 { margin-top: 0; } .related-posts li small { font-size: 75%; color: #999; } .related-posts li a:hover { color: #268bd2; text-decoration: none; } .related-posts li a:hover small { color: inherit; } /* * Pagination * * Super lightweight (HTML-wise) blog pagination. `span`s are provided for when * there are no more previous or next posts to show.
*/ .pagination { overflow: hidden; /* clearfix */ margin-left: -1rem; margin-right: -1rem; font-family: "PT Sans", Helvetica, Arial, sans-serif; color: #ccc; text-align: center; } /* Pagination items can be `span`s or `a`s */ .pagination-item { display: block; padding: 1rem; border: 1px solid #eee; } .pagination-item:first-child { margin-bottom: -1px; } /* Only provide a hover state for linked pagination items */ a.pagination-item:hover { background-color: #f5f5f5; } @media (min-width: 30rem) { .pagination { margin: 3rem 0; } .pagination-item { float: left; width: 50%; } .pagination-item:first-child { margin-bottom: 0; border-top-left-radius: 4px; border-bottom-left-radius: 4px; } .pagination-item:last-child { margin-left: -1px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; } } h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { padding: 0; color: var(--heading-code-color); font-size: 90%; background-color: inherit; } .masthead-title { font-size: 1.25rem; display: inline; } .masthead p { font-size: 1.25rem; display: inline; margin: 0; margin-left: 1rem; padding: 0; line-height: 1; } .front-page-introduction { margin-bottom: 2rem; } .navigation { float: right; } .navigation img { height: 1em; vertical-align: baseline; display: inline-block; margin: 0; padding: 0; border-radius: 0; } main img { max-width: 100%; margin: auto; } .post { margin-bottom: 2em; } .post:last-child { margin-bottom: 0em; } .frontpage-section { margin-bottom: 2rem; } .posts { padding: 1.5rem 1rem 0.5rem 1rem; border-radius: 10px; margin-bottom: 2rem; margin-left: -0.5rem; margin-right: -0.5rem; } .posts.neutral { border: 2px solid #999; } .posts.subscribe { border: 2px solid #aaa; } .posts.edition-1 { border: 2px solid #aaa; background-color: #99ff0022; } .posts.bare-bones { border: 2px solid #66f; } .posts.memory-management { border: 2px solid #fc0; } .posts.interrupts { border: 2px solid #f66; } .posts.multitasking { border: 2px solid #556b2f; } .posts hr { margin: 2rem 0; 
} .post-summary { margin-bottom: 1rem; } .post-summary p { display: inline; } .read-more { margin-left: 5px; } .no-translation { margin-top: 0.3rem; color: #999999; } .translation_contributors { display: block; margin-top: 0.5rem; font-size: 0.9rem; color: gray; } .post-category { margin-right: 0.5rem; text-transform: uppercase; font-size: 0.8rem; text-align: right; } .post-category.bare-bones { color: #55d; } .post-category.memory-management { color: #990; } .post-category.interrupts { color: #f33; } .post-category.multitasking { color: #556b2f; } .post-footer-support { margin-top: 2rem; } .PageNavigation { font-size: 0.9em; display: table; width: 100%; overflow: hidden; } .PageNavigation a { display: table-cell; } .PageNavigation .previous { text-align: left; } .PageNavigation .next { text-align: right; } footer.footer { margin-top: 1rem; margin-bottom: 1rem; .spaced { margin-left: 0.5rem; } } .footnotes { font-size: 85%; } .footnotes li { margin-bottom: 1rem; } sup, sub { line-height: 0; } a.anchorjs-link:hover { text-decoration: none; } #toc-aside { display: none; } #toc-inline summary { margin-bottom: 0.2rem; } aside#all-posts-link { font-size: 90%; margin-top: 0.5rem; } @media (min-width: 80rem) { #toc-inline { display: none; } #toc-aside { display: block; width: 12rem; position: sticky; float: left; top: 3.5rem; margin-top: -4rem; margin-left: -15rem; font-size: 90%; line-height: 1.2; } #toc-aside li > a, #toc-aside h2 { opacity: 0.5; transition: opacity 0.5s; } #toc-aside:hover li > a, #toc-aside:hover h2 { opacity: 1; } #toc-aside li.active > a { font-weight: bold; } #toc-aside h2 { font-size: 110%; margin-bottom: 0.2rem; } #toc-aside ol { margin: 0 0 0.2rem 0; padding: 0 0 0 1rem; list-style: none; } #toc-aside ol li a:before { content: ""; border-color: transparent #008eef; border-style: solid; border-width: 0.35em 0 0.35em 0.45em; display: block; height: 0; width: 0; left: -1em; top: 0.9em; position: relative; } #toc-aside.coarse li ol { display: none; 
} aside.page-aside-right { position: absolute; min-width: 11rem; max-width: 17rem; top: 4rem; margin-left: 45rem; margin-right: 2rem; font-size: 90%; } aside.page-aside-right .block { margin-bottom: 1.5rem; } aside.page-aside-right h2 { font-size: 110%; margin-bottom: 0.2rem; } aside.page-aside-right ul { margin: 0 0 0.2rem 0; padding: 0 0 0 1rem; } aside.page-aside-right ul li { margin-top: 0.5rem; } #language-selector li { margin-top: 0; } aside#all-posts-link { position: fixed; top: 1.25rem; margin-top: 0; margin-left: -15rem; } } aside.page-aside-right time { color: #9a9a9a; } a code { color: var(--link-color); } a.zola-anchor { opacity: 0; position: absolute; margin-left: -1.5em; padding-right: 1em; font-size: 0.6em; vertical-align: baseline; line-height: 2em; } :hover > a.zola-anchor { opacity: 1; text-decoration: none; } a.zola-anchor:hover { text-decoration: none; } div.note { padding: 0.7rem 1rem; margin: 1rem 0.2rem; border: 2px solid #6ad46a; border-radius: 5px; background-color: #99ff991f; } div.note p:last-child { margin-bottom: 0; } div.note h2, div.note h3, div.note h4 { margin-top: 0rem; } div.warning { padding: 0.7rem 1rem; margin: 1rem 0.2rem; border: 2px solid orange; border-radius: 5px; background-color: #ffa50022; } div.warning p:last-child { margin-bottom: 0; } div.warning h2 { margin-top: 0rem; } form.subscribe { margin: 1rem; } div.subscribe-fields { display: flex; } form.subscribe input { padding: 0.5rem; border: 1px solid #e5e5e5; } form.subscribe input[type="email"] { flex: 1; } form.subscribe input[type="submit"] { padding: 0.25rem 0.5rem; cursor: pointer; } /* Asides */ aside.post_aside { font-style: italic; padding: 0rem 1rem 0rem; margin: 0.8rem 0; border-left: 0.1rem solid #e5e5e5; border-right: 0.1rem solid #e5e5e5; } details summary { cursor: pointer; } details summary h3, details summary h4, details summary h5, details summary h6 { display: inline; } .gh-repo-box { border: 1px solid #d1d5da; border-radius: 3px; padding: 16px; 
margin-top: 0.5rem; color: #586069; font-size: 80%; } .gh-repo-box .repo-link { color: #0366d6; font-weight: 600; font-size: 120%; } .gh-repo-box .subtitle { margin-bottom: 16px; } .gh-repo-box .stars-forks { margin-bottom: 0; } .gh-repo-box .stars-forks a { color: #586069; } .gh-repo-box .stars-forks a:hover { color: #0366d6; text-decoration: none; } .gh-repo-box .stars-forks svg { vertical-align: text-bottom; fill: currentColor; } .gh-repo-box .stars { display: inline-block; } .gh-repo-box .forks { display: inline-block; margin-left: 16px; } .gh-repo-box .sponsor { display: inline-block; margin-left: 16px; } .hidden { display: none; } .toc-comments-link { margin-top: 0.5rem; } h5 { font-style: italic; font-size: 0.9rem; } .gray { color: gray; } a strong { color: #268bd2; } .right-to-left { direction: rtl; font-family: Vazir; } .left-to-right, .right-to-left pre, .right-to-left table, .right-to-left[id="toc-aside"] { direction: ltr; } .status-update-list li { margin-bottom: 0.5rem; } .giscus { margin-top: 1.5rem; } img { background-color: white; } /* Manual switch between dark and light mode */ .theme-switch { margin-bottom: 1rem; @media (min-width: 80rem) { position: fixed; left: 2rem; bottom: 2rem; margin-bottom: 0rem; } } .light-switch { @mixin light-switch-light { // icon: https://icons.getbootstrap.com/icons/moon-fill/ (MIT licensed) background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='%23004' class='bi bi-moon' viewBox='0 0 16 16'%3E%3Cpath fill-rule='evenodd' d='M14.53 10.53a7 7 0 0 1-9.058-9.058A7.003 7.003 0 0 0 8 15a7.002 7.002 0 0 0 6.53-4.47z'/%3E%3C/svg%3E"); } @mixin light-switch-dark { // icon: https://icons.getbootstrap.com/icons/brightness-high-fill/ (MIT licensed) background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='%23ff9' class='bi bi-brightness-high-fill' viewBox='0 0 16 16'%3E%3Cpath d='M12 8a4 4 0 1 1-8 0 4 4 0 0 1 8 0zM8 0a.5.5 0 0 1 .5.5v2a.5.5 0 0 1-1 0v-2A.5.5 0 0 
1 8 0zm0 13a.5.5 0 0 1 .5.5v2a.5.5 0 0 1-1 0v-2A.5.5 0 0 1 8 13zm8-5a.5.5 0 0 1-.5.5h-2a.5.5 0 0 1 0-1h2a.5.5 0 0 1 .5.5zM3 8a.5.5 0 0 1-.5.5h-2a.5.5 0 0 1 0-1h2A.5.5 0 0 1 3 8zm10.657-5.657a.5.5 0 0 1 0 .707l-1.414 1.415a.5.5 0 1 1-.707-.708l1.414-1.414a.5.5 0 0 1 .707 0zm-9.193 9.193a.5.5 0 0 1 0 .707L3.05 13.657a.5.5 0 0 1-.707-.707l1.414-1.414a.5.5 0 0 1 .707 0zm9.193 2.121a.5.5 0 0 1-.707 0l-1.414-1.414a.5.5 0 0 1 .707-.707l1.414 1.414a.5.5 0 0 1 0 .707zM4.464 4.465a.5.5 0 0 1-.707 0L2.343 3.05a.5.5 0 1 1 .707-.707l1.414 1.414a.5.5 0 0 1 0 .708z'/%3E%3C/svg%3E"); } display: inline-block; @include light-switch-light(); background-repeat: no-repeat; width: 2rem; height: 2rem; cursor: pointer; opacity: 0.6; &:hover { transform: scale(1.3); transition: 200ms ease-out; opacity: 1; } [data-theme="dark"] & { @include light-switch-dark(); } @media (prefers-color-scheme: dark) { @include light-switch-dark(); [data-theme="light"] & { @include light-switch-light(); } } } /* Clear theme override and go back to system theme */ .light-switch-reset { @mixin light-switch-reset-light { // icon: https://icons.getbootstrap.com/icons/x-circle-fill/ (MIT licensed) background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='16' height='16' fill='%23666' class='bi bi-x-circle' viewBox='0 0 16 16'%3E%3Cpath d='M8 15A7 7 0 1 1 8 1a7 7 0 0 1 0 14zm0 1A8 8 0 1 0 8 0a8 8 0 0 0 0 16z'/%3E%3Cpath d='M4.646 4.646a.5.5 0 0 1 .708 0L8 7.293l2.646-2.647a.5.5 0 0 1 .708.708L8.707 8l2.647 2.646a.5.5 0 0 1-.708.708L8 8.707l-2.646 2.647a.5.5 0 0 1-.708-.708L7.293 8 4.646 5.354a.5.5 0 0 1 0-.708z'/%3E%3C/svg%3E"); } @mixin light-switch-reset-dark { // icon: https://icons.getbootstrap.com/icons/x-circle-fill/ (MIT licensed) background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='16' height='16' fill='%23999' class='bi bi-x-circle' viewBox='0 0 16 16'%3E%3Cpath d='M8 15A7 7 0 1 1 8 1a7 7 0 0 1 0 14zm0 1A8 8 0 1 0 8 0a8 8 0 0 0 0 
16z'/%3E%3Cpath d='M4.646 4.646a.5.5 0 0 1 .708 0L8 7.293l2.646-2.647a.5.5 0 0 1 .708.708L8.707 8l2.647 2.646a.5.5 0 0 1-.708.708L8 8.707l-2.646 2.647a.5.5 0 0 1-.708-.708L7.293 8 4.646 5.354a.5.5 0 0 1 0-.708z'/%3E%3C/svg%3E"); } @include light-switch-reset-light(); vertical-align: bottom; margin-left: 0.5rem; background-repeat: no-repeat; width: 2rem; height: 2rem; cursor: pointer; opacity: 0.6; display: none; [data-theme="light"] & { display: inline-block; } [data-theme="dark"] & { @include light-switch-reset-dark(); display: inline-block; } @media (min-width: 80rem) { position: fixed; left: 4.5rem; bottom: 2rem; } &:hover { transform: scale(1.1); transition: 200ms ease-out; opacity: 1; } } ================================================ FILE: blog/static/CNAME ================================================ os.phil-opp.com ================================================ FILE: blog/static/atom.xml/index.html ================================================ ================================================ FILE: blog/static/css/edition-1/isso.css ================================================ #isso-thread * { -webkit-box-sizing: border-box; -moz-box-sizing: border-box; box-sizing: border-box; } #isso-thread a { text-decoration: none; } #isso-thread { padding: 0; margin: 0; } #isso-thread > h4 { color: #555; font-weight: bold; } #isso-thread .textarea { min-height: 58px; outline: 0; } #isso-thread .textarea.placeholder { color: #AAA; } .isso-comment { max-width: 68em; padding-top: 0.95em; margin: 0.95em auto; } .isso-comment:not(:first-of-type), .isso-follow-up .isso-comment { border-top: 1px solid rgba(0, 0, 0, 0.1); } .isso-comment > div.avatar, .isso-postbox > .avatar { display: block; float: left; width: 7%; margin: 3px 15px 0 0; } .isso-postbox > .avatar { float: left; margin: 5px 10px 0 5px; width: 48px; height: 48px; overflow: hidden; } .isso-comment > div.avatar > svg, .isso-postbox > .avatar > svg { max-width: 48px; max-height: 48px; border: 1px solid 
rgba(0, 0, 0, 0.2); border-radius: 3px; box-shadow: 0 1px 2px rgba(0, 0, 0, 0.1); } .isso-comment > div.text-wrapper { display: block; } .isso-comment .isso-follow-up { padding-left: calc(7% + 20px); } .isso-comment > div.text-wrapper > .isso-comment-header, .isso-comment > div.text-wrapper > .isso-comment-footer { font-size: 0.95em; } .isso-comment > div.text-wrapper > .isso-comment-header { font-size: 0.85em; } .isso-comment > div.text-wrapper > .isso-comment-header .spacer { padding: 0 6px; } .isso-comment > div.text-wrapper > .isso-comment-header .spacer, .isso-comment > div.text-wrapper > .isso-comment-header a.permalink, .isso-comment > div.text-wrapper > .isso-comment-header .note, .isso-comment > div.text-wrapper > .isso-comment-header a.parent { color: gray !important; font-weight: normal; text-shadow: none !important; } .isso-comment > div.text-wrapper > .isso-comment-header .spacer:hover, .isso-comment > div.text-wrapper > .isso-comment-header a.permalink:hover, .isso-comment > div.text-wrapper > .isso-comment-header .note:hover, .isso-comment > div.text-wrapper > .isso-comment-header a.parent:hover { color: #606060 !important; } .isso-comment > div.text-wrapper > .isso-comment-header .note { float: right; } .isso-comment > div.text-wrapper > .isso-comment-header .author { font-weight: bold; color: #555; } .isso-comment > div.text-wrapper > .textarea-wrapper .textarea { margin-top: 0.2em; } .isso-comment > div.text-wrapper > div.text p { margin-top: 0.2em; } .isso-comment > div.text-wrapper > div.text p:last-child { margin-bottom: 0.2em; } .isso-comment > div.text-wrapper > div.text h1, .isso-comment > div.text-wrapper > div.text h2, .isso-comment > div.text-wrapper > div.text h3, .isso-comment > div.text-wrapper > div.text h4, .isso-comment > div.text-wrapper > div.text h5, .isso-comment > div.text-wrapper > div.text h6 { font-size: 130%; font-weight: bold; } .isso-comment > div.text-wrapper > div.textarea-wrapper .textarea { width: 100%; border: 1px 
solid #f0f0f0; border-radius: 2px; box-shadow: 0 0 2px #888; } .isso-comment > div.text-wrapper > .isso-comment-footer { display:none; } .isso-comment.isso-no-votes span.votes { display: none; } ================================================ FILE: blog/static/css/edition-1/main.css ================================================ h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { padding: 0; color: #a0565c; font-size: 95%; background-color: inherit; } .masthead-title { font-size: 1.25rem; display: inline; } .masthead p { font-size: 1.25rem; display: inline; margin: 0; margin-left: 1rem; padding: 0; line-height: 1; } .front-page-introduction { margin-bottom: 2rem; } .navigation { float: right; } .navigation img { height: 1em; vertical-align: baseline; display: inline-block; margin: 0; padding: 0; border-radius: 0; } main img { max-width: 100%; margin: auto; } .post { margin-bottom: 2em; } .post:last-child { margin-bottom: 0em; } .frontpage-section { margin-bottom: 2rem; } .posts { padding: 1.5rem 1rem 0.5rem 1rem; border-radius: 10px; margin-bottom: 2rem; margin-left: -0.5rem; margin-right: -0.5rem; } .posts.neutral { border: 2px solid #999; } .posts.subscribe { border: 2px solid #aaa; } .posts.edition-1 { border: 2px solid #aaa; background-color: #99ff0022; } .posts.bare-bones { border: 2px solid #66f; } .posts.memory-management { border: 2px solid #fc0 } .posts.exceptions { border: 2px solid #f66; } .posts.multitasking { border: 2px solid #556b2f; } .posts hr { margin: 2rem 0; } .post-summary { margin-bottom: 1rem; } .post-summary p { display: inline; } .read-more { margin-left: 5px; } .no-translation { margin-top: .3rem; color: #999999; } .post-category { margin-right: 0.5rem; text-transform: uppercase; font-size: 0.8rem; text-align: right; } .post-category.bare-bones { color: #55d; } .post-category.memory-management { color: #990; } .post-category.exceptions { color: #f33; } .post-category.multitasking { color: #556b2f; } .post-footer-support { margin-top: 
2rem; } .PageNavigation { font-size: 0.9em; display: table; width: 100%; overflow: hidden; } .PageNavigation a { display: table-cell; } .PageNavigation .previous { text-align: left; } .PageNavigation .next { text-align: right; } footer.footer { margin-top: 1rem; margin-bottom: 1rem; } .footnotes { font-size: 85%; } .footnotes li { margin-bottom: 1rem; } sup, sub { line-height: 0; } a.anchorjs-link:hover { text-decoration: none; } #toc-aside { display: none; } #toc-inline summary { margin-bottom: .2rem; } aside#all-posts-link { font-size: 90%; margin-top: 0.5rem; } @media (min-width: 80rem) { #toc-inline { display: none; } #toc-aside { display: block; width: 12rem; position: sticky; float: left; top: 3.5rem; margin-top: -4rem; margin-left: -15rem; font-size: 90%; line-height: 1.2; } #toc-aside li > a, #toc-aside h2 { opacity: .5; transition: opacity .5s; } #toc-aside:hover li > a, #toc-aside:hover h2 { opacity: 1; } #toc-aside li.active > a { font-weight: bold; } #toc-aside h2 { font-size: 110%; margin-bottom: .2rem; } #toc-aside ol { margin: 0 0 .2rem 0; padding: 0 0 0 1rem; list-style:none; } #toc-aside ol li a:before { content: ""; border-color: transparent #008eef; border-style: solid; border-width: 0.35em 0 0.35em 0.45em; display: block; height: 0; width: 0; left: -1em; top: 0.9em; position: relative; } #toc-aside.coarse li ol { display: none; } aside.page-aside-right { position: absolute; min-width: 11rem; max-width: 17rem; top: 4rem; margin-left: 45rem; margin-right: 2rem; font-size: 90%; } aside.page-aside-right .block { margin-bottom: 1.5rem; } aside.page-aside-right h2 { font-size: 110%; margin-bottom: .2rem; } aside.page-aside-right ul { margin: 0 0 .2rem 0; padding: 0 0 0 1rem; } aside.page-aside-right ul li { margin-top: .5rem; } aside#all-posts-link { position: fixed; top: 1.25rem; margin-top: 0; margin-left: -15rem; } } aside.page-aside-right time { color: #9a9a9a; } a code { color: #268bd2; } a.zola-anchor { opacity: 0; position: absolute; 
margin-left: -1.5em; padding-right: 1em; font-size: 0.6em; vertical-align: baseline; line-height: 2em; } :hover>a.zola-anchor { opacity: 1; text-decoration: none; } a.zola-anchor:hover { text-decoration: none; } div.note { padding: .7rem 1rem; margin: 1rem .2rem; border: 2px solid #6ad46a; border-radius: 5px; background-color: #99ff991f; } div.note p:last-child { margin-bottom: 0; } div.warning { padding: .7rem 1rem; margin: 1rem .2rem; border: 2px solid orange; border-radius: 5px; background-color: #ffa50022; } div.warning p:last-child { margin-bottom: 0; } form.subscribe { margin: 1rem; } div.subscribe-fields { display: flex; } form.subscribe input { padding: .5rem; border: 1px solid #e5e5e5; } form.subscribe input[type=email] { flex: 1; } form.subscribe input[type=submit] { padding: .25rem .5rem; cursor: pointer; } /* Asides */ aside.post_aside { font-style: italic; padding: 0rem 1rem 0rem; margin: .8rem 0; border-left: .1rem solid #e5e5e5; border-right: .1rem solid #e5e5e5; } details summary { cursor: pointer; } details summary h3, details summary h4, details summary h5, details summary h6 { display: inline; } .gh-repo-box { border: 1px solid #d1d5da; border-radius: 3px; padding: 16px; margin-top: 0.5rem; color: #586069; font-size: 80%; } .gh-repo-box .repo-link { color: #0366d6; font-weight: 600; font-size: 120%; } .gh-repo-box .subtitle { margin-bottom: 16px; } .gh-repo-box .stars-forks { margin-bottom: 0; } .gh-repo-box .stars-forks a { color: #586069; } .gh-repo-box .stars-forks a:hover { color: #0366d6; text-decoration: none; } .gh-repo-box .stars-forks svg { vertical-align: text-bottom; fill: currentColor; } .gh-repo-box .stars { display: inline-block; } .gh-repo-box .forks { display: inline-block; margin-left: 16px; } .gh-repo-box .sponsor { display: inline-block; margin-left: 16px; } .hidden { display: none; } .toc-comments-link { margin-top: .5rem; } h5 { font-style: italic; font-size: 0.9rem; } .gray { color: gray; } a strong { color: #268bd2; } 
================================================ FILE: blog/static/css/edition-1/poole.css ================================================ /* * ___ * /\_ \ * _____ ___ ___\//\ \ __ * /\ '__`\ / __`\ / __`\\ \ \ /'__`\ * \ \ \_\ \/\ \_\ \/\ \_\ \\_\ \_/\ __/ * \ \ ,__/\ \____/\ \____//\____\ \____\ * \ \ \/ \/___/ \/___/ \/____/\/____/ * \ \_\ * \/_/ * * Designed, built, and released under MIT license by @mdo. Learn more at * https://github.com/poole/poole. */ /* * Contents * * Body resets * Custom type * Messages * Container * Masthead * Posts and pages * Pagination * Reverse layout * Themes */ /* * Body resets * * Update the foundational and global aspects of the page. */ * { -webkit-box-sizing: border-box; -moz-box-sizing: border-box; box-sizing: border-box; } html, body { margin: 0; padding: 0; } html { font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; line-height: 1.5; } body { color: #515151; background-color: #fff; -webkit-text-size-adjust: 100%; -ms-text-size-adjust: 100%; } /* No `:visited` state is required by default (browsers will use `a`) */ a { color: #268bd2; text-decoration: none; } /* `:focus` is linked to `:hover` for basic accessibility */ a:hover, a:focus { text-decoration: underline; } /* Headings */ h1, h2, h3, h4, h5, h6 { margin-bottom: .5rem; font-weight: bold; line-height: 1.25; color: #313131; text-rendering: optimizeLegibility; } h1 { font-size: 2rem; } h2 { margin-top: 1rem; font-size: 1.5rem; } h3 { margin-top: 1.5rem; font-size: 1.25rem; } h4, h5, h6 { margin-top: 1rem; font-size: 1rem; } /* Body text */ p { margin-top: 0; margin-bottom: 1rem; } strong { color: #303030; } /* Lists */ ul, ol, dl { margin-top: 0; margin-bottom: 1rem; } /* Nested lists */ li ul, li ol, li dl { margin-bottom: 0; } li ul + p, li ol + p, li dl + p { margin-top: 1rem; } dt { font-weight: bold; } dd { margin-bottom: .5rem; } /* Misc */ hr { position: relative; margin: 1.5rem 0; border: 0; border-top: 1px solid #eee; border-bottom: 1px solid #fff; } 
abbr { font-size: 90%; font-weight: bold; color: #555; text-transform: uppercase; } abbr[title] { cursor: help; border-bottom: 1px dotted #e5e5e5; } /* Code */ code, pre { font-family: Menlo, Monaco, Consolas, monospace; } code { padding: .25em .5em; font-size: 90%; color: #bf616a; background-color: #f9f9f9; border-radius: 3px; } pre { display: block; margin-top: 0; margin-bottom: 1rem; padding: .5rem; font-size: .9rem; line-height: 1.4; white-space: pre; overflow: auto; word-wrap: normal; background-color: #f9f9f9; } pre code { padding: 0; font-size: 100%; color: inherit; background-color: transparent; } .highlight { margin-bottom: 1rem; border-radius: 4px; } .highlight pre { margin-bottom: 0; } /* Quotes */ blockquote { padding: .5rem 1rem; margin: .8rem 0; color: #7a7a7a; border-left: .25rem solid #e5e5e5; } blockquote p:last-child { margin-bottom: 0; } @media (min-width: 30rem) { blockquote { padding-right: 5rem; padding-left: 1.25rem; } } img { display: block; margin: 0 0 1rem; border-radius: 5px; max-width: 100%; color: grey; font-style: italic; } /* Tables */ table { margin-bottom: 1rem; width: 100%; border: 1px solid #e5e5e5; border-collapse: collapse; } td, th { padding: .25rem .5rem; border: 1px solid #e5e5e5; } tbody tr:nth-child(odd) td, tbody tr:nth-child(odd) th { background-color: #f9f9f9; } /* * Custom type * * Extend paragraphs with `.lead` for larger introductory text. */ .lead { font-size: 1.25rem; font-weight: 300; } /* * Messages * * Show alert messages to users. You may add it to single elements like a `

    `, * or to a parent if there are multiple elements to show. */ .message { margin-bottom: 1rem; padding: 1rem; color: #717171; background-color: #f9f9f9; } /* * Container * * Center the page content. */ .container { max-width: 45rem; padding-left: 1rem; padding-right: 1rem; margin-left: auto; margin-right: auto; } /* * Masthead * * Super small header above the content for site name and short description. */ .masthead { padding-top: 1rem; padding-bottom: 1rem; margin-bottom: 1rem; } .masthead-title { margin-top: 0; margin-bottom: 0; color: #505050; } .masthead-title a { color: #505050; } .masthead small { font-size: 75%; font-weight: 400; color: #c0c0c0; letter-spacing: 0; } /* * Posts and pages * * Each post is wrapped in `.post` and is used on default and post layouts. Each * page is wrapped in `.page` and is only used on the page layout. */ .page { margin-bottom: 4em; } /* Blog post or page title */ .page-title, .post-title, .post-title a { color: #303030; } .page-title, .post-title { margin-top: 0; } /* Meta data line below post title */ .post-date { display: block; margin-top: -.5rem; margin-bottom: 1rem; color: #9a9a9a; } /* Related posts */ .related { padding-top: 2rem; padding-bottom: 2rem; border-top: 1px solid #eee; } .related-posts { padding-left: 0; list-style: none; } .related-posts h3 { margin-top: 0; } .related-posts li small { font-size: 75%; color: #999; } .related-posts li a:hover { color: #268bd2; text-decoration: none; } .related-posts li a:hover small { color: inherit; } /* * Pagination * * Super lightweight (HTML-wise) blog pagination. `span`s are provide for when * there are no more previous or next posts to show. 
*/ .pagination { overflow: hidden; /* clearfix */ margin-left: -1rem; margin-right: -1rem; font-family: "PT Sans", Helvetica, Arial, sans-serif; color: #ccc; text-align: center; } /* Pagination items can be `span`s or `a`s */ .pagination-item { display: block; padding: 1rem; border: 1px solid #eee; } .pagination-item:first-child { margin-bottom: -1px; } /* Only provide a hover state for linked pagination items */ a.pagination-item:hover { background-color: #f5f5f5; } @media (min-width: 30rem) { .pagination { margin: 3rem 0; } .pagination-item { float: left; width: 50%; } .pagination-item:first-child { margin-bottom: 0; border-top-left-radius: 4px; border-bottom-left-radius: 4px; } .pagination-item:last-child { margin-left: -1px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; } } ================================================ FILE: blog/static/handling-exceptions-with-naked-fns.html ================================================ ================================================ FILE: blog/static/js/edition-1/main.js ================================================ window.onload = function() { show_lang_selector(); var container = document.querySelector('#toc-aside'); if (container != null) { resize_toc(container); toc_scroll_position(container); window.onscroll = function() { toc_scroll_position(container) }; } } function resize_toc(container) { var containerHeight = container.clientHeight; var resize = function() { if (containerHeight > document.documentElement.clientHeight - 100) { container.classList.add('coarse'); } else { container.classList.remove('coarse'); } }; resize(); var resizeId; window.onresize = function() { clearTimeout(resizeId); resizeId = setTimeout(resize, 300); }; } function toc_scroll_position(container) { if (container.offsetParent === null) { // skip computation if ToC is not visible return; } var items = container.querySelectorAll("li") // remove active class for all items for (item of container.querySelectorAll("li")) { 
item.classList.remove("active"); } // look for active item var site_offset = document.documentElement.scrollTop; var current_toc_item = null; for (item of container.querySelectorAll("li")) { if (item.offsetParent === null) { // skip items that are not visible continue; } var anchor = item.firstElementChild.getAttribute("href"); var heading = document.querySelector(anchor); if (heading.offsetTop <= (site_offset + document.documentElement.clientHeight / 3)) { current_toc_item = item; } else { break; } } // set active class for current ToC item if (current_toc_item != null) { current_toc_item.classList.add("active"); } } function show_lang_selector() { var show_lang_selector = false; for (language_selector of document.querySelectorAll('#language-selector li')) { var lang = language_selector.getAttribute("data-lang-switch-to"); if (this.navigator.languages.includes(lang)) { language_selector.classList.remove("hidden"); show_lang_selector = true } } if (show_lang_selector) { document.querySelector("#language-selector").classList.remove("hidden") } } ================================================ FILE: blog/static/js/edition-2/main.js ================================================ window.onload = function () { let container = document.querySelector('#toc-aside'); if (container != null) { resize_toc(container); toc_scroll_position(container); window.onscroll = function () { toc_scroll_position(container) }; } let theme = localStorage.getItem("theme"); if (theme != null) { setTimeout(() => { set_giscus_theme(theme) }, 500); } } function resize_toc(container) { let containerHeight = container.clientHeight; let resize = function () { if (containerHeight > document.documentElement.clientHeight - 100) { container.classList.add('coarse'); } else { container.classList.remove('coarse'); } }; resize(); let resizeId; window.onresize = function () { clearTimeout(resizeId); resizeId = setTimeout(resize, 300); }; } function toc_scroll_position(container) { if 
(container.offsetParent === null) { // skip computation if ToC is not visible return; } // remove active class for all items for (item of container.querySelectorAll("li")) { item.classList.remove("active"); } // look for active item let site_offset = document.documentElement.scrollTop; let current_toc_item = null; for (item of container.querySelectorAll("li")) { if (item.offsetParent === null) { // skip items that are not visible continue; } let anchor = item.firstElementChild.getAttribute("href"); let heading = document.querySelector(anchor); if (heading.offsetTop <= (site_offset + document.documentElement.clientHeight / 3)) { current_toc_item = item; } else { break; } } // set active class for current ToC item if (current_toc_item != null) { current_toc_item.classList.add("active"); } } function toggle_lights() { if (document.documentElement.getAttribute("data-theme") === "dark") { set_theme("light") } else if (document.documentElement.getAttribute("data-theme") === "light") { set_theme("dark") } else { set_theme(window.matchMedia("(prefers-color-scheme: dark)").matches ? "light" : "dark") } } function set_theme(theme) { document.documentElement.setAttribute("data-theme", theme) set_giscus_theme(theme) localStorage.setItem("theme", theme) } function clear_theme_override() { document.documentElement.removeAttribute("data-theme"); set_giscus_theme("preferred_color_scheme") localStorage.removeItem("theme") } function set_giscus_theme(theme) { let comment_form = document.querySelector("iframe.giscus-frame"); if (comment_form != null) { comment_form.contentWindow.postMessage({ giscus: { setConfig: { theme: theme } } }, "https://giscus.app") } } ================================================ FILE: blog/templates/404.html ================================================ {% extends "base.html" %} {% block title %}Page not found | {{ config.title }}{% endblock title %} {% block main %}

    Page not found

    Sorry, this address is not valid.

    If you followed a link on this site, please report it!

    {% endblock main %} ================================================ FILE: blog/templates/auto/forks.html ================================================ ================================================ FILE: blog/templates/auto/recent-updates.html ================================================ ================================================ FILE: blog/templates/auto/stars.html ================================================ ================================================ FILE: blog/templates/auto/status-updates-truncated.html ================================================ ================================================ FILE: blog/templates/auto/status-updates.html ================================================ ================================================ FILE: blog/templates/base.html ================================================ {% extends "edition-2/base.html" %} ================================================ FILE: blog/templates/edition-1/base.html ================================================ {% block title %}{% endblock title %} (First Edition)

    {{ config.title }} (First Edition)

    {{ config.extra.subtitle | replace(from=" ", to=" ") | safe }}

    {% block main %}{% endblock main %}
    {% block after_main %}{% endblock after_main %}

    © . All rights reserved. Contact
    ================================================ FILE: blog/templates/edition-1/comments/allocating-frames.html ================================================ {% raw %}
    Philipp Oppermann

    There is some interesting discussion on reddit.

    Tobias Schottdorf

    > Note that we need to clone the iterator because the order of areas in the memory map isn't specified.

    Could you elaborate on that? I'm probably getting something wrong, but I can't reproduce issues omitting the clone:

    https://gist.github.com/tschottdorf/e1e0a4091136dd281ab3

    Philipp Oppermann

    The `clone` is required because the min_by function consumes the iterator. I updated that sentence. My initial reasoning was that we can't just use the next area in the iterator because Multiboot doesn't specify an increasing ordering. Thus we need to use `min_by` and clone the iterator.
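The point about `min_by` consuming the iterator can be sketched as follows. This is a minimal illustration, not the post's actual allocator: `MemoryArea`, `AreaPicker`, and `choose_next_area` are hypothetical names, and `min_by_key` stands in for `min_by` since both take the iterator by value.

```rust
/// Hypothetical stand-in for a Multiboot memory area.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MemoryArea {
    base_addr: u64,
    length: u64,
}

/// Hypothetical holder that keeps the area iterator in a field.
struct AreaPicker<'a> {
    areas: std::slice::Iter<'a, MemoryArea>,
    current: Option<&'a MemoryArea>,
}

impl<'a> AreaPicker<'a> {
    /// Picks the lowest area whose last byte is at or above `min_addr`.
    fn choose_next_area(&mut self, min_addr: u64) {
        // `min_by_key` takes the iterator by value. We only have `&mut self`,
        // so we search a clone and leave `self.areas` intact for later calls.
        self.current = self
            .areas
            .clone()
            .filter(|area| area.base_addr + area.length - 1 >= min_addr)
            .min_by_key(|area| area.base_addr);
    }
}

fn main() {
    // Deliberately unordered, since no increasing order is guaranteed.
    let areas = [
        MemoryArea { base_addr: 0x100000, length: 0x1000 },
        MemoryArea { base_addr: 0x0, length: 0x9000 },
        MemoryArea { base_addr: 0x50000, length: 0x2000 },
    ];
    let mut picker = AreaPicker { areas: areas.iter(), current: None };

    picker.choose_next_area(0x9000);
    println!("{:x?}", picker.current);

    // A second search still sees every area because only the clone was consumed.
    picker.choose_next_area(0x52000);
    println!("{:x?}", picker.current);
}
```

Cloning a slice iterator only copies two pointers, so the extra `clone` is cheap.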

    Vikas Reddy

    Later you identity map the VGA buffer address, but the frame allocator's allocate_frame() function doesn't account for that. Maybe some time later, when you allocate more frames, it may be allocated to some other page...

    Philipp Oppermann

    Good catch! Fortunately it is part of the memory hole below the 1MiB mark. Thus it is never allocated by the frame allocator.

    Errma Gerrd

    if let Some(area) = self.current_area {

    produces an `if let` arms have incompatible types [E0308] error for me (expected type `_`, found type `()`).

    is this due to my rust version (rustc 1.11.0-nightly (0554abac6 2016-06-10))? :o

    EDIT:

    i found my error!
    // `frame` was not valid, try it again with the updated `next_free_frame`
    self.allocate_frame();

    should be ->

    // `frame` was not valid, try it again with the updated `next_free_frame`
    return self.allocate_frame();

    in my own code of course :)

    Gerald King

    You can write simply `self.allocate_frame()` without `return`, but also without trailing semicolon (it's important). In Rust, it is considered better practice than using `return` as the last statement.
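A small sketch of the difference, using a hypothetical `next_valid` function rather than the post's `allocate_frame`: the recursive call is the last expression of its branch and has no trailing semicolon, so its value becomes the function's return value.

```rust
/// Returns the first value at or above `candidate` that `is_valid` accepts.
/// (Hypothetical helper, only here to show the tail-expression rule.)
fn next_valid(candidate: u32, is_valid: fn(u32) -> bool) -> u32 {
    if is_valid(candidate) {
        candidate
    } else {
        // No `return` and no trailing semicolon: the value of this call is
        // the value of the `else` block and therefore of the function.
        // Writing `next_valid(candidate + 1, is_valid);` instead would make
        // the block evaluate to `()` and trigger error E0308.
        next_valid(candidate + 1, is_valid)
    }
}

fn main() {
    println!("{}", next_valid(5, |n| n % 4 == 0));
}
```

An explicit `return next_valid(candidate + 1, is_valid);` compiles to the same thing, so the choice is purely stylistic.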

    Errma Gerrd

    yes... that semicolon killed me :D

    I think I'll stick with `return expression;` because it looks a lot more explicit :o
    but thanks for the clue anyway! :)

    Gerald King

    Not at all :) I'm a Rust newbie too, and that syntax seems a bit unfamiliar for C++, Java, Python etc. programmers. (But Ruby has such a feature.)

    Errma Gerrd

    I use CoffeeScript at work which has this feature too. But I barely use it.
    Using return IMHO has the advantage that it gets syntax highlighted in another color and you can scan more easily for function return points :)

    Gerald King

    You are probably right about this :)

    Gabriel Eiseman

    These articles of yours are very good: I'm following them in C because they are better than any C tutorial I have yet found. However, I ran into a small problem with this one. In your Multiboot crate you define the ELF symbols tag as having 3 32 bit integers where the multiboot specification pdf you link to specifies 4 16 bit integers. The layout in your crate seems to be correct though, because I defined a struct in C (using __attribute__((packed))) according to the pdf and it did not work, but when I mirrored your layout it did.

    Philipp Oppermann

    Thanks for the nice words :). Honestly, I'm not quite sure about this. It's been a while since I wrote this code…

    The multiboot specification (PDF) says:

    This tag contains section header table from an ELF kernel, the size of each entry, number of entries, and the string table used as the index of names. They correspond to the ‘shdr_*’ entries (‘shdr_num’, etc.) in the Executable and Linkable Format (elf) specification in the program header.

    So the `shdr_` entries are just the entries of the ELF header. The problem is that the multiboot specification uses 32-bit elf files implicitly, but our kernel is a 64-bit elf. There seem to be some format differences… However, the ELF64 specification (PDF) uses u16s too…

    In the Readme of the multiboot2 crate I wrote:

    Note that this format differs from the description in the Multiboot specification because it seems to be wrong for ELF 64 kernels: The number of entries, entry size, and string table fields seem to be u32 instead of u16 (but I'm not sure on this).

    So I wasn't even sure on this when I wrote it :D.

    I did some digging in the GRUB2 source and found the definition of `multiboot_tag_elf_sections`. Like the multiboot2 crate, it uses 3 u32s. So it seems like the crate is correct.
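The layout with three `u32` fields can be sketched like this. The struct and function names are hypothetical, the field order follows the description above (number of entries, entry size, string table index, after the usual tag type and size), and the tag type value `9` in the sample data is an assumption for illustration only.

```rust
use std::mem::size_of;

/// Fixed-size start of the ELF-sections tag with the three u32 fields
/// discussed above. Names are illustrative, not the multiboot2 crate's.
#[repr(C)]
#[derive(Debug, PartialEq)]
struct ElfSectionsTagHeader {
    typ: u32,
    size: u32,
    number_of_sections: u32,
    entry_size: u32,
    string_table_index: u32,
}

/// Reads a little-endian u32 at `offset`.
fn read_u32(bytes: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes([
        bytes[offset],
        bytes[offset + 1],
        bytes[offset + 2],
        bytes[offset + 3],
    ])
}

/// Parses the header from raw tag bytes, or returns `None` if too short.
fn parse_header(bytes: &[u8]) -> Option<ElfSectionsTagHeader> {
    if bytes.len() < size_of::<ElfSectionsTagHeader>() {
        return None;
    }
    Some(ElfSectionsTagHeader {
        typ: read_u32(bytes, 0),
        size: read_u32(bytes, 4),
        number_of_sections: read_u32(bytes, 8),
        entry_size: read_u32(bytes, 12),
        string_table_index: read_u32(bytes, 16),
    })
}

fn main() {
    // Sample tag: assumed type 9, two 64-byte section headers, strings in #1.
    let mut raw = Vec::new();
    for field in [9u32, 20 + 2 * 64, 2, 64, 1] {
        raw.extend_from_slice(&field.to_le_bytes());
    }
    println!("{:?}", parse_header(&raw));
}
```

With four `u16` fields instead, every field after the first would be read from the wrong offset, which matches the symptom described in the question.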

    Redrield

    When I get to the part when we're getting the kernel start and end, I'm getting a triple fault and the OS is stuck in a bootloop. This is my Bochs log https://gist.github.com/Red...

    Philipp Oppermann

    Hmm, I don't use Bochs, but according to this page of the Bochs user manual it seems to be a “no atapi cdrom found“ error. I have no idea why it occurs though… Does it work in QEMU?

    Johan Montelius

    Could we not use the start_address() and end_address() that are available in the BootInformation instead of doing the address calculation ourselves?

    let multiboot_start = boot_info.start_address();

    let multiboot_end = boot_info.end_address();

    Philipp Oppermann

    Sure! I think they were introduced after this post was written and I forgot to update this post.

    Madeleine Berner

    Hi! Your link for "re-export" under the section "Testing it" is broken.

    Are you supposed to do something with the instruction "In order to test it in main, we need to re-export the AreaFrameAllocator in the `memory` module."? I keep getting this error: "Could not find AreaFrameAllocator in memory"

    in src/lib.rs on this row:

    let mut frame_allocator = memory::AreaFrameAllocator::new(...

    Madeleine Berner

    Hi (again)! I just found the code to add to fix my problem. Perhaps you can add it to your blog post?

    From this commit I added the new lines from the file mod.rs:

    https://github.com/phil-opp/blog_os/commit/9f1a69cafa8a1b09dd71cbf3bf7493e388576391#diff-8c6f6418d9ea96c33ce93b05462bfd65
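For readers hitting the same error, a re-export generally looks like the sketch below. The module layout and the `next_free_frame` field are assumptions made to keep the example self-contained; the exact lines of the linked commit may differ.

```rust
mod memory {
    // Re-export: makes the type reachable as `memory::AreaFrameAllocator`
    // even though the submodule that defines it is private.
    pub use self::area_frame_allocator::AreaFrameAllocator;

    mod area_frame_allocator {
        /// Simplified stand-in for the allocator from the post.
        pub struct AreaFrameAllocator {
            pub next_free_frame: usize,
        }

        impl AreaFrameAllocator {
            pub fn new(first_frame: usize) -> AreaFrameAllocator {
                AreaFrameAllocator { next_free_frame: first_frame }
            }
        }
    }
}

fn main() {
    // Without the `pub use` line this path fails to resolve.
    let allocator = memory::AreaFrameAllocator::new(3);
    println!("{}", allocator.next_free_frame);
}
```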

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments/better-exception-messages.html ================================================ {% raw %}
    SmallEgg

    Yay ! New post !

    I'm so impatient to read the next one ^^
    Thank you a lot for what you're doing :)

    Philipp Oppermann

    You're welcome :). I plan to start writing on the next one in the next few days, but I can't promise anything.

    SmallEgg

    Oh so there will be no need to wait for months before the next post ?

    Great !

    Philipp Oppermann

    I hope so ;).

    SmallEgg

    Some news about the next post ? :D

    Philipp Oppermann

    The last weeks were quite busy, so I couldn't write on it much. I hope that I can find some time on the weekend.

    Aaron Levine

    Looking forward to the next post! Thanks for all your hard work on this, this tutorial series is incredibly enlightening.

    Philipp Oppermann

    Thanks a lot! The next post is nearly done: https://github.com/phil-opp...

    Philipp Oppermann

    See also the discussions on hacker news and /r/rust!

    Hoschi

    Hey everybody!

    I am stuck with #reproducing-the-bug-in-qemu. I cannot reproduce the alignment issue, neither in QEMU (with -enable-kvm) nor on real hardware.

    Does anyone have an idea how to force the misalignment?

    Philipp Oppermann

    I think that this could be a side effect of the recent update of our VGA driver code, which introduces volatile writes.

    With volatile, the compiler can no longer use SSE instructions to combine multiple VGA buffer writes. The SSE instructions are the instructions that require the 16 byte alignment. So without them, the error no longer occurs. (Of course the issue is still there. We just need a different code sample to trigger it.)

    I'll try to take a closer look at this issue in the next few days. Thanks a lot for reporting this!
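The volatile writes mentioned above can be illustrated with a sketch like the following. It writes to an ordinary array instead of the real VGA buffer, and `write_cells` is a hypothetical helper rather than the blog's `Writer`. Each cell is stored with its own `write_volatile`, which the compiler must not merge with neighbouring stores into one wide (for example SSE) store.

```rust
use std::ptr;

/// Writes `text` into `buffer` as VGA-style cells (color byte in the high
/// half, ASCII byte in the low half), one volatile store per cell.
fn write_cells(buffer: &mut [u16], text: &[u8], color: u8) {
    for (cell, &byte) in buffer.iter_mut().zip(text) {
        let value = ((color as u16) << 8) | (byte as u16);
        // SAFETY: `cell` is a valid, aligned, exclusive reference to a u16.
        unsafe { ptr::write_volatile(cell as *mut u16, value) };
    }
}

fn main() {
    let mut buffer = [0u16; 4];
    write_cells(&mut buffer, b"Hi", 0x0f);
    println!("{:04x?}", buffer);
}
```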

    Philipp Oppermann

    I've updated the post: https://github.com/phil-opp...

    We now add some garbage code to the `divide_by_zero_handler`, which should compile to a `movaps` instruction again. This should lead to a bootloop on real hardware. Does it work for you?

    Hoschi

    Hey Philipp!

    Thanks for the fast reply, I already thought I had a typo somewhere.

    With your update, I can reproduce the reboot loop now.

    Thanks again!
    Christian

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments/catching-exceptions.html ================================================ {% raw %}
    Michelle Ramur

    Phil these tutorials are awesome, keep going!

    Philipp Oppermann

    Thanks a lot :)

    Fabio Cesar Canesin

    What is your view on Redox ? https://github.com/redox-os...

    Philipp Oppermann

    I think it's very impressive how complete it already is and how fast it progresses. Really excited for its future!

    Philipp Oppermann

    There's some great discussion on /r/rust and hackernews!

    Ben Anderson

    Awesome tutorials! Please please please keep writing them :)

    Philipp Oppermann

    Thanks so much! Sure I will :)

    Ben Anderson

    I'm left wondering where you got all this knowledge from? Is such low level systems programming a hobby or a job?

    Philipp Oppermann

    I'm a computer science student and I've taken some great OS courses. It's also a hobby of mine and I've experimented a lot with toy x86 kernels and Rust. Most of the x86 information is from the OSDev wiki and the Intel/AMD manuals.

    I also have a great research assistant job since November, where I try to bring Rust to an ARM Cortex-M7 board.

    Joe ST

    This is one of my favourite series, you're really doing such a great job of explaining bare-metal stuff to us all thank you!

    Philipp Oppermann

    Thanks, glad you're liking it!

    Andrew Nurse

    I actually encountered the println deadlock earlier while debugging something and solved it in a slightly different way. The problem generally occurs when a second println is encountered while evaluating one of the arguments to an outer println. So, I changed println to call a helper function called print_fmt which took in a core::fmt::Arguments. I used the format_args macro (https://doc.rust-lang.org/n...) to evaluate the arguments and produce the core::fmt::Arguments, which I pass to print_fmt. Only within print_fmt do I actually take the lock on the WRITER, which means that all the expressions in the println! have been fully evaluated.

    The advantage being you can nest println's as far as you want and you won't deadlock :)

    See my implementation here: https://github.com/anurse/Oxygen/blob/dfda170b3f3d45eca20d4a1366e5d62384d7b2e4/src/vga.rs

    Great posts by the way, loving the series!
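The approach described in this comment can be sketched roughly as follows. This is not the linked implementation: it uses `std::sync::Mutex` and a `String` as a stand-in for the VGA writer, and the macro name `kprint!` and the function `noisy_value` are made up for the example.

```rust
use std::fmt::{self, Write};
use std::sync::Mutex;

/// Stand-in for the global VGA writer.
static WRITER: Mutex<String> = Mutex::new(String::new());

/// Takes the lock only here, after the caller's argument expressions
/// have already been evaluated by `format_args!`.
fn print_fmt(args: fmt::Arguments) {
    WRITER.lock().unwrap().write_fmt(args).unwrap();
}

macro_rules! kprint {
    ($($arg:tt)*) => {
        print_fmt(format_args!($($arg)*))
    };
}

/// An argument expression that itself prints something.
fn noisy_value() -> u32 {
    kprint!("[computing] ");
    42
}

fn main() {
    // `noisy_value()` runs (and prints) before the outer call locks WRITER,
    // so the nested print does not deadlock.
    kprint!("value = {}", noisy_value());
    print!("{}", WRITER.lock().unwrap());
}
```

One caveat of this sketch: `Display` implementations of the arguments still run while the lock is held, so printing from inside a `fmt` method would deadlock again.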

    Philipp Oppermann

    Thanks! I really like your solution. I wanted to fix the nested-println-deadlocks too, but totally forgot to do it… However, the print_error solution also has advantages. For example, it always displays the error, even when we handle asynchronous hardware interrupts in the future.

    So I think that I'd like to keep the print_error solution but also integrate your println changes (in order to fix deadlocks on nested printlns). I'm just not sure how to integrate it. Maybe I'll just add another section to the end of this post (just before `What's next`)…What do you think?

    Philipp Oppermann

    I just thought about this again and I think I will switch to your solution. It's much cleaner and avoids the output data race completely. Thanks again for proposing it!

    Edit: Implemented in #249

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments/double-faults.html ================================================ {% raw %}
    Blaž Šnuderl

    Hey. This post seems to break returning from interruptions.

    Eg. int!(3) produces a double fault while it returned correctly in the previous post

    Aside from that, thanks for this series. It is superbly written with lots of useful info :)

    Philipp Oppermann

    You're right, I don't know how I didn't notice this.

    It seems like the problem is our new GDT, which doesn't have a data segment descriptor. This alone is fine, but the `ss` register still holds a value which is saved to the stack when an exception occurs and checked by `iretq` when we return. One possible solution is to load 0 into the `ss` register after entering long mode (null descriptors are explicitly allowed in `ss` in 64-bit mode). Another solution is to add a valid writable data segment to our new GDT.

    This issue is tracked in #277 and I plan to fix it in the next few days. Thanks a lot for reporting!

    > Aside from that, thanks for this series. It is superbly written with lots of useful info :)

    Thanks so much!

    Philipp Oppermann

    For all people that have the same problem:

    The problem is that the ss segment register still contains a selector of the previous GDT that is no longer valid. The iretq instruction expects a valid data segment selector or a null selector. So the easiest way to fix this problem is to add the following to the long_mode_init.asm:

    long_mode_start:
    ; load 0 into all data segment registers
    mov ax, 0
    mov ss, ax
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax

    (You only need to reload ss. However, the other registers also no longer point to valid data segments, so it's cleaner to invalidate them, too.)

    boomshroom

    I just asked on the irc because this seemed like it could be possible with lifetimes. They suggested using PhantomData to add a lifetime parameter to the index and the stack pointers could be replaced by Option<&_>, which is easily obtained by an as_ref() method on a raw pointer.

    The biggest issue here is verifying that the Option is the correct size.

    Philipp Oppermann

    > The biggest issue here is verifying that the Option is the correct size.

    As far as I know, an Option<&X> always has the same size as a &X, since references implement the NonZero trait. We could also use a struct StackPointer(usize) and implement NonZero for it. Then an Option<StackPointer> has the same size as a usize.
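The size claim can be checked with a short sketch. It assumes today's stable `std::num::NonZeroUsize` as a stand-in for the `NonZero` mechanism mentioned in the comment, and `StackPointer` here is an illustrative type, not code from the post.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// Wrapper that can never hold 0, so `None` can be encoded as 0.
#[repr(transparent)]
struct StackPointer(NonZeroUsize);

impl StackPointer {
    /// Returns `None` for a null address.
    fn new(addr: usize) -> Option<StackPointer> {
        NonZeroUsize::new(addr).map(StackPointer)
    }

    fn addr(&self) -> usize {
        self.0.get()
    }
}

fn main() {
    // References are never null, so the Option needs no extra tag.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
    // The same niche is available through the NonZeroUsize wrapper.
    assert_eq!(size_of::<Option<StackPointer>>(), size_of::<usize>());

    let sp = StackPointer::new(0x121dd0).expect("non-null address");
    println!("{:#x}", sp.addr());
}
```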

    However, I don't think that it suffices to add a lifetime parameter to the index. For example, we could create two static TSSs A and B. Now we can load TSS A in the CPU but use an index from TSS B in our IDT.

    Thank you for series. It is great

    Rajivteja Nagipogu

    Hi Phill sir. I implemented the OS following the blog till here. So, I was trying to implement system calls to the OS. I tried to use the "syscall" crate and implement a "write" syscall but it caused a Double fault. Is this due to the page fault? Here is the stack frame

    IP : 0x110e43
    code_segment : 8
    cpu_flags : 0x200006
    stack_pointer : 0x121dd0
    stack_segment : 0

    Guard page is at 0x11b000

    Philipp Oppermann

    The problem might be that you didn't define a handler function for the syscall interrupt or that the privilege level of the IDT entry doesn't allow invocations from userland. In that case, a general protection exception occurs. If you didn't define a handler for this exception, it causes a double fault.

    boris

    Very interesting series, thank you. Please feel free to add some sort of subscriber feed (RSS,Atom,..)! If there is one, Firefox didn't find it. Haven't tried inoreader yet.

    Philipp Oppermann

    There should be a feed at https://os.phil-opp.com/rss.xml

    Nitin John

    Hey! I've been going through your blog, and I think it's splendidly written. I was wondering when the next post would be out

    Philipp Oppermann

    Thanks a lot! I'm currently working on a second edition of this blog, which reorders the posts (exceptions before page tables) and uses its own bootloader. So the plan is to rewrite the earlier posts, reuse the posts about exceptions, and then write some new posts about hardware interrupts and keyboard input.

    I created an issue to track this.

    Anonym

    Waiting next..... :)

    Anonym

    thanks

    Are you running your own blog post ? i've reading it the first half and already want to point out, this is all i need from such a great programmer. Otherways i would have asked my boss for such a course, but i think this can bring me to the path i wanted, i am a webdeveloper and want to serve json files on the internet. But to be ISO 27001 compliant i needed this information...

    Philipp Oppermann
    Are you running your own blog post ?

    Sorry, I don't understand what you mean.

    Wanted to share a project I am working on, I started from this awesome blog! https://github.com/arbel03/os

    Philipp Oppermann

    Looks like you created your own bootloader and already have some kind of filesystem. Really cool!

    We have just created the rust-osdev organization on Github, where we plan to host and maintain all kinds of libraries needed for OS development in Rust (e.g. the x86_64 crate, a bootloader, etc.). Let me know if you'd like to become a member, maybe we can join forces.

    Dan Cross

    Sadly, this appears to no longer compile, as some of the dependencies are now rather different and some language features have changed. I know you're busy with the second edition effort, but is there any chance there are updates waiting in the wings to the first edition parts?

    Philipp Oppermann

    Sorry, I don't have the time to keep the first version up to date. I try my best to incorporate the first edition posts into the second edition soon, but it will take some time.

    Dan Cross

    I understand. It's a great service to the community that this exists at all; would you accept pull requests to fix code while the second edition is still being prepared?

    Philipp Oppermann

    Sure!

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments/entering-longmode.html ================================================ {% raw %}
    Daniel

    Hi, thank you for the blog posts, finding them really accessible and interesting.

    The test_long_mode function doesn't look quite right:


    test_long_mode:
    mov eax, 0x80000000 ; Set the A-register to 0x80000000.
    cpuid ; CPU identification.
    cmp eax, 0x80000001 ; Compare the A-register with 0x80000001.
    jb .no_long_mode ; It is less, there is no long mode.
    mov eax, 0x80000000 ; Set the A-register to 0x80000000.
    cpuid ; CPU identification.
    cmp eax, 0x80000001 ; Compare the A-register with 0x80000001.
    jb .no_long_mode ; It is less, there is no long mode.
    ret


    ^ should probably be (according to your linked OSDEV page):


    test_long_mode:
    mov eax, 0x80000000 ; Set the A-register to 0x80000000.
    cpuid ; CPU identification.
    cmp eax, 0x80000001 ; Compare the A-register with 0x80000001.
    jb .no_long_mode ; It is less, there is no long mode.
    mov eax, 0x80000001 ; Set the A-register to 0x80000001.
    cpuid ; CPU identification.
    test edx, 1 << 29 ; Test if the LM-bit, which is bit 29, is set in the D-register.
    jz .no_long_mode ; They aren't, there is no long mode.
    ret
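The decision logic of the corrected check can also be written as a pure function, which makes the two steps easier to see. This is only a sketch: the function name and parameters are made up for illustration, and the CPUID values are passed in rather than read from the CPU.

```rust
/// Bit 29 of EDX from CPUID leaf 0x80000001 is the long mode (LM) bit.
pub const LONG_MODE_BIT: u32 = 1 << 29;

/// Mirrors the two checks of the corrected assembly.
///
/// `max_extended_leaf`: EAX returned by CPUID with EAX = 0x80000000.
/// `extended_edx`: EDX returned by CPUID with EAX = 0x80000001.
pub fn long_mode_supported(max_extended_leaf: u32, extended_edx: u32) -> bool {
    // Step 1: leaf 0x80000001 must exist at all (the `jb .no_long_mode`).
    if max_extended_leaf < 0x8000_0001 {
        return false;
    }
    // Step 2: the LM bit must be set (the `test edx, 1 << 29`).
    (extended_edx & LONG_MODE_BIT) != 0
}
```

The original listing repeated step 1 twice and never reached step 2, so it never looked at the LM bit at all.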

    Philipp Oppermann

    You're right, thank you! I created a GitHub issue and will fix it soon: https://github.com/phil-opp...

    Daniel Ferguson

    Thank you! For any further issues I run into would you prefer I posted to the github page?

    I ran into another snag, at the end of the paging setup section (just before the GDT section) you state:

    "To test it we execute make run. If the green OK is still printed, we have successfully enabled paging!"

    I'm assuming (based on trying it out) it would not be bootable at that stage, though it may in some way be down to the specifics of my setup or mistakes on my part.

    When run with qemu it repeatedly restarts and doesn't reach an 'OK' boot.
    Removing the `enable_paging` call allows the OS to boot properly, though the paging will not be set up. Further forwards, once the GDT is implemented, the OS once more boots without a hitch.

    I checked out the repo and stripped out the later steps to compensate for any errors I made in copying.

    Thanks again for these posts.

    Philipp Oppermann

    Post issues wherever you want (and thank you for doing it).

    Hmm, I can't reproduce it on my machine. I checked out commit 457a613 (see link below) and it ran without problems. Could you try the code from this commit?

    Link to 457a613: https://github.com/phil-opp...

    Daniel Ferguson

    Apologies, I did a quick diff against that commit (which runs fine) and found I missed the `align 4096` in `section .bss`. Putting that in fixed it (you did include it in this post), fault is all mine.

    Thanks again!

    emk1024

    This is fun!

    Continuing my saga of trying to get this running under Ubuntu 14.04 LTS on a MacBook Pro, if you find that enabling long paging causes an infinite reboot cycle (triple fault?) in QEMU, then you might want to check your qemu-system-x86_64 version. Version 2.0 will reboot infinitely as soon as you try to turn on paging. Version 2.1.2, however, works fine.

    Is this perhaps a problem with huge pages? Would it help to add another feature test?

    In order to prevent more debugging fun, I've downloaded and built your blog_os repo, and I can now see it print "Hello world", so it should be smooth sailing from here. :-)

    Once again, thank you for a cool series of blog posts! Are there any OS development books that you recommend for ideas on further enhancing this basic system?

    Philipp Oppermann

    You're right, support for 1GB pages was introduced in QEMU 2.1 in 2014. Intel CPUs support it since Westmere (2010).

    There is indeed a way to test support: CPUID 0x80000001, EDX bit 26. But I'm not quite sure if it's good to rely on such a "new" feature at all... Maybe I change it to use 2MB pages instead...
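A feature test along these lines could look as follows. This is a hedged sketch rather than code from the post: the names are hypothetical, and the EDX value of CPUID leaf 0x80000001 is passed in as a plain number so the logic can be read without touching real hardware.

```rust
/// Bit 26 of EDX from CPUID leaf 0x80000001 signals 1GiB page support.
pub const GIB_PAGES_BIT: u32 = 1 << 26;

/// True if the given EDX value advertises 1GiB pages.
pub fn gib_pages_supported(extended_edx: u32) -> bool {
    (extended_edx & GIB_PAGES_BIT) != 0
}

/// Picks the largest usable page size in bytes, falling back to 2MiB
/// pages when 1GiB pages are not advertised.
pub fn largest_page_size(extended_edx: u32) -> u64 {
    if gib_pages_supported(extended_edx) {
        1 << 30
    } else {
        1 << 21
    }
}
```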

    I opened an issue for it. Thank you very much for the hint!

    Edit: Updated the code and article to use 2MiB pages instead of 1GiB pages. It now works on my old PC from 2005 again :).

    Philipp Oppermann

    Are there any OS development books that you recommend for ideas on further enhancing this basic system?

    Well, there is the Three Easy Pieces book I linked in the post, which gives a theoretical overview over different OS concepts. Then there's the little book about OS development, which is more practical and contains C example code. Of course there are many paid books, too.

    Besides books, the OSDev Wiki is also a good resource for many topics. Looking at the source of e.g. Redox can be helpful, too.

    For exotic ideas, I really like the concept of Phantom OS and Rust's memory safety might allow something similar… We'll see ;)

    Tom Smeding

    There's still some text in the article referring to the gigabyte page, like "Now the first gigabyte of our kernel is identity mapped", but otherwise, immense thanks for this article; even though I'm not going to use Rust, these two articles actually got me up and running in long mode *without* hassle!

    Philipp Oppermann

    That's correct, actually. We mapped the first gigabyte through 512 2MiB pages instead of one 1GiB page. So the outcome is the same but the code is more complicated...

    But I see that this can cause confusion, so I will clarify it.

    Tom Smeding

    Oh of course, silly me :P

    Tobias Schottdorf

    > An entry in the P4, P3, P2, and P1 tables consists of the page aligned 52-bit physical address of the page/next page table and the following bits that can be OR-ed in:

    I can't quite make sense of that - so the physical addresses which are available to virtual addressing are only 52bit (instead of all 64bit)? There appear to be 24 flags which can be or'ed in, but wouldn't that necessitate overwriting parts of the physical address (52bit + 22bit > 64bit) of the page/page table?

    Philipp Oppermann

    The key is that the physical addresses are page aligned. The last 12 bits are thus guaranteed to be 0 and can be used to store some flags. So there are 24 bits for the various flags and 52-12=40 bits for the aligned physical address.

    Nicholas Platt

    I'm confused about this as well. Why say "52-bit physical address" if the address is only 40 bits? Is it because the address is between sets of flags? Meaning, do the table entries really look like this?

    +-------+----------------------------------------+-------+
    | flags | physical address (frame or next table) | flags |
    +-------+----------------------------------------+-------+
    63       51                                       11     0

    Can you check my understanding:

    * Virtual addresses are effectively 48 bits:
      * Highest 16 bits are sign extension of 48th bit
      * Next 36 bits are used to navigate the paging tables
      * Lowest 12 bits are used as offset from physical address found in P1

    * Physical addresses are effectively 40 bits and page aligned

    * Paging table entries are 64 bits:
      * Highest 12 bits are flags
      * Next 40 bits are the physical address of a table or frame
      * Lowest 12 bits are flags

    Thus physical addresses identify the start of each aligned frame, and virtual addresses identify the location within the frame.

    Philipp Oppermann

    The physical address is 52 bits. It is possible to address up to 2^52 bytes of memory with it. Operating systems without paging (e.g. MS-DOS) directly use the physical address to access memory. And so do we before we enable paging.

    As soon as we enable paging, the CPU uses the memory management unit (MMU) to translate used addresses (“virtual addresses”) to the real memory addresses. These virtual addresses are effectively 48 bits on x86_64 and behave exactly as you stated.
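The 16 + 36 + 12 bit split of a virtual address can be sketched in a few lines. The struct and function names below are hypothetical and only illustrate the layout (four 9-bit table indices plus a 12-bit offset, with the top 16 bits being a sign extension of bit 47).

```rust
/// The pieces of a 48-bit virtual address when 4KiB pages are used.
#[derive(Debug, PartialEq)]
pub struct VirtParts {
    pub p4: u64,
    pub p3: u64,
    pub p2: u64,
    pub p1: u64,
    pub offset: u64,
}

/// Splits a virtual address into its four 9-bit indices and 12-bit offset.
pub fn split(vaddr: u64) -> VirtParts {
    VirtParts {
        p4: (vaddr >> 39) & 0x1ff,
        p3: (vaddr >> 30) & 0x1ff,
        p2: (vaddr >> 21) & 0x1ff,
        p1: (vaddr >> 12) & 0x1ff,
        offset: vaddr & 0xfff,
    }
}

/// True if bits 48-63 are copies of bit 47 (a "canonical" address).
pub fn is_canonical(vaddr: u64) -> bool {
    // Shift the upper 16 bits out, then shift back arithmetically so
    // that bit 47 is copied into bits 48-63.
    let sign_extended = (((vaddr << 16) as i64) >> 16) as u64;
    sign_extended == vaddr
}
```

For example, `split(0x20_1234)` yields P2 index 1, P1 index 1, and offset `0x234`, with the P4 and P3 indices both 0.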

    So why are only 40 physical address bits stored in the page table? The reason is that the physical memory is split into page sized chunks, which are called frames. The first frame starts at physical address 0, the second frame at physical address 4096, and so on. Thus the physical address of a frame is always page aligned. There are still non-page-aligned physical addresses but they can't be the start of a frame.

    So the lowest 12 bits of a valid physical frame address are always 0. We don't need to store anything if we know that it is always 0. Thus these bits can be used to store useful information instead (flags in our case).
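How address and flags share one 64-bit entry can be sketched like this. The helper names are hypothetical; the only facts taken from the discussion are that the address occupies bits 12-51 and that the remaining bits are free for flags.

```rust
/// Bits 12-51 of an entry hold the page-aligned physical address.
pub const ADDR_MASK: u64 = 0x000f_ffff_ffff_f000;

/// Combines a frame address with flag bits. Returns `None` if the
/// address is not page aligned, does not fit in 52 bits, or if the
/// flags would overlap the address bits.
pub fn make_entry(frame_addr: u64, flags: u64) -> Option<u64> {
    if (frame_addr & !ADDR_MASK) != 0 || (flags & ADDR_MASK) != 0 {
        return None;
    }
    Some(frame_addr | flags)
}

/// Extracts the full physical address (low 12 bits are implicitly 0).
pub fn entry_addr(entry: u64) -> u64 {
    entry & ADDR_MASK
}

/// Extracts the flag bits (bits 0-11 and 52-63).
pub fn entry_flags(entry: u64) -> u64 {
    entry & !ADDR_MASK
}
```

Because the low 12 bits of the address are known to be zero, `entry_addr` recovers the complete 52-bit address even though only 40 of its bits are stored.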

    I hope this helps in clearing up your confusion.

    Nicholas Platt

    Thanks, this has indeed become more clear as I've worked with it. I wrote (and just revised due to better understanding) a detailed comment and that helped nail it down for me.

    In case it's not clear to anyone else, the reason the lower bits are always 0 is because 4096 = 0x1000.

    Another question then: since we're aligning on 2mib pages here (0x200000), can we access the extra few bits (21 vs 12)?

    I'll try this myself once I'm allocating pages.

    Edit:
    It seems like this idea works. I added the following lines after the paging table setup and didn't encounter any processor exceptions:

    ; try writing within reserved address space,
    ; in a middle entry of P4
    mov eax, (1 << 31)
    or [p4_table + (256*8)], eax

    I guess this works, just be sure you're acting on a 2mib page and not a 4kib page.

    Philipp Oppermann

    That's an interesting question! The AMD manual says no in section 5.3.4 in Figure 5-25 on page 135. The bits between 13 and 20 are marked as “Reserved, must be zero”. So it seems like a page fault (with the reserved-bit flag set in its error code) occurs then.

    Your example works because you only set a bit of a non-present page. AFAIK all bits of non-present pages are available to the OS (except the present bit). If you want to test it, you can set a bit between 13 and 20 in the currently used P2 table. The P3 and P4 table entries still need 40 bits for storing the physical address of the next table since page tables only need to be 4KiB aligned.

    Ahmed Charles

    You should probably mention that setting bit 16 in cr0 turns on write protection for read only pages, even in kernel mode.

    Philipp Oppermann

    Good catch! I copied the code from my experimental kernel and it seems like I have missed that… I'm not quite sure if I should keep and explain it, or just remove it. What do you think?

    I opened an issue for this.

    Wink Saville

    Philipp,

    Just an FYI, in my baremetal-x86_64 repo I ported your boot.asm to boot.gas.S so I could use the code with the GNU Assembler.

    Philipp Oppermann

    Nice! You are porting it to C?

    Wink Saville

    Yes, I'm using boot to launch my C based system. Your code was the best and most straightforward code to get to long mode that I've seen. I found your code through Eric Kidd's posts to the Rust mailing list on the interrupt issues, and I'm glad I'm not going to have to solve that problem yet again :)

    Philipp Oppermann

    Thanks! I'm glad that it has helped :)

    anula

    I have an interesting problem, that probably has something to do with alignment (as usual while dealing with assembly), though I can't say for sure.
    I tried to run the code that does all the checks, but with no paging yet (so prior to "Paging" header). Unfortunately, it always gets into some kind of loop, sometimes qemu throws an exception:
    `qemu: fatal: Trying to execute code outside RAM or ROM at 0x000000002b100044`
    So it probably tries to execute some random code.

    If I delete call to check_long_mode, everything works properly, and green OK is printed to the screen. I don't even need to delete the whole call, it is enough to put `ret` after `test edx, 1 << 29` so it seems as if the jump to error code (`jz .no_long_mode`) was somehow to blame.

    During the course of debugging, I added a small function, almost identical to `error` and discovered that just adding the function makes the error go away.
    Here are both my codes: https://gist.github.com/anu...
    The first one (boot.asm) enters the strange loop (executing random instructions?) on my laptop, the second one (boot2.asm) executes properly. And the only difference is addition of some code that is never called anyway.

    Any ideas what may cause it?

    EDIT:
    Aligning stack to 4096 (bss is in my code above text section) also seems to solve the issue. Still, I don't really understand why is this happening. I thought that x86 doesn't need instructions to be aligned to anything specific?

    Philipp Oppermann

    That was an interesting debugging session :D

    I tried every debugging trick I knew, read the manual entries for all involved instructions, and even tried to use GDB. But I could not find the bug.

    Then I gave up and just looked at the source code in the repo and created a diff to your code. And the problem was surprisingly simple:

    You swapped `stack_bottom` and `stack_top`.

    But this small change causes big problems. Every `push` or `call` instruction overwrites some bits of the `.text` section below. The last function in the source file and thus the last function in the `.text` section is `check_long_mode`. If you add something behind it, e.g. another error function, it is no longer overwritten and works again.

    I think the counter-intuitive thing is that stuff further down in the source file ends up further up in memory. And the stack grows downwards to make it even more confusing. Maybe we should add a small note in the text, why `stack_bottom` needs to be _above_ `stack_top` in the file?
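The effect of swapping the two labels can be modelled with plain numbers. This sketch is an illustration with made-up addresses, not a reproduction of the actual memory layout: it only relies on the fact that a 32-bit `push` first decrements `esp` by 4 and then writes at the new address.

```rust
/// Returns the addresses written by `pushes` consecutive 32-bit pushes,
/// starting from `initial_esp`. Assumes `initial_esp` is large enough
/// that the subtraction does not underflow.
pub fn push_addresses(initial_esp: u32, pushes: u32) -> Vec<u32> {
    let mut esp = initial_esp;
    let mut written = Vec::new();
    for _ in 0..pushes {
        esp -= 4; // the stack grows downwards
        written.push(esp);
    }
    written
}

/// True if every push lands inside the reserved region `[low, high)`.
pub fn stays_in_stack(low: u32, high: u32, initial_esp: u32, pushes: u32) -> bool {
    push_addresses(initial_esp, pushes)
        .iter()
        .all(|&addr| addr >= low && addr + 4 <= high)
}
```

With a hypothetical 64-byte stack at `0x1000..0x1040`, starting at the high end (`0x1040`) keeps all writes inside the region, while starting at the low end (`0x1000`) sends the very first push to `0xffc`, into whatever lies below.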

    anula

    Uh, that is an.. embarrassing error. I checked all registers twice (easy to mistake eax with ecx) but somehow never thought to check that... I guess that when you see top above bottom in code you unconsciously decide that it is ok.

    About the note - it would probably make sense. Maybe it will make someone check their code twice, and it will surely be a good reminder for people that have little experience with low level things like that.

    Thanks very much for the help - I guess it would have taken me a lot of time to debug it later, when it would start to mysteriously fail after I add another function call in Rust.

    Philipp Oppermann

    Not embarrassing at all, just hard to debug!

    I created an issue for the note, but it will take a while since I'm short on time right now. If you like, feel free to send a PR.

    Wink Saville

    Phillipp,

    Previously I mentioned I'm using a derivative of your boot.S code to boot a C kernel. Things are going pretty good so far, but today I wanted to try to get interrupts going and have run into a brick wall.

    I've simplified my test program to something very simple. All that happens is that the boot code jumps to the C code, which enables interrupts, loops for a short period of time, and then exits. There should be no interrupt sources, so I'd expect this to run for as long as I'd like and then exit. And it does if the loop time is very short, but if I lengthen the loop it stops prematurely.

    In a more sophisticated version of my program I initialize the Interrupt Descriptor Table and use the APIC to generate a one-shot timer interrupt. Here too, all is well if the delay is short, but when I lengthen the delay I get a Double Fault interrupt!

    It almost feels like there is a watchdog timer or .......

    Any suggestions welcome.

    Thanks,

    Wink

    Philipp Oppermann

    A double fault occurs when you don't handle an exception/interrupt or your exception handler causes another exception. Do you enable interrupts (sti) or do you just catch cpu exceptions? Maybe you forgot to handle the interrupts from the hardware timer? But it's difficult to help without the actual code…

    Wink Saville

    Agreed, and I see that in my more sophisticated program; the question is what it is that I'm doing wrong. I believe I've set up the Interrupt Descriptor Table to handle all interrupts, i.e. I have an array of 256 interrupt gates. That program is here (https://github.com/winksaville/sadie) but it's too complicated to debug and I haven't yet checked in my non-working APIC timer code. But with that code I'm able to do software interrupts, and when my APIC timer code fires an interrupt fast enough it does work. So it would seem I've done most of the initialization "properly". Note, I'm also compiling my code with -mno-red-zone so that shouldn't be the problem.

    So my debug strategy in situations such as this is to simplify. So the first thing was to just enable interrupts, do nothing that should cause an interrupt to occur, then delay a while in the code and see what happens. But, sure enough, I'm still getting a double fault. Of course, according to the documentation in the Intel SDM Volume 3 section 6.15 "Interrupt 8--Double Fault Exception (#DF)", the error code is 0 and the CS and EIP registers are undefined :(

    Anyway, I then simplified to as simple as I can get. I modified your boot.asm program, adding the code below after the esp initialization; it outputs characters to the VGA display.


    start:
    mov esp, stack_top

    ; Save registers
    push edx
    push ecx
    push ebx
    push eax

    ; Enable interrupts
    ;sti

    ; Initialize edx to vga buffer ah attribute, al ch
    mov edx, 0xb8000
    mov ax, 0x0f60

    ; ebx number of loops
    mov ebx,10000

    .loop:

    ; Output next character and attribute
    mov word [edx], ax

    ; Increment to next character with wrap
    inc al
    cmp al, 0x7f
    jne .nextloc
    mov al,60

    ; Next location with wrap
    .nextloc:
    add edx, 2
    and edx,0x7ff
    or edx,0xb8000

    ; Delay
    mov ecx,0x2000
    .delay:
    loop .delay

    ; Continue looping until ebx is 0
    dec ebx
    jnz .loop

    ; Disable interrupts
    cli

    ; Restore registers
    pop eax
    pop ebx
    pop ecx
    pop edx

    Here is a github repo: (https://github.com/winksaville/baremetal-po-x86_64/tree/test_enable_interrupts). If you add the above code to your boot.asm it will print 10,000 characters to the VGA display and then continue with the normal code paths. If the "sti" instruction is commented out, as it is above, then all is well. But if I uncomment the "sti" thus enabling interrupts then it fails.

    I anticipated that enabling interrupts would succeed as I wouldn't expect any interrupts because the hardware is in a state where no interrupts should be generated. Or if grub or the BIOS is using interrupts then I'd expect things to also be OK.

    Obviously I'm wrong and I'd hope you'd be able to suggest where my flaw is.

    Philipp Oppermann

    Thanks for the overview and the simplified example! I haven't had the time to look at it in detail, but the problem in your simplified example could be the Programmable Interval timer. From the “Outputs” section:

    The output from PIT channel 0 is connected to the PIC chip, so that it generates an "IRQ 0". Typically during boot the BIOS sets channel 0 with a count of 65535 or 0 (which translates to 65536), which gives an output frequency of 18.2065 Hz (or an IRQ every 54.9254 ms).

    So it seems like the BIOS turns it on by default so that it causes an interrupt every ~55ms. This causes a double fault, since there is no interrupt handler for IRQ 0.
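The numbers in the quoted passage follow from dividing the PIT's input clock by the reload count. The sketch below assumes the commonly documented input frequency of about 1.193182 MHz, which is not stated in the thread itself; the function names are hypothetical.

```rust
/// Input clock of the PIT in Hz (assumed value, roughly 1.193182 MHz).
pub const PIT_BASE_HZ: f64 = 1_193_182.0;

/// Output frequency of channel 0 for a given reload count.
/// A count of 0 is interpreted by the hardware as 65536.
pub fn pit_frequency_hz(reload: u16) -> f64 {
    let divisor = if reload == 0 { 65536.0 } else { f64::from(reload) };
    PIT_BASE_HZ / divisor
}

/// Time between two IRQ 0 interrupts in milliseconds.
pub fn pit_period_ms(reload: u16) -> f64 {
    1000.0 / pit_frequency_hz(reload)
}
```

With a reload count of 0 this gives roughly 18.2065 Hz and one interrupt every 54.9254 ms, matching the quoted figures.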

    Wink Saville

    Philipp, you were correct, the PIT was the culprit causing the "Double Fault". Although it turns out the PIT is actually generating an Interrupt 8, so it's not really a Double Fault, it's just a PIT interrupt.

    My short term solution is to add a pit_isr as interrupt 8 handler and at the end of pit_isr send an EOI to the PIC using outb(0x20, 0x20). I also needed to issue an APIC EOI for my apic_timer_isr and I cleaned up the initialization. So now my system is cleanly handling these interrupts at least.

    For the PIT I really want to disable it, and I'd like to suggest that disabling the PIT be part of boot.asm so that my simple sti, delay, cli test works. If/when I figure that out I'll let you know. Oh, and if you know how to disable the PIT please let me know.

    Thanks again for your help!

    Wink Saville

    Here is a solution. There doesn't seem to be a way to disable the PIT, but you can disable all IRQs from the PIC. Adding the following code to my test_enable_interrupts branch allows the code to work even with interrupts enabled:

    ```
    ; Disable PIC interrupts so we don't get interrupts if the PIC
    ; was being used by grub or BIOS. See Disabling section of
    ; https://wiki.osdev.org/PIC. If the application wants to use devices
    ; connected to the PIC, such as the PIT, it will probably want
    ; to remap the PIC interrupts to be above 0 .. 31 which are
    ; used or reserved by Intel. See the Initialisation section of
    ; the same page for the PIC_remap subroutine.

    mov al,0xff
    out 0xa1, al
    out 0x21, al
    ```

    Thanks again for your help.

    Nicholas Platt

    To identity map the first gigabyte of our kernel with 512 2MiB pages, we need one P4, one P3, and one P2 table.

    Why don't we need to set up a P1 table? We don't even reserve the space for one since there's no p1_table label in the .bss. Is the CPU able to read the paging tables such that it knows to stop translating once it reaches an entry in P2 marked "huge"? What happens to bits 12-20 of the virtual address?

    Don Rowe

    Hi, Philipp! Thanks so much for creating this for us--it's been very fun to go from 0-OKAY with the ASM here, and I can't wait to get to the Rust portion (which is what drew me to this project in the first place). I'm a little confused, though, about the 4-level paging structure. Is there exactly one each of P2, P3, and P4, and then 512 different P1's that each point to various 4K physical pages?

    Philipp Oppermann

    Thanks!

    There is always exactly one P4. For each P4 entry, there is a P3. For each P3 entry, there is a P2. And for each P2 entry, there is a P1. Each entry of the P1 then points to a physical memory page.

    So there is one P4 table, 1…512 P3 tables, 1…(512*512) P2 tables, and 1…(512*512*512) P1 tables. (And 1…(512*512*512*512) mapped 4k pages. 512^4 * 4k = 256TiB = 2^48 bytes is the maximum amount of addressable virtual memory.)

    If we wanted to identity map the first 2MiB, it would require 512 4k pages and thus exactly 512 P1 entries. Every page table has 512 entries, so we need exactly one P1 (and one P2, P3, P4).

    If we wanted to identity map the first 513 4k pages, we would need another P1 entry. Our first P1 is full, so we create another P1. Its first entry points to the 513th 4k page and the other entries are empty. Now we map the second P2 entry (which is currently empty) to the P1 table.

    In our case, we want to identity map the first 512*2MiB. This requires 512*512 4k pages and thus 512 P1 tables. Fortunately, there is a useful hardware feature: huge pages. A huge page is 2MiB instead of 4k and is mapped directly by the P2 (so we completely skip the P1 table). This allows us to avoid the 512 P1 tables. Instead we map the 512 P2 entries to huge pages.

    The big advantage of a multilevel page table is that we don't need to create the page tables / page table entries for memory areas we don't use. In contrast, a single level page table would need 68719476736 entries to address the same amount of virtual memory. So the page table alone would need 68719476736*8=512GiB memory, which is much more than the total amount of RAM in a consumer PC.
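The table counts in this explanation can be reproduced with a little arithmetic. This is a sketch under a simplifying assumption that is not spelled out above: the mapped pages form one contiguous range starting at address 0. All names are hypothetical.

```rust
/// Number of entries in every page table.
pub const ENTRIES: u64 = 512;

/// Integer division that rounds up.
fn ceil_div(a: u64, b: u64) -> u64 {
    (a + b - 1) / b
}

/// How many tables of each level are needed.
#[derive(Debug, PartialEq)]
pub struct TableCounts {
    pub p1: u64,
    pub p2: u64,
    pub p3: u64,
    pub p4: u64,
}

/// Tables needed to map `pages` contiguous 4KiB pages starting at 0.
pub fn tables_for_4k_pages(pages: u64) -> TableCounts {
    let p1 = ceil_div(pages, ENTRIES);
    let p2 = ceil_div(p1, ENTRIES);
    let p3 = ceil_div(p2, ENTRIES);
    TableCounts { p1, p2, p3, p4: 1 }
}

/// Tables needed to map `pages` contiguous 2MiB huge pages starting at 0.
/// Huge pages are mapped directly by P2 entries, so no P1 is needed.
pub fn tables_for_2m_pages(pages: u64) -> TableCounts {
    let p2 = ceil_div(pages, ENTRIES);
    let p3 = ceil_div(p2, ENTRIES);
    TableCounts { p1: 0, p2, p3, p4: 1 }
}

/// Size in bytes of a hypothetical single-level table that covers the
/// whole 48-bit address space with 4KiB pages and 8-byte entries.
pub fn single_level_table_bytes() -> u64 {
    ((1u64 << 48) / 4096) * 8
}
```

Mapping the first gigabyte needs 512 P1 tables with 4k pages but none with huge pages, and the single-level alternative comes out at 512GiB.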

    Don Rowe

    Ah, I understand! Thank you.

    Lonami

    So excited to get started with the next chapter!! ^·^

    lightning1141

    If someone runs the OS and gets a check_long_mode error, try running qemu with this:

    -cpu kvm64
    Ps: Thanks Phil. This book is really helpful.

    Frank Afriat

    Thank you for the very clear blog and explanations.
    Just a remark, would be clearer to add in the Paging section the meaning of bits 12-31 containing the physical address of the next P or the physical address.

    What I don't understand is why P1 is not used and how the CPU knows that there is no P1 and that we link directly to the physical page. Is that also the role of the huge bit? And for 2 MB pages, how is the offset defined?

    Philipp Oppermann

    Thanks!

    We don't use a P1 because it would be cumbersome to set up 512 P1 tables in assembly. Instead, we set the huge bit in the P2 entries, which signals to the CPU that the entry directly points to the physical start address of a 2MiB page frame. This address has to be 2MiB aligned, so bits 0-20 have to be zero. When translating an address, bits 0-20 of the virtual address specify the offset within the 2MiB page.
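The last translation step for a huge page can be sketched as follows. It assumes the usual x86_64 flag positions (present is bit 0, huge is bit 7) and that 2MiB = 2^21, so the low 21 bits of the virtual address (bits 0-20) form the offset. The names are hypothetical and the P4 and P3 lookups are left out.

```rust
pub const PRESENT_BIT: u64 = 1 << 0;
pub const HUGE_BIT: u64 = 1 << 7;

/// Bits 0-20: offset within a 2MiB page.
pub const HUGE_OFFSET_MASK: u64 = (1 << 21) - 1;

/// Bits 21-51: start address of the 2MiB frame in a huge P2 entry.
pub const HUGE_FRAME_MASK: u64 = 0x000f_ffff_ffe0_0000;

/// Final step of the translation when the P2 entry is a huge page.
/// Returns `None` if the entry is not present or not marked huge
/// (in which case the CPU would continue with a P1 table instead).
pub fn translate_huge(p2_entry: u64, vaddr: u64) -> Option<u64> {
    let required = PRESENT_BIT | HUGE_BIT;
    if (p2_entry & required) != required {
        return None;
    }
    Some((p2_entry & HUGE_FRAME_MASK) | (vaddr & HUGE_OFFSET_MASK))
}
```

The 9 bits that would otherwise select a P1 entry (bits 12-20) simply become part of the offset, which is why no P1 table is needed.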

    Just a remark, would be clearer to add in the Paging section the meaning of bits 12-31 containing the physical address of the next P or the physical address.

    Thanks for the suggestion! I opened #314 to track it.

    Eran Sabala

    Very nice post.. Thanks for the effort (:

    Anatol Pomozov

    Thanks for the blog post series. It is very useful for those who develop their own x86 operating system.

    In my own project (unrelated to this Rust OS) I try to initialize segment registers with null descriptor like you do 'mov XX, 0'. Setting ds/es/fs/gs works fine, but when I try to set SS with null descriptor I get a crash. Looking at the documentation 'Intel 64 developers manual Vol. 2B 4-37' I see that 'MOV SS, 0' is prohibited and causes #GP(0).

    I wonder why 'MOV SS, 0' works for you...

    Stefan Junker

    I'm not certain why there is a limitation, but in the blog post the
    data is written to `ax` first and then loaded from `ax` to `ss`.

    Anatol Pomozov

    it seems that "mov" to segment register requires a general purpose register as source. In my code I also use 'movw %ax, %ds' I just made it a bit easier to read by using const value.

    Anyway it is unrelated to my original question. Writing null descriptor to all segment registers (except %ss) is fine. Documentation also states that null descriptor cannot be used for the stack segment.

    Philipp Oppermann

    Hmm, do you have a link to the documentation? I can't find anything relevant on page 4-37 in this document: https://www.intel.com/Assets/en_US/PDF/manual/253667.pdf

    The AMD64 manual states on page 253:

    Normally, an IRET that pops a null selector into the SS register causes a general-protection exception (#GP) to occur. However, in long mode, the null selector indicates the existence of nested interrupt handlers and/or privileged software in 64-bit mode. Long mode allows an IRET to pop a null selector into SS from the stack under the following conditions:
    • The target mode is 64-bit mode.
    • The target CPL<3.
    In this case, the processor does not load an SS descriptor, and the null selector is loaded into SS without causing a #GP exception

    Maybe I interpreted that wrong, though…

    Anatol Pomozov

    Hi Philipp, your link points to 6 years old Intel doc, here is the same but much more recent https://software.intel.com/...

    Scroll to 'MOV' instruction, page 4-37. There is a block algorithm for MOV that says

    IF segment selector is NULL
    THEN #GP(0); FI;

    I believe I hit this issue.

    Philipp Oppermann

    Thanks for the link!

    Hmm, the listing is preceded by “Loading a segment register while in protected mode results in special checks and actions, as described in the following listing.” (emphasis mine)

    Under “64-Bit Mode Exceptions” (page 4-39) there are only 3 cases for a #GP(0):

    If the memory address is in a non-canonical form.
    If an attempt is made to load SS register with NULL segment selector when CPL = 3.
    If an attempt is made to load SS register with NULL segment selector when CPL < 3 and CPL ≠ RPL.

    I see no reason why we should hit any of these…
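The two selector-related conditions from that list can be written as a small predicate. This only encodes the two quoted rules for loading SS with a null selector in 64-bit mode (the non-canonical address case does not apply to a register source); the function name is hypothetical.

```rust
/// True if loading SS with a null selector raises #GP(0) in 64-bit
/// mode, according to the two quoted conditions.
///
/// `cpl`: current privilege level (0-3).
/// `rpl`: requested privilege level, i.e. the low two bits of the selector.
pub fn null_ss_load_faults(cpl: u8, rpl: u8) -> bool {
    cpl == 3 || cpl != rpl
}
```

For `mov ax, 0` followed by `mov ss, ax` in ring 0, both CPL and RPL are 0, so neither condition applies.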

    Anatol Pomozov

    I have one more question. In your example you do a jump to long mode. As far as I know long 'call' can be used here as well. In fact call works in KVM and vmware but for some reason the operation crashes with #GP error. Do you know why it can be?

    Philipp Oppermann

    You need to do a so-called far jump, which updates the code segment. I'm not sure right now if a far call is supported in long mode. Either way, returning to 32-bit code might not be a good idea anyway, since the opcodes might be interpreted differently.

    Tomáš Král

    Hi, I can't get the boot.asm file to assemble because it gives me this error: src/arch/x86_64/boot.asm:(.text+0x4a): undefined reference to `long_mode_start'

    Philipp Oppermann

    Does the error occur when invoking nasm? Then you need to add extern long_mode_start somewhere inside the boot.asm (e.g. at the beginning). If it occurs while invoking ld, make sure that the long_mode_init.asm file is assembled and passed to ld (and it should of course define a global long_mode_start: label).

    Tomáš Král

    Yep, I was missing the extern long_mode_start, thank you ! :)

    Tomáš Král

    Hi, I want to ask something about assembly. Why do I have to move p4_table to eax before moving eax into cr3 ? Why can't I move p4_table directly into cr3 ?

    Philipp Oppermann

    Because the CR3 register can only be loaded from a register. So you have to load the p4_table address into a register first.

    Hi,

    out of curiosity: Does it make sense to keep the 32bit print instructions as "dead code" in the program? It can never be reached, right?

    ; print `OK` to screen
    mov dword [0xb8000], 0x2f4b2f4f
    hlt
    
    Philipp Oppermann

    Yeah, it should be unreachable after entering long mode (we would need to enter protected mode again). So it does not make much sense to keep it.

    David

    You should probably mention that the "set_up_page_tables" function works with 32-bit addresses and 32-bit (4-byte) PTE/PDE entries, each holding the 20-bit, page-aligned, physical address of the next data structure (plus 12 bits of 0s, since each level is page aligned). Readers may be confused by the preceding explanation of 64-bit PTEs, which are not used there (certainly I was).

    Philipp Oppermann

    We do use 8 byte PTEs with 64 bit addresses, but we only write the bottom 32 bits, since the higher 32 bits are zero.

    David

    I guess that what's unclear to me is why you say that each PTE entry contains the 52-bit physical address of the next frame/entry but in the table it looks like only bits 12-51 (40 bits) are used for that.

    Philipp Oppermann

    Oh, that's because page tables are always page aligned, i.e. bits 0-11 always have to be zero. The hardware manufacturers utilized that fact to use those bits for the flags instead.
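
    The entry layout discussed in this exchange can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: the constant and function names are made up for this example, and the masks simply encode the layout described above (physical address in bits 12-51, flags in the low 12 bits).

```rust
/// Bits 12..=51 of a 64-bit page table entry hold the physical address.
/// (Names in this sketch are hypothetical, chosen for illustration.)
const ADDRESS_MASK: u64 = 0x000f_ffff_ffff_f000;
/// Bits 0..=11 are usable for flags because tables and frames are 4 KiB aligned.
const FLAGS_MASK: u64 = 0xfff;

/// Physical address of the next table or frame referenced by `entry`.
fn entry_address(entry: u64) -> u64 {
    entry & ADDRESS_MASK
}

/// The low 12 flag bits (present, writable, ...) of `entry`.
fn entry_flags(entry: u64) -> u64 {
    entry & FLAGS_MASK
}

/// Builds an entry from a page-aligned address and a set of flag bits.
fn make_entry(address: u64, flags: u64) -> u64 {
    assert_eq!(address & FLAGS_MASK, 0, "address must be 4 KiB aligned");
    (address & ADDRESS_MASK) | (flags & FLAGS_MASK)
}
```

    Because the upper 32 bits of such an entry are zero for any address below 4 GiB, writing only the lower 32 bits from 32-bit assembly still yields a valid 8-byte entry.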

    David

    Makes sense, thanks.

    Is this Rust or assembly? I've never used Rust before, although I've used assembly.

    Philipp Oppermann

    This post is still in assembly. The next post uses Rust.

    DaeMoohn

    Hi,

    I'm trying to follow your steps while I'm trying to build a kernel in Rust. I have some questions at this point:

    1. You skipped the A20 gate check altogether. Is that an oversight, or do you consider it so arcane that it is just not needed? On my emulators I'm trying to activate it and my machine just goes haywire.

    2. why do you map 1 GB for your kernel here? A smaller amount would surely be as suitable as 1 GB

    3. some other sites/blogs/resources I read on the internet warn us to map the kernel to a higher area due to linker issues

    4. I haven't read in detail the next posts, but I've seen you remap the kernel somewhere in the future. Is that because what you are doing here is just a quick way to go to long mode, and you actually do it as needed in Rust?

    5. On OSDev they also mention something about a P5 coming in the future.

    Thanks, very informative reading!

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments/handling-exceptions.html ================================================ {% raw %}
    Rajivteja Nagipogu

    error[E0425]: cannot find function `int3` in module `x86_64::instructions::interrupts`
    --> src/lib.rs:55:39
    |
    55 | x86_64::instructions::interrupts::int3();
    | ^^^^ not found in `x86_64::instructions::interrupts`

    I am using x86_64 v0.1.0. I looked in both the x86_64 and x86 crates.io documentation. There is no such function as int3() in them. Maybe they dropped support in the newer versions?

    Philipp Oppermann

    Sorry, I completely forgot to push my latest x86_64 updates to crates.io. It's in x86_64 0.1.2 now, so it should work after a `cargo update`.

    Rajivteja Nagipogu

    Thanks. That solved it.

    Anonym

    “Handling Exceptions with Naked Functions” : link 404'd

    Philipp Oppermann

    Thanks! Should be fixed now.

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments/kernel-heap.html ================================================ {% raw %}
    Maksadbek

    Nice article, thanks

    Janus Troelsen

    "We have some well tested B-tree" Where can I find the source code for that B-tree implementation?

    Philipp Oppermann

    In the btree module of libcollections: https://github.com/rust-lan...

    The rendered documentation is here.

    Philipp Oppermann

    There is some discussion on /r/rust, hacker news, and /r/programming.

    Ryan Breen

    Love this series of articles! I'm very new to Rust and kernel development, and I've really enjoyed following along and trying to experiment a bit with alternative implementations. In that vein, I ported the inimitable gz's rust-slabmalloc (https://github.com/gz) to run in my implementation of these tutorials: https://github.com/ryanbree...

    One potentially interesting approach I tried, taking a bit of a page from Linux which I know uses a dumbed down allocator for the early allocation during kernel boot, is to have my Rust allocator be tiered: during early kernel boot, it uses a bump allocator. The only allocations done by the bump allocator are to set up the memory to be used by the slab_allocator. This meant I could get the benefit of collections when porting slab_allocator, so I dropped its internal data structure in favor of a plain old vec.

    Thanks for this series! You're doing awesome work and giving people a world of new educational opportunities.

    Philipp Oppermann

    Thanks so much!

    I really like your approach of building allocators on top of each other (and I will take a closer look when I have some time). Maybe it's even possible to create an allocator based on a B-tree…?

    Johan M

    Ahh, I see that the API to custom allocators changed :-0 I see that the code in git is updated but not for the bump_allocator. Even if one can work around it to conform to the new interface it is puzzling before you figure out what the problem is.

    A guide to the new allocator:

    https://github.com/rust-lang/rfcs/blob/master/text/1974-global-allocators.md

    Johan M

    Best strategy might be to go directly to the hole_list_allocator but to start up simple and ignore trying to reclaim blocks; that way the transition is easier.

    Philipp Oppermann

    I didn't have the time to update this post yet, sorry. The code in the repository is up-to-date, but adjusting this post would be more work. I'll try to update it soon.

    Johan Montelius

    I like the idea of introducing a simple bump allocator since it shows how little we need to do to get dynamic memory allocation working. My Rust skills are still modest but I'll give it a try to rewrite it using the new interface.

    Philipp Oppermann

    Yes, updating the bump allocator is definitely planned. Pull requests are always appreciated, if you'd like to submit one!
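
    For readers wondering how little a bump allocator actually needs, here is a minimal sketch of the core logic. It deliberately works on plain addresses and does not implement any allocator trait, so it is independent of the interface change discussed above. The struct and method names are assumptions made for this example, not the code from the post or the repository.

```rust
/// A minimal bump allocator over an address range ending at `heap_end`.
/// (Hypothetical sketch: it hands out addresses and never frees them.)
struct BumpAllocator {
    heap_end: usize,
    next: usize,
}

impl BumpAllocator {
    fn new(heap_start: usize, heap_end: usize) -> Self {
        BumpAllocator { heap_end, next: heap_start }
    }

    /// Returns the start address of a block of `size` bytes aligned to
    /// `align` (a power of two), or `None` if the heap is exhausted.
    fn allocate(&mut self, size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        // Round `next` up to the requested alignment.
        let start = self.next.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.heap_end {
            return None;
        }
        // "Bump" the pointer past the new block.
        self.next = end;
        Some(start)
    }
}
```

    The only state is the `next` pointer, which is why such an allocator works well as the first tier in a layered design: it can serve the few early allocations that a more capable allocator needs to set itself up.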

    Anonym

    The link to "the book" is not valid when talking about the allocator functions one needs to implement

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments/multiboot-kernel.html ================================================ {% raw %}
    Ali Shapal

    great work in explaining how all the different pieces of hardware/software come together

    Philipp Oppermann

    Thank you!

    Nitesh Chauhan

    awesome... :) i am definitely trying this out today...

    Georgiy Slobodenyuk

    On mac OS X, for some reason,


    dd - (0xe85250d6 + 0 + (header_end - header_start))

    had no compiler warnings, while


    dd 0x100000000 - (0xe85250d6 + 0 + (header_end - header_start))

    led to

    multiboot_header.asm:7: warning: numeric constant 0x100000000 does not fit in 32 bits

    Mac OS X 10.11.1
    NASM version 0.98.40 (Apple Computer, Inc. build 11) compiled on Oct 5 2015
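
    Both spellings of the checksum line produce the same 32-bit value, because the checksum only has to make the header fields sum to zero modulo 2^32. The following Rust sketch shows that arithmetic. The function name is made up for this example, and the header length of 24 bytes used in the usage note is an assumption for illustration.

```rust
/// Magic number taken from the `dd` lines quoted above.
const MULTIBOOT2_MAGIC: u32 = 0xe852_50d6;

/// Computes the checksum field so that
/// magic + architecture + header_length + checksum == 0 (mod 2^32).
/// (Hypothetical helper, not part of any library.)
fn header_checksum(architecture: u32, header_length: u32) -> u32 {
    MULTIBOOT2_MAGIC
        .wrapping_add(architecture)
        .wrapping_add(header_length)
        // Two's complement negation: the 32-bit equivalent of
        // `0x100000000 - x`, without needing a 33-bit constant.
        .wrapping_neg()
}
```

    For example, `header_checksum(0, 24)` yields `0x17adaf12`, and adding that value back to the other three fields wraps around to zero. The `0x100000000 - (...)` form only differs in that the assembler has to handle a constant wider than 32 bits, which is what the warning is about.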

    Philipp Oppermann

    Well, that's unfortunate… Thank you for the hint, I opened an issue: https://github.com/phil-opp...

    Philipp Oppermann

    Did you try `brew install nasm` to upgrade to a 2.11.X version?

    Arnaud Bailly

    I found I had to do

    nasm -felf64 boot.S

    to generate correct code, after doing `brew install nasm`.

    HTH

    Tom Smeding

    If (in my case on Mac OS X) grub-mkrescue (after you've installed it) gives the error "grub-mkrescue: warning: Your xorriso doesn't support `--grub2-boot-info'.", you just need to install xorriso. You probably don't have it at all yet.

    Tom Yandell

    For me (running it in an ubuntu docker container), grub-mkrescue silently fails until you add the -v flag - only with that can you see the error about xorriso (took me a lot of head-scratching to figure it out).

    Philipp Oppermann

    Thanks, I added a note about `--verbose`.

    Do you have any instructions on how you got grub built on OS X? Can't get it to build successfully for the x86_64-elf target and the default EFI target won't work with qemu...

    Tom Smeding

    What are your exact problems? IIRC I built it from source as well, but don't remember exactly what I fixed. Stuff did error here and there, I think...

    ./rs_decoder.h:2:Unknown pseudo-op: .macosx_version_min
    ./rs_decoder.h:2:Rest of line ignored. 1st junk character valued 49 (1).
    clang: error: assembler command failed with exit code 1 (use -v to see invocation)
    make[3]: *** [boot/i386/pc/lzma_decompress_image-startup_raw.o] Error 1

    emk1024

    This is an awesome series of blog posts!

    If you don't see a green "OK", look for a "GRUB" message. If you don't see "GRUB", then the weak link is probably grub-mkrescue. Two common failure modes:

    1. If your grub-mkrescue isn't installed correctly, it may silently do nothing or make bad ISO files. Try mounting your ISO file to make sure that it has your kernel and grub.cfg.

    2. If you run Linux on an EFI machine, grub-mkrescue will produce EFI boot images that don't work with BIOS-based systems like QEMU. To fix this, see this article, which recommends installing grub-pc-bin and running:

    grub-mkrescue /usr/lib/grub/i386-pc -o myos.iso isodir

    Philipp Oppermann

    Thank you, I didn't know about the EFI issue...

    Thanks, that second tip saved me.

    Anonym

    Thank you! This really helped.

    Tom Yandell

    I'm on OS X and found I needed x86_64-elf-ld and x86_64-elf-objdump. With MacPorts it was as simple as:

    > sudo port install x86_64-elf-gcc

    Arnaud Bailly

    This is definitely fun! I tried to do this from my Mac OS X (Yosemite) and could not properly boot my fresh ISO disk. Compilation works fine, I have installed a cross-compiler for x86_64-elf architecture, compiled grub following instructions here https://wiki.osdev.org/GRUB_...... I generate a correct ISO file (checked it by mounting using Disk Utility) but it does not boot and I cannot see the GRUB message.

    Not sure how to troubleshoot this issue.... I suspect this might be a problem with incorrect format in grub as the last stage of compilation shows this message:

    ../grub/configure --build=x86_64-elf --target=x86_64-elf --disable-werror TARGET_CC=x86_64-elf-gcc TARGET_OBJCOPY=x86_64-elf-objcopy TARGET_STRIP=x86_64-elf-strip TARGET_NM=x86_64-elf-nm TARGET_RANLIB=x86_64-elf-ranlib LD_FLAGS=/usr/local/opt/flex/ CPP_FLAGS=/usr/local/opt/flex/include/

    [..]

    config.status: linking ../grub/include/grub/i386 to include/grub/cpu
    config.status: linking ../grub/include/grub/i386/pc to include/grub/machine
    config.status: executing depfiles commands
    config.status: executing po-directories commands
    config.status: creating po/POTFILES
    config.status: creating po/Makefile
    *******************************************************
    GRUB2 will be compiled with following components:
    Platform: i386-pc
    With devmapper support: No (need libdevmapper header)
    With memory debugging: No
    With disk cache statistics: No
    With boot time statistics: No
    efiemu runtime: Yes
    grub-mkfont: Yes
    grub-mount: No (need FUSE headers)
    starfield theme: No (No DejaVu found)
    With libzfs support: No (need zfs library)
    Build-time grub-mkfont: No (no fonts)
    Without unifont (no build-time grub-mkfont)
    With liblzma from -llzma (support for XZ-compressed mips images)
    *******************************************************

    I don't know what the i386-pc refers to, but if this is the target platform then it's probably incorrect. Note that I tried to boot using qemu-system-i386 but to no avail.

    Regards,

    Arnaud Bailly

    Forget it: grub-mkrescue was not correctly installed so it failed to add needed boot files.

    Thanks again for sharing this!

    Chris Cerami

    I'm having an issue where x86_64-elf-gcc isn't found when I try to configure grub, and when I checked I see that it's not included in binutils with the other x86_64-elf tools. How did you get x86_64-elf-gcc on OS X?

    Arnaud Bailly

    It should be as simple as `brew install x86_64-elf-gcc x86_64-elf-binutils`

    jcaudle

    It's probably worth noting that you'll need to do `brew tap sevki/gcc_cross_compilers` or `brew tap alexcrichton/formula` to get these formulae. (Sevki's tap has newer cross compilers)

    George

    What would happen if we didn't put hlt? Would the cpu start reading random bytes and execute them as code? I tried without hlt and qemu seems to go into an infinite boot loop, but I'm just wondering what's going on.

    Philipp Oppermann

    Yes, that's exactly what happens. The CPU simply tries to read the next instruction, even if it doesn't exist, until it causes some exception. QEMU can print these exceptions, the "Setup Rust" post explains how. I just tried it and it hits an Invalid Opcode exception at some point because some memory does not contain a valid instruction.

    Bonus: You can use GDB to disassemble the “code” behind the start label. You need to start `qemu-system-x86_64 -hda build/os-x86_64.iso -s -S` in one console and `gdb build/kernel-x86_64.bin` in another. Then you need the following gdb commands:

    - `set architecture i386` because we are still in 32-bit mode
    - `target remote :1234` to connect to QEMU
    (- `disas /r start,+250` to disassemble the 250 bytes after the `start` label. Everything will be 0 as GRUB did not load our kernel yet)
    - `break start` to set a breakpoint at `start`
    - `continue` to continue execution until start is reached. Now the kernel is loaded and we can use
    - `disas /r start,+250` to disassemble the 250 bytes after the `start` label

    Then you can look at the faulting address you got from the QEMU debugging to see your invalid instruction. For me it seems to be an `add (%eax),%al` with the Opcode `02 00`.

    George

    I'm late in replying but just wanted to say thanks so much! It's really neat to learn about something that I used to believe only experts could get into.

    Sanjiv

    oh! What a wonderful article to read!

    Philipp Oppermann

    Thank you :)

    Lifepillar

    Nice post! I am on OS X, but I find it easier to use Linux for this assembly stuff. Using VirtualBox, I have created a minimal Debian machine running an SSH server and with a folder shared between the OS X host and the Debian guest. So, I may install all the needed tools and cross-compile in Debian and have the final .iso accessible in OS X (to use it with QEMU), all of this while working in Terminal.app as usual.

    As a side note, I had to set LDEMULATION="elf_x86_64" before linking, because I was getting this error: `ld: i386:x86-64 architecture of input file `multiboot_header.o' is incompatible with i386 output`. This may be because I have used Debian's 32-bit PC netinst iso instead of the 64-bit version.

    Philipp Oppermann

    Thanks for sharing your experiences! There is an issue about Mac OS support, but it seems like using a virtual machine is the easiest way…

    Robert Huang

    Man this is too fucking awesome!

    Dmitry Nikolayev

    On my system and on some others, grub-mkrescue is actually called grub2-mkrescue, and the Makefile should reflect that. Perhaps this merits a comment in the text, since I was not alone (https://www.reddit.com/r/os...) in spending some time trying to figure out what was happening after a rather meaningless error message from make.

    Philipp Oppermann

    Thanks! I opened an issue.

    GW seo

    When I ran grub-mkrescue I got no output, just silence.

    After installing xorriso I got an error like this:
    -----
    xorriso 1.3.2 : RockRidge filesystem manipulator, libburnia project.

    Drive current: -outdev 'stdio:os.iso'

    Media current: stdio file, overwriteable

    Media status : is blank

    Media summary: 0 sessions, 0 data blocks, 0 data, 861g free

    Added to ISO image: directory '/'='/tmp/grub.pI5jyq'

    xorriso : UPDATE : 276 files added in 1 seconds

    Added to ISO image: directory '/'='/path/to/my/work/isofiles'

    xorriso : FAILURE : Cannot find path '/efi.img' in loaded ISO image

    xorriso : UPDATE : 280 files added in 1 seconds

    xorriso : aborting : -abort_on 'FAILURE' encountered 'FAILURE'

    -----

    While searching for a fix for this error, I arrived here: [ https://bugs.archlinux.org/42334 ]

    After installing mtools, grub-mkrescue creates os.iso.

    After creating the iso, I can boot to it on QEMU with no problem. Even burning it on to a disk and booting on a different machine works like a charm. However, I am having trouble getting it on to an USB thumb drive. I have tried packing it on to a USB with UNetbootin, but as soon as the UNetbootin screen appears after booting to the USB device, (The OS selection screen, giving you the options [Default] and [my_os]), nothing happens. I can select either of those options, but nothing happens.

    EDIT: Got it to work using the command line tool dd!

    Philipp Oppermann

    I just wanted to suggest dd! For the record, the command is `sudo dd if=build/os.iso of=/dev/sdX && sync` where sdX is the device name of your USB stick. It overwrites everything on that device, so be careful to choose the correct device name.

    Yes, I noticed that in the documentation, had me worried for a second ;-)

    liveag

    @phil_opp I created a GitHub repository where I work through your great guide step by step. It is located here: https://github.com/peacememories/rust-kernel-experiments
    Please let me know if there are problems with the attribution. =)

    Philipp Oppermann

    Great! Let me know if you find any rough edges :).

    Thank you for your great articles.
    I have created my OS in Rust, and these are really useful for me.
    I have been revising my OS based on your articles.
    Also, I have been writing an article which is similar to yours:
    http://mopp.github.io/articles/os/os00_intro

    I added a link to this website in my articles.
    If you are uncomfortable with that, please tell me and I will remove it.

    Thanks

    Philipp Oppermann

    Thanks! I don't speak Japanese, so I can only read the rough Google translation. However, your article seems to be a really good introduction to OS development!

    Many thanks :)
    I have been looking forward to your new articles !

    I really enjoyed your accessible blog format and your awesome osdev tutorials!
    I was inspired by your articles and decided to write my own :)
    Let me know what you think.

    http://tutorialsbynick.com/...

    Thanks,
    Nick

    Philipp Oppermann

    It's awesome! I really like that you start without a bootloader and interact with the BIOS directly in real mode. I never programmed at this level, so it was a really great read!

    jpmrno

    If having trouble installing Binutils or GRUB, here are some brew packages:

    BINUTILS:
    'brew install jpmrno/apps/crossyc --without-gcc'

    GRUB (this will install everything needed):
    'brew install jpmrno/apps/grub'

    Hope it helps!

    Antonio

    Hi guys if you want to boot the kernel in VirtualBox just modify the grub cfg file by setting the following variable properly, check https://www.gnu.org/softwar... for the options.

    GRUB_TERMINAL_OUTPUT

    Andrii Zymohliad

    This blog is just a treasure! I'm so happy that I found it. Thank you so much Phil!

    By the way, my Arch Linux is booted in legacy BIOS mode (my BIOS doesn't even support EFI), but without '-d /usr/lib/grub/i386-pc/' grub-mkrescue didn't work for me.

    P.S. Aside from this project, I think I will refer to your Makefile a lot in the future just to learn the techniques that you used. I think it is the shortest example of so many Makefile best practices.

    Philipp Oppermann

    Thanks a lot! :)

    Unfortunately I have no idea what's the problem here. I've never had this error.

    Andrii Zymohliad

    Ouch... you probably replied to the first version of my comment, but I didn't reload the page and didn't see your reply. Then I found my mistake (I put the grub directory into the root of the image, not into the boot directory). Thinking that nobody had seen my comment yet, I edited it and removed the question about the error. Sorry for my carelessness.

    Philipp Oppermann

    No worries! Good to hear that you could fix the error.

    Andrey Zloy

    Just perfect. Thnx!

    Кирилл Царёв

    Why use Rust to develop an operating system kernel? Why not C++? Does Rust offer the same capabilities as C++?

    Philipp Oppermann

    Rust aims to be comparable to C++, both in terms of capabilities and in terms of performance. However, it has some great advantages over C++:

    The greatest advantage of Rust is its memory safety. It prevents common bugs such as use after free or dangling pointers at compile time. So you get the safety of a garbage collected language, but without garbage collection. In fact, the safety guarantees go even further: The compiler also prevents data races and iterator invalidation bugs. So we should get a much safer kernel compared to C++.

    (One caveat: Sometimes we need unsafe blocks for OS development, which weaken some safety guarantees. However, we try to use them only when it's absolutely needed and try to check them thoroughly.)

    Another advantage of Rust is the great type system. It allows us to create powerful, generic abstractions, even for low level things such as page tables.

    The tooling is great, too. Rust uses a package manager called “cargo”, which makes it easy to add various libraries to our project. Cargo automatically downloads the correct version and compiles/links it. Thus, we can use awesome libraries such as x86 easily.

    Кирилл Царёв

    Interesting... so what Rust is to replace C++, since he has so many advantages over C++?

    Philipp Oppermann

    so what Rust is to replace C++

    Sorry, I don't understand this question. Could you rephrase it?

    Andrey Zloy

    It's easy to link the kernel on 32-bit Ubuntu. You just need to add the -m elf_x86_64 option to the linker:
    ld --nmagic -m elf_x86_64 -o kernel.bin -T linker.ld multiboot_header.o boot.o

    Lonami

    For anyone else struggling with "Boot failed: Could not read from CDROM (code 0009)", you need to install `grub-pc-bin` and then regenerate the .iso. Solution from here: http://intermezzos.github.io/book/appendix/troubleshooting.html#could-not-read-from-cdrom-code-0009.

    By the way, I'm loving the tutorial style. Very clear, thank you!

    Philipp Oppermann

    Thanks so much!

    Philip

    I love you

    skierpage

    This is completely awesome!

    YMMV but FWIW in Fedora 25, I needed to install three packages, `sudo dnf install nasm xorriso qemu-system-x86`. The last one installs fewer packages than installing "qemu" which adds two dozen ARM, m68k, S390, sparc, ... emulators as well 8-)

    I find the example displays "OK" fine, but it erases the console before this so the boot messages disappear. I'm not sure if the fix lies in grub configuration or the qemu command line.

    It is interesting to look at the contents of the CD-ROM image, though it mostly reveals the complexity of the GRUB bootloader. I used `mkdir temp_mount && sudo mount -t iso9660 -o loop os.iso temp_mount` then looked around in temp_mount.

    Harry Rigg

    I had 2 issues with making the iso. First, there was no output file, yet grub-mkrescue didn't complain, I fixed this by running "apt install xorriso" (Ubuntu). The other issue was that qemu couldn't read the cdrom (error 0009), fixed that one by running "apt install grub-pc-bin". Hope this helps some of you... and thanks for the awesome post Phil :)

    Mrvan

    How to shutdown?

    M. Wagner

    The Makefile doesn't work for me. It gives only the error No rule to make target ' build/arch/x86_64/boot.o' needed by 'build/kernel-x86_64.bin'. I don't know what's going wrong....

    Philipp Oppermann

    It seems like there is some problem with these lines:

    build/arch/$(arch)/%.o: src/arch/$(arch)/%.asm
      

    Do you have a file named src/arch/x86_64/boot.asm? For debugging, you could use explicit names instead of the wildcards (%):

    build/arch/$(arch)/boot.o: src/arch/$(arch)/boot.asm
      

    (Note that you need to copy this rule for every .asm file without wildcards.)

    Anonym

    I'm interested whether it would run on actual hardware, with a real CD. I doubt anyone tried it though...

    Philipp Oppermann

    Yes, it works on real hardware! If not, please file an issue.

    Dendyard

    Lot of tutorials together. Thanks man (y)

    Darryl Rees

    This is incredible, just fantastic..

    I did have a couple of hiccups following along using Win10 WSL on a UEFI PC, maybe these details can be folded in to the tutorial?

    1) Couldn't boot QEMU with emulated video device

    warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
    
      Could not initialize SDL(No available video device) - exiting
      

    Solution: Use -curses option for qemu

    qemu-system-x86_64 -curses -cdrom os-x86_64.iso

    2) Could not boot from ISO (on a UEFI system)

      Booting from DVD/CD...
    
       Boot failed: Could not read from CDROM (code 0004)
    
                             Booting from ROM...
      

    Solution: sudo -S apt-get install grub-pc-bin

    I am having the same issue as you have mentioned in number one. It says warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5] And it is not printing out OK, I tried with the -curses option for qemu but not working.

    I'm trying to do this, but I can't get the OK to actually display and I've kind of run out of ideas. Trying to run with QEMU on Arch Linux.

    Things I've tried: Adding the multiboot tag that should tell grub I want a text mode, 80x25. Just gives me a black screen, instead of saying "Booting 'my os'"

    Switching grub to text mode with every possible switch I can find that looks related, with and without ^. Just gives me a black screen for all of them too. I can confirm my code actually seems to be executed - or, at least, hits the hlt instruction. Just that there's no output, which makes me think VGA problems, hence me trying all of the above. That seems to leave trying to parse the multiboot header or something, and that seems like... something I don't really want to try to do in assembly, including pushing it over assembly? I don't really want to move unless this works, though, because I see you still are using text mode extensively further on. :/

    ...Never mind. I just figured it out, and it was a very tiny mistake. I accidentally typed 0xb800 instead of 0xb8000, so, of course, no output ever because I was copying to the wrong region of memory.
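
    As a small illustration of what the `mov dword [0xb8000], 0x2f4b2f4f` line encodes, the following Rust sketch rebuilds that constant from its parts. It assumes the standard VGA text-mode cell layout (character byte first, attribute byte second). The function names are invented for this example, and the code only computes values rather than touching memory.

```rust
/// Physical address of the VGA text buffer. Note the five hex digits:
/// 0xb800 is a different and wrong address.
const VGA_BUFFER: usize = 0xb8000;

/// One screen cell: ASCII code in the low byte, color attribute in the high byte.
/// (Hypothetical helper for illustration.)
fn vga_cell(ascii: u8, color: u8) -> u16 {
    ((color as u16) << 8) | ascii as u16
}

/// Packs two adjacent cells into the dword written by one 32-bit store.
/// x86 is little-endian, so `first` ends up at the lower address.
fn vga_dword(first: u16, second: u16) -> u32 {
    ((second as u32) << 16) | first as u32
}
```

    With the attribute byte `0x2f`, `vga_dword(vga_cell(b'O', 0x2f), vga_cell(b'K', 0x2f))` evaluates to `0x2f4b2f4f`, the value used in the post. A store to any address other than `VGA_BUFFER` leaves the screen unchanged, which matches the symptom described above.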

    Person

    RIP windows users

    Hi. Thanks for writing this, but I get an error when I run it in QEMU: "error: no multiboot header found, error: you need to load kernel first". What's wrong?

    Philipp Oppermann

    It means that GRUB couldn't find a multiboot header at the beginning of your kernel. So either your multiboot header is invalid (maybe a typo somewhere?) or it is not at the beginning of the file (is your linker script correct? did you use the --nmagic flag?).

    I just found this and it is great. Clear, practical explanations without fuss but that don't hide what's going on. That's perfect for how I like my tech explanations

    Philipp Oppermann

    Thanks so much!

    Aswani Kumar

    Thank you so much for the article. It is of great help in understanding the basics of developing an OS. Especially the hands-on experience.

    Philipp Oppermann

    Thanks! Glad that it's useful to you :)

    Thank you for this series, it's exceptional! Clear, deep into details, and fascinating :)

    Philipp Oppermann

    Thank you :)

    Anonym

    For anyone trying to push themselves into using the GNU assembler (i.e. as), if you're getting "no multiboot header" errors with QEMU, put the line:

    .align 8

    before the end tags.

    windows technical support

    is it possible to manipulate the windows kernel. Which language is used in developing windows kernel?

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments/page-tables.html ================================================ {% raw %}
    Philipp Oppermann

    There is some interesting discussion on /r/programming and /r/rust.

    Matteo Meli

    Hi Philipp,

    First of all, thanks a lot for sharing this series of articles. Just a question: do you have an idea why doubling the stack size would not be sufficient to avoid the silent stack overflow you mentioned? To make my code work I had to triple it...

    Philipp Oppermann

    What are you doing in your main? Maybe there is something stack intensive before the `test_paging` call? My main function looks like this. I removed most of the example code from the previous posts, so maybe that's the reason..

    Matteo Meli

    You're right! The main difference in my code is that I set up interrupts. It must be that. Thanks

    Philipp Oppermann

    Nice! I assume that you have some experience with OS development?

    Matteo Meli

    Not really much actually, but trying to work my way through. Currently stuck trying to load the kernel in the higher half... Do you have any references to point me to?

    Philipp Oppermann

    Only the stuff from the OSDev wiki but I think that you are aware of it already.

    I would link the Rust code to the higher half but keep all startup assembly identity mapped. Then map the Rust code from the long mode assembly and jump to it.

    Maybe it's even possible to use the same linker script as before since rustc generates position independent code by default (AFAIK).

    Kiley Owen

    Is there a typo in the code in the huge pages section?

    Philipp Oppermann

    Maybe... What's the problem exactly?

    FreeFull

    `TableEntryFlags` is mentioned exactly once. Where does it come from?

    Philipp Oppermann

    It should be `EntryFlags`. Thanks for reporting, I pushed an update.

    Ted Meyer

    In order to decide when a page table should be freed, you can use the 52nd bit in the first 9 entries, to keep a score of how many present entries there are. It might be pretty bad for caching though.

    Philipp Oppermann

    I like the idea. Alternatively, we could use bits 52 to 61 of the first entry. What's your concern about caching?

    Ted Meyer

    My main concern with using all the bits in the first entry is that if ever in the future you want to go back and add more things, it would need to be changed.

    As far as caching goes, it would depend on how big a cache line is, but if it is smaller than 9 table entries then there could be multiple cache misses on a single update (theoretically 9 bits could be changed for one addition / deletion of a page). Obviously not a _huge_ deal, but I think it's worth pointing out.

    Stephen Checkoway

    Neither choice seems like it works. Bits 62:MAXPHYADDR (where MAXPHYADDR is at most 52) are reserved and supposed to be set to 0. However, bits 11:9 appear to be free at every level of the page table hierarchy.

    Somewhat annoyingly, 10 bits are needed since 513 values need to be represented. Thus one could use three bits from each of the first four entries.

    x86-64 has a 64-byte cache line size so the four accesses do fit in a single cache line.
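    The packing proposed in this thread can be sketched as follows. This is only an illustration, assuming bits 9 to 11 of each entry are free for software use; the names `store_count` and `load_count` are hypothetical and not part of the post's code.

```rust
/// Bits 9..=11 of an entry are assumed to be available to the OS.
const FREE_BITS_SHIFT: u32 = 9;
const FREE_BITS_MASK: u64 = 0b111 << FREE_BITS_SHIFT;

/// Stores a counter (0..=512) in three free bits of each of the first four entries.
fn store_count(entries: &mut [u64; 512], count: u16) {
    assert!(count <= 512, "a table has at most 512 present entries");
    for (i, entry) in entries.iter_mut().take(4).enumerate() {
        // Take the i-th group of three bits from the counter.
        let chunk = u64::from((count >> (3 * i)) & 0b111);
        // Replace only the free bits, leaving flags and address untouched.
        *entry = (*entry & !FREE_BITS_MASK) | (chunk << FREE_BITS_SHIFT);
    }
}

/// Reassembles the counter from the first four entries.
fn load_count(entries: &[u64; 512]) -> u16 {
    entries
        .iter()
        .take(4)
        .enumerate()
        .fold(0u16, |count, (i, entry)| {
            let chunk = ((entry & FREE_BITS_MASK) >> FREE_BITS_SHIFT) as u16;
            count | (chunk << (3 * i))
        })
}

fn main() {
    let mut entries = [0u64; 512];
    entries[0] = 0x1003; // some address plus PRESENT | WRITABLE
    store_count(&mut entries, 512);
    println!("count = {}", load_count(&entries));
}
```

    Four entries give 12 bits, which is more than the 10 needed, and all four 8-byte entries sit within the first 32 bytes of the table.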

    Ryan Campbell

    Hey Phillip, started doing this a couple days ago and was able to make it this far. Unfortunately I am now having some compilation issues that appear to be a result of the x86 crate. I get "error: 'raw::Slice' does not name a structure" when compiling raw-cpuid, a dependency for x86. Thoughts?

    edit: Ah-ha! I see you mentioned this a few days ago. Thanks for everything!

    Philipp Oppermann

    Yeah, the `raw::Slice` struct was deprecated and removed in the latest nightlies. However, the current version of the raw-cpuid crate still depends on it. The author is aware of the issue and will publish a new version in the next few days. Until then, you can try an older nightly as a workaround.

    Ryan Campbell

    Yep, I got it working just editing those couple of lines. Will pull new update tomorrow. Thanks!

    Can you explain this self-referencing trick for page tables to me?

    Philipp Oppermann

    Of course! I tried to give a short overview in the Recursive Mapping section. Which part of it is unclear? Or do you have a specific question?

    I didn't understand how looping once lets you access a P1 entry.
    Can you give a short example?

    Philipp Oppermann

    The virtual->physical address calculation is done in hardware, which expects 4 levels of page tables. In order to access the entries of a P1 table, we need to remove one level of translation. The trick is that all page tables have the (almost) same format, independent of the table level. So the CPU doesn't see a difference between e.g. a P4 and a P3 table.

    This allows us to implement the recursive mapping trick. We lead the CPU to believe that the P2 table is the P1 table and that the P3 table is the P2 table. Thus we end up on the memory page of the P1 table and are able to modify its entries.

    Likewise, the CPU interprets the P4 table as P3 table. But which table do we use as P4 table then? Well, we use the same P4 table as before. So our P4 is used twice by the CPU: At the first time, it is interpreted as a P4 table and at the second time as a P3 table.

    Now only one piece is missing: We need a special P4 entry, which points to its own table again. This way we can construct a virtual address for which the P4 table is used twice (once as P4 and once as P3).
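    As a rough sketch of the address arithmetic behind this, assuming the self-pointing entry lives at P4 index 511 (the function names here are hypothetical, not taken from the post):

```rust
/// Assumed index of the P4 entry that points back to the P4 table itself.
const RECURSIVE_INDEX: u64 = 511;

/// Sign-extends bit 47 so that the result is a canonical x86_64 address.
fn canonical(addr: u64) -> u64 {
    if addr & (1 << 47) != 0 {
        addr | 0xffff_0000_0000_0000
    } else {
        addr & 0x0000_ffff_ffff_ffff
    }
}

/// Virtual address under which the P1 table that maps `page_addr` is reachable.
/// Looping once through the recursive entry shifts every table one level down:
/// P4 acts as P3, P3 as P2, P2 as P1, and the P1 frame is accessed as a page.
fn p1_table_address(page_addr: u64) -> u64 {
    let p4 = (page_addr >> 39) & 0o777;
    let p3 = (page_addr >> 30) & 0o777;
    let p2 = (page_addr >> 21) & 0o777;
    canonical((RECURSIVE_INDEX << 39) | (p4 << 30) | (p3 << 21) | (p2 << 12))
}

/// Looping four times ends on the P4 table itself.
fn p4_table_address() -> u64 {
    canonical(
        (RECURSIVE_INDEX << 39)
            | (RECURSIVE_INDEX << 30)
            | (RECURSIVE_INDEX << 21)
            | (RECURSIVE_INDEX << 12),
    )
}

fn main() {
    println!("P4 table:          {:#x}", p4_table_address());
    println!("P1 table for 0x0:  {:#x}", p1_table_address(0));
}
```

    With index 511, the P4 table ends up at 0xfffffffffffff000 and the P1 table for address 0 at 0xffffff8000000000.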

    I hope it helps :).

    Got it.. thanks

    Hoschi

    Hi everybody!

    When I try to compile the code from this article, I get the following error:

    error: private trait in public interface (error E0445)

    It's fired here: impl<L> Table<L> where L: HierarchicalLevel {...}

    This seems to be one of those errors caused by the fact that Rust is still under development and some features change from time to time.

    It compiles if I make the trait public:
    pub trait HierarchicalLevel: TableLevel {....}

    I wrote the code on my own and to make sure this is not an error caused by myself, I cloned the github repo and tried to compile it with the same result.

    Maybe one of the more skilled OS devs here has an idea if it is a problem to mark the trait as public.

    Thanks in advance!

    Christian

    Philipp Oppermann

    Thanks a lot for reporting this! I thought that we've fixed this issue, but it seems like we've messed up somehow. I opened this issue for it.

    That said, I think that a public HierarchicalLevel is the correct solution. It shouldn't be a problem since you can't do anything bad by implementing HierarchicalLevel (e.g. you still can't construct a `Table`).

    Hello,

    You said that the P4 recursive loop must be set before paging is enabled.
    But I wonder - the memory is currently identity mapped, so what difference is there in the P4_table address with/without paging?

    Philipp Oppermann

    Hmm, good point! The P4 table is part of the identity mapped area, so it should work even if we do it after enabling paging.

    That sentence was added in #246, but I don't know the reason anymore. I just tested it and it still works if I do the recursive mapping after paging is enabled. So maybe we should revert that PR…

    Thanks for the clarification :)

    Evan Higgins

    Hi Phil,
    First off, I just want to say thanks for this tutorial, it's really great.
    I've run into an issue on this section when implementing the memory::paging::test_paging function. Specifically, with the test for unmap. If everything up to and including the unmap function call is implemented it operates as expected and unmap panics, but if the corresponding println! is added the kernel goes into a boot loop. Based on what you have said in this section that seems to indicate a page fault, but I don't really understand why the existence of that println! causes it. It's more confusing because execution never actually reaches that macro, so it's just the inclusion of that line in the code that triggers it. To make things even weirder, it quits boot-looping whenever I add a second (or third, etc.) instance of that println! call.

    So my question is: Do you have any insight as to what could be the cause of this? Or, if nothing else, some avenues I could pursue for debugging?

    Thanks

    Matthew

    Hey Philipp, regarding the Testing and Bugfixing section, can you explain why only a P2 and a P1 table is created after running map_to? Why isn't a P3 table created?

    Matthew

    I think I figured it out. Is it because we already mapped index 0 of P4 to a P3 table from boot.asm?

    Philipp Oppermann

    Yes, exactly.

    Warren

    I've implemented paging and everything seems to work correctly, but reading from an unmapped page causes a page fault even without flushing the translation lookaside buffer. I also tried it out with your repo and it exhibited the same behavior after I commented out the call to tlb::flush. Is there some QEMU setting that I need to change?

    Philipp Oppermann

    If the address isn't cached in the TLB, it works without a flush. The cache has limited space, so some translations are evicted when space runs out. Maybe try accessing the to be unmapped page right before unmapping it?

    Hi! First of all; This is an amazing project, thank you very much for your time and effort, and the work put down into doing this! I am a complete beginner in Rust, and follow this guide mainly to get a better grasp on operating system "basics". Your tutorials have been very good at explaining the code snippets in a simple manner, but sometimes it gets a bit confusing as to where functions and other code snippets should be placed in the file tree.. If you find the time, could you write the path/filename of where each code snippet goes? I'm sure it would be very helpful to other people who, like me, are not yet intuitive about "what parts go where".

    Philipp Oppermann

    Thanks for the suggestion! I opened https://github.com/phil-opp/blog_os/issues/382 on Github to track this issue.

    Hi Phil, when I add the line of code below to test unmap, the system can't boot and QEMU just keeps rebooting. If I remove this line, the code works perfectly. Could you please take a look?

    println!("{:#x}", unsafe { *(Page::containing_address(addr).start_address() as *const u64) });

    The issue has been solved. The reboot loop was caused by a page fault, and the root cause was a misused index. After correcting the index, the issue is gone.

    Hey Phil. This is a fairly basic question compared to some of the other comments here and I probably am missing something simple. When the Entry::pointed_frame function is made, it uses 0x000fffff_fffff000 to mask bits 12-51. Why that number? It doesn't only mask those bits. What am I missing?

    Philipp Oppermann

    It clears the lowest and highest 12 bits. So bits 12 to 51 should be the only bits set afterwards.
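    A quick way to convince yourself of this is to let the compiler count the bits. The following is a small sketch; `frame_address` is a hypothetical helper, not the post's `pointed_frame`.

```rust
/// Mask from the post: keeps bits 12-51 of a page table entry.
const ADDRESS_MASK: u64 = 0x000f_ffff_ffff_f000;

/// Hypothetical helper: strips flag bits and returns the physical frame address.
fn frame_address(entry: u64) -> u64 {
    entry & ADDRESS_MASK
}

fn main() {
    // Each hex digit is 4 bits, so three zero digits clear 12 bits on each end.
    assert_eq!(ADDRESS_MASK.trailing_zeros(), 12); // bits 0-11 cleared
    assert_eq!(ADDRESS_MASK.leading_zeros(), 12); // bits 52-63 cleared
    assert_eq!(ADDRESS_MASK.count_ones(), 40); // bits 12-51 kept

    // Entry with bit 63 and two low flag bits set, pointing to frame 0x123000.
    let entry = 0x8000_0000_0012_3003u64;
    println!("{:#x}", frame_address(entry)); // prints 0x123000
}
```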

    I realize now I was messing up maths. I was thinking each digit in hex being 16 bits instead of 4 bits. Thanks

    Jack Halford

    hi phil, quick question

    It seems that as soon as I enable x86 paging the VGA buffer is not accessible anymore (because 0xb8000 is not identity mapped yet?). So essentially the test_paging routine doesnt print anything... so my thinking tells me the identity map is the first thing to do after enabling paging, yet its the subject of the next chapter, am I not getting something?

    Jack Halford

    I didn't realise the boot.asm p2 p3 p4 setup was already a preliminary identity paging with huge pages, that's awesome! I'm working on x86 protected mode so I only have p1,p2 and my huge pages are 4MiB.

    Philipp Oppermann

    Yeah, paging is enabled since entering longmode (with an identity mapping). Do you have any reason for only choosing protected mode?

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments/printing-to-screen.html ================================================ {% raw %}
    Retep998

    Please add #[repr(C)] to some of your structs, in particular the ones that depend on having a specific layout.

    Philipp Oppermann

    Thanks for the hint! On `ScreenChar` it is definitely required. But I'm not quite sure if it's needed for the `ColorCode` newtype…

    Where (else) would you add #[repr(C)]?

    Theo Belaire

    You could also link to: http://embedded.hannobraun....
    for another example of getting no_std rust working.

    jeffreywilcke

    If anyone else (like me) is running in to some minor issues due to the incompleteness of the source code please have a look at my version that compiles: https://github.com/obscuren...

    @phil_opp:disqus FYI the link up top that says (full file) is missing and comes up with an anchor to #TODO :-)

    Philipp Oppermann

    Thanks for the hint! I removed that link and added the rest of the Color enum.

    Btw: you accidentally skipped the number 2 in your Color enum :).

    jeffreywilcke

    Ah oops :)

    I'm fairly new to rust and I couldn't get https://github.com/obscuren... that line (in your original example) to work. Throwing something about "borrowed something something couldn't be moved".

    Also thanks for the excellent examples and post! :D

    Philipp Oppermann

    I just checked out your project and color_code: self.color_code works fine. Maybe it was before you made ColorCode Copy?

    Edit: Damn, I totally forgot to describe Copy and Clone…

    Update: I mentioned the compiler errors and added some explanation here and here.

    jeffreywilcke

    Awesome, thanks! And thank you for the explanation.

    emk1024

    I made it this far, and then started on implementing interrupts, which are needed for keyboard input. Along the way, I discovered the rust-x86 crate, which provides data structures for major x86_64 CPU data types. This looks like it would save a lot of debugging time and digging around in manuals. Also of interest is the hilarious and extremely helpful After 5 days, my OS doesn't crash when I press a key.

    My interrupt-descriptor-table-in-progress is also available on GitHub for anybody who's interested in laying the groundwork for keyboard I/O.

    This is definitely a fun idea, and your explanations are great. Highly recommended.

    Philipp Oppermann

    Thanks for the links! The rust-x86 crate looks useful indeed (but may need some minor additions/changes for 64bit). Julia Evans's blog is great and was my first resource about rust OS development :). Just keep in mind that it's from 2013 and thus uses an early version of Rust.

    Your interrupt table code looks good! I think the handler functions need to be assembly functions until support for something like naked functions is added. Am I right?

    emk1024

    Yup. x86 interrupt handler functions appear to be completely non-standard in any case—it seems like every x86(_64) OS I've looked at has different rules for saving registers, mostly because the architecture is such a mess of ancient hacks. I also have another comment-in-progress on the XMM registers, which cause serious headaches implementing interrupts.

    rust-x86 appears to actually be x86_64-only, despite the name. The data structures I looked at were all what I'd expect on a 64-bit chip. I'll probably throw away a lot of my carefully debugged code and just use the rust-x86 versions of stuff.

    It would be nice to have a bunch of crates on cargo which handle common processor and I/O stuff. And maybe a Rust-only ELF loader library for loading and relocating user-space binaries. :-)

    Ahmed Charles

    I think it should be 80 columns and 25 rows, when describing VGA.

    Philipp Oppermann

    Thanks, fixed it.

    FreeFull

    When you mention the write!/writeln! macros can be now used, the example uses Writer::new(...) although Writer doesn't have a `new` method.

    Philipp Oppermann

    Thanks for reporting! I created #118 for it.

    Hunter Lester

    Thank you for this education. I love your presentation.

    Philipp Oppermann

    You're welcome. Thanks a lot!

    Nathan

    The print macro should be `let mut writer`.

    Philipp Oppermann

    Thanks! Fixed in #156.

    Chris Latham

    hey, great articles so far!

    i've been following along, and i've run into some issues with the ::core::fmt::Write implementation for our writer class.

    if i add that code in, i get these linker errors:

    core.0.rs:(.text._ZN4core3fmt5write17hdac96890aec66a9aE+0x324): undefined reference to `_Unwind_Resume'

    core.0.rs:(.text._ZN4core3fmt5write17hdac96890aec66a9aE+0x3eb): undefined reference to `_Unwind_Resume'

    core.0.rs:(.text._ZN4core3fmt5write17hdac96890aec66a9aE+0x3f3): undefined reference to `_Unwind_Resume'

    i've gone back and checked that i set panic to "abort" for both dev and release profiles in my config.toml, the same way you did to fix the unwinding issues. everything seems to match up with what you have. what have i missed?

    thanks in advance.

    Andrii Zymohliad

    It becomes more and more difficult for me to understand it... Mostly because of difficulty of Rust. I just follow every step and everything works for me, but I feel like if I make any step away from the instruction - everything will get broken.

    But still I learn a lot from this blog. Thank you again!

    One thing I can't understand: we implement ::core::fmt::Write trait for our Writer, and we implement 'write_str' method for it. But how can we use 'write_fmt' method if we didn't define it here?

    Philipp Oppermann

    The core::fmt::Write defines 3 methods: `write_str`, `write_char`, and `write_fmt`. However, the latter two have default implementations, so we only need to implement `write_str`.
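    A minimal sketch of this, using `std::fmt::Write` (a re-export of `core::fmt::Write`) and a stand-in `Writer` that collects bytes in a `Vec` instead of writing to the VGA buffer:

```rust
use std::fmt::{self, Write};

/// Stand-in for the VGA writer: stores the bytes instead of touching 0xb8000.
struct Writer {
    output: Vec<u8>,
}

impl Write for Writer {
    // The only required method; `write_char` and `write_fmt` have defaults
    // that are implemented in terms of `write_str`.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.output.extend_from_slice(s.as_bytes());
        Ok(())
    }
}

fn render() -> String {
    let mut writer = Writer { output: Vec::new() };
    // `write!` expands to a call to the default `write_fmt` method.
    write!(writer, "{} + {} = ", 1, 2).unwrap();
    // The default `write_char` also ends up in our `write_str`.
    writer.write_char('3').unwrap();
    String::from_utf8(writer.output).unwrap()
}

fn main() {
    println!("{}", render());
}
```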

    I try to explain uncommon features of Rust in the posts, but providing a complete Rust introduction is out of scope. I recommend you to read the official Rust book or its new iteration (still work-in-progress).

    It becomes more and more difficult for me to understand it...

    That's unfortunate :(. Feel free to ask if something else is unclear!

    Andrii Zymohliad

    Aa, ok. Thanks for clarification and quick reply.

    Yes, you explain everything very well. Very surprisingly well for such difficult topic. Of course complete Rust intro is out of scope. But I didn't know about new iteration of Rust book. Thank you for the hint.

    And thanks a lot for such welcoming and kind manner of communication with your "students" :)

    Philipp Oppermann

    Yes, you explain everything very well. Very surprisingly well for such difficult topic.

    Thanks so much!

    Hoschi

    Hey!

    If you are interested in adding some build information, you can do it this way:

    Add the following to your Makefile:

    BUILD_NUMBER_FILE := buildno.txt
    BUILD_NUMBER_LDFLAGS = --defsym _BUILD_NUMBER=$$(cat $(BUILD_NUMBER_FILE)) --defsym _BUILD_DATE=$$(date +'%Y%m%d')

    $(kernel): builddata cargo ..

    builddata:
    touch $(BUILD_NUMBER_FILE)
    @echo $$(($$(cat $(BUILD_NUMBER_FILE)) + 1)) > $(BUILD_NUMBER_FILE)

    The above will make the current date and a build number available in the kernel, incremented each time you build it.

    To access build information:

    Add the following to your lib.rs:

    extern {
    fn _BUILD_NUMBER();
    fn _BUILD_DATE();
    }

    Now you can do the following, for example in your rust_main():

    let build_number = _BUILD_NUMBER as u32;
    let build_date = _BUILD_DATE as u32;

    println!("Build {}: {} ", build_number, build_date);

    Joshua Beitler

    I'm curious - why does this code write from the bottom of the screen and then up, when most other VGA drivers go from the top of the screen down?

    Philipp Oppermann

    As soon as the screen is full (i.e. scrolling begins) both approaches do the same thing: writing to the bottom line and shifting the other lines up. By starting at the bottom, we don't need any special case for the first 25 lines.

    (…and maybe I just wanted to do things a bit differently :D)
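    The idea can be modelled on a tiny in-memory screen. This is a simplified sketch with made-up dimensions and a hypothetical `Screen` type, not the post's `Writer`:

```rust
const HEIGHT: usize = 3;
const WIDTH: usize = 8;

/// Toy text screen that always writes into its last row.
struct Screen {
    rows: [[u8; WIDTH]; HEIGHT],
    column: usize,
}

impl Screen {
    fn new() -> Self {
        Screen { rows: [[b' '; WIDTH]; HEIGHT], column: 0 }
    }

    /// Shifts every row up by one and clears the bottom row.
    fn new_line(&mut self) {
        for row in 1..HEIGHT {
            self.rows[row - 1] = self.rows[row];
        }
        self.rows[HEIGHT - 1] = [b' '; WIDTH];
        self.column = 0;
    }

    fn write_byte(&mut self, byte: u8) {
        if byte == b'\n' {
            self.new_line();
            return;
        }
        if self.column >= WIDTH {
            self.new_line();
        }
        // No special case needed: output always goes to the bottom row.
        self.rows[HEIGHT - 1][self.column] = byte;
        self.column += 1;
    }

    fn write_str(&mut self, s: &str) {
        for byte in s.bytes() {
            self.write_byte(byte);
        }
    }

    fn row(&self, index: usize) -> String {
        String::from_utf8_lossy(&self.rows[index]).trim_end().to_string()
    }
}

fn main() {
    let mut screen = Screen::new();
    screen.write_str("a\nb\nc\nd");
    for index in 0..HEIGHT {
        println!("{}", screen.row(index));
    }
}
```

    The same `new_line` code handles both the partially filled and the full screen, which is the simplification described above.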

    Harry Rigg

    I went ahead and made the "panic_fmt" function complete :) Now I get filename, line and an error message when rust panics! very helpful, here is the code:

    #[lang = "panic_fmt"]
    extern fn panic_fmt(args: fmt::Arguments, file: &'static str, line: u32) -> ! {
    println!("Panic in file {} line {}: {}", file, line, args);
    loop {}
    }

    Thanks for the awesome tutorials!

    Philipp Oppermann

    Great! We also do this in the next post, but it's a good idea to mention it here, too.

    Thanks for the code by the way, I always thought that the `file` argument is of type `&str` (instead of `&'static str`). I opened #256 to fix this.

    Thanks for the awesome tutorials!

    You're welcome, I'm glad you like them!

    ocamlmycaml

    So i'm trying to make `println!("{}: some number", 1);` work, but when I add that line to my rust_main function, the emulator does the whole triple exception thing starting with a 0xd error - which according to OSDev.org is a "General protection fault":

    ```
    check_exception old: 0xffffffff new 0xd
    0: v=0d e=0000 i=0 cpl=0 IP=0008:ec834853e5894855 pc=ec834853e5894855 SP=0010:000000000012ec18 env->regs[R_EAX]=0000000000000a00
    ```

    `println!("Hello {}!", "world");` works just fine - it just doesn't seem to be able to interpolate non-string types. Would you have any idea on what's going wrong? I'm not sure where to even look. If you'd like to clone and run my code and take a look: https://github.com/ocamlmycaml/rust-moss/

    btw ++good tutorial, i'm learning a lot!

    ocamlmycaml

    I figured it out - I had OUTPUT_FORMAT(elf32-i386) in my linker.ld file. I was trying a few things to run the kernel without making an iso

    I'm not gonna pretend like I understand what happened, but removing that OUTPUT_FORMAT statement fixed my problems

    Esdras

    Solution for the problems of compilation:

    1. Go to vga_buffer.rs.
    2. Find the line `buffer: unsafe { Unique::new(0xb8000 as *mut _) },`.
    3. Change it to `buffer: unsafe { Unique::new_unchecked(0xb8000 as *mut _) },`.

    Thanks for your attention (sorry for my English).

    Hello :)

    Great tutorials, just a quick question for learning purposes. Could the values in enum be defined implicitly like so?

    pub enum Color {
          Black,
          Blue,
          ...
       }
      
    Philipp Oppermann

    I don't know actually. It seems to work: https://play.rust-lang.org/?gist=7e8684f332ece651836c80b4d7439c1c&version=stable. However, I'm not sure if this is specified behavior or if it might be changed some day (e.g. by new compiler optimizations).
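    For reference, the Rust Reference specifies this behavior: a variant without an explicit discriminant gets the previous variant's value plus one, and the first variant starts at zero. A small check, using a shortened stand-in for the post's `Color` enum:

```rust
/// Shortened stand-in for the post's `Color` enum, without explicit values.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Color {
    Black,
    Blue,
    Green,
    Cyan,
}

fn main() {
    // Implicit discriminants count up from zero in declaration order.
    assert_eq!(Color::Black as u8, 0);
    assert_eq!(Color::Blue as u8, 1);
    assert_eq!(Color::Green as u8, 2);
    assert_eq!(Color::Cyan as u8, 3);
    println!("{:?} = {}", Color::Cyan, Color::Cyan as u8);
}
```

    Writing the values out explicitly is still a reasonable choice for hardware-defined constants, since it protects against accidental reordering.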

    Anonym

    I followed everything in this tutorial to the letter, and had a question. If I were to try to print a string to the screen, how would I do it? I have been using

    print!("{}", string)

    with string containing what I want to print. I know this works in normal Rust, but would it work with the VGA buffer you made? Thanks!

    Philipp Oppermann

    Sure, why shouldn't it? The whole formatting part is handled by Rust's `format_args!` macro, so everything should work the same.

    Anonym

    Question. How would I go about changing the color of the text on the fly? Like if I wanted to print Hello World

    and have "Hello" be green and "World" be white. How would I go about doing this?

    Philipp Oppermann

    You could use two separate print statements and change the color on the global writer in between. Or alternatively, you could add support for ANSI escape codes like in a normal terminal.

    Philipp Oppermann

    Yes, it should work.

    NateDogg1232

    I keep getting the error of the trait `core::marker::Copy` is not implemented for `vga_buffer::ScreenChar`

    Why exactly does this happen despite everything looking up to snuff?

    Philipp Oppermann

    You need to #[derive(Clone, Copy)] for ScreenChar.
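    A minimal sketch of why the derive matters, assuming `ScreenChar` has the two fields shown below (the field names and `blank_row` are assumptions for illustration):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ColorCode(u8);

// `Copy` can only be derived if every field is `Copy`, so `ColorCode` needs it too.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct ScreenChar {
    ascii_character: u8,
    color_code: ColorCode,
}

/// Builds one row of blank characters.
fn blank_row() -> [ScreenChar; 80] {
    let blank = ScreenChar {
        ascii_character: b' ',
        color_code: ColorCode(0x0f),
    };
    // The array repeat expression `[value; N]` requires `ScreenChar: Copy`.
    [blank; 80]
}

fn main() {
    let row = blank_row();
    // Copying a character out of the row leaves the row usable.
    let first = row[0];
    println!("{:?} {:?}", first, row[79]);
}
```

    Without the derive on `ScreenChar`, the `[blank; 80]` expression fails to compile with the error quoted above.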

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments/remap-the-kernel.html ================================================ {% raw %}
    Philipp Oppermann

    There is some discussion on hacker news, /r/rust, and /r/programming.

    Rajivteja Nagipogu

    Error while using x86_64::shared::control_regs.
    There was no `shared` in x86_64.
    Thanks for the help. :)

    Philipp Oppermann

    Thanks for reporting! Fixed in #301.

    Rajivteja Nagipogu

    Thank you. You have done awesome work here.

    Rhys Kenwell

    Trying to get this to work, my code looks identical to yours, save for the occasional twist for aesthetics, or different variable name, but after enabling the nxe bit, when according to you it should boot successfully, it crashes for me.

    A bit of sleuthing on my part deduced the issue, I'm getting a double fault when I try to write to the cr3 register. A bit more debugging helped me find the culprit, when I write to cr3 in the switch method, something happens and the CPU double faults.

    The exact instruction that the pc points to in the register dump is "add $0x18, %rsp"

    Thanks in advance for helping me resolve this.

    Rhys Kenwell

    Looked a bit further, the original fault is a page fault with the present, write, and reserved write bits set

    Philipp Oppermann

    Hmm, sounds like your CPU somehow thinks that you set a reserved bit. If it works fine before setting the NXE bit, it could be caused by:

    • a wrong register (should be IA32_EFER)
    • a wrong bit number (should be 1 << 11)
    • your CPU somehow doesn't support it (if you run it on real hardware)
      • does it work in QEMU?
      • The AMD manual says: “Before setting this bit, system software must verify the processor supports the NX feature by checking the CPUID NX feature flag (CPUID Fn8000_0001_EDX[NX]).”

    Hope this helps!

    Hi, just leaving this here for future reference. I had the same problem and discovered that it was actually a typo, I didn't notice the ! on the if checking for ELF_SECTION_EXECUTABLE in EntryFlags::from_elf_section_flags. Maybe this will shed some light on your problem, if you still have it.

    Nick von Bulow

    Note on the footnote: I paste in your "most useful GDB command", and it tells me "syntax error in expression, near `int*)0xfffffffffffff000)@512' "

    Philipp Oppermann

    I think it's a problem across gdb versions. I had a similar problem recently. It seems like newer versions no longer understand some casts, but I couldn't find out whether that's a bug or an intentional syntax change.

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments/returning-from-exceptions.html ================================================ {% raw %}
    Philipp Oppermann

    There are also great comments on hackernews and /r/rust!

    SmallEgg

    Wohoooo ! New post ! o/

    cshnc

    Great!

    Hoschi

    Hi everybody!

    When I try to modify the page fault handler to define accesses to 0xdeadbeaf as legal, I get an error for this line:

    let stack_frame = &mut *(stack_frame as *mut ExceptionStackFrame);

    error: casting `&interrupts::ExceptionStackFrame` as `*mut interrupts::ExceptionStackFrame` is invalid

    Thanks in advance!

    Christian

    Philipp Oppermann

    Hmm, I've updated this post two days ago and removed this section. Before that, we used to take stack_frame as *const pointer. Since the update, we take stack_frame as & reference, which makes the cast illegal.

    But this doesn't make any sense since I've pushed this update after your comment?..

    Hoschi

    Hi! I tested it before your changes. I cannot find the illegal cast anymore. Thanks for your reply!

    Niklas R

    This is impressive work and pedagogical. Phil, how long did it take to acquire the necessary technical knowledge and what did you do to achieve this technical competence?

    Philipp Oppermann

    Thanks a lot!

    I took (and still take) some operating system classes at university. However, most of the details of these posts I learned from various blogs, tutorials, and wikis (e.g. the awesome OSDev wiki). The x86 details come from the official manuals from AMD and Intel.

    I started my own little toy kernels a few years ago, at first in C. At some point I discovered Rust. It was still highly unstable at that time, but I loved to play with it and I learned a lot.

    So I think that I learned most things from writing my own toy kernels and experimenting with them.

    Adam Brown

    Wow this is awesome, looking forward to following this.

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments/set-up-rust.html ================================================ {% raw %}
    emk1024

    If you decide to add interrupt support to your OS (for keyboard input, for example), you may not want Rust to be generating SSE code. If you use SSE code in the kernel, then you need to save SSE registers in interrupts, and saving SSE registers is slow and takes a lot of RAM. As far as I can tell, a lot of kernels simply avoid floating point to help keep interrupts and system calls efficient.

    Also, as you noted in your bug on GitHub, you'll also want to set no-redzone to prevent memory corruption during interrupts.

    Since we need to set a bunch of compiler flags for all generated code, including libcore, the right answer may be to replace the target x86_64-unknown-linux-gnu with a custom target that uses the right options by default. There's a discussion here and an example target file in the zinc OS.

    emk1024

    OK, it took almost a day, but I think I've got this figured out. This is probably overkill for your great blog posts, but I'll leave it here for the next person to pass this way.

    Here's the basic strategy to getting an SSE-free, redzone-free kernel:

    1. Define a new target x86_64-unknown-none-gnu, where none means running on bare metal without an OS. This can be done by creating a file x86_64-unknown-none-gnu.json and filling it in with the right options. See below. You can just drop this in your top-level build directory and Rust will find it.

    2. Check out the same Rust you're compiling with, and patch libcore to remove floating point. You can usually find a current libcore patch in thepowersgang/rust-barebones-kernel on GitHub.

    3. Build libcore with --target $(target) --cfg disable_float, and put it in ~/.multirust/toolchains/nightly/lib/rustlib/$(target)/lib.

    4. Run cargo normally, specifying your custom target with --target $(target).

    Here's my custom x86_64-unknown-none-gnu.json file

    {
    "llvm-target": "x86_64-unknown-none-gnu",
    "target-endian": "little",
    "target-pointer-width": "64",
    "os": "none",
    "arch": "x86_64",
    "data-layout": "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128",
    "pre-link-args": [ "-m64" ],
    "cpu": "x86-64",
    "features": "-mmx,-sse,-sse2,-sse3,-ssse3",
    "disable-redzone": true,
    "eliminate-frame-pointer": false,
    "linker-is-gnu": true,
    "no-compiler-rt": true,
    "archive-format": "gnu"
    }

    This seems like a good approach, because I'm not compiling Rust code for the Linux user-space, and then trying to convince it to run on bare metal. Instead, I'm compiling Rust code against a properly-configured bare metal target, and if I don't like the compiler options, I can quickly change them for all crates. And if any floating point code tries to sneak into kernel space, I'll get an error immediately, instead of finding it when floating point registers get clobbered by an interrupt that used MMX code.

    The osdev wiki claims that this is the harder but wiser course of action:

    Common examples [of beginner mistakes] include being too lazy to use a Cross-Compiler, developing in Real Mode instead of Protected Mode or Long Mode, relying on BIOS calls rather than writing real hardware drivers, using flat binaries instead of ELF, and so on.

    Since they know way more about this than I do, I'm going with their suggestions for now. :-)

    Philipp Oppermann

    Why does your target.json file include

    "features": "-mmx,-sse,-sse2,-sse3,-ssse3"

    And doesn't Rust use the `xmm` registers to optimize non-floating-point code, too?

    emk1024

    In theory, "-mmx" means "disable mmx", and so on. I'm attempting to convince Rust and LLVM to leave all those registers alone in kernel space, and to never generate any code which uses them. The goal here is not to need to save that huge block of registers (plus FPU state) on every interrupt. This seems to be a popular choice for x86 kernels.

    Does it work? We'll see.

    Philipp Oppermann

    Ah, I see... It seems like there is no documentation about this. Hopefully the libcore-without-sse issue gets resolved soon. Manually patching libcore seems like a really ugly solution.

    I think I will choose the "slow" solution and just save the sse registers on every interrupt. It's required anyway when switching between (user) processes.

    In my opinion the best solution would be an annotation to disable SSE just for the interrupt handlers.

    Alister Lee

    This is currently also hard on ARM, because I can't work out the features to disable to avoid emitting fpu instructions. I've raised a bug on llvm to seek clarification: https://llvm.org/bugs/show_...

    alilee

    Thanks heaps Phil. Just a comment for others that the rlib dependency strategy you describe won't work when cross-compiling (i.e. arm / r-pi), because the multirust nightly won't include the libcore necessary for the dependent crates to build, and they won't refer to the ones inside /target. See here: https://github.com/rust-lan...

    Alister Lee

    Right, have learned a lot in the last month, following you on ARM. I expect I'll need rlibc, but I haven't yet.

    What I have needed is `compiler-rt`, which you have avoided because you are building on a (tier 3) supported build target which is [not the case](http://stackoverflow.com/qu...) for `arm-none-eabi`.

    Philipp Oppermann

    You can avoid compiler-rt by using a custom target file that contains `"no-compiler-rt": true`.

    You can then use nightly-libcore to cross compile libcore for your new target.

    the_boffin

    For OS X/Darwin users who have made it this far:

    https://github.com/rust-lan...

    It's been a good run, but Apple's modifications to clang and ld for the Darwin system completely destroy rust's existing cross-compile capabilities, which means building an x86_64-compatible libcore simply isn't possible without monumental amounts of work.

    Aaron D

    You might want to try again, I've been following along with all of the latest posts on macOS 10.11 without problems.

    Doesn't the linker problem still exist? Most of the options used by GNU 'ld' are not supported by macOS 10.11 'ld'. Maybe you used a cross-compiled 'ld'?

    Jeff Westfahl

    I guess there's a reason you can't enable SSE in your 32-bit assembly file before you switch to long mode?

    Philipp Oppermann

    It should be totally possible to do it in the 32-bit file. I just tested it and it works without problems. I think I used to do more things in the `long_mode_init` file in a previous system so that there was already an error function. But since it's now the only function that needs the 64-bit error function, we could remove that function if we moved the `setup_SSE` function.

    Thanks for the hint! I opened an issue for this.

    stephane geneix

    on my box, I couldn't reproduce the SSE error. It looks like "a += 1;" doesn't generate SSE instructions anymore by default.

    objdump still shows some in what looks like exception handling code but seems to never execute
    ~/dev/rustOS/$rustc --version
    rustc 1.7.0-nightly (81dd3824f 2015-12-11)
    ~/dev/rustOS/$cargo --version
    cargo 0.8.0-nightly (028ac34 2015-12-10)

    Thank you very much for this series though, it's probably one of the most interesting ways to learn about the OS boot sequence I've seen

    Philipp Oppermann

    Thanks a lot! I opened an issue for this.

    How about the following example?:

    let mut a = ("hello", 42);
    a.1 += 1;

    stephane geneix

    yes, that one breaks down without SSE

    Philipp Oppermann

    Perfect! I will update it

    suvash

    so much fun ! Thanks for this 💥🍾🍻 ! Can we have a emoji Hello World ? Just kidding.

    Philipp Oppermann

    Actually there are two smileys in code page 437.

    Smiley Hello World:

    ....
    let hello = b"\x02\x01 Hello World! \x01\x02";
    let color_byte = 0x1f;

    let mut hello_colored = [color_byte; 36];
    ...
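
The snippet above relies on VGA text mode storing each screen cell as a character byte followed by a color byte, which is why the 18-byte string needs a 36-byte array. A minimal, hosted sketch of that interleaving step follows; the `colorize` helper and its const-generic signature are hypothetical and are not part of the post's code.

```rust
/// Interleaves each character byte with `color`, producing the
/// (character, attribute) byte pairs used by VGA text mode.
/// Hypothetical helper: `N` is the text length, `M` must be `2 * N`.
fn colorize<const N: usize, const M: usize>(text: &[u8; N], color: u8) -> [u8; M] {
    assert!(M == 2 * N, "output must hold two bytes per character");
    // Start with every byte set to the color; odd offsets keep it.
    let mut cells = [color; M];
    for (i, &ch) in text.iter().enumerate() {
        // Even offsets hold the code page 437 character byte.
        cells[i * 2] = ch;
    }
    cells
}
```

With the string from the comment, `let cells: [u8; 36] = colorize(b"\x02\x01 Hello World! \x01\x02", 0x1f);` yields the two smiley characters (code points 0x01 and 0x02) at both ends, each followed by the color byte 0x1f.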

    suvash

    omg. this is brilliant, almost forgot about these (ascii?) codes. Thanks !

    Aleksey Kladov

    Awesome! I've made a buffer overrun error, because I've added a comma to the "Hello, World!", and Rust has actually caught it at run time, and started looping.

    Using


    #[lang = "panic_fmt"]
    extern fn panic_fmt() -> ! {
        let buffer_ptr = (0xb8000) as *mut _;
        let red = 0x4f;
        unsafe {
            *buffer_ptr = [b'P', red, b'a', red, b'n', red, b'i', red, b'c', red, b'!', red];
        };
        loop { }
    }

    Helped a lot.

    Erdos

    Using the most recent nightly build for this, the no-landing-pads snippet also generates SSE, so it's a bit of a two-for-one. :)

    Philipp Oppermann

    Thanks a lot! I fixed it by removing the superfluous let mut a = ("hello", 42); a.1 += 1; snippet (see PR #153).

    tsurai

    I think that libcore needs both SSE and SSE2 to be supported. Shouldn't you check the SSE2 CPUID flag to be sure that both SSE and SSE2 are present? Not sure if it could cause any problems within libcore later on

    Philipp Oppermann

    Good catch! However, SSE2 should always be available if the long mode is available. Citing the OSDev wiki:

    When the X86-64 architecture was introduced, AMD demanded a minimum level of SSE support to simplify OS code. Any system capable of long mode should support at least SSE and SSE2

    So SSE and SSE2 should always be available in our case (if the wiki is correct). So we could even remove the SSE check. However, I think it's better to leave it in, because we enable SSE before switching to long mode.
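
For readers who want to check both flags anyway, here is a small sketch of the bit test. It assumes the value of EDX after executing CPUID with EAX = 1 is already available, where bit 25 reports SSE and bit 26 reports SSE2. The constant and function names are hypothetical, and the post's own check is written in assembly rather than Rust.

```rust
/// Feature bits in EDX after CPUID with EAX = 1.
const CPUID_EDX_SSE: u32 = 1 << 25;
const CPUID_EDX_SSE2: u32 = 1 << 26;

/// Returns true only if both SSE and SSE2 are reported in `edx`.
/// Hypothetical helper; the caller is assumed to supply the raw EDX value.
fn has_sse_and_sse2(edx: u32) -> bool {
    let required = CPUID_EDX_SSE | CPUID_EDX_SSE2;
    (edx & required) == required
}
```

Taking the register value as a plain argument keeps the check independent of how CPUID is executed, so the same logic can be exercised on any host.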

    Don Rowe

    Thanks again for sharing this! FYI, the link https://doc.rust-lang.org/std/rt/unwind/ in http://os.phil-opp.com/set-... is broken.

    Philipp Oppermann

    Thanks! I've fixed it in #207.

    Sorry for the delay, your comment got buried in my inbox somehow…

    JamesLewn

    #CODE FOR ANIMATED TEXT
    If you want text that moves around the screen like a snake, take this code and replace your rust_main function with it:

    #[no_mangle]
    pub extern fn rust_main() {
        let color_byte: u16 = 0x1f;
        let ascii_byte: u16 = 32;
        let empty_2byte_character: u16 = (color_byte << 8) | ascii_byte;

        let mut poz = 0;
        while poz < 4000 {
            let buffer = (0xb8000 + poz) as *mut _;
            unsafe { *buffer = empty_2byte_character };
            poz += 2;
        }

        let text = b"ANIMATED TEXT!!!! ->";
        let color_byte = 0x1f;

        let mut text_colored = [color_byte; 44];
        for (i, char_byte) in text.into_iter().enumerate() {
            text_colored[i * 2] = *char_byte;
        }

        // animate
        let mut offset = 0;
        let mut done = false;
        let mut delay_counter = 0;

        while !done {
            // "poprzedni" is Polish for "previous": blank the cell the text just left
            let poprzedni = (0xb8000 + offset) as *mut _;
            unsafe { *poprzedni = empty_2byte_character };

            offset += 2;

            let buffer_ptr = (0xb8000 + offset) as *mut _;
            unsafe { *buffer_ptr = text_colored };

            if offset == (4000 - 44) {
                done = true;
            }
            while delay_counter < 10000000 {
                delay_counter += 1;
            }
            delay_counter = 0;
        }

        loop {}
    }
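
The animation code depends on two pieces of arithmetic: packing a character and its color into one 16-bit cell, and the 4000-byte size of the text buffer. A hosted sketch of both, assuming the standard 80x25 text mode (which matches the 4000 bytes used in the comment); the names `vga_cell`, `cell_offset`, `VGA_COLUMNS`, and `BYTES_PER_CELL` are hypothetical:

```rust
/// Assumed screen width of the standard 80x25 VGA text mode.
const VGA_COLUMNS: usize = 80;
/// Each cell is one character byte plus one color byte.
const BYTES_PER_CELL: usize = 2;

/// Packs a character and a color into one cell: the color goes into the
/// high byte and the character into the low byte, as in the comment's
/// `(color_byte << 8) | ascii_byte`.
fn vga_cell(ascii: u8, color: u8) -> u16 {
    ((color as u16) << 8) | (ascii as u16)
}

/// Byte offset of the cell at (`row`, `col`) from the start of the buffer.
fn cell_offset(row: usize, col: usize) -> usize {
    (row * VGA_COLUMNS + col) * BYTES_PER_CELL
}
```

For example, `vga_cell(32, 0x1f)` gives `0x1f20`, the blank cell used to clear the screen, and `cell_offset(25, 0)` gives 4000, the first byte past the end of an 80x25 buffer.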

    Aaron D

    With the latest rust nightly I was getting linker errors after pulling in the rlibc crate:

    target/x86_64-unknown-linux-gnu/debug/libblog_os.a(core-93f19628b61beb76.0.o): In function `core::panicking::panic_fmt':
    /buildslave/rust-buildbot/slave/nightly-dist-rustc-linux/build/src/libcore/panicking.rs:69: undefined reference to `rust_begin_unwind'
    make: *** [build/kernel-x86_64.bin] Error 1

    Apparently the later versions of the compiler are pretty strict about mangling almost anything they can for optimization. Usually the panic_fmt symbol becomes rust_begin_unwind (for some reason), but now it's getting mangled and so the linker can't find that symbol - it's a pretty cryptic error with discussion at https://github.com/rust-lan...

    To fix it, you need to mark panic_fmt with no_mangle as well, so the line in lib.rs becomes:
    #[lang = "panic_fmt"] #[no_mangle] extern fn panic_fmt() -> ! {loop{}}

    This allows it to build properly.

    yangxiaoyong

    Thank you for that!

    toor_

    Hello, I have reached the stage of panic = "abort", but when I make run I get this error: target/x86_64-os/debug/libos.a(core-9a5ada2b08448709.0.o): In function `core::panicking::panic_fmt': core.cgu-0.rs:(.text.cold._ZN4core9panicking9panic_fmt17h6b6d64bae0e8a2c2E+0x88): undefined reference to `rust_begin_unwind'. I really have no clue what is happening here, as I have exactly the same code as you do above.

    EDIT: I have read some of the above comments, turns out that other people were having the same issue as me and I have used their solutions. Sorry to waste your time.

    Anonym

    There is an error when changing the Makefile (roughly in the middle of this post). This part

    which stands for “garbage collect sections”. Let's add it to the $(kernel) target in our Makefile:

    $(kernel): xargo $(rust_os) $(assembly_object_files) $(linker_script)
        @ld -n --gc-sections -T $(linker_script) -o $(kernel) \
            $(assembly_object_files) $(rust_os)

    results in "no rule to make target 'xargo'".

    When changing it to this, it works again:

    $(kernel): kernel $(rust_os) $(assembly_object_files) $(linker_script)
        @ld -n --gc-sections -T $(linker_script) -o $(kernel) \
            $(assembly_object_files) $(rust_os)

    Hey, loving the tutorials :) though I'm running into an issue when using xargo build. When I do xargo build --target=x86_64-blog_os, I get the following error:

    error: failed to parse manifest at '/home/max/TesterOS/src/Cargo.toml'

    caused by: can't find library 'blog_os', rename file to 'src/lib.rs' or specify lib.path

    Could this be because of where I've saved my files? Because when I saw src/lib.rs in the tutorial, I just saved lib.rs in the src file we created. Is it something to do with where I placed my Cargo.toml or/and x86_64-blog_os.json file?

    Really confused here.

    Philipp Oppermann

    Cargo assumes that the lib.rs file is in a subfolder named src. So it doesn't work if you put the lib.rs next to the Cargo.toml.

    Thanks a million, it's all sorted now :). One more issue though. About the rlibc... make run seems to work fine without extern crate rlibc... But fails when I do add it in, saying it can't compile rlibc.

    Sorry to be a bother lol I'm a newb.

    Philipp Oppermann

    There is currently a problem with cargo/xargo, maybe this affects you: https://github.com/phil-opp/blog_os/issues/379

    {% endraw %} ================================================ FILE: blog/templates/edition-1/comments.html ================================================ {% macro comment(page) %} {% if page.path == "/multiboot-kernel/" %} {% include "edition-1/comments/multiboot-kernel.html" %} {% elif page.path == "/entering-longmode/" %} {% include "edition-1/comments/entering-longmode.html" %} {% elif page.path == "/set-up-rust/" %} {% include "edition-1/comments/set-up-rust.html" %} {% elif page.path == "/printing-to-screen/" %} {% include "edition-1/comments/printing-to-screen.html" %} {% elif page.path == "/allocating-frames/" %} {% include "edition-1/comments/allocating-frames.html" %} {% elif page.path == "/page-tables/" %} {% include "edition-1/comments/page-tables.html" %} {% elif page.path == "/remap-the-kernel/" %} {% include "edition-1/comments/remap-the-kernel.html" %} {% elif page.path == "/kernel-heap/" %} {% include "edition-1/comments/kernel-heap.html" %} {% elif page.path == "/handling-exceptions/" %} {% include "edition-1/comments/handling-exceptions.html" %} {% elif page.path == "/double-faults/" %} {% include "edition-1/comments/double-faults.html" %} {% elif page.path == "/catching-exceptions/" %} {% include "edition-1/comments/catching-exceptions.html" %} {% elif page.path == "/better-exception-messages/" %} {% include "edition-1/comments/better-exception-messages.html" %} {% elif page.path == "/returning-from-exceptions/" %} {% include "edition-1/comments/returning-from-exceptions.html" %} {% else %} No comments. {% endif %} {% endmacro comment %} ================================================ FILE: blog/templates/edition-1/handling-exceptions-with-naked-fns.html ================================================ {% extends "edition-1/section.html" %} {% block title %}{{ super() }}{% endblock title %} {% block main %}{{ super() }}{% endblock main %} {% block introduction %}

    These posts explain how to handle CPU exceptions using naked functions. Historically, these posts were the main exception handling posts before the x86-interrupt calling convention and the x86_64 crate existed. Our new way of handling exceptions can be found in the “Handling Exceptions” post.

    {% endblock introduction %} ================================================ FILE: blog/templates/edition-1/index.html ================================================ {% extends "edition-1/base.html" %} {% import "edition-1/macros.html" as macros %} {% block title %}{{ config.title }}{% endblock title %} {% block main %} {% set posts_section = get_section(path = "edition-1/posts/_index.md") %} {% set posts = posts_section.pages %}

    Writing an OS in Rust (First Edition)

    This blog series creates a small operating system in the Rust programming language. Each post is a small tutorial and includes all needed code, so you can follow along if you like. The source code is also available in the corresponding Github repository.

    Latest post: {% set latest_post = posts|last %} {{ latest_post.title }}

    No longer updated! You are viewing the first edition of “Writing an OS in Rust”, which is no longer updated. You can find the second edition here.
    {{ macros::post_link(page=posts.0) }} {{ macros::post_link(page=posts.1) }} {{ macros::post_link(page=posts.2) }} {{ macros::post_link(page=posts.3) }}
    {{ macros::post_link(page=posts.4) }} {{ macros::post_link(page=posts.5) }} {{ macros::post_link(page=posts.6) }} {{ macros::post_link(page=posts.7) }}
    {{ macros::post_link(page=posts.8) }} {{ macros::post_link(page=posts.9) }}

    {% set extra = get_section(path = "edition-1/extra/_index.md") %}

    {{ extra.title }}

      {% for subsection_path in extra.subsections %} {% set subsection = get_section(path=subsection_path) %}
    • {{ subsection.title }}
    • {% endfor %} {% for page in extra.pages %}
    • {{ page.title }}
    • {% endfor %}
    {% endblock main %} ================================================ FILE: blog/templates/edition-1/macros.html ================================================ {% macro post_link(page) %} {% set translations = page.translations | filter(attribute="lang", value=lang) -%} {%- if translations -%} {%- set post = get_page(path = translations.0.path) -%} {%- else -%} {%- set post = page -%} {%- set not_translated = true -%} {%- endif -%}

    {{ post.title }}

    {{ post.summary | safe }} read more » {%- if lang and not_translated and lang != config.default_language -%} {%- endif -%}
    {% endmacro post_link %} {% macro toc(toc) %}
    Table of Contents
    {% endmacro toc %} ================================================ FILE: blog/templates/edition-1/page.html ================================================ {% extends "edition-1/base.html" %} {% import "edition-1/comments.html" as comments %} {% block title %}{{ page.title }} | {{ config.title }}{% endblock title %} {% block main %}

    {{ page.title }}

    No longer updated! You are viewing a post of the first edition of “Writing an OS in Rust”, which is no longer updated. You can find the second edition here.
    {{ page.content | safe }} {% endblock main %} {% block after_main %}

    Comments (Archived)

    {{ comments::comment(page=page) }}
    {% endblock after_main %} ================================================ FILE: blog/templates/edition-1/section.html ================================================ {% extends "edition-1/base.html" %} {% import "edition-1/macros.html" as macros %} {% block title %}{{ section.title }} | {{ config.title }}{% endblock title %} {% block main %}

    {{ section.title }}

    {% block introduction %}{% endblock introduction %}
    {% for page in section.pages %} {{ macros::post_link(page=page) }} {% endfor %}
    {% endblock main %} ================================================ FILE: blog/templates/edition-2/base.html ================================================ {% if current_url %} {% endif %} {% block title %}{% endblock title %}

    {{ config.title | safe }}

    {{ config.extra.subtitle | replace(from=" ", to=" ") | safe }}

    {% block header %}{% endblock header %}
    {% block toc_aside %}{% endblock toc_aside %}
    {% block main %}{% endblock main %}
    {% block after_main %}{% endblock after_main %}
    ================================================ FILE: blog/templates/edition-2/extra.html ================================================ {% extends "edition-2/base.html" %} {% import "snippets.html" as snippets %} {% block title %}{{ page.title }} | {{ config.title }}{% endblock title %} {% block description -%} {{ page.summary | safe | striptags | truncate(length=150) }} {%- endblock description %} {% block main %}

    {{ page.title }}

    {{ page.content | safe }} {% endblock main %} {% block after_main %}

    Comments

    {{ snippets::giscus(search_term=page.title ~ " (Extra Post)", lang=page.lang) }}
    {% endblock after_main %} ================================================ FILE: blog/templates/edition-2/index.html ================================================ {% extends "edition-2/base.html" %} {% import "edition-2/macros.html" as macros %} {% import "snippets.html" as snippets %} {% block title %}{{ config.title }}{% endblock title %} {% block main %} {% set posts_section = get_section(path = "edition-2/posts/_index.md") %} {% set posts = posts_section.pages %} {{ section.content | replace(from="", to=macros::latest_post(posts=posts)) | safe }}
    {%- set chapter = "none" -%} {%- for post in posts -%} {%- if post.extra["chapter"] -%} {%- if post.extra["chapter"] != chapter -%} {# Begin new chapter #} {%- set_global chapter = post.extra["chapter"] -%}
    {%- endif -%} {%- endif -%} {{ macros::post_link(page=post) }} {%- endfor -%}

    Status Updates

    {% set status_updates = get_section(path = "status-update/_index.md") %}

    {{ status_updates.description }}

      {% include "auto/status-updates-truncated.html" %}
    • view all »

    First Edition

    You are currently viewing the second edition of “Writing an OS in Rust”. The first edition is very different in many aspects, for example it builds upon the GRUB bootloader instead of using the `bootloader` crate. In case you're interested in it, it is still available. Note that the first edition is no longer updated and might contain outdated information. read the first edition »

    {{ trans(key="support_me", lang=lang) | safe }}
    {% endblock main %} {% block after_main %} {% endblock after_main %} ================================================ FILE: blog/templates/edition-2/macros.html ================================================ {% macro latest_post(posts) %} {% set post = posts|last %} {{ post.title }} {% endmacro latest_post %} {% macro post_link(page) %} {% set translations = page.translations | filter(attribute="lang", value=lang) -%} {%- if translations -%} {%- set post = get_page(path = translations.0.path) -%} {%- else -%} {%- set post = page -%} {%- set not_translated = true -%} {%- endif -%}

    {{ post.title }}

    {{ post.summary | safe }} {{ trans(key="readmore", lang=lang) | safe }} {%- if lang and not_translated and lang != config.extra.default_language -%} {%- endif -%}
    {% endmacro post_link %} {% macro toc(toc) %}
    {{ trans(key="toc", lang=lang) }}
    {% endmacro toc %} ================================================ FILE: blog/templates/edition-2/page.html ================================================ {% extends "edition-2/base.html" %} {% import "edition-2/macros.html" as macros %} {% import "snippets.html" as snippets %} {% block title %}{{ page.title }} | {{ config.title }}{% endblock title %} {% block header %} {% if lang != "en" -%} {%- else -%} {%- endif %} {% endblock header %} {% block description -%} {{ page.summary | safe | striptags | truncate(length=150) }} {%- endblock description %} {% block toc_aside %} {% endblock toc_aside %} {% block main %}

    {{ page.title }}

    {% if page.extra.warning %}
    {% if page.extra.warning_short %} {{ page.extra.warning_short }} {% endif %} {{ page.extra.warning | markdown(inline=true) | safe }}
    {% endif %} {%- if page.lang != "en" %}
    {% set translations = page.translations | filter(attribute="lang", value="en") %} {% set original = translations.0 %}

    {{ trans(key="translated_content", lang=lang) }} {{ trans(key="translated_content_notice", lang=lang) | replace(from="_original.permalink_", to=original.permalink) | replace(from="_original.title_", to=original.title) | safe }}

    {%- if page.extra.translators %}

    {{ trans(key="translated_by", lang=lang) }} {% for user in page.extra.translators -%} {%- if not loop.first -%} {%- if loop.last %} {{ trans(key="word_separator", lang=lang) }} {% else %}, {% endif -%} {%- endif -%} @{{user}} {%- endfor %}. {%- if page.extra.translation_contributors %} {{ trans(key="translation_contributors", lang=lang) }} {% for user in page.extra.translation_contributors -%} {%- if not loop.first -%} {%- if loop.last %} {{ trans(key="word_separator", lang=lang) }} {% else %}, {% endif -%} {%- endif -%} @{{user}} {%- endfor %}. {% endif -%}

    {% endif -%}
    {% endif %}
    {{ page.content | replace(from="", to=macros::toc(toc=page.toc)) | safe }}


    {{ trans(key="comments", lang=lang) }}

    {% if page.extra.comments_search_term %} {% set search_term=page.extra.comments_search_term %} {% elif page.lang != "en" %} {% set translations = page.translations | filter(attribute="lang", value="en") %} {% set original = translations.0 %} {% set search_term=original.title ~ " (" ~ page.lang ~ ")" %} {% else %} {% set search_term=page.title %} {% endif %} {{ snippets::giscus(search_term=search_term, lang=page.lang) }} {%- if page.lang != "en" %}

    {{ trans(key="comments_notice", lang=lang) }}

    {% endif %}
    {% endblock main %} ================================================ FILE: blog/templates/edition-2/section.html ================================================ {% extends "base.html" %} {% import "edition-2/macros.html" as macros %} {% block title %}{{ section.title }} | {{ config.title }}{% endblock title %} {% block main %}

    {{ section.title }}

    {% block introduction %}{% endblock introduction %}
    {% for page in section.pages %} {{ macros::post_link(page=page) }} {% endfor %}
    {% endblock main %} ================================================ FILE: blog/templates/index.html ================================================ {% extends "edition-2/index.html" %} ================================================ FILE: blog/templates/news-page.html ================================================ {% extends "base.html" %} {% import "snippets.html" as snippets %} {% block title %}{{ page.title }} | {{ config.title }}{% endblock title %} {% block main %}

    {{ page.title }}

    {{ page.content | safe }} {% endblock main %} {% block after_main %}

    Comments

    {{ snippets::giscus(search_term=page.title ~ " (News Post)", lang=page.lang) }}
    {% endblock after_main %} ================================================ FILE: blog/templates/news-section.html ================================================ {% extends "base.html" %} {% block title %}{{ section.title }} | {{ config.title }}{% endblock title %} {% block main %}

    {{ section.title }}

    {% block introduction %}{% endblock introduction %}
    {% for page in section.pages %}

    {{ page.title }}

    {{ page.summary | safe}} read more…
    {% endfor %}
    {% endblock main %} ================================================ FILE: blog/templates/plain.html ================================================ {% extends "base.html" %} {% block title %}{{ page.title }} | {{ config.title }}{% endblock title %} {% block main %}

    {{ page.title }}

    {{ page.content | safe }} {% endblock main %} ================================================ FILE: blog/templates/redirect-to-frontpage.html ================================================ ================================================ FILE: blog/templates/rss.xml ================================================ {{ config.title }} {{ config.base_url | safe }} {{ config.description }} Zola {{ lang }} {{ last_updated | date(format="%a, %d %b %Y %H:%M:%S %z") }} {% for page in pages %} {{ page.title }} {{ page.date | date(format="%a, %d %b %Y %H:%M:%S %z") }} {{ page.permalink | safe }} {{ page.permalink | safe }} {{ page.content }} {% endfor %} ================================================ FILE: blog/templates/section.html ================================================ {% extends "edition-2/section.html" %} ================================================ FILE: blog/templates/snippets.html ================================================ {% macro giscus(search_term, lang) %} {% if lang != "en" %} {% set category = "Post Comments (translated)" %} {% set category_id = "DIC_kwDOAlvePc4CPg4c" %} {% set category_path = "post-comments-translated" %} {% else %} {% set category = "Post Comments" %} {% set category_id = "MDE4OkRpc2N1c3Npb25DYXRlZ29yeTMzMDE4OTg1" %} {% set category_path = "post-comments" %} {% endif %} {% if search_term is number %} {% set discussion_url = "https://github.com/phil-opp/blog_os/discussions/" ~ search_term %} {% else %} {% set search_term_encoded = `"` ~ search_term ~ `"` ~ ` in:title` | urlencode %} {% set discussion_url = `https://github.com/phil-opp/blog_os/discussions/categories/` ~ category_path ~ `?discussions_q=` ~ search_term_encoded %} {% endif %}

    {{ trans(key="comment_note", lang=lang) | replace(from="_discussion_url_", to=discussion_url) | safe }}

    Instead of authenticating the giscus application, you can also comment directly on GitHub.

    {% endmacro giscus %} ================================================ FILE: blog/templates/status-update-page.html ================================================ {% extends "base.html" %} {% import "snippets.html" as snippets %} {% block title %}{{ page.title }} | {{ config.title }}{% endblock title %} {% block main %}

    {{ page.title }}

    {{ page.content | safe }}

    Thank You!

    Thanks a lot to all the contributors this month!

    I also want to thank all the people who support me on GitHub, Patreon, and Donorbox. It means a lot to me!

    {% endblock main %} {% block after_main %}

    Comments

    {{ snippets::giscus(search_term=page.title, lang=page.lang) }}
    {% endblock after_main %} ================================================ FILE: blog/templates/status-update-section.html ================================================ {% extends "base.html" %} {% block title %}{{ section.title }} | {{ config.title }}{% endblock title %} {% block main %}

    {{ section.title }}

    {% block introduction %}

    {{ section.description }}

    {% endblock introduction %}
      {% include "auto/status-updates.html" %} {% for page in section.pages %}
    • {{ page.title }}
    • {% endfor %}
    {% endblock main %} ================================================ FILE: blog/typos.toml ================================================ [files] extend-exclude = [ "*.svg", "*.fr.md", "*.es.md", "*.pt-BR.md", "blog/config.toml", ] [default.extend-words] IST = "IST" # Interrupt Stack Table SEH = "SEH" # structured exception handling [default.extend-identifiers] TheBegining = "TheBegining" # GitHub user mentioned in status reports h015bf61815bb8afe = "h015bf61815bb8afe" # mangled name used in code example ================================================ FILE: docker/.bash_aliases ================================================ PS1="\[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[1;35m\]<$IMAGE_NAME>\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ " ================================================ FILE: docker/Dockerfile ================================================ FROM rustlang/rust:nightly ENV IMAGE_NAME=blog_os-docker RUN apt-get update && \ apt-get install -q -y --no-install-recommends \ nasm \ binutils \ grub-common \ xorriso \ grub-pc-bin && \ apt-get autoremove -q -y && \ apt-get clean -q -y && \ rm -rf /var/lib/apt/lists/* && \ cargo install xargo && \ rustup component add rust-src ENV GOSU_VERSION 1.10 RUN set -ex; \ \ fetchDeps=' \ ca-certificates \ wget \ '; \ apt-get update; \ apt-get install -y --no-install-recommends $fetchDeps; \ rm -rf /var/lib/apt/lists/*; \ \ dpkgArch="$(dpkg --print-architecture | awk -F- '{ print $NF }')"; \ wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$dpkgArch"; \ chmod +x /usr/local/bin/gosu; \ # verify that the binary works gosu nobody true; COPY entrypoint.sh /usr/local/bin/ COPY .bash_aliases /etc/skel/ ENTRYPOINT ["/usr/local/bin/entrypoint.sh"] CMD ["/bin/bash"] ================================================ FILE: docker/README.md ================================================ # Building Blog OS using Docker Inspired by [redox]. 
You just need `git`, `make`, and `docker`. It is better to use a non-privileged user to run the `docker` command, which is usually achieved by adding the user to the `docker` group. ## Run the container to build Blog OS You can build the docker image using `make docker_build` and run it using `make docker_run`. ## Run the container interactively You can use the `make` target `docker_interactive` to get a shell in the container. ## Clear the toolchain caches (Cargo & Rustup) To clean the docker volumes used by the toolchain, you just need to run `make docker_clean`. [redox]: https://github.com/redox-os/redox ## License The source code is dual-licensed under MIT or the Apache License (Version 2.0). This excludes the `blog` directory. ================================================ FILE: docker/entrypoint.sh ================================================ #!/bin/sh USER_NAME=blogos USER_UID=${LOCAL_UID:-9001} USER_GID=${LOCAL_GID:-9001} groupadd --non-unique --gid $USER_GID $USER_NAME useradd --non-unique --create-home --uid $USER_UID --gid $USER_GID $USER_NAME export HOME=/home/$USER_NAME TESTFILE=$RUSTUP_HOME/settings.toml CACHED_UID=$(stat -c "%u" $TESTFILE) CACHED_GID=$(stat -c "%g" $TESTFILE) if [ $CACHED_UID != $USER_UID ] || [ $USER_GID != $CACHED_GID ]; then chown $USER_UID:$USER_GID -R $CARGO_HOME $RUSTUP_HOME fi exec gosu $USER_NAME "$@" ================================================ FILE: giscus.json ================================================ { "origins": ["https://os.phil-opp.com"] } ================================================ FILE: scripts/merge.fish ================================================ set original (git rev-parse --abbrev-ref HEAD) for x in (seq 99) set previous (printf "post-%02i" $x) set current (printf "post-%02i" (math $x + 1)) if not git checkout $current --quiet break end if not git merge $previous --no-edit break end end git checkout $original --quiet ================================================ FILE: 
scripts/push.fish ================================================ set original (git rev-parse --abbrev-ref HEAD) set list for x in (seq 99) set current (printf "post-%02i" $x) if not git checkout $current --quiet break end set list $list $current end git push origin $list git checkout $original --quiet

    Go to the index page.