Repository: rust-secure-code/cargo-supply-chain
Branch: master
Commit: c89655c002c9
Files: 22
Total size: 74.2 KB

Directory structure:
gitextract_8jmx9lol/
├── .github/
│   └── workflows/
│       └── rust.yml
├── .gitignore
├── CHANGELOG.md
├── Cargo.toml
├── LICENSE-APACHE
├── LICENSE-MIT
├── LICENSE-ZLIB
├── README.md
├── fixtures/
│   └── optional_non_dev_dep/
│       ├── Cargo.toml
│       └── src/
│           └── lib.rs
└── src/
    ├── api_client.rs
    ├── cli.rs
    ├── common.rs
    ├── crates_cache.rs
    ├── main.rs
    ├── publishers.rs
    └── subcommands/
        ├── crates.rs
        ├── json.rs
        ├── json_schema.rs
        ├── mod.rs
        ├── publishers.rs
        └── update.rs

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/rust.yml
================================================
name: Rust CI

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        rust: [stable]
    steps:
      - uses: actions/checkout@v2
      - run: rustup default ${{ matrix.rust }}
      - name: build
        run: >
          cargo build --verbose
      - name: test
        run: >
          cargo test --tests
  rustfmt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
          override: true
          components: rustfmt
      - name: Run rustfmt check
        uses: actions-rs/cargo@v1
        with:
          command: fmt
          args: -- --check
  doc:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        rust: [stable]
    steps:
      - uses: actions/checkout@v2
      - run: rustup default ${{ matrix.rust }}
      - name: doc
        run: >
          cargo doc --no-deps --document-private-items --all-features

================================================
FILE: .gitignore
================================================
target

================================================
FILE: CHANGELOG.md
================================================
## v0.3.7 (2026-02-05)

- Updated the caching code to handle the recent changes to crates.io dump format

## v0.3.6 (2026-01-22)

- Fixed the tool
reporting transitive optional dependencies that are disabled by features as part of supply chain surface
- Removed test JSON data from the git tree, matching the crates.io package to the git state again
- Upgraded to cargo-metadata v0.23

## v0.3.5 (2025-09-18)

- Fixed support for Windows by switching from `xdg` crate to `dirs` crate for discovering the cache directory

## v0.3.4 (2025-06-04)

- Improved the message displayed when the latest data dump is considered outdated (contribution by @smoelius)
- Bumped dependencies in Cargo.lock by running `cargo update`
- Resolved some Clippy lints

## v0.3.3 (2023-05-08)

- Add `--no-dev` flag to omit dev dependencies (contribution by @smoelius)

## v0.3.2 (2022-11-04)

- Upgrade to `bpaf` 0.7

## v0.3.1 (2021-03-18)

- Fix `--features` flag not being honored if `--target` is also passed

## v0.3.0 (2021-03-18)

- Renamed `--cache_max_age` to `--cache-max-age` for consistency with Cargo flags
- Accept flags such as `--target` directly, without relying on the escape hatch of passing cargo metadata arguments after `--`
- No longer default to `--all-features`, handle features via the same flags as Cargo itself
- The JSON schema is now printed separately, use `cargo supply-chain json --print-schema` to get it
- Dropped the `help` subcommand. Use `--help` instead, e.g.
`cargo supply-chain crates --help`

Internal improvements:

- Migrate to bpaf CLI parser, chosen for its balance of expressiveness vs complexity and supply chain sprawl
- Add tests for the CLI interface
- Do not regenerate the JSON schema on every build; saves a bit of build time and a bit of dependencies in production builds

## v0.2.0 (2021-05-21)

- Added `json` subcommand providing structured output and more details
- Added `-d`, `--diffable` flag for diff-friendly output mode to all subcommands
- Reduced the required download size for `update` subcommand from ~350Mb to ~60Mb
- Added a detailed progress bar to all subcommands using `indicatif`
- Fixed interrupted `update` subcommand considering its cache to be fresh. Other subcommands were not affected and would simply fetch live data.
- If a call to `cargo metadata` fails, show an error instead of panicking
- The list of crates in the output of `publishers` subcommand is now sorted

## v0.1.2 (2021-02-24)

- Fix help text sometimes being misaligned
- Change download progress messages to start counting from 1 rather than from 0
- Only print warnings about crates.io that are immediately relevant to listing dependencies and publishers

## v0.1.1 (2021-02-18)

- Drop extraneous files from the tarball uploaded to crates.io

## v0.1.0 (2021-02-18)

- Drop `authors` subcommand
- Add `help` subcommand providing detailed help for each subcommand
- Bring help text more in line with Cargo help text
- Warn about a large amount of data to be downloaded in `update` subcommand
- Buffer reads and writes to cache files for a 6x speedup when using cache

## v0.0.4 (2021-01-01)

- Report failure instead of panicking on network failure in `update` subcommand
- Correctly handle errors returned by the remote server

## v0.0.3 (2020-12-28)

- In case of network failure, retry with exponential backoff up to 3 times
- Use local certificate store instead of bundling the trusted CA certificates
- Refactor argument parsing to use `pico-args` instead
of hand-rolled parser

## v0.0.2 (2020-10-14)

- `crates` - Shows the people or groups with publisher rights for each crate.
- `publishers` - Is the reverse of `crates`, grouping by publisher instead.
- `update` - Caches the data dumps from `crates.io` to avoid crawling the web service when looking up publisher and author information.

## v0.0.1 (2020-10-02)

Initial release, supports one command:

- `authors` - Crawl through Cargo.toml of all crates and list their authors. Authors might be listed multiple times. For each author, differentiate if they are known by being mentioned in a crate from the local workspace or not. Support for crawling `crates.io` sourced packages is planned.
- `publishers` - Doesn't do anything right now.

================================================
FILE: Cargo.toml
================================================
[package]
name = "cargo-supply-chain"
version = "0.3.7"
description = "Gather author, contributor, publisher data on crates in your dependency graph"
repository = "https://github.com/rust-secure-code/cargo-supply-chain"
authors = ["Andreas Molzer ", "Sergey \"Shnatsel\" Davidoff "]
edition = "2018"
license = "Apache-2.0 OR MIT OR Zlib"
categories = ["development-tools::cargo-plugins", "command-line-utilities"]

[dependencies]
cargo_metadata = "0.23.0"
csv = "1.1"
flate2 = "1"
humantime = "2"
humantime-serde = "1"
ureq = { version = "2.0.1", default-features = false, features = ["tls", "native-certs", "json"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
tar = "0.4.30"
indicatif = "0.17.0"
bpaf = { version = "0.9.1", features = ["derive", "dull-color"] }
anyhow = "1.0.28"
dirs = "6.0.0"

[dev-dependencies]
schemars = "0.8.3"

================================================
FILE: LICENSE-APACHE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. 
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. 
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ================================================ FILE: LICENSE-MIT ================================================ MIT License Copyright (c) 2020 Andreas Molzer aka. HeroicKatora Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

================================================
FILE: LICENSE-ZLIB
================================================
Copyright (c) 2020 Andreas Molzer aka. HeroicKatora

This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.

================================================
FILE: README.md
================================================
# cargo-supply-chain

Gather author, contributor and publisher data on crates in your dependency graph.

Use cases include:

- Find people and groups worth supporting.
- Identify risks in your dependency graph.
- An analysis of all the contributors you implicitly trust by building their software. This might have both a sobering and humbling effect.

Sample output when run on itself: [`publishers`](https://gist.github.com/Shnatsel/3b7f7d331d944bb75b2f363d4b5fb43d), [`crates`](https://gist.github.com/Shnatsel/dc0ec81f6ad392b8967e8d3f2b1f5f80), [`json`](https://gist.github.com/Shnatsel/511ad1f87528c450157ef9ad09984745).
## Usage

To install this tool, please run the following command:

```shell
cargo install cargo-supply-chain
```

Then run it with:

```shell
cargo supply-chain publishers
```

By default the supply chain is listed for **all targets** and **default features only**. You can alter this behavior by passing `--target=…` to list dependencies for a specific target. You can use `--all-features`, `--no-default-features`, and `--features=…` to control feature selection.

Here's a list of subcommands:

```none
Gather author, contributor and publisher data on crates in your dependency graph

Usage: COMMAND [ARG]…

Available options:
    -h, --help     Prints help information
    -v, --version  Prints version information

Available commands:
    publishers  List all crates.io publishers in the dependency graph
    crates      List all crates in dependency graph and crates.io publishers for each
    json        Like 'crates', but in JSON and with more fields for each publisher
    update      Download the latest daily dump from crates.io to speed up other commands

Most commands also accept flags controlling the features, targets, etc.
See 'cargo supply-chain <command> --help' for more information on a specific command.
```

## License

Triple licensed under any of Apache-2.0, MIT, or zlib terms.
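## Examples

A few example invocations combining the flags from the Usage section. These are illustrative sketches: the flags are the ones the CLI defines above, but the exact output depends on your workspace, and the target triple shown is just an example.

```shell
# Inspect publishers only for dependencies built on a specific target
cargo supply-chain publishers --target=x86_64-unknown-linux-gnu

# List crates with all features enabled, in diff-friendly form
cargo supply-chain crates --all-features --diffable

# Refresh the local crates.io dump, then get machine-readable output,
# accepting a cache up to one day old
cargo supply-chain update --cache-max-age=1d
cargo supply-chain json --cache-max-age=1d
```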
================================================
FILE: fixtures/optional_non_dev_dep/Cargo.toml
================================================
[package]
name = "optional_non_dev_dep"
version = "0.1.0"
edition = "2024"
publish = false

[dependencies]
libz-rs-sys = { version = "=0.5.5", optional = true }

[dev-dependencies]
libz-rs-sys = "=0.5.5"

[workspace]

================================================
FILE: fixtures/optional_non_dev_dep/src/lib.rs
================================================
pub fn add(left: u64, right: u64) -> u64 {
    left + right
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_works() {
        let result = add(2, 2);
        assert_eq!(result, 4);
    }
}

================================================
FILE: src/api_client.rs
================================================
use std::time::{Duration, Instant};

pub struct RateLimitedClient {
    last_request_time: Option<Instant>,
    agent: ureq::Agent,
}

impl Default for RateLimitedClient {
    fn default() -> Self {
        RateLimitedClient {
            last_request_time: None,
            agent: ureq::agent(),
        }
    }
}

impl RateLimitedClient {
    pub fn new() -> Self {
        RateLimitedClient::default()
    }

    pub fn get(&mut self, url: &str) -> ureq::Request {
        self.wait_to_honor_rate_limit();
        self.agent.get(url).set(
            "User-Agent",
            "cargo supply-chain (https://github.com/rust-secure-code/cargo-supply-chain)",
        )
    }

    /// Waits until at least 1 second has elapsed since last request,
    /// as per the crates.io crawler rate limit policy
    fn wait_to_honor_rate_limit(&mut self) {
        if let Some(prev_req_time) = self.last_request_time {
            let next_req_time = prev_req_time + Duration::from_secs(1);
            if let Some(time_to_wait) = next_req_time.checked_duration_since(Instant::now()) {
                std::thread::sleep(time_to_wait);
            }
        }
        self.last_request_time = Some(Instant::now());
    }
}

================================================
FILE: src/cli.rs
================================================
use bpaf::*;
use std::{path::PathBuf, time::Duration};

/// Arguments to be passed to `cargo metadata`
#[derive(Clone, Debug, Bpaf)]
#[bpaf(generate(meta_args))]
pub struct MetadataArgs {
    // `all_features` and `no_default_features` are not mutually exclusive in `cargo metadata`,
    // in the sense that it will not error out when encountering them; it just follows `all_features`
    /// Activate all available features
    pub all_features: bool,
    /// Do not activate the `default` feature
    pub no_default_features: bool,
    /// Ignore dev-only dependencies
    pub no_dev: bool,
    // This is a `String` because we don't parse the value, just pass it on to `cargo metadata` blindly
    /// Space or comma separated list of features to activate
    #[bpaf(argument("FEATURES"))]
    pub features: Option<String>,
    /// Only include dependencies matching the given target-triple
    #[bpaf(argument("TRIPLE"))]
    pub target: Option<String>,
    /// Path to Cargo.toml
    #[bpaf(argument("PATH"))]
    pub manifest_path: Option<PathBuf>,
}

/// Arguments for typical querying commands - crates, publishers, json
#[derive(Clone, Debug, Bpaf)]
#[bpaf(generate(args))]
pub(crate) struct QueryCommandArgs {
    #[bpaf(external)]
    pub cache_max_age: Duration,
    /// Make output more friendly towards tools such as `diff`
    #[bpaf(short, long)]
    pub diffable: bool,
}

#[derive(Clone, Debug, Bpaf)]
pub(crate) enum PrintJson {
    /// Print JSON schema and exit
    #[bpaf(long("print-schema"))]
    Schema,
    Info {
        #[bpaf(external)]
        args: QueryCommandArgs,
        #[bpaf(external)]
        meta_args: MetadataArgs,
    },
}

/// Gather author, contributor and publisher data on crates in your dependency graph
///
///
/// Most commands also accept flags controlling the features, targets, etc.
/// See 'cargo supply-chain <command> --help' for more information on a specific command.
#[derive(Clone, Debug, Bpaf)]
#[bpaf(options("supply-chain"), generate(args_parser), version)]
pub(crate) enum CliArgs {
    /// Lists all crates.io publishers in the dependency graph and owned crates for each
    ///
    ///
    /// If a local cache created by 'update' subcommand is present and up to date,
    /// it will be used. Otherwise live data will be fetched from the crates.io API.
    #[bpaf(command)]
    Publishers {
        #[bpaf(external)]
        args: QueryCommandArgs,
        #[bpaf(external)]
        meta_args: MetadataArgs,
    },
    /// List all crates in dependency graph and crates.io publishers for each
    ///
    ///
    /// If a local cache created by 'update' subcommand is present and up to date,
    /// it will be used. Otherwise live data will be fetched from the crates.io API.
    #[bpaf(command)]
    Crates {
        #[bpaf(external)]
        args: QueryCommandArgs,
        #[bpaf(external)]
        meta_args: MetadataArgs,
    },
    /// Detailed info on publishers of all crates in the dependency graph, in JSON
    ///
    /// The JSON schema is also available, use --print-schema to get it.
    ///
    /// If a local cache created by 'update' subcommand is present and up to date,
    /// it will be used. Otherwise live data will be fetched from the crates.io API.
    #[bpaf(command)]
    Json(#[bpaf(external(print_json))] PrintJson),
    /// Download the latest daily dump from crates.io to speed up other commands
    ///
    ///
    /// If the local cache is already younger than specified in '--cache-max-age' option,
    /// a newer version will not be downloaded.
    ///
    /// Note that this downloads the entire crates.io database, which is hundreds of Mb of data!
    /// If you are on a metered connection, you should not be running the 'update' subcommand.
    /// Instead, rely on requests to the live API - they are slower, but use much less data.
    #[bpaf(command)]
    Update {
        #[bpaf(external)]
        cache_max_age: Duration,
    },
}

fn cache_max_age() -> impl Parser<Duration> {
    long("cache-max-age")
        .help(
            "\
The cache will be considered valid while younger than specified.
The format is a human readable duration such as `1w` or `1d 6h`.
If not specified, the cache is considered valid for 48 hours.",
        )
        .argument::<String>("AGE")
        .parse(|text| humantime::parse_duration(&text))
        .fallback(Duration::from_secs(48 * 3600))
}

#[cfg(test)]
mod tests {
    use super::*;

    fn parse_args(args: &[&str]) -> Result<CliArgs, ParseFailure> {
        args_parser().run_inner(Args::from(args))
    }

    #[test]
    fn test_cache_max_age_parser() {
        let _ = parse_args(&["crates", "--cache-max-age", "7d"]).unwrap();
        let _ = parse_args(&["crates", "--cache-max-age=7d"]).unwrap();
        let _ = parse_args(&["crates", "--cache-max-age=1w"]).unwrap();
        let _ = parse_args(&["crates", "--cache-max-age=1m"]).unwrap();
        let _ = parse_args(&["crates", "--cache-max-age=1s"]).unwrap();
        // erroneous invocations that must be rejected
        assert!(parse_args(&["crates", "--cache-max-age"]).is_err());
        assert!(parse_args(&["crates", "--cache-max-age=5"]).is_err());
    }

    #[test]
    fn test_accepted_query_options() {
        for command in ["crates", "publishers", "json"] {
            let _ = args_parser().run_inner(&[command][..]).unwrap();
            let _ = args_parser().run_inner(&[command, "-d"][..]).unwrap();
            let _ = args_parser()
                .run_inner(&[command, "--diffable"][..])
                .unwrap();
            let _ = args_parser()
                .run_inner(&[command, "--cache-max-age=7d"][..])
                .unwrap();
            let _ = args_parser()
                .run_inner(&[command, "-d", "--cache-max-age=7d"][..])
                .unwrap();
            let _ = args_parser()
                .run_inner(&[command, "--diffable", "--cache-max-age=7d"][..])
                .unwrap();
        }
    }

    #[test]
    fn test_accepted_update_options() {
        let _ = args_parser().run_inner(Args::from(&["update"])).unwrap();
        let _ = parse_args(&["update", "--cache-max-age=7d"]).unwrap();
        // erroneous invocations that must be rejected
        assert!(parse_args(&["update", "-d"]).is_err());
        assert!(parse_args(&["update", "--diffable"]).is_err());
        assert!(parse_args(&["update", "-d", "--cache-max-age=7d"]).is_err());
        assert!(parse_args(&["update", "--diffable", "--cache-max-age=7d"]).is_err());
    }

    #[test]
    fn test_json_schema_option() {
        let _ = parse_args(&["json", "--print-schema"]).unwrap();
        // erroneous
// invocations that must be rejected
        assert!(parse_args(&["json", "--print-schema", "-d"]).is_err());
        assert!(parse_args(&["json", "--print-schema", "--diffable"]).is_err());
        assert!(parse_args(&["json", "--print-schema", "--cache-max-age=7d"]).is_err());
        assert!(
            parse_args(&["json", "--print-schema", "--diffable", "--cache-max-age=7d"]).is_err()
        );
    }

    #[test]
    fn test_invocation_through_cargo() {
        let _ = parse_args(&["supply-chain", "update"]).unwrap();
        let _ = parse_args(&["supply-chain", "publishers", "-d"]).unwrap();
        let _ = parse_args(&["supply-chain", "crates", "-d", "--cache-max-age=5h"]).unwrap();
        let _ = parse_args(&["supply-chain", "json", "--diffable"]).unwrap();
        let _ = parse_args(&["supply-chain", "json", "--print-schema"]).unwrap();
        // erroneous invocations to be rejected
        assert!(parse_args(&["supply-chain", "supply-chain", "json", "--print-schema"]).is_err());
        assert!(parse_args(&["supply-chain", "supply-chain", "crates", "-d"]).is_err());
    }
}

================================================
FILE: src/common.rs
================================================
use anyhow::bail;
use cargo_metadata::{
    CargoOpt::AllFeatures, CargoOpt::NoDefaultFeatures, DependencyKind, Metadata, MetadataCommand,
    NodeDep, Package, PackageId,
};
use std::collections::{HashMap, HashSet};

pub use crate::cli::MetadataArgs;

#[derive(Debug, Copy, Clone, Eq, PartialEq, Hash)]
#[cfg_attr(test, derive(serde::Deserialize, serde::Serialize))]
pub enum PkgSource {
    Local,
    CratesIo,
    Foreign,
}

#[derive(Debug, Clone)]
#[cfg_attr(test, derive(Eq, PartialEq, serde::Deserialize, serde::Serialize))]
pub struct SourcedPackage {
    pub source: PkgSource,
    pub package: Package,
}

fn metadata_command(args: MetadataArgs) -> MetadataCommand {
    let mut command = MetadataCommand::new();
    if args.all_features {
        command.features(AllFeatures);
    }
    if args.no_default_features {
        command.features(NoDefaultFeatures);
    }
    if let Some(path) = args.manifest_path {
        command.manifest_path(path);
    }
    let mut other_options =
Vec::new();
    if let Some(target) = args.target {
        other_options.push(format!("--filter-platform={}", target));
    }
    // `cargo-metadata` crate assumes we have a Vec of features,
    // but we really didn't want to parse it ourselves, so we pass the argument directly
    if let Some(features) = args.features {
        other_options.push(format!("--features={}", features));
    }
    command.other_options(other_options);
    command
}

pub fn sourced_dependencies(
    metadata_args: MetadataArgs,
) -> Result<Vec<SourcedPackage>, anyhow::Error> {
    let no_dev = metadata_args.no_dev;
    let command = metadata_command(metadata_args);
    let meta = match command.exec() {
        Ok(v) => v,
        Err(cargo_metadata::Error::CargoMetadata { stderr: e }) => bail!(e),
        Err(err) => bail!("Failed to fetch crate metadata!\n {}", err),
    };
    sourced_dependencies_from_metadata(meta, no_dev)
}

fn sourced_dependencies_from_metadata(
    meta: Metadata,
    no_dev: bool,
) -> Result<Vec<SourcedPackage>, anyhow::Error> {
    let mut how: HashMap<PackageId, PkgSource> = HashMap::new();
    let mut what: HashMap<PackageId, Package> = meta
        .packages
        .iter()
        .map(|package| (package.id.clone(), package.clone()))
        .collect();
    for pkg in &meta.packages {
        // Suppose every package is foreign, until proven otherwise..
        how.insert(pkg.id.clone(), PkgSource::Foreign);
    }
    // Find the crates.io dependencies..
    for pkg in &meta.packages {
        if let Some(source) = pkg.source.as_ref() {
            if source.is_crates_io() {
                how.insert(pkg.id.clone(), PkgSource::CratesIo);
            }
        }
    }
    for pkg in &meta.workspace_members {
        *how.get_mut(pkg).unwrap() = PkgSource::Local;
    }

    if no_dev {
        (how, what) = extract_non_dev_dependencies(&meta, &mut how, &mut what);
    }

    let dependencies: Vec<_> = how
        .iter()
        .map(|(id, kind)| {
            let dep = what.get(id).cloned().unwrap();
            SourcedPackage {
                source: *kind,
                package: dep,
            }
        })
        .collect();

    Ok(dependencies)
}

/// Start with the `PkgSource::Local` packages, then iteratively add non-dev-dependencies until no
/// more packages can be added, and return the results.
/// /// This function uses the resolved dependency graph from `cargo metadata` to determine which /// dependencies are actually used. This function does _not_ use the declared dependencies, which /// may include optional dependencies that aren't actually used. fn extract_non_dev_dependencies( meta: &Metadata, how: &mut HashMap<PackageId, PkgSource>, what: &mut HashMap<PackageId, Package>, ) -> (HashMap<PackageId, PkgSource>, HashMap<PackageId, Package>) { let mut how_new = HashMap::new(); let mut what_new = HashMap::new(); let Some(resolve) = &meta.resolve else { return (HashMap::new(), HashMap::new()); }; let node_deps: HashMap<&PackageId, &[NodeDep]> = resolve .nodes .iter() .map(|node| (&node.id, node.deps.as_slice())) .collect(); let mut ids = how .iter() .filter_map(|(id, source)| { if matches!(source, PkgSource::Local) { Some(id.clone()) } else { None } }) .collect::<Vec<_>>(); while !ids.is_empty() { let mut deps = HashSet::new(); for id in ids.drain(..) { if let Some(node_deps) = node_deps.get(&id) { for dep in *node_deps { if dep .dep_kinds .iter() .any(|info| info.kind != DependencyKind::Development) { deps.insert(&dep.pkg); } } } how_new.insert(id.clone(), how.remove(&id).unwrap()); what_new.insert(id.clone(), what.remove(&id).unwrap()); } for pkg_id in what.keys() { if deps.contains(pkg_id) { ids.push(pkg_id.clone()); } } } (how_new, what_new) } pub fn crate_names_from_source(crates: &[SourcedPackage], source: PkgSource) -> Vec<String> { let mut filtered_crate_names: Vec<String> = crates .iter() .filter(|p| p.source == source) .map(|p| p.package.name.to_string()) .collect(); // Collecting into a HashSet is less user-friendly because order varies between runs filtered_crate_names.sort_unstable(); filtered_crate_names.dedup(); filtered_crate_names } pub fn complain_about_non_crates_io_crates(dependencies: &[SourcedPackage]) { { // scope bound to avoid accidentally referencing local crates when working with foreign ones let local_crate_names = crate_names_from_source(dependencies, PkgSource::Local); if !local_crate_names.is_empty() { eprintln!("\nThe following crates will be ignored because they come from a local directory:"); for crate_name in &local_crate_names { eprintln!(" - {}", crate_name); } } } { let foreign_crate_names = crate_names_from_source(dependencies, PkgSource::Foreign); if !foreign_crate_names.is_empty() { eprintln!("\nCannot audit the following crates because they are not from crates.io:"); for crate_name in &foreign_crate_names { eprintln!(" - {}", crate_name); } } } } pub fn comma_separated_list(list: &[String]) -> String { let mut result = String::new(); let mut first_loop = true; for crate_name in list { if !first_loop { result.push_str(", "); } first_loop = false; result.push_str(crate_name.as_str()); } result } #[cfg(test)] mod tests { use super::sourced_dependencies_from_metadata; use cargo_metadata::MetadataCommand; #[test] fn optional_dependency_excluded_when_not_activated() { let metadata = MetadataCommand::new() .current_dir("fixtures/optional_non_dev_dep") .exec() .unwrap(); let deps = sourced_dependencies_from_metadata(metadata.clone(), false).unwrap(); assert!(deps.iter().any(|dep| dep.package.name == "libz-rs-sys")); let deps_no_dev = sourced_dependencies_from_metadata(metadata, true).unwrap(); assert!(!deps_no_dev .iter() .any(|dep| dep.package.name == "libz-rs-sys")); } } ================================================ FILE: src/crates_cache.rs ================================================ use crate::api_client::RateLimitedClient; use crate::publishers::{PublisherData, PublisherKind}; use dirs; use flate2::read::GzDecoder; use serde::{Deserialize, Serialize}; use std::{ collections::{BTreeSet, HashMap}, fs, io::{self, ErrorKind}, path::PathBuf, time::Duration, time::SystemTimeError, }; pub struct CratesCache { cache_dir: Option<CacheDir>, metadata: Option<MetadataStored>, crates: Option<HashMap<String, Crate>>, crate_owners: Option<HashMap<u64, Vec<CrateOwner>>>, users: Option<HashMap<u64, User>>, teams: Option<HashMap<u64, Team>>, versions: Option<HashMap<(u64, String), Publisher>>, } pub enum CacheState { Fresh, Expired, Unknown, } pub enum DownloadState { /// The tag still matched and resource was not stale.
Fresh, /// There was a newer resource. Expired, /// We forced the download of an update. Stale, } struct CacheDir(PathBuf); #[derive(Clone, Deserialize, Serialize)] struct Metadata { #[serde(with = "humantime_serde")] timestamp: std::time::SystemTime, } #[derive(Clone, Deserialize, Serialize)] struct MetadataStored { #[serde(with = "humantime_serde")] timestamp: std::time::SystemTime, #[serde(default)] etag: Option<String>, } #[derive(Clone, Deserialize, Serialize)] struct Crate { name: String, id: u64, repository: Option<String>, } #[derive(Clone, Deserialize, Serialize)] struct CrateOwner { crate_id: u64, owner_id: u64, owner_kind: i32, } #[derive(Clone, Deserialize, Serialize)] struct Publisher { crate_id: u64, published_by: u64, } #[derive(Clone, Deserialize, Serialize)] struct Team { id: u64, avatar: Option<String>, login: String, name: Option<String>, } #[derive(Clone, Deserialize, Serialize)] struct User { id: u64, gh_avatar: Option<String>, gh_id: Option<u64>, gh_login: String, name: Option<String>, } impl CratesCache { const METADATA_FS: &'static str = "metadata.json"; const CRATES_FS: &'static str = "crates.json"; const CRATE_OWNERS_FS: &'static str = "crate_owners.json"; const USERS_FS: &'static str = "users.json"; const TEAMS_FS: &'static str = "teams.json"; const VERSIONS_FS: &'static str = "versions.json"; const DUMP_URL: &'static str = "https://static.crates.io/db-dump.tar.gz"; /// Open a crates cache. pub fn new() -> Self { CratesCache { cache_dir: Self::cache_dir().map(CacheDir), metadata: None, crates: None, crate_owners: None, users: None, teams: None, versions: None, } } fn cache_dir() -> Option<PathBuf> { dirs::cache_dir() } /// Re-download the list from the data dumps.
pub fn download( &mut self, client: &mut RateLimitedClient, max_age: Duration, ) -> Result<DownloadState, io::Error> { let bar = indicatif::ProgressBar::new(!0) .with_prefix("Downloading") .with_style( indicatif::ProgressStyle::default_spinner() .template("{prefix:>12.bright.cyan} {spinner} {msg:.cyan}") .unwrap(), ) .with_message("preparing"); let remembered_etag; let response = { let mut request = client.get(Self::DUMP_URL); if let Some(meta) = self.load_metadata() { remembered_etag = meta.etag.clone(); // See if we can consider the resource not-yet-stale. if meta.validate(max_age) == Some(true) { if let Some(etag) = meta.etag.as_ref() { request = request.set("if-none-match", etag); } } } else { remembered_etag = None; } request.call() } .map_err(io::Error::other)?; // Not modified. if response.status() == 304 { bar.finish_and_clear(); return Ok(DownloadState::Fresh); } if let Some(length) = response .header("content-length") .and_then(|l| l.parse().ok()) { bar.set_style( indicatif::ProgressStyle::default_bar() .template("{prefix:>12.bright.cyan} [{bar:27}] {bytes:>9}/{total_bytes:9} {bytes_per_sec} ETA {eta:4} - {msg:.cyan}").unwrap() .progress_chars("=> ")); bar.set_length(length); } else { bar.println("Length unspecified, expect at least 250MiB"); bar.set_style(indicatif::ProgressStyle::default_spinner().template( "{prefix:>12.bright.cyan} {spinner} {bytes:>9} {bytes_per_sec} - {msg:.cyan}", ).unwrap()); } let etag = response.header("etag").map(String::from); let reader = bar.wrap_read(response.into_reader()); let ungzip = GzDecoder::new(reader); let mut archive = tar::Archive::new(ungzip); let cache_dir = CratesCache::cache_dir().ok_or(ErrorKind::NotFound)?; let mut cache_updater = CacheUpdater::new(cache_dir)?; let required_files = [ Self::CRATE_OWNERS_FS, Self::CRATES_FS, Self::USERS_FS, Self::TEAMS_FS, Self::METADATA_FS, ] .iter() .map(ToString::to_string) .collect::<BTreeSet<String>>(); for entry in (archive.entries()?).flatten() { if let Ok(path) = entry.path() { if let Some(name) =
path.file_name().and_then(std::ffi::OsStr::to_str) { bar.set_message(name.to_string()); } } if entry_path_ends_with(&entry, "crate_owners.csv") { let owners: Vec<CrateOwner> = read_csv_data(entry)?; cache_updater.store_multi_map( &mut self.crate_owners, Self::CRATE_OWNERS_FS, owners.as_slice(), &|owner| owner.crate_id, )?; } else if entry_path_ends_with(&entry, "crates.csv") { let crates: Vec<Crate> = read_csv_data(entry)?; cache_updater.store_map( &mut self.crates, Self::CRATES_FS, crates.as_slice(), &|crate_| crate_.name.clone(), )?; } else if entry_path_ends_with(&entry, "users.csv") { let users: Vec<User> = read_csv_data(entry)?; cache_updater.store_map( &mut self.users, Self::USERS_FS, users.as_slice(), &|user| user.id, )?; } else if entry_path_ends_with(&entry, "teams.csv") { let teams: Vec<Team> = read_csv_data(entry)?; cache_updater.store_map( &mut self.teams, Self::TEAMS_FS, teams.as_slice(), &|team| team.id, )?; } else if entry_path_ends_with(&entry, "metadata.json") { let meta: Metadata = serde_json::from_reader(entry)?; cache_updater.store( &mut self.metadata, Self::METADATA_FS, MetadataStored { timestamp: meta.timestamp, etag: etag.clone(), }, )?; } else { // This was not a file with a filename we actually use. // Check if we've obtained all the files we need. // If yes, we can end the download early. // This saves hundreds of megabytes of traffic. if required_files.is_subset(&cache_updater.staged_files) { break; } } } // Now that we've successfully downloaded and stored everything, // replace the old cache contents with the new one. cache_updater.commit()?; // If we get here, we had no etag, the etag mismatched, or we forced a download despite // stale data. Catch the last case: it means the crates.io daily dumps were not updated. if remembered_etag == etag { Ok(DownloadState::Stale) } else { Ok(DownloadState::Expired) } } pub fn expire(&mut self, max_age: Duration) -> CacheState { match self.validate(max_age) { // Still fresh.
Some(true) => CacheState::Fresh, // There was no valid metadata. Consider it expired, to be safe. None => { self.cache_dir = None; CacheState::Unknown } Some(false) => { self.cache_dir = None; CacheState::Expired } } } pub fn age(&mut self) -> Option<Duration> { match self.load_metadata() { Some(meta) => meta.age().ok(), None => None, } } pub fn publisher_users(&mut self, crate_name: &str) -> Option<Vec<PublisherData>> { let id = self.load_crates()?.get(crate_name)?.id; let owners = self.load_crate_owners()?.get(&id)?.clone(); let users = self.load_users()?; let publisher = owners .into_iter() .filter(|owner| owner.owner_kind == 0) .filter_map(|owner: CrateOwner| { let user = users.get(&owner.owner_id)?; Some(PublisherData { id: user.id, avatar: user.gh_avatar.clone(), login: user.gh_login.clone(), name: user.name.clone(), kind: PublisherKind::user, }) }) .collect(); Some(publisher) } pub fn publisher_teams(&mut self, crate_name: &str) -> Option<Vec<PublisherData>> { let id = self.load_crates()?.get(crate_name)?.id; let owners = self.load_crate_owners()?.get(&id)?.clone(); let teams = self.load_teams()?; let publisher = owners .into_iter() .filter(|owner| owner.owner_kind == 1) .filter_map(|owner: CrateOwner| { let team = teams.get(&owner.owner_id)?; Some(PublisherData { id: team.id, avatar: team.avatar.clone(), login: team.login.clone(), name: team.name.clone(), kind: PublisherKind::team, }) }) .collect(); Some(publisher) } fn validate(&mut self, max_age: Duration) -> Option<bool> { let meta = self.load_metadata()?; meta.validate(max_age) } fn load_metadata(&mut self) -> Option<&MetadataStored> { self.cache_dir .as_ref()? .load_cached(&mut self.metadata, Self::METADATA_FS) .ok() } fn load_crates(&mut self) -> Option<&HashMap<String, Crate>> { self.cache_dir .as_ref()? .load_cached(&mut self.crates, Self::CRATES_FS) .ok() } fn load_crate_owners(&mut self) -> Option<&HashMap<u64, Vec<CrateOwner>>> { self.cache_dir .as_ref()?
.load_cached(&mut self.crate_owners, Self::CRATE_OWNERS_FS) .ok() } fn load_users(&mut self) -> Option<&HashMap<u64, User>> { self.cache_dir .as_ref()? .load_cached(&mut self.users, Self::USERS_FS) .ok() } fn load_teams(&mut self) -> Option<&HashMap<u64, Team>> { self.cache_dir .as_ref()? .load_cached(&mut self.teams, Self::TEAMS_FS) .ok() } fn load_versions(&mut self) -> Option<&HashMap<(u64, String), Publisher>> { self.cache_dir .as_ref()? .load_cached(&mut self.versions, Self::VERSIONS_FS) .ok() } } fn entry_path_ends_with(entry: &tar::Entry<impl io::Read>, needle: &str) -> bool { let Ok(path) = entry.path() else { return false; }; let Some(file_name) = path.file_name() else { return false; }; file_name == needle } fn read_csv_data<T: serde::de::DeserializeOwned>( from: impl io::Read, ) -> Result<Vec<T>, csv::Error> { let mut reader = csv::ReaderBuilder::new() .delimiter(b',') .double_quote(true) .quoting(true) .from_reader(from); reader.deserialize().collect() } impl MetadataStored { fn validate(&self, max_age: Duration) -> Option<bool> { match self.age() { Ok(duration) => Some(duration < max_age), Err(_) => None, } } pub fn age(&self) -> Result<Duration, SystemTimeError> { self.timestamp.elapsed() } } impl CacheDir { fn load_cached<'cache, T>( &self, cache: &'cache mut Option<T>, file: &str, ) -> Result<&'cache T, io::Error> where T: serde::de::DeserializeOwned, { match cache { Some(datum) => Ok(datum), None => { let file = fs::File::open(self.0.join(file))?; let reader = io::BufReader::new(file); // Report a corrupt cache file as an error instead of panicking let value: T = serde_json::from_reader(reader).map_err(io::Error::other)?; Ok(cache.get_or_insert(value)) } } } } /// Implements a two-phase transactional update mechanism: /// you can store data, but it will not overwrite previous data until you call `commit()` struct CacheUpdater { dir: PathBuf, staged_files: BTreeSet<String>, } impl CacheUpdater { /// Creates the cache directory if it doesn't exist. /// Returns an error if creation fails. fn new(dir: PathBuf) -> Result<Self, io::Error> { if !dir.exists() { fs::create_dir_all(&dir)?; } if !dir.is_dir() { // Well. We certainly don't want to delete anything.
return Err(io::ErrorKind::AlreadyExists.into()); } Ok(Self { dir, staged_files: BTreeSet::new(), }) } /// Commits to disk any changes that you have staged via the `store()` function. fn commit(&mut self) -> io::Result<()> { let mut uncommitted_files = std::mem::take(&mut self.staged_files); let metadata_file = uncommitted_files.take(CratesCache::METADATA_FS); for file in uncommitted_files { let source = self.dir.join(&file).with_extension("part"); let destination = self.dir.join(&file); fs::rename(source, destination)?; } // metadata_file is special since it contains the timestamp for the cache. // We will only commit it and update the timestamp if updating everything else succeeds. // Otherwise it would be possible to create a partially updated cache that's considered fresh. if let Some(file) = metadata_file { let source = self.dir.join(&file).with_extension("part"); let destination = self.dir.join(&file); fs::rename(source, destination)?; } Ok(()) } /// Does not overwrite existing data until `commit()` is called. /// If you do not call `commit()` after this, the on-disk cache will not actually be updated!
fn store<T>(&mut self, cache: &mut Option<T>, file: &str, value: T) -> Result<(), io::Error> where T: Serialize, { *cache = None; let value = cache.get_or_insert(value); self.staged_files.insert(file.to_owned()); let out_path = self.dir.join(file).with_extension("part"); let out_file = fs::File::create(out_path)?; let out = io::BufWriter::new(out_file); serde_json::to_writer(out, value)?; Ok(()) } fn store_map<T, K>( &mut self, cache: &mut Option<HashMap<K, T>>, file: &str, entries: &[T], key_fn: &dyn Fn(&T) -> K, ) -> Result<(), io::Error> where T: Serialize + Clone, K: Serialize + Eq + std::hash::Hash, { let hashed: HashMap<K, T> = entries .iter() .map(|entry| (key_fn(entry), entry.clone())) .collect(); self.store(cache, file, hashed) } fn store_multi_map<T, K>( &mut self, cache: &mut Option<HashMap<K, Vec<T>>>, file: &str, entries: &[T], key_fn: &dyn Fn(&T) -> K, ) -> Result<(), io::Error> where T: Serialize + Clone, K: Serialize + Eq + std::hash::Hash, { let mut hashed: HashMap<K, Vec<T>> = HashMap::new(); for entry in entries.iter() { let key = key_fn(entry); hashed .entry(key) .or_insert_with(Vec::new) .push(entry.clone()); } self.store(cache, file, hashed) } } ================================================ FILE: src/main.rs ================================================ //! Gather author, contributor, and publisher data on crates in your dependency graph. //! //! Some use cases: //! //! * Find people and groups worth supporting. //! * Analyze all the contributors you implicitly trust by building their software. This //! might have both a sobering and humbling effect. //! * Identify risks in your dependency graph.
#![forbid(unsafe_code)] mod api_client; mod cli; mod common; mod crates_cache; mod publishers; mod subcommands; use cli::CliArgs; use common::MetadataArgs; fn main() -> Result<(), anyhow::Error> { let args = cli::args_parser().fallback_to_usage().run(); dispatch_command(args) } fn dispatch_command(args: CliArgs) -> Result<(), anyhow::Error> { match args { CliArgs::Publishers { args, meta_args } => { subcommands::publishers(meta_args, args.diffable, args.cache_max_age)?; } CliArgs::Crates { args, meta_args } => { subcommands::crates(meta_args, args.diffable, args.cache_max_age)?; } CliArgs::Update { cache_max_age } => subcommands::update(cache_max_age)?, CliArgs::Json(json) => match json { cli::PrintJson::Schema => subcommands::print_schema()?, cli::PrintJson::Info { args, meta_args } => { subcommands::json(meta_args, args.diffable, args.cache_max_age)?; } }, } Ok(()) } ================================================ FILE: src/publishers.rs ================================================ use crate::api_client::RateLimitedClient; use crate::crates_cache::{CacheState, CratesCache}; use serde::{Deserialize, Serialize}; use std::{ collections::BTreeMap, io::{self}, time::Duration, }; #[cfg(test)] use schemars::JsonSchema; use crate::common::{crate_names_from_source, PkgSource, SourcedPackage}; #[derive(Deserialize)] struct UsersResponse { users: Vec<PublisherData>, } #[derive(Deserialize)] struct TeamsResponse { teams: Vec<PublisherData>, } /// Data about a single publisher received from a crates.io API endpoint #[cfg_attr(test, derive(JsonSchema))] #[derive(Serialize, Deserialize, Debug, Clone)] pub struct PublisherData { pub id: u64, pub login: String, pub kind: PublisherKind, // URL is disabled because it's present in API responses but not in DB dumps, // so the output would be inconsistent depending on the data source //pub url: Option<String>, /// Display name. It is NOT guaranteed to be unique!
pub name: Option<String>, /// Avatar image URL pub avatar: Option<String>, } impl PartialEq for PublisherData { fn eq(&self, other: &Self) -> bool { self.id == other.id } } impl Eq for PublisherData { // Eq holds for PublisherData because we compare u64 IDs, and Eq holds for u64 fn assert_receiver_is_total_eq(&self) {} } impl PartialOrd for PublisherData { fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> { Some(self.id.cmp(&other.id)) } } impl Ord for PublisherData { fn cmp(&self, other: &Self) -> std::cmp::Ordering { self.id.cmp(&other.id) } } #[cfg_attr(test, derive(JsonSchema))] #[derive(Serialize, Deserialize, Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq)] #[allow(non_camel_case_types)] pub enum PublisherKind { team, user, } pub fn publisher_users( client: &mut RateLimitedClient, crate_name: &str, ) -> Result<Vec<PublisherData>, io::Error> { let url = format!("https://crates.io/api/v1/crates/{}/owner_user", crate_name); let resp = get_with_retry(&url, client, 3)?; let data: UsersResponse = resp.into_json()?; Ok(data.users) } pub fn publisher_teams( client: &mut RateLimitedClient, crate_name: &str, ) -> Result<Vec<PublisherData>, io::Error> { let url = format!("https://crates.io/api/v1/crates/{}/owner_team", crate_name); let resp = get_with_retry(&url, client, 3)?; let data: TeamsResponse = resp.into_json()?; Ok(data.teams) } fn get_with_retry( url: &str, client: &mut RateLimitedClient, attempts: u8, ) -> Result<ureq::Response, io::Error> { let mut resp = client.get(url).call().map_err(io::Error::other)?; let mut count = 1; let mut wait = 5; while resp.status() != 200 && count <= attempts { eprintln!( "Failed retrieving {:?}, trying again in {} seconds, attempt {}/{}", url, wait, count, attempts ); std::thread::sleep(std::time::Duration::from_secs(wait)); resp = client.get(url).call().map_err(io::Error::other)?; count += 1; wait *= 3; } Ok(resp) } pub fn fetch_owners_of_crates( dependencies: &[SourcedPackage], max_age: Duration, ) -> Result< ( BTreeMap<String, Vec<PublisherData>>, BTreeMap<String, Vec<PublisherData>>, ), io::Error, > { let crates_io_names =
crate_names_from_source(dependencies, PkgSource::CratesIo); let mut client = RateLimitedClient::new(); let mut cached = CratesCache::new(); let using_cache = match cached.expire(max_age) { CacheState::Fresh => true, CacheState::Expired => { eprintln!( "\nIgnoring expired cache, older than {}.", // we use humantime rather than indicatif because we take humantime input // and here we simply repeat it back to the user humantime::format_duration(max_age) ); eprintln!(" Run `cargo supply-chain update` to update it."); false } CacheState::Unknown => { eprintln!("\nThe `crates.io` cache was not found or it is invalid."); eprintln!(" Run `cargo supply-chain update` to generate it."); false } }; let mut users: BTreeMap<String, Vec<PublisherData>> = BTreeMap::new(); let mut teams: BTreeMap<String, Vec<PublisherData>> = BTreeMap::new(); if using_cache { let age = cached.age().unwrap(); eprintln!( "\nUsing cached data. Cache age: {}", indicatif::HumanDuration(age) ); } else { eprintln!("\nFetching publisher info from crates.io"); eprintln!("This will take roughly 2 seconds per crate due to API rate limits"); } let bar = indicatif::ProgressBar::new(crates_io_names.len() as u64) .with_prefix("Preparing") .with_style( indicatif::ProgressStyle::default_bar() .template("{prefix:>12.bright.cyan} [{bar:27}] {pos:>4}/{len:4} ETA {eta:3} - {msg:.cyan}").unwrap() .progress_chars("=> ") ); for (i, crate_name) in crates_io_names.iter().enumerate() { bar.set_message(crate_name.clone()); bar.set_position((i + 1) as u64); let cached_users = cached.publisher_users(crate_name); let cached_teams = cached.publisher_teams(crate_name); if let (Some(pub_users), Some(pub_teams)) = (cached_users, cached_teams) { bar.set_prefix("Loading cache"); users.insert(crate_name.clone(), pub_users); teams.insert(crate_name.clone(), pub_teams); } else { // Handle crates not found in the cache by fetching live data for them bar.set_prefix("Downloading"); let pusers = publisher_users(&mut client, crate_name)?; users.insert(crate_name.clone(), pusers); let pteams =
publisher_teams(&mut client, crate_name)?; teams.insert(crate_name.clone(), pteams); } } Ok((users, teams)) } ================================================ FILE: src/subcommands/crates.rs ================================================ use crate::publishers::{fetch_owners_of_crates, PublisherKind}; use crate::{ common::{comma_separated_list, complain_about_non_crates_io_crates, sourced_dependencies}, MetadataArgs, }; pub fn crates( metadata_args: MetadataArgs, diffable: bool, max_age: std::time::Duration, ) -> Result<(), anyhow::Error> { let dependencies = sourced_dependencies(metadata_args)?; complain_about_non_crates_io_crates(&dependencies); let (mut owners, publisher_teams) = fetch_owners_of_crates(&dependencies, max_age)?; for (crate_name, publishers) in publisher_teams { owners.entry(crate_name).or_default().extend(publishers); } let mut ordered_owners: Vec<_> = owners.into_iter().collect(); if diffable { // Sort alphabetically by crate name ordered_owners.sort_unstable_by_key(|(name, _)| name.clone()); } else { // Order by the number of owners, but put crates owned by teams first ordered_owners.sort_unstable_by_key(|(name, publishers)| { ( // `false` sorts before `true`, so crates with at least one team come first !publishers.iter().any(|p| p.kind == PublisherKind::team), usize::MAX - publishers.len(), name.clone(), ) }); } for (_, publishers) in &mut ordered_owners { // For each crate put teams first publishers.sort_unstable_by_key(|p| (p.kind, p.login.clone())); } if !diffable { println!( "\nDependency crates with the people and teams that can publish them to crates.io:\n" ); } for (i, (crate_name, publishers)) in ordered_owners.iter().enumerate() { let pretty_publishers: Vec<String> = publishers .iter() .map(|p| match p.kind { PublisherKind::team => format!("team \"{}\"", p.login), PublisherKind::user => p.login.to_string(), }) .collect(); let publishers_list = comma_separated_list(&pretty_publishers); if diffable { println!("{}: {}", crate_name, publishers_list); } else { println!("{}. {}: {}", i + 1, crate_name, publishers_list); } } if !ordered_owners.is_empty() { eprintln!("\nNote: there may be outstanding publisher invitations. crates.io provides no way to list them."); eprintln!("See https://github.com/rust-lang/crates.io/issues/2868 for more info."); } Ok(()) } ================================================ FILE: src/subcommands/json.rs ================================================ //! `json` subcommand is equivalent to `crates`, //! but provides structured output and more info about each publisher. use crate::publishers::{fetch_owners_of_crates, PublisherData}; use crate::{ common::{crate_names_from_source, sourced_dependencies, PkgSource}, MetadataArgs, }; use serde::Serialize; use std::collections::BTreeMap; #[cfg(test)] use schemars::JsonSchema; #[cfg_attr(test, derive(JsonSchema))] #[derive(Debug, Serialize, Default, Clone)] pub struct StructuredOutput { not_audited: NotAudited, /// Maps crate names to info about the publishers of each crate crates_io_crates: BTreeMap<String, Vec<PublisherData>>, } #[cfg_attr(test, derive(JsonSchema))] #[derive(Debug, Serialize, Default, Clone)] pub struct NotAudited { /// Names of crates that are imported from a location in the local filesystem, not from a registry local_crates: Vec<String>, /// Names of crates that are neither from crates.io nor from a local filesystem foreign_crates: Vec<String>, } pub fn json( args: MetadataArgs, diffable: bool, max_age: std::time::Duration, ) -> Result<(), anyhow::Error> { let mut output = StructuredOutput::default(); let dependencies = sourced_dependencies(args)?; // Report non-crates.io dependencies output.not_audited.local_crates = crate_names_from_source(&dependencies, PkgSource::Local); output.not_audited.foreign_crates = crate_names_from_source(&dependencies, PkgSource::Foreign); output.not_audited.local_crates.sort_unstable(); output.not_audited.foreign_crates.sort_unstable(); // Fetch list of owners and publishers let (mut owners, publisher_teams) = fetch_owners_of_crates(&dependencies,
max_age)?; // Merge the two maps we received into one for (crate_name, publishers) in publisher_teams { owners.entry(crate_name).or_default().extend(publishers); } // Sort the vectors of publisher data. This helps when diffing the output, // but we do it unconditionally because it's cheap and helps users pull less hair when debugging. for list in owners.values_mut() { list.sort_unstable_by_key(|x| x.id); } output.crates_io_crates = owners; // Print the result to stdout let stdout = std::io::stdout(); let handle = stdout.lock(); if diffable { serde_json::to_writer_pretty(handle, &output)?; } else { serde_json::to_writer(handle, &output)?; } Ok(()) } ================================================ FILE: src/subcommands/json_schema.rs ================================================ //! The schema for the JSON subcommand output use std::io::{Result, Write}; pub fn print_schema() -> Result<()> { writeln!(std::io::stdout(), "{}", JSON_SCHEMA)?; Ok(()) } const JSON_SCHEMA: &str = r##"{ "$schema": "http://json-schema.org/draft-07/schema#", "title": "StructuredOutput", "type": "object", "required": [ "crates_io_crates", "not_audited" ], "properties": { "crates_io_crates": { "description": "Maps crate names to info about the publishers of each crate", "type": "object", "additionalProperties": { "type": "array", "items": { "$ref": "#/definitions/PublisherData" } } }, "not_audited": { "$ref": "#/definitions/NotAudited" } }, "definitions": { "NotAudited": { "type": "object", "required": [ "foreign_crates", "local_crates" ], "properties": { "foreign_crates": { "description": "Names of crates that are neither from crates.io nor from a local filesystem", "type": "array", "items": { "type": "string" } }, "local_crates": { "description": "Names of crates that are imported from a location in the local filesystem, not from a registry", "type": "array", "items": { "type": "string" } } } }, "PublisherData": { "description": "Data about a single publisher received from a crates.io API endpoint", "type": "object", "required": [ "id", "kind", "login" ], "properties": { "avatar": { "description": "Avatar image URL", "type": [ "string", "null" ] }, "id": { "type": "integer", "format": "uint64", "minimum": 0.0 }, "kind": { "$ref": "#/definitions/PublisherKind" }, "login": { "type": "string" }, "name": { "description": "Display name. It is NOT guaranteed to be unique!", "type": [ "string", "null" ] } } }, "PublisherKind": { "type": "string", "enum": [ "team", "user" ] } } }"##; #[cfg(test)] mod tests { use super::*; use crate::subcommands::json::StructuredOutput; use schemars::schema_for; #[test] fn test_json_schema() { let schema = schema_for!(StructuredOutput); let schema = serde_json::to_string_pretty(&schema).unwrap(); assert_eq!(schema, JSON_SCHEMA); } } ================================================ FILE: src/subcommands/mod.rs ================================================ pub mod crates; pub mod json; pub mod json_schema; pub mod publishers; pub mod update; pub use crates::crates; pub use json::json; pub use json_schema::print_schema; pub use publishers::publishers; pub use update::update; ================================================ FILE: src/subcommands/publishers.rs ================================================ use std::collections::BTreeMap; use crate::publishers::fetch_owners_of_crates; use crate::MetadataArgs; use crate::{ common::{comma_separated_list, complain_about_non_crates_io_crates, sourced_dependencies}, publishers::PublisherData, }; pub fn publishers( metadata_args: MetadataArgs, diffable: bool, max_age: std::time::Duration, ) -> Result<(), anyhow::Error> { let dependencies = sourced_dependencies(metadata_args)?; complain_about_non_crates_io_crates(&dependencies); let (publisher_users, publisher_teams) = fetch_owners_of_crates(&dependencies, max_age)?; // Group data by user rather than by crate let mut user_to_crate_map = transpose_publishers_map(&publisher_users); let mut team_to_crate_map =
transpose_publishers_map(&publisher_teams); // Sort crate names alphabetically user_to_crate_map.values_mut().for_each(|c| c.sort()); team_to_crate_map.values_mut().for_each(|c| c.sort()); if diffable { // empty map just means 0 loop iterations here let sorted_map = sort_transposed_map_for_diffing(user_to_crate_map); for (user, crates) in &sorted_map { let crate_list = comma_separated_list(crates); println!("user \"{}\": {}", &user.login, crate_list); } } else if !publisher_users.is_empty() { println!("\nThe following individuals can publish updates for your dependencies:\n"); let map_for_display = sort_transposed_map_for_display(user_to_crate_map); for (i, (user, crates)) in map_for_display.iter().enumerate() { // We print logins rather than display names, since you can embed terminal control // sequences in a display name and erase yourself from the output that way. let crate_list = comma_separated_list(crates); println!(" {}. {} via crates: {}", i + 1, &user.login, crate_list); } eprintln!("\nNote: there may be outstanding publisher invitations. crates.io provides no way to list them."); eprintln!("See https://github.com/rust-lang/crates.io/issues/2868 for more info."); } if diffable { let sorted_map = sort_transposed_map_for_diffing(team_to_crate_map); for (team, crates) in &sorted_map { let crate_list = comma_separated_list(crates); println!("team \"{}\": {}", &team.login, crate_list); } } else if !publisher_teams.is_empty() { println!( "\nAll members of the following teams can publish updates for your dependencies:\n" ); let map_for_display = sort_transposed_map_for_display(team_to_crate_map); for (i, (team, crates)) in map_for_display.iter().enumerate() { let crate_list = comma_separated_list(crates); if let (true, Some(org)) = ( team.login.starts_with("github:"), team.login.split(':').nth(1), ) { println!( " {}. \"{}\" (https://github.com/{}) via crates: {}", i + 1, &team.login, org, crate_list ); } else { println!(" {}. \"{}\" via crates: {}", i + 1, &team.login, crate_list); } } eprintln!("\nGitHub teams are black boxes. It's impossible to get the member list without explicit permission."); } Ok(()) } /// Turns a crate-to-publishers mapping into a publisher-to-crates mapping. /// [`BTreeMap`] is used because [`PublisherData`] doesn't implement Hash. fn transpose_publishers_map( input: &BTreeMap<String, Vec<PublisherData>>, ) -> BTreeMap<PublisherData, Vec<String>> { let mut result: BTreeMap<PublisherData, Vec<String>> = BTreeMap::new(); for (crate_name, publishers) in input.iter() { for publisher in publishers { result .entry(publisher.clone()) .or_default() .push(crate_name.clone()); } } result } /// Returns a Vec sorted so that publishers are sorted by the number of crates they control. /// If that number is the same, sort by login. fn sort_transposed_map_for_display( input: BTreeMap<PublisherData, Vec<String>>, ) -> Vec<(PublisherData, Vec<String>)> { let mut result: Vec<_> = input.into_iter().collect(); result.sort_unstable_by_key(|(publisher, crates)| { (usize::MAX - crates.len(), publisher.login.clone()) }); result } fn sort_transposed_map_for_diffing( input: BTreeMap<PublisherData, Vec<String>>, ) -> Vec<(PublisherData, Vec<String>)> { let mut result: Vec<_> = input.into_iter().collect(); result.sort_unstable_by_key(|(publisher, _crates)| publisher.login.clone()); result } ================================================ FILE: src/subcommands/update.rs ================================================ use crate::api_client::RateLimitedClient; use crate::crates_cache::{CratesCache, DownloadState}; use anyhow::bail; pub fn update(max_age: std::time::Duration) -> Result<(), anyhow::Error> { let mut cache = CratesCache::new(); let mut client = RateLimitedClient::new(); match cache.download(&mut client, max_age) { Ok(state) => match state { DownloadState::Fresh => eprintln!("No updates found"), DownloadState::Expired => { eprintln!("Successfully updated to the newest daily data dump."); } DownloadState::Stale => bail!("Latest daily data dump matches the previous version, which was considered outdated."), }, Err(error) =>
bail!("Could not update to the latest daily data dump!\n{}", error) } Ok(()) }