[
  {
    "path": ".github/workflows/build.yml",
    "content": "name: Build binaries\n\non:\n  release:\n    types: [published]\n\njobs:\n  release:\n    permissions:\n      contents: write\n    runs-on: ${{ matrix.os }}\n    strategy:\n      matrix:\n        os: [ubuntu-latest, macos-latest, windows-latest]\n    steps:\n      - uses: actions/checkout@v2\n      - uses: actions-rs/toolchain@v1\n        with:\n          toolchain: stable\n      - uses: actions/cache@v2\n        with:\n          path: |\n            ~/.cargo/bin/\n            ~/.cargo/registry/index/\n            ~/.cargo/registry/cache/\n            ~/.cargo/git/db/\n            target/\n          key: ${{ runner.os }}-cargo-${{ hashFiles('Cargo.lock') }}\n      - name: Build release\n        run: cargo build --release\n      - name: Archive as .tar.gz (Linux)\n        if: matrix.os == 'ubuntu-latest'\n        run: tar cfz htmlq-x86_64-linux.tar.gz -C target/release htmlq\n      - name: Archive as .tar.gz (macOS)\n        if: matrix.os == 'macos-latest'\n        run: tar cfz htmlq-x86_64-darwin.tar.gz -C target/release htmlq\n      - name: Archive as .zip (Windows)\n        if: matrix.os == 'windows-latest'\n        shell: bash\n        run: 7z a -tzip -mm=Deflate htmlq-x86_64-windows.zip ./target/release/htmlq.exe\n      - name: Publish\n        uses: softprops/action-gh-release@v1\n        with:\n          files: |\n            htmlq*.tar.gz\n            htmlq*.zip"
  },
  {
    "path": ".gitignore",
    "content": "/target\n**/*.rs.bk\n"
  },
  {
    "path": "Cargo.toml",
    "content": "[package]\nname = \"htmlq\"\ndescription = \"Like jq, but for HTML.\"\ncategories = [\"command-line-utilities\"]\nkeywords = [\"CSS\", \"HTML\", \"query\"]\nrepository = \"https://github.com/mgdm/htmlq\"\ndocumentation = \"https://github.com/mgdm/htmlq/blob/master/README.md\"\nreadme = \"README.md\"\nlicense = \"MIT\"\nlicense-file = \"LICENSE.md\"\nversion = \"0.4.0\"\nauthors = [\"Michael Maclean <michael@mgdm.net>\"]\nedition = \"2021\"\nexclude = [\"/.github\"]\n\n[dependencies]\nkuchiki = \"0.8.1\"\nhtml5ever = \"0.25.1\"\nclap = \"2.33.3\"\nlazy_static = \"1.4.0\"\nurl = \"2.2.2\"\n\n[dev-dependencies]\nassert_cmd = \"2.0\"\npredicates = \"2.1\"\n"
  },
  {
    "path": "LICENSE.md",
    "content": "MIT License\n\nCopyright (c) 2019 Michael Maclean\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# htmlq\nLike [`jq`](https://stedolan.github.io/jq/), but for HTML. Uses [CSS selectors](https://developer.mozilla.org/en-US/docs/Learn/CSS/Introduction_to_CSS/Selectors) to extract bits of content from HTML files.\n\n## Installation\n\n### [Cargo](https://crates.io/crates/htmlq)\n\n```sh\ncargo install htmlq\n```\n\n### [FreeBSD pkg](https://www.freshports.org/textproc/htmlq)\n\n```sh\npkg install htmlq\n```\n\n### [Homebrew](https://formulae.brew.sh/formula/htmlq)\n\n```sh\nbrew install htmlq\n```\n\n### [Scoop](https://scoop.sh/)\n\n```sh\nscoop install htmlq\n```\n\n## Usage\n\n```console\n$ htmlq -h\nhtmlq 0.4.0\nMichael Maclean <michael@mgdm.net>\nRuns CSS selectors on HTML\n\nUSAGE:\n    htmlq [FLAGS] [OPTIONS] [--] [selector]...\n\nFLAGS:\n    -B, --detect-base          Try to detect the base URL from the <base> tag in the document. If not found, default to\n                               the value of --base, if supplied\n    -h, --help                 Prints help information\n    -w, --ignore-whitespace    When printing text nodes, ignore those that consist entirely of whitespace\n    -p, --pretty               Pretty-print the serialised output\n    -t, --text                 Output only the contents of text nodes inside selected elements\n    -V, --version              Prints version information\n\nOPTIONS:\n    -a, --attribute <attribute>         Only return this attribute (if present) from selected elements\n    -b, --base <base>                   Use this URL as the base for links\n    -f, --filename <FILE>               The input file. Defaults to stdin\n    -o, --output <FILE>                 The output file. Defaults to stdout\n    -r, --remove-nodes <SELECTOR>...    Remove nodes matching this expression before output. May be specified multiple\n                                        times\n\nARGS:\n    <selector>...    
The CSS expression to select [default: html]\n$\n```\n\n## Examples\n\n### Using with cURL to find part of a page by ID\n\n```console\n$ curl --silent https://www.rust-lang.org/ | htmlq '#get-help'\n<div class=\"four columns mt3 mt0-l\" id=\"get-help\">\n        <h4>Get help!</h4>\n        <ul>\n          <li><a href=\"https://doc.rust-lang.org\">Documentation</a></li>\n          <li><a href=\"https://users.rust-lang.org\">Ask a Question on the Users Forum</a></li>\n          <li><a href=\"http://ping.rust-lang.org\">Check Website Status</a></li>\n        </ul>\n        <div class=\"languages\">\n            <label class=\"hidden\" for=\"language-footer\">Language</label>\n            <select id=\"language-footer\">\n                <option title=\"English (US)\" value=\"en-US\">English (en-US)</option>\n<option title=\"French\" value=\"fr\">Français (fr)</option>\n<option title=\"German\" value=\"de\">Deutsch (de)</option>\n\n            </select>\n        </div>\n      </div>\n```\n\n### Find all the links in a page\n\n```console\n$ curl --silent https://www.rust-lang.org/ | htmlq --attribute href a\n/\n/tools/install\n/learn\n/tools\n/governance\n/community\nhttps://blog.rust-lang.org/\n/learn/get-started\nhttps://blog.rust-lang.org/2019/04/25/Rust-1.34.1.html\nhttps://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-2018.html\n[...]\n```\n\n### Get the text content of a post\n\n```console\n$ curl --silent https://nixos.org/nixos/about.html | htmlq  --text .main\n\n          About NixOS\n\nNixOS is a GNU/Linux distribution that aims to\nimprove the state of the art in system configuration management.  In\nexisting distributions, actions such as upgrades are dangerous:\nupgrading a package can cause other packages to break, upgrading an\nentire system is much less reliable than reinstalling from scratch,\nyou can’t safely test what the results of a configuration change will\nbe, you cannot easily undo changes to the system, and so on.  We want\nto change that.  
NixOS has many innovative features:\n\n[...]\n```\n\n### Remove a node before output\n\nThere's a big SVG image in this page that I don't need, so here's how to remove it.\n\n```console\n$ curl --silent https://nixos.org/ | htmlq '.whynix' --remove-nodes svg\n<ul class=\"whynix\">\n      <li>\n\n        <h2>Reproducible</h2>\n        <p>\n          Nix builds packages in isolation from each other. This ensures that they\n          are reproducible and don't have undeclared dependencies, so <strong>if a\n            package works on one machine, it will also work on another</strong>.\n        </p>\n      </li>\n      <li>\n\n        <h2>Declarative</h2>\n        <p>\n          Nix makes it <strong>trivial to share development and build\n            environments</strong> for your projects, regardless of what programming\n          languages and tools you’re using.\n        </p>\n      </li>\n      <li>\n\n        <h2>Reliable</h2>\n        <p>\n          Nix ensures that installing or upgrading one package <strong>cannot\n            break other packages</strong>. 
It allows you to <strong>roll back to\n            previous versions</strong>, and ensures that no package is in an\n          inconsistent state during an upgrade.\n        </p>\n      </li>\n    </ul>\n```\n\n### Pretty print HTML\n\n(This is a bit of a work in progress)\n\n```console\n$ curl --silent https://mgdm.net | htmlq --pretty '#posts'\n<section id=\"posts\">\n  <h2>I write about...\n  </h2>\n  <ul class=\"post-list\">\n    <li>\n      <time datetime=\"2019-04-29 00:%i:1556496000\" pubdate=\"\">\n        29/04/2019</time><a href=\"/weblog/nettop/\">\n        <h3>Debugging network connections on macOS with nettop\n        </h3></a>\n      <p>Using nettop to find out what network connections a program is trying to make.\n      </p>\n    </li>\n[...]\n```\n\n### Syntax highlighting with [`bat`](https://github.com/sharkdp/bat)\n\n```console\n$ curl --silent example.com | htmlq 'body' | bat --language html\n```\n\n> <img alt=\"Syntax highlighted output\" width=\"700\" src=\"https://user-images.githubusercontent.com/2346707/132808980-db8991ff-9177-4cb7-a018-39ad94282374.png\" />\n"
  },
  {
    "path": "flake.nix",
    "content": "{\n  description = \"like jq, but for HTML.\";\n\n  inputs = {\n    nixpkgs.url = \"nixpkgs\"; # Resolves to github:NixOS/nixpkgs\n    # Helpers for system-specific outputs\n    flake-utils.url = \"github:numtide/flake-utils\";\n    crate2nix = {\n      url = \"github:kolloch/crate2nix\";\n      flake = false;\n    };\n  };\n\n  outputs = { self, nixpkgs, crate2nix, flake-utils }:\n    # Create system-specific outputs for the standard Nix systems\n    # https://github.com/numtide/flake-utils/blob/master/default.nix#L3-L9\n    flake-utils.lib.eachDefaultSystem (system:\n      let\n        pkgs = nixpkgs.legacyPackages.${system};\n        crateName = \"htmlq\";\n\n        inherit (import \"${crate2nix}/tools.nix\" { inherit pkgs; })\n          generatedCargoNix;\n\n        project = pkgs.callPackage (generatedCargoNix {\n          name = crateName;\n          src = ./.;\n        }) {\n          defaultCrateOverrides = pkgs.defaultCrateOverrides // {\n            # Crate dependency overrides go here\n          };\n        };\n      in {\n        packages.${crateName} = project.rootCrate.build;\n\n        defaultPackage = self.packages.${system}.${crateName};\n\n        devShell = pkgs.mkShell {\n          inputsFrom = builtins.attrValues self.packages.${system};\n          buildInputs = [ pkgs.cargo pkgs.rust-analyzer pkgs.clippy ];\n        };\n      });\n}\n"
  },
  {
    "path": "src/link.rs",
    "content": "use html5ever::local_name;\nuse kuchiki::NodeRef;\nuse url::Url;\n\npub fn rewrite_relative_url(node: &NodeRef, base: &Url) {\n    let Some(elem) = node.as_element() else {\n        return\n    };\n    if !(local_name!(\"a\") == elem.name.local\n        || local_name!(\"link\") == elem.name.local\n        || local_name!(\"area\") == elem.name.local)\n    {\n        return;\n    };\n    let mut attrs = elem.attributes.borrow_mut();\n\n    if attrs.contains(\"href\") {\n        let Some(url) = attrs.get_mut(\"href\") else {\n            return\n        };\n        if url.starts_with(\"////\") {\n            *url = url.trim_start_matches('/').to_string();\n            return;\n        }\n        let new_url = base.join(url).ok().unwrap_or_else(|| base.to_owned());\n        attrs.insert(\"href\", new_url.to_string());\n    }\n}\n\npub fn detect_base(document: &NodeRef) -> Option<Url> {\n    let Ok(node) = document.select_first(\"base\") else {\n        return None\n    };\n\n    let attrs = node.attributes.borrow();\n\n    if attrs.contains(\"href\") {\n        let href = attrs\n            .get(\"href\")\n            .expect(\"should have retrieved href from node attributes\");\n        return match Url::parse(href) {\n            Ok(url) => Some(url),\n            _ => None,\n        };\n    }\n\n    None\n}\n\n#[cfg(test)]\nmod tests {\n    use html5ever::tendril::TendrilSink;\n\n    use super::*;\n\n    macro_rules! 
rewrite_tests {\n        ($($name:ident: $value:expr,)*) => {\n        $(\n            #[test]\n            fn $name() {\n                let (mut input, expected) = $value;\n                let base = Url::parse(\"https://mgdm.net\").unwrap();\n                let doc = make_doc(&mut input);\n                for css_match in doc\n                    .select(\"a, area, link\")\n                    .expect(\"Failed to parse CSS selector while doing link rewriting\")\n                {\n                    let node = css_match.as_node();\n                    rewrite_relative_url(&node, &base);\n                }\n\n                let result = serialize_doc(&doc);\n                assert_eq!(expected, result);\n            }\n        )*\n        }\n    }\n\n    macro_rules! detect_base_tests {\n        ($($name:ident: $value:expr,)*) => {\n        $(\n            #[test]\n            fn $name() {\n                let (mut input, expected) = $value;\n                let doc = make_doc(&mut input);\n                let result = detect_base(&doc);\n                assert_eq!(expected, result);\n            }\n        )*\n        }\n    }\n\n    fn make_doc(html: &mut String) -> NodeRef {\n        kuchiki::parse_html()\n            .from_utf8()\n            .read_from(&mut html.as_bytes())\n            .unwrap()\n    }\n\n    fn serialize_doc(doc: &NodeRef) -> String {\n        let mut content: Vec<u8> = Vec::new();\n        doc.serialize(&mut content).unwrap();\n        std::str::from_utf8(&content).unwrap().to_string()\n    }\n\n    rewrite_tests! 
{\n        rewrite_a_href: (\n            \"<html><head></head><body><a href=\\\"/foo/bar\\\">Hello</a></body></html>\".to_string(),\n            \"<html><head></head><body><a href=\\\"https://mgdm.net/foo/bar\\\">Hello</a></body></html>\".to_string(),\n        ),\n        rewrite_link_href: (\n            \"<html><head><link  href=\\\"/style.css\\\" rel=\\\"stylesheet\\\"/></head><body>Hello</body></html>\".to_string(),\n            \"<html><head><link href=\\\"https://mgdm.net/style.css\\\" rel=\\\"stylesheet\\\"></head><body>Hello</body></html>\".to_string(),\n        ),\n        rewrite_map_area_href: (\n            \"<html><head></head><body><map name=\\\"primary\\\"><area coords=\\\"75,75,75\\\" href=\\\"left.html\\\" shape=\\\"circle\\\"></map></body></html>\".to_string(),\n            \"<html><head></head><body><map name=\\\"primary\\\"><area coords=\\\"75,75,75\\\" href=\\\"https://mgdm.net/left.html\\\" shape=\\\"circle\\\"></map></body></html>\".to_string()\n        ),\n        do_not_rewrite_absolute_url: (\n            \"<html><head></head><body><a href=\\\"https://example.org/foo/bar\\\">Hello</a></body></html>\".to_string(),\n            \"<html><head></head><body><a href=\\\"https://example.org/foo/bar\\\">Hello</a></body></html>\".to_string(),\n        ),\n    }\n\n    detect_base_tests! {\n        base_ok: (\n            \"<html><head><base href=\\\"https://example.org\\\"></head><body><a href=\\\"https://example.org/foo/bar\\\">Hello</a></body></html>\".to_string(),\n            Some(Url::parse(\"https://example.org\").unwrap())\n        ),\n        base_not_found: (\n            \"<html><head></head><body><a href=\\\"https://example.org/foo/bar\\\">Hello</a></body></html>\".to_string(),\n            None\n        ),\n    }\n}\n"
  },
  {
    "path": "src/main.rs",
    "content": "extern crate html5ever;\nextern crate kuchiki;\n\n#[macro_use]\nextern crate lazy_static;\n\nmod link;\nmod pretty_print;\n\nuse clap::{App, Arg, ArgMatches};\nuse kuchiki::traits::*;\nuse kuchiki::NodeRef;\nuse std::borrow::BorrowMut;\nuse std::error::Error;\nuse std::fs::File;\nuse std::io;\nuse std::str;\nuse url::Url;\n\n#[derive(Debug, Clone)]\nstruct Config {\n    input_path: String,\n    output_path: String,\n    selector: String,\n    base: Option<String>,\n    detect_base: bool,\n    text_only: bool,\n    ignore_whitespace: bool,\n    pretty_print: bool,\n    remove_nodes: Option<Vec<String>>,\n    attributes: Option<Vec<String>>,\n}\n\nimpl Config {\n    fn from_args(matches: ArgMatches) -> Option<Config> {\n        let attributes = matches\n            .values_of(\"attribute\")\n            .map(|values| values.map(String::from).collect());\n\n        let remove_nodes = matches\n            .values_of(\"remove_nodes\")\n            .map(|values| values.map(String::from).collect());\n\n        let selector: String = match matches.values_of(\"selector\") {\n            Some(values) => values.collect::<Vec<&str>>().join(\" \"),\n            None => String::from(\"html\"),\n        };\n\n        let base = matches.value_of(\"base\").map(|b| b.to_owned());\n\n        Some(Config {\n            input_path: String::from(matches.value_of(\"filename\").unwrap_or(\"-\")),\n            output_path: String::from(matches.value_of(\"output\").unwrap_or(\"-\")),\n            base,\n            detect_base: matches.is_present(\"detect_base\"),\n            text_only: matches.is_present(\"text_only\"),\n            ignore_whitespace: matches.is_present(\"ignore_whitespace\"),\n            pretty_print: matches.is_present(\"pretty_print\"),\n            remove_nodes,\n            attributes,\n            selector,\n        })\n    }\n}\n\nimpl Default for Config {\n    fn default() -> Self {\n        Self {\n            input_path: \"-\".to_string(),\n     
       output_path: \"-\".to_string(),\n            selector: \"html\".to_string(),\n            base: None,\n            detect_base: false,\n            ignore_whitespace: true,\n            pretty_print: true,\n            text_only: false,\n            remove_nodes: None,\n            attributes: Some(vec![]),\n        }\n    }\n}\n\nfn select_attributes(node: &NodeRef, attributes: &[String], output: &mut dyn io::Write) {\n    if let Some(as_element) = node.as_element() {\n        for attr in attributes {\n            if let Ok(elem_atts) = as_element.attributes.try_borrow() {\n                if let Some(val) = elem_atts.get(attr.as_str()) {\n                    writeln!(output, \"{}\", val).ok();\n                }\n            }\n        }\n    }\n}\n\nfn serialize_text(node: &NodeRef, ignore_whitespace: bool) -> String {\n    let mut result = String::new();\n    for text_node in node.inclusive_descendants().text_nodes() {\n        if ignore_whitespace && text_node.borrow().trim().is_empty() {\n            continue;\n        }\n\n        result.push_str(&text_node.borrow());\n\n        if ignore_whitespace {\n            result.push('\\n');\n        }\n    }\n\n    result\n}\n\nfn get_config<'a, 'b>() -> App<'a, 'b> {\n    App::new(\"htmlq\")\n        .version(\"0.4.0\")\n        .author(\"Michael Maclean <michael@mgdm.net>\")\n        .about(\"Runs CSS selectors on HTML\")\n        .arg(\n            Arg::with_name(\"filename\")\n                .short(\"f\")\n                .long(\"filename\")\n                .value_name(\"FILE\")\n                .help(\"The input file. Defaults to stdin\")\n                .takes_value(true),\n        )\n        .arg(\n            Arg::with_name(\"output\")\n                .short(\"o\")\n                .long(\"output\")\n                .value_name(\"FILE\")\n                .help(\"The output file. 
Defaults to stdout\")\n                .takes_value(true),\n        )\n        .arg(\n            Arg::with_name(\"pretty_print\")\n                .short(\"p\")\n                .long(\"pretty\")\n                .help(\"Pretty-print the serialised output\"),\n        )\n        .arg(\n            Arg::with_name(\"text_only\")\n                .short(\"t\")\n                .long(\"text\")\n                .help(\"Output only the contents of text nodes inside selected elements\"),\n        )\n        .arg(\n            Arg::with_name(\"ignore_whitespace\")\n                .short(\"w\")\n                .long(\"ignore-whitespace\")\n                .help(\"When printing text nodes, ignore those that consist entirely of whitespace\"),\n        )\n        .arg(\n            Arg::with_name(\"attribute\")\n                .short(\"a\")\n                .long(\"attribute\")\n                .takes_value(true)\n                .help(\"Only return this attribute (if present) from selected elements\"),\n        )\n        .arg(\n            Arg::with_name(\"base\")\n                .short(\"b\")\n                .long(\"base\")\n                .takes_value(true)\n                .help(\"Use this URL as the base for links\"),\n        )\n        .arg(\n            Arg::with_name(\"detect_base\")\n                .short(\"B\")\n                .long(\"detect-base\")\n                .help(\"Try to detect the base URL from the <base> tag in the document. If not found, default to the value of --base, if supplied\"),\n        )\n        .arg(\n            Arg::with_name(\"remove_nodes\")\n                .long(\"remove-nodes\")\n                .short(\"r\")\n                .multiple(true)\n                .number_of_values(1)\n                .takes_value(true)\n                .value_name(\"SELECTOR\")\n                .help(\"Remove nodes matching this expression before output. 
May be specified multiple times\")\n        )\n        .arg(\n            Arg::with_name(\"selector\")\n                .default_value(\"html\")\n                .multiple(true)\n                .help(\"The CSS expression to select\"),\n        )\n}\n\nfn main() -> Result<(), Box<dyn Error>> {\n    let config = get_config();\n    let matches = config.get_matches();\n    let config = Config::from_args(matches).unwrap_or_default();\n\n    let mut input: Box<dyn io::Read> = match config.input_path.as_ref() {\n        \"-\" => Box::new(std::io::stdin()),\n        f => Box::new(File::open(f).expect(\"should have opened input file\")),\n    };\n\n    let stdout = std::io::stdout();\n    let mut output: Box<dyn io::Write> = match config.output_path.as_ref() {\n        \"-\" => Box::new(stdout.lock()),\n        f => Box::new(File::create(f).expect(\"should have created output file\")),\n    };\n\n    let document = kuchiki::parse_html().from_utf8().read_from(&mut input)?;\n\n    let base: Option<Url> = match (&config.base, &config.detect_base) {\n        (Some(base), true) => link::detect_base(&document).or(Url::parse(&base).ok()),\n        (Some(base), false) => Url::parse(&base).ok(),\n        (None, true) => link::detect_base(&document),\n        _ => None,\n    };\n\n    // Detach every node matching a --remove-nodes selector before selecting the output.\n    // Matches are collected first so the tree is never mutated while it is being traversed.\n    if let Some(remove_node_selectors) = &config.remove_nodes {\n        let targets: Vec<_> = document\n            .select(&remove_node_selectors.join(\",\"))\n            .expect(\"Failed to parse --remove-nodes selector\")\n            .collect();\n        for target in targets {\n            target.as_node().detach();\n        }\n    }\n\n    document\n        .select(&config.selector)\n        .expect(\"Failed to parse CSS selector\")\n        .map(|node| {\n            if let Some(base) = &base {\n                link::rewrite_relative_url(node.as_node(), &base)\n            }\n            
node\n        })\n        .for_each(|matched_noderef| {\n            let node = matched_noderef.as_node();\n\n            // --attribute: print only the requested attribute values.\n            if let Some(attributes) = &config.attributes {\n                select_attributes(node, attributes, &mut output);\n                return;\n            }\n\n            // --text: print only the text nodes inside the match.\n            if config.text_only {\n                writeln!(output, \"{}\", serialize_text(node, config.ignore_whitespace)).ok();\n                return;\n            }\n\n            // --pretty: print the match as indented HTML.\n            if config.pretty_print {\n                writeln!(output, \"{}\", pretty_print::pretty_print(node)).ok();\n                return;\n            }\n\n            // Default: print the match as serialised HTML.\n            writeln!(output, \"{}\", node.to_string()).ok();\n        });\n\n    Ok(())\n}\n"
  },
  {
    "path": "src/pretty_print.rs",
    "content": "use html5ever::serialize::AttrRef;\nuse html5ever::serialize::HtmlSerializer;\nuse html5ever::serialize::Serialize;\nuse html5ever::serialize::SerializeOpts;\nuse html5ever::serialize::Serializer;\nuse html5ever::serialize::TraversalScope;\nuse html5ever::QualName;\n// use kuchiki::traits::TendrilSink;\nuse kuchiki::NodeRef;\nuse std::collections::HashSet;\nuse std::io;\nuse std::io::Write;\nuse std::str;\n\nlazy_static! {\n    static ref INLINE_ELEMENTS: HashSet<&'static str> = vec![\n        \"a\", \"abbr\", \"acronym\", \"audio\", \"b\", \"bdi\", \"bdo\", \"big\", \"button\", \"canvas\", \"cite\",\n        \"code\", \"data\", \"datalist\", \"del\", \"dfn\", \"em\", \"embed\", \"i\", \"iframe\", \"img\", \"input\",\n        \"ins\", \"kbd\", \"label\", \"map\", \"mark\", \"meter\", \"noscript\", \"object\", \"output\", \"picture\",\n        \"progress\", \"q\", \"ruby\", \"s\", \"samp\", \"script\", \"select\", \"slot\", \"small\", \"span\",\n        \"strong\", \"sub\", \"sup\", \"svg\", \"template\", \"textarea\", \"time\", \"u\", \"tt\", \"var\", \"video\",\n        \"wbr\",\n    ]\n    .into_iter()\n    .collect();\n}\n\nfn is_inline(name: &str) -> bool {\n    INLINE_ELEMENTS.contains(name)\n}\n\nstruct PrettyPrint<W: Write> {\n    indent: usize,\n    previous_was_block: bool,\n    inner: HtmlSerializer<W>,\n}\n\nimpl<W: Write> Serializer for PrettyPrint<W> {\n    fn start_elem<'a, AttrIter>(&mut self, name: QualName, attrs: AttrIter) -> io::Result<()>\n    where\n        AttrIter: Iterator<Item = AttrRef<'a>>,\n    {\n        let inline = is_inline(&name.local);\n        if !inline || self.previous_was_block {\n            self.inner.writer.write_all(b\"\\n\")?;\n            self.inner.writer.write_all(&vec![b' '; self.indent])?;\n        }\n\n        self.indent += 2;\n        self.inner.start_elem(name, attrs)?;\n\n        Ok(())\n    }\n\n    fn end_elem(&mut self, name: QualName) -> io::Result<()> {\n        self.indent -= 2;\n\n        
if is_inline(&name.local) {\n            self.previous_was_block = false;\n        } else {\n            self.inner.writer.write_all(b\"\\n\")?;\n            self.inner.writer.write_all(&vec![b' '; self.indent])?;\n            self.previous_was_block = true;\n        }\n\n        self.inner.end_elem(name)\n    }\n\n    fn write_text(&mut self, text: &str) -> io::Result<()> {\n        if text.trim().is_empty() {\n            Ok(())\n        } else {\n            if self.previous_was_block {\n                self.inner.writer.write_all(b\"\\n\")?;\n                self.inner.writer.write_all(&vec![b' '; self.indent])?;\n            }\n\n            self.previous_was_block = false;\n            self.inner.write_text(text)\n        }\n    }\n\n    fn write_comment(&mut self, text: &str) -> io::Result<()> {\n        self.inner.write_comment(text)\n    }\n\n    fn write_doctype(&mut self, name: &str) -> io::Result<()> {\n        self.inner.write_doctype(name)\n    }\n\n    fn write_processing_instruction(&mut self, target: &str, data: &str) -> io::Result<()> {\n        self.inner.write_processing_instruction(target, data)\n    }\n}\n\npub fn pretty_print(node: &NodeRef) -> String {\n    let mut content: Vec<u8> = Vec::new();\n    let mut pp = PrettyPrint {\n        indent: 0,\n        previous_was_block: false,\n        inner: HtmlSerializer::new(\n            &mut content,\n            SerializeOpts {\n                traversal_scope: TraversalScope::IncludeNode,\n                ..Default::default()\n            },\n        ),\n    };\n    Serialize::serialize(node, &mut pp, TraversalScope::IncludeNode).unwrap();\n    str::from_utf8(content.as_ref()).unwrap().to_owned()\n}\n"
  },
  {
    "path": "tests/cli.rs",
    "content": "use assert_cmd::Command;\nuse predicates::prelude::*;\n\nmacro_rules! cmd_success_tests {\n    ($($name:ident: $value:expr,)*) => {\n    $(\n        #[test]\n        fn $name(){\n            let (stdin, args, expected) = $value;\n            Command::cargo_bin(\"htmlq\")\n                .unwrap()\n                .args(args)\n                .write_stdin(stdin)\n                .assert()\n                .success()\n                .stdout(predicate::str::diff(expected));\n        }\n    )*\n    }\n}\n\ncmd_success_tests!(\n    find_by_class: (\n        \"<html><head></head><body><div class=\\\"hi\\\"><a href=\\\"/foo/bar\\\">Hello</a></div></body></html>\",\n        [\".hi\"],\n        \"<div class=\\\"hi\\\"><a href=\\\"/foo/bar\\\">Hello</a></div>\\n\"\n    ),\n    find_by_id: (\n        \"<html><head></head><body><div id=\\\"my-id\\\"><a href=\\\"/foo/bar\\\">Hello</a></div></body></html>\",\n        [\"#my-id\"],\n        \"<div id=\\\"my-id\\\"><a href=\\\"/foo/bar\\\">Hello</a></div>\\n\"\n    ),\n    remove_links: (\n        \"<html><head></head><body><div id=\\\"my-id\\\"><a href=\\\"/foo/bar\\\">Hello</a></div></body></html>\",\n        [\"#my-id\", \"--remove-nodes\", \"a\"],\n        \"<div id=\\\"my-id\\\"></div>\\n\",\n    ),\n);\n"
  }
]