Full Code of threedaymonk/htmlbeautifier for AI

Repository: threedaymonk/htmlbeautifier
Branch: master
Commit: 3d75d9b4e099
Files: 23
Total size: 44.5 KB

Directory structure:
gitextract_6uwmr_5x/

├── .gitignore
├── .rspec
├── .rubocop.yml
├── COPYING.txt
├── Gemfile
├── History.txt
├── README.md
├── Rakefile
├── bin/
│   └── htmlbeautifier
├── contributors.txt
├── htmlbeautifier.gemspec
├── lib/
│   ├── htmlbeautifier/
│   │   ├── builder.rb
│   │   ├── html_parser.rb
│   │   ├── parser.rb
│   │   ├── ruby_indenter.rb
│   │   └── version.rb
│   └── htmlbeautifier.rb
└── spec/
    ├── .rubocop.yml
    ├── behavior_spec.rb
    ├── documents_spec.rb
    ├── executable_spec.rb
    ├── parser_spec.rb
    └── spec_helper.rb

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*.gem
Gemfile.lock
/tmp
.ruby-version


================================================
FILE: .rspec
================================================
--color
--require spec_helper
-I lib


================================================
FILE: .rubocop.yml
================================================
inherit_mode:
  merge:
    - Exclude

require:
  - standard
  - rubocop-rake

inherit_gem:
  standard: config/base.yml

AllCops:
  NewCops: disable
  SuggestExtensions: false
  TargetRubyVersion: 3.2
  Exclude:
    - '**/*.gemspec'
    - '**/Rakefile'


================================================
FILE: COPYING.txt
================================================
Copyright (c) 2007-2015 Paul Battley

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: Gemfile
================================================
# frozen_string_literal: true

source "https://rubygems.org"
gemspec


================================================
FILE: History.txt
================================================
== 1.3.1 (2017-05-04)
* Fix erroneous additional indentation being applied to code in <script> and
  <style> sections.

== 1.3.0 (2017-03-20)
* Allow blank lines (up to a maximum) to be preserved in output.
* Fix bug with excess indentation in some circumstances.

== 1.2.1 (2016-11-22)
* Support arbitrary self-closing tags.

== 1.2.0 (2016-09-06)
* Support indentation via tabs.
* Allow the whole output to be indented by a number of steps.
* Indentation is now handled by the indent option: tab_stops still works but
  is deprecated.

== 1.1.1 (2015-07-27)
* Indent after 'until' and 'for'.
* Do not modify the content of <textarea>.
* Improve documentation.
* Make coding style consistent (and enforced by RuboCop).

== 1.1.0 (2015-03-07)
* Remove whitespace in an otherwise-empty <script></script> node.

== 1.0.2 (2015-02-23)
* Allow '<' in attributes in order to support AngularJS.

== 1.0.1 (2015-02-22)
* Improve help output of command-line tool.

== 1.0.0 (2015-01-19)
* Improve and document the API.
* Specify Ruby support: >= 1.9.2.
* Move tests to RSpec.
* Stop breaking on excessive outdenting by default.

== 0.0.12 (2014-12-30)
* Add new lines after <br> and around <pre>.
* Add HTML5 block elements and remove those deprecated in HTML 4.0.
* Fix breakage in command-line tool.
* Command-line tool is now tested.
* No longer hangs on certain large files.

== 0.0.11 (2014-12-29)
* Preserve formatting inside <pre>.
* Add new lines after block-like elements.

== 0.0.10 (2014-09-28)
* Set tab width via CLI option.

== 0.0.9 (2013-12-29)
* Support <br> etc. without /.
* Make element names case-insensitive.

== 0.0.8 (2013-08-27)
* Avoid wiping output file on error when working in place.
* Report filename when an error occurs.
* Clarify licence (with contributor permission): MIT.

== 0.0.7 (2012-07-10)
* Modernise gem structure.
* Document Beautifier.
* Improve outdent reporting.

== 0.0.6 (2010-07-01)
* Fix new line at end of output when modifying file.

== 0.0.5 (2010-07-01)
* Add option to modify file in place.
* Report source line when outdenting too far.

== 0.0.4 (2009-10-13)
* Outdent 'else' correctly.

== 0.0.3 (2009-10-13)
* Support <%- ... -%>
* Eliminated dependency on hoe.

== 0.0.2 (2009-10-11)
* Move from a single file to multiple files.
* Fix parsing of standalone element immediately after closing element.
* Don't break on empty script elements.
* Emit new line at end of output.
* Parse IE conditional comments.
* Release as a gem.


================================================
FILE: README.md
================================================
# HTML Beautifier

A normaliser/beautifier for HTML that also understands embedded Ruby.
Ideal for tidying up Rails templates.

## What it does

* Normalises hard tabs to spaces (or vice versa)
* Removes trailing spaces
* Indents after opening HTML elements
* Outdents before closing elements
* Collapses multiple whitespace
* Indents after block-opening embedded Ruby (if, do etc.)
* Outdents before closing Ruby blocks
* Outdents elsif and then indents again
* Indents the left-hand margin of JavaScript and CSS blocks to match the
  indentation level of the code
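
The embedded Ruby rules above can be illustrated without installing the gem.
The following is a minimal standalone sketch, not the gem's public API: the
two regular expressions are copied from `lib/htmlbeautifier/ruby_indenter.rb`,
while `erb_level_change` is a hypothetical helper name introduced here. It
reports how the nesting level changes before and after a single ERB tag.

```ruby
# Standalone sketch. The regexes are copied from
# lib/htmlbeautifier/ruby_indenter.rb; erb_level_change is a hypothetical
# helper that exists only for this illustration.
INDENT_KEYWORDS = %w[if elsif else unless while until begin for case when].freeze
OUTDENT_KEYWORDS = %w[elsif else end when].freeze
RUBY_INDENT = %r{
  ^ ( #{INDENT_KEYWORDS.join("|")} )\b
  | \b ( do | \{ ) ( \s* \| [^|]+ \| )? $
}x
RUBY_OUTDENT = %r{ ^ ( #{OUTDENT_KEYWORDS.join("|")} | \} ) \b }x

# Returns [before, after]: the level change applied before the tag is
# emitted (-1 or 0) and the change applied afterwards (+1 or 0).
def erb_level_change(code)
  lines = code.split("\n").map(&:strip)
  before = RUBY_OUTDENT.match?(lines.first.to_s) ? -1 : 0
  after = RUBY_INDENT.match?(lines.last.to_s) ? 1 : 0
  [before, after]
end

p erb_level_change(" if @x ")            # => [0, 1]
p erb_level_change(" else ")             # => [-1, 1]
p erb_level_change(" @ys.each do |y| ")  # => [0, 1]
p erb_level_change(" end ")              # => [-1, 0]
```

Only the first line of a tag is checked for outdenting and only the last line
for indenting, which is why `else` and `elsif` step out and then back in.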

## Usage

### From the command line

To update files in-place:

``` sh
$ htmlbeautifier file1.html.erb [file2.html.erb ...]
```

or to operate on standard input and output:

``` sh
$ htmlbeautifier < untidy.html.erb > formatted.html.erb
```

### In your code

```ruby
require 'htmlbeautifier'

beautiful = HtmlBeautifier.beautify(untidy_html_string)
```

You can also specify how to indent (the default is two spaces):

```ruby
beautiful = HtmlBeautifier.beautify(untidy_html_string, indent: "\t")
```
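
Besides `indent`, the library source documents `initial_level`,
`stop_on_errors`, `keep_blank_lines` and the deprecated `tab_stops`. The
sketch below shows, under stated assumptions, how these appear to be resolved:
`DEFAULTS` mirrors `Builder::DEFAULT_OPTIONS` from
`lib/htmlbeautifier/builder.rb`, and `resolve_options` is a hypothetical
helper written for this example, not a method the gem provides.

```ruby
# Standalone sketch. DEFAULTS mirrors Builder::DEFAULT_OPTIONS;
# resolve_options is a hypothetical helper, not part of the gem.
DEFAULTS = {
  indent: "  ",
  initial_level: 0,
  stop_on_errors: false,
  keep_blank_lines: 0
}.freeze

# Converts the deprecated :tab_stops into :indent, then fills in defaults.
# Works on a copy so the caller's hash is left untouched.
def resolve_options(options = {})
  options = options.dup
  tab_stops = options.delete(:tab_stops)
  options[:indent] = " " * tab_stops if tab_stops
  DEFAULTS.merge(options)
end

p resolve_options(indent: "\t")[:indent]     # => "\t"
p resolve_options(tab_stops: 4)[:indent]     # => "    "
p resolve_options({})[:keep_blank_lines]     # => 0
```

As in the library, a `tab_stops` value takes precedence over `indent` when
both are given.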

## Installation

This is a Ruby gem.
To install the command-line tool (you may need `sudo`):

```sh
$ gem install htmlbeautifier
```

To use the gem with Bundler, add to your `Gemfile`:

```ruby
gem 'htmlbeautifier'
```

## Contributing

1. Follow [these guidelines][git-commit] when writing commit messages (briefly,
   the first line should begin with a capital letter, use the imperative mood,
   be no more than 50 characters, and not end with a period).
2. Include tests.

[git-commit]:http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html


================================================
FILE: Rakefile
================================================
require "rspec/core/rake_task"

desc "Run the specs."
RSpec::Core::RakeTask.new do |t|
  t.pattern = "spec/**/*_spec.rb"
  t.verbose = false
end

task :default => [:spec]

if Gem.loaded_specs.key?('rubocop')
  require 'rubocop/rake_task'
  RuboCop::RakeTask.new

  task(:default).prerequisites << task(:rubocop)
end


================================================
FILE: bin/htmlbeautifier
================================================
#!/usr/bin/env ruby
# frozen_string_literal: true

require "htmlbeautifier"
require "optparse"
require "fileutils"
require "stringio"

def beautify(name, input, output, options)
  output.puts HtmlBeautifier.beautify(input, options)
rescue => e
  raise "Error parsing #{name}: #{e}"
end

executable = File.basename(__FILE__)

options = {indent: "  "}
parser = OptionParser.new do |opts|
  opts.banner = "Usage: #{executable} [options] [file ...]"
  opts.separator <<~STRING

    #{executable} has two modes of operation:

    1. If no files are listed, it will read from standard input and write to
       standard output.
    2. If files are listed, it will modify each file in place, overwriting it
       with the beautified output.

    The following options are available:

  STRING
  opts.on(
    "-t", "--tab-stops NUMBER", Integer,
    "Set number of spaces per indent (default #{options[:indent].length})"
  ) do |num|
    options[:indent] = " " * num
  end
  opts.on(
    "-T", "--tab",
    "Indent using tabs"
  ) do
    options[:indent] = "\t"
  end
  opts.on(
    "-i", "--indent-by NUMBER", Integer,
    "Indent the output by NUMBER steps (default 0)."
  ) do |num|
    options[:initial_level] = num
  end
  opts.on(
    "-e", "--stop-on-errors",
    "Stop when invalid nesting is encountered in the input"
  ) do |flag|
    options[:stop_on_errors] = flag
  end
  opts.on(
    "-b", "--keep-blank-lines NUMBER", Integer,
    "Set maximum number of consecutive blank lines to keep (default 0)"
  ) do |num|
    options[:keep_blank_lines] = num
  end
  opts.on(
    "-l", "--lint-only",
    "Lint only, error on files which would be modified",
    "This is not available when reading from standard input"
  ) do |flag|
    options[:lint_only] = flag
  end
  opts.on(
    "-v", "--version",
    "Display version and exit"
  ) do
    puts HtmlBeautifier::VERSION::STRING
    exit
  end
  opts.on(
    "-h", "--help",
    "Display this help message and exit"
  ) do
    puts opts
    exit
  end
end

parser.parse!

if ARGV.any?
  failures = []
  ARGV.each do |path|
    input = File.read(path)
    if options[:lint_only]
      output = StringIO.new
      beautify path, input, output, options
      failures << path unless input == output.string
    else
      temppath = "#{path}.tmp"
      File.open(temppath, "w") do |file|
        beautify path, input, file, options
      end
      FileUtils.mv temppath, path
    end
  end
  unless failures.empty?
    warn [
      "Lint failed - files would be modified:",
      *failures
    ].join("\n")
    exit 1
  end
else
  beautify "standard input", $stdin.read, $stdout, options
end


================================================
FILE: contributors.txt
================================================
Paul Battley <pbattley@gmail.com>
Chris Berkhout <chrisberkhout@gmail.com>
Matti Lehtonen <matti.lehtonen@puujaa.com>
Jakub Jirutka <jakub@jirutka.cz>
Joe Rossi <jrossi@arn.com>
Mike Kozono <mkozono@gmail.com>
Nicholas Rutherford <nick.rutherford@gmail.com>
Alexander Daniel <AlexanderRDaniel@gmail.com>
Huizhe Wang <conciselove@outlook.com>
Manoj Krishnan <manoj.k@freshdesk.com>


================================================
FILE: htmlbeautifier.gemspec
================================================
require File.expand_path("../lib/htmlbeautifier/version", __FILE__)

spec = Gem::Specification.new do |s|
  s.name              = "htmlbeautifier"
  s.version           = HtmlBeautifier::VERSION::STRING
  s.summary           = "HTML/ERB beautifier"
  s.description       = "A normaliser/beautifier for HTML that also understands embedded Ruby."
  s.author            = "Paul Battley"
  s.email             = "pbattley@gmail.com"
  s.homepage          = "http://github.com/threedaymonk/htmlbeautifier"
  s.license           = "MIT"

  s.files             = %w(Rakefile README.md) + Dir.glob("{bin,test,lib}/**/*")
  s.executables       = Dir["bin/**"].map { |f| File.basename(f) }

  s.require_paths     = ["lib"]

  s.required_ruby_version = '>= 2.6.0'

  s.add_development_dependency "rake", "~> 13"
  s.add_development_dependency "rspec", "~> 3"
  s.add_development_dependency "standard", "~> 1.33"
  s.add_development_dependency "rubocop-rspec", "~> 2"
  s.add_development_dependency "rubocop-rake", "~> 0.6"
end



================================================
FILE: lib/htmlbeautifier/builder.rb
================================================
# frozen_string_literal: true

require "htmlbeautifier/parser"
require "htmlbeautifier/ruby_indenter"

module HtmlBeautifier
  class Builder
    DEFAULT_OPTIONS = {
      indent: "  ",
      initial_level: 0,
      stop_on_errors: false,
      keep_blank_lines: 0
    }.freeze

    def initialize(output, options = {})
      options = DEFAULT_OPTIONS.merge(options)
      @tab = options[:indent]
      @stop_on_errors = options[:stop_on_errors]
      @level = options[:initial_level]
      @keep_blank_lines = options[:keep_blank_lines]
      @new_line = false
      @empty = true
      @ie_cc_levels = []
      @output = output
      @embedded_indenter = RubyIndenter.new
    end

    private

    def error(text)
      return unless @stop_on_errors

      raise text
    end

    def indent
      @level += 1
    end

    def outdent
      error "Extraneous closing tag" if @level == 0
      @level = [@level - 1, 0].max
    end

    def emit(*strings)
      strings_join = strings.join("")
      @output << "\n" if @new_line && !@empty
      @output << (@tab * @level) if @new_line && !strings_join.strip.empty?
      @output << strings_join
      @new_line = false
      @empty = false
    end

    def new_line
      @new_line = true
    end

    def embed(opening, code, closing)
      lines = code.split(%r{\n}).map(&:strip)
      outdent if @embedded_indenter.outdent?(lines)
      emit opening, code, closing
      indent if @embedded_indenter.indent?(lines)
    end

    def foreign_block(opening, code, closing)
      emit opening
      emit_reindented_block_content code unless code.strip.empty?
      emit closing
    end

    def emit_reindented_block_content(code)
      lines = code.strip.split(%r{\n})
      indentation = foreign_block_indentation(code)

      indent
      new_line
      lines.each do |line|
        emit line.rstrip.sub(%r{^#{indentation}}, "")
        new_line
      end
      outdent
    end

    def foreign_block_indentation(code)
      code.split(%r{\n}).find { |ln| !ln.strip.empty? }[%r{^\s+}]
    end

    def preformatted_block(opening, content, closing)
      new_line
      emit opening, content, closing
      new_line
    end

    def standalone_element(elem)
      emit elem
      new_line if elem =~ %r{^<br[^\w]}
    end

    def close_element(elem)
      outdent
      emit elem
    end

    def close_block_element(elem)
      close_element elem
      new_line
    end

    def open_element(elem)
      emit elem
      indent
    end

    def open_block_element(elem)
      new_line
      open_element elem
    end

    def close_ie_cc(elem)
      if @ie_cc_levels.empty?
        error "Unclosed conditional comment"
      else
        @level = @ie_cc_levels.pop
      end
      emit elem
    end

    def open_ie_cc(elem)
      emit elem
      @ie_cc_levels.push @level
      indent
    end

    def new_lines(*content)
      blank_lines = content.first.scan(%r{\n}).count - 1
      blank_lines = [blank_lines, @keep_blank_lines].min
      @output << ("\n" * blank_lines)
      new_line
    end

    alias_method :text, :emit
  end
end


================================================
FILE: lib/htmlbeautifier/html_parser.rb
================================================
# frozen_string_literal: true

require "htmlbeautifier/parser"

module HtmlBeautifier
  class HtmlParser < Parser
    ELEMENT_CONTENT = %r{ (?:<%.*?%>|[^>])* }mx
    HTML_VOID_ELEMENTS = %r{(?:
      area | base | br | col | command | embed | hr | img | input | keygen |
      link | meta | param | source | track | wbr
    )}mix
    HTML_BLOCK_ELEMENTS = %r{(?:
      address | article | aside | audio | blockquote | canvas | dd | details |
      dir | div | dl | dt | fieldset | figcaption | figure | footer | form |
      h1 | h2 | h3 | h4 | h5 | h6 | header | hr | li | menu | noframes |
      noscript | ol | p | pre | section | table | tbody | td | tfoot | th |
      thead | tr | ul | video
    )}mix

    MAPPINGS = [
      [%r{(<%-?=?)(.*?)(-?%>)}om,
        :embed],
      [%r{<!--\[.*?\]>}om,
        :open_ie_cc],
      [%r{<!\[.*?\]-->}om,
        :close_ie_cc],
      [%r{<!--.*?-->}om,
        :standalone_element],
      [%r{<!.*?>}om,
        :standalone_element],
      [%r{(<script#{ELEMENT_CONTENT}>)(.*?)(</script>)}omi,
        :foreign_block],
      [%r{(<style#{ELEMENT_CONTENT}>)(.*?)(</style>)}omi,
        :foreign_block],
      [%r{(<pre#{ELEMENT_CONTENT}>)(.*?)(</pre>)}omi,
        :preformatted_block],
      [%r{(<textarea#{ELEMENT_CONTENT}>)(.*?)(</textarea>)}omi,
        :preformatted_block],
      [%r{<#{HTML_VOID_ELEMENTS}(?: #{ELEMENT_CONTENT})?/?>}om,
        :standalone_element],
      [%r{</#{HTML_BLOCK_ELEMENTS}>}om,
        :close_block_element],
      [%r{<#{HTML_BLOCK_ELEMENTS}(?: #{ELEMENT_CONTENT})?>}om,
        :open_block_element],
      [%r{</#{ELEMENT_CONTENT}>}om,
        :close_element],
      [%r{<#{ELEMENT_CONTENT}[^/]>}om,
        :open_element],
      [%r{<[\w\-]+(?: #{ELEMENT_CONTENT})?/>}om,
        :standalone_element],
      [%r{(\s*\r?\n\s*)+}om,
        :new_lines],
      [%r{[^<\n]+},
        :text]
    ].freeze

    def initialize
      super do |p|
        MAPPINGS.each do |regexp, method|
          p.map regexp, method
        end
      end
    end
  end
end


================================================
FILE: lib/htmlbeautifier/parser.rb
================================================
# frozen_string_literal: true

require "strscan"

module HtmlBeautifier
  class Parser
    def initialize
      @maps = []
      yield self if block_given?
    end

    def map(pattern, method)
      @maps << [pattern, method]
    end

    def scan(subject, receiver)
      @scanner = StringScanner.new(subject)
      dispatch(receiver) until @scanner.eos?
    end

    def source_so_far
      @scanner.string[0...@scanner.pos]
    end

    def source_line_number
      [source_so_far.chomp.split(%r{\n}).count, 1].max
    end

    private

    def dispatch(receiver)
      _, method = @maps.find { |pattern, _| @scanner.scan(pattern) }
      raise "Unmatched sequence" unless method

      receiver.__send__(method, *extract_params(@scanner))
    rescue => e
      raise "#{e.message} on line #{source_line_number}"
    end

    def extract_params(scanner)
      return [scanner[0]] unless scanner[1]

      params = []
      i = 1
      while scanner[i]
        params << scanner[i]
        i += 1
      end
      params
    end
  end
end


================================================
FILE: lib/htmlbeautifier/ruby_indenter.rb
================================================
# frozen_string_literal: true

module HtmlBeautifier
  class RubyIndenter
    INDENT_KEYWORDS = %w[if elsif else unless while until begin for case when].freeze
    OUTDENT_KEYWORDS = %w[elsif else end when].freeze
    RUBY_INDENT = %r{
      ^ ( #{INDENT_KEYWORDS.join("|")} )\b
      | \b ( do | \{ ) ( \s* \| [^|]+ \| )? $
    }xo
    RUBY_OUTDENT = %r{ ^ ( #{OUTDENT_KEYWORDS.join("|")} | \} ) \b }xo

    def outdent?(lines)
      lines.first =~ RUBY_OUTDENT
    end

    def indent?(lines)
      lines.last =~ RUBY_INDENT
    end
  end
end


================================================
FILE: lib/htmlbeautifier/version.rb
================================================
# frozen_string_literal: true

module HtmlBeautifier # :nodoc:
  module VERSION # :nodoc:
    MAJOR = 1
    MINOR = 4
    TINY = 2

    STRING = [MAJOR, MINOR, TINY].join(".")
  end
end


================================================
FILE: lib/htmlbeautifier.rb
================================================
# frozen_string_literal: true

require "htmlbeautifier/builder"
require "htmlbeautifier/html_parser"
require "htmlbeautifier/version"

module HtmlBeautifier
  #
  # Returns a beautified HTML/HTML+ERB document as a String.
  # html must be an object that responds to +#to_s+.
  #
  # Available options are:
  # tab_stops - an integer for the number of spaces to indent, default 2.
  # Deprecated: see indent.
  # indent - what to indent with ("  ", "\t" etc.), default "  "
  # stop_on_errors - raise an exception on a badly-formed document. Default
  # is false, i.e. continue to process the rest of the document.
  # initial_level - The entire output will be indented by this number of steps.
  # Default is 0.
  # keep_blank_lines - an integer for the number of consecutive empty lines
  # to keep in output.
  #
  def self.beautify(html, options = {})
    if options[:tab_stops]
      # Build a new hash rather than mutating the caller's options.
      options = options.merge(indent: " " * options[:tab_stops])
    end
    String.new.tap { |output|
      HtmlParser.new.scan html.to_s, Builder.new(output, options)
    }
  end
end


================================================
FILE: spec/.rubocop.yml
================================================
inherit_from:
  - ../.rubocop.yml
require:
  - rubocop-rspec

# I can't always fit test data in 80 chars
Layout/LineLength:
  Enabled: false

Metrics/MethodLength:
  Enabled: false

# I'd like to enable this, but there's a lot of % interpolation in the specs
Style/FormatStringToken:
  Enabled: false

# By its nature this is not a class or module
RSpec/DescribeClass:
  Exclude:
    - executable_spec.rb

# Pragmatic relaxation to avoid shelling out too often
RSpec/MultipleExpectations:
  Exclude:
    - executable_spec.rb

# To be revisited
RSpec/ExampleLength:
  Enabled: false
RSpec/FilePath:
  Enabled: false

# Enable these new cops
RSpec/ExcessiveDocstringSpacing:
  Enabled: true
RSpec/IdenticalEqualityAssertion:
  Enabled: true
RSpec/SubjectDeclaration:
  Enabled: true
RSpec/Rails/AvoidSetupHook:
  Enabled: true


================================================
FILE: spec/behavior_spec.rb
================================================
# frozen_string_literal: true

require "htmlbeautifier"

describe HtmlBeautifier do
  it "ignores HTML fragments in embedded ERB" do
    source = code <<~HTML
      <div>
        <%= a[:b].gsub("\n","<br />\n") %>
      </div>
    HTML
    expected = code <<~HTML
      <div>
        <%= a[:b].gsub("\n","<br />\n") %>
      </div>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "allows < in an attribute" do
    source = code <<~HTML
      <div ng-show="foo < 1">
      <p>Hello</p>
      </div>
    HTML
    expected = code <<~HTML
      <div ng-show="foo < 1">
        <p>Hello</p>
      </div>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "allows > in an attribute" do
    source = code <<~HTML
      <div ng-show="foo > 1">
      <p>Hello</p>
      </div>
    HTML
    expected = code <<~HTML
      <div ng-show="foo > 1">
        <p>Hello</p>
      </div>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "indents within <script>" do
    source = code <<~HTML
      <script>
      function(f) {
          g();
          return 42;
      }
      </script>
    HTML
    expected = code <<~HTML
      <script>
        function(f) {
            g();
            return 42;
        }
      </script>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "does not indent blank lines in scripts" do
    source = "<script>\n  function(f) {\n\n  }\n</script>"
    expected = "<script>\n  function(f) {\n\n  }\n</script>"
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "handles self-closing HTML fragments in javascript: <img /> (bug repro)" do
    source = code <<-ERB
    <div class="<%= get_class %>" ></div>
    <script>
      var warning_message = "<%= confirm_data %>";
      $('#img').html('<img src="/myimg.jpg" />')
      $('#errors').html('<div class="alert alert-danger" role="alert">' + error_message + '</div>')
    </script>
    ERB

    expect(described_class.beautify(source, stop_on_errors: true)).to eq(source)
  end

  it "indents only the first line of code inside <script> or <style> and retains the other lines' indents relative to the first line" do
    source = code <<~HTML
      <script>
        function(f) {
            g();
            return 42;
        }
      </script>
      <style>
                    .foo{ margin: 0; }
                    .bar{
                      padding: 0;
                      margin: 0;
                    }
      </style>
      <style>
        .foo{ margin: 0; }
                    .bar{
                      padding: 0;
                      margin: 0;
                    }
      </style>
    HTML
    expected = code <<~HTML
      <script>
        function(f) {
            g();
            return 42;
        }
      </script>
      <style>
        .foo{ margin: 0; }
        .bar{
          padding: 0;
          margin: 0;
        }
      </style>
      <style>
        .foo{ margin: 0; }
                    .bar{
                      padding: 0;
                      margin: 0;
                    }
      </style>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "retains empty <script> and <style> blocks" do
    source = code <<~HTML
      <script>

      </script>
      <style>

      </style>
    HTML
    expected = code <<~HTML
      <script></script>
      <style></style>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "trims blank lines around scripts" do
    source = code <<~HTML
      <script>

        f();

      </script>
    HTML
    expected = code <<~HTML
      <script>
        f();
      </script>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "removes trailing space from script lines" do
    source = "<script>\n  f();  \n</script>"
    expected = "<script>\n  f();\n</script>"
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "leaves empty scripts as they are" do
    source = %(<script src="/foo.js" type="text/javascript" charset="utf-8"></script>)
    expect(described_class.beautify(source)).to eq(source)
  end

  it "removes whitespace from script tags containing only whitespace" do
    source = "<script>\n</script>"
    expected = "<script></script>"
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "ignores case of <script> tag" do
    source = code <<~HTML
      <SCRIPT>

      // code

      </SCRIPT>
    HTML
    expected = code <<~HTML
      <SCRIPT>
        // code
      </SCRIPT>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "indents within <style>" do
    source = code <<~HTML
      <style>
      .foo{ margin: 0; }
      .bar{
        padding: 0;
        margin: 0;
      }
      </style>
    HTML
    expected = code <<~HTML
      <style>
        .foo{ margin: 0; }
        .bar{
          padding: 0;
          margin: 0;
        }
      </style>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "trims blank lines around styles" do
    source = code <<~HTML
      <style>

        .foo{ margin: 0; }

      </style>
    HTML
    expected = code <<~HTML
      <style>
        .foo{ margin: 0; }
      </style>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "removes trailing space from style lines" do
    source = "<style>\n  .foo{ margin: 0; }  \n</style>"
    expected = "<style>\n  .foo{ margin: 0; }\n</style>"
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "ignores case of <style> tag" do
    source = code <<~HTML
      <STYLE>

      .foo{ margin: 0; }

      </STYLE>
    HTML
    expected = code <<~HTML
      <STYLE>
        .foo{ margin: 0; }
      </STYLE>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "indents <div>s containing standalone elements" do
    source = code <<~HTML
      <div>
      <div>
      <img src="foo" alt="" />
      </div>
      <div>
      <img src="foo" alt="" />
      </div>
      </div>
    HTML
    expected = code <<~HTML
      <div>
        <div>
          <img src="foo" alt="" />
        </div>
        <div>
          <img src="foo" alt="" />
        </div>
      </div>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "does not break line on embedded code within <script> opening tag" do
    source = %(<script src="<%= path %>" type="text/javascript"></script>)
    expect(described_class.beautify(source)).to eq(source)
  end

  it "does not break line on embedded code within normal element" do
    source = %(<img src="<%= path %>" alt="foo" />)
    expect(described_class.beautify(source)).to eq(source)
  end

  it "outdents else" do
    source = code <<~ERB
      <% if @x %>
      Foo
      <% else %>
      Bar
      <% end %>
    ERB
    expected = code <<~ERB
      <% if @x %>
        Foo
      <% else %>
        Bar
      <% end %>
    ERB
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "indents with hyphenated ERB tags" do
    source = code <<~ERB
      <%- if @x -%>
      <%- @ys.each do |y| -%>
      <p>Foo</p>
      <%- end -%>
      <%- elsif @z -%>
      <hr />
      <%- end -%>
    ERB
    expected = code <<~ERB
      <%- if @x -%>
        <%- @ys.each do |y| -%>
          <p>Foo</p>
        <%- end -%>
      <%- elsif @z -%>
        <hr />
      <%- end -%>
    ERB
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "indents case statements" do
    source = code <<~ERB
      <div>
        <% case @x %>
        <% when :a %>
        a
        <% when :b %>
        b
        <% else %>
        c
        <% end %>
      </div>
    ERB
    expected = code <<~ERB
      <div>
        <% case @x %>
        <% when :a %>
          a
        <% when :b %>
          b
        <% else %>
          c
        <% end %>
      </div>
    ERB
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "stays indented within <details> with Boolean attribute handled by ERB" do
    source = code <<~ERB
      <details <%= "open" if opened %>>
        <summary>Hello</summary>
        <div>
          <table>
            <% items.each do |item| %>
              <tr>
                <td>
                  <code><%= item %></code>
                </td>
              </tr>
            <% end %>
          </table>
        </div>
      </details>
    ERB
    expect(described_class.beautify(source)).to eq(source)
  end

  it "does not indent after comments" do
    source = code <<~HTML
      <!-- This is a comment -->
      <!-- So is this -->
    HTML
    expect(described_class.beautify(source)).to eq(source)
  end

  it "does not indent one-line IE conditional comments" do
    source = code <<~HTML
      <!--[if lt IE 7]><html lang="en-us" class="ie6"><![endif]-->
      <!--[if IE 7]><html lang="en-us" class="ie7"><![endif]-->
      <!--[if IE 8]><html lang="en-us" class="ie8"><![endif]-->
      <!--[if gt IE 8]><!--><html lang="en-us"><!--<![endif]-->
        <body>
        </body>
      </html>
    HTML
    expect(described_class.beautify(source)).to eq(source)
  end

  it "indents inside IE conditional comments" do
    source = code <<~HTML
      <!--[if IE 6]>
      <link rel="stylesheet" href="/stylesheets/ie6.css" type="text/css" />
      <![endif]-->
      <!--[if IE 5]>
      <link rel="stylesheet" href="/stylesheets/ie5.css" type="text/css" />
      <![endif]-->
    HTML
    expected = code <<~HTML
      <!--[if IE 6]>
        <link rel="stylesheet" href="/stylesheets/ie6.css" type="text/css" />
      <![endif]-->
      <!--[if IE 5]>
        <link rel="stylesheet" href="/stylesheets/ie5.css" type="text/css" />
      <![endif]-->
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "does not indent after doctype" do
    source = code <<~HTML
      <!DOCTYPE html>
      <html>
      </html>
    HTML
    expect(described_class.beautify(source)).to eq(source)
  end

  it "does not indent after void HTML elements" do
    source = code <<~HTML
      <meta>
      <input id="id">
      <br>
    HTML
    expect(described_class.beautify(source)).to eq(source)
  end

  it "ignores case of void elements" do
    source = code <<~HTML
      <META>
      <INPUT id="id">
      <BR>
    HTML
    expect(described_class.beautify(source)).to eq(source)
  end

  it "does not treat <colgroup> as standalone" do
    source = code <<~HTML
      <colgroup>
        <col style="width: 50%;">
      </colgroup>
    HTML
    expect(described_class.beautify(source)).to eq(source)
  end

  it "does not modify content of <pre>" do
    source = code <<~HTML
      <div>
        <pre>   Preformatted   text

                should  <em>not  be </em>
                      modified,
                ever!

        </pre>
      </div>
    HTML
    expect(described_class.beautify(source)).to eq(source)
  end

  it "adds a single newline after block elements" do
    source = code <<~HTML
      <section><h1>Title</h1><p>Lorem <em>ipsum</em></p>
      <ol>
        <li>First</li><li>Second</li></ol>


      </section>
    HTML
    expected = code <<~HTML
      <section>
        <h1>Title</h1>
        <p>Lorem <em>ipsum</em></p>
        <ol>
          <li>First</li>
          <li>Second</li>
        </ol>
      </section>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "does not modify content of <textarea>" do
    source = code <<~HTML
      <div>
        <textarea>   Preformatted   text

                should  <em>not  be </em>
                      modified,
                ever!

        </textarea>
      </div>
    HTML
    expect(described_class.beautify(source)).to eq(source)
  end

  it "adds newlines around <pre>" do
    source = %(<section><pre>puts "Allons-y!"</pre></section>)
    expected = code <<~HTML
      <section>
        <pre>puts "Allons-y!"</pre>
      </section>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "adds newline after <br>" do
    source = %(<p>Lorem ipsum<br>dolor sit<br />amet,<br/>consectetur.</p>)
    expected = code <<~HTML
      <p>Lorem ipsum<br>
        dolor sit<br />
        amet,<br/>
        consectetur.</p>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "indents after control expressions without optional `do` keyword" do
    source = code <<~ERB
      <% for value in list %>
      Lorem ipsum
      <% end %>
      <% until something %>
      Lorem ipsum
      <% end %>
      <% while something_else %>
      Lorem ipsum
      <% end %>
    ERB
    expected = code <<~ERB
      <% for value in list %>
        Lorem ipsum
      <% end %>
      <% until something %>
        Lorem ipsum
      <% end %>
      <% while something_else %>
        Lorem ipsum
      <% end %>
    ERB
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "indents general self-closing tags" do
    source = code <<~HTML
      <div>
      <svg>
      <path d="M150 0 L75 200 L225 200 Z" />
      <circle cx="50" cy="50" r="40" />
      </svg>
      <br>
      <br/>
      <p>
      <foo />
      </p>
      </div>
    HTML
    expected = code <<~HTML
      <div>
        <svg>
          <path d="M150 0 L75 200 L225 200 Z" />
          <circle cx="50" cy="50" r="40" />
        </svg>
        <br>
        <br/>
        <p>
          <foo />
        </p>
      </div>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "removes excess indentation on next line after text" do
    source = code <<~HTML
      Lorem ipsum
                      <br>
      Lorem ipsum
                      <em>
        Lorem ipsum
                      </em>
    HTML
    expected = code <<~HTML
      Lorem ipsum
      <br>
      Lorem ipsum
      <em>
        Lorem ipsum
      </em>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  it "indents subsequent lines of multiline text" do
    source = code <<~HTML
      <p>
      Lorem
      Lorem
      Lorem
      </p>
    HTML
    expected = code <<~HTML
      <p>
        Lorem
        Lorem
        Lorem
      </p>
    HTML
    expect(described_class.beautify(source)).to eq(expected)
  end

  context "when keep_blank_lines is 0" do
    it "removes all blank lines" do
      source = code <<~HTML
        <h1>Lorem</h1>



        <p>Ipsum</p>
      HTML
      expected = code <<~HTML
        <h1>Lorem</h1>
        <p>Ipsum</p>
      HTML
      expect(described_class.beautify(source, keep_blank_lines: 0)).to eq(expected)
    end
  end

  context "when keep_blank_lines is 1" do
    it "removes all blank lines but 1" do
      source = code <<~HTML
        <h1>Lorem</h1>



        <p>Ipsum</p>
      HTML
      expected = code <<~HTML
        <h1>Lorem</h1>

        <p>Ipsum</p>
      HTML
      expect(described_class.beautify(source, keep_blank_lines: 1)).to eq(expected)
    end

    it "does not add blank lines" do
      source = code <<~HTML
        <h1>Lorem</h1>
        <div>
          Ipsum
          <p>dolor</p>
        </div>
      HTML
      expect(described_class.beautify(source, keep_blank_lines: 1)).to eq(source)
    end

    it "does not indent blank lines" do
      source = code <<~HTML
        <div>
          Ipsum


          <p>dolor</p>
        </div>
      HTML
      expected = code <<~HTML
        <div>
          Ipsum

          <p>dolor</p>
        </div>
      HTML
      expect(described_class.beautify(source, keep_blank_lines: 1)).to eq(expected)
    end
  end

  context "when keep_blank_lines is 2" do
    it "removes all blank lines but 2" do
      source = code <<~HTML
        <h1>Lorem</h1>



        <p>Ipsum</p>
      HTML
      expected = code <<~HTML
        <h1>Lorem</h1>


        <p>Ipsum</p>
      HTML
      expect(described_class.beautify(source, keep_blank_lines: 2)).to eq(expected)
    end
  end
end


================================================
FILE: spec/documents_spec.rb
================================================
# frozen_string_literal: true

require "htmlbeautifier"

describe HtmlBeautifier do
  it "correctly indents mixed document" do
    source = code <<~ERB
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
      <head>
      <meta http-equiv="content-type" content="text/html; charset=utf-8" />
      <script src="/javascripts/prototype.js" type="text/javascript"></script>
      <link rel="stylesheet" type="text/css" href="/stylesheets/screen.css" media="screen"/>
      <!--[if IE 6]>
      <link rel="stylesheet" href="/stylesheets/screen_ie6.css" type="text/css" />
      <![endif]-->
      <title>Title Goes Here</title>
      <script type="text/javascript" charset="utf-8">
      doSomething();
      </script>
      </head>
      <body>
      <div id="something">
      <h1>
      Heading 1
      </h1>
      </div>
      <div id="somethingElse"><p>Lorem Ipsum</p>
      <% if @x %>
      <% @ys.each do |y| %>
      <p>
      <%= h y %>
      </p>
      <% end %>
      <% elsif @z %>
      <hr />
      <% end %>
      </div>
      <table>
      <colgroup>
      <col style="width: 50%;">
      <col style="width: 50%;">
      </colgroup>
      <tbody>
      <tr><td>First column</td></tr><tr>
      <td>Second column</td></tr>
      </tbody>
      </table>
      <custom-tag attr1="" />
      </body>
      </html>
    ERB
    expected = code <<~ERB
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
        <head>
          <meta http-equiv="content-type" content="text/html; charset=utf-8" />
          <script src="/javascripts/prototype.js" type="text/javascript"></script>
          <link rel="stylesheet" type="text/css" href="/stylesheets/screen.css" media="screen"/>
          <!--[if IE 6]>
            <link rel="stylesheet" href="/stylesheets/screen_ie6.css" type="text/css" />
          <![endif]-->
          <title>Title Goes Here</title>
          <script type="text/javascript" charset="utf-8">
            doSomething();
          </script>
        </head>
        <body>
          <div id="something">
            <h1>
              Heading 1
            </h1>
          </div>
          <div id="somethingElse">
            <p>Lorem Ipsum</p>
            <% if @x %>
              <% @ys.each do |y| %>
                <p>
                  <%= h y %>
                </p>
              <% end %>
            <% elsif @z %>
              <hr />
            <% end %>
          </div>
          <table>
            <colgroup>
              <col style="width: 50%;">
              <col style="width: 50%;">
            </colgroup>
            <tbody>
              <tr>
                <td>First column</td>
              </tr>
              <tr>
                <td>Second column</td>
              </tr>
            </tbody>
          </table>
          <custom-tag attr1="" />
        </body>
      </html>
    ERB

    expect(described_class.beautify(source)).to eq(expected)
  end

  context "when stop_on_errors is true" do
    it "raises an error with the source line of an illegal closing tag" do
      expect {
        source = "<html>\n</html>\n</html>"
        described_class.beautify(source, stop_on_errors: true)
      }.to raise_error(RuntimeError, "Extraneous closing tag on line 3")
    end
  end

  context "when stop_on_errors is false" do
    it "processes the rest of the document after the errant closing tag" do
      source = code <<~HTML
        </head>
        <body>
        <div>
        text
        </div>
        </body>
      HTML
      expected = code <<~HTML
        </head>
        <body>
          <div>
            text
          </div>
        </body>
      HTML
      expect(described_class.beautify(source, stop_on_errors: false))
        .to eq(expected)
    end
  end
end


================================================
FILE: spec/executable_spec.rb
================================================
# frozen_string_literal: true

require "shellwords"
require "fileutils"
require "open3"

describe "bin/htmlbeautifier" do
  before do
    FileUtils.mkdir_p path_to("tmp")
  end

  def write(path, content)
    File.write(path, content)
  end

  def read(path)
    File.read(path)
  end

  def path_to(*partial)
    File.join(File.expand_path("..", __dir__), *partial)
  end

  def command
    "ruby -I%s %s" % [
      escape(path_to("lib")),
      escape(path_to("bin", "htmlbeautifier"))
    ]
  end

  def escape(str)
    Shellwords.escape(str)
  end

  it "beautifies a file in place" do
    input = "<p>\nfoo\n</p>"
    expected = "<p>\n  foo\n</p>\n"
    path = path_to("tmp", "in-place.html")
    write path, input

    system "%s %s" % [command, escape(path)]

    expect(read(path)).to eq(expected)
  end

  it "beautifies a file from stdin to stdout" do
    input = "<p>\nfoo\n</p>"
    expected = "<p>\n  foo\n</p>\n"
    in_path = path_to("tmp", "input.html")
    out_path = path_to("tmp", "output.html")
    write in_path, input

    system "%s < %s > %s" % [command, escape(in_path), escape(out_path)]

    expect(read(out_path)).to eq(expected)
  end

  it "displays which files would fail with --lint-only flag" do
    good_input = "<p></p>\n"
    good_path = path_to("tmp", "good.html")
    write(good_path, good_input)

    bad_input = "<div><p></p></div>\n"
    bad_path = path_to("tmp", "bad.html")
    write(bad_path, bad_input)

    expected_message = "Lint failed - files would be modified:\n<redacted>/tmp/bad.html\n"

    _stdout, stderr, status = Open3.capture3(
      "%s %s %s --lint-only" % [command, escape(good_path), escape(bad_path)]
    )

    stderr.sub!(%r{/.*tmp/}, "<redacted>/tmp/")

    expect(status.exitstatus).to eq(1)
    expect(stderr).to eq(expected_message)
  end

  it "does not modify files with --lint-only flag" do
    good_input = "<p></p>\n"
    good_path = path_to("tmp", "good.html")
    write(good_path, good_input)

    bad_input = "<div><p></p></div>\n"
    bad_path = path_to("tmp", "bad.html")
    write(bad_path, bad_input)

    Open3.capture3(
      "%s %s %s --lint-only" % [command, escape(good_path), escape(bad_path)]
    )

    expect(read(good_path)).to eq(good_input)
    expect(read(bad_path)).to eq(bad_input)
  end

  it "allows a configurable number of tab stops" do
    input = "<p>\nfoo\n</p>"
    expected = "<p>\n   foo\n</p>\n"
    path = path_to("tmp", "in-place.html")
    write path, input

    system "%s --tab-stops=3 %s" % [command, escape(path)]

    expect(read(path)).to eq(expected)
  end

  it "allows indentation with tab instead of spaces" do
    input = "<p>\nfoo\n</p>"
    expected = "<p>\n\tfoo\n</p>\n"
    path = path_to("tmp", "in-place.html")
    write path, input

    system "%s --tab %s" % [command, escape(path)]

    expect(read(path)).to eq(expected)
  end

  it "allows an initial indentation level" do
    input = "<p>\nfoo\n</p>"
    expected = "      <p>\n        foo\n      </p>\n"
    path = path_to("tmp", "in-place.html")
    write path, input

    system "%s --indent-by 3 %s" % [command, escape(path)]

    expect(read(path)).to eq(expected)
  end

  it "ignores closing tag errors by default" do
    input = "</p>\n"
    expected = "</p>\n"
    path = path_to("tmp", "in-place.html")
    write path, input

    status = system("%s %s" % [command, escape(path)])

    expect(read(path)).to eq(expected)
    expect(status).to be_truthy
  end

  it "raises an exception on closing tag errors with --stop-on-errors" do
    input = "</p>\n"
    path = path_to("tmp", "in-place.html")
    write path, input

    status = system("%s --stop-on-errors %s 2>/dev/null" % [command, escape(path)])

    expect(status).to be_falsey
  end

  it "allows a configurable number of consecutive blank lines" do
    input = "<h1>foo</h1>\n\n\n\n\n<p>bar</p>\n"
    expected = "<h1>foo</h1>\n\n\n<p>bar</p>\n"
    path = path_to("tmp", "in-place.html")
    write path, input

    system "%s --keep-blank-lines=2 %s" % [command, escape(path)]

    expect(read(path)).to eq(expected)
  end
end


================================================
FILE: spec/parser_spec.rb
================================================
# frozen_string_literal: true

require "htmlbeautifier/parser"

describe HtmlBeautifier::Parser do
  let(:receiver_class) {
    Class.new do
      attr_reader :sequence

      def initialize
        @sequence = []
      end

      def method_missing(method, *params)
        @sequence << [method, params]
      end

      def respond_to_missing?(*)
        true
      end
    end
  }

  it "dispatches matching sequence" do
    receiver = receiver_class.new
    parser = described_class.new { |p|
      p.map %r{foo}, :foo
      p.map %r{bar\s*}, :bar
      p.map %r{\s+}, :whitespace
    }
    parser.scan("foo bar ", receiver)
    expected = [[:foo, ["foo"]], [:whitespace, [" "]], [:bar, ["bar "]]]
    expect(receiver.sequence).to eq(expected)
  end

  it "sends parenthesized components as separate parameters" do
    receiver = receiver_class.new
    parser = described_class.new { |p|
      p.map %r{(foo)\((.*?)\)}, :foo
    }
    parser.scan("foo(bar)", receiver)
    expected = [[:foo, %w[foo bar]]]
    expect(receiver.sequence).to eq(expected)
  end

  context "when tracking source lines" do
    let(:source_tracking_receiver_class) {
      Class.new(receiver_class) do
        attr_reader :sources_so_far
        attr_reader :source_line_numbers

        def initialize(parser)
          @sources_so_far = []
          @source_line_numbers = []
          @parser = parser
          super()
        end

        def append_new_source_so_far(*)
          @sources_so_far << @parser.source_so_far
        end

        def append_new_source_line_number(*)
          @source_line_numbers << @parser.source_line_number
        end
      end
    }

    it "gives source so far" do
      parser = described_class.new { |p|
        p.map %r{(M+)}m, :append_new_source_so_far
        p.map %r{([\s\n]+)}m, :space_or_newline
      }
      receiver = source_tracking_receiver_class.new(parser)
      parser.scan("M MM MMM", receiver)
      expect(receiver.sources_so_far).to eq(["M", "M MM", "M MM MMM"])
    end

    it "gives source line number" do
      parser = described_class.new { |p|
        p.map %r{(M+)}m, :append_new_source_line_number
        p.map %r{([\s\n]+)}m, :space_or_newline
      }
      receiver = source_tracking_receiver_class.new(parser)
      parser.scan("M \n\nMM\nMMM", receiver)
      expect(receiver.source_line_numbers).to eq([1, 3, 4])
    end
  end
end


================================================
FILE: spec/spec_helper.rb
================================================
# frozen_string_literal: true

module HtmlBeautifierSpecUtilities
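  # Strips a leading newline and any trailing newline plus whitespace from
  # +str+, then removes the first line's leading-space indentation from
  # every line.
  def code(str)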
    str = str.gsub(%r{\A\n|\n\s*\Z}, "")
    indentation = str[%r{\A +}]
    lines = str.split(%r{\n})
    lines.map { |line| line.sub(%r{^#{indentation}}, "") }.join("\n")
  end
end

RSpec.configure do |config|
  config.include HtmlBeautifierSpecUtilities
end


SYMBOL INDEX (54 symbols across 9 files)

FILE: lib/htmlbeautifier.rb
  type HtmlBeautifier (line 7) | module HtmlBeautifier
    function beautify (line 23) | def self.beautify(html, options = {})

FILE: lib/htmlbeautifier/builder.rb
  type HtmlBeautifier (line 6) | module HtmlBeautifier
    class Builder (line 7) | class Builder
      method initialize (line 15) | def initialize(output, options = {})
      method error (line 30) | def error(text)
      method indent (line 36) | def indent
      method outdent (line 40) | def outdent
      method emit (line 45) | def emit(*strings)
      method new_line (line 54) | def new_line
      method embed (line 58) | def embed(opening, code, closing)
      method foreign_block (line 65) | def foreign_block(opening, code, closing)
      method emit_reindented_block_content (line 71) | def emit_reindented_block_content(code)
      method foreign_block_indentation (line 84) | def foreign_block_indentation(code)
      method preformatted_block (line 88) | def preformatted_block(opening, content, closing)
      method standalone_element (line 94) | def standalone_element(elem)
      method close_element (line 99) | def close_element(elem)
      method close_block_element (line 104) | def close_block_element(elem)
      method open_element (line 109) | def open_element(elem)
      method open_block_element (line 114) | def open_block_element(elem)
      method close_ie_cc (line 119) | def close_ie_cc(elem)
      method open_ie_cc (line 128) | def open_ie_cc(elem)
      method new_lines (line 134) | def new_lines(*content)

FILE: lib/htmlbeautifier/html_parser.rb
  type HtmlBeautifier (line 5) | module HtmlBeautifier
    class HtmlParser (line 6) | class HtmlParser < Parser
      method initialize (line 57) | def initialize

FILE: lib/htmlbeautifier/parser.rb
  type HtmlBeautifier (line 5) | module HtmlBeautifier
    class Parser (line 6) | class Parser
      method initialize (line 7) | def initialize
      method map (line 12) | def map(pattern, method)
      method scan (line 16) | def scan(subject, receiver)
      method source_so_far (line 21) | def source_so_far
      method source_line_number (line 25) | def source_line_number
      method dispatch (line 31) | def dispatch(receiver)
      method extract_params (line 40) | def extract_params(scanner)

FILE: lib/htmlbeautifier/ruby_indenter.rb
  type HtmlBeautifier (line 3) | module HtmlBeautifier
    class RubyIndenter (line 4) | class RubyIndenter
      method outdent? (line 13) | def outdent?(lines)
      method indent? (line 17) | def indent?(lines)

FILE: lib/htmlbeautifier/version.rb
  type HtmlBeautifier (line 3) | module HtmlBeautifier # :nodoc:
    type VERSION (line 4) | module VERSION # :nodoc:

FILE: spec/executable_spec.rb
  function write (line 12) | def write(path, content)
  function read (line 16) | def read(path)
  function path_to (line 20) | def path_to(*partial)
  function command (line 24) | def command
  function escape (line 31) | def escape(str)

FILE: spec/parser_spec.rb
  function initialize (line 10) | def initialize
  function method_missing (line 14) | def method_missing(method, *params)
  function respond_to_missing? (line 18) | def respond_to_missing?
  function initialize (line 52) | def initialize(parser)
  function append_new_source_so_far (line 59) | def append_new_source_so_far(*)
  function append_new_source_line_number (line 63) | def append_new_source_line_number(*)

FILE: spec/spec_helper.rb
  type HtmlBeautifierSpecUtilities (line 3) | module HtmlBeautifierSpecUtilities
    function code (line 4) | def code(str)


About this extraction

This page contains the full source code of the threedaymonk/htmlbeautifier GitHub repository, extracted and formatted as plain text. The extraction includes 23 files (44.5 KB), approximately 13.4k tokens, and a symbol index with 54 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.